Tagger accuracy is influenced by, among other parameters, lexicon accuracy and the lexical ambiguity of words. To compare different tagging methods we need a uniform test environment; therefore, to cope with the lexicon problems, we use identical (or at least comparable) lexicons for all taggers. Comparable test conditions can be established by using the same lexicon (apart from differences in internal format), the same tagset and the same test corpora for all taggers.
All taggers under evaluation are trained on the same manually tagged corpus. The list of annotated word forms used as ``tagger lexicon'' for each tagger is built from word/tag pairs derived from the word forms occurring in the training and test corpora. This full-form list is computed by means of a morphological analyser ([Schiller 1995]) and of mapping rules that transform the morphological categories into the appropriate tags. These mapping rules may remove or add readings for specific word forms, in accordance with the guidelines for manual tagging ([Schiller et al. 1995]); however, no corpus-dependent modifications of the lexicon are introduced in the process of word-list creation.
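The lexicon construction described above can be sketched as follows. This is an illustrative sketch only: the `analyse` function and the `MAPPING` table are hypothetical stand-ins for the morphological analyser and the category-to-tag mapping rules; the actual analyser and tagset are not reproduced here.

```python
from collections import defaultdict

# Hypothetical mapping rules: morphological category -> corpus tag.
# The real rules may also remove or add readings for specific word forms.
MAPPING = {"V.ind.pres.3sg": "VVFIN", "N.nom.sg": "NN"}

def analyse(word):
    # Stand-in for the morphological analyser: returns all
    # morphological readings of a word form (toy data only).
    toy = {"geht": ["V.ind.pres.3sg"], "Haus": ["N.nom.sg"]}
    return toy.get(word, [])

def build_lexicon(word_forms):
    # Full-form list: each word form is mapped to the set of tags
    # obtained from its morphological readings.
    lexicon = defaultdict(set)
    for word in word_forms:
        for reading in analyse(word):
            lexicon[word].add(MAPPING.get(reading, reading))
    return lexicon

lex = build_lexicon(["geht", "Haus"])
```

In this sketch, ambiguity arises naturally when the analyser returns several readings for one word form that map to distinct tags.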
The following definitions will be used for statistics concerning the ambiguity of word forms:
We will divide the evaluation statistics for a given test corpus into two parts: corpus statistics and error statistics.
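As an illustration of the corpus-statistics part, the sketch below computes two common ambiguity measures over a test corpus given a full-form lexicon: the fraction of tokens that are ambiguous (more than one possible tag) and the average number of tags per token. The function name and the exact choice of measures are assumptions for illustration, not the paper's definitions.

```python
def corpus_statistics(tokens, lexicon):
    """Simple ambiguity statistics over a test corpus.

    tokens  -- list of word forms (corpus positions)
    lexicon -- mapping: word form -> set of possible tags
    """
    # A token is ambiguous if its lexicon entry lists more than one tag;
    # unknown words are treated as unambiguous (one default reading).
    ambiguous = sum(1 for w in tokens if len(lexicon.get(w, {w})) > 1)
    total_tags = sum(len(lexicon.get(w, {w})) for w in tokens)
    return {
        "tokens": len(tokens),
        "ambiguous_fraction": ambiguous / len(tokens),
        "tags_per_token": total_tags / len(tokens),
    }

stats = corpus_statistics(
    ["the", "saw", "the", "saw"],
    {"the": {"DT"}, "saw": {"NN", "VBD"}},
)
```

Error statistics, by contrast, would compare the tags assigned by each tagger against the manual annotation of the same corpus positions.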