The Xerox tagger is based on Hidden Markov Models (HMMs). It applies the Baum-Welch (or Forward-Backward) algorithm for training and the Viterbi algorithm for tagging. As input, the tagger requires a finite-state lexicon that maps surface word forms to their possible tags, and a finite-state guesser that assigns tags to word forms not found in the lexicon.
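To make the tagging step concrete, the following is a minimal sketch of Viterbi decoding for such an HMM. The function name and the data layout (dictionaries of initial, transition and emission probabilities) are illustrative, not the Xerox tagger's actual interface; where the real tagger would consult the finite-state guesser for unknown word forms, the sketch simply falls back to a small floor probability.

```python
import math

def viterbi(words, tags, init, trans, emit):
    """Return the most probable tag sequence for `words` under an HMM."""
    LOW = math.log(1e-12)  # floor for unseen events; the real tagger would
                           # consult the finite-state guesser instead

    def lp(d, key):
        # log probability with a floor, to avoid underflow and log(0)
        p = d.get(key, 0.0)
        return math.log(p) if p > 0 else LOW

    # best[i][t]: log probability of the best tag path ending in t at position i
    best = [{t: lp(init, t) + lp(emit, (t, words[0])) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            # pick the predecessor tag maximizing path score times P(t | prev)
            prev = max(tags, key=lambda p: best[i - 1][p] + lp(trans, (p, t)))
            best[i][t] = (best[i - 1][prev] + lp(trans, (prev, t))
                          + lp(emit, (t, words[i])))
            back[i][t] = prev
    # trace back-pointers from the best final tag
    tag = max(best[-1], key=best[-1].get)
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = back[i][tag]
        path.append(tag)
    path.reverse()
    return path
```

The dynamic program runs in time linear in sentence length and quadratic in tagset size, which is why tagset size matters for efficiency as well as accuracy.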
The Xerox HMM Tagger [Cutting et al. 1992] was designed to be trained on untagged corpora, but a more recent version [Wilkens, Kupiec 1995] allows initialization with untagged and/or tagged data. However, it turned out (cf. [Schiller 1996]) that when enough tagged training material is available, the best tagging results are obtained by simply initializing the tagger on the tagged corpus, with no further training.
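Initializing on a tagged corpus amounts to estimating the HMM parameters by relative frequency from the annotated data, rather than re-estimating them with Baum-Welch. The following sketch illustrates this under the same parameter layout as above; the function name and data format are assumptions for illustration, and smoothing of unseen events is omitted.

```python
from collections import Counter

def init_from_tagged(sentences):
    """Relative-frequency estimates of HMM parameters from tagged data.

    `sentences` is a list of sentences, each a list of (word, tag) pairs.
    Returns (init, trans, emit) dictionaries as used by the Viterbi
    sketch above.
    """
    start, pair, emit_c, out_c, tag_c = (
        Counter(), Counter(), Counter(), Counter(), Counter())
    for sent in sentences:
        start[sent[0][1]] += 1          # sentence-initial tag
        for word, tag in sent:
            emit_c[(tag, word)] += 1    # tag emits word
            tag_c[tag] += 1
        for (_, t1), (_, t2) in zip(sent, sent[1:]):
            pair[(t1, t2)] += 1         # tag bigram
            out_c[t1] += 1              # transitions leaving t1
    init = {t: c / len(sentences) for t, c in start.items()}
    trans = {(t1, t2): c / out_c[t1] for (t1, t2), c in pair.items()}
    emit = {(t, w): c / tag_c[t] for (t, w), c in emit_c.items()}
    return init, trans, emit
```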
For our experiments we chose, among the available taggers, those which can easily be adapted to a specific language or tagset. This makes it possible to use a uniform lexicon and tagset for all the different methods and to compare the results (cf. 5.1), and it is necessary for the tests on tagset variations (cf. 5.3).