The goal of the experiments below is to determine how tagger accuracy varies when training and testing are carried out on different types of text.
One major problem in validating tagger accuracy across text types is the lack of an objective measure of how much text types differ. The choice of texts for training and testing is therefore based on the assumption that the structure of newspaper text differs from that of fiction or poetry (e.g. fairy tales).
For the tests we used the tagged German corpora available at RXRC. Their tagset differs slightly from the STTS tagset: the main difference is that there is no distinction between common nouns (NN) and proper names (NE); in addition, there are some minor differences within the function word classes.
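The following minimal sketch illustrates the kind of tag mapping this difference implies when comparing the two tagsets: STTS distinguishes common nouns (NN) from proper names (NE), while the RXRC tagset does not. The merged label "NN" and the helper function are assumptions made here for illustration, not the actual conversion used at RXRC.

```python
# Assumed mapping: collapse STTS proper names (NE) into the noun class.
STTS_TO_RXRC = {"NE": "NN"}

def map_tag(stts_tag):
    """Map an STTS tag onto its (assumed) RXRC equivalent."""
    return STTS_TO_RXRC.get(stts_tag, stts_tag)

print(map_tag("NE"), map_tag("NN"), map_tag("ADJA"))   # -> NN NN ADJA
```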
Test and training corpora were collected from the ECI-CD and from the WWW. In a first step, they were tagged automatically with an available tagger. We then corrected the lexicon and the corpus tags in an iterative process, in order to obtain reliable data and comparable conditions for the different types of text. The lexicon updates, however, did not include text-specific proper names or other uncommon word forms, such as some antiquated words found in the fairy tales.
Finally, we assembled three different training corpora from the manually tagged texts and used them to train three different HMM models. The resulting HMM taggers were then evaluated on each of the texts (including the corpora that were used for training).
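The following sketch shows how such a cross-evaluation can be set up: one HMM tagger is trained per training corpus and each tagger is scored on every text. It uses NLTK's HMM implementation as a stand-in for the tagger actually used in the experiments; the file names, the corpus file format (tab-separated word/tag pairs, blank line between sentences), and the smoothing choice are assumptions for illustration only.

```python
from nltk.tag.hmm import HiddenMarkovModelTrainer
from nltk.probability import LidstoneProbDist

def read_tagged(path):
    """Read a word<TAB>tag file, with blank lines separating sentences (assumed format)."""
    sentences, current = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                if current:
                    sentences.append(current)
                    current = []
            else:
                word, tag = line.split("\t")
                current.append((word, tag))
    if current:
        sentences.append(current)
    return sentences

# Hypothetical file names; the actual TRAIN1-3 corpora and test texts are not distributed here.
train_sets = {name: read_tagged(f"{name}.tt") for name in ("TRAIN1", "TRAIN2", "TRAIN3")}
test_sets  = {name: read_tagged(f"{name}.tt") for name in ("NEWS", "FICTION", "FAIRYTALES")}

def accuracy(tagger, gold_sentences):
    """Token-level accuracy of the tagger against gold-standard tags."""
    correct = total = 0
    for sent in gold_sentences:
        words = [w for w, _ in sent]
        for (_, predicted), (_, gold) in zip(tagger.tag(words), sent):
            correct += predicted == gold
            total += 1
    return correct / total

# Train one HMM tagger per training corpus and evaluate it on every text
# (Lidstone smoothing keeps unseen word forms from receiving zero probability).
for train_name, train_data in train_sets.items():
    tagger = HiddenMarkovModelTrainer().train_supervised(
        train_data, estimator=lambda fd, bins: LidstoneProbDist(fd, 0.1, bins))
    for test_name, test_data in test_sets.items():
        print(f"{train_name} -> {test_name}: {accuracy(tagger, test_data):.3f}")
```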
In the table below we list the texts used for training and testing. The three training corpora TRAIN1, TRAIN2, and TRAIN3 are built from the texts marked with the corresponding numbers; unmarked texts are used for testing only.