For the TreeTagger experiments, the extended lexicon was used. As a result, the training corpus contains no non-lexicalized word forms: every word form in it has a lexicon entry.

| Corpus statistics | training corpus | test corpus |
|---|---:|---:|
| Tokens | 62860 | 13416 |
| Tags | 51 | 46 |
| Lexicon gaps | 0 | 240 |
| Lexical errors | 0 | 62 |


| Ambiguity classes | |
|---|---:|
| Ambiguity rate | 1.58 |

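The corpus statistics can be derived from a tagged corpus and a lexicon. The following is a minimal sketch, assuming the corpus is a list of `(word, gold_tag)` pairs and the lexicon maps each word form to its set of possible tags; `corpus_stats` and `CorpusStats` are hypothetical names. It also assumes that a lexicon gap is a token whose word form has no lexicon entry, that a lexical error is a lexicalized token whose entry lacks the correct tag, and that the ambiguity rate is the mean number of lexicon tags per lexicalized token. The source does not state how these figures were actually computed.

```python
from dataclasses import dataclass


@dataclass
class CorpusStats:
    tokens: int
    tags: int              # distinct gold tags occurring in the corpus
    lexicon_gaps: int      # tokens whose word form has no lexicon entry
    lexical_errors: int    # lexicalized tokens whose entry lacks the gold tag
    ambiguity_rate: float  # assumed: mean lexicon tags per lexicalized token


def corpus_stats(corpus, lexicon):
    """corpus: list of (word, gold_tag); lexicon: dict word -> set of tags."""
    gaps = lexical_errors = known = ambiguity_sum = 0
    for word, gold in corpus:
        candidates = lexicon.get(word)
        if candidates is None:
            gaps += 1  # non-lexicalized word form
            continue
        known += 1
        ambiguity_sum += len(candidates)
        if gold not in candidates:
            lexical_errors += 1  # the entry exists but misses the correct tag
    return CorpusStats(
        tokens=len(corpus),
        tags=len({gold for _, gold in corpus}),
        lexicon_gaps=gaps,
        lexical_errors=lexical_errors,
        ambiguity_rate=ambiguity_sum / known if known else 0.0,
    )


# Hypothetical toy data (STTS tags), not taken from the evaluated corpora.
lexicon = {
    "der": {"ART", "PRELS", "PDS"},
    "Mann": {"NN"},
    "kommt": {"VVFIN"},
    "Aber": {"ADV"},  # entry lacks KON, so "Aber/KON" is a lexical error
}
corpus = [("Aber", "KON"), ("der", "ART"), ("Mann", "NN"),
          ("kommt", "VVFIN"), ("Osthold", "NE")]  # "Osthold" is a lexicon gap

print(corpus_stats(corpus, lexicon))
```

On the toy data this reports 5 tokens, 5 tags, 1 lexicon gap, 1 lexical error and an ambiguity rate of 1.5. Applied to a training corpus checked against the extended lexicon, the first two counters would be 0 by construction, as in the table.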
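
Error statistics

| ambiguity | tokens | in % | correct | in % | LE | in % | DE | in % |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 1 | 8138 | 60.6 | 8088 | 99.4 | 50 | 0.6 | - | - |
| 2 | 3275 | 24.4 | 3084 | 94.2 | 7 | 0.2 | 184 | 5.6 |
| 3 | 1533 | 11.4 | 1432 | 93.4 | 2 | 0.1 | 99 | 6.5 |
| 4 | 424 | 3.2 | 376 | 88.7 | 1 | 0.2 | 47 | 11.1 |
| 5 | 14 | 0.1 | 9 | 64.3 | 0 | - | 5 | 35.7 |
| 6 | 9 | 0.1 | 6 | 66.7 | 0 | - | 3 | 33.3 |
| 7 | 15 | 0.1 | 9 | 60.0 | 2 | 13.3 | 4 | 26.7 |
| 8 | 6 | 0.0 | 4 | 66.7 | 0 | - | 2 | 33.3 |
| 9 | 0 | - | 0 | - | 0 | - | 0 | - |
| 10 | 2 | 0.0 | 0 | - | 0 | - | 2 | 100.0 |
| total | 13416 | 100.0 | 13008 | 97.0 | 62 | 0.5 | 346 | 2.6 |

ambiguity: number of candidate tags for the token. LE: lexical errors (the correct tag is not among the candidate tags). DE: disambiguation errors (the correct tag is among the candidates, but the tagger chose another one). The first "in %" column is relative to all test tokens; the remaining "in %" columns are relative to the tokens of the same ambiguity class (in the total row, to all test tokens). In every row, tokens = correct + LE + DE.
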
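The breakdown by ambiguity class can be sketched as follows. It assumes each test token comes with its candidate tag set, its gold tag and the tagger's tag; `error_breakdown` and `percent` are hypothetical helpers. A token counts as a lexical error when the gold tag is missing from its candidates, and as a disambiguation error when the gold tag was available but not chosen, matching the reading of LE and DE given above.

```python
from collections import defaultdict


def error_breakdown(tokens):
    """Aggregate tagging outcomes by ambiguity (number of candidate tags).

    tokens: iterable of (candidate_tags, gold_tag, predicted_tag).
    Returns {ambiguity: {"tokens": n, "correct": c, "LE": le, "DE": de}}.
    """
    table = defaultdict(lambda: {"tokens": 0, "correct": 0, "LE": 0, "DE": 0})
    for candidates, gold, predicted in tokens:
        row = table[len(candidates)]
        row["tokens"] += 1
        if gold not in candidates:
            row["LE"] += 1       # the correct tag was never on offer
        elif predicted == gold:
            row["correct"] += 1
        else:
            row["DE"] += 1       # correct tag available, tagger chose another
    return dict(table)


def percent(part, whole):
    """Share in percent, rounded to one decimal; None for an empty class."""
    return round(100.0 * part / whole, 1) if whole else None


# Hypothetical toy input, not taken from the test corpus.
tokens = [
    ({"NN"}, "NN", "NN"),                             # ambiguity 1, correct
    ({"ADV"}, "KON", "ADV"),                          # ambiguity 1, LE
    ({"ART", "PRELS"}, "ART", "ART"),                 # ambiguity 2, correct
    ({"ART", "PRELS"}, "PRELS", "ART"),               # ambiguity 2, DE
    ({"VAFIN", "VAINF", "VVINF"}, "VAINF", "VAFIN"),  # ambiguity 3, DE
]
for ambiguity, row in sorted(error_breakdown(tokens).items()):
    print(ambiguity, row, percent(row["correct"], row["tokens"]))
```

With the counts from the table, `percent(3084, 3275)` gives 94.2, the correct rate for ambiguity 2. Returning `None` for empty classes mirrors the "-" entries, e.g. for ambiguity 9.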

Most frequent errors (by word form)

| number | word | correct tag | tagger tag |
|---:|---|---|---|
| 13 | um | KOUI | APPR |
| 9 | werden | VAINF | VAFIN |
| 9 | Osthold | NE | ADJD |
| 8 | Aber | KON | ADV |
| 8 | der | PRELS | ART |
| 8 | das | PDS | ART |
| 7 | dem | PRELS | ART |
| 6 | Stänner | NE | NN |
| 6 | Brück | NE | NN |
| 5 | die | PRELS | ART |


Most frequent errors (by tags)

| number | correct tag | tagger tag |
|---:|---|---|
| 112 | NE | NN |
| 24 | VVINF | VVFIN |
| 21 | VVFIN | VVINF |
| 21 | PRELS | ART |
| 18 | KON | ADV |
| 17 | KOUI | APPR |
| 12 | VAINF | VAFIN |
| 11 | VVFIN | VVPP |
| 11 | NN | NE |
| 11 | NE | ADJD |

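Both error rankings are frequency counts over the tagger's mistakes. The sketch below assumes the tagged test corpus is available as `(word, gold_tag, predicted_tag)` triples; `most_frequent_errors` is a hypothetical helper built on `collections.Counter`.

```python
from collections import Counter


def most_frequent_errors(tagged, n=10):
    """Rank tagging errors by word form and by tag pair.

    tagged: iterable of (word, gold_tag, predicted_tag).
    Returns two lists of (key, count), most frequent first.
    """
    by_word = Counter()
    by_tags = Counter()
    for word, gold, predicted in tagged:
        if gold != predicted:
            by_word[(word, gold, predicted)] += 1
            by_tags[(gold, predicted)] += 1
    return by_word.most_common(n), by_tags.most_common(n)


# Hypothetical toy input, not taken from the test corpus.
tagged = [
    ("um", "KOUI", "APPR"),
    ("um", "KOUI", "APPR"),
    ("der", "PRELS", "ART"),
    ("dem", "PRELS", "ART"),
    ("der", "ART", "ART"),   # correct, therefore not counted
    ("Brück", "NE", "NN"),
]
words, tags = most_frequent_errors(tagged)
for (word, gold, predicted), count in words:
    print(count, word, gold, predicted)
for (gold, predicted), count in tags:
    print(count, gold, predicted)
```

The word-form ranking is keyed on the full triple because one word form can be mistagged in more than one way, which is how the table lists each row. The tag-pair ranking aggregates over word forms. This explains why PRELS tagged as ART (21) is larger than any single word-form entry: it includes the errors on der, dem and die, among others.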