The error statistics are computed by comparing the tagger output with the manual tagging.
Example:

    line  word form  manual tag  tagger tag  lex  lexical tags
      1   Ilka       NE          NE           -   ADJD ADV NE NN
      2   Sperber    NE          NN           +   NN
      3   war        VAFIN       VAFIN        +   VAFIN
      4   schon      ADV         ADV          +   ADV VVIMP
      5   bei        APPR        APPR         +   APPR PTKVZ
      6   den        ART         ART          +   ART PDS PRELS
      7   meisten    PIDAT       PIDAT        +   PIDAT PIS
      8   Stars      NN          NN           +   NN
      9   zu         APPR        APPR         +   ADV APPR PTKA PTKVZ PTKZU
     10   Gast       NN          NN           +   NN VVFIN VVIMP
     11   ,          $,          $,           +   $,
     12   die        PRELS       PRELS        +   ART PDS PRELS
     13   in         APPR        APPR         +   APPR
     14   Berlin     NE          NE           +   NE
     15   wohnen     VVFIN       VVINF        +   VVFIN VVINF
     16   .          $.          $.           +   $.
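The comparison step can be sketched as follows. This is a minimal illustration, not the actual evaluation code: tokens are assumed to be stored as (word, manual tag, tagger tag, in-lexicon flag) tuples, and only the first three corpus tokens are shown.

```python
# Each token: (word form, manual tag, tagger tag, found in lexicon?).
# Sample data taken from the first three rows of the example above.
tokens = [
    ("Ilka",    "NE",    "NE",    False),  # lexicon gap: tags were guessed
    ("Sperber", "NE",    "NN",    True),   # lexical error: lexicon lacks NE
    ("war",     "VAFIN", "VAFIN", True),
]

# Lexicon gaps: tokens not found in the lexicon.
gaps = sum(1 for _, _, _, in_lex in tokens if not in_lex)

# Tagger errors: tagger tag differs from the manual tag.
errors = sum(1 for _, manual, tagged, _ in tokens if manual != tagged)

# Overall accuracy in percent.
accuracy = 100.0 * (len(tokens) - errors) / len(tokens)
```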
For the above sample test corpus we get the following numbers:

    Number of tokens:    16
    Number of tags:      11    { $, $. ADV APPR ART NE NN
                                 PIDAT PRELS VAFIN VVFIN }
    Lexicon gaps:         1    ``Ilka''
    Lexical errors:       1    ``Sperber''
    Ambiguity classes:   13    {APPR}, {NE}, {NN}, {VAFIN}, {$,}, {$.},
                               {ADV VVIMP}, {APPR PTKVZ}, {PIDAT PIS},
                               {VVFIN VVINF}, {ART PDS PRELS},
                               {ADJD ADV NE NN},
                               {ADV APPR PTKA PTKVZ PTKZU}
    Ambiguity rate:    2.06    (= 33/16)
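The ambiguity rate is the total number of lexical tags divided by the number of tokens. A minimal sketch, using the per-token lexical tag counts from the sample corpus above (in corpus order):

```python
# Number of lexical tags offered for each of the 16 tokens,
# read off the "lexical tags" column of the example table.
tags_per_token = [4, 1, 1, 2, 2, 3, 2, 1, 5, 3, 1, 3, 1, 1, 2, 1]

# Ambiguity rate = total lexical tags / number of tokens = 33/16.
rate = sum(tags_per_token) / len(tags_per_token)
```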
The following table shows the error classification by ambiguity type for the above sample text.

    ambiguity type      number of tokens    tagger errors    accuracy
    1 tag                      7                  1            85.7 %
    2 tags                     4                  1            75.0 %
    3 tags                     3                  0           100.0 %
    4 tags                     1                  0           100.0 %
    5 tags                     1                  0           100.0 %
    overall accuracy          16                  2            87.5 %
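The per-ambiguity-type breakdown can be sketched like this. The data below is a hypothetical encoding of the sample corpus: each token is reduced to (size of its ambiguity class, tagged correctly?), mirroring the counts in the table above.

```python
from collections import Counter

# (number of lexical tags, tagged correctly?) per token, matching the
# sample: e.g. one 1-tag token ("Sperber") and one 2-tag token
# ("wohnen") were tagged wrongly.
tokens = ([(1, True)] * 6 + [(1, False)]
          + [(2, True)] * 3 + [(2, False)]
          + [(3, True)] * 3 + [(4, True)] + [(5, True)])

by_size = Counter(n for n, _ in tokens)          # tokens per ambiguity type
errs = Counter(n for n, ok in tokens if not ok)  # errors per ambiguity type

# Accuracy per ambiguity type, in percent.
accuracy = {n: 100.0 * (by_size[n] - errs[n]) / by_size[n]
            for n in sorted(by_size)}
```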