Tagger Evaluation

The tests proposed for tagger evaluation are based on a comparison of the output produced by different taggers on the same text. Since the lexicon statistics will be identical for all taggers, the comparison will have to concentrate on the tagger statistics.

The following parameters will be identified and compared, and the results interpreted:

the overall error rate for each tagger,
the tags which are most problematic for each individual tagger; these can be identified by a frequency count of tagging errors, classified by tag;
a comparison, across taggers, of tag confusion pairs, with the objective of identifying tags or tag confusion pairs which are problematic for all taggers tested; if certain tag confusion pairs turn out to be very frequent irrespective of tagging method, this result should have an impact on tagset design.
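
The three parameters listed above could be computed along the following lines. This is only a minimal sketch under stated assumptions: it presumes that a manually checked reference tagging of the test text is available and that every tagger's output is aligned with it token by token. The function names (evaluate, shared_confusions), the toy tags and the two tagger outputs are hypothetical and serve only as illustration.

```python
from collections import Counter

def evaluate(reference, predicted):
    """Return (error_rate, errors_by_tag, confusion_pairs) for one tagger.

    Assumes both sequences hold one tag per token, in the same order.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    # A confusion pair is (reference tag, tag assigned by the tagger).
    confusions = Counter(
        (ref, pred) for ref, pred in zip(reference, predicted) if ref != pred
    )
    # Frequency count of tagging errors, classified by reference tag.
    errors_by_tag = Counter()
    for (ref, _), count in confusions.items():
        errors_by_tag[ref] += count
    total_errors = sum(confusions.values())
    error_rate = total_errors / len(reference) if reference else 0.0
    return error_rate, errors_by_tag, confusions

def shared_confusions(confusions_per_tagger):
    """Return the confusion pairs produced by every tagger tested."""
    all_counts = list(confusions_per_tagger.values())
    if not all_counts:
        return set()
    shared = set(all_counts[0])
    for counts in all_counts[1:]:
        shared &= set(counts)
    return shared

# Toy data (hypothetical): one reference tagging and two tagger outputs.
reference = ["DET", "NOUN", "VERB", "DET", "NOUN", "PREP", "NOUN", "VERB"]
outputs = {
    "tagger_a": ["DET", "VERB", "VERB", "DET", "NOUN", "PREP", "NOUN", "NOUN"],
    "tagger_b": ["DET", "VERB", "VERB", "DET", "NOUN", "ADV", "NOUN", "VERB"],
}

results = {name: evaluate(reference, tags) for name, tags in outputs.items()}
common = shared_confusions({name: r[2] for name, r in results.items()})

for name, (rate, by_tag, pairs) in results.items():
    print(name, rate, by_tag.most_common(), pairs.most_common())
print("confused by all taggers:", common)
```

In this toy case both taggers mistag a NOUN as VERB, so that pair is the only one reported as common to all taggers. On real data a frequency threshold would probably be preferable to a plain set intersection, since the aim is to find pairs that are very frequent for every tagger rather than pairs that merely occur once in each.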