The tests proposed for tagger evaluation are based on a comparison of
the output produced by different taggers on the same text. The lexicon
statistics will be identical for all taggers, and the comparison will
have to concentrate on the tagger statistics.
The following parameters will be identified and compared, and the
results interpreted:
- the overall error rate for each tagger;
- the tags which are most problematic for each individual tagger;
these can be identified by a frequency count of tagging errors,
classified by tag;
- a comparison, across taggers, of tag confusion pairs, with the
objective of identifying tags or tag confusion pairs which are
problematic for all taggers tested; if certain tag confusion pairs
turn out to be very frequent, irrespective of tagging method, this
result should have an impact on tagset design.
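
The three measurements listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it presupposes a manually corrected reference tagging (called gold here) aligned token by token with each tagger's output, and the function names and the tags DET, NOUN, VERB and ADJ are hypothetical, not taken from any tagger or tagset discussed in this document.

```python
from collections import Counter


def evaluate(gold, predicted):
    """Return (error_rate, errors_by_tag, confusions) for one tagger.

    gold and predicted are equal-length lists of tags, one per token.
    The gold list is an assumed reference tagging of the test text.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be aligned token by token")
    # A confusion pair is (reference tag, tag assigned by the tagger).
    confusions = Counter((g, p) for g, p in zip(gold, predicted) if g != p)
    # Frequency count of errors, classified by the reference tag.
    errors_by_tag = Counter()
    for (g, _), n in confusions.items():
        errors_by_tag[g] += n
    total_errors = sum(confusions.values())
    error_rate = total_errors / len(gold) if gold else 0.0
    return error_rate, errors_by_tag, confusions


def shared_confusions(confusions_per_tagger):
    """Confusion pairs made by every tagger, with the minimum count observed."""
    counters = list(confusions_per_tagger.values())
    if not counters:
        return Counter()
    shared = counters[0]
    for c in counters[1:]:
        shared = shared & c  # Counter intersection keeps the minimum counts
    return shared


# Hypothetical toy data: one reference tagging and two tagger outputs.
gold = ["DET", "NOUN", "VERB", "ADJ", "NOUN", "VERB"]
outputs = {
    "tagger_a": ["DET", "VERB", "VERB", "NOUN", "NOUN", "NOUN"],
    "tagger_b": ["DET", "VERB", "VERB", "ADJ", "NOUN", "VERB"],
}

results = {name: evaluate(gold, tags) for name, tags in outputs.items()}
for name, (rate, by_tag, _) in results.items():
    print(name, round(rate, 3), by_tag.most_common())
print(shared_confusions({n: r[2] for n, r in results.items()}).most_common())
```

Taking the minimum count across taggers is one possible design choice: it keeps only the confusion pairs that every tagger produces, which is the kind of evidence that would bear on tagset design rather than on any single tagging method.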