The point of reference for all tests described in this document is the
German specification of the EAGLES morphosyntax encoding proposal,
ELM-DE. The following needs to be taken into account when interpreting
the test setups and the results.
- The tests reflect the technical state of the art of automatic
tagging and therefore concentrate on the part-of-speech annotation
proposed in ELM-DE, leaving aside the more detailed and more complex
set of morphosyntactic features. Only the higher levels of the ELM-DE
specification have been used here. This part corresponds to the STTS
tagset for German, which distinguishes 54 part-of-speech tags
(cf. Appendix C).
- The tagset used at RXRC is a derivative of STTS. It omits the
distinction between proper names and common nouns, and it modifies
the STTS classification of function words in a few places. The
function-word differences are not particularly relevant for the
present tests; the missing proper name/common noun distinction is
tracked explicitly.
- The 1995 version of STTS had a predecessor, the IMS/TUE tagset of
1993. STTS is mappable to ELM-DE (it only has a few finer-grained
distinctions in the domain of indefinite pronouns). In the redesign
phase which led from IMS/TUE to STTS, a number of interesting
questions came up concerning tagset design and its impact on tagging
results. Some of these questions have been analyzed in more detail in
the experiments reported in this document; the others have been
collected in Appendix B. The tagset evaluation is thus based on
concrete questions which came up during tagset design, and not just
on arbitrary or otherwise motivated questions. A detailed definition
of the modifications tested in the tagset evaluation work reported
here is given in Appendix A.
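
To illustrate what keeping track of the proper name/common noun
distinction can mean in practice, the following minimal sketch scores
the same tagger output twice: once against full STTS, and once with
the STTS tags NN (common noun) and NE (proper name) collapsed into a
single label. It is an illustration under stated assumptions, not the
procedure actually used in the tests: the merged label NOUN, the
function names and the sample tag sequences are all hypothetical, and
the text does not give the name of the corresponding RXRC tag.

```python
# Hypothetical merged label: the real RXRC tag name is not given in the text.
MERGED_NOUN = "NOUN"


def collapse_noun_distinction(tag):
    """Map STTS NN (common noun) and NE (proper name) to one label."""
    return MERGED_NOUN if tag in ("NN", "NE") else tag


def accuracy(gold, predicted, collapse=False):
    """Fraction of tokens whose predicted tag equals the gold tag.

    With collapse=True, NN and NE are merged on both sides before
    comparison, so NN/NE confusions no longer count as errors.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    if collapse:
        gold = [collapse_noun_distinction(t) for t in gold]
        predicted = [collapse_noun_distinction(t) for t in predicted]
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)


# Invented example: one NN/NE confusion in four tokens.
gold_tags = ["ART", "NN", "VVFIN", "NE"]
predicted_tags = ["ART", "NE", "VVFIN", "NE"]

strict_score = accuracy(gold_tags, predicted_tags)
collapsed_score = accuracy(gold_tags, predicted_tags, collapse=True)
```

Reporting both figures side by side shows how much of an observed
difference between tagsets is due to the NN/NE distinction alone.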