This document describes a few tests which allow us to assess some of the
complex interdependencies between corpus tagging methods, tagsets and
tagging results. In particular, we assess
- the impact of different statistical tagging methods on the
results, by comparing the performance of different taggers on the same
texts when using the same tagset (a minimal accuracy-comparison sketch
follows this list);
- the impact of tagset modifications on the results, by using
different versions of a tagset on the same
texts; differences between the versions of the tagset are documented
and classified, and the impact of individual modifications is tested;
- the impact of perceived linguistic differences between training
texts and test (or: application) texts on the results, by using texts
from different text types in training and testing, with tagsets and
taggers kept unchanged.
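As an illustration of the kind of comparison made in the first test
setup, the following minimal sketch (in Python) computes token-level
tagging accuracy for two taggers run on the same gold-annotated text.
The function, tagger names and tag labels are illustrative only and do
not correspond to the actual tools, tagsets or evaluation scripts used
in our tests.

    # A minimal sketch of the first test setup: several taggers are run on
    # the same gold-annotated text and their token-level accuracies compared.
    # All names and tag labels below are hypothetical.

    def tagging_accuracy(gold_tags, predicted_tags):
        """Fraction of tokens whose predicted tag matches the gold tag."""
        assert len(gold_tags) == len(predicted_tags)
        correct = sum(1 for g, p in zip(gold_tags, predicted_tags) if g == p)
        return correct / len(gold_tags)

    # One gold annotation of the test text, and the output of two taggers
    # trained with the same tagset on the same training corpus.
    gold = ["ART", "NN", "VVFIN", "ART", "NN", "$."]
    tagger_output = {
        "tagger_A": ["ART", "NN", "VVFIN", "ART", "NE", "$."],
        "tagger_B": ["ART", "NE", "VVFIN", "ART", "NN", "$."],
    }

    for name, predicted in tagger_output.items():
        print(f"{name}: accuracy = {tagging_accuracy(gold, predicted):.2%}")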
The tests have been carried out with tagsets directly derived
from the morphosyntactic specifications for German
proposed by EAGLES (see section 2 for details).
The results of these tests may help to give at least partial answers
to the following, more general questions:
- Is it possible at all to derive a corpus tagset from a lexical
specification of the kind proposed by EAGLES?
- How well do taggers perform if used with such a lexicon-based
tagset? Are there systematic tagging errors (significantly different
from those obtained with other tagsets) across tagging methods?
- How do modifications of the tagset influence tagging results,
and which types of modifications lead to increases in accuracy? What
lessons can we learn for tagset design?
- What is the impact of text type differences between training
and testing? Do we need specific training on, say, fairy tales if we
want to tag fairy tale texts, or can we (re-)use a tagger trained on
newspaper text? How useful is it to train on ``mixed'' corpora?
It should be made clear at this point that the few tests performed
here are only examples of the type of tests which would have to be
carried out for a full assessment. In the analysis of some of
the figures obtained in our tests below, it will become clear
that, for example, a fuller interpretation of the impact of tagset
modifications is only possible when comparisons with additional data
(e.g. on frequency, or on different analyses of a problematic item)
are available. In this sense, the present document is a case
study rather than a full assessment. We hope, however, that some of our
test setups and some of the practical and methodological considerations
in this document are relevant to work in the field of tagger evaluation
in the broad sense.
The tests described here are part of a wider effort to validate the
outcome of the EAGLES morphosyntax work in practical applications. They
are connected with the parallel effort of the ELSNET Reusable Resources
Task Group to create small reference text corpora annotated according
to guidelines and tagset specifications directly derived from EAGLES
work: the texts produced in the framework of this ELSNET/EAGLES
collaboration have been used for some of the tests described here.
Moreover, some of the questions discussed in this paper have come up
in the joint EAGLES/ELSNET work.
This document is structured as follows:
- Section 3 gives an overview of the background, test setups, aims and
goals of the work described here;
- section 4 briefly describes the resources and tools used;
- section 5 contains a definition of the test setups used;
- section 6 contains an overview of the results obtained in each test
and an interpretation of these results;
- suggestions for additional tests, an overview of the tagset primarily
used, as well as other background information, can be found in
the annexes.