This document describes a few tests which allow us to assess some of the
complex interdependencies between corpus tagging methods, tagsets and
tagging results. In particular, we assess
- the impact of different statistical tagging methods on the
results, by comparing the performance of different taggers on the same
texts when using the same tagset (a minimal accuracy-comparison sketch
follows this list);
- the impact of tagset modifications on the results, by using
different versions of a tagset on the same
texts; differences between the versions of the tagset are documented
and classified, and the impact of individual modifications is tested;
- the impact of perceived linguistic differences between training
texts and test (or: application) texts on the results, by using texts
from different text types in training and testing, with tagsets and
taggers kept unchanged.
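As an illustration of the kind of comparison made in the first test
setup, the following minimal sketch (in Python) computes token-level
tagging accuracy for two taggers run on the same gold-annotated text.
The function, tagger names and tag labels are illustrative only and do
not correspond to the actual tools, tagsets or evaluation scripts used
in our tests.

    # A minimal sketch of the first test setup: several taggers are run on
    # the same gold-annotated text and their token-level accuracies compared.
    # All names and tag labels below are hypothetical.

    def tagging_accuracy(gold_tags, predicted_tags):
        """Fraction of tokens whose predicted tag matches the gold tag."""
        assert len(gold_tags) == len(predicted_tags)
        correct = sum(1 for g, p in zip(gold_tags, predicted_tags) if g == p)
        return correct / len(gold_tags)

    # One gold annotation of the test text, and the output of two taggers
    # trained with the same tagset on the same training corpus.
    gold = ["ART", "NN", "VVFIN", "ART", "NN", "$."]
    tagger_output = {
        "tagger_A": ["ART", "NN", "VVFIN", "ART", "NE", "$."],
        "tagger_B": ["ART", "NE", "VVFIN", "ART", "NN", "$."],
    }

    for name, predicted in tagger_output.items():
        print(f"{name}: accuracy = {tagging_accuracy(gold, predicted):.2%}")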
The tests have been carried out with tagsets directly derived
from the morphosyntactic specifications for German
proposed by EAGLES (see section 2 for details).
The results of these tests may help to give at least partial answers
to the following, more general questions:
- Is it possible at all to derive a corpus tagset from a lexical
specification of the kind proposed by EAGLES?
- How well do taggers perform if used with such a lexicon-based
tagset? Are there systematic tagging errors (significantly different
from those obtained with other tagsets) across tagging methods?
- How do modifications of the tagset influence tagging results,
and which types of modifications lead to increases in accuracy? What
lessons can we learn for tagset design?
- What is the impact of text type differences between training
and testing? Do we need specific training on, say, fairy tales if we
want to tag fairy tale texts, or can we (re-)use a tagger trained on
newspaper text? How useful is it to train on ``mixed'' corpora?
It should be made clear at this point that the few tests performed
here are only examples of the type of tests which would have to be
carried out for a full assessment. In the analysis of some of
the figures obtained in our tests below, it will become clear
that, for example, a fuller interpretation of the impact of tagset
modifications is only possible when comparisons with additional data
(e.g. on frequency, or on different analyses of a problematic item)
are available. In this sense, the present document is a case
study rather than a full assessment. We hope, however, that some of our
test setups and some of the practical and methodological considerations
in this document are relevant to work in the field of tagger evaluation
in the broad sense.
The tests described here are part of a wider effort to validate the
outcome of the EAGLES morphosyntax work in practical applications. They
are connected with the parallel effort of the ELSNET Reusable Resources
Task Group to create small reference text corpora annotated according
to guidelines and tagset specifications directly derived from EAGLES
work: the texts produced in the framework of this ELSNET/EAGLES
collaboration have been used for some of the tests described here.
Moreover, some of the questions discussed in this paper have come up
in the joint EAGLES/ELSNET work.
This document is structured as follows:
- Section 3 gives an overview of the background, test setups, aims and
goals of the work described here;
- section 4 briefly describes the resources and tools used;
- section 5 contains a definition of the test setups used;
- section 6 contains an overview of the results obtained in each test
and an interpretation of these results;
- suggestions for additional tests, an overview of the tagset primarily
used, as well as other background information, can be found in
the annexes.