Below, we describe the types of tagset modifications we will test in
the practical part of the tagset evaluation. Some of the changes were motivated by linguistic
considerations, others by practical ones (tagging accuracy). Many of the
changes affect the ambiguity class of a word form or of a
group of word forms.
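To make the notion of an ambiguity class concrete: it can be read off an annotated corpus as the set of tags a word form occurs with. The following is a minimal Python sketch under that assumption; the input format (a list of (word form, tag) pairs), the function names and the toy data are hypothetical and only illustrate the idea. PPER and PRF are the STTS tags for personal and reflexive pronouns.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def ambiguity_classes(tagged_tokens: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[str, ...]]:
    """Map each word form to the sorted tuple of tags it occurs with."""
    tags_per_form = defaultdict(set)
    for form, tag in tagged_tokens:
        tags_per_form[form].add(tag)
    return {form: tuple(sorted(tags)) for form, tags in tags_per_form.items()}


def group_by_class(classes: Dict[str, Tuple[str, ...]]) -> Dict[Tuple[str, ...], List[str]]:
    """Invert the mapping: ambiguity class -> sorted list of word forms in it."""
    groups = defaultdict(list)
    for form, cls in classes.items():
        groups[cls].append(form)
    return {cls: sorted(forms) for cls, forms in groups.items()}


# Hypothetical toy corpus with STTS pronoun tags (PPER personal, PRF reflexive).
corpus = [
    ("mich", "PPER"), ("mich", "PRF"),
    ("sich", "PRF"),
    ("ihn", "PPER"),
    ("uns", "PRF"), ("uns", "PPER"),
]
classes = ambiguity_classes(corpus)
print(classes["mich"])                           # ('PPER', 'PRF')
print(group_by_class(classes)[("PPER", "PRF")])  # ['mich', 'uns']
```

Grouping word forms by class, as `group_by_class` does, yields the groups of word forms whose ambiguity class a given tagset change would touch.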
A list of phenomena is given below, in section 5.3.2,
and more details (including a description of the "old" vs. "new" practice)
can be found in appendix A.
The different kinds of tagset modifications are defined as follows:
- Type I: Granularity changes
  - Type Ia: Simple granularity changes. Two or more tags for
    ambiguous word forms are merged, or an ambiguous tag is split into
    two or more unambiguous tags. As mentioned above, a split is
    only advisable if the word forms have different distributions.
    After the change, total tagging accuracy is measured.
  - Type Ib: More complex granularity changes with overlapping tags.
    Two tags, A and B, that share ambiguous word forms are sometimes
    split into three classes, where the new, third class contains the
    forms that are ambiguous between A and B and thus overlaps with
    both original classes. This is the case with personal and reflexive
    pronouns. Here, there are more configurations to test:
    - merge all forms into a single ambiguous tag (AB);
    - split the forms so that each of the two unambiguous classes has
      its own tag and all ambiguous forms share a third tag (A, B, AB);
    - merge the ambiguous forms into one of the two unambiguous tags
      (two possibilities: "A, AB" and "AB, B").
  - Type Ic: External impact of granularity changes. Granularity
    changes or definition changes affect not only the immediately
    involved (internal) ambiguity classes, but may also influence
    other ambiguity classes. It may thus be worth splitting an
    unambiguous class if the split can be expected to improve the
    disambiguation of other classes.
- Type II: Changes in the assignment of word forms to tags. The
  boundary between two tags is shifted (in the guidelines and in the
  annotation of the reference text), but the tags involved are
  otherwise unchanged. An example from German: it is very
  difficult to distinguish lexicalized participles used adverbially
  (ADJD) from participles used within the verbal complex (VVPP). We
  tried several definitions, which assign the tag depending on the
  syntactic construction in which the ambiguous word form appears.
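The Type I modifications can be pictured as relabelling functions applied to the annotated reference corpus, after which tagging accuracy is measured on the new tagset. The Python sketch below implements the four Type Ib configurations for two placeholder tags A and B (in the pronoun case, for instance PPER and PRF), plus an accuracy function that can optionally be restricted to a subset of gold tags, which is one possible way to look for the external effects described under Type Ic. All names and the toy data are hypothetical. The predicted tags are simply relabelled here to keep the example short; an actual experiment would retrain and rerun the tagger on the modified annotation.

```python
from typing import Callable, Iterable, List, Optional, Set, Tuple

Token = Tuple[str, str]  # (word form, tag)

CONFIGS = ("AB", "A,B,AB", "A,AB", "AB,B")


def make_mapping(config: str, ambiguous_forms: Set[str]) -> Callable[[str, str], str]:
    """Return relabel(form, tag) for one Type Ib configuration.

    "A" and "B" are placeholder tags; ambiguous_forms holds the word forms
    that can carry either tag (hypothetical input).
    """
    if config not in CONFIGS:
        raise ValueError(f"unknown configuration: {config}")

    def relabel(form: str, tag: str) -> str:
        if tag not in ("A", "B"):
            return tag  # tags outside the modification stay unchanged
        ambiguous = form in ambiguous_forms
        if config == "AB":      # one tag for all forms
            return "AB"
        if config == "A,B,AB":  # separate tag for the ambiguous forms
            return "AB" if ambiguous else tag
        if config == "A,AB":    # ambiguous forms merged with B
            return "A" if tag == "A" and not ambiguous else "AB"
        # config == "AB,B": ambiguous forms merged with A
        return "B" if tag == "B" and not ambiguous else "AB"

    return relabel


def remap(tokens: Iterable[Token], relabel: Callable[[str, str], str]) -> List[Token]:
    """Apply a relabelling function to every (form, tag) pair."""
    return [(form, relabel(form, tag)) for form, tag in tokens]


def accuracy(gold: List[Token], predicted: List[Token],
             only_tags: Optional[Set[str]] = None) -> float:
    """Share of correctly tagged tokens, optionally restricted to tokens
    whose gold tag is in only_tags (e.g. to inspect other classes)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be token-aligned")
    pairs = [
        (g, p)
        for (_, g), (_, p) in zip(gold, predicted)
        if only_tags is None or g in only_tags
    ]
    if not pairs:
        raise ValueError("no tokens to evaluate")
    return sum(g == p for g, p in pairs) / len(pairs)


# Invented toy data, not results from the evaluation.
ambiguous = {"mich", "dich", "uns", "euch"}
gold = [("ich", "A"), ("mich", "B"), ("sich", "B"), ("uns", "A")]
predicted = [("ich", "A"), ("mich", "A"), ("sich", "A"), ("uns", "A")]

print(accuracy(gold, predicted))  # 0.5 on the original tags
for config in CONFIGS:
    relabel = make_mapping(config, ambiguous)
    print(config, accuracy(remap(gold, relabel), remap(predicted, relabel)))
```

With these invented data, the original tags give 0.5 and every relabelled configuration at least 0.75. This only reflects that fewer distinctions are required; it says nothing about how a retrained tagger would perform.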