Recommendations
Like the lexicon guidelines, the morphosyntactic tagging guidelines:
- Make use of an attribute-value formalism.
- Do not adhere to a strict attribute-value hierarchy (in terms of monotonic inheritance).
- Use three levels of constraint (obligatory, recommended
and optional) in defining what is acceptable according to the guidelines.
- Subdivide the optional level into two types of optional extension to
tagsets:
  - Extensions to deal with phenomena which are marginal to
    morphosyntactic annotation strictly defined, but common to a number of
    languages (e.g. the distinction between countable and mass nouns);
  - Extensions to deal with phenomena which are specific to
    particular EU languages.
A few words may be added regarding each of these points:
- At a descriptive level, morphosyntactic tags are therefore defined as
sets of attribute-value pairs, although at a 'visible' character-coding
level they may not be symbolised as such.
- For an individual language, it may be an important step to formalise the
tagset as an attribute-value hierarchy. However, this degree of formalisation
is not appropriate to the cross-linguistic level of abstraction, where we are
specifying guidelines to apply to all EU languages.
- The obligatory level of constraint is limited to the major
categorisations of parts of speech as Noun, Verb, Conjunction,
etc. The recommended level of constraint applies to well-known attributes
used widely in the description of European languages: e.g. (for nouns)
Number, Gender and Case.
- At the optional level, the guidelines clearly have a weaker import, and
should not be regarded as mandatory in any sense, but simply as a presentation
of possibilities sanctioned by current practice.
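The points above can be illustrated with a minimal Python sketch, which is not part of the guidelines themselves. It treats a tag as a set of attribute-value pairs, checks the obligatory part of speech, and sorts the remaining attributes into the recommended and optional levels. The names `classify_attributes`, `visible_code` and `Countability`, the attribute values, and the one-letter coding scheme are all assumptions made for illustration; only the categories Noun, Verb and Conjunction and the attributes Number, Gender and Case are taken from the text.

```python
# Obligatory level: major parts of speech (the text's list ends in "etc.",
# so this set is deliberately partial).
MAJOR_CATEGORIES = {"Noun", "Verb", "Conjunction"}

# Recommended level: widely used attributes, here for nouns only.
RECOMMENDED = {"Noun": ("Number", "Gender", "Case")}

# Optional level: "Countability" is a hypothetical attribute name for the
# countable/mass distinction mentioned in the text.
OPTIONAL = {"Noun": ("Countability",)}


def classify_attributes(tag):
    """Sort the attributes of a tag by level of constraint."""
    pos = tag.get("pos")
    if pos not in MAJOR_CATEGORIES:
        raise ValueError(f"obligatory part of speech missing or unknown: {pos!r}")
    attributes = set(tag) - {"pos"}
    recommended = attributes & set(RECOMMENDED.get(pos, ()))
    optional = attributes & set(OPTIONAL.get(pos, ()))
    return {
        "recommended": sorted(recommended),
        "optional": sorted(optional),
        "unrecognised": sorted(attributes - recommended - optional),
    }


def visible_code(tag):
    """Build a compact character coding (a hypothetical scheme).

    The first letter of the part of speech is followed by one letter per
    recommended attribute, with "-" where an attribute is unspecified.
    """
    letters = [tag["pos"][0].upper()]
    for attribute in RECOMMENDED.get(tag["pos"], ()):
        value = tag.get(attribute)
        letters.append(value[0].lower() if value else "-")
    return "".join(letters)


noun = {
    "pos": "Noun",
    "Number": "singular",
    "Gender": "feminine",
    "Countability": "mass",
}
print(classify_attributes(noun))
print(visible_code(noun))
```

The descriptive level (the dictionary) and the visible coding (the string) are kept separate here, which mirrors the remark that tags need not be symbolised as attribute-value pairs at the character-coding level.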
The tagset guidelines should allow mappings to be stated between the coding
of morphosyntactic phenomena in a lexicon and their coding in the
morphosyntactic annotation of text corpora. However, because of the
different perspective and goals of these two activities, there is no
necessary expectation that this will be a straightforward mapping. One
suggestion, therefore, is that it should be easier to specify the conversion
between lexicon and annotation categories by making use of an
Intermediate Tagset.
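As a rough illustration of this suggestion, the following Python sketch routes the conversion through an intermediate attribute-value description. It is an assumption-laden example, not the Intermediate Tagset itself: the lexicon codes, the corpus tags `N1` and `N2`, and all function names are hypothetical.

```python
# Hypothetical lexicon codes, each mapped to an intermediate description
# expressed as attribute-value pairs.
LEXICON_TO_INTERMEDIATE = {
    "N-sg-fem": {"pos": "Noun", "Number": "singular", "Gender": "feminine"},
    "N-pl-masc": {"pos": "Noun", "Number": "plural", "Gender": "masculine"},
}

# Hypothetical corpus tagset that records part of speech and Number only.
POS_CODES = {"Noun": "N"}
NUMBER_CODES = {"singular": "1", "plural": "2"}


def intermediate_to_annotation(description):
    """Convert an intermediate description into a corpus tag.

    Attributes the corpus tagset does not record (here Gender) are dropped,
    which is one reason the overall mapping is not one-to-one.
    """
    return POS_CODES[description["pos"]] + NUMBER_CODES[description["Number"]]


def lexicon_to_annotation(lexicon_code):
    """Compose the two steps: lexicon code, intermediate form, corpus tag."""
    return intermediate_to_annotation(LEXICON_TO_INTERMEDIATE[lexicon_code])


print(lexicon_to_annotation("N-sg-fem"))
print(lexicon_to_annotation("N-pl-masc"))
```

With this arrangement each side only needs a mapping to and from the intermediate form, rather than a direct table for every pairing of a lexicon with a corpus tagset.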