Recommendations
Ambiguity is just one of a number of phenomena for which some kind of multiple
tagging of the same textword may be required. Other cases of multiple tagging
which should be mentioned are:
1. Form-function tagging:
Sometimes the need is felt to assign two different tags to the same word: one
representing the formal category, and the other the functional category, e.g.:
- A word with the form of a past participle but the function of an
adjective;
- A word with the form of an adjective but the function of an adverb.
In principle, it can be argued that two tags should be assigned to each of
these word types, and that the two should be distinctly encoded. In practice,
tagging schemes up to the present have tended to give priority to one criterion
over the other (i.e. to function over form, or vice versa).
The annotation scheme for a given tagged corpus should clearly state which
criterion takes priority.
2. Lemma tagging:
A morphosyntactically tagged corpus is generally supposed to specify the
grammatical form of a textword, rather than to recover the lemma. However, in
transfer of information from a corpus to a lexicon or vice versa, it is
assumed that a lemmatisation algorithm will have an important role. There is
also a case (especially as a preliminary to syntactic and semantic annotation)
for a type of annotation which specifies the lemma, as well as the grammatical
form, for each textword. Lemma tagging, as this process may be
called, has so far not been widely undertaken. Once again, the need is for
independent ways of representing the lemma tag and the grammatical
form tag.
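As a minimal sketch of what "independent ways of representing" these annotations could look like, the record below keeps the form tag, the function tag and the lemma in separate fields, so that none of them overrides another. The class name `TaggedWord`, the helper `has_form_function_mismatch`, the tag labels and the example words are all illustrative assumptions, not part of any recommended tagset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaggedWord:
    """One textword with independently encoded annotations (hypothetical layout)."""
    word: str
    form_tag: str                        # formal category
    function_tag: Optional[str] = None   # functional category, if recorded
    lemma: Optional[str] = None          # lemma tag, if recorded


# Tag labels are placeholders chosen for readability, not taken from a real tagset.
examples = [
    # Form of a past participle, function of an adjective.
    TaggedWord("interested", form_tag="PAST_PARTICIPLE",
               function_tag="ADJECTIVE", lemma="interest"),
    # Form of an adjective, function of an adverb.
    TaggedWord("fast", form_tag="ADJECTIVE",
               function_tag="ADVERB", lemma="fast"),
]


def has_form_function_mismatch(w: TaggedWord) -> bool:
    """True when a function tag is recorded and differs from the form tag."""
    return w.function_tag is not None and w.function_tag != w.form_tag
```

Because the fields are separate, a scheme that gives priority to form can simply leave `function_tag` empty, and the choice remains visible in the data.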
In both of the above cases of multiple tagging, as well as in the tagging of
ambiguity, more than one morphosyntactic tag needs to be assigned to
the same word. There is a case for preferring a vertical format for
presenting such a multiply-tagged annotated corpus. The combination of
different kinds of word tagging in the same annotated corpus can then be
managed, without confusion, by associating each kind of tag with a different
field or column alongside the vertical text.
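One possible reading of such a vertical format is sketched below: one textword per line, with each kind of tag in its own tab-separated column. The column order, the use of `|` to separate alternative tags for an ambiguous word, the function names and the tag labels are all assumptions made for illustration, not a prescribed encoding. Every field is assumed to be non-empty and free of tab and `|` characters.

```python
# Assumed column order: word, lemma, form tag(s), function tag(s).
ALTERNATIVE_SEPARATOR = "|"   # separates alternative tags within one column


def format_line(word, lemma, form_tags, function_tags):
    """Render one textword as a tab-separated line of the vertical text."""
    return "\t".join([
        word,
        lemma,
        ALTERNATIVE_SEPARATOR.join(form_tags),
        ALTERNATIVE_SEPARATOR.join(function_tags),
    ])


def parse_line(line):
    """Split one line of the vertical text back into its separate fields."""
    word, lemma, form_field, function_field = line.rstrip("\n").split("\t")
    return {
        "word": word,
        "lemma": lemma,
        "form_tags": form_field.split(ALTERNATIVE_SEPARATOR),
        "function_tags": function_field.split(ALTERNATIVE_SEPARATOR),
    }


# Hypothetical tag labels; an ambiguous word carries more than one alternative.
vertical_text = "\n".join([
    format_line("broken", "break", ["PAST_PARTICIPLE"], ["ADJECTIVE"]),
    format_line("run", "run", ["NOUN", "VERB"], ["NOUN", "VERB"]),
])
```

Keeping each kind of tag in a fixed column means that a tool interested only in lemmas, or only in form tags, can read its own column and ignore the rest.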