Ambiguity, as contrasted with underspecification, arises from a lack of information: there is uncertainty between two or more alternative descriptions. Four different senses of ambiguity can be distinguished in morphosyntactic tagging.
The English word round has five potential tags: it can be
However, with large corpora, tagging is done automatically, and there may be
no need or opportunity for the manual post-editing of the whole corpus. It
can be practical, in such cases, to retain more than one tag in the annotated
corpus, where the automatic tagging algorithms have not provided strong enough
evidence for disambiguation. For example, in the British National Corpus, a
set of portmanteau tags is used in recording such
ambiguities. One of them is the tag VVD-VVN, which means "either the past
tense or the past participle of a lexical verb". The portmanteau tag appears
in the annotated British National Corpus in the TEI format of an entity
reference appended to the word, e.g.: liked&VVD-VVN;. Other formats
of presentation would also be reasonable. A portmanteau tag signals
uncertainty about the appropriate tag, for reasons of fallible automatic
processing. It is assumed that a trained human post-editor would in general
have no difficulty in resolving the ambiguity.
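
As a minimal sketch of how such tags might be read back from a corpus, the following Python function splits a token of the form word&TAG; into the word and its candidate tags. The token pattern and the names parse_token and is_ambiguous are assumptions made for illustration, not part of the British National Corpus's own tooling; a hyphen inside the tag name is taken to separate the alternatives of a portmanteau tag.

```python
import re

# Assumed token shape: a word followed by an entity reference,
# e.g. "liked&VVD-VVN;". This pattern is an illustration only.
TOKEN_RE = re.compile(r"(?P<word>[^&;]+)&(?P<tag>[A-Z0-9]+(?:-[A-Z0-9]+)*);")


def parse_token(token):
    """Return (word, candidate_tags) for a token such as 'liked&VVD-VVN;'.

    A portmanteau tag yields more than one candidate; a plain tag yields one.
    """
    match = TOKEN_RE.fullmatch(token)
    if match is None:
        raise ValueError(f"not a word&TAG; token: {token!r}")
    return match.group("word"), match.group("tag").split("-")


def is_ambiguous(token):
    """True when more than one candidate tag was retained on the token."""
    _, tags = parse_token(token)
    return len(tags) > 1
```

A post-editing tool could use is_ambiguous to collect only those tokens that still need a human decision, leaving unambiguously tagged words untouched.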
A further type of ambiguity may arise where the human annotator cannot decide on a single appropriate tag. There may be good reasons for this type of indecision.
At the present stage of development of morphosyntactic tagging, the ability to deal with this kind of ambiguity is not a high priority, but it may become more important in the future.
A final kind of ambiguity lies in the text itself: cases where the text does not provide enough information to disambiguate between two or more clearly defined categories. For example, it may be unclear whether, in a given case, the exclamatory word Fire! is a verb or a noun. Ideally, in such cases, more than one tag should be attached to the same textword.
The encoding of ambiguity in morphosyntactic annotation has so far received little attention, and we make no recommendations except to propose that in principle, all the kinds of ambiguity listed above should be distinguishable by different mark-up.
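
One way to picture this proposal is a token record that stores its candidate tags together with the reason more than one was kept. The sketch below is purely illustrative: the class names, the enum members, the rendered notation, and the tags NN1 and VVB used for Fire! are all assumptions, not a recommended encoding.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class AmbiguitySource(Enum):
    """Hypothetical labels for the four kinds of ambiguity discussed above."""
    LEXICON = "lexicon"      # several potential tags listed for a word
    AUTOMATIC = "automatic"  # the tagger lacked evidence to disambiguate
    ANNOTATOR = "annotator"  # the human annotator could not decide
    TEXTUAL = "textual"      # the text itself does not settle the matter


@dataclass(frozen=True)
class TaggedWord:
    word: str
    tags: Tuple[str, ...]
    source: Optional[AmbiguitySource] = None  # None when only one tag is kept

    def __post_init__(self):
        if not self.tags:
            raise ValueError("at least one tag is required")
        if (len(self.tags) > 1) != (self.source is not None):
            raise ValueError("source must be set exactly when several tags are kept")

    def render(self):
        """Render in an invented notation: word_TAG or word_TAG1|TAG2[source]."""
        joined = "|".join(self.tags)
        if self.source is None:
            return f"{self.word}_{joined}"
        return f"{self.word}_{joined}[{self.source.value}]"


# Assumed tags: NN1 (singular noun) and VVB (base form of a lexical verb).
fire = TaggedWord("Fire!", ("NN1", "VVB"), AmbiguitySource.TEXTUAL)
liked = TaggedWord("liked", ("VVD", "VVN"), AmbiguitySource.AUTOMATIC)
```

Because the source is recorded separately from the tags, two tokens carrying the same pair of candidate tags remain distinguishable when one reflects tagger uncertainty and the other a genuinely undecidable text.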