Questions of relevance for tagset design, addressed in the tests

Which categories from "traditional" grammar, semantics and lexicon design can be used in tagsets? How much "linguistics" can there be in a tagset?
Which features are error-prone in the tagging of a given language, i.e. which distinctions affect the overall error rate of the taggers?
Example: GENDER is not relevant for French, i.e. the tagging results are the same whether or not GENDER is annotated (cf. [Elworthy 1995] and [Chanod, Tapanainen 1995]). This question is especially interesting for applications of ELM.
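One way to make "relevance of a feature" operational is to compare two taggers, one trained with the feature (e.g. GENDER) and one trained without it, and to score both on the tagset *without* the feature. If the two accuracies agree, annotating the feature neither helps nor hurts the remaining distinctions. The following Python sketch assumes a hypothetical colon-separated tag format (`N:FEM:SG`), takes the outputs of the two taggers as given (training is out of scope), and uses made-up toy data, not results from the cited studies.

```python
# Hypothetical tag format: colon-separated attribute values, e.g. "N:FEM:SG".
GENDER = {"MASC", "FEM"}

def drop_feature(tag, values):
    """Remove the attribute values listed in `values` (e.g. all GENDER values) from a tag."""
    return ":".join(part for part in tag.split(":") if part not in values)

def accuracy(gold, predicted):
    """Proportion of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)

def feature_impact(gold, pred_full, pred_reduced, values):
    """Score a tagger trained with the feature (pred_full) and one trained without it
    (pred_reduced) on the same reduced tagset, so that the two accuracies are comparable."""
    gold_r = [drop_feature(t, values) for t in gold]
    full_r = [drop_feature(t, values) for t in pred_full]
    return accuracy(gold_r, full_r), accuracy(gold_r, pred_reduced)

# Toy, made-up tagger output for three tokens (illustration only, not real results).
gold = ["DET:FEM:SG", "N:FEM:SG", "ADJ:FEM:SG"]
pred_full = ["DET:FEM:SG", "N:MASC:SG", "ADJ:FEM:SG"]   # only a gender error
pred_reduced = ["DET:SG", "N:SG", "ADJ:SG"]
print(feature_impact(gold, pred_full, pred_reduced, GENDER))  # (1.0, 1.0)
```

Scoring on the reduced tagset is the design choice that matters: comparing the full tagger's accuracy on full tags with the reduced tagger's accuracy on reduced tags would mix the cost of the extra distinction with its effect on the other distinctions.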
Which lexical ambiguities can be solved by means of distributional classification?
  - German verb forms such as kauft, ambiguous between imperative and indicative: imperatives (a, d) occur only in sentence-initial position, whereas indicative forms (b, c, e) can also occur in second or final position.
    
    (a) kauft/V_IMP das Kleid !
    (b) sie kauft/V_IND das Kleid !
    (c) weil sie das Kleid kauft/V_IND
    (d) kauft/V_IMP ihr das Kleid !
    (e) kauft/V_IND ihr das Kleid ?
    
    As the two readings have different distributions, we may expect taggers to disambiguate them correctly on distributional grounds. It is therefore appropriate to use two distinct tags (V_IMP, V_IND).
  - German adjectives in predicative (a, d) or adverbial (b, c) use:
    
    (a) Er ist schnell/ADJ_PRED
    (b) Er fährt schnell/ADJ_ADV
    (c) Er macht seine Aufgaben schnell/ADJ_ADV
    (d) Er macht das Auto schnell/ADJ_PRED
    
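    As adverbial and predicative adjectives occur in very similar contexts (in all four sentences the adjective is clause-final, directly after the verb or its object), taggers cannot be expected to separate them on distributional grounds. It therefore seems appropriate to use a single ambiguous tag which covers both uses.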
- Sometimes the overall tagging accuracy can be increased by refining a class of tags even though the disambiguation within that class itself is unaffected: the refined tags mark different distributional contexts and thereby help to disambiguate other ambiguous word forms in their neighbourhood.
  We call this phenomenon contextual disambiguation power or external impact.
  Example: subclassifying verb forms into modal and "full" (content) verbs does not change the type of ambiguity of the verb forms themselves, since class membership is lexically determined. Modal verbs, however, behave differently from content verbs (they can form passive constructions with an infinitive instead of a participle), so they help to mark different contexts and restrict the ambiguities of the surrounding words. The ambiguity class INF??FIN (German infinitives are inherently ambiguous with 1st or 3rd person plural finite forms) can be resolved for gehen in sentences (c) and (d), which use different tags for modal and content verbs, but not in sentences (a) and (b), which use a joint class for all verbs.
  
  (a) sie wollen/V_INF??V_FIN gehen/V_INF??V_FIN
  (b) weil sie gehen/V_INF??V_FIN wollen/V_INF??V_FIN
  (c) sie wollen/VM_INF??VM_FIN gehen/VV_INF
  (d) weil sie gehen/VV_INF wollen/VM_INF??VM_FIN
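The two effects discussed in this section (distributional disambiguation, and the contextual disambiguation power of a finer tag class) can be illustrated on the example sentences with a small Python sketch. Everything in it is an illustrative assumption, not the procedure used in the tests: the `word/TAG` input format, the single context feature (sentence-initial position plus final punctuation mark) and the one hand-written constraint for modal verbs. The first part checks whether that feature alone separates the tags in each sample; the second counts how many tokens remain ambiguous under the joint and the split verb tagsets.

```python
from collections import defaultdict

# --- Distributional classification (kauft, schnell) ---

def parse(sentence):
    """Split 'w1 w2/TAG ...' into a word list and a {token index: tag} map."""
    words, tags = [], {}
    for i, tok in enumerate(sentence.split()):
        word, _, tag = tok.partition("/")
        words.append(word)
        if tag:
            tags[i] = tag
    return words, tags

def context_feature(words, i):
    """Hypothetical feature: is the token sentence-initial, and which punctuation mark (if any) ends the sentence."""
    last = words[-1] if words[-1] in {"!", "?", "."} else None
    return (i == 0, last)

def separable(sentences):
    """True if every feature value co-occurs with exactly one tag in the sample."""
    seen = defaultdict(set)
    for s in sentences:
        words, tags = parse(s)
        for i, tag in tags.items():
            seen[context_feature(words, i)].add(tag)
    return all(len(t) == 1 for t in seen.values())

VERBS = ["kauft/V_IMP das Kleid !", "sie kauft/V_IND das Kleid !",
         "weil sie das Kleid kauft/V_IND", "kauft/V_IMP ihr das Kleid !",
         "kauft/V_IND ihr das Kleid ?"]
ADJECTIVES = ["Er ist schnell/ADJ_PRED", "Er fährt schnell/ADJ_ADV",
              "Er macht seine Aufgaben schnell/ADJ_ADV", "Er macht das Auto schnell/ADJ_PRED"]

# --- Contextual disambiguation power (modal vs. content verbs) ---

SPLIT = {"wollen": {"VM_INF", "VM_FIN"}, "gehen": {"VV_INF", "VV_FIN"}}
JOINT = {"wollen": {"V_INF", "V_FIN"}, "gehen": {"V_INF", "V_FIN"}}

def remaining_ambiguity(sentence, lexicon):
    """Apply a hypothetical constraint (a content verb adjacent to a modal verb is an
    infinitive) and count the tokens that are still ambiguous afterwards."""
    classes = [lexicon.get(w, {"OTHER"}) for w in sentence.split()]
    resolved = []
    for i, c in enumerate(classes):
        neighbours = classes[max(i - 1, 0):i] + classes[i + 1:i + 2]
        next_to_modal = any(t.startswith("VM_") for n in neighbours for t in n)
        if c == {"VV_INF", "VV_FIN"} and next_to_modal:
            c = {"VV_INF"}
        resolved.append(c)
    return sum(len(c) > 1 for c in resolved)

print(separable(VERBS), separable(ADJECTIVES))  # True False
for s in ["sie wollen gehen", "weil sie gehen wollen"]:
    # joint tagset: 2 ambiguous tokens; split tagset: only the modal stays ambiguous
    print(s, remaining_ambiguity(s, JOINT), remaining_ambiguity(s, SPLIT))
```

On these samples the feature separates V_IMP from V_IND but not ADJ_PRED from ADJ_ADV, and with the split tagset only wollen remains ambiguous. With the joint tagset no such constraint can even be stated, because nothing in the tags distinguishes the modal from the content verb. A real tagger would learn such regularities statistically from a corpus rather than from a hand-written rule.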

Next: Practical setups Up: Tagset evaluation Previous: Purpose