
Recommendations

 

Guidelines

The notion of standardising the syntactic annotation of corpora is a problematic one. Even more than in the case of morphosyntactic annotation, the situation with respect to syntactic annotation is variable and fluid. Certain practices have become fairly widespread: for example, common treebank conventions have been adopted in the Lancaster, Penn Treebank, and SUSANNE treebanks for English, and at IBM for French. Similarly, the formalism of the Helsinki Constraint Grammar, which has been applied to English, is currently being extended to Finnish, German, Swedish, French and Basque. But it would be premature to impose a standard on all syntactic annotation activities under the aegis of the EC. Work in this area is still at an early stage. The scheme chosen for syntactic annotation may have a marked effect on the success of a parser working with that scheme. Since no completely successful parser has as yet been developed, the imposition of any particular scheme could be detrimental to future research.

The guidelines outlined here are therefore intended only to be preliminary. The scheme adopted for syntactically annotating any corpus is partly task-dependent. The framework presented here is therefore a basic introduction to the task, outlining some canonical categories and some of the problems and difficulties that may arise from their application. There are many choices to be made in the analysis of natural language, and the preference for one approach over another will depend in part upon the intended purpose of the corpus.

Following a procedure similar to that adopted in the EAGLES guidelines for morphosyntactic annotation of corpora (EAGLES (1996a)), the information types given in the guidelines are divided into three sections, of which the first is empty in this case:

  1. Obligatory annotations: required if the parsing scheme is to be conformant with EAGLES standards.
  2. Recommended annotations: not required, but recommended as a 'default' in the sense that such information should not be omitted unless there are good reasons for doing so.
  3. Optional annotations: neither required nor recommended; their inclusion or omission is a matter to be justified in its own terms, taking account of such factors as practical constraints, the task orientation, or the language of the text corpus.
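
As an illustration only, the three-way division above can be sketched as a small data structure with a conformance check. The annotation labels used here (rec_a, rec_b, opt_a) are hypothetical placeholders and are not categories taken from these guidelines; the function name review_scheme is likewise an assumption made for the sketch. The one property taken from the text is that the obligatory level is empty for syntactic annotation, so no scheme can fail conformance on that ground.

```python
from enum import Enum


class Level(Enum):
    OBLIGATORY = "obligatory"
    RECOMMENDED = "recommended"
    OPTIONAL = "optional"


# Hypothetical placeholder labels, not categories from the guidelines.
GUIDELINE_LEVELS = {
    Level.OBLIGATORY: set(),                # empty for syntactic annotation
    Level.RECOMMENDED: {"rec_a", "rec_b"},  # the 'default' layers
    Level.OPTIONAL: {"opt_a"},              # justified case by case
}


def review_scheme(provided):
    """Compare the annotation layers a scheme provides with the three levels."""
    provided = set(provided)
    missing_obligatory = GUIDELINE_LEVELS[Level.OBLIGATORY] - provided
    missing_recommended = GUIDELINE_LEVELS[Level.RECOMMENDED] - provided
    return {
        # Conformance depends on the obligatory level alone.
        "conformant": not missing_obligatory,
        # Omitting a recommended layer is allowed but calls for a good reason.
        "needs_justification": sorted(missing_recommended),
    }


if __name__ == "__main__":
    print(review_scheme({"rec_a"}))
```

Under these assumptions, a scheme that omits a recommended layer is still conformant; the omission is simply reported as something to be justified, which mirrors the 'default' status described in item 2.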



