
Recommendations

 

Existing schemes

Although we have endeavoured to take account of all available work on syntactic annotation, it is regrettable that much of the work undertaken to date has been on English. The extension of such work to other languages is relatively new and, as shown below, tends to build on existing schemes developed for English, as instantiated in corpora or corpus parsing systems. The list below gives a brief overview of existing schemes; for a complete overview, see the two separate documents giving a survey of annotation practices and a detailed overview of existing annotation practices. In the list, schemes and systems that are currently under development (as at May 1995) and for which no data are available are shown in square brackets.

English:
Lancaster-Leeds Treebank; Lancaster Parsed Corpus; SUSANNE; Penn Treebank; ENGCG; TOSCA; CCPP; POW Corpus.
Dutch:
AMAZON/CASUS system in preparation (Nijmegen).
German:
[Helsinki Constraint Grammar]
Finnish:
[Helsinki Constraint Grammar]
Danish:
extension of the Helsinki Constraint Grammar, at the Institute of Linguistics, University of Aarhus
Swedish:
[Helsinki Constraint Grammar]
French:
IBM Paris (parallel to Lancaster/IBM work); [Helsinki Constraint Grammar]
Spanish:
[extension of Penn Treebank scheme]; preliminary work is being carried out at the University of Santiago de Compostela under Guillermo Rojo
Italian:
no annotated corpora as yet, but work is being carried out at the Institute of Computational Linguistics in Pisa

It follows that what is proposed in this report is based on limited experience of working with a small number of languages, and can only tentatively be applied to European languages in general.