A few properties of the consensual approach to standardisation have been
sketched out above; throughout its work, the CLWG has aimed to satisfy the
requirements associated with standardisation by consensus.
- Broad overview:
- The survey phase was based on input from the fields of both computational
lexicography and corpus linguistics (corpus processing is seen as the most
prominent application of both morphosyntactic and syntactic classification
of lexical material). In this context, we would particularly like to
thank the EAGLES Subgroups on morphosyntactic and on
syntactic corpus annotation for their valuable input and for the
considerable effort they contributed to the work summarised here.
- The morphosyntax work, although highly language-specific, covers
all of the European Union languages, plus several languages of Eastern
and Central European countries.
- Critical mass:
- Broad language coverage has been ensured
(see above); contributions on the different languages have
been provided by both industry and academia; the
first proposals,
documented in EAGLES (1996g), have been discussed
in many LRE and LE projects, whose feedback was then integrated.
- Comparison of the specifications for the individual
languages (i.e. EAGLES (1996a), EAGLES (1996b), EAGLES (1996d),
EAGLES (1996c)) is made easier by the fact that all specifications are
strictly parallel (based on the same descriptive devices, the same
presentation, etc.).
- The availability of detailed documentation makes the descriptions
easy to reproduce. This is reinforced by the fact
that an integrated package is being prepared which will contain the
morphosyntax specifications, the practical guidelines for their use in
corpus tagging, and tagged sample texts of 50,000 to 60,000 wordforms, for both
German and Italian (see below).
- Proof of relevance, for
practical applications and for leading-edge technology development:
- The breadth of the fragment coverage and the depth of the
description make the proposals
applicable outside toy situations.
- In addition, more sophisticated applications can hook up their own
specific descriptions to the basic ones proposed by the CLWG.
- The formalisation employed makes the proposals reinterpretable
(the reinterpretation being even automatically verifiable) and thus
more easily reusable than proposals for which such possibilities do not
exist.
- Accessibility and testability: a pilot resource has been
produced, to be distributed via
the European Linguistic Resources Association (ELRA),
containing mini-corpora of 50,000 to 60,000 wordforms
of Italian and German newspaper text,
annotated according to the proposals of the CLWG. These corpora
were prepared jointly with the ELSNET Language
and Speech Resources Task Group. We most gratefully acknowledge the
financial support provided by ELSNET for this task.