Preliminary Recommendations
The morphosyntactic descriptions and encoding
schemes involved in the comparison phase are the following:
- For lexicons:
- The MULTILEX model (the first row in the
tables), as presented
in MULTILEX (1993).
- The GENELEX model (second row in the tables)
for the encoding of the morphological and syntactic
levels in a lexicon,
originally elaborated for French (GENELEX, 1993a; GENELEX, 1993b).
- The specific AlethDic application of the GENELEX model
(third row in the tables) (Gsi-Erli, 1993).
- For corpora:
- The proposal for a consensual nucleus of
morphosyntactic information, as encoded by the most common existing
tagging practices (fourth row in the tables),
presented in the framework of the NERC Project (Monachini & Östling, 1992a; Monachini & Östling, 1992b).
This proposal took into account the following tagsets: UPenn,
Gothenburg and Brown for American English; BNC, LOB, Lancaster and the
ENGTWOL lexicon for British English; ILC-DMI and EUROTRA for Italian;
INaLF for French; and Uit den Boogaart for Dutch.
- The scheme proposed by
Leech and Wilson (fifth row in the tables), an outcome of a
joint meeting of the Lexicon and Corpus Groups in Pisa (Leech & Wilson, 1993b).
This proposal goes a step beyond that of NERC:
besides corpora,
it also takes into account
the first requirements for lexicons that emerged from
the discussion at the joint meeting in Pisa.
The NERC proposal, for its part, also took into account
the list of common morphological features proposed within the
TEI by the
Linguistic Analysis Committee (TEI, 1991).
The comparison is presented, category by category, in
synoptic tables containing the relevant features,
each followed by explanatory notes and
comments. Every table is subdivided into two zones: the first is devoted
to the comparison of the existing schemes and the second to
the EAGLES proposal.
The categories and languages covered are given in
table 1.
The table for each category is structured as follows (see table 2):
- The leftmost column contains the names of the
encoding systems under analysis;
- The top row displays the morphosyntactic category (PoS)
under consideration (first item on the left),
followed by the relevant morphosyntactic
information, presented as attribute names;
- Each remaining column is headed by an attribute name and lists the
values used for that attribute within each system; an empty cell
means that the system does not mark the information.
PoS      | Attribute | Attribute | Attribute | Attribute | ... |
MULTILEX | value     | value     | value     | value     | ... |
GENELEX  | value     |           | value     | value     | ... |
AlethDic | value     | value     | value     |           | ... |
NERC     | value     | value     |           | value     | ... |
EAGLeech | value     |           | value     | value     | ... |
EAG-L0   | obligatory: PoS |
EAG-L1   | recommended: minimal common core set of features |
EAG-L2a  | optional: information common to languages, either not usually encoded or not purely morphosyntactic |
EAG-L2b  | language-specific: language-specific info |
Table 2: Overall structure of synoptic tables
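
The layout of table 2 can also be read as a small data structure. The sketch below is only an illustration under stated assumptions: the class name SynopticTable, its field names, the method systems_marking and the placeholder labels attribute_1 to attribute_4 are all hypothetical and do not come from any of the schemes compared here. The pattern of filled and empty cells simply mirrors the schematic rows of table 2, not real data.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SynopticTable:
    """Hypothetical model of one synoptic table (one per category)."""
    pos: str                      # morphosyntactic category (PoS) of the table
    attributes: List[str]         # attribute names heading the columns
    # Comparison zone: system name -> {attribute name: values used}.
    # A missing attribute stands for an empty cell (information not marked).
    comparison: Dict[str, Dict[str, List[str]]] = field(default_factory=dict)
    # Proposal zone: EAGLES level -> short description of that level.
    proposal: Dict[str, str] = field(default_factory=dict)

    def systems_marking(self, attribute: str) -> List[str]:
        """Return, in row order, the systems whose cell for `attribute` is filled."""
        return [name for name, row in self.comparison.items() if row.get(attribute)]


# Placeholder content following the schematic rows of table 2.
attrs = ["attribute_1", "attribute_2", "attribute_3", "attribute_4"]


def row(*marked: str) -> Dict[str, List[str]]:
    """Build a row in which only the listed attributes carry a (dummy) value."""
    return {name: ["value"] for name in marked}


table = SynopticTable(
    pos="PoS",
    attributes=attrs,
    comparison={
        "MULTILEX": row(*attrs),
        "GENELEX": row("attribute_1", "attribute_3", "attribute_4"),
        "AlethDic": row("attribute_1", "attribute_2", "attribute_3"),
        "NERC": row("attribute_1", "attribute_2", "attribute_4"),
        "EAGLeech": row("attribute_1", "attribute_3", "attribute_4"),
    },
    proposal={
        "EAG-L0": "obligatory: PoS",
        "EAG-L1": "recommended: minimal common core set of features",
        "EAG-L2a": "optional: information common to languages",
        "EAG-L2b": "language-specific: language-specific info",
    },
)

marking_second = table.systems_marking("attribute_2")
```

Keeping the two zones in separate fields reflects the division described above: the comparison zone records what each existing scheme marks, while the proposal zone holds the EAGLES levels independently of any single scheme.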