Preliminary Recommendations
Among the lexicons considered, there appears to be substantive agreement as to
the range of linguistic phenomena that are taken into account in the lexical
entry of a verb (as shown in table 3.3), but much less agreement as to the
way these phenomena are to be encoded. In some cases, the relevant
information is only indirectly encoded but nonetheless inferrable (as
indicated by the label `infer' below) from other pieces of available
information. It is interesting to note, here, that the main area of
disagreement revolves around the boundary line between syntax and semantics,
as shown in the last two columns of our table, which are the only ones where
some minus signs appear (meaning that the phenomenon considered is not
accounted for). The fact that the information stored in each
lexicon is not uniformly distributed across the various levels of linguistic
description has a number of consequences:
- The lexicons differ greatly in the criterial properties they adopt
for the definition of an individual lexical entry: for instance, in
some lexicons (e.g. Comlex) different syntactic realisations of the same
complement give rise to different lexical entries, whereas other lexicons
(e.g. Acquilex and LDOCE) collapse different argument structures of a verb into one
entry provided that they share a common core meaning.
- Some lexicons try to explain at one single level what other lexicons
tend to distribute over more than one level; a paradigmatic example is the
distinction between control and raising verbs, which is located at the
semantic level by those lexicons which avail themselves of this level
(e.g. Acquilex), while being captured in purely syntactic terms in
those lexicons, such as Comlex, which do not resort to semantics; still other lexicons
(e.g. PLNLP) neutralise the distinction itself by using a unique category to
encompass both cases.
- As a consequence of the two preceding points, although there exists a
common set of observational linguistic phenomena which all lexicons agree in
taking into account, the way these phenomena are related to one another in
each lexicon varies considerably.
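The first point above, the difference in what counts as one lexical entry, can be illustrated with a minimal sketch. The observations, sense labels and frame labels below are invented for illustration and do not reproduce the actual entry formats of Comlex, Acquilex or LDOCE; the two functions simply contrast a policy that splits entries by syntactic realisation with one that collapses frames sharing a core meaning.

```python
from collections import defaultdict

# Hypothetical toy observations: (lemma, core sense label, syntactic frame).
# None of these labels is taken from a real lexicon.
OBSERVATIONS = [
    ("believe", "hold-true", "NP"),
    ("believe", "hold-true", "that-S"),
    ("believe", "hold-true", "NP to-VP"),
    ("believe", "have-faith", "PP-in"),
]


def entries_by_frame(observations):
    """Splitting policy: every distinct (lemma, frame) pair is its own entry."""
    return sorted({(lemma, frame) for lemma, _, frame in observations})


def entries_by_sense(observations):
    """Collapsing policy: frames sharing a core meaning go into one entry."""
    grouped = defaultdict(list)
    for lemma, sense, frame in observations:
        grouped[(lemma, sense)].append(frame)
    return dict(grouped)


if __name__ == "__main__":
    print(len(entries_by_frame(OBSERVATIONS)), "entries when split by frame")
    print(len(entries_by_sense(OBSERVATIONS)), "entries when collapsed by sense")
```

On this toy data the same four observations yield four entries under the splitting policy and two under the collapsing one, which is why entry counts are not directly comparable across lexicons.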
Lexicon  | arg. arity | syn. cat. | funct. role | control | lexical select. | morphsyn. constr. | frame altern. | deep struct. |
Acquilex | infer      | infer     | infer       | +       | +               | +                 | +             | +            |
Comlex   | infer      | +         | +           | +       | +               | +                 | +             | -            |
Eurotra  | infer      | +         | +           | +       | +               | +                 | -             | +            |
Genelex  | infer      | +         | +           | +       | +               | +                 | +             | -            |
ILC      | +          | +         | +           | +       | +               | +                 | +             | -            |
LDOCE    | infer      | infer     | infer       | +       | +               | +                 | infer         | -            |
PLNLP    | infer      | infer     | infer       | +       | +               | +                 | infer         | -            |
Table 3.3: Lexical entry information for verbs
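For readers who want to query the table, the following sketch transcribes table 3.3 into a Python dictionary and checks which columns contain a given mark. The cell values are copied from the table; the names `COLUMNS`, `TABLE` and `columns_with` are introduced here purely for illustration.

```python
# Column headers of table 3.3, in order.
COLUMNS = [
    "arg. arity", "syn. cat.", "funct. role", "control",
    "lexical select.", "morphsyn. constr.", "frame altern.", "deep struct.",
]

# One row per lexicon; values are "+", "-" or "infer", as in the table.
TABLE = {
    "Acquilex": ["infer", "infer", "infer", "+", "+", "+", "+", "+"],
    "Comlex":   ["infer", "+", "+", "+", "+", "+", "+", "-"],
    "Eurotra":  ["infer", "+", "+", "+", "+", "+", "-", "+"],
    "Genelex":  ["infer", "+", "+", "+", "+", "+", "+", "-"],
    "ILC":      ["+", "+", "+", "+", "+", "+", "+", "-"],
    "LDOCE":    ["infer", "infer", "infer", "+", "+", "+", "infer", "-"],
    "PLNLP":    ["infer", "infer", "infer", "+", "+", "+", "infer", "-"],
}


def columns_with(mark):
    """Return the columns in which at least one lexicon carries `mark`."""
    return [
        col for i, col in enumerate(COLUMNS)
        if any(row[i] == mark for row in TABLE.values())
    ]


if __name__ == "__main__":
    print("columns with '-':", columns_with("-"))
    print("columns with 'infer':", columns_with("infer"))
```

Calling `columns_with("-")` returns only the last two columns, matching the observation in the text that minus signs appear only there.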
A final point worth stressing in this context is that some lexicons, due to
their theoretical biases and/or their application-driven needs, appear to be
rather prescriptive as to the sort of information which should be put in the
lexicon: e.g. Eurotra allows a maximum of four arguments in a verb entry,
each of which can only be assigned a non-terminal syntactic category.
The Genelex attitude in this respect is far more liberal.