Preliminary Recommendations
This section provides an overview of the following practical lexicons for NLP:
- ACQUILEX: a typed-feature-structure, multilingual lexicon developed
within the Acquilex project for four languages (Dutch, English,
Italian and Spanish), following a sign-based approach to lexical specification
(Sanfilippo, 1993b); a schematic typed feature structure entry is sketched
after this list.
- COMLEX: the American English lexicon prepared for the Linguistic Data
Consortium by New York University, which uses a typed feature structure
formalism (Rohen Wolff et al., 1994).
- EUROTRA: the lexicon for generation and analysis in the Eurotra
Machine Translation system (ten Hacken et al., 1991).
- GENELEX: a full-scale application- and theory-independent lexicon
developed within the Genelex project (GENELEX, 1993).
The assumption is that theory-dependent lexical representations
and/or application-specific dictionaries can easily be derived from this
general repository of lexical information. The general architecture of the
lexicon is conceptually based on the entity-relationship model.
- ILCLEX: an Italian lexicon, originally designed for integration with
a robust wide-coverage corpus grammar, which collects information extracted
from machine-readable sources such as dictionaries and corpora
(Vanocchi et al., 1994).
- LDOCE: the Longman Dictionary of Contemporary English
(Procter, 1987), which
was taken as a basis for the construction of several computational lexicons
(see, for instance, Boguraev & Briscoe, 1989).
- PLNLP: the lexicon used by the PLNLP Italian grammar, originally
developed for a style-checking application and later used for a variety of
other tasks, e.g. the acquisition of lexico-semantic information
from machine-readable dictionaries (Jensen et al., 1993).
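To make the typed feature structure approach mentioned above more concrete,
here is a minimal sketch of a sign-based verb entry in Python. All type and
feature names (sign, verb-cat, ORTH, SUBJ, COMPS) are illustrative assumptions
for exposition only; they do not reproduce the actual ACQUILEX or COMLEX type
inventories.

    from dataclasses import dataclass, field

    @dataclass
    class TFS:
        """A feature structure labelled with a type from a type hierarchy."""
        type: str
        features: dict = field(default_factory=dict)

    # Hypothetical entry for transitive "read": a sign pairing orthography
    # with a category that subcategorises for an NP subject and an NP object.
    read_entry = TFS("sign", {
        "ORTH": "read",
        "CAT": TFS("verb-cat", {
            "SUBJ": TFS("np"),      # external argument
            "COMPS": [TFS("np")],   # internal arguments (direct object)
        }),
    })

In a full typed feature structure system the types would be organised in an
inheritance hierarchy and the features constrained by their types; the sketch
leaves both aspects out.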
The lexicons differ considerably in the number of entries they contain and in
the amount of linguistic information provided with each entry: while some
lexicons attach more importance to coverage, others concentrate on
granularity and depth of lexical representation, often to the detriment of
breadth. This is mainly because these lexicons were aimed at a
wide variety of applications (ranging from machine translation to
style-checking). Moreover, some are strongly biased towards specific
linguistic theories. For the sake of concreteness, a specific case study was
chosen as a basis for comparing these lexicons: the encoding of verb
entries and, in particular, of their subcategorisation frames. Although this
area has been the focus of intense research over the last fifteen
years, it still rests on fairly shaky ground.
In what follows, we list the aspects which appear relevant to the
encoding of subcategorisation and consider them in more detail. Note that
not all of them are pertinent to every lexicon, as will be indicated
below:
- Number of arguments;
- Syntactic category of arguments;
- Functional role of arguments;
- Control and raising phenomena;
- Lexical selection;
- Morphosyntactic constraints;
- Frame alternation and argument optionality;
- Surface structure and deep structure;
- Other properties.
The ordering of this list reflects a scale from the most to the least
uniformly distributed properties: the properties in the topmost part of the
list are shared by all lexicons, whereas the remaining ones are shared only
by some, and with considerable differences in the way they are represented.
In what follows, we first briefly characterise the approach each lexicon
takes to verb subcategorisation, then move on to a more detailed
consideration of the aspects listed above.
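As a point of reference for that discussion, the sketch below gathers the
aspects just listed into a single, lexicon-neutral record. The field names
and the layout are hypothetical conveniences for exposition; none of the
surveyed lexicons uses exactly this representation.

    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class Argument:
        category: str                        # syntactic category, e.g. "NP", "PP"
        function: Optional[str] = None       # functional role, e.g. "subject"
        optional: bool = False               # argument optionality
        selected_head: Optional[str] = None  # lexical selection, e.g. "on" for a PP
        morphosyntax: dict = field(default_factory=dict)  # e.g. {"case": "accusative"}

    @dataclass
    class SubcatFrame:
        arguments: list                      # number of arguments = len(arguments)
        control: Optional[str] = None        # control/raising, e.g. "subject-control"
        alternations: list = field(default_factory=list)  # related frames, e.g. "passive"
        deep_frame: Optional[str] = None     # mapping from surface to deep structure

    # Hypothetical frame for "rely (on NP)":
    rely = SubcatFrame(
        arguments=[
            Argument("NP", function="subject"),
            Argument("PP", function="prepositional-object", selected_head="on"),
        ],
    )

The residual "other properties" item resists schematisation and is
deliberately left out of the sketch.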