The Longman Dictionary and Thesaurus
Introduction
The Longman Dictionary of Contemporary English (LDOCE) and the Longman
Lexicon of Contemporary English (LLOCE) have been used extensively in
pioneering work on extracting NLP lexicons from machine-readable
dictionaries. Many of the insights for building large-scale NLP
lexicons are based on studies of these resources. Because of their
age, their organization and structure still follow traditional
dictionary-making practice, but certain features make them
particularly suitable for deriving NLP lexicons.
The Longman Dictionary of Contemporary English
The Longman Dictionary of Contemporary English [Pro78] is a
medium-sized learner's dictionary with 45,000 entries and 65,000 word
senses. Entries are distinguished as homographs on the basis of the
historical origin of words and their part of speech, and each entry
may have one or more meanings. The entry-sense distributions for the
major parts of speech are shown in Table 3.1.
Table 3.1: Number of Entries and Senses in LDOCE
             Entries   Senses   Polysemy
Nouns          23800    37500        1.6
Verbs           7921    15831        1.9
Adjectives      6922    11371        1.6
Total          38643    64702        1.7
The information provided in entries comprises:
- Definitions using a limited set of 2000 Controlled Vocabulary
Words and 3000 derived words.
- Examples.
- Grammatical information on the constituent structure of
the complementation of words. LDOCE is best known for its
high-quality grammatical coding; since the focus here is on
semantics, however, these codes are not described further.
- Usage labels in the form of codes and comments, covering
register, style (11 codes), dialect (20 codes) and region constraints
(9 codes).
- Subject Field codes and comments indicating the domain of
interest to which a meaning is related.
- Semantic codes either classifying nominal meanings or expressing selectional
restrictions for the complementation of verbal and adjectival meanings.
Most of the information is stored in textual form. However, the usage
codes, the Subject Field codes and the semantic codes are stored in
coded form, using a dedicated code system.
There are 100 main Subject Field codes, which can be further
subdivided. For example:
- MD: medical
- MDZA: medical anatomy
- ON: occupation
- VH: vehicles
The Subject Field codes have been stored for 30% of the verb senses
and 59% of the noun senses. There are 100 main fields and 246
subdivisions. Two main fields can also be combined: MDON, for example,
represents both medical and occupation.
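The coding scheme described above (two-letter main fields, and four-letter codes that are either a main field plus a subdivision or two main fields combined) can be decoded mechanically. The following is a minimal sketch only: the tables contain just the four codes quoted in the text, and the function name `decode_subject_code` and the table layout are hypothetical, not part of any LDOCE distribution.

```python
# Main fields and subdivisions quoted in the text above. The full LDOCE
# inventory (100 main fields, 246 subdivisions) is not reproduced here.
MAIN_FIELDS = {"MD": "medical", "ON": "occupation", "VH": "vehicles"}
SUBDIVISIONS = {"MDZA": "medical anatomy"}


def decode_subject_code(code: str) -> list[str]:
    """Return the field label(s) for a 2- or 4-letter Subject Field code.

    Hypothetical helper: a 4-letter code is first looked up as a
    subdivision, then interpreted as a combination of two main fields.
    """
    if len(code) == 2 and code in MAIN_FIELDS:
        return [MAIN_FIELDS[code]]
    if len(code) == 4:
        if code in SUBDIVISIONS:
            return [SUBDIVISIONS[code]]
        first, second = code[:2], code[2:]
        if first in MAIN_FIELDS and second in MAIN_FIELDS:
            return [MAIN_FIELDS[first], MAIN_FIELDS[second]]
    raise ValueError(f"unknown subject field code: {code!r}")


if __name__ == "__main__":
    for code in ("VH", "MDZA", "MDON"):
        print(code, decode_subject_code(code))
```

Checking the subdivision table before trying the two-field reading is a design assumption made here to resolve the ambiguity of four-letter codes; the source does not say how LDOCE itself distinguishes the two cases.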
In total, there are 32 different semantic codes in LDOCE. A distinction
can be made between basic codes (19 codes) and codes that represent a
combination of basic codes (13 combinations):
- A: Animal
- B: Female Animal
- C: Concrete
- D: Male Animal
- E: Solid or Liquid (not gas) = (S + L)
- F: Female Human
- G: Gas
- H: Human
- I: Inanimate Concrete
- J: Movable Solid
- K: Male Animal or Human = (D + M)
- L: Liquid
- M: Male Human
- N: Not Movable Solid
- O: Animal or Human = (A + H)
- P: Plant
- Q: Animate
- R: Female = (B + F)
- S: Solid
- T: Abstract
- U: Collective Animal or Human = (Collective + O)
- V: Plant or Animal = (P + A)
- W: Inanimate Concrete or Abstract = (T + I)
- X: Abstract or Human = (T + H)
- Y: Abstract or Animate = (T + Q)
- Z: Unmarked
- 1: Human or Solid = (H + S)
- 2: Abstract or Solid = (T + S)
- 4: Abstract Physical
- 5: Organic Material
- 6: Liquid or Abstract = (L + T)
- 7: Gas or Liquid = (G + L)
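The relation between combination codes and basic codes in the list above can be expressed as a simple lookup. The sketch below is illustrative only: it covers a subset of the combinations listed, the names `expand` and `may_match` are hypothetical, and the compatibility test is deliberately flat, so it does not use the hierarchy of basic codes shown in Figure 3.1.

```python
# A subset of the combination codes from the list above, each mapped to
# the basic codes it stands for. Basic codes map to themselves.
COMBINATIONS = {
    "E": {"S", "L"},  # Solid or Liquid
    "K": {"D", "M"},  # Male Animal or Human
    "O": {"A", "H"},  # Animal or Human
    "R": {"B", "F"},  # Female
    "V": {"P", "A"},  # Plant or Animal
    "W": {"T", "I"},  # Inanimate Concrete or Abstract
    "X": {"T", "H"},  # Abstract or Human
    "1": {"H", "S"},  # Human or Solid
    "2": {"T", "S"},  # Abstract or Solid
    "6": {"L", "T"},  # Liquid or Abstract
    "7": {"G", "L"},  # Gas or Liquid
}


def expand(code: str) -> frozenset[str]:
    """Return the set of basic codes that a semantic code stands for."""
    return frozenset(COMBINATIONS.get(code, {code}))


def may_match(restriction: str, noun_code: str) -> bool:
    """Flat check: do the two codes share at least one basic code?

    Assumption: no hierarchy is consulted, so a restriction such as
    Q (Animate) is not recognised as covering H (Human) here.
    """
    return bool(expand(restriction) & expand(noun_code))


if __name__ == "__main__":
    print(sorted(expand("O")))
    print(may_match("O", "H"))
    print(may_match("7", "S"))
```

A fuller treatment would add the parent-child links between basic codes, so that a restriction stated at a general level also accepts nouns coded at a more specific level.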
The basic codes are organized into the hierarchy shown in Figure 3.1.
Figure 3.1: Hierarchy of semantic codes in LDOCE
Most noun senses have a semantic code. In the case of nouns, these
codes can be seen as a basic classification of the meaning. In the
case of verbs and adjectives, however, the codes indicate selectional
restrictions on their arguments. These selectional restrictions can
also be inferred from the definitions, in which constituents
corresponding to the complements of the defined verbs or adjectives
have been put between brackets.
The Longman Lexicon of Contemporary English
LLOCE, the Longman Lexicon of Contemporary English, is a small
learner-style dictionary largely derived from LDOCE
and organized along semantic principles. A quantitative profile of the
information provided is given in the table below.
Number of entries           16,000
Number of senses            25,000
Semantic fields
  Major codes               14
  Group codes               127
  Set codes                 2441
Grammar codes               same as LDOCE
Selectional restrictions    same as LDOCE
Domain & register labels    same as LDOCE
Semantic classification in LLOCE is articulated in 3 tiers of
increasingly specific concepts represented as major, group and set
codes, e.g.
<MAJOR: A> Life and living things
|
<GROUP: A50-61> Animals/Mammals
|
<SET: A53> The cat and similar animals:
cat, leopard, lion, tiger,...
Each entry is associated with a set code, e.g.
<SET: A53> nouns The cat and similar animals
--------------------------------------------
cat 1 a small domestic [=> A36] animal ...
2 any animal of a group ...
...
panther [Wn1] 1 a leopard ...
2 AmE cougar.
...
<SET: A54> nouns The dog and similar animals
--------------------------------------------
dog a domestic animal with a coat of hair ...
Relations of semantic similarity between codes that are not expressed
hierarchically are cross-referenced, e.g.
<SET: A53> nouns The cat and similar animals
--------------------------------------------
cat 1 a small domestic [=> A36] animal ...
^^^^^^^^
<SET: A36> Man breeding living things
-------------------------------------
....
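Cross-references of the kind shown above can be pulled out of definition text with a regular expression. This is a minimal sketch that assumes the bracketed `[=> A36]` notation of the excerpt is used consistently; the function name `cross_references` is hypothetical, and the actual typesetting conventions of the LLOCE source files may differ.

```python
import re

# Matches cross-references written as "[=> A36]": a major code letter
# (A to N, per the list of major codes) followed by a set number.
CROSS_REF = re.compile(r"\[=>\s*([A-N])\s*(\d+)\]")


def cross_references(definition: str) -> list[str]:
    """Return the set codes that a definition text points to."""
    return [f"{major}{number}" for major, number in CROSS_REF.findall(definition)]


if __name__ == "__main__":
    text = "a small domestic [=> A36] animal ..."
    print(cross_references(text))
```

Collecting these targets per set code yields a graph of non-hierarchical links between sets, which complements the three-tier code hierarchy.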
There are 14 major codes, 127 group codes and 2441 set codes. The
list of major codes below provides a general idea of the semantic
areas covered:
<A> Life and living things
<B> The body, its functions and welfare
<C> People and the family
<D> Buildings, houses, the home, clothes, belongings, and personal care
<E> Food, drink, and farming
<F> Feelings, emotions, attitudes, and sensations
<G> Thought and communication, language and grammar
<H> Substances, materials, objects, and equipment
<I> Arts and crafts, sciences and technology, industry and education
<J> Numbers, measurement, money, and commerce
<K> Entertainment, sports, and games
<L> Space and time
<M> Movement, location, travel, and transport
<N> General and abstract terms
The list of group and set codes for the M domain (Movement,
location, travel, and transport) given in Table 3.2 illustrates
the level of detail used in the semantic classification.
Table 3.2: Set codes for the domain of Movement, location, travel,
and transport in LLOCE.
Moving, coming, and going |
M 1 moving, coming, and going |
M 2 (of a person or object) not moving |
M 3 stopping (a person or object) from moving |
M 4 leaving and setting out |
M 5 arriving, reaching, and entering |
M 6 letting in and out |
M 7 welcoming and meeting |
M 8 getting off, down, and out |
M 9 climbing and getting on |
M 10 movement and motion |
M 11 staying and stopping |
M 12 passages, arrivals, and departures |
M 13 climbing, ascending, and descending |
M 14 moving |
M 15 not moving |
M 16 moving quickly |
M 17 not moving quickly |
M 18 speed |
M 19 particular ways of moving |
M 20 walking unevenly, unsteadily, etc |
M 21 walking gently, etc |
M 22 walking strongly, etc |
M 23 walking long and far, etc |
M 24 running and moving quickly, etc |
M 25 running and moving lightly and quickly, etc |
M 26 crawling and creeping, etc |
M 27 loitering and lingering, etc |
M 28 flying in various ways |
M 29 driving and steering, etc |
M 30 going on a bicycle, etc |
M 31 moving faster and slower |
M 32 coming to a stop, moving away, etc |
M 33 hurrying and rushing |
M 34 following, chasing, and hunting |
M 35 escaping, etc |
M 36 things and persons chased, etc |
M 37 avoiding and dodging |
M 38 leaving and deserting |
M 39 moving forward, etc |
M 40 turning, twisting, and bending |
M 41 flowing |
M 42 coasting and drifting |
M 43 bouncing and bobbing |
Putting and taking, pulling and pushing |
M 50 putting and placing |
M 51 carrying, taking, and bringing |
M 52 sending and transporting |
M 53 taking, leading, and escorting |
M 54 sending and taking away |
M 55 showing and directing |
M 56 pulling |
M 57 pulling out |
M 58 pushing |
M 59 throwing |
M 60 throwing things and sending things out |
M 61 extracting and withdrawing |
M 62 sticking and wedging |
M 63 closing, shutting, and sealing |
M 64 fastening and locking |
M 65 opening and unlocking |
M 66 open and not open |
M 67 openings |
Travel and visiting |
M 70 visiting |
M 71 inviting and summoning people |
M 72 meeting people and things |
M 73 visiting and inviting |
M 74 travelling |
M 75 travelling |
M 76 people visiting and travelling |
M 77 people guiding and taking |
M 78 travel businesses |
M 79 hotels, etc |
M 80 in hotels, etc |
M 81 people in hotels, etc |
M 82 in hotels, travelling, etc |
M 83 in hotels, travelling, etc |
Vehicles and transport on land |
M 90 transport |
M 91 vehicles generally |
M 92 special, usu older, kinds of vehicles |
M 93 lighter motor vehicles, etc |
M 94 heavier motor vehicles |
M 95 buses, etc |
M 96 bicycles and motorcycles, etc |
M 97 persons driving vehicles, etc |
M 98 smaller special vehicles, etc |
M 99 vehicles for living in |
M 100 parts of vehicles outside |
M 101 parts of vehicles inside |
M 102 the chassis and the engine |
M 103 parts of a bicycle |
M 104 related to motorcycles |
M 105 garages and servicing |
M 106 trams |
M 107 railways |
M 108 trains |
M 109 places relating to railways, travel, etc |
M 110 persons working on railways, etc |
M 111 driving and travelling by car, etc |
M 112 crashes and accidents |
Places |
M 120 places and positions |
M 121 space |
M 122 edges, boundaries, and borders |
M 123 neighbourhoods and environments |
M 124 at home and abroad |
M 125 roads and routes |
M 126 special roads and streets in towns |
M 127 special roads and streets in the country |
M 128 special streets in towns |
M 129 very large modern roads |
M 130 no-entries and cul-de-sacs |
M 131 paths and tracks |
M 132 parts of roads, etc |
M 133 lights on roads, etc |
M 134 bends and bumps, etc |
M 135 intersections and bypasses |
M 136 bridges and tunnels |
Shipping |
M 150 boats |
M 151 boats in general |
M 152 smaller kinds of boats |
M 153 larger kinds of sailing boats |
M 154 powered ships |
M 155 ships with special uses |
M 156 merchant ships, etc |
M 157 parts of ships |
M 158 positions on ships, etc |
M 159 harbours and yards |
M 160 quays and docks |
M 161 lighthouses, buoys, etc |
M 162 crews |
M 163 sailors, etc |
M 164 ship's officers, etc |
M 165 mooring and docking |
M 166 setting sail |
M 167 oars and paddles |
M 168 floating and sinking, etc |
M 169 wrecking and marooning, etc |
Aircraft |
M 180 aircraft and aviation |
M 181 jet aeroplanes |
M 182 balloons, etc |
M 183 helicopters |
M 184 spaceships |
M 185 airports |
M 186 parts of aircraft |
M 187 landing and taking off |
M 188 landing and taking off |
M 189 people working on and with aeroplanes |
Location and direction |
M 200 surfaces and edges |
M 201 higher and lower positions in objects, space, etc |
M 202 front, back, and sides |
M 203 about and around, etc |
M 204 in, into, at, etc |
M 205 out, from, etc |
M 206 here and not here |
M 207 across, through, etc |
M 208 against |
M 209 near |
M 210 far |
M 211 between and among |
M 212 away and apart |
M 213 back and aside |
M 214 to and towards |
M 215 from place to place |
M 216 on and upon |
M 217 off |
M 218 below, beneath, and under |
M 219 above and over |
M 220 after and behind |
M 221 in front, before, and ahead |
M 222 through and via |
M 223 past and beyond |
M 224 up |
M 225 down |
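Because each group in Table 3.2 covers a contiguous range of set numbers, the group of a set code can be found by a range lookup. The sketch below reads the ranges off the table as given (for example, sets 50 to 67 under "Putting and taking, pulling and pushing"); the function name `group_of` and the tuple layout are hypothetical.

```python
from bisect import bisect_right
from typing import Optional

# (first set number, last set number, group label), read off Table 3.2.
M_GROUPS = [
    (1, 43, "Moving, coming, and going"),
    (50, 67, "Putting and taking, pulling and pushing"),
    (70, 83, "Travel and visiting"),
    (90, 112, "Vehicles and transport on land"),
    (120, 136, "Places"),
    (150, 169, "Shipping"),
    (180, 189, "Aircraft"),
    (200, 225, "Location and direction"),
]
_STARTS = [first for first, _, _ in M_GROUPS]


def group_of(set_code: str) -> Optional[str]:
    """Return the group label for an M-domain set code such as 'M53'.

    Returns None when the number falls in a gap between groups.
    """
    code = set_code.replace(" ", "")
    major, number = code[0], int(code[1:])
    if major != "M":
        raise ValueError("this sketch only covers the M domain")
    index = bisect_right(_STARTS, number) - 1
    if index >= 0:
        first, last, label = M_GROUPS[index]
        if first <= number <= last:
            return label
    return None


if __name__ == "__main__":
    print(group_of("M53"))
    print(group_of("M 225"))
```

The same lookup would extend to the other thirteen major codes given their group ranges, which are not listed in this section.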
Comparison with Other Lexical Databases
LDOCE is a traditional machine-readable dictionary. However, because
of its controlled vocabulary, the systematic coding of the
information and the elaborate use of codes, it has been a very useful
starting point for deriving basic NLP lexicons. [Bri89] give an
extensive description of the possibilities for elaboration. Apart
from the semantic features, LDOCE does not contain complete semantic
hierarchies of the kind found in WordNet, EDR or other ontologies.
The bottom level of word sense clustering in LLOCE consists of sets of
semantically related words which need not be synonyms. For example,
the set D172 (baths and showers) contains nouns such as bath, shave,
shower. This contrasts with lexical databases such as
WordNet, where synsets are meant to contain synonymous word senses.
A further difference from WordNet concerns taxonomic organization. In
WordNet, hierarchical relations are mainly encoded as hyp(er)onymic
links forming chains of synsets whose length can vary considerably. In
LLOCE there are only three tiers and considerable
cross-referencing. Moreover, only the terminal leaves of the LLOCE
taxonomy correspond to actual word senses; the labels associated with
intermediate levels (major, group and set codes) are abstractions over
sets of semantically related word senses, just like the intermediate
concepts used in the EDR (see §3.6).
Relations to Notions of Lexical Semantics
The semantic codes for nouns in LDOCE represent a very minimal and
shallow classification. The LLOCE classification is more elaborate but
is still not very deep. This classification information is similar to
the taxonomic models described in §2.7.
In addition, LLOCE combines the entry format of LDOCE, which provides
detailed syntactic information (in the form of grammar codes), with the
semantic structure of a thesaurus. This combination is particularly
well suited for relating syntactic and semantic properties of words,
and in particular for identifying dependencies between semantic
predicate classes and subcategorization frames as described in
§2.4.
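One simple way to look for such dependencies is to count how often verbs from each semantic set occur with each subcategorization frame. The sketch below is a generic co-occurrence count, not the method of any system cited in this section; the function name `frames_by_class` is hypothetical, and the observations and frame labels in the usage example are invented purely for illustration.

```python
from collections import Counter, defaultdict


def frames_by_class(observations):
    """Count subcategorization frames per semantic set code.

    `observations` is an iterable of (set_code, frame) pairs, e.g. one
    pair per tagged verb occurrence in a bracketed corpus.
    """
    table = defaultdict(Counter)
    for set_code, frame in observations:
        table[set_code][frame] += 1
    return table


if __name__ == "__main__":
    # Invented toy data: the set codes come from Table 3.2, the frame
    # labels are placeholders and do not follow LDOCE grammar codes.
    toy = [
        ("M51", "NP"),
        ("M51", "NP"),
        ("M51", "NP PP"),
        ("M41", "PP"),
    ]
    for set_code, counts in frames_by_class(toy).items():
        print(set_code, counts.most_common())
```

With real data, frames that are frequent for one set and rare for others would be candidates for a class-specific dependency.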
LE Uses
LDOCE has been most useful as a syntactic lexicon for parsing. The
use of LDOCE as a semantic resource is not as widespread as one
would expect. This is mainly due to its restricted availability and
the fact that it still requires considerable processing to derive a
full-coverage NLP lexicon from it. [Bri89] give an overview of the
different kinds of NLP lexicons that can be derived from
it. [Vos95b] describe how a richly encoded semantic
lexicon with weighted features can be derived from it and used in an
information retrieval task.
[San92a] and [San93b] use LLOCE to derive verb entries with detailed
semantic frame information. [Poz96] describe a system that uses LLOCE
to assign semantic tags to verbs in bracketed corpora in order to
elicit dependencies between semantic verb classes and their admissible
subcategorization frames.
EAGLES Central Secretariat eagles@ilc.cnr.it