Wordnets
Introduction
WordNet1.5 is a generic lexical semantic network developed at
Princeton University [Mil90a] structured around the notion of
synsets. Each synset comprises one or more word senses with the same
part of speech which are considered to be identical in meaning,
further defined by a gloss, e.g.:
- Synset = file 2, data file 1
- Part of Speech = noun
- Gloss = a set of related records kept together.
A synset represents a concept and semantic relations are expressed
mostly between concepts (except for antonymy and derivational
relations). The relations are similar to the lexical semantic
relations between word senses as described by [Cru86] (see §
2.7).
EuroWordNet [Vos97a] is a multilingual database containing
several monolingual wordnets structured along the same lines as
Princeton WordNet1.5: synsets with basic semantic relations. The
languages covered in EuroWordNet are: English, Dutch, German, Spanish,
French, Italian, Czech and Estonian. In addition to the relations
between the synsets of the separate languages there is also an
equivalence relation for each synset to the closest concept from an
Inter-Lingual-Index (ILI). The ILI contains all WordNet1.5 synsets
extended with any other concept needed to establish precise
equivalence relations across synsets. Via the ILI it is possible to
match synsets from one wordnet to another wordnet (including
WordNet1.5). Such mapping may be useful for cross-language
Information-Retrieval, for transfer of information and for comparing
lexical semantic structures across wordnets.
The Princeton WordNet1.5
General information on the size and coverage of WordNet1.5 is given in
Table 3.6, taken from [Die96b]:
Table 3.6: Numbers and figures for WordNet1.5 (* the synonymy relation is included in the notion of synset and is not counted here; ** the relations/sense is here calculated for synsets, because most relations apply to the synsets as a whole)
| | All PoS | Nouns | Verbs | Adjectives | Adverbs | Other |
| Number of Entries | 126520 | 87642 | 14727 | 19101 | 5050 | 0 |
| Number of Senses | 168217 | 107484 | 25768 | 28762 | 6203 | 0 |
| Senses/Entry | 1.33 | 1.23 | 1.75 | 1.51 | 1.23 | |
| Morpho-Syntax | | | Yes | | | |
| Synsets | yes | | | | | |
| - Number of Synsets | 91591 | 60557 | 11363 | 16428 | 3243 | 0 |
| - Synonyms/Synset | 1.84 | 1.77 | 2.27 | 1.75 | 1.91 | |
| Sense Indicators | Yes | yes | yes | yes | yes | |
| - Indicator Types | 1 | 1 | 1 | 1 | 1 | |
| - Indicator Tokens | 76705 | 51253 | 8847 | 13460 | 3145 | |
| - Indicators/Sense | 0.46 | 0.48 | 0.34 | 0.47 | 0.51 | |
| Semantic Network | | | | | | |
| - Relation Types* | 13 | 10 | 6 | 5 | 2 | |
| - Relation Tokens | 128313 | 80735 | 13321 | 30659 | 3598 | |
| - Relations/Sense** | 1.4 | 1.3 | 1.1 | 1.8 | 1.1 | |
| - Number of Tops | 584 | 11 | 573 | | | |
| Semantic Features | No | | | | | |
| Multilingual Relations | No | | | | | |
| Argument Structure | Yes | | | | | |
| - Semantic Roles | No | | | | | |
| Semantic Frames | Yes | | | | | |
| - Frame Types | 35 | | 35 | | | |
| Selection Restrictions | Yes | | | | | |
| - Restriction Types | 2 | | 2 | | | |
| Domain Labels | No | | | | | |
| Register Labels | No | | | | | |
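The ratio rows of Table 3.6 follow directly from the count rows. As a minimal sketch, assuming only the figures printed in the table (the dictionary layout and the function names are illustrative, not part of WordNet), the Senses/Entry and Synonyms/Synset values can be recomputed as follows:

```python
# Counts copied from Table 3.6; the dictionary layout is illustrative only.
COUNTS = {
    "Nouns":      {"entries": 87642, "senses": 107484, "synsets": 60557},
    "Verbs":      {"entries": 14727, "senses": 25768,  "synsets": 11363},
    "Adjectives": {"entries": 19101, "senses": 28762,  "synsets": 16428},
    "Adverbs":    {"entries": 5050,  "senses": 6203,   "synsets": 3243},
}


def totals(counts):
    """Sum each count over the parts of speech (the 'All PoS' column)."""
    keys = ("entries", "senses", "synsets")
    return {k: sum(row[k] for row in counts.values()) for k in keys}


def ratios(row):
    """Return (senses per entry, synonyms per synset), rounded to 2 decimals."""
    return (round(row["senses"] / row["entries"], 2),
            round(row["senses"] / row["synsets"], 2))


if __name__ == "__main__":
    all_pos = totals(COUNTS)
    print("All PoS", all_pos, ratios(all_pos))
    for pos, row in COUNTS.items():
        print(pos, ratios(row))
```

The per-PoS counts sum to the All PoS column, and the rounded ratios agree with the Senses/Entry and Synonyms/Synset rows.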
The following relations are distinguished between synsets:
- Synonyms:
- members of the synset which are equal or very close
in meaning, e.g.
{man 1, adult male}
- Antonyms:
- synsets which are opposite in meaning, e.g.
{man, adult male}
==>
{woman, adult female}
- Hyperonyms:
- synsets which are the more general class of a
synset, e.g.
{man, adult male}
==>
{male, male person}
- Hyponyms:
- synsets which are particular kinds of a synset, e.g.
{weather, atmospheric condition, elements}
==>
{cold weather, cold snap, cold wave, cold spell}
{fair weather, sunshine, temperateness}
{hot weather, heat wave, hot spell}
- Holonyms:
- synsets which are the whole of which a synset is a part.
[Part of] e.g.,
{flower, bloom, blossom} PART OF: {angiosperm, flowering plant}
[Member of] e.g.,
{homo, man, human being, human} MEMBER OF: {genus Homo}
[Substance of] e.g.,
{glass} SUBSTANCE OF:
{glassware, glasswork} {plate glass, sheet of glass}
- Meronyms:
- synsets which are the parts of a synset.
[Has Part] e.g.,
{flower, bloom, blossom}
HAS PART:
{stamen}
{pistil, gynoecium}
{carpel}
{ovary}
{floral leaf}
{perianth, floral envelope}
[Has Member] e.g.,
{womankind} HAS MEMBER: {womanhood, woman}
[Has Substance]
{glassware, glasswork} HAS SUBSTANCE: {glass}
- Entailments:
- synsets which are entailed by the synset, e.g.
{walk, go on foot, foot, leg it, hoof, hoof it}
{step, take a step}
- Causes:
- synsets which are caused by the synset, e.g.
{kill}
{die, pip out, decease, perish, go, exit, pass away, expire}
- Value of:
- (adjectival) synsets which represent a value for a
(nominal) target concept. e.g.
poor VALUE OF: {financial condition, economic condition}
- Has Value:
- (nominal) synsets which have (adjectival) concept
as values, e.g.
size
{large, big}
- Also see:
- Related synsets, e.g.
{cold} Also See
{cool, frozen}
- Similar to:
- Peripheral or Satellite adjective synset linked to the most central
(adjectival) synset, e.g.
{damp, dampish, moist} SIMILAR TO: {wet}
- Derived from:
- Morphological derivation relation with a synset, e.g.
{coldly, in cold blood, without emotion} Derived from adj
{cold}
In the description of WordNet1.5 ([Fel90]), troponymy is discussed
as a separate relation. It is restricted to verbs referring to
specific manners of change. However, in the database it is
represented in the same way as hyponymy. In the description given here
verb-hyponymy is thus equivalent to troponymy.
Finally, multiple hyperonyms are possible but have not been encoded exhaustively.
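The synset-and-relation model described above can be pictured as a small graph structure. The following is a minimal sketch, not the WordNet implementation: the class `Synset`, the method `link` and the function `hyperonym_chain` are hypothetical names, and only the example synsets quoted in the list above are used.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A concept: synonymous word senses plus typed links to other synsets."""
    members: tuple
    pos: str
    gloss: str = ""
    # relation name -> list of target synsets; excluded from repr and
    # comparison so that cyclic links cannot cause infinite recursion
    relations: dict = field(default_factory=dict, repr=False, compare=False)

    def link(self, relation, target):
        self.relations.setdefault(relation, []).append(target)


def hyperonym_chain(synset):
    """Follow the first hyperonym link upwards until a top is reached."""
    chain, seen = [synset], {id(synset)}
    while synset.relations.get("hyperonym"):
        synset = synset.relations["hyperonym"][0]
        if id(synset) in seen:  # guard against accidental cycles
            break
        seen.add(id(synset))
        chain.append(synset)
    return chain


if __name__ == "__main__":
    man = Synset(("man", "adult male"), "n")
    male = Synset(("male", "male person"), "n")
    woman = Synset(("woman", "adult female"), "n")
    man.link("hyperonym", male)
    male.link("hyponym", man)
    man.link("antonym", woman)
    print([s.members for s in hyperonym_chain(man)])
```

Because multiple hyperonyms are possible, a fuller version would return all upward paths rather than following only the first link.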
Table 3.7 gives the distribution of the relations for each
Part of Speech in terms of the number of synsets.
Table 3.7: Numbers and figures for WordNet1.5
| Relation | Nouns | Verbs | Adjectives | Adverbs |
| Antonym | 1713 | 1025 | 3748 | 704 |
| Hyponym | 61123 | 10817 | 0 | 0 |
| Mero-member | 11472 | 0 | 0 | 0 |
| Mero-sub | 366 | 0 | 0 | 0 |
| Mero-part | 5695 | 0 | 0 | 0 |
| Entailment | 0 | 435 | 0 | 0 |
| Cause | 0 | 204 | 0 | 0 |
| Also-See | 0 | 840 | 2686 | 0 |
| Value of | 1713 | 0 | 636 | 0 |
| Similar to | 0 | 0 | 20050 | 0 |
| Derived from | 0 | 0 | 3539 | 2894 |
| Total | 82082 | 13321 | 30659 | 3598 |
The semantic network is distributed over different files representing
some major semantic clusters per parts-of-speech:
- Noun Files in WordNet1.5
- noun.act
- nouns denoting acts or actions
- noun.animal
- nouns denoting animals
- noun.artifact
- nouns denoting man-made objects
- noun.attribute
- nouns denoting attributes of people and objects
- noun.body
- nouns denoting body parts
- noun.cognition
- nouns denoting cognitive processes and contents
- noun.communication
- nouns denoting communicative processes and contents
- noun.event
- nouns denoting natural events
- noun.feeling
- nouns denoting feelings and emotions
- noun.food
- nouns denoting foods and drinks
- noun.group
- nouns denoting groupings of people or objects
- noun.location
- nouns denoting spatial position
- noun.motive
- nouns denoting goals
- noun.object
- nouns denoting natural objects (not man-made)
- noun.person
- nouns denoting people
- noun.phenomenon
- nouns denoting natural phenomena
- noun.plant
- nouns denoting plants
- noun.possession
- nouns denoting possession and transfer of possession
- noun.process
- nouns denoting natural processes
- noun.quantity
- nouns denoting quantities and units of measure
- noun.relation
- nouns denoting relations between people or things or ideas
- noun.shape
- nouns denoting two and three dimensional shapes
- noun.state
- nouns denoting stable states of affairs
- noun.substance
- nouns denoting substances
- noun.time
- nouns denoting time and temporal relations
- Verb Files in WordNet1.5
- verb.body
- verbs of grooming, dressing and bodily care
- verb.change
- verbs of size, temperature change, intensifying, etc.
- verb.cognition
- verbs of thinking, judging, analyzing, doubting
- verb.communication
- verbs of telling, asking, ordering, singing
- verb.competition
- verbs of fighting, athletic activities
- verb.consumption
- verbs of eating and drinking
- verb.contact
- verbs of touching, hitting, tying, digging
- verb.creation
- verbs of sewing, baking, painting, performing
- verb.emotion
- verbs of feeling
- verb.motion
- verbs of walking, flying, swimming
- verb.perception
- verbs of seeing, hearing, feeling
- verb.possession
- verbs of buying, selling, owning
- verb.social
- verbs of political and social activities and events
- verb.stative
- verbs of being, having, spatial relations
- verb.weather
- verbs of raining, snowing, thawing, thundering
Within each of these files there may be one or more synsets which have
no hyperonym and therefore represent the tops of the network. In the
case of nouns there are only 11 tops or unique beginners; in the case
of verbs there are 573 tops.
- Noun Tops in WordNet1.5
- entity
- something having concrete existence; living or nonliving
- psychological feature
- a feature of the mental life of a living organism
- abstraction
- a concept formed by extracting common features from examples
- location, space
- a point or extent in space
- shape, form
- the spatial arrangement of something as distinct from its substance
- state
- the way something is with respect to its main
attributes; "the current state of knowledge"; "his state of health";
"in a weak financial state"
- event
- something that happens at a given place and time
- act, human action, human activity
- something that people do or cause to happen
- group, grouping
- any number of entities (members) considered as a unit
- possession
- anything owned or possessed
- phenomenon
- any state or process known through the senses rather than by intuition
or reasoning
Whereas semantic relations such as hyponymy and synonymy are strictly
paradigmatic relations, other relations such as meronymy and cause can
be seen as syntagmatic relations imposing a preference relation
between word senses:
- Meronymy
- the head of a lion
- Cause
- she died because he killed her
Finally, provisional argument-frames are stored for verbs. These
frames provide the constituent structure of the complementation of a
verb, where --s represents the verb and the left and right strings
the complementation pattern:
Verb-frames in WordNet1.5
- 1.
- Something --s
- 2.
- Somebody --s
- 3.
- It is --ing
- 4.
- Something is --ing PP
- 5.
- Something --s something Adjective/Noun
- 6.
- Something --s Adjective/Noun
- 7.
- Somebody --s Adjective
- 8.
- Somebody --s something
- 9.
- Somebody --s somebody
- 10.
- Something --s somebody
- 11.
- Something --s something
- 12.
- Something --s to somebody
- 13.
- Somebody --s on something
- 14.
- Somebody --s somebody something
- 15.
- Somebody --s something to somebody
- 16.
- Somebody --s something from somebody
- 17.
- Somebody --s somebody with something
- 18.
- Somebody --s somebody of something
- 19.
- Somebody --s something on somebody
- 20.
- Somebody --s somebody PP
- 21.
- Somebody --s something PP
- 22.
- Somebody --s PP
- 23.
- Somebody's (body part) --s
- 24.
- Somebody --s somebody to INFINITIVE
- 25.
- Somebody --s somebody INFINITIVE
- 26.
- Somebody --s that CLAUSE
- 27.
- Somebody --s to somebody
- 28.
- Somebody --s to INFINITIVE
- 29.
- Somebody --s whether INFINITIVE
- 30.
- Somebody --s somebody into V-ing something
- 31.
- Somebody --s something with something
- 32.
- Somebody --s INFINITIVE
- 33.
- Somebody --s VERB-ing
- 34.
- It --s that CLAUSE
- 35.
- Something --s INFINITIVE
The distinction between human (Somebody) and non-human (Something) fillers of the
frame-slots represents a shallow type of selection restriction.
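The shallow selection restriction can be read directly off a frame string. The sketch below assumes that frames are available as plain strings exactly as listed above; the function name `slot_restrictions` is hypothetical.

```python
def slot_restrictions(frame):
    """List the human / non-human restriction of each filler slot in a frame.

    'Somebody' (including the possessive "Somebody's") marks a human filler
    and 'Something' a non-human one; other tokens (the verb placeholder,
    prepositions, PP, INFINITIVE, CLAUSE, the expletive 'It') are skipped.
    """
    restrictions = []
    for token in frame.split():
        word = token.lower()
        if word.startswith("somebody"):
            restrictions.append("human")
        elif word.startswith("something"):
            restrictions.append("non-human")
    return restrictions


if __name__ == "__main__":
    for frame in ("Somebody --s something to somebody",
                  "Something --s somebody",
                  "It is --ing"):
        print(frame, "->", slot_restrictions(frame))
```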
The data in WordNet1.5 is stored in two separate files for each part
of speech. The data file contains all the information for the synsets,
where a file-offset position identifies the synset in the file. In the
next example, the synset for entity is given:
00002403 03 n 01 entity 0 013
~ 00002728 n 0000
~ 00003711 n 0000
~ 00004473 n 0000
~ 00009469 n 0000
~ 01958400 n 0000
~ 01959683 n 0000
~ 02985352 n 0000
~ 05650230 n 0000
~ 05650477 n 0000
~ 05763289 n 0000
~ 05763845 n 0000
~ 05764087 n 0000
~ 05764262 n 0000
| something having concrete existence; living or nonliving
The first line in this example starts with the file-offset number,
which uniquely identifies a synset within a part-of-speech file. It is
followed by a reference to the global semantic cluster (03 =
noun.animal), the part-of-speech, the size of the synset, a verbal
synset name, a sense number of the verbal synset name, and the number
of relations. On the next lines the related synsets are given, where
the leading symbol (here ~, marking a hyponym) indicates the type of relation, which is followed by a
file-offset identifying the target synset, its part-of-speech and a
number code for relations holding between synset members only. The
final line contains the gloss. Verbal synsets may have an additional
number for the verb-frames attached to it. A separate index-file then
contains a list of lemmas with references to the synsets in which they
occur:
abacus n 2 1 @ 2 02038006 02037873
abandoned_infant n 0 1 @ 1 06098838
abandoned_person n 0 2 @ ~ 1 05912614
abandoned_ship n 0 1 @ 1 02038160
abandonment n 0 2 @ ~ 3 00116137 00027945 00051049
Here, each word lemma is followed by a part-of-speech code, the
polysemy rate (0 if only 1 sense), a number indicating the number of
different relations a word has, a list of the relation types (e.g. @ and ~) and
a number indicating the number of synsets. Finally, the actual synsets
are listed as file-offset positions.
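The index-file layout just described is regular enough to be read field by field. The following is a minimal sketch that follows only the field order given in the text; `IndexEntry` and `parse_index_line` are hypothetical names and this is not the reader shipped with WordNet. Offsets are kept as strings so that leading zeros survive.

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    lemma: str
    pos: str
    polysemy: int
    relation_symbols: tuple
    offsets: tuple


def parse_index_line(line):
    """Split one index-file line into the fields described in the text."""
    fields = line.split()
    lemma, pos, polysemy = fields[0], fields[1], int(fields[2])
    n_rel = int(fields[3])                  # number of relation types
    symbols = tuple(fields[4:4 + n_rel])    # e.g. ('@', '~')
    n_syn = int(fields[4 + n_rel])          # number of synsets
    offsets = tuple(fields[5 + n_rel:5 + n_rel + n_syn])
    if len(symbols) != n_rel or len(offsets) != n_syn:
        raise ValueError(f"malformed index line: {line!r}")
    return IndexEntry(lemma, pos, polysemy, symbols, offsets)


if __name__ == "__main__":
    entry = parse_index_line("abandonment n 0 2 @ ~ 3 00116137 00027945 00051049")
    print(entry)
```

Each offset can then be used to seek to the corresponding synset record in the data file for the same part of speech.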
EuroWordNet
As indicated in Table 3.8, the wordnets in EuroWordNet, which are still in development, will each contain
15,000-30,000 synsets and 30,000-50,000 word senses. The
vocabulary is limited to general language but some subvocabulary is
included for demonstration purposes. The information is limited to
nouns and verbs. Adjectives and adverbs are only included in so far as
they are related to the nouns and verbs. Since the wordnets are still
under development we cannot give complete quantitative data.
Table 3.8: Numbers and figures for EuroWordNet
| | All PoS |
| Number of Entries | 20,000 |
| Number of Senses | 50,000 |
| Senses/Entry | 2.5 |
| Morpho-Syntax | no |
| Synsets | |
| - Number of Synsets | 30,000 |
| - Synonyms/Synset | 1.7 |
| Sense Indicators | |
| - Indicator Types | |
| Semantic Network | yes |
| - Relation Types | 46 |
| - Number of Tops | 1 |
| Semantic Features | yes |
| - Feature Types | 63 |
| - Feature Tokens | 1024 |
| Multilingual Relations | yes |
| - Relation Types | 17 |
| Argument Structure | yes |
| - Semantic Roles | yes |
| - Role Types | 8 |
| Semantic Frames | no |
| Selection Restrictions | no |
| Domain Labels | yes |
| Register Labels | yes |
The data in EuroWordNet is divided into separate modules:
- The Language Modules
- The Language-independent Modules
  - The Top Ontology
  - The Domain Ontology
  - The Inter-Lingual-Index
The Language Modules
The following information is then stored for each synset
in the language-specific wordnets (the Language Modules):
- Part of Speech
- Noun, Verb, Adjective or Adverb
- Synset
- Set of synonymous word meanings (synset members)
- Language-internal relations
- to one or more target synsets
- Language-external relations
- to one or more ILI-records
Each of the synset-members represents a word sense for which further
information can be specified:
- Usage Labels
- Register
- Style
- Dialect
- Region
- Frequency
- Morpho-syntactic information
- Definition
- Gloss
Most of this information for the synset-members or variants is optional.
The most basic semantic relations, such as synonymy, hyponymy and
meronymy, have been taken over from WordNet1.5. Some relations have
been added to capture less-clear cases of synonymy, to be able to
relate equivalences across parts-of-speech (so-called XPOS-relations),
to deal with meronymy-relations between events (SUBEVENT),
and to express role-relations between nouns and verbs (ROLE/INVOLVED
relations):
Paradigmatic relations in EuroWordNet
- Synonymy
- Synset-membership
- Near_synonym, e.g. machine, apparatus, tool, instrument
- XPOS_near_synonym
- adorn V XPOS_NEAR_SYNONYM adornment N
- Hyponymy
- Has_Hyperonym/ Has_Hyponym
- Has_XPOS_hyperonym/ Has_XPOS_hyponym
- arrivo HAS_XPOS_HYPERONYM andare
- andare HAS_XPOS_HYPONYM arrivo
- Antonymy
- Antonym
- Near_antonym
- XPOS_near_antonym
- dead XPOS_near_antonym live
Syntagmatic relations in EuroWordNet
- Role/Involved-relations
- Role/Involved
- Role_Agent/Involved_Agent
- watch-dog ROLE_AGENT to guard
- Role_Patient/ Involved_Patient
- to teach INVOLVED_PATIENT learner
- Role_Instrument/ Involved_Instrument
- hammer ROLE_INSTRUMENT to hammer
- Role_Location/ Involved_Location
- school ROLE_LOCATION to teach
- Role_Direction/ Involved_Direction
- Role_Source_Direction/ Involved_Source_Direction
- to emigrate INVOLVED_SOURCE_DIRECTION one's country
- Role_Target_Direction/ Involved_Target_Direction
- rincasare(to go back home) INVOLVED_TARGET_DIRECTION casa
(home)
- Be_In_State/ State_of
- 'the poor' are 'poor' (noun @ adjective)
- Cause/Caused_by
- 'to kill' causes 'to die'
- Meronymy
- Has_Meronym/ Has_Holonym
- Has_Mero_Part/ Has_Holo_Part
- a whole and its constituent parts (e.g., hand - finger)
- Has_Mero_Member/ Has_Holo_Member
- a set and its members (e.g., fleet - ship)
- Has_Mero_Portion/ Has_Holo_Portion
- a whole and a portion of it (e.g., metal - ingot)
- Has_Mero_Madeof/ Has_Holo_Madeof
- a thing and the substance it is made-of (e.g., book - paper).
- Has_Mero_Location/ Has_Holo_Location
- a place and location included within it (e.g., desert - oasis)
- Has_Subevent/ Is_Subevent_of
- 'to buy' has subevent 'to pay'
The syntagmatic relations in the above list can be seen as a specification of a potential semantic
context for a word, where especially the role-relations may coincide with grammatical
contexts as well. As is the case for WordNet1.5, multiple hyperonyms are also allowed in
EuroWordNet, as are multiple instances of the other relations (multiple meronyms, holonyms, causes, subevents, etc.).
Furthermore, relations can be augmented with specific features to
differentiate the precise semantic implication expressed:
- Conjunction or disjunction of multiple relations of the same type
- airplane
- HAS_MERO_PART door conjunctive
- HAS_MERO_PART engine conjunctive
- door
- HAS_HOLO_PART car disjunctive
- HAS_HOLO_PART room disjunctive
- HAS_HOLO_PART airplane disjunctive
- Factivity of causal relations
- kill CAUSES die factive
- search CAUSES find non-factive
- Reverseness of relations
- paper-clip HAS_MERO_MADEOF metal
- metal HAS_HOLO_MADEOF paper-clip reversed
- Negation of implications expressed by relations
- monkey HAS_MERO_PART tail
- ape HAS_MERO_PART tail not
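One way to picture these feature-augmented relations is as labelled triples. The sketch below is an illustration only: the `Relation` class, the `rel` helper and the query functions are hypothetical, and the label spellings are taken from the examples above rather than from the EuroWordNet database itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    source: str
    rel_type: str
    target: str
    labels: frozenset = frozenset()


def rel(source, rel_type, target, *labels):
    """Convenience constructor: any extra arguments become relation labels."""
    return Relation(source, rel_type, target, frozenset(labels))


RELATIONS = [
    rel("airplane", "HAS_MERO_PART", "door", "conjunctive"),
    rel("airplane", "HAS_MERO_PART", "engine", "conjunctive"),
    rel("door", "HAS_HOLO_PART", "car", "disjunctive"),
    rel("door", "HAS_HOLO_PART", "room", "disjunctive"),
    rel("door", "HAS_HOLO_PART", "airplane", "disjunctive"),
    rel("kill", "CAUSES", "die", "factive"),
    rel("search", "CAUSES", "find", "non-factive"),
    rel("monkey", "HAS_MERO_PART", "tail"),
    rel("ape", "HAS_MERO_PART", "tail", "not"),
]


def targets(relations, source, rel_type):
    """Targets of the non-negated relations of one type from one source."""
    return [r.target for r in relations
            if r.source == source and r.rel_type == rel_type
            and "not" not in r.labels]


def is_factive(relations, cause, effect):
    """True if a CAUSES relation from cause to effect is labelled factive."""
    return any(r.source == cause and r.target == effect
               and r.rel_type == "CAUSES" and "factive" in r.labels
               for r in relations)


if __name__ == "__main__":
    print(targets(RELATIONS, "door", "HAS_HOLO_PART"))
    print(targets(RELATIONS, "ape", "HAS_MERO_PART"))
    print(is_factive(RELATIONS, "kill", "die"))
```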
The equivalence relations are used to link the language-specific synset to
the Inter-Lingual-Index or ILI. The relations parallel the language-internal relations:
- EQ_Synonym
- EQ_Near_Synonym
- HAS_EQ_Hyperonym and HAS_EQ_Hyponym
- HAS_EQ_Holonym and HAS_EQ_Meronym
- EQ_Involved and EQ_Role
- EQ_Causes and EQ_Is_Caused_By
- EQ_HAS_Subevent and EQ_IS_Subevent_Of
- EQ_Be_In_State and EQ_Is_State_Of
Eq_synonym is the most important relation to encode direct
equivalences. However, when there is no direct equivalence the synset
is linked to the most informative and closest concept using one of the
complex equivalence relations. Eq_near_synonym is used when a single
synset links to multiple but very similar senses of the same target
word (this may be the result of inconsistent sense-differentiation across
resources). Has_eq_hyperonym and has_eq_hyponym are typically used
for gaps, when the closest target synsets are too narrow or too
broad. The other relations are only used when the closest target
concept cannot be related by one of the previous relations.
Below is an example of the Dutch synset (aanraking; beroering: touch as a Noun) in the EuroWordNet database import
format:
0 WORD_MEANING
1 PART_OF_SPEECH "n"
1 VARIANTS
2 LITERAL "aanraking"
3 SENSE 1
3 DEFINITION "het aanraken"
4 FEATURE "Register"
5 FEATURE_VALUE "Usual"
3 EXTERNAL_INFO
4 CORPUS_ID 1
5 FREQUENCY 1026
4 SOURCE_ID 1
5 NUMBER_KEY 1336
2 LITERAL "beroering"
3 SENSE 2
4 FEATURE "Date"
5 FEATURE_VALUE "Old-fashioned"
3 EXTERNAL_INFO
4 CORPUS_ID 1
5 FREQUENCY 238
4 SOURCE_ID 1
5 NUMBER_KEY 401472
1 INTERNAL_LINKS
2 RELATION "XPOS_NEAR_SYNONYM"
3 TARGET_CONCEPT
4 PART_OF_SPEECH "v"
4 LITERAL "aanraken"
5 SENSE 1
3 SOURCE_ID 1001
2 RELATION "HAS_HYPERONYM"
3 TARGET_CONCEPT
4 PART_OF_SPEECH "n"
4 LITERAL "beweging"
5 SENSE 1
3 SOURCE_ID 1001
2 RELATION "CAUSES"
3 TARGET_CONCEPT
4 PART_OF_SPEECH "n"
4 LITERAL "contact"
5 SENSE 1
3 SOURCE_ID 1001
2 RELATION "XPOS_NEAR_SYNONYM"
3 TARGET_CONCEPT
4 PART_OF_SPEECH "v"
4 LITERAL "raken"
5 SENSE 2
3 SOURCE_ID 1001
1 EQ_LINKS
2 EQ_RELATION "EQ_SYNONYM"
3 TARGET_ILI
4 PART_OF_SPEECH "n"
4 FILE_OFFSET 69655
3 SOURCE_ID 1002
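The import format is a level-numbered outline: each line starts with a depth, followed by a field name and an optional value. A minimal sketch of a reader, assuming exactly that line shape and attaching each line to the nearest preceding line with a smaller level number, could look as follows; `parse_records` and `find_all` are hypothetical names, not part of the EuroWordNet tools.

```python
def parse_records(text):
    """Parse level-prefixed lines into nested {'key','value','children'} nodes."""
    root = {"key": "ROOT", "value": None, "children": []}
    stack = [(-1, root)]  # (level, node) pairs on the current path
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        level_str, _, rest = line.partition(" ")
        level = int(level_str)
        key, _, value = rest.partition(" ")
        node = {"key": key,
                "value": value.strip().strip('"') or None,
                "children": []}
        # climb back up until the top of the stack is a shallower line
        while stack[-1][0] >= level:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((level, node))
    return root["children"]


def find_all(node, key):
    """Yield every node with the given key, depth-first."""
    if node["key"] == key:
        yield node
    for child in node["children"]:
        yield from find_all(child, key)


if __name__ == "__main__":
    sample = '0 WORD_MEANING\n1 PART_OF_SPEECH "n"\n1 VARIANTS\n2 LITERAL "aanraking"\n3 SENSE 1\n'
    record = parse_records(sample)[0]
    print([n["value"] for n in find_all(record, "LITERAL")])
```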
The Inter-Lingual-Index
The ILI is not internally structured: no lexical semantic relations
are expressed between the ILI-records. In this respect it should not
be seen as a language-neutral ontology but only as a linking-index
between wordnets. The Inter-Lingual-Index is thus basically a
list of ILI-records, whose only purpose is to provide a matching
across wordnets. In addition, it also provides access to the
language-neutral modules by listing the Top Concepts and Domain labels
that may apply to it.
Simple ILI-records contain the following information fields:
- variants: word forms, sense number
- part-of-speech
- gloss
- 1 or more relations to Top Concepts
- 1 or more relations to Domains
- 1 or more relations to synsets in the language-specific concepts
Most information is optional. The Top Concepts and Domains linked to an ILI-record can be transferred to the synsets in the local wordnets that are linked to the same ILI-record, as is illustrated in the next schema:
ES wordnet: language-specific-word-meaning
|___
___> eq_synonym-> ILI-record -has_top_concept-> Top Concept
|
IT wordnet: language-specific-word-meaning
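The schema amounts to two lookups: from a local synset to its ILI-record, and from the ILI-record to Top Concepts or to synsets in another wordnet. The sketch below uses toy data throughout: the ILI identifier, the Spanish and Italian literals, and the Top Concept assignment are all illustrative assumptions, not entries taken from EuroWordNet.

```python
# Toy data, illustrative only: not actual EuroWordNet content.
EQ_SYNONYM = {
    "ES": {"coche": "ili-0001"},
    "IT": {"macchina": "ili-0001"},
}
ILI_TOP_CONCEPTS = {"ili-0001": {"Artifact", "Vehicle"}}


def equivalents(src_lang, synset, tgt_lang):
    """Synsets of tgt_lang linked to the same ILI-record as the given synset."""
    ili = EQ_SYNONYM[src_lang].get(synset)
    if ili is None:
        return []
    return sorted(s for s, i in EQ_SYNONYM[tgt_lang].items() if i == ili)


def inherited_top_concepts(lang, synset):
    """Top Concepts transferred to a local synset via its ILI-record."""
    ili = EQ_SYNONYM[lang].get(synset)
    return set(ILI_TOP_CONCEPTS.get(ili, set()))


if __name__ == "__main__":
    print(equivalents("ES", "coche", "IT"))
    print(sorted(inherited_top_concepts("IT", "macchina")))
```

Only eq_synonym links are modelled here; the complex equivalence relations would need a relation type stored alongside each link.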
In addition to the Simple ILI-records, there are Complex ILI-records
which group closely related meanings. These groupings are based on
systematic polysemy relations between meanings, such as specialization
of more general meanings, metonymy (§2.7, 3.10.2) and
diathesis alternations (§2.6.2). Complex ILI-records are
needed to provide a better linking between the wordnets. Inconsistent
sense-differentiation across resources often makes it very difficult
to find exact equivalences across the resources. By linking different
meaning realizations (e.g. university as a building and as
the institute) to the same complex ILI-record it is still possible
to find the closely related meanings.
Below is an example of a complex ILI-record in which specific meanings of car are grouped by
a new generalized meaning:
0 ILI_RECORD
1 PART_OF_SPEECH "n"
1 NEW_ILI_ID 1234
1 GLOSS "a 4-wheeled vehicle"
1 VARIANTS
2 LITERAL "car"
3 SENSE 2
2 LITERAL "automobile"
3 SENSE 1
1 EQ_RELATION "eq_generalization"
2 ILI_RECORD
3 FILE_OFFSET 54321
2 ILI_RECORD
3 NEW_ILI_ID 9876
Here, eq_generalization expresses the relation that holds with
two more specific ILI-records, identified by FILE_OFFSET and
NEW_ILI_ID respectively. The former indicates that it originates from
WordNet1.5, the latter that it has been added as a new concept in
EuroWordNet. These sense-groupings apply cross-linguistically,
although the lexicalization of these meanings can differ from language
to language.
Top Ontology
The EuroWordNet top ontology contains 63 concepts. It is developed to classify a set of so-called
Base Concepts extracted from the Dutch, English, Spanish and Italian wordnets that are
being developed. These Base Concepts have the most relations and occupy high positions in the
separate wordnets, and as such make up the core of the semantic networks. The Base Concepts
are specified in terms of WordNet1.5 synsets in the ILI.
The top-ontology incorporates the top-levels of WordNet1.5, ontologies
developed in EC-projects Acquilex (BRA 3030, 7315) and Sift
(LE-62030)[Vos96], Qualia-structure [Pus95a], Aktions-Art
distinctions [Ven67], [Ver72], [Ver89], [Pus91b], and
entity orders [Lyo77]. Furthermore, the ontology has been adapted
to group the Base Concepts into coherent semantic clusters. The
ontology combines notions described in §2.2, 2.7,
and 2.5. Important characteristics of the ontology are:
- semantic distinctions applying to situations cut across the
parts of speech: i.e. they apply to nouns, verbs and
adjectives alike. This is necessary because words from different parts of
speech can be related in the language-specific wordnets via an
xpos_synonymy relation, and the ILI-records can be related to any
part-of-speech.
- the Top Concepts are hierarchically ordered by means of a
subsumption relation but there can only be one super-type linked to
each Top Concept: multiple inheritance between Top Concepts is not
allowed.
- in addition to the subsumption relation Top Concepts can have an
opposition-relation to indicate that certain distinctions are
disjunct, whereas others may overlap.
- there may be multiple relations from ILI-records to Top
Concepts. This means that the Base Concepts can be cross-classified in
terms of multiple Top Concepts (as long as these have no
opposition-relation between them): i.e. multiple inheritance from Top
Concept to Base Concept is allowed.
The Top Concepts are more like semantic features than like common
conceptual classes. We typically find Top Concepts for Living and for
Part but we do not find a Top Concept Bodypart, even though this may
be more appealing to a non-expert. BCs representing body parts are now
cross-classified by two feature-like Top Concepts Living and Part. The
main reason for this is that a more flexible system of features is
needed to deal with the diversity of the Base Concepts.
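The constraints listed above (single inheritance among Top Concepts, opposition between some of them, and cross-classification of Base Concepts) can be sketched in a few lines. The hierarchy fragment below is an assumption, since the figure is not reproduced here, and the opposition between Natural and Artifact is inferred from the wording of their definitions; the function names are hypothetical.

```python
# Assumed fragment of the Top Ontology: each concept has exactly one parent.
PARENT = {
    "1stOrderEntity": "Top",
    "Origin": "1stOrderEntity",
    "Natural": "Origin",
    "Artifact": "Origin",
    "Living": "Natural",
    "Composition": "1stOrderEntity",
    "Part": "Composition",
    "Group": "Composition",
}
# Assumed opposition relation (unordered pairs of disjoint Top Concepts).
OPPOSED = {frozenset({"Natural", "Artifact"})}


def ancestors(concept):
    """All super-types of a Top Concept, nearest first."""
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def is_consistent(top_concepts):
    """True if the concepts, with their super-types, contain no opposed pair."""
    expanded = set(top_concepts)
    for concept in top_concepts:
        expanded.update(ancestors(concept))
    return not any(pair <= expanded for pair in OPPOSED)


if __name__ == "__main__":
    print(ancestors("Living"))
    print(is_consistent({"Living", "Part"}))      # a body part
    print(is_consistent({"Living", "Artifact"}))  # clashes via Natural
```

A Base Concept for a body part is thus classified by the pair Living and Part, with no dedicated Bodypart concept required.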
The top-concepts are structured according to the hierarchy shown in
Fig 3.2.
Figure 3.2: Hierarchy of Top Concepts in EuroWordNet
Following [Lyo77], at the first level the ontology is
differentiated into 1stOrderEntity, 2ndOrderEntity,
3rdOrderEntity. According to Lyons, 1stOrderEntities are publicly
observable individual persons, animals and more or less discrete
physical objects and physical substances. They can be located at any
point in time and in, what is at least psychologically, a
three-dimensional space. The 2ndOrderEntities are events, processes,
states-of-affairs or situations which can be located in time. Whereas
1stOrderEntities exist in time and space, 2ndOrderEntities occur or
take place, rather than exist. The 3rdOrderEntities are propositions,
such as ideas, thoughts, theories, hypotheses, that exist outside
space and time and which are unobservable. They function as objects of
propositional attitudes, and they cannot be said to occur or be
located either in space or time. Furthermore, they can be predicated
as true or false rather than real, they can be asserted or denied,
remembered or forgotten, they may be reasons but not causes.
- List of Top Ontology concepts in EuroWordNet with definitions
- Top
- all
- 1stOrderEntity
- Any concrete entity (publicly) perceivable by the
senses and located at any point in time, in a three-dimensional space.
- 2ndOrderEntity
- Any Static Situation (property, relation) or Dynamic
Situation, which cannot be grasped, heard, seen, felt as an independent
physical thing. They can be located in time and occur or take place rather
than exist; e.g. continue, occur, apply.
- 3rdOrderEntity
- An unobservable proposition which exists independently
of time and space. They can be true or false rather than real. They can be
asserted or denied, remembered or forgotten, e.g. idea, thought,
information, theory, plan.
- Origin
- Considering the way concrete entities are created or
come into existence.
- Natural
- Anything produced by nature and physical forces as opposed to
artifacts.
- Living
- Anything living and dying including objects, organic
parts or tissue, bodily fluids; e.g. cells; skin; hair, organism, organs.
- Human
- e.g. person, someone.
- Creature
- Imaginary creatures; e.g. god, Faust, E.T.
- Animal
- e.g. animal, dog.
- Plant
- e.g. plant, rice.
- Artifact
- Anything manufactured by people as opposed to natural.
- Form
- Considering the shape of concrete entities, fixed as
an object or amorphous as a substance
- Substance
- all stuff without boundary or fixed shape, considered
from a conceptual point of view not from a linguistic point of view; e.g.
mass, material, water, sand, air.
- Solid
- Substance which can fall, does not feel wet and you
cannot inhale it; e.g. stone, dust, plastic, ice, metal
- Liquid
- Substance which can fall, feels wet and can flow on
the ground; e.g. water, soup, rain.
- Gas
- Substance which cannot fall, you can inhale it and it
floats above the ground; e.g. air, ozone.
- Object
- Any conceptually-countable concrete entity with an
outer limit; e.g. book, car, person, brick.
- Composition
- Considering the composition of concrete entities in
terms of parts, groups and larger constructs
- Part
- Any concrete entity which is contained in an object,
substance or a group; e.g. head, juice, nose, limb, blood, finger, wheel, brick,
door.
- Group
- Any concrete entity consisting of multiple discrete
objects (either homogeneous or heterogeneous sets), typically people,
animals, vehicles; e.g. traffic, people, army, herd, fleet.
- Function
- Considering the purpose, role or main activity of a
concrete entity. Typically it can be used for nouns that can refer to any
substance, object which is involved in a certain way in some event or
process; e.g. remains, product, threat.
- Vehicle
- e.g. car, ship, boat.
- Software
- e.g. computer programs and databases.
- Representation
- Any concrete entity used for conveying a message; e.g.
traffic sign, word, money.
- Place
- Concrete entities functioning as the location for
something else; e.g. place, spot, centre, North, South.
- Occupation
- e.g. doctor, researcher, journalist, manager.
- Instrument
- e.g. tool, machine, weapon
- Garment
- e.g. jacket, trousers, shawl
- Furniture
- e.g. table, chair, lamp.
- Covering
- e.g. skin, cloth, shield.
- Container
- e.g. bag, tube, box.
- Comestible
- food and drinks, including substances, liquids and
objects.
- Building
- e.g. house, hotel, church, office.
- MoneyRepresentation
- Physical Representations of value, or money; e.g.
share, coin.
- LanguageRepresentation
- Physical Representations conveyed in language (spoken, written or sign language); e.g.
text, word, utterance, sentence, poem.
- ImageRepresentation
- Physical Representations conveyed in a visual medium; e.g.
sign language, traffic sign, light signal.
- SituationType
- Considering the predicate-inherent Aktionsart
properties of Situations: dynamicity and boundedness in time. Subclasses
are disjoint, every 2ndOrderEntity has only 1 SituationType.
- Static
- Situations (properties, relations and states) in which
there is no transition from one eventuality or situation to another:
non-dynamic; e.g. state, property, be.
- Relation
- Static Situation which applies to a pair of concrete
entities or abstract Situations, and which cannot exist by itself without
either one of the involved entities; e.g. relation, kinship, distance,
space.
- Property
- Static Situation which applies to a single concrete
entity or abstract Situation; e.g. colour, speed, age, length, size, shape,
weight.
- Dynamic
- Situations implying either a specific transition from
a state to another (Bounded in time) or a continuous transition perceived
as an ongoing temporally unbounded process; e.g. event, act, action,
become, happen, take place, process, habit, change, activity.
- UnboundedEvent
- Dynamic Situations occurring during a period of time
and composed of a sequence of (micro-)changes of state, which are not
perceived as relevant for characterizing the Situation as a whole; e.g.
grow, change, move around, live, breathe, activity, hobby, sport, education,
work, performance, fight, love, caring, management.
- BoundedEvent
- Dynamic Situations in which a specific transition from
one Situation to another is implied; Bounded in time and directed to a
result; e.g. to do, to cause to change, to make, to create.
- SituationComponent
- Considering the conceptual components that play a role in Situations. Situations can be cross-classified by any number of Situation Components.
- Cause
- Situations involving causation of Situations (both
Static and Dynamic); e.g. result, effect, cause, prevent.
- Stimulating
- Situations in which something elicits or arouses a
perception or provides the motivation for some event, e.g. sounds, such as song,
bang, beep, rattle, snore, views, such as smell, appetizing, motivation.
- Phenomenal
- Situations that occur in nature controlled or
uncontrolled or considered as a force; e.g. weather, chance.
- Agentive
- Situations in which a controlling agent causes a
dynamic change; e.g. to kill, to do, to act.
- Usage
- Situations in which something (an instrument, substance, time, effort, force, money)
is or can be used;
e.g. to use, to spend, to represent, to mean, to be about, to operate, to fly, drive, run, eat, drink, consume.
- Time
- Situations in which duration or time plays a
significant role; Static e.g. yesterday, day, pass, long, period, Dynamic e.g.
begin, end, last, continue.
- Social
- Situations related to society and social interaction
of people: Static e.g. employment, poor, rich, Dynamic e.g. work,
management, recreation, religion, science.
- Quantity
- Situations involving quantity and measure; Static
e.g. weight, heaviness, lightness; changes of the quantity of first order
entities; Dynamic e.g. to lessen, increase, decrease.
- Purpose
- Situations which are intended to have some effect.
- Possession
- Situations involving possession; Static e.g. have,
possess, possession, contain, consist of, own; Dynamic changes in
possession, often to be combined with changes in location as well; e.g.
sell, buy, give, donate, steal, take, receive, send.
- Physical
- Situations involving perceptual and measurable
properties of first order entities; either Static e.g. health, a colour, a
shape, a smell; or Dynamic changes and perceptions of the physical
properties of first order entities; e.g. redden, thicken, widen, enlarge,
crush, form, shape, fold, wrap, to see, hear, notice, smell.
- Modal
- Situations (only Static) involving the possibility or
likelihood of other situations as actual situations; e.g. abilities, power,
force, strength.
- Mental
- Situations experienced in mind, including emotional and attitudinal situations; a mental state is changed; e.g.
invent, remember, learn, think, consider.
- Manner
- Situations in which the way or manner plays a role.
This may be Manner incorporated in a dynamic situation, e.g. ways of
movement such as walk, swim, fly, or the static Property itself: e.g.
manner, sloppy, strongly, way.
- Location
- Situations involving spatial relations; static e.g.
level, distance, separation, course, track, way, path; something changes
location, irrespective of the causation of the change; e.g. move, put,
fall, drop, drag, glide, fill, pour, empty, take out, enter.
- Experience
- Situations which involve an experiencer: either mental
or perceptual through the senses.
- Existence
- Situations involving the existence of objects and
substances; Static states of existence e.g. exist, be, be alive, life,
live, death; Dynamic changes in existence; e.g. kill, produce, make,
create, destroy, die, birth.
- Condition
- Situations involving an evaluative state of something:
Static, e.g. health, disease, success or Dynamic e.g. worsen, improve.
- Communication
- Situations involving communication, either Static,
e.g. be about or Dynamic (Bounded and Unbounded); e.g. speak, tell, listen,
command, order, ask, state, statement, conversation, call.
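The disjointness constraint on SituationType can be illustrated with a small check. The following is only a sketch: the parent table restates the subclass structure given in the definitions above (Relation and Property under Static, BoundedEvent and UnboundedEvent under Dynamic), while the names `SITUATION_TYPE_PARENT`, `ancestors` and `is_consistent` are hypothetical and not part of EuroWordNet.

```python
# Hypothetical encoding of the SituationType subtree described above.
SITUATION_TYPE_PARENT = {
    "Static": "SituationType",
    "Relation": "Static",
    "Property": "Static",
    "Dynamic": "SituationType",
    "UnboundedEvent": "Dynamic",
    "BoundedEvent": "Dynamic",
}


def ancestors(label):
    """Return all SituationType ancestors of a label, nearest first."""
    chain = []
    while label in SITUATION_TYPE_PARENT:
        label = SITUATION_TYPE_PARENT[label]
        chain.append(label)
    return chain


def is_consistent(features):
    """True if all SituationType features lie on one path of the hierarchy.

    SituationComponent features (Mental, Agentive, ...) are ignored,
    since they may be combined freely.
    """
    types = [f for f in features
             if f == "SituationType" or f in SITUATION_TYPE_PARENT]
    for a in types:
        for b in types:
            if a != b and a not in ancestors(b) and b not in ancestors(a):
                return False
    return True
```

Under this encoding a classification such as Mental + BoundedEvent + Dynamic + Agentive is accepted, because BoundedEvent is a subclass of Dynamic, whereas a classification that is both Static and Dynamic is rejected.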
The Top Concepts have been applied to the 1024 Base Concepts,
distributed as shown in Table 3.9.
Table 3.9: Distribution of Base Concepts in EuroWordNet
                 | Nouns | Verbs | Total |
1stOrderEntities |   491 |       |   491 |
2ndOrderEntities |   272 |   228 |   500 |
3rdOrderEntities |    33 |       |    33 |
Total            |   796 |   228 |  1024 |
As suggested above, the Base Concepts are typically classified in terms
of several top concepts.
The ontology should thus be seen as a partial lattice in which distinctions can be combined.
In total 450 clusters of features have been used to classify 1014 Base Concepts.
Below are examples of top concept conjunctions for the Base Concepts (note that some classifications may
seem odd because they apply to rather specific senses of words):
- 1stOrderEntities
- Container+Part+Solid+Living
- blood vessel; passage; tube; vas; vein
- Place+Part+Solid
- face; field; layer; parcel; space
- Place+Part+Liquid+Natural
- Furniture+Object+Artifact
- article of furniture; chair; seat; table
- 2ndOrderEntities
- Experience + Stimulating + Dynamic+Condition (undifferentiated for Mental or
Physical)
- Verbs: cause to feel unwell; cause pain
- Physical + Experience + SituationType (undifferentiated for Static/Dynamic)
- Verbs: look; feel; experience
- Nouns: sense; sensation; perception
- Mental + (BoundedEvent) Dynamic + Agentive
- Verbs: identify; form an opinion of; form a resolution about; decide;
choose; understand; call back; ascertain; bump into; affirm; admit defeat
- Nouns: choice, selection
- Mental + Dynamic + Agentive
- Verbs: interpret; differentiate; devise; determine; cerebrate; analyze;
arrange
- Nouns: higher cognitive process; cerebration; categorization; basic
cognitive process; argumentation; abstract thought
- Mental + Experience + SituationType (undifferentiated for Static/Dynamic)
- Verbs: consider; desire; believe; experience
- Nouns: pleasance; motivation; humor; feeling; faith; emotion;
disturbance; disposition; desire; attitude
- Relation+Physical+Location
- Verbs: go; be; stay in one place; adjoin
- Nouns: path; course; aim; blank space; degree; direction; spatial
relation; elbow room; course; direction; distance; spacing; spatial property; space
- 3rdOrderEntities
- theory; idea; structure; evidence; procedure; doctrine; policy; data point;
content; plan of action; concept; plan; communication; knowledge base; cognitive
content; know-how; category; information; abstract; info
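The notion of a cluster of features can be made concrete as follows: each Base Concept carries a set of top concepts, and two Base Concepts fall in the same cluster when their sets are identical. This is a minimal sketch using a handful of the 1stOrderEntity examples listed above; the variable `classified` and the function `clusters` are hypothetical names introduced only for illustration.

```python
from collections import defaultdict

# Small illustrative subset of the examples above: Base Concept -> top concepts.
classified = {
    "blood vessel": {"Container", "Part", "Solid", "Living"},
    "vein": {"Container", "Part", "Solid", "Living"},
    "chair": {"Furniture", "Object", "Artifact"},
    "table": {"Furniture", "Object", "Artifact"},
    "face": {"Place", "Part", "Solid"},
}


def clusters(classified):
    """Group Base Concepts by their exact combination of top concepts."""
    groups = defaultdict(list)
    for concept, features in classified.items():
        # frozenset makes the feature combination usable as a dictionary key
        groups[frozenset(features)].append(concept)
    return dict(groups)
```

Counting the keys of the resulting dictionary gives the number of distinct feature combinations in use, which is how a figure such as the number of clusters reported above could be obtained from a full classification.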
Comparison with Other Lexical Databases
WordNet1.5 is fundamentally different from a traditional dictionary
because the semantic information is mainly stored for synsets rather
than for words or word senses. Synsets are considered as conceptual
units, and the lexical index table maps the words of a
language onto these units. In this respect, WordNet1.5 is also rather
different from many NLP lexicons, which often use traditional
sense-based units for storing semantic information. WordNet1.5 can
best be characterized as somewhere in between a semantic network and a
conceptual ontology. The synsets are conceptual units rather than
lexical semantic units. The relations are better seen as semantic
inferencing schemes than as lexicalization patterns. However, compared
to conceptual ontologies such as CYC, the content is shallow.
It further differs from formalized NLP lexicons
and feature-based resources in that the network is completely
relational: i.e. no relations are expressed to semantic values or
features outside the synset system (although the references to the
lexicographer's files, representing the global semantic clusters, can
be seen as a form of shallow feature encoding).
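The synset-centred organization described above can be pictured with a small data-structure sketch. It assumes nothing about the actual WordNet1.5 file format: the class `Synset`, the identifier `n-file-2` and the function `build_index` are hypothetical, and only the example synset (file 2, data file 1) is taken from the introduction.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    synset_id: str   # hypothetical identifier, not a WordNet offset
    pos: str         # part of speech
    senses: list     # (word, sense number) pairs sharing one meaning
    gloss: str
    # relation name -> list of target synset ids; relations hold between synsets
    relations: dict = field(default_factory=dict)


synsets = {
    "n-file-2": Synset(
        synset_id="n-file-2",
        pos="noun",
        senses=[("file", 2), ("data file", 1)],
        gloss="a set of related records kept together",
    ),
}


def build_index(synsets):
    """Build the lexical index table: word -> ids of the synsets it belongs to."""
    index = {}
    for synset_id, synset in synsets.items():
        for word, _sense_number in synset.senses:
            index.setdefault(word, []).append(synset_id)
    return index
```

In this layout the gloss and the relations are stored once per synset, and a word reaches them only through the index, which is the sense in which the semantic information is stored for synsets rather than for words or word senses.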
EuroWordNet closely resembles WordNet1.5. The
main differences are the multilinguality, the top concepts and the domain
ontology. A more fundamental difference between WordNet1.5 and
EuroWordNet is that the former includes many non-lexicalized and
artificial classes in the hierarchy, whereas the wordnets in
EuroWordNet are restricted to lexicalized units (words and
expressions) of a language.
EuroWordNet also differs from traditional bilingual dictionaries
(§3.11). The equivalence relations in EuroWordNet are
encoded at the synset level rather than the word sense level, as is
done in traditional bilingual dictionaries. This means that the
equivalence relations abstract from stylistic, pragmatic and minor
morpho-syntactic differences. Another difference is that the kind of
equivalence relation is explicitly coded in EuroWordNet, whereas it is often
left unclear in bilingual dictionaries. Furthermore, EuroWordNet combines
monolingual with multilingual information, which is very useful from a
translation or language-learning perspective.
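Matching synsets across wordnets via the Inter-Lingual-Index can be sketched as a lookup through a shared ILI record, with the kind of equivalence kept explicit on each link. Everything in this sketch is an assumption made for illustration: the synset identifiers, the ILI record id, the toy Dutch and Spanish entries, the relation labels and the function `translate` are not taken from the EuroWordNet database.

```python
# Toy equivalence links: (language, synset id) -> (relation type, ILI record id).
# All identifiers and labels are hypothetical.
EQ_LINKS = {
    ("en", "file-2"): ("eq_synonym", "ili-0001"),
    ("nl", "bestand-1"): ("eq_synonym", "ili-0001"),
    ("es", "fichero-1"): ("eq_near_synonym", "ili-0001"),
}


def translate(lang, synset_id, target_lang):
    """Find synsets in target_lang linked to the same ILI record.

    Returns (target synset id, source relation, target relation) triples,
    so that the kind of equivalence on both sides stays visible.
    """
    if (lang, synset_id) not in EQ_LINKS:
        return []
    source_relation, ili_id = EQ_LINKS[(lang, synset_id)]
    return [
        (target_id, source_relation, target_relation)
        for (lng, target_id), (target_relation, ili) in EQ_LINKS.items()
        if lng == target_lang and ili == ili_id
    ]
```

Because the links attach to synsets rather than to individual word senses, every word in the source synset is matched to every word in the target synset, which is where the abstraction from stylistic and minor morpho-syntactic differences comes from.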
EuroWordNet is different from AI-ontologies such as CYC or
Sensus/Pangloss (§3.7) in that its focus is on
linguistically-motivated relations rather than on semantic inference
schemes only. In this respect, it provides information on the exact
semantic relation between the lexicalized words and expressions of
languages (although this information may still be useful for making
inferences as well). Nevertheless, the design of the database makes it
possible to relate the wordnets to other ontologies that focus on
the cognitive implications only. Such linking is facilitated by the
EuroWordNet Top Ontology.
Relations to Notions of Lexical Semantics
WordNet1.5 is a relational and taxonomic semantic model. It
incorporates information on lexicalization patterns, semantic
components and conceptual inferences.
Like WordNet1.5, EuroWordNet represents a taxonomic, psychological
model of meaning as a semantic network (§2.7). Even more strongly
than WordNet1.5, it expresses the lexicalization patterns and the
important semantic components of languages (§2.5, 2.6)
and the mapping of meaning to cognitive concepts as described in
Generative Models (§2.7). Although EuroWordNet does not
contain specifications of the argument structures of verbs, it does
include a system for encoding conceptual dependencies between concrete
nouns and abstract nouns and verbs, in the form of semantic
roles. When combined with syntactic frames, these roles could be used
to derive richly annotated argument structures as described in
§2.4. Finally, the EuroWordNet Top Ontology provides a basic
classification which incorporates formal notions of lexical aspect
(§2.2) and semantic components (§2.6) for verbs and
higher-order nouns, and qualia roles as defined in Generative Models of
the lexicon (§2.7).
LE Uses
Together with LDOCE, WordNet1.5 is one of the most widely used lexical
resources. This is mainly due to its availability and its large
coverage. Because only limited semantic information is given, its
usage is limited to shallower processing such as
information retrieval [Ric95, Sme95]. Furthermore, WordNet1.5 is used
for tasks in NLP such as similarity measurement and semantic tagging
[Kur94, LiA95, Fuj97, Res95, San97, Agi96], or automatic acquisition
and information extraction [McC97, Gri92, Rib95, Cha97]. More
elaborate usage requires further encoding of information or linking
with other resources that provide, for example, syntactic information.
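As an indication of what similarity measurement over a wordnet hierarchy can look like, the sketch below counts edges between two concepts through their nearest shared hypernym. It is a generic edge-counting measure, not the method of any of the works cited above, and the toy hierarchy `HYPERNYM` and the function names are hypothetical.

```python
# Toy hypernym links (child -> parent); illustrative only, not WordNet data.
HYPERNYM = {
    "dog": "canine",
    "canine": "carnivore",
    "cat": "feline",
    "feline": "carnivore",
    "carnivore": "animal",
}


def path_to_root(node):
    """Return the chain from a node up to the top of the hierarchy."""
    path = [node]
    while node in HYPERNYM:
        node = HYPERNYM[node]
        path.append(node)
    return path


def path_similarity(a, b):
    """Score 1 / (1 + number of edges on the path via the nearest shared hypernym)."""
    path_a, path_b = path_to_root(a), path_to_root(b)
    depth_in_b = {node: steps for steps, node in enumerate(path_b)}
    for steps_a, node in enumerate(path_a):
        if node in depth_in_b:
            return 1.0 / (1 + steps_a + depth_in_b[node])
    return 0.0  # no shared hypernym
```

A measure of this kind relies only on the taxonomic relations between synsets, which is consistent with the observation that the available semantic information supports shallow processing.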
As EuroWordNet is still in development, there are no actual uses on
which we can report. Within the project, the resource will be applied
to cross-language text retrieval; experiments on this usage are
reported in [Gil97]. In addition, we foresee that it will be a very
useful tool for language generation tasks or authoring tools, for
machine-translation tools, language-learning tools and for
summarizers.
EAGLES Central Secretariat eagles@ilc.cnr.it