GLDB - The Göteborg Lexical DataBase
Introduction
Work on the GLDB was started in 1977 by Professor Sture Allén and
his research group at Språkdata (today the Department of Swedish),
Göteborg University, Sweden. The underlying linguistic model is
the lemma-lexeme model where, in short, the lemma stands for the
canonical form of a word and all its formal data, whilst the lexeme
stands for the semantic division of a lemma into one or more
senses. The GLDB has the advantage of covering the `whole' language
rather than being a testbase comprising a small subset. Two major printed
Swedish monolingual dictionaries, [Sve86] and [Nat95], have
been generated from the GLDB. Both are also available on CD-ROM.
The figures in Table 3.3 describe NEO, the subset of the
GLDB that was used for these two printed dictionaries.
Table 3.3: Numbers and figures for GLDB:NEO (* antonyms, hyponyms, hyperonyms and cohyponyms)
| | All PoS | Nouns | Verbs | Adjectives | Adverbs | Other |
| Number of Entries | 61050 | 41810 | 7641 | 9296 | 1169 | 1134 |
| Number of Senses | 67785 | 45446 | 9752 | 10184 | 1323 | 1080 |
| Senses/Entry | 1.11 | 1.09 | 1.28 | 1.10 | 1.13 | 0.95 |
| Morpho-Syntax | | | Yes | Yes | | |
| Synonyms | Yes | | | | | |
| - Number of Synonyms,...* | 34201 | 19633 | 5895 | 7280 | 821 | 572 |
| Sense Indicators | No | | | | | |
| Semantic Network | No | | | | | |
| Semantic Features | No | | | | | |
| Multilingual Relations | No | | | | | |
| Argument Structure | 19082 | 7406 | 9739 | 1929 | 1 | 7 |
| Domain Labels | Yes | | | | | |
| - Domain Types | 95 | | | | | |
| - Domain Tokens | 85971 | 58297 | 8102 | 16823 | 756 | 2173 |
| - Domains/Sense | 1.27 | 1.28 | 0.83 | 1.65 | 0.43 | 2.01 |
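The Senses/Entry row of Table 3.3 is a derived figure: the number of senses divided by the number of entries. The following minimal sketch recomputes it from the counts in the table. The names `ENTRIES`, `SENSES` and `senses_per_entry` are illustrative only and are not part of the GLDB.

```python
# Counts copied from Table 3.3 (GLDB:NEO), per part of speech.
ENTRIES = {"Nouns": 41810, "Verbs": 7641, "Adjectives": 9296,
           "Adverbs": 1169, "Other": 1134}
SENSES = {"Nouns": 45446, "Verbs": 9752, "Adjectives": 10184,
          "Adverbs": 1323, "Other": 1080}


def senses_per_entry(entries, senses):
    """Return senses divided by entries, per part of speech and overall."""
    ratios = {pos: round(senses[pos] / entries[pos], 2) for pos in entries}
    # The "All PoS" figure is the ratio of the column totals.
    ratios["All PoS"] = round(sum(senses.values()) / sum(entries.values()), 2)
    return ratios


ratios = senses_per_entry(ENTRIES, SENSES)
for pos, ratio in ratios.items():
    print(f"{pos}: {ratio:.2f}")
```

The per-category counts also add up to the All PoS totals given in the table (61050 entries and 67785 senses).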
Description
The information in the database is centered around two main
information blocks, the lemma and the lexeme. The lemma comprises
formal data: technical stem, spelling variation, part of speech,
inflection(s), pronunciation(s), stress, morpheme division, compound
boundary and element, abbreviated form, and verbal nouns (for verbs). The
lexemes are in turn divided into two main categories: a
compulsory kernel sense and an optional set of one or more
sub-senses, called the cycles. Both categories comprise the following
variables: definition, definition extension, formal (mainly grammatical)
comment, main comment, references, morphological examples and syntactical
examples. In addition, the kernels are marked with terminological
domain(s), and the cycles carry information about area of
usage and type of sub-sense.
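The lemma-lexeme organisation described above can be pictured as a small set of record types. The sketch below is one possible reading of the description, not the actual GLDB schema: all class and field names are assumptions, only a few of the lemma's formal fields are shown, and the sample entry is invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sense:
    """Variables shared by kernel senses and cycles (hypothetical names)."""
    definition: str
    definition_extension: str = ""
    formal_comment: str = ""
    main_comment: str = ""
    references: List[str] = field(default_factory=list)
    morphological_examples: List[str] = field(default_factory=list)
    syntactical_examples: List[str] = field(default_factory=list)


@dataclass
class Kernel(Sense):
    """Compulsory kernel sense, marked with terminological domain codes."""
    domains: List[str] = field(default_factory=list)


@dataclass
class Cycle(Sense):
    """Optional sub-sense with area of usage and type of sub-sense."""
    area_of_usage: str = ""
    subsense_type: str = ""


@dataclass
class Lexeme:
    """One kernel sense plus zero or more cycles."""
    kernel: Kernel
    cycles: List[Cycle] = field(default_factory=list)


@dataclass
class Lemma:
    """Formal data of a word (subset only) and its lexemes."""
    technical_stem: str
    part_of_speech: str
    inflections: List[str] = field(default_factory=list)
    lexemes: List[Lexeme] = field(default_factory=list)


# Invented sample entry; the definitions are NOT taken from the GLDB.
bok = Lemma(
    technical_stem="bok",
    part_of_speech="noun",
    lexemes=[
        Lexeme(
            kernel=Kernel(definition="(invented) bound set of printed pages",
                          domains=["bok."]),
            cycles=[Cycle(definition="(invented) main part of a longer work",
                          subsense_type="extension")],
        )
    ],
)

kernel_count = len(bok.lexemes)
cycle_count = sum(len(lexeme.cycles) for lexeme in bok.lexemes)
print(kernel_count, cycle_count)
```

Modelling the kernel as a required field and the cycles as a list that may be empty mirrors the compulsory/optional distinction in the text.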
The Aristotelian type of definition, with genus proximum and
differentia specifica focusing on the relevant semantic concepts of the
kernel sense, is the main source of semantic information. Many
of the definitions are extended with additional semantic information
of a non-kernel character, which has a separate place in the database:
the definition extension. For instance, selection restrictions on
external and/or internal arguments of verbs and adjectives are often
specified here. (The information on selection restrictions is not
strictly formalized and is therefore not included in the above table.)
Semantic relations such as hyperonymy, cohyponymy, hyponymy, synonymy and
semantic opposition are linked for a substantial number of the kernels
and cycles.
There are 95 types of terminological domains in the GLDB:NEO database that are
linked to kernel senses. Their types and frequencies are listed in
Tables 3.4-3.5.
Table 3.4: Terminological domains in GLDB:NEO (part 1)
| Freq | Code | Domain Type |
| 8418 | admin. | administration |
| 1064 | anat. | anatomy |
| 781 | arb. | work |
| 76 | arkeol. | archaeology |
| 478 | arkit. | architecture |
| 13 | astrol. | astrology |
| 307 | astron. | astronomy |
| 44 | bergsvet. | science of mining |
| 343 | biol. | biology |
| 1033 | bok. | the world of books |
| 9 | bokför. | bookkeeping |
| 1803 | bot. | botany |
| 278 | byggn.tekn. | building |
| 135 | dans. | dancing |
| 166 | databehandl | data processing, computers |
| 42 | dipl. | diplomacy |
| 1911 | ekon. | economy |
| 221 | eltekn. | electrotechnology |
| 375 | fil. | philosophy |
| 134 | film. | film, cinematography |
| 88 | flygtekn. | aviation, aeronautics |
| 195 | form. | designing |
| 188 | foto. | photography |
| 772 | fys. | physics |
| 536 | fysiol | physiology |
| 263 | färg. | colour terms |
| 818 | geogr. | geography |
| 344 | geol. | geology |
| 11 | geom. | geometry |
| 247 | handarb. | needlework, embroidery |
| 513 | handel. | commerce |
| 478 | heminr. | interior decoration |
| 36 | herald. | heraldry |
| 126 | historia. | history |
| 835 | hush. | housekeeping |
| 358 | hyg. | hygiene |
| 275 | instr. | instrument |
| 353 | jakt. | hunting |
| 846 | jordbr. | agriculture |
| 1221 | jur. | law |
| 608 | kem. | chemistry |
| 1194 | kläd. | clothes |
| 2548 | kokk. | cookery |
| 4859 | komm. | communication |
| 685 | konstvet. | art |
| 14 | lantmät. | surveying |
| 555 | litt.vet. | literature |
Table 3.5: Terminological domains in GLDB:NEO (part 2)
| Freq | Code | Domain Type |
| 238 | maskin. | mechanical engineering |
| 573 | mat. | mathematics |
| 279 | matrl. | material |
| 2635 | med. | medicine |
| 80 | metallurg. | metallurgy |
| 513 | meteorol. | meteorology |
| 1739 | mil. | military |
| 160 | mineral. | mineralogy |
| 387 | m-med. | mass media |
| 1160 | mus. | music |
| 241 | mått | measure |
| 130 | numism. | numismatics |
| 183 | optik. | optics |
| 998 | pedag. | pedagogy |
| 734 | pol. | politics |
| 5654 | psykol. | psychology |
| 248 | radiotekn. | radio engineering |
| 1702 | relig. | religion |
| 1292 | rum. | room, space |
| 316 | sag. | the world of fairy-tale |
| 3968 | samh. | society, community |
| 706 | scen. | dramatic art |
| 290 | serv. | service |
| 1592 | sjö. | navigation, shipping |
| 248 | skogsbr. | forestry |
| 511 | släkt. | kinship, family |
| 498 | sociol. | sociology |
| 742 | spel. | game, play |
| 1765 | sport. | sport |
| 1281 | språkvet. | linguistics |
| 77 | statist. | statistics |
| 1741 | tekn. | technology |
| 5 | teol. | theology |
| 258 | textil. | textiles |
| 2656 | tid. | time |
| 1431 | trafik. | traffic |
| 18 | tryck.tekn. | printing technique |
| 95 | trädg. | gardening |
| 608 | utstr. | extension |
| 456 | verkt. | tools |
| 475 | vetenskapl. | science |
| 41 | veter. | veterinary |
| 5745 | yrk. | profession, people |
| 2417 | zool. | zoology |
| 445 | ämne. | matter, substance |
| 147 | allm. kult. | culture |
| 182 | allm. värd. | valuation |
| 1562 | land. | countries, ethnic groups |
Comparison with Other Lexical Databases
The GLDB is a sense-oriented, full-scale lexical database. Its explicit information on semantic relations such as hyperonymy, synonymy, cohyponymy and semantic opposition is comparable to that of WordNet 1.5 but, in addition,
it contains traditional dictionary information such as definitions and formal data.
Relation to Notions of Lexical Semantics
The GLDB is a valuable resource for building ontologies and semantic networks based on the
lexical system of the Swedish language. The rich semantic content of the GLDB
is instrumental for such tasks, e.g. the explicit information on
terminological domains and on semantic relations such as hyperonymy, synonymy,
cohyponymy and semantic opposition, as well as the retrievable information on
the genus proximum and selection restrictions in definitions and their
extensions.
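As a rough illustration of how explicit hyperonymy links could feed a semantic network, the sketch below builds a small graph from (hyponym, hyperonym) pairs and derives cohyponyms as senses that share a hyperonym. The word pairs are invented toy data, not drawn from the GLDB, and the function names are hypothetical; the GLDB stores cohyponymy explicitly, so deriving it this way is only an assumption made for the example.

```python
from collections import defaultdict

# Invented toy links (hyponym, hyperonym); NOT taken from the GLDB.
HYPERONYM_LINKS = [("hund", "djur"), ("katt", "djur"), ("tax", "hund")]


def build_network(links):
    """Index the links in both directions."""
    hyperonyms = defaultdict(set)  # word -> its direct hyperonyms
    hyponyms = defaultdict(set)    # word -> its direct hyponyms
    for hypo, hyper in links:
        hyperonyms[hypo].add(hyper)
        hyponyms[hyper].add(hypo)
    return hyperonyms, hyponyms


def cohyponyms(word, hyperonyms, hyponyms):
    """Words sharing at least one direct hyperonym with `word`."""
    result = set()
    for hyper in hyperonyms.get(word, ()):
        result |= hyponyms[hyper]
    result.discard(word)
    return result


def ancestors(word, hyperonyms):
    """All hyperonyms reachable from `word` by following links upwards."""
    seen = set()
    stack = [word]
    while stack:
        for hyper in hyperonyms.get(stack.pop(), ()):
            if hyper not in seen:
                seen.add(hyper)
                stack.append(hyper)
    return seen


hyperonyms, hyponyms = build_network(HYPERONYM_LINKS)
print(sorted(cohyponyms("hund", hyperonyms, hyponyms)))
print(sorted(ancestors("tax", hyperonyms)))
```

In a fuller version the nodes would be kernel senses or cycles rather than bare word forms, since the relations in the database are linked between senses.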
EAGLES Central Secretariat eagles@ilc.cnr.it