

   
GLDB - The Göteborg Lexical DataBase

  
Introduction

Work on the GLDB was started in 1977 by Professor Sture Allén and his research group at Språkdata (today the Department of Swedish), Göteborg University, Sweden. The underlying linguistic model is the lemma-lexeme model: in short, the lemma stands for the canonical form of a word together with all its formal data, whilst the lexeme stands for the semantic division of a lemma into one or more senses. The GLDB has the advantage of covering the 'whole' language rather than being a testbase comprising only a small subset. Two major printed Swedish monolingual dictionaries, [Sve86] and [Nat95], have been generated from the GLDB. Both are also available on CD-ROM.

The numbers and figures in Table 3.3 describe a subset of the GLDB, named NEO, that has been utilized in the printed Swedish monolingual dictionaries, [Sve86] and [Nat95].


 
Table 3.3: Numbers and figures for GLDB:NEO (* antonyms, hyponyms, hyperonyms and cohyponyms)

                           All PoS   Nouns   Verbs  Adjectives  Adverbs  Other
Number of Entries            61050   41810    7641        9296     1169   1134
Number of Senses             67785   45446    9752       10184     1323   1080
Senses/Entry                  1.11    1.09    1.28        1.10     1.13   0.95
Morpho-Syntax                                  Yes         Yes
Synonyms                       Yes
- Number of Synonyms,...*    34201   19633    5895        7280      821    572
Sense Indicators                No
Semantic Network                No
Semantic Features               No
Multilingual Relations          No
Argument Structure           19082    7406    9739        1929        1      7
Domain Labels                  Yes
- Domain Types                  95
- Domain Tokens              85971   58297    8102       16823      756   2173
- Domains/Sense               1.27    1.28    0.83        1.65     0.43   2.01
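
The Senses/Entry row of Table 3.3 follows from the two rows above it by simple division. The short Python sketch below reproduces that arithmetic from the published counts; the function and variable names are illustrative only and are not part of the GLDB.

```python
# Entry and sense counts copied from Table 3.3 (GLDB:NEO).
entries = {
    "All PoS": 61050, "Nouns": 41810, "Verbs": 7641,
    "Adjectives": 9296, "Adverbs": 1169, "Other": 1134,
}
senses = {
    "All PoS": 67785, "Nouns": 45446, "Verbs": 9752,
    "Adjectives": 10184, "Adverbs": 1323, "Other": 1080,
}


def senses_per_entry(senses, entries):
    """Return the average number of senses per entry, rounded to two decimals."""
    return {pos: round(senses[pos] / entries[pos], 2) for pos in entries}


ratios = senses_per_entry(senses, entries)
for pos, ratio in ratios.items():
    print(f"{pos:<12}{ratio:.2f}")
```

The rounded results agree with the Senses/Entry row of the table for every part of speech.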
 

   
Description

The information in the database is centered around two main information blocks, the lemma and the lexeme. The lemma comprises formal data: technical stem, spelling variation, part of speech, inflection(s), pronunciation(s), stress, morpheme division, compound boundary and element, abbreviated form, and verbal nouns (for verbs). The lexemes are in turn divided into two main categories: a compulsory kernel sense and an optional set of one or more sub-senses, called the cycles. Both categories comprise the following variables: definition, definition extension, formal (mainly grammatical) comment, main comment, references, morphological examples and syntactical examples. The kernels are in addition marked with terminological domain(s), and the cycles carry additional information about area of usage and type of sub-sense.
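
The lemma and lexeme blocks described above can be pictured as nested records. The following Python sketch is only an illustration of that organization: the class names, field names and the sample entry are assumptions made for this example and do not reflect the actual GLDB schema or its contents. Only a few of the formal lemma fields are shown.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Sense:
    """Variables shared by kernel senses and cycles (hypothetical field names)."""
    definition: str
    definition_extension: str = ""
    formal_comment: str = ""
    main_comment: str = ""
    references: List[str] = field(default_factory=list)
    morphological_examples: List[str] = field(default_factory=list)
    syntactical_examples: List[str] = field(default_factory=list)


@dataclass
class Kernel(Sense):
    """Compulsory kernel sense, additionally marked with terminological domains."""
    domains: List[str] = field(default_factory=list)


@dataclass
class Cycle(Sense):
    """Optional sub-sense with area of usage and type of sub-sense."""
    area_of_usage: str = ""
    subsense_type: str = ""


@dataclass
class Lexeme:
    kernel: Kernel
    cycles: List[Cycle] = field(default_factory=list)


@dataclass
class Lemma:
    """Formal data of a word (subset of the fields listed in the text)."""
    technical_stem: str
    part_of_speech: str
    inflections: List[str] = field(default_factory=list)
    lexemes: List[Lexeme] = field(default_factory=list)


def all_senses(lemma: Lemma) -> Iterator[Sense]:
    """Yield every kernel sense followed by its cycles."""
    for lexeme in lemma.lexemes:
        yield lexeme.kernel
        yield from lexeme.cycles


# Invented sample entry; the definitions are placeholders, not GLDB text.
bok = Lemma(
    technical_stem="bok",
    part_of_speech="noun",
    inflections=["boken", "böcker"],
    lexemes=[
        Lexeme(
            kernel=Kernel(definition="(placeholder kernel definition)", domains=["bok."]),
            cycles=[Cycle(definition="(placeholder sub-sense)", subsense_type="extended")],
        )
    ],
)
print(len(list(all_senses(bok))))
```

Keeping the shared variables in one base class mirrors the statement that kernels and cycles carry the same set of variables and differ only in their extra markings.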

The Aristotelian type of definition, with genus proximum and differentia specifica focusing on the relevant semantic concepts of the kernel sense, is the main source of semantic information. Many of the definitions are extended with additional semantic information of a non-kernel character, which has a separate place in the database, the definition extension. For instance, selection restrictions on the external and/or internal arguments of verbs and adjectives are often specified here. (The information on selection restrictions is not strictly formalized and is therefore not included in the table above.)

Semantic relations such as hyperonymy, cohyponymy, hyponymy, synonymy and semantic opposition are explicitly linked for a substantial number of the kernels and cycles.
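
As a rough illustration of how such links relate to one another, the Python sketch below stores hyperonym links and symmetric links (synonymy, opposition) between sense identifiers and derives hyponyms and cohyponyms from them. It is a sketch under stated assumptions, not the GLDB implementation: the class, its methods, the restriction to a single hyperonym per sense, and the Swedish sample words are all introduced here for the example and are not drawn from the database.

```python
from collections import defaultdict


class SenseRelations:
    """Toy store for semantic relations between sense identifiers."""

    def __init__(self):
        # Assumption: each sense has at most one hyperonym.
        self.hyperonym_of = {}
        # kind -> sense -> set of related senses (stored in both directions)
        self.symmetric = defaultdict(lambda: defaultdict(set))

    def add_hyperonym(self, sense, hyperonym):
        self.hyperonym_of[sense] = hyperonym

    def add_symmetric(self, kind, a, b):
        """Record a symmetric relation such as 'synonymy' or 'opposition'."""
        self.symmetric[kind][a].add(b)
        self.symmetric[kind][b].add(a)

    def hyponyms(self, sense):
        """All senses whose hyperonym is the given sense."""
        return {s for s, h in self.hyperonym_of.items() if h == sense}

    def cohyponyms(self, sense):
        """Senses sharing the same hyperonym, excluding the sense itself."""
        parent = self.hyperonym_of.get(sense)
        if parent is None:
            return set()
        return self.hyponyms(parent) - {sense}


# Invented sample data: hund 'dog', katt 'cat', djur 'animal', varm 'warm', kall 'cold'.
rel = SenseRelations()
rel.add_hyperonym("hund", "djur")
rel.add_hyperonym("katt", "djur")
rel.add_symmetric("opposition", "varm", "kall")

print(sorted(rel.hyponyms("djur")))
print(sorted(rel.cohyponyms("hund")))
```

In this sketch cohyponymy is computed from the hyperonym links rather than stored, whereas the text indicates that the GLDB links such relations explicitly.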

There are 95 types of terminological domains in the GLDB:NEO database, all linked to kernel senses. The domain types and their frequencies are listed in Tables 3.4-3.5.

 
Table 3.4: Terminological domains in GLDB:NEO (part 1)

Freq  Code         Domain Type
8418  admin.       administration
1064  anat.        anatomy
 781  arb.         work
  76  arkeol.      archaeology
 478  arkit.       architecture
  13  astrol.      astrology
 307  astron.      astronomy
  44  bergsvet.    science of mining
 343  biol.        biology
1033  bok.         the world of books
   9  bokför.      bookkeeping
1803  bot.         botany
 278  byggn.tekn.  building
 135  dans.        dancing
 166  databehandl  data processing, computers
  42  dipl.        diplomacy
1911  ekon.        economy
 221  eltekn.      electrotechnology
 375  fil.         philosophy
 134  film.        film, cinematography
  88  flygtekn.    aviation, aeronautics
 195  form.        designing
 188  foto.        photography
 772  fys.         physics
 536  fysiol       physiology
 263  färg.        colour terms
 818  geogr.       geography
 344  geol.        geology
  11  geom.        geometry
 247  handarb.     needlework, embroidery
 513  handel.      commerce
 478  heminr.      interior decoration
  36  herald.      heraldry
 126  historia.    history
 835  hush.        housekeeping
 358  hyg.         hygiene
 275  instr.       instrument
 353  jakt.        hunting
 846  jordbr.      agriculture
1221  jur.         law
 608  kem.         chemistry
1194  kläd.        clothes
2548  kokk.        cookery
4859  komm.        communication
 685  konstvet.    art
  14  lantmät.     surveying
 555  litt.vet.    literature
 
 


 
Table 3.5: Terminological domains in GLDB:NEO (part 2)

Freq  Code         Domain Type
 238  maskin.      mechanical engineering
 573  mat.         mathematics
 279  matrl.       material
2635  med.         medicine
  80  metallurg.   metallurgy
 513  meteorol.    meteorology
1739  mil.         military
 160  mineral.     mineralogy
 387  m-med.       mass media
1160  mus.         music
 241  mått         measure
 130  numism.      numismatics
 183  optik.       optics
 998  pedag.       pedagogy
 734  pol.         politics
5654  psykol.      psychology
 248  radiotekn.   radio engineering
1702  relig.       religion
1292  rum.         room, space
 316  sag.         the world of fairy tales
3968  samh.        society, community
 706  scen.        dramatic art
 290  serv.        service
1592  sjö.         navigation, shipping
 248  skogsbr.     forestry
 511  släkt.       kinship, family
 498  sociol.      sociology
 742  spel.        game, play
1765  sport.       sport
1281  språkvet.    linguistics
  77  statist.     statistics
1741  tekn.        technology
   5  teol.        theology
 258  textil.      textiles
2656  tid.         time
1431  trafik.      traffic
  18  tryck.tekn.  printing technique
  95  trädg.       gardening
 608  utstr.       extension
 456  verkt.       tools
 475  vetenskapl.  science
  41  veter.       veterinary
5745  yrk.         profession, people
2417  zool.        zoology
 445  ämne.        matter, substance
 147  allm. kult.  culture
 182  allm. värd.  valuation
1562  land.        countries, ethnic groups
 

  
Comparison with Other Lexical Databases

The GLDB is a sense-oriented, full-scale lexical database. Its explicit information on semantic relations such as hyperonymy, synonymy, cohyponymy and semantic opposition is comparable to that in WordNet 1.5; in addition, it contains traditional dictionary information such as definitions, inflection and pronunciation.

  
Relation to Notions of Lexical Semantics

The GLDB is a valuable resource for building ontologies and semantic networks based on the lexical system of the Swedish language. Its rich semantic content is instrumental for such tasks: it offers explicit information on terminological domains and on semantic relations such as hyperonymy, synonymy, cohyponymy and semantic opposition, as well as retrievable information on the genus proximum and selection restrictions in definitions and their extensions.



EAGLES Central Secretariat eagles@ilc.cnr.it