GLDB - The Göteborg Lexical DataBase

Work on the GLDB was started in 1977 by Professor Sture Allén and his research group at Språkdata (today the Department of Swedish), Göteborg University, Sweden. The underlying linguistic model is the lemma-lexeme model where, in short, the lemma stands for the canonical form of a word and all its formal data, whilst the lexeme stands for the semantic division of a lemma into one or more senses. The GLDB has the advantage of covering the 'whole' language rather than being a test base comprising a small subset. Two major printed Swedish monolingual dictionaries, [Sve86] and [Nat95], have been generated from the GLDB. Both are also available on CD-ROM.

The numbers and figures in Table 3.3 describe a subset of the GLDB, named NEO, that has been utilized in the printed Swedish monolingual dictionaries, [Sve86] and [Nat95].

Table 3.3: Numbers and figures for GLDB:NEO; (*antonyms, hyponyms, hyperonyms and cohyponyms)

	All PoS	Nouns	Verbs	Adjectives	Adverbs	Other
Number of Entries	61050	41810	7641	9296	1169	1134
Number of Senses	67785	45446	9752	10184	1323	1080
Senses/Entry	1.11	1.09	1.28	1.10	1.13	0.95
Morpho-Syntax			Yes	Yes
Synonyms	Yes
- Number of Synonyms,...*	34201	19633	5895	7280	821	572
Sense Indicators	No
Semantic Network	No
Semantic Features	No
Multilingual Relations	No
Argument Structure	19082	7406	9739	1929	1	7
Domain Labels	Yes
- Domain Types	95
- Domain Tokens	85971	58297	8102	16823	756	2173
- Domains/Sense	1.27	1.28	0.83	1.65	0.43	2.01
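
The Senses/Entry row is the quotient of the two rows above it. As a minimal sketch, the following Python snippet recomputes that row from the counts in Table 3.3; the dictionary and variable names are illustrative assumptions and do not come from the GLDB itself.

```python
# Counts copied from Table 3.3 (GLDB:NEO). The names used here are
# illustrative only and are not part of the database.
entries = {"All PoS": 61050, "Nouns": 41810, "Verbs": 7641,
           "Adjectives": 9296, "Adverbs": 1169, "Other": 1134}
senses = {"All PoS": 67785, "Nouns": 45446, "Verbs": 9752,
          "Adjectives": 10184, "Adverbs": 1323, "Other": 1080}

# Senses/Entry = number of senses divided by number of entries,
# rounded to two decimals as in the table.
senses_per_entry = {pos: round(senses[pos] / entries[pos], 2) for pos in entries}

for pos, ratio in senses_per_entry.items():
    print(f"{pos}: {ratio:.2f}")
```

A value below 1 in the Other column reflects that the table lists fewer senses (1080) than entries (1134) for that group.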

Description

The information in the database is organized around two main information blocks, the lemma and the lexeme. The lemma comprises formal data: technical stem, spelling variation, part of speech, inflection(s), pronunciation(s), stress, morpheme division, compound boundary and element, abbreviated form and, for verbs, verbal nouns. The lexemes are in turn divided into two main categories: a compulsory kernel sense and an optional set of one or more sub-senses, called cycles. Both categories comprise the following variables: definition, definition extension, formal (mainly grammatical) comment, main comment, references, morphological examples and syntactic examples. In addition, the kernels are marked with terminological domain(s), and the cycles carry information about area of usage and type of sub-sense.
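
The structure described above can be pictured as nested records. The following Python sketch is only one possible reading of the description: the class and field names are assumptions chosen for illustration, the lemma carries just a subset of the formal data listed, and the sample entry is invented rather than taken from the GLDB.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sense:
    """Variables shared by kernel senses and cycles (names are hypothetical)."""
    definition: str
    definition_extension: Optional[str] = None
    formal_comment: Optional[str] = None
    main_comment: Optional[str] = None
    references: List[str] = field(default_factory=list)
    morphological_examples: List[str] = field(default_factory=list)
    syntactic_examples: List[str] = field(default_factory=list)


@dataclass
class Kernel(Sense):
    """Compulsory kernel sense, marked with terminological domain code(s)."""
    domains: List[str] = field(default_factory=list)


@dataclass
class Cycle(Sense):
    """Optional sub-sense with area of usage and type of sub-sense."""
    area_of_usage: Optional[str] = None
    subsense_type: Optional[str] = None


@dataclass
class Lexeme:
    """One sense division of a lemma: a kernel plus zero or more cycles."""
    kernel: Kernel
    cycles: List[Cycle] = field(default_factory=list)


@dataclass
class Lemma:
    """Formal data of a word (only a subset of the listed fields is modelled)."""
    technical_stem: str
    part_of_speech: str
    spelling_variants: List[str] = field(default_factory=list)
    inflections: List[str] = field(default_factory=list)
    pronunciations: List[str] = field(default_factory=list)
    lexemes: List[Lexeme] = field(default_factory=list)


# Invented sample entry, for illustration only.
hund = Lemma(
    technical_stem="hund",
    part_of_speech="noun",
    lexemes=[
        Lexeme(
            kernel=Kernel(definition="a domesticated mammal", domains=["zool."]),
            cycles=[Cycle(definition="used about a person", subsense_type="extended")],
        )
    ],
)
print(len(hund.lexemes), hund.lexemes[0].kernel.domains)
```

Making the kernel a required field and the cycles a possibly empty list mirrors the compulsory and optional status of the two categories.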

The Aristotelian type of definition, with genus proximum and differentia specifica, focuses on the relevant semantic concepts of the kernel sense and is the main source of semantic information. Many of the definitions are extended with additional semantic information of a non-kernel character, which has a separate place in the database, the definition extension. For instance, selection restrictions on external and/or internal arguments of verbs and adjectives are often specified here. (The information on selection restrictions is not strictly formalized and is therefore not included in the above table.)

Semantic relations such as hyperonymy, cohyponymy, hyponymy, synonymy and semantic opposition are linked for a substantial number of the kernels and cycles.
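
Links of this kind can be thought of as typed edges between senses. The sketch below is a hypothetical illustration, not the GLDB's actual storage format: the `RelationStore` class, the `hyperonym_chain` helper and the sense identifiers are all invented for the example.

```python
from collections import defaultdict

# Relation types named in the text; everything else here is hypothetical.
RELATION_TYPES = {"hyperonymy", "hyponymy", "cohyponymy", "synonymy", "opposition"}


class RelationStore:
    """Stores typed links between sense identifiers."""

    def __init__(self):
        # Maps (source sense, relation type) to the set of target senses.
        self._links = defaultdict(set)

    def link(self, source, relation, target):
        """Record that `source` has `target` as its `relation`."""
        if relation not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {relation}")
        self._links[(source, relation)].add(target)

    def targets(self, source, relation):
        """Return the linked senses in sorted order (empty list if none)."""
        return sorted(self._links.get((source, relation), set()))


def hyperonym_chain(store, sense):
    """Follow the first hyperonym link upwards until none is left."""
    chain, seen, current = [], {sense}, sense
    while True:
        parents = store.targets(current, "hyperonymy")
        if not parents or parents[0] in seen:  # stop at the top or on a cycle
            return chain
        current = parents[0]
        seen.add(current)
        chain.append(current)


# Invented sense identifiers, for illustration only.
store = RelationStore()
store.link("hund/1", "hyperonymy", "däggdjur/1")
store.link("däggdjur/1", "hyperonymy", "djur/1")
store.link("hund/1", "cohyponymy", "katt/1")
print(hyperonym_chain(store, "hund/1"))
```

Here a link `("hund/1", "hyperonymy", "däggdjur/1")` is read as "the hyperonym of hund/1 is däggdjur/1"; a real system would have to fix such a direction convention explicitly.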

There are 95 types of terminological domains in the GLDB:NEO database that are linked to kernel senses. Their types and frequency are listed in Tables 3.4-3.5.

Table 3.4: Terminological domains in GLDB:NEO (part 1)

Freq	Code	Domain Type
8418	admin.	administration
1064	anat.	anatomy
781	arb.	work
76	arkeol.	archaeology
478	arkit.	architecture
13	astrol.	astrology
307	astron.	astronomy
44	bergsvet.	science of mining
343	biol.	biology
1033	bok.	the world of books
9	bokför.	bookkeeping
1803	bot.	botany
278	byggn.tekn.	building
135	dans.	dancing
166	databehandl	data processing, computers
42	dipl.	diplomacy
1911	ekon.	economy
221	eltekn.	electrotechnology
375	fil.	philosophy
134	film.	film, cinematography
88	flygtekn.	aviation, aeronautics
195	form.	designing
188	foto.	photography
772	fys.	physics
536	fysiol	physiology
263	färg.	colour terms
818	geogr.	geography
344	geol.	geology
11	geom.	geometry
247	handarb.	needlework, embroidery
513	handel.	commerce
478	heminr.	interior decoration
36	herald.	heraldry
126	historia.	history
835	hush.	housekeeping
358	hyg.	hygiene
275	instr.	instrument
353	jakt.	hunting
846	jordbr.	agriculture
1221	jur.	law
608	kem.	chemistry
1194	kläd.	clothes
2548	kokk.	cookery
4859	komm.	communication
685	konstvet.	art
14	lantmät.	surveying
555	litt.vet.	literature

Table 3.5: Terminological domains in GLDB:NEO (part 2)

Freq	Code	Domain Type
238	maskin.	mechanical engineering
573	mat.	mathematics
279	matrl.	material
2635	med.	medicine
80	metallurg.	metallurgy
513	meteorol.	meteorology
1739	mil.	military
160	mineral.	mineralogy
387	m-med.	mass media
1160	mus.	music
241	mått	measure
130	numism.	numismatics
183	optik.	optics
998	pedag.	pedagogy
734	pol.	politics
5654	psykol.	psychology
248	radiotekn.	radio engineering
1702	relig.	religion
1292	rum.	room, space
316	sag.	the world of fairy-tale
3968	samh.	society, community
706	scen.	dramatic art
290	serv.	service
1592	sjö.	navigation, shipping
248	skogsbr.	forestry
511	släkt.	kinship, family
498	sociol.	sociology
742	spel.	game, play
1765	sport.	sport
1281	språkvet.	linguistics
77	statist.	statistics
1741	tekn.	technology
5	teol.	theology
258	textil.	textiles
2656	tid.	time
1431	trafik.	traffic
18	tryck.tekn.	printing technique
95	trädg.	gardening
608	utstr.	extension
456	verkt.	tools
475	vetenskapl.	science
41	veter.	veterinary
5745	yrk.	profession, people
2417	zool.	zoology
445	ämne.	matter, substance
147	allm. kult.	culture
182	allm. värd.	valuation
1562	land.	countries, ethnic groups

Comparison with Other Lexical Databases

The GLDB is a sense-oriented, full-scale lexical database. Its explicit information on semantic relations such as hyperonymy, synonymy, cohyponymy and semantic opposition is comparable to WordNet 1.5, but in addition it contains traditional lexicographic information.

Relation to Notions of Lexical Semantics

The GLDB is a valuable resource for building ontologies and semantic networks based on the lexical system of the Swedish language. The rich semantic content of the GLDB is instrumental for such tasks, e.g. the explicit information on terminological domains and on semantic relations such as hyperonymy, synonymy, cohyponymy and semantic opposition, as well as the retrievable information on the genus proximum and selection restrictions in definitions and their extensions.
