Work on the GLDB was started in 1977 by Professor Sture Allén and his research group at Språkdata (today the Department of Swedish), Göteborg University, Sweden. The underlying linguistic model is the lemma-lexeme model in which, in short, the lemma stands for the canonical form of a word together with all its formal data, whilst the lexeme stands for the semantic division of a lemma into one or more senses. The GLDB has the advantage of covering the 'whole' language rather than being a test base comprising only a small subset. Two major printed Swedish monolingual dictionaries, [Sve86] and [Nat95], have been generated from the GLDB. Both are also available on CD-ROM.
The numbers and figures in Table 3.3 describe a subset of the GLDB, named NEO, that has been utilized in the printed Swedish monolingual dictionaries, [Sve86] and [Nat95].
The information in the database is centered around two main information blocks, the lemma and the lexeme. The lemma comprises formal data: technical stem, spelling variation, part of speech, inflection(s), pronunciation(s), stress, morpheme division, compound boundary and element, abbreviated form and, for verbs, verbal nouns. The lexemes are in turn divided into two main categories: a compulsory kernel sense and a non-compulsory set of one or more sub-senses, called the cycles. Both categories comprise the following variables: definition, definition extension, formal (mainly grammatical) comment, main comment, references, morphological examples and syntactical examples. The kernels are in addition marked with terminological domain(s), and the cycles carry additional information about area of usage and type of sub-sense.
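The lemma-lexeme organization described above can be pictured as a small set of record types. The following is only a minimal sketch in Python, not the actual GLDB schema: all class and field names are hypothetical, only a few of the lemma variables are included, and the sample entry is an invented placeholder rather than data taken from the GLDB.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sense:
    """Variables shared by kernels and cycles (hypothetical field names)."""
    definition: str
    definition_extension: Optional[str] = None
    formal_comment: Optional[str] = None
    main_comment: Optional[str] = None
    references: List[str] = field(default_factory=list)
    morphological_examples: List[str] = field(default_factory=list)
    syntactical_examples: List[str] = field(default_factory=list)


@dataclass
class Kernel(Sense):
    """Compulsory kernel sense, additionally marked with terminological domains."""
    domains: List[str] = field(default_factory=list)


@dataclass
class Cycle(Sense):
    """Optional sub-sense with area of usage and type of sub-sense."""
    area_of_usage: Optional[str] = None
    subsense_type: Optional[str] = None


@dataclass
class Lexeme:
    """One sense division of a lemma: exactly one kernel, zero or more cycles."""
    kernel: Kernel
    cycles: List[Cycle] = field(default_factory=list)


@dataclass
class Lemma:
    """Canonical form plus a selection of its formal data."""
    technical_stem: str
    part_of_speech: str
    inflections: List[str] = field(default_factory=list)
    pronunciations: List[str] = field(default_factory=list)
    lexemes: List[Lexeme] = field(default_factory=list)


# Invented placeholder entry, for illustration only (not GLDB content).
entry = Lemma(
    technical_stem="example-stem",
    part_of_speech="noun",
    lexemes=[
        Lexeme(
            kernel=Kernel(
                definition="placeholder kernel definition",
                domains=["placeholder-domain"],
            ),
            cycles=[
                Cycle(
                    definition="placeholder sub-sense definition",
                    subsense_type="placeholder-type",
                )
            ],
        )
    ],
)
```

Modelling the kernel as a required field and the cycles as a possibly empty list mirrors the distinction between the compulsory kernel sense and the non-compulsory sub-senses.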
The Aristotelian type of definition, with genus proximum and differentia specifica focusing on the relevant semantic concepts of the kernel sense, is the main source of semantic information. Many of the definitions are extended with additional semantic information of a non-kernel character, which has a separate place in the database, the definition extension. For instance, selection restrictions on external and/or internal arguments of verbs and adjectives are often specified here. (The information on selection restrictions is not strictly formalized and is therefore not included in the above table.)
Semantic relations such as hyperonymy, cohyponymy, hyponymy, synonymy and semantic opposition are linked for a substantial number of the kernels and cycles.
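To illustrate how such links between senses might be stored and queried, the sketch below keeps typed relations between sense identifiers. It is an assumption-laden illustration, not a description of the GLDB implementation: the `RelationStore` class, the relation labels and the sense identifiers are all hypothetical, and recording the inverse link automatically is a design choice of this sketch.

```python
from collections import defaultdict

# Inverse of each relation type; symmetric relations map to themselves.
INVERSE = {
    "hyperonym": "hyponym",
    "hyponym": "hyperonym",
    "cohyponym": "cohyponym",
    "synonym": "synonym",
    "opposite": "opposite",
}


class RelationStore:
    """Typed links between sense identifiers (hypothetical structure)."""

    def __init__(self):
        self._links = defaultdict(set)

    def link(self, source, relation, target):
        """Record that `target` is the `relation` of `source`, plus the inverse."""
        self._links[(source, relation)].add(target)
        self._links[(target, INVERSE[relation])].add(source)

    def related(self, sense_id, relation):
        """Return all senses standing in `relation` to `sense_id`, sorted."""
        return sorted(self._links.get((sense_id, relation), set()))


# Invented sense identifiers, for illustration only.
store = RelationStore()
store.link("sense:horse", "hyperonym", "sense:animal")
store.link("sense:cow", "hyperonym", "sense:animal")
store.link("sense:horse", "cohyponym", "sense:cow")

print(store.related("sense:animal", "hyponym"))
print(store.related("sense:cow", "cohyponym"))
```

Because each link is stored in both directions, a query from a hyperonym to its hyponyms needs no separate traversal.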
There are 95 types of terminological domains in the GLDB:NEO database that are linked to kernel senses. Their types and frequencies are listed in Tables 3.4-3.5.
The GLDB is a sense-oriented, full-scale lexical database. Its explicit information on semantic relations such as hyperonymy, synonymy, cohyponymy and semantic opposition is comparable to WordNet 1.5, but in addition the GLDB carries traditional dictionary information.
The GLDB is a valuable resource for building ontologies and semantic networks based on the lexical system of the Swedish language. Its rich semantic content is instrumental for such tasks: it offers explicit information on terminological domains and on semantic relations such as hyperonymy, synonymy, cohyponymy and semantic opposition, as well as retrievable information on the genus proximum and selection restrictions in definitions and their extensions.