The general approach of the GENELEX project is to design models
for computational dictionaries of each of the languages it deals with
separately (although in close cooperation). Once a sufficient level
of specification has been reached for these languages (i.e. for the
time being French, Portuguese, Italian, Spanish and, to a lesser
extent, English), GENELEX will seek general convergence across them
and define the multilingual dimension more precisely.
The central issue in the development of GENELEX (GENEric
LEXicon) is the generality of its format, which covers the following
aspects:
- maximum coverage:
to take into account, for a given data entry, the maximum amount
of non-redundant linguistic information. Information that can be
deduced systematically from other linguistic information will not be
recorded. In addition, the information to be recorded must not be
limited by the needs of specific applications.
- maximum portability:
to be able to support various types of implementations, the model
must be a conceptual rather than a physical data model. In this
respect, GENELEX pursues the same objectives as the Text Encoding
Initiative. The GENELEX conceptual data model was therefore designed
independently of any physical model, allowing GENELEX partners to
choose different physical models when implementing GENELEX
dictionaries.
- minimum discrimination:
the model must remain as independent as possible of any particular
linguistic theory. For this reason, its theoretical linguistic
commitments have been kept to a minimum so as to accommodate the
needs of all users, who may then use formal rules of translation
and/or deduction to find the information of interest in a GENELEX
dictionary.
- linguistic realism:
GENELEX partners tried as far as possible to avoid imposing
constraints on the degree of completeness an electronic lexicon must
reach to be declared conformant with the specification, being aware
that there is still a long way to go before complete and generic
lexica are available. The objective of GENELEX is thus twofold:
- allowing convergence of various existing lexical assets into a
common conceptual structure
- enhancing harmonious developments towards generic and
complete lexica by expressing methodological guidelines
The emphasis on methodological guidelines leads GENELEX to pay close
attention to making the meta-linguistic features of the model
explicit. There are therefore two main types of components in this
model:
- a level for linguistic description, which lists pieces of
information ('features') to be attached to lexical entities and defines
relations between such entities
- a level for explicit and analytic description of the features
themselves, which can be inter-related, structured, organized into
hierarchies, refined by means of more atomic values, etc.
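The two levels can be illustrated with a minimal Python sketch. The class names (`FeatureDef`, `LexicalEntry`), the `ancestors` helper and the feature inventory are hypothetical and do not reproduce the actual GENELEX model; they only show how entries at the linguistic level carry features and relations, while the features themselves are described separately at the meta level, with admissible atomic values and a position in a hierarchy.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class FeatureDef:
    """Meta level (hypothetical): explicit description of one feature."""
    name: str
    values: frozenset = frozenset()   # admissible atomic values (empty for grouping nodes)
    parent: Optional[str] = None      # more general feature in the hierarchy, if any


@dataclass
class LexicalEntry:
    """Linguistic level (hypothetical): features attached to an entry, plus relations."""
    lemma: str
    features: dict = field(default_factory=dict)   # feature name -> atomic value
    relations: list = field(default_factory=list)  # (relation name, target lemma) pairs


def ancestors(feature: str, defs: dict) -> list:
    """Return the chain of more general features above `feature`."""
    chain = []
    parent = defs[feature].parent
    while parent is not None:
        chain.append(parent)
        parent = defs[parent].parent
    return chain


# Hypothetical meta-level inventory: "gender" and "number" are grouped under "inflection".
FEATURES = {
    "inflection": FeatureDef("inflection"),
    "gender": FeatureDef("gender", frozenset({"masc", "fem"}), parent="inflection"),
    "number": FeatureDef("number", frozenset({"sg", "pl"}), parent="inflection"),
}

# Hypothetical linguistic-level entry with one feature and one relation.
maisonnette = LexicalEntry("maisonnette", {"gender": "fem"}, [("diminutive_of", "maison")])
```

Here `ancestors("gender", FEATURES)` returns `["inflection"]`, and the value `"fem"` can be checked against `FEATURES["gender"].values`. Keeping the feature definitions in their own structure is what allows them to be related and refined without touching the entries.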
Nevertheless, GENELEX offers a means of checking the conformity of
a given set of lexical data to the model it recommends. The
conceptual model is expressed:
- through the entity/relationship formalism, despite its limits, in
order to facilitate communication between partners and to allow a
synthetic graphical representation of the model;
- through an SGML Document Type Definition (DTD) that may serve as a
formal specification of the model and as a format for the interchange
of data, which may be checked by an SGML parser.
Large sets of lexical data produced by GENELEX partners have
already gone through such checks for both morphological and
syntactic information.
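Python's standard library has no SGML parser, so the sketch below does not validate anything against a DTD. It is a hypothetical stand-in that only illustrates the kind of conformity check involved: a declared model (`MODEL`, invented for this example) lists the features each category may carry and their admissible values, and `check_entry` reports violations. Missing features are deliberately not reported, in keeping with the choice not to impose completeness constraints on a conformant lexicon.

```python
# Hypothetical declared model: category -> feature -> admissible values.
# It stands in for the constraints a DTD would express; it is not GENELEX's model.
MODEL = {
    "noun": {"gender": {"masc", "fem"}, "number": {"sg", "pl"}},
    "verb": {"auxiliary": {"avoir", "être"}, "transitivity": {"trans", "intrans"}},
}


def check_entry(entry: dict, model: dict) -> list:
    """Return conformity errors for one entry; absent features are not errors."""
    lemma = entry.get("lemma")
    category = entry.get("category")
    if category not in model:
        return [f"{lemma}: unknown category {category!r}"]
    allowed = model[category]
    errors = []
    for name, value in entry.get("features", {}).items():
        if name not in allowed:
            errors.append(f"{lemma}: feature {name!r} not declared for {category}")
        elif value not in allowed[name]:
            errors.append(f"{lemma}: value {value!r} not admissible for {name!r}")
    return errors


entries = [
    {"lemma": "maison", "category": "noun", "features": {"gender": "fem"}},
    {"lemma": "livre", "category": "noun", "features": {"gender": "neuter"}},
    {"lemma": "courir", "category": "verb", "features": {"gender": "masc"}},
]
for e in entries:
    for error in check_entry(e, MODEL):
        print(error)
```

Running it reports the two non-conformant entries (`livre: value 'neuter' not admissible for 'gender'` and `courir: feature 'gender' not declared for verb`), while `maison`, which simply omits `number`, passes.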