One of the basic goals of lexical semantic theory is to provide a
specification of word meanings in terms of semantic components and
combinatory relations among them. Different works in lexical semantics
now converge on the hypothesis that the meaning of every lexeme can be
analysed in terms of a set of more general meaning components, some or all
of which are common to groups of lexemes in a language or
cross-linguistically. In other words, meaning components can be identified
which may or may not be lexicalized in particular languages. The
individuation of the meaning components characterising classes of words in
a language and of the possible combinations of such components within word
roots leads to the identification of lexicalization patterns varying across
languages. Moreover, there is a strong correlation between each combination
of meaning components and the syntactic constructions allowed by the words
displaying them (e.g., [Tal85]; [Jac83], [Jac90]).
A trend has recently emerged towards addressing the issues of
The basic goals of research on lexicalization of meaning components are:
The main aims of this section are, first, to point out some problematic issues raised in works dealing with the identification and discussion of meaning components and, second, to briefly discuss proposals concerned with the identification of lexicalization patterns of semantic components both within a language and cross-linguistically. Furthermore, we shall point out how information on lexicalization is encoded in lexical databases and how it is useful for LE applications.
Representing complex meanings in terms of simpler ones has generally been considered one of the fundamental goals of semantic theory; however, different positions have been taken with respect to various aspects of the issue in works devoted to it. In any case, the following hypotheses are shared by the various positions:
It was [Hje61]'s componential analysis of word meaning which gave rise to various studies of the same type in Europe (e.g., [Gre66]; [Cos67]; [Pot74], etc.). These studies, although differing in the specific hypotheses put forward, tried to identify semantic components shared by groups of words by observing paradigmatic relations between words. The semantic components identified in the various proposals differentiate both one group of words from another and, by combining in various ways, one word from another. Here is a standard example, showing the kind of analysis usually performed:
Componential analysis in America developed independently, first among anthropologists (e.g., [Lou56]; [Goo56]) who described and compared kinship terminology in various languages. Their research was taken up and generalized by various linguists, in particular by scholars working within the framework of transformational grammar (cf. [Kat63]), who aimed at integrating componential analyses of words with treatments of the syntactic organization of sentences. Generative semanticists (e.g. [McC68], [Lak70]) tried to determine units of meaning, or 'atomic predicates', by means of syntagmatic considerations. Thus, for instance, the components BECOME and CAUSE were identified by analysing pairs of sentences displaying similar syntactic relationships such as the following:
Afterwards, scholars working in different fields of
language research dealt with various issues connected with the
identification/definition of meaning components. Within this survey we do
not intend to report on all the similarities/differences among the various
hypotheses put forward. We shall instead point out problematic aspects
which have been dealt with and which are of interest for our work.
The most important issues raised in the work on semantic components are the following:
These issues have been explicitly or implicitly dealt with in theoretical semantic research, in computational linguistics, in philosophy, and in psycholinguistics. The 'strongest' proposal put forward with respect to them is probably that presented by Wierzbicka in a number of works on semantic primitives (cf. [Wie72], [Wie80], [Wie85], [Wie89a], [Wie89b]). The avowed goal of these works is to arrive at a definition of a complete and stable set of semantic primitives by means of cross-linguistic research on lexical universals, i.e. concepts which are encoded in the lexica of (nearly or possibly) all natural languages. While lexical universals are not necessarily universal semantic primitives (e.g., a concept such as mother), according to Wierzbicka the converse is true, i.e. all semantic primitives are universal. Large-scale lexicographic studies are decisive for identifying such semantic primitives. These studies should not rely on the most frequent words recurring in the definitions of conventional dictionaries, given the limitations, inconsistencies and gaps typically found in these sources. In the various stages of her research, Wierzbicka postulated different sets of primitives. While the first set included only fourteen elements, in [Wie89b] a set of twenty-eight universal semantic primitive candidates was proposed:
In general, studies dealing with meaning components treat them as
'primitives', i.e. as units which cannot be further defined. However, the
components proposed as primitives in certain works are not always accepted
as such in others (cf. Jackendoff's discussion in [Jac83] of [McC68]'s
proposal of a primitive ALIVE).
Sometimes, a strong relation between 'primitivity' and 'universality' is
not explicitly stated. For instance, [Mel89] conceives of semantic
primitives simply as 'elementary lexical meanings of a particular language',
without asking whether they are the same for all languages. However,
others, and especially scholars working within a Chomskyan framework,
assume the universality of semantic primitives, in that they share the
position that the meaning components which are lexicalized in any language
are taken from a finite inventory, the knowledge of which is innate (e.g.
[Jac90]).
The main problem remains, then, to decide which the universal semantic
primitives are; i.e., to (eventually) define a finite and complete set of
them. Indeed, while Wierzbicka proposes a complete and 'stable' (although
not necessarily definitive) set of (pure) primitives, strong hypotheses
like hers have not in general been presented. Rather, analyses of portions
of the lexicon have been proposed: for instance, [Tal76] describes the
various semantic elements combining to express causation; [Tal85]
discusses the semantics of motion expressions; [Jac83], [Jac90]
extends to various semantic fields semantic analyses provided for motion or
location verbs (e.g., verbs of transfer of possession, verbs of touching,
etc.); etc. In any case, by analysis and comparison of different works on
the issue, we cannot circumscribe a shared set of primitives which could
also be seen as 'complete'.
Finally, no clear procedure for the identification of semantic components has
so far been formalized.
An approach which deliberately seeks to avoid a strong theoretical characterization of semantic components is that chosen by [Cru86], which, for this reason, could be taken as the starting point for the Eagles recommendations on the encoding of semantic components. According to Cruse's 'contextual approach', the meaning of a word can be described as composed of the meanings of other words with which it contracts paradigmatic and syntagmatic relations within the lexicon. These words are called semantic traits of the former word. Thus, for instance, animal can be considered a semantic trait of dog, since it is its hyperonym. Moreover, dog is implied in the meaning of to bark, given that it is the typical subject selected by the verb. Cruse clearly states that his 'semantic traits' are not claimed to be "primitive, functionally discrete, universal, or drawn from a finite inventory; nor is it assumed that the meaning of any word can be exhaustively characterised by any finite set of them" ([Cru86], p. 22). A similarly weak theoretical characterization is adopted by [Dik78], [Dik80] in his 'stepwise lexical decomposition': no semantic primitives or universal elements are postulated, and lexical meaning is reduced to a limited set of basic lexical items of the object language, identified by analysing a network of meaning descriptions.
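Cruse's contextual approach lends itself to a simple relational encoding. The following is a minimal sketch only, assuming that traits are stored as (word, relation, related word) triples; the relation labels `hyperonym` and `typical_subject` and the function name are illustrative assumptions, and only the dog/animal/bark facts come from the text.

```python
from collections import defaultdict

# Toy lexicon of (word, relation, related_word) triples.
# The relation labels are hypothetical; only the dog/animal/bark
# facts are taken from the discussion of Cruse's approach.
RELATIONS = [
    ("dog", "hyperonym", "animal"),       # paradigmatic relation
    ("bark", "typical_subject", "dog"),   # syntagmatic relation
]


def semantic_traits(word, relations):
    """Return the words that `word` contracts relations with, by relation."""
    traits = defaultdict(set)
    for source, relation, target in relations:
        if source == word:
            traits[relation].add(target)
    return dict(traits)


print(semantic_traits("dog", RELATIONS))
print(semantic_traits("bark", RELATIONS))
```

Nothing in this encoding requires the traits to be primitive, universal, or drawn from a finite inventory: any word of the lexicon may appear as a trait of another.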
Relying on the basic assumption that it is possible to identify a discrete
set of elements (semantic components) within the domain of meaning and
combinatory relations among them, [Tal85] carried out a study on the
relationships among such semantic components and morphemes/words/phrases in
a sentence/text. In particular, he investigated in depth the regular
associations (lexicalization patterns) among meaning components (or sets of
meaning components) and the verb, providing a cross-linguistic study of
lexicalization patterns connected with the expression of motion. He was
mainly interested in identifying typologies, i.e. small numbers of
patterns exhibited by groups of languages, and universals, i.e.
single patterns shared cross-linguistically.
According to Talmy, a motion event may be analysed as related, at least, to five basic semantic elements:
Firstly, Talmy presents three basic lexicalization types for verb roots which are used by different languages in their most characteristic expression of motion:
Talmy provides examples of these patterns of conflation:
Interesting discussions of lexicalization patterns are found in
[Jac83], [Jac90]. Jackendoff's theory of Conceptual Semantics and the
organization of Lexical Conceptual Structure are discussed in detail in
the following section. We shall only briefly recall some points of interest
for our purposes. The main elements of the LCS language are:
conceptual constituents, semantic fields and primitives. Then there are
other elements, like conceptual variables, semantic features, constants,
and lexical functions, which play minor roles.
Each conceptual constituent belongs to one of a small set of ontological
categories such as Thing, Event, State, Action, Place, Path, etc.
Among conceptual primitives the main ones are BE, which represents a state,
and GO, which represents any event. Other primitives include: STAY, CAUSE,
INCH, EXT, etc. A second larger set of primitives describes prepositions:
AT, IN, ON, TOWARD, FROM, TO, etc.
The LCS organization incorporates [Gru67]'s view, according to which the
formalism used for encoding concepts of spatial location and motion can be
abstracted and generalized to many other semantic fields (cf. next
section). Thus, Jackendoff tries to extend semantic analyses provided for motion or location verbs to a wide range of other
semantic fields.
This turns out to require an additional elaboration of his conceptual
system. At the same time, observations are added on the various
correspondences between different lexicalization patterns and syntactic
expressions. An interesting proposal put forward by Jackendoff (developing
a suggestion from [Car88]) concerns a distinction between a
MOVE-function and a GO-function: manner-of-motion verbs which cannot occur
with complements referring to a PATH (more precisely, a bounded path)
should only be linked to a MOVE-function. A rule is then proposed to
account for (typically English) sentences containing manner-of-motion verbs
allowing directional complements: a sentence like Debbie danced into
the room expresses a conceptual structure that includes both a
MOVE-function and a GO-function (indicating change of position). What
differentiates English manner-of-motion verbs from, e.g., Spanish ones is
the possibility of incorporating what Jackendoff calls a GO-Adjunct.
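The elements of the LCS language mentioned above can be illustrated with a small data structure. This is a sketch under stated assumptions, not Jackendoff's own notation: the class `Constituent`, the bracketed rendering, and the constants DEBBIE and ROOM are hypothetical, and the way the MOVE-function and the GO-function are subordinated to each other is left open, so the two events are simply kept side by side for the sentence Debbie danced into the room.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass(frozen=True)
class Constituent:
    """A conceptual constituent: ontological category, head, arguments."""
    category: str                         # e.g. "Event", "Thing", "Path", "Place"
    head: str                             # primitive function or constant
    args: Tuple["Constituent", ...] = ()

    def render(self) -> str:
        """Render the constituent in a bracketed, LCS-like form."""
        if not self.args:
            return f"[{self.category} {self.head}]"
        inner = ", ".join(arg.render() for arg in self.args)
        return f"[{self.category} {self.head}({inner})]"


def heads(constituent: Constituent) -> Iterator[str]:
    """Yield the head of a constituent and of all nested arguments."""
    yield constituent.head
    for arg in constituent.args:
        yield from heads(arg)


debbie = Constituent("Thing", "DEBBIE")
room = Constituent("Thing", "ROOM")

# Manner of motion contributed by the verb (MOVE-function).
manner_of_motion = Constituent("Event", "MOVE", (debbie,))

# Change of position contributed by the directional complement (GO-function).
path = Constituent("Path", "TO", (Constituent("Place", "IN", (room,)),))
change_of_position = Constituent("Event", "GO", (debbie, path))

# Assumption: the sentence is modelled as the pair of both events.
danced_into_the_room = (manner_of_motion, change_of_position)

for event in danced_into_the_room:
    print(event.render())
```

On this sketch, a manner-of-motion verb that disallows bounded-path complements would contribute only the MOVE constituent, with no accompanying GO constituent.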
Both Talmy and Jackendoff observed a strict correlation between the
meaning components clustered within a verb root and the verb's syntactic
properties. An extensive study on the correlation between verb semantics
and syntax has been provided by [Lev93]. This study shows that verb
semantic classes can be identified, each characterized by particular
syntactic properties (2.6.2).
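The idea that verb classes can be identified through shared syntactic properties can be sketched as a grouping of verbs by their syntactic profile. The verbs and alternation judgments below follow the familiar break/cut/hit/touch illustration associated with [Lev93] and should be read as illustrative only; the data layout, the property labels and the function name are assumptions made for this sketch.

```python
from collections import defaultdict

# Illustrative syntactic profiles: the set of alternations each verb allows.
# Property labels are hypothetical shorthand, not an established inventory.
PROFILES = {
    "break":   {"middle"},
    "shatter": {"middle"},
    "cut":     {"middle", "conative", "body_part_ascension"},
    "hit":     {"conative", "body_part_ascension"},
    "kick":    {"conative", "body_part_ascension"},
    "touch":   {"body_part_ascension"},
}


def group_by_syntactic_profile(profiles):
    """Group verbs that share exactly the same set of syntactic properties."""
    classes = defaultdict(list)
    for verb, properties in profiles.items():
        classes[frozenset(properties)].append(verb)
    return {profile: sorted(verbs) for profile, verbs in classes.items()}


for profile, verbs in group_by_syntactic_profile(PROFILES).items():
    print(sorted(profile), "->", verbs)
```

Each resulting group is a candidate semantic class; whether its members really share meaning components still has to be established by semantic analysis.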
Within the Acquilex project (3.10.3) work has been carried out
to identify information on lexicalization of meaning components and to
connect such information to the syntactic properties of verbs. MRD
definitions of some classes of verbs (e.g., verbs referring to motion, to
change-of-state-by-cooking, etc.) were analysed in order to link recurrent
patterns to specific meaning components characterizing each class in a
specific language. Furthermore, connections were stated between single
components and syntactic properties displayed by the verbs under analysis
(cf. e.g. [Alo94a]; [Tau94]).
Within the EuroWordNet project (3.4.3) relations between words are being encoded which allow data to be gathered on lexicalization. For instance, information on arguments involved in verb meaning is being encoded and compared cross-linguistically (cf. [AloFC]).
The kinds of meaning components 'conflated' within verb roots are strongly
correlated with the syntactic properties of the verbs themselves, i.e. with
the possibility of verbs occurring with certain arguments (e.g.
[Tal85]; [Lev93]; cf. this volume §1.4). Moreover, a clear
identification of the semantic components conflated within verb roots in
individual languages could be relevant also for isolating semantic classes
displaying, or amenable to, similar sense extensions, given that
amenability to yield different interpretations in context appears to be
connected with semantic characteristics which verbs (words) share (cf.
[San94]).
By adopting a strongly 'relational' view of the lexicon, then, we may identify lexicalization patterns by stating paradigmatic/syntagmatic relations between words (cf. work carried out within EuroWordNet). Thus, research on lexicalization is strictly linked to work on lexical relations such as hyponymy, meronymy, etc.
The work carried out within the Acquilex project led to the identification of semantic
components lexicalized within the roots of various verb classes. The
information acquired is variously encoded in the language-specific LDBs.
Furthermore, part of this information was encoded within the multilingual
LKB by linking the relevant meaning components to the participant role
types involved in verb meaning. For instance, the subject of the English
verb swim was associated with the participant role type proto-agent-cause-move-manner, indicating that the verb involves self-causing,
undirected motion for which manner is specified (cf. [San92b]).
Much information on lexicalization patterns is being encoded within the EuroWordNet database for substantial portions of the lexica of various languages. Here, information on semantic components lexicalized within word meanings is encoded by means of lexical relations applying between synsets (3.4.2).
Results of research on lexicalization seem necessary for a variety of NLP tasks and applications. Because of