Natural Language Generation
Introduction
The problem of automatically producing natural language texts is becoming increasingly salient, driven by the growing demand for technical documents in multiple languages; for intelligent help and tutoring systems that are sensitive to the user's knowledge; and for hypertext that adapts to the user's goals, interests and prior knowledge, as well as to the presentation context. This section outlines the problems, stages and knowledge resources involved in natural language generation.
Survey
Natural Language Generation (NLG) systems produce language output
(ranging from a single sentence to an entire document) from
computer-accessible data usually encoded in a knowledge or data base.
Often the input to a generator is a high-level communicative goal to
be achieved by the system (which acts as a speaker or writer). During
the generation process, this high-level goal is refined into more
concrete goals which give rise to the generated utterance.
Consequently, language generation can be regarded as a goal-driven
process which aims at adequate communication with the reader/hearer,
rather than as a process aimed entirely at the production of
linguistically well-formed output.
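To illustrate the goal-driven view, here is a minimal sketch in Python of the top-down refinement of a communicative goal into concrete utterance specifications. The goal names and refinement rules are invented for illustration only; real systems use far richer plan languages.

# Hypothetical refinement rules mapping an abstract communicative goal
# to more concrete subgoals; goals with no rule are treated as concrete
# and realised directly as utterance specifications.
REFINEMENTS = {
    "persuade(user, action)": ["inform(benefit)", "inform(evidence)"],
}

def refine(goal):
    """Recursively expand a goal top-down into utterance specifications."""
    subgoals = REFINEMENTS.get(goal)
    if subgoals is None:              # concrete goal: ready to be uttered
        return ["say(" + goal + ")"]
    utterances = []
    for sub in subgoals:
        utterances.extend(refine(sub))
    return utterances

print(refine("persuade(user, action)"))
# -> ['say(inform(benefit))', 'say(inform(evidence))']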
In order to structure the generation task, most existing systems divide it into the following stages, often organised in a pipeline architecture (a simple pipeline is sketched after the list):
- Content Determination and Text Planning: This stage involves decisions regarding the information which should be conveyed to the user (content determination) and the way this information should be rhetorically structured (text planning). Many systems perform these tasks simultaneously, because rhetorical goals often determine what is relevant. Most text planners have hierarchically organised plans and apply decomposition in a top-down fashion, following AI planning techniques. However, some planning approaches (e.g., schemas [McK85], Hovy's structurer [Hov90]) rely on previously selected content, an assumption which has proved inadequate for some tasks (e.g., a flexible explanation facility [Par91,Moo90]).
- Surface Realisation: This stage involves the generation of the individual sentences in a grammatically correct manner, handling, e.g., agreement, reflexives and morphology.
However, it is worth noting that there is no consensus in the NLG community on exactly which problems are addressed at each of these stages; the division varies between approaches and systems.
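By way of illustration, here is a minimal sketch of such a pipeline in Python, on an invented toy weather domain. The stage boundaries, function names and sentence templates are illustrative assumptions, not those of any particular NLG system.

# Content determination: select the facts worth conveying to the user.
def determine_content(facts):
    return [(attr, val) for attr, val in facts.items() if val is not None]

# Text planning: impose a (here trivially linear) rhetorical ordering.
def plan_text(content):
    order = {"temperature": 0, "wind": 1}
    return sorted(content, key=lambda pair: order.get(pair[0], 99))

# Surface realisation: produce grammatically well-formed sentences
# (real realisers also handle agreement, morphology, reflexives, ...).
def realise(messages):
    templates = {"temperature": "The temperature is {} degrees.",
                 "wind": "The wind is {}."}
    return " ".join(templates[attr].format(val) for attr, val in messages)

facts = {"temperature": 21, "wind": "light", "humidity": None}
print(realise(plan_text(determine_content(facts))))
# -> The temperature is 21 degrees. The wind is light.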
In order to make these complex choices, language generators need various knowledge resources (two of which are sketched after the list):
- discourse history - information about what has been presented so far. For instance, if a system maintains a list of previous explanations, it can use this information to avoid repetitions, refer to already presented facts, or draw parallels.
- domain knowledge - a taxonomy and knowledge of the domain to which the content of the generated utterance pertains.
- user model - a specification of the user's domain knowledge, plans, goals, beliefs, and interests.
- grammar - a grammar of the target language, used to generate linguistically correct utterances. Some of the grammars which have been used successfully in various NLG systems are: (i) unification grammars: Functional Unification Grammar [McK85], Functional Unification Formalism [McK90]; (ii) phrase structure grammars: Referent Grammar (GPSG with built-in referents) [Sig91], Augmented Phrase Structure Grammar [Sow84]; (iii) systemic grammar [Man83]; (iv) Tree-Adjoining Grammar [Jos87,Nik95]; (v) Generalised Augmented Transition Network Grammar [Sha82].
- lexicon - a lexicon entry for each word, containing typical information such as part of speech, inflection class, etc.
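As an illustration of the role of these resources, the following Python sketch pairs a toy discourse history, used here to suppress repetition, with a toy lexicon recording part of speech and inflectional information. Both structures are hypothetical and merely indicate the kind of bookkeeping involved.

# A toy lexicon: one entry per word, with part of speech and inflections.
LEXICON = {
    "dog":  {"pos": "noun", "plural": "dogs"},
    "bark": {"pos": "verb", "third_sg": "barks"},
}

class DiscourseHistory:
    """Records which facts have already been presented to the user."""
    def __init__(self):
        self._presented = set()

    def already_said(self, fact):
        return fact in self._presented

    def record(self, fact):
        self._presented.add(fact)

history = DiscourseHistory()
for fact in ["bark(dog)", "bark(dog)", "sleep(dog)"]:
    if not history.already_said(fact):    # avoid repeating old content
        history.record(fact)
        verb = fact.split("(")[0]
        print("The", LEXICON["dog"]["plural"], verb + ".")
# -> prints "The dogs bark." and "The dogs sleep." once each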
The formalism used to represent the input semantics also affects the generator's algorithms and its output. For instance, some surface realisation components expect hierarchically structured input, while others use non-hierarchical representations. The latter address the more general task in which the message is almost free of linguistic commitments, so that the selection of all syntactically prominent elements is made from both conceptual and linguistic perspectives. Examples of different input formalisms are: a hierarchy of logical forms [McK90], functional representation [Sig91], predicate calculus [McD83], SB-ONE (similar to KL-ONE) [Rei91], and conceptual graphs [Nik95].
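The following sketch contrasts the two input styles using invented Python encodings: a hierarchically structured logical form in which the main predicate and its arguments are already fixed, and a flat, non-hierarchical set of relations from which the realiser itself must select the syntactically prominent elements.

# Hierarchical input: the tree already commits to "own" as the main
# predicate, with agent and patient in place.
hierarchical = ("own",
                ("agent", "Mary"),
                ("patient", ("dog", ("colour", "black"))))

# Non-hierarchical input: an unordered set of relations over entities.
# A realiser is free to surface this as "Mary owns a black dog",
# "the black dog that Mary owns", and so on.
flat = {
    ("instance", "e1", "own"),
    ("agent", "e1", "Mary"),
    ("patient", "e1", "d1"),
    ("instance", "d1", "dog"),
    ("colour", "d1", "black"),
}

Both encodings could surface as "Mary owns a black dog", but only the flat one leaves the choice of main predicate and argument structure to the generator.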
Related Areas and Techniques
Machine Translation (§4.1), Text Summarisation (§4.4).