In section 2.3 transcription and representation practices for spoken texts are
reviewed, paying special attention to the NERC and TEI proposals. A survey of
events represented and encoded in spoken texts (2.3.1) shows that a substantial number
of phenomena can be of interest to different types of research. However, it seems necessary to
consider a minimal set of events to be encoded according to the TEI-compliant Corpus Encoding Standard (CES)
proposed for EAGLES (Ide, 1996). The present document is concerned only with the events
themselves; the encoding of the International Phonetic Alphabet, of the transcription, and of the
linguistic annotation of speech will be presented as part of CES. Proposals for the encoding
of spoken texts within the TEI initiative can also be found in Johansson (1995a, b).
As a starting point, it should be noted that there are important differences between
the transcription of read text - when the original written source is available - and the
transcription of spontaneous speech.
These differences are reviewed in detail in the EAGLES Handbook on
Spoken Language Systems (EAGLES Spoken Language Working Group, 1995) and can be summarized in the
following points:
- The planning process of spontaneous speech is reflected in several types of disfluencies which
do not normally occur in read speech, increasing the difficulty of the transcription process and
the complexity of the representation. Most of the events usually transcribed in connection
with these disfluencies are presented in section 2.3.
- The criteria for defining utterances are not clear-cut in spontaneous speech, either in monologues
or in conversations.
- In the case of dialogues, interruptions and overlapping speech add further complexity to the
representation.
Similar problems in the transcription of speech are mentioned by Johansson (1995b), who adds
a further dimension: since speech is generally addressed to a limited audience in a private
setting, adequate knowledge of the context and the situation is needed for a correct
understanding.
Despite the difficulties involved in the transcription of unprepared speech, it should be
possible to define a minimal common set of events to be encoded in the transcription of different
types of spoken texts.
In section 2.4 the structural elements considered in the TEI Guidelines have been defined;
they are listed again here for the reader's convenience:
- Utterance
- Pause
- Vocal
- Kinesic
- Event (non-vocalised, non-communicative)
- Writing
- Shift
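By way of illustration only, the fragment below sketches how some of these elements might appear in a TEI-style transcription. It is not taken from the TEI Guidelines themselves; the attribute names and values follow the conventions of Sperberg-McQueen and Burnard (Eds.) (1994) as far as possible but should be treated as indicative (the <vocal> and <event> elements are illustrated later in this section):

  <!-- an utterance containing a timed pause -->
  <u who=A>come in <pause dur="2 secs"> sit down</u>
  <!-- a communicative but non-vocal gesture -->
  <kinesic who=B desc="nods head">
  <!-- written material displayed during the interaction -->
  <writing who=A type="slide">Figures for 1995</writing>
  <!-- a marked change in loudness, signalled at its start and end -->
  <u who=B><shift feature=loud new=f>I can hear you<shift feature=loud new=normal></u>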
The EAGLES Handbook on Spoken Language Systems (EAGLES Spoken Language Working Group, 1995) considers
a set
of non-linguistic phenomena that should be annotated when transcribing a speech corpus:
- Omissions in read text
- Verbal deletions and corrections
- Word fragments
- Unintelligible words
- Hesitations and filled pauses
- Non-speech acoustic events
  - Produced by the speaker
  - Produced by other speakers or environmental noises
- Simultaneous speech
- Speaking turns
A comparison of these recommendations shows that there are elements common to both
proposals, which could therefore form part of the minimal set of elements to be encoded.
These elements are the following:
- Vocal semi-lexical events
- Included in this category are filled or voiced pauses and hesitations. As will be proposed
in the next section, it is convenient to keep a list of standardized spellings for these phenomena,
using, when possible, the conventional orthographic forms which appear in reference dictionaries for a given
language.
- Vocal non-lexical events
- This category includes burps, clicks, smacks, coughs, giggles, laughs, sneezes, sobs, yawns,
heavy breathing and all the non-speech acoustic events produced by the speaker.
The inventory of such events is open-ended, and a description of the event is therefore used in the annotation.
- Non-vocalised non-communicative events
- This includes all the extraneous noises produced by other speakers or those which result
from the recording environment, such as doors slamming, telephones ringing, etc. The annotation is, as
in the previous category, a written description of the event.
Note that the first two categories correspond to those subsumed under the tag <vocal> in the
TEI, while the third corresponds to <event>.
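As a purely indicative sketch, the three categories could be marked along the following lines in a TEI-style transcription; the desc values are free-text descriptions, and the exact attribute inventory depends on the version of the Guidelines being followed:

  <!-- vocal semi-lexical event: a filled pause or hesitation -->
  <u who=A>well <vocal desc="filled pause (ehm)"> I suppose so</u>
  <!-- vocal non-lexical event produced by the speaker -->
  <vocal who=A desc="laugh">
  <!-- non-vocalised, non-communicative event from the recording environment -->
  <event desc="telephone rings">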
The transcription of spoken interactions where more than one speaker is involved also requires
the consideration of the following elements:
- Speaker identity
- In the TEI encoding this information is
indicated in the header within the `profile description'
<profileDesc> element, which has a `participant' <partics> sub-element containing a
series of `person' <person> elements. Among the attributes of <person> there is
one - named `id' - coding the identity of the speaker. Within the text, each utterance can have
a `who' attribute whose value corresponds to the
identity of the speaker coded in the `id' attribute (Sperberg-McQueen and Burnard (Eds.), 1994;
Johansson, 1995a). Other simplified forms of encoding can be found, but, in any case, this is
a necessary element in the transcription of spoken interactions.
- Speaking turns, indicating a change of speaker
- Changes of speaker can be
coded in the TEI by means of changes in the value of the `who'
attribute, and appear to be the basis for the definition of utterances. Independently of the
mechanisms that can be used, this is essential information in the transcription of
conversations.
- Simultaneous speech or overlapping
- Proposals for marking this phenomenon are found in the TEI (see 2.4) as part of
the strategies for encoding simultaneous events. Although other ways of representing speech
overlapping can be found, this is again an important element in the transcription of the
type of spoken material discussed here (a sketch combining these three elements is given after this list).
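A minimal sketch combining these three elements in a TEI-style transcription is given below. The speaker codes A and B are assumed to have been declared as `id' values in the participant description of the header; the `trans' attribute, which records the nature of the transition from the preceding utterance, is shown here only as a simple device for noting overlap, the full TEI mechanism for aligning simultaneous events being the one referred to in 2.4:

  <!-- A and B identify speakers declared in the header -->
  <u who=A>so what do you think</u>
  <!-- a change in the value of `who' marks a new speaking turn -->
  <u who=B trans=smooth>I think it is a good idea</u>
  <!-- the next turn begins before the previous one has finished -->
  <u who=A trans=overlap>so do I</u>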
A third group of elements to be transcribed is related to the performance of the speaker.
The advisability of including them in transcriptions is discussed in the EAGLES Handbook
on Spoken Language Systems, where three different types of phenomena are identified:
- Omissions in read text
- Where a written script exists, it may be advisable to mark the words or segments omitted by
the reader as such in the transcription.
- Self-repairs
- In spontaneous speech, the planning process is sometimes evidenced by the presence of
self-repair phenomena used by the speaker to
correct speech production errors `on-line' (see Cutler
(Ed.), 1982 and Fromkin (Ed.), 1973, 1980 for a psycholinguistic approach to the topic). They
might be explicitly indicated by the speaker
(using, for example, forms such as `I mean')
or they might be implicit; in other cases they might involve restarts or repetitions.
In read speech, too, it is possible to find
corrections of errors detected by the reader in the course of the reading. Such phenomena
should not be omitted in a transcription.
- Word fragments
- Word fragments are one or more sounds belonging to a word which the speaker does not fully
pronounce at a first attempt, and which are repeated when the speaker succeeds in producing the
complete word. In some systems they are marked by a hyphen (e.g. `fli- flights'), while in
others a star is used (e.g. `fli* flights'). It also seems appropriate to indicate these hesitations
in the transcription (a possible rendering of these phenomena is sketched after this list).
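None of the proposals reviewed here fixes a single notation for these phenomena. The fragment below is only one possible TEI-style rendering of the last two of them (marking omissions presupposes access to the written script); the general-purpose <del> element is borrowed for the purpose, and its `type' value is purely illustrative:

  <!-- self-repair: the repaired material is retained but marked, and the editing phrase is transcribed as spoken -->
  <u who=A><del type="repair">we leave on Monday</del> I mean on Tuesday</u>
  <!-- word fragment: the incomplete first attempt is kept, using the hyphen convention mentioned above -->
  <u who=A>the fli- flights were all cancelled</u>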
Moreover, the encoding of spoken texts should include documentation of the difficulties
encountered during the transcription process.
The NERC proposals mention `guessed' and
`unintelligible' fragments, while the SpeechDat conventions include a notational device for
partially or totally unintelligible words. It also seems appropriate to provide means for
noting the transcriber's uncertainties:
- Unintelligible fragments
- Fragments, words or parts of words which are not intelligible to the transcriber
should be indicated. A distinction between `guessed' or `uncertain' and `unintelligible'
fragments can be made if necessary (a possible notation is sketched below).
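As an indicative sketch only, the TEI elements <unclear> (for a passage the transcriber can only guess at) and <gap> (for material that cannot be transcribed at all) could serve this purpose; the attribute names and values below are illustrative rather than prescriptive:

  <!-- the transcriber can guess the words but is not certain of them -->
  <u who=A>we arrived at <unclear reason="background noise">half past ten</unclear></u>
  <!-- a totally unintelligible stretch is left untranscribed and documented -->
  <u who=B>and then <gap reason="unintelligible" extent="two words"> we left</u>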
Finally, the encoding of utterances - defined as a stretch of speech usually preceded and followed by a pause
or by a change of speaker - should be considered. We have already recommended the marking of
changes of speaker, and in section 5.2.2, devoted to prosody, it is also proposed that
pauses should be part of the elements to be encoded. This implies that utterances are
necessarily encoded, since they are delimited by these elements.
An important point which has to be considered is the usability of the TEI recommendations from the
point of view of the transcriber. Sinclair (1995) and Chafe (1995) discuss this issue, which is also mentioned
by the EAGLES Spoken Language Working Group. As a general rule, a balance between the advantages offered by the
TEI, the aims of the corpus and the demands imposed on the transcriber should be sought. The
distinction put forward by Sinclair (1995:107) between conformity and compatibility
with TEI is useful in clarifying the debate. In fact, the need to develop conversion software between
a user-friendly system of transcription and the TEI encoding scheme was one of the recommendations
arising from the EAGLES Workshop on `Issues in Corpus Work' organized by the Text Corpora Working
Group in Madrid in January 1996.