
Transcription and representation needs in different research communities

 

Symbolic representations of speech are needed by at least two different scientific communities. The first is the corpus linguistics community, whose main aim is to describe spoken language from a language-oriented point of view. Typical uses of spoken corpora within this community include discourse analysis, conversation analysis, sociolinguistics, dialectology, psycholinguistics, child language acquisition, speech pathology, second language acquisition, descriptive linguistics based on large amounts of data, and corpus-based lexicography. The second is the speech community. It comprises both those interested in the basic processes underlying speech production and perception from a phonetically oriented point of view and those more concerned with applying this knowledge to speech technology.

Although these two communities have traditionally used different material and, because of the different aims of their research, have viewed spoken corpora as very different objects, a gradual convergence is taking place, so that the same body of data can be fruitfully used both in corpus linguistics and in speech work. This convergence, however, depends largely on data being collected, transcribed, encoded and annotated in accordance with a minimal set of standard requirements. Guidelines setting out these common standards are therefore needed, and they should also provide guidance on how to meet the more specific requirements of particular research areas.