Language corpora can comprise only written, only spoken, or both written and spoken data.
Spoken data can be collected for a variety of R&D tasks. We distinguish here between two different approaches:
The document on spoken texts (EAGLES, 1996f) points out in detail the differences and similarities between the two approaches.
In EAGLES, speech corpora are dealt with by the Spoken Language Systems Working Group (SLWG).
The EAGLES Text Corpora Working Group (TCWG) aims to provide guidelines for encoding both written and spoken data, since large language corpora usually comprise both types of data.
However, because spoken data increasingly represent a point of convergence between the interests of the NLP and speech communities, the provision of guidelines for the encoding of spoken data has been assigned to a subgroup for spoken corpora, formed by members of the two Working Groups (TCWG and SLWG).