The default value is
documented. This means that, as proposed
in NERC (1994), full details
about the constituents of a component are kept separately from the component
itself. The model for this is the DTD or header of SGML and, following
that, of TEI. In contrast to the recommendations of those bodies,
corpus users seem to prefer to keep the documentation of texts in a separate
place from the texts themselves, and to include only a minimal header that
contains a reference to the documentation. For the management of corpora, this
practice allows plain text to be separated effectively from annotation with
only a small amount of programming effort. Since DTDs can be extremely
verbose, the efficiency of real-time search procedures is hampered if they
cannot be detached.
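
As an illustration of how little programming the separation can require, the following is a minimal sketch in Python. It assumes a hypothetical convention in which the first line of each text is a one-line header whose doc attribute names the external documentation file. The header syntax, the function name and the file name are assumptions made for this example only; they are not taken from SGML, TEI or NERC (1994).

```python
import re

# Hypothetical minimal header: one line that only points to the
# external documentation, e.g. <header doc="docs/sample001.hdr"/>
HEADER_RE = re.compile(r'^<header\s+doc="([^"]+)"\s*/>[ \t]*\n')


def split_minimal_header(raw):
    """Return (doc_ref, text).

    doc_ref is the documentation reference found in the minimal header,
    or None if the text carries no such header; text is the plain text
    with the header line removed.
    """
    match = HEADER_RE.match(raw)
    if match is None:
        return None, raw
    return match.group(1), raw[match.end():]


# Illustrative input; the file name is invented for this example.
sample = '<header doc="docs/sample001.hdr"/>\nThe cat sat on the mat.\n'
doc_ref, text = split_minimal_header(sample)
```

A search procedure can then work on text alone and consult the file named by doc_ref only when the full documentation is actually needed.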