The maximal, syntactically independent segments into which a text is subdivided for parsing purposes are normally considered to be sentences. In a written text, they are typically (though by no means invariably) delimited by an initial capital letter and a final full stop (`.') or other terminal punctuation. It is convenient to accept this primary orthographic definition of `sentence' for the purposes of syntactic annotation. However, a sentence so defined may be a full sentence, as in (9), a grammatically incomplete unit, as in (10), or a sentence that contains another sentence, as in (11):
(9) [S This is a sentence. S]
(10) [S Well done. S]
(11) [S [S ``Well done'', S] she said. S]
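The orthographic definition can be illustrated with a minimal Python sketch. It assumes that a boundary falls wherever terminal punctuation is followed by whitespace and a capital letter; the function names `segment` and `bracket` are hypothetical and not part of any annotation scheme. This heuristic will misjudge abbreviations and quoted speech, and it does not produce the embedded bracketing of (11), which is why delimitation is described above as typical rather than invariable.

```python
import re

# Assumed heuristic: a boundary is whitespace that follows terminal
# punctuation (. ! ?) and precedes a capital letter.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def segment(text):
    """Split text into orthographic sentences (heuristic only)."""
    parts = [part.strip() for part in _BOUNDARY.split(text.strip())]
    return [part for part in parts if part]


def bracket(text):
    """Wrap each orthographic sentence in [S ... S] delimiters."""
    return " ".join(f"[S {sentence} S]" for sentence in segment(text))


print(bracket("This is a sentence. Well done."))
```

Note that the fragment `Well done.` receives its own [S ... S] unit even though it is not a grammatically complete sentence, which matches the treatment in (10).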
In transcriptions of spoken discourse, there is no simple answer to the question ``What is a sentence?''. Some transcriptions, based on standard orthography, yield de facto sentences in the form of units beginning with a capital letter and closing with a terminal punctuation mark. For these, there is no problem in recognising the primary sentential segments and delimiting them by [S ... S], even though these segments frequently lack the canonical structure of a complete written sentence. Moreover, even in other transcriptions, where the standard orthographic practices of sentence delimitation are avoided, it is possible to identify `primary segments' analogous to the written sentence, viz. the primary units into which the transcribed discourse is divided for parsing purposes. For spoken as well as written language, then, the [S] unit may be retained, although it may be interpreted differently, and some other term, such as `primary segment', may be preferred to `sentence'.
We conclude by recommending, for the syntactic annotation of any text (including a transcription of spoken language), an exhaustive division of the text into units labelled [S ... S].
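One way to check that an annotated text meets this recommendation is sketched below in Python. The sketch assumes that the delimiters `[S` and `S]` are separated from the words by whitespace, as in the examples above; the function name `top_level_units` is hypothetical. It returns the primary units and raises an error if any token lies outside a unit or if the brackets do not balance.

```python
def top_level_units(annotated):
    """Return the top-level [S ... S] units of a whitespace-tokenised text.

    Raises ValueError if the division is not exhaustive, i.e. if a token
    lies outside every unit, or if the brackets are unbalanced.
    """
    units = []
    current = []
    depth = 0
    for token in annotated.split():
        if token == "[S":
            depth += 1
            current.append(token)
        elif token == "S]":
            if depth == 0:
                raise ValueError("closing S] without a matching [S")
            depth -= 1
            current.append(token)
            if depth == 0:
                # A primary unit has just been closed.
                units.append(" ".join(current))
                current = []
        else:
            if depth == 0:
                raise ValueError(f"token outside any [S ... S] unit: {token!r}")
            current.append(token)
    if depth != 0:
        raise ValueError("unclosed [S bracket")
    return units


for unit in top_level_units("[S Well done. S] [S [S Well done, S] she said. S]"):
    print(unit)
```

Embedded units are kept inside the primary unit that contains them, so only the outermost segments count towards the exhaustive division.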