The maximal syntactically independent segments into which a text is
subdivided for parsing purposes are normally considered to be
sentences. In a written text, they are typically (though by no means
invariably) delimited by an initial capital letter and a final full stop (`.')
or other terminal punctuation. It is convenient to accept this primary
orthographic definition of `sentence' for the purposes of syntactic
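annotation. However, a sentence, so defined, may be a full sentence,
as in (9), a grammatically incomplete one, as in (10), or a sentence
that includes another sentence, as in (11):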
(9) [S This is a sentence. S]
(10) [S Well done. S]
(11) [S [S ``Well done'', S] she said. S]
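The bracketing in examples (9) to (11) can be checked mechanically. The following is a minimal sketch, not part of the guidelines themselves: it assumes that `[S` and `S]` always appear as whitespace-separated tokens, as they do in the examples, and the function name `s_bracket_depth` is purely illustrative.

```python
def s_bracket_depth(annotated):
    """Return the maximum nesting depth of [S ... S] units.

    Assumes (for illustration only) that "[S" and "S]" are separate
    whitespace-delimited tokens. Raises ValueError if the brackets
    are not balanced.
    """
    depth = 0
    max_depth = 0
    for token in annotated.split():
        if token == "[S":
            depth += 1
            max_depth = max(max_depth, depth)
        elif token == "S]":
            depth -= 1
            if depth < 0:
                raise ValueError("closing S] without a matching [S")
    if depth != 0:
        raise ValueError("unclosed [S bracket")
    return max_depth
```

Under these assumptions, examples (9) and (10) have depth 1, while example (11), where one sentence includes another, has depth 2.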
In transcriptions of spoken discourse, there is no simple answer to the question ``What is a sentence?''. Some transcriptions, based on standard orthography, yield de facto sentences in the form of units beginning with a capital letter and closing with a terminal punctuation mark. For these, there is no problem in recognising the primary sentential segments and delimiting them by [S ... S], even though these segments frequently lack the canonical structure of a complete written sentence. Moreover, even in other transcriptions, where the standard orthographic practices of sentence delimitation are avoided, it is possible to identify `primary segments' analogous to the written sentence, viz. the primary units into which the transcribed discourse is divided for parsing purposes. For spoken as well as written language, then, the [S] unit may be retained, although it may be interpreted differently, and some other term, such as `primary segment', may be preferred to `sentence'.
We conclude by recommending, for the syntactic annotation of any
text (including a transcription of spoken language), an
exhaustive division of the text into units labelled [S ... S].
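
For written text, the orthographic definition above suggests a rough way of producing such a division automatically. The sketch below is an illustration under stated assumptions, not a recommended procedure: it treats a boundary as whitespace preceded by `.`, `!` or `?` and followed by a capital letter, and the names `_BOUNDARY` and `segment` are hypothetical. It will mis-segment abbreviations, does not produce nested units such as example (11), and does not apply to transcriptions that avoid standard orthography.

```python
import re

# Assumed boundary: whitespace preceded by terminal punctuation
# and followed by an ASCII capital letter.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def segment(text):
    """Divide text exhaustively into units wrapped as [S ... S].

    Every non-boundary character of the input ends up in exactly
    one unit; only the whitespace between units is dropped.
    """
    parts = _BOUNDARY.split(text.strip())
    return ["[S " + part + " S]" for part in parts if part]
```

For instance, `segment("This is a sentence. Well done.")` yields the two units shown in examples (9) and (10).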