Traditionally, sentences are deemed to be composed of major constituents known as clauses, which may be main clauses (not included in a superordinate clause) or embedded clauses (included in a superordinate clause), such as relative clauses or adverbial clauses. A main clause, unless it is coordinated, is equivalent to a simple sentence and does not need to be separately labelled. Embedded and coordinated clauses, on the other hand, need to be separately identified. We recommend that such units be identified in the annotation and labelled either as sentences (S) or as clauses (CL), according to the preference specified in the annotation scheme.
The reason for allowing choice here is that different theoretical preferences have to be accommodated. In some syntactic models, the `clause' category is not used (except informally), embedded clauses being marked by included [S] constituents. In other models, clauses are identified as such, even where they are coextensive with an independent sentence.
One solution which commends itself (and is employed in the Lancaster Treebank and the SUSANNE Corpus) is to retain [S ... S] as the delimiter of sentences, whether included or not, and also to use [S ... S] for the coordinated parts of a compound sentence, but to use `clause' labels [CL ... CL] for subordinate clauses. An example which illustrates this division is (12):
(12) | [S [S The distinction at issue is relatively clear S] , but [S closer examination reveals [CL that all is not quite so straightforward [CL as it seems CL] CL] S] . S] |
(13) | [S [S So far so good S] , but [S now consider gender in adjectives S] . S] |
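As a rough illustration of how the labelled bracketing in (12) and (13) can be read mechanically, the following Python sketch parses such a string into a nested structure and counts the constituents by label. It is not part of any treebank toolkit: the names parse_bracketing and count_labels are hypothetical, and the sketch assumes that tokens are separated by whitespace, that an opening bracket is written [LABEL, and that a closing bracket is written LABEL], as in the examples above.

```python
from collections import Counter

# Examples (12) and (13) from the text, without the surrounding layout.
EXAMPLE_12 = (
    "[S [S The distinction at issue is relatively clear S] , but "
    "[S closer examination reveals [CL that all is not quite so "
    "straightforward [CL as it seems CL] CL] S] . S]"
)
EXAMPLE_13 = "[S [S So far so good S] , but [S now consider gender in adjectives S] . S]"


def parse_bracketing(text):
    """Parse a labelled bracketing into a list of nodes.

    Each constituent is a (label, children) tuple; each word or
    punctuation mark is a plain string. Raises ValueError if a closing
    label does not match the most recently opened constituent, or if a
    constituent is left unclosed.
    """
    root = ("ROOT", [])  # artificial top node that collects the result
    stack = [root]
    for token in text.split():
        if token.startswith("[") and len(token) > 1:
            # Opening bracket such as "[S" or "[CL".
            node = (token[1:], [])
            stack[-1][1].append(node)
            stack.append(node)
        elif token.endswith("]") and len(token) > 1:
            # Closing bracket such as "S]" or "CL]".
            label = token[:-1]
            if len(stack) == 1 or stack[-1][0] != label:
                raise ValueError(f"unmatched closing bracket: {token}")
            stack.pop()
        else:
            # Ordinary word or punctuation token.
            stack[-1][1].append(token)
    if len(stack) != 1:
        raise ValueError(f"unclosed constituent: {stack[-1][0]}")
    return root[1]


def count_labels(nodes):
    """Count constituents by label, at every depth of nesting."""
    counts = Counter()
    for node in nodes:
        if isinstance(node, tuple):
            label, children = node
            counts[label] += 1
            counts.update(count_labels(children))
    return counts


if __name__ == "__main__":
    for name, example in (("(12)", EXAMPLE_12), ("(13)", EXAMPLE_13)):
        print(name, dict(count_labels(parse_bracketing(example))))
```

Under these assumptions, (12) contains three S constituents (the compound sentence and its two coordinated parts) and two CL constituents (the subordinate clauses), while (13) contains three S constituents and no CL constituent. A mismatched pair such as [S ... CL] is rejected with an error, which makes the sketch usable as a simple well-formedness check on annotated text.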