
Samples

 

It has become traditional to use a sampling technique in the assembly of corpora, typically following the Brown model summarised above. Samples are small in relation to texts such as newspapers, books and radio programmes, and of a constant size; hence, as has been pointed out above, they do not qualify as texts. A distinction is proposed here between a text corpus (or whole text corpus) and a samples corpus. We would like to see `whole text' as the default condition, thus classifying samples corpora as one of the categories of special corpora, although many corpora made up of small samples are still in use. It should, however, be realised that this feature is just a remnant of the early constraints on corpus building, and it confers no benefit on the corpus. The use of samples of a constant size lends only a spurious air of scientific method.