A comparable corpus is one which selects similar texts in more than one language or variety. There is as yet no agreement on the nature of the similarity, because there are very few examples of comparable corpora. One of the clearest is ICE, the International Corpus of English (Greenbaum 19??). Corpora of around one million words in each of many varieties of English around the world are being assembled following the same model, which prescribes genres and the target quantity of words to be gathered in each. Originally, the corpora were all to be gathered in the same year.
A comparable corpus makes it possible to compare different languages or varieties in similar circumstances of communication, while avoiding the inevitable distortion introduced by the translations of a parallel corpus.
Note: `Multilingual corpora'
At present there are no multilingual corpora apart from parallel and
comparable corpora; there are plenty of centres that have collected
text material in several languages, and some of these collections are corpora
in their own right. But unless the collections share common features of
selection, at least at the level of the comparable corpus, then they are just
text resources in different languages. It therefore seems unhelpful to use
the term `multilingual corpus'.