
Text type evaluation

 

Probabilistic taggers rely fundamentally on the similarity between training data and application data.

It seems obvious that a tagger trained on a ``standard'' corpus will be less accurate on texts with highly irregular sentence structures (for example, technical maintenance manuals). The question is, however, how large the structural (syntactic) differences are in the corpora used in ``real-life applications'', and how much these differences actually influence tagger accuracy.

We cannot provide a well-defined measure of text type difference; the experiments are therefore based on a subjective choice of texts which we assume differ in their syntactic structure.

The experiment consists of producing several ``specialised'' taggers, i.e. each tagger is trained on one specific text type only. The resulting taggers are then evaluated on test corpora which also correspond to different text types.
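The cross-evaluation design above can be sketched as follows. This is a minimal illustration, not the taggers actually used in the experiment: the toy corpora, tag names, and the most-frequent-tag ("unigram") model are all assumptions standing in for real training data and a full probabilistic tagger. The point is only the evaluation grid: every specialised tagger is scored on every text type.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpora (lists of (word, tag) pairs), one per text type.
# In the real experiment these would be e.g. a "standard" corpus versus
# technical maintenance manuals.
CORPORA = {
    "news":   [("the", "DET"), ("engine", "N"), ("runs", "V"), ("smoothly", "ADV")],
    "manual": [("remove", "V"), ("bolt", "N"), ("the", "DET"), ("engine", "N")],
}

def train_unigram_tagger(tagged_words):
    """Most-frequent-tag-per-word model, a stand-in for a probabilistic tagger."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def accuracy(model, tagged_words, default="N"):
    """Fraction of tokens tagged correctly; unknown words get a default tag."""
    correct = sum(model.get(w, default) == t for w, t in tagged_words)
    return correct / len(tagged_words)

# Cross-evaluation: each specialised tagger is tested on every text type.
taggers = {name: train_unigram_tagger(data) for name, data in CORPORA.items()}
for train_type, model in taggers.items():
    for test_type, data in CORPORA.items():
        print(f"trained on {train_type:6s} tested on {test_type:6s}: "
              f"{accuracy(model, data):.2f}")
```

The diagonal of the resulting grid (training and test type matching) gives the baseline accuracy; the off-diagonal cells show how much accuracy is lost when the text types differ.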