- Problem:
- There is more foreign material (FM) in the corpus than expected,
including foreign names of entities that are more than one item long.
Some of this material is integrated into German syntax (names),
some is not (film titles or translations). Proverbial sayings from
Latin, French or English are inserted and function as a
whole, but no single function can be assigned to their parts (because
we are not describing Latin, French or English parts of speech).
Is the length of the inclusion relevant? What does the language
model do with chains of FM?
- All foreign material had to be sorted into a special
class. By analogy with z.B./ADV: last/ADV but/ADV not/ADV least/ADV
- Tagging practice STTS:
- Names of cities, persons and institutions are
tagged as proper names (NE) where this is certain. Foreign common nouns that are
already lexicalized in German (Yoga, Joghurt, Jeans) are tagged as NN.
All other foreign material is tagged as FM.
- Der beliebte Film `` A/FM fish/FM called/FM
Wanda/NE ''.
- per/FM se/FM ist das kein Problem.
- der berühmte ``dedazo/FM'' (Fingerzeig) funktioniert
noch immer.
- Sie essen heute a/FM la/FM carte/FM
- er war damals schon lange persona/FM non/FM grata/FM
- er ist primus/FM inter/FM pares/FM
- Der spanische Titel `` Mi/FM querido/FM
Tim/NE Mix/NE ''
- und dann, last/FM but/FM not/FM
least/FM, gibt es ein gutes Mittagessen.
- Tests:
- I
- simple granularity test
- inclusion in the test on NN/NE.
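
The tagging practice described above, and the open question about chains of FM, can be illustrated with a minimal Python sketch. It is not part of the STTS tooling: the function names, the toy lexicon and the `known_names` parameter are assumptions made for illustration. The sketch applies the NE / NN / FM decision order to a single foreign token and extracts maximal runs of consecutive FM tokens from a sentence annotated in the `word/TAG` notation used in the examples.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical toy lexicon: foreign nouns treated as lexicalized in German.
# The three entries are the ones named in the guideline; a real list would be larger.
LEXICALIZED_NOUNS: Set[str] = {"Yoga", "Joghurt", "Jeans"}


def tag_foreign_token(token: str, known_names: Set[str]) -> str:
    """Apply the decision order: certain proper name, lexicalized noun, else FM."""
    if token in known_names:        # assumption: names are looked up in a given set
        return "NE"
    if token in LEXICALIZED_NOUNS:
        return "NN"
    return "FM"


def parse_tagged(text: str) -> List[Tuple[str, Optional[str]]]:
    """Split a sentence into (word, tag) pairs; untagged tokens get tag None."""
    tokens: List[Tuple[str, Optional[str]]] = []
    for item in text.split():
        word, sep, tag = item.rpartition("/")
        tag = tag.rstrip(".,;:!?")  # drop punctuation glued to the tag
        if sep and word and tag.isalpha() and tag.isupper():
            tokens.append((word, tag))
        else:
            tokens.append((item, None))
    return tokens


def fm_chains(tokens: List[Tuple[str, Optional[str]]]) -> List[List[str]]:
    """Return maximal runs of consecutive tokens tagged FM."""
    chains: List[List[str]] = []
    current: List[str] = []
    for word, tag in tokens:
        if tag == "FM":
            current.append(word)
        else:
            if current:
                chains.append(current)
            current = []
    if current:
        chains.append(current)
    return chains


if __name__ == "__main__":
    sentence = "und dann, last/FM but/FM not/FM least/FM, gibt es ein gutes Mittagessen."
    for chain in fm_chains(parse_tagged(sentence)):
        print(len(chain), " ".join(chain))
```

Counting chain lengths in this way over an annotated corpus would be one possible starting point for checking whether the length of a foreign inclusion matters; note that an NE token inside a title (as in the film title example) ends a chain under this definition.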