This section discusses the IBM Paris Treebank and the various Lancaster schemes. Although many changes have been made across the successive incarnations of the Lancaster syntactic annotation schemes, for our purposes they may be considered together. The Paris scheme (for French) was intended as a parallel to the Lancaster English annotation scheme, and can therefore be discussed alongside the Lancaster schemes.
Both schemes use a constituent structure analysis called skeleton parsing, a limited form of parsing that indicates the structure of a sentence in terms of its major constituents, i.e. sentence types, predicates, clauses and major phrase types. The constituents are marked using square brackets, which correspond to the phrase structure markers of tree diagrams. The schemes were developed with two main considerations in mind:
The corpora were annotated manually by a team of grammarians, using a screen editor program, which validates the parse only to the extent of checking for unbalanced brackets.
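The kind of validation described above can be sketched in a few lines. This is a minimal illustration only, not the editor program actually used by the annotators; the function name brackets_balanced is hypothetical. It checks that every opening bracket has a matching closing bracket and deliberately ignores whether the constituent labels agree.

```python
def brackets_balanced(parse: str) -> bool:
    """Return True if square brackets in a skeleton parse are balanced.

    Hypothetical helper: labels such as N or V& are ignored, so a parse
    like "[N ... V]" still passes this check.
    """
    depth = 0
    for ch in parse:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth < 0:  # a closing bracket with nothing open
                return False
    return depth == 0  # anything still open is an error


print(brackets_balanced("the [Nn enter Nn] key"))        # True
print(brackets_balanced("[N Files_NN2 N][V come_VV0"))   # False
```

A check of this kind catches a missing or surplus bracket, but leaves the choice and consistency of labels entirely to the grammarian.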
The labelled constituents of the UCREL Skeleton Parsing System are shown in table 3.6:
Unlabelled brackets may also be used in cases where a constituent is not one of the types for which a label is authorised, and the grammarian is convinced that the sequence of words is a constituent.
Most of the abovementioned labels represent categories that are easily identifiable and non-controversial. The tag `Nn' for a metalinguistic constituent was introduced in order to deal with the problem of computerese verbal-name constructions, often found in the IBM manuals being processed, e.g.
the [Nn enter Nn] key
Similar expressions will occur in other genres, and this constituent was therefore included in the general scheme, for example:
a [Nn rent a bike Nn] scheme
The French scheme developed at IBM Paris is almost identical to the scheme shown above, but the following labels were added:
A partially parsed sentence from the American Printing House for the Blind corpus is shown in table 3.7:
In this example, there is a cleft sentence within an adverbial clause, and this is marked by unlabelled brackets. The scheme makes no allowance for marking any more detail than that which is shown in the surface structure. This is only to be expected given the scheme designers' theory-neutral approach and their intention to be uncontroversial.
The following example shows a parsed sentence from the IBM Computer Manual Corpus:
M1154602 v [N Files_NN2 N][V[V& come_VV0 [P into_II [N the_AT print_NN1 queue_NN1 N]P]V&] and_CC [V+ either_LE [V&[V& match_VV0 [N[G a_AT1 printer_NN1 's_$ G] setup_NN1 N]V&] (_( [V+ get_VV0 [Tn printed_VVN Tn]V+] )_) V&] or_CC [V+[V& do_VD0 not_XX match_VVI V&] (_( [V+ wait_VV0 V+] )_) V+] V+]V] ._.
In this short sentence, complex verbal coordination is shown, exhibiting the use of the `&' and `+' signs for coordination.
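To make the notation concrete, the sketch below reads a skeleton-parsed string into a tree and checks that each closing label matches its opening label, which a bracket-balance check alone would not catch. It is an illustration under stated assumptions, not part of the UCREL tooling: the names Node, parse_skeleton and labels are hypothetical, labels are assumed to consist of letters optionally followed by `&' or `+', and a closing label is assumed to be separated from the preceding word by whitespace.

```python
import re
from dataclasses import dataclass, field

# Assumed token shapes: "[N" or "[" opens a constituent, "N]" or "]" closes
# one, and any other run of non-space characters is a word such as Files_NN2.
TOKEN = re.compile(
    r"(?P<open>\[(?:[A-Za-z]+[&+]?(?=[\s\[]|$))?)"
    r"|(?P<close>(?:[A-Za-z]+[&+]?)?\])"
    r"|(?P<word>[^\s\[\]]+)"
)


@dataclass
class Node:
    label: str                      # "" for an unlabelled bracket
    children: list = field(default_factory=list)


def parse_skeleton(text: str) -> Node:
    """Build a tree from a labelled bracketing; raise ValueError on errors."""
    root = Node("ROOT")
    stack = [root]
    for m in TOKEN.finditer(text):
        kind, tok = m.lastgroup, m.group()
        if kind == "open":
            node = Node(tok[1:])    # drop the leading "["
            stack[-1].children.append(node)
            stack.append(node)
        elif kind == "close":
            label = tok[:-1]        # drop the trailing "]"
            if len(stack) == 1:
                raise ValueError(f"unexpected closing bracket {tok!r}")
            if stack[-1].label != label:
                raise ValueError(
                    f"opened {stack[-1].label!r} but closed {label!r}"
                )
            stack.pop()
        else:
            stack[-1].children.append(tok)
    if len(stack) != 1:
        raise ValueError(f"unclosed constituent {stack[-1].label!r}")
    return root


def labels(node: Node):
    """Yield the label of every constituent below node, depth-first."""
    for child in node.children:
        if isinstance(child, Node):
            yield child.label
            yield from labels(child)


# The IBM Computer Manual Corpus sentence quoted above, without its reference.
SENTENCE = (
    "[N Files_NN2 N][V[V& come_VV0 [P into_II [N the_AT print_NN1 queue_NN1 N]P]V&] "
    "and_CC [V+ either_LE [V&[V& match_VV0 [N[G a_AT1 printer_NN1 's_$ G] setup_NN1 N]V&] "
    "(_( [V+ get_VV0 [Tn printed_VVN Tn]V+] )_) V&] "
    "or_CC [V+[V& do_VD0 not_XX match_VVI V&] (_( [V+ wait_VV0 V+] )_) V+] V+]V] ._."
)

tree = parse_skeleton(SENTENCE)
# Top-level constituents of the sentence, followed by the final punctuation.
print([c.label if isinstance(c, Node) else c for c in tree.children])
# Coordination labels encountered, in order of appearance.
print([lab for lab in labels(tree) if lab.endswith(("&", "+"))])
```

Keeping the `&' and `+' suffixes as part of the label, rather than stripping them, means the conjuncts of a coordination remain distinguishable in the resulting tree.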
Below are two example parsed sentences from the French corpus, the second of which illustrates the labelling of coordinated sentences, which follows the Lancaster approach for English:
[N Vous_PPSA5MS N] [V accedez_VINIP5 [P a_PREPA [N cette_DDEMFS session_NCOFS N] P] [Pv a_PREP31 partir_PREP32 de_PREP33 [N la_DARDFS fenetre_NCOFS [A Gestionnaire_AJQFS [P de_PREPD [N taches_NCOFP N] P] A] N] Pv] V] ._.
[Z [Z& [N L' industrie N] [V n'EG a pas reussi [P [Vi etablir [N des prix [A satisfaisants A] N] [Pv pour [N ces grains N] Pv] Vi] P] V] Z&] , mais [Z+ [N le ministre N] [N lui N] , [V a reussi V] Z+] Z] .