This section gives a short survey of the second phase of syntactic annotation of the UPenn Treebank Project, as a summary of the publications Marcus et al. (1993) and Marcus et al. (1994).
The first phase of the UPenn Treebank project (November 1989 to December 1992) produced 4.5 million words of text, all tagged for part of speech and two-thirds of it skeletally bracketed. The text was tagged and parsed automatically (using the Fidditch partial parser) and then corrected by hand. The syntactic analysis used a modified form of the Lancaster Treebank approach, with a context-free structure.
The main goals of the second phase, which began in 1993, were the following:
The following issues were to be addressed in phase II of the project:
Example: A predicate is always either
(S (NP-SBJ your safety belt) (VP is (NP-PRD your friend)))
(S (NP-SBJ I) (VP consider (S (NP-SBJ Kris) (NP-PRD a fool))))
(SBARQ (WHNP-1 What) (SQ is (NP-SBJ Tim) (VP eating (NP *T*-1))) ?)
Some examples are shown below of the use of null elements:
(S (NP-SBJ-1 Chris) (VP wants (S (NP-SBJ *-1) (VP to (VP throw (NP the ball))))))
(S (NP-SBJ Ford) (VP persuaded (NP-1 Zaphod) (S (NP-SBJ *-1) (VP to (VP run (PP-CLR for (NP president)))))))
(S (NP-SBJ-1 Zaphod) (VP promised (NP Ford) (S (NP-SBJ *-1) (VP to (VP run (PP-CLR for (NP president)))))))
(S (NP-SBJ-3 Everyone) (VP seems (S (NP-SBJ *-3) (VP to (VP dislike (NP Drew Barrymore))))))
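The coindexing in the examples above can be followed mechanically. The following is a minimal sketch, not part of the Treebank tooling: it assumes one well-formed bracketing per string, and the helper names (`parse`, `words`, `antecedents`, `resolve`) are hypothetical. It reads a bracketing into nested lists and pairs each indexed null element (`*-n` or `*T*-n`) with the text of the constituent whose label ends in the same index.

```python
import re

def parse(text):
    """Parse one bracketed tree into nested lists: [label, child, ...]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])   # a word or a null element
                pos += 1
        pos += 1                      # skip ")"
        return [label] + children

    return node()

def words(tree):
    """Yield the leaf tokens under a node, left to right."""
    for child in tree[1:]:
        if isinstance(child, list):
            yield from words(child)
        else:
            yield child

def antecedents(tree, table=None):
    """Map each numeric index at the end of a label to that node's text."""
    table = {} if table is None else table
    m = re.search(r"-(\d+)$", tree[0])
    if m:
        table[m.group(1)] = " ".join(words(tree))
    for child in tree[1:]:
        if isinstance(child, list):
            antecedents(child, table)
    return table

def resolve(tree):
    """Return (null element, antecedent text) pairs for indexed null elements."""
    table = antecedents(tree)
    pairs = []
    for leaf in words(tree):
        m = re.fullmatch(r"\*(?:T\*)?-(\d+)", leaf)
        if m:
            pairs.append((leaf, table.get(m.group(1))))
    return pairs

raising = "(S (NP-SBJ-3 Everyone) (VP seems (S (NP-SBJ *-3) (VP to (VP dislike (NP Drew Barrymore))))))"
question = "(SBARQ (WHNP-1 What) (SQ is (NP-SBJ Tim) (VP eating (NP *T*-1))) ?)"
print(resolve(parse(raising)))    # [('*-3', 'Everyone')]
print(resolve(parse(question)))   # [('*T*-1', 'What')]
```

An unindexed `*` is deliberately skipped, since it has no antecedent inside the sentence.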
The context-free mechanism of Phase I led to the trapping problem, which arises when a sentence-level adverb is followed by verb complements.
With context free notation, one can do one of the following, with their respective consequences:
(S (NP-SBJ Chris) (VP knew (SBAR yesterday that (S (NP-SBJ Terry) (VP would (VP catch (NP the ball)))))))
(S (NP-SBJ Chris) (VP knew) yesterday (SBAR that (S (NP-SBJ Terry) (VP would (VP catch (NP the ball))))))
In UPenn II, discontinuous elements are called pseudoattached:
(S (NP-SBJ Chris) (VP knew (SBAR *ICH*-1) (NP-TMP yesterday) (SBAR-1 that (S (NP-SBJ Terry) (VP would (VP catch (NP the ball)))))))
Even given the context, human annotators cannot resolve the ambiguity.
(S (NP-SBJ I) (VP saw (NP (NP the man) (PP *PPA*-1) (PP-CLR-1 with (NP the telescope)))))
The same constituent appears to have been shifted out of both conjuncts.
(S But (NP-SBJ-2 our outlook) (VP (VP has (VP been (ADJP *RNR*-1))) , and (VP continues (S (NP-SBJ *-2) (VP to (VP be (ADJP *RNR*-1))))) , (ADJP-1 defensive)))
(S (NP-SBJ (NP It) (S *EXP*-1)) (VP is (NP a pleasure)) (S-1 (NP-SBJ *) (VP to (VP teach (NP her)))))
pleasure(teach(*someone*, her))
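The pseudo-attachment markers in the examples above (`*ICH*`, `*PPA*`, `*RNR*`, `*EXP*`) all point, via a shared index, to the constituent that is interpreted at the marked position. The following is a minimal sketch of how such links could be listed directly from the bracketed string. The function name and the regular expressions are assumptions made for illustration, and the label pattern assumes upper-case categories and tags followed by a numeric index.

```python
import re

# A pseudo-attachment marker such as *ICH*-1 or *RNR*-1.
PSEUDO = re.compile(r"\*(ICH|PPA|RNR|EXP)\*-(\d+)")
# An opening bracket whose label ends in a numeric index, e.g. "(SBAR-1 ".
INDEXED_LABEL = re.compile(r"\(([A-Z]+(?:-[A-Z]+)*)-(\d+)\s")

def pseudo_attachments(bracketing):
    """Return (marker kind, index, label of the coindexed constituent) triples."""
    targets = {idx: label for label, idx in INDEXED_LABEL.findall(bracketing)}
    return [(kind, idx, targets.get(idx))
            for kind, idx in PSEUDO.findall(bracketing)]

extraposed = ("(S (NP-SBJ Chris) (VP knew (SBAR *ICH*-1) (NP-TMP yesterday) "
              "(SBAR-1 that (S (NP-SBJ Terry) (VP would (VP catch (NP the ball)))))))")
print(pseudo_attachments(extraposed))   # [('ICH', '1', 'SBAR')]
```

Plain traces such as `*-2` are ignored here, because they are ordinary null elements rather than pseudo-attachments.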
As it is very difficult to determine a set of underlying semantic roles, the UPenn II Project restricted itself to the clearly distinguishable semantic roles listed below. The given list mostly relates to adjuncts.
(S (NP-SBJ the process) (VP will (VP take (NP (QP as many as six) months) (S-CLR (NP-SBJ *) (VP to (VP complete))))))
(VP associate (NP snow) (PP-CLR with (NP winter)))
(VP taking (NP-CLR care) (PP-CLR of (NP the problem)))
Examples are given below for the tags -DTV, -PRD, -TPC, -CLF and -PRP:
(S (NP-SBJ Aristotle) (VP gave (NP the book) (PP-DTV to (NP Plato))))
(S (NP-SBJ Aristotle) (VP gave (NP Plato) (NP the book)))
Non-VP predicates
(SQ Was (NP-SBJ he) (ADVP-TMP ever) (ADJP-PRD successful) ?)
(SINV and (ADVP-PRD-TPC-1 so) (VP did (ADVP-PRD *T*-1)) (NP-SBJ the hippopotamuses))
(S (PP-TPC-12 Of (NP (NP the 500 barbers) (PP-LOC in (NP Philadelphia)))) , (NP-SBJ (NP (QP only 10)) (PP *T*-12)) (VP know (SBAR (WHNP-13 what) (S (NP-SBJ they) (VP are (VP doing (NP *T*-13)))))))
(S-CLF (PP-TMP In (NP the past)) , (NP-SBJ it) (VP has (VP been (NP-PRD-2 the husband) (SBAR (WHNP-1 who) (S (NP-SBJ *T*-1) (VP has (VP been (ADJP-PRD-3 dominant))))))))
(S (NP-SBJ-1 (NP activity) (PP-LOC at (NP (NP a number) (PP of (NP brokerage houses))))) (VP was (VP curtailed (NP *-1) (PP-PRP as (NP (NP a result) (PP of (NP the earthquake)))))))
(S (S (NP-SBJ-1 Mary) (VP likes (NP-2 Bach))) and (S (NP-SBJ=1 Susan) , (NP=2 Beethoven)))
like(Mary, Bach)
like(Susan, Beethoven)
(S (S (NP-SBJ I) (VP eat (NP-1 breakfast) (PP-TMP-2 in (NP the morning)))) and (S (NP=1 lunch) (PP-TMP=2 in (NP the afternoon))))
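In the gapping examples above, a constituent marked `=n` in the gapped conjunct corresponds to the constituent indexed `-n` in the complete conjunct. The following is a minimal sketch of how node labels could be decomposed and the two sides aligned. The names `split_label` and `match_gapped` are hypothetical, and the pattern assumes labels of the form category, optional function tags, optional `-n` index, optional `=n` gap index.

```python
import re

LABEL = re.compile(
    r"^(?P<cat>[A-Z]+)"          # syntactic category, e.g. NP
    r"(?P<tags>(?:-[A-Z]+)*)"    # function tags, e.g. -SBJ or -PRD-TPC
    r"(?:-(?P<index>\d+))?"      # coindex, e.g. -1
    r"(?:=(?P<gap>\d+))?$"       # gapping index, e.g. =1
)

def split_label(label):
    """Split a node label into category, function tags, index and gap index."""
    m = LABEL.match(label)
    if m is None:
        raise ValueError(f"unrecognised label: {label!r}")
    return {
        "category": m.group("cat"),
        "tags": [t for t in m.group("tags").split("-") if t],
        "index": m.group("index"),
        "gap": m.group("gap"),
    }

def match_gapped(template_labels, gapped_labels):
    """Map each '=n' label to the template label that carries index n."""
    by_index = {}
    for label in template_labels:
        idx = split_label(label)["index"]
        if idx is not None:
            by_index[idx] = label
    return {g: by_index.get(split_label(g)["gap"]) for g in gapped_labels}

print(split_label("ADVP-PRD-TPC-1"))
print(match_gapped(["NP-SBJ-1", "NP-2"], ["NP-SBJ=1", "NP=2"]))
# {'NP-SBJ=1': 'NP-SBJ-1', 'NP=2': 'NP-2'}
```

With such a mapping, the gapped conjunct can be read against the structure of the complete one, which is how the glosses like(Mary, Bach) and like(Susan, Beethoven) are obtained.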
However, there is no recovery of structure outside the single sentence concerned.
Who threw the ball?
(FRAG (NP Chris) , (NP-TMP yesterday))
What is Tim eating?
(FRAG (NP-SBJ Mary Ann) (VP thinks (SBAR 0 (FRAG (NP chocolate)))))