CSA3050: NLP Algorithms, November 2004. Sentence Parsing II: Top Down, Bottom-Up, Left Corner, BUP Implementation in Prolog.


1 Sentence Parsing 2
Top Down, Bottom-Up, Left Corner, BUP Implementation in Prolog

2 Sources
Jurafsky & Martin, Chapter 10
Covington, Chapter 6

3 Derivation
top down, left-to-right, depth first

4 Bottom Up Filtering
We know the current input word must serve as the first word in the derivation of the unexpanded node the parser is currently processing. Therefore the parser should not consider any grammar rule for which the current word cannot serve as the "left corner". The left corner is the first preterminal node along the left edge of a derivation.

5 Left Corner
The node marked Verb is a left corner of VP.

6 Left Corner
B is a left corner of A iff A ⇒* Bα for non-terminal A, pre-terminal B and symbol string α.
Possible left corners of all non-terminal categories can be determined in advance and placed in a table.

7 Left Corner (Operational Definition)
A nonterminal B is a left-corner of another nonterminal A if:
– A = B (reflexive case);
– there exists a rule A → Bα for non-terminal A, pre-terminal B and symbol string α (immediate case);
– there exists a rule A → Cα such that B is a left-corner of C (transitive case).

8 DCG-style Grammar/Lexicon
    s --> np, vp.
    s --> aux, np, vp.
    s --> vp.
    np --> det, nom.
    nom --> noun.
    nom --> noun, nom.
    nom --> nom, pp.
    pp --> prep, np.
    np --> pn.
    vp --> v.
    vp --> v, np.
What are the left corners of S?
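One way to explore the question is to compute the left-corner relation directly from the operational definition. The following is a minimal sketch, assuming SWI-Prolog and assuming the grammar is recoded as rule/2 facts (the format introduced later in the deck). The predicate names left_corner/2, lc/3 and left_corners/2 are illustrative and not part of the original slides. The Seen list is an added guard so that the left-recursive rule nom --> nom, pp does not cause looping.

```prolog
% Slide 8 grammar, recoded as rule(Mother, Daughters) facts (an assumption).
rule(s,   [np, vp]).
rule(s,   [aux, np, vp]).
rule(s,   [vp]).
rule(np,  [det, nom]).
rule(np,  [pn]).
rule(nom, [noun]).
rule(nom, [noun, nom]).
rule(nom, [nom, pp]).
rule(pp,  [prep, np]).
rule(vp,  [v]).
rule(vp,  [v, np]).

% left_corner(?B, +A): B is a left corner of A
% (reflexive, immediate or transitive case).
left_corner(B, A) :- lc(A, [A], B).

% lc(+A, +Seen, ?B): Seen holds the categories already on the current path.
lc(A, _, A).                        % reflexive case
lc(A, Seen, B) :-
    rule(A, [C|_]),                 % C is the leftmost daughter of A
    \+ memberchk(C, Seen),          % skip categories already being explored
    lc(C, [C|Seen], B).             % immediate and transitive cases

% left_corners(+A, -Bs): sorted, duplicate-free list of left corners of A.
left_corners(A, Bs) :- setof(B, left_corner(B, A), Bs).
```

Under these assumptions, the query ?- left_corners(s, Bs). collects non-terminals as well as pre-terminals, because the operational definition includes the reflexive and transitive cases. A table restricted to pre-terminals could be obtained by filtering out every category that appears as the mother of some rule.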

9 Example of Left Corner Table
Category    Left Corners
S           Det, Proper-Noun, Aux, Verb
NP          Det, Proper-Noun
Nominal     Noun
VP          Verb

10 How to use the Left Corner Table
If attempting to parse category A, only consider rules A → Bα for which category(current input) ∈ LeftCorners(B).
    S → NP VP
    S → Aux NP VP
    S → VP
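The filtering step can be sketched as an extra test inside a top-down parser. This is a minimal illustration under stated assumptions, not the course's own code: it assumes SWI-Prolog, a toy grammar in rule/2 and word/2 form, and a hand-built table left_corners/2 listing pre-terminal left corners. The predicate viable/2 is a hypothetical helper that performs the table lookup.

```prolog
% Toy grammar and lexicon (assumed for illustration).
rule(s,  [np, vp]).
rule(np, [d, n]).
rule(vp, [v]).
rule(vp, [v, np]).

word(d, the).
word(n, dog).
word(n, cat).
word(v, chases).

% Hand-built left-corner table for the toy grammar (pre-terminals only).
left_corners(s,  [d]).
left_corners(np, [d]).
left_corners(vp, [v]).
left_corners(d,  [d]).
left_corners(n,  [n]).
left_corners(v,  [v]).

% viable(+B, +Input): the category of the current input word
% is a member of LeftCorners(B).
viable(B, [W|_]) :-
    left_corners(B, LCs),
    word(K, W),
    memberchk(K, LCs), !.

% Top-down parser with the bottom-up filter applied before expanding a rule.
parse(C, [Word|S], S) :- word(C, Word).
parse(C, S1, S) :-
    rule(C, [B|Bs]),
    viable(B, S1),                  % discard rules the current word cannot start
    parse_list([B|Bs], S1, S).

parse_list([], S, S).
parse_list([C|Cs], S1, S) :-
    parse(C, S1, S2),
    parse_list(Cs, S2, S).
```

With this filter, a query such as ?- parse(s, [chases, the, dog], []). fails at the first rule without attempting to expand np, since v is not among the left corners of np.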

11 Prolog Implementations
Top Down: depth first recursive descent
Bottom Up: shift/reduce
Left Corner BUP

12 Top Down Implementation in Prolog
The parser takes the form of a predicate parse(C,S1,S): parse a constituent C starting with input string S1 and ending with input string S.
    ?- parse(s,[the,dog,barked],[]).
If C is a pre-terminal category, use the lexicon to check that the current input word has that category.
Otherwise expand C using the grammar rules and parse the right-hand-side constituents.

13 Recoding the Grammar/Lexicon
    % Grammar
    rule(s,[np,vp]).
    rule(np,[d,n]).
    rule(vp,[v]).
    rule(vp,[v,np]).

    % Lexicon
    word(d,the).
    word(n,dog).
    word(n,cat).
    word(n,dogs).
    word(n,cats).
    word(v,chase).
    word(v,chases).

14 Top Down Parser
    % Pre-terminal: the next word has category C.
    parse(C,[Word|S],S) :-
        word(C,Word).
    % Otherwise: expand C with a rule and parse its daughters in order.
    parse(C,S1,S) :-
        rule(C,Cs),
        parse_list(Cs,S1,S).

    parse_list([],S,S).
    parse_list([C|Cs],S1,S) :-
        parse(C,S1,S2),
        parse_list(Cs,S2,S).
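The recognizer above only answers yes or no. As a sketch of how it might be extended to return a parse tree, the variant below adds one extra argument that is built with =.. as each constituent completes. The four-argument parse/4 and parse_list/4 are assumptions introduced for illustration, and the grammar is a reduced copy of the slide 13 grammar so that the block is self-contained.

```prolog
% Grammar and lexicon in the rule/2 and word/2 format.
rule(s,  [np, vp]).
rule(np, [d, n]).
rule(vp, [v]).
rule(vp, [v, np]).

word(d, the).
word(n, dog).
word(n, cat).
word(v, chases).

% parse(+C, -Tree, +S1, -S): as parse/3, but also returns a tree
% whose functor is the category and whose arguments are the subtrees.
parse(C, Tree, [Word|S], S) :-
    word(C, Word),
    Tree =.. [C, Word].             % e.g. d(the)
parse(C, Tree, S1, S) :-
    rule(C, Cs),
    parse_list(Cs, Trees, S1, S),
    Tree =.. [C|Trees].             % e.g. np(d(the), n(dog))

% parse_list(+Cs, -Trees, +S1, -S): parse each category, collecting trees.
parse_list([], [], S, S).
parse_list([C|Cs], [T|Ts], S1, S) :-
    parse(C, T, S1, S2),
    parse_list(Cs, Ts, S2, S).
```

A query such as ?- parse(s, T, [the, dog, chases, the, cat], []). binds T to a term mirroring the derivation. Like the original recognizer, this sketch would loop on a left-recursive rule, because the rule is predicted before any input is consumed.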

15 Shift/Reduce Algorithm
Two data structures:
– input string
– stack
Repeat until input is exhausted:
– Shift word to stack
– Reduce stack using grammar and lexicon until no further reductions
Unlike top down, the algorithm does not require the category to be specified in advance. It simply finds all possible trees.

16 Shift/Reduce Operation
Step  Action   Stack       Input
0     (start)  (empty)     the dog barked
1     shift    the         dog barked
2     reduce   d           dog barked
3     shift    dog d       barked
4     reduce   n d         barked
5     reduce   np          barked
6     shift    barked np
7     reduce   v np
8     reduce   vp np
9     reduce   s

17 Shift/Reduce Implementation in Prolog
    parse(S,Res) :- sr(S,[],Res).

    % Shift one word, reduce as far as possible, then continue.
    sr(S,Stk,Res) :-
        shift(Stk,S,NewStk,S1),
        reduce(NewStk,RedStk),
        sr(S1,RedStk,Res).
    % Input exhausted: the stack is the result.
    sr([],Res,Res).

    shift(X,[H|Y],[H|X],Y).

    reduce(Stk,RedStk) :-
        brule(Stk,Stk2),
        reduce(Stk2,RedStk).
    reduce(Stk,Stk).

    % grammar
    brule([vp,np|X],[s|X]).
    brule([n,d|X],[np|X]).
    brule([np,v|X],[vp|X]).

    % interface to lexicon
    brule([Word|X],[C|X]) :- word(C,Word).

18 Shift/Reduce Operation
Words are shifted onto the front of the stack, so the stack ends up in reverse order.
The reduce step is simplified if we also store the rules backward, so that the rule s → np vp is stored as the fact brule([vp,np|X],[s|X]).
The term [a,b|X] matches any list whose first and second elements are a and b respectively.
The first argument directly matches the stack to which this rule applies.
The second argument is what the stack becomes after reduction.
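Rather than writing each backward rule by hand, the reversal can be derived from forward rule/2 facts at parse time. The sketch below is one possible arrangement, assuming SWI-Prolog (for reverse/2 and append/3) and a toy grammar and lexicon. The rule-based brule/2 clause is an assumption for illustration and replaces the hand-written brule facts; the rest follows the shift/reduce code in the slides.

```prolog
% Forward grammar and lexicon (assumed toy data).
rule(s,  [np, vp]).
rule(np, [d, n]).
rule(vp, [v]).
rule(vp, [v, np]).

word(d, the).
word(n, dog).
word(n, cat).
word(v, chases).

parse(S, Res) :- sr(S, [], Res).

sr(S, Stk, Res) :-
    shift(Stk, S, NewStk, S1),
    reduce(NewStk, RedStk),
    sr(S1, RedStk, Res).
sr([], Res, Res).

shift(X, [H|Y], [H|X], Y).

reduce(Stk, RedStk) :-
    brule(Stk, Stk2),
    reduce(Stk2, RedStk).
reduce(Stk, Stk).

% brule(+Stack, -NewStack): derive the backward rule from rule/2.
% The reversed right-hand side must match the top of the stack.
brule(Stack, [C|Rest]) :-
    rule(C, RHS),
    reverse(RHS, Rev),
    append(Rev, Rest, Stack).
% Interface to the lexicon: replace a word by its category.
brule([Word|X], [C|X]) :-
    word(C, Word).
```

Because reduce/2 may also stop early, the parser explores several stacks on backtracking. Asking for a specific result, as in ?- parse([the, dog, chases, the, cat], [s])., selects the analysis in which the whole input reduces to a single s.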

19 Left Corner Parsing
Key Idea: accept a word, identify the constituent it marks the beginning of, and parse the rest of the constituent top down.
Main Advantages:
– Like a bottom-up parser, it can handle left recursion without looping, since it starts each constituent by accepting a word from the input string.
– Like a top-down parser, it is always expecting a particular category for which only a few of the grammar rules are relevant. It is therefore more efficient than a plain shift-reduce algorithm.

20 Left Corner Algorithm
To parse a constituent of type C:
1. Accept a word W from input and determine K, its category.
2. Complete C:
– If K=C, exit with success; otherwise
– Find a constituent whose expansion begins with K. Call that CC. For instance, if K=d (determiner), CC could be np, since we have rule(np,[d,n]).
– Recursively left-corner parse all the remaining elements of the expansion of CC (in this case, [n]).
– Put CC in place of K, and return to step 2.

21 Left Corner Implementation
    % Accept a word W of category K, then complete C from K.
    parse(C,[W|Rest],P) :-
        word(K,W),
        complete(K,C,Rest,P).

    parse_list([],P,P).
    parse_list([C|Cs],P1,P) :-
        parse(C,P1,P2),
        parse_list(Cs,P2,P).

    complete(C,C,P,P).    % if K=C, do nothing
    complete(K,C,P1,P) :-
        rule(CC,[K|Rest]),         % CC is a constituent beginning with K
        parse_list(Rest,P1,P2),    % parse the rest of CC
        complete(CC,C,P2,P).       % put CC in place of K

22 Trace of Left Corner
    Call: ( 7) parse(np, [the, cat], []) ? creep
    Call: ( 8) word(_L128, the) ? creep
    Exit: ( 8) word(d, the) ? creep
    Call: ( 8) complete(d, np, [cat], []) ? creep
    Call: ( 9) rule(_L153, [d|_G306]) ? creep
    Exit: ( 9) rule(np, [d, n]) ? creep
    Call: ( 9) parse_list([n], [cat], _L155) ? creep
    Call: ( 10) parse(n, [cat], _L181) ? creep
    Call: ( 11) word(_L196, cat) ? creep
    Exit: ( 11) word(n, cat) ? creep
    Call: ( 11) complete(n, n, [], _L181) ? creep
    Exit: ( 11) complete(n, n, [], []) ? creep
    Exit: ( 10) parse(n, [cat], []) ? creep
    Call: ( 10) parse_list([], [], _L155) ? creep
    Exit: ( 10) parse_list([], [], []) ? creep
    Exit: ( 9) parse_list([n], [cat], []) ? creep
    Call: ( 9) complete(np, np, [], []) ? creep
    Exit: ( 9) complete(np, np, [], []) ? creep
    Exit: ( 8) complete(d, np, [cat], []) ? creep
    Exit: ( 7) parse(np, [the, cat], []) ? creep
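To illustrate the claim that a left-corner parser copes with left recursion, the sketch below runs the same parser over a grammar containing the left-recursive rule np → np conj np. The grammar and lexicon are assumptions chosen for the illustration; the parser clauses follow the left-corner implementation shown in the slides.

```prolog
% Toy grammar with a left-recursive rule for coordinated noun phrases.
rule(s,  [np, vp]).
rule(np, [d, n]).
rule(np, [np, conj, np]).           % left recursive
rule(vp, [v, np]).
rule(vp, [v]).

word(d, the).
word(n, dog).
word(n, cat).
word(n, elephant).
word(v, chase).
word(conj, and).

% Left-corner parser.
parse(C, [W|Rest], P) :-
    word(K, W),
    complete(K, C, Rest, P).

parse_list([], P, P).
parse_list([C|Cs], P1, P) :-
    parse(C, P1, P2),
    parse_list(Cs, P2, P).

complete(C, C, P, P).
complete(K, C, P1, P) :-
    rule(CC, [K|Rest]),
    parse_list(Rest, P1, P2),
    complete(CC, C, P2, P).
```

The left-recursive rule is only tried after an np has already been completed, and it must then consume a conj from the input, so each use of the rule shortens the remaining string. A plain top-down parser given the same rule would instead predict np from np indefinitely without consuming any input.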

23 BUP: Bottom Up Parser (Matsumoto et al. 1983)
Each PS rule goes into Prolog as a clause whose head is not the mother node but the leftmost daughter. The rule np → d n pp is translated as:
    d(C,S1,S) :- parse(n,S1,S2), parse(pp,S2,S3), np(C,S3,S).
i.e. if you have just completed a d, parse an n, then a pp, then call the procedure for a completed np.
In addition to a clause for each PS rule, BUP needs a terminating clause for every kind of constituent, e.g.
    np(np,S,S).
i.e. if you have just accepted an np and np is what you are looking for, you are done.
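The translation from a PS rule to a BUP clause is mechanical, so it can be sketched as a small Prolog program. The predicates translate/2, daughters/5 and terminating/2 below are hypothetical helpers, not part of the original BUP system; they take a rule in rule/2 form and build the corresponding clause as a term, which could then be asserted or written to a file.

```prolog
% translate(+Rule, -Clause): build the BUP clause for a PS rule.
% The head is named after the leftmost daughter; the last goal calls
% the procedure for the completed mother.
translate(rule(Mother, [Left|Rest]), (Head :- Body)) :-
    Head =.. [Left, C, S0, S],
    daughters(Rest, S0, S1, Final, Body),
    Final =.. [Mother, C, S1, S].

% daughters(+Ds, +S0, -S, +Final, -Body): chain one parse/3 goal per
% remaining daughter, threading the string positions, ending with Final.
daughters([], S, S, Final, Final).
daughters([D|Ds], S0, S, Final, (parse(D, S0, S1), Rest)) :-
    daughters(Ds, S1, S, Final, Rest).

% terminating(+Cat, -Fact): the terminating clause for a category,
% e.g. np(np,S,S).
terminating(Cat, Fact) :-
    Fact =.. [Cat, Cat, S, S].
```

For example, ?- translate(rule(np,[d,n,pp]), Clause). yields a clause of the same shape as the d/3 clause given above, up to variable naming. A unit rule such as rule(s,[vp]) produces a body consisting of the single goal for the mother.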

24 BUP - Remarks
BUP is efficient because the hard part of the search (what to do with a newly completed leftmost daughter) is handled by Prolog's fastest search mechanism: finding a clause given the predicate.

25 BUP Implementation - Parser
    % parse(+C,+S1,-S)
    % Parse a constituent of category C
    % starting with input string S1 and
    % ending up with input string S.
    parse(C,S1,S) :-
        word(W,S1,S2),       % accept a word; W is its category
        P =.. [W,C,S2,S],    % build the goal W(C,S2,S)
        call(P).

26 BUP Implementation - Rules
    % PS-rules and terminating clauses
    np(C,S1,S) :- parse(vp,S1,S2), s(C,S2,S).                     % S --> NP VP
    np(C,S1,S) :- parse(conj,S1,S2), parse(np,S2,S3), np(C,S3,S). % NP --> NP Conj NP
    np(np,X,X).

    d(C,S1,S) :- parse(n,S1,S2), np(C,S2,S).                      % NP --> D N
    d(d,X,X).

    v(C,S1,S) :- parse(np,S1,S2), vp(C,S2,S).                     % VP --> V NP
    v(C,S1,S) :- parse(np,S1,S2), parse(pp,S2,S3), vp(C,S3,S).    % VP --> V NP PP
    v(v,X,X).

    p(C,S1,S) :- parse(np,S1,S2), pp(C,S2,S).                     % PP --> P NP
    p(p,X,X).

    % Terminating clauses for all other categories
    s(s,X,X).
    vp(vp,X,X).
    pp(pp,X,X).
    n(n,X,X).
    conj(conj,X,X).

27 BUP Implementation - Lexicon
    % Lexicon
    word(conj,[and|X],X).
    word(p,[near|X],X).
    word(d,[the|X],X).
    word(n,[dog|X],X).
    word(n,[dogs|X],X).
    word(n,[cat|X],X).
    word(n,[cats|X],X).
    word(n,[elephant|X],X).
    word(n,[elephants|X],X).
    word(v,[chase|X],X).
    word(v,[chases|X],X).
    word(v,[see|X],X).
    word(v,[sees|X],X).
    word(v,[amuse|X],X).
    word(v,[amuses|X],X).

28 Principles for success
Left recursive structures must be found, not predicted.
Empty categories must be predicted, not found.
An alternative way to fix things is to transform the grammar into an equivalent grammar.
– Grammar transformations can fix both left-recursion and epsilon productions.
– But then you parse the same language but with different trees.
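As an illustration of the grammar-transformation option, the sketch below rewrites the left-recursive pair nom → nom pp and nom → noun using the standard scheme A → βA', A' → αA' | ε, so that an ordinary top-down DCG can parse it. The non-terminal nom_tail and the small lexicon are assumptions introduced for the example. The transformed grammar contains an epsilon production, which the top-down DCG predicts rather than finds.

```prolog
% Original (left recursive):  nom --> nom, pp.   nom --> noun.
% Transformed (right recursive, with an empty production):
np       --> det, nom.
nom      --> noun, nom_tail.
nom_tail --> [].
nom_tail --> pp, nom_tail.
pp       --> prep, np.

% Toy lexicon (assumed).
det  --> [the].
noun --> [dog].
noun --> [cat].
noun --> [garden].
prep --> [near].
```

A query such as ?- phrase(np, [the, dog, near, the, cat]). now terminates under Prolog's depth-first strategy. The accepted strings are unchanged, but the resulting structure is right-branching through nom_tail rather than left-branching through nom, which is the "different trees" cost noted in the slide.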

