CS626-460: Language Technology for the Web/Natural Language Processing Pushpak Bhattacharyya CSE Dept., IIT Bombay Constituent Parsing and Algorithms (with.

1 CS626-460: Language Technology for the Web/Natural Language Processing Pushpak Bhattacharyya CSE Dept., IIT Bombay Constituent Parsing and Algorithms (with major contributions from Dr. Rajat Mohanty)

2 Syntax Syntax is the study of the combination of words into phrases, clauses and sentences. Syntax describes how sentences and their constituents are structured.

3 Grammar: A finite set of rules that generates all and only the sentences of a language, and that assigns an appropriate structural description to each one.

4 Grammatical Analysis Techniques: Two main devices. (1) Breaking up a String: Sequential, Hierarchical, Transformational. (2) Labeling the Constituents: Morphological, Categorial, Functional.

5 Hierarchical Breaking up and Categorial Labeling: Poor John ran away. [S [NP [A Poor] [N John]] [VP [V ran] [Adv away]]]
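The hierarchical break-up with categorial labels on this slide can be held in a small nested data structure. The following is a minimal sketch, assuming a tuple encoding `(label, child, ...)` with `(category, word)` at the leaves; the helper names `bracket` and `leaves` are illustrative and not part of the lecture.

```python
# A constituent is (label, child1, child2, ...); a leaf is (category, word).
tree = ("S",
        ("NP", ("A", "Poor"), ("N", "John")),
        ("VP", ("V", "ran"), ("Adv", "away")))

def is_leaf(node):
    """A leaf holds exactly one child, and that child is the word itself."""
    return len(node) == 2 and isinstance(node[1], str)

def bracket(node):
    """Render the tree as a labeled bracketing string."""
    if is_leaf(node):
        return f"[{node[0]} {node[1]}]"
    return f"[{node[0]} " + " ".join(bracket(c) for c in node[1:]) + "]"

def leaves(node):
    """Read the words back off the tree, left to right."""
    if is_leaf(node):
        return [node[1]]
    return [w for child in node[1:] for w in leaves(child)]

print(bracket(tree))
print(" ".join(leaves(tree)))
```

Reading the leaves left to right gives back the original word string, which is a quick sanity check on any hand-built tree.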

6 Hierarchical Breaking up and Functional Labeling: Immediate Constituent (IC) Analysis. Construction types in terms of the function of the constituents: Predication (subject + predicate); Modification (modifier + head); Complementation (verbal + complement); Subordination (subordinator + dependent unit); Coordination (independent unit + coordinator).

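7 An Example: In the morning, the sky looked much brighter. S = Modifier [In the morning] + Head [the sky looked much brighter]; Modifier = Subordinator [In] + DU [the morning]; Head = Subject [the sky] + Predicate [looked much brighter]; Predicate = Verbal [looked] + Complement [much brighter]; Complement = Modifier [much] + Head [brighter].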

8 Noun Phrases: John: [NP [N John]]. the student: [NP [Det the] [N student]]. the intelligent student: [NP [Det the] [AdjP intelligent] [N student]].

9 Phrases

10 Noun Phrase: his first five PhD students: [NP [Det his] [Ord first] [Quant five] [N PhD] [N students]]

11 Noun Phrase: the five best students of my class: [NP [Det the] [Quant five] [AP best] [N students] [PP of my class]]

12 Verb Phrases: can sing: [VP [Aux can] [V sing]]. can hit the ball: [VP [Aux can] [V hit] [NP the ball]].

13 Verb Phrase: can give a flower to Mary: [VP [Aux can] [V give] [NP a flower] [PP to Mary]]

14 Verb Phrase: may make John the chairman: [VP [Aux may] [V make] [NP John] [NP the chairman]]

15 Verb Phrase: may find the book very interesting: [VP [Aux may] [V find] [NP the book] [AP very interesting]]

16 Prepositional Phrases: in the classroom: [PP [P in] [NP the classroom]]. near the river: [PP [P near] [NP the river]].

17 Adjective Phrases: intelligent: [AP [A intelligent]]. very honest: [AP [Degree very] [A honest]]. fond of sweets: [AP [A fond] [PP of sweets]].

18 Adjective Phrase: very worried that she might have done badly in the assignment: [AP [Degree very] [A worried] [S’ that she might have done badly in the assignment]]

19 A segment of English Grammar: S’ → (C) S; S → {NP/S’} VP; VP → (AP+) V (AP+) ({NP/S’}) (AP+) (PP+) (AP+); NP → (D) (AP+) N (PP+); PP → P NP; AP → (AP) A

20 PSG Parse Tree: John wrote those words in the Book of Proverbs. [S [NP [PropN John]] [VP [V wrote] [NP those words] [PP [P in] [NP [NP the book] [PP of proverbs]]]]]

21 Penn Treebank: John wrote those words in the Book of Proverbs. (S (NP-SBJ (NP John)) (VP wrote (NP those words) (PP-LOC in (NP (NP-TTL (NP the Book) (PP of (NP Proverbs)))

22 PSG Parse Tree: Official trading in the shares will start in Paris on Nov 6. [S [NP [AP [A official]] [N trading] [PP [P in] [NP the shares]]] [VP [Aux will] [V start] [PP in Paris] [PP on Nov 6]]]

23 Penn POS Tags [ Official/JJ trading/NN ] in/IN [ the/DT shares/NNS ] will/MD start/VB in/IN [ Paris/NNP ] on/IN [ Nov./NNP 6/CD ] Official trading in the shares will start in Paris on Nov 6.
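The tagged line on this slide follows a simple convention: each token is written `word/TAG`, and square brackets group the base noun phrases. As a rough illustration, the sketch below splits such a line into (word, tag) pairs and bracketed chunks. The function name `read_tagged` is an assumption made for this example, and the code expects brackets to be separated from tokens by spaces, as they are on the slide.

```python
tagged = ("[ Official/JJ trading/NN ] in/IN [ the/DT shares/NNS ] will/MD "
          "start/VB in/IN [ Paris/NNP ] on/IN [ Nov./NNP 6/CD ]")

def read_tagged(text):
    """Return (pairs, chunks): all (word, tag) pairs and the bracketed groups."""
    pairs, chunks, current = [], [], None
    for tok in text.split():
        if tok == "[":
            current = []                 # a bracketed chunk opens
        elif tok == "]":
            chunks.append(current)       # the chunk closes
            current = None
        else:
            # Split on the last slash so a word containing "/" stays intact.
            word, tag = tok.rsplit("/", 1)
            pairs.append((word, tag))
            if current is not None:
                current.append(word)
    return pairs, chunks

pairs, chunks = read_tagged(tagged)
print(pairs)
print(chunks)
```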

24 Penn Treebank: Official trading in the shares will start in Paris on Nov 6. ( (S (NP-SBJ (NP Official trading) (PP in (NP the shares))) (VP will (VP start (PP-LOC in (NP Paris)) (PP-TMP on (NP (NP Nov 6)

25 Penn POS Tag Set: Adjective: JJ; Adverb: RB; Cardinal Number: CD; Determiner: DT; Preposition: IN; Coordinating Conjunction: CC; Subordinating Conjunction: IN; Singular Noun: NN; Plural Noun: NNS; Personal Pronoun: PRP; Proper Noun: NNP; Verb base form: VB; Modal verb: MD; Verb (3sg Pres): VBZ; Wh-determiner: WDT; Wh-pronoun: WP

26 Basic Parsing Strategy

27 A Fragment of English Grammar: S → NP VP; VP → V NP; NP → NNP | ART N; NNP → Ram; V → ate | saw; ART → a | an | the; N → rice | apple | movie

28 Derivation: S => NP VP (rewrite S) => NNP VP (rewrite NP) => Ram VP (rewrite NNP) => Ram V NP (rewrite VP) => Ram ate NP (rewrite V) => Ram ate ART N (rewrite NP) => Ram ate the N (rewrite ART) => Ram ate the rice (rewrite N). Multiple choice points: a symbol such as NP or V has more than one rewrite rule. S is a special symbol called the start symbol.
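The derivation on this slide can be replayed mechanically. The sketch below stores the grammar fragment from the preceding slide as a dictionary and rewrites the leftmost nonterminal at every step. The list of rule choices is supplied by hand, since choosing at each choice point is exactly what a parser has to search for. The names `GRAMMAR` and `leftmost_derivation` are assumptions made for this illustration.

```python
# Each nonterminal maps to its alternative right-hand sides.
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "VP":  [["V", "NP"]],
    "NP":  [["NNP"], ["ART", "N"]],
    "NNP": [["Ram"]],
    "V":   [["ate"], ["saw"]],
    "ART": [["a"], ["an"], ["the"]],
    "N":   [["rice"], ["apple"], ["movie"]],
}

def leftmost_derivation(choices, start="S"):
    """Rewrite the leftmost nonterminal once per entry in `choices`.

    Each entry is the index of the alternative to use for that nonterminal.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        form[i:i + 1] = GRAMMAR[form[i]][choice]
        steps.append(" ".join(form))
    return steps

# Choices for: S, NP (NNP), NNP, VP, V (ate), NP (ART N), ART (the), N (rice)
for step in leftmost_derivation([0, 0, 0, 0, 0, 1, 2, 0]):
    print("=>", step)
```

Changing a single index, for example choosing `saw` for V, yields a different sentence of the same language.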

29 Two Strategies: Top-Down and Bottom-Up. Top-down: Start with S and generate the sentence. Bottom-up: Start with the words in the sentence and use the rewrite rules backwards to reduce the sequence of symbols to S. The previous slide showed the top-down strategy.

30 Bottom-Up Derivation: Ram ate the rice => NNP ate the rice (rewrite Ram) => NNP V the rice (rewrite ate) => NNP V ART rice (rewrite the) => NNP V ART N (rewrite rice) => NP V ART N (rewrite NNP) => NP V NP (rewrite ART N) => NP VP (rewrite V NP) => S (rewrite NP VP)
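The bottom-up derivation on this slide applies the rewrite rules backwards. A minimal sketch of that idea follows, assuming a greedy strategy: tag the words left to right, then repeatedly apply the leftmost reduction that matches. Greedy reduction happens to work for this small grammar, but it is not a general parsing algorithm, since it never backtracks. The names `LEXICON`, `RULES`, `reduce_once` and `bottom_up` are illustrative.

```python
LEXICON = {"Ram": "NNP", "ate": "V", "saw": "V", "a": "ART", "an": "ART",
           "the": "ART", "rice": "N", "apple": "N", "movie": "N"}

# Phrase rules written backwards: right-hand side -> left-hand side.
RULES = [(("NNP",), "NP"), (("ART", "N"), "NP"),
         (("V", "NP"), "VP"), (("NP", "VP"), "S")]

def reduce_once(form):
    """Apply the leftmost matching reduction, or return None if none applies."""
    for i in range(len(form)):
        for rhs, lhs in RULES:
            if tuple(form[i:i + len(rhs)]) == rhs:
                return form[:i] + [lhs] + form[i + len(rhs):]
    return None

def bottom_up(sentence):
    """Return every sentential form from the words up to the final reduction."""
    form = sentence.split()
    steps = [" ".join(form)]
    for i, word in enumerate(sentence.split()):   # lexical step: word -> POS
        form[i] = LEXICON[word]
        steps.append(" ".join(form))
    while True:                                   # phrasal reductions
        reduced = reduce_once(form)
        if reduced is None:
            return steps
        form = reduced
        steps.append(" ".join(form))

for step in bottom_up("Ram ate the rice"):
    print("=>", step)
```

The sentence is accepted when the last form is the single symbol S.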

31 Parsing Algorithm: A procedure that “searches” through the grammatical rules to find a combination that generates a tree which stands for the structure of the sentence.

32 Top-Down Parsing (using A*): DFS on the AND-OR graph. Data structures: Open List (OL): nodes to be expanded; Closed List (CL): expanded nodes; Input List (IL): words of the sentence to be parsed; Moving Head (MH): walks over the IL.

33 Trace of Top-Down Parsing, initial condition (T0): OL: S. CL: (empty). IL: Ram ate the rice.

34 Trace of Top-Down Parsing, T1: OL: NP VP. CL: S. IL: Ram ate the rice.

35 Trace of Top-Down Parsing, T2: OL: NNP, ART N, VP. CL: S NP. IL: Ram ate the rice.

36 Trace of Top-Down Parsing, T3: OL: ART N, VP. CL: S NP NNP. IL: Ram ate the rice. MH marks the portion of the input consumed.

37 Trace of Top-Down Parsing, T4: OL: N VP. CL: S NP NNP ART*. IL: Ram ate the rice. (* indicates a ‘useless’ expansion.)

38 Trace of Top-Down Parsing, T5: OL: VP. CL: S NP NNP ART* N*. IL: Ram ate the rice.

39 Trace of Top-Down Parsing, T6: OL: V NP. CL: S NP NNP ART* N*. IL: Ram ate the rice.

40 Trace of Top-Down Parsing, T7: OL: NP. CL: S NP NNP ART* N* V. IL: Ram ate the rice.

41 Trace of Top-Down Parsing, T8: OL: NNP, ART N. CL: S NP NNP ART* N* V NP. IL: Ram ate the rice.

42 Trace of Top-Down Parsing, T9: OL: ART N. CL: S NP NNP ART* N* V NNP*. IL: Ram ate the rice.

43 Trace of Top-Down Parsing, T10: OL: N. CL: S NP NNP ART* N* V NNP* ART. IL: Ram ate the rice.

44 Trace of Top-Down Parsing, T11: OL: (empty). CL: S NP NNP ART* N* V NNP* ART N. IL: Ram ate the rice. Successful termination: OL empty AND MH at the end of IL.
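The trace above can be approximated in code. The following is a sketch of a depth-first, backtracking top-down recognizer built around the same ingredients: an open list, a closed list, the input list and a moving head. It is not a reproduction of the exact trace. In particular, it keeps a separate closed list for each search path instead of one global list with failed expansions starred, and it assumes the grammar has no left-recursive rules. The name `top_down_parse` is an assumption made for this example.

```python
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "VP":  [["V", "NP"]],
    "NP":  [["NNP"], ["ART", "N"]],
    "NNP": [["Ram"]],
    "V":   [["ate"], ["saw"]],
    "ART": [["a"], ["an"], ["the"]],
    "N":   [["rice"], ["apple"], ["movie"]],
}

def top_down_parse(words, start="S"):
    """Return the expanded symbols of a successful parse, or None on failure."""
    # Each search state is (open list, moving-head position, closed list).
    stack = [([start], 0, [])]
    while stack:
        open_list, head, closed = stack.pop()
        if not open_list:
            if head == len(words):
                return closed        # OL empty and MH at the end of IL
            continue                 # input left over: dead end
        symbol, rest = open_list[0], open_list[1:]
        if symbol in GRAMMAR:
            # Push alternatives in reverse so the first rule is tried first.
            for rhs in reversed(GRAMMAR[symbol]):
                stack.append((rhs + rest, head, closed + [symbol]))
        elif head < len(words) and words[head] == symbol:
            stack.append((rest, head + 1, closed))   # word matched: advance MH
        # Otherwise the word does not match; the state is dropped (backtrack).
    return None

print(top_down_parse("Ram ate the rice".split()))
print(top_down_parse("Ram ate rice the".split()))
```

The stack of saved states plays the role of the choice points: when NNP fails to match "the", the search falls back to the pending ART N alternative for the same NP.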

45 Bottom-Up Parsing Basic idea: Refer to words from the lexicon. Obtain all POSs for each word. Keep combining until S is obtained. (to be continued)

