
1 CMSC 723 / LING 645: Intro to Computational Linguistics November 10, 2004 Lecture 10 (Dorr): CFG’s (Finish Chapter 9) Parsing (Chapter 10) Prof. Bonnie J. Dorr Dr. Christof Monz TA: Adam Lee

2 What Issues Arise in Creating Grammars?  Agreement  Subcategorization  Movement

3 Agreement  Simple Examples: –This dog –Those dogs –*Those dog –*This dogs  More complicated examples: –Do any flights stop in Chicago? –Does Delta fly from Atlanta to Boston? –Do I get dinner on this flight? –What flights leave in the morning? –* What flight leave in the morning?

4 Agreement  Nouns: Nominal → Noun | Noun Noun –Nominal Sg → Noun Sg | Noun Sg Noun Sg –Nominal Pl → Noun Pl | Noun Pl Noun Pl  Subj-Verb Agreement: S → Aux NP VP –S → Aux 3sg NP 3sg VP –S → Aux non3sg NP non3sg VP  Lexical items: –Noun Sg → flight –Noun Pl → flights –Aux 3sg → does | has | can … –Aux non3sg → do | have | can …

5 Agreement Features Propagate Downward …  NP 3sg → (Det) (Card) (Ord) (Quant) (AP) Nominal Sg  NP non3sg → (Det) (Card) (Ord) (Quant) (AP) Nominal Pl

6 What’s wrong with this picture?  Combinatorial explosion  Other feature-based expansions?  Loss of Generalization
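
To make the explosion concrete, here is a minimal counting sketch in Python; it is not from the lecture, and the base rules and feature values are purely illustrative:

from itertools import product

# Toy base rules that would each need one copy per combination of feature values.
base_rules = ["S -> NP VP", "NP -> Det Nominal", "Nominal -> Noun", "VP -> Verb NP"]

# Illustrative agreement features; real grammars add person, case, gender, ...
features = {"number": ["sg", "pl"], "person": ["1", "2", "3"]}

combinations = list(product(*features.values()))
print(f"{len(base_rules)} rules x {len(combinations)} feature combinations "
      f"= {len(base_rules) * len(combinations)} expanded rules")
# 4 rules x 6 feature combinations = 24 expanded rules; the grammar grows
# multiplicatively with every added feature, which is the explosion above.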

7 Subcategorization  Verbs have preferences for the kinds of constituents they co-occur with  For example: –VP → Verb (disappear) –VP → Verb NP (prefer a morning flight) –VP → Verb NP PP (leave Boston in the morning) –VP → Verb PP (leaving on Thursday)  But not: *I disappeared the cat.

8 Sentential complements  Example: –You [ VP [ V said [ S there were two flights that were the cheapest]]] –You [ VP [ V said [ S you had a two hundred sixty six dollar fare]]] –[ VP [ V Tell][ NP me ] [ S how to get from the airport in Philadelphia to downtown]]  Rule: –VP → Verb S

9 Other VP Constituents  A VP can be the constituent of another VP: VP → Verb VP –I want [ VP to fly from Milwaukee to Orlando] –I’m trying [ VP to find a flight that goes from Pittsburgh to Denver next Friday]  Verbs can also be followed by particles to form a phrasal verb: VP → Verb Particle –take off

10 Why do we need Subcategorization?  Important observation: while a verb phrase can have many possible kinds of constituents, not every verb is compatible with every verb phrase  Example: verb want can be used –With a NP complement: I want a flight –With an infinitive VP complement: I want to fly to Dallas  But verb find cannot take such a VP complement –*I find to fly to Dallas

11 Subcategorization (Frame | Verbs | Example)
Ø | eat, sleep, … | I want to eat
NP | prefer, find, leave, … | Find [NP the flight from Pittsburgh to Boston]
NP NP | show, give, … | Show [NP me] [NP airlines with flights from Pittsburgh]
PP from PP to | fly, travel, … | I would like to fly [PP from Boston] [PP to Philadelphia]
NP PP with | help, load, … | Can you help [NP me] [PP with a flight]
VP to | prefer, want, need, … | I would prefer [VP to go by United airlines]
VP brst (bare stem) | can, would, might, … | I can [VP go from Boston]
S | mean | Does this mean [S AA has a hub in Boston]?
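
The table can be read as a lexicon mapping each verb to the complement sequences it licenses. A minimal sketch of that idea; the frame labels and verb entries below are illustrative, not a complete lexicon:

# Map verbs to the subcategorization frames (complement sequences) they allow.
SUBCAT = {
    "disappear": [[]],                       # no complement
    "find":      [["NP"]],
    "show":      [["NP", "NP"]],
    "fly":       [["PP_from", "PP_to"], []],
    "want":      [["NP"], ["VP_to"]],
}

def licensed(verb: str, complements: list[str]) -> bool:
    """Return True if the verb subcategorizes for this complement sequence."""
    return complements in SUBCAT.get(verb, [])

print(licensed("want", ["VP_to"]))   # True:  "I want to fly to Dallas"
print(licensed("find", ["VP_to"]))   # False: "*I find to fly to Dallas"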

12 Auxiliaries  Examples: –Modals: can, could, may, might –Perfect: have –Progressive: be –Passive: be  What are their subcategories?  Ordering: modal < perfect < progressive < passive e.g., might have been prevented

13 Movement  I looked up his grade. I looked his grade up.  John put the book on the table. What did John put on the table?  Long distance dependencies.

14 Grammar Equivalence and Normal Form  Weak equivalence  Strong equivalence  Chomsky Normal Form (CNF) A → B C or A → a

15 CNF Example Convert to Chomsky Normal Form (CNF): S → a Y X, X → a X | b, Y → Y a | b
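
One possible answer (several equivalent conversions exist), introducing a nonterminal A_a for the terminal a and a fresh nonterminal S1 to break the ternary rule:

S → A_a S1
S1 → Y X
X → A_a X | b
Y → Y A_a | b
A_a → a

Every rule now has the form A → B C or A → a.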

16 Why do we care about grammar? We need grammars for parsing! Note: Parsing is the topic we will cover next.

17 What Linguistic Representations are necessary for parsing?  Words –Classes: groups of words which behave similarly –Function/Content –P.O.S.: Nouns, Verbs, Adjectives, Prepositions  Constituents: –Groupings of words into larger units which behave similarly and have a particular p.o.s. as their head –Phrases: NP headed by Noun... VP, PP,...

18 Analyzing Language in Terms of these Representations  Morphological parsing: –analyze words into morphemes and affixes –rule-based, FSAs, FSTs  POS Tagging  Syntactic parsing: –identify component parts and how related –to see if a sentence is grammatical –to assign a semantic structure

19 Syntactic Parsing  Declarative formalisms like CFGs define the legal strings of a language but don’t specify how to recognize or assign structure to them  Parsing algorithms specify how to recognize the strings of a language and assign each string one or more syntactic structures  Parse trees useful for grammar checking, semantic analysis, MT, QA, information extraction, speech recognition…and almost every task in NLP

20 CFG for Fragment of English: G0

21 Parse Tree for ‘Book that flight’ using G0

22 Parsing as a Search Problem  Searching FSAs –Finding the right path through the automaton –Search space defined by structure of FSA  Searching CFGs –Finding the right parse tree among all possible parse trees –Search space defined by the grammar  Constraints provided by the input sentence and the automaton or grammar

23 Two Search Strategies  Top-Down –Search for tree starting from S until input words covered.  Bottom-Up –Start with words and build upwards toward S

24 Top-Down Parser  Builds from the root S node to the leaves  Assuming we build all trees in parallel: –Find all trees with root S –Next expand all constituents in these trees/rules –Continue until leaves are POS categories –Candidate trees failing to match the POS of the input string are rejected  Top-Down: Rationalist Tradition –Expectation driven –Goal: Build tree for input starting with S

25 Top-Down Search Space for G0

26 Bottom-Up Parsing  Parser begins with words of input and builds up trees, applying G0 rules whose right-hand sides match  ‘Book that flight’ can be tagged N Det N or V Det N –‘Book’ ambiguous –Parse continues until an S root node reached or no further node expansion possible  Bottom-Up: Empiricist Tradition –Data driven –Primary consideration: Lowest sub-trees of final tree must hook up with words in input.

27 Expanding Bottom-Up Search Space for ‘Book that flight’

28 Comparing Top-Down and Bottom-Up  Top-Down parsers: never explore illegal parses (e.g. can’t form an S) -- but waste time on trees that can never match the input  Bottom-Up parsers: never explore trees inconsistent with input -- but waste time exploring illegal parses (no S root)  For both: how to explore the search space? –Pursuing all parses in parallel or …? –Which node to expand next? –Which rule to apply next?

29 Search Strategy and Search Control  Parallel: –Explore all possible trees in parallel  Depth-first search: –Agenda of search states: expand search space incrementally, exploring most recently generated state (tree) each time –When you reach a state (tree) inconsistent with input, backtrack to most recent unexplored state (tree)  Which node to expand? –Leftmost  Which grammar rule to use? –Order in the grammar

30 Basic Algorithm for Top-Down, Depth-First, Left-Right Strategy  Initialize agenda with ‘S’ tree, point to first word, and make this the current search state (cur)  Loop until successful parse or empty agenda: –Apply all applicable grammar rules to the leftmost unexpanded node of cur –If this node is a POS category and matches the POS of the current input word, push this onto the agenda; else, push the new trees onto the agenda –Pop a new cur from the agenda
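
A minimal sketch of this strategy as a recognizer: it uses recursion for the depth-first backtracking instead of an explicit agenda, assumes a grammar with no left recursion (see slide 40), and the toy grammar and lexicon are illustrative rather than the book's G0:

# Illustrative toy grammar and lexicon; not the full G0 from the book.
GRAMMAR = {
    "S":       [["NP", "VP"], ["VP"]],
    "NP":      [["Det", "Nominal"]],
    "Nominal": [["Noun"]],
    "VP":      [["Verb"], ["Verb", "NP"]],
}
LEXICON = {
    "Det":  {"that", "the"},
    "Noun": {"flight", "meal"},
    "Verb": {"book", "include"},
}

def expand(symbols, words):
    """Try to derive a prefix of `words` from the symbol sequence, left to right.

    Yields the remaining words after each successful derivation; depth-first
    backtracking comes for free from trying rule alternatives in grammar order.
    """
    if not symbols:
        yield words
        return
    first, rest = symbols[0], symbols[1:]
    if first in LEXICON:                      # POS category: must match input word
        if words and words[0] in LEXICON[first]:
            yield from expand(rest, words[1:])
    else:                                     # non-terminal: expand top-down
        for rhs in GRAMMAR.get(first, []):
            yield from expand(list(rhs) + rest, words)

def recognize(sentence):
    words = sentence.lower().split()
    return any(remaining == [] for remaining in expand(["S"], words))

print(recognize("Book that flight"))   # True
print(recognize("Book that"))          # False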

31 Basic Top-Down Parser

32 Top-Down Depth-First Derivation Using G0

33 Example: Does this flight include a meal?

34 Example continued …

35 Augmenting Top-Down Parsing with Bottom-Up Filtering  We saw: Top-Down, depth-first, L-to-R parsing –Expands non-terminals along the tree’s left edge down to leftmost leaf of tree –Moves on to expand down to next leftmost leaf…  In a successful parse, current input word will be the first word in derivation of the unexpanded node that the parser is currently processing  So… look ahead to the left-corner of the tree: –B is a left-corner of A if A ⇒* B α –Build table with left-corners of all non-terminals in grammar and consult before applying rule

36 Left Corners

37 Left-Corner Table for G0 (previous example)
Category | Left Corners
S | Det, PropN, Aux, V
NP | Det, PropN
Nom | N
VP | V
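
The left-corner table can be pre-computed with a simple fixpoint over the grammar rules. A minimal sketch; the grammar fragment below stands in for G0 and is illustrative:

# Illustrative grammar fragment; POS categories are the pre-terminals.
RULES = {
    "S":   [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP":  [["Det", "Nom"], ["PropN"]],
    "Nom": [["N"], ["N", "Nom"]],
    "VP":  [["V"], ["V", "NP"]],
}
POS = {"Det", "PropN", "Aux", "V", "N"}

def left_corners(rules):
    """For each non-terminal, the POS categories that can start its derivations."""
    table = {nt: set() for nt in rules}
    changed = True
    while changed:                      # iterate until no entry grows (fixpoint)
        changed = False
        for nt, rhss in rules.items():
            for rhs in rhss:
                first = rhs[0]
                new = {first} if first in POS else table.get(first, set())
                if not new <= table[nt]:
                    table[nt] |= new
                    changed = True
    return table

for nt, corners in left_corners(RULES).items():
    print(nt, "->", sorted(corners))
# S -> ['Aux', 'Det', 'PropN', 'V'], NP -> ['Det', 'PropN'], Nom -> ['N'], VP -> ['V']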

38 Summing Up Parsing Strategies  Parsing is a search problem which may be implemented with many search strategies  Top-Down vs. Bottom-Up Parsers –Both generate too many useless trees –Combine the two to avoid over-generation: Top-Down Parsing with Bottom-Up look-ahead  Left-corner table provides more efficient look-ahead –Pre-compute all POS that can serve as the leftmost POS in the derivations of each non-terminal category

39 Three Critical Problems in Parsing  Left Recursion  Ambiguity  Repeated Parsing of Sub-trees

40 Left Recursion  Depth-first search will never terminate if grammar is left recursive: A → A β Examples: NP → NP PP, VP → VP PP, S → S and S, NP → NP and NP  Indirect Left Recursion: A ⇒* A β Example: NP → Det Nominal, Det → NP ’s, so NP ⇒* NP ’s Nominal

41 Solutions to Left Recursion  Rule ordering  Don't use recursive rules  Limit depth of recursion in parsing to some analytically or empirically set limit  Don't use top-down parsing

42 Rule Ordering  Bad: –NP → NP PP –NP → Det Nominal  Better alternative: –First: NP → Det Nominal –Then: NP → NP PP

43 Grammar Rewriting  Rewrite left-recursive grammar as weakly equivalent non-recursive one.  Can be done: –By Hand (ick) or … –Automatically  Example Rewrite: NP → NP PP, NP → Det Nominal As: NP → Det Nominal Stuff, Stuff → PP Stuff, Stuff → ε
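
The rewrite for immediate left recursion (A → A α | β becomes A → β A′, A′ → α A′ | ε, where A′ plays the role of ‘Stuff’ above) can be automated. A minimal sketch that handles direct left recursion only; indirect left recursion needs a fuller algorithm, and the rule encoding here is my own:

def remove_direct_left_recursion(nonterminal, rhss):
    """Rewrite A -> A alpha | beta as A -> beta A', A' -> alpha A' | epsilon."""
    recursive = [rhs[1:] for rhs in rhss if rhs and rhs[0] == nonterminal]
    other = [rhs for rhs in rhss if not rhs or rhs[0] != nonterminal]
    if not recursive:
        return {nonterminal: rhss}        # nothing to do
    new = nonterminal + "'"
    return {
        nonterminal: [rhs + [new] for rhs in other],
        new: [alpha + [new] for alpha in recursive] + [[]],   # [] is epsilon
    }

print(remove_direct_left_recursion("NP", [["NP", "PP"], ["Det", "Nominal"]]))
# {'NP': [['Det', 'Nominal', "NP'"]], "NP'": [['PP', "NP'"], []]}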

44 Problems with Grammar Rewriting Original: [NP [NP the book] [PP on [NP [NP the table] [PP in [NP [NP the yard] [PP of [NP the house]]]]]]] Becomes: [NP the book [Stuff [PP on [NP the table [Stuff [PP in [NP the yard [Stuff [PP of [NP the house [Stuff]]] [Stuff]]]] [Stuff]]]] [Stuff]]]

45 Depth Bound  Set an arbitrary bound  Set an analytically derived bound  Run tests and derive reasonable bound empirically  Use iterative deepening

46 Ambiguity  Local Ambiguity –Leads to hypotheses that are locally reasonable but eventually lead nowhere –“Book that flight”  Global Ambiguity –Leads to multiple parses for the same input –“I shot an elephant in my pajamas”

47 More Ambiguity Examples  Multiple legal structures –Attachment (e.g. I saw a man on a hill with a telescope) –Coordination (e.g. younger cats and dogs) –NP bracketing (e.g. Spanish language teachers)

48 Two Parse Trees for Ambiguous Sentence

49 More Ambiguity: ‘Can you book TWA flights?’

50 A Correct Parse for ‘Show me the meal on Flight UA 386 from San Francisco to Denver’

51 Inefficient Re-Parsing of Subtrees

52 Invariants  Despite ambiguity, there are invariants  Sub-components of final parse tree are re-analyzed unnecessarily  Except for top-most component, every part of final tree is derived more than once.

53 What’s the solution?  Key to efficient processing is reuse  Fill table with solutions to sub-problems for later use.  We want an algorithm that: –Does not do repeated work –Does top-down search with bottom-up filtering –Solves left-recursion problem –Solves an exponential problem

54 Dynamic Programming  Create table of solutions to sub-problems (e.g. subtrees) as parse proceeds  Look up subtrees for each constituent rather than re-parsing  Since all parses implicitly stored, all available for later disambiguation  Examples: Cocke-Younger-Kasami (CYK) (1960), Graham-Harrison-Ruzzo (GHR) (1980) and Earley (1970) algorithms
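
CYK is the easiest of the three to sketch: it requires a CNF grammar (slide 14) and fills a table of recognized constituents bottom-up. A minimal recognizer over an illustrative CNF fragment for ‘book that flight’; this is not the book's G0:

from itertools import product

# Illustrative CNF grammar: binary A -> B C rules plus A -> word lexical rules.
BINARY = {                      # keyed by the (B, C) pair on the right-hand side
    ("Det", "Nominal"): {"NP"},
    ("Verb", "NP"): {"S"},
}
LEXICAL = {
    "book": {"Verb", "Nominal"},
    "that": {"Det"},
    "flight": {"Nominal"},
}

def cyk_recognize(words, start="S"):
    n = len(words)
    # table[i][j] = set of categories spanning words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        table[i][i + 1] = set(LEXICAL.get(w, ()))
    for span in range(2, n + 1):                 # widen spans bottom-up
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):            # split point
                for b, c in product(table[i][k], table[k][j]):
                    table[i][j] |= BINARY.get((b, c), set())
    return start in table[0][n]

print(cyk_recognize("book that flight".split()))   # True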

55 Earley Algorithm  Uses dynamic programming to do parallel top-down search in (worst case) O(N³) time  First, left-to-right pass fills out a chart with N+1 state sets –Think of chart entries as sitting between words in the input string keeping track of states of the parse at these positions –For each word position, chart contains set of states representing all partial parse trees generated to date. E.g. chart[0] contains all partial parse trees generated at the beginning of the sentence

56 Chart Entries represent three types of constituents:  predicted constituents  in-progress constituents  completed constituents

57 Progress in parse represented by Dotted Rules  Position of the dot · indicates type of constituent  0 Book 1 that 2 flight 3 S → · VP, [0,0] (predicted) NP → Det · Nom, [1,2] (in progress) VP → V NP ·, [0,3] (completed)  [x,y] tells us what portion of the input is spanned so far by this rule  Each State si: <dotted rule>, [<start position>, <end position>]

58 S → · VP, [0,0] –First 0 means S constituent begins at the start of input –Second 0 means the dot is here too –So, this is a top-down prediction NP → Det · Nom, [1,2] –the NP begins at position 1 –the dot is at position 2 –so, Det has been successfully parsed –Nom predicted next 0 Book 1 that 2 flight 3

59 0 Book 1 that 2 flight 3 (continued) VP → V NP ·, [0,3] –Successful VP parse of entire input

60 Successful Parse  Final answer found by looking at last entry in chart  If entry resembles S → α ·, [nil,N] then input parsed successfully  Chart will also contain record of all possible parses of input string, given the grammar

61 Parsing Procedure for the Earley Algorithm  Move through each set of states in order, applying one of three operators to each state: –predictor: add predictions to the chart –scanner: read input and add corresponding state to chart –completer: move dot to right when new constituent found  Results (new states) added to current or next set of states in chart  No backtracking and no states removed: keep complete history of parse

62 States and State Sets  Dotted Rule si represented as <dotted rule>, [<start position>, <end position>]  State Set Sj: a collection of states si with the same end position j.
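
In code, a state is just a dotted rule plus a span. A minimal representation sketch; the class and field names are mine, not from the text:

from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    lhs: str          # left-hand side of the dotted rule
    rhs: tuple        # right-hand side symbols
    dot: int          # position of the dot within rhs
    start: int        # input position where this constituent begins
    end: int          # input position of the dot (= index of its state set)

    def complete(self):
        return self.dot == len(self.rhs)

    def next_symbol(self):
        return None if self.complete() else self.rhs[self.dot]

# "NP -> Det · Nom, [1,2]": Det has been parsed, Nom is predicted next.
s = State("NP", ("Det", "Nom"), dot=1, start=1, end=2)
print(s.next_symbol(), s.complete())   # Nom False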

63 Earley Algorithm from Book

64 Earley Algorithm (simpler!)
1. Add Start → · S, [nil,0] to state set 0.
2. Predict all states you can, adding new predictions to state set 0. Let i = 1.
3. Scan input word i: add all matched states to state set Si. Add all new states produced by Complete to state set Si. Add all new states produced by Predict to state set Si. Unless i = n, (a) let i = i + 1; (b) repeat step 3.
4. At the end, see if state set n contains Start → S ·, [nil,n].

65 3 Main Sub-Routines of Earley Algorithm  Predictor: Adds predictions into the chart.  Completer: Moves the dot to the right when new constituents are found.  Scanner: Reads the input words and enters states representing those words into the chart.

66 Predictor  Intuition: create new state for top-down prediction of new phrase.  Applied when non part-of-speech non-terminals are to the right of a dot: S → · VP, [0,0]  Adds new states to current chart –One new state for each expansion of the non-terminal in the grammar: VP → · V, [0,0] and VP → · V NP, [0,0]  Formally: Sj: A → α · B β, [i,j] adds Sj: B → · γ, [j,j]

67 Scanner  Intuition: Create new states for rules matching part of speech of next word.  Applicable when part of speech is to the right of a dot: VP → · V NP, [0,0] ‘Book…’  Looks at current word in input  If match, adds state(s) to next chart: VP → V · NP, [0,1]  Formally: Sj: A → α · B β, [i,j] adds Sj+1: A → α B · β, [i,j+1]

68 Completer  Intuition: parser has finished a new phrase, so must find and advance all states that were waiting for this  Applied when dot has reached right end of rule: NP → Det Nom ·, [1,3]  Find all states w/dot at 1 and expecting an NP: VP → V · NP, [0,1]  Adds new (completed) state(s) to current chart: VP → V NP ·, [0,3]  Formally: Sk: B → δ ·, [j,k] adds Sk: A → α B · β, [i,k], where Sj: A → α · B β, [i,j].
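
Putting the three operators together gives a compact recognizer. A minimal sketch under these assumptions: the grammar and lexicon are an illustrative fragment (not the full G0), a dummy GAMMA → S rule plays the role of the γ/Start state, and it only recognizes; recovering parse trees needs the backpointers described on a later slide:

from dataclasses import dataclass

# Illustrative grammar and lexicon for "book that flight"; not the full G0.
GRAMMAR = {
    "S": [("VP",), ("NP", "VP")],
    "NP": [("Det", "Nominal")],
    "Nominal": [("Noun",)],
    "VP": [("Verb",), ("Verb", "NP")],
}
POS = {"Det": {"that"}, "Noun": {"flight"}, "Verb": {"book"}}

@dataclass(frozen=True)
class State:
    lhs: str
    rhs: tuple
    dot: int
    start: int

    def complete(self):
        return self.dot == len(self.rhs)

    def next_sym(self):
        return None if self.complete() else self.rhs[self.dot]

def earley_recognize(words):
    chart = [[] for _ in range(len(words) + 1)]

    def add(state, position):
        if state not in chart[position]:
            chart[position].append(state)

    add(State("GAMMA", ("S",), 0, 0), 0)             # dummy start state
    for i in range(len(words) + 1):
        for state in chart[i]:                        # chart[i] may grow as we iterate
            sym = state.next_sym()
            if sym in GRAMMAR:                        # PREDICTOR
                for rhs in GRAMMAR[sym]:
                    add(State(sym, rhs, 0, i), i)
            elif sym in POS:                          # SCANNER
                if i < len(words) and words[i] in POS[sym]:
                    add(State(state.lhs, state.rhs, state.dot + 1, state.start), i + 1)
            elif state.complete():                    # COMPLETER
                for waiting in chart[state.start]:
                    if waiting.next_sym() == state.lhs:
                        add(State(waiting.lhs, waiting.rhs,
                                  waiting.dot + 1, waiting.start), i)
    return State("GAMMA", ("S",), 1, 0) in chart[len(words)]

print(earley_recognize("book that flight".split()))   # True
print(earley_recognize("book that".split()))          # False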

69 Example: State Set S0 for Parsing “Book that flight” using Grammar G0

70 Example: State Set S1 for Parsing “Book that flight” VP → · V and VP → · V NP are both passed to the Scanner, which adds them to Chart[1], moving the dots to the right

71 Prediction of Next Rule  When VP → V · is itself processed by the Completer, S → VP · is added to Chart[1] since VP is a left corner of S  Last 2 rules in Chart[1] are added by Predictor when VP → V · NP is processed  And so on….

72 Last Two States: the Scanner and then the Completer add the final entries to Chart[3], ending with γ → S ·, [nil,3] (a successful parse)

73 How do we retrieve the parses at the end?  Augment the Completer to add pointers to prior states it advances as a field in the current state –i.e. what state did we advance here? –Read the pointers back from the final state

74 Error Handling  What happens when we look at the contents of the last table column and don't find an S → α · rule? –Is it a total loss? No... –Chart contains every constituent and combination of constituents possible for the input given the grammar  Also useful for partial parsing or shallow parsing used in information extraction

75 Earley’s Keys to Efficiency  Left-recursion, Ambiguity and repeated re-parsing of subtrees –Solution: dynamic programming  Combine top-down predictions with bottom-up look-ahead to avoid unnecessary expansions  Earley is still one of the most efficient parsers  All efficient parsers avoid re-computation in a similar way.  But Earley doesn’t require grammar conversion!

76 Next Time  Read Chapter 14.

