
1 Context-Free Grammar CSCI-GA.2590 – Lecture 3 Ralph Grishman NYU

2 A Grammar Formalism
We have informally described the basic constructs of English grammar. Now we want to introduce a formalism for representing these constructs – a formalism that we can use as input to a parsing procedure.

3 Context-Free Grammar
A context-free grammar consists of
– a set of non-terminal symbols A, B, C, … ∈ N
– a set of terminal symbols a, b, c, … ∈ T
– a start symbol S ∈ N
– a set of productions P, each of the form A → α, where A ∈ N and α ∈ (N ∪ T)*

4 A Simple Context-Free Grammar
A simple CFG:
S → NP VP
NP → cats
NP → the cats
NP → the old cats
NP → mice
VP → sleep
VP → chase NP
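To make this concrete, here is one possible encoding of this grammar in Python (a sketch of our own, not code from the lecture): each non-terminal maps to its alternative right-hand sides, and any symbol that is not a key is a terminal.

```python
# The slide's grammar as plain Python data: non-terminal -> list of
# alternative right-hand sides, each a list of symbols.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["cats"], ["the", "cats"], ["the", "old", "cats"], ["mice"]],
    "VP": [["sleep"], ["chase", "NP"]],
}
START = "S"

def is_terminal(symbol):
    # terminals are exactly the symbols with no productions
    return symbol not in GRAMMAR
```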

5 Derivation and Language
If A → β is a production of the grammar, we can rewrite α A γ ⇒ α β γ.
A derivation is a sequence of rewrite operations:
S ⇒ … ⇒ … ⇒ NP VP ⇒ cats VP ⇒ cats chase NP
The language generated by a CFG is the set of strings (sequences of terminals) which can be derived from the start symbol:
S ⇒ … ⇒ … ⇒ w, where w ∈ T*
For example: S ⇒ NP VP ⇒ cats VP ⇒ cats chase NP ⇒ cats chase mice
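A derivation can be simulated directly on the GRAMMAR table above. This short sketch (again our own illustration, reusing GRAMMAR, START, and is_terminal from the previous block) rewrites non-terminals left to right, producing a random string of the language:

```python
import random

def generate(symbols=None):
    """Expand symbols left to right until only terminals remain."""
    if symbols is None:
        symbols = [START]          # a derivation starts from S
    words = []
    for sym in symbols:
        if is_terminal(sym):
            words.append(sym)
        else:
            # rewrite the non-terminal with one of its productions
            words.extend(generate(random.choice(GRAMMAR[sym])))
    return words

print(" ".join(generate()))        # e.g. "cats chase mice"
```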

6 Preterminals
It is convenient to include a set of symbols called preterminals (corresponding to the parts of speech) which can be directly rewritten as terminals (words).
This allows us to separate the productions into a set which generates sequences of preterminals (the “grammar”) and a set which rewrites the preterminals as terminals (the “dictionary”).

7 A Grammar with Preterminals
grammar:
S → NP VP
NP → N
NP → ART N
NP → ART ADJ N
VP → V
VP → V NP
dictionary:
N → cats
N → mice
ADJ → old
ART → the
V → sleep
V → chase

8 Grouping Alternates
To make the grammar more compact, we group productions with the same left-hand side:
S → NP VP
NP → N | ART N | ART ADJ N
VP → V | V NP

9 A grammar can be used to
– generate
– recognize
– parse
Why parse?
– parsing assigns the sentence a structure that may be helpful in determining its meaning

10 CFG vs Finite-State Languages
CFGs are more powerful than finite-state grammars (regular expressions):
– an FSG cannot generate center embeddings: S → ( S ) | x
– even if an FSG can capture the language, it may be unable to assign the nested structures we want
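To see why center embedding defeats a finite-state device, consider a tiny recursive recognizer for the language of S → ( S ) | x (a sketch of our own, not from the slides): the recursion depth grows with the nesting, which no fixed set of states can supply.

```python
# Recognize the center-embedded language S -> ( S ) | x, i.e.
# x, (x), ((x)), ... No regular expression matches exactly this set.
def in_language(s):
    return s == "x" or (s.startswith("(") and s.endswith(")")
                        and in_language(s[1:-1]))

assert in_language("((x))")
assert not in_language("((x)")
```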

11 A slightly bigger CFG
sentence → np vp
np → ngroup | ngroup pp
ngroup → n | art n | art adj n
vp → v | v np | v vp | v np pp    (v vp covers auxiliary verbs)
pp → p np    (pp = prepositional phrase)

12 Ambiguity
Most sentences will have more than one parse. Generally, different parses will reflect different meanings …
“I saw the man with a telescope.”
Here we can attach the pp (“with a telescope”) under either the np or the vp.

13 A CFG with just 2 nonterminals
S → NP V | NP V NP
NP → N | ART N | ART ADJ N
We will use this grammar for tracing our parsers.

14 Top-down parser
repeat:
– expand the leftmost non-terminal using its first production (saving any alternative productions on the backtrack stack)
– if we have matched the entire sentence, quit (success)
– if we have generated a terminal which doesn't match the sentence, pop a choice point from the stack (if the stack is empty, quit (failure))
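The following sketch implements this procedure for the slide-13 grammar (our own illustration, not the lecture's reference code); it uses recursion in place of the explicit backtrack stack, but tries productions in the same first-listed-first order:

```python
# Backtracking top-down parse of the 2-nonterminal grammar from slide 13.
GRAMMAR = {
    "S":  [["NP", "V"], ["NP", "V", "NP"]],
    "NP": [["N"], ["ART", "N"], ["ART", "ADJ", "N"]],
}
DICTIONARY = {
    "the": ["ART"], "old": ["ADJ"],
    "cat": ["N"], "mice": ["N"], "chases": ["V"],
}

def parse(symbols, words):
    """Try to expand `symbols` to exactly `words`; yield each success."""
    if not symbols:
        if not words:
            yield []                       # whole sentence matched
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:
        for production in GRAMMAR[first]:  # later ones = backtrack points
            yield from parse(production + rest, words)
    elif words and first in DICTIONARY.get(words[0], []):
        for tree in parse(rest, words[1:]):
            yield [(first, words[0])] + tree

for result in parse(["S"], "the cat chases mice".split()):
    print(result)  # [('ART', 'the'), ('N', 'cat'), ('V', 'chases'), ('N', 'mice')]
```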

15 Top-down parser (trace)
Sentence: the cat chases mice
Tree so far: 0: S
Backtrack table: (empty)

16 Top-down parser (trace)
Expand 0: S with its first production, S → NP V, giving children 1: NP and 2: V.
Backtrack table: 0: S → NP V NP

17 Top-down parser (trace)
Expand 1: NP with its first production, NP → N, giving 3: N.
Backtrack table: 0: S → NP V NP; 1: NP → ART ADJ N; 1: NP → ART N

18 Top-down parser (trace)
"the" is not an N, so pop 1: NP → ART N from the stack: 1: NP now has children 3: ART and 4: N, matching "the cat". But after 2: V matches "chases", the word "mice" is left unmatched, so we backtrack again.
Backtrack table: 0: S → NP V NP; 1: NP → ART ADJ N

19 Top-down parser (trace)
Pop 1: NP → ART ADJ N, giving children 3: ART, 4: ADJ, 5: N. "cat" is not an ADJ, so we backtrack once more.
Backtrack table: 0: S → NP V NP

20 Top-down parser (trace)
Pop 0: S → NP V NP: 0: S now has children 1: NP, 2: V, 3: NP.
Backtrack table: (empty)

21 Top-down parser (trace)
Expand 1: NP with its first production, NP → N; as before, "the" is not an N.
Backtrack table: 1: NP → ART ADJ N; 1: NP → ART N

22 Top-down parser (trace)
Pop 1: NP → ART N: children 4: ART and 5: N match "the cat", and 2: V matches "chases".
Backtrack table: 1: NP → ART ADJ N

23 Top-down parser (trace)
Expand 3: NP → N: 6: N matches "mice", and the entire sentence is matched – parse!
Backtrack table: 1: NP → ART ADJ N; 3: NP → ART ADJ N; 3: NP → ART N

24 Bottom-up parser
Builds a table where each row represents a parse tree node spanning the words from start up to end:

symbol  start  end  constituents
N       0      1    -

25 Bottom-up parser
We initialize the table with the parts of speech of each word …

symbol  start  end  constituents
ART     0      1    -
N       1      2    -
V       2      3    -
N       3      4    -

26 Bottom-up parser
… remembering that many English words have several parts of speech:

symbol  start  end  constituents
ART     0      1    -
N       1      2    -
V       2      3    -
N       2      3    -
N       3      4    -

27 Bottom-up parser
Then, if there is a production A → B C and we have entries for B and C with end(B) = start(C), we add an entry for A with start = start(B) and end = end(C). [see lecture notes for handling general productions]

node #  symbol  start  end  constituents
0       ART     0      1    -
1       N       1      2    -
2       V       2      3    -
3       N       2      3    -
4       N       3      4    -
5       NP      0      2    [0, 1]
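Here is a sketch of this table-filling loop in Python (our own illustration; the handling of arbitrary-length right-hand sides is one possible generalization, and the lecture notes give the general algorithm). Entries are (symbol, start, end, constituents) tuples, and constituents are node numbers as on the slides:

```python
# Bottom-up table parser for the slide-13 grammar.
GRAMMAR = [
    ("S",  ["NP", "V"]), ("S", ["NP", "V", "NP"]),
    ("NP", ["N"]), ("NP", ["ART", "N"]), ("NP", ["ART", "ADJ", "N"]),
]
DICTIONARY = {"the": ["ART"], "cat": ["N"], "chases": ["V", "N"], "mice": ["N"]}

def find_sequences(table, rhs):
    """Yield every way to realize rhs as adjacent table entries."""
    def extend(prefix, pos):
        if len(prefix) == len(rhs):
            yield prefix
        else:
            for e in table:
                if e[0] == rhs[len(prefix)] and e[1] == pos:
                    yield from extend(prefix + [e], e[2])
    for e in table:
        if e[0] == rhs[0]:
            yield from extend([e], e[2])

def bottom_up(words):
    # initialize the table with the parts of speech of each word
    table = [(pos, i, i + 1, None)
             for i, word in enumerate(words) for pos in DICTIONARY[word]]
    added = True
    while added:                       # repeat until no new entries appear
        added = False
        for lhs, rhs in GRAMMAR:
            for seq in list(find_sequences(table, rhs)):
                entry = (lhs, seq[0][1], seq[-1][2],
                         [table.index(e) for e in seq])
                if entry not in table:
                    table.append(entry)
                    added = True
    return table

for i, row in enumerate(bottom_up("the cat chases mice".split())):
    print(i, row)   # any S spanning 0..len(words) is a complete parse
```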

28 Bottom-up parser

node #  symbol  start  end  constituents
0       ART     0      1    -
1       N       1      2    -
2       V       2      3    -
3       N       2      3    -
4       N       3      4    -
5       NP      0      2    [0, 1]
6       NP      1      2    [1]
7       NP      2      3    [3]
8       NP      3      4    [4]
9       S       0      4    [5, 2, 8]  ← parse!
10      S       1      4    [6, 2, 8]
(plus several more S entries)

