Parsing with CFG Ling 571 Fei Xia Week 2: 10/4-10/6/05

Outline Parsing Grammar and language Parsing algorithms for CFG: –Top-down –Bottom-up –Top-down with bottom-up filter –Earley algorithm –CYK algorithm (will cover in Week 3)

Parsing

What is parsing? A sentence → parse tree(s) Two kinds of parse trees: Phrase structure Dependency structure Ex: book that flight
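To make the two kinds concrete, here is one illustrative way (my own encoding, not from the slides) to write both structures for "book that flight" in Python:

```python
# Phrase-structure tree as nested tuples: (label, children...).
phrase_structure = (
    "S",
    ("VP",
     ("V", "book"),
     ("NP",
      ("Det", "that"),
      ("N", "flight"))))

# Dependency structure as (head, relation, dependent) triples;
# "book" is the root, and the relation labels are illustrative.
dependency = [
    ("book", "obj", "flight"),
    ("flight", "det", "that"),
]
```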

Good parsers Accuracy: handle ambiguity well –Precision, recall, F-measure –Percent of sentences correctly parsed Robustness: handle “ungrammatical” sentences or sentences out of domain Resources needed: treebanks, grammars Efficiency: parsing speed Richness: traces, functional tags, etc.

Types of parsers What kind of parse trees? Phrase-structure parsers Dependency parsers Use statistics? Statistical parsers Rule-based parsers

Types of parsers (cont) Use grammars? Grammar-based parsers: CFG, HPSG, … Parsers that do not use grammars explicitly: Ratnaparkhi’s parser (1997) Require treebanks? Supervised parsers Unsupervised parsers

Our focus Parsers: –Phrase-structure –Mainly statistical –Grammar-based: mainly CFG –Supervised Where grammars come from: –Built by hand –Extracted from treebanks –Induced from text

Grammar and language

Chomsky hierarchy G = (N, T, P, S) A set of non-terminal symbols: N A set of terminals: T A set of productions: P A designated start symbol: S
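Concretely, a grammar 4-tuple can be written straight down in code. A minimal sketch (my own encoding, reused in later sketches), shown with the productions of the regular grammar example below:

```python
# G = (N, T, P, S) for the regular grammar example below.
grammar = {
    "N": {"S", "A"},                       # non-terminal symbols
    "T": {"a", "b", "c"},                  # terminals
    "P": [("S", ("a", "S")),               # productions as (lhs, rhs) pairs
          ("S", ("b", "A")),
          ("A", ("c",)),
          ("A", ("c", "A"))],
    "S": "S",                              # designated start symbol
}
```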

Chomsky Hierarchy (cont) –Unrestricted grammar: α → β –Context-sensitive grammar: αAβ → αγβ, with γ non-empty –Context-free grammar: A → γ –Regular grammar: A → a B or A → a

A regular grammar G = (N, T, P, S) N = {S, A} T = {a, b, c} P = { S → a S, S → b A, A → c, A → c A } S = S

Derivation S ⇒ aS ⇒ abA ⇒ abcA ⇒ abcc Every sentence has the form a^n b c^m (n ≥ 0, m ≥ 1), so L(G) = a*bc+
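Such derivations can be mechanized: a breadth-first search over sentential forms, using the `grammar` dict sketched above, enumerates the short sentences of L(G). Purely illustrative (and exponential in general):

```python
from collections import deque

def generate(grammar, max_len=4):
    """Enumerate sentences of L(G) up to max_len terminals by BFS
    over sentential forms, expanding the leftmost non-terminal."""
    sentences = set()
    queue = deque([(grammar["S"],)])
    while queue:
        form = queue.popleft()
        if len(form) > max_len:
            continue                         # prune over-long forms
        # Find the leftmost non-terminal, if any.
        i = next((k for k, s in enumerate(form) if s in grammar["N"]), None)
        if i is None:
            sentences.add("".join(form))     # all terminals: a sentence
            continue
        for lhs, rhs in grammar["P"]:        # leftmost derivation step
            if lhs == form[i]:
                queue.append(form[:i] + rhs + form[i + 1:])
    return sorted(sentences)

print(generate(grammar))   # ['abc', 'abcc', 'bc', 'bcc', 'bccc']
```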

Languages A sentence is a sequence of terminals that can be derived from the start symbol. L(G): the set of sentences generated by G.

A CFG G = (N, T, P, S) N = {S, A, B} T = {a, b} P = { S → ab, S → aSb } S = S

Derivation S ⇒ aSb ⇒ aaSbb ⇒ aaabbb L(G) = { a^n b^n | n ≥ 1 }

Another CFG N = { S } T = {a, b} P = { S → a S a, S → b S b, plus a terminating rule such as S → ε } Nesting: with S → ε, L(G) = { w w^R | w ∈ {a, b}* }: matching a’s and b’s nest around the center.

Grammars and languages
Grammar                    Language                          Automata                   Recognition   Dependency
Regular grammar            Regular language                  Finite-state automaton     linear        strictly local
Context-free grammar       Context-free language             Pushdown automaton         polynomial    nested
Context-sensitive grammar  Context-sensitive language        Linear bounded automaton   NP-complete   crossing
Unrestricted grammar       Recursively enumerable language   Turing machine             undecidable   arbitrary

Language complexity Given a language L, is it regular? Is it context-free? Given a language, how do we find a grammar for it? Are human languages context-free?

What about human languages? Nesting => beyond regular languages: –The book was lost: N1 V1 –The book that the student bought was lost: N1 N2 V2 V1 –The moment when … has passed: N1 N2 N3 V3 V2 V1 Crossing => beyond context-free: –Pattern in Dutch (cross-serial dependencies): N1 N2 N3 V1 V2 V3

Summary of Chomsky Hierarchy There are four types of grammars Each type has its own generative power Human languages are not context-free But in order to process human languages, we often use CFG as an approximation.

Other grammar formalisms Phrase structure based: –CFG-based grammars: HPSG, LFG –Tree grammars: TAG, D-grammar Dependency based: –Dependency grammars

Equivalence of two grammars Weak Equivalence: L(G1) = L(G2) Strong Equivalence: –L(G1) = L(G2) and –the parse trees for every sentence are identical other than renaming.

Context-free grammar

A CFG (1)S -> NP VP (2)S -> Aux NP VP (3)S -> VP (4) VP -> V (5) NP -> Det N (6) V -> book (7) N -> book/flight (8) Det -> a/the/that (9) Aux -> do

Parsing algorithms Top-down Bottom-up Top-down with bottom-up filtering Earley algorithm CYK algorithm....

Top-down parsing Start from the start symbol, and apply rules Top-down, depth-first, left-to-right parsing Never explores trees that cannot result in S => goal-directed search Wastes time on trees that do not match the input sentence.
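A minimal sketch of this search as a recursive recognizer (illustrative only: the rules follow the toy CFG above, with VP → V NP added so that "book that flight" parses; note that this naive depth-first search would loop forever on a left-recursive rule such as NP → NP PP):

```python
# Top-down, depth-first, left-to-right recognition. RULES follows the
# toy CFG above; VP -> V NP is added here (not in the slide's grammar)
# so the example sentence parses.
RULES = {
    "S":   [("NP", "VP"), ("Aux", "NP", "VP"), ("VP",)],
    "VP":  [("V",), ("V", "NP")],
    "NP":  [("Det", "N")],
    "V":   [("book",)],
    "N":   [("book",), ("flight",)],
    "Det": [("a",), ("the",), ("that",)],
    "Aux": [("do",)],
}

def parse(symbols, words):
    """Can this sequence of symbols derive exactly these words?"""
    if not symbols:
        return not words                      # both exhausted: success
    first, rest = symbols[0], symbols[1:]
    if first not in RULES:                    # terminal: must match input
        return bool(words) and words[0] == first and parse(rest, words[1:])
    return any(parse(rhs + rest, words)       # try each expansion in turn
               for rhs in RULES[first])

print(parse(("S",), ("book", "that", "flight")))   # True
```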

An example Book that flight

Bottom-up parsing Use the input to guide => data-driven search Find rules whose right-hand sides match the current nodes. Wastes time on trees that do not result in S.
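For contrast, a minimal bottom-up (shift-reduce) sketch; illustrative only, since it reduces greedily, whereas a real bottom-up parser must search over shift/reduce choices:

```python
# Bottom-up (shift-reduce) recognition: repeatedly replace right-hand
# sides on a stack with their left-hand sides. Greedy reduction happens
# to work for this sentence but fails in general.
RULES = [("S", ("VP",)), ("VP", ("V", "NP")), ("NP", ("Det", "N")),
         ("V", ("book",)), ("Det", ("that",)), ("N", ("flight",))]

def shift_reduce(words):
    stack, buffer = [], list(words)
    while True:
        for lhs, rhs in RULES:                # try to reduce
            n = len(rhs)
            if tuple(stack[-n:]) == rhs:
                stack[-n:] = [lhs]            # reduce: rhs -> lhs
                break
        else:
            if not buffer:
                break
            stack.append(buffer.pop(0))       # no reduce possible: shift
    return stack == ["S"]

print(shift_reduce(["book", "that", "flight"]))   # True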

The example (cont) Book that flight

Top-down parsing with bottom-up look-ahead filtering Both top-down and bottom-up generate too many useless trees. Combine the two to reduce over-generation. B is a left-corner of A if A ⇒* B α, i.e., A can derive a string that begins with B. Left-corner table provides more efficient look-ahead –Pre-compute all POS that can serve as the leftmost POS in the derivations of each non-terminal category
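The left-corner table can be pre-computed as a fixed-point closure over the rules. A minimal sketch, with an illustrative grammar (names are not from the slides):

```python
# Pre-compute left corners: B is a left-corner of A if A =>* B alpha,
# i.e. the reflexive-transitive closure of "appears leftmost in a rule".
GRAMMAR = {
    "S":  [("NP", "VP"), ("Aux", "NP", "VP"), ("VP",)],
    "VP": [("V",), ("V", "NP")],
    "NP": [("Det", "N")],
}

def left_corners(grammar):
    table = {a: {a} for a in grammar}            # reflexive base case
    changed = True
    while changed:                               # iterate to a fixed point
        changed = False
        for a, expansions in grammar.items():
            for rhs in expansions:
                new = table.get(rhs[0], {rhs[0]}) - table[a]
                if new:
                    table[a] |= new
                    changed = True
    return table

# left_corners(GRAMMAR)["S"] = {"S","NP","VP","Det","Aux","V"}: top-down
# expansion of S is pruned unless the next word's POS is a left corner.
print(left_corners(GRAMMAR))
```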

The example Book that flight

Remaining problems Left-recursion: NP → NP PP Ambiguity Repeated parsing of subtrees

Dynamic programming (DP) DP: –Divisibility (optimal substructure): the optimal solution of a sub-problem is part of the optimal solution of the whole problem. –Memoization: solve small problems only once and remember the answers. Example: T(n) = T(n-1) + T(n-2)
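As a concrete illustration of memoization (plain Python, not from the slides), the T(n) = T(n-1) + T(n-2) recurrence (Fibonacci) takes exponential time if recomputed naively but linear time once each sub-result is remembered:

```python
from functools import lru_cache

@lru_cache(maxsize=None)        # remember each answer (memoization)
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)   # T(n) = T(n-1) + T(n-2)

print(fib(30))   # 832040, computed with only 31 distinct calls
```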

Parsing with DP Three well-known CFG parsing algorithms: –Earley algorithm (1970) –Cocke-Younger-Kasami (CYK) (1960s) –Graham-Harrison-Ruzzo (GHR) (1980)

Earley algorithm Use DP to do top-down search A single left-to-right pass that fills out an array (called a chart) that has N+1 entries. An entry is a list of states: it represents all partial trees generated so far.

A state A state contains: –A single dotted grammar rule: A → α . β –[i, j]: i: where the state begins w.r.t. the input j: the position of the dot w.r.t. the input In order to retrieve parse trees, we need to keep a list of pointers, which point to older states.

Dotted rules 0 Book 1 that 2 flight 3 S → . VP, [0,0] –S begins at position 0 –The dot is at position 0, too. –So, nothing has been covered so far. –We need to cover VP next. NP → Det . Nom, [1,2] –the NP begins at position 1 –the dot is currently at position 2 –so, Det has been successfully covered. –We need to cover Nom next.
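One illustrative way to encode such states (my own encoding, not from the slides):

```python
from collections import namedtuple

# A chart state: a dotted rule plus its span.
# lhs -> rhs[:dot] . rhs[dot:], spanning [start, end] w.r.t. the input.
State = namedtuple("State", "lhs rhs dot start end")

s = State("NP", ("Det", "Nom"), 1, 1, 2)    # NP -> Det . Nom, [1,2]
complete = s.dot == len(s.rhs)              # dot at the right end? (False)
next_symbol = None if complete else s.rhs[s.dot]   # "Nom" is needed next
print(s, complete, next_symbol)
```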

Parsing procedure From left to right, for each entry chart[i]: apply one of three operators to each state: predictor: predict the expansion scanner: match the input word with the POS after the dot completer: advance previously created states

Predictor Why this operation: create new states to represent top-down expectations When to apply: the symbol after the dot is a non-POS –Ex: S → NP . VP, [i, j] What to do: add new states to the current chart, one for each expansion of the non-terminal –Ex: VP → . V, [j, j] VP → . V NP, [j, j]

Scanner Why: match the input word with the POS in a rule When: the symbol after the dot is a POS –Ex: VP → . V NP, [j, j], word[j] = “book” What: if it matches, add a state to the next entry –Ex: V → book ., [j, j+1]

Completer Why: the parser has discovered a constituent, so we must find and advance all states that were waiting for it When: the dot has reached the right end of a rule –Ex: NP → Det Nom ., [i, j] What: find every state with the dot at i that is expecting an NP, e.g., VP → V . NP, [h, i] –Add new states to the current entry: VP → V NP ., [h, j]
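Putting the three operators together: below is a minimal Earley recognizer sketch (recognition only, without the backpointers needed for tree retrieval; it assumes a grammar with no ε-rules). The grammar and POS table mirror the worked example on the following slides:

```python
# A minimal Earley recognizer sketch (recognition only, no backpointers;
# assumes no epsilon-rules). States are (lhs, rhs, dot, start) tuples.
GRAMMAR = {                       # non-POS rules: lhs -> list of RHSs
    "S":  [("NP", "VP"), ("VP",)],
    "VP": [("V", "NP"), ("VP", "PP")],
    "NP": [("NP", "PP"), ("Det", "N"), ("N",)],
    "PP": [("P", "NP")],
}
POS = {"book": {"N", "V"}, "cards": {"N"}, "flight": {"N"},
       "that": {"Det"}, "with": {"P"}}

def earley(words):
    chart = [[] for _ in range(len(words) + 1)]

    def add(state, entry):                     # avoid duplicate states
        if state not in chart[entry]:
            chart[entry].append(state)

    add(("Start", ("S",), 0, 0), 0)            # Start -> . S, [0,0]
    for j in range(len(words) + 1):
        for lhs, rhs, dot, i in chart[j]:      # chart[j] grows as we iterate
            if dot < len(rhs) and rhs[dot] in GRAMMAR:        # predictor
                for expansion in GRAMMAR[rhs[dot]]:
                    add((rhs[dot], expansion, 0, j), j)
            elif dot < len(rhs):                              # scanner
                if j < len(words) and rhs[dot] in POS.get(words[j], ()):
                    add((rhs[dot], (words[j],), 1, j), j + 1)
            else:                                             # completer
                for lhs2, rhs2, dot2, i2 in chart[i]:
                    if dot2 < len(rhs2) and rhs2[dot2] == lhs:
                        add((lhs2, rhs2, dot2 + 1, i2), j)
    return ("Start", ("S",), 1, 0) in chart[len(words)]

print(earley("book that flight".split()))      # True
```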

Retrieving parse trees Augment the Completer to add pointers to older states from which it advances To retrieve parse trees, do a recursive retrieval from a complete S in the final chart entry.

An example: Book that flight
Rules:
(1) S → NP VP
(2) S → VP
(3) VP → V NP
(4) VP → VP PP
(5) NP → NP PP
(6) NP → Det N
(7) NP → N
(8) PP → P NP
(9) N → book/cards/flight
(10) Det → that
(11) P → with
(12) V → book

Chart[0], word[0] = book
S0: Start → . S [0,0] init
S1: S → . NP VP [0,0] S0 pred
S2: S → . VP [0,0] S0 pred
S3: NP → . NP PP [0,0] S1 pred
S4: NP → . Det N [0,0] S1 pred
S5: NP → . N [0,0] S1 pred
S6: VP → . V NP [0,0] S2 pred
S7: VP → . VP PP [0,0] S2 pred

Chart[1], word[1] = that
S8: N → book . [0,1] S5 scan
S9: V → book . [0,1] S6 scan
S10: NP → N . [0,1] S8 comp [S8]
S11: VP → V . NP [0,1] S9 comp [S9]
S12: S → NP . VP [0,1] S10 comp [S10]
S13: NP → NP . PP [0,1] S10 comp [S10]
S14: NP → . NP PP [1,1] S11 pred
S15: NP → . Det N [1,1] S11 pred
S16: NP → . N [1,1] S11 pred
S17: VP → . V NP [1,1] S12 pred
S18: VP → . VP PP [1,1] S12 pred
S19: PP → . P NP [1,1] S13 pred

Chart[2], word[2] = flight
S20: Det → that . [1,2] S15 scan
S21: NP → Det . N [1,2] S20 comp [S20]

Chart[3]
S22: N → flight . [2,3] S21 scan
S23: NP → Det N . [1,3] S22 comp [S20, S22]
S24: VP → V NP . [0,3] S23 comp [S9, S23]
S25: NP → NP . PP [1,3] S23 comp [S23]
S26: S → VP . [0,3] S24 comp [S24]
S27: VP → VP . PP [0,3] S24 comp [S24]
S28: PP → . P NP [3,3] S25 pred
S29: Start → S . [0,3] S26 comp [S26]

Retrieving parse trees Start from chart[3] and look for Start → S . [0,3] (S29), then follow the pointers recursively:
S29 → [S26]: S → VP .
S26 → [S24]: VP → V NP .
S24 → [S9, S23]: V → book, NP → Det N .
S23 → [S20, S22]: Det → that, N → flight

Summary of Earley algorithm Top-down search with DP A single left-to-right pass that fills out a chart Complexity: –A: number of entries: O(N) –B: number of states within an entry: O(N), treating grammar size as a constant –C: time to process a state: O(N) –Total: A × B × C = O(N^3)