Amirkabir University of Technology, Computer Engineering Faculty, AILAB. Parsing. Ahmad Abdollahzadeh Barfouroush. Aban 1381. Natural Language Processing Course.



Grammar and Parsing
To examine how the syntactic structure of a sentence can be computed, you must consider two things:
 Grammar: A formal specification of the allowable structures in the language.
 Parsing: The method of analysing a sentence to determine its structure according to the grammar.

Parsing Algorithm
 Parsing Algorithm: A parsing algorithm is a procedure that searches through various ways of combining grammatical rules to find a combination that generates a tree that could be the structure of the input sentence. For simplicity, the parser here simply returns a yes or no answer as to whether a certain sentence is accepted by the grammar or not.
 Top-Down Parser: A top-down parser starts with the S symbol and attempts to rewrite it into a sequence of terminal symbols that matches the classes of the words in the input sentence.
 Bottom-Up Parser: A bottom-up parser starts with the terminal symbols of the input sentence.

Parsing Techniques
 Top-Down Parsing: A top-down parser starts with the S symbol and attempts to rewrite it into a sequence of terminal symbols that matches the classes of the words in the input sentence.
 Bottom-Up Parsing: A bottom-up parser starts with the terminal symbols in the input sentence. The parser successively rewrites sequences of terminal and/or non-terminal symbols with the left-hand side of a grammar rule (always a single non-terminal in a CFG).

Top-down Parser
 Symbol list: the state of the parse. The state of the parser at any given time is represented as a list of symbols that are the result of the operations applied so far, plus a number indicating the current position in the sentence.
Example: the parser starts with ((S) 1).
Rule            Current State
S → NP VP       ((NP VP) 1)
NP → ART N      ((ART N VP) 1)
 Lexicon: efficiently stores the possible categories for each word. With a lexicon, the grammar need not contain any lexical rules.
Example: cried: V; dogs: N, V; the: ART

State of Parser
The number indicates the current position in the sentence. Positions fall between the words, with 1 being the position before the first word.
Example: 1 The 2 dogs 3 cried 4
Parse state: ((N VP) 2) means the parser needs to find an N followed by a VP, starting at position 2.

New State Generation
a) If the first symbol is a lexical symbol and the next word can belong to that lexical category, then update the state by removing the first symbol and incrementing the position number.
Example: dogs is an N in the lexicon.
Next parser state: ((VP) 3). The parser needs to find a VP starting at position 3.
b) If the first symbol is a non-terminal, then it is rewritten using a rule from the grammar.
Example: the new state by applying VP → V is ((V) 3), or by applying VP → V NP it is ((V NP) 3).

Backtracking
To guarantee finding a parse, systematically explore every possible new state. Backtracking is a simple technique for doing so. In backtracking, all possible new states are generated from the current state. One of these is picked as the next state and the rest are saved as backup states. If the current state cannot lead to a solution, a new current state is picked from the list of backup states.

A Top-down Parsing Algorithm (1/2)
Possibilities list: a list of possible states.
Example: (((N) 2) ((NAME) 1) ((ADJ N) 1))
 Current state: the first element of the possibilities list, a symbol list paired with a word position.
 Backup states: the remaining elements of the list, each an alternative symbol list and word position pair.

A Top-down Parsing Algorithm (2/2)
1. Select the current state: take the first state off the possibilities list and call it C. If the possibilities list is empty, then the algorithm fails.
2. If C consists of an empty symbol list and the word position is at the end of the sentence, then the algorithm succeeds.
3. Otherwise, generate the next possible states.
3.1 If the first symbol of C is a lexical symbol and the next word can belong to that lexical category, then update the state by removing the first symbol, incrementing the position number, and adding the result to the possibilities list.
3.2 Otherwise, if the first symbol on the symbol list of C is a non-terminal, then rewrite it with each rule of the grammar that can rewrite that non-terminal symbol, and add all the results to the possibilities list.
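The algorithm above can be sketched in Python. The grammar and lexicon are the running example from these slides; the function and variable names are illustrative, not taken from the course materials.

```python
# Minimal sketch of the slides' top-down, backtracking parser (depth-first).
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["ART", "N"], ["ART", "ADJ", "N"]],
    "VP": [["V"], ["V", "NP"]],
}
LEXICON = {"the": {"ART"}, "dogs": {"N", "V"}, "cried": {"V"}}

def parse(words):
    # A state is (symbol list, position); positions start at 1 as on the slides.
    possibilities = [(("S",), 1)]
    while possibilities:
        symbols, pos = possibilities.pop(0)    # 1- select the current state
        if not symbols:
            if pos == len(words) + 1:          # 2- empty list at sentence end
                return True
            continue
        first, rest = symbols[0], symbols[1:]
        if first in GRAMMAR:
            # 3.2- rewrite the non-terminal with every applicable rule
            new = [(tuple(rhs) + rest, pos) for rhs in GRAMMAR[first]]
            possibilities[:0] = new            # depth-first: add to the front
        elif pos <= len(words) and first in LEXICON.get(words[pos - 1], ()):
            # 3.1- lexical symbol matches the next word: advance the position
            possibilities.insert(0, (rest, pos + 1))
    return False                               # possibilities list exhausted

print(parse(["the", "dogs", "cried"]))   # True
print(parse(["the", "the", "cried"]))    # False
```

Because new states go on the front of the list, failed expansions are abandoned and the saved backup states are tried next, which is exactly the backtracking behaviour described above.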

An Example
Grammar:
1. S → NP VP
2. NP → ART N
3. NP → ART ADJ N
4. VP → V
5. VP → V NP
Sentence: The dogs cried

Step  Current State      Backup States
1.    ((S) 1)
2.    ((NP VP) 1)
3.    ((ART N VP) 1)     ((ART ADJ N VP) 1)
4.    ((N VP) 2)         ((ART ADJ N VP) 1)
5.    ((VP) 3)           ((ART ADJ N VP) 1)
6.    ((V) 3)            ((V NP) 3), ((ART ADJ N VP) 1)
V is matched with cried.

An Example (Contd.)

Parsing as Search Procedure
We can think of parsing as a special case of a search problem. A top-down parser can be described in terms of the following generalized search procedure. The possibilities list is initially set to the start state of the parse. Then you repeat the following steps until you have success or failure:
1. Select the first state from the possibilities list (and remove it from the list).
2. Generate the new states by trying every possible option from the selected state (there may be none if we are on a bad path).
3. Add the states generated in step 2 to the possibilities list.

Parse Strategies
In a depth-first strategy, the possibilities list is a stack. In other words, step 1 always takes the first element off the list, and step 3 always puts the new states on the front of the list, yielding a last-in first-out (LIFO) strategy.
In a breadth-first strategy, the possibilities list is manipulated as a queue. Step 3 adds the new states onto the end of the list, rather than the beginning, yielding a first-in first-out (FIFO) strategy.
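The two strategies differ only in where step 3 puts the new states. A small sketch, assuming a caller-supplied successor function and goal test (all names are mine):

```python
from collections import deque

def search(start, expand, is_goal, strategy="depth"):
    """Generic possibilities-list search; `strategy` selects stack vs. queue."""
    possibilities = deque([start])
    while possibilities:
        state = possibilities.popleft()        # step 1: take the first state
        if is_goal(state):
            return state
        new_states = expand(state)             # step 2: every possible option
        if strategy == "depth":
            # LIFO: put new states on the front, preserving their order
            possibilities.extendleft(reversed(new_states))
        else:
            possibilities.extend(new_states)   # FIFO: add them to the end
    return None                                # failure: list exhausted
```

With `strategy="depth"` the first successor of the most recent state is expanded next; with `strategy="breadth"` all states at one depth are expanded before any state at the next depth.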

Search Tree for Two Parse Strategies

Difference Between Parse Strategies
With the depth-first strategy, one interpretation is considered and expanded until it fails; only then is the second one considered. With the breadth-first strategy, both interpretations are considered alternately, each being expanded one step at a time.
Both depth-first and breadth-first searches find the solution but search the space in a different order. A depth-first search often moves quickly to a solution but in other cases may spend considerable time pursuing futile paths. The breadth-first strategy explores each possible solution to a certain depth before moving on.

Bottom-Up Parsing
The main difference between top-down and bottom-up parsers is the way the grammar rules are used. For example, consider the rule NP → ART ADJ N. In a bottom-up parser you use the rule to take a sequence ART ADJ N that you have found and identify it as an NP. The basic operation in bottom-up parsing, then, is to take a sequence of symbols and match it to the right-hand side of the rules.

Bottom-Up Parser
You could build a bottom-up parser simply by formulating this matching process as a search process. The state would simply consist of a symbol list, starting with the words in the sentence. Successor states could be generated by exploring all possible ways to:
 rewrite a word by its possible lexical categories
 replace a sequence of symbols that matches the right-hand side of a grammar rule by its left-hand side symbol
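A sketch of that successor generation, assuming the example grammar and lexicon from the earlier slides (the function name is illustrative):

```python
# Bottom-up successor states: one rewrite step applied anywhere in the list.
GRAMMAR = [
    ("S",  ("NP", "VP")),
    ("NP", ("ART", "N")),
    ("NP", ("ART", "ADJ", "N")),
    ("VP", ("V",)),
    ("VP", ("V", "NP")),
]
LEXICON = {"the": {"ART"}, "dogs": {"N", "V"}, "cried": {"V"}}

def successors(symbols):
    """All states reachable from `symbols` by a single rewrite."""
    states = []
    for i, sym in enumerate(symbols):
        # rewrite a word by each of its possible lexical categories
        for cat in LEXICON.get(sym, ()):
            states.append(symbols[:i] + (cat,) + symbols[i + 1:])
    for lhs, rhs in GRAMMAR:
        # replace a sequence matching a rule's right-hand side by its left-hand side
        for i in range(len(symbols) - len(rhs) + 1):
            if symbols[i:i + len(rhs)] == rhs:
                states.append(symbols[:i] + (lhs,) + symbols[i + len(rhs):])
    return states
```

A search over these states, starting from the words of the sentence, accepts when some path reaches the single symbol S.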

Motivation for Chart Parsers
Such a simple implementation would be prohibitively expensive, as the parser would tend to try the same matches again and again, duplicating much of its work unnecessarily. To avoid this problem, a data structure called a chart is introduced that allows the parser to store the partial results of the matching it has done so far, so that the work need not be duplicated.

Transition Network Grammars
Context-free rewrite rules are one formalism for representing grammars. Transition network grammars are another. This formalism is based on the notion of a transition network, consisting of nodes and labeled arcs. Two variants:
 Recursive Transition Network (RTN)
 Augmented Transition Network (ATN)

Recursive Transition Network
 Simple transition networks are often called finite state machines (FSMs).
 Finite state machines are equivalent in expressive power to regular grammars, and thus are not powerful enough to describe all languages that can be described by a CFG.
 To get the descriptive power of CFGs, you need a notion of recursion in the network grammar.
 A recursive transition network (RTN) is like a simple transition network, except that it allows arc labels to refer to other networks as well as word categories.

An Example of RTN

Arc Type   Example   How Used
CAT        noun      succeeds only if the current word is of the named category
WRD        of        succeeds only if the current word is identical to the label
PUSH       NP        succeeds only if the named network can be successfully traversed
JUMP       jump      always succeeds
POP        pop       succeeds and signals the successful end of the network

Top-down Parsing with RTN
An algorithm for parsing with RTNs can be developed along the same lines as the algorithms for parsing CFGs. The state of the parse at any moment can be represented by the following:
 current position: a pointer to the next word to be parsed
 current node: the node at which you are located in the network
 return points: a stack of nodes in other networks where you will continue if you pop from the current network

An Algorithm for Searching an RTN
This version assumes that if you can follow an arc, it will be the correct one in the final parse. Say you are in the middle of a parse and know the three pieces of information just cited. You can leave the current node and traverse an arc in the following cases:
Case 1: If the arc names a word category and the next word in the sentence is in that category, then (1) update the current position to start at the next word; (2) update the current node to the destination of the arc.
Case 2: If the arc is a push arc to a network N, then (1) add the destination of the arc onto the return points; (2) update the current node to the starting node of network N.
Case 3: If the arc is a pop arc and the return points list is not empty, then (1) remove the first return point and make it the current node.
Case 4: If the arc is a pop arc, the return points list is empty, and there are no words left, then (1) the parse completes successfully.
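The four cases can be sketched as a recursive, backtracking search. The network encoding below and all the names in it are my own, not the course's figure, and where the slide's algorithm assumes the first followable arc is correct, this sketch tries the alternatives when a choice fails:

```python
# ARCS maps each node to (arc type, label, destination) triples;
# a "push" arc's label is the start node of the sub-network.
ARCS = {
    "S0":  [("push", "NP0", "S1")],
    "S1":  [("cat", "V", "S2")],
    "S2":  [("pop", None, None)],
    "NP0": [("cat", "ART", "NP1")],
    "NP1": [("cat", "N", "NP2"), ("cat", "ADJ", "NP1")],
    "NP2": [("pop", None, None)],
}
LEXICON = {"the": {"ART"}, "old": {"ADJ", "N"}, "man": {"N", "V"}, "cried": {"V"}}

def traverse(node, pos, returns, words):
    for kind, label, dest in ARCS[node]:
        if (kind == "cat" and pos <= len(words)
                and label in LEXICON.get(words[pos - 1], ())):
            # Case 1: consume the word and move to the arc's destination
            if traverse(dest, pos + 1, returns, words):
                return True
        elif kind == "push":
            # Case 2: remember where to resume, then enter the sub-network
            if traverse(label, pos, [dest] + returns, words):
                return True
        elif kind == "pop":
            if returns:
                # Case 3: resume at the first return point
                if traverse(returns[0], pos, returns[1:], words):
                    return True
            elif pos == len(words) + 1:
                return True        # Case 4: pop with nothing pending: success
    return False                   # no arc leads to a parse: backtrack

def parse(words):
    return traverse("S0", 1, [], words)

print(parse(["the", "old", "man", "cried"]))   # True
```

On "the old man cried" the parser first takes the N arc for "old", fails at the V arc, then backtracks into the ADJ loop, mirroring the backup-state trace on the later slide.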

An RTN Grammar

Trace of Top-Down Parse
1 The 2 old 3 man 4 cried 5

Trace of Top-Down Parse with Backtrack

Step  Current State     Arc to be Followed          Backup States
1.    (S, 1, NIL)       S/1                         NIL
2.    (NP, 1, (S1))     NP/2 (& NP/3 for backup)    NIL
3.    (NP1, 2, (S1))    NP1/2                       (NP2, 2, (S1))
4.    (NP2, 3, (S1))    NP2/1                       (NP2, 2, (S1))
5.    (S1, 3, NIL)      no arc can be followed      (NP2, 2, (S1))
6.    (NP2, 2, (S1))    NP2/1                       NIL
7.    (S1, 2, NIL)      S1/1                        NIL
8.    (S2, 3, NIL)      S2/2                        NIL
9.    (NP, 3, (S2))     NP/1                        NIL
10.   (NP1, 4, (S2))    NP1/2                       NIL
11.   (NP2, 5, (S2))    NP2/1                       NIL
12.   (S2, 5, NIL)      S2/1                        NIL
13.   parse succeeds                                NIL