CSA2050 Introduction to Computational Linguistics Parsing I.


Apr MRCSA Parsing I
Why Is Syntax Important?
The presidential candidate who was extremely popular smiled broadly.
How many presidential candidates are implied? 1 or >1?

Why Is Syntax Important?
The presidential candidate, who was extremely popular, smiled broadly.
How many presidential candidates are implied? 1 or >1?

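Why Is Syntax Important?
The presidential candidate, who was extremely popular, smiled broadly.
The presidential candidate who was extremely popular smiled broadly.
…because the syntactic structure has an important bearing on the meaning.
With the commas the relative clause is non-restrictive, so there is just one candidate; without them it is restrictive and picks out one candidate among several.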

PP Attachment
The policemen saw a burglar with a gun.
The policemen saw a burglar with a telescope.
A PP can modify V or N.
In the first case, it modifies N (the burglar has the gun).
In the second, it modifies V (the seeing is done with the telescope).

PP modifies V
The/D policemen/N saw/V the/D burglar/N with/P a/D telescope/N
[S [NP The policemen] [VP saw [NP the burglar] [PP with a telescope]]]

PP modifies N
The/D policemen/N saw/V a/D burglar/N with/P a/D gun/N
[S [NP The policemen] [VP saw [NP a burglar [PP with a gun]]]]

Issue
In general, how can we determine whether a prepositional phrase modifies the preceding noun or verb?
Knowledge-based approach: must encode, for example,
– burglars often have guns
– people can see things with a telescope
– plus a lot of other things
Statistical approach

PP Attachment – Statistical Approach
The Prepositional Phrase Attachment Corpus, included with NLTK as ppattach, makes it possible for us to study this question systematically.
It is derived from the IBM-Lancaster Treebank of Computer Manuals and the Penn Treebank.
It distils only the essential information about PP attachment.

Corpus Example Sentence
Original: Four of the five surviving workers have asbestos-related diseases, including three with recently diagnosed cancer.
Compare:
including three with recently diagnosed cancer
versus
including three by adding two and one

Distilled Information in Corpus
Original: Four of the five surviving workers have asbestos-related diseases, including three with recently diagnosed cancer.
ppattach corpus entry: 16 including three with cancer N
Field               Value
i/d                 16
head verb           including
head of obj         three
prep                with
head of pp's np     cancer
N or V              N

Further examples
allow visits between families     N
allow visits on peninsula         V
acquired interest in firm         N
acquired interest in 1986         V
etc.
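The distilled tuples lend themselves to simple counting, which is the starting point for a statistical approach. The following is a minimal sketch, assuming the four examples above are typed in by hand as plain Python tuples; `Instance` and `attachment_counts` are illustrative names, not part of NLTK.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Instance(NamedTuple):
    verb: str
    noun1: str
    prep: str
    noun2: str
    attachment: str  # "N" or "V"


# The four examples from the slide, entered by hand (not read from the corpus).
INSTANCES = [
    Instance("allow", "visits", "between", "families", "N"),
    Instance("allow", "visits", "on", "peninsula", "V"),
    Instance("acquired", "interest", "in", "firm", "N"),
    Instance("acquired", "interest", "in", "1986", "V"),
]


def attachment_counts(instances):
    """Count noun (N) and verb (V) attachments separately for each preposition."""
    counts = defaultdict(Counter)
    for inst in instances:
        counts[inst.prep][inst.attachment] += 1
    return counts


counts = attachment_counts(INSTANCES)
for prep in sorted(counts):
    print(prep, dict(counts[prep]))
```

With only four tuples the counts say little; the same loop over the full corpus would show which prepositions lean towards noun or verb attachment.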

Minimal Pair Extraction
NLTK contains primitives that allow us to extract minimal pairs, where we hold NP1, PREP and NP2 constant and get different attachments depending on the verb, e.g.
received (NP offer) (PP from group)     V
rejected (NP offer (PP from group))     N
Intuition: we say "receive x from y" but just "reject x".
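The grouping behind minimal pair extraction can be sketched in a few lines. This is an illustration under stated assumptions: the tuples are hand-typed from the slides rather than read from the ppattach corpus, and `minimal_pairs` is a hypothetical helper name.

```python
from collections import defaultdict

# (verb, noun1, prep, noun2, attachment), copied by hand from the slides.
INSTANCES = [
    ("received", "offer", "from", "group", "V"),
    ("rejected", "offer", "from", "group", "N"),
    ("acquired", "interest", "in", "firm", "N"),
]


def minimal_pairs(instances):
    """Group verbs by (noun1, prep, noun2) and keep keys seen with both attachments."""
    table = defaultdict(lambda: defaultdict(set))
    for verb, noun1, prep, noun2, attachment in instances:
        table[(noun1, prep, noun2)][attachment].add(verb)
    return {
        key: {att: sorted(verbs) for att, verbs in by_att.items()}
        for key, by_att in table.items()
        if "N" in by_att and "V" in by_att
    }


pairs = minimal_pairs(INSTANCES)
print(pairs)
```

Only "offer from group" survives the filter, because it is the only NP1-PREP-NP2 combination here that occurs with both a V-attaching verb and an N-attaching verb.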

Why Syntactic Structure?
Helps to make explicit how a sentence says who did what to whom.
The fierce dog bit the man
Key idea is to identify noun phrases around the verb.
We can do this in terms of sequences of POS tags, e.g. D JJ* N
But there are limitations to this approach.
The child with a fierce dog bit the man
Here the child is doing the biting, but D JJ* N still matches the words just before "bit", so the pattern wrongly picks out "a fierce dog" as the thing doing the biting.

Constituency
We could repair with a more complex regular expression such as
DT JJ* NN (IN DT JJ* NN)*
But this is defeated by
The seagull that attacked the child with the fierce dog bit the man
Basic problem is that we need a richer notion of constituency: how the words fit together to form a noun phrase.
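The limits of tag patterns can be checked directly. The sketch below is an illustration, not the course's own code: the sentences are tagged by hand, the tags VBD (verb) and WDT ("that") are assumptions borrowed from the Penn tagset, and `np_before_verb` is a hypothetical helper that returns the first pattern match sitting directly in front of a verb.

```python
import re

# Tag patterns from the slides, written over a string of space-terminated tags.
SIMPLE_NP = r"\bDT (?:JJ )*NN "
EXTENDED_NP = r"\bDT (?:JJ )*NN (?:IN DT (?:JJ )*NN )*"


def tag(sentence, tags):
    """Pair each word with its hand-assigned tag."""
    return list(zip(sentence.split(), tags.split()))


def np_before_verb(tagged, np_pattern):
    """Return the words of the first pattern match that is directly followed by a verb."""
    tags = "".join(t + " " for _, t in tagged)
    match = re.search(np_pattern + r"(?=VBD )", tags)
    if match is None:
        return None
    start = tags[: match.start()].count(" ")   # index of the first matched token
    length = match.group().count(" ")          # number of matched tokens
    return " ".join(word for word, _ in tagged[start : start + length])


s1 = tag("The fierce dog bit the man", "DT JJ NN VBD DT NN")
s2 = tag("The child with a fierce dog bit the man",
         "DT NN IN DT JJ NN VBD DT NN")
s3 = tag("The seagull that attacked the child with the fierce dog bit the man",
         "DT NN WDT VBD DT NN IN DT JJ NN VBD DT NN")

print(np_before_verb(s1, SIMPLE_NP))    # correct subject
print(np_before_verb(s2, SIMPLE_NP))    # wrong: only the nearest NP
print(np_before_verb(s2, EXTENDED_NP))  # repaired by the longer pattern
print(np_before_verb(s3, EXTENDED_NP))  # wrong again: the seagull did the biting
```

The extended pattern fixes the second sentence but still returns "the child with the fierce dog" for the third, because a flat tag pattern has no way of knowing that this phrase sits inside the relative clause.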

Recursion – Central Embedding
The dog barked.
The dog the cat scratched barked.
The dog the cat the horse liked scratched barked.
The dog the cat the horse the man rode liked scratched barked.
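The pattern behind these sentences is a single recursive step: each new subject and its verb are wrapped around the previously embedded clause. A minimal sketch, with `embed` as an illustrative function name:

```python
def embed(nouns, verbs):
    """Build N1 (N2 (... Vk) V2) V1: each clause is nested inside the one before it."""
    if not nouns:
        return []
    return ["the", nouns[0]] + embed(nouns[1:], verbs[1:]) + [verbs[0]]


NOUNS = ["dog", "cat", "horse", "man"]
VERBS = ["barked", "scratched", "liked", "rode"]

sentences = [
    " ".join(embed(NOUNS[:depth], VERBS[:depth])).capitalize() + "."
    for depth in range(1, len(NOUNS) + 1)
]
for sentence in sentences:
    print(sentence)
```

Every output has n nouns followed by n verbs in mirror order. Keeping those two counts equal for unbounded n is what a finite-state pattern cannot do, which motivates moving up the hierarchy to context-free grammars.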

Chomsky Hierarchy

CFG Review
A CFG is a 4-tuple (N, Σ, P, S), where:
N is a set of non-terminal symbols (the category labels);
Σ is a set of terminal symbols (e.g., lexical items);
P is a set of productions of the form A → α, where
– A is a non-terminal, and
– α is a string of symbols from (N ∪ Σ)* (i.e., a string of terminals and/or non-terminals);
S is the start symbol.
A derivation of a string from a non-terminal A is the result or trace of successively applying individual productions in P, starting from A.

Different Derivations for the Same Sentence
Derivation 1
NP
Det N PP
the N PP
the dog PP
the dog P NP
the dog with NP
the dog with Det N
the dog with a N
the dog with a telescope
Derivation 2
NP
Det N PP
Det N P NP
Det N with NP
the N with NP
the N with a N
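Derivation 1 always rewrites the leftmost non-terminal, which makes it easy to reproduce mechanically. The sketch below assumes the small grammar implied by the derivation steps (NP → Det N PP, NP → Det N, PP → P NP, plus the lexical rules); the dictionary layout and the name `leftmost_derivation` are illustrative choices.

```python
# Productions read off the derivation steps on the slide.
GRAMMAR = {
    "NP": [["Det", "N", "PP"], ["Det", "N"]],
    "PP": [["P", "NP"]],
    "Det": [["the"], ["a"]],
    "N": [["dog"], ["telescope"]],
    "P": [["with"]],
}


def leftmost_derivation(start, choices):
    """Rewrite the leftmost non-terminal at each step.

    choices[i] is the index of the production to use at step i.
    Returns every sentential form, starting with the start symbol.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Position of the leftmost symbol that still has productions.
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
        steps.append(" ".join(form))
    return steps


steps = leftmost_derivation("NP", [0, 0, 0, 0, 0, 1, 1, 1])
for step in steps:
    print(step)
```

Derivation 2 applies the same productions in a different order, so it arrives at the same string and the same tree; only the trace differs.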

What Does Context Free Mean?
LHS of rule is just one symbol.
Can have: NP -> Det N
Cannot have: X NP Y -> X Det N Y

Grammar Symbols
Symbols of the grammar fall into three categories:
1. Non Terminal Symbols
2. Terminal Symbols
3. Parts of Speech
We will sometimes not distinguish between 2 and 3.

Technical Aspects of CFGs
Rules have the form LHS -> RHS.
LHS comprises exactly one NT symbol.
RHS is any combination of NT and T symbols.
Finite State (type 3) grammars are more restricted:
LHS comprises exactly one NT symbol.
RHS is a combination of T symbols with at most one NT.
Right linear grammar: NT must come at extreme right.
Left linear grammar: NT must come at extreme left.
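These restrictions can be turned into a small checker. The sketch below is an illustration under the standard convention that a right-linear rule has its single non-terminal at the right end of the RHS and a left-linear rule has it at the left end. Rules are (LHS, RHS) pairs of tuples; the function names and the toy grammar over A, a, b are made up for the example.

```python
def rule_shapes(lhs, rhs, nonterminals):
    """Return the set of rule types that a single rule satisfies."""
    shapes = set()
    if len(lhs) == 1 and lhs[0] in nonterminals:
        shapes.add("context-free")
        nt_positions = [i for i, sym in enumerate(rhs) if sym in nonterminals]
        # At most one non-terminal, and only at the right (or left) edge.
        if not nt_positions or nt_positions == [len(rhs) - 1]:
            shapes.add("right-linear")
        if not nt_positions or nt_positions == [0]:
            shapes.add("left-linear")
    return shapes


def grammar_types(rules, nonterminals):
    """A grammar has a type only if every one of its rules has that type."""
    return set.intersection(*(rule_shapes(l, r, nonterminals) for l, r in rules))


NTS = {"NP", "Det", "N", "X", "Y", "A"}

cf_only = grammar_types([(("NP",), ("Det", "N"))], NTS)
not_cf = grammar_types([(("X", "NP", "Y"), ("X", "Det", "N", "Y"))], NTS)
# Toy right-linear grammar (hypothetical): A -> a A | b
regular = grammar_types([(("A",), ("a", "A")), (("A",), ("b",))], NTS)

print(cf_only, not_cf, regular)
```

Taking the intersection over all rules reflects the requirement that a regular grammar is right-linear throughout or left-linear throughout; mixing the two kinds of rule does not count.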

A Simple Grammar + Lexicon
grammar:
S → NP VP
NP → N
VP → V NP
lexicon:
V → kicks
N → John
N → Bill
Tree for "John kicks Bill":
[S [NP [N John]] [VP [V kicks] [NP [N Bill]]]]

Grammar versus Parser
A grammar/lexicon defines a relation between sentences generated by the grammar and their respective syntactic structures.
The grammar does not tell us how to actually go about discovering the structure of a sentence.
A parsing algorithm is an effective procedure for carrying out that discovery.
A parser implements a parsing algorithm.
Example: recursive descent parsing.
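To make the grammar/parser distinction concrete, here is a minimal sketch of a recursive descent parser for the John/kicks/Bill grammar and lexicon given earlier. The grammar is pure data; the functions `parse`, `expand` and `parse_sentence` are one possible procedure for using it, with names and the tuple-based tree format chosen for illustration.

```python
# The grammar and lexicon are data; they say nothing about how to parse.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["N"]],
    "VP": [["V", "NP"]],
}
LEXICON = {"kicks": "V", "John": "N", "Bill": "N"}


def parse(symbol, words, pos):
    """Yield (tree, next_pos) for each way symbol can cover words[pos:next_pos]."""
    if symbol not in GRAMMAR:
        # Part of speech: consume one word if the lexicon agrees.
        if pos < len(words) and LEXICON.get(words[pos]) == symbol:
            yield (symbol, words[pos]), pos + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from expand(rhs, words, pos, (symbol,))


def expand(rhs, words, pos, partial):
    """Match the symbols of rhs left to right, growing the partial tree."""
    if not rhs:
        yield partial, pos
        return
    for subtree, nxt in parse(rhs[0], words, pos):
        yield from expand(rhs[1:], words, nxt, partial + (subtree,))


def parse_sentence(sentence):
    """Return all trees rooted in S that cover the whole sentence."""
    words = sentence.split()
    return [tree for tree, end in parse("S", words, 0) if end == len(words)]


trees = parse_sentence("John kicks Bill")
print(trees)
print(parse_sentence("kicks John Bill"))
```

The parser works top down, predicting from S and backtracking through the generators when a prediction fails. A procedure of this shape would loop forever on a left-recursive rule such as NP → NP PP; the grammar here has none.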