Statistical methods in NLP, Course 5. Diana Trandabăț, 2015-2016.



The sentence as a string of words. E.g. "I saw the lady with the binoculars" as a string = a b c d e b f

The relations of the parts of a string to each other may differ: "I saw the lady with the binoculars" is structurally ambiguous. Who has the binoculars?

[I] saw the lady [with the binoculars] = [a] b c d [e b f]
I saw [the lady with the binoculars] = a b [c d e b f]

How can we represent the difference? By assigning them different structures. We can represent structures with 'trees', e.g. a tree for "I read the book".

a. I saw the lady with the binoculars
Tree with the PP attached inside the object NP (S → NP VP; VP → V NP; NP → NP PP):
I saw [the lady with the binoculars]

b. I saw the lady with the binoculars
Tree with the PP attached to the VP (S → NP VP; VP → VP PP):
I [saw the lady] with the binoculars

birds fly
Tree: S dominates NP and VP; NP dominates N ("birds"); VP dominates V ("fly").
Syntactic rules: S → NP VP, NP → N, VP → V

Tree: S over NP and VP, with leaves "birds" "fly" = a b; ab = string

Tree: S over A and B, with leaves a and b. Rules: S → A B, A → a, B → b

Rules. Assumption: natural language grammars are rule-based systems. What kind of grammars describe natural language phenomena? What are the formal properties of grammatical rules?

The Chomsky Hierarchy

The Chomsky Hierarchy
Chomsky, N. (1957) Syntactic Structures. The Hague: Mouton.
Chomsky, N. and G.A. Miller (1958) Finite state languages. Information and Control 1.
Chomsky, N. (1959) On certain formal properties of grammars. Information and Control 2.

SYNTAX (phrase/sentence formation)
SENTENCE: The boy kissed the girl
SUBJECT + PREDICATE; NOUN PHRASE + VERB PHRASE; (ART + NOUN) + (VERB + NOUN PHRASE)
S → NP VP, VP → V NP, NP → ART N

Chomsky Hierarchy
0. Type 0 (recursively enumerable) languages – the only restriction on rules: the left-hand side cannot be the empty string (no Ø → …)
1. Context-Sensitive languages – Context-Sensitive (CS) rules
2. Context-Free languages – Context-Free (CF) rules
3. Regular languages – Regular (right- or left-linear) rules
0 ⊃ 1 ⊃ 2 ⊃ 3, where a ⊃ b means a properly includes b (a is a superset of b), i.e. b is a proper subset of a.

Superset/subset relation (Venn diagram): S1 is a subset of S2; S2 is a superset of S1.

Generative power: Type 0 (recursively enumerable) languages are the most powerful system; Type 3 (regular languages) is the least powerful.

Rule Type 3
Name: Regular. Example: Finite State Automata (Markov-process grammar).
Rule type: a) right-linear A → x B, or b) left-linear A → B x, or A → x, with A, B = auxiliary (non-terminal) nodes and x = a terminal node.
Generates: a^m b^n with m, n ≥ 1. Cannot guarantee that there are as many a's as b's; no embedding.

Example of a regular grammar
S → the A
A → cat B | mouse B | duck B
B → bites C | sees C | eats C
C → the D
D → boy | girl | monkey
Generates: the cat bites the boy; the mouse eats the monkey; the duck sees the girl; ...
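As a minimal illustration (not part of the original slides), the right-linear grammar above can be enumerated with a few lines of Python; the dictionary encoding and function name are my own convention:

```python
import itertools

# Right-linear (regular) grammar from the slide: each non-terminal maps to
# (terminal word, next non-terminal) pairs; next is None for the final word.
RULES = {
    "S": [("the", "A")],
    "A": [("cat", "B"), ("mouse", "B"), ("duck", "B")],
    "B": [("bites", "C"), ("sees", "C"), ("eats", "C")],
    "C": [("the", "D")],
    "D": [("boy", None), ("girl", None), ("monkey", None)],
}

def generate(symbol="S", prefix=()):
    """Enumerate every sentence this regular grammar generates."""
    for word, nxt in RULES[symbol]:
        sentence = prefix + (word,)
        if nxt is None:
            yield " ".join(sentence)
        else:
            yield from generate(nxt, sentence)

for s in itertools.islice(generate(), 5):
    print(s)   # "the cat bites the boy", "the cat bites the girl", ...
```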

More regular grammars
Grammar 1: A → a; A → B a; B → A b
Grammar 2: A → a B; B → b A
Grammar 3: S → a A; S → b B; A → a S; B → b b S; S → ε
Grammar 4: A → A a; A → B a; B → b; B → A b; A → a

Examples of non-regular grammars
Grammar 5: S → A B; S → b B; A → a S; B → b b S; S → ε
Grammar 6: A → a; A → B a; B → b; B → b A

Tree: NP over article and NP1; NP1 over adjective and NP1; NP1 over noun and NP2.
Rules: NP → article NP1, NP1 → adjective NP1, NP1 → noun NP2

A parse tree: S is the root node; NP, VP (and the inner NP) are non-terminal nodes; n, v, det, n are the terminal nodes.

Rule Type 2
Name: Context Free. Example: Phrase Structure Grammars / Push-Down Automata.
Rule type: A → α, with A = auxiliary (non-terminal) node and α = any number of terminal or auxiliary nodes.
Recursiveness (centre embedding) allowed: A → α A β

Rule Type 1
The following language cannot be generated by a CF grammar (by the pumping lemma): a^n b^m c^n d^m
Swiss German: a string of dative nouns (e.g. aa), followed by a string of accusative nouns (e.g. bbb), followed by a string of dative-taking verbs (cc), followed by a string of accusative-taking verbs (ddd) = aabbbccddd = a^n b^m c^n d^m

More on Context Free Grammars (CFGs) Sets of rules expressing how symbols of the language fit together, e.g. S -> NP VP NP -> Det N Det -> the N -> dog

What Does Context Free Mean? LHS of rule is just one symbol. Can have NP -> Det N Cannot have X NP Y -> X Det N Y

Grammar Symbols Non Terminal Symbols Terminal Symbols – Words – Preterminals

Non Terminal Symbols Symbols which have definitions Symbols which appear on the LHS of rules S -> NP VP NP -> Det N Det -> the N -> dog

Non Terminal Symbols Same Non Terminals can have several definitions S -> NP VP NP -> Det N NP -> N Det -> the N -> dog

Terminal Symbols Symbols which appear in final string Correspond to words Are not defined by the grammar S -> NP VP NP -> Det N Det -> the N -> dog

Parts of Speech (POS) NT Symbols which produce terminal symbols are sometimes called pre-terminals S -> NP VP NP -> Det N Det -> the N -> dog Sometimes we are interested in the shape of sentences formed from pre-terminals Det N V Aux N V D N

CFG - formal definition
A CFG is a tuple (N, Σ, R, S):
N is a set of non-terminal symbols
Σ is a set of terminal symbols disjoint from N
R is a set of rules, each of the form A → α, where A is a non-terminal and α is a string over N ∪ Σ
S ∈ N is a designated start symbol

CFG - Example
grammar: S → NP VP; NP → N; VP → V NP
lexicon: V → kicks; N → John; N → Bill
N = {S, NP, VP, N, V}
Σ = {kicks, John, Bill}
R = the rules above
S = "S"
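A hedged sketch of the same example using the NLTK toolkit (assuming NLTK is installed); the grammar string simply transcribes the rules and lexicon above:

```python
import nltk

grammar = nltk.CFG.fromstring("""
S  -> NP VP
NP -> N
VP -> V NP
V  -> 'kicks'
N  -> 'John' | 'Bill'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("John kicks Bill".split()):
    tree.pretty_print()
# yields the structure (S (NP (N John)) (VP (V kicks) (NP (N Bill))))
```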

Exercise
Write grammars that generate the following languages, for m, n > 0:
(ab)^m
a^n b^m
a^n b^n
Which of these are Regular? Which of these are Context Free?

(ab)^m for m > 0: S -> a b; S -> a b S

(ab)^m for m > 0
S -> a b; S -> a b S
A second grammar: S -> a X; X -> b Y; Y -> a b; Y -> S

a^n b^m
Grammar (context-free): S -> A B; A -> a; A -> a A; B -> b; B -> b B
Grammar (right-linear, with AB as a single non-terminal): S -> a AB; AB -> a AB; AB -> B; B -> b; B -> b B
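A small, illustrative Python sketch (my own encoding, not from the slides) that randomly derives strings from the first grammar, confirming it yields strings of the form a^n b^m:

```python
import random

# CFG for a^n b^m (n, m >= 1) from the slide: S -> A B, A -> a | a A, B -> b | b B
GRAMMAR = {
    "S": [["A", "B"]],
    "A": [["a"], ["a", "A"]],
    "B": [["b"], ["b", "B"]],
}

def expand(symbol):
    """Randomly rewrite a symbol until only terminals remain."""
    if symbol not in GRAMMAR:          # terminal symbol
        return [symbol]
    rhs = random.choice(GRAMMAR[symbol])
    return [t for part in rhs for t in expand(part)]

for _ in range(3):
    print("".join(expand("S")))        # e.g. "aab", "abbb", "ab"
```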

Grammar Defines a Structure
grammar: S → NP VP; NP → N; VP → V NP
lexicon: V → kicks; N → John; N → Bill
Parse tree: (S (NP (N John)) (VP (V kicks) (NP (N Bill))))

Different Grammar, Different Structure
grammar: S → NP NP; NP → N V; NP → N
lexicon: V → kicks; N → John; N → Bill
Parse tree: (S (NP (N John) (V kicks)) (NP (N Bill)))

Which Grammar is Best? The structure assigned by the grammar should be appropriate. The structure should – Be understandable – Allow us to make generalisations. – Reflect the underlying meaning of the sentence.

Ambiguity
A grammar is ambiguous if it assigns two or more structures to the same sentence.
NP → NP CONJ NP; NP → N
lexicon: CONJ → and; N → John; N → Bill
The grammar should not generate too many possible structures for the same sentence.
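To see the ambiguity concretely, here is a tentative NLTK sketch (assuming NLTK is available) that parses "John and Bill and John" with the grammar above and finds two structures:

```python
import nltk

grammar = nltk.CFG.fromstring("""
NP   -> NP CONJ NP | N
CONJ -> 'and'
N    -> 'John' | 'Bill'
""")

parser = nltk.ChartParser(grammar)
parses = list(parser.parse("John and Bill and John".split()))
print(len(parses))   # 2: [John and Bill] and John  vs.  John and [Bill and John]
for t in parses:
    print(t)
```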

Criteria for Evaluating Grammars Does it undergenerate? Does it overgenerate? Does it assign appropriate structures to sentences it generates? Is it simple to understand? How many rules are there? Does it contain just a few generalisations or is it full of special cases? How ambiguous is it? How many structures does it assign for a given sentence?

Probabilistic Context Free Grammar (PCFG)
A PCFG is a probabilistic version of a CFG in which each production has a probability. String generation is now probabilistic: production probabilities are used to non-deterministically select a production for rewriting a given non-terminal.
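A rough sketch of probabilistic string generation from a PCFG; the toy grammar and probabilities below are invented for illustration and are not the course grammar:

```python
import random

# Toy PCFG: each non-terminal maps to (right-hand side, probability) pairs;
# the probabilities for one left-hand side sum to 1.
PCFG = {
    "S":   [(("NP", "VP"), 1.0)],
    "NP":  [(("Det", "N"), 0.7), (("N",), 0.3)],
    "VP":  [(("V", "NP"), 0.6), (("V",), 0.4)],
    "Det": [(("the",), 1.0)],
    "N":   [(("dog",), 0.5), (("cat",), 0.5)],
    "V":   [(("barks",), 0.5), (("sees",), 0.5)],
}

def generate(symbol="S"):
    """Rewrite non-terminals by sampling a rule according to its probability."""
    if symbol not in PCFG:                         # terminal symbol
        return [symbol]
    rhs_options = [rhs for rhs, _p in PCFG[symbol]]
    weights = [p for _rhs, p in PCFG[symbol]]
    rhs = random.choices(rhs_options, weights=weights)[0]
    return [w for part in rhs for w in generate(part)]

print(" ".join(generate()))   # e.g. "the dog sees the cat"
```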

Characteristics of PCFGs
In a PCFG, the probability P(A → β) expresses the likelihood that the non-terminal A will expand as β.
– e.g. the likelihood that S → NP VP (as opposed to S → VP, or S → NP VP PP, or …)
It can be interpreted as a conditional probability: the probability of the expansion, given the LHS non-terminal: P(A → β) = P(A → β | A)
Therefore, for any non-terminal A, the probabilities of all rules of the form A → β must sum to 1.
– If this is the case, we say the PCFG is consistent.
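A small helper (illustrative only) that checks the sum-to-1 property for each left-hand side, using a few rules from the example grammar that appears later in these slides:

```python
from collections import defaultdict

def rule_probs_sum_to_one(rules, tol=1e-9):
    """rules: list of (lhs, rhs, prob) triples.
    Returns, for each LHS, whether its rule probabilities sum to 1."""
    totals = defaultdict(float)
    for lhs, _rhs, prob in rules:
        totals[lhs] += prob
    return {lhs: abs(total - 1.0) < tol for lhs, total in totals.items()}

rules = [
    ("NP", ("DT", "NN"), 0.5), ("NP", ("NNS",), 0.3), ("NP", ("NP", "PP"), 0.2),
    ("VP", ("VP", "PP"), 0.6), ("VP", ("VBD", "NP"), 0.4),
]
print(rule_probs_sum_to_one(rules))   # {'NP': True, 'VP': True}
```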

Simple PCFG for English
Grammar (each rule carries a probability):
S → NP VP
S → Aux NP VP
S → VP
NP → Pronoun
NP → Proper-Noun
NP → Det Nominal
Nominal → Noun
Nominal → Nominal Noun
Nominal → Nominal PP
VP → Verb
VP → Verb NP
VP → VP PP
PP → Prep NP
Lexicon:
Det → the | a | that | this
Noun → book | flight | meal | money
Verb → book | include | prefer
Pronoun → I | he | she | me
Proper-Noun → Houston | NWA
Aux → does 1.0
Prep → from | to | on | near | through

Parse tree and Sentence Probability Assume productions for each node are chosen independently. Probability of a parse tree (derivation) is the product of the probabilities of its productions. Resolve ambiguity by picking most probable parse tree. Probability of a sentence is the sum of the probabilities of all of its derivations.

Probability of a tree vs. a sentence
– simply the multiplication of the probabilities of every rule (node) that gives rise to t (i.e. the derivation of t): P(t, s) = product of P(r) over all rules r used in the derivation of t
– this is both the joint probability of t and s, and the probability of t alone. Why?

P(t, s) = P(t), because P(t, s) = P(t) · P(s|t) and P(s|t) must be 1, since the tree t is a parse of exactly the words of s.

Picking the best parse in a PCFG
A sentence will usually have several parses
– we usually want them ranked, or only want the n-best parses
– we need to focus on P(t | s, G), the probability of a parse tree given our sentence and our grammar
– definition of the best parse for s: the tree t that maximises P(t | s, G) over all parses of s; since P(s) is fixed for a given sentence, this is simply the parse with the highest P(t).

Probability of a sentence
Simply the sum of the probabilities of all parses of that sentence: P(s) = sum of P(t) over all those trees t which "yield" s
– since s is only a sentence if it is recognised by G, i.e. if there is some t for s under G

Example PCFG Rules & Probabilities
S → NP VP 1.0
NP → DT NN 0.5
NP → NNS 0.3
NP → NP PP 0.2
PP → P NP 1.0
VP → VP PP 0.6
VP → VBD NP 0.4
DT → the 1.0
NN → gunman 0.5
NN → building 0.5
VBD → sprayed 1.0
NNS → bullets 1.0
P → with 1.0

Example Parse t1
The gunman sprayed the building with bullets.
(S 1.0 (NP 0.5 (DT 1.0 The) (NN 0.5 gunman)) (VP 0.6 (VP 0.4 (VBD 1.0 sprayed) (NP 0.5 (DT 1.0 the) (NN 0.5 building))) (PP 1.0 (P 1.0 with) (NP 0.3 (NNS 1.0 bullets)))))
P(t1) = 1.0 × 0.5 × 1.0 × 0.5 × 0.6 × 0.4 × 1.0 × 0.5 × 1.0 × 0.5 × 1.0 × 1.0 × 0.3 × 1.0 = 0.0045

Another Parse t2
The gunman sprayed the building with bullets.
(S 1.0 (NP 0.5 (DT 1.0 The) (NN 0.5 gunman)) (VP 0.4 (VBD 1.0 sprayed) (NP 0.2 (NP 0.5 (DT 1.0 the) (NN 0.5 building)) (PP 1.0 (P 1.0 with) (NP 0.3 (NNS 1.0 bullets))))))
P(t2) = 1.0 × 0.5 × 1.0 × 0.5 × 0.4 × 1.0 × 0.2 × 0.5 × 1.0 × 0.5 × 1.0 × 1.0 × 0.3 × 1.0 = 0.0015
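The two parse probabilities can be reproduced with a short script; the nested-tuple tree encoding below is my own, and the rule probabilities are taken from the example grammar above. The final line also sums the two parses, which is the sentence probability under this grammar:

```python
# PCFG rule probabilities from the example grammar: (LHS, RHS) -> prob
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("DT", "NN")): 0.5, ("NP", ("NNS",)): 0.3, ("NP", ("NP", "PP")): 0.2,
    ("PP", ("P", "NP")): 1.0,
    ("VP", ("VP", "PP")): 0.6, ("VP", ("VBD", "NP")): 0.4,
    ("DT", ("the",)): 1.0, ("NN", ("gunman",)): 0.5, ("NN", ("building",)): 0.5,
    ("VBD", ("sprayed",)): 1.0, ("NNS", ("bullets",)): 1.0, ("P", ("with",)): 1.0,
}

def tree_prob(tree):
    """tree = (label, child, child, ...); a leaf child is a plain (lowercased) word."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = RULE_PROB[(label, rhs)]
    for c in children:
        if not isinstance(c, str):
            p *= tree_prob(c)
    return p

# t1: PP attached to the VP ("sprayed ... with bullets")
t1 = ("S",
      ("NP", ("DT", "the"), ("NN", "gunman")),
      ("VP",
       ("VP", ("VBD", "sprayed"), ("NP", ("DT", "the"), ("NN", "building"))),
       ("PP", ("P", "with"), ("NP", ("NNS", "bullets")))))

# t2: PP attached to the object NP ("the building with bullets")
t2 = ("S",
      ("NP", ("DT", "the"), ("NN", "gunman")),
      ("VP", ("VBD", "sprayed"),
       ("NP",
        ("NP", ("DT", "the"), ("NN", "building")),
        ("PP", ("P", "with"), ("NP", ("NNS", "bullets"))))))

print(tree_prob(t1))                  # ≈ 0.0045
print(tree_prob(t2))                  # ≈ 0.0015
print(tree_prob(t1) + tree_prob(t2))  # ≈ 0.006 = P(sentence), the sum over its parses
```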

Some Features of PCFGs A PCFG gives some idea of the plausibility of different parses. However, the probabilities are based on structural factors and not lexical ones. PCFGs are good for grammar induction. PCFGs are robust. PCFGs give a probabilistic language model for English. The predictive power of a PCFG (measured by entropy) tends to be greater than for an HMM. PCFGs are not good models alone, but they can be combined with a tri-gram model. PCFGs have certain biases which may not be appropriate.

Restrictions
For PCFG parsing we only consider Chomsky Normal Form (CNF) grammars: Context-Free Grammars in which
– every rule LHS is a single non-terminal
– every rule RHS consists of either a single terminal or two non-terminals.
Examples: A → B C; NP → Nominal PP; A → a; Noun → man
But not: NP → the Nominal; S → VP

Converting a CFG to CNF
1. Rules that mix terminals and non-terminals on the RHS
– E.g. NP → the Nominal
– Solution: introduce a dummy non-terminal to cover the original terminal, e.g. Det → the, and re-write the original rule:
– NP → Det Nominal
– Det → the

Converting a CFG to CNF
2. Rules with a single non-terminal on the RHS (called unit productions), such as NP → Nominal
– Solution: find all rules that have the form Nominal → …
– Nominal → Noun PP
– Nominal → Det Noun
Re-write the above rule several times to eliminate the intermediate non-terminal:
– NP → Noun PP
– NP → Det Noun
– Note that this makes our grammar "flatter"

Converting a CFG to CNF
3. Rules which have more than two items on the RHS
– E.g. NP → Det Noun PP
– Solution: introduce new non-terminals to spread the sequence on the RHS over more than one rule:
Nominal → Noun PP
NP → Det Nominal
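A tentative sketch of step 3 (binarising long right-hand sides); the generated non-terminal names like NP|<Noun-PP> are an arbitrary convention of this sketch, not something prescribed by the slides:

```python
def binarize(rules):
    """Split rules with more than two RHS symbols into a chain of binary rules.
    E.g. NP -> Det Noun PP becomes NP -> Det NP|<Noun-PP>
    and NP|<Noun-PP> -> Noun PP."""
    out = []
    for lhs, rhs in rules:
        while len(rhs) > 2:
            new_sym = f"{lhs}|<{'-'.join(rhs[1:])}>"
            out.append((lhs, (rhs[0], new_sym)))   # peel off the first symbol
            lhs, rhs = new_sym, rhs[1:]            # continue with the remainder
        out.append((lhs, tuple(rhs)))
    return out

print(binarize([("NP", ("Det", "Noun", "PP"))]))
# [('NP', ('Det', 'NP|<Noun-PP>')), ('NP|<Noun-PP>', ('Noun', 'PP'))]
```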

The outcome
If we parse a sentence with a CNF grammar, we know that:
– Every phrase-level non-terminal (above the part-of-speech level) will have exactly 2 daughters: NP → Det N
– Every part-of-speech-level non-terminal will have exactly 1 daughter, and that daughter is a terminal: N → lady

Problems with Probabilistic CFG Models
Main problem with the Probabilistic CFG model: it does not take contextual effects into account.
Example: pronouns are much more likely to appear in the subject position of a sentence than in object position. But in a PCFG, the rule NP → Pronoun has only one probability.
One simple possible extension: make probabilities dependent on the first word of the constituent. Instead of P(C → r_i | C), use P(C → r_i | C, w), where w is the first word in C.
Example: the rule VP → V NP PP is used 93% of the time with the verb put, but only 10% of the time for like.
This requires estimating a much larger set of probabilities, and can significantly improve disambiguation performance.
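An illustrative sketch of how such lexically conditioned probabilities could be estimated by relative frequency; the observation counts below are toy numbers chosen only to reproduce the 93%/10% figures quoted above:

```python
from collections import Counter

def lexicalized_rule_probs(observations):
    """observations: (lhs, rhs, word) triples, e.g. collected from a treebank,
    where word is the first word (or verb) of the constituent.
    Returns relative-frequency estimates of P(lhs -> rhs | lhs, word)."""
    rule_counts = Counter((lhs, rhs, word) for lhs, rhs, word in observations)
    lhs_word_counts = Counter((lhs, word) for lhs, _rhs, word in observations)
    return {
        (lhs, rhs, word): count / lhs_word_counts[(lhs, word)]
        for (lhs, rhs, word), count in rule_counts.items()
    }

obs = ([("VP", ("V", "NP", "PP"), "put")] * 93 + [("VP", ("V", "NP"), "put")] * 7
       + [("VP", ("V", "NP", "PP"), "like")] * 10 + [("VP", ("V", "NP"), "like")] * 90)
probs = lexicalized_rule_probs(obs)
print(probs[("VP", ("V", "NP", "PP"), "put")])    # 0.93
print(probs[("VP", ("V", "NP", "PP"), "like")])   # 0.10
```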

Probabilistic Lexicalized CFGs A solution to some of the problems with Probabilistic CFGs is to use Probabilistic Lexicalized CFGs. Use the probabilities of particular words in the computation of the probabilities in the derivation

Lexicalised PCFGs
Attempt to weaken the lexical independence assumption. Most common technique:
– mark each phrasal head (N, V, etc.) with the lexical material
– this is based on the idea that the most crucial lexical dependencies are between head and dependent
– e.g. Charniak 1997, Collins 1999

Lexicalised PCFGs: Matt walks
Makes probabilities partly dependent on lexical content: P(VP → VBD | VP) becomes P(VP → VBD | VP, h(VP) = walk).
NB: normally, we can't assume that all heads of a phrase of category C are equally probable.
Head-annotated tree: (S(walks) (NP(Matt) (NNP(Matt) Matt)) (VP(walk) (VBD(walk) walks)))
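A rough sketch of head annotation on the "Matt walks" tree; the head-finding table is a drastic simplification of real head rules (e.g. Collins 1999), and it annotates with the surface word "walks" rather than the lemma "walk" used on the slide:

```python
def annotate_heads(tree, head_child):
    """tree = (label, children...); leaves are plain strings.
    head_child maps a phrase label to the index of its head daughter."""
    label, *children = tree
    new_children = [c if isinstance(c, str) else annotate_heads(c, head_child)
                    for c in children]
    if len(new_children) == 1 and isinstance(new_children[0], str):
        head = new_children[0]                        # preterminal: head = its word
    else:
        head_lbl = new_children[head_child.get(label, 0)][0]
        head = head_lbl.split("(", 1)[1].rstrip(")")  # reuse the head child's word
    return (f"{label}({head})", *new_children)

HEADS = {"S": 1, "NP": 0, "VP": 0}   # e.g. the VP is the head daughter of S
t = ("S", ("NP", ("NNP", "Matt")), ("VP", ("VBD", "walks")))
print(annotate_heads(t, HEADS))
# ('S(walks)', ('NP(Matt)', ('NNP(Matt)', 'Matt')), ('VP(walks)', ('VBD(walks)', 'walks')))
```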

Example

Great! See you upstairs!