
1 Statistical methods in NLP Course 5 Diana Trandabăț 2015-2016

2 The sentence as a string of words E.g. I saw the lady with the binoculars = string a b c d e c f

3 The relations of parts of a string to each other may be different: I saw the lady with the binoculars is structurally ambiguous. Who has the binoculars?

4 [I] saw the lady [with the binoculars] = [a] b c d [e c f] I saw [the lady with the binoculars] = a b [c d e c f]

5 How can we represent the difference? By assigning them different structures. We can represent structures with 'trees', e.g. a tree for "I read the book".

6 a. I saw the lady with the binoculars — PP attached inside the object NP: [S [NP I] [VP [V saw] [NP [NP the lady] [PP with the binoculars]]]] = I saw [the lady with the binoculars]

7 b. I saw the lady with the binoculars — PP attached to the VP: [S [NP I] [VP [VP saw the lady] [PP with the binoculars]]] = I [saw the lady] with the binoculars

8 birds fly — tree: [S [NP [N birds]] [VP [V fly]]]. Syntactic rules: S → NP VP, NP → N, VP → V

9 Tree: [S [NP birds] [VP fly]], with birds = a and fly = b; ab = string

10 Tree: [S [A a] [B b]]. Rules: S → A B, A → a, B → b

11 Rules Assumption: natural language grammars are rule-based systems. What kind of grammars describe natural language phenomena? What are the formal properties of grammatical rules?

12 The Chomsky Hierarchy

13 The Chomsky Hierarchy. Chomsky, N. (1957) Syntactic Structures. The Hague: Mouton. Chomsky, N. and G.A. Miller (1958) Finite state languages. Information and Control 1, 99-112. Chomsky, N. (1959) On certain formal properties of grammars. Information and Control 2, 137-167.

14 SYNTAX (phrase/sentence formation) SENTENCE: The boy kissed the girl. SUBJECT + PREDICATE; NOUN PHRASE + VERB PHRASE; ART + NOUN, VERB + NOUN PHRASE. Rules: S → NP VP, VP → V NP, NP → ART N

15 Chomsky Hierarchy 0. Type 0 (recursively enumerable) languages — the only restriction on rules: the left-hand side cannot be the empty string (*Ø → ...). 1. Context-Sensitive languages — Context-Sensitive (CS) rules. 2. Context-Free languages — Context-Free (CF) rules. 3. Regular languages — regular (right- or left-linear) rules. 0 ⊃ 1 ⊃ 2 ⊃ 3, where a ⊃ b means a properly includes b (a is a superset of b), i.e. b is a proper subset of a, or b is in a.

16 Superset/subset relation (Venn diagram: S1 drawn inside S2): S1 is a subset of S2; S2 is a superset of S1.

17 Generative power 0. Type 0 (recursively enumerable) languages — the most powerful system... 3. Type 3 (regular languages) — the least powerful.

18 Rule Type – 3 Name: Regular. Example: Finite State Automata (Markov-process Grammar). Rule type: a) right-linear or b) left-linear: A → x B or A → B x or A → x, with A, B = auxiliary nodes and x = terminal node. Generates: a^m b^n with m, n ≥ 1. Cannot guarantee that there are as many a's as b's; no embedding.

19 Example of regular grammar S → the A; A → cat B | mouse B | duck B; B → bites C | sees C | eats C; C → the D; D → boy | girl | monkey. Generates: the cat bites the boy, the mouse eats the monkey, the duck sees the girl.
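To make the finite-state character of this grammar concrete, here is a minimal Python sketch (the dict layout and function name are my own, not from the slides): each non-terminal maps to (emitted word, next non-terminal) choices, exactly like transitions in a finite state automaton.

import random

# The regular grammar above, as a transition table: each non-terminal
# maps to (word, next non-terminal) pairs; None means the derivation stops.
RULES = {
    "S": [("the", "A")],
    "A": [("cat", "B"), ("mouse", "B"), ("duck", "B")],
    "B": [("bites", "C"), ("sees", "C"), ("eats", "C")],
    "C": [("the", "D")],
    "D": [("boy", None), ("girl", None), ("monkey", None)],
}

def generate(state="S"):
    """Follow the right-linear rules from S until no non-terminal is left."""
    words = []
    while state is not None:
        word, state = random.choice(RULES[state])
        words.append(word)
    return " ".join(words)

for _ in range(3):
    print(generate())   # e.g. "the duck sees the girl"

Because every rule emits one word and moves to at most one next state, generation needs no stack — this is what "Markov-process grammar" means on the previous slide.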

20 More regular grammars Grammar 1 (left-linear): A → a, A → B a, B → A b. Grammar 2 (right-linear): A → a B, B → b A. Grammar 3 (right-linear): S → a A, S → b B, A → a S, B → b b S, S → ε. Grammar 4 (left-linear): A → A a, A → B a, B → b, B → A b, A → a.

21 Example of non-regular grammars Grammar 5: S → A B, S → b B, A → a S, B → b b S, S → ε (S → A B has two non-terminals on the RHS). Grammar 6: A → a, A → B a, B → b, B → b A (mixes left-linear and right-linear rules).

22 NP as a transition network (states NP, NP1, NP2; arcs labelled article, adjective, noun): NP → article NP1, NP1 → adjective NP1, NP1 → noun NP2

23 A parse tree: S = root node; NP, VP, NP = non-terminal nodes; n, v, det, n = terminal nodes.

24 Rule Type – 2 Name: Context Free. Example: Phrase Structure Grammars / Push-Down Automata. Rule type: A → α, with A = auxiliary node, α = any number of terminal or auxiliary nodes. Recursiveness (centre embedding) allowed: A → β A γ

25 Rule Type – 1 The following language cannot be generated by a CF grammar (by the pumping lemma): a^n b^m c^n d^m. Swiss German: a string of dative nouns (e.g. aa), followed by a string of accusative nouns (e.g. bbb), followed by a string of dative-taking verbs (cc), followed by a string of accusative-taking verbs (ddd) = aabbbccddd, an instance of a^n b^m c^n d^m (n = 2, m = 3).

26 More on Context Free Grammars (CFGs) Sets of rules expressing how symbols of the language fit together, e.g. S -> NP VP, NP -> Det N, Det -> the, N -> dog

27 What Does Context Free Mean? The LHS of a rule is just one symbol. Can have: NP -> Det N. Cannot have: X NP Y -> X Det N Y.

28 Grammar Symbols Non Terminal Symbols Terminal Symbols – Words – Preterminals

29 Non Terminal Symbols Symbols which have definitions Symbols which appear on the LHS of rules S -> NP VP NP -> Det N Det -> the N -> dog

30 Non Terminal Symbols The same non-terminal can have several definitions S -> NP VP NP -> Det N NP -> N Det -> the N -> dog

31 Terminal Symbols Symbols which appear in final string Correspond to words Are not defined by the grammar S -> NP VP NP -> Det N Det -> the N -> dog

32 Parts of Speech (POS) NT symbols which produce terminal symbols are sometimes called pre-terminals: S -> NP VP, NP -> Det N, Det -> the, N -> dog. Sometimes we are interested in the shape of sentences formed from pre-terminals, e.g. Det N V, Aux N V D N.

33 CFG - formal definition A CFG is a tuple (N, Σ, R, S): N is a set of non-terminal symbols; Σ is a set of terminal symbols disjoint from N; R is a set of rules, each of the form A → α, where A is a non-terminal; S is a designated start symbol.

34 CFG - Example grammar: S → NP VP, NP → N, VP → V NP. lexicon: V → kicks, N → John, N → Bill. N = {S, NP, VP, N, V}; Σ = {kicks, John, Bill}; R = the rules and lexicon above; S = "S".
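As an illustration (not part of the original slides), the same grammar can be written with NLTK's CFG class and used to parse "John kicks Bill"; a minimal sketch, assuming NLTK is installed:

import nltk

# The slide's grammar and lexicon in NLTK's CFG notation
# (terminals are quoted; '|' separates alternative right-hand sides).
grammar = nltk.CFG.fromstring("""
    S  -> NP VP
    NP -> N
    VP -> V NP
    V  -> 'kicks'
    N  -> 'John' | 'Bill'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("John kicks Bill".split()):
    print(tree)
    # (S (NP (N John)) (VP (V kicks) (NP (N Bill))))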

35 Exercise Write grammars that generate the following languages, for m, n > 0: (ab)^m; a^n b^m; a^n b^n. Which of these are Regular? Which of these are Context Free?

36 (ab)^m for m > 0: S -> a b, S -> a b S

37 (ab)^m for m > 0: S -> a b, S -> a b S. Alternatively: S -> a X, X -> b Y, Y -> a b, Y -> S

38 a^n b^m: S -> A B, A -> a, A -> a A, B -> b, B -> b B. A right-linear version (AB is a single non-terminal): S -> a AB, AB -> a AB, AB -> B, B -> b, B -> b B

39 Grammar Defines a Structure grammar: S → NP VP, NP → N, VP → V NP; lexicon: V → kicks, N → John, N → Bill. Tree for "John kicks Bill": [S [NP [N John]] [VP [V kicks] [NP [N Bill]]]]

40 Different Grammar, Different Structure grammar: S → NP NP, NP → N V, NP → N; lexicon: V → kicks, N → John, N → Bill. Tree for "John kicks Bill": [S [NP [N John] [V kicks]] [NP [N Bill]]]

41 Which Grammar is Best? The structure assigned by the grammar should be appropriate. The structure should – Be understandable – Allow us to make generalisations. – Reflect the underlying meaning of the sentence.

42 Ambiguity A grammar is ambiguous if it assigns two or more structures to the same sentence, e.g. NP → NP CONJ NP, NP → N; lexicon: CONJ → and, N → John, N → Bill ("John and Bill and John" can be bracketed in two ways). The grammar should not generate too many possible structures for the same sentence.

43 Criteria for Evaluating Grammars Does it undergenerate? Does it overgenerate? Does it assign appropriate structures to sentences it generates? Is it simple to understand? How many rules are there? Does it contain just a few generalisations or is it full of special cases? How ambiguous is it? How many structures does it assign for a given sentence?

44 Probabilistic Context Free Grammar (PCFG) A PCFG is a probabilistic version of a CFG where each production has a probability. String generation is now probabilistic: production probabilities are used to non-deterministically select a production for rewriting a given non-terminal.

45 Characteristics of PCFGs In a PCFG, the probability P(A → β) expresses the likelihood that the non-terminal A will expand as β. – e.g. the likelihood that S → NP VP (as opposed to S → VP, or S → NP VP PP, or...) – can be interpreted as a conditional probability: the probability of the expansion, given the LHS non-terminal: P(A → β) = P(A → β | A). Therefore, for any non-terminal A, the probabilities of every rule of the form A → β must sum to 1. – If this is the case, we say the PCFG is consistent.
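A quick sketch of that per-non-terminal sum-to-1 check in Python (the rule table is a toy fragment of the grammar on the next slide, written as a plain dict):

from collections import defaultdict

# Each (LHS, RHS) pair maps to its probability; for every non-terminal A,
# the probabilities P(A -> beta) must sum to 1.
rules = {
    ("S", ("NP", "VP")): 0.8,
    ("S", ("Aux", "NP", "VP")): 0.1,
    ("S", ("VP",)): 0.1,
    ("NP", ("Pronoun",)): 0.2,
    ("NP", ("Proper-Noun",)): 0.2,
    ("NP", ("Det", "Nominal")): 0.6,
}

totals = defaultdict(float)
for (lhs, rhs), p in rules.items():
    totals[lhs] += p

for lhs, total in totals.items():
    assert abs(total - 1.0) < 1e-9, f"rules for {lhs} sum to {total}"
print("OK: every non-terminal's rule probabilities sum to 1")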

46 Simple PCFG for English
Grammar (the probabilities for each LHS sum to 1.0):
S → NP VP [0.8]   S → Aux NP VP [0.1]   S → VP [0.1]
NP → Pronoun [0.2]   NP → Proper-Noun [0.2]   NP → Det Nominal [0.6]
Nominal → Noun [0.3]   Nominal → Nominal Noun [0.2]   Nominal → Nominal PP [0.5]
VP → Verb [0.2]   VP → Verb NP [0.5]   VP → VP PP [0.3]
PP → Prep NP [1.0]
Lexicon:
Det → the [0.6] | a [0.2] | that [0.1] | this [0.1]
Noun → book [0.1] | flight [0.5] | meal [0.2] | money [0.2]
Verb → book [0.5] | include [0.2] | prefer [0.3]
Pronoun → I [0.5] | he [0.1] | she [0.1] | me [0.3]
Proper-Noun → Houston [0.8] | NWA [0.2]
Aux → does [1.0]
Prep → from [0.25] | to [0.25] | on [0.1] | near [0.2] | through [0.2]
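The same grammar can be written in NLTK's PCFG notation and used to find the most probable parse with the Viterbi algorithm. A sketch, not from the slides, assuming NLTK is installed; Proper-Noun is renamed ProperNoun to keep the symbol name simple:

import nltk

toy_pcfg = nltk.PCFG.fromstring("""
    S -> NP VP [0.8] | Aux NP VP [0.1] | VP [0.1]
    NP -> Pronoun [0.2] | ProperNoun [0.2] | Det Nominal [0.6]
    Nominal -> Noun [0.3] | Nominal Noun [0.2] | Nominal PP [0.5]
    VP -> Verb [0.2] | Verb NP [0.5] | VP PP [0.3]
    PP -> Prep NP [1.0]
    Det -> 'the' [0.6] | 'a' [0.2] | 'that' [0.1] | 'this' [0.1]
    Noun -> 'book' [0.1] | 'flight' [0.5] | 'meal' [0.2] | 'money' [0.2]
    Verb -> 'book' [0.5] | 'include' [0.2] | 'prefer' [0.3]
    Pronoun -> 'I' [0.5] | 'he' [0.1] | 'she' [0.1] | 'me' [0.3]
    ProperNoun -> 'Houston' [0.8] | 'NWA' [0.2]
    Aux -> 'does' [1.0]
    Prep -> 'from' [0.25] | 'to' [0.25] | 'on' [0.1] | 'near' [0.2] | 'through' [0.2]
""")

parser = nltk.ViterbiParser(toy_pcfg)
for tree in parser.parse("book the flight through Houston".split()):
    print(tree)          # the most probable parse
    print(tree.prob())   # and its probability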

47 Parse tree and Sentence Probability Assume productions for each node are chosen independently. Probability of a parse tree (derivation) is the product of the probabilities of its productions. Resolve ambiguity by picking most probable parse tree. Probability of a sentence is the sum of the probabilities of all of its derivations.

48 Probability of a tree vs. a sentence P(t) is simply the product of the probabilities of every rule (node) that gives rise to t (i.e. the derivation of t) – this is both the joint probability of t and s, and the probability of t alone. Why?

49 P(t, s) = P(t) P(s | t) = P(t), because P(s | t) must be 1: the tree t is a parse of all the words of s, so it fully determines the sentence.

50 Picking the best parse in a PCFG A sentence will usually have several parses – we usually want them ranked, or only want the n-best parses – we need to focus on P(t | s, G), the probability of a parse tree given our sentence and our grammar – definition of the best parse for s: t* = argmax_t P(t | s, G)

51 Probability of a sentence Simply the sum of the probabilities of all parses of that sentence: P(s) = Σ_{t : yield(t) = s} P(t) – since s is only a sentence if it is recognised by G, i.e. if there is some t for s under G; the sum ranges over all those trees which "yield" s.
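A sketch of this sum over parses, using NLTK and a small, deliberately ambiguous PCFG built around the running binoculars example (the grammar and its probabilities are invented here for illustration):

import nltk
from nltk.parse.pchart import InsideChartParser

# A toy PCFG under which "I saw the lady with the binoculars" has two
# parses: PP attached to the VP, or PP attached inside the object NP.
pcfg = nltk.PCFG.fromstring("""
    S -> NP VP [1.0]
    NP -> 'I' [0.4] | Det N [0.4] | NP PP [0.2]
    VP -> V NP [0.7] | VP PP [0.3]
    PP -> P NP [1.0]
    Det -> 'the' [1.0]
    N -> 'lady' [0.5] | 'binoculars' [0.5]
    V -> 'saw' [1.0]
    P -> 'with' [1.0]
""")

sent = "I saw the lady with the binoculars".split()
parses = list(InsideChartParser(pcfg).parse(sent))
for t in parses:
    print(t.prob())                              # one probability per tree
print("P(s) =", sum(t.prob() for t in parses))   # sentence probability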

52 Example PCFG Rules & Probabilities
S → NP VP [1.0]
NP → DT NN [0.5]   NP → NNS [0.3]   NP → NP PP [0.2]
PP → P NP [1.0]
VP → VP PP [0.6]   VP → VBD NP [0.4]
DT → the [1.0]   NN → gunman [0.5]   NN → building [0.5]
VBD → sprayed [1.0]   NNS → bullets [1.0]   P → with [1.0]

53 Example Parse t1: The gunman sprayed the building with bullets. PP attached to the VP: [S [NP [DT The] [NN gunman]] [VP [VP [VBD sprayed] [NP [DT the] [NN building]]] [PP [P with] [NP [NNS bullets]]]]]. P(t1) = 1.0 * 0.5 * 1.0 * 0.5 * 0.6 * 0.4 * 1.0 * 0.5 * 1.0 * 0.5 * 1.0 * 1.0 * 0.3 * 1.0 = 0.0045

54 Another Parse t2: The gunman sprayed the building with bullets. PP attached to the object NP: [S [NP [DT The] [NN gunman]] [VP [VBD sprayed] [NP [NP [DT the] [NN building]] [PP [P with] [NP [NNS bullets]]]]]]. P(t2) = 1.0 * 0.5 * 1.0 * 0.5 * 0.4 * 1.0 * 0.2 * 0.5 * 1.0 * 0.5 * 1.0 * 1.0 * 0.3 * 1.0 = 0.0015
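The two products can be checked directly in a few lines of plain Python, with the rule probabilities of each derivation copied from the trees above:

from math import prod

# Rule probabilities used in each derivation, read off the two parses.
t1 = [1.0, 0.5, 1.0, 0.5, 0.6, 0.4, 1.0, 0.5, 1.0, 0.5, 1.0, 1.0, 0.3, 1.0]  # PP attached to VP
t2 = [1.0, 0.5, 1.0, 0.5, 0.4, 1.0, 0.2, 0.5, 1.0, 0.5, 1.0, 1.0, 0.3, 1.0]  # PP attached to NP

print(prod(t1))             # 0.0045 -> the preferred parse
print(prod(t2))             # 0.0015
print(prod(t1) + prod(t2))  # 0.006  -> P(sentence), summing both derivations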

55 Some Features of PCFGs A PCFG gives some idea of the plausibility of different parses. However, the probabilities are based on structural factors, not lexical ones. PCFGs are good for grammar induction. PCFGs are robust. PCFGs give a probabilistic language model for English. The predictive power of a PCFG (measured by entropy) tends to be greater than for an HMM. PCFGs are not good models alone, but they can be combined with a trigram model. PCFGs have certain biases which may not be appropriate.

56 Restrictions We consider only Chomsky Normal Form (CNF) grammars: Context-Free Grammars in which every rule's LHS is a non-terminal, and every rule's RHS consists of either a single terminal or two non-terminals. Examples: A → B C; NP → Nominal PP; A → a; Noun → man. But not: NP → the Nominal; S → VP.

57 Converting a CFG to CNF 1. Rules that mix terminals and non-terminals on the RHS – E.g. NP → the Nominal – Solution: introduce a dummy non-terminal to cover the original terminal, e.g. Det → the, and re-write the original rule: NP → Det Nominal, Det → the

58 Converting a CFG to CNF 2. Rules with a single non-terminal on the RHS (called unit productions), such as NP → Nominal – Solution: find all rules of the form Nominal → ..., e.g. Nominal → Noun PP and Nominal → Det Noun, and re-write the original rule once for each, eliminating the intermediate non-terminal: NP → Noun PP, NP → Det Noun – Note that this makes our grammar "flatter"

59 Converting a CFG to CNF 3. Rules which have more than two items on the RHS – E.g. NP → Det Noun PP – Solution: introduce new non-terminals to spread the sequence on the RHS over more than one rule: Nominal → Noun PP, NP → Det Nominal
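The conversion can be sketched in a few lines of Python. This rough illustration covers steps 1 and 3 only (unit productions, step 2, are left out); rules are written as (LHS, RHS-list) pairs, lowercase strings are assumed to be terminals, and it binarises from the left, while the slide's example groups from the right — either choice yields CNF:

import itertools

def to_cnf(rules):
    """Apply CNF steps 1 and 3 to (lhs, [rhs symbols]) rules."""
    new_rules, fresh_ids = [], itertools.count()
    for lhs, rhs in rules:
        if len(rhs) > 1:
            # Step 1: replace terminals in long RHSs with dummy non-terminals.
            fixed = []
            for sym in rhs:
                if sym.islower():                     # a terminal
                    dummy = sym.upper() + "_T"
                    new_rules.append((dummy, [sym]))  # e.g. THE_T -> the
                    fixed.append(dummy)
                else:
                    fixed.append(sym)
            rhs = fixed
        # Step 3: binarise RHSs longer than two with fresh non-terminals.
        while len(rhs) > 2:
            fresh = f"X{next(fresh_ids)}"
            new_rules.append((fresh, rhs[:2]))
            rhs = [fresh] + rhs[2:]
        new_rules.append((lhs, rhs))
    return new_rules

for rule in to_cnf([("NP", ["the", "Nominal"]), ("NP", ["Det", "Noun", "PP"])]):
    print(rule)   # THE_T->the, NP->THE_T Nominal, X0->Det Noun, NP->X0 PP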

60 The outcome If we parse a sentence with a CNF grammar, we know that: – every phrase-level non-terminal (above the part-of-speech level) will have exactly 2 daughters: NP → Det N – every part-of-speech-level non-terminal will have exactly 1 daughter, and that daughter is a terminal: N → lady

61 Problems with Probabilistic CFG Models Main problem with the Probabilistic CFG model: it does not take contextual effects into account. Example: pronouns are much more likely to appear in the subject position of a sentence than in object position, but in a PCFG the rule NP → Pronoun has only one probability. One simple possible extension: make probabilities dependent on the first word of the constituent. Instead of P(C → β_i | C), use P(C → β_i | C, w), where w is the first word in C. Example: the rule VP → V NP PP is used 93% of the time with the verb put, but only 10% of the time with like. This requires estimating a much larger set of probabilities, but can significantly improve disambiguation performance.
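A sketch of the suggested extension: estimating the word-conditioned rule probabilities by relative frequency from counts. The counts below are hypothetical, chosen only to echo the 93%/10% figures in the example:

from collections import Counter

# Hypothetical treebank counts of (verb, VP expansion) pairs, used to
# estimate P(VP -> beta | VP, verb) by relative frequency.
counts = Counter({
    ("put", "V NP PP"): 93, ("put", "V NP"): 7,
    ("like", "V NP PP"): 10, ("like", "V NP"): 90,
})

def p_expansion(expansion, verb):
    total = sum(n for (v, _), n in counts.items() if v == verb)
    return counts[(verb, expansion)] / total

print(p_expansion("V NP PP", "put"))    # 0.93
print(p_expansion("V NP PP", "like"))   # 0.10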

62 Probabilistic Lexicalized CFGs A solution to some of the problems with Probabilistic CFGs is to use Probabilistic Lexicalized CFGs: use the probabilities of particular words in the computation of the probabilities of the derivation.

63 Lexicalised PCFGs Attempt to weaken the lexical independence assumption. Most common technique: – mark each phrasal head (N, V, etc.) with its lexical material – this is based on the idea that the most crucial lexical dependencies are between head and dependent – e.g. Charniak 1997, Collins 1999

64 Lexicalised PCFGs: Matt walks Makes probabilities partly dependent on lexical content. P(VP → VBD | VP) becomes: P(VP → VBD | VP, h(VP) = walks). NB: normally, we can't assume that all heads of a phrase of category C are equally probable. Tree: [S(walks) [NP(Matt) [NNP(Matt) Matt]] [VP(walks) [VBD(walks) walks]]]
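A sketch of the head-marking step itself: percolating head words up a tree with a toy head table. The HEADS table and function names are invented for illustration, a drastic simplification of the head rules used by Charniak 1997 or Collins 1999:

import nltk

# Toy head-percolation table: for each phrase label, the child label
# that supplies the head word.
HEADS = {"S": "VP", "NP": "NNP", "VP": "VBD"}

def head_of(label):
    """Extract the head word from an annotated label like 'VP(walks)'."""
    return label.split("(", 1)[1].rstrip(")")

def lexicalise(tree):
    """Rename every node from X to X(head word), bottom-up."""
    if isinstance(tree[0], str):              # preterminal: its word is its head
        return nltk.Tree(f"{tree.label()}({tree[0]})", [tree[0]])
    children = [lexicalise(c) for c in tree]
    head = None
    for child in children:                    # find the designated head child
        if child.label().split("(")[0] == HEADS.get(tree.label()):
            head = head_of(child.label())
    if head is None:                          # fallback: first child's head
        head = head_of(children[0].label())
    return nltk.Tree(f"{tree.label()}({head})", children)

print(lexicalise(nltk.Tree.fromstring("(S (NP (NNP Matt)) (VP (VBD walks)))")))
# (S(walks) (NP(Matt) (NNP(Matt) Matt)) (VP(walks) (VBD(walks) walks)))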

65 Example

66 Great! See you upstairs!

