
1 CPSC 503 Computational Linguistics, Lecture 4. Giuseppe Carenini (CPSC 503, Winter 2008)

2 Knowledge-Formalisms Map (including probabilistic formalisms)
Formalisms: State Machines and their prob. versions (Finite State Automata, Finite State Transducers, Markov Models); Rule systems and their prob. versions (e.g., (Prob.) Context-Free Grammars); Logical formalisms (First-Order Logics); AI planners
Levels of linguistic knowledge: Morphology, Syntax, Semantics, Pragmatics (Discourse and Dialogue)

3 Today Sep 17
Dealing with spelling errors
–Noisy channel model
–Bayes rule applied to the noisy channel model (single and multiple spelling errors)
Start n-gram models: Language Models (LMs)

4 Background knowledge: morphological analysis; P(x) (probability distribution); joint P(x,y); conditional P(x|y); Bayes rule; chain rule

5 Spelling: the problem(s)
Non-word, isolated: detection; correction = find the most likely correct word (funn -> funny, fun, ...)
Non-word, in context: find the most likely correct word in this context (–trust funn –a lot of funn)
Real-word, isolated: ?!
Real-word, in context: detection = is it an impossible (or very unlikely) word in this context? (.. a wild big.); correction = find the most likely substitution word in this context

6 Spelling: Data
Error rates: 0.05% - 3% - 38%
80% of misspelled words contain a single error:
–insertion (toy -> tony)
–deletion (tuna -> tua)
–substitution (tone -> tony)
–transposition (length -> legnth)
Types of errors:
–Typographic (more common; the user knows the correct spelling... the -> rhe)
–Cognitive (the user doesn't know... piece -> peace)

7 Noisy Channel
An influential metaphor in language processing is the noisy channel model, a special case of Bayesian classification:
signal -> noisy channel -> noisy signal

8 Bayes and the Noisy Channel: Spelling (non-word, isolated)
Goal: find the most likely word given some observed (misspelled) word.
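The formula referenced here was an image in the original slide; a plausible reconstruction in LaTeX:
\hat{w} = \arg\max_{w \in V} P(w \mid O)
where O is the observed (misspelled) string and V is the vocabulary.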

9 Problem
P(w|O) is hard/impossible to estimate directly (why?), e.g., P(wine|winw) = ?

10 Solution
1. Apply Bayes rule
2. Simplify: the denominator P(O) is the same for every candidate w, leaving a likelihood term and a prior term
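A plausible reconstruction of the slide's (image) equations, in LaTeX:
\hat{w} = \arg\max_{w} \frac{P(O \mid w)\,P(w)}{P(O)} = \arg\max_{w} \underbrace{P(O \mid w)}_{\text{likelihood}}\;\underbrace{P(w)}_{\text{prior}}
since P(O) does not depend on the candidate w.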

11 Estimate of the prior P(w) (easy): relative frequency in a corpus, with smoothing. Always verify...
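The estimates shown on the slide were images; a plausible reconstruction (the smoothed version is the add-0.5 variant used by Kernighan et al.), in LaTeX:
P(w) \approx \frac{C(w)}{N} \qquad \text{or, smoothed,} \qquad P(w) \approx \frac{C(w) + 0.5}{N + 0.5\,V}
where C(w) is the count of w in the corpus, N the number of tokens, and V the vocabulary size.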

12 Estimate of P(O|w) is feasible (Kernighan et al. '90)
For one-error misspellings: estimate the probability of each possible error type (e.g., insert a after c, substitute f with h). P(O|w) equals the probability of the error that generated O from w, e.g., P(cbat|cat) = P(insert b after c).

13 Estimate P(error type) (e.g., substitution: sub[x,y])
From a large corpus, compute confusion matrices: sub[x,y] = # of times y was incorrectly used for x (the slide shows a fragment of the matrix with rows/columns a, b, c, d, ...), and Count(x) = # of occurrences of x in the corpus.
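Reading the slide's own notation (sub[x,y] = # of times y was typed when x was intended, Count(x) = occurrences of x in the corpus), the per-error probability is estimated roughly as, in LaTeX:
P(x \text{ typed as } y) \approx \frac{\mathrm{sub}[x,y]}{\mathrm{Count}(x)}
with analogous ratios for the insertion, deletion and transposition matrices.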

14 Corpus: Example
... On 16 January, he sais [sub[i,y] 3] that because of astronaut safety tha [del[a,t] 4] would be no more space shuttle missions to miantain [tran[a,i] 2] and upgrade the orbiting telescope ...

15 Final Method (single error)
(1) Given O, collect all the w_i that could have generated O by one error. E.g., O = acress => w_1 = actress (t deletion), w_2 = across (substitute o with e), ...
(2) For each w_i compute P(O|w_i) * P(w_i), i.e., the probability of the error generating O from w_i, times the word prior.
(3) Sort and display the top n to the user.
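A minimal Python sketch of this procedure, in the spirit of the Norvig spell corrector linked later in the deck. WORD_COUNTS and error_prob are placeholders: the counts would come from a large corpus and the error probabilities from confusion matrices.

from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
WORD_COUNTS = Counter()                  # placeholder: word counts from a large corpus
N = sum(WORD_COUNTS.values()) or 1
V = len(WORD_COUNTS) or 1

def edits1(word):
    """All strings one insertion, deletion, substitution or transposition away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    substitutions = [L + c + R[1:] for L, R in splits if R for c in ALPHABET]
    insertions = [L + c + R for L, R in splits for c in ALPHABET]
    return set(deletes + transposes + substitutions + insertions)

def error_prob(observed, w):
    """Placeholder for P(observed | w); it would be looked up in the confusion matrices."""
    return 1e-5

def correct(observed, top_n=5):
    """Rank in-vocabulary candidates one edit away from `observed` by P(O|w) * P(w)."""
    scored = []
    for w in edits1(observed):
        if w in WORD_COUNTS:
            prior = (WORD_COUNTS[w] + 0.5) / (N + 0.5 * V)  # smoothed prior (add-0.5, an assumption)
            scored.append((error_prob(observed, w) * prior, w))
    return sorted(scored, reverse=True)[:top_n]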

16 Example: O = acress
"...stellar and versatile acress whose..."
(Candidate table omitted; priors and error probabilities estimated from the 1988 AP newswire corpus, 44 million words.)

17 Evaluation
(Evaluation table comparing the "correct" answers with the system output; categories 0, 1, 2, other.)

18 Corpora: issues to remember
Zero counts in the corpus: just because an event didn't happen in the corpus doesn't mean it won't happen; e.g., cress does not really have zero probability.
Getting a corpus that matches the actual use: e.g., kids don't misspell the same way that adults do.

19 Multiple Spelling Errors
(BEFORE) Given O, collect all the w_i that could have generated O by one error.
(NOW) Given O, collect all the w_i that could have generated O by 1..k errors.
General solution: how do we compute the number and type of errors "between" O and w_i?

20 Minimum Edit Distance
Def. The minimum number of edit operations (insertion, deletion and substitution) needed to transform one string (O) into another (w).
Example: gumbo -> gumb (delete o) -> gum (delete b) -> gam (substitute u by a)

21 Minimum Edit Distance Algorithm
Dynamic programming (a very common technique in NLP). High-level description:
–Fill in a matrix of partial comparisons
–The value of a cell is computed as a "simple" function of the surrounding cells
–Output: not only the number of edit operations but also the sequence of operations

22 Minimum Edit Distance Algorithm: Details
ed[i,j] = min distance between the first i chars of the source and the first j chars of the target.
Costs: del-cost = 1, ins-cost = 1, sub-cost = 2 (0 if the characters are equal).
Update rule: ed[i,j] = MIN( ed[i-1,j] + 1 (deletion), ed[i,j-1] + 1 (insertion), ed[i-1,j-1] + 2 or 0 (substitution or equal) )
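A small Python sketch of this recurrence, using the costs on the slide (a generic textbook implementation, not the course's own code):

def min_edit_distance(source, target, del_cost=1, ins_cost=1, sub_cost=2):
    """ed[i][j] = min distance between the first i chars of source and first j chars of target."""
    n, m = len(source), len(target)
    ed = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        ed[i][0] = ed[i - 1][0] + del_cost            # delete every source char
    for j in range(1, m + 1):
        ed[0][j] = ed[0][j - 1] + ins_cost            # insert every target char
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            ed[i][j] = min(ed[i - 1][j] + del_cost,                        # deletion
                           ed[i][j - 1] + ins_cost,                        # insertion
                           ed[i - 1][j - 1] + (0 if same else sub_cost))   # equal / substitution
    return ed[n][m]

print(min_edit_distance("gumbo", "gam"))   # 4: delete o, delete b, substitute u by a

Keeping back-pointers in the same table also recovers the sequence of operations, i.e., the alignment mentioned on slide 24.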

24 Min edit distance and alignment: see demo.

25 Final Method (multiple errors)
(1) Given O, for each w_i compute me_i = min-edit-distance(w_i, O); if me_i < k, save the corresponding edit operations in EdOp_i.
(2) For each such w_i compute P(O|w_i) * P(w_i), i.e., the probability of the errors generating O from w_i, times the word prior.
(3) Sort and display the top n to the user.

26 Spelling: the problem(s)
Non-word, isolated: detection; correction = find the most likely correct word (funn -> funny, funnel, ...)
Non-word, in context: find the most likely correct word in this context (–trust funn –a lot of funn)
Real-word, isolated: ?!
Real-word, in context: detection = is it an impossible (or very unlikely) word in this context? (.. a wild big.); correction = find the most likely substitution word in this context

27 Real-Word Spelling Errors
Collect a set of common confusion sets: C = {C_1 .. C_n}, e.g., {(their/they're/there), (to/too/two), (weather/whether), (lave/have), ...}
Whenever c' in C_i is encountered:
–Compute the probability of the sentence in which it appears
–Substitute each c in C_i (c ≠ c') and compute the probability of the resulting sentence
–Choose the highest one
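A rough Python sketch of this loop. CONFUSION_SETS and sentence_logprob are placeholders; the sentence probability would come from the language models introduced in the next slides.

CONFUSION_SETS = [
    {"their", "they're", "there"},
    {"to", "too", "two"},
    {"weather", "whether"},
    {"lave", "have"},
]

def sentence_logprob(words):
    """Placeholder: log P(words) under some language model (e.g., an n-gram LM)."""
    return 0.0

def correct_real_words(words):
    """Whenever a word belongs to a confusion set, keep the variant that makes
    the whole sentence most probable."""
    words = list(words)
    for i, w in enumerate(words):
        for cset in CONFUSION_SETS:
            if w.lower() in cset:
                scored = [(sentence_logprob(words[:i] + [alt] + words[i + 1:]), alt)
                          for alt in cset]
                words[i] = max(scored)[1]   # highest sentence probability wins
    return words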

28 Want to play with spelling correction? A minimal noisy channel model implementation (Python): http://www.norvig.com/spell-correct.html
By the way, Peter Norvig is Director of Research at Google Inc.

29 Key Transition
Up to this point we've mostly been discussing words in isolation. Now we're switching to sequences of words, and we're going to worry about assigning probabilities to sequences of words.

30 Knowledge-Formalisms Map (including probabilistic formalisms)
Formalisms: State Machines and their prob. versions (Finite State Automata, Finite State Transducers, Markov Models); Rule systems and their prob. versions (e.g., (Prob.) Context-Free Grammars); Logical formalisms (First-Order Logics); AI planners
Levels of linguistic knowledge: Morphology, Syntax, Semantics, Pragmatics (Discourse and Dialogue)

31 Only Spelling?
A. Assign a probability to a sentence: part-of-speech tagging, word-sense disambiguation, probabilistic parsing
B. Predict the next word: speech recognition, handwriting recognition, augmentative communication for the disabled
Both tasks need probabilities over word sequences, which are impossible to estimate directly.

32 Decompose: apply the chain rule
Chain rule, applied to a word sequence from position 1 to n:
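The formula on this slide was an image; the standard chain rule it refers to, in LaTeX:
P(w_1^n) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1^2)\cdots P(w_n \mid w_1^{n-1}) = \prod_{k=1}^{n} P(w_k \mid w_1^{k-1})
where w_1^k denotes the sequence w_1 ... w_k.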

33 Example
Sequence "The big red dog barks":
P(The big red dog barks) = P(The) * P(big|the) * P(red|the big) * P(dog|the big red) * P(barks|the big red dog)
Note: P(The) is better expressed as P(The|<s>), i.e., conditioned on the beginning-of-sentence marker.

34 Not a satisfying solution
Even for small n (e.g., 6) we would need a far too large corpus to estimate the full conditional probabilities.
Markov assumption: the entire prefix history isn't necessary (unigram, bigram, trigram approximations).
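The unigram/bigram/trigram approximations referred to here (the slide's formulas were images), in LaTeX:
\text{unigram: } P(w_k \mid w_1^{k-1}) \approx P(w_k) \qquad
\text{bigram: } P(w_k \mid w_1^{k-1}) \approx P(w_k \mid w_{k-1}) \qquad
\text{trigram: } P(w_k \mid w_1^{k-1}) \approx P(w_k \mid w_{k-2}\,w_{k-1})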

35 Prob of a sentence: N-Grams (unigram, bigram, trigram)
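The corresponding sentence probabilities (again reconstructing formulas that were images), in LaTeX:
P(w_1^n) \approx \prod_{k=1}^{n} P(w_k) \quad \text{(unigram)} \qquad
P(w_1^n) \approx \prod_{k=1}^{n} P(w_k \mid w_{k-1}) \quad \text{(bigram)} \qquad
P(w_1^n) \approx \prod_{k=1}^{n} P(w_k \mid w_{k-2}\,w_{k-1}) \quad \text{(trigram)}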

36 Bigram
"The big red dog barks":
P(The big red dog barks) = P(The|<s>) * P(big|the) * P(red|big) * P(dog|red) * P(barks|dog)
Trigram?

37 Estimates for N-Grams: bigram case, and the general case
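The maximum-likelihood estimates the slide refers to (formulas were images), in LaTeX:
P(w_n \mid w_{n-1}) = \frac{C(w_{n-1}\,w_n)}{C(w_{n-1})} \qquad \text{and, in general,} \qquad
P(w_n \mid w_{n-N+1}^{\,n-1}) = \frac{C(w_{n-N+1}^{\,n-1}\,w_n)}{C(w_{n-N+1}^{\,n-1})}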

38 Next Time
Finish N-Grams (Chp. 4)
Model Evaluation (Sec. 4.4)
No smoothing (4.5-4.7)
Start Hidden Markov Models
Assignment 1 is due

