2/29/2016CPSC503 Winter 20101 CPSC 503 Computational Linguistics Lecture 5 Giuseppe Carenini.

Slides:



Advertisements
Similar presentations
1 CS 388: Natural Language Processing: N-Gram Language Models Raymond J. Mooney University of Texas at Austin.
Advertisements

N-gram model limitations Important question was asked in class: what do we do about N-grams which were not in our training corpus? Answer given: we distribute.
Albert Gatt Corpora and Statistical Methods – Lecture 7.
SI485i : NLP Set 4 Smoothing Language Models Fall 2012 : Chambers.
Natural Language and Speech Processing Creation of computational models of the understanding and the generation of natural language. Different fields coming.
6/9/2015CPSC503 Winter CPSC 503 Computational Linguistics Lecture 11 Giuseppe Carenini.
6/9/2015CPSC503 Winter CPSC 503 Computational Linguistics Lecture 4 Giuseppe Carenini.
6/9/2015CPSC503 Winter CPSC 503 Computational Linguistics Lecture 4 Giuseppe Carenini.
CS4705 Natural Language Processing.  Regular Expressions  Finite State Automata ◦ Determinism v. non-determinism ◦ (Weighted) Finite State Transducers.
September BASIC TECHNIQUES IN STATISTICAL NLP Word prediction n-grams smoothing.
Midterm Review CS4705 Natural Language Processing.
Fall BASIC TECHNIQUES IN STATISTICAL NLP Word prediction n-grams smoothing.
N-gram model limitations Q: What do we do about N-grams which were not in our training corpus? A: We distribute some probability mass from seen N-grams.
1 Smoothing LING 570 Fei Xia Week 5: 10/24/07 TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AAA A A AA A A A.
1 Hidden Markov Model Instructor : Saeed Shiry  CHAPTER 13 ETHEM ALPAYDIN © The MIT Press, 2004.
CS 4705 Lecture 15 Corpus Linguistics III. Training and Testing Probabilities come from a training corpus, which is used to design the model. –overly.
CPSC 322, Lecture 31Slide 1 Probability and Time: Markov Models Computer Science cpsc322, Lecture 31 (Textbook Chpt 6.5) March, 25, 2009.
LING 438/538 Computational Linguistics Sandiway Fong Lecture 18: 10/26.
(Some issues in) Text Ranking. Recall General Framework Crawl – Use XML structure – Follow links to get new pages Retrieve relevant documents – Today.
CS 4705 Lecture 14 Corpus Linguistics II. Relating Conditionals and Priors P(A | B) = P(A ^ B) / P(B) –Or, P(A ^ B) = P(A | B) P(B) Bayes Theorem lets.
SI485i : NLP Set 3 Language Models Fall 2012 : Chambers.
Lecture 1, 7/21/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2005 Lecture 1 21 July 2005.
1 Advanced Smoothing, Evaluation of Language Models.
8/27/2015CPSC503 Winter CPSC 503 Computational Linguistics Lecture 5 Giuseppe Carenini.
1 LIN6932 Spring 2007 LIN6932: Topics in Computational Linguistics Hana Filip Lecture 5: N-grams.
BİL711 Natural Language Processing1 Statistical Language Processing In the solution of some problems in the natural language processing, statistical techniques.
9/8/20151 Natural Language Processing Lecture Notes 1.
Text Models. Why? To “understand” text To assist in text search & ranking For autocompletion Part of Speech Tagging.
6. N-GRAMs 부산대학교 인공지능연구실 최성자. 2 Word prediction “I’d like to make a collect …” Call, telephone, or person-to-person -Spelling error detection -Augmentative.
Statistical NLP: Lecture 8 Statistical Inference: n-gram Models over Sparse Data (Ch 6)
10/12/2015CPSC503 Winter CPSC 503 Computational Linguistics Lecture 10 Giuseppe Carenini.
Chapter 6: N-GRAMS Heshaam Faili University of Tehran.
10/24/2015CPSC503 Winter CPSC 503 Computational Linguistics Lecture 6 Giuseppe Carenini.
CS774. Markov Random Field : Theory and Application Lecture 19 Kyomin Jung KAIST Nov
10/30/2015CPSC503 Winter CPSC 503 Computational Linguistics Lecture 7 Giuseppe Carenini.
Tokenization & POS-Tagging
Lecture 4 Ngrams Smoothing
N-gram Models CMSC Artificial Intelligence February 24, 2005.
Yuya Akita , Tatsuya Kawahara
Ngram models and the Sparcity problem. The task Find a probability distribution for the current word in a text (utterance, etc.), given what the last.
CPSC 503 Computational Linguistics
12/6/2015CPSC503 Winter CPSC 503 Computational Linguistics Lecture 5 Giuseppe Carenini.
CSE467/567 Computational Linguistics Carl Alphonce Computer Science & Engineering University at Buffalo.
12/7/2015CPSC503 Winter CPSC 503 Computational Linguistics Lecture 4 Giuseppe Carenini.
Estimating N-gram Probabilities Language Modeling.
CPSC 422, Lecture 27Slide 1 Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 27 Nov, 16, 2015.
Natural Language Processing Statistical Inference: n-grams
N-Gram Model Formulas Word sequences Chain rule of probability Bigram approximation N-gram approximation.
Probabilistic Pronunciation + N-gram Models CMSC Natural Language Processing April 15, 2003.
6/18/2016CPSC503 Winter CPSC 503 Computational Linguistics Lecture 6 Giuseppe Carenini.
Language Modeling Part II: Smoothing Techniques Niranjan Balasubramanian Slide Credits: Chris Manning, Dan Jurafsky, Mausam.
N-Grams Chapter 4 Part 2.
CSCI 5832 Natural Language Processing
CPSC 503 Computational Linguistics
Probability and Time: Markov Models
Probability and Time: Markov Models
CPSC 503 Computational Linguistics
CSCI 5832 Natural Language Processing
Probability and Time: Markov Models
CSCI 5832 Natural Language Processing
CSCE 771 Natural Language Processing
Lecture 10: Speech Recognition (II) October 28, 2004 Dan Jurafsky
CPSC 503 Computational Linguistics
CPSC 503 Computational Linguistics
Probability and Time: Markov Models
CPSC 503 Computational Linguistics
CPSC 503 Computational Linguistics
CPSC 503 Computational Linguistics
CPSC 503 Computational Linguistics
Presentation transcript:

2/29/2016CPSC503 Winter CPSC 503 Computational Linguistics Lecture 5 Giuseppe Carenini

2/29/2016CPSC503 Winter Today 23/9 Min Edit Distance Language models (n-grams)

2/29/2016CPSC503 Winter Minimum Edit Distance Def. Minimum number of edit operations (insertion, deletion and substitution) needed to transform one string into another. gumbo gumb gum gam delete o delete b substitute u by a

2/29/2016CPSC503 Winter Minimum Edit Distance Algorithm Dynamic programming (very common technique in NLP) High level description: –Fills in a matrix of partial comparisons –Value of a cell computed as “simple” function of surrounding cells –Output: not only number of edit operations but also sequence of operations

2/29/2016CPSC503 Winter target source i j Minimum Edit Distance Algorithm Details ed[i,j] = min distance between first i chars of the source and first j chars of the target del-cost =1 sub-cost=2 ins-cost=1 update x y z del ins sub or equal ? i-1, j i-1, j-1 i, j-1 MIN(z+1,y+1, x + (2 or 0))

2/29/2016CPSC503 Winter target source i j Minimum Edit Distance Algorithm Details ed[i,j] = min distance between first i chars of the source and first j chars of the target del-cost =1 sub-cost=2 ins-cost=1 update x y z del ins sub or equal ? i-1, j i-1, j-1 i, j-1 MIN(z+1,y+1, x + (2 or 0))

2/29/2016CPSC503 Winter Min edit distance and alignment See demo

2/29/2016CPSC503 Winter Key Transition Up to this point we’ve mostly been discussing words in isolation Now we’re switching to sequences of words And we’re going to worry about assigning probabilities to sequences of words

2/29/2016CPSC503 Winter Knowledge-Formalisms Map (including probabilistic formalisms) Logical formalisms (First-Order Logics) Rule systems (and prob. versions) (e.g., (Prob.) Context-Free Grammars) State Machines (and prob. versions) (Finite State Automata,Finite State Transducers, Markov Models) Morphology Syntax Pragmatics Discourse and Dialogue Semantics AI planners

2/29/2016CPSC503 Winter Only Spelling? A.Assign a probability to a sentence Part-of-speech tagging Word-sense disambiguation Probabilistic Parsing B.Predict the next word Speech recognition Hand-writing recognition Augmentative communication for the disabled ABAB Impossible to estimate 

2/29/2016CPSC503 Winter Impossible to estimate! Assuming 10 5 word types and average sentence contains 10 words -> sample space? Google language model Update (22 Sept. 2006): was based on corpus Number of sentences: 95,119,665,584 Most sentences will not appear or appear only once  Key point in Stat. NLP: your corpus should be >> than your sample space!

2/29/2016CPSC503 Winter Decompose: apply chain rule Chain Rule: Applied to a word sequence from position 1 to n:

2/29/2016CPSC503 Winter Example Sequence “The big red dog barks” P(The big red dog barks)= P(The) x P(big|the) x P(red|the big) x P(dog|the big red) x P(barks|the big red dog) Note - P(The) is better expressed as: P(The| ) written as P(The| )

2/29/2016CPSC503 Winter Not a satisfying solution  Even for small n (e.g., 6) we would need a far too large corpus to estimate: Markov Assumption: the entire prefix history isn’t necessary. unigram bigram trigram

2/29/2016CPSC503 Winter Prob of a sentence: N-Grams unigram bigram trigram Chain-rule simplifications

2/29/2016CPSC503 Winter Bigram The big red dog barks P(The big red dog barks)= P(The| ) x Trigram?

2/29/2016CPSC503 Winter Estimates for N-Grams bigram..in general

2/29/2016CPSC503 Winter Estimates for Bigrams Silly Corpus : “ The big red dog barks against the big pink dog” Word types vs. Word tokens

2/29/2016CPSC503 Winter Berkeley ____________Project (1994) Table: Counts Corpus: ~10,000 sentences, 1616 word types What domain? Dialog? Reviews?

2/29/2016CPSC503 Winter BERP Table:

2/29/2016CPSC503 Winter BERP Table Comparison Counts Prob. 1?

2/29/2016CPSC503 Winter Some Observations What’s being captured here? –P(want|I) =.32 –P(to|want) =.65 –P(eat|to) =.26 –P(food|Chinese) =.56 –P(lunch|eat) =.055

2/29/2016CPSC503 Winter Some Observations P(I | I) P(want | I) P(I | food) I I I want I want I want to The food I want is Speech-based restaurant consultant!

2/29/2016CPSC503 Winter Generation Choose N-Grams according to their probabilities and string them together I want -> want to -> to eat -> eat lunch I want -> want to -> to eat -> eat Chinese -> Chinese food

2/29/2016CPSC503 Winter Two problems with applying: to

2/29/2016CPSC503 Winter Problem (1) We may need to multiply many very small numbers (underflows!) Easy Solution: –Convert probabilities to logs and then do …… –To get the real probability (if you need it) go back to the antilog.

2/29/2016CPSC503 Winter Problem (2) The probability matrix for n-grams is sparse How can we assign a probability to a sequence where one of the component n-grams has a value of zero? Solutions: –Add-one smoothing –Good-Turing –Back off and Deleted Interpolation

2/29/2016CPSC503 Winter Add-One Make the zero counts 1. Rationale: If you had seen these “rare” events chances are you would only have seen them once. unigram discount

2/29/2016CPSC503 Winter Add-One: Bigram …… Counts

2/29/2016CPSC503 Winter BERP Original vs. Add-one smoothed Counts

2/29/2016CPSC503 Winter Add-One Smoothed Problems An excessive amount of probability mass is moved to the zero counts When compared empirically with other smoothing techniques, it performs poorly -> Not used in practice Detailed discussion [Gale and Church 1994]

2/29/2016CPSC503 Winter Better smoothing technique Good-Turing Discounting (extremely clear example on textbook) More advanced: Backoff and Interpolation To estimate an ngram use “lower order” ngrams –Backoff: Rely on the lower order ngrams only when we have zero counts for a higher order one; e.g.,use unigrams to estimate zero count bigrams –Interpolation: Mix the prob estimate from all the ngrams estimators; e.g., we do a weighted interpolation of trigram, bigram and unigram counts

2/29/2016CPSC503 Winter Impossible to estimate! Sample space much bigger than any realistic corpus  Chain rule does not help  Markov assumption : Unigram… sample space? Bigram … sample space? Trigram … sample space? Sparse matrix: Smoothing techniques N-Grams Summary: final Look at practical issues: sec. 4.8

2/29/2016CPSC503 Winter You still need a big corpus… The biggest one is the Web! Impractical to download every page from the Web to count the ngrams => Rely on page counts (which are only approximations) –Page can contain an ngram multiple times –Search Engines round-off their counts Such “noise” is tolerable in practice

2/29/2016CPSC503 Winter Knowledge-Formalisms Map (including probabilistic formalisms) Logical formalisms (First-Order Logics) Rule systems (and prob. versions) (e.g., (Prob.) Context-Free Grammars) State Machines (and prob. versions) (Finite State Automata,Finite State Transducers, Markov Models) Morphology Syntax Pragmatics Discourse and Dialogue Semantics AI planners

2/29/2016CPSC503 Winter Next Time Language Model Evaluation Hidden Markov-Models (HMM) Part-of-Speech (POS) tagging