Lecture 4 Ngrams Smoothing

CSCE 771 Natural Language Processing, Lecture 4: Ngrams and Smoothing. Topics: Python NLTK, N-grams, smoothing. Readings: Chapter 4 of Jurafsky and Martin. January 23, 2013

Last Time: slides 30 onward from Lecture 1; morphology; regular expressions in Python (grep, vi, emacs, word); Eliza. Today: smoothing N-gram models: Laplace (add-one), Good-Turing discounting, Katz backoff, Kneser-Ney.

Problem: Let's assume we're using N-grams. How can we assign a probability to a sequence when one of its component n-grams has a count of zero? Assume all the words are known and have been seen. Options: go to a lower-order n-gram (back off from bigrams to unigrams), or replace the zero with something else. A minimal backoff sketch follows.
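
As an illustration of the backoff option, here is a minimal sketch in Python. It resembles "stupid backoff" rather than Katz backoff (the backed-off score is not renormalized), and the toy corpus and the alpha weight are invented for this example.

from collections import Counter

def train(tokens):
    # Count unigrams and bigrams from a list of tokens.
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def backoff_score(prev, word, unigrams, bigrams, alpha=0.4):
    # Use the bigram estimate if the bigram was seen;
    # otherwise back off to a weighted unigram estimate.
    if bigrams[(prev, word)] > 0:
        return bigrams[(prev, word)] / unigrams[prev]
    total = sum(unigrams.values())
    return alpha * unigrams[word] / total

tokens = "i want to eat chinese food i want to eat".split()
uni, bi = train(tokens)
print(backoff_score("want", "to", uni, bi))    # seen bigram
print(backoff_score("food", "want", uni, bi))  # unseen bigram: backs off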

Smoothing: re-evaluating some of the zero- and low-probability N-grams and assigning them non-zero values. Add-One (Laplace): make the zero counts 1; really, start counting at 1. Rationale: they're just events you haven't seen yet. If you had seen them, chances are you would only have seen them once... so make the count equal to 1.

Add-One Smoothing. Terminology: N = number of total words; V = vocabulary size (number of distinct words). Maximum likelihood estimate: P(w_i) = c_i / N. Add-one estimate: P_Laplace(w_i) = (c_i + 1) / (N + V).

Adjusted counts c*. Terminology: N = number of total words; V = vocabulary size (number of distinct words). Adjusted count: c_i* = (c_i + 1) N / (N + V). Adjusted probabilities: P(w_i) = c_i* / N = (c_i + 1) / (N + V).
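
A minimal sketch of these formulas in Python; the toy corpus is invented for illustration, and V is taken to be the number of word types seen in it.

from collections import Counter

tokens = "i want to eat chinese food i want to eat".split()
counts = Counter(tokens)
N = sum(counts.values())   # total words
V = len(counts)            # distinct words (vocabulary size)

def p_mle(word):
    # Maximum likelihood estimate: c_i / N (zero for unseen words).
    return counts[word] / N

def p_laplace(word):
    # Add-one estimate: (c_i + 1) / (N + V).
    return (counts[word] + 1) / (N + V)

def c_star(word):
    # Adjusted count: (c_i + 1) * N / (N + V).
    return (counts[word] + 1) * N / (N + V)

print(p_mle("want"), p_laplace("want"), c_star("want"))
print(p_mle("pizza"), p_laplace("pizza"))  # unseen word now gets non-zero mass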

Discounting View. Discounting: lowering some of the larger non-zero counts to obtain the probability mass to assign to the zero entries. d_c = c*/c is the discount ratio, the ratio of the discounted counts to the original counts. The discounted probabilities can then be calculated directly.
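
Continuing the sketch above (reusing counts and c_star from it), the discount ratio can be read off directly:

def discount(word):
    # Discount ratio d_c = c* / c for a word seen c > 0 times.
    c = counts[word]
    return c_star(word) / c

for w in ["i", "want", "food"]:
    print(w, counts[w], round(c_star(w), 3), round(discount(w), 3))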

[Figure: original BERP bigram counts (J&M fig. 4.1), Berkeley Restaurant Project data, V = 1616]

[Figure 4.5: add-one (Laplace) smoothed bigram counts and probabilities]

[Figure 6.6: add-one counts and probabilities]

[Figure: add-one smoothed BERP bigram counts]

Good-Turing Discounting. Singleton: a word that occurs only once. Good-Turing: estimate the probability of words that occur zero times using the probability mass of the singletons. Generalize from words to bigrams, trigrams, ... any events.

Calculating Good-Turing. Let N_c be the number of events that occur exactly c times (the count of counts). The revised count is c* = (c + 1) N_{c+1} / N_c, and the total probability mass assigned to unseen events is P(unseen) = N_1 / N.
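
A sketch of these formulas over bigrams; the toy corpus is invented, and the fallback for high counts (where N_{c+1} is zero) is a simplification. Real implementations smooth the N_c curve instead.

from collections import Counter

tokens = "i want to eat chinese food i want to eat british food".split()
bigrams = Counter(zip(tokens, tokens[1:]))
N = sum(bigrams.values())

# Count of counts: N_c = number of bigram types seen exactly c times.
count_of_counts = Counter(bigrams.values())

def good_turing_count(c):
    # Revised count c* = (c + 1) * N_{c+1} / N_c.
    # Fall back to the raw count when N_c or N_{c+1} is zero.
    n_c, n_c1 = count_of_counts[c], count_of_counts[c + 1]
    if n_c == 0 or n_c1 == 0:
        return c
    return (c + 1) * n_c1 / n_c

p_unseen = count_of_counts[1] / N  # total mass for unseen events: N_1 / N
print(p_unseen, good_turing_count(1))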

Witten-Bell. Think about the occurrence of an unseen item (word, bigram, etc.) as an event. The probability of such an event can be measured in a corpus by just looking at how often it happens. Take the single-word case first: assume a corpus of N tokens and T types. How many times was an as-yet-unseen type encountered? T times: once at the first occurrence of each type.

Witten-Bell. First compute the probability of an unseen event: P(unseen) = T / (N + T). Then distribute that probability mass equally among the as-yet-unseen events. That should strike you as odd for a number of reasons, both in the case of words... and in the case of bigrams.
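
A sketch of the single-word case; the toy corpus and the assumed total vocabulary size (used to count the still-unseen types) are invented for illustration.

from collections import Counter

tokens = "i want to eat chinese food i want to eat".split()
counts = Counter(tokens)
N = sum(counts.values())  # tokens
T = len(counts)           # types seen so far

p_unseen_total = T / (N + T)  # mass reserved for unseen words

V_total = 10              # assumed size of the full vocabulary
Z = V_total - T           # word types never seen
p_each_unseen = p_unseen_total / Z                    # spread equally over unseen types
p_seen = {w: c / (N + T) for w, c in counts.items()}  # seen words, discounted

# Sanity check: the distribution sums to 1.
print(sum(p_seen.values()) + Z * p_each_unseen)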

Witten-Bell. In the case of bigrams, not all conditioning events are equally promiscuous: compare P(x | the) vs. P(x | going). So distribute the mass assigned to the zero-count bigrams according to the promiscuity of the conditioning word, i.e., how many distinct words have followed it.

Witten-Bell. Finally, renormalize the whole table so that you still have a valid probability distribution. A sketch of the bigram case follows.
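
A sketch of the conditional (bigram) case, putting the last three slides together; the corpus is invented, and the vocabulary is taken to be the set of word types in it. For each context prev, N(prev) is the number of follower tokens, T(prev) the number of distinct followers, and Z(prev) the number of vocabulary words never seen after prev.

from collections import Counter, defaultdict

tokens = "i want to eat chinese food i want to eat british food".split()
vocab = set(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))

# For each conditioning word, count its followers.
followers = defaultdict(Counter)
for (w1, w2), c in bigrams.items():
    followers[w1][w2] += c

def witten_bell(prev, word):
    # Seen bigrams are discounted by N(prev) / (N(prev) + T(prev));
    # the reserved mass T/(N+T) is split equally over the Z unseen followers.
    f = followers[prev]
    n = sum(f.values())   # N(prev): tokens following prev
    t = len(f)            # T(prev): distinct words seen after prev
    z = len(vocab) - t    # Z(prev): words never seen after prev
    if word in f:
        return f[word] / (n + t)
    return t / (z * (n + t))

print(witten_bell("want", "to"))    # seen bigram, discounted
print(witten_bell("want", "food"))  # unseen bigram, gets reserved mass

In this uniform formulation each row sums to 1 by construction (the seen entries contribute N/(N+T) and the unseen ones T/(N+T)); variants that redistribute the reserved mass non-uniformly need the explicit renormalization the slide mentions.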

[Figures: original BERP bigram counts and the corresponding add-one counts]

[Figure: Witten-Bell smoothed and reconstituted BERP bigram counts]

[Figure: add-one smoothed BERP bigram counts, reconstituted]