6. N-GRAMs  Pusan National University, Artificial Intelligence Laboratory  Seongja Choi

2 Word Prediction
“I’d like to make a collect …” – likely next words: call, telephone, or person-to-person
Applications:
-Spelling error detection
-Augmentative communication
-Context-sensitive spelling error correction

3 Language Model
-Language Model (LM): a statistical model of word sequences
-n-gram: use the previous n-1 words to predict the next word

4 Applications
-Context-sensitive spelling error detection and correction
  –“He is trying to fine out.” (fine → find)
  –“The design an construction will take a year.” (an → and)
-Machine translation

5 Counting Words in Corpora
-Corpora: on-line text collections
-Which words to count
  –What we are going to count
  –Where we are going to find the things to count

6 Brown Corpus
-1 million words, 500 texts
-Varied genres (newspaper, novels, non-fiction, academic, etc.)
-Assembled at Brown University in the 1960s
-The first large on-line text collection used in corpus-based NLP research

7 Issues in Word Counting
-Punctuation symbols (. , ? !)
-Capitalization (“He” vs. “he”, “Bush” vs. “bush”)
-Inflected forms (“cat” vs. “cats”)
  –Wordform: cat, cats, eat, eats, ate, eating, eaten
  –Lemma (stem): cat, eat

8 Types vs. Tokens
-Tokens (N): total number of running words
-Types (B): number of distinct words in a corpus (the size of the vocabulary)
-Example: “They picnicked by the pool, then lay back on the grass and looked at the stars.”
  –16 word tokens, 14 word types (not counting punctuation)
※ Here “types” means wordform types rather than lemma types, and punctuation marks will generally be counted as words.
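As a quick check of the counts in the example, here is a minimal Python sketch (the regex tokenizer is an assumption, used only to drop the punctuation):

```python
import re

sentence = ("They picnicked by the pool, then lay back on the grass "
            "and looked at the stars.")

# Pull out the words, dropping punctuation; case is kept, so "They" and "the" stay distinct
tokens = re.findall(r"[A-Za-z]+", sentence)

print(len(tokens))        # 16 word tokens
print(len(set(tokens)))   # 14 word types ("the" occurs three times)
```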

9 How Many Words in English?
-Shakespeare’s complete works
  –884,647 wordform tokens
  –29,066 wordform types
-Brown Corpus
  –1 million wordform tokens
  –61,805 wordform types
  –37,851 lemma types

10 Simple (Unsmoothed) N-grams
-Task: estimating the probability of a word
-First attempt:
  –Suppose no corpus is available
  –Use a uniform distribution
  –Assume: number of word types = V (e.g., 100,000)
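The formula itself was an image on this slide and did not survive the transcript; the uniform estimate it describes is simply:

```latex
P(w) = \frac{1}{V}
```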

11 Simple (Unsmoothed) N-grams
-Task: estimating the probability of a word
-Second attempt:
  –Suppose a corpus is available
  –Assume: number of word tokens = N, number of times w appears in the corpus = C(w)
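Again the formula did not come through; the relative-frequency estimate being described is:

```latex
P(w) = \frac{C(w)}{N}
```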

12 Simple (Unsmoothed) N-grams
-Task: estimating the probability of a word
-Third attempt:
  –Suppose a corpus is available
  –Assume a word depends on its n-1 previous words

13 Simple (Unsmoothed) N-grams

14 Simple (Unsmoothed) N-grams
-n-gram approximation: w_k only depends on its previous n-1 words
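The equations on slides 13-14 were images and are missing from the transcript; the standard chain-rule decomposition and its n-gram approximation are:

```latex
P(w_1^n) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1^2)\cdots P(w_n \mid w_1^{n-1})
         = \prod_{k=1}^{n} P(w_k \mid w_1^{k-1})

% n-gram approximation: condition only on the previous n-1 words
P(w_k \mid w_1^{k-1}) \approx P(w_k \mid w_{k-n+1}^{k-1})
```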

15 Bigram Approximation
-Example: P(I want to eat British food)
  = P(I|<s>) P(want|I) P(to|want) P(eat|to) P(British|eat) P(food|British)
-<s>: a special word meaning “start of sentence”

16 Note on a Practical Problem
-Multiplying many probabilities results in a very small number and can cause numerical underflow
-Use log probabilities (logprobs) in the actual computation instead
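A minimal sketch of the logprob trick; the bigram probabilities below are made-up numbers, used only to show the pattern:

```python
import math

# Hypothetical bigram probabilities for P(I want to eat British food)
bigram_probs = [0.25, 0.32, 0.65, 0.26, 0.002, 0.60]

# A naive product can underflow for long sentences; a sum of logs cannot
log_prob = sum(math.log(p) for p in bigram_probs)

print(log_prob)            # log P(sentence)
print(math.exp(log_prob))  # back to a probability, if it is small enough to represent
```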

17 Estimating N-gram Probability  Maximum Likelihood Estimate (MLE)
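The MLE formula itself was an image on this slide; for bigrams it is the familiar relative-frequency estimate:

```latex
P_{\mathrm{MLE}}(w_n \mid w_{n-1})
  = \frac{C(w_{n-1} w_n)}{\sum_{w} C(w_{n-1} w)}
  = \frac{C(w_{n-1} w_n)}{C(w_{n-1})}
```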

18

19 Estimating Bigram Probability
-Example:
  –C(to eat) = 860
  –C(to) = 3256
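Plugging these counts into the MLE formula above gives the probability the slide is building toward:

```latex
P(\text{eat} \mid \text{to}) = \frac{C(\text{to eat})}{C(\text{to})} = \frac{860}{3256} \approx 0.26
```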

20

21 Two Important Facts
-N-gram models become increasingly accurate as we increase the value of N
-N-gram models depend very strongly on their training corpus (in particular its genre and its size in words)

22 Smoothing
-Any particular training corpus is finite
-Sparse data problem
-Need to deal with zero probabilities

23 Smoothing
-Smoothing: re-evaluating zero-probability n-grams and assigning them non-zero probability
-Also called discounting: lowering non-zero n-gram counts in order to assign some probability mass to the zero n-grams

24 Add-One Smoothing for Bigram
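The add-one (Laplace) formula on slides 24-26 did not survive the transcript; for bigrams, with V the vocabulary size, the standard form is:

```latex
P_{\mathrm{add\text{-}1}}(w_n \mid w_{n-1}) = \frac{C(w_{n-1} w_n) + 1}{C(w_{n-1}) + V}
```

And a minimal sketch in Python on a toy corpus (the token list and helper function are invented for illustration; counting <s> in the vocabulary is a simplification):

```python
from collections import Counter

def add_one_bigram_prob(bigram_counts, unigram_counts, vocab_size, prev, word):
    """Add-one (Laplace) smoothed bigram probability P(word | prev)."""
    return (bigram_counts[(prev, word)] + 1) / (unigram_counts[prev] + vocab_size)

tokens = "<s> I want to eat <s> I want food".split()
unigram_counts = Counter(tokens)
bigram_counts = Counter(zip(tokens, tokens[1:]))
V = len(unigram_counts)

print(add_one_bigram_prob(bigram_counts, unigram_counts, V, "want", "to"))   # seen bigram
print(add_one_bigram_prob(bigram_counts, unigram_counts, V, "want", "eat"))  # unseen bigram, still > 0
```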

25

26

27 Things Seen Once
-Use the count of things you have seen once to help estimate the count of things you have never seen

28 Witten-Bell Discounting

29 Witten-Bell Discounting for Bigram
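The Witten-Bell equations on slides 28-32 were images and are missing; for reference, the standard bigram formulation (as in Jurafsky and Martin), with T(w_{i-1}) the number of distinct word types seen after w_{i-1} and Z(w_{i-1}) the number of word types never seen after it, is:

```latex
P^{*}(w_i \mid w_{i-1}) =
\begin{cases}
\dfrac{C(w_{i-1} w_i)}{C(w_{i-1}) + T(w_{i-1})} & \text{if } C(w_{i-1} w_i) > 0 \\[2ex]
\dfrac{T(w_{i-1})}{Z(w_{i-1})\,\bigl(C(w_{i-1}) + T(w_{i-1})\bigr)} & \text{if } C(w_{i-1} w_i) = 0
\end{cases}
```

The total mass T(w_{i-1}) / (C(w_{i-1}) + T(w_{i-1})) reserved for unseen bigrams is shared equally among the Z(w_{i-1}) unseen continuations.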

30 Witten-Bell Discounting for Bigram

31 Seen counts vs. unseen counts (table not preserved in the transcript)

32

33 Good-Turing Discounting for Bigram
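The Good-Turing equations on slides 33-34 are likewise missing; the standard form, with N_c the number of n-grams that occur exactly c times in the training data, is:

```latex
c^{*} = (c + 1)\,\frac{N_{c+1}}{N_c}
\qquad
P_{\mathrm{GT}}(\text{things with zero count}) = \frac{N_1}{N}
```

The re-estimated count c* is then used in place of c when computing the bigram probabilities.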

34

35 Backoff

36 Backoff
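The backoff equations were also lost in the transcript; the idea (Katz backoff) is to fall back to a lower-order model whenever the higher-order count is zero. In bigram-to-unigram form:

```latex
P_{\mathrm{katz}}(w_i \mid w_{i-1}) =
\begin{cases}
P^{*}(w_i \mid w_{i-1}) & \text{if } C(w_{i-1} w_i) > 0 \\
\alpha(w_{i-1})\,P(w_i) & \text{otherwise}
\end{cases}
```

Here P* is a discounted probability and α(w_{i-1}) is chosen so that the distribution sums to one.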

37 Entropy
-A measure of uncertainty
-Used to evaluate the quality of n-gram models (how well a language model matches a given language)
-Entropy H(X) of a random variable X (see the formula below)
-Measured in bits: the number of bits needed to encode the information in the optimal coding scheme
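The definition on the slide was an image; the standard form is:

```latex
H(X) = -\sum_{x \in \mathcal{X}} p(x) \log_2 p(x)
```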

38 Example 1

39 Example 2

40 Perplexity
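The perplexity formula on this slide did not come through; the standard definition for a test sequence W = w_1 ... w_N is:

```latex
PP(W) = P(w_1 w_2 \cdots w_N)^{-\frac{1}{N}} = 2^{H(W)}
```

where H(W) is the per-word (cross-)entropy of the sequence, so lower perplexity means a better model.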

41 Entropy of a Sequence

42 Entropy of a Language
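The equations for slides 41-42 are missing from the transcript; the standard definitions are the per-word entropy of a sequence and its limit for the language as a whole:

```latex
\frac{1}{n} H(w_1 \ldots w_n) = -\frac{1}{n} \sum_{w_1^n \in L} p(w_1^n) \log p(w_1^n)

H(L) = \lim_{n \to \infty} \frac{1}{n} H(w_1 \ldots w_n)
```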

43 Cross Entropy
-Used for comparing two language models
-p: the actual probability distribution that generated some data
-m: a model of p (an approximation to p)
-Cross entropy of m on p (see the formula below)
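The defining equation did not survive the transcript; the standard form is:

```latex
H(p, m) = \lim_{n \to \infty} -\frac{1}{n} \sum_{w_1^n \in L} p(w_1^n) \log m(w_1^n)
```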

44 Cross Entropy
-By the Shannon-McMillan-Breiman theorem (see the formula below)
-Property of cross entropy: H(p) ≤ H(p, m)
-The difference between H(p,m) and H(p) is a measure of how accurate model m is
-The more accurate the model, the lower its cross-entropy
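The equation the slide refers to, the Shannon-McMillan-Breiman form, lets the cross entropy be estimated from a single sufficiently long sample rather than a sum over all sequences:

```latex
H(p, m) = \lim_{n \to \infty} -\frac{1}{n} \log m(w_1 w_2 \ldots w_n)
```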