
1 Language Model for Machine Translation Jang, HaYoung

2 What is a Language Model?
A probability distribution over strings of text: how likely is a string in a given "language"? The probabilities depend on what language we're modeling.
p1 = P("a quick brown dog")
p2 = P("dog quick a brown")
p3 = P("быстрая brown dog")
p4 = P("быстрая собака")
In a language model for English: p1 > p2 > p3 > p4
In a language model for Russian: p1 < p2 < p3 < p4
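A minimal sketch of this idea in Python (the word probabilities below are invented for illustration, not taken from the slides):

    # Toy unigram "English" model; probabilities are made up for illustration.
    english = {"a": 0.07, "quick": 0.01, "brown": 0.008, "dog": 0.01}

    def unigram_prob(sentence, model):
        # Product of per-word probabilities; unseen words get probability 0.
        p = 1.0
        for word in sentence.split():
            p *= model.get(word, 0.0)
        return p

    p1 = unigram_prob("a quick brown dog", english)   # small but > 0
    p3 = unigram_prob("быстрая brown dog", english)   # 0.0: unseen word

Note that a unigram model scores p1 and p2 identically, since it ignores word order; distinguishing them requires the n-gram models introduced later.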

3 Language Model from Wikipedia
A statistical language model assigns a probability to a sequence of words P(w1..n) by means of a probability distribution. Language modeling is used in many natural language processing applications such as speech recognition, machine translation, part-of-speech tagging, parsing, and information retrieval. Estimating the probability of sequences can become difficult in corpora in which phrases or sentences can be arbitrarily long, so some sequences are never observed during training of the language model (the data-sparseness problem). For that reason these models are often approximated using smoothed N-gram models. In speech recognition and in data compression, such a model tries to capture the properties of a language and to predict the next word in a speech sequence. When used in information retrieval, a language model is associated with each document in a collection. With query Q as input, retrieved documents are ranked by the probability that the document's language model would generate the terms of the query, P(Q|Md).
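A sketch of the retrieval use mentioned above, assuming each document's model Md is a smoothed unigram distribution over its words (the function name and smoothing choice are assumptions, not from the slides):

    import math

    def query_likelihood(query, doc_words, vocab_size, alpha=1.0):
        # log P(Q|Md) with add-alpha smoothing over the document's words.
        counts = {}
        for w in doc_words:
            counts[w] = counts.get(w, 0) + 1
        total = len(doc_words)
        score = 0.0
        for w in query.split():
            p = (counts.get(w, 0) + alpha) / (total + alpha * vocab_size)
            score += math.log(p)
        return score

    # Rank documents by how likely their language model generates the query.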

4 Unigram Language Model
Colored balls are randomly drawn from an urn (with replacement). Under a unigram model, the probability of a sequence of draws is the product of the individual draw probabilities, e.g. for one four-ball sequence: P(sequence) = (4/9) × (2/9) × (4/9) × (3/9).
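The slide's urn example as a computation (the ball colors are stand-ins for words; the counts 4, 2, and 3 out of 9 match the figure's arithmetic):

    from fractions import Fraction

    # Urn with 9 balls: 4 red, 2 blue, 3 green (drawn with replacement).
    urn = {"red": Fraction(4, 9), "blue": Fraction(2, 9), "green": Fraction(3, 9)}

    # P(red, blue, red, green) = (4/9) * (2/9) * (4/9) * (3/9)
    p = urn["red"] * urn["blue"] * urn["red"] * urn["green"]
    print(p)  # 32/2187 (i.e. 96/6561 reduced)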

5 Zero-Frequency Problem
Suppose some event never occurs in our observed sequence S. The model M will assign zero probability to that event, and therefore to any sequence containing it, e.g. P(sequence) = (1/2) × (1/4) × 0 × (1/4) = 0.
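The same computation shows the failure mode: one unseen symbol zeroes out the whole sequence (a sketch using the slide's 1/2 and 1/4 probabilities):

    model = {"red": 0.5, "blue": 0.25, "green": 0.25}  # 'yellow' never observed

    def seq_prob(seq, model):
        p = 1.0
        for sym in seq:
            p *= model.get(sym, 0.0)  # unseen symbol contributes a factor of 0
        return p

    print(seq_prob(["red", "blue", "yellow", "blue"], model))  # 0.0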

6 Smoothing
The solution: "smooth" the word probabilities. [Figure: P(w) plotted over words w, comparing the maximum likelihood estimate with a smoothed probability distribution that shifts some mass onto unseen words.]
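The slide does not name a specific scheme; one standard choice is add-one (Laplace) smoothing, sketched here:

    def laplace_probs(counts, vocab):
        # Smoothed P(w): every vocabulary word gets a nonzero share.
        total = sum(counts.get(w, 0) for w in vocab)
        return {w: (counts.get(w, 0) + 1) / (total + len(vocab)) for w in vocab}

    counts = {"red": 2, "blue": 2}                 # 'yellow' unseen in training
    probs = laplace_probs(counts, ["red", "blue", "yellow"])
    print(probs["yellow"])                         # 1/7 > 0: no more zero products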

7 Phonetic Tree with n-gram Model
[Figure: phonetic trees for the words following "tell the", with branch probabilities (e.g. 0.1, 0.5, 1, 0.02) shown under trigram, bigram, and unigram models.]

8 n-grams
n-gram: a sequence of n symbols.
n-gram language model: a model that predicts a symbol in a sequence given its n-1 predecessors.
Why use them? To estimate the probability of a symbol in unknown text, given the frequency of its occurrence in known text.
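A sketch of extracting n-grams from a symbol sequence:

    def ngrams(symbols, n):
        # All length-n windows over the sequence.
        return [tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1)]

    words = "the quick brown fox".split()
    print(ngrams(words, 2))  # [('the','quick'), ('quick','brown'), ('brown','fox')]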

9 Creating n-gram LMs
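The slide's diagram is not preserved in this transcript; what follows is a minimal sketch of one common recipe (count n-grams, then take maximum-likelihood estimates), assumed rather than taken from the slide:

    from collections import Counter

    def bigram_lm(corpus):
        # MLE bigram model: P(w2 | w1) = count(w1, w2) / count(w1).
        unigrams, bigrams = Counter(), Counter()
        for sentence in corpus:
            words = ["<s>"] + sentence.split() + ["</s>"]
            unigrams.update(words[:-1])
            bigrams.update(zip(words, words[1:]))
        return {pair: c / unigrams[pair[0]] for pair, c in bigrams.items()}

    lm = bigram_lm(["the dog barks", "the dog runs"])
    print(lm[("the", "dog")])  # 1.0: 'the' is always followed by 'dog' here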

10 Problems with n-grams
More n-grams exist than can ever be observed in training data.
Sensitivity to the genre of the training text (newspaper articles vs. personal letters).
Fixed n-gram vocabulary: any additions require re-compiling the n-gram model.

11 Whole-Sentence Language Model
The main advantage of WSME (whole-sentence maximum entropy) is its ability to freely incorporate arbitrary computational features into a single statistical model. The features can be:
Traditional N-gram features (bigram, trigram)
Long-distance N-grams (triggers, distance-2 n-grams)
Class-based N-grams
Syntactic features (PCFG, link grammar, dependency information)
Other features (sentence length, dialogue features, etc.)
A sketch of the model's log-linear form follows this list.
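A sketch of the whole-sentence log-linear form behind WSME, assuming a baseline probability p0 and hand-written feature functions (the names and the stand-in values here are hypothetical):

    import math

    def wsme_score(sentence, p0, features, weights):
        # Unnormalized score: p0(s) * exp(sum_i w_i * f_i(s)).
        # A true probability would divide by a normalizer Z over all sentences.
        return p0(sentence) * math.exp(
            sum(w * f(sentence) for f, w in zip(features, weights)))

    # Arbitrary features plug in side by side: n-gram, syntactic, length...
    length_feature = lambda s: float(len(s.split()) > 20)
    p0 = lambda s: 1e-6  # stand-in for a baseline n-gram model probability
    print(wsme_score("a short sentence", p0, [length_feature], [-0.5]))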

12 References
S. Katz. Estimation of probabilities from sparse data for the language model component of a speech recognizer.
Peter F. Brown, Peter V. deSouza, Robert L. Mercer, Vincent J. Della Pietra, Jenifer C. Lai. Class-based n-gram models of natural language.
G. Mishne, D. Carmel, and R. Lempel. Blocking Blog Spam with Language Model Disagreement. In: AIRWeb '05, First International Workshop on Adversarial Information Retrieval on the Web, at the 14th International World Wide Web Conference (WWW2005), 2005.

