# Statistical NLP Course for Master in Computational Linguistics 2nd Year 2013-2014 Diana Trandabat.

Intro to probabilities

Probability deals with prediction:
– Which word will follow in this ....?
– How can parses for a sentence be ordered?
– Which meaning is more likely?
– Which grammar is more linguistically plausible?
– See the phrase “more lies ahead”. How likely is it that “lies” is a noun?
– See “Le chien est noir”. How likely is it that the correct translation is “The dog is black”?

Any rational decision can be described probabilistically.

Notations

Experiment (or trial): a repeatable process by which observations are made
– e.g. tossing 3 coins

We observe a basic outcome from the sample space, Ω (the set of all possible basic outcomes).

Examples of sample spaces:
– one coin toss: Ω = {H, T}
– three coin tosses: Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
– part of speech of a word: Ω = {N, V, Adj, etc.}
– next word in a Shakespeare play: |Ω| = size of the vocabulary
– number of words in your MSc thesis: Ω = {0, 1, 2, …}

Notation

An event A is a set of basic outcomes, i.e., a subset of the sample space Ω.

Example:
– Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
– e.g. basic outcome = THH
– e.g. event “has exactly 2 H’s”: A = {THH, HHT, HTH}
– A = Ω is the certain event: P(A = Ω) = 1
– A = ∅ is the impossible event: P(A = ∅) = 0
– For “not A”, we write Ā

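Intro to probabilities

The true probability of an event is hard to compute. It is easy to compute an estimate of the probability, written $\hat{p}(x)$. As the number of observations grows, $|X| \to \infty$, the estimate approaches the true probability: $\hat{p}(x) \to P(x)$.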
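As a rough illustration of an estimate approaching the true probability, the sketch below simulates tosses of a fair coin and reports the relative frequency of heads for growing sample sizes. The function name `estimate_heads`, the fixed seed, and the sample sizes are assumptions made for this example, not part of the course material.

```python
import random


def estimate_heads(num_trials, seed=0):
    """Relative-frequency estimate of P(H) for a simulated fair coin."""
    rng = random.Random(seed)  # fixed seed only so that runs are repeatable
    heads = 0
    for _ in range(num_trials):
        if rng.random() < 0.5:  # treat values below 0.5 as "heads"
            heads += 1
    return heads / num_trials


# The estimate should drift toward the true value 0.5 as the sample grows.
for n in (10, 1_000, 100_000):
    print(n, estimate_heads(n))
```

With few trials the estimate can be far from 0.5; with many trials it is typically very close, which is the behaviour the slide describes.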

Intro to probabilities

“A coin is tossed 3 times. What is the likelihood of 2 heads?”
– Experiment: toss a coin three times
– Sample space: Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
– Event: basic outcomes that have exactly 2 H’s, A = {THH, HTH, HHT}
– The likelihood of 2 heads is 3 out of 8 possible outcomes: P(A) = 3/8
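The same count can be reproduced by enumerating the sample space. This is a minimal sketch that assumes all eight outcomes are equally likely (a fair coin); the variable names are chosen for the example.

```python
from fractions import Fraction
from itertools import product

# Sample space: every sequence of three tosses, each H or T.
omega = ["".join(tosses) for tosses in product("HT", repeat=3)]

# Event A: the basic outcomes with exactly two heads.
event_a = [outcome for outcome in omega if outcome.count("H") == 2]

# With equally likely outcomes, P(A) = |A| / |Ω|.
p_a = Fraction(len(event_a), len(omega))
print(sorted(event_a), p_a)  # prints ['HHT', 'HTH', 'THH'] 3/8
```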

Probability distribution

A probability distribution is an assignment of probabilities to a set of outcomes.
– A uniform distribution assigns the same probability to all outcomes (e.g. a fair coin).
– A Gaussian distribution assigns a bell curve over outcomes.
– Many others.
– Uniform and Gaussian distributions are popular in SNLP.

Joint probabilities

Independent events

Two events are independent if: p(a,b) = p(a) * p(b)

Consider a fair die. Intuitively, each side (1, 2, 3, 4, 5, 6) has a 1/6 chance of appearing. Consider the event X “the number on the die is divisible by 2” and the event Y “the number is divisible by 3”.

X = {2, 4, 6}, Y = {3, 6}
p(X) = p(2) + p(4) + p(6) = 1/6 + 1/6 + 1/6 = 3/6 = 1/2
p(Y) = p(3) + p(6) = 1/6 + 1/6 = 1/3
p(X,Y) = p(6) = 1/6 = 1/2 * 1/3 = p(X) * p(Y) ==> X and Y are independent

Conditioned events

Events that are not independent are called conditioned events.
– p(X|Y) = “the probability of X given that event Y occurred”
– p(X|Y) = p(X,Y) / p(Y)
– p(X) = a priori probability (prior)
– p(X|Y) = posterior probability

Conditioned events

Are X and Y independent?
p(X) = 1/2, p(Y) = 1/3, p(X,Y) = 1/6, p(X|Y) = (1/6) / (1/3) = 1/2 = p(X) ==> independent.

Consider Z, the event “the number on the die is divisible by 4”. Are X and Z independent?
p(Z) = p(4) = 1/6, p(X,Z) = 1/6, p(X|Z) = p(X,Z) / p(Z) = (1/6) / (1/6) = 1 ≠ 1/2 = p(X) ==> not independent.
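The die example can be checked mechanically. The sketch below assumes a fair six-sided die and represents events as sets of faces; the helper names `prob`, `conditional` and `is_independent` are made up for the illustration.

```python
from fractions import Fraction

FACES = set(range(1, 7))  # faces of a fair die, each with probability 1/6


def prob(event):
    """Probability of a set of faces under the uniform distribution."""
    return Fraction(len(event & FACES), len(FACES))


def conditional(a, b):
    """p(a | b) = p(a, b) / p(b); the joint event is the set intersection."""
    return prob(a & b) / prob(b)


def is_independent(a, b):
    """True when p(a, b) equals p(a) * p(b)."""
    return prob(a & b) == prob(a) * prob(b)


X = {n for n in FACES if n % 2 == 0}  # divisible by 2
Y = {n for n in FACES if n % 3 == 0}  # divisible by 3
Z = {n for n in FACES if n % 4 == 0}  # divisible by 4

print(is_independent(X, Y), conditional(X, Y))  # prints True 1/2
print(is_independent(X, Z), conditional(X, Z))  # prints False 1
```

Using `Fraction` keeps the arithmetic exact, so the equality test in `is_independent` is not affected by floating-point rounding.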

Bayes’ Theorem

Bayes’ Theorem lets us swap the order of dependence between events.
We saw that p(X|Y) = p(X,Y) / p(Y).
Bayes’ Theorem: p(X|Y) = p(Y|X) * p(X) / p(Y)

Example

S: stiff neck, M: meningitis
P(S|M) = 0.5, P(M) = 1/50,000, P(S) = 1/20
I have a stiff neck, should I worry?
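One way to answer the question is to apply Bayes’ Theorem, P(M|S) = P(S|M) * P(M) / P(S), to the numbers on the slide. The sketch below does only that arithmetic; the function name `bayes_posterior` is an assumption made for the example.

```python
from fractions import Fraction


def bayes_posterior(likelihood, prior, evidence):
    """Return P(hypothesis | data) = P(data | hypothesis) * P(hypothesis) / P(data)."""
    return likelihood * prior / evidence


p_s_given_m = Fraction(1, 2)   # P(S|M) = 0.5
p_m = Fraction(1, 50_000)      # P(M)
p_s = Fraction(1, 20)          # P(S)

p_m_given_s = bayes_posterior(p_s_given_m, p_m, p_s)
print(p_m_given_s, float(p_m_given_s))  # prints 1/5000 0.0002
```

Under these figures, a stiff neck raises the probability of meningitis from 1 in 50,000 to 1 in 5,000, which is still small.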

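Other useful relations:

$$p(x) = \sum_{y \in Y} p(x \mid y)\, p(y) \quad \text{or} \quad p(x) = \sum_{y \in Y} p(x, y)$$

Chain rule:

$$p(x_1, x_2, \dots, x_n) = p(x_1)\, p(x_2 \mid x_1)\, p(x_3 \mid x_1, x_2) \cdots p(x_n \mid x_1, x_2, \dots, x_{n-1})$$

The factors are labelled, in order: prior $p(x_1)$, bigram $p(x_2 \mid x_1)$, trigram $p(x_3 \mid x_1, x_2)$, n-gram $p(x_n \mid x_1, x_2, \dots, x_{n-1})$.

The proof is easy, through successive reductions. Consider the event $y$ as the joint occurrence of the events $x_1, x_2, \dots, x_{n-1}$:

$$p(x_1, x_2, \dots, x_n) = p(y, x_n) = p(y)\, p(x_n \mid y) = p(x_1, x_2, \dots, x_{n-1})\, p(x_n \mid x_1, x_2, \dots, x_{n-1})$$

Similarly, for the event $z$, the joint occurrence of $x_1, x_2, \dots, x_{n-2}$:

$$p(x_1, x_2, \dots, x_{n-1}) = p(z, x_{n-1}) = p(z)\, p(x_{n-1} \mid z) = p(x_1, x_2, \dots, x_{n-2})\, p(x_{n-1} \mid x_1, x_2, \dots, x_{n-2})$$

and so on, until

$$p(x_1, x_2, \dots, x_n) = p(x_1)\, p(x_2 \mid x_1)\, p(x_3 \mid x_1, x_2) \cdots p(x_n \mid x_1, x_2, \dots, x_{n-1})$$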
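The chain rule can be checked numerically on a small example. The sketch below uses a hypothetical four-sentence toy corpus (invented for this illustration, not taken from the course) and estimates every probability as a relative frequency of sentence prefixes; the helper names are likewise assumptions.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical toy corpus of three-word sequences, for illustration only.
CORPUS = [
    ("the", "dog", "barks"),
    ("the", "dog", "sleeps"),
    ("the", "cat", "sleeps"),
    ("a", "dog", "barks"),
]


def prefix_counts(length):
    """Count how often each prefix of the given length starts a sequence."""
    return Counter(seq[:length] for seq in CORPUS)


def joint(words):
    """p(x1, ..., xk) estimated directly as a relative frequency."""
    words = tuple(words)
    return Fraction(prefix_counts(len(words))[words], len(CORPUS))


def chain(words):
    """p(x1) * p(x2 | x1) * ... * p(xk | x1, ..., xk-1), factor by factor."""
    words = tuple(words)
    result = Fraction(1)
    for i in range(1, len(words) + 1):
        seen = prefix_counts(i)[words[:i]]
        if seen == 0:
            return Fraction(0)  # unseen prefix: the product is zero
        # The empty prefix (i - 1 == 0) is counted once per sequence.
        history = prefix_counts(i - 1)[words[:i - 1]]
        result *= Fraction(seen, history)
    return result


sentence = ("the", "dog", "barks")
print(joint(sentence), chain(sentence))  # prints 1/4 1/4

# Marginalization: summing p("the", y) over all y recovers p("the").
second_words = {seq[1] for seq in CORPUS}
print(sum(joint(("the", y)) for y in second_words), joint(("the",)))  # prints 3/4 3/4
```

Each conditional factor is a ratio of counts whose denominator cancels the previous numerator, which mirrors the successive reductions in the proof.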

Objections

People don’t compute probabilities. Why would computers? Or do they?

John went to …
– the market
– go
– red
– if
– number

Objections

Statistics only count words and co-occurrences.

Two different concepts:
– statistical model and statistical method

The first doesn’t need the second. A person who uses intuition to reason is using a statistical model without statistical methods.

Objections refer mainly to the accuracy of statistical models.
