
1 Statistical NLP Course for Master in Computational Linguistics 2nd Year 2015-2016 Diana Trandabat

2 Intro to probabilities
Probability deals with prediction:
– Which word will follow in this …?
– How can the parses of a sentence be ranked?
– Which meaning is more likely?
– Which grammar is more linguistically plausible?
– Given the phrase “more lies ahead”, how likely is it that “lies” is a noun?
– Given “Le chien est noir”, how likely is it that the correct translation is “The dog is black”?
Any rational decision can be described probabilistically.

3 Notations
An experiment (or trial) is a repeatable process by which observations are made, e.g. tossing 3 coins.
We observe a basic outcome from the sample space Ω, the set of all possible basic outcomes.
Examples of sample spaces:
– one coin toss: Ω = {H, T}
– three coin tosses: Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
– part-of-speech of a word: Ω = {N, V, Adj, …}
– next word in a Shakespeare play: |Ω| = size of the vocabulary
– number of words in your MSc thesis: Ω = {0, 1, …, ∞}
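
As an illustration (not part of the original slides), the three-coin-toss sample space can be enumerated with a few lines of Python:

```python
# Minimal sketch: enumerate the sample space of three coin tosses.
from itertools import product

omega = ["".join(toss) for toss in product("HT", repeat=3)]
print(omega)       # ['HHH', 'HHT', 'HTH', 'HTT', 'THH', 'THT', 'TTH', 'TTT']
print(len(omega))  # 8 basic outcomes
```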

4 Notation
An event A is a set of basic outcomes, i.e., a subset of the sample space Ω.
Example:
– Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
– a basic outcome: THH
– an event: “has exactly 2 H’s”, A = {THH, HHT, HTH}
– A = Ω is the certain event: P(Ω) = 1
– A = ∅ is the impossible event: P(∅) = 0
– For “not A”, we write Ā

5 Intro to probabilities

6 Intro to probabilities
“A coin is tossed 3 times. What is the likelihood of 2 heads?”
– Experiment: toss a coin three times
– Sample space: Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
– Event: the basic outcomes with exactly 2 H’s, A = {THH, HTH, HHT}
– The likelihood of 2 heads is 3 out of 8 possible outcomes: P(A) = 3/8
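
The same count can be checked mechanically; this sketch (illustrative, not from the slides) filters the sample space for the event and divides by |Ω|:

```python
# Count the outcomes with exactly two heads and normalise by |Ω|.
from itertools import product

omega = ["".join(t) for t in product("HT", repeat=3)]
A = [o for o in omega if o.count("H") == 2]
print(A)                    # ['HHT', 'HTH', 'THH']
print(len(A) / len(omega))  # 0.375, i.e. P(A) = 3/8
```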

7 Probability distribution
A probability distribution is an assignment of probabilities over a set of outcomes.
– A uniform distribution assigns the same probability to all outcomes (e.g. a fair coin).
– A Gaussian distribution assigns a bell curve over the outcomes.
– Many other distributions exist.
– Uniform and Gaussian distributions are popular in statistical NLP.
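
A quick sketch of the two distributions mentioned, using only Python’s standard library (the mean 0 and standard deviation 1 of the Gaussian are arbitrary choices for illustration):

```python
import random

# Uniform distribution: every face of a fair die gets the same probability.
faces = [1, 2, 3, 4, 5, 6]
uniform = {face: 1 / len(faces) for face in faces}
print(uniform)                    # each face maps to 1/6

# Gaussian: samples cluster around the mean in a bell-curve shape.
samples = [random.gauss(0.0, 1.0) for _ in range(10_000)]
share = sum(-1 <= s <= 1 for s in samples) / len(samples)
print(share)                      # ≈ 0.68 within one standard deviation
```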

8 Joint probabilities

9 Probabilities as sets
P(A|B) = P(A∩B) / P(B)
P(A∩B) = P(A|B) * P(B)
P(B|A) = P(A∩B) / P(A)
P(B∩A) = P(A∩B) = P(B|A) * P(A) = P(A|B) * P(B)
[Venn diagram: sets A and B overlapping in A∩B]

10 Probabilities as sets
P(A|B) = P(A∩B) / P(B)
P(A∩B) = P(A|B) * P(B)
P(B|A) = ?
[Venn diagram: sets A and B overlapping in A∩B]

11 Probabilities as sets
P(A|B) = P(A∩B) / P(B)
P(A∩B) = P(A|B) * P(B)
P(B|A) = P(A∩B) / P(A)
P(B∩A) = P(A∩B) = P(B|A) * P(A) = P(A|B) * P(B)   (multiplication rule)
[Venn diagram: sets A and B overlapping in A∩B]
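
These identities can be verified numerically. Here is a hedged sketch using the fair-die events that appear later in the deck (A = “divisible by 2”, B = “divisible by 3”), with exact fractions to avoid rounding:

```python
# Verify the multiplication rule on a fair die, with exact arithmetic.
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}                     # divisible by 2
B = {3, 6}                        # divisible by 3

def p(event):                     # uniform probability of an event
    return Fraction(len(event), len(omega))

p_a_given_b = p(A & B) / p(B)     # P(A|B) = P(A∩B) / P(B)  -> 1/2
p_b_given_a = p(A & B) / p(A)     # P(B|A) = P(A∩B) / P(A)  -> 1/3
assert p_a_given_b * p(B) == p(A & B) == p_b_given_a * p(A)
print(p(A & B))                   # 1/6
```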

12 Probabilities as sets
P(A) = P(A∩B) + P(A∩B̄)
P(A) = P(A|B) * P(B) + P(A|B̄) * P(B̄)   (additivity rule)
[Venn diagram: sets A and B overlapping in A∩B]
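
A sketch of the additivity rule on the same die events (again illustrative, exact fractions):

```python
# Verify P(A) = P(A∩B) + P(A∩B̄) on a fair die.
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
A, B = {2, 4, 6}, {3, 6}
not_B = omega - B                 # the complement B̄

def p(event):
    return Fraction(len(event), len(omega))

assert p(A) == p(A & B) + p(A & not_B)    # 1/2 = 1/6 + 1/3
# Equivalent conditional form: P(A) = P(A|B)P(B) + P(A|B̄)P(B̄)
assert p(A) == (p(A & B) / p(B)) * p(B) + (p(A & not_B) / p(not_B)) * p(not_B)
```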

13 Bayes’ Theorem
Bayes’ Theorem lets us swap the order of dependence between events.
We saw that P(A|B) = P(A∩B) / P(B).
Bayes’ Theorem: P(B|A) = P(A|B) * P(B) / P(A)
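
A small check of the theorem (sketch only, reusing the die events from the neighbouring slides):

```python
# Bayes' theorem: P(B|A) = P(A|B) * P(B) / P(A), checked by counting.
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
A, B = {2, 4, 6}, {3, 6}

def p(event):
    return Fraction(len(event), len(omega))

bayes = (p(A & B) / p(B)) * p(B) / p(A)   # swap the order of dependence
assert bayes == p(A & B) / p(A)           # agrees with direct computation
print(bayes)                              # 1/3
```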

14 Independent events
Two events are independent if: P(A,B) = P(A) * P(B)
Consider a fair die. Intuitively, each face (1, 2, 3, 4, 5, 6) comes up with probability 1/6.
Consider the events X = “the number on the die is divisible by 2” and Y = “the number is divisible by 3”.

15 Independent events
Two events are independent if: P(A,B) = P(A) * P(B)
Consider a fair die. Intuitively, each face (1, 2, 3, 4, 5, 6) comes up with probability 1/6.
Consider the events X = “the number on the die is divisible by 2” and Y = “the number is divisible by 3”.
X = {2, 4, 6}, Y = {3, 6}

16 Independent events
Two events are independent if: P(A,B) = P(A) * P(B)
Consider a fair die. Intuitively, each face (1, 2, 3, 4, 5, 6) comes up with probability 1/6.
Consider the events X = “the number on the die is divisible by 2” and Y = “the number is divisible by 3”.
X = {2, 4, 6}, Y = {3, 6}
p(X) = p(2) + p(4) + p(6) = 1/6 + 1/6 + 1/6 = 3/6 = 1/2
p(Y) = p(3) + p(6) = 1/6 + 1/6 = 2/6 = 1/3

17 Independent events
Two events are independent if: P(A,B) = P(A) * P(B)
Consider a fair die. Intuitively, each face (1, 2, 3, 4, 5, 6) comes up with probability 1/6.
Consider the events X = “the number on the die is divisible by 2” and Y = “the number is divisible by 3”.
X = {2, 4, 6}, Y = {3, 6}
p(X) = p(2) + p(4) + p(6) = 1/6 + 1/6 + 1/6 = 3/6 = 1/2
p(Y) = p(3) + p(6) = 1/6 + 1/6 = 2/6 = 1/3
p(X,Y) = p(6) = 1/6 = 1/2 * 1/3 = p(X) * p(Y) ⇒ X and Y are independent
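
The check above, as an executable sketch (illustrative, exact fractions):

```python
# Independence check: P(X∩Y) == P(X) * P(Y) for divisibility events.
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
X = {2, 4, 6}                     # divisible by 2
Y = {3, 6}                        # divisible by 3

def p(event):
    return Fraction(len(event), len(omega))

print(p(X), p(Y), p(X & Y))       # 1/2 1/3 1/6
assert p(X & Y) == p(X) * p(Y)    # holds, so X and Y are independent
```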

18 Independent events
Consider Z, the event “the number on the die is divisible by 4”. Are X and Z independent?
p(Z) = p(4) = 1/6
p(X,Z) = 1/6, but p(X|Z) = p(X,Z) / p(Z) = (1/6) / (1/6) = 1 ≠ 1/2 = p(X) ⇒ X and Z are not independent.
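
And the negative case, in the same sketch style:

```python
# X and Z are NOT independent: P(X∩Z) differs from P(X) * P(Z).
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
X = {2, 4, 6}                     # divisible by 2
Z = {4}                           # divisible by 4

def p(event):
    return Fraction(len(event), len(omega))

print(p(X & Z), p(X) * p(Z))      # 1/6 vs. 1/12
assert p(X & Z) != p(X) * p(Z)    # not independent
assert p(X & Z) / p(Z) == 1       # P(X|Z) = 1, but P(X) = 1/2
```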

19 Other useful relations
p(x) = Σ_{y∈Y} p(x|y) * p(y)   or   p(x) = Σ_{y∈Y} p(x,y)
Chain rule:
p(x1, x2, …, xn) = p(x1) * p(x2|x1) * p(x3|x1,x2) * … * p(xn|x1,x2,…,xn-1)
The proof is straightforward, by successive reductions.
Let y be the joint event x1, x2, …, xn-1:
p(x1, x2, …, xn) = p(y, xn) = p(y) * p(xn|y) = p(x1, x2, …, xn-1) * p(xn|x1, x2, …, xn-1)
Similarly, with z the joint event x1, x2, …, xn-2:
p(x1, x2, …, xn-1) = p(z, xn-1) = p(z) * p(xn-1|z) = p(x1, x2, …, xn-2) * p(xn-1|x1, x2, …, xn-2)
…
p(x1, x2, …, xn) = p(x1) * p(x2|x1) * p(x3|x1,x2) * … * p(xn|x1,x2,…,xn-1)
The first factor is the prior; the conditional factors correspond to bigram, trigram, …, n-gram models.
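
To connect the chain rule to n-grams, here is a hedged sketch on a toy corpus (the corpus and the bigram approximation p(xi|x1,…,xi-1) ≈ p(xi|xi-1) are illustrative assumptions, not from the slides):

```python
# Chain rule with a bigram (Markov) approximation on a toy corpus.
from collections import Counter

corpus = "the dog is black the dog barks the cat is black".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p_next(word, prev):
    """Bigram estimate of p(word | prev) from raw counts."""
    return bigrams[(prev, word)] / unigrams[prev]

# p(the, dog, is) ≈ p(the) * p(dog|the) * p(is|dog)
prior = unigrams["the"] / len(corpus)
print(prior * p_next("dog", "the") * p_next("is", "dog"))   # ≈ 0.0909
```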

20 Objections
People don’t compute probabilities. Why would computers? Or do they?
John went to … – which continuation is more likely: “the market”, “go”, “red”, “if”, “number”?

21 Objections
Statistics only count words and co-occurrences.
These are two different concepts:
– a statistical model and a statistical method
The first doesn’t need the second: a person who uses intuition to reason is using a statistical model without statistical methods.
The objections refer mainly to the accuracy of statistical models.

22 Reference
Christopher D. Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing

23 Great! P(See you next time) = …


