
1 CS626: NLP, Speech and the Web. Pushpak Bhattacharyya, CSE Dept., IIT Bombay. Lecture 10, 11: EM (“how to know when you do not completely know”). 16th and 20th August, 2012

2 NLP Trinity. [Diagram: the NLP trinity, with three axes: Problem (Morph Analysis, Part of Speech Tagging, Parsing, Semantics), Language (Hindi, Marathi, English, French), and Algorithm (CRF, HMM, MEMM).]

3 Maximum Likelihood considerations

4 Problem: given the number of heads obtained out of N trials, what is the probability of obtaining a head? In the case of one coin, let the probability of obtaining a head be p_H. This implies that the probability of obtaining exactly N_H successes (heads) out of N trials (tosses) is

P(N_H) = C(N, N_H) · p_H^(N_H) · (1 − p_H)^(N − N_H)

5 Most “likely” value of p_H. To obtain the most likely value of p_H, we take ln of the above equation and differentiate with respect to p_H:

ln P = ln C(N, N_H) + N_H ln p_H + (N − N_H) ln(1 − p_H)

d(ln P)/dp_H = N_H/p_H − (N − N_H)/(1 − p_H) = 0, which gives p_H = N_H/N
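This result is easy to confirm numerically. A minimal sketch in Python (the counts N = 10, N_H = 7 are illustrative, not from the lecture): a grid search over p_H shows the log-likelihood peaking at N_H/N.

import math

N, N_H = 10, 7  # illustrative counts, not from the lecture

def log_likelihood(p):
    # ln of C(N, N_H) * p^N_H * (1 - p)^(N - N_H)
    return (math.log(math.comb(N, N_H))
            + N_H * math.log(p) + (N - N_H) * math.log(1 - p))

best = max((log_likelihood(i / 1000), i / 1000) for i in range(1, 1000))
print(best[1])  # 0.7, i.e. N_H / N, as the derivative condition predicts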

6 Value of p_H in the absence of any information. Suppose we know nothing about the properties of a coin; then what can we say about the probability of a head? We have to use the entropy E. Let p_H be the probability of a head and p_T that of a tail, so that

p_H + p_T = 1 ----(1)

7 Entropy. Entropy is defined as the sum of probability multiplied by log probability, with a minus sign:

E = −(p_H ln p_H + p_T ln p_T)

It is the instrument for dealing with uncertainty. So the best we can do is to maximize the entropy: maximize E subject to eq. (1) and get the value of p_H.

8 Finding p_H and p_T. Introduce a Lagrange multiplier λ for constraint (1) and set the partial derivatives of −p_H ln p_H − p_T ln p_T + λ(p_H + p_T − 1) to zero:

−ln p_H − 1 + λ = 0 ----(2)
−ln p_T − 1 + λ = 0 ----(3)

From (2) and (3): p_H = p_T ----(4)
From (4) and (1): p_H = p_T = 1/2
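A one-line numerical confirmation (a minimal sketch, not part of the slides): scanning p_H over a fine grid shows the entropy peaking at 0.5, matching the derivation.

import math

entropy = lambda p: -(p * math.log(p) + (1 - p) * math.log(1 - p))
# argmax of the entropy over a grid of p_H values
print(max((entropy(i / 1000), i / 1000) for i in range(1, 1000))[1])  # 0.5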

9 A deeper look at EM. Problem: two coins are tossed, randomly picking a coin at a time. The number of trials is N, the number of heads is N_H and the number of tails is N_T. How can one estimate the following probabilities? p: prob. of choosing coin 1; p1: prob. of head from coin 1; p2: prob. of head from coin 2

10 Expectation Maximization (1 Coin Toss). Toss 1 coin. K = number of heads, N = number of trials. X = observation of tosses = <x1, x2, …, xN>, where each xi can take values 0 or 1. p = probability of Head = K/N (as per MLE, which maximizes the probability of the observed data)

11 Expectation Maximization (1 Coin Toss). Complete data Y = <(x1, z1), (x2, z2), …, (xN, zN)>, where xi = 1 for Head, 0 for Tail, and zi is an indicator function: zi = 1 if the observation comes from the coin. In this one-coin case, zi = 1 for every i.

12 Expectation Maximization (2 coin toss). X = <x1, x2, …, xN>, Y = <(x1, z11, z12), (x2, z21, z22), …, (xN, zN1, zN2)>. xi = 1 for Head, 0 for Tail. zi1 = 1 if the observation comes from coin 1, else 0; zi2 = 1 if the observation comes from coin 2, else 0. Only one of zi1 and zi2 can be 1. xi is observed, while zi1 and zi2 are unobserved.

13 Expectation Maximization (2 coin toss). Parameters of the setting: p1 = probability of Head for coin 1; p2 = probability of Head for coin 2; p = probability of choosing coin 1 for a toss. Expressing p, p1 and p2 in terms of observed and unobserved data:

p = (Σi zi1) / N
p1 = (Σi zi1 · xi) / (Σi zi1)
p2 = (Σi zi2 · xi) / (Σi zi2)

14 Expectation Maximization trick. Replace zi1 and zi2 in p, p1, p2 with their expectations E(zi1) and E(zi2). zi1 is the event that observation xi comes from coin 1, so E(zi1) = P(coin 1 | xi); by Bayes' rule,

E(zi1) = p · p1^(xi) (1 − p1)^(1 − xi) / [p · p1^(xi) (1 − p1)^(1 − xi) + (1 − p) · p2^(xi) (1 − p2)^(1 − xi)]

and E(zi2) = 1 − E(zi1).

15 Summary. X = <x1, …, xN> (observed tosses), Y = <(xi, zi1, zi2)> (complete data). E-step: compute E(zi1) and E(zi2) from the current values of p, p1, p2. M-step: re-estimate p, p1, p2 using the formulas of slide 13 with E(zi1) and E(zi2) in place of zi1 and zi2. Alternate the two steps until convergence.
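The whole E/M loop fits in a few lines of Python. A minimal sketch, where the sample data and the starting values are illustrative assumptions, not taken from the lecture:

import random

def em_two_coins(xs, p, p1, p2, iters=100):
    """EM for the two-coin model: p = P(coin 1), p1/p2 = P(Head | coin)."""
    for _ in range(iters):
        # E-step: expected membership E(z_i1) of each toss in coin 1
        ez1 = []
        for x in xs:
            a = p * (p1 if x == 1 else 1 - p1)        # P(coin 1, x_i)
            b = (1 - p) * (p2 if x == 1 else 1 - p2)  # P(coin 2, x_i)
            ez1.append(a / (a + b))
        # M-step: plug E(z_i1) and E(z_i2) = 1 - E(z_i1) into the MLE formulas
        n1 = sum(ez1)
        p = n1 / len(xs)
        p1 = sum(e * x for e, x in zip(ez1, xs)) / n1
        p2 = sum((1 - e) * x for e, x in zip(ez1, xs)) / (len(xs) - n1)
    return p, p1, p2

# Illustrative data: coin 1 (head prob 0.8) chosen 60% of the time,
# coin 2 (head prob 0.3) otherwise; assumed values, not from the slides.
random.seed(0)
xs = [1 if random.random() < (0.8 if random.random() < 0.6 else 0.3) else 0
      for _ in range(1000)]
print(em_two_coins(xs, p=0.5, p1=0.6, p2=0.4))

Note that with only single-toss observations, many (p, p1, p2) settings yield the same overall head probability, so different starting points can converge to different local maxima; this is the greedy, local-maximum behaviour noted on the next slide.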

16 Observations. Any EM problem has observed and unobserved data. Nature of distribution: the two coins follow two different binomial distributions. EM oscillates between the E-step and the M-step; being a greedy algorithm, it is guaranteed to converge, but only to a local maximum of the likelihood.

17 EM: Baum-Welch algorithm: counts. String = abb aaa bbb aaa. [Diagram: a machine with two states q and r; the arcs between them are labelled with the output symbols a, b, so the output sequence induces a sequence of states.]

18 Calculating probabilities from table. Table of counts (T = #states, A = #alphabet symbols):

Src  Dest  O/P  Count
q    r     a    5
q    q     b    3
r    q     a    3
r    q     b    2

Each transition probability is the count for that (source, destination, output) triple divided by the total count of transitions out of the source state, e.g. P(q → r, a) = 5/(5 + 3) = 5/8. Now if we have non-deterministic transitions, then multiple state sequences are possible for a given output sequence (see the previous slide's figure). Our aim is to find the expected counts in that case.
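In code, turning the count table into transition probabilities is one normalization per source state. A minimal sketch of the computation just described:

from collections import defaultdict

# (src, dest, output) -> count, copied from the table above
counts = {('q', 'r', 'a'): 5, ('q', 'q', 'b'): 3,
          ('r', 'q', 'a'): 3, ('r', 'q', 'b'): 2}

# total number of outgoing transitions per source state
totals = defaultdict(float)
for (src, _, _), c in counts.items():
    totals[src] += c

probs = {arc: c / totals[arc[0]] for arc, c in counts.items()}
print(probs[('q', 'r', 'a')])  # 5/8 = 0.625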

19 Interplay Between Two Equations. Two quantities feed each other. One equation turns counts into probabilities: P(si → sj on wk) = c(si, sj, wk) / Σ_{s', w} c(si, s', w). The other turns probabilities into expected counts: c(si, sj, wk), the expected number of times the transition si → sj on symbol wk occurs in the string, is obtained by summing, over the state sequences consistent with the string, the probability of each sequence times the number of times the transition occurs in it. Baum-Welch alternates between the two.

20 Illustration. [Diagram: two two-state HMMs over states q and r. Left, the actual (desired) HMM, with arc probabilities a:0.67, b:0.17, a:0.16 and b:1.0. Right, the initial guess, with arc probabilities a:0.04, b:0.48, a:0.48 and b:1.0.]

21 One run of Baum-Welch algorithm: string ababb. ε is considered as the starting and ending symbol of the input sequence string. [Table: the candidate state sequences qrqrqq, qrqqqq, qqqrqq and qqqqqq, with P(path) for each; the transition counts along each path are weighted by P(path), rounded and totalled, and the totals are normalized to give the new probabilities.] Through multiple iterations the probability values will converge.
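The per-path arithmetic on this slide can be reproduced by brute force: enumerate every state sequence consistent with the output string, weight each transition count by the path's probability, and renormalize per source state. A minimal sketch; the arc set and its probabilities are assumptions patterned on the initial-guess HMM of the previous slide, not the slide's exact figures:

from itertools import product
from collections import defaultdict

# Illustrative 2-state HMM: (src, dest, output) -> probability (assumed values)
arcs = {('q', 'q', 'a'): 0.48, ('q', 'r', 'a'): 0.04,
        ('q', 'q', 'b'): 0.48, ('r', 'q', 'b'): 1.0}

def one_baum_welch_pass(string, start='q'):
    expected = defaultdict(float)
    # Brute-force enumeration of state sequences; real Baum-Welch uses the
    # forward-backward recurrences to avoid this exponential loop.
    for states in product('qr', repeat=len(string)):
        path = (start,) + states
        prob = 1.0
        for i, sym in enumerate(string):
            prob *= arcs.get((path[i], path[i + 1], sym), 0.0)
        if prob == 0.0:
            continue  # path inconsistent with the string
        for i, sym in enumerate(string):
            expected[(path[i], path[i + 1], sym)] += prob
    # Normalize expected counts per source state to get the new probabilities.
    totals = defaultdict(float)
    for (src, _, _), c in expected.items():
        totals[src] += c
    return {arc: c / totals[arc[0]] for arc, c in expected.items()}

print(one_baum_welch_pass('ababb'))  # exactly four paths survive, as on the slide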

22 QUIZ 1 SOLUTIONS

23 Quiz: Question 1 (with bigram assumption)

24 Quiz: Question 1 (with bigram assumption)

25 Quiz: Question 1 (with trigram assumption)


27 Quiz: Question 2. Role of nouns as adjectives: in “Sri Lanka skipper”, Sri Lanka is a noun but takes the role of an adjective here. Inter-annotator agreement: there is often disagreement between human annotators over tags, e.g. whether a word is a noun or a function word. Handling of multiwords and named entities: Sri Lanka and Aravinda de Silva are named entities and are not expected to be present in the training corpus; in addition, they are multiwords and should therefore be treated as single entities. talk: “let” takes a verb as its complement, hence “talk” should be tagged as a verb, but since “let” and “talk” are separated, the HMM cannot relate them; a larger k (a longer history) would take care of this but would lead to data sparsity.

