
1 Combined Lecture CS621: Artificial Intelligence (lecture 19) CS626/449: Speech-NLP-Web/Topics-in-AI (lecture 20) Hidden Markov Models Pushpak Bhattacharyya Computer Science and Engineering Department IIT Bombay

2 Example: Blocks World
STRIPS: a planning system whose rules have a precondition & deletion list and an addition list.
START state: on(C, A), on(A, table), on(B, table), handempty, clear(C), clear(B)
GOAL state: on(A, B), on(B, C), on(C, table), handempty, clear(A)
(The figure shows a robot hand above each configuration: in START, C sits on A and B stands alone on the table; in GOAL, A is on B, which is on C.)

3 Rules
R1: pickup(x)
  Precondition & Deletion List: handempty, on(x, table), clear(x)
  Add List: holding(x)
R2: putdown(x)
  Precondition & Deletion List: holding(x)
  Add List: handempty, on(x, table), clear(x)

4 Rules
R3: stack(x, y)
  Precondition & Deletion List: holding(x), clear(y)
  Add List: on(x, y), clear(x), handempty
R4: unstack(x, y)
  Precondition & Deletion List: on(x, y), clear(x), handempty
  Add List: holding(x), clear(y)

5 Plan for the Blocks World Problem
For the given problem, START → GOAL can be achieved by the following sequence:
unstack(C, A), putdown(C), pickup(B), stack(B, C), pickup(A), stack(A, B)
Execution of a plan is achieved through a data structure called the triangular table.
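The following is a minimal sketch, not from the slides, of how the rules R1-R4 and the plan above can be simulated in Python, with a state represented as a set of ground predicates; the helper names (op, apply_op) are illustrative only.

```python
# STRIPS-style simulation of the blocks-world plan above.
# A state is a set of ground predicates; each operator carries a
# precondition & deletion list and an add list, as in rules R1-R4.

def op(pre_del, add):
    return {"pre_del": set(pre_del), "add": set(add)}

def pickup(x):     return op({"handempty", f"on({x},table)", f"clear({x})"}, {f"holding({x})"})
def putdown(x):    return op({f"holding({x})"}, {"handempty", f"on({x},table)", f"clear({x})"})
def stack(x, y):   return op({f"holding({x})", f"clear({y})"}, {f"on({x},{y})", f"clear({x})", "handempty"})
def unstack(x, y): return op({f"on({x},{y})", f"clear({x})", "handempty"}, {f"holding({x})", f"clear({y})"})

def apply_op(state, action):
    assert action["pre_del"] <= state, "precondition not satisfied"
    return (state - action["pre_del"]) | action["add"]

start = {"on(B,table)", "on(A,table)", "on(C,A)", "handempty", "clear(C)", "clear(B)"}
goal  = {"on(C,table)", "on(B,C)", "on(A,B)", "handempty", "clear(A)"}

plan = [unstack("C", "A"), putdown("C"), pickup("B"),
        stack("B", "C"), pickup("A"), stack("A", "B")]

state = start
for a in plan:
    state = apply_op(state, a)
print(goal <= state)   # True: executing the plan reaches the goal
```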

6 Why Probability?
(discussion based on the book "Automated Planning" by Dana Nau)

7 Motivation
In many situations, actions may have more than one possible outcome:
- Action failures, e.g., the gripper drops its load
- Exogenous events, e.g., a road is closed
We would like to be able to plan in such situations. One approach: Markov Decision Processes.
(The figure shows the action "grasp block c" with two results: the intended outcome, in which c is held above blocks a and b, and an unintended outcome, in which c ends up back on the table.)

8 Stochastic Systems
Stochastic system: a triple Σ = (S, A, P)
S = finite set of states
A = finite set of actions
P_a(s' | s) = probability of going to s' if we execute action a in state s
Σ_{s' ∈ S} P_a(s' | s) = 1
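Below is a minimal sketch, not from the slides, of how such a triple can be represented in Python and the normalization condition checked; the action and state names echo the robot example that follows, and all probability values are hypothetical.

```python
# A stochastic system as nested dicts: P[action][state] -> {next_state: probability}.
# All numbers here are made up, purely to illustrate the normalization condition.
P = {
    "move(r1,l1,l2)": {
        "s1": {"s2": 0.8, "s1": 0.2},   # the move may fail and leave the robot where it was
    },
    "move(r1,l2,l3)": {
        "s2": {"s3": 1.0},
    },
}

# Check that the sum over s' of P_a(s' | s) = 1 for every (action, state) pair.
for action, by_state in P.items():
    for s, dist in by_state.items():
        assert abs(sum(dist.values()) - 1.0) < 1e-9, (action, s)
```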

9 Example
Robot r1 starts at location l1 (state s1 in the diagram).
The objective is to get r1 to location l4 (state s4 in the diagram).
(The figure marks the start and goal states.)

10 Example
No classical plan (a sequence of actions) can be a solution, because we cannot guarantee we will be in a state where the next action is applicable.
e.g., π = move(r1,l1,l2), move(r1,l2,l3), move(r1,l3,l4)
(Start and goal states are as in the previous figure.)

11 Another Example
A colored-ball choosing example:
Urn 1: # Red = 30, # Green = 50, # Blue = 20
Urn 2: # Red = 10, # Green = 40, # Blue = 50
Urn 3: # Red = 60, # Green = 10, # Blue = 30
Probability of transition to another urn after picking a ball: shown in the figure as a transition diagram over U1, U2, U3 (edge labels 0.1, 0.4, 0.5, 0.6, 0.2, 0.3).

12 Example (contd.)
Observation (emission) probabilities, from the ball counts in each urn:
      R    G    B
U1   0.3  0.5  0.2
U2   0.1  0.4  0.5
U3   0.6  0.1  0.3
Transition probabilities among U1, U2, U3 are as in the diagram on the previous slide.
Given these, and the observation sequence RRGGBRGR, what is the state sequence? Not so easily computable.

13 Example (contd.)
Here:
S = {U1, U2, U3} (states), V = {R, G, B} (output alphabet)
Observation sequence O = {o_1 … o_n}, state sequence Q = {q_1 … q_n}
π = initial state probabilities
A = the transition probability matrix over U1, U2, U3 (the values 0.1, 0.4, 0.5, 0.6, 0.2, 0.3 shown in the transition diagram)
B = the emission probability matrix:
      R    G    B
U1   0.3  0.5  0.2
U2   0.1  0.4  0.5
U3   0.6  0.1  0.3
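A small sketch, not in the slides, of how these parameters can be set up in Python: the emission matrix B follows directly from the ball counts, while the full 3×3 transition matrix A is only partly legible in the slide's diagram, so it is left as a placeholder; the uniform π is likewise a hypothetical choice for illustration.

```python
import numpy as np

states  = ["U1", "U2", "U3"]          # S
symbols = ["R", "G", "B"]             # V

# Ball counts per urn (slide 11); normalizing each row gives B = {b_j(o_k)}.
counts = np.array([[30, 50, 20],      # U1
                   [10, 40, 50],      # U2
                   [60, 10, 30]])     # U3
B = counts / counts.sum(axis=1, keepdims=True)

# Transition matrix A = {a_ij}: to be filled in from the transition diagram
# (only some entries, 0.1, 0.4, 0.5, ..., are legible in this transcript).
A = np.full((3, 3), np.nan)

# Initial state distribution pi (hypothetical uniform start, for illustration only).
pi = np.array([1/3, 1/3, 1/3])

# Encode the observation RRGGBRGR as indices into `symbols`.
O = [symbols.index(c) for c in "RRGGBRGR"]
print(B)   # each row sums to 1
```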

14 Hidden Markov Models

15 Model Definition
Set of states: S, where |S| = N
Output alphabet: V
Transition probabilities: A = {a_ij}
Emission probabilities: B = {b_j(o_k)}
Initial state probabilities: π
The whole model is written compactly as λ = (A, B, π).

16 Markov Processes: Properties
Limited horizon: given the previous n states, the state at time t is independent of all earlier states.
P(X_t = i | X_{t-1}, X_{t-2}, …, X_0) = P(X_t = i | X_{t-1}, X_{t-2}, …, X_{t-n})
Time invariance: the transition probabilities do not depend on the time step.
P(X_t = i | X_{t-1} = j) = P(X_1 = i | X_0 = j) = P(X_n = i | X_{n-1} = j)

17 Three Basic Problems of HMM
1. Given an observation sequence O = {o_1 … o_T}, efficiently estimate P(O | λ).
2. Find the best state sequence Q = {q_1 … q_T}, i.e. maximize P(Q | O, λ).
3. Adjust the model parameters λ so as to maximize P(O | λ), i.e. re-estimate λ.

18 Three Basic Problems (contd.)
Problem 1 (likelihood of a sequence): Forward procedure, Backward procedure
Problem 2 (best state sequence): Viterbi algorithm
Problem 3 (re-estimation): Baum-Welch (forward-backward) algorithm

19 Problem 2
Given an observation sequence O = {o_1 … o_T}, get the "best" Q = {q_1 … q_T}, i.e. maximize P(Q | O, λ).
Solutions:
1. Best state individually likely at a position i
2. Best state given all the previously observed states and observations: the Viterbi algorithm

20 Example
Output observed: aabb. Which state sequence is most probable?
Since the state sequence cannot be predicted with certainty, the machine is given the qualification "hidden".
Note: Σ P(outgoing links) = 1 for all states.

21 Probabilities for different possible sequences
(From the probability tree in the figure; each node is a partial state sequence with its probability.)
(1,1) = 0.4, (1,2) = 0.15
(1,1,1) = 0.16, (1,1,2) = 0.06, (1,2,1) = 0.0375, (1,2,2) = 0.0225
(1,1,1,1) = 0.016, (1,1,1,2) = 0.056, (1,1,2,1) = 0.018, (1,1,2,2) = 0.018
…and so on

22 Viterbi for Higher-Order HMMs
If the transition probability has the form P(s_i | s_{i-1}, s_{i-2}) (an order-2 HMM), then the Markovian assumption takes effect only after two levels (generalizing to order n: after n levels).

23 Forward and Backward Probability Calculation

24 A Simple HMM
(The figure shows a two-state HMM with states q and r; the arcs carry symbol:probability labels a: 0.2, a: 0.3, b: 0.2, b: 0.1, a: 0.2, b: 0.1, b: 0.5, but which label belongs to which arc is not recoverable from this transcript.)

25 Forward or α-probabilities
Let α_i(t) be the probability of producing w_{1,t-1} while ending up in state s_i:
α_i(t) = P(w_{1,t-1}, S_t = s_i), t > 1

26 Initial condition on α_i(t)
α_i(1) = 1.0 if i = 1 (the start state)
       = 0 otherwise

27 Probability of the observation using α_i(t)
P(w_{1,n}) = Σ_{i=1}^{σ} P(w_{1,n}, S_{n+1} = s_i) = Σ_{i=1}^{σ} α_i(n+1)
where σ is the total number of states

28 Recursive expression for α
α_j(t+1) = P(w_{1,t}, S_{t+1} = s_j)
         = Σ_{i=1}^{σ} P(w_{1,t}, S_t = s_i, S_{t+1} = s_j)
         = Σ_{i=1}^{σ} P(w_{1,t-1}, S_t = s_i) · P(w_t, S_{t+1} = s_j | w_{1,t-1}, S_t = s_i)
         = Σ_{i=1}^{σ} P(w_{1,t-1}, S_t = s_i) · P(w_t, S_{t+1} = s_j | S_t = s_i)   (Markov assumption)
         = Σ_{i=1}^{σ} α_i(t) · P(w_t, S_{t+1} = s_j | S_t = s_i)
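The following is a minimal Python sketch, not from the slides, of the forward computation; it uses the A = {a_ij}, B = {b_j(o_k)}, π notation of slide 15 and the standard convention α_t(j) = P(o_1 … o_t, q_t = s_j), which is shifted by one position relative to the α_i(t) defined above but is equivalent for computing P(O | λ).

```python
import numpy as np

def forward(A, B, pi, O):
    """alpha[t, j] = P(o_0 .. o_t, state s_j at step t); returns alpha and P(O | lambda)."""
    N, T = A.shape[0], len(O)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                     # initialization
    for t in range(1, T):
        # recursion: alpha_t(j) = [ sum_i alpha_{t-1}(i) * a_ij ] * b_j(o_t)
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
    return alpha, alpha[-1].sum()                  # P(O | lambda) = sum_j alpha over the last step
```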

29 The forward probabilities of "bbba"
(The slide shows a table of α values over time ticks 1 to 5 for the prefixes ε, b, bb, bbb, bbba, with a row labeled P(w,t); only a few entries, 1.0, 0.2, 0.0, 0.1, 0.3, survive in this transcript and their alignment is not recoverable.)

30 Backward or β-probabilities
Let β_i(t) be the probability of seeing w_{t,n}, given that the state of the HMM at time t is s_i:
β_i(t) = P(w_{t,n} | S_t = s_i)

31 Probability of the observation using β
P(w_{1,n}) = β_1(1)

32 Recursive expression for β
β_j(t-1) = P(w_{t-1,n} | S_{t-1} = s_j)
         = Σ_{i=1}^{σ} P(w_{t-1,n}, S_t = s_i | S_{t-1} = s_j)
         = Σ_{i=1}^{σ} P(w_{t-1}, S_t = s_i | S_{t-1} = s_j) · P(w_{t,n} | w_{t-1}, S_t = s_i, S_{t-1} = s_j)
         = Σ_{i=1}^{σ} P(w_{t-1}, S_t = s_i | S_{t-1} = s_j) · P(w_{t,n} | S_t = s_i)   (consequence of the Markov assumption)
         = Σ_{i=1}^{σ} P(w_{t-1}, S_t = s_i | S_{t-1} = s_j) · β_i(t)
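A matching sketch, again not from the slides and again in the A, B, π notation, with the standard convention β_t(i) = P(o_{t+1} … o_T | q_t = s_i):

```python
import numpy as np

def backward(A, B, pi, O):
    """beta[t, i] = P(remaining symbols after step t | state s_i at step t); also returns P(O | lambda)."""
    N, T = A.shape[0], len(O)
    beta = np.zeros((T, N))
    beta[T - 1] = 1.0                                # initialization
    for t in range(T - 2, -1, -1):
        # recursion: beta_t(i) = sum_j a_ij * b_j(o_{t+1}) * beta_{t+1}(j)
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    return beta, (pi * B[:, O[0]] * beta[0]).sum()   # P(O | lambda)
```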

33 Problem 1 of the three basic problems

34 Problem 1 (contd.)
Direct evaluation (summing over all possible state sequences) is of order 2T·N^T: definitely not efficient!
Is there a method to tackle this problem? Yes: the Forward or the Backward procedure.

35 Forward Procedure
Forward step: α_{t+1}(j) = [ Σ_{i=1}^{N} α_t(i) · a_ij ] · b_j(o_{t+1})

36 Forward Procedure

37 Backward Procedure
Backward step: β_t(i) = Σ_{j=1}^{N} a_ij · b_j(o_{t+1}) · β_{t+1}(j)

38 Backward Procedure

39 Forward-Backward Procedure
Benefit: order N²T, as compared to 2T·N^T for the direct computation (for N = 5 states and T = 100 observations, roughly 2,500 operations versus on the order of 10^72).
Only the Forward or the Backward procedure is needed for Problem 1.

40 Problem 2
Given an observation sequence O = {o_1 … o_T}, get the "best" Q = {q_1 … q_T}, i.e. maximize P(Q | O, λ).
Solutions:
1. Best state individually likely at a position i
2. Best state given all the previously observed states and observations: the Viterbi algorithm

41 Viterbi Algorithm
Define δ_t(i) = max over q_1 … q_{t-1} of P(q_1 … q_{t-1}, q_t = s_i, o_1 … o_t | λ),
i.e. the probability of the state sequence ending in s_i at time t that has the best joint probability so far.
By induction, we have:
δ_{t+1}(j) = [ max_i δ_t(i) · a_ij ] · b_j(o_{t+1})
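A minimal sketch of the Viterbi recursion above in Python (not from the slides); it keeps back-pointers ψ for recovering the best path and, for simplicity, ignores numerical underflow, which in practice is handled by working with log probabilities.

```python
import numpy as np

def viterbi(A, B, pi, O):
    """Most probable state sequence for O under lambda = (A, B, pi), plus its probability."""
    N, T = A.shape[0], len(O)
    delta = np.zeros((T, N))            # delta[t, j]: best joint probability of a path ending in s_j at t
    psi = np.zeros((T, N), dtype=int)   # back-pointers
    delta[0] = pi * B[:, O[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A           # scores[i, j] = delta_{t-1}(i) * a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, O[t]]
    # Backtrack from the best final state.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return path[::-1], delta[-1].max()
```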

42 Viterbi Algorithm

43 Viterbi Algorithm

44 Problem 3
How to adjust λ so as to maximize P(O | λ)?
Solution: re-estimate λ, i.e. iteratively update and improve the HMM parameters A, B, π, using the Baum-Welch algorithm.

45 Baum-Welch Algorithm
Define ξ_t(i, j) = P(q_t = s_i, q_{t+1} = s_j | O, λ).
Putting in the forward and backward variables:
ξ_t(i, j) = α_t(i) · a_ij · b_j(o_{t+1}) · β_{t+1}(j) / P(O | λ)

46 Baum-Welch algorithm

47 Define γ_t(i) = Σ_{j=1}^{N} ξ_t(i, j).
Then Σ_{t=1}^{T-1} γ_t(i) = expected number of transitions from s_i,
and Σ_{t=1}^{T-1} ξ_t(i, j) = expected number of transitions from s_i to s_j.

48 Re-estimation equations:
π̄_i = γ_1(i)
ā_ij = Σ_{t=1}^{T-1} ξ_t(i, j) / Σ_{t=1}^{T-1} γ_t(i)
b̄_j(k) = Σ_{t : o_t = v_k} γ_t(j) / Σ_{t=1}^{T} γ_t(j)
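Below is a compact sketch, not from the slides, of one Baum-Welch re-estimation step in Python; it combines the forward and backward computations with the ξ/γ definitions and the re-estimation equations above, assumes a single observation sequence given as symbol indices, and ignores numerical underflow (scaling or log-space arithmetic would be needed for long sequences).

```python
import numpy as np

def baum_welch_step(A, B, pi, O):
    """One Baum-Welch re-estimation step for a single observation sequence O."""
    N, T = A.shape[0], len(O)

    # Forward and backward variables, as in the earlier sketches.
    alpha = np.zeros((T, N)); alpha[0] = pi * B[:, O[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
    beta = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    PO = alpha[-1].sum()                                  # P(O | lambda)

    # xi[t, i, j] = P(q_t = s_i, q_{t+1} = s_j | O, lambda)
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * (B[:, O[t + 1]] * beta[t + 1])[None, :] / PO
    # gamma[t, i] = P(q_t = s_i | O, lambda)
    gamma = np.vstack([xi.sum(axis=2), (alpha[-1] * beta[-1] / PO)[None, :]])

    # Re-estimation equations (slide 48).
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B, dtype=float)
    for k in range(B.shape[1]):
        new_B[:, k] = gamma[np.array(O) == k].sum(axis=0) / gamma.sum(axis=0)
    return new_A, new_B, new_pi
```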

49 Baum-Welch Algorithm
Baum et al. have proved that the above re-estimation equations lead to a model that is as good as, or better than, the previous one.

