Download presentation

Presentation is loading. Please wait.

Published byOswald Warner Modified over 4 years ago

1
Apaydin slides with a several modifications and additions by Christoph Eick.

2
Introduction Modeling dependencies in input; no longer iid; e.g the order of observations in a dataset matters: Temporal Sequences: In speech; phonemes in a word (dictionary), words in a sentence (syntax, semantics of the language). Stock market (stock values over time) Spatial Sequences Base pairs in DNA Sequences 2Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0)

3
Discrete Markov Process N states: S 1, S 2,..., S N State at “time” t, q t = S i First-order Markov P(q t+1 =S j | q t =S i, q t-1 =S k,...) = P(q t+1 =S j | q t =S i ) Transition probabilities a ij ≡ P(q t+1 =S j | q t =S i ) a ij ≥ 0 and Σ j=1 N a ij =1 Initial probabilities π i ≡ P(q 1 =S i ) Σ j=1 N π i =1 3

4
Stochastic Automaton/Markov Chain 4Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0)

5
Three urns each full of balls of one color S 1 : blue, S 2 : red, S 3 : green Example: Balls and Urns 5 1 23

6
Given K example sequences of length T Balls and Urns: Learning 6 Remark: Extract the probabilities from the observed sequences: s1-s2-s1-s3 s2-s1-s1-s2 1 =1/3, 2 =2/3, a 11 =1/3, a 12 =1/3, a 13 =1/3, a 21 =3/4,… s2-s3-s2-s1

7
Hidden Markov Models States are not observable Discrete observations {v 1,v 2,...,v M } are recorded; a probabilistic function of the state Emission probabilities b j (m) ≡ P(O t =v m | q t =S j ) Example: In each urn, there are balls of different colors, but with different probabilities. For each observation sequence, there are multiple state sequences 7Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0) http://en.wikipedia.org/wiki/Hidden_Markov_model

8
HMM Unfolded in Time 8Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0) http://a-little-book-of-r-for-bioinformatics.readthedocs.org/en/latest/src/chapter10.htm l

9
Now a more complicated problem 9 We observe: 1 23 What urn sequence create it? 1.1-1-2-2 (somewhat trivial, as states are observable!) 2.(1 or 2)-(1 or 2)-(2 or 3)-(2 or 3) and the potential sequences have different probabilities—e.g drawing a blue ball from urn1 is more likely than from urn2! Markov Chains Hidden Markov Models

10
Another Motivating Example 10

11
Elements of an HMM N: Number of states M: Number of observation symbols A = [a ij ]: N by N state transition probability matrix B = b j (m): N by M observation probability matrix Π = [π i ]: N by 1 initial state probability vector λ = (A, B, Π), parameter set of HMM 11Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0)

12
Three Basic Problems of HMMs 1. Evaluation: Given λ, and sequence O, calculate P (O | λ) 2. Most Likely State Sequence: Given λ and sequence O, find state sequence Q * such that P (Q * | O, λ ) = max Q P (Q | O, λ ) 3. Learning: Given a set of sequence O={O 1,…O k }, find λ * such that λ * is the most like explanation for the sequences in O. P ( O | λ * )=max λ k P ( O k | λ ) 12 (Rabiner, 1989)

13
Forward variable: Evaluation 13 Probability of observing O 1 -…-O t and additionally being in state i Complexity: O(N 2 *T) Using i the probability of the observed sequence can be computed as follows:

14
Backward variable: 14Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0) Probability of observing O t+1 -…-O T and additionally being in state i

15
Finding the Most Likely State Sequence 15 Choose the state that has the highest probability, for each time step: q t * = arg max i γ t (i) Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0) Observe: O 1 …O t O t+1 …O T t (i):=Probability of being in state i at step t.

16
Viterbi’s Algorithm δ t (i) ≡ max q1q2∙∙∙ qt-1 p(q 1 q 2 ∙∙∙q t-1,q t =S i,O 1 ∙∙∙O t | λ) Initialization: δ 1 (i) = π i b i (O 1 ), ψ 1 (i) = 0 Recursion: δ t (j) = max i δ t-1 (i)a ij b j (O t ), ψ t (j) = argmax i δ t-1 (i)a ij Termination: p * = max i δ T (i), q T * = argmax i δ T (i) Path backtracking: q t * = ψ t+1 (q t+1 * ), t=T-1, T-2,..., 1 16 Idea: Combines path probability computations with backtracking over competing paths. Only briefly discussed in 2014!

17
Baum-Welch Algorithm Baum- Welch Algorithm O={O 1,…,O K } Model =(A,B, ) Hidden State Sequence Observed Symbol Sequence O

18
Learning a Model from Sequences O 18 This is a hidden(latent) variable, measuring the probability of going from state i to state j at step t+1 observing O t+1, given a model and an observed sequence O O k. An EM-style algorithm is used! This is a hidden(latent) variable, measuring the probability of being in state i step t observing given a model and an observed sequence O O k.

19
Baum-Welch Algorithm: M-Step 19 Remark: k iterates over the observed sequences O 1,…,O K ; for each individual sequence O r O r and r are computed in the E-step; then, the actual model is computed in the M-step by averaging over the estimates of i,a ij,b j (based on k and k ) for each of the K observed sequences. Probability going from i to j Probability being in i

20
Baum-Welch Algorithm: Summary Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0) Baum- Welch Algorithm O={O 1,…,O K } Model =(A,B, ) For more discussion see: http://www.robots.ox.ac.uk/~vgg/rg/slides/hmm.pdf http://www.robots.ox.ac.uk/~vgg/rg/slides/hmm.pdf See also: http://www.digplanet.com/wiki/Baum%E2%80%93Welch_algorithmhttp://www.digplanet.com/wiki/Baum%E2%80%93Welch_algorithm

21
Generalization of HMM: Continuous Observations 21 The observations generated at each time step are vectors consisting of k numbers; a multivariate Gaussian with k dimensions is associated with each state j, defining the probabilities of k-dimensional vector v generated when being in state j: Hidden State Sequence Observed Vector Sequence O =(A, ( j, j ) j=1,…n,B)

22
Input-dependent observations: Input-dependent transitions (Meila and Jordan, 1996; Bengio and Frasconi, 1996): Time-delay input: Generalization: HMM with Inputs 22Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0)

Similar presentations

© 2020 SlidePlayer.com Inc.

All rights reserved.

To make this website work, we log user data and share it with processors. To use this website, you must agree to our Privacy Policy, including cookie policy.

Ads by Google