Computational Genomics Lecture 10: Hidden Markov Models (HMMs). © Ydo Wexler & Dan Geiger (Technion) and Nir Friedman (HU). Modified by Benny Chor (TAU).



2 Outline
- Finite, or discrete, Markov models
- Hidden Markov models
- Three major questions:
  - Q1: Computing the probability of a given observation. A1: the forward-backward (Baum-Welch) dynamic programming algorithm.
  - Q2: Computing the most probable sequence, given an observation. A2: Viterbi's dynamic programming algorithm.
  - Q3: Learning the best model, given an observation. A3: Expectation Maximization (EM), a heuristic.

3 Markov Models
- A discrete (finite) system:
  - N distinct states.
  - Begins (at time t=1) in some initial state(s).
  - At each time step (t=1,2,…) the system moves from the current state to the next state (possibly the same state) according to the transition probabilities associated with the current state.
- This kind of system is called a finite, or discrete, Markov model.
- Named after Andrei Andreyevich Markov.

4 Outline
- Markov Chains (Markov Models)
- Hidden Markov Chains (HMMs)
- Algorithmic Questions
- Biological Relevance

5 Discrete Markov Model: Example
- Discrete Markov model with 5 states.
- Each $a_{ij}$ represents the probability of moving from state i to state j.
- The $a_{ij}$ are given in a matrix $A = \{a_{ij}\}$.
- The probability of starting in a given state i is $\pi_i$. The vector $\pi$ represents these start probabilities.

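6 Markov Property
Markov property: the state of the system at time t+1 depends only on the state of the system at time t:
$\Pr[X_{t+1} = x_{t+1} \mid X_1 = x_1, \ldots, X_t = x_t] = \Pr[X_{t+1} = x_{t+1} \mid X_t = x_t]$
(Figure: chain $X_1 \to X_2 \to X_3 \to X_4 \to X_5$.)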

7 Markov Chains: Stationarity Assumption
The transition probabilities are independent of t when the process is "stationary". So, for every t,
$\Pr[X_{t+1} = j \mid X_t = i] = p_{ij}$
This means that if the system is in state i, the probability that it will next move to state j is $p_{ij}$, no matter what the value of t is.

8 Simple Minded Weather Example
- raining today, rain tomorrow: $p_{rr} = 0.4$
- raining today, no rain tomorrow: $p_{rn} = 0.6$
- not raining today, rain tomorrow: $p_{nr} = 0.2$
- not raining today, no rain tomorrow: $p_{nn} = 0.8$

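9 Transition matrix for our example
$P = \begin{pmatrix} 0.4 & 0.6 \\ 0.2 & 0.8 \end{pmatrix}$ (rows: today is rain, no rain; columns: tomorrow is rain, no rain)
- Note that the rows sum to 1.
- Such a matrix is called a stochastic matrix.
- If both the rows and the columns of a matrix sum to 1, we have a doubly stochastic matrix.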

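10 Coke vs. Pepsi
Given that a person's last cola purchase was Coke™, there is a 90% chance that her next cola purchase will also be Coke™. If that person's last cola purchase was Pepsi™, there is an 80% chance that her next cola purchase will also be Pepsi™. Hence the switching probabilities are 0.1 (Coke to Pepsi) and 0.2 (Pepsi to Coke).
(Figure: two-state transition diagram with states coke and pepsi.)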

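11 Coke vs. Pepsi
Given that a person is currently a Pepsi purchaser, what is the probability that she will purchase Coke two purchases from now?
The transition matrix corresponding to one purchase ahead, with rows and columns ordered (Coke, Pepsi), is
$P = \begin{pmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{pmatrix}$
and the matrix corresponding to two purchases ahead is
$P^2 = \begin{pmatrix} 0.83 & 0.17 \\ 0.34 & 0.66 \end{pmatrix}$
so the requested probability is 0.2 · 0.9 + 0.8 · 0.2 = 0.34.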

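12 Coke vs. Pepsi
Given that a person is currently a Coke drinker, what is the probability that she will purchase Pepsi three purchases from now?
With rows and columns ordered (Coke, Pepsi), the three-step matrix is
$P^3 = \begin{pmatrix} 0.781 & 0.219 \\ 0.438 & 0.562 \end{pmatrix}$
so the requested probability is 0.219.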
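The two-step and three-step questions above amount to reading entries of powers of the one-step matrix. The following is a minimal sketch in plain Python; the helper names `mat_mul` and `mat_pow` are illustrative and not part of the lecture, and the encoding 0 = Coke, 1 = Pepsi follows the convention used on the later slide.

```python
def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(P, n):
    """Return P**n for n >= 1 by repeated multiplication."""
    result = P
    for _ in range(n - 1):
        result = mat_mul(result, P)
    return result


# One-step matrix from the slides; state 0 = Coke, state 1 = Pepsi.
P = [[0.9, 0.1],
     [0.2, 0.8]]

P2 = mat_pow(P, 2)
P3 = mat_pow(P, 3)

pepsi_to_coke_in_2 = P2[1][0]   # Pepsi now, Coke two purchases from now
coke_to_pepsi_in_3 = P3[0][1]   # Coke now, Pepsi three purchases from now
print(round(pepsi_to_coke_in_2, 4), round(coke_to_pepsi_in_3, 4))
```

Entry (i, j) of the n-th power is the probability of being in state j exactly n purchases after being in state i, which is why a single matrix power answers both questions.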

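13 Coke vs. Pepsi
Assume each person makes one cola purchase per week. Suppose 60% of all people now drink Coke, and 40% drink Pepsi. What fraction of people will be drinking Coke three weeks from now?
Let $(Q_0, Q_1) = (0.6, 0.4)$ be the initial probabilities. We regard Coke as 0 and Pepsi as 1, and we want to find $P(X_3 = 0)$. Writing $p^{(3)}_{ij}$ for the (i, j) entry of the three-step matrix $P^3$:
$P(X_3 = 0) = Q_0\, p^{(3)}_{00} + Q_1\, p^{(3)}_{10} = 0.6 \cdot 0.781 + 0.4 \cdot 0.438 = 0.6438$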
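The same number can be obtained by pushing the initial distribution forward one week at a time instead of forming the three-step matrix first. This is a small sketch; the function name `step` is an illustrative choice, not something defined in the lecture.

```python
def step(dist, P):
    """Advance a row distribution by one time step: new[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


# State 0 = Coke, state 1 = Pepsi, as on the slide.
P = [[0.9, 0.1],
     [0.2, 0.8]]

dist = [0.6, 0.4]          # (Q0, Q1): market shares now
for week in range(3):      # three weekly purchases
    dist = step(dist, P)

coke_share_after_3_weeks = dist[0]
print(round(coke_share_after_3_weeks, 4))
```

Propagating the vector costs one vector-matrix product per week, which is cheaper than raising the matrix to a power when only one starting distribution is of interest.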

14 Equilibrium (Stationary) Distribution
- Suppose 60% of all people now drink Coke, and 40% drink Pepsi. What fraction will be drinking Coke 10, 100, 1000, 10000, … weeks from now?
- For each week, the probability is well defined. But does it converge to some equilibrium distribution $[p_0, p_1]$?
- If it does, then the equations $0.9\,p_0 + 0.2\,p_1 = p_0$ and $0.8\,p_1 + 0.1\,p_0 = p_1$ must hold, yielding $p_0 = 2/3$, $p_1 = 1/3$.
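As a rough check of the equilibrium values, the sketch below compares two routes: solving the balance equation of a two-state chain in closed form, and simply iterating the chain for many weeks. The function names are hypothetical, and the closed form applies only to two-state chains in which at least one switching probability is positive.

```python
def stationary_two_state(P):
    """Closed form for a two-state chain: p0 * P[0][1] = p1 * P[1][0], p0 + p1 = 1."""
    a = P[0][1]   # probability of leaving state 0
    b = P[1][0]   # probability of leaving state 1
    return [b / (a + b), a / (a + b)]


def iterate(dist, P, weeks):
    """Apply the one-step update dist -> dist * P the given number of times."""
    n = len(P)
    for _ in range(weeks):
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return dist


P = [[0.9, 0.1],
     [0.2, 0.8]]

closed_form = stationary_two_state(P)
after_many_weeks = iterate([0.6, 0.4], P, 1000)
print(closed_form, after_many_weeks)
```

For this chain both routes agree on roughly (2/3, 1/3), and starting the iteration from a different initial split gives the same limit.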

15 Simulation: Markov Process, Coke vs. Pepsi Example (cont.)
(Figure: plot of Pr[X_i = Coke] against week i, approaching the stationary value 2/3.)

16 Equilibrium (Stationary) Distribution
Whether or not there is a stationary distribution, and whether or not it is unique if it does exist, are determined by certain properties of the process.
- Irreducible means that every state is accessible from every other state.
- Aperiodic means that every state has period 1, where the period of a state is the greatest common divisor of the lengths of all possible return paths to it. In an irreducible chain it suffices that at least one state can move to itself with positive probability.
- Positive recurrent means that the expected return time is finite for every state.

17 Equilibrium (Stationary) Distribution
- If the Markov chain is positive recurrent, there exists a stationary distribution.
- If it is positive recurrent and irreducible, there exists a unique stationary distribution $\pi$, and furthermore the process constructed by taking the stationary distribution as the initial distribution is ergodic.
- Then the long-run average of a function f over samples of the Markov chain is equal to the average of f with respect to the stationary distribution: $\lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} f(X_k) = \sum_{x} f(x)\, \pi(x)$.

18 Equilibrium (Stationary) Distribution
- Writing P for the transition matrix (rows summing to 1), a stationary distribution is a row vector $\pi$ which satisfies the equation
  - $\pi P = \pi$.
- In this case, the stationary distribution $\pi$ is a left eigenvector of the transition matrix, associated with the eigenvalue 1.
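As a worked check, take the Coke vs. Pepsi matrix and the candidate $\pi = (2/3, 1/3)$ found earlier. The computation below assumes the row-vector convention, in which the rows of P sum to 1 and the stationarity condition reads $\pi P = \pi$.

```latex
\pi P
= \begin{pmatrix} \tfrac{2}{3} & \tfrac{1}{3} \end{pmatrix}
  \begin{pmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{pmatrix}
= \begin{pmatrix}
    \tfrac{2}{3}\cdot 0.9 + \tfrac{1}{3}\cdot 0.2 &
    \tfrac{2}{3}\cdot 0.1 + \tfrac{1}{3}\cdot 0.8
  \end{pmatrix}
= \begin{pmatrix} \tfrac{9}{15} + \tfrac{1}{15} & \tfrac{1}{15} + \tfrac{4}{15} \end{pmatrix}
= \begin{pmatrix} \tfrac{2}{3} & \tfrac{1}{3} \end{pmatrix}
= \pi
```

Equivalently, the column vector $\pi^{T}$ is an ordinary (right) eigenvector of $P^{T}$ with eigenvalue 1.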

19 Discrete Markov Model - Example
- States: Rainy: 1, Cloudy: 2, Sunny: 3
- Matrix A: (matrix not preserved in this transcript)
- Problem: given that the weather on day 1 (t=1) is sunny (3), what is the probability of the observation O? (observation sequence not preserved in this transcript)

20 Discrete Markov Model – Example (cont.)
- The answer is: (computation not preserved in this transcript)
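The kind of computation this example calls for is a product of one start probability and a chain of transition probabilities. Since the three-state matrix A and the observation O are not reproduced here, the sketch below substitutes the two-state rain / no-rain chain from the earlier weather slide; the function name and the example sequence are illustrative assumptions.

```python
def sequence_probability(states, start, P):
    """P(q_1, ..., q_T) = start[q_1] * product over t of P[q_{t-1}][q_t]."""
    prob = start[states[0]]
    for prev, cur in zip(states, states[1:]):
        prob *= P[prev][cur]
    return prob


# Two-state weather chain from the earlier slide: 0 = rain, 1 = no rain.
P = [[0.4, 0.6],
     [0.2, 0.8]]

# Day 1 is known to be rainy, so all start mass sits on state 0.
start = [1.0, 0.0]

# Illustrative sequence: rain, rain, no rain, no rain.
sequence = [0, 0, 1, 1]
prob = sequence_probability(sequence, start, P)
print(round(prob, 4))   # 1.0 * 0.4 * 0.6 * 0.8
```

Conditioning on the known first day is handled by the degenerate start vector, so the same function covers both the conditional and the unconditional case.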

21 Types of Models
- Ergodic model: strongly connected, that is, there is a directed path with positive probabilities from each state i to each state j (but the graph is not necessarily a complete directed graph).

22 Third Example: A Friendly Gambler
- Game starts with 10$ in the gambler's pocket.
- At each round we have the following:
  - Gambler wins 1$ with probability p.
  - Gambler loses 1$ with probability 1-p.
- Game ends when the gambler goes broke (no sister in bank), or accumulates a capital of 100$ (including initial capital).
- Both 0$ and 100$ are absorbing states.
(Figure: chain of states 0, 1, 2, …, N-1, N; each step moves right with probability p or left with probability 1-p; start at 10$.)

23 Third Example (cont.): A Friendly Gambler
(Figure: chain of states 0, 1, 2, …, N-1, N; each step moves right with probability p or left with probability 1-p; start at 10$.)
- Irreducible means that every state is accessible from every other state.
- Aperiodic means that every state has period 1, where the period of a state is the greatest common divisor of the lengths of all possible return paths to it.
- Positive recurrent means that the expected return time is finite for every state.
- If the Markov chain is positive recurrent, there exists a stationary distribution.
Is the gambler's chain positive recurrent? Does it have a stationary distribution (independent of the initial distribution)?
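One way to probe the first definition is to test irreducibility mechanically: treat positive-probability transitions as directed edges and check that every state can reach every other state. The sketch below does this with a breadth-first search; the helper names and the choice p = 0.5 are assumptions made for illustration only.

```python
from collections import deque


def reachable(P, start):
    """Set of states reachable from `start` along positive-probability transitions."""
    seen = {start}
    queue = deque([start])
    while queue:
        i = queue.popleft()
        for j, p in enumerate(P[i]):
            if p > 0 and j not in seen:
                seen.add(j)
                queue.append(j)
    return seen


def is_irreducible(P):
    """True if every state is accessible from every other state."""
    n = len(P)
    return all(len(reachable(P, s)) == n for s in range(n))


def gambler_matrix(N, p):
    """Gambler's chain on capitals 0..N with absorbing states 0 and N."""
    P = [[0.0] * (N + 1) for _ in range(N + 1)]
    P[0][0] = 1.0
    P[N][N] = 1.0
    for i in range(1, N):
        P[i][i + 1] = p        # win one dollar
        P[i][i - 1] = 1 - p    # lose one dollar
    return P


cola = [[0.9, 0.1],
        [0.2, 0.8]]
gambler = gambler_matrix(100, 0.5)   # p = 0.5 is an assumed value

print(is_irreducible(cola), is_irreducible(gambler))
```

In this sketch the cola chain is reported as irreducible and the gambler's chain is not, because nothing other than the state itself is reachable from the absorbing states 0 and N.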

24 Let Us Change Gear
- Enough with these simple Markov chains.
- Our next destination: Hidden Markov chains.
(Figure: two-state coin model with states Fair and Loaded, each emitting head or tail; visible labels include 0.9, 1/2, 1/4, 3/4 and Start 1/2.)

25 Hidden Markov Models (probabilistic finite state automata)
Often we face scenarios where states cannot be directly observed. We need an extension: Hidden Markov Models.
(Figure: four hidden states with self-transitions $a_{11}, a_{22}, a_{33}, a_{44}$ and forward transitions $a_{12}, a_{23}, a_{34}$; each state emits into the observed phenomenon with probabilities $b_{i1}, \ldots, b_{i4}$.)
- $a_{ij}$ are state transition probabilities.
- $b_{ik}$ are observation (output) probabilities.
- $b_{11} + b_{12} + b_{13} + b_{14} = 1$, $b_{21} + b_{22} + b_{23} + b_{24} = 1$, etc.

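26 Hidden Markov Models - HMM
(Figure: hidden variables $H_1, H_2, \ldots, H_i, \ldots, H_{L-1}, H_L$ form a chain; each $H_i$ emits the observed data item $X_i$, giving $X_1, X_2, \ldots, X_i, \ldots, X_{L-1}, X_L$.)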

27 Example: Dishonest Casino Actually, what is hidden in this model?

28 Hidden Markov Models - HMM: Coin-Tossing Example
(Figure: hidden chain $H_1, H_2, \ldots, H_i, \ldots, H_{L-1}, H_L$ with values Fair/Loaded, linked by transition probabilities; each $H_i$ emits $X_i$ with values Head/Tail according to emission probabilities. Visible labels include 1/2, 1/4 and 3/4.)
Q1: What is the probability of the sequence of observed outcomes (e.g. HHHTHTTHHT), given the model?

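29 Loaded Coin Example (cont.)
(Figure: hidden chain $H_1, H_2, \ldots, H_i, \ldots, H_{L-1}, H_L$ with values Fair/Loaded emitting $X_1, X_2, \ldots, X_i, \ldots, X_{L-1}, X_L$ with values Head/Tail over L tosses; state diagram with states Fair and Loaded and visible labels 0.9, 1/2, 1/4, 3/4 and Start 1/2.)
Q1: What is the probability of the sequence of observed outcomes (e.g. HHHTHTTHHT), given the model?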

30 HMMs – Question I
- Given an observation sequence $O = (O_1 O_2 O_3 \ldots O_L)$ and a model $M = \{A, B, \pi\}$, how do we efficiently compute $P(O \mid M)$, the probability that the given model M produces the observation O in a run of length L?
- This probability can be viewed as a measure of the quality of the model M. Viewed this way, it enables discrimination/selection among alternative models $M_1, M_2, M_3, \ldots$

31 HMM Recognition (question I)
- For a given model $M = \{A, B, \pi\}$ and a given state sequence $Q = Q_1 Q_2 Q_3 \ldots Q_L$, the probability of an observation sequence $O = O_1 O_2 O_3 \ldots O_L$ is
  $P(O \mid Q, M) = b_{Q_1 O_1}\, b_{Q_2 O_2}\, b_{Q_3 O_3} \cdots b_{Q_L O_L}$
- For a given hidden Markov model $M = \{A, B, \pi\}$, the probability of the state sequence $Q_1 Q_2 Q_3 \ldots Q_L$ is (the initial probability of $Q_1$ is taken to be $\pi_{Q_1}$)
  $P(Q \mid M) = \pi_{Q_1}\, a_{Q_1 Q_2}\, a_{Q_2 Q_3}\, a_{Q_3 Q_4} \cdots a_{Q_{L-1} Q_L}$
- So, for a given HMM M, the probability of an observation sequence $O_1 O_2 O_3 \ldots O_L$ is obtained by summing over all possible state sequences.

32 HMM – Recognition (cont.)
$P(O \mid M) = \sum_{Q} P(O \mid Q, M)\, P(Q \mid M) = \sum_{Q} \pi_{Q_1}\, b_{Q_1 O_1}\, a_{Q_1 Q_2}\, b_{Q_2 O_2}\, a_{Q_2 Q_3}\, b_{Q_3 O_3} \cdots a_{Q_{L-1} Q_L}\, b_{Q_L O_L}$
- Requires summing over exponentially many paths.
- Can this be made more efficient?

33 HMM – Recognition (cont.)
- Why isn't it efficient? With N states, the direct sum costs $O(2L \cdot N^L)$.
  - For a given state sequence of length L we have about 2L calculations:
    $P(Q \mid M) = \pi_{Q_1}\, a_{Q_1 Q_2}\, a_{Q_2 Q_3}\, a_{Q_3 Q_4} \cdots a_{Q_{L-1} Q_L}$
    $P(O \mid Q, M) = b_{Q_1 O_1}\, b_{Q_2 O_2}\, b_{Q_3 O_3} \cdots b_{Q_L O_L}$
  - There are $N^L$ possible state sequences.
  - So, if N=5 and L=100, then the algorithm requires about $200 \times 5^{100}$ computations.
  - Instead, we will use the forward-backward (F-B) algorithm of Baum (68) to do things more efficiently.
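To make the cost gap concrete, the sketch below computes P(O | M) for the coin example twice: once by enumerating every state path, and once with the forward recursion, which keeps one running value per state. The numeric parameters are assumptions, since the figure's labels are only partly legible in this transcript: start probability 1/2 per state, probability 0.9 of keeping the same coin, a fair coin with heads 1/2, and a loaded coin with heads 3/4.

```python
from itertools import product

# Assumed parameters (state 0 = fair, state 1 = loaded).
start = [0.5, 0.5]
A = [[0.9, 0.1],
     [0.1, 0.9]]
B = [{"H": 0.5, "T": 0.5},
     {"H": 0.75, "T": 0.25}]


def brute_force_likelihood(obs):
    """Sum P(O | Q, M) * P(Q | M) over all N**L state paths Q."""
    n = len(start)
    total = 0.0
    for path in product(range(n), repeat=len(obs)):
        p = start[path[0]] * B[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
        total += p
    return total


def forward_likelihood(obs):
    """Forward recursion: f[s] = P(O_1..O_t, state at time t is s); O(L * N**2) work."""
    n = len(start)
    f = [start[s] * B[s][obs[0]] for s in range(n)]
    for x in obs[1:]:
        f = [B[s][x] * sum(f[r] * A[r][s] for r in range(n)) for s in range(n)]
    return sum(f)


obs = "HHHTHTTHHT"
slow = brute_force_likelihood(obs)   # enumerates 2**10 = 1024 paths
fast = forward_likelihood(obs)
print(slow, fast)
```

Both functions should return the same value up to rounding; only the forward version remains practical as L grows, since the enumeration doubles with every extra toss.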

34 Coin-Tossing Example
1. Compute the posterior belief in $H_i$ (for a specific i) given the evidence $\{x_1, \ldots, x_L\}$, for each of $H_i$'s values $h_i$; namely, compute $p(h_i \mid x_1, \ldots, x_L)$.
2. Do the same computation for every $H_i$, but without repeating the first task L times.
Seeing the set of outcomes $\{x_1, \ldots, x_L\}$, compute $p(\text{loaded} \mid x_1, \ldots, x_L)$ for each coin toss.
Q: What is the most likely sequence of values in the H-nodes to generate the observed data?
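Both tasks can be sketched with one forward table and one backward table: the posterior for position i is the product of the two tables at i divided by the total likelihood, so all L posteriors come from a single pass in each direction. The parameters below are assumed values for the coin example (start 1/2 per state, 0.9 to keep the same coin, fair heads 1/2, loaded heads 3/4), and the function name `posteriors` is illustrative.

```python
# Assumed parameters (state 0 = fair, state 1 = loaded).
start = [0.5, 0.5]
A = [[0.9, 0.1],
     [0.1, 0.9]]
B = [{"H": 0.5, "T": 0.5},
     {"H": 0.75, "T": 0.25}]


def posteriors(obs):
    """Return post[t][s] = p(H_t = s | x_1..x_L) via forward and backward tables."""
    n, L = len(start), len(obs)

    # Forward table: f[t][s] = p(x_1..x_t, H_t = s).
    f = [[0.0] * n for _ in range(L)]
    for s in range(n):
        f[0][s] = start[s] * B[s][obs[0]]
    for t in range(1, L):
        for s in range(n):
            f[t][s] = B[s][obs[t]] * sum(f[t - 1][r] * A[r][s] for r in range(n))

    # Backward table: b[t][s] = p(x_{t+1}..x_L | H_t = s), with b[L-1][s] = 1.
    b = [[1.0] * n for _ in range(L)]
    for t in range(L - 2, -1, -1):
        for s in range(n):
            b[t][s] = sum(A[s][r] * B[r][obs[t + 1]] * b[t + 1][r] for r in range(n))

    likelihood = sum(f[L - 1])
    return [[f[t][s] * b[t][s] / likelihood for s in range(n)] for t in range(L)]


post = posteriors("HHHTHTTHHT")
loaded_belief = [row[1] for row in post]   # p(loaded | x_1..x_L) for each toss
print([round(v, 3) for v in loaded_belief])
```

For long sequences the raw products underflow, so a practical version would rescale each column or work in log space; that refinement is omitted here to keep the sketch short. The closing question about the single most likely sequence of H values is a different computation and is not what this sketch returns.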