Markov Chains Lecture #5
Background readings: Durbin et al., Section 3.1. Prepared by Shlomo Moran, based on Dan Geiger's and Nir Friedman's slides.

2 Dependencies along the genome
In previous classes we assumed every letter in a sequence is sampled randomly from some distribution q(·) over the alphabet {A,C,T,G}. This is too restrictive for true genomes:
1. There are special subsequences in the genome, like TATA within the regulatory area upstream of a gene.
2. The pattern CG is less common than expected under random sampling.
We model such dependencies by Markov chains and hidden Markov models, which we define next.

3 Finite Markov Chain
An integer-time stochastic process, consisting of a domain D of m > 1 states {s_1,…,s_m} and:
1. An m-dimensional initial distribution vector (p(s_1),…,p(s_m)).
2. An m×m transition probability matrix M = (m_{s_i s_j}).
For example, D can be the letters {A, C, T, G}, p(A) the probability that A is the first letter in a sequence, and m_AG the probability that G follows A in a sequence.
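
As a concrete (if toy) illustration, here is a minimal sketch of such a chain in Python/NumPy. The states follow the slide, but every probability value below is made up for illustration:

```python
import numpy as np

# A minimal representation of a finite Markov chain over D = {A, C, G, T}.
states = ["A", "C", "G", "T"]

# Initial distribution vector (p(s_1), ..., p(s_m)); must sum to 1.
p0 = np.array([0.25, 0.25, 0.25, 0.25])

# m x m transition probability matrix M = (m_st); each row sums to 1.
M = np.array([
    [0.30, 0.20, 0.30, 0.20],  # from A
    [0.25, 0.30, 0.20, 0.25],  # from C
    [0.20, 0.30, 0.30, 0.20],  # from G
    [0.25, 0.25, 0.25, 0.25],  # from T
])

assert np.isclose(p0.sum(), 1.0) and np.allclose(M.sum(axis=1), 1.0)
# m_AG, the probability that G follows A:
print(M[states.index("A"), states.index("G")])  # 0.3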

4 Markov Chain (cont.)
[Chain diagram: X_1 → X_2 → … → X_{n-1} → X_n]
For each integer n, a Markov chain assigns probability to sequences (x_1,…,x_n) in D^n (i.e., x_i ∈ D) as follows:
P(x_1,…,x_n) = p(x_1) · ∏_{i=2}^{n} m_{x_{i-1} x_i}
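
The formula above translates directly into code. A sketch, with uniform made-up parameters so the output is easy to check by hand:

```python
import numpy as np

# P(x_1..x_n) = p(x_1) * prod_{i=2}^n m_{x_{i-1} x_i}
states = ["A", "C", "G", "T"]
idx = {s: i for i, s in enumerate(states)}

def sequence_probability(x, p0, M):
    prob = p0[idx[x[0]]]                    # initial-letter probability
    for prev, cur in zip(x, x[1:]):
        prob *= M[idx[prev], idx[cur]]      # one factor per transition
    return prob

# Uniform made-up parameters, so "ACGT" gets probability 0.25**4:
p0 = np.full(4, 0.25)
M = np.full((4, 4), 0.25)
print(sequence_probability("ACGT", p0, M))  # 0.00390625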

5 Markov Chain (cont.)
[Chain diagram: X_1 → X_2 → … → X_{n-1} → X_n]
Similarly, each X_i is a probability distribution over D, determined by the initial distribution (p(s_1),…,p(s_m)) and the transition matrix M. There is a rich theory studying the properties of such "Markov sequences" (X_1,…,X_i,…). A bit of this theory is presented next.

6 Matrix Representation
[4×4 transition matrix over states A, B, C, D shown on slide]
The transition probability matrix is M = (m_st). M is a stochastic matrix: every row sums to 1, i.e., Σ_t m_st = 1 for every state s. The initial distribution vector (u_1,…,u_m) defines the distribution of X_1 (p(X_1 = s_i) = u_i). After one move, the distribution changes to X_2 = X_1 M.

7 Matrix Representation
[same 4×4 transition matrix over states A, B, C, D]
The i-th distribution is X_i = X_1 M^{i-1}. Example: if X_1 = (0, 1, 0, 0) then X_2 = (0.2, 0.5, 0, 0.3), and if X_1 = (0, 0, 0.5, 0.5) then X_2 = (0, 0.1, 0.5, 0.4).
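
A sketch reproducing the slide's example with NumPy. Rows B, C, D of the matrix are forced by the two examples above; row A did not survive in the transcript, so the row used here is a placeholder assumption (it does not affect either example, since both start with no mass on A):

```python
import numpy as np

# Transition matrix over states (A, B, C, D).
M = np.array([
    [1.0, 0.0, 0.0, 0.0],   # row A: assumed placeholder, not from the slide
    [0.2, 0.5, 0.0, 0.3],   # row B: forced by the first example
    [0.0, 0.2, 0.0, 0.8],   # row C: forced by the second example
    [0.0, 0.0, 1.0, 0.0],   # row D: forced by the second example
])

def distribution_at(x1, M, i):
    """X_i = X_1 M^(i-1)."""
    return x1 @ np.linalg.matrix_power(M, i - 1)

print(distribution_at(np.array([0.0, 1.0, 0.0, 0.0]), M, 2))  # [0.2 0.5 0.  0.3]
print(distribution_at(np.array([0.0, 0.0, 0.5, 0.5]), M, 2))  # [0.  0.1 0.5 0.4]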

8 Representation of a Markov Chain as a Digraph
[Digraph over states A, B, C, D, with edges labeled by transition probabilities]
Each directed edge A → B is associated with the positive transition probability from A to B.

9 Properties of Markov Chain States
[Digraph over states A, B, C, D]
States of a Markov chain are classified by the digraph representation (omitting the actual probability values). A, C and D are recurrent states: they lie in strongly connected components which are sinks in the graph. B is not recurrent: it is a transient state. Alternative definition: a state s is recurrent if it can be reached from every state reachable from s; otherwise it is transient.
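
Since the classification depends only on the digraph, it can be computed from the strongly connected components. A sketch using networkx; the edge set below is made up to match the classification described on the slide:

```python
import networkx as nx

# A state is recurrent iff its strongly connected component is a sink
# (no edges leaving the component).
G = nx.DiGraph([("A", "A"), ("B", "A"), ("B", "B"), ("B", "D"),
                ("C", "D"), ("D", "C")])

C = nx.condensation(G)   # DAG whose nodes are the strongly connected components
recurrent = set()
for comp_id in C.nodes:
    if C.out_degree(comp_id) == 0:            # sink component
        recurrent |= C.nodes[comp_id]["members"]

print(sorted(recurrent))           # ['A', 'C', 'D']
print(sorted(set(G) - recurrent))  # ['B'] -- transient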

10 Another Example of Recurrent and Transient States
[Digraph over states A, B, C, D]
A and B are transient states; C and D are recurrent states. Once the process moves from B to D, it will never come back.

11 Irreducible Markov Chains
A Markov chain is irreducible if the corresponding graph is strongly connected (and thus all its states are recurrent).
[Two example digraphs: one over states A, B, C, D and one over states A, B, C, D, E]

12 Periodic States
[Digraph over states A, B, C, D, E]
A state s has period k if k is the GCD of the lengths of all the cycles that pass through s (in the graph shown, the period of A is 2). A Markov chain is periodic if all of its states have a period k > 1; it is aperiodic otherwise.
Exercise: all states in the same strongly connected component have the same period.
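
The period can be read off the diagonal entries of matrix powers: (M^n)_ss > 0 exactly when some closed walk of length n passes through s. A sketch of that definition; the cutoff max_len and the example chain are assumptions for illustration:

```python
import math
import numpy as np

def period(M, s, max_len=50):
    """Period of state s: gcd of all n <= max_len with (M^n)_ss > 0.
    max_len is a heuristic cutoff for this sketch."""
    P = np.eye(len(M))
    lengths = []
    for n in range(1, max_len + 1):
        P = P @ M
        if P[s, s] > 0:
            lengths.append(n)
    return math.gcd(*lengths) if lengths else 0

# Made-up 2-periodic chain: A <-> B plus a 4-cycle A -> C -> D -> E -> A.
# Only the zero pattern of M matters for the period.
A, B, C, D, E = range(5)
M = np.zeros((5, 5))
M[A, B] = 0.5; M[A, C] = 0.5
M[B, A] = 1.0
M[C, D] = 1.0; M[D, E] = 1.0; M[E, A] = 1.0
print(period(M, A))  # 2: every closed walk through A has even length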

13 Ergodic Markov Chains
[Digraph over states A, B, C, D]
A Markov chain is ergodic if:
1. the corresponding graph is strongly connected, and
2. it is not periodic.
Ergodic Markov chains are important because they guarantee that the corresponding Markovian process converges to a unique distribution, in which every state has a strictly positive probability.

14 Stationary Distributions for Markov Chains
Let M be a Markov chain over m states, and let V = (v_1,…,v_m) be a probability distribution over the m states. V is a stationary distribution for M if VM = V (i.e., one step of the process does not change the distribution). Equivalently: V is a stationary distribution ⇔ V is a non-negative left (row) eigenvector of M with eigenvalue 1.

15 Stationary Distributions for a Markov Chain M
Exercise: a stochastic matrix always has a real left eigenvector with eigenvalue 1. (Hint: show that a stochastic matrix has a right eigenvector with eigenvalue 1, and note that the left eigenvalues of a matrix are the same as its right eigenvalues.) [It can be shown that this eigenvector V can be chosen non-negative; hence every Markov chain has a stationary distribution.]
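
The exercise also suggests how to compute a stationary distribution numerically: take a left eigenvector of M for eigenvalue 1 and normalize it. A sketch with NumPy; the 2-state example matrix is made up:

```python
import numpy as np

def stationary_distribution(M):
    """Left eigenvector of M with eigenvalue 1, normalized to sum to 1."""
    w, V = np.linalg.eig(M.T)               # right eigenvectors of M^T = left of M
    v = V[:, np.argmin(np.abs(w - 1.0))].real
    v = np.abs(v)                           # fix the sign; for an ergodic chain the
    return v / v.sum()                      # entries have a common sign anyway

M = np.array([[0.9, 0.1],
              [0.5, 0.5]])
pi = stationary_distribution(M)
print(pi)       # approx [0.833 0.167]
print(pi @ M)   # equals pi: one step leaves the distribution unchanged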

16 "Good" Markov Chains
A Markov chain is good if the distributions X_i, as i → ∞:
1. converge to a unique distribution, independent of the initial distribution, and
2. in that unique distribution, each state has a positive probability.
The Fundamental Theorem of Finite Markov Chains: a Markov chain is good ⇔ the corresponding graph is ergodic. We will prove the ⇒ direction by showing that non-ergodic Markov chains are not good.

17 Examples of "Bad" Markov Chains
A Markov chain is "bad" if either:
1. it does not converge to a unique distribution, or
2. it does converge to a unique distribution, but some states have zero probability in that distribution.

18 Bad Case 1: Mutual Unreachability
[Digraph over states A, B, C, D]
Consider two initial distributions: a) p(X_1 = A) = 1 (p(X_1 = x) = 0 if x ≠ A); b) p(X_1 = C) = 1. In case a) the sequence will stay at A forever; in case b) it will stay in {C, D} forever.
Fact 1: if G has two states which are unreachable from each other, then {X_i} cannot converge to a distribution which is independent of the initial distribution.

19 Bad Case 2: Transient States
[Digraph over states A, B, C, D]
Once the process moves from B to D, it will never come back.

20 Bad Case 2: Transient States
[Digraph over states A, B, C, D, with X marking the set of states from which A is unreachable]
Fact 2: for each initial distribution, with probability 1 a transient state will be visited only a finite number of times.
Proof: let A be a transient state, and let X be the set of states from which A is unreachable. It is enough to show that, starting from any state, with probability 1 a state in X is reached after a finite number of steps. (Exercise: complete the proof.)

21 Corollary: A good Markov Chain is irreducible

22 Bad Case 3: Periodic Markov Chains
[Digraph over states A, B, C, D, E]
Recall: a Markov chain is periodic if all of its states have a period k > 1. The chain above has period 2. Consider the initial distribution p(B) = 1: then states {B, C} are visited (with positive probability) only in odd steps, and states {A, D, E} only in even steps.

23 Bad Case 3: Periodic States
[Digraph over states A, B, C, D, E]
Fact 3: in a periodic Markov chain (of period k > 1) there are initial distributions under which the states are visited in a periodic manner; under such initial distributions, X_i does not converge as i → ∞.
Corollary: a good Markov chain is not periodic.

24 The Fundamental Theorem of Finite Markov Chains
If a Markov chain is ergodic, then:
1. it has a unique stationary distribution vector V > 0, which is an eigenvector of the transition matrix, and
2. for any initial distribution, the distributions X_i converge to V as i → ∞.
We have proved that non-ergodic Markov chains are not good. A proof of the opposite direction is based on Perron-Frobenius theory of nonnegative matrices, and is beyond the scope of this course.
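
A quick numerical illustration (not a proof) of the convergence claim, using the same made-up 2-state ergodic chain as before: two opposite starting distributions end up at the same V.

```python
import numpy as np

# For an ergodic chain, X_1 M^(i-1) approaches the same limit V
# regardless of the initial distribution X_1.
M = np.array([[0.9, 0.1],
              [0.5, 0.5]])
for x1 in (np.array([1.0, 0.0]), np.array([0.0, 1.0])):
    print(x1 @ np.linalg.matrix_power(M, 100))  # both approx [0.833 0.167]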

25 Use of Markov Chains in Genome Search: Modeling CpG Islands
In human genomes the pair CG often transforms to (methyl-C)G, which often transforms to TG. Hence the pair CG appears less often than expected from the independent frequencies of C and G alone. For biological reasons, this process is sometimes suppressed in short stretches of the genome, such as the start regions of many genes. These areas are called CpG islands (the p denotes the phosphodiester bond between the C and the G).

26 Example: CpG Island (cont.)
We consider two questions:
Question 1: given a short stretch of genomic data, does it come from a CpG island?
Question 2: given a long piece of genomic data, does it contain CpG islands, and if so, where and of what length?
We "solve" the first question by modeling strings with and without CpG islands as Markov chains over the same states {A,C,G,T} but with different transition probabilities:

27 Example: CpG Island (cont.)
The "+" model: use transition matrix M+ = (m+_st), where m+_st = the probability that t follows s in a CpG island.
The "-" model: use transition matrix M- = (m-_st), where m-_st = the probability that t follows s in a non-CpG island.

28 Example: CpG Island (cont.)
With this model, to solve Question 1 we need to decide whether a given short sequence of letters is more likely to come from the "+" model or from the "-" model. This is done using the definition of a Markov chain, with the parameters estimated from known data, and the log odds-ratio test.

29 Question 1: Using Two Markov Chains
M+ (for CpG islands): we need to specify p+(x_i | x_{i-1}), where + stands for CpG island. From Durbin et al. we have (rows are x_{i-1}, columns are x_i; recall that rows must add up to one, columns need not):

x_{i-1} \ x_i   A        C          G          T
A               p+(A|A)  p+(C|A)    p+(G|A)    p+(T|A)
C               0.17     p+(C|C)    0.274      p+(T|C)
G               0.16     p+(C|G)    p+(G|G)    p+(T|G)
T               0.08     p+(C|T)    p+(G|T)    p+(T|T)

30 Question 1: Using Two Markov Chains
M- (for non-CpG islands): and for p-(x_i | x_{i-1}) (where - stands for non-CpG island) we have:

x_{i-1} \ x_i   A        C          G          T
A               p-(A|A)  p-(C|A)    p-(G|A)    p-(T|A)
C               0.32     p-(C|C)    0.078      p-(T|C)
G               0.25     p-(C|G)    p-(G|G)    p-(T|G)
T               0.18     p-(C|T)    p-(G|T)    p-(T|T)

31 Discriminating Between the Two Models
[Chain diagram: X_1 → X_2 → … → X_{L-1} → X_L]
Given a string x = (x_1,…,x_L), compute the ratio
RATIO = [∏_{i=1}^{L} p+(x_i | x_{i-1})] / [∏_{i=1}^{L} p-(x_i | x_{i-1})].
If RATIO > 1, a CpG island is more likely. In practice, the log of this ratio is computed.
Note: p+(x_1 | x_0) is defined for convenience as p+(x_1), and p-(x_1 | x_0) as p-(x_1).

32 Log Odds-Ratio Test
Taking the logarithm yields
log Q = log [p+(x_1,…,x_L) / p-(x_1,…,x_L)] = Σ_{i=1}^{L} log [p+(x_i | x_{i-1}) / p-(x_i | x_{i-1})].
If log Q > 0, then + is more likely (CpG island). If log Q < 0, then - is more likely (non-CpG island).
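
A sketch of the test as code. The demo tables below contain only the transition entries quoted on the two preceding slides, and the uniform first-letter distributions are an assumption for illustration:

```python
import math

def log_odds(x, p_plus, p_minus, p1_plus, p1_minus):
    """log Q = log p+(x) - log p-(x); positive => CpG island more likely.
    p_plus/p_minus map (prev, cur) pairs to transition probabilities;
    p1_plus/p1_minus give the first-letter probabilities."""
    q = math.log(p1_plus[x[0]] / p1_minus[x[0]])
    for prev, cur in zip(x, x[1:]):
        q += math.log(p_plus[prev, cur] / p_minus[prev, cur])
    return q

# Only the entries quoted on the slides; first letters assumed uniform.
p_plus  = {("C", "A"): 0.17, ("G", "A"): 0.16, ("T", "A"): 0.08}
p_minus = {("C", "A"): 0.32, ("G", "A"): 0.25, ("T", "A"): 0.18}
uniform = dict.fromkeys("ACGT", 0.25)
print(log_odds("CA", p_plus, p_minus, uniform, uniform))  # < 0: "-" favored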

33 Where Do the Parameters (Transition Probabilities) Come From?
We learn from complete data, namely, when the label is given and every x_i is measured:
Source: a collection of sequences from CpG islands, and a collection of sequences from non-CpG islands.
Input: tuples of the form (x_1,…,x_L, h), where h is + or -.
Output: maximum-likelihood estimates (MLE) of the parameters.
Count all pairs (X_i = a, X_{i-1} = b) with label + and with label -; call these counts N_{ba,+} and N_{ba,-}.

34 Maximum Likelihood Estimates (MLE) of the Parameters (Using Labeled Data)
[Chain diagram: X_1 → X_2 → … → X_{L-1} → X_L]
The needed parameters are p+(x_1), p+(x_i | x_{i-1}), p-(x_1), and p-(x_i | x_{i-1}). The ML estimates are given by:
p+(x) = N_{x,+} / Σ_y N_{y,+}, where N_{x,+} is the number of times letter x appears in CpG islands in the dataset;
p+(x | b) = N_{bx,+} / Σ_y N_{by,+}, where N_{bx,+} is the number of times letter x appears after letter b in CpG islands in the dataset.
(And analogously for the - parameters.) Using MLE is justified when we have a large sample; the numbers appearing in our tables (taken from Durbin et al., p. 50) are based on 60,000 nucleotides.
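
A sketch of the counting estimate for the transition parameters; the toy training sequences standing in for real CpG-island data are made up:

```python
from collections import Counter

def mle_transitions(sequences):
    """ML transition estimates from labeled sequences:
    p(x | b) = N_bx / sum_y N_by, where N_bx counts b followed by x."""
    pair_counts = Counter()
    for seq in sequences:
        pair_counts.update(zip(seq, seq[1:]))   # count adjacent pairs
    totals = Counter()
    for (b, _), n in pair_counts.items():
        totals[b] += n                          # sum_y N_by
    return {(b, x): n / totals[b] for (b, x), n in pair_counts.items()}

# Toy "+"-labeled training set:
cpg_plus = ["ACGCGT", "CGCGCG"]
est = mle_transitions(cpg_plus)
print(est[("C", "G")])  # 1.0 on this tiny sample; real data gives 0.274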

35 CpG Island: Question 2
We now solve the second question.
Question 2: given a long string of genomic data, does it contain CpG islands, and where?
For this, we need to decide which parts of a given long sequence of letters are more likely to come from the "+" model, and which parts are more likely to come from the "-" model. Intuitively, we have to split the string into two types of substrings: those likely to come from the + model and those likely to come from the - model.

36 Question 2: Finding CpG Islands
Given a long genomic string with possible CpG islands, we define a Markov chain over 8 states, all interconnected (hence it is ergodic): A+, C+, G+, T+ and A-, C-, G-, T-.
The problem is that we don't know the sequence of states which are traversed, only the sequence of letters. Therefore we use a Hidden Markov Model.
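
The slide does not spell out how the 8 states are wired. One common construction, shown as a sketch below, couples the two 4-state models with a small switching probability; eps and the uniform placeholder matrices are assumptions, not values from the source:

```python
import numpy as np

# 8 states: (A, C, G, T) x {+, -}. Within each model we reuse its 4x4
# transition matrix, scaled by the probability of staying in that model.
eps = 0.01                         # assumed model-switching probability
M_plus = np.full((4, 4), 0.25)     # placeholder for the real M+
M_minus = np.full((4, 4), 0.25)    # placeholder for the real M-

M8 = np.zeros((8, 8))
M8[:4, :4] = (1 - eps) * M_plus    # stay in "+", move by M+
M8[:4, 4:] = eps * M_minus         # switch "+" -> "-" (one convention)
M8[4:, 4:] = (1 - eps) * M_minus   # stay in "-", move by M-
M8[4:, :4] = eps * M_plus          # switch "-" -> "+"
assert np.allclose(M8.sum(axis=1), 1.0)   # still a stochastic matrix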

37 Hidden Markov Model
[Diagram: hidden chain S_1 → S_2 → … → S_{L-1} → S_L with transitions M; each S_i emits x_i with emission probabilities T]
A Markov chain (s_1,…,s_L) together with, for each state s and symbol x, an emission probability p(X_i = x | S_i = s).
Application in communication: the message sent is (s_1,…,s_m) but we receive (x_1,…,x_m); compute the most likely message sent.
Application in speech recognition: the word said is (s_1,…,s_m) but we recorded (x_1,…,x_m); compute the most likely word said.

38 Hidden Markov Model
[Diagram: hidden chain S_1 → … → S_L emitting x_1,…,x_L]
Notations:
Markov chain transition probabilities: p(S_{i+1} = t | S_i = s) = a_st.
Emission probabilities: p(X_i = b | S_i = s) = e_s(b).
For Markov chains we know: p(s_1,…,s_L) = p(s_1) ∏_{i=2}^{L} a_{s_{i-1} s_i}.
What is p(s, x) = p(s_1,…,s_L; x_1,…,x_L)?

39 Hidden Markov Model
[Diagram: hidden chain S_1 → … → S_L emitting x_1,…,x_L]
p(X_i = b | S_i = s) = e_s(b) means that the probability of x_i depends only on s_i. Formally, this is equivalent to the conditional independence assumption:
p(X_i = x_i | x_1,…,x_{i-1}, x_{i+1},…,x_L, s_1,…,s_L) = e_{s_i}(x_i).
Thus
p(s, x) = p(s_1) e_{s_1}(x_1) ∏_{i=2}^{L} a_{s_{i-1} s_i} e_{s_i}(x_i).
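
The closed-form joint probability above is one loop of code. A sketch; the two-state HMM parameters below are made up for illustration:

```python
def joint_probability(s, x, p1, a, e):
    """p(s, x) = p(s_1) e_{s_1}(x_1) * prod_{i>=2} a_{s_{i-1} s_i} e_{s_i}(x_i)."""
    prob = p1[s[0]] * e[s[0]][x[0]]
    for i in range(1, len(s)):
        prob *= a[s[i - 1]][s[i]] * e[s[i]][x[i]]
    return prob

# Tiny made-up HMM over hidden states {+, -} emitting symbols {H, T}:
p1 = {"+": 0.5, "-": 0.5}
a = {"+": {"+": 0.9, "-": 0.1}, "-": {"+": 0.1, "-": 0.9}}
e = {"+": {"H": 0.8, "T": 0.2}, "-": {"H": 0.3, "T": 0.7}}
print(joint_probability("++-", "HHT", p1, a, e))  # 0.02016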