Markov Chains

2 Dependencies along the genome
In previous classes we assumed every letter in a sequence is sampled independently at random from some distribution q(·) over the alphabet {A,C,T,G}. This model may suffice for alignment scoring, but it does not hold in real genomes:
1. There are special subsequences in the genome, like TATA within the regulatory region upstream of a gene.
2. The pair CG (a C followed by a G) is less common than expected under independent sampling.
We model such dependencies by Markov chains and hidden Markov models, which we define next.

3 Finite Markov Chain
An integer-time stochastic process, consisting of a domain D of m states $\{s_1, \ldots, s_m\}$ and:
1. An m-dimensional initial distribution vector $(p(s_1), \ldots, p(s_m))$.
2. An m×m transition probability matrix $M = (a_{s_i s_j})$.
For example, D can be the letters {A, C, T, G}, p(A) the probability that A is the first letter in a sequence, and $a_{AG}$ the probability that G follows A in a sequence.
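A minimal sketch of this definition in Python; the numeric values below are made-up illustrations, not estimates from data:

```python
import numpy as np

STATES = ["A", "C", "G", "T"]               # the domain D
IDX = {s: i for i, s in enumerate(STATES)}

# Initial distribution vector (p(s_1), ..., p(s_m)); illustrative values.
p0 = np.array([0.25, 0.25, 0.25, 0.25])

# m x m transition probability matrix M = (a_st); each row sums to 1.
M = np.array([
    [0.30, 0.20, 0.30, 0.20],   # from A
    [0.25, 0.25, 0.25, 0.25],   # from C
    [0.20, 0.30, 0.30, 0.20],   # from G
    [0.25, 0.25, 0.25, 0.25],   # from T
])
assert np.allclose(M.sum(axis=1), 1.0)      # each row is a distribution

print(M[IDX["A"], IDX["G"]])                # a_AG: probability that G follows A
```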

4 Simple Model - Markov Chains
Markov Property: the state of the system at time t+1 depends only on the state of the system at time t:
$P(X_{t+1} = x_{t+1} \mid X_1 = x_1, \ldots, X_t = x_t) = P(X_{t+1} = x_{t+1} \mid X_t = x_t)$
[Diagram: chain $X_1 \to X_2 \to X_3 \to X_4 \to X_5$]

5 Markov Chain (cont.)
For each integer n, a Markov Chain assigns a probability to every sequence $(x_1, \ldots, x_n)$ over D (i.e., $x_i \in D$) as follows:
$P(X_1 = x_1, \ldots, X_n = x_n) = p(x_1) \prod_{i=2}^{n} a_{x_{i-1} x_i}$
Similarly, $(X_1, \ldots, X_i, \ldots)$ is a sequence of probability distributions over D.
[Diagram: chain $X_1 \to X_2 \to \cdots \to X_{n-1} \to X_n$]
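As a sketch, this factorization can be computed directly; the uniform p0 and M here are placeholders assumed only for illustration:

```python
import numpy as np

IDX = {"A": 0, "C": 1, "G": 2, "T": 3}
p0 = np.full(4, 0.25)                 # illustrative initial distribution
M = np.full((4, 4), 0.25)             # illustrative transition matrix

def sequence_probability(x):
    """P(x_1...x_n) = p(x_1) * prod_{i=2..n} a_{x_{i-1} x_i}."""
    prob = p0[IDX[x[0]]]
    for prev, cur in zip(x, x[1:]):
        prob *= M[IDX[prev], IDX[cur]]
    return prob

print(sequence_probability("ACGT"))   # 0.25**4 under this uniform model
```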

6 Matrix Representation
The transition probability matrix $M = (a_{st})$ can be written as a table over the states; the slide shows a 4×4 matrix over states A, B, C, D (its numeric entries did not survive extraction).
M is a stochastic matrix: every row sums to one, $\sum_t a_{st} = 1$ for each state s.
The initial distribution vector $(u_1, \ldots, u_m)$ defines the distribution of $X_1$: $p(X_1 = s_i) = u_i$.
After one move, the distribution changes to $X_2 = X_1 M$. After i moves the distribution is $X_i = X_1 M^{i-1}$.
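A short sketch of this matrix-power view; the 2-state matrix is illustrative, since the slide's entries are not recoverable:

```python
import numpy as np

M = np.array([[0.7, 0.3],
              [0.4, 0.6]])               # any row-stochastic matrix works
X1 = np.array([1.0, 0.0])                # X_1: start in state 0 with certainty

X2 = X1 @ M                              # X_2 = X_1 M
X5 = X1 @ np.linalg.matrix_power(M, 4)   # X_i = X_1 M^(i-1), here i = 5
print(X2, X5)
```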

7 Simple Example
Weather:
– raining today, rain tomorrow: $p_{rr} = 0.4$
– raining today, no rain tomorrow: $p_{rn} = 0.6$
– not raining today, rain tomorrow: $p_{nr} = 0.2$
– not raining today, no rain tomorrow: $p_{nn} = 0.8$

8 Transition Matrix for Example
$M = \begin{pmatrix} 0.4 & 0.6 \\ 0.2 & 0.8 \end{pmatrix}$ (rows: raining today, not raining today; columns: rain tomorrow, no rain tomorrow)
Note that the rows sum to 1. Such a matrix is called a stochastic matrix. If the rows and the columns of a matrix all sum to 1, we have a doubly stochastic matrix.
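Both properties can be checked mechanically; a sketch using the weather matrix above:

```python
import numpy as np

# Rows/columns ordered (rain, no rain), from the weather example.
W = np.array([[0.4, 0.6],
              [0.2, 0.8]])

print(np.allclose(W.sum(axis=1), 1.0))   # True: stochastic (rows sum to 1)
print(np.allclose(W.sum(axis=0), 1.0))   # False: not doubly stochastic
```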

9 Gambler’s Example
At each play:
– the gambler wins $1 with probability p
– the gambler loses $1 with probability 1-p
The game ends when the gambler either goes broke or gains a fortune of $100. Both $0 and $100 are absorbing states.
[Diagram: states 0, 1, 2, …, N-1, N; each state moves right with probability p and left with probability 1-p; start at $10]
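A quick simulation sketch of this chain; the parameters mirror the slide (start at $10, fortune at $100), and the win probability p is left free:

```python
import random

def gamblers_ruin(start=10, goal=100, p=0.5, seed=0):
    """Simulate one game; return the absorbing state reached (0 or goal)."""
    rng = random.Random(seed)
    fortune = start
    while 0 < fortune < goal:
        fortune += 1 if rng.random() < p else -1
    return fortune

games = 10_000
wins = sum(gamblers_ruin(seed=i) == 100 for i in range(games))
print(wins / games)   # for a fair game (p = 0.5) this is near start/goal = 0.1
```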

10 Coke vs. Pepsi
Given that a person’s last cola purchase was Coke, there is a 90% chance that her next cola purchase will also be Coke. If a person’s last cola purchase was Pepsi, there is an 80% chance that her next cola purchase will also be Pepsi.
[Diagram: two-state chain over {coke, pepsi} with self-loops 0.9 and 0.8 and cross transitions 0.1 and 0.2]

11 Coke vs. Pepsi
Given that a person is currently a Pepsi purchaser, what is the probability that she will purchase Coke two purchases from now? The transition matrix (corresponding to one purchase ahead; state order Coke, Pepsi) is:
$M = \begin{pmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{pmatrix}$
The answer is the (Pepsi, Coke) entry of $M^2$: $0.2 \cdot 0.9 + 0.8 \cdot 0.2 = 0.34$.

12 Coke vs. Pepsi
Given that a person is currently a Coke drinker, what is the probability that she will purchase Pepsi three purchases from now? This is the (Coke, Pepsi) entry of $M^3$, which equals 0.219.

13 Coke vs. Pepsi
Assume each person makes one cola purchase per week. Suppose 60% of all people now drink Coke and 40% drink Pepsi. What fraction of people will be drinking Coke three weeks from now?
Let $(Q_0, Q_1) = (0.6, 0.4)$ be the initial probabilities, regarding Coke as state 0 and Pepsi as state 1. We want $P(X_3 = 0)$: the first entry of $(Q_0, Q_1) M^3$, which equals 0.6438.
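All three Coke/Pepsi questions reduce to powers of the same transition matrix; a sketch:

```python
import numpy as np

# States: 0 = Coke, 1 = Pepsi; matrix from the slides above.
M = np.array([[0.9, 0.1],
              [0.2, 0.8]])

M2 = np.linalg.matrix_power(M, 2)
print(M2[1, 0])       # Pepsi -> Coke two purchases from now: 0.34

M3 = np.linalg.matrix_power(M, 3)
print(M3[0, 1])       # Coke -> Pepsi three purchases from now: 0.219

Q = np.array([0.6, 0.4])              # 60% Coke, 40% Pepsi now
print((Q @ M3)[0])    # fraction drinking Coke three weeks from now: 0.6438
```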

14 “Good” Markov chains
For certain Markov Chains, the distributions $X_i$, as $i \to \infty$: (1) converge to a unique distribution, independent of the initial distribution; and (2) in that unique distribution, each state has a positive probability. Call such Markov Chains “good”. We characterize these “good” Markov Chains by considering the graph representation of their stochastic matrices.

15 Representation as a Digraph
Each directed edge A → B is associated with the positive transition probability from A to B.
[Diagram: digraph on states A, B, C, D with edges weighted by the entries of the transition matrix]
We now define properties of this graph which guarantee:
1. Convergence to a unique distribution.
2. In that distribution, each state has positive probability.

16 Examples of “Bad” Markov Chains
Markov chains are not “good” if either:
1. They do not converge to a unique distribution; or
2. They do converge to a unique distribution, but some states in that distribution have zero probability.

17 Bad case 1: Mutual Unreachability
[Diagram: two components, {A, B} and {C, D}, with no edges between them]
Consider two initial distributions: a) $p(X_1 = A) = 1$ (and $p(X_1 = x) = 0$ for $x \ne A$); b) $p(X_1 = C) = 1$.
In case a), the sequence will stay at A forever. In case b), it will stay in {C, D} forever.
Fact 1: If G has two states which are unreachable from each other, then $\{X_i\}$ cannot converge to a distribution which is independent of the initial distribution.

18 Bad case 2: Transient States
Def: A state s is recurrent if it can be reached from any state reachable from s; otherwise it is transient.
[Diagram: states A, B, C, D, with an edge from B to D]
Here A and B are transient states, and C and D are recurrent states. Once the process moves from B to D, it will never come back.

19 Bad case 2: Transient States
[Diagram: the same four-state chain]
Fact 2: For each initial distribution, with probability 1 a transient state will be visited only a finite number of times.

20 Bad case 3: Periodic States
A state s has a period k if k is the GCD of the lengths of all the cycles that pass through s. A Markov Chain is periodic if all of its states have a period k > 1; it is aperiodic otherwise.
[Diagram: a five-state chain on A, B, C, D, E]
Example: Consider the initial distribution p(B) = 1. Then states {B, C} are visited (with positive probability) only in odd steps, and states {A, D, E} only in even steps.

21 Bad case 3: Periodic States
[Diagram: the same five-state chain]
Fact 3: In a periodic Markov Chain (of period k > 1) there are initial distributions under which the states are visited in a periodic manner. Under such initial distributions, $X_i$ does not converge as $i \to \infty$.

22 Ergodic Markov Chains
A Markov chain is ergodic if:
1. All states are recurrent (i.e., the graph is strongly connected).
2. It is not periodic.
The Fundamental Theorem of Finite Markov Chains: If a Markov Chain is ergodic, then:
1. It has a unique stationary distribution vector V > 0, which is an eigenvector of the transition matrix.
2. The distributions $X_i$, as $i \to \infty$, converge to V.
[Diagram: a strongly connected four-state chain on A, B, C, D]
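A sketch of what the theorem promises, using NumPy's eigendecomposition on the (ergodic) Coke/Pepsi chain from earlier:

```python
import numpy as np

M = np.array([[0.9, 0.1],
              [0.2, 0.8]])

# The stationary distribution V is a left eigenvector of M for eigenvalue 1.
vals, vecs = np.linalg.eig(M.T)
V = np.real(vecs[:, np.argmax(np.real(vals))])
V = V / V.sum()
print(V)                              # [2/3, 1/3] for this chain

# Convergence: X_1 M^i approaches V from any initial distribution.
print(np.array([1.0, 0.0]) @ np.linalg.matrix_power(M, 50))
```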

23 Use of Markov Chains in Genome Search: Modeling CpG Islands
In human genomes the pair CG often transforms to (methyl-C)G, which often transforms to TG. Hence the pair CG appears less often than would be expected from the independent frequencies of C and G alone. For biological reasons, this process is sometimes suppressed in short stretches of the genome, such as the start regions of many genes. These areas are called CpG islands (the “p” denotes the phosphodiester bond linking the C and the G).

24 Example: CpG Island (Cont.)
We consider two questions (and some variants):
Question 1: Given a short stretch of genomic data, does it come from a CpG island?
Question 2: Given a long piece of genomic data, does it contain CpG islands, and if so, where and of what length?
We “solve” the first question by modeling strings with and without CpG islands as Markov Chains over the same states {A,C,G,T} but with different transition probabilities:

25 Example: CpG Island (Cont.)
The “+” model: use transition matrix $A^+ = (a^+_{st})$, where $a^+_{st}$ = (the probability that t follows s in a CpG island).
The “-” model: use transition matrix $A^- = (a^-_{st})$, where $a^-_{st}$ = (the probability that t follows s in a non-CpG island).

26 Example: CpG Island (Cont.)
With this model, to solve Question 1 we need to decide whether a given short sequence of letters is more likely to come from the “+” model or from the “-” model. This is done using the definition of a Markov Chain. (To solve Question 2 we would need to decide which parts of a given long sequence are more likely to come from the “+” model and which from the “-” model. This is done using the Hidden Markov Model, to be defined later.) We start with Question 1.

27 Question 1: Using two Markov chains
We need to specify $p^+(x_i \mid x_{i-1})$, where “+” stands for CpG island. From Durbin et al we have the matrix $A^+$ (for CpG islands); rows are indexed by $x_{i-1}$, columns by $x_i$, and only the entries legible on the slide are reproduced:

x_{i-1} \ x_i      A        C          G          T
A                 ...      ...        ...        ...
C                 0.17     p+(C|C)    0.274      p+(T|C)
G                 0.16     p+(C|G)    p+(G|G)    p+(T|G)
T                 0.08     p+(C|T)    p+(G|T)    p+(T|T)

(Recall: rows must add up to one; columns need not.)

28 Question 1: Using two Markov chains
…and for $p^-(x_i \mid x_{i-1})$ (where “-” stands for non-CpG island) we have the matrix $A^-$ (for non-CpG islands):

x_{i-1} \ x_i      A        C          G          T
A                 ...      ...        ...        ...
C                 0.32     p-(C|C)    0.078      p-(T|C)
G                 0.25     p-(C|G)    p-(G|G)    p-(T|G)
T                 0.18     p-(C|T)    p-(G|T)    p-(T|T)

29 Discriminating between the two models
Given a string $x = (x_1, \ldots, x_L)$, compute the ratio
$\mathrm{RATIO} = \frac{P(x \mid +\ \text{model})}{P(x \mid -\ \text{model})} = \frac{\prod_{i=1}^{L} p^+(x_i \mid x_{i-1})}{\prod_{i=1}^{L} p^-(x_i \mid x_{i-1})}$
If RATIO > 1, a CpG island is more likely. In practice, the log of this ratio is computed.
Note: $p^+(x_1 \mid x_0)$ is defined for convenience as $p^+(x_1)$; likewise $p^-(x_1 \mid x_0)$ is defined as $p^-(x_1)$.
[Diagram: chain $X_1 \to X_2 \to \cdots \to X_{L-1} \to X_L$]

30 Log Odds-Ratio test
Taking the logarithm yields
$\log Q = \log \frac{P(x \mid +)}{P(x \mid -)} = \sum_{i=1}^{L} \log \frac{p^+(x_i \mid x_{i-1})}{p^-(x_i \mid x_{i-1})}$
If $\log Q > 0$, then + is more likely (CpG island). If $\log Q < 0$, then - is more likely (non-CpG island).
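A sketch of the test. The slides print only a few entries of the two tables; the full matrices below are the values commonly cited from Durbin et al. (1998) and should be treated as a transcription to check against the book. For simplicity the sketch drops the first-letter term $p^\pm(x_1)$:

```python
import numpy as np

IDX = {"A": 0, "C": 1, "G": 2, "T": 3}

# Rows: x_{i-1}; columns: x_i, in A, C, G, T order.
A_PLUS = np.array([                   # "+" model (CpG island)
    [0.180, 0.274, 0.426, 0.120],
    [0.171, 0.368, 0.274, 0.188],
    [0.161, 0.339, 0.375, 0.125],
    [0.079, 0.355, 0.384, 0.182],
])
A_MINUS = np.array([                  # "-" model (non-CpG island)
    [0.300, 0.205, 0.285, 0.210],
    [0.322, 0.298, 0.078, 0.302],
    [0.248, 0.246, 0.298, 0.208],
    [0.177, 0.239, 0.292, 0.292],
])

def log_odds(x):
    """log Q = sum_i log( p+(x_i|x_{i-1}) / p-(x_i|x_{i-1}) )."""
    return sum(
        np.log(A_PLUS[IDX[a], IDX[b]] / A_MINUS[IDX[a], IDX[b]])
        for a, b in zip(x, x[1:])
    )

print(log_odds("CGCGCGCG"))   # > 0: the "+" (CpG island) model is more likely
print(log_odds("ATATATAT"))   # < 0: the "-" model is more likely
```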

31 Where do the parameters (transition probabilities) come from?
Learning from complete data, namely, when the label is given and every $x_i$ is measured:
Source: a collection of sequences from CpG islands, and a collection of sequences from non-CpG islands.
Input: tuples of the form $(x_1, \ldots, x_L, h)$, where h is + or -.
Output: Maximum Likelihood parameter estimates (MLE).
Count all pairs $(X_i = a, X_{i-1} = b)$ with label +, and with label -; say the counts are $N_{ba,+}$ and $N_{ba,-}$.

32 Maximum Likelihood Estimate (MLE) of the parameters (using labeled data)
The needed parameters are $p^+(x_1)$, $p^+(x_i \mid x_{i-1})$, $p^-(x_1)$, and $p^-(x_i \mid x_{i-1})$. The ML estimates are given by:
$p^+(X_1 = a) = \frac{N_{a,+}}{\sum_{a'} N_{a',+}}$, where $N_{a,+}$ is the number of times letter a appears in CpG islands in the dataset.
$p^+(X_i = a \mid X_{i-1} = b) = \frac{N_{ba,+}}{\sum_{a'} N_{ba',+}}$, where $N_{ba,+}$ is the number of times letter a appears after letter b in CpG islands in the dataset (and similarly for the “-” counts).
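A sketch of the transition-count estimates; the training sequences here are hypothetical toy data, not real CpG islands:

```python
from collections import Counter

ALPHABET = "ACGT"

def transition_mle(sequences):
    """Estimate p(a | b) = N_ba / sum_a' N_ba', where N_ba counts how often
    letter a immediately follows letter b in the labeled dataset."""
    N = Counter()
    for x in sequences:
        for b, a in zip(x, x[1:]):
            N[b, a] += 1
    totals = {b: sum(N[b, a] for a in ALPHABET) for b in ALPHABET}
    return {(b, a): (N[b, a] / totals[b] if totals[b] else 0.0)
            for b in ALPHABET for a in ALPHABET}

plus_seqs = ["ACGCGT", "CGGCGC"]      # toy "+"-labeled sequences
p_plus = transition_mle(plus_seqs)
print(p_plus[("C", "G")])             # estimate of p+(G | C)
```

In practice pseudocounts are often added so that transitions unseen in the training data do not get probability zero.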