# Markov Chains Lecture #5

## Presentation on theme: "Markov Chains Lecture #5"— Presentation transcript:

Markov Chains Lecture #5
Background Readings: Durbin et. al. Section 3.1, .

Dependencies along the genome
In previous classes we assumed every letter in a sequence is sampled randomly from some distribution q() over the alpha bet {A,C,T,G}. This model could suffice for alignment scoring, but it is not the case in true genomes. There are special subsequences in the genome, like TATA within the regulatory area, upstream a gene. The pairs C followed by G is less common than expected for random sampling. We model such dependencies by Markov chains and hidden Markov model, which we define next.

Finite Markov Chain An integer time stochastic process, consisting of a domain D of m states {s1,…,sm} and An m dimensional initial distribution vector ( p(s1),.., p(sm)). An m×m transition probabilities matrix M= (asisj) For example, D can be the letters {A, C, T, G}, p(A) the probability of A to be the 1st letter in a sequence, and aAG the probability that G follows A in a sequence.

Markov Chain (cont.) X1 X2 Xn-1 Xn For each integer n, a Markov Chain assigns probability to sequences (x1…xn) over D (i.e, xi D) as follows: Similarly, (X1,…, Xi ,…)is a sequence of probability distributions over D. Question: How these distributions changes in time?

Matrix Representation
1 0.8 0.2 0.3 0.5 0.05 0.95 A B C D The transition probabilities Matrix M =(ast) M is a stochastic Matrix: The initial distribution vector (u1…um) defines the distribution of X1 (p(X1=si)=ui) . Then after one move, the distribution is changed to X2 = X1M After i moves the distribution is Xi = X1Mi-1

Stationary Distributions
A distribution V = (v1,…,vm) is stationary if VM=V If V is a stationary distribution then V is a left (row) Eigenvector of M with Eigenvalue 1. Exercise: A stochastic matrix has a real left Eigenvector with Eigenvalue 1 (hint: show that a stochastic matrix has a right Eigenvector with Eigenvalue 1. Note that the left Eigenvalues of a Matrix are the same as the right Eiganvlues). [It can be shown that the above Eigenvector V can be non-negative. Hence each Markov Chain has a stationary distribution.]

“Good” Markov chains For certain Markov Chains, the distributions Xi , as i∞: (1) converge to a unique distribution, independent of the initial distribution. (2) In that unique distribution, each state has a positive probability. Call these Markov Chain “good”. We describe these “good” Markov Chains by considering Graph representation of Stochastic matrices.

Representation as a Digraph
B C D 0.2 0.3 0.5 0.05 0.95 0.8 1 1 0.8 0.2 0.3 0.5 0.05 0.95 A B C D Each directed edge AB is associated with the positive transition probability from A to B. We now define properties of this graph which guarantee: Convergence to unique distribution: In that distribution, each state has positive probability.

Examples of “Bad” Markov Chains
Markov chains are not “good” if either : They do not converge to a unique distribution. They do converge to u.d., but some states in this distribution have zero probability.

Bad case 1: Mutual Unreachabaility
Consider two initial distributions: p(X1=A)=1 (p(X1 = x)=0 if x≠A). p(X1= C) = 1 In case a), the sequence will stay at A forever. In case b), it will stay in {C,D} for ever. Fact 1: If G has two states which are unreachable from each other, then {Xi} cannot converge to a distribution which is independent on the initial distribution.

Bad case 2: Transient States
Def: A state s is recurrent if it can be reached from any state reachable from s; otherwise it is transient. A B C D A and B are transient states, C and D are recurrent states. Once the process moves from B to D, it will never come back.

Bad case 2: Transient States
Fact 2: For each initial distribution, with probability 1 a transient state will be visited only a finite number of times. X Proof: Let A be a transient state, and let X be the set of states from which A is unreachable. It is enough to show that, starting from any state, with probability 1 a state in X is reached after a finite number of steps (Exercise: complete the proof)

Bad case 3: Periodic States
A state s has a period k if k is the GCD of the lengths of all the cycles that pass via s. Exercise: All the states in the same strongly connected component have the same period (in the shown graph the period is 2). A Markov Chain is periodic if all the states in it have a period k >1. It is aperiodic otherwise. Example: Consider the initial distribution p(B)=1. Then states {B, C} are visited (with positive probability) only in odd steps, and states {A, D, E} are visited in only even steps.

Bad case 3: Periodic States
Fact 3: In a periodic Markov Chain (of period k >1) there are initial distributions under which the states are visited in a periodic manner. Under such initial distributions Xi does not converge as i∞.

Ergodic Markov Chains The Fundamental Theorem of Finite Markov Chains:
B C D 0.2 0.3 0.5 0.05 0.95 0.8 1 A Markov chain is ergodic if : All states are recurrent (ie, the graph is strongly connected) It is not peridoic The Fundamental Theorem of Finite Markov Chains: If a Markov Chain is ergodic, then It has a unique stationary distribution vector V > 0, which is an Eigenvector of the transition matrix. The distributions Xi , as i∞, converges to V.

Use of Markov Chains in Genome search: Modeling CpG Islands
In human genomes the pair CG often transforms to (methyl-C) G which often transforms to TG. Hence the pair CG appears less than expected from what is expected from the independent frequencies of C and G alone. Due to biological reasons, this process is sometimes suppressed in short stretches of genomes such as in the start regions of many genes. These areas are called CpG islands (p denotes “pair”).

Example: CpG Island (Cont.)
We consider two questions (and some variants): Question 1: Given a short stretch of genomic data, does it come from a CpG island ? Question 2: Given a long piece of genomic data, does it contain CpG islands in it, where, what length ? We “solve” the first question by modeling strings with and without CpG islands as Markov Chains over the same states {A,C,G,T} but different transition probabilities:

Example: CpG Island (Cont.)
The “+” model: Use transition matrix A+ = (a+st), Where: a+st = (the probability that t follows s in a CpG island) The “-” model: Use transition matrix A- = (a-st), Where: a-st = (the probability that t follows s in a non CpG island)

Example: CpG Island (Cont.)
With this model, to solve Question 1 we need to decide whether a given short sequence of letters is more likely to come from the “+” model or from the “–” model. This is done by using the definitions of Markov Chain. [to solve Question 2 we need to decide which parts of a given long sequence of letters is more likely to come from the “+” model, and which parts are more likely to come from the “–” model. This is done by using the Hidden Markov Model, to be defined later.] We start with Question 1:

Question 1: Using two Markov chains
A+ (For CpG islands): We need to specify p+(xi | xi-1) where + stands for CpG Island. From Durbin et al we have: Xi A C G T 0.18 0.27 0.43 0.12 0.17 p+(C | C) 0.274 p+(T|C) 0.16 p+(C|G) p+(G|G) p+(T|G) 0.08 p+(C |T) p+(G|T) p+(T|T) Xi-1 (Recall: rows must add up to one; columns need not.)

Question 1: Using two Markov chains
A- (For non-CpG islands): …and for p-(xi | xi-1) (where “-” stands for Non CpG island) we have: Xi A C G T 0.3 0.2 0.29 0.21 0.32 p-(C|C) 0.078 p-(T|C) 0.25 p-(C|G) p-(G|G) p-(T|G) 0.18 p-(C|T) p-(G|T) p-(T|T) Xi-1

Discriminating between the two models
X1 X2 XL-1 XL Given a string x=(x1….xL), now compute the ratio If RATIO>1, CpG island is more likely. Actually – the log of this ratio is computed: Note: p+(x1|x0) is defined for convenience as p+(x1). p-(x1|x0) is defined for convenience as p-(x1).

Log Odds-Ratio test Taking logarithm yields
If logQ > 0, then + is more likely (CpG island). If logQ < 0, then - is more likely (non-CpG island).

Where do the parameters (transition- probabilities) come from ?
Learning from complete data, namely, when the label is given and every xi is measured: Source: A collection of sequences from CpG islands, and a collection of sequences from non-CpG islands. Input: Tuples of the form (x1, …, xL, h), where h is + or - Output: Maximum Likelihood parameters (MLE) Count all pairs (Xi=a, Xi-1=b) with label +, and with label -, say the numbers are Nba,+ and Nba,- .

Maximum Likelihood Estimate (MLE) of the parameters (using labeled data)
XL-1 XL The needed parameters are: P+(x1), p+ (xi | xi-1), p-(x1), p-(xi | xi-1) The ML estimates are given by: Where Na,+ is the number of times letter a appear in CpG islands in the dataset. Where Nba,+ is the number of times letter b appears after letter a in CpG islands in the dataset. Using MLE is justified when we have a large sample. The numbers appearing in the text book are based on 60,000 nucleotides. When only small samples are available, Bayesian learning is an attractive alternative, which we will cover soon.