1 Markov Chains Algorithms in Computational Biology Spring 2006 Slides were edited by Itai Sharon from Dan Geiger and Ydo Wexler.

2 Dependencies Along Biological Sequences
So far we assumed that every letter in a sequence is sampled independently from some distribution q(). This model may suffice for alignment scoring, but it does not hold in real genomes: there are special subsequences in which dependencies between nucleotides exist. Example 1: TATA within the regulatory region upstream of a gene. Example 2: CG pairs. We model such dependencies with Markov chains and hidden Markov models (HMMs).

3 Markov Chains
A chain of random variables in which the next one depends only on the current one. Given X = x_1…x_n, then P(x_i | x_1…x_{i-1}) = P(x_i | x_{i-1}). The general case is a k-th-order Markov process: given X = x_1…x_n, then P(x_i | x_1…x_{i-1}) = P(x_i | x_{i-k}…x_{i-1}). [The slide shows the chain diagram X_1 → X_2 → … → X_{n-1} → X_n.]

4 Markov Chains
An integer-time stochastic process, consisting of a domain D of m > 1 states {s_1,…,s_m}, an m-dimensional initial distribution vector (p(s_1),…,p(s_m)), and an m x m transition probability matrix M = (a_ij). For example: D can be the letters {A, C, G, T}; p(A) is the probability that A is the 1st letter in a sequence; a_AG is the probability that G follows A in a sequence.

5 Markov Chains
For each integer n, a Markov chain assigns a probability to every sequence (x_1…x_n) over D (i.e., x_i in D) as follows: P(x_1,…,x_n) = p(x_1) · a_{x_1 x_2} · … · a_{x_{n-1} x_n}. Similarly, (X_1,…,X_i,…) is a sequence of probability distributions over D. There is a rich theory that studies the properties of these sequences.
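The product formula above can be sketched directly in code. The initial distribution and transition matrix below are illustrative values only (they are not taken from the slides), chosen so that each row sums to 1:

```python
# Sketch: the probability a first-order Markov chain assigns to a sequence
# over D = {A, C, G, T}. All numeric values here are illustrative assumptions.
init = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
trans = {
    "A": {"A": 0.3,  "C": 0.2,  "G": 0.3,  "T": 0.2},
    "C": {"A": 0.2,  "C": 0.3,  "G": 0.2,  "T": 0.3},
    "G": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "T": {"A": 0.3,  "C": 0.2,  "G": 0.2,  "T": 0.3},
}

def chain_probability(seq):
    """P(x_1..x_n) = p(x_1) * product over i of a_{x_{i-1} x_i}."""
    p = init[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= trans[prev][cur]
    return p

print(chain_probability("ACG"))  # 0.25 * 0.2 * 0.2 = 0.01
```

Note that the probability factorizes into one lookup per transition, which is what makes Markov-chain scoring linear in the sequence length.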

6 Matrix Representation
The transition probability matrix M = (a_st). M is a stochastic matrix: every entry is non-negative and each row sums to 1. The initial distribution vector (u_1,…,u_m) defines the distribution of X_1: P(X_1 = s_i) = u_i. Then after one move, the distribution changes to X_2 = X_1 M. [The 4 x 4 example matrix over states A, B, C, D shown on the slide did not survive extraction.]

7 Matrix Representation
Example: if X_1 = (0, 1, 0, 0) then X_2 = (0.2, 0.5, 0, 0.3); and if X_1 = (0, 0, 0.5, 0.5) then X_2 = (0, 0.1, 0.5, 0.4). The i-th distribution is X_i = X_1 M^(i-1). [The matrix values over states A–D appeared in a table that did not survive extraction.]
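The two worked examples above can be checked numerically. Rows B, C, and D of the matrix below are reconstructed so that both examples on the slide hold; row A is an assumed placeholder, since the slide's table was lost:

```python
# Sketch of distribution evolution X_i = X_1 M^(i-1).
# Rows B, C, D reconstructed from the slide's worked examples; row A assumed.
import numpy as np

M = np.array([
    [0.95, 0.05, 0.0, 0.0],   # from A (assumed values)
    [0.20, 0.50, 0.0, 0.3],   # from B: X1=(0,1,0,0) -> X2=(0.2,0.5,0,0.3)
    [0.00, 0.20, 0.0, 0.8],   # from C
    [0.00, 0.00, 1.0, 0.0],   # from D
])

x1 = np.array([0.0, 1.0, 0.0, 0.0])
x2 = x1 @ M                    # one step: X2 = X1 M = (0.2, 0.5, 0, 0.3)
print(x2)

def distribution_at(x1, M, i):
    """The i-th distribution X_i = X_1 M^(i-1)."""
    return x1 @ np.linalg.matrix_power(M, i - 1)
```

The second example is the convex-combination case: starting from (0, 0, 0.5, 0.5), one step gives half of row C plus half of row D, i.e. (0, 0.1, 0.5, 0.4).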

8 Representing a Markov Model as a Digraph
Each directed edge A → B is associated with the transition probability from A to B. [The slide shows the transition matrix over states A, B, C, D and the corresponding digraph.]

9 Markov Chains – Weather Example
Weather forecast: raining today → 40% rain tomorrow, 60% no rain tomorrow; not raining today → 20% rain tomorrow, 80% no rain tomorrow. This is a stochastic FSM with two states: rain and no rain.
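The weather FSM above is a two-state chain, small enough to iterate by hand; a minimal sketch:

```python
# The weather example as a two-state Markov chain.
# States: index 0 = rain, index 1 = no rain.
import numpy as np

P = np.array([[0.4, 0.6],    # raining today
              [0.2, 0.8]])   # not raining today

today = np.array([1.0, 0.0])          # it is raining today
print(today @ P)                      # tomorrow: (0.4, 0.6)

# Repeated multiplication converges to the stationary distribution,
# which for this chain is (0.25, 0.75): 25% of days are rainy in the long run.
dist = today
for _ in range(100):
    dist = dist @ P
print(dist)
```

Convergence here is fast because the chain's second eigenvalue is 0.2, so the distance to the stationary distribution shrinks by a factor of 5 per step.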

10 Markov Chains – Gambler Example
A gambler starts with $10. At each play, one of the following happens: the gambler wins $1 with probability p, or loses $1 with probability 1-p. The game ends when the gambler goes broke ($0) or gains a fortune of $100. [The slide shows the chain of dollar amounts with forward edges labeled p and backward edges labeled 1-p, starting at $10.]
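The gambler's-ruin chain above is easy to simulate; a Monte Carlo sketch (the trial count and seeds are arbitrary choices):

```python
# Monte Carlo sketch of the gambler's-ruin chain: start at $10,
# absorb at $0 (broke) or $100 (fortune); p = per-play win probability.
import random

def play(p=0.5, start=10, goal=100, seed=None):
    rng = random.Random(seed)
    money = start
    while 0 < money < goal:
        money += 1 if rng.random() < p else -1
    return money  # either 0 or goal

trials = 2000
wins = sum(play(p=0.5, seed=s) == 100 for s in range(trials))
print(wins / trials)  # near the theoretical start/goal = 10/100 = 0.1
```

For a fair game (p = 0.5), the probability of reaching the fortune is start/goal, so the empirical fraction should hover around 0.1; this is a standard consequence of the chain's absorbing states.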

11 Properties of Markov Chain States
States of Markov chains are classified by the digraph representation (omitting the actual probability values). Recurrent states: s is recurrent if it is accessible from all states that are accessible from s; in the example graph, C and D are recurrent. Transient states: s is transient if it will be visited only a finite number of times as n → ∞; in the example graph, A and B are transient.

12 Irreducible Markov Chains
A Markov chain is irreducible if the corresponding graph is strongly connected (and thus all its states are recurrent). [The slide contrasts two example graphs, one over states A–D and one over states A–E.]

13 Properties of Markov Chain States
A state s has period k if k is the GCD of the lengths of all cycles that pass through s. Periodic states: a state is periodic if it has period k > 1; in the graph shown, the period of A is 2. Aperiodic states: a state is aperiodic if it has period k = 1; in the graph shown, the period of F is 1.

14 Ergodic Markov Chains
A Markov chain is ergodic if the corresponding graph is irreducible and the chain is not periodic. Ergodic Markov chains are important because they guarantee that the corresponding Markovian process converges to a unique distribution, in which all states have strictly positive probability.

15 Stationary Distributions for Markov Chains
Let M be a Markov chain on m states, and let V = (v_1,…,v_m) be a probability distribution over the m states. V is a stationary distribution for M if VM = V, i.e., one step of the process does not change the distribution. Equivalently: V is a stationary distribution iff V is a left (row) eigenvector of M with eigenvalue 1.
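The eigenvector characterization above translates directly into a computation: a left eigenvector of M for eigenvalue 1 is a right eigenvector of the transpose. A minimal sketch, reusing the weather example's matrix (any stochastic matrix works):

```python
# Sketch: find a stationary distribution V with VM = V via eigendecomposition.
import numpy as np

M = np.array([[0.4, 0.6],
              [0.2, 0.8]])

# Left eigenvectors of M are right eigenvectors of M^T.
vals, vecs = np.linalg.eig(M.T)
v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])  # eigenvector for lambda = 1
v = v / v.sum()                                      # normalize to a distribution
print(v)                                             # (0.25, 0.75) for this chain
assert np.allclose(v @ M, v)                         # check VM = V
```

Normalizing by the sum both scales the eigenvector to total probability 1 and fixes its sign, since for an ergodic chain all components share the same sign.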

16 “Good” Markov Chains
A Markov chain is good if the distributions X_i satisfy the following as i → ∞: they converge to a unique distribution, independent of the initial distribution, and in that unique distribution each state has a positive probability. The Fundamental Theorem of Finite Markov Chains: a Markov chain is good ⟺ the corresponding graph is ergodic.

17 “Bad” Markov Chains
A Markov chain is not “good” if either: it does not converge to a unique distribution; or it does converge to a unique distribution, but some states have zero probability in it. For instance: chains with periodic states, and chains with transient states.

18 An Example: Searching the Genome for CpG Islands
In the human genome, the pair CG appears less often than expected: CG often transforms to (methyl-C)G, which in turn often transforms to TG. Hence the pair CG appears less often than expected from the independent frequencies of C and G alone. For biological reasons, this process is sometimes suppressed in short stretches of the genome, such as the start regions of many genes. These areas are called CpG islands (p denotes “pair”).

19 CpG Islands
We consider two questions (and some variants). Question 1: given a short stretch of genomic data, does it come from a CpG island? Question 2: given a long piece of genomic data, does it contain CpG islands, and if so, where and of what length? We “solve” the first question by modeling strings with and without CpG islands as Markov chains: the states are {A, C, G, T} in both models, but the transition probabilities differ.

20 CpG Islands
The “+” model: use transition matrix A+ = (a+_st), where a+_st is the probability that t follows s inside a CpG island. The “-” model: use transition matrix A- = (a-_st), where a-_st is the probability that t follows s outside a CpG island.

21 CpG Islands
To solve Question 1 we need to decide whether a given short sequence of letters is more likely to come from the “+” model or from the “-” model. This is done using the Markov chain definitions, with parameters estimated from known (labeled) data, together with the log odds-ratio test.

22 CpG Islands – the “+” Model
We need to specify p+(x_i | x_{i-1}), where + stands for CpG island. Durbin et al. give these as a 4 x 4 transition table over {A, C, G, T}. [The table values did not survive extraction; see Durbin et al., Biological Sequence Analysis.]

23 CpG Islands – the “-” Model
p-(x_i | x_{i-1}) for non-CpG islands is given by a corresponding 4 x 4 transition table over {A, C, G, T}. [The table values did not survive extraction.]

24 CpG Islands
Given a string X = (x_1,…,x_L), now compute the ratio RATIO = P(X | + model) / P(X | - model) = prod_{i=2..L} a+_{x_{i-1} x_i} / prod_{i=2..L} a-_{x_{i-1} x_i}. RATIO > 1 → a CpG island is more likely; RATIO < 1 → a non-CpG island is more likely.
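The ratio test above is usually computed in log space (a sum of per-transition log ratios). The tables on the two previous slides were lost in extraction, so the sketch below uses the commonly quoted Durbin et al. values; treat the exact numbers as illustrative:

```python
# Log-odds sketch for Question 1: score(x) = log2[ P(x | +) / P(x | -) ].
# Transition tables follow commonly quoted Durbin et al. values (illustrative).
import math

ACGT = "ACGT"
plus_rows  = [[.180, .274, .426, .120],   # from A:  to A, C, G, T
              [.171, .368, .274, .188],   # from C
              [.161, .339, .375, .125],   # from G
              [.079, .355, .384, .182]]   # from T
minus_rows = [[.300, .205, .285, .210],
              [.322, .298, .078, .302],
              [.248, .246, .298, .208],
              [.177, .239, .292, .292]]

def log_odds(x):
    """Sum of log2(a+_{st} / a-_{st}) over the transitions of x."""
    score = 0.0
    for s, t in zip(x, x[1:]):
        i, j = ACGT.index(s), ACGT.index(t)
        score += math.log2(plus_rows[i][j] / minus_rows[i][j])
    return score

print(log_odds("CGCG") > 0)   # CG-rich strings score like an island
print(log_odds("TATA") < 0)   # TA-rich strings score like background
```

A positive score corresponds to RATIO > 1 (CpG island more likely), a negative score to RATIO < 1; working in logs also avoids numerical underflow on long sequences.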