Download presentation

Presentation is loading. Please wait.

1
**Markov models and applications**

Sushmita Roy BMI/CS 576 Oct 15th, 2013

2
**Key concepts Markov chains Hidden Markov models**

Computing the probability of a sequence Estimating parameters of a Markov model Hidden Markov models States Emission and transition probabilities Parameter estimation Forward and backward algorithm Viterbi algorithm

3
Why Markov models? So far we have assumed in our models that there is no dependency between consecutive base pair locations This assumption is rarely true in practice Markov models allow us to model the dependencies inherent in sequential data such as DNA or protein sequence

4
**Applications of Markov models**

Genome annotation Given a genome sequence find functional untis of the genome Genes, CpG islands, promoters.. Sequence classification A Hidden Markov model (Profile HMMs) can represent a family of proteins Sequence alignment

5
**Markov models Provide a generative model for sequence data**

A Markov chain is the simplest type of Markov models Described by A collection of states, each state representing a symbol observed in the sequence At each state a symbol is emitted At each state there is a some probability of transitioning to another state

6
**Markov chain model notation**

Xt denote the state at time t aij=P(Xt=j|Xt-1=i) denotes the transition probability from i to j

7
**A Markov chain model for DNA sequence**

.34 .16 .38 .12 transition probabilities A G begin C T state transition

8
Markov chain models can also have an end state; allows the model to represent a distribution over sequences of different lengths preferences for ending sequences with certain symbols A G begin end C T

9
**Computing the probability of a sequence from a first Markov model**

Let X be a sequence of random variables X1 … XL representing a biological sequence from the chain rule of probability

10
**Computing the probability of a sequence from a first Markov model**

Key property of a (1st order) Markov chain: the probability of each Xt depends only on the value of Xt-1 This can be written as the initial probabilities or a transition probability from the “begin” state

11
**Example of computing the probability from a Markov chain**

begin end =P(c )P(g|c)P(g|g)P(t|g)P(E|t) c t What is P(CGGT)?

12
**Learning Markov model parameters**

Transition probabilities Estimated via maximum likelihood

13
**Estimating model parameters**

Assume we have some training data that we know was generated from a first order Markov model Need to estimate transition probabilities P(Xt|Xt-1)

14
**Estimating model parameters**

P(X) P(Xt|Xt-1) Number of times x follows G

15
**Estimating parameters example**

Assume the following training data ACCGCGCTTA GCTTAGTGAC TAGCCGTTAC Fill in the zeroth and first order transition probability values Do we really want to set this to 0? A T G C P(A) 6/30 P(T) 8/30 P(G) 7/30 P(C) 9/30 A 0/5 2/5 3/5 1/7 2/7 0/7 4/7 T G C

16
**Laplace estimates of parameters**

instead of estimating parameters strictly from the data, we could start with some prior belief for each for example, we could use Laplace estimates where represents the number of occurrences of character i pseudocount using Laplace estimates with the sequences gccgcgcttg gcttggtggc tggccgttgc

17
**Estimation for 1st order probabilities**

Using Laplace estimates with the sequences for first order probabilities gccgcgcttg gcttggtggc tggccgttgc

18
**Using Markov chains to classify sequences as CpG islands or not**

CpG islands are isolated regions of the genome corresponding to high concentrations of CG dinucleotides Range from bps For example regions upstream of genes or gene promoters Given a short sequence of DNA, how can we classify it as a CpG island?

19
**Build a Markov chain model for CpG islands**

Learn two Markov models One from sequences that look like CpG (positive) One from sequences that don’t look like CpG (negative) Parameters estimated from ~60,000 nucleotides + A C G T .18 .27 .43 .12 .17 .37 .19 .16 .34 .38 .08 .36 - A C G T .30 .21 .28 .32 .08 .25 .24 .18 .29 G is much more likely to follow A in the CpG positive model CpG Not CpG

20
**Using the Markov chains to classify a new sequence**

Let y denote a new sequence To use the Markov models to classify y we compute the log odds ratio The larger the value of S(y) the more likely is y a CpG island Note: must also normalize by the length

21
**Distribution of scores for CpG and non CpG examples**

22
**Applying the CpG Markov chain models**

Is CGCGA a CpG island? Un-normalized Score= =3.3684 Is CCTGG a CpG island? Un-normalized Score=0.3746

23
**Extensions to Markov chains**

Hidden Markov models (Next lectures) Higher-order Markov models Inhomogeneous Markov models

24
**Order of a Markov model Describes how much history we want to keep**

First order Markov models are most common Second order nth order

25
**Higher order Markov models**

An nth order Markov model over alphabet A is equivalent to a first order Markov model An An corresponds to the alphabet of n-tuples For example A 2nd order Markov model over A,T,G,C And a 1st order Markov model over AA,AT,AG,AC,TA,TG,TC,TT,GA,GT,GC,GG,CA,CT,CC,CG Are equivalent However higher the order the more parameters we need to estimate

26
**A first order Markov chain for a second-order Markov chain with {A,B} alphabet**

AA AB BA BB

27
**Inhomogeneous Markov chains**

So far the Markov chain has the same transition probability for the entire sequence Sometimes it is useful to switch between Markov chains depending on where we are in the sequence For example, recall the genetic code that specifies what triplets of bases code for an amino acid There are different levels of redundancy for each the first, second or third positions This could be modeled by three Markov chains

28
**Inhomogeneous Markov chain**

Let 1, 2 and 3 denote the three Markov chains So our transition probabilities look like aij1, aij2, aij3 Let x1 be in codon position 3 Probability of x2, x3.. would be Exercise: write down the terms for x5 and x6

29
Summary Markov models allow us to model dependencies in sequential data Markov models are described by states, each state corresponding to a letter in our observed alphabet Transition probabilities between states Parameter estimation can be done by counting the number of transitions between consecutive states (first order) Laplace correction is often applied Often used for classifying sequences as CpG islands or not Extensions of Markov models Order Inhomogeneous Markov models

Similar presentations

OK

Hidden Markov Model Continues …. Finite State Markov Chain A discrete time stochastic process, consisting of a domain D of m states {1,…,m} and 1.An m.

Hidden Markov Model Continues …. Finite State Markov Chain A discrete time stochastic process, consisting of a domain D of m states {1,…,m} and 1.An m.

© 2017 SlidePlayer.com Inc.

All rights reserved.

Ads by Google

Jit ppt on manufacturing processes Ppt on differential aptitude test Ppt on blue star operation Download ppt on recession in india Ppt on steve jobs as entrepreneur Ppt on brand marketing agencies Ppt on care of public property information Ppt on life and works of william shakespeare Ppt on working of human eye and defects of vision Ppt on stages of group development