Markov Chains

We want a model that generates sequences in which the probability of a symbol depends on the previous symbol only.

Transition probabilities: $a_{st} = P(x_i = t \mid x_{i-1} = s)$

Probability of a sequence: $P(x) = P(x_L \mid x_{L-1})\, P(x_{L-1} \mid x_{L-2}) \cdots P(x_2 \mid x_1)\, P(x_1) = P(x_1) \prod_{i=2}^{L} a_{x_{i-1} x_i}$

Note: this factorization follows from repeatedly applying $P(X, Y) = P(X \mid Y) P(Y)$ together with the Markov property.
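A minimal Python sketch of this factorization, assuming a toy DNA alphabet; the initial and transition probabilities below are illustrative placeholders, not the estimates given later in these slides.

```python
# Minimal sketch: probability of a DNA sequence under a first-order Markov chain.
# The initial and transition probabilities are illustrative placeholders.

initial = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
transition = {                       # a_st = P(x_i = t | x_{i-1} = s)
    "A": {"A": 0.30, "C": 0.20, "G": 0.30, "T": 0.20},
    "C": {"A": 0.25, "C": 0.30, "G": 0.20, "T": 0.25},
    "G": {"A": 0.20, "C": 0.30, "G": 0.30, "T": 0.20},
    "T": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
}

def sequence_probability(x: str) -> float:
    """P(x) = P(x_1) * prod_i a_{x_{i-1} x_i}."""
    p = initial[x[0]]
    for prev, cur in zip(x, x[1:]):
        p *= transition[prev][cur]
    return p

print(sequence_probability("CGCG"))
```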
Markov Chains

The key property of a Markov chain is that the probability of each symbol $x_i$ depends only on the value of the preceding symbol:
$P(x_i \mid x_{i-1}, \ldots, x_1) = P(x_i \mid x_{i-1}) = a_{x_{i-1} x_i}$

Modelling the beginning and end of sequences: add a silent begin state and end state (both labelled 0), so that $a_{0s} = P(x_1 = s)$ models the start of a sequence and $a_{t0} = P(\text{end} \mid x_L = t)$ models its end.
Markov Chains

Markov chains can be used to discriminate between two options by calculating a likelihood ratio.

Example: CpG islands in human DNA
-- regions labelled as CpG islands: + model
-- regions labelled as non-CpG islands: - model

Maximum likelihood estimators for the transition probabilities of the + model:
$a^+_{st} = \dfrac{c^+_{st}}{\sum_{t'} c^+_{st'}}$
and analogously for the - model. $c^+_{st}$ is the number of times letter t followed letter s in the labelled regions.
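A short sketch of this counting estimator in Python; the function name and the labelled fragments are hypothetical and serve only to illustrate the formula.

```python
from collections import defaultdict

def estimate_transitions(sequences):
    """Maximum likelihood estimate a_st = c_st / sum_t' c_st',
    where c_st counts how often letter t follows letter s in the labelled regions."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for s, t in zip(seq, seq[1:]):
            counts[s][t] += 1.0
    probs = {}
    for s, row in counts.items():
        total = sum(row.values())
        probs[s] = {t: c / total for t, c in row.items()}
    return probs

# Hypothetical labelled CpG-island fragments, just for illustration:
plus_model = estimate_transitions(["CGCGGCGC", "GCGCGCGG"])
print(plus_model)
```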
Markov Chains

From 48 putative CpG islands of human DNA one estimates the following transition probabilities. Rows give the preceding nucleotide s, columns the following nucleotide t; note that the tables are asymmetric.

+     A      C      G      T
A   0.180  0.274  0.426  0.120
C   0.171  0.368  0.274  0.188
G   0.161  0.339  0.375  0.125
T   0.079  0.355  0.384  0.182

-     A      C      G      T
A   0.300  0.205  0.285  0.210
C   0.322  0.298  0.078  0.302
G   0.248  0.246  0.298  0.208
T   0.177  0.239  0.292  0.292
Markov Chains

To use the model for discrimination one calculates the log-odds ratio
$S(x) = \log_2 \dfrac{P(x \mid \text{model}+)}{P(x \mid \text{model}-)} = \sum_{i=2}^{L} \log_2 \dfrac{a^+_{x_{i-1} x_i}}{a^-_{x_{i-1} x_i}} = \sum_{i=2}^{L} \beta_{x_{i-1} x_i}$

β (bits)    A        C        G        T
A        -0.740    0.419    0.580   -0.803
C        -0.913    0.302    1.812   -0.685
G        -0.624    0.461    0.331   -0.730
T        -1.169    0.573    0.393   -0.679
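A small Python sketch of the discrimination step, using the β values from the table above; the helper name log_odds and the test sequences are illustrative choices, not part of the slides. A positive score argues for the + (CpG-island) model, a negative score for the - model.

```python
# Log-odds values beta_st (bits) from the table above.
beta = {
    "A": {"A": -0.740, "C": 0.419, "G": 0.580, "T": -0.803},
    "C": {"A": -0.913, "C": 0.302, "G": 1.812, "T": -0.685},
    "G": {"A": -0.624, "C": 0.461, "G": 0.331, "T": -0.730},
    "T": {"A": -1.169, "C": 0.573, "G": 0.393, "T": -0.679},
}

def log_odds(x: str) -> float:
    """S(x) = sum_i beta_{x_{i-1} x_i}; positive scores favour the + (CpG) model."""
    return sum(beta[s][t] for s, t in zip(x, x[1:]))

print(log_odds("CGCG"))    # clearly positive: looks like a CpG island
print(log_odds("TATATA"))  # clearly negative: looks like background DNA
```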
Hidden Markov Models

How can one find CpG islands in a long chain of nucleotides?

Merge both models into one model with small transition probabilities between the chains. Within each chain the transition probabilities should remain close to the original ones.

Relabelling of the states: the states A+, C+, G+, T+ emit the symbols A, C, G, T (and likewise A-, C-, G-, T-).

The relabelling is critical, as there is no one-to-one correspondence between the states and the symbols: from looking at a C in isolation one cannot tell whether it was emitted from C+ or C-.
Hidden Markov Models: Formal Definitions

Distinguish the sequence of states from the sequence of symbols. Call the state sequence the path π. It follows a simple Markov model with transition probabilities
$a_{kl} = P(\pi_i = l \mid \pi_{i-1} = k)$

As the symbols b are decoupled from the states k, new parameters are needed giving the probability that symbol b is seen when in state k:
$e_k(b) = P(x_i = b \mid \pi_i = k)$
These are known as emission probabilities.
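As a concrete illustration of these definitions, a minimal parameter set in Python; for brevity it uses a simplified two-state '+'/'-' model with made-up numbers rather than the eight-state model of the previous slide.

```python
# Illustrative HMM parameters: states k with transitions a_kl and emissions e_k(b).
# A simplified two-state '+'/'-' CpG model with made-up switching probabilities.

states = ["+", "-"]
symbols = ["A", "C", "G", "T"]

a = {   # a_kl = P(pi_i = l | pi_{i-1} = k)
    "+": {"+": 0.95, "-": 0.05},
    "-": {"+": 0.10, "-": 0.90},
}
e = {   # e_k(b) = P(x_i = b | pi_i = k)
    "+": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
    "-": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
}
start = {"+": 0.5, "-": 0.5}   # a_0k: transitions out of the begin state

# Sanity checks: transition and emission rows are proper distributions.
assert all(abs(sum(a[k].values()) - 1.0) < 1e-9 for k in states)
assert all(set(e[k]) == set(symbols) and abs(sum(e[k].values()) - 1.0) < 1e-9 for k in states)
```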
Hidden Markov Models: The Viterbi Algorithm

It is the most commonly used decoding algorithm for HMMs. It is a dynamic programming algorithm.

There may be many state sequences which give rise to any particular sequence of symbols, but the corresponding probabilities are very different.

CpG islands: the paths
(C+, G+, C+, G+), (C-, G-, C-, G-), (C+, G-, C+, G-)
all generate the symbol sequence CGCG, but the first has the highest probability.
Hidden Markov Models

Search recursively for the most probable path. Suppose the probability $v_k(i)$ of the most probable path ending in state k with observation $x_i$ is known for all states k. Then this probability can be calculated for observation $x_{i+1}$ by
$v_l(i+1) = e_l(x_{i+1}) \max_k \left( v_k(i)\, a_{kl} \right)$
with initial condition
$v_0(0) = 1, \qquad v_k(0) = 0 \text{ for } k > 0$ (the path must start in the begin state).
Hidden Markov Models: CpG Islands and the Sequence CGCG

Viterbi variables $v_l(i)$ for the symbol sequence CGCG:

v_l(i)       C        G        C        G
B       1    0        0        0        0
A+      0    0        0        0        0
C+      0    0.13     0        0.012    0
G+      0    0        0.034    0        0.0032
T+      0    0        0        0        0
A-      0    0        0        0        0
C-      0    0.13     0        0.0026   0
G-      0    0        0.010    0        0.00021
T-      0    0        0        0        0
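A sketch of the Viterbi recursion in Python, assuming a simplified two-state '+'/'-' model with made-up parameters (so its numbers will not reproduce the table above); the traceback pointers recover the most probable path.

```python
def viterbi(x, states, a, e, start):
    """Most probable state path: v_l(i) = e_l(x_i) * max_k v_k(i-1) * a_kl."""
    v = [{k: start[k] * e[k][x[0]] for k in states}]
    ptr = [{}]
    for i in range(1, len(x)):
        col, back = {}, {}
        for l in states:
            best_k = max(states, key=lambda k: v[i - 1][k] * a[k][l])
            col[l] = e[l][x[i]] * v[i - 1][best_k] * a[best_k][l]
            back[l] = best_k
        v.append(col)
        ptr.append(back)
    # Traceback from the best final state.
    last = max(states, key=lambda k: v[-1][k])
    path = [last]
    for i in range(len(x) - 1, 0, -1):
        path.append(ptr[i][path[-1]])
    return list(reversed(path)), v[-1][last]

# Toy two-state '+'/'-' model with made-up parameters (not the slides' 8-state model):
states = ["+", "-"]
a = {"+": {"+": 0.9, "-": 0.1}, "-": {"+": 0.1, "-": 0.9}}
e = {"+": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
     "-": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}}
start = {"+": 0.5, "-": 0.5}
print(viterbi("CGCG", states, a, e, start))
```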
Hidden Markov Models: The Forward Algorithm

As many different paths π can give rise to the same sequence, the probability of a sequence P(x) is
$P(x) = \sum_{\pi} P(x, \pi)$

Brute-force enumeration is not practical, as the number of paths rises exponentially with the length of the sequence.

A simple approximation is to evaluate $P(x, \pi)$ at the most probable path only:
$P(x) \approx P(x, \pi^*)$
Hidden Markov Models

The full probability P(x) can be calculated in a recursive way with dynamic programming. This is called the forward algorithm.

Calculate the probability $f_k(i)$ of the observed sequence up to and including $x_i$ under the constraint that $\pi_i = k$:
$f_k(i) = P(x_1 \ldots x_i, \pi_i = k)$

The recursion equation is
$f_l(i+1) = e_l(x_{i+1}) \sum_k f_k(i)\, a_{kl}$
Hidden Markov Models: Forward Algorithm

Initialization (i = 0): $f_0(0) = 1$, $f_k(0) = 0$ for $k > 0$

Recursion (i = 1 ... L): $f_l(i) = e_l(x_i) \sum_k f_k(i-1)\, a_{kl}$

Termination: $P(x) = \sum_k f_k(L)\, a_{k0}$
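A corresponding Python sketch of the forward algorithm, again on a made-up two-state model; without an explicit end state the termination simply sums $f_k(L)$ over all states.

```python
def forward(x, states, a, e, start, end=None):
    """f_k(i) = P(x_1..x_i, pi_i = k); recursion f_l(i) = e_l(x_i) * sum_k f_k(i-1) a_kl."""
    f = [{k: start[k] * e[k][x[0]] for k in states}]
    for i in range(1, len(x)):
        f.append({l: e[l][x[i]] * sum(f[i - 1][k] * a[k][l] for k in states)
                  for l in states})
    # Termination: P(x) = sum_k f_k(L) * a_k0 (a_k0 = 1 if no explicit end state).
    end = end or {k: 1.0 for k in states}
    return f, sum(f[-1][k] * end[k] for k in states)

# Toy two-state model, same made-up parameters as in the Viterbi sketch:
states = ["+", "-"]
a = {"+": {"+": 0.9, "-": 0.1}, "-": {"+": 0.1, "-": 0.9}}
e = {"+": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
     "-": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}}
start = {"+": 0.5, "-": 0.5}
f, px = forward("CGCG", states, a, e, start)
print(px)   # full probability P(x), summed over all paths
```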
Hidden Markov Models: The Backward Algorithm

What is the most probable state for an observation $x_i$? What is the probability $P(\pi_i = k \mid x)$ that observation $x_i$ came from state k, given the observed sequence? This is the posterior probability of state k at time i when the emitted sequence is known.

First calculate the probability of producing the entire observed sequence with the i-th symbol being produced by state k:
$P(x, \pi_i = k) = P(x_1 \ldots x_i, \pi_i = k)\, P(x_{i+1} \ldots x_L \mid \pi_i = k) = f_k(i)\, b_k(i)$
The second factor is computed by the backward recursion $b_k(i) = \sum_l a_{kl}\, e_l(x_{i+1})\, b_l(i+1)$, initialized with $b_k(L) = a_{k0}$ (or 1 if no explicit end state is modelled).
Hidden Markov Models: Posterior Probabilities

From the forward and backward variables the posterior probabilities can be obtained:
$P(\pi_i = k \mid x) = \dfrac{f_k(i)\, b_k(i)}{P(x)}$
where P(x) is the result of the forward algorithm.
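A sketch of the backward recursion and the resulting posterior decoding in Python, assuming the same made-up two-state model and no explicit end state; $f_k(i)\, b_k(i) / P(x)$ then gives the posterior column for each position.

```python
def backward(x, states, a, e, end=None):
    """b_k(i) = P(x_{i+1}..x_L | pi_i = k); b_k(i) = sum_l a_kl e_l(x_{i+1}) b_l(i+1)."""
    end = end or {k: 1.0 for k in states}
    b = [dict() for _ in x]
    b[-1] = {k: end[k] for k in states}
    for i in range(len(x) - 2, -1, -1):
        b[i] = {k: sum(a[k][l] * e[l][x[i + 1]] * b[i + 1][l] for l in states)
                for k in states}
    return b

def posterior(x, states, a, e, start):
    """P(pi_i = k | x) = f_k(i) * b_k(i) / P(x)."""
    f = [{k: start[k] * e[k][x[0]] for k in states}]
    for i in range(1, len(x)):
        f.append({l: e[l][x[i]] * sum(f[i - 1][k] * a[k][l] for k in states)
                  for l in states})
    px = sum(f[-1][k] for k in states)
    b = backward(x, states, a, e)
    return [{k: f[i][k] * b[i][k] / px for k in states} for i in range(len(x))]

# Toy two-state model with made-up parameters:
states = ["+", "-"]
a = {"+": {"+": 0.9, "-": 0.1}, "-": {"+": 0.1, "-": 0.9}}
e = {"+": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
     "-": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}}
start = {"+": 0.5, "-": 0.5}
for i, col in enumerate(posterior("CGCGTATA", states, a, e, start)):
    print(i, col)
```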
Hidden Markov Models: Parameter Estimation for HMMs

Two problems remain:
1) how to choose an appropriate model architecture
2) how to assign the transition and emission probabilities

Assumption: independent training sequences x^1, ..., x^n are given. Consider the log likelihood
$\log P(x^1, \ldots, x^n \mid \theta) = \sum_{j=1}^{n} \log P(x^j \mid \theta)$
where θ represents the set of values of all parameters ($a_{kl}$, $e_k(b)$).
Hidden Markov Models: Estimation with Known State Sequences

Assume the paths are known for all training sequences. Count the numbers $A_{kl}$ and $E_k(b)$ of times each particular transition or emission is used in the set of training sequences, plus pseudocounts $r_{kl}$ and $r_k(b)$, respectively.

The maximum likelihood estimators for $a_{kl}$ and $e_k(b)$ are then given by
$a_{kl} = \dfrac{A_{kl}}{\sum_{l'} A_{kl'}}, \qquad e_k(b) = \dfrac{E_k(b)}{\sum_{b'} E_k(b')}$
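A Python sketch of these estimators for the case of known paths; the training sequences, labels, and pseudocount value are hypothetical.

```python
def estimate_from_labelled(sequences, paths, states, symbols, pseudo=1.0):
    """ML estimators with known state paths:
       a_kl = A_kl / sum_l' A_kl',   e_k(b) = E_k(b) / sum_b' E_k(b'),
       where the counts A and E already include the pseudocounts."""
    A = {k: {l: pseudo for l in states} for k in states}
    E = {k: {b: pseudo for b in symbols} for k in states}
    for x, pi in zip(sequences, paths):
        for i, (b, k) in enumerate(zip(x, pi)):
            E[k][b] += 1
            if i > 0:
                A[pi[i - 1]][k] += 1
    a = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
    e = {k: {b: E[k][b] / sum(E[k].values()) for b in symbols} for k in states}
    return a, e

# Hypothetical training data: sequences with hand-labelled '+'/'-' paths.
seqs  = ["CGCGTATA", "TTACGCGC"]
paths = ["++++----", "--++++++"]
a, e = estimate_from_labelled(seqs, paths, ["+", "-"], list("ACGT"))
print(a)
print(e)
```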
Hidden Markov Models: Estimation with Unknown Paths

Iterative procedures must be used to estimate the parameters. All standard algorithms for optimization of continuous functions can be used, but one particular iterative method is the standard choice: the Baum-Welch algorithm.
-- first, estimate $A_{kl}$ and $E_k(b)$ by considering probable paths for the training sequences, using the current values of $a_{kl}$ and $e_k(b)$
-- second, use the maximum likelihood estimators to obtain new transition and emission parameters
-- iterate this process until a stopping criterion is met
-- many local maxima exist, particularly with large HMMs
Hidden Markov Models: Baum-Welch Algorithm

It calculates $A_{kl}$ and $E_k(b)$ as the expected number of times each transition or emission is used in the training sequences. It uses the values of the forward and backward algorithms.

The probability that the transition $a_{kl}$ is used at position i in sequence x is
$P(\pi_i = k, \pi_{i+1} = l \mid x, \theta) = \dfrac{f_k(i)\, a_{kl}\, e_l(x_{i+1})\, b_l(i+1)}{P(x)}$
Hidden Markov Models: Baum-Welch Algorithm

The expected number of times $a_{kl}$ is used is then obtained by summing over all positions and over all training sequences:
$A_{kl} = \sum_j \dfrac{1}{P(x^j)} \sum_i f_k^j(i)\, a_{kl}\, e_l(x^j_{i+1})\, b_l^j(i+1)$

The expected number of times that letter b appears in state k is given by
$E_k(b) = \sum_j \dfrac{1}{P(x^j)} \sum_{\{i \mid x_i^j = b\}} f_k^j(i)\, b_k^j(i)$
Hidden Markov Models: Baum-Welch Algorithm

Initialization: pick arbitrary model parameters.
Recurrence: set all A and E variables to their pseudocount values r (or to zero). For each sequence j = 1 ... n:
-- calculate $f_k(i)$ for sequence j using the forward algorithm
-- calculate $b_k(i)$ for sequence j using the backward algorithm
-- add the contribution of sequence j to A and E
Then calculate the new model parameters with the maximum likelihood estimators and calculate the new log likelihood of the model.
Termination: stop if the change in log likelihood is less than some threshold.
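A compact Python sketch of one possible implementation of this loop, under simplifying assumptions: no explicit end state, a fixed number of iterations instead of a log-likelihood threshold, and a fixed start distribution that is not re-estimated.

```python
def baum_welch(seqs, states, symbols, a, e, start, n_iter=20, pseudo=0.1):
    """Baum-Welch loop: E-step via forward/backward expected counts,
    M-step via the maximum likelihood estimators, repeated n_iter times."""
    for _ in range(n_iter):
        A = {k: {l: pseudo for l in states} for k in states}
        E = {k: {s: pseudo for s in symbols} for k in states}
        for x in seqs:
            # Forward variables f_k(i).
            f = [{k: start[k] * e[k][x[0]] for k in states}]
            for i in range(1, len(x)):
                f.append({l: e[l][x[i]] * sum(f[i-1][k] * a[k][l] for k in states)
                          for l in states})
            px = sum(f[-1][k] for k in states)
            # Backward variables b_k(i).
            b = [dict() for _ in x]
            b[-1] = {k: 1.0 for k in states}
            for i in range(len(x) - 2, -1, -1):
                b[i] = {k: sum(a[k][l] * e[l][x[i+1]] * b[i+1][l] for l in states)
                        for k in states}
            # Expected emission and transition counts for this sequence.
            for i in range(len(x)):
                for k in states:
                    E[k][x[i]] += f[i][k] * b[i][k] / px
                    if i + 1 < len(x):
                        for l in states:
                            A[k][l] += f[i][k] * a[k][l] * e[l][x[i+1]] * b[i+1][l] / px
        # M-step: re-estimate the parameters from the expected counts.
        a = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
        e = {k: {s: E[k][s] / sum(E[k].values()) for s in symbols} for k in states}
    return a, e

# Toy run on made-up sequences, starting from rough two-state parameters:
states, symbols = ["+", "-"], list("ACGT")
a0 = {"+": {"+": 0.8, "-": 0.2}, "-": {"+": 0.2, "-": 0.8}}
e0 = {"+": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
      "-": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}}
start = {"+": 0.5, "-": 0.5}
a_new, e_new = baum_welch(["CGCGTATATT", "ATCGCGCGAT"], states, symbols, a0, e0, start)
print(a_new)
```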
Hidden Markov Models: Baum-Welch Algorithm

The Baum-Welch algorithm is a special case of an Expectation-Maximization (EM) algorithm.

As an alternative, Viterbi training can be used as well: the most probable paths are estimated with the Viterbi algorithm and then used in the iterative re-estimation process. Convergence is guaranteed because the assignment of the paths is a discrete process.

Unlike Baum-Welch, this procedure does not maximize the true likelihood $P(x^1, \ldots, x^n \mid \theta)$ regarded as a function of the model parameters θ. It finds the value of θ that maximizes the contribution to the likelihood $P(x^1, \ldots, x^n \mid \theta, \pi^*(x^1), \ldots, \pi^*(x^n))$ from the most probable paths for all sequences.
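For comparison, a sketch of Viterbi training under the same simplifying assumptions: each sequence is decoded with the current parameters and the hard path assignments are then counted as if they were known.

```python
def viterbi_training(seqs, states, symbols, a, e, start, n_iter=10, pseudo=1.0):
    """Viterbi training: label each sequence with its most probable path,
    then re-estimate a_kl and e_k(b) from those hard assignments."""
    for _ in range(n_iter):
        A = {k: {l: pseudo for l in states} for k in states}
        E = {k: {s: pseudo for s in symbols} for k in states}
        for x in seqs:
            # Viterbi decoding with the current parameters.
            v = [{k: start[k] * e[k][x[0]] for k in states}]
            ptr = [{}]
            for i in range(1, len(x)):
                col, back = {}, {}
                for l in states:
                    k_best = max(states, key=lambda k: v[i-1][k] * a[k][l])
                    col[l] = e[l][x[i]] * v[i-1][k_best] * a[k_best][l]
                    back[l] = k_best
                v.append(col)
                ptr.append(back)
            path = [max(states, key=lambda k: v[-1][k])]
            for i in range(len(x) - 1, 0, -1):
                path.append(ptr[i][path[-1]])
            path.reverse()
            # Count transitions and emissions along the decoded path.
            for i, (sym, k) in enumerate(zip(x, path)):
                E[k][sym] += 1
                if i > 0:
                    A[path[i-1]][k] += 1
        a = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
        e = {k: {s: E[k][s] / sum(E[k].values()) for s in symbols} for k in states}
    return a, e

# Toy run with the same made-up starting parameters as above:
states, symbols = ["+", "-"], list("ACGT")
a0 = {"+": {"+": 0.8, "-": 0.2}, "-": {"+": 0.2, "-": 0.8}}
e0 = {"+": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
      "-": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}}
start = {"+": 0.5, "-": 0.5}
print(viterbi_training(["CGCGTATATT", "ATCGCGCGAT"], states, symbols, a0, e0, start))
```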