Bioinformatics Hidden Markov Models

Markov Random Processes

A random sequence has the Markov property if its distribution is determined solely by its current state. Any random process having this property is called a Markov random process.

- For observable state sequences (the state is known from the data), this leads to a Markov chain model.
- For non-observable (hidden) states, this leads to a Hidden Markov Model (HMM).

The casino models

Game: you bet $1, you roll (always with a fair die), the casino player rolls (maybe with a fair die, maybe with a loaded die), and the highest number wins $1.

- Honest casino: it has one die, a fair die with P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6.
- Crooked casino: it has one die, a loaded die with P(1) = P(2) = … = P(5) = 1/10 and P(6) = 1/2.
- Dishonest casino: it has two dice, the fair die and the loaded die, and the casino player switches back and forth between them roughly once every 20 turns.

The casino models (diagrams of the honest, dishonest, and crooked casino models; details on the next slides)

The casino models (only one die)

- Honest casino: a single FAIR state that always transitions to itself and emits P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6.
- Crooked casino: a single LOADED state that always transitions to itself and emits P(1) = P(2) = P(3) = P(4) = P(5) = 1/10, P(6) = 1/2.

The casino models

Dishonest casino: a two-state HMM with states FAIR (F) and LOADED (L). The initial state I moves to F or L with probability 0.5 each; each state stays in itself with probability 0.95 and switches to the other state with probability 0.05. Emissions: P(1|F) = P(2|F) = P(3|F) = P(4|F) = P(5|F) = P(6|F) = 1/6; P(1|L) = P(2|L) = P(3|L) = P(4|L) = P(5|L) = 1/10, P(6|L) = 1/2.

The honest and crooked casino models

Let the sequence of rolls be x = 1, 2, 1, 5, 6, 2, 1, 6, 2, 4.

- What is the likelihood under the honest casino, i.e. of π = F, F, F, F, F, F, F, F, F, F? P(x|H) = P(1) P(2) … P(4) = (1/6)^10 = 1.7 × 10^-8.
- And the likelihood under the crooked casino, i.e. of π = L, L, L, L, L, L, L, L, L, L? P(x|C) = P(1) … P(4) = (1/10)^8 × (1/2)^2 = 2.5 × 10^-9.

Therefore it is about 6.8 times more likely that the model was honest all the way than that it was crooked all the way.

The honest and crooked casino models

Now let the sequence of rolls be x = 1, 6, 6, 5, 6, 2, 6, 6, 3, 6.

- What is the likelihood under the honest casino, i.e. of π = F, F, F, F, F, F, F, F, F, F? P(x|H) = P(1) P(6) … P(6) = (1/6)^10 = 1.7 × 10^-8.
- And the likelihood under the crooked casino, i.e. of π = L, L, L, L, L, L, L, L, L, L? P(x|C) = P(1) … P(6) = (1/10)^4 × (1/2)^6 = 1.6 × 10^-6.

Therefore it is now about 94 times more likely that the model was crooked all the way than that it was honest all the way.
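As a quick check of these calculations, here is a minimal Python sketch (not part of the original slides; the die dictionaries and function name are ours) that computes both likelihoods and their ratio for an arbitrary sequence of rolls.

```python
# Minimal sketch: likelihood of a roll sequence under the honest (all-fair)
# and crooked (all-loaded) casino models described above.

FAIR = {r: 1/6 for r in range(1, 7)}                          # fair die
LOADED = {r: (1/2 if r == 6 else 1/10) for r in range(1, 7)}  # loaded die

def likelihood(rolls, die):
    """Product of the per-roll emission probabilities."""
    p = 1.0
    for r in rolls:
        p *= die[r]
    return p

rolls = [1, 6, 6, 5, 6, 2, 6, 6, 3, 6]
p_honest = likelihood(rolls, FAIR)     # ~1.65e-08
p_crooked = likelihood(rolls, LOADED)  # ~1.56e-06
print(p_honest, p_crooked, p_crooked / p_honest)  # ratio ~94
```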

Representation of a HMM

Definition: a hidden Markov model (HMM) consists of
- An alphabet Σ = {a, b, c, …} = {b_1, b_2, …, b_M}.
- A set of states Q = {1, …, q}.
- Transition probabilities between any two states: p_ij = probability of moving from state i to state j, with p_i1 + … + p_iq = 1 for all states i = 1…q.
- Start probabilities p_0i such that p_01 + … + p_0q = 1.
- Emission probabilities within each state: e_i(b) = P(x = b | state = i), with e_i(b_1) + … + e_i(b_M) = 1 for all states i = 1…q.
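To make the definition concrete, here is one possible plain-Python encoding of the dishonest casino HMM as dictionaries for start, transition and emission probabilities (the variable names are ours, not from the slides; the 0.95/0.05 transition values follow the "switches roughly once every 20 turns" description). Later sketches reuse these dictionaries.

```python
# One possible encoding of an HMM as plain dictionaries.
# States: 'F' (fair) and 'L' (loaded); symbols: die faces 1..6.

states = ['F', 'L']
symbols = [1, 2, 3, 4, 5, 6]

start = {'F': 0.5, 'L': 0.5}                      # p_0i

trans = {                                         # p_ij, each row sums to 1
    'F': {'F': 0.95, 'L': 0.05},
    'L': {'F': 0.05, 'L': 0.95},
}

emit = {                                          # e_i(b), each row sums to 1
    'F': {r: 1/6 for r in symbols},
    'L': {r: (1/2 if r == 6 else 1/10) for r in symbols},
}

# Sanity checks that the rows are proper probability distributions.
assert abs(sum(start.values()) - 1) < 1e-9
assert all(abs(sum(row.values()) - 1) < 1e-9 for row in trans.values())
assert all(abs(sum(row.values()) - 1) < 1e-9 for row in emit.values())
```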

General questions

Evaluation problem: how likely is this sequence, given our model of how the casino works?
GIVEN an HMM M and a sequence x, FIND Prob[x | M].

Decoding problem: what portion of the sequence was generated with the fair die, and what portion with the loaded die?
GIVEN an HMM M and a sequence x, FIND the sequence π of states that maximizes P[x, π | M].

Learning problem: how "loaded" is the loaded die? How "fair" is the fair die? How often does the casino player change from fair to loaded, and back? Are there only two dice?
GIVEN an HMM M with unspecified transition/emission probabilities θ, and a sequence x, FIND parameters θ that maximize P[x | θ].

Evaluation problem: Forward Algorithm

We want to calculate P(x | M) = probability of a sequence x, given the HMM M = sum over all possible ways of generating x.

Given x = 1, 4, 2, 3, 6, 6, 3, …, how many state paths can generate x?
- Honest casino: only one way (F, F, F, …).
- Crooked casino: only one way (L, L, L, …).
- Dishonest casino: many ways (any sequence of F's and L's).

Evaluation problem: Forward Algorithm

We want to calculate P(x | D) = probability of x, given the dishonest-casino HMM D = sum over all possible ways of generating x.

Given x = 1, 4, 2, 3, 6, 6, 3, …, how many state paths can generate x? 2^|x|.

Naïve computation is very expensive: given |x| characters and N states, there are N^|x| possible state sequences. Even small HMMs, with |x| = 10 and N = 10, contain 10 billion different paths!

Evaluation problem: Forward Algorithm

P(x) = probability of x, given the HMM D = sum over all possible ways of generating x = Σ_π P(x, π) = Σ_π P(x | π) P(π).

Then define f_k(i) = P(x_1 … x_i, π_i = k), the forward probability: the probability of the prefix x_1 x_2 … x_i ending in state k. For each position i we keep one value per state: f_1(i), f_2(i), …, f_q(i).

Evaluation problem: Forward Algorithm

The forward probability recurrence:
f_k(i) = P(x_1 … x_i, π_i = k)
       = Σ_{h=1..q} P(x_1 … x_{i-1}, π_{i-1} = h) p_hk e_k(x_i)
       = e_k(x_i) Σ_{h=1..q} P(x_1 … x_{i-1}, π_{i-1} = h) p_hk
       = e_k(x_i) Σ_{h=1..q} f_h(i-1) p_hk

with f_0(0) = 1 and f_k(0) = 0 for all k > 0, and cost: space O(Nq), time O(Nq^2), where N = |x| is the sequence length and q the number of states.
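A direct translation of this recurrence into Python might look like the following sketch, using the `states`, `start`, `trans`, `emit` dictionaries defined earlier (the function name is ours).

```python
def forward(x, states, start, trans, emit):
    """Forward algorithm: returns the table f[i][k] = P(x_1..x_{i+1}, pi = k)
    (0-indexed positions) and the total probability P(x).
    Runs in O(|x| * q^2) time and O(|x| * q) space."""
    f = [{k: start[k] * emit[k][x[0]] for k in states}]   # f_k(1) = p_0k * e_k(x_1)
    for i in range(1, len(x)):
        prev = f[-1]                                      # column i-1
        f.append({
            k: emit[k][x[i]] * sum(prev[h] * trans[h][k] for h in states)
            for k in states
        })
    return f, sum(f[-1][k] for k in states)               # P(x) = sum_k f_k(n)
```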

The dishonest casino model

Worked example of the forward recurrence f_k(i) = e_k(x_i) Σ_{h=1..q} f_h(i-1) p_hk on the dishonest casino HMM, for the sequence of rolls x = 1, 2, 5.

First column (x_1 = 1): f_F(1) = 1/6 × 0.5 = 1/12 ≈ 0.083; f_L(1) = 1/10 × 0.5 = 1/20 = 0.05.

The dishonest casino model

Second column (x_2 = 2): f_F(2) = 1/6 × (1/12 × 0.95 + 0.05 × 0.05) ≈ 0.0136; f_L(2) = 1/10 × (1/12 × 0.05 + 0.05 × 0.95) ≈ 0.0052.

The dishonest casino model

Third column (x_3 = 5): f_F(3) = 1/6 × (0.0136 × 0.95 + 0.0052 × 0.05) ≈ 0.0022; f_L(3) = 1/10 × (0.0136 × 0.05 + 0.0052 × 0.95) ≈ 0.00056.

Then P(125) = f_F(3) + f_L(3) ≈ 0.0028.
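Running the `forward` sketch above on the three-roll sequence reproduces these numbers (assuming the 0.95/0.05 transitions and 0.5 start probabilities used here).

```python
f, px = forward([1, 2, 5], states, start, trans, emit)
for i, col in enumerate(f, start=1):
    print(i, {k: round(v, 5) for k, v in col.items()})
# 1 {'F': 0.08333, 'L': 0.05}
# 2 {'F': 0.01361, 'L': 0.00517}
# 3 {'F': 0.0022,  'L': 0.00056}
print(round(px, 5))   # P(125) ~ 0.00276
```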

The dishonest casino model

- For a long sequence S generated by the honest casino: Prob(S | Honest casino model) = exp(-896), Prob(S | Dishonest casino model) = exp(-916).
- For a long sequence S generated by the dishonest casino: Prob(S | Honest casino model) = exp(-896), Prob(S | Dishonest casino model) = exp(-847).

General questions

Evaluation problem: how likely is this sequence, given our model of how the casino works?
GIVEN an HMM M and a sequence x, FIND Prob[x | M].

Decoding problem: what portion of the sequence was generated with the fair die, and what portion with the loaded die?
GIVEN an HMM M and a sequence x, FIND the sequence π of states that maximizes P[x, π | M].

Learning problem: how "loaded" is the loaded die? How "fair" is the fair die? How often does the casino player change from fair to loaded, and back?
GIVEN an HMM M with unspecified transition/emission probabilities θ, and a sequence x, FIND parameters θ that maximize P[x | θ].

Decoding problem

We want to calculate the path π* such that π* = argmax_π P(x, π | M), i.e. the sequence π of states that maximizes P(x, π | M).

Naïve computation is very expensive: given |x| characters and N states, there are N^|x| possible state sequences.

Decoding problem: Viterbi algorithm

π* = argmax_π P(x, π | M) = the sequence π of states that maximizes P(x, π | M).

Then define v_k(i) = max_{π_1…π_{i-1}} P(x_1 … x_i, π_1 … π_{i-1}, π_i = k): the probability of the most likely sequence of states for the prefix x_1 x_2 … x_i that ends in state k. For each position i we keep one value per state: v_1(i), v_2(i), …, v_q(i).

Decoding problem: Viterbi algorithm

The Viterbi recurrence:
v_k(i) = max_{π_1…π_{i-1}} P(x_1 … x_i, π_1 … π_{i-1}, π_i = k)
       = max_h [v_h(i-1) p_hk e_k(x_i)]
       = e_k(x_i) max_h [p_hk v_h(i-1)]

with the same initialization as the forward algorithm (v_0(0) = 1, v_k(0) = 0 for k > 0). Recording, for each cell, the state h that achieves the maximum lets us recover the optimal path by traceback.
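The recurrence translates into code almost identically to the forward algorithm, replacing the sum by a max and keeping back-pointers for the traceback. Again a sketch, reusing the dictionaries defined earlier.

```python
def viterbi(x, states, start, trans, emit):
    """Viterbi algorithm: returns the most probable state path for x
    and its joint probability P(x, pi*)."""
    v = [{k: start[k] * emit[k][x[0]] for k in states}]    # v_k(1)
    back = [{}]                                            # back-pointers per column
    for i in range(1, len(x)):
        col, ptr = {}, {}
        for k in states:
            # Best previous state h for ending column i in state k.
            best_h = max(states, key=lambda h: v[-1][h] * trans[h][k])
            col[k] = emit[k][x[i]] * v[-1][best_h] * trans[best_h][k]
            ptr[k] = best_h
        v.append(col)
        back.append(ptr)
    # Traceback from the best final state.
    last = max(states, key=lambda k: v[-1][k])
    path = [last]
    for i in range(len(x) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path, v[-1][last]
```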

The dishonest casino model

Worked example of the Viterbi recurrence v_k(i) = e_k(x_i) max_{h=1..q} v_h(i-1) p_hk on the dishonest casino HMM, for the same sequence x = 1, 2, 5.

First column (x_1 = 1): v_F(1) = 1/6 × 0.5 = 1/12 ≈ 0.083; v_L(1) = 1/10 × 0.5 = 0.05.

The dishonest casino model

Second column (x_2 = 2): v_F(2) = max(1/12 × 0.95 × 1/6, 0.05 × 0.05 × 1/6) = max(0.013, 0.0004) ≈ 0.013, best path FF; v_L(2) = max(1/12 × 0.05 × 0.1, 0.05 × 0.95 × 0.1) = max(0.0004, 0.0047) ≈ 0.0047, best path LL.

The dishonest casino model

Third column (x_3 = 5): v_F(3) = max(0.013 × 0.95 × 1/6, 0.0047 × 0.05 × 1/6) ≈ max(0.0021, 0.00004) ≈ 0.0021, best path FFF; v_L(3) = max(0.013 × 0.05 × 0.1, 0.0047 × 0.95 × 0.1) ≈ max(0.00007, 0.00045) ≈ 0.00045, best path LLL.

Since v_F(3) > v_L(3), the most probable path is FFF!
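Applying the `viterbi` sketch to the same three rolls returns that path (again under the assumed 0.95/0.05 transitions).

```python
path, p = viterbi([1, 2, 5], states, start, trans, emit)
print(path, round(p, 5))   # ['F', 'F', 'F'] 0.00209
```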

The dishonest casino model (example: a long sequence of rolls from the dishonest casino shown together with the corresponding sequence of hidden states)

General questions

Evaluation problem: how likely is this sequence, given our model of how the casino works?
GIVEN an HMM M and a sequence x, FIND Prob[x | M].

Decoding problem: what portion of the sequence was generated with the fair die, and what portion with the loaded die?
GIVEN an HMM M and a sequence x, FIND the sequence π of states that maximizes P[x, π | M].

Learning problem: how "loaded" is the loaded die? How "fair" is the fair die? How often does the casino player change from fair to loaded, and back?
GIVEN an HMM M with unspecified transition/emission probabilities θ, and a sequence x, FIND parameters θ that maximize P[x | θ].

Learning problem

How "loaded" is the loaded die? How "fair" is the fair die? How often does the casino player change from fair to loaded, and back?
GIVEN an HMM M with unspecified transition/emission probabilities θ, and a sequence x, FIND parameters θ that maximize P[x | θ].

We need a training data set. It could be:
- A sequence of pairs (x, π) = (x_1, π_1), (x_2, π_2), …, (x_n, π_n), where we know both the emitted values and the states.
- A sequence of values only, x = x_1, x_2, …, x_n, where we know only the emitted values.

Learning problem: given (x, π), i = 1..n

From the training set we can define:
- H_ki as the number of times the transition from state k to state i appears in the training set.
- J_l(r) as the number of times the value r is emitted by state l.

For instance, given a training set of rolls labelled with the die that produced them (fair or loaded), we might obtain:
H_FF = 51, H_FL = 4, H_LF = 4, H_LL = 26
J_F(1) = 10, J_F(2) = 11, J_F(3) = 9, J_F(4) = 12, J_F(5) = 8, J_F(6) = 6
J_L(1) = 0, J_L(2) = 5, J_L(3) = 6, J_L(4) = 3, J_L(5) = 1, J_L(6) = 14

Learning problem: given (x, π), i = 1..n

From the training set we have computed:
- H_ki, the number of times the transition from state k to state i appears in the training set.
- J_l(r), the number of times the value r is emitted by state l.

And we estimate the parameters of the HMM as
- p_kl = H_kl / (H_k1 + … + H_kq)
- e_l(r) = J_l(r) / (J_l(b_1) + … + J_l(b_M))  (normalizing over all values emitted by state l)

For the counts above:
p_FF = 51/55 ≈ 0.93, p_FL = 4/55 ≈ 0.07, p_LF = 4/30 ≈ 0.13, p_LL = 26/30 ≈ 0.87
e_F(1) = 10/56, e_F(2) = 11/56, e_F(3) = 9/56, e_F(4) = 12/56, e_F(5) = 8/56, e_F(6) = 6/56
e_L(1) = 0/29, e_L(2) = 5/29, e_L(3) = 6/29, e_L(4) = 3/29, e_L(5) = 1/29, e_L(6) = 14/29
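When the state path is known, estimation really is just counting and normalizing, as in this sketch (the training data format, a list of (value, state) pairs, and the optional pseudocount are our own choices; a small pseudocount avoids zero probabilities for unseen events).

```python
def estimate_supervised(pairs, states, symbols, pseudocount=0.0):
    """ML estimate of transition/emission probabilities from labelled data.
    pairs: list of (value, state) tuples for one training sequence."""
    H = {k: {l: pseudocount for l in states} for k in states}   # transition counts
    J = {k: {r: pseudocount for r in symbols} for k in states}  # emission counts
    for (x, s) in pairs:
        J[s][x] += 1
    for (_, s1), (_, s2) in zip(pairs, pairs[1:]):
        H[s1][s2] += 1
    trans = {k: {l: H[k][l] / sum(H[k].values()) for l in states} for k in states}
    emit = {k: {r: J[k][r] / sum(J[k].values()) for r in symbols} for k in states}
    return trans, emit
```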

Learning problem: given x, i = 1..n

To choose the parameters of the HMM that maximize P(x_1) × P(x_2) × … × P(x_n), we use standard (iterative) optimization algorithms:
- Determine initial parameter values.
- Iterate until the improvement in P(x_1) × P(x_2) × … × P(x_n) becomes smaller than some predetermined threshold.

However, the algorithm may converge to a point close to a local maximum, not to the global maximum.

Learning problem: algorithm

From the training data x, i = 1..n, we estimate an initial model M_0:
- p_ki, the transition probabilities.
- e_l(r), the emission probabilities.

Do (we have M_s):
- Compute H_ki, the expected number of times the transition from state k to state i is used.
- Compute J_l(r), the expected number of times the value r is emitted by state l.
- Compute p_ki = H_ki / (H_k1 + … + H_kq) and e_l(r) = J_l(r) / (J_l(b_1) + … + J_l(b_M)).
- {we have M_{s+1}}
Until the improvement is smaller than the threshold {M is close to a local maximum}.

Recall the forward and backward algorithms

The forward probability recurrence:
f_k(i) = P(x_1 … x_i, π_i = k) = e_k(x_i) Σ_{h=1..q} f_h(i-1) p_hk

The backward probability recurrence:
b_l(i) = P(x_{i+1} … x_n | π_i = l) = Σ_{h=1..q} p_lh e_h(x_{i+1}) b_h(i+1), with b_l(n) = 1.

f_k(i) covers the prefix up to position i and b_l(i+1) covers the suffix after position i+1; they are combined in the Baum-Welch expected counts below.
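The backward recurrence mirrors the forward one, filled in from the end of the sequence. A sketch in the same style as the earlier functions (0-indexed positions):

```python
def backward(x, states, trans, emit):
    """Backward algorithm: returns b[i][k] = P(x_{i+2}..x_n | pi at position i = k),
    i.e. the probability of the suffix after position i, given state k there."""
    n = len(x)
    b = [None] * n
    b[n - 1] = {k: 1.0 for k in states}                    # b_k(n) = 1
    for i in range(n - 2, -1, -1):
        b[i] = {
            k: sum(trans[k][h] * emit[h][x[i + 1]] * b[i + 1][h] for h in states)
            for k in states
        }
    return b
```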

Baum-Welch training algorithm

J_k(r) = the expected number of times the value r is emitted by state k
       = Σ_{all x} Σ_{all i} Prob(state k emits r at step i in sequence x)
       = Σ_{all x} Σ_{all i} Prob(x_1 … x_n, π_i = k) δ(r = x_i) / Prob(x_1 … x_n)
       = Σ_{all x} Σ_{all i} f_k(i) b_k(i) δ(r = x_i) / Prob(x_1 … x_n)

Baum-Welch training algorithm

H_kl = the expected number of times the transition from state k to state l is used
     = Σ_{all x} Σ_{all i} Prob(transition from k to l is used at step i in x)
     = Σ_{all x} Σ_{all i} Prob(x_1 … x_n, π_i = k, π_{i+1} = l) / Prob(x_1 … x_n)
     = Σ_{all x} Σ_{all i} f_k(i) p_kl e_l(x_{i+1}) b_l(i+1) / Prob(x_1 … x_n)

Baum-Welch training algorithm

H_kl, the expected number of times the transition from state k to state l is used:
H_kl = Σ_{all x} Σ_{all i} f_k(i) p_kl e_l(x_{i+1}) b_l(i+1) / Prob(x_1 … x_n)

J_l(r), the expected number of times the value r is emitted by state l:
J_l(r) = Σ_{all x} Σ_{all i} f_l(i) b_l(i) δ(r = x_i) / Prob(x_1 … x_n)

And we estimate the new parameters of the HMM as
- p_kl = H_kl / (H_k1 + … + H_kq)
- e_l(r) = J_l(r) / (J_l(b_1) + … + J_l(b_M))
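Putting the pieces together, one EM iteration can be sketched as follows: run forward and backward, accumulate the expected counts H_kl and J_l(r), then renormalize. This is a simplified single-sequence version built on the `forward` and `backward` sketches above (function names are ours); a practical implementation would work in log space or with scaling to avoid underflow on long sequences.

```python
def baum_welch_step(x, states, symbols, start, trans, emit):
    """One Baum-Welch (EM) iteration on a single sequence x.
    Returns updated (trans, emit) estimates."""
    f, px = forward(x, states, start, trans, emit)
    b = backward(x, states, trans, emit)
    H = {k: {l: 0.0 for l in states} for k in states}   # expected transition counts
    J = {k: {r: 0.0 for r in symbols} for k in states}  # expected emission counts
    n = len(x)
    for i in range(n):                                   # emission expectations
        for k in states:
            J[k][x[i]] += f[i][k] * b[i][k] / px
    for i in range(n - 1):                               # transition expectations
        for k in states:
            for l in states:
                H[k][l] += f[i][k] * trans[k][l] * emit[l][x[i + 1]] * b[i + 1][l] / px
    new_trans = {k: {l: H[k][l] / sum(H[k].values()) for l in states} for k in states}
    new_emit = {k: {r: J[k][r] / sum(J[k].values()) for r in symbols} for k in states}
    return new_trans, new_emit
```

Iterating this step, and stopping when the total likelihood stops improving appreciably, gives the training loop outlined on the "Learning problem: algorithm" slide.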

Baum-Welch training algorithm

The algorithm has been applied to sequences S generated by the dishonest casino model, with M = 6 symbols and N = 2 states.
- For |S| = 500: the slide reports the estimated start probabilities (P0F, P0L) and transition probabilities (PFF, PFL, PLF, PLL); the numeric values did not survive in this transcript.
- For |S| = 50000: the same quantities estimated from the longer training sequence.