
1 EVOLUTIONARY HMMS: A BAYESIAN APPROACH TO MULTIPLE ALIGNMENT
Siva Theja Maguluri, CS 598 SS

2 Goal
• Given a set of sequences and a tree representing their evolutionary relationship, find the multiple sequence alignment that maximizes the likelihood of those evolutionary relationships.

3 Evolutionary Model
• Pairwise likelihood for the relationship between two sequences
• Reversibility
• Additivity (both sketched below)
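
The slide's equations are not reproduced in the transcript; the following is a sketch of these two standard properties in generic notation (x, y, z are sequences, t a divergence time, P(x) the equilibrium probability):

    \text{reversibility:} \quad P(x)\,P(y \mid x, t) = P(y)\,P(x \mid y, t)
    \text{additivity:} \quad P(y \mid x, t_1 + t_2) = \sum_z P(z \mid x, t_1)\,P(y \mid z, t_2)

Reversibility lets the tree be rooted arbitrarily; additivity (the Chapman-Kolmogorov property) lets intermediate nodes on a path be summed out.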

4
• The alignment can be inferred from the sequences using dynamic programming (DP) if the Markov condition applies
• Joint likelihood of a multiple alignment on a tree (sketched below)
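
Under the Markov condition each branch evolves independently given its parent, so the joint likelihood factorizes over the tree; a sketch in generic notation (S_i is the sequence at node i, t_i the length of the branch above it, \pi the equilibrium distribution):

    P(S_1, \dots, S_{2N-1} \mid T) = \pi(S_{\mathrm{root}}) \prod_{i \neq \mathrm{root}} P(S_i \mid S_{\mathrm{parent}(i)}, t_i)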

5 Alignment Model
• Substitution models

6 Links Model
• A birth-death process with immigration, i.e. each residue can either spawn a child or die
• Birth rate λ, death rate µ
• An immortal link at the left-hand side
• Independent, homogeneous substitution
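
As a concrete illustration (not from the paper), here is a minimal Gillespie-style simulation of the links process that tracks only sequence length; the function name and the example rates are assumptions for the sketch.

    import random

    # Sketch of the TKF91-style "links" process. Each mortal link dies at
    # rate mu or spawns a child at rate lam; the immortal link at the left
    # end only spawns (rate lam) and never dies.
    def simulate_links(ancestor_length, lam, mu, t, seed=0):
        """Simulate to time t; return the descendant sequence length."""
        rng = random.Random(seed)
        n = ancestor_length                      # current number of mortal links
        clock = 0.0
        while True:
            total_rate = (n + 1) * lam + n * mu  # +1: the immortal link's births
            clock += rng.expovariate(total_rate)
            if clock >= t:
                break
            if rng.random() < (n + 1) * lam / total_rate:
                n += 1                           # birth (insertion)
            else:
                n -= 1                           # death (deletion)
        return n

    # Example: average descendant length from a 100-residue ancestor
    lengths = [simulate_links(100, lam=0.03, mu=0.033, t=1.0, seed=s)
               for s in range(200)]
    print(sum(lengths) / len(lengths))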

7 Probability evolution in the Links Model
• Time evolution of the probability of a link surviving and spawning n descendants
• Time evolution of the probability of a link dying before time t and spawning n descendants

8 Probability evolution in the Links Model
• Time evolution of the probability of the immortal link spawning n descendants at time t

9 Probability evolution in the Links Model
• The solution of these differential equations is sketched below
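
The slide's equations were not captured in the transcript; the following is a sketch of the TKF91-style solutions as I recall them from Holmes and Bruno (p_n is the probability that a mortal link survives with n descendant residues, q_n that it dies leaving n, r_n that the immortal link has spawned n; the closed form of γ is omitted here):

    \alpha(t) = e^{-\mu t}
    \beta(t) = \frac{\lambda\,(1 - e^{(\lambda-\mu)t})}{\mu - \lambda\,e^{(\lambda-\mu)t}}

    p_n(t) = \alpha\,(1-\beta)\,\beta^{\,n-1} \quad (n \ge 1)
    q_0(t) = (1-\alpha)(1-\gamma), \qquad q_n(t) = (1-\alpha)\,\gamma\,(1-\beta)\,\beta^{\,n-1} \quad (n \ge 1)
    r_n(t) = (1-\beta)\,\beta^{\,n} \quad (n \ge 0)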

10 Probability evolution in the Links Model
• Conceptually, α is the probability that the ancestral residue survives
• β is the probability of a further insertion given one or more descendants
• γ is the probability of an insertion given that the ancestor did not survive
• In the limit, the immortal link generates residues according to a geometric distribution

11 Links model as a Pair HMM
• Just like a standard HMM, but emits two sequences instead of one
• A path through a pair HMM implicitly defines an alignment of the two sequences

12 Pair HMM for the Links model
• Either the residue lives or it dies, spawning geometrically distributed residues in each case

13 Links model as a Pair HMM
• The path through the pair HMM is π
• DP is used to infer the alignment of two sequences
• Viterbi algorithm for finding the optimal π
• Forward algorithm to sum over all alignments or to sample from the posterior (a sketch follows)
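
A minimal sketch of the Forward algorithm for a generic three-state pair HMM (M = match, X = insert in x, Y = insert in y). The transition and emission values are illustrative placeholders, not the TKF91-derived parameters of Holmes and Bruno.

    import numpy as np

    trans = np.array([[0.90, 0.05, 0.05],   # from M
                      [0.80, 0.15, 0.05],   # from X
                      [0.80, 0.05, 0.15]])  # from Y

    def p_match(a, b):                       # joint emission of an aligned pair
        return 0.05 if a == b else 0.01

    def p_gap(a):                            # emission of a residue against a gap
        return 0.25

    def forward(x, y):
        """Sum over all alignment paths; returns P(x, y) up to the end state."""
        n, m = len(x), len(y)
        F = np.zeros((n + 1, m + 1, 3))
        F[0, 0, 0] = 1.0                     # start in M, nothing emitted yet
        for i in range(n + 1):
            for j in range(m + 1):
                if i > 0 and j > 0:          # M emits (x_i, y_j)
                    F[i, j, 0] += p_match(x[i-1], y[j-1]) * (F[i-1, j-1] @ trans[:, 0])
                if i > 0:                    # X emits x_i against a gap
                    F[i, j, 1] += p_gap(x[i-1]) * (F[i-1, j] @ trans[:, 1])
                if j > 0:                    # Y emits y_j against a gap
                    F[i, j, 2] += p_gap(y[j-1]) * (F[i, j-1] @ trans[:, 2])
        return F[n, m].sum()

    print(forward("ACGT", "AGT"))

Replacing the sums with maximizations gives the Viterbi algorithm; tracing a path backwards through F in proportion to each term's contribution samples an alignment from the posterior.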

14 Multiple HMMs
• Instead of emitting 2 sequences, emit N sequences
• 2^N - 1 emit states!
• Such a model can be developed for any tree
• The Viterbi and Forward algorithms use an N-dimensional dynamic programming matrix
• Given a tree relating N sequences, a multiple HMM can be constructed from pair HMMs so that the likelihood factorizes over branches (sketched below)
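
A hedged reconstruction of the factorization this slide trails off on (A is the multiple alignment, A_{ij} its restriction to the branch from node i to node j, t_{ij} that branch's length):

    P(A, S \mid T) = \pi(S_{\mathrm{root}}) \prod_{(i \to j) \in T} P(S_j, A_{ij} \mid S_i, t_{ij})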

15 Multiple HMMs (figure only)

16 Multiple HMMs (figure only)

17 Composing a multiple alignment from branch alignments
• Residues X_i and Y_j in a multiple alignment containing sequences X and Y are aligned iff they are in the same column and that column contains no gaps in the intermediate sequences
• No deletion followed by re-insertion is allowed
• Ignoring all-gap columns, this provides an unambiguous way of composing a multiple alignment from branch alignments and vice versa (a sketch follows)
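
A hypothetical helper (not from the paper) illustrating the rule above: extract the branch alignment of two sequences from a multiple alignment. rows maps sequence name to its gapped row; path lists the intermediate sequences between them on the tree.

    def branch_alignment(rows, x, y, path=()):
        pairs, i, j = [], 0, 0
        for col in range(len(rows[x])):
            cx, cy = rows[x][col], rows[y][col]
            # residues are aligned iff both are ungapped in this column and
            # every intermediate sequence is also ungapped there
            if cx != "-" and cy != "-" and all(rows[s][col] != "-" for s in path):
                pairs.append((i, j))
            if cx != "-": i += 1
            if cy != "-": j += 1
        return pairs

    rows = {"X": "AC-GT", "W": "ACAGT", "Y": "A-AGT"}
    print(branch_alignment(rows, "X", "Y", path=("W",)))  # [(0, 0), (2, 2), (3, 3)]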

18 Eliminating internal nodes
• Internal nodes are missing data
• Sum them out of the likelihood function
• Summing over indel histories would destroy the independence structure
• Sum over substitution histories using Felsenstein's post-order traversal (pruning) algorithm, sketched below
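
A sketch of Felsenstein's pruning algorithm for a single alignment column, assuming a 4-letter alphabet and a Jukes-Cantor-style stand-in substitution model; the tree encoding is ad hoc for the example, not Handel's.

    import numpy as np

    # A node is either a leaf symbol index (int) or a tuple
    # (left, right, P_left, P_right), where P_* are the substitution
    # matrices for the two child branches.
    def pruning(node):
        """Return L[a] = P(leaf data below node | node state = a)."""
        if isinstance(node, int):            # leaf: observed symbol
            L = np.zeros(4)
            L[node] = 1.0
            return L
        left, right, P_left, P_right = node
        return (P_left @ pruning(left)) * (P_right @ pruning(right))

    def jc(t):
        """Jukes-Cantor branch transition matrix, used as a stand-in."""
        p = 0.25 * (1 - np.exp(-4.0 * t / 3.0))
        return np.full((4, 4), p) + np.eye(4) * (1 - 4 * p)

    pi = np.full(4, 0.25)                    # equilibrium distribution at root
    tree = ((0, 1, jc(0.1), jc(0.1)),        # internal node over two leaves
            2, jc(0.2), jc(0.3))             # ... paired with a third leaf
    print(pi @ pruning(tree))                # column likelihood, root summed out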

19 Algorithm
• Progressive alignment: profiles of parents are estimated by aligning siblings on a post-order traversal (the "impatient" strategy)
• Iterative refinement: revisit branches after the initial alignment phase (the "greedy" strategy)
• Sample from a population of alignments, exploring suboptimal alignments in anticipation of long-term improvements

20 Algorithm
• Moves to explore the alignment space
• The moves must be ergodic, i.e. allow any alignment to be transformed into any other
• The moves must satisfy detailed balance, i.e. guarantee convergence to the desired stationary distribution (see below)
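
For reference, detailed balance for a move kernel K and target distribution \pi over alignments a and b is the standard condition

    \pi(a)\,K(a \to b) = \pi(b)\,K(b \to a) \quad \text{for all } a, b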

21 Move 1: Parent Sampling
• Goal: align two sibling nodes Y and Z and infer their parent X
• Construct the multiple HMM for X, Y and Z
• Sample an alignment of Y and Z using the forward algorithm
• This induces the branch alignments X-Y and X-Z
• Similar to the sibling-alignment step of impatient-progressive alignment

22 Move 2: Branch Sampling
• Goal: realign two adjacent nodes X and Y
• Construct the pair HMM for X and Y, fixing everything else
• Resample the alignment using the forward algorithm
• This is similar to the branch-alignment step of the greedy-refined algorithm

23 Move 3: Node Sampling
• Goal: resample the sequence at an internal node X
• Construct the multiple HMM for X, its parent W and its children Y and Z, fixing everything else
• Resample the sequence of X, conditioned on the relative alignment of W, Y and Z
• This is similar to inferring parent sequence lengths in the impatient-progressive algorithm

24 Algorithm
1. Parent-sample up the guide tree and construct a multiple alignment
2. Visit each branch and node once for branch sampling or node sampling, respectively
3. Repeat step 2 to get more samples

25 Algorithm
• Replacing "sampling by the Forward algorithm" with "optimizing by the Viterbi algorithm" gives the maximum-likelihood (ML) variants:
• Impatient-progressive is the ML version of parent sampling
• Greedy-refinement is the ML version of branch and node sampling

26 Gibbs sampling in an ML context
• Periodically save the current alignment, take a greedy approach to record the likelihood of the refined alignment, then return to the saved alignment
• Store the refined alignment and compare its likelihood to the other alignments at the end of the run

27 Ordered over-relaxation
• Sampling is a random walk on a Markov chain, so it behaves like Brownian motion, i.e. the RMS drift grows as √n
• It would be better to avoid previously explored regions, i.e. "boldly go where no alignment has gone before"
• Impose a strict weak order on alignments
• Sample N alignments at each stage and sort them
• If the original sample ends up in position k, choose the (N-k)-th sample for the next emission (a toy sketch follows)
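
A toy numeric sketch of the over-relaxed move, with Gaussian draws standing in for conditional alignment samples and plain numeric order standing in for the strict weak order on alignments; everything here is illustrative, not Handel's implementation.

    import random

    def overrelaxed_step(current, sample_conditional, n, rng):
        """Draw n fresh samples, rank the old state among them, mirror its rank."""
        pool = sorted([current] + [sample_conditional(rng) for _ in range(n)])
        k = pool.index(current)            # position k of the old state
        return pool[len(pool) - 1 - k]     # choose the (N - k)-th sample

    rng = random.Random(0)
    x = 0.5
    trace = []
    for _ in range(1000):
        x = overrelaxed_step(x, lambda r: r.gauss(0.0, 1.0), n=10, rng=rng)
        trace.append(x)
    print(sum(trace) / len(trace))         # stays near 0, the target mean

Because successive states sit on opposite sides of the ranking, the chain is negatively correlated and drifts through the space faster than an independent random walk.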

28 Implementation and results (figure only)

29 Implementation and results
• A: true alignment
• B: impatient-progressive
• C: greedy-refined
• D: Gibbs sampling followed by greedy refinement
• E: Gibbs sampling with simulated annealing
• F: Gibbs sampling with over-relaxation
• G: without Felsenstein wildcards

30 Discussion
• Outlines a very appealing Bayesian framework for multiple alignment
• Performs very well, considering the simplicity of the model
• Profile information and variable-sized indels could be added to the model to improve performance


32 Questions

33 Questions
• What assumption enabled us to use this algorithm and avoid the N-dimensional DP matrices?
• What is the importance of the immortal link in the Links model?

34 References
• Holmes, I. and Bruno, W.J. (2001). "Evolutionary HMMs: a Bayesian approach to multiple alignment." Bioinformatics 17(9):803-820.

35 More results (figure only)

36 More results (figure only)

37 More results (figure only)

38 More results
• The poor performance on 4 is probably because Handel produces a global alignment and doesn't handle affine gaps
• Handel doesn't incorporate any profile information
• Handel cannot use BLOSUM (it's not additive)

