EVOLUTIONARY HMMS BAYESIAN APPROACH TO MULTIPLE ALIGNMENT Siva Theja Maguluri CS 598 SS.

Goal
• Given a set of sequences and a tree representing their evolutionary relationship, find the multiple sequence alignment that maximizes the probability of the evolutionary relationships between the sequences.
14-Dec-15 Siva Theja Maguluri

Evolutionary Model
• Pairwise likelihood for the relation between two sequences
• Reversibility
• Additivity
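Reversibility and additivity can be checked numerically. A minimal sketch using a toy two-state symmetric substitution model (an illustrative stand-in, not the substitution matrices used in the paper):

```python
import math

def pmat(t, mu=1.0):
    # Toy two-state symmetric substitution model; P[a][b] = P(b at time t | a at 0).
    s = 0.5 + 0.5 * math.exp(-2 * mu * t)
    return [[s, 1 - s], [1 - s, s]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

pi = [0.5, 0.5]                                   # stationary distribution
P = pmat(0.4)
# Reversibility: pi_a * P(b | a, t) == pi_b * P(a | b, t)
assert abs(pi[0] * P[0][1] - pi[1] * P[1][0]) < 1e-12
# Additivity (Chapman-Kolmogorov): P(t1) P(t2) == P(t1 + t2)
Q, R = matmul(pmat(0.4), pmat(0.7)), pmat(1.1)
assert all(abs(Q[i][j] - R[i][j]) < 1e-12 for i in (0, 1) for j in (0, 1))
```

Additivity is what lets branch lengths on the tree compose: evolving for t1 then t2 is the same as evolving for t1 + t2.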

• The alignment can be inferred from the sequences using dynamic programming if the Markov condition applies
• Joint likelihood of a multiple alignment on a tree

Alignment Model
• Substitution models

Links Model
• Birth-death process with immigration, i.e. each residue can either spawn a child or die
• Birth rate λ, death rate µ
• Immortal link at the left-hand side
• Independent, homogeneous substitution

Probability evolution in Links Model
• Time evolution of the probability of a link surviving and spawning n descendants
• Time evolution of the probability of a link dying before time t and spawning n descendants

Probability evolution in Links Model
• Time evolution of the probability of the immortal link spawning n descendants at time t

Probability evolution in Links Model
• The solution of these differential equations has a closed form in terms of the quantities α, β and γ (formulas shown on the slide)

Probability evolution in Links Model
• Conceptually, α is the probability that the ancestral residue survives
• β is the probability of more insertions given one or more descendants
• γ is the probability of insertion given that the ancestor did not survive
• In the limit, the immortal link generates residues according to a geometric distribution
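These quantities come from the TKF91 links model. A sketch of the standard closed-form solutions (the formulas below are the usual TKF91 expressions rather than copied from the paper, and the parameter names are illustrative):

```python
import math

def tkf_params(lam, mu, t):
    # Closed-form TKF91 links-model probabilities (requires lam < mu
    # so that the equilibrium sequence length is finite).
    alpha = math.exp(-mu * t)                     # ancestral residue survives to time t
    e = math.exp((lam - mu) * t)
    beta = lam * (1 - e) / (mu - lam * e)         # further insertion, given >= 1 descendant
    gamma = 1 - mu * beta / (lam * (1 - alpha))   # insertion, given the ancestor died
    return alpha, beta, gamma

# At equilibrium the sequence length is geometric with parameter lam/mu,
# which is the limiting behaviour of the immortal link noted above.
a, b, g = tkf_params(lam=0.02, mu=0.03, t=1.0)
assert 0 < a < 1 and 0 < b < 1 and 0 < g < 1
```

As t → 0, α → 1 and β, γ → 0, i.e. short branches leave the sequence almost unchanged.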

Links Model as a Pair HMM
• Just like a standard HMM, but emits two sequences instead of one
• Running a pair HMM over two sequences implicitly aligns them

Pair HMM for the Links Model
• Either the residue lives or dies, spawning a geometrically distributed number of residues in each case

Links Model as a Pair HMM
• The path through the pair HMM is π
• Dynamic programming is used to infer the alignment of two sequences
• Viterbi algorithm for finding the optimal π
• Forward algorithm to sum over all alignments, or to sample from the posterior
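The forward recursion can be sketched for a generic three-state pair HMM (match M, gap-in-y X, gap-in-x Y). The gap-open/extend and emission parameters below are illustrative placeholders, not the links-model values:

```python
import math

def pair_hmm_forward(x, y, delta=0.1, eps=0.3, q=0.25, p_match=0.7):
    # Forward algorithm: log-probability of x and y summed over all alignments.
    NEG = float("-inf")
    def lse(vals):                      # log-sum-exp over a list of log values
        m = max(vals)
        if m == NEG:
            return NEG
        return m + math.log(sum(math.exp(v - m) for v in vals))
    n, m_ = len(x), len(y)
    M = [[NEG] * (m_ + 1) for _ in range(n + 1)]
    X = [[NEG] * (m_ + 1) for _ in range(n + 1)]
    Y = [[NEG] * (m_ + 1) for _ in range(n + 1)]
    M[0][0] = 0.0                       # start in the match state
    for i in range(n + 1):
        for j in range(m_ + 1):
            if i > 0 and j > 0:         # M emits a pair (x_i, y_j)
                e = math.log(p_match if x[i-1] == y[j-1] else (1 - p_match) / 3)
                M[i][j] = e + lse([math.log(1 - 2 * delta) + M[i-1][j-1],
                                   math.log(1 - eps) + X[i-1][j-1],
                                   math.log(1 - eps) + Y[i-1][j-1]])
            if i > 0:                   # X emits x_i against a gap
                X[i][j] = math.log(q) + lse([math.log(delta) + M[i-1][j],
                                             math.log(eps) + X[i-1][j]])
            if j > 0:                   # Y emits y_j against a gap
                Y[i][j] = math.log(q) + lse([math.log(delta) + M[i][j-1],
                                             math.log(eps) + Y[i][j-1]])
    return lse([M[n][m_], X[n][m_], Y[n][m_]])
```

Replacing the log-sum-exp with a max (and tracing back the arg-max) turns this same recursion into the Viterbi algorithm; sampling each cell's predecessor in proportion to its term gives posterior alignment samples.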

Multiple HMMs
• Instead of emitting 2 sequences, emit N sequences
• 2^N - 1 emit states!
• Such a model can be developed for any tree
• The Viterbi and forward algorithms use an N-dimensional dynamic programming matrix
• Given a tree relating N sequences, a multiple HMM can be constructed from pair HMMs so that the likelihood function factorizes over the branches of the tree (formula shown on the slide)

Multiple HMMs (figure slides)

Composing a multiple alignment from branch alignments
• Residues X_i and Y_j in a multiple alignment containing sequences X and Y are aligned iff
  - they are in the same column, and
  - that column contains no gaps for the intermediate sequences
• No deletion followed by re-insertion is allowed
• Ignoring all-gap columns, this provides an unambiguous way of composing a multiple alignment from branch alignments, and vice versa
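A minimal sketch of this composition rule (the pair-list representation and the `compose` helper are illustrative assumptions, not Handel's data structures): residues align through an intermediate sequence only when both are matched to the same intermediate residue.

```python
def compose(xp, py):
    # xp: aligned column pairs (x_index or None, p_index or None) for branch X-P;
    # py: aligned column pairs (p_index or None, y_index or None) for branch P-Y.
    # x_i aligns with y_j iff both are matched to the same parent residue p,
    # i.e. the shared column has no gap at the intermediate sequence.
    p_to_x = {p: x for (x, p) in xp if x is not None and p is not None}
    return [(p_to_x[p], y) for (p, y) in py
            if p is not None and y is not None and p in p_to_x]

# X-P aligns x0-p0, deletes x1, aligns x2-p1; P-Y aligns p0-y0 and p1-y1.
pairs = compose([(0, 0), (1, None), (2, 1)], [(0, 0), (1, 1)])
# pairs == [(0, 0), (2, 1)]  -- x1 has no counterpart in Y
```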

Eliminating internal nodes
• Internal nodes are missing data
• Sum them out of the likelihood function
• Summing over indel histories would destroy the independence structure
• Sum over substitution histories using Felsenstein's post-order traversal algorithm
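Felsenstein's pruning algorithm can be sketched as a post-order recursion; the two-state substitution model and the tuple-based tree encoding below are toy assumptions for illustration:

```python
import math

def pmat(t, mu=1.0):
    # Toy two-state symmetric substitution model; P[a][b] = P(b | a, t).
    s = 0.5 + 0.5 * math.exp(-2 * mu * t)
    return [[s, 1 - s], [1 - s, s]]

def prune(node):
    # Post-order traversal: returns L[a] = P(observed leaves below node | node in state a).
    if isinstance(node, int):                     # leaf carrying observed state 0 or 1
        return [1.0 if a == node else 0.0 for a in (0, 1)]
    (left, tl), (right, tr) = node                # internal node: (child, branch length) pairs
    Ll, Lr = prune(left), prune(right)
    Pl, Pr = pmat(tl), pmat(tr)
    return [sum(Pl[a][b] * Ll[b] for b in (0, 1)) *
            sum(Pr[a][b] * Lr[b] for b in (0, 1)) for a in (0, 1)]

pi = [0.5, 0.5]                                   # stationary distribution at the root
# Three leaves: a cherry (states 0 and 1) joined to a third leaf (state 0).
tree = ((((0, 0.1), (1, 0.2)), 0.3), (0, 0.25))
L = prune(tree)
lik = sum(pi[a] * L[a] for a in (0, 1))           # likelihood of this leaf pattern
```

Summing the likelihood over every possible leaf pattern gives 1, which is a handy correctness check for any pruning implementation.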

Algorithm
• Progressive alignment: parent profiles estimated by aligning siblings on a post-order traversal ("impatient" strategy)
• Iterative refinement: revisit branches following the initial alignment phase ("greedy" strategy)
• Sample from a population of alignments, exploring suboptimal alignments in anticipation of long-term improvements

Algorithm
• Moves to explore the alignment space
• These moves need to be ergodic, i.e. allow any alignment to be transformed into any other alignment
• These moves need to satisfy detailed balance, i.e. the chain converges to the desired stationary distribution

Move 1: Parent Sampling
• Goal: align two sibling nodes Y and Z and infer their parent X
• Construct the multiple HMM for X, Y and Z
• Sample an alignment of Y and Z using the forward algorithm
• This imposes an alignment on the branches XY and XZ
• Similar to the sibling-alignment step of impatient-progressive alignment

Move 2: Branch Sampling
• Goal: realign two adjacent nodes X and Y
• Construct the pair HMM for X and Y, fixing everything else
• Resample the alignment using the forward algorithm
• Similar to the branch-alignment step of the greedy-refinement algorithm

Move 3: Node Sampling
• Goal: resample the sequence at an internal node X
• Construct the multiple HMM for X, its parent W and children Y and Z, fixing everything else
• Resample the sequence of X, conditioned on the relative alignment of W, Y and Z
• Similar to inferring parent sequence lengths in the impatient-progressive algorithm

Algorithm
1. Parent-sample up the guide tree and construct a multiple alignment
2. Visit each branch and node once, for branch sampling or node sampling respectively
3. Repeat step 2 to get more samples

Algorithm
• Replacing 'sampling by the forward algorithm' with 'optimizing by the Viterbi algorithm':
• Impatient-progressive is the ML version of parent sampling
• Greedy-refinement is the ML version of branch and node sampling

Gibbs sampling in an ML context
• Periodically save the current alignment, take a greedy approach to record the likelihood of the refined alignment, then return to the saved alignment
• Store this and compare its likelihood to the other alignments at the end of the run

Ordered over-relaxation
• Sampling is a random walk on a Markov chain, so it follows Brownian motion, i.e. the rms drift grows as sqrt(n)
• It would be better to avoid previously explored spaces, i.e. 'boldly go where no alignment has gone before'
• Impose a strict weak order on alignments
• Sample N alignments at each stage and sort them
• If the original sample ends up in position k, choose the (N-k)-th sample for the next step
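The last two bullets can be sketched as a Neal-style ordered over-relaxation step. The `sample_conditional` callable and the rank-mirroring pool are an illustrative sketch, not Handel's implementation:

```python
import random

def overrelaxed_step(current, sample_conditional, K=7, key=None):
    # Draw K candidate states from the conditional distribution, sort them
    # together with the current state under a strict weak order, and pick
    # the state whose rank mirrors the current one (ties broken arbitrarily).
    pool = [sample_conditional() for _ in range(K)] + [current]
    pool.sort(key=key)
    k = pool.index(current)       # position of the original sample
    return pool[K - k]            # mirrored position drives the next step

# Usage sketch: over-relaxed exploration of a 1-D conditional.
random.seed(0)
x = 0.0
for _ in range(5):
    x = overrelaxed_step(x, lambda: random.gauss(0.0, 1.0))
```

Because the pick mirrors the current rank, successive states tend to land on opposite sides of the conditional, suppressing the random-walk (Brownian) behaviour of plain Gibbs sampling.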

Implementation and results
• A: true alignment
• B: impatient-progressive
• C: greedy-refined
• D: Gibbs sampling followed by greedy refinement
• E: Gibbs sampling with simulated annealing
• F: Gibbs sampling with over-relaxation
• G: without Felsenstein wildcards

Discussion
• Outlines a very appealing Bayesian framework for multiple alignment
• Performs very well, considering the simplicity of the model
• Profile information and variable-sized indels could be added to the model to improve performance


Questions
• What assumption enabled us to use this algorithm while avoiding the N-dimensional dynamic programming matrices?
• What is the importance of the immortal link in the Links model?

References
• Holmes, I. and Bruno, W. J., "Evolutionary HMMs: a Bayesian approach to multiple alignment", Bioinformatics.

More results (figure slides)

More results
• Poor performance on 4 is probably because Handel produces a global alignment and doesn't handle affine gaps
• Handel doesn't incorporate any profile information
• Handel cannot use BLOSUM (it is not additive)