DTC Gerton Lunter, WTCHG February 10, 2010 Hidden Markov models in Computational Biology

Overview
First part - mathematical context: Bayesian networks, Markov models, Hidden Markov models.
Second part - worked example: the occasionally crooked casino; two applications in computational biology.
Third part - Practical 0: a bit more theory on HMMs; Practicals I-V: theory, implementation, biology. Pick & choose.

Part I HMMs in (mathematical) context

Probabilistic models
A mathematical model describing how variables occur together. Three types of variables are distinguished:
Observed variables
Latent (hidden) variables
Parameters
Latent variables are often the quantities of interest, and can be inferred from observations using the model. Sometimes they are "nuisance variables", used to correctly describe the relationships in the data.
Example: P(clouds, sprinkler_used, rain, wet_grass)

Some notation
P(X,Y,Z): probability of (X,Y,Z) occurring (simultaneously)
P(X,Y): probability of (X,Y) occurring
P(X,Y|Z): probability of (X,Y) occurring, provided that it is known that Z occurs ("conditional on Z", or "given Z")
P(X,Y) = Σ_Z P(X,Y,Z)
P(Z) = Σ_{X,Y} P(X,Y,Z)
P(X,Y|Z) = P(X,Y,Z) / P(Z)
Σ_{X,Y,Z} P(X,Y,Z) = 1
P(Y|X) = P(X|Y) P(Y) / P(X)   (Bayes' rule)
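These identities are easy to check numerically. Below is a minimal Python sketch (not part of the original slides) that verifies marginalization, conditioning and Bayes' rule on a made-up joint distribution over two binary variables X and Y.

```python
# Made-up joint distribution P(X, Y) over two binary variables.
P = {("x0", "y0"): 0.30, ("x0", "y1"): 0.20,
     ("x1", "y0"): 0.10, ("x1", "y1"): 0.40}

assert abs(sum(P.values()) - 1.0) < 1e-12          # probabilities sum to 1

# marginals: P(X) = sum_Y P(X,Y), P(Y) = sum_X P(X,Y)
P_X = {x: sum(p for (x_, _), p in P.items() if x_ == x) for x in ("x0", "x1")}
P_Y = {y: sum(p for (_, y_), p in P.items() if y_ == y) for y in ("y0", "y1")}

# conditionals: P(Y|X) = P(X,Y) / P(X), P(X|Y) = P(X,Y) / P(Y)
P_Y_given_X = {(y, x): P[(x, y)] / P_X[x] for x in P_X for y in P_Y}
P_X_given_Y = {(x, y): P[(x, y)] / P_Y[y] for x in P_X for y in P_Y}

# Bayes' rule: P(Y|X) = P(X|Y) P(Y) / P(X)
for x in P_X:
    for y in P_Y:
        assert abs(P_Y_given_X[(y, x)]
                   - P_X_given_Y[(x, y)] * P_Y[y] / P_X[x]) < 1e-12
print(P_X, P_Y)
```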

Independence
Two variables X, Y are independent if P(X,Y) = P(X)P(Y).
Knowing that two variables are independent reduces the model complexity. Suppose X and Y each take N possible values: specifying P(X,Y) requires N^2 - 1 numbers, whereas specifying P(X) and P(Y) requires only 2N - 2 numbers (for N = 10, that is 99 versus 18).
Two variables X, Y are conditionally independent (given Z) if P(X,Y|Z) = P(X|Z)P(Y|Z).

Probabilistic model: example
P(Clouds, Sprinkler, Rain, WetGrass) = P(Clouds) × P(Sprinkler|Clouds) × P(Rain|Clouds) × P(WetGrass | Sprinkler, Rain)
This specification of the model determines which variables are deemed to be (conditionally) independent. These independence assumptions simplify the model.
Using formulas as above to describe the independence relationships is not very intuitive, particularly for large models. Graphical models (in particular, Bayesian networks) are a more intuitive way to express the same thing.
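The factorization can be written down directly in code. Only the structure below comes from the slide; the conditional probability tables are made up. The sketch computes the joint distribution, checks that it sums to one, and uses it for a simple inference, P(Rain = true | WetGrass = true):

```python
# Sketch of the Clouds/Sprinkler/Rain/WetGrass factorization.
# The CPT numbers are illustrative, not from the lecture.
from itertools import product

p_c = {True: 0.5, False: 0.5}                       # P(Clouds)
p_s = {True: {True: 0.1, False: 0.9},               # P(Sprinkler | Clouds)
       False: {True: 0.5, False: 0.5}}
p_r = {True: {True: 0.8, False: 0.2},               # P(Rain | Clouds)
       False: {True: 0.2, False: 0.8}}
p_w = {(True, True): 0.99, (True, False): 0.90,     # P(WetGrass=T | Sprinkler, Rain)
       (False, True): 0.90, (False, False): 0.01}

def joint(c, s, r, w):
    pw = p_w[(s, r)] if w else 1 - p_w[(s, r)]
    return p_c[c] * p_s[c][s] * p_r[c][r] * pw

# sanity check: the joint distribution sums to 1
assert abs(sum(joint(*v) for v in product([True, False], repeat=4)) - 1) < 1e-12

# example inference: P(Rain = true | WetGrass = true)
num = sum(joint(c, s, True, True) for c, s in product([True, False], repeat=2))
den = sum(joint(c, s, r, True) for c, s, r in product([True, False], repeat=3))
print(num / den)
```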

Bayesian network: example
[Graph: Cloudy → Sprinkler, Cloudy → Rain, Sprinkler → WetGrass, Rain → WetGrass]
P(Clouds) × P(Sprinkler|Clouds) × P(Rain|Clouds) × P(WetGrass | Sprinkler, Rain)
Rule: two nodes of the graph are conditionally independent given the state of their parents. E.g. Sprinkler and Rain are independent given Cloudy.

Bayesian network: example
[Graph as before, with the observed nodes shaded]
Convention: latent variables are drawn as open nodes; observed variables are shaded.
P(Clouds) × P(Sprinkler|Clouds) × P(Rain|Clouds) × P(WetGrass | Sprinkler, Rain)

Bayesian network: example
[Figure: Bayesian network from a Combat Air Identification algorithm]

Bayesian networks
Intuitive formalism to develop models
Algorithms to learn parameters from training data (maximum likelihood; EM)
General and efficient algorithms to infer latent variables from observations ("message passing" algorithm)
Allows dealing with missing data in a robust and coherent way (make the relevant node a latent variable)
Simulate data

Markov model
A particular kind of Bayesian network
All variables are observed
Good for modeling dependencies within sequences
P(S_n | S_1, S_2, …, S_{n-1}) = P(S_n | S_{n-1})   (Markov property)
P(S_1, S_2, S_3, …, S_n) = P(S_1) P(S_2|S_1) … P(S_n|S_{n-1})
[Diagram: chain of observed nodes S_1 → S_2 → S_3 → … → S_8 → …]

Markov model
States: letters in English words
Transitions: which letter follows which
[Diagram: chain S_1 → S_2 → … as before]
Training text: MR SHERLOCK HOLMES WHO WAS USUALLY VERY LATE IN THE MORNINGS SAVE UPON THOSE NOT INFREQUENT OCCASIONS WHEN HE WAS UP ALL …
S_1 = M, S_2 = R, S_3 = (space), S_4 = S, S_5 = H, …
P(S_n = y | S_{n-1} = x)   (parameters)
  = P(S_{n-1} S_n = xy) / P(S_{n-1} = x)
  ≈ (frequency of xy) / (frequency of x)   (maximum likelihood)
Text sampled from the fitted model: UNOWANGED HE RULID THAND TROPONE AS ORTIUTORVE OD T HASOUT TIVE IS MSHO CE BURKES HEST MASO TELEM TS OME SSTALE MISSTISE S TEWHERO
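The letter model is easy to reproduce. The sketch below estimates the transition probabilities by maximum likelihood, exactly as in the formula above, and then samples from the fitted chain; the short training string stands in for the full text.

```python
# Letter-level Markov model: maximum-likelihood transition estimates
# (frequency of xy / frequency of x), then sampling from the chain.
import random
from collections import Counter, defaultdict

text = "MR SHERLOCK HOLMES WHO WAS USUALLY VERY LATE IN THE MORNINGS"

pair_counts = Counter(zip(text, text[1:]))   # counts of successive pairs xy
char_counts = Counter(text[:-1])             # counts of predecessors x

# P(S_n = y | S_{n-1} = x) = count(xy) / count(x)
trans = defaultdict(dict)
for (x, y), n in pair_counts.items():
    trans[x][y] = n / char_counts[x]

def sample(length, state="M"):
    out = [state]
    for _ in range(length):
        nxt = trans.get(state)
        if not nxt:                          # dead end: no observed successor
            break
        state = random.choices(list(nxt), weights=list(nxt.values()))[0]
        out.append(state)
    return "".join(out)

print(sample(80))
```

The same code generalizes to the higher-order models on the next slides by letting the states be overlapping k-tuples of letters instead of single letters.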

Markov model
States: triplets of letters
Transitions: which (overlapping) triplet follows which
[Diagram: chain S_1 → S_2 → … as before]
Training text: MR SHERLOCK HOLMES WHO WAS USUALLY VERY LATE IN THE MORNINGS SAVE UPON THOSE NOT INFREQUENT OCCASIONS WHEN HE WAS UP ALL …
S_1 = "MR ", S_2 = "R S", S_3 = " SH", S_4 = "SHE", S_5 = "HER", …
P(S_n = xyz | S_{n-1} = wxy) = P(wxyz) / P(wxy) ≈ (frequency of wxyz) / (frequency of wxy)
Text sampled from the fitted model: THERE THE YOU SOME OF FEELING WILL PREOCCUPATIENCE CREASON LITTLED MASTIFF HENRY MALIGNATIVE LL HAVE MAY UPON IMPRESENT WARNESTLY

Markov model
States: word pairs
Text from:
Then churls their thoughts (although their eyes were kind) To thy fair appearance lies To side this title is impanelled A quest of thoughts all tenants to the sober west As those gold candles fixed in heaven's air Let them say more that like of hearsay well I will drink Potions of eisel 'gainst my strong infection No bitterness that I was false of heart Though absence seemed my flame to qualify As easy might I not free When thou thy sins enclose! That tongue that tells the story of thy love Ay fill it full with feasting on your sight Book both my wilfulness and errors down And on just proof surmise accumulate Bring me within the level of your eyes And in mine own when I of you beauteous and lovely youth When that churl death my bones with dust shall cover And shalt by fortune once more re-survey These poor rude lines of life thou art forced to break a twofold truth Hers by thy deeds

Hidden Markov model
HMM = probabilistic observation of a Markov chain
Another special kind of Bayesian network
The S_i form a Markov chain as before, but the states are unobserved
Instead, the y_i (dependent on S_i) are observed
Generative viewpoint: state S_i "emits" symbol y_i
The y_i do not form a Markov chain (= do not satisfy the Markov property); they exhibit more complex (and long-range) dependencies
[Diagram: chain S_1 → S_2 → … → S_8 → …, with each hidden state S_i emitting an observed symbol y_i]
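The generative viewpoint translates directly into code: walk the hidden Markov chain, and let each state emit a symbol. Below is a minimal sketch with a made-up two-state model (states A and B emitting symbols x and y); the parameters are illustrative only.

```python
# Sampling from a (made-up) two-state HMM: hidden states form a Markov chain,
# and each state emits one observed symbol.
import random

start = {"A": 0.5, "B": 0.5}
trans = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.2, "B": 0.8}}
emit  = {"A": {"x": 0.7, "y": 0.3}, "B": {"x": 0.1, "y": 0.9}}

def pick(dist):
    return random.choices(list(dist), weights=list(dist.values()))[0]

def sample_hmm(n):
    states, symbols = [], []
    s = pick(start)
    for _ in range(n):
        states.append(s)
        symbols.append(pick(emit[s]))        # the state emits a symbol
        s = pick(trans[s])                   # then the hidden chain moves on
    return states, symbols

states, symbols = sample_hmm(20)
print("hidden:  ", "".join(states))
print("observed:", "".join(symbols))
```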

Hidden Markov model
The notation above emphasizes the relation to Bayesian networks.
A different graph notation emphasizes the "transition probabilities" P(S_i | S_{i-1}). E.g. in the case S_i ∈ {A, B, C, D}:
[Diagram: state-transition graph over the states A, B, C, D, with arrows for the allowed transitions]
Notes:
"Emission probabilities" P(y_i | S_i) are not explicitly represented
The advance from i to i+1 is also implicit
Not all arrows need to be present (probability = 0)

Pair Hidden Markov model
[Diagram: 2D table of hidden states S_ij, with one observed sequence y_1 … y_5 along one axis and a second observed sequence z_1, z_2, z_3 along the other]

Pair Hidden Markov model
[Diagram as on the previous slide: 2D table of states, with the two observed sequences along the axes]
Normalization: Σ_{paths p} Σ_{s_{p(1)} … s_{p(N)}} Σ_{y_1 … y_A} Σ_{z_1 … z_B} P(s_{p(1)}, …, s_{p(N)}, y_1 … y_A, z_1 … z_B) = 1, where N = N(p) is the length of the path p.
States may "emit" a symbol in sequence y, or in z, or both, or neither ("silent" state). If a symbol is emitted, the associated coordinate subscript increases by one; e.g. diagonal transitions are associated with simultaneous emissions in both sequences.
A realization of the pair HMM consists of a state sequence, with each symbol emitted by exactly one state, and the associated path through the 2D table.
(A slightly more general viewpoint decouples the states and the path; the hidden variables are then the sequence of states S and a path through the table. In this viewpoint the transitions, not the states, emit symbols. The technical term in finite-state-machine theory is a Mealy machine; the standard viewpoint is also known as a Moore machine.)
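To make this concrete, here is a minimal pair-HMM Viterbi sketch in the style of Durbin et al., with three states (M emits an aligned pair, X emits in y only, Y emits in z only) and made-up DNA-like parameters. It is not the model used in the practicals: it omits explicit begin/end states and traceback, and simply returns the log-probability of the best alignment.

```python
# Minimal pair-HMM Viterbi (illustrative parameters, no traceback).
import math

delta, epsilon = 0.2, 0.4                  # gap-open / gap-extend probabilities
a = {("M", "M"): 1 - 2 * delta, ("M", "X"): delta,   ("M", "Y"): delta,
     ("X", "M"): 1 - epsilon,   ("X", "X"): epsilon,
     ("Y", "M"): 1 - epsilon,   ("Y", "Y"): epsilon}  # X<->Y transitions disallowed
q = 0.25                                   # single-symbol emission in states X, Y

def p_match(u, v):                         # joint emission probability in state M
    return 0.20 if u == v else 0.05 / 3

def pair_viterbi(y, z):
    n, m = len(y), len(z)
    NEG = float("-inf")
    # V[s][i][j]: best log-probability of aligning y[:i] with z[:j], ending in state s
    V = {s: [[NEG] * (m + 1) for _ in range(n + 1)] for s in "MXY"}
    V["M"][0][0] = 0.0                     # start (virtually) in the match state
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:            # M: emit y[i-1] and z[j-1] together
                V["M"][i][j] = math.log(p_match(y[i-1], z[j-1])) + max(
                    V[s][i-1][j-1] + math.log(a[(s, "M")]) for s in "MXY")
            if i > 0:                      # X: emit y[i-1] only
                V["X"][i][j] = math.log(q) + max(
                    V[s][i-1][j] + math.log(a[(s, "X")]) for s in "MX")
            if j > 0:                      # Y: emit z[j-1] only
                V["Y"][i][j] = math.log(q) + max(
                    V[s][i][j-1] + math.log(a[(s, "Y")]) for s in "MY")
    return max(V[s][n][m] for s in "MXY")

print(pair_viterbi("GATTACA", "GCATGCA"))
```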

Inference in HMMs
So HMMs can describe complex (temporal, spatial) relationships in data. But how can we use the model?
A number of (efficient) inference algorithms exist for HMMs:
Viterbi algorithm: most likely state sequence, given the observables
Forward algorithm: likelihood of the model given the observables
Backward algorithm: together with Forward, allows computation of posterior probabilities
Baum-Welch algorithm: parameter estimation given the observables
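As a concrete illustration, here is a minimal log-space Viterbi sketch using the same made-up two-state model as the sampling sketch above; it is not the implementation used in the practicals.

```python
# Log-space Viterbi for a small discrete HMM (illustrative parameters).
import math

states = ["A", "B"]
start = {"A": 0.5, "B": 0.5}
trans = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.2, "B": 0.8}}
emit  = {"A": {"x": 0.7, "y": 0.3}, "B": {"x": 0.1, "y": 0.9}}

def viterbi(obs):
    # v[s]: best log-probability of any state path ending in state s,
    # given the observations so far; ptr[s]: the predecessor achieving it
    v = {s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}
    back = []
    for o in obs[1:]:
        prev_v, v, ptr = v, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev_v[r] + math.log(trans[r][s]))
            v[s] = prev_v[best] + math.log(trans[best][s]) + math.log(emit[s][o])
            ptr[s] = best
        back.append(ptr)
    last = max(states, key=lambda s: v[s])      # best final state
    path = [last]
    for ptr in reversed(back):                  # trace back the best path
        path.append(ptr[path[-1]])
    return list(reversed(path)), v[last]

print(viterbi(list("xxyyyx")))
```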

Summary of part I
Probabilistic models:
Observed variables
Latent variables: of interest for inference, or nuisance variables
Parameters: obtained from training data, or from prior knowledge
Bayesian networks: independence structure of the model represented as a graph
Markov models: linear Bayesian network; all nodes observed
Hidden Markov models: observed layer, and hidden (latent) layer of nodes; efficient inference algorithm (Viterbi algorithm)
Pair Hidden Markov models: two observed sequences with interdependencies, determined by an unobserved Markov sequence

Part II Examples of HMMs

Detailed example: the occasionally crooked casino
Dirk Husmeier's slides, slides 1-15.
Recommended reading: slides 16-23 (the Forward and Backward algorithms, and posteriors).
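The casino example can be reproduced in a few lines. The sketch below uses illustrative parameters in the spirit of Durbin et al. (not necessarily those in Dirk Husmeier's slides) and a made-up sequence of die rolls, and applies the Forward and Backward recursions to compute, for each roll, the posterior probability that the loaded die was in use.

```python
# Occasionally crooked casino: fair vs loaded die, with posterior decoding
# via the Forward-Backward algorithm. Parameters and data are illustrative.
states = ["fair", "loaded"]
start = {"fair": 0.5, "loaded": 0.5}
trans = {"fair":   {"fair": 0.95, "loaded": 0.05},
         "loaded": {"fair": 0.10, "loaded": 0.90}}
emit = {"fair":   {r: 1 / 6 for r in "123456"},
        "loaded": {**{r: 0.1 for r in "12345"}, "6": 0.5}}

def forward(obs):
    f = [{s: start[s] * emit[s][obs[0]] for s in states}]
    for o in obs[1:]:
        f.append({s: emit[s][o] * sum(f[-1][r] * trans[r][s] for r in states)
                  for s in states})
    return f

def backward(obs):
    b = [{s: 1.0 for s in states}]
    for o in reversed(obs[1:]):
        b.insert(0, {s: sum(trans[s][r] * emit[r][o] * b[0][r] for r in states)
                     for s in states})
    return b

def posteriors(obs):
    # Note: for long sequences, rescale or work in log space to avoid underflow.
    f, b = forward(obs), backward(obs)
    likelihood = sum(f[-1][s] for s in states)   # P(observations | model)
    return [{s: f[t][s] * b[t][s] / likelihood for s in states}
            for t in range(len(obs))]

rolls = "315116246446644245311321631164152133625144543631656626566666"
for roll, post in zip(rolls, posteriors(rolls)):
    print(roll, round(post["loaded"], 2))
```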

Applications in computational biology
Dirk Husmeier's slides:
Slides 1-8: pairwise alignment
Slides 12-16: profile HMMs

Part III Practicals

Practical 0: HMMs
What is the interpretation of the probability computed by the Forward (FW) algorithm?
The Viterbi algorithm also computes a probability. How does that relate to the one computed by the FW algorithm?
How do the probabilities computed by the FW and Backward algorithms compare?
Explain what a posterior is, either in the context of alignment using an HMM, or of profile HMMs.
Why is the logarithm trick useful for the Viterbi algorithm? Does the same trick work for the FW algorithm?

Practical I: Profile HMMs in context

Look up the protein sequence of PRDM9 in the UCSC genome browser.
Search InterPro for the protein sequence. Look at the ProSite profile and sequence logo. Work out the syntax of the profile (HMMer syntax), and relate the logo and profile. Which residues are highly conserved? What structural role do these play? Which are not very conserved? Can you infer that these are less important biologically?
Read PMID: (PubMed). What is the meaning of the changed number of zinc finger motifs across species? Relate the conserved and changeable positions in the zinc fingers to the InterPro motif. Do these match the predicted pattern?
Read PMID: and PMID: . Explain the relationship between the recombination motif and the zinc fingers. What do you think is the cellular function of PRDM9?
Relate the fact that recombination hotspots in chimpanzee do not coincide with those in human with PRDM9. What do you predict about recombination hotspots in other mammalian species? Why do you think PRDM9 evolves so fast?
Background information on motif finding:

Practical II: HMMs and population genetics

Read PMID: , and PMID: .
What is the difference between phylogeny and genealogy? What is incomplete lineage sorting?
The model operates on multiple sequences. Is it a linear HMM, a pair HMM, or something else? What do the states represent? How could the model be improved?
Which patterns in the data is the model looking for? Would it be possible to analyze these patterns without a probabilistic model?
(Estimate how frequently (per nucleotide) mutations occur between the species considered. What is the average distance between recombinations?)
How does the method scale to more species?

Practical III: HMMs and alignment

PMID:
What are the causes of inaccuracies in alignments?
Would a more accurate model of sequence evolution improve alignments? Would this be a large improvement?
What is the practical limit (in terms of evolutionary distance, in mutations/site) on pairwise alignment? Would multiple alignment allow more divergent species to be aligned?
How does the complexity scale for multiple alignment using HMMs, in a naïve implementation? What could you do to improve this?
What is posterior decoding and how does it work? In what way does it improve alignments, compared to Viterbi? Why is this?

Practical IV: HMMs and conservation: phastCons

Read PMID:
What is the difference between a phyloHMM and a "standard" HMM? How does the model identify conserved regions? How is the model helped by the use of multiple species? How is the model parameterized?
The paper uses the model to estimate the fraction of the human genome that is conserved. How can this estimate be criticized?
Look at a few protein-coding genes, and their conservation across mammalian species, using the UCSC genome browser. Is it always true that (protein-coding) exons are well conserved? Can you see regions of conservation outside of protein-coding exons? Do these observations suggest that the model is inaccurate?
Read PMID: . Summarize the differences in approach between the new methods and the "old" phyloHMM.

Practical V: Automatic code generation for HMMs

and alignments.doc. Skip sections and alignments.doc.
Implementing the various algorithms for HMMs can be hard work, particularly when reasonable efficiency is required. However, library implementations are neither fast nor flexible enough. This practical demonstrates a code generator that takes the pain out of working with HMMs. It takes you through an existing alignment HMM, and modifies it to identify conserved regions (à la phastCons).
Requirements: a Linux system, with Java and GCC installed. Experience with C and/or C++ is helpful for this tutorial.