Genome evolution: a sequence-centric approach Lecture 3: From Trees to HMMs.

Web: Access slides (ppts) and exercises directly: /home/atanay/public_html/GenomeEvo/ Subscribe to course messages:

Course outline: Probabilistic models, Inference, Parameter estimation, Genome structure, Mutations, Population, Inferring Selection. (Prerequisites: probability, calculus/matrix theory, some graph theory, some statistics.) Covered so far: Simple Tree Models, (Continuous-time) Markov Chains.

Stochastic Processes and Stationary Distributions (diagram: a process model evolving over time t vs. its stationary model)

Inference on trees and the EM algorithm: summary. Inference uses dynamic programming (up-down message passing): for each node v and state s we compute an up message up_v(s) and a down message down_v(s), and the marginal/posterior probabilities follow as P(x_v = s | data) proportional to up_v(s) * down_v(s). The EM update rule then re-estimates the conditional probabilities per lineage from the posterior pair marginals on each edge.

Bayesian Networks. The joint probability for a set of random variables is defined given: 1) a directed acyclic graph G; 2) a conditional probability table for each variable given its parents. Claim: if G is a tree, we can compute marginals and the total probability efficiently. Proof: exactly what we did last time. Claim: for a general G, inference is NP-hard (whiteboard/exercise). Why will the up-down algorithm not work? Claim: if G has no undirected cycles (a polytree), inference is still efficient (whiteboard/exercise). We will discuss methods for approximate inference in detail later; for now, let us look for more easy cases.

Markov Models. x_t is the state at time t; the transition probabilities P(x_{t+1} | x_t) define the process. Adding an initial condition defines a distribution on infinite sequences. Problem: we observe finite sequences, and infinite probability spaces are difficult to work with. Solution: add an absorbing finish state, and add a start state to express the probability at time 0.
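As a concrete illustration (the chain and all numbers below are toy values, not from the lecture), the probability of a finite sequence under a chain with start/finish states is just a product of transition probabilities:

```python
# Toy Markov chain with an explicit start state "S" and an absorbing
# finish state "F" (all probabilities here are illustrative).
P = {
    "S": {"a": 0.6, "b": 0.4},            # initial condition, time 0
    "a": {"a": 0.5, "b": 0.3, "F": 0.2},  # each row sums to 1
    "b": {"a": 0.4, "b": 0.4, "F": 0.2},
}

def sequence_prob(seq):
    """P(observe seq, then finish) = P(x1|S) * prod_t P(x_t|x_{t-1}) * P(F|x_T)."""
    prob, prev = 1.0, "S"
    for x in seq:
        prob *= P[prev][x]
        prev = x
    return prob * P[prev]["F"]
```

Because every path eventually reaches F, these probabilities sum to 1 over all finite sequences, which is exactly how the finish state turns the infinite-sequence distribution into a workable finite one.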

Hidden Markov Models. We observe only emissions of the states into some probability space E: each state x is equipped with an emission distribution P(e | x) (x a state, e an emission). Caution! The state-transition diagram is NOT the HMM Bayes net: 1) it contains cycles; 2) its nodes are states, NOT random variables!

Hidden Markov Models. The HMM can be viewed as a template model: given a sequence of observations (or just its length), we can unroll it into a Bayes net with Start, States, Emissions, and Finish nodes. Since this BN has a tree topology, we know how to compute posteriors.

Inference in HMMs. Forward formula (like the down algorithm): F_t(s) = P(e_t | s) * sum_{s'} F_{t-1}(s') P(s | s'). Backward formula (like the up algorithm): B_t(s) = sum_{s'} P(s' | s) P(e_{t+1} | s') B_{t+1}(s'). Basically, exactly what we saw for trees.
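A minimal sketch of the two recursions on a toy two-state model (all names and parameter values below are illustrative):

```python
# Forward/backward recursions for a small HMM (illustrative parameters).
states = ["H", "L"]
init = {"H": 0.5, "L": 0.5}
trans = {"H": {"H": 0.7, "L": 0.3}, "L": {"H": 0.4, "L": 0.6}}
emit = {"H": {"A": 0.6, "B": 0.4}, "L": {"A": 0.2, "B": 0.8}}

def forward(obs):
    # F_t(s) = P(e_t|s) * sum_{s'} F_{t-1}(s') * P(s|s')
    F = [{s: init[s] * emit[s][obs[0]] for s in states}]
    for e in obs[1:]:
        F.append({s: emit[s][e] * sum(F[-1][r] * trans[r][s] for r in states)
                  for s in states})
    return F

def backward(obs):
    # B_t(s) = sum_{s'} P(s'|s) * P(e_{t+1}|s') * B_{t+1}(s')
    B = [{s: 1.0 for s in states}]
    for e in reversed(obs[1:]):
        B.insert(0, {s: sum(trans[s][r] * emit[r][e] * B[0][r] for r in states)
                     for s in states})
    return B
```

The total likelihood can be read off either end (sum_s F_T(s) equals sum_s init(s) P(e_1|s) B_1(s)), and F_t(s) * B_t(s) / likelihood is the posterior state probability, just as up times down was on trees.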

EM for HMMs. Can we apply the tree EM verbatim? Almost, but we have to share parameters: the same transition and emission tables are used at every position of the unrolled Bayes net (Start, States, Emissions, Finish), so the expected counts are pooled across positions before normalizing. Claim: HMM EM is monotonically improving the likelihood (i.e., sharing parameters is ok).
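One Baum-Welch (EM) iteration for a toy two-state model, with the expected counts pooled over all positions before normalizing; everything below is an illustrative sketch, not the lecture's notation:

```python
# One Baum-Welch (EM) step for a toy 2-state HMM; expected counts are
# pooled across positions because the same tables are shared everywhere.
states = ["H", "L"]

def fwd_bwd(obs, trans, emit, init):
    T = len(obs)
    F = [{s: init[s] * emit[s][obs[0]] for s in states}]
    for t in range(1, T):
        F.append({s: emit[s][obs[t]] * sum(F[t-1][r] * trans[r][s] for r in states)
                  for s in states})
    B = [dict() for _ in range(T)]
    B[T-1] = {s: 1.0 for s in states}
    for t in range(T - 2, -1, -1):
        B[t] = {s: sum(trans[s][r] * emit[r][obs[t+1]] * B[t+1][r] for r in states)
                for s in states}
    return F, B, sum(F[T-1][s] for s in states)

def baum_welch_step(obs, trans, emit, init):
    F, B, like = fwd_bwd(obs, trans, emit, init)
    T = len(obs)
    # E-step: posterior state (gamma) and transition (xi) probabilities, pooled over t
    gamma = [{s: F[t][s] * B[t][s] / like for s in states} for t in range(T)]
    xi = {r: {s: 0.0 for s in states} for r in states}
    for t in range(T - 1):
        for r in states:
            for s in states:
                xi[r][s] += F[t][r] * trans[r][s] * emit[s][obs[t+1]] * B[t+1][s] / like
    # M-step: normalize the pooled expected counts
    new_trans = {r: {s: xi[r][s] / sum(xi[r].values()) for s in states} for r in states}
    new_emit = {s: {o: sum(g[s] for g, e in zip(gamma, obs) if e == o) /
                       sum(g[s] for g in gamma) for o in set(obs)} for s in states}
    new_init = dict(gamma[0])
    return new_trans, new_emit, new_init, like
```

By the usual EM argument, the likelihood after the update is never lower than before it, which is exactly the claim on the slide about sharing parameters being ok.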

Hidden states. Example: two Markov models describe our data, and switching between the models occurs at random. How do we model this? Add a hidden state with no emission.

Hidden states. What about hidden cycles? (diagram: hidden vs. emitting states)

Profile HMM for Protein or DNA motifs. The model is a chain of columns, each with an M, I, and D state, between a start state S and a finish state F. M (Match) states emit a certain amino-acid/nucleotide profile. I (Insert) states emit some background profile. D (Delete) states are hidden (no emission). We can use EM to train the parameters from a set of examples, then use the model for classification or annotation (both emission and transition probabilities are informative!). (How do we determine the right size of the model?) (Google PFAM, Prosite, "HMM profile protein domain".)
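One common layout of the state set and legal transitions, sketched in code (a simplified topology for illustration; real profile HMMs such as PFAM's add begin/end handling that is omitted here):

```python
# Sketch: a common profile-HMM topology of length L -- match (M),
# insert (I) and delete (D) states per column, plus start S and finish F.
def profile_hmm_topology(L):
    states = ["S"] + [f"{k}{i}" for i in range(1, L + 1) for k in "MID"] + ["F"]
    edges = []
    for i in range(1, L + 1):
        prev = ["S"] if i == 1 else [f"M{i-1}", f"I{i-1}", f"D{i-1}"]
        for p in prev:                         # enter column i by matching or deleting
            edges += [(p, f"M{i}"), (p, f"D{i}")]
        for p in (f"M{i}", f"I{i}", f"D{i}"):  # inserts loop between columns
            edges.append((p, f"I{i}"))
    for p in (f"M{L}", f"I{L}", f"D{L}"):      # any column-L state may finish
        edges.append((p, "F"))
    return states, edges
```

The model-size question on the slide is visible here: L fixes the number of columns, and choosing it (e.g., from the typical motif length in the training set) is part of the design.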

N-order Markov models. For evolutionary models, the Markov property makes much sense; for spatial (sequence) effects, the Markov property is a (horrible) heuristic. N-order relations, where the next state depends on the previous N states, can be modeled naturally. A common stumbling block: Forward/Backward in an N-order HMM — would dynamic programming still work?
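The standard resolution (a sketch under the usual state-expansion trick, not the lecture's own derivation): rewrite the N-order chain as a 1st-order chain over N-tuples of states, after which ordinary Forward/Backward applies unchanged.

```python
# Sketch: a 2nd-order chain over {A,C,G,T} rewritten as an ordinary
# 1st-order chain whose states are symbol *pairs*, so the standard
# Forward/Backward dynamic programming works as before.
from itertools import product

ALPHABET = "ACGT"

def lift_to_first_order(p2):
    """p2[(a, b)][c] = P(x_t = c | x_{t-2} = a, x_{t-1} = b).
    Returns transitions between pair-states: (a, b) -> (b, c)."""
    return {(a, b): {(b, c): p2[(a, b)][c] for c in ALPHABET}
            for a, b in product(ALPHABET, repeat=2)}
```

The cost is a state space of size |alphabet|^N, which is why high-order spatial models get expensive quickly.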

1-HMM Bayes net vs. 2-HMM Bayes net (unrolled diagrams with Start, States, Emissions, and Finish nodes; in the 2-HMM each state depends on the two previous states). (You shall explore the inference problem in Ex 2.)

Pair-HMM. Given two sequences s1, s2, an alignment is defined by a set of 'gaps' (or indels) in each of the sequences:

ACGCGAACCGAATGCCCAA---GGAAAACGTTTGAATTTATA
ACCCGT-----ATGCCCAACGGGGAAAACGTTTGAACTTATA

The standard distance metric combines a substitution matrix with an affine gap cost, and a standard dynamic programming algorithm computes the best alignment given such a distance metric.
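For concreteness, here is the dynamic programming recursion with a linear gap cost (the affine-gap version keeps three matrices, one per gap state; the scores below are illustrative, not a real substitution matrix):

```python
# Needleman-Wunsch global alignment score; linear gap cost for brevity
# (the affine-gap variant tracks separate gap-open/extend matrices).
def nw_score(s1, s2, match=1, mismatch=-1, gap=-2):
    n, m = len(s1), len(s2)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap          # s2 exhausted: all gaps
    for j in range(1, m + 1):
        D[0][j] = j * gap          # s1 exhausted: all gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            D[i][j] = max(D[i - 1][j - 1] + sub,  # substitution / match
                          D[i - 1][j] + gap,      # gap in s2
                          D[i][j - 1] + gap)      # gap in s1
    return D[n][m]
```

The three cases of the max correspond exactly to the three states of the pair-HMM on the next slide: match, gap in one sequence, gap in the other.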

Pair-HMM. Generalize the HMM concept to probabilistically model alignments. Problem: we are observing two sequences that are not a-priori related — what will be emitted from our HMM? The states are M, G1, G2, between S and F. The match state M emits an aligned nucleotide pair; each gap state G_i emits a nucleotide from one of the two sequences only. Pr(M -> G_i) plays the role of the "gap open cost", and Pr(G_1 -> G_1) the "gap extension cost". Is it a BN template? Is there a forward-backward formula? (Whiteboard/exercise.)

Mixture models (whiteboard/exercise). Inference? EM for parameter estimation? What about very high dimensions?
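A minimal EM sketch for a two-component 1-D Gaussian mixture (all data and initial values below are illustrative):

```python
import math

# One EM step for a 2-component 1-D Gaussian mixture (illustrative).
def em_step(xs, pi, mu, sigma):
    # E-step: responsibility of each component for each point
    resp = []
    for x in xs:
        w = [pi[k] * math.exp(-(x - mu[k]) ** 2 / (2 * sigma[k] ** 2)) / sigma[k]
             for k in (0, 1)]
        z = sum(w)
        resp.append([wk / z for wk in w])
    # M-step: re-estimate weights, means and standard deviations
    n = [sum(r[k] for r in resp) for k in (0, 1)]
    pi = [nk / len(xs) for nk in n]
    mu = [sum(r[k] * x for r, x in zip(resp, xs)) / n[k] for k in (0, 1)]
    sigma = [math.sqrt(sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / n[k]
                       + 1e-6)  # small floor avoids variance collapse
             for k in (0, 1)]
    return pi, mu, sigma
```

In very high dimensions the E-step keeps the same form, but the densities and the covariance estimates grow with the dimension, which is the practical concern raised on the slide.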