By: Manchikalapati Myerow Shivananda Monday, April 14, 2003


Application of Hidden Markov Model for Sequence Analysis and Use for Predicting Protein Localization

Mathematical Modeling: Mathematical modeling in biology and chemistry uses probabilistic models, built on Bayes' theorem and maximum likelihood estimation. Ex: HMM.

What is a Markov Chain? A directed graph over a collection of states with transition probabilities; it models a random process with a finite number of states. Markov assumption: the chain is memoryless, so the probability of the current state depends only on the previous state. This property is what makes the process tractable to analyze and predict.
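
As a minimal sketch of the idea (the states, names, and all probabilities below are made up for illustration), a Markov chain is just a transition table plus a rule that each step looks only at the current state:

```python
import random

# Toy two-state Markov chain over DNA region types (numbers are illustrative).
# transition[s] maps the current state s to a distribution over the next state.
transition = {
    "AT-rich": {"AT-rich": 0.7, "GC-rich": 0.3},
    "GC-rich": {"AT-rich": 0.4, "GC-rich": 0.6},
}

def sample_chain(start_state, n, seed=0):
    """Generate a state sequence; each step depends only on the current state."""
    rng = random.Random(seed)
    path = [start_state]
    for _ in range(n - 1):
        probs = transition[path[-1]]
        nxt = rng.choices(list(probs), weights=list(probs.values()))[0]
        path.append(nxt)
    return path

print(sample_chain("AT-rich", 5, seed=1))
```

Because the chain is memoryless, the sampler never needs to inspect anything but the last element of the path.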

Hidden Markov Model: A probabilistic model composed of states that are not directly observable; it describes a probability distribution over a number of possible sequences. An HMM has the following components: states, symbol emission probabilities, and state transition probabilities. Why hidden? Only the symbol sequence that a hidden state emits is observable. Example application: protein modeling using HMMs.
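
The three components listed above can be written down directly. This sketch uses a made-up two-state model (all state names and probabilities are illustrative, not from the slides) and computes the joint probability that a given hidden path emits a given symbol sequence:

```python
# Minimal HMM: states, transition probabilities, emission probabilities.
# All numbers below are illustrative.
states = ["M", "I"]                      # match and insert states
start = {"M": 0.8, "I": 0.2}             # initial state probabilities
trans = {"M": {"M": 0.9, "I": 0.1},      # state transition probabilities
         "I": {"M": 0.5, "I": 0.5}}
emit = {"M": {"A": 0.6, "C": 0.4},       # symbol emission probabilities
        "I": {"A": 0.5, "C": 0.5}}

def joint_prob(path, symbols):
    """P(path, symbols): probability that the hidden path emits the symbols."""
    p = start[path[0]] * emit[path[0]][symbols[0]]
    for prev, cur, sym in zip(path, path[1:], symbols[1:]):
        p *= trans[prev][cur] * emit[cur][sym]
    return p

print(joint_prob(["M", "M", "I"], ["A", "A", "C"]))   # ≈ 0.013
```

Only the symbols are observable; the path is hidden, which is why the algorithms on the later slides must sum or maximize over all possible paths.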

What is hidden in the Markov model? The observed sequence is a probabilistic function of an underlying Markov chain. In an HMM the state sequence is not uniquely determined by the observed symbol sequence, but must be inferred probabilistically from it.

Definition of Profile: A profile is a description of the consensus of a multiple sequence alignment. Alignment scoring methods fall into two classes: position-specific scoring systems (profiles) and position-independent scoring systems used in pairwise alignment, e.g. BLAST and FASTA.

Profile HMM: A linear state machine consisting of a series of nodes, each of which corresponds roughly to a position (column) in the multiple alignment from which it was built. Each position can be in one of three states: match, insert, or delete. Profile HMMs can be used for sensitive database searching, using statistical descriptions of a sequence family's consensus.
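
As a toy illustration of the column-per-node idea (this is not HMMER's actual construction procedure; the alignment and pseudocount scheme are made up), match-state emission probabilities can be estimated per column of a small alignment, with pseudocounts so unseen residues keep nonzero probability:

```python
from collections import Counter

# Toy multiple alignment: each column becomes one match state of the profile.
alignment = ["ACGT", "ACGA", "TCGT"]     # illustrative sequences
alphabet = "ACGT"

def match_emissions(alignment, pseudocount=1.0):
    """Per-column emission probabilities for match states, with pseudocounts."""
    ncols = len(alignment[0])
    model = []
    for i in range(ncols):
        counts = Counter(seq[i] for seq in alignment)
        total = len(alignment) + pseudocount * len(alphabet)
        model.append({a: (counts[a] + pseudocount) / total for a in alphabet})
    return model

model = match_emissions(alignment)
print(model[1]["C"])   # column 2 is all C, so C gets the highest probability
```

A real profile HMM adds insert and delete states at each position and estimates transition probabilities as well; this sketch shows only the position-specific (column-wise) character of the model.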

Profile HMM vs. Standard Profiles: Profile HMMs have a formal probabilistic basis and a consistent theory behind gap and insertion scores, and they apply a statistical method to estimate the true frequency of a residue at a given position in the alignment from its observed frequency. In general, producing good profile HMMs requires less skill and manual intervention than producing good standard profiles. Standard profile methods, by contrast, rely on heuristics and use the observed frequency itself to assign the score for a residue.

Three Algorithms of HMM: The Viterbi algorithm finds the most probable state sequence. The forward/backward algorithm scores an observation sequence against a model. Expectation-maximization estimates the parameters of the model from the data. For all HMM applications the algorithms are fairly standard; only the design of the model differs.

Applications of HMM: Gene finding and chromosome identification. Protein applications include database searching and homology detection. Ex: one could query a single sequence of interest against the model to determine whether it contains certain domains of interest.

HMM and its basic elements: 1) match states (M1, M2, ...), 2) delete states (D1, D2, ...), 3) insert states (I0, I1, ...), 4) begin state, 5) end state, 6) emission probabilities, 7) transition probabilities, 8) parameters.

Problems "define" the HMM architecture: the problem at hand determines the architecture. Finding ungapped motifs: BLOCKS. Finding multiple motifs: META-MEME. Finding protein families: profile HMMs (Krogh). The HMMER2 architecture is used in SAM and HMMER.

HMM Profile alignment flow chart in Pfam

Three Important Questions an HMM Should Answer: Scoring: 1) How likely is it that a given sequence came from the model? Alignment: 2) What is the optimal path for generating a given sequence? Training: 3) Given a set of sequences, how can the HMM parameters be learned?

Q1) How likely is it that the given sequence (ACCY) came from the model? Answer: the forward algorithm.
Prob(A in state I0) = 0.4 * 0.3 = 0.12
Prob(C in state I1) = 0.05 * 0.06 * 0.5 = 0.015
Prob(C in state M1) = 0.46 * 0.01 = 0.005
Prob(C in state M2) = (0.005 * 0.97) + (0.015 * 0.46) = 0.012
Prob(Y in state I3) = 0.012 * 0.015 * 0.73 * 0.01 = 1.31 x 10^-7
Prob(Y in state M3) = 0.012 * 0.97 * 0.2 = 0.002
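
The slide's ACCY model parameters are not fully reproduced in the transcript, so this sketch shows the forward recursion on a made-up two-state model instead (all names and numbers are illustrative). The key step is summing over every predecessor state at each position, so the result is the total probability over all hidden paths:

```python
# Toy parameters (illustrative; not the slide's ACCY model).
states = ["M", "I"]
start = {"M": 0.8, "I": 0.2}
trans = {"M": {"M": 0.9, "I": 0.1}, "I": {"M": 0.5, "I": 0.5}}
emit = {"M": {"A": 0.6, "C": 0.4}, "I": {"A": 0.5, "C": 0.5}}

def forward(symbols):
    """P(symbols | model): sum over all hidden paths, one column at a time."""
    f = {s: start[s] * emit[s][symbols[0]] for s in states}
    for sym in symbols[1:]:
        f = {s: emit[s][sym] * sum(f[p] * trans[p][s] for p in states)
             for s in states}
    return sum(f.values())

print(forward("AAC"))
```

This is the quantity used to score a sequence against the model; taking its logarithm relative to a background model gives the familiar bit score.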

Q2) What is the optimal path for generating a given sequence (ACCY)? Answer: the Viterbi algorithm.
1. The probability that the amino acid A was generated by state I0 is computed and entered as the first element of the matrix.
2. The probabilities that C is emitted in state M1 (multiplied by the probability of the most likely transition to state M1 from state I0) and in state I1 (multiplied by the most likely transition to state I1 from state I0) are entered into the matrix element indexed by C and I1/M1.
3. The maximum probability, max(I1, M1), is calculated.
4. A pointer is set from the winner back to state I0.
5. Steps 2-4 are repeated until the matrix is filled.
Prob(A in state I0) = 0.4 * 0.3 = 0.12
Prob(C in state I1) = 0.05 * 0.06 * 0.5 = 0.015
Prob(C in state M1) = 0.46 * 0.01 = 0.005
Prob(C in state M2) = 0.46 * 0.5 = 0.23
Prob(Y in state I3) = 0.015 * 0.73 * 0.01 = 0.0001
Prob(Y in state M3) = 0.97 * 0.23 = 0.22
The most likely path through the model can now be found by following the back-pointers.
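
The numbered steps above can be sketched in code. This uses the same kind of made-up two-state model as before (not the slide's ACCY parameters): Viterbi is the forward recursion with the sum replaced by a max, plus back-pointers for the trace-back:

```python
# Toy parameters (illustrative; not the slide's ACCY model).
states = ["M", "I"]
start = {"M": 0.8, "I": 0.2}
trans = {"M": {"M": 0.9, "I": 0.1}, "I": {"M": 0.5, "I": 0.5}}
emit = {"M": {"A": 0.6, "C": 0.4}, "I": {"A": 0.5, "C": 0.5}}

def viterbi(symbols):
    """Most probable hidden path: max (not sum) per cell, with back-pointers."""
    v = {s: start[s] * emit[s][symbols[0]] for s in states}   # step 1
    back = []
    for sym in symbols[1:]:                                   # steps 2-5
        ptr, nv = {}, {}
        for s in states:
            best = max(states, key=lambda p: v[p] * trans[p][s])
            ptr[s] = best                                     # step 4: pointer
            nv[s] = v[best] * trans[best][s] * emit[s][sym]   # steps 2-3: max
        back.append(ptr)
        v = nv
    # follow the back-pointers from the best final state
    path = [max(states, key=v.get)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

print(viterbi("AAC"))
```

The returned list is the alignment of the sequence to the model: one state per observed symbol.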

Q3) Given a set of sequences, how do you learn the HMM parameters? The learning task: given a model and a set of sequences (the training set), find the most likely parameters to explain the training sequences. The goal is a model that generalizes well to sequences we haven't seen before. Answer: the Baum-Welch (forward-backward) algorithm. Initialize the parameters of the model, then iterate until convergence: calculate the expected number of times each transition or emission is used, and adjust the parameters to maximize the likelihood given these expected values.
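
The initialize/iterate loop above can be sketched for a toy two-state model (all starting numbers are illustrative guesses; start probabilities are held fixed and a fixed iteration count stands in for a convergence test, to keep the sketch short):

```python
# Baum-Welch (forward-backward EM) sketch for a toy 2-state HMM over {A, C}.
states, alphabet = ["M", "I"], ["A", "C"]
start = {"M": 0.8, "I": 0.2}
trans = {"M": {"M": 0.9, "I": 0.1}, "I": {"M": 0.5, "I": 0.5}}
emit = {"M": {"A": 0.6, "C": 0.4}, "I": {"A": 0.5, "C": 0.5}}

def forward(seq):
    """Forward table: f[t][s] = P(seq[:t+1], state at t is s)."""
    f = [{s: start[s] * emit[s][seq[0]] for s in states}]
    for sym in seq[1:]:
        f.append({s: emit[s][sym] * sum(f[-1][p] * trans[p][s] for p in states)
                  for s in states})
    return f

def backward(seq):
    """Backward table: b[t][s] = P(seq[t+1:] | state at t is s)."""
    b = [{s: 1.0 for s in states}]
    for sym in reversed(seq[1:]):
        b.insert(0, {s: sum(trans[s][n] * emit[n][sym] * b[0][n] for n in states)
                     for s in states})
    return b

def baum_welch_step(seq):
    """One EM iteration: expected transition/emission counts, then re-normalize."""
    global trans, emit
    f, b = forward(seq), backward(seq)
    total = sum(f[-1][s] for s in states)          # P(seq | current model)
    # Expected number of times each transition p -> s is used.
    tc = {p: {s: sum(f[t][p] * trans[p][s] * emit[s][seq[t + 1]] * b[t + 1][s]
                     for t in range(len(seq) - 1)) / total
              for s in states}
          for p in states}
    # Expected number of times each symbol a is emitted in state s.
    ec = {s: {a: sum(f[t][s] * b[t][s]
                     for t in range(len(seq)) if seq[t] == a) / total
              for a in alphabet}
          for s in states}
    trans = {p: {s: tc[p][s] / sum(tc[p].values()) for s in states}
             for p in states}
    emit = {s: {a: ec[s][a] / sum(ec[s].values()) for a in alphabet}
            for s in states}

for _ in range(10):      # fixed iteration count stands in for a convergence test
    baum_welch_step("AACCAACA")
```

Each iteration can only increase (or leave unchanged) the likelihood of the training sequence, which is the EM guarantee the slide's "iterate until convergence" relies on.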

HMMER in the Workflow

Tripartite structure of signal peptide

Translocation: Signal Peptide vs. Signal Anchor. Signal peptide: after translocation, the signal peptide is cleaved off and the mature protein is released. Signal anchor: the signal anchor is not cleaved off, and the protein remains anchored to the membrane.

Two HMM Models for Signal Peptides. First model (Nielsen, H. and Krogh, A. Prediction of signal peptides and signal anchors by a hidden Markov model. Proc. Sixth Int. Conf. on Intelligent Systems for Molecular Biology, 122-130. AAAI Press, 1998.): the model is not based on a multiple sequence alignment (profile); the authors compare it to a neural network on eukaryotes and prokaryotes.

The model used for signal peptides. The states in a shaded box are tied to each other.

Combined Model: The model of signal anchors has only two types of states (grouped by the shaded boxes) apart from the Met state. The final states shown in the shaded box are tied to each other and model all residues not in a signal peptide or an anchor.

Hidden Markov model (HMM) vs. neural network (NN). Cleavage site location: percentage of signal peptide sequences where the cleavage site was placed correctly. Discrimination values: correlation coefficients (Matthews 1975). Protein types: signal peptides (sig), cytoplasmic or nuclear proteins (non-sec), and signal anchors (anc). NN simple = S-score; NN combined = Y-score.

Second Model for Signal Peptides. (Barash, S., Wang, W., and Shi, Y. Human secretory signal peptide description by hidden Markov model and generation of a strong artificial signal peptide for secreted protein expression. Biochem. Biophys. Res. Commun. 294, 835-842, 2002.) A profile HMM method using the HMMER software.

Steps for Model Building with HMMER: Start from the N-terminal regions of 416 non-redundant human secreted proteins. Align the training set with hmmalign: all start Met residues aligned in the first column, and 406/416 cleavage sites aligned. Build the model with maximum likelihood estimation (random model = Swiss-Prot 34). Evaluate the alignment model: 416/416 start Met, 406/416 cleavage sites, 416/416 h-regions. Re-estimate the HMM with the maximum discrimination method.

Model Validation: Used the hmmemit program to generate artificial sequences spanning a range of bit scores. In vitro validation used secretion-test plasmid constructs: with secretory AlkP whose native signal peptide was replaced by HMM-generated signal peptides, the observed signal strengths correlated with the bit scores (a transcription or translation effect?). Ranking the signal strengths of known natural human secretory proteins showed that serum proteins such as albumin have above-average (high) bit scores.

Conclusion: HMMs and their applicability to sequence analysis have been discussed, and two different HMM architectures for modeling the signal peptide have been shown. Both are able to separate secreted proteins from cytoplasmic and nuclear proteins with excellent discrimination; discrimination of signal peptides from signal anchors is somewhat less clean. Multiple modeling strategies may be beneficial, depending on the nature of the query and the data available for training.