Profiles for Sequences

Slides:



Advertisements
Similar presentations
Hidden Markov Model in Biological Sequence Analysis – Part 2
Advertisements

HIDDEN MARKOV MODELS IN COMPUTATIONAL BIOLOGY CS 594: An Introduction to Computational Molecular Biology BY Shalini Venkataraman Vidhya Gunaseelan.
Ulf Schmitz, Statistical methods for aiding alignment1 Bioinformatics Statistical methods for pattern searching Ulf Schmitz
Hidden Markov Model.
1 Hidden Markov Model Xiaole Shirley Liu STAT115, STAT215, BIO298, BIST520.
Ab initio gene prediction Genome 559, Winter 2011.
Ch9 Reasoning in Uncertain Situations Dr. Bernard Chen Ph.D. University of Central Arkansas Spring 2011.
Hidden Markov Models.
Profile Hidden Markov Models Bioinformatics Fall-2004 Dr Webb Miller and Dr Claude Depamphilis Dhiraj Joshi Department of Computer Science and Engineering.
Hidden Markov Models Fundamentals and applications to bioinformatics.
1 Profile Hidden Markov Models For Protein Structure Prediction Colin Cherry
 CpG is a pair of nucleotides C and G, appearing successively, in this order, along one DNA strand.  CpG islands are particular short subsequences in.
Patterns, Profiles, and Multiple Alignment.
Hidden Markov Models Modified from:
Hidden Markov Models: Applications in Bioinformatics Gleb Haynatzki, Ph.D. Creighton University March 31, 2003.
Ka-Lok Ng Dept. of Bioinformatics Asia University
Hidden Markov Models Ellen Walker Bioinformatics Hiram College, 2008.
Hidden Markov models for detecting remote protein homologies Kevin Karplus, Christian Barrett, Richard Hughey Georgia Hadjicharalambous.
Hidden Markov Models Theory By Johan Walters (SR 2003)
JM - 1 Introduction to Bioinformatics: Lecture XIII Profile and Other Hidden Markov Models Jarek Meller Jarek Meller Division.
Hidden Markov Models (HMMs) Steven Salzberg CMSC 828H, Univ. of Maryland Fall 2010.
درس بیوانفورماتیک December 2013 مدل ‌ مخفی مارکوف و تعمیم ‌ های آن به نام خدا.
Hidden Markov Models in Bioinformatics Example Domain: Gene Finding Colin Cherry
Bioinformatics Finding signals and motifs in DNA and proteins Expectation Maximization Algorithm MEME The Gibbs sampler Lecture 10.
Biochemistry and Molecular Genetics Computational Bioscience Program Consortium for Comparative Genomics University of Colorado School of Medicine
Hidden Markov Models. Two learning scenarios 1.Estimation when the “right answer” is known Examples: GIVEN:a genomic region x = x 1 …x 1,000,000 where.
Hidden Markov Models Sasha Tkachev and Ed Anderson Presenter: Sasha Tkachev.
Hidden Markov Models I Biology 162 Computational Genetics Todd Vision 14 Sep 2004.
HIDDEN MARKOV MODELS IN MULTIPLE ALIGNMENT. 2 HMM Architecture Markov Chains What is a Hidden Markov Model(HMM)? Components of HMM Problems of HMMs.
Lyle Ungar, University of Pennsylvania Hidden Markov Models.
HIDDEN MARKOV MODELS IN MULTIPLE ALIGNMENT
Lecture 9 Hidden Markov Models BioE 480 Sept 21, 2004.
Comparative ab initio prediction of gene structures using pair HMMs
Hidden Markov Models Usman Roshan BNFO 601. Hidden Markov Models Alphabet of symbols: Set of states that emit symbols from the alphabet: Set of probabilities.
Efficient Estimation of Emission Probabilities in profile HMM By Virpi Ahola et al Reviewed By Alok Datar.
Hw1 Shown below is a matrix of log odds column scores made from an alignment of a set of sequences. (A) Calculate the alignment score for each of the four.
Making Sense of DNA and protein sequence analysis tools (course #2) Dave Baumler Genome Center of Wisconsin,
Multiple Sequence Alignment BMI/CS 576 Colin Dewey Fall 2010.
Applications of HMMs Yves Moreau Overview Profile HMMs Estimation Database search Alignment Gene finding Elements of gene prediction Prokaryotes.
Hidden Markov Models As used to summarize multiple sequence alignments, and score new sequences.
CSCE555 Bioinformatics Lecture 6 Hidden Markov Models Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page:
Hidden Markov Models for Sequence Analysis 4
BINF6201/8201 Hidden Markov Models for Sequence Analysis
Sequence analysis: Macromolecular motif recognition Sylvia Nagl.
Eric C. Rouchka, University of Louisville SATCHMO: sequence alignment and tree construction using hidden Markov models Edgar, R.C. and Sjolander, K. Bioinformatics.
Hidden Markov Models Yves Moreau Katholieke Universiteit Leuven.
Hidden Markov Models Usman Roshan CS 675 Machine Learning.
10/29/20151 Gene Finding Project (Cont.) Charles Yan.
HMMs for alignments & Sequence pattern discovery I519 Introduction to Bioinformatics.
Bioinformatics Ayesha M. Khan 9 th April, What’s in a secondary database?  It should be noted that within multiple alignments can be found conserved.
Protein and RNA Families
Hidden Markov Models A first-order Hidden Markov Model is completely defined by: A set of states. An alphabet of symbols. A transition probability matrix.
1 CONTEXT DEPENDENT CLASSIFICATION  Remember: Bayes rule  Here: The class to which a feature vector belongs depends on:  Its own value  The values.
CZ5226: Advanced Bioinformatics Lecture 6: HHM Method for generating motifs Prof. Chen Yu Zong Tel:
Sequence Based Analysis Tutorial March 26, 2004 NIH Proteomics Workshop Lai-Su L. Yeh, Ph.D. Protein Science Team Lead Protein Information Resource at.
Applications of HMMs in Computational Biology BMI/CS 576 Colin Dewey Fall 2010.
Hidden Markov Model and Its Application in Bioinformatics Liqing Department of Computer Science.
(H)MMs in gene prediction and similarity searches.
V diagonal lines give equivalent residues ILS TRIVHVNSILPSTN V I L S T R I V I L P E F S T Sequence A Sequence B Dot Plots, Path Matrices, Score Matrices.
1 Applications of Hidden Markov Models (Lecture for CS498-CXZ Algorithms in Bioinformatics) Nov. 12, 2005 ChengXiang Zhai Department of Computer Science.
Introducing Hidden Markov Models First – a Markov Model State : sunny cloudy rainy sunny ? A Markov Model is a chain-structured process where future states.
Hidden Markov Models. A Hidden Markov Model consists of 1.A sequence of states {X t |t  T } = {X 1, X 2,..., X T }, and 2.A sequence of observations.
Definition of the Hidden Markov Model A Seminar Speech Recognition presentation A Seminar Speech Recognition presentation October 24 th 2002 Pieter Bas.
HW7: Evolutionarily conserved segments ENCODE region 009 (beta-globin locus) Multiple alignment of human, dog, and mouse 2 states: neutral (fast-evolving),
More on HMMs and Multiple Sequence Alignment BMI/CS 776 Mark Craven March 2002.
1 Hidden Markov Model Xiaole Shirley Liu STAT115, STAT215.
Hidden Markov Models Wassnaa AL-mawee Western Michigan University Department of Computer Science CS6800 Adv. Theory of Computation Prof. Elise De Doncker.
Genome Annotation (protein coding genes)
Ab initio gene prediction
HIDDEN MARKOV MODELS IN COMPUTATIONAL BIOLOGY
Presentation transcript:

Profiles for Sequences Hidden Markov Models in Computational Biology Profiles for Sequences

Sequence Profiles Often, sequences are characterized by similarities that are not well captured through matching algorithms. For example, identification of genes in the presence of exons/introns, gene features (CpG islands, etc.), domain profiles in proteins, among others. For such sequences, Markov chains provide useful abstractions.

Markov Chains State transition matrix : The probability of Rain Sunny Cloudy State transition matrix : The probability of the weather given the previous day's weather. States : Three states - sunny, cloudy, rainy. Initial Distribution : Defining the probability of the system being in each of the states at time 0.

Hidden Markov Models Hidden states : the (TRUE) states of a system that may be described by a Markov process (e.g., the weather). Observable states : the states of the process that are `visible’.

Hidden Markov Models Initial Distribution : Initial state probability vector. State transition Matrix Emission Probabilities: containing the probability of observing a particular observable state given that the hidden model is in a particular hidden state.

Hidden Markov Models Output Prob. Transition Prob. Output Prob. Observed sequences can be scored if their state transitions are known. The probability of ACCY along this path is: .4 * .3 * .46 * .6 * .97 * .5 * .015 * .73 *.01 * 1 = 1.76x10-6.

Methods for Hidden Markov Models Scoring problem: Given an existing HMM and observed sequence , what is the probability that the HMM can generate the sequence

Methods, contd. Alignment Problem Given a sequence, what is the optimal state sequence that the HMM would use to generate it

Methods, contd. Training Problem How do we estimate the structure and parameters of a HMM from data.

HMMs– Some Applications Gene finding and prediction Protein-Profile Analysis Secondary Structure prediction Copy Number Variation Characterizing SNPs

Gene Template (Left) (Removed)

HMMs: Applications Classification: Classifying observations within a sequence Order: A DNA sequence is a set of ordered observations Structure : can be intuitively defined: Measure of success: # of complete exons correctly labeled Training data: Available from various genome annotation projects

Hidden Markov Models in Computational Biology HMMs for Gene Finding Training - Expectation Maximization (EM) Parsing – Viterbi algorithm from Salzberg, Chapter 4. pp 50. An HMM for unspliced genes. x : non-coding DNA c : coding state

Genefinders: a Comparison Sn = Sensitivity Sp = Specificity Ac = Approximate Correlation ME = Missing Exons WE = Wrong Exons GENSCAN Performance Data, http://genes.mit.edu/Accuracy.html

Protein Profile HMMs Motivation What is a Profile? Given a single amino acid target sequence of unknown structure, we want to infer the structure of the resulting protein. Use Profile Similarity What is a Profile? Proteins families of related sequences and structures Same function Clear evolutionary relationship Patterns of conservation, some positions are more conserved than the others

Transition probabilities HMMs From Alignment ACA - - - ATG TCA ACT ATC ACA C - - AGC AGA - - - ATC ACC G - - ATC insertion Transition probabilities Output Probabilities A HMM model for a DNA motif alignments, The transitions are shown with arrows whose thickness indicate their probability. In each state, the histogram shows the probabilities of the four bases.

HMMs from Alignments Deletion states Matching states Insertion states No of matching states = average sequence length in the family PFAM Database - of Protein families (http://pfam.wustl.edu)

Database Searching Given HMM, M, for a sequence family, find all members of the family in data base. LL – score LL(x) = log P(x|M) (LL score is length dependent – must normalize or use Z-score)

Querying a Sequence Suppose I have a query protein sequence, and I am interested in which family it belongs to? There can be many paths leading to the generation of this sequence. Need to find all these paths and sum the probabilities. Consensus sequence: P (ACACATC) = 0.8x1 x 0.8x1 x 0.8x0.6 x 0.4x0.6 x 1x1 x 0.8x1 x 0.8 = 4.7 x 10 -2 ACAC - - ATC

Multiple Alignments Try every possible path through the model that would produce the target sequences Keep the best one and its probability. Output : Sequence of match, insert and delete states Viterbi alg. Dynamic Programming

HMMs from Unaligned Sequences Baum-Welch Expectation-maximization method Start with a model whose length matches the average length of the sequences and with random output and transition probabilities. Align all the sequences to the model. Use the alignment to alter the output and transition probabilities Repeat. Continue until the model stops changing By-product: a multiple alignment

PHMM Example An alignment of 30 short amino acid sequences chopped out of a alignment of the SH3 domain. The shaded area are the most conserved and were represented by the main states in the HMM. The unshaded area was represented by an insert state.