HMMER tutorial 羅偉軒
Account IP: 140.129.78.120 Account: binfo2005 Password: 2005binfo

HMMER
The theory behind profile HMMs: R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Biological sequence analysis: probabilistic models of proteins and nucleic acids, Cambridge University Press.

Flowchart

Format of input alignment files
– Output of the CLUSTAL family of programs
– Wisconsin/GCG MSF format
– The input format for the PHYLIP phylogenetic analysis programs
– Aligned FASTA format
– Stockholm format (HMMER's native format, used by the Pfam and Rfam databases)
– SELEX format
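To make the Stockholm format more concrete, here is a minimal Python sketch of a reader for it. It is not part of HMMER and handles only the basics: the "# STOCKHOLM" header line, "name sequence" rows, "#" markup lines, and the "//" terminator. The sequence names and residues in EXAMPLE are made up for illustration.

```python
def parse_stockholm(text):
    """Return {name: aligned_sequence} from a Stockholm-format alignment."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("# STOCKHOLM"):
        raise ValueError("missing '# STOCKHOLM' header line")
    alignment = {}
    for line in lines[1:]:
        line = line.strip()
        if line == "//":                      # end-of-alignment marker
            break
        if not line or line.startswith("#"):  # blank lines and #=G? markup
            continue
        name, chunk = line.split(None, 1)
        # Interleaved files repeat each name once per block, so append.
        alignment[name] = alignment.get(name, "") + chunk.replace(" ", "")
    return alignment


# Hypothetical two-sequence alignment, for illustration only.
EXAMPLE = """# STOCKHOLM 1.0
seq1  ACDEF-GH
seq2  ACDQFWGH
#=GC RF  xxxxx.xx
//
"""

aln = parse_stockholm(EXAMPLE)
for name, seq in aln.items():
    print(name, seq)
```

All rows of a valid alignment have the same length, which is a cheap sanity check before handing a file to hmmbuild.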

Searching a sequence database with a single profile HMM
– Build a profile HMM with hmmbuild:
  > hmmbuild globin.hmm globins50.msf
– Calibrate the profile HMM with hmmcalibrate:
  > hmmcalibrate globin.hmm
– Search the sequence database with hmmsearch:
  > hmmsearch globin.hmm Artemia.fa
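The three steps can also be scripted. The following is a sketch only: the helper names search_pipeline and run_pipeline are hypothetical, and the commands are exactly those shown on the slide, with no extra options. It assumes a HMMER 2 installation, since hmmcalibrate is not part of HMMER 3.

```python
import shutil
import subprocess


def search_pipeline(alignment, hmm, database):
    """Return the three commands from the slide, in order, as argument lists."""
    return [
        ["hmmbuild", hmm, alignment],    # build the profile HMM
        ["hmmcalibrate", hmm],           # calibrate it (HMMER 2 only)
        ["hmmsearch", hmm, database],    # search the sequence database
    ]


def run_pipeline(commands):
    """Run each command in turn and stop at the first failure."""
    for cmd in commands:
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} not found on PATH")
        subprocess.run(cmd, check=True)


commands = search_pipeline("globins50.msf", "globin.hmm", "Artemia.fa")
for cmd in commands:
    print("> " + " ".join(cmd))
# Call run_pipeline(commands) on a machine where HMMER 2 is installed.
```

Keeping the command construction separate from the execution makes it easy to inspect what would be run before touching any files.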

Local alignment versus global alignment
To HMMER, whether local or global alignments are allowed is part of the model, rather than being accomplished by running a different algorithm. You therefore need to choose what kind of alignments you want to allow when you build the model with hmmbuild. By default, hmmbuild builds models that allow alignments that are global with respect to the HMM and local with respect to the sequence, and that allow multiple domains to hit per sequence.

Searching a query sequence against a profile HMM database
– Creating your own profile HMM database:
  > hmmbuild -A myhmms rrm.sto
  > hmmbuild -A myhmms fn3.sto
  > hmmbuild -A myhmms pkinase.sto
  > hmmcalibrate myhmms
– Parsing the domain structure of a sequence with hmmpfam:
  > hmmpfam myhmms 7LES_DROME

Creating and maintaining multiple alignments with hmmalign
Another use of profile HMMs is to create multiple sequence alignments of large numbers of sequences. A profile HMM can be built from a "seed" alignment of a small number of representative sequences, and this profile HMM can then be used to efficiently align any number of additional sequences.
  > hmmalign -o globins630.ali globin.hmm globins630.fa

HMMER scoring and determining significance
HMMER gives you at least two scoring criteria to judge by: the HMMER raw score (a bit score) and an E-value. The E-value is calculated from the bit score. It tells you how many false positives you would expect to see at or above this bit score. HMMER bit scores reflect whether the sequence is a better match to the profile model (positive score) or to the null model of nonhomologous sequences (negative score).
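The sign convention of the bit score and its link to the E-value can be illustrated with a toy calculation. This is a simplified sketch, not HMMER's scoring code: it uses an ungapped two-column model over a made-up two-letter alphabet, and it assumes the E-value comes from an extreme value distribution with location mu and scale lam, the kind of parameters hmmcalibrate estimates. All numbers below are invented for illustration.

```python
import math


def bit_score(seq, match_probs, null_probs):
    """Log-odds score in bits: log2 P(seq | model) / P(seq | null)."""
    return sum(
        math.log2(column[res] / null_probs[res])
        for res, column in zip(seq, match_probs)
    )


def evd_evalue(score, mu, lam, n_seqs):
    """Expected number of hits scoring >= score among n_seqs random sequences.

    Assumes P(S >= x) = 1 - exp(-exp(-lam * (x - mu))).
    """
    p_value = -math.expm1(-math.exp(-lam * (score - mu)))
    return n_seqs * p_value


# Toy model: position 1 prefers A, position 2 prefers C.
MATCH = [{"A": 0.9, "C": 0.1}, {"A": 0.1, "C": 0.9}]
NULL = {"A": 0.5, "C": 0.5}

good = bit_score("AC", MATCH, NULL)   # fits the model better than the null
bad = bit_score("CA", MATCH, NULL)    # fits the null better than the model
print(f"AC: {good:.2f} bits, CA: {bad:.2f} bits")

# Invented calibration parameters and database size.
for s in (0.0, 10.0):
    print(f"score {s:.0f} -> E-value {evd_evalue(s, mu=0.0, lam=0.7, n_seqs=1000):.3g}")
```

Because the E-value scales with the number of sequences searched, the same bit score is less significant in a larger database.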

hmmsearch output

Building a model
– hmmbuild: Build a model from a multiple sequence alignment
Using a model
– hmmalign: Align sequences to an existing model (outputs a multiple alignment)
– hmmconvert: Convert a model into different formats
– hmmcalibrate: Take an HMM and empirically determine parameters that are used to make searches more sensitive, by calculating more accurate expectation values (E-values)
– hmmemit: Emit sequences probabilistically from a profile HMM
– hmmsearch: Search a sequence database for matches to an HMM
HMM databases
– hmmfetch: Get a single model from an HMM database
– hmmindex: Index an HMM database (not available on the WEB server)
– hmmpfam: Search an HMM database for matches to a query sequence
Other programs
– alistat: Show some simple statistics about a sequence alignment file
– seqstat: Show some simple statistics about a sequence file
– getseq: Retrieve a (sub-)sequence from a sequence file (not available on the WEB server)
– sreformat: Reformat a sequence or alignment file into a different format

References
– HMMER user guide
– Eddy SR. (1998) Profile hidden Markov models. Bioinformatics.

Related links
– HMMER
– SAM
– PFTOOLS
– HMMpro
– GENEWISE
– PROBE ftp://ftp.ncbi.nih.gov/pub/neuwald/probe1.0/
– META-MEME
– BLOCKS
– PSI-BLAST

Homework: Search for homologies with hidden Markov models
– Obtain the UniProtKB/Swiss-Prot entry of the myb proto-oncogene protein (AC P10242, entry MYB_HUMAN).
– Take the amino acid sequence of the myb protein and search against the NCBI nr protein database with BLASTp. The goal is to obtain an HMM for myb domains and to use this HMM for searching the UniProt-SwissProt protein database.
– Select 10 myb domains while screening the hits of the BLASTp search, and copy the corresponding parts of the sequences to a file in FASTA format.
– Do a multiple sequence alignment of these ten myb domains with ClustalW.
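Copying the domain regions into a FASTA file can be done by hand, but a small helper avoids off-by-one mistakes. This is a sketch under assumptions: the function name format_fasta is hypothetical, coordinates are taken to be 1-based and inclusive as in BLAST reports, and the "name/start-end" header style is just one common convention. The record below is invented and is not a real myb sequence.

```python
def format_fasta(records, width=60):
    """Format (name, sequence, start, end) records as FASTA text.

    start and end are 1-based inclusive coordinates of the domain
    within the full-length sequence.
    """
    out = []
    for name, seq, start, end in records:
        domain = seq[start - 1:end]
        out.append(f">{name}/{start}-{end}")
        # Wrap the sequence at a fixed line width.
        out.extend(domain[i:i + width] for i in range(0, len(domain), width))
    return "\n".join(out) + "\n"


# Invented example record: (hit name, full sequence, domain start, domain end).
records = [("hit1", "MKTAYIAKQRQISFVKSHFSRQ", 3, 10)]
fasta_text = format_fasta(records)
print(fasta_text, end="")

# To save the ten selected domains, write the text to a file, e.g.:
# with open("myb_domains.fa", "w") as handle:
#     handle.write(format_fasta(records))
```

The resulting file can then be used as the input for the ClustalW alignment step.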

Homework: Search for homologies with hidden Markov models (cont.)
– Download and install HMMER.
– Build and calibrate an HMM of these myb domains by means of hmmbuild and hmmcalibrate.
– Use hmmsearch to search the UniProt-SwissProt protein library with the HMM of the myb domains.
– Screen the hits, build a new HMM that includes selected hits, and run hmmsearch again.
– How many hits do you get? What are they?

HMM

Some examples