Duncan Legge, EMBL-EBI. Introduction to Protein Signatures & InterPro.

Presentation transcript:

Duncan Legge EMBL-EBI

Introduction to Protein Signatures & InterPro

Protein Signatures
Protein signature = an amino acid sequence (not necessarily consecutive) associated with a protein characteristic.

Foundations of InterPro
- Integration of signatures
- Manual curation

InterPro Consortium
Consortium of 11 major signature databases

What value are signatures?
- Better at finding proteins with common function: find more distant homologues than BLAST
- Classification of proteins: associate proteins that share function, domains, sequence or structure
- Annotation of protein sequences: define conserved regions of a protein, e.g. location and type of domains, key structural or functional sites

Protein Signature Methods

How are protein signatures made?
Start from a protein family or domain, build a multiple sequence alignment, build a model from the alignment, search with the model, keep the significant matches and refine. The result is the protein signature.

Example alignment:
ITWKGPVCGLDGKTYRNECALL
AVPRSPVCGSDDVTYANECELK
SVPRSPVCGSDGVTYGTECDLK
HPPPGPVCGTDGLTYDNRCELR

Example significant matches: E-value 1e-49, E-value 3e-42, E-value 5e-39, E-value 6e-10
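The build-model and search steps can be illustrated with a minimal sketch. It assumes a simple position-specific log-odds profile with a uniform background and a pseudocount of 1, which is a simplification and not the modelling used by any particular member database. The function names (`build_profile`, `score`, `search`) and the threshold are hypothetical, and the scores are raw log-odds, not E-values.

```python
import math
from collections import Counter

# The four aligned sequences shown on the slide.
ALIGNMENT = [
    "ITWKGPVCGLDGKTYRNECALL",
    "AVPRSPVCGSDDVTYANECELK",
    "SVPRSPVCGSDGVTYGTECDLK",
    "HPPPGPVCGTDGLTYDNRCELR",
]
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Return one dict per alignment column: residue -> log2 odds vs. a uniform background."""
    background = 1.0 / len(AMINO_ACIDS)
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score(profile, segment):
    """Sum the per-column log-odds; residues outside the 20 standard ones score 0."""
    return sum(col.get(aa, 0.0) for col, aa in zip(profile, segment))


def search(profile, sequence, threshold):
    """Slide the profile along the sequence (no gaps) and report windows at or above threshold."""
    width = len(profile)
    hits = []
    for start in range(len(sequence) - width + 1):
        s = score(profile, sequence[start:start + width])
        if s >= threshold:
            hits.append((start, s))
    return hits


profile = build_profile(ALIGNMENT)
hits = search(profile, "MK" + ALIGNMENT[1] + "GG", threshold=10.0)
```

In a refinement loop, sequences found by the search would be added to the alignment and the profile rebuilt. Real signature methods also handle insertions and deletions, which this ungapped sketch does not.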

Types of Protein signatures (sequence based)
Multiple protein alignment

Types of Protein signatures (sequence based): single motif methods
Regular expression patterns, e.g. C - C - {P} - x(2) - C - [STDNEKPI] - C
- C = must be this residue
- { } = cannot be any of these
- x = any amino acid
- ( ) = number of amino acids
- [ ] = any of these
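The pattern notation above maps closely onto ordinary regular expressions. The following is a minimal sketch of that translation in Python; `prosite_to_regex` is a hypothetical helper that handles only the elements listed on the slide (fixed residues, `x`, `[ ]`, `{ }` and repeat counts), not the full pattern syntax.

```python
import re

# One pattern element: a residue spec, optionally followed by (n) or (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern (subset only) into a Python regex string."""
    parts = []
    for element in pattern.replace(" ", "").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":                 # x = any amino acid
            regex = "."
        elif residue.startswith("{"):      # { } = cannot be any of these
            regex = "[^" + residue[1:-1] + "]"
        else:                              # single residue or [ ] = any of these
            regex = residue
        if low is not None:                # ( ) = number of amino acids
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)


regex = prosite_to_regex("C - C - {P} - x(2) - C - [STDNEKPI] - C")
matches_example = re.search(regex, "AACCGAACSC") is not None
rejects_proline = re.search(regex, "CCPAACSC") is None
```

For the slide's pattern the helper produces `CC[^P].{2}C[STDNEKPI]C`, which can then be used with `re.search` to scan a protein sequence.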

Types of Protein signatures (sequence based)
1. Single motif methods: regular expression patterns
2. Multiple motif methods: identity matrices, fingerprints
3. Full domain alignment methods: profiles (profile library) and hidden Markov models, a mathematical model of amino acid probability (the slide diagram shows match states M1 to M4, insert states I1 to I3 and delete states D2 and D3)

Contributing member databases (MDBs)
Methods: hidden Markov models, fingerprints, profiles, patterns, sequence clusters
Focus: structural domains, functional annotation of families/domains, prediction of conserved domains, protein features (active sites…)
Models are built on either sequence or structural alignments. Each MDB has its own focus.

Database | Basis | Institution | Built from | Focus | URL
Pfam | HMM | Sanger Institute | Sequence alignment | Family & domain based on conserved sequence | …k/
Gene3D | HMM | UCL | Structure alignment | Structural domain | …ucl.ac.uk/Gene3D/
Superfamily | HMM | Uni. of Bristol | Structure alignment | Evolutionary domain relationships | -
SMART | HMM | EMBL Heidelberg | Sequence alignment | Functional domain annotation | …heidelberg.de/
TIGRFAM | HMM | J. Craig Venter Inst. | Sequence alignment | Microbial functional family classification | …s/research/projects/tigrfams/overview/
Panther | HMM | Uni. S. California | Sequence alignment | Family functional classification | …rg/
PIRSF | HMM | PIR, Georgetown, Washington D.C. | Sequence alignment | Functional classification | …du/pirwww/dbinfo/pirsf.shtml
PRINTS | Fingerprints | Uni. of Manchester | Sequence alignment | Family functional classification | …hester.ac.uk/dbbrowser/PRINTS/index.php
PROSITE | Patterns & profiles | SIB | Sequence alignment | Functional annotation | …e/
HAMAP | Profiles | SIB | Sequence alignment | Microbial protein family classification | …/hamap/
ProDom | Sequence clustering | PRABI: Rhône-Alpes Bioinformatics Center | Sequence alignment | Conserved domain prediction | …rodom/current/html/home.php

A closer look at InterPro

Foundations of InterPro
- Integration of signatures
- Manual curation

InterPro Curation Principles
- To represent MDBs' signatures as closely as possible to what they intended
- To reflect biological reality as accurately as possible in the entries we create, by using types, relationships and GO mapping
- To provide as much information to the end user as possible about the signature, by annotating signatures and providing links to other databases

InterPro Entry
- Groups similar signatures together
- Adds extensive annotation
- Linked to other databases
- Structural information and viewers
- Links related signatures

Link related signatures: relationships
1) Parent - Child (a child is a subgroup of more closely related proteins)
Applies to domains and families.
Example: the Pfam signature "Protein kinase" (100 proteins) is the parent. The SMART signature "Serine kinase" (75 proteins) and the PROSITE signature "Tyrosine kinase" (25 proteins) are its children, and the two children have no proteins in common.
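The parent-child idea can be expressed as a comparison of the protein sets that two signatures match. The sketch below is only an illustration under that assumption: the slide does not describe how curators actually decide relationships, the `relationship` function is hypothetical, and the protein identifiers are made up to reproduce the 100/75/25 counts of the example.

```python
def relationship(a: set, b: set) -> str:
    """Describe how the protein set matched by signature a relates to that of signature b."""
    if a == b:
        return "equivalent"
    if a < b:                 # every protein matched by a is also matched by b
        return "child"
    if a > b:                 # a matches everything b does, and more
        return "parent"
    if a.isdisjoint(b):
        return "no proteins in common"
    return "overlapping"


# Made-up identifiers reproducing the counts in the kinase example.
pfam_protein_kinase = {f"P{i:03d}" for i in range(100)}
smart_serine_kinase = {f"P{i:03d}" for i in range(75)}
prosite_tyrosine_kinase = {f"P{i:03d}" for i in range(75, 100)}

serine_vs_pfam = relationship(smart_serine_kinase, pfam_protein_kinase)
tyrosine_vs_pfam = relationship(prosite_tyrosine_kinase, pfam_protein_kinase)
serine_vs_tyrosine = relationship(smart_serine_kinase, prosite_tyrosine_kinase)
```

With these sets, both kinase subgroups come out as children of the Pfam signature, and the two children share no proteins, matching the example on the slide.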

The InterPro entry types
- Family: proteins share a common evolutionary origin, as reflected in their related functions, sequences or structure
- Domain: biological units with defined boundaries
- Repeat: short sequences typically repeated within a protein
- Sites: PTM, active site, binding site, conserved site

Searching InterPro
- Protein ID
- Paste in unknown sequence

InterPro Search Results
- Family
- Domains and sites
- Unintegrated signatures
- Structural data
- Link to PDBe

Links to signature databases
Link to InterPro entry

Select member databases

Caveats
InterPro entries are based on signatures supplied to us by our member databases... this means no signature, no entry!
We need your feedback:
- missing/additional references
- reporting problems
- requests

Acknowledgements
InterPro Team: Sarah Hunter, Craig McAnulla, Phil Jones, Siew-Yit Yong, Sebastien Pesseat, Alex Mitchell, Matthew Fraser, Amaia Sangrador, Maxim Scheremetjew