InterPro: An Introduction


InterPro: An Introduction
2nd IMPACT workshop, 5-6 May 2010
European Bioinformatics Institute, Wellcome Trust Genome Campus
2/24/2019

Overview
  What InterPro is
  Where it came from
  What the vision was
  Has it evolved in line with that vision?
  Is it still fit for purpose?

What is InterPro?
According to the User Manual:
“InterPro is an integrated documentation resource for protein families, domains & sites. InterPro combines a number of databases that use different methodologies & a varying degree of biological information on well-characterised proteins to derive protein signatures. By uniting the member databases, InterPro capitalises on their individual strengths, producing a powerful integrated database & diagnostic tool.”
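One way to picture the "integrated" part of that definition is as a single entry that groups signatures contributed by different member databases. The sketch below is illustrative only: the class names, the accessions (`ENTRY_0001`, `SIG_A`, and so on) and the entry name are hypothetical placeholders, not InterPro's actual schema or real accessions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Signature:
    """A diagnostic signature contributed by one member database."""
    member_db: str   # e.g. "PROSITE", "PRINTS", "Pfam"
    accession: str   # hypothetical placeholder, not a real accession
    method: str      # the methodology that database uses


@dataclass
class IntegratedEntry:
    """One integrated entry describing a single biological object."""
    accession: str   # hypothetical placeholder
    name: str
    signatures: list = field(default_factory=list)

    def member_dbs(self):
        # Distinct member databases that contribute to this entry.
        return sorted({s.member_db for s in self.signatures})


# Hypothetical entry uniting three signatures built with different methods.
entry = IntegratedEntry("ENTRY_0001", "Example receptor family")
entry.signatures.append(Signature("PROSITE", "SIG_A", "regular expression"))
entry.signatures.append(Signature("PRINTS", "SIG_B", "fingerprint"))
entry.signatures.append(Signature("Pfam", "SIG_C", "hidden Markov model"))

print(entry.name, "is described by", ", ".join(entry.member_dbs()))
```

The point of the grouping is that a user looks up one entry and sees every method that recognises the same family, domain or site, rather than querying each database separately.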

Where did it come from?
The concept of an integrated protein family database emerged almost 20 years ago!
  at the 1991 BCA spring meeting in Sheffield
  Amos Bairoch had a poster on PROSITE
  I had one on a ‘fingerprint’ database…
We recognised that our approaches were underpinned by similar philosophies
  to provide meaningful biological information
  to provide high-quality manual annotation

Where did it come from?
[Three figure slides; no text survives in the transcript]

Where did it come from?
PROSITE & PRINTS were different but somehow also the same…
  most importantly, they were complementary
In combination, we gain powerful structural & functional insights

Where did it come from?
So where next?
  we had created 30 family fingerprints
  PROSITE documented 375 families & functional sites
  PROSITE was way ahead!
  we were still on the starting blocks…
Nevertheless, we decided to apply for an EU grant to unite the databases
  …seemed like a good idea at the time!

What was the vision?
Naïvely, we wanted to make life easier!
We aimed to
  simplify & rationalise protein family analysis, ensuring that entries & their linked signatures pointed to related information on the same biological object
  centralise & streamline the annotation process
  reduce manual annotation burdens
  facilitate automatic functional annotation of uncharacterised proteins

How has it evolved?
The EU proposal was submitted in 1992 and was promptly declined!
Later, in 1995, the EBI was established at Hinxton
  Visiting Fellowship in 1997, to help integrate my work more closely with that of the EBI
Rolf, Amos & I decided to try again for an EU grant
  by then, Profiles, ProDom & Pfam had also been created
  so it made sense to include them too
With the bigger picture, the grant succeeded - InterPro was born!

How has it evolved?
Release 0.1 beta was made in October 1999
It contained 2,423 entries
  1,370 PROSITE entries
  1,465 Pfam entries
  1,157 PRINTS entries
  241 preliminary profiles
Based on Swiss-Prot 38 & TrEMBL 11
[Diagram labels: PROSITE, ProDom, PRINTS, Profiles, Pfam, InterPro]

How has it evolved?
“Various factors rendered a step-wise approach to the development of InterPro desirable. First, the scale of the task of amalgamating just the first 3 databases was immense. The rational merging of apparently equivalent database entries that in fact simultaneously define a specific family, domains within that family, or even repeats within those domains, presented an enormous challenge.”

How has it evolved?
[Diagram labels: domain, family, super-family, families, sub-families]
Unravelling the biological relationships is vital!

How has it evolved?
Clearly, the task of integration was hard
  understanding the biological relationships being represented within member databases, let alone between them, was proving to be a significant challenge
Rather than making our lives easier, it was probably making them much harder!
  …& that was just with 3 databases!
Today, with 11 sources, life is harder still…

How has it evolved?
Release 26.0, March 2010
It contains 20,329 entries
  1,023 Gene3D entries
  620 HAMAP entries
  2,234 Panther entries
  2,744 PIRSF entries
  1,975 PRINTS entries
  1,291 PROSITE regexs
  836 PROSITE profiles
  11,056 Pfam entries
  803 SMART entries
  1,095 SUPERFAMILY entries
  3,689 TIGRFams
Release 0.1 beta was made in October 1999
It contained 2,423 entries
  1,370 PROSITE entries
  1,465 Pfam entries
  1,157 PRINTS entries
  241 preliminary profiles
Based on Swiss-Prot 38 & TrEMBL 11
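The two sets of release figures can be compared with a little arithmetic. The sketch below copies the counts from the slide and assumes (this is an interpretation, not something the slide states) that each per-database figure counts member-database signatures integrated into InterPro entries, so that the ratio of the two totals gives a rough "signatures per entry" figure. The variable and function names are illustrative.

```python
# Counts copied from the slide; the "signatures" reading is an assumption.
release_0_1_beta = {
    "entries": 2423,
    "signatures": {
        "PROSITE": 1370, "Pfam": 1465, "PRINTS": 1157,
        "Preliminary profiles": 241,
    },
}
release_26_0 = {
    "entries": 20329,
    "signatures": {
        "Gene3D": 1023, "HAMAP": 620, "Panther": 2234, "PIRSF": 2744,
        "PRINTS": 1975, "PROSITE regexs": 1291, "PROSITE profiles": 836,
        "Pfam": 11056, "SMART": 803, "SUPERFAMILY": 1095, "TIGRFams": 3689,
    },
}


def summarise(release):
    """Return (total member-database count, count per InterPro entry)."""
    total = sum(release["signatures"].values())
    return total, total / release["entries"]


# Growth in the number of InterPro entries between the two releases.
growth = release_26_0["entries"] / release_0_1_beta["entries"]

for label, release in [("0.1 beta", release_0_1_beta), ("26.0", release_26_0)]:
    total, per_entry = summarise(release)
    print(f"Release {label}: {total} member-database items, "
          f"{per_entry:.2f} per InterPro entry")
print(f"Entry growth factor: {growth:.2f}")
```

In both releases the member-database total exceeds the number of InterPro entries, which is what one would expect if several signatures are merged into a single entry.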

Is InterPro still fit for purpose?
The database has grown almost 10-fold in ~11 years
Why was it created in the first place?
  to simplify & rationalise protein family analysis, ensuring that entries & their linked signatures pointed to related information on the same biological object
  to centralise & streamline the annotation process & reduce manual annotation burdens
  to facilitate automatic functional annotation of uncharacterised proteins
  to make life easier!!

Is InterPro still fit for purpose?
[Four figure slides; no text survives in the transcript]

Is InterPro still fit for purpose?
Why separate out structurally & functionally relevant information?
Remember this?

What is InterPro?
A reminder:
“InterPro is an integrated documentation resource for protein families, domains & sites. InterPro combines a number of databases that use different methodologies & a varying degree of biological information on well-characterised proteins to derive protein signatures. By uniting the member databases, InterPro capitalises on their individual strengths, producing a powerful integrated database & diagnostic tool.”

Is InterPro still fit for purpose?
Integration = greater than the sum of the parts - a perfect example…
This integrated view is incredibly powerful & informative!

Is InterPro still fit for purpose?
[Two figure slides; no text survives in the transcript]

Is InterPro still fit for purpose?
What does it mean?

Is InterPro still fit for purpose?
They’re not the same!
Let’s see what the alignments actually look like - consider just the first 3 TM domains…
They’re still not the same!

Is InterPro still fit for purpose?
In the process of growing bigger, InterPro has grown massively in complexity
Its internal convolutions now challenge us to ask, “What does it mean?”
  what does it all mean to end users?!
  & what does it all mean to computers?!

Has it evolved in line with its vision?
With IMPACT, yes, InterPro has an opportunity to realise its original vision
  it can rationalise protein family analysis
  it can help to streamline the annotation process
  it can facilitate functional annotation of proteins
  it can make life easier
but it can only do these things if we’re prepared to empathise, collectively, with its growing pains!
That’s why this workshop is important

Is InterPro still fit for purpose?
“There is a tremendous amount of information regarding evolutionary history and biochemical function implicit in each sequence and the number of known sequences is growing explosively. We feel it is important to collect this significant information, correlate it into a unified whole and interpret it.”
  Margaret O. Dayhoff to C. Berkley, February 27th, 1967
That is still InterPro’s unique opportunity!
“To kill an error is as good a service as, and sometimes even better than, the establishing of a new truth or fact.”
  Charles Darwin, 1879
This remains IMPACT’s imperative!

A workshop, 5-6 May 2010: Day 1
  09.00-09.30 Registration
  09.30-09.35 Domestic
  09.35-10.00 InterPro, an introduction (Terri)
  10.00-10.30 Single-motif signatures: pros, cons & added-value to InterPro (Nicolas)
  10.30-11.00 Multiple-motif signatures: pros, cons & added-value to InterPro (Alex)
  11.00-11.30 Coffee
  11.30-12.00 Domain-based signatures: pros, cons & added-value to InterPro (Rob)
  12.00-12.30 Structural annotation: pros, cons & added-value to InterPro (Corin)
  12.30-13.15 InterPro today [including GO mapping] (Sarah)
  13.15-14.00 Lunch
  14.00-14.30 How InterPro is used to add functional annotation to UniProt (Claire)
  14.30-15.30 Hands-on examples
  15.30-16.00 Coffee
  16.00-17.00 Open discussion/feedback
  19.30- Dinner

A workshop, 5-6 May 2010: Day 2
  09.30-10.00 Issues with integrating different signatures: domains
  10.00-10.30 Issues with integrating different signatures: families and subfamilies
  10.30-11.00 Meaningful terms to group signatures and name entries
  11.00-11.30 Coffee
  11.30-12.00 Unexpected sequences in match lists & how to reconcile them
  12.00-12.30 Improving InterPro’s interface to better visualise, integrate & maintain data
  12.30-13.00 Open discussions
  13.00-13.45 Lunch
  13.45-???
    Format/outline/organisation of November outreach event
    Future funding
    Reviewer feedback
    Review of EoY deliverables – status report & action plan
    AOB