Duncan Legge EMBL-EBI
Introduction to InterPro
Introduction to Protein Signatures & InterPro
Protein Signatures
- Protein signature = an amino acid sequence (not necessarily consecutive) associated with a protein characteristic.
Foundations of InterPro
- Integration of signatures
- Manual curation
InterPro Consortium
- Consortium of 11 major signature databases
What value are signatures?
- Better at finding proteins with common function
- Find more distant homologues than BLAST
What value are signatures?
- Better at finding proteins with common function
- Classification of proteins: associate proteins that share function, domains, sequence or structure
What value are signatures?
- Better at finding proteins with common function
- Classification of proteins
- Annotation of protein sequences: define conserved regions of a protein, e.g. location and type of domains, key structural or functional sites
Protein Signature Methods
How are protein signatures made?
- Protein family/domain
- Multiple sequence alignment:
  ITWKGPVCGLDGKTYRNECALL
  AVPRSPVCGSDDVTYANECELK
  SVPRSPVCGSDGVTYGTECDLK
  HPPPGPVCGTDGLTYDNRCELR
- Build model
- Search
- Significant matches: E-value 1e-49, E-value 3e-42, E-value 5e-39, E-value 6e-10
- Refine
- Protein signature
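The "build model, then search" step can be illustrated with a toy position-specific scoring profile built from the four aligned sequences on the slide. This is only a minimal sketch: the uniform background, the pseudocount and the function names `build_profile` and `score` are assumptions made for illustration, and the member databases use their own, more sophisticated model-building and scoring methods.

```python
import math
from collections import Counter

# The four aligned sequences shown on the slide (all the same length, no gaps).
ALIGNMENT = [
    "ITWKGPVCGLDGKTYRNECALL",
    "AVPRSPVCGSDDVTYANECELK",
    "SVPRSPVCGSDGVTYGTECDLK",
    "HPPPGPVCGTDGLTYDNRCELR",
]
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Return one dict of log-odds scores per alignment column.

    Assumes a uniform background (1/20 per amino acid) and a flat
    pseudocount; both are simplifications chosen for this sketch.
    """
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score(profile, sequence):
    """Sum the column scores for an ungapped sequence of the same length."""
    if len(sequence) != len(profile):
        raise ValueError("sequence length must equal the number of columns")
    return sum(column[aa] for column, aa in zip(profile, sequence))


profile = build_profile(ALIGNMENT)
member = score(profile, ALIGNMENT[0])           # a sequence from the family
unrelated = score(profile, "A" * len(profile))  # an artificial poly-alanine
print(f"family member: {member:.2f}, poly-alanine: {unrelated:.2f}")
```

A sequence from the alignment scores higher than the artificial one because the conserved columns (for example the cysteine at column 8) contribute large positive terms. Real searches convert such scores into E-values, which this sketch does not attempt.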
Types of protein signatures (sequence based)
- Multiple protein alignment
Types of protein signatures (sequence based)
- Single motif methods: regular expression patterns
  C - C - {P} - x(2) - C - [STDNEKPI] - C
Types of protein signatures (sequence based)
- Single motif methods: regular expression patterns
  C - C - {P} - x(2) - C - [STDNEKPI] - C
- Reading the pattern:
  - plain letter (e.g. C) = must be this amino acid
  - { } = cannot be any of these
  - [ ] = any of these
  - x = any amino acid
  - ( ) = number of amino acids
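The pattern notation maps closely onto ordinary regular expressions. The sketch below translates the slide's example into a Python `re` pattern. The function name `prosite_to_regex` is hypothetical, and only the elements shown on the slide are handled (plain residues, `{ }`, `[ ]`, `x` and repeat counts), not the full pattern syntax.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a pattern such as 'C - {P} - x(2)' into a Python regex string."""
    parts = []
    for element in pattern.replace(" ", "").split("-"):
        # Separate an optional repeat count, e.g. x(2) or x(2,4), from the residue part.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        residue, low, high = match.groups()
        if residue == "x":
            core = "."                           # x = any amino acid
        elif residue.startswith("{") and residue.endswith("}"):
            core = "[^" + residue[1:-1] + "]"    # { } = cannot be any of these
        else:
            core = residue                       # plain letter or [ ] = any of these
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


PATTERN = "C - C - {P} - x(2) - C - [STDNEKPI] - C"
regex = prosite_to_regex(PATTERN)
print(regex)

# Two made-up test sequences: the second has P in the forbidden position.
print(bool(re.search(regex, "AACCAGGCSCAA")))
print(bool(re.search(regex, "AACCPGGCSCAA")))
```

Using `re.search` finds the motif anywhere in a sequence, which is how a single-motif signature is normally applied.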
Types of protein signatures (sequence based)
- Single motif methods: regular expression patterns
- Multiple motif methods: identity matrices, fingerprints
Types of protein signatures (sequence based)
- Single motif methods: regular expression patterns
- Multiple motif methods: identity matrices, fingerprints
- Full domain alignment methods:
  - Profiles (profile library)
  - Hidden Markov Models: mathematical model of amino acid probability
[Figure: hidden Markov model diagram with state labels M1 to M4, I1 to I3, D2 and D3]
Contributing member databases
- Methods: hidden Markov models, fingerprints, profiles, patterns, sequence clusters
- Focus: structural domains, functional annotation of families/domains, prediction of conserved domains, protein features (active sites…)
- Models built on either sequence or structural alignments
- Each member database (MDB) has its own focus
Database | Basis | Institution | Built from | Focus | URL (truncated in source)
Pfam | HMM | Sanger Institute | Sequence alignment | Family & domain based on conserved sequence | …k/
Gene3D | HMM | UCL | Structure alignment | Structural domain | …ucl.ac.uk/Gene3D/
Superfamily | HMM | Uni. of Bristol | Structure alignment | Evolutionary domain relationships |
SMART | HMM | EMBL Heidelberg | Sequence alignment | Functional domain annotation | …heidelberg.de/
TIGRFAM | HMM | J. Craig Venter Inst. | Sequence alignment | Microbial functional family classification | …s/research/projects/tigrfams/overview/
Panther | HMM | Uni. S. California | Sequence alignment | Family functional classification | …rg/
PIRSF | HMM | PIR, Georgetown, Washington D.C. | Sequence alignment | Functional classification | …du/pirwww/dbinfo/pirsf.shtml
PRINTS | Fingerprints | Uni. of Manchester | Sequence alignment | Family functional classification | …hester.ac.uk/dbbrowser/PRINTS/index.php
PROSITE | Patterns & profiles | SIB | Sequence alignment | Functional annotation | …e/
HAMAP | Profiles | SIB | Sequence alignment | Microbial protein family classification | …/hamap/
ProDom | Sequence clustering | PRABI: Rhône-Alpes Bioinformatics Center | Sequence alignment | Conserved domain prediction | …rodom/current/html/home.php
A closer look at InterPro
Foundations of InterPro
- Integration of signatures
- Manual curation
InterPro Curation Principles
- To represent MDB signatures as closely as possible to what the MDBs intended
- To reflect biological reality as accurately as possible in the entries we create, by using types, relationships and GO mapping
- To provide as much information as possible to the end user about each signature, by annotating signatures and providing links to other databases
InterPro Entry
- Groups similar signatures together
- Adds extensive annotation
- Linked to other databases
- Structural information and viewers
- Links related signatures
Link related signatures - relationships
1) Parent - Child (subgroup of more closely related proteins)
- Parent: Protein kinase
- Children: Serine kinase, Tyrosine kinase
- * No proteins in common
- Applies to domains and families
[Figure: Protein kinase, Serine kinase and Tyrosine kinase signatures from PFAM, SMART and PROSITE, labelled with the values (100), (75) and (25)]
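One way to picture the parent/child idea is as set containment over the proteins each signature matches: a child matches a subgroup of the parent's proteins, while sibling children may have no proteins in common. The sketch below is an illustration only. The protein identifiers are made up, the function name `relate` is hypothetical, and in InterPro the relationships are assigned by curators rather than by a rule this simple.

```python
def relate(matches_a: set, matches_b: set) -> str:
    """Describe how signature A relates to signature B, judged only by matched proteins."""
    if matches_a == matches_b:
        return "same proteins"
    if matches_a < matches_b:       # A matches a strict subgroup of B's proteins
        return "child of"
    if matches_a > matches_b:       # A matches everything B does, and more
        return "parent of"
    if matches_a.isdisjoint(matches_b):
        return "no proteins in common"
    return "partial overlap"


# Made-up protein identifiers, for illustration only.
protein_kinase = {"P1", "P2", "P3", "P4"}
serine_kinase = {"P1", "P2", "P3"}
tyrosine_kinase = {"P4"}

print("serine kinase vs protein kinase:", relate(serine_kinase, protein_kinase))
print("protein kinase vs tyrosine kinase:", relate(protein_kinase, tyrosine_kinase))
print("serine kinase vs tyrosine kinase:", relate(serine_kinase, tyrosine_kinase))
```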
The InterPro entry types
- Family: proteins that share a common evolutionary origin, as reflected in their related functions, sequences or structure
- Domain: biological units with defined boundaries
- Repeat: short sequences typically repeated within a protein
- Site: PTM (post-translational modification), active site, binding site, conserved site
Searching InterPro
- Protein ID
- Paste in unknown sequence
InterPro Search Results
- Family
- Domains and sites
- Unintegrated signatures
- Structural data
- Link to PDBe
- Links to signature databases
- Link to InterPro entry
- Select member databases
Caveats
- InterPro entries are based on signatures supplied to us by our member databases. This means no signature, no entry!
- We need your feedback: missing/additional references, reporting problems, requests
Acknowledgements
InterPro Team: Sarah Hunter, Craig McAnulla, Phil Jones, Siew-Yit Yong, Sebastien Pesseat, Alex Mitchell, Matthew Fraser, Amaia Sangrador, Maxim Scheremetjew