Published by Ernest O’Connor. Modified over 8 years ago.
1
Protein function and classification www.ebi.ac.uk/interpro Hsin-Yu Chang www.ebi.ac.uk
2
Greider and Blackburn discovered telomerase in 1984 and were awarded the Nobel Prize in 2009. Which model organism did they use for this study?
1. Tetrahymena
2. Saccharomyces cerevisiae
3. Mouse
4. Human
3
A single Tetrahymena cell has 40,000 telomeres, whereas a human cell has only 92.
1985: Discovery of telomerase (Greider and Blackburn)
1989: Telomere hypothesis of cell senescence (Szostak)
1995: Clone hTR
1995/1997: Clone hTERT
1997: Telomerase knockout mouse
1998: Ectopic expression of telomerase in normal fibroblasts and epithelial cells bypasses the Hayflick limit
1999/2000…: Telomerase/telomere dysfunctions and cancer
Gilson and Ségal-Bendirdjian, Biochimie, 2010.
4
Therefore, protein classification could help scientists to gain information about protein functions.
5
In the lab, what do we usually do to analyse protein sequences and find out their functions?
6
What I used to do:
- Protein BLAST
- Publications: textbooks or papers
- UniProt
- PDB
- Specialized protein databases such as SGD, the Human Protein Atlas, etc.
7
BLAST it?
Advantages:
- Relatively fast
- User friendly
- Very good at recognising similarity between closely related sequences
Drawbacks:
- Sometimes struggles with multi-domain proteins
- Less useful for weakly similar sequences (e.g., divergent homologues)
8
Using BLAST to find clues to protein function: when it goes well
9
Pairwise alignment of two proteins: CD4 from two closely-related species
10
Using BLAST to find clues to protein function: when it does not give you much information
12
Because BLAST performs local pairwise alignment, it cannot encode the information found in a multiple sequence alignment, which shows you the conserved sites.
13
60S acidic ribosomal protein P0: multiple sequence alignment
Using pairwise alignment could miss conserved residues.
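The point about conserved residues can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy alignment below is hypothetical (it is not the P0 alignment from the slide), and the function names are made up for this example. It reports which alignment columns carry the same residue in every sequence, which is information that a single pairwise comparison cannot provide.

```python
from collections import Counter

def column_conservation(alignment):
    """For each column, return (most common residue, fraction of sequences with it)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]  # ignore gap characters
        if not residues:
            result.append(("-", 0.0))
            continue
        residue, count = Counter(residues).most_common(1)[0]
        result.append((residue, count / len(column)))
    return result

def conserved_sites(alignment, threshold=1.0):
    """Return 0-based column indices whose top residue reaches the threshold."""
    return [i for i, (_, fraction) in enumerate(column_conservation(alignment))
            if fraction >= threshold]

# Hypothetical toy alignment, for illustration only.
toy_alignment = [
    "MKVLG-AT",
    "MRVIGSAT",
    "MKVVGTAS",
    "MQVLG-AT",
]

print(conserved_sites(toy_alignment))        # fully conserved columns
print(conserved_sites(toy_alignment, 0.75))  # columns conserved in at least 75% of sequences
```

Lowering the threshold admits columns that are conserved in most, but not all, sequences.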
14
An alternative approach: protein signature search
- Model the pattern of conserved amino acids at specific positions within a multiple sequence alignment
- Use these models to infer relationships with the characterised sequences (from which the alignment was constructed)
- This is the approach taken by protein signature databases
15
Three different protein signature approaches
- Patterns: single motif methods
- Fingerprints: multiple motif methods
- Profiles & HMMs (hidden Markov models): full alignment methods
16
Patterns
Sequence alignment → Motif → Pattern signature
Pattern sequences: ALVKLISG, AIVHESAT, CHVRDLSC, CPVESTIS
Regular expression: [AC]-x-V-x(4)-{ED}
Pattern signature: PS00000
Patterns are usually directed against functional sequence features such as active sites, binding sites, etc.
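As a rough illustration of how such a pattern behaves as a regular expression, the sketch below translates a small subset of the PROSITE-style syntax into a Python regex and checks it against the four pattern sequences from the slide. The function name `prosite_to_regex` is hypothetical, and the translation covers only the elements seen in this deck (`x`, `[...]`, `{...}`, repeat counts, and the `<` / `>` anchors); it is not the scanning software used by PROSITE.

```python
import re

def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    pattern = pattern.strip()
    anchored_start = pattern.startswith("<")  # '<' anchors to the N-terminus
    anchored_end = pattern.endswith(">")      # '>' anchors to the C-terminus
    pattern = pattern.strip("<>")
    parts = []
    for element in pattern.split("-"):
        # Split an element such as x(2,4) into its core and optional repeat count.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element.strip())
        if match is None:
            raise ValueError(f"cannot parse pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":                                   # any residue
            regex = "."
        elif core.startswith("{") and core.endswith("}"):  # forbidden residues
            regex = "[^" + core[1:-1] + "]"
        else:                                             # literal residue or [allowed set]
            regex = core
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return ("^" if anchored_start else "") + "".join(parts) + ("$" if anchored_end else "")

motif = re.compile(prosite_to_regex("[AC]-x-V-x(4)-{ED}"))
for seq in ["ALVKLISG", "AIVHESAT", "CHVRDLSC", "CPVESTIS"]:
    print(seq, bool(motif.search(seq)))

# An anchored pattern only matches at the start of the sequence.
anchored = re.compile(prosite_to_regex("<M-R-[DE]-x(2,4)-[ALT]-{AM}"))
```

A sequence ending in a forbidden residue, such as ALVKLISE, fails to match, which is the strictness that makes patterns accurate but inflexible.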
17
Patterns
Advantages:
- Can anchor the match to the extremity of a sequence, e.g. <M-R-[DE]-x(2,4)-[ALT]-{AM}
- Strict: a pattern with very little variability and forbidden residues can produce highly accurate matches
Drawbacks:
- Simple but less flexible
18
Fingerprints: a multiple motif approach
Sequence alignment
Define motifs: Motif 1, Motif 2, Motif 3
Motif sequences
Weight matrices
Fingerprint signature: PR00000
19
The significance of motif context (order, interval)
- Identify small conserved regions in proteins
- Several motifs characterise a family
- Offer improved diagnostic reliability over single motifs by virtue of the biological context provided by motif neighbours
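The idea that several motifs, found in the right order, are more diagnostic than any single motif can be sketched as follows. This is a simplified illustration, not the PRINTS scanning algorithm: each motif is reduced to a plain residue-frequency matrix standing in for a weight matrix, the search is greedy, the interval between motifs is not modelled, and all motif sequences, names and the score cut-off are hypothetical.

```python
from collections import Counter

def build_matrix(motif_seqs):
    """Per-position residue frequencies for equal-length motif sequences."""
    n = len(motif_seqs)
    return [{res: count / n for res, count in Counter(column).items()}
            for column in zip(*motif_seqs)]

def best_hit(matrix, sequence, start=0):
    """Best-scoring window at or after `start`; returns (score, position)."""
    width = len(matrix)
    best = (0.0, -1)
    for pos in range(start, len(sequence) - width + 1):
        window = sequence[pos:pos + width]
        score = sum(col.get(res, 0.0) for col, res in zip(matrix, window)) / width
        if score > best[0]:
            best = (score, pos)
    return best

def match_fingerprint(matrices, sequence, min_score=0.6):
    """Return motif start positions if every motif is found in order, else None."""
    positions = []
    start = 0
    for matrix in matrices:
        score, pos = best_hit(matrix, sequence, start)
        if pos < 0 or score < min_score:
            return None
        positions.append(pos)
        start = pos + len(matrix)  # the next motif must lie downstream of this one
    return positions

# Hypothetical motifs and sequences, for illustration only.
fingerprint = [
    build_matrix(["GDSG", "GDSG", "GDAG"]),
    build_matrix(["HWKL", "HWRL", "HWKL"]),
    build_matrix(["CTPE", "CSPE", "CTPE"]),
]

print(match_fingerprint(fingerprint, "MAGDSGTTTHWKLAAACTPEK"))  # motifs in order
print(match_fingerprint(fingerprint, "MACTPEGDSGTTTHWKLK"))     # motif 3 out of order
```

In the second sequence all three motifs are present, but motif 3 precedes motif 1, so the ordered search rejects it.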
20
Fingerprints
- Good at modelling the often small differences between closely related proteins
- Distinguish individual subfamilies within protein families, allowing functional characterisation of sequences at a high level of specificity
21
Profiles & HMMs
Sequence alignment
Define coverage: entire domain or whole protein
Use entire alignment of domain or protein family
Build model
Profile or HMM signature
22
Profiles
- Start with a multiple sequence alignment
- Amino acids at each position in the alignment are scored according to the frequency with which they occur
- Scores are weighted according to evolutionary distance using a BLOSUM matrix
- Good at identifying homologues
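The frequency-based scoring step can be sketched as a small position-specific scoring matrix. This is a minimal sketch under explicit assumptions: it uses a uniform background and a flat pseudocount in place of the BLOSUM-based weighting mentioned above, it ignores gaps, and the function names are hypothetical. The alignment is the set of four pattern sequences shown earlier in the deck.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(alignment, pseudocount=1.0):
    """Log-odds score for every amino acid at every alignment column.

    Assumption: uniform background frequencies and a flat pseudocount,
    used here instead of substitution-matrix weighting.
    """
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for column in zip(*alignment):
        counts = Counter(r for r in column if r in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile

def score_sequence(profile, sequence):
    """Sum the per-position scores of an ungapped sequence of profile length."""
    if len(sequence) != len(profile):
        raise ValueError("sequence length must equal profile length")
    return sum(column[res] for column, res in zip(profile, sequence))

alignment = ["ALVKLISG", "AIVHESAT", "CHVRDLSC", "CPVESTIS"]
profile = build_profile(alignment)

print(score_sequence(profile, "ALVKLISG"))  # a member of the alignment
print(score_sequence(profile, "WWWWWWWW"))  # an unrelated sequence
```

Residues seen often in a column score positively and unseen residues score negatively, so a family member scores well above an unrelated sequence.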
23
HMMs
- Start with a multiple sequence alignment
- Amino acid frequency at each position in the alignment and their transition probabilities are encoded
- Insertions and deletions are also modelled
- Very good at identifying evolutionarily distant homologues
- Can model very divergent regions of alignment
24
Three different protein signature approaches
- Patterns: single motif methods
- Fingerprints: multiple motif methods
- Profiles & HMMs (hidden Markov models): full alignment methods
25
www.ebi.ac.uk/interpro
26
InterPro The aim of InterPro
27
What is InterPro?
- InterPro is an integrated sequence analysis resource
- It combines predictive models (known as signatures) from different databases to provide functional analysis of protein sequences by classifying them into families and predicting domains and important sites
28
Facts about InterPro
- First release in 1999
- 11 partner databases
- Forms part of the automated system that adds annotation to UniProtKB/TrEMBL
- Provides matches to over 80% of UniProtKB
- Source of >60 million Gene Ontology (GO) mappings to >17 million distinct UniProtKB sequences
- 50,000 unique visitors to the website per month
- >2 million sequences searched online per month, plus offline searches with the downloadable version of the software
29
Structural domains
Functional annotation of families/domains
Protein features (sites)
Hidden Markov Models
Fingerprints
Profiles
Patterns
HAMAP
30
InterPro signature integration process
- Signatures are provided by member databases
- They are scanned against the UniProt database to see which sequences they match
- Curators manually inspect the matches before integrating the signatures into InterPro
- Signatures representing the same entity are integrated together
- Relationships between entries are traced, where possible
- Curators add literature-referenced abstracts, cross-references to other databases, and GO terms
31
http://www.ebi.ac.uk/interpro/
32
Search using protein sequences
33
Family
34
Type
35
InterPro entry types
- Family: proteins that share a common evolutionary origin, as reflected in their related functions, sequences or structure
- Domain: distinct functional, structural or sequence units that may exist in a variety of biological contexts
- Repeats: short sequences typically repeated within a protein
- Sites: PTM, active site, binding site, conserved site
36
Type Name Identifier Contributing signatures Description GO terms References
41
Type Name Identifier Contributing signatures Description References Relationships
42
InterPro family and domain relationships
43
Family relationships in InterPro
Interleukin-15/Interleukin-21 family
- Interleukin-15 avian
- Interleukin-15 fish
- Interleukin-15 mammal
44
Relationships
45
InterPro relationships: domains
Protein kinase-like domain
  Protein kinase catalytic domain
    Serine/threonine kinase catalytic domain
    Tyrosine kinase catalytic domain
46
A brief diversion into the Gene Ontology...
48
Gene Ontology
- Unify the representation of gene and gene product attributes across species
- Allow cross-species and/or cross-database comparisons
49
The Gene Ontology
- A way to capture biological knowledge in a written and computable form
- A set of concepts and their relationships to each other, arranged as a hierarchy from less specific concepts to more specific concepts
www.ebi.ac.uk/QuickGO
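What "computable" means for a hierarchy of concepts can be shown with a small sketch. The parent links below are a deliberately simplified, hypothetical chain built around two term names used in this deck (protein kinase activity and insulin receptor activity); the real ontology is a larger graph in which a term can have several parents, and the function name is made up for this example.

```python
# Simplified, hypothetical parent links: each term points to its less specific parents.
PARENTS = {
    "insulin receptor activity": ["protein tyrosine kinase activity"],
    "protein tyrosine kinase activity": ["protein kinase activity"],
    "protein kinase activity": ["kinase activity"],
    "kinase activity": ["catalytic activity"],
    "catalytic activity": ["molecular function"],
    "molecular function": [],
}

def ancestors(term, parents):
    """Collect every less specific concept reachable from `term`."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen

# A protein annotated with a specific term is implicitly annotated
# with all of that term's less specific ancestors.
print(sorted(ancestors("insulin receptor activity", PARENTS)))
```

Because the relationships are explicit, a search for all protein kinases can also retrieve proteins annotated only with the more specific term.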
50
The Concepts in GO
1. Molecular Function: e.g. protein kinase activity, insulin receptor activity
2. Biological Process: e.g. cell cycle, microtubule cytoskeleton organisation
3. Cellular Component
51
GO:0006955 immune response
GO:0016020 membrane
52
Summary
- InterPro is a sequence analysis resource that classifies sequences into protein families and predicts important domains and sites
- It uses protein signatures based on different methodologies from different member databases
- Its member databases all have their particular niche or focus... but InterPro offers a combination of all their areas of expertise!
53
Why use InterPro?
- Large amounts of manually curated data
  - 35,634 signatures integrated into 25,214 entries
  - Cites 38,877 PubMed publications
- Large coverage of protein sequence space
- Regularly updated
  - ~8 week release schedule
  - New signatures added
  - Scanned against latest version of UniProtKB
54
Caution
- InterPro is a predictive protein signature database: results are predictions, and should be treated as such
- InterPro entries are based on signatures supplied to us by our member databases... this means no signature, no entry!
And one more thing...
We need your feedback!
- missing/additional references
- reporting problems
- requests
EBI support page
55
The InterPro Team: Amaia Sangrador, Craig McAnulla, Matthew Fraser, Maxim Scheremetjew, Siew-Yit Yong, Alex Mitchell, Sebastien Pesseat, Sarah Hunter, Gift Nuka, Hsin-Yu Chang, Louise Daugherty
56
Database | Basis | Institution | Built from | Focus | URL
Pfam | HMM | Sanger Institute | Sequence alignment | Family & Domain based on conserved sequence | http://pfam.sanger.ac.uk/
Gene3D | HMM | UCL | Structure alignment | Structural Domain | http://gene3d.biochem.ucl.ac.uk/Gene3D/
Superfamily | HMM | Uni. of Bristol | Structure alignment | Evolutionary domain relationships | http://supfam.cs.bris.ac.uk/SUPERFAMILY/
SMART | HMM | EMBL Heidelberg | Sequence alignment | Functional domain annotation | http://smart.embl-heidelberg.de/
TIGRFAM | HMM | J. Craig Venter Inst. | Sequence alignment | Microbial Functional Family Classification | http://www.jcvi.org/cms/research/projects/tigrfams/overview/
Panther | HMM | Uni. S. California | Sequence alignment | Family functional classification | http://www.pantherdb.org/
PIRSF | HMM | PIR, Georgetown, Washington D.C. | Sequence alignment | Functional classification | http://pir.georgetown.edu/pirwww/dbinfo/pirsf.shtml
PRINTS | Fingerprints | Uni. of Manchester | Sequence alignment | Family functional classification | http://www.bioinf.manchester.ac.uk/dbbrowser/PRINTS/index.php
PROSITE | Patterns & Profiles | SIB | Sequence alignment | Functional annotation | http://expasy.org/prosite/
HAMAP | Profiles | SIB | Sequence alignment | Microbial protein family classification | http://expasy.org/sprot/hamap/
ProDom | Sequence clustering | PRABI: Rhône-Alpes Bioinformatics Center | Sequence alignment | Conserved domain prediction | http://prodom.prabi.fr/prodom/current/html/home.php
57
Thank you! www.ebi.ac.uk Twitter: @emblebi Facebook: EMBLEBI YouTube: EMBLMedia