Understanding proteins: resources for identification and annotation.


1 Understanding proteins: resources for identification and annotation

2 The Gene Ontology: Annotating protein function, role and localization Contact: Jane Lomax Coordinator, GO Editorial Office EBI-EMBL jane@ebi.ac.uk

3 What is an ontology?

4 A definition... “A controlled representation of ideas, concepts or events in a given domain and the relationships between them.” Example of a familiar hierarchy: Collectibles & art → Stamps → UK (Great Britain) → Victoria → 1884 GREAT BRITAIN 10S SCOTT ($11,999.99).

5 Why do we need ontologies? Help with data retrieval: an ontology allows grouping of annotations. Example: 20 annotations to brain, 15 to hindbrain and 10 to rhombomere; querying ‘brain’ without the ontology returns 20, with the ontology 45 (adapted from Barry Smith: http://ontology.buffalo.edu/smith/BioOntology_Course.html). Make data (re-)usable through standards: common structure and terminology (controlled vocabulary); avoid redundancies (single data source); allow common tools, techniques, training, validation...
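The grouping effect on slide 5 can be reproduced with a few lines of code. This is a minimal sketch assuming a toy ontology stored as a child-to-parents dictionary (the structure and the names `PARENTS`, `DIRECT_ANNOTATIONS`, `descendants` and `query` are hypothetical); only the three terms and their annotation counts come from the slide. Querying a term "with the ontology" means also counting annotations made to every term below it.

```python
# Hypothetical toy ontology: each term maps to its parent terms.
PARENTS = {
    "brain": [],
    "hindbrain": ["brain"],       # hindbrain part_of brain
    "rhombomere": ["hindbrain"],  # rhombomere part_of hindbrain
}

# Annotations made directly to each term (the counts used on the slide).
DIRECT_ANNOTATIONS = {"brain": 20, "hindbrain": 15, "rhombomere": 10}


def descendants(term, parents):
    """Return `term` plus every term below it in the ontology."""
    children = {}
    for child, ps in parents.items():
        for p in ps:
            children.setdefault(p, []).append(child)
    found, stack = {term}, [term]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def query(term, use_ontology=True):
    """Count annotations for `term`, optionally including its descendants."""
    terms = descendants(term, PARENTS) if use_ontology else {term}
    return sum(DIRECT_ANNOTATIONS.get(t, 0) for t in terms)


print(query("brain", use_ontology=False))  # 20
print(query("brain"))                      # 45
```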

6 Gene Ontology. What is the Gene Ontology? An organized, controlled vocabulary of terms that describe gene product characteristics (http://geneontology.org/). It represents gene product properties, not gene products themselves. Three branches (domains): cellular component, molecular function and biological process. Species-independent (with taxonomic restrictions). Represents physiological processes, up to the level of the cell.

7 How does GO work? The Gene Ontology is like a dictionary. Term: transcription initiation. Definition: Processes involved in the assembly of the RNA polymerase complex at the promoter region of a DNA template resulting in the subsequent synthesis of RNA from that promoter. ID: GO:0006352.
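GO terms are distributed in files such as the OBO flat-file format, where each term is a `[Term]` stanza of `tag: value` lines. The sketch below parses one such stanza. The stanza itself is illustrative, built from the slide's term, definition and ID (the `namespace` line is an assumption, and the current go.obo release may word the name and definition differently), and `parse_term` and `definition_text` are hypothetical helpers, not part of any GO library.

```python
# Illustrative stanza assembled from the slide; not copied from a GO release.
STANZA = '''[Term]
id: GO:0006352
name: transcription initiation
namespace: biological_process
def: "Processes involved in the assembly of the RNA polymerase complex at the promoter region of a DNA template resulting in the subsequent synthesis of RNA from that promoter." []
'''


def parse_term(text):
    """Parse one OBO [Term] stanza into a dict of tag -> list of values."""
    term = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("["):
            continue  # skip blank lines and the stanza header
        tag, _, value = line.partition(": ")
        term.setdefault(tag, []).append(value)
    return term


def definition_text(term):
    """Return the quoted part of the def: tag, dropping the trailing [xref] list."""
    raw = term["def"][0]
    return raw[1:raw.rindex('"')]


term = parse_term(STANZA)
print(term["id"][0], term["name"][0])
print(definition_text(term))
```

Tags are stored as lists because OBO allows repeated tags (for example several `is_a:` lines in one stanza).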

8 GO tree and annotations (figure from Clark et al., 2005), with terms linked by is_a and part_of relationships.
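Because terms are linked by is_a and part_of, an annotation to a specific term also applies to that term's ancestors (GO's "true path rule"). A minimal sketch, assuming a simplified, hypothetical fragment stored as labelled edges (it is not the real GO graph), that walks upward along chosen relation types:

```python
# Simplified, hypothetical fragment: term -> list of (relation, parent) edges.
EDGES = {
    "nucleus": [("is_a", "organelle"), ("part_of", "cell")],
    "organelle": [("is_a", "cellular component")],
    "cell": [("is_a", "cellular component")],
    "cellular component": [],
}


def ancestors(term, edges, relations=("is_a", "part_of")):
    """All terms reachable upward from `term` along the given relation types."""
    seen, stack = set(), [term]
    while stack:
        for rel, parent in edges.get(stack.pop(), []):
            if rel in relations and parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# A protein annotated to "nucleus" is implicitly annotated to all of these.
print(sorted(ancestors("nucleus", EDGES)))
# Following is_a edges only gives a smaller set.
print(sorted(ancestors("nucleus", EDGES, relations=("is_a",))))
```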

9 An annotation example: GO terms for Caspase 9

10 Which processes are up- or down-regulated? (Figure labels: attacked, time control, puparial adhesion, molting cycle, hemocyanin, defense response, immune response, response to stimulus, Toll-regulated genes, JAK-STAT-regulated genes, amino acid catabolism, lipid metabolism, peptidase activity, protein catabolism.) Bregje Wertheim at the Centre for Evolutionary Genomics, Department of Biology, UCL and Eugene Schuster Group, EBI.
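Questions like "which processes are up- or down-regulated?" are commonly answered with a GO term enrichment test. The slide does not say which statistic was used; the sketch below shows one widely used choice, a one-sided hypergeometric test, with hypothetical parameter names. It asks how surprising it is to see at least `k` genes annotated to a term in a list of `n` regulated genes, given `K` annotated genes among `N` in the background.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric P(X >= k) for GO term over-representation.

    N: annotated genes in the background (e.g. all genes measured)
    K: background genes annotated to the GO term
    n: genes in the up- or down-regulated list
    k: genes in that list annotated to the GO term
    """
    total = comb(N, n)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Hypothetical numbers: 8 of 50 regulated genes carry a term that 40 of 5000 genes carry.
print(enrichment_p_value(k=8, n=50, K=40, N=5000))
```

Because many GO terms are tested at once, real analyses also correct these p-values for multiple testing.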

11 QuickGO: browsing GO. Term definition: http://www.ebi.ac.uk/QuickGO/

12 QuickGO: browsing GO. Term relationships (ancestors)

13 QuickGO: browsing GO. Term relationships (children)

14 QuickGO: browsing GO. Proteins annotated to a term

15 Annotation and ontology files: www.geneontology.org/GO.downloads.shtml. Ontology files hold the ontology terms and structure; they are species-independent, and GO slims are also available. Annotation files hold lists of terms and the proteins annotated with them; you can get species-specific files or the whole annotation set.
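Annotation files are commonly distributed in GAF (GO Annotation File) format: tab-separated lines with 17 columns, header lines starting with `!`. The sketch below groups annotations per protein and skips negative (`NOT`) annotations. The column indices follow GAF 2.x as I understand it; the helper names and all identifiers in the sample lines (`EXAMPLE1`, `GO:1111111`, `PMID:0`, ...) are made-up placeholders.

```python
# 0-based column positions in the 17-column GAF 2.x format (columns used here).
DB, OBJECT_ID, QUALIFIER, GO_ID, EVIDENCE, ASPECT = 0, 1, 3, 4, 6, 8


def read_gaf(lines):
    """Map (DB, object ID) -> set of (GO ID, aspect, evidence code).

    Lines starting with '!' are headers. Annotations whose qualifier contains
    NOT (the product is explicitly not associated with the term) are skipped.
    """
    annotations = {}
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if "NOT" in cols[QUALIFIER].split("|"):
            continue
        key = (cols[DB], cols[OBJECT_ID])
        annotations.setdefault(key, set()).add((cols[GO_ID], cols[ASPECT], cols[EVIDENCE]))
    return annotations


def gaf_line(obj_id, go_id, evidence, aspect, qualifier=""):
    """Build a made-up 17-column GAF line (placeholder IDs, for illustration only)."""
    cols = ["UniProtKB", obj_id, "GENE", qualifier, go_id, "PMID:0", evidence, "",
            aspect, "", "", "protein", "taxon:9606", "20240101", "EXAMPLE", "", ""]
    return "\t".join(cols)


lines = [
    "!gaf-version: 2.2",
    gaf_line("EXAMPLE1", "GO:1111111", "IDA", "F"),
    gaf_line("EXAMPLE1", "GO:2222222", "IEA", "P"),
    gaf_line("EXAMPLE2", "GO:1111111", "IDA", "F", qualifier="NOT"),
]
print(read_gaf(lines))
```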

16 More about GO: EBI train online www.ebi.ac.uk/training/online/course/go-quick-tour www.ebi.ac.uk/training/online/course/uniprot-goa-quick-tour

17 Acknowledgements & questions Jane Lomax Coordinator, GO Editorial Office EBI-EMBL jane@ebi.ac.uk

18 UniProt: A repository of annotated protein sequences Contact: Duncan Legge UniProt Content Team EBI-EMBL help@uniprot.org dlegge@ebi.ac.uk

19 Background of UniProt. Since 2002, a merger and collaboration of three databases: Swiss-Prot & TrEMBL, and PIR-PSD. Funded mainly by the NIH (US) to be the highest-quality, most thoroughly annotated protein sequence database.

20 We aim to provide: a high-quality protein sequence database, that is, a non-redundant protein database with maximal coverage, including splice isoforms, disease variants and PTMs (sequence archiving is essential); easy protein identification, through stable identifiers and consistent nomenclature / controlled vocabularies; thorough protein annotation, with detailed information on protein function, biological processes, molecular interactions and pathways, cross-referenced to external sources.
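Stable identifiers are easier to handle when their format can be checked mechanically. The regular expression below is the accession format published in the UniProt help pages (6- or 10-character accessions); the `-N` isoform suffix handling and the helper names are an illustrative sketch, and isoform numbers in the usage are hypothetical.

```python
import re

# UniProtKB accession format as documented by UniProt.
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_uniprot_accession(s):
    """True if the whole string has the shape of a UniProtKB accession."""
    return ACCESSION_RE.fullmatch(s) is not None


def split_isoform(s):
    """Split 'ACCESSION-N' isoform identifiers into (accession, isoform number)."""
    acc, sep, iso = s.partition("-")
    if not is_uniprot_accession(acc) or (sep and not iso.isdigit()):
        raise ValueError(f"not a UniProtKB accession: {s!r}")
    return acc, (int(iso) if iso else None)


print(is_uniprot_accession("P01730"))     # human CD4 accession
print(is_uniprot_accession("CD4_HUMAN"))  # an entry name, not an accession
print(split_isoform("P01730-2"))          # hypothetical isoform number
```

Entry names such as `CD4_HUMAN` are rejected on purpose: they are mnemonic names, whereas the accession is the stable identifier.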

21 The two sides of UniProtKB. UniProtKB/Swiss-Prot: non-redundant, high-quality manual annotation (reviewed); 1 entry per protein. UniProtKB/TrEMBL: redundant, automatically annotated (unreviewed); 1 entry per nucleotide submission.

22 UniProtKB/Swiss-Prot Manually annotated UniProtKB/TrEMBL Computationally annotated
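The reviewed/unreviewed split is visible directly in UniProtKB FASTA headers, which start with `>sp|` for Swiss-Prot and `>tr|` for TrEMBL, followed by the accession and the entry name. A small sketch of reading that field; the function name is hypothetical, the CD4 header is abbreviated, and the TrEMBL header in the test is made up.

```python
SECTIONS = {"sp": "UniProtKB/Swiss-Prot (reviewed)", "tr": "UniProtKB/TrEMBL (unreviewed)"}


def parse_uniprot_header(header):
    """Split a UniProtKB FASTA header '>db|accession|entry_name description'."""
    if not header.startswith(">"):
        raise ValueError("a FASTA header must start with '>'")
    ids, _, description = header[1:].partition(" ")
    db, accession, entry_name = ids.split("|")
    if db not in SECTIONS:
        raise ValueError(f"unknown UniProtKB section code: {db!r}")
    return {
        "section": SECTIONS[db],
        "accession": accession,
        "entry_name": entry_name,
        "description": description,
    }


# Abbreviated header for human CD4 (P01730).
record = parse_uniprot_header(
    ">sp|P01730|CD4_HUMAN T-cell surface glycoprotein CD4 OS=Homo sapiens OX=9606 GN=CD4"
)
print(record["section"], record["accession"], record["entry_name"])
```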

23 Data sources of UniProtKB (figure): UniProt/TrEMBL together with ENA (EMBL) DNA database, mRNA data, patent data, Ensembl, VEGA (Sanger), WormBase, FlyBase, PDB, and submission/peptide data.

24 Curation of a UniProt/Swiss-Prot entry (figure): from UniProt/TrEMBL to UniProt/Swiss-Prot, covering sequence, sequence variants, nomenclature, sequence features, ontologies, literature, annotations and references.

25 UniProt Website www.uniprot.org

26 UniProt layout


28 Annotation comments: FUNCTION, SUBCELLULAR LOCATION, ALTERNATIVE PRODUCTS, TISSUE SPECIFICITY, DEVELOPMENTAL STAGE, INDUCTION, SIMILARITY, CATALYTIC ACTIVITY, COFACTOR, ENZYME REGULATION, BIOPHYSICOCHEMICAL PROPERTIES, PATHWAY, SUBUNIT, INTERACTION, PTM, RNA EDITING, MASS SPECTROMETRY, DOMAIN, POLYMORPHISM, DISRUPTION PHENOTYPE, ALLERGEN, DISEASE, TOXIC DOSE, BIOTECHNOLOGY, PHARMACEUTICAL, MISCELLANEOUS, CAUTION, SEQUENCE CAUTION, WEB RESOURCE
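In the UniProtKB text (flat-file) format these topics appear as `CC   -!- TOPIC: text` lines, with continuation lines indented beneath them and a copyright block at the end. The sketch below collects them as (topic, text) pairs. It assumes that layout; the entry shown is made up, and `parse_cc_comments` is a hypothetical helper rather than part of any UniProt tool.

```python
def parse_cc_comments(lines):
    """Collect '-!- TOPIC: text' comment blocks from the CC lines of an entry.

    Returns a list of (topic, text) pairs in entry order; topics may repeat.
    """
    comments = []
    for line in lines:
        if not line.startswith("CC"):
            continue
        body = line[2:].strip()
        if body.startswith("-----"):
            break  # start of the copyright/licence block, not an annotation
        if body.startswith("-!- "):
            topic, _, text = body[4:].partition(":")
            comments.append((topic.strip(), text.strip()))
        elif comments:
            topic, text = comments[-1]  # continuation of the previous topic
            comments[-1] = (topic, (text + " " + body).strip())
    return comments


# Made-up entry fragment in the flat-file layout.
ENTRY = """\
ID   EXAMPLE_HUMAN           Reviewed;         100 AA.
CC   -!- FUNCTION: Example function text that
CC       continues on a second line.
CC   -!- SUBCELLULAR LOCATION: Cell membrane.
CC   ---------------------------------------------------------------------------
CC   Copyrighted by the UniProt Consortium
"""
for topic, text in parse_cc_comments(ENTRY.splitlines()):
    print(f"{topic}: {text}")
```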

29 Controlled vocabularies are used whenever possible. Evidence tags show the source of each annotation.


31 Proteomes in UniProt. Complete proteomes: complete sets of proteins thought to be expressed by organisms whose genomes have been completely sequenced. Reference proteomes: some complete proteomes have been selected as reference proteome sets. These cover the proteomes of well-studied model organisms and other proteomes of interest for biomedical research.

32 Obtaining Proteomes

33 Help / Feedback. Stuck? Just ask: we have an active help and support team. Feedback: if you find something incorrect, outdated or missing, please tell us. help@uniprot.org

34 Find out more: EBI online courses www.ebi.ac.uk/training/online/course/uniprot-quick-tour/

35 Acknowledgements & questions Duncan Legge UniProt Content Team EBI-EMBL dlegge@ebi.ac.uk

36 InterPro: An integrated protein sequence analysis resource Contact: Amaia Sangrador InterPro curation Team EBI-EMBL interhelp@ebi.ac.uk amaia@ebi.ac.uk

37 What is InterPro? InterPro is a sequence analysis resource that classifies sequences into protein families and predicts important domains and sites. It combines predictive models (known as signatures) from different databases to provide functional analysis of protein sequences.

38 The aim of InterPro

39 Protein annotation: a predictive approach. This is the approach taken by protein signature databases: model the pattern of conserved amino acids at specific positions within a multiple sequence alignment, then use these models to infer relationships with the characterised sequences from which the alignment was constructed.
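The simplest form of such a model is a position-specific scoring matrix: count residues in each alignment column, turn counts into frequencies, and score a new segment by how much more likely its residues are at each position than by chance. The sketch below assumes a gap-free, equal-length toy alignment, a uniform background of 1/20 and a fixed pseudocount; real profile and HMM databases also handle gaps, sequence weighting and realistic background frequencies. Names and sequences are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Log2-odds score per residue per column, against a uniform background."""
    background = 1 / len(AMINO_ACIDS)
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(len(alignment[0])):
        counts = Counter(seq[pos] for seq in alignment)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm, segment):
    """Sum of position-specific scores for a segment of the same length."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


# Made-up, gap-free alignment of four short segments.
pssm = build_pssm(["RGDS", "RGDA", "RGDS", "KGDS"])
for candidate in ("RGDS", "KGDA", "WWWW"):
    print(candidate, round(score(pssm, candidate), 2))
```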

40 Three (4) different protein signature approaches: single motif methods (patterns); multiple motif methods (fingerprints); full alignment methods (profiles & hidden Markov models, HMMs).
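Single-motif patterns are the easiest signatures to reproduce. PROSITE patterns use `-` between positions, `x` for any residue, `[..]` for allowed residues, `{..}` for forbidden residues, `(n)` or `(n,m)` for repeats and `<` / `>` for the sequence termini, so they translate naturally into regular expressions. The converter below is a sketch covering that subset (it does not handle `>` inside brackets); `prosite_to_regex` and `find_sites` are hypothetical names, and the test sequences are made up. The example pattern is PROSITE PS00001, the N-glycosylation site.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern into a Python regex.

    Handles: x (any residue), [..] (allowed), {..} (forbidden), e(n) / e(n,m)
    repeats, and < / > termini anchors. Not handled: '>' inside brackets.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        repeat = ""
        m = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if m:
            element, repeat = m.group(1), "{" + m.group(2) + "}"
        if element == "x":
            core = "."
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"
        else:
            core = element  # a residue letter or a [..] class, same syntax in regex
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_sites(pattern, sequence):
    """1-based start positions and matched text, including overlapping matches."""
    rx = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in rx.finditer(sequence)]


print(prosite_to_regex("N-{P}-[ST]-{P}."))             # N[^P][ST][^P]
print(find_sites("N-{P}-[ST]-{P}.", "MKNGTAPNPSL"))  # [(3, 'NGTA')]
```

The lookahead in `find_sites` matters because motif occurrences can overlap, and a plain `finditer` would skip the second of two overlapping sites.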

41 InterPro Consortium (figure labels): structural domains; functional annotation of families/domains; protein features (sites); hidden Markov models; fingerprints; profiles; patterns; HAMAP.

42
| Database | Basis | Institution | Built from | Focus | URL |
|---|---|---|---|---|---|
| Pfam | HMM | Sanger Institute | Sequence alignment | Family & domain based on conserved sequence | http://pfam.sanger.ac.uk/ |
| Gene3D | HMM | UCL | Structure alignment | Structural domain | http://gene3d.biochem.ucl.ac.uk/Gene3D/ |
| Superfamily | HMM | Uni. of Bristol | Structure alignment | Evolutionary domain relationships | http://supfam.cs.bris.ac.uk/SUPERFAMILY/ |
| SMART | HMM | EMBL Heidelberg | Sequence alignment | Functional domain annotation | http://smart.embl-heidelberg.de/ |
| TIGRFAM | HMM | J. Craig Venter Inst. | Sequence alignment | Microbial functional family classification | http://www.jcvi.org/cms/research/projects/tigrfams/overview/ |
| Panther | HMM | Uni. S. California | Sequence alignment | Family functional classification | http://www.pantherdb.org/ |
| PIRSF | HMM | PIR, Georgetown, Washington D.C. | Sequence alignment | Functional classification | http://pir.georgetown.edu/pirwww/dbinfo/pirsf.shtml |
| PRINTS | Fingerprints | Uni. of Manchester | Sequence alignment | Family functional classification | http://www.bioinf.manchester.ac.uk/dbbrowser/PRINTS/index.php |
| PROSITE | Patterns & profiles | SIB | Sequence alignment | Functional annotation | http://expasy.org/prosite/ |
| HAMAP | Profiles | SIB | Sequence alignment | Microbial protein family classification | http://expasy.org/sprot/hamap/ |
| ProDom | Sequence clustering | PRABI: Rhône-Alpes Bioinformatics Center | Sequence alignment | Conserved domain prediction | http://prodom.prabi.fr/prodom/current/html/home.php |

43 InterPro signature integration process. Signatures are provided by member databases. They are scanned against the UniProt database to see which sequences they match. Curators manually inspect the matches before integrating the signatures into InterPro: signatures representing the same entity are integrated together; relationships between entries are traced, where possible; curators add literature-referenced abstracts, cross-references to other databases, and GO terms.

44 http://www.ebi.ac.uk/interpro/

45 Using InterPro: let’s find some information about T-cell surface antigen CD4 in InterPro. Search using the keyword: CD4

46 Results from the “CD4” key word search

47 Family-centered view (labelled fields): type, name, identifier, contributing signatures, description, GO terms, references

48 Using InterPro: search using the human CD4 protein sequence

49 Protein-centered view (labelled fields): type, name, identifier, domains, family

50 Domain-centered view (labelled fields): type, name, identifier, contributing signatures, description, references

51 Using InterPro with unknown sequences: InterProScan. Search with an unknown protein sequence. InterProScan is the software package that allows sequences to be scanned against InterPro's signatures.

52 InterPro entries and contributing signatures; unintegrated signatures (not reviewed)
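A local InterProScan run (for example `interproscan.sh -i proteins.fasta -f tsv`; check the options for your version) can write one tab-separated line per signature match. The sketch below groups those matches per protein into hits integrated into an InterPro entry and unintegrated signatures. It assumes the InterProScan 5 TSV column order (protein accession first, signature database and accession in columns 4 and 5, start/stop in 7 and 8, InterPro accession in 12, `-` or empty when absent); the helper names and every identifier in the sample rows are made-up placeholders.

```python
# 0-based column indices in InterProScan 5 TSV output (assumed layout).
PROTEIN, ANALYSIS, SIGNATURE, START, STOP, INTERPRO = 0, 3, 4, 6, 7, 11


def summarise_matches(tsv_lines):
    """Per protein: hits grouped by InterPro entry, plus unintegrated hits."""
    summary = {}
    for line in tsv_lines:
        if not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        hit = (cols[ANALYSIS], cols[SIGNATURE], int(cols[START]), int(cols[STOP]))
        interpro = cols[INTERPRO] if len(cols) > INTERPRO else ""
        entry = summary.setdefault(cols[PROTEIN], {"integrated": {}, "unintegrated": []})
        if interpro in ("", "-"):
            # Signature not (yet) integrated into an InterPro entry.
            entry["unintegrated"].append(hit)
        else:
            entry["integrated"].setdefault(interpro, []).append(hit)
    return summary


def tsv_row(analysis, signature, start, stop, interpro="-"):
    """Build a made-up 13-column TSV row; every identifier is a placeholder."""
    return "\t".join([
        "query_1", "md5-placeholder", "120", analysis, signature, "description",
        str(start), str(stop), "1.0E-10", "T", "01-01-2024",
        interpro, "entry description" if interpro != "-" else "-",
    ])


rows = [
    tsv_row("Pfam", "PF_EXAMPLE", 10, 80, "IPR_EXAMPLE"),
    tsv_row("SMART", "SM_EXAMPLE", 12, 78, "IPR_EXAMPLE"),
    tsv_row("PANTHER", "PTHR_EXAMPLE", 1, 120),
]
print(summarise_matches(rows))
```

Grouping by InterPro accession mirrors the "entries and contributing signatures" view: several member-database signatures can support one entry, while unintegrated hits stand alone.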

53 InterPro usage within the EBI: used by UniProtKB curators in their annotation of Swiss-Prot proteins; forms part of the automated system that adds annotation to UniProtKB/TrEMBL; provides matches to over 80% of UniProtKB; source of >60 million Gene Ontology (GO) mappings to >17 million distinct UniProtKB sequences. Outside the EBI: 50,000 unique visitors to the website per month; >2 million sequences searched online per month, plus offline searches with the downloadable version.

54 Remember! Probabilistic models != biological certainty. We are using biologically-unaware search tools and probabilistic models: ask questions and weigh the evidence.

55 Caveats. We need your feedback: missing/additional references, reporting problems, requests (interhelp@ebi.ac.uk). The sheer amount of data can be overwhelming. Member databases do not always agree! InterPro entries are based on signatures supplied to us by our member databases: no signature, no entry!

56 Find out more: EBI online courses www.ebi.ac.uk/training/online/course-list/introduction-protein-classification-ebi www.ebi.ac.uk/training/online/course/interpro-quick-tour www.ebi.ac.uk/training/online/course/interpro-functional-and-structural-analysis-protei

57 Acknowledgements & questions Amaia Sangrador InterPro curation team EBI-EMBL amaia@ebi.ac.uk

58 PDBe: Protein Data Bank in Europe Contact: Gary Battle Project Leader Outreach PDBe battle@ebi.ac.uk http://www.facebook.com/proteindatabank http://twitter.com/PDBeurope

59 PDBe overview. Mission: Bringing Structure to Biology. Major activities: a deposition and annotation site for structural data on biomacromolecules (X-ray, NMR, EM); integration of macromolecular structure data with important biological and chemical data resources; tools and services for accessing, exploiting and disseminating structural data to the wider biomedical community.
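Exploiting structural data usually starts with reading atomic coordinates. In the legacy PDB format, ATOM/HETATM records use fixed character columns (atom name 13-16, residue name 18-20, chain 22, residue number 23-26, x/y/z in 31-54), so they must be sliced by position rather than split on whitespace. This is a minimal sketch with made-up coordinates; PDBx/mmCIF is the wwPDB's standard archive format, so for real work a proper mmCIF reader is preferable.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atoms(lines, atom_name=None):
    """Read ATOM/HETATM records using fixed PDB columns (0-based slices)."""
    atoms = []
    for line in lines:
        if line[0:6] not in ("ATOM  ", "HETATM"):
            continue
        atom = Atom(
            name=line[12:16].strip(),
            res_name=line[17:20].strip(),
            chain=line[21],
            res_seq=int(line[22:26]),
            x=float(line[30:38]),
            y=float(line[38:46]),
            z=float(line[46:54]),
        )
        if atom_name is None or atom.name == atom_name:
            atoms.append(atom)
    return atoms


# Two made-up records (columns beyond the temperature factor omitted).
PDB_LINES = [
    "ATOM      1  N   MET A   1      10.000   5.000  -6.000  1.00  0.00",
    "ATOM      2  CA  MET A   1      11.104   6.134  -6.504  1.00  0.00",
]
ca = parse_atoms(PDB_LINES, atom_name="CA")
print(ca)
```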

60 Worldwide Protein Data Bank (wwPDB)


63 PDBeXplore Browse the PDB using familiar classification systems (enzymes, folds, families, compounds, taxonomy, sequence). Latest structures: pdbe.org/pdbexplore

64 PDBePISA Exploration of macromolecular (protein, DNA/RNA and ligand) interfaces and prediction of probable quaternary structures. Predict quaternary structure: pdbe.org/pisa

65 PDBeFold Interactive comparison, alignment and superposition based on protein secondary structure. Find similar structures: pdbe.org/fold
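Once two structures have been aligned, their similarity is usually reported as the RMSD after optimal superposition. PDBeFold finds residue correspondences with its own secondary-structure matching method, which is not reproduced here; the sketch below only covers the final step for coordinates that are already paired (e.g. matched Cα atoms), using the standard Kabsch algorithm with NumPy. The function name is hypothetical.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between paired N x 3 coordinates after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P - P.mean(axis=0)  # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q  # 3 x 3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (no mirror image).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) > 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))


# Made-up coordinates: Q is P rotated 90 degrees about z and shifted.
P = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3]], dtype=float)
Rz = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)
Q = P @ Rz.T + np.array([5.0, -2.0, 1.0])
print(kabsch_rmsd(P, Q))  # effectively zero: same shape, different pose
```

The determinant check matters: without it the SVD can return a reflection, which would "superpose" a structure onto its mirror image.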

66 PDBeMotif Flexible 3D search and analysis of protein-ligand interactions, binding environments and structural motifs. Analyse binding sites and motifs: pdbe.org/motif
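A simple way to describe a binding environment is to list the residues that have any atom within a distance cutoff of the ligand. This is a sketch of that idea only, not PDBeMotif's own definition of an interaction; the 4 Å default is a common but arbitrary choice, the input layout and function name are hypothetical, and the coordinates are made up. `math.dist` requires Python 3.8 or later.

```python
import math


def binding_site_residues(protein_atoms, ligand_coords, cutoff=4.0):
    """Residues with at least one atom within `cutoff` angstroms of any ligand atom.

    protein_atoms: iterable of (chain, residue number, residue name, (x, y, z)).
    ligand_coords: iterable of (x, y, z) for the ligand's atoms.
    """
    ligand_coords = list(ligand_coords)
    site = set()
    for chain, res_seq, res_name, xyz in protein_atoms:
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_coords):
            site.add((chain, res_seq, res_name))
    return sorted(site)


# Made-up coordinates, in angstroms.
protein = [
    ("A", 10, "HIS", (0.0, 0.0, 0.0)),
    ("A", 10, "HIS", (1.0, 0.0, 0.0)),
    ("A", 42, "ASP", (3.5, 0.0, 0.0)),
    ("A", 99, "GLY", (10.0, 0.0, 0.0)),
]
ligand = [(0.0, 3.0, 0.0)]
print(binding_site_residues(protein, ligand))              # [('A', 10, 'HIS')]
print(binding_site_residues(protein, ligand, cutoff=5.0))  # adds ('A', 42, 'ASP')
```

This brute-force check compares every protein atom with every ligand atom; for whole structures a spatial index (grid or k-d tree) keeps it fast.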

67 NMR resources and services Visualisation and validation of NMR models and data. NMR resources: pdbe.org/nmr

68 EM resources and services Comprehensive search and analysis tools for EMDB entries. EM resources: pdbe.org/em

69 Electron Microscopy Data Bank (EMDB). Global public repository for EM density maps of macromolecular complexes and subcellular structures. Founded at EBI in 2002; jointly operated by PDBe, RCSB and NCMI. The PDBe EM portal provides advanced search, visualisation and analysis services: http://pdbe.org/emdb

70 Educational resources: Quips Interactive exploration of interesting structures from the PDB Quite interesting PDB structures: pdbe.org/quips

71 Stay informed… http://www.facebook.com/proteindatabank http://twitter.com/PDBeurope

72 Find out more: EBI online courses www.ebi.ac.uk/training/online/course/pdbe-quick-tour/

73 Acknowledgements & questions Gary Battle EBI-EMBL battle@ebi.ac.uk

