
1 EMBL Outstation - The European Bioinformatics Institute
Added-Value Proteome Databases: SWISS-PROT, TrEMBL, InterPro

2 Large-Scale Characterization of Protein Sequence Data: The Integrative Approach of SWISS-PROT + TrEMBL

3 Times are changing

4 ‘Data Waves’
- Biological sequences
- Mutation
- Metabolism
- Polymorphism
- Signaling
- Expression
- Size
- Complexity
- Integration

5 The Challenge of the Genome Era
- The rapidly growing amount of data that lacks experimental determination of biological function increases the need for computational analysis of the data

6 Need for Bioinformatics

7 Bioinformatics: 5 years ago...
- Pharmaceutical companies were not interested
- Life scientists believed that it was an outlet for failed biologists who like to play with computers
- Computer scientists did not even know of its existence

8 Bioinformatics: today...
- Pharmaceutical companies believe that it is a way to streamline the drug discovery process
- Some life scientists believe that it is the solution to all problems in life sciences
- Computer scientists find it most useful as a new way to get grants

9 Bioinformatics: in 5 years...
- Pharmaceutical companies use it routinely as a complement to experimental work
- Life scientists use it efficiently and therefore forget that it exists
- Computer scientists have jumped on another hot subject

10 Bioinformatics
- is a complement to, but no substitute for, experimental research: it can help to plan experiments, but cannot replace them
- is not cheap
- takes a significant amount of time to be any good
- Quality control is crucial: some garbage in, a lot of garbage out!

11 Materials and Methods
- Materials: biological data
- Methods: a wide range of computational techniques

12 Essential in Bioinformatics: Databases as a tool for computational analysis and data mining (with SWISS-PROT being the gold standard)

13 SWISS-PROT
- is a curated protein sequence data bank established in July 1986 by Amos Bairoch in Geneva and maintained collaboratively with EMBL since June 1987
- currently contains 76 000 protein sequence entries

14 Essential criteria for a sequence data bank
1. It must be complete with minimal redundancy
2. It must contain as much up-to-date information as possible on each sequence
3. All the information items must be retrievable by computer programs in a consistent manner
4. It should be integrated (cross-referenced) with other sequence-related data banks

15 Integration with other databases
- 76 000 SWISS-PROT entries
- abstracted from > 60 000 references
- linked by > 275 000 direct pointers to 30 related or specialized data collections

16 Integration with other databases
- EMBL Nucleotide Sequence Database
- PDB
- Genomic databases (FlyBase, SubtiList, MaizeDB, EcoGene, LISTA, SGD, StyGene)
- 2D-Gel databases (ECO2DBASE, SWISS-2DPAGE, Aarhus/Ghent, YEPD, Harefield)
- Specialized collections (OMIM, PROSITE, ENZYME, GCRDB, Transfac, HSSP)

17 Connections between databases

18 SWISS-PROT Growth

19 Nucleotide sequence database growth

20 The Bottleneck: Annotation

21 Annotation consists of the description of:
- Function(s) of the protein
- Post-translational modification(s)
- Domains and sites
- Secondary structure
- Quaternary structure
- Similarities to other proteins
- Disease(s) associated with deficiency(ies) in the protein
- Sequence conflicts, variants, etc.

22 Annotation sources:
- publications that report new sequence data
- review articles to periodically update the annotation of families or groups of proteins
- external experts

23 TrEMBL
- is a computer-annotated supplement to SWISS-PROT
- consists of entries in SWISS-PROT format
- contains translations of the coding sequences (CDS) in the EMBL Nucleotide Sequence Database that are not in SWISS-PROT

24 August 1998: SWISS-PROT 36 + TrEMBL 7
- 327 000 CDS in corresponding EMBL release
- 74 000 SWISS-PROT entries
- 109 000 CDS integrated in SWISS-PROT
- the remaining 216 000 CDS were merged whenever possible to reduce redundancy

25 TrEMBL release 7
- 194 000 TrEMBL entries
- 54 000 000 amino acids
- linked by > 300 000 direct pointers to 14 related or specialized data collections

26 The Production of TrEMBL
1. translation and entry creation
2. sorting the entries
3. post-processing the SP-TrEMBL entries

27 Translation and entry creation
1. translation of every CDS not yet cross-referenced to SWISS-PROT
2. parsing of information in EMBL entries into TrEMBL entries
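The two steps on this slide can be sketched roughly as follows. This is only an illustration, not the TrEMBL production code: it assumes the standard genetic code and a reading frame that starts at the first base, whereas real EMBL CDS features can specify other translation tables and codon starts. The field names (`embl_accession`, `organism`, `cds`, `swissprot_xref`) are hypothetical.

```python
from itertools import product

# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds: str) -> str:
    """Translate a coding sequence, stopping at the first stop codon."""
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X marks an ambiguous codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def make_entries(cds_features):
    """Create draft entries for CDS features without a SWISS-PROT cross-reference."""
    entries = []
    for feature in cds_features:
        if feature.get("swissprot_xref"):  # already covered by SWISS-PROT, skip
            continue
        entries.append({
            "source": feature["embl_accession"],   # hypothetical field names
            "organism": feature["organism"],
            "sequence": translate_cds(feature["cds"]),
        })
    return entries
```

Here `translate_cds("ATGGCCTAA")` yields `"MA"`: methionine, alanine, then the stop codon ends translation.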

28 Sorting the entries
- into SP-TrEMBL and REM-TrEMBL
- SP-TrEMBL is split into taxonomic divisions

29 Post-processing
1. reducing redundancy
2. enhancing the information content

30 Improving Automatic Annotation
- will streamline flow into TrEMBL
- will bring TrEMBL nearer to SWISS-PROT quality
- will make the transition from TrEMBL to SWISS-PROT easier

31 Demands on a system for automated data analysis and annotation
- Correctness
- Scalability
- Updatability
- Low level of redundant information
- Completeness
- Standardized vocabulary

32 Standardized transfer of annotation from characterized proteins in SWISS-PROT to TrEMBL entries
- A TrEMBL entry is reliably recognized by a given method as a member of a certain group of proteins
- The corresponding group of proteins in SWISS-PROT shares certain annotation
- The common annotation is transferred to the TrEMBL entry and flagged as annotated by similarity
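A minimal sketch of the three steps on this slide, assuming each entry already carries the protein groups that some recognition method assigned to it. The `Entry` class, the topic names, and the "(by similarity)" suffix are illustrative choices, not the actual TrEMBL data model.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    accession: str
    groups: set                                      # groups assigned by a recognition method
    annotation: dict = field(default_factory=dict)   # topic -> annotation text


def shared_annotation(members):
    """Keep only the annotation that every characterized member has in common."""
    shared = dict(members[0].annotation)
    for entry in members[1:]:
        shared = {k: v for k, v in shared.items() if entry.annotation.get(k) == v}
    return shared


def transfer_annotation(target, group, characterized):
    """Copy the group's common annotation to the target and flag it."""
    members = [e for e in characterized if group in e.groups]
    if group not in target.groups or not members:
        return {}
    added = {}
    for topic, text in shared_annotation(members).items():
        if topic not in target.annotation:           # never overwrite existing annotation
            added[topic] = f"{text} (by similarity)"
    target.annotation.update(added)
    return added
```

Only annotation that is identical across all characterized members is transferred, which is one conservative way to read "shares certain annotation"; anything that differs between members is left out.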

33 Environment for Distributed Information Transfer to TrEMBL (EDITtoTrEMBL)
- RuleBase
- Analyzers
- Dispatchers

34 EDITtoTrEMBL

35 EDITtoTrEMBL: RuleBase
- SWISS-PROT as source of annotation: correctness and controlled vocabulary
- Rules can be semi-automatically and/or manually created
- Rules can be updated

36 EDITtoTrEMBL: Analyzers
- Directly implement an algorithm or communicate with external programs
- Query other databases
- Use rules to add information to TrEMBL entries

37 EDITtoTrEMBL: Examples of Analyzers
- sequence analysis tools (PROSITE, PFAM, PRINTS, TM, Coiled Coils, Signal, etc.)
- sequence similarity searching (FASTA, SW, BLAST)
- database scanning/parsing (MGD, FlyBase, ENZYME, etc.)

38 EDITtoTrEMBL: Dispatchers
- Control of annotation flow
- Error checking
- Removal of redundant information
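The division of labour between analyzers and dispatchers might look something like the sketch below. It is a toy under stated assumptions, not the EDITtoTrEMBL design: an analyzer is modelled as a plain function from a sequence to proposed annotation, and `dispatch` and `glycosylation_analyzer` are hypothetical names. The analyzer uses the PROSITE N-glycosylation pattern N-{P}-[ST]-{P}, rewritten as a regular expression.

```python
import re


def glycosylation_analyzer(seq):
    """Report potential N-glycosylation sites (PROSITE pattern N-{P}-[ST]-{P})."""
    # The lookahead makes matches zero-width, so overlapping sites are all found.
    sites = [m.start() + 1 for m in re.finditer(r"(?=N[^P][ST][^P])", seq)]
    return {"SITE": [f"Potential N-glycosylation site at {pos}" for pos in sites]}


def dispatch(seq, analyzers):
    """Run each analyzer, collect its annotation, and record failures."""
    entry = {}
    errors = []
    for analyzer in analyzers:
        try:
            result = analyzer(seq)
        except Exception as exc:      # error checking: one failure must not stop the flow
            errors.append(f"{analyzer.__name__}: {exc}")
            continue
        for topic, statements in result.items():
            bucket = entry.setdefault(topic, [])
            for statement in statements:
                if statement not in bucket:   # removal of redundant information
                    bucket.append(statement)
    return entry, errors
```

Running the same analyzer twice adds each statement only once, and an analyzer that raises is recorded in `errors` while the remaining analyzers still run.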

39 Automated post-processing of TrEMBL entries
- redundancy removal: currently affects around 20% of the entries
- improvements of annotation: currently affects around 25% of the entries
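The slides do not state the merge criteria used for redundancy removal. As one plausible illustration, the sketch below merges entries that have an identical sequence and come from the same organism, pooling their accessions and cross-references. The dictionary keys and the function name are hypothetical.

```python
def merge_redundant(entries):
    """Merge entries with the same organism and identical sequence (assumed criterion)."""
    merged = {}
    for e in entries:
        key = (e["organism"], e["sequence"])
        if key not in merged:
            merged[key] = {
                "organism": e["organism"],
                "sequence": e["sequence"],
                "accessions": [],
                "xrefs": set(),
            }
        merged[key]["accessions"].append(e["accession"])   # keep every source accession
        merged[key]["xrefs"].update(e["xrefs"])            # pool the cross-references
    return list(merged.values())
```

Because the merged entry keeps all original accessions, nothing that pointed at a removed duplicate is lost.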

40 SWISS-PROT + TrEMBL
- complete and up-to-date protein sequence collection
- minimal redundancy: SP_TR_NRDB
- linked by > 500 000 direct pointers to 30 related or specialized data collections
- deeper integration between the EMBL Nucleotide Sequence Database and SWISS-PROT + TrEMBL by using PID numbers

41 Integrated Resource of Protein Domains and Functional Sites (InterPro)
- Integration of different pattern recognition methods (PROSITE, PRINTS and PFAM)
- Incorporation of new families and domains into InterPro
- Enhancing the functional annotation of TrEMBL entries
- Enhancing genome annotation

42 The InterPro project participants
- Co-ordinated by EBI (R. Apweiler)
- PROSITE (A. Bairoch, P. Bucher)
- PRINTS (T. Attwood)
- PFAM (R. Durbin, E. Birney, A. Bateman, E. Sonnhammer)
- PRODOM (D. Kahn)
- PRATT (I. Jonassen)
- GENE-IT (J.-J. Codani)
- LION bioscience AG (R. Schneider)

43 1 September 1998: SWISS-PROT ceased to be in the public domain

44 What has changed
- No changes for academic users
- Almost no restrictions on the redistribution of SWISS-PROT by academic servers or software companies
- Commercial users are required to pay yearly subscription fees. These fees will be used to complement the existing grants in order to provide stable long-term funding

45 Credits

SWISS-PROT at EBI
- Rolf Apweiler
- Sergio Contrino
- Wolfgang Fleischmann
- Gill Fraser
- Henning Hermjakob
- Viv Junker
- Alexander Kanapin
- Youla Karavidopoulou
- Evguenia Kriventseva
- Fiona Lang
- Claire O'Donovan
- Michele Magrane
- Maria Jesus Martin
- Nicoletta Mitaritonna
- Steffen Moeller
- Evgeni Zdobnov

Collaborators
- Amos Bairoch
- Jean-Jacques Codani
- Keith Tipton
- Marvin Edelman
- Compugen
- Paracel
- Sue Povey and Julia White
- MGD
- FlyBase
- Neil Rawlings
- Network of > 200 external experts

46 Take-home message:
- Bioinformatics is not essential for biologists, since 2 months in the lab can easily save you an afternoon at the computer

