EMBL Outstation — The European Bioinformatics Institute: Large-Scale Characterization of Protein Sequence Data


1 Large-Scale Characterization of Protein Sequence Data

2 The Challenge
- Rapidly growing amounts of data that lack experimental determination of biological function increase the need for computational analysis of the data

3 Databases are essential tools in bioinformatics for computational analysis and data mining (with SWISS-PROT being the gold standard)

4 SWISS-PROT
- is a curated protein sequence data bank established in 1986 by Amos Bairoch in Geneva and maintained collaboratively with EMBL since 1987
- currently contains 75 000 protein sequence entries

5 Essential criteria for a sequence data bank
1. it must be complete with minimal redundancy
2. it must contain as much up-to-date information as possible on each sequence
3. all the information items must be retrievable by computer programs in a consistent manner
4. it should be integrated (cross-referenced) with other sequence-related data banks

6 Integration with other databases
- 75 000 SWISS-PROT entries
- abstracted from > 60 000 references
- linked by > 180 000 direct pointers to 28 related or specialized data collections

7 Integration with other databases
- EMBL Nucleotide Sequence Database
- PDB
- Genomic databases (FlyBase, SubtiList, MaizeDB, EcoGene, LISTA, SGD, StyGene)
- 2D-gel databases (ECO2DBASE, SWISS-2DPAGE, Aarhus/Ghent, YEPD, Harefield)
- Specialized collections (OMIM, PROSITE, ENZYME, GCRDB, Transfac, HSSP)

8 Connections between databases

9 SWISS-PROT Growth

10 Nucleotide sequence database growth

11 The Bottleneck: Annotation

12 Annotation consists of the description of:
- Function(s) of the protein
- Post-translational modification(s)
- Domains and sites
- Secondary structure
- Quaternary structure
- Similarities to other proteins
- Disease(s) associated with deficiency(ies) in the protein
- Sequence conflicts, variants, etc.

13 Annotation sources:
- publications that report new sequence data
- review articles to periodically update the annotation of families or groups of proteins
- external experts

14 TrEMBL
- is a computer-annotated supplement to SWISS-PROT
- consists of entries in SWISS-PROT format
- contains translations of the CDS in the Nucleotide Sequence Database that are not in SWISS-PROT
- the translation tools used are based on the program trembl, written by Thure Etzold at the EMBL in Heidelberg
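The translation step mentioned above can be illustrated with a minimal sketch. It is not the trembl program; it only shows the general idea of turning a coding sequence (CDS) into a protein sequence, assuming the standard genetic code, a CDS that starts in frame, and a hypothetical function name translate_cds.

```python
from itertools import product

# Standard genetic code, with codons enumerated in the order
# TTT, TTC, TTA, TTG, TCT, ... (bases cycled as T, C, A, G).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds: str) -> str:
    """Translate an in-frame CDS; stop at the first stop codon.

    Codons containing ambiguous bases are translated as 'X'.
    (Hypothetical helper for illustration only.)
    """
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate_cds("ATGGCCAAATAA"))  # prints MAK
```

A production tool would also have to handle alternative genetic codes, partial CDS features, and join/complement locations, none of which are covered here.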

15 August 1998: SWISS-PROT 36 + TrEMBL 7
- 327 000 CDS in corresponding EMBL release
- 74 000 SWISS-PROT entries
- 109 000 CDS integrated in SWISS-PROT
- the remaining 216 000 CDS were merged whenever possible to reduce redundancy

16 TrEMBL release 7
- 194 000 TrEMBL entries
- 54 000 000 amino acids
- linked by > 300 000 direct pointers to 14 related or specialized data collections

17 The Production of TrEMBL
1. translation and entry creation
2. sorting the entries
3. post-processing the SP-TrEMBL entries

18 Translation and entry creation
1. translation of every CDS not yet cross-referenced to SWISS-PROT
2. parsing of information in EMBL entries into TrEMBL entries

19 Sorting the entries
- into SP-TrEMBL and REM-TrEMBL
- SP-TrEMBL is split into taxonomic divisions

20 Post-processing
1. reducing redundancy
2. enhancing the information content
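The redundancy-reduction step can be sketched as follows. The slides do not state the merge criterion, so this example assumes one plausible rule: entries with an identical sequence from the same organism are merged, and their cross-references are pooled. The Entry class and merge_redundant function are hypothetical names used only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Entry:
    # Hypothetical, simplified stand-in for a TrEMBL entry.
    accession: str
    organism: str
    sequence: str
    xrefs: set = field(default_factory=set)


def merge_redundant(entries):
    """Merge entries sharing organism and sequence (assumed criterion).

    The first entry of each group keeps its accession; cross-references
    of all group members are combined.
    """
    groups = defaultdict(list)
    for entry in entries:
        groups[(entry.organism, entry.sequence)].append(entry)

    merged = []
    for group in groups.values():
        primary = group[0]
        combined = Entry(primary.accession, primary.organism,
                         primary.sequence, set(primary.xrefs))
        for other in group[1:]:
            combined.xrefs |= other.xrefs
        merged.append(combined)
    return merged
```

Grouping by an exact key keeps the sketch linear in the number of entries; merging fragments or near-identical sequences would need a similarity search instead.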

21 Improving Automatic Annotation
- will streamline flow into TrEMBL
- will bring TrEMBL nearer to SWISS-PROT quality
- will make the transition from TrEMBL to SWISS-PROT easier

22 Demands on a system for automated data analysis and annotation
- Correctness
- Scalability
- Updatability
- Low level of redundant information
- Completeness
- Standardized vocabulary

23 Components of a system for automated data analysis and annotation
- sequence analysis tools (PROSITE, TM, Coiled Coils, Signal, etc.)
- sequence similarity searching (FASTA, SW, BLAST)
- database scanning/parsing (MGD, FlyBase, ENZYME, etc.)
- information transfer decided by rule-based system

24 Environment for Distributed Information Transfer to TrEMBL (EDITtoTrEMBL)
- RuleBase
- Analyzers
- Dispatchers

25 EDITtoTrEMBL

26 EDITtoTrEMBL: RuleBase
- SWISS-PROT as source of annotation: correctness and controlled vocabulary
- Rules can be automatically and/or manually created
- Rules can be updated

27 EDITtoTrEMBL: Analyzers
- Directly implement an algorithm or communicate with external programs
- Query other databases
- Use rules to add information to TrEMBL entries

28 EDITtoTrEMBL: Dispatchers
- Control of annotation flow
- Error checking
- Removal of redundant information
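How the three EDITtoTrEMBL components might fit together can be shown with a toy sketch. It is not the actual EDITtoTrEMBL code: every class, method, motif name, and annotation line below is hypothetical, and the single analyzer simply matches a regular expression where a real one might call an external program or query a database.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Entry:
    # Hypothetical, simplified stand-in for a TrEMBL entry.
    accession: str
    sequence: str
    annotations: list = field(default_factory=list)


class RuleBase:
    """Maps a condition (here: an analyzer hit name) to annotation lines."""

    def __init__(self, rules):
        self.rules = dict(rules)

    def lookup(self, condition):
        return self.rules.get(condition, [])


class PatternAnalyzer:
    """Implements its algorithm directly: a regular-expression scan."""

    def __init__(self, name, pattern, rulebase):
        self.name = name
        self.pattern = re.compile(pattern)
        self.rulebase = rulebase

    def analyze(self, entry):
        if self.pattern.search(entry.sequence):
            return self.rulebase.lookup(self.name)
        return []


class Dispatcher:
    """Controls the flow, checks for errors, and drops duplicate lines."""

    def __init__(self, analyzers):
        self.analyzers = analyzers

    def annotate(self, entry):
        if not entry.sequence.isalpha():
            raise ValueError(f"{entry.accession}: invalid sequence")
        for analyzer in self.analyzers:
            for line in analyzer.analyze(entry):
                if line not in entry.annotations:  # redundancy removal
                    entry.annotations.append(line)
        return entry


# Usage with a made-up motif and a made-up annotation line.
rules = RuleBase({"TOY_MOTIF": ["CC -!- FUNCTION: hypothetical toy function."]})
dispatcher = Dispatcher([PatternAnalyzer("TOY_MOTIF", r"C..C", rules)])
print(dispatcher.annotate(Entry("X00001", "MKCAACLL")).annotations)
```

Keeping the rules outside the analyzers mirrors the slide's point that rules can be created and updated independently of the programs that apply them.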

29 Standardized transfer of annotation from characterized proteins in SWISS-PROT to TrEMBL entries
- TrEMBL entry is reliably recognized by a given method as a member of a certain group of proteins
- corresponding group of proteins in SWISS-PROT shares certain annotation
- common annotation is transferred to the TrEMBL entry and flagged as annotated by similarity
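The three steps above can be sketched in a few lines. This is an illustrative simplification, assuming that "shared" annotation means lines present in every SWISS-PROT member of the group and that the flag is a text suffix; the function names, the flag text, and the example lines are hypothetical.

```python
def common_annotation(group_members):
    """Return the annotation lines shared by all members of a group.

    group_members: iterable of collections of annotation lines, one per
    characterized SWISS-PROT entry in the group.
    """
    sets = [set(member) for member in group_members]
    if not sets:
        return set()
    return set.intersection(*sets)


def transfer_by_similarity(trembl_annotations, group_members):
    """Add shared group annotation to a TrEMBL entry, flagged as such.

    Assumes the entry has already been recognized as a group member.
    Returns a new list; the input list is left unchanged.
    """
    result = list(trembl_annotations)
    for line in sorted(common_annotation(group_members)):
        flagged = f"{line} (BY SIMILARITY)"
        if line not in result and flagged not in result:
            result.append(flagged)
    return result


# Usage with made-up annotation lines.
members = [
    {"FUNCTION: toy kinase activity", "SUBUNIT: homodimer"},
    {"FUNCTION: toy kinase activity", "SUBUNIT: monomer"},
]
print(transfer_by_similarity(["SIMILARITY: toy family"], members))
```

Taking the intersection is deliberately conservative: annotation that only some group members carry is not propagated.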

30 Automated post-processing of TrEMBL entries
- redundancy removal: currently affects > 10% of the entries
- improvements of annotation: currently affects > 20% of the entries

31 Integrated resource of protein domains and functional sites (InterPro)
- Integration of different pattern recognition methods (PROSITE, PRINTS and PFAM)
- Incorporation of new families and domains into InterPro
- Enhancing the functional annotation of TrEMBL entries
- Enhancing genome annotation

32 The InterPro project participants
- Co-ordinated by EBI (R. Apweiler)
- PROSITE (A. Bairoch, P. Bucher)
- PRINTS (T. Attwood)
- PFAM (R. Durbin, E. Birney, A. Bateman, E. Sonnhammer)
- PRODOM (D. Kahn)
- PRATT (I. Jonassen)
- GENE-IT
- LION bioscience AG

33 SWISS-PROT + TrEMBL
- complete and up-to-date protein sequence collection
- minimal redundancy: SP_TR_NRDB
- linked by > 380 000 direct pointers to 28 related or specialized data collections
- deeper integration between the EMBL Nucleotide Sequence Database and SWISS-PROT + TrEMBL by using PID numbers

34 Credits
SWISS-PROT at EBI
- Rolf Apweiler
- Sergio Contrino
- Christian Desaintes
- Wolfgang Fleischmann
- Henning Hermjakob
- Viv Junker
- Fiona Lang
- Claire O'Donovan
- Michele Magrane
- Maria Jesus Martin
- Nicoletta Mitaritonna
- Steffen Moeller
- Stephanie Kappus
- Sheila Rose
Collaborators
- Amos Bairoch
- Jean-Jacques Codani
- Keith Tipton
- Marvin Edelman
- Compugen
- Sue Povey and Julia White
- MGD
- FlyBase
- Neil Rawlings
- Network of > 200 external experts

