
1. Proteome data integration characteristics and challenges

K. Belhajjame 1, R. Cote 4, S.M. Embury 1, H. Fan 2, C. Goble 1, H. Hermjakob, S.J. Hubbard 1, D. Jones 3, P. Jones 4, N. Martin 2, S. Oliver 1, C. Orengo 3, N.W. Paton 1, M. Pentony 3, A. Poulovassilis 2, J. Siepen, R.D. Stevens 1, C. Taylor 4, L. Zamboulis 2, and W. Zhu 4

1 University of Manchester; 2 Birkbeck College; 3 University College London; 4 European Bioinformatics Institute

All Hands Meetings, 2005

2. Outline
- Experimental proteomics
- ISPIDER architecture
- Example use cases
- Conclusion

3. Experimental proteomics
Experimental proteomics is the study of the set of proteins produced by an organism, with the aim of understanding their behaviour under varying conditions. It is an essential component in elucidating the biological functions of proteins.
[Figure: typical pipeline: separation (2D gel electrophoresis) → protein digestion (enzymatic digestion) → mass spectrometry (MALDI-TOF) → identification against a protein DB → protein ID]
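The digestion, mass spectrometry and identification steps above can be illustrated with a toy peptide mass fingerprinting search. This is a minimal sketch, not the project's method: it assumes trypsin as the enzyme, unmodified peptides, neutral monoisotopic masses and a simple "count matching masses" score; the protein names and sequences are made up.

```python
"""Toy peptide mass fingerprinting: in-silico tryptic digestion plus mass matching."""

# Monoisotopic residue masses in daltons (standard amino-acid values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water per peptide accounts for the free N- and C-termini


def trypsin_digest(sequence: str) -> list[str]:
    """Cleave after K or R unless the next residue is P (the classic trypsin rule)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start : i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass; MALDI-TOF peaks are usually [M+H]+ (add about 1.00728 Da)."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


def identify(observed: list[float], protein_db: dict[str, str], tol: float = 0.02) -> list[tuple[str, int]]:
    """Rank proteins by how many observed masses match one of their tryptic peptides."""
    scores = []
    for protein_id, sequence in protein_db.items():
        theoretical = [peptide_mass(p) for p in trypsin_digest(sequence)]
        hits = sum(any(abs(m - t) <= tol for t in theoretical) for m in observed)
        scores.append((protein_id, hits))
    return sorted(scores, key=lambda s: s[1], reverse=True)


if __name__ == "__main__":
    db = {"PROT_A": "AKGRPLK", "PROT_B": "MSTVW"}  # made-up sequences
    print(trypsin_digest("AKGRPLK"))  # ['AK', 'GRPLK']
    print(identify([peptide_mass("AK")], db))  # [('PROT_A', 1), ('PROT_B', 0)]
```

Real search engines use far richer scoring (and handle missed cleavages and modifications); the point here is only the shape of the pipeline: digest, compute masses, match against a database.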

4. Experimental proteomics
- Development of new technologies for:
  - protein separation (2D SDS-PAGE, HPLC, capillary electrophoresis)
  - mass spectrometry (multidimensional protein identification)
- Availability of publicly accessible protein sequence databases
- Proteomics databases (PedroDB, gpmDB, PepSeeker, PRIDE, …)
- Building experiments that involve orchestrating analysis services, and processing and integrating data

5. Objectives of ISPIDER
- A Grid dedicated to the creation of bioinformatics experiments for proteomics
- Develop new proteome databases and Grid-enabled services, or make existing ones available as such
- Develop middleware support for building and executing new proteome analyses, based on distributed query processing and workflow technologies
- Undertake proteomic studies that demonstrate the effectiveness of the resulting infrastructure

6. Outline
- Experimental proteomics
- ISPIDER architecture
- Example use cases
- Conclusion and future directions

7. ISPIDER architecture
[Figure: layered ISPIDER architecture]
- ISPIDER Proteomics Clients: Vanilla Query Client; 2D Gel Visualisation Client (+ Aspergillus extensions, + Phosph. extensions); PPI Validation + Analysis Client; Protein ID Client
- ISPIDER Proteomics Grid Infrastructure: Proteome Request Handler; Instance Identification/Mapping Services; Proteomic Ontologies/Vocabularies; Source Selection Services; Data Cleaning Services
- Existing e-Science infrastructure: myGrid Ontology Services; myGrid DQP; AutoMed; myGrid Workflows
- Public proteomics resources, exposed as Web services: existing resources (PS WS, PF WS, TR WS, GS WS, FA WS, PPI WS, PID WS, PRIDE WS, PEDRo WS) and ISPIDER resources (Phos WS)
Key: WS = Web services, GS = genome sequence, TR = transcriptomic data, PS = protein structure, PF = protein family, FA = functional annotation, PPI = protein-protein interaction data, WP = Work Package
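The slide names a Source Selection Services component but does not describe its interface. As a purely hypothetical sketch, assuming each "X WS" in the figure serves data type X from the key, source selection could map each requested data type to the registered services able to supply it. The class, registry and function names below are illustrative, not ISPIDER's API.

```python
"""Hypothetical sketch of a source selection step over the resource Web services."""
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceService:
    name: str
    data_types: frozenset[str]  # codes from the slide key, e.g. "PS", "PPI"


# Assumed registry: each "X WS" from the figure serves data type X.
REGISTRY = [
    SourceService("GS WS", frozenset({"GS"})),
    SourceService("TR WS", frozenset({"TR"})),
    SourceService("PS WS", frozenset({"PS"})),
    SourceService("PF WS", frozenset({"PF"})),
    SourceService("FA WS", frozenset({"FA"})),
    SourceService("PPI WS", frozenset({"PPI"})),
]


def select_sources(requested: set[str], registry: list[SourceService]) -> dict[str, list[str]]:
    """Map each requested data type to the names of services able to supply it."""
    plan = {t: [s.name for s in registry if t in s.data_types] for t in sorted(requested)}
    missing = [t for t, services in plan.items() if not services]
    if missing:
        # Fail early rather than letting a request handler query nothing.
        raise ValueError(f"no registered source for: {', '.join(missing)}")
    return plan


if __name__ == "__main__":
    print(select_sources({"PS", "PPI"}, REGISTRY))  # {'PPI': ['PPI WS'], 'PS': ['PS WS']}
```

Keeping selection separate from execution means a request handler could swap in a different registry (for example, one that also lists ISPIDER's own resources) without changing the selection logic.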

8. Outline
- Experimental proteomics
- ISPIDER architecture
- Example use cases
- Conclusion and future directions

9. Value-added protein datasets
Motivation: protein identification results are usually used as input to further analysis processes, for example:
- gathering evidence for a biological hypothesis
- suggesting new hypotheses
Objective: augment the identification results with additional information on the identified proteins.
Implementation: Taverna workflow system

10. Value-added protein datasets
[Figure: workflow combining the PepMapper Web service, GO services and auxiliary services]
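The value-adding workflow can be pictured as a linear chain of processors, each enriching a record about an identified protein. This is a minimal sketch assuming two hypothetical services (an identifier-mapping step and a GO annotation step) implemented as local lookups over toy data; it does not reproduce Taverna, PepMapper or any real GO service API.

```python
"""Hypothetical sketch of a value-adding workflow as a linear chain of processors."""
from typing import Callable, Iterable

Record = dict  # one value-added row per identified protein


def add_accession(record: Record) -> Record:
    """Stand-in for an auxiliary mapping service (toy lookup table, not a real API)."""
    lookup = {"hit-1": "ACC001", "hit-2": "ACC002"}
    return {**record, "accession": lookup.get(record["hit"])}


def add_go_terms(record: Record) -> Record:
    """Stand-in for a GO annotation service (toy annotations, not real data)."""
    annotations = {"ACC001": ["GO:0005524"], "ACC002": []}
    return {**record, "go_terms": annotations.get(record["accession"], [])}


def run_workflow(records: Iterable[Record], steps: list[Callable[[Record], Record]]) -> list[Record]:
    """Push each record through every step in order, like chained workflow processors."""
    results = []
    for record in records:
        for step in steps:
            record = step(record)
        results.append(record)
    return results


if __name__ == "__main__":
    hits = [{"hit": "hit-1"}, {"hit": "hit-2"}]
    for row in run_workflow(hits, [add_accession, add_go_terms]):
        print(row)
```

Each step returns a new dictionary instead of mutating its input, so intermediate results stay inspectable, which is roughly what a workflow engine gives you between processors.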

11. Genome-focused protein identification
Motivation: protein identification searches are currently performed over large data sets. This yields fewer false negatives, but false positives also become more likely.
Objective: more focused, and therefore more efficient, protein identification.
Implementation: Taverna workflow system; DQP, a service-based query processor

12. Genome-focused protein identification
[Figure: workflow combining the DQP Web service, IPI, the PepMapper Web service and the GOA Web service]
Example DQP query restricting the sequence database to one organism:
    select p.Name, p.Seq from p in db_proteinSequences where p.OS='HomoSapiens';
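The effect of the query above is to shrink the candidate set before identification. The sketch below mirrors it in Python over an in-memory list (the entry type and data are hypothetical) and adds a deliberately simplified model of why a smaller search space helps: if each candidate independently matches by chance with probability p, the expected number of random matches is N × p, so it falls in proportion to N.

```python
"""Sketch: restrict the search space by organism before identification."""
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinEntry:
    name: str
    seq: str
    os: str  # organism field, mirroring p.OS in the query


def select_by_organism(db: list[ProteinEntry], organism: str) -> list[tuple[str, str]]:
    """Python analogue of: select p.Name, p.Seq from p in db where p.OS = organism"""
    return [(p.name, p.seq) for p in db if p.os == organism]


def expected_random_hits(n_candidates: int, p_random: float) -> float:
    """Simplifying assumption: each candidate matches by chance independently with probability p_random."""
    return n_candidates * p_random


if __name__ == "__main__":
    db = [  # made-up entries
        ProteinEntry("A", "MKT", "HomoSapiens"),
        ProteinEntry("B", "MSV", "MusMusculus"),
        ProteinEntry("C", "MAL", "HomoSapiens"),
    ]
    focused = select_by_organism(db, "HomoSapiens")
    print(focused)  # [('A', 'MKT'), ('C', 'MAL')]
    print(expected_random_hits(len(db), 0.1), expected_random_hits(len(focused), 0.1))
```

The independence assumption is a simplification; the point is only the direction of the effect the motivation slide describes.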

13. Integrated access to proteome databases
Motivation: the ability to analyse existing proteomics results en masse is limited by the heterogeneity of the schemas of the different databases.
Objective: provide integrated access to proteome databases through a common schema.
Implementation: AutoMed, a framework for mapping between heterogeneous schemas; DQP, a service-based query processor

14. Integrated access to proteome databases
[Figure: integration architecture. Components: user query and result; AutoMed Query Processor; AutoMed DQP Wrapper, exchanging OQL queries and OQL results; OGSA Distributed Query Processor; one OGSA-DAI activity per source; AutoMed wrappers and the AutoMed Repository; sources PRIDE, PedroDB and gpmDB]
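The core idea of a common schema is that each source gets a mapping into shared attribute names, and queries are written once against the shared names. The sketch below uses a plain global-as-view style renaming, evaluated in a single process; AutoMed itself uses its own both-as-view transformation pathways and DQP distributes the evaluation, neither of which is modelled here. The source names and field names are invented and are not the real PRIDE, PedroDB or gpmDB schemas.

```python
"""Toy common-schema integration: per-source mappings into one shared schema."""

# Hypothetical source extracts; field names are invented for illustration.
SOURCES = {
    "source_a": [{"acc": "ACC001", "pep": "AK"}],
    "source_b": [{"protein_accession": "ACC001", "peptide_sequence": "GRPLK"}],
    "source_c": [{"proteinId": "ACC002", "sequence": "MSTVW"}],
}

# Each mapping renames local fields to the common schema (accession, peptide).
MAPPINGS = {
    "source_a": {"acc": "accession", "pep": "peptide"},
    "source_b": {"protein_accession": "accession", "peptide_sequence": "peptide"},
    "source_c": {"proteinId": "accession", "sequence": "peptide"},
}


def to_common(source: str, row: dict) -> dict:
    """Rewrite one source row into the common schema and tag its origin."""
    record = {common: row[local] for local, common in MAPPINGS[source].items()}
    record["source"] = source
    return record


def query_common(predicate) -> list[dict]:
    """Evaluate a predicate over the virtual union of all mapped sources."""
    results = []
    for source, rows in SOURCES.items():
        for row in rows:
            record = to_common(source, row)
            if predicate(record):
                results.append(record)
    return results


if __name__ == "__main__":
    for hit in query_common(lambda r: r["accession"] == "ACC001"):
        print(hit)
```

The query never mentions source-specific field names, which is what lets results from several databases be analysed together.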

15. Conclusions
+ Available e-Science technologies provide rapid prototyping facilities for bioinformatics analyses.
+ Combining such technologies is possible and opens up further possibilities:
    * Taverna + DQP
    * AutoMed + DQP
- Writing custom code is usually required for:
    * processing service output to extract the inputs for subsequent services
    * transforming results between data formats
    * dealing with mismatches between identifiers
Future directions:
    * developing a user-guided environment for detecting and resolving mismatches
    * developing proteomics client applications (PepMapper, PepSeeker and PRIDE)
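Identifier mismatches are a typical source of the custom glue code mentioned above. As a hypothetical sketch, assuming identifiers differ only in whitespace, letter case and a trailing ".<version>" suffix, and that a cross-reference table between two identifier schemes is available, reconciliation could normalise each identifier, look it up, and report what could not be resolved. The table contents are toy values.

```python
"""Hypothetical shim for reconciling identifiers between two services."""
import re

# Toy cross-reference table between two identifier schemes (made-up values).
XREF = {"IPI00000001": "ACC001"}

VERSION_SUFFIX = re.compile(r"\.\d+$")


def normalise(identifier: str) -> str:
    """Trim whitespace, upper-case, and drop a trailing '.<version>' suffix."""
    return VERSION_SUFFIX.sub("", identifier.strip().upper())


def reconcile(identifiers: list[str]) -> tuple[dict[str, str], list[str]]:
    """Map raw identifiers via XREF; return (mapped, unresolved) for user review."""
    mapped, unresolved = {}, []
    for raw in identifiers:
        key = normalise(raw)
        if key in XREF:
            mapped[raw] = XREF[key]
        else:
            unresolved.append(raw)
    return mapped, unresolved


if __name__ == "__main__":
    print(reconcile([" ipi00000001.2 ", "IPI99999999"]))
```

Returning the unresolved identifiers rather than silently dropping them leaves room for the kind of user-guided resolution listed under future directions.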

