Data Access & Integration in the ISPIDER Proteomics Grid L. Zamboulis, H. Fan, K. Belhajjame, J. Siepen, A. Jones, N. Martin, A. Poulovassilis, S. Hubbard,


1 Data Access & Integration in the ISPIDER Proteomics Grid
L. Zamboulis, H. Fan, K. Belhajjame, J. Siepen, A. Jones, N. Martin, A. Poulovassilis, S. Hubbard, S. M. Embury, N. W. Paton

2 Overview
- The ISPIDER project
- Data Access & Integration of Proteomics Resources
  - Challenges
  - Middleware
  - Proteomics resources & global schema
  - System architecture & query processing
- Future Work

3 ISPIDER Project
Goals:
- Build an integrated platform of proteomic resources
- Use existing resources – produce new ones
- Create clients for querying, visualisation, etc.

4 ISPIDER
Objective: develop an integrated platform of proteome-related resources, using existing standards
Benefits:
- Access to increased breadth of information
- More reliable analyses
- Integration brings added value

5 Challenges
- Proteomics repositories in disparate locations
  → need for distributed solution: common access, distributed query processing
  → need for integration: overlapping data, different representations
- Data/schemas constantly updated/evolve
  → need virtual or hybrid integration
  → need schema evolution support

6 Middleware (1/2)
- OGSA-DAI: middleware exposing data sources on Grids via web services
  - open-source and extensible
  - uniform access to relational & XML data sources
  - supports a variety of operations, e.g. querying/updating, data transformation, data delivery
- OGSA-DQP: service-based distributed query processor
  - supports querying of relational OGSA-DAI data sources
  - offers implicit parallelism for data-intensive requests

7 Middleware (2/2)
- AutoMed: heterogeneous data transformation and integration system
  - subsumes traditional data integration approaches
  - handles various data models – easily extensible
  - virtual/materialised/hybrid integration
  - schema evolution
  - data warehousing tools

8 Data Integration Approaches
- Global-As-View (GAV) approach: describe global schema (GS) constructs with view definitions over local schema (LS_i) constructs
- Local-As-View (LAV) approach: describe LS_i constructs with view definitions over GS constructs
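To make the two directions concrete, here is a minimal sketch in Python. It is not AutoMed code: the relations, field names and sample rows are all hypothetical, and views are written as plain functions over in-memory lists.

```python
# Toy illustration of GAV vs LAV. All relation names, field names and
# rows are hypothetical; views are plain Python functions.

# Two local schemas (LS_1, LS_2) holding protein records.
ls1_protein = [{"acc": "P12345", "descr": "Kinase"}]
ls2_prot_hit = [{"accession": "Q67890", "name": "Phosphatase", "score": 41.0}]


def gav_protein():
    """GAV: the global construct Protein(accession, description) is
    defined as a view (here a union) over the local constructs."""
    rows = [{"accession": r["acc"], "description": r["descr"]}
            for r in ls1_protein]
    rows += [{"accession": r["accession"], "description": r["name"]}
             for r in ls2_prot_hit]
    return rows


def lav_ls1_protein(global_protein):
    """LAV: the local construct LS_1.protein is described as a view
    (a projection with renamed fields) over the global construct."""
    return [{"acc": r["accession"], "descr": r["description"]}
            for r in global_protein]


global_protein = gav_protein()
```

In this toy setup every row actually stored in LS_1 appears in the LAV view evaluated over the global relation, which is the usual "sound view" reading of LAV.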

9 Both-As-View (BAV) Approach
- Schema transformation approach
- For each pair (LS_i, GS): incrementally modify LS_i/GS to match GS/LS_i

10 BAV Example
- Transformation pathway consists of primitive transformations
- Pathway contains both GAV & LAV definitions
- Transformations are automatically reversible
- Metadata in AutoMed Repository
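The reversibility of a pathway can be sketched as follows. This is an illustrative Python model, not the AutoMed API: the `Step` class, the construct names and the query strings are hypothetical, and the pairing of primitives (add/delete, extend/contract, rename) follows the way BAV is usually described in the AutoMed literature.

```python
from dataclasses import dataclass

# Assumed pairing of primitive transformations with their inverses.
INVERSE_KIND = {
    "add": "delete", "delete": "add",
    "extend": "contract", "contract": "extend",
}


@dataclass(frozen=True)
class Step:
    kind: str        # add, delete, extend, contract or rename
    construct: str   # schema construct the step acts on
    arg: str = ""    # defining query, or the new name for a rename


def reverse_step(step):
    """Return the primitive transformation that undoes `step`."""
    if step.kind == "rename":
        return Step("rename", step.arg, step.construct)
    return Step(INVERSE_KIND[step.kind], step.construct, step.arg)


def reverse_pathway(pathway):
    """Reverse a pathway: invert each step and apply them in reverse order."""
    return [reverse_step(s) for s in reversed(pathway)]


# Hypothetical LS -> GS pathway; the query strings are opaque placeholders.
ls_to_gs = [
    Step("add", "Protein.accession", "copy of Protein.acc"),
    Step("delete", "Protein.acc", "copy of Protein.accession"),
    Step("rename", "Peptide.seq", "Peptide.sequence"),
]
gs_to_ls = reverse_pathway(ls_to_gs)
```

Because each add or delete step carries a query, reading the pathway forwards yields definitions of new constructs over existing ones, and reading it backwards yields the opposite direction, which is one way to see why a single pathway can supply both GAV and LAV definitions.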

11 Proteomics Resources
- PEDRo
  - collection of descriptions of experimental data sets in proteomics
  - has been used as a format for exchanging proteomics data
- gpmDB
  - contains a large number of proteins and peptide identifications
  - initially designed to assist in the validation of peptide MS/MS spectra and protein coverage patterns
- PepSeeker
  - developed as part of the ISPIDER project
  - comprehensive resource of peptide/protein identifications
- PRIDE
  - centralised, standards compliant, public proteomics repository
  - contains protein/peptide identifications + evidence supporting them

12 Global Schema
Trade-off between:
- being able to answer specific user queries
- a full integration
Properties:
- Based on PEDRo’s peptide/protein identification section and …
- expanded with information unique to other resources
- Entities identified by LSIDs
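For readers unfamiliar with LSIDs (Life Science Identifiers), they are URNs of the form `urn:lsid:<authority>:<namespace>:<object>[:<revision>]`. The sketch below splits such an identifier into its parts; the authority and namespace in the example are made up, and the slides do not say which ones ISPIDER actually uses.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split an LSID URN into authority, namespace, object and revision."""
    parts = text.split(":")
    # Expect urn:lsid plus three mandatory fields and one optional revision.
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(p == "" for p in parts[2:]):
        raise ValueError(f"empty LSID component in {text!r}")
    return LSID(*parts[2:])


# Hypothetical identifier for a protein entity.
example = parse_lsid("urn:lsid:example.org:protein:P12345")
```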

13 System Architecture
- Sources wrapped with OGSA-DAI
- AutoMed toolkit wraps OGSA-DAI resources
- Integration of OGSA-DAI resources
- Queries submitted to AutoMed QP are evaluated with the help of OGSA-DQP


18 Query Processing
- Query is submitted to AutoMed’s GQP:
  - Reformulated
  - Optimised
- AutoMed-DQP Wrapper: IQL → OQL
- OGSA-DQP evaluates OQL queries
- OQL result → IQL result
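The order of these stages can be sketched as a simple pipeline. Everything here is a stand-in: the function and parameter names are hypothetical, each stage is passed in as a callable, and no real AutoMed or OGSA-DQP call is shown.

```python
def run_global_query(iql_query, reformulate, optimise,
                     iql_to_oql, evaluate_oql, oql_result_to_iql):
    """Run a global-schema query through the stages listed on the slide."""
    local_query = reformulate(iql_query)    # rewrite over the source schemas
    optimised = optimise(local_query)       # optimise the reformulated query
    oql_query = iql_to_oql(optimised)       # wrapper: IQL -> OQL
    oql_result = evaluate_oql(oql_query)    # evaluation by the distributed QP
    return oql_result_to_iql(oql_result)    # wrapper: OQL result -> IQL result


# Stand-in stages that only record the order in which they are called.
trace = []


def stage(name):
    def run(value):
        trace.append(name)
        return value
    return run


result = run_global_query(
    "hypothetical query text",
    stage("reformulate"), stage("optimise"),
    stage("iql_to_oql"), stage("evaluate_oql"), stage("oql_result_to_iql"),
)
```

Passing the stages in as callables keeps the sketch independent of any particular query language or middleware version.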


20 Summary
- Proteomics repositories in disparate locations
  → need for distributed solution
  → need for integration
- Data/schemas constantly updated/evolve
  → need virtual or hybrid integration
  → support schema evolution

21 Future Work
- Schema evolution
- Evaluation of AutoMed advantage
- Expose AutoMed functionality to the Grid
- AutoMed and Taverna integration

22 Future Work
- Taverna: tool for Web Service orchestration in workflows
- Related services may be incompatible
- Current solution involves writing custom code for every pair of WS
- Use AutoMed toolkit for semi-automatic integration of XML Web Services
  - mappings from WS to ontologies
  - automatic integration
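A rough count shows why per-pair custom code scales poorly compared with mapping each service to a shared ontology. The sketch assumes one adapter per unordered pair of services and one mapping per service; neither figure comes from the slides.

```python
def pairwise_adapters(n_services: int) -> int:
    """Adapters needed if every unordered pair of services gets custom code."""
    return n_services * (n_services - 1) // 2


def ontology_mappings(n_services: int) -> int:
    """Mappings needed if each service is mapped once to a shared ontology."""
    return n_services


# Compare the two counts for a few workflow sizes.
comparison = {n: (pairwise_adapters(n), ontology_mappings(n))
              for n in (2, 5, 10)}
```

Under these assumptions, ten services need 45 pairwise adapters but only 10 mappings.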

23 ISPIDER Project Members
- Birkbeck College: Nigel Martin, Alex Poulovassilis, Lucas Zamboulis (R.A.), Hao Fan (former R.A.)
- European Bioinformatics Institute: Rolf Apweiler, Henning Hermjakob, Weimin Zhu, Chris Taylor, Phil Jones, Nisha Vinod
- University of Manchester: Simon Hubbard, Steve Oliver, Suzanne Embury, Norman Paton, Carol Goble, Robert Stevens, Khalid Belhajjame (R.A.), Jennifer Siepen (R.A.)
- U.C.L.: David Jones, Christine Orengo, Melissa Pentony (R.A.)

