Proteome data integration characteristics and challenges K. Belhajjame 1, R. Cote 4, S.M. Embury 1, H. Fan 2, C. Goble 1, H. Hermjakob, S.J. Hubbard 1,

Presentation transcript:

Proteome data integration characteristics and challenges

K. Belhajjame 1, R. Cote 4, S.M. Embury 1, H. Fan 2, C. Goble 1, H. Hermjakob, S.J. Hubbard 1, D. Jones 3, P. Jones 4, N. Martin 2, S. Oliver 1, C. Orengo 3, N.W. Paton 1, M. Pentony 3, A. Poulovassilis 2, J. Siepen, R.D. Stevens 1, C. Taylor 4, L. Zamboulis 2, and W. Zhu 4

1 University of Manchester
2 Birkbeck College
3 University College London
4 European Bioinformatics Institute

All Hands Meetings

Outline
– Experimental proteomics
– ISPIDER architecture
– Example use cases
– Conclusion

Experimental proteomics
– An essential component for elucidation of the biological functions of proteins
– The study of the set of proteins produced by an organism, with the aim of understanding their behaviour under varying conditions
[Figure: protein identification pipeline. Labels: Separation (2D gel electrophoresis), Protein digestion (Enzymatic digestion), Mass Spectrometry (MALDI-TOF), Identification (Protein DB), Protein ID]

Experimental proteomics
– Development of new technologies for:
  – protein separation (2D-SDS-PAGE, HPLC, Capillary Electrophoresis)
  – mass spectrometry (Multi-Dimensional protein identification)
– Availability of publicly accessible protein sequence databases
– Proteomics databases (PedroDB, gpmDB, PepSeeker, PRIDE, …)
– Building experiments that involve orchestration of analysis services together with data processing and integration

Objectives of ISPIDER
A Grid dedicated to the creation of bioinformatics experiments for proteomics
– Develop new, or adapt existing, proteome databases as Grid-enabled services
– Develop middleware support for developing and executing new proteome analyses, based on distributed query processing and workflow technologies
– Undertake proteomic studies that demonstrate the effectiveness of the resulting infrastructure

Outline
– Experimental proteomics
– ISPIDER architecture
– Example use cases
– Conclusion and future directions

ISPIDER
[Figure: ISPIDER architecture diagram. Labels by layer:]
– ISPIDER Proteomics Clients: Vanilla Query Client, 2D Gel Visualisation Client, + Aspergil. Extensions, + Phosph. Extensions, PPI Validation + Analysis Client, Protein ID Client
– ISPIDER Proteomics Grid Infrastructure: Proteome Request Handler, Instance Ident/Mapping Services, Proteomic Ontologies/Vocabularies, Source Selection Services, Data Cleaning Services
– Existing E-Science Infrastructure: myGrid Ontology Services, myGrid DQP, AutoMed, myGrid Workflows
– Public Proteomics Resources (Web services, grouped in the figure into Existing Resources and ISPIDER Resources): PS WS, PF WS, TR WS, GS WS, FA WS, PPI WS, PID WS, PRIDE WS, PEDRo WS, Phos WS
KEY: WS = Web services, GS = Genome sequence, TR = transcriptomic data, PS = protein structure, PF = protein family, FA = functional annotation, PPI = protein-protein interaction data, WP = Work Package

Outline
– Experimental proteomics
– ISPIDER architecture
– Example use cases
– Conclusion and future directions

Value-added protein datasets
Motivation
– Protein identification experiments are usually used as input into further analysis processes:
  – Gathering evidence for a biological hypothesis
  – Suggesting new hypotheses
Objective
– Augment the identification results with additional information on the identified protein
Implementation
– Taverna workflow system

Value-added protein datasets
[Figure: workflow diagram. Labels: PepMapper Web Service, GO Services, Auxiliary Services]
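The objective of this use case (augmenting identification results with additional information) can be illustrated with a minimal sketch. It is not the Taverna workflow from the slides: the `Identification` record, the `lookup_go_terms` helper, the accessions and the GO assignments are all assumptions made up for illustration, and the annotation service is replaced by an in-memory dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Identification:
    """One identified protein (hypothetical record layout)."""
    accession: str
    score: float
    go_terms: List[str] = field(default_factory=list)


# Hypothetical stand-in for a GO annotation service; the accessions and
# the term assignments are invented for this example.
GO_ANNOTATIONS: Dict[str, List[str]] = {
    "ACC001": ["GO:0005515", "GO:0006412"],
    "ACC002": ["GO:0003677"],
}


def lookup_go_terms(accession: str) -> List[str]:
    """Return a copy of the GO terms for an accession, or [] if unknown."""
    return list(GO_ANNOTATIONS.get(accession, []))


def add_value(identifications: List[Identification]) -> List[Identification]:
    """Return new records carrying the original fields plus GO terms."""
    return [
        Identification(i.accession, i.score, lookup_go_terms(i.accession))
        for i in identifications
    ]


augmented = add_value([
    Identification("ACC001", 87.5),
    Identification("ACC999", 42.0),  # not annotated in the stand-in table
])
for record in augmented:
    print(record.accession, record.score, record.go_terms)
```

Proteins without annotations are kept with an empty term list rather than dropped, so the augmented dataset still covers every identification.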

Genome-focused protein identification
Motivation
– Currently, protein identification searches are performed over large data sets. This reduces false negatives, but it also makes false positives more likely.
Objective
– More focused, and thus more efficient, protein identification
Implementation
– Taverna workflow system
– DQP, a service-based query processor

Genome-focused protein identification
[Figure: workflow diagram. Labels: DQP Web Service, IPI, PepMapper Web Service, GOA Web Service]
Query shown in the figure:
select p.Name, p.Seq
from p in db_proteinSequences
where p.OS='HomoSapiens';
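For readers unfamiliar with OQL, the query on this slide restricts the protein sequence collection to one organism and projects two attributes. A rough in-memory Python equivalent is sketched below; the attribute names `Name`, `Seq` and `OS` and the collection name come from the query, while the sample records and the `select_by_organism` helper are assumptions for illustration only.

```python
from typing import List, NamedTuple, Tuple


class ProteinSequence(NamedTuple):
    Name: str  # protein name
    Seq: str   # amino acid sequence
    OS: str    # organism, as tested in the query's where clause


# Invented sample data standing in for the db_proteinSequences collection.
db_proteinSequences = [
    ProteinSequence("protA", "MKTAYIAK", "HomoSapiens"),
    ProteinSequence("protB", "MSDNELQR", "MusMusculus"),
    ProteinSequence("protC", "MGSSHHHH", "HomoSapiens"),
]


def select_by_organism(
    proteins: List[ProteinSequence], organism: str
) -> List[Tuple[str, str]]:
    """select p.Name, p.Seq from p in proteins where p.OS = organism"""
    return [(p.Name, p.Seq) for p in proteins if p.OS == organism]


human = select_by_organism(db_proteinSequences, "HomoSapiens")
print(human)
```

Restricting the search database in this way is what makes the subsequent identification step "genome-focused": only sequences from the organism of interest are passed on.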

Integrated access to proteome databases
Motivation
– The ability to analyse existing proteomics results en masse is limited because of the heterogeneities between the schemas of the different databases
Objective
– Providing integrated access to proteome databases through a common schema
Implementation
– AutoMed, a framework for mapping heterogeneous schemata
– DQP, a service-based query processor

Integrated access to proteome databases
[Figure: integration architecture. Labels: User query, Result, AutoMed Query Processor, AutoMed Repository, AutoMed Wrappers, AutoMed DQP Wrapper, OQL query, OQL result, OGSA Distributed Query Processor, OGSA-DAI Activity (three instances), PRIDE, PedroDB, gpmDB]
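The idea of querying several databases through a common schema can be shown with a toy sketch. It does not reflect how AutoMed represents or applies schema mappings, nor the real schemas of PRIDE, PedroDB or gpmDB: every field name, record and function below is a hypothetical placeholder.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Row = Dict[str, str]

# Invented source records; each source uses different (made-up) field names.
PRIDE_ROWS: List[Row] = [{"accession": "ACC001", "peptide_sequence": "LVNELTEFAK"}]
PEDRO_ROWS: List[Row] = [{"protein_acc": "ACC001", "pep": "YLYEIAR"}]
GPM_ROWS: List[Row] = [{"label": "ACC002", "seq": "AEFVEVTK"}]


def from_pride(row: Row) -> Row:
    return {"accession": row["accession"], "peptide": row["peptide_sequence"]}


def from_pedro(row: Row) -> Row:
    return {"accession": row["protein_acc"], "peptide": row["pep"]}


def from_gpm(row: Row) -> Row:
    return {"accession": row["label"], "peptide": row["seq"]}


# One mapping per source: (source name, rows, translation to common schema).
MAPPINGS: List[Tuple[str, List[Row], Callable[[Row], Row]]] = [
    ("PRIDE", PRIDE_ROWS, from_pride),
    ("PedroDB", PEDRO_ROWS, from_pedro),
    ("gpmDB", GPM_ROWS, from_gpm),
]


def integrated_view() -> Iterator[Row]:
    """Yield every source record translated into the common schema."""
    for source, rows, to_common in MAPPINGS:
        for row in rows:
            record = to_common(row)
            record["source"] = source
            yield record


def peptides_for(accession: str) -> List[Tuple[str, str]]:
    """Query the common schema without knowing the source schemas."""
    return [
        (r["source"], r["peptide"])
        for r in integrated_view()
        if r["accession"] == accession
    ]


print(peptides_for("ACC001"))
```

The caller writes one query against `accession` and `peptide`; the per-source translation functions absorb the schema heterogeneity that the motivation slide describes.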

Conclusions
+ Available e-science technologies provide rapid prototyping facilities for bioinformatics analyses
+ Combining such technologies is possible and opens up more possibilities
  – Taverna + DQP
  – AutoMed + DQP
- Writing custom code is usually required
  – Processing service output to extract inputs for following services
  – Transforming results between data formats
  – Dealing with mismatches between identifiers
Developing a user-guided environment for the detection and resolution of mismatches
Development of Proteomics client applications (PepMapper, PepSeeker and PRIDE)
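As an illustration of the kind of custom code the conclusions refer to, the sketch below handles one simple identifier mismatch: the same entry written with different case, whitespace or version suffix. The normalisation rules, the cross-reference table and the identifiers themselves are assumptions chosen for the example, not the rules used in ISPIDER.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Trailing ".<digits>" is treated as a version suffix (an assumed convention).
_VERSION_SUFFIX = re.compile(r"\.\d+$")


def normalise(identifier: str) -> str:
    """Trim whitespace, upper-case, and drop a trailing version suffix."""
    return _VERSION_SUFFIX.sub("", identifier.strip().upper())


# Hypothetical cross-reference table between two identifier schemes.
XREF: Dict[str, str] = {"IPI00000001": "ACC001"}


def resolve(
    identifiers: Iterable[str], xref: Dict[str, str]
) -> Tuple[Dict[str, str], List[str]]:
    """Map raw identifiers to target identifiers; collect the failures."""
    resolved: Dict[str, str] = {}
    unresolved: List[str] = []
    for raw in identifiers:
        key = normalise(raw)
        if key in xref:
            resolved[raw] = xref[key]
        else:
            unresolved.append(raw)
    return resolved, unresolved


resolved, unresolved = resolve([" ipi00000001.2 ", "IPI00000002.1"], XREF)
print(resolved)
print(unresolved)
```

Returning the unresolved identifiers separately, instead of silently dropping them, leaves room for the user-guided resolution step that the slide lists as a next development.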