SysMo-DB: Supporting Data Access and Integration Carole Goble, University of Manchester UK Jacky Snoep, Uni of Manchester / Stellenbosch, S Africa Isabel Rojas, EML Research gGmbH, Germany
Goal of SysMO Eleven individual projects Different research outcomes A cross-section of microorganisms, including bacteria, archaea and yeast. Record and describe the dynamic molecular processes occurring in microorganisms in a comprehensive way Present these processes in the form of computerized mathematical models. Pool research capacities and know-how.
The crunch No one concept of experimentation or modelling No planned, shared infrastructure for pooling
SysMO-DB: Retrofit a data access, model handling and data integration platform. To support and manage the diversity of data, models and competencies. That promotes shared understanding. Using a common platform and common technologies.
SysMO-DB: Web-based solution to facilitate: exchange of data, models and processes (intra- and inter-consortia); search for data, models and processes across the initiative; maximisation of the "shelf life" and utility of the data, models and processes generated; dissemination of results.
Our experimental conditions…. Progressive and incremental Something in it for me all the way along Low hanging fruit immediately Return that matches investment Realistic Eases pressure points and concerns of the groups Lower barriers of engagement Sustainable Flexible, extensible and open
Experimental data Models Processes SysMO DB SysMO-DB Concept SysMO-HUB web interface
SysMO-DB Team University of Stellenbosch, South Africa University of Manchester, UK Prof Jacky Snoep Models EML Research gGmbH, Germany Data Metadata Prof Isabel Rojas University of Manchester, UK Processes (Workflow) Portal Infrastructure Prof Carole Goble
SysMO-DB Team University of Manchester, UK Workflow Portal Infrastructure Software Engineer Stuart Owen University of Manchester, UK Workflow Metadata Bioinformatician Katy Wolstencroft EML Research gGmbH, Germany Databases Metadata Isabel Rojas and Olga Krebs
Construct pathway model in SBML Model analysis New hypothesis Experimental validation Data analysis & integration Model update Model Building New data Simulation New data Validation Predict
Construct pathway model in SBML Model analysis New hypothesis Experimental validation Data analysis & integration Model update New data Predict
Construct pathway model in SBML Model analysis New hypothesis Experimental validation Data analysis & integration Model update Workflows New data JWS Online COPASI Workflows SysMO Data SABIO-RK External Data and Applications Predict
SysMO-Hub Portal Data Models Workflows External Resources
SysMO-Hub Portal My Stuff Data Models Workflows External Resources Private Access Controlled publication
SysMO-Hub Portal My Stuff Data Models Workflows External Resources Private Access Controlled publication Metadata SysMO-SEEK Access Control
Stitching it together Metadata on everything recommendations, MIBBI, our own controlled vocabularies that incrementally evolve Web services simple interfaces that incrementally evolve Web 2.0 style Atom feeds, blogs, wikis, mash ups, REST
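To make the "Web 2.0 style" item concrete, here is a minimal sketch of a registry announcing new assets as an Atom feed that any feed reader or mash-up could consume. It uses only the Python standard library. The function name, the asset fields, the feed title and the example.org URLs are all illustrative assumptions and are not taken from SysMO-DB.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def _tag(name):
    """Qualify an element name with the Atom namespace."""
    return "{%s}%s" % (ATOM_NS, name)


def build_atom_feed(title, feed_id, updated, author, assets):
    """Return an Atom feed as a string, with one entry per asset.

    `assets` is a list of dicts with hypothetical keys:
    'id', 'title', 'updated' (RFC 3339 timestamp) and 'link'.
    """
    ET.register_namespace("", ATOM_NS)  # serialise with a default xmlns
    feed = ET.Element(_tag("feed"))
    ET.SubElement(feed, _tag("title")).text = title
    ET.SubElement(feed, _tag("id")).text = feed_id
    ET.SubElement(feed, _tag("updated")).text = updated
    author_el = ET.SubElement(feed, _tag("author"))
    ET.SubElement(author_el, _tag("name")).text = author
    for asset in assets:
        entry = ET.SubElement(feed, _tag("entry"))
        ET.SubElement(entry, _tag("title")).text = asset["title"]
        ET.SubElement(entry, _tag("id")).text = asset["id"]
        ET.SubElement(entry, _tag("updated")).text = asset["updated"]
        ET.SubElement(entry, _tag("link"), href=asset["link"])
    return ET.tostring(feed, encoding="unicode")


# Illustrative usage with made-up values.
feed_xml = build_atom_feed(
    title="New data assets (illustrative)",
    feed_id="urn:example:assets",
    updated="2008-09-18T09:00:00Z",
    author="Example registry",
    assets=[
        {
            "id": "urn:example:assets:1",
            "title": "Glucose pulse time course",
            "updated": "2008-09-18T09:00:00Z",
            "link": "https://example.org/assets/1",
        }
    ],
)
```

A feed like this keeps the interface simple: consumers only need an HTTP GET and an XML parser, which fits the "simple interfaces that incrementally evolve" goal.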
JERM Web Service Access Interface Metadata SysMO Data Models JERM Extractor Metadata External Resources Web Service Access Interface Taverna Workflows SysMO HUB Portal (Liferay) Metadata Workflows SysMO SEEK Repositories & Resources Service Interface Integration Discovery, Access Annotation & Collaboration Results Cache myExperiment JWS Online SABIO-RK Metadata BioCatalogue Access Control
Major Technologies: Access – SysMO-Hub Portal (using Liferay). Discovery – SysMO-SEEK: SysMO Data Assets and Projects registry; search over myExperiment, JWS Online, BioCatalogue… Data Management – local solutions and recommended data-specific solutions (e.g. SABIO-RK for reaction data). Data Publishing – Just Enough Results Model Web interface. Models – publishing, management and running (using JWS Online). Integration and population – Workflows (using Taverna).
Web Access - SysMO-Hub: Customised web portal. Unified access to SysMO resources, and integrated queries across data, workflow and model catalogues and repositories. A common entry to the information created by the SysMO partners. Pre-cooked queries and processes. Umbrella for eGroupWare, OpenWetWare, wikis and other solutions. Liferay (http://www.liferay.com) portal framework.
Data Exchanges: Use existing community standards, e.g.: MIRIAM: Minimum Information Requested In the Annotation of biochemical Models; MIAME: Minimum Information About a Microarray Experiment; MIAPE: Minimum Information About a Proteomics Experiment; SBML: Systems Biology Markup Language. Definition of minimal sets for information exchange within the consortia.
Data and Metadata “Just Enough Results Model” minimum metadata for exchange Where storage solutions exist Expose through JERM Where storage solutions do not exist SABIO-RK, iChiP, Brenda, MeMo and many more JWS Online, BioModels COPASI myExperiment Ontologies, catalogues and controlled vocabularies for annotation SysMO SEEK: Registry JERM Web Service Access Interface Metadata SysMO Data JERM Extractor SABIO-RK Access Control
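The idea of a "JERM Extractor" producing minimum metadata for exchange can be sketched as follows. The slides do not list the actual JERM fields, so the required field names, the spreadsheet layout (key/value rows above a blank line, then the data) and both function names are hypothetical. This is an illustration of the approach, not the SysMO-DB implementation.

```python
import csv
import io

# Hypothetical minimum fields; the slides do not define the real JERM fields.
REQUIRED_FIELDS = ("title", "project", "creator", "organism", "data_type")


def extract_metadata(sheet_text):
    """Read 'key,value' rows from the top of a sheet, up to the first blank row."""
    record = {}
    for row in csv.reader(io.StringIO(sheet_text)):
        if not row or not any(cell.strip() for cell in row):
            break  # blank row separates the metadata header from the data
        if len(row) >= 2:
            record[row[0].strip().lower()] = row[1].strip()
    return record


def missing_fields(record):
    """Return the required fields that are absent or empty, in declaration order."""
    return [name for name in REQUIRED_FIELDS if not record.get(name, "").strip()]


# Illustrative sheet exported as CSV; all values are made up.
sheet = """title,Glucose pulse time course
project,ExampleProject
creator,A. Researcher
data_type,metabolomics

time_min,glucose_mM
0,5.0
10,3.2
"""

record = extract_metadata(sheet)
missing = missing_fields(record)
```

Here `missing` reports that `organism` was not supplied. The results stay where they are stored, and only the small metadata record is exposed for exchange.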
Discovery SysMO-SEEK Self-curated, access-controlled catalogue of assets to promote cooperation Metadata database (who has what) Progressive refinement Projects, Group, Provenance, Files It will NOT hold results. Meta catalogue Search over other catalogues BioCatalogue, myExperiment, JWS Online, BioModels Is itself a web service Incorporate in your own group ware environments and applications
SysMO SEEK Is there any group generating kinetic data? Is this data available? Who is working with which organism? What methods are being used to determine enzyme activity? Under which experimental conditions are my partners measuring glucose concentration? and many more
Models Publish, manage, run, validate JWS Online Database of curated models and a simulator Web service enabled Each SysMO project will have a separate password-protected website.
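Questions like these can be answered from a "who has what" catalogue that holds descriptions of assets rather than the results themselves. The sketch below is a minimal in-memory version with a simple sharing check. The record fields, the four sharing scopes, the project and group names and all function names are assumptions made for illustration only.

```python
from dataclasses import dataclass

# Sharing scopes from narrowest to widest (an assumed simplification).
SCOPES = ("group", "project", "sysmo", "public")


@dataclass(frozen=True)
class AssetRecord:
    title: str
    project: str
    group: str
    organism: str
    data_type: str
    shared_with: str  # one of SCOPES


@dataclass(frozen=True)
class Viewer:
    project: str
    group: str
    in_sysmo: bool


def can_see(asset, viewer):
    """Apply the asset's sharing scope to the person asking."""
    if asset.shared_with == "public":
        return True
    if asset.shared_with == "sysmo":
        return viewer.in_sysmo
    if asset.shared_with == "project":
        return viewer.project == asset.project
    return viewer.project == asset.project and viewer.group == asset.group


def who_generates(catalogue, data_type, viewer):
    """Is there any group generating this type of data (that I may know about)?"""
    return sorted(
        {(a.project, a.group) for a in catalogue
         if a.data_type == data_type and can_see(a, viewer)}
    )


def projects_by_organism(catalogue, viewer):
    """Who is working with which organism?"""
    found = {}
    for a in catalogue:
        if can_see(a, viewer):
            found.setdefault(a.organism, set()).add(a.project)
    return {organism: sorted(projects) for organism, projects in found.items()}


# Made-up catalogue entries.
catalogue = [
    AssetRecord("Hexokinase kinetics", "ProjectA", "Lab1",
                "Saccharomyces cerevisiae", "kinetic", "sysmo"),
    AssetRecord("Heat shock transcriptome", "ProjectA", "Lab2",
                "Saccharomyces cerevisiae", "transcriptomic", "project"),
    AssetRecord("Draft kinetics", "ProjectB", "Lab3",
                "Bacillus subtilis", "kinetic", "group"),
]
viewer = Viewer(project="ProjectB", group="Lab4", in_sysmo=True)

kinetic_groups = who_generates(catalogue, "kinetic", viewer)
organisms = projects_by_organism(catalogue, viewer)
```

Filtering every query through `can_see` means a group that is not yet ready to share widely still appears only to the audience it has chosen.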
Processes - Workflows Applications and services become accessible to the workflow machinery as Web services or Java applications. Data and application integration and analysis Model construction and population Repeatable and shareable plan Transparent provenance log Taverna Workflow Management System
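The two properties named above, a repeatable plan and a transparent provenance log, can be illustrated without any workflow system. The sketch below is not Taverna and does not use its API. It is a plain Python stand-in in which each step is a function and every run records a digest of each step's input and output. The step names and the sample readings are invented.

```python
import hashlib
import json


def _digest(value):
    """Short, stable fingerprint of a JSON-serialisable value."""
    payload = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def run_workflow(steps, data):
    """Run (name, function) steps in order and return (result, provenance log)."""
    provenance = []
    for name, func in steps:
        result = func(data)
        provenance.append(
            {"step": name, "input": _digest(data), "output": _digest(result)}
        )
        data = result
    return data, provenance


# Hypothetical steps: clean readings, convert micromolar to millimolar, average.
def drop_missing(values):
    return [v for v in values if v is not None]


def to_millimolar(values):
    return [v / 1000.0 for v in values]


def mean(values):
    return sum(values) / len(values)


steps = [
    ("drop_missing", drop_missing),
    ("to_millimolar", to_millimolar),
    ("mean", mean),
]
readings_micromolar = [1200.0, None, 800.0]

result, log = run_workflow(steps, readings_micromolar)
rerun_result, rerun_log = run_workflow(steps, readings_micromolar)
```

Because the plan is data (a list of named steps), it can be shared and rerun, and identical inputs give an identical log, so two runs can be compared step by step.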
Example - Manipulation of SBML models in workflows Using libSBML For data integration For constructing and annotating SBML models libSBML written in C then wrapped with a Java API
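As a rough illustration of populating an SBML model with data, the sketch below sets species initial concentrations from measured values. To stay self-contained it deliberately uses Python's standard XML library instead of libSBML, so none of the calls shown are libSBML API, and it performs no SBML validation. The toy model, the species identifiers and the measured values are invented.

```python
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level2/version4"

# A toy SBML Level 2 Version 4 document (illustrative only).
TOY_MODEL = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="toy_model">
    <listOfCompartments>
      <compartment id="cell" size="1"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="glc" name="glucose" compartment="cell" initialConcentration="0"/>
      <species id="g6p" name="glucose 6-phosphate" compartment="cell" initialConcentration="0"/>
    </listOfSpecies>
  </model>
</sbml>
"""


def populate_initial_concentrations(sbml_text, measured):
    """Copy measured concentrations onto matching species.

    `measured` maps species id -> concentration. Returns the updated
    document as a string plus the list of species ids that were changed.
    """
    root = ET.fromstring(sbml_text)
    updated = []
    for species in root.iter("{%s}species" % SBML_NS):
        species_id = species.get("id")
        if species_id in measured:
            species.set("initialConcentration", repr(float(measured[species_id])))
            updated.append(species_id)
    return ET.tostring(root, encoding="unicode"), updated


# Made-up measurement: only glucose was measured.
new_model, changed = populate_initial_concentrations(TOY_MODEL, {"glc": 5.0})
```

In a real workflow this step would more sensibly go through libSBML, which understands SBML semantics such as units and consistency checks that a generic XML edit cannot provide.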
Related Activities BioCatalogue Community and Expert Curated Catalogue of Life Science Web Services Started June 2008. Target Practice Informatic and metabolomic assessment of biological network changes and of drug-cell interactions Utopia, Taverna workflows Solutions held by SysMO partners eGroupWare, PHProjekt, Basecamp, wikis etc
Training, Consultancy, Know-how Us: Training on databases, models, workflow systems and web services, and best practice for the annotation of resources by metadata. Kick-starting, toolkits, templates You: Social networking for shared content, know-how and best practice Contribution Best of breed solutions in place already User Focus Group of PALS PhD Students and Post Docs
No. | Name | Project | Role | Location
1 | Falko Krause | TRANSLUCENT | Bioinformatics | Berlin, Germany
2 | Leif Steil | BaCell-SysMO | Experimentalist, databases | ??
3 | Walter Glaser | MOSES | Bioinformatician | Vienna, Austria
4 | Malkhey Verma | MOSES + SulfoSys | Experiment/modeller interface | Manchester, UK
5 | Femke Mensonides | MOSES + SulfoSys | Experiment/modeller interface | Vrije University, Netherlands
6 | Hanan Messiha Girgis | MOSES | Experimentalist | Manchester, UK
7 | Pawel Sierocinski | SulfoSys | Experimentalist | Wageningen, Netherlands
8 | Maria Rodrigues | KOSMOBAC | Modeller | Vigo, Spain
9 | Afsaneh Maleki-Dizaji | SUMO | Bioinformatician | Sheffield, UK
10 | John Heap | COSMIC | Experimentalist | Nottingham, UK
11 | Walid Omar | STREAM | Experimentalist | Warwick, UK
12 | Elon Correa | Valla | Modeller | Manchester, UK
13 | Renate Kania | SysMO-LAB | Database | EML, Germany
14 | Mark Musters | SysMO-LAB | Modeller | Wageningen, Netherlands
15 | Terry McGenity’s postdoc? | PSysMO | Experimentalist | Essex, UK
16 | Maksim Zakhartsev | MOSES | Experimentalist | Stuttgart, Germany
Hands On Which data do you need to exchange? i.e. What do you need, what can you give? What are the minimal exchange formats? How best to annotate your data (giving semantics to your data)? How to cross-relate different types of data (e.g. genomic, transcriptomic, proteomic, metabolomic, kinetic and modelling data)? What should be in the SysMO SEEK? What should the portal look like?
Steps so far….Questionnaire Current situation in each project Contribute to design of work packages Responses from: Project 1: BaCell-SysMO Project 2: COSMIC Project 3: SUMO Project 4: KOSMOBAC Project 6: Psysmo Project 7: Pseudomonas fluorescens Project 9: Translucent Project 10: Streptomyces coelicolor Project 11: Silicon cell model
……..Results A spectrum of resources and data management and integration expertise Each project is concerned with data, models and processes, but each partner may not do all All projects are concerned with sharing between their sites. Some are not yet ready to share with all of SysMO. Respect privacy. Governance.
Data: Storage, Access, Annotation, Discovery
Produced on site and stored in files or Excel spreadsheets. Not consolidated between group members or project members. No common database solutions. Only who produced it and when is recorded; does not conform to existing minimum metadata ‘omics standards. Google search over basic indexing.
Common format, or in a database or repository. Group members and project partners, but not the rest of SysMO or outside. Annotation of data may be free-text, but may not conform to existing standards. Google search over basic indexing and annotations.
Stored and indexed in relational databases from the consortium or in other formats. Project partners and SysMO, but not outside; some web service interface access to data resources. Minimum metadata standards. Fully searchable.
Stored and indexed in relational databases, using databases from the consortium or using other formats. Project partners, SysMO and the Systems Biology community via web services and data services; some data exported to public repositories. Minimum metadata standards. Fully searchable.
Models: Representation, Annotation, Access
Model data but no models.
Models are developed in a non-SBML format and are not converted to SBML. Annotation: none. Models are submitted to JWS Online in their native format.
Models are developed in SBML, or in another format and converted into SBML. Little or no annotation of the models using current standards such as MIRIAM. Models are submitted to JWS Online in SBML.
Models are developed in SBML. Fully annotated using MIRIAM. Models are submitted to JWS Online.
Processes
All processes are manual, with no scripted pipelines or workflows. Data is produced and stored locally; some data may also be gathered from external sources. No automation of routine processes. No reference to external resources.
Data is used for models by other groups in the same project or locally. Some of the data gathering or model population is automated by workflows. Some web service interfaces to locally generated data and tools.
Some of the data gathering or model population is mediated by workflows. Verifying simulation results against experimental data is mediated by workflows. Web service interfaces to all locally generated data and tools. Workflows are annotated and published on myExperiment for SysMO consortium members.
Silver or Gold Pilots
Project 1 BaCell-SysMO: produce datasets, use models, workflow ready.
Project 7 Pseudomonas fluorescens and Project 6 Psysmo: Pseudomonas organisms, use third-party data sets and produce their own, model ready, workflow ready.
Project 10 Streptomyces coelicolor: ‘omics and standards compliant, use third-party standard data, workflow ready, model standards skeptic but use models.
Project 3 SUMO: produce own data and own models, have their own wiki for sharing data, workflow ready and model ready, using COPASI.
MOSES (though no questionnaire): local, using models, produce their own data, similar work in Target Practice using UTOPIA and Taverna workflows already.
SulfoSYS: data solutions, eGroupWare.
Project 9 TRANSLUCENT: our first Pal! Protein-protein interaction data. PHProjekt.
SysMOLab and MeMo (though no questionnaire): wikis, SABIO-RK, etc.
Bronze Data Pilot Data storage solutions for project partners who need it Many work mainly with Excel or flat files Need data storage first to disseminate to others and start collaborating KOSMOBAC (Booth) Group
Development Approach You already have something; we will not reinvent it. Development and deployment of all components will be incremental Metadata specs SW rapid prototyping Leverage Limited -> Sophisticated Cater for different levels of readiness Customised for each project
First Steps – end October 2008
Comprehensive, up-to-date audit and list of meetings.
First cut Hub and SEEK: project areas set up and access control scheme; SysMO-SEEK of data assets and projects with interface; collection of queries/use cases for SEEK and Hub.
Data: with Gold and Silver pals, define the first cut JERM; with Bronze pal, identify storage solution; establish best practices on data annotation; prepare two or three SysMO datasets for workflow readiness.
Models and Workflows: access to JWS Online and myExperiment; seed with SysMO-specific workflows and models; identify useful workflow packs.
Engagement: project web site and wiki; build up our PALS team; visits and training timetable; JERM and SEEK workshop.
JERM and SEEK Workshop First Pals face 2 face 18-19 September 2008 EML, Heidelberg, Germany Facilitated Preparation: Audit Sweet spots & pains In Meeting: SysMO-SEEK Just Enough Results Model for Exchange
Audit
The repositories you use now and plan to use to store your experimental data: home grown; standard; public; private.
The other repositories you use now and plan to use.
The data formats you use now and plan to use.
The SOPs you have in place or plan to have.
The software you use for data management, groupware and project management, model simulation etc., e.g. Rosetta, Oracle, Matlab, R, Mathematica, eGroupware, PHProjekt, wikis.
Software you have that would be of benefit to all and that you are willing to share, e.g. Falko’s Semantic SBML tool, MCISB SBML annotation tool.
The programming and software environments you use, e.g. Java, Python, C++, Ruby on Rails, Perl.
Your local expertise available for data management, e.g. full-time bioinformatician, database manager, commercial outsourcing, none.
What facilities do you have for coping with external access? How do you export data now?
Design a systematic collection mechanism with two of the pals.
Wiki mining.
Sweet spots and Pains Confidentially……in your humble opinion…. What would be the first three low hanging fruit for your project? And what are three obstacles / barriers? Tell us about your experimentalists, modellers and bioinformaticians What doesn’t work right now? What does?
SysMO-SEEK - not the results themselves What schemas or metadata do you have for groupware, projects, SOPs and procedures that we can use as a basis for the SEEK model and for sourcing the content? How do you know who has what and what they are doing? What controlled vocabularies do you use for this, if any? Which data do you need / would like to know from others in SysMO and outside SysMO? What data would you be willing to give? What is the data release policy of your project? Availability, conditions of use, permissions, credit etc. What is the lifecycle of your data? Versioning policy?
Just Enough Results Model Exchange Which data do you need / would like to know from others? What data would you be willing to give? How do you annotate your data? How do you cross relate different types of data? What standards for data do you already use and know about?
First Steps – end March 2009
Revised Hub and SEEK: enhanced SysMO-SEEK of data assets and projects.
Data: Gold and Silver, the first JERM interface; access to a few data sets through Hub using JERM; Bronze, established a storage solution; JERM-based SysMO datasets for workflow readiness; disseminate best practices on data annotation.
Models and Workflows: models and workflows on JWS Online and myExperiment; demoed workflow using data sets through JERM interface; useful workflow packs and launch workflows from portal.
Engagement: devising next steps with PALS team; visits and training timetable.
Back up Teams PALS and DMG Data Management Group SysMO-DB Delivery Team Back up Technical Teams SysMO-Pals Funders Steering Review Governance Hands on engagement SysMO Projects