SysMo-DB: Towards “just enough” data exchange for the SysMO Consortium Carole Goble, Uni of Manchester, UK Jacky Snoep, Uni of Manchester, UK / Stellenbosch,


1 SysMo-DB: Towards “just enough” data exchange for the SysMO Consortium Carole Goble, Uni of Manchester, UK Jacky Snoep, Uni of Manchester, UK / Stellenbosch, South Africa Isabel Rojas, EML Research gGmbH, Germany

2 Pan European collaboration. Systems Biology of Microorganisms. The transition from growing to non-growing Bacillus subtilis cells Energy and Saccharomyces cerevisiae Biology of Clostridium acetobutylicum Gene interaction networks and models of cation homeostasis in Saccharomyces cerevisiae http://www.sysmo.net

3 Eleven individual projects, 91 institutes. Different research outcomes. A cross-section of microorganisms, incl. bacteria, archaea and yeast. Record and describe the dynamic molecular processes occurring in microorganisms in a comprehensive way. Present these processes in the form of computerized mathematical models. Pool research capacities and know-how. Running since April 2007, for 3-5 years. http://www.sysmo.net Projects: BaCell-SysMO, COSMIC, SUMO, KOSMOBAC, SysMO-LAB, PSYSMO, Valla, MOSES, TRANSLUCENT, STREAM, SulfoSYS

4 The Problem No one concept of experimentation or modelling No planned, shared infrastructure for pooling 

5 Own solutions: own data solutions and collaboration environments (wikis, e-Groupware, PHProjekt, BaseCamp, PLONE, Alfresco, bespoke, commercial … files and spreadsheets). Suspicion: suspicion and caution over sharing; interesting interplay between modellers, experimentalists and bioinformaticians. Data issues: many do not have data, do not follow the standards that exist, or do not know who is doing what; much of the data cannot be compared; different organisms, different strains. Resource issues: no extra resources for the consortiums; 91 institutes, 11 consortiums, some overlapping.

6 Data: four levels of storage, access, annotation and discovery
Level 1. Storage: produced on site and stored in files or Excel spreadsheets. Access: not consolidated between group members or project members; no common database solutions. Annotation: only who produced the data and when; does not conform to existing minimum metadata 'omics standards. Discovery: Google search over basic indexing.
Level 2. Storage: common format, or in a database or repository. Access: group members and project partners, but not the rest of SysMO or outside. Annotation: annotation of data, may be free-text, but may not conform to existing standards. Discovery: Google search over basic indexing and annotations.
Level 3. Storage: stored and indexed in relational databases from the consortium, or other formats. Access: project partners and SysMO but not outside; some web service interface access to data resources. Annotation: minimum metadata standards. Discovery: fully searchable.
Level 4. Storage: stored and indexed in relational databases, using databases from the consortium or other formats. Access: project partners, SysMO and the Systems Biology community via web services and data services; some data exported to public repositories. Annotation: minimum metadata standards. Discovery: fully searchable.

7 SysMO-DB. Started July 2008; 3 years, 3+3 people, 3 teams over 3 sites. Sensitively retrofit a data access, model handling and data integration platform. Support and manage the diversity of data, models and competencies. Web-based solution: exchange of data, models and processes (intra- and inter-consortia); search for data, models and processes across the initiative; dissemination of results.

8 SysMO-DB Team University of Stellenbosch, South Africa University of Manchester, UK Jacky Snoep EML Research gGmbH, Germany Isabel Rojas University of Manchester, UK Carole Goble Olga Krebs Stuart Owen Katy Wolstencroft

9 Principles… 1. A series of small victories: low-hanging fruit and early wins. 2. Realistic: ease real pressure points and concerns. 3. Don't reinvent (1): borrow, link up, spread around what the consortiums already have. 4. Don't reinvent (2): use what is already available in the open community and off the shelf. 5. Sustainable: flexible, extensible and open. 6. Migrate to standards: encourage standards adoption.

10 Modellers Minimum exchange Experimentalists Minimum exchange Bioinformaticians

11 Social Approach. Questionnaires: ranked projects Bronze, Silver, Gold and Platinum. PALS: 18 postdocs and PhD students; all three kinds of people; our design and technical collaboration team; very intense face-to-face and virtual collaboration; UK and Continental PALS chapters. Audits and sharing: methods, data, models, standards, software, schemas, spreadsheets, SOPs…

12 Experimental data Models Processes SysMO DB Technical Approach SysMO-SEEK web interface JWS Online SOPs Workflows Public Datasets Consortium Datasets Spreadsheets Assets and Yellow Pages Catalogues

13 Discovery: SysMO-SEEK. Single, web-based access point. Single sign-on access control and versioning management. Single search point over the yellow pages and assets catalogue: people, expertise, SOPs, equipment; metadata about data (spreadsheets and databases); models (JWS Online), workflows (myExperiment), public web services (BioCatalogue). Call out to external resources (e.g. PubMed). Does not hold results; holds metadata on results and links to results (pilot: COSMIC consortium). A component for SysMO groups to incorporate in their own environments and applications.
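The idea of a catalogue that holds metadata and links rather than the results themselves can be sketched as below. This is only an illustration under stated assumptions, not SysMO-SEEK's data model: the `Asset` class, its fields, the `search` function and the example.org links are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """Hypothetical catalogue entry: metadata plus a link, never the data itself."""
    title: str
    kind: str    # e.g. "data", "model", "SOP", "workflow"
    owner: str
    link: str    # where the result actually lives
    tags: set = field(default_factory=set)


def search(catalogue, term):
    """Case-insensitive match of a term against titles and whole tags."""
    term = term.lower()
    return [
        a for a in catalogue
        if term in a.title.lower() or term in {t.lower() for t in a.tags}
    ]


# Invented entries; the links are placeholders.
catalogue = [
    Asset("Glucose uptake time series", "data", "Lab A",
          "https://example.org/lab-a/glucose.xls", {"glucose", "kinetics"}),
    Asset("Glycolysis model", "model", "Lab B",
          "https://example.org/models/glycolysis", {"SBML", "glucose"}),
    Asset("Enzyme assay SOP", "SOP", "Lab A",
          "https://example.org/sops/enzyme-assay", {"enzyme activity"}),
]

hits = search(catalogue, "glucose")
print([a.title for a in hits])
```

Because each entry carries only a link, the data can stay at the producing site while still being discoverable from one search point.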

15 SysMO SEEK (20 questions) Is there any group generating kinetic data? Is this data available? Who is working with which organism? What methods are being used to determine enzyme activity? Under which experimental conditions are my partners measuring glucose concentration?
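Two of these questions can be expressed as simple queries over yellow-pages style records. The sketch below is a hedged illustration only: the record layout, the names and the two helper functions are invented, and nothing here reflects how SysMO-SEEK stores or queries its catalogue.

```python
from collections import defaultdict

# Hypothetical yellow-pages records: (person, project, organism, data types)
yellow_pages = [
    ("Person A", "Project X", "Saccharomyces cerevisiae", {"kinetic", "microarray"}),
    ("Person B", "Project Y", "Bacillus subtilis", {"growth curve"}),
    ("Person C", "Project X", "Saccharomyces cerevisiae", {"metabolomics"}),
]


def who_works_with(records):
    """'Who is working with which organism?' as organism -> sorted names."""
    by_organism = defaultdict(list)
    for person, _project, organism, _types in records:
        by_organism[organism].append(person)
    return {org: sorted(people) for org, people in by_organism.items()}


def groups_generating(records, data_type):
    """'Is there any group generating <data_type> data?' as sorted project names."""
    return sorted({project for _p, project, _o, types in records if data_type in types})


print(who_works_with(yellow_pages))
print(groups_generating(yellow_pages, "kinetic"))
```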

16 Models Database of curated models and a model simulator. Web service enabled to run from workflows. Separate password-protected websites for each project. Through SEEK: special instance of JWS Online for SysMO. Validate and run models from SysMO-SEEK and publish later. Access control as for other assets. Access to other resources (BioModels, COPASI). Semantic SBML from the TRANSLUCENT project. SBML and MIRIAM education. Publish, manage, run, validate SBML models.

17 Experimental Processes Protocols and SOPs SOPs assets deposited or linked to SOP gathering Nature Protocols format recommendation High level classification for indexing and tagging Got a few, need more.

18 Experimental Processes Protocol template: Title, Authors, Keywords, Abstract, Materials, Reagents, Reagent Set Up, Equipment, Time Taken, Procedure, Troubleshooting, Critical Steps, Anticipated Results, References.
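The fields listed on this slide could be used to check a deposited SOP for completeness. A minimal sketch follows; the field names come from the slide, but treating every field as required, the dict representation and the `missing_fields` helper are assumptions made for illustration.

```python
# Field names taken from the slide; treating all of them as required is an assumption.
SOP_FIELDS = [
    "Title", "Authors", "Keywords", "Abstract", "Materials", "Reagents",
    "Reagent Set Up", "Equipment", "Time Taken", "Procedure",
    "Troubleshooting", "Critical Steps", "Anticipated Results", "References",
]


def missing_fields(sop):
    """Return template fields that are absent or blank in a SOP record (a dict)."""
    return [f for f in SOP_FIELDS if not str(sop.get(f, "")).strip()]


# Invented draft: the whitespace-only Procedure counts as missing.
draft = {"Title": "Enzyme activity assay", "Authors": "Lab A", "Procedure": "  "}
print(missing_fields(draft))
```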

19 Experimental Processes Deposition

20 Workflow Management System Bioinformatics Processes: Workflows Automated, repeatable and shareable specification for linking and running multiple computational tasks. Transparent provenance log of execution and results. Chaining together distributed analysis tools and data sources: Annotation pipelines, data analysis pipelines, text mining, data integration, simulation sweeps SBML model construction and population Data sets and tools accessible to a workflow engine – Web Services, R scripts, BioMART, Java libraries, Grid Services, (MATLAB in beta) Free and Open Source
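The notion of chaining tasks while keeping a transparent provenance log can be illustrated in a few lines. This is a toy sketch under stated assumptions, not the API of any workflow management system: `run_workflow`, the step names and the sample readings are hypothetical.

```python
from datetime import datetime, timezone


def run_workflow(steps, data):
    """Run (name, function) steps in order, logging input and output of each."""
    provenance = []
    for name, func in steps:
        result = func(data)
        provenance.append({
            "step": name,
            "input": data,
            "output": result,
            "finished": datetime.now(timezone.utc).isoformat(),
        })
        data = result  # the output of one step feeds the next
    return data, provenance


# Hypothetical steps: drop missing optical density readings, then average them.
steps = [
    ("drop_missing", lambda values: [v for v in values if v is not None]),
    ("mean", lambda values: sum(values) / len(values)),
]

result, log = run_workflow(steps, [0.25, None, 0.75])
print(result)
print([entry["step"] for entry in log])
```

Keeping the log alongside the result is what makes a run repeatable and shareable: anyone can see which step produced which intermediate value.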

21 Manipulation of SBML models in workflows libSBML: data integration & constructing and annotating SBML models
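As a rough illustration of what populating an SBML model from data involves, the sketch below reads and updates species concentrations in a tiny hand-written SBML Level 2 Version 4 document. It deliberately uses only Python's standard XML library rather than libSBML, so it shows the shape of the task and not the libSBML API; the toy model and both helper functions are assumptions.

```python
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level2/version4"
ET.register_namespace("", SBML_NS)  # serialise without an invented prefix

# A tiny hand-written model, used only for illustration.
SBML_TEXT = f"""<sbml xmlns="{SBML_NS}" level="2" version="4">
  <model id="toy_model">
    <listOfCompartments>
      <compartment id="cell" size="1"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="glucose" compartment="cell" initialConcentration="5"/>
      <species id="g6p" compartment="cell" initialConcentration="0"/>
    </listOfSpecies>
  </model>
</sbml>
"""


def species_concentrations(sbml_text):
    """Map each species id to its initial concentration."""
    root = ET.fromstring(sbml_text)
    ns = {"sbml": SBML_NS}
    return {
        s.get("id"): float(s.get("initialConcentration"))
        for s in root.findall("./sbml:model/sbml:listOfSpecies/sbml:species", ns)
    }


def set_concentration(sbml_text, species_id, value):
    """Return new SBML text with one species' initial concentration changed."""
    root = ET.fromstring(sbml_text)
    for s in root.iter(f"{{{SBML_NS}}}species"):
        if s.get("id") == species_id:
            s.set("initialConcentration", str(value))
    return ET.tostring(root, encoding="unicode")


updated = set_concentration(SBML_TEXT, "g6p", 2.5)
print(species_concentrations(updated))
```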

22 Already in use by individual groups for Research Ramp up when more data resources become workflow accessible Libraries of SysMO workflows

23 Experimental Data Comparison and Exchange. Public data sources: model organism databases (e.g. SGD), BRENDA …. Data produced by SysMO: SABIO-RK, iChiP, MeMo …. Local databases and files: remain at the sites, with the groups retaining control. Excel spreadsheets: the most common experimental data format. [Diagram labels: SEEK repository asset; Metadata; SABIO-RK; BRENDA; myDB; mySpreadSheet]

24 Just Enough Results Model (JERM). Minimum metadata for SysMO exchange; what an experiment is. Extract metadata from datasets for the Assets catalogue (exchange). Ontologies and controlled vocabularies for annotation. Expose data results through a JERM interface (access). Access controlled by consortiums, groups and individuals. Harvesting standards, current practice and consortium schemas and spreadsheets. Inspired by MCISB Key Results initiative and SBRML [Paton]. [Diagram labels: Metadata; SABIO-RK; BRENDA; myDB; mySpreadSheet; JERM Extractor and Access Wrapper; JERM Web Service Access Interface; Access Control; SysMO SEEK]
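The two roles described here, extracting metadata for the catalogue and controlling access at consortium, group or individual level, might look roughly like this. It is a sketch under stated assumptions and not the JERM implementation: the CSV content, the metadata fields, and the three policy names are invented for illustration.

```python
import csv
import io

# Hypothetical spreadsheet exported as CSV; the columns are invented.
SHEET = """time_min,glucose_mM,OD600
0,20.0,0.10
30,18.5,0.14
60,15.2,0.22
"""


def extract_metadata(csv_text, owner, group, consortium):
    """Describe a sheet (who, which columns, how many rows) without copying its values."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    return {"owner": owner, "group": group, "consortium": consortium,
            "columns": header, "row_count": len(body)}


def can_access(user, record, policy):
    """policy is 'individual', 'group' or 'consortium' (assumed levels)."""
    if policy == "individual":
        return user["name"] == record["owner"]
    if policy == "group":
        return user["group"] == record["group"]
    if policy == "consortium":
        return user["consortium"] == record["consortium"]
    return False  # unknown policies deny by default


meta = extract_metadata(SHEET, "Person A", "Group 1", "Consortium X")
colleague = {"name": "Person B", "group": "Group 2", "consortium": "Consortium X"}
print(meta["columns"], meta["row_count"])
print(can_access(colleague, meta, "group"), can_access(colleague, meta, "consortium"))
```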

25 Data-type-specific JERM (first cut). General: what type of data is it (microarray, growth curve, enzyme activity…)? Each data type has a different "minimal model". Phase 1: microarray and metabolomics, with careful mapping to the MIBBI standards (e.g. MIAME). What was measured: gene expression, OD, metabolite concentration…. What do the values in the datasets mean: units, time series, repeats… Experiment binding: each individual results set is bound to an experiment/investigation for exchange across different types of data.
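One way to picture a per-data-type "minimal model" is as a set of required metadata fields that depends on the data type. The sketch below assumes this reading; the field names and the type-specific additions are invented for illustration and are not taken from JERM, MIBBI or MIAME.

```python
# Hypothetical minimal models: required metadata fields per data type.
COMMON = {"experiment_id", "what_was_measured", "units"}
MINIMAL_MODELS = {
    "microarray": COMMON | {"platform"},
    "growth curve": COMMON | {"time_points"},
    "enzyme activity": COMMON | {"enzyme"},
}


def validate(record):
    """Return the sorted list of required fields missing from a metadata record."""
    data_type = record.get("data_type")
    if data_type not in MINIMAL_MODELS:
        raise ValueError(f"no minimal model for data type: {data_type!r}")
    return sorted(MINIMAL_MODELS[data_type] - set(record))


record = {
    "data_type": "growth curve",
    "experiment_id": "EXP-1",      # binds the result set to an experiment
    "what_was_measured": "OD",
    "units": "OD600",
}
print(validate(record))
```

The shared `experiment_id` field stands in for the experiment binding: it is what would let results of different data types be exchanged against the same experiment.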

26 User's local file store XML SysMO Seek; Assets catalogue Corresponding JERM schema Tag Metadata of the file and Information about what is measured Controlled vocabulary plug-in Source and sink for workflows Controlled deposit in spreadsheet repository Local Spreadsheet repository

27 JERM Exchange Pilot Spring 2009 SysMO-LAB COSMIC MOSES BaCell-SysMO “20 questions”

28 [Architecture diagram. Layers: Repositories & Resources; Service Interface; Integration; Discovery, Access, Annotation & Collaboration. Components: SysMO SEEK; Yellow Pages; Assets; Access Control; Results Cache; Taverna Workflows; JERM Web Service Access Interface; JERM Extractor & Wrapper; Web Service Access Interface; Metadata; SysMO Data; Models; Workflows; External Resources; myExperiment; JWS Online; SABIO-RK; BioCatalogue]

29 Related initiatives and sources: OpenWetWare, Cold Spring Harbor Protocols, MIBBI, National Center for Biomedical Ontology, OBO Foundry, WikiPathways, Pathway Commons, StrainInfo, ONDEX, PubMed

30 Training and Know-how SysMO-DB Training on databases, models, workflow systems and web services, and best practice for the annotation of resources by metadata. Kick-starting toolkits, workflows and SOP templates Summer schools SysMO consortium (esp. PALS) Social networking for shared content, know-how and best practice Contribution Best of breed solutions in place already

31 Summary SysMO-DB is an exercise in: Sensitively retrofitting a data access, model handling and data integration platform. Supporting the diversity of data, models and competencies Social mediation and manipulation Towards Just Enough™ exchange

32 Acknowledgements SysMO-DB Team SysMO-PALS myGrid, EML and JWS Online teams OMII-UK, Uni Southampton EBI, MCISB

33 Links myExperiment: http://www.myexperiment.org Taverna: http://www.mygrid.org.uk JWS Online: http://jjj.biochem.sun.ac.za/ SABIO-RK: http://sabio.villa-bosch.de/

