SysMo-DB: Towards “just enough” data exchange for the SysMO Consortium Carole Goble, Uni of Manchester, UK Jacky Snoep, Uni of Manchester, UK / Stellenbosch,

Presentation transcript:

SysMo-DB: Towards “just enough” data exchange for the SysMO Consortium Carole Goble, Uni of Manchester, UK Jacky Snoep, Uni of Manchester, UK / Stellenbosch, South Africa Isabel Rojas, EML Research gGmbH, Germany

Pan-European collaboration. Systems Biology of Microorganisms. The transition from growing to non-growing Bacillus subtilis cells; Energy and Saccharomyces cerevisiae; Biology of Clostridium acetobutylicum; Gene interaction networks and models of cation homeostasis in Saccharomyces cerevisiae.

Eleven individual projects, 91 institutes. Different research outcomes. A cross-section of microorganisms, incl. bacteria, archaea and yeast. Record and describe the dynamic molecular processes occurring in microorganisms in a comprehensive way. Present these processes in the form of computerized mathematical models. Pool research capacities and know-how. Already running since April. Runs for 3-5 years. Projects: BaCell-SysMO, COSMIC, SUMO, KOSMOBAC, SysMO-LAB, PSYSMO, Valla, MOSES, TRANSLUCENT, STREAM, SulfoSYS.

The Problem No one concept of experimentation or modelling No planned, shared infrastructure for pooling 

Own solutions: own data solutions and collaboration environments (wikis, e-Groupware, PHProjekt, BaseCamp, PLONE, Alfresco, bespoke, commercial …), files and spreadsheets. Suspicion: suspicion and caution over sharing; interesting interplay between modellers, experimentalists and bioinformaticians. Data issues: many do not have data, or follow the standards that exist, or know who is doing what; much of the data cannot be compared (different organisms, different strains). Resource issues: no extra resources for the consortiums; 91 institutes, 11 consortiums, some overlapping.

Data: four levels, each described by Storage, Access, Annotation and Discovery.
Level 1. Storage: produced on site and stored in files or Excel spreadsheets; no common database solutions. Access: not consolidated between group members or project members. Annotation: only who produced it and when; does not conform to existing minimum metadata ‘omics standards. Discovery: Google search over basic indexing.
Level 2. Storage: common format, or in a database or repository. Access: group members and project partners, but not the rest of SysMO or outside. Annotation: annotation of data, which may be free text and may not conform to existing standards. Discovery: Google search over basic indexing and annotations.
Level 3. Storage: stored and indexed in relational databases from the consortium, or other formats. Access: project partners and SysMO, but not outside; some web service interface access to data resources. Annotation: minimum metadata standards. Discovery: fully searchable.
Level 4. Storage: stored and indexed in relational databases, using databases from the consortium or other formats. Access: project partners, SysMO and the Systems Biology community via web services and data services; some data exported to public repositories. Annotation: minimum metadata standards. Discovery: fully searchable.

SysMO-DB: started July 2008, 3 years, 3+3 people, 3 teams over 3 sites. Sensitively retrofit a data access, model handling and data integration platform. Support and manage the diversity of data, models and competencies. Web-based solution: exchange of data, models and processes (intra- and inter-consortia); search for data, models and processes across the initiative; dissemination of results.

SysMO-DB Team: Jacky Snoep (University of Stellenbosch, South Africa; University of Manchester, UK); Isabel Rojas (EML Research gGmbH, Germany); Carole Goble (University of Manchester, UK); Olga Krebs; Stuart Owen; Katy Wolstencroft.

Principles:
1. A series of small victories: low-hanging fruit and early wins.
2. Realistic: ease real pressure points and concerns.
3. Don’t reinvent (1): borrow, link up and spread around what the consortiums already have.
4. Don’t reinvent (2): use what is already available in the open community and off the shelf.
5. Sustainable: flexible, extensible and open.
6. Migrate to standards: encourage standards adoption.

[Diagram: minimum exchange between modellers, experimentalists and bioinformaticians.]

Social Approach. Questionnaires. Ranked projects: Bronze, Silver, Gold and Platinum. PALS: 18 postdocs and PhD students; all three kinds of people; our design and technical collaboration team; very intense face-to-face and virtual collaboration; UK and Continental PALS Chapters. Audits and sharing: methods, data, models, standards, software, schemas, spreadsheets, SOPs…

SysMO-DB Technical Approach: experimental data, models and processes. [Diagram labels: SysMO-SEEK web interface; JWS Online; SOPs; workflows; public datasets; consortium datasets; spreadsheets; Assets and Yellow Pages catalogues.]

Discovery: SysMO-SEEK. Single, web-based access point. Single sign-on, access control and versioning management. Single search point over the yellow pages and assets catalogue: people, expertise, SOPs, equipment; metadata about data (spreadsheets and databases); models (JWS Online), workflows (myExperiment), public web services (BioCatalogue). Calls out to external resources (e.g. PubMed). Does not hold results; holds metadata on results and links to results (pilot: COSMIC consortium). A component for SysMO groups to incorporate in their own environments and applications.

SysMO SEEK (20 questions): Is there any group generating kinetic data? Is this data available? Who is working with which organism? What methods are being used to determine enzyme activity? Under which experimental conditions are my partners working for the measurement of glucose concentration?
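Questions of this kind can in principle be answered from catalogue metadata alone, without holding any results. A minimal sketch, assuming a hypothetical in-memory yellow-pages list; the field names (`person`, `project`, `organism`, `data_types`) and the entries are placeholders invented for illustration and are not the SysMO-SEEK schema:

```python
from collections import defaultdict

# Hypothetical yellow-pages entries; people, projects and fields are placeholders.
YELLOW_PAGES = [
    {"person": "Researcher A", "project": "Project X",
     "organism": "Saccharomyces cerevisiae", "data_types": ["enzyme kinetics"]},
    {"person": "Researcher B", "project": "Project Y",
     "organism": "Bacillus subtilis", "data_types": ["microarray"]},
    {"person": "Researcher C", "project": "Project X",
     "organism": "Saccharomyces cerevisiae", "data_types": ["growth curve"]},
]


def who_works_with(entries):
    """Group people by organism: 'Who is working with which organism?'"""
    by_organism = defaultdict(list)
    for entry in entries:
        by_organism[entry["organism"]].append(entry["person"])
    return dict(by_organism)


def groups_generating(entries, data_type):
    """Return the projects that list a given data type, sorted and de-duplicated."""
    return sorted({e["project"] for e in entries if data_type in e["data_types"]})


if __name__ == "__main__":
    print(who_works_with(YELLOW_PAGES))
    print(groups_generating(YELLOW_PAGES, "enzyme kinetics"))
```

The point of the sketch is that both questions are lookups over descriptions of people and assets, which is why a catalogue of metadata and links can be useful before any data are pooled.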

Models: JWS Online. Database of curated models and a model simulator. Web service enabled, to run from workflows. Separate password-protected websites for each project. Through SEEK: special instance of JWS Online for SysMO. Validate and run models from SysMO-SEEK and publish later. Access control as for other assets. Access to other resources (BioModels, COPASI). Semantic SBML from the TRANSLUCENT project. SBML and MIRIAM education. Publish, manage, run and validate SBML models.

Experimental Processes: Protocols and SOPs. SOP assets deposited or linked to. SOP gathering. Nature Protocols format recommended. High-level classification for indexing and tagging. Got a few, need more.

Experimental Processes. Protocol: Title, Authors, Keywords, Abstract, Materials, Reagents, Reagent Set Up, Equipment, Time Taken, Procedure, Troubleshooting, Critical Steps, Anticipated Results, References.

Experimental Processes Deposition

Workflow Management System. Bioinformatics Processes: Workflows. Automated, repeatable and shareable specification for linking and running multiple computational tasks. Transparent provenance log of execution and results. Chaining together distributed analysis tools and data sources: annotation pipelines, data analysis pipelines, text mining, data integration, simulation sweeps, SBML model construction and population. Data sets and tools accessible to a workflow engine: Web Services, R scripts, BioMART, Java libraries, Grid Services (MATLAB in beta). Free and open source.
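The idea of a workflow as a chain of tasks with a provenance log can be sketched as follows. This is a plain-Python illustration, not the API of any workflow system; the step functions and the log format are assumptions made for the example:

```python
from typing import Any, Callable, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]


def run_workflow(steps: List[Step], data: Any):
    """Run steps in order, passing each output to the next step.

    Returns the final result and a provenance log that records, for
    every step, its name, its input and its output.
    """
    provenance = []
    for name, func in steps:
        result = func(data)
        provenance.append({"step": name, "input": data, "output": result})
        data = result
    return data, provenance


# Hypothetical steps standing in for distributed analysis tools.
def parse_values(text: str) -> List[float]:
    """Turn a comma-separated string into a list of floats."""
    return [float(v) for v in text.split(",")]


def mean(values: List[float]) -> float:
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


if __name__ == "__main__":
    final, log = run_workflow([("parse", parse_values), ("mean", mean)], "1.0,2.0,3.0")
    print(final)
    for record in log:
        print(record)
```

Because the specification is just an ordered list of named steps, it can be rerun or shared, and the log makes it possible to see which input produced which result.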

Manipulation of SBML models in workflows. libSBML: data integration; constructing and annotating SBML models.
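To show what populating an SBML model with data involves at the file level, here is a minimal sketch that reads and updates a tiny SBML Level 2 Version 4 document with Python's standard-library XML parser. It deliberately does not use libSBML, and the model content (ids `toy_model`, `glc`, `g6p`) is made up for illustration:

```python
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level2/version4"
NS = {"sbml": SBML_NS}
SPECIES_PATH = "./sbml:model/sbml:listOfSpecies/sbml:species"

# A made-up, minimal SBML document used only for this example.
SBML_TEXT = f"""<sbml xmlns="{SBML_NS}" level="2" version="4">
  <model id="toy_model">
    <listOfCompartments>
      <compartment id="cell" size="1"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="glc" name="glucose" compartment="cell" initialConcentration="5"/>
      <species id="g6p" name="glucose 6-phosphate" compartment="cell" initialConcentration="0"/>
    </listOfSpecies>
  </model>
</sbml>
"""


def list_species(sbml_text):
    """Return (id, name, initial concentration) for every species in the model."""
    root = ET.fromstring(sbml_text)
    return [
        (s.get("id"), s.get("name"), float(s.get("initialConcentration", "nan")))
        for s in root.findall(SPECIES_PATH, NS)
    ]


def set_initial_concentration(sbml_text, species_id, value):
    """Return a new SBML string with one species' initial concentration replaced."""
    root = ET.fromstring(sbml_text)
    for species in root.findall(SPECIES_PATH, NS):
        if species.get("id") == species_id:
            species.set("initialConcentration", str(value))
            break
    else:
        raise KeyError(species_id)
    # Serialise with SBML as the default namespace instead of an ns0: prefix.
    ET.register_namespace("", SBML_NS)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    updated = set_initial_concentration(SBML_TEXT, "g6p", 0.25)
    print(list_species(updated))
```

A real workflow step would more likely go through libSBML, which also validates the model; the sketch only shows the shape of the operation.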

Already in use by individual groups for research. Ramp up when more data resources become workflow-accessible. Libraries of SysMO workflows.

Experimental Data Comparison and Exchange. Public data sources: model organism databases (e.g. SGD), BRENDA, … Data produced by SysMO: SABIO-RK, iChiP, MeMo, … Local databases and files: remain at the sites, with control retained in the groups. Excel spreadsheets: the most common form of experimental data format. [Diagram labels: SEEK repository; asset metadata; SABIO-RK; BRENDA; myDB; mySpreadSheet.]

Just Enough Results Model (JERM). Minimum metadata for SysMO exchange; what an experiment is. Extract metadata from datasets for the Assets catalogue (exchange). Ontologies and controlled vocabularies for annotation. Expose data results through a JERM interface (access). Access controlled by consortiums, groups and individuals. Harvesting standards, current practice and consortium schemas and spreadsheets. Inspired by the MCISB Key Results initiative and SBRML [Paton]. [Diagram labels: SysMO SEEK; Access Control; JERM Web Service Access Interface; JERM Extractor and Access Wrapper; Metadata; SABIO-RK; BRENDA; myDB; mySpreadSheet.]

Data-Type-Specific JERM: First Cut. General: what type of data is it (microarray, growth curve, enzyme activity…)? Each data type has a different “minimal model”. Phase 1: microarray and metabolomics. Careful mapping to the MIBBI standards (e.g. MIAME). What was measured: gene expression, OD, metabolite concentration… What do the values in the datasets mean: units, time series, repeats… Experiment binding: each individual results set is bound to an experiment/investigation for exchange across different types of data.
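One way to picture a data-type-specific minimal model is as a set of required fields per data type, checked before a dataset is offered for exchange. A minimal sketch; the field names below are assumptions chosen for illustration and are not the actual JERM schema:

```python
# Hypothetical required fields: general ones shared by all data types...
GENERAL_FIELDS = {"data_type", "experiment_id", "what_was_measured", "units"}

# ...plus extra ones that differ per data type (illustrative only).
TYPE_SPECIFIC_FIELDS = {
    "microarray": {"platform", "replicates"},
    "growth curve": {"time_unit"},
    "enzyme activity": {"enzyme", "temperature"},
}


def missing_fields(metadata):
    """Return the sorted names of required fields that are absent or empty.

    Raises ValueError if the record names a data type with no minimal model.
    """
    data_type = metadata.get("data_type")
    if data_type is not None and data_type not in TYPE_SPECIFIC_FIELDS:
        raise ValueError(f"no minimal model defined for data type {data_type!r}")
    required = GENERAL_FIELDS | TYPE_SPECIFIC_FIELDS.get(data_type, set())
    return sorted(name for name in required if not metadata.get(name))


if __name__ == "__main__":
    record = {
        "data_type": "growth curve",
        "experiment_id": "exp-1",  # binds the results set to an experiment
        "what_was_measured": "OD",
        "units": "dimensionless",
    }
    print(missing_fields(record))
```

Keeping the general fields separate from the per-type fields mirrors the split on the slide between what every dataset must say and what only a given data type needs.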

[Diagram labels: user's local file store; XML; SysMO-SEEK assets catalogue; corresponding JERM schema; tag metadata of the file and information about what is measured; controlled vocabulary plug-in; source and sink for workflows; controlled deposit in spreadsheet repository; local spreadsheet repository.]
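Tagging a local spreadsheet with file metadata and what was measured, then emitting that as XML for a catalogue, could look roughly like the sketch below. It assumes the spreadsheet has been exported as CSV with a header row, and the element names (`dataset`, `file`, `rows`, `measured`) are invented for the example rather than taken from a JERM schema:

```python
import csv
import io
import xml.etree.ElementTree as ET


def extract_metadata(csv_text, file_name):
    """Build a small XML metadata record from CSV text.

    The record holds the file name, the number of data rows and one
    <measured> element per column header. The data values themselves
    are not copied, so the results stay with the owner.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    n_rows = sum(1 for _ in reader)

    root = ET.Element("dataset")
    ET.SubElement(root, "file").text = file_name
    ET.SubElement(root, "rows").text = str(n_rows)
    for column in header:
        ET.SubElement(root, "measured").text = column.strip()
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sheet = "time_h,OD\n0,0.1\n1,0.2\n"
    print(extract_metadata(sheet, "growth.csv"))
```

Only the description leaves the local file store in this sketch, which matches the idea of a catalogue that holds metadata and links rather than the results.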

JERM Exchange Pilot, Spring 2009: SysMO-LAB, COSMIC, MOSES, BaCell-SysMO. “20 questions”.

[Architecture diagram. Layers: Discovery, Access, Annotation & Collaboration; Integration; Service Interface; Repositories & Resources. Labels: SysMO SEEK; Yellow Pages; Assets; Access Control; Results Cache; JERM Web Service Access Interface; JERM Extractor & Wrapper; Web Service Access Interface; Taverna workflows; myExperiment; JWS Online; SABIO-RK; BioCatalogue; external resources; SysMO data; models; workflows; metadata.]

Related initiatives and sources: OpenWetWare, Cold Spring Harbor Protocols, MIBBI, National Centre for BioOntologies, OBO Foundry, WikiPathways, Pathway Commons, StrainInfo, ONDEX, PubMed.

Training and Know-how. SysMO-DB: training on databases, models, workflow systems and web services, and best practice for the annotation of resources by metadata; kick-starting toolkits, workflows and SOP templates; summer schools. SysMO consortium (esp. PALS): social networking for shared content, know-how and best practice; contribution; best-of-breed solutions already in place.

Summary: SysMO-DB is an exercise in sensitively retrofitting a data access, model handling and data integration platform; supporting the diversity of data, models and competencies; and social mediation and manipulation. Towards Just Enough™ exchange.

Acknowledgements: SysMO-DB Team; SysMO-PALS; myGrid, EML and JWS Online teams; OMII-UK, Uni Southampton; EBI; MCISB.

Links myExperiment: Taverna: JWS Online: SABIO-RK