SysMO-DB: Towards “just enough” data exchange for the SysMO Consortium Carole Goble, Uni of Manchester, UK Jacky Snoep, Uni of Manchester, UK / Stellenbosch,

Presentation transcript:

SysMO-DB: Towards “just enough” data exchange for the SysMO Consortium. Carole Goble, Uni of Manchester, UK; Jacky Snoep, Uni of Manchester, UK / Stellenbosch, South Africa; Isabel Rojas, EML Research gGmbH, Germany. 2nd Evaluation Conference, May 2009, Vienna, Austria

SysMO-DB: Started July 2008, running for 3 years. 3 staff + 3 investigators, 3 teams over 3 sites. Sensitively retrofit a data access, model handling and data integration platform. Support and manage the diversity of data, models and competencies. Web-based solution for: exchange of data, models and processes (intra- and inter-consortia); search for data, models and processes across the initiative; dissemination of results.

SysMO-DB Team University of Stellenbosch, South Africa University of Manchester, UK Jacky Snoep EML Research gGmbH, Germany Isabel Rojas University of Manchester, UK Olga Krebs Wolfgang Müller Sergejs Aleksejevs Carole Goble Stuart Owen Katy Wolstencroft

Own solutions: own data solutions and collaboration environments (wikis, e-Groupware, PHProjekt, BaseCamp, PLONE, Alfresco, bespoke commercial …), files and spreadsheets. Suspicion: suspicion and caution over sharing; interesting interplay between modellers, experimentalists and bioinformaticians. Data issues: many do not have data, or follow the standards that exist, or know who is doing what; much of the data cannot be compared (different organisms, different strains). Resource issues: no extra resources for the consortiums; 91 institutes, 11 consortiums, some overlapping.

Principles… A series of small victories. Realistic. Don't reinvent. Sustainable and extensible. Migrate to standards. Provide instant gratification. Address doubt and anxiety. Build it rather than write about it.

Another view on the goal File Management systems Plone, Alfresco, PHProjekt, eGroupWare, Wikis Specialist databases that you make your own: BASE, maxD, myExperiment Specialist public databases you have a bit of: SABIO, JWS Online, myExperiment Specialist public databases BRENDA, PDB, BioModels, WikiPathways, KEGG, UniProt, GenBank, SGD, PubMed Project Public Reference Data Sets Community Supported Data Sets Pile of spread sheets on my hard drive Personal SysMO

Some numbers & some consequences. 1 Software Engineer, 1 Bioinformatician, 1 Bio-database specialist. 11 projects, 91 institutes. 20 person days/year/project; 2.5 person days/year/institute. → "just in case" approach impossible → Focus on real needs → "just in time", "just enough" → The right 20% → Help people help themselves → Communication! Rule: 80% of the features won't be used anyway. Useful features
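
The two per-unit figures on the slide above can be cross-checked with a few lines of arithmetic. The sketch below only multiplies the numbers quoted on the slide; reading both as descriptions of the same yearly support budget is an assumption, not something the slide states.

```python
# Figures quoted on the slide.
projects = 11
institutes = 91
days_per_project = 20       # person days / year / project
days_per_institute = 2.5    # person days / year / institute

# Total yearly support effort implied by each figure (person days).
total_from_projects = projects * days_per_project
total_from_institutes = institutes * days_per_institute

print(total_from_projects)     # 220
print(total_from_institutes)   # 227.5
```

Both routes land at roughly 220 to 230 person days per year, which is why a "just in case" feature set is out of reach.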

Social Approach Questionnaires PALS 19 Postdocs and PhD students All three kinds of people Our design and technical collaboration team Very intense face to face and virtual collaboration UK and Continental PALS Chapters Audits and Sharing Methods, data, models, standards, software, schemas, spreadsheets, SOPs…..

Communication via PALs. DB team, PALS, Projects. Show what is there. Suggest what is possible. Ask for requirements. Give requirements. Tell priorities. Rate outcomes. Suggest improvements. Double check. Transmit. Disseminate. Collect answers.

SysMO-DB  PALs Meeting statistics 10 months 2 PAL all hands meetings 2 PAL chapter meetings 9 visits to 6 SysMO projects Numerous Skype chats, mails, telcons Impact on development? See later in talk

“We need a way of collecting, structuring and sharing Standard Operating Procedures” “Excel spreadsheets are our most common way of collecting and processing data” “I need a kind of “yellow pages” that tells me who is in what project and what they are working on”

Modellers Exchange Experimentalists Exchange Bioinformaticians

Spreadsheet Repository SBML Models Repository SOP Repository Workflow Repository Consortium Data Models Processes Sops and Workflows SysMO Approach SysMO-SEEK web portal interface JWS Online Assets Catalogue Yellow Pages Search SysMO DB JERM Public data SBML Nature Protocols Workflow Management System

Discovery SysMO-SEEK Single, web based, access point Access control & Versioning management Yellow pages (“who is who”) People, Expertise, Equipment Assets catalogue (“who has what”) SOPs, Spreadsheets, pre-published models Metadata about Data held by projects Access to other repositories Models (JWS Online), Workflows (myExperiment), Public web services (BioCatalogue) Call out to external resources e.g. PubMed Does not hold results. Holds metadata on results and links to results A component for SysMO groups to incorporate in their own environments and applications
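
The "holds metadata on results and links to results" idea above can be illustrated with a small catalogue record. This is a minimal sketch, not the SEEK data model: the `AssetRecord` class, its field names, the example owners and the `example.org` locations are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AssetRecord:
    """Hypothetical catalogue entry: metadata plus a link, never the results."""
    title: str
    asset_type: str      # e.g. "SOP", "Spreadsheet", "Model"
    project: str
    owner: str
    location: str        # where the asset lives at the host project site
    keywords: List[str] = field(default_factory=list)


def search(catalogue, term):
    """Return records whose title or keywords mention the term (case-insensitive)."""
    term = term.lower()
    return [
        r for r in catalogue
        if term in r.title.lower() or any(term in k.lower() for k in r.keywords)
    ]


# Made-up entries for illustration only.
catalogue = [
    AssetRecord("Glucose pulse time course", "Spreadsheet", "MOSES", "A. Researcher",
                "https://example.org/moses/glucose-pulse.xls",
                ["glucose", "metabolomics"]),
    AssetRecord("Enzyme activity assay", "SOP", "COSMIC", "B. Researcher",
                "https://example.org/cosmic/sop-12.pdf",
                ["enzyme activity"]),
]

hits = search(catalogue, "glucose")
for r in hits:
    print(r.project, r.title, r.location)
```

The record carries enough to answer "who has what" and where to fetch it, while the data itself stays at the project site.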

Demo

Is there any group generating kinetic data? Is this data available? Who is working with which organism? What methods are being used to determine enzyme activity? Under which experimental conditions are my partners measuring glucose concentration?

Finding and Exchanging Project Data “Just Enough” Exchange

Data Comparison and Exchange Public data sources model organism databases – (e.g. SGD) BRENDA …. Data produced by SysMO SABIO-RK, iChiP, MeMo …. Local databases & Files Excel Spreadsheets The most common form of experimental data format. Proteomics Metadata Metabolomics Microarray Proteomics Single Cell Data

COSMIC and BaCell (Alfresco, document management system)

SysMO LAB Spreadsheet [spreadsheet screenshot; columns: Experiment, measurement number, Glucose, Ethanol, Acetate, Lactate, Formiate, Succinate, Pyruvate, Acetoin, 2,3-Butanediol; units: mM; cell values not recoverable] Our Extra Work!!

Challenge Aim: Maintain the independence of the projects Data registered in the SEEK Assets Catalogue Data remains at the host project site Data pulled from host project site on request 1. Need to map to a common metadata model for each data type (microarray, metabolomic…) so data can be found, understood and compared. Just Enough Results Models (JERM) 2. Need to create software that interfaces with the different existing project data management setups (Alfresco, eGroupWare, MediaWiki, BASE, Excel…) JERM Adapters and Extractors

JERM: Just Enough Results Model. Way to "wrap" data sources to match our agreed common data model for each data type. Minimum information needed to exchange data of each type. Sources: Databases, Content management Systems, Excel Spreadsheets, Data File Store. JERM: Extract, Export, Import. Data types: Proteomics, Metadata, Metabolomics, Microarray, Proteomics, Single Cell Data
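
The "wrap a data source to match a common model" step above can be sketched as a column mapping applied to a project spreadsheet. This is an illustration under stated assumptions: the common field names, the `COLUMN_MAP` table and the sample values are invented, and a CSV string stands in for an Excel sheet.

```python
import csv
import io

# Hypothetical per-project mapping from local column headers
# to the field names of an agreed common model.
COLUMN_MAP = {
    "Experiment": "experiment",
    "measurement number": "measurement_number",
    "Glucose": "glucose_mM",
    "Ethanol": "ethanol_mM",
}


def extract(rows, column_map):
    """Rename mapped columns and drop anything the common model does not cover."""
    return [
        {column_map[key]: value for key, value in row.items() if key in column_map}
        for row in rows
    ]


# Made-up sample sheet; "Lab notes" is a local column with no common equivalent.
sheet = io.StringIO(
    "Experiment,measurement number,Glucose,Ethanol,Lab notes\n"
    "E1,1,1.0,0.5,checked\n"
    "E1,2,0.8,0.7,\n"
)

records = extract(csv.DictReader(sheet), COLUMN_MAP)
print(records[0])
```

Only the mapping table is project-specific; the extraction function stays the same for every source that can be read as rows.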

What is Metadata? Information, additional to the raw/processed data itself. What a potential user of the data would need to know to be able to make full and accurate use of the data in a subsequent scientific analysis. Machine readable descriptions of Data, Models, Services, Resources, Applications [COSMIC]

Minimum Information Initiatives (MIBBI: Minimum Information for Biological and Biomedical Investigations)
CIMR: Core Information for Metabolomics Reporting
MIABE: Minimal Information About a Bioactive Entity
MIACA: Minimal Information About a Cellular Assay
MIAME: Minimum Information About a Microarray Experiment
MIAME/Env: MIAME / Environmental transcriptomic experiment
MIAME/Nutr: MIAME / Nutrigenomics
MIAME/Plant: MIAME / Plant transcriptomics
MIAME/Tox: MIAME / Toxicogenomics
MIAPA: Minimum Information About a Phylogenetic Analysis
MIAPAR: Minimum Information About a Protein Affinity Reagent
MIAPE: Minimum Information About a Proteomics Experiment
MIARE: Minimum Information About a RNAi Experiment
MIASE: Minimum Information About a Simulation Experiment
MIENS: Minimum Information about an ENvironmental Sequence
MIFlowCyt: Minimum Information for a Flow Cytometry Experiment
MIGen: Minimum Information about a Genotyping Experiment
MIGS: Minimum Information about a Genome Sequence
MIMIx: Minimum Information about a Molecular Interaction Experiment
MIMPP: Minimal Information for Mouse Phenotyping Procedures
MINI: Minimum Information about a Neuroscience Investigation
MINIMESS: Minimal Metagenome Sequence Analysis Standard
MINSEQE: Minimum Information about a high-throughput SeQuencing Experiment
MIPFE: Minimal Information for Protein Functional Evaluation
MIQAS: Minimal Information for QTLs and Association Studies
MIqPCR: Minimum Information about a quantitative Polymerase Chain Reaction experiment
MIRIAM: Minimal Information Required In the Annotation of biochemical Models
MISFISHIE: Minimum Information Specification For In Situ Hybridization and Immunohistochemistry Experiments
STRENDA: Standards for Reporting Enzymology Data
TBC: Tox Biology Checklist
BioPAX: Biological Pathways Exchange
FuGE: Functional Genomics Experiment
MGED: Microarray Experimental Conditions

Just Enough Results Model Inspired by MCISB Key Results initiative and SBRML [Paton et al] Harvested standards Analysed current practice and consortium schemas and spreadsheets Designing the corresponding JERMs Mapping data sources of the projects to JERMs.

What does it cover?

Experimental Data Metadata People Projects Assay Study Experimental conditions Factors studied Models SOPs Homogenised terminology and values in the datasets themselves Workflows ISA-TAB compliant Investigation Where is it used?

Minimum metadata for SysMO exchange. What an experiment is. Find: Extract metadata from datasets for the Assets catalogue. Access: Expose data results through a JERM interface. Access controlled by consortiums, groups and individuals. Just Enough Results Model Metadata SABIO-RK BRENDA myDB mySpreadSheet JERM Web Service Access Interface Access Control JERM Extractor and Access Wrapper Layer JERM Template Source Access and Harvester Source Extractor

COSMIC Alfresco BaCell-SysMO Alfresco MOSES Wiki SysMOLab Wiki SABIO-RK Public Resources SABIO-RK Spread sheets Spread sheets Spread sheets Spread sheets BASE

COSMIC BaCell- SysMO SysMOLab MOSES Alfresco Wiki ANOTHER A DATA STORE

In Practice for Spreadsheets: Native, JERM Template, JERMed

Now: Register, Extract, Matched to the JERM, Adding metadata, browse, search. Whole record

Near future: Register, Extract, Matched to the JERM, Adding metadata here, browse, search. Whole record, Filtered record, Enriched record

Future: Register, Extract, Matched to the JERM, Adding metadata here, browse, search. Collections of Records + Meta-analysis

JERM Source Extractor Generator: New spreadsheets adopt JERM template. Legacy spreadsheet JERM mapper. Databases have JERM mapper. Spreadsheet Ontology Annotator: Restrict the values that a range of fields can have. Just Enough Results Model Tools Metadata SABIO-RK BRENDA myDB mySpreadSheet JERM Web Service Access Interface Access Control JERM Extractor and Access Wrapper Layer JERM Template Source Access and Harvester Source Extractor
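
Restricting the values that a range of fields can have, as described for the Spreadsheet Ontology Annotator above, amounts to checking cells against a controlled vocabulary. The sketch below assumes a simple per-field set of allowed terms; the `VOCAB` contents and field names are hypothetical, and a real annotator would draw its terms from an ontology.

```python
# Hypothetical controlled vocabularies, one allowed-value set per field.
VOCAB = {
    "organism": {"Saccharomyces cerevisiae", "Bacillus subtilis"},
    "unit": {"mM", "uM"},
}


def check_row(row, vocab):
    """Return (field, value) pairs whose value is outside the allowed set."""
    return [
        (name, row[name])
        for name in vocab
        if name in row and row[name] not in vocab[name]
    ]


# Made-up rows: the first uses a free-text organism name, the second is clean.
bad = check_row({"organism": "yeast", "unit": "mM", "value": "1.0"}, VOCAB)
good = check_row({"organism": "Bacillus subtilis", "unit": "mM"}, VOCAB)

print(bad)
print(good)
```

Fields without a vocabulary, such as `value` here, are left alone, so the check can be introduced one column at a time.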

Models

Model JWS Online - database of curated models and a model simulator. ToBiN – platform for storage and analysis of genome scale metabolic networks (PSYSMO) Biomodels - database of curated models (EMBL-EBI) Copasi – Complex Pathway Simulator (Mendes et al) Pre-publication SEEK store Semantic SBML (TRANSLUCENT); SBRML (MCISB) More After the Demo!

Processes

Experimental Processes Protocol Title Authors Keywords Abstract Materials Reagents Reagent Set Up Equipment Time Taken Procedure Troubleshooting Critical Steps Anticipated Results References Protocols and SOPs SOPs assets deposited or linked to SOP gathering Nature Protocols format recommendation High level classification for indexing and tagging Got a few, need more.

Workflow Management System Bioinformatics Processes: Workflows Data preparation, annotation and analysis pipelines SBML model construction and population Linking together Data sets, Web Services, R scripts, BioMART, Java libraries, Grid Services, (MATLAB in beta) Free and Open Source

Data integration: workflows for model parameterisation and validation. Building models using workflows Manipulation of SBML models in workflows LibSBML: data integration & constructing and annotating SBML models [Li et al]

Ramp up when more data resources become workflow accessible Libraries of SysMO workflows Spreadsheet Smart.

Microarray Analysis, SBML Model manipulation, Pathway Analysis, Chemical structure analysis, Protein structure analysis, Kinetic data, Excel Spreadsheet handling, Controlled vocabulary look-ups

User's local file store XML SysMO-SEEK; Assets catalogue Corresponding JERM schema Tag Metadata of the file and Information about what is measured Controlled vocabulary plug-in Source and sink for workflows Controlled deposit in spreadsheet repository Local Spreadsheet repository

Now… Demo!!!!!! Everyone contributed But obviously we only have time for a few examples

Models: JWS Online model interface. SysMO models interface at JWS Online. SBML upload and web services. JWS update, new interface (to be released soon), SBGN schemas

JWS Online SysMO home ~/sysmo

MOSES models selection

MOSES models

JWS Online interface MOSES model link to localhost /sysmo

SBML model upload

JWS Online access via web services ~/axis/services/QueryJWS?wsdl {getRates, getAllModels, getAllBiomodels, getAllBiomodelsIds, getModelsByOrganism, getModelsByCategory, getModelInfo, getNmat, getKmat, getLmat, getSteadyStateTable, getTimecourse, getJacob, getEigenv, getCmat, getEmat, getRateEquations, getRateEquationFormulae, getExtVar, getExternalMetabValues, getInitMetabValues, getParamValues, hasFunction}
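
A WSDL-described service like the one above is called by posting a SOAP envelope that names one of the listed operations. The sketch below only builds such an envelope with the Python standard library and sends nothing. The service namespace `urn:example:QueryJWS` and the `organism` parameter name are placeholders, and SOAP 1.1 is assumed; the real values would have to be read from the WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:QueryJWS"                     # placeholder, take the real one from the WSDL


def build_request(operation, **params):
    """Build a SOAP 1.1 request envelope for one operation as an XML string."""
    ET.register_namespace("soapenv", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        # Parameter names are assumptions; the WSDL defines the real ones.
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# getModelsByOrganism is one of the operations listed on the slide.
request_xml = build_request("getModelsByOrganism", organism="yeast")
print(request_xml)
```

In practice a SOAP client library that reads the WSDL directly would generate these envelopes, so hand-building them is mainly useful for seeing what goes over the wire.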

JWS Online new interface (α)

Spreadsheet Repository SBML Models Repository SOP Repository Workflow Repository Consortium Data Models Processes Sops and Workflows What we have done.... SysMO-SEEK web portal interface JWS Online Assets Catalogue Yellow Pages Search SysMO DB JERM Public data Standards SBML Nature Protocols Workflow Management System

What we have done.... Setup of DB and PALS communication infrastructure SysMO-SEEK yellow pages First prototypes of JERM JWS-online repository with rights Set up myExperiment repository, deposition of useful workflows Advise on adoption of Model DB Minimum metadata standards Data solutions SOP repository setup, description standard Disseminate & promote

Training, Know-how and Dissemination SysMO-DB Training Kick-start toolkits, workflows and SOP templates SysMO consortium (esp. PALS) Social networking for shared content, know-how and best practice Contribution and Best of breed solutions in place Outside consortium 6 presentations 2 tutorials More in the pipeline

SABIO-RK User Meeting June 15-16, 2009 Heidelberg, Germany Costs supported by SysMO

Deviations from initial plan Merge HUB and SEEK into SEEK software reuse SBML models repository authorisation and rights New, requested functionality Yellow pages in SEEK SOP support in SEEK JERM for MS-Excel Much more work, but worth it

Future: more, more, more! Extend and stabilize software. More JERM → more data in SEEK: more JERM extractors, data, search possibilities. More Models: more data into JWS, integrate more tools to SysMO-SEEK. More SOPs. More Workflows: facilitate workflow-ready solutions, data collection/analysis workflow, Workflow player in SEEK. More semantics: closed vocabularies, ontologies. More training.

Timetable
SEEK Launch: June 2009
JERM Phase 1 demo: July 2009
Workflow with JWS-Online and SABIO-RK: July 2009
JERM model stabilised: Sept 2009
Spreadsheet tools: Nov 2009
Model comparison: Nov 2009
SEEK controlled vocabularies: Feb 2010
JERM tooling: Feb 2010
MIRIAM comparison: Mar 2010
Workflow authoring and harvesting: Mar 2010
Workflow Player in SEEK: June 2010
Training and Outreach: ongoing

How to get there Update SEEK and Share data Do not need to share full content tell people about existence of data; help people avoid duplicate work; find contacts After publication data ready for sharing with the scientific world SysMO-DB will sign a NDA where needed Retaining data at sites comes with responsibility Reliability - Sites available continuously and promptly Support - Must be proof against virus attacks, etc. Archiving - Beyond the lifetime of the project

Talk to your PAL Right requirements Right software Steer the project Lots of work under the hood Make sure your PAL has a voice in your project. Look at our wiki Thanks!

Acknowledgements SysMO-DB Team SysMO-PALS myGrid, EML and JWS Online teams OMII-UK, Uni Southampton EMBL-EBI, MCISB

Thank you! Questions?