SysMO-DB: Towards “just enough” data exchange for the SysMO Consortium Carole Goble, Uni of Manchester, UK Jacky Snoep, Uni of Manchester, UK / Stellenbosch,

1 SysMO-DB: Towards “just enough” data exchange for the SysMO Consortium. Carole Goble, Uni of Manchester, UK; Jacky Snoep, Uni of Manchester, UK / Stellenbosch, South Africa; Isabel Rojas, EML Research gGmbH, Germany. 2nd Evaluation Conference, 19-20 May 2009, Vienna, Austria

2 SysMO-DB. Started July 2008; 3 years; 3 staff + 3 investigators; 3 teams over 3 sites. Sensitively retrofit a data access, model handling and data integration platform. Support and manage the diversity of data, models and competencies. Web-based solution for: exchange of data, models and processes (intra- and inter-consortia); search for data, models and processes across the initiative; dissemination of results.

3 SysMO-DB Team. Institutions: University of Stellenbosch, South Africa; University of Manchester, UK; EML Research gGmbH, Germany. People: Jacky Snoep, Isabel Rojas, Olga Krebs, Wolfgang Müller, Sergejs Aleksejevs, Carole Goble, Stuart Owen, Katy Wolstencroft.

4 Own solutions, suspicion, data issues, resource issues. Own solutions: own data solutions and collaboration environments (wikis, eGroupWare, PHProjekt, BaseCamp, Plone, Alfresco, bespoke commercial …), files and spreadsheets. Suspicion: suspicion and caution over sharing; an interesting interplay between modellers, experimentalists and bioinformaticians. Data issues: many do not have data, do not follow the standards that exist, or do not know who is doing what; much of the data cannot be compared (different organisms, different strains). Resource issues: no extra resources for the consortia; 91 institutes, 11 consortia, some overlapping.

5 Principles… A series of small victories. Realistic. Don't reinvent. Sustainable and extensible. Migrate to standards. Provide instant gratification. Address doubt and anxiety. Build it rather than write about it.

6 Another view on the goal. File management systems: Plone, Alfresco, PHProjekt, eGroupWare, wikis. Specialist databases that you make your own: BASE, maxD, myExperiment. Specialist public databases you have a bit of: SABIO, JWS Online, myExperiment. Specialist public databases: BRENDA, PDB, BioModels, WikiPathways, KEGG, UniProt, GenBank, SGD, PubMed. Diagram layers: project; public reference data sets; community supported data sets; pile of spreadsheets on my hard drive; personal; SysMO.

7 Some numbers and some consequences. Staff: 1 software engineer, 1 bioinformatician, 1 bio-database specialist. Scope: 11 projects, 91 institutes. Effort: 20 person-days/year/project; 2.5 person-days/year/institute. Consequences: a “just in case” approach is impossible; focus on real needs (“just in time”, “just enough”); the right 20%; help people help themselves; communication! 80-20 rule: 80% of the features won't be used anyway, so build the useful ones.
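The per-institute figure can be checked against the per-project budget. A quick back-of-envelope sketch, assuming the 20 person-days apply to each of the 11 projects and are spread evenly over all 91 institutes (the even split is our assumption; the slide does not say how the effort is divided):

```python
# Back-of-envelope check of the SysMO-DB support budget (slide 7).
# Assumption: the per-project budget is shared evenly across all institutes.
PROJECTS = 11
INSTITUTES = 91
DAYS_PER_PROJECT_PER_YEAR = 20


def days_per_institute(projects: int, institutes: int, days_per_project: float) -> float:
    """Total yearly person-days divided evenly over the institutes."""
    total_days = projects * days_per_project
    return total_days / institutes


total = PROJECTS * DAYS_PER_PROJECT_PER_YEAR
per_institute = days_per_institute(PROJECTS, INSTITUTES, DAYS_PER_PROJECT_PER_YEAR)
print(f"{total} person-days/year in total")          # 220 person-days/year in total
print(f"{per_institute:.1f} person-days/year/institute")  # 2.4 person-days/year/institute
```

Under that assumption the budget is 220 person-days a year overall and about 2.4 person-days per institute, close to the rounded 2.5 on the slide.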

8 Social approach. Questionnaires. PALS: 19 postdocs and PhD students; all three kinds of people (modellers, experimentalists, bioinformaticians); our design and technical collaboration team; very intense face-to-face and virtual collaboration; UK and Continental PALS chapters. Audits and sharing: methods, data, models, standards, software, schemas, spreadsheets, SOPs…

9 Communication via PALs between the DB team, PALS and projects: show what is there; suggest what is possible; ask for requirements; give requirements; tell priorities; rate outcomes; suggest improvements; double check; transmit; disseminate; collect answers.

10 SysMO-DB and PALs meeting statistics (10 months): 2 PAL all-hands meetings; 2 PAL chapter meetings; 9 visits to 6 SysMO projects; numerous Skype chats, emails and telcons. Impact on development? See later in the talk.

11 “We need a way of collecting, structuring and sharing Standard Operating Procedures.” “Excel spreadsheets are our most common way of collecting and processing data.” “I need a kind of ‘yellow pages’ that tells me who is in what project and what they are working on.”

12 Exchange between modellers, experimentalists and bioinformaticians.

13 SysMO approach (diagram). SysMO-SEEK web portal interface: assets catalogue, yellow pages, search. Consortium data, models and processes (SOPs and workflows). Repositories: spreadsheet repository, SBML models repository, SOP repository, workflow repository. SysMO-DB components: JWS Online, JERM, public data, SBML, Nature Protocols, workflow management system.

14 Discovery: SysMO-SEEK. A single, web-based access point with access control and version management. Yellow pages (“who is who”): people, expertise, equipment. Assets catalogue (“who has what”): SOPs, spreadsheets, pre-publication models; metadata about data held by projects. Access to other repositories: models (JWS Online), workflows (myExperiment), public web services (BioCatalogue). Calls out to external resources, e.g. PubMed. Does not hold results; holds metadata on results and links to results. A component for SysMO groups to incorporate into their own environments and applications.
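The key design point here is that SEEK holds metadata and links, not the results themselves. A minimal sketch of such a catalogue entry and a “who has what” lookup; the class names, fields, project name and URL are hypothetical illustrations, not SEEK's actual schema:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AssetRecord:
    """Hypothetical catalogue entry: describes an asset, never stores its content."""
    title: str
    asset_type: str   # e.g. "SOP", "spreadsheet", "model"
    project: str      # owning project
    location: str     # link to where the data actually lives
    metadata: Dict[str, str] = field(default_factory=dict)


class AssetsCatalogue:
    """Toy 'who has what' index over AssetRecord entries."""

    def __init__(self) -> None:
        self._records: List[AssetRecord] = []

    def register(self, record: AssetRecord) -> None:
        self._records.append(record)

    def find(self, **criteria: str) -> List[AssetRecord]:
        """Return records whose attributes (or metadata keys) match all criteria."""
        def matches(r: AssetRecord) -> bool:
            for key, value in criteria.items():
                actual = getattr(r, key, None)
                if actual is None:
                    actual = r.metadata.get(key)
                if actual != value:
                    return False
            return True
        return [r for r in self._records if matches(r)]


catalogue = AssetsCatalogue()
catalogue.register(AssetRecord(
    title="Glucose measurement SOP",
    asset_type="SOP",
    project="ExampleProject",                      # hypothetical project
    location="https://example.org/sops/glucose",   # hypothetical link
    metadata={"organism": "Bacillus subtilis"},
))
hits = catalogue.find(asset_type="SOP", organism="Bacillus subtilis")
print([h.location for h in hits])
```

A search returns links; fetching the content itself is left to the project site that owns it.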

15 Demo

16

17 Is there any group generating kinetic data? Is this data available? Who is working with which organism? What methods are being used to determine enzyme activity? Under which experimental conditions are my partners measuring glucose concentration?
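Questions such as “who is working with which organism?” are what the yellow pages are meant to answer. A sketch of that lookup over an in-memory list of profiles; the `Person` fields, names and projects are hypothetical and only illustrate inverting profiles into an organism index:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Person:
    """Hypothetical yellow-pages profile."""
    name: str
    project: str
    organisms: frozenset
    expertise: frozenset


def people_by_organism(people: List[Person]) -> Dict[str, Set[str]]:
    """Invert the profiles into organism -> names of people working on it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for p in people:
        for organism in p.organisms:
            index[organism].add(p.name)
    return dict(index)


def who_has_expertise(people: List[Person], skill: str) -> List[str]:
    """Names of people listing the given skill, sorted alphabetically."""
    return sorted(p.name for p in people if skill in p.expertise)


people = [
    Person("Alice", "ProjectA", frozenset({"B. subtilis"}), frozenset({"kinetic assays"})),
    Person("Bob", "ProjectB", frozenset({"B. subtilis", "E. coli"}), frozenset({"modelling"})),
    Person("Carol", "ProjectB", frozenset({"E. coli"}), frozenset({"kinetic assays", "proteomics"})),
]
by_org = people_by_organism(people)
print(sorted(by_org["E. coli"]))                    # ['Bob', 'Carol']
print(who_has_expertise(people, "kinetic assays"))  # ['Alice', 'Carol']
```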

18 Finding and exchanging project data: “just enough” exchange

19 Data comparison and exchange. Public data sources: model organism databases (e.g. SGD), BRENDA, …. Data produced by SysMO: SABIO-RK, iChiP, MeMo, …. Local databases and files. Excel spreadsheets: the most common experimental data format. Data types: proteomics, metabolomics, microarray, single cell data, plus metadata.

20 COSMIC and BaCell (Alfresco, document management system)

21 SysMO LAB spreadsheet (example). Columns: experiment measurement number; concentrations (mM) of glucose, ethanol, acetate, lactate, formate, succinate, pyruvate, acetoin and 2,3-butanediol. First row (measurement 1): 13.57, 0, 16.61, 11.57, 0, 0, 0, 3.06, 0. Our extra work!!

22 Challenge. Aim: maintain the independence of the projects. Data are registered in the SEEK assets catalogue; data remain at the host project site; data are pulled from the host project site on request. 1. We need to map to a common metadata model for each data type (microarray, metabolomic…) so that data can be found, understood and compared: Just Enough Results Models (JERM). 2. We need to create software that interfaces with the different existing project data management setups (Alfresco, eGroupWare, MediaWiki, BASE, Excel…): JERM adapters and extractors.
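The two needs on this slide suggest an adapter layer: one common interface, one adapter per data management setup, and data fetched only when someone asks for it. A sketch under those assumptions; `SourceAdapter`, `LazyCatalogue` and the two toy adapters are hypothetical stand-ins, not the actual JERM adapter API:

```python
import csv
import io
from abc import ABC, abstractmethod
from typing import Dict, List


class SourceAdapter(ABC):
    """Hypothetical common interface over one project's data store."""

    @abstractmethod
    def list_ids(self) -> List[str]:
        """Identifiers of the datasets available at the host site."""

    @abstractmethod
    def fetch(self, dataset_id: str) -> Dict[str, str]:
        """Pull one dataset from the host site."""


class InMemoryWikiAdapter(SourceAdapter):
    """Stand-in for a wiki-based store; a real adapter would query the wiki."""

    def __init__(self, pages: Dict[str, Dict[str, str]]) -> None:
        self._pages = pages

    def list_ids(self) -> List[str]:
        return sorted(self._pages)

    def fetch(self, dataset_id: str) -> Dict[str, str]:
        return dict(self._pages[dataset_id])


class CsvTextAdapter(SourceAdapter):
    """Stand-in for spreadsheet exports: each dataset is CSV text; returns the first row."""

    def __init__(self, files: Dict[str, str]) -> None:
        self._files = files

    def list_ids(self) -> List[str]:
        return sorted(self._files)

    def fetch(self, dataset_id: str) -> Dict[str, str]:
        reader = csv.DictReader(io.StringIO(self._files[dataset_id]))
        return dict(next(reader))


class LazyCatalogue:
    """Registers dataset ids only; content stays at the source until requested."""

    def __init__(self) -> None:
        self._registry: Dict[str, SourceAdapter] = {}
        self.fetch_count = 0  # how many times data was actually pulled

    def register_source(self, adapter: SourceAdapter) -> None:
        for dataset_id in adapter.list_ids():
            self._registry[dataset_id] = adapter

    def get(self, dataset_id: str) -> Dict[str, str]:
        self.fetch_count += 1
        return self._registry[dataset_id].fetch(dataset_id)


catalogue = LazyCatalogue()
catalogue.register_source(InMemoryWikiAdapter({"wiki-exp-1": {"organism": "E. coli"}}))
catalogue.register_source(CsvTextAdapter({"sheet-exp-2": "metabolite,conc_mM\nglucose,10\n"}))
print(catalogue.fetch_count)         # 0: registration pulls no data
print(catalogue.get("sheet-exp-2"))  # {'metabolite': 'glucose', 'conc_mM': '10'}
```

Adding support for another setup then means writing one more adapter, while the catalogue code stays unchanged.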

23 JERM: Just Enough Results Model. A way to “wrap” data sources to match our agreed common data model for each data type: the minimum information needed to exchange data of each type. Sources: databases, content management systems, Excel spreadsheets, data file stores; JERM extract, export, import. Data types: proteomics, metabolomics, microarray, single cell data, plus metadata.
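Because a JERM is the minimum information needed to exchange one data type, one way to picture it is a required-field template per data type plus a check that a wrapped record satisfies it. The field lists below are invented for illustration and are not the agreed JERM definitions:

```python
from typing import Dict, List, Set

# Hypothetical minimum-information templates, one per data type.
JERM_TEMPLATES: Dict[str, Set[str]] = {
    "metabolomics": {"organism", "strain", "metabolite", "concentration", "unit"},
    "microarray": {"organism", "strain", "platform", "raw_data_link"},
}


def missing_fields(data_type: str, record: Dict[str, str]) -> List[str]:
    """Required fields that are absent or empty in the record, sorted."""
    required = JERM_TEMPLATES[data_type]
    return sorted(f for f in required if not record.get(f))


def is_exchangeable(data_type: str, record: Dict[str, str]) -> bool:
    """True when the record carries the minimum information for its type."""
    return not missing_fields(data_type, record)


record = {"organism": "E. coli", "strain": "K-12", "metabolite": "glucose",
          "concentration": "10", "unit": ""}
print(missing_fields("metabolomics", record))   # ['unit']
print(is_exchangeable("metabolomics", record))  # False
```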

24 What is metadata? Information additional to the raw/processed data itself: what a potential user of the data would need to know to make full and accurate use of the data in a subsequent scientific analysis. Machine-readable descriptions of data, models, services, resources and applications [COSMIC]

25 Minimum Information Initiatives (MIBBI: Minimum Information for Biological and Biomedical Investigations, http://www.mibbi.org/index.php/MIBBI_portal): CIMR (Core Information for Metabolomics Reporting); MIABE (Minimal Information About a Bioactive Entity); MIACA (Minimal Information About a Cellular Assay); MIAME (Minimum Information About a Microarray Experiment); MIAME/Env (MIAME / Environmental transcriptomic experiment); MIAME/Nutr (MIAME / Nutrigenomics); MIAME/Plant (MIAME / Plant transcriptomics); MIAME/Tox (MIAME / Toxicogenomics); MIAPA (Minimum Information About a Phylogenetic Analysis); MIAPAR (Minimum Information About a Protein Affinity Reagent); MIAPE (Minimum Information About a Proteomics Experiment); MIARE (Minimum Information About an RNAi Experiment); MIASE (Minimum Information About a Simulation Experiment); MIENS (Minimum Information about an ENvironmental Sequence); MIFlowCyt (Minimum Information for a Flow Cytometry Experiment); MIGen (Minimum Information about a Genotyping Experiment); MIGS (Minimum Information about a Genome Sequence); MIMIx (Minimum Information about a Molecular Interaction Experiment); MIMPP (Minimal Information for Mouse Phenotyping Procedures); MINI (Minimum Information about a Neuroscience Investigation); MINIMESS (Minimal Metagenome Sequence Analysis Standard); MINSEQE (Minimum Information about a high-throughput SeQuencing Experiment); MIPFE (Minimal Information for Protein Functional Evaluation); MIQAS (Minimal Information for QTLs and Association Studies); MIqPCR (Minimum Information about a quantitative Polymerase Chain Reaction experiment); MIRIAM (Minimal Information Required In the Annotation of biochemical Models); MISFISHIE (Minimum Information Specification For In Situ Hybridization and Immunohistochemistry Experiments); STRENDA (Standards for Reporting Enzymology Data); TBC (Tox Biology Checklist). Related: BioPAX, Biological Pathways Exchange (http://www.biopax.org/); FuGE, Functional Genomics Experiment; MGED: Microarray Experimental Conditions.

26 Just Enough Results Model. Inspired by the MCISB Key Results initiative and SBRML [Paton et al]. Harvested standards; analysed current practice and consortium schemas and spreadsheets; designing the corresponding JERMs; mapping the projects' data sources to JERMs.

27 What does it cover?

28 Experimental data and metadata: people; projects; investigation, study and assay (ISA-TAB compliant); experimental conditions; factors studied; models; SOPs; workflows; homogenised terminology and values in the datasets themselves. Where is it used?
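The ISA-TAB layering named on this slide (investigation, study, assay) can be pictured as nested records, with conditions and factors on the study and SOPs and data links on each assay. The classes, fields and example values below are a hypothetical sketch, not the JERM or ISA-TAB schema itself:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Assay:
    """One measurement run; links to its data rather than embedding it."""
    assay_type: str   # e.g. "metabolomics"
    sop: str          # SOP used for the measurement
    data_link: str


@dataclass
class Study:
    title: str
    factors_studied: List[str]
    conditions: Dict[str, str]
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    project: str
    people: List[str]
    studies: List[Study] = field(default_factory=list)

    def assays_of_type(self, assay_type: str) -> List[Assay]:
        """Flatten the hierarchy and keep assays of one type."""
        return [a for s in self.studies for a in s.assays if a.assay_type == assay_type]


inv = Investigation("Glucose uptake", "ExampleProject", ["Alice"])
study = Study("Temperature shift", ["temperature"], {"medium": "M9"})
study.assays.append(Assay("metabolomics", "SOP-glucose-01", "https://example.org/data/1"))
study.assays.append(Assay("microarray", "SOP-rna-02", "https://example.org/data/2"))
inv.studies.append(study)
print(len(inv.assays_of_type("metabolomics")))  # 1
```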

29 Just Enough Results Model: minimum metadata for SysMO exchange; what an experiment is. Find: extract metadata from datasets for the assets catalogue. Access: expose data results through a JERM interface; access controlled by consortia, groups and individuals. Diagram layers: sources (SABIO-RK, BRENDA, myDB, mySpreadsheet); source access and harvester; source extractor; JERM template; JERM extractor and access wrapper layer; access control; JERM web service access interface; metadata.
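Access is controlled at three levels: consortia, groups and individuals. A minimal sketch of how such grants might be combined for one dataset; the policy model (any single matching grant allows access) and all names are assumptions made for illustration:

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass(frozen=True)
class User:
    """Hypothetical user with group memberships and a consortium."""
    name: str
    groups: frozenset
    consortium: str


@dataclass
class AccessPolicy:
    """Hypothetical per-dataset grants at three levels."""
    consortia: Set[str] = field(default_factory=set)
    groups: Set[str] = field(default_factory=set)
    individuals: Set[str] = field(default_factory=set)

    def allows(self, user: User) -> bool:
        # In this sketch any one matching grant is enough.
        return (
            user.name in self.individuals
            or bool(self.groups & user.groups)
            or user.consortium in self.consortia
        )


policy = AccessPolicy(groups={"GroupA"}, individuals={"dave"})
alice = User("alice", frozenset({"GroupA"}), "SysMO")
bob = User("bob", frozenset({"GroupB"}), "SysMO")
dave = User("dave", frozenset(), "Other")
print([u.name for u in (alice, bob, dave) if policy.allows(u)])  # ['alice', 'dave']
```

Widening a dataset from a group to the whole consortium is then a matter of adding one consortium grant.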

30 COSMIC: Alfresco. BaCell-SysMO: Alfresco. MOSES: wiki. SysMOLab: wiki. SABIO-RK. Public resources: SABIO-RK. Spreadsheets (several). BASE.

31 COSMIC, BaCell-SysMO, SysMOLab, MOSES; Alfresco, wiki; another data store.

32 In practice for spreadsheets: native spreadsheet; JERM template; JERMed spreadsheet.

33 Now: register; extract; match to the JERM; add metadata; browse and search. Whole record.

34 Near future: register; extract; match to the JERM; add metadata here; browse and search. Whole record, filtered record, enriched record.

35 Future: register; extract; match to the JERM; add metadata here; browse and search. Collections of records + meta-analysis.

36 Just Enough Results Model tools. JERM source extractor generator: new spreadsheets adopt the JERM template; legacy spreadsheets get a JERM mapper; databases have a JERM mapper. Spreadsheet ontology annotator: restricts the values that a range of fields can have. Diagram layers: sources (SABIO-RK, BRENDA, myDB, mySpreadsheet); source access and harvester; source extractor; JERM template; JERM extractor and access wrapper layer; access control; JERM web service access interface; metadata.
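A legacy-spreadsheet JERM mapper and the annotator's value restriction can both be sketched with the Python standard library. The column mapping, the vocabulary and the semicolon-separated export with decimal commas (as in the slide 21 example values) are assumptions for illustration, not the behaviour of the SysMO-DB tools:

```python
import csv
import io
from typing import Dict, List

# Hypothetical mapping from one lab's legacy column headers to JERM field names.
HEADER_MAP: Dict[str, str] = {
    "Exp no.": "experiment_id",
    "Compound": "metabolite",
    "Conc [mM]": "concentration_mM",
}
# Hypothetical controlled vocabulary restricting the values of one field.
ALLOWED_METABOLITES = {"glucose", "ethanol", "acetate", "lactate"}


def map_legacy_rows(text: str, delimiter: str = ";") -> List[Dict[str, object]]:
    """Rename columns to JERM fields, parse decimal commas, enforce the vocabulary."""
    rows: List[Dict[str, object]] = []
    for raw in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        row = {HEADER_MAP[k]: v.strip() for k, v in raw.items() if k in HEADER_MAP}
        metabolite = row["metabolite"].lower()
        if metabolite not in ALLOWED_METABOLITES:
            raise ValueError(f"{metabolite!r} is not in the controlled vocabulary")
        rows.append({
            "experiment_id": row["experiment_id"],
            "metabolite": metabolite,
            # Decimal comma (e.g. "13,57") becomes a decimal point before parsing.
            "concentration_mM": float(row["concentration_mM"].replace(",", ".")),
        })
    return rows


legacy = "Exp no.;Compound;Conc [mM]\n1;Glucose;13,57\n1;Acetate;16,61\n"
print(map_legacy_rows(legacy))
```

Rejecting unknown values at mapping time mirrors the idea of restricting what a field may contain, so terminology stays homogenised before data reach the catalogue.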

37 Models

38 Models. JWS Online: database of curated models and a model simulator. ToBiN: platform for storage and analysis of genome-scale metabolic networks (PSYSMO). BioModels: database of curated models (EMBL-EBI). COPASI: Complex Pathway Simulator (Mendes et al). Pre-publication SEEK store. Semantic SBML (TRANSLUCENT); SBRML (MCISB). More after the demo!

39 Processes

40 Experimental processes: protocols and SOPs. Protocol format: title, authors, keywords, abstract, materials, reagents, reagent set-up, equipment, time taken, procedure, troubleshooting, critical steps, anticipated results, references. SOP assets deposited or linked; SOP gathering; Nature Protocols format recommendation; high-level classification for indexing and tagging. Got a few, need more.
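The protocol headings on this slide map naturally onto a record type with a completeness check before deposit. In this sketch the field names mirror the slide's headings; which sections count as required is our assumption, as is the check itself:

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class Protocol:
    """SOP record whose fields follow the protocol headings on the slide."""
    title: str = ""
    authors: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    abstract: str = ""
    materials: str = ""
    reagents: str = ""
    reagent_setup: str = ""
    equipment: str = ""
    time_taken: str = ""
    procedure: str = ""
    troubleshooting: str = ""
    critical_steps: str = ""
    anticipated_results: str = ""
    references: List[str] = field(default_factory=list)


# Assumption: these sections must be filled before deposit; the rest are optional.
REQUIRED = ("title", "authors", "procedure", "equipment", "reagents")


def empty_required_sections(p: Protocol) -> List[str]:
    """Required sections that are still empty, in REQUIRED order."""
    return [name for name in REQUIRED if not getattr(p, name)]


def section_names() -> List[str]:
    return [f.name for f in fields(Protocol)]


sop = Protocol(title="Glucose assay", authors=["A. Author"], procedure="Mix sample with reagent")
print(empty_required_sections(sop))  # ['equipment', 'reagents']
print(len(section_names()))          # 14
```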

41 Experimental processes: protocols and SOPs. SOP assets deposited or linked; SOP gathering; Nature Protocols format recommendation; high-level classification for indexing and tagging. Got a few, need more. http://www.molmeth.org http://openwetware.org

42 Workflow management system. Bioinformatics processes: workflows. Data preparation, annotation and analysis pipelines; SBML model construction and population. Links together data sets, web services, R scripts, BioMart, Java libraries and grid services (MATLAB in beta). Free and open source.

43 Data integration: workflows for model parameterisation and validation. Building models using workflows; manipulation of SBML models in workflows. libSBML: data integration and constructing and annotating SBML models [Li et al]
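As an illustration of programmatic SBML construction, the sketch below builds a minimal SBML Level 2 Version 4 document and reads it back. The slide names libSBML for this job; we swap in Python's standard `xml.etree.ElementTree` only so the sketch runs without extra dependencies, and the toy model content (one compartment, two species, one reaction without a kinetic law) is invented:

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

SBML_NS = "http://www.sbml.org/sbml/level2/version4"
ET.register_namespace("", SBML_NS)  # serialise SBML as the default namespace


def _q(tag: str) -> str:
    """Qualify a tag name with the SBML namespace."""
    return f"{{{SBML_NS}}}{tag}"


def build_sbml(model_id: str, species: Dict[str, float],
               reactions: List[Tuple[str, str, str]]) -> str:
    """Return SBML text with one compartment, the given species and reactions.

    species maps species id -> initial concentration;
    reactions are (reaction id, reactant id, product id) triples.
    """
    sbml = ET.Element(_q("sbml"), level="2", version="4")
    model = ET.SubElement(sbml, _q("model"), id=model_id)
    comps = ET.SubElement(model, _q("listOfCompartments"))
    ET.SubElement(comps, _q("compartment"), id="cell", size="1")
    los = ET.SubElement(model, _q("listOfSpecies"))
    for sid, conc in species.items():
        ET.SubElement(los, _q("species"), id=sid, compartment="cell",
                      initialConcentration=str(conc))
    lor = ET.SubElement(model, _q("listOfReactions"))
    for rid, reactant, product in reactions:
        r = ET.SubElement(lor, _q("reaction"), id=rid, reversible="false")
        reactants = ET.SubElement(r, _q("listOfReactants"))
        ET.SubElement(reactants, _q("speciesReference"), species=reactant)
        products = ET.SubElement(r, _q("listOfProducts"))
        ET.SubElement(products, _q("speciesReference"), species=product)
    return ET.tostring(sbml, encoding="unicode")


xml_text = build_sbml("toy_model", {"glc": 10.0, "pyr": 0.0}, [("r1", "glc", "pyr")])
root = ET.fromstring(xml_text)
ns = {"s": SBML_NS}
print(len(root.findall("s:model/s:listOfSpecies/s:species", ns)))  # 2
```

In a real pipeline libSBML would also validate the document, which this stand-in does not attempt.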

44 Ramp up when more data resources become workflow-accessible. Libraries of SysMO workflows. Spreadsheet smart.

45 Microarray analysis; SBML model manipulation; pathway analysis; chemical structure analysis; protein structure analysis; kinetic data; Excel spreadsheet handling; controlled vocabulary look-ups. http://myexperiment.org

46 User's local file store (XML); SysMO-SEEK assets catalogue; corresponding JERM schema. Tags: metadata of the file and information about what is measured. Controlled vocabulary plug-in. Source and sink for workflows. Controlled deposit in the spreadsheet repository. Local spreadsheet repository.

47 Now… demo! Everyone contributed, but obviously we only have time for a few examples.

48 Models. JWS Online model interface: http://jjj.mib.ac.uk, http://jjj.bio.vu.nl, http://jjj.biochem.sun.ac.za. SysMO models interface at JWS Online; SBML upload and web services; JWS update: new interface (to be released soon), SBGN schemas.

49 JWS Online SysMO home ~/sysmo

50 MOSES models selection

51 MOSES models

52 JWS Online interface MOSES model link to localhost /sysmo

53 SBML model upload

54 JWS Online access via web services ~/axis/services/QueryJWS?wsdl {getRates, getAllModels, getAllBiomodels, getAllBiomodelsIds, getModelsByOrganism, getModelsByCategory, getModelInfo, getNmat, getKmat, getLmat, getSteadyStateTable, getTimecourse, getJacob, getEigenv, getCmat, getEmat, getRateEquations, getRateEquationFormulae, getExtVar, getExternalMetabValues, getInitMetabValues, getParamValues, hasFunction}
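The operation names above come from the QueryJWS WSDL. A client sketch, assuming the third-party `zeep` SOAP library and a base URL you supply: the slide abbreviates the host as `~`, so `BASE_URL` below is a placeholder, not a real endpoint. The slide gives no argument lists, so only `getAllModels` is wrapped, and even treating it as argument-free is an assumption:

```python
from typing import List

# Operation names as listed on the slide.
QUERY_JWS_OPERATIONS: List[str] = [
    "getRates", "getAllModels", "getAllBiomodels", "getAllBiomodelsIds",
    "getModelsByOrganism", "getModelsByCategory", "getModelInfo", "getNmat",
    "getKmat", "getLmat", "getSteadyStateTable", "getTimecourse", "getJacob",
    "getEigenv", "getCmat", "getEmat", "getRateEquations",
    "getRateEquationFormulae", "getExtVar", "getExternalMetabValues",
    "getInitMetabValues", "getParamValues", "hasFunction",
]


def wsdl_url(base_url: str) -> str:
    """Append the WSDL path from the slide to a user-supplied base URL."""
    return base_url.rstrip("/") + "/axis/services/QueryJWS?wsdl"


def list_all_models(base_url: str):
    """Call getAllModels via zeep; assumes the operation takes no arguments."""
    from zeep import Client  # third-party dependency, imported lazily
    client = Client(wsdl_url(base_url))
    return client.service.getAllModels()


if __name__ == "__main__":
    BASE_URL = "http://jws.example.org"  # placeholder: substitute the real JWS Online host
    print(wsdl_url(BASE_URL))
```

Inspecting the WSDL itself is the reliable way to learn each operation's parameters before calling the others.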

55 JWS Online new interface (α)

56 What we have done… (diagram). SysMO-SEEK web portal interface: assets catalogue, yellow pages, search. Consortium data, models and processes (SOPs and workflows). Repositories: spreadsheet repository, SBML models repository, SOP repository, workflow repository. SysMO-DB components: JWS Online, JERM, public data, workflow management system. Standards: SBML, Nature Protocols.

57 What we have done… Set up the DB and PALS communication infrastructure; SysMO-SEEK yellow pages; first prototypes of JERM; JWS Online repository with rights; set up the myExperiment repository and deposited useful workflows; advice on adoption of model DB, minimum metadata standards and data solutions; SOP repository set-up and description standard; dissemination and promotion.

58 Training, know-how and dissemination. SysMO-DB training: kick-start toolkits, workflows and SOP templates. SysMO consortium (esp. PALS): social networking for shared content, know-how and best practice; contributions and best-of-breed solutions in place. Outside the consortium: 6 presentations, 2 tutorials, more in the pipeline.

59 SABIO-RK User Meeting, June 15-16, 2009, Heidelberg, Germany. Costs supported by SysMO. http://projects.eml.org/sdbv/events/SABIORK_UserMeeting/index.html

60 Deviations from the initial plan. Merged HUB and SEEK into SEEK (software reuse). SBML models repository: authorisation and rights. New, requested functionality: yellow pages in SEEK; SOP support in SEEK; JERM for MS Excel. Much more work, but worth it.

61 Future: more, more, more! Extend and stabilize the software. More JERM means more data in SEEK: more JERM extractors, data and search possibilities. More models: more data into JWS; integrate more tools into SysMO-SEEK. More SOPs. More workflows: facilitate workflow-ready solutions, data collection/analysis workflows, a workflow player in SEEK. More semantics: closed vocabularies, ontologies. More training.

62 Timetable: SEEK launch, June 2009; JERM phase 1 demo, July 2009; workflow with JWS Online and SABIO-RK, July 2009; JERM model stabilised, Sept 2009; spreadsheet tools, Nov 2009; model comparison, Nov 2009; SEEK controlled vocabularies, Feb 2010; JERM tooling, Feb 2010; MIRIAM comparison, Mar 2010; workflow authoring and harvesting, Mar 2010; workflow player in SEEK, June 2010; training and outreach, ongoing.

63 How to get there. Update SEEK and share data. You do not need to share the full content: tell people about the existence of data, help people avoid duplicate work, find contacts. After publication, data are ready for sharing with the scientific world. SysMO-DB will sign an NDA where needed. Retaining data at sites comes with responsibility. Reliability: sites available continuously and promptly. Support: must be proof against virus attacks, etc. Archiving: beyond the lifetime of the project.

64 Talk to your PAL: right requirements, right software, steer the project. Lots of work under the hood. Make sure your PAL has a voice in your project. Look at our wiki. Thanks!

65 Acknowledgements: SysMO-DB team; SysMO PALS; myGrid, EML and JWS Online teams; OMII-UK, Uni Southampton; EMBL-EBI; MCISB.

66 Thank you! Questions?

