
SysMO-DB: Sharing and Exchanging Data and Models in Systems Biology. Katy Wolstencroft, University of Manchester.

1 SysMO-DB: Sharing and Exchanging Data and Models in Systems Biology. Katy Wolstencroft, University of Manchester

2 The SysMO-DB approach: linking data to models; SysMO-DB, the e-Laboratory

3 SysMO-DB: a data access, model handling and data integration platform for Systems Biology. It supports and manages the diversity of data, models, experimental protocols and local data management systems; it promotes shared understanding; and it uses a common platform and common technologies.

4 Systems Biology Challenges: interdisciplinary work; heterogeneous data and models; modellers and experimentalists have different skills, training and experience, and different vocabularies and jargon; working together.

5 Systems Biology of Microorganisms (SysMO), a pan-European collaboration: eleven individual projects and 91 institutes, with different research outcomes, covering a cross-section of microorganisms, including bacteria, archaea and yeast. Aims: record and describe the dynamic molecular processes occurring in microorganisms in a comprehensive way; present these processes in the form of computerised mathematical models; pool research capacities and know-how. Running since April 2007, for 3 to 5 years. This year, 2 new projects join and 6 leave. http://www.sysmo.net

6 The Problem: there is no single concept of experimentation or modelling, and no planned, shared infrastructure for pooling.

7 Types of data: multiple omics (genomics, transcriptomics, proteomics, metabolomics, fluxomics, reactomics); images; molecular biology; reaction kinetics; models (metabolic, gene network, kinetic); relationships between data sets and experiments (procedures, experiments, data, results and models); analysis of data.

8 Linking and using data. Models are constructed from experimental data, or by using parameters from the literature. Data are analysed, compared and integrated (statistics, pipelines and workflows), which requires identifying the same entities in different data sets, identifying where data sets overlap, and knowing the experimental context.

9 SysMO-DB, started in June 2008: a web-based solution to facilitate exchange of data, models and processes (within and between consortia); search for data, models and processes across the initiative; maximisation of the "shelf life" and utility of the data, models and processes generated; and dissemination of results.

10 SysMO-DB Team. Institutions: University of Stellenbosch, South Africa; University of Manchester, UK; HITS, Germany. People: Jacky Snoep, Isabel Rojas, Olga Krebs, Wolfgang Müller, Carole Goble, Stuart Owen, Katy Wolstencroft, Finn Bacall. Related resources: SABIO-RK, JWS Online, Taverna, myExperiment.

11 SysMO-DB PALS team: power contributors. 21 postdocs and PhD students form a design and technical collaboration team, working in intense collaboration through UK and Continental PALS chapters. Audits and sharing of methods, data, models, standards, software, schemas, spreadsheets, SOPs and more ("20 questions"). Deployment into projects.

12 Principles: a series of small victories. Be realistic. Don't reinvent. Be sustainable and extensible. Migrate to standards. Provide instant gratification. Develop incrementally. Fit in with normal lab practices.

13 The Lowest Hanging Fruit: SysMO SEEK, a catalogue of assets. SysMO Yellow Pages: the people and their expertise; the institutions and their facilities. Data: experimental data sets; analysed results; external reference data sets. Models. Processes: laboratory protocols and bioinformatics analyses. Publications. The catalogue references assets held elsewhere.

14 [Screenshot of SysMO-SEEK]

15 [Diagram: harvesters and project data stores: COSMIC, BaCell-SysMO, SysMOLab, MOSES, Alfresco, Wiki, and another data store]
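One way to read the harvester idea, together with the earlier point that the catalogue references assets held elsewhere, is as a loop that visits each project data store and registers a catalogue entry pointing back to where the asset lives, rather than copying it. This is a minimal sketch under that assumption: the store names come from the slide, but `Asset`, `CatalogueEntry`, `harvest` and the example URLs are hypothetical and do not describe the real SEEK harvesters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """Hypothetical description of an asset as it sits in a project data store."""
    title: str
    kind: str      # e.g. "data", "model", "sop"
    location: str  # URL of the asset in its home store


@dataclass(frozen=True)
class CatalogueEntry:
    """Catalogue record: metadata plus a pointer to the asset, never a copy of it."""
    title: str
    kind: str
    source_store: str
    location: str


def harvest(stores: dict[str, list[Asset]]) -> list[CatalogueEntry]:
    """Visit every store and register one catalogue entry per asset found."""
    catalogue = []
    for store_name, assets in stores.items():
        for asset in assets:
            catalogue.append(
                CatalogueEntry(asset.title, asset.kind, store_name, asset.location)
            )
    return catalogue


# Illustrative input only: the titles and example.org URLs are placeholders.
stores = {
    "SysMOLab": [Asset("Growth curves run 1", "data", "https://example.org/sysmolab/1")],
    "Alfresco": [
        Asset("Glycolysis model", "model", "https://example.org/alfresco/7"),
        Asset("Sampling SOP", "sop", "https://example.org/alfresco/9"),
    ],
}
catalogue = harvest(stores)
for entry in catalogue:
    print(entry.source_store, entry.kind, entry.title, "->", entry.location)
```

Because entries only hold a location, the data stay under the control of the project site that produced them, which matches the argument against a central warehouse.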

16 Why not a central warehouse? Scientists are protective of models in progress as opposed to published models. Access and version management. Curator-rival conflict. Reluctance to share data, even within their own projects. Legacy spreadsheets dominate. Curation practices vary. Centralised archive take-up. Point-to-point exchange. People don't mind sharing methods. People want to advertise publications. (Nature 461, 145, 10 Sept 2009)

17 Access Permissions: just enough sharing, reusing myExperiment.
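"Just enough sharing" can be modelled as a per-asset policy that says who may see an asset, with the owner always in control. This is a minimal sketch under that assumption: the three levels (owner only, project members, everyone), the optional list of named viewers, and the names `Sharing`, `Policy` and `can_view` are hypothetical. The slide only says that SEEK reuses myExperiment's approach, and this sketch does not reproduce myExperiment's actual permission rules.

```python
from dataclasses import dataclass, field
from enum import Enum


class Sharing(Enum):
    PRIVATE = 1  # owner only
    PROJECT = 2  # owner plus members of the asset's projects
    PUBLIC = 3   # anyone


@dataclass
class Policy:
    owner: str
    projects: set[str]
    sharing: Sharing = Sharing.PRIVATE                    # default: keep work in progress private
    extra_viewers: set[str] = field(default_factory=set)  # individually named viewers


def can_view(policy: Policy, user: str, user_projects: set[str]) -> bool:
    """Return True if `user` may see the asset governed by `policy`."""
    if user == policy.owner or user in policy.extra_viewers:
        return True
    if policy.sharing is Sharing.PUBLIC:
        return True
    if policy.sharing is Sharing.PROJECT:
        # Visible if the user belongs to at least one of the asset's projects.
        return bool(policy.projects & user_projects)
    return False


draft_model = Policy(owner="alice", projects={"ProjectA"}, sharing=Sharing.PROJECT)
print(can_view(draft_model, "bob", {"ProjectA"}))    # True: same project
print(can_view(draft_model, "carol", {"ProjectB"}))  # False: outside the project
```

Defaulting to private reflects the concern that researchers are protective of models in progress; widening access is then an explicit choice by the owner.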

18 SysMO-DB Architecture [Diagram: data, models and processes; SysMO-DB; SysMO-SEEK web interface; Assets and Yellow Pages catalogues; JERM]

19 Making use of the Assets: understanding the content of the data; linking assets together; linking assets to experimental context; running comparisons between data files; running model simulations; running data analysis pipelines.

20 What is the JERM? The JERM ("Just Enough Results Model") is the minimum information needed to exchange data. What type of data is it (microarray, growth curve, enzyme activity...)? What was measured (gene expression, OD, metabolite concentration...)? What do the values in the datasets mean (units, time series, repeats...)? Which experiment does it relate to? How does it relate to models? How were the data created (SOPs and protocols)?
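The JERM questions amount to a checklist of metadata fields attached to each data file. This is a minimal sketch, assuming one flat record per data file. The field names, and the choice of which fields count as required, are illustrative assumptions and not the actual JERM schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JermRecord:
    """Hypothetical 'just enough' metadata for one data file."""
    data_type: Optional[str] = None        # e.g. "microarray", "growth curve"
    measured_item: Optional[str] = None    # e.g. "gene expression", "OD"
    units: Optional[str] = None            # e.g. "mM"
    is_time_series: Optional[bool] = None
    replicates: Optional[int] = None
    experiment_id: Optional[str] = None    # which experiment the data relate to
    model_ids: tuple[str, ...] = ()        # related models, may be empty
    sop_id: Optional[str] = None           # SOP or protocol used to create the data


# Assumed minimum: which fields must be present before a file can be exchanged.
REQUIRED = ("data_type", "measured_item", "units", "experiment_id", "sop_id")


def missing_fields(record: JermRecord) -> list[str]:
    """List the required fields that have not been filled in."""
    return [name for name in REQUIRED if getattr(record, name) is None]


record = JermRecord(data_type="growth curve", measured_item="OD", experiment_id="EXP-1")
print(missing_fields(record))  # ['units', 'sop_id']
```

Reporting what is missing, rather than rejecting the record, fits the "recommend but do not enforce" approach to standards.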

21 Minimum Information Models (MIBBI portal: http://www.mibbi.org/index.php/MIBBI_portal)
CIMR: Core Information for Metabolomics Reporting
MIABE: Minimal Information About a Bioactive Entity
MIACA: Minimal Information About a Cellular Assay
MIAME: Minimum Information About a Microarray Experiment
MIAME/Env: MIAME / Environmental transcriptomic experiment
MIAME/Nutr: MIAME / Nutrigenomics
MIAME/Plant: MIAME / Plant transcriptomics
MIAME/Tox: MIAME / Toxicogenomics
MIAPA: Minimum Information About a Phylogenetic Analysis
MIAPAR: Minimum Information About a Protein Affinity Reagent
MIAPE: Minimum Information About a Proteomics Experiment
MIARE: Minimum Information About a RNAi Experiment
MIASE: Minimum Information About a Simulation Experiment
MIENS: Minimum Information about an ENvironmental Sequence
MIFlowCyt: Minimum Information for a Flow Cytometry Experiment
MIGen: Minimum Information about a Genotyping Experiment
MIGS: Minimum Information about a Genome Sequence
MIMIx: Minimum Information about a Molecular Interaction Experiment
MIMPP: Minimal Information for Mouse Phenotyping Procedures
MINI: Minimum Information about a Neuroscience Investigation
MINIMESS: Minimal Metagenome Sequence Analysis Standard
MINSEQE: Minimum Information about a high-throughput SeQuencing Experiment
MIPFE: Minimal Information for Protein Functional Evaluation
MIQAS: Minimal Information for QTLs and Association Studies
MIqPCR: Minimum Information about a quantitative Polymerase Chain Reaction experiment
MIRIAM: Minimal Information Required In the Annotation of biochemical Models
MISFISHIE: Minimum Information Specification For In Situ Hybridization and Immunohistochemistry Experiments
STRENDA: Standards for Reporting Enzymology Data
TBC: Tox Biology Checklist
BioPAX: Biological Pathways Exchange (http://www.biopax.org/)
FuGE: Functional Genomics Experiment
MGED: Microarray Experimental Conditions

22 The Idea. For each data type (transcriptomics, proteomics, metabolomics, single-cell data): define a JERM, through top-down analysis of standards and bottom-up analysis of practice; generate and apply a JERM template and a JERM extractor for the data host; register the subset in SEEK; provide access and export through the JERM interface/template. Related format: ISA-TAB.

23 SEEK + JERM, based on ISA-TAB (Investigation, Study, Assay). Metadata: people, projects, experimental conditions, factors studied. Experimental data, with homogenised terminology and values in the datasets themselves. Linked models, SOPs and workflows.

24 For publishing, JERM data need to be related to SOPs, experimental context (ISA) and other data. JERM must be MIBBI-compliant for exporting to public repositories; e.g. microarray data need to be MIAME-compliant.

25 ISA-TAB: relating data and its experimental context. ISA stands for Investigation, Study, Assay; TAB stands for tabular, a format suitable for spreadsheets. http://isatab.sourceforge.net/
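Because ISA-TAB is tab-delimited, its tables can be read with ordinary tabular tools. This minimal sketch reads a tiny assay table and groups raw data files by the sample they came from. The column headings are modelled on ISA-TAB assay files but the content is made up. A real submission also carries investigation and study files and many more columns, so treat this as an illustration of the idea rather than an ISA-TAB parser.

```python
import csv
import io
from collections import defaultdict

# A tiny, illustrative assay table in the spirit of an ISA-TAB assay file.
ASSAY_TAB = (
    "Sample Name\tAssay Name\tRaw Data File\n"
    "sample1\tarray1\tarray1.cel\n"
    "sample1\tarray2\tarray2.cel\n"
    "sample2\tarray3\tarray3.cel\n"
)


def files_per_sample(assay_text: str) -> dict[str, list[str]]:
    """Group raw data files by the sample they were derived from."""
    reader = csv.DictReader(io.StringIO(assay_text), delimiter="\t")
    grouped = defaultdict(list)
    for row in reader:
        grouped[row["Sample Name"]].append(row["Raw Data File"])
    return dict(grouped)


print(files_per_sample(ASSAY_TAB))
# {'sample1': ['array1.cel', 'array2.cel'], 'sample2': ['array3.cel']}
```

The same grouping is what links a data file back to its experimental context: the sample row in the study table carries the conditions and factors.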

26 ISA provides a common framework for relating different types of data (e.g. microarrays and proteomics) and facilitates submission of genomics, transcriptomics and proteomics studies to international public repositories.

27 Identifying Biological Objects. What do you have in your data? Proteins/enzymes, genes/expression levels, metabolites. Where and how do these objects interact? Pathways, flux, experimental conditions. Which models describe these interactions? Answering these questions is possible when using common frameworks, naming schemes and controlled vocabularies.

28 BioPortal Integration for Searching. BioPortal is a repository for submitting and sharing biological ontologies (http://bioportal.bioontology.org/). Users can search for concepts across all or selected ontologies. BioPortal provides a number of RESTful web services (search, concept lookup, visualisation), and it is integrated within SEEK as a plugin.
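A search call to a RESTful service such as BioPortal's is an HTTP GET with the query encoded in the URL. This sketch only builds the URL and performs no network access. It assumes the current BioPortal REST endpoint `https://data.bioontology.org/search` with `q`, `ontologies` and `apikey` parameters. That endpoint likely post-dates the services available when this talk was given, so check the BioPortal API documentation before relying on it; the API key and ontology acronyms shown are placeholders.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumption: the current BioPortal REST search endpoint.
SEARCH_ENDPOINT = "https://data.bioontology.org/search"


def bioportal_search_url(term: str, api_key: str,
                         ontologies: Optional[list[str]] = None) -> str:
    """Build a concept-search URL; no ontology list means search all ontologies."""
    params = {"q": term, "apikey": api_key}
    if ontologies:
        # Restrict the search to selected ontologies (comma-separated acronyms).
        params["ontologies"] = ",".join(ontologies)
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


url = bioportal_search_url("glucose", "YOUR-API-KEY", ["CHEBI", "GO"])
print(url)
```

Keeping URL construction separate from the HTTP request makes the plugin's query logic easy to test without a network connection or a real API key.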

29 Tools to help manage data: annotation standards by stealth, through a controlled vocabulary plug-in backed by BioPortal.
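"Annotation by stealth" can be read as letting the scientist type a familiar label while the tool quietly records the controlled-vocabulary term behind it. This minimal sketch uses a small local vocabulary as a stand-in for a BioPortal lookup. The synonyms and the `EX:` term identifiers are placeholders, not real ontology IDs, and `annotate` is a hypothetical helper.

```python
# Hypothetical local vocabulary: normalised synonym -> (term ID, preferred label).
VOCABULARY = {
    "glucose": ("EX:0001", "glucose"),
    "glc": ("EX:0001", "glucose"),
    "optical density": ("EX:0002", "optical density"),
    "od": ("EX:0002", "optical density"),
}


def normalise(label: str) -> str:
    """Lower-case and collapse whitespace so that 'Glc ' and 'glc' match."""
    return " ".join(label.lower().split())


def annotate(label: str) -> dict:
    """Keep the user's own label and attach a controlled term when one is known."""
    match = VOCABULARY.get(normalise(label))
    annotation = {"label": label, "term_id": None, "preferred_label": None}
    if match:
        annotation["term_id"], annotation["preferred_label"] = match
    return annotation


print(annotate("Glc"))
print(annotate("mystery column"))
```

Unknown labels are kept rather than rejected, so scientists can still participate when no matching term exists.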

30 Following Standards. We recommend formats but we do not enforce them: protocols and SOPs (Nature Protocols); data (JERM models and community minimum information models); models (SBML and related standards); publications (PubMed and DOI). If you follow the prescribed formats, you get more out; if you don't, you can still participate. This lowers the adoption barrier.

31 Off the shelf. Except for the JERM, we have only used community resources, vocabularies and services. You can get a long way by implementing community practices and providing ways to integrate them.

32 SysMO-DB and Models

33 Nicolas Le Novère, Data Integration in the Life Sciences, Manchester, 2009

34 Models: Incentives for using Standards. Models can be shared in SysMO-SEEK in any format, but SBML is the recommended format. We also recommend MIRIAM compliance and SBO annotation. If you use SBML, you can use JWS Online to run simulations in SEEK.

35 [Screenshot of JWS Online] JWS Online plugin: an online simulator that runs in your browser; upload models in SBML format; web-service enabled; SBGN schemas, with annotations and external links.

36 Falko Krause, Humboldt-University, Berlin http://www.semanticsbml.org/aym

37 Model Resources. Models can be published in public repositories (JWS Online, BioModels) and annotated (SBML, MIRIAM, SBO). There are currently no public resources for sharing models together with their associated data, or for loading new data into models.

38 Linking Data to Models. Relating data and models: where did the data used to develop the model come from? Where did the data used to validate the model come from? What were the results of model simulations?
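The three questions on this slide can be answered if each link between a model and a data set records why the two are related. This is a minimal sketch, assuming one typed link per model and data pair. The relation names and the helper `data_for` are hypothetical, and the identifiers are placeholders.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    CONSTRUCTION = "data used to build the model"
    VALIDATION = "data used to validate the model"
    SIMULATION_RESULT = "output of a model simulation"


@dataclass(frozen=True)
class Link:
    model_id: str
    data_id: str
    relation: Relation


def data_for(model_id: str, relation: Relation, links: list[Link]) -> list[str]:
    """Answer 'where did the data for <relation> come from?' for one model."""
    return [
        link.data_id
        for link in links
        if link.model_id == model_id and link.relation is relation
    ]


links = [
    Link("model-1", "kinetics-2008", Relation.CONSTRUCTION),
    Link("model-1", "timecourse-A", Relation.VALIDATION),
    Link("model-1", "sim-run-3", Relation.SIMULATION_RESULT),
    Link("model-2", "kinetics-2008", Relation.CONSTRUCTION),
]
print(data_for("model-1", Relation.CONSTRUCTION, links))  # ['kinetics-2008']
```

Typing the links, rather than storing an untyped "related to" edge, is what lets construction data be told apart from validation data.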

39 Current Functionality in SEEK. All data used for construction are shown together with the model, so that the process can be repeated. Uploaded models are loaded with these data by default. Users can manually alter parameters and run simulations.
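In its simplest form, manually altering a parameter and re-running a simulation means integrating the model's rate equations twice with different parameter values. This sketch uses a hypothetical one-reaction model (substrate S converted to product P with Michaelis-Menten kinetics) and a plain explicit Euler integrator. It is a stand-in for illustration, not how JWS Online simulates SBML models.

```python
def simulate(s0: float, vmax: float, km: float, t_end: float, dt: float = 0.01):
    """Integrate dS/dt = -v and dP/dt = +v with v = Vmax*S/(Km+S), using explicit Euler.

    Returns a list of (t, S, P) tuples, one per time step, starting at t = 0.
    """
    steps = int(round(t_end / dt))
    s, p = s0, 0.0
    trajectory = [(0.0, s, p)]
    for i in range(1, steps + 1):
        rate = vmax * s / (km + s)  # Michaelis-Menten rate at the current S
        s -= rate * dt
        p += rate * dt
        trajectory.append((i * dt, s, p))
    return trajectory


default = simulate(s0=5.0, vmax=1.0, km=0.5, t_end=10.0)
faster = simulate(s0=5.0, vmax=2.0, km=0.5, t_end=10.0)  # manually altered Vmax
print("final S, default Vmax:", default[-1][1])
print("final S, doubled Vmax:", faster[-1][1])
```

A production simulator would use an adaptive ODE solver; fixed-step Euler is used here only because it keeps the example short and dependency-free.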

40 Next Steps: Model Validation. Test and compare the model with experimental data for the complete system: find data in SEEK or upload data from elsewhere; load it automatically into the model; run simulations and compare with the original results. JERM for models. Mapping tools that allow you to identify the columns/rows in spreadsheets containing the right information.
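A mapping tool has to find the spreadsheet columns that hold the quantities a model needs, even when each lab spells the headers differently, before any comparison can be made. This minimal sketch locates columns by a hypothetical synonym table, then scores simulation output against the measurements with root-mean-square error. The spreadsheet content, the synonyms and the "predicted" values are all made up for illustration.

```python
import csv
import io
import math

# Hypothetical synonyms a mapping tool might accept for each model quantity.
SYNONYMS = {
    "time": {"time", "t", "time (min)"},
    "glucose": {"glucose", "glc", "glucose (mm)"},
}


def find_column(headers: list[str], quantity: str) -> int:
    """Return the index of the first header that names `quantity`."""
    wanted = SYNONYMS[quantity]
    for index, header in enumerate(headers):
        if header.strip().lower() in wanted:
            return index
    raise KeyError(f"no column found for {quantity!r}")


def load_series(csv_text: str, quantity: str) -> list[float]:
    """Read the column for `quantity` from CSV text as floats."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    column = find_column(rows[0], quantity)
    return [float(row[column]) for row in rows[1:]]


def rmse(observed: list[float], predicted: list[float]) -> float:
    """Root-mean-square error between measurements and model output."""
    if len(observed) != len(predicted):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


# Illustrative spreadsheet export: column order and header spelling vary by lab.
sheet = "Sample,Time (min),Glc\nA,0,5.0\nA,10,3.0\nA,20,1.0\n"
times = load_series(sheet, "time")        # [0.0, 10.0, 20.0]
measured = load_series(sheet, "glucose")  # [5.0, 3.0, 1.0]
predicted = [5.0, 3.2, 1.1]               # stand-in for simulation output at those times
print(round(rmse(measured, predicted), 3))
```

In practice the synonym table is exactly where a JERM template or a controlled vocabulary would help, since it replaces guesswork about header spellings with agreed terms.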

41 ISA for Models. Modelling and experimental work intersect: Investigation, Study, Assay... or modelling analysis. Modelling analysis types: metabolic models, gene networks. Modelling types: ODE, algebraic. Studies are combinations of experimental assays, modelling analyses and informatics analyses.
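Extending ISA to models can be pictured as letting a study hold modelling and informatics analyses alongside experimental assays, each tagged with its kind. This is a minimal sketch under that assumption. The class and field names, the kind strings and the example investigation are hypothetical; only the categories (metabolic models, gene networks, ODE, algebraic) come from the slide.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Assay:
    name: str
    kind: str                             # "experimental", "modelling" or "informatics"
    analysis_type: Optional[str] = None   # e.g. "metabolic model", "gene network"
    modelling_type: Optional[str] = None  # e.g. "ODE", "algebraic"


@dataclass
class Study:
    title: str
    assays: list[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: list[Study] = field(default_factory=list)


def assays_of_kind(investigation: Investigation, kind: str) -> list[str]:
    """Collect the names of all assays of one kind across every study."""
    return [a.name for s in investigation.studies for a in s.assays if a.kind == kind]


inv = Investigation("Central carbon metabolism", [
    Study("Glucose pulse", [
        Assay("metabolite time course", "experimental"),
        Assay("glycolysis ODE model", "modelling", "metabolic model", "ODE"),
        Assay("pathway enrichment", "informatics"),
    ]),
])
print(assays_of_kind(inv, "modelling"))  # ['glycolysis ODE model']
```

Keeping modelling analyses inside the same study as the assays they depend on is what makes the link between a model and its experimental context explicit.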

42 SysMO-DB, the e-Laboratory. An e-Laboratory is an information system for bringing together people, data and analytical methods at the point of investigation or decision-making.

43 Current Status: finding things so that we can compare them; understanding who has what; understanding what can be compared with what (the experimental context).

44 Where we are going: a dynamic resource for analysis as well as browsing; automatic comparison of data from inside files; understanding where and how data and models are linked; running simulations with new experimental data; running analyses and workflows over the data and models.

45 Workflows from myExperiment: data preparation, annotation and analysis. The Systems Biology workflow pack on myExperiment covers microarray analysis and text mining. Created by Afsaneh Maleki-Dizaji from SUMO, University of Sheffield, based on previous work by Paul Fisher, University of Manchester. http://www.myexperiment.org/workflows/187

46 SEEK as a data analysis and meta-analysis service: SBML model construction and population. Calibration workflow. Data requirements: a parameterised SBML model, and experimental data (metabolite concentrations from the key results database). Calibration by COPASI web service (Peter Li).

47 Data analysis and meta-analysis: the SEEK Analysis Service with pre-cooked analysis tools, applied to the same calibration workflow (parameterised SBML model and metabolite concentrations from the key results database, calibrated by COPASI web service; Peter Li). Interface: Load model, Load data, GO.
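A calibration workflow adjusts model parameters until simulated metabolite concentrations match the measured ones. This minimal sketch calibrates a single rate constant of a hypothetical first-order decay model, S(t) = S0·exp(-k·t), by least-squares grid search against synthetic data generated with k = 0.3. The model, data and grid are illustrative assumptions. COPASI provides its own parameter-estimation methods, and this sketch does not use COPASI.

```python
import math

# Synthetic "measured" metabolite concentrations (mM), generated for illustration
# from a first-order decay with k = 0.3 and rounded to two decimals.
TIMES = [0.0, 2.0, 4.0, 6.0, 8.0]
MEASURED = [round(5.0 * math.exp(-0.3 * t), 2) for t in TIMES]


def model(k: float, s0: float, times: list[float]) -> list[float]:
    """Hypothetical one-parameter model: S(t) = S0 * exp(-k t)."""
    return [s0 * math.exp(-k * t) for t in times]


def sse(k: float) -> float:
    """Sum of squared errors between model output and the measurements."""
    return sum((m - p) ** 2 for m, p in zip(MEASURED, model(k, 5.0, TIMES)))


def calibrate() -> float:
    """Grid search over k = 0.01 .. 1.00 in steps of 0.01; keep the lowest SSE."""
    grid = [i / 100 for i in range(1, 101)]
    return min(grid, key=sse)


best_k = calibrate()
print(best_k)  # 0.3
```

A grid search is easy to follow but scales poorly with the number of parameters; for real models a dedicated optimiser, such as those offered by COPASI, is the usual choice.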

48 New Directions

49 Opening SysMO Out. Using SysMO as a dissemination space for the SysMO consortium: supplementary material in publications; data citation. Packaging software so that others can use it: it should be easy to install a SEEK for yourself. Packaging and exchanging JERM templates. Helping with standardisation: promotion and example work with SBRML, and linkage of data and models.

50 The SysMO-DB Approach in Other Projects: SysMO2 (new projects and legacy); EraSysBio+; LungSys and SBCancer; Virtual Liver.

51 New Considerations: eukaryotic organisms; interactions between host and pathogen; human disease; multicellular interactions, tissues and organs; multiscale modelling.

52 Outstanding Issues. Keeping data at project sites carries responsibilities. Reliability: sites must be available continuously and respond promptly. Support: sites must be protected against virus attacks, etc. Archiving: data must be kept beyond the lifetime of the project.

53 How it works. Find a solution that fits in with current practices. Start simple, show benefits, add more. Engage with the people actually doing the work: PhD students and postdocs. Let the scientists retain control over their data and who can see it. Don't reinvent: use available vocabularies and minimal model standards. Help prevent people duplicating work by linking the people as well as the resources.

54 Acknowledgements: SysMO-DB Team; SysMO-PALS; myGrid, HITS and JWS Online teams; EMBL-EBI; MCISB. http://www.sysmo-db.org
