Data-driven research with e-Laboratories Stuart Owen University of Manchester


1 Data-driven research with e-Laboratories Stuart Owen University of Manchester stuart.owen@manchester.ac.uk

2 Social collaboration environments for sharing, curating and cataloguing personal, group and community contributed scientific assets. BSD. 5000+ registered users, 56 countries; 1600+ workflows, 1700+ services. Scientific workflow management system for accessing open, public data services, assembling data processing and analysis pipelines and recording provenance. LGPL. 361 organisations, 48 countries; 70,000+ binary downloads, ~4000 source. http://www.mygrid.org.uk Handy tools for data management tasks in bioinformatics. BSD.

3 Scientific workflows, scripts and pipelines; now also neuroscience, music and numerical analysis; developed with Oxford and Southampton. Web-based Software & Sharing Services: “Mobilising the long tail of scientists for all our benefit”; common Ruby on Rails platform; common and exchanged codebases. Systems Biology models, data and protocols; adopted by 4 EU-wide consortia and 4 UK sites; developed with HITS and Stellenbosch. Crowd-sourced curated Web services; adopted by EdUnify and ELDA education projects; developed with EBI and EMBRACE network. Find experts, advice, scripts, variable sets; towards interface for UK Data Archives; developed with NIBHI.

4 SysMO-DB Project: a data access, model handling and data integration platform for Systems Biology. To support and manage the diversity of data, models and experimental protocols (SOPs) from a consortium. Web-based, standards-compliant DB.

5 Pan European collaboration. 13 individual projects, >100 institutes: different research outcomes; a cross-section of microorganisms, incl. bacteria, archaea and yeast. Record and describe the dynamic molecular processes occurring in microorganisms in a comprehensive way. Present these processes in the form of computerized mathematical models. Pool research capacities and know-how. Running since April 2007; runs for 3-5 years. This year, 2 new projects joined and 6 left. http://www.sysmo.net Systems Biology of Microorganisms

6 Data Driven. Multiple omics: genomics, transcriptomics, proteomics, metabolomics, fluxomics, reactomics. Images. Molecular biology. Reaction kinetics. Models: metabolic, gene network, kinetic. Relationships between data sets/experiments: procedures, experiments, data, results and models. Analysis of data.

7 A Tree View of Assets: Investigation, Studies, Assay; Construction; Validation; SOP. ISA infrastructure provides a directory structure for experiments: http://isatab.sourceforge.net/
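
The Investigation / Study / Assay nesting on this slide can be sketched as a small data model. This is only an illustration of the hierarchy, not SEEK's or ISA-TAB's actual schema; the class names, the `assets` field and the `tree_lines` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    title: str
    # Assets attached to the assay, e.g. SOPs or data files (hypothetical field).
    assets: List[str] = field(default_factory=list)


@dataclass
class Study:
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: List[Study] = field(default_factory=list)


def tree_lines(inv: Investigation) -> List[str]:
    """Render the hierarchy as indented lines, one line per node."""
    lines = [inv.title]
    for study in inv.studies:
        lines.append("  " + study.title)
        for assay in study.assays:
            lines.append("    " + assay.title)
            for asset in assay.assets:
                lines.append("      " + asset)
    return lines
```

Calling `tree_lines` on an investigation with one study, one assay and one SOP yields four lines, each indented two spaces deeper than its parent.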

8

9 Access Permissions: Just Enough Sharing... we don’t talk about security

10 Attribution. Trust. Credit Reward and Provenance Reusing myExperiment

11 Just Enough Sharing: COSMIC, SysMOLab, MOSES, Alfresco, Wiki, ANOTHER, A DATA STORE. SOP: Fetch on Request, Direct Upload

12 RightField: Annotation by Stealth http://rightfield.org.uk

13

14 SEEK, the e-Laboratory. A dynamic resource for analysis as well as browsing: automatic comparison of data from inside files; understanding where and how data and models are linked; running simulations with new experimental data; running analyses and workflows over the data and models.

15 Open Integration: JWS Simulator. Web-based, easy-to-use interface: “runs in your browser”, integrated in SEEK. Models can be accessed via browser, SEEK and web services. Data linked to models via file upload (e.g. Excel), or via database connection. Standard simulation functionality.

16 Data Fuse

17 Taverna Workbench (http://www.taverna.org.uk): available services, workflow diagram, Workflow Explorer

18 The Taverna Open Suite of Tools: Client User Interfaces, GUI Workbench, Workflow Repository, Service Catalogue, Third Party Tools, Programming and APIs, Web Portals, Activity and Service Plug-in Manager, Provenance Store, Workflow Server, Open Provenance Model, Secure Service Access, Workflow Engine, Virtual Machine

19 Taverna and the ‘Cloud’: Analysing Next Generation Sequencing Data

20 Analysing African Cattle with Taverna 2.2. 10,000 years separation. African livestock adaptations: hardier, better disease resistance. Potential outcomes: food security; understanding resistance; understanding environmental conditions (drought, parasites); understanding diversity. http://www.bbc.co.uk/news/10403254

21 The Analysis Pipeline (in Perl): MAP, FILTER, ANALYSIS. Input: SNP data from sequencer. Map between genome builds (Liftover). Filter for SNPs in exons. SNP consequences. Identifying damaging SNPs (PolyPhen). Harry Noyes, University of Liverpool

22 Workflow and phases: Input SNP file. Populate DB with start SNPs and resource version numbers. Lift-over: maps between UMD3 and BTA4 cow assemblies. Exon positions from Ensembl. Find SNPs in exon regions. PolyPhen to mark “damaging” SNPs.
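
The "find SNPs in exon regions" phase can be illustrated with a short sketch. The original pipeline is written in Perl and reads exon positions from Ensembl into a database; the Python below is not that code. It assumes in-memory data, 1-based inclusive exon coordinates, and hypothetical names (`build_exon_index`, `snps_in_exons`).

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # (start, end), assumed 1-based and inclusive
Snp = Tuple[str, int]   # (chromosome, position)
ExonIndex = Dict[str, Tuple[List[int], List[int]]]


def build_exon_index(exons_by_chrom: Dict[str, List[Exon]]) -> ExonIndex:
    """Sort and merge overlapping exons per chromosome for binary search."""
    index: ExonIndex = {}
    for chrom, exons in exons_by_chrom.items():
        merged: List[Exon] = []
        for start, end in sorted(exons):
            if merged and start <= merged[-1][1]:
                # Overlaps the previous interval: extend it.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def snps_in_exons(snps: List[Snp], index: ExonIndex) -> List[Snp]:
    """Keep only SNPs whose position falls inside some exon."""
    hits: List[Snp] = []
    for chrom, pos in snps:
        starts, ends = index.get(chrom, ([], []))
        # Last merged exon starting at or before pos is the only candidate.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos <= ends[i]:
            hits.append((chrom, pos))
    return hits
```

Merging the exons first means each lookup is a single binary search, which matters when filtering millions of SNPs.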

23 Accessing Taverna on the Cloud

24 Architecture overview

25 Jobs: Status, Input, Provenance, Experiment Metadata, Input data summary, Loading inputs

26 Summary of Workflow Output: non-synonymous coding SNPs; PolyPhen predictions: probably damaging. 11 million SNPs for N’Dama. The result can be downloaded as a MySQL database or as a TSV/CSV file.

27 Why use the Cloud? This is a highly repetitive task, and “embarrassingly parallel”. But it also needs to be done on demand, and within the financial reach of researchers, who do not always have access to their own compute. We have very fast network access, so we don’t need to do this in-house.
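
"Embarrassingly parallel" here means the input can be split into chunks that are processed independently, with no communication between them. A minimal sketch of that pattern follows. It uses local threads as a stand-in for cloud workers, and `process_chunk` is a placeholder for the real per-chunk analysis; both are assumptions, not the deployment described in the talk.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def chunked(items: Sequence[int], size: int) -> List[Sequence[int]]:
    """Split items into consecutive chunks of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def process_chunk(positions: Sequence[int]) -> List[int]:
    """Placeholder for per-chunk work; here it simply keeps even positions."""
    return [p for p in positions if p % 2 == 0]


def run_parallel(
    items: Sequence[int],
    worker: Callable[[Sequence[int]], List[int]],
    chunk_size: int,
    max_workers: int = 4,
) -> List[int]:
    """Process each chunk independently and concatenate results in order."""
    chunks = chunked(items, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input chunks.
        results = pool.map(worker, chunks)
        return [r for chunk_result in results for r in chunk_result]
```

Because no chunk depends on another, adding workers (or machines) shortens the run without changing the result.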

28 Timings

29

30 SEEK as a data analysis and meta analysis service. SBML model construction and population. Calibration workflow. Data requirements. Parameterised SBML model. Experimental data. Metabolite concentrations from key results database. Calibration by COPASI web service. Peter Li

31 Search and Analysis across data sets, models and stuff. Analysis pool. Analysis As A Cloud Service. Analysis using Cloud Computing Services. Run analysis tools and knowledge bases. Li et al, BMC Bioinformatics 2010, 11:582, doi:10.1186/1471-2105-11-582, highly accessed. Hucka and Le Novère, BMC Biology 2010, 8:140, doi:10.1186/1741-7007-8-140. Automated Model Generation: MCISB Centre (Li). Annotation pipeline: SUMO SysMO project (Maleki-Dizaji). Workflow Management System. Next Gen Seq annotation pipelines using Amazon Cloud Services (Noyes, Li).

32

33 SysMO-DB Dev Team. University of Stellenbosch, South Africa; University of Manchester, UK; Heidelberg Institute for Theoretical Studies, Germany. Jacky Snoep, Olga Krebs, Wolfgang Müller, Sergejs Aleksejevs, Carole Goble, Stuart Owen, Katy Wolstencroft, Finn Bacall, Franco du Preez, Quyen Nguyen

34 Further Information: myGrid, http://www.mygrid.org.uk; Taverna, http://www.taverna.org.uk; myExperiment, http://www.myexperiment.org; BioCatalogue, http://www.biocatalogue.org; SEEK, http://www.sysmo-db.org; RightField, http://www.rightfield.org.uk; MethodBox, http://www.methodbox.org.uk

