Data-driven research with e-Laboratories
Stuart Owen, University of Manchester



Social collaboration environments for sharing, curating and cataloguing personal, group and community-contributed scientific assets. BSD. Registered users, 56 countries; workflows, services.
Scientific workflow management system for accessing open, public data services, assembling data processing and analysis pipelines, and recording provenance. LGPL. 361 organisations, 48 countries; 70,000+ binary downloads, ~4000 source.
Handy tools for data management tasks in bioinformatics. BSD.

Scientific workflows, scripts and pipelines
– Now also neuroscience, music and numerical analysis
– Developed with Oxford and Southampton
Web-based Software & Sharing Services
– “Mobilising the long tail of scientists for all our benefit”
– Common Ruby on Rails platform
– Common and exchanged codebases
Systems Biology models, data and protocols
– Adopted by 4 EU-wide consortia and 4 UK sites
– Developed with HITS and Stellenbosch
Crowd-sourced curated Web services
– Adopted by EdUnify and ELDA education projects
– Developed with EBI and EMBRACE network
Find experts, advice, scripts, variable sets
– Towards interface for UK Data Archives
– Developed with NIBHI

SysMO-DB Project
A data access, model handling and data integration platform for Systems Biology
To support and manage the diversity of
– Data, Models and experimental protocols (SOPs) from a consortium
Web based
Standards compliant

Systems Biology of Microorganisms
Pan European collaboration
13 individual projects, >100 institutes
– Different research outcomes
– A cross-section of microorganisms, incl. bacteria, archaea and yeast
Record and describe the dynamic molecular processes occurring in microorganisms in a comprehensive way
Present these processes in the form of computerized mathematical models
Pool research capacities and know-how
Already running since April 2007
Runs for 3-5 years
This year, 2 new projects joined and 6 left

Data Driven
Multiple omics
– genomics, transcriptomics
– proteomics, metabolomics
– fluxomics, reactomics
Images
Molecular biology
Reaction Kinetics
Models
– Metabolic, gene network, kinetic
Relationships between data sets/experiments
– Procedures, experiments, data, results and models
Analysis of data

A Tree View of Assets
Investigation, Studies, Assay
Construction, Validation, SOP
ISA infrastructure provides a directory structure for experiments
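
The Investigation / Study / Assay hierarchy named on this slide can be pictured as a small nested structure. The sketch below is illustrative only: the class names, field names, and the example titles and file names are assumptions made for this example and are not taken from SEEK or any ISA tooling.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    title: str
    # Assets attached to the assay, e.g. SOPs or data files (names are hypothetical).
    assets: List[str] = field(default_factory=list)


@dataclass
class Study:
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: List[Study] = field(default_factory=list)


def asset_paths(inv: Investigation) -> List[str]:
    """Flatten the tree into directory-style paths, one per asset."""
    return [
        f"{inv.title}/{study.title}/{assay.title}/{asset}"
        for study in inv.studies
        for assay in study.assays
        for asset in assay.assets
    ]


# Hypothetical example content, chosen only to show the shape of the tree.
example = Investigation(
    "investigation",
    [
        Study("construction", [Assay("assay-1", ["sop.doc"])]),
        Study("validation", [Assay("assay-2", ["sop.doc", "data.xls"])]),
    ],
)

for path in asset_paths(example):
    print(path)
```

Reading the hierarchy as paths is one way to see why the slide calls it a directory structure: every asset has exactly one place under an investigation, a study and an assay.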

Access Permissions
Just Enough Sharing... we don’t talk about security

Attribution. Trust. Credit Reward and Provenance Reusing myExperiment

Just Enough sharing
[Diagram: data stores (COSMIC, SysMOLab, MOSES, Alfresco, Wiki, another, a data store) and SOP assets, reached by Fetch on Request or Direct Upload]

RightField: Annotation by Stealth

SEEK, the e-Laboratory
A dynamic resource for analysis as well as browsing
– Automatic comparison of data from inside files
– Understanding where and how data and models are linked
– Running simulations with new experimental data
– Running analyses and workflows over the data and models

Open Integration: JWS Simulator
– Web-based, easy-to-use interface: “runs in your browser”, integrated in SEEK
– Models can be accessed via browser, SEEK and web services
– Data linked to models via file upload (e.g. Excel), or via database connection
– Standard simulation functionality

Data Fuse

Taverna Workbench
[Screenshot: Available services; Workflow diagram; Workflow Explorer]

The Taverna Open Suite of Tools
[Diagram components: Client User Interfaces; GUI Workbench; Workflow Repository; Service Catalogue; Third Party Tools; Programming and APIs; Web Portals; Activity and Service Plug-in Manager; Provenance Store; Workflow Server; Open Provenance Model; Secure Service Access; Workflow Engine; Virtual Machine]

Taverna and the ‘Cloud’
Analysing Next Generation Sequencing Data

Analysing African Cattle with Taverna
,000 years separation
African Livestock adaptations:
– Hardier
– Better disease resistance
Potential outcomes:
– Food security
– Understanding resistance
– Understanding environmental conditions
– Drought
– Parasites
– Understanding diversity

The Analysis Pipeline (in Perl)
MAP, FILTER, ANALYSIS
– Input SNP data from sequencer
– Map between Genome Builds (Liftover)
– Filter for SNPs in Exons
– SNP consequences
– Identifying damaging SNPs (Polyphen)
Harry Noyes – University of Liverpool

Workflow and phases
– Input SNP file
– Populate DB with start SNPs and resource version numbers
– Lift-over: maps between UMD3 and BTA4 cow assemblies
– Exon positions from Ensembl
– Find SNPs in Exon regions
– PolyPhen to mark “damaging” SNPs
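
The "Find SNPs in Exon regions" phase is, at its core, an interval lookup. The following is a minimal sketch of that idea and not the workflow's actual implementation (the slides describe a Perl pipeline run through Taverna). It assumes SNPs are given as (chromosome, position) pairs and exons as (chromosome, start, end) with inclusive coordinates; the function names and the sample coordinates are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def build_exon_index(exons):
    """Group exons by chromosome and merge overlapping intervals.

    exons: iterable of (chrom, start, end), coordinates inclusive.
    Returns {chrom: (sorted_starts, merged_intervals)}.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in exons:
        by_chrom[chrom].append((start, end))

    index = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        merged = [spans[0]]
        for start, end in spans[1:]:
            last_start, last_end = merged[-1]
            if start <= last_end:
                # Overlaps the previous interval: extend it.
                merged[-1] = (last_start, max(last_end, end))
            else:
                merged.append((start, end))
        index[chrom] = ([s for s, _ in merged], merged)
    return index


def snps_in_exons(snps, index):
    """Keep only the SNPs whose position falls inside some exon."""
    hits = []
    for chrom, pos in snps:
        if chrom not in index:
            continue
        starts, merged = index[chrom]
        # Rightmost interval that starts at or before pos, if any.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and merged[i][1] >= pos:
            hits.append((chrom, pos))
    return hits


# Made-up coordinates, for illustration only.
exons = [("1", 100, 200), ("1", 150, 300), ("2", 50, 60)]
snps = [("1", 99), ("1", 100), ("1", 250), ("1", 301), ("2", 55), ("3", 10)]
print(snps_in_exons(snps, build_exon_index(exons)))
```

Merging overlapping exons first keeps each lookup to a single binary search, which matters when the input runs to millions of SNPs.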

Accessing Taverna on the Cloud

Architecture overview

[Screenshot labels: Jobs, Status, Input, Provenance, Experiment Metadata, Input data summary, Loading inputs]

Summary of Workflow Output
– Non-synonymous coding SNPs
– Polyphen predictions: probably damaging
– 11 million SNPs for N’Dama
The result can be downloaded as a MySQL database or as a TSV / CSV file

Why use the Cloud?
This is a highly repetitive task
– And “embarrassingly parallel”
But it also needs to be done on demand
And within the financial reach of researchers
– Who do not always have access to their own compute
We have very fast network access
– So we don’t need to do this in-house
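
"Embarrassingly parallel" here means the SNP list can be cut into chunks that are processed independently, with no communication between them. The sketch below shows the pattern on one machine with a thread pool; on a cloud deployment each chunk would presumably go to a separate worker instead. The function names, the chunk size and the placeholder per-chunk step are assumptions for illustration, not part of the Taverna setup described in the slides.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def annotate_chunk(chunk):
    """Placeholder per-chunk step (hypothetical): tag each SNP as processed.

    A real pipeline would run mapping, filtering and prediction here.
    """
    return [(chrom, pos, "processed") for chrom, pos in chunk]


def run_parallel(snps, chunk_size=2, workers=4):
    """Process chunks independently and stitch the results back in order."""
    chunks = chunked(snps, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input chunks.
        parts = list(pool.map(annotate_chunk, chunks))
    return [row for part in parts for row in part]


snps = [("1", pos) for pos in range(5)]
print(run_parallel(snps))
```

Because no chunk depends on another, adding workers shortens the run roughly in proportion, which is what makes renting compute on demand attractive for this kind of task.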

Timings

SEEK as a data analysis and meta analysis service
SBML model construction and population
Calibration workflow
Data requirements
– Parameterised SBML model
– Experimental data
– Metabolite concentrations from key results database
Calibration by COPASI web service
Peter Li

Search and Analysis across data sets, models and stuff
Analysis pool
Analysis As A Cloud Service
– Analysis using Cloud Computing Services
– Run analysis tools and knowledge bases
Li et al, BMC Bioinformatics 2010, 11:582, doi: / , highly accessed
Hucka and Le Novère, BMC Biology 2010, 8:140, doi: /
Automated Model Generation: MCISB Centre (Li)
Annotation pipeline: SUMO SysMO project (Maleki-Dizaji)
Workflow Management System
Next Gen Seq annotation pipelines using Amazon Cloud Services (Noyes, Li)

SysMO-DB Dev Team
Institutions: University of Stellenbosch, South Africa; University of Manchester, UK; Heidelberg Institute for Theoretical Studies, Germany
Team: Jacky Snoep, Olga Krebs, Wolfgang Müller, Sergejs Aleksejevs, Carole Goble, Stuart Owen, Katy Wolstencroft, Finn Bacall, Franco du Preez, Quyen Ngyen

Further Information
myGrid, Taverna, myExperiment, BioCatalogue, SEEK, RightField, MethodBox