Taverna, myExperiment and BioCatalogue: Workflow Tools for Informatics Integration Dr Katy Wolstencroft School of Computer Science University of Manchester.

1 Taverna, myExperiment and BioCatalogue: Workflow Tools for Informatics Integration Dr Katy Wolstencroft School of Computer Science University of Manchester

2 Interoperability, Integration and Collaboration Access to distributed and local resources Iteration over data sets Automation of data flow Agile software development Extensible Experimental protocols Part of the myGrid toolkit Taverna Workflows

3 What is myGrid? An e-Science Collaboration Since 2001 Software ● Services ● Content ● Skills ● Community Manchester, Southampton, Oxford and the EMBL-EBI + an alliance of intl. contributing projects and partners Sustainable production level quality –Open Middleware Infrastructure Institute UK –Software Sustainability Institute –Mixture of developers, bioinformaticians and researchers Open source development and content LGPL or BSD

4 Connecting Things Together Data Resources –Genome databases –Kinetic/metabolite data Analysis tools –Sequence alignment –Similarity searching –Pattern matching Knowledge Resources –Ontologies –Controlled vocabularies

5 Create and run workflows Share, discover and reuse workflows Manage the metadata needed and generated RDF, OWL Discover and reuse services Feta A Collection of Components

6 Scientific workflow management system for accessing public data services, assembling data processing and analysis pipelines and recording provenance. Social collaboration environments (“e-Laboratories”) for sharing, curating and cataloguing personal, group and community contributed scientific assets. Accelerating Science

7 What is a Workflow? Set of services (web services, RESTful, local scripts, other workflows) Set of data links between services - “put output X from service A as input Y to service B” –If needed: List handling, control links This can be called a data-oriented workflow (dataflow) –Say where you want the data to flow instead of what you want to do –Compare with more procedural workflow languages like BPEL Beneficial way of thinking for much data-driven scientific research
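The dataflow idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Taverna itself is implemented: the two "services" (`fetch_sequence`, `reverse_sequence`) are hypothetical local functions standing in for web services, and the port names are invented for the example. The workflow is described purely by its data links; the execution order is derived from them.

```python
from graphlib import TopologicalSorter

# Hypothetical stand-ins for services; a real workflow would call
# web services, scripts or nested workflows here.
def fetch_sequence(gene_id):
    return {"sequence": f"ATGC-{gene_id}"}

def reverse_sequence(sequence):
    return {"reversed": sequence[::-1]}

services = {"A": fetch_sequence, "B": reverse_sequence}

# Data links: "put output X from service A as input Y to service B".
# Each link is ((source service, output port), (sink service, input port)).
links = [(("A", "sequence"), ("B", "sequence"))]

def run(services, links, workflow_inputs):
    """Run every service once, in an order implied by the data links."""
    depends_on = {name: set() for name in services}
    for (source, _), (sink, _) in links:
        depends_on[sink].add(source)

    outputs = {}
    for name in TopologicalSorter(depends_on).static_order():
        # Start from any inputs supplied directly to the workflow.
        kwargs = dict(workflow_inputs.get(name, {}))
        # Fill the remaining input ports from upstream outputs.
        for (source, out_port), (sink, in_port) in links:
            if sink == name:
                kwargs[in_port] = outputs[source][out_port]
        outputs[name] = services[name](**kwargs)
    return outputs

results = run(services, links, {"A": {"gene_id": "X1"}})
```

Note that nothing in `links` says "run A, then B"; the order falls out of where the data flows, which is the contrast with procedural languages such as BPEL that the slide draws.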

8 Kepler Triana BPEL Ptolemy II Taverna

9 Workflow diagram Tree view of workflow structure Available services Taverna Open source and extensible

10 Taverna GUI and Enactor Taverna Remote Execution service T-REX Graphical Workbench Drag and drop interface Plug-in architecture Nested Workflows Workflow Enactor Local and remote enactor Implicit iteration over data collections Automation of data flow Logging and data provenance tracking
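"Implicit iteration over data collections" can be illustrated with a small Python sketch. The assumption here is the simplest case: a service with a single input that expects data of a given nesting depth. If it receives something nested more deeply, it is invoked once per element and the results are collected in a list of the same shape. The helper names (`depth`, `invoke`, `gc_content`) are hypothetical, and the sketch does not cover how a real enactor combines several list inputs.

```python
def depth(value):
    """Nesting depth: 0 for a single item, 1 for a list, 2 for a list of lists."""
    if not isinstance(value, list):
        return 0
    return 1 + (depth(value[0]) if value else 0)

def invoke(service, value, expected_depth=0):
    """Call service on value, iterating when value is nested deeper than expected."""
    if depth(value) > expected_depth:
        return [invoke(service, item, expected_depth) for item in value]
    return service(value)

def gc_content(sequence):
    """Hypothetical single-item service: fraction of G and C bases."""
    return (sequence.count("G") + sequence.count("C")) / len(sequence)

single = invoke(gc_content, "GGCC")                  # one call
many = invoke(gc_content, ["GGCC", "ATAT"])          # one call per list element
per_group = invoke(len, [["a", "bb"], ["ccc"]], expected_depth=1)
```

The workflow author never writes the loop: passing a list where a single item is expected is enough to trigger the iteration.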

11 Taverna http://www.taverna.org.uk Software Release Taverna first released 2004. Current versions 1.7.2 and Taverna 2.1.2 Currently 1500+ users per month, 350+ organizations, ~40 countries, 80000+ downloads across versions Availability Freely available, open source LGPL Windows, Mac OS, and Linux Resources http://www.taverna.org.uk, http://www.mygrid.org.uk User and developer workshops, documentation, email help desk Collaborations with numerous groups including NCI’s cancer biomedical informatics grid (caBIG), EMBL-EBI, NCBI, Concept Web Alliance, Bio2RDF Software ● Services ● Content ● Skills ● Community ●

12 What types of service? WSDL Web Services BioMart R-processor BioMoby Soaplab Grid Services Local Java services Beanshell Workflows Coming soon: new REST support

13 Who Provides the Services? Open domain services and resources Taverna accesses 3500+ services (11,874 operations) Third party – we don’t own them – we didn’t build them All the major providers –NCBI, DDBJ, EBI … Enforce NO common data model. Quality Web Services considered desirable

14 What do Scientists use Taverna for? Astronomy Music Meteorology Social Science Cheminformatics

15 UK Institutes Systems Biology International Institutes International Networks Universities Projects Lots of Universities Taverna Adoption

16 Hypothesis Construction and Explanation from the Literature my BioAID, VL-e Manipulation of SBML models in workflows Pharmacogenomics Association study of Nevirapine-induced skin rash in Thai Population Data Warehousing tGRAP Database Rescue

17 Genome-wide SNP Analysis Analysis over compute clusters Automate annotation of results Mine annotation data for patterns [Hoyle] Shared Genomics

18 Taverna Grid Use Cases –KnowArc – The Grid-enabled Know-how Sharing Technology Based on ARC Services and Open Standards –caGrid – US Cancer Research project –Moteur – A medical imaging project running on EGEE

19 MicroArray from tumor tissue Microarray preprocessing Lymphoma prediction Lymphoma Prediction Workflow Wei Tan Univ. Chicago Ack. Juli Klemm, Xiaopeng Bian, Rashmi Srinivasa (NCI) Jared Nedzel (MIT) caArray GenePattern Use gene-expression patterns associated with two lymphoma types to predict the type of an unknown sample.

20 caGrid Plugin for Taverna Taverna support for GAARDS-secured caGrid services Wrap existing 3rd party services (that are used by existing Taverna users) for caGrid and annotate them to match compatibility guidelines Enables discovery of services in caGrid service registry Lymphoma type prediction workflow

21 Genotype Phenotype Studies Mouse whipworm infection - parasite model of the human parasite - Trichuris trichiura Understanding Phenotype Comparing resistant vs susceptible strains – Microarrays Understanding Genotype Mapping quantitative traits – Classical genetics QTL Joanne Pennock, Richard Grencis University of Manchester

22 Workflow Results Identified the biological pathways involved in sex dependence in the mouse model, previously believed to be involved in the ability of mice to expel the parasite. Manual experimentation: Two year study of candidate genes, processes unidentified Joanne Pennock, Richard Grencis University of Manchester

23 Identified the biological pathways involved in sex dependence in the mouse model, previously believed to be involved in the ability of mice to expel the parasite. Manual experimentation: Two year study of candidate genes, processes unidentified JO IS A LAB BIOLOGIST JO HAS NEVER BUILT A WORKFLOW Joanne Pennock, Richard Grencis University of Manchester Workflow Results

24 Understanding Phenotype Comparing resistant vs susceptible strains – Microarrays Understanding Genotype Mapping quantitative traits – Classical genetics QTL Integrated Microarray data, genomic sequences, pathway data, literature mining. Trypanosomiasis Study Identified a pathway for which its correlating gene (Daxx) is believed to play a role in trypanosomiasis resistance Paul Fisher, et al Nucleic Acids Research, 2007, 35(16) http://www.youtube.com/watch?v=x83pzMMw7lk http://www.youtube.com/watch?v=Y6_Kz5L010g

26 Just Enough Sharing…. myExperiment can provide a central location for workflows from one community/group myExperiment allows you to say –Who can look at your workflow –Who can download your workflow –Who can modify your workflow –Who can run your workflow
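The four permissions listed on this slide (look at, download, modify, run) can be modelled as independent per-action grants. The Python sketch below is a hypothetical illustration of that idea only; the class name, method names and user names are invented here and do not reflect myExperiment's actual data model or API.

```python
ACTIONS = ("view", "download", "modify", "run")

class SharingPolicy:
    """Per-workflow policy: the owner can do everything, others need a grant."""

    def __init__(self, owner):
        self.owner = owner
        self._grants = {action: set() for action in ACTIONS}

    def grant(self, action, user):
        if action not in self._grants:
            raise ValueError(f"unknown action: {action}")
        self._grants[action].add(user)

    def allows(self, action, user):
        return user == self.owner or user in self._grants.get(action, set())

# Hypothetical users: "jo" owns the workflow and lets "paul" view and run it.
policy = SharingPolicy(owner="jo")
policy.grant("view", "paul")
policy.grant("run", "paul")
```

Keeping the actions separate is what makes "just enough sharing" possible: a colleague can run a workflow without being able to download or change it.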

27 The most important aspect of myExperiment - Designed by scientists Ownership and Attribution

28 Packs allow you to collect different items together, like you might with a "wish list" or "shopping basket" You can collect internal things (such as workflows, files and even other packs) as well as link to things outside myExperiment Your packs can then be shared, tagged, discovered and discussed easily on myExperiment Packs

29 Bringing myExperiment to the Taverna User myExperiment Plugin in Taverna

30 Running Workflows Through myExperiment Taverna Remote Execution (T-REX)

31 PREFIX rdf: PREFIX myexp: PREFIX sioc: select ?friend1 ?friend2 ?acceptedat where {?z rdf:type. ?z myexp:has-requester ?x. ?x sioc:name ?friend1. ?z myexp:has-accepter ?y. ?y sioc:name ?friend2. ?z myexp:accepted-at ?acceptedat } All accepted Friendships including accepted-at time Semantically-Interlinked Online Communities
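The transcript above has lost the namespace IRIs after each PREFIX and the class name after `rdf:type`. The Python sketch below shows how the same query could be assembled with those gaps filled by the caller. The `rdf:` and `sioc:` IRIs are the standard published namespaces; the myExperiment namespace and the friendship class are not recoverable from the slide, so the values passed in the example (`http://example.org/myexp#` and `myexp:Friendship`) are placeholders, not the real myExperiment vocabulary.

```python
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"  # standard RDF namespace
SIOC_NS = "http://rdfs.org/sioc/ns#"                    # SIOC core ontology namespace

def build_friendship_query(myexp_ns, friendship_class):
    """Return the slide's query text with the missing IRIs supplied by the caller."""
    return f"""PREFIX rdf: <{RDF_NS}>
PREFIX myexp: <{myexp_ns}>
PREFIX sioc: <{SIOC_NS}>
SELECT ?friend1 ?friend2 ?acceptedat
WHERE {{
  ?z rdf:type {friendship_class} .
  ?z myexp:has-requester ?x .
  ?x sioc:name ?friend1 .
  ?z myexp:has-accepter ?y .
  ?y sioc:name ?friend2 .
  ?z myexp:accepted-at ?acceptedat
}}"""

# Placeholder values: substitute the namespace and class published by myExperiment.
query = build_friendship_query("http://example.org/myexp#", "myexp:Friendship")
```

The resulting string can be sent to any SPARQL endpoint that exposes the data; the endpoint itself is outside the scope of this sketch.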

32 Service Discovery There are thousands of distributed services. How do we find an appropriate one? We need to annotate services by their functions (and not their names!) The services might be distributed, but a registry of service descriptions can be central and queried
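A central registry that is queried by function rather than by name can be sketched as follows. This is a minimal in-memory illustration in Python; the class names, the function tags and the `example.org` endpoints are all hypothetical and are not taken from any real registry.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ServiceDescription:
    name: str             # often uninformative, so never used for lookup
    endpoint: str         # where the distributed service actually lives
    functions: frozenset  # annotations describing what the service does

class Registry:
    """Central store of descriptions; the services themselves stay distributed."""

    def __init__(self):
        self._entries = []

    def register(self, description):
        self._entries.append(description)

    def find(self, *functions):
        """Names of services annotated with every requested function."""
        wanted = set(functions)
        return sorted(d.name for d in self._entries if wanted <= d.functions)

registry = Registry()
registry.register(ServiceDescription(
    "runJob42", "http://example.org/a", frozenset({"sequence alignment", "protein"})))
registry.register(ServiceDescription(
    "svcB", "http://example.org/b", frozenset({"similarity search", "protein"})))
registry.register(ServiceDescription(
    "op_17", "http://example.org/c", frozenset({"sequence alignment", "nucleotide"})))

aligners = registry.find("sequence alignment")
protein_aligners = registry.find("sequence alignment", "protein")
```

None of the service names hint at what the services do, which is why the lookup runs over the functional annotations instead.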

33 BioCatalogue www.biocatalogue.org A “Web 2.0” catalogue for sharing, discovering and monitoring web services for the Life Sciences. Community and expert curation Community and provider contribution Launched mid 2009. Currently: 370+ members, 1700+ services, 11,870+ operations 110+ providers, 110+ different countries REST APIs Linked Open Data Software Open source BSD Software ● Services ● Content ● Skills ● Community ●

34 Data and Provenance Workflows can generate vast amount of data - how can we manage and track it? We need to manage data AND metadata AND experimental provenance Scientists need to check back over past results, compare workflow runs and share workflow runs with colleagues Scientists need to look at intermediate results when designing and debugging
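The requirements on this slide (keep intermediate results, check back over past runs, compare runs) can be illustrated with a small in-memory provenance store. The Python sketch below is a hypothetical illustration; the class, the step names and the recorded values are invented for the example and do not describe Taverna's provenance format.

```python
import itertools

class ProvenanceStore:
    """Records, per workflow run, what each step received and produced."""

    def __init__(self):
        self.runs = {}
        self._ids = itertools.count(1)

    def start_run(self, workflow_name):
        run_id = next(self._ids)
        self.runs[run_id] = {"workflow": workflow_name, "steps": []}
        return run_id

    def record(self, run_id, step, inputs, output):
        self.runs[run_id]["steps"].append(
            {"step": step, "inputs": inputs, "output": output})

    def intermediate(self, run_id, step):
        """All outputs recorded for one step of one run."""
        return [s["output"] for s in self.runs[run_id]["steps"] if s["step"] == step]

    def differing_steps(self, run_a, run_b):
        """Steps whose final recorded output differs between two runs."""
        out_a = {s["step"]: s["output"] for s in self.runs[run_a]["steps"]}
        out_b = {s["step"]: s["output"] for s in self.runs[run_b]["steps"]}
        return sorted(k for k in out_a.keys() | out_b.keys()
                      if out_a.get(k) != out_b.get(k))

store = ProvenanceStore()
first = store.start_run("annotate_genes")
store.record(first, "align", {"seq": "ATGC"}, "score=10")
store.record(first, "annotate", {"hit": "score=10"}, "kinase")

second = store.start_run("annotate_genes")
store.record(second, "align", {"seq": "ATGG"}, "score=12")
store.record(second, "annotate", {"hit": "score=12"}, "kinase")

changed = store.differing_steps(first, second)
```

Storing inputs alongside outputs is what lets a scientist trace a changed result back to the step and data that caused it.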

35 Provenance [Screenshot of provenance view]

36 myGrid Open Suite of Tools Client User Interfaces Workflow GUI Workbench Workflow Repository Service Catalogue Third Party Tools Programming and APIs Web Portal Activity and Service Plug-in Manager Provenance Store Workflow Server Open Provenance Model Secure Service Access

37 Toolkits “Taverna Inside” Workflows under the hood e-Laboratories (portals) –Systems Biology, e-Health Web based execution –Running workflows over the web through myExperiment Visualisation clients that call workflows in the background

38 Open e-Lab Platforms Customised myExperiment instances –Australian Kepler Repository –eStat, NeuroHub, Nema, –SpaceBook, HPC/NA –Microsoft Trident BioCatalogue installations –Emory – ed unify project –Eli Lilly SysMO-SEEK e-Laboratory for interlinking and sharing data, models, SOPs and workflows for Systems Biology in Europe ISA-TAB & SBML/MIRIAM compliant Software ● Services ● Content ● Skills ● Community ●

39 Current Work

40 Taverna 2.2 Released end June Workflow diagnostics and error resolution Retry and parallelisation Stop/pause/resume workflows Intermediate results display

41 Taverna Roadmap Next Generation Workbench Access to service, data and workflow repositories More data driven Component families for vertical markets Workflow Patterns Taverna from Excel “myGrid-in-a-Box” –Virtualised Taverna server deployment and distribution, bundle of myExperiment, BioCatalogue and database/tools components.

42 Taverna Labs Semantic Taverna –Semantic provenance Open Provenance Model –Linked Open Data Dutch NBIC Aida toolkit –Automated workflow planning through reasoning e-Lico with U Zurich and RapidMiner Taverna in the Cloud Blogging the lab book –Blog3 with Southampton U

43 Training Tutorials and Training –58+ tutorials to >900 people. –>20 universities, Life Science institutes, and networks. –Major Bio conferences –Summer schools in Biology and Middleware. Developer and User Days –Annotation Jamborees Undergraduate and Postgraduate Bioinformatics in > 30 universities. Software ● Services ● Content ● Skills ● Community

45 More Information myGrid –http://www.mygrid.org.uk Taverna –http://www.taverna.org.uk myExperiment –http://www.myexperiment.org –http://wiki.myexperiment.org BioCatalogue –http://www.biocatalogue.org –http://beta.biocatalogue.org

