Taverna the story from up-above Antoon Goderis The University of Manchester, UK DART workshop, Brisbane,

Taverna the story from up-above Antoon Goderis The University of Manchester, UK DART workshop, Brisbane, Australia, 14 December 2006

2 Overview The situation in –omics Creating new biology using Taverna Taverna Key traits Features on the OMII roadmap Including today’s release

3 Bioinformaticians & co.

4 Open environment Data, Data, Data EBI SeqHound SRS National Center for Biotechnology Information (USA) Cambridge, UK Tokyo, Japan

acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt taggtgactt gcctgttttt ttttaattgg gatcttaatt tttttaaatt attgatttgt aggagctatt tatatattct ggatacaagt tctttatcag atacacagtt tgtgactatt ttcttataag tctgtggttt ttatattaat gtttttattg atgactgttt tttacaattg tggttaagta tacatgacat aaaacggatt atcttaacca ttttaaaatg taaaattcga tggcattaag tacatccaca atattgtgca actatcacca ctatcatact ccaaaagggc atccaatacc cattaagctg tcactcccca atctcccatt ttcccacccc tgacaatcaa taacccattt tctgtctcta tggatttgcc tgttctggat attcatatta atagaatcaa

6 The situation in {genomics, transcriptomics, proteomics, metabolomics..}: Lots of data. Lots of parameters to choose. An analysis takes a long time. The analysis services are unreliable. Lots of analysis steps. Need to record and explain your steps.

7 Enter workflows: Lots of data [high throughput]. Lots of parameters to choose [best practice]. An analysis takes a long time [long running]. The analysis services are unreliable [fault tolerance]. Lots of analysis steps [data and control flow]. Need to record and explain your steps [provenance].

Workflow-based middleware

9 my Grid: UK e-Science pilot project since 2001. Part of the Open Middleware Infrastructure Institute UK. Builds middleware for life scientists that enables them to undertake in silico experiments and share those experiments and their results. Aimed at individual scientists, in under-resourced labs, who use other people’s applications. Open source. Workflows & Semantic Technologies for metadata management. Data flows. Ad hoc & exploratory.

10 Overview The situation in -omics Creating new biology using Taverna Taverna Key traits Features on the OMII roadmap Including today’s release

11 ? 200 Microarray + QTL. Genes captured in the microarray experiment and present in the QTL region. Phenotypic response investigated using microarray, in the form of expressed genes, or evidence provided through QTL mapping. Genotype, Phenotype. [Andy Brass, Steve Kemp, Paul Fisher, 2006]

12 Key:
A – Retrieve genes in QTL region
B – Annotate genes with external database IDs
C – Cross-reference IDs with KEGG gene IDs
D – Retrieve microarray data from MaxD database
E – For each KEGG gene get the pathways it’s involved in
F – For each pathway get a description of what it does
G – For each KEGG gene get a description of what it does
[Andy Brass, Steve Kemp, Paul Fisher, 2006]
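
The lettered steps can be read as a small data pipeline. The sketch below is a minimal illustration in Python, not the Taverna workflow itself: the lookup tables stand in for the remote services, every identifier and value in them is invented for the example, and `qtl_pathways` is a hypothetical name. It covers steps A, C, E and F only.

```python
# Hypothetical stand-ins for the remote services; all identifiers are invented.
QTL_GENES = {"region1": ["g1", "g2", "g3"]}                      # step A
KEGG_IDS = {"g1": "kegg:g1", "g2": "kegg:g2"}                    # step C (g3 has no KEGG id)
PATHWAYS = {"kegg:g1": ["pw:a"], "kegg:g2": ["pw:a", "pw:b"]}    # step E
PATHWAY_DESCRIPTIONS = {"pw:a": "toy pathway a", "pw:b": "toy pathway b"}  # step F


def qtl_pathways(region):
    """Map each pathway to its description and the KEGG genes that hit it."""
    genes = QTL_GENES.get(region, [])
    # Genes without a KEGG cross-reference are dropped.
    kegg_ids = [KEGG_IDS[g] for g in genes if g in KEGG_IDS]
    hits = {}
    for kegg_id in kegg_ids:
        for pathway in PATHWAYS.get(kegg_id, []):
            hits.setdefault(pathway, []).append(kegg_id)
    return {pw: (PATHWAY_DESCRIPTIONS.get(pw, ""), ids) for pw, ids in hits.items()}


if __name__ == "__main__":
    for pathway, (description, ids) in qtl_pathways("region1").items():
        print(pathway, description, ids)
```

Grouping genes by pathway, rather than pathways by gene, matches the way the result slide reports its finding: a pathway first, then the gene behind it.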

13 Result: Captured the pathways returned by QTL and Microarray workflows over the MaxD microarray database. Identified a pathway for which its correlating gene (Daxx) is believed to play a role in trypanosomiasis resistance. Manual analysis of the microarray and QTL data had failed to identify this gene as a candidate. [Andy Brass, Steve Kemp, Paul Fisher, 2006]

14 Trichuris muris (mouse whipworm) infection: Identified the biological pathways involved in sex dependence in the mouse model, previously believed to be involved in the ability of mice to expel the parasite. Manual experimentation: two-year study of candidate genes, processes unidentified. Workflows: the trypanosomiasis cattle experiment workflow was reused without change. Analysis of the data by a biologist found the processes in a couple of days. [Joanne Pennock, Paul Fisher, 2006]

15 Changing scientific practice: Systematic and comprehensive automation. Eliminated user bias and premature filtering of datasets and results leading to single-sided, expert-driven hypotheses. Dry people hypothesise, wet people validate. “make sense of this data” -> “does this make sense?” Workflow factories. Different dataset, different result. Accurate provenance.

16 Overview The situation in -omics Creating new biology using Taverna Taverna Key traits Features on the OMII roadmap Including today’s release

17 User Uptake ~25000 downloads Systems biology Proteomics Gene/protein annotation Microarray data analysis Medical image analysis Heart simulations High throughput screening Phenotypical studies Plants, Mouse, Human Astronomy Dilbert Cartoons

18 Finding and Sharing Tools Taverna Workbench 3rd Party Applications and Portals Workflow Enactor Service Management Results Management Provenance log Metadata Default Data Store Custom Store DAS KAVEBAKLAVA Feta myExperiment Utopia Clients LSIDs Workflow enactor

19 Taverna workbench

services Open domain services and resources, Third party. Enforce NO common data model. No common typing, Missing metadata. Soaplab InstantSoap

21 Services Landscape

22 User Interaction Allows a workflow to call out to an expert human user E.g. Used to embed the Artemis annotation editor within an otherwise automated genome annotation pipeline [University of Bergen]

23 Tools, Tools, Tools Feta Search tool Pedro Annotation tool

24 Capture and Curation Effort Ontology and Annotation Curation Team Franck Tanoh and Katy Wolstencroft Community Service Providers Community Scientists

25 Scufl Model: Simple Conceptual Unified Flow Language. Taverna Workbench. Shielding & extensible plug-ins. Workflow Execution. Application. Workflow enactor. Processors: Plain Web Service, Soaplab, Local Java App, WF Enactor, BioMOBY, SeqHound, BioMart, WSRF, Beanshell. Nested workflows, automatic iterations, best guess data type handling.
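
One way to read "automatic iterations": when a processor written for a single item is handed a list, the enactor applies it to each item in turn. The helper below is an assumption about that behaviour, written as plain Python rather than Scufl; `iterate` is a hypothetical name, not part of Taverna.

```python
def iterate(processor, value):
    """Apply a single-item processor to a value or to every item of a nested list."""
    if isinstance(value, list):
        # Recurse so that lists of lists keep their shape.
        return [iterate(processor, item) for item in value]
    return processor(value)


if __name__ == "__main__":
    print(iterate(str.upper, ["acat", ["ttct", "ac"]]))
```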

26 Service incompatibility Fix up the services to be compatible or…. Shims – libraries of adapters. Automated data type matching using reasoning over a mismatch and service ontology Duncan Hull, myGrid Khalid Belhajjame, ISPIDER
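
A shim library can be pictured as a registry of small adapters keyed by the data types they connect. The sketch below is illustrative only: it replaces the ontology-based reasoning mentioned on the slide with a plain dictionary lookup, and the type names (`fasta`, `raw_sequence`) and helper names are hypothetical.

```python
SHIMS = {}  # (produced_type, expected_type) -> adapter function


def shim(produced, expected):
    """Decorator that registers an adapter between two (hypothetical) type names."""
    def register(fn):
        SHIMS[(produced, expected)] = fn
        return fn
    return register


@shim("fasta", "raw_sequence")
def fasta_to_raw(text):
    # Drop header lines starting with '>' and join the sequence lines.
    lines = text.strip().splitlines()
    return "".join(line.strip() for line in lines if not line.startswith(">"))


def connect(value, produced, expected):
    """Pass a value between two services, inserting a shim on a type mismatch."""
    if produced == expected:
        return value
    adapter = SHIMS.get((produced, expected))
    if adapter is None:
        raise TypeError(f"no shim from {produced} to {expected}")
    return adapter(value)


if __name__ == "__main__":
    print(connect(">seq1\nacat\nttct\n", "fasta", "raw_sequence"))
```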

27 Shim identification Mismatch detection

28 Service failure? Most services are owned by other people. No control over service failure. Some are research level. Workflows are only as good as the services they connect. Responses: notify failures, instigate retries, set criticality, substitute services.
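
The four responses on this slide (notify, retry, criticality, substitution) can be combined in one small wrapper. This is a sketch under assumptions, not Taverna's enactor logic: `invoke_with_fallback` and its parameters are hypothetical, and services are modelled as ordinary Python callables.

```python
def invoke_with_fallback(services, payload, retries=2, critical=True, notify=print):
    """Try each service in order, retrying each one before substituting the next.

    services: callables tried in order; later ones act as substitutes.
    retries:  extra attempts per service after the first failure.
    critical: if True, total failure raises; if False, it returns None.
    notify:   callback that receives a message for every failed attempt.
    """
    for service in services:
        for attempt in range(1, retries + 2):
            try:
                return service(payload)
            except Exception as exc:
                notify(f"{service.__name__} failed (attempt {attempt}): {exc}")
    if critical:
        raise RuntimeError("all services failed")
    return None


if __name__ == "__main__":
    def primary(seq):
        raise IOError("service down")

    def backup(seq):
        return seq.upper()

    print(invoke_with_fallback([primary, backup], "acgt", retries=1))
```

Marking a step as non-critical lets the rest of a workflow continue with a missing value instead of aborting the whole run.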

29 Provenance Collection: Observes events from the workflow engine. Populates an RDF triple store with information from these events. Browse interface: a simple browser replicates Taverna’s existing result and status browser; graphical browser. ProQA Query API. [Figure: example provenance graph linking data generated by services/workflows (e.g. urn:data1, urn:data2, urn:hit1 … urn:hit50), service invocations (e.g. urn:BlastNInvocation3, urn:compareinvocation3) and concepts (e.g. Blast_report, SwissProt_seq, Sequence_hit, LSDatum, DatumCollection) through properties such as [input], [output], [instanceOf], [type], [hasHits], [contains], [hasName], [performsTask], [similar_sequence_to], [directlyDerivedFrom] and [distantlyDerivedFrom].] [Zhao et al 07 provenance challenge paper]
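
The collection step can be sketched as an observer that turns each invocation event into triples. The example below is a simplification under stated assumptions: a Python set of tuples stands in for the RDF triple store, the class and method names are hypothetical, and only the property names `input`, `output` and `directlyDerivedFrom` are taken from the slide.

```python
class ProvenanceLog:
    """Toy triple store fed by workflow invocation events (not a real RDF store)."""

    def __init__(self):
        self.triples = set()

    def on_invocation(self, invocation, inputs, outputs):
        """Record one service invocation with its input and output data ids."""
        for data_in in inputs:
            self.triples.add((invocation, "input", data_in))
        for data_out in outputs:
            self.triples.add((invocation, "output", data_out))
            for data_in in inputs:
                self.triples.add((data_out, "directlyDerivedFrom", data_in))

    def query(self, subject=None, predicate=None, obj=None):
        """Return matching triples, sorted; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


if __name__ == "__main__":
    log = ProvenanceLog()
    log.on_invocation("urn:invocation1", ["urn:data1"], ["urn:data2"])
    print(log.query(predicate="directlyDerivedFrom"))
```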

30

31 Provenance Tracking: Which Ensembl gene does pathway mmu come from?

32 Workflows over Results: Automatically backtrack through the data provenance graph. (Table columns: Pathway_id, KEGG_id, Uniprot, Ensembl_gene_id, Entrez; diagram label: dF)
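
Backtracking amounts to walking "derived from" links from a result until a data item of the wanted kind is reached. The sketch below assumes the provenance graph is available as a plain dictionary and that identifiers carry a type prefix such as `ensembl:`; both are simplifications for illustration, and the identifiers are invented.

```python
from collections import deque


def backtrack(derived_from, start, wanted_prefix):
    """Breadth-first walk back from `start` to ancestors whose id has `wanted_prefix`.

    derived_from: dict mapping a data id to the ids it was derived from.
    """
    seen = {start}
    queue = deque([start])
    found = []
    while queue:
        node = queue.popleft()
        for parent in derived_from.get(node, []):
            if parent in seen:
                continue
            seen.add(parent)
            if parent.startswith(wanted_prefix):
                found.append(parent)  # stop at the first match on this path
            else:
                queue.append(parent)
    return found


if __name__ == "__main__":
    graph = {
        "pathway:p1": ["kegg:k1"],
        "kegg:k1": ["uniprot:u1"],
        "uniprot:u1": ["ensembl:e1"],
    }
    print(backtrack(graph, "pathway:p1", "ensembl:"))
```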

33 A workflow marketplace

34 webTaverna GUI - main

35 Overview The situation in -omics Creating new biology using Taverna Taverna Key traits Features on the OMII roadmap Including today’s release

36 [Diagram: release process.] Ingest. User groups: Pioneers, Early adopters, Conservatives. Releases: my Grid Pre-release, my Grid Release, OMII-UK Release. Activities: Software Engineering XP; Software Engineering Quality & Test; Evaluation; OMII Software Engineering Quality & Test; Prioritise & Plan; Production Applications & Professional Services. Communities: my Grid Alliance, Source-forge community.

37 Who are the OMII Users? Increasing variation in requirements with the scientific domain. Different scientific/research domains End Users Application Developers Service and Middleware Developers Middleware Deployers Different activities Systems Administrators

38 Taverna is now part of OMII-UK Taverna 1.5 – Today! Taverna 1.6 myExperiment

39 Taverna 1.5: Integrated provenance. Raven release mechanism to simplify updates for the user. +/- 300 semantic annotations for core services. Patterns for using proxies for bulk data transactions. Redeveloped plug-in and enactor framework, improved iteration events, data management.

42 Taverna 1.5: Integrated provenance. Raven release mechanism to simplify updates for the user. +/- 300 semantic annotations for core services. Example annotations:
Add_ncbi_to_string: beanshell script, need to ask Paul for more details. Input: Output:
Kegg_gene_ids_all_species (bconv): converts external IDs to KEGG IDs [mapping]. string: External ID, e.g. NCBI ID [Genebank_GI]. return: KEGG gene ID [KEGG_record_id]
Get_pathways_by_genes: Search all pathways which include all the given genes [Searching]. Input: List of KEGG genes id [KEGG_gene_id]. Output: Return a list of pathway_id of specified KEGG genes_id
Merge_pathways: Stringlist, Concatenated
Workflow description: This workflow takes in Entrez gene ids then adds the string "ncbi-geneid:" to the start of each gene id. These gene ids are then cross-referenced to KEGG gene ids. Each KEGG gene id is then sent to the KEGG pathway database and its relevant pathways returned.
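
The annotated workflow on this slide can be paraphrased in a few lines. The sketch below follows the workflow description (prefix, cross-reference, pathway lookup per gene, concatenate) but is not the Beanshell script or the KEGG services themselves: the two lookup tables are invented stand-ins, and the Python function names are hypothetical renderings of the processor names.

```python
# Hypothetical lookup tables standing in for the KEGG services; values are invented.
BCONV = {"ncbi-geneid:1": "kegg:a", "ncbi-geneid:2": "kegg:b"}
PATHWAYS_BY_GENE = {"kegg:a": ["path:1"], "kegg:b": ["path:1", "path:2"]}


def add_ncbi_to_string(gene_id):
    """Prefix an Entrez gene id so it can be cross-referenced."""
    return "ncbi-geneid:" + gene_id


def bconv(external_id):
    """Convert an external id to a KEGG gene id, or None if unknown."""
    return BCONV.get(external_id)


def get_pathways_by_gene(kegg_id):
    """Return the pathway ids recorded for one KEGG gene id."""
    return PATHWAYS_BY_GENE.get(kegg_id, [])


def merge_pathways(pathway_lists):
    """Concatenate the per-gene pathway lists (duplicates are kept)."""
    return [p for pathways in pathway_lists for p in pathways]


def run_workflow(entrez_ids):
    external_ids = [add_ncbi_to_string(g) for g in entrez_ids]
    kegg_ids = [k for k in map(bconv, external_ids) if k is not None]
    return merge_pathways(get_pathways_by_gene(k) for k in kegg_ids)


if __name__ == "__main__":
    print(run_workflow(["1", "2", "3"]))
```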

44 Taverna 1.6 Due out Summer 2007 Revised enactment core Native support for long running workflows Data proxy to deal with bulk data transactions Improved service discovery and provenance management

45 Future Directions myExperiment pilot prototype Enhancements to the Workflow Core Enhancements to user interface and experience Expanded use of semantic web technologies Engagement with new user communities – cheminformatics, humanities, social sciences etc. Code remains open source and always will

46 Obtaining Taverna Taverna is available under the LGPL from our project site on Sourceforge.net Win32, Solaris / Linux & OS-X Includes online and downloadable user manual, examples etc. Support via project mailing lists

47 Conclusions: See plans for Taverna 2.0 on the myGrid wiki. Taverna development is user-driven. Please keep in touch and tell us what you would like to see via the myGrid mailing lists: Taverna Users, Taverna Hackers. Taverna, my Grid, OMII-UK

48 Acknowledgements: Phase 1 my Grid researchers; Phase 2 OMII-UK, my Grid Research Team; Peter Li, Paul Fisher, Andy Brass, Robert Stevens, Mark Wilkinson; EPSRC, Wellcome Foundation, EU