Taverna the story from up-above Antoon Goderis The University of Manchester, UK DART workshop, Brisbane,

1 Taverna: the story from up-above. Antoon Goderis, The University of Manchester, UK. DART workshop, Brisbane, Australia, 14 December 2006

2 Overview: The situation in -omics; Creating new biology using Taverna; Taverna key traits; Features on the OMII roadmap (including today’s release)

3 Bioinformaticians & co.

4 Open environment. Data, Data, Data: EBI; SeqHound; SRS; National Center for Biotechnology Information (USA); Cambridge, UK; Tokyo, Japan

5 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt taggtgactt gcctgttttt ttttaattgg gatcttaatt tttttaaatt attgatttgt aggagctatt tatatattct ggatacaagt tctttatcag atacacagtt tgtgactatt ttcttataag tctgtggttt ttatattaat gtttttattg atgactgttt tttacaattg tggttaagta tacatgacat aaaacggatt atcttaacca ttttaaaatg taaaattcga tggcattaag tacatccaca atattgtgca actatcacca ctatcatact ccaaaagggc atccaatacc cattaagctg tcactcccca atctcccatt ttcccacccc tgacaatcaa taacccattt tctgtctcta tggatttgcc tgttctggat attcatatta atagaatcaa

6 The situation in {genomics, transcriptomics, proteomics, metabolomics, ...}: Lots of data; Lots of parameters to choose; An analysis takes a long time; The analysis services are unreliable; Lots of analysis steps; Need to record and explain your steps

7 Enter workflows: Lots of data [high throughput]; Lots of parameters to choose [best practice]; An analysis takes a long time [long running]; The analysis services are unreliable [fault tolerance]; Lots of analysis steps [data and control flow]; Need to record and explain your steps [provenance]

8 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt taggtgactt gcctgttttt ttttaattgg Workflow-based middleware

9 myGrid: UK e-Science pilot project since 2001. Part of the Open Middleware Infrastructure Institute UK. Build middleware for Life Scientists that enables them to undertake in silico experiments and share those experiments and their results. Individual scientists, in under-resourced labs, who use other people’s applications. Open source. Workflows & Semantic Technologies for metadata management. Data flows. Ad hoc & exploratory.

10 Overview: The situation in -omics; Creating new biology using Taverna; Taverna key traits; Features on the OMII roadmap (including today’s release)

11 Microarray + QTL [diagram linking Genotype to Phenotype; labels "?" and "200"]: Genes captured in microarray experiment and present in QTL region. Phenotypic response investigated using microarray in form of expressed genes, or evidence provided through QTL mapping. [Andy Brass, Steve Kemp, Paul Fisher, 2006]

12 Key: A – Retrieve genes in QTL region; B – Annotate genes with external database IDs; C – Cross-reference IDs with KEGG gene IDs; D – Retrieve microarray data from MaxD database; E – For each KEGG gene get the pathways it’s involved in; F – For each pathway get a description of what it does; G – For each KEGG gene get a description of what it does. [Andy Brass, Steve Kemp, Paul Fisher, 2006]
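
The key above reads as a simple data flow, and a minimal Python sketch may make the chaining concrete. Everything here is an assumption for illustration: the lookup tables stand in for the remote services, the identifiers are invented placeholders, and steps D and G are left out for brevity.

```python
# Toy stand-ins for the remote services; all names and values are hypothetical.
QTL_GENES = {"qtl_region_1": ["geneA", "geneB"]}                          # step A
EXTERNAL_IDS = {"geneA": "ext:1", "geneB": "ext:2"}                       # step B
KEGG_IDS = {"ext:1": "kegg:g1", "ext:2": "kegg:g2"}                       # step C
PATHWAYS = {"kegg:g1": ["path:p1"], "kegg:g2": ["path:p1", "path:p2"]}    # step E
PATHWAY_DESC = {"path:p1": "toy pathway one", "path:p2": "toy pathway two"}  # step F


def pathways_for_region(region):
    """Chain steps A, B, C, E and F for one QTL region."""
    genes = QTL_GENES[region]                        # A: genes in the QTL region
    ext_ids = [EXTERNAL_IDS[g] for g in genes]       # B: annotate with external IDs
    kegg_ids = [KEGG_IDS[e] for e in ext_ids]        # C: cross-reference to KEGG IDs
    described = {}
    for kegg_id in kegg_ids:                         # E: pathways for each KEGG gene
        for pathway in PATHWAYS[kegg_id]:
            described[pathway] = PATHWAY_DESC[pathway]  # F: describe each pathway
    return described
```

Calling `pathways_for_region("qtl_region_1")` returns one description per distinct pathway, which mirrors how the workflow collapses many genes onto a shorter list of candidate pathways.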

13 Result: Captured the pathways returned by QTL and Microarray workflows over the MaxD microarray database. Identified a pathway for which its correlating gene (Daxx) is believed to play a role in trypanosomiasis resistance. Manual analysis of the microarray and QTL data had failed to identify this gene as a candidate. [Andy Brass, Steve Kemp, Paul Fisher, 2006]

14 Trichuris muris (mouse whipworm) infection: Identified the biological pathways involved in sex dependence in the mouse model, previously believed to be involved in the ability of mice to expel the parasite. Manual experimentation: two-year study of candidate genes, processes unidentified. Workflows: the trypanosomiasis cattle experiment workflow was reused without change; analysis of the data by a biologist found the processes in a couple of days. [Joanne Pennock, Paul Fisher, 2006]

15 Changing scientific practice: Systematic and comprehensive automation. Eliminated user bias and premature filtering of datasets and results leading to single-sided, expert-driven hypotheses. Dry people hypothesise, wet people validate. “make sense of this data” -> “does this make sense?” Workflow factories: different dataset, different result. Accurate provenance.

16 Overview: The situation in -omics; Creating new biology using Taverna; Taverna key traits; Features on the OMII roadmap (including today’s release)

17 User Uptake: ~25000 downloads. Systems biology; Proteomics; Gene/protein annotation; Microarray data analysis; Medical image analysis; Heart simulations; High throughput screening; Phenotypical studies; Plants, Mouse, Human; Astronomy; Dilbert Cartoons

18 Finding and Sharing Tools [architecture diagram labels]: Taverna Workbench; 3rd Party Applications and Portals; Workflow Enactor; Service Management; Results Management; Provenance log; Metadata; Default Data Store; Custom Store; DAS; KAVE; BAKLAVA; Feta; myExperiment; Utopia; Clients; LSIDs

19 Taverna workbench

20 Services: Open domain services and resources, third party. Enforce NO common data model. No common typing, missing metadata. Soaplab; InstantSoap

21 Services Landscape

22 User Interaction: Allows a workflow to call out to an expert human user. E.g. used to embed the Artemis annotation editor within an otherwise automated genome annotation pipeline. [University of Bergen]

23 Tools, Tools, Tools: Feta search tool; Pedro annotation tool

24 Capture and Curation Effort: Ontology and Annotation Curation Team (Franck Tanoh and Katy Wolstencroft); Community Service Providers; Community Scientists

25 Scufl Model (Simple Conceptual Unified Flow Language): Taverna Workbench; Workflow Execution Application; Workflow enactor. Shielding & extensible plug-ins, with processors for: Plain Web Service; Soaplab; Local Java App; WF Enactor; BioMoby; SeqHound; BioMart; WSRF; Beanshell. Nested workflows, automatic iterations, best-guess data type handling.
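
One plausible reading of "automatic iterations" is that a processor written for a single item is applied once per element when it receives a list. The sketch below illustrates that idea only; `depth` and `invoke` are hypothetical helpers, not Taverna's enactor code, and multi-input iteration strategies are not modelled.

```python
def depth(value):
    """Nesting depth of a value: 0 for a single item, 1 for a flat list, and so on."""
    if not isinstance(value, list):
        return 0
    return 1 + (depth(value[0]) if value else 0)


def invoke(processor, value, expected_depth=0):
    """Call processor on value, iterating when value is nested deeper than expected."""
    if depth(value) > expected_depth:
        # Too deep for this processor: apply it to each element and collect results.
        return [invoke(processor, item, expected_depth) for item in value]
    return processor(value)
```

With this sketch, `invoke(str.upper, ["a", "b"])` maps the single-item processor over the list, while `invoke(len, ["ab", "c"], expected_depth=1)` passes the whole list through because the processor declares that it consumes a list.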

26 Service incompatibility: Fix up the services to be compatible, or... Shims: libraries of adapters. Automated data type matching using reasoning over a mismatch and service ontology. Duncan Hull, myGrid; Khalid Belhajjame, ISPIDER
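
A shim can be pictured as a small adapter placed between two services whose data types do not line up. The sketch below is an assumption-laden illustration: the type labels, the `SHIMS` registry and the `connect` helper are hypothetical, and a plain dictionary lookup stands in for the ontology-based reasoning mentioned on the slide.

```python
def fasta_to_raw(fasta_text):
    """Drop FASTA header lines and join the remaining sequence lines."""
    lines = [line.strip() for line in fasta_text.splitlines()]
    return "".join(line for line in lines if line and not line.startswith(">"))


# Hypothetical shim library, keyed by (produced type, required type).
SHIMS = {("fasta", "raw_sequence"): fasta_to_raw}


def connect(output_type, input_type, data):
    """Pass data between two services, inserting a shim when the types differ."""
    if output_type == input_type:
        return data
    shim = SHIMS.get((output_type, input_type))
    if shim is None:
        raise LookupError(f"no shim from {output_type} to {input_type}")
    return shim(data)
```

Mismatch detection corresponds to the type comparison in `connect`; shim identification corresponds to the registry lookup.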

27 Shim identification. Mismatch detection.

28 Service failure? Most services are owned by other people. No control over service failure. Some are research level. Workflows are only as good as the services they connect. Notify failures; instigate retries; set criticality; substitute services.
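
The four coping strategies on this slide (notify, retry, criticality, substitution) can be combined in one small wrapper. This is a sketch under stated assumptions, not Taverna's fault-tolerance code: `call_with_fallback`, `always_down` and `mirror` are hypothetical names, and services are modelled as plain Python callables.

```python
import time


def call_with_fallback(services, payload, retries=2, delay=0.0, critical=True):
    """Try each service in order, retrying each one up to `retries` extra times."""
    failures = []  # record of every failed attempt, usable for notification
    for service in services:
        for attempt in range(retries + 1):
            try:
                return service(payload)
            except Exception as exc:
                failures.append((service.__name__, attempt, repr(exc)))
                time.sleep(delay)  # simple fixed pause before the next attempt
    if critical:
        # A critical step aborts the workflow when every alternative has failed.
        raise RuntimeError(f"all services failed: {failures}")
    return None  # a non-critical step lets the workflow carry on without a result


def always_down(payload):
    """Hypothetical primary service that is never available."""
    raise ConnectionError("service unavailable")


def mirror(payload):
    """Hypothetical substitute service."""
    return payload.upper()
```

Here `call_with_fallback([always_down, mirror], "acat")` exhausts the retries on the first service and then falls through to the substitute.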

29 Provenance Collection: Observes events from the workflow engine. Populates an RDF triple store with information from these events. Browse interface: simple browser replicates Taverna’s existing result and status browser; graphical browser. ProQA Query API. [Figure: provenance graph linking data items (e.g. urn:data1, urn:data2), service invocations (e.g. urn:BlastNInvocation3), concepts (Blast_report, SwissProt_seq, Sequence_hit) and literals through properties such as [input], [output], [instanceOf], [hasHits], [contains], [performsTask], [similar_sequence_to], [directlyDerivedFrom] and [distantlyDerivedFrom].] [Zhao et al 07 provenance challenge paper]
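
The collection step can be sketched as an observer that turns each invocation event into subject-predicate-object triples. In this illustration an in-memory Python set stands in for the RDF triple store, and the class name, method names and URNs are all hypothetical; only the predicate names are taken from the figure.

```python
class ProvenanceCollector:
    """Records workflow events as (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()  # stand-in for an RDF triple store

    def on_invocation(self, invocation, inputs, outputs):
        """Handle one 'service invoked' event from the workflow engine."""
        for item in inputs:
            self.triples.add((invocation, "input", item))
        for item in outputs:
            self.triples.add((invocation, "output", item))
            for source in inputs:
                # Each output is directly derived from every input of the call.
                self.triples.add((item, "directlyDerivedFrom", source))

    def query(self, subject=None, predicate=None, obj=None):
        """Return sorted triples matching the pattern; None matches anything."""
        pattern = (subject, predicate, obj)
        return sorted(
            triple for triple in self.triples
            if all(want is None or want == got for want, got in zip(pattern, triple))
        )
```

A browser or query API would then sit on top of `query`, for example asking for every triple with the predicate `directlyDerivedFrom`.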

30

31 Provenance Tracking: Which Ensembl gene does pathway mmu come from?

32 Workflows over Results: Automatically backtrack through the data provenance graph. [Table columns: Pathway_id, KEGG_id, Uniprot, Ensembl_gene_id, Entrez; stray label "dF"]
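
Backtracking through a provenance graph amounts to following derivation links from a result to its sources. The sketch below assumes a hypothetical `DERIVED_FROM` mapping with placeholder identifiers; the order of the links (pathway, then KEGG, then Uniprot, then Ensembl) is a guess based on the column names, not something the slide states.

```python
# Hypothetical derivation links: each item maps to the items it was derived from.
DERIVED_FROM = {
    "pathway:P1": ["kegg:K1"],
    "kegg:K1": ["uniprot:U1"],
    "uniprot:U1": ["ensembl:E1"],
}


def backtrack(item, wanted_prefix, derived_from):
    """Return all ancestors of item whose identifier starts with wanted_prefix."""
    found, stack, seen = [], [item], set()
    while stack:
        current = stack.pop()
        for parent in derived_from.get(current, []):
            if parent in seen:
                continue  # already visited through another path
            seen.add(parent)
            if parent.startswith(wanted_prefix):
                found.append(parent)
            stack.append(parent)
    return found
```

Answering "which Ensembl gene does this pathway come from?" then becomes `backtrack("pathway:P1", "ensembl:", DERIVED_FROM)`.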

33 A workflow marketplace

34 webTaverna GUI - main

35 Overview: The situation in -omics; Creating new biology using Taverna; Taverna key traits; Features on the OMII roadmap (including today’s release)

36 [Release process diagram labels] Ingest; user groups: Pioneers, Early adopters, Conservatives; releases: myGrid Pre-release, myGrid Release, OMII-UK Release; Software Engineering (XP); Software Engineering, Quality & Test; Evaluation; OMII Software Engineering, Quality & Test; Prioritise & Plan; Production Applications & Professional Services; myGrid Alliance; SourceForge community

37 Who are the OMII Users? Increasing variation in requirements with the scientific domain. Different scientific/research domains and different activities: End Users; Application Developers; Service and Middleware Developers; Middleware Deployers; Systems Administrators

38 Taverna is now part of OMII-UK: Taverna 1.5 (today!); Taverna 1.6; myExperiment

39 Taverna 1.5: Integrated provenance; Raven release mechanism to simplify updates for the user; +/- 300 semantic annotations for core services; Patterns for using proxies for bulk data transactions; Redeveloped plug-in and enactor framework, improved iteration events, data management

40 Taverna 1.5: Integrated provenance

41 Taverna 1.5: Integrated provenance; Raven release mechanism to simplify updates for the user

42 Taverna 1.5: Integrated provenance; Raven release mechanism to simplify updates for the user; +/- 300 semantic annotations for core services. Example annotations: Add_ncbi_to_string: beanshell script, need to ask Paul for more details; Input: ; Output: . Kegg_gene_ids_all_species (bconv): converts external IDs to KEGG IDs [mapping]; string: External ID, e.g. NCBI ID [Genebank_GI]; return: KEGG gene ID [KEGG_record_id]. Get_pathways_by_genes: Search all pathways which include all the given genes [Searching]; Input: List of KEGG genes id [KEGG_gene_id]; Output: Return a list of pathway_id of specified KEGG genes_id. Merge_pathways: Stringlist Concatenated. Workflow description: This workflow takes in Entrez gene ids then adds the string "ncbi-geneid:" to the start of each gene id. These gene ids are then cross-referenced to KEGG gene ids. Each KEGG gene id is then sent to the KEGG pathway database and its relevant pathways returned.
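
The first step of the workflow described above is simple enough to sketch directly. The original is a Beanshell script whose details the slide leaves open, so this Python version is only an assumed equivalent; the function name echoes the processor name and the gene ids in the usage note are placeholders.

```python
def add_ncbi_to_string(entrez_gene_ids):
    """Prefix each Entrez gene id with 'ncbi-geneid:' ready for the KEGG lookup."""
    return [f"ncbi-geneid:{gene_id}" for gene_id in entrez_gene_ids]
```

For example, `add_ncbi_to_string(["12345", "67890"])` yields `["ncbi-geneid:12345", "ncbi-geneid:67890"]`, which is the form the slide says is cross-referenced to KEGG gene ids in the next step.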

43 Taverna 1.5: Integrated provenance; Raven release mechanism to simplify updates for the user; +/- 300 semantic annotations for core services; Patterns for using proxies for bulk data transactions; Redeveloped plug-in and enactor framework, improved iteration events, data management

44 Taverna 1.6: Due out Summer 2007. Revised enactment core; Native support for long running workflows; Data proxy to deal with bulk data transactions; Improved service discovery and provenance management

45 Future Directions: myExperiment pilot prototype; Enhancements to the Workflow Core; Enhancements to user interface and experience; Expanded use of semantic web technologies; Engagement with new user communities (cheminformatics, humanities, social sciences etc.). Code remains open source and always will.

46 Obtaining Taverna: Taverna is available under the LGPL from our project site on Sourceforge.net. Win32, Solaris / Linux & OS-X. Includes online and downloadable user manual, examples etc. Support via project mailing lists.

47 Conclusions: See plans for Taverna 2.0 on the myGrid wiki. Taverna development is user-driven. Please keep in touch and tell us what you would like to see via the myGrid mailing lists: Taverna Users, Taverna Hackers. [Logos: Taverna, myGrid, OMII-UK]

48 Acknowledgements: Phase 1 myGrid researchers; Phase 2 OMII-UK, myGrid Research Team. Peter Li, Paul Fisher, Andy Brass, Robert Stevens, Mark Wilkinson. EPSRC, Wellcome Foundation, EU.

