Taverna workflow management system


1 Taverna workflow management system
Stian Soiland-Reyes, myGrid, School of Computer Science, University of Manchester, UK
UKOLN DevSci: Workflow Tools, Bath,

2 What is myGrid?
An e-Science collaboration since 2001
Not a grid!
Numerous partners involved: University of Manchester, University of Southampton, University of Oxford, EMBL-EBI
Provides sustainable and production-quality software
Supported by OMII-UK, EPSRC and BBSRC
Mixture of developers, bioinformaticians and researchers
Software | Services | Content | Skills | Community

3 Motivation
Challenge: Bioinformatics
Large amounts of data
Many open questions
Numerous freely available public datasets and analysis tools

4 How do I look at all the genes systematically?
Huge amounts of data
QTL regions: 100+ genes
Microarray: 1000+ genes
Next-gen sequencing: vast amounts of data, 10,000+ genes
We now have technologies to help identify candidate genes, but the amount of data generated by these high-throughput methods is still a problem.
A QTL-based investigation can easily produce over 200 genes per chromosome region; with 5 regions, that is already more than 1,000 genes to look at.
Microarray gene expression studies map entire genomes to the chip (mouse: 23,000 genes). The genes that show a change in expression under the studied state are chosen, and these can number in the thousands.
The number of genes quickly overwhelms researchers. How, then, do researchers look at these candidate genes?

5 Manual approach
Search using public web sites and databases: PubMed, UniProt, EBI, BioMart
Copy and paste to web tools for analysis: NCBI BLAST, EBI InterPro
Further processing locally: R, Perl, Python

6 Manual: disadvantages
Scale of analysis task overwhelms researchers: lots of data
User bias and premature filtering of datasets: cherry picking
Hypothesis-driven approach to data analysis
Constant changes in data: problems with re-analysis of data
Implicit methodologies (hyper-linking through web pages)
Error proliferation from any of the listed issues, notably human error

7 Web services and workflows
Web services: technology and standards for exposing code and data resources that can be programmatically consumed by a remote third party
Description of how to interact with the service: parameters, documentation
Workflows: general technique for describing and executing a process
Describe what you want to do and which services to run

8 Taverna workflows
A set of (local and remote) services to analyze or manage data
Nested workflows are also services
Data links connect services, i.e. output from service A is input to services B and C
Describes the desired dataflow instead of process coordination
Automatic iterations
Can customize list handling and control links
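
The dataflow idea on this slide can be illustrated with a minimal Python sketch. It is not how the Taverna engine is implemented; the three services, the `iterate` helper, and the sample gene identifiers are all hypothetical names introduced for illustration. The sketch only shows output from service A feeding services B and C, and a single-item service being applied automatically to every element of a list.

```python
from typing import Callable, Dict, List, Union

Value = Union[str, List["Value"]]


def iterate(service: Callable[[str], str], value: Value) -> Value:
    """Apply a single-item service to a value, mapping over (nested) lists."""
    if isinstance(value, list):
        return [iterate(service, item) for item in value]
    return service(value)


# Hypothetical services standing in for real web services or scripts.
def service_a(gene_id: str) -> str:
    return gene_id.upper()


def service_b(record: str) -> str:
    return f"B({record})"


def service_c(record: str) -> str:
    return f"C({record})"


def run_workflow(gene_ids: Value) -> Dict[str, Value]:
    """Data links: the output of A is the input of both B and C."""
    a_out = iterate(service_a, gene_ids)
    return {"b": iterate(service_b, a_out), "c": iterate(service_c, a_out)}


result = run_workflow(["brca1", "tp53"])
print(result)
```

The workflow author only declares which output feeds which input; the list handling happens in `iterate`, so the same workflow accepts a single identifier or a list of them.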

9 What types of services?
Public/private/secured WSDL/SOAP web services
RESTful web services
Spreadsheet import
Command line tools (local/ssh)
Inline scripts (Beanshell, R)
Java APIs
Customizations: BioMart, BioMoby / SADI, Soaplab
Grid services (Globus, EGEE gLite, caGrid)
… your tool (Plugin tutorial on wiki)

10 Which services?
Taverna is general and can connect to standard web services for any domain
Bioinformatics: from professional third-party organisations providing robust and open data/analysis services, to under-the-desk web services for one particular purpose, run by PhD students
Services from 130 providers: crowd-sourced and quality-monitored

12 Taverna workbench
Graphical desktop tool
No server installation required
Drag-and-drop services into diagram
Connect services, run, reconnect, rerun
Integrates diverse set of tools

16 Sharing workflows
myExperiment.org allows users to share, find, download and rate workflows
“Facebook for the scientist”
3000 members, 1100 workflows

17 Extensible UI and engine
Plugins can provide new “perspectives”, e.g. BioCatalogue, myExperiment
Provide service-specific customization: BioMart interface replicates web site
Adding new functionality: looping, branching, dynamic service resolution
New service types
Design helpers, “Find matching service”

18 Taverna 3 “Next-gen”
Under development for 2011
Interactive, component-centric and data-centric workflow design
Pre-packaged workflow components
Searching for workflow components from BioCatalogue and myExperiment
New myGrid workflow components library

19 Taverna command line
Executes from Windows/Linux/OS X shells
Takes a predefined workflow with files as inputs and outputs
Quick way to “productionize” a workflow
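
As a rough sketch of what "files as inputs and outputs" could look like when scripted, the Python below assembles a command line for the tool. The launcher name `executeworkflow.sh` and the `-inputfile` and `-outputdir` options are assumptions that should be checked against the Taverna command line documentation for the installed version; the workflow file, port name, and paths are hypothetical.

```python
import shlex
from typing import Dict, List


def build_command(
    workflow: str,
    input_files: Dict[str, str],
    output_dir: str,
    launcher: str = "executeworkflow.sh",  # assumed launcher script name
) -> List[str]:
    """Build an argument list: one input file per workflow input port."""
    cmd = [launcher]
    for port, path in sorted(input_files.items()):
        cmd += ["-inputfile", port, path]  # assumed option name
    cmd += ["-outputdir", output_dir, workflow]  # assumed option name
    return cmd


# Hypothetical workflow file, input port name, and paths.
cmd = build_command("analysis.t2flow", {"sequences": "seqs.fasta"}, "results")
print(shlex.join(cmd))
```

On a machine with the command line tool installed, the list could be passed to `subprocess.run(cmd, check=True)`, for example from a cron job or a batch script.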

20 Taverna Server
REST/SOAP interface to execute workflows
Client libraries for Ruby and Java
Two demonstration web interfaces: Ruby, Java portlets
Future: detailed execution support and control, security delegation
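
To make the REST interface a little more concrete, here is a hedged Python sketch that prepares (but does not send) an HTTP request submitting a workflow document to a server. The base URL, the `/rest/runs` path, and the media type are assumptions for illustration and may not match a given Taverna Server release; consult the server's own interface documentation before relying on them.

```python
import urllib.request

# All three values below are assumptions, not confirmed by the slides.
BASE_URL = "http://localhost:8080/taverna-server"  # hypothetical deployment
RUNS_PATH = "/rest/runs"  # assumed resource for creating runs
T2FLOW_TYPE = "application/vnd.taverna.t2flow+xml"  # assumed media type


def build_submit_request(
    workflow_xml: bytes, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Prepare a POST request carrying the workflow document as its body."""
    return urllib.request.Request(
        url=base_url + RUNS_PATH,
        data=workflow_xml,
        headers={"Content-Type": T2FLOW_TYPE},
        method="POST",
    )


# Placeholder body; a real call would read the workflow file from disk.
request = build_submit_request(b"<workflow/>")
print(request.get_method(), request.full_url)
```

Sending the request with `urllib.request.urlopen(request)` would only work against a running server; the Ruby and Java client libraries mentioned on the slide wrap this kind of exchange.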

21 Taverna portlet
Example portlet implementation
Executes workflows using Taverna Server

23 Ruby web interface
Example customized web interface
Uses Ruby gem t2-server

24 Taverna on the cloud
Use case: SNP analysis and annotation of genomes sequenced from breeds of cows in Africa. Why are some of them resistant to X?
Amazon EC2 with Taverna Server and local services
Custom (built-in-a-week) Ruby on Rails web interface
Runs through 31 chromosomes in 6.5 hours using 10 instances: $26
Optimised: 2 hours for 10 instances
£10,000s for SNP retrieval in 1 week
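
The run figures above imply a per-instance price that can be worked out in a few lines of Python. This is a back-of-the-envelope sketch, assuming cost scales linearly with instance-hours and ignoring any rounding up of partial hours; the per-hour rate and the cost of the optimised run are derived here and are not stated on the slide.

```python
def instance_hours(instances: int, hours: float) -> float:
    """Total compute consumed, in instance-hours."""
    return instances * hours


# Figures from the slide: 10 instances, 6.5 hours, $26 in total.
first_run = instance_hours(10, 6.5)
implied_rate = 26 / first_run  # dollars per instance-hour (derived)

# Optimised run from the slide: 10 instances for 2 hours.
optimised_run = instance_hours(10, 2)
optimised_cost = implied_rate * optimised_run  # derived estimate, not a slide figure

print(f"{first_run:.0f} instance-hours at ${implied_rate:.2f}/hour")
print(f"optimised: {optimised_run:.0f} instance-hours, about ${optimised_cost:.2f}")
```

Under these assumptions the first run used 65 instance-hours at roughly $0.40 each, and the optimised run would cost about $8.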

26 Open source, open development
The Taverna suite of tools is all open source and free to use
Large user community, active mailing lists
Lead developers: myGrid in Manchester
Contributors from across the world
PAL programme
myGrid provides training, tutorials and documentation

27 Acknowledgements

28 Norman missing! And Olga. And Wei.

29 More information
http://www.mygrid.org.uk/
http://www.taverna.org.uk/

