Introduction to Workflows with Taverna and myExperiment Aleksandra Pawlik University of Manchester materials by Dr Katy Wolstencroft.


1 Introduction to Workflows with Taverna and myExperiment Aleksandra Pawlik University of Manchester materials by Dr Katy Wolstencroft

2 Why are workflows important? The 21st century is the century of information. More data will be produced in the next 5 years than in the entire history of humankind (NESC e-Science strategy, 2008)

3 Data Deluge: eGovernment; World Bank data; climate change data; large-scale physics; Large Hadron Collider; astronomy; 'omics data; next-generation sequencing

4 Lots of Resources NAR 2012 – 1500 databases

5 Next Generation Sequencing. 1000 Genome Project: a deep catalog of human genetic variation. 10000 Genome project: a genomic zoo, with DNA sequences of 10,000 vertebrate species, approximately one for every vertebrate genus. Human Microbiome: characterise the microbial communities found at several different sites on the human body

6 Where is the data? In repositories run by major service providers (e.g. NCBI, EBI) In local project stores On web pages On ftp servers No defined formats

7 Distribution Data resources Computational power Researchers and collaborators [Figure: excerpt of a raw nucleotide sequence listing, positions 12181 onwards]

8 What that means for Bioinformatics Sequential use of distributed tools Analysing large data sets Incompatible input and output formats Difficult to record parameter selections It's OK for one gene or one protein, but what about 10,000?

9 Workflow as a Solution Sophisticated analysis pipelines A set of services to analyse or manage data (either local or remote) Data flow through services Control of service invocation Iteration Automation

10 Workflows as a solution Flow of data from one tool to the next is automatic Incompatibilities overcome in the workflow with ‘helper’ services (known as shims) Workflow records parameter values and algorithms Workflows can include data integration and visualisation without the loss of information Iteration over large data sets automatic – ideal for high throughput analysis (e.g. omics)
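The points on slides 9 and 10 (automatic data flow, shims, recorded parameters, automatic iteration) can be sketched in a few lines of Python. This is a minimal illustration only, not how Taverna is implemented: the `Step` and `Workflow` classes, the three stand-in "services" and the `example_db` name are all hypothetical, and the services run locally rather than remotely.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Step:
    """One service in the pipeline, with its fixed parameter values."""
    name: str
    func: Callable[..., Any]
    params: dict = field(default_factory=dict)


@dataclass
class Workflow:
    """Hypothetical mini workflow: chains steps and records what was run."""
    steps: list
    log: list = field(default_factory=list)

    def run_one(self, item):
        data = item
        for step in self.steps:
            # The output of one step automatically becomes the input of the next.
            data = step.func(data, **step.params)
            # Record which step ran, on which item, with which parameters.
            self.log.append(
                {"item": item, "step": step.name, "params": dict(step.params)}
            )
        return data

    def run(self, items):
        # Iteration over the whole data set needs no extra work from the user.
        return [self.run_one(item) for item in items]


def fetch_sequence(gene_id, database):
    # Stand-in for a remote service (assumption): returns FASTA-like text.
    return f">{gene_id}|{database}\nACGTACGT"


def fasta_to_raw(fasta):
    # Shim: converts the first service's output format into the plain
    # sequence string that the next service expects.
    return "".join(fasta.splitlines()[1:])


def gc_content(seq, precision):
    # Fraction of G and C bases, rounded to the requested precision.
    gc = sum(base in "GC" for base in seq)
    return round(gc / len(seq), precision)


workflow = Workflow([
    Step("fetch_sequence", fetch_sequence, {"database": "example_db"}),
    Step("fasta_to_raw (shim)", fasta_to_raw),
    Step("gc_content", gc_content, {"precision": 2}),
])

results = workflow.run(["gene1", "gene2", "gene3"])
print(results)            # [0.5, 0.5, 0.5]
print(len(workflow.log))  # 9 entries: 3 steps for each of 3 genes
```

Running the same `Workflow` object on 3 genes or 10,000 uses the same call, and `workflow.log` keeps the parameter values that were used for every step.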

11 Reproducible Research Preventing non-reproducible research. "An array of errors": http://www.economist.com/node/21528593 Duke University, 2006: prediction of the course of a patient's lung cancer using expression arrays, and recommendations on different chemotherapies from cell cultures, reported in Nature Medicine. 3 different groups could not reproduce the results and uncovered mistakes in the original work

12 If the Analyses were done using Workflows..... Reviewers could re-run experiments and see results for themselves Methods could be properly examined and criticised Mistakes could be pinpointed

13 Different Workflow Systems: Kepler, Triana, BPEL, Ptolemy II, Taverna, VisTrails, Galaxy, Pipeline Pilot

14 Taverna Workbench http://www.taverna.org.uk/ Freely available, open source. Current version 2.4. 80,000+ downloads across versions. Part of the myGrid Toolkit. Windows / Mac OS X / Linux / Unix. Reference: Hull D, Wolstencroft K, Stevens R, Goble C, Pocock MR, Li P, Oinn T. Taverna: a tool for building and running workflows of services. Nucleic Acids Res. 2006 Jul 1;34(Web Server issue):W729-32.

15 Taverna Workflows Part of UK E-Science myGrid project Started in 2001, collaboration across UK Now: Manchester (Goble), Oxford/Southampton (DeRoure) http://www.taverna.org.uk Taverna desktop Client Taverna Server Taverna on the cloud

16 Taverna Workbench Workflow engine to run workflows List of services Construct and visualise workflows Web Services e.g. KEGG Scripts e.g. Beanshell, R Programming libraries e.g. libSBML

17 What are Web Services? NOT the same as services on the web (i.e. web forms) Web services support machine-to-machine interaction over a network Therefore, you can automatically connect to and use remote services from your computer in an automated way
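As a rough illustration of machine-to-machine interaction, the sketch below starts a tiny HTTP service on the local machine and calls it from a program, with no web form or browser involved. Everything here is an assumption made for the example: the `/revcomp` path, the `seq` parameter and the JSON response shape are hypothetical and do not correspond to any real provider's service. Only Python standard-library calls are used.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import ProxyHandler, build_opener


class ReverseComplementHandler(BaseHTTPRequestHandler):
    """Hypothetical service: returns the reverse complement of a DNA string."""

    def do_GET(self):
        query = parse_qs(urlparse(self.path).query)
        seq = query.get("seq", [""])[0].upper()
        revcomp = seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]
        # The reply is structured data for a program, not a page for a human.
        body = json.dumps({"input": seq, "reverse_complement": revcomp}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence the default per-request logging.
        pass


def call_service(base_url, seq):
    """Client side: build the request, send it, parse the structured reply."""
    url = f"{base_url}/revcomp?{urlencode({'seq': seq})}"
    opener = build_opener(ProxyHandler({}))  # ignore any system proxy settings
    with opener.open(url) as response:
        return json.loads(response.read().decode())


# Start the service on a free local port, in a background thread.
server = HTTPServer(("127.0.0.1", 0), ReverseComplementHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

result = call_service(f"http://127.0.0.1:{server.server_port}", "AACG")
print(result["reverse_complement"])  # CGTT

server.shutdown()
server.server_close()
```

Because the reply is structured data, a workflow can pass `result["reverse_complement"]` straight to the next service, which is what makes automation possible.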

18 Using Remote Tools and Services with Taverna Web Services WSDL REST BioMart R-processor Grid Services Local services Beanshell (small, local scripts) Workflows And more.....

19 Who Provides the Services? Open domain services and resources Taverna accesses thousands of services Third party: we don't own them, we didn't build them All the major providers: NCBI, DDBJ, EBI … Enforce NO common data model.

20 How do you use the services? Asynchronous services Simple WSDL services BioMoby 'Semantic' Services

22 Tags Service Description Monitoring Provider Submitter

23 What do Scientists use Taverna for? Astronomy Music Meteorology Social Science Cheminformatics

24 Workflows are: records and protocols (i.e. your in silico experimental method); know-how and intellectual property; hard work to develop and get right; re-usable methods (i.e. you can build on the work of others). So why not share and re-use them?

25 Workflow Repository

26 Just Enough Sharing…. myExperiment can provide a central location for workflows from one community/group myExperiment allows you to say Who can look at your workflow Who can download your workflow Who can modify your workflow Who can run your workflow Ownership and attribution
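The four questions on slide 26 (who can look at, download, modify and run a workflow) amount to a per-action permission check. The sketch below is a hypothetical model for illustration only; it is not myExperiment's actual data model or API, and the class, action and user names are invented.

```python
from dataclasses import dataclass, field

# The four kinds of access listed on the slide.
ACTIONS = ("view", "download", "modify", "run")


@dataclass
class SharedWorkflow:
    """Hypothetical shared workflow with per-action access lists."""
    title: str
    owner: str
    permissions: dict = field(
        default_factory=lambda: {action: set() for action in ACTIONS}
    )

    def grant(self, user, action):
        if action not in ACTIONS:
            raise ValueError(f"unknown action: {action}")
        self.permissions[action].add(user)

    def allowed(self, user, action):
        # The owner keeps full control; others need an explicit grant.
        return user == self.owner or user in self.permissions.get(action, set())


wf = SharedWorkflow(title="Gene set analysis", owner="alice")
wf.grant("bob", "view")
wf.grant("bob", "run")

print(wf.allowed("bob", "run"))      # True
print(wf.allowed("bob", "modify"))   # False
print(wf.allowed("alice", "modify")) # True
```

Keeping each action separate is what allows "just enough" sharing: a collaborator can run a workflow without being able to change it.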

27 Spectrum of Users Advanced users (informaticians) design and build workflows Intermediate users reuse and modify existing workflows or components Others "replay" workflows through a web page (http://www.myexperiment.org): load data, run workflow

28 A Collection of Tools Client User Interfaces Workflow GUI Workbench and 3rd-party plug-ins Workflow Repository Service Catalogue Programming and APIs Web Portals Activity and Service Plug-in Manager Provenance Store Workflow Server Open Provenance Model Secure Service Access, and Programming APIs E-Laboratories

29 Summary – Workflow Advantages Informatics often relies on data integration and large-scale data analysis Workflows are a mechanism for linking together resources and analyses Promote reproducible research Easy to find and use successful analysis methods developed by others with myExperiment

30 More Information Taverna http://www.taverna.org.uk myExperiment http://www.myexperiment.org BioCatalogue http://www.biocatalogue.org

31 Tutorial Using Taverna to design and build workflows Reusing workflows from myExperiment Analyse a gene set from a ChIP-Seq experiment by finding and reusing existing workflows Tutorials are available in the myExperiment group: Cranfield Course - January 2014

