Taverna: A Workbench for the Design and Execution of Scientific Workflows Paul Fisher University of Manchester.


1 Taverna: A Workbench for the Design and Execution of Scientific Workflows Paul Fisher University of Manchester

2 What is a Workflow?
Workflows provide a general technique for describing and enacting a process
A workflow describes what you want to do, not how you want to do it
A simple language specifies how bioinformatics processes fit together
Processes are represented as web services
[Diagram labels: Sequence; RepeatMasker Web Service; GenScan Web Service; Blast Web Service; Predicted Genes out]
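The idea of a workflow as a declared chain of services can be sketched in a few lines of Python. This is a minimal illustration, not Taverna's workflow language: the three functions are hypothetical local stand-ins for the RepeatMasker, GenScan and Blast web services named on the slide, they perform no real analysis, and the order in which they are chained is an assumption.

```python
from functools import reduce
from typing import Callable, List

# A "service" here is any function from one text value to another.
Service = Callable[[str], str]


def repeat_masker(sequence: str) -> str:
    # Hypothetical stand-in: mask lowercase bases with "N".
    return "".join("N" if base.islower() else base for base in sequence)


def gen_scan(sequence: str) -> str:
    # Hypothetical stand-in for gene prediction: just wraps its input.
    return f"genes({sequence})"


def blast(genes: str) -> str:
    # Hypothetical stand-in for a similarity search: just wraps its input.
    return f"hits({genes})"


def run_workflow(steps: List[Service], data: str) -> str:
    # Feed the output of each step into the next one.
    return reduce(lambda value, step: step(value), steps, data)


# The workflow itself is only a description of what to chain (assumed order).
workflow = [repeat_masker, gen_scan, blast]
result = run_workflow(workflow, "ACGTacgtACGT")
print(result)
```

The list `workflow` states what should happen; `run_workflow` decides how it is enacted, which is the separation the slide describes.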

3 What is Taverna?
Taverna enables the interoperation between databases and tools by providing a toolkit for composing, executing and managing workflow experiments
A workbench for chaining services together, replicating dataflow, in the form of workflows
Access to local and remote resources and analysis tools
Automation of data flow
Iteration over large data sets
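Iteration over a data set can be pictured as applying a service that handles one item to every item in a list. The sketch below is an illustration under stated assumptions, not Taverna's implementation: `iterate` and `gc_content` are hypothetical names, and `gc_content` is a local function standing in for a remote analysis service.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def iterate(service: Callable[[T], U], inputs: List[T]) -> List[U]:
    # Apply a single-item service to each item, keeping the input order.
    return [service(item) for item in inputs]


def gc_content(sequence: str) -> float:
    # Hypothetical single-item service: fraction of G and C bases.
    if not sequence:
        return 0.0
    gc_count = sum(1 for base in sequence.upper() if base in "GC")
    return gc_count / len(sequence)


sequences = ["GGCC", "ATAT", "ACGT"]
results = iterate(gc_content, sequences)
print(results)
```

The service is written once for a single sequence; the iteration wrapper is what scales it to the whole data set.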

4 [Figure. Panel labels: Workflow diagram; Tree view of workflow structure; Available services]

5 Who Provides the Services?
YOU
–You write it, Taverna will consume it
Open domain services and resources
Taverna accesses 3000+ services
–We don't own them, and we didn't build them
All the major data providers
–NCBI, DDBJ, EBI …

6 What types of service?
WSDL Web Services
BioMart
R-processor
BioMoby
Soaplab
Local Java services
Beanshell
Workflows as services

7 Who uses Taverna?
~41288 downloads
Systems biology
Proteomics
Gene/protein annotation
Microarray data analysis
Medical image analysis
Heart simulations
High throughput screening
Genotype/Phenotype studies
Health Informatics
Astronomy
Chemoinformatics
Data integration

8 Case Study: Graves' Disease
Autoimmune disease that causes hyperthyroidism
Antibodies to the thyrotropin receptor result in constitutive activation of the receptor and increased levels of thyroid hormone
Original myGrid Case Study
Ref: Li P, Hayward K, Jennings C, Owen K, Oinn T, Stevens R, Pearce S and Wipat A (2004) Association of variations in NFKBIE with Graves' disease using classical and myGrid methodologies. UK e-Science All Hands Meeting 2004

9 Pharmacogenomics
Heavy use of R statistics for clinical data analysis
Association study of Nevirapine-induced skin rash in a Thai population
A systemic (body-wide) allergic reaction with a characteristic rash
–100 cases: rash
–100 cases: no rash (controls)
–10,000 SNPs significantly associated with rash
–Pathway analysis and systems biology
–Prioritising SNPs
–Functional studies
–Diagnostic tools

10 Systems Biology Model Construction [Peter Li, Doug Kell]
Automatic reconstruction of genome-scale yeast metabolism from distributed data in the life sciences, to create and manipulate Systems Biology Markup Language (SBML) models.

11 Sleeping Sickness in African Cattle
http://www.genomics.liv.ac.uk/tryps/trypsindex.html
Andy Brass, Steve Kemp, Paul Fisher
Caused by infection with a parasite (Trypanosoma brucei)
Some cattle breeds are more resistant than others
What are the differences between resistant and susceptible cattle?
Can we breed cattle resistant to infection?
Fisher et al (2007) A systematic strategy for large-scale analysis of genotype phenotype correlations: identification of candidate genes involved in African trypanosomiasis. Nucleic Acids Res. 35(16):5625-33

12 Why were the Workflow Approaches Successful?
Workflow analysed each piece of data systematically
–Eliminated user bias and premature filtering of datasets and results leading to single-sided, expert-driven hypotheses
The size and amount of the data made a manual approach impractical
Workflows capture exactly where data came from and how it was analysed
Workflow output produced a manageable amount of data for the biologists to interpret and verify
–“make sense of this data” -> “does this make sense?”
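Capturing where data came from and how it was analysed can be sketched as a log written alongside each step. This is an illustrative sketch only and not Taverna's provenance model: `ProvenanceRecord` and `run_with_provenance` are hypothetical names, and the two steps are trivial local string operations chosen so the example runs without any external service.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ProvenanceRecord:
    # One entry per executed step: which step ran, on what, producing what.
    step: str
    input_value: str
    output_value: str


def run_with_provenance(
    steps: List[Tuple[str, Callable[[str], str]]], data: str
) -> Tuple[str, List[ProvenanceRecord]]:
    # Run the named steps in order and record every intermediate value.
    log: List[ProvenanceRecord] = []
    value = data
    for name, service in steps:
        output = service(value)
        log.append(ProvenanceRecord(name, value, output))
        value = output
    return value, log


# Hypothetical steps standing in for real analysis services.
steps = [("uppercase", str.upper), ("reverse", lambda s: s[::-1])]
final, log = run_with_provenance(steps, "acgt")
for record in log:
    print(record)
```

Because every step is logged in the same way, the trail from raw input to final result can be replayed or audited later.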

13 In short…
Workflows reduce:
–Scale of analysis task
–User bias and premature filtering
–Hypothesis-driven approach to data analysis
–Constant flux of data: problems with re-analysis of data
–Implicit methodologies (hyper-linking through web pages)
–Error proliferation from any of the listed issues

14 Sharing Experiments
myGrid supports the in silico experimental process for individual scientists
How do you share your results/experiments/experiences with your
–Research group
–Collaborators
–Scientific community
How do you compare your results with others produced by, e.g., Kepler / Triana?

15

16

17 Just Enough Sharing…
myExperiment can provide a central location for workflows from one community/group
myExperiment allows you to say
–Who can look at your workflow
–Who can download your workflow
–Who can modify your workflow
–Who can run your workflow
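The four sharing controls listed on this slide can be modelled as per-user permission sets. The sketch below is a hypothetical model, not the myExperiment API or data model: the class names, the user names and the rule that the owner always has every permission are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Set


class Permission(Enum):
    # The four controls named on the slide.
    VIEW = "view"
    DOWNLOAD = "download"
    MODIFY = "modify"
    RUN = "run"


@dataclass
class SharedWorkflow:
    title: str
    owner: str
    # Maps a user name to the permissions granted to that user.
    grants: Dict[str, Set[Permission]] = field(default_factory=dict)

    def grant(self, user: str, *permissions: Permission) -> None:
        # Add permissions for a user without removing earlier grants.
        self.grants.setdefault(user, set()).update(permissions)

    def allows(self, user: str, permission: Permission) -> bool:
        # Assumption: the owner can always do everything.
        if user == self.owner:
            return True
        return permission in self.grants.get(user, set())


# Hypothetical users and workflow title.
wf = SharedWorkflow(title="Example workflow", owner="alice")
wf.grant("bob", Permission.VIEW, Permission.DOWNLOAD)
print(wf.allows("bob", Permission.VIEW), wf.allows("bob", Permission.MODIFY))
```

Keeping the four permissions separate is what makes "just enough" sharing possible: a collaborator can be allowed to view or run a workflow without being able to change it.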

18 Summary
Taverna
–Allows interoperation between local and remote resources
–Allows automated access to, and analysis of, sets of data
–Helps with data integration
–Is extensible and open source, for application embedding
myExperiment
–Allows sharing across particular communities
–Provides a central location for publishing/finding useful workflows

19 myGrid acknowledgements
Carole Goble, Norman Paton, Robert Stevens, Anil Wipat, David De Roure, Steve Pettifer
OMII-UK: Tom Oinn, Katy Wolstencroft, Daniele Turi, June Finch, Stuart Owen, David Withers, Stian Soiland, Franck Tanoh, Matthew Gamble, Alan Williams, Ian Dunlop
Research: Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, Antoon Goderis, Alastair Hampshire, Qiuwei Yu, Wang Kaixuan.
Current contributors: Matthew Pocock, James Marsh, Khalid Belhajjame, PsyGrid project, Bergen people, EMBRACE people.
User Advocates and their bosses: Simon Pearce, Claire Jennings, Hannah Tipney, May Tassabehji, Andy Brass, Paul Fisher, Peter Li, Simon Hubbard, Tracy Craddock, Doug Kell, Marco Roos, Matthew Pocock, Mark Wilkinson
Past Contributors: Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizaukaus, Kevin Glover, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Ananth Krishna, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Juri Papay, Savas Parastatidis, Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Martin Senger, Nick Sharman, Victor Tan, Paul Watson, and Chris Wroe.
Industrial: Dennis Quan, Sean Martin, Michael Niemi (IBM), Chimatica.
Funding: EPSRC, Wellcome Trust.
http://www.mygrid.org.uk
http://www.myexperiment.org

