Taverna and my Grid Open Workflow for Life Sciences Tom Oinn

Slides:



Advertisements
Similar presentations
GRADD: Scientific Workflows. Scientific Workflow E. Science laboris Workflows are the new rock and roll of eScience Machinery for coordinating the execution.
Advertisements

Delivering User Needs: A middleware perspective Steven Newhouse Director.
OMII-UK Steven Newhouse, Director. © 2 OMII-UK aims to provide software and support to enable a sustained future for the UK e-Science community and its.
Using Taverna to access SOAP-based web services Per Larsson CBR
European Life Sciences Infrastructure for Biological Information Rafael C Jimenez ELIXIR CTO EMBL-EBI workshop networks and pathways.
Designing, Executing and Reusing Scientific Workflows Katy Wolstencroft, Paul Fisher, myGrid.
IBM Watson Research © 2004 IBM Corporation BioHaystack: Gateway to the Biological Semantic Web Dennis Quan
Simon Woodman Hugo Hiden Paul Watson Jacek Cala. Outline 1. What is e-Science Central? 2. Architecture and Features 3. Workflows and Applications.
Microsoft Research Faculty Summit David De Roure University of Southampton, UK.
1 Richard White Design decisions: architecture 1 July 2005 BiodiversityWorld Grid Workshop NeSC, Edinburgh, 30 June - 1 July 2005 Design decisions: architecture.
The my Grid project aims to provide middleware layers that make the Information Grid appropriate for the needs of bioinformatics. my Grid is building high.
Data Sources & Using VIVO Data Visualizing Scholarship VIVO provides network analysis and visualization tools to maximize the benefits afforded by the.
The iPlant Collaborative Community Cyberinfrastructure for Life Science Tools and Services Workshop Discovery Environment Overview.
A Semantic Workflow Mechanism to Realise Experimental Goals and Constraints Edoardo Pignotti, Peter Edwards, Alun Preece, Nick Gotts and Gary Polhill School.
© 2011 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 1 August 15th, 2012 BP & IA Team.
Provenance in my Grid Jun Zhao School of Computer Science The University of Manchester, U.K. 21 October, 2004.
C Copyright © 2009, Oracle. All rights reserved. Appendix C: Service-Oriented Architectures.
Metadata Creation with the Earth System Modeling Framework Ryan O’Kuinghttons – NESII/CIRES/NOAA Kathy Saint – NESII/CSG July 22, 2014.
Taverna and my Grid Basic overview and Introduction Tom Oinn
Knowledge based Learning Experience Management on the Semantic Web Feng (Barry) TAO, Hugh Davis Learning Society Lab University of Southampton.
14/11/11 Taverna Roadmap Shoaib Sufi myGrid Project Manager.
Designing, Executing, Reusing and Sharing Workflows: Taverna and myExperiment Supporting the in silico Experiment Life Cycle Katy Wolstencroft Paul Fisher.
Presenting Statistical Data Using XML Office for National Statistics, United Kingdom Rob Hawkins, Application Development.
Fundamentals of Database Chapter 7 Database Technologies.
IPlant Collaborative Tools and Services Workshop iPlant Collaborative Tools and Services Workshop Collaborating with iPlant.
Taverna Workflow. A suite of tools for bioinformatics Fully featured, extensible and scalable scientific workflow management system – Workbench, server,
A framework to support collaborative Velo: Knowledge Management for Collaborative (Science | Biology) Projects A framework to support collaborative 1.
HyperContent 2.0 Common Solutions Group September 21, 2005 Alex Vigdor, Columbia University.
Taverna Workflows for Systems Biology Katy Wolstencroft School of Computer Science University of Manchester.
The iPlant Collaborative Community Cyberinfrastructure for Life Science Tools and Services Workshop Discovery Environment Overview.
Tom Oinn, In general a grid system is, or should be : “A collection of a resources able to act collaboratively in pursuit of an overall.
Anil Wipat University of Newcastle upon Tyne, UK A Grid based System for Microbial Genome Comparison and analysis.
Database Systems: Design, Implementation, and Management Eighth Edition Chapter 14 Database Connectivity and Web Technologies.
Quality views: capturing and exploiting the user perspective on data quality Paolo Missier, Suzanne Embury, Mark Greenwood School of Computer Science University.
NA-MIC National Alliance for Medical Image Computing UCSD: Engineering Core 2 Portal and Grid Infrastructure.
CaliBayes and BASIS: e-Science applications for Systems Biology research Yuhui Chen Institute for Ageing and Health Centre for Integrated Systems Biology.
Presented by Scientific Annotation Middleware Software infrastructure to support rich scientific records and the processes that produce them Jens Schwidder.
Stian Soiland-Reyes myGrid, School of Computer Science University of Manchester, UK UKOLN DevSci: Workflow Tools Bath,
ICCS WSES BOF Discussion. Possible Topics Scientific workflows and Grid infrastructure Utilization of computing resources in scientific workflows; Virtual.
GRID Overview Internet2 Member Meeting Spring 2003 Sandra Redman Information Technology and Systems Center and Information Technology Research Center National.
Portable Infrastructure for the Metafor Metadata System Charlotte Pascoe 1, Gerry Devine 2 1 NCAS-BADC, 2 NCAS-CMS University of Reading PIMMS provides.
Presented by Jens Schwidder Tara D. Gibson James D. Myers Computing & Computational Sciences Directorate Oak Ridge National Laboratory Scientific Annotation.
Applications and Requirements for Scientific Workflow Introduction May NSF Geoffrey Fox Indiana University.
Cooperative experiments in VL-e: from scientific workflows to knowledge sharing Z.Zhao (1) V. Guevara( 1) A. Wibisono(1) A. Belloum(1) M. Bubak(1,2) B.
Scientific Workflow systems: Summary and Opportunities for SEEK and e-Science.
1 Service Creation, Advertisement and Discovery Including caCORE SDK and ISO21090 William Stephens Operations Manager caGrid Knowledge Center February.
Children’s Health Exposure Analysis Resource (CHEAR) CHEAR Center for Data Science Susan Teitelbaum, PhD November 4, 2015.
Biomedical and Bioscience Gateway to National Cyberinfrastructure John McGee Renaissance Computing Institute
OOI Cyberinfrastructure and Semantics OOI CI Architecture & Design Team UCSD/Calit2 Ocean Observing Systems Semantic Interoperability Workshop, November.
Collaborative Development Services Learning From the Open Source Agile Development Process Richard Kilmer, InfoEther LLC.
High Risk 1. Ensure productive use of GRID computing through participation of biologists to shape the development of the GRID. 2. Develop user-friendly.
ETICS An Environment for Distributed Software Development in Aerospace Applications SpaceTransfer09 Hannover Messe, April 2009.
Taverna, myExperiment and HELIO services Anja Le Blanc Stian Soiland-Reyes Alan Willams University of Manchester.
Workflow and myGrid Justin Ferris IT Innovation Centre 7 October 2003 Life Sciences Grid GGF9.
Maite Barroso – WP4 Workshop – 10/12/ n° 1 -WP4 Workshop- Developers’ Guide Maite Barroso 10/12/2002
De Rigueur - Adding Process to Your Business Analytics Environment Diane Hatcher, SAS Institute Inc, Cary, NC Falko Schulz, SAS Institute Australia., Brisbane,
International Planetary Data Alliance Registry Project Update September 16, 2011.
Infrastructure and Workflow for the Formal Evaluation of Semantic Search Technologies Stuart N. Wrigley 1, Raúl García-Castro 2 and Cassia Trojahn 3 1.
Exploring Taverna 2 Katy Wolstencroft myGrid University of Manchester.
Designing, Executing and Sharing Workflows with Taverna 2.4 Different Service Types Katy Wolstencroft Helen Hulme myGrid University of Manchester.
Deployment of Flows Loretta Auvil
University of Chicago and ANL
Grid Portal Services IeSE (the Integrated e-Science Environment)
Alan Williams, Donal Fellows, Finn Bacall,
Module 01 ETICS Overview ETICS Online Tutorials
Taverna workflow management system
Shim (Helper) Services and Beanshell Services
Automation of Control System Configuration TAC 18
MSDI training courses feedback MSDIWG10 March 2019 Busan
Presentation transcript:

Taverna and my Grid Open Workflow for Life Sciences Tom Oinn

What, who, why? Taverna – a workflow development and enactment environment Who – part of myGrid, an EPSRC funded UK eScience Pilot project coordinated by Carole Goble at Manchester University Why – because bioinformatics is hard enough without turning users into web spiders

Old approach Cut and paste, cgi, shell scripting, ftp, excel Time intensive Manual process, fails to scale sensibly Hard to document and reproduce Good scientific discipline hard to maintain Boring, waste of highly trained scientists

Our approach Capture the scientific method as a formal process model Allow users to construct such models from libraries of available components in a graphical editing environment with semantic support Publish process definitions as scientific methods, enact and automatically scale to large data sets, multiple runs Automatically collect enactment metadata – workflow provenance.

What can we integrate? Web services defined by WSDL Pathport, BIND, Gene Ontology, DBFetch, FASTA, InterproScan, NCBI eUtils… Complex analysis services conforming to Life Science Analysis Engine (LSAE) specification EMBOSS, Jess, any arbitrary legacy C, PERL or Shell script BioMoby services ( PlaNeT, IRI, Spanish Bioinformatics Network, Genome Prairie… Biomart Database Queries Ensembl, DbSNP, VEGA… Local embedded scripts via Java, Perl, Python, Ruby etc. Seqhound Genomic data warehouse Genbank, LocusLink, GO Styx Grid Service Environmental eScience, ocean temperature analysis etc Arbitrary 3 rd Party APIs i.e. BioJava, JUMBO, caBIG

Comparative Genomics BiomartAndEMBOSSAnalysis.xml

Functional Analysis Workflow… CompareXandYFunctions.xml

…and result

Philosophy Open world approach for services Do not require service providers to change Maximize interoperability Extend on demand Minimalist functional core, declarative language, many plugin extension points Open development approach as well LGPL License Transparent, public development process CVS, Mailing lists, website are all public at all times Avoid institutional ‘ownership’ of code to safeguard long term future development

Taverna network architecture diagram

Implicit Iteration Allows services to consume collections of items without service modification Equivalent to higher order map functions Graphical configuration Intuitively understood by our user community Scares computer scientists

Workflow summary views Diagram and HTML report of the structure of and resources used by the workflow Intended to be added to papers, websites etc. Can be used by portals, workflow repositories Supports reuse – very important!

Semantic and Naïve Search Find services by name or… …by function, input types, resources

Successful? Over 1200 downloads of the workbench software for release 1.0 Averaging downloads / day for release 1.1 Slightly scary 220 downloads in three days for 1.2 Over 100 active mailing list participants Over 1300 available services Used across the world in widely differing projects, mostly but not all in bioinformatics (some cheminformatics) Active external developer community!

Taverna User Support Taverna has a self supporting user community Access help from other users and from the project developers via our mailing lists All accessible from We have a user manual! Please use it

Where next? Funding Core myGrid project has completed (3 years) Follow-on platform grant for core team until 2008 Associated consumer / helper projects Comparagrid, EMBRACE, iSpider… Will be used to… Enhance the scalability of the workflow core Investigate new interfaces (Dalec, Data driven workbench…)

Schedule 1.3 Release in September Final version 1 release Moving to 2.0 with new workflow core by end 2005

Acknowledgements my Grid is an EPSRC funded UK eScience Program Pilot Project Particular thanks to the other members of the Taverna project,