myGrid and Taverna: Now and in the Future
Dr. K. Wolstencroft, University of Manchester



Background
myGrid middleware components to support in silico experiments in biology
Originally designed to support bioinformatics
chemoinformatics
health informatics
medical imaging
integrative biology

History
EPSRC-funded UK eScience Program Pilot Project

myGrid in OMII-UK
Diagram labels: OGSA-DAI; myGrid; OMII Stack
March Developers
Dedicated design, implementation, testing and support team
– moving towards production-quality software

Lots of Resources
NAR 2006 – over 850 databases

The User Community
Bioinformatics is an open community
– Open access to data
– Open access to resources
– Open access to tools
– Open access to applications
Global in silico biological research

The User Community
Problems
Everything is distributed
– Data, resources and scientists
Heterogeneous data
Very few standards
– I/O formats, data representation, annotation
– Everything is a string!
Integration of data and interoperability of resources is difficult

ID   MURA_BACSU     STANDARD;      PRT;   429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC ) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE   ENOLPYRUVYL TRANSFERASE) (EPT).
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
OC   BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE;
OC   BACILLUS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
FT   ACT_SITE BINDS PEP (BY SIMILARITY).
FT   CONFLICT S -> A (IN REF. 3).
SQ   SEQUENCE 429 AA; MW; 02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
     GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP
     RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT
     IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI
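The "everything is a string" problem can be made concrete with a small sketch. The Python below is not part of myGrid or Taverna; it is an illustrative parser that assumes the usual flat-file layout (a two-letter line code, then spaces, then the value, with indented lines holding sequence data). The function name `parse_entry` and the shortened excerpt are assumptions made for the example.

```python
from collections import defaultdict

# Shortened excerpt of the entry shown on the slide (column spacing assumed).
ENTRY = """\
ID   MURA_BACSU     STANDARD;      PRT;   429 AA.
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
SQ   SEQUENCE 429 AA;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP
"""


def parse_entry(text):
    """Group flat-file lines by their two-letter line code (hypothetical helper)."""
    fields = defaultdict(list)
    residues = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("  "):
            # Indented lines carry sequence data; drop the spacing between blocks.
            residues.append("".join(line.split()))
        else:
            code, _, value = line.partition(" ")
            fields[code].append(value.strip())
    record = {code: " ".join(values) for code, values in fields.items()}
    record["sequence"] = "".join(residues)
    return record


record = parse_entry(ENTRY)
# Even after parsing, structured values such as keywords are still plain strings
# that each consumer has to split for itself.
keywords = [k.strip() for k in record["KW"].rstrip(".").split(";")]
print(record["ID"].split()[0], keywords, record["sequence"][:10])
```

Every tool that consumes such an entry has to repeat this kind of ad hoc splitting, which is one reason integration across resources is hard.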

myGrid Approach - Workflows
General technique for describing and enacting a process
Describes what you want to do, not how you want to do it
Simple language specifies how bioinformatics processes fit together
– processes are web services
High-level workflow diagram separated from any lower-level coding
– therefore, you don’t have to be a coder to build workflows
Diagram: Sequence -> RepeatMasker web service -> GenScan web service -> BLAST web service -> Predicted genes out
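The idea of describing what to do separately from how it is done can be sketched in a few lines of Python. This is not Scufl and not how Taverna is implemented: the three functions are hypothetical local stand-ins for the web services named on the slide, and `enact` is an assumed name for a toy enactor.

```python
# Hypothetical stand-ins for the three web services named on the slide.
def repeat_masker(sequence):
    """Mask lower-case (repeat) bases with 'N'."""
    return "".join("N" if base.islower() else base for base in sequence)


def gen_scan(masked_sequence):
    """Pretend gene prediction: every unmasked run of 3+ bases is a 'gene'."""
    return [run for run in masked_sequence.split("N") if len(run) >= 3]


def blast(genes):
    """Pretend similarity search: one empty report per predicted gene."""
    return [{"query": gene, "hits": []} for gene in genes]


SERVICES = {"RepeatMasker": repeat_masker, "GenScan": gen_scan, "Blast": blast}

# The workflow itself is only data: which services to use, and in which order.
WORKFLOW = ["RepeatMasker", "GenScan", "Blast"]


def enact(workflow, services, data):
    """Toy enactor: feed the output of each step into the next one."""
    for step in workflow:
        data = services[step](data)
    return data


reports = enact(WORKFLOW, SERVICES, "ATGGCCaaattTTGACG")
print([report["query"] for report in reports])
```

Because `WORKFLOW` is plain data, a step can be reordered or swapped without touching the service code, which is the separation the slide describes.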

Workflow Execution
Components: Taverna Workbench; Freefluo workflow enactor; Scufl + Workflow Object Model
Processor types: plain web service; Soaplab; local application; BioMOBY; SeqHound; BioMart
Application data flow layer (Scufl)
Scufl graph + service introspection
Execution flow layer: list management; implicit iteration mechanism; MIME & semantic type decoration; fault management; service alternates
Processor invocation layer
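The implicit iteration mechanism listed for the execution flow layer can be illustrated with a minimal Python sketch. It covers only the simplest case, a processor with one input port that expects a single item, and it is an assumption-laden illustration rather than Taverna's actual code; `invoke` and `sequence_length` are hypothetical names.

```python
def invoke(processor, value):
    """Apply a single-item processor, iterating implicitly over (nested) lists."""
    if isinstance(value, list):
        return [invoke(processor, item) for item in value]
    return processor(value)


def sequence_length(sequence):
    """Hypothetical processor that expects exactly one sequence string."""
    return len(sequence)


single = invoke(sequence_length, "MEKLNIAGGD")
flat = invoke(sequence_length, ["MEK", "LNIAG"])
nested = invoke(sequence_length, [["MEK"], ["LNIAG", "GD"]])
print(single, flat, nested)
```

The workflow author wires a list into a single-item port and gets a list back, without writing the loop.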

Taverna Workflow Components
Scufl: Simple Conceptual Unified Flow Language
Taverna: writing, running workflows & examining results
SOAPLAB: makes applications available
Freefluo: workflow engine to run workflows
Diagram labels: Freefluo; SOAPLAB web service (any application); web service (e.g. DDBJ BLAST)

What Services we Support

User Interaction Handling
Interaction Service and corresponding Taverna processor allow a workflow to call out to an expert human user
– Used to embed the Artemis annotation editor within an otherwise automated genome annotation pipeline
– Collaboration with the University of Bergen
– Ref: Poster, Nettab 2005
R for numerical analysis (microarray informatics amongst others)

What shall I do when a service fails?
Most services are owned by other people
No control over service failure
Some are research level
Workflows are only as good as the services they connect!
To help, Taverna can:
– Notify failures
– Instigate retries
– Set criticality
– Substitute services
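The four coping strategies listed above (notify, retry, criticality, substitution) can be sketched as one wrapper function. This is an illustrative Python sketch, not Taverna's fault-handling code; the function name, its parameters and the two fake BLAST services are all assumptions.

```python
def invoke_with_fault_handling(service, payload, retries=2, alternates=(),
                               critical=True, notify=print):
    """Try a service, retry on failure, then fall back to substitute services."""
    for candidate in (service, *alternates):
        for attempt in range(1, retries + 2):      # first call plus `retries` retries
            try:
                return candidate(payload)
            except Exception as error:
                notify(f"{candidate.__name__} failed (attempt {attempt}): {error}")
    if critical:
        # A critical failure stops the whole workflow.
        raise RuntimeError("critical service failed; stopping the workflow")
    return None                                    # non-critical: carry on without a result


def primary_blast(sequence):
    """Hypothetical service that is always down."""
    raise ConnectionError("service unavailable")


def mirror_blast(sequence):
    """Hypothetical substitute service."""
    return f"report for {sequence}"


messages = []
result = invoke_with_fault_handling(primary_blast, "MEKLNIAGGD", retries=1,
                                    alternates=[mirror_blast], notify=messages.append)
optional = invoke_with_fault_handling(primary_blast, "MEKLNIAGGD", retries=0,
                                      critical=False, notify=messages.append)
print(result, optional, len(messages))
```

In the first call the primary service fails twice before the substitute answers; in the second call the failure is tolerated because the step is marked non-critical.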

myGrid Users
~20000 downloads
Users in US, Singapore, UK, Europe, Australia
Systems biology
Proteomics
Gene/protein annotation
Microarray data analysis
Medical image analysis

Trypanosomiasis Study
Resistance to trypanosomiasis in cattle in Kenya
Andy Brass, Paul Fisher – University of Manchester
Form of sleeping sickness in cattle
– Known as n’gana
– Caused by Trypanosoma brucei

Study involves
Microarray data
QTL
SNPs
Metabolic pathway analysis
Need to access microarray data, genomic sequence information, pathway databases AND integrate the results

Workflow Reuse
Addison's disease
SNP design
Protein annotation
Microarray analysis
myGrid Workflow Repository

Components designed to work together
Workflows: Scufl + Taverna Workflow Workbench
OGSA-Distributed Query Processing
Results management: LSID, mIR
e-Science coordination: e-Science mediator, e-Science process patterns, e-Science events, notification service
myGrid information model
Metadata & provenance management using semantics: KAVE
Service management: publication and discovery using semantics (Feta, Pedro, ontology)
Portal & application tools

Data Management
Workflows can generate vast amounts of data - how can we manage and track it?
Data AND metadata AND experiment provenance
LSIDs - to identify objects
Semantic Web technologies (RDF, ontologies)
– To store knowledge provenance
Taverna workflow workbench & plugins
– Ensure automated recording

KAVE
Data and metadata management
Life Science Identifiers (LSIDs)
Information Model
File management
Support for custom database building
Provenance metadata capture using RDF
SRB integration
OGSA-DAI integration
Provenance graph (diagram):
– Regions: data generated by services/workflows; concepts; services; properties; literals
– Data and invocation nodes: urn:data1, urn:data2, urn:data:3, urn:data: f1, urn:data: f2, urn:hit1 … urn:hit50, urn:BlastNInvocation3, urn:compareinvocation3, urn:invocation 5
– Concepts and types: Blast_report, SwissProt_seq, Sequence_hit, DatumCollection, LSDatum, "Find similar sequence" task, New sequence, Missed sequence
– Edge labels: [input], [output], [directlyDerivedFrom], [distantlyDerivedFrom], [instanceOf], [hasHits], [similar_sequence_to], [performsTask], [contains], [hasName], [type]
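A rough sketch of how derivation links can be computed from recorded inputs and outputs is given below. It uses plain Python tuples in place of an RDF store, and the wiring between the identifiers is an assumption made for illustration (the slide's graph does not survive in the transcript); `directly_derived_from` and `lineage` are hypothetical helper names.

```python
# (subject, predicate, object) statements; the wiring is assumed for illustration.
TRIPLES = [
    ("urn:BlastNInvocation3", "input", "urn:data1"),
    ("urn:BlastNInvocation3", "output", "urn:data2"),
    ("urn:compareinvocation3", "input", "urn:data2"),
    ("urn:compareinvocation3", "output", "urn:data:3"),
]


def directly_derived_from(item, triples):
    """Inputs of every invocation that produced `item`."""
    producers = {s for s, p, o in triples if p == "output" and o == item}
    return {o for s, p, o in triples if p == "input" and s in producers}


def lineage(item, triples):
    """Everything `item` was derived from, directly or at a distance."""
    seen = set()
    frontier = [item]
    while frontier:
        current = frontier.pop()
        for parent in directly_derived_from(current, triples):
            if parent not in seen:
                seen.add(parent)
                frontier.append(parent)
    return seen


print(directly_derived_from("urn:data:3", TRIPLES))
print(sorted(lineage("urn:data:3", TRIPLES)))
```

Only the input and output statements need to be recorded at run time; the derivation relations can then be inferred afterwards.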

Provenance Browsing in Taverna
New in Taverna 1.4

Feta Semantic Discovery
Over 3000 services!
Find services by their function
Questions we can ask:
– Find me all the services that perform a multiple sequence alignment and accept protein sequences in FASTA format as input
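The example question can be sketched as a query over annotated service descriptions. The Python below is a toy illustration and not Feta's data model or API: the service names, the description fields and the small is-a hierarchy are all assumptions.

```python
# Tiny is-a hierarchy of data types (child -> parent); illustrative only.
IS_A = {
    "protein_sequence": "biological_sequence",
    "nucleotide_sequence": "biological_sequence",
    "DNA_sequence": "nucleotide_sequence",
}

# Hypothetical annotated service descriptions.
SERVICES = [
    {"name": "align_proteins", "task": "multiple_sequence_alignment",
     "input": "protein_sequence", "format": "FASTA"},
    {"name": "align_any", "task": "multiple_sequence_alignment",
     "input": "biological_sequence", "format": "FASTA"},
    {"name": "align_dna", "task": "multiple_sequence_alignment",
     "input": "DNA_sequence", "format": "FASTA"},
    {"name": "blastp", "task": "similarity_search",
     "input": "protein_sequence", "format": "FASTA"},
]


def subsumes(general, specific):
    """True if `specific` is `general` or one of its descendants."""
    while specific is not None:
        if specific == general:
            return True
        specific = IS_A.get(specific)
    return False


def find_services(task, input_type, data_format):
    """Services that perform `task` and can accept `input_type` in `data_format`."""
    return [
        s["name"] for s in SERVICES
        if s["task"] == task
        and s["format"] == data_format
        and subsumes(s["input"], input_type)
    ]


matches = find_services("multiple_sequence_alignment", "protein_sequence", "FASTA")
print(matches)
```

A service annotated as accepting any biological sequence also matches a request for protein sequences, which keyword search over service names would miss.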

myGrid Ontology
Ontologies: upper level; task; informatics; molecular biology; bioinformatics; web service
Relations: Specialises; Contributes to
Example concepts: sequence, biological_sequence, protein_sequence, nucleotide_sequence, DNA_sequence, protein_structure_feature
Example services: BLASTp service, Similarity Search Service, BLAST service, InterProScan service

Feta Architecture
Components: Feta Engine; Feta GUI Client; Taverna Workbench; DL Reasoner; Ontology Editor
Roles: Ontologist; User
Diagram labels: Build myGrid Domain Ontology; Classification (in RDF(S)); Obtain classification; Obtain descriptions; Feta descriptions; Service semantic discovery

Annotations
Feta has been available for ~1 year
– Not yet in the release
– Need critical mass of services before release
Annotation experiments with users and domain experts
Domain expert annotations much better
– hiring a full-time annotator; see the myGrid website for details

Integration and visualisation of GD annotation workflow results
Diagram labels: Gene annotation pipeline workflow; Input; Result; Provenance Record; Custom Data Model; Results Integration
Smarter workflow design incorporating visualisation
VBI collaboration

Utopia SeqVista Visualisation

New Plans for Taverna 2.0

Evolving challenges
Long-running, data-intensive workflows
Manipulation of confidential or otherwise protected information
Use with classical grid systems
Interaction with users during workflows

Development
Development of Taverna 2.0
– reworking of the processor model to include dual execution semantics incorporating data and control flow
– enhanced support for long-running workflows
– fully distributed workflow enactment and authoring
– user steering
– large-scale data transfer

Enhanced Processor Model
Modular dispatcher mechanism
– Dynamic service binding
– Recursive invocation
– Data filter implementation
– Retry, failover, back-off behaviours
Transparent third-party data transfers
High-throughput stream handling with implicit iteration semantics

3rd Party Data Transfers
Allows ‘in place’ referencing of data
– Large data sets no longer round-trip between workflow engine and data provider
– Allows restricted access to sensitive data
Automatic de-reference when a reference type is linked to a value type within a workflow
– Connecting a grid service to a web service
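The difference between passing a reference and passing a value can be sketched as follows. This is an illustrative Python sketch rather than the Taverna 2.0 design: the `DataReference` class, the `deliver` function, the `grid://` location and the in-memory store are all assumptions.

```python
class DataReference:
    """Points at data held by a provider instead of carrying the data itself."""

    def __init__(self, location, fetch):
        self.location = location
        self._fetch = fetch

    def dereference(self):
        """Fetch the actual bytes/values from the provider."""
        return self._fetch(self.location)


def deliver(data, accepts_references):
    """Hand data to a port, de-referencing only when the port needs a value."""
    if isinstance(data, DataReference) and not accepts_references:
        return data.dereference()
    return data


# Hypothetical data provider, with a log of every transfer it serves.
STORE = {"grid://provider/genome-42": "ACGT" * 4}
fetches = []


def fetch(location):
    fetches.append(location)
    return STORE[location]


reference = DataReference("grid://provider/genome-42", fetch)
to_grid_service = deliver(reference, accepts_references=True)    # no data moved
transfers_so_far = len(fetches)
to_web_service = deliver(reference, accepts_references=False)    # value fetched once
print(transfers_so_far, len(fetches), to_web_service)
```

As long as every step in a chain accepts references, the workflow engine never holds the data; the transfer happens only at the first port that needs a value.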

Streaming Data
Allow execution of downstream workflow stages on partially complete results from upstream
Diagram: Service 1 -> Service 2 -> Service 3
Non-streaming (Taverna 1): entire iteration must complete at each stage
Streamed data: Service 2 starts operating on partial results from Service 1
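The contrast between the two modes can be sketched with Python generators. The two services are hypothetical arithmetic stand-ins, and the log only records the order in which each stage touches an item; nothing here reflects the actual Taverna implementation.

```python
def service_1(items, log):
    """Hypothetical upstream stage: doubles each item."""
    for item in items:
        log.append(("service_1", item))
        yield item * 2


def service_2(items, log):
    """Hypothetical downstream stage: adds one to each item."""
    for item in items:
        log.append(("service_2", item))
        yield item + 1


def run_batched(inputs):
    """Non-streaming: the whole first stage finishes before the second starts."""
    log = []
    stage_1 = list(service_1(inputs, log))
    results = list(service_2(stage_1, log))
    return results, log


def run_streamed(inputs):
    """Streaming: the second stage consumes each partial result as it appears."""
    log = []
    results = list(service_2(service_1(inputs, log), log))
    return results, log


batched_results, batched_log = run_batched([1, 2])
streamed_results, streamed_log = run_streamed([1, 2])
print(batched_log)
print(streamed_log)
```

Both runs return the same results; only the interleaving differs, with the streamed log alternating between the two services.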

Recursive Invocation
Dispatcher allowing recursive invocation to be plugged into per-operation semantics
Diagram: Receive Input -> Invoke operation -> Gather Result Set -> Test for completion -> Modify Input Set (back to Invoke operation) or Return Result
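The loop in the diagram can be sketched as a small driver function. This is a hedged Python illustration written as a loop, not Taverna's dispatcher; the function name, its parameters and the paged `fetch_page` operation are assumptions chosen to make the loop concrete.

```python
def invoke_until_complete(operation, first_input, is_complete, modify_input,
                          max_rounds=50):
    """Call `operation` repeatedly until `is_complete` accepts its result."""
    gathered = []
    current = first_input                          # receive input
    for _ in range(max_rounds):
        result = operation(current)                # invoke operation
        gathered.append(result)                    # gather result set
        if is_complete(result):                    # test for completion
            return gathered                        # return result
        current = modify_input(current, result)    # modify input set
    raise RuntimeError("operation did not complete within max_rounds")


# Hypothetical paged service: each call returns one page of hits.
PAGES = {
    0: {"hits": ["hit1", "hit2"], "more": True},
    1: {"hits": ["hit3", "hit4"], "more": True},
    2: {"hits": ["hit5"], "more": False},
}


def fetch_page(page_number):
    return PAGES[page_number]


pages = invoke_until_complete(
    fetch_page,
    first_input=0,
    is_complete=lambda result: not result["more"],
    modify_input=lambda page_number, result: page_number + 1,
)
all_hits = [hit for page in pages for hit in page["hits"]]
print(all_hits)
```

The `max_rounds` guard is an addition for safety in the sketch, so a service that never reports completion cannot loop forever.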

Future Direction
Enhancements to the workflow core
Enhancements to user interface and experience
Expanded use of semantic web technologies
Code remains open source and always will

Latest News
See plans for Taverna 2.0 on the myGrid wiki
Taverna development is user-driven
– Please keep in touch and tell us what you would like to see via the myGrid mailing lists: Taverna Users, Taverna Hackers
Bioinformatics curator for service annotation
– Details on the myGrid website

Acknowledgements
The myGrid group – Past and Present
OMII-UK
Carole Goble, Pinar Alper, Tom Oinn, Antoon Goderis, Matthew Gamble, Daniele Turi