Provenance in myGrid
Jun Zhao, School of Computer Science, The University of Manchester, U.K.
21 October, 2004

Outline
• myGrid
• Motivation
• Challenges
• myGrid approach
• Related work
• Conclusions

myGrid Project
• A pilot e-Science project in the U.K.
• Targeted at biologists and bioinformaticians
• Three bio test beds
• Provides middleware services in a Grid environment, orchestrated through workflows

e-Science in silico Experiments (workflows)
• Automate the process of experiments
• Orchestrate distributed resources and Web/Grid services
• Transparent, seamless access to remote data and computation resources
• Increase collaboration and results sharing across multi-scale communities
Figure labels (experiment life cycle): Discovering and reusing experiments and resources; Managing lifecycle, provenance and results of experiments; Sharing services & experiments; Personalisation; Forming experiments; Executing and monitoring experiments; Soaplab

Problems when doing in silico experiments
• Experiments are performed repeatedly, at different sites, at different times, by different users or groups
• Frequently updated resources
• Volatile, distributed environment
Figure labels: Scientists; A large repository of zipped records about experiments!!

Problems when doing in silico experiments
• Experiments are performed repeatedly, at different sites, at different times, by different users or groups
Scientists:
  – verification of data
  – "recipes" for experiment designs
  – explanation for the impact of changes
  – ownership
  – performance of services
  – data quality
PROVENANCE

Provenance Forms
Derivations (process-centric)
  – A workflow log.
  – Linking items, in a directed graph.
  – when, who, how, which, what, where
  – Execution
Annotations (data-centric)
  – Attached to items or collections of items, in a structured, semi-structured or free text form.
  – Annotations on one item or linking items.
  – why, when, where, who, what, how.
Figure labels: items annotated with attribute-value pairs such as mass = 200, decay = WW (also ZZ, bb), stability = 1 (also 3), event = 8, plot = 1, LowPt = 20, HighPt = 10000
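
The two provenance forms on this slide can be illustrated with a small sketch: a derivation log held as edges of a directed graph, and annotations held as notes attached to individual items. This is only an illustration under assumed names; the item identifiers, process names, and annotation keys below are hypothetical and do not come from myGrid.

```python
from collections import defaultdict

# Derivation log (process-centric): each entry is an edge
# (output_item, process, input_item) in a directed graph.
# All identifiers here are made up for illustration.
derivations = [
    ("blast_report", "blast_run", "dna_sequence"),
    ("alignment", "align_run", "blast_report"),
]

# Annotations (data-centric): free-form notes attached to individual items.
annotations = {
    "dna_sequence": {"who": "scientist_a", "why": "candidate gene"},
    "alignment": {"quality": "unchecked"},
}


def lineage(item, edges):
    """Return every item that `item` was directly or indirectly derived from."""
    parents = defaultdict(set)
    for output, _process, source in edges:
        parents[output].add(source)
    seen, stack = set(), [item]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Walk the derivation graph, then look up annotations on each ancestor.
for name in sorted(lineage("alignment", derivations)):
    print(name, annotations.get(name, {}))
```

The derivation edges answer "how was this produced", while the annotation lookup answers "what do we know about this item"; keeping them separate mirrors the process-centric and data-centric split above.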

Challenges
• Cross-referencing across runs and within an experiment
• Provenance of *good* metadata annotation
• Bridging provenance islands
• Moreover…

Challenges: Complex cross-referencing information
• Complex control flow
• Iterative data and process flow
• Repeated runs producing cross-referencing information
• Human interaction activities vs. service invocations
• Service failure and experiment re-composition
Figure labels: Experiment design file; Experiment run with interactions; State controls; Iterative service; Revised experiment; Experiment run with failures

Challenges
Annotations:
  – Mandatory / automatic
  – Who did that
  – How much should be trusted
  – Security control
  – Authenticity validation
  – Quality
  – Cross-referencing
  – Versioning

Challenges: provenance islands
Figure labels: Workflow 1; Service 1; Service 2; Data 1; Experimental Investigation 1; Diverse information; Diverse metadata of information

Moreover
• Intellectual property
• Preservation
• Archiving
• Query and access
• Integration
• Investigation
• Impact analysis
• …

myGrid Approach
• Taverna workflow workbench
  – Provenance plug-in
  – mIR (myGrid Information Repository) plug-in
• myGrid information model
  – Based on the CCLRC scientific metadata model
  – Provides a shared model for interactions between services and components
• Semantic Web technologies
  – RDF (Resource Description Framework)
  – Ontologies
• LSIDs and URNs
Reference: B. Matthews and S. Sufi: The CLRC Scientific Metadata Model, version 1, DL TR 02001, CLRC, February 2001

RDF in a Nutshell
• Resource Description Framework
• Common model for metadata
• A graph of triples
• RDQL, repositories, integration tools, presentation tools
• Jena, Haystack
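
To make "a graph of triples" concrete, here is a minimal sketch in plain Python that stores (subject, predicate, object) triples and answers wildcard pattern queries, which is the basic idea behind query languages such as RDQL. It does not use Jena or any RDF library, and the `ex:` names are hypothetical placeholders rather than myGrid vocabulary.

```python
# A graph is simply a set of (subject, predicate, object) triples.
# The "ex:" identifiers are invented for this example.
graph = {
    ("ex:run42", "ex:runBy", "ex:alice"),
    ("ex:run42", "ex:hasOutput", "ex:report7"),
    ("ex:report7", "ex:derivedFrom", "ex:seq3"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Everything stated about one resource:
print(match(graph, s="ex:run42"))
# Every derivation link in the graph:
print(match(graph, p="ex:derivedFrom"))
```

Because every statement has the same three-part shape, metadata from different sources can be merged by taking the union of their triple sets, which is what makes the model attractive as a common metadata format.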

Figure labels (provenance model)
Levels:
  – Organisation level provenance
  – Process level provenance
  – Data/knowledge level provenance
Classes: project; Experiment design; Workflow design; Workflow run; Person; Organisation; Service; Process; Event; Data; Blast Result; DNA sequence
Relations:
  – instanceOf, partOf, subClass
  – hasInput, hasOutput
  – run for
  – runBy, e.g. NCBI
  – componentProcess, e.g. web service invocation of NCBI
  – componentEvent, e.g. completion of a web service invocation at 12.04pm
  – data derivation, e.g. output data derived from input data
  – knowledge statements, e.g. similar protein sequence to
User can add templates to each workflow process to determine knowledge links between data items.
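
One way to read the "data derivation" example on this slide (output data derived from input data) is that data-level links can be computed from process-level records. The sketch below assumes each process record carries `hasInput` and `hasOutput` lists, as named on the slide; the record layout, the `derivedFrom` link name, and the sample identifiers are assumptions made for illustration only.

```python
# Process-level provenance: one record per service invocation.
# hasInput / hasOutput are the property names from the slide;
# everything else in these records is hypothetical.
process_records = [
    {
        "process": "blast_invocation",
        "hasInput": ["dna_sequence_1"],
        "hasOutput": ["blast_result_1"],
    },
    {
        "process": "filter_invocation",
        "hasInput": ["blast_result_1", "threshold"],
        "hasOutput": ["filtered_hits"],
    },
]


def infer_derivations(records):
    """Link every output of a process to every input of the same process."""
    return sorted(
        (output, "derivedFrom", source)
        for record in records
        for output in record["hasOutput"]
        for source in record["hasInput"]
    )


for link in infer_derivations(process_records):
    print(link)
```

This links every output to every input, which is the coarsest possible assumption; the templates mentioned on the slide would let a user state more specific knowledge links per workflow process.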

Representing links
• Identify link type
  – Again use URI
  – Allows us to use RDF infrastructure
    Repositories
    Ontologies
Example identifiers:
  urn:lsid:taverna.sf.net:datathing:45fg6
  urn:lsid:taverna.sf.net:datathing:23ty3
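
The idea of a typed link between two identified data items can be sketched as follows. The parser relies on the general LSID layout `urn:lsid:authority:namespace:object[:revision]`; the two LSIDs are the ones shown on the slide, but the link-type URI and the direction of the link are placeholders, since the slide does not state them.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text):
    """Split an LSID of the form urn:lsid:authority:namespace:object[:revision]."""
    parts = text.split(":")
    prefix = [p.lower() for p in parts[:2]]
    if len(parts) not in (5, 6) or prefix != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


source = "urn:lsid:taverna.sf.net:datathing:45fg6"
target = "urn:lsid:taverna.sf.net:datathing:23ty3"

# The link type is itself a URI, so the link is an ordinary triple.
# This particular URI is a made-up placeholder, as is the link direction.
link = (source, "urn:example:linktype:derivedFrom", target)

print(parse_lsid(source))
print(link)
```

Because both the items and the link type are URIs, the link fits directly into a triple store, which is the point the slide makes about reusing RDF repositories and ontologies.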

Provenance Web
Figure labels: LSID for GenBank Data; Personalization view

Reflection
• First attempt
• Bridging the islands
• Provenance modelling: relational + schema-less model
• Provenance collection
• Moreover:
  – Provenance slicing
  – Security control
  – Authenticity validation
  – Provenance versioning and long-term preservation

Related Work
• Chimera
• Provenance cross-referencing
  – CombeChem
  – PASOA (Provenance Aware Service Oriented Architecture)
  – CMCS (Collaboratory for Multi-Scale Chemical Science)
  – ESSW (Earth System Science Workbench)

Acknowledgement
  – myGrid team, esp. Carole Goble, Robert Stevens, Chris Wroe, Mark Greenwood, Phil Lord
  – IBM: Dennis Quan
  – Williams Group, esp. Hannah Tipney