
Provenance in myGrid Jun Zhao School of Computer Science The University of Manchester, U.K. 21 October, 2004.


1 Provenance in myGrid Jun Zhao School of Computer Science The University of Manchester, U.K. 21 October, 2004

2 Outline: myGrid; Motivation; Challenges; myGrid approach; Related work; Conclusions

3 myGrid Project http://www.mygrid.co.uk A pilot e-Science project in the U.K.; Targeted at biologists and bioinformaticians; Three bio-test beds: Providing middleware services in a Grid environment, which are orchestrated by means of workflows;

4 e-Science in silico Experiments (workflows) Automate the process of experiments; Orchestrate distributed resources and Web/Grid services; Transparent, seamless access to remote data and computation resources; Increase collaboration and results sharing across multi-scale communities. [Figure labels: Forming experiments; Executing and monitoring experiments; Managing lifecycle, provenance and results of experiments; Sharing services & experiments; Discovering and reusing experiments and resources; Personalisation; Soaplab]

5 Problems when doing in silico experiments Experiments being performed repeatedly, at different sites, different times, by different users or groups; Scientists A large repository of zipped records about experiments!! frequently updated resources; volatile, distributed environment

6 Problems when doing in silico experiments Experiments being performed repeatedly, at different sites, different times, by different users or groups; Scientists verification of data; “recipes” for experiment designs; explanation for the impact of changes; ownership; performance of services; data quality; PROVENANCE

7 Provenance Forms Derivations –A workflow log. –Linking items, in a directed graph. –when, who, how, which, what, where –Execution –Process-centric Annotations –Attached to items or collections of items, in a structured, semi-structured or free text form. –Annotations on one item or linking items. –why, when, where, who, what, how. –Data-centric [Figure: items annotated with attribute-value pairs, e.g. mass = 200, decay = WW, stability = 1, event = 8, plot = 1, LowPt = 20, HighPt = 10000]
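
The contrast between the two forms can be sketched in a few lines of Python. This is only an illustration, not the myGrid implementation: the item names (`dataset_a` and so on) and both helper functions are hypothetical, and only the attribute values are taken from the slide.

```python
# Derivations (process-centric): a directed graph linking each item to the
# items it was directly derived from. Item names are hypothetical.
derived_from = {
    "dataset_c": ["dataset_b"],
    "dataset_b": ["dataset_a"],
    "dataset_a": [],
}

# Annotations (data-centric): attribute/value pairs attached to single items.
# The attribute values are the ones shown on the slide.
annotations = {
    "dataset_b": {"mass": 200, "decay": "WW", "stability": 1},
    "dataset_c": {"mass": 200, "decay": "WW"},
}


def lineage(item, graph):
    """Return all ancestors of `item`, nearest first, by following derivation links."""
    seen = []
    queue = list(graph.get(item, []))
    while queue:
        current = queue.pop(0)
        if current not in seen:
            seen.append(current)
            queue.extend(graph.get(current, []))
    return seen


def find_by_annotation(key, value, notes):
    """Return the sorted names of items annotated with key == value."""
    return sorted(name for name, attrs in notes.items() if attrs.get(key) == value)
```

Derivation queries walk the graph (`lineage("dataset_c", derived_from)`), whereas annotation queries filter on attached metadata (`find_by_annotation("decay", "WW", annotations)`).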

8 Challenges –Cross-referencing across runs and within an experiment –Provenance of *good* metadata annotation –Bridging provenance islands –Moreover…

9 Challenges: Complex cross-referencing information –Complex control flow –Iterative data and process flow –Repetitive runs producing cross-referencing information –Human interaction activities vs. service invocations –Service failure and experiment re-composition [Figure labels: Experiment design file; State controls; Iterative service; Experiment run with interactions; Experiment run with failures; Revised experiment]

10 Challenges Annotations: –Mandatory / automatic –Who did that –How much should be trusted –Security control –Authenticity validation –Quality –Cross-referencing –Versioning

11 Challenges: provenance islands Workflow 1 Service 2 Service 1 Data 1 Experimental Investigation 1 Diverse information Diverse metadata of information

12 Moreover Intellectual property Preservation Archiving Query and access Integration Investigation Impact analysis ……

13 myGrid Approach Taverna workflow workbench –Provenance plug-in; –mIR(myGrid Information Repository) plug-in; myGrid information model –Based on CCLRC scientific metadata model –Providing shared model for services and components interactions Semantic Web technologies –RDF (Resource Description Framework) –Ontologies LSIDs and URNs http://taverna.sourceforge.net http://freefluo.sourceforge.net B. Matthews and S. Sufi: The CLRC Scientific Metadata Model, version 1, DL TR 02001, CLRC, February 2001

14 RDF in a Nutshell Resource Description Framework; Common model for metadata; A graph of triples; RDQL, repositories, integration tools, presentation tools; Jena, Haystack http://www.w3.org/RDF/
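
To make "a graph of triples" concrete, here is a minimal Python sketch of a triple store with wildcard pattern matching, loosely in the spirit of an RDQL query. It uses plain tuples rather than Jena or any other RDF library; the `http://example.org/` namespace, the resource names and the `match` helper are assumptions made for illustration.

```python
# Each statement is a (subject, predicate, object) triple of URIs.
EX = "http://example.org/"  # hypothetical namespace, not from the slides

triples = {
    (EX + "workflow_run_1", EX + "runBy", EX + "person_1"),
    (EX + "workflow_run_1", EX + "hasOutput", EX + "blast_result_1"),
    (EX + "blast_result_1", EX + "derivedFrom", EX + "dna_sequence_1"),
}


def match(store, subject=None, predicate=None, obj=None):
    """Return the sorted triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )
```

For example, `match(triples, subject=EX + "workflow_run_1")` returns every statement made about that run, whatever the predicate.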

15 Organisation level provenance; Process level provenance; Data/knowledge level provenance. Links: data derivation, e.g. output data derived from input data; knowledge statements, e.g. similar protein sequence to; instanceOf; partOf; subClass; componentProcess, e.g. web service invocation of BLAST @ NCBI; componentEvent, e.g. completion of a web service invocation at 12.04pm; runBy, e.g. BLAST @ NCBI run for; hasInput; hasOutput. Nodes: project; Experiment design; Workflow design; Workflow run; Person; Organisation; Service; Process; Event; Data; Blast Result; DNA sequence. User can add templates to each workflow process to determine knowledge links between data items.
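
One way to read the relationship between the process level and the data level is that a derivation link can be inferred from a process record's hasInput and hasOutput links. The sketch below assumes exactly that rule (every output of a process is derived from every one of its inputs); the record layout, the identifiers and the `infer_derivations` helper are hypothetical and are not taken from the myGrid model.

```python
# Process-level provenance: one record per service invocation (hypothetical layout).
process_records = [
    {
        "process": "blast_invocation_1",
        "hasInput": ["dna_sequence_1"],
        "hasOutput": ["blast_result_1"],
    },
]


def infer_derivations(records):
    """Derive data-level links, assuming each output depends on each input."""
    links = []
    for record in records:
        for output in record["hasOutput"]:
            for source in record["hasInput"]:
                links.append((output, "derivedFrom", source))
    return links
```

Knowledge links such as "similar protein sequence to" cannot be inferred this way, which is presumably why the slide has the user supply templates for them.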

16 Representing links Identify link type –Again use URI –Allows us to use RDF infrastructure: Repositories, Ontologies. Example: urn:lsid:taverna.sf.net:datathing:45fg6 http://www.mygrid.org.uk/ontology#derived_from urn:lsid:taverna.sf.net:datathing:23ty3
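
The link on this slide can be written down as a single RDF triple. The Python sketch below serialises it in N-Triples syntax and splits an LSID into its parts. The direction of the link (the first data item derived from the second) is an assumption, as the slide does not state it, and the helper names are hypothetical.

```python
DERIVED_FROM = "http://www.mygrid.org.uk/ontology#derived_from"
ITEM_A = "urn:lsid:taverna.sf.net:datathing:45fg6"
ITEM_B = "urn:lsid:taverna.sf.net:datathing:23ty3"

# Assumption: ITEM_A is the derived item and ITEM_B is its source.
link = (ITEM_A, DERIVED_FROM, ITEM_B)


def to_ntriples(triple):
    """Serialise a triple of URIs as one N-Triples statement."""
    subject, predicate, obj = triple
    return f"<{subject}> <{predicate}> <{obj}> ."


def parse_lsid(lsid):
    """Split an LSID of the form urn:lsid:authority:namespace:object[:revision]."""
    parts = lsid.split(":")
    if len(parts) < 5 or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {lsid}")
    return {"authority": parts[2], "namespace": parts[3], "object": parts[4]}
```

Because the link type is itself a URI, the same statement can be stored in any RDF repository and the link type can be described in an ontology.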

17 Provenance Web LSID for GenBank Data Personalization view

18 Reflection –First attempt –Bridging the island –Provenance modelling: relational + schema-less model –Provenance collection –Moreover: Provenance slicing; Security control; Authenticity validation; Provenance versioning and (long-term) preservation

19 Related Work Chimera: –Provenance cross-referencing –www.griphyn.org/chimera/ CombeChem: –www.combechem.org/ PASOA (Provenance Aware Service Oriented Architecture) –http://twiki.pasoa.ecs.soton.ac.uk/bin/view/PASOA/WebHome CMCS (Collaboratory for Multi-Scale Chemical Science) –http://cmcs.ca.sandia.gov/index.php ESSW (Earth System Science Workbench) –http://essw.bren.ucsb.edu/

20 Acknowledgement –myGrid team: esp. Carole Goble, Robert Stevens, Chris Wroe, Mark Greenwood, Phil Lord –IBM: Dennis Quan –Williams Group: esp. Hannah Tipney

