Personal Data Management


1 Personal Data Management
–Why is this such an issue?
–Data provenance
–Representing links v representing data
–Identifying resources: Life Science Identifiers
–Different types of provenance
–Provenance generation
–Provenance storage
–Provenance retrieval

2 Problem
–Automated workflows produce lots of heterogeneous data.
–These are just some of the results from one workflow run for Williams Disease.

3 Amplification of results
–One input, many outputs

4 Link v Data Representation
Data management questions refer to relationships rather than internal content:
–What are the origins of this data?
–Which service produced this data?
–Which data is this derived from?
–Who was this data produced for?
"What is this data telling me?" is a data analysis question, not a data management question. Data analysis questions are delegated to external services.
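A question such as "Which data is this derived from?" can be answered by following links alone, without opening the data. The sketch below illustrates this under stated assumptions: the item names, the `derived_from` label and the `origins` helper are hypothetical and are not taken from the slides.

```python
from collections import deque

# Hypothetical links between data items: (subject, link type, object).
links = [
    ("c", "derived_from", "b"),
    ("b", "derived_from", "a"),
    ("d", "derived_from", "a"),
]

def origins(item, links):
    """Return every item that `item` is directly or indirectly derived from."""
    parents = {}
    for subject, link_type, obj in links:
        if link_type == "derived_from":
            parents.setdefault(subject, []).append(obj)
    seen = set()
    queue = deque([item])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

ancestors_of_c = origins("c", links)
```

The traversal never inspects the content of any data item, which is the point of separating link representation from data representation.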

5 Representing links
Identify each resource
–Life Science Identifier (LSID): a URI with associated data and metadata retrieval protocols.
–Understanding that the underlying data will not change.
Examples:
urn:lsid:taverna.sf.net:datathing:45fg6
urn:lsid:taverna.sf.net:datathing:23ty3
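An LSID is a URN of the form urn:lsid:authority:namespace:object, with an optional trailing revision. As a minimal sketch, the following splits one of the example identifiers into those parts. The `LSID` tuple and the `parse_lsid` function are illustrative names, not part of any LSID library, and the validation is deliberately loose.

```python
from typing import NamedTuple, Optional

class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None

def parse_lsid(text: str) -> LSID:
    """Split an LSID string into its components (loose validation only)."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    # The "urn:lsid" prefix is matched case-insensitively.
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(part == "" for part in parts[2:5]):
        raise ValueError(f"empty LSID component in: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)

lsid = parse_lsid("urn:lsid:taverna.sf.net:datathing:45fg6")
```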

6 Representing links II
Identify link type
–Again use a URI
–Allows us to use RDF infrastructure: repositories, ontologies
Example resources and link type:
urn:lsid:taverna.sf.net:datathing:45fg6
urn:lsid:taverna.sf.net:datathing:23ty3
http://www.mygrid.org.uk/ontology#derived_from
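Because both the resources and the link type are URIs, a link can be written as a single RDF triple. The sketch below serialises one as an N-Triples line using only string formatting. The direction of the link (45fg6 derived from 23ty3) is an assumption, since the slide does not state it, and `ntriple` is a hypothetical helper that does no escaping.

```python
def ntriple(subject: str, predicate: str, obj: str) -> str:
    """Format three URIs as one N-Triples statement (no escaping performed)."""
    return f"<{subject}> <{predicate}> <{obj}> ."

# Assumed direction: the first data item is derived from the second.
line = ntriple(
    "urn:lsid:taverna.sf.net:datathing:45fg6",
    "http://www.mygrid.org.uk/ontology#derived_from",
    "urn:lsid:taverna.sf.net:datathing:23ty3",
)
```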

7 Provenance (1)
[Diagram: web of provenance]
Entities: Workflow run, Workflow design, Experiment design, Project, Person, Organisation, Process, Service, Event, Data item
Relationships:
–instanceOf
–partOf
–componentProcess, e.g. web service invocation of BLAST @ NCBI
–componentEvent, e.g. completion of a web service invocation at 12.04pm
–runBy, e.g. BLAST @ NCBI
–run for
–data derivation, e.g. output data derived from input data
–knowledge statements, e.g. similar protein sequence to
Levels: Organisation level provenance, Process level provenance, Data/knowledge level provenance
User can add templates to each workflow process to determine links between data items.

8 Storing management metadata
Automated generation of this web of links is preferable. The workflow enactor generates, as RDF:
–LSIDs
–Data derivation links
–Knowledge links
–Process links
–Organisation links
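The idea of an enactor emitting identifiers and links as it runs can be sketched as follows. This is not the Taverna or Freefluo implementation: `ProvenanceRecorder`, the `example.org` authority, the `produced_by` URI and the use of random UUIDs as object identifiers are all assumptions made for illustration. Only the `derived_from` URI comes from the slides.

```python
import uuid

DERIVED_FROM = "http://www.mygrid.org.uk/ontology#derived_from"
# Hypothetical process-link type, not from the slides.
PRODUCED_BY = "http://example.org/provenance#produced_by"

class ProvenanceRecorder:
    """Mints an LSID per data item and records links as triples."""

    def __init__(self, authority="example.org", namespace="datathing"):
        self.authority = authority
        self.namespace = namespace
        self.triples = []

    def mint_lsid(self):
        # Assumption: a random hex string serves as the object identifier.
        return f"urn:lsid:{self.authority}:{self.namespace}:{uuid.uuid4().hex}"

    def record_step(self, process_name, input_ids, output_count):
        """Mint LSIDs for a step's outputs and link them to inputs and process."""
        outputs = [self.mint_lsid() for _ in range(output_count)]
        for out in outputs:
            for inp in input_ids:
                self.triples.append((out, DERIVED_FROM, inp))
            self.triples.append((out, PRODUCED_BY, process_name))
        return outputs

recorder = ProvenanceRecorder()
source = recorder.mint_lsid()
results = recorder.record_step("BLAST", [source], output_count=3)
```

One input fanning out to three outputs yields six triples here, which mirrors the amplification of results described earlier: the web of links grows faster than the data, so generating it by hand is impractical.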

9 Provenance generation
Configuring and generating provenance within Taverna

10 Storage
–LSID has no protocol for storage.
–Taverna/Freefluo implements its own data/metadata storage protocol.
[Diagram labels: Taverna/Freefluo, Publish interface, data, metadata, Data store, Metadata store]

11 Retrieval
–LSID protocol used to retrieve data and metadata.
–Query handled separately.
[Diagram labels: Metadata store, Data store, LSID interface, LSID aware client, Query, RDF aware client]
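The split between identifier-based retrieval and querying can be illustrated with a small in-memory store. This is a sketch under stated assumptions, not a real LSID resolver: `ProvenanceStore` and its method names are hypothetical, with `get_data` and `get_metadata` loosely modelled on the LSID retrieval operations and `query` standing in for the separately handled RDF query path.

```python
class ProvenanceStore:
    """In-memory stand-in for the data store and metadata store."""

    def __init__(self):
        self._data = {}      # LSID -> bytes
        self._metadata = []  # (subject, predicate, object) triples

    def publish(self, lsid, data, triples):
        """Store a data item and its metadata (the 'publish' side)."""
        self._data[lsid] = data
        self._metadata.extend(triples)

    # Identifier-based retrieval: the caller must already know the LSID.
    def get_data(self, lsid):
        return self._data[lsid]

    def get_metadata(self, lsid):
        return [t for t in self._metadata if t[0] == lsid]

    # Query path: pattern matching over all metadata, handled separately.
    def query(self, subject=None, predicate=None, obj=None):
        return [
            t for t in self._metadata
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        ]

a = "urn:lsid:example.org:datathing:a"
b = "urn:lsid:example.org:datathing:b"
link = "http://www.mygrid.org.uk/ontology#derived_from"

store = ProvenanceStore()
store.publish(a, b"ACGT", [])
store.publish(b, b"ACGTACGT", [(b, link, a)])

derived_from_a = store.query(predicate=link, obj=a)
```

An LSID aware client would use only the first two methods, while an RDF aware client would use `query` to discover identifiers it does not yet know.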

12 LSID launchpad
Lightweight plug-in to Internet Explorer providing access to LSID data/metadata
[demo]

13 Using IBM’s Haystack
–GenBank record
–Portion of the web of provenance
–Managing a collection of sequences for review

