LSIDs in Taverna. Daniele Turi, University of Manchester. RDF, Ontologies and Metadata, Edinburgh, 7-9 June 2006.
Outline: Taverna Workbench (workflows of biological services); LSIDs used to identify data, workflows, workflow runs; LSIDs and Named Graphs; LSID Resolution; Security (under development); LSID granularity.
myGrid eScience project: biological workflows; compose web services; execute; discover; audit/provenance. Components: Taverna; Provenance Service; Annotation/Discovery.
Taverna Workbench: large user community in biology; about 1,000 downloads per month; one release every 6 weeks. New feature (released 2 days ago!): collect and browse provenance.
Taverna Workbench
Provenance as RDF: RDF generated automatically as an audit trail; RDF is typed (semantics!); one RDF graph for each workflow run, held as a named graph.
Workflow Run (figure: RDF graph of one run). Nodes: urn:lsid:…:workflow:6, urn:lsid:…:org:HY7, urn:lsid:..:wfRun:HU77I8, urn:lsid:…:person:4, urn:lsid:…:dataItem:K84P, urn:lsid:…:dataItem:51HJ3. Edges: runs, belongsTo, launchedBy, hasInput (twice).
Typed Workflow Run (figure: the workflow run graph typed by the Provenance Ontology). Ontology classes: WorkflowRun, Workflow, DataObject, Experimenter, Organization. Ontology properties: runs, belongsTo, launchedBy, hasInput. Instance nodes: urn:lsid:…:workflow:6, urn:lsid:…:org:HY7, urn:lsid:..:wfRun:HU77I8, urn:lsid:…:person:4, urn:lsid:…:dataItem:K84P, urn:lsid:…:dataItem:51HJ3.
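The typed graph on this slide can be sketched as a plain set of triples. This is a minimal Python illustration, not the provenance service's code. The edge directions (the run `runs` the workflow, is `launchedBy` the person, and the person `belongsTo` the organization) are assumptions that the flattened figure does not confirm. The elided LSIDs are copied as written on the slide, and the `rdf:type` shorthand and the helper `instances_of` are illustrative names.

```python
# Elided LSIDs are copied verbatim from the slide; they are not resolvable.
RUN = "urn:lsid:..:wfRun:HU77I8"
WORKFLOW = "urn:lsid:…:workflow:6"
PERSON = "urn:lsid:…:person:4"
ORG = "urn:lsid:…:org:HY7"
INPUTS = ["urn:lsid:…:dataItem:K84P", "urn:lsid:…:dataItem:51HJ3"]

# Instance-level triples (subject, predicate, object).
# The edge directions are an assumption, see the lead-in.
triples = {
    (RUN, "runs", WORKFLOW),
    (RUN, "launchedBy", PERSON),
    (PERSON, "belongsTo", ORG),
} | {(RUN, "hasInput", item) for item in INPUTS}

# Typing triples: each node is declared an instance of an ontology class.
triples |= {
    (RUN, "rdf:type", "WorkflowRun"),
    (WORKFLOW, "rdf:type", "Workflow"),
    (PERSON, "rdf:type", "Experimenter"),
    (ORG, "rdf:type", "Organization"),
} | {(item, "rdf:type", "DataObject") for item in INPUTS}


def instances_of(cls, graph):
    """Return the sorted subjects typed with the given ontology class."""
    return sorted(s for s, p, o in graph if p == "rdf:type" and o == cls)


data_objects = instances_of("DataObject", triples)
```

Because every node carries a type, a query such as `instances_of("DataObject", triples)` can select the inputs of a run without inspecting the identifiers themselves.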
LSIDs: used to identify data, workflows and workflow runs. Data and workflows (and people and organizations!): internal LSIDs; external LSIDs not used (call by value); Taverna 2 (call by reference) in the near future. Workflow runs: LSIDs are names of graphs.
Storage: named RDF graphs; retrieve whole workflow runs. Implementations: Sesame2 native store (scalable; alpha release, bugs); NG4J (Jena + MySQL; scalability issues). Future implementations: Oracle and Boca.
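Retrieving a whole workflow run by the name of its graph can be sketched with an in-memory quad store. This is a hedged illustration in plain Python and does not use the Sesame2 or NG4J APIs. The class `NamedGraphStore` and its methods are hypothetical names, and the elided LSIDs are copied as they appear on the earlier slides.

```python
from collections import defaultdict


class NamedGraphStore:
    """Toy quad store: each triple is filed under the name of its graph."""

    def __init__(self):
        self._graphs = defaultdict(set)

    def add(self, graph_name, subject, predicate, obj):
        """File one triple under the given graph name."""
        self._graphs[graph_name].add((subject, predicate, obj))

    def graph(self, graph_name):
        """Return a copy of all triples in one named graph (one whole run)."""
        return set(self._graphs.get(graph_name, set()))

    def graph_names(self):
        """Return the sorted names of all stored graphs."""
        return sorted(self._graphs)


store = NamedGraphStore()
run = "urn:lsid:..:wfRun:HU77I8"  # the run's LSID doubles as the graph name
store.add(run, run, "runs", "urn:lsid:…:workflow:6")
store.add(run, run, "hasInput", "urn:lsid:…:dataItem:K84P")

whole_run = store.graph(run)  # every triple recorded for this run
```

Keying the store by graph name keeps each run's provenance together, so fetching or deleting a run is a single lookup rather than a scan over all triples.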
LSID Resolution: implemented but not deployed; obstacle: single user v enterprise virtual organisation. Resolution returns only data for workflows and data items, and only metadata for workflow runs. Data v Metadata: why is data immutable and metadata mutable?
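The resolution rule on this slide can be sketched as a small dispatcher that answers with data or with metadata depending on the kind of resource. The sketch assumes that the LSID namespace (`workflow`, `dataItem`, `wfRun`, as on the earlier slides) identifies the kind of resource. The function `resolve`, the in-memory dictionaries, the authority `example.org` and the stored values are all hypothetical and do not describe the implemented service.

```python
DATA_NAMESPACES = {"workflow", "dataItem"}  # resolve to data only
METADATA_NAMESPACES = {"wfRun"}             # resolve to metadata only


def namespace_of(lsid):
    """Extract the namespace from urn:lsid:authority:namespace:object."""
    parts = lsid.split(":")
    if len(parts) < 5 or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {lsid!r}")
    return parts[3]


def resolve(lsid, data_store, metadata_store):
    """Return data for workflows and data items, metadata for workflow runs."""
    namespace = namespace_of(lsid)
    if namespace in DATA_NAMESPACES:
        return {"data": data_store[lsid], "metadata": None}
    if namespace in METADATA_NAMESPACES:
        return {"data": None, "metadata": metadata_store[lsid]}
    raise KeyError(f"unknown namespace: {namespace}")


# Hypothetical identifiers and contents, for illustration only.
wf = "urn:lsid:example.org:workflow:6"
run = "urn:lsid:example.org:wfRun:HU77I8"
data_store = {wf: b"<workflow definition bytes>"}
metadata_store = {run: {(run, "runs", wf)}}

wf_result = resolve(wf, data_store, metadata_store)
run_result = resolve(run, data_store, metadata_store)
```

In this sketch the bytes of a workflow never change once stored, while the triple set for a run can keep growing, which is one way to read the slide's pairing of immutable with data and mutable with metadata.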
Security: LSID granularity very good; policies (in XACML) easily expressed in terms of LSIDs. LSID spec does not mention https and credentials; IBM Java Toolkit supports credentials.
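The granularity point can be illustrated by splitting an LSID into its parts: an LSID has the form `urn:lsid:authority:namespace:object`, optionally followed by a revision, so a rule can target one object, a whole namespace, or everything under an authority. The sketch below is plain Python, not XACML. The `*` wildcard convention, the names `parse_lsid` and `matches`, and the authority `example.org` are illustrative assumptions.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text):
    """Split urn:lsid:authority:namespace:object[:revision] into fields."""
    parts = text.split(":")
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    return LSID(*parts[2:])


def matches(lsid, authority="*", namespace="*", object_id="*"):
    """True if the LSID fits the pattern; '*' matches any value."""
    pattern = (authority, namespace, object_id)
    return all(p == "*" or p == value for p, value in zip(pattern, lsid[:3]))


run = parse_lsid("urn:lsid:example.org:wfRun:HU77I8")

all_runs = matches(run, namespace="wfRun")        # namespace-level rule
one_object = matches(run, object_id="HU77I8")     # object-level rule
other_kind = matches(run, namespace="workflow")   # does not apply to runs
```

A policy engine could use such a match as the target test of a rule, with the decision logic layered on top.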
Security Policy Scenario: supervisors can access all workflow runs in the organization; students can access only their own workflow runs; blacklisted users cannot access anything. See policySet.xml on myGrid wiki.
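The three rules of the scenario can be written as a plain decision function. This is a sketch in Python rather than XACML, and it is not a transcription of policySet.xml. The `User` and `WorkflowRun` records, the role names, the authority `example.org`, and the choice to check the blacklist first are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    lsid: str
    organization: str
    role: str  # "supervisor" or "student" (assumed role names)
    blacklisted: bool = False


@dataclass(frozen=True)
class WorkflowRun:
    lsid: str
    launched_by: str   # LSID of the person who launched the run
    organization: str  # LSID of the owning organization


def can_access(user, run):
    """Apply the scenario's rules; deny by default."""
    if user.blacklisted:
        return False  # blacklisted users cannot access anything
    if user.role == "supervisor":
        return run.organization == user.organization
    if user.role == "student":
        return run.launched_by == user.lsid
    return False


# Hypothetical identifiers, for illustration only.
org = "urn:lsid:example.org:org:HY7"
alice = User("urn:lsid:example.org:person:1", org, "supervisor")
bob = User("urn:lsid:example.org:person:4", org, "student")
carol = User("urn:lsid:example.org:person:9", org, "student")
mallory = User("urn:lsid:example.org:person:13", org, "supervisor", blacklisted=True)

run = WorkflowRun("urn:lsid:example.org:wfRun:HU77I8", bob.lsid, org)
```

Since users, organizations and runs are all named by LSIDs, each rule reduces to comparing identifiers, which is the sense in which such policies are easily expressed in terms of LSIDs.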
Conclusions: LSIDs; Named Graphs; ontologically typed RDF persistence; mutable v immutable identified with metadata v data; credentials not part of LSID spec; LSID granularity for security.