Personal Data Management Why is this such an issue? Data Provenance Representing links v Representing data Identifying resources: Life Science Identifiers.

Presentation transcript:

Personal Data Management
– Why is this such an issue?
Data Provenance
– Representing links v Representing data
– Identifying resources: Life Science Identifiers
– Different types of provenance
– Provenance generation
– Provenance storage
– Provenance retrieval

Problem
– Automated workflows produce lots of heterogeneous data.
– These are just some of the results from one workflow run for Williams Disease.

Amplification of results
– One input
– Many outputs

Link v Data Representation
Data management questions refer to relationships rather than internal content:
– What are the origins of this data?
– Which service produced this data?
– Which data is this derived from?
– Who was this data produced for?
Data analysis questions (e.g. "What is this data telling me?") are delegated to external services.
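
The management questions above can be answered by following links alone, without opening the data. A minimal sketch, assuming a small in-memory list of (subject, link type, object) triples; the identifiers and link names (`derivedFrom`, `producedBy`, `producedFor`) are hypothetical and not taken from the slides:

```python
# Hypothetical link store: each entry is (subject, link_type, object).
LINKS = [
    ("data:blast_report", "derivedFrom", "data:query_sequence"),
    ("data:alignment", "derivedFrom", "data:blast_report"),
    ("data:blast_report", "producedBy", "service:blast"),
    ("data:blast_report", "producedFor", "person:alice"),
]


def objects(subject, link_type, links=LINKS):
    """Return every object linked from `subject` by `link_type`."""
    return [o for s, p, o in links if s == subject and p == link_type]


def origins(item, links=LINKS):
    """Follow derivedFrom links back to ancestors that have no parent."""
    seen, stack, roots = set(), [item], []
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        parents = objects(current, "derivedFrom", links)
        if not parents and current != item:
            roots.append(current)
        stack.extend(parents)
    return roots


print(objects("data:blast_report", "producedBy"))   # which service produced it
print(objects("data:blast_report", "producedFor"))  # who it was produced for
print(origins("data:alignment"))                    # where it ultimately came from
```

None of these functions inspects the content of a data item, which is the point the slide makes: content questions go to external analysis services.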

Representing links
Identify each resource
– Life Science Identifier (LSID): a URI with associated data and metadata retrieval protocols.
– Understanding that the underlying data will not change.
Example identifiers:
urn:lsid:taverna.sf.net:datathing:45fg6
urn:lsid:taverna.sf.net:datathing:23ty3
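
An LSID is a URN of the form urn:lsid:authority:namespace:object, with an optional trailing revision. A minimal parsing sketch, assuming only that colon-separated layout; the `LSID` type and `parse_lsid` function are names introduced here for illustration:

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split an LSID string into its components, or raise ValueError."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(not part for part in parts[2:5]):
        raise ValueError(f"empty LSID component in {text!r}")
    # The revision is optional; treat a missing or empty one as None.
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(parts[2], parts[3], parts[4], revision)


print(parse_lsid("urn:lsid:taverna.sf.net:datathing:45fg6"))
```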

Representing links II
Identify link type
– Again use a URI
– Allows us to use RDF infrastructure
  – Repositories
  – Ontologies
Example identifiers joined by a typed link:
urn:lsid:taverna.sf.net:datathing:45fg6
urn:lsid:taverna.sf.net:datathing:23ty3
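
When both the resources and the link type are URIs, a link is simply an RDF triple. A minimal sketch that writes one such triple as an N-Triples line using plain string formatting; the link-type URI and the direction of the link between the two example identifiers are assumptions, since the slides do not state them:

```python
# Hypothetical link-type URI; the slides do not name the vocabulary used.
DERIVED_FROM = "http://example.org/provenance#derivedFrom"


def link_as_ntriple(subject_uri, link_type_uri, object_uri):
    """Serialise one typed link between two resources as an N-Triples line."""
    return f"<{subject_uri}> <{link_type_uri}> <{object_uri}> ."


# Assumed direction: 23ty3 was derived from 45fg6.
line = link_as_ntriple(
    "urn:lsid:taverna.sf.net:datathing:23ty3",
    DERIVED_FROM,
    "urn:lsid:taverna.sf.net:datathing:45fg6",
)
print(line)
```

Lines in this form can be loaded into any RDF repository, and the link-type URI can be described in an ontology.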

Provenance (1)
Resources: Workflow run, Workflow design, Experiment design, Project, Person, Organisation, Process, Service, Event, Data item
Link types:
– data derivation, e.g. output data derived from input data
– knowledge statements, e.g. similar protein sequence to
– instanceOf
– partOf
– componentProcess, e.g. web service invocation of NCBI
– componentEvent, e.g. completion of a web service invocation at 12.04pm
– runBy, e.g. NCBI
– run for
Levels:
– Organisation level provenance
– Process level provenance
– Data/knowledge level provenance
User can add templates to each workflow process to determine links between data items.

Storing management metadata
Automated generation of this web of links is preferable.
Workflow enactor generates, as RDF:
– LSIDs
– Data derivation links
– Knowledge links
– Process links
– Organisation links
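
The generation step can be pictured as a recorder that the enactor calls after every service invocation. This is a rough sketch and not Taverna's or Freefluo's actual code: the `ProvenanceRecorder` class, the counter-based LSID minting, the `example.org` authority and both link-type URIs are assumptions made for illustration.

```python
import itertools

# Hypothetical link-type URIs.
DERIVED_FROM = "http://example.org/provenance#derivedFrom"
PRODUCED_BY = "http://example.org/provenance#producedBy"


class ProvenanceRecorder:
    """Mints identifiers and collects links as (subject, predicate, object)."""

    def __init__(self, authority="example.org", namespace="datathing"):
        self.authority = authority
        self.namespace = namespace
        self.triples = []
        self._counter = itertools.count(1)

    def mint_lsid(self):
        """Return a fresh LSID-shaped identifier (simple counter scheme)."""
        return f"urn:lsid:{self.authority}:{self.namespace}:{next(self._counter)}"

    def record_invocation(self, process_uri, input_lsids, n_outputs):
        """Mint one LSID per output and link it to the process and inputs."""
        outputs = [self.mint_lsid() for _ in range(n_outputs)]
        for out in outputs:
            self.triples.append((out, PRODUCED_BY, process_uri))
            for inp in input_lsids:
                self.triples.append((out, DERIVED_FROM, inp))
        return outputs


recorder = ProvenanceRecorder()
source = recorder.mint_lsid()
results = recorder.record_invocation("urn:example:process:blast", [source], 3)
print(len(results), "outputs,", len(recorder.triples), "links recorded")
```

One input with three outputs already yields six links, which is why the slide prefers automated generation over recording by hand.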

Provenance generation
Configuring and generating provenance within Taverna

Storage
– LSID has no protocol for storage.
– Taverna/Freefluo implements its own data/metadata storage protocol.
Diagram labels: Taverna/Freefluo, Publish interface, Data store (data), Metadata store (metadata)

Retrieval
– LSID protocol used to retrieve data and metadata.
– Query handled separately.
Diagram labels: Metadata store, Data store, LSID interface, LSID-aware client, Query, RDF-aware client
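
The split between identifier-based retrieval and querying can be sketched with two in-memory stores. Everything here is an assumption for illustration: the `LsidResolver` class, the dictionary and list stores, and the `query` helper. The method names only loosely echo the data and metadata retrieval operations of the LSID protocol.

```python
# Hypothetical link-type URI.
DERIVED_FROM = "http://example.org/provenance#derivedFrom"


class LsidResolver:
    """LSID-style interface: look up data or metadata for one identifier."""

    def __init__(self, data_store, metadata_store):
        self._data = data_store          # dict: identifier -> bytes
        self._metadata = metadata_store  # list of (subject, predicate, object)

    def get_data(self, lsid):
        return self._data[lsid]

    def get_metadata(self, lsid):
        return [t for t in self._metadata if t[0] == lsid]


def query(metadata_store, predicate=None, obj=None):
    """Separate query path: find subjects by link type and/or target."""
    return [
        s for s, p, o in metadata_store
        if (predicate is None or p == predicate) and (obj is None or o == obj)
    ]


data_store = {"urn:lsid:example.org:datathing:2": b">seq\nACGT\n"}
metadata_store = [
    ("urn:lsid:example.org:datathing:2", DERIVED_FROM,
     "urn:lsid:example.org:datathing:1"),
]

resolver = LsidResolver(data_store, metadata_store)
print(resolver.get_data("urn:lsid:example.org:datathing:2"))
print(resolver.get_metadata("urn:lsid:example.org:datathing:2"))
print(query(metadata_store, obj="urn:lsid:example.org:datathing:1"))
```

An LSID-aware client needs only the resolver, while an RDF-aware client would go through the query path against the metadata store.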

LSID launchpad
Lightweight plug-in to Internet Explorer providing access to LSID data/metadata (demo)

Using IBM’s Haystack
Screenshot labels:
– GenBank record
– Portion of the Web of provenance
– Managing collection of sequences for review