Data Provenance.

What is Data Provenance?
- Lineage and pedigree
- History of data
- Origin of data
- Etc.
- "… record trail that accounts for the origin of a piece of data (in a database, document or repository) together with an explanation of how and why it got to the present place." (Encyclopedia of Database Systems, 2009)

Data History
- Origin of data (input, publish)
- Date of creation
- Data processing information (modification, extension, etc.)
- Metadata
- What data do I need to collect?
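
The items on this slide can be read as the fields of a provenance record. The sketch below is one hypothetical way to hold them in Python; the class names, field names, and sample values are illustrative assumptions, not part of the lecture or of any provenance standard.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class ProcessingStep:
    """One entry of data processing information (hypothetical structure)."""
    action: str          # e.g. "modification" or "extension"
    agent: str           # who or what performed the step
    timestamp: datetime


@dataclass
class ProvenanceRecord:
    """Data history for one data item: origin, creation date, processing, metadata."""
    origin: str                                   # where the data was input or published
    created: datetime                             # date of creation
    steps: List[ProcessingStep] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)

    def record(self, action: str, agent: str, when: Optional[datetime] = None) -> None:
        """Append a processing step; defaults to the current UTC time."""
        self.steps.append(ProcessingStep(action, agent, when or datetime.now(timezone.utc)))

    def history(self) -> List[str]:
        """Return the processing steps as readable lines, oldest first."""
        return [f"{s.timestamp.isoformat()} {s.action} by {s.agent}" for s in self.steps]


# Usage with made-up values.
rec = ProvenanceRecord(
    origin="sensor-17",
    created=datetime(2014, 2, 1, tzinfo=timezone.utc),
    metadata={"format": "csv"},
)
rec.record("modification", "alice", datetime(2014, 2, 2, tzinfo=timezone.utc))
print(rec.history())
```

Keeping the processing steps as an append-only list is a design choice of this sketch: the history is extended as the data changes, never rewritten.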

Workflow Provenance
- Coarse-grain provenance
- Record of the history of the derivation of the final result
- May include tracking the interaction of programs, input from external devices (e.g., sensors), and human interactions
- Performed for complex processing tasks

Data Provenance
- Fine-grain provenance
- Derivation of part of the resulting data set
- Description of the origin of the data and the process by which it arrived in the database
- Where-provenance: identifies the source elements from which the data in the target originated
- Why-provenance: justification for the data elements appearing in the output, and how some parts of the input influenced certain parts of the output

Example
From: Peter Buneman and Wang-Chiew Tan. 2007. Provenance in databases. In Proceedings of the 2007 ACM SIGMOD international conference on Management of data (SIGMOD '07). ACM, New York, NY, USA, 1171-1173.
emp(ssn, name, deptid)
dept(id, dname)
SELECT emp.name, dept.dname FROM emp, dept WHERE emp.deptid = dept.id;
Answer(Kim, CS)
- What is the where-provenance?
- What is the why-provenance?
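
One way to explore the two questions is to run the query against a small in-memory SQLite database and carry the source keys along with each answer. This is a minimal sketch, not the method from the cited paper: the sample rows, the `provenance` helper, and the choice of `ssn` and `id` as tuple identifiers are assumptions made for illustration. The department name column is taken as `dname`, as declared in the `dept(id, dname)` schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp(ssn TEXT PRIMARY KEY, name TEXT, deptid INTEGER);
CREATE TABLE dept(id INTEGER PRIMARY KEY, dname TEXT);
-- Sample rows are invented; the slide gives only the schema and one answer.
INSERT INTO emp VALUES ('111', 'Kim', 1), ('222', 'Lee', 2);
INSERT INTO dept VALUES (1, 'CS'), (2, 'EE');
""")

# Same join as on the slide, extended with the keys of the contributing tuples.
rows = conn.execute("""
SELECT emp.name, dept.dname, emp.ssn, dept.id
FROM emp, dept
WHERE emp.deptid = dept.id
""").fetchall()


def provenance(answer):
    """Return (why, where) for one answer tuple such as ('Kim', 'CS').

    why:   list of witnesses, each a set of source tuples that together
           justify the answer.
    where: for each output column, the source cells its value was copied from.
    """
    why = []
    where = {"name": [], "dname": []}
    for name, dname, ssn, dept_id in rows:
        if (name, dname) == answer:
            why.append({("emp", ssn), ("dept", dept_id)})
            where["name"].append(("emp", ssn, "name"))
            where["dname"].append(("dept", dept_id, "dname"))
    return why, where


why, where = provenance(("Kim", "CS"))
print("why-provenance:  ", why)
print("where-provenance:", where)
```

With this sample data, the why-provenance of Answer(Kim, CS) is the pair of tuples that joined (Kim's `emp` row and the CS `dept` row), while the where-provenance points at the individual cells `emp.name` and `dept.dname` that supplied the two output values.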

Provenance Applications
- Scientific publications: regenerating results
- Input data information
- Process-specific information: software used, system used, control flow, etc.
- Parameters of the experiment
- Different results? Why?
- Capture how results were achieved
- Reproducibility? Community sharing?

Trustworthiness and Accountability
- Origin and processing of data recorded
- Can enforce accountability on malicious sources/processing
- Can detect malfunctioning sources/processing components
- Can attribute high-quality sources/processing

Current Applications of Provenance Data
- Databases: data sharing and integration
- Web of data
- Linked data
- Digital humanities
- Science
- Art
- Publishing
- IoT

Data Integration
- How to map ontologies?
- How to annotate data with semantics?
- How to propagate changes back to the local database?

Web Evolution
- Past: human usage
  - HTTP
  - Static Web pages (HTML)
- Current: human and some automated usage
  - Interactive Web pages
  - Web Services (WSDL, SOAP, SAML)
  - Semantic Web (RDF, OWL, RuleML, Web databases)
  - XML technology (data exchange, data representation)
- Future: Semantic Web Services

Provenance Data Model
- Provenance vocabulary
- Dataset description level
- Data analysis level
- Experimental specification level
- Institutional level

Provenance Data Management
- Directly linked to data and follows data
- Represented in data dictionary
- Stored at separate location
- Usability?

Provenance Data Protection
- Accountability
- Piracy
- Malicious intent

Metadata Security
- No security model exists for metadata
- Can we use existing security models to protect metadata?
- RDF/S is the basic framework for the Semantic Web (SW)
- RDF/S supports simple inferences

Correlated Inference
- Concept generalization: weighted concepts, concept abstraction level, range of allowed abstractions
- Concept hierarchy:
  Object[].
  waterSource :: Object
  basin :: waterSource
  place :: Object
  district :: place
  address :: place
  base :: Object
  fort :: base
[Figure: address + fort: Public; water source + base: Confidential; district + basin: ?]

Correlated Inference (cont.)
- Concept hierarchy:
  Object[].
  waterSource :: Object
  basin :: waterSource
  place :: Object
  district :: place
  address :: place
  base :: Object
  fort :: base
[Figure: place + base: Public; address + fort: Public; water source + base: Confidential; district + basin shown alongside their generalizations (place, water source)]

RDF/S Entailment Rules
Example RDF/S entailment rules (http://www.w3.org/TR/rdf-mt/#rules):
- rdfs2: (aaa, rdfs:domain, xxx) + (uuu, aaa, yyy) → (uuu, rdf:type, xxx)
- rdfs3: (aaa, rdfs:range, xxx) + (uuu, aaa, vvv) → (vvv, rdf:type, xxx)
- rdfs5: (uuu, rdfs:subPropertyOf, vvv) + (vvv, rdfs:subPropertyOf, xxx) → (uuu, rdfs:subPropertyOf, xxx)
- rdfs11: (uuu, rdfs:subClassOf, vvv) + (vvv, rdfs:subClassOf, xxx) → (uuu, rdfs:subClassOf, xxx)

Example Graph Format
RDF triples:
(Student, rdfs:subClassOf, Person)
(University, rdfs:subClassOf, GovAgency)
(studiesAt, rdfs:domain, Student)
(studiesAt, rdfs:range, University)
(studiesAt, rdfs:subPropertyOf, memberAt)
(John, studiesAt, USC)
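
To see what the four entailment rules listed earlier derive from these triples, they can be applied repeatedly until nothing new appears. The following is a minimal sketch using plain Python tuples rather than an RDF library; the function name `rdfs_closure` is hypothetical, and only rdfs2, rdfs3, rdfs5, and rdfs11 are implemented because those are the rules the slides list.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def rdfs_closure(triples: Iterable[Triple]) -> Set[Triple]:
    """Apply rdfs2, rdfs3, rdfs5 and rdfs11 until a fixpoint is reached."""
    closure: Set[Triple] = set(triples)
    while True:
        derived: Set[Triple] = set()
        for (s, p, o) in closure:
            for (s2, p2, o2) in closure:
                if p == "rdfs:domain" and p2 == s:
                    # rdfs2: the subject of a statement using property s has type o
                    derived.add((s2, "rdf:type", o))
                elif p == "rdfs:range" and p2 == s:
                    # rdfs3: the object of a statement using property s has type o
                    derived.add((o2, "rdf:type", o))
                elif p == p2 == "rdfs:subPropertyOf" and o == s2:
                    # rdfs5: subPropertyOf is transitive
                    derived.add((s, p, o2))
                elif p == p2 == "rdfs:subClassOf" and o == s2:
                    # rdfs11: subClassOf is transitive
                    derived.add((s, p, o2))
        if derived <= closure:
            return closure
        closure |= derived


graph = [
    ("Student", "rdfs:subClassOf", "Person"),
    ("University", "rdfs:subClassOf", "GovAgency"),
    ("studiesAt", "rdfs:domain", "Student"),
    ("studiesAt", "rdfs:range", "University"),
    ("studiesAt", "rdfs:subPropertyOf", "memberAt"),
    ("John", "studiesAt", "USC"),
]

# Print only the triples that were inferred, not the asserted ones.
for triple in sorted(rdfs_closure(graph) - set(graph)):
    print(triple)
```

With only these four rules, the inferred triples are (John, rdf:type, Student) via rdfs2 and (USC, rdf:type, University) via rdfs3. Facts such as (John, memberAt, USC) or (John, rdf:type, Person) would additionally need rules rdfs7 and rdfs9 from the same specification, which are not implemented in this sketch.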

RDF Access Control
- Security policy
  - Subject
  - Object: object pattern
  - Access mode
- Default policy
- Conflict resolution
- Classification of entailed data
- Flexible granularity

Next Class
- Feb. 28: XML