Towards Constraint-based Explanations for Answers and Non-Answers Boris Glavic Illinois Institute of Technology Sean Riddle Athenahealth Corporation Sven.

Towards Constraint-based Explanations for Answers and Non-Answers
Boris Glavic (Illinois Institute of Technology), Sean Riddle (Athenahealth Corporation), Sven Köhler (University of California Davis), Bertram Ludäscher (University of Illinois Urbana-Champaign)

Outline ① Introduction ② Approach ③ Explanations ④ Generalized Explanations ⑤ Computing Explanations with Datalog ⑥ Conclusions and Future Work

Overview
– Introduce a unified framework for generalizing explanations for answers and non-answers
– Why/why-not question Q(t): why is tuple t (not) in the result of query Q?
– Explanation: provenance for the answer/non-answer
– Generalization: use an ontology to summarize and generalize explanations
– Computing generalized explanations for UCQs: use Datalog

Train-Example
    2hop(X,Y) :- Train(X,Z), Train(Z,Y).
– Why can’t I reach Berlin from Chicago?
– Why-not 2hop(Chicago,Berlin)
– Train relation with columns From and To; cell values as extracted: New York, Washington DC, New York, Chicago, New York, …, Berlin, Munich, Berlin, …
[Figure: map showing Seattle, Chicago, Washington DC, New York, Paris, Berlin, and Munich, with the label "Atlantic Ocean!"]

Train-Example Explanations
    2hop(X,Y) :- Train(X,Z), Train(Z,Y).
– Missing train connections explain why Chicago and Berlin are not connected
– E.g., if only there were a train line between New York and Berlin: Train(New York, Berlin)

Why-not Approaches
Two categories of data-based explanations for missing answers:
1) Enumerate all failed rule derivations and why they failed (missing tuples)
   – e.g., provenance games
2) One set of missing tuples that fulfills an optimality criterion
   – e.g., minimal side effect on the query result
   – e.g., Artemis, …

Why-not Approaches
1) Enumerate all failed rule derivations and why they failed (missing tuples)
   – Exhaustive explanation
   – Potentially very large explanations
       Train(Chicago,Munich), Train(Munich,Berlin)
       Train(Chicago,Seattle), Train(Seattle,Berlin)
       …
2) One set of missing tuples that fulfills an optimality criterion
   – Concise explanation that is optimal in a sense
   – Optimality criterion is not always a good fit or effective
   – Consider reach (transitive closure): adding any train connection between the USA and Europe has the same effect on the query result

Uniform Treatment of Why/Why-not
– Provenance and missing-answer approaches have been treated mostly independently
– Observation: for provenance models that support query languages with “full” negation, why and why-not are both provenance computations!
    Q(X) :- Train(chicago,X).
– Why-not Q(New York)? is equivalent to why Q’(New York)? for
    Q’(X) :- adom(X), not Q(X).
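The reduction on this slide can be sketched in a few lines of Python. This is only an illustration: the Train instance below is hypothetical (the transcript does not preserve the full relation), and the example asks about berlin rather than New York so that it fits that instance.

```python
# Hypothetical Train instance (an assumption; not taken from the slides).
train = {
    ("chicago", "new york"),
    ("chicago", "seattle"),
    ("new york", "washington dc"),
    ("paris", "berlin"),
}

def adom(rel):
    """Active domain: every constant that appears in the relation."""
    return {c for t in rel for c in t}

def q(rel):
    """Q(X) :- Train(chicago, X)."""
    return {y for (x, y) in rel if x == "chicago"}

def q_prime(rel):
    """Q'(X) :- adom(X), not Q(X)."""
    return adom(rel) - q(rel)

# Why-not Q(berlin) is posed as the why question Q'(berlin).
print("berlin" in q_prime(train))
```

Because Q' is an ordinary query with negation, any provenance model that handles negation can answer the why-not question without special machinery.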

Unary Train-Example
    Q(X) :- Train(chicago,X).
– Why-not Q(berlin)
– Explanation: Train(chicago,berlin)
– Consider an available ontology!
– More general: Train(chicago,GermanCity)

Unary Train-Example
    Q(X) :- Train(chicago,X).
– Why-not Q(berlin)
– Explanation: Train(chicago,berlin)
– Consider an available ontology!
– Generalized explanation: Train(chicago,GermanCity)
– Most general explanation: Train(chicago,EuropeanCity)

Our Approach
– Explanations for why/why-not questions over UCQ queries
   – Successful/failed rule derivations
– Utilize an available ontology
   – Expressed as inclusion dependencies “mapped” to the instance
   – E.g., for city(name,country):
       GermanCity(X) :- city(X,germany).
– Generalized explanations
   – Use concepts to describe subsets of an explanation
– Most general explanation
   – Pareto-optimal

Related Work - Generalization
– ten Cate et al., High-Level Why-Not Explanations using Ontologies [PODS ‘15]
   – Also uses ontologies for generalization
   – We summarize provenance instead of query results!
   – Only for why-not, but the extension to why is trivial
– Other summarization techniques using ontologies
   – Data X-ray
   – Datalog-S (Datalog with subsumption)

Rule derivations
What causes a tuple to be or not be in the result of a query Q?
– Tuple in result: there exists at least one successful rule derivation that justifies its existence (existential check)
– Tuple not in result: all rule derivations that would justify its existence have failed (universal check)
– Rule derivation: replace rule variables with constants from the instance
   – Successful: the body is fulfilled
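The existential and universal checks can be illustrated with a small Python sketch for the 2hop rule. The Train instance is an assumption (bidirectional connections within each continent), and the function names are hypothetical.

```python
# Hypothetical Train instance (assumed; the slides show it only as a map).
TRAIN = {
    ("seattle", "chicago"), ("chicago", "seattle"),
    ("chicago", "new york"), ("new york", "chicago"),
    ("new york", "washington dc"), ("washington dc", "new york"),
    ("paris", "berlin"), ("berlin", "paris"),
    ("berlin", "munich"), ("munich", "berlin"),
}

def derivations(x, y, rel):
    """Ground 2hop(X,Y) :- Train(X,Z), Train(Z,Y) for fixed X, Y.

    Returns one (z, goal1_ok, goal2_ok) triple per constant z in the instance.
    """
    consts = sorted({c for t in rel for c in t})
    return [(z, (x, z) in rel, (z, y) in rel) for z in consts]

def in_result(x, y, rel):
    """Existential check: some derivation has both goals fulfilled."""
    return any(g1 and g2 for _, g1, g2 in derivations(x, y, rel))

def all_failed(x, y, rel):
    """Universal check: every derivation has at least one failed goal."""
    return all(not (g1 and g2) for _, g1, g2 in derivations(x, y, rel))

print(in_result("paris", "munich", TRAIN))     # answer: via berlin
print(all_failed("chicago", "berlin", TRAIN))  # non-answer: every Z fails
```

The two checks are complements of each other, which is why a why-not explanation has to account for every grounding while a why explanation needs only one.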

Basic Explanations
A basic explanation for question Q(t):
– Why: successful derivations with Q(t) as head
– Why-not: failed rule derivations
   – Replace successful goals with placeholder T
   – Different ways to fail:
       2hop(Chicago,Munich) :- Train(Chicago,New York), Train(New York,Munich).
       2hop(Chicago,Munich) :- Train(Chicago,Berlin), Train(Berlin,Munich).
       2hop(Chicago,Munich) :- Train(Chicago,Paris), Train(Paris,Munich).
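As a rough sketch of the why-not case, the following Python renders each failed derivation of 2hop and writes the placeholder T for goals that succeed. The instance and the helper name `why_not_2hop` are assumptions made for illustration.

```python
# Hypothetical Train instance (assumed for illustration).
TRAIN = {
    ("seattle", "chicago"), ("chicago", "seattle"),
    ("chicago", "new york"), ("new york", "chicago"),
    ("new york", "washington dc"), ("washington dc", "new york"),
    ("paris", "berlin"), ("berlin", "paris"),
    ("berlin", "munich"), ("munich", "berlin"),
}

def why_not_2hop(x, y, rel):
    """Basic why-not explanation for 2hop(x, y): all failed derivations.

    Successful goals are replaced by the placeholder T. Returns an empty
    list if the tuple is actually an answer.
    """
    consts = sorted({c for t in rel for c in t})
    explanation = []
    for z in consts:
        g1, g2 = (x, z) in rel, (z, y) in rel
        if g1 and g2:
            return []  # a successful derivation exists, so no why-not
        goal1 = "T" if g1 else f"Train({x},{z})"
        goal2 = "T" if g2 else f"Train({z},{y})"
        explanation.append(f"2hop({x},{y}) :- {goal1}, {goal2}.")
    return explanation

for line in why_not_2hop("chicago", "munich", TRAIN):
    print(line)
```

On this instance the three derivations from the slide fail in three different ways: via New York only the second goal is missing, via Berlin only the first, and via Paris both.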

Explanations Example
Why 2hop(Paris,Munich)?
    2hop(Paris,Munich) :- Train(Paris,Berlin), Train(Berlin,Munich).

Generalized Explanation
Generalized explanations: rule derivations with concepts
– Generalizes the user question: generalize a head variable
    2hop(Chicago,Berlin) generalizes to 2hop(USCity,EuropeanCity)
– Summarizes the provenance of a (non-)answer: generalize any rule variable
    2hop(New York,Seattle) :- Train(New York,Chicago), Train(Chicago,Seattle).
    2hop(New York,Seattle) :- Train(New York,USCity), Train(USCity,Seattle).

Generalized Explanation Def.
For user question Q(t) and rule r, r(C_1,…,C_n) is a generalized explanation if
① (C_1,…,C_n) subsumes the user question
② headvars(C_1,…,C_n) only covers existing/missing tuples
③ for every tuple t’ covered by headvars(C_1,…,C_n), all rule derivations for t’ that are covered are explanations for t’

Recap Generalization Example
    r: Q(X) :- Train(chicago,X).
– Why-not Q(berlin)
– Explanation: r(berlin)
– Generalized explanation: r(GermanCity)
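For the unary rule r, the definition's conditions can be checked directly, as in the Python sketch below. The city table, the concept extensions, and the Train instance are assumptions. Since the rule's only variable is the head variable, conditions ② and ③ collapse into a single test here.

```python
# Hypothetical city(name, country) instance and Train instance (assumptions).
CITY = {
    "berlin": "germany", "munich": "germany", "paris": "france",
    "seattle": "usa", "new york": "usa", "washington dc": "usa",
    "chicago": "usa",
}
EUROPE = {"germany", "france"}
TRAIN = {("chicago", "seattle"), ("chicago", "new york")}

# Concept extensions derived from the city table (inclusion dependencies).
CONCEPTS = {
    "GermanCity": {c for c, k in CITY.items() if k == "germany"},
    "EuropeanCity": {c for c, k in CITY.items() if k in EUROPE},
    "USCity": {c for c, k in CITY.items() if k == "usa"},
}

def is_generalized_why_not(concept, t):
    """Is r(concept) a generalized why-not explanation for Q(t)?

    Rule r: Q(X) :- Train(chicago, X).
    """
    members = CONCEPTS[concept]
    covers_question = t in members                      # condition 1
    all_missing = all(("chicago", m) not in TRAIN       # conditions 2 and 3
                      for m in members)
    return covers_question and all_missing

print(is_generalized_why_not("GermanCity", "berlin"))
print(is_generalized_why_not("EuropeanCity", "berlin"))
```

USCity does not qualify for a question such as why-not Q(washington dc) on this instance, because it also covers seattle, which is an answer.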

Most General Explanation
– Domination relationship: r(C_1,…,C_n) dominates r(D_1,…,D_n) if
   – for all i: C_i subsumes D_i, and
   – there exists an i such that C_i strictly subsumes D_i
– Most general explanation: not dominated by any other explanation
– Example most general explanation: r(EuropeanCity)
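Domination is a Pareto-style comparison on concept tuples, which the following Python sketch tries to make concrete. The strict-subsumption pairs are a hypothetical ontology fragment and are assumed to be transitively closed already.

```python
# Strict subsumption as (specific, general) pairs; hypothetical and assumed
# to be transitively closed.
STRICT = {
    ("berlin", "GermanCity"), ("berlin", "EuropeanCity"),
    ("GermanCity", "EuropeanCity"), ("chicago", "USCity"),
}

def strictly_subsumes(general, specific):
    return (specific, general) in STRICT

def dominates(cs, ds):
    """cs dominates ds: at least as general in every position and
    strictly more general in at least one position."""
    pairs = list(zip(cs, ds))
    return (all(c == d or strictly_subsumes(c, d) for c, d in pairs)
            and any(strictly_subsumes(c, d) for c, d in pairs))

def most_general(explanations):
    """Keep the explanations that no other explanation dominates."""
    return [e for e in explanations
            if not any(dominates(other, e) for other in explanations)]

print(most_general([("berlin",), ("GermanCity",), ("EuropeanCity",)]))
```

With more than one variable there can be several incomparable most general explanations; for instance, ("USCity", "berlin") and ("chicago", "EuropeanCity") do not dominate each other.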

Datalog Implementation
① Rules for checking subsumption and domination of concept tuples
② Rules for successful and failed rule derivations
   – Return variable bindings
③ Rules that model explanations, generalization, and most general explanations

① Modeling Subsumption
– Basic concepts and concepts
    isBasicConcept(X) :- Train(X,Y).
    isConcept(X) :- isBasicConcept(X).
    isConcept(EuropeanCity).
– Subsumption (inclusion dependencies); subsumes(X,Y) reads "X is subsumed by Y"
    subsumes(GermanCity,EuropeanCity).
    subsumes(X,GermanCity) :- city(X,germany).
– Transitive closure
    subsumes(X,Y) :- subsumes(X,Z), subsumes(Z,Y).
– Non-strict version
    subsumesEqual(X,X) :- isConcept(X).
    subsumesEqual(X,Y) :- subsumes(X,Y).
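To show what the recursive subsumes rule computes, here is a naive fixpoint evaluation in Python. It is a sketch only: the city facts are hypothetical, and a real Datalog engine would typically use semi-naive evaluation instead.

```python
# Hypothetical city(name, country) facts (assumption).
CITY = {("berlin", "germany"), ("munich", "germany"), ("paris", "france")}

def base_subsumes(city):
    """Facts from the slide's non-recursive subsumes rules."""
    facts = {("GermanCity", "EuropeanCity")}
    # subsumes(X, GermanCity) :- city(X, germany).
    facts |= {(name, "GermanCity")
              for (name, country) in city if country == "germany"}
    return facts

def transitive_closure(pairs):
    """subsumes(X,Y) :- subsumes(X,Z), subsumes(Z,Y), applied to a fixpoint."""
    closure = set(pairs)
    while True:
        derived = {(x, y)
                   for (x, z) in closure
                   for (w, y) in closure if z == w}
        if derived <= closure:
            return closure
        closure |= derived

SUBSUMES = transitive_closure(base_subsumes(CITY))

def subsumes_equal(x, y, concepts):
    """Non-strict version: reflexive on known concepts, else strict."""
    return (x == y and x in concepts) or (x, y) in SUBSUMES

print(("berlin", "EuropeanCity") in SUBSUMES)
```

Note that paris is not subsumed by EuropeanCity here, because the slide only states an inclusion rule for German cities.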

② Capture Rule Derivations
– Rule r1:
    2hop(X,Y) :- Train(X,Z), Train(Z,Y).
– Success and failure rules
    r1_success(X,Y,Z) :- Train(X,Z), Train(Z,Y).
    r1_fail(X,Y,Z) :- isBasicConcept(X), isBasicConcept(Y), isBasicConcept(Z), not r1_success(X,Y,Z).
– More general: record the success or failure of each goal
    r1(X,Y,Z,true,false) :- isBasicConcept(Y), Train(X,Z), not Train(Z,Y).

③ Model Generalization
Explanation for Q(X) :- Train(chicago,X).
    expl_r1_success(C1,B1) :- subsumesEqual(B1,C1), r1_success(B1), not has_r1_fail(C1).
– User question: Q(B1)
– Explanation: Q(C1) :- Train(chicago,C1).
– Q(B1) exists and is justified by r1: r1_success(B1)
– r1 succeeds for all B in C1: not has_r1_fail(C1)

③ Model Generalization
– Domination
    dominated_r1_success(C1,B1) :- expl_r1_success(C1,B1), expl_r1_success(D1,B1), subsumes(C1,D1).
– Most general explanation
    most_gen_r1_success(C1,B1) :- expl_r1_success(C1,B1), not dominated_r1_success(C1,B1).
– Why question
    why(C1) :- most_gen_r1_success(C1,seattle).
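The three rule groups can be traced end to end with a Python sketch of the why question for seattle. Everything concrete here is assumed: the instance, the concept WestCoastCity, the city portland, and the reading of has_r1_fail (which the slides use but do not define) as "some basic concept under C has no successful derivation".

```python
# Hypothetical instance and ontology (assumptions, not from the slides).
TRAIN = {("chicago", "seattle"), ("chicago", "portland"),
         ("chicago", "new york")}
BASIC = {"chicago", "seattle", "portland", "new york"}
# Strict subsumption as (specific, general), already transitively closed.
SUBSUMES = {
    ("seattle", "WestCoastCity"), ("portland", "WestCoastCity"),
    ("seattle", "USCity"), ("portland", "USCity"),
    ("new york", "USCity"), ("chicago", "USCity"),
    ("WestCoastCity", "USCity"),
}
CONCEPTS = BASIC | {"WestCoastCity", "USCity"}

def subsumes_equal(b, c):
    return b == c or (b, c) in SUBSUMES

def r1_success(b):
    """Q(X) :- Train(chicago, X) succeeds for X = b."""
    return ("chicago", b) in TRAIN

def has_r1_fail(c):
    """Assumed meaning: some basic concept under c fails r1."""
    return any(subsumes_equal(b, c) and not r1_success(b) for b in BASIC)

def expl(b):
    """expl_r1_success(C, b) for all concepts C."""
    return {c for c in CONCEPTS
            if subsumes_equal(b, c) and r1_success(b) and not has_r1_fail(c)}

def most_general(b):
    """Explanations for b that no other explanation strictly subsumes."""
    candidates = expl(b)
    return {c for c in candidates
            if not any((c, d) in SUBSUMES for d in candidates)}

print(most_general("seattle"))
```

On this instance USCity is rejected because it covers chicago, for which the rule fails, so WestCoastCity is the most general explanation that remains.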

Conclusions
– Unified framework for generalizing provenance-based explanations for why and why-not questions
– Uses an ontology expressed as inclusion dependencies (Datalog rules) for summarizing explanations
– Uses Datalog to find most general explanations (Pareto-optimal)

Future Work I
Extend ideas to other types of constraints
– E.g., denial constraints: German cities have at most 10M inhabitants
    :- city(X,germany,Z), Z > 10,000,000.
– Query returns countries with very large cities
    Q(Y) :- city(X,Y,Z), Z > 15,000,000.
– Why-not Q(germany)?
   – Constraint describes a set of (missing) data
   – Can be answered without looking at the data
– Semantic query optimization?
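One way to picture "answered without looking at the data" is a check that compares only the constraint bound with the query threshold. The Python below is a speculative sketch of that idea, not the authors' method; the dictionary of bounds and the function name are hypothetical.

```python
# Denial constraints of the form ":- city(X, country, Z), Z > bound",
# stored as country -> bound. Hypothetical encoding (assumption).
POPULATION_BOUND = {"germany": 10_000_000}

def why_not_from_constraint(country, query_threshold):
    """Can why-not Q(country) be explained by a constraint alone?

    Query: Q(Y) :- city(X, Y, Z), Z > query_threshold.
    The constraint forces Z <= bound, so no city can satisfy
    Z > query_threshold whenever query_threshold >= bound.
    """
    bound = POPULATION_BOUND.get(country)
    return bound is not None and query_threshold >= bound

print(why_not_from_constraint("germany", 15_000_000))
```

The check never touches the city relation. For a country without a constraint, or for a threshold below the bound, it returns False and the explanation would have to come from the data instead.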

Future Work II
– Alternative definitions of explanation or generalization
   – Our generalized explanations are sound, but not complete
   – Complete version: concept covers at least the explanation
   – Sound and complete version: concepts cover the explanation exactly
– Queries as ontology concepts
   – As introduced in ten Cate et al.

Future Work III
– Extension for FO queries
   – Generalization of provenance game graphs
   – Need to generalize interactions of rules
– Implementation
   – Integrate with our provenance game engine
   – Powered by GProM!
   – Negation: not yet
   – Generalization rules: not yet

Questions?
Boris – Bertram

Relationship to (Constraint) Provenance Games