A Dichotomy on the Complexity of Consistent Query Answering for Atoms with Simple Keys
Paris Koutris, Dan Suciu
University of Washington

Presentation transcript:


Repairs
An uncertain instance I for a schema with key constraints
A repair r of I is a subinstance of I that satisfies the key constraints and is maximal
R(x, y): (a_1, b_1), (a_1, b_2), (a_2, b_2), (a_3, b_3), (a_3, b_4), (a_4, b_4)
The 4 possible repairs:
– (a_1, b_1), (a_2, b_2), (a_3, b_4), (a_4, b_4)
– (a_1, b_1), (a_2, b_2), (a_3, b_3), (a_4, b_4)
– (a_1, b_2), (a_2, b_2), (a_3, b_4), (a_4, b_4)
– (a_1, b_2), (a_2, b_2), (a_3, b_3), (a_4, b_4)
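
The repair definition above can be sketched in a few lines. This is a minimal illustration, assuming the key of R is its first attribute (which is what the four listed repairs suggest); the function name `repairs` and the string constants are hypothetical choices made for the example.

```python
from collections import defaultdict
from itertools import product

def repairs(tuples, key=0):
    """Yield every repair: keep exactly one tuple per key value."""
    groups = defaultdict(list)
    for t in tuples:
        groups[t[key]].append(t)  # tuples sharing a key value conflict
    # A maximal consistent subinstance picks one tuple from each group.
    for choice in product(*groups.values()):
        yield frozenset(choice)

# The instance R(x, y) from the slide (assumed key: x).
R = [("a1", "b1"), ("a1", "b2"), ("a2", "b2"),
     ("a3", "b3"), ("a3", "b4"), ("a4", "b4")]
all_repairs = list(repairs(R))
```

The number of repairs is the product of the group sizes, so it can grow exponentially with the number of violated key values; this enumeration is only meant for small examples.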

Consistent Query Answering
If Q is boolean, we say that I is certain for Q, written I |= Q, if Q(r) is true for every repair r of I
R(x, y): (a_1, b_1), (a_1, b_2), (a_2, b_2), (a_3, b_3), (a_3, b_4), (a_4, b_4)
S(y, z): (b_1, c_1), (b_2, c_1), (b_2, c_2), (b_3, c_3)
Q() = R(x, y), S(y, z)
I |= Q
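
A brute-force check of the claim I |= Q for this instance might look as follows. It is a sketch that assumes the first attribute of each relation is its key and that simply tests the query on every pair of repairs; the names `repairs` and `is_certain` are hypothetical, and the approach is exponential, so it says nothing about the complexity results in the talk.

```python
from collections import defaultdict
from itertools import product

def repairs(tuples):
    """All repairs of a binary relation whose first attribute is the key."""
    groups = defaultdict(list)
    for t in tuples:
        groups[t[0]].append(t)
    return [set(choice) for choice in product(*groups.values())]

def is_certain(R, S):
    """I |= Q for Q() = R(x, y), S(y, z): Q must hold in every repair."""
    for r, s in product(repairs(R), repairs(S)):
        # Q holds in this repair if some R-tuple joins some S-tuple on y.
        if not any(y == y2 for (_, y) in r for (y2, _) in s):
            return False
    return True

R = [("a1", "b1"), ("a1", "b2"), ("a2", "b2"),
     ("a3", "b3"), ("a3", "b4"), ("a4", "b4")]
S = [("b1", "c1"), ("b2", "c1"), ("b2", "c2"), ("b3", "c3")]
certain = is_certain(R, S)
```

Here every repair keeps R(a2, b2) and some S-tuple with y = b2, which is why the query is certain.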

Problem Statement
CERTAINTY(Q): given an instance I as input, does I |= Q hold, where Q is a boolean CQ?
In general, CERTAINTY(Q) is in coNP
– Q_1 = R(x, y), S(y, z): expressible as a first-order query
– Q_2 = R(x, y), S(z, y): coNP-complete
– Q_3 = R(x, y), S(y, x): PTIME but not first-order expressible
Conjecture: For every boolean conjunctive query Q, CERTAINTY(Q) is either in PTIME or coNP-complete

Progress So Far
[Wijsen, 2010]
– Syntactic characterization of FO-expressible acyclic CQs w/o self-joins
[Kolaitis and Pema, 2012]
– A trichotomy for CQs with 2 atoms and no self-joins
[Wijsen, 2010 & 2013]
– PTIME algorithm for cyclic queries: C_k = R_1(x_1, x_2), …, R_k(x_k, x_1)
– Further classification of acyclic CQs w/o self-joins

Our Contribution
A dichotomy for CQs w/o self-joins where atoms have either
– Simple keys (a single attribute): R(x, y, z)
– Keys that consist of all attributes: S(x, y, z)
Theorem: For every boolean CQ Q w/o self-joins where, for each atom, the key consists of either one attribute or all attributes, CERTAINTY(Q) is either in PTIME or coNP-complete

Outline
1. The Dichotomy Condition
2. Frugal Repairs & Representable Answers
3. Strongly Connected Graphs

The Query Graph
We equivalently study boolean CQs consisting only of binary relations where one attribute is the key: R(x, y)
Relations can be consistent (R^c) or inconsistent (R^i)
Query graph G[Q]: a directed edge (u, v) for each atom R(u, v); u_R is the source node of R and v_R is its end node
Example: Q = R^i(x, y), S^i(z, w), T^c(y, w)
[Figure: G[Q] with edges R: x → y, S: z → w, T: y → w]

Definitions
x^{+,R}: the set of nodes reachable from node x through a directed path once we remove the edge R
R ~ S [source-equivalent]: the source nodes u_R, u_S are in the same SCC
[R]: the equivalence class of R w.r.t. ~
[Figure: example graph on nodes x, y, z, u, v, w with edges R, S, T, U, V]
In the example: x^{+,R} = {x, v, w}; R ~ T and [R] = {R, T}
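
The two definitions can be sketched directly on a query graph stored as a dictionary from edge name to (source node, end node). Since the example figure for this slide is not recoverable, the sketch uses the query from the Query Graph slide and Q_3 from the Problem Statement slide instead; the names `reach_without` and `source_class` are hypothetical.

```python
def reach_without(edges, start, removed=None):
    """x^{+,R}: nodes reachable from `start` by directed paths, ignoring edge `removed`."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for name, (u, v) in edges.items():
            if name != removed and u == node and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def source_class(edges, r):
    """[R]: edges whose source node lies in the same SCC as the source of r."""
    ur = edges[r][0]
    return {s for s, (us, _) in edges.items()
            if us in reach_without(edges, ur) and ur in reach_without(edges, us)}

# Q = R(x, y), S(z, w), T(y, w) and Q_3 = R(x, y), S(y, x)
q = {"R": ("x", "y"), "S": ("z", "w"), "T": ("y", "w")}
q3 = {"R": ("x", "y"), "S": ("y", "x")}
```

Two nodes share an SCC exactly when each is reachable from the other, so the mutual reachability test is enough here; for large graphs a linear-time SCC algorithm would be the usual choice.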

Coupled Edges
coupled^+(R) = the edges in [R], plus any inconsistent edge S such that the source node u_S is connected to the end node v_R through an (undirected) path that does not intersect u_R^{+,R}
[Figure: the example graph, with x = u_R, y = v_R, u = u_V, and the set u_R^{+,R} highlighted]
In the example, coupled^+(R):
– contains R, T: [R] = {R, T}
– contains V: path from y (= v_R) to u (= u_V)
– does not contain U

Splittable Graphs
Two inconsistent edges R, S are coupled if
– S is in coupled^+(R) and R is in coupled^+(S)
A graph G[Q] is:
– unsplittable if it contains a pair of coupled edges that are not source-equivalent
– splittable otherwise
In the example graph:
– coupled^+(R) = {R, T, V}
– coupled^+(T) = {R, T, V}
– coupled^+(V) = {V}
– coupled^+(U) = {U, V, R, T}
Only R, T are coupled, and they are source-equivalent (R ~ T), so the graph is splittable

The Dichotomy Condition
Dichotomy Theorem:
– If G[Q] is splittable, CERTAINTY(Q) is in PTIME
– If G[Q] is unsplittable, CERTAINTY(Q) is coNP-complete
[Figure: the example graph] Splittable, so in PTIME

Examples
– PTIME: R(x, y), S(y, z)
– coNP-complete: R(x, y), S(y, z), T^c(x, z)
– PTIME: R(x, y), S(y, z), U^c(z, y)
– coNP-complete: R(x, y), S(z, y), U^c(y, z)
[Figures: the query graph of each example on nodes x, y, z]
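
The splittability test can be sketched and checked against the four examples above. This is one reading of the slide definitions, not the authors' implementation: it assumes the undirected path may use any edge of G[Q] (consistent or not), that the path's endpoints also must avoid u_R^{+,R}, and that unmarked atoms are inconsistent. Edges are stored as name → (source, end, consistent); all function names are hypothetical.

```python
def reachable(edges, start, skip=None):
    """Nodes reachable from `start` along directed edges, ignoring edge `skip`."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for name, (u, v, _) in edges.items():
            if name != skip and u == node and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def source_equivalent(edges, r, s):
    """R ~ S: the two source nodes are mutually reachable (same SCC)."""
    ur, us = edges[r][0], edges[s][0]
    return us in reachable(edges, ur) and ur in reachable(edges, us)

def coupled_plus(edges, r):
    ur, vr, _ = edges[r]
    blocked = reachable(edges, ur, skip=r)  # the set u_R^{+,R}
    result = {s for s in edges if source_equivalent(edges, r, s)}  # [R]
    if vr in blocked:
        return result
    # Undirected search from v_R that never enters a blocked node.
    seen, stack = {vr}, [vr]
    while stack:
        node = stack.pop()
        for u, v, _ in edges.values():
            for a, b in ((u, v), (v, u)):
                if a == node and b not in seen and b not in blocked:
                    seen.add(b)
                    stack.append(b)
    # Add inconsistent edges whose source node was reached.
    result |= {s for s, (u, _, ok) in edges.items() if not ok and u in seen}
    return result

def is_splittable(edges):
    bad = [e for e, (_, _, ok) in edges.items() if not ok]  # inconsistent edges
    cp = {e: coupled_plus(edges, e) for e in bad}
    for i, r in enumerate(bad):
        for s in bad[i + 1:]:
            coupled = s in cp[r] and r in cp[s]
            if coupled and not source_equivalent(edges, r, s):
                return False
    return True

ex1 = {"R": ("x", "y", False), "S": ("y", "z", False)}
ex2 = {**ex1, "T": ("x", "z", True)}
ex3 = {**ex1, "U": ("z", "y", True)}
ex4 = {"R": ("x", "y", False), "S": ("z", "y", False), "U": ("y", "z", True)}
```

Under these assumptions the sketch classifies the four example queries the same way as the slide: `ex1` and `ex3` come out splittable, `ex2` and `ex4` unsplittable.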

Outline
1. The Dichotomy Condition
2. Frugal Repairs & Representable Answers
3. Strongly Connected Graphs

Frugal Repairs (1)
Definition: A repair r of an instance I is frugal for a boolean query Q if, for any other repair r' of I, Q^f(r') is not strictly contained in Q^f(r)
Q^f = Q with all body variables moved to the head (the full query)
R(x, y): (a_1, b_1), (a_1, b_2), (a_2, b_3), (a_3, b_4), (a_4, b_4)
S(y, x): (b_1, a_1), (b_3, a_2), (b_4, a_3), (b_4, a_4)
repair r_1 = {R(a_1, b_1), R(a_2, b_3), R(a_3, b_4), R(a_4, b_4), S(b_1, a_1), S(b_3, a_2), S(b_4, a_3)}
Q^f(r_1) = {(a_1, b_1), (a_2, b_3), (a_3, b_4)}: not frugal
repair r_2 = {R(a_1, b_2), R(a_2, b_3), R(a_3, b_4), R(a_4, b_4), S(b_1, a_1), S(b_3, a_2), S(b_4, a_3)}
Q^f(r_2) = {(a_2, b_3), (a_3, b_4)}: frugal

Frugal Repairs (2)
I |= Q if and only if every frugal repair satisfies Q
We lose no generality if we study only frugal repairs!
R(x, y): (a_1, b_1), (a_1, b_2), (a_2, b_3), (a_3, b_4), (a_4, b_4)
S(y, x): (b_1, a_1), (b_3, a_2), (b_4, a_3), (b_4, a_4)
Only two frugal repairs:
– Q^f(r_2) = {(a_2, b_3), (a_3, b_4)}
– Q^f(r_3) = {(a_2, b_3), (a_4, b_4)}
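
The frugal answer sets for this instance can be recomputed by brute force. The sketch below assumes the query is Q = R(x, y), S(y, x), that each relation is keyed on its first attribute, and that a repair is frugal exactly when its full answer set is minimal under strict inclusion; the function names are hypothetical.

```python
from collections import defaultdict
from itertools import product

def repairs(tuples):
    """All repairs of a binary relation whose first attribute is the key."""
    groups = defaultdict(list)
    for t in tuples:
        groups[t[0]].append(t)
    return [frozenset(c) for c in product(*groups.values())]

def full_answers(r, s):
    """Q^f(r) for Q = R(x, y), S(y, x): every (x, y) that witnesses the query."""
    return frozenset((x, y) for (x, y) in r if (y, x) in s)

def frugal_answer_sets(R, S):
    answers = {full_answers(r, s) for r, s in product(repairs(R), repairs(S))}
    # Keep only answer sets with no strictly smaller answer set.
    return {a for a in answers if not any(b < a for b in answers)}

R = [("a1", "b1"), ("a1", "b2"), ("a2", "b3"), ("a3", "b4"), ("a4", "b4")]
S = [("b1", "a1"), ("b3", "a2"), ("b4", "a3"), ("b4", "a4")]
frugal = frugal_answer_sets(R, S)
```

Since no frugal answer set is empty here, every frugal repair satisfies Q, which by the statement above means I |= Q for this instance.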

Or-Sets
Efficiently represent all answer sets of frugal repairs
We use or-sets: ⟨1, 2, 3⟩ means 1 or 2 or 3
– A =
– We can "compress" A as B = {, }
– [Libkin and Wong, '93] "decompression" α operator: α(B) = A
The or-set of answer sets for frugal repairs of I for Q:
– M_Q(I) =
Compressed form (set of or-sets):
– A_Q(I) = {, }
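
The decompression step can be illustrated with a small sketch of an α-style operator: it takes a set of or-sets and returns the or-set of all sets obtained by picking one element from each or-set. Or-sets are modelled here as plain Python sets, and the compressed value used below is an assumption derived from the two frugal answer sets on the Frugal Repairs (2) slide, not a value taken from this slide.

```python
from itertools import product

def alpha(set_of_or_sets):
    """Expand a set of or-sets into the or-set of sets it represents."""
    # One choice per or-set; each combination is one possible answer set.
    return {frozenset(choice) for choice in product(*set_of_or_sets)}

# Assumed compressed form: (a2, b3) is in every frugal answer set,
# while exactly one of (a3, b4) and (a4, b4) appears.
compressed = [{("a2", "b3")}, {("a3", "b4"), ("a4", "b4")}]
expanded = alpha(compressed)
```

The compressed form has size linear in the number of tuples, while the expanded or-set can be exponentially larger, which is the reason for working with the compressed representation.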

Representability (1)
An or-set-of-sets S is representable if there exists a set-of-or-sets S_0 (its compression) such that:
– α(S_0) = S
– For any distinct or-sets A, B in S_0, the tuples in A and B use distinct constants in all coordinates
The compression of a representable set with active domain of size n has size polynomial in n
[Examples: an or-set-of-sets shown with its compression, and one that is not representable]

Representability (2)
I |= Q iff the compression A_Q(I) is not empty
If we can compute A_Q(I) in polynomial time, deciding whether I |= Q is in PTIME
Theorem: If G[Q] is a strongly connected graph, M_Q(I) is representable and its compression can be computed in time polynomial in the size of I

Outline
1. The Dichotomy Condition
2. Frugal Repairs & Representable Answers
3. Strongly Connected Graphs

Cycles
C_k = R_1(x_1, x_2), R_2(x_2, x_3), …, R_k(x_k, x_1)
The purified instance contains a collection of disjoint SCCs
ALGORITHM FrugalC
– Find the SCCs that contain no directed cycle of length > k
– For each such SCC i, create an or-set A_i that contains all cycles of length k
– Output A_{C_k}(I) = {A_1, A_2, …}
Example (k = 3):
R(x, y): (a_1, b_1), (a_2, b_2), (a_2, b_3)
S(y, z): (b_1, c_1), (b_2, c_2), (b_3, c_2)
T(z, x): (c_1, a_1), (c_2, a_2)
[Figure: instance graph with one SCC on a_1, b_1, c_1 and one on a_2, b_2, b_3, c_2]
A_{C_3}(I) = {⟨(a_1, b_1, c_1)⟩, ⟨(a_2, b_2, c_2), (a_2, b_3, c_2)⟩}
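
The steps of FrugalC can be sketched on the example instance. This is an illustration under stated assumptions rather than the authors' procedure: the input is taken to be already purified, values are tagged with their attribute position so that different attributes never collide, SCCs are found by mutual reachability, and cycles are enumerated by a naive search that can be exponential in general. The name `frugal_c` is hypothetical.

```python
def frugal_c(relations):
    """relations[i] holds the tuples of R_{i+1}(x_{i+1}, x_{i+2}); returns the or-sets A_i."""
    k = len(relations)
    succ = {}
    for i, rel in enumerate(relations):
        for a, b in rel:
            succ.setdefault((i, a), set()).add(((i + 1) % k, b))

    def reach(start):
        seen, stack = {start}, [start]
        while stack:
            for nxt in succ.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    reach_from = {n: reach(n) for n in succ}
    or_sets, done = [], set()
    for start in sorted(n for n in succ if n[0] == 0):
        if start in done:
            continue
        # SCC of `start`: nodes reachable from it that can also reach it.
        scc = {m for m in reach_from[start] if start in reach_from.get(m, ())}
        done |= scc
        cycles, too_long = set(), False

        def walk(path, origin):
            nonlocal too_long
            for nxt in succ.get(path[-1], ()):
                if nxt == origin:
                    if len(path) == k:
                        cycles.add(tuple(v for _, v in path))
                    else:
                        too_long = True  # a directed cycle longer than k
                elif nxt in scc and nxt not in path:
                    walk(path + [nxt], origin)

        # Every cycle passes through an x_1-value, so start only from those.
        for origin in (n for n in scc if n[0] == 0):
            walk([origin], origin)
        if cycles and not too_long:
            or_sets.append(cycles)
    return or_sets

R = [("a1", "b1"), ("a2", "b2"), ("a2", "b3")]
S = [("b1", "c1"), ("b2", "c2"), ("b3", "c2")]
T = [("c1", "a1"), ("c2", "a2")]
result = frugal_c([R, S, T])
```

On this instance the sketch yields one or-set per SCC: the single cycle through a1, and the two cycles through a2 that differ in the choice of b2 or b3.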

General Case: SCCs (1)
Recursively split an SCC G into an SCC G' and a directed path P that intersects G' only at its start and end node
The set A_{G'}(I) can be recursively computed
[Figure: graph G on nodes x, y, z, t with edges R, S, T, U, V; G' is the sub-SCC and the path is P = y → t → z]
A_{G'}(I) = {A_1, A_2}

General Case: SCCs (2)
A_{G'}(I) = {A_1, A_2}
Any value belongs in a unique or-set
Replacement of G':
B(a, b): (A_1, [a_1 b_1 c_1]), (A_2, [a_2 b_2 c_2]), (A_2, [a_2 b_3 c_2])
B_1^c(b, y): ([a_1 b_1 c_1], b_1), ([a_2 b_2 c_2], b_2), ([a_2 b_3 c_2], b_3)
B_2^c(b, z): ([a_1 b_1 c_1], c_1), ([a_2 b_2 c_2], c_2), ([a_2 b_3 c_2], c_2)
B_0^c(z, a): (c_1, A_1), (c_2, A_2)
[Figure: nodes a, b, y, t, z with edges B, B_1^c, B_2^c, B_0^c, U, V]
The result is a cycle C = a → b → y → t → z → a plus a chord B_2 that is a consistent relation

Rest of the Proof
PTIME algorithm for splittable graphs
– Find a separator in G[Q] (one always exists if the graph is splittable)
– The separator splits G[Q] into cases with fewer inconsistent edges, which are solved recursively
– Base case: all edges are consistent (check whether Q(I) is true)
coNP-hardness
– Reduction from the Monotone-3SAT problem

Conclusions
– Significant progress towards proving the dichotomy for the complexity of Certain Query Answering for Conjunctive Queries
– Settle the dichotomy (or trichotomy) even for queries with self-joins!

Thank you!