Views as Incomplete Databases – Certain & Possible Answers

Presentation transcript:

2005certain 1
Views as Incomplete Databases – Certain & Possible Answers
- Views – an incomplete representation
- Certain and possible answers
- Complexity results for certain answers

2005certain 2
Views – an incomplete representation
Given: a view definition V and a view extension I
- Sound V: I is contained in V(D)
- Complete V: I contains V(D)
- Precise V: I = V(D)
A set of views may also be mixed: some views are sound, others are complete.
In general, more than one db D may exist for which I is a possible view extension.
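The three conditions can be checked directly once a candidate database D is fixed. The following is a minimal sketch; the function name `view_status` and the sample data are illustrative assumptions, not part of the slides.

```python
def view_status(extension, view_result):
    """Classify a view extension I against V(D) for one candidate database D."""
    i, vd = set(extension), set(view_result)
    return {"sound": i <= vd, "complete": i >= vd, "precise": i == vd}

# Hypothetical data: global relation Team(country, group), view v(X) :- Team(X, Y)
team = {("Brazil", "A"), ("Italy", "B"), ("Ghana", "C")}
v_of_d = {country for country, _group in team}

# An extension that lists only some of the countries is sound but not complete.
status = view_status({"Brazil", "Italy"}, v_of_d)
```

Here `status` reports the extension as sound only, since it is a strict subset of V(D).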

2005certain 3
Example: teams in the World Cup soccer tournament
Global schema: Team(country, group) (group assignment for the 1st round)
Source 1: S-C(C), the countries that participate
Source 2: S-Q(C), the countries that participated in qualifying games
Source 3: S-T(C), the teams whose games will be on TV
For all three, the logical mapping is v(X) :- Team(X, Y)

2005certain 4
Given V (including a specification of each view as sound, complete, or precise) and I:
poss(V, I) = {D | D is a db for which I is a possible view extension}
Since we have only the views, this is the set of possible databases.
- For sound views: an infinite set
- For complete views: contains the empty db
- For precise views: may be empty (inconsistent views)
Example:
  v1(X, Y) :- R(X, Y, Z), v1 = {(a, b), (b, c)}
  v2(X, Z) :- R(X, Y, Z), v2 = {(a, d), (c, e)}
* The above changes when the global db is known to satisfy constraints (e.g. keys)
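The inconsistency of the two precise views in the example can be checked mechanically. This is a sketch under the assumption that both views are precise; the helper `apply_views` and the variable names are illustrative.

```python
v1 = {("a", "b"), ("b", "c")}   # extension of v1(X, Y) :- R(X, Y, Z)
v2 = {("a", "d"), ("c", "e")}   # extension of v2(X, Z) :- R(X, Y, Z)

# If both views are precise, every tuple of R projects into v1 and into v2,
# so R must be a subset of the join of v1 and v2 on X.
largest_r = {(x, y, z) for (x, y) in v1 for (x2, z) in v2 if x == x2}

def apply_views(r):
    """Evaluate both view definitions on a candidate relation R."""
    return ({(x, y) for (x, y, z) in r}, {(x, z) for (x, y, z) in r})

# Both views are monotone, so if even the largest candidate does not
# reproduce the extensions, no database does and poss(V, I) is empty.
consistent = apply_views(largest_r) == (v1, v2)
```

The only candidate tuple is (a, b, d), which cannot account for (b, c) in v1 or (c, e) in v2, so `consistent` is `False`.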

2005certain 5
Certain and possible answers
Now assume also a query Q.
- cert(Q, V, I): seems easier to compute, always finite
- poss(Q, V, I): may be infinite, and where do we obtain values not in I?
A possible approach: a finite representation of a possibly infinite family of partially unknown databases
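The slide names the two answer sets without spelling them out. The usual definitions, which are assumed here to be the ones intended, take the intersection and the union of the query results over all possible databases:

```latex
\mathrm{cert}(Q, V, I) = \bigcap_{D \in \mathrm{poss}(V, I)} Q(D)
\qquad
\mathrm{poss}(Q, V, I) = \bigcup_{D \in \mathrm{poss}(V, I)} Q(D)
```

Under these definitions a tuple is a certain answer when every possible database returns it, and a possible answer when at least one does.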

2005certain 6
We concentrate on certain answers, an absolute notion of answering queries using views.
cert(Q, V, I) depends on the soundness/completeness of the views.
Example:
  global: p(x, y)
  v1(x) :- p(x, y), v2(y) :- p(x, y)
  I = {v1(a), v2(b)}
  Q: q(x, y) :- p(x, y)
Sound views: cert(Q, V, I) is empty
Precise views: cert(Q, V, I) is {(a, b)}
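The example can be reproduced by brute force. The sketch below enumerates every database for p over a small finite domain, which is an assumption made only to keep the search finite (the real set of possible databases is infinite for sound views); the function names are illustrative.

```python
from itertools import chain, combinations, product

DOMAIN = ["a", "b", "c"]      # assumption: a small finite domain
I_V1, I_V2 = {"a"}, {"b"}     # I = {v1(a), v2(b)}

def all_databases():
    """Yield every instance of the binary relation p over DOMAIN."""
    tuples = list(product(DOMAIN, repeat=2))
    subsets = chain.from_iterable(
        combinations(tuples, k) for k in range(len(tuples) + 1))
    return (set(s) for s in subsets)

def views(p):
    """v1(x) :- p(x, y) and v2(y) :- p(x, y)."""
    return {x for x, _ in p}, {y for _, y in p}

def certain_answers(mode):
    """Intersect Q(D) over all possible D; returns None if no D is possible."""
    answers = None
    for p in all_databases():
        v1, v2 = views(p)
        if mode == "sound":
            possible = I_V1 <= v1 and I_V2 <= v2
        else:  # "precise"
            possible = I_V1 == v1 and I_V2 == v2
        if possible:
            q = set(p)  # Q: q(x, y) :- p(x, y)
            answers = q if answers is None else answers & q
    return answers
```

With sound views, the databases {p(a, b)} and {p(a, c), p(c, b)} are both possible and share no tuple, so the intersection is empty. With precise views, {p(a, b)} is the only possible database.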

2005certain 7
An issue in query processing:
For the same example, let Q': s(x) :- p(x, y)
To allow relational algebra manipulation of certain answers, we need more than a simple relational representation!
We need algorithms for performing operations on representations of partially unknown db's (not in this course)

2005certain 8
From now on: sound views, certain answers
This was investigated for views defined in L1 and queries defined in L2, where L1, L2 are in {CQ, CQ!=, NR-Datalog, Datalog, FO}
Results include:
- Complexity: lower bounds
- Algorithms: upper bounds

2005certain 9
Complexity results for certain answers
Thm: for V in L1, Q in L2, the following are equivalent:
(a) computing cert(Q, V, I)
(b) deciding containment: is Q1 (in L1) contained in Q2 (in L2)?
That is:
- (a) is decidable iff (b) is
- When decidable, combined complexity of (a) = query complexity of (b),
  hence data complexity of (a) <= query complexity of (b)
[Data complexity: function of db size. Query complexity: function of query size. Combined: both.]

2005certain 10
Proof (sketch):
Given t, how hard is it to decide if t is in cert(Q, V, I)?
Let I = {vi(tij)}. Define Q' to contain the rules that define V, and one more "large" rule:
  q'(t) :- v1(t11), ..., vk(tkm)   (t follows from the facts in I)
Claim: t is in cert(Q, V, I) iff Q' is contained in Q
Hence deciding if t is in cert(Q, V, I) is no harder than this containment.
(Note: for L1 = CQ, we need to "massage" Q' into a CQ)

2005certain 11
How hard is it to check containment of Q1 in Q2?
Let p be a new predicate.
Define V by: the rules of Q1, and v(c) :- q1(X), p(X); let I = {v(c)}
Define Q by: the rules of Q2, and q(c) :- q2(X), p(X)
Then: (c) is in cert(Q, V, I) iff Q1 is contained in Q2

2005certain 12
Consequences: computing certain answers (depends on L1, L2) is:
- undecidable for Datalog, FO
- decidable if one side <= datalog and the other side <= nr-datalog
For decidable cases, the above gives combined complexity. We are interested more in data complexity; here it is:

Views \ Query   CQ      CQ!=    nr-datalog   Datalog   FO
CQ              P       Co-NP   P            P         undec
CQ!=            P       Co-NP   P            P         undec
nr-datalog      Co-NP   Co-NP   Co-NP        Co-NP     undec
Datalog         Co-NP   undec   Co-NP        undec     undec
FO              undec   undec   undec        undec     undec

Co-NP data complexity is bad: impractical to compute, no datalog plan!
We will not prove the co-NP complexity results.

2005certain 13
Claim: For Q in Datalog and V in CQ(!=), let V~ be the same view definitions with the inequalities omitted.
Then cert(Q, V, I) = cert(Q, V~, I)
(Computing the certain answers from I using V without the inequalities gives the same results)
Proof:
(b) If t is in cert(Q, V~, I), then for each D in poss(V~, I), t is in Q(D)
- If D is also in poss(V, I): fine
- If D is not in poss(V, I), there exists a larger D' in poss(V, I) s.t. t is in Q(D')
Hence, t is in cert(Q, V, I)

2005certain 14
Proof of the last claim:
Some s is in I but not in V(D), because of some inequality.
Since s is in V(D''), the inequality involves an attribute in the view body,
so we can add some tuples to D to obtain D1, s.t. s is in V(D1),
so adding tuples for all such s gives D' that contains D, s.t. D' is in poss(V, I).
If t is in Q(D'), since Q has no inequalities, t is also in Q(D)

2005certain 15
For CQ views and Datalog queries:
Query plan: a datalog program P on V
exp(P): replace views by their definitions (using fresh names for existential variables)
P is maximally contained in Q if:
- exp(P)(D) is contained in Q(D)
- exp(P')(D) is contained in exp(P)(D) for all other plans P'
Such a plan is best among all plans.
(This is a language-dependent notion: given a more expressive language, P may not be best any more)
But if a plan delivers cert(Q, V, I), it is absolutely best.
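The expansion step exp(P) can be sketched for a single rule body. The representation below (atoms as pairs of a predicate name and a tuple of variable names, and the `VIEWS` dictionary) is an assumption chosen for illustration; it treats every term as a variable and uses the World Cup view v(X) :- Team(X, Y).

```python
from itertools import count

# Hypothetical encoding of a view definition: head variables and body atoms.
VIEWS = {"v": (("X",), [("Team", ("X", "Y"))])}
_fresh = count(1)

def expand_atom(view_name, args):
    """Replace one view atom by its definition, renaming existential variables."""
    head_vars, body = VIEWS[view_name]
    subst = dict(zip(head_vars, args))
    n = next(_fresh)

    def rename(var):
        # Head variables take the caller's arguments; others get fresh names.
        return subst.get(var, f"_{var}{n}")

    return [(pred, tuple(rename(v) for v in pred_args))
            for pred, pred_args in body]

def expand(plan_body):
    """exp(P) for one rule body: unfold every view atom, keep other atoms."""
    out = []
    for pred, args in plan_body:
        out.extend(expand_atom(pred, args) if pred in VIEWS else [(pred, args)])
    return out

expanded = expand([("v", ("C",)), ("v", ("D",))])
```

Each occurrence of the view gets its own fresh existential variable, so the two `Team` atoms in `expanded` do not accidentally share a group variable.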

2005certain 16
Thm: For sound CQ views and Datalog queries, the inverse rules algorithm computes cert(Q, V, I)
(Thus, for this case, a Datalog query plan can give the absolute best possible answer)
Corollary: If P is max-cont(Q) then, for all view instances I, P(I) = cert(Q, V, I)
We proceed to prove the theorem.

2005certain 17
Def: A tableau is a collection of atoms, with constants and variables.
A tableau T represents a db D if there is a valuation h from T into D.
rep(T) = {D | for some h, D contains h(T)}

2005certain 18
Claim: For a Datalog query Q and a tableau T,
cert(Q, rep(T)) = the tuples without variables in Q(T)
Proof:
(a) We can consider only D in rep(T) s.t. D = h(T): every tuple that is in Q(D') but not in Q(D), where D' is larger than h(T), is not in cert(Q, rep(T))
(b) For such D, h(Q(T)) = Q(D), so a ground tuple in Q(T) is in cert(Q, rep(T))
(c) For a non-ground tuple t in Q(T), we can find D1, D2 in rep(T) that give different values to the variables in t, so no instance of this tuple is in cert(Q, rep(T))

2005certain 19
The inverse rules of V create from a view extension I a database with elements that are Skolem terms.
Consider each Skolem term to be a distinct variable; this is a tableau T(V, I).
Claim: T(V, I) represents poss(V, I)
Proof: easy
Corollary: the set of ground tuples in Q(T(V, I)) is cert(Q, V, I)
This is precisely what the inverse rules algorithm produces: for each I, the inverse rules produce T(V, I), then apply Q. End of story.
Next: one more (last) algorithm, for CQ queries and views, that is the fastest so far
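The construction can be sketched on the earlier example with global relation p(x, y) and sound views v1(x) :- p(x, y), v2(y) :- p(x, y). The `Skolem` class, the function names, and the Skolem function names f1 and f2 are illustrative assumptions; the two queries are evaluated by hand rather than by a general Datalog engine.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Skolem:
    """A Skolem term f(args), treated as a distinct tableau variable."""
    name: str
    args: tuple

def inverse_rules_tableau(v1_ext, v2_ext):
    """Build T(V, I) for p from the inverse rules
         p(x, f1(x)) :- v1(x)      p(f2(y), y) :- v2(y)
    """
    tableau = {(x, Skolem("f1", (x,))) for x in v1_ext}
    tableau |= {(Skolem("f2", (y,)), y) for y in v2_ext}
    return tableau

def ground(tuples):
    """Keep only the tuples that contain no Skolem term."""
    return {t for t in tuples if not any(isinstance(c, Skolem) for c in t)}

tableau = inverse_rules_tableau({"a"}, {"b"})      # I = {v1(a), v2(b)}
q_answers = ground(tableau)                         # Q:  q(x, y) :- p(x, y)
s_answers = ground({(x,) for x, _ in tableau})      # Q': s(x) :- p(x, y)
```

Every tuple of p in the tableau carries a Skolem term, so `q_answers` is empty, which agrees with the sound-views case of the earlier example. Projecting on the first column leaves the ground tuple (a), so `s_answers` contains it.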