
2005lav-iii1 The Infomaster system & the inverse rules algorithm (contents: the Infomaster system; the inverse rules algorithm; a side trip: equivalence & containment of Datalog programs)


1 The Infomaster system & the inverse rules algorithm
- The Infomaster system
- The inverse rules algorithm
- A side trip: equivalence & containment of Datalog programs

2 The Infomaster system
- A LAV (local-as-view) system, implemented at about the same time by a PhD student at Stanford (AI)
- Same basic idea: sources are defined as views over the global schema
- A different algorithm, the inverse rules algorithm, was later (1997) proved to solve all the problems mentioned above, even for recursive Datalog queries
- Used for integrating many internal data sources at Stanford

3 The inverse rules algorithm
The idea: invert a view definition v(..) :- body to obtain rules that define the db relations in terms of the view, then combine these rules with the given query.
Example:
- A db: a graph, represented by the edge relation e(X, Y)
- A view: v(X, Y) :- e(X, Z), e(Z, Y)   (paths of exactly 2 steps only)
- A recursive query Q (transitive closure):
  q(X, Y) :- e(X, Y)
  q(X, Y) :- e(X, Z), q(Z, Y)
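
For comparison, when the db itself is available, the view and the query Q can be evaluated directly. A minimal Python sketch of naive bottom-up (fixpoint) evaluation, assuming relations are sets of pairs; the function names and the small example graph are hypothetical, not from the slides:

```python
def view_v(edges):
    """v(X, Y) :- e(X, Z), e(Z, Y): endpoints of paths with exactly two edges."""
    return {(x, y) for (x, z) in edges for (z2, y) in edges if z == z2}

def transitive_closure(edges):
    """Naive fixpoint for Q: q(X,Y) :- e(X,Y).  q(X,Y) :- e(X,Z), q(Z,Y)."""
    q = set(edges)                      # first rule: every edge is a q fact
    while True:
        # second rule: join e(X, Z) with q(Z, Y)
        new = {(x, y) for (x, z) in edges for (z2, y) in q if z == z2}
        if new <= q:                    # nothing new: fixpoint reached
            return q
        q |= new

# Hypothetical example graph (not from the slides): a -> b -> c
edges = {("a", "b"), ("b", "c")}
print(sorted(view_v(edges)))              # [('a', 'c')]
print(sorted(transitive_closure(edges)))  # [('a', 'b'), ('a', 'c'), ('b', 'c')]
```

The view keeps only the endpoints of each two-step path, so the middle node b is not visible through v; when only the view is available, something has to stand in for such nodes.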

4 Step 1: invert the view definition
- The view: v(X, Y) :- e(X, Z), e(Z, Y)
- With the existential made explicit: v(X, Y) :- exists Z. e(X, Z), e(Z, Y)
- Skolemize (replace Z by the term f(X, Y)): v(X, Y) :- e(X, f(X,Y)), e(f(X,Y), Y)
- Invert:
  e(X, f(X,Y)) :- v(X, Y)
  e(f(X,Y), Y) :- v(X, Y)
Step 2: add the rules of Q:
  q(X, Y) :- e(X, Y)
  q(X, Y) :- e(X, Z), q(Z, Y)
Together, the inverted rules and the rules of Q form the query plan.
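
Step 1 can be carried out mechanically. The Python below is a minimal illustration, not the Infomaster implementation; its representation (atoms as (predicate, args) pairs, variables as capitalized strings) and its Skolem naming scheme f_<view>_<variable> are assumptions, so the slide's f(X, Y) is printed as f_v_Z(X, Y):

```python
def is_var(t):
    # Assumed convention: variables are strings starting with an uppercase letter
    return isinstance(t, str) and t[:1].isupper()

def invert_view(name, head_vars, body):
    """Skolemize the existential variables of a CQ view and invert it.

    Returns inverse rules (head_atom, body_atom): each body atom of the view
    becomes the head of a rule whose body is the view atom.
    """
    # Every body variable not in the head is existential; it gets its own
    # Skolem function (a tuple: name, then all head variables).
    skolem = {}
    for _, args in body:
        for a in args:
            if is_var(a) and a not in head_vars and a not in skolem:
                skolem[a] = (f"f_{name}_{a}",) + tuple(head_vars)
    view_atom = (name, tuple(head_vars))
    return [((pred, tuple(skolem.get(a, a) for a in args)), view_atom)
            for pred, args in body]

def show_term(t):
    return f"{t[0]}({', '.join(t[1:])})" if isinstance(t, tuple) else t

def show_rule(rule):
    (hp, hargs), (bp, bargs) = rule
    return (f"{hp}({', '.join(show_term(a) for a in hargs)}) :- "
            f"{bp}({', '.join(bargs)})")

# The view from the slide: v(X, Y) :- e(X, Z), e(Z, Y)
rules = invert_view("v", ["X", "Y"], [("e", ("X", "Z")), ("e", ("Z", "Y"))])
for r in rules:
    print(show_rule(r))
# e(X, f_v_Z(X, Y)) :- v(X, Y)
# e(f_v_Z(X, Y), Y) :- v(X, Y)
```

Each view atom in the body of an inverse rule holds only head variables, so the inverse rules are safe: every variable in their heads, including those inside Skolem terms, is bound by the view atom.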

5 Assume the db is the following graph:
[Figure: the db, a graph on nodes a, b, c, d, e]
1. What are the facts in the view? v(a, c), v(b, d), v(c, e)
2. What are the db “facts” derived from the view?
   e(a, f(a,c)), e(f(a,c), c), e(b, f(b,d)), e(f(b,d), d), e(c, f(c,e)), e(f(c,e), e)
[Figure G: the graph of the derived e facts, on nodes a, b, c, d, e, f(a,c), f(b,d), f(c,e)]
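
Both questions can be checked with a short Python sketch. The db figure is not recoverable here, so the sketch assumes, for illustration only, the chain a->b->c->d->e, which yields exactly the three view facts listed; Skolem terms f(X, Y) are represented as tuples ("f", X, Y):

```python
# Assumption for illustration: the db graph is the chain a->b->c->d->e,
# which produces exactly the view facts listed on the slide.
db_edges = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")}

# 1. The view v(X, Y) :- e(X, Z), e(Z, Y): endpoints of 2-edge paths
view = {(x, y) for (x, z) in db_edges for (z2, y) in db_edges if z == z2}

# 2. The inverse rules, with the Skolem term f(X, Y) as the tuple ("f", X, Y):
#    e(X, f(X,Y)) :- v(X, Y)     and     e(f(X,Y), Y) :- v(X, Y)
derived_edges = set()
for (x, y) in view:
    fxy = ("f", x, y)
    derived_edges.add((x, fxy))
    derived_edges.add((fxy, y))

print(sorted(view))         # [('a', 'c'), ('b', 'd'), ('c', 'e')]
print(len(derived_edges))   # 6
```

The derived edges are not sorted for printing because they mix plain constants with tuple-valued Skolem terms, which Python cannot order against each other.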

6 3. What is the result of the combined set of rules on this db? The rules for q now compute the transitive closure of G:
- dist. 1, by q(X, Y) :- e(X, Y):
  q(a, f(a,c)), q(f(a,c), c), q(b, f(b,d)), q(f(b,d), d), q(c, f(c,e)), q(f(c,e), e)
- dist. 2, by q(X, Y) :- e(X, Z), q(Z, Y):
  q(a,c), q(b,d), q(c,e), q(f(a,c), f(c,e))
- dist. 3, dist. 4: let’s do it
4. The facts without function symbols are the answer! q(a,c), q(b,d), q(c,e), q(a,e)
[Figure: graph G, with nodes a, b, c, d, e, f(a,c), f(b,d), f(c,e)]
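
Questions 3 and 4, including distances 3 and 4, can be checked mechanically. This Python sketch starts from the three view facts on the slide, applies the inverse rules, runs the rules of Q to a fixpoint, and keeps only the function-free q facts; representing f(X, Y) as a tuple ("f", X, Y) is an assumption of the sketch:

```python
# View facts from the slide
view = {("a", "c"), ("b", "d"), ("c", "e")}

# db "facts" from the inverse rules; the Skolem term f(X,Y) is ("f", X, Y)
e = ({(x, ("f", x, y)) for (x, y) in view}
     | {(("f", x, y), y) for (x, y) in view})

# Rules of Q, evaluated bottom-up; each round extends known paths by one edge
q = set(e)                                   # q(X, Y) :- e(X, Y)
while True:
    # q(X, Y) :- e(X, Z), q(Z, Y)
    new = {(x, y) for (x, z) in e for (z2, y) in q if z == z2} - q
    if not new:
        break
    q |= new

def has_function_symbol(t):
    return isinstance(t, tuple)

# Project on the facts without function symbols
answer = {(x, y) for (x, y) in q
          if not has_function_symbol(x) and not has_function_symbol(y)}
print(sorted(answer))  # [('a', 'c'), ('a', 'e'), ('b', 'd'), ('c', 'e')]
```

The fixpoint holds 13 q facts: the six of distance 1, the four of distance 2, q(a, f(c,e)) and q(f(a,c), e) at distance 3, and q(a, e) at distance 4. All but the four answers mention a function symbol.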

7 Note: the program above looks like an expensive way of answering the query. Why?
1. We compute a full representative of the db, with Skolem terms standing in for the unknown nodes.
2. Although a join was evaluated when the view was computed, it is now evaluated again.

8 The algorithm (for a set of views defined by CQs and a Datalog query P):
1. For each view rule with head variables X, replace each existential variable y in the body by a term f(X), using a different function symbol for each such variable, in each rule.
2. Invert the rules into a set of rules that define the body atoms (db predicates) in terms of the views.
3. Add the program P.
4. Compute the result of the combined program on the view extensions.
5. Project on the atoms without function symbols.
Note: rules of P that use db predicates not mentioned in the views are dropped first; they cannot derive answers from the views.
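
The whole algorithm fits in a small generic evaluator. The following is a sketch, assuming CQ views and a Datalog program given as Python data (atoms as (predicate, args) pairs, variables as capitalized strings, Skolem terms as tuples); all function names are hypothetical, and the evaluation is plain naive bottom-up, chosen for brevity rather than efficiency:

```python
def is_var(t):
    # Assumed convention: variables are strings starting with an uppercase letter
    return isinstance(t, str) and t[:1].isupper()

def subst(t, env):
    """Instantiate a term: bind variables via env, recurse into Skolem terms."""
    if isinstance(t, tuple):
        return (t[0],) + tuple(subst(a, env) for a in t[1:])
    return env[t] if is_var(t) else t

def match(args, fact_args, env):
    """Extend env so that the body arguments equal the ground fact arguments."""
    if len(args) != len(fact_args):
        return None
    env = dict(env)
    for a, f in zip(args, fact_args):
        if is_var(a):
            if env.setdefault(a, f) != f:
                return None
        elif a != f:
            return None
    return env

def invert_views(views):
    """Skolemize each CQ view and invert it into rules defining db atoms."""
    rules = []
    for name, head_vars, body in views:
        skolem = {}
        for _, args in body:
            for a in args:
                if is_var(a) and a not in head_vars:
                    skolem.setdefault(a, (f"f_{name}_{a}",) + tuple(head_vars))
        view_atom = (name, tuple(head_vars))
        for pred, args in body:
            rules.append(((pred, tuple(skolem.get(a, a) for a in args)), [view_atom]))
    return rules

def evaluate(rules, facts):
    """Naive bottom-up evaluation to a fixpoint; facts are (pred, args) pairs."""
    facts = set(facts)
    while True:
        new = set()
        for (hpred, hargs), body in rules:
            envs = [{}]
            for pred, args in body:  # join the body atoms left to right
                envs = [e2 for env in envs for p, fa in facts if p == pred
                        for e2 in [match(args, fa, env)] if e2 is not None]
            for env in envs:
                new.add((hpred, tuple(subst(a, env) for a in hargs)))
        if new <= facts:
            return facts
        facts |= new

def answer_query(views, program, query_pred, view_facts):
    inverse = invert_views(views)
    db_preds = {head[0] for head, _ in inverse}
    idb_preds = {head[0] for head, _ in program}
    # Drop rules of P whose body uses a db predicate not mentioned in the views
    kept = [(head, body) for head, body in program
            if all(p in db_preds or p in idb_preds for p, _ in body)]
    # Add P to the inverse rules and evaluate on the view extensions
    facts = evaluate(inverse + kept, view_facts)
    # Keep query facts without function symbols (Skolem terms are tuples)
    return {args for p, args in facts
            if p == query_pred and not any(isinstance(a, tuple) for a in args)}

# The example from the previous slides
views = [("v", ["X", "Y"], [("e", ("X", "Z")), ("e", ("Z", "Y"))])]
program = [(("q", ("X", "Y")), [("e", ("X", "Y"))]),
           (("q", ("X", "Y")), [("e", ("X", "Z")), ("q", ("Z", "Y"))])]
view_facts = {("v", ("a", "c")), ("v", ("b", "d")), ("v", ("c", "e"))}
print(sorted(answer_query(views, program, "q", view_facts)))
# [('a', 'c'), ('a', 'e'), ('b', 'd'), ('c', 'e')]
```

The filter before evaluation follows the note above: a rule of P is kept only if each body predicate is either defined by P itself or produced by some inverse rule.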

9 Notes:
- The program is not Datalog: it contains function symbols, so it is a logic program. But in its evaluation function symbols are never nested (the inverse rules fire only on view facts, which contain only constants, and the rules of P only copy terms), so evaluation on finite sources terminates.
- It is possible to eliminate the function symbols, to obtain a Datalog program (proof deferred).
- If the query is a UCQ / nonrecursive Datalog program, so is the query plan: the part (*) added to P is a collection of UCQs.
(*) The inverted program is just one nonrecursive layer: the db facts are computed by UCQ queries.

10 Thm: For CQ view definitions V and a Datalog query P, the program produced by the algorithm is a maximally contained query plan for P. That is, for a db D with view extensions v1,…,vn:
1. the plan's answers on v1,…,vn are contained in P(D);
2. if P' is any contained Datalog plan, then P'(v1,…,vn) is contained in the plan's answers on v1,…,vn.
(proof deferred)
Thm: the plan can be constructed in time polynomial in the size of V and P (compare with the NP-completeness of finding a rewriting in the previous approach).

11 Thm: Given CQ view definitions V and a Datalog query P, it is undecidable whether there is an equivalent Datalog query plan (proof omitted).
But if there is one, then the plan produced by the inverse rules algorithm is one, and it is constructible in polynomial time in the size of V and P! (Given the program generated by the algorithm, we do not know, and cannot in general decide, whether it is an equivalent rewriting.)
What about the case where the query is a CQ or a UCQ?

