Presentation is loading. Please wait.

Presentation is loading. Please wait.

TOON CALDERS Deductive databases.

Similar presentations


Presentation on theme: "TOON CALDERS Deductive databases."— Presentation transcript:

1 TOON CALDERS T.CALDERS@TUE.NL Deductive databases

2 Important Information This material was developed by Dr. Toon Calders and prof. Dr. Jan Paredaens at University of Technology, Eindhoven. The original document can be accessed at the following address: http://wwwis.win.tue.nl/~tcalders/teaching/advancedDB/ All changes and adaptations are my own responsability, Rogelio Dávila Pérez

3 Motivation: Deductive DB Motivation is two-fold:  add deductive capabilities to databases; the database contains:  facts (extensional relations)  rules to generate derived facts (intensional relations) Database is knowledge base  Extend the querying  datalog allows for recursion

4 Motivation: Deductive DB Datalog as engine of deductive databases  similarities with Prolog  has facts and rules  rules define -possibly recursive- views Semantics not always clear  safety  negation  recursion

5 Outline Syntax of the Datalog language Semantics of a Datalog program Relational algebra = safe Datalog with negation and without recursion Optimization techniques Conclusions

6 Syntax of Datalog Datalog query/program:  facts  traditional relational tables  rules  define intensional views Rules  if-then rules  can contain recursion  can contain negations Semantics of program can be ambiguous

7 Syntax of Datalog Example father(X,Y) :- person(X,m), parent(X,Y). grandson(X,Y) :- parent(Y,Z), parent(Z,X), person(X,m). hbrothers(X,Y) :- person(X,m), person(Y,m), parent(Z,X), parent(Z,Y). person parent NameSexParentChild ingridfingridtoon alexmalextoon ritafritaan janmjanan toonmanbernd anfanmattijs berndmtoonbernd mattijsmtoonmattijs

8 Syntax of Datalog Variables: X, Y Constants: m, f, rita, … Positive literal: p(t 1,…,t n )  p is the name of a relation (EDB or IDB)  t 1, …, t n constants or variables Negative literal: not p(t 1, …, t n ) Rule: h :- l 1, …, l n  h positive literal, l 1, …, l n literals

9 Syntax of Datalog Variables: X, Y Constants: m, f, rita, … Positive literal: p(t 1,…,t n )  p is the name of a relation (EDB or IDB)  t 1, …, t n constants or variables Negative literal: not p(t 1, …, t n ) Rule: h :- l 1, …, l n  h positive literal, l 1, …, l n literals In Datalog: Classic logical negation ( In contrast to Prolog’s negation by failure )

10 Syntax of Datalog Rules can be recursive Arithmetic operations considered as special predicates  A { "@context": "http://schema.org", "@type": "ImageObject", "contentUrl": "http://images.slideplayer.com/14/4219250/slides/slide_10.jpg", "name": "Syntax of Datalog Rules can be recursive Arithmetic operations considered as special predicates  A

11 Outline Syntax of the Datalog language Semantics of a Datalog program  non-recursive  recursive datalog  aggregation Relational algebra = safe Datalog with negation and without recursion Optimization techniques Conclusions

12 Semantics of Non-Recursive Datalog Programs Ground instantiation of a rule h :- l 1, …, l n : replace every variable in the rule by a constant Example: father(X,Y) :- person(X,m), parent(X,Y) instantiation: father(toon,an) :- person(toon,m), parent(toon,an).

13 Semantics of Non-Recursive Datalog Programs Let I be a set of facts The body of a rule instantiation R’ is satisfied by I if:  every positive literal in the body of R’ is in I  no negative literal in the body of R’ is in I Example: person(toon,m), parent(toon,an) not satisfied by the facts given before

14 Semantics of Non-Recursive Datalog Programs Let I be a set of facts R is a rule h :- l 1, …, l n Infer(R,I) = { h’ :  h’ :- l 1 ’, …, l n ’ is a ground instantiation of R  l 1 ’ … l n ’ is satisfied by I } R={R 1, …, R n } Infer(R,I) = Infer(R 1,I)  …  Infer(R n,I)

15 Semantics of Non-Recursive Datalog Programs A rule h :- l 1, …, l n is in layer 1:  l 1, …, l n only involve extensional predicates A rule h :- l 1, …, l n is in layer i  for all 0 { "@context": "http://schema.org", "@type": "ImageObject", "contentUrl": "http://images.slideplayer.com/14/4219250/slides/slide_15.jpg", "name": "Semantics of Non-Recursive Datalog Programs A rule h :- l 1, …, l n is in layer 1:  l 1, …, l n only involve extensional predicates A rule h :- l 1, …, l n is in layer i  for all 0

16 Semantics of Non-Recursive Datalog Programs Let I 0 be the facts in a datalog program Let R 1 be the rules at layer 1 … Let R n be the rules at layer n I 1 = I 0  Infer(R 1, I 0 ) I 2 = I 1  Infer(R 2, I 1 ) … I n = I n-1  Infer(R n, I n-1 )

17 Semantics of Non-Recursive Datalog Programs Example: person parent NameSexParentChild ingridfingridtoon alexmalextoon ritafritaan janmjanan toonmanbernd anfanmattijs berndmtoonbernd mattijsmtoonmattijs father(X,Y) :- person(X,m), parent(X,Y). grandfather(X,Y) :- father(X,Y), parent(Y,Z). hbrothers(X,Y) :- person(X,m), person(Y,m), parent(Z,X), parent(Z,Y).

18 Semantics of Non-Recursive Datalog Programs person parent NameSexParentChild ingridfingridtoon alexmalextoon ritafritaan janmjanan toonmanbernd anfanmattijs berndmtoonbernd mattijsmtoonmattijs father(X,Y) :- person(X,m), parent(X,Y). grandfather(X,Y) :- father(X,Y), parent(Y,Z). hbrothers(X,Y) :- person(X,m), person(Y,m), parent(Z,X), parent(Z,Y). Stratum 0 Valuation:

19 Semantics of Non-Recursive Datalog Programs person parent NameSexParentChild ingridfingridtoon alexmalextoon ritafritaan janmjanan toonmanbernd anfanmattijs berndmtoonbernd mattijsmtoonmattijs father(X,Y) :- person(X,m), parent(X,Y). grandfather(X,Y) :- father(X,Y), parent(Y,Z). hbrothers(X,Y) :- person(X,m), person(Y,m), parent(Z,X), parent(Z,Y). Stratum 0 father alextoon janan toonbernd toonmattijs hbrothers berndmattijs mattijsberndmattijsbernd Stratum 1 Valuation:

20 Semantics of Non-Recursive Datalog Programs person parent NameSexParentChild ingridfingridtoon alexmalextoon ritafritaan janmjanan toonmanbernd anfanmattijs berndmtoonbernd mattijsmtoonmattijs father(X,Y) :- person(X,m), parent(X,Y). grandfather(X,Y) :- father(X,Y), parent(Y,Z). hbrothers(X,Y) :- person(X,m), person(Y,m), parent(Z,X), parent(Z,Y). Stratum 0 father grandfather alextoon jan mattijs janan jan bernd toonbernd alex mattijs toonmattijs alex bernd hbrothers berndmattijs mattijsberndmattijsbernd Stratum 1Stratum 2 Valuation:

21 Caveat: Correct Negation Negation in Datalog  Negation in Prolog Prolog negation (Negation by failure):  not(p(X)) “is true if we fail to prove p(X)” Datalog negation (Correct negation):  not(p(X)) “binds X to a value such that p(X) does not hold.”

22 Caveat: Correct Negation Example: father(a,b). person(a). person(b). nfather(X) :- person(X), not( father (X,Y) ), person(Y). Datalog: ? nfather(X)  { (a), (b) } Prolog: ? nfather(X)  { (b) } ? person(a), not(father(a,a)), person(a)  yes

23 Caveat: Correct Negation Prolog:  Order of the clauses is important nfather(X) :- person(X), not( father (X,Y) ), person(Y). versus nfather(X) :- person(X), person(Y), not( father (X,Y) ).  Order of the rules is important Datalog:  Order not important  More declarative

24 Caveat: Correct Negation Difference is not fundamental: Prolog: nfather(X) :- person(X), not( father (X,Y) ).  Datalog: nfather(X) :- person(X), not(father_of_someone(X)). father_of_someone(X) :- father (X,Y).

25 Caveat: Correct Negation Difference is not fundamental:  Many systems that claim to implement Datalog, actually implement negation by failure.  Debating on whether or not this is correct is pointless; both perspectives are useful  Check on beforehand how an engine implements negation … Throughout the course, in all exercises, in the exam, …, we assume correct negation.

26 Safety A rule can make no sense if variables appear in funny ways Examples: S(x) :- R(y) S(x) :- not R(x) S(x) :- R(y), x { "@context": "http://schema.org", "@type": "ImageObject", "contentUrl": "http://images.slideplayer.com/14/4219250/slides/slide_26.jpg", "name": "Safety A rule can make no sense if variables appear in funny ways Examples: S(x) :- R(y) S(x) :- not R(x) S(x) :- R(y), x

27 Safety Even when not leading to infinite relations, such Datalog Programs can be domain-dependent. Example: s(a,b). s(a,a). r(a). r(b). t(X) :- not(s(X,Y)), r(X). If domain is {a,b}: only t(b) holds. If domain is {a,b,c} not only t(b), but also t(a) holds ( Ground instantiation t(a) :- not(s(a,c)), r(a). )

28 Safety Therefore, we will only consider rules that are safe. A rule h :- l 1, …, l n is safe if:  every variable in the head of the rule: also in a non- arithmetic positive literale in body  every variable in a negative literal of the body: also in some positive literal of the body

29 Model-Theoretic Semantics A model M of a Datalog program is:  An instatiation of all intensional relations in the program  That satisfies all rules in the program  If the body of a ground instantiation of a rule holds in M, also the head must hold Some models are special …

30 Model-Theoretic Semantics father(a,b). person(X) :- father(X,Y). person(Y) :- father(X,Y). M1: fatherperson ab a b M2:fatherperson ab a ba ba

31 Model-Theoretic Semantics A model is minimal if « we cannot remove tuples » M1: fatherperson ab a b M2:fatherperson ab a ba ba Minimal Not Minimal

32 Model-Theoretic Semantics For non-recursive, safe datalog programs semantics is well defined  The model = all facts that can be derived from the program  Closed-World Assumption: if a fact cannot be derived from the database, then it is not true  Is a minimal model

33 Model-Theoretic Semantics Minimal model is, however, not necessarily unique Example: r(a). t(X) :- r(X), not s(X). minimal models: M1 M2 rstrst aaaa

34 Outline Syntax of the Datalog language Semantics of a Datalog program  non-recursive  recursive datalog  aggregation Relational algebra = safe Datalog with negation and without recursion Optimization techniques Conclusions

35 Semantics of Recursive Datalog Programs g(a,b). g(b,c). g(a,d). reach(X,X) :- g(X,Y). reach(Y,Y) :- g(X,Y) reach(X,Y) :- g(X,Y). reach(X,Z) :- reach(X,Y), reach(Y,Z). Fixpoint of a set of rules R, starting with set of facts I: repeat Old_I := I I := I  infer(R,I) until I = Old_I Always termination (inflationary fixpoint)

36 Semantics of Recursive Datalog Programs g(a,b). g(b,c). g(a,d). reach(X,X) :- g(X,Y). reach(Y,Y) :- g(X,Y) reach(X,Y) :- g(X,Y). reach(X,Z) :- reach(X,Y), reach(Y,Z). Step 0: reach = { } Step 1: reach = { (a,a), (b,b), (c,c), (d,d), (a,b), (b,c), (a,d) } Step 2: reach = {(a,a), (b,b), (c,c), (d,d), (a,b), (b,c), (a,d), (a,c) } Step 3: reach = {(a,a), (b,b), (c,c), (d,d), (a,b), (b,c), (a,d), (a,c) }STOP

37 Semantics of Recursive Datalog Programs Datalog without negation:  Always a unique minimal model. Semantics of recursive datalog with negation is less clear. Example: T(a). R(X) :- T(X), not S(X). S(X) :- T(X), not R(X). What about R(a)? S(a)?

38 Semantics of Recursive Datalog Programs For some classes of Datalog queries with negation still a natural semantics can be defied Important class: stratified programs  T depends on S if some rule with T in the head contains S or (recursively) some predicate that depends on S, in the body.  Stratified program: If T depends on (not S), then S cannot depend on T or (not T).

39 Semantics of Recursive Datalog Programs The program: T(a). R(X) :- T(X), not S(X). S(X) :- T(X), not R(X). is not stratified; R depends negatively on S S depends negatively on R R T S

40 Semantics of Recursive Datalog Programs g(a,b). g(b,c). g(a,d). reach(X,X) :- g(X,Y). reach(Y,Y) :- g(X,Y). reach(X,Y) :- g(X,Y). reach(X,Z) :- reach(X,Y), reach(Y,Z). node(X) :- g(X,Y). node(Y) :- g(X,Y). unreach(X,Y) :- node(X), node(Y), not reach(X,Y). g reachnode unreach

41 Semantics of Recursive Datalog Programs If a program is stratified, the tables in the program can be partitioned into strata:  Stratum 0: All database tables.  Stratum I: Tables defined in terms of tables in Stratum I and lower strata.  If T depends on (not S), S is in lower stratum than T.

42 Semantics of Recursive Datalog Programs g(a,b). g(b,c). g(a,d). reach(X,X) :- g(X,Y). reach(Y,Y) :- g(X,Y). reach(X,Y) :- g(X,Y). reach(X,Z) :- reach(X,Y), reach(Y,Z). node(X) :- g(X,Y). node(Y) :- g(X,Y). unreach(X,Y) :- node(X), node(Y), not reach(X,Y). g reachnode unreach 0 1 2

43 Semantics of Recursive Datalog Programs Semantics of a stratified program given by:  First, compute the least fixpoint of all tables in Stratum 1. (Stratum 0 tables are fixed.)  Then, compute the least fixpoint of tables in Stratum 2; then the lfp of tables in Stratum 3, and so on, stratum-by-stratum.

44 Semantics of Recursive Datalog Programs Fixpoint of a set of rules R, starting with set of facts I: repeat Old_I := I I := I  infer(R,I) until I = Old_I Fixpoint within one stratum always terminates  Due to monotonicity within the strata  Only positive dependence between tables in stratum l. Due to finite program, number of strata is finite as well

45 Semantics of Recursive Datalog Programs Stratum 0: g(a,b). g(b,c). g(a,d). Stratum 1: node(a), node(b), node(c), node(d), reach(a,a), reach(b,b), reach(c,c), reach(d,d), reach(a,b), reach(b,c), … Stratum 2: unreach(b,a), unreach(c,a), …

46 Outline Syntax of the Datalog language Semantics of a Datalog program  non-recursive  recursive datalog  aggregation Relational algebra = safe Datalog with negation and without recursion Optimization techniques Conclusions

47 Aggregate Operators The notation in the head indicates grouping; the remaining arguments (X, in this example) are the GROUP BY fields. In order to apply such a rule, must have all of relation g available. Stratification with respect to use of is similar to negation. Degree(X, SUM ( )) :- g(X,Y).

48 Aggregate Operators bi(X,Y) :- g(X,Y).g bi(Y,X) :- g(X,Y). Degree(X, SUM ( )) :- bi(X,Y).bi degree

49 Aggregate Operators bi(X,Y) :- g(X,Y).g bi(Y,X) :- g(X,Y). Degree(X, SUM ( )) :- bi(X,Y).bi degree 0 1 2

50 Aggregate Operators bi(X,Y) :- g(X,Y).g bi(Y,X) :- g(X,Y). Degree(X, SUM ( )) :- bi(X,Y).bi degree Compute stratum by stratum Assume strata 1  k fixed when computing k+1 0 1 2

51 Aggregate Operators r(a,b). r(a,c). s(a,d). t(X,SUM( )) :- r(X,Y). r(X,Y) :- t(X,Z), Z=2, s(X,Y).

52 Aggregate Operators r(a,b). r(a,c). s(a,d).t t(X,SUM( )) :- r(X,Y). r(X,Y) :- t(X,Z), Z=2, s(X,Y).rs

53 Aggregate Operators r(a,b). r(a,c). s(a,d).t t(X,SUM( )) :- r(X,Y). r(X,Y) :- t(X,Z), Z=2, s(X,Y).rs “a is aggregating over a moving target” Step 1:t(a,2) is added Step 2:r(a,d)is added Step 3:t(a,3) added, t(a,2) no longer true … hence r(a,d) should not have been added

54 Outline Syntax of the Datalog language Semantics of a Datalog program Relational algebra = Safe Datalog with negation and without recursion Optimization techniques Conclusions

55 RA = Non-Recursive Datalog Every operator of RA can be simulated by non-recursive datalog  Project on the first attribute of the ternary relation r: query (A) :–r(A, B, C).  Cartesian product of relations r 1 and r 2. query (X 1, X 2,..., X n, Y 1, Y 1, Y 2,..., Y m ) :– r 1 (X 1, X 2,..., X n ), r 2 (Y 1, Y 2,..., Y m ).  Union of relations r 1 and r 2. query (X 1, X 2,..., X n ) :–r 1 (X 1, X 2,..., X n ), query (X 1, X 2,..., X n ) :–r 2 (X 1, X 2,..., X n ),  Set difference of r 1 and r 2. query (X 1, X 2,..., X n ) :–r 1 (X 1, X 2,..., X n ), not r 2 (X 1, X 2,..., X n )

56 RA = Non-Recursive Datalog Every operator of RA can be simulated by non- recursive datalog  Result of our construction is always safe, and equivalent for stratified semantics  $1=$3 ((  $1 R) x R)  query 1 (A) :- R(A,B). query 2 (A,B,C) :- query 1 (A), R(B,C). result(A,B,A) :- query 2 (A,B,A)

57 RA = Non-Recursive Datalog Every rule can be expressed by one RA expression:  Translate every atom separately  Negation/arithmetic: use “complement” construction Essential: safety  Combine atoms with Cartesian product  Do the joins with a selection  Project on the relevant attributes Strata determine the order of evaluation Because of no recursion: every rule only executed once.

58 RA = Non-Recursive Datalog sister(X,Y) :- person(X,f), parent(Z,X), parent(Z,Y), not(X=Y). person(X,f):  $2=‘f’ Person parent(Z,X) and parent(Z,Y): Parent not(X=Y): complement construction X comes from parent(Z,X)   $2 Parent Y from parent(Z,Y)   $2 Parent not(X=Y)   $1  $2 (  $2 Parent x  $2 Parent)

59 RA = Non-Recursive Datalog sister(X,Y) :- person(X,f), parent(Z,X), parent(Z,Y), not(X=Y).  $1,$6  $1=$4, $3=$5, $1=$7, $6=$8 (  $2=‘f’ Person x Parent x Parent x  $1  $2 (  $2 Parent x  $2 Parent))

60 RA = Non-Recursive Datalog Hence, the following two are equivalent in expressive power:  Safe Datalog with negation, without recursion or aggregation, under the stratified semantics  Relational Algebra Every rule separately can be expressed by a relational algebra expression  Makes it very suitable for implementation on top of a relational database

61 Outline Syntax of the Datalog language Semantics of a Datalog program Relational algebra = Datalog with negation and without recursion Optimization techniques Conclusions

62 Evaluation of Datalog Programs Running example: root(r). child(r,a). child(r,b). child(a,c). child(a,d). child(c,e). child(d,f). child(b,h). sg(X,Y) :- root(X),root(Y). sg(X,Y) :- child(X,U), sg(U,V), child(Y,V). r ab cd fe h

63 Evaluation of Datalog Programs: Issues Repeated inferences: recursive rules are repeatedly applied in the naïve way; same inferences in several iterations. Unnecessary inferences: if we just want to find sg of a particular node, say e, computing the fixpoint of the sg program and then selecting tuples with e in the first column is wasteful, in that we compute many irrelevant facts.

64 Evaluation of Datalog Programs Running example: Query: ? sg(e,X) 1. (r, r) 2. (a,a), (b,b), (a,b), (b,a) 3. (c,c), (c,d), (c,h), (d,c), (d,d), … 4. (e,e), (f,f), (e,f), (f,e) r ab cd fe h

65 Avoiding Repeated Inferences Seminaive Fixpoint Evaluation: Avoid repeated inferences: at least one of the body facts generated in the most recent iteration.  For each recursive table P, use a table delta_P.  Rewrite the program to use the delta tables. A second evaluation of the rule: r(X,Y) :- s(X), t(Y), u(X,Z), v(Y,Z). only gives new tuples (X,Y) for ground instantiations in which at least one of the atoms is new.

66 Avoiding Unnecessary Inferences Still, in the running example:  many unnecessary deductions when query is: ? sg(e,X) Compare with top-down  as in Prolog  only facts that are connected to the ultimate goal are being considered

67 The “Prolog Way” sg(X,Y) :- root(X),root(Y). sg(X,Y) :- child(X,U), sg(U,V), child(Y,V). ? sg(e,X). try: root(e) FAIL try: child(e,U)  U=c try: sg(c,V) try: root(e) FAIL try: child(c,U)  U=a … r ab cd fe h

68 The “Prolog Way” sg(X,Y) :- root(X),root(Y). sg(X,Y) :- child(X,U), sg(U,V), child(Y,V). ? sg(e,X). try: root(e) FAIL try: child(e,U)  U=c try: sg(c,V) try: root(e) FAIL try: child(c,U)  U=a … r ab cd fe h

69 “Magic Sets” Idea We want to do something similar for Datalog Idea: Define a “filter” table: computes all relevant values, restricts the computation of sg(e,X). sg(X,Y) :- m(X), root(X), root(Y). sg(X,Y) :- m(X), child(X,U), sg(U,V),child(Y,V). m(X) :- m(Y), child(Y,X). m(e).

70 Magic Sets It is always possible to do this in such a way that bottom-up becomes as efficient as top-down! Different proposals exist in literature  how to introduce the magic filters

71 Optimization Techniques Many other techniques exist as well  Standard relational indexing techniques  (partly) materializing intensional relations on beforehand  Trade-off memory  query time performance  (See also the OLAP-part for a similar technique)  Different representations for relations  BDD (Stanford)

72 Outline Syntax of the Datalog language Semantics of a Datalog program Relational algebra = Datalog with negation and without recursion Optimization techniques Conclusions

73 Datalog adds deductive capabilities to databases  extensional relations  intensional relations Datalog without recursion, with negation  safety requirement  stratification  equal in power to relational algebra  Closed World Assumption

74 Conclusions Datalog without Negation  Always a unique minimal model Datalog with negation and recursion  semantics not always clear  stratified negation Evaluation of datalog queries:  without negation = RA-optimization  with recursion:  semi-naive recursion  magic sets

75 Conclusions Very nice idea, but … Deductive databases did not make it as a database paradigm Yet, many ideas survived  recursion in SQL … And others may re-surface in future.  Increasing need for adding meta-information in databases


Download ppt "TOON CALDERS Deductive databases."

Similar presentations


Ads by Google