Download presentation

Presentation is loading. Please wait.

1
TOON CALDERS Deductive databases

2
Important Information This material was developed by Dr. Toon Calders and prof. Dr. Jan Paredaens at University of Technology, Eindhoven. The original document can be accessed at the following address: All changes and adaptations are my own responsability, Rogelio Dávila Pérez

3
Motivation: Deductive DB Motivation is two-fold: add deductive capabilities to databases; the database contains: facts (extensional relations) rules to generate derived facts (intensional relations) Database is knowledge base Extend the querying datalog allows for recursion

4
Motivation: Deductive DB Datalog as engine of deductive databases similarities with Prolog has facts and rules rules define -possibly recursive- views Semantics not always clear safety negation recursion

5
Outline Syntax of the Datalog language Semantics of a Datalog program Relational algebra = safe Datalog with negation and without recursion Optimization techniques Conclusions

6
Syntax of Datalog Datalog query/program: facts traditional relational tables rules define intensional views Rules if-then rules can contain recursion can contain negations Semantics of program can be ambiguous

7
Syntax of Datalog Example father(X,Y) :- person(X,m), parent(X,Y). grandson(X,Y) :- parent(Y,Z), parent(Z,X), person(X,m). hbrothers(X,Y) :- person(X,m), person(Y,m), parent(Z,X), parent(Z,Y). person parent NameSexParentChild ingridfingridtoon alexmalextoon ritafritaan janmjanan toonmanbernd anfanmattijs berndmtoonbernd mattijsmtoonmattijs

8
Syntax of Datalog Variables: X, Y Constants: m, f, rita, … Positive literal: p(t 1,…,t n ) p is the name of a relation (EDB or IDB) t 1, …, t n constants or variables Negative literal: not p(t 1, …, t n ) Rule: h :- l 1, …, l n h positive literal, l 1, …, l n literals

9
Syntax of Datalog Variables: X, Y Constants: m, f, rita, … Positive literal: p(t 1,…,t n ) p is the name of a relation (EDB or IDB) t 1, …, t n constants or variables Negative literal: not p(t 1, …, t n ) Rule: h :- l 1, …, l n h positive literal, l 1, …, l n literals In Datalog: Classic logical negation ( In contrast to Prolog’s negation by failure )

10
Syntax of Datalog Rules can be recursive Arithmetic operations considered as special predicates A**
**

11
Outline Syntax of the Datalog language Semantics of a Datalog program non-recursive recursive datalog aggregation Relational algebra = safe Datalog with negation and without recursion Optimization techniques Conclusions

12
Semantics of Non-Recursive Datalog Programs Ground instantiation of a rule h :- l 1, …, l n : replace every variable in the rule by a constant Example: father(X,Y) :- person(X,m), parent(X,Y) instantiation: father(toon,an) :- person(toon,m), parent(toon,an).

13
Semantics of Non-Recursive Datalog Programs Let I be a set of facts The body of a rule instantiation R’ is satisfied by I if: every positive literal in the body of R’ is in I no negative literal in the body of R’ is in I Example: person(toon,m), parent(toon,an) not satisfied by the facts given before

14
Semantics of Non-Recursive Datalog Programs Let I be a set of facts R is a rule h :- l 1, …, l n Infer(R,I) = { h’ : h’ :- l 1 ’, …, l n ’ is a ground instantiation of R l 1 ’ … l n ’ is satisfied by I } R={R 1, …, R n } Infer(R,I) = Infer(R 1,I) … Infer(R n,I)

15
Semantics of Non-Recursive Datalog Programs A rule h :- l 1, …, l n is in layer 1: l 1, …, l n only involve extensional predicates A rule h :- l 1, …, l n is in layer i for all 0

16
Semantics of Non-Recursive Datalog Programs Let I 0 be the facts in a datalog program Let R 1 be the rules at layer 1 … Let R n be the rules at layer n I 1 = I 0 Infer(R 1, I 0 ) I 2 = I 1 Infer(R 2, I 1 ) … I n = I n-1 Infer(R n, I n-1 )

17
Semantics of Non-Recursive Datalog Programs Example: person parent NameSexParentChild ingridfingridtoon alexmalextoon ritafritaan janmjanan toonmanbernd anfanmattijs berndmtoonbernd mattijsmtoonmattijs father(X,Y) :- person(X,m), parent(X,Y). grandfather(X,Y) :- father(X,Y), parent(Y,Z). hbrothers(X,Y) :- person(X,m), person(Y,m), parent(Z,X), parent(Z,Y).

18
Semantics of Non-Recursive Datalog Programs person parent NameSexParentChild ingridfingridtoon alexmalextoon ritafritaan janmjanan toonmanbernd anfanmattijs berndmtoonbernd mattijsmtoonmattijs father(X,Y) :- person(X,m), parent(X,Y). grandfather(X,Y) :- father(X,Y), parent(Y,Z). hbrothers(X,Y) :- person(X,m), person(Y,m), parent(Z,X), parent(Z,Y). Stratum 0 Valuation:

19
Semantics of Non-Recursive Datalog Programs person parent NameSexParentChild ingridfingridtoon alexmalextoon ritafritaan janmjanan toonmanbernd anfanmattijs berndmtoonbernd mattijsmtoonmattijs father(X,Y) :- person(X,m), parent(X,Y). grandfather(X,Y) :- father(X,Y), parent(Y,Z). hbrothers(X,Y) :- person(X,m), person(Y,m), parent(Z,X), parent(Z,Y). Stratum 0 father alextoon janan toonbernd toonmattijs hbrothers berndmattijs mattijsberndmattijsbernd Stratum 1 Valuation:

20
Semantics of Non-Recursive Datalog Programs person parent NameSexParentChild ingridfingridtoon alexmalextoon ritafritaan janmjanan toonmanbernd anfanmattijs berndmtoonbernd mattijsmtoonmattijs father(X,Y) :- person(X,m), parent(X,Y). grandfather(X,Y) :- father(X,Y), parent(Y,Z). hbrothers(X,Y) :- person(X,m), person(Y,m), parent(Z,X), parent(Z,Y). Stratum 0 father grandfather alextoon jan mattijs janan jan bernd toonbernd alex mattijs toonmattijs alex bernd hbrothers berndmattijs mattijsberndmattijsbernd Stratum 1Stratum 2 Valuation:

21
Caveat: Correct Negation Negation in Datalog Negation in Prolog Prolog negation (Negation by failure): not(p(X)) “is true if we fail to prove p(X)” Datalog negation (Correct negation): not(p(X)) “binds X to a value such that p(X) does not hold.”

22
Caveat: Correct Negation Example: father(a,b). person(a). person(b). nfather(X) :- person(X), not( father (X,Y) ), person(Y). Datalog: ? nfather(X) { (a), (b) } Prolog: ? nfather(X) { (b) } ? person(a), not(father(a,a)), person(a) yes

23
Caveat: Correct Negation Prolog: Order of the clauses is important nfather(X) :- person(X), not( father (X,Y) ), person(Y). versus nfather(X) :- person(X), person(Y), not( father (X,Y) ). Order of the rules is important Datalog: Order not important More declarative

24
Caveat: Correct Negation Difference is not fundamental: Prolog: nfather(X) :- person(X), not( father (X,Y) ). Datalog: nfather(X) :- person(X), not(father_of_someone(X)). father_of_someone(X) :- father (X,Y).

25
Caveat: Correct Negation Difference is not fundamental: Many systems that claim to implement Datalog, actually implement negation by failure. Debating on whether or not this is correct is pointless; both perspectives are useful Check on beforehand how an engine implements negation … Throughout the course, in all exercises, in the exam, …, we assume correct negation.

26
Safety A rule can make no sense if variables appear in funny ways Examples: S(x) :- R(y) S(x) :- not R(x) S(x) :- R(y), x

27
Safety Even when not leading to infinite relations, such Datalog Programs can be domain-dependent. Example: s(a,b). s(a,a). r(a). r(b). t(X) :- not(s(X,Y)), r(X). If domain is {a,b}: only t(b) holds. If domain is {a,b,c} not only t(b), but also t(a) holds ( Ground instantiation t(a) :- not(s(a,c)), r(a). )

28
Safety Therefore, we will only consider rules that are safe. A rule h :- l 1, …, l n is safe if: every variable in the head of the rule: also in a non- arithmetic positive literale in body every variable in a negative literal of the body: also in some positive literal of the body

29
Model-Theoretic Semantics A model M of a Datalog program is: An instatiation of all intensional relations in the program That satisfies all rules in the program If the body of a ground instantiation of a rule holds in M, also the head must hold Some models are special …

30
Model-Theoretic Semantics father(a,b). person(X) :- father(X,Y). person(Y) :- father(X,Y). M1: fatherperson ab a b M2:fatherperson ab a ba ba

31
Model-Theoretic Semantics A model is minimal if « we cannot remove tuples » M1: fatherperson ab a b M2:fatherperson ab a ba ba Minimal Not Minimal

32
Model-Theoretic Semantics For non-recursive, safe datalog programs semantics is well defined The model = all facts that can be derived from the program Closed-World Assumption: if a fact cannot be derived from the database, then it is not true Is a minimal model

33
Model-Theoretic Semantics Minimal model is, however, not necessarily unique Example: r(a). t(X) :- r(X), not s(X). minimal models: M1 M2 rstrst aaaa

34
Outline Syntax of the Datalog language Semantics of a Datalog program non-recursive recursive datalog aggregation Relational algebra = safe Datalog with negation and without recursion Optimization techniques Conclusions

35
Semantics of Recursive Datalog Programs g(a,b). g(b,c). g(a,d). reach(X,X) :- g(X,Y). reach(Y,Y) :- g(X,Y) reach(X,Y) :- g(X,Y). reach(X,Z) :- reach(X,Y), reach(Y,Z). Fixpoint of a set of rules R, starting with set of facts I: repeat Old_I := I I := I infer(R,I) until I = Old_I Always termination (inflationary fixpoint)

36
Semantics of Recursive Datalog Programs g(a,b). g(b,c). g(a,d). reach(X,X) :- g(X,Y). reach(Y,Y) :- g(X,Y) reach(X,Y) :- g(X,Y). reach(X,Z) :- reach(X,Y), reach(Y,Z). Step 0: reach = { } Step 1: reach = { (a,a), (b,b), (c,c), (d,d), (a,b), (b,c), (a,d) } Step 2: reach = {(a,a), (b,b), (c,c), (d,d), (a,b), (b,c), (a,d), (a,c) } Step 3: reach = {(a,a), (b,b), (c,c), (d,d), (a,b), (b,c), (a,d), (a,c) }STOP

37
Semantics of Recursive Datalog Programs Datalog without negation: Always a unique minimal model. Semantics of recursive datalog with negation is less clear. Example: T(a). R(X) :- T(X), not S(X). S(X) :- T(X), not R(X). What about R(a)? S(a)?

38
Semantics of Recursive Datalog Programs For some classes of Datalog queries with negation still a natural semantics can be defied Important class: stratified programs T depends on S if some rule with T in the head contains S or (recursively) some predicate that depends on S, in the body. Stratified program: If T depends on (not S), then S cannot depend on T or (not T).

39
Semantics of Recursive Datalog Programs The program: T(a). R(X) :- T(X), not S(X). S(X) :- T(X), not R(X). is not stratified; R depends negatively on S S depends negatively on R R T S

40
Semantics of Recursive Datalog Programs g(a,b). g(b,c). g(a,d). reach(X,X) :- g(X,Y). reach(Y,Y) :- g(X,Y). reach(X,Y) :- g(X,Y). reach(X,Z) :- reach(X,Y), reach(Y,Z). node(X) :- g(X,Y). node(Y) :- g(X,Y). unreach(X,Y) :- node(X), node(Y), not reach(X,Y). g reachnode unreach

41
Semantics of Recursive Datalog Programs If a program is stratified, the tables in the program can be partitioned into strata: Stratum 0: All database tables. Stratum I: Tables defined in terms of tables in Stratum I and lower strata. If T depends on (not S), S is in lower stratum than T.

42
Semantics of Recursive Datalog Programs g(a,b). g(b,c). g(a,d). reach(X,X) :- g(X,Y). reach(Y,Y) :- g(X,Y). reach(X,Y) :- g(X,Y). reach(X,Z) :- reach(X,Y), reach(Y,Z). node(X) :- g(X,Y). node(Y) :- g(X,Y). unreach(X,Y) :- node(X), node(Y), not reach(X,Y). g reachnode unreach 0 1 2

43
Semantics of Recursive Datalog Programs Semantics of a stratified program given by: First, compute the least fixpoint of all tables in Stratum 1. (Stratum 0 tables are fixed.) Then, compute the least fixpoint of tables in Stratum 2; then the lfp of tables in Stratum 3, and so on, stratum-by-stratum.

44
Semantics of Recursive Datalog Programs Fixpoint of a set of rules R, starting with set of facts I: repeat Old_I := I I := I infer(R,I) until I = Old_I Fixpoint within one stratum always terminates Due to monotonicity within the strata Only positive dependence between tables in stratum l. Due to finite program, number of strata is finite as well

45
Semantics of Recursive Datalog Programs Stratum 0: g(a,b). g(b,c). g(a,d). Stratum 1: node(a), node(b), node(c), node(d), reach(a,a), reach(b,b), reach(c,c), reach(d,d), reach(a,b), reach(b,c), … Stratum 2: unreach(b,a), unreach(c,a), …

46
Outline Syntax of the Datalog language Semantics of a Datalog program non-recursive recursive datalog aggregation Relational algebra = safe Datalog with negation and without recursion Optimization techniques Conclusions

47
Aggregate Operators The notation in the head indicates grouping; the remaining arguments (X, in this example) are the GROUP BY fields. In order to apply such a rule, must have all of relation g available. Stratification with respect to use of is similar to negation. Degree(X, SUM ( )) :- g(X,Y).

48
Aggregate Operators bi(X,Y) :- g(X,Y).g bi(Y,X) :- g(X,Y). Degree(X, SUM ( )) :- bi(X,Y).bi degree

49
Aggregate Operators bi(X,Y) :- g(X,Y).g bi(Y,X) :- g(X,Y). Degree(X, SUM ( )) :- bi(X,Y).bi degree 0 1 2

50
Aggregate Operators bi(X,Y) :- g(X,Y).g bi(Y,X) :- g(X,Y). Degree(X, SUM ( )) :- bi(X,Y).bi degree Compute stratum by stratum Assume strata 1 k fixed when computing k

51
Aggregate Operators r(a,b). r(a,c). s(a,d). t(X,SUM( )) :- r(X,Y). r(X,Y) :- t(X,Z), Z=2, s(X,Y).

52
Aggregate Operators r(a,b). r(a,c). s(a,d).t t(X,SUM( )) :- r(X,Y). r(X,Y) :- t(X,Z), Z=2, s(X,Y).rs

53
Aggregate Operators r(a,b). r(a,c). s(a,d).t t(X,SUM( )) :- r(X,Y). r(X,Y) :- t(X,Z), Z=2, s(X,Y).rs “a is aggregating over a moving target” Step 1:t(a,2) is added Step 2:r(a,d)is added Step 3:t(a,3) added, t(a,2) no longer true … hence r(a,d) should not have been added

54
Outline Syntax of the Datalog language Semantics of a Datalog program Relational algebra = Safe Datalog with negation and without recursion Optimization techniques Conclusions

55
RA = Non-Recursive Datalog Every operator of RA can be simulated by non-recursive datalog Project on the first attribute of the ternary relation r: query (A) :–r(A, B, C). Cartesian product of relations r 1 and r 2. query (X 1, X 2,..., X n, Y 1, Y 1, Y 2,..., Y m ) :– r 1 (X 1, X 2,..., X n ), r 2 (Y 1, Y 2,..., Y m ). Union of relations r 1 and r 2. query (X 1, X 2,..., X n ) :–r 1 (X 1, X 2,..., X n ), query (X 1, X 2,..., X n ) :–r 2 (X 1, X 2,..., X n ), Set difference of r 1 and r 2. query (X 1, X 2,..., X n ) :–r 1 (X 1, X 2,..., X n ), not r 2 (X 1, X 2,..., X n )

56
RA = Non-Recursive Datalog Every operator of RA can be simulated by non- recursive datalog Result of our construction is always safe, and equivalent for stratified semantics $1=$3 (( $1 R) x R) query 1 (A) :- R(A,B). query 2 (A,B,C) :- query 1 (A), R(B,C). result(A,B,A) :- query 2 (A,B,A)

57
RA = Non-Recursive Datalog Every rule can be expressed by one RA expression: Translate every atom separately Negation/arithmetic: use “complement” construction Essential: safety Combine atoms with Cartesian product Do the joins with a selection Project on the relevant attributes Strata determine the order of evaluation Because of no recursion: every rule only executed once.

58
RA = Non-Recursive Datalog sister(X,Y) :- person(X,f), parent(Z,X), parent(Z,Y), not(X=Y). person(X,f): $2=‘f’ Person parent(Z,X) and parent(Z,Y): Parent not(X=Y): complement construction X comes from parent(Z,X) $2 Parent Y from parent(Z,Y) $2 Parent not(X=Y) $1 $2 ( $2 Parent x $2 Parent)

59
RA = Non-Recursive Datalog sister(X,Y) :- person(X,f), parent(Z,X), parent(Z,Y), not(X=Y). $1,$6 $1=$4, $3=$5, $1=$7, $6=$8 ( $2=‘f’ Person x Parent x Parent x $1 $2 ( $2 Parent x $2 Parent))

60
RA = Non-Recursive Datalog Hence, the following two are equivalent in expressive power: Safe Datalog with negation, without recursion or aggregation, under the stratified semantics Relational Algebra Every rule separately can be expressed by a relational algebra expression Makes it very suitable for implementation on top of a relational database

61
Outline Syntax of the Datalog language Semantics of a Datalog program Relational algebra = Datalog with negation and without recursion Optimization techniques Conclusions

62
Evaluation of Datalog Programs Running example: root(r). child(r,a). child(r,b). child(a,c). child(a,d). child(c,e). child(d,f). child(b,h). sg(X,Y) :- root(X),root(Y). sg(X,Y) :- child(X,U), sg(U,V), child(Y,V). r ab cd fe h

63
Evaluation of Datalog Programs: Issues Repeated inferences: recursive rules are repeatedly applied in the naïve way; same inferences in several iterations. Unnecessary inferences: if we just want to find sg of a particular node, say e, computing the fixpoint of the sg program and then selecting tuples with e in the first column is wasteful, in that we compute many irrelevant facts.

64
Evaluation of Datalog Programs Running example: Query: ? sg(e,X) 1. (r, r) 2. (a,a), (b,b), (a,b), (b,a) 3. (c,c), (c,d), (c,h), (d,c), (d,d), … 4. (e,e), (f,f), (e,f), (f,e) r ab cd fe h

65
Avoiding Repeated Inferences Seminaive Fixpoint Evaluation: Avoid repeated inferences: at least one of the body facts generated in the most recent iteration. For each recursive table P, use a table delta_P. Rewrite the program to use the delta tables. A second evaluation of the rule: r(X,Y) :- s(X), t(Y), u(X,Z), v(Y,Z). only gives new tuples (X,Y) for ground instantiations in which at least one of the atoms is new.

66
Avoiding Unnecessary Inferences Still, in the running example: many unnecessary deductions when query is: ? sg(e,X) Compare with top-down as in Prolog only facts that are connected to the ultimate goal are being considered

67
The “Prolog Way” sg(X,Y) :- root(X),root(Y). sg(X,Y) :- child(X,U), sg(U,V), child(Y,V). ? sg(e,X). try: root(e) FAIL try: child(e,U) U=c try: sg(c,V) try: root(e) FAIL try: child(c,U) U=a … r ab cd fe h

68
The “Prolog Way” sg(X,Y) :- root(X),root(Y). sg(X,Y) :- child(X,U), sg(U,V), child(Y,V). ? sg(e,X). try: root(e) FAIL try: child(e,U) U=c try: sg(c,V) try: root(e) FAIL try: child(c,U) U=a … r ab cd fe h

69
“Magic Sets” Idea We want to do something similar for Datalog Idea: Define a “filter” table: computes all relevant values, restricts the computation of sg(e,X). sg(X,Y) :- m(X), root(X), root(Y). sg(X,Y) :- m(X), child(X,U), sg(U,V),child(Y,V). m(X) :- m(Y), child(Y,X). m(e).

70
Magic Sets It is always possible to do this in such a way that bottom-up becomes as efficient as top-down! Different proposals exist in literature how to introduce the magic filters

71
Optimization Techniques Many other techniques exist as well Standard relational indexing techniques (partly) materializing intensional relations on beforehand Trade-off memory query time performance (See also the OLAP-part for a similar technique) Different representations for relations BDD (Stanford)

72
Outline Syntax of the Datalog language Semantics of a Datalog program Relational algebra = Datalog with negation and without recursion Optimization techniques Conclusions

73
Datalog adds deductive capabilities to databases extensional relations intensional relations Datalog without recursion, with negation safety requirement stratification equal in power to relational algebra Closed World Assumption

74
Conclusions Datalog without Negation Always a unique minimal model Datalog with negation and recursion semantics not always clear stratified negation Evaluation of datalog queries: without negation = RA-optimization with recursion: semi-naive recursion magic sets

75
Conclusions Very nice idea, but … Deductive databases did not make it as a database paradigm Yet, many ideas survived recursion in SQL … And others may re-surface in future. Increasing need for adding meta-information in databases

Similar presentations

© 2017 SlidePlayer.com Inc.

All rights reserved.

Ads by Google