# The History of Datalog: Origins, Failure, Resurrection

1 The History of Datalog: Origins, Failure, Resurrection

2 An Odd Encounter

- Several years ago, I met a colleague, Monica Lam, in the hallway at Stanford.
- “I hear you were involved in the early work on Datalog.”
- She had discovered this work and used it in her system for large-scale data-flow analysis.

3 Odd Encounter – (2)

- The application is naturally recursive.
- Very large-scale (analyzed code of 800K lines).
- They (Monica and her student John Whaley) had an implementation, bddbddb, that compiled Datalog rules into BDDs (binary decision diagrams).

4 Where Did Datalog Come From?

1. Codd’s tuple and domain calculus (1972).
2. Gallaire and Minker’s “Logic and Databases” (1978).
3. Prolog (1976).

5 Codd’s Logics

- TRC (tuple relational calculus): { t | R(r) and S(s) and t.A = r.A and r.B = s.B and t.C = s.C }
  - Implemented by Stonebraker as QUEL.
- DRC (domain relational calculus): { ac | R(ab) and S(bc) }
  - Implemented by Zloof as Query-by-Example.
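
The two calculus expressions above describe the same join of R(A,B) and S(B,C). A minimal Python sketch may make the difference in style concrete: in the tuple version the variables range over whole tuples, in the domain version over individual attribute values. The sample relations and the function names are hypothetical, chosen only for illustration.

```python
# Hypothetical sample relations: R has attributes (A, B), S has attributes (B, C).
R = {("a1", "b1"), ("a2", "b2"), ("a3", "b9")}
S = {("b1", "c1"), ("b2", "c2"), ("b2", "c3")}


def trc_join(r_rel, s_rel):
    """Tuple-calculus reading: r and s range over whole tuples."""
    return {(r[0], s[1]) for r in r_rel for s in s_rel if r[1] == s[0]}


def drc_join(r_rel, s_rel):
    """Domain-calculus reading: a, b, c range over attribute values."""
    return {(a, c) for (a, b) in r_rel for (b2, c) in s_rel if b == b2}


if __name__ == "__main__":
    # Both formulations select the same (A, C) pairs.
    print(sorted(trc_join(R, S)))
    print(trc_join(R, S) == drc_join(R, S))
```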

6 “Logic and Databases”

- Viewed queries as the result of an entire logical theory.
- Thus allows recursion, negation, theories with multiple minimal models.
- Closed/open-world evaluations.

7 Prolog

- A conventional programming language with predicates as function calls.
- Bizarre execution rule.
- Example: you have to write TC (transitive closure) as:
  - `path(X,Y) :- arc(X,Y).`
  - `path(X,Y) :- arc(X,Z), path(Z,Y).`
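
The reason the rule has to be written this way is that a standard depth-first Prolog interpreter tends to recurse forever on the left-recursive variant `path(X,Y) :- path(X,Z), arc(Z,Y)`. Under bottom-up fixpoint evaluation, which is the reading Datalog later adopted, the two forms give the same answer. The sketch below is an illustration of that point, not any particular system's evaluator; the graph and function names are assumptions.

```python
def tc_right_recursive(arcs):
    """path(X,Y) :- arc(X,Y).  path(X,Y) :- arc(X,Z), path(Z,Y)."""
    path = set(arcs)
    while True:
        new = {(x, y) for (x, z) in arcs for (z2, y) in path if z == z2}
        if new <= path:          # nothing new derived: fixpoint reached
            return path
        path |= new


def tc_left_recursive(arcs):
    """path(X,Y) :- arc(X,Y).  path(X,Y) :- path(X,Z), arc(Z,Y)."""
    path = set(arcs)
    while True:
        new = {(x, y) for (x, z) in path for (z2, y) in arcs if z == z2}
        if new <= path:
            return path
        path |= new


# Hypothetical graph containing a cycle 1 -> 2 -> 3 -> 1, plus 3 -> 4.
ARCS = {(1, 2), (2, 3), (3, 1), (3, 4)}

if __name__ == "__main__":
    # Rule order and subgoal order do not matter to the fixpoint.
    print(tc_right_recursive(ARCS) == tc_left_recursive(ARCS))
```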

8 Implementation of Logical Query Languages for Databases

- In 1984 I took a sabbatical at Hebrew University and wrote a paper with the above title.
- It has some crazy stuff that makes me wonder “what was I thinking?”
- Much was fixed by others, later.
- Published in SIGMOD (no real theorems!).

9 Implementation – (2)

- Key idea: Prolog notation + Horn-clause, unique fixedpoint semantics.
- Key idea: It’s about algorithms for query execution, not logical models.
  - Original thought in that direction was really by Henschen and Naqvi.

10 Enter “Datalog”

- The term “Datalog” to refer to positive Horn clauses without function symbols was first proposed by Dave Maier and David S. (“the other”) Warren.
- Appears in their book Computing with Logic (1988), but in common use before that.

11 Good Implementation Ideas

1. Seminaive evaluation (Bancilhon and Ramakrishnan, 1986 – also in SIGMOD).
2. Specialized linear-recursion implementations (many people including Naughton, Ramakrishnan, Sagiv, Vardi, …).
3. Magic sets (Beeri and Ramakrishnan, 1987 – finally something got into PODS).
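
As a rough illustration of the first idea: seminaive evaluation avoids rederiving old facts by joining only against the facts that were new in the previous round. The sketch below applies it to the transitive-closure rule `path(X,Y) :- arc(X,Z), path(Z,Y)`. It is a minimal sketch under that assumption, not the formulation from the cited paper, and the function name is made up.

```python
def seminaive_tc(arcs):
    """Transitive closure of `arcs` using seminaive (delta-based) iteration."""
    path = set(arcs)      # every path fact derived so far
    delta = set(arcs)     # only the facts that were new in the last round

    # Index arc(X,Z) by Z so each delta fact path(Z,Y) finds its X values.
    preds = {}
    for x, z in arcs:
        preds.setdefault(z, set()).add(x)

    while delta:
        # Join arc(X,Z) with the *delta* of path(Z,Y), not with all of path.
        derived = {(x, y) for (z, y) in delta for x in preds.get(z, ())}
        delta = derived - path   # keep only genuinely new facts
        path |= delta
    return path


if __name__ == "__main__":
    chain = {(1, 2), (2, 3), (3, 4)}   # hypothetical input
    print(sorted(seminaive_tc(chain)))
```

A naive evaluator would rejoin `arc` with the whole of `path` every round; here each round touches only `delta`, which is why the loop stops as soon as a round produces nothing new.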

12 Magic Sets

- A query-rewriting scheme.
- Similar in effect to a number of query-execution ideas such as
  1. Query-Subquery (Rohmer, Lescoeur, and Kerisit, 1986).
  2. Memoing (Dietrich and Warren, 1985).
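
To give a feel for what the rewriting buys, consider the query `path(a, Y)` over the transitive-closure rules. A magic-set style rewriting adds a "magic" predicate holding the first arguments relevant to the query and guards every rule with it, so bottom-up evaluation never explores unrelated parts of the graph. The rewritten program in the docstring and all names below are illustrative assumptions, not the exact output of the published algorithm.

```python
def reachable_from(arcs, source):
    """Answer path(source, Y) bottom-up, restricted by a magic predicate.

    Illustrative rewritten program:
        magic(source).
        magic(Z)  :- magic(X), arc(X,Z).
        path(X,Y) :- magic(X), arc(X,Y).
        path(X,Y) :- magic(X), arc(X,Z), path(Z,Y).
    Returns (answers, path) so the restricted path relation can be inspected.
    """
    succ = {}
    for x, y in arcs:
        succ.setdefault(x, set()).add(y)

    # Magic predicate: the bindings of X that the query can actually reach.
    magic = {source}
    frontier = {source}
    while frontier:
        frontier = {z for x in frontier for z in succ.get(x, ())} - magic
        magic |= frontier

    # Evaluate the guarded path rules to a fixpoint.
    path = set()
    while True:
        new = {(x, y) for x in magic for y in succ.get(x, ())}
        new |= {(x, y)
                for x in magic
                for z in succ.get(x, ())
                for (z2, y) in path if z2 == z}
        if new <= path:
            break
        path |= new

    answers = {y for (x, y) in path if x == source}
    return answers, path


if __name__ == "__main__":
    # Hypothetical graph with a component unrelated to the query constant "a".
    arcs = {("a", "b"), ("b", "c"), ("x", "y"), ("y", "z")}
    answers, path = reachable_from(arcs, "a")
    print(sorted(answers))
    print(sorted(path))   # contains no facts about x, y, or z
```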

13 Negation

- With negated subgoals in Datalog, you run the risk of multiple minimal models.
  - Example: `bachelor(X) :- male(X), NOT married(X,Y)`
- Stratified model (Chandra-Harel, 1982; Apt, Blair, Walker, 1985).
- Well-founded semantics (Van Gelder, Ross, Schlipf, 1988).
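
The stratified approach resolves the ambiguity by fully computing every negated predicate before any rule that negates it is applied. A minimal sketch for the bachelor rule follows, assuming the rule is read as "X is male and there is no Y with married(X,Y)". The facts and the function name are hypothetical.

```python
# Hypothetical base facts.
MALE = {"al", "bob", "carl"}
MARRIED = {("al", "ann")}


def bachelors(male, married):
    """Evaluate bachelor(X) :- male(X), NOT married(X,Y) by strata."""
    # Stratum 0: `married` must be complete before it is negated. Here it is
    # a stored relation; if it were defined by rules, they would be run to a
    # fixpoint first.
    has_spouse = {x for (x, _y) in married}
    # Stratum 1: the negation is applied against the finished lower stratum.
    return {x for x in male if x not in has_spouse}


if __name__ == "__main__":
    print(sorted(bachelors(MALE, MARRIED)))
```

Without the stratum ordering, the facts for bob admit two minimal models: one in which bob is a bachelor, and one in which bob is married to someone and therefore not a bachelor. Stratification picks the first.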

14 The Death of Datalog

- Recursion turned out not to be all that important in the world of the 1980s.
- In the AI community, where logic was taken more seriously than in DB, the emphasis was on expressiveness, not tractability.

15 The Rebirth

- Datalog slept, but nothing could take away its important virtues:
  - Simplicity and declarativeness.
  - Tractability.
  - Simple execution engine.
- While “rule-based systems” were long an AI staple, they never got these features of Datalog.

16 bddbddb

- Why did Monica Lam think of Datalog for data-flow analysis?
- Classical DFA (data-flow analysis) was for code optimization.
  - Only inner loops are important, so data never needed to get really large.

17 bddbddb – (2)

- Monica was looking at a different application: software security.
  - Example: can a string read at one point be passed to a SQL call without first being the argument of a function that checks safety?
- Entire program analyzed as a whole.
  - Example: 800K lines of Apache.
  - Now it’s a database problem.
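
The security question above can be phrased as a recursive query over a whole-program flow relation. The sketch below is a toy formulation for illustration only: the predicates `flow`, `source`, `sink`, and `sanitizer`, and the rules in the docstring, are assumptions and not bddbddb's actual analysis.

```python
def find_violations(flow, sources, sinks, sanitizers):
    """Return the sinks that unchecked input can reach.

    Hypothetical rules being evaluated:
        tainted(X)   :- source(X).
        tainted(Y)   :- tainted(X), flow(X,Y), NOT sanitizer(Y).
        violation(K) :- tainted(K), sink(K).
    """
    succ = {}
    for x, y in flow:
        succ.setdefault(x, set()).add(y)

    tainted = set(sources)
    worklist = list(sources)
    while worklist:
        x = worklist.pop()
        for y in succ.get(x, ()):
            # Taint stops at a sanitizer; everything else propagates it.
            if y not in sanitizers and y not in tainted:
                tainted.add(y)
                worklist.append(y)
    return tainted & set(sinks)


if __name__ == "__main__":
    # Hypothetical program points: one unchecked path and one checked path.
    flow = {("read", "tmp"), ("tmp", "query"),
            ("read", "check"), ("check", "safe_query")}
    print(find_violations(flow, {"read"}, {"query", "safe_query"}, {"check"}))
```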

18 Overlog and Dedalus

- At about the same time, Joe Hellerstein was experimenting with Datalog, first for prototyping and later for the real implementation.
- General direction: protocols for distributed systems.

19 Overlog and Dedalus – (2)

- Two important additions: time and space as first-class concepts.
- Example (space): Assume each node has a table of arcs out.
  - `arc(@n, h)` means the table at node n contains an arc to node h.

20 Example – Continued

- Each node n computes the set of nodes it can reach by consulting the reach sets for the nodes to which n has arcs.
  - `reach(@n, m) :- arc(@n, h), reach(@h, m).`
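
One way to picture the location specifiers is a simulation in which every node keeps its own `arc` and `reach` tables and, in each round, pulls the reach sets of its direct successors. The sketch below assumes synchronous rounds and adds a base case, `reach(@n, h) :- arc(@n, h)`, which the slide does not show but without which the recursive rule derives nothing. It is a single-process illustration, not Overlog or Dedalus semantics.

```python
def distributed_reach(arc_tables):
    """Simulate per-node evaluation of reach in synchronous rounds.

    arc_tables maps each node n to the set of h such that arc(@n, h) holds.
    Returns a dict mapping each node n to the set of m with reach(@n, m).
    """
    # Assumed base case (not on the slide): reach(@n, h) :- arc(@n, h).
    reach = {n: set(out) for n, out in arc_tables.items()}

    changed = True
    while changed:
        changed = False
        # Snapshot models the messages sent at the start of the round.
        snapshot = {n: set(r) for n, r in reach.items()}
        for n, out in arc_tables.items():
            for h in out:
                # reach(@n, m) :- arc(@n, h), reach(@h, m).
                incoming = snapshot.get(h, set())
                if not incoming <= reach[n]:
                    reach[n] |= incoming
                    changed = True
    return reach


if __name__ == "__main__":
    tables = {"a": {"b"}, "b": {"c"}, "c": set()}   # hypothetical network
    print(distributed_reach(tables))
```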

21 Some Other Datalog Directions

1. Webdamlog (Abiteboul et al., these proceedings).
   - Adds creation of rules at remote sites.
2. PrPl (Lam et al.).
   - Social networking in Datalog.
3. SecPAL (Becker et al.).
   - Microsoft authorization language translated to Datalog.

22 Other Directions – (2)

4. LogicBlox (Molham Aref, CEO).
   - Startup in Atlanta, GA.
   - One of several Datalog-based startups.
   - Uses Datalog for customized decision-support systems.
   - Many extensions, including controlled 2nd-order predicates.
   - Still has a tractable, straightforward execution model.

23 Conclusions

- Too early to tell how important Datalog will be.
  - Will simplicity and tractability beat expressiveness?
- But moving in the right direction(s) now.
- From Datalog 2.0 Workshop: needs an open-source standard, like MySQL.