Logical Rules Recursion

Slides:



Advertisements
Similar presentations
1 Datalog: Logic Instead of Algebra. 2 Datalog: Logic instead of Algebra Each relational-algebra operator can be mimicked by one or several Database Logic.
Advertisements

Union, Intersection, Difference (subquery) UNION (subquery) produces the union of the two relations. Similarly for INTERSECT, EXCEPT = intersection and.
Relational Calculus and Datalog
1 Extended Conjunctive Queries Unions Arithmetic Negation.
Lecture 11: Datalog Tuesday, February 6, Outline Datalog syntax Examples Semantics: –Minimal model –Least fixpoint –They are equivalent Naive evaluation.
Engineering IgTechnology Maggioli Informatica Micron Technology Neta Nous Informatica Nikesoft SED Siemens CNX Taiprora Telespazio Datalog S. Costantini.
1 541: Relational Calculus. 2 Relational Calculus  Comes in two flavours: Tuple relational calculus (TRC) and Domain relational calculus (DRC).  Calculus.
1 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke Deductive Databases Chapter 25.
Winter 2002Arthur Keller – CS 1806–1 Schedule Today: Jan. 22 (T) u SQL Queries. u Read Sections Assignment 2 due. Jan. 24 (TH) u Subqueries, Grouping.
SQL CSET 3300.
1 Database Systems Relations as Bags Grouping and Aggregation Database Modification.
1 Introduction to SQL Multirelation Queries Subqueries Slides are reused by the approval of Jeffrey Ullman’s.
1 Datalog Logical Rules Recursion SQL-99 Recursion.
Winter 2002Arthur Keller – CS 18015–1 Schedule Today: Feb. 28 (TH) u Datalog and SQL Recursion, ODL. u Read Sections , Project Part 6.
Winter 2002Arthur Keller – CS 18014–1 Schedule Today: Feb. 26 (T) u Datalog. u Read Sections Assignment 6 due. Feb. 28 (TH) u Datalog and SQL.
CSE 636 Data Integration Datalog Rules / Programs / Negation Slides by Jeffrey D. Ullman.
1 Datalog Logical Rules Recursion SQL-99 Recursion.
CPSC-608 Database Systems Fall 2011 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #3.
1 Datalog Rules Programs Negation. 2 Review of Logical If-Then Rules h(X,…) :- a(Y,…) & b(Z,…) & … head body subgoals “The head is true if all the subgoals.
1 Semantics of Datalog With Negation Local Stratification Stable Models Well-Founded Models.
1 SQL/PSM Procedures Stored in the Database General-Purpose Programming.
Embedded SQL Direct SQL is rarely used: usually, SQL is embedded in some application code. We need some method to reference SQL statements. But: there.
Winter 2002Arthur Keller – CS 1807–1 Schedule Today: Jan. 24 (TH) u Subqueries, Grouping and Aggregation. u Read Sections Project Part 2 due.
1 More SQL Extended Relational Algebra Outerjoins, Grouping/Aggregation Insert/Delete/Update.
1 Datalog Logical Rules Recursion. 2 Logic As a Query Language uIf-then logical rules have been used in many systems. wMost important today: EII (Enterprise.
Credit: Slides are an adaptation of slides from Jeffrey D. Ullman 1.
Deductive Databases Chapter 25
Rutgers University Relational Calculus 198:541 Rutgers University.
Logical Query Languages Motivation: 1.Logical rules extend more naturally to recursive queries than does relational algebra. u Used in SQL recursion. 2.Logical.
1 Transactions, Views, Indexes Controlling Concurrent Behavior Virtual and Materialized Views Speeding Accesses to Data This slides are from J. Ullman’s.
Databases 1 8th lecture. Topics of the lecture Multivalued Dependencies Fourth Normal Form Datalog 2.
Recursive query plans for Data Integration Oliver Michael By Rajesh Kanisetti.
Winter 2006Keller, Ullman, Cushing9–1 Constraints Commercial relational systems allow much more “fine-tuning” of constraints than do the modeling languages.
Logical Query Languages Motivation: 1.Logical rules extend more naturally to recursive queries than does relational algebra. u Used in SQL recursion. 2.Logical.
Datalog Inspired by the impedance mismatch in relational databases. Main expressive advantage: recursive queries. More convenient for analysis: papers.
Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,
Databases 1 Second lecture.
1 Introduction to SQL Database Systems. 2 Why SQL? SQL is a very-high-level language, in which the programmer is able to avoid specifying a lot of data-manipulation.
Himanshu GuptaCSE 532-SQL-1 SQL. Himanshu GuptaCSE 532-SQL-2 Why SQL? SQL is a very-high-level language, in which the programmer is able to avoid specifying.
Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.
XML Query Languages XPATH XQUERY Zaki Malik November 11, 2008.
SCUHolliday - coen 1787–1 Schedule Today: u Subqueries, Grouping and Aggregation. u Read Sections Next u Modifications, Schemas, Views. u Read.
More SQL (and Relational Algebra). More SQL Extended Relational Algebra Outerjoins, Grouping/Aggregation Insert/Delete/Update.
Security and User Authorization in SQL. Lu Chaojun, SJTU 2 Security Two aspects: –Users only see the data they’re supposed to; –Guard against malicious.
Database Management Systems, R. Ramakrishnan1 Relational Calculus Chapter 4, Part B.
Extensions of Datalog Wednesday, February 13, 2001.
1 Datalog with negation Adapted from slides by Jeff Ullman.
1 Data Modification with SQL CREATE TABLE, INSERT, DELETE, UPDATE Slides from
1 Introduction to Database Systems, CS420 SQL JOIN, Aggregate, Grouping, HAVING and DML Clauses.
Schedule Today: Jan. 28 (Mon) Jan. 30 (Wed) Next Week Assignments !!
Slides are reused by the approval of Jeffrey Ullman’s
Outerjoins, Grouping/Aggregation Insert/Delete/Update
Datalog Rules / Programs / Negation Slides by Jeffrey D. Ullman
Foreign Keys Local and Global Constraints Triggers
Databases : More about SQL
CPSC-310 Database Systems
Schedule Today: Next After that Subqueries, Grouping and Aggregation.
Introduction to Database Systems, CS420
CPSC-608 Database Systems
Semantics of Datalog With Negation
CPSC-310 Database Systems
Logic Based Query Languages
Datalog Inspired by the impedance mismatch in relational databases.
CPSC-608 Database Systems
CPSC-608 Database Systems
More SQL Extended Relational Algebra Outerjoins, Grouping/Aggregation
Instructor: Zhe He Department of Computer Science
Select-From-Where Statements Multirelation Queries Subqueries
Rules Programs Negation
Presentation transcript:

Logical Rules Recursion Datalog Logical Rules Recursion

Logic As a Query Language If-then logical rules have been used in many systems. Important example: EII (Enterprise Information Integration). Nonrecursive rules are equivalent to the core relational algebra. Recursive rules extend relational algebra and appear in SQL-99.

Example: Enterprise Integration Goal: integrated view of the menus at many bars Sells(bar, beer, price). Joe has data JoeMenu(beer, price). Approach 1: Describe Sells in terms of JoeMenu and other local data sources. Sells(’Joe’’s Bar’, b, p) <- JoeMenu(b, p)

EII – (2) Approach 2: Describe how JoeMenu can be used as a view to help answer queries about Sells and other relations. JoeMenu(b, p) <- Sells(’Joe’’s Bar’, b, p) More about information integration later.

A Logical Rule Our first example of a rule uses the relations Frequents(drinker, bar), Likes(drinker, beer), and Sells(bar, beer, price). The rule is a query asking for “happy” drinkers --- those that frequent a bar that serves a beer that they like.

Anatomy of a Rule Happy(d) <- Frequents(d,bar) AND Likes(d,beer) AND Sells(bar,beer,p) Body = antecedent = AND of subgoals. Head = consequent, a single subgoal Read this symbol “if”

Subgoals Are Atoms An atom is a predicate, or relation name with variables or constants as arguments. The head is an atom; the body is the AND of one or more atoms. Convention: Predicates begin with a capital, variables begin with lower-case.

Example: Atom Sells(bar, beer, p) The predicate = name of a relation Arguments are variables (or constants).

Interpreting Rules A variable appearing in the head is distinguished ; otherwise it is nondistinguished. Rule meaning: The head is true for given values of the distinguished variables if there exist values of the nondistinguished variables that make all subgoals of the body true.

Example: Interpretation Happy(d) <- Frequents(d,bar) AND Likes(d,beer) AND Sells(bar,beer,p) Distinguished variable Nondistinguished variables Interpretation: drinker d is happy if there exist a bar, a beer, and a price p such that d frequents the bar, likes the beer, and the bar sells the beer at price p.

Applying a Rule Approach 1: consider all combinations of values of the variables. If all subgoals are true, then evaluate the head. The resulting head is a tuple in the result.

Example: Rule Evaluation Happy(d) <- Frequents(d,bar) AND Likes(d,beer) AND Sells(bar,beer,p) FOR (each d, bar, beer, p) IF (Frequents(d,bar), Likes(d,beer), and Sells(bar,beer,p) are all true) add Happy(d) to the result Note: set semantics so add only once.

A Glitch (Fixed Later) Relations are finite sets. We want rule evaluations to be finite and lead to finite results. “Unsafe” rules like P(x)<-Q(y) have infinite results, even if Q is finite. Even P(x)<-Q(x) requires examining an infinity of x-values.

Applying a Rule – (2) Approach 2: For each subgoal, consider all tuples that make the subgoal true. If a selection of tuples define a single value for each variable, then add the head to the result. Leads to finite search for P(x)<-Q(x), but P(x)<-Q(y) is problematic.

Example: Rule Evaluation – (2) Happy(d) <- Frequents(d,bar) AND Likes(d,beer) AND Sells(bar,beer,p) FOR (each f in Frequents, i in Likes, and s in Sells) IF (f[1]=i[1] and f[2]=s[1] and i[2]=s[2]) add Happy(f[1]) to the result

Arithmetic Subgoals In addition to relations as predicates, a predicate for a subgoal of the body can be an arithmetic comparison. We write arithmetic subgoals in the usual way, e.g., x < y.

Example: Arithmetic A beer is “cheap” if there are at least two bars that sell it for under $2. Cheap(beer) <- Sells(bar1,beer,p1) AND Sells(bar2,beer,p2) AND p1 < 2.00 AND p2 < 2.00 AND bar1 <> bar2

Negated Subgoals NOT in front of a subgoal negates its meaning. Example: Think of Arc(a,b) as arcs in a graph. S(x,y) says the graph is not transitive from x to y ; i.e., there is a path of length 2 from x to y, but no arc from x to y. S(x,y) <- Arc(x,z) AND Arc(z,y) AND NOT Arc(x,y)

Safe Rules A rule is safe if: also appears in a nonnegated, Each distinguished variable, Each variable in an arithmetic subgoal, and Each variable in a negated subgoal, also appears in a nonnegated, relational subgoal. Safe rules prevent infinite results.

Example: Unsafe Rules Each of the following is unsafe and not allowed: S(x) <- R(y) S(x) <- R(y) AND NOT R(x) S(x) <- R(y) AND x < y In each case, an infinity of x ’s can satisfy the rule, even if R is a finite relation.

An Advantage of Safe Rules We can use “approach 2” to evaluation, where we select tuples from only the nonnegated, relational subgoals. The head, negated relational subgoals, and arithmetic subgoals thus have all their variables defined and can be evaluated.

Datalog Programs Datalog program = collection of rules. In a program, predicates can be either EDB = Extensional Database = stored table. IDB = Intensional Database = relation defined by rules. Never both! No EDB in heads.

Evaluating Datalog Programs As long as there is no recursion, we can pick an order to evaluate the IDB predicates, so that all the predicates in the body of its rules have already been evaluated. If an IDB predicate has more than one rule, each rule contributes tuples to its relation.

Example: Datalog Program Using EDB Sells(bar, beer, price) and Beers(name, manf), find the manufacturers of beers Joe doesn’t sell. JoeSells(b) <- Sells(’Joe’’s Bar’, b, p) Answer(m) <- Beers(b,m) AND NOT JoeSells(b)

Example: Evaluation Step 1: Examine all Sells tuples with first component ’Joe’’s Bar’. Add the second component to JoeSells. Step 2: Examine all Beers tuples (b,m). If b is not in JoeSells, add m to Answer.

Expressive Power of Datalog Without recursion, Datalog can express all and only the queries of core relational algebra. The same as SQL select-from-where, without aggregation and grouping. But with recursion, Datalog can express more than these languages. Yet still not Turing-complete.

Recursive Example EDB: Par(c,p) = p is a parent of c. Generalized cousins: people with common ancestors one or more generations back: Sib(x,y) <- Par(x,p) AND Par(y,p) AND x<>y Cousin(x,y) <- Sib(x,y) Cousin(x,y) <- Par(x,xp) AND Par(y,yp) AND Cousin(xp,yp)

Definition of Recursion Form a dependency graph whose nodes = IDB predicates. Arc X ->Y if and only if there is a rule with X in the head and Y in the body. Cycle = recursion; no cycle = no recursion.

Example: Dependency Graphs Cousin Answer Sib JoeSells Recursive Nonrecursive

Evaluating Recursive Rules The following works when there is no negation: Start by assuming all IDB relations are empty. Repeatedly evaluate the rules using the EDB and the previous IDB, to get a new IDB. End when no change to IDB.

The “Naïve” Evaluation Algorithm Start: IDB = 0 Apply rules to IDB, EDB yes Change to IDB? no done

Seminaive Evaluation Since the EDB never changes, on each round we only get new IDB tuples if we use at least one IDB tuple that was obtained on the previous round. Saves work; lets us avoid rediscovering most known facts. A fact could still be derived in a second way.

Example: Evaluation of Cousin We’ll proceed in rounds to infer Sib facts (red) and Cousin facts (green). Remember the rules: Sib(x,y) <- Par(x,p) AND Par(y,p) AND x<>y Cousin(x,y) <- Sib(x,y) Cousin(x,y) <- Par(x,xp) AND Par(y,yp) AND Cousin(xp,yp)

Par Data: Parent Above Child Sib(x,y) <- Par(x,p) AND Par(y,p) AND x<>y Cousin(x,y) <- Par(x,xp) AND Par(y,yp) AND Cousin(xp,yp) Cousin(x,y) <- Sib(x,y) a d b c e f g h j k i Round 1 Round 2 Round 3 Round 4

SQL-99 Recursion Datalog recursion has inspired the addition of recursion to the SQL-99 standard. Tricky, because SQL allows negation grouping-and-aggregation, which interact with recursion in strange ways.

Form of SQL Recursive Queries WITH <stuff that looks like Datalog rules> <a SQL query about EDB, IDB> “Datalog rule” = [RECURSIVE] <name>(<arguments>) AS <query>

Example: SQL Recursion – (1) Find Sally’s cousins, using SQL like the recursive Datalog example. Par(child,parent) is the EDB. WITH Sib(x,y) AS SELECT p1.child, p2.child FROM Par p1, Par p2 WHERE p1.parent = p2.parent AND p1.child <> p2.child; Like Sib(x,y) <- Par(x,p) AND Par(y,p) AND x <> y

Example: SQL Recursion – (2) Required – Cousin is recursive WITH … RECURSIVE Cousin(x,y) AS (SELECT * FROM Sib) UNION (SELECT p1.child, p2.child FROM Par p1, Par p2, Cousin WHERE p1.parent = Cousin.x AND p2.parent = Cousin.y); Reflects Cousin(x,y) <- Sib(x,y) Reflects Cousin(x,y) <- Par(x,xp) AND Par(y,yp) AND Cousin(xp,yp)

Example: SQL Recursion – (3) With those definitions, we can add the query, which is about the virtual view Cousin(x,y): SELECT y FROM Cousin WHERE x = ’Sally’;

Legal SQL Recursion It is possible to define SQL recursions that do not have a meaning. The SQL standard restricts recursion so there is a meaning. And that meaning can be obtained by seminaïve evaluation.

Example: Meaningless Recursion EDB: P(x) = {(1)}. IDB: Q(x) <- P(x) AND NOT Q(x). Is (1) in Q(x)? If so, the recursive rule says it is not. If not, the recursive rule says it is.

Plan to Explain Legal SQL Recursion Define “monotone” recursions. Define a “stratum graph” to represent the connections among subqueries. Define proper SQL recursions in terms of the stratum graph.

Monotonicity If relation P is a function of relation Q (and perhaps other relations), we say P is monotone in Q if inserting tuples into Q cannot cause any tuple to be deleted from P. Examples: P = Q ∪ R. P = σa =10(Q ).

Example: Nonmonotonicity SELECT AVG(price) FROM Sells WHERE bar = ’Joe’’s Bar’; is not monotone in Sells. Inserting a Joe’s-Bar tuple into Sells usually changes the average price and thus deletes the old average price.

Stratum Graph Nodes = IDB relations declared in WITH clause. Subqueries in the body of the “rules.” Includes subqueries at any level of nesting.

Stratum Graph – (2) Arcs P ->Q : P is a rule head and Q is a relation in the FROM list (not of a subquery). P is a rule head and Q is an immediate subquery of that rule. P is a subquery, and Q is a relation in its FROM or an immediate subquery (like 1 and 2). Put “–” on an arc if P is not monotone in Q.

Stratified SQL A SQL recursion is stratified if there is a finite bound on the number of – signs along any path in its stratum graph. Including paths with cycles. Legal SQL recursion = recursion with a stratified stratum graph.

Example: Stratum Graph In our Cousin example, the structure of the rules was: Sib = … Cousin = ( … FROM Sib ) UNION ( … FROM Cousin … ) Subquery S1 Subquery S2

The Graph Sib No “–” at all, so surely stratified. Cousin S1 S2

Nonmonotone Example Change the UNION in the Cousin example to EXCEPT: Sib = … Cousin = ( … FROM Sib ) EXCEPT ( … FROM Cousin … ) Subquery S1 Subquery S2 Can delete a tuple from Cousin Inserting a tuple into S2

The Graph Sib Cousin _ S1 S2 An infinite number of –’s exist on cycles involving Cousin and S2. Cousin _ S1 S2