Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.


1 Extended Relational Algebra (Lu Chaojun, SJTU)

2 Bag Semantics
A relation (in SQL, at least) is really a bag (or multiset).
–It may contain the same tuple more than once.
–There is no specified order (unlike a list).
Select, project, and join work for bags as well as sets.
–They work on a tuple-by-tuple basis and do not eliminate duplicates.

3 Why Bags?
Efficient implementation
–e.g. projection, union
–Q: How to eliminate duplicates?
Some queries need bags
–e.g. aggregation: finding the average grade (duplicate grades must be counted)

4 Bag Union
R ∪ S: Sum the number of times an element appears in the two bags, i.e. if t appears n times in R and m times in S, then t appears n+m times in R ∪ S.
Example: {1,2,1} ∪ {1,2,3} = {1,1,1,2,2,3}.

5 Bag Intersection
R ∩ S: Take the minimum of the number of occurrences in each bag, i.e. t appears min(n,m) times in R ∩ S.
Example: {1,2,1} ∩ {1,2,3,3} = {1,2}.

6 Bag Difference
R − S: Proper-subtract the number of occurrences in the two bags, i.e. t appears max(0, n − m) times in R − S.
Example: {1,2,1} − {1,2,3,3} = {1}.
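The three bag operations on slides 4 to 6 can be sketched in Python with `collections.Counter`, which stores each element together with its number of occurrences. This is only an illustration on the slides' own example bags; the variable names are made up for the sketch.

```python
from collections import Counter

# A bag maps each element to the number of times it occurs.
R = Counter([1, 2, 1])
S = Counter([1, 2, 3, 3])
T = Counter([1, 2, 3])

bag_union = R + T          # occurrence counts add up: n + m
bag_intersection = R & S   # occurrence counts become min(n, m)
bag_difference = R - S     # occurrence counts become max(0, n - m)

print(sorted(bag_union.elements()))         # [1, 1, 1, 2, 2, 3]
print(sorted(bag_intersection.elements()))  # [1, 2]
print(sorted(bag_difference.elements()))    # [1]
```

`Counter` drops elements whose count falls to zero or below, which matches the proper subtraction used for bag difference.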

7 Other Operators on Bags
Projection, selection, product, join
–No duplicate elimination

8 Extensions to Relational Model
These are not part of the formal relational model, but they appear in real query languages such as SQL.
–Modification: insert, delete, update
–Aggregation: count, sum, average
–Views
–Null values

9 Extended RA
–Duplicate-elimination operator
–Sorting operator
–Extended projection
–Grouping-and-aggregation operator
–Outerjoin operator

10 Duplicate Elimination
δ(R) = relation with one copy of each tuple that appears one or more times in R.
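A minimal Python sketch of duplicate elimination, assuming a bag is stored as a list of tuples. The function name `delta` and the sample data are illustrative only.

```python
def delta(rows):
    """Return one copy of each tuple, in order of first appearance."""
    # dict keys are unique and remember insertion order
    return list(dict.fromkeys(rows))

R = [(1, 2), (3, 4), (1, 2), (1, 2)]
print(delta(R))  # [(1, 2), (3, 4)]
```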

11 Aggregation Operators
These are not relational operators; rather, they summarize a column in some way.
Five standard operators: Sum, Average, Count, Min, and Max.

12 Grouping Operator
γ_L(R), where L is a list of elements that are either
–individual (grouping) attributes, or
–of the form θ(A), where θ is an aggregation operator and A is the attribute to which it is applied.
Example: γ_{sno, AVG(grade)}(SC)

13 Grouping Operator (cont.)
γ_L(R) is computed by:
1. Group R according to all the grouping attributes on list L.
2. Within each group, compute θ(A) for each element θ(A) on list L.
3. The result is the relation that consists of one tuple for each group. The components of that tuple are the values associated with each element of L for that group.
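The three steps can be sketched in Python for the example of grouping SC by sno and averaging grade. The tuple layout (sno, cno, grade) and the sample rows are assumptions made for the sketch.

```python
from collections import defaultdict

# Hypothetical SC(sno, cno, grade) tuples
SC = [("s1", "c1", 90), ("s1", "c2", 80), ("s2", "c1", 70)]

def gamma_sno_avg_grade(sc):
    # Step 1: group the tuples by the grouping attribute sno
    groups = defaultdict(list)
    for sno, _cno, grade in sc:
        groups[sno].append(grade)
    # Steps 2 and 3: compute AVG(grade) per group, one result tuple per group
    return [(sno, sum(grades) / len(grades)) for sno, grades in groups.items()]

print(gamma_sno_avg_grade(SC))  # [('s1', 85.0), ('s2', 70.0)]
```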

14 Extended Projection
Allow the columns in the projection to be functions of one or more columns in the argument relation.
Example: π_{name, 2011−age}(Student)
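As a rough illustration, the example projection computes a new column from age for every tuple. The Student layout (name, age) and the rows below are assumed for the sketch.

```python
# Hypothetical Student(name, age) tuples
Student = [("Ann", 20), ("Bob", 22), ("Cid", 20)]

# One output tuple per input tuple; the second column is computed as 2011 - age
result = [(name, 2011 - age) for name, age in Student]
print(result)  # [('Ann', 1991), ('Bob', 1989), ('Cid', 1991)]
```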

15 Sorting
τ_L(R) = list of tuples of R, ordered according to the attributes on list L.
Note that the result type is outside the normal types (set or bag) for relational algebra.
–Consequence: τ cannot be followed by other relational operators.
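A small Python sketch of the sorting operator, assuming attributes are identified by column position. The helper name `tau` is made up here; it expects at least one position.

```python
from operator import itemgetter

def tau(rows, positions):
    """Return the tuples of rows as a list ordered by the given column positions."""
    return sorted(rows, key=itemgetter(*positions))

R = [(2, "b"), (1, "z"), (1, "a")]
print(tau(R, [0, 1]))  # [(1, 'a'), (1, 'z'), (2, 'b')]
print(tau(R, [1]))     # [(1, 'a'), (2, 'b'), (1, 'z')]
```

The result is a Python list, that is, an ordered sequence rather than a set or bag, which mirrors the remark about the result type.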

16 Outerjoin
The normal join can lose information, because a tuple that does not join with any tuple from the other relation becomes dangling.
The null value can be used to pad dangling tuples so that they appear in the join.
Outerjoin operator: ⋈° (the join symbol with a superscript o)
Variations: theta-outerjoin; left- and right-outerjoin (pad only dangling tuples from the left (resp., right) relation).
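The padding of dangling tuples can be sketched as follows for a natural outerjoin of R(A, B) and S(B, C) on B. The schemas, the sample data, and the use of `None` for the null value are assumptions made for this illustration.

```python
def full_outerjoin(R, S):
    """Natural outerjoin of R(A, B) and S(B, C) on B, padding with None."""
    result = []
    matched_s = set()  # indexes of S tuples that joined with some R tuple
    for a, b in R:
        partners = [(i, c) for i, (b2, c) in enumerate(S) if b2 == b]
        if partners:
            for i, c in partners:
                result.append((a, b, c))
                matched_s.add(i)
        else:
            result.append((a, b, None))   # dangling tuple of R
    for i, (b, c) in enumerate(S):
        if i not in matched_s:
            result.append((None, b, c))   # dangling tuple of S
    return result

R = [(1, 2), (4, 5)]
S = [(2, 3), (6, 7)]
print(full_outerjoin(R, S))
# [(1, 2, 3), (4, 5, None), (None, 6, 7)]
```

Dropping the second loop would give a left outerjoin, and dropping the `else` branch would give a right outerjoin.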

17 A Logic for Relations: Datalog

18 Introduction
A query language for the relational model may be based on
–Algebra: relational algebra
–Logic: relational calculus, e.g. Datalog (more natural for recursive queries)

19 Predicates and Atoms
RDB vs. Datalog
RDB                      Datalog
relation R( )            predicate R( )
attributes (tuples)      arguments x
schema R(X)              (relational) atom R(x)
tuple t ∈ R              R(t) is TRUE
–R(x) is a boolean-valued function if x contains variables; otherwise it is a proposition.

20 Arithmetic Atoms
Comparison between two arithmetic expressions: exp1 θ exp2
–Predicate θ(exp1, exp2)
–An infinite and unchanging relation

21 Datalog Rules
Example: Happy(sno) ← S(sno,n,a,d) AND SC(sno,cno,g) AND g>=95 AND C(cno,cn) AND cn='Database'
Rules: Head ← Body
–Head: relational atom
–Body: AND of subgoals
  Subgoal: atom or NOT atom
  Atom: P(arg), where P is a relation name or an arithmetic predicate; arg may be a variable or a constant
–←: read as "if"; also written :-

22 Datalog Rules (cont.)
Query: a collection of one or more rules
Result: a relation appearing in rule heads
–Designate the intended answer when more than one relation appears in rule heads

23 Meaning of Datalog Rules
Meaning I:
–Assign possible values to the variables in the rule.
–If the assignment makes all the subgoals TRUE, then it forms a tuple of the result relation.
Meaning II:
–Consider consistent assignments of tuples to each nonnegated relational subgoal (see safety).
–Then consider the negated relational subgoals and the arithmetic subgoals, to see whether the assignment of values to variables makes them all TRUE. If yes, a tuple is added to the result relation.

24 Example: Meaning I
S(x,y) ← R(x,z) AND R(z,y) AND NOT R(x,y)
R: A B
   1 2
   2 3
Consider all possible assignments:
1. x=1, z=2 make R(x,z) TRUE; y=3 makes R(z,y) TRUE; NOT R(x,y) is TRUE; thus add (1,3) to S.
2. x=2, z=3 make R(x,z) TRUE; no y makes R(z,y) TRUE.
S: C D
   1 3
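Meaning I can be sketched in Python by trying every assignment of values to x, y, z and keeping those that make all subgoals TRUE. Restricting the candidate values to those that occur in R is an assumption of this sketch; it is harmless here because the rule is safe.

```python
from itertools import product

R = {(1, 2), (2, 3)}

# Candidate values for the variables: every value that occurs in R
domain = {value for row in R for value in row}

S = {
    (x, y)
    for x, y, z in product(domain, repeat=3)
    if (x, z) in R and (z, y) in R and (x, y) not in R
}
print(S)  # {(1, 3)}
```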

25 Example: Meaning II
S(x,y) ← R(x,z) AND R(z,y) AND NOT R(x,y)
R:  A B
t1  1 2
t2  2 3
Consider consistent assignments of tuples:
1. t1 for R(x,z), t1 for R(z,y)
2. t1 for R(x,z), t2 for R(z,y)
3. t2 for R(x,z), t1 for R(z,y)
4. t2 for R(x,z), t2 for R(z,y)
Only case 2 is a consistent assignment.
S: C D
   1 3
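Meaning II can be sketched as nested loops over the tuples assigned to the two nonnegated subgoals, followed by a check of the negated subgoal. The function name `eval_S` is made up for this illustration.

```python
R = {(1, 2), (2, 3)}

def eval_S(R):
    S = set()
    for (x, z) in R:               # tuple assigned to subgoal R(x,z)
        for (z2, y) in R:          # tuple assigned to subgoal R(z,y)
            if z2 != z:            # inconsistent: z would take two values
                continue
            if (x, y) not in R:    # negated subgoal NOT R(x,y)
                S.add((x, y))
    return S

print(eval_S(R))  # {(1, 3)}
```

Of the four tuple pairs, only (1,2) followed by (2,3) agrees on z, which corresponds to case 2 on the slide.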

26 Safety
Every variable in the rule must appear in some nonnegated relational subgoal.
This guarantees that the result is a finite relation.
Example: safety violations
1. S(x) ← R(y)             x is not in any subgoal
2. S(x) ← NOT R(x)         x is not in a nonnegated subgoal
3. S(x) ← R(y) AND x < y   x is not in a relational subgoal
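The safety condition can be checked mechanically. The sketch below uses a hypothetical `Subgoal` record that lists the variables of a subgoal and flags whether it is relational and whether it is negated; this representation is an assumption of the example, not part of Datalog itself.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Subgoal:
    variables: tuple          # variables used by the subgoal
    relational: bool = True   # False for arithmetic subgoals such as x < y
    negated: bool = False

def is_safe(head_vars, body):
    """True if every variable occurs in some nonnegated relational subgoal."""
    covered = {v for sg in body
               if sg.relational and not sg.negated
               for v in sg.variables}
    all_vars = set(head_vars) | {v for sg in body for v in sg.variables}
    return all_vars <= covered

# The three violations from the slide
rule1 = (("x",), [Subgoal(("y",))])
rule2 = (("x",), [Subgoal(("x",), negated=True)])
rule3 = (("x",), [Subgoal(("y",)), Subgoal(("x", "y"), relational=False)])
# S(x,y) <- R(x,z) AND R(z,y) AND NOT R(x,y), which is safe
rule4 = (("x", "y"), [Subgoal(("x", "z")), Subgoal(("z", "y")),
                      Subgoal(("x", "y"), negated=True)])

print([is_safe(*rule) for rule in (rule1, rule2, rule3, rule4)])
# [False, False, False, True]
```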

27 Datalog Program (Query)
A collection of rules
Predicates/relations are divided into two classes:
–Extensional (EDB) relations/predicates: stored in the DB
–Intensional (IDB) relations/predicates: defined by rules
EDB predicates cannot appear in the head, only in the body; IDB predicates can appear in the head, the body, or both.

28 Datalog Rules Applied to Bags
When there are no negated relational subgoals:
–Meaning I for evaluating Datalog rules applies to bags as well as sets.
–But for bags, Meaning II is simpler to evaluate.
When there are negated relational subgoals:
–There is no clearly defined meaning under the bag model.

29 From RA to Datalog
R ∩ S     I(x) ← R(x) AND S(x)
R ∪ S     I(x) ← R(x)
          I(x) ← S(x)
R − S     I(x) ← R(x) AND NOT S(x)
π_A(R)    I(a) ← R(a,b)

30 From RA to Datalog (cont.)
σ_F(R)             I(x) ← R(x) AND F
σ_{C1 AND C2}(R)   I(x) ← R(x) AND C1 AND C2
σ_{C1 OR C2}(R)    I(x) ← R(x) AND C1
                   I(x) ← R(x) AND C2
R × S              I(x,y) ← R(x) AND S(y)
R ⋈ S              I(x,y,z) ← R(x,y) AND S(y,z)

31 Multiple Operations in Datalog
Create IDB predicates for intermediate relations.
Example:
A(x,y,z) ← R(x,y,z) AND x > 10
B(x,y,z) ← R(x,y,z) AND y = 'ok'
C(x,y,z) ← A(x,y,z) AND B(x,y,z)
D(x,z) ← C(x,y,z)
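The four rules can be mirrored step by step with Python sets, one set per IDB predicate. The contents of R below are invented for the sketch.

```python
# Hypothetical EDB relation R(x, y, z)
R = {(5, "ok", "a"), (20, "ok", "b"), (30, "no", "c")}

A = {(x, y, z) for (x, y, z) in R if x > 10}     # selection on x
B = {(x, y, z) for (x, y, z) in R if y == "ok"}  # selection on y
C = A & B                                        # same arguments in both subgoals: intersection
D = {(x, z) for (x, y, z) in C}                  # projection onto x and z

print(D)  # {(20, 'b')}
```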

32 Expressive Power of Datalog
Non-recursive Datalog = RA
Datalog simulates SQL SELECT-FROM-WHERE without aggregation and grouping.
Recursive Datalog is more powerful than RA and SQL.
None of them is Turing-complete.
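As an illustration of what recursion adds, consider the usual transitive-closure program Path(x,y) ← Edge(x,y) and Path(x,y) ← Path(x,z) AND Edge(z,y). The predicate names Path and Edge are hypothetical and do not come from the slides. Such a program can be evaluated by applying the rules repeatedly until no new tuples appear, roughly as follows.

```python
def transitive_closure(edge):
    path = set(edge)   # Path(x,y) <- Edge(x,y)
    while True:
        # Path(x,y) <- Path(x,z) AND Edge(z,y)
        new = {(x, y) for (x, z) in path for (z2, y) in edge if z == z2}
        if new <= path:   # nothing new derived: fixpoint reached
            return path
        path |= new

Edge = {(1, 2), (2, 3), (3, 4)}
print(sorted(transitive_closure(Edge)))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

The number of rule applications depends on the data, which is why a fixed relational-algebra expression cannot compute this result for every input.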

33 End

