
1 Relational Algebra & Calculus
Zachary G. Ives
University of Pennsylvania
CIS 550 – Database & Information Systems
September 21, 2004
Some slide content courtesy of Susan Davidson & Raghu Ramakrishnan

2 Administrivia
• Homework 1 due Thursday

3 A Set of Logical Operations: The Relational Algebra
• Six basic operations:
    • Projection: π_α(R)
    • Selection: σ_θ(R)
    • Union: R_1 ∪ R_2
    • Difference: R_1 − R_2
    • Product: R_1 × R_2
    • (Rename): ρ_{α→β}(R)
• And some other useful ones:
    • Join: R_1 ⋈_θ R_2
    • Semijoin: R_1 ⊲_θ R_2
    • Intersection: R_1 ∩ R_2
    • Division: R_1 ÷ R_2
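The basic operators can be sketched in a few lines of Python. This is a minimal illustration, not how a database engine implements them: it assumes relations are sets of tuples with attributes addressed by 0-based position, and all function names are illustrative. The STUDENT and PROFESSOR rows are taken from the data instance on slide 4.

```python
# Relations are modeled as Python sets of tuples (an assumption of this sketch);
# attributes are addressed by 0-based position.

def project(rel, positions):
    """Projection: keep only the listed positions (duplicates collapse)."""
    return {tuple(t[i] for i in positions) for t in rel}

def select(rel, cond):
    """Selection: keep the tuples for which cond(t) is true."""
    return {t for t in rel if cond(t)}

def union(r1, r2):
    return r1 | r2

def difference(r1, r2):
    return r1 - r2

def product(r1, r2):
    """Product: concatenate every tuple of r1 with every tuple of r2."""
    return {a + b for a in r1 for b in r2}

def join(r1, r2, cond):
    """Join is derivable: a product followed by a selection."""
    return select(product(r1, r2), cond)

def intersection(r1, r2):
    """Intersection is derivable: R1 - (R1 - R2)."""
    return difference(r1, difference(r1, r2))

STUDENT = {(1, "Jill"), (2, "Qun"), (3, "Nitin"), (4, "Marty")}   # (sid, name)
PROFESSOR = {(1, "Ives"), (2, "Saul"), (8, "Roth")}               # (fid, name)

# Names of students whose sid is greater than 2.
names = project(select(STUDENT, lambda t: t[0] > 2), [1])

# Students paired with professors that happen to share the same id value.
same_id = join(STUDENT, PROFESSOR, lambda t: t[0] == t[2])
```

Join and intersection are written in terms of the basic operators to show why they are listed as "useful" rather than "basic".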

4 Data Instance for Operator Examples

STUDENT
sid  name
1    Jill
2    Qun
3    Nitin
4    Marty

PROFESSOR
fid  name
1    Ives
2    Saul
8    Roth

Takes
sid  exp-grade  cid
1    A          550-0103
1    A          700-1003
3    A
3    C          500-0103
4    C

COURSE
cid       subj  sem
550-0103  DB    F03
700-1003  AI    S03
501-0103  Arch  F03

Teaches
fid  cid
1    550-0103
2    700-1003
8    501-0103

5 Mini-Quiz
Try writing queries for these:
• The names of students named “Bob”
• The names of students expecting an “A”
• The names of students in Amir Roth’s 501 class
• The sids and names of students not enrolled in any courses
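One possible reading of the quiz queries, sketched as Python set comprehensions rather than algebra notation. The relation contents come from the data instance on slide 4; two Takes rows have no cid in the transcript and are represented here with None, which is an assumption of this sketch.

```python
STUDENT = {(1, "Jill"), (2, "Qun"), (3, "Nitin"), (4, "Marty")}   # (sid, name)
PROFESSOR = {(1, "Ives"), (2, "Saul"), (8, "Roth")}               # (fid, name)
TEACHES = {(1, "550-0103"), (2, "700-1003"), (8, "501-0103")}     # (fid, cid)
# (sid, exp-grade, cid); None marks a cid that is missing in the transcript.
TAKES = {(1, "A", "550-0103"), (1, "A", "700-1003"), (3, "A", None),
         (3, "C", "500-0103"), (4, "C", None)}

# Selection then projection: names of students named "Bob".
bobs = {(name,) for (sid, name) in STUDENT if name == "Bob"}

# Join STUDENT with Takes, select grade "A", project the name.
expecting_a = {(name,)
               for (sid, name) in STUDENT
               for (tsid, grade, cid) in TAKES
               if sid == tsid and grade == "A"}

# Four-way join: STUDENT, Takes, Teaches, PROFESSOR.
roth_501 = {(name,)
            for (sid, name) in STUDENT
            for (tsid, grade, cid) in TAKES
            for (fid, tcid) in TEACHES
            for (pfid, pname) in PROFESSOR
            if sid == tsid and cid == tcid and fid == pfid
            and pname == "Roth" and tcid.startswith("501")}

# Difference: all students minus those that appear in Takes.
enrolled = {(sid, name)
            for (sid, name) in STUDENT
            for (tsid, _grade, _cid) in TAKES
            if sid == tsid}
not_enrolled = STUDENT - enrolled
```

On the instance as transcribed, no Takes row carries cid 501-0103, so the third query returns the empty set. The last query is the one that needs the difference operator; the others use only selection, projection, and join.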

6 The Big Picture: SQL to Algebra to Query Plan to Web Page
SELECT * FROM STUDENT, Takes, COURSE WHERE STUDENT.sid = Takes.sID AND Takes.cID = cid
[Figure: Query Plan: an operator tree over STUDENT, Takes, and COURSE with “Merge” and “Hash by cid” operators; system components shown: Optimizer, Execution Engine, Storage Subsystem, Web Server / UI / etc.]

7 Optimization Is Based on Algebraic Equivalences
• Relational algebra has laws of commutativity, associativity, etc. that imply certain expressions are equivalent in semantics
• They may be different in cost of evaluation!
    σ_{c ∨ d}(R) ≡ σ_c(R) ∪ σ_d(R)
    σ_c(R_1 × R_2) ≡ R_1 ⋈_c R_2
    σ_{c ∧ d}(R) ≡ σ_c(σ_d(R))
• Query optimization finds the most efficient representation to evaluate (or one that’s not bad)
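The equivalences can be checked on a small instance. The sketch below assumes relations are sets of tuples, uses the STUDENT rows and the Takes rows whose cid survives in the transcript, and picks two arbitrary conditions c and d. The `hash_join` helper is illustrative only: it shows one cheaper evaluation strategy for the same join, not the plan a real optimizer would necessarily choose.

```python
def select(rel, cond):
    return {t for t in rel if cond(t)}

def hash_join(r1, r2, k1, k2):
    """Equi-join on r1[k1] == r2[k2] using a hash index on r2."""
    index = {}
    for b in r2:
        index.setdefault(b[k2], []).append(b)
    return {a + b for a in r1 for b in index.get(a[k1], [])}

STUDENT = {(1, "Jill"), (2, "Qun"), (3, "Nitin"), (4, "Marty")}   # (sid, name)
TAKES = {(1, "A", "550-0103"), (1, "A", "700-1003"),
         (3, "C", "500-0103")}                                    # (sid, exp-grade, cid)

c = lambda t: t[0] >= 3                  # arbitrary example condition
d = lambda t: t[1] in ("Jill", "Marty")  # arbitrary example condition

# Disjunctive selection equals a union of selections.
or_lhs = select(STUDENT, lambda t: c(t) or d(t))
or_rhs = select(STUDENT, c) | select(STUDENT, d)

# Conjunctive selection equals a cascade of selections.
and_lhs = select(STUDENT, lambda t: c(t) and d(t))
and_rhs = select(select(STUDENT, d), c)

# Selection over a product equals a join; the hash join avoids
# materializing the full product.
naive = select({a + b for a in STUDENT for b in TAKES}, lambda t: t[0] == t[2])
hashed = hash_join(STUDENT, TAKES, 0, 0)
```

Both sides of each pair produce the same set, but the naive join builds all 12 product tuples before filtering, while the hash join touches only matching pairs.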

8 Switching Gears: An Equivalent, But Very Different, Formalism
• Codd invented a relational calculus that he proved was equivalent in expressiveness to the relational algebra
• Based on a subset of first-order logic: declarative, without an implicit order of evaluation
    • Tuple relational calculus
    • Domain relational calculus
• More convenient for certain kinds of manipulations
• The database uses the relational algebra internally
• But query languages (e.g., SQL) are mostly based on the relational calculus

9 Domain Relational Calculus
Queries have form: {<x_1, x_2, …, x_n> | p}
    (x_1, x_2, …, x_n are the domain variables; p is the predicate)
Predicate: Boolean expression over x_1, x_2, …, x_n
• Precise operations depend on the domain and query language; may include special functions, etc.
• Assume the following at minimum:
    <x_i, x_j, …> ∈ R    X op Y    X op const    const op X
    where op is one of <, >, ≤, ≥, =, ≠ and x_i, x_j, … are domain variables

10 More Complex Predicates
Starting with these atomic predicates, build up new predicates by the following rules:
• Logical connectives: If p and q are predicates, then so are p ∧ q, p ∨ q, ¬p, and p ⇒ q
    • (x>2) ∧ (x<4)
    • (x>2) ∧ ¬(x>0)
• Existential quantification: If p is a predicate, then so is ∃x.p
    • ∃x. (x>2) ∧ (x<4)
• Universal quantification: If p is a predicate, then so is ∀x.p
    • ∀x. x>2
    • ∀x.∃y. y>x
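Over a finite domain, the connectives and quantifiers map directly onto Python's `and`, `or`, `not`, `any`, and `all`. The sketch below evaluates the slide's example predicates over an assumed domain {0, …, 5}; the domain choice is hypothetical and affects the answers.

```python
DOMAIN = range(0, 6)  # assumed finite domain {0, 1, 2, 3, 4, 5}

# (x>2) AND (x<4): satisfied only by x = 3.
conj = [x for x in DOMAIN if x > 2 and x < 4]

# (x>2) AND NOT (x>0): unsatisfiable, since x>2 implies x>0.
contradiction = [x for x in DOMAIN if x > 2 and not (x > 0)]

# Existential quantification: there is an x with (x>2) and (x<4).
exists_between = any(x > 2 and x < 4 for x in DOMAIN)

# Universal quantification: every x is greater than 2.
forall_gt_2 = all(x > 2 for x in DOMAIN)

# Nested quantifiers: for every x there is a larger y.
forall_exists_larger = all(any(y > x for y in DOMAIN) for x in DOMAIN)
```

The last predicate is true over the integers but false here, because the largest element of the finite domain has no larger y. The truth of a quantified formula can depend on the domain it is evaluated in.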

11 Some Examples
• Faculty ids
• Subjects for courses with students expecting a “C”
• All course numbers for which there exists a smaller course number

12 Logical Equivalences
• There are two logical equivalences that will be heavily used:
    • p ⇒ q ≡ ¬(p ∧ ¬q) (Whenever p is true, q must also be true.)
    • ∀x. p(x) ≡ ¬∃x. ¬p(x) (p is true for all x)
• The second can be a lot easier to check!
• Example:
    • The highest course number offered
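Both equivalences can be checked mechanically. The sketch below verifies p ⇒ q ≡ ¬(p ∧ ¬q) by truth table, then writes "the highest course number offered" twice, once with ∀ and once with ¬∃¬. It assumes the cids from the COURSE relation on slide 4 and that course numbers are compared as strings.

```python
from itertools import product

def implies(p, q):
    """Truth-table definition of implication."""
    return q if p else True

# p => q is the same as NOT (p AND NOT q), and as (NOT p) OR q.
truth_table_ok = all(
    implies(p, q) == (not (p and not q)) == ((not p) or q)
    for p, q in product([False, True], repeat=2)
)

CIDS = {"550-0103", "700-1003", "501-0103"}   # cid column of COURSE

# Universal form: c is highest if every course number is <= c.
highest_forall = {c for c in CIDS if all(c2 <= c for c2 in CIDS)}

# Rewritten form: c is highest if no course number is greater than c.
highest_not_exists = {c for c in CIDS if not any(c2 > c for c2 in CIDS)}
```

The second form is the one that carries over to languages with only an existential construct: look for a counterexample and negate.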

13 Terminology: Free and Bound Variables
• A variable v is bound in a predicate p when p is of the form ∀v… or ∃v…
• A variable occurs free in p if it occurs in a position where it is not bound by an enclosing ∀ or ∃
• Examples:
    • x is free in x > 2
    • x is bound in ∃x. x > y

14 Can Rename Bound Variables Only
• When a variable is bound one can replace it with some other variable without altering the meaning of the expression, providing there are no name clashes
• Example: ∃x. x > 2 is equivalent to ∃y. y > 2
• Otherwise, the variable is defined outside our “scope”…
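Free variables and safe renaming can be sketched over a tiny formula representation. Everything here is an assumption made for illustration: formulas are nested tuples, atoms have the shape `("atom", left, op, right)` with variables as strings and constants as numbers, and the function names are hypothetical.

```python
QUANTIFIERS = ("exists", "forall")
BINARY = ("and", "or", "implies")

def atom_vars(f):
    """Variables (strings) among the two operands of an atom."""
    return {o for o in (f[1], f[3]) if isinstance(o, str)}

def free_vars(f):
    tag = f[0]
    if tag == "atom":
        return atom_vars(f)
    if tag == "not":
        return free_vars(f[1])
    if tag in BINARY:
        return free_vars(f[1]) | free_vars(f[2])
    if tag in QUANTIFIERS:                 # ("exists", var, body)
        return free_vars(f[2]) - {f[1]}
    raise ValueError(f"unknown formula tag: {tag}")

def all_vars(f):
    """Every variable name that appears, free or bound."""
    tag = f[0]
    if tag == "atom":
        return atom_vars(f)
    if tag == "not":
        return all_vars(f[1])
    if tag in BINARY:
        return all_vars(f[1]) | all_vars(f[2])
    if tag in QUANTIFIERS:
        return {f[1]} | all_vars(f[2])
    raise ValueError(f"unknown formula tag: {tag}")

def substitute(f, old, new):
    """Replace the free occurrences of variable old with new."""
    tag = f[0]
    if tag == "atom":
        swap = lambda o: new if o == old else o
        return ("atom", swap(f[1]), f[2], swap(f[3]))
    if tag == "not":
        return ("not", substitute(f[1], old, new))
    if tag in BINARY:
        return (tag, substitute(f[1], old, new), substitute(f[2], old, new))
    if tag in QUANTIFIERS:
        if f[1] == old:                    # old is rebound below this point
            return f
        return (tag, f[1], substitute(f[2], old, new))
    raise ValueError(f"unknown formula tag: {tag}")

def rename_bound(f, new):
    """Rename the variable bound by the outermost quantifier of f."""
    tag, old, body = f
    if tag not in QUANTIFIERS:
        raise ValueError("formula does not start with a quantifier")
    if new in all_vars(body):              # conservative name-clash check
        raise ValueError(f"name clash: {new} already occurs in the body")
    return (tag, new, substitute(body, old, new))

x_gt_2 = ("atom", "x", ">", 2)
exists_x_gt_y = ("exists", "x", ("atom", "x", ">", "y"))
renamed = rename_bound(("exists", "x", x_gt_2), "y")
```

The clash check is deliberately conservative: renaming x to y in ∃x. x > y is rejected, because the free y would be captured and the meaning would change.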

15 Safety
• Pitfall in what we have done so far: how do we interpret
    {<sid, name> | ¬<sid, name> ∈ STUDENT}
• Set of all binary tuples that are not students: an infinite set (and unsafe query)
• A query is safe if no matter how we instantiate the relations, it always produces a finite answer
• Domain independent: answer is the same regardless of the domain in which it is evaluated
• Unfortunately, both this definition of safety and domain independence are semantic conditions, and are undecidable
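Domain dependence is easy to observe on finite domains. The sketch below evaluates the negated query over two assumed domains and compares it with a query whose answer is drawn only from STUDENT. The extra domain values ("Bob", 99) are hypothetical, and the function names are illustrative.

```python
from itertools import product

STUDENT = {(1, "Jill"), (2, "Qun"), (3, "Nitin"), (4, "Marty")}   # (sid, name)

def not_students(domain):
    """All binary tuples over the domain that are NOT in STUDENT."""
    return {(x, y) for x, y in product(domain, repeat=2) if (x, y) not in STUDENT}

def student_names(domain):
    """Names y such that some sid x has (x, y) in STUDENT."""
    return {(y,) for x, y in product(domain, repeat=2) if (x, y) in STUDENT}

# Active domain: exactly the values that occur in the relation.
active = {value for row in STUDENT for value in row}
bigger = active | {"Bob", 99}

unsafe_small = not_students(active)
unsafe_big = not_students(bigger)
safe_small = student_names(active)
safe_big = student_names(bigger)
```

The negated query's answer grows with the domain (and would be infinite over an infinite one), while the second query returns the same four names under both domains.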

16 Safety and Termination Guarantees
• There are syntactic conditions that are used to guarantee “safe” formulas
    • The definition is complicated, and we won’t discuss it; you can find it in Ullman’s Principles of Database and Knowledge-Base Systems
    • The formulas that are expressible in real query languages based on relational calculus are all “safe”
• Many DB languages include additional features, like recursion, that must be restricted in certain ways to guarantee termination and consistent answers

17 Mini-Quiz
How do you write:
• Which students have taken more than one course from the same professor?

18 Translating from RA to DRC
• Core of relational algebra: π, σ, ∪, ×, −
• We need to work our way through the structure of an RA expression, translating each possible form.
    • Let TR[e] be the translation of RA expression e into DRC.
• Relation names: For the RA expression R, the DRC expression is {<x_1, …, x_n> | <x_1, …, x_n> ∈ R}

19 Selection: TR[σ_θ R]
• Suppose we have σ_θ(e’), where e’ is another RA expression that translates as:
    TR[e’] = {<x_1, …, x_n> | p}
• Then the translation of σ_θ(e’) is
    {<x_1, …, x_n> | p ∧ θ’}
  where θ’ is obtained from θ by replacing each attribute with the corresponding variable
• Example: TR[σ_{#1=#2 ∧ #4>2.5} R] (if R has arity 4) is
    {<x_1, x_2, x_3, x_4> | <x_1, x_2, x_3, x_4> ∈ R ∧ x_1=x_2 ∧ x_4>2.5}

20 Projection: TR[π_{i_1,…,i_m}(e)]
• If TR[e] = {<x_1, …, x_n> | p} then
    TR[π_{i_1,i_2,…,i_m}(e)] = {<x_{i_1}, x_{i_2}, …, x_{i_m}> | ∃x_{j_1}, x_{j_2}, …, x_{j_k}. p},
  where x_{j_1}, x_{j_2}, …, x_{j_k} are variables in x_1, x_2, …, x_n that are not in x_{i_1}, x_{i_2}, …, x_{i_m}
• Example: With R as before,
    π_{#1,#3}(R) = {<x_1, x_3> | ∃x_2, x_4. <x_1, x_2, x_3, x_4> ∈ R}

21 Union: TR[R_1 ∪ R_2]
• R_1 and R_2 must have the same arity
• For e_1 ∪ e_2, where e_1, e_2 are algebra expressions:
    TR[e_1] = {<x_1, …, x_n> | p} and TR[e_2] = {<y_1, …, y_n> | q}
• Relabel the variables in the second:
    TR[e_2] = {<x_1, …, x_n> | q’}
• This may involve relabeling bound variables in q to avoid clashes
    TR[e_1 ∪ e_2] = {<x_1, …, x_n> | p ∨ q’}
• Example: TR[R_1 ∪ R_2] = {<x_1, …, x_n> | <x_1, …, x_n> ∈ R_1 ∨ <x_1, …, x_n> ∈ R_2}

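22 Other Binary Operators
• Difference: The same conditions hold as for union
    If TR[e_1] = {<x_1, …, x_n> | p} and TR[e_2] = {<x_1, …, x_n> | q}
    Then TR[e_1 − e_2] = {<x_1, …, x_n> | p ∧ ¬q}
• Product:
    If TR[e_1] = {<x_1, …, x_n> | p} and TR[e_2] = {<y_1, …, y_m> | q}
    Then TR[e_1 × e_2] = {<x_1, …, x_n, y_1, …, y_m> | p ∧ q}
• Example: TR[R × S] = {<x_1, …, x_n, y_1, …, y_m> | <x_1, …, x_n> ∈ R ∧ <y_1, …, y_m> ∈ S}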
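The translation rules for relation names, selection, projection, union, difference, and product can be combined into one recursive function. The following is a sketch under several assumptions: RA expressions are nested tuples such as `("rel", name, arity)` or `("select", condition_text, e)`, selection conditions are strings using `#i` for the i-th attribute, projection positions are distinct, and the result is a DRC query rendered as text. The encoding and the function names are hypothetical. Instead of relabeling after the fact, the function pushes the desired output variable names down into each subexpression, which has the same effect.

```python
import re
from itertools import count

def arity(e):
    kind = e[0]
    if kind == "rel":
        return e[2]
    if kind == "select":
        return arity(e[2])
    if kind == "project":
        return len(e[1])
    if kind in ("union", "diff"):
        return arity(e[1])
    if kind == "product":
        return arity(e[1]) + arity(e[2])
    raise ValueError(f"unknown RA operator: {kind}")

def translate(e):
    """Return the DRC query text for RA expression e."""
    fresh = (f"z{i}" for i in count(1))    # names for quantified variables

    def wrap(formula):
        text, atomic = formula
        return text if atomic else f"({text})"

    def tr(e, out):
        """Translate e so that its free variables are exactly `out`."""
        kind = e[0]
        if kind == "rel":
            return f"<{','.join(out)}> ∈ {e[1]}", True
        if kind == "select":
            # Replace each attribute #i with the corresponding variable.
            theta = re.sub(r"#(\d+)", lambda m: out[int(m.group(1)) - 1], e[1])
            return f"{wrap(tr(e[2], out))} ∧ ({theta})", False
        if kind == "project":
            inner = [None] * arity(e[2])
            for name, pos in zip(out, e[1]):
                inner[pos - 1] = name
            bound = []
            for i, name in enumerate(inner):
                if name is None:           # projected away: quantify it
                    inner[i] = next(fresh)
                    bound.append(inner[i])
            body = tr(e[2], inner)
            if not bound:
                return body
            return f"∃{','.join(bound)}. {wrap(body)}", False
        if kind == "union":
            return f"{wrap(tr(e[1], out))} ∨ {wrap(tr(e[2], out))}", False
        if kind == "diff":
            return f"{wrap(tr(e[1], out))} ∧ ¬{wrap(tr(e[2], out))}", False
        if kind == "product":
            n1 = arity(e[1])
            left, right = tr(e[1], out[:n1]), tr(e[2], out[n1:])
            return f"{wrap(left)} ∧ {wrap(right)}", False
        raise ValueError(f"unknown RA operator: {kind}")

    out = [f"x{i}" for i in range(1, arity(e) + 1)]
    text, _ = tr(e, out)
    return f"{{<{','.join(out)}> | {text}}}"

R = ("rel", "R", 4)
selection = translate(("select", "#1=#2 ∧ #4>2.5", R))
projection = translate(("project", [1, 3], R))
```

The output variable names differ from the slides' (the projection example yields x1, x2 with quantified z1, z2 rather than x_1, x_3 with quantified x_2, x_4), but the formulas are equivalent up to renaming of variables.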

23 What about the Tuple Relational Calculus?
• We’ve been looking at the Domain Relational Calculus
• The Tuple Relational Calculus is nearly the same, but variables are at the level of a tuple, not an attribute
    {Q | ∃S ∈ COURSE, ∃T ∈ Takes (S.cid = T.cid ∧ Q.cid = S.cid ∧ Q.exp-grade = T.exp-grade)}
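A tuple-calculus query reads almost directly as a Python comprehension in which each loop variable ranges over whole tuples. The sketch below assumes tuples are dictionaries keyed by attribute name and uses the COURSE and Takes rows from slide 4; the two Takes rows without a cid in the transcript are given None, which is an assumption.

```python
COURSE = [
    {"cid": "550-0103", "subj": "DB", "sem": "F03"},
    {"cid": "700-1003", "subj": "AI", "sem": "S03"},
    {"cid": "501-0103", "subj": "Arch", "sem": "F03"},
]
TAKES = [
    {"sid": 1, "exp-grade": "A", "cid": "550-0103"},
    {"sid": 1, "exp-grade": "A", "cid": "700-1003"},
    {"sid": 3, "exp-grade": "A", "cid": None},        # cid missing in transcript
    {"sid": 3, "exp-grade": "C", "cid": "500-0103"},
    {"sid": 4, "exp-grade": "C", "cid": None},        # cid missing in transcript
]

# S and T are tuple variables; each result Q is a (cid, exp-grade) pair.
Q = {
    (S["cid"], T["exp-grade"])
    for S in COURSE
    for T in TAKES
    if S["cid"] == T["cid"]
}
```

In the domain calculus the same query would need one variable per attribute of COURSE and Takes, with the unused ones existentially quantified.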

24 Limitations of the Relational Algebra / Calculus
Can’t do:
• Aggregate operations
• Recursive queries
• Complex (non-tabular) structures
• Most of these are expressible in SQL, OQL, XQuery, using other special operators
• Sometimes we even need the power of a Turing-complete programming language
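Two of these limitations are easy to illustrate in a general-purpose language. The sketch below computes an aggregate (courses per student) and a recursive query (transitive closure by fixpoint iteration). The PREREQ relation is hypothetical and not part of the lecture's data instance, and the Takes rows are reduced to the ones whose cid survives in the transcript.

```python
from collections import Counter

TAKES = {(1, "A", "550-0103"), (1, "A", "700-1003"),
         (3, "C", "500-0103")}                       # (sid, exp-grade, cid)

# Aggregate: number of Takes rows per student. Counting is not
# expressible with the basic operators of the algebra.
courses_per_student = Counter(sid for (sid, _grade, _cid) in TAKES)

# Hypothetical relation: (course, course that requires it).
PREREQ = {("501-0103", "550-0103"), ("550-0103", "700-1003")}

def transitive_closure(edges):
    """Repeat a self-join until no new pairs appear (a fixpoint)."""
    closure = set(edges)
    while True:
        new_pairs = {(a, d)
                     for (a, b) in closure
                     for (c, d) in closure
                     if b == c} - closure
        if not new_pairs:
            return closure
        closure |= new_pairs

all_prereqs = transitive_closure(PREREQ)
```

Each loop iteration is an ordinary join, but the number of iterations depends on the data, which is what a fixed algebra expression cannot capture.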

25 Summary
• Can translate relational algebra into relational calculus
    • DRC and TRC are slightly different syntaxes but equivalent
• Given syntactic restrictions that guarantee safety of DRC query, can translate back to relational algebra
• These are the principles behind initial development of relational databases
    • SQL is close to calculus; query plan is close to algebra
    • Great example of theory leading to practice!

