From the Calculus to the Structured Query Language Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems September 23, 2004.


1 From the Calculus to the Structured Query Language Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems September 23, 2004 Some slide content courtesy of Susan Davidson & Raghu Ramakrishnan

2 Administrivia
- Homework 1 due now
- Homework 2 will be handed out Tuesday
  - Will involve writing SQL
- Oracle set up on eniac.seas.upenn.edu (also eniac-l.seas.upenn.edu)
  - Go to: www.seas.upenn.edu/~zives/cis550/oracle-faq.html
  - Click on the “created Oracle account” link
  - Enter your login info so you’ll get an Oracle account

3 The Calculus: Logical Equivalences
- There are two logical equivalences that will be heavily used:
  - p ⇒ q ≡ ¬(p ∧ ¬q) (Whenever p is true, q must also be true.)
  - ∀x. p(x) ≡ ¬∃x. ¬p(x) (p is true for all x)
- The second can be a lot easier to check!
- Example:
  - The highest course number offered (similar to last time’s example)
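Both equivalences can be sanity-checked by brute force. The sketch below is only an illustration: it enumerates every truth assignment for the first, and every predicate over a small finite domain (an assumption, since the slide's domain is arbitrary) for the second.

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    """Material implication, defined directly by its truth table."""
    return q if p else True

# Equivalence 1: p => q  ==  not (p and not q), checked on all four assignments.
equiv1_holds = all(
    implies(p, q) == (not (p and not q))
    for p, q in product([False, True], repeat=2)
)

# Equivalence 2: forall x. p(x)  ==  not exists x. not p(x),
# checked for every possible predicate p over a small finite domain.
domain = [0, 1, 2]
equiv2_holds = True
for truth_values in product([False, True], repeat=len(domain)):
    p = dict(zip(domain, truth_values))  # one predicate p: domain -> bool
    forall_form = all(p[x] for x in domain)
    not_exists_not_form = not any(not p[x] for x in domain)
    equiv2_holds = equiv2_holds and (forall_form == not_exists_not_form)

print(equiv1_holds, equiv2_holds)
```

The second form is the one that maps naturally onto SQL, which has NOT EXISTS but no direct "for all" construct.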

4 Terminology: Free and Bound Variables
- A variable v is bound in a predicate p when p is of the form ∀v… or ∃v…
- A variable occurs free in p if it occurs in a position where it is not bound by an enclosing ∀ or ∃
- Examples:
  - x is free in x > 2
  - x is bound in ∃x. x > y

5 Can Rename Bound Variables
- When a variable is bound, one can replace it with some other variable without altering the meaning of the expression, provided there are no name clashes
- Example: ∃x. x > 2 is equivalent to ∃y. y > 2
- Otherwise, the variable is defined outside our “scope”…

6 Safety
- Pitfall in what we have done so far: how do we interpret {<x,y> | ¬(<x,y> ∈ STUDENT)}?
  - The set of all binary tuples that are not students: an infinite set (and an unsafe query)
- A query is safe if, no matter how we instantiate the relations, it always produces a finite answer
- Domain independent: the answer is the same regardless of the domain in which it is evaluated
- Unfortunately, both this definition of safety and domain independence are semantic conditions, and are undecidable

7 Safety and Termination Guarantees
- There are syntactic conditions that are used to guarantee “safe” formulas
  - The definition is complicated, and we won’t discuss it; you can find it in Ullman’s Principles of Database and Knowledge-Base Systems
  - The formulas that are expressible in real query languages based on relational calculus are all “safe”
- Many DB languages include additional features, like recursion, that must be restricted in certain ways to guarantee termination and consistent answers

8 Mini-Quiz
How do you write:
- Which students have taken more than one course from the same professor?

9 Translating from RA to DRC
- Core of relational algebra: σ, π, ∪, ×, −
- We need to work our way through the structure of an RA expression, translating each possible form.
  - Let TR[e] be the translation of RA expression e into DRC.
- Relation names: for the RA expression R, the DRC expression is {<x1,…,xn> | <x1,…,xn> ∈ R}

10 Selection: TR[σθ R]
- Suppose we have σθ(e’), where e’ is another RA expression that translates as: TR[e’] = {<x1,…,xn> | p}
- Then the translation of σθ(e’) is {<x1,…,xn> | p ∧ θ’}, where θ’ is obtained from θ by replacing each attribute with the corresponding variable
- Example: TR[σ#1=#2 ∧ #4>2.5 R] (if R has arity 4) is {<x1,x2,x3,x4> | <x1,x2,x3,x4> ∈ R ∧ x1 = x2 ∧ x4 > 2.5}

11 Projection: TR[π i1,…,im (e)]
- If TR[e] = {<x1,…,xn> | p} then TR[π i1,i2,…,im (e)] = {<x_i1, x_i2, …, x_im> | ∃ x_j1, x_j2, …, x_jk. p}, where x_j1, x_j2, …, x_jk are the variables in x1, x2, …, xn that are not in x_i1, x_i2, …, x_im
- Example: With R as before, π#1,#3(R) = {<x1,x3> | ∃x2,x4. <x1,x2,x3,x4> ∈ R}
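The selection and projection translations read almost directly as set comprehensions. This is a minimal sketch over a made-up relation R of arity 4 (the tuples are assumptions, not course data): the comprehension head plays the role of the DRC target tuple, and iterating over R supplies the witnesses for the existentially quantified variables.

```python
# Hypothetical relation R of arity 4 (values invented for illustration).
R = {
    (1, 1, "a", 3.0),
    (2, 2, "b", 2.0),
    (3, 4, "c", 3.5),
    (5, 5, "d", 2.7),
}

# TR[sigma_{#1=#2 and #4>2.5} R]
#   = {<x1,x2,x3,x4> | <x1,x2,x3,x4> in R and x1 = x2 and x4 > 2.5}
selected = {(x1, x2, x3, x4) for (x1, x2, x3, x4) in R if x1 == x2 and x4 > 2.5}

# TR[pi_{#1,#3} R] = {<x1,x3> | exists x2,x4. <x1,x2,x3,x4> in R}
# Each tuple of R provides the witnesses x2 and x4, which are then dropped.
projected = {(x1, x3) for (x1, _x2, x3, _x4) in R}

print(sorted(selected))
print(sorted(projected))
```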

12 Union: TR[R1 ∪ R2]
- R1 and R2 must have the same arity
- For e1 ∪ e2, where e1, e2 are algebra expressions: TR[e1] = {<x1,…,xn> | p} and TR[e2] = {<y1,…,yn> | q}
- Relabel the variables in the second: TR[e2] = {<x1,…,xn> | q’}
  - This may involve relabeling bound variables in q to avoid clashes
- TR[e1 ∪ e2] = {<x1,…,xn> | p ∨ q’}
- Example: TR[R1 ∪ R2] = {<x1,…,xn> | <x1,…,xn> ∈ R1 ∨ <x1,…,xn> ∈ R2}

13 Other Binary Operators
- Difference: the same conditions hold as for union
  - If TR[e1] = {<x1,…,xn> | p} and TR[e2] = {<x1,…,xn> | q}
  - Then TR[e1 − e2] = {<x1,…,xn> | p ∧ ¬q}
- Product:
  - If TR[e1] = {<x1,…,xn> | p} and TR[e2] = {<y1,…,ym> | q}
  - Then TR[e1 × e2] = {<x1,…,xn,y1,…,ym> | p ∧ q}
- Example: TR[R × S] = {<x1,…,xn,y1,…,ym> | <x1,…,xn> ∈ R ∧ <y1,…,ym> ∈ S}
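The translations of union, difference, and product can be checked the same way. In this sketch the relations R1, R2, and S are invented, and the calculus variables range over the active domain (every value that appears in some relation), which is an assumption made to keep the candidate set finite. The formulas p ∨ q, p ∧ ¬q, and p ∧ q then appear literally as the comprehension filters.

```python
from itertools import product

# Hypothetical relations: R1 and R2 have arity 2, S has arity 1.
R1 = {(1, "a"), (2, "b")}
R2 = {(2, "b"), (3, "c")}
S = {("x",), ("y",)}

# Active domain: every value occurring in any relation.
adom = {v for rel in (R1, R2, S) for t in rel for v in t}
pairs = set(product(adom, repeat=2))

# TR[R1 union R2] = {<x1,x2> | <x1,x2> in R1  or  <x1,x2> in R2}
union = {t for t in pairs if t in R1 or t in R2}

# TR[R1 - R2] = {<x1,x2> | <x1,x2> in R1  and not  <x1,x2> in R2}
difference = {t for t in pairs if t in R1 and t not in R2}

# TR[R1 x S] = {<x1,x2,y1> | <x1,x2> in R1  and  <y1> in S}
prod = {
    (x1, x2, y1)
    for (x1, x2, y1) in product(adom, repeat=3)
    if (x1, x2) in R1 and (y1,) in S
}

print(len(union), len(difference), len(prod))
```

Dropping the positive conjunct from the difference (leaving only "t not in R2") would make the answer depend on the chosen domain, which is exactly the unsafe-query problem.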

14 Relational Algebra vs. Calculus
- Can translate relational algebra into relational calculus
- Given syntactic restrictions that guarantee safety of the calculus query, can translate back to relational algebra
- These are the principles behind the initial development of relational databases
  - SQL is close to calculus; query plan is close to algebra
  - But SQL can do other things (recursion, aggregation) that RA/RC can’t
- Great example of theory leading to practice!
- Let’s see how this works…

15 Tuple Relational Calculus
Queries of form: {T | p}, where p is the predicate
Predicate: boolean expression over T_x attribs
- Expressions:
    T_x ∈ R
    T_X.a op T_Y.b
    T_X.a op const
    const op T_X.a
    T.a = T_x.a
  where op is =, ≠, <, >, ≤, ≥
  - T_x, … are tuple variables; T_x.a, … are attributes
- Complex expressions: e1 ∧ e2, e1 ∨ e2, ¬e, and e1 ⇒ e2
- Universal and existential quantifiers

16 Domain Relational Calculus to Tuple Relational Calculus
- {<…> | ∃ cid, sem, cid, sid (<…> ∈ COURSE ∧ <…> ∈ Takes)}
- {<…> | ∃ s1, s2 (<…> ∈ COURSE ∧ ∃ cid2, s3, s4 (<…> ∈ COURSE ∧ (cid > cid2)))}

17 Basic SQL: A Friendly Face Over the Tuple Relational Calculus
    SELECT [DISTINCT] {T1.attrib, …, T2.attrib}    (select-list)
    FROM {relation} T1, {relation} T2, …           (from-list)
    WHERE {predicates}                             (qualification)
Let’s do some examples, which will leverage your knowledge of the relational calculus…
- Faculty ids
- Course IDs for courses with students expecting a “C”
- Courses taken by Jill
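One possible way to write the three example queries, run here through Python's sqlite3 module rather than the course's Oracle setup. The rows are a hypothetical instance loosely modeled on the course schema, and the column name exp_grade (for exp-grade) is an assumption chosen to avoid quoting a hyphenated identifier.

```python
import sqlite3

# Hypothetical instance: every row below is an assumption made for illustration.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE STUDENT(sid INTEGER, name TEXT);
    CREATE TABLE PROFESSOR(fid INTEGER, name TEXT);
    CREATE TABLE Takes(sid INTEGER, exp_grade TEXT, cid TEXT);
    CREATE TABLE COURSE(cid TEXT, subj TEXT, sem TEXT);
    INSERT INTO STUDENT VALUES (1,'Jill'),(2,'Qun'),(3,'Nitin'),(4,'Marty');
    INSERT INTO PROFESSOR VALUES (1,'Ives'),(2,'Saul'),(8,'Roth');
    INSERT INTO Takes VALUES (1,'A','550-0103'),(1,'A','700-1003'),
                             (3,'A','700-1003'),(3,'C','501-0103'),
                             (4,'C','501-0103');
    INSERT INTO COURSE VALUES ('550-0103','DB','F03'),('700-1003','AI','S03'),
                              ('501-0103','Arch','F03');
""")

def column(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in con.execute(sql)]

# Faculty ids: one tuple variable, no qualification.
faculty_ids = column("SELECT P.fid FROM PROFESSOR P ORDER BY P.fid")

# Course IDs for courses with students expecting a 'C'.
c_courses = column("""
    SELECT DISTINCT T.cid
    FROM Takes T
    WHERE T.exp_grade = 'C'
    ORDER BY T.cid""")

# Courses taken by Jill: three tuple variables joined in the qualification.
jill_courses = column("""
    SELECT C.cid
    FROM STUDENT S, Takes T, COURSE C
    WHERE S.name = 'Jill' AND S.sid = T.sid AND T.cid = C.cid
    ORDER BY C.cid""")

print(faculty_ids, c_courses, jill_courses)
```

Each alias in the FROM clause behaves like a tuple variable of the tuple relational calculus, and the WHERE clause is the predicate over those variables.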

18 Example Data Instance

STUDENT
sid  name
1    Jill
2    Qun
3    Nitin
4    Marty

PROFESSOR
fid  name
1    Ives
2    Saul
8    Roth

Takes
sid  exp-grade  cid
1    A          550-0103
1    A          700-1003
3    A
3    C          500-0103
4    C

COURSE
cid       subj  sem
550-0103  DB    F03
700-1003  AI    S03
501-0103  Arch  F03

Teaches
fid  cid
1    550-0103
2    700-1003
8    501-0103

19 Some Nice Features
- SELECT *
  - All STUDENTs
- AS
  - As a “range variable” (tuple variable): optional
  - As an attribute rename operator
- Example:
  - Which students (names) have taken more than one course from the same professor?
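A sketch of one way to answer the question about students who took more than one course from the same professor, using AS both for range variables and for renaming the output attribute. The data is hypothetical: professor 1 is assumed to teach two courses so that the query has a non-empty answer, and exp_grade stands in for exp-grade.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE STUDENT(sid INTEGER, name TEXT);
    CREATE TABLE Takes(sid INTEGER, exp_grade TEXT, cid TEXT);
    CREATE TABLE Teaches(fid INTEGER, cid TEXT);
    -- Hypothetical rows: professor 1 teaches two courses, and Jill takes both.
    INSERT INTO STUDENT VALUES (1,'Jill'),(2,'Qun'),(3,'Nitin');
    INSERT INTO Takes VALUES (1,'A','550-0103'),(1,'A','501-0103'),
                             (2,'B','550-0103'),(3,'C','700-1003');
    INSERT INTO Teaches VALUES (1,'550-0103'),(1,'501-0103'),(2,'700-1003');
""")

# Two range variables over Takes and two over Teaches: the student must have
# two different courses (T1.cid <> T2.cid) taught by the same professor.
rows = con.execute("""
    SELECT DISTINCT S.name AS student
    FROM STUDENT AS S, Takes AS T1, Takes AS T2, Teaches AS P1, Teaches AS P2
    WHERE S.sid = T1.sid AND S.sid = T2.sid
      AND T1.cid = P1.cid AND T2.cid = P2.cid
      AND P1.fid = P2.fid
      AND T1.cid <> T2.cid
""").fetchall()
students = [name for (name,) in rows]
print(students)
```

The self-join needs the aliases: without distinct range variables there would be no way to refer to two different Takes tuples in one qualification.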

20 Expressions in SQL
- Can do computation over scalars (int, real or string) in the select-list or the qualification
  - Show all student IDs decremented by 1
- Strings:
  - Fixed (CHAR(x)) or variable length (VARCHAR(x))
  - Use single quotes: 'A string'
  - Special comparison operator: LIKE
  - Not equal: <>
- Typecasting:
  - CAST(S.sid AS VARCHAR(255))

21 Set Operations
- Set operations default to set semantics, not bag semantics:
    (SELECT … FROM … WHERE …) {op} (SELECT … FROM … WHERE …)
- Where op is one of:
  - UNION
  - INTERSECT, MINUS/EXCEPT (many DBs don’t support these last ones!)
- Bag semantics: add ALL (e.g., UNION ALL)

22 Exercise
- Find all students who have taken DB but not AI
  - Hint: use EXCEPT

23 Nested Queries in SQL
- Simplest: IN/NOT IN
- Example: Students who have taken subjects that have (at any point) been taught by Roth

24 Correlated Subqueries
- Most common: EXISTS/NOT EXISTS
- Find all students who have taken DB but not AI
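The "DB but not AI" question can be answered either with a set operation or with a correlated subquery. The sketch below shows one possible formulation of each and checks that they agree on a small hypothetical instance (rows and the exp_grade column name are assumptions).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Takes(sid INTEGER, exp_grade TEXT, cid TEXT);
    CREATE TABLE COURSE(cid TEXT, subj TEXT, sem TEXT);
    -- Hypothetical rows: student 1 took DB and AI, 2 only DB, 3 only AI.
    INSERT INTO Takes VALUES (1,'A','550-0103'),(1,'A','700-1003'),
                             (2,'B','550-0103'),(3,'A','700-1003');
    INSERT INTO COURSE VALUES ('550-0103','DB','F03'),('700-1003','AI','S03');
""")

# Set difference: (students who took DB) EXCEPT (students who took AI).
with_except = sorted(r[0] for r in con.execute("""
    SELECT T.sid FROM Takes T, COURSE C
    WHERE T.cid = C.cid AND C.subj = 'DB'
    EXCEPT
    SELECT T.sid FROM Takes T, COURSE C
    WHERE T.cid = C.cid AND C.subj = 'AI'
"""))

# Correlated subquery: the inner query refers to T.sid from the outer query,
# so it is conceptually re-evaluated for each candidate student.
with_not_exists = sorted(r[0] for r in con.execute("""
    SELECT DISTINCT T.sid
    FROM Takes T, COURSE C
    WHERE T.cid = C.cid AND C.subj = 'DB'
      AND NOT EXISTS (SELECT *
                      FROM Takes T2, COURSE C2
                      WHERE T2.sid = T.sid
                        AND T2.cid = C2.cid AND C2.subj = 'AI')
"""))

print(with_except, with_not_exists)
```

The NOT EXISTS form is the ¬∃ pattern from the calculus written in SQL, and it also works on systems that lack EXCEPT/MINUS.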

25 Universal and Existential Quantification
- Generally used with subqueries:
  - {op} ANY, {op} ALL
- Find the students with the best expected grades

26 Table Expressions
- Can substitute a subquery for any relation in the FROM clause:
    SELECT S.sid
    FROM (SELECT sid FROM STUDENT WHERE sid = 5) S
    WHERE S.sid = 4
  Notice that we can actually simplify this query! What is this equivalent to?

27 Aggregation
- GROUP BY
    SELECT {group-attribs}, {aggregate-operator}(attrib)
    FROM {relation} T1, {relation} T2, …
    WHERE {predicates}
    GROUP BY {group-list}
- Aggregate operators
  - AVG, COUNT, SUM, MAX, MIN
  - DISTINCT keyword for AVG, COUNT, SUM

28 Some Examples
- Number of students in each course offering
- Number of different grades expected for each course offering
- Number of (distinct) students taking AI courses
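A sketch of how these three aggregates might be written, again using sqlite3 and a hypothetical instance (the rows and the exp_grade column name are assumptions). A course offering is identified here by its cid.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Takes(sid INTEGER, exp_grade TEXT, cid TEXT);
    CREATE TABLE COURSE(cid TEXT, subj TEXT, sem TEXT);
    -- Hypothetical rows invented for illustration.
    INSERT INTO Takes VALUES (1,'A','550-0103'),(1,'A','700-1003'),
                             (3,'B','700-1003'),(3,'C','501-0103'),
                             (4,'C','501-0103');
    INSERT INTO COURSE VALUES ('550-0103','DB','F03'),('700-1003','AI','S03'),
                              ('501-0103','Arch','F03');
""")

# Number of students in each course offering: one group per cid.
students_per_offering = con.execute("""
    SELECT T.cid, COUNT(T.sid)
    FROM Takes T
    GROUP BY T.cid
    ORDER BY T.cid""").fetchall()

# Number of different expected grades per offering: DISTINCT inside COUNT.
grades_per_offering = con.execute("""
    SELECT T.cid, COUNT(DISTINCT T.exp_grade)
    FROM Takes T
    GROUP BY T.cid
    ORDER BY T.cid""").fetchall()

# Number of distinct students taking AI courses: no GROUP BY, one result row.
ai_students = con.execute("""
    SELECT COUNT(DISTINCT T.sid)
    FROM Takes T, COURSE C
    WHERE T.cid = C.cid AND C.subj = 'AI'""").fetchone()[0]

print(students_per_offering, grades_per_offering, ai_students)
```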

29 What If You Want to Only Show Some Groups?
- The HAVING clause lets you do a selection based on an aggregate (there must be 1 value per group):
    SELECT C.subj, COUNT(S.sid)
    FROM STUDENT S, Takes T, COURSE C
    WHERE S.sid = T.sid AND T.cid = C.cid
    GROUP BY subj
    HAVING COUNT(S.sid) > 5
- Exercise: For each subject taught by at least two professors, list the minimum expected grade

30 Aggregation and Table Expressions
- Sometimes need to compute results over the results of a previous aggregation:
    SELECT subj, AVG(size)
    FROM ( SELECT C.cid AS id, C.subj AS subj, COUNT(S.sid) AS size
           FROM STUDENT S, Takes T, COURSE C
           WHERE S.sid = T.sid AND T.cid = C.cid
           GROUP BY C.cid, C.subj )
    GROUP BY subj
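The two-level aggregation can be tried on a small instance. In this sketch the data is hypothetical (including the second DB offering 550-0104), the grouping columns are qualified as C.cid and C.subj because cid appears in both Takes and COURSE, and the derived table is given the alias per_offering, which some systems require.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE STUDENT(sid INTEGER, name TEXT);
    CREATE TABLE Takes(sid INTEGER, exp_grade TEXT, cid TEXT);
    CREATE TABLE COURSE(cid TEXT, subj TEXT, sem TEXT);
    -- Hypothetical rows: two DB offerings (sizes 2 and 4), one AI offering (size 2).
    INSERT INTO STUDENT VALUES (1,'Jill'),(2,'Qun'),(3,'Nitin'),(4,'Marty');
    INSERT INTO COURSE VALUES ('550-0103','DB','F03'),('550-0104','DB','F04'),
                              ('700-1003','AI','S03');
    INSERT INTO Takes VALUES (1,'A','550-0103'),(2,'B','550-0103'),
                             (1,'A','550-0104'),(2,'A','550-0104'),
                             (3,'B','550-0104'),(4,'C','550-0104'),
                             (1,'A','700-1003'),(3,'A','700-1003');
""")

# Inner query: class size per offering. Outer query: average size per subject.
avg_size_per_subject = con.execute("""
    SELECT subj, AVG(size)
    FROM (SELECT C.cid AS id, C.subj AS subj, COUNT(S.sid) AS size
          FROM STUDENT S, Takes T, COURSE C
          WHERE S.sid = T.sid AND T.cid = C.cid
          GROUP BY C.cid, C.subj) AS per_offering
    GROUP BY subj
    ORDER BY subj""").fetchall()

print(avg_size_per_subject)
```

A single GROUP BY cannot express this, because the average is taken over counts that only exist after the first aggregation.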

31 Something to Ponder
- Tables are great, but…
  - Not everyone is uniform: I may have a cell phone but not a fax
  - We may simply be missing certain information
  - We may be unsure about values
- How do we handle these things?

