
1 1 Theory, Practice & Methodology of Relational Database Design and Programming Copyright © Ellis Cohen 2002-2007 Basic Relational Algebra These slides are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. For more information on how you may use them, please see http://www.openlineconsult.com/db

2 2 © Ellis Cohen 2001-2007 Overview of Lecture Relational Algebra using REAL Operator Composition Extended Projection Comparisons Case Expressions Duplicate Elimination Aggregate Functions Distinct Aggregation

3 3 © Ellis Cohen 2001-2007 Relational Algebra using REAL

4 4 © Ellis Cohen 2001-2007 Algebra
Domain (e.g. numbers)
Operators (e.g. for numbers)
  Unary Operators (e.g. Unary Minus)
    - 3 → -3
    - (-7) → 7
  Binary Operators (e.g. Sum, Product)
    3 + 7 → 10
    3 * -5 → -15
An algebra is closed: apply the operators to values in the domain, and the result is ALWAYS another value from the domain

5 5 © Ellis Cohen 2001-2007 Relations A relation is just a collection of tuples. A relation corresponds to both a SQL table and a SQL result set (i.e. the result of executing a SQL query)

6 6 © Ellis Cohen 2001-2007 Relational Algebra
Domain: Relations
Unary Operators:
  Restrict (~ like the WHERE clause)
  Project (~ like the SELECT clause)
Binary Operators:
  Joins (Cross, Inner & Outer Natural)
  Collection Operators (Union, …)
  Division (inverse of cross Join)
The Relational Algebra DOES NOT have subqueries → They're not needed!

7 7 © Ellis Cohen 2001-2007 REAL Relations as Unordered Bags
1. Relations in REAL are unordered
   No way to express ORDER BY in the Relational Algebra
2. Most Relational Algebras (including the Classic Relational Algebra) do not allow duplicate tuples in a relation (or support aggregation or grouping)
   REAL does allow duplicates (formally, relations are bags, not sets)
The language we use for the relational algebra is called REAL: Relation Expression and Assignment Language

8 8 © Ellis Cohen 2001-2007 Unary Relation Operators In the relational algebra, a unary relation operator is applied to a relation and produces a relation as its result. In Mathematical Terms, Unary Operator: Relation → Relation. That is, if O is a relation operator and R is a relation, then O (R) is also a relation

9 9 © Ellis Cohen 2001-2007 Unary Relational Operators Two important relational operators: Restrict: Chooses subset of rows Project: Chooses subset of columns

10 10 © Ellis Cohen 2001-2007 Restrict Operator
REAL restrict operator:
  Emps[ sal > 1550 ]
SQL Equivalent:
  SELECT * FROM Emps WHERE sal > 1550
Standard forms of restrict operator:
  Restrict [sal > 1550] ( Emps )
  σ sal > 1550 ( Emps )
Restriction is a unary relation operator: apply it to a relation, and the result is another relation
Restriction checks one tuple at a time and keeps tuples which satisfy the restriction condition

11 11 © Ellis Cohen 2001-2007 The Restriction Machine
Emps[ sal > 1550 ]
Start with a relation (Emps):
  empno  ename   deptno  sal   comm
  7499   ALLEN   30      1600  300
  7654   MARTIN  30      1250  1400
  7698   BLAKE   30      2850
  7839   KING    10      5000
  7844   TURNER  30      1500  0
  7986   STERN   50      1300
Run it through the restriction machine: [ sal > 1550 ]
Get a new relation as a result:
  empno  ename   deptno  sal   comm
  7499   ALLEN   30      1600  300
  7698   BLAKE   30      2850
  7839   KING    10      5000
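
The restriction machine can be sketched in a few lines of Python. This is only an illustration, not part of REAL: the `restrict` helper and the list-of-dicts representation are assumptions made for the sketch, and `None` stands in for a NULL comm.

```python
# Emps rows as shown on the slide; None stands in for a NULL comm.
EMPS = [
    {"empno": 7499, "ename": "ALLEN",  "deptno": 30, "sal": 1600, "comm": 300},
    {"empno": 7654, "ename": "MARTIN", "deptno": 30, "sal": 1250, "comm": 1400},
    {"empno": 7698, "ename": "BLAKE",  "deptno": 30, "sal": 2850, "comm": None},
    {"empno": 7839, "ename": "KING",   "deptno": 10, "sal": 5000, "comm": None},
    {"empno": 7844, "ename": "TURNER", "deptno": 30, "sal": 1500, "comm": 0},
    {"empno": 7986, "ename": "STERN",  "deptno": 50, "sal": 1300, "comm": None},
]


def restrict(relation, condition):
    """Hypothetical helper: check one tuple at a time, keep those that pass."""
    return [row for row in relation if condition(row)]


# Emps[ sal > 1550 ]
high_paid = restrict(EMPS, lambda row: row["sal"] > 1550)
print([row["ename"] for row in high_paid])
```

The result is again a list of rows with the same attributes, which mirrors the closure property: a relation goes in and a relation comes out.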

12 12 © Ellis Cohen 2001-2007 Project Operator
REAL project operator:
  Emps{ empno, ename }
SQL Equivalent:
  SELECT empno, ename FROM Emps
Standard forms of project operator:
  Project [empno,ename] ( Emps )
  π empno,ename ( Emps )
Projection is a unary relation operator: it processes one tuple at a time and keeps only the listed attributes

13 13 © Ellis Cohen 2001-2007 The Projection Machine
Emps{ empno, ename }
Start with a relation (Emps):
  empno  ename   deptno  sal   comm
  7499   ALLEN   30      1600  300
  7654   MARTIN  30      1250  1400
  7698   BLAKE   30      2850
  7839   KING    10      5000
  7844   TURNER  30      1500  0
  7986   STERN   50      1300
Run it through the projection machine: { empno, ename }
Get a new relation as a result:
  empno  ename
  7499   ALLEN
  7654   MARTIN
  7698   BLAKE
  7839   KING
  7844   TURNER
  7986   STERN

14 14 © Ellis Cohen 2001-2007 Why Learn REAL? Semantic Clarity –REAL is simpler than SQL and helps explain how SQL works –Clear semantics allow us to reason about systems and are the basis for optimizing queries Succinctness –Can be a good shorthand for describing complicated queries and assertions. It can sometimes be easier to write complicated queries and assertions in REAL first, and then translate them to SQL

15 15 © Ellis Cohen 2001-2007 Operator Composition

16 16 © Ellis Cohen 2001-2007 Closure and Composition
An algebra is closed: apply the operators to values in the domain, and the result is ALWAYS another value from the domain
So, operators can be composed:
  ((5 + 7) * 3) + 8 → (12 * 3) + 8 → 36 + 8 → 44

17 17 © Ellis Cohen 2001-2007 Operator Composition
REAL composition:
  Emps[ sal > 1550 ]{ empno, ename }
SQL Equivalent:
  SELECT empno, ename FROM Emps WHERE sal > 1550
Standard forms of composition:
  Project [empno,ename] ( Restrict [sal > 1550] ( Emps ) )
  π empno,ename ( σ sal > 1550 ( Emps ) )

18 18 © Ellis Cohen 2001-2007 REAL Composition Processed L-to-R
Emps[ sal > 1550 ]{ empno, ename }
Emps
  -- the base relation containing employees
Emps[ sal > 1550 ]
  -- Emps restricted to the employees with sal > 1550
Emps[ sal > 1550 ]{ empno, ename }
  -- First we restrict Emps (to the employees with sal > 1550)
  -- Then, from that, we extract only the empno and ename attributes

19 19 © Ellis Cohen 2001-2007 Composition Yields Relations
Emps[ sal > 1550 ]{ empno, ename }
Emps:
  empno  ename   deptno  sal   comm
  7499   ALLEN   30      1600  300
  7654   MARTIN  30      1250  1400
  7698   BLAKE   30      2850
  7839   KING    10      5000
  7844   TURNER  30      1500  0
  7986   STERN   50      1300
step 1: [ sal > 1550 ]
  empno  ename   deptno  sal   comm
  7499   ALLEN   30      1600  300
  7698   BLAKE   30      2850
  7839   KING    10      5000
step 2: { empno, ename }
  empno  ename
  7499   ALLEN
  7698   BLAKE
  7839   KING
Can you do the project and the restrict in the opposite sequence?

20 20 © Ellis Cohen 2001-2007 Sequence Matters
Emps{ empno, ename }[ sal > 1550 ]
Emps:
  empno  ename   deptno  sal   comm
  7499   ALLEN   30      1600  300
  7654   MARTIN  30      1250  1400
  7698   BLAKE   30      2850
  7839   KING    10      5000
  7844   TURNER  30      1500  0
  7986   STERN   50      1300
step 1: { empno, ename }
  empno  ename
  7499   ALLEN
  7654   MARTIN
  7698   BLAKE
  7839   KING
  7844   TURNER
  7986   STERN
step 2: [ sal > 1550 ]
  Fails! No sal attribute to restrict
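
A small Python sketch of why the sequence matters. The `restrict` and `project` helpers are hypothetical stand-ins for the REAL operators (they are not REAL itself), and only the attributes needed for the point are included in the rows.

```python
EMPS = [
    {"empno": 7499, "ename": "ALLEN",  "deptno": 30, "sal": 1600},
    {"empno": 7654, "ename": "MARTIN", "deptno": 30, "sal": 1250},
    {"empno": 7698, "ename": "BLAKE",  "deptno": 30, "sal": 2850},
    {"empno": 7839, "ename": "KING",   "deptno": 10, "sal": 5000},
    {"empno": 7844, "ename": "TURNER", "deptno": 30, "sal": 1500},
    {"empno": 7986, "ename": "STERN",  "deptno": 50, "sal": 1300},
]


def restrict(relation, condition):
    """Keep the tuples that satisfy the condition."""
    return [row for row in relation if condition(row)]


def project(relation, attributes):
    """Keep only the listed attributes of every tuple."""
    return [{name: row[name] for name in attributes} for row in relation]


# Emps[ sal > 1550 ]{ empno, ename }: restrict first, then project.
restricted_then_projected = project(
    restrict(EMPS, lambda row: row["sal"] > 1550), ["empno", "ename"]
)

# Emps{ empno, ename }[ sal > 1550 ]: project first, then restrict.
# After the projection there is no sal attribute left to test.
try:
    restrict(project(EMPS, ["empno", "ename"]), lambda row: row["sal"] > 1550)
    missing_attribute = None
except KeyError as error:
    missing_attribute = error.args[0]

print(restricted_then_projected)
print("missing attribute:", missing_attribute)
```

In this sketch the failure surfaces as a `KeyError`; a real REAL or SQL processor would report an unknown attribute instead.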

21 21 © Ellis Cohen 2001-2007 Composition Exercise For each of the REAL expressions below Write the corresponding SQL Write a simpler equivalent REAL expression Emps{ ename, job, sal }[sal < 1000]{ ename, job } Emps[sal < 1000]{ empno, ename, job }[ job = 'CLERK']

22 22 © Ellis Cohen 2001-2007 Combined Projection
Emps{ ename, job, sal }
  -- get the ename, job & sal of each employee
Emps{ ename, job, sal }[sal < 1000]
  -- get the ename, job & sal of each employee,
  -- and get those whose sal is less than 1000
Emps{ ename, job, sal }[sal < 1000]{ ename, job }
  -- get the ename, job & sal of each employee,
  -- get those whose sal is less than 1000
  -- but really just get the ename & job
  -- NO POINT in projecting { ename, job, sal } first
Emps[sal < 1000]{ ename, job }
  -- just get the employees whose sal is less than 1000
  -- get just their ename & job
SELECT ename, job FROM Emps WHERE sal < 1000

23 23 © Ellis Cohen 2001-2007 Combined Restriction
Emps[sal < 1000]
  -- get the employees whose sal is less than 1000
Emps[sal < 1000]{ empno, ename, job }
  -- get the employees whose sal is less than 1000
  -- get their empno, ename & job
Emps[sal < 1000]{ empno, ename, job }[ job = 'CLERK']
  -- get the employees whose sal is less than 1000
  -- get their empno, ename & job
  -- get that information, but only for the clerks
Emps[sal < 1000][ job = 'CLERK']
Emps[ (sal < 1000) AND (job = 'CLERK') ]
  -- get the clerks whose sal is less than 1000
Emps[sal < 1000][job = 'CLERK']{ empno, ename, job }
Emps[ (sal < 1000) AND (job = 'CLERK') ]{ empno, ename, job }
  -- get the clerks whose sal is less than 1000
  -- get their empno, ename & job
SELECT empno, ename, job FROM Emps WHERE (sal < 1000) AND (job = 'CLERK')

24 24 © Ellis Cohen 2001-2007 Transformation Rules for Algebras
Algebras have rules for transforming one algebraic expression into another
Elementary Algebra
  Over: Numbers
  Operators (include): Sum (a.k.a. +), Product (a.k.a. *)
  Rules:
    Commutative: Sum(a,b) ↔ Sum(b,a), a + b ↔ b + a
    Associative: a + (b + c) ↔ (a + b) + c
    Distributive: a * (b + c) ↔ a*b + a*c
Relational Algebra
  Over: Relations
  Operators (include): Restrict (subset of rows), Project (subset of columns)
  Rules: What are they?

25 25 © Ellis Cohen 2001-2007 Some Rules for Restrict
Commutativity Rule for Restrict: R[C1][C2] ↔ R[C2][C1]
  Emps[ sal < 1000 ][ job = 'CLERK' ]
  Emps[ job = 'CLERK' ][ sal < 1000 ]
Conjunction Rule for Restrict: R[C1][C2] ↔ R[ C1 AND C2 ]
  Emps[ (sal < 1000) AND (job = 'CLERK') ]
What are some other rules for REAL?
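
The two restrict rules can be checked on a small example. The sketch below is illustrative only: the rows are made up for the purpose, `restrict` and `as_bag` are hypothetical helpers, and relations are compared as bags (order ignored, duplicates counted).

```python
from collections import Counter

# Hypothetical rows, chosen so that each condition rejects at least one tuple.
EMPS = [
    {"empno": 1, "job": "CLERK",   "sal": 800},
    {"empno": 2, "job": "CLERK",   "sal": 1300},
    {"empno": 3, "job": "ANALYST", "sal": 900},
    {"empno": 4, "job": "CLERK",   "sal": 950},
]


def restrict(relation, condition):
    """Keep the tuples that satisfy the condition."""
    return [row for row in relation if condition(row)]


def as_bag(relation):
    """Represent a relation as an unordered bag of tuples."""
    return Counter(tuple(sorted(row.items())) for row in relation)


def low_paid(row):
    return row["sal"] < 1000


def is_clerk(row):
    return row["job"] == "CLERK"


c1_then_c2 = restrict(restrict(EMPS, low_paid), is_clerk)   # R[C1][C2]
c2_then_c1 = restrict(restrict(EMPS, is_clerk), low_paid)   # R[C2][C1]
c1_and_c2 = restrict(EMPS, lambda row: low_paid(row) and is_clerk(row))  # R[C1 AND C2]

print(as_bag(c1_then_c2) == as_bag(c2_then_c1) == as_bag(c1_and_c2))
```

A single example does not prove the rules, but it shows what the equivalences claim: all three expressions denote the same bag of tuples.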

26 26 © Ellis Cohen 2001-2007 Extended Projection

27 27 © Ellis Cohen 2001-2007 Calculating & Naming Attributes SELECT empno, ename AS empname, job, (sal * 52) AS yrsal FROM Emps Emps{ empno, empname:ename, job, yrsal:(sal*52) } Named Projection with Calculation in REAL
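
Named projection with calculation can be pictured as building each output tuple from named expressions over the input tuple. In this sketch `extended_project` is a hypothetical helper, and the job values are illustrative.

```python
EMPS = [
    {"empno": 7499, "ename": "ALLEN", "job": "ANALYST", "sal": 1600},
    {"empno": 7839, "ename": "KING",  "job": "CLERK",   "sal": 5000},
]


def extended_project(relation, **attributes):
    """Build each result tuple from name -> expression(tuple) pairs."""
    return [
        {name: compute(row) for name, compute in attributes.items()}
        for row in relation
    ]


# Emps{ empno, empname:ename, job, yrsal:(sal*52) }
yearly = extended_project(
    EMPS,
    empno=lambda row: row["empno"],
    empname=lambda row: row["ename"],
    job=lambda row: row["job"],
    yrsal=lambda row: row["sal"] * 52,
)
print(yearly)
```

The result no longer has ename or sal attributes; it has empname and yrsal instead, just like the SQL result set with its AS aliases.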

28 28 © Ellis Cohen 2001-2007 Bulk Prefixing Given Emps (empno, ename, job, sal, comm) We want our query result to have all the same attributes, but all prefixed in the same way empno ename job sal comm z_empno z_ename z_job z_sal z_comm SELECT empno AS z_empno, ename AS z_ename, job AS z_job, sal AS z_sal, comm AS z_comm FROM Emps Emps{ z_(*):* } or just z_$Emps Bulk Attribute Naming in REAL

29 29 © Ellis Cohen 2001-2007 Attribute Removal Given Emps (empno, ename, job, sal, comm) We want our query result to have all the same attributes, but with some of them removed empno ename job sal comm empno ename sal comm SELECT empno, ename, sal, comm FROM Emps Emps{ *,  job } Attribute Removal in REAL note: job not listed Then, remove job First, include all of Emps attributes

30 30 © Ellis Cohen 2001-2007 Attribute Replacement Given Emps (empno, ename, job, sal, comm) We want our query result to have all the same attributes, but with some attribute names replaced empno ename job sal comm empno empname job wksal comm SELECT empno, ename AS empname, job, sal AS wksal, comm FROM Emps Emps{ *, empname  ename, wksal  sal } Attribute Replacement in REAL Then, replace ename by empname First, include all of Emps attributes

31 31 © Ellis Cohen 2001-2007 Relational Algebra Exercise Assume sal is the weekly salary, and that all employees are paid 52 weeks/year. a)Write the REAL expressions to list the names, weekly salary (as wksal) and yearly salaries (as yrsal) of employees whose yearly salary is more than 70,000. b)Just list their names & weekly salaries (as wksal) c)Just list their employee number, name, job, & weekly salaries (as wksal)

32 32 © Ellis Cohen 2001-2007 Answer (a) to REAL Exercise
List the names, weekly and yearly salaries of employees whose yearly salary is more than 70,000.
Restrict, then project:
  Emps[52*sal > 70000]{ ename, wksal:sal, yrsal:(52*sal) }
  SELECT ename, sal AS wksal, 52*sal AS yrsal FROM Emps WHERE 52*sal > 70000
Project, then restrict:
  Emps{ ename, wksal:sal, yrsal:(52*sal) }[yrsal > 70000]
  SELECT ename, sal AS wksal, 52*sal AS yrsal FROM Emps WHERE yrsal > 70000
  -- NOT OK in Oracle or SQL Server: a WHERE clause cannot reference a SELECT-list alias

33 33 © Ellis Cohen 2001-2007 Answer (b) to REAL Exercise
List the names, and weekly salaries (as wksal) of employees whose yearly salary is more than 70,000.
  Emps[52*sal > 70000]{ ename, wksal:sal }
  SELECT ename, sal AS wksal FROM Emps WHERE 52*sal > 70000
  Emps{ ename, wksal:sal }[52*wksal > 70000]
  Emps{ ename, wksal:sal, yrsal:(52*sal) }[yrsal > 70000]{ ename, wksal }

34 34 © Ellis Cohen 2001-2007 Answer (c) to REAL Exercise
List the employee number, name, job, & weekly salaries of employees whose yearly salary is more than 70,000.
  Emps[52*sal > 70000]{ empno, ename, job, wksal:sal }
  Emps[52*sal > 70000]{ *, wksal  sal,  comm }
  SELECT empno, ename, job, sal AS wksal FROM Emps WHERE 52*sal > 70000

35 35 © Ellis Cohen 2001-2007 REAL Rules Exercise Design some additional REAL rules based on –Project –Restrict –Named Projection –Removal & Replacement

36 36 © Ellis Cohen 2001-2007 Comparisons

37 37 © Ellis Cohen 2001-2007 IS Comparison Operator
v1 = v2 (as defined in SQL and REAL)
  Result is NULL (think UNKNOWN) if either v1 IS NULL or v2 IS NULL
v1 IS v2 (only defined in REAL, not in SQL; like =, but two NULLs match)
  Result is TRUE if either
    v1 = v2, or
    v1 IS NULL and v2 IS NULL
  Result is FALSE otherwise

38 38 © Ellis Cohen 2001-2007 IS and IS NOT
v1 IS NOT v2 means the same as NOT( v1 IS v2 )
It's like ≠, with NULL treated as an ordinary value
  Result is FALSE if either
    v1 = v2, or
    v1 IS NULL and v2 IS NULL
  Result is TRUE otherwise

39 39 © Ellis Cohen 2001-2007 IS-Augmented Comparisons
v1 > v2 (as defined in SQL and REAL)
  TRUE if v1 > v2, FALSE if v1 ≤ v2
  NULL if either v1 or v2 is NULL (i.e. result is unknown if either value is unknown)
v1 IS > v2 (only defined in REAL, not in SQL)
  TRUE if v1 > v2, FALSE if v1 ≤ v2
  FALSE if either v1 or v2 is NULL (i.e. read this as v1 is definitely > v2)

40 40 © Ellis Cohen 2001-2007 Real Notions of Equality
Strict Equality: v1 = v2
  Result is NULL (think UNKNOWN) if v1 and/or v2 is NULL
Projected Strict Equality: v1 IS = v2
  Result is FALSE if v1 and/or v2 is NULL
Extended Equality: v1 IS v2
  Result is TRUE if both v1 and v2 are NULL
  Result is FALSE if only one of v1 or v2 is NULL
All are the same if both v1 and v2 are non-NULL

41 41 © Ellis Cohen 2001-2007 IS NOT Augmented Comparisons v1 IS NOT > v2 means the same as NOT( v1 IS > v2 ) It represents the cases other than those where v1 is definitely > v2 sal IS NOT > 300 is equivalent to (sal ≤ 300) OR (sal IS NULL)

42 42 © Ellis Cohen 2001-2007 Negated Augmented Comparisons
v1 IS ≤ v2
  FALSE if v1 > v2, TRUE if v1 ≤ v2
  FALSE if either v1 or v2 is NULL (i.e. read this as v1 is definitely ≤ v2)
v1 IS NOT > v2, NOT( v1 IS > v2 )
  FALSE if v1 > v2, TRUE if v1 ≤ v2
  TRUE if either v1 or v2 is NULL (i.e. read this as v1 is not definitely > v2)

43 43 © Ellis Cohen 2001-2007 Real Notions of Inequality
Strict Inequality: v1 ≠ v2, NOT(v1 = v2)
  Result is NULL (think UNKNOWN) if v1 and/or v2 is NULL
Projected Strict Inequality: v1 IS ≠ v2
  Result is FALSE if v1 and/or v2 is NULL
Counter-Projected Strict Inequality: v1 IS NOT = v2, NOT( v1 IS = v2 )
  Result is TRUE if v1 and/or v2 is NULL
Extended Inequality: v1 IS NOT v2, NOT(v1 IS v2)
  Result is TRUE if only one of v1 or v2 is NULL
  Result is FALSE if both v1 and v2 are NULL
All are the same if both v1 and v2 are non-NULL
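
The different notions of comparison can be made concrete with a small three-valued sketch. This is an illustration only: Python's `None` plays the role of NULL (and of the UNKNOWN result), and all function names are hypothetical.

```python
def strict_eq(v1, v2):
    """v1 = v2: NULL (None) if either operand is NULL."""
    if v1 is None or v2 is None:
        return None
    return v1 == v2


def is_eq(v1, v2):
    """v1 IS = v2: TRUE only when the strict comparison is definitely TRUE."""
    return strict_eq(v1, v2) is True


def is_same(v1, v2):
    """v1 IS v2: like =, but two NULLs match."""
    if v1 is None and v2 is None:
        return True
    return strict_eq(v1, v2) is True


def strict_gt(v1, v2):
    """v1 > v2: NULL (None) if either operand is NULL."""
    if v1 is None or v2 is None:
        return None
    return v1 > v2


def is_gt(v1, v2):
    """v1 IS > v2: v1 is definitely greater than v2."""
    return strict_gt(v1, v2) is True


def is_not_gt(v1, v2):
    """v1 IS NOT > v2: the same as NOT( v1 IS > v2 )."""
    return not is_gt(v1, v2)


for sal in (200, 400, None):
    print(sal, strict_gt(sal, 300), is_gt(sal, 300), is_not_gt(sal, 300))
```

The last column shows the equivalence stated earlier: sal IS NOT > 300 holds exactly when sal ≤ 300 or sal IS NULL.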

44 44 © Ellis Cohen 2001-2007 Case Expressions

45 45 © Ellis Cohen 2001-2007 Simple Case Expressions SELECT ename, (CASE WHEN sal = 3000 THEN 'OVERPAID' ELSE to_char(sal) END) AS salary, sal FROM Emps Emps{ ename, salary:( sal =3000 ? 'OVERPAID', to_char(sal) ), sal } REAL Simple Case Expressions

46 46 © Ellis Cohen 2001-2007 Case Expressions and NULLs
SELECT ename, (CASE WHEN sal = 3000 THEN 'OVERPAID' END) AS salary, sal FROM Emps
Emps{ ename, salary:( sal = 3000 ? 'OVERPAID' ), sal }
With no ELSE branch, salary will be NULL for those who are not OVERPAID

47 47 © Ellis Cohen 2001-2007 Searched Case Expressions
SELECT ename, (CASE job WHEN 'CLERK' THEN 'ASSISTANT' WHEN 'MANAGER' THEN 'CHIEF' ELSE job END) AS title, sal FROM Emps;
Searched Case Expressions in REAL:
  Emps{ ename, title:( job='CLERK' ? 'ASSISTANT', job='MANAGER' ? 'CHIEF', job ), sal }

48 48 © Ellis Cohen 2001-2007 Duplicate Elimination

49 49 © Ellis Cohen 2001-2007 REAL Duplicate Elimination SELECT DISTINCT deptno FROM Emps Emps{ deptno ! } REAL Duplicate Elimination Note: The Classical Relational Algebra is set-based and automatically eliminates duplicates. REAL is based on Garcia-Molina, Ullman & Widom, and allows duplicate tuples in a relation Read ! as squeeze, specifically Read a trailing ! as "group squeeze" – Group together all the employees with the same deptno & squeeze out the duplicate deptno's

50 50 © Ellis Cohen 2001-2007 REAL Grouped Squeeze
Emps{ deptno ! }
Emps (order doesn't matter, so just show the Emps table ordered by deptno):
  empno  ename   deptno  …
  7839   KING    10      …
  7499   ALLEN   30      …
  7654   MARTIN  30      …
  7698   BLAKE   30      …
  7844   TURNER  30      …
  7986   STERN   50      …
grouped squeeze: { deptno ! }
  deptno
  10
  30
  50
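
A grouped squeeze can be sketched as "group the tuples by the listed attributes, then keep one tuple per group". The `group_squeeze` helper below is hypothetical and only illustrates the idea on the rows from the slide.

```python
EMPS = [
    {"empno": 7839, "ename": "KING",   "deptno": 10},
    {"empno": 7499, "ename": "ALLEN",  "deptno": 30},
    {"empno": 7654, "ename": "MARTIN", "deptno": 30},
    {"empno": 7698, "ename": "BLAKE",  "deptno": 30},
    {"empno": 7844, "ename": "TURNER", "deptno": 30},
    {"empno": 7986, "ename": "STERN",  "deptno": 50},
]


def group_squeeze(relation, attributes):
    """Project onto the attributes and squeeze out duplicate tuples."""
    groups = {}
    for row in relation:
        key = tuple(row[name] for name in attributes)
        # Only the first tuple seen for each key is kept.
        groups.setdefault(key, {name: row[name] for name in attributes})
    return list(groups.values())


# Emps{ deptno ! }
departments = group_squeeze(EMPS, ["deptno"])
print(departments)
```

The same helper handles composite squeezes such as `group_squeeze(EMPS, ["deptno", "ename"])`, since the grouping key is the whole tuple of listed attributes.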

51 51 © Ellis Cohen 2001-2007 Exercise: REAL Restriction & Grouped Squeeze What is the meaning of Emps[ sal > 1550 ]{ deptno ! }

52 52 © Ellis Cohen 2001-2007 Answer: REAL Restriction & Grouped Squeeze
Emps[ sal > 1550 ]{ deptno ! }
List the departments which have employees who make > 1550
Emps:
  empno  ename   deptno  sal   comm
  7499   ALLEN   30      1600  300
  7654   MARTIN  30      1250  1400
  7698   BLAKE   30      2850
  7839   KING    10      5000
  7844   TURNER  30      1500  0
  7986   STERN   50      1300
step 1: [ sal > 1550 ] (get the employees who make > 1550)
  empno  ename   deptno  sal   comm
  7499   ALLEN   30      1600  300
  7698   BLAKE   30      2850
  7839   KING    10      5000
step 2: { deptno ! } (get the departments in which those employees work)
  deptno
  30
  10

53 53 © Ellis Cohen 2001-2007 Composite Duplicate Elimination
SELECT DISTINCT deptno, job FROM Emps
Emps{ deptno, job ! }
List the distinct jobs within each department
Emps:
  empno  deptno  job
  7839   10      CLERK
  7499   30      ANALYST
  7654   30      ANALYST
  7698   30      CLERK
  7844   30      SALESMAN
  7986   50      CLERK
  7214   50      CLERK
  7586   50      SALESMAN
Result:
  deptno  job
  10      CLERK
  30      ANALYST
  30      CLERK
  30      SALESMAN
  50      CLERK
  50      SALESMAN

54 54 © Ellis Cohen 2001-2007 Distinct Tuples 1. What is the effect of SELECT DISTINCT * from Emps Emps{ * ! } 2. What's the difference between Emps{ job, sal ! } Emps{ job, sal }{ * ! }

55 55 © Ellis Cohen 2001-2007 Distinct Tuple Answers
1. What is the effect of
  SELECT DISTINCT * FROM Emps
  Emps{ * ! }
Lists Emps, eliminating duplicate tuples. This is the same as Emps, since Emps has a primary key, which ensures that all values of empno (and therefore all tuples) are unique
2. What's the difference between
  Emps{ job, sal ! }
  Emps{ job, sal }{ * ! }
No difference. They both find all the unique pairs of jobs and salaries in Emps

56 56 © Ellis Cohen 2001-2007 Aggregate Functions

57 57 © Ellis Cohen 2001-2007 REAL Aggregate Functions SELECT count(comm) AS knt FROM Emps Emps{ ! knt:count(comm) } Aggregate Functions in REAL Read a leading ! as "aggregate squeeze" – Apply an aggregation function to all the rows and squeeze them down to a single result How many employees get commissions? The name is required in REAL

58 58 © Ellis Cohen 2001-2007 Aggregation Produces Relations SELECT avg(sal) AS avgsal, max(sal) AS maxsal FROM Emps Emps{ ! avgsal:avg(sal), maxsal:max(sal) } still produces a relation That relation has a single tuple with two attributes: avgsal and maxsal

59 59 © Ellis Cohen 2001-2007 REAL Aggregate Squeeze
Emps{ ! avgsal:avg(sal), maxsal:max(sal) }
Emps:
  empno  ename   deptno  sal
  7499   ALLEN   30      1600
  7654   MARTIN  30      1250
  7698   BLAKE   30      2850
  7839   KING    10      5000
  7844   TURNER  30      1500
  7986   STERN   50      1300
aggregate squeeze: { ! avgsal:avg(sal), maxsal:max(sal) }
  avgsal  maxsal
  2250    5000
Aggregation results in a relation with a single tuple!
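
The aggregate squeeze above can be reproduced with a short Python sketch. The helpers (`aggregate_squeeze`, `avg`, `maximum`) are hypothetical names used only for illustration; like SQL aggregates, they skip NULL (`None`) values.

```python
EMPS = [
    {"empno": 7499, "ename": "ALLEN",  "deptno": 30, "sal": 1600},
    {"empno": 7654, "ename": "MARTIN", "deptno": 30, "sal": 1250},
    {"empno": 7698, "ename": "BLAKE",  "deptno": 30, "sal": 2850},
    {"empno": 7839, "ename": "KING",   "deptno": 10, "sal": 5000},
    {"empno": 7844, "ename": "TURNER", "deptno": 30, "sal": 1500},
    {"empno": 7986, "ename": "STERN",  "deptno": 50, "sal": 1300},
]


def non_null(relation, attribute):
    """Collect the non-NULL values of one attribute."""
    return [row[attribute] for row in relation if row[attribute] is not None]


def avg(attribute):
    def compute(relation):
        values = non_null(relation, attribute)
        return sum(values) / len(values) if values else None
    return compute


def maximum(attribute):
    def compute(relation):
        values = non_null(relation, attribute)
        return max(values) if values else None
    return compute


def aggregate_squeeze(relation, **aggregates):
    """Squeeze all tuples down to ONE tuple; the result is still a relation."""
    return [{name: compute(relation) for name, compute in aggregates.items()}]


# Emps{ ! avgsal:avg(sal), maxsal:max(sal) }
summary = aggregate_squeeze(EMPS, avgsal=avg("sal"), maxsal=maximum("sal"))
print(summary)
```

Returning a one-element list rather than a bare dictionary keeps the closure property visible: the output can be fed to another restrict or project.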

60 60 © Ellis Cohen 2001-2007 Exercise: REAL Restriction & Aggregation What is the REAL equivalent to SELECT avg(sal) AS avgsal FROM Emps WHERE deptno = 10

61 61 © Ellis Cohen 2001-2007 REAL Aggregation & Restriction
Emps[ deptno = 10 ]{ ! avgsal:avg(sal) }
SELECT avg(sal) AS avgsal FROM Emps WHERE deptno = 10
Emps:
  empno  ename   deptno  sal   comm
  3049   DILIP   10      1600  300
  7654   MARTIN  30      1250  1400
  7698   BLAKE   30      2850
  7839   KING    10      5000
  7844   TURNER  30      1500  0
  7986   STERN   50      1300
step 1: [ deptno = 10 ]
  empno  ename   deptno  sal   comm
  3049   DILIP   10      1600  300
  7839   KING    10      5000
step 2: { ! avgsal:avg(sal) }
  avgsal
  3300
Can you do the project and the restrict in the opposite sequence?

62 62 © Ellis Cohen 2001-2007 Sequence Matters Again!
Emps{ ! avgsal:avg(sal) }[ deptno = 10 ]
Emps:
  empno  ename   deptno  sal   comm
  7499   ALLEN   30      1600  300
  7654   MARTIN  30      1250  1400
  7698   BLAKE   30      2850
  7839   KING    10      5000
  7844   TURNER  30      1500  0
  7986   STERN   50      1300
step 1: { ! avgsal:avg(sal) }
  avgsal
  2250
step 2: [ deptno = 10 ]
  Fails! No deptno attribute to restrict

63 63 © Ellis Cohen 2001-2007 REAL Placement of Aggregate Functions
Emps{ ! knt:count(deptno) }
In REAL:
  The ONLY place aggregate functions can appear is in curly braces after the !
  The ONLY things that can appear after the ! are (expressions involving) aggregate functions
  Aggregate functions CANNOT be used in restrictions, e.g. [count(*) > 10] is ILLEGAL!
  Restriction specifies a test applied to a tuple at a time, so aggregation makes no sense!
Remember: The name is required in REAL

64 64 © Ellis Cohen 2001-2007 Aggregate Function Exercise Using Emps( empno, ename, deptno, sal, comm ) Assume sal is the weekly salary, and that all employees work 40 hrs/week. Write REAL to determine the average hourly salary.

65 65 © Ellis Cohen 2001-2007 REAL Answers: Aggregate Functions
Determine the average hourly salary.
  Emps{ ! avghsal:avg(sal/40) }
  Emps{ hrsal:(sal/40) }{ ! avghsal:avg(hrsal) }
  Emps{ ! avgsal:avg(sal) }{ avghsal:(avgsal/40) }

66 66 © Ellis Cohen 2001-2007 Attribute Aggregation Problem Using Emps( empno, ename, deptno, sal, comm ) If only count(*) were allowed in REAL, but not count( attribute ), how would you write Emps{ ! knt:count(job) }

67 67 © Ellis Cohen 2001-2007 Attribute Aggregation Answer If only count(*) were allowed in REAL, but not count( attribute ), how would you write Emps{ ! knt:count(job) } Emps[ job IS NOT NULL ] { ! knt:count(*) }
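
The SQL counterparts of the two REAL expressions can be compared with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the rows are made up, with one NULL job, and SQLite stands in for whatever SQL engine is in use.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE Emps (empno INTEGER PRIMARY KEY, ename TEXT, job TEXT)"
)
# Hypothetical rows; employee 2 has a NULL job.
connection.executemany(
    "INSERT INTO Emps VALUES (?, ?, ?)",
    [(1, "ADAMS", "CLERK"), (2, "BAKER", None),
     (3, "CHEN", "ANALYST"), (4, "DIAZ", "CLERK")],
)

# Emps{ ! knt:count(job) }
count_attribute = connection.execute(
    "SELECT count(job) AS knt FROM Emps"
).fetchone()[0]

# Emps[ job IS NOT NULL ]{ ! knt:count(*) }
count_star_restricted = connection.execute(
    "SELECT count(*) AS knt FROM Emps WHERE job IS NOT NULL"
).fetchone()[0]

# Emps{ ! knt:count(*) } counts every tuple, NULL job or not.
count_star = connection.execute("SELECT count(*) AS knt FROM Emps").fetchone()[0]

print(count_attribute, count_star_restricted, count_star)
connection.close()
```

count(job) and the restricted count(*) agree because count( attribute ) skips NULLs, while the unrestricted count(*) also counts the tuple with the NULL job.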

68 68 © Ellis Cohen 2001-2007 Distinct Aggregation

69 69 © Ellis Cohen 2001-2007 Distinct Aggregation SELECT count(DISTINCT deptno) AS knt FROM Emps Emps{ ! knt:count(deptno !) } REAL Distinct Aggregation Distinct Aggregation can be used with any aggregation function, though it is primarily used with count How many different departments do employees work in?

70 70 © Ellis Cohen 2001-2007 Distinct Aggregation Problem Using Emps( empno, ename, deptno, sal, comm ) If distinct aggregation were not supported in REAL, (but you still could use ! for aggregation and to eliminate duplicates) how else could you write Emps{ ! knt:count(deptno !) } ?

71 71 © Ellis Cohen 2001-2007 Diagram for Distinct Aggregation
Emps{ ! knt:count(deptno!) }
Emps{ deptno ! }{ ! knt:count(deptno) }
Emps:
  empno  ename   deptno  …
  7839   KING    10      …
  7499   ALLEN   30      …
  7654   MARTIN  30      …
  7698   BLAKE   30      …
  7844   TURNER  30      …
  7986   STERN   50      …
step 1: { deptno ! }
  deptno
  10
  30
  50
step 2: { ! knt:count(deptno) }
  knt
  3

72 72 © Ellis Cohen 2001-2007 Grouped Aggregation

73 73 © Ellis Cohen 2001-2007 REAL Grouped Aggregate Squeeze
Emps{ deptno ! avgsal:avg(sal), maxsal:max(sal) }
Emps:
  empno  deptno  sal
  7839   10      5000
  7499   30      1600
  7654   30      1250
  7698   30      2850
  7844   30      1500
  7986   50      1500
  7219   50      3100
grouped aggregate squeeze: { deptno ! avgsal:avg(sal), maxsal:max(sal) } (group by deptno, aggregate each group)
  deptno  avgsal  maxsal
  10      5000    5000
  30      1800    2850
  50      2300    3100
A Grouped Aggregate Squeeze results in a relation with one tuple for each group!
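
The two phases of the grouped aggregate squeeze (group by deptno, then aggregate each group) can be sketched as follows. The helper name and the lambda-based aggregates are assumptions made for illustration.

```python
EMPS = [
    {"empno": 7839, "deptno": 10, "sal": 5000},
    {"empno": 7499, "deptno": 30, "sal": 1600},
    {"empno": 7654, "deptno": 30, "sal": 1250},
    {"empno": 7698, "deptno": 30, "sal": 2850},
    {"empno": 7844, "deptno": 30, "sal": 1500},
    {"empno": 7986, "deptno": 50, "sal": 1500},
    {"empno": 7219, "deptno": 50, "sal": 3100},
]


def grouped_aggregate_squeeze(relation, group_by, **aggregates):
    """Group tuples by the group_by attributes, then aggregate each group."""
    groups = {}
    for row in relation:
        key = tuple(row[name] for name in group_by)
        groups.setdefault(key, []).append(row)

    result = []
    for key, members in groups.items():
        summary = dict(zip(group_by, key))
        for name, compute in aggregates.items():
            summary[name] = compute(members)
        result.append(summary)
    return result


# Emps{ deptno ! avgsal:avg(sal), maxsal:max(sal) }
per_department = grouped_aggregate_squeeze(
    EMPS,
    ["deptno"],
    avgsal=lambda rows: sum(row["sal"] for row in rows) / len(rows),
    maxsal=lambda rows: max(row["sal"] for row in rows),
)
for row in per_department:
    print(row)
```

The grouping attributes land in the result automatically, which matches the REAL notation: deptno appears before the ! and is not repeated after it.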

74 74 © Ellis Cohen 2001-2007 SQL vs REAL Grouping SELECT deptno, avg(sal) AS avgsal, max(sal) AS maxsal FROM Emps GROUP BY deptno Emps{ deptno ! avgsal:avg(sal), maxsal:max(sal) } GROUPING in REAL DON'T include deptno here too! The result already has attributes deptno and avgsal and maxsal

75 75 © Ellis Cohen 2001-2007 GROUP and DISTINCT Compare the results of SELECT job FROM Emps GROUP BY job SELECT DISTINCT job FROM Emps How would you write these both in REAL?

76 76 © Ellis Cohen 2001-2007 Answer: GROUP and DISTINCT SELECT job FROM Emps GROUP BY job SELECT DISTINCT job FROM Emps Emps{ job ! } Identical Results!

77 77 © Ellis Cohen 2001-2007 Composite Grouping
SELECT deptno, job, count(*) AS knt FROM Emps GROUP BY deptno, job
Emps{ deptno, job ! knt:count(*) }
How many employees hold each job within each department
Emps:
  empno  deptno  job
  7839   10      CLERK
  7499   30      ANALYST
  7654   30      ANALYST
  7698   30      CLERK
  7844   30      SALESMAN
  7986   50      CLERK
  7214   50      CLERK
  7586   50      SALESMAN
{ deptno, job ! knt:count(*) }
  deptno  job       knt
  10      CLERK     1
  30      ANALYST   2
  30      CLERK     1
  30      SALESMAN  1
  50      CLERK     2
  50      SALESMAN  1

78 78 © Ellis Cohen 2001-2007 Grouping & Distinct Aggregation
SELECT deptno, count(DISTINCT job) AS njob FROM Emps GROUP BY deptno
Emps{ deptno ! njob:count(job !) }
How many different jobs are there within each department
Emps:
  empno  deptno  job
  7839   10      CLERK
  7499   30      ANALYST
  7654   30      ANALYST
  7698   30      CLERK
  7844   30      SALESMAN
  7986   50      CLERK
  7214   50      CLERK
  7586   50      SALESMAN
{ deptno ! njob:count(job !) }
  deptno  njob
  10      1
  30      3
  50      2

79 79 © Ellis Cohen 2001-2007 Distinct Counts Problem What's the difference between Emps{ deptno ! knt:count(job!) } Emps{ deptno, job ! } { deptno ! knt:count(job) }

80 80 © Ellis Cohen 2001-2007 Diagram for Grouping Exercise
Emps{ deptno ! knt:count(job!) }
Emps{ deptno, job ! }{ deptno ! knt:count(job) }
Emps:
  empno  deptno  job
  7839   10      CLERK
  7499   30      ANALYST
  7654   30      ANALYST
  7698   30      CLERK
  7844   30      SALESMAN
  7986   50      CLERK
  7214   50      CLERK
  7586   50      SALESMAN
step 1: { deptno, job ! }
  deptno  job
  10      CLERK
  30      ANALYST
  30      CLERK
  30      SALESMAN
  50      CLERK
  50      SALESMAN
step 2: { deptno ! knt:count(job) }
  deptno  knt
  10      1
  30      3
  50      2

81 81 © Ellis Cohen 2001-2007 Distinct Counts With NULLs Emps{ deptno ! knt:count(job!) } – this ignores employees with NULL jobs Emps{ deptno, job ! } { deptno ! knt:count(job) } – No difference! This also ignores employees with NULL jobs Emps{ deptno, job ! } { deptno ! knt:count(*) } – the count will be one higher if any employees have NULL jobs
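
The effect of NULL jobs on the three expressions can be checked through their SQL counterparts. This sketch uses Python's built-in sqlite3 module; the first eight rows follow the earlier diagram, and employee 7999 with a NULL job is a made-up addition for the demonstration.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE Emps (empno INTEGER PRIMARY KEY, deptno INTEGER, job TEXT)"
)
connection.executemany(
    "INSERT INTO Emps VALUES (?, ?, ?)",
    [(7839, 10, "CLERK"), (7499, 30, "ANALYST"), (7654, 30, "ANALYST"),
     (7698, 30, "CLERK"), (7844, 30, "SALESMAN"), (7986, 50, "CLERK"),
     (7214, 50, "CLERK"), (7586, 50, "SALESMAN"),
     (7999, 50, None)],  # hypothetical employee with a NULL job
)

# Emps{ deptno ! knt:count(job!) }
distinct_count = connection.execute(
    "SELECT deptno, count(DISTINCT job) AS knt FROM Emps "
    "GROUP BY deptno ORDER BY deptno"
).fetchall()

# Emps{ deptno, job ! }{ deptno ! knt:count(job) }
squeeze_then_count_job = connection.execute(
    "SELECT deptno, count(job) AS knt "
    "FROM (SELECT DISTINCT deptno, job FROM Emps) "
    "GROUP BY deptno ORDER BY deptno"
).fetchall()

# Emps{ deptno, job ! }{ deptno ! knt:count(*) }
squeeze_then_count_star = connection.execute(
    "SELECT deptno, count(*) AS knt "
    "FROM (SELECT DISTINCT deptno, job FROM Emps) "
    "GROUP BY deptno ORDER BY deptno"
).fetchall()

print(distinct_count)
print(squeeze_then_count_job)
print(squeeze_then_count_star)
connection.close()
```

Only department 50 differs in the last query: the squeeze leaves a single (50, NULL) tuple, which count(*) counts and count(job) ignores.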

82 82 © Ellis Cohen 2001-2007 Group Restriction

83 83 © Ellis Cohen 2001-2007 Group Restriction Problem
Emps{ deptno ! avgsal:avg(sal), maxsal:max(sal) }
Find the average and maximum salary of the employees in each department
Emps:
  empno  deptno  sal
  7839   10      5000
  7499   30      1600
  7654   30      1250
  7698   30      2850
  7844   30      1500
  7986   50      1500
  7219   50      3100
grouped aggregate squeeze: { deptno ! avgsal:avg(sal), maxsal:max(sal) }
  deptno  avgsal  maxsal
  10      5000    5000
  30      1800    2850
  50      2300    3100
But suppose we only care about departments where the average salary is > 2000

84 84 © Ellis Cohen 2001-2007 REAL Group Restriction
Emps{ deptno ! avgsal:avg(sal), maxsal:max(sal) }[avgsal > 2000]
Suppose we want to only keep departments whose average salary > 2000
Emps:
  empno  deptno  sal
  7839   10      5000
  7499   30      1600
  7654   30      1250
  7698   30      2850
  7844   30      1500
  7986   50      1500
  7219   50      3100
{ deptno ! avgsal:avg(sal), maxsal:max(sal) }
  deptno  avgsal  maxsal
  10      5000    5000
  30      1800    2850
  50      2300    3100
[ avgsal > 2000 ] (keep those groups whose average salary > 2000)
  deptno  avgsal  maxsal
  10      5000    5000
  50      2300    3100

85 85 © Ellis Cohen 2001-2007 Projected Group Restriction Exercise The preceding result has deptno, avgsal & maxsal attributes Write REAL to Determine just the deptno and the maximum salary of those departments where the average salary > 2000

86 86 © Ellis Cohen 2001-2007 REAL Projected Group Restriction
Emps{ deptno ! avgsal:avg(sal), maxsal:max(sal) }[avgsal > 2000]{ deptno, maxsal }
Emps:
  empno  deptno  sal
  7839   10      5000
  7499   30      1600
  7654   30      1250
  7698   30      2850
  7844   30      1500
  7986   50      1500
  7219   50      3100
{ deptno ! avgsal:avg(sal), maxsal:max(sal) }
  deptno  avgsal  maxsal
  10      5000    5000
  30      1800    2850
  50      2300    3100
[ avgsal > 2000 ]
  deptno  avgsal  maxsal
  10      5000    5000
  50      2300    3100
{ deptno, maxsal }
  deptno  maxsal
  10      5000
  50      3100

87 87 © Ellis Cohen 2001-2007 Real HAVING SELECT deptno, max(sal) AS maxsal FROM Emps GROUP BY deptno HAVING avg(sal) > 2000 Determine the deptno and the maximum salary of those departments where the average salary > 2000 Emps{ deptno ! avgsal:avg(sal), maxsal:max(sal) } [avgsal > 2000] { deptno, maxsal }
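
The correspondence between HAVING and the REAL pipeline (group, then restrict, then project) can be checked with Python's built-in sqlite3 module. This is a sketch: the second query is one assumed SQL rendering of the REAL steps using a derived table, and the rows follow the earlier diagrams.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE Emps (empno INTEGER PRIMARY KEY, deptno INTEGER, sal INTEGER)"
)
connection.executemany(
    "INSERT INTO Emps VALUES (?, ?, ?)",
    [(7839, 10, 5000), (7499, 30, 1600), (7654, 30, 1250), (7698, 30, 2850),
     (7844, 30, 1500), (7986, 50, 1500), (7219, 50, 3100)],
)

# The SQL form with HAVING.
with_having = connection.execute(
    "SELECT deptno, max(sal) AS maxsal FROM Emps "
    "GROUP BY deptno HAVING avg(sal) > 2000 ORDER BY deptno"
).fetchall()

# Step by step, mirroring
# Emps{ deptno ! avgsal:avg(sal), maxsal:max(sal) }[avgsal > 2000]{ deptno, maxsal }
step_by_step = connection.execute(
    "SELECT deptno, maxsal FROM ("
    "  SELECT deptno, avg(sal) AS avgsal, max(sal) AS maxsal"
    "  FROM Emps GROUP BY deptno"
    ") WHERE avgsal > 2000 ORDER BY deptno"
).fetchall()

print(with_having)
print(step_by_step)
connection.close()
```

Seen this way, HAVING is just a restriction applied after the grouped aggregate squeeze, which is why REAL needs no separate construct for it.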

88 88 © Ellis Cohen 2001-2007 Group Restriction Exercise Using Emps( empno, ename, job, sal, comm, deptno ) Write the REAL expression for the following: Show the average salary per job, excluding those jobs found only in a single department

89 89 © Ellis Cohen 2001-2007 Answer to Group Restriction Exercise Show the average salary per job, excluding those jobs found only in a single department Emps{ job ! avgsal:avg(sal), knt:count(deptno!) } [knt > 1]{ job, avgsal } SELECT job, avg(sal) AS avgsal FROM Emps GROUP BY job HAVING count(DISTINCT deptno) > 1

