SQL (cont'd) CSE3330 Spring 2014 Chengkai Li. NULL values.

NULL values

NULL values are all different
The following query looks for employees who earn the same as their supervisors. If the salary of the employee or of the supervisor (or both) is missing (NULL), the comparison E1.salary = E2.salary evaluates to UNKNOWN rather than true, so the employee doesn’t appear in the query result (which is what we would expect).
SELECT E1.fname, E1.lname
FROM EMPLOYEE AS E1, EMPLOYEE AS E2
WHERE E1.superssn = E2.ssn AND E1.salary = E2.salary;
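
To see this three-valued logic at work, here is a minimal sketch using Python's standard sqlite3 module. It assumes SQLite treats NULL = NULL as UNKNOWN here, as standard SQL and MySQL do. For brevity it projects ssn instead of fname/lname, and the data is hypothetical.

```python
import sqlite3

# Hypothetical data: 112 and its supervisor 111 both earn 5000;
# 113 and its supervisor 114 both have a NULL (unknown) salary.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (ssn INTEGER, superssn INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [(111, None, 5000), (112, 111, 5000), (113, 114, None), (114, None, None)],
)

# NULL = NULL is UNKNOWN, so the WHERE clause rejects the pair (113, 114).
rows = conn.execute(
    """SELECT E1.ssn
       FROM EMPLOYEE AS E1, EMPLOYEE AS E2
       WHERE E1.superssn = E2.ssn AND E1.salary = E2.salary"""
).fetchall()
print(rows)  # [(112,)]

# The comparison itself yields NULL, which Python receives as None.
print(conn.execute("SELECT NULL = NULL").fetchone())  # (None,)
```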

Except that NULL values are all equal in UNION/INTERSECT/EXCEPT/DISTINCT/GROUP BY
SELECT COUNT(*)
FROM ( SELECT E.superssn
       FROM EMPLOYEE AS E, DEPARTMENT AS D
       WHERE E.dno=D.dnumber AND D.dname='research'
       INTERSECT
       SELECT E.superssn
       FROM EMPLOYEE AS E, WORKS_ON AS W
       WHERE E.ssn=W.essn AND W.hours > 30 ) AS T;
EMPLOYEE
ssn   superssn  dno
---   --------  ---
111   NULL      1
112   NULL      2
113   111       1
DEPARTMENT
dnumber  dname
-------  --------
1        research
2        sales
WORKS_ON
essn  proj  hours
----  ----  -----
111   1     20
112   2     35
113   1     35
Both subqueries return {NULL, 111}. Because INTERSECT treats the two NULLs as equal, the query returns 2.
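
A minimal sketch that loads the three tables above into an in-memory SQLite database through Python's sqlite3 module and runs the query. SQLite is used only because it supports INTERSECT. Its documentation states that compound SELECT operators consider NULLs equal to each other, which is the behavior this slide describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (ssn INTEGER, superssn INTEGER, dno INTEGER);
CREATE TABLE DEPARTMENT (dnumber INTEGER, dname TEXT);
CREATE TABLE WORKS_ON (essn INTEGER, proj INTEGER, hours INTEGER);
INSERT INTO EMPLOYEE VALUES (111, NULL, 1), (112, NULL, 2), (113, 111, 1);
INSERT INTO DEPARTMENT VALUES (1, 'research'), (2, 'sales');
INSERT INTO WORKS_ON VALUES (111, 1, 20), (112, 2, 35), (113, 1, 35);
""")

query = """
SELECT E.superssn FROM EMPLOYEE AS E, DEPARTMENT AS D
WHERE E.dno = D.dnumber AND D.dname = 'research'
INTERSECT
SELECT E.superssn FROM EMPLOYEE AS E, WORKS_ON AS W
WHERE E.ssn = W.essn AND W.hours > 30
"""
# Both branches produce {NULL, 111}; the two NULLs match each other.
result = set(conn.execute(query).fetchall())
print(result)

count = conn.execute(f"SELECT COUNT(*) FROM ({query}) AS T").fetchone()[0]
print(count)  # 2
```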

Except that NULL values are all equal in UNION/INTERSECT/EXCEPT/DISTINCT/GROUP BY
All employees whose salary is NULL belong to a single group.
SELECT salary, COUNT(*)
FROM EMPLOYEE
GROUP BY salary;
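
A small sketch with Python's sqlite3 module and hypothetical salaries, showing that the two NULL salaries end up in one group rather than in two separate ones. It assumes SQLite's GROUP BY treats NULLs like MySQL's does for this purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (ssn INTEGER, salary INTEGER)")
# Hypothetical salaries: two employees have no salary recorded (NULL).
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)",
                 [(111, 5000), (112, None), (113, 5000), (114, None), (115, 8000)])

# Map each group key (None stands for NULL) to the size of its group.
groups = dict(conn.execute(
    "SELECT salary, COUNT(*) FROM EMPLOYEE GROUP BY salary").fetchall())
print(groups[None])  # 2
```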

Aggregation
NULL values are removed before an aggregate is calculated (except for COUNT(*), which counts rows). Suppose there are NULL values in salary. Then
SELECT AVG(salary) FROM EMPLOYEE;
is different from
SELECT SUM(salary)/COUNT(*) FROM EMPLOYEE;
and
SELECT COUNT(salary) FROM EMPLOYEE;
is different from
SELECT COUNT(*) FROM EMPLOYEE;
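
A minimal sketch of these four queries in SQLite via Python's sqlite3 module, on hypothetical data with two NULL salaries. One SQLite detail: dividing two integers truncates, so the numbers are chosen to divide evenly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (ssn INTEGER, salary INTEGER)")
# Hypothetical data: four employees, two with a NULL salary.
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)",
                 [(111, 4000), (112, 8000), (113, None), (114, None)])

avg, ratio, count_col, count_star = conn.execute(
    """SELECT AVG(salary),            -- NULLs skipped: 12000 / 2
              SUM(salary) / COUNT(*), -- NULL rows still counted: 12000 / 4
              COUNT(salary),          -- non-NULL salaries only
              COUNT(*)                -- all rows
       FROM EMPLOYEE""").fetchone()
print(avg, ratio, count_col, count_star)  # 6000.0 3000 2 4
```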

The confusing DISTINCT (at least in MySQL)
SELECT a, b FROM T;
a   b
--  ----
1   1
2   NULL
SELECT DISTINCT a, b FROM T;
a   b
--  ----
1   1
2   NULL
SELECT COUNT(DISTINCT a, b) FROM T;
COUNT(DISTINCT a, b)
--------------------
1
NULL values are all equal in DISTINCT (without COUNT), but rows containing a NULL are ignored by COUNT(DISTINCT …).
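
SQLite accepts only one argument in COUNT(DISTINCT …), so this sketch (Python's sqlite3 module) emulates MySQL's multi-column form by discarding rows with a NULL before counting distinct rows. The table is the slide's T plus one hypothetical duplicate (2, NULL) row, added so that DISTINCT has two NULL-containing rows to merge.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (a INTEGER, b INTEGER)")
# The slide's rows plus one hypothetical duplicate (2, NULL) row.
conn.executemany("INSERT INTO T VALUES (?, ?)", [(1, 1), (2, None), (2, None)])

# DISTINCT treats the two (2, NULL) rows as duplicates and keeps one.
distinct_rows = conn.execute("SELECT DISTINCT a, b FROM T").fetchall()
print(len(distinct_rows))  # 2

# Emulation of MySQL's COUNT(DISTINCT a, b): rows where any column is NULL
# are skipped, then the remaining distinct rows are counted.
count = conn.execute(
    """SELECT COUNT(*) FROM (
           SELECT DISTINCT a, b FROM T
           WHERE a IS NOT NULL AND b IS NOT NULL) AS D""").fetchone()[0]
print(count)  # 1
```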

MySQL doesn’t provide EXCEPT/MINUS or INTERSECT (INTERSECT and EXCEPT were only added later, in MySQL 8.0.31).

A nice write-up: http://faculty.utpa.edu/chebotkoa/main/teaching/csci4333fall2013/slides/MySQL-set-operators.pdf
However, it considers neither the complications caused by NULL values nor bag semantics (duplicate rows).

Rewrite EXCEPT and INTERSECT
SELECT ssn FROM EMPLOYEE WHERE salary < 10000
EXCEPT
SELECT essn FROM WORKS_ON WHERE hours > 30;
Rewrite this query using IN, EXISTS, and LEFT OUTER JOIN.
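
For INTERSECT the same rewrites keep a row when a match exists instead of when none exists. A minimal sketch with Python's sqlite3 module and small hypothetical data, assuming ssn is a key of EMPLOYEE so that INTERSECT's duplicate elimination makes no difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (ssn INTEGER PRIMARY KEY, salary INTEGER);
CREATE TABLE WORKS_ON (essn INTEGER, pno INTEGER, hours INTEGER);
INSERT INTO EMPLOYEE VALUES (111, 5000), (112, 8000);
INSERT INTO WORKS_ON VALUES (111, 1, 31), (111, 2, 9), (112, 1, 20);
""")

intersect_q = """SELECT ssn FROM EMPLOYEE WHERE salary < 10000
                 INTERSECT
                 SELECT essn FROM WORKS_ON WHERE hours > 30"""
# INTERSECT -> IN: keep an employee only if a matching essn exists.
in_q = """SELECT ssn FROM EMPLOYEE WHERE salary < 10000
          AND ssn IN (SELECT essn FROM WORKS_ON WHERE hours > 30)"""
# INTERSECT -> EXISTS: the same test as a correlated subquery.
exists_q = """SELECT ssn FROM EMPLOYEE WHERE salary < 10000
              AND EXISTS (SELECT * FROM WORKS_ON
                          WHERE hours > 30 AND essn = ssn)"""

results = [conn.execute(q).fetchall() for q in (intersect_q, in_q, exists_q)]
print(results)  # [[(111,)], [(111,)], [(111,)]]
```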

Rewrite EXCEPT by IN
SELECT ssn FROM EMPLOYEE WHERE salary < 10000
EXCEPT
SELECT essn FROM WORKS_ON WHERE hours > 30;
can be rewritten as
SELECT ssn FROM EMPLOYEE
WHERE salary < 10000
  AND ssn NOT IN (SELECT essn FROM WORKS_ON WHERE hours > 30);
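
A minimal sketch (Python's sqlite3 module, data taken from the LEFT OUTER JOIN slide) comparing the two queries. It also illustrates one of the NULL complications mentioned earlier: after a hypothetical WORKS_ON row with a NULL essn is added, NOT IN no longer matches EXCEPT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (ssn INTEGER, salary INTEGER);
CREATE TABLE WORKS_ON (essn INTEGER, pno INTEGER, hours INTEGER);
INSERT INTO EMPLOYEE VALUES (111, 5000), (112, 8000);
INSERT INTO WORKS_ON VALUES (111, 1, 31), (111, 2, 9), (112, 1, 20);
""")

except_q = """SELECT ssn FROM EMPLOYEE WHERE salary < 10000
              EXCEPT
              SELECT essn FROM WORKS_ON WHERE hours > 30"""
not_in_q = """SELECT ssn FROM EMPLOYEE WHERE salary < 10000
              AND ssn NOT IN (SELECT essn FROM WORKS_ON WHERE hours > 30)"""

# Without NULLs in WORKS_ON.essn the two queries agree.
before = (conn.execute(except_q).fetchall(), conn.execute(not_in_q).fetchall())
print(before)  # ([(112,)], [(112,)])

# Hypothetical extra row whose essn is unknown (NULL).
conn.execute("INSERT INTO WORKS_ON VALUES (NULL, 3, 40)")
# 112 NOT IN (111, NULL) is UNKNOWN, so the NOT IN query loses 112,
# while EXCEPT still returns it.
after = (conn.execute(except_q).fetchall(), conn.execute(not_in_q).fetchall())
print(after)  # ([(112,)], [])
```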

Rewrite EXCEPT by EXISTS
SELECT ssn FROM EMPLOYEE WHERE salary < 10000
EXCEPT
SELECT essn FROM WORKS_ON WHERE hours > 30;
can be rewritten as
SELECT ssn FROM EMPLOYEE
WHERE salary < 10000
  AND NOT EXISTS (SELECT essn FROM WORKS_ON WHERE hours > 30 AND essn=ssn);
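
A minimal sketch of the NOT EXISTS version in SQLite (Python's sqlite3 module, same sample data as the LEFT OUTER JOIN slide). With a hypothetical NULL essn row added, essn = ssn is never true for that row, so unlike NOT IN the result does not change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (ssn INTEGER, salary INTEGER);
CREATE TABLE WORKS_ON (essn INTEGER, pno INTEGER, hours INTEGER);
INSERT INTO EMPLOYEE VALUES (111, 5000), (112, 8000);
INSERT INTO WORKS_ON VALUES (111, 1, 31), (111, 2, 9), (112, 1, 20);
""")

# Inside the subquery, ssn resolves to the outer EMPLOYEE.ssn (correlation).
not_exists_q = """SELECT ssn FROM EMPLOYEE WHERE salary < 10000
                  AND NOT EXISTS (SELECT essn FROM WORKS_ON
                                  WHERE hours > 30 AND essn = ssn)"""
before = conn.execute(not_exists_q).fetchall()
print(before)  # [(112,)]

# Hypothetical row with a NULL essn: it can never match, so it blocks nobody.
conn.execute("INSERT INTO WORKS_ON VALUES (NULL, 3, 40)")
after = conn.execute(not_exists_q).fetchall()
print(after)  # [(112,)]
```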

Rewrite EXCEPT by LEFT OUTER JOIN
SELECT ssn FROM EMPLOYEE WHERE salary < 10000
EXCEPT
SELECT essn FROM WORKS_ON WHERE hours > 30;
Incorrect rewriting:
SELECT ssn
FROM EMPLOYEE LEFT OUTER JOIN WORKS_ON ON ssn=essn
WHERE salary < 10000 AND hours <= 30;
WORKS_ON
essn  pno  hours
----  ---  -----
111   1    31
111   2    9
EMPLOYEE
ssn   salary
---   ------
111   5000
On this data EXCEPT returns no rows, because employee 111 works 31 hours on project 1; the rewriting still returns 111 through the joined row for project 2 (9 hours).
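
A minimal sketch (Python's sqlite3 module) that reproduces the failure on the slide's data. It then adds a hypothetical employee 115 with no WORKS_ON rows to show a second failure: the outer join pads hours with NULL, hours <= 30 is then UNKNOWN, and 115 is dropped even though EXCEPT keeps it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (ssn INTEGER, salary INTEGER);
CREATE TABLE WORKS_ON (essn INTEGER, pno INTEGER, hours INTEGER);
INSERT INTO EMPLOYEE VALUES (111, 5000);
INSERT INTO WORKS_ON VALUES (111, 1, 31), (111, 2, 9);
""")

except_q = """SELECT ssn FROM EMPLOYEE WHERE salary < 10000
              EXCEPT
              SELECT essn FROM WORKS_ON WHERE hours > 30"""
wrong_q = """SELECT ssn FROM EMPLOYEE LEFT OUTER JOIN WORKS_ON ON ssn = essn
             WHERE salary < 10000 AND hours <= 30"""

# Failure 1: 111 survives through the (111, 2, 9) row.
case1 = (conn.execute(except_q).fetchall(), conn.execute(wrong_q).fetchall())
print(case1)  # ([], [(111,)])

# Failure 2: hypothetical employee 115 with no WORKS_ON rows is wrongly dropped.
conn.execute("INSERT INTO EMPLOYEE VALUES (115, 7000)")
case2 = (conn.execute(except_q).fetchall(), conn.execute(wrong_q).fetchall())
print(case2)  # ([(115,)], [(111,)])
```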

Rewrite EXCEPT by LEFT OUTER JOIN
Correct rewriting:
SELECT ssn
FROM ((SELECT ssn FROM EMPLOYEE WHERE salary < 10000) AS R
      LEFT OUTER JOIN
      (SELECT essn FROM WORKS_ON WHERE hours > 30) AS T
      ON R.ssn=T.essn)
WHERE T.essn IS NULL;
WORKS_ON
essn  pno  hours
----  ---  -----
111   1    31
111   2    9
112   1    20
EMPLOYEE
ssn   salary
---   ------
111   5000
112   8000
R
ssn
---
111
112
T
essn
----
111
R left outer join T
ssn   essn
---   ----
111   111
112   NULL
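
A minimal sketch that checks the correct rewriting against EXCEPT on the slide's data, using Python's sqlite3 module. The outer parentheses around the join are dropped here, which does not change the query's meaning. The intermediate R left outer join T is printed first so it can be compared with the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (ssn INTEGER, salary INTEGER);
CREATE TABLE WORKS_ON (essn INTEGER, pno INTEGER, hours INTEGER);
INSERT INTO EMPLOYEE VALUES (111, 5000), (112, 8000);
INSERT INTO WORKS_ON VALUES (111, 1, 31), (111, 2, 9), (112, 1, 20);
""")

# R LEFT OUTER JOIN T, shared by the two queries below.
from_clause = """FROM (SELECT ssn FROM EMPLOYEE WHERE salary < 10000) AS R
                 LEFT OUTER JOIN (SELECT essn FROM WORKS_ON WHERE hours > 30) AS T
                 ON R.ssn = T.essn"""

joined = conn.execute(f"SELECT R.ssn, T.essn {from_clause} ORDER BY R.ssn").fetchall()
print(joined)  # [(111, 111), (112, None)]

# Keep only the R rows that found no partner in T.
correct = conn.execute(f"SELECT R.ssn {from_clause} WHERE T.essn IS NULL").fetchall()
expected = conn.execute("""SELECT ssn FROM EMPLOYEE WHERE salary < 10000
                           EXCEPT
                           SELECT essn FROM WORKS_ON WHERE hours > 30""").fetchall()
print(correct, expected)  # [(112,)] [(112,)]
```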