Database is SQL1.mdb ◦ import using MySQL Migration Toolkit http://www.mysql.com/products/tools/ http://www.mysql.com/products/tools/
Based on tables used in ◦ “The Essence of SQL”, by David Rozenshtein ◦ Five tables STUDENT PROFESSOR COURSE TAKE TEACH ◦ Three types of SQL queries ◦ Eighteen standard questions “Essence” discusses methodology, however does not provide any data ◦ SQL1 database contains data according to the book’s table definitions ◦ Data is ‘salted’ (known results for the queries)
Can be lumped into six categories ◦ Type I, II, or III categories Type I: single level queries Type II: nested, no correlations Type III: nested, with correlations ◦ Aggregation sums, averages, grouping ◦ Enhancements sorting ◦ Extensions WHILE loop
E1.Who takes CS112? E2.What are the student names and numbers of students taking CS112? E3.Who takes CS112 or CS114? E4.Who take both CS112 and CS114? E5.Who does not take CS112? E6.Who takes a course which is not CS112? E7.Who takes at least two courses? E8.Who takes at most two courses? E9.Who takes exactly two courses? E10. Who takes only CS112? E11. Who takes either CS112 or CS114? (not both) E12. Who are the youngest students? E13. Who takes every course?
Type 1 queries ◦ basic form has three clauses SELECT FROM WHERE E1 (Who takes CS112?) SELECT sno FROM Take WHERE (cno=“CS112”); E2 (Who takes CS112? – show name and number) SELECT Student.sno, Sname FROM Student,Take WHERE (Student.sno=Take.sno) AND (cno=“CS112”);
“distinct” keyword requires sorting ◦ performance hit rewrite the WHERE clause to use list membership operator “IN” ◦ IN returns true is the value on the left is equal to one of the value list members on the right WHERE (cno IN(“CS112”,”CS114”)) The “IN” operator is important for Type II queries!
Who takes both CS112 and CS114? SELECT sno FROM take WHERE (cno=“CS112”) and (cno=“CS114”); Why is this wrong? ◦ it will compile and return an answer… so.. why? ◦ because it tries to find any entry for which the course number is CS112 and CS114… it can only be one at a time… ◦ translation from English to SQL needs to be considered rephrase the question? reverse engineer the problem?
The information resides in the ‘take’ table ◦ find students who take CS112 (who takes one) ◦ find students who take CS114 (who take the other) ◦ determine if it’s the same student (where are they equal) Ideally use another ‘copy’ of the take table SELECT take.sno FROM take, take WHERE (take.sno=take.sno) AND (take.cno=“CS112”) and (take.cno=“CS114”); however…. how does the compiler know which take table to use when??? ◦ Develop an ‘alias’ for one of the take tables, and then use the alias name!
Alias ◦ created in the FROM clause and used only for the duration of the query SELECT X.sno FROM take X, take WHERE (X.sno=take.sno) AND (X.cno=“CS112”) and (take.cno=“CS114”);
E5: Who does not take CS112? E6: Who takes a course which is not CS112? Included in E5?Included in E6? Student takes both 112 and 114 NoYes Student takes nothingYesNo Only takes 112No Only takes 114Yes
Who takes a course which is not CS112? ◦ (student will be selected even if takes CS112 in addition to another course) SELECT sno FROM take WHERE (cno!=“CS112”);
Who does not take CS112? ◦ A Type II query ◦ done in two passes, not one (subquery) 1 st pass who takes CS112? 2 nd pass who is not in the results of the first pass?
Who takes at least two classes? ◦ Last of the Type I queries ◦ Use two “copies” of the TAKE table See who exists in both copies, but have DIFFERENT classes The student number the same, but differ class numbers SELECT x.sno FROM take x, take y Where (x.sno=y.sno) AND (x.cno != y.cno);
Questions 1,2,3,4,6,7 can be answered with Type I queries ◦ Question 5 is a Type II query and cannot be answered with a Type I query
E8 through E11 E8: Who takes at most two courses? SELECT sno FROM student WHERE NOT (sno IN (Select x.sno From Take x, Take y, Take Where (x.sno=y.xno) And (y.sno=take.sno) And (x.cno!=y.cno) And (y.cno !=take.cno) And (x.cno!=take.cno)));
Who takes exactly two courses ◦ Who takes at least two courses and does not take at least three courses? Note: 1)“x” alias used twice in main and subquery, but “scope” keeps them separate 2)The “>” used in the where clause is used to eliminate duplicates
Who takes only CS112? ◦ Who takes CS112 and is not in the group of who does not take CS112? SELECT sno FROM take WHERE (cno=“CS112”) AND NOT (sno IN (SELECT sno FROM take WHERE (cno != “CS112”))); Alternate: simplified query
Who take either CS112 or CS114? ◦ But not both… ◦ Find who takes both and from that Find who is NOT in that group, but takes CS112 or CS114 SELECT sno FROM take WHERE ((cno=“CS112”) or (cno=“CS114”)) AND NOT (sno IN (SELECT x.sno FROM take x, take WHERE (x.sno=take.sno) AND (x.cno=“CS112) AND (take.cno=“CS114))); NOTE: extra parens around the “OR” In the WHERE clause. This means that the OR is executed first;. Normal Precedence: NOT, AND OR
E12: Who are the youngest students? ◦ The min( ) function can be used or a TYPE II can do it as well SELECT sno FROM student WHERE NOT (age IN (SELECT x.age FROM student x, student WHERE (x.age > student.age))); The subquery returns all ages if there is age smaller than it ◦ It excludes the smallest age Then the main query retrieves the students who are not in that group
Who takes every course? ◦ Note the use of the CONCAT( ) function Allows us to make up a new “field” for the duration of the query Saves comparing multiple fields
Local reference in each level of the query ◦ This makes it a TYPE II Innermost sno,cno comes from TAKE Middle sno comes from cross-product of STUDENT,COURSE Outermost sno comes from STUDENT
For each department that has more than 2 professors older than 40, what is the average salary of these professors? SELECT dept,avg(salary) FROM professor WHERE (age>40) GROUP BY dept HAVING (COUNT(*)>2);
Count(*) from professor ◦ Counts the rows Count(Distinct(dept)) from professor ◦ Counts how many unique values exist Count would return When using aggregate functions (min, max, avg, sum, count, etc) you cannot have a non- aggregated column in the SELECT or HAVING clause unless it’s already aggregated in the GROUP BY clause
What is the GPA of each student? Select sno, GPA=round(SUM(Grade*credits)/Sum(credits),2) FROM take,course WHERE (take.cno=course.cno) GROUP BY sno;
TYPE I question involving “at least, at most, exactly” can be performed using aggregate SELECT sno FROM take GROUP BY sno HAVING (count(*)>=3);
What is the overall average salary of all professors older than 40? SELECT AVG(salary) FROM professor WHERE (age>40); GROUP BY clause not needed ◦ Implicit ‘single group’ of all professors over 40
Which professor’s salary is greater than the overall average salary? SELECT x.fname,x.lname FROM professor x, professor y GROUP BY x.fname,x.lname,x.salary HAVING (x.salary > avg(y.salary));
Which professor’s salary is greater than his/her department’s average salary? SELECT x.fname,x.lname FROM professor x,professor y WHERE (x.dept=y.dept) GROUP BY x.fname,x.lname,x.salary HAVING (x.salary > avg(y.salary));
Sample Type III SELECT sno,sname FROM student WHERE (“CS112” IN (SELECT cno FROM take WHERE (sno=student.sno))); What makes this a Type III ? ◦ The reference in the inner where [take.]sno = student.sno ◦ “reaches” outside the scope of the inner query ◦ It is a “non-local reference” or “correlation” It’s E2… What are the student numbers and names of the students taking CS112?
Who takes CS112? ◦ Using the EXISTS operator Returns TRUE if at least one row returned by subquery SELECT sno,sname FROM student WHERE EXISTS (SELECT * FROM take WHERE (sno=student.sno) AND (cno=“CS112”)); EXISTS requires usage of the “*” in the select subquery