? Data Science 100 Databases Part 2 (The SQL) Slides by:

Slides:



Advertisements
Similar presentations
SQL: The Query Language Part 2
Advertisements

1 Advanced SQL Queries. 2 Example Tables Used Reserves sidbidday /10/04 11/12/04 Sailors sidsnameratingage Dustin Lubber Rusty.
Relational Algebra Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY courtesy of Joe Hellerstein for some slides.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Relational Algebra Chapter 4, Part A Modified by Donghui Zhang.
CS 166: Database Management Systems
Database Management Systems 1 Raghu Ramakrishnan SQL: Queries, Programming, Triggers Chpt 5.
SQL: The Query Language Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY courtesy of Joe Hellerstein and etc for some slides.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 SQL: Queries, Programming, Triggers Chapter 5.
CS 405G: Introduction to Database Systems
SQL Review.
By relieving the brain of all unnecessary work, a good notation sets it free to concentrate on more advanced problems, and, in effect, increases the mental.
1 SQL: Structured Query Language (‘Sequel’) Chapter 5.
SQL 2 – The Sequel R&G, Chapter 5 Lecture 10. Administrivia Homework 2 assignment now available –Due a week from Sunday Midterm exam will be evening of.
By relieving the brain of all unnecessary work, a good notation sets it free to concentrate on more advanced problems, and, in effect, increases the mental.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 90 Database Systems I SQL Queries.
By relieving the brain of all unnecessary work, a good notation sets it free to concentrate on more advanced problems, and, in effect, increases the mental.
1 SQL (Simple Query Language). 2 Query Components A query can contain the following clauses –select –from –where –group by –having –order by Only select.
FALL 2004CENG 351 File Structures and Data Management1 SQL: Structured Query Language Chapter 5.
1 Relational Algebra. 2 Relational Query Languages Query languages: Allow manipulation and retrieval of data from a database. Relational model supports.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Relational Algebra Chapter 4, Part A.
1 The Oracle Database System Querying the Data Database Course The Hebrew University of Jerusalem.
1 SQL: Structured Query Language Chapter 5. 2 SQL and Relational Calculus relationalcalculusAlthough relational algebra is useful in the analysis of query.
Rutgers University Relational Algebra 198:541 Rutgers University.
Relational Algebra Chapter 4 - part I. 2 Relational Query Languages  Query languages: Allow manipulation and retrieval of data from a database.  Relational.
CSC343 – Introduction to Databases - A. Vaisman1 SQL: Queries, Programming, Triggers.
Relational Algebra.
1 Relational Algebra and Calculus Chapter 4. 2 Relational Query Languages  Query languages: Allow manipulation and retrieval of data from a database.
CSC 411/511: DBMS Design Dr. Nan WangCSC411_L6_SQL(1) 1 SQL: Queries, Constraints, Triggers Chapter 5 – Part 1.
SQL Part I: Standard Queries. COMP-421: Database Systems - SQL Queries I 2 Example Instances sid sname rating age 22 debby debby lilly.
SQL Examples CS3754 Class Note 11 CS3754 Class Note 11, John Shieh,
SQL: Queries, Programming, Triggers. Example Instances We will use these instances of the Sailors and Reserves relations in our examples. If the key for.
ICS 321 Fall 2009 SQL: Queries, Constraints, Triggers Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 9/8/20091Lipyeow.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Relational Algebra.
Relational Algebra  Souhad M. Daraghma. Relational Query Languages Query languages: Allow manipulation and retrieval of data from a database. Relational.
1.1 CAS CS 460/660 Introduction to Database Systems Relational Algebra.
Relational Algebra.
ICS 321 Fall 2011 The Relational Model of Data (i) Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 8/29/20111Lipyeow.
1 Relational Algebra Chapter 4, Sections 4.1 – 4.2.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Database Management Systems Chapter 4 Relational Algebra.
CSCD34-Data Management Systems - A. Vaisman1 Relational Algebra.
CMPT 258 Database Systems SQL Queries (Chapter 5).
1 SQL: Structured Query Language (‘Sequel’) Chapter 5.
SQL: The Query Language Part 1 R &G - Chapter 5 The important thing is not to stop questioning. Albert Einstein.
1 SQL: The Query Language. 2 Example Instances R1 S1 S2 v We will use these instances of the Sailors and Reserves relations in our examples. v If the.
1 CS122A: Introduction to Data Management Lecture #7 Relational Algebra I Instructor: Chen Li.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Basic SQL Queries.
DataBase - Check 01 DataBase 2 nd year Computer Science & Engineer 1.
SQL – Part 2.
Basic SQL Queries Go over example queries, like 10 > ALL.
COP Introduction to Database Structures
© פרופ' יהושע שגיב, האוניברסיטה העברית
01/31/11 SQL Examples Edited by John Shieh CS3754 Classnote #10.
SQL The Query Language R & G - Chapter 5
Relational Algebra Chapter 4, Part A
Basic SQL Lecture 6 Fall
Relational Algebra 461 The slides for this text are organized into chapters. This lecture covers relational algebra, from Chapter 4. The relational calculus.
Database Applications (15-415) SQL-Part II Lecture 9, February 04, 2018 Mohammad Hammoud.
Relational Algebra.
CS 405G: Introduction to Database Systems
SQL: Queries, Constraints, Triggers
Relational Algebra Chapter 4 - part I.
SQL: The Query Language Part 1
CS4222 Principles of Database System
SQL: Structured Query Language
CENG 351 File Structures and Data Managemnet
Relational Algebra Chapter 4 - part I.
SQL: The Query Language (Part III)
Relational Calculus Chapter 4 – Part II.
SQL: Queries, Programming, Triggers
Presentation transcript:

? Data Science 100 Databases Part 2 (The SQL) Slides by: Joseph E. Gonzalez, Joseph Hellerstein, Josh Hug jegonzal@berkeley.edu jhellerstein@berkeley.edu hug@cs.Berkeley.edu

Previously …

Database Management Systems A database management systems (DBMS) is a software system that stores, manages, and facilitates access to one or more databases. Relational database management systems Logically organize data in relations (tables) Structured Query Language (SQL) to define, manipulate and compute on data.

Physical Data Independence Database Management System Relations (Tables) Optimized Data Structures Name Prod Price Sue iPod $200.00 Joey Bike $333.99 Alice Car $999.00 B+Trees sid sname rating age 28 yuppy 9 35.0 31 lubber 8 55.5 44 guppy 5 58 rusty 10 Abstraction Page 1 Page 2 Page 3 Page 4 Page 5 Page 6 Optimized Storage bid bname color 101 Interlake blue 102 red 104 Marine 103 Clipper green Page Header

Conceptual SQL Evaluation SELECT [DISTINCT] target-list FROM relation-list WHERE qualification GROUP BY grouping-list HAVING group-qualification Try Queries Here http://sqlfiddle.com/#!17/67109/12 Form groups & aggregate GROUP BY [DISTINCT] Eliminate duplicates Project away columns (just keep those used in SELECT, GBY, HAVING) Apply selections (eliminate rows) WHERE SELECT One or more tables to use (outer product …) Eliminate groups FROM HAVING

Data in the Organization A little bit of buzzword bingo!

Multidimensional Data Model Sales Fact Table Locations Dimension Tables pid timeid locid sales 11 1 25 2 8 3 15 12 30 20 50 13 10 35 22 26 45 40 5 locid city state country 1 Omaha Nebraska USA 2 Seoul Korea 5 Richmond Virginia Products Fact Table minimizes redundant info. Reduces data errors Dimensions easy to manage and summarize Rename: Galaxy1  Phablet pid pname category price 11 Corn Food 25 12 Galaxy 1 Phones 18 13 Peanuts 2 An older version of this slide mentioned that this is a Normalized Representation, but students have absolutely no idea what database normalization is so I took this out (Josh). Time timeid Date Day 1 3/30/16 Wed. 2 3/31/16 Thu. 3 4/1/16 Fri.

Connections between table Products Time pid pname category price timeid Date Day Sales Fact Table pid timeid locid sales How do we do analysis? Joins!!!!! Locations locid city state country Trivia: This approach is sometimes called a “star schema” because drawing resembles a star with our fact table at the center, and other tables at the points.

Joins! Bringing tables together for decades.

Symbol for join (Rel. Alg.) Joins We’ve seen the idea of joining tables many times. Joins can be thought of as a 2-step process: Form the cartesian product (see next slide). Select the rows where common attributes are equal. Symbol for join (Rel. Alg.) R: S: R ⋈ S sid bid day 22 101 10/10/96 58 103 11/12/96 sid sname rating age 22 dustin 7 45.0 31 lubber 8 55.5 58 rusty 10 35.0 sid bid day sname rating age 22 101 10/10/96 dustin 7 45.0 58 103 11/12/96 rusty 10 35.0 http://sqlfiddle.com/#!17/53815/1140/0

The Cartesian Product (×) R × S: Each row of R paired with each row of S R: R × S sid bid day 22 101 10/10/96 58 103 11/12/96 sid bid day sname rating age 22 101 10/10/96 dustin 7 45.0 31 lubber 8 55.5 58 rusty 10 35.0 103 11/12/96 × = S: sid sname rating age 22 dustin 7 45.0 31 lubber 8 55.5 58 rusty 10 35.0 Default of Union is to apply set semantics. Recall that Union All is required to keep duplicates. How many rows in the result? |R| * |S| More general idea in set theory: 2 1 a b c 1,a 1,b 1,c 2,a 2,b 2,c Sometimes called the cross product.

The Cartesian Product in SQL R × S SELECT * FROM Reserves AS R, Sailors AS S; sid bid day sname rating age 22 101 10/10/96 dustin 7 45.0 31 lubber 8 55.5 58 rusty 10 35.0 103 11/12/96 R: S: sid bid day 22 101 10/10/96 58 103 11/12/96 sid sname rating age 22 dustin 7 45.0 31 lubber 8 55.5 58 rusty 10 35.0

Joining Sailors and Reservations Joins can be thought of as a 2-step process: Form the cartesian product. Select the rows where common attributes are equal. sid bid day sname rating age 22 101 10/10/96 dustin 7 45.0 31 lubber 8 55.5 58 rusty 10 35.0 103 11/12/96 R: S: sid bid day 22 101 10/10/96 58 103 11/12/96 sid sname rating age 22 dustin 7 45.0 31 lubber 8 55.5 58 rusty 10 35.0 R ⋈ S sid bid day sname rating age 22 101 10/10/96 dustin 7 45.0 58 103 11/12/96 rusty 10 35.0

Relational Algebra R ⋈ S R ⋈ S Joins can be thought of as a 2- step process: Form the cartesian product. Select the rows where common attributes are equal. If this seems cool, consider learning about relational algebra. Provides a mathematical foundation for databases. Video 1: [Link] Video 2: [Link] sid bid day sname rating age 22 101 10/10/96 dustin 7 45.0 31 lubber 8 55.5 58 rusty 10 35.0 103 11/12/96 R ⋈ S sid bid day sname rating age 22 101 10/10/96 dustin 7 45.0 58 103 11/12/96 rusty 10 35.0

Joining Sailors and Reservations in SQL SELECT * FROM Reserves AS R, Sailors AS S WHERE R.sid = S.sid; sid bid day sname rating age 22 101 10/10/96 dustin 7 45.0 31 lubber 8 55.5 58 rusty 10 35.0 103 11/12/96 R: S: sid bid day 22 101 10/10/96 58 103 11/12/96 sid sname rating age 22 dustin 7 45.0 31 lubber 8 55.5 58 rusty 10 35.0 R ⋈ S sid bid day sname rating age 22 101 10/10/96 dustin 7 45.0 58 103 11/12/96 rusty 10 35.0

Joining Sailors and Reservations in SQL SELECT * FROM Reserves AS R, Sailors AS S WHERE R.sid = S.sid; Conceptually useful to think of joining as: Form the cartesian product. Select the rows where common attributes are equal. However, SQL engine does not implement joins this way. Query optimizer does something much more efficient (see CS186).

Physical Data Independence Database Management System Relations (Tables) Optimized Data Structures Name Prod Price Sue iPod $200.00 Joey Bike $333.99 Alice Car $999.00 B+Trees sid sname rating age 28 yuppy 9 35.0 31 lubber 8 55.5 44 guppy 5 58 rusty 10 Abstraction Page 1 Page 2 Page 3 Page 4 Page 5 Page 6 Optimized Storage bid bname color 101 Interlake blue 102 red 104 Marine 103 Clipper green Page Header

Range Variables and AS The right hand side of AS keyword are “range variables”. Also possible to define range variables without AS keyword. SELECT R.sid, sname, age FROM Reserves AS R, Sailors AS S WHERE R.sid = S.sid; SELECT R.sid, sname, age FROM Reserves R, Sailors S WHERE R.sid = S.sid; sid sname age 22 dustin 45.0 58 rusty 35.0

Range Variables and AS The right hand side of AS keyword are “range variables”. Also possible to define range variables without AS keyword. Range variables often technically optional, but do use them! SELECT R.sid, sname, age FROM Reserves AS R, Sailors AS S WHERE R.sid = S.sid; SELECT R.sid, sname, age FROM Reserves R, Sailors S WHERE R.sid = S.sid; sid sname age 22 dustin 45.0 58 rusty 35.0 SELECT Reserves.sid, sname, age FROM Reserves, Sailors WHERE Reserves.sid = Sailors.sid; Technically works, but don’t do this…

Nonequi Joins The joins we’ve seen so far are equi joins: rows from two tables are matched based only on equality conditions. If rows are matched some other way, then we have a nonequi join. SELECT S1.sname as junior_name, S1.age as junior_age, S2.sname as senior_name, S2.age as senior_age FROM Sailors AS S1, Sailors AS S2 WHERE S1.age < S2.age ???? Not correct? Sailors S1 S2 junior_name junior_age senior _name senior_age rusty 35.0 dustin 45.0 lubber 55.5 sid sname rating age 22 dustin 7 45.0 31 lubber 8 55.5 58 rusty 10 35.0 http://sqlfiddle.com/#!17/53815/4

Inner/Natural Joins Sailors Boats sid sname rating age 1 Fred 7 22 2 Jim 39 3 Nancy 8 27 bid bname color 101 Nina red 102 Pinta blue 103 Santa Maria SELECT s.sid, s.sname, r.bid FROM Sailors s, Reserves r WHERE s.sid = r.sid AND s.age > 20; FROM Sailors s INNER JOIN Reserves r ON s.sid = r.sid WHERE s.age > 20; FROM Sailors s NATURAL JOIN Reserves r WHERE s.age > 20; Reserves sid bid day 1 102 9/12 2 9/13 Most common format! “NATURAL” means equijoin for each pair of attributes with the same name all 3 are equivalent! http://sqlfiddle.com/#!17/4215a/10

Join Variants INNER is default SELECT (column_list) FROM table_name [INNER | LEFT OUTER |RIGHT OUTER | FULL OUTER] JOIN table_name ON qualification_list WHERE … INNER is default Inner join is akin to what we have seen so far. FULL OUTER is the equivalent of join=‘outer’ in pandas. “OUTER” is optional, but recommended for clarity.

Left Join SELECT s.sid, s.sname, r.bid Returns all matched rows, and preserves all unmatched rows from the table on the left of the join clause (use nulls in fields of non-matching tuples) SELECT s.sid, s.sname, r.bid FROM Sailors2 s LEFT OUTER JOIN Reserves2 r ON s.sid = r.sid; Returns all sailors & bid for boat in any of their reservations Note: If there is a sailor without a boat reservation then the sailor is matched with the NULL bid.

SELECT s.sid, s.sname, r.bid FROM Sailors2 s LEFT OUTER JOIN Reserves2 r ON s.sid = r.sid; rating age 22 Dustin 7 45 31 Lubber 8 55.5 95 Bob 3 63.5 sid bid day 22 101 1996-10-10 95 103 1996-11-12 sid sname bid 22 Dustin 101 95 Bob 103 31 Lubber (null) http://sqlfiddle.com/#!17/54a88/2

Left Join vs. Left Outer Join SELECT s.sid, s.sname, r.bid FROM Sailors2 s LEFT OUTER JOIN Reserves2 r ON s.sid = r.sid; is the exact same as: FROM Sailors2 s LEFT JOIN Reserves2 r Also called a Left Outer Join (note that there is no such thing as a Left Inner Join)

Right Join SELECT r.sid, b.bid, b.bname Right join returns all matched rows, and preserves all unmatched rows from the table on the right of the join clause SELECT r.sid, b.bid, b.bname FROM Reserves2 r RIGHT OUTER JOIN Boats2 b ON r.bid = b.bid; Returns all boats & information on which ones are reserved. No match for b.bid? r.sid IS NULL! As before, there is no such thing as a right inner join!

SELECT r.sid, b.bid, b.bname FROM Reserves2 r RIGHT OUTER JOIN Boats2 b ON r.bid = b.bid; color 101 Interlake blue 102 red 103 Clipper green 104 Marine sid bid day 22 101 1996-10-10 95 103 1996-11-12 sid bid bname 22 101 Interlake 95 103 Clipper (null) 104 Marine 102 Result: http://sqlfiddle.com/#!17/a7b2f/1

Full Outer Join Full Outer Join returns all (matched or unmatched) rows from the tables on both sides of the join clause SELECT r.sid, b.bid, b.bname FROM Reserves2 r FULL OUTER JOIN Boats2 b ON r.bid = b.bid If no boat for a sailor?  b.bid IS NULL AND b.bname IS NULL! If no sailor for a boat?  r.sid IS NULL!

SELECT r.sid, b.bid, b.bname FROM Reserves3 r FULL JOIN Boats2 b ON r.bid = b.bid color 101 Interlake blue 102 red 103 Clipper green 104 Marine sid bid day 22 101 1996-10-10 95 103 1996-11-12 38 42 2010-08-21 sid bid bname 22 101 Interlake  95 103 Clipper  38 (null) 104 Marine  102 Result: http://sqlfiddle.com/#!17/e1f3a/3/0

Brief Detour: Null Values Field values are sometimes unknown SQL provides a special value NULL for such situations. Every data type can be NULL The presence of null complicates many issues. E.g.: Selection predicates (WHERE) Aggregation But NULLs are common after outer joins

NULL in the WHERE clause Consider a tuple where rating IS NULL. If we run the following query Jack Sparrow will not be included in the output. INSERT INTO sailors VALUES (11, 'Jack Sparrow', NULL, 35); SELECT * FROM sailors WHERE rating > 8; http://sqlfiddle.com/#!17/36ca9/2

NULL in comparators What entries are in the output of all these queries? SELECT rating = NULL FROM sailors; SELECT rating < NULL FROM sailors; SELECT rating >= NULL FROM sailors; SELECT * FROM sailors WHERE rating = NULL; Rule: (x op NULL) evaluates to … NULL! All of these queries evaluate to null! Even this one! http://sqlfiddle.com/#!17/f35aa/6

Explicit NULL Checks To check if a value is NULL you must use explicit NULL checks SELECT * FROM sailors WHERE rating IS NULL; SELECT * FROM sailors WHERE rating IS NOT NULL; http://sqlfiddle.com/#!17/f35aa/4

NULL in Boolean Logic T F N T F N Three-valued logic: And Or Not T F N SELECT * FROM sailors WHERE rating > 8 AND TRUE; SELECT * FROM sailors WHERE rating > 8 OR TRUE; SELECT * FROM sailors WHERE NOT (rating > 8); Not T F N http://sqlfiddle.com/#!17/f35aa/2

NULL and Aggregation http://bit.ly/ds100-sp18-null 4 27 ?? SELECT count(rating) FROM sailors; 4 SELECT sum(rating) FROM sailors; 27 SELECT avg(rating) FROM sailors; ?? SELECT count(*) FROM sailors; sid sname rating age 1 Popeye  10 22 2 OliveOyl  11 39 3 Garfield  27 4 Bob  5 19 Jack Sparrow  (null) 35 http://bit.ly/ds100-sp18-null http://sqlfiddle.com/#!17/f35aa/7

NULL and Aggregation 4 27 (10+11+1+5) / 4 = 6.75 5 SELECT count(rating) FROM sailors; 4 SELECT sum(rating) FROM sailors; 27 SELECT avg(rating) FROM sailors; (10+11+1+5) / 4 = 6.75 SELECT count(*) FROM sailors; 5 sid sname rating age 1 Popeye  10 22 2 OliveOyl  11 39 3 Garfield  27 4 Bob  5 19 Jack Sparrow  (null) 35 http://sqlfiddle.com/#!17/f35aa/7

NULLs: Summary NULL op NULL is NULL WHERE NULL: do not send to output Boolean connectives: 3-valued logic Aggregates ignore NULL-valued inputs

More Advanced SQL This material is all bonus material. Won’t appear on HWs or exams.