CS 292 Special Topics on Big Data
Yuan Xue
Part I: Relational Database (SQL)
Creating and Using a Relational Database
  Steps in creating and using a (relational) database:
  1. Design the schema (using DDL, the data definition language)
  2. Initialization: "bulk load" the initial data
  3. Operation: execute queries and modifications (using DML, the data manipulation language)
  The database holds both the data and the meta-data (the database definition).
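The three steps above can be sketched end to end. This is a minimal illustration, assuming Python's standard sqlite3 module with an in-memory database as a stand-in for a full RDBMS; the sample rows and their timestamps are made up for the example.

```python
import sqlite3

# Hypothetical sample rows; the timestamps are invented for illustration.
rows = [
    ("Alice00", "Bob2013", "2013-01-01 10:00:00"),
    ("Bob2013", "Cathy123", "2013-01-02 11:00:00"),
]

conn = sqlite3.connect(":memory:")

# 1. Design the schema (DDL).
conn.execute(
    "CREATE TABLE Follow ("
    " Followee VARCHAR(20) NOT NULL,"
    " Follower VARCHAR(20) NOT NULL,"
    " Timestamp TIMESTAMP NOT NULL,"
    " PRIMARY KEY (Followee, Follower))"
)

# 2. Initialization: bulk load the initial data in a single call.
conn.executemany("INSERT INTO Follow VALUES (?, ?, ?)", rows)
conn.commit()

# 3. Operation: run a query (DML).
follow_count = conn.execute("SELECT COUNT(*) FROM Follow").fetchone()[0]

# The meta-data (database definition) is itself queryable; in SQLite it
# lives in the sqlite_master catalog table.
table_names = [
    r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
print(follow_count, table_names)
```

Other systems expose their meta-data through different catalogs, so the sqlite_master lookup is specific to this sketch.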
SQL Introduction
  A programming language for data management in a relational database management system (RDBMS)
  Includes both a Data Definition Language (DDL) and a Data Manipulation Language (DML)
    DDL: create and drop tables
    DML: query (select), insert, update, and delete data in tables
  Standardized and supported by all major commercial database systems
    One of the major reasons for the commercial success of RDBMSs
  Used interactively via a GUI or command line, or embedded in programs
Data Definition in SQL
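CREATE in SQL
    CREATE SCHEMA MiniTwitter;
    CREATE TABLE MiniTwitter.User
    ( ID    VARCHAR(20) NOT NULL,
      Name  VARCHAR(20) NOT NULL,
      …
      PRIMARY KEY (ID),
      FOREIGN KEY (ID) REFERENCES Follow(Followee),
      FOREIGN KEY (ID) REFERENCES Follow(Follower) );
  Note: in standard SQL a foreign key must reference a primary key or unique column, and Followee and Follower are not unique on their own. Such constraints are normally declared the other way around, on Follow, with Followee and Follower each referencing User(ID).
  Example tables: User (ID, Name, Password) and Follow (Followee, Follower, Timestamp)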
CREATE in SQL
  Data types in SQL
    Numeric: INT, FLOAT, DEC
    Character, or string: CHAR, VARCHAR
    Bit-string: BIT, BLOB (binary large object)
    Boolean
    Date, time: DATE, TIME, TIMESTAMP
    CREATE TABLE MiniTwitter.Tweet
    ( ID         VARCHAR(20) NOT NULL,
      Timestamp  TIMESTAMP   NOT NULL,
      …
      PRIMARY KEY (ID) );
  Example table: Tweet (ID, Timestamp, Author, Content), e.g. Alice00: "Hello"; Bob2013: "Nice weather"
CREATE in SQL
    CREATE TABLE MiniTwitter.Follow
    ( Followee   VARCHAR(20) NOT NULL,
      Follower   VARCHAR(20) NOT NULL,
      Timestamp  TIMESTAMP   NOT NULL,
      PRIMARY KEY (Followee, Follower) );
  Example table: Follow (Followee, Follower, Timestamp)
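To see what the key constraints buy you, here is a small sketch using Python's standard sqlite3 module. It rests on several assumptions: SQLite has no CREATE SCHEMA, so the tables are unqualified; the foreign keys are declared on Follow and point at User(ID); and the user names and timestamps are invented sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE User (
    ID   VARCHAR(20) NOT NULL,
    Name VARCHAR(20) NOT NULL,
    PRIMARY KEY (ID)
);
CREATE TABLE Follow (
    Followee  VARCHAR(20) NOT NULL,
    Follower  VARCHAR(20) NOT NULL,
    Timestamp TIMESTAMP   NOT NULL,
    PRIMARY KEY (Followee, Follower),
    FOREIGN KEY (Followee) REFERENCES User(ID),
    FOREIGN KEY (Follower) REFERENCES User(ID)
);
""")

# Hypothetical sample data.
conn.execute("INSERT INTO User VALUES ('Alice00', 'Alice')")
conn.execute("INSERT INTO User VALUES ('Bob2013', 'Bob')")
conn.execute("INSERT INTO Follow VALUES ('Alice00', 'Bob2013', '2013-01-01 10:00:00')")


def rejected(sql):
    """Return True if the statement is refused because it violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Same (Followee, Follower) pair again: violates the composite primary key.
duplicate_rejected = rejected(
    "INSERT INTO Follow VALUES ('Alice00', 'Bob2013', '2013-02-02 09:00:00')"
)
# Follower that is not in User: violates the foreign key.
unknown_user_rejected = rejected(
    "INSERT INTO Follow VALUES ('Alice00', 'Nobody', '2013-02-02 09:00:00')"
)
print(duplicate_rejected, unknown_user_rejected)
```

The composite primary key means a given follower can follow a given followee at most once, which is the property the Follow table relies on.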
Data Manipulation in SQL
SELECT in SQL
  SELECT-FROM-WHERE structure of basic SQL queries:
    SELECT A1, A2, …, An     (attributes to return)
    FROM   R1, R2, …, Rm     (relations, i.e. tables)
    WHERE  Condition;        (conditional expression)
Example
  Retrieve the timestamp and content of all tweets whose author is "Alice00"
    SELECT Timestamp, Content
    FROM   Tweet
    WHERE  Author = 'Alice00';
  Author = 'Alice00' is the select condition.
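The query for Alice00's tweets can be tried out as follows. This is a sketch using Python's standard sqlite3 module; the Author and Content column types, the tweet IDs, the timestamps, and the author of the third tweet are all assumptions, since the deck elides them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Tweet ("
    " ID VARCHAR(20) NOT NULL,"
    " Timestamp TIMESTAMP NOT NULL,"
    " Author VARCHAR(20),"      # assumed type
    " Content VARCHAR(140),"    # assumed type
    " PRIMARY KEY (ID))"
)
# Hypothetical rows loosely modelled on the deck's Tweet table.
conn.executemany(
    "INSERT INTO Tweet VALUES (?, ?, ?, ?)",
    [
        ("t1", "2013-01-01 10:00:00", "Alice00", "Hello"),
        ("t2", "2013-01-01 11:00:00", "Bob2013", "Nice weather"),
        ("t3", "2013-01-01 12:00:00", "Cathy123", "Not sure.."),
    ],
)

query = "SELECT Timestamp, Content FROM Tweet WHERE Author = ?"

# Projection (Timestamp, Content) plus a select condition on Author.
result = conn.execute(query, ("Alice00",)).fetchall()

# String comparison with = is case-sensitive in SQLite by default,
# so the lowercase spelling matches nothing.
lowercase_result = conn.execute(query, ("alice00",)).fetchall()
print(result, lowercase_result)
```

The ? placeholder is how the value is passed when SQL is embedded in a program; it avoids building the query by string concatenation.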
Example
  Retrieve the content of all tweets whose author is followed by "Alice00"
    SELECT Content
    FROM   Tweet, Follow
    WHERE  Follower = 'Alice00' AND Author = Followee;
  Follower = 'Alice00' is the select condition; Author = Followee is the join condition.
  This is a select-project-join query.
Example
  Retrieve the timestamp and content of all tweets whose author is followed by "Alice00"
    SELECT Tweet.Timestamp, Content
    FROM   Tweet, Follow
    WHERE  Follower = 'Alice00' AND Author = Followee;
  Follower = 'Alice00' is the select condition; Author = Followee is the join condition.
  Qualify the attribute name to prevent ambiguity: both Tweet and Follow have a Timestamp attribute.
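A runnable sketch of the select-project-join query, using Python's standard sqlite3 module. The join is on Author = Followee because the tweets wanted are those written by the accounts Alice00 follows, i.e. the followees in rows where Alice00 is the follower. Column types for Author and Content, tweet IDs, and all timestamps are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tweet (
    ID VARCHAR(20) NOT NULL, Timestamp TIMESTAMP NOT NULL,
    Author VARCHAR(20), Content VARCHAR(140), PRIMARY KEY (ID)
);
CREATE TABLE Follow (
    Followee VARCHAR(20) NOT NULL, Follower VARCHAR(20) NOT NULL,
    Timestamp TIMESTAMP NOT NULL, PRIMARY KEY (Followee, Follower)
);
""")
# Hypothetical sample data.
conn.executemany("INSERT INTO Tweet VALUES (?, ?, ?, ?)", [
    ("t1", "2013-01-01 10:00:00", "Alice00", "Hello"),
    ("t2", "2013-01-01 11:00:00", "Bob2013", "Nice weather"),
    ("t3", "2013-01-01 12:00:00", "Cathy123", "Not sure.."),
])
conn.executemany("INSERT INTO Follow VALUES (?, ?, ?)", [
    ("Alice00", "Bob2013", "2012-12-01 08:00:00"),
    ("Bob2013", "Cathy123", "2012-12-02 08:00:00"),
    ("Alice00", "Cathy123", "2012-12-03 08:00:00"),
    ("Cathy123", "Alice00", "2012-12-04 08:00:00"),
    ("Bob2013", "Alice00", "2012-12-05 08:00:00"),
])

# Select condition: Follower = 'Alice00'. Join condition: Author = Followee.
joined = conn.execute(
    "SELECT Tweet.Timestamp, Content FROM Tweet, Follow "
    "WHERE Follower = 'Alice00' AND Author = Followee "
    "ORDER BY Tweet.Timestamp"
).fetchall()

# Leaving Timestamp unqualified is an error, since both tables have that column.
try:
    conn.execute(
        "SELECT Timestamp, Content FROM Tweet, Follow "
        "WHERE Follower = 'Alice00' AND Author = Followee"
    )
    ambiguous = False
except sqlite3.OperationalError:
    ambiguous = True
print(joined, ambiguous)
```

The ORDER BY is only there to make the output order deterministic for the example.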
Aggregate Functions in SQL
  Aggregate function: summarizes information from multiple tuples
  Basic aggregate operations in SQL: COUNT, SUM, MAX, MIN, AVG
  Retrieve the number of people that "Alice00" is following:
    SELECT COUNT(*) FROM Follow WHERE Follower = 'Alice00';
  Retrieve the number of people who are following "Alice00":
    SELECT COUNT(*) FROM Follow WHERE Followee = 'Alice00';
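The two counts are easy to mix up, so here is a small sketch with Python's standard sqlite3 module. The three Follow rows and their timestamps are hypothetical, chosen so that the two counts differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Follow ("
    " Followee VARCHAR(20) NOT NULL,"
    " Follower VARCHAR(20) NOT NULL,"
    " Timestamp TIMESTAMP NOT NULL,"
    " PRIMARY KEY (Followee, Follower))"
)
# Hypothetical rows: Bob2013 and Cathy123 follow Alice00; Alice00 follows Bob2013.
conn.executemany("INSERT INTO Follow VALUES (?, ?, ?)", [
    ("Alice00", "Bob2013", "2013-01-01 10:00:00"),
    ("Alice00", "Cathy123", "2013-01-02 10:00:00"),
    ("Bob2013", "Alice00", "2013-01-03 10:00:00"),
])

# Rows where Alice00 is the follower: accounts she is following.
following = conn.execute(
    "SELECT COUNT(*) FROM Follow WHERE Follower = 'Alice00'"
).fetchone()[0]

# Rows where Alice00 is the followee: accounts that follow her.
followers = conn.execute(
    "SELECT COUNT(*) FROM Follow WHERE Followee = 'Alice00'"
).fetchone()[0]
print(following, followers)
```

An aggregate without GROUP BY always returns exactly one row, which is why fetchone() is enough here.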
Nested SQL query
  Show the names of users with more than 10 followers
    SELECT Name
    FROM   User
    WHERE  (SELECT COUNT(*)
            FROM   Follow
            WHERE  Follow.Followee = User.ID) > 10;
  The inner query is evaluated once per User row; it refers to the ID of the outer row.
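A sketch of the nested (correlated) query with Python's standard sqlite3 module. To keep the data small, the threshold of 10 is replaced by a parameter; the user names and timestamps are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User (
    ID VARCHAR(20) NOT NULL, Name VARCHAR(20) NOT NULL, PRIMARY KEY (ID)
);
CREATE TABLE Follow (
    Followee VARCHAR(20) NOT NULL, Follower VARCHAR(20) NOT NULL,
    Timestamp TIMESTAMP NOT NULL, PRIMARY KEY (Followee, Follower)
);
""")
# Hypothetical sample data.
conn.executemany("INSERT INTO User VALUES (?, ?)", [
    ("Alice00", "Alice"), ("Bob2013", "Bob"), ("Cathy123", "Cathy"),
])
conn.executemany("INSERT INTO Follow VALUES (?, ?, ?)", [
    ("Alice00", "Bob2013", "2013-01-01 08:00:00"),
    ("Bob2013", "Cathy123", "2013-01-02 08:00:00"),
    ("Alice00", "Cathy123", "2013-01-03 08:00:00"),
    ("Cathy123", "Alice00", "2013-01-04 08:00:00"),
    ("Bob2013", "Alice00", "2013-01-05 08:00:00"),
])

# The subquery counts followers for the User row currently being tested.
query = (
    "SELECT Name FROM User "
    "WHERE (SELECT COUNT(*) FROM Follow WHERE Follow.Followee = User.ID) > ? "
    "ORDER BY Name"
)

more_than_one = conn.execute(query, (1,)).fetchall()   # Alice and Bob have 2 each
more_than_ten = conn.execute(query, (10,)).fetchall()  # nobody in this tiny data set
print(more_than_one, more_than_ten)
```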
GROUP BY in SQL
  Show the names of all users together with the number of tweets from each of them
    SELECT User.Name, COUNT(Tweet.ID)
    FROM   User, Tweet
    WHERE  User.ID = Tweet.Author
    GROUP BY User.ID, User.Name;
  Every non-aggregated attribute in the SELECT list should appear in the GROUP BY clause.
  Users with no tweets do not appear in the result, because the join condition drops them.
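The grouping query can be run as below with Python's standard sqlite3 module. The sketch also shows a LEFT JOIN variant, which is an addition not in the deck, to illustrate how users with zero tweets could be kept. All names, IDs, and timestamps are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User (
    ID VARCHAR(20) NOT NULL, Name VARCHAR(20) NOT NULL, PRIMARY KEY (ID)
);
CREATE TABLE Tweet (
    ID VARCHAR(20) NOT NULL, Timestamp TIMESTAMP NOT NULL,
    Author VARCHAR(20), Content VARCHAR(140), PRIMARY KEY (ID)
);
""")
# Hypothetical sample data: Cathy has not tweeted.
conn.executemany("INSERT INTO User VALUES (?, ?)", [
    ("Alice00", "Alice"), ("Bob2013", "Bob"), ("Cathy123", "Cathy"),
])
conn.executemany("INSERT INTO Tweet VALUES (?, ?, ?, ?)", [
    ("t1", "2013-01-01 10:00:00", "Alice00", "Hello"),
    ("t2", "2013-01-01 11:00:00", "Bob2013", "Nice weather"),
    ("t3", "2013-01-01 12:00:00", "Alice00", "Not sure.."),
])

# One output row per group of joined rows that share the same user.
inner_counts = conn.execute(
    "SELECT User.Name, COUNT(Tweet.ID) FROM User, Tweet "
    "WHERE User.ID = Tweet.Author "
    "GROUP BY User.ID, User.Name ORDER BY User.Name"
).fetchall()

# Outer-join variant: users without tweets are kept, and COUNT(Tweet.ID)
# is 0 for them because it ignores NULLs.
outer_counts = conn.execute(
    "SELECT User.Name, COUNT(Tweet.ID) FROM User "
    "LEFT JOIN Tweet ON User.ID = Tweet.Author "
    "GROUP BY User.ID, User.Name ORDER BY User.Name"
).fetchall()
print(inner_counts, outer_counts)
```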
ORDER BY in SQL
  Show the names of users who follow "Alice00", ordered by the time the "following" relationship was created
    SELECT User.Name
    FROM   User, Follow
    WHERE  User.ID = Follow.Follower AND Follow.Followee = 'Alice00'
    ORDER BY Follow.Timestamp;
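A sketch of the ordering query with Python's standard sqlite3 module. The names and timestamps are hypothetical, and the timestamps are stored as ISO-8601 strings so that text order matches chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User (
    ID VARCHAR(20) NOT NULL, Name VARCHAR(20) NOT NULL, PRIMARY KEY (ID)
);
CREATE TABLE Follow (
    Followee VARCHAR(20) NOT NULL, Follower VARCHAR(20) NOT NULL,
    Timestamp TIMESTAMP NOT NULL, PRIMARY KEY (Followee, Follower)
);
""")
# Hypothetical sample data: Cathy started following Alice00 before Bob did.
conn.executemany("INSERT INTO User VALUES (?, ?)", [
    ("Alice00", "Alice"), ("Bob2013", "Bob"), ("Cathy123", "Cathy"),
])
conn.executemany("INSERT INTO Follow VALUES (?, ?, ?)", [
    ("Alice00", "Bob2013", "2013-02-01 09:00:00"),
    ("Alice00", "Cathy123", "2013-01-01 09:00:00"),
    ("Bob2013", "Alice00", "2013-01-15 09:00:00"),
])

base = (
    "SELECT User.Name FROM User, Follow "
    "WHERE User.ID = Follow.Follower AND Follow.Followee = 'Alice00' "
)

# Ascending is the default: earliest follower first.
oldest_first = conn.execute(base + "ORDER BY Follow.Timestamp").fetchall()
# DESC reverses the order: most recent follower first.
newest_first = conn.execute(base + "ORDER BY Follow.Timestamp DESC").fetchall()
print(oldest_first, newest_first)
```

Note that the ordering attribute (Follow.Timestamp) does not have to appear in the SELECT list.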
INSERT in SQL
    INSERT INTO Follow VALUES ('Cathy123', 'Bob2013', …);
  Effect: Follow gains one new row with Followee Cathy123 and Follower Bob2013; the existing rows are unchanged.
DELETE in SQL
    DELETE FROM Follow
    WHERE  Followee = 'Alice00' AND Follower = 'Bob2013';
  Effect: the row with Followee Alice00 and Follower Bob2013 is removed from Follow; the other rows are unchanged.
UPDATE in SQL
    UPDATE Follow
    SET    Timestamp = '…'
    WHERE  Followee = 'Alice00' AND Follower = 'Bob2013';
  Effect: only the Timestamp of the row with Followee Alice00 and Follower Bob2013 changes; no rows are added or removed.
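The three modification statements can be tried in sequence. This is a sketch with Python's standard sqlite3 module; all timestamp values are invented, since the deck does not show them, and the update is run before the delete so that both can target the same row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Follow ("
    " Followee VARCHAR(20) NOT NULL,"
    " Follower VARCHAR(20) NOT NULL,"
    " Timestamp TIMESTAMP NOT NULL,"
    " PRIMARY KEY (Followee, Follower))"
)
# Hypothetical starting rows.
conn.executemany("INSERT INTO Follow VALUES (?, ?, ?)", [
    ("Alice00", "Bob2013", "2013-01-01 08:00:00"),
    ("Bob2013", "Cathy123", "2013-01-02 08:00:00"),
    ("Alice00", "Cathy123", "2013-01-03 08:00:00"),
    ("Cathy123", "Alice00", "2013-01-04 08:00:00"),
    ("Bob2013", "Alice00", "2013-01-05 08:00:00"),
])


def count_rows():
    return conn.execute("SELECT COUNT(*) FROM Follow").fetchone()[0]


# INSERT: adds one row.
conn.execute("INSERT INTO Follow VALUES ('Cathy123', 'Bob2013', '2013-01-06 08:00:00')")
after_insert = count_rows()

# UPDATE: changes a column value in the rows matching the WHERE clause.
updated = conn.execute(
    "UPDATE Follow SET Timestamp = '2013-03-01 12:00:00' "
    "WHERE Followee = 'Alice00' AND Follower = 'Bob2013'"
).rowcount
new_timestamp = conn.execute(
    "SELECT Timestamp FROM Follow WHERE Followee = 'Alice00' AND Follower = 'Bob2013'"
).fetchone()[0]

# DELETE: removes the rows matching the WHERE clause.
deleted = conn.execute(
    "DELETE FROM Follow WHERE Followee = 'Alice00' AND Follower = 'Bob2013'"
).rowcount
after_delete = count_rows()
conn.commit()
print(after_insert, updated, new_timestamp, deleted, after_delete)
```

Omitting the WHERE clause from UPDATE or DELETE applies the statement to every row in the table, so the condition deserves a second look before running either.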
More on SQL
  Drop table
  Outer join
  Indexes, constraints, views, triggers, transactions, authorization
  Substring pattern matching and arithmetic operators
  Check out: