CS 292 Special Topics on Big Data, Yuan Xue


1 CS 292 Special Topics on Big Data
Yuan Xue (yuan.xue@vanderbilt.edu)

2 Part I Relational Database (SQL) Yuan Xue (yuan.xue@vanderbilt.edu)

3 Creating and Using a Relational Database
Steps in creating and using a (relational) database:
1. Design the schema (using DDL, the data definition language)
2. Initialization: “bulk load” the initial data
3. Operation: execute queries and modifications (using DML, the data manipulation language)
The database holds both the data and the meta-data (the database definition).
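The three steps can be sketched end to end with Python's built-in sqlite3 module. This is only an illustration under assumptions that are not in the slides: an in-memory SQLite database stands in for the RDBMS, the Follow table from the later slides is used as the example, and timestamps are stored as plain text.

```python
import sqlite3

# Step 1: design the schema with DDL.
# Assumption: SQLite has no CREATE SCHEMA, so the MiniTwitter prefix is dropped.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Follow (
        Followee  VARCHAR(20) NOT NULL,
        Follower  VARCHAR(20) NOT NULL,
        Timestamp TEXT        NOT NULL,
        PRIMARY KEY (Followee, Follower)
    )
    """
)

# Step 2: initialization, bulk loading the initial rows in one call.
initial_rows = [
    ("Alice00", "Bob2013", "2011.1.1.3.6.6"),
    ("Bob2013", "Cathy123", "2012.10.2.6.7.7"),
    ("Alice00", "Cathy123", "2012.11.1.2.3.3"),
    ("Cathy123", "Alice00", "2012.11.1.2.6.6"),
    ("Bob2013", "Alice00", "2012.11.1.2.6.7"),
]
with conn:  # commits on success, rolls back on error
    conn.executemany("INSERT INTO Follow VALUES (?, ?, ?)", initial_rows)

# Step 3: operation, first a query and then a modification (both DML).
followers = conn.execute(
    "SELECT Follower FROM Follow WHERE Followee = ? ORDER BY Follower",
    ("Alice00",),
).fetchall()
with conn:
    conn.execute(
        "DELETE FROM Follow WHERE Followee = ? AND Follower = ?",
        ("Alice00", "Bob2013"),
    )
remaining = conn.execute("SELECT COUNT(*) FROM Follow").fetchone()[0]

# Meta-data: SQLite keeps the database definition in its sqlite_master catalog.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
)]
```

Other database systems expose their meta-data through different catalogs, so the last query is specific to SQLite.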

4 SQL Introduction
- Programming language for data management in a relational database management system (RDBMS)
- Covers both Data Definition Language (DDL) and Data Manipulation Language (DML)
  - DDL: create and drop tables
  - DML: query (select), insert, update, and delete data in tables
- Standardized and supported by all major commercial database systems
  - One of the major reasons for the commercial success of RDBMSs
- Interactive via GUI or command line, or embedded in programs

5 Data Definition in SQL

6 CREATE in SQL

    CREATE SCHEMA MiniTwitter;

    CREATE TABLE MiniTwitter.User (
        ID   VARCHAR(20) NOT NULL,
        Name VARCHAR(20) NOT NULL,
        …,
        PRIMARY KEY (ID),
        FOREIGN KEY (ID) REFERENCES Follow(Followee),
        FOREIGN KEY (ID) REFERENCES Follow(Follower)
    );

User
    ID        Name   Email              Password
    Alice00   Alice  alice00@gmail.com  Aadf1234
    Bob2013   Bob    bob13@gmail.com    qwer6789
    Cathy123  Cathy  cath@vandy         Tyuoa~!@

Follow
    Followee  Follower  Timestamp
    Alice00   Bob2013   2011.1.1.3.6.6
    Bob2013   Cathy123  2012.10.2.6.7.7
    Alice00   Cathy123  2012.11.1.2.3.3
    Cathy123  Alice00   2012.11.1.2.6.6
    Bob2013   Alice00   2012.11.1.2.6.7

7 CREATE in SQL
Data types in SQL:
- Numeric: INT, FLOAT, DEC
- Character, or string: CHAR, VARCHAR
- Bit-string: BIT, BLOB (binary large object)
- Boolean
- Date, time: DATE, TIME, TIMESTAMP

    CREATE TABLE MiniTwitter.Tweet (
        ID        VARCHAR(20) NOT NULL,
        Timestamp TIMESTAMP   NOT NULL,
        …,
        PRIMARY KEY (ID)
    );

Tweet
    ID    Timestamp           Author   Content
    0001  2013.12.20.11.20.2  Alice00  Hello
    0002  2013.12.20.11.23.6  Bob2013  Nice weather
    0003  2014.1.6.1.25.2     Alice00  @Bob Not sure..

8 CREATE in SQL

    CREATE TABLE MiniTwitter.Follow (
        Followee  VARCHAR(20) NOT NULL,
        Follower  VARCHAR(20) NOT NULL,
        Timestamp TIMESTAMP   NOT NULL,
        PRIMARY KEY (Followee, Follower)
    );

Follow
    Followee  Follower  Timestamp
    Alice00   Bob2013   2011.1.1.3.6.6
    Bob2013   Cathy123  2012.10.2.6.7.7
    Alice00   Cathy123  2012.11.1.2.3.3
    Cathy123  Alice00   2012.11.1.2.6.6
    Bob2013   Alice00   2012.11.1.2.6.7
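To see what the key constraints buy in practice, here is a small sketch using Python's sqlite3 module. It makes several assumptions that are not in the slides: the tables live in an in-memory SQLite database without the MiniTwitter prefix, only the ID and Name columns of User are created, timestamps are stored as text, and the foreign keys are declared on Follow pointing at User(ID), which is one common arrangement and not necessarily the one the slides intend. The helper name try_follow is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE User (
        ID   VARCHAR(20) NOT NULL,
        Name VARCHAR(20) NOT NULL,
        PRIMARY KEY (ID)
    );
    CREATE TABLE Follow (
        Followee  VARCHAR(20) NOT NULL,
        Follower  VARCHAR(20) NOT NULL,
        Timestamp TEXT        NOT NULL,
        PRIMARY KEY (Followee, Follower),
        FOREIGN KEY (Followee) REFERENCES User(ID),
        FOREIGN KEY (Follower) REFERENCES User(ID)
    );
    INSERT INTO User VALUES ('Alice00', 'Alice'), ('Bob2013', 'Bob');
    INSERT INTO Follow VALUES ('Alice00', 'Bob2013', '2011.1.1.3.6.6');
    """
)


def try_follow(followee, follower):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO Follow VALUES (?, ?, ?)",
                (followee, follower, "2013.12.1.2.3.3"),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok_existing = try_follow("Bob2013", "Alice00")    # both users exist
ok_unknown = try_follow("Alice00", "Nobody999")   # follower is not in User
ok_duplicate = try_follow("Alice00", "Bob2013")   # primary key already present
```

The second insert is rejected by the foreign key and the third by the composite primary key, so only the first one changes the table.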

9 Data Manipulation in SQL

10 SELECT in SQL
SELECT-FROM-WHERE structure of basic SQL queries:

    SELECT A1, A2, …, An     -- attributes to return
    FROM R1, R2, …, Rm       -- relations (tables)
    WHERE Condition;         -- conditional expression

11 Example
Retrieve the timestamp and content of all tweets whose author is “Alice00”.

User
    ID        Name   Email              Password
    Alice00   Alice  alice00@gmail.com  Aadf1234
    Bob2013   Bob    bob13@gmail.com    qwer6789
    Cathy123  Cathy  cath@vandy         Tyuoa~!@

Tweet
    ID    Timestamp           Author   Content
    0001  2013.12.20.11.20.2  Alice00  Hello
    0002  2013.12.20.11.23.6  Bob2013  Nice weather
    0003  2014.1.6.1.25.2     Alice00  @Bob Not sure..

Follow
    Followee  Follower  Timestamp
    Alice00   Bob2013   2011.1.1.3.6.6
    Bob2013   Cathy123  2012.10.2.6.7.7
    Alice00   Cathy123  2012.11.1.2.3.3
    Cathy123  Alice00   2012.11.1.2.6.6
    Bob2013   Alice00   2012.11.1.2.6.7

12 Example
Tables: User, Tweet, Follow (same data as slide 11).

    SELECT Timestamp, Content
    FROM Tweet
    WHERE Author = 'Alice00';    -- select condition

13 Example
Retrieve the content of all tweets whose author is followed by “Alice00”.
Tables: User, Tweet, Follow (same data as slide 11).

14 Example
Tables: User, Tweet, Follow (same data as slide 11).

    SELECT Content
    FROM Tweet, Follow
    WHERE Follower = 'Alice00'    -- select condition
      AND Author = Followee;      -- join condition

This is a select-project-join query. The join condition matches each tweet's author against the followee of a Follow row whose follower is Alice00.

15 Example
Retrieve the timestamp and content of all tweets whose author is followed by “Alice00”.
Tables: User, Tweet, Follow (same data as slide 11).
Both Tweet and Follow have a Timestamp attribute, so qualify the attribute name to prevent ambiguity.

16 Example
Tables: User, Tweet, Follow (same data as slide 11).

    SELECT Tweet.Timestamp, Content
    FROM Tweet, Follow
    WHERE Follower = 'Alice00'    -- select condition
      AND Author = Followee;      -- join condition
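The join query and the need for a qualified attribute name can be tried out with Python's sqlite3 module. This is a sketch under assumptions not taken from the slides: an in-memory SQLite database, TEXT columns for every attribute, and an ORDER BY added only to make the result order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Tweet (ID TEXT PRIMARY KEY, Timestamp TEXT,
                        Author TEXT, Content TEXT);
    CREATE TABLE Follow (Followee TEXT, Follower TEXT, Timestamp TEXT,
                         PRIMARY KEY (Followee, Follower));
    INSERT INTO Tweet VALUES
        ('0001', '2013.12.20.11.20.2', 'Alice00', 'Hello'),
        ('0002', '2013.12.20.11.23.6', 'Bob2013', 'Nice weather'),
        ('0003', '2014.1.6.1.25.2',    'Alice00', '@Bob Not sure..');
    INSERT INTO Follow VALUES
        ('Alice00',  'Bob2013',  '2011.1.1.3.6.6'),
        ('Bob2013',  'Cathy123', '2012.10.2.6.7.7'),
        ('Alice00',  'Cathy123', '2012.11.1.2.3.3'),
        ('Cathy123', 'Alice00',  '2012.11.1.2.6.6'),
        ('Bob2013',  'Alice00',  '2012.11.1.2.6.7');
    """
)

# Select-project-join: tweets written by the accounts that Alice00 follows.
qualified = conn.execute(
    """
    SELECT Tweet.Timestamp, Content
    FROM Tweet, Follow
    WHERE Follower = 'Alice00'    -- select condition
      AND Author = Followee       -- join condition
    ORDER BY Tweet.ID
    """
).fetchall()

# The same query with an unqualified Timestamp is rejected, because both
# Tweet and Follow have a column with that name.
try:
    conn.execute(
        "SELECT Timestamp, Content FROM Tweet, Follow "
        "WHERE Follower = 'Alice00' AND Author = Followee"
    )
    ambiguous_error = None
except sqlite3.OperationalError as exc:
    ambiguous_error = str(exc)
```

With the sample data, Alice00 follows Bob2013 and Cathy123, and only Bob2013 has posted, so the qualified query returns a single row.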

17 Aggregate Functions in SQL
- Aggregate function: summarizes information from multiple tuples
- Basic aggregate operations in SQL: COUNT, SUM, MAX, MIN, AVG
- Retrieve the number of people that “Alice00” is following
- Retrieve the number of people who are following “Alice00”

18 Aggregate Functions in SQL
- Aggregate function: summarizes information from multiple tuples
- Basic aggregate operations in SQL: COUNT, SUM, MAX, MIN, AVG
- Retrieve the number of people that “Alice00” is following:

    SELECT COUNT(*) FROM Follow WHERE Follower = 'Alice00';

- Retrieve the number of people who are following “Alice00”:

    SELECT COUNT(*) FROM Follow WHERE Followee = 'Alice00';
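The two COUNT queries differ only in which column is filtered, which is easy to mix up. The sketch below runs both against the sample Follow data with Python's sqlite3 module; the in-memory database, the TEXT column types, and the helper names count_following and count_followers are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Follow (Followee TEXT, Follower TEXT, Timestamp TEXT,
                         PRIMARY KEY (Followee, Follower));
    INSERT INTO Follow VALUES
        ('Alice00',  'Bob2013',  '2011.1.1.3.6.6'),
        ('Bob2013',  'Cathy123', '2012.10.2.6.7.7'),
        ('Alice00',  'Cathy123', '2012.11.1.2.3.3'),
        ('Cathy123', 'Alice00',  '2012.11.1.2.6.6'),
        ('Bob2013',  'Alice00',  '2012.11.1.2.6.7');
    """
)


def count_following(user_id):
    """Number of accounts that user_id follows (user_id is the Follower)."""
    return conn.execute(
        "SELECT COUNT(*) FROM Follow WHERE Follower = ?", (user_id,)
    ).fetchone()[0]


def count_followers(user_id):
    """Number of accounts that follow user_id (user_id is the Followee)."""
    return conn.execute(
        "SELECT COUNT(*) FROM Follow WHERE Followee = ?", (user_id,)
    ).fetchone()[0]


alice_following = count_following("Alice00")
alice_followers = count_followers("Alice00")
cathy_following = count_following("Cathy123")
cathy_followers = count_followers("Cathy123")
```

Alice00 happens to have the same count in both directions in the sample data, so Cathy123 is included to show that the two queries really do measure different things.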

19 Nested SQL query
Show the names of users with more than 10 followers.
Tables: User, Tweet, Follow (same data as slide 11).

20 Nested SQL query
Tables: User, Tweet, Follow (same data as slide 11).

    SELECT Name
    FROM User
    WHERE (SELECT COUNT(*)
           FROM Follow
           WHERE Followee = User.ID) > 10;
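With only three users in the sample data, the threshold of 10 returns nothing, so the following sketch makes the threshold a parameter. It uses Python's sqlite3 module; the in-memory database, the reduced User table (ID and Name only), the ORDER BY, and the function name users_with_more_followers_than are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE User (ID TEXT PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Follow (Followee TEXT, Follower TEXT, Timestamp TEXT,
                         PRIMARY KEY (Followee, Follower));
    INSERT INTO User VALUES
        ('Alice00', 'Alice'), ('Bob2013', 'Bob'), ('Cathy123', 'Cathy');
    INSERT INTO Follow VALUES
        ('Alice00',  'Bob2013',  '2011.1.1.3.6.6'),
        ('Bob2013',  'Cathy123', '2012.10.2.6.7.7'),
        ('Alice00',  'Cathy123', '2012.11.1.2.3.3'),
        ('Cathy123', 'Alice00',  '2012.11.1.2.6.6'),
        ('Bob2013',  'Alice00',  '2012.11.1.2.6.7');
    """
)


def users_with_more_followers_than(threshold):
    """Names of users whose follower count exceeds threshold.

    The inner query is correlated: it is evaluated once per User row,
    with User.ID taken from the outer query.
    """
    rows = conn.execute(
        """
        SELECT Name
        FROM User
        WHERE (SELECT COUNT(*)
               FROM Follow
               WHERE Followee = User.ID) > ?
        ORDER BY Name
        """,
        (threshold,),
    ).fetchall()
    return [name for (name,) in rows]


over_ten = users_with_more_followers_than(10)
over_one = users_with_more_followers_than(1)
```

In the sample data Alice and Bob each have two followers and Cathy has one, which is why lowering the threshold to 1 returns only the first two.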

21 GROUP in SQL
Show the name of each user together with the number of tweets that user has posted.
Tables: User, Tweet, Follow (same data as slide 11).

22 GROUP in SQL
Tables: User, Tweet, Follow (same data as slide 11).

    SELECT User.Name, COUNT(Tweet.ID)
    FROM User, Tweet
    WHERE User.ID = Tweet.Author
    GROUP BY Tweet.Author, User.Name;
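One detail worth checking by hand is what happens to users who have never tweeted. The sketch below, using Python's sqlite3 module, runs the grouped join and then a LEFT JOIN variant for comparison. The LEFT JOIN version is an addition for illustration and is not in the slides, as are the in-memory database, the reduced table definitions, and the ORDER BY clauses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE User (ID TEXT PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Tweet (ID TEXT PRIMARY KEY, Timestamp TEXT,
                        Author TEXT, Content TEXT);
    INSERT INTO User VALUES
        ('Alice00', 'Alice'), ('Bob2013', 'Bob'), ('Cathy123', 'Cathy');
    INSERT INTO Tweet VALUES
        ('0001', '2013.12.20.11.20.2', 'Alice00', 'Hello'),
        ('0002', '2013.12.20.11.23.6', 'Bob2013', 'Nice weather'),
        ('0003', '2014.1.6.1.25.2',    'Alice00', '@Bob Not sure..');
    """
)

# Inner join: a user appears only if at least one tweet matches.
inner_counts = conn.execute(
    """
    SELECT User.Name, COUNT(Tweet.ID)
    FROM User, Tweet
    WHERE User.ID = Tweet.Author
    GROUP BY Tweet.Author, User.Name
    ORDER BY User.Name
    """
).fetchall()

# Left outer join: every user appears; COUNT(Tweet.ID) skips the NULL
# produced for users without tweets, so they get a count of 0.
outer_counts = conn.execute(
    """
    SELECT User.Name, COUNT(Tweet.ID)
    FROM User LEFT JOIN Tweet ON User.ID = Tweet.Author
    GROUP BY User.ID, User.Name
    ORDER BY User.Name
    """
).fetchall()
```

Cathy has no tweets in the sample data, so she is missing from the first result and shows up with a count of 0 in the second.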

23 ORDER BY in SQL
Show the names of users who follow “Alice00”, ordered by the time at which the “following” relationship was created.
Tables: User, Tweet, Follow (same data as slide 11).

24 ORDER BY in SQL
Tables: User, Tweet, Follow (same data as slide 11).

    SELECT User.Name
    FROM User, Follow
    WHERE User.ID = Follow.Follower
      AND Follow.Followee = 'Alice00'
    ORDER BY Follow.Timestamp;
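ORDER BY sorts by whatever value is stored, so the timestamp representation matters. The sketch below uses Python's sqlite3 module and stores timestamps as zero-padded ISO-style text, which sorts chronologically. It rests on assumptions not stated in the slides: that the dotted values read year.month.day.hour.minute.second, that SQLite with TEXT columns is an acceptable stand-in for a TIMESTAMP column, and the helper names to_iso and followers_by_time.

```python
import sqlite3


def to_iso(dotted):
    """Convert 'Y.M.D.h.m.s' (assumed field order) to 'YYYY-MM-DD HH:MM:SS'."""
    year, month, day, hour, minute, second = (int(p) for p in dotted.split("."))
    return f"{year:04d}-{month:02d}-{day:02d} {hour:02d}:{minute:02d}:{second:02d}"


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE User (ID TEXT PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Follow (Followee TEXT, Follower TEXT, Timestamp TEXT,
                         PRIMARY KEY (Followee, Follower));
    INSERT INTO User VALUES
        ('Alice00', 'Alice'), ('Bob2013', 'Bob'), ('Cathy123', 'Cathy');
    """
)
follow_rows = [
    ("Alice00", "Bob2013", "2011.1.1.3.6.6"),
    ("Bob2013", "Cathy123", "2012.10.2.6.7.7"),
    ("Alice00", "Cathy123", "2012.11.1.2.3.3"),
    ("Cathy123", "Alice00", "2012.11.1.2.6.6"),
    ("Bob2013", "Alice00", "2012.11.1.2.6.7"),
]
with conn:
    conn.executemany(
        "INSERT INTO Follow VALUES (?, ?, ?)",
        [(followee, follower, to_iso(ts)) for followee, follower, ts in follow_rows],
    )


def followers_by_time(followee, newest_first=False):
    """Names of the users following followee, ordered by follow time."""
    direction = "DESC" if newest_first else "ASC"
    rows = conn.execute(
        "SELECT User.Name FROM User, Follow "
        "WHERE User.ID = Follow.Follower AND Follow.Followee = ? "
        f"ORDER BY Follow.Timestamp {direction}",
        (followee,),
    ).fetchall()
    return [name for (name,) in rows]


oldest_first = followers_by_time("Alice00")
newest_first = followers_by_time("Alice00", newest_first=True)
```

Sorting the dotted strings directly would compare them character by character, so a value starting with 2012.2 would sort after one starting with 2012.10. Padding each field avoids that.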

25 INSERT in SQL

    INSERT INTO Follow
    VALUES ('Cathy123', 'Bob2013', '2013.12.1.2.3.3');

Follow (before)
    Followee  Follower  Timestamp
    Alice00   Bob2013   2011.1.1.3.6.6
    Bob2013   Cathy123  2012.10.2.6.7.7
    Alice00   Cathy123  2012.11.1.2.3.3
    Cathy123  Alice00   2012.11.1.2.6.6
    Bob2013   Alice00   2012.11.1.2.6.7

Follow (after)
    Followee  Follower  Timestamp
    Alice00   Bob2013   2011.1.1.3.6.6
    Bob2013   Cathy123  2012.10.2.6.7.7
    Alice00   Cathy123  2012.11.1.2.3.3
    Cathy123  Alice00   2012.11.1.2.6.6
    Bob2013   Alice00   2012.11.1.2.6.7
    Cathy123  Bob2013   2013.12.1.2.3.3

26 DELETE in SQL

    DELETE FROM Follow
    WHERE Followee = 'Alice00' AND Follower = 'Bob2013';

Follow (before)
    Followee  Follower  Timestamp
    Alice00   Bob2013   2011.1.1.3.6.6
    Bob2013   Cathy123  2012.10.2.6.7.7
    Alice00   Cathy123  2012.11.1.2.3.3
    Cathy123  Alice00   2012.11.1.2.6.6
    Bob2013   Alice00   2012.11.1.2.6.7

Follow (after)
    Followee  Follower  Timestamp
    Bob2013   Cathy123  2012.10.2.6.7.7
    Alice00   Cathy123  2012.11.1.2.3.3
    Cathy123  Alice00   2012.11.1.2.6.6
    Bob2013   Alice00   2012.11.1.2.6.7

27 UPDATE in SQL

    UPDATE Follow
    SET Timestamp = '2013.1.1.3.6.6'
    WHERE Followee = 'Alice00' AND Follower = 'Bob2013';

Follow (before)
    Followee  Follower  Timestamp
    Alice00   Bob2013   2011.1.1.3.6.6
    Bob2013   Cathy123  2012.10.2.6.7.7
    Alice00   Cathy123  2012.11.1.2.3.3
    Cathy123  Alice00   2012.11.1.2.6.6
    Bob2013   Alice00   2012.11.1.2.6.7

Follow (after)
    Followee  Follower  Timestamp
    Alice00   Bob2013   2013.1.1.3.6.6
    Bob2013   Cathy123  2012.10.2.6.7.7
    Alice00   Cathy123  2012.11.1.2.3.3
    Cathy123  Alice00   2012.11.1.2.6.6
    Bob2013   Alice00   2012.11.1.2.6.7
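The INSERT, DELETE, and UPDATE statements from the last three slides can be replayed against one table using Python's sqlite3 module. This is a sketch under assumptions not in the slides: an in-memory SQLite database, text timestamps, and the statements applied in the order insert, update, delete so that the update still finds its row. The helper name row_count is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Follow (Followee TEXT, Follower TEXT, Timestamp TEXT,
                         PRIMARY KEY (Followee, Follower));
    INSERT INTO Follow VALUES
        ('Alice00',  'Bob2013',  '2011.1.1.3.6.6'),
        ('Bob2013',  'Cathy123', '2012.10.2.6.7.7'),
        ('Alice00',  'Cathy123', '2012.11.1.2.3.3'),
        ('Cathy123', 'Alice00',  '2012.11.1.2.6.6'),
        ('Bob2013',  'Alice00',  '2012.11.1.2.6.7');
    """
)


def row_count():
    """Current number of rows in Follow."""
    return conn.execute("SELECT COUNT(*) FROM Follow").fetchone()[0]


# INSERT: Bob2013 starts following Cathy123.
with conn:
    conn.execute(
        "INSERT INTO Follow VALUES (?, ?, ?)",
        ("Cathy123", "Bob2013", "2013.12.1.2.3.3"),
    )
count_after_insert = row_count()

# UPDATE: change the timestamp of one existing relationship.
with conn:
    updated = conn.execute(
        "UPDATE Follow SET Timestamp = ? WHERE Followee = ? AND Follower = ?",
        ("2013.1.1.3.6.6", "Alice00", "Bob2013"),
    ).rowcount
new_timestamp = conn.execute(
    "SELECT Timestamp FROM Follow WHERE Followee = ? AND Follower = ?",
    ("Alice00", "Bob2013"),
).fetchone()[0]

# DELETE: Bob2013 stops following Alice00.
with conn:
    deleted = conn.execute(
        "DELETE FROM Follow WHERE Followee = ? AND Follower = ?",
        ("Alice00", "Bob2013"),
    ).rowcount
count_after_delete = row_count()
```

Both the UPDATE and the DELETE identify the row by the full primary key (Followee, Follower), so each affects exactly one row.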

28 More on SQL
- Drop table
- Outer join
- Indexes, constraints, views, triggers, transactions, authorization
- Substring pattern matching and arithmetic operators
Check out: http://cse.unl.edu/~sscott/ShowFiles/SQL/CheatSheet/SQLCheatSheet.html

