Big Data Yuan Xue CS 292 Special topics on.


1 CS 292 Special Topics on Big Data
Yuan Xue (yuan.xue@vanderbilt.edu)

2 Part I Relational Database Yuan Xue (yuan.xue@vanderbilt.edu)

3 Discussion
- Did you ever encounter a data management problem?
  - Experimental data from a homework?
  - Personal data?
  - Other data?
- How did you manage your data?

4 Database
- Database: an integrated collection of related data
  - Usually stored on secondary storage (as files)
  - Can also be held in memory (in-memory database)
- Examples of databases
  - Vanderbilt student database; course registration and grading database (backend of YES)
  - Amazon's products and customer database; eBay's products and transaction database
  - Facebook's user and message database
  - And more…

5 Database Management System (DBMS)
- DBMS: a collection of software programs
  - Designed to assist in creating and managing databases
  - Supports defining, constructing, manipulating, and sharing databases
- Examples of DBMSs
  - Relational DBMSs
    - Commercial: Oracle, IBM (DB2, Informix), Microsoft (SQL Server, Access)
    - Open source: MySQL, PostgreSQL
  - NoSQL and NewSQL: BigTable/HBase, Cassandra, Redis, Riak, MongoDB, Dynamo, DynamoDB, Spanner
  - Other: object-oriented databases, etc.

6 Database System Environment
[Figure: two panels, "Without DBMS" and "With DBMS"; labels: Users, Application, DBMS, Database, Data]

7 Benefits of a DBMS
- Development convenience
  - Reduces application development time
- Data independence
  - Application programs do not depend on data representation and storage details
- Data integrity and consistency
  - Enforces consistency constraints on data
- Data sharing and concurrency control
  - Data is better utilized (discovered and reused); redundancy of data is minimized
  - Avoids undesirable race conditions that arise with simultaneous access/updates to data
- Centralized control
  - The DBA tunes the database to balance users' needs
- Security
  - Prevents unauthorized access
- Crash recovery
  - Ensures the integrity of data in the presence of failures

8 Example Application: MiniTwitter
- What data do we need?
- What capabilities on the data do we need?

9 Example Application: MiniTwitter
- What data do we need? (Information required to record system state)
  - User profile info: ID, password, email, display name, picture, people I follow, people who follow me
  - Tweets: author, time, content (topic), replies (author, time, content), favorites (author, time)
- What capabilities on the data do we need? (Operations that update and retrieve system state)
  - Register a new user
  - Follow/unfollow a user (approve a follow request)
  - Post/delete a tweet
  - Read/update in real time all the tweets from the people I follow
  - Show the number of tweets I posted, the number of people following me, and the number of people I follow
  - Trend information

10 Three-Level Architecture
- Key question: how to describe data?
- Conceptual data model: entities, attributes, relationships (entity-relationship model)
- Logical data model: coming next
- Physical data model: storage, data structures

11 Database Model
- Logical data model: the logical structure of data organization
- Types of data model
  - Relational model: table
  - Semistructured data model (XML/JSON): tree
  - Various data models in NoSQL systems
    - Key-value pair
    - Column family
    - Graph
  - Object-oriented model
    - Object, class, inheritance
    - A layer over the relational model

12 Relational Data Model
- Schema: structural description of relations in the database
- Instance: actual contents (the data in the database) at a given point in time
User
  ID       Name   Email              Password
  Alice00  Alice  alice00@gmail.com  Aadf1234
  Bob2013  Bob    bob13@gmail.com    qwer6789

13 Relational Data Model
- Schema: structural description of relations in the database
- Instance: actual contents at a given point in time
- Database: a set of named relations (or tables)
- Each relation has a set of named attributes (or columns)
- Each tuple (or row) has a value for each attribute
- Each attribute has a type (or domain)
User
  ID       Name   Email              Password
  Alice00  Alice  alice00@gmail.com  Aadf1234
  Bob2013  Bob    bob13@gmail.com    qwer6789
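
The schema/instance distinction above can be sketched in a few lines. This is only an illustration: it assumes Python's built-in sqlite3 module as a stand-in relational DBMS (the slides do not prescribe one) and declares every attribute as TEXT, which is a guess at the intended domains. Note that SQLite treats declared types loosely, so it does not enforce domains as strictly as the model describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Schema: the relation's name, its attributes, and their declared types.
# (TEXT for every column is an assumption, not something the slides state.)
conn.execute("CREATE TABLE User (ID TEXT, Name TEXT, Email TEXT, Password TEXT)")

# Instance: the tuples stored at this point in time (rows from the slide).
conn.executemany(
    "INSERT INTO User VALUES (?, ?, ?, ?)",
    [
        ("Alice00", "Alice", "alice00@gmail.com", "Aadf1234"),
        ("Bob2013", "Bob", "bob13@gmail.com", "qwer6789"),
    ],
)

# Read the schema back from the catalog: (attribute name, declared type).
schema = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(User)")]

# Read the current instance: one tuple per row, one value per attribute.
instance = conn.execute("SELECT * FROM User ORDER BY ID").fetchall()

print(schema)
print(instance)
```

Inserting or deleting rows changes the instance; the schema stays the same until the table definition itself is altered.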

14 Discussion
- How to design relations (tables) for MiniTwitter?
- What are the aspects we need to consider?

15 Design: Version 0.1
User
  ID        Name   Email              Password
  Alice00   Alice  alice00@gmail.com  Aadf1234
  Bob2013   Bob    bob13@gmail.com    qwer6789
  Cathy123  Cathy  cath@vandy         Tyuoa~!@
Tweet
  ID    Timestamp           Author   Content
  0001  2013.12.20.11.20.2  Alice00  Hello
  0002  2013.12.20.11.23.6  Bob2013  Nice weather
  0003  2014.1.6.1.25.2     Alice00  @Bob Not sure..
Follow
  Followee  Follower  Timestamp
  Alice00   Bob2013   2011.1.1.3.6.6
  Bob2013   Cathy123  2012.10.2.6.7.7
  Alice00   Cathy123  2012.11.1.2.3.3
  Cathy123  Alice00   2012.11.1.2.6.6
  Bob2013   Alice00   2012.11.1.2.6.7
Note: the Password values are pretending to be MD5 hash codes ;)
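
As a rough check that the Version 0.1 tables can support the MiniTwitter capabilities listed earlier, here is a sketch of two of them: reading the tweets of the people I follow, and showing per-user counts. It assumes Python's sqlite3 module as the DBMS, and the helper names `timeline` and `stats` are hypothetical. The User table is left out because neither query needs it. Tweets are ordered by ID rather than by Timestamp, since the slide's timestamp strings would not sort chronologically as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tweet  (ID TEXT, Timestamp TEXT, Author TEXT, Content TEXT);
CREATE TABLE Follow (Followee TEXT, Follower TEXT, Timestamp TEXT);
""")
conn.executemany("INSERT INTO Tweet VALUES (?, ?, ?, ?)", [
    ("0001", "2013.12.20.11.20.2", "Alice00", "Hello"),
    ("0002", "2013.12.20.11.23.6", "Bob2013", "Nice weather"),
    ("0003", "2014.1.6.1.25.2", "Alice00", "@Bob Not sure.."),
])
conn.executemany("INSERT INTO Follow VALUES (?, ?, ?)", [
    ("Alice00", "Bob2013", "2011.1.1.3.6.6"),
    ("Bob2013", "Cathy123", "2012.10.2.6.7.7"),
    ("Alice00", "Cathy123", "2012.11.1.2.3.3"),
    ("Cathy123", "Alice00", "2012.11.1.2.6.6"),
    ("Bob2013", "Alice00", "2012.11.1.2.6.7"),
])


def timeline(conn, user_id):
    """Tweets written by everyone that user_id follows (hypothetical helper)."""
    return conn.execute(
        """
        SELECT t.ID, t.Author, t.Content
        FROM Tweet AS t
        JOIN Follow AS f ON f.Followee = t.Author
        WHERE f.Follower = ?
        ORDER BY t.ID
        """,
        (user_id,),
    ).fetchall()


def stats(conn, user_id):
    """Return (#tweets posted, #followers, #people followed) for one user."""
    def count(sql):
        return conn.execute(sql, (user_id,)).fetchone()[0]
    return (
        count("SELECT COUNT(*) FROM Tweet WHERE Author = ?"),
        count("SELECT COUNT(*) FROM Follow WHERE Followee = ?"),
        count("SELECT COUNT(*) FROM Follow WHERE Follower = ?"),
    )


print(timeline(conn, "Bob2013"))
print(stats(conn, "Bob2013"))
```

Bob2013 follows only Alice00 in the sample data, so his timeline contains Alice's two tweets and none of his own.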

16 Relational Data Model
- Key: an attribute whose value is unique in each tuple, or a set of attributes whose combined values are unique
User
  ID        Name   Email              Password
  Alice00   Alice  alice00@gmail.com  Aadf1234
  Bob2013   Bob    bob13@gmail.com    qwer6789
  Cathy123  Cathy  cath@vandy         Tyuoa~!@
Tweet
  ID    timestamp           Author   Content
  0001  2013.12.20.11.20.2  Alice00  Hello
  0002  2013.12.20.11.23.6  Bob2013  Nice weather
  0003  2014.1.6.1.25.2     Alice00  @Bob Not sure..
Follow
  ID        Follower  timestamp
  Alice00   Bob2013   2011.1.1.3.6.6
  Bob2013   Cathy123  2012.10.2.6.7.7
  Alice00   Cathy123  2012.11.1.2.3.3
  Cathy123  Alice00   2012.11.1.2.6.6
  Bob2013   Alice00   2012.11.1.2.6.7
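
A small sketch of both kinds of key, assuming Python's sqlite3 module as the DBMS. Which attributes serve as keys is an assumption here: User.ID as a single-attribute key, and the pair (Followee, Follower) as a composite key for Follow, using the column name Followee from the Version 0.1 design. The helper `try_insert` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User (
    ID TEXT PRIMARY KEY,
    Name TEXT, Email TEXT, Password TEXT
);
CREATE TABLE Follow (
    Followee TEXT, Follower TEXT, Timestamp TEXT,
    PRIMARY KEY (Followee, Follower)
);
""")
conn.execute(
    "INSERT INTO User VALUES ('Alice00', 'Alice', 'alice00@gmail.com', 'Aadf1234')"
)
conn.execute("INSERT INTO Follow VALUES ('Alice00', 'Bob2013', '2011.1.1.3.6.6')")
# Same followee, different follower: the combined value is still unique.
conn.execute("INSERT INTO Follow VALUES ('Alice00', 'Cathy123', '2012.11.1.2.3.3')")


def try_insert(sql, params):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# A second user with an existing ID violates the single-attribute key.
dup_user = try_insert(
    "INSERT INTO User VALUES (?, ?, ?, ?)",
    ("Alice00", "Another Alice", "other@example.com", "zzzz0000"),
)
# Repeating an existing (Followee, Follower) pair violates the composite key.
dup_follow = try_insert(
    "INSERT INTO Follow VALUES (?, ?, ?)",
    ("Alice00", "Bob2013", "2014.1.1.0.0.0"),
)

print(dup_user, dup_follow)
```

Neither column of Follow is unique on its own in the sample data, which is why the key has to be the combination.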

18 Relational Data Model
- Foreign key: an attribute or set of attributes in one table that points to the primary key of another
User
  ID        Name   Email              Password
  Alice00   Alice  alice00@gmail.com  Aadf1234
  Bob2013   Bob    bob13@gmail.com    qwer6789
  Cathy123  Cathy  cath@vandy         Tyuoa~!@
Tweet
  ID    timestamp           Author   Content
  0001  2013.12.20.11.20.2  Alice00  Hello
  0002  2013.12.20.11.23.6  Bob2013  Nice weather
  0003  2014.1.6.1.25.2     Alice00  @Bob Not sure..
Follow
  ID        Follower  timestamp
  Alice00   Bob2013   2011.1.1.3.6.6
  Bob2013   Cathy123  2012.10.2.6.7.7
  Alice00   Cathy123  2012.11.1.2.3.3
  Cathy123  Alice00   2012.11.1.2.6.6
  Bob2013   Alice00   2012.11.1.2.6.7
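
One way to see a foreign key at work, sketched with Python's sqlite3 module. Treating Tweet.Author as a foreign key into User.ID is an assumption read off the sample tables, and the tweet with author 'Nobody' is made up for the demonstration. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, so the sketch enables it first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
CREATE TABLE User (ID TEXT PRIMARY KEY, Name TEXT);
CREATE TABLE Tweet (
    ID TEXT PRIMARY KEY,
    Timestamp TEXT,
    Author TEXT REFERENCES User(ID),
    Content TEXT
);
""")
conn.execute("INSERT INTO User VALUES ('Alice00', 'Alice')")

# Accepted: the author exists in User.
conn.execute(
    "INSERT INTO Tweet VALUES ('0001', '2013.12.20.11.20.2', 'Alice00', 'Hello')"
)

# Rejected: 'Nobody' (a made-up ID) does not match any User.ID.
try:
    conn.execute(
        "INSERT INTO Tweet VALUES ('0099', '2014.1.1.0.0.0', 'Nobody', 'orphan')"
    )
    orphan_accepted = True
except sqlite3.IntegrityError:
    orphan_accepted = False

tweet_count = conn.execute("SELECT COUNT(*) FROM Tweet").fetchone()[0]
print(orphan_accepted, tweet_count)
```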

20 More on Relational Data Model
- NULL: special value for "unknown" or "undefined"
- Relational model constraint summary
  - Domain constraints
  - Key constraints
  - Integrity constraints
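
The behaviour of NULL and of simple constraints can be tried out as follows. This is a sketch using Python's sqlite3 module; the users 'Dave' and 'Eve', the NOT NULL rule on Name, and the crude CHECK on Email are all invented for illustration and are not part of the MiniTwitter design. The CHECK stands in for a domain constraint; real systems offer richer ways to define domains.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ID: key constraint. Name: a value must be present.
# Email: crude domain-style check that the text contains '@'.
conn.execute("""
CREATE TABLE User (
    ID    TEXT PRIMARY KEY,
    Name  TEXT NOT NULL,
    Email TEXT CHECK (Email LIKE '%@%')
)
""")
conn.execute("INSERT INTO User VALUES ('Alice00', 'Alice', 'alice00@gmail.com')")
# Email unknown: NULL passes the CHECK because the test evaluates to NULL, not false.
conn.execute("INSERT INTO User VALUES ('Dave', 'Dave', NULL)")

# Comparing with NULL using '=' never matches, because the result is unknown.
eq_matches = conn.execute(
    "SELECT COUNT(*) FROM User WHERE Email = NULL"
).fetchone()[0]

# IS NULL is the way to find missing values.
missing = conn.execute("SELECT ID FROM User WHERE Email IS NULL").fetchall()


def rejected(params):
    """Return True if inserting the row violates one of the constraints."""
    try:
        conn.execute("INSERT INTO User VALUES (?, ?, ?)", params)
        return False
    except sqlite3.IntegrityError:
        return True


bad_email = rejected(("Eve", "Eve", "not-an-email"))   # fails the CHECK
no_name = rejected(("Eve", None, "eve@example.com"))   # fails NOT NULL

print(eq_matches, missing, bad_email, no_name)
```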

21 Relational Data Model and Database
- Relational model
  - Simple representation
  - Efficient implementation
  - Driven by relational algebra and relational calculus
  - Up-front definition of schemas and types that the data will thereafter adhere to
  - High-level, simple yet expressive query language
- Relational databases
  - Proven success for both open source and proprietary systems
  - Provide full ACID guarantees
  - SQL as the widely used, standard way of interacting with the database
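
To make one of the ACID properties concrete, the sketch below shows atomicity only: a transaction either takes effect completely or not at all. It assumes Python's sqlite3 module, where using the connection as a context manager commits on success and rolls back if the block raises. The composite key on Follow is an assumption carried over for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Follow (Followee TEXT, Follower TEXT, "
    "PRIMARY KEY (Followee, Follower))"
)
conn.execute("INSERT INTO Follow VALUES ('Alice00', 'Bob2013')")
conn.commit()

try:
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("INSERT INTO Follow VALUES ('Bob2013', 'Cathy123')")
        # Duplicate key: this statement fails and aborts the whole transaction.
        conn.execute("INSERT INTO Follow VALUES ('Alice00', 'Bob2013')")
except sqlite3.IntegrityError:
    pass

# The first insert of the failed transaction was undone along with the second.
rows = conn.execute(
    "SELECT Followee, Follower FROM Follow ORDER BY Followee"
).fetchall()
print(rows)
```

Consistency, isolation, and durability are not exercised here; an in-memory database in particular says nothing about durability.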

22 Creating and Using a Relational Database
- Steps in creating and using a (relational) database
  1. Design schema (using DDL, the data definition language)
  2. Initialization: "bulk load" initial data
  3. Operation: execute queries and modifications
[Figure labels: Data; Meta-data (database definition)]
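
The three steps above can be walked through end to end. This is a minimal sketch, assuming Python's sqlite3 module and the Tweet table from the Version 0.1 design; the choice of ID as primary key is an assumption. The last query reads SQLite's own catalog table, `sqlite_master`, as an example of the meta-data a DBMS keeps alongside the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 1. Design schema: DDL statements define the relation.
conn.execute(
    "CREATE TABLE Tweet (ID TEXT PRIMARY KEY, Timestamp TEXT, "
    "Author TEXT, Content TEXT)"
)

# 2. Initialization: bulk load the initial data in one transaction.
initial = [
    ("0001", "2013.12.20.11.20.2", "Alice00", "Hello"),
    ("0002", "2013.12.20.11.23.6", "Bob2013", "Nice weather"),
    ("0003", "2014.1.6.1.25.2", "Alice00", "@Bob Not sure.."),
]
with conn:
    conn.executemany("INSERT INTO Tweet VALUES (?, ?, ?, ?)", initial)

# 3. Operation: a modification (delete a tweet) followed by a query.
with conn:
    conn.execute("DELETE FROM Tweet WHERE ID = ?", ("0002",))
remaining = conn.execute("SELECT ID, Content FROM Tweet ORDER BY ID").fetchall()

# Meta-data: the database definition is itself stored and queryable.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

print(remaining)
print(tables)
```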

