Presentation is loading. Please wait.

Presentation is loading. Please wait.

INFM 603: Information Technology and Organizational Context Jimmy Lin The iSchool University of Maryland Wednesday, March 5, 2014 Session 6: Relational.

Similar presentations


Presentation on theme: "INFM 603: Information Technology and Organizational Context Jimmy Lin The iSchool University of Maryland Wednesday, March 5, 2014 Session 6: Relational."— Presentation transcript:

1 INFM 603: Information Technology and Organizational Context Jimmy Lin The iSchool University of Maryland Wednesday, March 5, 2014 Session 6: Relational Databases

2 Databases Yesterday…

3 Databases Today…

4 What’s structured information? It’s what you put in a database What’s a database? It’s what you store structured information in

5 So what’s a database? An integrated collection of data organized according to some model…

6 So what’s a relational database? An integrated collection of data organized according to a relational model

7 Database Management System (DBMS) Software system designed to store, manage, and facilitate access to databases

8 Databases (try to) model reality… Entities: things in the world Example: airlines, tickets, passengers Relationships: how different things are related Example: the tickets each passenger bought “Business Logic”: rules about the world Example: fare rules

9 Source: Microsoft Office Clip Art Relational Databases

10 Components of a Relational Database Field: an “atomic” unit of data Record: a collection of related fields Sometimes called a “tuple” Table: a collection of related records Each record is a row in the table Each field is a column in the table Database: a collection of tables

11 A Simple Example NameDOBSSN John Doe04/15/1970153-78-9082 Jane Smith08/31/1985768-91-2376 Mary Adams11/05/1972891-13-3057 Field Field Name Record Primary Key Table

12 Why “Relational”? View of the world in terms of entities and relations: Tables represent “relations” Each row (record, tuple) is “about” an entity Fields can be interpreted as “attributes” or “properties” of the entity Data is manipulated by “relational algebra”: Defines things you can do with tuples Expressed in SQL

13 The Registrar Example What do we need to know? Something about the students (e.g., first name, last name, email, department) Something about the courses (e.g., course ID, description, enrolled students, grades) Which students are in which courses How do we capture these things?

14 A First Try Put everything in a big table… Why is this a bad idea?

15 Goals of “Normalization” Save space Save each fact only once More rapid updates Every fact only needs to be updated once More rapid search Finding something once is good enough Avoid inconsistency Changing data once changes it everywhere

16 Another Try... Student Table Department TableCourse Table Enrollment Table

17 Keys “Primary Key” uniquely identifies a record e.g., student ID in the student table “Foreign Key” is primary key in the other table It need not be unique in this table

18 Approaches to Normalization For simple problems: Start with the entities you’re trying to model Group together fields that “belong together” Add keys where necessary to connect entities in different tables For more complicated problems: Entity-relationship modeling

19 The Data Model Student Table Department TableCourse Table Enrollment Table

20 Registrar ER Diagram Enrollment Student Course Grade … Student Student ID First name Last name Department E-mail … Course Course ID Course Name … Department Department ID Department Name … has associated with

21 A Real Example

22 Types of Relationships One-to-OneOne-to-Many Many-to-Many

23 Database Integrity Registrar database must be internally consistent All enrolled students must have an entry in the student table All courses must have a name … What happens: When a student withdraws from the university? When a course is taken off the books?

24 Integrity Constraints Conditions that must be true of the database at any time Specified when the database is designed Checked when the database is modified RDBMS ensures that integrity constraints are always kept So that database contents remain faithful to the real world Helps avoid data entry errors Where do integrity constraints come from?

25 SQL (Don’t Panic!)

26 Select select Student ID, Department

27 Where where Department ID = “HIST”

28 Simple SQL Statements Choosing columns: SELECT Based on their labels (field names) * is a shorthand for saying “all fields” Choosing rows: WHERE Based on their contents These can be specified together department ID = “HIST”

29 Simple SQL Template select [columns in the table] from [table name] where [selection criteria]

30 SQL Tips and Tricks Referring to fields (in SELECT statements) Use TableName.FieldName Can drop TableName if FieldName is unambiguous Selection criteria Use = instead of == Note, different dialects of SQL!

31 Join “Joined” Table Student Table Department Table

32 SQL Template for Joins select [columns in the table] from [table name] join [another tablename] on [join criterion] … where [selection criteria] Join criterion: usually, based on primary/foreign key relationships e.g., Table1.PrimaryKey = Table2.ForeignKey

33 Aggregations SQL aggregation functions Examples: count, min, max, sum, avg Use in select statements Tip: when trying to write SQL query with aggregation, do it first without Group by [field] Often used in conjunction with aggregation Conceptually, breaks table apart based on the [field] select count(*)… select min(price)… select sum(length)…

34 How do you want your results served? Order by [field name] Does exactly what you think it does! Either “asc” or “desc” Limit n Returns only n records Useful to retrieving the top n or bottom n

35 So how’s a database more than a spreadsheet?

36 Database in the “Real World” Typical database applications: Banking (e.g., saving/checking accounts) Trading (e.g., stocks) Traveling (e.g., airline reservations) Social media (e.g., Facebook) … Characteristics: Lots of data Lots of concurrent operations Must be fast “Mission critical” (well… sometimes)

37 Operational Requirements Must hold a lot of data Must be reliable Must be fast Must support concurrent operations

38 Must hold a lot of data Solution: Use lots of machines (Each machine holds a small slice) So which machine has your copy?

39 Must be reliable Solution: Use lots of machines (Store multiple copies) How do you keep the copies in sync? But which copy is the right one?

40 Must be fast Solution: Use lots of machines (Share the load) How do you spread the load?

41 Must support concurrent operations Solution: this is hard! (But fortunately doesn’t matter for many applications)

42 Database Transactions Transaction = sequence of database actions grouped together e.g., transfer $500 from checking to savings ACID properties: Atomicity: all-or-nothing Consistency: each transaction yield a consistent state Isolation: concurrent transactions must appear to run in isolation Durability: results of transactions must survive even if systems crash

43 Making Transactions Idea: keep a log (history) of all actions carried out while executing transactions Before a change is made to the database, the corresponding log entry is forced to a safe location Recovering from a crash: Effects of partially executed transactions are undone Effects of committed transactions are redone Trickier than it sounds! the log

44 Source: Technology Review (July/August, 2008) Database layer: 800 eight-core Linux servers running MySQL (40 TB user data) Caching servers: 15 million requests per second, 95% handled by memcache (15 TB of RAM)

45 Now you know… Wait, but these are websites?


Download ppt "INFM 603: Information Technology and Organizational Context Jimmy Lin The iSchool University of Maryland Wednesday, March 5, 2014 Session 6: Relational."

Similar presentations


Ads by Google