Presentation is loading. Please wait.

Presentation is loading. Please wait.

CAS CS 460 Relational Model Based on Slides created by Prof. Mitch Cherniack, Brandeis University and from Prof. Joe.

Similar presentations


Presentation on theme: "CAS CS 460 Relational Model Based on Slides created by Prof. Mitch Cherniack, Brandeis University and from Prof. Joe."— Presentation transcript:

1 CAS CS 460 Relational Model Based on Slides created by Prof. Mitch Cherniack, Brandeis University http://www.cs.brandeis.edu/~cs127b/ and from Prof. Joe Hellerstein, UC Berkeley http://db.cs.berkeley.edu/dbcoursehttp://db.cs.berkeley.edu/dbcourse and material from db-book.com

2 1.2 Review E/R Model: Entities, relationships, attributes Cardinalities: 1:1, 1:n, m:1, m:n Keys: superkeys, candidate keys, primary keys lot name Employees ssn Works_In since dname budget did Departments

3 1.3 Review n Weak Entity sets, identifying relationship n Discriminator, total participation, one-to-many Loan lno lamt Loan_Pmt Payment pno pdate pamt

4 1.4 Review Generalization-specialization Aggregation E1 a1 a2 S2 c1 b1 Isa S1 superclass subclasses E1 E2 R1 R2 E3

5 1.5 Review Data models: framework for organizing and interpreting data E/R Model OO, Object relational, XML Relational Model  Intro  E/R to relational  SQL preview

6 1.6 Relational Data Model Introduced by Ted Codd (early 70’) (Turing Award, ‘81) Before = “Network Data Model” (COBOL…) Relational data model contributes: 1.Separation of logical and physical data models (data independence) 2.Declarative query languages 3.Formal semantics 4.Query optimization (key to commercial success) First prototypes:  Ingres -> postgres, informix (Stonebraker, UC Berkeley)  System R -> Oracle, DB2 (IBM)

7 1.7 Relations account = Rows (tuples, records) Columns (attributes) Tables (relations) Why relations?

8 1.8 Relations Mathematical relations (from set theory): Given 2 sets R={ 1, 2, 3, 5}, S={3, 4}  R x S = {(1,3), (1, 4), (2, 3), (2,4), (3,3), (3,4), (5,3), (5,4)}  A relation between R and S is any subset of R x S e.g., {(1,3), (2,4), 5,3)} Database relations: Given attribute domains: bname = {Downtown, Brighton, ….} acct_no = { A-101, A-102, A-203, …} balance = { …, 400, 500, …} account subset of bname x acct_no x balance { (Downtown, A-101, 500), (Brighton, A-202, 450), (Brookline, A-312, 600)}

9 1.9 Storing Data in a Table Data about individual students One row per student How to represent course enrollment?

10 1.10 Storing More Data in Tables Students may enroll in more than one course Most efficient: keep enrollment in separate table Enrolled

11 1.11 Linking Data from Multiple Tables How to connect student data to enrollment? Need a Key Enrolled Students

12 1.12 Relational Data Model: Formal Definitions Relational database: a set of relations. Relation: made up of 2 parts:  Instance : a table, with rows and columns.  #rows = cardinality  Schema : specifies name of relation, plus name and type of each column.  E.g. Students(sid: string, name: string, login: string, age: integer, gpa: real)  #fields = degree / arity Can think of a relation as a set of rows or tuples.  i.e., all rows are distinct

13 1.13 In other words... Data Model – a way to organize information Schema – one particular organization,  i.e., a set of fields/columns, each of a given type Relation  a name  a Schema  a set of tuples/rows, each following organization specified in Schema

14 1.14 Example Instance of Students Relation Cardinality = 3, arity = 5, all rows distinct

15 1.15 ER model to Relation Schemas Primary keys allow entity sets and relationship sets to be expressed uniformly as relation schemas that represent the contents of the database. A database which conforms to an E-R diagram can be represented by a collection of relation schemas. For each entity set and relationship set there is a unique schema that is assigned the name of the corresponding entity set or relationship set. Each schema has a number of columns (generally corresponding to attributes), which have unique names.

16 1.16 Representing Entity Sets as Schemas A strong entity set reduces to a schema with the same attributes. A weak entity set becomes a table that includes a column for the primary key of the identifying strong entity set payment = ( loan_number, payment_number, payment_date, payment_amount )

17 1.17 Representing Relationship Sets as Schemas A many-to-many relationship set is represented as a schema with attributes for the primary keys of the two participating entity sets, and any descriptive attributes of the relationship set. Example: schema for relationship set borrower borrower = (customer_id, loan_number )

18 1.18 E/R to Relations E/R diagram Relational schema, e.g. account=(bname, acct_no, bal) E a 1 ….. a n E = ( a 1, …, a n ) E1 E2 R1 a 1 …. a n c 1 …. c k b 1 …. b m R1= ( a 1, b 1, c 1, …, c k )

19 1.19 Redundancy of Schemas Many-to-one and one-to-many relationship sets that are total on the many-side can be represented by adding an extra attribute to the “many” side, containing the primary key of the “one” side Example: Instead of creating a schema for relationship set account_branch, add an attribute branch_name to the schema arising from entity set account

20 1.20 Many-to-1 Relationship Example Could have : But a 1 is the key for R1 (also for E1=(a 1, …., a n )) Usual strategy:  ignore R1  Add b1, c1, …., ck to E1 instead, i.e.  E1=(a 1, …., a n, b 1, c 1, …, c k ) E1 E2 R1 a 1 …. a n c 1 …. c k b 1 …. b m R1= ( a 1, b 1, c 1, …, c k )

21 1.21 Redundancy of Schemas (Cont.) For one-to-one relationship sets, either side can be chosen to act as the “many” side  That is, extra attribute can be added to either of the tables corresponding to the two entity sets If participation is partial on the “many” side, replacing a schema by an extra attribute in the schema corresponding to the “many” side could result in null values The schema corresponding to a relationship set linking a weak entity set to its identifying strong entity set is redundant.  Example: The payment schema already contains the attributes that would appear in the loan_payment schema (i.e., loan_number and payment_number).

22 1.22 E1 E2 R1 a 1 …. a n c 1 …. c k b 1 …. b m ? ? R1 E1 = ( a 1, …, a n ) E2 = ( b 1, …, b m ) R1 = ( a 1, b 1, c 1 …, c k ) E1 = ( a 1, …, a n, b 1, c 1, …, c k ) E2 = ( b 1, …, b m ) E1 = ( a 1, …, a n ) E2 = ( b 1, …, b m, a 1, c 1, …, c k ) Treat as n:1 or 1:m

23 1.23 Composite and Multivalued Attributes Composite attributes are flattened out by creating a separate attribute for each component attribute  Example: given entity set customer with composite attribute name with component attributes first_name and last_name the schema corresponding to the entity set has two attributes name.first_name and name.last_name A multivalued attribute M of an entity E is represented by a separate schema EM  Schema EM has attributes corresponding to the primary key of E and an attribute corresponding to multivalued attribute M  Example: Multivalued attribute dependent_names of employee is represented by a schema: employee_dependent_names = ( employee_id, dname)  Each value of the multivalued attribute maps to a separate tuple of the relation on schema EM  For example, an employee entity with primary key 123-45-6789 and dependents Jack and Jane maps to two tuples: (123-45-6789, Jack) and (123-45-6789, Jane)

24 1.24 E/R to Relational Weak entity sets E1 E2 IR a 1 …. a n b 1 …. b m E1 = ( a 1, …, a n ) E2 = ( a 1, b 1, …, b m ) Emp ssn Multivalued Attributes name dept Emp = (ssn, name) Emp-Dept = (ssn, dept)

25 1.25 Representing Specialization via Schemas Method 1:  Form a schema for the higher-level entity  Form a schema for each lower-level entity set, include primary key of higher-level entity set and local attributes schema attributes person name, street, city customer name, credit_rating employee name, salary  Drawback: getting information about, an employee requires accessing two relations, the one corresponding to the low-level schema and the one corresponding to the high-level schema

26 1.26 E/R to Relational E1 S2 Isa S1 a1 … an c 1 …. c k b 1 …. b m Method 1: E = ( a 1, …, a n ) S1 = (a 1, b 1, …, b m ) S2 = ( a 1, c 1 …, c k ) Method 2: S1 = (a 1,…, a n, b 1, …, b m ) S2 = ( a 1, …, a n, c 1 …, c k ) Q: When is method 2 not possible?

27 1.27 Representing Specialization as Schemas (Cont.) Method 2:  Form a schema for each entity set with all local and inherited attributes schema attributes personname, street, city customername, street, city, credit_rating employee name, street, city, salary  If specialization is total, the schema for the generalized entity set (person) not required to store information  Can be defined as a “view” relation containing union of specialization relations  But explicit schema may still be needed for foreign key constraints  Drawback: street and city may be stored redundantly for people who are both customers and employees

28 1.28 Schemas Corresponding to Aggregation To represent aggregation, create a schema containing primary key of the aggregated relationship, the primary key of the associated entity set any descriptive attributes

29 1.29 E/R to Relational Aggregation E1 E2 R1 R2 E3 a 1 …. a n b 1 …. b m c 1 …. c k d1…djd1…dj E1, R1, E2, E3 as before R2 = (c 1, a 1, b 1, d 1, …, d j )

30 1.30 Storing Relations in a Database SQL: standard language interface to a Database Data Definition Language (DDL)  create, modify, delete relations  specify constraints  administer users, security, etc. Data Manipulation Language (DML)  Specify queries to find tuples that satisfy criteria  add, modify, remove tuples

31 1.31 SQL Overview n CREATE TABLE (, … ) n INSERT INTO ( ) VALUES ( ) n DELETE FROM WHERE n UPDATE SET = WHERE n SELECT FROM WHERE

32 1.32 Creating Relations in SQL Creates the Students relation. Note: the type (domain) of each field is specified, and enforced by the DBMS  whenever tuples are added or modified. Another example: the Enrolled table holds information about courses students take. CREATE TABLE Students (sid CHAR(20), name CHAR(20), login CHAR(10), age INTEGER, gpa FLOAT) CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2))

33 1.33 Adding and Deleting Tuples Can insert a single tuple using: INSERT INTO Students (sid, name, login, age, gpa) VALUES (‘53688’, ‘Smith’, ‘smith@ee’, 18, 3.2) Can delete all tuples satisfying some condition (e.g., name = Smith): DELETE FROM Students S WHERE S.name = ‘Smith’  Powerful variants of these commands are available; more later!

34 1.34 Updating Tuples Can update all tuples, or just those matching some criterion UPDATE Students SET sid = sid + 100000 UPDATE Enrolled SET grade = ‘E’ WHERE grade = ‘F’

35 1.35 Keys Keys are a way to associate tuples in different relations Keys are one form of integrity constraint (IC) Students Enrolled

36 1.36 Primary Keys - Definitions A set of fields is a superkey if:  No two distinct tuples can have same values in all key fields A set of fields is a candidate key for a relation if :  It is a superkey  No subset of the fields is a superkey >1 candidate keys for a relation?  one of the keys is chosen (by DBA) to be the primary key. E.g.  sid is a key for Students.  What about name?  The set {sid, gpa} is a superkey.

37 1.37 Primary and Candidate Keys in SQL Possibly many candidate keys (specified using UNIQUE ), one of which is chosen as the primary key. CREATE TABLE Enrolled (sid CHAR(20) cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid,cid)) “For a given student and course, there is a single grade.” vs. “Students can take only one course, and receive a single grade for that course; further, no two students in a course receive the same grade.” Used carelessly, an IC can prevent storage of database instances that should be permitted! CREATE TABLE Enrolled (sid CHAR(20) cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid), UNIQUE (cid, grade))

38 1.38 Foreign Keys A Foreign Key is a field whose values are keys in another relation. Students Enrolled

39 1.39 Foreign Keys, Referential Integrity Foreign key : Set of fields in one relation used to `refer’ to tuples in another relation.  Must correspond to primary key of the second relation.  Like a `logical pointer’. E.g. sid in Enrolled is a foreign key referring to Students:  Enrolled(sid: string, cid: string, grade: string)  If all foreign key constraints are enforced, referential integrity is achieved (i.e., no dangling references.)

40 1.40 Foreign Keys in SQL Only students listed in the Students relation should be allowed to enroll for courses. CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2), PRIMARY KEY (sid,cid), FOREIGN KEY (sid) REFERENCES Students ) Enrolled Students

41 1.41 Integrity Constraints (ICs) IC: condition that must be true for any instance of the database;  e.g., domain constraints.  ICs are specified when schema is defined.  ICs are checked when relations are modified. A legal instance of a relation is one that satisfies all specified ICs.  DBMS should not allow illegal instances. If the DBMS checks ICs, stored data is more faithful to real-world meaning.  Avoids data entry errors, too!

42 1.42 Relational Query Languages A major strength of the relational model: supports simple, powerful querying of data. Declarative, not Procedural  Specify what you want, not how to get it Queries can be written intuitively, and the DBMS is responsible for efficient evaluation.  The key: precise semantics for relational queries.  Allows the optimizer to extensively re-order operations, and still ensure that the answer does not change.

43 1.43 The SQL Query Language The most widely used relational query language.  Current std is SQL99; SQL92 is a basic subset To find all 18 year old students, we can write: SELECT * FROM Students S WHERE S.age=18 To find just names and logins, replace the first line: SELECT S.name, S.login FROM Students S WHERE S.age = 18

44 1.44 Querying Multiple Relations Querying Multiple Relations What does the following query compute? SELECT S.name, E.cid FROM Students S, Enrolled E WHERE S.sid=E.sid AND E.grade='A' Given the following instances: we get: 53650

45 1.45 Semantics of a Query A conceptual evaluation method for the query: 1. FROM clause: compute cross-product of Students, Enrolled 2. WHERE clause: Check conditions, discard tuples that fail 3. SELECT clause: Delete unwanted fields Remember, this is conceptual.  Actual evaluation will be much more efficient,  (but must produce the same answers) SELECT S.name, E.cid FROM Students S, Enrolled E WHERE S.sid=E.sid AND E.grade='A'

46 1.46 Cross-product of Students and Enrolled Instances 53650

47 1.47 Relational Model: Summary A tabular representation of data. Simple and intuitive, currently the most widely used  Object-relational variant gaining ground  XML support being added Integrity constraints can be specified by the DBA, based on application semantics. DBMS checks for violations.  Two important ICs: primary and foreign keys  In addition, we always have domain constraints. Powerful and natural query languages exist.  Next class: Theoretical basis for Query Languages


Download ppt "CAS CS 460 Relational Model Based on Slides created by Prof. Mitch Cherniack, Brandeis University and from Prof. Joe."

Similar presentations


Ads by Google