CS 440 Database Management Systems Lecture 4: Constraints, Schema Design.

Slides:



Advertisements
Similar presentations
Dr. Alexandra I. Cristea CS 319: Theory of Databases: C3.
Advertisements

Dr. A.I. Cristea CS 319: Theory of Databases: FDs.
Higher Normal Forms By John Nicosia CS 157a Fall 2007.
primary key constraint foreign key constraint
Announcements Read 6.1 – 6.3 for Wednesday Project Step 3, due now Homework 5, due Friday 10/22 Project Step 4, due Monday Research paper –List of sources.
CS 440 Database Management Systems Practice problems for normalization.
ALAK ROY. Assistant Professor Dept. of CSE NIT Agartala N ATIONAL I NSTITUTE OF T ECHNOLOGY A GARTALA Aug-Dec,2010 Normalization 2 CSE-503 :: D ATABASE.
Manipulating Functional Dependencies Zaki Malik September 30, 2008.
Chapter 11 Functional Dependencies. Copyright © 2004 Pearson Addison-Wesley. All rights reserved.11-2 Topics in this Chapter Basic Definitions Trivial.
Chapter 3 Notes. 3.1 Functional Dependencies A functional dependency is a statement that – two tuples of a relation that agree on some particular set.
Temple University – CIS Dept. CIS616– Principles of Data Management V. Megalooikonomou Functional Dependencies (based on notes by Silberchatz,Korth, and.
Carnegie Mellon Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos Integrity Constraints.
Database Management COP4540, SCS, FIU Functional Dependencies (Chapter 14)
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications Lecture #16: Schema Refinement & Normalization - Functional Dependencies.
Chapter 7: Relational Database Design. ©Silberschatz, Korth and Sudarshan7.2Database System Concepts Chapter 7: Relational Database Design First Normal.
Instructor: Amol Deshpande  Data Models ◦ Conceptual representation of the data  Data Retrieval ◦ How to ask questions of the database.
Integrity Constraints
Multivalued Dependency Prepared by Tomasz Kaciak CS157A.
Chapter 7: Relational Database Design. ©Silberschatz, Korth and Sudarshan7.2Database System Concepts Chapter 7: Relational Database Design First Normal.
CMSC424: Database Design Instructor: Amol Deshpande
DATABASE DESIGN Functional Dependencies. Overview n Functional Dependencies n Normalization –Functional dependencies –Normal forms.
Functional Dependency Rajhdeep Jandir. Definition A functional dependency is defined as a constraint between two sets of attributes in a relation from.
Functional Dependencies (Part 3) Presented by Nash Raghavan All page numbers are in reference to Database System Concepts (5 th Edition)
1 CMSC424, Spring 2005 CMSC424: Database Design Lecture 9.
1 Functional Dependency and Normalization Informal design guidelines for relation schemas. Functional dependencies. Normal forms. Normalization.
Dr. Alexandra I. Cristea CS 319: Theory of Databases: C3.
Functional Dependencies CS 186, Spring 2006, Lecture 21 R&G Chapter 19 Science is the knowledge of consequences, and dependence of one fact upon another.
Chapter 8: Relational Database Design First Normal Form First Normal Form Functional Dependencies Functional Dependencies Decomposition Decomposition Boyce-Codd.
©Silberschatz, Korth and Sudarshan7.1Database System Concepts Chapter 7: Relational Database Design First Normal Form Pitfalls in Relational Database Design.
Chapter 10 Functional Dependencies and Normalization for Relational Databases.
CS 405G: Introduction to Database Systems 16. Functional Dependency.
Chapter 7: Relational Database Design. 7.2Unite International CollegeDatabase Management Systems Chapter 7: Relational Database Design Features of Good.
Functional Dependencies and Normalization 1 Instructor: Mohamed Eltabakh
Lecture 1 of Advanced Databases Basic Concepts Instructor: Mr.Ahmed Al Astal.
BCNF & Lossless Decomposition Prof. Sin-Min Lee Department of Computer Science.
Computing & Information Sciences Kansas State University Tuesday, 27 Feb 2007CIS 560: Database System Concepts Lecture 18 of 42 Tuesday, 27 February 2007.
Ihr Logo Fundamentals of Database Systems Fourth Edition El Masri & Navathe Chapter 10 Functional Dependencies and Normalization for Relational Databases.
Ihr Logo Fundamentals of Database Systems Fourth Edition El Masri & Navathe Chapter 10 Functional Dependencies and Normalization for Relational Databases.
CS 564 Database Management Systems: Design and Implementation Discussion Session Friday, Sept 18, Apul Jain.
Logical Database Design (1 of 3) John Ortiz Lecture 6Logical Database Design (1)2 Introduction  The logical design is a process of refining DB schema.
1 Lecture 6: Schema refinement: Functional dependencies
Revisit FDs & BCNF Normalization 1 Instructor: Mohamed Eltabakh
Functional Dependencies. FarkasCSCE 5202 Reading and Exercises Database Systems- The Complete Book: Chapter 3.1, 3.2, 3.3., 3.4 Following lecture slides.
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications C. Faloutsos – A. Pavlo Lecture#16: Schema Refinement & Normalization.
1 Dept. of CIS, Temple Univ. CIS616/661 – Principles of Data Management V. Megalooikonomou Integrity Constraints (based on slides by C. Faloutsos at CMU)
CSC 411/511: DBMS Design Dr. Nan Wang 1 Schema Refinement and Normal Forms Chapter 19.
Schema Refinement SHIRAJ MOHAMED M | MIS 1. Learning Objectives  Identify update, insertion and deletion anomalies  Identify possible keys given an.
© D. Wong Ch. 3 (continued)  Database design problems  Functional Dependency  Keys of relations  Decompositions based on Functional Dependency.
Deanship of Distance Learning Avicenna Center for E-Learning 1 Session - 7 Sequence - 2 Normalization Functional Dependencies Presented by: Dr. Samir Tartir.
1 Functional Dependencies. 2 Motivation v E/R  Relational translation problems : –Often discover more “detailed” constraints after translation (upcoming.
Functional Dependencies R&G Chapter 19 Science is the knowledge of consequences, and dependence of one fact upon another. Thomas Hobbes ( )
CS 405G: Introduction to Database Systems
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications Lecture #16: Schema Refinement & Normalization - Functional Dependencies.
Functional Dependencies CIS 4301 Lecture Notes Lecture 8 - 2/7/2006.
Rensselaer Polytechnic Institute CSCI-4380 – Database Systems David Goldschmidt, Ph.D.
MIS 3053 Database Design And Applications The University Of Tulsa Professor: Akhilesh Bajaj Normal Forms Lecture 1 © Akhilesh Bajaj, 2000, 2002, 2003.
CS 222 Database Management System Spring Lecture 4 Database Design Theory Korra Sathya Babu Department of Computer Science NIT Rourkela.
CS542 1 Schema Refinement Chapter 19 (part 1) Functional Dependencies.
Lecture 03 Constraints. Example Schema CONSTRAINTS.
CS411 Database Systems Kazuhiro Minami 04: Relational Schema Design.
Databases 1 Sixth lecture. 2 Functional Dependencies X -> A is an assertion about a relation R that whenever two tuples of R agree on all the attributes.
Relational Database Design Algorithms and Further Dependencies.
Chapter 8 Relational Database Design. 2 Relational Database Design: Goals n Reduce data redundancy (undesirable replication of data values) n Minimize.
1 CS 430 Database Theory Winter 2005 Lecture 8: Functional Dependencies Second, Third, and Boyce-Codd Normal Forms.
Functional Dependencies and Normalization for Relational Databases 1 Chapter 15 تنبيه : شرائح العرض (Slides) هي وسيلة لتوضيح الدرس واداة من الادوات في.
CSC 411/511: DBMS Design Dr. Nan Wang 1 Schema Refinement and Normal Forms Chapter 19.
Functional Dependency
Temple University – CIS Dept. CIS331– Principles of Database Systems
Functional Dependencies
Chapter 19 (part 1) Functional Dependencies
Presentation transcript:

CS 440 Database Management Systems Lecture 4: Constraints, Schema Design

Integrity constraints We normally like to restrict the set of database instances of a schema. – ssn contains only numerical symbols. – name does not contain numerical symbols. – Tuples with equal ssn values have equal names. We use integrity constraints to express these restrictions over schema. Emp

Functional dependency (FD) & key Given attributes A 1,…, A n, B 1, …, B k in relation R, the functional dependency A 1,…, A n  B 1, …, B k means that all tuples in R that agree on attributes A 1,…, A n must also agree on B 1, …, B k. A key in R is a set of attributes of R that functionally determines all attributes in R and none of its subsets is a key. –ssn? {ssn, address}? {ssn, name, address} ? Super-key is a set of attributes that contains a key. 3 ssn  name ? Yes ssn  address ? No Emp

How to find FDs and keys? Some come from domain and domain experts. Some are logically implied by other FDs. Example: movies(title, year, actor, cost, revenue, b-buster) given FD: title, year, actor  cost given FD: title, year, actor  revenue Implied FD: title, year, actor  cost, revenue Closure of a set of FDs: all FDs implied by a set of FDs. We can generate them using Armstrong’s axioms. 4

Armstrong’s axioms Reflexivity: A 1,…, A n  A 1 generally, A 1,…, A n  A i,…, A j ; 1<= i, j <= n (trivial FD) Augmentation: If A 1,…, A n  B 1,…, B m then A 1,…, A n, C 1,…, C k  B 1,…, B m, C 1,…, C k Transitivity: If A 1,…, A n  B 1,…, B m and B 1,…, B m  C 1,…,C k then A 1,…, A n  C 1,…, C k 5

Computing the closure of a set of FDs (U) U+ = U. Repeat – Apply reflexivity and augmentation to each FD in U+ and add the resulting FDs to U+. – Apply transitivity to each pairs of FDs in U+ and add the resulting FDs to U+. Until U+ does not change anymore. 6

Useful rules They are derived from axioms and we may use them to derive implied FDs faster. Decomposition If A 1,…, A n  B 1,…, B m then A 1,…, A n  B 1, A 1,…, A n  B 2, …, and A 1,…, A n  B m. Union If A 1,…, A n  B 1, …, and A 1,…, A n  B m then A 1,…, A n  B 1,…, B m. 7

Computing the closure of a set of FDs movies(title, year, actor, cost, revenue, b-buster) FD 1 title, year, actor  cost FD 2 title, year, actor  revenue FD 3 cost, revenue  b-buster Apply union on FD 1 and FD 2 : FD 4 : title, year, actor  cost,revenue Apply transitivity on FD 4 and FD 3 : FD 5 : title, year, actor  b-buster Apply union on FD 4 and FD 5 : FD 5 : title, year, actor  cost,revenue,b-buster

Enforcing FDs Emp contains duplicate values for ssn and name. Update ‘John’ to ‘Richard’ in the first tuple. – violates ssn  name. – update anomaly. We can write a program that checks the database after each update for violations of all FDs. – since there are many FDs, it is inefficient. 9 Emp

More problems with the schema If we delete John’s addresses, we lose his ssn and name. – deletion anomaly. How to insert a new tuple with new ssn and name but no address information? – insertion anomaly. One may use NULL values to solve these problems. – They do not necessarily indicate lack of knowledge. – It is hard to write correct SQL queries over the database. 10 Emp

Normalization We can transform our schema to another schema without update, deletion, and insertion anomalies.  update anomaly? deletion anomaly? insertion anomaly? 11 Emp Emp-nameEmp-addr

Normalization Given schema S 1, find schema S 2 that does not have update/deletion/insertion anomalies.  S 1 = ({ Emp }, { ssn  name }) S 2 = ({ Emp-name, Emp-addr }, { ssn  name }) Schema S 2 is in a normal form. 12 Emp-nameEmp-addr Emp

Normal Forms ✔ ✔ 13

Boyce-Codd normal form (BCNF) Relation R is in BCNF, if and only if: – For each non-trivial FD A  B, A is a super-key of R. Every attribute depends only on super-keys. Example: ssn  name – ssn is not a super-key. – Employee is not in BCNF. Decompose Emp. – super-key of Emp-name? – super-key of Emp-addr? – Schema is in BCNF. 14 Emp Emp-name Emp-addr

BCNF decomposition of R Find the closure of functional dependencies in R. Find the keys for R. Pick an FD A  B that violates the BCNF condition in R. – select the largest possible B. Decompose relation R to relational R1 and R2. Repeat until there is no BCNF violation left. 15 Other attributes A A B R1 R2 Other attributes A B R

BCNF decomposition example Emp2(ssn, name, street, city, state, zip) FD 1 ssn  name FD 2 zip  state Key: {ssn, street, city, zip} FD 1 violates BCNF. Emp2(ssn, name) Emp-addr(ssn, street, city, state, zip) FD 2 violates BCNF. Emp2(ssn, name) Emp-addr(ssn, street, city, zip) Location(zip, state) 16

The danger of normalization Losing information. We like to preserve the information stored in R. – We must recover the tuples in R from R’s BCNF decomposition: lossless decomposition – We must recover all FDs in R from its BCNF decomposition: dependency preserving We should check these condition after normalizing a relation. 17

Lossless decomposition If (R1, R2) is a decomposition of R, then Join (R1, R2) = R for all instances of R, R1, R2. Example: R( A, B, C)  R1 (A, B), R2 (A, C)   18 ABC a1b1c1 a1b2c2 ABC a1b1c1 a1b1c2 a1b2c1 a1b2c2 we get bogus tuples, not a lossless decomposition. AB a1b1 a1b2 AC a1c1 a1c2 AC a1c1 a1c2 AB a1b1 a1b2

BCNF decomposition is lossless Example: {R( A, B, C), A  B} {R1 (A, B), R2 (A, C), A  B}  19 ABC a1b1c1 a1b1c2 The join does not produce bogus tuples. Informally speaking, if the join attribute is a key, we are OK. It is proved to be true for all BCNF decompositions. AB a1b1 AC a1c1 a1c2 AC a1c1 a1c2 AB a1b1 ABC a1b1c1 a1b1c2

BCNF is not dependency preserving Emp(ssn, name, address), ssn  name,address, name  ssn BCNF decomposition: Emp-name(ssn, name) ssn  name Emp-name(ssn, address) What will happen?  20 No FD! not dependency preserving Emp Emp-name Emp-addr satisfies the dependenciesdoes not satisfies the dependencies Given FD A  B, If normalization puts A in one relation and B in another one, it is not dependency preserving.

A dependency preserving normal form: 3NF Relation R is in 3 rd normal form if for each non-trivial FD A  B in R – A is a super-key or – B is a part of a key. Every non-key attribute must depend on a key, the whole key, and nothing but the key. Example: Emp(ssn, name, address) ssn  name, address,name  ssn – 3NF but not BCNF. – 3NF is a lossless decomposition and preserves dependencies. 21

Minimal basis (minimal cover) The FD sets U1 and U2 are equivalent if and only if U1+ = U2+. U2 is a minimal basis for U1 iff: – U2 and U1 are equivalent. – The FDs in U2 have only one attribute in their right hand side. – If we remove an FD from U2 or an attribute from an FD in U2, U2 will not be equivalent to U1. Intuition: a smaller set of FDs with fewer attributes that implies U+. 22

Finding minimal basis of U 1.Standard form: replace each FD A 1,…,A n  B 1,..., B m by A 1,…,A n  B 1,..., A 1,…,A n  B m 2.Minimize left hand side: for each attribute A j and each FD A 1,…, A j,…, A n  B i, check if you can remove A j from the FD while preserving U+. 1.Delete redundant FD: check each remaining FD to see if you can remove it while preserving U+. The minimal basis for U is not necessarily unique. 23

Finding minimal basis of U Example: U = {B  A, D  A, AB  D} 1.It is already in standard form. 2.We can remove A from {AB  D} to get U = {B  A, D  A, B  D} 3.We can remove B  A as it is implied by transitivity from others. 24

3NF synthesizing algorithm Input: relation R and set of FDs U. Output: Normalized schema S 1.Find a minimal basis M for U. 2.For each FD A  B in M, if AB is not covered by any relation in S, add Ri = (A,B) to S. 3. If none of the relations in S contain a super-key for R, add a relation to S whose attributes form a key for R. Why step 3? consider relation R(A,B,C) with U={A  B, C  B}. Minimal basis of U is U : 3NF = R 1 (A,B) R 2 (C,B) lossless join property? 3NF = R 1 (A,B) R 2 (C,B) R 3 (A,C) 25

3NF versus BCNF BCNF eliminates more redundancies than 3NF. Normalization must be dependency preserving, unless there is a strong reason. Try BCNF, but if it is not dependency preserving use 3NF. 26

De-normalization Normalization improves data quality, but it has some drawbacks. – performance: normalized schemas require more joins. – readability: normalized schemas are hard to understand. they contains larger numbers of relations. they place related attributes in different relations. DB designers find the trade-off. – No write in OnLine Analytical Processing (OLAP)  argues against normalization. – write in OnLine Transaction Processing (OLTP)  argues for normalization. 27

What you should know Integrity constraints: functional dependencies, keys. Schema refinements: BCNF, 3NF. Lossless decomposition and dependency preservation. De-normalization.

What is next? Views Web data and PageRank.