Temple University – CIS Dept. CIS331– Principles of Database Systems V. Megalooikonomou Database Design and Normalization (based on notes by Silberchatz,Korth,

Slides:



Advertisements
Similar presentations
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications Lecture #17: Schema Refinement & Normalization - Normal Forms (R&G,
Advertisements

Schema Refinement: Normal Forms
Schema Refinement and Normal Forms Given a design, how do we know it is good or not? What is the best design? Can a bad design be transformed into a good.
Dr. Kalpakis CMSC 461, Database Management Systems URL: Relational Database Design.
Temple University – CIS Dept. CIS616– Principles of Data Management V. Megalooikonomou Functional Dependencies (based on notes by Silberchatz,Korth, and.
Carnegie Mellon Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos Integrity Constraints.
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications Lecture #16: Schema Refinement & Normalization - Functional Dependencies.
©Silberschatz, Korth and Sudarshan Relational Database Design First Normal Form Pitfalls in Relational Database Design Functional Dependencies Decomposition.
7.1 Chapter 7: Relational Database Design. 7.2 Chapter 7: Relational Database Design Features of Good Relational Design Atomic Domains and First Normal.
Chapter 7: Relational Database Design. ©Silberschatz, Korth and Sudarshan7.2Database System Concepts Chapter 7: Relational Database Design First Normal.
Chapter 7: Relational Database Design. ©Silberschatz, Korth and Sudarshan7.2Database System Concepts Chapter 7: Relational Database Design First Normal.
Recap of Feb 13: SQL, Relational Calculi, Functional Dependencies SQL: multiple group bys, having, lots of examples Tuple Calculus Domain Calculus Functional.
Instructor: Amol Deshpande  Data Models ◦ Conceptual representation of the data  Data Retrieval ◦ How to ask questions of the database.
Chapter 7: Relational Database Design. ©Silberschatz, Korth and Sudarshan7.2Database System Concepts Chapter 7: Relational Database Design First Normal.
Normalization DB Tuning CS186 Final Review Session.
Slides adapted from A. Silberschatz et al. Database System Concepts, 5th Ed. Relational Database Design - part 2 - Database Management Systems I Alex Coman,
CMSC424: Database Design Instructor: Amol Deshpande
Decomposition By Yuhung Chen CS157A Section 2 October
1 Functional Dependency and Normalization Informal design guidelines for relation schemas. Functional dependencies. Normal forms. Normalization.
Schema Refinement and Normalization Nobody realizes that some people expend tremendous energy merely to be normal. Albert Camus.
Department of Computer Science and Engineering, HKUST Slide 1 7. Relational Database Design.
Boyce-Codd Normal Form By: Thanh Truong. Boyce-Codd Normal Form Eliminates all redundancy that can be discovered by functional dependencies But, we can.
Relational Database Design
©Silberschatz, Korth and Sudarshan7.1Database System Concepts Chapter 7: Relational Database Design First Normal Form Pitfalls in Relational Database Design.
Chapter 7: Relational Database Design. 7.2Unite International CollegeDatabase Management Systems Chapter 7: Relational Database Design Features of Good.
CS 405G: Introduction to Database Systems 16. Functional Dependency.
Chapter 7: Relational Database Design. 7.2Unite International CollegeDatabase Management Systems Chapter 7: Relational Database Design Features of Good.
Chapter 7: Relational Database Design. ©Silberschatz, Korth and Sudarshan7.2Database System Concepts Chapter 7: Relational Database Design Pitfalls in.
Relational Database Design by Relational Database Design by Dr.S.Sridhar, Ph.D.(JNUD), RACI(Paris, NICE), RMR(USA), RZFM(Germany) DIRECTOR ARUNAI ENGINEERING.
Chapter 8: Relational Database Design First Normal Form First Normal Form Functional Dependencies Functional Dependencies Decomposition Decomposition Boyce-Codd.
Schema Refinement and Normalization. Functional Dependencies (Review) A functional dependency X  Y holds over relation schema R if, for every allowable.
Database Management COP4540, SCS, FIU Relation Normalization (Chapter 14)
Schema Refinement and Normal Forms Chapter 19 1 Database Management Systems 3ed, R.Ramakrishnan & J.Gehrke.
CS143 Review: Normalization Theory Q: Is it a good table design? We can start with an ER diagram or with a large relation that contain a sample of the.
SCUJ. Holliday - coen 1784–1 Schedule Today: u Normal Forms. u Section 3.6. Next u Relational Algebra. Read chapter 5 to page 199 After that u SQL Queries.
BCNF & Lossless Decomposition Prof. Sin-Min Lee Department of Computer Science.
Functional Dependencies and Normalization 1 Instructor: Mohamed Eltabakh
Computing & Information Sciences Kansas State University Wednesday, 04 Oct 2006CIS 560: Database System Concepts Lecture 17 of 42 Wednesday, 04 October.
Chapter 7: Relational Database Design. ©Silberschatz, Korth and Sudarshan7.2Database System Concepts Chapter 7: Relational Database Design First Normal.
Christoph F. Eick: Functional Dependencies, BCNF, and Normalization 1 Functional Dependencies, BCNF and Normalization.
IST 210 Normalization 2 Todd Bacastow IST 210. Normalization Methods Inspection Closure Functional dependencies are key.
1 Dept. of CIS, Temple Univ. CIS616/661 – Principles of Data Management V. Megalooikonomou Integrity Constraints (based on slides by C. Faloutsos at CMU)
© D. Wong Ch. 3 (continued)  Database design problems  Functional Dependency  Keys of relations  Decompositions based on Functional Dependency.
Chapter 7: Relational Database Design. ©Silberschatz, Korth and Sudarshan7.2Database System Concepts Chapter 7: Relational Database Design First Normal.
Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke1 Schema Refinement and Normal Forms Chapter 19.
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications Lecture #16: Schema Refinement & Normalization - Functional Dependencies.
Schema Refinement and Normalization Nobody realizes that some people expend tremendous energy merely to be normal. Albert Camus.
CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.
CS 338Database Design and Normal Forms9-1 Database Design and Normal Forms Lecture Topics Measuring the quality of a schema Schema design with normalization.
Carnegie Mellon Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos Database design and normalization.
CS411 Database Systems Kazuhiro Minami 04: Relational Schema Design.
Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke1 Schema Refinement and Normal Forms Chapter 19.
© D. Wong Functional Dependencies (FD)  Given: relation schema R(A1, …, An), and X and Y be subsets of (A1, … An). FD : X  Y means X functionally.
Chapter 8 Relational Database Design. 2 Relational Database Design: Goals n Reduce data redundancy (undesirable replication of data values) n Minimize.
1 Dept. of CIS, Temple Univ. CIS661 – Principles of Data Management V. Megalooikonomou Database design and normalization (based on slides by C. Faloutsos.
Copyright © Curt Hill Schema Refinement II 2 nd NF to 3 rd NF to BCNF.
Normalization and FUNctional Dependencies. Redundancy: root of several problems with relational schemas: –redundant storage, insert/delete/update anomalies.
Normal Forms Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems June 18, 2016 Some slide content courtesy of Susan Davidson.
CSC 411/511: DBMS Design Dr. Nan Wang 1 Schema Refinement and Normal Forms Chapter 19.
1 CS122A: Introduction to Data Management Lecture #13: Relational DB Design Theory (II) Instructor: Chen Li.
Module 5: Overview of Database Design -- Normalization
Normalization First Normal Form (1NF) Boyce-Codd Normal Form (BCNF)
Relational Database Design by Dr. S. Sridhar, Ph. D
CS 480: Database Systems Lecture 22 March 6, 2013.
Faloutsos & Pavlo SCS /615 Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications Lecture #16: Schema Refinement & Normalization.
Module 5: Overview of Normalization
Lecture #17: Schema Refinement & Normalization - Normal Forms
Temple University – CIS Dept. CIS331– Principles of Database Systems
Designing Relational Databases
Chapter 7a: Overview of Database Design -- Normalization
Presentation transcript:

Temple University – CIS Dept. CIS331– Principles of Database Systems V. Megalooikonomou Database Design and Normalization (based on notes by Silberchatz,Korth, and Sudarshan and notes by C. Faloutsos at CMU)

Overview Relational model formal query languages commercial query languages (SQL) Integrity constraints domain I.C., foreign keys functional dependencies Functional Dependencies DB design and normalization

Overview - detailed DB design and normalization pitfalls of bad design decomposition normal forms

Design ‘good’ tables sub-goal#1: define what ‘good’ means sub-goal#2: fix ‘bad’ tables in short: “we want tables where the attributes depend on the primary key, on the whole key, and nothing but the key” Let’s see why, and how: Goal

Pitfalls takes1 (ssn, c-id, grade, name, address) Ssnc-idGradeNameAddress 123 cs331 A smithMain

Pitfalls ‘Bad’ - why? because: ssn->address, name

Pitfalls Redundancy space (inconsistencies) insertion/deletion anomalies:

Pitfalls insertion anomaly: “jones” registers, but takes no class - no place to store his address!

Pitfalls deletion anomaly: delete the last record of ‘smith’ (we lose his address!)

Solution: decomposition split offending table in two (or more), e.g.: ??

Overview - detailed DB design and normalization pitfalls of bad design decomposition lossless join dependency preserving normal forms

Decompositions there are ‘bad’ decompositions we want: lossless and dependency preserving

Decompositions - lossy: R1(ssn, grade, name, address) R2(c-id,grade) ssn->name, address ssn, c-id -> grade c-idGrade cs331A cs351B cs211A

Decompositions - lossy: can not recover original table with a join! ssn->name, address ssn, c-id -> grade c-idGrade cs331A cs351B cs211A

Decompositions – lossy: Another example Decomposition of R = (A, B) into R 1 = (A), R 2 = (B) AB  A  B 1212 r A(r)A(r) B(r)B(r)  A (r)  B (r) AB 

Decompositions example of non-dependency preserving S# -> address, status address -> status S# -> addressS# -> status

Decompositions is it lossless? S# -> address, status address -> status S# -> addressS# -> status

Decompositions - lossless Definition: Consider schema R, with FD ‘F’. R1, R2 is a lossless join decomposition of R if we always have: An easier criterion?

Decomposition - lossless Theorem: lossless join decomposition if the joining attribute is a superkey in at least one of the new tables Formally:

Decomposition - lossless example: ssn->name, address ssn, c-id -> grade Ssnc-idGrade 123 cs331 A 123 cs351 B 234 cs211 A ssn->name, address ssn, c-id -> grade R1 R2

Overview - detailed DB design and normalization pitfalls of bad design decomposition lossless join decomp. dependency preserving normal forms

Decomposition - depend. pres. informally: we don’t want the original FDs to span two tables - counter-example: S# -> address, status address -> status S# -> addressS# -> status

Decomposition - depend. pres. dependency preserving decomposition: S# -> address, status address -> status S# -> addressaddress -> status (but: S#->status ?)

Decomposition - depend. pres. informally: we don’t want the original FDs to span two tables more specifically: … the FDs of the canonical cover Let F i be the set of dependencies F + that include only attributes in R i. Preferably the decomposition should be dependency preserving, that is, (F 1  F 2  …  F n ) + = F + Otherwise, checking updates for violation of functional dependencies may require computing joins  expensive

Decomposition - depend. pres. why is dependency preservation good? S# -> addressaddress -> status S# -> addressS# -> status (address->status: ‘lost’)

Decomposition - depend. pres. A: eg., record that ‘Philly’ has status ‘A’ S# -> addressaddress -> status S# -> address S# -> status (address->status: ‘lost’)

Decomposition - depend. pres. To check if a dependency  is preserved in a decomposition of R into R 1, R 2, …, R n we apply the following test (with attribute closure done w.r.t. F) result =  while (changes to result) do for each R i in the decomposition t = (result  R i ) +  R i result = result  t If result contains all attributes in , then functional dependency    is preserved We apply the test on all dependencies in F to check if a decomposition is dependency preserving The test takes polynomial time Computing F + and (F 1  F 2  …  F n ) + needs exponential time

Decomposition - conclusions decompositions should always be lossless joining attribute -> superkey whenever possible, we want them to be dependency preserving (occasionally, impossible - see ‘STJ’ example later…)

Normalization using FD When decomposing a relation schema R with a set of functional dependencies F into R 1, R 2,…, R n we want: Lossless-join decomposition: otherwise … information loss No redundancy: relations R i preferably should be in either Boyce-Codd Normal Form or Third Normal Form Dependency preservation: Let F i be the set of dependencies in F + that include only attributes in R i. Preferably the decomposition should be dependency preserving, i.e., (F 1  F 2  …  F n ) + = F + Otherwise, checking updates for violation of functional dependencies may require computing joins  expensive

Normalization using FD - Example R = (A, B, C) F = {A  B, B  C) R 1 = (A, B), R 2 = (B, C) Lossless-join decomposition: R 1  R 2 = {B} and B  BC Dependency preserving R 1 = (A, B), R 2 = (A, C) Lossless-join decomposition: R 1  R 2 = {A} and A  AB Not dependency preserving (cannot check B  C without computing R 1  R 2 )

Overview - detailed DB design and normalization pitfalls of bad design decomposition (  how to fix the problem) normal forms (  how to detect the problem) BCNF, 3NF, (1NF, 2NF)

Normal forms - BCNF We saw how to fix ‘bad’ schemas - but what is a ‘good’ schema? Answer: ‘good’, if it obeys a ‘normal form’, i.e., a set of rules Typically: Boyce-Codd Normal Form (BCNF)

Normal forms - BCNF Defn.: Rel. R is in BCNF w.r.t. F, if informally: everything depends on the full key, and nothing but the key semi-formally: every determinant (of the cover) is a candidate key

Normal forms - BCNF Example and counter-example: ssn->name, address ssn, c-id -> grade

Normal forms - BCNF Formally: for every FD a->b in F+ a->b is trivial (a is a superset of b) or a is a superkey (or both)

Normal forms - BCNF Theorem: given a schema R and a set of FD ‘F’, we can always decompose it to schemas R1, … Rn, so that R1, … Rn are in BCNF and the decomposition is lossless (…but, some decomp. might lose dependencies)

BCNF Decomposition How? ….essentially, break off FDs of the cover eg. TAKES1(ssn, c-id, grade, name, address) ssn -> name, address ssn, c-id -> grade

Normal forms - BCNF eg. TAKES1(ssn, c-id, grade, name, address) ssn -> name, address ssn, c-id -> grade name addressgrade c-id ssn

Normal forms - BCNF ssn->name, address ssn, c-id -> grade Ssnc-idGrade 123 cs331 A 123 cs351 B 234 cs211 A ssn->name, address ssn, c-id -> grade

Normal forms - BCNF pictorially: we want a ‘star’ shape name addressgrade c-id ssn :not in BCNF

Normal forms - BCNF pictorially: we want a ‘star’ shape B C A G E D or FH

Normal forms - BCNF or a star-like: (e.g., 2 cand. keys): STUDENT(ssn, st#, name, address) name address ssn st# = name addressssnst#

Normal forms - BCNF but not: or B CA D G E D F H

BCNF Decomposition result := {R}; done := false; compute F + ; while (not done) do if (there is a schema R i in result that is not in BCNF) then begin let     be a nontrivial functional dependency that holds on R i such that    R i is not in F +, and    =  ; result := (result – R i )  (R i –  )  ( ,  ); end else done := true; Note: each R i is in BCNF, and decomposition is lossless-join

Normal forms - 3NF consider the ‘classic’ case: STJ( Student, Teacher, subJect) T-> J S,J -> T is it BCNF? S T J

Normal forms - 3NF STJ( Student, Teacher, subJect) T-> J S,J -> T How to decompose it to BCNF? S T J

Normal forms - 3NF STJ( Student, Teacher, subJect) T-> J S,J -> T 1) R1(T,J) R2(S,J) (BCNF? - lossless? - dep. pres.? ) 2) R1(T,J) R2(S,T) (BCNF? - lossless? - dep. pres.? )

Normal forms - 3NF STJ( Student, Teacher, subJect) T-> J S,J -> T 1) R1(T,J) R2(S,J) (BCNF? Y+Y - lossless? N - dep. pres.? N ) 2) R1(T,J) R2(S,T) (BCNF? Y+Y - lossless? Y - dep. pres.? N )

Normal forms - 3NF STJ( Student, Teacher, subJect) T-> J S,J -> T in this case: impossible to have both BCNF and dependency preservation Welcome 3NF (…a weaker normal form)!

Normal forms - 3NF STJ( Student, Teacher, subJect) T-> J S,J -> T S J T informally, 3NF ‘forgives’ the red arrow in the can. cover

Normal forms - 3NF STJ( Student, Teacher, subJect) T-> J S,J -> T S J T Formally, a rel. R with FDs ‘F’ is in 3NF if: for every a->b in F+: it is trivial or a is a superkey or each b-a attr.: part of a cand. key

Normal forms - 3NF R = (J, K, L) F = {JK  L, L  K} Two candidate keys = JK and JL R is not in BCNF Any decomposition of R will fail to preserve JK  L n BCNF decomposition has (JL) and (LK) n Testing for JK  L requires a join R is in 3NF JK  L JK is a superkey L  K K is contained in a candidate key There is some redundancy in this schema…

Normal forms - 3NF TESTING FOR 3NF Optimization: Need to check only FDs in F, need not check all FDs in F + Use attribute closure to check, for each dependency   , if  is a superkey If  is not a superkey, we have to verify if each attribute in  is contained in a candidate key of R this test is more expensive; it involves finding candidate keys testing for 3NF is NP-hard Interestingly, decomposition into 3NF (described shortly) can be done in polynomial time

Decomposition into 3NF Let F c be a canonical cover for F; i := 0; for each functional dependency    in F c do if none of the schemas R j, 1  j  i contains   then begin i := i + 1; R i :=   end if none of the schemas R j, 1  j  i contains a candidate key for R then begin i := i + 1; R i := any candidate key for R; end return (R 1, R 2,..., R i ) The dependencies are preserved by building explicitly a schema for each given dependency Guarantees a lossless-join decomposition by having at least one schema containing a candidate key for the schema being decomposed

Normal forms - 3NF how to bring a schema to 3NF? In short ….for each FD in the cover, put it in a table

Normal forms - 3NF vs BCNF If ‘R’ is in BCNF, it is always in 3NF (but not the reverse) In practice, aim for BCNF; lossless join; and dep. preservation if impossible, we accept 3NF; but insist on lossless join and dep. preservation 3NF has problems with transitive dependecies

3NF vs BCNF (cont.) Example of problems due to redundancy in 3NF R = (J, K, L) F = {JK  L, L  K} A schema that is in 3NF but not in BCNF has the problems of repetition of information (e.g., the relationship l 1, k 1 ) need to use null values (e.g., to represent the relationship l 2, k 2 where there is no corresponding value for J). JLK j 1 j 2 j 3 null l1l1l1l2l1l1l1l2 k1k1k1k2k1k1k1k2

Normal forms - more details why ‘3’NF? what is 2NF? 1NF? 1NF: attributes are atomic (i.e., no set- valued attr., a.k.a. ‘repeating groups’) not 1NF

Normal forms - more details 2NF: 1NF and non-key attr. fully depend on the key counter-example: TAKES1(ssn, c-id, grade, name, address) ssn -> name, address ssn, c-id -> grade name addressgrade c-id ssn

Normal forms - more details 3NF: 2NF and no transitive dependencies counter-example: B CA D in 2NF, but not in 3NF

Normal forms - more details 4NF, multivalued dependencies etc: later… in practice, E-R diagrams usually lead to tables in BCNF

Overview - conclusions DB design and normalization pitfalls of bad design decompositions (lossless, dep. preserving) normal forms (BCNF or 3NF) “everything should depend on the key, the whole key, and nothing but the key”