CMU SCS Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB Applications Lecture #17: Schema Refinement & Normalization - Normal Forms (R&G,

Slides:



Advertisements
Similar presentations
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications C. Faloutsos - A. Pavlo Lecture#2: E-R diagrams.
Advertisements

CMU SCS Faloutsos & PavloCMU SCS /6151 Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications Lecture #18: Physical Database.
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications Lecture #19 (not in book) Database Design Methodology handout.
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications C. Faloutsos – A. Pavlo Lecture#12: External Sorting.
Schema Refinement: Normal Forms
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications C. Faloutsos – A. Pavlo Lecture#13: Query Evaluation.
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications C. Faloutsos – A. Pavlo Lecture#14(b): Implementation of Relational.
Normalisation to 3NF Database Systems Lecture 11 Natasha Alechina.
Schema Refinement and Normal Forms Given a design, how do we know it is good or not? What is the best design? Can a bad design be transformed into a good.
PMIT-6102 Advanced Database Systems By- Jesmin Akhter Assistant Professor, IIT, Jahangirnagar University.
Temple University – CIS Dept. CIS616– Principles of Data Management V. Megalooikonomou Functional Dependencies (based on notes by Silberchatz,Korth, and.
Carnegie Mellon Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos Integrity Constraints.
Relational Normalization Theory. Limitations of E-R Designs Provides a set of guidelines, does not result in a unique database schema Does not provide.
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications Lecture #16: Schema Refinement & Normalization - Functional Dependencies.
1 Normalization. 2 Normal Forms v If a relation is in a certain normal form (BCNF, 3NF etc.), it is known that certain kinds of redundancies are avoided/minimized.
CS Algorithm : Decomposition into 3NF  Obviously, the algorithm for lossless join decomp into BCNF can be used to obtain a lossless join decomp.
Instructor: Amol Deshpande  Data Models ◦ Conceptual representation of the data  Data Retrieval ◦ How to ask questions of the database.
Normal Form Design addendum by C. Zaniolo. ©Silberschatz, Korth and Sudarshan7.2Database System Concepts Normal Form Design Compute the canonical cover.
Schema Refinement and Normalization Nobody realizes that some people expend tremendous energy merely to be normal. Albert Camus.
Databases 6: Normalization
Cs3431 Normalization Part II. cs3431 Attribute Closure : Example Consider R (A, B, C, D, E) with FDs A  B, B  C, CD  E Does A  E hold ? (Is A  E.
Department of Computer Science and Engineering, HKUST Slide 1 7. Relational Database Design.
Introduction to Schema Refinement
©Silberschatz, Korth and Sudarshan7.1Database System Concepts Chapter 7: Relational Database Design First Normal Form Pitfalls in Relational Database Design.
CS 405G: Introduction to Database Systems 16. Functional Dependency.
Relational Database Design by Relational Database Design by Dr.S.Sridhar, Ph.D.(JNUD), RACI(Paris, NICE), RMR(USA), RZFM(Germany) DIRECTOR ARUNAI ENGINEERING.
Schema Refinement and Normal Forms Chapter 19 1 Database Management Systems 3ed, R.Ramakrishnan & J.Gehrke.
CS143 Review: Normalization Theory Q: Is it a good table design? We can start with an ER diagram or with a large relation that contain a sample of the.
SCUJ. Holliday - coen 1784–1 Schedule Today: u Normal Forms. u Section 3.6. Next u Relational Algebra. Read chapter 5 to page 199 After that u SQL Queries.
Normal Forms through BCNF CPSC 356 Database Ellen Walker Hiram College (Includes figures from Database Systems by Connolly & Begg, © Addison Wesley 2002)
Functional Dependencies and Normalization 1 Instructor: Mohamed Eltabakh
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications C. Faloutsos – A. Pavlo Lecture#16: Schema Refinement & Normalization.
Christoph F. Eick: Functional Dependencies, BCNF, and Normalization 1 Functional Dependencies, BCNF and Normalization.
1 Dept. of CIS, Temple Univ. CIS616/661 – Principles of Data Management V. Megalooikonomou Integrity Constraints (based on slides by C. Faloutsos at CMU)
Temple University – CIS Dept. CIS331– Principles of Database Systems V. Megalooikonomou Database Design and Normalization (based on notes by Silberchatz,Korth,
Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke1 Schema Refinement and Normal Forms Chapter 19.
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications Lecture #16: Schema Refinement & Normalization - Functional Dependencies.
Schema Refinement and Normalization Nobody realizes that some people expend tremendous energy merely to be normal. Albert Camus.
CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.
CS 338Database Design and Normal Forms9-1 Database Design and Normal Forms Lecture Topics Measuring the quality of a schema Schema design with normalization.
Carnegie Mellon Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos Database design and normalization.
Rensselaer Polytechnic Institute CSCI-4380 – Database Systems David Goldschmidt, Ph.D.
CS411 Database Systems Kazuhiro Minami 04: Relational Schema Design.
© D. Wong Functional Dependencies (FD)  Given: relation schema R(A1, …, An), and X and Y be subsets of (A1, … An). FD : X  Y means X functionally.
Chapter 8 Relational Database Design. 2 Relational Database Design: Goals n Reduce data redundancy (undesirable replication of data values) n Minimize.
1 Dept. of CIS, Temple Univ. CIS661 – Principles of Data Management V. Megalooikonomou Database design and normalization (based on slides by C. Faloutsos.
Copyright © Curt Hill Schema Refinement II 2 nd NF to 3 rd NF to BCNF.
1 Lecture 9: Database Design Wednesday, January 25, 2006.
Normalization and FUNctional Dependencies. Redundancy: root of several problems with relational schemas: –redundant storage, insert/delete/update anomalies.
Relational Data Model, Review Relation Tuple Attribute Domains Candidate key, primary key Key attribute, non-key attribute.
Normal Forms Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems June 18, 2016 Some slide content courtesy of Susan Davidson.
CSC 411/511: DBMS Design Dr. Nan Wang 1 Schema Refinement and Normal Forms Chapter 19.
Formal definition of a key A key is a set of attributes A 1,..., A n such that for any other attribute B: A 1,..., A n  B A minimal key is a set of attributes.
1 CS122A: Introduction to Data Management Lecture #13: Relational DB Design Theory (II) Instructor: Chen Li.
CS422 Principles of Database Systems Normalization
Normalization First Normal Form (1NF) Boyce-Codd Normal Form (BCNF)
CS422 Principles of Database Systems Normalization
Database Design Dr. M.E. Fayad, Professor
Relational Database Design by Dr. S. Sridhar, Ph. D
Faloutsos & Pavlo SCS /615 Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications Lecture #16: Schema Refinement & Normalization.
Module 5: Overview of Normalization
Lecture #17: Schema Refinement & Normalization - Normal Forms
Temple University – CIS Dept. CIS331– Principles of Database Systems
Sridhar Narayan Normalization Sridhar Narayan
Normalization Edited by: Nada Alhirabi.
Lecture 8: Database Design
Normalisation to 3NF.
Database Design Dr. M.E. Fayad, Professor
Chapter 7a: Overview of Database Design -- Normalization
Functional Dependencies and Normalization
Presentation transcript:

CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications Lecture #17: Schema Refinement & Normalization - Normal Forms (R&G, ch. 19)

CMU SCS Faloutsos & PavloCMU SCS /6152 Overview - detailed DB design and normalization –pitfalls of bad design –decomposition –normal forms

CMU SCS Faloutsos & PavloCMU SCS /6153 Design good tables –sub-goal#1: define what good means –sub-goal#2: fix bad tables in short: we want tables where the attributes depend on the primary key, on the whole key, and nothing but the key Lets see why, and how: Goal

CMU SCS Faloutsos & PavloCMU SCS /6154 Pitfalls takes1 (ssn, c-id, grade, name, address)

CMU SCS Faloutsos & PavloCMU SCS /6155 Pitfalls Bad - why? because: ssn->address, name

CMU SCS Faloutsos & PavloCMU SCS /6156 Pitfalls? Redundancy –??

CMU SCS Faloutsos & PavloCMU SCS /6157 Pitfalls Redundancy –space –(inconsistencies) –insertion/deletion anomalies:

CMU SCS Faloutsos & PavloCMU SCS /6158 Pitfalls insertion anomaly: –jones registers, but takes no class - no place to store his address!

CMU SCS Faloutsos & PavloCMU SCS /6159 Pitfalls deletion anomaly: –delete the last record of smith (we lose his address!)

CMU SCS Faloutsos & PavloCMU SCS /61510 Solution: decomposition split offending table in two (or more), eg.: ??

CMU SCS Faloutsos & PavloCMU SCS /61511 Overview - detailed DB design and normalization –pitfalls of bad design –decomposition lossless join decomp. dependency preserving –normal forms

CMU SCS Faloutsos & PavloCMU SCS /61512 Decompositions There are bad decompositions. Good ones are: lossless and dependency preserving

CMU SCS Faloutsos & PavloCMU SCS /61513 Decompositions - lossy: R1(ssn, grade, name, address) R2(c-id, grade) ssn->name, address ssn, c-id -> grade

CMU SCS Faloutsos & PavloCMU SCS /61514 Decompositions - lossy: can not recover original table with a join! ssn->name, address ssn, c-id -> grade

CMU SCS Faloutsos & PavloCMU SCS /61515 Decompositions - overview There are bad decompositions. Good ones are: lossless and dependency preserving MUST HAVE Nice to have

CMU SCS Faloutsos & PavloCMU SCS /61516 Decompositions example of non-dependency preserving S# -> address, status address -> status S# -> addressS# -> status

CMU SCS Faloutsos & PavloCMU SCS /61517 Decompositions (drill: is it lossless?) S# -> address, status address -> status S# -> addressS# -> status

CMU SCS Faloutsos & PavloCMU SCS /61518 Decompositions - overview There are bad decompositions. Good ones are: #1) lossless and #2) dependency preserving MUST HAVE Nice to have How to automatically determine #1 and #2?

CMU SCS Faloutsos & PavloCMU SCS /61519 Decompositions - lossless Definition: consider schema R, with FD F. R1, R2 is a lossless join decomposition of R if we always have: An easier criterion?

CMU SCS Faloutsos & PavloCMU SCS /61520 Decomposition - lossless Theorem: lossless join decomposition if the joining attribute is a superkey in at least one of the new tables Formally:

CMU SCS Faloutsos & PavloCMU SCS /61521 Decomposition - lossless example: ssn->name, address ssn, c-id -> grade ssn->name, address ssn, c-id -> grade R1 R2

CMU SCS Faloutsos & PavloCMU SCS /61522 Overview - detailed DB design and normalization –pitfalls of bad design –decomposition lossless join decomp. dependency preserving –normal forms

CMU SCS Faloutsos & PavloCMU SCS /61523 Decomposition - depend. pres. informally: we dont want the original FDs to span two tables - counter-example: S# -> address, status address -> status S# -> addressS# -> status

CMU SCS Faloutsos & PavloCMU SCS /61524 Decomposition - depend. pres. informally: we dont want the original FDs to span two tables - counter-example: S# -> address, status address -> status S# -> addressS# -> status (Q: Why is it an issue?)

CMU SCS Faloutsos & PavloCMU SCS /61525 Decomposition - depend. pres. informally: we dont want the original FDs to span two tables - counter-example: S# -> address, status address -> status S# -> addressS# -> status (Q: Why is it an issue?) (A: insert [999, Pitts., E])

CMU SCS Faloutsos & PavloCMU SCS /61526 Decomposition - depend. pres. informally: we dont want the original FDs to span two tables - counter-example: S# -> address, status address -> status S# -> addressS# -> status

CMU SCS Faloutsos & PavloCMU SCS /61527 Decomposition - depend. pres. informally: we dont want the original FDs to span two tables - counter-example: S# -> address, status address -> status S# -> addressS# -> status of the COVER

CMU SCS Decomposition - depend. pres. A subtle point To avoid it, use the canonical cover of the FDs Faloutsos & PavloCMU SCS /61528

CMU SCS Faloutsos & PavloCMU SCS /61529 Decomposition - depend. pres. dependency preserving decomposition: S# -> address, status address -> status S# -> addressaddress -> status (but: S#->status ?)

CMU SCS Faloutsos & PavloCMU SCS /61530 Decomposition - depend. pres. informally: we dont want the original FDs to span two tables. More specifically: … the FDs of the canonical cover.

CMU SCS Faloutsos & PavloCMU SCS /61531 Decomposition - depend. pres. Q: why is dependency preservation good? S# -> addressaddress -> status S# -> address S# -> status (address->status: lost)

CMU SCS Faloutsos & PavloCMU SCS /61532 Decomposition - depend. pres. A1: insert [999, Pitts., E] -> REJECT S# -> addressaddress -> status S# -> address S# -> status (address->status: lost)

CMU SCS Faloutsos & PavloCMU SCS /61533 Decomposition - depend. pres. A2: eg., record that Philly has status A S# -> addressaddress -> status S# -> address S# -> status (address->status: lost)

CMU SCS Faloutsos & PavloCMU SCS /61534 Decomposition - conclusions decompositions should always be lossless – joining attribute -> superkey whenever possible, we want them to be dependency preserving (occasionally, impossible - see STJ example later…) MUST HAVE Nice to have

CMU SCS Faloutsos & PavloCMU SCS /61535 Overview - detailed DB design and normalization –pitfalls of bad design –decomposition (-> how to fix the problem) –normal forms (-> how to detect the problem) BCNF, 3NF (1NF, 2NF)

CMU SCS Faloutsos & PavloCMU SCS /61536 Normal forms - BCNF We saw how to fix bad schemas - but what is a good schema? Answer: good, if it obeys a normal form, ie., a set of rules. Typically: Boyce-Codd Normal form

CMU SCS Faloutsos & PavloCMU SCS /61537 Normal forms - BCNF Defn.: Rel. R is in BCNF wrt F, if informally: everything depends on the full key, and nothing but the key semi-formally: every determinant (of the cover) is a candidate key

CMU SCS Faloutsos & PavloCMU SCS /61538 Normal forms - BCNF Example and counter-example: ssn->name, address ssn, c-id -> grade

CMU SCS Faloutsos & PavloCMU SCS /61539 Normal forms - BCNF Formally: for every FD a->b in F –a->b is trivial (a superset of b) or –a is a superkey

CMU SCS Faloutsos & PavloCMU SCS /61540 Normal forms - BCNF Example and counter-example: ssn->name, address ssn, c-id -> grade Drill: Check formal dfn: a->b trivial, or a is superkey

CMU SCS Faloutsos & PavloCMU SCS /61541 Normal forms - BCNF Example and counter-example: ssn->name, address ssn,name -> address ssn->name, address ssn, c-id -> grade ssn, name -> address ssn, c-id, name -> grade Drill: Check formal dfn: a->b trivial, or a is superkey

CMU SCS Faloutsos & PavloCMU SCS /61542 Normal forms - BCNF Theorem: given a schema R and a set of FD F, we can always decompose it to schemas R1, … Rn, so that –R1, … Rn are in BCNF and –the decompositions are lossless. (but, some decomp. might lose dependencies)

CMU SCS Faloutsos & PavloCMU SCS /61543 Normal forms - BCNF How? algorithm in book: for a relation R - for every FD X->A that violates BCNF, decompose to tables (X,A) and (R-A) - repeat recursively eg. TAKES1(ssn, c-id, grade, name, address) ssn -> name, address ssn, c-id -> grade

CMU SCS Faloutsos & PavloCMU SCS /61544 Normal forms - BCNF eg. TAKES1(ssn, c-id, grade, name, address) ssn -> name, address ssn, c-id -> grade name addressgrade c-id ssn

CMU SCS Faloutsos & PavloCMU SCS /61545 Normal forms - BCNF ssn->name, address ssn, c-id -> grade ssn->name, address ssn, c-id -> grade

CMU SCS Faloutsos & PavloCMU SCS /61546 Normal forms - BCNF pictorially: we want a star shape name addressgrade c-id ssn :not in BCNF

CMU SCS Faloutsos & PavloCMU SCS /61547 Normal forms - BCNF pictorially: we want a star shape B C A G E D or FH

CMU SCS Faloutsos & PavloCMU SCS /61548 Normal forms - BCNF or a star-like: (eg., 2 cand. keys): STUDENT(ssn, st#, name, address) name address ssn st# = name addressssnst#

CMU SCS Faloutsos & PavloCMU SCS /61549 Normal forms - BCNF but not: or B CA D G E D F H

CMU SCS Faloutsos & PavloCMU SCS /61550 Overview - detailed DB design and normalization –pitfalls of bad design –decomposition (-> how to fix the problem) –normal forms (-> how to detect the problem) BCNF, 3NF (1NF, 2NF)

CMU SCS Faloutsos & PavloCMU SCS /61551 Normal forms - BCNF Theorem: given a schema R and a set of FD F, we can always decompose it to schemas R1, … Rn, so that –R1, … Rn are in BCNF and –the decompositions are lossless. (but, some decomp. might lose dependencies) Reminder

CMU SCS Faloutsos & PavloCMU SCS /61552 Normal forms - BCNF Theorem: given a schema R and a set of FD F, we can always decompose it to schemas R1, … Rn, so that –R1, … Rn are in BCNF and –the decompositions are lossless. (but, some decomp. might lose dependencies) How is this possible? Reminder

CMU SCS Faloutsos & PavloCMU SCS /61553 Subtle answer In some rare cases, like the (Student, Teacher, subJect) setting:

CMU SCS Faloutsos & PavloCMU SCS /61554 Normal forms - 3NF consider the classic case: STJ( Student, Teacher, subJect) T-> J S,J -> T is it BCNF? S T J

CMU SCS Faloutsos & PavloCMU SCS /61555 Normal forms - 3NF STJ( Student, Teacher, subJect) T-> J S,J -> T How to decompose it to BCNF? S T J

CMU SCS Faloutsos & PavloCMU SCS /61556 Normal forms - 3NF STJ( Student, Teacher, subJect) T-> J S,J -> T 1) R1(T,J) R2(S,J) (BCNF? - lossless? - dep. pres.? ) 2) R1(T,J) R2(S,T) (BCNF? - lossless? - dep. pres.? )

CMU SCS Faloutsos & PavloCMU SCS /61557 Normal forms - 3NF STJ( Student, Teacher, subJect) T-> J S,J -> T 1) R1(T,J) R2(S,J) (BCNF? Y+Y - lossless? N - dep. pres.? N ) 2) R1(T,J) R2(S,T) (BCNF? Y+Y - lossless? Y - dep. pres.? N )

CMU SCS Faloutsos & PavloCMU SCS /61558 Normal forms - 3NF STJ( Student, Teacher, subJect) T-> J S,J -> T in this case: impossible to have both BCNF and dependency preservation Welcome 3NF!

CMU SCS Faloutsos & PavloCMU SCS /61559 Normal forms - 3NF STJ( Student, Teacher, subJect) T-> J S,J -> T S J T informally, 3NF forgives the red arrow in the canonical cover

CMU SCS Faloutsos & PavloCMU SCS /61560 Normal forms - 3NF STJ( Student, Teacher, subJect) T-> J S,J -> T S J T Formally, a rel. R with FDs F is in 3NF if: for every a->b in F: it is trivial or a is a superkey or b: part of a candidate key

CMU SCS Faloutsos & PavloCMU SCS /61561 Normal forms - 3NF how to bring a schema to 3NF? two algos in book: First one: start from ER diagram and turn to tables then we have a set of tables R1,... Rn which are in 3NF for each FD (X->A) in the cover that is not preserved, create a table (X,A)

CMU SCS Faloutsos & PavloCMU SCS /61562 Normal forms - 3NF how to bring a schema to 3NF? two algos in book: Second one (synthesis) take all attributes of R for each FD (X->A) in the cover, add a table (X,A) if not lossless, add a table with appropriate key

CMU SCS Faloutsos & PavloCMU SCS /61563 Normal forms - 3NF Example: R: ABC F: A->B, C->B Q1: what is the cover? What is the cand. key? Q2: what is the decomposition to 3NF?

CMU SCS Faloutsos & PavloCMU SCS /61564 Normal forms - 3NF Example: R: ABC F: A->B, C->B Q1: what is the cover? What is the cand. key? A1: F is the cover; AB is the cand. key Q2: what is the decomposition to 3NF?

CMU SCS Faloutsos & PavloCMU SCS /61565 Normal forms - 3NF Example: R: ABC F: A->B, C->B Q1: what is the cover? What is the cand. key? A1: F is the cover; AB is the cand. key Q2: what is the decomposition to 3NF? A2: R1(A,B), R2(C,B),... [is it lossless??]

CMU SCS Faloutsos & PavloCMU SCS /61566 Normal forms - 3NF Example: R: ABC F: A->B, C->B Q1: what is the cover? What is the cand. key? A1: F is the cover; AB is the cand. key Q2: what is the decomposition to 3NF? A2: R1(A,B), R2(C,B), R3(A,C)

CMU SCS Faloutsos & PavloCMU SCS /61567 Normal forms - 3NF vs BCNF If R is in BCNF, it is always in 3NF (but not the reverse) In practice, aim for –BCNF; lossless join; and dep. preservation if impossible, we accept –3NF; but insist on lossless join and dep. preservation

CMU SCS Faloutsos & PavloCMU SCS /61568 Normal forms - more details why 3NF? what is 2NF? 1NF? 1NF: attributes are atomic (ie., no set- valued attr., a.k.a. repeating groups) not 1NF

CMU SCS Faloutsos & PavloCMU SCS /61569 Normal forms - more details 2NF: 1NF and non-key attr. fully depend on the key counter-example: TAKES1(ssn, c-id, grade, name, address) ssn -> name, address ssn, c-id -> grade name addressgrade c-id ssn not 2NF

CMU SCS Faloutsos & PavloCMU SCS /61570 Normal forms - more details 3NF: 2NF and no transitive dependencies counter-example: B CA D in 2NF, but not in 3NF

CMU SCS Faloutsos & PavloCMU SCS /61571 Normal forms - more details 4NF, multivalued dependencies etc: IGNORE in practice, E-R diagrams usually lead to tables in BCNF

CMU SCS Faloutsos & PavloCMU SCS /61572 Overview - conclusions DB design and normalization –pitfalls of bad design –decompositions (lossless, dep. preserving) –normal forms (BCNF or 3NF) everything should depend on the key, the whole key, and nothing but the key