Introduction to Normalization CPSC 356 Database Ellen Walker Hiram College.


Building a Schema
Start with a list of all attributes, considered as if you had a giant flat database (one relation) with all possible information in one place.
Divide the attributes into multiple relations:
– Intuitively
– According to formal rules (normalization)
This is a formalized alternative to the algorithms we learned before.

What Makes A Good Schema?
1. Each relation should have clear semantics, i.e., it can be easily described in a few words.
2. Try to avoid redundancy (to minimize storage space, but also to avoid anomalies).
3. Avoid a design that encourages too many NULL values in a relation. NULL can be ambiguous: N/A vs. unknown vs. not-yet-entered, etc.
4. Don’t split related attributes so that the relationship between them is lost (e.g., make sure LastName and UserID are both in the same relation).

Tracking Real Estate Staff
Consider a single relation for real estate.
It contains branch name, branch number, staff name, staff number, staff salary, etc.
– One entry for each staff member of each branch
– Branch information is repeated for different staff (REDUNDANCY!)
– Staff information is repeated if they work in multiple branches (REDUNDANCY!)
This is an example of what NOT to do.

Redundancy-caused Anomalies
Insertion Anomalies
– A branch with no staff has many NULLs
– A new staff member not yet assigned to a branch has NULL branch info
– But branch number and staff number are both part of the primary key! (Why?)
Deletion Anomalies
– When the last staff member at a branch is deleted, the branch info is lost
Update Anomalies
– If branch info changes, it must be changed in every copy (once for each staff member at that branch).

Solving Redundancy Problems
Decompose the relation into multiple relations.
Use Foreign Keys so the complete relation can be reconstructed through a join:
– Branch: has branch number & branch info
– Staff: has staff number, staff info & branch number as foreign key
Foreign Keys are exactly the attributes that are in the primary key of the other relation.
Insertion, deletion & update anomalies are gone!
– Consider: add branch with no staff, remove last staff member, update branch info
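The decomposition above can be sketched in a few lines of Python. This is only an illustration: the attribute layout (branch number, branch city, staff number, staff name) and the sample rows are hypothetical, not taken from the slides, and each staff member is assumed to work at a single branch.

```python
# Hypothetical flat relation: (branch_no, branch_city, staff_no, staff_name).
# Branch info is repeated once per staff member.
flat = [
    ("B1", "London", "S1", "Ann"),
    ("B1", "London", "S2", "Bob"),
    ("B2", "Paris", "S3", "Cy"),
]

# Decompose: Branch keeps branch info once; Staff keeps branch_no as a foreign key.
branch = {(b_no, city) for b_no, city, _, _ in flat}
staff = {(s_no, name, b_no) for b_no, _, s_no, name in flat}

# Reconstruct the flat relation by joining on branch_no.
joined = {
    (b_no, city, s_no, name)
    for (s_no, name, fk) in staff
    for (b_no, city) in branch
    if fk == b_no
}
print(joined == set(flat))

# A branch with no staff can now be added without any NULL staff columns.
branch.add(("B3", "Rome"))
```

Because branch info is stored once per branch, updating it touches one tuple, and removing the last staff member of a branch leaves the Branch tuple in place.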

How to Decompose?
Decompositions are not (always) intuitively obvious.
Codd discovered mathematical properties (called Normal Forms) that describe “goodness” of decomposition.
First, Second, Third normal forms decrease redundancy without loss of information.
BCNF, Fourth and Fifth normal forms potentially introduce information loss (we will see…).
To understand normal forms, start with functional dependencies.

Functional Dependencies
If A and B are attributes, and every value of A is associated with exactly one value of B (so knowing A predicts B), then B is functionally dependent on A. (We write this as: A->B)
Functional dependency is based on the semantics (meaning) of the attributes.
A->B and B->A are two different constraints:
– -> first name is a valid dependency
– First name -> is not a valid dependency

Examples of Functional Dependencies
US Zip Code -> State
US Area Code -> State
-> Firstname, lastname
HotelNo, RoomNo -> Price
JobTitle, ServiceLength -> Salary

What are the dependencies?

item    place           customer-name
ring    Kay jewelers    prince charming
ring    walmart         miss piggy
oil     walmart         tin man

Place -> item?
Item -> place?

Finding Dependencies in Data
If a value of attribute A is associated with two or more values of B, then it is not true that A->B.
If a value of attribute A is associated with exactly one value of B, then it might be true that A->B.
Only when every possible value of attribute A is associated with exactly one value of B is it true that A->B.
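A minimal Python sketch of the first rule: sample data can refute a dependency but never prove one. The function name `violates` and the dictionary layout are illustrative choices; the rows are the item/place/customer-name example from the earlier slide.

```python
def violates(rows, lhs, rhs):
    """Return True if the rows contain a counterexample to lhs -> rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first value seen for a key is remembered; a different one refutes the FD.
        if seen.setdefault(key, value) != value:
            return True
    return False

rows = [
    {"item": "ring", "place": "Kay jewelers", "customer-name": "prince charming"},
    {"item": "ring", "place": "walmart", "customer-name": "miss piggy"},
    {"item": "oil", "place": "walmart", "customer-name": "tin man"},
]

print(violates(rows, ["place"], ["item"]))   # walmart sells both ring and oil
print(violates(rows, ["item"], ["place"]))   # ring is sold at two places
print(violates(rows, ["customer-name"], ["item"]))
```

A result of False only means the sample contains no counterexample; whether the dependency really holds depends on the meaning of the attributes.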

Characteristics of Functional Dependencies for Normalization
For any given values of the attributes on the left, there is exactly one possible value for each attribute on the right.
No future data will ever invalidate the dependency.
The dependency is nontrivial: no attributes from the left are repeated on the right.

Keys & Functional Dependency
Remember, a candidate key is a subset of attributes that is (guaranteed) unique for every tuple.
Therefore, a valid candidate key determines all other attributes in the tuple.
Therefore, there is a functional dependency from the candidate key to all other non-key attributes of the relation.
(Since the primary key is a candidate key, these arguments can also be made for primary keys.)

Manipulating Functional Dependencies
Given a set of dependencies, derive more dependencies using inference rules.
The closure X+ of a set of dependencies X is the set of all possible dependencies that can be derived from it.

Armstrong’s Inference Rules for Manipulating Dependencies
1. If Y is a subset of X, then X -> Y. Alternatively: X,Y -> X (Reflexive)
2. If X->Y then X,Z->Y,Z (Augmentation)
3. If X->Y and Y->Z then X->Z (Transitive)

Additional Inference Rules
4. A->A (Self-determination)
5. If A->B,C then A->B and A->C (Decomposition)
6. If A->B and A->C then A->B,C (Union)
7. If A->B and C->D then A,C -> B,D (Composition)

When are two sets of FDs equivalent?
When we can use inference rules to transform set A into set B (and B into A), then A and B are equivalent.
Problem: it might take a long time to find the right sequence of inference rules.
What we need is a “standard form” of FDs; then we can just compare.

Finding the Closure
F is a set of functional dependencies (e.g. the obvious ones from primary keys).
We want to find X+, which is the set of all attributes that are dependent on X (based on F).

X+ = X
repeat
    for each dependency Y->Z in F do
        if Y is a subset of X+ then
            X+ = X+ union Z
until no more can be added to X+

Closure Example
F is the following set of dependencies:
A->B,C
C->D
A,D->F
What is A+ (all attributes that can be derived from A)?
– Initialize A+ = A
– Because A->B,C add B,C to A+
– Because C is in A+ and C->D, add D to A+
– Because A and D are in A+, add F to A+
– Therefore A+ is A,B,C,D,F
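The closure algorithm and the example above can be sketched in Python as follows. This is one possible rendering, not the course's own code: it assumes every attribute is a single letter, so a string such as "BC" stands for the attribute set {B, C}, and each dependency is a (left, right) pair.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:  # repeat ... until no more can be added
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# The dependencies from the example: A->B,C  C->D  A,D->F
F = [("A", "BC"), ("C", "D"), ("AD", "F")]
print("".join(sorted(closure("A", F))))  # ABCDF
```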

Equivalence Test
Are the following sets of FDs equivalent?
– AB->C, D->E, AE->G, GD->H, ID->J
– ABD->C, ABE->G, GD->EH, IE->J
Compute closures for each; if any two are different, they are not equivalent.
– You will need to consider every left side…
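One way to run this test mechanically is sketched below. It assumes single-letter attributes and (left, right) string pairs, and the helper names `covers` and `equivalent` are illustrative. Set F covers set G when, for every dependency in G, the right side lies in the closure of its left side under F; the two sets are equivalent when each covers the other.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def covers(f, g):
    """True if every dependency in g can be derived from f."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in g)

def equivalent(f, g):
    return covers(f, g) and covers(g, f)

F1 = [("AB", "C"), ("D", "E"), ("AE", "G"), ("GD", "H"), ("ID", "J")]
F2 = [("ABD", "C"), ("ABE", "G"), ("GD", "EH"), ("IE", "J")]

print(equivalent(F1, F2))
print(sorted(closure("AB", F1)), sorted(closure("AB", F2)))
```

Under these assumptions the two sets are not equivalent: the closure of AB under the second set is just AB, so AB->C from the first set cannot be derived from it.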

Finding a Key
Given a relation with attributes ABCDEFGHIJ and the following FDs, find a candidate key for the relation:
– AB->C, D->E, AE->G, GD->H, ID->J
A candidate key is a subset of attributes that has the entire set of attributes as its closure.
– Let’s try ABD…
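The search can be sketched as a brute-force loop over attribute subsets, smallest first. This is an illustrative approach rather than the method used in the course; it again assumes single-letter attributes, and `candidate_keys` is a hypothetical helper name. Trying subsets in order of size and skipping supersets of keys already found ensures each reported key is minimal.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(all_attrs, fds):
    """All minimal attribute sets whose closure is the whole relation."""
    everything = set(all_attrs)
    keys = []
    for size in range(1, len(everything) + 1):
        for combo in combinations(sorted(everything), size):
            subset = set(combo)
            if any(key <= subset for key in keys):
                continue  # a superset of a known key is not minimal
            if closure(subset, fds) == everything:
                keys.append(subset)
    return keys

R = "ABCDEFGHIJ"
F = [("AB", "C"), ("D", "E"), ("AE", "G"), ("GD", "H"), ("ID", "J")]

print(sorted(closure("ABD", F)))  # what ABD determines
print([sorted(key) for key in candidate_keys(R, F)])
```

With these dependencies the closure of ABD leaves out F, I and J, so ABD alone is not a key. F and I never appear on a right-hand side, so any key must include them, and the search reports ABDFI as the only candidate key.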

What is Normalization?
Formal technique for analyzing relations based on primary key (or candidate keys) and functional dependencies.
Series of tests (normal forms), each of which is harder to “pass”:
– Normal forms 1NF, 2NF, 3NF, BCNF depend on functional dependencies
– Higher forms (4NF, 5NF) are based on other dependencies
To avoid update anomalies without loss, normalize to 3NF.