Presentation is loading. Please wait.

Presentation is loading. Please wait.

Introduction to Normalization CPSC 356 Database Ellen Walker Hiram College.

Similar presentations


Presentation on theme: "Introduction to Normalization CPSC 356 Database Ellen Walker Hiram College."— Presentation transcript:

1 Introduction to Normalization CPSC 356 Database Ellen Walker Hiram College

2 Building a Schema Start with a list of all attributes, considered as if you had a giant flat database (one relation) with all possible information in one place Divide the attributes into multiple relations –Intuitively –According to formal rules (normalization) This is a formalized alternative to the algorithms we learned before

3 What Makes A Good Schema? 1.Each relation should have clear semantics, i.e. can be easily described in a few words 2.Try to avoid redundancy (to minimize storage space, but also to avoid anomalies) 3.Avoid a design that encourages too many NULL values in a relation. NULL can be ambiguous: N/A vs. unknown vs. not-yet-entered, etc. 4.Don’t split related attributes so that the relationship between them is lost (e.g. make sure LastName and UserID are both in the same relation)

4 Tracking Real Estate Staff Consider a single relation for real estate It contains branch name, branch number, staff name, staff number, staff salary, etc. –One entry for each staff member of each branch –Branch information is repeated for different staff (REDUNDANCY!) –Staff information is repeated if they work in multiple branches (REDUNDANCY!) This is an example of what NOT to do

5 Redundancy-caused Anomalies Insertion Anomalies –A branch with no staff has many NULLs –Entering a new staff member has NULL branch info But branch number and staff number are both part of primary key! (Why) Deletion Anomalies –When the last staff member at a branch is deleted, the branch info is lost Update Anomalies –If we make a change in branch info once, it must be changed in all copies (for all staff).

6 Solving Redundancy Problems Decompose the relation into multiple relations Use Foreign Keys so the complete relation can be reconstructed through a join –Branch: has branch number & branch info –Staff: has staff number, staff info & branch number as foreign key Foreign Keys are exactly the attributes that are in the primary key of the other relation Insertion, deletion & update anomalies are gone! –Consider: add branch with no staff, remove last staff member, update branch info

7 How to Decompose? Decompositions are not (always) intuitively obvious Codd discovered mathematical properties (called Normal Forms) that describe “goodness” of decomposition First, Second, Third normal forms decrease redundancy without loss of information BCNF, Fourth and Fifth normal forms potentially introduce information loss (we will see…) To understand normal forms, start with functional dependencies

8 Functional Dependencies If A and B are attributes, and every value of A is associated with exactly one value of B (so knowing A predicts B), then B is functionally dependent on A (We write this as: A->B) Functional dependency is based on the semantics (meaning) of the attributes. A->B and B->A are two different constraints –Email -> first name is a valid dependency –First name -> Email is not a valid dependency

9 Examples of Functional Dependencies US Zip Code -> State US Area Code -> State Email -> Firstname, lastname HotelNo, RoomNo -> Price JobTitle, ServiceLength -> Salary

10 What are the dependencies? itemplacecustomer-name ringKay jewelersprince charming ringwalmartmiss piggy Place -> item? Item -> place? oilwalmarttin man

11 Finding Dependencies in Data If a value of attribute A is associated with two or more values of B, then it is not true that A->B. If a value of attribute A is associated with exactly one value of B, then it might be true that A->B. Only when every possible value of attribute A is associated with exactly one value of B is it true that A->B.

12 Characteristics of Functional Dependencies for Normalization For any given values of the attributes on the left, there is exactly one possible attribute on the right No future data will ever invalidate the dependency Dependency is nontrivial -- no attributes from the left are repeated on the right

13 Keys & Functional Dependency Remember, a candidate key is a subset of attributes that is (guaranteed) unique for every tuple Therefore, a valid candidate key determines all other attributes in the tuple Therefore, there is a functional dependency from the candidate key to all other non-key attributes of the relation. (Since the primary key is a candidate key, these arguments can also be made for primary keys)

14 Manipulating Functional Dependencies Given a set of dependencies, derive more dependencies using inference rules The closure X + of a set of dependencies is the set of all possible dependencies that can be derived from it.

15 Armstrong’s Inference Rules for Manipulating Dependencies 1. if Y is a subset of X, then X -> Y Alternatively: X,Y -> X(Reflexive) 1. If X->Y then X,Z->Y,Z(Augmentation) 2. If X->Y and Y->Z then X->Z(Transitive)

16 Additional Inference Rules 4.A->A (Self-determination) 5.If A->B,C then A->B and A->C (Decomposition) 6.If A->B and A->C then A->B,C (Union) 7.If A->B and C->D then A,C -> B,D (Composition)

17 When are two sets of FDs equivalent? When we can use inference rules to transform A to B, then A and B are equivalent Problem: it might take a long time to find the right set of inference rules What we need is a “standard form” of FD’s - then we can just compare

18 Finding the Closure F is a set of functional dependencies (e.g. the obvious ones from primary keys) We want to find X+, which is the set of all attributes that are dependent on X (based on F). X+ = X repeat for each dependency Y->Z in F do if Y is a subset of X+ then X+ = X+ union Z until no more can be added to X+

19 Closure Example F is the following set of dependencies: A->B,CC->D A,D -> F What is A+ (all attributes that can be derived from A)? –Initialize A+ = A –Because A->B,C add B,C to A+ –Because C is in A+ and C->D, add D to A+ –Because A and D are in A+, add F to A+ –Therefore A+ is A,B,C,D,F

20 Equivalence Test Are the following sets of FDs equivalent? –AB->C, D->E, AE->G, GD->H, ID->J –ABD->C, ABE->G, GD->EH, IE->J Compute closures for each, if any two are different, they are not equivalent –You will need to consider every left side…

21 Finding a Key Given a relation with attributes ABCDEFGHIJ and the following FDs, find a candidate key for the relation –AB->C, D->E, AE->G, GD->H, ID->J A candidate key is a subset of attributes that has the entire set of attributes as its closure. –Let’s try ABD…

22 What is Normalization? Formal technique for analyzing relations based on primary key (or candidate keys) and functional dependencies Series of tests (normal forms), each of which is harder to “pass” –Normal forms 1NF, 2NF, 3NF, BCNF depend on functional dependencies –Higher forms (4NF, 5NF) based on other dependencies To avoid update anomalies without loss, normalize to 3NF.


Download ppt "Introduction to Normalization CPSC 356 Database Ellen Walker Hiram College."

Similar presentations


Ads by Google