Normalization
Sridhar Narayan, narayans@uncw.edu

EMP_PROJ
SSN  PNUMBER  HOURS  ENAME  PNAME       PLOC
E1   P1       20     Joe    CIS Roof    UNCW
E1   P2       20     Joe    Restaurant  Mayfaire
E2   P1       40     Joe    CIS Roof    UNCW
Something feels wrong about this design:
– Try adding a row: insertion anomaly
– Try deleting a row: deletion anomaly
– Try updating a row: update anomaly
We need a formal way to reason about what is wrong with it and how to fix it.

Functional Dependency
A functional dependency is a constraint between attribute sets in a relation. If X and Y are sets of attributes of a relation R, and whenever two tuples in R have the same X-values they also have the same Y-values, we say that X functionally determines Y.

Functional Dependency
Written as X -> Y
– X functionally determines Y
– Y is functionally determined by X
– X is the determinant, Y is the dependent
Examples
– SSN -> SSN (trivial dependency)
– PNUMBER -> PNAME
– SSN -> ENAME
– SSN, PNUMBER -> HOURS

Functional Dependency
– Holds between sets of attributes, not just single attributes
– Holds for all time, not just for a particular instance (snapshot) of a relation
– Formally states constraints that exist for the relation
– These constraints are in addition to those imposed by primary keys and foreign keys
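
The definition can be tested against a snapshot of EMP_PROJ. What follows is a minimal Python sketch; the helper name `fd_holds` and the dictionary-per-row layout are illustrative assumptions, not part of the slides. Because a dependency must hold for all time, a passing check on one instance is only evidence, never proof, whereas a failing check does refute the dependency.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in this particular instance."""
    seen = {}  # maps each X-value to the Y-value first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # same X-values, different Y-values
    return True

EMP_PROJ = [
    {"SSN": "E1", "PNUMBER": "P1", "HOURS": 20, "ENAME": "Joe",
     "PNAME": "CIS Roof", "PLOC": "UNCW"},
    {"SSN": "E1", "PNUMBER": "P2", "HOURS": 20, "ENAME": "Joe",
     "PNAME": "Restaurant", "PLOC": "Mayfaire"},
    {"SSN": "E2", "PNUMBER": "P1", "HOURS": 40, "ENAME": "Joe",
     "PNAME": "CIS Roof", "PLOC": "UNCW"},
]

print(fd_holds(EMP_PROJ, ["SSN"], ["ENAME"]))             # True
print(fd_holds(EMP_PROJ, ["SSN", "PNUMBER"], ["HOURS"]))  # True
print(fd_holds(EMP_PROJ, ["ENAME"], ["SSN"]))             # False: Joe is E1 and E2
print(fd_holds(EMP_PROJ, ["HOURS"], ["SSN"]))             # True, but only in this snapshot
```

The last call shows why a snapshot is not enough: HOURS -> SSN happens to hold in these three rows, yet nothing about the application makes it a real constraint.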

Functional dependencies and keys
– If X functionally determines all attributes of R, then X is a super key.
– If a super key X is irreducible, i.e. no attribute can be removed from X without losing that property, then X is a candidate key.
– Attributes that are part of a candidate key are key attributes.

Examples
Super key:
– SSN, PNUMBER, PNAME -> SSN, PNUMBER, HOURS, ENAME, PNAME, PLOC
Candidate key:
– SSN, PNUMBER -> SSN, PNUMBER, HOURS, ENAME, PNAME, PLOC
SSN  PNUMBER  HOURS  ENAME  PNAME       PLOC
E1   P1       20     Joe    CIS Roof    UNCW
E1   P2       20     Joe    Restaurant  Mayfaire
E2   P1       40     Joe    CIS Roof    UNCW
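
The super key and candidate key tests can be sketched with an attribute-closure computation: X is a super key when the closure of X under the dependencies covers every attribute of R. The function names below (`closure`, `is_superkey`, `is_candidate_key`) are illustrative, and the dependency list is the one stated for EMP_PROJ in these slides.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_superkey(attrs, all_attrs, fds):
    return closure(attrs, fds) >= set(all_attrs)

def is_candidate_key(attrs, all_attrs, fds):
    """A super key from which no single attribute can be removed."""
    attrs = set(attrs)
    return is_superkey(attrs, all_attrs, fds) and not any(
        is_superkey(attrs - {a}, all_attrs, fds) for a in attrs
    )

R = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOC"}
FDS = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOC"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]

print(is_superkey({"SSN", "PNUMBER", "PNAME"}, R, FDS))       # True
print(is_candidate_key({"SSN", "PNUMBER", "PNAME"}, R, FDS))  # False: PNAME is not needed
print(is_candidate_key({"SSN", "PNUMBER"}, R, FDS))           # True
```

Checking single-attribute removals is sufficient for irreducibility, because any superset of a super key is itself a super key.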

Redundancy
If a nontrivial dependency A -> B holds in a relation R and A is not a super key of R (that is, A does not contain a candidate key), then R will involve some redundancy.
EMP_PROJ(SSN, PNUMBER, HOURS, ENAME, PNAME, PLOC)
Intuitively, every functional dependency in a relation should have a candidate key as its determinant to eliminate redundancy.

Normalization
A process that uses functional dependencies to identify relation schemas that have an undesirable form (redundancy) and decomposes them into smaller schemas in which the redundancy has been eliminated.

Decomposition
A decomposition should be
– Lossless join: the natural join of the parts recovers the original relation exactly (without spurious tuples)
– Dependency preserving: the dependencies can be checked without requiring a join

Lossy decomposition
SSN  PNUMBER  HOURS  ENAME
E1   P1       20     Joe
E1   P2       20     Joe
E2   P1       40     Joe

ENAME  PNAME       PLOC
Joe    CIS Roof    UNCW
Joe    Restaurant  Mayfaire

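Natural join to recover original
SSN  PNUMBER  HOURS  ENAME  PNAME       PLOC
E1   P1       20     Joe    CIS Roof    UNCW
E1   P2       20     Joe    Restaurant  Mayfaire
E2   P1       40     Joe    CIS Roof    UNCW
E2   P1       40     Joe    Restaurant  Mayfaire
The last row shown is a spurious tuple that is not in EMP_PROJ (the join on ENAME produces others as well). ENAME alone cannot tell the two employees named Joe apart, so the original relation is not recovered.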

Heath’s Theorem
If relation R = {A, B, C}, where A, B and C are attribute sets, and A -> B, then R1 = {A, B} and R2 = {A, C} form a lossless decomposition of R.
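
The contrast between the lossy split on ENAME and a split chosen by Heath’s theorem can be reproduced on the EMP_PROJ snapshot. This is a small sketch under stated assumptions: `project`, `natural_join` and `same_relation` are hypothetical helpers written for this illustration, and the Heath split uses SSN -> ENAME with A = {SSN}, B = {ENAME} and C = the remaining attributes.

```python
def project(rows, attrs):
    """Projection with duplicate elimination."""
    result = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in result:
            result.append(t)
    return result

def natural_join(left, right):
    """Combine tuples that agree on every shared attribute."""
    result = []
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in l.keys() & r.keys()):
                merged = {**l, **r}
                if merged not in result:
                    result.append(merged)
    return result

def same_relation(a, b):
    """Set equality on lists of tuples (order is irrelevant)."""
    return all(t in b for t in a) and all(t in a for t in b)

EMP_PROJ = [
    {"SSN": "E1", "PNUMBER": "P1", "HOURS": 20, "ENAME": "Joe",
     "PNAME": "CIS Roof", "PLOC": "UNCW"},
    {"SSN": "E1", "PNUMBER": "P2", "HOURS": 20, "ENAME": "Joe",
     "PNAME": "Restaurant", "PLOC": "Mayfaire"},
    {"SSN": "E2", "PNUMBER": "P1", "HOURS": 40, "ENAME": "Joe",
     "PNAME": "CIS Roof", "PLOC": "UNCW"},
]

# Lossy: the only shared attribute is ENAME, which determines nothing else.
lossy = natural_join(
    project(EMP_PROJ, ["SSN", "PNUMBER", "HOURS", "ENAME"]),
    project(EMP_PROJ, ["ENAME", "PNAME", "PLOC"]),
)
spurious = [t for t in lossy if t not in EMP_PROJ]

# Heath: split on SSN -> ENAME, so the shared attribute SSN determines ENAME.
heath = natural_join(
    project(EMP_PROJ, ["SSN", "ENAME"]),
    project(EMP_PROJ, ["SSN", "PNUMBER", "HOURS", "PNAME", "PLOC"]),
)

print(len(lossy), len(spurious))       # 6 3
print(same_relation(heath, EMP_PROJ))  # True
```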

Levels of normalization
– First normal form: 1NF
– Second normal form: 2NF
– Third normal form: 3NF
– Boyce-Codd Normal Form: BCNF
Each level imposes increasingly stringent requirements.

Normal Forms
[Diagram relating 1NF, 2NF, 3NF and BCNF]

First normal form
A relation is in 1NF if all attribute values are atomic. (By definition, all relations are in 1NF.)
Assume that a department can have multiple locations, like {Lumberton, Red Springs, Raeford}:
D_NAME    D_NUM  MGR_SSN    D_LOCATIONS
RESEARCH  5      334619276  {Lumberton, Red Springs, Raeford}
This relation is not in 1NF.

Resolution?
D_NAME    D_NUM  MGR_SSN    D_LOCATIONS
RESEARCH  5      334619276  Lumberton
RESEARCH  5      334619276  Red Springs
RESEARCH  5      334619276  Raeford

Decomposition
(D_NAME, D_NUM, MGR_SSN, D_LOCATIONS) is decomposed into:
– (D_NAME, D_NUM, MGR_SSN)
– (D_NUM, D_LOCATIONS)
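
Both ways of reaching 1NF, repeating the row once per location or moving the locations into their own relation, can be sketched in a few lines of Python. The helper names `flatten` and `split_out` are hypothetical, and the sample row assumes that the flattened slide value reads as D_NUM = 5 and MGR_SSN = 334619276.

```python
DEPARTMENT = [
    {"D_NAME": "RESEARCH", "D_NUM": 5, "MGR_SSN": "334619276",
     "D_LOCATIONS": ["Lumberton", "Red Springs", "Raeford"]},
]

def flatten(rows, multi):
    """Option 1: one row per value of the multi-valued attribute."""
    return [{**row, multi: value} for row in rows for value in row[multi]]

def split_out(rows, key, multi):
    """Option 2: move the multi-valued attribute into its own relation."""
    base = [{a: v for a, v in row.items() if a != multi} for row in rows]
    detail = [{key: row[key], multi: value}
              for row in rows for value in row[multi]]
    return base, detail

flat = flatten(DEPARTMENT, "D_LOCATIONS")
base, detail = split_out(DEPARTMENT, "D_NUM", "D_LOCATIONS")

print(len(flat))   # 3 rows, each repeating D_NAME, D_NUM and MGR_SSN
print(base)        # one row without D_LOCATIONS
print(detail)      # three (D_NUM, D_LOCATIONS) rows
```

The second option avoids repeating the department name and manager for every location, which is why the slides move on to decomposition.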

Second Normal Form: 2NF
A relation is in 2NF if
– it is in 1NF, and
– every non-key attribute is fully (irreducibly) dependent on the primary key

Example: EMP_PROJ
EMP_PROJ(SSN, PNUMBER, HOURS, ENAME, PNAME, PLOC)
Functional dependencies?
– SSN -> ENAME
– PNUMBER -> PNAME, PLOC
– {SSN, PNUMBER} -> HOURS
The relation is not in 2NF: the non-key attributes ENAME, PNAME and PLOC are not fully dependent on the primary key.

Solution? Decompose
EMP_PROJ is decomposed into:
– (SSN, PNUMBER, ENAME, PNAME, PLOC)
– (SSN, PNUMBER, HOURS)

Decompose further…
(SSN, PNUMBER, ENAME, PNAME, PLOC) is decomposed into:
– (SSN, PNUMBER, PNAME, PLOC)
– (SSN, ENAME)

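And a little more…
(SSN, PNUMBER, PNAME, PLOC) is decomposed into:
– (SSN, PNUMBER)
– (PNUMBER, PNAME, PLOC)
(SSN, PNUMBER), labeled 3b on the slide, is a part of 1a, the relation (SSN, PNUMBER, HOURS), so drop it.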

2NF Normalization
– (SSN, PNUMBER, HOURS)
– (SSN, ENAME)
– (PNUMBER, PNAME, PLOC)

More than one way to get here
(SSN, PNUMBER, HOURS, ENAME, PNAME, PLOC) is decomposed into:
– (PNUMBER, PNAME, PLOC)
– (SSN, PNUMBER, HOURS, ENAME)

And a little bit more
(SSN, PNUMBER, HOURS, ENAME) is decomposed into:
– (SSN, PNUMBER, HOURS)
– (SSN, ENAME)
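
The 2NF decomposition of EMP_PROJ can be derived mechanically from its dependencies. The sketch below is a simplification, not a general algorithm: it inspects only the dependencies that are listed (not ones implied by them) and assumes each determinant appears once. The function names `partial_dependencies` and `to_2nf` are hypothetical.

```python
ATTRS = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOC"}
KEY = {"SSN", "PNUMBER"}
FDS = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOC"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]

def partial_dependencies(key, fds):
    """Listed FDs whose determinant is a proper subset of the key and
    whose dependent side contains at least one non-key attribute."""
    key = set(key)
    return [(set(lhs), set(rhs) - key) for lhs, rhs in fds
            if set(lhs) < key and set(rhs) - key]

def to_2nf(attrs, key, fds):
    """Move each partially dependent group into its own relation."""
    remaining = set(attrs)
    relations = []
    for lhs, rhs in partial_dependencies(key, fds):
        relations.append(lhs | rhs)  # determinant plus what it determines
        remaining -= rhs             # the determinant stays behind as a link
    relations.append(remaining)
    return relations

for relation in to_2nf(ATTRS, KEY, FDS):
    print(sorted(relation))
```

For EMP_PROJ this yields (SSN, ENAME), (PNUMBER, PNAME, PLOC) and (SSN, PNUMBER, HOURS), the same three relations both decomposition routes arrive at.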

3NF Normalization
A relation is in 3NF if
– it is in 2NF, and
– the non-key attributes are mutually independent, that is, no functional dependencies exist between non-key attributes

Example: EMP_DEPT
EMP_DEPT(SSN, ENAME, DOB, ADDRESS, DNUM, DNAME, DMGRSSN)
Functional dependencies?
– SSN -> {ENAME, DOB, ADDRESS, DNUM}
– DNUM -> {DNAME, DMGRSSN}
Redundancy? Is the relation in 1NF? 2NF? 3NF?
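
One way to work through the EMP_DEPT questions is to look for listed dependencies that run between non-key attributes. The sketch assumes SSN is the only candidate key; the function names are hypothetical and only the listed dependencies are inspected. With a single-attribute key there can be no partial dependency, so the relation is in 2NF, but DNUM -> {DNAME, DMGRSSN} relates non-key attributes, so it is not in 3NF.

```python
ATTRS = {"SSN", "ENAME", "DOB", "ADDRESS", "DNUM", "DNAME", "DMGRSSN"}
KEY = {"SSN"}  # assumption: SSN is the only candidate key
FDS = [
    ({"SSN"}, {"ENAME", "DOB", "ADDRESS", "DNUM"}),
    ({"DNUM"}, {"DNAME", "DMGRSSN"}),
]

def non_key_dependencies(key, fds):
    """Listed FDs from non-key attributes to other non-key attributes."""
    key = set(key)
    return [(set(lhs), set(rhs) - key) for lhs, rhs in fds
            if set(lhs).isdisjoint(key) and set(rhs) - key]

def to_3nf(attrs, key, fds):
    """Move each non-key dependency into its own relation."""
    remaining = set(attrs)
    relations = []
    for lhs, rhs in non_key_dependencies(key, fds):
        relations.append(lhs | rhs)
        remaining -= rhs  # DNUM stays behind to link the two relations
    relations.append(remaining)
    return relations

print(non_key_dependencies(KEY, FDS))
for relation in to_3nf(ATTRS, KEY, FDS):
    print(sorted(relation))
```

The result is a department relation (DNUM, DNAME, DMGRSSN) and an employee relation (SSN, ENAME, DOB, ADDRESS, DNUM), so each department's name and manager are stored once.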

BCNF Normalization
S# and SNAME (Supplier# and Supplier Name) are unique.
FDs
– S# -> SNAME
– SNAME -> S#
– S#, P# -> QTY
– SNAME, P# -> QTY
Candidate keys
– {S#, P#} and {SNAME, P#}
S#  SNAME        P#  QTY
S1  Acme Supply  P1  100
S2  Gem Mfg      P1  200
S1  Acme Supply  P2  400

BCNF Normalization
Redundancy? 1NF? 2NF? 3NF?
S#  SNAME        P#  QTY
S1  Acme Supply  P1  100
S2  Gem Mfg      P1  200
S1  Acme Supply  P2  400

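BCNF
A relation is in BCNF if and only if the only determinants (of nontrivial, irreducible dependencies) are candidate keys.
FDs
– S# -> SNAME
– SNAME -> S#
– S#, P# -> QTY
– SNAME, P# -> QTY
Here S# and SNAME are determinants, but neither is a candidate key on its own, so the relation is not in BCNF.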

BCNF Normalization
S#  P#  QTY
S1  P1  100
S2  P1  200
S1  P2  400

S#  SNAME
S1  Acme Supply
S2  Gem Mfg
The second relation has two candidate keys: S# and SNAME.
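
The BCNF condition can be checked by testing whether the determinant of every nontrivial dependency is a super key. The following is a sketch under stated assumptions: `closure` and `bcnf_violations` are illustrative names, and the dependencies for the two decomposed relations are written out by hand rather than derived.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(attrs, fds):
    """Nontrivial FDs whose determinant is not a super key of attrs."""
    attrs = set(attrs)
    return [(set(lhs), set(rhs)) for lhs, rhs in fds
            if not set(rhs) <= set(lhs)           # skip trivial FDs
            and not attrs <= closure(lhs, fds)]   # determinant is no super key

SUPPLY = {"S#", "SNAME", "P#", "QTY"}
SUPPLY_FDS = [
    ({"S#"}, {"SNAME"}),
    ({"SNAME"}, {"S#"}),
    ({"S#", "P#"}, {"QTY"}),
    ({"SNAME", "P#"}, {"QTY"}),
]

# The two relations after decomposition, with the FDs that apply to each.
SHIPMENT = {"S#", "P#", "QTY"}
SHIPMENT_FDS = [({"S#", "P#"}, {"QTY"})]
SUPPLIER = {"S#", "SNAME"}
SUPPLIER_FDS = [({"S#"}, {"SNAME"}), ({"SNAME"}, {"S#"})]

print(bcnf_violations(SUPPLY, SUPPLY_FDS))      # S# -> SNAME and SNAME -> S#
print(bcnf_violations(SHIPMENT, SHIPMENT_FDS))  # []
print(bcnf_violations(SUPPLIER, SUPPLIER_FDS))  # []
```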
