
Normalization
David J. Stucki


1 Normalization
David J. Stucki

2 Outline
- Informal Design Guidelines
- Normal Forms
  - 1NF
  - 2NF
  - 3NF
  - BCNF
  - 4NF

3 What makes a “good” model?
How do we judge the quality of a relational model?
- What makes one model of the data better than another model?
- Implicit goals:
  - Information preservation: maintain all concepts described by our higher-level models (ER)
  - Minimum redundancy: keep redundant storage of the same information to a minimum; the fewer copies, the better

4 Informal Design Guidelines (1)
What entity does this relation describe?
- Hard to put into words
  - Not “Parts”: the same part shows up multiple times
  - Not “Suppliers”: the same supplier shows up multiple times
  - Not easy to describe as a single entity: it shows a list of parts and the suppliers who supply those parts

5 Informal Design Guidelines (1)
What entity does this relation describe?
- The problem here is that the semantics of the relation are unclear
  - Semantics: the meaning behind the attribute values in a tuple
- We are better off if we design our relations to have clear semantics

6 Informal Design Guidelines (1)
Guideline 1
- Design each relation schema so that it has an easy-to-explain meaning
- Do not combine attributes from multiple entity types or relationship types into a single relation

7 Informal Design Guidelines (1)
Poor Design:
  SUPPLIER-PARTS (S#, SNAME, STATUS, CITY, P#, PNAME, COLOR, WEIGHT, QTY)
Better Design:
  SUPPLIER (S#, SNAME, STATUS, CITY)
  PARTS (P#, PNAME, COLOR, WEIGHT, QTY)

8 Informal Design Guidelines (1)
Poor Design:
  EMP_DEPT (Ssn, Ename, Bdate, Address, Dnumber, Dname, Dmgr_ssn)
Better Design:
  EMP (Ssn, Ename, Bdate, Address, Dnumber)
  DEPT (Dname, Dmgr_ssn)

9 Informal Design Guidelines (2)
What about this redundant information?
- Part number shows up multiple times
- Supplier city shows up multiple times
- Supplier name shows up multiple times

10 Informal Design Guidelines (2)
What about this redundant information?
- Every repeated entry wastes storage
  - More data in the database also hurts performance: retrieval time, insertion time
- The best designs waste the least storage

11 Informal Design Guidelines (2)
Redundant information can cause other problems too
- “Update Anomalies”
- Logical problems that stem from poor choices in relational representation

12 Insertion Anomalies (2)
If we want to add a new part, we need to:
- Have a supplier, OR
- Put NULL values in for the supplier
- But S# is part of the primary key for this relation!
  - A NULL there violates integrity constraints
  - We must have both a supplier and a part before we can add either of them
- Even if S# were not part of the primary key, this would still be bad
  - We would need a supplier for every part, even though they are two different things
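The insertion anomaly can be reproduced with a small sketch using Python's built-in sqlite3 module. The table and column names (supplier_parts, sno, pno) and the part P9 are hypothetical stand-ins for the SUPPLIER-PARTS relation. The key columns are declared NOT NULL explicitly because SQLite does not enforce that on an ordinary composite primary key by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE supplier_parts (
        sno   TEXT NOT NULL,   -- S# (hypothetical column name)
        sname TEXT,
        pno   TEXT NOT NULL,   -- P# (hypothetical column name)
        pname TEXT,
        qty   INTEGER,
        PRIMARY KEY (sno, pno)
    )
    """
)

try:
    # A new part with no supplier yet: the only option is NULL for S#.
    conn.execute(
        "INSERT INTO supplier_parts VALUES (NULL, NULL, 'P9', 'Washer', NULL)"
    )
    insert_rejected = False
except sqlite3.IntegrityError as exc:
    insert_rejected = True
    print("insert rejected:", exc)

# The part was never stored, so the table is still empty.
part_count = conn.execute("SELECT COUNT(*) FROM supplier_parts").fetchone()[0]
```

The part cannot be recorded at all until some supplier is attached to it, which is the anomaly the slide describes.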

13 Deletion Anomalies (2)
Suppose we stop needing Part P2 and delete it from the database
- If P2 was the only part supplied by S3, we have now deleted all of the information related to supplier S3
- Not good: just because we have stopped using a part does not mean we have dropped its supplier entirely
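A minimal sketch of the deletion anomaly, again with sqlite3. The rows are invented for illustration (the original figure is not part of this transcript); the only property that matters is that S3 supplies nothing except P2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE supplier_parts ("
    "sno TEXT, sname TEXT, city TEXT, pno TEXT, qty INTEGER, "
    "PRIMARY KEY (sno, pno))"
)
# Illustrative rows only: S3 appears solely in connection with P2.
conn.executemany(
    "INSERT INTO supplier_parts VALUES (?, ?, ?, ?, ?)",
    [
        ("S1", "Smith", "London", "P1", 300),
        ("S1", "Smith", "London", "P2", 200),
        ("S3", "Blake", "Paris", "P2", 200),
    ],
)


def known_suppliers(connection):
    """Return the supplier numbers the database still knows about."""
    rows = connection.execute("SELECT DISTINCT sno FROM supplier_parts")
    return sorted(row[0] for row in rows)


before = known_suppliers(conn)
conn.execute("DELETE FROM supplier_parts WHERE pno = 'P2'")
after = known_suppliers(conn)
print(before, after)
```

Deleting the part silently removes S3's name and city as well, because they were stored nowhere else.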

14 Modification Anomalies (2)
Suppose Supplier S1 changes its name to “Consolidated Parts Inc.”
- We now need to update all of the lines describing parts that come from supplier S1
- If we miss any, we leave our relation in an inconsistent state
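The inconsistent state can be shown with a short sketch. The rows and the old supplier name "Smith" are hypothetical; the UPDATE is deliberately too narrow to imitate a missed row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE supplier_parts (sno TEXT, sname TEXT, pno TEXT, qty INTEGER)"
)
# Illustrative rows: the supplier name is repeated once per part.
conn.executemany(
    "INSERT INTO supplier_parts VALUES (?, ?, ?, ?)",
    [
        ("S1", "Smith", "P1", 300),
        ("S1", "Smith", "P2", 200),
        ("S1", "Smith", "P3", 400),
    ],
)

# A careless update that renames the supplier on only one of its rows.
conn.execute(
    "UPDATE supplier_parts SET sname = 'Consolidated Parts Inc.' "
    "WHERE sno = 'S1' AND pno = 'P1'"
)

names_for_s1 = sorted(
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT sname FROM supplier_parts WHERE sno = 'S1'"
    )
)
print(names_for_s1)
```

Supplier S1 now has two different names in the same relation, and nothing in the schema prevents it.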

15 Informal Design Guidelines (2)
Guideline 2
- Design relation schemas so that no insertion, deletion or modification anomalies are present in the relations
- This ends up being equivalent to designing relation schemas so that no redundant information exists in tuples

16 Informal Design Guidelines (3)
What’s wrong here?
- Every employee has an attribute for which department they manage
- Most employees are not managers
  - All of those employees have NULL values in the ManagesDno column
  - These take up space for no good reason
EMP (Ssn, Ename, Bdate, Address, Dnumber, ManagesDno)
  123… Bob … … … 05 NULL
  345… Mary … … … 05
  789… Tom … … … 05 NULL
  444… June … … … 05 NULL
  076… Alice … … … 05 NULL

17 Informal Design Guidelines (3)
Reasons to avoid attributes with many NULL values
- They take up storage space for no good reason
- They can make entities harder to understand
  - It is more difficult to figure out how to JOIN properly
- Different NULL values have different meanings
  - Does not apply
  - Unknown
  - Known but absent

18 Informal Design Guidelines (3)
Guideline 3
- Design relation schemas to avoid attributes that will frequently have NULL values
- If NULLs must be used, make sure they are the exceptional cases and not the typical value of the attribute for the majority of tuples

19 Informal Design Guidelines (4)
SUPPLIER (S#, Name, Status, City)
SUPPLIER_INFO (S#, Name, Status)
SUPPLIER_LOCATION (Name, City)
Consider the above rethinking of the SUPPLIER schema into two separate schemas
- Can we easily recover the relation SUPPLIER from SUPPLIER_INFO and SUPPLIER_LOCATION?
- No: note that Name does not have to be unique in SUPPLIER or in SUPPLIER_INFO
  - But it IS unique in combination with City in SUPPLIER_LOCATION
  - This can lead to spurious tuples when we combine tables
  - Example: two different companies, both with the same name

20 Informal Design Guidelines (4)
Two different companies with the same name
- The original SUPPLIER table shows one line for each
- Querying the modified schema produces spurious tuples
  - Why? Poor choice of match condition for the SUPPLIER_LOC table

SUPPLIER
  S#  Name   Status  City
  01  Smith  Y       Columbus
  02  Smith  N       Boston

SUPPLIER_INFO
  S#  Name   Status
  01  Smith  Y
  02  Smith  N

SUPPLIER_LOC
  Name   City
  Smith  Columbus
  Smith  Boston

SUPPLIER_INFO * SUPPLIER_LOC
  S#  Name   Status  City
  01  Smith  Y       Columbus
  01  Smith  Y       Boston
  02  Smith  N       Columbus
  02  Smith  N       Boston
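The spurious tuples can be reproduced directly from the Smith data on this slide. This is a small sketch in plain Python; the variable names are illustrative, and the join is written as a set comprehension that matches on Name, as the slide's schema does.

```python
# Data taken from the slide: two different companies named Smith.
supplier_info = [
    {"S#": "01", "Name": "Smith", "Status": "Y"},
    {"S#": "02", "Name": "Smith", "Status": "N"},
]
supplier_loc = [
    {"Name": "Smith", "City": "Columbus"},
    {"Name": "Smith", "City": "Boston"},
]
original = {
    ("01", "Smith", "Y", "Columbus"),
    ("02", "Smith", "N", "Boston"),
}

# Natural join on the shared attribute Name.
joined = {
    (info["S#"], info["Name"], info["Status"], loc["City"])
    for info in supplier_info
    for loc in supplier_loc
    if info["Name"] == loc["Name"]
}

# Tuples that the join produced but that were never in SUPPLIER.
spurious = joined - original
print(sorted(spurious))
```

The join returns four tuples where SUPPLIER had two; the extra two pair each company with the other company's city.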

21 Informal Design Guidelines (4)
Guideline 4
- Design relation schemas to join using equality conditions on appropriate attributes: primary key/foreign key
- Don’t build relations that contain matching attributes that are not primary key/foreign key matches
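For contrast, here is a sketch of the same Smith data split so that the two relations match on the key S# instead of on Name. This alternative decomposition is an assumption made for illustration; the slides do not show it.

```python
# SUPPLIER rows from the earlier slide: (S#, Name, Status, City).
supplier = [
    ("01", "Smith", "Y", "Columbus"),
    ("02", "Smith", "N", "Boston"),
]

# Hypothetical decomposition that keeps S# in both relations.
supplier_info = {(sno, name, status) for sno, name, status, _city in supplier}
supplier_loc = {(sno, city) for sno, _name, _status, city in supplier}

# Join on the primary key / foreign key pair S#.
rejoined = {
    (sno, name, status, city)
    for (sno, name, status) in supplier_info
    for (loc_sno, city) in supplier_loc
    if sno == loc_sno
}

is_lossless_here = rejoined == set(supplier)
print(is_lossless_here)
```

Because S# identifies exactly one supplier, each tuple finds exactly one partner and the join gives back the original relation for this data.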

22 Normalization
What is normalization?
- A process where we examine and revise our relation schemas based on their functional dependencies and primary keys
Normalization attempts to improve the quality of our database by:
- Minimizing redundancy in our stored data (relation sets)
- Minimizing the number of update anomalies in our stored data
Normalization provides database designers with:
- A formal framework for analyzing relation schemas
- A series of tests that allow different degrees of normalization of our data, depending on our needs

23 Normalization
Define: Normal Form
- A criterion for determining how vulnerable a relation is to logical inconsistencies (update anomalies or redundancy)
- Different levels of normal form
  - 1st normal form (1NF), 2nd normal form (2NF), …
  - Highest Normal Form (HNF) of a relation: the highest level of normal form criteria the relation meets

24 Normalization
Normal forms by themselves do not guarantee good database design
- No “magic formula” for making a good design
- Designers must confirm additional properties in their designs:
  - Lossless join property: guarantees that the spurious tuple generation previously discussed does not occur
  - Dependency preservation: guarantees that a dependency that existed before altering the schema is still represented in the altered schema

25 First Normal Form (1NF)
1NF is the most basic normal form
- So basic that, since it was defined, it has become part of the definition of a relation in the relational model
A relation is in 1NF if:
- It has only atomic attributes, AND
- The value of any attribute in a tuple is a single value
  - No multivalued attributes
  - No composite attributes

26 First Normal Form (1NF)
Here’s an example of a schema that is NOT 1NF
- Note that this violates our definition of a relation: Dlocations is a set rather than an atomic value
- To fix this we can do one of two things:
  - Change Dlocations to be atomic and expand tuples with multiple locations into multiple tuples (adding redundant information)
  - Break Dlocations out into its own relation, using Dnumber as a foreign key

27 First Normal Form (1NF)
- To fix this we can do one of two things:
  1. Change Dlocations to be atomic and expand tuples with multiple locations into multiple tuples (adding redundant information)
  2. Break Dlocations out into its own relation, using Dnumber as a foreign key
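Both fixes can be sketched in a few lines of Python. The department names and locations below are invented sample values, and the result names (flattened, department, dept_locations) are hypothetical; only Dnumber and Dlocations come from the slides.

```python
# A non-1NF relation: Dlocations holds a set of values per tuple.
departments = [
    {"Dnumber": 5, "Dname": "Research",
     "Dlocations": ["Bellaire", "Sugarland", "Houston"]},
    {"Dnumber": 4, "Dname": "Administration",
     "Dlocations": ["Stafford"]},
]

# Fix 1: make the location atomic and repeat the rest of the tuple
# once per location. Dname is now stored redundantly.
flattened = [
    {"Dnumber": d["Dnumber"], "Dname": d["Dname"], "Dlocation": loc}
    for d in departments
    for loc in d["Dlocations"]
]

# Fix 2: move locations into their own relation, with Dnumber as a
# foreign key back to the department relation.
department = [
    {"Dnumber": d["Dnumber"], "Dname": d["Dname"]} for d in departments
]
dept_locations = [
    (d["Dnumber"], loc) for d in departments for loc in d["Dlocations"]
]

print(len(flattened), len(department), len(dept_locations))
```

Fix 1 stores "Research" three times, while fix 2 stores each department name once, which is why the second option usually fits Guideline 2 better.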

28 Second Normal Form (2NF)
2NF is a stricter normal form than 1NF
- A relation schema R is in 2NF if it is in 1NF and no nonprime attribute A in R is functionally dependent on a proper subset of the primary key of R

29 Second Normal Form (2NF)
Testing for 2NF
- No test is needed if the primary key has a single attribute
  - A relation is always 2NF if it is 1NF and has a primary key with a single attribute
- If the primary key is made of multiple attributes:
  - Examine your primary key and nonprime attributes
  - Can you remove an attribute from your primary key and still have a dependency with at least one nonprime attribute?
  - If so, the relation is not 2NF

30 Second Normal Form (2NF)
Consider the EMP_PROJ relation above
- Primary key is Ssn + Pnumber
  - The pair uniquely determines Hours
- Remove Pnumber from the key
  - Ssn → Ename
  - A unique Ssn determines what the Ename will be; no need for Pnumber at all
- Remove Ssn from the key
  - Pnumber → {Pname, Plocation}
  - Pname and Plocation are both independent of the employee’s Ssn
- EMP_PROJ is not 2NF
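The 2NF test can be written as a small function over a list of functional dependencies. This is a sketch under one simplifying assumption: the primary key is the only candidate key, so "nonprime" means "not in the primary key". The function name partial_dependencies is hypothetical.

```python
def partial_dependencies(primary_key, fds):
    """Return the FDs that violate 2NF.

    An FD (lhs -> rhs) is a violation when lhs is a proper subset of the
    primary key and rhs contains at least one nonprime attribute.
    """
    key = frozenset(primary_key)
    violations = []
    for lhs, rhs in fds:
        nonprime = set(rhs) - key
        if frozenset(lhs) < key and nonprime:
            violations.append((set(lhs), nonprime))
    return violations


# The three EMP_PROJ dependencies from the slides.
emp_proj_fds = [
    ({"Ssn", "Pnumber"}, {"Hours"}),
    ({"Ssn"}, {"Ename"}),
    ({"Pnumber"}, {"Pname", "Plocation"}),
]

violations = partial_dependencies({"Ssn", "Pnumber"}, emp_proj_fds)
is_2nf = not violations
print(is_2nf, violations)
```

For EMP_PROJ the function reports Ssn → Ename and Pnumber → {Pname, Plocation}, matching the reasoning on the slide. A relation whose only dependency has the whole key on the left, such as {Dept, CourseNo} → Cname, produces no violations.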

31 Second Normal Form (2NF)
INSTRUCTOR_SECTIONS (EmpId, SecId, Cname, Cnumber, Iname)
Consider the above relation schema
- Iname: instructor name
- Cname, Cnumber: course name and number
Is this schema in 2NF?
- What are the dependencies?
  - EmpId → Iname
  - SecId → {Cname, Cnumber}
- 2NF?

32 Second Normal Form (2NF)
COURSES (Dept, CourseNo, Cname)
Consider the above relation schema
- Dept: department name
- Cname: course name
Is this schema in 2NF?
- What are the functional dependencies?
  - {Dept, CourseNo} → Cname
- 2NF?

33 Second Normal Form (2NF)
EMP_PROJ had three FDs
- {Ssn, Pnumber} → Hours
- Ssn → Ename
- Pnumber → {Pname, Plocation}
Becomes three separate relations, one for each dependency
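The decomposition amounts to projecting EMP_PROJ onto the attributes of each dependency. The sketch below uses invented sample rows and hypothetical result names (ep1, ep2, ep3); project is a plain relational projection that drops duplicate tuples.

```python
def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    result = []
    for row in rows:
        projected = {attr: row[attr] for attr in attrs}
        if projected not in result:
            result.append(projected)
    return result


# Illustrative EMP_PROJ tuples; the values are not from the slides.
emp_proj = [
    {"Ssn": "111", "Pnumber": 1, "Hours": 32.5, "Ename": "Smith",
     "Pname": "ProductX", "Plocation": "Bellaire"},
    {"Ssn": "111", "Pnumber": 2, "Hours": 7.5, "Ename": "Smith",
     "Pname": "ProductY", "Plocation": "Sugarland"},
    {"Ssn": "222", "Pnumber": 1, "Hours": 20.0, "Ename": "Wong",
     "Pname": "ProductX", "Plocation": "Bellaire"},
]

# One relation per functional dependency.
ep1 = project(emp_proj, ["Ssn", "Pnumber", "Hours"])        # {Ssn, Pnumber} -> Hours
ep2 = project(emp_proj, ["Ssn", "Ename"])                   # Ssn -> Ename
ep3 = project(emp_proj, ["Pnumber", "Pname", "Plocation"])  # Pnumber -> {Pname, Plocation}

print(len(ep1), len(ep2), len(ep3))
```

Each employee name and each project description is now stored once, instead of once per (employee, project) pair.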

34 Second Normal Form (2NF)
INSTRUCTOR_SECTIONS (EmpId, SecId, Cname, Cnumber, Iname)
- EmpId → Iname
- SecId → {Cname, Cnumber}
Becomes two separate relations, one for each dependency, plus a third relation for the key
- IS3 holds our original primary key
- It keeps the relationship between Instructors and Sections, but nothing else
IS1 (EmpId, Iname)
IS2 (SecId, Cname, Cnumber)
IS3 (EmpId, SecId)

35 Third Normal Form (3NF)
3NF is even stricter than 2NF
- A relation schema R is in 3NF if it is in 2NF and if no nonprime attribute of R is transitively dependent on the primary key

36 Third Normal Form (3NF)
Testing for 3NF
- First make sure the relation schema is in 2NF
  - If it isn’t in 2NF, it can’t be in 3NF
- Next determine if there are any nonkey attributes that are functionally determined by other nonkey attributes
  - If so, you have a transitive dependency and are not in 3NF

37 Third Normal Form (3NF)
Consider EMP_DEPT above
- Is it in 3NF?
- First, is it in 2NF?

38 Third Normal Form (3NF)
Consider EMP_DEPT above
- Is it in 3NF?
- First, is it in 2NF?
  - Yes: only a single attribute in the primary key, so it is 2NF
- Are there any transitive dependencies?

39 Third Normal Form (3NF)
Consider EMP_DEPT above
- Is it in 3NF?
- First, is it in 2NF?
  - Yes: only a single attribute in the primary key, so it is 2NF
- Are there any transitive dependencies?
  - Yes:
    - Dnumber → {Dname, Dmgr_ssn}
    - Ssn → Dnumber
  - Not in 3NF
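The second step of the 3NF test (nonkey attributes determined by other nonkey attributes) can be sketched the same way as the 2NF check. The function name transitive_violations is hypothetical, and the sketch again assumes the primary key is the only candidate key and that the relation has already passed the 2NF test.

```python
def transitive_violations(primary_key, fds):
    """Return FDs whose left side is made only of nonkey attributes and
    whose right side contains at least one nonkey attribute."""
    key = frozenset(primary_key)
    violations = []
    for lhs, rhs in fds:
        lhs_is_nonkey = not (set(lhs) & key)
        dependent = set(rhs) - key
        if lhs_is_nonkey and dependent:
            violations.append((set(lhs), dependent))
    return violations


# EMP_DEPT dependencies as listed on the slide; primary key is Ssn.
emp_dept_fds = [
    ({"Ssn"}, {"Dnumber"}),
    ({"Dnumber"}, {"Dname", "Dmgr_ssn"}),
]

emp_dept_violations = transitive_violations({"Ssn"}, emp_dept_fds)
is_3nf = not emp_dept_violations
print(is_3nf, emp_dept_violations)
```

The check flags Dnumber → {Dname, Dmgr_ssn}: Dname and Dmgr_ssn depend on Ssn only through Dnumber, which is the transitive dependency named on the slide.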

40 Third Normal Form (3NF)
COURSE_SECTIONS (SecId, CourseId, Cname, Cnumber, Iname)
Consider COURSE_SECTIONS above
- SecId: section ID
- CourseId: special unique id for courses
Is it in 3NF?
- First, is it in 2NF?
- Dependencies:
  - SecId → {CourseId, Cname, Cnumber, Iname}
  - CourseId → {Cname, Cnumber}
- Are there any transitive dependencies?

41 Third Normal Form (3NF)
COURSES (Dept, CourseNo, Cname)
Consider the above relation schema
- Dept: department name
- Cname: course name
Is this schema in 3NF?
- Is it in 2NF?
  - Dependencies: {Dept, CourseNo} → Cname
- Are there any transitive dependencies?

42 Third Normal Form (3NF)
EMP_DEPT had one transitive dependency
- Dnumber → {Dname, Dmgr_ssn}
- Ssn → Dnumber
To fix it:
- Break out Dnumber, Dname and Dmgr_ssn into their own relation schema
- Keep the original relation schema with Ssn as primary key
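A sketch of this decomposition on sample data, with a check that joining the two pieces on Dnumber gives back the original tuples. The rows and the result names emp and dept are invented for illustration, and only a subset of the EMP_DEPT attributes is used to keep the example short.

```python
# Illustrative EMP_DEPT tuples (Bdate and Address omitted for brevity).
emp_dept = [
    {"Ssn": "123", "Ename": "Bob", "Dnumber": 5,
     "Dname": "Research", "Dmgr_ssn": "345"},
    {"Ssn": "345", "Ename": "Mary", "Dnumber": 5,
     "Dname": "Research", "Dmgr_ssn": "345"},
    {"Ssn": "789", "Ename": "Tom", "Dnumber": 4,
     "Dname": "Administration", "Dmgr_ssn": "789"},
]


def project(rows, attrs):
    """Projection as a set of tuples, so duplicates disappear."""
    return {tuple(row[attr] for attr in attrs) for row in rows}


# Original relation keeps Ssn as key; department facts move out.
emp = project(emp_dept, ("Ssn", "Ename", "Dnumber"))
dept = project(emp_dept, ("Dnumber", "Dname", "Dmgr_ssn"))

# Rejoin on Dnumber, which is the key of the new department relation.
rejoined = {
    (ssn, ename, dnumber, dname, mgr)
    for (ssn, ename, dnumber) in emp
    for (dept_dnumber, dname, mgr) in dept
    if dnumber == dept_dnumber
}

original = project(emp_dept, ("Ssn", "Ename", "Dnumber", "Dname", "Dmgr_ssn"))
print(rejoined == original, len(emp), len(dept))
```

The department relation stores "Research" once instead of once per employee, and the join on Dnumber recovers exactly the original tuples for this data, with no spurious ones.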

43 Third Normal Form (3NF)
COURSE_SECTIONS (SecId, CourseId, Cname, Cnumber, Iname)
COURSE_SECTIONS had one transitive dependency
- CourseId → {Cname, Cnumber}
- SecId → CourseId
To fix it:
- Break out CourseId, Cname and Cnumber into their own relation schema
- Keep the original relation schema with SecId as primary key
CS1 (SecId, CourseId, Iname)
CS2 (CourseId, Cname, Cnumber)

44 Summary
1NF, 2NF and 3NF based on Primary Keys

