Presentation on theme: "Normalization of Database"— Presentation transcript:
1 Normalization of Database
Yong Choi, School of Business, CSUB
2 Study Objectives
- Understand what normalization is and what role it plays in database design
- Learn about the normal forms 1NF, 2NF, 3NF, BCNF, and 4NF
- Identify how normal forms can be transformed from lower normal forms to higher normal forms
- Understand that normalization and E-R modeling are used concurrently to produce a good database design
- Understand that some situations require denormalization to generate information efficiently
3 Database Normalization
- Well-structured relation (the goal of normalization): a relation that contains minimal data redundancy and allows users to insert, delete, and update rows without causing data anomalies (inconsistencies).
- Technical definition: normalization is a formal process of eliminating redundancies and decomposing relations with anomalies to produce smaller, well-structured relations.
4 Types of Anomalies
- Update (modification) anomaly: changing data in a row forces changes to other rows because of duplication.
- Deletion anomaly: deleting rows may cause a loss of data that would be needed for other future rows.
- Insertion anomaly: adding new rows forces the user to create duplicate data.
5 Redundant Data
Consider the following table that stores data about auto parts and suppliers. This seemingly harmless table contains many potential problems.

Part# | Description | Supplier     | Address            | City    | State
100   | Coil        | Dynar        | 45 Eastern Ave.    | Denver  | CO
101   | Muffler     | GlassCo      | 1638 S. Front      | Seattle | WA
102   | Wheel Cover | A1 Auto      | 7441 E. 4th Street | Detroit | MI
103   | Battery     | Dynar        | 45 Eastern Ave.    | Denver  | CO
104   | Radiator    | United Parts | 346 Taylor Drive   | Austin  | TX
105   | Manifold    | GlassCo      | 1638 S. Front      | Seattle | WA
106   | Converter   | GlassCo      | 1638 S. Front      | Seattle | WA

Suppose you want to add another part:

107   | Tail Pipe   | GlassCo      | 1638 S. Front      | Seattle | WA
6 Update Anomaly
What if GlassCo moves to Olympia? How many rows have to be changed to ensure that the new address is recorded?
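The question on this slide can be answered with a small sketch. The rows are taken from the parts table on the Redundant Data slide; the slides give no street address in Olympia, so only the city is changed here, and the variable names are illustrative rather than part of the original deck.

```python
# Rows from the parts table: (part_no, description, supplier, address, city, state)
parts = [
    (100, "Coil", "Dynar", "45 Eastern Ave.", "Denver", "CO"),
    (101, "Muffler", "GlassCo", "1638 S. Front", "Seattle", "WA"),
    (102, "Wheel Cover", "A1 Auto", "7441 E. 4th Street", "Detroit", "MI"),
    (103, "Battery", "Dynar", "45 Eastern Ave.", "Denver", "CO"),
    (104, "Radiator", "United Parts", "346 Taylor Drive", "Austin", "TX"),
    (105, "Manifold", "GlassCo", "1638 S. Front", "Seattle", "WA"),
    (106, "Converter", "GlassCo", "1638 S. Front", "Seattle", "WA"),
]

# Every GlassCo row carries its own copy of the supplier's location,
# so a move to Olympia has to touch each of them.
rows_to_change = [row for row in parts if row[2] == "GlassCo"]
print(len(rows_to_change))  # 3

# Updating only one of the copies leaves the table contradicting itself.
partially_updated = [
    row[:4] + ("Olympia", "WA") if row[0] == 101 else row for row in parts
]
glassco_cities = {row[4] for row in partially_updated if row[2] == "GlassCo"}
print(sorted(glassco_cities))  # ['Olympia', 'Seattle']
```

Three rows must change, and missing any one of them leaves GlassCo with two different cities on record.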
7 Deletion Anomaly
Suppose you no longer carry part number 102 and decide to delete that row from the table.
8 Now, looking at the remaining data below, what is the address of A1 Auto? Must the supplier (A1 Auto) address be deleted as well?
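A minimal sketch of this deletion anomaly, using Python's built-in sqlite3 module and three rows from the parts table. The table name, column names, and the helper `supplier_address` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parts (part_no INTEGER PRIMARY KEY, description TEXT, "
    "supplier TEXT, address TEXT, city TEXT, state TEXT)"
)
conn.executemany(
    "INSERT INTO parts VALUES (?, ?, ?, ?, ?, ?)",
    [
        (100, "Coil", "Dynar", "45 Eastern Ave.", "Denver", "CO"),
        (102, "Wheel Cover", "A1 Auto", "7441 E. 4th Street", "Detroit", "MI"),
        (103, "Battery", "Dynar", "45 Eastern Ave.", "Denver", "CO"),
    ],
)


def supplier_address(name):
    """Return (address, city, state) for a supplier, or None if no row mentions it."""
    return conn.execute(
        "SELECT address, city, state FROM parts WHERE supplier = ?", (name,)
    ).fetchone()


before = supplier_address("A1 Auto")
conn.execute("DELETE FROM parts WHERE part_no = 102")  # stop carrying part 102
after = supplier_address("A1 Auto")

print(before)  # ('7441 E. 4th Street', 'Detroit', 'MI')
print(after)   # None
```

Part 102 was the only row that mentioned A1 Auto, so deleting the part also deletes the only record of the supplier's address.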
9 Insertion Anomaly
Next, you want to add a new supplier, "CarParts." But you have not yet ordered parts from that supplier. What do you add?
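One way to see the problem, sketched with sqlite3: in the single-table design the part number is the key, so a supplier without parts has nothing to put there. The table names `parts_flat` and `suppliers` are assumptions, and the separate `suppliers` table is just one possible decomposition. The part number is declared as `TEXT NOT NULL` because SQLite would otherwise accept a NULL in a non-INTEGER primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Single-table design: every row must carry a part number.
conn.execute(
    "CREATE TABLE parts_flat (part_no TEXT NOT NULL PRIMARY KEY, "
    "description TEXT, supplier TEXT, address TEXT, city TEXT, state TEXT)"
)

try:
    # CarParts supplies no parts yet, so there is no part number to give.
    conn.execute(
        "INSERT INTO parts_flat (part_no, supplier) VALUES (NULL, 'CarParts')"
    )
    flat_insert_ok = True
except sqlite3.IntegrityError:
    flat_insert_ok = False

# Decomposed design: suppliers are stored independently of parts.
conn.execute(
    "CREATE TABLE suppliers (supplier TEXT NOT NULL PRIMARY KEY, "
    "address TEXT, city TEXT, state TEXT)"
)
conn.execute("INSERT INTO suppliers (supplier) VALUES ('CarParts')")
supplier_count = conn.execute("SELECT COUNT(*) FROM suppliers").fetchone()[0]

print(flat_insert_ok)  # False
print(supplier_count)  # 1
```

The only ways to record CarParts in the flat table are to invent a placeholder part or to relax the key, and both undermine the integrity of the data.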
10 Functional Dependencies
- Normalization is based on the analysis of functional dependencies.
- Functional dependency: the value of one attribute determines the value of another attribute.
- A -> B holds when the value of A (in a valid instance) determines the value of B; B is functionally dependent upon A.
- Example: SSN -> Name, Address (but not vice versa).
- A is the determinant in the functional dependency A -> B.
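The definition above can be checked against sample data with a short helper. This is a sketch: the function name `fd_holds` and the two sample rows are hypothetical. A sample can only refute a dependency; whether it truly holds is a statement about the business rules, not about one set of rows.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if, in these rows, equal lhs values always come with equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value
        # and returns the stored one on later encounters.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample: two different people who share a name.
people = [
    {"SSN": "111", "Name": "Lee", "Address": "1 Oak St"},
    {"SSN": "222", "Name": "Lee", "Address": "9 Elm St"},
]

print(fd_holds(people, ["SSN"], ["Name", "Address"]))  # True
print(fd_holds(people, ["Name"], ["SSN"]))             # False
```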
12 First Normal Form (1NF)
To be in First Normal Form (1NF):
- Each column must contain only a single value (e.g., address).
- Repeating groups of records (redundancy) must be eliminated.
- Duplicative columns must be eliminated from the same table.
- There must be no multi-valued attributes.
- Transformation from model to relation.
15 Another 1NF Example

Cust_ID (PK) | L_Name   | F_Name | Address
104          | Suchecki | Ray    | 123 Pond Hill Road, Detroit, MI, 48161

Cust_ID (PK) | SalesRep_Name | Rep_Office | Order_1 | Order_2 | Order_3
1022         | Jones         | 412        | 10      | 14      | 19
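The Order_1, Order_2, Order_3 columns in the second table are a repeating group. A minimal sketch of removing it follows. It assumes the flattened values on the slide read as Rep_Office 412 and orders 10, 14, and 19; the column name `Order_No` is hypothetical.

```python
wide_row = {
    "Cust_ID": 1022,
    "SalesRep_Name": "Jones",
    "Rep_Office": 412,
    "Order_1": 10,
    "Order_2": 14,
    "Order_3": 19,
}

order_columns = ["Order_1", "Order_2", "Order_3"]

# The customer relation keeps only the single-valued attributes.
customer = {k: v for k, v in wide_row.items() if k not in order_columns}

# Each order becomes its own row, keyed back to the customer.
orders = [
    {"Cust_ID": wide_row["Cust_ID"], "Order_No": wide_row[col]}
    for col in order_columns
    if wide_row[col] is not None
]

print(customer)
print(orders)
```

With one row per order, a fourth order is just another row; the wide layout would need a new Order_4 column.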
16 Second Normal Form
- To be in 2NF, a relation must be in 1NF and must not have any partial dependencies.
- No non-key attribute may be dependent on only a portion of the primary key.
- Another way to understand 2NF: each non-key attribute (one that is not part of the PK) must be functionally dependent upon the entire primary key.
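The transcript does not include the deck's own 2NF example, so the following is a hypothetical illustration. It assumes an order-line relation with the composite key (order_id, part_no), in which the part description depends on part_no alone. The part numbers and descriptions are borrowed from the parts table earlier in the deck; the order numbers and quantities are invented.

```python
# Hypothetical relation: (order_id, part_no, quantity, description)
# Key: (order_id, part_no). description depends only on part_no,
# which is a partial dependency.
order_lines = [
    (1, 100, 5, "Coil"),
    (1, 101, 2, "Muffler"),
    (2, 100, 1, "Coil"),
]

# Move the partially dependent attribute into a relation keyed by part_no.
parts = sorted({(part_no, desc) for _, part_no, _, desc in order_lines})

# What remains depends on the whole key.
lines = [(order_id, part_no, qty) for order_id, part_no, qty, _ in order_lines]

print(parts)  # [(100, 'Coil'), (101, 'Muffler')]
print(lines)  # [(1, 100, 5), (1, 101, 2), (2, 100, 1)]
```

After the split, the description "Coil" is stored once instead of once per order line.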
19 Third Normal Form
- To be in Third Normal Form, a relation must first fulfill the requirements of 2NF.
- Additionally, every non-key attribute that depends on the primary key only through another non-key attribute must be moved out of the relation. In other words, there should be no transitive dependencies.
- Remove columns that are not directly dependent upon the primary key.
22 Transitive Dependency
- All attributes are functionally dependent on Cust_ID: Cust_ID -> Name, Salesperson.
- However, there is a transitive dependency: Region is functionally dependent on Salesperson (Salesperson -> Region).
23 Problems with Transitive Dependency
- A new salesperson (Yong) assigned to the North region cannot be entered until a customer has been assigned to that salesperson (since a value for Cust_ID must be provided to insert a row in the relation).
- If customer number 6837 is deleted from the table, we lose the information that salesperson Hernandez is assigned to the East region.
- If salesperson Smith is reassigned to the East region, several rows must be changed to reflect that fact.
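The three problems above disappear once Salesperson -> Region is moved into its own relation. The sketch below assumes sample rows: only customer 6837, Hernandez, and the East region come from the slide, while the other customer numbers, the customer names, and Smith's starting region are hypothetical.

```python
# (cust_id, name, salesperson, region); only the 6837 row is taken from the slide.
sales = [
    (6837, "Customer A", "Hernandez", "East"),
    (7001, "Customer B", "Smith", "South"),
    (7002, "Customer C", "Smith", "South"),
]

# Decompose along the transitive dependency Cust_ID -> Salesperson -> Region.
customers = [(cust_id, name, rep) for cust_id, name, rep, _ in sales]
salespeople = {rep: region for _, _, rep, region in sales}

# Insertion: Yong can be recorded before any customer is assigned.
salespeople["Yong"] = "North"

# Deletion: removing customer 6837 no longer loses Hernandez's region.
customers = [c for c in customers if c[0] != 6837]

# Update: reassigning Smith is a single change, not one per customer row.
salespeople["Smith"] = "East"

print(salespeople["Hernandez"])  # East
print(len(customers))            # 2
```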
27 Boyce-Codd Normal Form (BCNF)
- BCNF is a special (stricter) case of 3NF.
- A relation is in BCNF if it is in 3NF and there are no hidden dependencies.
- The relation below is in 3NF but not in BCNF.
28 BCNF
Student

Stu_ID | Advisor | Major      | GPA
123    | Nasa    | Physics    | 4.0
       | Elvis   | Music      | 3.3
456    | King    | Literature | 3.2
789    | Jackson | Music      | 3.7
678    |         |            | 3.5

Major is functionally dependent on Advisor. Don't confuse this with a transitive dependency!
29 BCNF
- Stu_ID, Advisor -> Major, GPA
- Advisor -> Major
- Major is functionally dependent on Advisor: each advisor advises in exactly one major, while a major such as Music can have more than one advisor. Don't confuse this with a transitive dependency!
30 BCNF
- In Physics, the advisor Nasa is replaced by Einstein. This change must be made in two (or more) rows in the table.
- Suppose we want to insert a row recording that Choi advises in MIS. This cannot be done until at least one student majoring in MIS is assigned Choi as an advisor.
- If student number 789 withdraws from school, we lose the information that Jackson advises in Music.
31 Conversion to BCNF
Student (Advisor is a foreign key, FK)

Stu_ID | Advisor | GPA
123    | Nasa    | 4.0
       | Elvis   | 3.3
456    | King    | 3.2
789    | Jackson | 3.7
678    |         | 3.5

Advisor

Advisor | Major
Nasa    | Physics
Elvis   | Music
King    | Literature
Jackson | Music
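The conversion can be sketched in a few lines. Only the three rows that are fully legible in the transcript are used, and the variable names are illustrative. The split follows the two relations on this slide: Student(Stu_ID, Advisor, GPA) and Advisor(Advisor, Major).

```python
# (stu_id, major, advisor, gpa), restricted to rows fully legible in the transcript.
student_advisor = [
    (123, "Physics", "Nasa", 4.0),
    (456, "Literature", "King", 3.2),
    (789, "Music", "Jackson", 3.7),
]

# Split so that the advisor-to-major fact is stored once, keyed by advisor.
student = [(stu_id, adv, gpa) for stu_id, _, adv, gpa in student_advisor]
advisor = {adv: major for _, major, adv, _ in student_advisor}

# Joining the two relations on advisor gives back the original rows.
rejoined = [(stu_id, advisor[adv], adv, gpa) for stu_id, adv, gpa in student]
print(rejoined == student_advisor)  # True

# Insertion: Choi can advise in MIS before any student signs up.
advisor["Choi"] = "MIS"

# Deletion: student 789 withdraws, yet Jackson's major is still on record.
student = [row for row in student if row[0] != 789]
print(advisor["Jackson"])  # Music
```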
33 3NF and BCNF
- In practice, most relation schemas that are in 3NF are also in BCNF. A 3NF relation fails BCNF only if a hidden dependency X -> A exists in which X is not a superkey and A is part of a candidate key.
- In general, it is best to have relation schemas in BCNF. If that is not possible, 3NF will do. However, 2NF and 1NF are not considered good relation schema designs.
34 Normalization and Database Design
- Normalization should be part of the design process.
- Unnormalized designs: data updates are less efficient, and indexing is more cumbersome.
- The E-R diagram provides the macro view.
- Normalization provides the micro view of entities: it focuses on the characteristics of specific entities and may yield additional entities.
- Generally, most database designers do not attempt to implement anything higher than Third Normal Form or Boyce-Codd Normal Form.
35 Denormalization
- Denormalization is a technique for moving from higher to lower normal forms of database modeling in order to speed up database access.
- Database optimization is mostly a question of time versus space tradeoffs. Normalized logical data models are optimized for minimum redundancy and avoidance of update anomalies. They are not optimized for minimum access time; access time does not play a role in the normalization process.
- A 3NF or higher normalized data model can be accessed with minimally complex code if the domain reflects the relational calculus and the logical data model based on it.
- Normalized data models are usually easier to understand than data models that reflect considerations of physical optimization.
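The time-versus-space tradeoff can be illustrated with sqlite3. This is a sketch under assumptions: the table names are hypothetical and the rows are a subset of the parts table from earlier in the deck. The denormalized `parts_report` table stores the supplier's city redundantly so that reads need no join. Whether that is actually faster depends on data volume and indexing, which the sketch does not measure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE suppliers (supplier TEXT PRIMARY KEY, city TEXT);
    CREATE TABLE parts (
        part_no INTEGER PRIMARY KEY,
        description TEXT,
        supplier TEXT REFERENCES suppliers(supplier)
    );
    INSERT INTO suppliers VALUES ('Dynar', 'Denver'), ('GlassCo', 'Seattle');
    INSERT INTO parts VALUES
        (100, 'Coil', 'Dynar'), (101, 'Muffler', 'GlassCo'), (105, 'Manifold', 'GlassCo');

    -- Denormalized copy: the join is paid once, when the copy is built.
    CREATE TABLE parts_report AS
        SELECT p.part_no, p.description, s.supplier, s.city
        FROM parts p JOIN suppliers s ON s.supplier = p.supplier;
    """
)

# Normalized read: a join on every query.
normalized = conn.execute(
    "SELECT p.part_no, s.city FROM parts p "
    "JOIN suppliers s ON s.supplier = p.supplier ORDER BY p.part_no"
).fetchall()

# Denormalized read: a single-table lookup, at the cost of repeating each city.
denormalized = conn.execute(
    "SELECT part_no, city FROM parts_report ORDER BY part_no"
).fetchall()

print(normalized == denormalized)  # True
print(denormalized)  # [(100, 'Denver'), (101, 'Seattle'), (105, 'Seattle')]
```

The redundant copy reintroduces the update anomaly from earlier in the deck: if GlassCo moves, `parts_report` must be refreshed or it will disagree with `suppliers`.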