Presentation on theme: "Normalization Dr. Mario Guimaraes. Data Normalization Primarily a tool to validate and improve a logical design so that it satisfies certain constraints."— Presentation transcript:
Data Normalization Primarily a tool to validate and improve a logical design so that it satisfies certain constraints that avoid unnecessary duplication of data. The process of decomposing relations with anomalies to produce smaller, well-structured relations. Primary Objective: Reduce Redundancy,Reduce nulls, Improve “modify” activities: –insert, –update, –delete, –but not read Price: degraded query, display, reporting
Functional Dependency and Keys Functional Dependency: The value of one attribute (the determinant) determines the value of another attribute. Candidate Key: Each non-key field is functionally dependent on every candidate key.
Normalization – most used Four most commonly used normal forms are first (1NF), second (2NF) and third (3NF) normal forms, and Boyce–Codd normal form (BCNF). Based on functional dependencies among the attributes of a relation. A relation can be normalized to a specific form to prevent possible occurrence of update anomalies.
First Normal Form No multi-valued attributes. Every attribute value is atomic. Why are the following tables not in 1NF Employee (ssn, Name, Salary, Address, ListOfSkills) Department (Did, Dname, ssn)
Second Normal Form 1NF and every non-key attribute is fully functionally dependent on the primary key. Every non-key attribute must be defined by the entire key, not by only part of the key. No partial functional dependencies. Assuming that we have a composite PK (LicensePlate, OwnerSSN) for the Vechicle Table below, why is the table not in 2NF ? Vehicle (LicensePlate, Brand, Model, PurchasePrice, Year, OwnerSSN, OwnerName)
Third Normal Form & BCNF 2NF and no transitive dependencies (or no functional dependency between non-key attributes = BCNF) Why are the following tables not in 3NF or BCNF ? Why is Employee [ssn, name, salary, did, dname] Customer
3NF & BCNF It is very rare for a Table to be in 3NF and not be in BCNF (violation of BCNF). Given a Relation R with attributes A, B and C where A and B are together the composite PK, IF A, B -> C and C -> B THEN R is in 3NF and is not in BCNF Example: Student, course -> Instructor Instructor -> Course
Steps in Normalization 1NF: a table, without multivalued attributes –if not, then decompose 2NF: 1NF and every non-key attribute is fully functionally dependent on the primary key –if not, then decompose 3NF: 2NF and no transitive dependencies –if not, then decompose GENERAL: –Each table should describe a single theme –Modification anomalies are minimized Hint: THE KEY, THE WHOLE KEY AND NOTHING BUT THE KEY
Normalized Tables Must: Two important properties of decomposition: - Lossless-join property enables us to find any instance of original relation from corresponding instances in the smaller relations. - Dependency preservation property enables us to enforce a constraint on original relation by enforcing some constraint on each of the smaller relations.
4NF Dependency between attributes (for example, A, B, and C) in a relation, such that for each value of A there is a set of values for B and a set of values for C. However, set of values for B and C are independent of each other.
Matching 1) A Weak Entity _____a) skills 2) A Derived Attribute _____b) ssn 3) A Composite Attribute _____c) Monthly Sales 4) A Multi-Valued Attribute_____d) full name 5) Primary Key_______e) Section of a Course
Which tables are in 3NF ? Employee (ssn, Name, Salary, pid) (obs.: employee can only work in one project) Employee (ssn, Name, Salary, pid) (obs.: employee can work in many projects) Vehicle (OwnerSSN, OwnerName, OwnerAddress, LicensePlate, Brand, Model, PurchasePrice, Year) (obs.: an owner can have several cars) Project (pid, pname, did) (pid determines did) Project (pid, pname, ssn) (ssn determines pid)