Presentation is loading. Please wait.

Presentation is loading. Please wait.

Database Design FUNCTIONAL DEPENDENCES NORMAL FORMS D. Christozov / G.Tuparov INF 280 Database Systems: DB design: Normal Forms 1.

Similar presentations


Presentation on theme: "Database Design FUNCTIONAL DEPENDENCES NORMAL FORMS D. Christozov / G.Tuparov INF 280 Database Systems: DB design: Normal Forms 1."— Presentation transcript:

1 Database Design FUNCTIONAL DEPENDENCES NORMAL FORMS D. Christozov / G.Tuparov INF 280 Database Systems: DB design: Normal Forms 1

2 Objectives Purpose of normalization. Problems associated with redundant data. Identification of various types of update anomalies such as insertion, deletion, and modification anomalies. How to recognize appropriateness or quality of the design of relations. Use of functional dependencies to group attributes. The process of normalization. Normal forms: 1NF, 2NF, 3NF Boyce–Codd (BCNF) normal form. D. Christozov / G.Tuparov INF 280 Database Systems: DB design: Normal Forms 2

3 Database Design What is relational database design? – The grouping of attributes to form "good" relation schemas. Two levels of relation schemas – The logical "user view" level – The storage "base relation" level Design is concerned mainly with base relations. What are the criteria for "good" base relations? D. Christozov / G.Tuparov INF 280 Database Systems: DB design: Normal Forms 3

4 Data Redundancy D. Christozov / G.Tuparov INF 280 Database Systems: DB design: Normal Forms 4

5 Database Design Informal guidelines for good relational design. Formal concepts of functional dependencies and normal forms: – 1NF (First Normal Form); – 2NF (Second Normal Form); – 3NF (Third Normal Form); – BCNF (Boyce-Codd Normal Form). Normalization vs. de-normalization. D. Christozov / G.Tuparov INF 280 Database Systems: DB design: Normal Forms 5

6 Database Design: Guideline 1 GUIDELINE 1: Informally, each tuple in a relation should represent one entity or relationship instance. (Applies to individual relations and their attributes). – Attributes of different entities (EMPLOYEEs, DEPARTMENTs, PROJECTs) should not be mixed in the same relation. – Only foreign keys should be used to refer to other entities. – Entity and relationship attributes should be kept apart as much as possible. Bottom Line: Design a schema that can be explained easily relation by relation. The semantics of attributes should be easy to interpret. D. Christozov / G.Tuparov INF 280 Database Systems: DB design: Normal Forms 6

7 Database Design: Guideline 1 D. Christozov / G.Tuparov INF 280 Database Systems: DB design: Normal Forms 7 Employee-Department DMGRSSNDNameDNumAddressBdateSSNEnamePLocationPnameEnameHoursPNumberSSN Employee-Project

8 Database Design: Anomalies (1) Redundant Information in Tuples and Update Anomalies Mixing attributes of multiple entities may cause problems. Information is stored redundantly wasting storage. Problems with update anomalies: – Insertion anomalies; – Deletion anomalies; – Modification anomalies. D. Christozov / G.Tuparov INF 280 Database Systems: DB design: Normal Forms 8

9 Database Design: Anomalies (2) D. Christozov / G.Tuparov INF 280 Database Systems: DB design: Normal Forms 9

10 Database Design: Anomalies (3) Consider the relation: EMP_PROJ ( SSN, PNumber, Hours, Ename, Pname,..) Update Anomaly: Changing the name of project number P1 from “ProjectX” to “ProjectM” may cause this update to be made for all employees working on the project P1. D. Christozov / G.Tuparov INF 280 Database Systems: DB design: Normal Forms 10 PLocationPnameEnameHoursPNumberSSN EMP_PROJ

11 Database Design: Anomalies (4) Insert Anomaly: Cannot insert a project unless an employee is assigned to. Inversely - Cannot insert an employee unless an he/she is assigned to a project. D. Christozov / G.Tuparov INF 280 Database Systems: DB design: Normal Forms 11 PLocationPnameEnameHoursPNumberSSN EMP_PROJ

12 Database Design: Anomalies (5) Delete Anomaly: When a project is deleted, it will result in deleting all the employees who work on that project. Alternately, if an employee is the sole employee on a project, deleting that employee would result in deleting the corresponding project. D. Christozov / G.Tuparov INF 280 Database Systems: DB design: Normal Forms 12 PLocationPnameEnameHoursPNumberSSN EMP_PROJ

13 Database Design: Anomalies (6) D. Christozov / G.Tuparov INF 280 Database Systems: DB design: Normal Forms 13 EMP_DEPT DMGRSSNDNameDNumAddressBdateSSNEname

14 Database Design: Guideline 2 GUIDELINE 2: Design a schema that does not suffer from the insertion, deletion and update anomalies. If there are any present, then note them so that applications can be made to take them into account. D. Christozov / G.Tuparov INF 280 Database Systems: DB design: Normal Forms 14

15 Database Design: Guideline 3 GUIDELINE 3: Relations should be designed such that their tuples will have as few NULL values as possible. Attributes that are NULL frequently could be placed in separate relations (with the primary key). Reasons for nulls: – attribute not applicable or invalid; – attribute value unknown (may exist); – value known to exist, but unavailable. D. Christozov / G.Tuparov INF 280 Database Systems: DB design: Normal Forms 15

16 Database Design: Guideline 4 Bad designs for a relational database may result in erroneous results for certain JOIN operations. The "lossless join" property is used to guarantee meaningful results for join operations. GUIDELINE 4: The relations should be designed to satisfy the lossless join condition. No spurious tuples should be generated by doing a natural- join of any relations. There are two important properties of decompositions: – non-additive or losslessness of the corresponding join; – preservation of the functional dependencies. D. Christozov / G.Tuparov INF 280 Database Systems: DB design: Normal Forms 16

17 Database Design: Guideline 4 Lossless-join property enables us to find any instance of original relation from corresponding instances in the smaller relations. Must be achieved at any cost. Dependency preservation property enables us to enforce a constraint on original relation by enforcing some constraint on each of the smaller relations. See 15.4, 15.5 and 15.6 from textbook. D. Christozov / G.Tuparov INF 280 Database Systems: DB design: Normal Forms 17

18 Functional Dependences (1) Functional Dependency is a property of the meaning (or semantics) of the attributes in a relation: – Describes relationship between attributes in a relation. – If A and B are attributes of relation R, B is functionally dependent on A (denoted A  B), if each value of A in R is associated with exactly one value of B in R. Determinant of a functional dependency refers to attribute or group of attributes on left-hand side of the arrow. D. Christozov / G.Tuparov INF 280 Database Systems: DB design: Normal Forms 18

19 Functional Dependences (2) D. Christozov / G.Tuparov INF 280 Database Systems: DB design: Normal Forms 19

20 Functional Dependences (3) Examples of FD constraints: social security number determines employee name: SSN  ENAME project number determines project name and location: PNUMBER  {PNAME, PLOCATION} employee ssn and project number determines the hours per week that the employee works on the project: {SSN, PNUMBER}  HOURS D. Christozov / G.Tuparov INF 280 Database Systems: DB design: Normal Forms 20

21 Functional Dependences (4) 1. X  Y holds if whenever two tuples have the same value for X, they must have the same value for Y. 2. For any two tuples t1 and t2 in any relation instance r(R): If t1[X]=t2[X], then t1[Y]=t2[Y] 3. X  Y in R specifies a constraint on all relation instances r(R). 4. Written as X  Y; can be displayed graphically on a relation schema as in Figures ( denoted by an arrow). 5. FDs are derived from the real-world constraints on the attributes. 6. An FD is a property of the attributes in the schema R. 7. The constraint must hold on every relation instance r(R). 8. If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K]=t2[K]). D. Christozov / G.Tuparov INF 280 Database Systems: DB design: Normal Forms 21

22 The Process of Normalization Formal technique for analyzing a relation based on its primary key and functional dependencies between its attributes. Often executed as a series of steps. Each step corresponds to a specific normal form, which has known properties. As normalization proceeds, relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies. D. Christozov / G.Tuparov INF 280 Database Systems: DB design: Normal Forms 22

23 Relationship Between Normal Forms D. Christozov / G.Tuparov INF 280 Database Systems: DB design: Normal Forms 23

24 Normalization: 1NF Definitions: A relation is in 1NF if all attribute values are atomic. Atomic: cannot be further broken down: NO multivalued attributes; NO nested attributes. Cure for Non-1NF Relations. – Multivalued attribute is eliminated by projection: {DNUM, DNAME, MSSN, DLOC}  {DNUM, DNAME, MSSN}, {DNUM, DLOC} – Nested attribute is eliminated by flattening: ENAME(FNAME MI LNAME) {FNAME, MI, LNAME} and ENAME is discarded. D. Christozov / G.Tuparov INF 280 Database Systems: DB design: Normal Forms 24

25 Normalization: 2NF Definitions: Prime attribute - attribute that is member of the primary key K. Non-prime attribute – attribute that is not a member of the PK. Full functional dependency - a FD Y  Z where removal of any attribute from Y means the FD does not hold any more. Examples: 1){SSN, PNUMBER}  HOURS is a full FD since neither SSN  HOURS nor PNUMBER  HOURS hold 2){SSN, PNUMBER}  ENAME is not a full FD (it is called a partial dependency ) since SSN  ENAME also holds D. Christozov / G.Tuparov INF 280 Database Systems: DB design: Normal Forms 25

26 Normalization: 2NF A relation R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key. D. Christozov / G.Tuparov INF 280 Database Systems: DB design: Normal Forms 26 PLocationPNameEnameHoursPNumberSSN Employee-Project SSNPNumberHours SSNEName PNumberPNamePLocation fd1 fd2 fd3

27 Normalization: 3NF Definition: Transitive functional dependency - a FD X  Z that can be derived from two FDs X  Y and Y  Z Examples: 1.SSN  DMGRSSN is a transitive FD since SSN  DNUMBER and DNUMBER  DMGRSSN hold 2.SSN  ENAME is non-transitive since there is no set of attributes X where SSN  X and X  ENAME D. Christozov / G.Tuparov INF 280 Database Systems: DB design: Normal Forms 27

28 Normalization: 3NF A relation R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key. NOTE: In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., Consider EMP (SSN, Emp#, Salary ). Here, SSN -> Emp# -> Salary and Emp# is a candidate key. D. Christozov / G.Tuparov INF 280 Database Systems: DB design: Normal Forms 28

29 Normalization: 3NF D. Christozov / G.Tuparov INF 280 Database Systems: DB design: Normal Forms 29 Employee-Department DMGRSSNDNameDNumAddressBdateSSNEname DNumAddressBDateSSNEnameDMGRSSNDNameDNum SSN  Dnum  Dname/DMGRSSN

30 Normalization: 3NF Converting from 2NF to 3NF: – Identify the primary key in the 2NF relation. – Identify functional dependencies in the relation. – If transitive dependencies exist on the primary key remove them by placing them in a new relation along with a copy of their dominant. D. Christozov / G.Tuparov INF 280 Database Systems: DB design: Normal Forms 30

31 General Definitions of 2NF and 3NF Second normal form (2NF) A relation R that is in 1NF is in 2NF if every non- primary-key attribute A is not partially functionally dependent on any candidate key in R. Third normal form (3NF) A relation R that is in 2NF is in 3NF if no non- primary-key attribute A is transitively dependent on any candidate key in R. D. Christozov / G.Tuparov INF 280 Database Systems: DB design: Normal Forms 31

32 Normalization: BCNF A relation R is in Boyce-Codd Normal Form (BCNF) if whenever an FD X  A holds in R, then X is a superkey of R. Notes: Each normal form is strictly stronger than the previous one: –Every 2NF relation is in 1NF; –Every 3NF relation is in 2NF; –Every BCNF relation is in 3NF. There exist relations that are in 3NF but not in BCNF. The goal is to have each relation in BCNF (or 3NF). D. Christozov / G.Tuparov INF 280 Database Systems: DB design: Normal Forms 32

33 BCNF A relation R is in BCNF if a functional dependency X  A holds in R, then X is a superkey of R. Based on functional dependency that takes into account all candidate keys in a relation. For a relation with only one candidate key, 3NF and BCNF are equivalent. A relation is in BCNF, if and only if every determinant is a candidate key. Violation of BCNF may occur in a relation that: – Contains 2 (or more) composite keys; – Which candidate keys overlap and share at least 1 attribute. D. Christozov / G.Tuparov INF 280 Database Systems: DB design: Normal Forms 33

34 Example 1 D. Christozov / G.Tuparov INF 280 Database Systems: DB design: Normal Forms 34 Atributes:Description Eqpt_number Order number of a piece of the equipment from given type. Location Place, where the piece of the equipment operates Oprtr_ID_1 Identifier of the first shift operator (person operates on that equipment) Oprtr_Name_1 Name of the first shift operator Oprtr_Phone_1 Phone number of the first shift operator Oprtr_ID_2 Identifier of the second shift operator (person operates on that equipment) Oprtr_Name_2 Name of the second shift operator Oprtr_Phone_2 Phone number of the second shift operator Oprtr_ID_3 Identifier of the third shift operator (person operates on that equipment) Oprtr_Name_3 Name of the third shift operator Oprtr_Phone_3 Phone number of the third shift operator Eqpt_Type Type of the equipment Producer Producer of this type of equipment Inst_Date Date of installation of the piece of the equipment Cons_Power Consumed electricity of this type of the equipment Maint_Time Required period of maintenance for this type of equipment Maint_Lst_Date Date of the last maintenance Maint_ID Identifier of the person responsible for the last maintenance Maint_Name Name of the person responsible for the last maintenance Maint_Phone Phone of the person responsible for the last maintenance Maint_Act_1 Description of the first activity in last maintenance Maint_Act_2 Description of the second activity in last maintenance Maint_Act_3 Description of the third activity in last maintenance “Equipment” holds data about the usage and maintenance activities of equipment in an enterprise Notes: Eqpt_Number, Eqpt_Type, Location, and Maint_Lst_Date composed the primary key - identify any piece of equipment and last maintenance activities. Producer, Cons_Power, Maint_Time are specific for an equipment type, identified by Eqpt_Type. Operators (from any shift) operate on a particular piece of equipment and are identified by Oprts_ID. The relation keeps track to their names and phones. For any piece of equipment, maintenance is done periodically or when fails. Maintenance activities are predefined list for a given Type of Equipment. For maintenance of a particular piece of equipment up to three activities are listed. Relation holds information about the person, who performed the maintenance activities, identified by Maint_ID, their names and phones.

35 Example 2 Chains’ supplier D. Christozov / G.Tuparov INF 280 Database Systems: DB design: Normal Forms 35 AttributeDescription CodeIdentifier of the product NameSupplier’s name of the product PriceSupplier’s regular price DealerIdentifier of the Dealer DNameDealer’s name DCodeDealer’s identifier of the product DpriceDealer’s price of the product DPhoneDealer’s phone DPNameDealer’s name of the product DQuantityQuantity of the product ordered by the Dealer DateDate of the order

36 Example 3 Home delivery Pizzas D. Christozov / G.Tuparov INF 280 Database Systems: DB design: Normal Forms 36 AttributeDescription Order_idIdentifier of the ordered (automatic) Pizza_idIdentifier of the products QuantityQuantity of the product, purchased by the customer Ingredient_idIdentifier of additional ingredient ordered (one or more) DescriptionDescription of the ingredient UnitMeasurement unit used for ingredient AmountQuantity per unit: quantity of ingredient used in a product unit I_pricePrice of the ingredients NameName of the customer AddressAddress of the customer PricePrice of the product

37 Q & A Attention ! Next class Quiz 3: Normal Forms D. Christozov / G.Tuparov INF 280 Database Systems: Relational Model 37


Download ppt "Database Design FUNCTIONAL DEPENDENCES NORMAL FORMS D. Christozov / G.Tuparov INF 280 Database Systems: DB design: Normal Forms 1."

Similar presentations


Ads by Google