Part 6 Chapter 15 Normalization of Relational Database Csci455 r 1.

1 Part 6, Chapter 15: Normalization of Relational Databases. Csci455. r eza@aero.und.ed 1

2 Design methodologies; goodness of design; functional dependencies; the normalization process and normal forms (first, second, third, BCNF); pros and cons of normalization 2 Objectives

3 A database system can be designed either bottom-up (design by synthesis) or top-down (design by analysis) 3 Design Methodology

4 Starts with the basic relationships between pairs of attributes; uses this information to construct the relations; neither scalable nor practical 4 Bottom-up design

5 The design process: starts with one relation (the set of all attributes) and decomposes it into groups, using an ER model of the conceptual schema and existing design knowledge or experience; maps each entity into a table schema; analyzes each table schema for goodness and for possible refinement and/or decomposition 5 Top-down design

6 Informal design metrics: semantics of the related attributes; reducing redundant values in tuples; minimizing NULL values; disallowing spurious tuples 6 Informal Design Guidelines for Relational Schemas

7 Based on the semantics of the attributes, i.e., how the attribute values in a tuple relate to one another. A schema should capture facts about one entity type or one relationship type 7 Semantics of the Relation Attributes

9 9 Fig10-2

10 Design a relation schema so that its meaning is easy to explain; do not combine attributes from multiple entity types and relationship types into a single relation 10 Guideline 1

11 11 Fig10-3 Considered as POOR designs! Why?

12 An important objective of schema design: minimize storage space and effort, and minimize the problems that result from updates. Example: compare the relations in Fig. 15.2 with those in Fig. 15.4 12 Redundant Information in Tuples and Update Anomalies

13 13 Fig10-2

14 14 Fig10-4

15 Update anomalies: insertion anomalies, deletion anomalies, modification anomalies 15 Update Anomalies

16 Insertion anomalies. Consistency: e.g., to insert a new employee, we must also insert ALL the attribute values for the employee's department, consistently with the existing tuples, or insert NULLs if the employee does not yet work for a department. Null values: e.g., inserting a new department that has no employees violates entity integrity, because ssn cannot be NULL; see EMP_DEPT in Fig. 15.4 16 Insertion Anomalies
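The NULL-key case can be sketched with Python's built-in sqlite3 module. The table layout and values below are illustrative assumptions, not the contents of the EMP_DEPT figure; NOT NULL is written out explicitly because SQLite does not enforce it on a non-integer primary key by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical EMP_DEPT schema (an assumption for illustration).
# ssn is the primary key, so entity integrity forbids NULL in it.
conn.execute(
    "CREATE TABLE emp_dept ("
    " ssn TEXT NOT NULL PRIMARY KEY,"
    " ename TEXT, dnum INTEGER, dname TEXT)"
)

try:
    # A new department with no employees: there is no ssn to supply.
    conn.execute(
        "INSERT INTO emp_dept (ssn, ename, dnum, dname) "
        "VALUES (NULL, NULL, 7, 'Testing')"
    )
    rejected = False
except sqlite3.IntegrityError:
    # The DBMS refuses the row, so the department cannot be recorded.
    rejected = True

print(rejected)
```

The department can only be stored once it has at least one employee, which is exactly the insertion anomaly.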

17 17 Fig10-4

18 Deletion anomalies: loss of information. E.g., deleting the very last employee who works for dnum=1 from EMP_DEPT also deletes the information about that department 18 Deletion Anomalies
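A minimal sketch of the deletion anomaly follows. The rows and attribute names are illustrative assumptions, not the data of the EMP_DEPT figure.

```python
# Hypothetical EMP_DEPT rows: (ssn, ename, dnum, dname).
emp_dept = [
    ("111", "Smith", 5, "Research"),
    ("222", "Wong", 5, "Research"),
    ("333", "Borg", 1, "Headquarters"),
]

def departments(rows):
    """Return the department facts (dnum -> dname) recorded in the relation."""
    return {dnum: dname for _, _, dnum, dname in rows}

before = departments(emp_dept)

# Delete the only employee who works for dnum = 1.
emp_dept = [row for row in emp_dept if row[0] != "333"]

after = departments(emp_dept)

# Department numbers that were known before the delete but not after it.
lost = set(before) - set(after)
print(lost)
```

Deleting one employee silently removed the only record that department 1 exists.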

19 19 Fig10-4

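20 Modification anomalies: change one, change all. E.g., changing a department's manager or number requires updating every tuple of every employee in that department; otherwise the relation becomes inconsistent 20 Modification Anomalies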
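The "change one, change all" rule can be sketched as follows. The rows and attribute names are hypothetical and chosen only for illustration.

```python
# Hypothetical EMP_DEPT rows; the manager is repeated in every employee tuple.
emp_dept = [
    {"ssn": "111", "dnum": 5, "dmgr_ssn": "222"},
    {"ssn": "222", "dnum": 5, "dmgr_ssn": "222"},
    {"ssn": "444", "dnum": 5, "dmgr_ssn": "222"},
]

def managers_of(rows, dnum):
    """Return the set of managers recorded for one department."""
    return {r["dmgr_ssn"] for r in rows if r["dnum"] == dnum}

# Careless update: only one tuple receives the new manager.
emp_dept[0]["dmgr_ssn"] = "444"
inconsistent = managers_of(emp_dept, 5)

# Correct update: every tuple of the department must change.
for r in emp_dept:
    if r["dnum"] == 5:
        r["dmgr_ssn"] = "444"
consistent = managers_of(emp_dept, 5)

print(inconsistent, consistent)
```

After the partial update the relation records two managers for the same department; only the full update restores a single value.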

21 21 Fig10-4

22 Design anomaly-free base relation schemas –How? use formal approaches to validate design against these guidelines 22 Guideline 2

23 NULL values arise when a set of attributes does not apply to all tuples. E.g., student phone number: not every student has a cell phone or work phone. Guideline 3: stay away from attributes whose values may be NULL in a base table. NULLs waste storage, make the meaning of the attributes harder to understand, and complicate aggregate functions and operations involving comparisons (e.g., joins) 23 Null Values in Tuples
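The effect of NULLs on aggregates and comparisons can be sketched with sqlite3. The student table and its values are assumptions made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical STUDENT table with an optional work phone.
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, work_phone TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Ann", "555-0100"), (2, "Bob", None), (3, "Cy", None)],
)

# COUNT(*) counts rows; COUNT(column) skips NULLs, so the two disagree.
total, with_phone = conn.execute(
    "SELECT COUNT(*), COUNT(work_phone) FROM student"
).fetchone()

# NULL = NULL is not true, so rows with a NULL phone never join,
# not even with themselves.
matched = conn.execute(
    "SELECT COUNT(*) FROM student a JOIN student b "
    "ON a.work_phone = b.work_phone"
).fetchone()[0]

print(total, with_phone, matched)
```

Two of the three students drop out of both the column count and the join, which is easy to overlook when reading query results.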

24 Results from an undesirable decomposition of a relation: joining the parts back together produces tuples that were not in the original relation. E.g., EMP_LOC and EMP_PROJ1 24 Generation of Spurious (or invalid) Tuples

25 25 Fig10-5

26 26 Fig10-6 ENAME

27 Design relation schemas so that they can be joined with equality conditions on attributes that are either primary keys (PKs) or foreign keys (FKs), which guarantees that no spurious tuples are generated 27 Guideline 4
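A small sketch of how a join on a non-key attribute generates spurious tuples. The relation names follow the slides (EMP_LOC, EMP_PROJ1), but the attribute lists are simplified and the rows are hypothetical.

```python
# Hypothetical original relation: (ssn, ename, pnumber, plocation).
emp_proj = {
    ("111", "Smith", 1, "Bellaire"),
    ("222", "English", 2, "Bellaire"),
}

# Poor decomposition: the only shared attribute is plocation,
# which is neither a primary key nor a foreign key.
emp_loc = {(ename, ploc) for _, ename, _, ploc in emp_proj}
emp_proj1 = {(ssn, pnum, ploc) for ssn, _, pnum, ploc in emp_proj}

# Natural join of the two parts on plocation.
joined = {
    (ssn, ename, pnum, ploc)
    for ename, ploc in emp_loc
    for ssn, pnum, ploc2 in emp_proj1
    if ploc == ploc2
}

# Tuples produced by the join that were never in the original relation.
spurious = joined - emp_proj
print(sorted(spurious))
```

The join pairs every employee name at a location with every project at that location, so it invents assignments that never existed.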

28 Summary and discussion of design guidelines. The guidelines above help avoid the following problems: 1. Anomalies that cause redundant work during insertions, deletions, and modifications; 2. Waste of storage space due to NULLs; 3. Generation of invalid and spurious data during joins on base relations using non-key attributes 28

29 A constraint between two sets of attributes X and Y such that, for any two tuples t1 and t2 in r(R), if t1[X] = t2[X] then t1[Y] = t2[Y]. Used to define normal forms 29 Functional Dependencies
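The definition translates directly into a check over one relation state. This is a sketch: the function name fd_holds and the sample rows are assumptions. Note that such a check can only refute an FD on a given state; it cannot prove that the FD holds for the schema, since that depends on the meaning of the attributes.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is satisfied by this relation state.

    rows is a list of dicts; lhs and rhs are lists of attribute names.
    """
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Two tuples that agree on X must also agree on Y.
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical EMP_PROJ-style rows.
rows = [
    {"ssn": "111", "pnumber": 1, "hours": 10, "ename": "Smith"},
    {"ssn": "111", "pnumber": 2, "hours": 5, "ename": "Smith"},
    {"ssn": "222", "pnumber": 1, "hours": 20, "ename": "Wong"},
]

print(fd_holds(rows, ["ssn"], ["ename"]))
print(fd_holds(rows, ["ssn"], ["hours"]))
print(fd_holds(rows, ["ssn", "pnumber"], ["hours"]))
```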

30 Represented by X → Y: X functionally determines Y, or Y functionally depends on X. If X is a candidate key (CK) of R, so that each X value appears in at most one tuple, then X → Y holds for any subset of attributes Y of R. Note: an FD is a property of the semantics or meaning of the attributes; relation states that satisfy the FDs are the legal relation states (legal extensions) of R 30 Functional Dependencies (FD): Formal definition

31 An FD is a schema-based dependency: it is a semantic notion, and identifying FDs is part of the process of understanding what the data means 31 Properties of functional dependencies (FDs)

32 32 Fig10-3 (b) EMP_PROJ: SSN → ENAME; PNUMBER → {PNAME, PLOCATION}; {SSN, PNUMBER} → HOURS

33 Legal extensions (or legal relation states): the extensions r(R) that satisfy the functional dependency constraints. An FD is a property of the relation schema, not of a particular relation extension 33 Important Notes on FDs

34 34 Fig10-7 FD1: TEXT → COURSE? Yes or no. FD2: TEACHER → COURSE? No. FD3: COURSE → TEACHER? No

35 Normalization theory is built around the concept of normal forms and is used in the design process. A relation is in a particular normal form if it satisfies a specified set of requirements. E.g., 1NF requires that all underlying domains have atomic values 35 Normalization

36 Types of normal forms: 1NF, 2NF, 3NF, BCNF, 4NF, 5NF (PJ/NF), DKNF (absolute normal form) 36 Normal Form

37 37 Relationships of Normal Forms

38 1NF disallows multivalued attributes, composite attributes, and combinations of the two, i.e., nested relations or multivalued composite attributes. See Fig. 15.8 and Fig. 15.9 38 First Normal Form (1NF)
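One common way to reach 1NF is to move a multivalued attribute into its own relation, keyed by the original primary key plus the attribute. The sketch below assumes a hypothetical DEPARTMENT relation with a multivalued dlocations attribute; the names and values are illustrative only.

```python
# Hypothetical non-1NF relation: dlocations holds a set of values per tuple.
department = [
    {"dname": "Research", "dnumber": 5,
     "dlocations": ["Bellaire", "Sugarland", "Houston"]},
    {"dname": "Headquarters", "dnumber": 1, "dlocations": ["Houston"]},
]

# Keep the single-valued attributes in the original relation.
dept = [{"dname": d["dname"], "dnumber": d["dnumber"]} for d in department]

# Move the multivalued attribute to a new relation with one atomic
# value per tuple; its key is (dnumber, dlocation).
dept_locations = [
    {"dnumber": d["dnumber"], "dlocation": loc}
    for d in department
    for loc in d["dlocations"]
]

for row in dept_locations:
    print(row)
```

Every value in both resulting relations is atomic, and the original can be rebuilt by joining on dnumber.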

39 39 Fig10-8

40 40 Fig10-9

41 Based on the concept of full functional dependency. By analogy with the traditional courtroom oath: every non-key attribute depends on the key, the whole key, and nothing but the key. R is in 2NF iff R is in 1NF and every non-key attribute is fully functionally dependent on the PK 41 Second Normal Form (2NF)
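A sketch of a 2NF test on the EMP_PROJ dependencies listed earlier (SSN → ENAME, PNUMBER → {PNAME, PLOCATION}, {SSN, PNUMBER} → HOURS). The function names are assumptions, and the check only inspects the FDs it is given rather than every FD they imply.

```python
primary_key = frozenset({"SSN", "PNUMBER"})
attributes = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
fds = [
    (frozenset({"SSN"}), {"ENAME"}),
    (frozenset({"PNUMBER"}), {"PNAME", "PLOCATION"}),
    (frozenset({"SSN", "PNUMBER"}), {"HOURS"}),
]

def partial_dependencies(primary_key, fds):
    """Return FDs whose left side is a proper subset of the primary key
    and whose right side contains non-key attributes."""
    return [
        (lhs, rhs - primary_key)
        for lhs, rhs in fds
        if lhs < primary_key and rhs - primary_key
    ]

def decompose_2nf(attributes, primary_key, fds):
    """Give each partial dependency its own relation; the remaining
    attributes stay with the full key."""
    partial = partial_dependencies(primary_key, fds)
    moved = set().union(*(rhs for _, rhs in partial))
    relations = [set(lhs) | rhs for lhs, rhs in partial]
    relations.append(set(attributes) - moved)
    return relations

for relation in decompose_2nf(attributes, primary_key, fds):
    print(sorted(relation))
```

ENAME depends on SSN alone and the project attributes on PNUMBER alone, so only HOURS stays with the whole key.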

42 Normalization into 2NF, and 3NF 42

43 43 Fig10-10

44 Based on the concept of transitive dependency. Relation R is in 3NF iff R is in 2NF and every non-key attribute is non-transitively dependent on the PK 44 Third Normal Form

45 45 Fig10-10

46 Formal definition: R is in 3NF if, whenever a nontrivial functional dependency X → Y holds in R, either X is a superkey of R or Y is a prime attribute of R. E.g., LOTS2 in Fig. 15.12(b) is in 3NF; LOTS1 in Fig. 15.12(b) is NOT in 3NF because of FD4 46 Interpretation of 3NF
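The formal definition can be sketched as a test over a list of FDs, using attribute closure to decide whether a left side is a superkey. The EMP_DEPT attribute names and FDs below are assumptions for illustration, the prime attributes are supplied by hand, and only the listed FDs are examined.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def violations_3nf(relation, fds, prime):
    """Return the FDs X -> Y where X is not a superkey and Y is not
    made up of prime attributes."""
    bad = []
    for lhs, rhs in fds:
        nontrivial = rhs - lhs
        is_superkey = closure(lhs, fds) >= relation
        if nontrivial and not is_superkey and not nontrivial <= prime:
            bad.append((lhs, rhs))
    return bad

# Hypothetical EMP_DEPT schema with SSN as its only candidate key.
relation = {"SSN", "ENAME", "DNUMBER", "DNAME", "DMGRSSN"}
fds = [
    ({"SSN"}, {"ENAME", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]
prime = {"SSN"}

print(violations_3nf(relation, fds, prime))
```

DNUMBER → {DNAME, DMGRSSN} is reported because DNUMBER is not a superkey and neither DNAME nor DMGRSSN is prime; this is the transitive dependency SSN → DNUMBER → DNAME.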

51 Alternative definition of 3NF: a relation schema R is in 3NF if every nonprime attribute of R is both fully functionally dependent on every key of R and non-transitively dependent on every key of R 51

52 Boyce-Codd normal form (BCNF): a stricter normal form than 3NF. If R is in BCNF then R is also in 3NF, but R being in 3NF does not imply that R is in BCNF. Attempts to eliminate redundancy that 3NF does not detect 52 Boyce/Codd NF

53 Example: suppose we have thousands of lots in the relation, but the lots are from only two counties, DeKalb and Fulton. Say lot sizes in DeKalb are 0.5, …, 1.0 acres and lot sizes in Fulton are 1.1, 1.2, …, 1.9, 2.0 acres. Also assume FD5: Area → County_Name 53

54 54 Fig10-12

55 A relation R is in Boyce/Codd normal form (BCNF) iff every determinant is a CK, or more precisely a superkey (i.e., each attribute MUST describe the key, the whole key, and nothing but the key). Eliminates the redundancy caused by functional dependencies (GOOD). Considered the most desirable NF 55 Boyce/Codd NF (Cont’)

56 Consider a relation TEACH with FD1: {Student, Course} → Instructor and FD2: Instructor → Course. The relation is in 3NF. Is it in BCNF? No, because Instructor is a determinant but not a candidate key 56 Example Candidate key
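The TEACH example can be checked mechanically. The sketch below finds the candidate keys by brute force, derives the prime attributes, and then applies the 3NF and BCNF conditions to FD1 and FD2; the helper names are assumptions and only the two stated FDs are tested.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(relation, fds):
    """Enumerate minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            attrs = set(combo)
            # Smaller keys were found first, so skip any superset of one.
            if closure(attrs, fds) >= relation and not any(k <= attrs for k in keys):
                keys.append(attrs)
    return keys

relation = {"Student", "Course", "Instructor"}
fds = [
    ({"Student", "Course"}, {"Instructor"}),  # FD1
    ({"Instructor"}, {"Course"}),             # FD2
]

keys = candidate_keys(relation, fds)
prime = set().union(*keys)

# BCNF: every determinant must be a superkey.
bcnf_violations = [fd for fd in fds if not closure(fd[0], fds) >= relation]
# 3NF: a non-superkey determinant is tolerated if the right side is prime.
threenf_violations = [
    fd for fd in bcnf_violations if not (fd[1] - fd[0]) <= prime
]

print(keys, bcnf_violations, threenf_violations)
```

FD2 fails the BCNF test because Instructor alone does not determine Student, but it passes the 3NF test because Course belongs to a candidate key.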

57 BCNF example semantics: a student can take more than one course, but a student has a different instructor for each course. Each instructor (non-key) teaches only one course (partial key). 57

58 58 Fig10-13

59 Possible decompositions are: 1. {Student, Instructor} and {Student, Course}; 2. {Course, Instructor} and {Course, Student}; 3. {Instructor, Course} and {Instructor, Student}. Which of the decompositions is better? Justify it. 59 More on Example

60 Instructor-course Table
Instructor    Course
Mark          Database
Navathe       Database
Schulman      Theory
Ahmand        OS
Omiecinski    Database
Ammar         OS

61 Instructor-student Table
Instructor    Student
Mark          Narayan
Mark          Wallace
Navathe       Smith
Navathe       Zelaya
Ammar         Smith
Ammar         Narayan
Schulman      Smith
Ahmand        Wallace
Omiecinski    Wong
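The three candidate decompositions of TEACH can be compared by projecting, rejoining, and counting spurious tuples. This sketch assumes that TEACH is the natural join of the Instructor-course and Instructor-student tables on Instructor, since the full relation appears only in the figure; the function names are also assumptions.

```python
ATTRS = ("Student", "Course", "Instructor")

instructor_course = {
    ("Mark", "Database"), ("Navathe", "Database"), ("Schulman", "Theory"),
    ("Ahmand", "OS"), ("Omiecinski", "Database"), ("Ammar", "OS"),
}
instructor_student = {
    ("Mark", "Narayan"), ("Mark", "Wallace"), ("Navathe", "Smith"),
    ("Navathe", "Zelaya"), ("Ammar", "Smith"), ("Ammar", "Narayan"),
    ("Schulman", "Smith"), ("Ahmand", "Wallace"), ("Omiecinski", "Wong"),
}

# Assumption: TEACH is the join of the two tables on Instructor.
teach = {
    (student, course, instructor)
    for instructor, course in instructor_course
    for other, student in instructor_student
    if instructor == other
}

def project(rows, attrs):
    """Project tuples stored in ATTRS order onto the given attributes."""
    idx = [ATTRS.index(a) for a in attrs]
    return {tuple(row[i] for i in idx) for row in rows}

def spurious_count(rows, left, right):
    """Project rows onto left and right, natural-join the projections,
    and count the joined tuples that are not in rows."""
    common = [a for a in left if a in right]
    joined = set()
    for l in project(rows, left):
        lrow = dict(zip(left, l))
        for r in project(rows, right):
            rrow = dict(zip(right, r))
            if all(lrow[a] == rrow[a] for a in common):
                merged = {**lrow, **rrow}
                joined.add(tuple(merged[a] for a in ATTRS))
    return len(joined - rows)

counts = {
    1: spurious_count(teach, ("Student", "Instructor"), ("Student", "Course")),
    2: spurious_count(teach, ("Course", "Instructor"), ("Course", "Student")),
    3: spurious_count(teach, ("Instructor", "Course"), ("Instructor", "Student")),
}
print(counts)
```

Only the decomposition that joins on Instructor returns exactly the original tuples, which is consistent with FD2: Instructor → Course making Instructor a key of the {Instructor, Course} relation.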

62 Decomposition, pros and cons: makes answering complex queries less efficient, because additional joins must be performed at query time (BAD); may increase storage requirements if the degree of redundancy is very low (BAD); may decrease storage requirements if the degree of redundancy is very high (GOOD); makes simple update transactions more efficient (GOOD) 62 To decompose or Not to decompose?

63 Multivalued Dependency and Fourth Normal Form. We discussed the concept of functional dependency (FD). Some constraints cannot be specified as functional dependencies; one of them is the multivalued dependency (MVD), on which fourth normal form (4NF) is based. MVDs are a direct consequence of first normal form (1NF), which disallows an attribute in a tuple from having a set of values. They arise when two or more independent multivalued attributes are placed in the same relation schema, i.e., when one relation combines multiple independent 1:N relationships 63

64 A multivalued dependency (MVD) X ↠ Y on R, where X and Y are subsets of R and Z = (R − (X ∪ Y)), specifies the following condition on any r(R): if two tuples t1 and t2 exist in r with t1[X] = t2[X], then two tuples t3 and t4 must also exist in r such that t3[X] = t4[X] = t1[X] = t2[X]; t3[Y] = t1[Y] and t4[Y] = t2[Y]; t3[Z] = t2[Z] and t4[Z] = t1[Z]. 4NF typically involves eliminating MVDs by repeated binary decompositions as well. 64 Formal Definition of Multivalued Dependency
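The formal definition can be sketched as a check on one relation state: for every ordered pair (t1, t2) that agrees on X, the tuple t3 that takes its Y values from t1 and its Z values from t2 must be present (t4 is covered when the pair is visited in the other order). The function name and the EMP-style rows are assumptions for illustration.

```python
def mvd_holds(rows, attrs, x, y):
    """Return True if the MVD x ->> y is satisfied by this relation state.

    rows is a list of dicts over attrs; x and y are lists of attribute names.
    """
    z = [a for a in attrs if a not in x and a not in y]
    present = {tuple(r[a] for a in attrs) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # Build t3: X and Y values from t1, Z values from t2.
                t3 = {**{a: t1[a] for a in x + y}, **{a: t2[a] for a in z}}
                if tuple(t3[a] for a in attrs) not in present:
                    return False
    return True

# Hypothetical relation: an employee's projects and dependents are independent.
attrs = ["ename", "pname", "dname"]
rows = [
    {"ename": "Smith", "pname": "X", "dname": "John"},
    {"ename": "Smith", "pname": "Y", "dname": "Anna"},
    {"ename": "Smith", "pname": "X", "dname": "Anna"},
    {"ename": "Smith", "pname": "Y", "dname": "John"},
]

print(mvd_holds(rows, attrs, ["ename"], ["pname"]))
print(mvd_holds(rows[:3], attrs, ["ename"], ["pname"]))
```

With all four project and dependent combinations present the MVD is satisfied; dropping one combination breaks it, which is why such relations carry so much redundancy.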

66 Join Dependencies (JD) and Fifth Normal Form (Project-Join). A join dependency is a constraint on the set of legal relations over a database schema: a table T is subject to a join dependency if T can always be recreated by joining multiple tables, each having a subset of the attributes of T. The join must satisfy the lossless (or nonadditive) join property. It is a very specific semantic constraint and very difficult to detect in practice; there is no sound and complete axiomatization for join dependencies 66

67 Example (JD): suppose that the following additional constraint always holds: whenever a supplier s supplies part p, and a project j uses part p, and the supplier s supplies at least one part to project j, then supplier s will also be supplying part p to project j. 67
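The constraint above is equivalent to saying that the supplier-part-project relation equals the join of its three binary projections. The sketch below tests that on two hypothetical relation states; the relation name supply, the identifiers s1, p1, j1 and the rows are assumptions.

```python
def jd_holds(supply):
    """Return True if supply equals the join of its projections onto
    (supplier, part), (supplier, project) and (part, project)."""
    sp = {(s, p) for s, p, j in supply}
    sj = {(s, j) for s, p, j in supply}
    pj = {(p, j) for s, p, j in supply}
    joined = {
        (s, p, j)
        for s, p in sp
        for s2, j in sj
        if s == s2 and (p, j) in pj
    }
    return joined == supply

# Hypothetical state that violates the constraint.
partial = {
    ("s1", "p1", "j1"),
    ("s1", "p2", "j2"),
    ("s2", "p1", "j2"),
    ("s3", "p2", "j3"),
    ("s2", "p3", "j1"),
}

# Adding the tuples that the constraint demands makes the state legal.
complete = partial | {("s2", "p1", "j1"), ("s1", "p1", "j2")}

print(jd_holds(partial), jd_holds(complete))
```

In the first state s1 supplies p1, project j2 uses p1, and s1 supplies something to j2, yet (s1, p1, j2) is missing, so the join of the projections produces tuples the relation does not contain.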

69 Quiz: March 10, 2015 69

