5. The Relational Model and Normalization 5.1 Relational Model5.2 Normalization 5.3 1NF to 5NF 5.4 Domain/Key Normalization 5.5 Synthesis of Relation 5.6.


1 5. The Relational Model and Normalization
5.1 Relational Model
5.2 Normalization
5.3 1NF to 5NF
5.4 Domain/Key Normalization
5.5 Synthesis of Relation
5.6 Multi-value Dependencies, Iteration 2
5.7 Optimization

2 5.1 Relational Model
Relation: a two-dimensional table. Row = tuple, column = attribute
Restrictions for a table to be a relation:
–each cell of the table must hold a single value
–neither repeating groups nor arrays are allowed
–all of the entries in any column must be of the same kind
Ex.: if the third column in the first row of a table contains an EmployeeNumber, then the third column must contain EmployeeNumbers in all other rows of the table

3 –Each column has a unique name, and the order of the columns in the table is insignificant
Equivalent Relational Terminology
Relational Model   Programmer   User
Relation           File         Table
Tuple (Row)        Record       Row
Attribute          Field        Column

4 5.1 Relational Model (Continue)
[Figure: a relation with 7 tuples (Tuple1 to Tuple7) and 4 attributes: Attribute1 Name, Attribute2 Age, Attribute3 Sex, Attribute4 EmployeeNumber]

5 5.1 Relational Model (Continue)
5.1.1 Functional dependency
–a relationship between or among attributes
Ex.: if we know the value of CustomerAccountNumber, we can find the value of CustomerBalance, so CustomerBalance is functionally dependent on CustomerAccountNumber
–Notation of functional dependency
Ex.: SID → Major (Major is dependent on SID); the relationship of SID to Major is N:1 (many to one)
Ex.: ComputerSerialNumber → MemorySize
–The attribute on the left-hand side of the arrow is called the determinant

6 5.1 Relational Model (Continue)
–Dependency between groups of attributes
Ex.: relation GRADES (SID, ClassName, Grades): (SID, ClassName) → Grades
–The two patterns behave differently
If X → (Y, Z), then X → Y and X → Z
Ex.: if SID → (StudentName, Major), then SID → StudentName and SID → Major are both true
But if (X, Y) → Z, it does not follow that X → Z or Y → Z
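
As a rough illustration of the two patterns above, the sketch below tests whether a functional dependency holds in a small set of rows. The function name `holds` and the sample GRADES rows are assumptions made for this example, not data from the slides. Note that sample data can only refute a dependency; whether it truly holds depends on what the data means to the users.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        determinant = tuple(row[a] for a in lhs)
        dependent = tuple(row[a] for a in rhs)
        # setdefault stores the first dependent value seen for this determinant
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


# Hypothetical GRADES sample data (not from the slides)
grades = [
    {"SID": 100, "ClassName": "MATH", "Grades": "A"},
    {"SID": 100, "ClassName": "ART", "Grades": "B"},
    {"SID": 200, "ClassName": "MATH", "Grades": "C"},
]

print(holds(grades, ["SID", "ClassName"], ["Grades"]))  # True
print(holds(grades, ["SID"], ["Grades"]))               # False
print(holds(grades, ["ClassName"], ["Grades"]))         # False
```

Here (SID, ClassName) → Grades holds, yet neither SID → Grades nor ClassName → Grades does, which is the (X, Y) → Z case.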

7 5.1 Relational Model (Continue)
5.1.2 Key
–a group of one or more attributes that uniquely identifies a row
–ACTIVITY relation
ACTIVITY (SID, Activity, Fee) Key: SID
[Sample data, columns: SID, Activity, Fee]

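8 5.1 Relational Model (Continue)
–A combination of two or more attributes can be a key
–Keys follow from what the data means to the users, not from a formal rule, so we must consult the users to decide on a key
–Every relation has at least one key
[Sample data, columns: SID, Activity, Fee]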

9 5.1 Relational Model (Continue)
–No relation can have duplicate rows, so some group of attributes always identifies a row uniquely; at the extreme, the key consists of all of the attributes of the relation
Ex.: Is (Activity, Fee) a key? No, because many students can participate in skiing

10 5.1 Relational Model (Continue)
5.1.3 Functional Dependencies, Key, and Uniqueness
–a determinant of a functional dependency may or may not be unique in a relation
Ex.: in the ACTIVITY relation, Activity functionally determines Fee, yet there can be many instances of a particular Activity in the relation
–Unlike determinants, keys are always unique
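
The difference between a determinant and a key can be sketched with a uniqueness check. The helper `is_unique` and the ACTIVITY rows are hypothetical; only the squash fee of $50 for Student 175 appears in the slides, and the other values are made up for illustration.

```python
def is_unique(rows, attrs):
    """True if no two rows share the same values for attrs."""
    values = [tuple(row[a] for a in attrs) for row in rows]
    return len(values) == len(set(values))


# Hypothetical ACTIVITY sample data
activity = [
    {"SID": 100, "Activity": "SKIING", "Fee": 200},
    {"SID": 150, "Activity": "SWIMMING", "Fee": 50},
    {"SID": 175, "Activity": "SQUASH", "Fee": 50},
    {"SID": 200, "Activity": "SWIMMING", "Fee": 50},
]

print(is_unique(activity, ["SID"]))              # True: SID can serve as the key
print(is_unique(activity, ["Activity"]))         # False: a determinant, but not unique
print(is_unique(activity, ["Activity", "Fee"]))  # False: not a key either
```

Activity determines Fee in this data, yet SWIMMING appears twice, so Activity is a determinant without being a key.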

11 5.2 Normalization
Undesirable relation → desirable relation
5.2.1 Modification Anomalies
–deletion anomaly: by deleting the facts about one entity, we inadvertently delete facts about another entity
–insertion anomaly: we cannot insert a fact about one entity until we have an additional fact about another entity

12 Solution: divide the relation into two relations (the separation has a disadvantage of its own)
–Two relations from ACTIVITY
STU-ACT (SID, Activity) Key: SID
ACT-COST (Activity, Fee) Key: Activity

13 5.2 Normalization (continue)
STU-ACT[Activity] ⊆ ACT-COST[Activity]
–[ ] denotes a column of data extracted from a relation
–This is a referential integrity constraint
Ex.: before we allow an Activity to be entered into STU-ACT, we must check to make sure that it is already present in ACT-COST
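
The split into STU-ACT and ACT-COST and the subset constraint between them can be sketched as follows. The sample rows and the helper names `project` and `referential_integrity_ok` are assumptions for illustration only.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples."""
    return sorted({tuple(row[a] for a in attrs) for row in rows})


def referential_integrity_ok(stu_act, act_cost):
    """Check STU-ACT[Activity] is a subset of ACT-COST[Activity]."""
    known = {activity_name for activity_name, _fee in act_cost}
    return all(activity_name in known for _sid, activity_name in stu_act)


# Hypothetical ACTIVITY sample data
activity = [
    {"SID": 100, "Activity": "SKIING", "Fee": 200},
    {"SID": 150, "Activity": "SWIMMING", "Fee": 50},
    {"SID": 175, "Activity": "SQUASH", "Fee": 50},
    {"SID": 200, "Activity": "SWIMMING", "Fee": 50},
]

stu_act = project(activity, ["SID", "Activity"])
act_cost = project(activity, ["Activity", "Fee"])

print(act_cost)                                     # one row per activity
print(referential_integrity_ok(stu_act, act_cost))  # True

# Deleting Student 175 from STU-ACT no longer loses the squash fee.
stu_act_after = [t for t in stu_act if t[0] != 175]
print(("SQUASH", 50) in act_cost)                   # True

# A student row naming an unknown activity violates the constraint.
print(referential_integrity_ok(stu_act + [(300, "GOLF")], act_cost))  # False
```

The price of the split is the extra check: every insert into STU-ACT now has to be validated against ACT-COST.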

14 5.2 Normalization (continue)
5.2.2 Essence of Normalization
–In Figure 5-3, anomalies occur because ACTIVITY contains facts about two different themes:
the students who participate in each activity
how much each activity costs
–With both themes in one relation, we must add data about two themes at once and delete data about two themes at once
–Each relation must have a single theme
–Every time we break up a relation, we may create a referential integrity constraint

15 5.2 Normalization (continue)
5.2.3 Classes of Relations
–Normal forms: classes of relations and the techniques for preventing anomalies
–Normal forms were defined from 1970 onward (Codd and others)
E. F. Codd: 1NF, 2NF, 3NF
Boyce and Codd: BCNF
R. Fagin: 4NF, 5NF
No theory guaranteed that any of them would eliminate all anomalies
R. Fagin: Domain/Key normal form (DK/NF)

16 5.2 Normalization (continue)
Relationship of the Normal Forms
First Normal Form (1NF)
Second Normal Form (2NF)
Third Normal Form (3NF)
Boyce-Codd Normal Form (BCNF)
Fourth Normal Form (4NF)
Fifth Normal Form (5NF) *
Domain/Key Normal Form (DK/NF) *

17 5.3 First through Fifth Forms
Any table of data that meets the definition of a relation is in 1NF
Definition of a relation
–All cells of the table must be of single value
–Repeating groups and arrays are not allowed as values
–All entries in any attribute must be of the same kind
–Each column must have a unique name (the order of the columns is insignificant)
–No two rows in a table may be identical (the order of the rows is insignificant)

18 5.3 First through Fifth Forms
5.3.1 2NF
–Definition: a relation is in second normal form if all of its nonkey attributes are dependent on all of the key, not on just part of it

19 5.3 First through Fifth Forms
Ex.: Figure 5-4, key = (SID, Activity)
–Activity → Fee (Fee is partially dependent on the key)
–The determinant, Activity, is only part of the key
–If we delete the tuple for Student 175, we lose the fact that squash costs $50. Also, we cannot enter an activity until a student signs up for it.
–To eliminate the anomalies, separate the relation into two relations
[Sample data, columns: SID, Activity, Fee]
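
A minimal sketch of spotting a partial dependency, assuming the functional dependencies are written down explicitly as (determinant, dependent) pairs. The function `partial_dependencies` is hypothetical and only inspects the listed dependencies; it does not derive implied ones.

```python
def partial_dependencies(key, fds):
    """Return the listed FDs whose determinant is a proper subset of the key
    and whose right-hand side includes at least one nonkey attribute."""
    key = set(key)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if set(lhs) < key and not set(rhs) <= key
    ]


activity_fds = [(("Activity",), ("Fee",))]

# ACTIVITY with key (SID, Activity): Fee depends on only part of the key.
print(partial_dependencies(("SID", "Activity"), activity_fds))

# After the split, neither relation has a partial dependency.
print(partial_dependencies(("SID", "Activity"), []))       # STU-ACT
print(partial_dependencies(("Activity",), activity_fds))   # ACT-COST
```

The first call reports Activity → Fee; the other two return empty lists, which is what 2NF asks for.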

20 5.3 First through Fifth Forms
5.3.2 3NF
–Relations in 2NF can also have anomalies
–In Figure 5-7(a), SID → Building and Building → Fee, so SID → Fee
–Transitive dependency: SID determines Building and Building determines Fee (SID determines Fee through the attribute Building)
–What happens if we delete the second tuple shown in Figure 5-7(a)?
We lose the fact that Student 150 lives in Ingersoll Hall
We lose the fact that it costs $1100 to live there

21 5.3 First through Fifth Forms
Transitive dependency (Figure 5-7)
HOUSING (SID, Building, Fee) Key: SID
Functional dependencies: Building → Fee; SID → Building → Fee (so SID → Fee indirectly)
STU-HOUSING (SID, Building) Key: SID
BLDG-FEE (Building, Fee) Key: Building
Building    Fee
Randolph    1200
Ingersoll   1100
Pitkin      1100

22 5.3 First through Fifth Forms
STU-HOUSING (SID, Building) Key: SID
BLDG-FEE (Building, Fee) Key: Building
–A relation is in 3NF if it is in 2NF and has no transitive dependencies
–Relation HOUSING can be divided into two relations in 3NF
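
The deletion anomaly and its removal can be sketched in a few lines. The building fees and the fact that Student 150 lives in Ingersoll come from the slides; the remaining SIDs and room assignments are assumed for illustration.

```python
# HOUSING (SID, Building, Fee); rows other than Student 150 are hypothetical
housing = [
    (100, "Randolph", 1200),
    (150, "Ingersoll", 1100),
    (200, "Randolph", 1200),
    (250, "Pitkin", 1100),
    (300, "Randolph", 1200),
]

# Single relation: deleting Student 150 also loses the Ingersoll fee.
housing_after = [row for row in housing if row[0] != 150]
print(any(building == "Ingersoll" for _sid, building, _fee in housing_after))  # False

# 3NF design: STU-HOUSING (SID, Building) and BLDG-FEE (Building, Fee)
stu_housing = {sid: building for sid, building, _fee in housing}
bldg_fee = {building: fee for _sid, building, fee in housing}

# Joining the two relations on Building reproduces HOUSING exactly.
rejoined = sorted((sid, b, bldg_fee[b]) for sid, b in stu_housing.items())
print(rejoined == sorted(housing))  # True

# Deleting the student now leaves the fee fact in place.
del stu_housing[150]
print(bldg_fee["Ingersoll"])  # 1100
```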

23 5.3 First through Fifth Forms
5.3.3 BCNF
–Relations in 3NF can also have anomalies
Ex.: In Figure 5-8(a), suppose Student 300 drops out of school. If we delete Student 300's tuple, we lose the fact that Perl advises in psychology (deletion anomaly). We also cannot store the fact that Keynes advises in economics until a student majors in economics (insertion anomaly)
–A relation is in BCNF if every determinant is a candidate key (relation ADVISER is not in BCNF, because the determinant, Fname, is not a candidate key)

24 5.3 First through Fifth Forms
ADVISER (SID, Major, Fname)
Key (primary): (SID, Major)
Key (candidate): (SID, Fname)
Functional dependencies: Fname → Major
–Two or more attributes or attribute collections that can be a key are called candidate keys
–Whichever of the candidates is selected to be the key is called the primary key

25 5.3 First through Fifth Forms
BCNF for Figure 5-8(a)
STU-ADV (SID, Fname) Key: (SID, Fname)
ADV-MAJ (Fname, Major) Key: Fname
Anomalies can arise from situations other than functional dependencies
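
One common way to test the BCNF condition is to compute the closure of each determinant and see whether it covers every attribute of the relation. The sketch below uses that technique; the function names are hypothetical, and the dependency (SID, Major) → Fname is assumed from the statement that (SID, Major) is a key.

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Nontrivial FDs whose determinant does not determine every attribute."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(all_attrs)
    ]


adviser_fds = [
    (("SID", "Major"), ("Fname",)),  # assumed: follows from the key (SID, Major)
    (("Fname",), ("Major",)),
]

print(bcnf_violations(("SID", "Major", "Fname"), adviser_fds))
print(bcnf_violations(("SID", "Fname"), []))                          # STU-ADV
print(bcnf_violations(("Fname", "Major"), [(("Fname",), ("Major",))]))  # ADV-MAJ
```

Only Fname → Major is reported for ADVISER, since Fname alone does not determine SID; both relations of the split come back clean.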

26 5.3 First through Fifth Forms
5.3.4 4NF
–A relation is in 4NF if it is in BCNF and has no multi-value dependencies
–A multi-value dependency exists when
a relation has at least three attributes
two of them are multi-value
their values depend on only the third attribute
–Update anomalies: too much updating needs to be done to make a simple change in the data

27 5.3 First through Fifth Forms
STUDENT (SID, Major, Activity) Key: (SID, Major, Activity)
Multi-valued dependencies: SID →→ Major; SID →→ Activity
This relation is in BCNF (2NF because it is all key; 3NF because it has no transitive dependencies; and BCNF because it has no non-key determinants)

28 5.3 First through Fifth Forms
Anomalies in the STUDENT relation
–If a student adds a major, we must enter a tuple for the new major
–Likewise, if Student 100 takes up skiing, inserting the tuple [100, MUSIC, SKIING] alone is not enough: in order to keep the data consistent, we must add one row for each of her majors paired with skiing
–Too much updating needs to be done to make a simple change in the data

29 5.3 First through Fifth Forms
–Eliminating the multi-value dependency
STU-MAJOR (SID, Major) Key: (SID, Major)
STU-ACT (SID, Activity) Key: (SID, Activity)
5.3.5 5NF
–concerns dependencies that are rather obscure
–we do not know what the consequences of such dependencies are, or even whether they have any practical consequences
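
The update anomaly behind 4NF, and how the split into STU-MAJOR and STU-ACT avoids it, can be sketched as follows. Only the tuple [100, MUSIC, SKIING] comes from the slides; the second major (ACCOUNTING), the existing activity (SWIMMING), and the helper `student_rows` are assumptions for illustration.

```python
from itertools import product


def student_rows(sid, majors, activities):
    """All (SID, Major, Activity) rows the single STUDENT relation needs."""
    return sorted((sid, m, a) for m, a in product(majors, activities))


majors = ["MUSIC", "ACCOUNTING"]
before = student_rows(100, majors, ["SWIMMING"])
after = student_rows(100, majors, ["SWIMMING", "SKIING"])

# Single relation: taking up skiing costs one new row per major.
added = [row for row in after if row not in before]
print(added)

# 4NF design: the same change is a single row in STU-ACT.
stu_major = {(100, m) for m in majors}
stu_act = {(100, "SWIMMING")}
stu_act.add((100, "SKIING"))

# Joining the two relations on SID gives back the consistent STUDENT rows.
rejoined = sorted(
    (s, m, a) for (s, m) in stu_major for (t, a) in stu_act if s == t
)
print(rejoined == after)  # True
```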

30 5.4 Domain/Key Normal Form
–A relation in DK/NF has been proved to have no modification anomalies, so no higher normal form is needed, at least in order to eliminate modification anomalies
5.4.1 Definition
–A relation is in DK/NF if every constraint on the relation is a logical consequence of the definition of keys and domains

31 5.4 Domain/Key Normal Form
–Constraint
A rule governing static values of attributes that is precise enough that we can ascertain whether or not it is true
Examples: edit rules, intra-relation and inter-relation constraints, functional dependencies, and multi-value dependencies
Excluded: constraints pertaining to changes in data values, and time-dependent constraints
–Key
A unique identifier of a tuple

32 5.4 Domain/Key Normal Form
–Domain
A description of an attribute's allowed values
Physical description: the set of values the attribute can have
Logical description: the meaning of the attribute
–Informally, a relation is in DK/NF if enforcing key and domain restrictions causes all of the constraints to be met
–There is no known algorithm for converting a relation to DK/NF. In spite of this, in the practical world of DB design, DK/NF is an exceedingly useful design objective

33 5.4 Domain/Key Normal Form
5.4.2 Example 1 of DK/NF
STUDENT (SID, GradeLevel, Building, Fee) Key: SID
Constraints:
Building → Fee
SID must not begin with digit 1

34 5.4 Domain/Key Normal Form
–Domain/Key definition of Example 1
Domain Definitions
SID in CDDD, where C is a decimal digit not = 1; D = decimal digit
GradeLevel in {'FR', 'SO', 'JR', 'SN', 'GR'}
Building in CHAR(4)
Fee in DEC(4)
Relation and Key Definitions
STUDENT (SID, GradeLevel, Building) Key: SID
BLDG-FEE (Building, Fee) Key: Building
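
Two of the domain definitions above can be expressed as simple membership tests. This is a sketch only: the function names are hypothetical, and SID values are assumed to be stored as four-character strings.

```python
import re

# SID in CDDD: first digit is any decimal digit except 1, then three digits
SID_PATTERN = re.compile(r"[02-9][0-9]{3}")
GRADE_LEVELS = {"FR", "SO", "JR", "SN", "GR"}


def in_sid_domain(value):
    """True if value is a 4-digit string whose first digit is not 1."""
    return SID_PATTERN.fullmatch(value) is not None


def in_grade_level_domain(value):
    """True if value is one of the allowed grade levels."""
    return value in GRADE_LEVELS


print(in_sid_domain("2345"))        # True
print(in_sid_domain("1234"))        # False: begins with 1
print(in_sid_domain("234"))         # False: too short
print(in_grade_level_domain("JR"))  # True
print(in_grade_level_domain("XX"))  # False
```

With the rule "SID must not begin with 1" folded into the SID domain, and Building → Fee carried by the key of BLDG-FEE, every constraint of Example 1 follows from domains and keys.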

35 5.4 Domain/Key Normal Form
5.4.3 Example 2 of DK/NF
–Functional dependency and multi-value dependency
PROFESSOR (FID, Fname, Class, SID, Sname) Key: (FID, Class, SID)
Constraints:
FID → Fname
Fname → FID
FID →→ Class | SID
Fname →→ Class | SID
SID → FID
SID → Fname
SID → Sname
FID must start with 1; SID must not start with 1

36 5.4 Domain/Key Normal Form
Domain/Key definition of Example 2
Domain Definitions
FID in CDDD, C = 1; D = decimal digit
Fname in CHAR(30)
Class in CHAR(10)
SID in CDDD, C is a decimal digit, not = 1; D = decimal digit
Sname in CHAR(30)
Relation and Key Definitions
FACULTY (FID, Fname) Key (primary): FID; Key (candidate): Fname
PREPARATION (Fname, Class) Key: (Fname, Class)
STUDENT (SID, Sname, Fname) Key: SID

37 5.4 Domain/Key Normal Form
5.4.4 Example 3 of DK/NF
–a constraint among data values within a tuple that is neither a functional dependency nor a multi-value dependency
STU-ADVISER (SID, Sname, FID, Fname, GradFacultyStatus) Key: SID
Constraints:
FID → Fname
Fname → FID
FID and Fname → GradFacultyStatus
Only graduate faculty can advise graduate students
FID begins with 1
SID must not begin with 1
SID of a graduate student begins with 9
GradFacultyStatus = 0 for undergraduate faculty, 1 for graduate faculty

38 5.4 Domain/Key Normal Form
Domain/Key definition of Example 3
Domain Definitions
FID in CDDD, where C = 1; D = decimal digit
Fname in CHAR(30)
GradFacultyStatus in [0, 1]
GSID in CDDD, where C = 9; D = decimal digit; graduate student
UGSID in CDDD, where C ≠ 1 and C ≠ 9; D = decimal digit; undergraduate student
Sname in CHAR(30)
Additional Domain Definitions
Gfname in {Fname of FACULTY, where GradFacultyStatus = 1}
Relation and Key Definitions
FACULTY (FID, Fname, GradFacultyStatus) Key: FID or Fname
G-ADV (GSID, Sname, Gfname) Key: GSID
UG-ADV (UGSID, Sname, Fname) Key: UGSID

39 5.4 Domain/Key Normal Form
Summary of Normal Forms
Form    Defining characteristic
1NF     Any relation
2NF     All non-key attributes are dependent on all of the keys
3NF     There are no transitive dependencies
BCNF    Every determinant is a candidate key
4NF     There are no multi-valued dependencies
5NF     Not described in this discussion
DK/NF   All constraints on relations are logical consequences of domains and keys

40 5.5 The Synthesis of Relations
Two attributes A and B can be related in three ways:
–They determine each other: A → B and B → A. Hence A and B have a one-to-one attribute relationship
–One determines the other: A → B, but B not → A. Hence A and B have a many-to-one attribute relationship
–They are functionally unrelated: A not → B and B not → A. Hence A and B have a many-to-many attribute relationship

41 5.5 The Synthesis of Relations
5.5.1 One-to-One Attribute Relationships (FID and Fname in Examples 2 and 3 of DK/NF)
–If two attributes functionally determine each other, the relationship of their data values is one to one
–If two attributes uniquely identify the same thing (entity or object), the relationship of their data values is one to one
–If two attributes have a one-to-one relationship, they functionally determine each other

42 5.5 The Synthesis of Relations Summary of Three Types of Attribute Relationship

43 5.5 The Synthesis of Relations
Summary of Rules for Constructing Relations
–One-to-One Relationship
Attributes that have a one-to-one relationship must occur together in at least one relation. Call the relation R and the attributes A and B
Either A or B must be the key of R
An attribute can be added to R if it is functionally determined by A or B
An attribute that is not functionally determined by A or B cannot be added to R
A and B must occur together in R, but should not occur together in other relations
Either A or B should be consistently used to represent the pair in relations other than R

44 5.5 The Synthesis of Relations
Summary of Rules for Constructing Relations
–Many-to-One Relationship
Attributes that have a many-to-one relationship can exist in a relation together. Assume C determines D in relation S
C must be the key of S
An attribute can be added to S if it is determined by C
An attribute that is not determined by C cannot be added to S

45 5.5 The Synthesis of Relations
–Many-to-Many Relationship
Attributes that have a many-to-many relationship can exist in a relation together. Assume two such attributes, E and F, reside together in relation T
The key of T must be (E, F)
An attribute can be added to T if it is determined by the combination (E, F)
An attribute may not be added to T if it is not determined by the combination (E, F)
If adding a new attribute, G, expands the key to (E, F, G), then the theme of the relation has been changed. Either G does not belong in T or the name of T must be changed to reflect the new theme

46 5.5 The Synthesis of Relations
5.5.2 N:1 Attribute Relationships
–A → B, but B not → A. Hence A and B have a many-to-one attribute relationship
–Ex. 2: SID determines FID. Many students (SID) are advised by a faculty member (FID), but each student is advised by only one faculty member
5.5.3 M:N Attribute Relationships
–A not → B and B not → A. Hence A and B have a many-to-many attribute relationship
–Ex. 2: Fname and Class have a many-to-many relationship. A professor teaches many classes, and a class is taught by many professors. Together, both attributes must form the key of the relation
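
The three attribute relationships can be sketched as a classification over sample rows. The functions `determines` and `classify` and the rows below are hypothetical (the IDs follow the rule that FID starts with 1 and SID does not). Sample data can only suggest a relationship; the real answer depends on the users' rules.

```python
def determines(rows, a, b):
    """True if attribute a functionally determines attribute b in rows."""
    seen = {}
    return all(seen.setdefault(row[a], row[b]) == row[b] for row in rows)


def classify(rows, a, b):
    """Classify the relationship of attribute a to attribute b."""
    a_to_b = determines(rows, a, b)
    b_to_a = determines(rows, b, a)
    if a_to_b and b_to_a:
        return "one-to-one"
    if a_to_b:
        return "many-to-one"
    if b_to_a:
        return "one-to-many"
    return "many-to-many"


# Hypothetical rows in the spirit of Example 2
rows = [
    {"FID": "1001", "Fname": "JONES", "Class": "MATH", "SID": "2001"},
    {"FID": "1001", "Fname": "JONES", "Class": "ART", "SID": "2001"},
    {"FID": "1001", "Fname": "JONES", "Class": "MATH", "SID": "2002"},
    {"FID": "1002", "Fname": "SMITH", "Class": "MATH", "SID": "3001"},
]

print(classify(rows, "FID", "Fname"))    # one-to-one
print(classify(rows, "SID", "FID"))      # many-to-one
print(classify(rows, "Fname", "Class"))  # many-to-many
```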

47 5.6 Multi-value dependencies, Iteration 2
Solution to the multi-value dependency
–STUDENT (SID, Major, Activity) holds two independent M:N relationships
–Split the relation into two relations (each with a single theme)
STU-MAJOR (SID, Major) Key: (SID, Major)
–Everything we know about the combination is in a single row, and we will not gain more information about that combination by examining more rows

48 5.7 Optimization
5.7.1 De-Normalization
–Sometimes the result of normalization is not worth the cost
–Ex.: not in DK/NF
CUSTOMER (CustNumber, CustName, City, State, Zip) Key: CustNumber
FD: Zip → (City, State)
–Transformed into DK/NF
CUSTOMER (CustNumber, CustName, Zip) Key: CustNumber
CODES (Zip, City, State) Key: Zip
–The DK/NF version does not represent a better design

49 5.7 Optimization
5.7.2 Optimization
–A college has one dean and from one to three assistant deans
COLLEGE (CollegeName, Dean, AssistantDean) Key: (CollegeName, AssistantDean)
–Transformed into DK/NF
DEAN (CollegeName, Dean)
ASSISTANT-DEAN (CollegeName, AssistantDean)
Must read 2 to 4 rows of data to obtain the data about a college
–Alternative with one row per college
COLLEGE1 (CollegeName, Dean, AssistantDean1, AssistantDean2, AssistantDean3)

50 5.7 Optimization
Query form (to see whether there is a college that has an assistant dean named 'Mary Abernathy')
–Using COLLEGE1:
SELECT CollegeName FROM COLLEGE1
WHERE AssistantDean1 = 'Mary Abernathy' OR AssistantDean2 = 'Mary Abernathy' OR AssistantDean3 = 'Mary Abernathy'
–Using the normalized design with ASSISTANT-DEAN (the hyphenated name must be written as a quoted identifier):
SELECT CollegeName FROM "ASSISTANT-DEAN"
WHERE AssistantDean = 'Mary Abernathy'
There is no hard and fast rule stating how to select among them.
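
To compare the two query forms side by side, the sketch below runs both against an in-memory SQLite database through Python's standard `sqlite3` module. The column types, the college names, and every person other than 'Mary Abernathy' are made up for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE COLLEGE1 (
    CollegeName TEXT PRIMARY KEY, Dean TEXT,
    AssistantDean1 TEXT, AssistantDean2 TEXT, AssistantDean3 TEXT);
CREATE TABLE "ASSISTANT-DEAN" (
    CollegeName TEXT, AssistantDean TEXT,
    PRIMARY KEY (CollegeName, AssistantDean));
-- Hypothetical sample rows
INSERT INTO COLLEGE1 VALUES
    ('Business', 'A. Dean', 'B. Smith', 'Mary Abernathy', NULL),
    ('Arts', 'C. Dean', 'D. Jones', NULL, NULL);
INSERT INTO "ASSISTANT-DEAN" VALUES
    ('Business', 'B. Smith'),
    ('Business', 'Mary Abernathy'),
    ('Arts', 'D. Jones');
""")

name = "Mary Abernathy"

# One row per college: every assistant-dean column must be tested.
wide = con.execute(
    "SELECT CollegeName FROM COLLEGE1 "
    "WHERE AssistantDean1 = ? OR AssistantDean2 = ? OR AssistantDean3 = ?",
    (name, name, name),
).fetchall()

# Normalized design: a single comparison on one column.
narrow = con.execute(
    'SELECT CollegeName FROM "ASSISTANT-DEAN" WHERE AssistantDean = ?',
    (name,),
).fetchall()

print(wide)    # [('Business',)]
print(narrow)  # [('Business',)]
```

Both designs return the same answer; they differ in how the query is written and in how many rows describe one college.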

51 Controlled Redundancy
–For performance reasons, it is sometimes appropriate to duplicate data intentionally
–ITEM (PartNumber, PartName, PartColor, PartDesc, PartPic, QuantityOnHand, QuantityOnOrder, StandardPrice, StandardCost, BuyerName)
If PartDesc and PartPic take up a large amount of memory and some applications do not use these two attributes
–create a second table duplicating ITEM
This creates a potential for serious integrity problems; we need to develop both programmatic and manual controls to ensure that such problems do not occur

