Term 2, 2007, Lectures 2/3, Normalisation
D. Tidhar (based on M. Ursu)
Department of Computing, Goldsmiths College
Normalisation 5

Outline
- Revision: FD, 2NF, 3NF
- Boyce-Codd Normal Form (BCNF)
- normalisation: non-loss decomposition, Heath's theorem
- normalisation process: semantic assumptions and FDs, CKs, decomposition
- normalisation vs dependency preservation
- a decomposition may lead to a better solution than another one
- either-or situations: normalise or preserve FDs

Functional dependency (FD)
R - relation; X and Y - subsets of attributes of R
X → Y iff, in every possible legal value of R, each X-value has a single Y-value associated with it

Functional dependency (FD)
In particular: every attribute is functionally dependent on any CK
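The FD definition can be tried out on a concrete relation value. The sketch below is only illustrative: the function name `fd_holds` and the sample rows are assumptions, not part of the lecture. Note that an FD is a statement about every legal value of R, so a check on one value can refute an FD but cannot prove it.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is satisfied by this one relation value."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Each X-value may be associated with a single Y-value only.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical sample rows for a Modules relation.
modules = [
    {"M_id": "m1", "M_name": "Databases", "Type": "1 semester", "Value": 0.5},
    {"M_id": "m2", "M_name": "Logic", "Type": "1 semester", "Value": 0.5},
    {"M_id": "m3", "M_name": "Project", "Type": "2 semester", "Value": 1.0},
]

print(fd_holds(modules, ["M_id"], ["M_name", "Type", "Value"]))  # True
print(fd_holds(modules, ["Type"], ["Value"]))                    # True
print(fd_holds(modules, ["Type"], ["M_id"]))                     # False
```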

Full functional dependency (FFD)
R - relation; X and Y - subsets of attributes of R
Y is fully functionally dependent on X iff
(1) X → Y
(2) there is no proper subset X′ of X such that X′ → Y
Alternative formulation: the dependency X → Y is left-irreducible

Full functional dependency (FFD)
Is every attribute fully dependent on any CK? Only in 2NF!
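Full functional dependency can be tested mechanically once a set of FDs is declared. The following is a minimal sketch, assuming FDs are given as (determinant, dependent) pairs of attribute tuples; `closure` and `is_full_fd` are hypothetical helper names, and the FDs are borrowed from the project/task example used later in these slides.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_full_fd(lhs, rhs, fds):
    """True if lhs -> rhs holds and no proper subset of lhs determines rhs."""
    lhs, rhs = set(lhs), set(rhs)
    if not rhs <= closure(lhs, fds):
        return False  # the FD does not hold at all
    for size in range(len(lhs)):  # proper subsets only
        for subset in combinations(sorted(lhs), size):
            if rhs <= closure(subset, fds):
                return False  # a smaller determinant is enough
    return True


fds = [
    (("project", "task"), ("max_budget", "duration")),
    (("project", "task", "contractor"), ("contr_time",)),
]
key = ("project", "task", "contractor")
print(is_full_fd(key, ("contr_time",), fds))  # True
print(is_full_fd(key, ("max_budget",), fds))  # False
```

The second result shows a non-key attribute that depends on only part of the key, which is exactly what 2NF rules out.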

2NF - simple definitions
Class (for a relation with only one CK): no FD from a proper subset of the PK to a non-key attribute
Book (based only on the PK): every non-primary-key attribute is fully functionally dependent on the PK
Are they equivalent?

2NF - general definitions
Class: all non-key attributes are irreducibly dependent on the candidate keys
Book: every non-key attribute is fully functionally dependent on any CK
Are they equivalent?

Transitive FD
If A, B, C are attributes of a relation such that A → B and B → C, then C is transitively dependent on A via B (provided that neither B → A nor C → A).

3NF - simple definitions
Class (for a relation with only one CK): no FD between non-key attributes
Book (based only on the PK): no non-primary-key attribute is transitively dependent on the PK
Are they equivalent?

3NF - general definition
No non-key attribute is transitively dependent on any CK

2NF and 3NF (optional)
2NF: a relation is in 2NF if and only if it is in 1NF and all non-key attributes are irreducibly dependent on the candidate keys
3NF (Zaniolo): R is a relation; X is any set of attributes of R; A is any single attribute of R; consider the following conditions:
– X contains A
– X contains a candidate key of R
– A is contained in a candidate key of R
if at least one of the three is true for every FD X → A, then R is in 3NF
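Zaniolo's three conditions translate almost directly into a test. The sketch below is an illustration under assumptions: FDs are (determinant, dependent) pairs, candidate keys are found by brute force (fine for slide-sized relations only), and the helper names are hypothetical. It checks the declared FDs, which is sufficient for testing the relation they are declared on.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """All minimal attribute sets whose closure is the whole relation."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


def is_3nf(attributes, fds):
    """Zaniolo: for every FD X -> A, one of the three conditions holds."""
    attributes = set(attributes)
    prime = set().union(*candidate_keys(attributes, fds))
    for lhs, rhs in fds:
        for a in rhs:
            x_contains_a = a in lhs
            x_contains_key = closure(lhs, fds) >= attributes
            a_in_some_key = a in prime
            if not (x_contains_a or x_contains_key or a_in_some_key):
                return False
    return True


modules_fds = [(("M_id",), ("M_name", "Type", "Value")), (("Type",), ("Value",))]
patient_fds = [(("Patient", "Disease"), ("Doctor",)), (("Doctor",), ("Disease",))]
print(is_3nf({"M_id", "M_name", "Type", "Value"}, modules_fds))  # False
print(is_3nf({"Patient", "Disease", "Doctor"}, patient_fds))     # True
```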

BCNF
a relation is in Boyce/Codd normal form (BCNF) if and only if every non-trivial irreducible FD has a candidate key as its determinant
informally: the determinant of each relevant FD is a CK
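A BCNF test follows from the definition. This is a minimal sketch with hypothetical names; it uses the equivalent formulation that the determinant of every non-trivial declared FD must be a superkey (its closure covers all attributes), and it assumes the FDs mention only attributes of the relation being tested.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_bcnf(attributes, fds):
    """True if every non-trivial FD has a superkey as its determinant."""
    attributes = set(attributes)
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial FD, always allowed
        if not closure(lhs, fds) >= attributes:
            return False
    return True


modules_fds = [(("M_id",), ("M_name", "Type", "Value")), (("Type",), ("Value",))]
print(is_bcnf({"M_id", "M_name", "Type", "Value"}, modules_fds))  # False
print(is_bcnf({"Type", "Value"}, [(("Type",), ("Value",))]))      # True

patient_fds = [(("Patient", "Disease"), ("Doctor",)), (("Doctor",), ("Disease",))]
print(is_bcnf({"Patient", "Disease", "Doctor"}, patient_fds))     # False
```

The last relation is in 3NF but not in BCNF, because Doctor determines Disease without being a key.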

Example
devise examples in class
- relations in BCNF
- relations not in BCNF

BCNF
- any relation can be non-loss decomposed into an equivalent set of BCNF relations
- BCNF ⇒ 3NF ⇒ 2NF ⇒ 1NF
- BCNF is still not guaranteed to be free of any update anomalies

Normalisation
the process of transforming a relation with redundancies into an equivalent set of relations that have fewer redundancies
- transformation: projection
  input: one relation, say R
  output: many relations, say R1, …, Rn
- equivalent: non-loss decomposition
  R1 join R2 … join Rn = R
- R1, …, Rn should have normal forms higher than or equal to that of R

Projection (Relational Algebra)
If R = (A1, A2, ..., An) (i.e. R is a relation with attributes A1, ..., An) and if X is a subset of {A1, A2, ..., An}, then we obtain the projection of R on X by simply keeping the values of R for each attribute in X, and removing the other attributes from the relation.
Trivially, the degree of the projection is determined by the number of elements in X.
What about the cardinality of the projection?
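One way to explore the cardinality question is to project a small relation value. The sketch below assumes a relation is held as a list of dicts; `project` and the sample rows are hypothetical. Because a relation is a set of tuples, duplicates produced by dropping attributes collapse, so the projection can have fewer tuples than R.

```python
def project(rows, attrs):
    """Project a relation (list of dicts) onto attrs, removing duplicates."""
    seen = set()
    result = []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result


modules = [
    {"M_id": "m1", "M_name": "Databases", "Type": "1 semester", "Value": 0.5},
    {"M_id": "m2", "M_name": "Logic", "Type": "1 semester", "Value": 0.5},
    {"M_id": "m3", "M_name": "Project", "Type": "2 semester", "Value": 1.0},
]

type_value = project(modules, ["Type", "Value"])
print(len(type_value[0]))  # degree 2: the number of attributes in X
print(len(type_value))     # cardinality 2, although R has 3 tuples
```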

Natural Join (Relational Algebra)
If R and S are relations, we can join them by extending one table to contain all attributes (horizontally):
- we include only the tuples in which all common attributes have equal values (vertically)
- we include only one of the columns for each common attribute (i.e. repeated columns are discarded)
The resulting table defines a relation which is the natural join of R and S. We may write T = R join S
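The description of the natural join can be followed step by step in a few lines. This is a sketch under the assumption that relations are lists of dicts; `natural_join` and the sample data are hypothetical.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts."""
    result = []
    for x in r:
        for y in s:
            common = x.keys() & y.keys()
            # Keep only pairs that agree on every common attribute.
            if all(x[a] == y[a] for a in common):
                merged = {**x, **y}  # common columns appear once
                if merged not in result:
                    result.append(merged)
    return result


modules_descr = [
    {"M_id": "m1", "M_name": "Databases", "Type": "1 semester"},
    {"M_id": "m2", "M_name": "Logic", "Type": "1 semester"},
    {"M_id": "m3", "M_name": "Project", "Type": "2 semester"},
]
type_val = [
    {"Type": "1 semester", "Value": 0.5},
    {"Type": "2 semester", "Value": 1.0},
    {"Type": "3 semester", "Value": 1.5},  # matches no module, so it drops out
]

joined = natural_join(modules_descr, type_val)
print(len(joined))  # 3
```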

Non-loss decomposition
Split a given relation into projections, such that the natural join of these projections is equal to the original relation itself.

Non-loss decomposition
(Patient, Symptom, Doctor, Office, Diagnosis)
- semantic assumptions
- exercise

Lossy Decomposition
(Patient, Symptom, Doctor, Office, Diagnosis)
- semantic assumptions
- exercise
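The difference between a non-loss and a lossy decomposition can be seen by projecting a relation value and joining the projections back. The sketch below is illustrative only: the two sample tuples, the assumption that Doctor → Office holds, and all function names are made up for the example and are not the semantic assumptions intended by the exercise.

```python
def relation(rows):
    """Turn a list of dicts into a set of hashable tuples."""
    return {frozenset(row.items()) for row in rows}


def project(rel, attrs):
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}


def natural_join(r, s):
    out = set()
    for x in r:
        for y in s:
            dx, dy = dict(x), dict(y)
            if all(dx[a] == dy[a] for a in dx.keys() & dy.keys()):
                out.add(frozenset({**dx, **dy}.items()))
    return out


def is_non_loss(rel, attrs1, attrs2):
    """True if joining the two projections gives back exactly rel."""
    return natural_join(project(rel, attrs1), project(rel, attrs2)) == rel


r = relation([
    {"Patient": "ann", "Symptom": "cough", "Doctor": "dr_lee",
     "Office": "o1", "Diagnosis": "flu"},
    {"Patient": "bob", "Symptom": "cough", "Doctor": "dr_kim",
     "Office": "o2", "Diagnosis": "cold"},
])

# Split on the assumed FD Doctor -> Office: the join restores r.
print(is_non_loss(r, {"Doctor", "Office"},
                  {"Patient", "Symptom", "Doctor", "Diagnosis"}))  # True

# Split on Symptom, which determines nothing: spurious tuples appear.
print(is_non_loss(r, {"Patient", "Symptom"},
                  {"Symptom", "Doctor", "Office", "Diagnosis"}))   # False
```

In the lossy case the join contains four tuples instead of two; the information about which patient saw which doctor has been lost.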

Heath's theorem
can be used as the basis for normalisation
theorem: suppose R = (A, B, C), where A, B and C are disjoint sets of attributes, and A → B; then R = (A, B) join (A, C)
- state in English
- prove

Normalisation – rules of thumb
- take as the basis for normalisation (Heath's theorem) a problem FD
- maximise B when applying Heath's theorem on the basis of A → B
- try to maintain a one-to-one correspondence with real-life entities

Normalisation steps
semantic assumptions → FDs → CKs → decomposition

Simple example
(M_id, M_name, Type, Value)
M_id → M_name
M_id → Type
M_id → Value
Type → Value
not BCNF
Heath's theorem for Type → Value results in
(Type, Value)
(M_id, M_name, Type)
both relations are now in BCNF
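The split in this example can be reproduced with a small helper that applies Heath's theorem to one problem FD. This is a sketch, not the lecture's own procedure: `heath_split` is a hypothetical name, and it follows the rule of thumb of maximising B by taking everything that A determines inside the relation.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def heath_split(attributes, determinant, fds):
    """Split R = (A, B, C) into (A, B) and (A, C), with B as large as possible."""
    attributes, a = set(attributes), set(determinant)
    b = (closure(a, fds) & attributes) - a  # everything A determines
    c = attributes - a - b                  # the rest
    return a | b, a | c


fds = [
    (("M_id",), ("M_name",)),
    (("M_id",), ("Type",)),
    (("M_id",), ("Value",)),
    (("Type",), ("Value",)),
]
r1, r2 = heath_split({"M_id", "M_name", "Type", "Value"}, ("Type",), fds)
print(sorted(r1))  # ['Type', 'Value']
print(sorted(r2))  # ['M_id', 'M_name', 'Type']
```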

Example (R)
(project, task, max_budget, duration, payment_rate, contractor, contr_time)
FDs:
(project, task) → max_budget, duration
(task, max_budget, duration) → payment_rate
(project, task, contractor) → contr_time

Example – decomposition for R
Heath's theorem for R (the initial relation), based on (task, max_budget, duration) → payment_rate, leads to:
R1 (task, max_budget, duration, payment_rate)
R2 (project, task, max_budget, duration, contractor, contr_time)
R1 is in BCNF
R2 is not in BCNF, due to (project, task) → max_budget, duration

Example – decomposition for R2
Heath's theorem for R2, based on (project, task) → max_budget, duration, leads to:
R21 (project, task, max_budget, duration)
R22 (project, task, contractor, contr_time)
R21 is in BCNF
R22 is in BCNF

Example – solution
(task, max_budget, duration, payment_rate)
(project, task, max_budget, duration)
(project, task, contractor, contr_time)
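The whole example can be automated by applying Heath's theorem repeatedly until no BCNF violation remains. The sketch below is one possible strategy under stated assumptions: it searches determinants by brute force from small to large and always maximises B, so it picks its problem FDs in a different order from the slides. The function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_violation(attrs, fds):
    """Return (X, B) for some X -> B that violates BCNF inside attrs, else None."""
    for size in range(1, len(attrs)):
        for combo in combinations(sorted(attrs), size):
            x = set(combo)
            determined = closure(x, fds) & attrs
            # Non-trivial (x < determined) and x is not a superkey.
            if x < determined < attrs:
                return x, determined - x
    return None


def bcnf_decompose(attrs, fds):
    """Repeatedly split on a violating FD, as in Heath's theorem."""
    attrs = set(attrs)
    violation = find_violation(attrs, fds)
    if violation is None:
        return [attrs]
    x, b = violation
    return bcnf_decompose(x | b, fds) + bcnf_decompose(attrs - b, fds)


R = {"project", "task", "max_budget", "duration",
     "payment_rate", "contractor", "contr_time"}
fds = [
    (("project", "task"), ("max_budget", "duration")),
    (("task", "max_budget", "duration"), ("payment_rate",)),
    (("project", "task", "contractor"), ("contr_time",)),
]
for schema in bcnf_decompose(R, fds):
    print(sorted(schema))
```

For this input the sketch arrives at the same three relations as the solution slide, although in general a different choice of problem FD can lead to a different decomposition.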

Decomposition – 2 or more solutions
in the normalisation process, a certain (non-loss) decomposition may lead to a better solution than another one

Decomposition – 2 solutions – example
Modules(M_id, M_name, Type, Value)
solution #1
- Modules_Descr(M_id, M_name, Type)
- Type_Val(Type, Value)
solution #2
- Modules_Descr(M_id, M_name, Type)
- Module_Val(M_id, Value)
are they both non-loss? (apply Heath's theorem)
is there one better than the other?

Solution #1 vs Solution #2
updates
- u1: insert the fact that a 3 semester module is worth 1.5cu
- u2: modify 1 semester modules; they are not worth 0.5cu any longer, they are 0.75cu
- u3: change the type of a module but forget to change its value
solution #2
- u1 and u2 are impossible or very difficult to perform
- u3 is allowed
solution #1
- u1 and u2 are straightforward
- u3 is not allowed
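The three updates can be played through on toy data. The following is only a sketch: plain dicts stand in for the relations (the dict key plays the role of the primary key), and the module names and starting values are invented for the illustration.

```python
# Solution #1: Modules_Descr(M_id, M_name, Type) and Type_Val(Type, Value).
descr_1 = {"m1": ("Databases", "1 semester"), "m2": ("Logic", "1 semester"),
           "m3": ("Project", "2 semester")}
type_val = {"1 semester": 0.5, "2 semester": 1.0}

type_val["3 semester"] = 1.5                 # u1: one insert, no module needed
type_val["1 semester"] = 0.75                # u2: one update covers every module
descr_1["m1"] = ("Databases", "2 semester")  # u3: only the type is changed


def value_in_solution_1(m_id):
    # The value is always looked up through the type, so it cannot go stale.
    return type_val[descr_1[m_id][1]]


# Solution #2: Modules_Descr(M_id, M_name, Type) and Module_Val(M_id, Value).
descr_2 = {"m1": ("Databases", "1 semester"), "m2": ("Logic", "1 semester"),
           "m3": ("Project", "2 semester")}
module_val = {"m1": 0.5, "m2": 0.5, "m3": 1.0}

# u1 cannot be recorded: there is no module to attach the new value to.
# u2 has to visit every 1 semester module.
for m_id, (_, module_type) in descr_2.items():
    if module_type == "1 semester":
        module_val[m_id] = 0.75
descr_2["m1"] = ("Databases", "2 semester")  # u3: the value is forgotten

# Collect the values recorded for each type after the updates.
values_by_type = {}
for m_id, (_, module_type) in descr_2.items():
    values_by_type.setdefault(module_type, set()).add(module_val[m_id])

print(value_in_solution_1("m1"))             # 1.0
print(len(values_by_type["2 semester"]))     # 2: Type -> Value no longer holds
```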

Solution #1 vs Solution #2
solution #1
- more expressive: certain facts cannot be expressed in solution #2, e.g. the value of a new type
- updates can be performed independently on the two component relations (i.e. all constraints are properly expressed)
- in solution #2, Type → Value is lost, so this constraint must be enforced by the user through procedural code
independent projections
- updates can be performed independently on each projection, without the danger of ending up with inconsistent data

Independent projections
[Figure: FD diagrams over M_id, M_name, Type and Value for the original relation, Solution #1 and Solution #2]
Solution #1: all direct FDs are intra-relational; all transitive FDs are inter-relational
Solution #2: one transitive FD is intra-relational; one direct FD is lost

Independent projections - Rissanen
R1 and R2 are two projections of R; R1 and R2 are independent if and only if
- every FD in R is a logical consequence of the FDs in R1 and R2
- the common attributes of R1 and R2 form a candidate key for at least one of R1 or R2
atomic relation: cannot be decomposed into independent projections

Dependency preservation
R was decomposed (normalisation) into R1, …, Rn
S - the set of FDs for R
S1, …, Sn - the sets of FDs for R1, …, Rn (each Si refers only to the attributes of Ri)
S′ = S1 ∪ … ∪ Sn (usually, S′ ≠ S)
the decomposition is dependency preserving if S′+ = S+
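Dependency preservation can be tested without listing the projected FD sets explicitly. The sketch below uses a standard textbook procedure, which is an assumption here since the slides do not give an algorithm: for each FD X → Y of the original relation, grow X using only what each component relation can derive on its own attributes, and check whether Y is reached. All function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def fd_is_preserved(fd, schemas, fds):
    """True if fd follows from the FDs that live inside single schemas."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for schema in schemas:
            # What this component alone lets us derive from what we have.
            gained = closure(result & schema, fds) & schema
            if not gained <= result:
                result |= gained
                changed = True
    return set(rhs) <= result


def is_dependency_preserving(schemas, fds):
    return all(fd_is_preserved(fd, schemas, fds) for fd in fds)


fds = [
    (("M_id",), ("M_name",)),
    (("M_id",), ("Type",)),
    (("M_id",), ("Value",)),
    (("Type",), ("Value",)),
]
solution_1 = [{"M_id", "M_name", "Type"}, {"Type", "Value"}]
solution_2 = [{"M_id", "M_name", "Type"}, {"M_id", "Value"}]
print(is_dependency_preserving(solution_1, fds))  # True
print(is_dependency_preserving(solution_2, fds))  # False
```

In the second decomposition it is Type → Value that cannot be derived from the components.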

Normalisation vs dependency preservation
there are cases when there is an either-or situation regarding normalisation and the preservation of functional dependencies:
- either the relation is normalised and some FDs are lost
- or some FDs are not lost (they are expressed in the original relation), but the relation is not in the highest possible normal form
in this case, no solution is better than the other; other criteria will have to be considered to judge which is better

Normalisation vs dependency preservation: Example
(Patient, Disease, Doctor)
- a patient is treated by a single doctor for a certain disease
- each doctor only treats one kind of disease
- a doctor can treat more than one patient
is this relation BCNF?
can you identify update anomalies?
consider also (Patient, Disease, Doctor, Treatment) with (Patient, Disease) → Treatment

Possible decompositions
- non-loss? (choose PKs)
- Heath's theorem (choose PKs)

BCNF vs dependency preservation
the decomposed relations do not enforce an FD existing in the original specification, namely (Patient, Disease) → Doctor
e.g. a patient can be given two doctors that treat the same disease (the system will not disallow this); the constraint would have to be maintained by procedural code
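The lost constraint can be made visible on toy data. This sketch assumes the decomposition obtained from Heath's theorem on Doctor → Disease, that is (Doctor, Disease) and (Patient, Doctor); the relation names, the sample values and the dict and set representation are all invented for the illustration.

```python
# (Doctor, Disease): Doctor is the key, so each doctor treats one disease.
doctor_disease = {"dr_lee": "flu", "dr_kim": "flu"}

# (Patient, Doctor): all-key, so any pair may be inserted.
patient_doctor = {("ann", "dr_lee"), ("ann", "dr_kim")}

# Each relation on its own satisfies its key constraint. Join them back:
joined = {(patient, doctor_disease[doctor], doctor)
          for patient, doctor in patient_doctor}

# Check (Patient, Disease) -> Doctor on the joined relation.
doctors_per_case = {}
for patient, disease, doctor in joined:
    doctors_per_case.setdefault((patient, disease), set()).add(doctor)

violations = {case: doctors for case, doctors in doctors_per_case.items()
              if len(doctors) > 1}
print(sorted(violations))  # [('ann', 'flu')]
```

Nothing in the two component relations stops the second insert, so a check like the one above would have to run as procedural code.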

BCNF vs dependency preservation
not every FD is expressible through normalisation
when the relation was in its original form (3NF)
- (Patient, Disease) → Doctor was expressed: a patient-disease pair could not be assigned more than one doctor
- Doctor → Disease was not expressed: this generated update anomalies
in BCNF (decomposed)
- Doctor → Disease was expressed
- (Patient, Disease) → Doctor was not expressed: this generated update anomalies (refer to the previous slide)
- this latter FD would not have been expressed even if the decomposition into all three 2-attribute relations had been considered

Conclusions
- normal forms: formalisation of common sense
- art; engineering
- possibility for automation; difficult, because of non-determinism (more than one choice at one step)
- BCNF
  - always achievable
  - not always free of update anomalies, because it cannot always express all the FDs existing in the problem
- there are higher normal forms (4NF, 5NF), defined on the basis of other concepts (not FDs)

Hierarchy of normal forms
[Figure: hierarchy of normal forms. © Pearson Education Limited 1995, 2005]

