ALAK ROY, Assistant Professor, Dept. of CSE, NIT Agartala. NATIONAL INSTITUTE OF TECHNOLOGY AGARTALA, Aug-Dec, 2010. Normalization 2. CSE-503 :: DATABASE.

1 ALAK ROY, Assistant Professor, Dept. of CSE, NIT Agartala. NATIONAL INSTITUTE OF TECHNOLOGY AGARTALA, Aug-Dec, 2010. Normalization 2. CSE-503 :: DATABASE MANAGEMENT SYSTEM

2 Outline
- Functional dependencies
- Normalization

3 Normalization: use Functional Dependencies
Relation: Product Support Coverage (ProductNumber, Date, Time, EmployeeID, PhoneNumber, PayRate, Withholding)
[Sample rows garbled in extraction; surviving values: CR76, CR56, CR74, CR56; 05/13/02, 07/04/02; 13:30, 12:00, 10:30; SG5, SG37, SG]
Functional dependencies:
- ProductNumber, Date → Time, EmployeeID, PhoneNumber
- EmployeeID, Date, Time → ProductNumber
- PhoneNumber, Date, Time → EmployeeID, ProductNumber
- EmployeeID, Date → PhoneNumber
- Date → PayRate
- PayRate → Withholding

4 Illustrate FDs
[Diagram: the dependencies below drawn as arrows over the attributes ProductNumber, Date, Time, EmployeeID, PhoneNumber, PayRate, Withholding. Diagram labels: Product, Coverage, Support]
- ProductNumber, Date → Time, EmployeeID, PhoneNumber
- EmployeeID, Date, Time → ProductNumber
- PhoneNumber, Date, Time → EmployeeID, ProductNumber
- EmployeeID, Date → PhoneNumber
- Date → PayRate
- PayRate → Withholding

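5 Decompose
[Diagram: Product Support Coverage decomposed step by step through 1NF, 2NF, 3NF, and BCNF. Relation labels on the diagram: Product Support Coverage; Pay; Coverage; Wages; Taxes; Product Coverage; Phone Assignment]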

6 Functional Dependencies
A functional dependency is a constraint between two sets of attributes in a relational database. If X and Y are two sets of attributes in the same relation T, then X → Y means that X functionally determines Y, so that:
- the values of the attributes in X uniquely determine the values of the attributes in Y;
- for any two tuples t1 and t2 in T, t1[X] = t2[X] implies that t1[Y] = t2[Y];
- if two tuples in T agree on their X column(s), then they must also agree on their Y column(s).
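
The tuple-based definition on this slide can be tested directly against a relation instance. The following is a minimal Python sketch, not part of the original deck: the function name `fd_holds` and the sample rows are hypothetical. Note that an instance can only refute an FD; a passing check does not prove that the FD is a constraint of the schema.

```python
from typing import Dict, Hashable, Iterable, Sequence, Tuple

Row = Dict[str, Hashable]


def fd_holds(rows: Iterable[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen: Dict[Tuple[Hashable, ...], Tuple[Hashable, ...]] = {}
    for row in rows:
        x_value = tuple(row[a] for a in lhs)
        y_value = tuple(row[a] for a in rhs)
        # Remember the first Y value seen for this X value; any later row
        # with the same X value must carry the same Y value.
        if seen.setdefault(x_value, y_value) != y_value:
            return False
    return True


# Hypothetical sample rows (illustration only, not taken from the slides).
students = [
    {"ROLL": 1, "Name": "Asha", "Town": "Agartala", "Zip": "799001"},
    {"ROLL": 2, "Name": "Ravi", "Town": "Agartala", "Zip": "799001"},
    {"ROLL": 3, "Name": "Asha", "Town": "Udaipur", "Zip": "799120"},
]

print(fd_holds(students, ["ROLL"], ["Name", "Town", "Zip"]))  # True
print(fd_holds(students, ["Town"], ["Zip"]))                  # True
print(fd_holds(students, ["Name"], ["ROLL"]))                 # False
```

The last call fails because two rows share the name "Asha" but carry different ROLL values, which is exactly the situation t1[X] = t2[X] with t1[Y] ≠ t2[Y].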

7 FD and Keys
- A key constraint is a special kind of functional dependency.
- K is a superkey for relation schema R if and only if K → R.
- K is a candidate key for R if and only if K → R, and for no α ⊂ K, α → R.
- The key is on the LHS and all attributes are on the RHS, e.g. ROLL → ROLL, Name, Address.
- No two rows share the same key value, so any two tuples that agree on the LHS trivially agree on the RHS.

10 WELCOME: 2ND DAY OF NORMALIZATION

11 Armstrong's Axioms of FDs
1. Reflexivity: If Y ⊆ X then X → Y (trivial FD)
   - Name, Address → Name
2. Augmentation: If X → Y then XZ → YZ
   - If Town → Zip then Town, Name → Zip, Name
3. Transitivity: If X → Y and Y → Z then X → Z
But keep in mind 2 other rules that are useful:
- Union: If X → Y and X → Z, then X → YZ
- Decomposition: If X → YZ, then X → Y and X → Z
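
The Union and Decomposition rules are not extra axioms; each follows from the three axioms listed on this slide. One way to write out the derivations (a worked sketch added here, not taken from the deck) is:

```latex
\begin{align*}
\textbf{Union.}\quad & X \to Y,\; X \to Z && \text{given} \\
& X \to XY && \text{augment } X \to Y \text{ with } X \\
& XY \to YZ && \text{augment } X \to Z \text{ with } Y \\
& X \to YZ && \text{transitivity} \\[6pt]
\textbf{Decomposition.}\quad & X \to YZ && \text{given} \\
& YZ \to Y,\; YZ \to Z && \text{reflexivity} \\
& X \to Y,\; X \to Z && \text{transitivity}
\end{align*}
```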

13 Soundness and Completeness
Soundness: the axioms are sound.
- If an FD f: X → Y can be derived from a set of FDs F using the axioms, then f holds in every relation that satisfies F. We say F entails f.
Completeness: the axioms are complete.
- If F entails f, then f can be derived from F using the axioms.
- As a result, to determine whether F entails f, apply the axioms in all possible ways to generate F+ (the set of possible FDs is finite, so this can be done) and check whether f is in F+.

14 Functional Dependency Closure (F+)
Given: a set F of functional dependencies.
Relation:
- EmpProj(SSN, Pnumber, Hours, Ename, Pname, Plocation)
FDs:
- F = { SSN → Ename, Pnumber → {Pname, Plocation}, {SSN, Pnumber} → Hours }
Closures:
- {SSN}+ = {SSN, Ename}
- {Pnumber}+ = {Pnumber, Pname, Plocation}
- {SSN, Pnumber}+ = {SSN, Pnumber, Ename, Pname, Plocation, Hours}

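15 Generating F+
F: AB → C, A → D, D → E
Derivation:
- A → D gives AB → BD (augmentation)
- AB → C and AB → BD give AB → BCD (union)
- D → E gives BCD → BCDE (augmentation)
- AB → BCD and BCD → BCDE give AB → BCDE (transitivity)
- AB → BCDE gives AB → CDE (decomposition)
Thus AB → BD, AB → BCD, AB → BCDE, and AB → CDE are all elements of F+.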

17 Attribute Closure
- Calculating the attribute closure is a more efficient way of checking entailment.
- The attribute closure of a set of attributes X with respect to a set of functional dependencies F (denoted X+_F) is the set of all attributes A such that X → A is entailed by F.
- X+_F1 is not necessarily the same as X+_F2.
- Checking entailment: given a set of FDs F, F entails X → Y if and only if Y ⊆ X+_F (by the union and decomposition rules).

18 Computation of Attribute Closure: Example
Problem: compute the attribute closure of AB with respect to the set of FDs:
  (a) AB → C
  (b) A → D
  (c) D → E
  (d) AC → B
Solution:
  Initially   closure = {A, B}
  Using (a)   closure = {A, B, C}
  Using (b)   closure = {A, B, C, D}
  Using (c)   closure = {A, B, C, D, E}

19 Computation of Attribute Closure X+_F
  closure := X;        -- since X ⊆ X+_F
  repeat
    old := closure;
    if there is an FD Z → V in F such that Z ⊆ closure and V ⊈ closure
      then closure := closure ∪ V
  until old = closure
  -- If T ⊆ closure then X → T is entailed by F
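
The repeat/until loop above translates almost line for line into Python. This is a minimal sketch rather than the deck's own code: the names `fd`, `attribute_closure`, and `entails` are hypothetical, and the `fd` helper assumes single-letter attribute names so that a string such as "AB" can stand for the set {A, B}.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from two strings of single-letter attribute names."""
    return frozenset(lhs), frozenset(rhs)


def attribute_closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Return the closure of attrs under fds."""
    closure = set(attrs)
    changed = True
    while changed:  # plays the role of "until old = closure"
        changed = False
        for lhs, rhs in fds:
            # Apply Z -> V only if Z is covered and V adds something new.
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def entails(fds: List[FD], lhs: Iterable[str], rhs: Iterable[str]) -> bool:
    """F entails lhs -> rhs iff rhs is contained in the closure of lhs."""
    return set(rhs) <= attribute_closure(lhs, fds)


# The FDs used in the deck's closure examples.
F = [fd("AB", "C"), fd("A", "D"), fd("D", "E"), fd("AC", "B")]

print(sorted(attribute_closure("AB", F)))  # ['A', 'B', 'C', 'D', 'E']
print(sorted(attribute_closure("A", F)))   # ['A', 'D', 'E']
print(entails(F, "AB", "E"))               # True
print(entails(F, "D", "C"))                # False
```

Each pass over F either adds at least one attribute or ends the loop, so the number of passes is bounded by the number of attributes.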

20 Example: Computing Attribute Closure
F: AB → C, A → D, D → E, AC → B
  X    | X+_F
  A    | {A, D, E}
  AB   | {A, B, C, D, E} (hence AB is a key)
  B    | {B}
  D    | {D, E}
Is AB → E an FD entailed by F? Yes.
Is D → C an FD entailed by F? No.
Result: X+_F allows us to determine the FDs of the form X → Y that are entailed by F.

22 NORMALIZATION

23 The goal is to remove redundancy based on dependencies.

24 Normal Forms
Each normal form is a set of conditions on a schema that guarantees certain properties (relating to redundancy and update anomalies).
The two commonly used normal forms are:
- third normal form (3NF), and
- Boyce-Codd normal form (BCNF).

25 Levels of Normalization: 1NF, 2NF, 3NF, BCNF

26 Normal Forms: Considerations
- Relational design by analysis.
- Normal forms are based on functional dependencies (FDs).
- Intuitive, perhaps, but identifying a strictly controlled procedure allows a programmatic process.
- Two additional properties should be considered:
  - Lossless join (nonadditive join property): required.
  - Dependency preservation property: use when possible.

31 Lossless Joins and Dependency Preservation
Let R be a relation and F a set of FDs that hold over R. Decomposing R into R1 and R2 is lossless if F+ contains either:
- R1 ∩ R2 → R1, or
- R1 ∩ R2 → R2.
Let F1 be the set of FDs in F+ that mention only attributes of R1, and F2 the set of FDs in F+ that mention only attributes of R2. The decomposition is dependency-preserving if (F1 ∪ F2)+ = F+.
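
Both properties can be checked with attribute closures alone, without materializing F+. The sketch below is an illustration under stated assumptions, not the deck's method: the function names are hypothetical, the FDs are the project/activity ones from the deck's business-rules example (with PN → {PM, D} split into PN → PM and PM → D), and the preservation test uses the standard closure-based procedure that grows the left-hand side one fragment at a time.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless(r1: Set[str], r2: Set[str], fds: List[FD]) -> bool:
    """Binary test: (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2 must be entailed."""
    common = closure(r1 & r2, fds)
    return r1 <= common or r2 <= common


def preserves_dependencies(parts: List[Set[str]], fds: List[FD]) -> bool:
    """Check that every FD in fds follows from FDs local to the fragments."""
    for lhs, rhs in fds:
        reachable = set(lhs)
        changed = True
        while changed:
            changed = False
            for part in parts:
                # Attributes derivable while staying inside one fragment.
                gained = closure(reachable & part, fds) & part
                if not gained <= reachable:
                    reachable |= gained
                    changed = True
        if not rhs <= reachable:
            return False
    return True


F = [
    (frozenset({"PN", "AN"}), frozenset({"S"})),
    (frozenset({"PN"}), frozenset({"PM"})),
    (frozenset({"PM"}), frozenset({"D"})),
]
staffing = {"PN", "AN", "S"}
project = {"PN", "PM", "D"}

print(is_lossless(staffing, project, F))             # True
print(preserves_dependencies([staffing, project], F))  # True
print(is_lossless(staffing, {"PM", "D"}, F))         # False
```

The last call is lossy because the two fragments share no attribute, so nothing ties a staffing row back to its manager.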

33 First Normal Form (1NF)
- A relational schema R is in first normal form if the domains of all attributes of R are atomic (atomicity).
- A domain is atomic if its elements are considered to be indivisible units.
- Non-atomic values complicate storage and encourage redundant (repeated) storage of data.
Requirements:
- 1NF disallows multivalued attributes, composite attributes, and their combinations, by requiring only single atomic (indivisible) values in the domain of an attribute.

34 Business Rules Example
- Staffing hours (S) are on a per project activity basis (activities (AN) within projects).
- Managers (PM) and their departments (D) are assigned to projects (PN).
  - A department is assigned to a project manager.
  - A project manager is assigned to projects.
[Table, in 1NF: Project no (PN) | Activity no (AN) | Project Manager (PM) | dept (D) | Hour (S). Rows garbled in extraction.]

35 Prime Attribute: an attribute of relation schema R is called a prime attribute of R if it is a member of some candidate key of R.
Nonprime Attribute: an attribute of relation schema R is called a nonprime attribute of R if it is not a member of any candidate key.

36 Second Normal Form (2NF)
2NF is based on the concept of full functional dependency.
Full functional dependency:
- An FD X → Y is a full functional dependency if removing any attribute A from X means that the dependency no longer holds; i.e. for every attribute A ∈ X, (X − {A}) does not functionally determine Y.
Partial functional dependency:
- An FD X → Y is a partial functional dependency if, for some attribute A ∈ X, (X − {A}) → Y still holds.
- If A can be removed and the FD remains, X → Y is a partial functional dependency (a violation of 2NF).

37 Partial functional dependency (a violation of 2NF)
- {SSN, PNUMBER} → HOURS is a full dependency.
- {SSN, PNUMBER} → ENAME is partial because SSN → ENAME holds.

38 Second Normal Form (2NF)
Definition of 2NF:
- A relational schema R is in 2NF if every nonprime attribute A in R is fully functionally dependent on the primary key of R.
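
The 2NF definition can be checked mechanically: compute the closure of every proper subset of the key and see whether it reaches a nonprime attribute. The sketch below is illustrative only. The function name `partial_dependencies` is hypothetical, the relation is the EmpProj example from the closure slide, and a single (primary) key is assumed, as in the definition above.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(
    key: Set[str], nonprime: Set[str], fds: List[FD]
) -> Dict[FrozenSet[str], Set[str]]:
    """Map each proper, non-empty subset of the key to the nonprime
    attributes it determines. An empty result means no 2NF violation."""
    found: Dict[FrozenSet[str], Set[str]] = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds) & nonprime
            if determined:
                found[frozenset(subset)] = determined
    return found


# EmpProj(SSN, Pnumber, Hours, Ename, Pname, Plocation)
F = [
    (frozenset({"SSN"}), frozenset({"Ename"})),
    (frozenset({"Pnumber"}), frozenset({"Pname", "Plocation"})),
    (frozenset({"SSN", "Pnumber"}), frozenset({"Hours"})),
]
KEY = {"SSN", "Pnumber"}
NONPRIME = {"Hours", "Ename", "Pname", "Plocation"}

for subset, attrs in partial_dependencies(KEY, NONPRIME, F).items():
    print(sorted(subset), "->", sorted(attrs))
```

For EmpProj this reports that Pnumber alone determines Pname and Plocation and that SSN alone determines Ename, while Hours depends on the full key.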

39 2NF
Business rules and FDs:
- Staffing is on a per project activity basis (activities within projects): {PN, AN} → S
- Managers and their departments are assigned to projects: PN → {PM, D}
Starting relation (1NF): Project no (PN) | Activity no (AN) | Project Manager (PM) | dept (D) | Hour (S). [Rows garbled in extraction.]
Decomposed into:
- Project no (PN) | Activity no (AN) | Hour (S). [Rows garbled in extraction.]
- Project no (PN) | Project Manager (PM) | dept (D)
    AD3111 | 20 | A00
    MA2100 | 10 | D11
    OP1000 | 10 | D11
    IF1000 | 20 | A00

40 WELCOME: 3RD DAY OF NORMALIZATION

41 Third Normal Form (3NF)
Definition of 3NF:
- A relational schema R is in 3NF if it satisfies 2NF and no nonprime attribute of R is transitively dependent on the primary key.
- 3NF is based on the concept of transitive dependency of a nonprime attribute on the key through another nonprime attribute: {X → Y, Y → Z} ⊨ X → Z.
- Such a transitive dependency is a 3NF violation.
- Equivalently, for every nontrivial FD, the LHS should be a superkey or the RHS should be a prime attribute.

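42 3NF
Business rules and FDs: PN → PM → D
- A department is assigned to a project manager.
- A project manager is assigned to projects.
Starting relation (2NF): Project no (PN) | Project Manager (PM) | dept (D)
    AD3111 | 20 | A00
    MA2100 | 10 | D11
    OP1000 | 10 | D11
    IF1000 | 20 | A00
Decomposed into:
- Project no (PN) | Project Manager (PM). [Rows partly garbled in extraction; surviving project prefixes: AD, MA, OP, IF.]
- Project Manager (PM) | dept (D)
    20 | A00
    10 | D11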

43 Boyce-Codd Normal Form (BCNF)
- BCNF has a simpler definition than 3NF but is more restrictive.
- Every relation in BCNF is also in 3NF; however, a relation in 3NF is not necessarily in BCNF.
Definition of BCNF:
- A relation schema R is in BCNF if, whenever a nontrivial functional dependency X → A holds in R, X is a superkey of R.
- That is, the LHS of every nontrivial FD should be a superkey.
Note:
- Each attribute is identified by nothing but the key.
- BCNF is sometimes too restrictive: a BCNF decomposition may not be dependency-preserving.
[Diagram labels: 3NF, BCNF]
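
The difference between the two definitions shows up clearly when both tests are run on the same schema. The following is a rough sketch, not the deck's procedure: the names `candidate_keys` and `normal_form_violations` are hypothetical, candidate keys are found by brute force over attribute subsets (fine for slide-sized schemas only), and only the FDs listed in F are inspected, which is enough when testing the original relation R against F. The schema R(A, B, C) with AB → C and C → B is a standard textbook illustration, not an example from the slides.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs: Set[str], fds: List[FD]) -> List[FrozenSet[str]]:
    """Brute-force search for minimal attribute sets that determine attrs."""
    keys: List[FrozenSet[str]] = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            candidate = frozenset(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) >= attrs:
                keys.append(candidate)
    return keys


def normal_form_violations(
    attrs: Set[str], fds: List[FD]
) -> Tuple[List[FD], List[FD]]:
    """Return (FDs violating BCNF, FDs violating 3NF)."""
    prime = set().union(*candidate_keys(attrs, fds))
    bcnf: List[FD] = []
    third: List[FD] = []
    for lhs, rhs in fds:
        extra = rhs - lhs  # the nontrivial part of the FD
        if not extra or closure(lhs, fds) >= attrs:
            continue  # trivial FD, or LHS is a superkey
        bcnf.append((lhs, rhs))
        if not extra <= prime:  # 3NF tolerates prime attributes on the RHS
            third.append((lhs, rhs))
    return bcnf, third


R = set("ABC")
F = [(frozenset("AB"), frozenset("C")), (frozenset("C"), frozenset("B"))]
bcnf_bad, third_bad = normal_form_violations(R, F)
print(len(bcnf_bad), len(third_bad))  # 1 0
```

Here C → B breaks BCNF because C is not a superkey, but it is acceptable in 3NF because B belongs to the candidate key AB.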

44 [Diagram labels: KEY, A1, …]

45 New example: LOT
[Figure labels: 1NF, 2NF]

46 New example: LOT
[Figure label: 3NF]

47 FD5
- Let there exist a new FD, FD5.
- FD5 violates BCNF in LOTS1A because AREA is not a superkey of LOTS1A.
- FD5 satisfies 3NF in LOTS1A because COUNTRY_NAME is a prime attribute, but this exception does not exist in the definition of BCNF.
- So decompose LOTS1A into LOTS1AX and LOTS1AY.

48 3NF to BCNF

49 Other dependencies and normal forms (not commonly used)
Multivalued dependencies (4NF)
- Let X and Y be subsets of the attributes of relation schema R.
- The MVD X ↠ Y holds if the set of Y values associated with an X value is independent of the values in the other attributes.
R is in 4NF if for every MVD X ↠ Y that holds over R, one of the following is true:
- Y ⊆ X or XY = R (trivial MVD), or
- X is a superkey.

50 MVD
- An employee can be assigned to any project and, within those projects, to any activities, but the assignments are consistent for that employee.
- A project or activity can have any number of employees assigned to it.
MVDs: EN ↠ P, EN ↠ A
Original relation:
  EN  | P                                | A
  130 | Query Services, User Education   | Debug, Supp
  30  | Query Services                   | Debug, Test, Code
Decomposed into:
  EN  | P
  130 | Query Services
  130 | User Education
  30  | Query Services

  EN  | A
  130 | Debug
  130 | Supp
  30  | Debug
  30  | Test
  30  | Code
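
Whether an MVD holds in a given instance can be tested by checking that, for each X value, the (Y, rest) combinations form a full cross product. The sketch below is an illustration with assumptions: the function name `mvd_holds` is hypothetical, and the slide's multi-valued cells are flattened into one row per (EN, P, A) combination, which is how the relation would look in 1NF.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, Hashable, List, Sequence

Row = Dict[str, Hashable]


def mvd_holds(rows: List[Row], x: Sequence[str], y: Sequence[str]) -> bool:
    """Check X ->> Y on a relation instance given as a list of dicts."""
    if not rows:
        return True
    # Z is every attribute that is in neither X nor Y.
    z = [a for a in rows[0] if a not in x and a not in y]
    groups = defaultdict(lambda: (set(), set(), set()))
    for row in rows:
        x_value = tuple(row[a] for a in x)
        y_value = tuple(row[a] for a in y)
        z_value = tuple(row[a] for a in z)
        y_values, z_values, pairs = groups[x_value]
        y_values.add(y_value)
        z_values.add(z_value)
        pairs.add((y_value, z_value))
    # For each X value, every Y value must appear with every Z value.
    return all(
        pairs == set(product(y_values, z_values))
        for y_values, z_values, pairs in groups.values()
    )


# Flattened version of the slide's table (the flattening is an assumption).
assignments = [
    {"EN": 130, "P": p, "A": a}
    for p in ("Query Services", "User Education")
    for a in ("Debug", "Supp")
] + [
    {"EN": 30, "P": "Query Services", "A": a}
    for a in ("Debug", "Test", "Code")
]

print(mvd_holds(assignments, ["EN"], ["P"]))      # True
print(mvd_holds(assignments, ["EN"], ["A"]))      # True
print(mvd_holds(assignments[1:], ["EN"], ["P"]))  # False
```

Dropping a single row breaks the cross product for employee 130, which is why the last check fails.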

51 More normal forms (rare): Join Dependencies (5NF)
- A JD is a further generalization of an MVD.
- Every MVD is a JD, but not every JD is an MVD.
- R is in 5NF if for every JD ⋈{R1, …, Rn} that holds over R, one of the following is true:
  - Ri = R for some i, or
  - the JD is implied by the set of those FDs over R in which the left side is a key for R.
- If a relation schema is in 3NF and each of its keys consists of a single attribute, it is also in 5NF.

52 5NF
Rule: if an employee works for a project, the employee will be assigned to activities within that project.
JD over {EN, P, A}: ⋈{ {EN, P}, {P, A}, {EN, A} }
Original relation:
  EN  | P              | A
  130 | Query Services | Debug
  130 | User Education | Supp
  140 | Query Services | Supp
  130 | Query Services | Supp
Decomposed into:
  EN  | P
  130 | Query Services
  130 | User Education
  140 | Query Services

  EN  | A
  130 | Debug
  130 | Supp
  140 | Supp

  P              | A
  Query Services | Debug
  User Education | Supp
  Query Services | Supp
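
The join dependency on this slide can be verified on the sample rows: joining all three projections gives back exactly the original relation, while joining only two of them produces a spurious row. This is a small illustrative sketch, not code from the deck; the helper names `project`, `natural_join`, and `as_set` are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Row = Dict[str, Hashable]
ATTRS = ["EN", "P", "A"]


def project(rows: List[Row], attrs: Sequence[str]) -> List[Row]:
    """Project rows onto attrs and drop duplicates."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in seen]


def natural_join(left: List[Row], right: List[Row]) -> List[Row]:
    """Join rows that agree on every attribute they share."""
    joined = []
    for l_row in left:
        for r_row in right:
            if all(l_row[a] == r_row[a] for a in l_row.keys() & r_row.keys()):
                joined.append({**l_row, **r_row})
    return joined


def as_set(rows: List[Row]) -> Set[Tuple[Hashable, ...]]:
    """Order-independent view of rows over (EN, P, A)."""
    return {tuple(row[a] for a in ATTRS) for row in rows}


# Rows of the original relation from the slide.
relation = [
    {"EN": 130, "P": "Query Services", "A": "Debug"},
    {"EN": 130, "P": "User Education", "A": "Supp"},
    {"EN": 140, "P": "Query Services", "A": "Supp"},
    {"EN": 130, "P": "Query Services", "A": "Supp"},
]

en_p = project(relation, ["EN", "P"])
en_a = project(relation, ["EN", "A"])
p_a = project(relation, ["P", "A"])

two_way = natural_join(en_p, en_a)
three_way = natural_join(two_way, p_a)

print(as_set(three_way) == as_set(relation))                   # True
print((130, "User Education", "Debug") in as_set(two_way))     # True
```

The row (130, User Education, Debug) appears after the two-way join but is not in the original relation; the third projection on (P, A) is what filters it out.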

