
1 Schema Refinement What and why
Copyright © Curt Hill

2 What is a good schema?
A good schema should:
- Represent all the data needed
- Group the data into relations that make sense
- Have little or no redundancy
- Make common operations efficient
This is not just a common-sense notion
We have some objective ways of determining whether a schema is indeed good

3 Redundancy
What is wrong with redundant data?
- Space and access tradeoff
- Update anomaly: one copy is changed and the others are not
- Insert anomaly: an insertion requires that unrelated information also be inserted
- Delete anomaly: deleting something also deletes unrelated information

4 Normalization
Design activities to preclude redundancy and the anomalies it causes
There is a series of normal forms, each contained within the one before it:
5thNF (PJ) ⇒ 4thNF ⇒ BCNF ⇒ 3rdNF ⇒ 2ndNF ⇒ 1stNF
- ⇒ means implies, or is contained in
- NF = Normal Form
- PJ = Project Join, a form of 5thNF
- BC = Boyce-Codd
BCNF is a slight strengthening of 3rdNF

5 How will we do this?
We will start with the simplest and work up to the most complicated
- Show how to determine the particular normal form
- Show what problems the next normal form solves
The literature describes an 18th Normal Form
We will stop at 5th Normal Form
Warning: Mathematics ahead
If there is no math, this is not science

6 First Normal Form
Default case in a relational database
- Rectangular tables
- Fixed number of fields
A file is not in 1stNF if it allows repeating groups, such as a variable number of fields
A relational database may allow variable-length fields, but that is an implementation consideration
The field is still considered atomic

7 1st NF and non 1st NF
Not in 1st Normal Form (repeated groups):
1013  Joe Smith   Biology  English
1043  Jon Smith   CIS
1152  Jane Jones  Math
1st Normal Form:
1013  Joe Smith   Biology
1013  Joe Smith   English
1043  Jon Smith   CIS
1152  Jane Jones  Math
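The step from the first table to the second can be sketched in a few lines of Python. This is only an illustration: the list-of-tuples representation and the name to_first_normal_form are assumptions, not part of the slides.

```python
# Records as they appear before normalization: the list of subjects is the
# repeating group that keeps the file out of 1st Normal Form.
records = [
    (1013, "Joe Smith", ["Biology", "English"]),
    (1043, "Jon Smith", ["CIS"]),
    (1152, "Jane Jones", ["Math"]),
]

def to_first_normal_form(rows):
    """Emit one fixed-width row per (student, subject) pair."""
    return [
        (sid, name, subject)
        for sid, name, subjects in rows
        for subject in subjects
    ]

flat = to_first_normal_form(records)
for row in flat:
    print(row)
```

Every output row has exactly three atomic fields, at the cost of repeating the student name once per subject.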

8 An example in 1st NF
Attributes:
- SID - numeric student ID
- SNAME - student name
- LCODE - location (campus)
- STATUS - numeric status of the location
- CID - course ID (numeric)
- CNAME - course name
- SITE - location of the course
- GRADE - grade this student received
Key is SID and CID

9 A picture
SID  SName  LCode  Status  CID  CName  Site  Grade
21   Jones  A1     1       170  C Lit  MCF   89
(remaining rows are incomplete in the transcript: 32 Smith 160 C++ RSC 68 91 385 DB I VNG 76 62)

10 What problems exist?
Twos:
- Locations: student and course
- Names
- IDs
The names and locations each depend on part of the key, but not all of it
Looks like two tables, not one
Table is in 1stNF but not 2ndNF

11 Anomalies
Update anomaly
- Changing a course number requires changing several records
- Changing the LCode requires several updates
Insert anomaly
- We cannot have a student without their taking at least one class
Delete anomaly
- Deleting the first record destroys all that we know about course 170

12 Problem again
The real problem is that things like CName are not dependent on the entire key
CName is dependent on CID, which is just part of the key
We need to consider functional dependencies

13 Functional Dependencies (FD)
If field A determines field B then B is functionally dependent on A
In other words: if we know A we know B
Notation: A → B
This is read: A determines B
A does not have to be an atomic attribute
Every field is functionally dependent on every candidate key
- This includes every field with the uniqueness property
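As a minimal sketch, "if we know A we know B" can be tested against a sample of rows: the dependency fails as soon as two rows agree on A but differ on B. The function name fd_holds and the rows are hypothetical, and a True result only means this sample does not contradict the dependency.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for this lhs value.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical sample rows using the attribute names from the student example.
rows = [
    {"SID": 21, "CID": 170, "CNAME": "C Lit", "GRADE": 89},
    {"SID": 32, "CID": 170, "CNAME": "C Lit", "GRADE": 68},
    {"SID": 32, "CID": 160, "CNAME": "C++", "GRADE": 91},
]

print(fd_holds(rows, ["CID"], ["CNAME"]))          # True
print(fd_holds(rows, ["SID"], ["GRADE"]))          # False
print(fd_holds(rows, ["SID", "CID"], ["GRADE"]))   # True
```

The left-hand side is passed as a list because A does not have to be a single attribute.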

14 Full Functional Dependency
Somewhat stronger than the previous
B is fully functionally dependent on A iff
- B is functionally dependent on A
- B is not functionally dependent on any proper subset of A
If A is atomic, FD = FFD
Notation is A ↠ B

15 Observations
We cannot tell FDs by just looking at the data
- We must understand the data relationships
- Small tables may have apparent FDs that are not actually FDs
If every A → B were projected into its own relation, then A would be the key of that relation
Each FD represents an integrity constraint

16 Closure of a Set of FDs
The closure (denoted F+) of a set F of FDs is a set that includes:
- All the FDs in F
- Every FD that can be derived from the given FDs
FDs obey some properties that allow us to find FDs implied by other FDs
These properties are called Armstrong’s Axioms

17 Armstrong’s Axioms
There are three basic rules:
- Reflexivity
- Augmentation
- Transitivity
Two additional rules may be derived using these three:
- Union
- Decomposition

18 Reflexivity
If Y is a subset of X then X → Y
A set of fields determines all of its members
Examples: A → A, AB → B
A trivial FD is any FD where the right-hand side is a subset of the left-hand side

19 Augmentation
If X determines Y then XZ determines YZ
It is always possible to add a field to both sides of a functional dependency
Example: If A → B then AC → BC

20 Transitivity
If X determines Y and Y determines Z then X determines Z
We can chain FDs together
Example: If A → B, B → C and C → D then A → C and A → D

21 Union
If a field determines two separate fields it determines both of them together
If X determines Y and X determines Z then X determines YZ
Example: If A → B and A → C then A → BC
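One way to see that Union follows from the three basic rules is the short derivation below. It is a sketch of a standard argument, written with the X, Y, Z of this slide.

```latex
\begin{align*}
&1.\ X \to Y   && \text{given} \\
&2.\ X \to XY  && \text{augmentation of 1 with } X \text{ (since } XX = X\text{)} \\
&3.\ X \to Z   && \text{given} \\
&4.\ XY \to YZ && \text{augmentation of 3 with } Y \\
&5.\ X \to YZ  && \text{transitivity of 2 and 4}
\end{align*}
```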

22 An Example
Suppose that a table has six fields: ABCDEF
The following dependencies exist: AC → B, C → DE, F → AC
How many dependencies can be derived?
What dependencies are contained in the closure?

23 Closure
The closure is the union of all the dependencies that may be derived from the original set: AC → B, C → DE, F → AC
- Reflexivity (AKA trivial): A → A, B → B, AB → B, ABC → C, …
- Augmentation: CA → ADE, ACD → BD, …
- Transitivity: F → B, F → DE
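Listing all of F+ by hand gets long quickly, so a common shortcut is to compute the closure of a set of attributes and ask whether a given dependency falls inside it. The sketch below assumes single-letter attribute names so that a string such as "AC" can stand for the set {A, C}; the function names are illustrative.

```python
def attribute_closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know all of lhs, we also know all of rhs.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """True if lhs -> rhs belongs to the closure F+ of fds."""
    return set(rhs) <= attribute_closure(lhs, fds)

# The dependencies from the example: AC -> B, C -> DE, F -> AC
fds = [("AC", "B"), ("C", "DE"), ("F", "AC")]

print(sorted(attribute_closure("C", fds)))  # ['C', 'D', 'E']
print(sorted(attribute_closure("F", fds)))  # ['A', 'B', 'C', 'D', 'E', 'F']
print(implies(fds, "F", "DE"))              # True
print(implies(fds, "A", "B"))               # False
```

The loop repeats until no dependency adds anything new, which is the chaining of FDs that the transitivity rule describes.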

24 Keys and Dependencies
A key is any set of fields that determines all other fields, either directly or transitively
A candidate key must be minimal: no field may be removed and have it stay a key
In the above:
- The entire relation is a key by reflexivity but is not minimal
- F is the key: it determines every other field directly or using transitivity
Super key: a set of fields that contains a key
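For a relation this small, the candidate keys can be found by brute force: try attribute sets from smallest to largest and keep those whose closure is the whole relation. This is a sketch under the assumption of single-letter attribute names. The helper names are made up for the illustration, and the search is exponential, so it only suits toy schemas.

```python
from itertools import combinations

def attribute_closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Return all minimal attribute sets that determine every attribute."""
    attributes = sorted(set(attributes))
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is only a super key
            if attribute_closure(candidate, fds) == set(attributes):
                keys.append(candidate)
    return keys

fds = [("AC", "B"), ("C", "DE"), ("F", "AC")]
keys = candidate_keys("ABCDEF", fds)
print(keys)  # [{'F'}]
```

F appears on no right-hand side, so every key must contain it, and F alone already determines everything else.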

25 Decomposition
If a field determines two combined fields it determines both of them separately
If X determines YZ then X determines Y and X determines Z
This is the reverse of Union
Example: If A → BC then A → B and A → C
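The Decomposition rule can be derived from reflexivity and transitivity alone. The steps below are a sketch of a standard argument, using the X, Y, Z of this slide.

```latex
\begin{align*}
&1.\ X \to YZ && \text{given} \\
&2.\ YZ \to Y && \text{reflexivity, since } Y \subseteq YZ \\
&3.\ X \to Y  && \text{transitivity of 1 and 2} \\
&4.\ YZ \to Z && \text{reflexivity, since } Z \subseteq YZ \\
&5.\ X \to Z  && \text{transitivity of 1 and 4}
\end{align*}
```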

26 Decompositions
Use projections to subdivide a table into several tables in order to move to a higher normal form
However, can all projections be done without problems? No
There are both lossless and lossy projections
The desired kind of projections are called lossless join decompositions
This kind allows us to exactly reconstruct the original table

27 Lossless Join Decomposition
How may we subdivide one relation into two without losing anything?
There must be some attributes in common in the two tables
- Otherwise the relationship between a key and an attribute is broken
The decomposition is lossless if the attributes in common contain a key of either table

28 Lossless Decomposition Again
Let R be a set of fields in a relation
Let F be a set of FDs that hold over R
The decomposition of R into R1 and R2 is lossless if and only if F+ contains either
R1 ∩ R2 → R1 or R1 ∩ R2 → R2
The attributes in common must contain the key for R1 or the key for R2
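The test above translates almost directly into code: compute the closure of the common attributes and check whether it covers R1 or R2. The sketch below assumes single-letter attribute names. The FDs S → P and S → D are an assumption chosen for illustration and are not stated in the slides.

```python
def attribute_closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """True if (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2 follows from fds."""
    common = set(r1) & set(r2)
    determined = attribute_closure(common, fds)
    return set(r1) <= determined or set(r2) <= determined

# Assumed FDs over R(S, P, D): S is the key.
fds = [("S", "P"), ("S", "D")]

print(is_lossless("SP", "PD", fds))  # False: the common field P determines nothing else
print(is_lossless("SP", "SD", fds))  # True: the common field S is a key
```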

29 Example
Original:
S   P   D
S1  P1  D1
S2  P2  D2
S3  …   D3
Decomposed into two:
S   P        P   D
S1  P1       P1  D1
S2  P2       P2  D2
S3  …        …   D3
Join is larger than original, some information lost

30 Why did that not work?
The common field was P
P is not the key
Recall: the functional dependencies cannot be determined from looking at the data
- The data may only show what is not an FD
In this case either S or D or both could be the key
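The effect can be reproduced with a few set comprehensions. The rows below are an assumption: the transcript does not preserve the P value for S3, so this sketch sets it to P1 so that two rows share a P value. The table names are illustrative.

```python
# Assumed contents of the original table (S, P, D).
original = {
    ("S1", "P1", "D1"),
    ("S2", "P2", "D2"),
    ("S3", "P1", "D3"),  # P1 here is an assumption, not from the slides
}

# Projections used by the two decompositions.
sp_table = {(s, p) for s, p, d in original}
pd_table = {(p, d) for s, p, d in original}
sd_table = {(s, d) for s, p, d in original}

# Natural join of (S, P) with (P, D) on the common field P.
join_on_p = {(s, p, d) for s, p in sp_table for p2, d in pd_table if p == p2}

# Natural join of (S, P) with (S, D) on the common field S.
join_on_s = {(s, p, d) for s, p in sp_table for s2, d in sd_table if s == s2}

print(len(original), len(join_on_p), len(join_on_s))  # 3 5 3
print(sorted(join_on_p - original))  # the spurious rows created by joining on P
```

Joining on P invents the rows (S1, P1, D3) and (S3, P1, D1), so the original can no longer be told apart from the extras. Joining on S gives back exactly the original rows for this data.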

31 Example Revisited
Original:
S   P   D
S1  P1  D1
S2  P2  D2
S3  …   D3
Decomposed into two better tables:
S   P        S   D
S1  P1       S1  D1
S2  P2       S2  D2
S3  …        S3  D3
Reconstructed the same as original
This works now, but may not work with more data.

32 Other Notes
This generalizes to decomposing a table into more than two tables
- Decompose R1 into R1A and R1B
- We can then reconstruct R1 if needed
From the viewpoint of lossless decomposition:
- The common fields must include the key, but may include other fields
From the viewpoint of decomposing into higher normal forms:
- The common fields are usually only key fields
- Non-key fields are just redundant data

33 Second Normal Form (2ndNF)
A table is in Second Normal Form if and only if
- It is in 1st NF, and
- Every non-key attribute is fully functionally dependent on the whole key
No partial dependencies

34 Partial Dependencies
X → A
X is part of the key but not all of it
A is a non-key attribute
Violation of 2nd NF
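Given a key and a set of FDs, partial dependencies can be found mechanically: take each proper subset of the key and see which non-key attributes it determines. The sketch below uses the dependencies of the student table from this deck. It assumes a single candidate key, and the function names are illustrative.

```python
from itertools import combinations

def attribute_closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(key, fds):
    """Map each proper part of the key to the non-key attributes it determines."""
    key = set(key)
    found = {}
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            dependents = attribute_closure(part, fds) - key
            if dependents:
                found[part] = dependents
    return found

fds = [
    ({"SID"}, {"SNAME"}),
    ({"SID"}, {"LCODE"}),
    ({"LCODE"}, {"STATUS"}),
    ({"CID"}, {"CNAME"}),
    ({"CID"}, {"SITE"}),
    ({"SID", "CID"}, {"GRADE"}),
]

violations = partial_dependencies({"SID", "CID"}, fds)
for part, dependents in violations.items():
    print(part, "->", sorted(dependents))
```

An empty result would mean the table has no partial dependencies with respect to that key. Here both SID and CID determine non-key attributes on their own, so the table is not in 2nd NF.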

35 Student Table
Our previous student table was 1stNF but not 2ndNF
The key is SID and CID
- LCODE is dependent on SID
- CNAME is dependent on CID
The fix is projecting it into two (or more) tables
This must be dependency preserving

36 What dependencies?
SID → SNAME
SID → LCODE
LCODE → STATUS
CID → CNAME
SID,CID → GRADE
CID → SITE
SID,CID → Everything

37 Now what?
The two-piece key implies three tables:
- One where SID is the key
- One where CID is the key
- One with both SID and CID as the key
Each table has only fields dependent on the whole key
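The three tables can be derived by assigning each non-key attribute to the smallest part of the key that determines it. The following is a rough sketch of that idea using the student table's dependencies. The function decompose_by_key_parts is a hypothetical helper, not a general normalization algorithm.

```python
from itertools import combinations

def attribute_closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def decompose_by_key_parts(attributes, key, fds):
    """Group each non-key attribute under the smallest key part that determines it."""
    key = sorted(key)
    tables = {}
    for attr in sorted(set(attributes) - set(key)):
        for size in range(1, len(key) + 1):
            owner = next(
                (part for part in combinations(key, size)
                 if attr in attribute_closure(part, fds)),
                None,
            )
            if owner is not None:
                # Start each table with its key fields, then add the attribute.
                tables.setdefault(owner, list(owner)).append(attr)
                break
    return tables

attributes = ["SID", "SNAME", "LCODE", "STATUS", "CID", "CNAME", "SITE", "GRADE"]
fds = [
    ({"SID"}, {"SNAME"}),
    ({"SID"}, {"LCODE"}),
    ({"LCODE"}, {"STATUS"}),
    ({"CID"}, {"CNAME"}),
    ({"CID"}, {"SITE"}),
    ({"SID", "CID"}, {"GRADE"}),
]

tables = decompose_by_key_parts(attributes, {"SID", "CID"}, fds)
for table_key, fields in tables.items():
    print(table_key, fields)
```

STATUS lands in the SID table because SID determines it through LCODE.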

38 Original 1NF Table
SID  SName  LCode  Status  CID  CName  Site  Grade
21   Jones  A1     1       170  C Lit  MCF   89
(remaining rows are incomplete in the transcript: 32 Smith 160 C++ RSC 68 91 385 DB I VNG 76 62)

39 New Relations
Student:
SID  SName  LCode  Status
21   Jones  A1     1
32   Smith  …      …
Enroll:
SID  CID  Grade
21   170  89
(remaining rows are incomplete in the transcript: 32 160 68 91 385 76 62)
Course:
CID  CName  Site
170  C Lit  MCF
160  C++    RSC
385  DB I   VNG

40 The new schema is better
Used a three-way lossless join decomposition
Now at Second Normal Form
Lost some anomalies:
- The insertion and deletion anomalies: we may have a student without a class
- The update anomaly: changing a course title needs only one update
One anomaly still exists:
- Changing the LCode of one record requires changing other LCodes as well
More work to be done

41 Finally
Dependencies are a mathematical concept
- Strongly related to the concept of a key
We can use dependencies to determine a table’s normal form
- Second, third and Boyce-Codd
- First is any rectangular table
- Second has no partial dependencies
A 1NF table with a single field for a key must be in 2NF

