Schema Refinement What and why

Schema Refinement What and why Copyright © 2003-2015 Curt Hill

What is a good schema?
- A good schema should:
  - Represent all the data needed
  - Group the data into relations that make sense
  - Have little or no redundancy
  - Make common operations efficient
- This is not just a common-sense notion
- We have some objective ways of determining whether a schema is indeed good

Redundancy
- What is wrong with redundant data?
- Space and access tradeoff
- Update anomaly
  - One copy is changed and the others are not
- Insert anomaly
  - An insertion requires that unrelated information also be inserted
- Delete anomaly
  - Deleting something also deletes unrelated information

Normalization
- Design activities to preclude redundancy and functional anomalies
- There is a series of normal forms, each contained within the next
- 5thNF=PJ ⇒ 4thNF ⇒ BCNF ⇒ 3rdNF ⇒ 2ndNF ⇒ 1stNF
  - ⇒ means implies, or is contained in
  - NF = Normal Form
  - PJ = Project Join, a form of 5thNF
  - BC = Boyce-Codd
- BCNF is a slight strengthening of 3rdNF

How will we do this?
- We will start with the simplest and work up to the most complicated
  - Show how to determine the particular normal form
  - Show what problems the next normal form solves
- The literature describes an 18th Normal Form
  - We will stop at 5th Normal Form
- Warning: Mathematics ahead
  - If there is no math, this is not science

First Normal Form
- Default case in a relational database
  - Rectangular tables
  - Fixed number of fields
- A file is not in 1stNF if it allows repeating groups
  - Such as a variable number of fields
- A relational database may allow variable-length fields, but that is an implementation consideration
  - The field is still considered atomic

1st NF and non 1st NF
Not in 1st Normal Form (repeated groups):
  1013  Joe Smith   Biology  English
  1043  Jon Smith   CIS
  1152  Jane Jones  Math
1st Normal Form:
  1013  Joe Smith   Biology
  1013  Joe Smith   English
  1043  Jon Smith   CIS
  1152  Jane Jones  Math
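The conversion on this slide can be sketched in a few lines of Python. This is only an illustration, not part of the original deck: the record layout (ID, name, list of subjects) and the name `to_first_normal_form` are assumptions made for the example.

```python
# Hypothetical records with a repeating group: (id, name, [subjects...]).
students = [
    (1013, "Joe Smith", ["Biology", "English"]),
    (1043, "Jon Smith", ["CIS"]),
    (1152, "Jane Jones", ["Math"]),
]


def to_first_normal_form(records):
    """Emit one fixed-width row per (student, subject) pair."""
    return [
        (sid, name, subject)
        for sid, name, subjects in records
        for subject in subjects
    ]


rows = to_first_normal_form(students)
for row in rows:
    print(row)
```

Every output row has the same three fields, which is the "rectangular table" requirement; the cost is that the ID and name now repeat for each subject.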

An example in 1st NF
- Attributes
  - SID - numeric student ID
  - SNAME - student name
  - LCODE - location (campus)
  - STATUS - numeric status of the location
  - CID - course ID (numeric)
  - CNAME - course name
  - SITE - location of the course
  - GRADE - grade this student received
- The key is the combination of SID and CID

A picture
[Table of sample rows. Columns: SID, SName, LCode, Status, CID, CName, Site, Grade. Recoverable values: students 21 Jones (A1, 1) and 32 Smith; courses 170 C Lit (MCF), 160 C++ (RSC) and 385 DB I (VNG); grades 89, 68, 91, 76, 62.]

What problems exist?
- Two of each: locations (student and course), names, IDs
- Each of these depends on part, but not all, of the key
- Looks like two tables, not one
- Table is in 1stNF but not 2ndNF

Anomalies
- Update anomaly
  - Changing a course number requires changing several records
  - Changing the LCode requires several updates
- Insert anomaly
  - We cannot have a student without their taking at least one class
- Delete anomaly
  - Deleting the first record destroys all that we know about course 170

Problem again
- The real problem is that things like CName are not dependent on the entire key
- CName is dependent on CID
  - Just part of the key
- We need to consider functional dependencies

Functional Dependencies (FD)
- If field A determines field B then B is functionally dependent on A
  - In other words: if we know A we know B
- Notation: A → B
  - This is read: A determines B
- A does not have to be an atomic attribute
- Every field is functionally dependent on every candidate key
  - This includes every field with the uniqueness property

Full Functional Dependency
- Somewhat stronger than the previous notion
- B is fully functionally dependent on A iff
  - B is functionally dependent on A
  - B is not functionally dependent on any proper subset of A
- If A is atomic, FD = FFD
- Notation is A ↠ B

Observations
- We cannot tell FDs by just looking at the data
  - We must understand the data relationships
  - Small tables may have apparent FDs that are not actually FDs
- If every A → B were projected into its own relation, then A would be the key of that relation
- Each FD represents an integrity constraint
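The first observation can be made concrete with a small check. This is a sketch only: the function name `fd_holds_in_sample` and the sample rows are invented for the illustration and do not come from the slides.

```python
def fd_holds_in_sample(rows, lhs, rhs):
    """Check whether lhs -> rhs is consistent with the given rows.

    rows is a list of dicts. A False result refutes the FD; a True result
    only means this particular sample does not contradict it.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows, invented for this illustration.
rows = [
    {"SID": 21, "SNAME": "Jones", "CID": 170},
    {"SID": 32, "SNAME": "Smith", "CID": 160},
]
print(fd_holds_in_sample(rows, ["SNAME"], ["CID"]))  # True, but only by accident
rows.append({"SID": 21, "SNAME": "Jones", "CID": 385})
print(fd_holds_in_sample(rows, ["SNAME"], ["CID"]))  # False once more data arrives
```

Data can therefore disprove a dependency, but it can never prove one; that takes knowledge of what the fields mean.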

Closure of a Set of FDs
- The closure (denoted F+) of a set F of FDs is the set that includes:
  - All FDs in F
  - Every FD that can be derived from the given FDs
- FDs obey some properties that allow us to find FDs implied by other FDs
- These properties are called Armstrong’s Axioms

Armstrong’s Axioms
- There are three basic rules:
  - Reflexivity
  - Augmentation
  - Transitivity
- Two additional rules may be derived using these three:
  - Union
  - Decomposition

Reflexivity
- If Y is a subset of X then X → Y
- A set of fields determines all of its members
- Examples:
  - A → A
  - AB → B
- A trivial FD is any FD whose right-hand side is a subset of its left-hand side

Augmentation
- If X determines Y
  - Then XZ determines YZ
- It is always possible to add a field to both sides of a functional dependency
- Example: if A → B then AC → BC

Transitivity
- If X determines Y and Y determines Z
  - Then X determines Z
- We can chain FDs together
- Example:
  - If: A → B, B → C, C → D
  - then: A → C, A → D

Union
- If a field determines two separate fields it determines both of them together
- If X determines Y and X determines Z
  - Then X determines YZ
- Example:
  - If: A → B, A → C
  - then: A → BC
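As a worked derivation (not on the original slide), the Union rule follows from augmentation and transitivity alone:

```latex
\begin{align*}
& X \to Y \;\Rightarrow\; X \to XY && \text{augmentation with } X \text{, using } XX = X \\
& X \to Z \;\Rightarrow\; XY \to YZ && \text{augmentation with } Y \\
& X \to XY,\; XY \to YZ \;\Rightarrow\; X \to YZ && \text{transitivity}
\end{align*}
```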

An Example
- Suppose that a table has six fields: ABCDEF
- The following dependencies exist:
  - AC → B
  - C → DE
  - F → AC
- How many dependencies can be derived?
- What dependencies are contained in the closure?

Closure
- The closure is the set of every dependency that may be derived from the original set: AC → B, C → DE, F → AC
- Reflexivity (AKA trivial)
  - A → A, B → B, AB → B, ABC → C, …
- Augmentation
  - CA → ADE, ACD → BD, …
- Transitivity
  - F → B, F → DE
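Listing all of F+ by hand quickly gets out of control, so in practice one usually computes the closure of a set of attributes instead: everything that set determines. The sketch below swaps in that attribute-closure technique; the function name and the encoding of FDs as pairs of strings are choices made for this illustration, not something taken from the slides.

```python
def attribute_closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs; each side is a string of
    single-letter attribute names.
    """
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Once the whole left side is known, the right side follows.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


# The three dependencies from the example: AC -> B, C -> DE, F -> AC.
fds = [("AC", "B"), ("C", "DE"), ("F", "AC")]
for start in ["A", "C", "AC", "F"]:
    print(start, "->", "".join(sorted(attribute_closure(start, fds))))
```

An FD X → Y is in F+ exactly when Y is contained in the closure of X, so this one function answers any membership question about F+.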

Keys and Dependencies
- A key is any set of fields that determines all other fields
  - Either directly or transitively
- A candidate key must be minimal
  - No field may be removed and have it stay a key
- In the above example:
  - The entire relation is a key by reflexivity but is not minimal
  - F is the key: it determines every other field directly or using transitivity
- Super key: a set of fields that contains a key
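The claim that F is the only candidate key of the ABCDEF example can be checked by brute force. This is a sketch for small schemas only (it tries every subset of the attributes); the function names are assumptions made for the illustration.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is the whole relation."""
    keys = []
    # Smaller sets are tried first, so any key found is minimal.
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key: a super key, not a candidate key
            if attribute_closure(candidate, fds) == set(attributes):
                keys.append(candidate)
    return keys


fds = [("AC", "B"), ("C", "DE"), ("F", "AC")]
print(candidate_keys("ABCDEF", fds))  # [{'F'}]
```

F appears on no right-hand side, so every key must contain F, and F alone already determines everything.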

Decomposition
- If a field determines two combined fields it determines both of them separately
- If X determines YZ
  - Then X determines Y and X determines Z
- This is the reverse of Union
- Example:
  - If: A → BC
  - then: A → B, A → C
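As a worked derivation (not on the original slide), the Decomposition rule follows from reflexivity and transitivity:

```latex
\begin{align*}
& YZ \to Y && \text{reflexivity, since } Y \subseteq YZ \\
& X \to YZ,\; YZ \to Y \;\Rightarrow\; X \to Y && \text{transitivity} \\
& X \to Z && \text{by the same argument, since } Z \subseteq YZ
\end{align*}
```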

Decompositions
- Use projections to subdivide a table into several tables in order to move to a higher normal form
- However, can all projections be done without problems?
  - No
  - There are both lossless and lossy projections
- The desired kind of projections are called lossless join decompositions
  - This kind allows us to exactly reconstruct the original table

Lossless Join Decomposition
- How may we subdivide one relation into two without losing anything?
- There must be some attributes in common in the two tables
  - Otherwise the relationship between a key and attribute is broken
- The decomposition is lossless if the attributes in common contain a key of either table

Lossless Decomposition Again
- Let R be the set of fields in a relation
- Let F be a set of FDs that hold over R
- The decomposition of R into R1 and R2 is lossless if and only if F+ contains either
  - R1 ∩ R2 → R1, or
  - R1 ∩ R2 → R2
- The attributes in common must contain the key for R1 or the key for R2
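The test on this slide can be sketched with attribute closure: R1 ∩ R2 → R1 is in F+ exactly when R1 lies inside the closure of R1 ∩ R2. The function names and the FDs S → P and S → D below are assumptions chosen for the illustration; the slides do not state which FDs hold for the S, P, D relation.

```python
def attribute_closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def is_lossless(r1, r2, fds):
    """Binary lossless-join test: the common attributes must determine R1 or R2."""
    common = set(r1) & set(r2)
    closure = attribute_closure(common, fds)
    return set(r1) <= closure or set(r2) <= closure


# Assumed FDs for illustration: S is the key of R(S, P, D).
fds = [("S", "P"), ("S", "D")]
print(is_lossless("SP", "PD", fds))  # common field P determines neither table
print(is_lossless("SP", "SD", fds))  # common field S determines both tables
```

This test covers only a split into two tables under functional dependencies; a split into more tables can be checked by applying it one step at a time.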

Example
Original:
  S   P   D
  S1  P1  D1
  S2  P2  D2
  S3      D3
Decomposed into two:
  S   P          P   D
  S1  P1         P1  D1
  S2  P2         P2  D2
  S3                 D3
The join of the two tables is larger than the original; some information is lost

Why did that not work?
- The common field was P
  - P is not the key
- Recall:
  - The functional dependencies cannot be determined from looking at the data
  - The data may only show what is not an FD
- In this case either S or D or both could be the key

Example Revisited
Original:
  S   P   D
  S1  P1  D1
  S2  P2  D2
  S3      D3
Decomposed into two better tables:
  S   P          S   D
  S1  P1         S1  D1
  S2  P2         S2  D2
  S3             S3  D3
The reconstructed table is the same as the original
This works now, but may not work with more data.
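Both decompositions can be replayed in Python. The transcript does not preserve the P value for S3, so the sketch below assumes it is P1; that value, like the variable names, is an assumption made purely so that P repeats and the effect is visible.

```python
# Rows are (S, P, D). The P value for S3 is assumed, not taken from the slide.
original = {("S1", "P1", "D1"), ("S2", "P2", "D2"), ("S3", "P1", "D3")}

# Projections onto pairs of columns.
s_p = {(s, p) for s, p, d in original}
p_d = {(p, d) for s, p, d in original}
s_d = {(s, d) for s, p, d in original}

# Natural join of (S, P) with (P, D) on the common field P.
join_on_p = {(s, p, d) for s, p in s_p for p2, d in p_d if p == p2}
# Natural join of (S, P) with (S, D) on the common field S.
join_on_s = {(s, p, d) for s, p in s_p for s2, d in s_d if s == s2}

print(len(original), len(join_on_p))  # 3 5
print(sorted(join_on_p - original))   # the two spurious rows
print(join_on_s == original)          # True
```

Joining on P invents rows such as (S1, P1, D3) that were never in the table. "Lossy" means exactly this: the extra rows make it impossible to tell which combinations were real.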

Other Notes
- This generalizes to decomposing a table into more than two tables
  - Decompose R1 into R1A and R1B
  - We can then reconstruct R1 if needed
- From the viewpoint of lossless decomposition:
  - The common fields must include the key, but may include other fields
- From the viewpoint of decomposing into higher normal forms:
  - The common fields are usually only key fields
  - Non-key fields are just redundant data

Second Normal Form (2ndNF)
- A table is in Second Normal Form if and only if
  - It is in 1st NF, and
  - Every non-key attribute is fully functionally dependent on the whole key
- No partial dependencies

Partial Dependencies
- X → A
- X is part of the key but not all of it
- Violation of 2nd NF

Student Table
- Our previous student table was in 1stNF but not 2ndNF
  - The key is SID and CID
  - LCODE is dependent on SID
  - CNAME is dependent on CID
- The fix is projecting it into two (or more) tables
  - This must be dependency preserving

What dependencies?
- SID → SNAME
- SID → LCODE
- LCODE → STATUS
- CID → CNAME
- SID,CID → GRADE
- CID → SITE
- SID,CID → Everything
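Given these dependencies, the 2ndNF violations can be found mechanically: take each proper part of the key and see which non-key fields it determines. The sketch below assumes (SID, CID) is the only candidate key; the function names are invented for the illustration.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def partial_dependencies(key, fds):
    """Map each proper part of the key to the non-key fields it determines."""
    key = set(key)
    found = {}
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            dependents = attribute_closure(part, fds) - key
            if dependents:
                found[part] = dependents
    return found


# The dependencies listed on the slide.
fds = [
    ({"SID"}, {"SNAME"}),
    ({"SID"}, {"LCODE"}),
    ({"LCODE"}, {"STATUS"}),
    ({"CID"}, {"CNAME"}),
    ({"CID"}, {"SITE"}),
    ({"SID", "CID"}, {"GRADE"}),
]
for part, dependents in partial_dependencies({"SID", "CID"}, fds).items():
    print(part, "->", sorted(dependents))
```

Only GRADE needs the whole key. Everything else hangs off SID or CID alone, which is why the table has to be split.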

Now what?
- The two-piece key implies three tables:
  - One where SID is the key
  - One where CID is the key
  - One with both SID and CID as the key
- Each table has only fields dependent on the whole key

Original 1NF Table
[Table of sample rows. Columns: SID, SName, LCode, Status, CID, CName, Site, Grade. Recoverable values: students 21 Jones (A1, 1) and 32 Smith; courses 170 C Lit (MCF), 160 C++ (RSC) and 385 DB I (VNG); grades 89, 68, 91, 76, 62.]

New Relations
Student (SID, SName, LCode, Status):
  21  Jones  A1  1
  32  Smith
Course (CID, CName, Site):
  170  C Lit  MCF
  160  C++    RSC
  385  DB I   VNG
Enroll (SID, CID, Grade):
  [values as transcribed: 21 170 89 32 160 68 91 385 76 62]
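The three-way split can be sketched as three projections followed by a join that rebuilds the original. The rows below are hypothetical: they reuse names and codes from the slides, but the exact row contents (including Smith's location and status) are invented, since the transcript does not preserve the full table.

```python
COLUMNS = ("SID", "SNAME", "LCODE", "STATUS", "CID", "CNAME", "SITE", "GRADE")

# Hypothetical 1NF rows, invented for this illustration.
rows = [
    (21, "Jones", "A1", 1, 170, "C Lit", "MCF", 89),
    (21, "Jones", "A1", 1, 160, "C++", "RSC", 68),
    (32, "Smith", "A1", 1, 170, "C Lit", "MCF", 91),
]


def project(rows, columns, wanted):
    """Keep only the wanted columns; the set removes duplicate rows."""
    idx = [columns.index(c) for c in wanted]
    return {tuple(row[i] for i in idx) for row in rows}


student = project(rows, COLUMNS, ("SID", "SNAME", "LCODE", "STATUS"))
course = project(rows, COLUMNS, ("CID", "CNAME", "SITE"))
enroll = project(rows, COLUMNS, ("SID", "CID", "GRADE"))

# Rejoin: Enroll supplies the key pairs, Student and Course fill in the rest.
rebuilt = {
    (sid, sname, lcode, status, cid, cname, site, grade)
    for sid, cid, grade in enroll
    for sid2, sname, lcode, status in student if sid2 == sid
    for cid2, cname, site in course if cid2 == cid
}

print(len(student), len(course), len(enroll))  # 2 2 3
print(rebuilt == set(rows))                    # True
```

Each student and each course is now stored once, and the join reproduces the original rows exactly, so nothing was lost in the split.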

The new schema is better
- Used a three-way lossless join decomposition
- Now at Second Normal Form
- Lost some anomalies
  - The insertion and deletion anomalies
    - We may have a student without a class
  - The update anomaly
    - Changing a course title needs only one update
- One anomaly still exists:
  - Changing LCode of one requires changing other LCodes as well
- More work to be done

Finally
- Dependencies are a mathematical concept
  - Strongly related to the concept of a key
- We can use dependencies to determine a table’s normal form
  - Second, third and Boyce-Codd
- First is any rectangular table
- Second has no partial dependencies
  - A 1NF table with a single field for a key must be in 2NF