Database Management Systems (CS 564)


1 Database Management Systems (CS 564)
Fall 2017 Lecture 6

2 Schema Refinement: Escaping Data Traps
“Perfection is achieved not when there is nothing more to add, but when there is nothing left to take away.” - A. de Saint-Exupéry
CS 564 (Fall'17)

3 Motivating Example
Student(SID: int, Name: string, Age: int, Major: string)
Course(CID: string, Name: string, Credits: int, Department: string)
Section(SecID: int, CID: string, Semester: string, Year: int, Instructor: string, NumEnrollments: int)
Prerequisite(CID: string, PrereqID: string)
GradeReport(SID: int, SecID: int, Grade: string)
Q: Any problems with this design?
A: NumEnrollments is redundant. Why?

4 SQL Exercise
Write a SQL DDL command to create a view on Section that automatically computes the NumEnrollments column from the other tables.
Student(SID: int, Name: string, Age: int, Major: string)
Course(CID: string, Name: string, Credits: int, Department: string)
Section(SecID: int, CID: string, Semester: string, Year: int, Instructor: string)
GradeReport(SID: int, SecID: int, Grade: string)
Prerequisite(CID: string, PrereqID: string)

5 SQL Exercise Solution
CREATE VIEW SectionWithEnrollments AS
SELECT E.SecID, E.CID, E.Semester, E.Year, E.Instructor,
       (SELECT COUNT(*)
        FROM GradeReport AS GR
        WHERE GR.SecID = E.SecID) AS NumEnrollments
FROM Section AS E;
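The view above can be tried out with a minimal sketch using Python's built-in sqlite3 module. The sample rows (section numbers, student IDs, grades) are made up for illustration and are not from the lecture; only the view definition follows the slide.

```python
import sqlite3

# In-memory database; the rows inserted below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Section (SecID INTEGER PRIMARY KEY, CID TEXT, Semester TEXT,
                      Year INTEGER, Instructor TEXT);
CREATE TABLE GradeReport (SID INTEGER, SecID INTEGER, Grade TEXT);

-- View definition as given on the slide.
CREATE VIEW SectionWithEnrollments AS
SELECT E.SecID, E.CID, E.Semester, E.Year, E.Instructor,
       (SELECT COUNT(*) FROM GradeReport AS GR
        WHERE GR.SecID = E.SecID) AS NumEnrollments
FROM Section AS E;

INSERT INTO Section VALUES (1, 'CS564', 'Fall', 2017, 'Codd');
INSERT INTO Section VALUES (2, 'CS367', 'Fall', 2017, 'Dijkstra');
INSERT INTO GradeReport VALUES (100, 1, 'A');
INSERT INTO GradeReport VALUES (101, 1, 'B');
""")

query = "SELECT SecID, NumEnrollments FROM SectionWithEnrollments"
counts = dict(conn.execute(query))

# The count is derived on every read, so a new GradeReport row shows up
# without any update to Section.
conn.execute("INSERT INTO GradeReport VALUES (102, 2, 'A')")
counts_after = dict(conn.execute(query))
```

Because NumEnrollments is computed at query time rather than stored, there is no second copy of the count to keep in sync.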

6 What’s Wrong with Redundancy
Redundant storage: costs money!
Insert anomalies: have to insert other data or deal with NULLs
Delete anomalies: may lose information by deleting all the copies
Update anomalies: if one copy is updated, an inconsistency is created unless all other copies are updated

7 Motivating Example (Cont.)
Student(SID: int, Name: string, Age: int, Major: string)
Course(CID: string, Name: string, Credits: int, Department: string)
Section(SecID: int, CID: string, Semester: string, Year: int, Instructor: string, NumEnrollments: int)
Prerequisite(CID: string, PrereqID: string)
GradeReport(SID: int, SecID: int, Grade: string)
Q: What types of anomaly might we have here?
A: Redundant storage and update anomaly.

8 Another Example
Student(SID: int, Name: string, Age: int, Major: string)
CourseSection(CID: string, SecID: int, CourseName: string, Credits: int, Department: string, Semester: string, Year: int, Instructor: string)
Prerequisite(CID: string, PrereqID: string)
GradeReport(SID: int, SecID: int, Grade: string)
Q: What is the source of the problem?
A: The moment we know CID, the values of CourseName, Credits and Department are fixed, i.e. there is a functional dependency among some of CourseSection's non-key attributes.

9 What is a Functional Dependency
A functional dependency (FD) is a form of constraint
Generalizes the concept of a key
Schema refinement (a.k.a. normalization) is the process of
  Detecting FDs that cause anomalies, and
  Decomposing the relations to get rid of those anomalies

10 Schema Refinement: An Outline
Detect anomalies
  Find FDs in the relations’ schemas
  Apply Armstrong’s axioms to expand these FDs
  Use the FDs to find the anomalies in the schemas
Remove anomalies
  Decompose the anomalous schemas

11 Functional Dependency
Let 𝓡(J, K, L) be a relational schema, where J, K and L are sets of attributes
A functional dependency J → K holds if and only if, for any instance R of 𝓡(J, K, L) and for any pair of tuples t1 and t2 in R:
  t1.J = t2.J ⇒ t1.K = t2.K
Read as “J determines K”

12 Functional Dependency: Example
CourseSection(CID, SecID, CourseName, Credits, Department, Semester, Year, Instructor)
Functional dependencies
  SecID → CID, CourseName, Credits, Department, Semester, Year, Instructor
  CID → CourseName, Credits, Department
  SecID, CID → CourseName, Credits, Department, Semester, Year, Instructor
An FD is a property of the application for which the database is designed
  e.g. we might know that CID → Instructor

13 Functional Dependency: Example (Cont.)
Every key constraint is an FD!
Reminder
  Superkey: a subset of attributes uniquely identifying (i.e. determining all the attributes of) each tuple
  Key: a minimal/irreducible superkey
  Candidate key: any one of the keys of a relation
  Primary key: a designated candidate key of a relation

14 How to Infer FDs
Create ER model
Translate it into a relational schema
Think about FDs that are valid
  From the application point of view
Given a table with a set of tuples, the best you can do is to
  Confirm that an FD seems to be valid, or
  Prove that an FD is definitely not valid (through counterexamples)
You cannot prove that an FD is valid

15 How to Infer FDs (Cont.)
Suppose you want to inspect the FD J → K for relation R with schema 𝓡(J, K, L)
Example procedure
  Remove attributes in L from all R tuples
  If the remaining relation is many-to-one, then the FD is probably valid
    i.e. if each combination of J values corresponds to exactly one combination of K values
  If not, then the FD is definitely invalid
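The procedure above can be sketched in Python as follows. This is an illustrative sketch, not code from the lecture: the function name fd_violations and the three sample rows are assumptions made for the example.

```python
def fd_violations(rows, lhs, rhs):
    """Return pairs of rows that agree on `lhs` but differ on `rhs`.

    An empty result only means the FD is consistent with this instance;
    it does not prove the FD holds for the application.
    """
    seen = {}          # lhs values -> (rhs values, first row seen with them)
    violations = []
    for row in rows:
        key = tuple(row[a] for a in lhs)   # project onto J
        val = tuple(row[a] for a in rhs)   # project onto K
        if key in seen and seen[key][0] != val:
            violations.append((seen[key][1], row))   # counterexample found
        else:
            seen.setdefault(key, (val, row))
    return violations


# Hypothetical instance, made up for illustration.
sections = [
    {"SecID": 1, "CID": "CS564", "Instructor": "Codd"},
    {"SecID": 2, "CID": "CS564", "Instructor": "Patel"},
    {"SecID": 3, "CID": "CS367", "Instructor": "Dijkstra"},
]

cid_to_instructor = fd_violations(sections, ["CID"], ["Instructor"])
instructor_to_cid = fd_violations(sections, ["Instructor"], ["CID"])
secid_to_cid = fd_violations(sections, ["SecID"], ["CID"])
```

On this sample, CID → Instructor has a counterexample (two CS564 sections with different instructors), while Instructor → CID and SecID → CID merely appear to hold.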

16 How to Infer FDs (Cont.)
Example: does CID → Instructor hold for the following instance of Section?
Section
SecID   CID       Semester   Year   Instructor
30098   MATH240   Fall       2017   Euclid
40026   CS367                2016   Dijkstra
1005              Spring     2004   Gauss
30451   CS764                       Patel
20006   CS564                2001   Codd
Projection on (CID, Instructor)
CID       Instructor
MATH240   Euclid
CS367     Dijkstra
          Gauss
CS764     Patel
CS564     Codd
Q: How about Instructor → CID?

17 How to Infer FDs (Cont.)
Easy-to-spot FDs: using key constraints
Refresher: a key of a relation R is an irreducible subset of R’s attributes that uniquely identifies each tuple in R
  i.e. the key determines all the other attributes of R
Example: SecID → CID, Semester, Year, Instructor
  From which we can also infer that SecID → CID / SecID → Semester / SecID → CID, Year / …

18 Closure of FD Set
More generally, given a set S of FDs, we want to know the set S+ of all the FDs that are logically implied by S
We call S+ the closure of S
Given S, find S+ using Armstrong’s axioms

19 Armstrong’s Axioms
Let X, Y and Z be three sets of attributes
Axiom 1 (Reflexivity Rule)
  Y ⊆ X ⇒ X → Y (called a trivial FD)
Example
  {Semester} ⊆ {Semester, Year} ⇒ {Semester, Year} → {Semester}
As seen before, we usually write the above FD as Semester, Year → Semester

20 Armstrong’s Axioms (Cont.)
Axiom 2 (Augmentation Rule)
  X → Y ⇒ XZ → YZ
Example
  SecID → Instructor ⇒ SecID, Semester, Year → Instructor, Semester, Year

21 Armstrong’s Axioms (Cont.)
Axiom 3 (Transitivity Rule)
  X → Y and Y → Z ⇒ X → Z
Example
  SecID → CID and CID → Textbook ⇒ SecID → Textbook
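As a rough illustration, the three axioms can be written as small Python functions over FDs represented as (LHS, RHS) pairs of frozensets. The representation and function names are assumptions chosen for this sketch; the attribute names reuse the slides' examples.

```python
from itertools import combinations


def reflexivity(x):
    """All trivial FDs X -> Y where Y is a non-empty subset of X."""
    x = frozenset(x)
    return {(x, frozenset(y))
            for r in range(1, len(x) + 1)
            for y in combinations(x, r)}


def augmentation(fd, z):
    """From X -> Y derive XZ -> YZ."""
    x, y = fd
    z = frozenset(z)
    return (frozenset(x) | z, frozenset(y) | z)


def transitivity(fd1, fd2):
    """From X -> Y and Y -> Z derive X -> Z; None if the FDs do not chain."""
    x, y = fd1
    y2, z = fd2
    return (x, z) if y == y2 else None


secid_instr = (frozenset({"SecID"}), frozenset({"Instructor"}))
secid_cid = (frozenset({"SecID"}), frozenset({"CID"}))
cid_textbook = (frozenset({"CID"}), frozenset({"Textbook"}))

trivial = reflexivity({"Semester", "Year"})
augmented = augmentation(secid_instr, {"Semester", "Year"})
chained = transitivity(secid_cid, cid_textbook)
```

Note that this transitivity helper only chains FDs whose middle sets match exactly; combining it with augmentation covers the general case.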

22 Using Armstrong’s Axioms
Given a set S of FDs, apply the three axioms above repeatedly to S in order to obtain S+
  S+ = S
  loop
    foreach f in S+
      Apply reflexivity and augmentation rules
      Add the new FDs to S+
    foreach pair f1, f2 of FDs in S+
      Apply the transitivity rule to f1, f2
      Add the new FD to S+
  until S+ does not change any further

23 Using Armstrong’s Axioms (Cont.)
Theorem: Armstrong’s axioms are sound and complete
  Sound: any FD generated by applying these axioms to S holds for any relation satisfying the FDs in S
  Complete: repeated application of the axioms on S will eventually generate all the FDs in S+

24 Derived Rules
Additional rules, which can be derived from Armstrong’s axioms
More convenient to use them than to derive them every time

25 Derived Rules (Cont.)
Union Rule
  X → Y and X → Z ⇒ X → YZ
Decomposition Rule
  X → YZ ⇒ X → Y and X → Z
Pseudo-transitive Rule
  X → Y and YZ → U ⇒ XZ → U
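To show how such rules follow from the axioms, here is one possible derivation of the union rule, written as a worked LaTeX block. The step layout is an illustration and is not taken from the slides.

```latex
\begin{align*}
1.\;& X \to Y   && \text{given} \\
2.\;& X \to XY  && \text{augmentation of (1) with } X \text{, since } XX = X \\
3.\;& X \to Z   && \text{given} \\
4.\;& XY \to YZ && \text{augmentation of (3) with } Y \\
5.\;& X \to YZ  && \text{transitivity of (2) and (4)}
\end{align*}
```

The decomposition and pseudo-transitive rules can be derived in a similar handful of steps.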

26 Checking FDs
Let S be a set of FDs defined on the attributes in the set X
  e.g. X={SID, Name, SSN}, S={(SID → Name, SSN), (SSN → SID)}
Question: is Y ⊆ X a superkey?
To answer this question, among others, we find all the attributes that Y determines

27 Attribute Set Closures
Given a set X of attributes and a set S of FDs, the closure of Y ⊆ X (under S), called Y+, is the set of all attributes Z ∈ X such that Y → Z
e.g.
  X={SecID, CID, CName, Year, Department}
  S={(SecID → CID, CName, Year, Department), (CID → Department)}
  Y={CName, CID}
  Y+={CName, CID, Department}

28 Compute Attribute Set Closures
Y+ = Y
loop
  if ∃ FD Z → T in S s.t. Z ⊆ Y+ :
    Y+ = Y+ ∪ T
until Y+ does not change any further
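The loop above translates almost directly into Python. The following is a minimal sketch in which FDs are assumed to be (LHS, RHS) pairs of sets; the function name attribute_closure is chosen for this example, and the data reuses the X, S and Y from the attribute set closure example.

```python
def attribute_closure(attrs, fds):
    """Compute Y+ for the attribute set `attrs` under the FDs in `fds`."""
    closure = set(attrs)                     # Y+ = Y
    changed = True
    while changed:                           # until Y+ stops changing
        changed = False
        for lhs, rhs in fds:
            # FD Z -> T applies when Z is already inside the closure
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)          # Y+ = Y+ ∪ T
                changed = True
    return closure


fds = [
    ({"SecID"}, {"CID", "CName", "Year", "Department"}),
    ({"CID"}, {"Department"}),
]

y_plus = attribute_closure({"CName", "CID"}, fds)
secid_plus = attribute_closure({"SecID"}, fds)
```

Each pass either adds at least one attribute or terminates, so the loop runs at most once per attribute plus a final pass.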

29 Use Attribute Set Closures
Test if Y is a superkey
  Compute Y+ and check if Y+ contains all the attributes of R
Test if a given FD Y → Z holds (without computing S+)
  Compute Y+ and check if Z ⊆ Y+
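Both tests can be sketched on top of an attribute closure routine. The helper names below (closure, is_superkey, fd_implied) are assumptions for this example; the attributes and FDs are the SID/Name/SSN example from the Checking FDs slide.

```python
def closure(attrs, fds):
    """Attribute set closure of `attrs` under `fds` (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, all_attrs, fds):
    """Y is a superkey iff Y+ contains every attribute of the relation."""
    return closure(attrs, fds) >= set(all_attrs)


def fd_implied(lhs, rhs, fds):
    """Y -> Z follows from the FDs iff Z is a subset of Y+."""
    return set(rhs) <= closure(lhs, fds)


X = {"SID", "Name", "SSN"}
S = [({"SID"}, {"Name", "SSN"}), ({"SSN"}, {"SID"})]

ssn_is_superkey = is_superkey({"SSN"}, X, S)
name_is_superkey = is_superkey({"Name"}, X, S)
ssn_determines_name = fd_implied({"SSN"}, {"Name"}, S)
```

Here SSN is a superkey because SSN → SID and SID → Name, SSN together reach every attribute, while Name alone determines nothing else.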

30 Minimal Basis of FD Sets
Opposite of closure
S is a minimal basis for a set F of FDs if
  S+ = F+
  Every FD in S has one attribute on the RHS
  If we remove any FD from S, the closure would not be F+ anymore
  If for any FD in S we remove one or more attributes from the LHS, the closure would not be F+ anymore

31 Minimal Basis of FD Sets (Cont.)
S is a minimal basis for a set F of FDs if
  S+ = F+
  Every FD in S has one attribute on the RHS
  If we remove any FD from S, the closure is not F+
  If for any FD in S we remove one or more attributes from the LHS, the closure is not F+
Example
  a ⟶ b
  a,b,c,d ⟶ e
  e,f ⟶ g,h
  a,c,d,f ⟶ e,g
What is the minimal basis for the above FD set?
  a ⟶ b
  a,c,d ⟶ e
  e,f ⟶ g
  e,f ⟶ h
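One common way to compute a minimal basis is to split right-hand sides, drop extraneous left-hand-side attributes, and then drop redundant FDs. The slides do not spell out a procedure, so the sketch below is an assumption about how it could be done, with hypothetical function names; it uses single-letter attributes so that a string such as "acd" stands for the set {a, c, d}.

```python
def closure(attrs, fds):
    """Attribute closure where each FD is (frozenset LHS, single RHS attribute)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result


def minimal_basis(fds):
    # Step 1: split every FD so it has exactly one attribute on the RHS.
    basis = [(frozenset(lhs), a) for lhs, rhs in fds for a in sorted(rhs)]

    # Step 2: drop LHS attributes that are not needed to derive the RHS.
    for i in range(len(basis)):
        lhs, a = basis[i]
        for b in sorted(lhs):
            smaller = lhs - {b}
            if a in closure(smaller, basis):
                lhs = smaller
                basis[i] = (lhs, a)
    basis = list(dict.fromkeys(basis))       # remove duplicates, keep order

    # Step 3: drop FDs that already follow from the remaining ones.
    for fd in list(basis):
        rest = [g for g in basis if g != fd]
        if fd[1] in closure(fd[0], rest):
            basis = rest
    return basis


# The FD set from the example above.
example = [("a", "b"), ("abcd", "e"), ("ef", "gh"), ("acdf", "eg")]
result = minimal_basis(example)
```

In general a set of FDs can have more than one minimal basis, so the output may depend on the order in which attributes and FDs are examined; for this example the sketch reproduces the answer given above.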

32 Recap: Schema Refinement
Redundancy causes various kinds of anomalies
To refine schemas:
  Detect anomalies
    Find FDs in the relations’ schemas
    Apply Armstrong’s axioms to expand these FDs
    Use the FDs to find the anomalies in the schemas
  Remove anomalies
    Decompose the anomalous schemas

