CMSC424: Database Design Instructor: Amol Deshpande

Slides:



Advertisements
Similar presentations
Schema Refinement and Normal Forms Given a design, how do we know it is good or not? What is the best design? Can a bad design be transformed into a good.
Advertisements

Functional Dependencies and Normalization for Relational Databases.
603 Database Systems Senior Lecturer: Laurie Webster II, M.S.S.E.,M.S.E.E., M.S.BME, Ph.D., P.E. Lecture 6 A First Course in Database Systems.
Lossless Decomposition (2) Prof. Sin-Min Lee Department of Computer Science San Jose State University.
Chapter 7: Relational Database Design. ©Silberschatz, Korth and Sudarshan7.2Database System Concepts Chapter 7: Relational Database Design First Normal.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 227 Database Systems I Design Theory for Relational Databases.
Midterm Review II. Redundancy. –Information may be repeated unnecessarily in several tuples. –E.g. length and filmType. Update anomalies. –We may change.
Instructor: Amol Deshpande  Data Models ◦ Conceptual representation of the data  Data Retrieval ◦ How to ask questions of the database.
Instructor: Amol Deshpande  Data Models ◦ Conceptual representation of the data  Data Retrieval ◦ How to ask questions of the database.
603 Database Systems Senior Lecturer: Laurie Webster II, M.S.S.E.,M.S.E.E., M.S.BME, Ph.D., P.E. Lecture 8 A First Course in Database Systems.
Decomposition By Timothy Chen CS157A. Goal to Decomposition Eliminate redundancy by decomposing a relation into several relations in a higher normal form.
CMSC424: Database Design Instructor: Amol Deshpande
Closure The closure of {B 1 …B k } under the set of FDs S, denoted by {B 1 …B k } +, is defined as follows: {B 1 …B k } + = {B | any relation satisfies.
CMSC424: Database Design Instructor: Amol Deshpande
Functional Dependencies Definition: If two tuples agree on the attributes A, A, … A 12n then they must also agree on the attributes B, B, … B 12m Formally:
CMSC424: Database Design Instructor: Amol Deshpande
The principal problem that we encounter is redundancy, where a fact is repeated in more than one tuple. Most common cause: attempts to group into one relation.
CMSC424: Database Design Instructor: Amol Deshpande
Decomposition By Yuhung Chen CS157A Section 2 October
1 Functional Dependency and Normalization Informal design guidelines for relation schemas. Functional dependencies. Normal forms. Normalization.
1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow.
Schema Refinement and Normalization Nobody realizes that some people expend tremendous energy merely to be normal. Albert Camus.
CMSC424: Database Design Instructor: Amol Deshpande
Chapter 8 Normalization for Relational Databases Copyright © 2004 Pearson Education, Inc.
©Silberschatz, Korth and Sudarshan7.1Database System Concepts Chapter 7: Relational Database Design First Normal Form Pitfalls in Relational Database Design.
Chapter 10 Functional Dependencies and Normalization for Relational Databases.
CS 405G: Introduction to Database Systems 16. Functional Dependency.
Functional Dependencies and Normalization 1 Instructor: Mohamed Eltabakh
Relational Database Design by Relational Database Design by Dr.S.Sridhar, Ph.D.(JNUD), RACI(Paris, NICE), RMR(USA), RZFM(Germany) DIRECTOR ARUNAI ENGINEERING.
Chapter 8: Relational Database Design First Normal Form First Normal Form Functional Dependencies Functional Dependencies Decomposition Decomposition Boyce-Codd.
Schema Refinement and Normalization. Functional Dependencies (Review) A functional dependency X  Y holds over relation schema R if, for every allowable.
Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Normalization for Relational Databases.
Design Theory for Relational Databases 2015, Fall Pusan National University Ki-Joune Li.
SCUJ. Holliday - coen 1784–1 Schedule Today: u Normal Forms. u Section 3.6. Next u Relational Algebra. Read chapter 5 to page 199 After that u SQL Queries.
BCNF & Lossless Decomposition Prof. Sin-Min Lee Department of Computer Science.
Introduction to Indexes. Indexes An index on an attribute A of a relation is a data structure that makes it efficient to find those tuples that have a.
Functional Dependencies and Normalization for Relational Databases.
Functional Dependencies and Normalization for Relational Databases
Revisit FDs & BCNF Normalization 1 Instructor: Mohamed Eltabakh
1 Functional Dependencies and Normalization Chapter 15.
© D. Wong Ch. 3 (continued)  Database design problems  Functional Dependency  Keys of relations  Decompositions based on Functional Dependency.
Chapter 7: Relational Database Design. ©Silberschatz, Korth and Sudarshan7.2Database System Concepts Chapter 7: Relational Database Design First Normal.
Referential Integrity checks, Triggers and Assertions Examples from Chapter 7 of Database Systems: the Complete Book Garcia-Molina, Ullman, & Widom.
1 CSE 480: Database Systems Lecture 18: Normal Forms and Normalization.
© D. Wong Normalization  Purpose: process to eliminate redundancy in relations due to functional or multi-valued dependencies.  Decompose relation.
CS 157B Database Systems Dr. T Y Lin. Updates 1.Red color denotes updated data (ppt) 2.Class participation will be part of “extra” credits to to “quiz.
3 Spring Chapter Normalization of Database Tables.
Schema Refinement and Normalization Nobody realizes that some people expend tremendous energy merely to be normal. Albert Camus.
CS 405G: Introduction to Database Systems Instructor: Jinze Liu Fall 2009.
SQL Exercises – Part I April
CS 338Database Design and Normal Forms9-1 Database Design and Normal Forms Lecture Topics Measuring the quality of a schema Schema design with normalization.
Carnegie Mellon Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos Database design and normalization.
Ch 7: Normalization-Part 1
CPSC 603 Database Systems Lecturer: Laurie Webster II, M.S.S.E., M.S.E.E., M.S.BME, Ph.D., P.E. Lecture 5 Introduction to a First Course in Database Systems.
CS 157B Database Systems Dr. T Y Lin. 1.2 Overview of a Database Management System Data-Definition Language Commands –Illustrated by three examples.
Huffman code and Lossless Decomposition Prof. Sin-Min Lee Department of Computer Science.
Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke1 Schema Refinement and Normal Forms Chapter 19.
© D. Wong Functional Dependencies (FD)  Given: relation schema R(A1, …, An), and X and Y be subsets of (A1, … An). FD : X  Y means X functionally.
Chapter 8 Relational Database Design. 2 Relational Database Design: Goals n Reduce data redundancy (undesirable replication of data values) n Minimize.
11/06/97J-1 Principles of Relational Design Chapter 12.
Normalization and FUNctional Dependencies. Redundancy: root of several problems with relational schemas: –redundant storage, insert/delete/update anomalies.
Databases : SQL Multi-Relations 2007, Fall Pusan National University Ki-Joune Li These slides are made from the materials that Prof. Jeffrey D. Ullman.
Chapter 14 Functional Dependencies and Normalization Informal Design Guidelines for Relational Databases –Semantics of the Relation Attributes –Redundant.
More on Decompositions and Third Normal Form CIS 4301 Lecture Notes Lecture /16/2006.
Relational Database Design by Dr. S. Sridhar, Ph. D
Chapter 8: Relational Database Design
Module 5: Overview of Normalization
Functional Dependencies and Normalization
Chapter 7a: Overview of Database Design -- Normalization
Presentation transcript:

CMSC424: Database Design Instructor: Amol Deshpande

Today… Relational Database Design

Where did we come up with the schema that we used ? E.g. why not store the actor names with movies ? Topics: Formal definition of what it means to be a “good” schema. How to achieve it.

Movies Database Schema Movie(title, year, length, inColor, studioName, producerC#) StarsIn(movieTitle, movieYear, starName) MovieStar(name, address, gender, birthdate) MovieExec(name, address, cert#, netWorth) Studio(name, address, presC#) Movie(title, year, length, inColor, studioName, producerC#, starName) MovieStar(name, address, gender, birthdate) MovieExec(name, address, cert#, netWorth) Studio(name, address, presC#) Changed to:

TitleYearLengthinColorStudioNameprodC#StarName Star wars YesFox128Hamill Star wars YesFox128Fisher Star wars YesFox128H. Ford King Kong YesUniversal150Watts King Kong noRKO20Fay Issues: 1.Redundancy  higher storage, inconsistencies (“anomalies”) 2.Need nulls Unable to represent some information without using nulls How to store movies w/o actors (pre-productions etc) ? Movie(title, year, length, inColor, studioName, producerC#, starName)

TitleYearLengthinColorStudioNameprodC#StarNames Star wars YesFox128{Hamill, Fisher, H. ford} King Kong YesUniversal150Watts King Kong noRKO20Fay Issues: 3. Avoid sets - Hard to represent - Hard to query Movie(title, year, length, inColor, studioName, producerC#, starNames)

NameAddress FoxAddress1 Studio2Address1 UniversialAddress2 This process is also called “decomposition” Issues: 4. Requires more joins (w/o any obvious benefits) 5. Hard to check for some dependencies What if the “address” is actually the presC#’s address ? No easy way to ensure that constraint (w/o a join). Split Studio(name, address, presC#) into: Studio1 (name, presC#) Studio2(name, address)??? NamepresC# Fox101 Studio2101 Universial102 Smaller schemas always good ????

movieTitlestarName Star WarsHamill King KongWatts King KongFaye Issues: 6. “joining” them back results in more tuples than what we started with (King Kong, 1933, Watts) & (King Kong, 2005, Faye) This is a “lossy” decomposition We lost some constraints/information The previous example was a “lossless” decomposition. Decompose StarsIn(movieTitle, movieYear, starName) into: StarsIn1(movieTitle, movieYear) StarsIn2(movieTitle, starName) ??? movieTitlemovieYear Star wars1977 King Kong1933 King Kong2005 Smaller schemas always good ????

Desiredata No sets Correct and faithful to the original design Avoid lossy decompositions As little redundancy as possible To avoid potential anomalies No “inability to represent information” Nulls shouldn’t be required to store information Dependency preservation Should be possible to check for constraints

Approach We will encode and list all our knowledge about the schema somehow Functional dependencies (FDs) SSN  name (SSN “implies” length) If two tuples have the same “SSN”, they must have the same “name” movietitle  length --- Not true. But, (movietitle, movieYear)  length --- True. We will define a set of rules that the schema must follow to be considered good “Normal forms”: 1NF, 2NF, 3NF, BCNF, 4NF, … Rules specify constraints on the schemas and FDs

Functional Dependencies Let R be a relation schema   R and   R The functional dependency    holds on R iff for any legal relations r(R), whenever two tuples t 1 and t 2 of r have same values for , they have same values for . t 1 [  ] = t 2 [  ]  t 1 [  ] = t 2 [  ] On this instance, A  B does NOT hold, but B  A does hold A B

Functional Dependencies Difference between holding on an instance and holding on all legal relation Title  Year holds on this instance Is this a true functional dependency ? No. Two movies in different years can have the same name. Can’t draw conclusions based on a single instance TitleYearLengthinColorStudioNameprodC#StarName Star wars YesFox128Hamill Star wars YesFox128Fisher Star wars YesFox128H. Ford King Kong noRKO20Fay

Functional Dependencies Functional dependencies and keys: A key constraint is a specific form of a FD. E.g. if A is a superkey for R, then: A  R Similarly for candidate keys and primary keys.

Definitions A relation instance r satisfies a set of functional dependencies, F, if the FDs hold on the relation F holds on a relation schema R if not legal (allowable) relation instance of R violates it A functional dependency, A  B, is trivial if: B is a subset of A e.g. Movieyear, length  length

Definitions 4. Given a set of functional dependencies, F, its closure, F +, is all FDs that are implied by FDs in F. e.g. If A  B, and B  C, then clearly A  C We will see a formal method for inferring this later

BCNF Given a relation schema R, and a set of functional dependencies F, if every FD, A  B, is either: Trivial A is a superkey of R Then, R is in BCNF (Boyce-Codd Normal Form) Why is BCNF good ?

BCNF What if the schema is not in BCNF ? Decompose (split) the schema into two pieces. Careful: you want the decomposition to be lossless Rule: A decomposition of R into (R1, R2) is lossless, iff: R1 ∩ R2  R1 or R1 ∩ R2  R2