CM20145 Further DB Design – Normalization

CM20145 Further DB Design – Normalization. Dr Alwyn Barry, Dr Joanna Bryson.

Last Time Database design is an ongoing, iterative process. Requirements come from data, user demands, design issues. Change occurs: Corporations & technologies grow. Programmers & users learn. Views / security. Lossless-join decomposition Now: Science for improving design.

Design Process & Normalization We assume a schema R is given. R could have been generated when converting E-R diagram to a set of tables. R could have been a single relation containing all attributes that are of interest (called universal relation). Normalization breaks R into smaller relations. R could be the result of any ad hoc design of relations, which we then test & convert to normal form.

Overview First Normal Form. Functional Dependencies. Second Normal Form. Third Normal Form. Boyce-Codd Normal Form. Fourth Normal Form. Fifth Normal Form. Domain Key / Normal Form. Design Process & Problems.

First Normal Form (1NF)
Each attribute of a tuple holds at most one value. All tuples have the same number of attributes. Necessary for a relational database.
BAD (Office Hours holds more than one value per tuple):
Name    Office  Office Hours
Barry   2.23    1pm, 4pm
Bryson  L2.27   11am, 5pm
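As a rough illustration of repairing the office-hours table, the sketch below splits each multi-valued cell into one row per value. The dictionary keys and the function name `to_first_normal_form` are assumptions made for this example, not part of the slides.

```python
# Rows mirroring the slide's table; the field names are illustrative assumptions.
unnormalized = [
    {"name": "Barry", "office": "2.23", "office_hours": "1pm, 4pm"},
    {"name": "Bryson", "office": "L2.27", "office_hours": "11am, 5pm"},
]


def to_first_normal_form(rows):
    """Emit one (name, office, hour) tuple per office hour, so every cell is atomic."""
    flat = []
    for row in rows:
        for hour in row["office_hours"].split(","):
            flat.append((row["name"], row["office"], hour.strip()))
    return flat


flat_rows = to_first_normal_form(unnormalized)
for flat_row in flat_rows:
    print(flat_row)
```

The result is in 1NF, but the office is now repeated for each hour. That redundancy is the kind of problem the later normal forms address.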

Getting Caught Out With 1NF A domain is atomic if its elements are considered to be indivisible units. Examples of non-atomic domains: set-valued attributes, composite attributes, and identifiers like CS101 that can be broken up into parts. A relational schema R is in first normal form if the domains of all attributes of R are atomic. Non-atomic values complicate storage, encourage redundancy, and depend on interpretation built into application programs.

Are You Atomic? Atomicity is not an intrinsic property of the elements of the domain; it is a property of how the elements of the domain are used. E.g. strings containing a possible delimiter (here: a space): cities = “Melbourne Sydney” (non-atomic: space-separated list); surname = “Fortescue Smythe” (atomic: compound surname). E.g. strings encoding two separate fields: bucs_login = cssjjb. If the first two characters are extracted to find the department, the domain bucs_login is not atomic. This leads to encoding of information in the application program rather than in the database.

Second Normal Form (2NF)
Violated when a nonkey column is a fact about part of the primary key, i.e. a column is not fully functionally dependent on the full primary key. CUSTOMER-CREDIT in this case (from Watson):
BAD: ORDER
ITEMID  CUSTOMERID  QUANTITY  CUSTOMER-CREDIT
Desk    JJB         25        OK
Chair   AMB         3         POOR
FIX: ITEM (*itemid, …), ORDER (quantity, …), CUSTOMER (*customerid, customer-credit, …)

Def: Functional Dependency Let R be a relation schema, with α ⊆ R and β ⊆ R. The functional dependency (FD) α → β holds on R (“β is FD on α”) iff for any legal relation r(R), whenever any two tuples t1 and t2 of r agree on the attributes α they also agree on the attributes β, i.e. α(t1) = α(t2) ⇒ β(t1) = β(t2). Example: Consider r(A,B) with the following instance of r: A → B does NOT hold, but B → A does hold. A: Initials B: Chore JJB Grading AMB Setting Tutorials Writing Unit Reviews
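The definition can be checked mechanically against a single instance. The sketch below is a minimal version of that check; the function name `fd_holds` and the sample rows are assumptions for illustration (in particular, which person does which chore is made up). A `True` result only says the instance does not contradict the FD, since an FD is a statement about all legal instances.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on the lhs attributes but differ on the rhs attributes."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical instance of r(A, B): A = initials, B = chore.
r = [
    {"A": "JJB", "B": "Grading"},
    {"A": "JJB", "B": "Tutorials"},
    {"A": "AMB", "B": "Setting"},
]

a_determines_b = fd_holds(r, ["A"], ["B"])
b_determines_a = fd_holds(r, ["B"], ["A"])
print(a_determines_b)  # False: JJB appears with two different chores
print(b_determines_a)  # True: each chore appears with exactly one person
```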

Functional Dependencies: Uses Way to encode “business rules”. Specify constraints on the set of legal relations. We say that F holds on R if all legal relations on R satisfy the set of FDs F. Test relations to see if they are legal under a given set of FDs. If a relation r is legal under a set F of FDs, we say that r satisfies F.

Functional Dependencies An FD is an assertion about a schema, not an instance. If we only consider an instance or a few instances, we can’t tell if an FD holds. Inspecting only a few bird species (e.g. crows, cardinals and canaries) we might conclude colour → species. However, this would be a bad FD, as we would find out if we found some ravens. Thus, identifying FDs is part of the data modelling process.

Trivial Functional Dependencies An FD is trivial if it is satisfied by all instances of a relation. E.g. customer-name, loan-number → customer-name; customer-name → customer-name. In general, α → β is trivial if β ⊆ α. Permitting such FDs makes certain definitions and algorithms easier to state.

Functional Dependency vs Key FDs can express the same constraints we could express using keys. Superkeys: K is a superkey for relation schema R if and only if K → R. Candidate keys: K is a candidate key for R if and only if K → R, and there is no K′ ⊂ K such that K′ → R. Of course, which candidate key becomes the primary key is arbitrary.

FDs <> Keys FDs can represent more information than keys can on their own. Consider the following Loan-info-schema: Loan-info-schema = (customer-name, loan-number, branch-name, amount). We expect these FDs to hold: loan-number → amount; loan-number → branch-name. We could try to express this by making loan-number the key; however, the following FD does not hold: loan-number → customer-name. Incidentally, this isn’t a very good table! (¬2NF)

FD Closure Given a set F of FDs, other FDs are logically implied. E.g. if A → B and B → C, we can infer that A → C. The set of all FDs implied by F is the closure of F, written F+. Find F+ by applying Armstrong’s Axioms: if β ⊆ α, then α → β (reflexivity); if α → β, then γα → γβ (augmentation); if α → β and β → γ, then α → γ (transitivity). Additional rules (derivable from Armstrong’s Axioms): if α → β holds and α → γ holds, then α → βγ holds (union); if α → βγ holds, then α → β holds and α → γ holds (decomposition); if α → β holds and γβ → δ holds, then αγ → δ holds (pseudotransitivity).
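Enumerating F+ directly is rarely practical. A common shortcut, sketched here as an assumption rather than as the method the slides use, is to compute the closure of an attribute set under F and then test whether the right-hand side falls inside it. The names `attribute_closure` and `implies` are hypothetical.

```python
def attribute_closure(attrs, fds):
    """Return every attribute determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the closure already covers lhs, everything in rhs follows.
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def implies(fds, lhs, rhs):
    """True if lhs -> rhs is in F+, i.e. rhs lies inside the closure of lhs."""
    return set(rhs) <= attribute_closure(lhs, fds)


F = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(sorted(attribute_closure({"A"}, F)))  # ['A', 'B', 'C']
print(implies(F, {"A"}, {"C"}))  # True, by transitivity
print(implies(F, {"C"}, {"A"}))  # False
```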

Bad Decomposition Example (From Last Time) A non-lossless-join decomposition: R = (A, B), R1 = (A), R2 = (B). With r = {(α, 1), (α, 2), (β, 1)}: Π_A(r) = {α, β}, Π_B(r) = {1, 2}, and Π_A(r) ⋈ Π_B(r) = {(α, 1), (α, 2), (β, 1), (β, 2)}. Thus r is different to Π_A(r) ⋈ Π_B(r), so (R1, R2) is not a lossless-join decomposition of R.
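The loss can be seen by projecting and re-joining a small instance. The sketch below assumes an illustrative three-tuple instance of r(A, B). Because R1 and R2 share no attributes, the natural join degenerates into a Cartesian product.

```python
# Illustrative instance of r(A, B); the concrete tuples are an assumption.
r = {("α", 1), ("α", 2), ("β", 1)}

proj_a = {a for a, _ in r}  # projection of r onto A
proj_b = {b for _, b in r}  # projection of r onto B

# R1 = (A) and R2 = (B) have no common attribute, so the natural join
# of the two projections is simply their Cartesian product.
joined = {(a, b) for a in proj_a for b in proj_b}

spurious = joined - r
print(sorted(spurious))  # tuples produced by the join that were never in r
print(joined == r)       # False: the decomposition is not lossless-join
```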

FDs & Lossless Decomposition All attributes of an original schema (R) must appear in the decomposition (R1, R2): R = R1 ∪ R2. Lossless-join decomposition: for all possible relations r on schema R, r = Π_R1(r) ⋈ Π_R2(r). A decomposition of R into R1 and R2 is lossless-join if and only if at least one of the following dependencies is in F+: R1 ∩ R2 → R1; R1 ∩ R2 → R2.
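The test on R1 ∩ R2 can be sketched with an attribute-closure helper. Everything named below (`closure`, `is_lossless`, and the particular split of Loan-info-schema into two schemas) is an assumption for illustration. Only the FDs on loan-number come from the slides.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds (a list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """True if (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2 is in F+."""
    determined = closure(r1 & r2, fds)
    return r1 <= determined or r2 <= determined


# FDs from the Loan-info-schema slide; the split into two schemas is hypothetical.
fds = [({"loan-number"}, {"amount", "branch-name"})]
loan = {"loan-number", "amount", "branch-name"}
borrower = {"loan-number", "customer-name"}

print(is_lossless(loan, borrower, fds))  # True: loan-number determines all of loan
print(is_lossless({"A"}, {"B"}, []))     # False: no shared attributes, no FDs
```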

Third Normal Form (3NF)
Violated when a nonkey column is a fact about another nonkey column, so that it depends on the primary key only transitively. R is in 3NF iff R is in 2NF and has no transitive dependencies. EXCHANGE RATE violates this: it is a fact about NATION, not about STOCK CODE.
BAD: STOCK
STOCK CODE  NATION  EXCHANGE RATE
GOOG        USA     0.67
NOK         FIN     0.46
FIX: move EXCHANGE RATE out of STOCK into a separate table keyed by NATION.

Boyce-Codd Normal Form (BCNF)
Violations arise when a table has multiple candidate keys, the candidate keys are composite, and the candidate keys overlap. R is in BCNF iff every determinant is a candidate key. Like 3NF, but now worry about all fields. E.g. assume one consultant per problem per client, and one problem per consultant. If client-problem is the primary key, how do you add a new consultant?
BAD: ADVISOR
CLIENT  PROBLEM     CONSULTANT
Alpha   Marketing   Gomez
Alpha   Production  Raginiski

Design Goals & their discontents Goals for a relational database design: eliminate redundancies by decomposing relations; must be able to recover original data using lossless joins; prefer not to lose dependencies. BCNF: no redundancies, no guarantee of dependency preservation. 3NF: dependency preservation, but possible redundancies.

Fourth normal form (4NF) A row should not contain two or more independent multivalued facts. R is in 4NF iff it is in BCNF and has no non-trivial multi-valued dependencies. A multivalued dependency means the value of one attribute determines a set of values for another. BAD: STUDENT (STUDENTID, SPORT, SUBJECT, …): 50 Football English Music Tennis Botany Karate
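To see why independent multivalued facts are a problem, the sketch below stores one student's sports and subjects in a single table and then splits them apart. The grouping of the slide's values into three sports and three subjects is an assumption, as are all variable names.

```python
from itertools import product

# Assumed reading of the slide's values for student 50.
sports = {50: ["Football", "Tennis", "Karate"]}
subjects = {50: ["English", "Music", "Botany"]}

# In a single table, the two facts are independent, so every sport
# has to be paired with every subject to avoid implying a connection.
student = {
    (sid, sport, subject)
    for sid in sports
    for sport, subject in product(sports[sid], subjects[sid])
}

# 4NF fix: one table per multivalued fact.
student_sport = {(sid, sport) for sid, sport, _ in student}
student_subject = {(sid, subject) for sid, _, subject in student}

# Joining the two tables on the student id recovers the original rows.
rejoined = {
    (sid, sport, subject)
    for sid, sport in student_sport
    for sid2, subject in student_subject
    if sid == sid2
}

print(len(student), len(student_sport), len(student_subject))  # 9 3 3
print(rejoined == student)  # True: the split loses nothing
```

Nine rows shrink to three plus three, and adding a fourth sport means one new row instead of three.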

Fifth normal form (5NF) A relation is in 5NF iff every non-trivial join dependency on it is implied by its candidate keys. The schemas R1, R2, ..., Rn have a join dependency over R if they define a lossless-join decomposition over R. This is way too complicated, don’t worry about it.

Domain Key Normal Form Every constraint on the relation must be a logical consequence of the domain constraints and the key constraints that apply to the relation. Key: unique identifier. Constraint: rule governing attribute values. Domain: set of values of the same data type. No known algorithm gives DK/NF.

E-R Model and Normalization When an E-R diagram is carefully designed, identifying all entities correctly, the tables generated should not need further normalization. However, in a real (imperfect) design there can be FDs from non-key attributes of an entity to other attributes of the entity. The keys identified in E-R diagrams might not be minimal - FDs can help us to identify minimal keys. FDs from non-key attributes of a relationship set are possible, but rare.

Denormalization & Performance May want to use a non-normalized schema for performance. E.g. displaying customer-name along with account-number and balance requires a join of account with depositor. Alternative 1: use a denormalized relation containing the attributes of account as well as depositor. This gives faster lookup, but costs extra space and extra execution time for updates, plus extra coding work for the programmer and the possibility of error in the extra code. Alternative 2: use a materialized view defined as account ⋈ depositor. Benefits and costs as above, except less extra coding and fewer errors.

Limits of Normalization Examples of bad database design not caught by normalization. Good: earnings(company-id, year, amount). Bad: earnings-2000, earnings-2001, earnings-2002, etc., each on (company-id, earnings); all are in BCNF, but querying across years is difficult and a new table is needed each year. Also bad: company-year(company-id, earnings-2000, earnings-2001, earnings-2002); in BCNF, but querying across years is difficult and a new attribute is required each year.
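With the "good" design, a query across years is a single statement. The sketch below uses Python's built-in sqlite3 module with made-up company ids and amounts. The column is named company_id rather than company-id only because hyphens are awkward in SQL identifiers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE earnings ("
    " company_id TEXT, year INTEGER, amount REAL,"
    " PRIMARY KEY (company_id, year))"
)
# Made-up sample data for illustration only.
conn.executemany(
    "INSERT INTO earnings VALUES (?, ?, ?)",
    [("c1", 2000, 10.0), ("c1", 2001, 12.0), ("c1", 2002, 15.0), ("c2", 2000, 7.0)],
)

# One query covers any range of years; no new table or column is needed per year.
totals = conn.execute(
    "SELECT company_id, SUM(amount) FROM earnings"
    " WHERE year BETWEEN 2000 AND 2002"
    " GROUP BY company_id ORDER BY company_id"
).fetchall()
print(totals)
conn.close()
```

With one table per year, the same total would need a UNION over every yearly table, and the query text would have to change each time a year is added.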

Summary 1 – Rules to Watch 1NF: attributes not atomic. 2NF: non-key attribute FD on part of key. 3NF: one non-key attribute FD on another. Boyce-Codd NF: overlapping but otherwise independent candidate keys. 4NF: multiple, independent multi-valued attributes. 5NF: join dependency. Domain Key / NF: all constraints either domain or key.

Summary 2 – Concepts Functional Dependencies: Axioms & Closure. Lossless-join decomposition. Design Process. Normalization Problems. Next: Interfaces and Architectures

Reading & Exercises Reading: Connolly & Begg Chapters 13, 14; Silberschatz Chapter 7; in any other book, the design/normalization chapter. Exercises: Silberschatz 7.1, 7.2, 7.16, 7.23, 7.24, 7.27-29.

Next Week Architectures and Implementations Integrity and Security

Slides after and including this one you are not responsible for, but I am saving in case I decide to use them in the future.

Goal: Formalize “Good Design” Process: Decide whether a particular relation R is in “good” form. In the case that a relation R is not in “good” form, decompose it into a set of relations {R1, R2, ..., Rn} such that: each relation is in good form, the decomposition is a lossless-join decomposition. Theory: Constraints on the set of legal relations. Require that the value for a certain set of attributes determines uniquely the value for another set of attributes – functional dependencies.