Why ER Model? ER modeling is relatively easy to learn and use. An ERD shows a concise representation of the real world in terms of entities and relationships.


Why ER Model?
- ER modeling is relatively easy to learn and use.
- An ERD shows a concise representation of the real world in terms of entities and relationships.
- We know how to translate ERDs into relational schemas.
- Most CASE tools support one or more variations of ERDs.

ER Model (Peter Chen, 1976): Representation, Constraints, Operations

Representation: Entity and Relationship
The three main concepts of ER modeling at a lower level: entities, relationships, and attributes.
- Types of entity:
-- (Regular) entity: has its own identifier.
-- Weak entity: its identifier is the concatenation of the identifier of the owner entity and its partial key.
- Types of relationships, by degree:
-- Recursive relationship
-- Binary relationships
-- Ternary relationships

Constraints
- Constraints of ER models:
(1) Cardinality constraints: 1:1, 1:N, and M:N
(2) Participation constraints: TOTAL (mandatory) and PARTIAL (optional)
- Entity constraints
-- The identifier of an entity cannot be null
-- Weak entity constraints
-- The concatenated identifier of the weak entity
-- Existence dependency

Operations
- What kinds of operations does the ER model inherently support?
- There have been many research proposals that automatically navigate the ERD to process queries.
- However, since we use the ERD as a high-level design tool and translate the ERD into an RDB, these operations are not important to our discussion.

Characteristics of Next Generation Databases
In this part, we will briefly look at recent trends in database technology. Database systems that will arrive in the next decade are referred to as next-generation database systems.

Characteristics of Next Generation Databases
Rich data model, which means the new data models will have more data modeling components than the ER or relational data model
- Object-oriented
- Multimedia data
- Choices for structures
Highly distributed
- Heterogeneous environment, WWW
- Self-installing, self-managed, highly robust, and automatic coordination

Cont.
Large storage and memory
- Will have more RAM
- Many commercial DB systems will have terabytes, petabytes, or more
Component DBMS: DB applications may be built by buying components, as we buy HW components
- Support portable DBMSs
- Need to have public interfaces
High-level environment
- High-level query languages and supporting tools
Intelligent processing

Technologies for Next Generation Databases
- Traditional Database Technology: Extended RDBMS (Relational Technology, Semantic Data models, 4GL, CASE)
- Object-oriented Technology: OODBMS, ORDBMS, OOA&D, OOP (Rich data model, Natural representation, SW development, Integration, Productivity, Reusability, Component-based)
- Knowledge Based Techniques: Expert DBMS (Inference, AI Technology, Tools for User Interface, Data Mining, Knowledge Acquisition, etc.)
- Hypermedia: Multimedia DBMS (User Interface, Multimedia data, GIS, Imaging DB, VOD (Video on demand))
- Online Information Retrieval: (Text database, Information Retrieval, Intelligent Retrieval)

Cont.
- Internet, Networking & Distributed systems (WWW interface, internet/intranet, Heterogeneous, Resource-sharing, Robust and automatic coordination, Legacy systems, Client/Server)
- Mass Storage (Optical disks, Scanning, Electronic publishing, Digital library, DIS)
- Other trends
-- Standards (SQL3, OMG, ODMG, CORBA, DCOM, ...)
-- High-level environment (HLQL, supporting tools)
-- Component databases (public interface, interoperable, portable)
-- Larger memory
-- Parallelism
-- New applications (Data Warehousing, Electronic Commerce, Health-care systems, EOSDIS, ...)

Normalization
- Basic concepts of normalization
- Functional Dependency (FD)
-- Definitions and semantics
-- FD as integrity constraints
-- Armstrong's axioms
-- Minimal cover
- Lossless join and spurious tuples
- Normal forms (1, 2, 3, 4, 5)
-- Multi-valued dependencies & 4NF
-- Join dependencies & 5NF
- Practical ways to use normalization
- Denormalization techniques

What is normalization? A process for designing highly desirable relational schemas using relational theory.

Why Normalization?
- Normal forms are guidelines for relational database design
-- Minimize redundancy
-- Avoid potential inconsistency
- Can predict the behavior (problems) of database systems
- Avoid the update anomalies discussed below

What if we don't normalize our DB schema? Your DB will have the following update anomalies:
- Insertion problem
- Deletion problem
- Update problem

Hierarchy of normal forms
The normal forms from less strict to more strict: 1NF, 2NF, 3NF, BCNF, 4NF, 5NF.
We can decompose directly into BCNF or 3NF without going through 2NF. Note that BCNF (Boyce-Codd normal form) is a slightly stricter variation of 3NF. In most cases 3NF and BCNF are the same, so we will not discuss BCNF separately in this course.

FD (Functional Dependency)
An FD is a way of representing relationships among attributes in a relation.
Notation: X --> Y, where both X and Y can be a group of attributes (X: LHS, Y: RHS).
We say that:
1. X uniquely determines Y.
2. For a given value of X, there is at most one value of Y associated with X at a time.

Example
Suppose we have R(A, B, C, D, E, F) and data instances as follows:
A   B   C   D   E   F
a1  b1  c1  d1  e1  f1
a1  b2  c1  d1  e2  f2
a2  b2  c2  d1  e2  f2
a3  b3  c1  d2  e3  f1

Which one is valid? Based on the relation above, which of the following FDs are valid?
(a) A --> C     (b) C --> A
(c) B --> E     (d) C --> D
(e) B --> F     (f) BD --> E
(g) CD --> E    (h) F --> B

SELECTIVE ANSWERS:
(a) A --> C is True, since every tuple with a1 has the same value c1. Note that it doesn't matter that both a1 and a3 end up with the same c1. This is similar to the fact that two different employees may have the same age.
(b) C --> A is False, since c1 maps to both a1 and a3. Examples (a) and (b) show that FDs are not symmetric: the fact that A --> C is true doesn't mean C --> A is true.
(c) B --> E is True.
(d) C --> D is False.
(f) BD --> E is True. Since there are two attributes in the LHS, we have to consider the pair of values together as a single LHS value.
(g) CD --> E is False.
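The checks above can be reproduced mechanically. The following is a minimal sketch, not part of the original slides: the helper name fd_holds and the dictionary layout are illustrative assumptions. It only tests whether an FD is consistent with this particular instance, which is weaker than saying the FD holds semantically.

```python
# Sample instance of R(A, B, C, D, E, F) from the slide.
ROWS = [
    dict(A="a1", B="b1", C="c1", D="d1", E="e1", F="f1"),
    dict(A="a1", B="b2", C="c1", D="d1", E="e2", F="f2"),
    dict(A="a2", B="b2", C="c2", D="d1", E="e2", F="f2"),
    dict(A="a3", B="b3", C="c1", D="d2", E="e3", F="f1"),
]


def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs.

    lhs and rhs are strings of single-letter attribute names, e.g. "BD".
    (Hypothetical helper, written for this illustration only.)
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first RHS seen for this LHS value.
        if seen.setdefault(key, value) != value:
            return False
    return True


# The eight candidate FDs (a) through (h) from the exercise.
CANDIDATES = {
    "a": ("A", "C"), "b": ("C", "A"), "c": ("B", "E"), "d": ("C", "D"),
    "e": ("B", "F"), "f": ("BD", "E"), "g": ("CD", "E"), "h": ("F", "B"),
}

RESULTS = {label: fd_holds(ROWS, lhs, rhs)
           for label, (lhs, rhs) in CANDIDATES.items()}

for label, (lhs, rhs) in CANDIDATES.items():
    print(f"({label}) {lhs} --> {rhs}: {RESULTS[label]}")
```

Grouping rows by their LHS value and comparing RHS values is a direct reading of "for a given value of X, there is at most one value of Y".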

FD
The FDs in a given relation are determined by the semantics of the relation, not by its data instances.

Example
TEACH (Teacher, Course, Text)
Teacher  Course     Text
Smith    DS         Bartram
Smith    DBMS       Al-nour
Hall     Compilers  Hoffman
Brown    DS         Augen

Example
- TEACH appears to satisfy TEXT --> COURSE, since each text ends up with a single course.
- However, it doesn't make semantic sense to determine the course from the textbook, since two different courses could use the same book. So TEXT --> COURSE is False.
- Instances can, however, be used to disprove an FD: TEACHER -\-> COURSE, since one teacher (Smith) teaches two different courses.
- The correct FD of this relation is TEACHER+COURSE --> TEXT.
- What else can be disproved from the above data instances?
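One way to explore the last question is to test every single-attribute FD against the TEACH instance and keep the ones the data contradicts. This is only a sketch. The helper name violates is an assumption, and an FD that survives this test is merely "not disproved", never proven.

```python
from itertools import permutations

# The TEACH instance from the slide.
TEACH = [
    {"Teacher": "Smith", "Course": "DS", "Text": "Bartram"},
    {"Teacher": "Smith", "Course": "DBMS", "Text": "Al-nour"},
    {"Teacher": "Hall", "Course": "Compilers", "Text": "Hoffman"},
    {"Teacher": "Brown", "Course": "DS", "Text": "Augen"},
]


def violates(rows, lhs, rhs):
    """Return True if two rows share the lhs value but differ on rhs."""
    seen = {}
    for row in rows:
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return True
    return False


# Every ordered pair of distinct attributes is a candidate FD lhs --> rhs.
DISPROVED = sorted(
    (lhs, rhs)
    for lhs, rhs in permutations(TEACH[0].keys(), 2)
    if violates(TEACH, lhs, rhs)
)

for lhs, rhs in DISPROVED:
    print(f"{lhs} -\\-> {rhs}")
```

TEXT --> COURSE is not in the output, which matches the point above: the data cannot disprove it, and only the semantics of the relation rule it out.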

FD as an integrity constraint
Example: WORK (EMP#, DEPT, LOC)
The FDs of WORK are:
EMP# --> DEPT
DEPT --> LOC

Example
Suppose the WORK table has the following three instances:
EMP#  DEPT  LOC
E1    D1    Market
E2    D2    Walnut
E3    D1    Market

Example
Which of the following are valid or invalid, and why? (Hint: check whether your insertion or update would violate any existing FD!)
INSERT / UPDATE
(1) E1  D1  Walnut
(2) E1  D2  Walnut
(3) E5  D3  Market

SOLUTION FOR INSERTION
(1) The first insertion is invalid, since it would violate the FD DEPT --> LOC.
(2) The second insertion is also invalid, since it would violate the FD EMP# --> DEPT.
(3) The third insertion is allowed, since it does not violate any FD.

SOLUTION FOR UPDATE
(1) This means changing Market to Walnut in the first tuple. This update would violate the FD DEPT --> LOC.
(2) This means changing E2 in the 2nd tuple to E1. This update would violate the FD EMP# --> DEPT.
(3) The third update is effectively an INSERTION and is valid.
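The insertion and update solutions can be checked with a small script. This is a sketch under stated assumptions: the function names violated_fds and violated_by_update are invented for illustration, and an update is modeled as removing the old tuple and then inserting the new one.

```python
COLUMNS = ("EMP#", "DEPT", "LOC")
FDS = [("EMP#", "DEPT"), ("DEPT", "LOC")]  # single-attribute FDs of WORK

WORK = [
    ("E1", "D1", "Market"),
    ("E2", "D2", "Walnut"),
    ("E3", "D1", "Market"),
]


def violated_fds(rows, new_row):
    """List the FDs (as 'X --> Y' strings) that adding new_row would break."""
    new = dict(zip(COLUMNS, new_row))
    broken = []
    for lhs, rhs in FDS:
        for row in rows:
            old = dict(zip(COLUMNS, row))
            # Same LHS value but a different RHS value breaks the FD.
            if old[lhs] == new[lhs] and old[rhs] != new[rhs]:
                broken.append(f"{lhs} --> {rhs}")
                break
    return broken


def violated_by_update(rows, index, new_row):
    """Check an update of rows[index] as delete-then-insert."""
    others = rows[:index] + rows[index + 1:]
    return violated_fds(others, new_row)


print(violated_fds(WORK, ("E1", "D1", "Walnut")))           # insertion (1)
print(violated_fds(WORK, ("E1", "D2", "Walnut")))           # insertion (2)
print(violated_fds(WORK, ("E5", "D3", "Market")))           # insertion (3)
print(violated_by_update(WORK, 0, ("E1", "D1", "Walnut")))  # update (1)
print(violated_by_update(WORK, 1, ("E1", "D2", "Walnut")))  # update (2)
```

A real DBMS would enforce EMP# --> DEPT with a key constraint, while DEPT --> LOC is usually enforced by normalizing the table, as discussed later.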

Example
BORROW (Loan#, Bname, Cname, Amount) with FDs:
Loan# --> Amount
Loan# --> Bname
What is the semantic difference between the following two FDs?
(1) Loan# --> Cname
(2) Loan# -\-> Cname

Example
(1) means there is only one customer for each loan, so a loan cannot be taken out by a husband and wife together, for example.
(2) means that for each loan there may be more than one customer.

How to find FDs?
- List only the most direct FDs, not indirect FDs (e.g., SSN --> DLOC is an indirect FD)
- List only non-trivial FDs (e.g., SSN --> SSN is a trivial FD)
- Do not include redundant attributes in either the LHS or the RHS of an FD (e.g., SSN, ENAME --> ENAME, BDATE, ADDRESS has a redundant attribute, ENAME, in the LHS)

Example from Book:
EMP_DEPT (ENAME, SSN, BDATE, ADDRESS, DNUMBER, DNAME, DMGRSSN, DLOC)
The valid FDs in this relation are:
(1) SSN --> ENAME, BDATE, ADDRESS, DNUMBER
(2) DNUMBER --> DNAME, DMGRSSN, DLOC

Transitive dependency (TD) If A --> B and B --> C, then A --> C is called a TD.

Find a TD in the above EMP_DEPT
One TD is SSN --> DNAME, since SSN --> DNUMBER and DNUMBER --> DNAME. Two other TDs are SSN --> DMGRSSN and SSN --> DLOC.
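Transitive dependencies like these fall out of the attribute closure: start from SSN and keep applying FDs until nothing new is added. The sketch below uses the standard closure procedure. The function name closure and the tuple-based FD encoding are illustrative choices, not something defined in the slides.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs.

    fds is a list of (lhs, rhs) pairs, each a tuple of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire the FD once its whole LHS is known and it adds something.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


EMP_DEPT_FDS = [
    (("SSN",), ("ENAME", "BDATE", "ADDRESS", "DNUMBER")),
    (("DNUMBER",), ("DNAME", "DMGRSSN", "DLOC")),
]

SSN_CLOSURE = closure({"SSN"}, EMP_DEPT_FDS)
DIRECT = {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}

# Attributes reached from SSN only via DNUMBER are the transitive ones.
TRANSITIVE = SSN_CLOSURE - DIRECT - {"SSN"}
print(sorted(TRANSITIVE))
```

The three attributes left in TRANSITIVE correspond to the three TDs listed above.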

Candidate key (CK) and FDs
A CK determines all other attributes of R(A, B, C, D, E, F).
Suppose R has two CKs, A and BD. Then:
A --> B, C, D, E, F
BD --> A, C, E, F

Algorithm for Finding a Key
Once we find a minimal cover, we can find a key using the following algorithm.
(1) Find the attributes that do not appear in the RHS of any FD. These are part of every candidate key.
(2) Check whether they can determine all other attributes by using the FDs.
(3) If not, which other attributes need to be added to determine all other attributes?

Examples
STORE (SNAME, ADDR, ZIP, ITEM, PRICE)
FDs:
SNAME --> ADDR
ADDR --> ZIP
SNAME, ITEM --> PRICE

Finding a key:
(1) SNAME does not appear in any RHS, so SNAME must be part of the key. (The same is true of ITEM.)
(2) Since SNAME --> ADDR --> ZIP, we know SNAME --> ADDR, ZIP.
(3) But SNAME alone cannot determine anything more. How can we determine ITEM and PRICE? If we add ITEM, then we can determine PRICE.
So SNAME, ITEM --> SNAME, ADDR, ZIP, ITEM, PRICE, which satisfies the definition of a key.
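The three steps can be sketched in code as follows. This is one possible reading of the algorithm, not a definitive implementation: step (3) is done by brute force over the remaining attributes, and the names closure and candidate_keys are assumptions made for this example.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Return all minimal attribute sets whose closure is the whole schema."""
    schema = set(schema)
    rhs_attrs = {a for _, rhs in fds for a in rhs}
    core = schema - rhs_attrs            # step (1): never on any RHS
    optional = sorted(schema - core)
    keys = []
    # Steps (2) and (3): try the core alone, then add as few attributes
    # as possible, smallest extensions first.
    for size in range(len(optional) + 1):
        for extra in combinations(optional, size):
            candidate = core | set(extra)
            if any(key <= candidate for key in keys):
                continue                 # contains a smaller key: not minimal
            if closure(candidate, fds) == schema:
                keys.append(candidate)
    return keys


STORE = ("SNAME", "ADDR", "ZIP", "ITEM", "PRICE")
STORE_FDS = [
    (("SNAME",), ("ADDR",)),
    (("ADDR",), ("ZIP",)),
    (("SNAME", "ITEM"), ("PRICE",)),
]

STORE_KEYS = candidate_keys(STORE, STORE_FDS)
print([sorted(key) for key in STORE_KEYS])
```

The brute-force extension is exponential in the number of optional attributes, which is acceptable for classroom-sized schemas but not for large ones.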

Examples of Finding a Key for relation R (A, B, C, D)
FDs and Key:
(a) A --> C, B --> D, C --> D
(b) A --> B, B --> C, A --> D, D --> A
(c) A --> D, D --> A, C --> B

ANSWER
(a) {AB}
(b) {A, D}, that is, two candidate keys: A and D. Note that A and D are in a 1:1 relationship, since A --> D and D --> A.
(c) {CA, CD}, that is, two candidate keys: CA and CD. Note that A --> D and D --> A.

Lossless Decomposition and Spurious Tuples
- Decomposition means dividing a table into multiple tables.
- A decomposition is lossless if it is possible to reconstruct R from the decomposed relations using JOINs.
Condition for a lossless join when R is decomposed into R1, R2, ..., Rn:
R = R1 ⋈ R2 ⋈ R3 ⋈ ... ⋈ Rn, where ⋈ means the JOIN operation.

Cont.
- Why do we need it? To keep the database accurate.
- What if we don't have it? Queries return wrong answers.
- How do we check? When the decomposition comes from the normalization algorithms for 3NF/BCNF, it is sufficient that some Ri contains a candidate key of R.

Cont.
This means that if any of the decomposed relations contains a CK (or PK) of the original relation, the decomposition is lossless (assuming it was produced by the FD-based normalization algorithms). In that case, by joining all the decomposed relations, we can reconstruct the original relation.

Example
LOAN_ACC (L#, AMT, ACC#, BAL)
FDs: L# --> AMT, ACC# --> BAL
Key? L# + ACC#
Possible decomposition: R1(L#, AMT), R2(ACC#, BAL)
The decomposition is not lossless, since neither R1 nor R2 contains a candidate key. (Note that we can no longer correlate L# and ACC#.)

Example
WORK (EMP#, DEPT, LOC)
FDs: EMP# --> DEPT, DEPT --> LOC
Key? EMP#, since EMP# --> DEPT, LOC
Decomposition: R1(EMP#, DEPT), R2(DEPT, LOC)
The decomposition is lossless, since R1 contains a candidate key.
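Both examples can also be checked with a different, standard test for two-way decompositions: the split is lossless when the attributes common to R1 and R2 functionally determine all of R1 or all of R2. This is not the candidate-key rule of thumb used on the slides, and the function names below are illustrative assumptions.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Binary lossless-join test: (R1 & R2) --> R1 or (R1 & R2) --> R2."""
    common = set(r1) & set(r2)
    determined = closure(common, fds)
    return set(r1) <= determined or set(r2) <= determined


LOAN_ACC_FDS = [(("L#",), ("AMT",)), (("ACC#",), ("BAL",))]
WORK_FDS = [(("EMP#",), ("DEPT",)), (("DEPT",), ("LOC",))]

LOAN_ACC_OK = is_lossless_binary(("L#", "AMT"), ("ACC#", "BAL"), LOAN_ACC_FDS)
WORK_OK = is_lossless_binary(("EMP#", "DEPT"), ("DEPT", "LOC"), WORK_FDS)

print("LOAN_ACC split lossless:", LOAN_ACC_OK)
print("WORK split lossless:", WORK_OK)
```

In the LOAN_ACC split the two relations share no attribute at all, so nothing links a loan to an account. In the WORK split the shared attribute DEPT determines all of R2.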

Spurious Tuples
Spurious tuples are tuples that appear when the relations of a lossy decomposition are joined back together, but that do not exist in the original relation R.

Example
R(A, B, C):
A   B   C
a1  b1  c1
a2  b2  c2
a3  b1  c1
a3  b2  c2

Cont.
Lossy decomposition:
R1(A, B)    R2(A, C)
A   B       A   C
a1  b1      a1  c1
a2  b2      a2  c2
a3  b1      a3  c1
a3  b2      a3  c2
Lossless decomposition:
R3(A, B)    R4(B, C)
A   B       B   C
a1  b1      b1  c1
a2  b2      b2  c2
a3  b1
a3  b2

Perform the join R1(A,B) ⋈ R2(A,C):
A   B   C
a1  b1  c1
a2  b2  c2
a3  b1  c1
a3  b1  c2  *
a3  b2  c1  *
a3  b2  c2

Cont.
- The two tuples marked with * are spurious tuples that do not exist in the original relation R.
- Perform the join R3(A,B) ⋈ R4(B,C): the result should be the same as the original R.
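Both joins can be replayed with plain Python sets. This is an illustrative sketch: the variable names are assumptions, and the natural joins are written out by hand as set comprehensions rather than taken from any library.

```python
# Original relation R(A, B, C) from the example.
R = {
    ("a1", "b1", "c1"),
    ("a2", "b2", "c2"),
    ("a3", "b1", "c1"),
    ("a3", "b2", "c2"),
}

# Projections that make up the two decompositions.
R1 = {(a, b) for a, b, _ in R}   # R1(A, B)
R2 = {(a, c) for a, _, c in R}   # R2(A, C)
R3 = {(a, b) for a, b, _ in R}   # R3(A, B), same projection as R1
R4 = {(b, c) for _, b, c in R}   # R4(B, C)

# Natural join of R1 and R2 on the shared attribute A.
JOIN_12 = {(a, b, c) for a, b in R1 for a2, c in R2 if a == a2}

# Natural join of R3 and R4 on the shared attribute B.
JOIN_34 = {(a, b, c) for a, b in R3 for b2, c in R4 if b == b2}

# Tuples produced by the join that were never in R.
SPURIOUS = sorted(JOIN_12 - R)

print("spurious tuples from R1 join R2:", SPURIOUS)
print("R3 join R4 reproduces R:", JOIN_34 == R)
```

Joining on A lets a3 pair every B value with every C value, which is where the extra tuples come from.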

Questions: Why does R1 JOIN R2 cause a lossy decomposition and result in spurious tuples?
Because the decomposition of R into R1 and R2 didn't follow the FDs. The FD in R is B --> C. (A --> B cannot hold in this instance, since a3 appears with both b1 and b2.) A decomposition that follows the FD is lossless, as shown by R3(A,B) and R4(B,C).
This means:
- When we normalize, we decompose based on FDs, not randomly.
- After an FD-based decomposition, one of the decomposed relations Ri must contain a CK for the decomposition to be lossless.