7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

Slides:



Advertisements
Similar presentations
Introduction to Database System Wei-Pang Yang, IM.NDHU, Final Test-1 Example: Banking Database 1. branch 2. customer 客戶 ( 存款戶, 貸款戶 ) 5. account.
Advertisements

Installment 7 Tables With No Column Presented by rexmen 2001 資管所.林彥廷.
1/22/20091 Study the methods of first, second, third, Boyce-Codd, fourth and fifth normal form for relational database design, in order to eliminate data.
Wei-Pang Yang, Information Management, NDHU More on Normalization Unit 18 More on Normalization ( 表格正規化探討 ) 18-1.
Chapter 3 Notes. 3.1 Functional Dependencies A functional dependency is a statement that – two tuples of a relation that agree on some particular set.
Normalisation The theory of Relational Database Design.
第七章 抽樣與抽樣分配 蒐集統計資料最常見的方式是抽查。這 牽涉到兩個問題: 抽出的樣本是否具有代表性?是否能反應出母體的特徵?
: A-Sequence 星級 : ★★☆☆☆ 題組: Online-judge.uva.es PROBLEM SET Volume CIX 題號: Problem D : A-Sequence 解題者:薛祖淵 解題日期: 2006 年 2 月 21 日 題意:一開始先輸入一個.
3Com Switch 4500 切VLAN教學.
Reference, primitive, call by XXX 必也正名乎 誌謝 : 部份文字取於前輩 TAHO 的文章.
1 Web of Science 利用指引 單元二 瀏覽與處理查詢結果. 2 瀏覽檢索結果 查出的結果,預設以時間排列, 使用者可改變結果的排列方式: 還可以依被引用次數、相關度、 第一作者、刊名、出版年等排序 回到前先查的結果畫面 點選想看資料的完整書目 本館訂購範圍的期刊 全文,便可直接連結.
: OPENING DOORS ? 題組: Problem Set Archive with Online Judge 題號: 10606: OPENING DOORS 解題者:侯沛彣 解題日期: 2006 年 6 月 11 日 題意: - 某間學校有 N 個學生,每個學生都有自己的衣物櫃.
貨幣創造與控制 CHAPTER 27 學習本章後,您將能: C H A P T E R C H E C K L I S T 解釋銀行如何藉由放款而創造貨幣 1 解釋中央銀行如何影響貨幣數量 2.
Lecture Note of 9/29 jinnjy. Outline Remark of “Central Concepts of Automata Theory” (Page 1 of handout) The properties of DFA, NFA,  -NFA.
McGraw-Hill/Irwin © 2003 The McGraw-Hill Companies, Inc.,All Rights Reserved. 肆 資料分析與表達.
Monte Carlo Simulation Part.2 Metropolis Algorithm Dept. Phys. Tunghai Univ. Numerical Methods C. T. Shih.
2009fallStat_samplec.i.1 Chap10 Sampling distribution (review) 樣本必須是隨機樣本 (random sample) ,才能代表母體 Sample mean 是一隨機變數,隨著每一次抽出來的 樣本值不同,它的值也不同,但會有規律性 為了要知道估計的精確性,必需要知道樣本平均數.
Introduction to Java Programming Lecture 17 Abstract Classes & Interfaces.
: The largest Clique ★★★★☆ 題組: Contest Archive with Online Judge 題號: 11324: The largest Clique 解題者:李重儀 解題日期: 2008 年 11 月 24 日 題意: 簡單來說,給你一個 directed.
Chapter 20 塑模動態觀點:狀態圖 Statechart Diagram. 學習目標  說明狀態圖的目的  定義狀態圖的基本記號  展示狀態圖的建構  定義活動、內部事件及遞延事件的狀態 圖記號.
: Happy Number ★ ? 題組: Problem Set Archive with Online Judge 題號: 10591: Happy Number 解題者:陳瀅文 解題日期: 2006 年 6 月 6 日 題意:判斷一個正整數 N 是否為 Happy Number.
Part 6 Chapter 15 Normalization of Relational Database Csci455 r 1.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
: Count DePrimes ★★★★☆ 題組: Contest Archive with Online Judge 題號: 11408: Count DePrimes 解題者:李育賢 解題日期: 2008 年 9 月 2 日 題意: 題目會給你二個數字 a,b( 2 ≦ a ≦ 5,000,000,a.
: Multisets and Sequences ★★★★☆ 題組: Problem Set Archive with Online Judge 題號: 11023: Multisets and Sequences 解題者:葉貫中 解題日期: 2007 年 4 月 24 日 題意:在這個題目中,我們要定義.
公司加入市場的決定. 定義  平均成本 = 總成本 ÷ 生產數量 = 每一單位產量所耗的成本  平均固定成本 = 總固定成本 ÷ 生產數量  平均變動成本 = 總變動成本 ÷ 生產數量.
:Nuts for nuts..Nuts for nuts.. ★★★★☆ 題組: Problem Set Archive with Online Judge 題號: 10944:Nuts for nuts.. 解題者:楊家豪 解題日期: 2006 年 2 月 題意: 給定兩個正整數 x,y.
資料結構實習-一 參數傳遞.
公用品.  該物品的數量不會因一人的消費而受到 影響,它可以同時地被多人享用。 角色分配  兩位同學當我的助手,負責:  其餘各人是投資者,每人擁有 $100 , 可以投資在兩種資產上。  記錄  計算  協助同學討論.
: Beautiful Numbers ★★★★☆ 題組: Problem Set Archive with Online Judge 題號: 11472: Beautiful Numbers 解題者:邱經達 解題日期: 2011 年 5 月 5 日 題意: 若一個 N 進位的數用到該.
Section 4.2 Probability Models 機率模式. 由實驗看機率 實驗前先列出所有可能的實驗結果。 – 擲銅板:正面或反面。 – 擲骰子: 1~6 點。 – 擲骰子兩顆: (1,1),(1,2),(1,3),… 等 36 種。 決定每一個可能的實驗結果發生機率。 – 實驗後所有的實驗結果整理得到。
函式 Function Part.2 東海大學物理系‧資訊教育 施奇廷. 遞迴( Recursion ) 函式可以「呼叫自己」,這種動作稱為 「遞迴」 此程式的執行結果相當於陷入無窮迴圈, 無法停止(只能按 Ctrl-C ) 這給我們一個暗示:函式的遞迴呼叫可以 達到部分迴圈的效果.
JAVA 程式設計與資料結構 第二十章 Searching. Sequential Searching Sequential Searching 是最簡單的一種搜尋法,此演 算法可應用在 Array 或是 Linked List 此等資料結構。 Sequential Searching 的 worst-case.
演算法 8-1 最大數及最小數找法 8-2 排序 8-3 二元搜尋法.
: Expect the Expected ★★★★☆ 題組: Contest Archive with Online Judge 題號: 11427: Expect the Expected 解題者:李重儀 解題日期: 2008 年 9 月 21 日 題意:玩一種遊戲 (a game.
逆向選擇和市場失調. 定義  資料不對稱 在交易其中,其中一方較對方有多些資料。  逆向選擇 出現在這個情況下,就是當買賣雙方隨意在 市場上交易,與比較主動交易者作交易為佳 。
845: Gas Station Numbers ★★★ 題組: Problem Set Archive with Online Judge 題號: 845: Gas Station Numbers. 解題者:張維珊 解題日期: 2006 年 2 月 題意: 將輸入的數字,經過重新排列組合或旋轉數字,得到比原先的數字大,
Chapter 10 m-way 搜尋樹與B-Tree
演算法課程 (Algorithms) 國立聯合大學 資訊管理學系 陳士杰老師 Course 7 貪婪法則 Greedy Approach.
: Finding Paths in Grid ★★★★☆ 題組: Contest Archive with Online Judge 題號: 11486: Finding Paths in Grid 解題者:李重儀 解題日期: 2008 年 10 月 14 日 題意:給一個 7 個 column.
著作權所有 © 旗標出版股份有限公司 第 14 章 製作信封、標籤. 本章提要 製作單一信封 製作單一郵寄標籤.
幼兒行為觀察與記錄 第八章 事件取樣法.
: How many 0's? ★★★☆☆ 題組: Problem Set Archive with Online Judge 題號: 11038: How many 0’s? 解題者:楊鵬宇 解題日期: 2007 年 5 月 15 日 題意:寫下題目給的 m 與 n(m
McGraw-Hill/Irwin © 2003 The McGraw-Hill Companies, Inc.,All Rights Reserved. 肆 資料分析與表達.
Week 6 Lecture Normalization
Entity-Relationship (ER) Data Model 概念資料模式 (Based on Chapter 3 in Fundamentals of Database Systems by Elmasri and Navathe, Ed. 4)
6.8 Case Study: E-R for Supplier-and-Parts Database
Logical Database Design Unit 7 Logical Database Design.
Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 Chapter 3 Database Normalization 1.
Logical Database Design ( 補 ) Unit 7 Logical Database Design ( 補 )
Wei-Pang Yang, Information Management, NDHU Normalization Unit 7 Normalization ( 表格正規化 ) 7-1.
Chapter 13 Further Normalization II: Higher Normal Forms.
NormalizationNormalization Chapter 4. Purpose of Normalization Normalization  A technique for producing a set of relations with desirable properties,
Further Normalization II: Higher Normal Forms Prof. Yin-Fu Huang CSIE, NYUST Chapter 13.
CS143 Review: Normalization Theory Q: Is it a good table design? We can start with an ER diagram or with a large relation that contain a sample of the.
Normalization Ioan Despi 2 The basic objective of logical modeling: to develop a “good” description of the data, its relationships and its constraints.
Further Normalization I
1 5 Normalization. 2 5 Database Design Give some body of data to be represented in a database, how do we decide on a suitable logical structure for that.
Introduction to Database System Wei-Pang Yang, IM.NDHU, Final Test-1 Example: Banking Database 1. branch 2. customer 客戶 ( 存款戶, 貸款戶 ) 5. account.
11/07/2003Akbar Mokhtarani (LBNL)1 Normalization of Relational Tables Akbar Mokhtarani LBNL (HENPC group) November 7, 2003.
Lecture No 14 Functional Dependencies & Normalization ( III ) Mar 04 th 2011 Database Systems.
1 Functional Dependencies and Normalization Chapter 15.
Chapter 2: Introduction to Relational Model
Normalization.
Objectives of Normalization  To create a formal framework for analyzing relation schemas based on their keys and on the functional dependencies among.
Advanced Database System
Chapter 14 Functional Dependencies and Normalization Informal Design Guidelines for Relational Databases –Semantics of the Relation Attributes –Redundant.
Dr. T. Y. Lin | SJSU | CS 157A | Fall 2015 Chapter 3 Database Normalization 1.
Unit 7 Normalization (表格正規化).
Example: Banking Database
Presentation transcript:

7-1 Unit 4 Relational Database Design 英文版: Chap 7 “Relational Database Design” 中文版:第 6 章 “ 關聯式資料庫設計 ”

4-2 Contents  4.1 Introduction  4.2 Functional Dependency  4.3 First, Second, and Third Normal Forms (1NF, 2NF, 3NF)  4.4 Boyce/Codd Normal Form (BCNF)  4.5 Fourth Normal Form (4NF)  4.6 Fifth Normal Form (5NF)  4.7 The Entity-Relationship ModelThe Entity-Relationship Model

Introduction  Logical Database Design vs. Physical Database Design  Problem of Normalization  Normal Forms

4-4  請查出本公司所採購,價格最貴的零件之相關資料, 任取前二名。  (SP, PN, city, price, qty, weight, color) select S.SN, P.PN, city, price, qty, weight, color from S, P, SP where S.SN=SP.SN and P.PN=SP.PN order by price desc limit 20;

4-5 Logical Database Design  Logical database design vs. Physical database design  Logical Database Design Normalization Semantic Modeling, eg. E-R model  Problem of Normalization Given some body of data to be represented in a database, how to decide the suitable logical structure they should have? what relations should exist? what attributes should they have? 正確性等特質 v.s. 運作效率 分割表格,考量內外要素

4-6 Problem of Normalization S1, Smith, 20, London, P1, Nut, Red, 12, 300 S1, Smith, 20, London, P2, Bolt, Green, 17, 200. S4, Clark, 20, London, P5, Cam, Blue, 12, 400 Normalization S S# SNAME STATUS CITY s1.. London.. P SP P# S# P# QTY... S' S# SNAME STATUS S1 Smith. S P SP' P# S# CITY P# QTY S1 London P1 300 S1 London P ( 異常 ) Redundancy Update Anomalies! or E.g. 2: (‘A-Rod’,’R’,'R'…) 資料隨著球員表, 不必記在每場比賽 Dimension 最大的 集合,保險起見, 資料可能不會全部 用到,但總有用到 之時。

4-7 Normal Forms  A relation is said to be in a particular normal form if it satisfies a certain set of constraints. 1NF: A relation is in First Normal Form (1NF) iff it contains only atomic values. universe of relations (normalized and un-normalized) 1NF relations (normalized relations) 2NF relations 3NF relations 4NF relations BCNF relations 5NF relations Fig. 4.1: Normal Forms

Functional Dependency  Functional Dependency (FD)  Fully Functional Dependency (FFD)

4-9 Functional Dependency  Functional Dependency Def: Given a relation R, R.Y is functionally dependent on R.X iff each X-value has associated with it precisely one Y-value (at any time). Note: X, Y may be the composite attributes.  Notation: R.X R.Y read as "R.X functionally determines R.Y" R X Y E.g. : (ID,Name,…) 身份證字號決定姓名也就決定; 但也許 (ID, time) 才是這個表格的 Primary Key i.e. 同樣的 ID 會重複 出現,而計入 (ID, time) 就不會 重複出現 回顧: Primary Key, Candidate Key 有 F.D. 性質 但就今天的課程而言,這是結果論。 (Key  FD)

4-10 Functional Dependency (cont.) S S.S# S.SNAME S.S# S.STATUS S.S# S.CITY S.STATUS S.CITY FD Diagram: S# SNAME CITY STATUS S# SNAME STATUS CITY S1 Smith 20 London S2 Jones 10 Paris S3 Blake 30 Paris S4 Clark 20 London S5 Adams 30 Athens S Note: Assume STATUS is some factor of Supplier and no any relationship with CITY.

4-11 Functional Dependency (cont.) P SP P# PNAME CITY_P COLOR WEIGHT S# P# QTY  If X is a candidate key of R, then all attributes Y of R are functionally dependent on X. (i.e. X Y) P# PNAME COLOR WEIGHT CITY_P P1 Nut Red 12 London P2 Bolt Green 17 Paris P3 Screw Blue 17 Rome P4 Screw Red 14 London P5 Cam Blue 12 Paris P6 Cog Red 19 London P S# P# QTY S1 P1 300 S1 P2 200 S1 P3 400 S1 P4 200 S1 P5 100 S1 P6 100 S2 P1 300 S2 P2 400 S3 P2 200 S4 P2 200 S4 P4 300 S4 P5 400 SP

4-12 Fully Functional Dependency (FFD)  Def: Y is fully functionally dependent on X iff (1) Y is FD on X (2) Y is not FD on any proper subset of X. SP' (S#, CITY, P#, QTY) S# P# QTY FFD S# CITY P# QTY S1 London P1 300 S1 London P2 200 … …. …... S# P# CITY FD not FFD S# CITY FD S# CITY … ….. S#city S1 London SP' Naturally FFD, Since S# is not a Composite attribute.

4-13 Fully Functional Dependency (cont.) 1. Normally, we take FD to mean FFD. 2. FD is a semantic notion. S# CITY Means: each supplier is located in precisely one city. 3. FD is a special kind of integrity constraint. CREATE INTEGRITY RULE SCFD CHECK FORALL SX FORALL SY (IF SX.S# = SY.S# THEN SX.CITY = SY.CITY); 4. FDs considered here applied within a single relation. SP.S# S.S# is not considered! FD (FFD) 只適用於討論單一表格 內部屬性 (attribute) 間的性質 但是 2NF 之後,這樣的指定 變成多餘 ( 復習下頁, SX=S’, SY=SP’)

4-14 Problem of Normalization S1, Smith, 20, London, P1, Nut, Red, 12, 300 S1, Smith, 20, London, P2, Bolt, Green, 17, 200. S4, Clark, 20, London, P5, Cam, Blue, 12, 400 Normalization S S# SNAME STATUS CITY s1.. London.. P SP P# S# P# QTY... S' S# SNAME STATUS S1 Smith. S P SP' P# S# CITY P# QTY S1 London P1 300 S1 London P ( 異常 ) Redundancy Update Anomalies! or E.g. 2: (‘A-Rod’,’R’,'R'…) 資料隨著球員表, 不必記在每場比賽 Dimension 最大的 集合,保險起見, 資料可能不會全部 用到,但總有用到 之時。 Recall:

First, Second, and Third Normal Forms (1NF, 2NF, 3NF)

4-16 Normal Forms: 1NF  Def: A relation is in 1NF iff all underlying simple domains contain atomic values only. S# STATUS CITY (P#, QTY) S1 20 London {(P1, 300), (P2, 200),..., (P6, 100)} S2 10 Paris {(P1, 300), (P2, 400)} S3 10 Paris {(P2, 200)} S4 20 London {(P2, 200), (P4, 300), (P5, 400)} S # STATUS CITY P# QTY S1 20 London P1 300 S1 20 London P2 200 S1 20 London P3 400 S1 20 London P4 200 S1 20 London P5 100 S1 20 London P6 100 S2 10 Paris P1 300 S2 10 Paris P2 400 S3 10 Paris P2 200 S4 20 London P2 200 S4 20 London P4 300 S4 20 London P5 400 Key:(S#,P#), Normalized 1NF FIRST Suppose 1. CITY is the main office of the supplier. 2. STATUS is some factor of CITY fact

4-17 1NF Problem: Update Anomalies! Update If suppler S1 moves from London to Paris, then 6 tuples must be updated! Insertion Cannot insert a supplier information if it doesn't supply any part, because that will cause a null key value. Deletion Delete the information that "S3 supplies P2", then the fact "S3 is located in Paris" is also deleted. S# STATUS CITY P# QTY..... S3 20 Paris P S5 30 Athens NULL NULL FIRST S # STATUS CITY P# QTY S1 20 London P1 300 S1 20 London P2 200 S1 20 London P3 400 S1 20 London P4 200 S1 20 London P5 100 S1 20 London P6 100 S2 10 Paris P1 300 S2 10 Paris P2 400 S3 10 Paris P2 200 S4 20 London P2 200 S4 20 London P4 300 S4 20 London P5 400 Key:(S#,P#), Normalized 1NF FIRST 1NF 還必須改進之處 Insert(S1,20,”Seattle”,P7,60) : inconsistency 不一起改會有 inconsistency 的問題,如人 - 住址

4-18 1NF Problem: Update Anomalies! (cont.) Suppose 1. CITY is the main office of the supplier. 2. STATUS is some factor of CITY Primary key of FIRST: (S#, P#) FD diagram of FIRST: S# P# CITY STATUS QTY FF D x FFD FFD FD: 1. S# STATUS 2. S# CITY 3. CITY STATUS 4. (S#, P#) QTY FFD primary key (S#, P#) STATUS primary key (S#, P#) CITY S # STATUS CITY P# QTY S1 20 London P1 300 S1 20 London P2 200 S1 20 London P3 400 S1 20 London P4 200 S1 20 London P5 100 S1 20 London P6 100 S2 10 Paris P1 300 S2 10 Paris P2 400 S3 10 Paris P2 200 S4 20 London P2 200 S4 20 London P4 300 S4 20 London P5 400 Key:(S#,P#), Normalized 1NF FIRST 此處 “CITY  STATUS” 必須到 講解 2NF/3NF 才會用到,但因第一次出現 FD diagram 故全畫出來

4-19 Normal Form: 2NF  Def: A relation R is in 2NF iff (1) R is in 1NF (i.e. atomic ) (2) Non-key attributes are FFD on primary key. (e.g. QTY, STATUS, CITY in FIRST) FIRST is in 1NF, but not in 2NF (S#, P#) STATUS, and (S#, P#) CITY FFD Decompose FIRST into: SECOND (S#, STATUS, CITY): primary key: S# S# CITY STATUS FD: 1. S# STATUS 2. S# CITY 3. CITY STATUS S# P# CITY STATUS QTY FFD x FFD FFD SP (S#, P#, QTY): P rimary key: (S#, p#) S# P# QTY FD: 4. (S#, P#) QTY 其他屬性必須 FFD 於 Primary Key ,不可有 只是 FD 而非 FFD, 以免暗藏關聯 (S#  CITY) Primary Key 只存在一次, 必與 其他屬性 FD 其他課本可能只寫 FD 但他們定義的 FD 就是 FFD

4-20 Normal Form: 2NF (cont.) S# STATUS CITY S1 20 London S2 10 Paris S3 10 Paris S4 20 London S5 30 Athens SECOND (in 2NF) S# P# QTY) S1 P1 300 S1 P2 200 S1 P3 400 S1 P4 200 S1 P5 100 S2 P1 300 S2 P2 400 S3 P2 200 S4 P4 300 S4 P5 400 SP (in 2NF) S # STATUS CITY P# QTY S1 20 London P1 300 S1 20 London P2 200 S1 20 London P3 400 S1 20 London P4 200 S1 20 London P5 100 S1 20 London P6 100 S2 10 Paris P1 300 S2 10 Paris P2 400 S3 10 Paris P2 200 S4 20 London P2 200 S4 20 London P4 300 S4 20 London P5 400 FIRST Update: S1 moves from London to Paris Insertion: (S5 30 Athens) Deletion Delete "S3 supplies P2 200", then the fact "S3 is located in Paris" is NOT deleted. 2NF 的效果

4-21 Normal Form: 2NF (cont.)  A relation in 1NF can always be reduced to an equivalent collection of 2NF relations.  The reduction process from 1NF to 2NF is non-loss decomposition. FIRST(S#, STATUS, SCITY, P#, QTY) projections natural joins SECOND(S#, STATUS, CITY), SP(S#,P#,QTY)  The collection of 2NF relations may contain “more” information than the equivalent 1NF relation. (S5, 30, Athens) 2nd SP 1st

4-22 Problem: Update Anomalies in SECOND! Update Anomalies in SECOND UPDATE: if the status of London is changed from 20 to 60, then two tuples must be updated DELETE: delete supplier S5, then the fact "the status of Athens is 30" is also deleted! INSERT: cannot insert the fact "the status of Rome is 50"! Why: S.S# S.STATUS S.S# S. CITY S.CITY S.STATUS cause a transitive dependency S# STATUS CITY S1 20 London S2 10 Paris S3 10 Paris S4 20 London S5 30 Athens SECOND (in 2NF) S# CITY STATUS FD: 1. S# STATUS 2. S# CITY 3. CITY STATUS 2NF 還是有必須改進之處

4-23 Normal Forms: 3NF  Def : A relation R is in 3NF iff (1) R is in 2NF (2) Every non-key attribute is non-transitively dependent on the primary key. e.g. STATUS is transitively on S# (i.e., non-key attributes are mutually independent) SP is in 3NF, but SECOND is not! S# P# QTY SP FD diagram S# STATUS CITY S1 20 London S2 10 Paris S3 10 Paris S4 20 London S5 30 Athens SECOND (not 3NF) S# CITY STATUS SECOND FD diagram 其他屬性不可以輾轉關聯於 Primary Key 但因每個屬性都已關聯於 Primary Key ,故而 意即其他屬性之間不可私訂關聯

4-24 Normal Forms: 3NF (cont.)  Decompose SECOND into: SC(S#, CITY) primary key : S# FD diagram: CS(CITY, STATUS): primary key: CITY FD diagram: S# CITY STATUS S# CITY S1 London S2 Paris S3 Paris S4 London S5 Athens SC (in 3NF) CITY STATUS Athens 30 London 20 Paris 10 Rome 50 CS (in 3NF) S# STATUS CITY S1 20 London S2 10 Paris S3 10 Paris S4 20 London S5 30 Athens SECOND S# STATUS CITY

4-25 Normal Forms: 3NF (cont.)  Note: (1) Any 2NF diagram can always be reduced to a collection of 3NF relations. (2) The reduction process from 2NF to 3NF is non-loss decomposition. (3) The collection of 3NF relations may contain "more information" than the equivalent 2NF relation.

4-26 Good and Bad Decomposition  Consider S# CITY STATUS transitive FD S# CITY STATUS Good ! (Rome, 50) can be inserted. ① Decomposition A: SC: CS: S# CITY S# STATUS Bad ! (Rome, 50) can not be inserted unless there is a supplier located at Rome. ② Decomposition B: SC: CS: ③ Decomposition C: S# -> status city -> status ‘Good’ or ‘Bad’ is dependent on item’s semantic meaning!! S# STATUS CITY S# STATUS CITY Suppose 1. CITY is the main office of the supplier. 2. STATUS is some factor of CITY Decomposition B: S#  CITY, S#  STATUS 那就仍合在一起就好了 當作 CITY  STATUS 不存在

4-27 Good and Bad Decomposition (cont.)  Independent Projection: (by Rissanen '77) Def: Projections R1 and R2 of R are independent iff (1) Any FD in R can be reduced from those in R1 and R2. (2) The common attribute of R1 and R2 forms a candidate key for at least one of R1 and R2. Decomposition A: SECOND(S#, STATUS, CITY)  (R) (1) FD in SECOND (R): S# CITY CITY STATUS S#  STATUS FD in SC and CS R1: S# CITY R2: CITY STATUS (2) Common attribute of SC and CS is CITY, which is the primary key of CS.  SC and CS are independent  Decomposition A is good! SC (S#, CITY) CS (CITY, STATUS) Projection R1 R R2 (R1) (R2) { S# STATUS (1) 原有 FD 關係延續到分割後的表格 (2) 分割後的表格之共同屬性,至少為 其中一個表格的 candidate key

4-28 Good and Bad Decomposition (cont.) Decomposition B: SECOND(S#, STATUS, CITY) (1) FD in SC and SS: S# CITY S#  STATUS  SC and SS are dependent  decomposition B is bad! though common attr. S# is primary key of both SC, SS. Decomposition C: SECOND(S#, STATUS, CITY) (1) FD in SS and CS S#  STATUS CITY  STATUS (2) Common attribute STATUS is not only a candidate key of either SS or CS.  Decomposition C is not only a bad decomposition, but also an invalid decomposition, since it is not non-loss. SC (S#, CITY) SS (S#, STATUS) { CITY STATUS SS (S#, STATUS) CS (CITY, STATUS) { S# CITY Loss 非 P-key 關連的 Loss 由 P-key 關連的

4-29 Atomic Relation  Def: Atomic Relation -- A relation that cannot be decomposed into independent components. SP (S#, P#, QTY) is an atomic relation, while S (S#, SNAME, STATUS, CITY) is not.

4-30  Break

Boyce/Codd Normal Form (BCNF) Problems of 3NF: Do not deal with the cases: A relation has multiple candidate keys, Those candidate keys were composite, The candidate keys are overlapped.

4-32 Example –S: student –J: subject –T: teacher –Meaning of a tuple: student S is taught subject J by teacher T. –Suppose –For each subject, each student of that subject is taught by only one teacher. i.e. (S, J)  T –Each teacher teaches only one subject. i.e. T  J –Candidate keys (S, J) and (S, T) –FD diagram SJT(S, J, T) S J T Smith Math. Prof. White Smith Physics Prof. Green Jones Math. Prof. White Jones Physics Prof. Brown S J T

4-33 BCNF  Def: A relation R is in BCNF iff every determinant is a candidate key. A B: A determines B, and A is a determinant. [only one candidate key] SP (S#, P#, QTY): in 3NF & BCNF SC (S#, CITY): in 3NF & BCNF CS (CITY, STATUS): in 3NF & BCNF S# CITY STATUS S# P# QTY SECOND(S#, STATUS, CITY): not in 3NF & not in BCNF S# CITY STATUS 2NF 1NF 3NF BCNF in 3NF, not in BCNF e.g.3, e.g.4 (P.7-33)

4-34 BCNF (cont.)  [two disjoint (nonoverlapping) candidate keys] S(S#, SNAME, STATUS, CITY) Assume : (1) CITY, STATUS are independent (2) SNAME is a candidate key S#, SNAME (determinants) are candidate keys. S is in BCNF (also in 3NF). S# SNAME CITY STATUS 3NF BCNF 3NF but not BCNF ‧ e.g.2

4-35 BCNF (cont.) [overlapping candidate keys -1] SSP (S#, SNAME, P#, QTY) key in SSP: (S#, P#), (SNAME, P#) FD in SSP 1. S# SNAME 2. SNAME S# 3. {S#, P#} QTY 4. {SNAME, P#} QTY P# S# SNAME QTY SSP S# SName P# QTY - in 3NF nonkey attribute is FFD on primary key and mutually independent. e.g. QTY only - not in BCNF S# is a determinant but not a candidate key. S# SNAME Decompose: SS (S#, SNAME): in BCNF SP (S#, P#, QTY): in BCNF S# SNAME S# P# QTY

4-36 BCNF (cont.) [overlapping candidate keys-2] SJT(S, J, T) –S: student –J: subject –T: teacher –meaning of a tuple: student S is taught subject J by teacher T. –Suppose –For each subject, each student of that subject is taught by only one teacher. i.e. (S, J)  T –Each teacher teaches only one subject. i.e. T  J S J T Smith Math. Prof. White Smith Physics Prof. Green Jones Math. Prof. White Jones Physics Prof. Brown – Candidate keys (S, J) and (S, T) – FD diagram S J T

4-37 BCNF (cont.) – In 3NF, no nonkey attribute. – not in BCNF, T  J but T is not a candidate key – update anomalies occur! e.g. (delete "Jones is studying Physics" the fact "Brown teaches Physics" is also deleted!) Decompose 1: Decompose 2: ST (S, T) TJ(T, J) S T Smith Prof. White Smith Prof. Green Jones Prof. White Jones Prof. Brown T J Prof. White Math. Prof. Green Physics Prof. Brown physics ST in BCNF T J S J T J – Is this decomposition Good or Bad? – In Rissanen's sense, ST(S, T) and TJ(T, J) are not independent! the FD: (S,T) T cannot be deduced from FD: T J The two objectives: decomposing a relation into BCNF, and decomposing it into independent components may be in conflict! S J T Smith Math. Prof. White Smith Physics Prof. Green Jones Math. Prof. White Jones Physics Prof. Brown

4-38 BCNF (cont.) [overlapping candidate keys-3] EXAM(S, J, P); S: student, J: subject, P: position. –meaning of a tuple: student S was examined in subject J and achieved position P in the class. –suppose no two students obtained the same position in the same subject. i.e. (S, J)  P and (J, P)  S –FD diagram: –candidate keys: (S,J) and (J, P), overlap key: J. –in BCNF ! J P S S J P A DBMS 5 B DBMS 8 A Network 1 EXAM

4-39 Why Normal Form?  Avoid update anomalies  Consider the SSP(S#, SNAME, P#, QTY) Common sense will tell us SS(S#, SNAME) & SP(S#, P#, QTY) is a better design.  The concepts of FD, 1NF, 2NF, 3NF and BCNF to formalize common sense.  Mechanization is possible! i.e., we can write a program to do the work of normalization for us!

Fourth Normal Form (4NF)

4-41 Un-Normalized Relation COURSE TEACHER TEXT Physics {Prof. Green, {Basic Mechanics, Prof. Brown} Principle of Optics} Math. {Prof. Green} {Basic Mechanics, Vector Analysis, Trigonometry} CTX Math Text  meaning of a record: the specified course can be taught by any of the specified teachers and uses all of the specified texts as references.  Assume: - For a given course, there exists any number of teachers and any number of texts. - Teachers and texts are independent. - A given teacher or a given text can be associated with any number of courses.

4-42 Un-normalized Relation (cont.)  Note: No FD exists in this relation! Normalized COURSE TEACHER TEXT Physics(c) Prof. Green(t1) Basic Mechanics (x1) physics(c) Prof. Green(t1) Principle of Optics (x2) physics(c) Prof. Brown(t2) Basic Mechanics (x1) physics(c) prof. Brown(t2) Principles of Optics(x2) Math prof. Green Basic Mechanics Math prof. Green Vector Analysis Math prof. Green Trigonometry C T X Function

4-43 Un-normalized Relation (cont.)  Meaning of a tuple: course C can be taught by teacher T and uses text X as a reference. primary key: (COURSE, TEACHER, TEXT)  Check: in 1NF (simple domain contains atomic value only) in 2NF (Nonkey attributes are FFD on primary key, no key attributes) in 3NF (Nonkey attributes are mutually independent.) in BCNF (Every determinant is a candidate key) COURSE TEACHER TEXT Physics(c) Prof. Green(t1) Basic Mechanics (x1) physics(c) Prof. Green(t1) Principle of Optics (x2) physics(c) Prof. Brown(t2) Basic Mechanics (x1) physics(c) prof. Brown(t2) Principles of Optics(x2) Math prof. Green Basic Mechanics Math prof. Green Vector Analysis Math prof. Green Trigonometry

4-44 Un-normalized Relation (cont.)  Problem: a good deal of redundancy! property: if (c, t1, x1), (c, t2, x2) both appear then (c, t1, x2), (c, t2, x1) both appear also! reason: No FD, but has MVD! the decomposition cannot be made on the basis of FD. COURSE TEACHER Physics Prof. Green Physics Prof. Brown Math Prof. Green CT: COURSE TEXT Physics Basic Mechanics Physics Principles of Optics Math Basic Mechanics Math Vector Analysis Math Trigonometry CX: Physics Math Not FD! COURSE TEACHER TEXT Physics(c) Prof. Green (t1) Basic Mechanics (x1) physics(c) Prof. Green (t1) Principle of Optics (x2) physics(c) Prof. Brown (t2) Basic Mechanics (x1) physics(c) prof. Brown (t2) Principles of Optics (x2) Math prof. Green Basic Mechanics Math prof. Green Vector Analysis Math prof. Green Trigonometry intuitively decomposed

4-45 MVD ( Multi-Valued Dependencies)  Def: Given R(A, B, C), the multivalued dependence (MVD) R.A R.B holds in R iff the set of B-values matching a given (A-value, C-value) pair is R, depend only on A-value, and is independent of C-value. COURSE TEACHER, COURSE TEXT  Thm: Given R(A, B, C), the MVDR.A R.B holds iff the MVD R.A R.C also holds. Notation: R.A R.B | R.C COURSE TEACHER | TEXT 1. FD is a special case of MVD all FD's are also MVD's 2. MVDs (which are not also FD's) can exist only if the relation R has at least 3 attributes. physicsBasic Mechanics Brown Green,, AB C {{ {{ MVD FD ‧ ‧

4-46 Norma Forms: 4NF  Problem of CTX: involves MVD's that are not also FD's.  Def: A relation R is in 4NF iff whenever there exists an MVD in R, say A R, then all attributes of R are also FD on A. i.e. R is in 4NF iff (i) R is in BCNF, (ii) all MVD's in R are in fact FD's. i.e. R is in 4NF iff (i) R is in BCNF, (ii) no MVD's in R. CTX (COURSE, TEACHER, TEXT) COURSE TEACHER COURSE TEXT S (S#, SNAME, STATUS, CITY) S# STATUS SNAME CITY not in 4NF no MVD which is not FD in 4NF

4-47 Norma Forms: 4NF (cont.)  Thm: Relation R(A, B, C) can be no loss decomposed into R1(A, B) and R2(A, C) iff A B | C holds in R. CTX (COURSE, TEACHER, TEXT) COURSE TEACHER | TEXT CT (COURSE, TEACHER) CX (COURSE, TEXT) no MVD in 4NF

Fifth Normal Form (5NF)

4-49 A Surprise  There exist relations that cannot be nonloss-decomposed into two projections, but can be decomposed into three or more.  Def: n-decomposable (for some n > 2) the relation can be nonloss-decomposed into n projections, but not into m projection for any m < n.  SPJ (S#, P#, J#); S: supplier, P: part, J: project. Suppose in real world if (a) Smith supplies monkey wrenches, and (b) Monkey wrenches are used in Manhattan project, and (c) Smith supplies Manhattan project. then (d) Smith supplies Monkey wenches to Manhatan project. i.e. If (s1, p1, j2), (s2, p1, j1), (s1, p2, j1) appear in SPJ Then (s1, p1, j1) appears in SPJ also. – no MVD in 4NF

4-50 A Surprise (cont.)  update problem of SPJ If (S2, P1, J1) is to be inserted then (S1, P1, J1) must also be inserted If (S1, P1, J1) is to be deleted, then one of the following must also be deleted (i) (S1, P1, J2): means S1 no longer supplies P1. (ii) (S1, P2, J1): means S1 no longer supplies J1. (iii) (S2, P1, J1): means J1 no longer needs P1. SPJ: S# P# J# S1 P1 J2 S1 P2 J1 SPJ: S# P# J# S1 P1 J2 S1 P2 J1 S2 P1 J1 S1 P1 J1

4-51 A Surprise (cont.)  SPJ is not 2-decomposable, but is 3-decomposable! SPJ S# P# J# S1 P1 J2 S1 P2 J1 S2 P1 J1 S1 P1 J1 S# P# S1 P1 S1 P2 S2 P1 P# J# P1 J2 P2 J1 P1 J1 J# S# J2 S1 J1 S1 J1 S2 SP PJ JS S# P# J# S1 P1 J2 S1 P1 J1 S1 P2 J1 S2 P1 J2 S2 P1 J1 join over P# spurious join over (J#, S#) ORIGINAL SPJ

4-52 Join Dependency (JD)  Def: A Relation R satisfies the join dependency (JD) * (X, Y,..., Z) iff R is equal to the join of its projections on X, Y,..., Z, where X, Y,..., Z are subsets of the set of attributes of R.  SPJ satisfies the JD *(SP, PJ, JS) i.e. SPJ is 3-decomposable.  MVD is a special case of JD. Thm: R (A, B, C) can be nonloss-decomposed into R 1 (A, B) and R 2 (A, C) iff A B|C holds. Thm: R (A, B, C) satisfies the JD *(AB, AC) iff A B|C holds.  JD's are the most general form of dependency possible, so long as we concentrate on the dependencies that deal with a relation being decomposed via projection and recomposed via join.

4-53 Norma Forms: 5NF  Def: A relation R is in 5NF (or PJ/NF) iff every JD in R is a consequence of the candidate keys of R.  Suppose S# and SNAME are candidate keys of S (S#, SNAME, STATUS, CITY). * ((S#, SNAME, STATUS), (S#, CITY)) is a consequence of S# (a candidate key of S) * ((S#, SNAME), (S#, STATUS), (SNAME, CITY)) is a consequence of the candidate keys S# and SNAME.

4-54 Norma Forms: 5NF (cont.)  Consider SPJ (S#, P#, J#), the candidate key of SPJ is (S#, P#, J#). However, there exists a JD *((S#, P#), (P#, J#), (J#, S#)) which is not a consequence of (S#, P#, J#) SPJ not in 5NF! decomposed: SP (S#, P#), PJ (P#, J#), JS (J#, S#): ( no JD in them all in 5NF!) Note: 1. Discovering all the JD's is a nontrivial operation. 2. Intuitive meaning of JD may not be obvious. 3. A relation in 4NF but not in 5NF is a pathological case, and likely to be rare in practice.

4-55 Concluding Remarks  The technique of non-loss decomposition is an aid to logical database design.  The overall processes of Normalization: step1: eliminate non-full dependencies. step2: eliminate any transitive FDs. step3: eliminate those FDs in which the determinant is not a candidate key. step4: eliminate any MVDs that are not FDs. step5: eliminate any JDs that are not a consequence of candidate keys.  General objective: reduce redundancy, and then avoid certain update anomalies.  Normalization Guidelines are only guidelines. Sometime there are good reasons for not normalizing all the way.

4-56 Concluding Remarks (cont.) NADDR (NAME, STREET, CITY, STATE, ZIP) NAME STREET CITY STATE ZIP not in 5NF (in which NF?) decompose NSZ (NAME, STREET, ZIP) ZCS (ZIP, CITY, STATE) NAME STREET ZIP ZIP CITY STATE

4-57 Concluding Remarks (cont.)  However, (1) STREET, CITY, STATE are almost required together. (2) so ZIP do not change very often, such a decomposition seems unlikely to be worthwhile.  Not all redundancies can be eliminate by projection.  Research topic: decompose relations by other operator. e.g. restriction. Projection

The Entity/Relationship Model  Proposed by Peter Pin-Shan Chen. “The Entity- Relationship Model,” in ACM Trans. on Database System Vol. 1, No. 1, pp.9-36,  E-R Model contains: Semantic Concepts and E/R Diagram.

4-59 Entity/Relationship Diagram  Entity :  Property:  Relationship: Regular Entity Weak Entity Base Derived Multi Composite Key Regular Weak total partial Department Employee Dependent Supplier Part Project Department_Emp SP SPJ Part_Struct Proj_Work Proj_Mngr Emp_Dependent Emp# Salary Ename First Last MI S# Sname Status City Qty M M M M M M M M M M M Exp Imp... P# Qty...

4-60 Semantic Concepts  Entity: a thing which can be distinctly identified. e.g. S, P Weak Entities: an entity that is existence-dependent on some other entity. e.g. Dependent Regular Entities: an entity that is not weak.  Property: Simple or Composite Property key: unique value Single or Multi-valued: i.e. repeating data Missing: unknown or not applicable. Base or Derived: e.g. "total quantity"

4-61 Semantic Concepts (cont.)  Relationship: association among entities. participant: the entity participates in a relationship. e.g. Employee, Dependent degree of relationship: number of participants. total participant: every instance is used. e.g. Dependent partial participant: e.g. Employee Note: An E/R relationship can be one-to-one, one-to- many, many-to-one, or many-to-many EmployeeDependent E1 E2 E3 E4 Dependent M 1 Employee

4-62 Semantic Concepts (cont.)  Subtype: Any given entity can be a subtype of other entity. e.g. An example type hierarchy: Programmer Employee Application Programmer Programmer Employee Programmer Application Programmer System Programmer

4-63 Transfer E-R Diagram to SQL Definition ER-Diagram SQL Definition (1) Regular Entities  Base Relations  e.g. Department  DEPT Employee  EMP Supplier  S Part  P Project  J  attributes: properties.  primary key: key property. CREATE TABLE EMP (EMP# primary key (EMP#));. (2) Properties  attributes  multivalued property: normalized first.

4-64 Transfer E-R Diagram to SQL Definition (cont.) (3) Many-to-Many Relationships  Base Relations Consider  SUPP-PART  SP – foreign keys: key attributes of the participant entity. – primary key: (1) combination of foreign keys, or (2) introduce a new attribute. – attributes: primary key  foreign keys  properties ID S# P# QTY S S# SP P QTY P# M CREATE TABLE SP ( S#,..., P#,..., Qty,..., FOREIGN KEY (S#) REFERENCES S NULL NOT ALLOWED DELETE OF S RESTRICTED UPDATE OF S.S# CASCADES FOREIGN KEY (P#) REFERENCES P PRIMARY KEY (S#, P#) )

4-65 Transfer E-R Diagram to SQL Definition (cont.) (4) Many-to-One or One-to-Many or One-to-One Relationships → 不用給 Base-Relation, 但 …. e.g. Consider  DEPT_EMP(Department_Employee) 1° no any new relation is necessary. 2° Introduce a foreign key in "many" side relation (eg. EMP) that reference the "one" side relation (eg. DEPT). CREATE TABLE EMP ( EMP#,... DEPT#, FOREIGN KEY (DEPT#) REFERENCES DEPT ); Department (DEPT) DEPT_EMP EMP# DEPT# Employee (EMP) M 1 DEPT#

4-66 Transfer E-R Diagram to SQL Definition (cont.) (5) Weak Entities  Base Relation + Foreign Key foreign key: key property of its dependent entity primary key: (1) combination of foreign key and its own primary key (2) introduce a new attribute e.g. CREATE TABLE DEPENDENT ( EMP#, … name, FOREIGN KEY (EMP#) REFERENCES EMP NULL NOT ALLOWED DELETE OF EMP CASCADES UPDATE OF EMP.EMP# CASCADES PRIMARY KEY ( EMP#, DEPENDENT_NAME ) ); Dependent EMP EMP# name M 1 EMP# e.g. EMP#e.g. name

4-67 Transfer E-R Diagram to SQL Definition (cont.) (6) Supertypes and Subtypes  Base Relation + Foreign Key CREATE TABLE EMP ( EMP#,... DEPT#,... SALARY, PRIMARY KEY (EMP#),...); CREATE TABLE PGMR ( EMP#,... LANG#,... SALARY, PRIMARY KEY (EMP#) FOREIGN KEY (EMP#) REFERENCES EMP......); EMP PGMR