DBSYSTEMS Chapter 3 Data Normalization Get data properly tabled! Based on G. Post, DBMS: Designing & Building Business Applications University of Manitoba.


1 DBSYSTEMS Chapter 3: Data Normalization
Get data properly tabled!
Based on G. Post, DBMS: Designing & Building Business Applications
University of Manitoba, Asper School of Business, 3500 DBMS
Bob Travica
Updated 2015

2 Normalization
• The process of putting data into the format of relational databases, that is, organizing data into correctly designed tables.
• Tables should be designed so that
  a) problems (anomalies) with insertion, deletion and modification of data are avoided
  b) redundancy is reduced
  c) data quality is preserved (completeness, consistency)

3 Relational Database Terminology
• Relational database: A collection of tables (relations). Tables store atomic data.
• Table: A collection of columns (attributes, properties, fields) describing an entity (class). A table is also a collection of rows (records), each with the same number of columns.
• Each row represents an object (an instance of a class).

Entity (Class): Employee
Table: Employee

EmployeeID  TaxpayerID   LastName   FirstName  HomePhone       Address
12512       888-22-5552  Cartom     Abdul      (603) 323-9893  252 South Street
15293       222-55-3737  Venetiaan  Roland     (804) 888-6667  937 Paramaribo Ln
22343       293-87-4343  Johnson    John       (703) 222-9384  234 Main Street
29387       837-36-2933  Stenheim   Susan      (410) 330-9837  8934 W. Maple

Columns are the attributes/properties; rows are the objects.

4 Relational Database Terminology – Primary Key
• Every table has a primary key (key): an attribute that uniquely identifies each row (e.g., EmployeeID on the previous slide).
• A primary key can span more than one column; it is then called a combined (composite, concatenated) key, as in OrderItem below, where the key is OrderID + ItemID.
  Note: Watch for data types (e.g., number vs. text) and naming rules (arbitrary but consistent).
• A primary key can be generated automatically by the DBMS: a surrogate key.
• Other attributes are called non-key columns. A non-key depends on the key.

OrderItem
OrderID  ItemID  Quantity
1        229     2
1        253     4
2        229     1
2        555     4
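A minimal sketch of how a combined key behaves, using Python's built-in sqlite3 module and the OrderItem rows from the slide. The choice of SQLite and the helper name try_insert are assumptions made for illustration; the slides do not prescribe a DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE OrderItem (
        OrderID  INTEGER NOT NULL,
        ItemID   INTEGER NOT NULL,
        Quantity INTEGER NOT NULL,
        PRIMARY KEY (OrderID, ItemID)  -- combined (composite) key
    )
    """
)
# The four rows shown on the slide.
conn.executemany(
    "INSERT INTO OrderItem VALUES (?, ?, ?)",
    [(1, 229, 2), (1, 253, 4), (2, 229, 1), (2, 555, 4)],
)


def try_insert(order_id, item_id, quantity):
    """Hypothetical helper: True if the row is accepted, False if the key rejects it."""
    try:
        conn.execute(
            "INSERT INTO OrderItem VALUES (?, ?, ?)", (order_id, item_id, quantity)
        )
        return True
    except sqlite3.IntegrityError:
        return False


# Same OrderID + ItemID pair as an existing row: the key rejects it.
duplicate_ok = try_insert(1, 229, 7)
# OrderID 2 and ItemID 253 each appear already, but not together: accepted.
new_pair_ok = try_insert(2, 253, 1)
row_count = conn.execute("SELECT COUNT(*) FROM OrderItem").fetchone()[0]
```

Neither OrderID nor ItemID is unique on its own here; only the pair is, which is what makes the key a combined one.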

5 Relational Database Shorthand Notation
Customer(CustomerID, LastName, FirstName, Address, City, State, ZipPostalCode, TelephoneNumber)
• Customer is the table name.
• The primary key is underlined on the slide (here, CustomerID).
• The remaining columns are non-key columns.
Note: Telephone number can be used as a “backup key.”
Shorthand notation is good for analysis but not for official diagrams. Do not use it in your assignments and exams.

6 Class Diagram to Schema
Class Diagram (Non-Normalized):
  Customer 1 to * Order
  Salesperson 1 to * Order
  Order * to * Item, with OrderItem as the association class (also called ItemOrdered, OrderDetail, etc.)
  Association labels on the slide: places, serves, contains.
Tables Diagram – Schema (Normalized):
  Customer 1 to * Order
  Salesperson 1 to * Order
  Order 1 to * OrderItem
  Item 1 to * OrderItem
Another new detail: Foreign keys are shown in a complete schema.

7 Shorthand Notation for Normalized Tables Diagram – Foreign Key
Customer(CustomerID, Name, Address, City, Phone)
Salesperson(EmployeeID, Name, DateHired)
Order(OrderID, OrderDate, CustomerID, EmployeeID)
OrderItem(OrderID, ItemID, Quantity)
Item(ItemID, Description, ListPrice)
Foreign Key (FK) = Attribute that is a (primary) key in another table (e.g., CustomerID in Order).
Logic & naming of OrderItem: It replaces the Order-Item M:M relationship with two 1:M relationships. Another common name is OrderDetail. The OrderItem key is a combination of FKs (OrderID + ItemID).
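The five tables above can be sketched in SQLite through Python's sqlite3 module to see what a foreign key enforces. This is an illustration under assumptions: the column types, the trimmed column lists, the sample rows and the helper accepts are all made up here, and SQLite only checks foreign keys when the foreign_keys pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    PRAGMA foreign_keys = ON;  -- SQLite checks foreign keys only when this is on

    CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Salesperson (EmployeeID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Item (ItemID INTEGER PRIMARY KEY, Description TEXT, ListPrice REAL);
    CREATE TABLE "Order" (               -- quoted because ORDER is an SQL keyword
        OrderID    INTEGER PRIMARY KEY,
        OrderDate  TEXT,
        CustomerID INTEGER NOT NULL REFERENCES Customer(CustomerID),
        EmployeeID INTEGER NOT NULL REFERENCES Salesperson(EmployeeID)
    );
    CREATE TABLE OrderItem (
        OrderID  INTEGER NOT NULL REFERENCES "Order"(OrderID),
        ItemID   INTEGER NOT NULL REFERENCES Item(ItemID),
        Quantity INTEGER NOT NULL,
        PRIMARY KEY (OrderID, ItemID)    -- the key is the combination of the two FKs
    );

    -- Made-up sample rows, for illustration only.
    INSERT INTO Customer VALUES (1, 'Customer A');
    INSERT INTO Salesperson VALUES (10, 'Salesperson B');
    INSERT INTO Item VALUES (229, 'Item X', 5.00), (253, 'Item Y', 7.50);
    INSERT INTO "Order" VALUES (1, '2015-01-15', 1, 10);
    INSERT INTO OrderItem VALUES (1, 229, 2), (1, 253, 4);
    """
)


def accepts(sql, params):
    """Hypothetical helper: True if the statement succeeds, False on a constraint violation."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


insert_order = 'INSERT INTO "Order" VALUES (?, ?, ?, ?)'
valid_order = accepts(insert_order, (2, "2015-01-16", 1, 10))
orphan_order = accepts(insert_order, (3, "2015-01-17", 99, 10))  # there is no Customer 99

# OrderItem links each order to its items; the join recovers the order lines.
order_total = conn.execute(
    """
    SELECT SUM(oi.Quantity * i.ListPrice)
    FROM OrderItem AS oi JOIN Item AS i ON i.ItemID = oi.ItemID
    WHERE oi.OrderID = 1
    """
).fetchone()[0]
```

The second insert is rejected because its CustomerID does not exist in Customer, which is the association that the foreign key preserves.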

9 Video Store Transaction Processing System (VSTPS): Classes, Columns & Business Rules
Master Data (“Static”): Market & Inventory Entities (don’t change often)
• Customer table
  Key: CustomerID
  Attributes: Name, Address, Phone
• Video table
  Key: VideoID
  Attributes: Title, RentalFee, Rating…
Transaction Data (“Dynamic”): Operations Entities (change more often)
• RentalTransaction table
  Key: TransactionID
  Attributes: CustomerID, Date
• VideoRented table
  Key: TransactionID + VideoID
  Attributes: Copy#

10 Business Rules and Class Diagram for VSTPS
Business Rules:
• A customer can have many rental transactions, each being for a specific customer.
• A transaction can include many video titles, and a title is in many transactions.
• A transaction can include just one copy of a video title.
Class diagram:
  Customer 1 has * RentalTransaction
  RentalTransaction * includes * VideoTitle
  Association class: VideoRented (?)

11 Schema for VSTPS
Customer(CustomerID, LastName, FirstName, Address, City, …)
RentalTransaction(TransID, RentDate, CustomerID)
VideoRented(TransID, VideoID, Copy#)
Video(VideoID, Title, RentalFee)
Transaction data: RentalTransaction and VideoRented.
Multiplicities: Customer 1 to * RentalTransaction; RentalTransaction 1 to * VideoRented; Video 1 to * VideoRented.
You can draw a normalized schema based on the knowledge of multiplicity and data analysis you already have!

12 Why Normalize – Avoiding Data Anomalies
How do we get to those four tables using normalization logic? Why not use a simple design for recording rentals?
Test: VideoRental(Rec#, CustomerID, LastName, FirstName, … VideoID, Title, RentalFee, Copy#, Date)
This is a poor design because:
• Master data (Customer, Video) repeat for each transaction: high redundancy.
• Deleting transaction data also deletes master data, and the reverse. Deletion anomaly: the target data cannot be deleted on its own; more (or less) than wanted is removed.
• A new customer can’t be added without adding a new video, and the reverse. Insertion anomaly: data can’t be added without corrupting other data.
• To change a customer name, all of that customer’s records must be rewritten. Update anomaly: data can’t be updated in a single master record only.
Conclusion: From the normalization perspective, data must be properly designed in order to avoid CRUD (create, read, update, delete) anomalies and reduce redundancy.
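The update and deletion anomalies can be made concrete with a few lines of Python. This is a sketch, not part of the original slides: the rows borrow the customers and videos from the RentalForm example on Slide 15 and keep only a few of the VideoRental columns, while the two function names and the new surname are invented for illustration.

```python
# Flat VideoRental design: customer and video facts repeat on every record.
video_rental = [
    {"Rec#": 1, "CustomerID": 3, "LastName": "Washington", "VideoID": 1,
     "Title": "2001: A Space Odyssey", "Date": "4/18/02"},
    {"Rec#": 2, "CustomerID": 3, "LastName": "Washington", "VideoID": 6,
     "Title": "Clockwork Orange", "Date": "4/18/02"},
    {"Rec#": 3, "CustomerID": 7, "LastName": "Lasater", "VideoID": 8,
     "Title": "Hopscotch", "Date": "4/30/02"},
    {"Rec#": 4, "CustomerID": 7, "LastName": "Lasater", "VideoID": 2,
     "Title": "Apocalypse Now", "Date": "4/30/02"},
    {"Rec#": 5, "CustomerID": 7, "LastName": "Lasater", "VideoID": 6,
     "Title": "Clockwork Orange", "Date": "4/30/02"},
]


def rename_customer(rows, customer_id, new_name):
    """Update anomaly: every record of the customer has to be rewritten."""
    touched = 0
    for row in rows:
        if row["CustomerID"] == customer_id:
            row["LastName"] = new_name
            touched += 1
    return touched


def delete_rentals_on(rows, date):
    """Deletion anomaly: dropping transaction records can erase master data too."""
    return [row for row in rows if row["Date"] != date]


rows_rewritten = rename_customer(video_rental, 7, "Lasater-Ray")  # hypothetical new name
remaining = delete_rentals_on(video_rental, "4/18/02")
customer_3_known = any(row["CustomerID"] == 3 for row in remaining)
video_1_known = any(row["VideoID"] == 1 for row in remaining)
video_6_known = any(row["VideoID"] == 6 for row in remaining)
```

One name change touches three records, and deleting the rentals of 4/18/02 leaves no trace of customer 3 or video 1, while video 6 survives only because another transaction happens to mention it.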

13 Normalization
A process of splitting a chunk of data to arrive at clear master and transactional classes. Each many-to-many relationship must be replaced by 2 one-to-many relationships.
Starting point: Customer * rents * Video
1. Customer 1 has * RentalTransaction; RentalTransaction * includes * Video
2. Customer 1 has * RentalTransaction; RentalTransaction 1 includes * VideoRented (copy#); Video 1 is rented * VideoRented
How to track copies of the same video?

14 Normalization Process
• Interview users and understand the output needed. Put the data into one large table (RentalForm).
• Pick out attributes.
• Find repeating groups (sections).
• Look for potential keys.
• Identify computed values.
RentalForm(TransID, RentDate, (CustomerID, Name, Address, City, State, …), (VideoID, Copy#, Title, RentalFee))
The focus here is on the logic; this process is not really used in practice.

15 Problems with Repeating Groups (Sections)
RentalForm(TransID, RentDate, (CustomerID, Phone, Name, Address, City, State, …), (VideoID, Copy#, Title, Rent))

TransID  RentDate  CustomerID  LastName    Phone         Address          VideoID  Copy#  Title                  Rent
1        4/18/02   3           Washington  502-777-7575  95 Easy Street   1        2      2001: A Space Odyssey  $1.50
1        4/18/02   3           Washington  502-777-7575  95 Easy Street   6        3      Clockwork Orange       $1.50
2        4/30/02   7           Lasater     615-888-4474  67 S. Ray Drive  8        1      Hopscotch              $1.50
2        4/30/02   7           Lasater     615-888-4474  67 S. Ray Drive  2        1      Apocalypse Now         $2.00
2        4/30/02   7           Lasater     615-888-4474  67 S. Ray Drive  6        1      Clockwork Orange       $1.50

The customer columns and the video columns are the repeating groups.
Repeating groups cause
- high redundancy
- update anomaly (must run through all records)
- insertion anomaly in the form of errors in data (a fake CustomerID is needed if a new video is added)
- deletion anomaly (can’t delete only what is needed)
If there are repeating sections, the table is not in the first normal form (1NF).

16 First Normal Form (1NF)
• 1NF: A table is in 1NF if it does not have repeating sections.
• Normalization Procedure:
  • Remove repeating sections by splitting the initial table into new tables.
  • Preserve associations between the initial table and new tables by replicating the initial key.
Remainder of initial table: RentalTransaction(TransID, RentDate)
New: Video(TransID, VideoID, Copy#, Title, RentalFee)
New: Customer(TransID, CustomerID, Phone, Name, Address, City, State, ZipCode)
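The 1NF procedure above can be sketched in Python as a split of nested RentalForm records into three flat tables, with TransID replicated into each new table. The nested dictionary layout and the function name to_1nf are assumptions for illustration; the values come from the RentalForm example on Slide 15, with a few columns left out.

```python
# RentalForm with its two repeating sections kept as nested structures.
rental_forms = [
    {
        "TransID": 1, "RentDate": "4/18/02",
        "Customer": {"CustomerID": 3, "Name": "Washington",
                     "Phone": "502-777-7575", "Address": "95 Easy Street"},
        "Videos": [
            {"VideoID": 1, "Copy#": 2, "Title": "2001: A Space Odyssey", "RentalFee": 1.50},
            {"VideoID": 6, "Copy#": 3, "Title": "Clockwork Orange", "RentalFee": 1.50},
        ],
    },
    {
        "TransID": 2, "RentDate": "4/30/02",
        "Customer": {"CustomerID": 7, "Name": "Lasater",
                     "Phone": "615-888-4474", "Address": "67 S. Ray Drive"},
        "Videos": [
            {"VideoID": 8, "Copy#": 1, "Title": "Hopscotch", "RentalFee": 1.50},
            {"VideoID": 2, "Copy#": 1, "Title": "Apocalypse Now", "RentalFee": 2.00},
            {"VideoID": 6, "Copy#": 1, "Title": "Clockwork Orange", "RentalFee": 1.50},
        ],
    },
]


def to_1nf(forms):
    """Split each form into flat rows, copying TransID into every new table."""
    rental_transaction, video, customer = [], [], []
    for form in forms:
        trans_id = form["TransID"]
        # Remainder of the initial table.
        rental_transaction.append({"TransID": trans_id, "RentDate": form["RentDate"]})
        # New tables: one per repeating section, each carrying the initial key.
        customer.append({"TransID": trans_id, **form["Customer"]})
        for item in form["Videos"]:
            video.append({"TransID": trans_id, **item})
    return rental_transaction, video, customer


rental_transaction, video, customer = to_1nf(rental_forms)
```

Every resulting row holds only atomic values, and the replicated TransID is what lets the three tables be linked back together.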

17 Problems with First Normal Form
• There are problems in the relationship between the key and non-keys.
• Concept of Functional Dependence:
  • An attribute depends on another attribute if the change of its value is caused by a change of the other attribute (that is, each value of the determining attribute fixes exactly one value of the dependent attribute).
  • The key column must be sufficient for determining values of the non-key columns.

Video
TransID  VideoID  Copy#  Title                  RentalFee
1        1        2      2001: A Space Odyssey  $1.50
1        6        3      Clockwork Orange       $1.50
2        8        1      Hopscotch              $1.50
2        2        1      Apocalypse Now         $2.00
2        6        1      Clockwork Orange       $1.50

• Problems apply only to tables with combined keys! (A single-key table in 1NF is also in 2NF.)

18 Problems with First Normal Form (cont.)
• If any non-key column depends on just a part of the key, there is partial functional dependence and the table is not in 2NF.
Video(TransID, VideoID, Copy#, Title, RentalFee)
• TransID + VideoID combined determine Copy#. Copy# depends on the full key, so there is full functional dependency on the key.
• VideoID alone is sufficient to determine Title and RentalFee. Therefore, there is partial functional dependence between the combined key and Title and RentalFee.
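Whether a set of columns determines another can be tested against sample rows with a short Python function. This is a sketch with an invented function name, determines, applied to the five Video rows from Slide 17. Sample data can only refute a dependency; a result of True means the rows are consistent with it, not that the business rule is proven.

```python
def determines(rows, lhs, rhs):
    """True if equal values in the lhs columns always come with equal rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = tuple(row[col] for col in rhs)
        # setdefault stores the first value seen for a key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Video table in 1NF, as shown on Slide 17.
video = [
    {"TransID": 1, "VideoID": 1, "Copy#": 2, "Title": "2001: A Space Odyssey", "RentalFee": 1.50},
    {"TransID": 1, "VideoID": 6, "Copy#": 3, "Title": "Clockwork Orange", "RentalFee": 1.50},
    {"TransID": 2, "VideoID": 8, "Copy#": 1, "Title": "Hopscotch", "RentalFee": 1.50},
    {"TransID": 2, "VideoID": 2, "Copy#": 1, "Title": "Apocalypse Now", "RentalFee": 2.00},
    {"TransID": 2, "VideoID": 6, "Copy#": 1, "Title": "Clockwork Orange", "RentalFee": 1.50},
]

# Part of the key is enough for Title and RentalFee: partial dependence.
title_by_video = determines(video, ["VideoID"], ["Title", "RentalFee"])
# Neither part of the key alone fixes Copy# ...
copy_by_video = determines(video, ["VideoID"], ["Copy#"])
copy_by_trans = determines(video, ["TransID"], ["Copy#"])
# ... but the full combined key does: full dependence.
copy_by_key = determines(video, ["TransID", "VideoID"], ["Copy#"])
```

Video 6 appears with copy 3 in one transaction and copy 1 in another, which is why VideoID alone cannot determine Copy#.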

19 Second Normal Form (2NF)
• 2NF: A table is in 2NF if (a) it is in 1NF and (b) its non-key columns depend on the entire key.
• Normalization Procedure:
  • Move TransID and Copy# into a new table VideoRented.
  • Preserve the association between Video and VideoRented by replicating VideoID in table VideoRented.
Initial table: Video(TransID, VideoID, Copy#, Title, RentalFee)
New: VideoRented(TransID, VideoID, Copy#)
Resulting Video table: Video(VideoID, Title, RentalFee)
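The 2NF split can be sketched in Python as two projections of the 1NF Video table, followed by a join on VideoID to confirm that no information was lost. The function name project and the variable names are assumptions for illustration; the rows are the ones from Slide 17.

```python
def project(rows, columns):
    """Keep only the given columns and drop duplicate rows, preserving order."""
    seen, result = set(), []
    for row in rows:
        values = tuple(row[col] for col in columns)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(columns, values)))
    return result


# Video table in 1NF, with the combined key TransID + VideoID.
video_1nf = [
    {"TransID": 1, "VideoID": 1, "Copy#": 2, "Title": "2001: A Space Odyssey", "RentalFee": 1.50},
    {"TransID": 1, "VideoID": 6, "Copy#": 3, "Title": "Clockwork Orange", "RentalFee": 1.50},
    {"TransID": 2, "VideoID": 8, "Copy#": 1, "Title": "Hopscotch", "RentalFee": 1.50},
    {"TransID": 2, "VideoID": 2, "Copy#": 1, "Title": "Apocalypse Now", "RentalFee": 2.00},
    {"TransID": 2, "VideoID": 6, "Copy#": 1, "Title": "Clockwork Orange", "RentalFee": 1.50},
]

# Move TransID and Copy# to the new table and replicate VideoID in it.
video_rented = project(video_1nf, ["TransID", "VideoID", "Copy#"])
# What stays behind depends on VideoID alone.
video = project(video_1nf, ["VideoID", "Title", "RentalFee"])

# Joining the two tables on VideoID rebuilds the original rows.
video_by_id = {row["VideoID"]: row for row in video}
rejoined = [{**rented, **video_by_id[rented["VideoID"]]} for rented in video_rented]
```

Clockwork Orange is now stored once in Video instead of once per rental, and the join shows that the original table can still be recovered.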

20 Finalize 2NF…
Table Customer must also be brought into 2NF by moving TransID into table RentalTransaction (already there) and replicating CustomerID (see Slide 15).
Initial table: Customer(TransID, CustomerID, Phone, Name, Address, City, State, …)
Completed: RentalTransaction(TransID, RentDate, CustomerID)
Resulting Customer table: Customer(CustomerID, LastName, FirstName, Address, City, …)

21 Third Normal Form (3NF)
• Problem remaining after 2NF: If any non-key depends on some other non-key, there is transitive dependence and the table is not in 3NF.
• 3NF: A table is in 3NF if it is (a) in 2NF, and (b) each non-key attribute depends on the key only (on the key and nothing but the key).
• Our design is already in 3NF! Check it below:
Customer(CustomerID, LastName, FirstName, Address, City, …)
VideoRented(TransID, VideoID, Copy#)
Video(VideoID, Title, RentalFee)
RentalTransaction(TransID, RentDate, CustomerID)
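One way to convince yourself that the four tables lose nothing is to load them into SQLite and join them back into the original rental form. The sketch below uses Python's sqlite3 module with the sample rows from Slide 15. The column types, the trimmed Customer columns and the name CopyNo (standing in for Copy#, which would need quoting in SQL) are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    PRAGMA foreign_keys = ON;

    CREATE TABLE Customer (
        CustomerID INTEGER PRIMARY KEY, LastName TEXT, Phone TEXT, Address TEXT
    );
    CREATE TABLE Video (VideoID INTEGER PRIMARY KEY, Title TEXT, RentalFee REAL);
    CREATE TABLE RentalTransaction (
        TransID    INTEGER PRIMARY KEY,
        RentDate   TEXT,
        CustomerID INTEGER NOT NULL REFERENCES Customer(CustomerID)
    );
    CREATE TABLE VideoRented (
        TransID INTEGER NOT NULL REFERENCES RentalTransaction(TransID),
        VideoID INTEGER NOT NULL REFERENCES Video(VideoID),
        CopyNo  INTEGER,                 -- Copy# on the slides
        PRIMARY KEY (TransID, VideoID)
    );

    INSERT INTO Customer VALUES
        (3, 'Washington', '502-777-7575', '95 Easy Street'),
        (7, 'Lasater', '615-888-4474', '67 S. Ray Drive');
    INSERT INTO Video VALUES
        (1, '2001: A Space Odyssey', 1.50), (2, 'Apocalypse Now', 2.00),
        (6, 'Clockwork Orange', 1.50), (8, 'Hopscotch', 1.50);
    INSERT INTO RentalTransaction VALUES (1, '4/18/02', 3), (2, '4/30/02', 7);
    INSERT INTO VideoRented VALUES (1, 1, 2), (1, 6, 3), (2, 8, 1), (2, 2, 1), (2, 6, 1);
    """
)

# Rebuild the flat rental form from the four normalized tables.
rental_form = conn.execute(
    """
    SELECT t.TransID, t.RentDate, c.LastName, v.Title, r.CopyNo, v.RentalFee
    FROM RentalTransaction AS t
    JOIN Customer    AS c ON c.CustomerID = t.CustomerID
    JOIN VideoRented AS r ON r.TransID = t.TransID
    JOIN Video       AS v ON v.VideoID = r.VideoID
    ORDER BY t.TransID, v.VideoID
    """
).fetchall()

customer_rows = conn.execute("SELECT COUNT(*) FROM Customer").fetchone()[0]
video_rows = conn.execute("SELECT COUNT(*) FROM Video").fetchone()[0]
```

The join returns the five rental lines of the original form, while each customer and each video title is stored only once.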

22 3NF Example
Table in 2NF: Sale(SaleID, CustomerID, SalespersonID, SalespersonRank…)
Violation of 3NF: SalespersonRank (non-key) depends on SalespersonID, not on SaleID.
Solution: split the table into 2 tables:
Sale(SaleID, CustomerID, SalespersonID)
Salesperson(SalespersonID, SalespersonRank)
Normal forms beyond the third are rarely needed, so reaching 3NF is sufficient for most practical purposes.
When we say “create schema”, we mean “create tables that are in 3NF”.
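The effect of the split in the Sale example can be sketched with Python's sqlite3 module. The sample rows, the rank values and the table name Sale2NF are invented for illustration; only the column names come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- 2NF version: SalespersonRank depends on SalespersonID, a non-key column.
    CREATE TABLE Sale2NF (
        SaleID INTEGER PRIMARY KEY, CustomerID INTEGER,
        SalespersonID INTEGER, SalespersonRank TEXT
    );
    INSERT INTO Sale2NF VALUES
        (1, 10, 7, 'Junior'), (2, 11, 7, 'Junior'), (3, 10, 8, 'Senior');

    -- 3NF version: the rank moves to its own table, keyed by SalespersonID.
    CREATE TABLE Salesperson (SalespersonID INTEGER PRIMARY KEY, SalespersonRank TEXT);
    INSERT INTO Salesperson
        SELECT DISTINCT SalespersonID, SalespersonRank FROM Sale2NF;
    CREATE TABLE Sale (
        SaleID INTEGER PRIMARY KEY, CustomerID INTEGER,
        SalespersonID INTEGER REFERENCES Salesperson(SalespersonID)
    );
    INSERT INTO Sale SELECT SaleID, CustomerID, SalespersonID FROM Sale2NF;
    """
)

# Promote salesperson 7 in both designs and count the rows that had to change.
rows_changed_2nf = conn.execute(
    "UPDATE Sale2NF SET SalespersonRank = 'Senior' WHERE SalespersonID = 7"
).rowcount
rows_changed_3nf = conn.execute(
    "UPDATE Salesperson SET SalespersonRank = 'Senior' WHERE SalespersonID = 7"
).rowcount

# Joining the 3NF tables gives back the same information as the 2NF table.
joined = conn.execute(
    """
    SELECT s.SaleID, s.CustomerID, s.SalespersonID, p.SalespersonRank
    FROM Sale AS s JOIN Salesperson AS p ON p.SalespersonID = s.SalespersonID
    ORDER BY s.SaleID
    """
).fetchall()
flat = conn.execute("SELECT * FROM Sale2NF ORDER BY SaleID").fetchall()
```

In the 2NF table the promotion rewrites one row per sale made by that salesperson; after the split it is a single-row change, and the join still reproduces the original table.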

23 Simplified Schema for VSTPS Using Different Key Design
Customer(CustomerID, LastName, FirstName, Address, City, …)
Video(VideoID, Title, RentalFee)
RentalTransaction(TransID, CustomerID, VideoID, RentDate)
Multiplicities: Customer 1 to * RentalTransaction; Video 1 to * RentalTransaction.
Note: The Video key can be made unique per copy: VideoID = 85.1 (the decimal place designates a copy), or 85c1 (text type), or use a bar code for each video and copy (ItemID).

24 Summary of Normal Forms (Must know by heart!)
1) If a table has repeating sections, there is huge redundancy, different classes are mixed together, and all anomalies occur. Split the table, so that classes are clearly differentiated. Result: 1NF.
2) If a table has a combined key, non-key columns may depend on just a part of the primary key, and so there is partial functional dependency. Split the table so that in new tables non-keys depend on the entire key. Result: 2NF.
3) If a non-key depends on another non-key, there is transitive dependency. Split the table so that in new tables each non-key depends on the key and nothing but the key. Result: 3NF.

1NF: A table is in 1NF if it does not have repeating sections.
2NF: A table is in 2NF if it is in 1NF and non-key columns depend on the entire key.
3NF: A table is in 3NF if it is in 2NF and all non-key columns depend on the key only.

