Get data properly tabled!


1 Get data properly tabled!
University of Manitoba, Asper School of Business
3500 DBMS, Bob Travica
Chapter 3: Data Normalization
Get data properly tabled!
Based on G. Post, DBMS: Designing & Building Business Applications
Updated 2019

2 Normalization – Technical Concept
Normalization is the process of putting data into the format of relational databases, that is, organizing data into correctly designed tables. Tables should be designed so that:
a) problems (anomalies) with insertion, deletion and modification of data are avoided
b) redundancy is reduced
c) data quality is preserved (accuracy, completeness, consistency)

3 Normalization – Practitioner’s Concept
Tables should be designed so that:
- business entities and their classes are clearly differentiated into separate tables
- master data are separated from transactional data
- tables are properly connected
- all variation in data is traceable
Example: Student <> Course <> Registration
You are already familiar with this!

4 Relational Database Terminology
Relational database: A collection of tables (relations). Tables store atomic data.
Table: A collection of columns (attributes, properties, fields) describing an entity (class). A table is also a collection of rows (records), each with the same number of columns. Each row represents an object (an instance of a class).
[Figure: table Employee for the entity (class) Employee, with columns (attributes/properties) EmployeeID, TaxpayerID, LastName, FirstName, HomePhone, Address and four rows (objects): Cartom, Abdul; Venetiaan, Roland; Johnson, John; Stenheim, Susan.]

5 Relational Database Terminology – Primary Key
Every table has a primary key (key): an attribute that uniquely identifies each row (e.g., EmployeeID on the previous slide).
A primary key can span more than one column; it is then called a combined (composite, concatenated) key. Example: OrderItem(OrderID, ItemID, Quantity), where OrderID and ItemID together form the key.
Other attributes are called non-key columns. A non-key column depends on the key.
A primary key can be generated automatically by the DBMS: a surrogate key.
Note: Watch for data types (e.g., number vs. text) and naming rules (arbitrary but consistent).
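
The two kinds of key described on this slide can be tried out in a few lines. The following is a minimal sketch using Python's built-in sqlite3 module and an in-memory database; the Employee table is cut down to three columns and the inserted values are illustrative only. It relies on SQLite's rule that an INTEGER PRIMARY KEY column is filled in automatically when no value is supplied, which serves here as the surrogate key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Combined (composite) key: OrderID + ItemID together identify a row.
    CREATE TABLE OrderItem (
        OrderID  INTEGER NOT NULL,
        ItemID   INTEGER NOT NULL,
        Quantity INTEGER NOT NULL,
        PRIMARY KEY (OrderID, ItemID)
    );
    -- Surrogate key: SQLite assigns EmployeeID when the insert omits it.
    CREATE TABLE Employee (
        EmployeeID INTEGER PRIMARY KEY,
        LastName   TEXT NOT NULL,
        FirstName  TEXT NOT NULL
    );
""")

conn.execute("INSERT INTO OrderItem VALUES (1, 10, 2)")
conn.execute("INSERT INTO OrderItem VALUES (1, 11, 1)")  # same order, other item: allowed

try:
    conn.execute("INSERT INTO OrderItem VALUES (1, 10, 5)")  # repeats key (1, 10)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

cur = conn.execute(
    "INSERT INTO Employee (LastName, FirstName) VALUES (?, ?)", ("Cartom", "Abdul")
)
generated_id = cur.lastrowid  # the key value the DBMS generated
```

Neither OrderID nor ItemID is unique on its own in OrderItem; only the pair is, which is why the second insert succeeds and the third is rejected.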

6 Relational Database Shorthand Notation
Notation: the table name comes first; inside the parentheses, the primary key (underlined on the slide) is listed before the non-key columns.
Customer(CustomerID, LastName, FirstName, Address, City, State, ZipPostalCode, TelephoneNumber)
* Note: Telephone number can be used as a “backup key” or “secondary key”. But the question today is which telephone number (landline or cell, and which cell number?). A separate table for telephone numbers may be needed.
* Shorthand notation is good for analysis but not for official diagrams. Do not use it in your assignments and exams.

7 Transformation of Class Diagram into Schema
When the class diagram is good, 80% of normalization is done!
- A class becomes a table.
- An association class becomes a table with 2 associations.
- FKs are created (they complement PKs to support associations).
- Multiplicity is shown via PK-FK links (FK = “many” side).
- Association names disappear.
[Figure: class diagram (non-normalized) with classes Customer, Order, Salesperson and Item, associations places, handles and contains with 1-to-* multiplicities, and association class OrderItem; next to it, the schema (normalized tables diagram) with tables Customer, Order, Salesperson, Item and OrderItem linked 1-to-*.]

8 Shorthand Notation for Schema: Foreign Key Creation
Customer(CustomerID, Name, Address, City, Phone)
Salesperson(EmployeeID, Name, DateHired)
Order(OrderID, OrderDate, CustomerID, EmployeeID)
OrderItem(OrderID, ItemID, Quantity)
Item(ItemID, Description, ListPrice)
Foreign key (FK): an attribute that is a (primary) key in another table, e.g., CustomerID in Order; FKs are encircled on the slide.
Logic and naming of OrderItem: it replaces the Order-Item M:N relationship with two 1:M relationships. Another common name is OrderDetail. The OrderItem key is a combination of FKs (OrderID + ItemID).
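
As a rough illustration of how this shorthand maps to table definitions, here is a sketch in Python with the built-in sqlite3 module. The column types, the NOT NULL constraints and the sample values are assumptions added for the example. Two SQLite specifics matter: foreign keys are only enforced after `PRAGMA foreign_keys = ON`, and the table name Order has to be quoted because ORDER is an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite ignores FKs unless this is set
conn.executescript("""
    CREATE TABLE Customer (
        CustomerID INTEGER PRIMARY KEY, Name TEXT, Address TEXT, City TEXT, Phone TEXT);
    CREATE TABLE Salesperson (
        EmployeeID INTEGER PRIMARY KEY, Name TEXT, DateHired TEXT);
    CREATE TABLE Item (
        ItemID INTEGER PRIMARY KEY, Description TEXT, ListPrice REAL);
    -- The FKs sit on the "many" side of each 1:M link.
    CREATE TABLE "Order" (
        OrderID    INTEGER PRIMARY KEY,
        OrderDate  TEXT,
        CustomerID INTEGER NOT NULL REFERENCES Customer(CustomerID),
        EmployeeID INTEGER NOT NULL REFERENCES Salesperson(EmployeeID)
    );
    -- OrderItem replaces the Order-Item M:N link; its key combines two FKs.
    CREATE TABLE OrderItem (
        OrderID  INTEGER NOT NULL REFERENCES "Order"(OrderID),
        ItemID   INTEGER NOT NULL REFERENCES Item(ItemID),
        Quantity INTEGER NOT NULL,
        PRIMARY KEY (OrderID, ItemID)
    );
""")

# Illustrative rows only.
conn.execute("INSERT INTO Customer (CustomerID, Name) VALUES (?, ?)", (1, "Washington"))
conn.execute("INSERT INTO Salesperson (EmployeeID, Name) VALUES (?, ?)", (1, "Cartom"))
conn.execute('INSERT INTO "Order" VALUES (?, ?, ?, ?)', (1, "2019-01-15", 1, 1))

try:
    # CustomerID 99 does not exist, so the FK check should refuse this order.
    conn.execute('INSERT INTO "Order" VALUES (?, ?, ?, ?)', (2, "2019-01-16", 99, 1))
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```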

9 NORMALIZATION Process GET IN TABLES!

10 Video Store * Transaction Processing System (VSTPS): Classes
Master data (“static”): market and inventory entities that don’t change often
- Customer table. Key: CustomerID. Attributes: Name, Address, Phone
- Video table. Key: VideoID. Attributes: Title, RentalFee, Rating…
Transaction data (“dynamic”): operations entities that change more often
- RentalTransaction table. Key: TransactionID. Attributes: CustomerID, Date
- VideoRented table. Key: TransactionID + VideoID. Attributes: Copy#
* Video refers to any storage tech, such as Blu-ray disks, DVD games, etc.

11 Business Rules and Class Diagram for VSTPS
Business rules for multiplicity:
- A customer can have many rental transactions, each being for a specific customer.
- A transaction can include many video titles, and a title is in many transactions.
- A transaction can include only one copy of a video title (specific copy number).
[Figure: class diagram with classes Customer, RentalTransaction and VideoTitle, associations has and includes with 1-to-* multiplicities, and association class VideoCopy Rented.]

12 Schema for VSTPS
You can create a schema almost entirely based on your knowledge of data analysis and multiplicity; Table 3 may be a bit tricky to figure out.
Tbl 1 Customer(CustomerID, LastName, FirstName, Address, City, …)
Tbl 2 RentalTransaction(TransID, RentDate, CustomerID)
Tbl 3 VideoRented(TransID, VideoID, Copy#)
Tbl 4 Video(VideoID, Title, RentalFee)
Multiplicities: Customer 1-* RentalTransaction, RentalTransaction 1-* VideoRented, VideoRented *-1 Video. Tables 2 and 3 hold the transaction data.
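
To see that the four tables still carry everything the rental form shows, the schema can be loaded and joined back together. The sketch below uses Python's sqlite3 module; the rows are the ones that appear in the rental form example later in the deck, the column types are assumptions, and Copy# is renamed CopyNo because `#` is awkward in an SQL identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, LastName TEXT);
    CREATE TABLE Video (VideoID INTEGER PRIMARY KEY, Title TEXT, RentalFee REAL);
    CREATE TABLE RentalTransaction (
        TransID    INTEGER PRIMARY KEY,
        RentDate   TEXT,
        CustomerID INTEGER REFERENCES Customer(CustomerID)
    );
    CREATE TABLE VideoRented (
        TransID INTEGER REFERENCES RentalTransaction(TransID),
        VideoID INTEGER REFERENCES Video(VideoID),
        CopyNo  INTEGER,                 -- Copy# in the slides
        PRIMARY KEY (TransID, VideoID)   -- one copy of a title per transaction
    );
""")
conn.executemany("INSERT INTO Customer VALUES (?, ?)",
                 [(3, "Washington"), (7, "Lasater")])
conn.executemany("INSERT INTO Video VALUES (?, ?, ?)",
                 [(2, "Apocalypse Now", 2.00), (6, "Clockwork Orange", 1.50),
                  (8, "Hopscotch", 1.50)])
conn.executemany("INSERT INTO RentalTransaction VALUES (?, ?, ?)",
                 [(1, "4/18/02", 3), (2, "4/30/02", 7)])
conn.executemany("INSERT INTO VideoRented VALUES (?, ?, ?)",
                 [(1, 6, 3), (2, 8, 1), (2, 2, 1), (2, 6, 1)])

# Follow the PK-FK links to rebuild the rows of the original form.
rental_form = conn.execute("""
    SELECT t.TransID, t.RentDate, c.LastName, v.Title, r.CopyNo, v.RentalFee
    FROM RentalTransaction AS t
    JOIN Customer    AS c ON c.CustomerID = t.CustomerID
    JOIN VideoRented AS r ON r.TransID    = t.TransID
    JOIN Video       AS v ON v.VideoID    = r.VideoID
    ORDER BY t.TransID, v.VideoID
""").fetchall()
```

Each customer name and each title is stored once, yet the join returns one row per rented video, just as the form does.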

13 Normalization Process
To arrive at a class diagram and then a schema, start with the system requirements task (talk with users, understand the output that exists and that is needed in the future).
Which classes does this electronic form indicate? What are the possible class associations, attributes, keys?
Classes indicated (min.): CUSTOMER, TRANSACTION, VIDEO. These classes translate into 3 tables.
Put together, it is a big chunk of data, shown below in a mock-up (incorrect) table.
RentalForm(TransID, RentDate, (CustomerID, Name, Address, City, State, …), (VideoID, Copy#, Title, RentalFee))

14 Why Normalize – Avoiding Data Anomalies
How do we get to proper tables using normalization logic? Why not use that one table RentalForm?
RentalForm(TransID, RentDate, (CustomerID, Name, Address, City, State, …), (VideoID, Copy#, Title, RentalFee))
It is a poor design because:
- Master data (Customer, Video) repeat for each transaction: high redundancy.
- Deleting transaction data also deletes master data, and the reverse. Deletion anomaly: the target data cannot be deleted without deleting more (or less) than wanted.
- A new customer can’t be added without adding a new video, and the reverse. Insertion anomaly: data can’t be added without corrupting (faking) other data.
- To change a customer name, all of that customer’s records must be rewritten. Update anomaly: data can’t be updated in just a single master record.
Conclusion: From the normalization perspective, data must be properly designed in order to avoid CRUD* anomalies and reduce redundancy.
* Data reading (retrieval) is not included in anomalies, but it is apparent that the single-class design above causes major retrieval problems.
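
The update and deletion anomalies can be reproduced on a small scale. This sketch stores the form as one flat table in an in-memory SQLite database. The column subset, the CopyNo spelling of Copy# and the new surname are assumptions for the example; the rows come from the rental form example elsewhere in the deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE RentalForm (
        TransID INTEGER, RentDate TEXT, CustomerID INTEGER, LastName TEXT,
        VideoID INTEGER, CopyNo INTEGER, Title TEXT, RentalFee REAL)
""")
conn.executemany("INSERT INTO RentalForm VALUES (?, ?, ?, ?, ?, ?, ?, ?)", [
    (1, "4/18/02", 3, "Washington", 6, 3, "Clockwork Orange", 1.50),
    (2, "4/30/02", 7, "Lasater", 8, 1, "Hopscotch", 1.50),
    (2, "4/30/02", 7, "Lasater", 2, 1, "Apocalypse Now", 2.00),
    (2, "4/30/02", 7, "Lasater", 6, 1, "Clockwork Orange", 1.50),
])

# Update anomaly: one customer's name is repeated in every row of the rental.
cur = conn.execute(
    "UPDATE RentalForm SET LastName = ? WHERE CustomerID = ?", ("Lasater-Smith", 7)
)
rows_rewritten = cur.rowcount

# Deletion anomaly: removing transaction 1 also removes every trace of customer 3.
conn.execute("DELETE FROM RentalForm WHERE TransID = 1")
customer_3_rows = conn.execute(
    "SELECT COUNT(*) FROM RentalForm WHERE CustomerID = 3"
).fetchone()[0]
```

With a separate Customer table the rename would touch one row, and deleting a transaction would leave the customer record in place.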

15 Normalization Process
Normalization is a process of creating tables that are free from data anomalies. Practically, create master and transactional tables, and connect them via PK-FK associations. Each M:N relationship must be replaced by two 1:M relationships (M, N = “many”). It is a split & relate process.
Split: RentalForm is split into 3 tables: Customer, Video, RentalTransaction.
Relate: the relationships in the data are preserved via keys (e.g., CustomerID is exported from RentalForm to Customer as the PK and to RentalTransaction as an FK).
A rental transaction contains M videos, and each video is rented M times, so tracking physical copies is necessary.
[Figure: tables RentalTransaction, Customer and Video linked with 1-to-* multiplicities.]

16 Normalization Process (cont.)
- To track each copy of a video, RentalTransaction is further split.
- The new table VideoRented stays connected by exporting RentalTransactionID into VideoRented.
- Multiplicity changes (encircled 1s on the slide): one rental transaction can include M videos, but just 1 copy of each.
A rental transaction contains a particular copy of each video rented.
[Figure: Customer 1-* RentalTransaction 1-* VideoRented (copy#) *-1 Video.]

17 Three Normal Forms
There are three normal forms we study:
- First normal form (1NF)
- Second normal form (2NF)
- Third normal form (3NF)
The goal of the normalization process is to arrive at 3NF, which suffers no data anomaly. Tables in 3NF are normalized (except for some very rare cases).
Another way of thinking about normalization: in each normalized table, the values of the non-key columns are fully functionally dependent on the values of the key column (PK).

18 Functional Dependence Between the Key and Other Columns
The key column must be sufficient for determining the values of the non-key columns. An attribute depends on another attribute if a change of its value is caused by a change of that other attribute’s value.
From this perspective, the goal of the normalization process is to establish full functional dependence between the key and non-key columns, driving out the other dependencies (partial functional dependence and transitive dependence).
- Full functional dependence: the key determines the non-key values.
- Partial functional dependence: part of the key determines some non-key values.
- Transitive dependence: a non-key attribute determines another non-key attribute.
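
One way to make "determines" concrete is to test it against sample rows: a set of columns determines another if equal values on the left never come with different values on the right. The helper `determines` below is a hypothetical name introduced for this sketch, and the rows come from the rental form example in the deck (Copy stands for Copy#). A check like this can only refute a dependence on the data at hand; whether a dependence really holds is decided by the business rules.

```python
def determines(rows, lhs, rhs):
    """Return True if, in these rows, the lhs columns determine the rhs columns."""
    seen = {}
    for row in rows:
        left = tuple(row[col] for col in lhs)
        right = tuple(row[col] for col in rhs)
        # setdefault stores the first right-hand value seen for this left-hand value.
        if seen.setdefault(left, right) != right:
            return False  # same left-hand value, different right-hand value
    return True


rows = [
    {"TransID": 1, "VideoID": 6, "Copy": 3, "Title": "Clockwork Orange", "RentalFee": 1.50},
    {"TransID": 2, "VideoID": 8, "Copy": 1, "Title": "Hopscotch", "RentalFee": 1.50},
    {"TransID": 2, "VideoID": 2, "Copy": 1, "Title": "Apocalypse Now", "RentalFee": 2.00},
    {"TransID": 2, "VideoID": 6, "Copy": 1, "Title": "Clockwork Orange", "RentalFee": 1.50},
]

# The whole key determines Copy: full functional dependence.
whole_key_to_copy = determines(rows, ["TransID", "VideoID"], ["Copy"])
# Part of the key is enough for Title and RentalFee: partial functional dependence.
video_to_title_fee = determines(rows, ["VideoID"], ["Title", "RentalFee"])
# VideoID alone does not fix Copy (video 6 appears with copies 3 and 1).
video_to_copy = determines(rows, ["VideoID"], ["Copy"])
```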

19 First Normal Form (1NF)
1NF: A table is in 1NF if it does not have repeating sections.
Normalization procedure: Remove repeating sections by splitting the initial table into new tables. This is similar to recognizing classes and transforming them into tables. After removing repeating sections, ask: can the values of non-key attributes be predicted from the key values?
New tables from RentalForm:
Customer(CustomerID, Phone, Name, Address, City, State)
Video(VideoID, Copy#, Title, RentalFee) *
Remainder of table RentalForm: RentalTransaction(TransID, RentDate) + CustomerID (FK)
* Copy# causes Title and RentalFee to repeat for each copy, so Video is split again.
New table: VideoRented(VideoID, Copy#) + TransID (FK)
Remainder of table Video: Video(VideoID, Title, RentalFee)
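
Removing repeating sections can be pictured as flattening a nested record. In the sketch below, one rental form is held as a Python dictionary whose Videos list is the repeating section; the function `remove_repeating_sections` is a hypothetical helper that produces rows for the four tables named on this slide. The dictionary layout and the Copy spelling of Copy# are assumptions; the values come from the rental form example in the deck.

```python
form = {
    "TransID": 2,
    "RentDate": "4/30/02",
    "Customer": {"CustomerID": 7, "LastName": "Lasater"},
    "Videos": [  # the repeating section
        {"VideoID": 8, "Copy": 1, "Title": "Hopscotch", "RentalFee": 1.50},
        {"VideoID": 2, "Copy": 1, "Title": "Apocalypse Now", "RentalFee": 2.00},
        {"VideoID": 6, "Copy": 1, "Title": "Clockwork Orange", "RentalFee": 1.50},
    ],
}


def remove_repeating_sections(form):
    """Split one rental form into rows for Customer, RentalTransaction, Video, VideoRented."""
    customer = dict(form["Customer"])
    # What remains of the form keeps CustomerID as a foreign key.
    transaction = {
        "TransID": form["TransID"],
        "RentDate": form["RentDate"],
        "CustomerID": customer["CustomerID"],
    }
    # Title and RentalFee belong to the video title, not to the rented copy.
    videos = [
        {"VideoID": v["VideoID"], "Title": v["Title"], "RentalFee": v["RentalFee"]}
        for v in form["Videos"]
    ]
    # The copy rented is recorded per transaction and video.
    rented = [
        {"TransID": form["TransID"], "VideoID": v["VideoID"], "Copy": v["Copy"]}
        for v in form["Videos"]
    ]
    return customer, transaction, videos, rented


customer, transaction, videos, rented = remove_repeating_sections(form)
```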

20 Anomalies with Repeating Sections
RentalForm(TransID, RentDate, (CustomerID, Phone, Name, Address, City, State, …), (VideoID, Copy#, Title, RentFee))
The parenthesized groups are the repeating sections.

TransID  RentDate  CustomerID  LastName    Phone  Address         VideoID  Copy#  Title                  RentFee
1        4/18/02   3           Washington  …      … Easy Street   …        …      2001: A Space Odyssey  $1.50
1        4/18/02   3           Washington  …      … Easy Street   6        3      Clockwork Orange       $1.50
2        4/30/02   7           Lasater     …      … S. Ray Drive  8        1      Hopscotch              $1.50
2        4/30/02   7           Lasater     …      … S. Ray Drive  2        1      Apocalypse Now         $2.00
2        4/30/02   7           Lasater     …      … S. Ray Drive  6        1      Clockwork Orange       $1.50

Repeating groups cause:
- high redundancy
- update anomaly (must run through all records to make the update)
- insertion anomaly, which causes wrong data (fake CustomerID if a new video is added)
- deletion anomaly (can’t delete just the wanted data without also deleting unwanted data)
If there are repeating sections, the table is not in the first normal form (1NF).

21 Second Normal Form (2NF)
2NF: A table is in 2NF if (a) it is in 1NF and (b) its non-key columns depend on the entire key. The 2NF test applies only to tables whose key has two or more attributes, i.e., a combined (concatenated) key.
Suppose that table Video is as below.
Video(TransID, VideoID, Copy#, Title, RentalFee)
- TransID and VideoID combined determine Copy# (fits 2NF). The Copy# can be predicted from a combination of TransID and VideoID: full functional dependence.
- VideoID alone is sufficient to determine Title and RentalFee (violates 2NF). Therefore, there is partial functional dependence, a violation of 2NF.

22 Second Normal Form (2NF), cont’d
If any non-key column depends on just a part of the key, there is partial functional dependence and the table is not in 2NF.
Solution: Split tables and get the results as on slide 16.
Video(VideoID, Title, RentalFee)
VideoRented(TransID, VideoID, Copy#)
* Neither TransID nor VideoID on its own can predict the Copy# of a rented video title.
** It is true that the combination of TransID and VideoID suffices for predicting values of all other attributes, but TransID is not really needed for predicting Title and RentalFee because these are predictable from VideoID.
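
The 2NF split can be checked with plain projections. In this sketch each row of the un-split table is a tuple (TransID, VideoID, Copy#, Title, RentalFee), with values taken from the rental form example in the deck. Projecting onto the two new tables and joining them again on VideoID should give back exactly the original rows, which suggests that the split loses nothing.

```python
# Rows of Video(TransID, VideoID, Copy#, Title, RentalFee) before the split.
rows = [
    (1, 6, 3, "Clockwork Orange", 1.50),
    (2, 8, 1, "Hopscotch", 1.50),
    (2, 2, 1, "Apocalypse Now", 2.00),
    (2, 6, 1, "Clockwork Orange", 1.50),
]

# Video: the columns that depend on VideoID alone; the set removes the repeats.
video = {(vid, title, fee) for (_trans, vid, _copy, title, fee) in rows}

# VideoRented: the column that needs the whole key TransID + VideoID.
video_rented = {(trans, vid, copy) for (trans, vid, copy, _title, _fee) in rows}

# Joining the two tables on VideoID rebuilds the original rows.
rejoined = {
    (trans, vid, copy, title, fee)
    for (trans, vid, copy) in video_rented
    for (video_id, title, fee) in video
    if vid == video_id
}
```

Clockwork Orange and its fee now appear once in `video` instead of once per rental.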

23 Third Normal Form (3NF)
3NF: A table is in 3NF if it is (a) in 2NF, and (b) each non-key attribute depends on the key only (on the key and nothing but the key). If any non-key depends on some other non-key, there is transitive dependence and the table is not in 3NF.
Our design is already in 3NF! Check it:
Customer(CustomerID, LastName, FirstName, Address, City, …)
VideoRented(TransID, VideoID, Copy#)
Video(VideoID, Title, RentalFee)
RentalTransaction(TransID, RentDate, CustomerID)
Why do we have to split transaction data so finely and not just have one class as shown below?
RentalTransaction(TransID, VideoID, Copy#, RentDate, CustomerID)
T1, VA, Co1, DATE1, C1
T1, VB, Co1, DATE1, C1
T1, VA, Co2, DATE1, C1 x (invalid: a transaction can include only one copy of a title)
Run the anomalies test:
- Data insertion: New transaction records can be inserted, whether by adding more videos to the same transaction or by adding new transactions for the same or different customers. So there is no insertion anomaly.
- Data deletion: If a transaction needs to be deleted, the user would have to go through each video record of that transaction in order to delete it, instead of deleting just one transaction record that would pull the deletion of the associated records referencing video titles (and their copies). So there is a deletion anomaly.
- Data update: If a date is incorrectly entered in one or more transaction records, each record would have to be accessed to ensure the date is correct. So there is an update anomaly.
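
In the split design, deleting one transaction record can pull the deletion of its video records. One common way to get this behavior is a foreign key declared with ON DELETE CASCADE; the slides do not name a mechanism, so treat the clause as an assumption of this sketch. It uses Python's sqlite3 module, rows from the rental form example, and CopyNo in place of Copy#.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for cascades in SQLite
conn.executescript("""
    CREATE TABLE RentalTransaction (
        TransID INTEGER PRIMARY KEY, RentDate TEXT, CustomerID INTEGER);
    CREATE TABLE VideoRented (
        TransID INTEGER NOT NULL
            REFERENCES RentalTransaction(TransID) ON DELETE CASCADE,
        VideoID INTEGER NOT NULL,
        CopyNo  INTEGER,
        PRIMARY KEY (TransID, VideoID)
    );
""")
conn.executemany("INSERT INTO RentalTransaction VALUES (?, ?, ?)",
                 [(1, "4/18/02", 3), (2, "4/30/02", 7)])
conn.executemany("INSERT INTO VideoRented VALUES (?, ?, ?)",
                 [(1, 6, 3), (2, 8, 1), (2, 2, 1), (2, 6, 1)])

# One DELETE on the parent row; the three child rows of transaction 2 follow.
conn.execute("DELETE FROM RentalTransaction WHERE TransID = 2")
remaining = conn.execute(
    "SELECT TransID, VideoID, CopyNo FROM VideoRented"
).fetchall()
```

The date of a transaction is likewise stored once in RentalTransaction, so correcting it is a single-row update.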

24 3NF Violation Examples
Tables in 2NF but with transitive dependence:
RentalTransaction(TransID, RentDate, CustomerID, CustomerStanding)
Solution: put CustomerStanding in table Customer.
Sale(SaleID, CustomerID, SalespersonID, SalespersonRank…)
Solution: split into Sale(SaleID, CustomerID, SalespersonID) and Salesperson(SalespersonID, SalespersonRank).
Normal forms beyond the third are very rarely needed, and therefore reaching 3NF is sufficient for most practical purposes. When we say “create schema”, we mean “create tables that are in 3NF”.
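
The Sale example can be sketched the same way. Once SalespersonRank lives in Salesperson, a change of rank is a one-row update and every sale picks it up through the FK. The rows and rank labels below are hypothetical values made up for the illustration, and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The rank depends on the salesperson, not on the sale, so it is stored here.
    CREATE TABLE Salesperson (
        SalespersonID INTEGER PRIMARY KEY, SalespersonRank TEXT);
    CREATE TABLE Sale (
        SaleID        INTEGER PRIMARY KEY,
        CustomerID    INTEGER,
        SalespersonID INTEGER REFERENCES Salesperson(SalespersonID)
    );
""")
conn.executemany("INSERT INTO Salesperson VALUES (?, ?)",
                 [(1, "Junior"), (2, "Senior")])
conn.executemany("INSERT INTO Sale VALUES (?, ?, ?)",
                 [(100, 3, 1), (101, 7, 1), (102, 3, 2)])

# Promote salesperson 1: a single row changes, whatever the number of sales.
cur = conn.execute(
    "UPDATE Salesperson SET SalespersonRank = ? WHERE SalespersonID = ?", ("Senior", 1)
)
rows_touched = cur.rowcount

ranks_by_sale = conn.execute("""
    SELECT s.SaleID, p.SalespersonRank
    FROM Sale AS s
    JOIN Salesperson AS p ON p.SalespersonID = s.SalespersonID
    ORDER BY s.SaleID
""").fetchall()
```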

25 Summary of Normal Forms (Must know by heart!)
1) If a table has repeating sections, there is huge redundancy, different classes are mixed together, and all anomalies occur. Split the table so that classes are clearly differentiated. Result: 1NF.
1NF: A table is in 1NF if it does not have repeating sections.
2) If a table has a combined key, non-key columns may depend on just a part of the primary key, and so there is partial functional dependency. Split the table so that in the new tables non-keys depend on the entire key. Result: 2NF.
2NF: A table is in 2NF if it is in 1NF and non-key columns depend on the entire combined key.
3) If a non-key depends on another non-key, there is transitive dependency. Split the table so that in the new tables each non-key depends on the key and nothing but the key. Result: 3NF.
3NF: A table is in 3NF if it is in 2NF and all non-key columns depend on the key only.

