1
Database Design Using Normalization
David M. Kroenke and David J. Auer Database Processing: Fundamentals, Design, and Implementation Chapter Four: Database Design Using Normalization
2
Chapter Objectives
- To design updatable databases to store data received from another source
- To use SQL to access table structure
- To understand the advantages and disadvantages of normalization
- To understand denormalization
- To design read-only databases to store data from updatable databases
KROENKE AND AUER - DATABASE PROCESSING, 14th Edition © 2016 Pearson Prentice Hall
3
Chapter Objectives
- To recognize and be able to correct common design problems:
  - The multivalue, multicolumn problem
  - The inconsistent values problem
  - The missing values problem
  - The general-purpose remarks column problem
4
Chapter Premise
- We have received one or more tables of existing data.
- The data is to be stored in a new database.
- QUESTION: Should the data be stored as received, or should it be transformed for storage?
5
How Many Tables?
SKU_DATA (SKU, SKU_Description, Buyer)
BUYER (Buyer, Department)
Where SKU_DATA.Buyer must exist in BUYER.Buyer
Should we store these two tables as they are, or should we combine them into one table in our new database?
6
Normal Forms Review: 1NF and 2NF
1NF: Eliminate repeating groups.
- Make a separate table for each set of related attributes, and give each table a primary key.
2NF: Eliminate redundant data.
- Each non-key attribute must be functionally dependent on the entire primary key.
- If an attribute depends on only part of a composite key, move it to a separate table.
7
Normal Forms Review: 3NF and BCNF
3NF: Eliminate columns not dependent on the key.
- If attributes do not contribute to a description of the key, move them to a separate table.
- Any transitive dependencies are moved into a smaller table.
BCNF: Every determinant in the table is a candidate key.
- If there are non-trivial dependencies between candidate key attributes, separate them out into distinct tables.
The normal forms are cumulative: a model in 3NF is by definition also in 2NF and 1NF.
8
Another Example
9
Putting a Relation into BCNF: EQUIPMENT_REPAIR
10
Step 1: Is the Table in 1NF?
- A quick scan of the table suggests it is in 1NF.
- Even though a primary key is not identified, one could be determined.
- Remember: since no two rows in a relation can be identical, the composite of all the attributes is always a candidate for the primary key.
11
Identify Functional Dependencies
EQUIPMENT_REPAIR (ItemNumber, Type, AcquisitionCost, RepairNumber, RepairDate, RepairAmount)
FD:
ItemNumber → (Type, AcquisitionCost)
RepairNumber → (ItemNumber, Type, AcquisitionCost, RepairDate, RepairAmount)
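When the data has already been received, a candidate functional dependency can be checked against the stored rows: group by the determinant and look for any group with more than one distinct value of the dependent column. The following is a minimal sketch using Python's built-in sqlite3 module. The sample rows are invented for illustration (they are not the textbook's data), and `fd_holds` is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE EQUIPMENT_REPAIR (
        ItemNumber INTEGER, Type TEXT, AcquisitionCost REAL,
        RepairNumber INTEGER, RepairDate TEXT, RepairAmount REAL)
""")
# Illustrative rows only; these values are assumptions, not source data.
conn.executemany(
    "INSERT INTO EQUIPMENT_REPAIR VALUES (?, ?, ?, ?, ?, ?)",
    [
        (100, "Drill Press", 3500.0, 2000, "2015-05-05", 375.0),
        (200, "Lathe", 4750.0, 2100, "2015-05-07", 255.0),
        (100, "Drill Press", 3500.0, 2200, "2015-06-19", 178.0),
    ],
)

def fd_holds(conn, table, determinant, dependent):
    """Return True if no stored row contradicts determinant -> dependent."""
    # Identifiers are interpolated directly, so pass trusted names only.
    sql = (
        f"SELECT {determinant} FROM {table} "
        f"GROUP BY {determinant} HAVING COUNT(DISTINCT {dependent}) > 1"
    )
    return conn.execute(sql).fetchone() is None

item_determines_type = fd_holds(conn, "EQUIPMENT_REPAIR", "ItemNumber", "Type")
item_determines_repair = fd_holds(conn, "EQUIPMENT_REPAIR", "ItemNumber", "RepairNumber")
```

Sample data can only refute a dependency, never prove it; whether ItemNumber really determines Type is a business rule to confirm with the data's owner.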
12
2NF: Look for a composite primary key [or candidate key]
- The PK for this table could be a composite of all the attributes.
- So the best place to start is to assess the determinants of the functional dependencies.
- Hint: another way to look at this is to ask whether you can see different possible entities in the table.
13
Identify Functional Dependencies
EQUIPMENT_REPAIR (ItemNumber, Type, AcquisitionCost, RepairNumber, RepairDate, RepairAmount)
FD:
ItemNumber → (Type, AcquisitionCost)
RepairNumber → (ItemNumber, Type, AcquisitionCost, RepairDate, RepairAmount)
Is there a determinant that is not a candidate key?
14
Put into Tables
- ItemNumber is a determinant but not a candidate key, so move it and its attributes to a new table:
  ITEM (ItemNumber, Type, AcquisitionCost)
- The determinant becomes the primary key of the new table.
- Leave a foreign key in the original table:
  REPAIR (ItemNumber, RepairNumber, RepairDate, RepairAmount)
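The two relations above could be declared roughly as follows. This is a sketch run through Python's sqlite3 module; the column types and NOT NULL choices are assumptions, since the slides give only column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The determinant ItemNumber becomes the primary key of ITEM.
    CREATE TABLE ITEM (
        ItemNumber      INTEGER PRIMARY KEY,
        Type            TEXT NOT NULL,
        AcquisitionCost REAL NOT NULL
    );
    -- ItemNumber stays behind in REPAIR as a foreign key.
    CREATE TABLE REPAIR (
        RepairNumber INTEGER PRIMARY KEY,
        ItemNumber   INTEGER NOT NULL REFERENCES ITEM (ItemNumber),
        RepairDate   TEXT NOT NULL,
        RepairAmount REAL NOT NULL
    );
""")

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
# Each foreign_key_list row is (id, seq, table, from, to, on_update, on_delete, match).
foreign_keys = conn.execute("PRAGMA foreign_key_list(REPAIR)").fetchall()
```

Making RepairNumber the primary key of REPAIR follows from the second functional dependency: RepairNumber determines every other column.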
15
Tables
16
3NF: Look for transitive dependencies
- There are no transitive dependencies.
- All functional dependencies have been taken care of.
17
BCNF: All determinants are candidate keys
18
What Does a Database Do?
- Stores information in a highly organized manner
- Manipulates information in various ways, some of which are not available in other applications or are easier to accomplish with a database
- Models some real-world process or activity through electronic means
  - Often called modeling a business process
  - Often replicates the process only in appearance or end result
19
The Design Process
1. Identify the purpose of the database
2. Review existing data
3. Make a preliminary list of fields
4. Make a preliminary list of tables and enter fields
5. Identify the key fields
6. Draft the table relationships
7. Enter sample data and normalize the data/tables
8. Review and finalize the design
[HANDOUT: EXERCISE 1]
20
1. Identify purpose of the DB
Clients can tell you what information they want but often have no idea what data they need.
- “We need to keep track of inventory”
- “We need an order entry system”
- “I need monthly sales reports”
- “We need to provide our product catalog on the Web”
Be sure to limit the scope of the database.
21
1. Continued
- Quite often, the stated intention implies data needs far beyond the client’s knowledge.
- Be sure to offer or question extension of the design to other areas.
- Example: Tracking inventory implies adjusting inventory in stock every time there is a sale, thus implying that some method of tracking sales is also needed.
22
1. Continued
- The client may say “We have a database already for that,” which implies that you, the designer, may need to tap into the existing DB in some manner.
- Or the client may say “We don’t have the budget for that this year; just do the inventory tracking part and we’ll keep track of sales manually,” thus limiting the scope of your design.
23
2. Review Existing Data
Electronic
- Legacy database(s)
- Spreadsheets
- Web forms
Manual
- Paper forms
- Receipts and other printed output
24
3. Make Preliminary Field List
- Make sure fields exist to support needs
  - Ex. If the client wants monthly sales reports, you need a date field for orders.
  - Ex. To group employees by division, you need a division identifier.
- Make sure values are atomic
  - Ex. First and Last names stored separately
  - Ex. Addresses broken down to Street, City, State, etc.
- Do not store values that can be calculated from other values
  - Ex. “Age” can be calculated from “Date of Birth”
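To illustrate the last point, age can be derived on demand from the stored date of birth instead of being stored and going stale. A small sketch; `age_on` is a hypothetical helper name and the dates are made up.

```python
from datetime import date

def age_on(date_of_birth, today):
    """Whole years elapsed between date_of_birth and today."""
    # Compare (month, day) pairs to see whether this year's birthday has passed.
    had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
    return today.year - date_of_birth.year - (0 if had_birthday else 1)

age_before_birthday = age_on(date(1990, 8, 15), date(2016, 3, 1))
age_on_birthday = age_on(date(1990, 8, 15), date(2016, 8, 15))
```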
25
4. Make Preliminary Tables (and insert the fields into them)
- Each table holds info about one subject
- Don’t worry about the quantity of tables
- Look for logical groupings of information
- Use a consistent naming convention
26
Naming Conventions
Rules of thumb:
- Table names must be unique in the DB; should be plural
- Field names must be unique within their table
- Clearly identify table subject or field data
- Be as brief as possible
- Avoid abbreviations and acronyms
- Use fewer than 30 characters
- Use letters, numbers, underscores (_)
- Do not use spaces or other special characters
Uniqueness of field names applies only within the table they are in; fields in different tables can have the same name, and linked fields usually should, so they are easily identified.
27
5. Identify the Key Fields
Primary Key(s)
- Can never be Null; must hold unique values
- Automatically indexed in most RDBMSs
- Values rarely (if ever) change
- Try to include as few fields as possible
Multi-field Primary Key
- Combination of two or more fields that uniquely identifies an individual record
Candidate Key
- Field or fields that qualify as a primary key
- Important in Third and Boyce-Codd Normal Forms
28
6. Identify Table Relationships
Based on business rules being modeled. Examples:
- “each customer can place many orders”
- “all employees belong to a department”
- “each TA is assigned to one course”
Historical note: “Relational” as in “Relational Database” has nothing to do with “relationship” as in “table relationships”. Codd was a mathematician, and devised his rules for modern databases based on mathematical set theory. In set theory, when two groups of numbers have a correspondence of some kind, this is called a “relation”, and Codd named this type of database “relational” because the database storage structure follows some of the same rules as mathematical sets, not because we relate tables together.
29
7. Normalization
- Normal Forms (NF): design standards based on database design theory
- Normalization is the process of applying the NFs to table design to eliminate redundancy and create a more efficient organization of DB storage.
- Each successive NF applies an increasingly stringent set of rules
Much of what we’ll talk about now, and much that you’ve already run into in your own experience, will tell you that common sense can avoid many of these problems. At the very least, some of the earlier steps in the design process will prevent these problems from occurring later in the process. But the normal forms are your safety net. If you aren’t sure whether something belongs in a table, run it through the normal forms to find out. Sometimes the problem isn’t in the table you’re currently analyzing, but in one you’ve already looked at.
30
8. Finalizing the Design
- Double-check to ensure good, principle-based design
- Evaluate design in light of business model and determine desired deviations from design principles
  - Process efficiency
  - Security concerns
31
Design and Normalization Process Summary
- Watch for repeating values and fields
- Check against the Normal Forms
- Make new tables when necessary
- Re-check all tables against the NFs
- Remember the business rules
- Use common sense, but check anyway!
32
Assessing Table Structure
33
Counting Rows in a Table
To count the number of rows in a table, use the SQL COUNT(*) built-in aggregate function.
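A row count along these lines might look as follows, shown here through Python's sqlite3 module so it can be run as-is. The three sample rows and the NumRows alias are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SKU_DATA (SKU INTEGER PRIMARY KEY, SKU_Description TEXT, Buyer TEXT)")
# Invented sample rows for illustration.
conn.executemany(
    "INSERT INTO SKU_DATA VALUES (?, ?, ?)",
    [
        (1, "Sample item A", "Buyer One"),
        (2, "Sample item B", "Buyer Two"),
        (3, "Sample item C", "Buyer One"),
    ],
)

# COUNT(*) counts rows, including rows that contain NULLs.
num_rows = conn.execute("SELECT COUNT(*) AS NumRows FROM SKU_DATA").fetchone()[0]
```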
34
Reasons for Counting Rows
There are various reasons why you might need to know the row count of various database structures (tables, etc.), including:
- Determining whether an application has loaded data
- Estimating how long a query might take to run
- Estimating how long update statistics might take to run
- Estimating how long create index might take to run
- Deciding why a query plan has chosen a particular join type
35
Examining the Columns
- To determine the number and type of columns in a table, use an SQL SELECT statement.
- To limit the number of rows retrieved, use the SQL TOP {NumberOfRows} clause. TOP is SQL Server syntax; several other products use a LIMIT clause for the same purpose.
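As a sketch of this kind of inspection, the code below retrieves a handful of rows and reads the column names from the result. It runs on SQLite through Python's sqlite3 module, so it uses LIMIT; the SQL Server form with TOP appears in a comment. The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SKU_DATA (SKU INTEGER PRIMARY KEY, SKU_Description TEXT, Buyer TEXT)")
conn.executemany(
    "INSERT INTO SKU_DATA VALUES (?, ?, ?)",
    [(n, f"Sample item {n}", "Buyer One") for n in range(1, 21)],
)

# SQL Server form:  SELECT TOP 5 * FROM SKU_DATA;
# SQLite expresses the same idea with LIMIT.
cursor = conn.execute("SELECT * FROM SKU_DATA LIMIT 5")
column_names = [d[0] for d in cursor.description]
sample_rows = cursor.fetchall()

# SQLite-specific: PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
column_types = {row[1]: row[2] for row in conn.execute("PRAGMA table_info(SKU_DATA)")}
```

Looking at a few rows is usually enough to see the column names and guess at their contents; the declared types come from the catalog, which each DBMS exposes in its own way.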
36
Checking Validity of Assumed Referential Integrity Constraints I
Given two tables with an assumed foreign key constraint:
SKU_DATA (SKU, SKU_Description, Buyer)
BUYER (Buyer, Department)
Where SKU_DATA.Buyer must exist in BUYER.Buyer
37
Checking Validity of Assumed Referential Integrity Constraints II
- To find any foreign key values that violate the foreign key constraint, query for values of SKU_DATA.Buyer that do not appear in BUYER.Buyer.
- An empty set for the query result indicates that no foreign key values violate the foreign key constraint.
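One common way to write such a check is a NOT IN subquery, sketched below through Python's sqlite3 module. The rows are invented, with one deliberately orphaned Buyer value so the query has something to find.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BUYER (Buyer TEXT PRIMARY KEY, Department TEXT);
    CREATE TABLE SKU_DATA (SKU INTEGER PRIMARY KEY, SKU_Description TEXT, Buyer TEXT);
    -- Invented rows; 'Buyer Three' has no matching BUYER row.
    INSERT INTO BUYER VALUES ('Buyer One', 'Dept A'), ('Buyer Two', 'Dept B');
    INSERT INTO SKU_DATA VALUES
        (1, 'Sample item A', 'Buyer One'),
        (2, 'Sample item B', 'Buyer Three'),
        (3, 'Sample item C', NULL);
""")

# Null foreign keys are not violations, so they are excluded explicitly.
violations = conn.execute("""
    SELECT SKU, Buyer
    FROM SKU_DATA
    WHERE Buyer IS NOT NULL
      AND Buyer NOT IN (SELECT Buyer FROM BUYER)
""").fetchall()
```

Here the result is not empty, so the assumed constraint does not hold for the data as received and the offending rows need to be corrected before the constraint is declared.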
38
Assessing Assumed Constraints
- Placing constraints on how, when, and where data can be entered
- Done after or along with table design
- Part of the design process because many constraints are established at the database and table levels
39
Referential Integrity
True relational databases support Referential Integrity: every non-null foreign key value must match an existing primary key value. In other words, every record in a related table that carries a foreign key value must have a matching record in the primary table.
- Preserves the validity of foreign key values.
- Enforced at database level.
Why is this important? Referential Integrity helps ensure that the database contains valid and usable values and records by preserving the connection between tables. Without it, table relationships quickly become meaningless and queries return unreliable results.
The most common problem in the absence of referential integrity is the creation of orphan records: the primary key value is changed, causing the matching of the related records to fail.
The default in most RDBMSs is for referential integrity to be turned off, probably because the software can’t tell from the table design whether you want it turned on or not.
So, what happens when you want to change the value on one side of a set of related records? Referential integrity in its absolute form won’t allow this, so…
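A small demonstration of enforcement, using SQLite through Python's sqlite3 module. SQLite is one system where foreign key enforcement is off for each connection until `PRAGMA foreign_keys = ON` is issued. The rows and the `rejected` helper are hypothetical, for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.executescript("""
    CREATE TABLE BUYER (Buyer TEXT PRIMARY KEY, Department TEXT);
    CREATE TABLE SKU_DATA (
        SKU INTEGER PRIMARY KEY,
        SKU_Description TEXT,
        Buyer TEXT REFERENCES BUYER (Buyer)
    );
    INSERT INTO BUYER VALUES ('Buyer One', 'Dept A');
    INSERT INTO SKU_DATA VALUES (1, 'Sample item A', 'Buyer One');
""")

def rejected(sql):
    """Run one statement; report whether the database refused it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# A child row pointing at a buyer that does not exist.
orphan_insert_rejected = rejected(
    "INSERT INTO SKU_DATA VALUES (2, 'Sample item B', 'No Such Buyer')")
# Changing a primary key value that child rows still reference.
parent_rename_rejected = rejected(
    "UPDATE BUYER SET Buyer = 'Renamed' WHERE Buyer = 'Buyer One'")
# A null foreign key is allowed: only non-null values must match.
null_fk_rejected = rejected(
    "INSERT INTO SKU_DATA VALUES (3, 'Sample item C', NULL)")
```

With enforcement on, both the orphan insert and the primary key change are refused, which is the "absolute form" described above.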
40
Levels of Enforcement
- Referential Integrity is enforced at the database level because it affects the relationship between two tables.
- Many other business rules are enforced at the field and table level to ensure data integrity.
- Business rule implementation should be documented: how and where it is enforced in the design.
- Some rules can’t be enforced at the table or field level; they must be enforced at the application level.
41
Testing of Business Rules
- Always test business rule implementation
  - What happens when the rule is met?
  - What happens when the rule is violated?
- A rule is not much good as a data entry constraint if it doesn’t constrain properly
- Good application or interface design will provide feedback when the user violates a constraint or rule
42
Type of Database
- Updatable database, or read-only database?
- If updatable database, we normally want tables in BCNF.
- If read-only database, we may not use BCNF tables.
43
Designing Updatable Databases
Updatable databases are typically the operational databases of a company, such as the online transaction processing (OLTP) system discussed for Cape Codd Outdoor Sports at the beginning of Chapter 2. If you are constructing an updatable database, then you need to be concerned about modification anomalies and inconsistent data. Consequently, you must carefully consider normalization principles.
44
Normalization: Advantages and Disadvantages
Why do we say reduce data duplication rather than eliminate data duplication? The answer is that we cannot eliminate all duplicated data because we must duplicate data in foreign keys. We cannot eliminate Buyer, for example, from the SKU_DATA table because we would then not be able to relate BUYER and SKU_DATA rows. Values of Buyer are thus duplicated in the BUYER and SKU_DATA tables.
This observation leads to a second question: If we only reduce data duplication, how can we claim to eliminate inconsistent data values? Data duplication in foreign keys will not cause inconsistencies because referential integrity constraints prohibit them. As long as we enforce such constraints, the duplicate foreign key values will cause no inconsistencies.
45
Non-Normalized Table: EQUIPMENT_REPAIR
46
Normalized Tables: ITEM and REPAIR
47
Copying Data to New Tables
To copy data from one table to another, use the SQL INSERT statement with a SELECT that supplies the rows.
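For the EQUIPMENT_REPAIR example, the copy into ITEM and REPAIR might be sketched as two INSERT ... SELECT statements, run here through Python's sqlite3 module. The rows and column types are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EQUIPMENT_REPAIR (
        ItemNumber INTEGER, Type TEXT, AcquisitionCost REAL,
        RepairNumber INTEGER, RepairDate TEXT, RepairAmount REAL);
    -- Invented rows; item 100 has been repaired twice.
    INSERT INTO EQUIPMENT_REPAIR VALUES
        (100, 'Drill Press', 3500.0, 2000, '2015-05-05', 375.0),
        (200, 'Lathe',       4750.0, 2100, '2015-05-07', 255.0),
        (100, 'Drill Press', 3500.0, 2200, '2015-06-19', 178.0);

    CREATE TABLE ITEM (
        ItemNumber INTEGER PRIMARY KEY, Type TEXT, AcquisitionCost REAL);
    CREATE TABLE REPAIR (
        RepairNumber INTEGER PRIMARY KEY,
        ItemNumber INTEGER REFERENCES ITEM (ItemNumber),
        RepairDate TEXT, RepairAmount REAL);

    -- DISTINCT collapses the repeated item facts to one row per item.
    INSERT INTO ITEM (ItemNumber, Type, AcquisitionCost)
        SELECT DISTINCT ItemNumber, Type, AcquisitionCost FROM EQUIPMENT_REPAIR;
    -- Every source row is one repair, so no DISTINCT is needed here.
    INSERT INTO REPAIR (RepairNumber, ItemNumber, RepairDate, RepairAmount)
        SELECT RepairNumber, ItemNumber, RepairDate, RepairAmount FROM EQUIPMENT_REPAIR;
""")

item_count = conn.execute("SELECT COUNT(*) FROM ITEM").fetchone()[0]
repair_count = conn.execute("SELECT COUNT(*) FROM REPAIR").fetchone()[0]
```

ITEM is loaded first so that every ItemNumber copied into REPAIR already has a parent row.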
48
Final Steps
In Chapters 7 and 8, you will learn how to:
- Remove unneeded tables after the data is copied, using the SQL DROP TABLE statement.
- Create the referential integrity constraint, using the SQL ALTER TABLE statement.
49
Choosing Not To Use BCNF
- BCNF is used to control anomalies from functional dependencies.
- There are times when BCNF is not desirable.
- The classic example is ZIP codes:
  - ZIP codes almost never change.
  - Any anomalies are likely to be caught by normal business practices.
  - Not having to use SQL to join data in two tables will speed up application processing.
50
Multivalued Dependencies
- Anomalies from multivalued dependencies are very problematic.
- Always place the columns of a multivalued dependency into a separate table (4NF).
51
Designing Read-Only Databases
The extracted sales data that we used for Cape Codd Outdoor Sports in Chapter 2 is a small but typical example of a read-only database. Read-only databases are used in business intelligence (BI) systems for producing information for assessment, analysis, planning, and control, as we discussed for Cape Codd Outdoor Sports in Chapter 2. Read-only databases are commonly used in a data warehouse, which we also introduced in Chapter 2.
52
Read-Only Databases
- Read-only databases are nonoperational databases using data extracted from operational databases.
- They are used for querying, reporting, and data mining applications.
- They are never updated in the operational database sense, although they may have new data imported from time to time.
53
Denormalization
- For read-only databases, normalization is seldom an advantage. Application processing speed is more important.
- Denormalization is the joining of the data in normalized tables prior to storing the data.
- The data is then stored in nonnormalized tables.
54
Normalized Tables
55
Denormalizing the Data
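As a rough sketch of denormalizing, the normalized SKU_DATA and BUYER tables from earlier in the chapter can be joined once and the result stored as a single table. The example runs through Python's sqlite3 module; the table name SKU_BUYER_REPORT and the rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BUYER (Buyer TEXT PRIMARY KEY, Department TEXT);
    CREATE TABLE SKU_DATA (
        SKU INTEGER PRIMARY KEY, SKU_Description TEXT,
        Buyer TEXT REFERENCES BUYER (Buyer));
    INSERT INTO BUYER VALUES ('Buyer One', 'Dept A'), ('Buyer Two', 'Dept B');
    INSERT INTO SKU_DATA VALUES
        (1, 'Sample item A', 'Buyer One'),
        (2, 'Sample item B', 'Buyer Two'),
        (3, 'Sample item C', 'Buyer One');

    -- Join once at load time and store the joined rows.
    CREATE TABLE SKU_BUYER_REPORT AS
        SELECT s.SKU, s.SKU_Description, s.Buyer, b.Department
        FROM SKU_DATA AS s
        JOIN BUYER AS b ON s.Buyer = b.Buyer;
""")

report_rows = conn.execute(
    "SELECT SKU, Buyer, Department FROM SKU_BUYER_REPORT ORDER BY SKU").fetchall()
```

Department is now repeated for every SKU a buyer handles. That redundancy is acceptable only because the table is read-only and is rebuilt from the operational data.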
56
Customized Tables I
Read-only databases are often designed with many copies of the same data, but with each copy customized for a specific application. Consider the PRODUCT table:
57
Customized Tables II
58
Common Design Problems
59
The Multivalue, Multicolumn Problem
- The multivalue, multicolumn problem occurs when multiple values of an attribute are stored in more than one column:
  EMPLOYEE (EmployeeNumber, EmployeeLastName, Auto2_LicenseNumber, Auto3_LicenseNumber)
- This is another form of a multivalued dependency.
- Solution: as with the 4NF solution for multivalued dependencies, use a separate table to store the multiple values.
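The separate-table solution might be sketched like this, run through Python's sqlite3 module. The names EMPLOYEE_WIDE and AUTO, the column types, and the rows are hypothetical; only the EMPLOYEE column names come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The table as received, with one column per license number.
    CREATE TABLE EMPLOYEE_WIDE (
        EmployeeNumber INTEGER PRIMARY KEY, EmployeeLastName TEXT,
        Auto2_LicenseNumber TEXT, Auto3_LicenseNumber TEXT);
    INSERT INTO EMPLOYEE_WIDE VALUES
        (1, 'Lastname A', 'AAA111', 'BBB222'),
        (2, 'Lastname B', 'CCC333', NULL);

    CREATE TABLE EMPLOYEE (
        EmployeeNumber INTEGER PRIMARY KEY, EmployeeLastName TEXT);
    CREATE TABLE AUTO (
        EmployeeNumber INTEGER REFERENCES EMPLOYEE (EmployeeNumber),
        LicenseNumber TEXT,
        PRIMARY KEY (EmployeeNumber, LicenseNumber));

    INSERT INTO EMPLOYEE
        SELECT EmployeeNumber, EmployeeLastName FROM EMPLOYEE_WIDE;
    -- One SELECT per repeated column; empty slots are skipped.
    INSERT INTO AUTO
        SELECT EmployeeNumber, Auto2_LicenseNumber FROM EMPLOYEE_WIDE
            WHERE Auto2_LicenseNumber IS NOT NULL
        UNION
        SELECT EmployeeNumber, Auto3_LicenseNumber FROM EMPLOYEE_WIDE
            WHERE Auto3_LicenseNumber IS NOT NULL;
""")

autos = conn.execute(
    "SELECT EmployeeNumber, LicenseNumber FROM AUTO ORDER BY 1, 2").fetchall()
```

With the values in rows instead of columns, an employee can have any number of autos, and a search for a license number needs to look in only one column.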
60
Inconsistent Values I
Inconsistent values occur when different users, or different data sources, use slightly different forms of the same data value:
- Different codings:
  SKU_Description = 'Corn, Large Can'
  SKU_Description = 'Can, Corn, Large'
  SKU_Description = 'Large Can Corn'
- Different spellings:
  Coffee, Cofee, Coffeee
61
Inconsistent Values II
- Particularly problematic are primary or foreign key values.
- To detect:
  - Use the referential integrity check already discussed for checking keys.
  - Use the SQL GROUP BY clause on suspected columns.
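A GROUP BY check on a suspected column could look like the sketch below, run through Python's sqlite3 module. The descriptions reuse the codings from the previous slide; the SKU numbers and row counts are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SKU_DATA (SKU INTEGER PRIMARY KEY, SKU_Description TEXT)")
conn.executemany(
    "INSERT INTO SKU_DATA VALUES (?, ?)",
    [
        (1, "Corn, Large Can"),
        (2, "Corn, Large Can"),
        (3, "Can, Corn, Large"),
        (4, "Large Can Corn"),
        (5, "Corn, Large Can"),
    ],
)

# One output row per distinct value, most frequent first.
value_counts = conn.execute("""
    SELECT SKU_Description, COUNT(*) AS NumRows
    FROM SKU_DATA
    GROUP BY SKU_Description
    ORDER BY NumRows DESC, SKU_Description
""").fetchall()
```

Values with very low counts sitting next to a similar high-count value are the likely inconsistencies; deciding which spelling is correct still takes a person.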
62
Inconsistent Values III
63
Missing Values
- A missing value or null value is a value that has never been provided.
- In a database table, a null value appears in upper case letters as NULL.
64
Null Values Null values are ambiguous:
- May indicate that a value is inappropriate: DateOfLastChildbirth is inappropriate for a male.
- May indicate that a value is appropriate but unknown: DateOfLastChildbirth is appropriate for a female, but may be unknown.
- May indicate that a value is appropriate and known, but has never been entered: DateOfLastChildbirth is appropriate for a female, and may be known, but no one has recorded it in the database.
65
Checking for Null Values
Use the SQL IS NULL operator to check for null values.
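A minimal sketch of such a check, run through Python's sqlite3 module with invented rows. It also shows why `= NULL` is not a substitute for IS NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SKU_DATA (SKU INTEGER PRIMARY KEY, SKU_Description TEXT, Buyer TEXT)")
conn.executemany(
    "INSERT INTO SKU_DATA VALUES (?, ?, ?)",
    [
        (1, "Sample item A", "Buyer One"),
        (2, "Sample item B", None),  # Python None is stored as SQL NULL
        (3, "Sample item C", None),
    ],
)

null_buyers = conn.execute(
    "SELECT COUNT(*) FROM SKU_DATA WHERE Buyer IS NULL").fetchone()[0]
# A comparison with NULL is never true, so this matches no rows.
equals_null = conn.execute(
    "SELECT COUNT(*) FROM SKU_DATA WHERE Buyer = NULL").fetchone()[0]
```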
66
The General-Purpose Remarks Column
A general-purpose remarks column is a column with a name such as:
- Remarks
- Comments
- Notes
It often contains important data stored in an inconsistent, verbal, and verbose way.
- A typical use is to store data on a customer’s interests.
Such a column may:
- Be used inconsistently
- Hold multiple data items
67
The General-Purpose Remarks Column: Hidden Foreign Key Data
In a typical situation, the data for the foreign key may have been recorded in the Remarks column:
- 'Wants to buy a Piper Seneca II'
- 'Owner of a Piper Seneca II'
- 'Possible buyer for a turbo Seneca'
68
End of Presentation: Chapter Four
David Kroenke and David Auer
Database Processing: Fundamentals, Design, and Implementation (14th Edition)
69
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. Printed in the United States of America.