Presentation is loading. Please wait.

Presentation is loading. Please wait.

Jerry Post Copyright © 2007 1 Database Management Systems Chapter 3 Data Normalization.

Similar presentations


Presentation on theme: "Jerry Post Copyright © 2007 1 Database Management Systems Chapter 3 Data Normalization."— Presentation transcript:

1 Jerry Post Copyright © 2007 1 Database Management Systems Chapter 3 Data Normalization

2 DATABASE 2 Objectives  Why is database design important?  What is a table and how do you choose keys?  What are the fundamental rules of database normalization?  How do you begin analyzing a form to create normalized tables?  How do you create a design in first normal form?  What is second normal form?  What is third normal form?  What problems exist beyond third normal form?  How do business rules change the database design?  What problems arise when converting a class diagram to normalized tables?  What tables are needed for the Sally’s Pet Store?  How do you combine tables from multiple forms and many developers?  How do you record the details for all of the columns and tables?

3 DATABASE 3 Why Normalization?  Need standardized data definition  Advantages of DBMS require careful design  Define data correctly and the rest is much easier  It especially makes it easier to expand database later  Method applies to most models and most DBMS  Similar to Entity-Relationship  Similar to Objects (without inheritance and methods)  Goal: Define tables carefully  Save space  Minimize redundancy  Protect data

4 DATABASE 4 Definitions  Relational database: A collection of tables.  Table: A collection of columns (attributes) describing an entity.  Individual objects are stored as rows of data in the table.  Property (attribute): a characteristic or descriptor of a class or entity.  Every table has a primary key.  The smallest set of columns that uniquely identifies any row  Primary keys can span more than one column (concatenated keys)  We often create a primary key to insure uniqueness (e.g., CustomerID, Product#,...) called a surrogate key (see slide #8). EmployeeIDTaxpayerIDLastNameFirstNameHomePhoneAddress 12512888-22-5552CartomAbdul(603) 323-9893252 South Street 15293222-55-3737VenetiaanRoland(804) 888-6667937 Paramaribo Lane 22343293-87-4343JohnsonJohn(703) 222-9384234 Main Street 29387837-36-2933StenheimSusan(410) 330-98378934 W. Maple Employee Properties Rows/Objects Class: Employee Primary key

5 DATABASE 5 Keys  Primary key  Every table (object) must have a primary key  Uniquely identifies a row (one-to-one)  Concatenated (or composite) key  Multiple columns needed for primary key  Identify repeating relationships (1 : M or M : N)  Key columns are underlined  First step  Collect user documents  Identify possible keys: unique or repeating relationships

6 DATABASE 6 Notation Table name Primary key is underlined Table columns Customer(CustomerID, Phone, Name, Address, City, State, ZipCode) CustomerIDPhoneLastNameFirstNameAddressCityStateZipcode 1502-666-7777JohnsonMartha125 Main StreetAlvatonKY42122 2502-888-6464SmithJack873 Elm StreetBowling GreenKY42101 3502-777-7575WashingtonElroy95 Easy StreetSmith’s GroveKY42171 4502-333-9494AdamsSamuel746 Brown DriveAlvatonKY42122 5502-474-4746RabitzVictor645 White AvenueBowling GreenKY42102 6616-373-4746SteinmetzSusan15 Speedway DrivePortlandTN37148 7615-888-4474LasaterLes67 S. Ray DrivePortlandTN37148 8615-452-1162JonesCharlie867 Lakeside DriveCastalian SpringsTN37031 9502-222-4351ChavezJuan673 Industry Blvd.CaneyvilleKY42721 10502-444-2512RojoMaria88 Main StreetCave CityKY42127

7 DATABASE 7 Identifying Key Columns Orders OrderItems OrderIDDateCustomer 83675-5-046794 83685-6-049263 OrderIDItemQuantity 83672292 83672534 83678761 83685554 83682291 Each order has only one customer. So Customer is not part of the key. Each order has many items. Each item can appear on many orders. So OrderID and Item are both part of the key.

8 DATABASE 8 Surrogate Keys  Real world keys sometimes cause problems in a database.  Example: Customer  Avoid phone numbers: people may not notify you when numbers change.  Avoid SSN (privacy and most businesses are not authorized to ask for verification, so you could end up with duplicate values)  Often best to let the DBMS generate unique values  Access: AutoNumber  SQL Server: Identity  Oracle: Sequences (but require additional programming)  Drawback: Numbers are not related to any business data, so the application needs to hide them and provide other look up mechanisms.

9 DATABASE 9 Common Order System Customer Order Salesperson Item OrderItem 1 * 1 * 1 1 * * Customer(CustomerID, Name, Address, City, Phone) Salesperson(EmployeeID, Name, Commission, DateHired) Order(OrderID, OrderDate, CustomerID, EmployeeID) OrderItem(OrderID, ItemID, Quantity) Item(ItemID, Description, ListPrice)

10 DATABASE 10 Database Normalization Rules  1.Each cell in a table contains atomic (single-valued) data.  2.Each non-key column depends on all of the primary key columns (not just some of the columns).  3.Each non-key column depends on nothing outside of the key columns.

11 DATABASE 11 Atomic Values for Phone Numbers CustomerIDLastNameFirstNamePhoneFaxCellPhone 15023JonesMary222-3034222-4094223-0984 63478SanchezMiguel030-9693403-4094 94552O’ReillyMadelline849-4948292-3332139-3831 45791SteinMarta294-4421 49004BriseMer764-5103

12 DATABASE 12 Repeating Values for Phone Numbers CustomerIDLastNameFirstNamePhone 15023JonesMary222-3034 222-4094 223-0984 63478SanchezMiguel030-9693 403-4094 94552O’ReillyMadeline849-4948 292-3332 139-3831 339-4040 45791SteinMarta294-4421 49004BriseMer764-5103

13 DATABASE 13 Repeating Values for Phone Numbers CustomerIDLastNameFirstName 15023JonesMary 63478SanchezMiguel 94552O’ReillyMadeline 45791SteinMarta 49004BriseMer CustomerIDPhoneTypePhone 15023Land222-3034 15023Fax222-4094 15023Cell223-0984 63478Land030-9693 63478Fax403-4094 94552Land849-4948 94552Fax292-3332 94552Cell139-3831 94552Laptop339-4040 45791Land294-4421 49004Land764-5103

14 DATABASE 14 Simple Form Customer IDCompany Name City Contact LastName, FirstName Phone

15 DATABASE 15 Initial Design Customer(CustomerID, CompanyName, City) Contact(ContactID, CustomerID, LastName, FirstName)

16 DATABASE 16 Client Billing Example Client Billing Client(ClientID, Name, Address, BusinessType) Partner(PartnerID, Name, Speciality, Office, Phone) PartnerAssignment(PartnerID, ClientID, DateAcquired) Billing(ClientID, PartnerID, Date/Time, Item, Description, Hours, AmountBilled) Each partner can be assigned many clients. Each client can be assigned to many partners.

17 DATABASE 17 Client Billing--Different Rules Client(ClientID, Name, Address, BusinessType) Partner(PartnerID, Name, Speciality, Office, Phone) X PartnerAssignment(PartnerID, ClientID, DateAcquired) Billing(ClientID, PartnerID, Date/Time, Item, Description, Hours, AmountBilled) combine If each client is assigned to only one partner. then PartnerID needs not be a primary key. Combine Client and PartnerAssignment tables as below, since they have the same key. Client(ClientID, Name, Address, BusinessType, PartnerID, DataAcquired)

18 DATABASE 18 Client Billing--New Assumptions ClientIDPartnerIDDate/TimeItemDescriptionHoursAmountBilled 1159638-4-04 10:03967Stress analysis2$500 2959678-5-04 11:15754New Design3$750 1159638-8-04 09:30967Stress analysis2.5$650 Billing More realistic assumptions for a large firm. Each Partner may work with many clients. Each client may work with many partners. Each partner and client may work together many times. The identifying feature is the date/time of the service. What happens if you do not include Date/Time as a key?

19 DATABASE 19 Sample Database for Sales Sale IDDate Customer First Name Last Name Address City, State ZIPCode ItemIDDescriptionList PriceQuantityQOHValue Total A sample check out screen QOH: Quantity On Hand

20 DATABASE 20 Initial Objects Initial ObjectPrimary KeySample Properties CustomerCustomerIDName Address Phone ItemItemIDDescription List Price Quantity On Hand SaleSaleIDSale Date SaleItemsSaleID + ItemIDQuantity

21 DATABASE 21 Initial Form Evaluation Sale ID Date Customer First Name Last Name Address City, State ZIPCode ItemIDDescriptionList Price QuantityQOHValue Total SaleForm(SaleID, SaleDate, CustomerID, FirstName, LastName, Address, City, State, ZIPCode, (ItemID, Description, ListPrice, Quantity, QuantityOnHand) ) Identify potential keys. Identify repeating groups.

22 DATABASE 22 Problems with Repeating Sections SaleForm(SaleID, SaleDate, CustomerID, FirstName, LastName, Address, City, State, ZIPCode, (ItemID, Description, ListPrice, Quantity, QuantityOnHand) ) SaleIDDateCIDFirstNameLastNameAddressCityStateZIPItemIDDescriptionListPriceQuantityQOH 118517/1515023MaryJones111 ElmChicagoIL6060115 27 32 Air Tank Regulator Mask 1557 192.00 251.00 65.00 211211 15 5 6 118527/1563478MiguelSanchez222 OroMadrid15 33 Air Tank Mask 2020 192.00 91.00 4141 15 3 118537/1615023MaryJones111 ElmChicagoIL6060141 75 Snorkel 71 Wet suit-S 44.00 215.00 2121 15 3 118547/1794552MadelineO’Reilly333 TamDublin75 32 57 Wet suit-S Mask 1557 Snorkel 95 215.00 65.00 83.00 211211 3 6 17 Repeating section Not atomicDuplication

23 DATABASE 23 First Normal Form SaleForm(SaleID, SaleDate, CustomerID, FirstName, LastName, Address, City, State, ZIPCode, (ItemID, Description, ListPrice, Quantity, QuantityOnHand) ) SaleForm2(SaleID, SaleDate, CustomerID, FirstName, LastName, Address, City, State, ZIPCode) SaleLine(SaleID, ItemID, Description, ListPrice, Quantity, QuantityOnHand) Split SaleForm into SaleForm2 and SaleLine

24 DATABASE 24 Current Design

25 DATABASE 25 Multiple Repeating: Independent Groups FormA(Key1, Simple Columns, (Group1, A, B, C), (Group2, X, Y) ) MainTable(Key1, Simple Columns) Group1(Key1, Group1, A, B, C)Group2(Key1, Group2, X, Y)

26 DATABASE 26 Nested Repeating Sections  Nested: Table (Key1, aaa... (Key2, bbb... (Key3, ccc...) ) )  First Normal Form (1NF)  Table1(Key1, aaa...)  Table2(Key1, Key2, bbb..)  Table3(Key1, Key2, Key3, ccc...) Table (Key1,... (Key2,... (Key3,...) ) ) Table1(Key1,...)TableA (Key1,Key2...(Key3,...) ) Table2 (Key1, Key2...)Table3 (Key1, Key2, Key3,...)

27 DATABASE 27 First Normal Form Problems (Data) SaleLine(SaleID, ItemID, Description, ListPrice, Quantity, QuantityOnHand) SaleIDItemIDDescriptionListPriceQuantityQOH 1185115Air Tank192.00215 1185127Regulator251.0015 1185132Mask 155765.0016 1185215Air Tank192.00415 1185233Mask 202091.0013 1185341Snorkel 7144.00215 1185375West suit-S215.0013 1185475Wet suit-S215.0023 1185432Mask 155765.0016 1185457Snorkel 9583.00117 Duplication for columns that depend only on ItemID not on both SaleID & ItemID (the primary key). Every time an item is sold, the clerk has to reenter its description, list price, and quantity on hand. Also, if an item has not yet been sold, what is its ListPrice? (no item information yet)

28 DATABASE 28 Second Normal Form Definition  Each non-key column must depend on the primary key.  Only applies to concatenated keys  Some columns only depend on part of the key  Split those into a new table.  Dependence (definition)  If given a value for the key you always know the value of the property in question, then that property is said to depend on the key.  If you change part of a key, and the questionable property does not change, then the table is not in 2NF. (not depends on the primary keys) SaleLine(SaleID, ItemID, Description, ListPrice, Quantity, QuantityOnHand) Depends on both SaleID and ItemID Depend only on ItemID

29 DATABASE 29 Second Normal Form Example SaleLine(SaleID, ItemID, Description, ListPrice, Quantity, QuantityOnHand) SaleItems(SaleID, ItemID, Quantity) Item(ItemID, Description, ListPrice, QuantityOnHand) Split SaleLine into SaleItems and Item

30 DATABASE 30 Second Normal Form Example (Data) SaleIDItemIDQuantity 11851152 11851271 11851321 11852154 11852331 11853412 11853751 11854752 11854321 11854571 ItemIDDescriptionListPriceQOH 15Air Tank192.0015 27Regulator251.005 32Mask 155765.006 33Mask 202091.003 41Snorkel 7144.0015 57Snorkel 9583.0017 75Wet suit-S215.003 77Wet suit-M215.007 SaleItems(SaleID, ItemID, Quantity) Item(ItemID, Description, ListPrice, QuantityOnHand)

31 DATABASE 31 Second Normal Form in DB Design

32 DATABASE 32 Second Normal Form Problems (Data) SaleIDDateCustomerIDFirstNameLastNameAddressCityStateZIP 118517/1515023MaryJones111 ElmChicagoIL60601 118527/1563478MiguelSanchez222 OroMadrid 118537/1615023MaryJones111 ElmChicagoIL60601 118547/1794552MadelineO’Reilly333 TamDublin SaleForm2(SaleID, SaleDate, CustomerID, FirstName, LastName, Address, City, State, ZIPCode) Duplication

33 DATABASE 33 Third Normal Form Definition SaleForm2(SaleID, SaleDate, CustomerID, FirstName, LastName, Address, City, State, ZIPCode) Depend on SaleID Depend on CustomerID

34 DATABASE 34 Third Normal Form Example SaleForm2(SaleID, SaleDate, CustomerID, FirstName, LastName, Address, City, State, ZIPCode) SaleIDDateCustomerID 118517/1515023 118527/1563478 118537/1615023 118547/1794552 CustomerIDFirstNameLastNameAddressCityStateZIP 15023MaryJones111 ElmChicagoIL60601 63478MiguelSanchez222 OroMadrid 94552MadelineO’Reilly333 TamDublin Sale(SaleID, SaleDate, CustomerID) Customer(CustomerID, FirstName, LastName, Address, City, State, ZIPCode) Split SaleForm2 into Sale and Customer

35 DATABASE 35 Third Normal Form Tables

36 DATABASE 36 Third Normal Form Tables Customer(CustomerID, FirstName, LastName, Address, City, State, ZIPCode) Sale(SaleID, SaleDate, CustomerID) SaleItems(SaleID, ItemID, Quantity) Item(ItemID, Description, ListPrice, QuantityOnHand)

37 DATABASE 37 3NF Rules/Procedure  Split out repeating sections  Be sure to include a key from the parent section in the new piece so the two parts can be recombined.  Verify that the keys are correct  Is each row uniquely identified by the primary key?  Are one-to-many and many-to-many relationships correct?  Check “many” for keyed columns and “one” for non-key columns.  Make sure that each non-key column depends on the whole primary key and nothing but the key.  No hidden dependencies.

38 DATABASE 38 Checking Your Work (Quality Control)  Look for one-to-many relationships.  Many side should be keyed (underlined).  e.g., VideosRented(TransID, VideoID,...).  Check each column and ask if it should be 1 : 1 or 1: M.  If add a key, renormalize.  Verify no repeating sections (1NF)  Check 3NF  Check each column and ask:  Does it depend on the whole key and nothing but the key?  Verify that the tables can be reconnected (joined) to form the original tables (draw lines).  Each table represents one object.  Enter sample data -- look for replication.

39 DATABASE 39 Boyce-Codd Normal Form (BCNF)  Business rules. (See the auto shop table in next slide) a. Each employee may have many specialties. b. Each specialty has many managers. c. Employee has only one manager for each specialty. d. Each manager has only one specialty. (have hidden dependency: Manager  Specialty in the table) Employee-Specialty(EID, Specialty, Manager) Employee(EID, Manager) Manager(Manager, Specialty) c d ab Split Employee-Specialty into Employee and Manager

40 DATABASE 40 Employee-Specialty Table (auto shop) EmployeeIDSpecialtyManager 0654body / paintingJohn 1493brakeRobert 1493tireBill 3920air conditioningDavid 4782brakeRobert 4782engineEric 4782transmissionKen 7395body / paintingSteve 8157electrical systemPeter

41 DATABASE 41 Fourth Normal Form (Keys)  Business rules.  Each employee has many specialties.  Each employee has many tools.  Tools and specialties are unrelated (hidden dependency). EmployeeTasks(EID, Specialty, ToolID) in 3rd normal form EmployeeSpecialty(EID, Specialty) EmployeeTools(EID, ToolID)

42 DATABASE 42 EmployeeTasks Table (auto shop) EmployeeIDSpecialtyToolID 0654body / paintingt3603 1493braket2309 1493tiret1472 3920air conditioningt3681 4782braket5705 4782enginet6830 4782transmissiont8924 7395body / paintingt7552 8157electrical systemt6350

43 DATABASE 43 Domain-Key Normal Form (DKNF) EmployeeTask(EmployeeID, TaskID, Tool) Business rules. Each employee performs many tasks with many tools. Each Tool can be used for many tasks, or Each task has a standard list of tools (but some employees may use others) Need to add: RequiredTools(TaskID, ToolID) But, it is really BCNF: TaskID  ToolID is hidden. But, this dependency might not have been explicitly stated.

44 DATABASE 44 EmployeeTask Table (auto shop) EmployeeIDTaskIDTool 0654ta3401hammer 1493ta5270screw driver 1493ta2793screw driver 3920ta4756pipe wrench 4782ta5270screw driver 4782ta1638wrench 4782ta5729screw driver 7395ta3401hammer 8157ta8294voltage meter

45 DATABASE 45 No Hidden Dependencies  The simple normalization rules:  Remove repeating sections  Each non-key column must depend on the whole key and nothing but the key.  There must be no hidden dependencies.  Solution: Split the table.  Make sure you can rejoin the two pieces to recreate the original data relationships.  For some hidden dependencies within keys, double-check the business assumption to be sure that it is realistic. Sometimes you are better off with a more flexible assumption.

46 DATABASE 46 Data Rules and Integrity  Simple business rules  Limits on data ranges Price > 0 Salary < 100,000 DateHired > 1/12/1995  Choosing from a set Gender = M, F, Unknown Jurisdiction=City, County, State, Federal  Referential Integrity  Foreign key values in one table must exist in the master table.  Rental(RentID, RentDate, C#,…)  C# is a foreign key  C# must exist in the customer table. RentIDRentDateC#… 11731-4-04321 11741-5-04938 11851-8-04337 11901-9-04321 11921-9-04776 Rental C#NamePhone… 321Jones9983- 337Sanchez7738- 938Carson8738- Customer No data for this customer yet!

47 DATABASE 47 SQL Foreign Key (Oracle, SQL Server) CREATE TABLE Order (OIDNUMBER(9) NOT NULL, OdateDATE, CIDNUMBER(9), CONSTRAINT pk_Order PRIMARY KEY (OID), CONSTRAINT fk_OrderCustomer FOREIGN KEY (CID) REFERENCES Customer (CID) ON DELETE CASCADE )

48 DATABASE 48 Effect of Business Rules Key business rules: A player can play on only one team. There is one referee per match.

49 DATABASE 49 Business Rules 1 There is one referee per match. A player can play on only one team. Match(MatchID, DatePlayed, Location, RefID) Score(MatchID, TeamID, Score) Referee(RefID, Phone, Address) Team(TeamID, Name, Sponsor) Player(PlayerID, Name, Phone, DoB, TeamID) PlayerStats(MatchID, PlayerID, Points, Penalties) RefID and TeamID are not keys in the Match and Team tables, because of the one-to-one rules.

50 DATABASE 50 Business Rules 2 There can be several referees per match. A player can play on several teams (substitute), but only on one team per match. Match(MatchID, DatePlayed, Location, RefID) Score(MatchID, TeamID, Score) Referee(RefID, Phone, Address) Team(TeamID, Name, Sponsor) Player(PlayerID, Name, Phone, DoB, TeamID) PlayerStats(MatchID, PlayerID, Points, Penalties) To handle the many-to-many relationship, we need to make RefID and TeamID keys. But if you leave them in the same tables, the tables are not in 3NF. DatePlayed does not depend on RefID. Player Name does not depend on TeamID.

51 DATABASE 51 Business Rules 2: Normalized Match(MatchID, DatePlayed, Location) RefereeMatch(MatchID, RefID) Score(MatchID, TeamID, Score) Referee(RefID, Phone, Address) Team(TeamID, Name, Sponsor) Player(PlayerID, Name, Phone, DoB) PlayerStats(MatchID, PlayerID, TeamID, Points, Penalties) There can be several referees per match. A player can play on several teams (substitute), but only on one team per match.

52 DATABASE 52 Converting a Class Diagram to Normalized Tables Supplier Purchase Order Item Raw Materials Assembled Components Office Supplies Employee Manager 1* 1* * * subtypes 1*

53 DATABASE 53 One-to-Many Relationships Supplier Purchase Order 1* Supplier(SID, Name, Address, City, State, Zip, Phone) Employee(EID, Name, Salary, Address, …) PurchaseOrder(POID, Date, SID, EID) Employee 1* The many side becomes a key (underlined). Each PO has one supplier and employee. (Do not key SID or EID) Each supplier can receive many POs. (Key PO) Each employee can place many POs. (Key PO)

54 DATABASE 54 One-to-Many Sample Data Supplier Employee Purchase Order POIDDateSIDEID 222349-9-20045676221 222359-10-20045676554 222369-10-20047831221 222379-11-20048872335

55 DATABASE 55 Many-to-Many Relationships Purchase Order Item * * PurchaseOrder(POID, Date, SID, EID) POItem(POID, ItemID, Quantity, PricePaid) Item(ItemID, Description, ListPrice) * * 1 1 Each POID can have many Items (key/underline ItemID). Each ItemID can be on many POIDs (key POID). Need the new intermediate table (POItem) because: You cannot put ItemID into PurchaseOrder because Date, SID, and EID do not depend on the ItemID. You cannot put POID into Item because Description and ListPrice do not depend on POID. Purchase Order Item * * POItem 1 1

56 DATABASE 56 Many-to-Many Sample Data POIDDateSIDEID 222349/95676221 222359/105676554 222369/107831221 222379/118872335 POIDItemIDQuantityPrice 2223444409832.00 22234444185125.00 22235444185424.00 2223655582810150.00 2223655598215800.00 Purchase Order Item POItem ItemIDDescriptionListPrice 444098Staples2.00 444185Paper28.00 555828Wire158.00 555982Sheet steel5928.00 888371Brake assembly152.00

57 DATABASE 57 N-ary Associations Employee Name... Component CompID Type Name Product ProductID Type Name * * * Assembly EmployeeID CompID ProductID 1 1 1

58 DATABASE 58 Generalization or Subtypes Item Raw Materials Assembled Components Office Supplies Item(ItemID, Description, ListPrice) RawMaterials(ItemID, Weight, StrengthRating) AssembledComponents(ItemID, Width, Height, Depth) OfficeSupplies(ItemID, BulkQuantity, Discount) Add new tables for each subtype. Use the same key as the generic type (ItemID)--one-to-one relationship. Add the attributes specific to each subtype.

59 DATABASE 59 Subtypes Sample Data ItemIDDescriptionListPrice 444098Staples2.00 444185Paper28.00 555828Wire158.00 555982Sheet steel5928.00 888371Brake assembly152.00 ItemIDWeightStrengthRating 555828572000 55598225788321 ItemIDWidthHeightDepth 888371131.5 Item RawMaterials AssembledComponents OfficeSupplies ItemIDBulkQuantityDiscount 4440982010% 4441851015%

60 DATABASE 60 Composition Bicycle Size Model Type … Wheels Crank Stem Bicycle SerialNumber ModelType WheelID CrankID StemID … Components ComponentID Category Description Weight Cost

61 DATABASE 61 Recursive Relationships Employee Manager Employee Employee(EID, Name, Salary, Address, Manager) 1* Add a manager column that contains Employee IDs. An employee can have only one manager. (Manager is not a key.) A manager can supervise many employees. (EID is a key.)

62 DATABASE 62 Normalization Examples  Possible topics  Auto repair  Auto sales  Department store  Hair stylist  HRM department  Law firm  Manufacturing  National Park Service  Personal stock portfolio  Pet shop  Restaurant  Social club  Sports team

63 DATABASE 63 Multiple Views & View Integration  Collect multiple views  Documents  Reports  Input forms  Create normalized tables from each view  Combine the views into one complete model.  Keep meta-data in a data dictionary  Type of data  Size  Volume  Usage  Example  Federal Emergency Management Agency (FEMA). Disaster planning and relief.  Make business assumptions as necessary, but try to keep them simple.

64 DATABASE 64 The Pet Store: Sales Form Sales(SaleID, Date, CustomerID, Name, Address, City, State, Zip, EmployeeID, Name, (AnimalID, Name, Category, Breed, DateOfBirth, Gender, Registration, Color, ListPrice, SalePrice), (ItemID, Description, Category, ListPrice, SalePrice, Quantity))

65 DATABASE 65 The Pet Store: Purchase Animals AnimalOrder(OrderID, OrderDate, ReceiveDate, SupplierID, Name, Contact, Phone, Address, City, State, Zip, EmployeeID, Name, Phone, DateHired, (AnimalID, Name, Category, Breed, Gender, Registration, Cost), ShippingCost)

66 DATABASE 66 The Pet Store: Purchase Merchandise MerchandiseOrder(PONumber, OrderDate, ReceiveDate, SupplierID, Name, Contact, Phone, Address, City, State, Zip, EmployeeID, Name, HomePhone, (ItemID, Description, Category, Price, Quantity, QuantityOnHand), ShippingCost)

67 DATABASE 67 Pet Store Normalization Sale(SaleID, Date, CustomerID, EmployeeID) SaleAnimal(SaleID, AnimalID, SalePrice) SaleItem(SaleID, ItemID, SalePrice, Quantity) Customer(CustomerID, Name, Address, City, State, Zip) Employee(EmployeeID, Name) Animal(AnimalID, Name, Category, Breed, DateOfBirth, Gender, Registration, Color, ListPrice) Merchandise(ItemID, Description, Category, ListPrice) AnimalOrder(OrderID, OrderDate, ReceiveDate, SupplierID, EmpID, ShipCost) AnimalOrderItem(OrderID, AnimalID, Cost) Supplier(SupplierID, Name, Contact, Phone, Address, City, State, Zip) Employee(EmployeeID, Name, Phone, DateHired) Animal(AnimalID, Name, Category, Breed, Gender, Registration, Cost) MerchandiseOrder(PONumber, OrderDate, ReceiveDate, SID, EmpID, ShipCost) MerchandiseOrderItem(PONumber, ItemID, Quantity, Cost) Supplier(SupplierID, Name, Contact, Phone, Address, City, State, Zip) Employee(EmployeeID, Name, Phone) Merchandise(ItemID, Description, Category, QuantityOnHand)

68 DATABASE 68 Pet Store View Integration Sale(SaleID, Date, CustomerID, EmployeeID) SaleAnimal(SaleID, AnimalID, SalePrice) SaleItem(SaleID, ItemID, SalePrice, Quantity) Customer(CustomerID, Name, Address, City, State, Zip) Employee(EmployeeID, Name, Phone, DateHired) Animal(AnimalID, Name, Category, Breed, DateOfBirth, Gender, Registration, Color, ListPrice, Cost) Merchandise(ItemID, Description, Category, ListPrice, QuantityOnHand) AnimalOrder(OrderID, OrderDate, ReceiveDate, SupplierID, EmpID, ShipCost) AnimalOrderItem(OrderID, AnimalID, Cost) Supplier(SupplierID, Name, Contact, Phone, Address, City, State, Zip) Employee(EmployeeID, Name, Phone, DateHired) Animal(AnimalID, Name, Category, Breed, Gender, Registration, Cost) MerchandiseOrder(PONumber, OrderDate, ReceiveDate, SID, EmpID, ShipCost) MerchandiseOrderItem(PONumber, ItemID, Quantity, Cost) Supplier(SupplierID, Name, Contact, Phone, Address, City, State, Zip) Employee(EmployeeID, Name, Phone) Merchandise(ItemID, Description, Category, QuantityOnHand)

69 DATABASE 69 Pet Store Class Diagram

70 DATABASE 70 Rolling Thunder Integration Example Bicycle Assembly form. The main EmployeeID control is not stored directly, but the value is entered in the assembly column when the employee clicks the column.

71 DATABASE 71 Initial Tables for Bicycle Assembly BicycleAssembly ( SerialNumber, Model, Construction, FrameSize, TopTube, ChainStay, HeadTube, SeatTube, PaintID, PaintColor, ColorStyle, ColorList, CustomName, LetterStyle, EmpFrame, EmpPaint, BuildDate, ShipDate, (Tube, TubeType, TubeMaterial, TubeDescription), (CompCategory, ComponentID, SubstID, ProdNumber, EmpInstall, DateInstall, Quantity, QOH) ) Bicycle (SerialNumber, Model, Construction, FrameSize, TopTube, ChainStay, HeadTube, SeatTube, PaintID, ColorStyle, CustomName, LetterStyle, EmpFrame, EmpPaint, BuildDate, ShipDate) Paint (PaintID, ColorList) BikeTubes (SerialNumber, TubeID, Quantity) TubeMaterial (TubeID, Type, Material, Description) BikeParts (SerialNumber, ComponentID, SubstID, Quantity, DateInstalled, EmpInstalled) Component (ComponentID, ProdNumber, Category, QOH)

72 DATABASE 72 Rolling Thunder: Purchase Order

73 DATABASE 73 RT Purchase Order: Initial Tables PurchaseOrder (PurchaseID, PODate, EmployeeID, FirstName, LastName, ManufacturerID, MfgName, Address, Phone, CityID, CurrentBalance, ShipReceiveDate, (ComponentID, Category, ManufacturerID, ProductNumber, Description, PricePaid, Quantity, ReceiveQuantity, ExtendedValue, QOH, ExtendedReceived), ShippingCost, Discount) PurchaseOrder (PurchaseID, PODate, EmployeeID, ManufacturerID, ShipReceiveDate, ShippingCost, Discount) Employee (EmployeeID, FirstName, LastName) Manufacturer (ManufacturerID, Name, Address, Phone, Address, CityID, CurrentBalance) City (CityID, Name, ZipCode) PurchaseItem (PurchaseID, ComponentID, Quantity, PricePaid, ReceivedQuantity) Component (ComponentID, Category, ManufacturerID, ProductNumber, Description, QOH)

74 DATABASE 74 Rolling Thunder: Transactions

75 DATABASE 75 RT Transactions: Initial Tables ManufacturerTransactions (ManufacturerID, Name, Phone, Contact, BalanceDue, (TransDate, Employee, Amount, Description) ) Manufacturer (ManufacturerID, Name, Phone, Contact, BalanceDue) ManufacturerTransaction (ManufacturerID, TransactionDate, EmployeeID, Amount, Description)

76 DATABASE 76 Rolling Thunder: Components

77 DATABASE 77 RT Components: Initial Tables ComponentForm (ComponentID, Product, BikeType, Category, Length, Height, Width, Weight, ListPrice,Description, QOH, ManufacturerID, Name, Phone, Contact, Address, ZipCode, CityID, City, State, AreaCode) Component (ComponentID, ProductNumber, BikeType, Category, Length, Height, Width,Weight, ListPrice, Description, QOH, ManufacturerID) Manufacturer (ManufacturerID, Name, Phone, Contact, Address, ZipCode, CityID) City (CityID, City, State, ZipCode, AreaCode)

78 DATABASE 78 RT: Integrating Tables POMfr(ManufacturerID, Name, Address, Phone, CityID, CurrentBalance), MfgMfr(ManufacturerID, Name, Phone, Contact, BalanceDue) CompMfr(ManufacturerID, Name, Phone, Contact, Address, ZipCode, CityID) Duplicate Manufacturer tables: Note that each form can lead to duplicate tables. Look for tables with the same keys, but do not expect them to be named exactly alike. Find all of the data and combine it into one table. Manufacturer(ManufacturerID, Name, Contact, Address, Phone, Address, CityID, |ZipCode, CurrentBalance)

79 DATABASE 79 RT Example: Integrated Tables Bicycle(SerialNumber, Model, Construction, FrameSize, TopTube, ChainStay, HeadTube, SeatTube,PaintID, ColorStyle, CustomName, LetterStyle, EmpFrame, EmpPaint, BuildDate, ShipDate) Paint(PaintID, ColorList) BikeTubes(SerialNumber, TubeID, Quantity) TubeMaterial(TubeID, Type, Material, Description) BikeParts(SerialNumber, ComponentID, SubstID, Quantity, DateInstalled, EmpInstalled) Component(ComponentID, ProductNumber, BikeType, Category, Length, Height, Width, Weight, ListPrice, Description, QOH, ManufacturerID) PurchaseOrder(PurchaseID, PODate, EmployeeID, ManufacturerID, ShipReceiveDate, ShippingCost, Discount) PurchaseItem(PurchaseID, ComponentID, Quantity, PricePaid, ReceivedQuantity) Employee(EmployeeID, FirstName, LastName) Manufacturer(ManufacturerID, Name, Contact, Address, Phone, CityID, ZipCode, CurrentBalance) ManufacturerTransaction(ManufacturerID, TransactionDate, EmployeeID, Amount, Description, Reference) City(CityID, City, State, ZipCode, AreaCode)

80 DATABASE 80 CustomerID Phone FirstName LastName Address ZipCode CityID BalanceDue Customer CustomerID TransDate EmployeeID Amount Description Reference CustomerTrans StoreID StoreName Phone ContacFirstName ContactLastName Address Zipcode CityID RetailStore State TaxRate StateTaxRate SerialNumber CustomerID ModelType PaintID FrameSize OrderDate StartDate ShipDate ShipEmployee FrameAssembler Painter Construction WaterBottle CustomName LetterStyleID StoreID EmployeeID TopTube ChainStay HeadTubeAngle SeatTueAngle ListPrice SalePrice SalesTax SaleState ShipPrice FramePrice ComponentList Bicycle CityID ZipCode City State AreaCode Population1990 Population1980 Country Latitude Longitude Customer ModelType Description ComponentID ModelType Paint EmployeeID TaxpayerID LastName FirstName HomePhone Address ZipCode CityID DateHired DateReleased CurrentManager SalaryGrade Salary Title WorkArea Employee SerialNumber TubeID Quantity BicycleTube ModelType MSize TopTube ChainStay TotalLength GroundClearance HeadTubeAngle SeatTubeAngle ModelSize LetterStyle Description LetterStyle PurchaseID EmployeeID ManufacturerID TotalList ShippingCost Discount OrderDate ReceiveDate AmountDue PurchaseOrder SerialNumber TubeName TubeID Length BikeTubes SerialNumber ComponentID SubstituteID Location Quantity DateInstalled EmployeeID BikeParts PurchaseID ComponentID PricePaid Quantity QuantityReceived PurchaseItem ManufacturerID ManufacturerName ContactName Phone Address ZipCode CityID BalanceDue Manufacturer CompGroup GroupName BikeType Year EndYear Weight Groupo ComponentID ManufacturerID ProductNumber Road Category Length Height Width Weight Year EndYear Description ListPrice EstimatedCost QuantityOnHand Component ManufacturerID TransactionDate EmployeeID Amount Description Reference ManufacturerTrans TubeID Material Description Diameter Thickness Roundness Weight Stiffness ListPrice Construction TubeMaterial GroupID ComponentID GroupCompon ComponentName AssemblyOrder Description ComponentName PaintID ColorName ColorStyle ColorList DateIntroduced DateDiscontinued Rolling Thunder Tables

81 DATABASE 81 View Integration (FEMA Example 1) Team Roster Team#Date FormedLeader Home BaseNameFaxPhone Response time (days)Address, C,S,ZHome phone Total Salary Team Members/Crew IDNameHome phoneSpecialtyDoBSSNSalary  This first form is kept for each team that can be called on to help in emergencies.

82 DATABASE 82 View Integration (FEMA Example 2)  Major problems are reported to HQ to be prioritized and scheduled for correction. Disaster NameHQ LocationOn-Site Problem Report Local AgencyCommander Political Contact Date ReportedAssigned Problem#Severity Problem Description Reported By:SpecialtySpecialty Rating Verified By:SpecialtySpecialty Rating SubProblem Details Total Est. Cost Sub Prob#CategoryDescriptionActionEst. Cost

83 DATABASE 83 View Integration (FEMA Example 3)  On-site teams examine buildings and file a report on damage at that location. Location Damage AnalysisDate Evaluated LocationID, AddressTeam LeaderTitleRepair Priority Latitude, LongitudeCellular PhoneDamage Description Item Loss Total Estimated Damage Total RoomDamage Descrip.Damage% ItemValue$Loss

84 DATABASE 84 View Integration (FEMA Example 3a)  Location Analysis(LocationID, MapLatitude, MapLongitude, Date, Address, Damage, PriorityRepair, Leader, LeaderPhone, LeaderTitle, (Room, Description, PercentDamage, (Item, Value, Loss)))

85 DATABASE 85 View Integration (FEMA Example 4)  Teams file task completion reports. If a task is not completed, the percentage accomplished is reported as the completion status. Task Completion ReportDate Disaster NameDisaster RatingHQ Phone Problem#SupervisorDate Total Expenses Problem#SupervisorDate Total Expenses SubProblemTeam#Team SpecialtyCompletionStatusCommentExpenses

86 DATABASE 86 View Integration (FEMA Example 4a)  TasksCompleted(Date, DisasterName, DisasterRating, HQPhone, (Problem#, Supervisor, (SubProblem, Team#, CompletionStatus, Comments, Expenses))

87 DATABASE 87 DBMS Table Definition  Enter Tables  Columns  Keys  Data Types Text Memo Number  Byte  Integer, Long  Single, Double Date/Time Currency AutoNumber (Long) Yes/No OLE Object  Descriptions  Column Properties  Format  Input Mask  Caption  Default  Validation Rule  Validation Text  Required & Zero Length  Indexed  Relationships  One-to-One  One-to-Many  Referential Integrity  Cascade Update/Delete  Define before entering data

88 DATABASE 88 Table Definition in Access Key Numeric Subtypes or text length

89 DATABASE 89 Graphical Table Definition in Oracle

90 DATABASE 90 Graphical Table Definition in SQL Server

91 DATABASE 91 SQL Table Definition CREATE TABLE Animal ( AnimalID INTEGER, Name VARCHAR2(50), Category VARCHAR2(50), Breed VARCHAR2(50), DateBorn DATE, Gender VARCHAR2(50)CHECK (Gender='Male' Or Gender='Female' Or Gender='Unknown' Or Gender Is Null), Registered VARCHAR2(50), Color VARCHAR2(50), ListPrice NUMBER(38,4) DEFAULT 0, Photo LONG RAW, ImageFile VARCHAR2(250), ImageHeight INTEGER, ImageWidth INTEGER, CONSTRAINT pk_Animal PRIMARY KEY (AnimalID), CONSTRAINT fk_BreedAnimal FOREIGN KEY (Category,Breed) REFERENCES Breed(Category,Breed) ON DELETE CASCADE, CONSTRAINT fk_CategoryAnimal FOREIGN KEY (Category) REFERENCES Category(Category) ON DELETE CASCADE );

92 DATABASE 92 Oracle Databases  For Oracle and SQL Server, it is best to create a text file that contains all of the SQL statements to create the table.  It is usually easier to modify the text table definition.  The text file can be used to recreate the tables for backup or transfer to another system.  To make major modifications to the tables, you usually create a new table, then copy the data from the old table, then delete the old table and rename the new one. It is much easier to create the new table using the text file definition.  Be sure to specify Primary Key and Foreign Key constraints.  Be sure to create tables in the correct order—any table that appears in a Foreign Key constraint must first be created. For example, create Customer before creating Order.  In Oracle, to substantially improve performance, issue the following command once all tables have been created: Analyze table Animal compute statistics;

93 DATABASE 93 Data Volume  Estimate the total size of the database.  Current.  Future growth.  Guide for hardware and software purchases.  For each table.  Use data types to estimate the number of bytes used for each row.  Multiply by the estimated number of rows.  Add the value for each table to get the total size.  For concatenated keys (and similar tables).  OrderItems(O#, Item#, Qty)  Hard to “know” the total number of items ordered. Start with the total number of orders. Multiply by the average number of items on a typical order.  Need to know time frame or how long to keep data.  Do we store all customer data forever?  Do we keep all orders in the active database, or do we migrate older ones?

94 DATABASE 94 Data Volume Example Customer(C#, Name, Address, City, State, Zip) Order(O#, C#, Odate) OrderItem(O#, P#, Quantity, SalePrice) Row: 4 + 15 + 25 + 20 + 2 + 10 = 76 Row: 4 + 4 + 8 = 16 Row: 4 + 4 + 4 + 8 = 20  Business rules  Three year retention.  1000 customers.  Average 10 orders per customer per year.  Average 5 items per order.  Customer76 * 100076,000  Order16 * 30,000480,000  OrderItem20 * 150,0003,000,000  Total3,556,000

95 DATABASE 95 Appendix: Formal Definitions: Terms FormalDefinitionInformal RelationA set of attributes with data that changes over time. Often denoted R. Table AttributeCharacteristic with a real-world domain. Subsets of attributes are multiple columns, often denoted X or Y. Column TupleThe data values returned for specific attribute sets are often denoted as t[X] Row of data SchemaCollection of tables and constraints/relationships Functional dependency X  YBusiness rule dependency

96 DATABASE 96 Appendix: Functional Dependency Derives from a real-world relationship/constraint. Denoted X  Yfor sets of attributes X and Y Holds when any rows of data that have identical values for X attributes also have identical values for their Y attributes: If t1[X] = t2[X], then t1[Y] = t2[Y] X is also known as a determinant if X is non-trivial (not a subset of Y).

97 DATABASE 97 Appendix: Keys Keys are attributes that are ultimately used to identify rows of data. A key K (sometimes called candidate key) is a set of attributes (1)With FD K  Uwhere U is all other attributes in the relation (2)If K’ is a subset of K, then there is no FD K’  U A set of key attributes functionally determines all other attributes in the relation, and it is the smallest set of attributes that will do so (there is no smaller subset of K that determines the columns.)

98 DATABASE 98 Appendix: First Normal Form A relation is in first normal form (1NF) if and only if all attributes are atomic. Atomic attributes are single valued, and cannot be composite, multi- valued or nested relations. Example: Customer(CID, Name: First + Last, Phones, Address) CIDName: First + LastPhonesAddress 111Joe Jones111-2223 111-3393 112-4582 123 Main

99 DATABASE 99 Appendix: Second Normal Form A relation is in second normal form (2NF) if it is in 1NF and each non-key attribute is fully functionally dependent on the primary key. K  Aifor each non-key attribute Ai That is, there is no subset K’ such that K’  Ai Example: OrderProduct(OrderID, ProductID, Quantity, Description) OrderIDProductIDQuantityDescription 32151Blue Hose 32162Pliers 33151Blue Hose

100 DATABASE 100 Appendix: Transitive Dependency Given functional dependencies: X  Y and Y  Z, the transitive dependency X  Z must also hold. Example: There is an FD between OrderID and CustomerID. Given the OrderID key attribute, you always know the CustomerID. There is an FD between CustomerID and the other customer data, because CustomerID is the primary key. Given the CustomerID, you always know the corresponding attributes for Name, Phone, and so on. Consequently, given the OrderID (X), you always know the corresponding customer data by transitivity.

101 DATABASE 101 Appendix: Third Normal Form A relation is in third normal form if and only if it is in 2NF and no non- key attributes are transitively dependent on the primary key. That is, K  Aifor each attribute, (2NF) and There is no subset of attributes X such that K  X  Ai Example: Order(OrderID, OrderDate, CustomerID, Name, Phone) OrderIDOrderDateCustomerIDNamePhone 325/5/20041 Jones 222-3333 335/5/20042Hong444-8888 345/6/20041Jones222-3333

102 DATABASE 102 Appendix: Boyce-Codd Normal Form A relation is in Boyce-Codd Normal Form (BCNF) if and only if it is in 3NF and every determinant is a candidate key (or K is a superkey). That is, K  Aifor every attribute, and there is no subset X (key or nonkey) such that X  Ai where X is different from K. EIDSpecialityManagerID 32Drill1 33Weld2 34Drill1 Example: Employees can have many specialties, and many employees can be within a specialty. Employees can have many managers, but a manager can have only one specialty: Mgr  Specialty EmpSpecMgr(EID, Specialty, ManagerID) FD ManagerID  Specialty is not currently a key.

103 DATABASE 103 Appendix: Multi-Valued Dependency A multi-valued dependency (MVD) exists when there are at least three attributes in a relation (A, B, and C; and they could be sets), and one attribute (A) determines the other two (B and C) but the other two are independent of each other. That is, A  B and A  C but B and C have no FDs Example: Employees have many specialties and many tools, but tools and specialties are not directly related.

104 DATABASE 104 Appendix: Fourth Normal Form A relation is in fourth normal form 4NF if and only if it is in BCNF and there are no multi-valued dependencies. That is, all attributes of R are also functionally dependent on A. If A   B, then all attributes of R are also functionally dependent on A: A  Aifor each attribute. Example: EmpSpecTools(EID, Specialty, ToolID) EmpSpec(EID, Specialty) EmpTools(EID, ToolID)

105 DATABASE 105 DBMS Chapter 3 Homework  Due: 2/25/2009  Pages in textbook: 157 - 166  Review Questions: 1, 2, 3, 6, 10  Exercises: 4, 6  Note: Use Access to draw the class diagram for each exercise


Download ppt "Jerry Post Copyright © 2007 1 Database Management Systems Chapter 3 Data Normalization."

Similar presentations


Ads by Google