CSCI 2141 – Intro to Database Systems


1 CSCI 2141 – Intro to Database Systems
Summary

2 The Design Process
- The normalization process produces good tables from poor ones
- Ensure that the proposed entities meet the required normal form (normally 3NF) before creating table structures
- Normalization should hence be part of the design process
- It is common even for experienced designers to make mistakes
  - These mistakes can come to light during normalization checks
- Database design is an iterative process
  - You might not get it right the first time

3 The Design Process
- First step: produce an Entity-Relationship Diagram (ERD)
  - Creating an ERD is an iterative process
  - Identify relevant entities, their attributes, and relationships
  - Create an ERD
  - Examine the ERD to identify additional entities and attributes
  - The ERD provides the macro view of an organization's data requirements and operations
- Second step: normalize the tables
  - Normalization focuses on specific entities and their attributes
  - It provides a micro view of entities within the ERD
- Go to step 1 and repeat if required

4 Example
Let us examine the operations of a contracting company:
- The company manages many projects
- Each project requires the services of many employees
- An employee may be assigned to several different projects
- Some employees are not assigned to any project and perform other duties (e.g. executive secretary)
- Some employees are part of a labor pool, to be shared by all project teams
- Each employee has a single primary job classification that determines the hourly billing rate
- Many employees can have the same job classification

5 Example
Initial ERD:
- PROJECT (PROJ_NUM, PROJ_NAME)
- EMPLOYEE (EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INITIAL, JOB_DESCRIPTION, JOB_CHG_HOURS)
Identify the normal forms of the above tables.

6 Example
- PROJECT is in 3NF
- EMPLOYEE is in 2NF
  - A table in 1NF with a single-attribute PK is automatically in 2NF
  - It is not in 3NF because of the transitive dependency JOB_DESCRIPTION → JOB_CHG_HOURS
- Converting EMPLOYEE to 3NF results in another table
  - JOB (JOB_CODE, JOB_DESCRIPTION, JOB_CHG_HOURS)
- Redraw the modified ERD
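The 3NF conversion above can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module as a stand-in for any RDBMS; the column types and the sample rows are assumptions, and only the table and column names come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# JOB holds the description and rate once per job classification,
# so EMPLOYEE no longer carries the transitive dependency.
conn.executescript("""
CREATE TABLE JOB (
    JOB_CODE        INTEGER PRIMARY KEY,
    JOB_DESCRIPTION TEXT NOT NULL,
    JOB_CHG_HOURS   REAL NOT NULL
);
CREATE TABLE EMPLOYEE (
    EMP_NUM     INTEGER PRIMARY KEY,
    EMP_LNAME   TEXT NOT NULL,
    EMP_FNAME   TEXT NOT NULL,
    EMP_INITIAL TEXT,
    JOB_CODE    INTEGER NOT NULL REFERENCES JOB (JOB_CODE)
);
""")

# Illustrative sample data (not from the slides).
conn.execute("INSERT INTO JOB VALUES (1, 'Software Engineer', 40.0)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?, ?)",
    [(101, "Doe", "Jane", "A", 1), (102, "Roe", "Sam", None, 1)],
)

# A rate change is now a single-row update in JOB ...
conn.execute("UPDATE JOB SET JOB_CHG_HOURS = 45.0 WHERE JOB_CODE = 1")

# ... and every employee with that classification sees the new rate.
rates = conn.execute("""
    SELECT E.EMP_NUM, J.JOB_CHG_HOURS
    FROM EMPLOYEE E JOIN JOB J ON J.JOB_CODE = E.JOB_CODE
    ORDER BY E.EMP_NUM
""").fetchall()
print(rates)
```

In the unnormalized EMPLOYEE table the same rate change would have to be repeated on every employee row with that job description, which is where update anomalies come from.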

7 Example
- At this point, EMPLOYEE and PROJECT have a many-to-many relationship
- This needs to be broken down
- Add an ASSIGNMENT entity
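A minimal sketch of the ASSIGNMENT bridge entity, again using sqlite3. The surrogate key name ASSIGN_NUM, the trimmed-down column lists, and the sample rows are assumptions made for illustration; the slides only name the ASSIGNMENT entity itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# ASSIGNMENT turns the M:N relationship into two 1:M relationships.
conn.executescript("""
CREATE TABLE PROJECT (
    PROJ_NUM  INTEGER PRIMARY KEY,
    PROJ_NAME TEXT NOT NULL
);
CREATE TABLE EMPLOYEE (
    EMP_NUM   INTEGER PRIMARY KEY,
    EMP_LNAME TEXT NOT NULL
);
CREATE TABLE ASSIGNMENT (
    ASSIGN_NUM   INTEGER PRIMARY KEY,  -- hypothetical surrogate key
    PROJ_NUM     INTEGER NOT NULL REFERENCES PROJECT (PROJ_NUM),
    EMP_NUM      INTEGER NOT NULL REFERENCES EMPLOYEE (EMP_NUM),
    ASSIGN_HOURS REAL NOT NULL
);
INSERT INTO PROJECT VALUES (1, 'Alpha'), (2, 'Beta');
INSERT INTO EMPLOYEE VALUES (101, 'Doe'), (102, 'Roe'), (103, 'Poe');
INSERT INTO ASSIGNMENT VALUES
    (1, 1, 101, 3.5), (2, 2, 101, 2.0), (3, 1, 102, 4.0);
""")

# LEFT JOIN keeps employees who are not assigned to any project.
project_counts = conn.execute("""
    SELECT E.EMP_NUM, COUNT(DISTINCT A.PROJ_NUM)
    FROM EMPLOYEE E
    LEFT JOIN ASSIGNMENT A ON A.EMP_NUM = E.EMP_NUM
    GROUP BY E.EMP_NUM
    ORDER BY E.EMP_NUM
""").fetchall()
print(project_counts)
```

Employee 103 has no ASSIGNMENT rows, which matches the business rule that some employees are not assigned to any project.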

8 Example
- Note the use of surrogate PKs in several tables
- The design properly reflects the requirements, and all tables are in 3NF

9 Improving the Database Design
- Primary key assignments
  - Use integers as PKs instead of characters or strings
  - Fewer chances of making mistakes, hence fewer referential integrity errors
- Naming conventions
  - Associate attributes with entities using entity prefixes
  - E.g. EMP_ID, EMPLOYEE_ID, or EMP_NUM; JOB_DESC or JOB_DESCRIPTION

10 Improving the Database Design
- Attribute atomicity
  - Make attributes atomic; an atomic attribute cannot be further subdivided
  - E.g. LNAME, FNAME, INITIAL instead of NAME
  - E.g. ST_NUM, ST_NAME, CITY, STATE, COUNTRY, ZIP instead of ADDRESS
- What about DAY, MONTH, YEAR instead of DATE?
  - Weigh the benefit against the programming overhead
  - Not usually needed: splitting a date every time you store it and recombining it every time you use it adds overhead and complexity to your program, while SQL date functions can extract the day, month, and year with little additional cost
  - However, in a denormalized DB you might use a trigger to save the birth year in a separate column every time a birth date is entered (a simple GROUP BY on the year column is then available)
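As a small illustration of why a date rarely needs to be split into DAY, MONTH, and YEAR columns, the sketch below groups employees by birth year straight from a single date column. It uses SQLite's strftime function; other systems offer equivalents under different names. The column EMP_DOB and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The birth date is stored once, as an ISO-8601 text value.
conn.executescript("""
CREATE TABLE EMPLOYEE (
    EMP_NUM INTEGER PRIMARY KEY,
    EMP_DOB TEXT NOT NULL  -- hypothetical column, 'YYYY-MM-DD'
);
INSERT INTO EMPLOYEE VALUES
    (101, '1985-03-14'), (102, '1985-11-02'), (103, '1990-07-21');
""")

# The year is extracted at query time; no separate YEAR column is needed.
births_per_year = conn.execute("""
    SELECT strftime('%Y', EMP_DOB) AS BIRTH_YEAR, COUNT(*)
    FROM EMPLOYEE
    GROUP BY BIRTH_YEAR
    ORDER BY BIRTH_YEAR
""").fetchall()
print(births_per_year)
```

Note that strftime returns the year as text here; a stored birth-year column maintained by a trigger would trade this per-query extraction for some redundancy.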

11 Improving the Database Design
- Identify new attributes
  - Consider the real-world implementation and future extension
  - E.g. the EMPLOYEE table may need EMPLOYEE_HIREDATE if the organization bases decisions on job longevity (bonus?)
  - SALARY, INSURANCE, BENEFITS payments (if required)
- Identify new relationships
  - Employee-Manager relationship: is it better to have a new table, or simply a "manages" relationship between the EMPLOYEE and PROJECT tables?

12 Improving the Database Design
- PK refinement for data granularity
  - What does PROJ_CHG_HOURS represent: a daily, weekly, monthly, yearly, or lifetime total?
- Different ways to store PROJ_CHG_HOURS
  - Evaluate the benefit of one against the other
  - In the first, it represents the total accumulated charges for the project for that employee
  - In the second, we can store charges for individual components of a project

13 Improving the Database Design
- Maintain historical accuracy
  - See the tables in 3NF
- Can we calculate the total amount charged for an employee on a particular project using these tables?
  - ASSIGN_HOURS x CHG_HOUR
- Will historical accuracy be maintained if employee details change?
  - If a Software Engineer is promoted to Senior Software Engineer, and CHG_HOUR changes from $40 to $60 per hour, how do we calculate the charges for a previously completed project?
  - ASSIGN_CHG_HOUR needs to be stored in the ASSIGNMENT table along with ASSIGN_HOURS
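The promotion scenario can be worked through in a short sketch. The table layouts are trimmed to the columns needed, the rate column follows the JOB_CHG_HOURS name used earlier in the deck, and the 10-hour assignment is an assumed sample value; the $40 and $60 rates come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE JOB (
    JOB_CODE        INTEGER PRIMARY KEY,
    JOB_DESCRIPTION TEXT NOT NULL,
    JOB_CHG_HOURS   REAL NOT NULL
);
CREATE TABLE EMPLOYEE (
    EMP_NUM  INTEGER PRIMARY KEY,
    JOB_CODE INTEGER NOT NULL REFERENCES JOB (JOB_CODE)
);
CREATE TABLE ASSIGNMENT (
    ASSIGN_NUM      INTEGER PRIMARY KEY,  -- hypothetical surrogate key
    EMP_NUM         INTEGER NOT NULL REFERENCES EMPLOYEE (EMP_NUM),
    ASSIGN_HOURS    REAL NOT NULL,
    ASSIGN_CHG_HOUR REAL NOT NULL         -- rate in force when the work was done
);
INSERT INTO JOB VALUES
    (1, 'Software Engineer', 40.0),
    (2, 'Senior Software Engineer', 60.0);
INSERT INTO EMPLOYEE VALUES (101, 1);
-- 10 hours worked while the employee was billed at $40 per hour.
INSERT INTO ASSIGNMENT VALUES (1, 101, 10.0, 40.0);
-- The employee is promoted after the work is complete.
UPDATE EMPLOYEE SET JOB_CODE = 2 WHERE EMP_NUM = 101;
""")

# Recomputing from the current job rate overstates the old charge.
recomputed = conn.execute("""
    SELECT A.ASSIGN_HOURS * J.JOB_CHG_HOURS
    FROM ASSIGNMENT A
    JOIN EMPLOYEE E ON E.EMP_NUM = A.EMP_NUM
    JOIN JOB J ON J.JOB_CODE = E.JOB_CODE
""").fetchone()[0]

# Using the rate stored with the assignment preserves history.
stored = conn.execute(
    "SELECT ASSIGN_HOURS * ASSIGN_CHG_HOUR FROM ASSIGNMENT"
).fetchone()[0]

print(recomputed, stored)
```

The join through the current job classification bills the old work at the new rate, while the rate copied into ASSIGNMENT keeps the original charge.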

14 Improving the Database Design
- Consider using derived attributes
  - ASSIGN_HOURS x ASSIGN_CHG_HOUR = ASSIGN_CHG
  - Evaluate storing ASSIGN_CHG as the actual charged amount in the table
- This creates a transitive dependency, but it maintains historical accuracy and reduces run-time calculation complexity and human error
- It saves reporting time if many transactions need to be reported or summarized, and simplifies the writing of application software
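A minimal sketch of storing the derived attribute, assuming the application computes ASSIGN_CHG once at insert time. The helper add_assignment, the surrogate key ASSIGN_NUM, and the sample figures are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE ASSIGNMENT (
    ASSIGN_NUM      INTEGER PRIMARY KEY,  -- hypothetical surrogate key
    PROJ_NUM        INTEGER NOT NULL,
    ASSIGN_HOURS    REAL NOT NULL,
    ASSIGN_CHG_HOUR REAL NOT NULL,
    ASSIGN_CHG      REAL NOT NULL         -- derived: hours x hourly charge
)
""")


def add_assignment(conn, proj_num, hours, rate):
    """Insert an assignment, computing the derived charge exactly once."""
    conn.execute(
        "INSERT INTO ASSIGNMENT "
        "(PROJ_NUM, ASSIGN_HOURS, ASSIGN_CHG_HOUR, ASSIGN_CHG) "
        "VALUES (?, ?, ?, ?)",
        (proj_num, hours, rate, hours * rate),
    )


add_assignment(conn, 1, 10.0, 40.0)
add_assignment(conn, 1, 2.5, 60.0)
add_assignment(conn, 2, 4.0, 40.0)

# Reports sum the stored charge instead of recomputing it per row.
totals = conn.execute("""
    SELECT PROJ_NUM, SUM(ASSIGN_CHG)
    FROM ASSIGNMENT
    GROUP BY PROJ_NUM
    ORDER BY PROJ_NUM
""").fetchall()
print(totals)
```

The cost is redundancy: ASSIGN_CHG must be kept in step with the two columns it is derived from, which is the transitive dependency the slide mentions.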

15 Improving the Database Design
Using the enhancements suggested, the new DB is as shown:

16 Denormalization
- The previous example showed that a database design can be improved using denormalization
- Denormalization is the process of introducing redundancy in tables to improve performance
- A good RDBMS design is normalized to eliminate any unnecessary duplication
  - This produces many tables and relationships
  - To generate information, data must be put together from many tables
  - Joining several tables takes additional I/O operations and processing logic, thus reducing speed

17 Denormalization
- Denormalization introduces redundancies
  - The advantage of higher processing speed must be carefully weighed against the disadvantage of data anomalies
  - In addition, denormalization may produce larger tables, making updates and indexing slower
- Consider a ZIP table: ZIP (ZIP_CODE, CITY, STATE)
  - Do we really need to store this in a separate table?
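To make the trade-off concrete, the sketch below folds the ZIP attributes into a hypothetical CUSTOMER table and shows the kind of anomaly the redundancy allows. The CUSTOMER table, its key, and all sample values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Denormalized: CITY and STATE are repeated on every row with the same ZIP_CODE,
# so no join to a ZIP table is needed when printing an address.
conn.executescript("""
CREATE TABLE CUSTOMER (
    CUS_NUM  INTEGER PRIMARY KEY,
    ZIP_CODE TEXT NOT NULL,
    CITY     TEXT NOT NULL,
    STATE    TEXT NOT NULL
);
INSERT INTO CUSTOMER VALUES
    (1, '99999', 'Sampletown', 'ST'),
    (2, '99999', 'Sampletown', 'ST');
-- A careless edit touches only one of the duplicated values.
UPDATE CUSTOMER SET CITY = 'Sampleton' WHERE CUS_NUM = 2;
""")

# The same ZIP code now maps to two different city names.
cities = conn.execute("""
    SELECT DISTINCT CITY FROM CUSTOMER
    WHERE ZIP_CODE = '99999'
    ORDER BY CITY
""").fetchall()
print(cities)
```

Whether this risk is acceptable depends on how often city and state values actually change compared with how often addresses are read.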

18 Design Exercise 1
Suppose a manufacturer produces three high-cost, low-volume products: P1, P2, and P3. Product P1 is assembled with components C1 and C2; product P2 is assembled with components C1, C3, and C4; and product P3 is assembled with components C2 and C3. Components may be purchased from several vendors, as shown in the following table.

Vendor   Components Supplied
V1       C1, C2
V2       C1, C2, C3, C4
V3       C1, C2, C4

19 Design Exercise 1
Each product has a unique serial number, as does each component. To track product performance, careful records are kept to ensure that each product's components can be traced to the component supplier. Products are sold directly to final customers; that is, no wholesale operations are permitted. The sales records include the customer identification and the product serial number. Using the preceding information, do the following:
- Write the business rules governing the production and sale of the products.
- Create an ER diagram capable of supporting the manufacturer's product/component tracking requirements.

20 Design Exercise 1

PRODUCT   COMPONENTS
P1        C1, C2
P2        C1, C3, C4
P3        C2, C3

Business Rule 1. A component can be part of several products, and a product is made up of several components.

VENDOR   COMPONENTS SUPPLIED
V1       C1, C2
V2       C1, C2, C3, C4
V3       C1, C2, C4

Business Rule 2. A component can be supplied by several vendors, and a vendor supplies several components.

21 Design Exercise 1 – Initial ERD
The M:N relationships between PRODUCT and COMPONENT and between COMPONENT and VENDOR have been converted through the composite entities PROD_COMP and COMP_VEND.

22 Design Exercise 1 – Sample Data
Any problems that you can notice here? Component C1 in Product P1 is supplied by which Vendor? V1, V2 or V3?
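The ambiguity can be reproduced with the exercise's own data. This sketch loads the product/component and vendor/component pairs into the two composite entities and asks which vendor supplied C1 in P1; the column names PROD_CODE, COMP_CODE, and VEND_CODE are assumptions, since the slide's sample tables are not in the transcript.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PROD_COMP (
    PROD_CODE TEXT NOT NULL,
    COMP_CODE TEXT NOT NULL,
    PRIMARY KEY (PROD_CODE, COMP_CODE)
);
CREATE TABLE COMP_VEND (
    COMP_CODE TEXT NOT NULL,
    VEND_CODE TEXT NOT NULL,
    PRIMARY KEY (COMP_CODE, VEND_CODE)
);
INSERT INTO PROD_COMP VALUES
    ('P1', 'C1'), ('P1', 'C2'),
    ('P2', 'C1'), ('P2', 'C3'), ('P2', 'C4'),
    ('P3', 'C2'), ('P3', 'C3');
INSERT INTO COMP_VEND VALUES
    ('C1', 'V1'), ('C2', 'V1'),
    ('C1', 'V2'), ('C2', 'V2'), ('C3', 'V2'), ('C4', 'V2'),
    ('C1', 'V3'), ('C2', 'V3'), ('C4', 'V3');
""")

# Try to trace component C1 of product P1 back to its vendor.
candidates = [
    row[0]
    for row in conn.execute("""
        SELECT CV.VEND_CODE
        FROM PROD_COMP PC
        JOIN COMP_VEND CV ON CV.COMP_CODE = PC.COMP_CODE
        WHERE PC.PROD_CODE = 'P1' AND PC.COMP_CODE = 'C1'
        ORDER BY CV.VEND_CODE
    """)
]
print(candidates)
```

The join returns every vendor that can supply C1, not the one that actually supplied the unit built into a given product, so the design cannot answer the tracing question.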

23 Design Exercise 1 – Problems in ERD
- We used default optionalities
  - PRODUCTs do not necessarily contain all available COMPONENTs
  - Not all VENDORs supply all COMPONENTs
- Business Rule 3: Each product has a unique serial number, as does each component. To keep track of product performance, careful records are kept to ensure that each product's components can be traced to the component supplier.
  - Currently, each PRODUCT represents a collection of products belonging to the same product type or line, rather than a specific product occurrence with a unique serial number

24 Design Exercise 1 – Solution
One way to produce the tracking capability required by business rule 3 is to use a ternary relationship between PRODUCT, COMPONENT, and VENDOR

25 Ternary Relationship
The ER diagram we have just shown represents a many-to-many-to-many ternary relationship, expressed as M:N:P. This ternary relationship indicates that:
- A product is composed of many components, and a component appears in many products.
- A component is provided by many vendors, and a vendor provides many components.
- A product contains components of many vendors, and a vendor's components appear in many products.

26 Design Exercise 1 – Solution
Assigning attributes to the SERIALS entity, we may draw the dependency diagram shown below:
[Dependency diagram over the attributes P_SERIAL, C_SERIAL, PROD_TYPE, COMP_TYPE, VEND_CODE, with a partial dependency marked]

27 Design Exercise 1 – Solution
The Normalized Dependency Diagrams
[Figure: dependency diagrams over the attributes P_SERIAL, PROD_TYPE, C_SERIAL, COMP_TYPE, VEND_CODE, split into three tables named P_SERIAL, C_SERIAL, and SERIAL]
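One possible reading of the normalized diagrams is sketched below. The diagram itself is not in the transcript, so the placement of PROD_TYPE in P_SERIAL and of COMP_TYPE and VEND_CODE in C_SERIAL is an assumption, as are the serial-number values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- One row per individual product occurrence.
CREATE TABLE P_SERIAL (
    P_SERIAL  TEXT PRIMARY KEY,
    PROD_TYPE TEXT NOT NULL
);
-- One row per individual component occurrence, with its supplier.
CREATE TABLE C_SERIAL (
    C_SERIAL  TEXT PRIMARY KEY,
    COMP_TYPE TEXT NOT NULL,
    VEND_CODE TEXT NOT NULL
);
-- Which component occurrence was built into which product occurrence.
CREATE TABLE SERIAL (
    P_SERIAL TEXT NOT NULL REFERENCES P_SERIAL (P_SERIAL),
    C_SERIAL TEXT NOT NULL REFERENCES C_SERIAL (C_SERIAL),
    PRIMARY KEY (P_SERIAL, C_SERIAL)
);
INSERT INTO P_SERIAL VALUES ('P1-0001', 'P1');
INSERT INTO C_SERIAL VALUES
    ('C1-0042', 'C1', 'V2'),
    ('C2-0007', 'C2', 'V1');
INSERT INTO SERIAL VALUES
    ('P1-0001', 'C1-0042'),
    ('P1-0001', 'C2-0007');
""")

# Trace every component in one specific product back to its vendor.
trace = conn.execute("""
    SELECT C.C_SERIAL, C.COMP_TYPE, C.VEND_CODE
    FROM SERIAL S
    JOIN C_SERIAL C ON C.C_SERIAL = S.C_SERIAL
    WHERE S.P_SERIAL = ?
    ORDER BY C.C_SERIAL
""", ("P1-0001",)).fetchall()
print(trace)
```

Because each row now describes one physical product or component, the query yields exactly one vendor per component, which is the tracking capability business rule 3 asks for.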

28 Design Exercise 1 – Solution

