5/10/2015 1 Normalizing Your Database and Why you WANT to do it! INFYS540 Lesson 7 Chapter 5 Appendix
5/10/2015 2 Why do we make our “databases” in spreadsheets?
We use a few massive tables
–“Lots of tables make the database complex”
–Discomfort with databases and multiple tables
Because we “think it’s simple”
–Skip organizing the data into relational tables
–Go straight to designing forms

NAME   POSITION  SPOUSE  CHILDREN       PHONE
Jones  Chief             Gloria, Karen  3274
Smith  Clerk     Betty                  3241
Jones  Chief     Mary    Glorai, Karen  3296
5/10/2015 3 Data Redundancy Problems
Redundancy breeds errors
–The same data defined in multiple places is BAD
–Prone to spelling and typographical errors
–Lack of data integrity
Inability to perform simple queries
Inflexibility and poor scalability
Impossible to MAINTAIN!
5/10/2015 4 Shared Data
Poorly organized data prevents sharing that data with other “databases”
Think of all the “databases” that list your name, department, etc.:
–Messiah College Phone List Database
–Students Using College Networked Computers
–Students Using Dining Facility
–Students Using Nursing Facility
5/10/2015 5 Relational Database

PROJECTCHIEF
Project      Project Chief
Computing    333-22-1111
Intranet     987-65-4321
Contracting  123-45-6789
CAT          333-22-1111

DEPARTMENTS
Dept  Dept. Director  Room
MLD   181-94-5676     B115
C2G   987-65-4321     123
M&B   123-45-6789     147

EMPLOYEES
LName   FName   SSN          Dept
Jones   Mike    123-45-6789  M&B
Smith   Tony    987-65-4321  C2G
Lee     Bruce   567-89-1234  MLD
Doodle  Yankee  333-22-1111  M&B

What is a candidate key?
What is a primary key?
What is a foreign key?
5/10/2015 6 Database Management System Computer program designed to help a user store and retrieve data –Access, Oracle, DB2
5/10/2015 8 First Things First
Purpose of the DB
Who will use it
What type of tasks
What are the data sources
What output is required
5/10/2015 9 Data Modeling
Determine Data Requirements
Entity Class
–something that can be identified in the environment
–each entity class is a separate table
–each entity becomes a separate row in a table
Attributes
–property or characteristic of an entity
–each characteristic of an entity class becomes a column
–each characteristic of an entity becomes an entry in the table
Keys
–one or more attributes that uniquely identify an entity
Constraints
–values or rules the DBMS must enforce
5/10/2015 10 Example
Employee: SSN, L Name, F Name, Rank, Spouse, Children, Office Phone#, Home Phone#, Office Room#, Dept, Dept. Chief
EmpProj: Project Name, Employee SSN, Function
Must know all constraints on data
–project name is unique
–only one chief per project
–employees can have more than one phone#
–employees can have only one office
–many employees can use the same office
5/10/2015 11 Purpose of Normalization
Take advantage of the powerful tools available in a DBMS
There are five commonly cited levels of normalization (1NF through 5NF)
–The higher the Normal Form, the “better” and more efficient the database
–But increasing the level of Normal Form takes time and effort
–For most applications, 3rd Normal Form will solve most potential problems with a DB
5/10/2015 12 Normalizing Database Process of creating well-structured tables. Improve performance, integrity of data 5-step process (w/ 2 rules) to achieve Third Normal Form (3NF) First two steps put DB into a form so you can normalize it
5/10/2015 13 Rule #1 in Databases
Never design redundant data into a database
–duplicate data can become inconsistent
–duplicate data wastes space
5/10/2015 14 Step 1. Primary Keys
A primary key is one or more data fields (columns) that uniquely identify each record in the table
What would the primary key be below?
–“table of employees, assigned to a department.”

EMPLOYEES
LName  FName  SSN          Dept
Jones  Mike   123-45-6789  Math
Smith  Tony   987-65-4321  M&B
Lee    Bruce  567-89-1234  Science
5/10/2015 15 Step 1. Primary Keys
Answer: The SSN
It is the only “guaranteed” unique column in the table. Names are easily repeated.

EMPLOYEES
LName  FName  SSN          Dept
Jones  Mike   123-45-6789  Math
Smith  Tony   987-65-4321  M&B
Lee    Bruce  567-89-1234  Science
5/10/2015 16 Step 1. Primary Keys
Now try the following example: “A table of projects assigned to employees, listing the project name and the employee’s function on the project.”

EmpProj
Counter  SSN          Project      Function
1        123-45-6789  Dining       Designer
2        123-45-6789  Computing    Designer
3        987-65-4321  Contracting  Designer
4        444-55-6666  Intranet     Webmaster
5        222-99-7777  Dining       Overwatch

A Counter: the MS Access default key
5/10/2015 17 Step 1. Primary Keys
It is the combination of the SSN and the Project fields. Why?

EMPLOYEES’ PROJECTS
Counter  SSN          Project      Function
1        123-45-6789  Dining       Designer
2        123-45-6789  Computing    Designer
3        987-65-4321  Contracting  Designer
4        444-55-6666  Intranet     Webmaster
5        222-99-7777  Dining       Overwatch
5/10/2015 18 Step 1. Primary Keys
Because you can have the following:

EMPLOYEES’ PROJECTS
Counter  SSN          Project   Function
1        123-45-6789  Dining    Designer
2        123-45-6789  Dining    Designer
3        987-65-4321  Intranet  Designer
4        444-55-6666  Intranet  Webmaster
5        222-99-7777  Dining    Overwatch

Redundant records! (Redundancy = BAD)
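A minimal sketch of the point above, using Python's built-in sqlite3 module rather than MS Access. The in-memory database, the table names emp_proj_counter and emp_proj, and the column name job_function are assumptions made for illustration. With a counter as the key, the duplicate assignment is accepted; with (SSN, Project) as the key, the DBMS rejects it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database (assumption)

# Counter as primary key: nothing stops the same assignment being entered twice.
conn.execute("""
    CREATE TABLE emp_proj_counter (
        counter      INTEGER PRIMARY KEY,
        ssn          TEXT NOT NULL,
        project      TEXT NOT NULL,
        job_function TEXT NOT NULL
    )""")
row = ("123-45-6789", "Dining", "Designer")
insert_counter = (
    "INSERT INTO emp_proj_counter (ssn, project, job_function) VALUES (?, ?, ?)"
)
conn.execute(insert_counter, row)
conn.execute(insert_counter, row)  # accepted: the counter makes the row "unique"
counter_rows = conn.execute("SELECT COUNT(*) FROM emp_proj_counter").fetchone()[0]

# Composite primary key (ssn, project): the duplicate is rejected by the DBMS.
conn.execute("""
    CREATE TABLE emp_proj (
        ssn          TEXT NOT NULL,
        project      TEXT NOT NULL,
        job_function TEXT NOT NULL,
        PRIMARY KEY (ssn, project)
    )""")
conn.execute("INSERT INTO emp_proj VALUES (?, ?, ?)", row)
try:
    conn.execute("INSERT INTO emp_proj VALUES (?, ?, ?)", row)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Here counter_rows ends up as 2 (the redundant record got in), while duplicate_rejected is True for the table keyed on SSN and Project.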
5/10/2015 19 Rule #2 about Databases NEVER Use a Counter as a Primary Key
5/10/2015 20 Step 2: Eliminate Many-to-Many Relationships
What is wrong with the following table?
“a table of personnel authorized access to a project”

PROJECTS QUERY ACCESS
Project    Access_1     Access_2     Access_3
Dining     222-99-7777  181-94-5676
Computing  222-99-7777  181-94-5676
Intranet   987-65-4321  818-49-6765  123-45-6789
5/10/2015 21 Step 2: Eliminate Many-to-Many Relationships
Here’s essentially what this table looks like within the Access relationships diagram:
Projects: Project, Project Chief, Department, Access_1, Access_2, Access_3
Employees: SSN, Last Name, First Name, ....
Relationship label: has access to info about
5/10/2015 22 Step 2: Eliminate Many-to-Many Relationships
Here’s how you model it in a database:
–Break it up into two one-to-many relationships
Projects: Project, Project Chief, Department, ....
Employees: SSN, Last Name, First Name, ....
Access to Project Info: Project, SSN
(Projects and Employees each sit on the “1” side of their relationship to Access to Project Info)
5/10/2015 23 Step 2: Eliminate Many-to-Many Relationships
How to do it:
–The primary key of the new table is the composite of the primary keys of the existing tables.
Primary key of Projects = Project Name
Primary key of Employees = SSN
Primary key of the new table = Project Name and SSN
5/10/2015 24 Step 2: Eliminate Many-to-Many Relationships
–No artificial restrictions on number of people with access
–You can add attributes about the types of access granted
–You can easily query who has access to information about each project

EMPLOYEE
LName   FName   SSN
Jones   Mike    123-45-6789
Smith   Tony    987-65-4321
Lee     Bruce   567-89-1234
Doodle  Yankee  333-22-1111

PROJ QUERY ACCESS
Project    SSN
Dining     222-99-7777
Dining     181-94-5676
Computing  222-99-7777
Computing  181-94-5676
Intranet   987-65-4321
Intranet   818-49-6765
Intranet   123-45-6789

PROJECT
Project      ProjectChief  Dept
Computing    333-22-1111   MATH
Intranet     987-65-4321   M&B
Contracting  123-45-6789   M&B
CAT          333-22-1111   Admin
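The access table above can be sketched in Python's built-in sqlite3 module as follows. This is only an illustration: the table name proj_query_access and the helper who_has_access are assumed names, and the rows are the ones shown on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database (assumption)

# Junction table: one row per (project, employee) access grant.
conn.execute("""
    CREATE TABLE proj_query_access (
        project TEXT NOT NULL,
        ssn     TEXT NOT NULL,
        PRIMARY KEY (project, ssn)
    )""")
conn.executemany(
    "INSERT INTO proj_query_access VALUES (?, ?)",
    [
        ("Dining", "222-99-7777"),
        ("Dining", "181-94-5676"),
        ("Computing", "222-99-7777"),
        ("Computing", "181-94-5676"),
        ("Intranet", "987-65-4321"),
        ("Intranet", "818-49-6765"),
        ("Intranet", "123-45-6789"),
    ],
)

def who_has_access(project):
    """Return the SSNs with access to a project, sorted for stable output."""
    rows = conn.execute(
        "SELECT ssn FROM proj_query_access WHERE project = ? ORDER BY ssn",
        (project,),
    )
    return [ssn for (ssn,) in rows]

intranet_access = who_has_access("Intranet")
```

Granting a fourth person access to Intranet is just one more row; no Access_4 column has to be added to the table.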
5/10/2015 25 What is wrong with the following?
“A table of PCs, which are loaded with many different applications, and assigned to a user.”

PCSerial#  LoadedSoftware                 Assigned
10291      Word, Powerpoint, ccMail       Jones
10301      Word, Powerpoint, Lotus Notes  Smith
10311      Word, LotusNotes, Borland C++  Hacker
5/10/2015 26 Step 3: Achieving 1NF: All Data must be Atomic
“Atomic” - the data occupying a field cannot be further broken down.
–i.e., no multi-data entries
–i.e., “No attributes can have more than one value for a single instance of an entity”

PCSerial#  LoadedSoftware            Assigned
10291      Word, Powerpoint, ccMail  Jones

If not atomic, updating is complex and error prone
If not atomic, you cannot easily query the database
5/10/2015 27 Step 3 Answer

PCSerial#  LoadedSoftware  Assigned
10291      Word            Jones
10291      Powerpoint      Jones
10291      ccMail          Jones
10301      Word            Smith
10301      Powerpoint      Smith
10301      LotusNotes      Smith
10311      Word            Hacker
10311      LotusNotes      Hacker
10311      Borland C++     Hacker
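The conversion from the multi-valued table to the atomic one can be sketched in a few lines of Python. The function name to_atomic and the tuple layout are assumptions for illustration; the data is copied from the slide as written.

```python
# Non-atomic rows as on the slide: (serial, comma-separated software, user).
pcs = [
    ("10291", "Word, Powerpoint, ccMail", "Jones"),
    ("10301", "Word, Powerpoint, Lotus Notes", "Smith"),
    ("10311", "Word, LotusNotes, Borland C++", "Hacker"),
]

def to_atomic(rows):
    """Return one (serial, software, user) row per software title."""
    atomic = []
    for serial, software_list, user in rows:
        for title in software_list.split(","):
            atomic.append((serial, title.strip(), user))
    return atomic

atomic_rows = to_atomic(pcs)

# With atomic rows, "which PCs have Word?" is a simple equality test
# instead of a substring search inside a text field.
pcs_with_word = [serial for serial, title, _user in atomic_rows if title == "Word"]
```

Note that the source spells one product both “Lotus Notes” and “LotusNotes”; splitting the field does not reconcile the spellings, but it makes the mismatch visible as two distinct values.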
5/10/2015 28 Step 3. Achieving 1NF: All Data must be Atomic
Another source of redundancy: calculated fields
–TotalYTD
–Age
–DaysRemaining
Solution: Use a Query!
–Remove all calculated fields from the table and create a query, then use the query whenever you need up-to-date data
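As a hedged illustration of “use a query”, the sketch below computes a TotalYTD figure on demand with Python's built-in sqlite3 module instead of storing it. The payment table, its columns, the sample amounts, and the helper total_ytd are all hypothetical; the slides do not specify where TotalYTD comes from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database (assumption)

# Store only the raw facts; there is no stored total column.
conn.execute("""
    CREATE TABLE payment (
        ssn     TEXT    NOT NULL,
        paid_on TEXT    NOT NULL,   -- ISO date, e.g. 2015-01-15
        amount  INTEGER NOT NULL
    )""")
conn.executemany(
    "INSERT INTO payment VALUES (?, ?, ?)",
    [
        ("123-45-6789", "2015-01-15", 100),
        ("123-45-6789", "2015-02-15", 150),
        ("987-65-4321", "2015-01-15", 200),
    ],
)

def total_ytd(ssn, year):
    """Compute the total for one person and year at query time."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM payment "
        "WHERE ssn = ? AND strftime('%Y', paid_on) = ?",
        (ssn, str(year)),
    ).fetchone()
    return row[0]

jones_total = total_ytd("123-45-6789", 2015)
```

Because the total is derived each time, adding a payment row can never leave a stale TotalYTD behind.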
5/10/2015 29 Step 4. Achieving 2NF: Eliminate Partial Dependencies
What is a partial dependency?
–Look at the table. What’s redundant?
–“A table of functions an employee is assigned to for a project, and the project chief.”

EMPLOYEES’ PROJECTS
SSN          Project    Function   Project Chief
123-45-6789  Dining     Designer   222-99-7777
123-45-6789  Computing  Designer   333-88-5656
123-45-6789  Intranet   Member     987-65-4321
987-65-4321  Intranet   Designer   987-65-4321
444-55-6666  Intranet   Webmaster  987-65-4321
222-99-7777  Dining     Overwatch  222-99-7777
5/10/2015 30 Step 4. Achieving 2NF: Eliminate Partial Dependencies
Function depends on the entire primary key: SSN and Project.
ProjectChief is dependent on just a portion of the primary key

EMPLOYEES’ PROJECTS
SSN          Project    Function   ProjectChief
123-45-6789  Dining     Designer   222-99-7777
123-45-6789  Computing  Designer   333-88-5656
123-45-6789  Intranet   Member     987-65-4321
987-65-4321  Intranet   Designer   987-65-4321
444-55-6666  Intranet   Webmaster  987-65-4321
222-99-7777  Dining     Overwatch  222-99-7777
5/10/2015 31 Step 4. Achieving 2NF: Eliminate Partial Dependencies
Why is this bad?
–Well, what’s wrong with the following?

EMPLOYEES’ PROJECTS
SSN          Project    Function   Project Chief
123-45-6789  Dining     Designer   222-99-7777
123-45-6789  Computing  Designer   333-88-5656
123-45-6789  Intranet   Member     987-65-4321
987-65-4321  Intranet   Designer   987-65-4321
444-55-6666  Intranet   Webmaster  222-99-7777
222-99-7777  Dining     Overwatch  222-99-7777
5/10/2015 32 Step 4. Achieving 2NF: Eliminate Partial Dependencies
A partial dependency (PD) occurs when a non-key field depends on only a part of the primary key, and not the whole primary key.
A PD describes a separate relation of its own. So, we need a new table.....

EMPLOYEES’ PROJECTS
SSN          Project    Function   Project Chief
123-45-6789  Dining     Designer   222-99-7777
123-45-6789  Computing  Designer   333-88-5656
123-45-6789  Intranet   Member     987-65-4321
987-65-4321  Intranet   Designer   987-65-4321
444-55-6666  Intranet   Webmaster  987-65-4321
222-99-7777  Dining     Overwatch  222-99-7777
5/10/2015 33 Step 4. Achieving 2NF: Eliminate Partial Dependencies
Here’s how it should look......

EMPLOYEES’ PROJECTS
SSN          Project    Function
123-45-6789  Dining     Designer
123-45-6789  Computing  Designer
123-45-6789  Intranet   Member
987-65-4321  Intranet   Designer
444-55-6666  Intranet   Webmaster
222-99-7777  Dining     Overwatch

PROJECTS
Project    Project Chief
Dining     222-99-7777
Computing  333-88-5656
Intranet   987-65-4321
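A rough sketch of the two 2NF tables in Python's built-in sqlite3 module, assuming the table names projects and emp_proj and the column name job_function (none of these names come from the slides). It shows that the project chief is now stored once per project, so changing it is a single-row update and a join rebuilds the original four-column view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database (assumption)
conn.executescript("""
    CREATE TABLE projects (
        project       TEXT PRIMARY KEY,
        project_chief TEXT NOT NULL
    );
    CREATE TABLE emp_proj (
        ssn          TEXT NOT NULL,
        project      TEXT NOT NULL REFERENCES projects(project),
        job_function TEXT NOT NULL,
        PRIMARY KEY (ssn, project)
    );
""")
conn.executemany("INSERT INTO projects VALUES (?, ?)", [
    ("Dining", "222-99-7777"),
    ("Computing", "333-88-5656"),
    ("Intranet", "987-65-4321"),
])
conn.executemany("INSERT INTO emp_proj VALUES (?, ?, ?)", [
    ("123-45-6789", "Dining", "Designer"),
    ("123-45-6789", "Computing", "Designer"),
    ("123-45-6789", "Intranet", "Member"),
    ("987-65-4321", "Intranet", "Designer"),
    ("444-55-6666", "Intranet", "Webmaster"),
    ("222-99-7777", "Dining", "Overwatch"),
])

# Changing the Intranet chief touches exactly one row.
conn.execute(
    "UPDATE projects SET project_chief = ? WHERE project = ?",
    ("444-55-6666", "Intranet"),
)

# A join rebuilds the original four-column view whenever it is needed.
joined = conn.execute("""
    SELECT e.ssn, e.project, e.job_function, p.project_chief
    FROM emp_proj AS e
    JOIN projects AS p ON p.project = e.project
    ORDER BY e.ssn, e.project
""").fetchall()
intranet_chiefs = {chief for _ssn, project, _func, chief in joined if project == "Intranet"}
```

After the update, every Intranet row in the joined view reports the same chief, which is exactly the inconsistency the single flat table could not rule out.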
5/10/2015 34 Step 5: Achieving 3NF: Eliminate Transitive Dependencies
What is wrong with the following table?

PROJECTS
Project      Project Chief  Dept.    Dept. Director  Room
Dining       222-99-7777    Admin    181-94-5676     B115
Computing    333-88-5656    Admin    181-94-5676     B115
Intranet     987-65-4321    M&B      818-49-6765     123
Contracting  187-87-8787    M&B      818-49-6765     123
CAT          333-22-1111    Grounds  123-45-6789     147
5/10/2015 35 Step 5: Achieving 3NF: Eliminate Transitive Dependencies
We have fields dependent on a non-key field:
–The Director and Room fields clearly relate to the Dept., and have nothing to do with the project. (Dept is a “determinant” that is not a candidate key)

PROJECTS
Project      Project Chief  Dept.  Dept. Director  Room
Dining       222-99-7777    Admin  181-94-5676     B115
Computing    333-88-5656    Admin  181-94-5676     B115
Intranet     987-65-4321    M&B    818-49-6765     123
Contracting  187-87-8787    M&B    818-49-6765     123
CAT          333-22-1111    GRND   123-45-6789     147
5/10/2015 36 Step 5: Achieving 3NF: Eliminate Transitive Dependencies
A transitive dependency occurs when a non-key field depends on another non-key field.
Why is this bad?
–A typo appeared in the Contracting line (Room 124 instead of 123). A database without the transitive dependency would not have allowed this to happen.

PROJECTS
Project      Project Chief  Dept.  Dept. Director  Room
Dining       222-99-7777    Admin  181-94-5676     B115
Computing    333-88-5656    Admin  181-94-5676     B115
Intranet     987-65-4321    M&B    818-49-6765     123
Contracting  187-87-8787    M&B    818-49-6765     124
CAT          333-22-1111    GRND   123-45-6789     147
5/10/2015 37 Step 5: Achieving 3NF: Eliminate Transitive Dependencies
How to do it:
a. Which fields are dependent on a non-key field in the table? (Director, Room)
b. Which fields are these dependent on? (Dept)
c. Create a new table with (b) as the primary key.
d. Put (a) in the new table.
e. Remove (a) from the old table.
5/10/2015 38 Step 5: Achieving 3NF: Eliminate Transitive Dependencies
Here are the new tables.

PROJECTS
Project      Project Chief  Dept.
Dining       222-99-7777    Admin
Computing    333-88-5656    Admin
Intranet     987-65-4321    M&B
Contracting  187-87-8787    M&B
CAT          333-22-1111    GRND

DEPARTMENTS
Dept. Name  Dept. Director  Room
Admin       181-94-5676     B115
M&B         818-49-6765     123
GRND        123-45-6789     147
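The 3NF split can be sketched in plain Python. The helper names conflicting_depts and split_3nf are assumptions for illustration, and the input is the flat PROJECTS table including the Room typo on the Contracting line. The first helper shows what the flat table allows to go wrong; the second performs the split into the two tables shown above.

```python
from collections import defaultdict

# Flat rows: (project, project_chief, dept, dept_director, room)
flat = [
    ("Dining", "222-99-7777", "Admin", "181-94-5676", "B115"),
    ("Computing", "333-88-5656", "Admin", "181-94-5676", "B115"),
    ("Intranet", "987-65-4321", "M&B", "818-49-6765", "123"),
    ("Contracting", "187-87-8787", "M&B", "818-49-6765", "124"),  # typo: 124
    ("CAT", "333-22-1111", "GRND", "123-45-6789", "147"),
]

def conflicting_depts(rows):
    """Return departments recorded with more than one (director, room) pair."""
    seen = defaultdict(set)
    for _project, _chief, dept, director, room in rows:
        seen[dept].add((director, room))
    return sorted(dept for dept, facts in seen.items() if len(facts) > 1)

def split_3nf(rows):
    """Split flat rows into PROJECTS rows and a DEPARTMENTS mapping."""
    projects = [(project, chief, dept) for project, chief, dept, _d, _r in rows]
    departments = {}
    for _project, _chief, dept, director, room in rows:
        # Keep the first value seen; each department is stored exactly once.
        departments.setdefault(dept, (director, room))
    return projects, departments

conflicts = conflicting_depts(flat)
projects, departments = split_3nf(flat)
```

Keeping the first value seen is an arbitrary choice made for this sketch; in practice a conflict such as the one reported for M&B would need a person to decide which room is correct before the split.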
5/10/2015 39 Data Analysis: Normalization
An entity is in first normal form (1NF) if there are no attributes that can have more than one value for a single instance of the entity.
An entity is in second normal form (2NF) if it is already in 1NF, and if the values of all non-primary key attributes are dependent on the full primary key, not just part of it.
An entity is in third normal form (3NF) if it is already in 2NF, and if the values of its non-primary key attributes are not dependent on any other non-primary key attributes.
5/10/2015 40 Common Sense Test
Sometimes it is not worth normalizing a table
–example: zip codes create a functional dependency
–city/state are attributes of the zip code and not of a person’s address
–you may not want to normalize a table if it is significantly easier to process as is, or if duplicates are not important
5/10/2015 41 Conclusion
Rule 1: Never design redundant data into a database
Rule 2: Never use a counter as Primary Key
Identify proper primary keys (1NF)
Break up many-to-many relationships (1NF)
1NF: Break all data into atomic components
2NF: Identify/eliminate partial dependencies
3NF: Eliminate transitive dependencies
Common sense test
5/10/2015 42 What is a Good Data Model?
–A good data model is simple. As a general rule, the data attributes that describe an entity should describe only that entity.
–A good data model is essentially non-redundant. This means that each data attribute, other than foreign keys, describes at most one entity.
–A good data model should be flexible and adaptable to future needs. We should make the data models as application-independent as possible to encourage database structures that can be extended or modified without impact to current programs.
5/10/2015 43 Database Design
Introduction
–The design of any database will usually involve the DBA and database staff. They will handle the technical details and cross-application issues.
–It is useful for the systems analyst to understand the basic design principles for relational databases.
5/10/2015 44 Goals and Prerequisites to Database Design –The data model may have to be divided into multiple data models to reflect database distribution and database replication decisions. Data distribution refers to the distribution of either specific tables, records, and/or fields to different physical databases. Data replication refers to the duplication of specific tables, records, and/or fields to multiple physical databases. –Each sub-model or view should reflect the data to be stored on a single server.
5/10/2015 45 The Database Schema –The design of a database is depicted as a special model called a database schema. A database schema is the physical model or blueprint for a database. It represents the technical implementation of the logical data model. A relational database schema defines the database structure in terms of tables, keys, indexes, and integrity rules. A database schema specifies details based on the capabilities, terminology, and constraints of the chosen database management system.
5/10/2015 46 The Database Schema
–Transforming the logical data model into a physical relational database schema: rules and guidelines
1. Each fundamental, associative, and weak entity is implemented as a separate table.
–The primary key is identified as such and implemented as an index into the table.
–Each secondary key is implemented as its own index into the table.
–Each foreign key will be implemented as such.
–Attributes will be implemented with fields. These fields correspond to columns in the table.
5/10/2015 47 The Database Schema
–Transforming the logical data model into a physical relational database schema: rules and guidelines (continued)
–The following technical details must usually be specified for each attribute.
Data type. Each DBMS supports different data types, and terms for those data types.
Size of the Field. Different DBMSs express precision of real numbers differently.
NULL or NOT NULL. Must the field have a value before the record can be committed to storage?
Domains. Many DBMSs can automatically edit data to ensure that fields contain legal data.
Default. Many DBMSs allow a default value to be automatically set in the event that a user or programmer submits a record without a value.
5/10/2015 48 The Database Schema
–Transforming the logical data model into a physical relational database schema: rules and guidelines (continued)
2. Supertype/subtype entities present additional options as follows:
–Most CASE tools do not currently support object-like constructs such as supertypes and subtypes.
–Most CASE tools default to creating a separate table for each entity supertype and subtype.
–If the subtypes are of similar size and data content, a database administrator may elect to collapse the subtypes into the supertype to create a single table.
3. Evaluate and specify referential integrity constraints.
5/10/2015 49 Data and Referential Integrity –There are at least three types of data integrity that must be designed into any database - key integrity, domain integrity and referential integrity. –Key Integrity: Every table should have a primary key (which may be concatenated). –The primary key must be controlled such that no two records in the table have the same primary key value. –The primary key for a record must never be allowed to have a NULL value.
5/10/2015 50 Data and Referential Integrity –Domain Integrity: Appropriate controls must be designed to ensure that no field takes on a value that is outside of the range of legal values. –Referential Integrity: A referential integrity error exists when a foreign key value in one table has no matching primary key value in the related table.
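Key integrity and domain integrity can both be declared in the table definition so that the DBMS enforces them. The sketch below uses Python's built-in sqlite3 module; the departments table, the headcount column and its rule (no negative values), and the helper try_insert are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database (assumption)
conn.execute("""
    CREATE TABLE departments (
        dept      TEXT PRIMARY KEY NOT NULL,               -- key integrity
        room      TEXT NOT NULL,
        headcount INTEGER NOT NULL DEFAULT 0
                  CHECK (headcount >= 0)                   -- domain integrity
    )""")

def try_insert(dept, room, headcount):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO departments VALUES (?, ?, ?)", (dept, room, headcount)
        )
        return True
    except sqlite3.IntegrityError:
        return False

accepted = try_insert("Admin", "B115", 12)        # a legal row
duplicate_key = try_insert("Admin", "B116", 3)    # same primary key again
null_key = try_insert(None, "123", 1)             # NULL primary key
out_of_domain = try_insert("M&B", "123", -5)      # value outside the legal range
```

The explicit NOT NULL on the key column is deliberate: SQLite, unlike most DBMSs, accepts NULL in a non-integer primary key column unless it is told otherwise.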
5/10/2015 51 Referential Integrity
Referential integrity is specified in the form of deletion rules as follows:
–No restriction. Any record in the table may be deleted without regard to any records in any other tables.
–Delete:Cascade. A deletion of a record in the table must be automatically followed by the deletion of matching records in a related table.
–Delete:Restrict. A deletion of a record in the table must be disallowed until any matching records are deleted from a related table.
–Delete:Set Null. A deletion of a record in the table must be automatically followed by setting any matching keys in a related table to the value NULL.
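Three of the deletion rules above map directly onto SQL's ON DELETE clause. The following sketch, in Python's built-in sqlite3 module, runs the same delete under each rule; the function run_delete and the two tiny tables are assumptions for illustration. The “no restriction” rule corresponds to declaring no foreign key at all, so it is not shown.

```python
import sqlite3

def run_delete(rule):
    """Delete department 'M&B' under an ON DELETE rule.

    Returns (deleted, remaining_project_rows).
    """
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE departments (dept TEXT PRIMARY KEY)")
    # rule is one of a fixed set of keywords, so interpolating it is safe here.
    conn.execute(f"""
        CREATE TABLE projects (
            project TEXT PRIMARY KEY,
            dept    TEXT REFERENCES departments(dept) ON DELETE {rule}
        )""")
    conn.execute("INSERT INTO departments VALUES ('M&B')")
    conn.execute("INSERT INTO projects VALUES ('Intranet', 'M&B')")
    try:
        conn.execute("DELETE FROM departments WHERE dept = 'M&B'")
        deleted = True
    except sqlite3.IntegrityError:
        deleted = False
    rows = conn.execute("SELECT project, dept FROM projects").fetchall()
    conn.close()
    return deleted, rows

cascade_result = run_delete("CASCADE")    # matching project row is deleted too
restrict_result = run_delete("RESTRICT")  # delete is refused
set_null_result = run_delete("SET NULL")  # project row stays, dept becomes NULL
```

Which rule is right depends on the business meaning of the relationship; the sketch only shows how each one behaves.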
5/10/2015 52 Database Design Roles
–Some database shops insist that no two fields have exactly the same name. This presents an obvious problem with foreign keys.
–A role name is an alternate name for a foreign key that clearly distinguishes the purpose that the foreign key serves in the table.
–The decision to require role names or not is usually established by the data or database administrator.
5/10/2015 53 Database Prototypes –Prototyping is not an alternative to carefully thought out database schemas. –On the other hand, once the schema is completed, a prototype database can usually be generated very quickly. –Most modern DBMSs include powerful, menu-driven database generators that automatically create a DDL and generate a prototype database from that DDL. A database can then be loaded with test data that will prove useful for prototyping and testing outputs, inputs, screens, and other systems components.
5/10/2015 54 Database Capacity Planning
–A database is stored on disk. The database administrator will want an estimate of disk capacity for the new database to ensure that sufficient disk space is available.
–Database capacity planning can be calculated with simple arithmetic as follows.
1. For each table, sum the field sizes.
–This is the record size for the table.
2. For each table, multiply the record size times the number of entity instances to be included in the table.
–This is the table size.
5/10/2015 55 Database Capacity Planning
–Database capacity planning can be calculated with simple arithmetic as follows. (continued)
3. Sum the table sizes.
–This is the database size.
4. Optionally, add a slack capacity buffer (e.g., 10%) to account for unanticipated factors or inaccurate estimates above.
–This is the anticipated database capacity.
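The four capacity-planning steps can be written as a short Python function. The function name database_capacity, the field sizes in bytes, and the row counts below are made-up figures for illustration only; they are not taken from the slides.

```python
def database_capacity(tables, slack=0.10):
    """Estimate database capacity in bytes.

    tables maps a table name to (list of field sizes in bytes, expected row count).
    Returns (table sizes, database size, capacity including slack).
    """
    table_sizes = {}
    for name, (field_sizes, row_count) in tables.items():
        record_size = sum(field_sizes)               # step 1: record size
        table_sizes[name] = record_size * row_count  # step 2: table size
    database_size = sum(table_sizes.values())        # step 3: database size
    capacity = database_size * (1 + slack)           # step 4: add slack buffer
    return table_sizes, database_size, capacity

# Hypothetical field sizes and row counts.
tables = {
    "employees": ([20, 20, 11, 10], 500),  # LName, FName, SSN, Dept
    "projects": ([30, 11, 10], 40),        # Project, Project Chief, Dept
}
table_sizes, database_size, capacity = database_capacity(tables)
```

With these assumed figures the record sizes are 61 and 51 bytes, the table sizes 30,500 and 2,040 bytes, the database size 32,540 bytes, and the capacity with a 10% buffer about 35,794 bytes. Real DBMSs add per-row and index overhead that this simple arithmetic ignores.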
5/10/2015 56 Database Structure Generation –CASE tools are frequently capable of generating SQL code for the database directly from a CASE-based database schema. This code can be exported to the DBMS for compilation. Even a small database model can require 50 pages or more of SQL data definition language code to create the tables, indexes, keys, fields, and triggers. Clearly, a CASE tool’s ability to automatically generate syntactically correct code is an enormous productivity advantage. Furthermore, it almost always proves easier to modify the database schema and re-generate the code, than to maintain the code directly.
5/10/2015 57 The Next Generation of Database Design Introduction Relational database technology is widely deployed and used in contemporary information system shops. One new technology is slowly emerging that could ultimately change the landscape dramatically – object database management systems. The heir apparent to relational DBMSs, object database management systems store true objects, that is, encapsulated data and all of the processes that can act on that data. Because relational database management systems are so widely used, we don’t expect this change to happen quickly. It is expected that these vendors will either build object technology into their existing relational DBMSs, or they will create new, object DBMSs and provide for the transition between relational and object models.
5/10/2015 58 Summary Introduction Conventional Files Versus the Database Database Concepts for the Systems Analyst Data Analysis for Database Design File Design Database Design The Next Generation of Database Design