Normalizing Your Database and Why You WANT to Do It!
INFYS540 Lesson 7, Chapter 5 Appendix (5/10/2015)

Why do we make our “databases” in spreadsheets?
We use a few massive tables
–“Lots of tables make the database complex”
–Discomfort with databases and multiple tables
Because we “think it’s simple”
–Skip organizing the data into relational tables
–Go straight to designing forms
NAME | POSITION | SPOUSE | CHILDREN | PHONE
Jones | Chief | | Gloria, Karen | 3274
Smith | Clerk | Betty | | 3241
Jones | Chief | Mary | Glorai, Karen | 3296

Data Redundancy Problems
Redundancy breeds errors
–Same data defined in multiple places is BAD
–Prone to spelling and typographical errors
–Lack of data integrity
Inability to perform simple queries
Inflexibility and poor scalability
Impossible to MAINTAIN!

Shared Data
Poorly organized data prevents sharing that data with other “databases”
Think of all the “databases” that list your name, department, etc.:
–Messiah College Phone List Database
–Students Using College Networked Computers
–Students Using Dining Facility
–Students Using Nursing Facility

Relational Database
PROJECTCHIEF
Project | Project Chief
Computing |
Intranet |
Contracting |
CAT |
DEPARTMENTS
Dept | Dept. Director | Room
MLD | | B115
C2G | |
M&B | |
EMPLOYEES
LName | FName | SSN | Dept
Jones | Mike | | M&B
Smith | Tony | | C2G
Lee | Bruce | | MLD
Doodle | Yankee | | M&B
What is a candidate key?
What is a primary key?
What is a foreign key?

Database Management System
Computer program designed to help a user store and retrieve data
–Access, Oracle, DB2

First Things First
–Purpose of the DB
–Who will use it
–What type of tasks
–What are the data sources
–What output is required

Data Modeling
Determine Data Requirements
Entity Class
–something that can be identified in the environment
–each entity class is a separate table
–each entity becomes a separate row in a table
Attributes
–property or characteristic of an entity
–each characteristic of an entity class becomes a column
–each characteristic of an entity becomes an entry in the table
Keys
–one or more attributes that uniquely identify an entity
Constraints
–values or rules the DBMS must enforce

Example
Employee: SSN, L Name, F Name, Rank, Spouse, Children, Office Phone#, Home Phone#, Office Room#, Dept, Dept. Chief
EmpProj: Project Name, Employee SSN, Function
Must know all constraints on data
–project name is unique
–only one chief per project
–employees can have more than one phone#
–employees can have only one office
–many employees can use the same office

Purpose of Normalization
Take advantage of the powerful tools available in a DBMS
There are five levels of Normalization
–The higher the Normal Form, the “better” and more efficient the database
–But increasing the level of Normal Form takes time and effort
–For most applications, 3rd Normal Form will solve most potential problems with a DB

Normalizing Database
Process of creating well-structured tables.
Improve performance, integrity of data
5-step process (w/ 2 rules) to achieve Third Normal Form (3NF)
First two steps put DB into a form so you can normalize it

Rule #1 in Databases
Never design redundant data into a Database
–duplicate data drifts out of sync and becomes inconsistent
–duplicate data wastes space

Step 1. Primary Keys
A primary key is one or more data fields (columns) that uniquely identify each record in the table
What would the primary key be below?
–“table of employees, assigned to a department.”
EMPLOYEES
LName | FName | SSN | Dept
Jones | Mike | | Math
Smith | Tony | | M&B
Lee | Bruce | | Science

Step 1. Primary Keys
Answer: The SSN
It is the only “guaranteed” unique column in the EMPLOYEES table. Names are easily repeated.

Step 1. Primary Keys
Now try the following example:
“A table of projects assigned to employees, listing the project name and the employee’s function on the project.”
EmpProj
Counter | SSN | Project | Function
 | | Dining | Designer
 | | Computing | Designer
 | | Contracting | Designer
 | | Intranet | Webmaster
 | | Dining | Overwatch
A Counter --The MS Access Default Key

Step 1. Primary Keys
It is the combination of the SSN and the Project fields. Why?

Step 1. Primary Keys
Because, with only a counter as the key, you can have the following:
EMPLOYEES’ PROJECTS
Counter | SSN | Project | Function
 | | Dining | Designer
 | | Dining | Designer
 | | Intranet | Designer
 | | Intranet | Webmaster
 | | Dining | Overwatch
Redundant records! (Redundancy = BAD)

Rule #2 about Databases
NEVER Use a Counter as a Primary Key

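The argument behind Step 1 and Rule #2 can be checked with a small sketch. It uses Python's built-in sqlite3 module rather than MS Access, and the SSN value is made up for illustration. The same EmpProj row is inserted twice, once into a table keyed by a counter and once into a table keyed by the composite (SSN, Project).

```python
import sqlite3

# Counter-keyed design: every row gets a fresh number, so duplicates slip in.
COUNTER_KEY = """CREATE TABLE EmpProj (
    Counter INTEGER PRIMARY KEY AUTOINCREMENT,
    SSN TEXT NOT NULL, Project TEXT NOT NULL, Function TEXT)"""

# Composite-key design: (SSN, Project) must be unique.
COMPOSITE_KEY = """CREATE TABLE EmpProj (
    SSN TEXT NOT NULL, Project TEXT NOT NULL, Function TEXT,
    PRIMARY KEY (SSN, Project))"""


def try_duplicate(schema: str) -> bool:
    """Insert the same assignment twice; return True if the DBMS accepted it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(schema)
    row = ("000-00-0001", "Dining", "Designer")  # hypothetical sample row
    insert = "INSERT INTO EmpProj (SSN, Project, Function) VALUES (?, ?, ?)"
    conn.execute(insert, row)
    try:
        conn.execute(insert, row)
        return True
    except sqlite3.IntegrityError:
        return False
    finally:
        conn.close()


print(try_duplicate(COUNTER_KEY))    # duplicate accepted
print(try_duplicate(COMPOSITE_KEY))  # duplicate rejected
```

With the counter key the database has no way to know the second row is redundant; with the composite key the DBMS refuses it.
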
Step 2: Eliminate Many-to-Many Relationships
What is wrong with the following table?
“a table of personnel authorized access to a project”
PROJECTS QUERY ACCESS
Project | Access_1 | Access_2 | Access_3
Dining | | |
Computing | | |
Intranet | | |

Step 2: Eliminate Many-to-Many Relationships
Here’s essentially what this table looks like within the Access relationships diagram:
Projects: Project, Project Chief, Department, Access_1, Access_2, Access_3
Employees: SSN, Last Name, First Name, ....
Relationship between them: “has access to info about”

Step 2: Eliminate Many-to-Many Relationships
Here’s how you model it in a database:
–Break it up into two one-to-many relationships
Projects: Project, Project Chief, Department, ....
Employees: SSN, Last Name, First Name, ....
Access to Project Info: Project, SSN
Projects (1) to Access to Project Info (many); Employees (1) to Access to Project Info (many)

Step 2: Eliminate Many-to-Many Relationships
How to do it:
–The primary key of the new table is the composite of the primary keys of the existing tables.
Primary key of Projects = Project Name
Primary key of Employees = SSN
New table primary key = Project Name and SSN

Step 2: Eliminate Many-to-Many Relationships
–No artificial restrictions on number of people with access
–You can add attributes about the types of access granted
–You can easily query who has access to information about each project
EMPLOYEE
LName | FName | SSN
Jones | Mike |
Smith | Tony |
Lee | Bruce |
Doodle | Yankee |
PROJ QUERY ACCESS
Project | SSN
Dining |
Dining |
Computing |
Computing |
Intranet |
Intranet |
Intranet |
PROJECT
Project | ProjectChief | Dept
Computing | | MATH
Intranet | | M&B
Contracting | | M&B
CAT | | Admin

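A minimal sketch of the Step 2 design, assuming Python's sqlite3 module stands in for Access. The table name ProjQueryAccess, the SSN values, and the access assignments are invented for illustration; only the employee and project names come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE Employee (SSN TEXT PRIMARY KEY, LName TEXT, FName TEXT);
CREATE TABLE Project  (Project TEXT PRIMARY KEY, ProjectChief TEXT, Dept TEXT);
-- Junction table: its key is the composite of the two parent keys.
CREATE TABLE ProjQueryAccess (
    Project TEXT NOT NULL REFERENCES Project(Project),
    SSN     TEXT NOT NULL REFERENCES Employee(SSN),
    PRIMARY KEY (Project, SSN)
);
""")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", [
    ("000-00-0001", "Jones", "Mike"),     # SSNs are placeholders
    ("000-00-0002", "Smith", "Tony"),
    ("000-00-0003", "Lee", "Bruce"),
    ("000-00-0004", "Doodle", "Yankee"),
])
conn.executemany("INSERT INTO Project VALUES (?, ?, ?)", [
    ("Computing", None, "MATH"),
    ("Intranet", None, "M&B"),
])
conn.executemany("INSERT INTO ProjQueryAccess VALUES (?, ?)", [
    ("Computing", "000-00-0001"), ("Computing", "000-00-0002"),
    ("Intranet", "000-00-0001"), ("Intranet", "000-00-0003"),
    ("Intranet", "000-00-0004"),
])


def who_has_access(project: str) -> list:
    """Return the last names of everyone with access to one project."""
    rows = conn.execute(
        """SELECT e.LName
           FROM ProjQueryAccess a JOIN Employee e ON e.SSN = a.SSN
           WHERE a.Project = ? ORDER BY e.LName""",
        (project,),
    ).fetchall()
    return [name for (name,) in rows]


print(who_has_access("Intranet"))
```

Adding a fourth or fortieth person to a project is just another row in the junction table; no Access_4 column is ever needed.
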
What is wrong with the following?
“A table of PCs, which are loaded with many different applications, and assigned to a user.”
PCSerial# | LoadedSoftware | Assigned
10291 | Word, Powerpoint, ccMail | Jones
 | Word, Powerpoint, Lotus Notes | Smith
10311 | Word, LotusNotes, Borland C++ | Hacker

Step 3: Achieving 1NF: All Data must be Atomic
“Atomic” - the data occupying a field cannot be further broken down.
–i.e., no multi-data entries
–i.e., “No attributes can have more than one value for a single instance of an entity”
PCSerial# | LoadedSoftware | Assigned
10291 | Word, Powerpoint, ccMail | Jones
If not atomic, updating is complex and error prone
If not atomic, cannot easily query the database

Step 3 Answer
PCSerial# | LoadedSoftware | Assigned
10291 | Word | Jones
10291 | Powerpoint | Jones
10291 | ccMail | Jones
 | Word | Smith
 | Powerpoint | Smith
 | LotusNotes | Smith
10311 | Word | Hacker
10311 | LotusNotes | Hacker
10311 | Borland C++ | Hacker

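The conversion to atomic rows is mechanical enough to sketch in a few lines of Python. The function name to_atomic_rows is hypothetical, and the sketch assumes the multi-valued field is always comma-separated, as it is on the slide.

```python
def to_atomic_rows(rows):
    """Split each comma-separated LoadedSoftware value into one row per title."""
    atomic = []
    for serial, software, assigned in rows:
        for title in software.split(","):
            atomic.append((serial, title.strip(), assigned))
    return atomic


pcs = [
    ("10291", "Word, Powerpoint, ccMail", "Jones"),
    ("10311", "Word, LotusNotes, Borland C++", "Hacker"),
]

for row in to_atomic_rows(pcs):
    print(row)
```

Once each row holds a single title, a question such as "which PCs have Word loaded?" becomes a simple equality test instead of a substring search.
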
Step 3. Achieving 1NF: All Data must be Atomic
Another source of redundancy: calculated fields
–TotalYTD
–Age
–DaysRemaining
Solution: Use a Query!
Remove all calculated fields from the table and create a query...then use the query whenever you need up-to-date data

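As a rough illustration of "use a query", the sketch below keeps only raw payments in a table and derives TotalYTD on demand. It assumes Python's sqlite3 module; the Payment table, the view name EmployeeTotals, and all values are invented, and every payment is assumed to fall in the current year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Payment (SSN TEXT NOT NULL, PaidOn TEXT NOT NULL, Amount REAL NOT NULL);
-- The calculated value lives in a saved query (a view), not in a column.
CREATE VIEW EmployeeTotals AS
    SELECT SSN, SUM(Amount) AS TotalYTD FROM Payment GROUP BY SSN;
""")
conn.executemany("INSERT INTO Payment VALUES (?, ?, ?)", [
    ("000-00-0001", "2015-01-15", 100.0),
    ("000-00-0001", "2015-02-15", 150.0),
])


def total_ytd(ssn: str) -> float:
    """Look up the derived total; 0.0 if the employee has no payments."""
    row = conn.execute(
        "SELECT TotalYTD FROM EmployeeTotals WHERE SSN = ?", (ssn,)
    ).fetchone()
    return row[0] if row else 0.0


print(total_ytd("000-00-0001"))
conn.execute("INSERT INTO Payment VALUES ('000-00-0001', '2015-03-15', 50.0)")
print(total_ytd("000-00-0001"))  # reflects the new payment with no extra update step
```

Because the total is never stored, it cannot fall out of date when a payment is added or corrected.
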
Step 4. Achieving 2NF: Eliminate Partial Dependencies
What is a partial dependency?
–Look at the table. What’s redundant?
–“A table of functions an employee is assigned to for a project, and the project chief.”
EMPLOYEES’ PROJECTS
SSN | Project | Function | Project Chief
 | Dining | Designer |
 | Computing | Designer |
 | Intranet | Member |
 | Intranet | Designer |
 | Intranet | Webmaster |
 | Dining | Overwatch |

Step 4. Achieving 2NF: Eliminate Partial Dependencies
Function depends on the entire primary key: SSN and Project.
ProjectChief is dependent on just a portion of the primary key: Project.

Step 4. Achieving 2NF: Eliminate Partial Dependencies
Why is this bad?
–Well, what’s wrong with the following?
EMPLOYEES’ PROJECTS
SSN | Project | Function | Project Chief
 | Dining | Designer |
 | Computing | Designer |
 | Intranet | Member |
 | Intranet | Designer |
 | Intranet | Webmaster |
 | Dining | Overwatch |

Step 4. Achieving 2NF: Eliminate Partial Dependencies
A partial dependency (PD) occurs when a non-key field depends on only a part of the primary key, and not the whole primary key.
A PD is really a relation of its own. So, we need a new table.

Step 4. Achieving 2NF: Eliminate Partial Dependencies
Here’s how it should look
EMPLOYEES’ PROJECTS
SSN | Project | Function
 | Dining | Designer
 | Computing | Designer
 | Intranet | Member
 | Intranet | Designer
 | Intranet | Webmaster
 | Dining | Overwatch
PROJECTS
Project | Project Chief
Dining |
Computing |
Intranet |

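The split into two tables can be sketched in plain Python. The function name and the sample SSNs and chief names are hypothetical; the sketch assumes each input row is (SSN, Project, Function, ProjectChief) and raises an error when the same project appears with two different chiefs, which is exactly the inconsistency the partial dependency makes possible.

```python
def split_partial_dependency(rows):
    """Return (emp_proj_rows, project_to_chief) from un-normalized rows."""
    # Function depends on the whole key (SSN, Project): it stays put.
    emp_proj = [(ssn, project, function) for ssn, project, function, _ in rows]
    # ProjectChief depends on Project alone: it moves to its own table.
    projects = {}
    for _, project, _, chief in rows:
        if projects.setdefault(project, chief) != chief:
            raise ValueError(f"conflicting chiefs recorded for {project!r}")
    return emp_proj, projects


rows = [
    ("000-00-0001", "Dining", "Designer", "Chief A"),    # made-up values
    ("000-00-0002", "Intranet", "Webmaster", "Chief B"),
    ("000-00-0003", "Dining", "Overwatch", "Chief A"),
]
emp_proj, projects = split_partial_dependency(rows)
print(emp_proj)
print(projects)
```

After the split, each project's chief is stored once, so two rows can no longer disagree about it.
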
Step 5: Achieving 3NF: Eliminate Transitive Dependencies
What is wrong with the following table?
PROJECTS
Project | Project Chief | Dept. | Dept. Director | Room
Dining | | Admin | | B115
Computing | | Admin | | B115
Intranet | | M&B | |
Contracting | | M&B | |
CAT | | Grounds | |

Step 5: Achieving 3NF: Eliminate Transitive Dependencies
We have fields dependent on a non-key field:
–The Director and Room fields clearly relate to the Dept., and have nothing to do with the project. (Dept is a “determinant” that is not a candidate key)

Step 5: Achieving 3NF: Eliminate Transitive Dependencies
A transitive dependency occurs when a non-key field depends on another non-key field.
Why is this bad?
–A typo appeared in the Contracting line of the PROJECTS table. A database without the transitive dependency would not have allowed this to happen.

Step 5: Achieving 3NF: Eliminate Transitive Dependencies
How to do it:
a. Which fields are dependent on a non-key field in the table? (Director, Room)
b. Which fields are these dependent on? (Dept)
c. Create a new table with (b) as the primary key.
d. Put (a) in the new table.
e. Remove (a) from the old table.

Step 5: Achieving 3NF: Eliminate Transitive Dependencies
Here are the new tables.
PROJECTS
Project | Project Chief | Dept.
Dining | | Admin
Computing | | Admin
Intranet | | M&B
Contracting | | M&B
CAT | | GRND
DEPARTMENTS
Dept. Name | Dept. Director | Room
Admin | | B115
M&B | |
GRND | |

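A hedged sketch of the 3NF result, again assuming Python's sqlite3 module. Director names and the rooms B116 and B200 are made up; the column names are adapted from the slide. It shows the two benefits claimed above: a department fact is changed in one place, and a mistyped department name is rejected by the foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable foreign-key enforcement in SQLite
conn.executescript("""
CREATE TABLE Departments (DeptName TEXT PRIMARY KEY, DeptDirector TEXT, Room TEXT);
CREATE TABLE Projects (
    Project TEXT PRIMARY KEY,
    ProjectChief TEXT,
    Dept TEXT NOT NULL REFERENCES Departments(DeptName)
);
""")
conn.executemany("INSERT INTO Departments VALUES (?, ?, ?)", [
    ("Admin", "Director A", "B115"),   # directors and room B200 are placeholders
    ("M&B", "Director B", "B200"),
])
conn.executemany("INSERT INTO Projects VALUES (?, ?, ?)", [
    ("Dining", None, "Admin"), ("Computing", None, "Admin"), ("Intranet", None, "M&B"),
])


def add_project(project: str, dept: str) -> bool:
    """Try to add a project; False if the department does not exist."""
    try:
        conn.execute("INSERT INTO Projects VALUES (?, NULL, ?)", (project, dept))
        return True
    except sqlite3.IntegrityError:
        return False


def project_rooms() -> list:
    """Join each project to its department's room."""
    return conn.execute(
        """SELECT p.Project, d.Room
           FROM Projects p JOIN Departments d ON d.DeptName = p.Dept
           ORDER BY p.Project"""
    ).fetchall()


# One UPDATE moves the whole department; no project row is touched.
conn.execute("UPDATE Departments SET Room = 'B116' WHERE DeptName = 'Admin'")
print(project_rooms())
print(add_project("CAT", "GRDN"))  # mistyped department name
```
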
Data Analysis: Normalization
An entity is in first normal form (1NF) if there are no attributes that can have more than one value for a single instance of the entity.
An entity is in second normal form (2NF) if it is already in 1NF, and if the values of all non-primary key attributes are dependent on the full primary key, not just part of it.
An entity is in third normal form (3NF) if it is already in 2NF, and if the values of its non-primary key attributes are not dependent on any other non-primary key attributes.

Common Sense Test
Sometimes it is not worth normalizing a table
–example: city and state are functionally dependent on the zip code
–city/state are attributes of the zip code and not of a person’s address
–may not want to normalize a table if
  it is significantly easier to process as is
  duplicates are not important

Conclusion
Rule 1: Never design redundant data into a database
Rule 2: Never use a counter as Primary Key
Identify proper primary keys (1NF)
Break up many-to-many relationships (1NF)
1NF: Break all data into atomic components
2NF: Identify/eliminate partial dependencies
3NF: Eliminate transitive dependencies
Common sense test

What is a Good Data Model?
–A good data model is simple. As a general rule, the data attributes that describe an entity should describe only that entity.
–A good data model is essentially non-redundant. This means that each data attribute, other than foreign keys, describes at most one entity.
–A good data model should be flexible and adaptable to future needs. We should make the data models as application-independent as possible to encourage database structures that can be extended or modified without impact to current programs.

Database Design
Introduction
–The design of any database will usually involve the DBA and database staff. They will handle the technical details and cross-application issues.
–It is useful for the systems analyst to understand the basic design principles for relational databases.

Goals and Prerequisites to Database Design
–The data model may have to be divided into multiple data models to reflect database distribution and database replication decisions.
  Data distribution refers to the distribution of specific tables, records, and/or fields to different physical databases.
  Data replication refers to the duplication of specific tables, records, and/or fields to multiple physical databases.
–Each sub-model or view should reflect the data to be stored on a single server.

The Database Schema
–The design of a database is depicted as a special model called a database schema.
  A database schema is the physical model or blueprint for a database. It represents the technical implementation of the logical data model.
  A relational database schema defines the database structure in terms of tables, keys, indexes, and integrity rules.
  A database schema specifies details based on the capabilities, terminology, and constraints of the chosen database management system.

The Database Schema
–Transforming the logical data model into a physical relational database schema: rules and guidelines
1. Each fundamental, associative, and weak entity is implemented as a separate table.
–The primary key is identified as such and implemented as an index into the table.
–Each secondary key is implemented as its own index into the table.
–Each foreign key will be implemented as such.
–Attributes will be implemented with fields. These fields correspond to columns in the table.

The Database Schema
–Transforming the logical data model into a physical relational database schema: rules and guidelines (continued)
–The following technical details must usually be specified for each attribute.
  Data type. Each DBMS supports different data types, and terms for those data types.
  Size of the Field. Different DBMSs express precision of real numbers differently.
  NULL or NOT NULL. Must the field have a value before the record can be committed to storage?
  Domains. Many DBMSs can automatically edit data to ensure that fields contain legal data.
  Default. Many DBMSs allow a default value to be automatically set in the event that a user or programmer submits a record without a value.

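The per-attribute details above map fairly directly onto column definitions. The sketch below is one possible rendering using Python's sqlite3 module; the Employee columns, the 11-character SSN format, and the legal Rank values are assumptions for illustration, and other DBMSs spell these clauses differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Employee (
    -- data type + size: SQLite ignores declared lengths, so a CHECK stands in
    SSN   TEXT PRIMARY KEY NOT NULL CHECK (length(SSN) = 11),
    -- NULL or NOT NULL: a record cannot be stored without a last name
    LName TEXT NOT NULL,
    -- default + domain: missing Rank becomes 'Clerk'; only listed values are legal
    Rank  TEXT NOT NULL DEFAULT 'Clerk' CHECK (Rank IN ('Clerk', 'Chief'))
)""")


def try_insert(columns: str, values: tuple) -> bool:
    """Attempt an insert; return False if any integrity rule rejects it."""
    placeholders = ", ".join("?" for _ in values)
    sql = f"INSERT INTO Employee ({columns}) VALUES ({placeholders})"
    try:
        conn.execute(sql, values)
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert("SSN, LName", ("000-00-0001", "Jones")))            # accepted, Rank defaults
print(try_insert("SSN, LName", ("000-00-0002", None)))               # rejected: NOT NULL
print(try_insert("SSN, LName, Rank", ("000-00-0003", "Smith", "Wizard")))  # rejected: domain
print(try_insert("SSN, LName", ("123", "Lee")))                      # rejected: size
```
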
The Database Schema
–Transforming the logical data model into a physical relational database schema: rules and guidelines (continued)
2. Supertype/subtype entities present additional options as follows:
–Most CASE tools do not currently support object-like constructs such as supertypes and subtypes.
–Most CASE tools default to creating a separate table for each entity supertype and subtype.
–If the subtypes are of similar size and data content, a database administrator may elect to collapse the subtypes into the supertype to create a single table.
3. Evaluate and specify referential integrity constraints.

Data and Referential Integrity
–There are at least three types of data integrity that must be designed into any database: key integrity, domain integrity, and referential integrity.
–Key Integrity: Every table should have a primary key (which may be concatenated).
–The primary key must be controlled such that no two records in the table have the same primary key value.
–The primary key for a record must never be allowed to have a NULL value.

Data and Referential Integrity
–Domain Integrity: Appropriate controls must be designed to ensure that no field takes on a value that is outside of the range of legal values.
–Referential Integrity: A referential integrity error exists when a foreign key value in one table has no matching primary key value in the related table.

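The definition of a referential integrity error translates into a query: find foreign key values with no matching primary key. A minimal sketch with Python's sqlite3 module follows; foreign keys are deliberately left unenforced here so that an orphan can exist, and the table contents are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Departments (DeptName TEXT PRIMARY KEY);
CREATE TABLE Projects (Project TEXT PRIMARY KEY, Dept TEXT);  -- no FK declared
INSERT INTO Departments VALUES ('Admin'), ('M&B');
INSERT INTO Projects VALUES ('Dining', 'Admin'), ('Intranet', 'M&B'), ('CAT', 'GRND');
""")


def orphaned_projects() -> list:
    """Return (Project, Dept) rows whose Dept matches no Departments key."""
    return conn.execute(
        """SELECT p.Project, p.Dept
           FROM Projects p LEFT JOIN Departments d ON d.DeptName = p.Dept
           WHERE p.Dept IS NOT NULL AND d.DeptName IS NULL
           ORDER BY p.Project"""
    ).fetchall()


print(orphaned_projects())
```

A query like this is useful for auditing data that was loaded before the constraint existed; declaring the foreign key prevents new orphans from appearing at all.
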
Referential Integrity
Referential integrity is specified in the form of deletion rules as follows:
–No restriction. Any record in the table may be deleted without regard to any records in any other tables.
–Delete:Cascade. A deletion of a record in the table must be automatically followed by the deletion of matching records in a related table.
–Delete:Restrict. A deletion of a record in the table must be disallowed until any matching records are deleted from a related table.
–Delete:Set Null. A deletion of a record in the table must be automatically followed by setting any matching keys in a related table to the value NULL.

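Three of these deletion rules correspond to SQL's ON DELETE clause. The sketch below, assuming Python's sqlite3 module and invented table contents, deletes a department under each rule and reports what happens to the project that referenced it. The helper name delete_department is hypothetical.

```python
import sqlite3

ALLOWED_RULES = {"CASCADE", "RESTRICT", "SET NULL"}


def delete_department(rule: str):
    """Delete 'Admin' under the given rule.

    Returns the remaining Projects rows, or the string 'blocked' if the
    DBMS refused the deletion.
    """
    if rule not in ALLOWED_RULES:
        raise ValueError(f"unsupported rule: {rule}")
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.execute("CREATE TABLE Departments (DeptName TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE Projects (Project TEXT PRIMARY KEY, "
        f"Dept TEXT REFERENCES Departments(DeptName) ON DELETE {rule})"
    )
    conn.execute("INSERT INTO Departments VALUES ('Admin')")
    conn.execute("INSERT INTO Projects VALUES ('Dining', 'Admin')")
    try:
        conn.execute("DELETE FROM Departments WHERE DeptName = 'Admin'")
    except sqlite3.IntegrityError:
        return "blocked"
    return conn.execute("SELECT Project, Dept FROM Projects").fetchall()


for rule in ("CASCADE", "RESTRICT", "SET NULL"):
    print(rule, delete_department(rule))
```

The "No restriction" rule corresponds to declaring no foreign key at all, which is how orphaned records arise.
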
Database Design
Roles
–Some database shops insist that no two fields have exactly the same name. This presents an obvious problem with foreign keys.
–A role name is an alternate name for a foreign key that clearly distinguishes the purpose that the foreign key serves in the table.
–The decision to require role names or not is usually established by the data or database administrator.

Database Prototypes
–Prototyping is not an alternative to carefully thought out database schemas.
–On the other hand, once the schema is completed, a prototype database can usually be generated very quickly.
–Most modern DBMSs include powerful, menu-driven database generators that automatically create a DDL and generate a prototype database from that DDL. A database can then be loaded with test data that will prove useful for prototyping and testing outputs, inputs, screens, and other systems components.

Database Capacity Planning
–A database is stored on disk. The database administrator will want an estimate of disk capacity for the new database to ensure that sufficient disk space is available.
–Database capacity planning can be calculated with simple arithmetic as follows.
1. For each table, sum the field sizes.
–This is the record size for the table.
2. For each table, multiply the record size times the number of entity instances to be included in the table.
–This is the table size.

Database Capacity Planning
–Database capacity planning can be calculated with simple arithmetic as follows. (continued)
3. Sum the table sizes.
–This is the database size.
4. Optionally, add a slack capacity buffer (e.g., 10%) to account for unanticipated factors or inaccurate estimates above.
–This is the anticipated database capacity.

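The four capacity-planning steps can be written out as a short Python function. The field sizes and row counts below are invented for illustration, and the function name is hypothetical; the slack buffer is taken as a whole-number percentage and rounded up to the next byte.

```python
def database_capacity(tables: dict, slack_percent: int = 10):
    """tables maps a table name to (list of field sizes in bytes, expected rows).

    Returns (table_sizes, database_size, anticipated_capacity), all in bytes.
    """
    table_sizes = {}
    for name, (field_sizes, row_count) in tables.items():
        record_size = sum(field_sizes)                 # step 1: record size
        table_sizes[name] = record_size * row_count    # step 2: table size
    database_size = sum(table_sizes.values())          # step 3: database size
    # step 4: add the slack buffer, rounding up with integer arithmetic
    capacity = (database_size * (100 + slack_percent) + 99) // 100
    return table_sizes, database_size, capacity


estimate = database_capacity({
    "Employee": ([11, 30, 30, 10], 500),   # 81-byte records, 500 employees
    "EmpProj": ([11, 30, 20], 2000),       # 61-byte records, 2000 assignments
})
print(estimate)
```

For these sample numbers the tables come to 40,500 and 122,000 bytes, the database to 162,500 bytes, and the anticipated capacity with a 10% buffer to 178,750 bytes. Real DBMSs add per-row and index overhead that this simple arithmetic ignores.
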
Database Structure Generation
–CASE tools are frequently capable of generating SQL code for the database directly from a CASE-based database schema.
  This code can be exported to the DBMS for compilation.
  Even a small database model can require 50 pages or more of SQL data definition language code to create the tables, indexes, keys, fields, and triggers.
  Clearly, a CASE tool’s ability to automatically generate syntactically correct code is an enormous productivity advantage.
  Furthermore, it almost always proves easier to modify the database schema and re-generate the code than to maintain the code directly.

The Next Generation of Database Design
Introduction
–Relational database technology is widely deployed and used in contemporary information system shops.
–One new technology is slowly emerging that could ultimately change the landscape dramatically: object database management systems.
–The heir apparent to relational DBMSs, object database management systems store true objects, that is, encapsulated data and all of the processes that can act on that data.
–Because relational database management systems are so widely used, we don’t expect this change to happen quickly.
–It is expected that relational DBMS vendors will either build object technology into their existing relational DBMSs, or create new object DBMSs and provide for the transition between relational and object models.

Summary
–Introduction
–Conventional Files Versus the Database
–Database Concepts for the Systems Analyst
–Data Analysis for Database Design
–File Design
–Database Design
–The Next Generation of Database Design