## Presentation on theme: "Copyright: ©2005 by Elsevier Inc. All rights reserved. 1 Author: Graeme C. Simsion and Graham C. Witt Chapter 3 The Entity-Relationship Approach."— Presentation transcript:

Normalisation

Each column can hold only a single fact. Do this first. Put very simply, normalization is essentially a two-step process:

1. Put the data into tabular form (by removing repeating groups to new tables).
2. Remove duplicated data to separate tables.

Critically: every time we create a table (in either step), we need to identify its primary key. We did all this in the example in the last lecture (the Drug Expenditure example).
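Step 1 can be sketched in a few lines of Python. This is only an illustration: the record layout, the column names, and the helper `remove_repeating_groups` are hypothetical and are not taken from the Drug Expenditure example itself.

```python
# Hypothetical unnormalized data: each hospital record carries a repeating
# group of operations (names and values are made up for illustration).
unnormalized = [
    {"hospital_number": "H1", "hospital_name": "Hospital A",
     "operations": [{"operation_number": 1, "operation_code": "7A"},
                    {"operation_number": 2, "operation_code": "7B"}]},
    {"hospital_number": "H2", "hospital_name": "Hospital B",
     "operations": [{"operation_number": 1, "operation_code": "7A"}]},
]

def remove_repeating_groups(records):
    """Move the repeating group to its own table, carrying the parent key along."""
    hospitals, operations = [], []
    for rec in records:
        hospitals.append({"hospital_number": rec["hospital_number"],
                          "hospital_name": rec["hospital_name"]})
        for op in rec["operations"]:
            # The parent's primary key travels with each row of the new table.
            operations.append({"hospital_number": rec["hospital_number"], **op})
    return hospitals, operations

hospitals, operations = remove_repeating_groups(unnormalized)
```

In this sketch the primary key of the new operations table is the combination of `hospital_number` and `operation_number`, since operation numbers repeat across hospitals.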

More formally

Apart from repeating groups, we are looking at certain relationships between data in the tables.
– Which column(s) determine other column(s)?
– Create tables around the determining column(s) (we call these determining columns determinants).

Determinants

We divided the various tables (in step 2) according to determinants.

Hospital Number → Hospital Name, Contact Person, Hospital Type, Teaching Status

where we read → as "determines" or "is a determinant of". Determinants can be a combination of two or more columns. E.g.: Hospital Number + Operation Number → Surgeon Number.

Step 2 of Normalisation

– Identify any determinants, other than the primary key, and the columns they determine.
– Establish a separate table for each determinant and the columns it determines. The determinant becomes the key of the new table.
– Name the new tables.
– Remove the determined columns from the original table. Leave the determinants to provide links between tables.
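As a rough illustration of step 2, the sketch below splits one table around a determinant. The function `split_on_determinant` and the sample rows are hypothetical; the determinant and the columns it determines are assumed to be already known.

```python
def split_on_determinant(rows, determinant, determined):
    """Move the `determined` columns into a new table keyed by `determinant`."""
    new_table = {}
    remaining = []
    for row in rows:
        key = tuple(row[col] for col in determinant)
        # One row per distinct determinant value; duplicated data collapses here.
        new_table[key] = {col: row[col] for col in determinant + determined}
        # The determinant stays behind in the original table as the link.
        remaining.append({col: val for col, val in row.items()
                          if col not in determined})
    return list(new_table.values()), remaining

# Made-up rows in which Hospital Number determines Hospital Name.
operation_rows = [
    {"hospital_number": "H1", "operation_number": 1, "hospital_name": "Hospital A"},
    {"hospital_number": "H1", "operation_number": 2, "hospital_name": "Hospital A"},
    {"hospital_number": "H2", "operation_number": 1, "hospital_name": "Hospital B"},
]

hospital, operation = split_on_determinant(
    operation_rows, ["hospital_number"], ["hospital_name"])
```

After the split, `hospital` holds each hospital name once, and `operation` keeps `hospital_number` to point back to it.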

What are determinants?

– Look for columns that appear by their names to be identifiers. These may be determinants or components of determinants.
– Look for columns that appear to describe something other than what the table is about. Then look for other columns that identify this something.

Which Determinants were in the Drug Expenditure Example?

Hospital Number → Hospital Name, Contact Person, Hospital Type, Teaching Status.

Others in the Operation table:
– Hospital Number + Surgeon Number → Surgeon Specialty
– Operation Code → Operation Name, Procedure Group

Drug Administration table:
– Drug Short Name → Drug Name, Manufacturer
– Drug Short Name + Method of Administration + Size of Dosage + Unit of Measure → Dose Cost

The Final Design

The final design we have is in Third Normal Form (3NF). By splitting tables along determinants (or functional dependencies) we can get the design into 3NF easily.

What about performance? Surely all those tables will slow things down?

Take a moment…

Go back and examine the last lecture and see that this is what we did in normalization!

Performance of Normalised Databases

There are many tables for what seems to be relatively little data. Thanks to advances in the capabilities of DBMSs, and the increased power of computer hardware, the number of tables is less likely to be an important determinant of performance than it might have been in the past. But performance is not an issue at this stage (that comes later). We are designing here!

Definitions and a Few Refinements (1)

Determinants and Functional Dependency
– For each value of the determinant, there can only be one value of some other nominated column(s) in the table at any point in time.
– The other nominated columns are functionally dependent on the determinant.
– The determinant concept is what 3NF is all about; we are simply grouping data items around their determinants.
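The definition above can be checked against sample rows with a short Python sketch. The function name `determines` and the data are hypothetical. Note that sample data can only refute a functional dependency; whether a dependency really holds is a statement about the business rules, not about the rows that happen to be present.

```python
def determines(rows, determinant, dependents):
    """Return False if some determinant value maps to two different dependent values."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in determinant)
        value = tuple(row[col] for col in dependents)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Made-up rows for illustration only.
sample_rows = [
    {"hospital_number": "H1", "hospital_name": "Hospital A",
     "operation_number": 1, "surgeon_number": "S1"},
    {"hospital_number": "H1", "hospital_name": "Hospital A",
     "operation_number": 2, "surgeon_number": "S2"},
    {"hospital_number": "H2", "hospital_name": "Hospital B",
     "operation_number": 1, "surgeon_number": "S1"},
]

name_depends_on_hospital = determines(
    sample_rows, ["hospital_number"], ["hospital_name"])
surgeon_depends_on_hospital = determines(
    sample_rows, ["hospital_number"], ["surgeon_number"])
surgeon_depends_on_operation = determines(
    sample_rows, ["hospital_number", "operation_number"], ["surgeon_number"])
```

Here Hospital Number alone is consistent with determining Hospital Name but not Surgeon Number, while the combination Hospital Number + Operation Number is consistent with determining Surgeon Number.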

Definitions and a Few Refinements (2)

Primary Keys
– A primary key is a nominated column or combination of columns that has a different value for every row in the table. Each table has one (and only one) primary key.

Candidate Keys
– Sometimes more than one column or combination of columns could serve as a primary key. We refer to such possible primary keys, whether chosen or not, as candidate keys.
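A small Python sketch can list the minimal column combinations that are unique in a sample of rows. The Employee rows, the `tax_number` column, and the helper names are hypothetical. Uniqueness in a sample only suggests a candidate key; the real test is whether the values must always be unique.

```python
from itertools import combinations

def is_unique(rows, columns):
    """True if no two rows share the same values for `columns`."""
    values = [tuple(row[col] for col in columns) for row in rows]
    return len(values) == len(set(values))

def candidate_keys(rows):
    """Minimal column combinations that are unique in the sample rows."""
    columns = list(rows[0])
    keys = []
    for size in range(1, len(columns) + 1):
        for combo in combinations(columns, size):
            # Skip supersets of a key already found: they are not minimal.
            if any(set(key) <= set(combo) for key in keys):
                continue
            if is_unique(rows, combo):
                keys.append(combo)
    return keys

# Made-up rows: two employees share a name, so name alone cannot be a key.
employees = [
    {"employee_number": "E1", "tax_number": "T9", "name": "Lee"},
    {"employee_number": "E2", "tax_number": "T7", "name": "Lee"},
    {"employee_number": "E3", "tax_number": "T8", "name": "Kim"},
]

keys = candidate_keys(employees)
```

Either of the keys found here could be nominated as the primary key; the other remains a candidate key.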

Definitions and a Few Refinements (3)

A More Formal Definition of Third Normal Form

If we define the term non-key column to mean a column that is not part of the primary key, then we can say:
– A table is in 3NF if the only determinants of non-key columns are candidate keys.
– If we want to be even more formal, we should explicitly exclude trivial determinants: each column is of course a determinant of itself.

Definitions and a Few Refinements (3)

Foreign Keys
– When removing repeating groups to a new table, we carried the primary key of the original table with us, to cross-reference to the source.
– These cross-referencing columns are called foreign keys, and they are our principal means of linking data from different tables.
– Note that "elsewhere in the data model" may include elsewhere in the same table. For example, an Employee table might have a primary key of Employee Number.
– A common convention for highlighting the foreign keys in a model is an asterisk, as shown.

Definitions and a Few Refinements (4)

Referential Integrity
– Imagine the Operation table that uses hospital number to point to the relevant Hospital records. We expect every hospital number in the Operation table to have a matching hospital number in the Hospital table. This is referential integrity. Modern DBMSs provide referential integrity features.
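As one concrete illustration of such a feature, the sketch below uses SQLite through Python's standard `sqlite3` module. The choice of SQLite, the cut-down table definitions, and the helper `try_insert_operation` are assumptions made for this example; other DBMSs declare and enforce foreign keys in their own ways.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE hospital (
        hospital_number TEXT PRIMARY KEY,
        hospital_name   TEXT
    )""")
conn.execute("""
    CREATE TABLE operation (
        hospital_number  TEXT    NOT NULL REFERENCES hospital (hospital_number),
        operation_number INTEGER NOT NULL,
        PRIMARY KEY (hospital_number, operation_number)
    )""")

conn.execute("INSERT INTO hospital VALUES ('H1', 'Hospital A')")

def try_insert_operation(hospital_number, operation_number):
    """Return True if the row was accepted, False if the DBMS rejected it."""
    try:
        conn.execute("INSERT INTO operation VALUES (?, ?)",
                     (hospital_number, operation_number))
        return True
    except sqlite3.IntegrityError:
        return False

matched_accepted = try_insert_operation("H1", 1)  # hospital H1 exists
orphan_accepted = try_insert_operation("H9", 1)   # no hospital H9
```

The second insert is rejected because hospital number H9 has no matching row in the hospital table.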

Denormalization and Unnormalization

It is sometimes necessary to compromise one data modeling objective to achieve another. Occasionally, we implement database designs that are not fully normalized to achieve some other objective (most often performance). We normalize to achieve completeness, non-redundancy, flexibility of extending repeating groups, ease of data reuse, and programming simplicity. We sacrifice these when we denormalize. In many cases, these sacrifices will be prohibitively costly.

You don't always need to normalize like this

The past two lectures have shown you what makes a well-structured database design, shown as tables. Don't do it like this every time! There is the equivalent of a blueprint for data modelling other than the table-like description we've seen. Let's return to the Drug Expenditure design.

Drug Expenditure Database Model as Relations (Tables)

OPERATION (Hospital Number*, Operation Number, Operation Code*, Surgeon Number*)

SURGEON (Hospital Number*, Surgeon Number, Surgeon Specialty)

OPERATION TYPE (Operation Code, Operation Name, Procedure Group)

STANDARD DRUG DOSAGE (Drug Short Name*, Method of Administration, Size of Dose, Unit of Measure, Standard Cost of Dose)

DRUG (Drug Short Name, Drug Name, Manufacturer)

HOSPITAL (Hospital Number, Hospital Name, Hospital Category, Contact Person)

DRUG ADMINISTRATION (Hospital Number*, Operation Number*, Drug Short Name*, Method of Administration*, Size of Dose*, Unit of Measure*, Number of Doses)