Normalization is the gradual and sequential process of efficiently organizing data in a database that follows the rules listed in the previous slide


1 Normalization
Normalization is the gradual, sequential process of efficiently organizing data in a database so that it follows the rules listed in the previous slide
– Normalization commonly involves working through three normal forms, in order: First, Second, and Third Normal Form (1NF, 2NF, 3NF)
– This is commonly done during the early stages, on UML class diagrams
The goal of normalization is to:
– Eliminate the duplication of data (which makes a database large, inefficient, and slow). This in turn prevents data manipulation anomalies and loss of data integrity, where changes made to the same data in different places may not agree
– This is done by creating tables, assigning a PK to each table, and making sure that each piece of information shows up only once in the database
Normalization eliminates redundant data (storing the same data in more than one table) and ensures that data dependencies are logical (only related data are stored in a table)
Normalization reduces the amount of space a database consumes and ensures that data is logically stored

2 First Normal Form (1NF)
1NF deals with duplicative data across multiple columns!
It sets the very basic rules to make sure that:
– Separate tables are created for each group of related data (e.g., IsotopicAge, Fold, Rock), i.e., each table should represent a distinct entity
1. Duplicative (repeating) columns containing the same type of data are removed from the same table. There should be no repeated columns of the same kind, such as Mineral1, Mineral2, Mineral3 or cellPhone, homePhone, workPhone. These should go to a new table
2. All columns must contain a single value, i.e., all attributes must be atomic, not multi-valued. Each cell must hold only one value, e.g., XRF, not XRF, REE, Isotope
3. There should be a set of one or more columns that uniquely identifies each row, i.e., there should be a primary key
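Rule 1 can be sketched in Python as follows. This is only an illustration: the Rock and RockMineral table names, the rockID key, and the sample rocks are assumptions, not part of the slides. The repeated Mineral1, Mineral2, Mineral3 columns become one row per mineral in a separate table.

```python
# Hypothetical wide table that violates 1NF: one column per mineral slot.
wide_rocks = [
    {"rockID": 1, "name": "Granite",
     "Mineral1": "Quartz", "Mineral2": "Feldspar", "Mineral3": "Mica"},
    {"rockID": 2, "name": "Dunite",
     "Mineral1": "Olivine", "Mineral2": None, "Mineral3": None},
]


def to_1nf(rows):
    """Split the wide rows into a Rock table and a RockMineral table."""
    rock, rock_mineral = [], []
    for row in rows:
        rock.append({"rockID": row["rockID"], "name": row["name"]})
        # Each filled mineral slot becomes its own row keyed by rockID.
        for col in ("Mineral1", "Mineral2", "Mineral3"):
            if row[col] is not None:
                rock_mineral.append({"rockID": row["rockID"], "mineral": row[col]})
    return rock, rock_mineral


rock, rock_mineral = to_1nf(wide_rocks)
```

With this layout a fourth mineral is simply one more row in RockMineral, so no new column is needed.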

3 Another example: Analysis table

Investigator      AnalysisType   Address
Hassan Babaie     XRF            24 Peachtree Center Ave, Atlanta, GA 30303
John Wayne        XRF, XRD, REE  3500 Pacific View Dr, Newport Beach, CA
Elizabeth Tucker  Petrography    1100 Angela Ra, Charlotte, NC
John Wayne        Isotopic age   3500 Pacific View Dr, Newport Beach, CA

Investigators submit their samples to an analyzing company. The company stores the above set of data for its customers
What are the problems:
– This is not in 1NF
– The AnalysisType column does not represent a distinct entity. We can't find out how many people ordered an XRF analysis; the types are all mixed together
– The Address column is compound and needs to move out into another table. City depends on zip code
– There is no PK
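A small Python sketch of the atomicity fix for the AnalysisType column, using the rows from the Analysis table above (the address column is left out). The function and variable names are hypothetical. Once each cell holds a single value, counting the orders per analysis type becomes straightforward.

```python
from collections import Counter

# Rows copied from the Analysis table (address column omitted).
analysis_rows = [
    ("Hassan Babaie", "XRF"),
    ("John Wayne", "XRF, XRD, REE"),
    ("Elizabeth Tucker", "Petrography"),
    ("John Wayne", "Isotopic age"),
]


def make_atomic(rows):
    """Return one (investigator, analysis_type) pair per analysis."""
    atomic = []
    for investigator, types in rows:
        # A multi-valued cell such as "XRF, XRD, REE" yields three rows.
        for analysis_type in types.split(","):
            atomic.append((investigator, analysis_type.strip()))
    return atomic


atomic_rows = make_atomic(analysis_rows)
orders_per_type = Counter(analysis_type for _, analysis_type in atomic_rows)
```

Here `orders_per_type["XRF"]` answers the question the original table could not: how many XRF analyses were ordered.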

4 Second Normal Form (2NF)
2NF deals with redundancy across multiple rows!
Second normal form (2NF) further addresses the concept of removing duplicative data
Meet all the requirements of the first normal form (1NF)
Formally, every non-key column must depend on the whole primary key, not on just part of a composite key
Identify columns whose data repeat in different places
– Remove them to their own table
In the next slide, we see that data for Joe Strat is repeated. The solution is to remove the Alum column (with its address and school) into their own tables, called Alum and School
See next slide for more!

5 An improved Analysis Table

Investigator      Analysis1    Analysis2  Analysis3  orders  Address
Hassan Babaie     XRF                                        Department of Geosciences, GSU, Atlanta, GA 30303
John Wayne        XRF          XRD        REE                3500 Pacific View Dr, Newport Beach, CA
Elizabeth Tucker  Petrography                                1100 Angela Ra, Charlotte, NC
John Wayne        Isotopic                                   3500 Pacific View Dr, Newport Beach, CA

Now we can query on the type of analysis
There are still problems with the structure:
There are still redundancies
The company can only keep track of three types of analyses; four would not work!
Address is still compound; it needs to be broken up
It is difficult to determine the analysis order for each person
– Order in this case depends on non-PK columns

6 Better solution
We need to break the table into several tables:
– Investigator, Analysis, Order, OrderItems, and Address

Investigator Table
investiID  lastName  firstName  affiliation
1          Wayne     John       ExHollywood
2          Babaie    Hassan     GSU

Analysis Table
AnalysisID  AnalysisType
1           XRF
2

Address Table
Number  Street                City           State  zipCode  Country
3500    Pacific View Dr.      Newport Beach  CA     92662    USA
24      Peachtree Center Ave  Atlanta        GA     30303

7 … Order and OrderItem Tables, partially shown

OrderItem Table
OrderItemID  OrderID  AnalysisID  Qty
1            1        1           2
2            2        2           1

Order Table
OrderID  InvestiID  OrderDate  DeliveryDate
1        1          3/5/1960   4/30/1960
2        2          2/17/2013  3/12/2013

8 Some improvement
Analysis: AnalysisID, AnalysisType
OrderItem: OrderItemID, OrderID, AnalysisID, Qty
Order: OrderID, InvestID, OrderDate, DeliveryDate
Investigator: InvestID, FirstName
Address: AddressID, Number, Street, …
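One possible way to declare these tables, sketched with Python's built-in sqlite3 module. Several details are assumptions made only for this example: the column types, the ISO date strings, the omission of the Address table, and the AnalysisType for AnalysisID 2, which the slides leave blank (shown here as XRD). The rows otherwise follow the Investigator, Order, and OrderItem tables in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Investigator (
    InvestID  INTEGER PRIMARY KEY,
    LastName  TEXT NOT NULL,
    FirstName TEXT NOT NULL
);
CREATE TABLE Analysis (
    AnalysisID   INTEGER PRIMARY KEY,
    AnalysisType TEXT NOT NULL UNIQUE
);
CREATE TABLE "Order" (            -- quoted because ORDER is an SQL keyword
    OrderID      INTEGER PRIMARY KEY,
    InvestID     INTEGER NOT NULL REFERENCES Investigator(InvestID),
    OrderDate    TEXT,
    DeliveryDate TEXT
);
CREATE TABLE OrderItem (
    OrderItemID INTEGER PRIMARY KEY,
    OrderID     INTEGER NOT NULL REFERENCES "Order"(OrderID),
    AnalysisID  INTEGER NOT NULL REFERENCES Analysis(AnalysisID),
    Qty         INTEGER NOT NULL
);
""")

conn.executemany("INSERT INTO Investigator VALUES (?, ?, ?)",
                 [(1, "Wayne", "John"), (2, "Babaie", "Hassan")])
# Assumption: the slides do not give the type for AnalysisID 2.
conn.executemany("INSERT INTO Analysis VALUES (?, ?)", [(1, "XRF"), (2, "XRD")])
conn.executemany('INSERT INTO "Order" VALUES (?, ?, ?, ?)',
                 [(1, 1, "1960-03-05", "1960-04-30"),
                  (2, 2, "2013-02-17", "2013-03-12")])
conn.executemany("INSERT INTO OrderItem VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, 2), (2, 2, 2, 1)])

# Who ordered XRF analyses, and how many?
xrf_orders = conn.execute("""
    SELECT i.FirstName, i.LastName, oi.Qty
    FROM OrderItem AS oi
    JOIN "Order" AS o ON o.OrderID = oi.OrderID
    JOIN Investigator AS i ON i.InvestID = o.InvestID
    JOIN Analysis AS a ON a.AnalysisID = oi.AnalysisID
    WHERE a.AnalysisType = 'XRF'
""").fetchall()
```

Because every analysis is now a row in OrderItem, an investigator can order any number of analysis types, and each fact (a name, an analysis type) is stored once and reached through keys.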

9 Third Normal Form (3NF)
Third normal form goes one large step further
Meet all the requirements of the 2NF
No transitive functional dependencies
– Remove columns that are not directly dependent on the primary key
– Remove columns whose values depend on columns other than the PK
– This means: remove subkeys

10 3NF, cont’d
There should be no transitive functional dependencies, i.e., no non-key column should determine other non-key columns
If x → y, i.e., x functionally determines y (equivalently, y is functionally dependent on x), then given x, we can find y
– Example: in the Address table, given the nine-digit zip code, we can find the city and state because they are functionally dependent on the zip code. The opposite is not true: given a city, we cannot find the zip code (note: some cities have several zip codes)
By definition, a superkey (such as the primary key) functionally determines all other attributes in the table
The zip code is a subkey (not a superkey) because it only determines the city and state part of the Address table, not the other attributes
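The idea of a functional dependency can be checked against sample data with a few lines of Python. This is a sketch only: the `determines` helper is hypothetical, and the second Atlanta zip code (30309) is added here purely to illustrate a city with more than one zip code.

```python
def determines(rows, x, y):
    """Return True if, in these rows, each value of x maps to one value of y."""
    seen = {}
    for row in rows:
        # setdefault stores the first y seen for this x and returns it afterwards.
        if seen.setdefault(row[x], row[y]) != row[y]:
            return False
    return True


addresses = [
    {"zipCode": "92662", "city": "Newport Beach", "state": "CA"},
    {"zipCode": "30303", "city": "Atlanta", "state": "GA"},
    {"zipCode": "30309", "city": "Atlanta", "state": "GA"},  # illustrative extra row
]

zip_to_city = determines(addresses, "zipCode", "city")   # zipCode -> city holds
city_to_zip = determines(addresses, "city", "zipCode")   # city -> zipCode fails
```

A check like this can only show that a dependency is violated or not violated in the data at hand; whether it holds in general is a statement about the meaning of the attributes.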

11 To take care of the transitive functional dependency issue, take 3 steps:
– Remove all the attributes that depend on the subkey from the table (e.g., city and state from the Address table)
– Move them into a new table (e.g., call it ZipLocations, with zipCode, city, and state attributes)
– Keep a copy of the subkey attribute (i.e., zipCode) in the original table as a foreign key
The Address table now has first name, last name, street (these 3 make the PK), and zipCode (as FK to the other table)
Summary: Subkeys always result in redundant data and must be removed!
In other words, remove subsets of data that apply to multiple rows of a table and place them in separate tables
– i.e., remove duplicative data
– For example, break the address into its independent constituents that do not depend on each other
Create relationships between these new tables and their predecessors through the use of foreign keys
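The three steps above can be sketched with Python's sqlite3 module. The column types are assumptions, and the street number is folded into the street column for brevity; the table and attribute names follow the slide, and the sample rows come from the Address table shown earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
-- Steps 1 and 2: city and state move to their own table, keyed by the subkey.
CREATE TABLE ZipLocations (
    zipCode TEXT PRIMARY KEY,
    city    TEXT NOT NULL,
    state   TEXT NOT NULL
);
-- Step 3: the original table keeps zipCode as a foreign key.
CREATE TABLE Address (
    firstName TEXT NOT NULL,
    lastName  TEXT NOT NULL,
    street    TEXT NOT NULL,
    zipCode   TEXT NOT NULL REFERENCES ZipLocations(zipCode),
    PRIMARY KEY (firstName, lastName, street)
);
""")

conn.executemany("INSERT INTO ZipLocations VALUES (?, ?, ?)",
                 [("92662", "Newport Beach", "CA"), ("30303", "Atlanta", "GA")])
conn.executemany("INSERT INTO Address VALUES (?, ?, ?, ?)",
                 [("John", "Wayne", "3500 Pacific View Dr.", "92662"),
                  ("Hassan", "Babaie", "24 Peachtree Center Ave", "30303")])

# The full address is rebuilt with a join when it is needed.
full_addresses = conn.execute("""
    SELECT a.firstName, a.lastName, a.street, z.city, z.state, a.zipCode
    FROM Address AS a
    JOIN ZipLocations AS z ON z.zipCode = a.zipCode
    ORDER BY a.lastName
""").fetchall()
```

Each city and state pair is now stored once per zip code, and the foreign key prevents an address from pointing at a zip code that has no entry in ZipLocations.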

12 Fourth Normal Form (4NF)
Normalizing a database to the 3NF is usually sufficient
Finally, fourth normal form (4NF) has one additional requirement
Meet all the requirements of the third normal form
A relation is in 4NF if it has no non-trivial multi-valued dependencies, except those whose determinant is a superkey

