Denormalization - Causes redundancy, but fast performance & no referential integrity - Denormalize when specific queries occur frequently, a strict performance.

1 Denormalization - Causes redundancy: reads are faster, but the DBMS no longer keeps the redundant copies consistent (no referential integrity) - Denormalize when specific queries occur frequently, strict performance is required, and the data is not heavily updated - So, denormalize only when there is a very clear advantage to doing so, and carefully document the reason

2 Typical denormalization techniques (1) Flatten a repeating group into one table: instead of EMP (E#, Ename) and SKILL (E#, Skill), use EMP (E#, Skill, Ename) when EMP has a small number of attributes. - This means using Method 2 of the 1NF algorithm, but be aware of the danger of this method, as discussed under MVDs (multivalued dependencies).
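
A minimal sketch of technique (1), using Python's built-in `sqlite3` module with made-up rows. The table names follow the slide, but `E#` is written `e_no` because `#` is not valid in an unquoted SQL name; the helper `who_knows` is hypothetical. It shows the gain (a skill lookup with no join) and the cost (Ename stored once per skill).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized design: one row per employee, one row per (employee, skill)
    CREATE TABLE emp   (e_no INTEGER PRIMARY KEY, ename TEXT);
    CREATE TABLE skill (e_no INTEGER REFERENCES emp(e_no), skill TEXT,
                        PRIMARY KEY (e_no, skill));

    -- Denormalized design: the repeating group is flattened into one table
    CREATE TABLE emp_flat (e_no INTEGER, skill TEXT, ename TEXT,
                           PRIMARY KEY (e_no, skill));
""")
conn.executemany("INSERT INTO emp VALUES (?, ?)", [(1, "Kim"), (2, "Lee")])
conn.executemany("INSERT INTO skill VALUES (?, ?)",
                 [(1, "SQL"), (1, "Java"), (2, "SQL")])

# Build the flattened table once from the normalized pair (join paid here only)
conn.execute("""INSERT INTO emp_flat
                SELECT s.e_no, s.skill, e.ename
                FROM skill s JOIN emp e ON e.e_no = s.e_no""")

def who_knows(skill):
    """Names of employees with a given skill, read without any join."""
    rows = conn.execute(
        "SELECT ename FROM emp_flat WHERE skill = ? ORDER BY ename", (skill,))
    return [r[0] for r in rows]

# The price: employee 1 has two skills, so the name is stored twice
ename_copies = conn.execute(
    "SELECT COUNT(*) FROM emp_flat WHERE e_no = 1").fetchone()[0]
```

`who_knows("SQL")` returns both names from a single table, while `ename_copies` counts the redundant copies that an update to Kim's name would have to touch.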

3 Cont’d (2) Embed a stable code-interpretation (reference) table: instead of FLIGHT (F#, Departs, From_Code, To_Code) and CODE (Code, Airport_Name), use FLIGHT (F#, Departs, From_AP, From_Code, To_AP, To_Code), where From_AP and To_AP hold the airport names.
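
As a rough illustration, assuming sample airport codes and the hypothetical helper `route`: in the normalized design, showing a route needs CODE joined twice (once per end of the flight). Embedding the names pays that cost once, which is reasonable only because airport names rarely change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized: codes are interpreted through a separate reference table
    CREATE TABLE code   (code TEXT PRIMARY KEY, airport_name TEXT);
    CREATE TABLE flight (f_no TEXT PRIMARY KEY, departs TEXT,
                         from_code TEXT REFERENCES code(code),
                         to_code   TEXT REFERENCES code(code));
    INSERT INTO code VALUES ('ICN', 'Incheon'), ('SFO', 'San Francisco');
    INSERT INTO flight VALUES ('F100', '09:30', 'ICN', 'SFO');

    -- Denormalized: each airport name is embedded next to its code
    CREATE TABLE flight_flat AS
    SELECT f.f_no, f.departs,
           fa.airport_name AS from_ap, f.from_code,
           ta.airport_name AS to_ap,   f.to_code
    FROM flight f
    JOIN code fa ON fa.code = f.from_code   -- departure airport
    JOIN code ta ON ta.code = f.to_code;    -- arrival airport
""")

def route(f_no):
    """(from_ap, to_ap) for a flight, read from the embedded copy with no join."""
    return conn.execute(
        "SELECT from_ap, to_ap FROM flight_flat WHERE f_no = ?", (f_no,)
    ).fetchone()
```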

4 Cont’d (3) Combine 1:1 or 1:N relationships when (a) N is small and (b) the record on the "one" side is small (so the amount of redundancy is small): instead of SALE (S#, SPName, SaleDate) and SALE_ITEMS (S#, Line#, Code, Qty), use SALE (S#, Line#, SPName, SaleDate, Code, Qty). "How many T179s did we sell yesterday?" can then be answered without a join. Another example: Order_Item (O#, I#, C#, Cname, I_Desc, Qty, I_Price)
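
A minimal sketch of the combined SALE table with hypothetical rows (`S#` and `Line#` become `s_no` and `line_no`; `sold_on` is an illustrative helper). The header fields SPName and SaleDate repeat on every line of a sale, and in exchange the T179 question becomes a single-table query.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
# Combined SALE: header columns (sp_name, sale_date) repeated per line item
conn.execute("""CREATE TABLE sale (
    s_no INTEGER, line_no INTEGER, sp_name TEXT, sale_date TEXT,
    code TEXT, qty INTEGER, PRIMARY KEY (s_no, line_no))""")

yesterday = (date.today() - timedelta(days=1)).isoformat()
conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 1, "Park", yesterday, "T179", 3),
    (1, 2, "Park", yesterday, "A200", 1),
    (2, 1, "Choi", yesterday, "T179", 2),
    (3, 1, "Choi", "2000-01-01", "T179", 9),   # older sale, not counted below
])

def sold_on(code, day):
    """Units of `code` sold on `day`: one table, no join with SALE_ITEMS."""
    return conn.execute(
        "SELECT COALESCE(SUM(qty), 0) FROM sale WHERE code = ? AND sale_date = ?",
        (code, day)).fetchone()[0]

t179_yesterday = sold_on("T179", yesterday)   # 3 + 2 from sales 1 and 2
```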

5 Cont’d (4) Merge in the other entity when it is not interesting by itself: Order (O#, ODate, OShipTerms, PmtTerms, Cname, CAddr) (5) Replicate infrequently updated attributes to avoid a JOIN: WORK_ON (ESSN, P_NUM, PName, Hours)
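
The sketch below, assuming made-up rows and hypothetical helpers `hours_by_project_name` and `rename_project`, shows both sides of technique (5). Reads by project name skip the PROJECT join, but renaming a project must rewrite every replicated PName, which is why this only suits attributes that are rarely updated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project (p_num INTEGER PRIMARY KEY, pname TEXT);
    -- PName is replicated from PROJECT into WORK_ON to avoid a join
    CREATE TABLE work_on (essn TEXT, p_num INTEGER, pname TEXT, hours REAL,
                          PRIMARY KEY (essn, p_num));
    INSERT INTO project VALUES (10, 'ProductX');
    INSERT INTO work_on VALUES ('111', 10, 'ProductX', 20.0),
                               ('222', 10, 'ProductX', 12.5);
""")

def hours_by_project_name(pname):
    """Total hours on a project, looked up by name without joining PROJECT."""
    return conn.execute(
        "SELECT SUM(hours) FROM work_on WHERE pname = ?", (pname,)).fetchone()[0]

def rename_project(p_num, new_name):
    """Update the master row AND every replicated copy in one transaction,
    so the copies change together or not at all."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("UPDATE project SET pname = ? WHERE p_num = ?",
                     (new_name, p_num))
        cur = conn.execute("UPDATE work_on SET pname = ? WHERE p_num = ?",
                           (new_name, p_num))
    return cur.rowcount  # how many redundant copies had to change

copies_updated = rename_project(10, "ProductY")
```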

6 Problems of denormalization - Makes rows longer - Makes data transfers take longer - Needs more memory for in-memory processing - Causes redundancy and expensive updates

7 Adding redundant data - Add summary attributes or derived attributes - Redundant relationships can improve performance at the cost of update overhead
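
One way to keep a summary attribute correct is sketched below, assuming SQLite triggers and hypothetical tables `orders` and `order_item`: `orders.total` is derived data that could always be recomputed as SUM(qty * price). Reads get the total without aggregating, while every insert, update, or delete on the detail rows pays an extra write, which is the update overhead the slide mentions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- orders.total is a redundant summary of order_item
    CREATE TABLE orders (o_no INTEGER PRIMARY KEY, total REAL NOT NULL DEFAULT 0);
    CREATE TABLE order_item (o_no INTEGER REFERENCES orders(o_no),
                             i_no INTEGER, qty INTEGER, price REAL,
                             PRIMARY KEY (o_no, i_no));

    -- Each change to a detail row also writes the summary row
    CREATE TRIGGER order_item_ins AFTER INSERT ON order_item BEGIN
        UPDATE orders SET total = total + NEW.qty * NEW.price WHERE o_no = NEW.o_no;
    END;
    CREATE TRIGGER order_item_del AFTER DELETE ON order_item BEGIN
        UPDATE orders SET total = total - OLD.qty * OLD.price WHERE o_no = OLD.o_no;
    END;
    CREATE TRIGGER order_item_upd AFTER UPDATE ON order_item BEGIN
        UPDATE orders SET total = total - OLD.qty * OLD.price WHERE o_no = OLD.o_no;
        UPDATE orders SET total = total + NEW.qty * NEW.price WHERE o_no = NEW.o_no;
    END;
""")

conn.execute("INSERT INTO orders (o_no) VALUES (1)")
conn.executemany("INSERT INTO order_item VALUES (1, ?, ?, ?)",
                 [(1, 2, 5.0), (2, 1, 7.5)])
conn.execute("UPDATE order_item SET qty = 4 WHERE o_no = 1 AND i_no = 1")

def stored_total(o_no):
    """Read the summary attribute directly: no aggregation at query time."""
    return conn.execute(
        "SELECT total FROM orders WHERE o_no = ?", (o_no,)).fetchone()[0]

def recomputed_total(o_no):
    """Recompute from detail rows, e.g. to audit the redundant copy."""
    return conn.execute(
        "SELECT COALESCE(SUM(qty * price), 0) FROM order_item WHERE o_no = ?",
        (o_no,)).fetchone()[0]
```

Triggers are only one design choice; the same bookkeeping could live in application code, at the risk of a missed code path leaving the total stale.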

8 Schema translation - Reduce the number of relations to JOIN by using mapped translation - Handle null values - Combine 1:1 relationships - Relax participation constraints - Divide a big table into two if A and B are distinct in R(A, B) - Ignore FDs based on co-occurring attributes that are not updated, e.g. ZIP → CITY
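
To see why the "not updated" condition matters for ignoring ZIP → CITY, here is a small pure-Python sketch; `fd_holds` and the CUSTOMER rows are hypothetical. Keeping ZIP and CITY in one CUSTOMER table makes CITY transitively dependent on the key (a 3NF violation). That is harmless while the pairs never change, but a single update to one copy silently breaks the dependency.

```python
def fd_holds(rows, lhs, rhs):
    """True if the functional dependency lhs -> rhs holds in `rows`:
    no two rows agree on all lhs attributes but differ on some rhs attribute."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # First occurrence of `key` records its value; later ones must match
        if seen.setdefault(key, val) != val:
            return False
    return True

# CUSTOMER keeps ZIP and CITY together instead of a separate ZIP table
customer = [
    {"c_no": 1, "name": "Kim",  "zip": "10001", "city": "New York"},
    {"c_no": 2, "name": "Lee",  "zip": "10001", "city": "New York"},
    {"c_no": 3, "name": "Park", "zip": "94103", "city": "San Francisco"},
]
ok_before = fd_holds(customer, ["zip"], ["city"])

# Updating only one redundant copy violates ZIP -> CITY
customer[1]["city"] = "NYC"
ok_after = fd_holds(customer, ["zip"], ["city"])
```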

9 Primary key - Choose the most frequently used attributes - Prefer small attributes (the key is used in indexes and for referential integrity)

10 Index - Create a set of appropriate indexes for optimizing queries (this will be discussed further in the physical DB chapters)
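
A quick way to check that an index actually serves a query is SQLite's `EXPLAIN QUERY PLAN`. The sketch below reuses the T179 question against a hypothetical `sale` table filled with generated rows; the exact plan wording differs between SQLite versions, so only the presence of `SCAN` and of the index name is meaningful here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (s_no INTEGER, line_no INTEGER, code TEXT, qty INTEGER)")
conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)",
                 [(i, 1, f"T{i % 500}", 1) for i in range(5000)])

QUERY = "SELECT SUM(qty) FROM sale WHERE code = ?"

def plan(sql, params):
    """Joined plan text: the detail is the last column of each plan row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(r[-1]) for r in rows)

before = plan(QUERY, ("T179",))   # full table scan: no usable index yet
conn.execute("CREATE INDEX idx_sale_code ON sale(code)")
after = plan(QUERY, ("T179",))    # now a search using idx_sale_code
total = conn.execute(QUERY, ("T179",)).fetchone()[0]
```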

11 Denormalization Databases intended for Online Transaction Processing (OLTP) are typically more normalized than databases intended for Online Analytical Processing (OLAP). OLTP applications are characterized by a high volume of small transactions, such as updating a sales record at a supermarket checkout counter. The expectation is that each transaction will leave the database in a consistent state. By contrast, databases intended for OLAP operations are primarily "read-only" databases. OLAP applications tend to extract historical data that has accumulated over a long period of time. For such databases, redundant or "denormalized" data may facilitate business intelligence applications.

12 Denormalization Specifically, dimension tables in a star schema often contain denormalized data. The denormalized or redundant data must be carefully controlled during ETL (extract, transform, load) processing, and users should not be permitted to see the data until it is in a consistent state. The normalized alternative to the star schema is the snowflake schema. Denormalization is also used to improve performance on smaller computers, such as computerized cash registers. Since these use the data for lookup only (e.g. price lookups), no changes are made to the data and a swift response is crucial.
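
A toy star schema, assuming hypothetical `fact_sales` and `dim_product` tables with made-up rows, illustrates the point about dimension tables. The category name is repeated on every product row, so a sales-by-category report needs only one join from the fact table. In the snowflake alternative, categories would sit in their own normalized table and the same report would need a second join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Denormalized dimension: category_name repeated on each product row
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY,
                              product_name TEXT, category_name TEXT);
    CREATE TABLE fact_sales  (product_key INTEGER REFERENCES dim_product,
                              sale_date TEXT, amount REAL);
    INSERT INTO dim_product VALUES (1, 'Milk', 'Dairy'), (2, 'Cheese', 'Dairy'),
                                   (3, 'Apple', 'Produce');
    INSERT INTO fact_sales VALUES (1, '2024-01-02', 3.0), (2, '2024-01-02', 8.0),
                                  (3, '2024-01-03', 1.5), (1, '2024-01-03', 3.0);
""")

def sales_by_category():
    """Fact-to-dimension join only; a snowflake schema would add a join
    to a separate category table."""
    rows = conn.execute("""
        SELECT d.category_name, SUM(f.amount)
        FROM fact_sales f JOIN dim_product d ON d.product_key = f.product_key
        GROUP BY d.category_name ORDER BY d.category_name""")
    return dict(rows.fetchall())
```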
