Competitive (Business) Intelligence Systems The Road to Denormalization (starring Charlie Sheen & other Random Celebrities)

1 Competitive (Business) Intelligence Systems The Road to Denormalization (starring Charlie Sheen & other Random Celebrities)

2 The Road to Denormalization
Before transactional data can be loaded into a Data Warehouse, the data must be denormalized.
[Diagram: Transx Data → Data Warehouse]

3 Normalization
But before you can understand Denormalization, you must understand Normalization...
And to understand Normalization, you must understand Relational Databases.
Callout: “I’ve been Denormalized!”

4 Relational Databases
Collection of linked tables
Tables linked by Primary Key / Foreign Key relationships (Referential Integrity)
  Primary Key: column whose values make each record unique in a parent table (e.g., Customer Number)
  Foreign Key: column in a child table that links to the Primary Key in the parent table

5 Relational DB Example
CUSTOMER TABLE (“Parent” table; Cust # is the Primary Key)
  Cust #   Cust Name
  100      Moe
  101      Larry
  102      Curly
ORDER TABLE (“Child” table; Cust # is the Foreign Key)
  Order #   Prod #   Qty   Cust #
  1         QR22     1     100
  2         QR22     25    100
  3         SB56     3     102
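The parent/child link above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not part of the original deck: the table and column names (customer, orders, cust_no, and so on) and the helper try_insert_order are made up for the example, and the rows are the ones shown on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (
    cust_no   INTEGER PRIMARY KEY,   -- Primary Key in the parent table
    cust_name TEXT NOT NULL
);
CREATE TABLE orders (
    order_no INTEGER PRIMARY KEY,
    prod_no  TEXT NOT NULL,
    qty      INTEGER NOT NULL,
    cust_no  INTEGER NOT NULL REFERENCES customer(cust_no)  -- Foreign Key in the child table
);
INSERT INTO customer VALUES (100, 'Moe'), (101, 'Larry'), (102, 'Curly');
INSERT INTO orders VALUES (1, 'QR22', 1, 100), (2, 'QR22', 25, 100), (3, 'SB56', 3, 102);
""")


def try_insert_order(order_no, prod_no, qty, cust_no):
    """Hypothetical helper: return True if the order row is accepted."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO orders VALUES (?, ?, ?, ?)",
                (order_no, prod_no, qty, cust_no),
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert_order(4, "QR22", 2, 101)  # customer 101 exists
rejected = try_insert_order(5, "QR22", 2, 999)  # no customer 999 in the parent table
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

An order that points at a customer number missing from the parent table is refused, which is what referential integrity means in practice.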

6 Database Structure & Design
2 Approaches:
  1. Optimize for Data Capture (i.e., Capturing Transactions)
  2. Optimize for Data Access (i.e., Queries & Reporting)
The two approaches conflict.
Callout: “I love conflict!”

7 Approach #1: Optimize for Data Capture
To optimize for data capture, you must:
  Eliminate redundancy of data (or else space & processing are wasted)
  Ensure data integrity (or else data anomalies occur)
  Ensure that changes in data (modifications, deletions, etc.) only have to happen in one place
Normalization: process by which a database is optimized for data capture
  All data “redundancy” is removed from the database
  Has multiple forms (0, 1st, 2nd, 3rd, et al.)

8 Moving from 0NF to 1NF
Rule: Make a separate table for each set of related attributes, and make each field atomic (i.e., cannot be broken apart any further)
CUSTOMER DATA (0NF)
  Cust #          CustName
  100, 101, 102   Moe Howard, Larry Fine, Curly Howard
CUSTOMER TABLE (1NF)
  Cust #   FName   LName
  100      Moe     Howard
  101      Larry   Fine
  102      Curly   Howard
Callout: “I’M NOT MOVING!”
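As a rough sketch of the 0NF to 1NF step, the snippet below splits the packed customer record from the slide into one row per customer with atomic fields. The function name to_first_normal_form and the dictionary keys are illustrative choices, and the code assumes each name is a first name followed by a last name.

```python
# The 0NF record from the slide: several values packed into each field.
raw = {
    "cust_no": "100, 101, 102",
    "cust_name": "Moe Howard, Larry Fine, Curly Howard",
}


def to_first_normal_form(record):
    """Hypothetical helper: one row per customer, each field atomic."""
    numbers = [n.strip() for n in record["cust_no"].split(",")]
    names = [n.strip() for n in record["cust_name"].split(",")]
    rows = []
    for number, full_name in zip(numbers, names):
        # Assumes "First Last"; split once so the name becomes two atomic fields.
        first, last = full_name.split(" ", 1)
        rows.append({"cust_no": int(number), "fname": first, "lname": last})
    return rows


customer_rows = to_first_normal_form(raw)
```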

9 Moving from 1NF to 2NF
Rule: Eliminate any repeating values caused by a dependency on a “keyed” column (i.e., either Primary or Foreign)
TABLE X (1NF)
  Cust #   FName   Order #
  100      Moe     1
  100      Moe     2
  101      Larry   3
Dependency on Primary Key: FName repeats with Cust # (100, Moe)
CUSTOMER TABLE (2NF)
  Cust #   FName
  100      Moe
  101      Larry
  102      Curly
ORDER TABLE (2NF)
  Order #   Cust #
  1         100
  2         100
  3         101
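The split on this slide can be sketched in a few lines of Python. The variable names are illustrative; the rows are TABLE X from the slide. Note that the derived customer table only contains customers that appear in TABLE X, so Curly (102), who has no order there, does not show up.

```python
# TABLE X in 1NF: FName repeats on every order row for the same customer.
table_x = [
    {"cust_no": 100, "fname": "Moe", "order_no": 1},
    {"cust_no": 100, "fname": "Moe", "order_no": 2},
    {"cust_no": 101, "fname": "Larry", "order_no": 3},
]

# Customer table: each customer number maps to its name exactly once.
customer_table = {row["cust_no"]: row["fname"] for row in table_x}

# Order table: each order keeps only the key that points back to its customer.
order_table = [(row["order_no"], row["cust_no"]) for row in table_x]
```

After the split, renaming a customer means changing one entry in customer_table rather than every order row.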

10 Moving from 2NF to 3NF
Rule: Eliminate any repeating values caused by a dependency on a “non-keyed” column (i.e., dependency on ANY column)
TABLE X (2NF)
  Cust #   City   Order #   ShipTime
  100      NY     1         2 days
  101      NY     2         2 days
  102      LA     3         5 days
Dependency between 2 non-key columns: ShipTime repeats with City (NY, 2 days)
SHIP TIME TABLE (3NF)
  City #   City   ShipTime
  10       NY     2 days
  20       LA     5 days
CUSTOMER TABLE (3NF)
  Cust #   City #
  100      10
  101      10
  102      20
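One way to picture the 2NF to 3NF step is to pull the City/ShipTime pair out into its own table and leave only a key behind. The sketch below does this in plain Python. The variable names are illustrative, ship times are stored as whole days, and the key numbering (10, 20, ...) simply mirrors the City # values on the slide.

```python
# TABLE X in 2NF: ShipTime depends on City, not on the customer.
table_x = [
    {"cust_no": 100, "city": "NY", "order_no": 1, "ship_days": 2},
    {"cust_no": 101, "city": "NY", "order_no": 2, "ship_days": 2},
    {"cust_no": 102, "city": "LA", "order_no": 3, "ship_days": 5},
]

ship_time_table = {}  # city_no -> (city, ship_days)
customer_table = {}   # cust_no -> city_no
city_keys = {}        # city name -> assigned city_no
next_key = 10         # key numbering chosen to match the slide

for row in table_x:
    city = row["city"]
    if city not in city_keys:
        # First time this city is seen: store its ship time exactly once.
        city_keys[city] = next_key
        ship_time_table[next_key] = (city, row["ship_days"])
        next_key += 10
    customer_table[row["cust_no"]] = city_keys[city]
```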

11 Normalized DB Example
MANY database tables ensure against redundant data (and help prevent data integrity issues)

12 Database Structure & Design
2 Approaches:
  1. Optimize for Data Capture (i.e., Capturing Transactions)
  2. Optimize for Data Access (i.e., Queries & Reporting)
The two approaches conflict.

13 Approach #2: Optimize for Data Access (in a separate, read-only Data Warehouse)
To optimize for data access, you must:
  Change the data layout to a different structure
  Allow data redundancy
  Reduce the number of table joins (i.e., reduce links among tables by combining tables)
Denormalizing: adding redundancy & reducing joins in a relational database
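To make "adding redundancy & reducing joins" concrete, here is a small sqlite3 sketch that pre-joins the two 3NF tables from the earlier ship-time example into one wide table. The table names (customer, ship_time, customer_wide) and column names are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized (3NF) source tables.
CREATE TABLE ship_time (city_no INTEGER PRIMARY KEY, city TEXT, ship_days INTEGER);
CREATE TABLE customer  (cust_no INTEGER PRIMARY KEY,
                        city_no INTEGER REFERENCES ship_time(city_no));
INSERT INTO ship_time VALUES (10, 'NY', 2), (20, 'LA', 5);
INSERT INTO customer  VALUES (100, 10), (101, 10), (102, 20);

-- Denormalized copy: the join is done once, and city/ship_days repeat per customer.
CREATE TABLE customer_wide AS
SELECT c.cust_no, s.city, s.ship_days
FROM customer AS c
JOIN ship_time AS s ON s.city_no = c.city_no;
""")

# Reporting queries now read a single table with no join.
wide_rows = conn.execute(
    "SELECT cust_no, city, ship_days FROM customer_wide ORDER BY cust_no"
).fetchall()
```

The trade-off is the one the slide names: "NY, 2 days" is now stored twice, which is acceptable in a read-only warehouse but would invite update anomalies in a transactional system.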

14 Denormalizing – Most Common Approach
Star Schema (Clustering)
  Fact (core or transaction) Tables in middle of star
  Dimensional (structural or “lookup”) Tables around “points” of star
SALES ORDER (FACT) TABLE
  Order #   Date       Cust #   Prod #   Loc #
  1         06/15/XX   100      QR22     1000
  2         07/19/XX   100      QR22     1000
  3         08/30/XX   101      SR56     2000
CUSTOMER DIMENSION TABLE
  Cust #   CustName
  100      Moe
  101      Larry
  102      Curly
PRODUCT DIMENSION TABLE
  Prod #   ProdName
  QR22     Rake
  SR56     Spade
  TW43     Mulch
LOC DIMENSION TABLE
  Loc #   LocName
  1000    NY
  2000    LA
  3000    PGH
DATE DIMENSION TABLE
  Date       Quarter
  06/29/XX   2
  06/30/XX   2
  07/01/XX   3
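A typical query against a star schema joins the fact table in the middle to the dimension tables at the points. The sqlite3 sketch below loads the sample rows from this slide (the date dimension is left out because its sample dates do not match the fact rows) and counts orders per customer, product, and location. Table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_dim (cust_no INTEGER PRIMARY KEY, cust_name TEXT);
CREATE TABLE product_dim  (prod_no TEXT PRIMARY KEY, prod_name TEXT);
CREATE TABLE loc_dim      (loc_no INTEGER PRIMARY KEY, loc_name TEXT);
CREATE TABLE sales_order_fact (
    order_no   INTEGER PRIMARY KEY,
    order_date TEXT,
    cust_no    INTEGER REFERENCES customer_dim(cust_no),
    prod_no    TEXT    REFERENCES product_dim(prod_no),
    loc_no     INTEGER REFERENCES loc_dim(loc_no)
);
INSERT INTO customer_dim VALUES (100, 'Moe'), (101, 'Larry'), (102, 'Curly');
INSERT INTO product_dim  VALUES ('QR22', 'Rake'), ('SR56', 'Spade'), ('TW43', 'Mulch');
INSERT INTO loc_dim      VALUES (1000, 'NY'), (2000, 'LA'), (3000, 'PGH');
INSERT INTO sales_order_fact VALUES
    (1, '06/15/XX', 100, 'QR22', 1000),
    (2, '07/19/XX', 100, 'QR22', 1000),
    (3, '08/30/XX', 101, 'SR56', 2000);
""")

# Every join goes from the fact table straight to one dimension (one hop each).
orders_by_customer = conn.execute("""
    SELECT c.cust_name, p.prod_name, l.loc_name, COUNT(*) AS orders
    FROM sales_order_fact AS f
    JOIN customer_dim AS c ON c.cust_no = f.cust_no
    JOIN product_dim  AS p ON p.prod_no = f.prod_no
    JOIN loc_dim      AS l ON l.loc_no  = f.loc_no
    GROUP BY c.cust_name, p.prod_name, l.loc_name
    ORDER BY c.cust_name
""").fetchall()
```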

15 These 2 tables become the “SALES FACT” table in the Data Warehouse
These 3 tables become the “Customer Dimension”
These 5 tables become the “Product Dimension”
This Date Field helps build the “Date Dimension”

16 Resulting Star Schema Data Warehouse
SALES ORDER (FACT) TABLE
  Order #   Date       Cust #   Prod #   Rep #
  1         06/15/XX   100      QR22     1000
  2         07/19/XX   100      QR22     1000
  3         08/30/XX   101      SR56     2000
CUSTOMER DIMENSION
  Cust #   CustName
  100      Moe
  101      Larry
  102      Curly
PRODUCT DIMENSION
  Prod #   ProdName
  QR22     Rake
  SR56     Spade
  TW43     Mulch
DATE DIMENSION
  Date       Quarter
  06/29/XX   2
  06/30/XX   2
  07/01/XX   3
Callout: “Hey, hot stuff!”

17 Denormalizing (continued): Common (Conformed) Dimensions
Stars are linked via common (i.e., Conformed) Dimensions to form the Data Warehouse
INVENTORY (FACT) TABLE
  Prod #   ProdName   Stock Date   Units
  QR22     Rake       03/23/XX     150
  TW43     Mulch      04/15/XX     1452
  SR56     Spade      05/01/XX     997
SALES ORDER (FACT) TABLE
  Order #   Date       Cust #   Prod #   Loc #
  1         06/15/XX   100      QR22     1000
  2         07/19/XX   100      QR22     1000
  3         08/30/XX   101      SR56     2000
CUSTOMER DIMENSION
  Cust #   CustName
  100      Moe
  101      Larry
  102      Curly
PRODUCT DIMENSION
  Prod #   ProdName
  QR22     Rake
  SR56     Spade
  TW43     Mulch
LOC DIMENSION
  Loc #   LocName
  1000    NY
  2000    LA
  3000    PGH
DATE DIMENSION
  Date       Quarter
  06/29/XX   2
  06/30/XX   2
  07/01/XX   3
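The practical payoff of a conformed dimension is that two fact tables can be reported side by side through it. In the sqlite3 sketch below, the sales and inventory facts from this slide share one product dimension; the query lists, per product, the number of sales orders and the units in stock. Table and column names are illustrative, and the customer and location keys are omitted from the sales fact for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (prod_no TEXT PRIMARY KEY, prod_name TEXT);
CREATE TABLE sales_order_fact (
    order_no   INTEGER PRIMARY KEY,
    order_date TEXT,
    prod_no    TEXT REFERENCES product_dim(prod_no)
);
CREATE TABLE inventory_fact (
    prod_no    TEXT REFERENCES product_dim(prod_no),
    stock_date TEXT,
    units      INTEGER
);
INSERT INTO product_dim VALUES ('QR22', 'Rake'), ('SR56', 'Spade'), ('TW43', 'Mulch');
INSERT INTO sales_order_fact VALUES
    (1, '06/15/XX', 'QR22'), (2, '07/19/XX', 'QR22'), (3, '08/30/XX', 'SR56');
INSERT INTO inventory_fact VALUES
    ('QR22', '03/23/XX', 150), ('TW43', '04/15/XX', 1452), ('SR56', '05/01/XX', 997);
""")

# Each fact table is summarized separately, then lined up on the shared dimension.
sales_vs_stock = conn.execute("""
    SELECT p.prod_name,
           (SELECT COUNT(*)     FROM sales_order_fact AS s WHERE s.prod_no = p.prod_no) AS orders,
           (SELECT SUM(i.units) FROM inventory_fact   AS i WHERE i.prod_no = p.prod_no) AS units
    FROM product_dim AS p
    ORDER BY p.prod_no
""").fetchall()
```

Because both stars use the same product keys, Mulch still appears with zero orders next to its stock level instead of dropping out of the report.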

18 Mapping Normalized Tables to Denormalized (Data Warehouse) Tables Using ETL Tools (like MS-SSIS)
EXTRACT: These are 2 Normalized Transaction Tables
TRANSFORM: The data are “Transformed” in these steps
LOAD: This is the resulting, Denormalized Product Dimension
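The slide shows this flow in an ETL tool. The three steps can also be sketched by hand in Python with sqlite3, which is not how MS-SSIS works internally but shows the same shape. Everything specific here is hypothetical: the two source tables (product, category), their columns, and the category names were invented for the example because the slide's actual source tables are not in the transcript.

```python
import sqlite3


def extract(source):
    """EXTRACT: read the two normalized source tables."""
    products = source.execute(
        "SELECT prod_no, prod_name, category_no FROM product"
    ).fetchall()
    categories = dict(
        source.execute("SELECT category_no, category_name FROM category").fetchall()
    )
    return products, categories


def transform(products, categories):
    """TRANSFORM: replace each category key with its (redundant) name."""
    return [
        (prod_no, prod_name, categories[category_no])
        for prod_no, prod_name, category_no in products
    ]


def load(warehouse, rows):
    """LOAD: write the denormalized product dimension."""
    warehouse.execute(
        "CREATE TABLE product_dim (prod_no TEXT PRIMARY KEY, prod_name TEXT, category_name TEXT)"
    )
    with warehouse:  # commit the inserts as one transaction
        warehouse.executemany("INSERT INTO product_dim VALUES (?, ?, ?)", rows)


# Hypothetical transactional source with two normalized tables.
source = sqlite3.connect(":memory:")
source.executescript("""
CREATE TABLE category (category_no INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE product  (prod_no TEXT PRIMARY KEY, prod_name TEXT,
                       category_no INTEGER REFERENCES category(category_no));
INSERT INTO category VALUES (1, 'Hand Tools'), (2, 'Soil & Mulch');
INSERT INTO product  VALUES ('QR22', 'Rake', 1), ('SR56', 'Spade', 1), ('TW43', 'Mulch', 2);
""")

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(*extract(source)))
product_dim = warehouse.execute(
    "SELECT prod_no, prod_name, category_name FROM product_dim ORDER BY prod_no"
).fetchall()
```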

19 The End That’s all! Bye, bye!

