Hachim Haddouti, adv. DBMS & DW CSC5301, Ch6 Chapter 6: The Big Dimensions Adv. DBMS & DW Hachim Haddouti.


1 Hachim Haddouti, adv. DBMS & DW CSC5301, Ch6 Chapter 6: The Big Dimensions Adv. DBMS & DW Hachim Haddouti

2 The Big Dimensions
The product dimension (the complete portfolio of what a company sells) is derived from the product master file.
Examples of big dimensions: the product and customer dimensions.
Converting the “production product master file” into the “product dimension table” involves the following steps:
– remapping of the key to avoid duplication (considering time and reuse, e.g. of UPCs)
– remapping of the key for shorter and more efficient joins (i.e. the UPC is 12 digits internationally)
– generalization of the key to handle product descriptions that change over time
– generalization of the key for aggregate products (SKU code for brand?)
– addition of text to replace numeric codes or cryptic abbreviations (reports containing cryptic numbers and codes are useless)
– quality assurance of text strings: no trivial variations (cleaning up the master file)
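The key remapping and text clean-up steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `ProductKeyMap`, the helper `clean_text`, the sample UPC and the product descriptions are all hypothetical and do not come from the slides. The sketch also assumes that a reused UPC with a genuinely different description counts as a new product.

```python
def clean_text(text):
    """Collapse whitespace and case so trivial variations compare equal."""
    return " ".join(text.split()).upper()


class ProductKeyMap:
    """Hypothetical map from production keys (e.g. UPCs) to short integer keys."""

    def __init__(self):
        self.by_source = {}  # (upc, cleaned description) -> surrogate key

    def surrogate_for(self, upc, description):
        # Look up on the (UPC, cleaned description) pair, so a reused UPC
        # with a different description receives its own surrogate key.
        source = (upc, clean_text(description))
        if source not in self.by_source:
            self.by_source[source] = len(self.by_source) + 1
        return self.by_source[source]


key_map = ProductKeyMap()
k1 = key_map.surrogate_for("012345678905", "Cola  12oz can")
k2 = key_map.surrogate_for("012345678905", "cola 12oz CAN")      # trivial variation
k3 = key_map.surrogate_for("012345678905", "Lemonade 12oz can")  # UPC reused
```

Here `k1` and `k2` share one short key because the descriptions differ only trivially, while `k3` gets a new key even though the UPC is the same.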

3 The Big Dimensions
DP: “Although the production product master file is the source of product identification, it must be transformed or augmented on a continuing basis in order to serve as the product dimension in the data warehouse. The primary steps needed are the generalization and/or replacement of the primary product key, and the completion and quality assurance of the descriptive attributes.”
At least 50 descriptive fields for large companies!
Note: facts in the fact table vary continuously over time, whereas dimension table attributes are almost constant over time.

4 The merchandise hierarchy
Many-to-one relationships going up the hierarchy.
The true meaning of drill down → show me more detail, i.e. add a row header to the report.
Multiple hierarchies (e.g. Sales and Finance hierarchies).
DP: “A typical dimension contains one or more natural hierarchies, together with other attributes that do not have a hierarchical relationship to any of the attributes in the dimension. Any of the attributes, whether or not they belong to a hierarchy, can freely be used in drilling down and drilling up.”
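Drilling down as "adding a row header" can be illustrated with a small grouping function. The sketch below is hypothetical: the `report` function, the attribute names `department` and `brand`, and the sample rows are assumptions chosen for illustration, not data from the course.

```python
from collections import defaultdict

# Hypothetical fact rows already joined to product dimension attributes.
sales = [
    {"department": "Snacks", "brand": "A", "units": 10},
    {"department": "Snacks", "brand": "B", "units": 15},
    {"department": "Drinks", "brand": "C", "units": 7},
]


def report(rows, row_headers):
    """Sum units for each distinct combination of the chosen row headers."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[header] for header in row_headers)
        totals[key] += row["units"]
    return dict(totals)


by_department = report(sales, ["department"])
# Drilling down: the same report with one more row header.
by_department_and_brand = report(sales, ["department", "brand"])
```

Drilling up is the reverse operation: removing `brand` from the list of row headers returns to the department-level totals.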

5 The Big Dimensions (cont.)
Resisting the urge to snowflake
Figure 6.2, p. 96: separating a dimension by hierarchy
The threat to browsing performance
DP: “Do not snowflake your dimensions, even if they are large. If you do snowflake your dimensions, be prepared to live with poor browsing performance.”
→ To preserve browsing performance
Really big customer dimensions, e.g. 10M customers with 3 hierarchies...
Heavy use of demographic fields (age, gender, number of children, education level, behaviors, etc.): these are the only fields to be indexed, and new indexing techniques are needed.

6 Example of Star Schema: Sales

7 Example of snowflake: Sales
[Diagram: tables linked by one-to-many (1:M) relationships]
Sales: TIME_ID, LOC_ID, CUS_ID, PROD_ID, Sales_qt, Sales_price, Sales_total
Location: LOC_ID, LOC_Desc, City_ID
City: City_ID, State_ID, City_name
State: State_ID, Region_ID, State_name
Region: Region_ID, Region_name
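To make the contrast with a star schema concrete, the snowflaked location tables can be flattened into a single dimension table. The sketch below uses Python's built-in `sqlite3` module with the table and column names from the slide. The sample rows and the flattened table name `Location_Dim` are assumptions added for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Region   (Region_ID INTEGER PRIMARY KEY, Region_name TEXT);
CREATE TABLE State    (State_ID INTEGER PRIMARY KEY, Region_ID INTEGER, State_name TEXT);
CREATE TABLE City     (City_ID INTEGER PRIMARY KEY, State_ID INTEGER, City_name TEXT);
CREATE TABLE Location (LOC_ID INTEGER PRIMARY KEY, LOC_Desc TEXT, City_ID INTEGER);

-- Hypothetical sample rows.
INSERT INTO Region   VALUES (1, 'North');
INSERT INTO State    VALUES (10, 1, 'State A');
INSERT INTO City     VALUES (100, 10, 'City X');
INSERT INTO Location VALUES (1000, 'Store 1', 100);
INSERT INTO Location VALUES (1001, 'Store 2', 100);

-- Flattened (star) dimension: hierarchy attributes are repeated on every row,
-- so browsing it needs no joins.
CREATE TABLE Location_Dim AS
SELECT l.LOC_ID, l.LOC_Desc, c.City_name, s.State_name, r.Region_name
FROM Location l
JOIN City   c ON c.City_ID   = l.City_ID
JOIN State  s ON s.State_ID  = c.State_ID
JOIN Region r ON r.Region_ID = s.Region_ID;
""")

flat_rows = con.execute(
    "SELECT LOC_ID, City_name, State_name, Region_name "
    "FROM Location_Dim ORDER BY LOC_ID"
).fetchall()
```

In the snowflake form every browse of region names by location pays for three joins. In the flattened form the same query reads one table, at the cost of repeating the city, state and region names on each location row.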

8 Example of multi_fact tables
[Diagram: tables linked by one-to-many (1:M) relationships]
Sales: TIME_ID, LOC_ID, CUS_ID, PROD_ID, Sales_qt, Sales_price, Sales_total
Location: LOC_ID, LOC_Desc, City_ID
City: City_ID, State_ID, City_name
State: State_ID, Region_ID, State_name
Region: Region_ID, Region_name
Sales_City: TIME_ID, LOC_ID, CUS_ID, PROD_ID, City_ID, Sales_City_qt, Sales_city__price, Sales_city__total
Sales_Region: ….

9 Demographic minidimensions
Separating one or more sets of demographic attributes into minidimensions; see Figure 6.3, p. 99.
A minidimension contains only the distinct combinations of demographic information, with values grouped into “bands” (such as age or income level).
DP: “The best approach for tracking changes in really huge dimensions is to break off one or more minidimensions from the dimension table, each consisting of small clumps of attributes that have been administered to have a limited number of values.”
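A minimal sketch of building a demographic minidimension follows. The band boundaries in `age_band`, the choice of attributes and the customer rows are assumptions for illustration. The point is only that the minidimension holds one row per distinct banded combination, not one row per customer.

```python
def age_band(age):
    """Hypothetical banding of a continuous age into a few values."""
    if age < 30:
        return "under 30"
    if age < 50:
        return "30-49"
    return "50+"


customers = [
    {"customer_id": 1, "age": 25, "gender": "F"},
    {"customer_id": 2, "age": 27, "gender": "F"},
    {"customer_id": 3, "age": 41, "gender": "M"},
]

minidimension = {}     # (age band, gender) -> demographics key
customer_to_demo = {}  # customer_id -> demographics key

for customer in customers:
    combo = (age_band(customer["age"]), customer["gender"])
    # Only a combination not seen before gets a new key.
    demo_key = minidimension.setdefault(combo, len(minidimension) + 1)
    customer_to_demo[customer["customer_id"]] = demo_key
```

Three customers produce only two minidimension rows here. With millions of customers, the number of rows stays bounded by the number of band combinations.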

10 Slowly changing dimensions
Are the Customer and Product dimensions independent of the Time dimension?
Changes in names, family status, product district/region.
How should these changes be handled so that the history is not affected? E.g. insurance.
Three suggestions for slowly changing dimensions:
Type 1 -- overwrite/erase old values; no accurate tracking of history needed; easy to implement; e.g. overwrite the Marital Status field.
Type 2 -- create a new record at the time of change, partitioning the history (old and new description); e.g. Hajj Boussalhame, married and single. If we constrain on Marital Status = Married, we will not see Hajj Boussalhame before he got married, so it is not possible to compare performance across the transition.
Type 3 -- new “current” fields, for a legitimate need to track both old and new states; “original” and “current” values are kept; intermediate values are lost.
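The three types can be compared side by side on the marital status example. The sketch below models a dimension as a plain Python dictionary keyed by dimension key. The function names, the integer keys and the extra status value "Divorced" are assumptions for illustration only.

```python
def type1_overwrite(dim, key, field, new_value):
    """Type 1: overwrite in place; the old value is gone."""
    dim[key][field] = new_value


def type2_new_record(dim, old_key, new_key, field, new_value):
    """Type 2: add a new row under a new key; the old row stays as history."""
    new_row = dict(dim[old_key])
    new_row[field] = new_value
    dim[new_key] = new_row


def type3_current_field(dim, key, field, new_value):
    """Type 3: keep an 'original' and a 'current' field in the same row."""
    row = dim[key]
    if "original_" + field not in row:
        # First change: the existing value becomes the "original" value.
        row["original_" + field] = row.pop(field)
    row["current_" + field] = new_value


def customer_dim():
    return {1: {"name": "Hajj Boussalhame", "marital_status": "Single"}}


t1 = customer_dim()
type1_overwrite(t1, 1, "marital_status", "Married")

t2 = customer_dim()
type2_new_record(t2, 1, 2, "marital_status", "Married")
married_keys = [k for k, row in t2.items() if row["marital_status"] == "Married"]

t3 = customer_dim()
type3_current_field(t3, 1, "marital_status", "Married")
type3_current_field(t3, 1, "marital_status", "Divorced")
```

After these calls, `t1` has no trace of "Single", `t2` holds two rows of which only key 2 matches a constraint on "Married", and `t3` keeps "Single" as the original value and "Divorced" as the current value, with the intermediate "Married" lost.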

11 Slowly changing dimensions (cont.)
DP: “The use of the Type Two slowly changing dimension requires that the dimension key be generalized. It may be sufficient to take the underlying production key and add two or three version digits to the end of the key to simplify the key generation process.”
Where can this be implemented? In the DW or in the production system?
DP: “The creation of generalized keys is usually the responsibility of the data warehouse team, and always requires metadata to keep track of the generalized keys that have already been used.”
DP: “The Type Two slowly changing dimension automatically partitions history and an application must not be required to place any time constraints on effective dates in the dimension.”
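One way to read "add two or three version digits to the end of the key" is sketched below. The class `GeneralizedKeys`, its in-memory dictionary standing in for the metadata, and the sample production keys are hypothetical. A real warehouse would persist this metadata, not hold it in memory.

```python
class GeneralizedKeys:
    """Hypothetical generator of Type Two keys: production key + version digits."""

    def __init__(self, version_digits=2):
        self.version_digits = version_digits
        self.last_version = {}  # metadata: production key -> last version used

    def next_key(self, production_key):
        version = self.last_version.get(production_key, 0) + 1
        if version >= 10 ** self.version_digits:
            raise ValueError("version digits exhausted for " + production_key)
        self.last_version[production_key] = version
        # Zero-pad the version so every generalized key has the same length.
        return f"{production_key}{version:0{self.version_digits}d}"


keys = GeneralizedKeys(version_digits=2)
first = keys.next_key("C1001")   # initial record for this customer
second = keys.next_key("C1001")  # record created after a Type Two change
other = keys.next_key("C2002")   # a different customer starts at version 01
```

The `last_version` dictionary plays the role of the metadata mentioned in the design principle: without it, the team could not tell which generalized keys have already been handed out.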

