Dimensional Modeling – Part 2


1 Dimensional Modeling – Part 2
CS 543 – Data Warehousing CS Data Warehousing (Sp ) - Asim LUMS

2 The Snowflake Schema
Snowflaking is a method of normalizing the dimension tables in a star schema
Normalization increases the efficiency of certain queries and reduces space requirements

3 Star Schema

4 Querying
Suppose the product table has 500,000 rows (different products). These products fall under 500 product brands, and these brands fall under 10 product categories
Query: what is the total quantity of a specific product category sold in Jan 2004?
All 500,000 rows in the product dimension table would have to be searched to find the products belonging to the specified product category

5 A Snowflake Schema
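
As a rough illustration of the category query from slide 4 against a snowflaked product dimension, the sketch below uses Python's built-in sqlite3 module. All table names, column names, and sample rows are hypothetical and are not taken from the slides; the point is only that the category filter is applied to a small category table and reaches the fact table through the brand and product tables.

```python
import sqlite3

# Hypothetical snowflaked product dimension: product -> brand -> category.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE brand (brand_key INTEGER PRIMARY KEY, brand_name TEXT,
                    category_key INTEGER REFERENCES category(category_key));
CREATE TABLE product (product_key INTEGER PRIMARY KEY, product_name TEXT,
                      brand_key INTEGER REFERENCES brand(brand_key));
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE sales_fact (product_key INTEGER, time_key INTEGER, quantity INTEGER);

INSERT INTO category VALUES (1, 'Tools'), (2, 'Toys');
INSERT INTO brand VALUES (10, 'BrandA', 1), (20, 'BrandB', 2);
INSERT INTO product VALUES (100, 'Hammer', 10), (101, 'Wrench', 10), (200, 'Kite', 20);
INSERT INTO time_dim VALUES (1, 2004, 1), (2, 2004, 2);
INSERT INTO sales_fact VALUES (100, 1, 5), (101, 1, 7), (200, 1, 3), (100, 2, 11);
""")


def category_quantity(conn, category_name, year, month):
    """Total quantity sold for one product category in one month."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(f.quantity), 0)
        FROM sales_fact f
        JOIN product  p ON p.product_key  = f.product_key
        JOIN brand    b ON b.brand_key    = p.brand_key
        JOIN category c ON c.category_key = b.category_key
        JOIN time_dim t ON t.time_key     = f.time_key
        WHERE c.category_name = ? AND t.year = ? AND t.month = ?
        """,
        (category_name, year, month),
    ).fetchone()
    return row[0]


total = category_quantity(conn, "Tools", 2004, 1)
print(total)
```

Note the three extra joins (product, brand, category) compared with a star schema, where the category name would be a column of the product dimension itself.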

6 Normalization
Partially or fully normalize only a few dimension tables, leaving the others intact
Partially normalize every dimension table
Fully normalize every dimension table

7 Snowflaking?
Advantages
  Small savings in storage space
  Normalized structures are easier to update and maintain
Disadvantages
  Schema is less intuitive, and end-users are put off by the complexity
  Browsing through the contents becomes difficult
  Degraded query performance because of additional joins

8 Sub-dimensions

9 Some Query Examples (1)

10 Some Query Examples (2)
Query: Total sales for customer number during the first week of December 2003 for product Widget-1
Find and sum the sales quantity and sales dollars for all fact table rows where the customer key relates to customer number, the product key relates to product Widget-1, and the time key relates to the seven days in the first week of December 2003
Assuming a customer makes at most one purchase per day, at most seven rows of the fact table will be summed

11 Some Query Examples (3)
Query: total sales for all customers in the south-central territory for the first two quarters of 2003 for product category Bigtools
Sum all fact table rows where the customer key relates to all customers in the south-central territory, the product key relates to all products in the product category Bigtools, and the time key relates to about 180 days in the first two quarters of 2003
In this query, clearly a large number of fact table rows participate in the summation
How can we reduce the execution time?

12 Fact Table Size (1)

13 Fact Table Size (2)
Credit card transaction tracking
Time dimension: 5 years = 60 months
Number of credit card accounts: 150 million
Average number of monthly transactions per account: 20
Maximum number of base fact table records: 180 billion
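
The 180 billion figure is the product of the three numbers on the slide. A minimal check in Python (the function and parameter names are illustrative only):

```python
def max_fact_rows(months, accounts, tx_per_account_per_month):
    """Upper bound on base fact rows: one row per transaction."""
    return months * accounts * tx_per_account_per_month


months = 5 * 12                      # 5 years of monthly periods
accounts = 150_000_000               # 150 million credit card accounts
tx_per_account_per_month = 20        # average monthly transactions per account

max_rows = max_fact_rows(months, accounts, tx_per_account_per_month)
print(f"{max_rows:,}")  # 180,000,000,000
```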

14 Aggregating Fact Table
Typically, queries require detailed data on some dimensions, while only summary data is needed for the other dimensions
Example: assume one sale per product per store per week. Estimate the number of fact table rows required:
  Query involves 1 product, 1 store, 1 week
  Query involves 1 product, all stores, 1 week
  Query involves 1 brand, 1 store, 1 week
  Query involves 1 brand, all stores, 1 year
Suppose now you have an aggregate fact table where each row summarizes the totals for a brand, a store, and a week. Now estimate the number of fact table rows required.
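
One way to work through this estimate is sketched below. The slide does not give the number of stores or products per brand, so the values here are assumptions: 1,000 products per brand follows from the slide 4 counts (500,000 products under 500 brands), while 300 stores is purely hypothetical, and a year is taken as 52 weeks.

```python
PRODUCTS_PER_BRAND = 500_000 // 500   # 1,000, derived from the slide 4 counts
STORES = 300                          # hypothetical; not given on the slide
WEEKS_PER_YEAR = 52


def rows_read(units, stores, weeks):
    """Rows read when the table holds one row per (unit, store, week)."""
    return units * stores * weeks


# Base fact table: one row per product, store, and week.
base = {
    "1 product, 1 store, 1 week": rows_read(1, 1, 1),
    "1 product, all stores, 1 week": rows_read(1, STORES, 1),
    "1 brand, 1 store, 1 week": rows_read(PRODUCTS_PER_BRAND, 1, 1),
    "1 brand, all stores, 1 year": rows_read(PRODUCTS_PER_BRAND, STORES, WEEKS_PER_YEAR),
}

# Aggregate fact table: one row per brand, store, and week.
# Product-level queries cannot use it and still go to the base table.
aggregate = {
    "1 brand, 1 store, 1 week": rows_read(1, 1, 1),
    "1 brand, all stores, 1 year": rows_read(1, STORES, WEEKS_PER_YEAR),
}

for query, rows in base.items():
    agg = aggregate.get(query, "n/a (needs product detail)")
    print(f"{query}: base={rows:,} aggregate={agg}")
```

Under these assumptions the brand-level yearly query drops from 15,600,000 base rows to 15,600 aggregate rows, a reduction by the number of products per brand.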

15 Multi-Way Aggregate Fact Tables (1)
Utilize hierarchies in dimensions to create appropriate aggregate fact tables
A single-way aggregate fact table aggregates along one dimension only; a multi-way aggregate fact table aggregates along more than one dimension
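
The difference between one-way and multi-way aggregation can be sketched with a small roll-up in Python. The hierarchies (product to brand, store to territory) and all sample rows are made up for illustration.

```python
from collections import defaultdict

# Hypothetical dimension hierarchies.
product_to_brand = {"P1": "B1", "P2": "B1", "P3": "B2"}
store_to_territory = {"S1": "North", "S2": "North", "S3": "South"}

# Base fact rows: (product, store, week, quantity).
base_facts = [
    ("P1", "S1", 1, 10),
    ("P2", "S1", 1, 5),
    ("P1", "S2", 1, 4),
    ("P3", "S3", 1, 7),
    ("P2", "S3", 2, 2),
]


def aggregate(facts, product_level=None, store_level=None):
    """Roll facts up along whichever dimension hierarchies are supplied."""
    totals = defaultdict(int)
    for product, store, week, qty in facts:
        p = product_level[product] if product_level else product
        s = store_level[store] if store_level else store
        totals[(p, s, week)] += qty
    return dict(totals)


# One-way: only the product dimension is rolled up (product -> brand).
one_way = aggregate(base_facts, product_level=product_to_brand)

# Two-way: product -> brand and store -> territory are both rolled up.
two_way = aggregate(base_facts, product_level=product_to_brand,
                    store_level=store_to_territory)

print(len(base_facts), len(one_way), len(two_way))
```

Each additional dimension that is rolled up shrinks the table further, at the cost of losing detail along that dimension.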

16 Multi-Way Aggregate Fact Tables (2)

17 Multi-Way Aggregate Fact Tables (3)

18 Goals for Aggregation
Primary goal: improve overall DW performance
Do not get bogged down with too many aggregates. Remember that you have to create additional derived dimensions to support the aggregates
Try to cater to a wide range of user groups
Go for aggregates that do not unduly increase the overall usage of storage
Keep the aggregates hidden from the end-users

19 Families of Stars

20 Snapshot and Transaction Tables

21 Conformed Dimensions
Since multiple fact tables share dimension tables, it is essential that dimensions are conformed, i.e., they have the same meaning
Conformed dimensions are essential for
  Building up an enterprise warehouse from data marts
  Running queries across data marts
  Consistent semantics of queries and their results
Using conformed dimensions is an important responsibility of the DW project team
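
To illustrate why a query across data marts depends on conformed dimensions, here is a minimal sketch in which two fact tables are combined through a shared product dimension. The sales and inventory marts, their keys, and their values are hypothetical; the combination works only because both marts use the same product keys with the same meaning.

```python
# Hypothetical conformed product dimension shared by both data marts.
product_dim = {1: "Widget-1", 2: "Widget-2"}

# Fact rows from two separate marts: (product_key, measure).
sales_fact = [(1, 10), (2, 4), (1, 6)]        # quantity sold
inventory_fact = [(1, 50), (2, 20)]           # quantity on hand


def total_by_product(fact_rows):
    """Sum a measure per product key."""
    totals = {}
    for product_key, value in fact_rows:
        totals[product_key] = totals.get(product_key, 0) + value
    return totals


def drill_across(dim, sales, inventory):
    """Combine both marts on the shared product key."""
    sold = total_by_product(sales)
    on_hand = total_by_product(inventory)
    return {dim[k]: (sold.get(k, 0), on_hand.get(k, 0)) for k in dim}


report = drill_across(product_dim, sales_fact, inventory_fact)
print(report)
```

If the two marts assigned product keys independently, the same lookup would silently pair unrelated products.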

22 Standardizing Facts
Since fact tables can be shared, they need to be standardized.
Ensure same definition and terminology across data marts
Resolve homonyms and synonyms
Guarantee that the same algorithms are used for any derived units in each fact table
Make sure each fact uses the right unit of measurement

