
1 Dimensional Modeling 1

2 Agenda  Review: Business Requirements  Dimensional Model Components  Dimensional Model Schemas  Additional Modeling Concepts 2

3 DW Development Approach: Kimball  Methodology  DW Project Lifecycle  Business requirements  Business Requirements Documentation  Bus Matrix  Design, build and deliver in increments  DW Architecture  DW Design  ETL system  Reports, query tools, … 3

4 Data Warehouse Project Lifecycle 4 Source: Mundy, Thornthwaite, and Kimball (2006). The Microsoft Data Warehouse Toolkit, Wiley Publishing Inc., Indianapolis, IN.

5 Project Planning  Determine:  Initial project scope  Project cost  Define:  Team roles  Team members  Project schedule 5

6 Data Warehouse Project Lifecycle 6 Source: Mundy, Thornthwaite, and Kimball (2006). The Microsoft Data Warehouse Toolkit, Wiley Publishing Inc., Indianapolis, IN.

7 DW Development Approach: Kimball  Methodology  DW Project Lifecycle  Business requirements  Business Requirements Documentation  Bus Matrix  Design, build and deliver in increments  DW Architecture  DW Design  ETL system  Cube, Reports, query tools, … 7

8 Requirements Elicitation  Analysis requirements  Identify who to interview  Conduct Interviews  Business challenges  Definition of success  More effective in job  Other discovery methods  Existing systems  Reports…  Document & Prioritize 8

9 Documenting Requirements  Interview Summaries  Prose summarizing interviews  Kimball format  Analytic Themes  Analysis requirements grouped into “categories”  Kimball format (pg 35)  DW Bus Matrix  Business processes mapped to data needed  Kimball format (pg 37)  DM Information Package  Prioritized processes  Ponniah format (pg 104)

10 Kimball Example: Interview Summaries 10

11 Kimball Example: Analytic Themes 11

12 Class Example: University Dept. Requirements 12

13 Kimball Example: Bus Matrix 13

14 Class Example: University Dept. Bus Matrix 14

15 Class Example: University Dept. Information Package 15

16 In-Class Example: Newspaper Information Package 16

17 Data Warehouse Project Lifecycle 17 Source: Mundy, Thornthwaite, and Kimball (2006). The Microsoft Data Warehouse Toolkit, Wiley Publishing Inc., Indianapolis, IN.

18 DW Development Approach: Kimball  Methodology  DW Project Lifecycle  Business requirements  Business Requirements Documentation  Bus Matrix  Design, build and deliver in increments  DW Architecture  DW Design  ETL system  Cube, Reports, query tools, … 18

19 ERD 19

20 Reporting Challenges with ERD/OLTP  Model designed for efficient record processing, not "subject" processing  External data often excluded  Analyses require multiple joins  Indexes not optimized for reporting  History not stored 20

21 Pre-Computing Aggregates

Month  Product  City     Total Sales Quantity
Oct    Prod1    Abilene  9556
Oct    Prod1    Austin   799
Oct    Prod1    Dallas   1356
Oct    Prod1    Waco     36678
Oct    Prod2    Abilene  7869
Oct    Prod2    Austin   2967
Oct    Prod2    Dallas   568
Oct    Prod2    Waco     277980
Oct    Prod3    Abilene  43
Oct    Prod3    Austin   6588
Oct    Prod3    Dallas   8434
Oct    Prod3    Waco     3756
Nov    Prod1    Abilene  77977
Nov    Prod1    Austin   234
Nov    Prod1    Dallas   4378
Nov    Prod1    Waco     20349
Nov    Prod2    Abilene  210
Nov    Prod2    Austin   789
Nov    Prod2    Dallas   888
Nov    Prod2    Waco     4566
Nov    Prod3    Abilene  2078
Nov    Prod3    Austin   292
Nov    Prod3    Dallas   1111
Nov    Prod3    Waco     36
Dec    Prod1    Abilene  34657
Dec    Prod1    Austin   2999
Dec    Prod1    Dallas   5888
Dec    Prod1    Waco     9999
Dec    Prod2    Abilene  1580
Dec    Prod2    Austin   2940
Dec    Prod2    Dallas   975
Dec    Prod2    Waco     5748
Dec    Prod3    Abilene  6140
Dec    Prod3    Austin   211
Dec    Prod3    Dallas   1357
Dec    Prod3    Waco     1000

Queries:
1. Total Sales
2. Total Sales by Month
3. Total Sales by Month and Product Line
4. Total Sales by Month, Product Line, and City
5. Total Sales by City
…

22 Pre-Computing Aggregates, cont…

Total Sales (1 "fact", 0 "dimensions")

  select sum(ordered_quantity) as "total"
  from order_line_t;

Total Sales by Month (1 "fact", 1 "dimension" with 3 values: Oct, Nov, Dec)

  select month(order_date) as "month",
         sum(ordered_quantity) as "total"
  from order_line_t ol, order_t o
  where ol.order_id = o.order_id
  group by month(order_date);

Total Sales by Month and Product (1 "fact", 2 "dimensions" each with 3 values: Oct, Nov, Dec and P1, P2, P3)

  select month(order_date) as "month",
         p.product_line_id as "product",
         sum(ordered_quantity) as "total"
  from order_line_t ol, order_t o, product_t p
  where ol.order_id = o.order_id
    and ol.product_id = p.product_id
  group by month(order_date), p.product_line_id;

23 Pre-Computing Aggregates, cont…

Total Sales by Month, Product, & City (1 "fact", 3 "dimensions": Month with 3 values (Oct, Nov, Dec), Product with 3 values (P1, P2, P3), City with 4 values (AB, AU, DA, WA))

  select month(order_date) as "month",
         p.product_line_id as "product",
         c.city,
         sum(ordered_quantity) as "total"
  from order_line_t ol, order_t o, product_t p, customer_t c
  where ol.order_id = o.order_id
    and ol.product_id = p.product_id
    and o.customer_id = c.customer_id
  group by month(order_date), p.product_line_id, c.city;
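The queries on slides 22 and 23 aggregate the same fact at progressively finer grains. As a rough sketch of what "pre-computing" means, the Python below totals the measure once for every subset of dimensions, so that later lookups avoid rescanning the rows. The `precompute` helper and the dictionary layout are assumptions made for illustration; only the eight sample rows come from slide 21.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("month", "product", "city")

# Subset of the slide 21 sample rows: (month, product, city, sales quantity).
ROWS = [
    ("Oct", "Prod1", "Austin", 799),
    ("Oct", "Prod1", "Dallas", 1356),
    ("Oct", "Prod2", "Austin", 2967),
    ("Oct", "Prod2", "Dallas", 568),
    ("Nov", "Prod1", "Austin", 234),
    ("Nov", "Prod1", "Dallas", 4378),
    ("Nov", "Prod2", "Austin", 789),
    ("Nov", "Prod2", "Dallas", 888),
]


def precompute(rows, dimensions=DIMENSIONS):
    """Hypothetical helper: total the measure for every subset of dimensions."""
    aggregates = {}
    for size in range(len(dimensions) + 1):
        for grouping in combinations(range(len(dimensions)), size):
            totals = defaultdict(int)
            for row in rows:
                # Build the group key from the selected dimension columns only.
                key = tuple(row[i] for i in grouping)
                totals[key] += row[-1]
            aggregates[tuple(dimensions[i] for i in grouping)] = dict(totals)
    return aggregates


AGGREGATES = precompute(ROWS)

total_sales = AGGREGATES[()][()]                            # query 1
sales_by_month = AGGREGATES[("month",)]                     # query 2
sales_by_month_product = AGGREGATES[("month", "product")]   # query 3
```

With three dimensions there are 2^3 = 8 groupings. The count doubles with each added dimension, which is why real cube engines usually choose which aggregates to materialize rather than storing all of them.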

24 OLAP Review  Short:  Class of applications or tools that support ad-hoc analysis of multidimensional data  Longer:  “…technology that enables [users]… to gain insight into data through…fast, consistent, interactive access [to]…information that has been transformed…to reflect the real dimensionality of the enterprise…”  OLAP Council (www.olapcouncil.org)

25 OLAP Cubes  Flexible, interactive information delivery to DW  Multidimensional data representation and operations  Rollup  Drill-down  Slice/Dice  Pivot (or Rotate)  Improves Reporting Performance  Pre-processed aggregates  Data In-memory  Index Structures  Bye Bye Locks! … 25
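The four cube operations named on slide 25 can be illustrated on a tiny in-memory cube. This is only a sketch: the dictionary representation and the function names `rollup`, `slice_cube`, `dice`, and `pivot` are assumptions chosen for readability, not how any particular OLAP server implements them. Cell values are taken from the slide 21 sample data.

```python
from collections import defaultdict

AXES = ("month", "product", "city")

# Cells keyed by (month, product, city); the value is the sales quantity.
CUBE = {
    ("Oct", "Prod1", "Austin"): 799,
    ("Oct", "Prod1", "Dallas"): 1356,
    ("Nov", "Prod1", "Austin"): 234,
    ("Nov", "Prod1", "Dallas"): 4378,
    ("Oct", "Prod2", "Austin"): 2967,
    ("Nov", "Prod2", "Dallas"): 888,
}


def rollup(cube, keep):
    """Aggregate away every axis not listed in `keep`; drill-down is the reverse."""
    idx = [AXES.index(axis) for axis in keep]
    out = defaultdict(int)
    for key, qty in cube.items():
        out[tuple(key[i] for i in idx)] += qty
    return dict(out)


def slice_cube(cube, axis, value):
    """Fix one axis to a single value."""
    i = AXES.index(axis)
    return {k: v for k, v in cube.items() if k[i] == value}


def dice(cube, **allowed):
    """Restrict several axes to sets of allowed values at once."""
    checks = [(AXES.index(axis), set(values)) for axis, values in allowed.items()]
    return {k: v for k, v in cube.items() if all(k[i] in s for i, s in checks)}


def pivot(table):
    """Swap the row and column axes of a two-axis rollup result."""
    return {(b, a): v for (a, b), v in table.items()}


by_month = rollup(CUBE, ["month"])
by_month_city = rollup(CUBE, ["month", "city"])   # drill-down from by_month
austin_only = slice_cube(CUBE, "city", "Austin")
oct_prod1 = dice(CUBE, month=["Oct"], product=["Prod1"])
by_city_month = pivot(by_month_city)
```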

30 Dimensional Modeling  Data Model  Logical view of a multi-dimensional cube  Key structures and components  Fact table(s)  Key business process  Facts/Measurements/metrics  Foreign Keys  Dimension tables  Ways to view measures  Attributes  Often denormalized  Surrogate Key vs. Business Key  Hierarchies 30

31 Dimensional Model Example  [Figure: dimensional model diagram with callouts for Fact Table, Dimension Tables, Foreign Keys, Attributes, Measures, Business Key ("Include it!"), Surrogate Key, and Hierarchy]

32 Dimensional Model Characteristics  [Table comparing Dim Tables and Fact Tables; cell contents not recoverable]

33 Star Schema  At least one fact table and (typically) two or more dimension tables  Fact table has direct relationship with each of the dimension tables  “Single-table” dimensions  Arrangement resembles a "star" 33
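A minimal sketch of the star arrangement described on slide 33, using Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_date`, `dim_product`, `dim_city`), the surrogate key columns, and the product line labels are all hypothetical; the quantities reuse values from slide 21. One fact table joins directly to each single-table dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Single-table dimensions, each with its own surrogate key.
CREATE TABLE dim_date    (date_sk INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE dim_product (product_sk INTEGER PRIMARY KEY,
                          product_name TEXT, product_line TEXT);
CREATE TABLE dim_city    (city_sk INTEGER PRIMARY KEY, city TEXT, state TEXT);

-- The fact table holds one foreign key per dimension plus the measure.
CREATE TABLE fact_sales (
    date_sk        INTEGER REFERENCES dim_date(date_sk),
    product_sk     INTEGER REFERENCES dim_product(product_sk),
    city_sk        INTEGER REFERENCES dim_city(city_sk),
    sales_quantity INTEGER
);

INSERT INTO dim_date    VALUES (1, 'Oct'), (2, 'Nov');
INSERT INTO dim_product VALUES (1, 'Prod1', 'Line A'),
                               (2, 'Prod2', 'Line A'),
                               (3, 'Prod3', 'Line B');
INSERT INTO dim_city    VALUES (1, 'Austin', 'TX'), (2, 'Dallas', 'TX');
INSERT INTO fact_sales  VALUES (1, 1, 1, 799), (1, 1, 2, 1356),
                               (1, 3, 1, 6588), (2, 2, 2, 888),
                               (2, 1, 1, 234);
""")

# Every dimension is exactly one join away from the fact table.
sales_by_month_line = conn.execute("""
    SELECT d.month, p.product_line, SUM(f.sales_quantity)
    FROM fact_sales f
    JOIN dim_date d    ON d.date_sk = f.date_sk
    JOIN dim_product p ON p.product_sk = f.product_sk
    GROUP BY d.month, p.product_line
    ORDER BY d.month, p.product_line
""").fetchall()
```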

34 Star Schema Example 34

35 Snowflake Schema 35  Fact table has direct relationship with some dimension tables, and indirect relationship with other(s)  Multi-table dimensions  i.e., "Normalized" dimensions
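To make the "multi-table dimension" idea concrete, the sketch below normalizes a product dimension into two tables, then flattens it back into a single star-style table for comparison. Table and column names are hypothetical, and `sqlite3` is used only because it ships with Python. The snowflake query needs one more join than the star query to reach the same totals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflake: product_line lives in its own table, referenced by dim_product.
CREATE TABLE dim_product_line (
    product_line_sk INTEGER PRIMARY KEY,
    product_line    TEXT
);
CREATE TABLE dim_product (
    product_sk      INTEGER PRIMARY KEY,
    product_name    TEXT,
    product_line_sk INTEGER REFERENCES dim_product_line(product_line_sk)
);
CREATE TABLE fact_sales (
    product_sk      INTEGER REFERENCES dim_product(product_sk),
    sales_quantity  INTEGER
);

INSERT INTO dim_product_line VALUES (1, 'Line A'), (2, 'Line B');
INSERT INTO dim_product VALUES (1, 'Prod1', 1), (2, 'Prod2', 1), (3, 'Prod3', 2);
INSERT INTO fact_sales VALUES (1, 799), (2, 2967), (3, 6588), (1, 234);

-- Star-style alternative: the same dimension denormalized into one table.
CREATE TABLE dim_product_star AS
SELECT p.product_sk, p.product_name, pl.product_line
FROM dim_product p
JOIN dim_product_line pl ON pl.product_line_sk = p.product_line_sk;
""")

# Fact -> product -> product line: an indirect relationship (two joins).
snowflake_totals = conn.execute("""
    SELECT pl.product_line, SUM(f.sales_quantity)
    FROM fact_sales f
    JOIN dim_product p       ON p.product_sk = f.product_sk
    JOIN dim_product_line pl ON pl.product_line_sk = p.product_line_sk
    GROUP BY pl.product_line
    ORDER BY pl.product_line
""").fetchall()

# Fact -> flattened product dimension: a direct relationship (one join).
star_totals = conn.execute("""
    SELECT p.product_line, SUM(f.sales_quantity)
    FROM fact_sales f
    JOIN dim_product_star p ON p.product_sk = f.product_sk
    GROUP BY p.product_line
    ORDER BY p.product_line
""").fetchall()
```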

36 Snowflake Example 36

37 Comparison of Schemas  Star  The much-preferred approach  Adv:  Faster load/query/analysis performance  Potentially more intuitive to users  Snowflake  Adv:  Potentially faster setup  Avoid data redundancy  Reduces size of dimension table  Ease of maintaining 37

38 Common Dims, Facts, Measures  Dims  Facts  Measures

39 In-Class Example: Newspaper Dim Model 39

40 Additional Modeling Concepts  Surrogate Keys  Attribute Hierarchies  Time Dimensions  Junk Dimensions  Degenerate Dimensions  Slowly-Changing Dimensions 40

41 Surrogate Keys  Problem:  Potential for PK to change in source systems  e.g., PKs with built-in meaning  Data spread across multiple systems  Do PKs exist?  Are PKs consistent?  Do PKs mean the same thing?  Surrogate Key  Newly-generated PK for dimension rows in DW  System-generated sequence numbers  Mapped to source/application key(s)  Fact rows reference SKs
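A small sketch of surrogate key assignment as outlined on slide 41: each (source system, business key) pair receives a system-generated sequence number, and fact rows store only that number. The `SurrogateKeyMap` class and the source system names `crm` and `billing` are hypothetical. Recognizing that two source keys describe the same real-world customer is a separate matching problem that this sketch does not attempt.

```python
from itertools import count


class SurrogateKeyMap:
    """Hypothetical lookup from (source system, business key) to a surrogate key."""

    def __init__(self, start=1):
        self._sequence = count(start)   # system-generated sequence numbers
        self._keys = {}

    def lookup(self, source_system, business_key):
        """Return the existing surrogate key, or assign the next one."""
        natural = (source_system, business_key)
        if natural not in self._keys:
            self._keys[natural] = next(self._sequence)
        return self._keys[natural]


customer_keys = SurrogateKeyMap()

sk_crm = customer_keys.lookup("crm", "C-1001")
sk_billing = customer_keys.lookup("billing", 1001)
sk_crm_again = customer_keys.lookup("crm", "C-1001")   # stable on repeat lookups

# The fact row references the surrogate key, never the source system's key.
fact_row = {"customer_sk": sk_crm, "sales_quantity": 799}
```

Because the warehouse owns the sequence, a source system can renumber or restructure its own keys without forcing a rewrite of existing fact rows; only the mapping needs to change.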

42 Surrogate Keys Example 42

43 Attribute Hierarchies  1:M relationships between attributes  Supports user navigation  drill-downs, drill-ups  Improves performance  Assists SSAS in aggregation selection  Storage improvement 43
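The 1:M requirement on slide 43 can be checked mechanically: every child value must belong to exactly one parent value. The helper `is_hierarchy` below is a hypothetical sketch, and the Portland rows are added purely to show a case where a bare city name fails the check.

```python
def is_hierarchy(rows, parent, child):
    """True when each child value maps to exactly one parent value (1:M)."""
    parent_of = {}
    for row in rows:
        # setdefault records the first parent seen for this child value.
        if parent_of.setdefault(row[child], row[parent]) != row[parent]:
            return False
    return True


TEXAS_CITIES = [
    {"state": "TX", "city": "Austin"},
    {"state": "TX", "city": "Dallas"},
    {"state": "TX", "city": "Waco"},
]

# Illustrative rows: the same city name appears under two different states.
WITH_DUPLICATE_NAME = TEXAS_CITIES + [
    {"state": "OR", "city": "Portland"},
    {"state": "ME", "city": "Portland"},
]

state_city_ok = is_hierarchy(TEXAS_CITIES, "state", "city")
state_city_broken = is_hierarchy(WITH_DUPLICATE_NAME, "state", "city")
```

When the check fails, one common remedy is to make the child attribute unique, for example by keying the city on a combined state and city value.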

44 Attribute Hierarchy Examples  State > City  Year > Month  Year > Semester

45 Date / Time Dimension  Common feature of every data warehouse  Minimum attributes:  Date key (e.g. 20140121, 2014-01-21, 12345)  Date name (e.g. Tuesday, January 21 2014)  Common additional attributes  Month, Year, Quarter, …  Holiday Name, …
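Date dimensions are usually generated rather than loaded from a source system. The sketch below builds one row per calendar day with the minimum attributes from slide 45 plus month, quarter, and year. The function name `build_date_dimension` and the column names are assumptions, the key uses the yyyymmdd integer style, and holiday attributes are left out because they depend on a calendar the slides do not specify.

```python
from datetime import date, timedelta


def build_date_dimension(start, end):
    """Hypothetical generator of date dimension rows for start..end inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            # Integer key in yyyymmdd form, e.g. 20140121.
            "date_key": day.year * 10000 + day.month * 100 + day.day,
            # Weekday and month names come from the default (C) locale.
            "date_name": f"{day:%A}, {day:%B} {day.day} {day.year}",
            "month": day.month,
            "quarter": (day.month - 1) // 3 + 1,
            "year": day.year,
        })
        day += timedelta(days=1)
    return rows


DIM_DATE = build_date_dimension(date(2014, 1, 1), date(2014, 12, 31))
```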

46 Time Dimension Example 46

47 Junk Dimensions  Stores one or more "lookup" codes, flags, indicators that describe or categorize transactions/events  Usually low cardinality  May include all valid combinations of codes OR valid combinations that exist 47

48 Junk Dimension Example

Enrollment_Status_ID_SK  Registration_Status  Permit_Issued  Class_Fee_Status
1                        Wait List            Y              Paid
2                        Wait List            Y              Unpaid
3                        Wait List            N              Paid
4                        Wait List            N              Unpaid
5                        Confirmed            Y              Paid
6                        Confirmed            Y              Unpaid
7                        Confirmed            N              Paid
8                        Confirmed            N              Unpaid
9                        Awaiting Approval    Y              Paid
10                       Awaiting Approval    Y              Unpaid
11                       Awaiting Approval    N              Paid
12                       Awaiting Approval    N              Unpaid
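The "all valid combinations" option from slide 47 amounts to a Cartesian product of the low-cardinality code lists. The sketch below regenerates the twelve rows of the slide 48 example; the lowercase column names and the `JUNK_LOOKUP` dictionary are assumptions added for illustration.

```python
from itertools import product

REGISTRATION_STATUS = ["Wait List", "Confirmed", "Awaiting Approval"]
PERMIT_ISSUED = ["Y", "N"]
CLASS_FEE_STATUS = ["Paid", "Unpaid"]

# One row per combination, numbered with a surrogate key starting at 1.
JUNK_DIM = [
    {
        "enrollment_status_sk": sk,
        "registration_status": registration,
        "permit_issued": permit,
        "class_fee_status": fee,
    }
    for sk, (registration, permit, fee) in enumerate(
        product(REGISTRATION_STATUS, PERMIT_ISSUED, CLASS_FEE_STATUS), start=1
    )
]

# During fact loading, the three source codes resolve to a single foreign key.
JUNK_LOOKUP = {
    (r["registration_status"], r["permit_issued"], r["class_fee_status"]):
        r["enrollment_status_sk"]
    for r in JUNK_DIM
}
```

The row count is the product of the list sizes (3 x 2 x 2 = 12). When that product grows large, storing only the combinations that actually occur in the source data keeps the table small.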

49 Degenerate Dimensions  An attribute (dimension) stored in fact table  Typically a high-cardinality attribute  Attribute does NOT link to a dimension table  Often used for drill-downs and/or data mining (e.g. Market Basket Analysis) 49
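An order number is a typical degenerate dimension: it sits in the fact table with no dimension table behind it, yet it is what groups line items into baskets. The sketch below uses hypothetical fact rows and order numbers to count how often pairs of products appear on the same order, which is the starting point for the market basket analysis mentioned on slide 49.

```python
from collections import Counter, defaultdict
from itertools import combinations

# order_number is the degenerate dimension: there is no dim_order table.
FACT_SALES = [
    {"order_number": "SO-1001", "product_sk": 1, "quantity": 2},
    {"order_number": "SO-1001", "product_sk": 2, "quantity": 1},
    {"order_number": "SO-1002", "product_sk": 1, "quantity": 1},
    {"order_number": "SO-1002", "product_sk": 2, "quantity": 4},
    {"order_number": "SO-1002", "product_sk": 3, "quantity": 1},
    {"order_number": "SO-1003", "product_sk": 3, "quantity": 5},
]

# Group line items into baskets using only the degenerate dimension.
baskets = defaultdict(set)
for row in FACT_SALES:
    baskets[row["order_number"]].add(row["product_sk"])

# Count product pairs that were bought together on the same order.
pair_counts = Counter()
for items in baskets.values():
    pair_counts.update(combinations(sorted(items), 2))
```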

50 Degenerate Dimension Example 50

51 Slowly-Changing Dimensions  What to do when a value in a dimension record changes  Type 0: Do Nothing  Type 1: Overwrite Record  Type 2: Retain All History (add new rows)  Type 3: Retain Some History (add new columns)  Impacts ETL

52 Type 0 (Fixed Attribute)

DimCustomer Table
  CustomerSK  10
  CustomerID  5000017302
  LastName    Harris
  FirstName   Miles
  Gender      M

Source Extract
  CustomerID  5000017302
  LastName    Harris
  FirstName   Miles
  Gender      F

Update result: update ignored, or failure

© 2006 Microsoft Corporation.

53 Type 1 (Changing Attribute)

DimCustomer Table
  CustomerSK    10
  CustomerID    5000017302
  LastName      Harris
  FirstName     Miles
  AddressLine1  5363 Blackshire Street
  ZipCode       54271-0000

Source Extract
  CustomerID    5000017302
  LastName      Harris
  FirstName     Miles
  AddressLine1  123 Main St.
  ZipCode       54276

Updated DimCustomer Table
  CustomerSK    10
  CustomerID    5000017302
  LastName      Harris
  FirstName     Miles
  AddressLine1  123 Main St.
  ZipCode       54276

Simple UPDATE statement applied:

  UPDATE DimCustomer
  SET AddressLine1 = '123 Main St.', ZipCode = '54276'
  WHERE CustomerID = 5000017302

© 2006 Microsoft Corporation.

54 Type 2 (Changing Attribute)

DimCustomer Table
  CustomerSK    10
  CustomerID    5000017302
  LastName      Harris
  FirstName     Miles
  AddressLine1  5363 Blackshire Street
  ZipCode       54271
  StartDate     1/1/2007
  EndDate       NULL

Customer Source Extract
  CustomerID    5000017302
  LastName      Harris
  FirstName     Miles
  AddressLine1  123 Main St.
  ZipCode       54276

Updated DimCustomer Table (two rows)
                Expired row             Current row
  CustomerSK    10                      108
  CustomerID    5000017302              5000017302
  LastName      Harris                  Harris
  FirstName     Miles                   Miles
  AddressLine1  5363 Blackshire Street  123 Main St.
  ZipCode       54271                   54276
  StartDate     1/1/2007                2/18/2007
  EndDate       2/18/2007               NULL

First, a simple UPDATE statement closes the current row. The EndDate IS NULL condition keeps rows that were already expired from being overwritten on later changes:

  UPDATE DimCustomer
  SET EndDate = '2/18/2007'
  WHERE CustomerID = 5000017302 AND EndDate IS NULL

Then an INSERT statement adds the new current row:

  INSERT INTO DimCustomer (CustomerID, LastName, Firstname…)
  VALUES (5000017302, 'Harris', 'Miles', '123 Main St.', '54276', '2/18/2007', NULL)

© 2006 Microsoft Corporation.
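The Type 2 steps on slide 54 can be wrapped in one transaction so that the old row is closed and the new row is added together. The following is a sketch using Python's built-in `sqlite3`, not the SSIS mechanism the slide was drawn from. The function name `apply_type2_change` is hypothetical, dates are stored as ISO strings for simplicity, and the surrogate keys come out as 1 and 2 rather than the 10 and 108 shown on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE DimCustomer (
        CustomerSK   INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        CustomerID   INTEGER NOT NULL,                   -- business key
        LastName     TEXT,
        FirstName    TEXT,
        AddressLine1 TEXT,
        ZipCode      TEXT,
        StartDate    TEXT NOT NULL,
        EndDate      TEXT                                -- NULL marks the current row
    )
""")
INSERT_SQL = """
    INSERT INTO DimCustomer
        (CustomerID, LastName, FirstName, AddressLine1, ZipCode, StartDate, EndDate)
    VALUES (?, ?, ?, ?, ?, ?, NULL)
"""
conn.execute(INSERT_SQL, (5000017302, "Harris", "Miles",
                          "5363 Blackshire Street", "54271", "2007-01-01"))
conn.commit()


def apply_type2_change(conn, customer_id, address, zip_code, change_date):
    """Expire the current row, then insert a new current row (SCD Type 2)."""
    with conn:  # commits on success, rolls back if an exception is raised
        current = conn.execute(
            "SELECT LastName, FirstName FROM DimCustomer "
            "WHERE CustomerID = ? AND EndDate IS NULL",
            (customer_id,),
        ).fetchone()
        if current is None:
            raise LookupError(f"no current row for customer {customer_id}")
        conn.execute(
            "UPDATE DimCustomer SET EndDate = ? "
            "WHERE CustomerID = ? AND EndDate IS NULL",
            (change_date, customer_id),
        )
        conn.execute(INSERT_SQL, (customer_id, current[0], current[1],
                                  address, zip_code, change_date))


apply_type2_change(conn, 5000017302, "123 Main St.", "54276", "2007-02-18")

history = conn.execute(
    "SELECT CustomerSK, AddressLine1, StartDate, EndDate "
    "FROM DimCustomer ORDER BY CustomerSK"
).fetchall()
```

Fact rows loaded before the change keep pointing at the expired surrogate key, so historical reports still show the address that was in effect at the time.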

55 Type 3 (Changing Attribute)

DimCustomer Table
  CustomerSK    10
  CustomerID    5000017302
  LastName      Harris
  FirstName     Miles
  AddressLine1  5363 Blackshire Street
  ZipCode       54271
  StartDate     1/1/2007
  EndDate       NULL

Customer Source Extract
  CustomerID    5000017302
  LastName      Harris
  FirstName     Miles
  AddressLine1  123 Main St.
  ZipCode       54276

Updated DimCustomer Table
  CustomerSK            10
  CustomerID            5000017302
  LastName              Harris
  FirstName             Miles
  AddressLine1          5363 Blackshire Street
  ZipCode               54271
  Updated AddressLine1  123 Main St.
  Updated ZipCode       54276

Simple UPDATE statement applied:

  UPDATE DimCustomer
  SET UpdatedAddressLine1 = '123 Main St.', UpdatedZipCode = '54276'
  WHERE CustomerID = 5000017302

© 2006 Microsoft Corporation.
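The single-row types (0, 1, and 3) differ only in where the incoming value lands. The sketch below contrasts them on one dictionary row. The function `apply_change` is hypothetical, and the `Updated` column prefix follows the naming in the Type 3 slide. Type 2 is omitted because it adds a row rather than modifying one.

```python
def apply_change(row, changes, scd_type):
    """Return a new dimension row after applying `changes` under SCD type 0, 1, or 3."""
    updated = dict(row)
    if scd_type == 0:
        # Fixed attribute: the incoming change is ignored.
        return updated
    if scd_type == 1:
        # Overwrite in place: the previous value is lost.
        updated.update(changes)
        return updated
    if scd_type == 3:
        # Keep the original column and store the new value in an extra column.
        for column, value in changes.items():
            updated["Updated" + column] = value
        return updated
    raise ValueError(f"unsupported SCD type for a single-row change: {scd_type}")


ROW = {
    "CustomerSK": 10,
    "CustomerID": 5000017302,
    "AddressLine1": "5363 Blackshire Street",
    "ZipCode": "54271",
}
CHANGES = {"AddressLine1": "123 Main St.", "ZipCode": "54276"}

type0_row = apply_change(ROW, CHANGES, 0)
type1_row = apply_change(ROW, CHANGES, 1)
type3_row = apply_change(ROW, CHANGES, 3)
```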

56 Data Warehouse Project Lifecycle 56 Source: Mundy, Thornthwaite, and Kimball (2006). The Microsoft Data Warehouse Toolkit, Wiley Publishing Inc., Indianapolis, IN.

57 DW Physical Design 57

58 Summary  DW Requirements, Design  OLTP vs Cube Approach  Dimensional Model Basic Components  Facts  Measures  Dimensions  Attributes  Keys  Primary  Surrogate  Business  Foreign  Schemas  Hierarchies  Slowly-Changing Dimensions  Junk Dimensions  Degenerate Dimensions 58

