Dimensional Modeling 1

Agenda 2
- Review: Business Requirements
- Dimensional Model Components
- Dimensional Model Schemas
- Additional Modeling Concepts

DW Development Approach: Kimball 3
- Methodology
- DW Project Lifecycle
- Business requirements
- Business Requirements Documentation
- Bus Matrix
- Design, build and deliver in increments
- DW Architecture
- DW Design
- ETL system
- Reports, query tools, …

Data Warehouse Project Lifecycle 4 Source: Mundy, Thornthwaite, and Kimball (2006). The Microsoft Data Warehouse Toolkit, Wiley Publishing Inc., Indianapolis, IN.

Project Planning 5
- Determine:
  - Initial project scope
  - Project cost
- Define:
  - Team roles
  - Team members
  - Project schedule

DW Development Approach: Kimball 7
- Methodology
- DW Project Lifecycle
- Business requirements
- Business Requirements Documentation
- Bus Matrix
- Design, build and deliver in increments
- DW Architecture
- DW Design
- ETL system
- Cube, Reports, query tools, …

Requirements Elicitation 8
- Analysis requirements
- Identify who to interview
- Conduct interviews
- Business challenges
- Definition of success
- More effective in job
- Other discovery methods
- Existing systems
- Reports…
- Document & prioritize

Documenting Requirements 9
- Interview Summaries
  - Prose summarizing interviews
  - Kimball format
- Analytic Themes
  - Analysis requirements grouped into "categories"
  - Kimball format (pg 35)
- DW Bus Matrix
  - Business processes mapped to data needed
  - Kimball format (pg 37)
- DM Information Package
  - Prioritized processes
  - Ponniah format (pg 104)

Kimball Example: Interview Summaries 10

Kimball Example: Analytic Themes 11

Class Example: University Dept. Requirements 12

Kimball Example: Bus Matrix 13

Class Example: University Dept. Bus Matrix 14

Class Example: University Dept. Information Package 15

In-Class Example: Newspaper Information Package 16

ERD 19

Reporting Challenges with ERD/OLTP 20
- Model designed for efficient record processing, not "subject" processing
- External data often excluded
- Analyses require multiple joins
- Indexes not optimized for reporting
- History not stored

Pre-Computing Aggregates 21

Month  Product  City     Total Sales Quantity
Oct    Prod1    Abilene  9556
Oct    Prod1    Austin   799
Oct    Prod1    Dallas   1356
Oct    Prod1    Waco     36678
Oct    Prod2    Abilene  7869
Oct    Prod2    Austin   2967
Oct    Prod2    Dallas   568
Oct    Prod2    Waco
Oct    Prod3    Abilene  43
Oct    Prod3    Austin   6588
Oct    Prod3    Dallas   8434
Oct    Prod3    Waco     3756
Nov    Prod1    Abilene  77977
Nov    Prod1    Austin   234
Nov    Prod1    Dallas   4378
Nov    Prod1    Waco     20349
Nov    Prod2    Abilene  210
Nov    Prod2    Austin   789
Nov    Prod2    Dallas   888
Nov    Prod2    Waco     4566
Nov    Prod3    Abilene  2078
Nov    Prod3    Austin   292
Nov    Prod3    Dallas   1111
Nov    Prod3    Waco     36
Dec    Prod1    Abilene  34657
Dec    Prod1    Austin   2999
Dec    Prod1    Dallas   5888
Dec    Prod1    Waco     9999
Dec    Prod2    Abilene  1580
Dec    Prod2    Austin   2940
Dec    Prod2    Dallas   975
Dec    Prod2    Waco     5748
Dec    Prod3    Abilene  6140
Dec    Prod3    Austin   211
Dec    Prod3    Dallas   1357
Dec    Prod3    Waco     1000

Queries:
1. Total Sales
2. Total Sales by Month
3. Total Sales by Month and Product Line
4. Total Sales by Month, Product Line, and City
5. Total Sales by City
…

Pre-Computing Aggregates, cont… 22
[Figure: aggregate grids with axes Oct/Nov/Dec and P1/P2/P3]

Total Sales (1 "fact", 0 "dimensions"):
    select sum(ordered_quantity) as "total"
    from order_line_t;

Total Sales by Month (1 "fact", 1 "dimension" with 3 values):
    select month(order_date) as "month",
           sum(ordered_quantity) as "total"
    from order_line_t ol, order_t o
    where ol.order_id = o.order_id
    group by month(order_date);

Total Sales by Month and Product (1 "fact", 2 "dimensions" each with 3 values):
    select month(order_date) as "month",
           p.product_line_id as "product",
           sum(ordered_quantity) as "total"
    from order_line_t ol, order_t o, product_t p
    where ol.order_id = o.order_id
      and ol.product_id = p.product_id
    group by month(order_date), p.product_line_id;

Pre-Computing Aggregates, cont… 23
[Figure: cube with axes Oct/Nov/Dec, P1/P2/P3, and AB/AU/DA/WA]

Total Sales by Month, Product, & City (1 "fact", 3 "dimensions" each with 3 values):
    select month(order_date) as "month",
           p.product_line_id as "product",
           c.city,
           sum(ordered_quantity) as "total"
    from order_line_t ol, order_t o, product_t p, customer_t c
    where ol.order_id = o.order_id
      and ol.product_id = p.product_id
      and o.customer_id = c.customer_id
    group by month(order_date), p.product_line_id, c.city;
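
The aggregate queries on these slides can be tried against a toy database. The following is a minimal sketch using Python's built-in sqlite3 module. The table names come from the slides, but the column lists and sample rows are assumptions made for illustration. SQLite has no month() function, so strftime('%m', …) stands in for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_t   (customer_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE product_t    (product_id INTEGER PRIMARY KEY, product_line_id TEXT);
CREATE TABLE order_t      (order_id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT);
CREATE TABLE order_line_t (order_id INTEGER, product_id INTEGER, ordered_quantity INTEGER);

-- Hypothetical sample rows, not taken from the slides.
INSERT INTO customer_t VALUES (1, 'Austin'), (2, 'Dallas');
INSERT INTO product_t  VALUES (10, 'P1'), (20, 'P2');
INSERT INTO order_t    VALUES (100, 1, '2006-10-05'), (101, 2, '2006-10-20'), (102, 1, '2006-11-02');
INSERT INTO order_line_t VALUES (100, 10, 5), (100, 20, 3), (101, 10, 7), (102, 20, 4);
""")

# 1 fact, 0 dimensions: a single grand total.
total = conn.execute("SELECT SUM(ordered_quantity) FROM order_line_t").fetchone()[0]

# 1 fact, 1 dimension: one total per month.
by_month = conn.execute("""
    SELECT strftime('%m', o.order_date) AS month, SUM(ol.ordered_quantity) AS total
    FROM order_line_t AS ol
    JOIN order_t AS o ON ol.order_id = o.order_id
    GROUP BY strftime('%m', o.order_date)
    ORDER BY 1
""").fetchall()

# 1 fact, 3 dimensions: one total per (month, product line, city) cell.
by_month_product_city = conn.execute("""
    SELECT strftime('%m', o.order_date) AS month,
           p.product_line_id AS product,
           c.city,
           SUM(ol.ordered_quantity) AS total
    FROM order_line_t AS ol
    JOIN order_t AS o    ON ol.order_id = o.order_id
    JOIN product_t AS p  ON ol.product_id = p.product_id
    JOIN customer_t AS c ON o.customer_id = c.customer_id
    GROUP BY strftime('%m', o.order_date), p.product_line_id, c.city
    ORDER BY 1, 2, 3
""").fetchall()

print(total, by_month, by_month_product_city)
```

Each added dimension means one more join and one more GROUP BY column on the OLTP schema, which is the cost that pre-computed aggregates are meant to avoid at query time.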

OLAP Review 24
- Short:
  - Class of applications or tools that support ad-hoc analysis of multidimensional data
- Longer:
  - "…technology that enables [users]… to gain insight into data through…fast, consistent, interactive access [to]…information that has been transformed…to reflect the real dimensionality of the enterprise…"
  - OLAP Council

OLAP Cubes 25
- Flexible, interactive information delivery to DW
- Multidimensional data representation and operations
  - Rollup
  - Drill-down
  - Slice/Dice
  - Pivot (or Rotate)
- Improves Reporting Performance
  - Pre-processed aggregates
  - Data In-memory
  - Index Structures
  - Bye Bye Locks! …
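
The four cube operations named on this slide can be illustrated on a tiny in-memory cube. This is only a sketch in plain Python, not how an OLAP engine such as SSAS implements them. The function names (rollup, slice_, dice, pivot) and the dictionary layout are assumptions; the five cell values are borrowed from the sales table on slide 21.

```python
from collections import defaultdict

DIMS = ("month", "product", "city")

# A few cells of a cube: (month, product, city) -> sales quantity.
cube = {
    ("Oct", "Prod1", "Austin"): 799,
    ("Oct", "Prod1", "Dallas"): 1356,
    ("Oct", "Prod2", "Austin"): 2967,
    ("Nov", "Prod1", "Austin"): 234,
    ("Nov", "Prod2", "Dallas"): 888,
}

def rollup(cells, keep):
    """Aggregate away every dimension not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for key, value in cells.items():
        out[tuple(key[i] for i in idx)] += value
    return dict(out)

def slice_(cells, dim, value):
    """Fix one dimension to a single value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cells.items() if k[i] == value}

def dice(cells, **allowed):
    """Restrict several dimensions to sets of allowed values."""
    return {
        k: v for k, v in cells.items()
        if all(k[DIMS.index(d)] in vals for d, vals in allowed.items())
    }

def pivot(cells, row_dim, col_dim):
    """Rotate into a rows-by-columns table, summing over the other dimensions."""
    table = defaultdict(dict)
    for (r, c), v in rollup(cells, (row_dim, col_dim)).items():
        table[r][c] = v
    return dict(table)

by_month = rollup(cube, ("month",))                    # roll up to month
by_month_product = rollup(cube, ("month", "product"))  # drill down one level
austin_only = slice_(cube, "city", "Austin")
october_two_cities = dice(cube, month={"Oct"}, city={"Austin", "Dallas"})
product_by_month = pivot(cube, "product", "month")
print(by_month, product_by_month)
```

Drill-down is simply a roll-up that keeps more dimensions, which is why both are served here by the same function.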

Dimensional Modeling 30
- Data Model
  - Logical view of a multi-dimensional cube
- Key structures and components
  - Fact table(s)
    - Key business process
    - Facts/Measurements/metrics
    - Foreign Keys
  - Dimension tables
    - Ways to view measures
    - Attributes
    - Often denormalized
    - Surrogate Key vs. Business Key
    - Hierarchies

Dimensional Model Example 31
[Figure labels: Fact Table (FACT), Dimension Tables (DIM), Foreign Keys, Attributes, Measures, Business Key (include it!), Surrogate Key, Hierarchy]

Dimensional Model Characteristics 32
[Table with columns: Dim Tables, Fact Tables]

Star Schema 33
- At least one fact table and (typically) two or more dimension tables
- Fact table has a direct relationship with each of the dimension tables
- "Single-table" dimensions
- Arrangement resembles a "star"
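
A star schema as described above can be sketched with one fact table and two single-table dimensions. The example uses Python's sqlite3 module; every table name, column name, and row below is hypothetical and chosen only to show the shape of the joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Two single-table (denormalized) dimensions.
CREATE TABLE dim_date    (date_sk INTEGER PRIMARY KEY, month TEXT, year INTEGER);
CREATE TABLE dim_product (product_sk INTEGER PRIMARY KEY, product_name TEXT, product_line TEXT);

-- The fact table holds foreign keys to each dimension plus the measure.
CREATE TABLE fact_sales (
    date_sk        INTEGER REFERENCES dim_date(date_sk),
    product_sk     INTEGER REFERENCES dim_product(product_sk),
    sales_quantity INTEGER
);

INSERT INTO dim_date    VALUES (1, 'Oct', 2006), (2, 'Nov', 2006);
INSERT INTO dim_product VALUES (1, 'Prod1', 'LineA'), (2, 'Prod2', 'LineA'), (3, 'Prod3', 'LineB');
INSERT INTO fact_sales  VALUES (1, 1, 10), (1, 2, 5), (1, 3, 2), (2, 1, 7), (2, 3, 1);
""")

# Every dimension is exactly one join away from the fact table.
star_rows = conn.execute("""
    SELECT d.month, p.product_line, SUM(f.sales_quantity) AS total
    FROM fact_sales AS f
    JOIN dim_date    AS d ON f.date_sk = d.date_sk
    JOIN dim_product AS p ON f.product_sk = p.product_sk
    GROUP BY d.month, p.product_line
    ORDER BY d.month, p.product_line
""").fetchall()
print(star_rows)
```

Note that product_line is repeated on every product row of dim_product; that redundancy is what "often denormalized" refers to.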

Star Schema Example 34

Snowflake Schema 35
- Fact table has a direct relationship with some dimension tables, and an indirect relationship with other(s)
- Multi-table dimensions
  - i.e., "normalized" dimensions
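
For contrast, here is a hedged sketch of a snowflaked product dimension, again using sqlite3. The product line is moved into its own table, so the fact table reaches it only through dim_product. All names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized (multi-table) dimension: product -> product line.
CREATE TABLE dim_product_line (product_line_sk INTEGER PRIMARY KEY, product_line_name TEXT);
CREATE TABLE dim_product (
    product_sk      INTEGER PRIMARY KEY,
    product_name    TEXT,
    product_line_sk INTEGER REFERENCES dim_product_line(product_line_sk)
);
CREATE TABLE fact_sales (
    product_sk     INTEGER REFERENCES dim_product(product_sk),
    sales_quantity INTEGER
);

INSERT INTO dim_product_line VALUES (1, 'LineA'), (2, 'LineB');
INSERT INTO dim_product VALUES (1, 'Prod1', 1), (2, 'Prod2', 1), (3, 'Prod3', 2);
INSERT INTO fact_sales  VALUES (1, 10), (2, 5), (3, 2), (1, 7);
""")

# The product line is two joins away from the fact table (an indirect relationship).
snowflake_rows = conn.execute("""
    SELECT pl.product_line_name, SUM(f.sales_quantity) AS total
    FROM fact_sales AS f
    JOIN dim_product      AS p  ON f.product_sk = p.product_sk
    JOIN dim_product_line AS pl ON p.product_line_sk = pl.product_line_sk
    GROUP BY pl.product_line_name
    ORDER BY pl.product_line_name
""").fetchall()
print(snowflake_rows)
```

The line name is now stored once per line rather than once per product, at the price of an extra join in every query that groups by it.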

Snowflake Example 36

Comparison of Schemas 37
- Star
  - The much-preferred approach
  - Adv:
    - Faster load/query/analysis performance
    - Potentially more intuitive to users
- Snowflake
  - Adv:
    - Potentially faster setup
    - Avoid data redundancy
    - Reduces size of dimension table
    - Ease of maintaining

Common Dims, Facts, Measures 38
- Dims
- Facts
- Measures

In-Class Example: Newspaper Dim Model 39

Additional Modeling Concepts 40
- Surrogate Keys
- Attribute Hierarchies
- Time Dimensions
- Junk Dimensions
- Degenerate Dimensions
- Slowly-Changing Dimensions

Surrogate Keys 41
- Problem:
  - Potential for PK to change in source systems
    - e.g., PKs with built-in meaning
  - Data spread across multiple systems
    - Do PKs exist?
    - Are PKs consistent?
    - Do PKs mean the same thing?
- Surrogate Key
  - Newly-generated PK for dimension rows in DW
  - System-generated sequence numbers
  - Mapped to source/application key(s)
  - Fact rows reference SKs
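
The mapping from source keys to surrogate keys can be sketched as a lookup-or-create step during loading. This is one possible approach under stated assumptions: the source_system and business_key columns, the helper lookup_or_create_sk, and the sample systems "CRM" and "Billing" are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_sk   INTEGER PRIMARY KEY AUTOINCREMENT,  -- system-generated surrogate key
    source_system TEXT NOT NULL,                      -- where the business key came from
    business_key  TEXT NOT NULL,                      -- PK as it appears in the source
    customer_name TEXT,
    UNIQUE (source_system, business_key)
);
CREATE TABLE fact_sales (
    customer_sk    INTEGER REFERENCES dim_customer(customer_sk),
    sales_quantity INTEGER
);
""")

def lookup_or_create_sk(source_system, business_key, customer_name):
    """Return the surrogate key for a source key, creating the dimension row if needed."""
    row = conn.execute(
        "SELECT customer_sk FROM dim_customer WHERE source_system = ? AND business_key = ?",
        (source_system, business_key),
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO dim_customer (source_system, business_key, customer_name) VALUES (?, ?, ?)",
        (source_system, business_key, customer_name),
    )
    return cur.lastrowid

# The same business key in two systems refers to two different customers.
sk_crm = lookup_or_create_sk("CRM", "1001", "Miles Harris")
sk_billing = lookup_or_create_sk("Billing", "1001", "Acme Supplies")
sk_repeat = lookup_or_create_sk("CRM", "1001", "Miles Harris")

# Fact rows reference the surrogate key, never the source key.
conn.execute("INSERT INTO fact_sales VALUES (?, ?)", (sk_crm, 5))
dim_row_count = conn.execute("SELECT COUNT(*) FROM dim_customer").fetchone()[0]
print(sk_crm, sk_billing, sk_repeat, dim_row_count)
```

Keeping the business key on the dimension row is what lets later loads find the existing surrogate key again.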

Surrogate Keys Example 42

Attribute Hierarchies 43
- 1:M relationships between attributes
- Supports user navigation
  - drill-downs, drill-ups
- Improves performance
  - Assists SSAS in aggregation selection
  - Storage improvement

Attribute Hierarchy Examples 44
- State > City
- Year > Month
- Year > Semester

Date / Time Dimension 45
- Common feature of every data warehouse
- Minimum attributes:
  - Date key (e.g., 12345)
  - Date name (e.g., Monday, January …)
- Common additional attributes
  - Month, Year, Quarter, …
  - Holiday Name, …
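
Because a date dimension is fully predictable, it is usually generated rather than extracted. Below is a minimal sketch in plain Python. The yyyymmdd integer key and the exact date-name format are assumptions (the slide does not fix either), and holiday names are left out because they need an external calendar.

```python
from datetime import date, timedelta

def build_date_dimension(start, end):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "date_key": int(day.strftime("%Y%m%d")),     # assumed key format: yyyymmdd
            "date_name": day.strftime("%A, %B %d, %Y"),  # e.g. weekday, month, day, year
            "month": day.month,
            "quarter": (day.month - 1) // 3 + 1,
            "year": day.year,
        })
        day += timedelta(days=1)
    return rows

date_dim = build_date_dimension(date(2007, 1, 1), date(2007, 12, 31))
print(len(date_dim), date_dim[0])
```

The resulting rows can be bulk-inserted into a dimension table once, covering as many years as the warehouse needs.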

Time Dimension Example 46

Junk Dimensions 47
- Stores one or more "lookup" codes, flags, indicators that describe or categorize transactions/events
- Usually low cardinality
- May include all valid combinations of codes OR valid combinations that exist

Junk Dimension Example 48

Enrollment_Status_ID_SK  Registration_Status  Permit_Issued  Class_Fee_Status
1                        Wait List            Y              Paid
2                        Wait List            Y              Unpaid
3                        Wait List            N              Paid
4                        Wait List            N              Unpaid
5                        Confirmed            Y              Paid
6                        Confirmed            Y              Unpaid
7                        Confirmed            N              Paid
8                        Confirmed            N              Unpaid
9                        Awaiting Approval    Y              Paid
10                       Awaiting Approval    Y              Unpaid
11                       Awaiting Approval    N              Paid
12                       Awaiting Approval    N              Unpaid
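
The "all valid combinations" variant of a junk dimension is a cross product of the individual code lists. The sketch below rebuilds the twelve rows of the example table with itertools.product; the variable names are assumptions, while the code values come from the slide.

```python
from itertools import product

registration_status = ["Wait List", "Confirmed", "Awaiting Approval"]
permit_issued = ["Y", "N"]
class_fee_status = ["Paid", "Unpaid"]

# One row per combination, numbered from 1 to serve as the surrogate key.
junk_rows = [
    (sk, reg, permit, fee)
    for sk, (reg, permit, fee) in enumerate(
        product(registration_status, permit_issued, class_fee_status), start=1
    )
]

# During fact loading, the three codes are swapped for the single surrogate key.
sk_lookup = {row[1:]: row[0] for row in junk_rows}

for row in junk_rows:
    print(row)
```

With 3 x 2 x 2 values the table has 12 rows, so the fact table carries one small key instead of three separate code columns.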

Degenerate Dimensions 49
- An attribute (dimension) stored in the fact table
- Typically a high-cardinality attribute
- Attribute does NOT link to a dimension table
- Often used for drill-downs and/or data mining (e.g., Market Basket Analysis)
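
An order number is a typical degenerate dimension: it sits in the fact table with no dimension table behind it. The sketch below, using sqlite3 with hypothetical table, column, and order-number values, shows the two uses the slide mentions: drilling down to the lines of one order, and a simple market-basket count of products bought together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (
    order_number   TEXT,     -- degenerate dimension: no dim_order table exists
    product_sk     INTEGER,
    sales_quantity INTEGER
);
INSERT INTO fact_sales VALUES
    ('SO-1', 1, 2), ('SO-1', 2, 1),
    ('SO-2', 1, 1), ('SO-2', 2, 3), ('SO-2', 3, 1),
    ('SO-3', 3, 4);
""")

# Drill-down: all fact rows that belong to one order.
order_lines = conn.execute(
    "SELECT product_sk, sales_quantity FROM fact_sales "
    "WHERE order_number = ? ORDER BY product_sk",
    ("SO-2",),
).fetchall()

# Market basket: how often two products appear in the same order.
pair_counts = conn.execute("""
    SELECT a.product_sk, b.product_sk, COUNT(*) AS orders_together
    FROM fact_sales AS a
    JOIN fact_sales AS b
      ON a.order_number = b.order_number
     AND a.product_sk < b.product_sk
    GROUP BY a.product_sk, b.product_sk
    ORDER BY a.product_sk, b.product_sk
""").fetchall()
print(order_lines, pair_counts)
```

The self-join on order_number works only because the order number was kept in the fact table.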

Degenerate Dimension Example 50

Slowly-Changing Dimensions 51
- How to handle a change to a value in a dimension record:
  - Type 0: Do nothing
  - Type 1: Overwrite the record
  - Type 2: Retain all history (add new rows)
  - Type 3: Retain some history (add new columns)
- Impacts ETL

Type 0 (Fixed Attribute)

Field       DimCustomer Table  Source Extract
CustomerSK  10
CustomerID
LastName    Harris             Harris
FirstName   Miles              Miles
Gender      M                  F

Update: Update Ignored or Failure
© 2006 Microsoft Corporation.

Type 1 (Changing Attribute)

Field         DimCustomer Table       Source Extract  Updated DimCustomer Table
CustomerSK    10                                      10
CustomerID
LastName      Harris                  Harris          Harris
FirstName     Miles                   Miles           Miles
AddressLine1  5363 Blackshire Street  123 Main St.    123 Main St.
ZipCode                               54276           54276

Simple UPDATE statement applied:
    UPDATE DimCustomer
    SET AddressLine1 = '123 Main St', ZipCode = '54276'
    WHERE CustomerID =
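
A Type 1 change can be tried end to end with sqlite3. In this sketch the CustomerID value 'C-1001' is a made-up placeholder (the slide's value is not shown), and the starting zip code 54271 is taken from the Type 2 slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE DimCustomer (
        CustomerSK   INTEGER PRIMARY KEY,
        CustomerID   TEXT,
        LastName     TEXT,
        FirstName    TEXT,
        AddressLine1 TEXT,
        ZipCode      TEXT
    )
""")
conn.execute(
    "INSERT INTO DimCustomer VALUES (10, 'C-1001', 'Harris', 'Miles', "
    "'5363 Blackshire Street', '54271')"
)

def apply_type1(customer_id, address, zip_code):
    """Overwrite the attributes in place; the previous values are lost."""
    conn.execute(
        "UPDATE DimCustomer SET AddressLine1 = ?, ZipCode = ? WHERE CustomerID = ?",
        (address, zip_code, customer_id),
    )

apply_type1("C-1001", "123 Main St.", "54276")
type1_rows = conn.execute(
    "SELECT CustomerSK, AddressLine1, ZipCode FROM DimCustomer"
).fetchall()
print(type1_rows)
```

The surrogate key stays at 10, so existing fact rows now report against the new address as if it had always been in effect.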

Type 2 (Changing Attribute)

Field         DimCustomer Table       Customer Source Extract
CustomerSK    10
CustomerID
LastName      Harris                  Harris
FirstName     Miles                   Miles
AddressLine1  5363 Blackshire Street  123 Main St.
ZipCode       54271                   54276
StartDate     1/1/2007
EndDate       NULL

Updated DimCustomer Table:

Field         Row 1                   Row 2
CustomerSK    10                      108
CustomerID
LastName      Harris                  Harris
FirstName     Miles                   Miles
AddressLine1  5363 Blackshire Street  123 Main St.
ZipCode
StartDate     1/1/2007                2/18/2007
EndDate       2/18/2007               NULL

Simple UPDATE statement applied:
    UPDATE DimCustomer
    SET EndDate = '2/18/2007'
    WHERE CustomerID =

Then INSERT statement applied:
    INSERT INTO DimCustomer (CustomerID, LastName, Firstname…)
    VALUES ( , 'Harris', 'Miles', '123 Main St', '54276', '2/18/2007', NULL)
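
The Type 2 update-then-insert sequence can be sketched with sqlite3 as follows. Several details are assumptions rather than part of the slide: the CustomerID 'C-1001' is a placeholder, dates are written in ISO format, and the UPDATE also filters on EndDate IS NULL so that only the current row is expired if the customer changes more than once. The new surrogate key is whatever SQLite assigns, not the 108 shown on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE DimCustomer (
        CustomerSK   INTEGER PRIMARY KEY,
        CustomerID   TEXT,
        LastName     TEXT,
        FirstName    TEXT,
        AddressLine1 TEXT,
        ZipCode      TEXT,
        StartDate    TEXT,
        EndDate      TEXT
    )
""")
conn.execute(
    "INSERT INTO DimCustomer VALUES (10, 'C-1001', 'Harris', 'Miles', "
    "'5363 Blackshire Street', '54271', '2007-01-01', NULL)"
)

def apply_type2(customer_id, address, zip_code, change_date):
    """Expire the current row and add a new current row; returns the new surrogate key.

    Assumes the customer already has exactly one current row.
    """
    last_name, first_name = conn.execute(
        "SELECT LastName, FirstName FROM DimCustomer "
        "WHERE CustomerID = ? AND EndDate IS NULL",
        (customer_id,),
    ).fetchone()
    # Step 1: close the validity period of the current row.
    conn.execute(
        "UPDATE DimCustomer SET EndDate = ? WHERE CustomerID = ? AND EndDate IS NULL",
        (change_date, customer_id),
    )
    # Step 2: insert the new version with an open-ended validity period.
    cur = conn.execute(
        "INSERT INTO DimCustomer "
        "(CustomerID, LastName, FirstName, AddressLine1, ZipCode, StartDate, EndDate) "
        "VALUES (?, ?, ?, ?, ?, ?, NULL)",
        (customer_id, last_name, first_name, address, zip_code, change_date),
    )
    return cur.lastrowid

new_sk = apply_type2("C-1001", "123 Main St.", "54276", "2007-02-18")
history = conn.execute(
    "SELECT CustomerSK, AddressLine1, StartDate, EndDate "
    "FROM DimCustomer ORDER BY CustomerSK"
).fetchall()
print(new_sk, history)
```

Fact rows loaded before the change keep pointing at surrogate key 10, while later facts use the new key, which is how the full history is retained.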

Type 3 (Changing Attribute)

Field                 DimCustomer Table       Customer Source Extract  Updated DimCustomer Table
CustomerSK            10                                               10
CustomerID
LastName              Harris                  Harris                   Harris
FirstName             Miles                   Miles                    Miles
AddressLine1          5363 Blackshire Street  123 Main St.             5363 Blackshire Street
ZipCode               54271                   54276                    54271
StartDate             1/1/2007
EndDate               NULL
Updated AddressLine1                                                   123 Main St.
Updated ZipCode                                                        54276

Simple UPDATE statement applied:
    UPDATE DimCustomer
    SET UpdatedAddressLine1 = '123 Main St', UpdatedZipCode = '54276'
    WHERE CustomerID =
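
A Type 3 change keeps the original value and writes the new one into a dedicated column. The sqlite3 sketch below follows the column names in the slide's UPDATE statement; the CustomerID 'C-1001' is a made-up placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE DimCustomer (
        CustomerSK          INTEGER PRIMARY KEY,
        CustomerID          TEXT,
        LastName            TEXT,
        FirstName           TEXT,
        AddressLine1        TEXT,
        ZipCode             TEXT,
        UpdatedAddressLine1 TEXT,  -- extra columns hold the changed values
        UpdatedZipCode      TEXT
    )
""")
conn.execute(
    "INSERT INTO DimCustomer VALUES (10, 'C-1001', 'Harris', 'Miles', "
    "'5363 Blackshire Street', '54271', NULL, NULL)"
)

def apply_type3(customer_id, address, zip_code):
    """Store the new values beside the original ones in the same row."""
    conn.execute(
        "UPDATE DimCustomer SET UpdatedAddressLine1 = ?, UpdatedZipCode = ? "
        "WHERE CustomerID = ?",
        (address, zip_code, customer_id),
    )

apply_type3("C-1001", "123 Main St.", "54276")
type3_row = conn.execute(
    "SELECT CustomerSK, AddressLine1, ZipCode, UpdatedAddressLine1, UpdatedZipCode "
    "FROM DimCustomer"
).fetchone()
print(type3_row)
```

Only one earlier value survives per attribute: a second address change would overwrite the updated columns, which is why this type retains only some history.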

DW Physical Design 57

Summary 58
- DW Requirements, Design
- OLTP vs Cube Approach
- Dimensional Model Basic Components
  - Facts
  - Measures
  - Dimensions
  - Attributes
  - Keys
    - Primary
    - Surrogate
    - Business
    - Foreign
- Schemas
- Hierarchies
- Slowly-Changing Dimensions
- Junk Dimensions
- Degenerate Dimensions