MIS2502: Data Analytics Dimensional Data Modeling

Slides:



Advertisements
Similar presentations
Dimensional Modeling.
Advertisements

CHAPTER OBJECTIVE: NORMALIZATION THE SNOWFLAKE SCHEMA.
BY LECTURER/ AISHA DAWOOD DW Lab # 2. LAB EXERCISE #1 Oracle Data Warehousing Goal: Develop an application to implement defining subject area, design.
BY LECTURER/ AISHA DAWOOD DW Lab # 4 Overview of Extraction, Transformation, and Loading.
Alternative Database topology: The star schema
Jennifer Widom On-Line Analytical Processing (OLAP) Introduction.
Dimensional Modeling Business Intelligence Solutions.
Introduction to Data Warehouse and Data Mining MIS 2502 Data Analytics
MIS 451 Building Business Intelligence Systems Logical Design (5) – Aggregate.
Dimensional Modeling – Part 2
Exploiting the DW data DW is a platform for creating a wide array of reports It solves data feed problems, but does not lead to specific decision support.
Advanced Querying OLAP Part 2. Context OLAP systems for supporting decision making. Components: –Dimensions with hierarchies, –Measures, –Aggregation.
Business Intelligence. On-Line Analytical Processing (OLAP) Tools The use of a set of graphical tools that provides users with multidimensional views.
MIS 451 Building Business Intelligence Systems Logical Design (3) – Design Multiple-fact Dimensional Model.
CSE6011 Warehouse Models & Operators  Data Models  relations  stars & snowflakes  cubes  Operators  slice & dice  roll-up, drill down  pivoting.
Chapter 13 The Data Warehouse
Data Warehousing ISYS 650. What is a data warehouse? A data warehouse is a subject-oriented, integrated, nonvolatile, time-variant collection of data.
Online Analytical Processing (OLAP) Hweichao Lu CS157B-02 Spring 2007.
MIS2502: Data Analytics Relational Data Modeling
ITEC 3220A Using and Designing Database Systems
MIS2502: Data Analytics Extract, Transform, Load
On-Line Analytic Processing Chetan Meshram Class Id:221.
Cube Intro. Decision Making Effective decision making Goal: Choice that moves an organization closer to an agreed-on set of goals in a timely manner Goal:
Program Pelatihan Tenaga Infromasi dan Informatika Sistem Informasi Kesehatan Ari Cahyono.
Presented By: Muhammad Rizvi Raghuram Vempali Surekha Vemuri.
THE INFORMATION ARCHITECTURE OF THE ORGANIZATION MIS2502 Data Analytics.
DIMENSIONAL MODELLING. Overview Clearly understand how the requirements definition determines data design Introduce dimensional modeling and contrast.
Chapter 1 Adamson & Venerable Spring Dimensional Modeling Dimensional Model Basics Fact & Dimension Tables Star Schema Granularity Facts and Measures.
The Data Warehouse “A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of “all” an organisation’s data in support.
Module 1: Introduction to Data Warehousing and OLAP
MIS2502: Data Analytics The Information Architecture of an Organization.
Decision Support and Date Warehouse Jingyi Lu. Outline Decision Support System OLAP vs. OLTP What is Date Warehouse? Dimensional Modeling Extract, Transform,
DIMENSIONAL MODELING MIS2502 Data Analytics. So we know… Relational databases are good for storing transactional data But bad for analytical data What.
DW-2: Designing a Data Warehousing System 용 환승 이화여자대학교
Fox MIS Spring 2011 Data Warehouse Week 8 Introduction of Data Warehouse Multidimensional Analysis: OLAP.
UNIT-II Principles of dimensional modeling
Chapter 5 DATA WAREHOUSING Study Sections 5.2, 5.3, 5.5, Pages: & Snowflake schema.
1 On-Line Analytic Processing Warehousing Data Cubes.
Data Warehousing Multidimensional Analysis
MIS2502: Data Analytics Relational Data Modeling
Pooja Sharma Shanti Ragathi Vaishnavi Kasala. BUSINESS BACKGROUND Lowe's started as a single hardware store in North Carolina in 1946 and since then has.
Copyright© 2014, Sira Yongchareon Department of Computing, Faculty of Creative Industries and Business Lecturer : Dr. Sira Yongchareon ISCG 6425 Data Warehousing.
MIS2502: Data Analytics The Things You Can Do With Data Amy Lavin
MIS2502: Data Analytics Relational Data Modeling David Schuff
Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke1 Data Warehousing and Decision Support.
What Do You Do With Data? Gather Store Retrieve Interpret.
ITEC 3220M Using and Designing Database Systems Instructor: Prof. Z.Yang Course Website: c3220m.htm Office: TEL.
DIMENSIONAL MODELING MIS2502 Data Analytics. So we know… Relational databases are good for storing transactional data But bad for analytical data What.
INCREMENTAL AGGREGATION After you create a session that includes an Aggregator transformation, you can enable the session option, Incremental Aggregation.
The Need for Data Analysis 2 Managers track daily transactions to evaluate how the business is performing Strategies should be developed to meet organizational.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Data Warehousing and Decision Support Chapter 25.
Introduction to OLAP and Data Warehouse Assoc. Professor Bela Stantic September 2014 Database Systems.
Pindaro Demertzoglou Data Resource Management – MGMT 4170 Lally School of Management Rensselaer Polytechnic Institute.
Defining Data Warehouse Structures Data Warehouse Data Access End User Data Access Data Sources Staging Area Data Marts Data Extract, Transform, and Load.
Jaclyn Hansberry MIS2502: Data Analytics The Things You Can Do With Data The Information Architecture of an Organization Jaclyn.
MIS2502: Data Analytics Relational Data Modeling
On-Line Analytic Processing
Data warehouse and OLAP
MIS2502: Data Analytics Dimensional Data Modeling
MIS2502: Data Analytics Dimensional Data Modeling
MIS2502: Data Analytics Dimensional Data Modeling
MIS2502: Data Analytics Dimensional Data Modeling
On-Line Analytical Processing (OLAP)
MIS2502: Data Analytics Relational Data Modeling
MIS2502: Data Analytics The Information Architecture of an Organization David Schuff
MIS2502: Data Analytics The Information Architecture of an Organization Acknowledgement: David Schuff.
MIS2502: Data Analytics Dimensional Data Modeling
MIS2502: Data Analytics Dimensional Data Modeling
Retail Sales is used to illustrate a first dimensional model
Data Warehousing.
Presentation transcript:

MIS2502: Data Analytics Dimensional Data Modeling

Where we are… Now we’re here… Data entry Transactional Database Data extraction Analytical Data Store Data analysis Stores real-time transactional data Stores historical transactional and summary data

What do we know so far? Why are relational databases good for storing transaction data? Why are they bad for analytical processing? What’s the solution?

Some terminology Data Warehouse Takes many forms Really is just a repository for data Data Mart More focused Specially designed for analysis Data Cube Organization of data as a “multidimensional matrix” Implementation of a Data Mart

How they all relate We’ll start here. The data in the operational database… …is put into a data warehouse… …which feeds the data mart… …and is analyzed as a cube. We’ll start here.

Why isn’t product price a measured fact? The Data Cube Product Core component of Online Analytical Processing and Multidimensional Data Analysis Made up of “facts” and “dimensions” Diet Coke Famous Amos M&Ms Doritos quantity & total price quantity & total price quantity & total price quantity & total price Ardmore, PA quantity & total price quantity & total price quantity & total price quantity & total price Temple Main Store quantity & total price quantity & total price quantity & total price quantity & total price Cherry Hill, NJ quantity & total price quantity & total price quantity & total price quantity & total price Mar. 2013 King of Prussia, PA Feb. 2013 Jan. 2013 Time Quantity sold and total price are measured facts. Why isn’t product price a measured fact?

A single summary record representing a business event (monthly sales). The Data Cube Product Diet Coke Famous Amos M&Ms Doritos The highlighted element represents all the M&Ms sold in Ardmore, PA in January, 2013 quantity & total price quantity & total price quantity & total price quantity & total price Ardmore, PA quantity & total price quantity & total price quantity & total price quantity & total price Temple Main A single summary record representing a business event (monthly sales). Store quantity & total price quantity & total price quantity & total price quantity & total price Cherry Hill, NJ quantity & total price quantity & total price quantity & total price quantity & total price Mar. 2013 King of Prussia, PA Feb. 2013 Jan. 2013 Time

This is called “slicing the data.” The Data Cube Product Diet Coke Famous Amos M&Ms Doritos The highlighted elements represent Famous Amos cookies sold on Temple’s Main campus from January to March, 2013 quantity & total price quantity & total price quantity & total price quantity & total price Ardmore, PA quantity & total price quantity & total price quantity & total price quantity & total price Temple Main Store quantity & total price quantity & total price quantity & total price quantity & total price This is called “slicing the data.” Cherry Hill, NJ quantity & total price quantity & total price quantity & total price quantity & total price Mar. 2013 King of Prussia, PA Feb. 2013 Jan. 2013 Time

The Data Cube Product Store Time Diet Coke Famous Amos M&Ms Doritos What do the orange highlighted elements represent? quantity & total price quantity & total price quantity & total price quantity & total price Ardmore, PA quantity & total price quantity & total price quantity & total price quantity & total price Temple Main Store quantity & total price quantity & total price quantity & total price quantity & total price What do the purple highlighted elements represent? Cherry Hill, NJ quantity & total price quantity & total price quantity & total price quantity & total price Mar. 2013 King of Prussia, PA Feb. 2013 Jan. 2013 Time

Could you have a data mart with five dimensions? Then why does our example (and most others) only have three?

Designing the Cube: The Star Schema Impractical to create cube directly from relational database Too many joins Required querying will be too slow So we transform data into a star schema Remove as many joins as possible Most are one-level “deep” Facilitates Ad-hoc reporting Creation of cubes Store Store_ID Store_Address Store_City Store_State Store_Type Dimension Sales Sales_ID Product_ID Store_ID Time_ID Quantity Sold Total Price Fact Product Product_ID Product_Name Product_Price Product_Weight Time Time_ID Day Month Year Dimension Dimension

A join to make the cube? Conceptually yes, but storing the join would create many, many, many rows! Sales ID Qty. Sold Total Price Prod. ID Prod. Name Prod. Price Prod. Weight Store ID Store Address Store City Store State Store Type Time ID Day Month Year 1000 1001 1002 Sales Fact Product Dimension Store Dimension Time Dimension

So summaries get stored in a “multidimensional matrix” Periodically summarize the data and store it in the cube Retrieve only the summary, not the raw data Much more efficient, but can’t be changed (non-volatile)

It adds up fast… 1000 products 300 stores 365 days =109,500,000 records per year!

Designing the Star Schema 1. Choose the business process 2. Identify the fact 3. Decide on the level of granularity 4. Identify the dimensions Kimball’s Four Step Process for Data Cube Design (Kimball et al., 2008)

Choose the business process What your data cube is “about” Determined by the questions you want to answer about your organization Question Business Process Who is my best customer? Sales What are my highest selling products? Which teachers have the best student performance? Standardized testing Which supplier is offering us the best deals? Purchasing Note that a “business process” is not always about business.

Try it for the “student performance” example. Identify the fact The data associated with the business event Keys Unique identifiers for each event For the event itself and the associated dimensions Associates a combination of the dimensions to a unique business event Example: Sales has Product_ID, Store_ID, and Time_ID Measured, numeric data Quantifiable information for each business event Does not describe any particular dimension Describes a particular combination of dimensional data Example: Sales has quantity_sold and total_price. Try it for the “student performance” example.

Decide on the level of granularity Level of detail for each event (row in the table) Will determine the data in the dimensions Example: Who is my best customer? The “event” is a sales transaction Choices for time: yearly, quarterly, monthly, daily Choices for store: store, city, state How would you select the right granularity?

Identify the dimensions The key elements of the process needed to answer the question (“fact”) Example: Sales transaction A “sale” is the fact Occurs for a particular product, store, and time Could this data mart tell you The best selling product? The best customer? Try it for the “student performance” example.

Data cube caveats The cube is “non volatile,” so you’re locked in Measured facts Dimensions Granularity So choose wisely! For example: You can’t track daily sales if “date” is monthly So why not include every single sale and do no aggregation? “In memory” analytics is changing all of this, but not quite yet…