Dimensional Modeling
CS 543 – Data Warehousing (Sp 2007-2008), Asim, LUMS

From Requirements to Data Models

Logical Data Model
- Logical data design identifies all the data elements and the structures in which they are connected:
  - Data elements
  - Data structures
- Requirements gathering, and more specifically the information packages it produces, leads to the logical data design

Dimensional Modeling
- A logical data design technique for structuring the business dimensions and the metrics that are analyzed along these dimensions
- Dimensional modeling:
  - Is intuitive for business users
  - Has proven to be efficient for queries and analyses
- Information packages are the foundation of dimensional modeling

Fact Table

Dimension Tables

Desired Characteristics in the Model
- The model should provide the best possible data access
- The model should be query-centric
- The model must be optimized for queries and analyses
- The model must show that the dimension tables interact with the fact table
- It must be structured so that every dimension can interact equally with the fact table
- The model should allow drill down and roll up along dimension hierarchies
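Roll-up and drill-down can be sketched as re-aggregating the same fact rows at a coarser or finer level of a dimension hierarchy. This is a minimal illustration using Python's built-in `sqlite3`; the `date_dim` and `sales_fact` tables, their columns, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical date dimension (year -> month -> day hierarchy) and sales fact table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER, day INTEGER);
CREATE TABLE sales_fact (date_key INTEGER REFERENCES date_dim(date_key), amount REAL);
INSERT INTO date_dim VALUES (1, 2007, 12, 30), (2, 2007, 12, 31), (3, 2008, 1, 1), (4, 2008, 2, 1);
INSERT INTO sales_fact VALUES (1, 10.0), (2, 20.0), (3, 5.0), (3, 7.0), (4, 3.0);
""")

def sales_by(*levels):
    """Sum sales at the given hierarchy levels: fewer levels = rolled up, more = drilled down.

    `levels` must be trusted column names of date_dim; they are inserted into the SQL text.
    """
    cols = ", ".join(levels)
    sql = (f"SELECT {cols}, SUM(f.amount) FROM sales_fact f "
           f"JOIN date_dim d ON f.date_key = d.date_key "
           f"GROUP BY {cols} ORDER BY {cols}")
    return conn.execute(sql).fetchall()

print(sales_by("year"))           # roll up:    [(2007, 30.0), (2008, 15.0)]
print(sales_by("year", "month"))  # drill down: [(2007, 12, 30.0), (2008, 1, 12.0), (2008, 2, 3.0)]
```

Both calls read the same fact rows; only the set of hierarchy levels in the `GROUP BY` changes.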

The Star Schema

E-R vs. Dimensional Modeling (1)
- Entity-relationship modeling:
  - Removes data redundancy
  - Ensures data consistency
  - Expresses microscopic relationships
- Dimensional modeling:
  - Captures critical measures
  - Views them along dimensions
  - Is intuitive for business users

E-R vs. Dimensional Modeling (2)
- DM rules are more restrictive than E-R modeling rules
- DM is a simpler logical model
- E-R modeling has greater representational power because it supports a wider variety of constructs
- A dimensional model looks like a normalized conceptual E-R model, except that:
  - All relationships are mandatory many-to-one (M:1)
  - There is a single path between any two relations

Another Example: Retail Dimensions

Star Schema
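A star schema for the retail example might be declared roughly as follows. This is a sketch, not the slide's own diagram (which is not reproduced here): the three dimensions (date, product, store), every column name, and the composite primary key on the fact table are assumptions, written for Python's built-in `sqlite3`.

```python
import sqlite3

# Hypothetical retail star schema: one central fact table, one table per dimension.
SCHEMA = """
CREATE TABLE date_dim    (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT, quarter TEXT, year INTEGER);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, sku TEXT, description TEXT, brand TEXT, category TEXT);
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, store_code TEXT, city TEXT, region TEXT);
-- The fact table holds one foreign key per dimension plus the numeric measures.
CREATE TABLE sales_fact (
    date_key     INTEGER NOT NULL REFERENCES date_dim(date_key),
    product_key  INTEGER NOT NULL REFERENCES product_dim(product_key),
    store_key    INTEGER NOT NULL REFERENCES store_dim(store_key),
    quantity     INTEGER,
    sales_amount REAL,
    PRIMARY KEY (date_key, product_key, store_key)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Each dimension joins directly to the fact table through exactly one foreign key.
fact_fks = {row[2] for row in conn.execute("PRAGMA foreign_key_list(sales_fact)")}
print(sorted(fact_fks))  # ['date_dim', 'product_dim', 'store_dim']
```

The dimension tables are wide and descriptive, while the fact table stays narrow: keys plus a few measures.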

Querying Against a Star Schema
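A typical query against a star joins the fact table to the dimensions it needs, filters and groups on dimension attributes, and aggregates the fact measures. A minimal sketch with `sqlite3`; the cut-down tables, brand and region names, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE date_dim    (date_key INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE sales_fact  (date_key INTEGER, product_key INTEGER, store_key INTEGER, sales_amount REAL);
INSERT INTO product_dim VALUES (1, 'Acme'), (2, 'Zenith');
INSERT INTO store_dim   VALUES (1, 'North'), (2, 'South');
INSERT INTO date_dim    VALUES (1, 2007), (2, 2008);
INSERT INTO sales_fact  VALUES (2, 1, 1, 100.0), (2, 1, 2, 50.0), (2, 2, 1, 70.0), (1, 1, 1, 999.0);
""")

# Star join: constrain and group on dimension attributes, aggregate the fact measure.
STAR_QUERY = """
SELECT p.brand, s.region, SUM(f.sales_amount) AS total_sales
FROM sales_fact f
JOIN product_dim p ON f.product_key = p.product_key
JOIN store_dim   s ON f.store_key   = s.store_key
JOIN date_dim    d ON f.date_key    = d.date_key
WHERE d.year = ?
GROUP BY p.brand, s.region
ORDER BY p.brand, s.region
"""
rows = conn.execute(STAR_QUERY, (2008,)).fetchall()
print(rows)  # [('Acme', 'North', 100.0), ('Acme', 'South', 50.0), ('Zenith', 'North', 70.0)]
```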

Dimension Table Characteristics
- Dimension table key
- Large number of attributes (wide)
- Textual attributes
- Attributes not directly related to one another
- Flattened out, not normalized
- Ability to drill down and roll up
- Multiple hierarchies
- Fewer records than the fact table

Fact Table Characteristics
- Concatenated fact table key
- Grain or level of data identified
- Fully additive measures
- Semi-additive measures
- Large number of records
- Only a few attributes
- Sparsity of data
- Degenerate dimensions
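The difference between fully additive and semi-additive measures shows up as soon as you aggregate over time. The sketch below uses hypothetical daily rows with a sales amount (fully additive) and an inventory balance (semi-additive: it adds up across stores, not across days). Averaging across time is one common choice; taking the closing balance is another.

```python
# Hypothetical daily rows: (day, store, sales_amount, balance_on_hand)
rows = [
    ("d1", "S1", 100, 40),
    ("d1", "S2", 50, 10),
    ("d2", "S1", 80, 30),
    ("d2", "S2", 20, 25),
]

# Fully additive: sales can be summed across every dimension, including time.
total_sales = sum(r[2] for r in rows)

# Semi-additive: the balance can be summed across stores for a single day ...
def balance_on(day):
    return sum(r[3] for r in rows if r[0] == day)

# ... but summing it across days double-counts stock, so average the daily totals instead.
days = sorted({r[0] for r in rows})
avg_daily_balance = sum(balance_on(d) for d in days) / len(days)

print(total_sales)        # 250
print(balance_on("d2"))   # 55
print(avg_daily_balance)  # 52.5
```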

“Factless” Fact Table
- If the metric or unit of analysis is the occurrence or non-occurrence of an event, the fact table will contain only 1s or nulls
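A factless fact table can be sketched with a hypothetical student-attendance event: each row records only that a student attended a course on a date. Occurrence is counted with `COUNT(*)`; in this sketch, non-occurrence is found by looking for missing rows. Table names, columns, and data are assumptions; the `attended` column holding 1 mirrors the slide's description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student_dim (student_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE course_dim  (course_key INTEGER PRIMARY KEY, title TEXT);
-- Factless fact table: dimension keys only; the row itself says the event occurred.
CREATE TABLE attendance_fact (
    date_key INTEGER, student_key INTEGER, course_key INTEGER,
    attended INTEGER DEFAULT 1  -- always 1 when a row exists
);
INSERT INTO student_dim VALUES (1, 'Ali'), (2, 'Sara');
INSERT INTO course_dim  VALUES (10, 'DW'), (20, 'DB');
INSERT INTO attendance_fact (date_key, student_key, course_key)
VALUES (1, 1, 10), (1, 2, 10), (2, 1, 10), (1, 1, 20);
""")

# Occurrence: count attendance events per course.
per_course = conn.execute("""
    SELECT c.title, COUNT(*) FROM attendance_fact f
    JOIN course_dim c ON f.course_key = c.course_key
    GROUP BY c.title ORDER BY c.title""").fetchall()
print(per_course)  # [('DB', 1), ('DW', 3)]

# Non-occurrence: students with no attendance row for course 20.
never_attended_db = conn.execute("""
    SELECT s.name FROM student_dim s
    WHERE NOT EXISTS (SELECT 1 FROM attendance_fact f
                      WHERE f.student_key = s.student_key AND f.course_key = 20)
    ORDER BY s.name""").fetchall()
print(never_attended_db)  # [('Sara',)]
```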

Data Granularity (1)
- Actual events are tied to actual transactions (e.g., sales)
  - This corresponds to the lowest grain, or highest detail
- Accumulated events are the effect of accumulated transactions (e.g., inventory on hand)
  - This corresponds to a higher grain and less detail
- Allowable events represent the “ability” to perform a transaction (e.g., carried products, a.k.a. the planogram)
- Actual events are typically sparser than allowable events (e.g., a store carries more products than it sells each day)
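The relation between actual and accumulated events can be shown in a few lines of Python: a running total over hypothetical daily stock movements (actual events) yields inventory on hand (accumulated events). The second part compares hypothetical sets of carried and sold products to illustrate why actual events are sparser than allowable ones.

```python
from itertools import accumulate

# Actual events (lowest grain): hypothetical net stock movements for one product in one store.
movements = [("d1", +100), ("d2", -30), ("d3", -20), ("d4", +50)]

# Accumulated events (higher grain, less detail): inventory on hand at the end of each day.
days = [day for day, _ in movements]
on_hand = list(zip(days, accumulate(qty for _, qty in movements)))
print(on_hand)  # [('d1', 100), ('d2', 70), ('d3', 50), ('d4', 100)]

# Allowable vs. actual events: hypothetical products carried by a store vs. products sold today.
carried = {"P1", "P2", "P3", "P4"}
sold_today = {"P1", "P3"}
assert sold_today <= carried  # every actual sale is also an allowable event
density = len(sold_today) / len(carried)
print(density)  # 0.5
```

The accumulated series can be derived from the detailed movements, but not the other way round once several transactions fall on the same day.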

Data Granularity (2)
- Low-grain designs are easy to change (“graceful” change)
- Low-grain designs result in larger storage and maintenance costs

Keys (1)
- Be careful in picking and using operational system keys as keys for the dimension tables
  - Avoid built-in meanings in the primary keys of the dimension tables
  - Do not use operational system keys as primary keys of dimension tables
  - Use surrogate keys (system-generated keys)
- Keep a mapping between the surrogate keys and the operational system keys
  - Include the operational system primary key as an attribute in the dimension table
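Surrogate key assignment with a kept mapping back to the operational key might look like the sketch below. `SurrogateKeyMap`, the `cust_no` format (which deliberately embeds a year, the kind of built-in meaning the slide warns against), and the sample rows are all hypothetical.

```python
from itertools import count

class SurrogateKeyMap:
    """Hypothetical helper: maps operational (natural) keys to system-generated surrogate keys."""

    def __init__(self):
        self._next = count(1)
        self._by_natural = {}

    def key_for(self, natural_key):
        # Reuse the surrogate for a known operational key; otherwise generate the next one.
        if natural_key not in self._by_natural:
            self._by_natural[natural_key] = next(self._next)
        return self._by_natural[natural_key]

keys = SurrogateKeyMap()
customer_rows = [{"cust_no": "K-2007-0042", "name": "Ahmed"}, {"cust_no": "K-2008-0007", "name": "Nida"}]
customer_dim = [
    # The operational key is kept as an ordinary attribute, not as the primary key.
    {"customer_key": keys.key_for(r["cust_no"]), "cust_no": r["cust_no"], "name": r["name"]}
    for r in customer_rows
]
print(customer_dim[1])  # {'customer_key': 2, 'cust_no': 'K-2008-0007', 'name': 'Nida'}
```

This mapping is one-to-one. Under Type 2 changes, one operational key can map to several surrogate keys, so a fuller version would need to track row versions as well.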

Keys (2)
- Primary key options for fact tables:
  - A single compound primary key whose length is the total length of the keys of the dimension tables
    - Foreign keys need to be stored as additional attributes in the fact table
    - Increases the size of the fact table
  - A concatenated primary key that is the concatenation of all the primary keys of the dimension tables
    - No need to store the foreign keys separately
  - A generated primary key independent of the keys of the dimension tables
    - Foreign keys need to be stored as additional attributes
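The concatenated-key and generated-key options can be compared directly in `sqlite3` (the single compound key column is not shown). Table names and rows are hypothetical. The practical difference sketched here is that the concatenated key rejects a second row for the same key combination, while the generated key accepts it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Concatenated key: the dimension foreign keys together form the primary key.
CREATE TABLE sales_fact_concat (
    date_key INTEGER, product_key INTEGER, store_key INTEGER, sales_amount REAL,
    PRIMARY KEY (date_key, product_key, store_key)
);
-- Generated key: an independent key; the dimension foreign keys are stored as extra attributes.
CREATE TABLE sales_fact_generated (
    sales_id INTEGER PRIMARY KEY,  -- SQLite assigns this automatically
    date_key INTEGER, product_key INTEGER, store_key INTEGER, sales_amount REAL
);
""")

def try_insert(table, row):
    """Insert (date_key, product_key, store_key, sales_amount); return False on a key violation."""
    try:
        conn.execute(f"INSERT INTO {table} (date_key, product_key, store_key, sales_amount) "
                     "VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert("sales_fact_concat", (1, 1, 1, 9.5)))     # True
print(try_insert("sales_fact_concat", (1, 1, 1, 2.0)))     # False: same key combination
print(try_insert("sales_fact_generated", (1, 1, 1, 9.5)))  # True
print(try_insert("sales_fact_generated", (1, 1, 1, 2.0)))  # True: duplicate combination accepted
```

With a generated key, enforcing the grain (one row per date, product, and store) would need a separate unique constraint on the foreign key columns.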

Advantages of the Star Schema
- Easy for users to understand
- Optimizes navigation
- More suitable for query processing
- Supports star-join and star-index techniques

Updates
- Updates to the fact table:
  - Addition of rows
  - Changes in rows (adjustments in values)
  - Rarely, addition of attributes (a new fact or metric)
- Updates to the dimension tables:
  - Slow addition of rows
  - Slow addition of attributes
  - New dimensions

Updates to the Dimension Tables
- Most dimensions are generally constant over time
- Many dimensions, if not constant, change slowly over time
  - The key of the source record does not change
  - The description and other attributes change slowly over time
- In the source OLTP systems, the new values overwrite the old values
- Overwriting is not always the best option for dimension table attributes
- The way updates are made depends on the type of change

Type 1 Changes: Correction of Errors (1)
- Overwrite the attribute value in the dimension table row with the new value
- The old value of the attribute is not preserved
- No other changes are made in the dimension table row
- Neither the key of this row nor any other key value is affected
- This type is the easiest to implement
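A Type 1 change amounts to an in-place overwrite. The sketch below models a dimension as a Python dict keyed by surrogate key; the `customer_dim` contents and the misspelled city being corrected are hypothetical.

```python
# Hypothetical customer dimension keyed by surrogate key; the source had a typo in the city.
customer_dim = {
    1: {"cust_no": "C042", "name": "Ahmed", "city": "Lahroe"},
}

def scd_type1(dim, surrogate_key, attribute, new_value):
    """Type 1 change: overwrite the attribute in place.

    The old value is not preserved, no other attribute changes, and no key is touched.
    """
    dim[surrogate_key][attribute] = new_value

scd_type1(customer_dim, 1, "city", "Lahore")
print(customer_dim)  # {1: {'cust_no': 'C042', 'name': 'Ahmed', 'city': 'Lahore'}}
```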

Type 1 Changes (2)

Type 2 Changes: Preservation of History (1)
- Properties:
  - They usually relate to true changes in the source systems
  - There is a need to preserve history in the data warehouse
  - This type of change partitions the history in the data warehouse
  - Every change for the same attribute must be preserved
- Approach:
  - Add a new dimension table row with the new value of the changed attribute
  - An effective date field may be added to the dimension table
  - There are no changes to the original row of the dimension table
  - The new row is inserted with a new surrogate key
  - The key of the original row is not affected
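A Type 2 change can be sketched as appending a copy of the current row with the new attribute value, a new surrogate key, and an effective date, while leaving the original row untouched. The list-of-dicts dimension, the `effective_date` column, and the convention that the latest effective date marks the current version are assumptions for illustration.

```python
from datetime import date

# Hypothetical customer dimension: one row per version of a customer.
customer_dim = [
    {"customer_key": 1, "cust_no": "C042", "city": "Lahore", "effective_date": date(2005, 1, 1)},
]

def current_row(dim, cust_no):
    """Assumed convention: the version with the latest effective date is the current one."""
    return max((r for r in dim if r["cust_no"] == cust_no), key=lambda r: r["effective_date"])

def scd_type2(dim, cust_no, attribute, new_value, effective):
    """Type 2 change: insert a new row with a new surrogate key; the original row is unchanged."""
    new_row = dict(current_row(dim, cust_no))  # copy, so the original row is not modified
    new_row.update({
        "customer_key": max(r["customer_key"] for r in dim) + 1,  # new surrogate key
        attribute: new_value,
        "effective_date": effective,
    })
    dim.append(new_row)
    return new_row["customer_key"]

new_key = scd_type2(customer_dim, "C042", "city", "Karachi", date(2008, 3, 1))
print(new_key)                                    # 2
print(current_row(customer_dim, "C042")["city"])  # Karachi
```

In this sketch, facts loaded before the change keep referencing key 1 and later facts reference key 2, which is how the history gets partitioned.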

Type 2 Changes: Preservation of History (2)

Type 3 Changes: Tentative Soft Revisions (1)
- Properties:
  - They usually apply to “soft” or tentative changes in the source systems
  - There is a need to keep track of history with the old and new values of the changed attribute
  - They are used to compare performance across the transition
  - They provide the ability to track forward and backward
- Approach:
  - Add an “old” field in the dimension table for the affected attribute
  - Push down the existing value of the attribute from the “current” field to the “old” field
  - Keep the new value of the attribute in the “current” field
  - Optionally, add a “current” effective date field for the attribute
  - The key of the row is not affected
  - No new dimension row is needed
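A Type 3 change can be sketched as shifting a value between two columns of the same row. The salesperson dimension, the `current_territory` / `old_territory` naming, and the single `current_effective_date` field are hypothetical choices for illustration.

```python
from datetime import date

# Hypothetical salesperson dimension with "current" and "old" territory fields.
salesperson_dim = {
    5: {"emp_no": "E-105", "current_territory": "North", "old_territory": None,
        "current_effective_date": date(2006, 1, 1)},
}

def scd_type3(dim, key, attribute, new_value, effective):
    """Type 3 change: push the current value down to the "old" field, store the new one.

    The row keeps its key and no new row is added, so only one prior value survives.
    """
    row = dim[key]
    row[f"old_{attribute}"] = row[f"current_{attribute}"]
    row[f"current_{attribute}"] = new_value
    row["current_effective_date"] = effective

scd_type3(salesperson_dim, 5, "territory", "East", date(2008, 1, 1))
print(salesperson_dim[5]["old_territory"], "->", salesperson_dim[5]["current_territory"])  # North -> East
```

Because a second change would overwrite `old_territory` again, this approach suits tentative revisions where only the most recent transition needs to be compared.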

Type 3 Changes: Tentative Soft Revisions (2)

Large Dimensions (1)
- What makes a dimension large?
  - Large number of rows (deep)
  - Large number of attributes (wide)
- Dimensions can become large because of frequent changes (what type?) and the need for many attributes for analysis
- Consequence:
  - Access becomes slow and inefficient
- Solution:
  - Proper logical and physical design
  - Indexes
  - Optimized algorithms

Large Dimensions (2)

Multiple Hierarchies

Junk Dimensions
- Dimensions for a DW are typically taken from operational source systems
- However, source systems contain many additional attributes (such as flags, text, descriptions, etc.) that may not be useful in a DW
- What are the options?
  - Discard all such fields from the source systems
  - Include them in the fact table
  - Include all of them as dimensions
  - Select some and add them to a single “junk” dimension table
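One way to build a junk dimension is to enumerate every combination of the selected low-cardinality flags and give each combination its own surrogate key; the fact row then stores a single junk key instead of several flag columns. The flag names and values below are hypothetical.

```python
from itertools import product

# Hypothetical low-cardinality flags carried on source transactions.
FLAGS = {
    "payment_type": ["cash", "card"],
    "gift_wrap": ["Y", "N"],
    "promo_applied": ["Y", "N"],
}

# Junk dimension: one row per combination of flag values, each with its own surrogate key.
junk_dim = {combo: key for key, combo in enumerate(product(*FLAGS.values()), start=1)}

def junk_key(txn):
    """Return the junk-dimension key the fact row should store for this transaction."""
    return junk_dim[tuple(txn[name] for name in FLAGS)]

print(len(junk_dim))  # 8 rows = 2 * 2 * 2 combinations
print(junk_key({"payment_type": "card", "gift_wrap": "N", "promo_applied": "Y"}))  # 7
```

Pre-building all combinations works while the product of the cardinalities stays small; with many or high-cardinality flags, only the combinations that actually occur would be loaded.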

The Snowflake Schema
- Snowflaking is a method of normalizing the tables in a star schema
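As a sketch of snowflaking, the hypothetical product dimension below is split so that category attributes live in their own table and the product dimension keeps only a foreign key to it. The query shows the cost: reaching `department` now takes one more join. Table names, columns, and rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star version: category attributes repeated on every product row.
CREATE TABLE product_dim_star (product_key INTEGER PRIMARY KEY, description TEXT,
                               category_name TEXT, department TEXT);
-- Snowflaked version: category attributes moved to their own table.
CREATE TABLE category_dim (category_key INTEGER PRIMARY KEY, category_name TEXT, department TEXT);
CREATE TABLE product_dim_snow (product_key INTEGER PRIMARY KEY, description TEXT,
                               category_key INTEGER REFERENCES category_dim(category_key));
INSERT INTO product_dim_star VALUES (1, 'Cola 1L', 'Soft drinks', 'Beverages'),
                                    (2, 'Lemonade 1L', 'Soft drinks', 'Beverages');
INSERT INTO category_dim VALUES (1, 'Soft drinks', 'Beverages');
INSERT INTO product_dim_snow VALUES (1, 'Cola 1L', 1), (2, 'Lemonade 1L', 1);
""")

# The snowflaked design needs an extra join through category_dim to reach the department.
snow = conn.execute("""
    SELECT p.description, c.department
    FROM product_dim_snow p JOIN category_dim c ON p.category_key = c.category_key
    ORDER BY p.product_key""").fetchall()
star = conn.execute("SELECT description, department FROM product_dim_star ORDER BY product_key").fetchall()
print(snow == star)  # True: same answer, different structure
```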