Chapter 6: The Big Dimensions
Adv. DBMS & DW (CSC5301), Hachim Haddouti


The Big Dimensions
Examples of big dimensions: the product and customer dimensions.
The product dimension (the complete portfolio of what a company sells) is derived from the product master file.
Converting the “production product master file” into the “product dimension table” involves the following steps:
- remapping of the key to avoid duplication (considering time and reuse, e.g. of UPCs)
- remapping of the key for a shorter, more efficient join (i.e., the UPC is 12 digits long)
- generalization of the key for product descriptions that change over time
- generalization of the key for aggregate products (SKU code for brand?)
- addition of text to replace numeric codes or cryptic abbreviations (reports containing cryptic numbers and codes are useless)
- quality assurance of text strings: no trivial variations (cleaning up the master file)
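The key remapping and text clean-up steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `ProductKeyMap`, the function `clean_text`, the decode table `PACKAGE_CODES`, and all sample values are hypothetical and do not come from the slides.

```python
from itertools import count


class ProductKeyMap:
    """Hypothetical helper: maps production keys to short integer warehouse keys."""

    def __init__(self):
        self._next_key = count(1)
        self._assigned = {}  # (upc, first_used) -> warehouse key

    def key_for(self, upc, first_used):
        # A UPC reused for a different product at a later date gets a new key,
        # so the old and the new product never collide in the warehouse.
        lookup = (upc, first_used)
        if lookup not in self._assigned:
            self._assigned[lookup] = next(self._next_key)
        return self._assigned[lookup]


def clean_text(raw):
    """Collapse trivial variations in spacing and case of a descriptive string."""
    return " ".join(raw.split()).title()


# Made-up decode table: replaces a cryptic numeric code with readable text.
PACKAGE_CODES = {"01": "Bottle", "02": "Can"}

keys = ProductKeyMap()
first = keys.key_for("012345678905", "1994-03-01")
again = keys.key_for("012345678905", "1994-03-01")   # same product, same key
reused = keys.key_for("012345678905", "1998-09-15")  # UPC reused later
package = PACKAGE_CODES["02"]
brand = clean_text("  acme   COLA ")
```

The small integer key is what the fact table would join on; the 12-digit UPC stays in the dimension as an ordinary attribute.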

The Big Dimensions
DP: “Although the production product master file is the source of product identification, it must be transformed or augmented on a continuing basis in order to serve as the product dimension in the data warehouse. The primary steps needed are the generalization and/or replacement of the primary product key, and the completion and quality assurance of the descriptive attributes.”
At least 50 descriptive fields for large companies!
Note: facts in the fact table vary continuously over time, while dimension table attributes are virtually constant over time.

The merchandise hierarchy
- Many-to-one relationships in an ascending hierarchy
- The true meaning of drill down: show me more detail, i.e., add a row header to the report
- Multiple hierarchies (e.g. Sales and Finance hierarchies)
DP: “A typical dimension contains one or more natural hierarchies, together with other attributes that do not have a hierarchical relationship to any of the attributes in the dimension. Any of the attributes, whether or not they belong to a hierarchy, can freely be used in drilling down and drilling up.”
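Drilling down as "adding a row header" can be illustrated with a small grouping function. This is a minimal sketch: the sample rows, the attribute names (`department`, `brand`, `package_type`), and the function `report` are made up for illustration.

```python
from collections import defaultdict

# Made-up fact rows, already joined to their product dimension attributes.
sales = [
    {"department": "Bakery", "brand": "Acme", "package_type": "Box", "dollars": 100},
    {"department": "Bakery", "brand": "Best", "package_type": "Bag", "dollars": 50},
    {"department": "Frozen", "brand": "Acme", "package_type": "Box", "dollars": 70},
]


def report(rows, row_headers):
    """Sum dollars for each distinct combination of the given row headers."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[h] for h in row_headers)] += row["dollars"]
    return dict(totals)


by_department = report(sales, ["department"])
# Drill down along the hierarchy: add "brand" as a row header.
drill_brand = report(sales, ["department", "brand"])
# Drill down on a non-hierarchical attribute: equally valid.
drill_package = report(sales, ["department", "package_type"])
```

Drilling up is the reverse operation: removing a row header from the list.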

The Big Dimensions (cont.)
Resisting the urge to snowflake
- Separating a dimension by hierarchy (Figure 6.2 p)
- The threat to browsing performance
DP: “Do not snowflake your dimensions, even if they are large. If you do snowflake your dimensions, be prepared to live with poor browsing performance.”
→ Keep dimensions flat to preserve browsing performance.
Really big customer dimensions, e.g. 10M customers with 3 hierarchies:
- heavy use of demographic fields (age, gender, number of children, education level, behaviors, etc.)
- these are the only fields to be indexed, and new indexing techniques are needed

Example of star schema: Sales

Example of snowflake: Sales
Sales (fact table): TIME_ID, LOC_ID, CUS_ID, PROD_ID, Sales_qt, Sales_price, Sales_total
Location: LOC_ID, LOC_Desc, City_ID
City: City_ID, State_ID, City_name
State: State_ID, Region_ID, State_name
Region: Region_ID, Region_name
Relationships (each 1:M): Region to State, State to City, City to Location, Location to Sales.
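To make the browsing cost of the snowflake concrete, the sketch below builds the four location tables in an in-memory SQLite database and then flattens them into one denormalized dimension. The table name `Location_star` and all sample rows are assumptions made for illustration; the other table and column names follow the slide.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Region   (Region_ID INTEGER PRIMARY KEY, Region_name TEXT);
CREATE TABLE State    (State_ID INTEGER PRIMARY KEY,
                       Region_ID INTEGER REFERENCES Region, State_name TEXT);
CREATE TABLE City     (City_ID INTEGER PRIMARY KEY,
                       State_ID INTEGER REFERENCES State, City_name TEXT);
CREATE TABLE Location (LOC_ID INTEGER PRIMARY KEY, LOC_Desc TEXT,
                       City_ID INTEGER REFERENCES City);

-- Made-up sample rows.
INSERT INTO Region   VALUES (1, 'North'), (2, 'South');
INSERT INTO State    VALUES (10, 1, 'State A'), (20, 2, 'State B');
INSERT INTO City     VALUES (100, 10, 'City A'), (200, 20, 'City B');
INSERT INTO Location VALUES (1000, 'Store A', 100), (2000, 'Store B', 200);

-- Flatten the snowflake: one wide row per location (hypothetical table name).
CREATE TABLE Location_star AS
SELECT l.LOC_ID, l.LOC_Desc, c.City_name, s.State_name, r.Region_name
FROM Location l
JOIN City   c ON c.City_ID   = l.City_ID
JOIN State  s ON s.State_ID  = c.State_ID
JOIN Region r ON r.Region_ID = s.Region_ID;
""")

# Browsing the flat dimension needs no joins at all.
flat_rows = con.execute(
    "SELECT Region_name, City_name FROM Location_star ORDER BY LOC_ID"
).fetchall()
```

In the snowflaked form, the same browse query has to repeat the three joins every time; in the flat form they are paid once, at load time.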

Example of multi-fact tables
Sales (fact table): TIME_ID, LOC_ID, CUS_ID, PROD_ID, Sales_qt, Sales_price, Sales_total
Sales_City (fact table): TIME_ID, LOC_ID, CUS_ID, PROD_ID, City_ID, Sales_City_qt, Sales_city__price, Sales_city__total
Sales_Region (fact table): ….
Location: LOC_ID, LOC_Desc, City_ID
City: City_ID, State_ID, City_name
State: State_ID, Region_ID, State_name
Region: Region_ID, Region_name
All relationships in the diagram are one-to-many (1:M).

Demographic minidimensions
- Separate one or more sets of demographic attributes into minidimensions (see Figure 6.3, p. 99).
- A minidimension contains only the distinct combinations of demographic information, with values grouped into “bands” (such as age or income level).
DP: “The best approach for tracking changes in really huge dimensions is to break off one or more minidimensions from the dimension table, each consisting of small clumps of attributes that have been administered to have a limited number of values.”
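A minidimension of banded demographics can be sketched as follows. The band boundaries, labels, and the names `band`, `minidim`, and `demographics_key` are illustrative assumptions, not values from the slides or from Figure 6.3.

```python
from itertools import product

# Hypothetical bands as half-open intervals: low <= value < high.
AGE_BANDS = [(0, 21, "under 21"), (21, 41, "21 to 40"), (41, 200, "over 40")]
INCOME_BANDS = [(0, 30000, "low"), (30000, 80000, "medium"),
                (80000, float("inf"), "high")]
GENDERS = ["F", "M"]


def band(value, bands):
    """Return the label of the band that contains value."""
    for low, high, label in bands:
        if low <= value < high:
            return label
    raise ValueError(f"no band for {value!r}")


# The minidimension: one row (and one small key) per distinct combination.
combinations = product(
    [label for _, _, label in AGE_BANDS],
    [label for _, _, label in INCOME_BANDS],
    GENDERS,
)
minidim = {combo: key for key, combo in enumerate(combinations, start=1)}


def demographics_key(age, income, gender):
    """Look up the minidimension key for one customer's raw demographics."""
    return minidim[(band(age, AGE_BANDS), band(income, INCOME_BANDS), gender)]


key_a = demographics_key(35, 50000, "F")
key_b = demographics_key(30, 45000, "F")  # same bands, so same key
key_c = demographics_key(35, 50000, "M")
```

With 3 age bands, 3 income bands, and 2 genders the minidimension has only 18 rows, however many millions of customers point to it.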

Slowly changing dimensions
- Are the Customer and Product dimensions independent of the Time dimension?
- Changes in names, family status, product district/region
- How can these changes be handled without affecting the history? E.g. insurance
Three suggestions for slowly changing dimensions:
Type 1: overwrite/erase old values. No accurate tracking of history is needed; easy to implement; e.g. overwrite the Marital Status field.
Type 2: create a new record at the time of change, partitioning the history (old and new description). Hajj Boussalhame appears both as single and as married. If we constrain on Marital Status = Married, we will not see Hajj Boussalhame before he got married, so it is not possible to compare performance across the transition.
Type 3: add new “current” fields when there is a legitimate need to track both old and new states. “Original” and “current” values are kept; intermediate values are lost.
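The three responses above can be compared side by side on a one-row customer dimension. This is a simplified sketch: the row layout, the field names (`key`, `customer_id`, `original_...`, `current_...`), and the function names are assumptions for illustration only.

```python
def type1_overwrite(dim, customer_id, attr, value):
    """Type 1: overwrite in place; the old value is gone."""
    for row in dim:
        if row["customer_id"] == customer_id:
            row[attr] = value


def type2_new_record(dim, customer_id, attr, value):
    """Type 2: append a new row under a new key; old rows stay untouched."""
    latest = max((r for r in dim if r["customer_id"] == customer_id),
                 key=lambda r: r["key"])
    new_row = dict(latest)
    new_row[attr] = value
    new_row["key"] = max(r["key"] for r in dim) + 1
    dim.append(new_row)
    return new_row["key"]


def type3_current_field(dim, customer_id, attr, value):
    """Type 3: keep the original_ field, overwrite only the current_ field."""
    for row in dim:
        if row["customer_id"] == customer_id:
            row["current_" + attr] = value


def new_dim():
    return [{"key": 1, "customer_id": 7, "name": "Hajj Boussalhame",
             "marital_status": "Single"}]


t1 = new_dim()
type1_overwrite(t1, 7, "marital_status", "Married")

t2 = new_dim()
new_key = type2_new_record(t2, 7, "marital_status", "Married")
# Constraining on Married only finds the new row, not the pre-change history.
married_keys = [r["key"] for r in t2 if r["marital_status"] == "Married"]

t3 = [{"key": 1, "customer_id": 7, "name": "Hajj Boussalhame",
       "original_marital_status": "Single",
       "current_marital_status": "Single"}]
type3_current_field(t3, 7, "marital_status", "Married")
type3_current_field(t3, 7, "marital_status", "Divorced")  # "Married" is lost
```

Fact rows recorded before the change keep pointing at key 1 in the Type 2 case, which is how the history is partitioned without any date logic.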

Slowly changing dimensions (cont.)
DP: “The use of the Type Two slowly changing dimension requires that the dimension key be generalized. It may be sufficient to take the underlying production key and add two or three version digits to the end of the key to simplify the key generation process.”
Where can this be implemented? In the DW or in the production system?
DP: “The creation of generalized keys is usually the responsibility of the data warehouse team, and always requires metadata to keep track of the generalized keys that have already been used.”
DP: “The Type Two slowly changing dimension automatically partitions history and an application must not be required to place any time constraints on effective dates in the dimension.”
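One possible reading of "add two or three version digits to the end of the key" is sketched below, assuming integer production keys. The class `GeneralizedKeys` and its in-memory dictionary standing in for the key metadata are hypothetical; a real warehouse would persist this metadata.

```python
class GeneralizedKeys:
    """Hypothetical generator: production key followed by version digits."""

    def __init__(self, version_digits=2):
        self._scale = 10 ** version_digits
        # Metadata: last version already used for each production key.
        self._last_version = {}

    def next_key(self, production_key):
        version = self._last_version.get(production_key, -1) + 1
        if version >= self._scale:
            raise OverflowError("version digits exhausted for this key")
        self._last_version[production_key] = version
        return production_key * self._scale + version

    def split(self, generalized_key):
        """Recover (production_key, version) from a generalized key."""
        return divmod(generalized_key, self._scale)


gen = GeneralizedKeys(version_digits=2)
first = gen.next_key(4711)    # first description of product 4711
second = gen.next_key(4711)   # description changed: new Type Two row
other = gen.next_key(9000)
parts = gen.split(second)
```

Applications should treat the result as an opaque key; the version digits only simplify key generation for the warehouse team.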