Star Schema.

Slides:



Advertisements
Similar presentations
The Organisation As A System An information management framework The Performance Organiser Data Warehousing.
Advertisements

Dimensional Modeling.
CHAPTER OBJECTIVE: NORMALIZATION THE SNOWFLAKE SCHEMA.
Cognos 8 Training Session
Chapter 10: Designing Databases
BY LECTURER/ AISHA DAWOOD DW Lab # 2. LAB EXERCISE #1 Oracle Data Warehousing Goal: Develop an application to implement defining subject area, design.
Dimensional Modeling Business Intelligence Solutions.
Dimensional Modeling CS 543 – Data Warehousing. CS Data Warehousing (Sp ) - Asim LUMS2 From Requirements to Data Models.
Dimensional Modeling – Part 2
MIS 451 Building Business Intelligence Systems Logical Design (3) – Design Multiple-fact Dimensional Model.
Data Warehousing DSCI 4103 Dr. Mennecke Introduction and Chapter 1.
Agenda Common terms used in the software of data warehousing and what they mean. Difference between a database and a data warehouse - the difference in.
ITEC 3220A Using and Designing Database Systems
DWH – Dimesional Modeling PDT Genči. 2 Outline Requirement gathering Fact and Dimension table Star schema Inside dimension table Inside fact table STAR.
OnLine Analytical Processing (OLAP)
Data Warehouse and Business Intelligence Dr. Minder Chen Fall 2009.
DIMENSIONAL MODELLING. Overview Clearly understand how the requirements definition determines data design Introduce dimensional modeling and contrast.
Chapter 1 Adamson & Venerable Spring Dimensional Modeling Dimensional Model Basics Fact & Dimension Tables Star Schema Granularity Facts and Measures.
Module 1: Introduction to Data Warehousing and OLAP
BI Terminologies.
Decision Support and Date Warehouse Jingyi Lu. Outline Decision Support System OLAP vs. OLTP What is Date Warehouse? Dimensional Modeling Extract, Transform,
Normalized model vs dimensional model
Designing a Data Warehousing System. Overview Business Analysis Process Data Warehousing System Modeling a Data Warehouse Choosing the Grain Establishing.
UNIT-II Principles of dimensional modeling
CMPE 226 Database Systems October 21 Class Meeting Department of Computer Engineering San Jose State University Fall 2015 Instructor: Ron Mak
ITEC 3220M Using and Designing Database Systems Instructor: Prof. Z.Yang Course Website: c3220m.htm Office: TEL.
The Need for Data Analysis 2 Managers track daily transactions to evaluate how the business is performing Strategies should be developed to meet organizational.
Data Warehouses and OLAP 1.  Review Questions ◦ Question 1: OLAP ◦ Question 2: Data Warehouses ◦ Question 3: Various Terms and Definitions ◦ Question.
Building the Corporate Data Warehouse Pindaro Demertzoglou Data Resource Management.
Building the Corporate Data Warehouse Pindaro Demertzoglou Lally School of Management Data Resource Management.
CMPE 226 Database Systems April 12 Class Meeting Department of Computer Engineering San Jose State University Spring 2016 Instructor: Ron Mak
Extending and Creating Dynamics AX OLAP Cubes
Defining Data Warehouse Structures Data Warehouse Data Access End User Data Access Data Sources Staging Area Data Marts Data Extract, Transform, and Load.
Jaclyn Hansberry MIS2502: Data Analytics The Things You Can Do With Data The Information Architecture of an Organization Jaclyn.
View Integration and Implementation Compromises
Operation Data Analysis Hints and Guidelines
Chapter 8 - Data Warehouse and Data Mart Modeling
Database Normalization
Data Warehouse.
Data warehouse and OLAP
Entity-Relationship Model
Tables and Their Characteristics
Chapter 13 The Data Warehouse
Data Warehouse.
Applying Data Warehouse Techniques
MIS2502: Data Analytics Dimensional Data Modeling
Overview and Fundamentals
Competing on Analytics II
Inventory is used to illustrate:
Retail Sales is used to illustrate a first dimensional model
MIS2502: Data Analytics Dimensional Data Modeling
CMPE 226 Database Systems April 11 Class Meeting
Database Fundamentals
Chapter 3 The DATA DIVISION.
Applying Data Warehouse Techniques
MIS2502: Data Analytics The Information Architecture of an Organization Acknowledgement: David Schuff.
Data Warehousing: Data Models and OLAP operations
MIS2502: Data Analytics Dimensional Data Modeling
Retail Sales is used to illustrate a first dimensional model
Dimensional Modeling.
MIS2502: Data Analytics Dimensional Data Modeling
Retail Sales is used to illustrate a first dimensional model
Dimensional Model January 16, 2003
Relational Database Design
DWH – Dimesional Modeling
Review of Major Points Star schema Slowly changing dimensions Keys
Transaction fact table (figure 7.2)
Applying Data Warehouse Techniques
Review of Major Points Star schema Slowly changing dimensions Keys
Information system analysis and design
Presentation transcript:

Star Schema

Snow flake schema

Fact constellation schema

What is a Fact Table? Fact tables contain the data corresponding to a particular business process. Each row represents a single event associated with that process and contains the measurement data associated with that event. The information contained within a fact table is typically numeric data.

What are Dimensions? Dimensions describe the objects involved in a business intelligence effort. While facts correspond to events, dimensions correspond to people, items, or other objects. Dimension tables contain details about each instance of an object.

Fact table Vs dimension table Fact tables and dimension tables are related to each other. If we take example of retail model, the fact table for a customer transaction would likely contain a foreign key reference to the item dimension table, where the entry corresponds to a primary key in that table for a record describing the item purchased.

Dimensional Modeling - Grain Grain: The grain describe the level of detail. It can be also seen as: the unique key of a table of a SQL Statement of the unique identifier of a hierarchy level in a dimension. At the top level, there are two main options in choosing the level of granularity: Unsummarized/Atomic (transaction level granularity): this is the highest level of granularity where each fact table row corresponds to a single transaction or line item Summarized: transactions may be summarized by a subset of dimensions or dimensional attributes. In this case, each row in the fact table corresponds to multiple transactions

Dimensional Modeling - Grain The most granular or atomic data (atomic as an indivisible unit of work) has the most dimensionality. Atomic data is highly dimensional. Preferably, you should develop dimensional models for the most atomic information captured by a Event. Atomic data is the most detailled information collected: such data cannot be subdivided further. Example: If a high grain is the month whereas a low or detail grain can be the day A data warehouse almost always demands data to the lowest possible grain of each dimension not because queries want to see individual low level rows but because queries need to cut through the details in very precise ways. The lower the level of granularity (or conversely, the higher the level of summarization), the less storage space required and the faster queries will be executed.

Additive, Semi-Additive, and Non-Additive Facts The numeric measures in a fact table fall into three categories. The most flexible and useful facts are fully additive; additive measures can be summed across any of the dimensions associated with the fact table. Semi-additive measures can be summed across some dimensions, but not all; balance amounts are common semi-additive facts because they are additive across all dimensions except time. Finally, some measures are completely non-additive, such as ratios. A good approach for non-additive facts is, where possible, to store the fully additive components  of the non-additive measure and sum these components  into the final answer set before calculating the final non-additive fact. This final calculation is often done in the BI layer or OLAP cube.

Slowly Changing Dimensions(SCD) dimensions have been assumed to be independent of time. Unfortunately, this is not the case in the real world. dimension table attributes are relatively static, they are not fixed forever. Dimension attributes change, though rather slowly, over time. Dimensional designers must engage business representatives proactively to help determine the appropriate change-handling strategy.

Clickstream Data Clickstream data is an information trail a user leaves behind while visiting a website. It is typically captured in semi-structured website log files. These website log files contain data elements such as a date and time stamp, the visitor's IP address, the destination URLs of the pages visited, and a user ID that uniquely identifies the website visitor.

Clickstream Data logs

Multivalued Dimensions An open-ended many-valued attribute can be associated with a dimension row by using a bridge table to associate the many-valued attributes with the dimension. Example: In some financial services companies, the individual customer is identified and associated with each transaction. For example, credit card companies often issue unique card numbers to each cardholder. John and Mary Smith may have a joint credit card account, but the numbers on their respective pieces of plastic are unique. In this case there is no need for an account-to-customer bridge table because the atomic transaction facts are at the discrete customer grain. Account and customer would both be foreign keys in this fact table.

Dimension attributes. Dimension tables are integral companions to a fact table. The dimension tables contain the textual descriptors of the business. In a well-designed dimensional model, dimension tables have many columns or attributes. These attributes describe the rows in the dimension table. Dimension attributes serve as the primary source of query constraints, groupings, and report labels. In a query or report request, attributes are identified as the by words. For example, when a user states that he or she wants to see dollar sales by week by brand, week and brand must be available as dimension attributes. Dimension table attributes play a vital role in the data warehouse. Since they are the source of virtually all interesting constraints and report labels, they are key to making the data warehouse usable and understandable. The data warehouse is only as good as the dimension attributes. The power of the data warehouse is directly proportional to the quality and depth of the dimension attributes.