Data Warehousing Concepts, by Dr. Khalil
Data Warehousing Design
Dr. Awad Khalil, Computer Science Department, AUC


Content
• Designing a Data Warehouse Database
• Dimensional Modeling
• Star Schema
• Snowflake Schema
• Advantages of Dimensional Modeling
• Methodology for Dimensional Modeling

Designing a Data Warehouse Database
• Designing a data warehouse database is highly complex.
• The database component of a data warehouse is described using a technique called dimensionality modeling: “A logical design technique that aims to present the data in a standard, intuitive form that allows for high-performance access.”
• Dimensionality modeling uses the concepts of Entity-Relationship (ER) modeling with some important restrictions.
• Every dimensional model (DM) is composed of one table with a composite primary key, called the fact table, and a set of smaller tables called dimension tables.
• Every dimension table has a simple (non-composite) primary key that corresponds exactly to one of the components of the composite key in the fact table.
• This characteristic ‘star-like’ structure is called a star schema or star join.

Star Schema
• A logical structure that has a fact table containing factual data in the center, surrounded by dimension tables containing reference data (which can be denormalized).
• The diagram shows a star schema for property sales of a Real Estate database.
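The star structure can be sketched in a few lines of Python using the standard sqlite3 module. This is only a minimal illustration, not the schema from the diagram: the table names PropertySale, Time, Branch, and PropertyForSale come from the slides, but every column name and every data value below is an assumption made up for the example.

```python
import sqlite3


def build_star_schema() -> sqlite3.Connection:
    """Create a tiny in-memory star schema (all columns and rows are hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Dimension tables: each has a simple (non-composite) primary key.
        CREATE TABLE Time (timeID INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
        CREATE TABLE Branch (branchID INTEGER PRIMARY KEY, city TEXT, region TEXT);
        CREATE TABLE PropertyForSale (propertyID INTEGER PRIMARY KEY, propertyType TEXT);

        -- Fact table: its composite primary key is made of the dimension keys.
        CREATE TABLE PropertySale (
            timeID INTEGER REFERENCES Time(timeID),
            branchID INTEGER REFERENCES Branch(branchID),
            propertyID INTEGER REFERENCES PropertyForSale(propertyID),
            sellingPrice REAL,
            PRIMARY KEY (timeID, branchID, propertyID)
        );

        INSERT INTO Time VALUES (1, 2023, 1), (2, 2023, 2);
        INSERT INTO Branch VALUES (1, 'Glasgow', 'West'), (2, 'London', 'South');
        INSERT INTO PropertyForSale VALUES (10, 'Flat'), (11, 'House');
        INSERT INTO PropertySale VALUES
            (1, 1, 10, 100000.0), (1, 2, 11, 300000.0), (2, 1, 11, 150000.0);
    """)
    return conn


def sales_by_city(conn: sqlite3.Connection):
    """Star join: the fact table joins directly to one dimension table."""
    return conn.execute("""
        SELECT b.city, SUM(f.sellingPrice)
        FROM PropertySale AS f
        JOIN Branch AS b ON b.branchID = f.branchID
        GROUP BY b.city
        ORDER BY b.city
    """).fetchall()


star = build_star_schema()
print(sales_by_city(star))
```

Because Branch is denormalized (city and region sit in the same table), the query needs only one join per dimension it uses.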

Other Schema Versions
Snowflake Schema
• A variant of the star schema where dimension tables do not contain denormalized data.
Starflake Schema
• A hybrid structure that contains a mixture of star and snowflake schemas.
• The diagram shows part of a star schema for property sales of a Real Estate database with a normalized version of the Branch dimension table.
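A normalized Branch dimension might look like the following sketch, again using Python's sqlite3 module. The split of Branch into City and Region tables is an assumption chosen for illustration; the diagram's actual normalized tables are not reproduced in this transcript, and all column names and values are hypothetical.

```python
import sqlite3


def build_snowflake() -> sqlite3.Connection:
    """Create a tiny snowflake: Branch is normalized into Branch, City, and Region."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Hypothetical normalization of the Branch dimension.
        CREATE TABLE Region (regionID INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE City (
            cityID INTEGER PRIMARY KEY,
            city TEXT,
            regionID INTEGER REFERENCES Region(regionID)
        );
        CREATE TABLE Branch (
            branchID INTEGER PRIMARY KEY,
            cityID INTEGER REFERENCES City(cityID)
        );

        -- Fact table reduced to a single dimension key to keep the sketch short.
        CREATE TABLE PropertySale (
            branchID INTEGER REFERENCES Branch(branchID),
            sellingPrice REAL
        );

        INSERT INTO Region VALUES (1, 'West'), (2, 'South');
        INSERT INTO City VALUES (1, 'Glasgow', 1), (2, 'London', 2);
        INSERT INTO Branch VALUES (1, 1), (2, 2), (3, 1);
        INSERT INTO PropertySale VALUES (1, 100.0), (2, 250.0), (3, 50.0);
    """)
    return conn


def sales_by_region(conn: sqlite3.Connection):
    """Region is reached through Branch and City, not directly from the fact table."""
    return conn.execute("""
        SELECT r.region, SUM(f.sellingPrice)
        FROM PropertySale AS f
        JOIN Branch AS b ON b.branchID = f.branchID
        JOIN City AS c ON c.cityID = b.cityID
        JOIN Region AS r ON r.regionID = c.regionID
        GROUP BY r.region
        ORDER BY r.region
    """).fetchall()


snowflake = build_snowflake()
print(sales_by_region(snowflake))
```

Compared with a denormalized Branch table, the same question costs two extra joins, which is the usual trade-off of the snowflake variant.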

Dimensional Model - Advantages
• Efficiency: The consistency of the underlying database structure allows more efficient access to the data by various tools including report writers and query tools.
• Ability to handle changing requirements: The star schema can adapt to changes in the user’s requirements, as all dimensions are equivalent in terms of providing access to the fact table.
• Extensibility: The dimensional model is extensible.
• Ability to model common business situations: There are a growing number of standard approaches for handling common modeling situations in the business world.
• Predictable query processing: Data warehouse applications that drill down will simply be adding more dimension attributes from within a single star schema.

Database Design Methodology for Data Warehouse
Nine-Step Methodology by Kimball (1996):
1- Choosing the process
2- Choosing the grain
3- Identifying and conforming the dimensions
4- Choosing the facts
5- Storing pre-calculations in the fact table
6- Rounding out the dimension tables
7- Choosing the duration of the database
8- Tracking slowly changing dimensions
9- Deciding the query priorities and the query modes

1- Choosing the process
• The process (function) refers to the subject matter of a particular data mart.
• The best choice for the first data mart tends to be the one that is related to sales.

2- Choosing the grain
• Choosing the grain means deciding exactly what a fact table record represents.
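One way to make a chosen grain concrete is to write it down as the set of key columns that must identify a single fact record, then check candidate rows against it. The sketch below assumes a hypothetical grain of one PropertySale row per (timeID, branchID, propertyID); these column names and the sample rows are illustrative and do not come from the slides.

```python
from collections import Counter

# Hypothetical grain: one fact row per (timeID, branchID, propertyID).
SALE_GRAIN = ("timeID", "branchID", "propertyID")


def grain_violations(rows, grain=SALE_GRAIN):
    """Return the grain-key combinations that appear in more than one row."""
    counts = Counter(tuple(row[col] for col in grain) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


rows = [
    {"timeID": 1, "branchID": 1, "propertyID": 10, "sellingPrice": 100000},
    {"timeID": 1, "branchID": 1, "propertyID": 11, "sellingPrice": 150000},
    # This row repeats the grain key of the first row, so it breaks the grain.
    {"timeID": 1, "branchID": 1, "propertyID": 10, "sellingPrice": 100000},
]
print(grain_violations(rows))
```

A non-empty result signals either duplicate source data or a grain that is coarser than what the rows actually record.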

3- Identifying and Conforming the Dimensions
• Dimensions set the context for asking questions about the facts in the fact table.
• The diagram shows star schemas for property sales and property advertising with Time, PropertyForSale, Branch, and Promotion as conformed (shared) dimension tables.

4- Choosing the Facts
• The grain of the fact table determines which facts can be used in the data mart.
• All the facts must be expressed at the level implied by the grain.
• The diagram shows how the Lease fact table shown in the previous diagram could be corrected so that the fact table is appropriately structured.

5- Storing Pre-Calculations in the Fact Table
• Once the facts have been selected, each should be re-examined to determine whether there are opportunities to use pre-calculations.
• A common example of the need to store pre-calculations occurs when the facts comprise a profit and loss statement.
• The diagram shows the fact table with the rentDuration, totalRent, clientAllowance, staffCommission, and totalRevenue attributes. These types of facts are useful because they are additive quantities, from which we can derive valuable information.
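The idea of storing a derived value alongside its inputs can be sketched as follows. The attribute names are taken from the slide, but the formulas are assumptions made for this example: totalRent is taken to be rentDuration multiplied by a hypothetical monthly rent, and totalRevenue is taken to be totalRent minus clientAllowance minus staffCommission. The slides do not state how these values are actually derived.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LeaseFact:
    rentDuration: int      # months (assumed unit)
    totalRent: int
    clientAllowance: int
    staffCommission: int
    totalRevenue: int      # pre-calculated and stored with the fact


def make_lease_fact(rent_duration, monthly_rent, client_allowance, staff_commission):
    """Build a fact row, computing the derived values once at load time.

    Both formulas below are illustrative assumptions, not the deck's definitions.
    """
    total_rent = rent_duration * monthly_rent
    total_revenue = total_rent - client_allowance - staff_commission
    return LeaseFact(rent_duration, total_rent, client_allowance,
                     staff_commission, total_revenue)


lease_facts = [
    make_lease_fact(12, 500, 100, 300),
    make_lease_fact(6, 400, 0, 120),
]

# Additive facts can simply be summed across any set of fact rows.
print(sum(f.totalRevenue for f in lease_facts))
```

Storing totalRevenue means queries can sum one column instead of repeating the subtraction, and every report applies the same definition.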

6- Rounding out the Dimension Tables
• In this step, we return to the dimension tables and add as many text descriptions to the dimensions as possible.
• The text descriptions should be as intuitive and understandable to the users as possible.
• The usefulness of a data mart is determined by the scope and nature of the attributes of the dimension tables.

7- Choosing the Duration of the Database
• The duration measures how far back in time the fact table goes.
• Very large fact tables raise at least two very significant design issues:
• First, it is often increasingly difficult to source increasingly old data.
• Second, it is mandatory that the old versions of the important dimensions be used, not the most current versions. This is known as the ‘slowly changing dimension’ problem.

8- Tracking Slowly Changing Dimensions
• The slowly changing dimension problem means, for example, that the proper description of the old client and the old branch must be used with the old transaction history.
• Often, the data warehouse must assign a generalized key to these important dimensions in order to distinguish multiple snapshots of clients and branches over a period of time.
• There are three basic types of slowly changing dimensions:
• Type 1: where a changed dimension attribute is overwritten;
• Type 2: where a changed dimension attribute causes a new record to be created;
• Type 3: where a changed dimension attribute causes an alternate attribute to be created so that both the old and the new values of the attribute are simultaneously accessible in the same dimension record.
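The three types can be contrasted with a small in-memory sketch. Everything here is hypothetical: the ClientRow fields, the is_current flag, and the way a new generalized key is generated are assumptions chosen to keep the example short, not a prescribed implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ClientRow:
    generalized_key: int                  # warehouse-assigned key, one per snapshot
    client_id: str                        # key from the operational system
    city: str
    previous_city: Optional[str] = None   # used only by Type 3
    is_current: bool = True               # used only by Type 2


def _current(rows: List[ClientRow], client_id: str) -> ClientRow:
    return next(r for r in rows if r.client_id == client_id and r.is_current)


def apply_type1(rows: List[ClientRow], client_id: str, new_city: str) -> None:
    """Type 1: overwrite the attribute; the old value is lost."""
    _current(rows, client_id).city = new_city


def apply_type2(rows: List[ClientRow], client_id: str, new_city: str) -> None:
    """Type 2: keep the old record and add a new one with a new generalized key."""
    _current(rows, client_id).is_current = False
    new_key = max(r.generalized_key for r in rows) + 1
    rows.append(ClientRow(new_key, client_id, new_city))


def apply_type3(rows: List[ClientRow], client_id: str, new_city: str) -> None:
    """Type 3: keep old and new values side by side in the same record."""
    row = _current(rows, client_id)
    row.previous_city = row.city
    row.city = new_city


clients = [ClientRow(1, "C1", "Glasgow")]
apply_type2(clients, "C1", "Paisley")
print(clients)
```

With Type 2, old fact rows keep pointing at generalized key 1 (Glasgow) while new facts use key 2 (Paisley), which is how the old description stays attached to the old transaction history.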

9- Deciding the Query Priorities and the Query Modes
• In this step we consider physical design issues.
• The most critical physical design issues affecting the end-user’s perception of the data mart are the physical sort order of the fact table on disk and the presence of pre-stored summaries or aggregations.
• Beyond these issues there are a host of additional physical design issues affecting administration, backup, indexing performance, and security.
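A pre-stored summary can be illustrated with a small sqlite3 sketch. The SalesByYear table, the saleYear column, and the data are assumptions invented for the example; a real warehouse would typically rely on its DBMS's own aggregate or materialized-view facilities and would also need a refresh strategy, which is not shown here.

```python
import sqlite3


def build_with_summary() -> sqlite3.Connection:
    """Create a simplified fact table plus a pre-stored yearly aggregation."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Simplified, hypothetical fact table.
        CREATE TABLE PropertySale (saleYear INTEGER, branchID INTEGER, sellingPrice REAL);
        INSERT INTO PropertySale VALUES
            (2022, 1, 100.0), (2022, 2, 200.0), (2023, 1, 50.0);

        -- Pre-stored summary: one row per year, computed once at load time.
        CREATE TABLE SalesByYear AS
            SELECT saleYear,
                   SUM(sellingPrice) AS totalSales,
                   COUNT(*) AS saleCount
            FROM PropertySale
            GROUP BY saleYear;
    """)
    return conn


def yearly_sales(conn: sqlite3.Connection):
    """Answer a yearly question from the summary without scanning the fact table."""
    return conn.execute(
        "SELECT saleYear, totalSales, saleCount FROM SalesByYear ORDER BY saleYear"
    ).fetchall()


summary_db = build_with_summary()
print(yearly_sales(summary_db))
```

The summary trades extra storage and load-time work for faster answers to the queries that users run most often, which is why this step asks for query priorities first.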

Example: Dimensional Model (Fact Constellation) for a Real Estate Data Warehouse
• At the end of this methodology, we have a design for a data mart that supports the requirements of a particular Real Estate business process and also allows easy integration with other related data marts to ultimately form the enterprise-wide data warehouse.
• We integrate the star schemas for the business processes of the Real Estate company using the conformed dimensions. For example, all the fact tables share the Time and Branch dimensions.
• A dimensional model that contains more than one fact table sharing one or more conformed dimension tables is referred to as a fact constellation.
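A fact constellation can be sketched as two fact tables that reference the same Branch dimension. The PropertySale and Advert table names come from the slides; the advertCost and sellingPrice columns and all data values are assumptions made for this illustration. The query aggregates each fact table separately and then combines the results through the conformed dimension.

```python
import sqlite3


def build_constellation() -> sqlite3.Connection:
    """Create two simplified fact tables sharing one conformed Branch dimension."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Conformed dimension shared by both fact tables.
        CREATE TABLE Branch (branchID INTEGER PRIMARY KEY, city TEXT);

        -- Two fact tables, each reduced to one dimension key and one fact.
        CREATE TABLE PropertySale (
            branchID INTEGER REFERENCES Branch(branchID),
            sellingPrice REAL
        );
        CREATE TABLE Advert (
            branchID INTEGER REFERENCES Branch(branchID),
            advertCost REAL
        );

        INSERT INTO Branch VALUES (1, 'Glasgow'), (2, 'London');
        INSERT INTO PropertySale VALUES (1, 100.0), (1, 150.0), (2, 250.0);
        INSERT INTO Advert VALUES (1, 10.0), (2, 20.0), (2, 5.0);
    """)
    return conn


def sales_and_advert_cost_by_city(conn: sqlite3.Connection):
    """Aggregate each fact table on its own, then join on the shared dimension."""
    return conn.execute("""
        SELECT b.city, s.totalSales, a.totalAdvertCost
        FROM Branch AS b
        JOIN (SELECT branchID, SUM(sellingPrice) AS totalSales
              FROM PropertySale GROUP BY branchID) AS s
          ON s.branchID = b.branchID
        JOIN (SELECT branchID, SUM(advertCost) AS totalAdvertCost
              FROM Advert GROUP BY branchID) AS a
          ON a.branchID = b.branchID
        ORDER BY b.city
    """).fetchall()


constellation = build_constellation()
print(sales_and_advert_cost_by_city(constellation))
```

Aggregating before joining matters: joining the two fact tables row by row would pair every sale with every advert of the same branch and inflate both totals.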

Example: Fact and Dimension Tables for each Business Process

Business Process | Fact Table | Dimension Tables
Property Sales | PropertySale | Time, Branch, Staff, PropertyForSale, Owner, ClientBuyer, Promotion
Property Rentals | Lease | Time, Branch, Staff, PropertyForSale, Owner, ClientBuyer, Promotion
Property Viewing | PropertyViewing | Time, Branch, Staff, PropertyForSale, PropertyForRent, ClientBuyer, ClientRenter
Property Advertising | Advert | Time, Branch, Staff, PropertyForSale, PropertyForRent, Promotion, Newspaper
Property Maintenance | | Time, Branch, Staff, PropertyForRent

Thank you