IMS 6217: Data Warehousing / Business Intelligence 1 Dr. Lawrence West, Management Dept., University of Central Florida Database Performance.


IMS 6217: Data Warehousing / Business Intelligence
Dr. Lawrence West, Management Dept., University of Central Florida

Slide 1: Database Performance, Part 1: Topics
Doing vs. Deciding: OLTP vs. OLAP
Data Warehouses
  – Fact tables, dimension tables, granularity
  – DW in an integrated Business Intelligence system
Design Steps
Designing Fact Tables
Designing Dimension Tables
  – The Time dimension
Fact Table Exercises
The AdventureWorks DW

IMS 6217: Data Warehousing / Business Intelligence 2 Dr. Lawrence West, Management Dept., University of Central Florida "With uncertainty present…" With the introduction of uncertainty—the fact of ignorance and necessity of acting upon opinion rather than knowledge—into this Eden-like situation, its character is completely changed. With uncertainty absent, man's energies are devoted altogether to doing things; it is doubtful whether intelligence itself would exist in such a situation; in a world so built that perfect knowledge was theoretically possible, it seems likely that all organic readjustments would become mechanical, all organisms automata. With uncertainty present, doing things, the actual execution of activity, becomes in a real sense a secondary part of life; the primary problem or function is deciding what to do and how to do it. The two most important characteristics of social organization brought about by the fact of uncertainty have already been noticed. In the first place, goods are produced for a market, on the basis of an entirely impersonal prediction of wants, not for the satisfaction of the wants of the producers themselves. The producer takes the responsibility of forecasting the consumers' wants. In the second place, the work of forecasting and at the same time a large part of the technological direction and control of production are still further concentrated upon a very narrow class of the producers, and we meet with a new economic functionary, the entrepreneur. Frank H. Knight University of Chicago 1921

Slide 3: Doing vs. Deciding
Organizations do many things
  – List thirty transactions that your project organization executes or does
  – Start with the Top-Ten list from Projects 2 & 3
Managers decide things
  – List thirty decisions that your project organization makes
  – Identify where in the organizational hierarchy the decision lies
  – What is the consequence/importance of the decision?
  – What information influences each decision?

Slide 4: Doing vs. Deciding / OLTP vs. OLAP
Are systems designed to support the execution of events suitable for the making of decisions?
Event/transaction support requires
  – High throughput
  – High reliability
  – Accuracy
  – DB structures tuned for storage & performance
Online Transaction Processing (OLTP) systems support events
  – Provide data or information to support transactions
  – Record acts → New data

Slide 5: OLTP vs. OLAP: Let Me Count the Ways…
Online Analytical Processing (OLAP) or Business Intelligence (BI) systems are oriented toward decision making and analysis
What are the problems with using our OLTP databases to support managerial decision making?

Slide 6: The Data Warehouse
The DW is a separate storage structure
Designed to optimize query execution
  – Not storage efficiency
  – Not transaction throughput
Expected to be loaded during down times
Supports "readability"
May sacrifice details for summaries
Data and structures anticipate user needs
  – Recurring decisions
  – Flexible exploration

Slide 7: Steps and Components
Source Systems: provide raw data to the DW
Integration Services: provide transformation and loading services from source data to DW
Data Warehouse: customized data store for Business Intelligence
Analysis Services: tools for data mining and reporting
Reporting Services: our old friend acting on an enhanced data store

Slide 8: Our Approach
This Week
  – Discuss DW storage strategies
  – Discuss data to be stored
    Internal data from OLTP systems
    External data
  – Design exercises
Next Week
  – DW loading strategies
  – DW tools: Analysis Services

Slide 9: Storage Strategies
The DW stores transformed data that
  – May be accessed directly to support analysis
  – Supports actions of the Analysis Services to provide enhanced and efficient analysis
Multiple strategies exist
We will look at the widely used approach using
  – Fact tables
  – Dimension tables
  – Arranged in a Star Schema or Snowflake Schema (or both)

Slide 10: Fact Tables
Contain Facts (duhhhh) of Interest
No PK designated for fact table
Natural PK is TimeKeyOrdered, ProductKey, CustomerKey
  – This defines the granularity of the data
CategoryKey is functionally dependent (FD) on ProductKey
SalesTerrKey and SalesRepKey are FD on CustomerKey
UnitsSold, TotalDiscounts
  – Summed from source data
  – Additive
SalesPrice is not additive
ValueSold is derivable and additive

Slide 11: Star Schema & Dimension Tables
Dimension Tables represent concepts (entities) used to group data in the fact tables
Also contain descriptive attributes of the entity represented by the dimension table
Simplest way for nontechnical users to picture the data
Relate to FKs in the fact tables
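
A minimal sketch of a star schema like the one described on this slide, using Python's built-in sqlite3 module with an in-memory database. The key and measure names follow the Fact Tables slide; the table names FactSales, DimProduct, and DimCustomer, the descriptive columns, and all sample rows are assumptions made for illustration and are not taken from the course database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One fact table in the middle, with each dimension table joined directly to it.
conn.executescript("""
CREATE TABLE DimProduct (
    ProductKey     INTEGER PRIMARY KEY,
    ProductName    TEXT,
    CategoryName   TEXT      -- descriptive attribute, denormalized into the dimension
);
CREATE TABLE DimCustomer (
    CustomerKey    INTEGER PRIMARY KEY,
    CustomerName   TEXT,
    SalesTerritory TEXT
);
CREATE TABLE FactSales (
    TimeKeyOrdered TEXT,
    ProductKey     INTEGER REFERENCES DimProduct(ProductKey),
    CustomerKey    INTEGER REFERENCES DimCustomer(CustomerKey),
    UnitsSold      INTEGER,
    ValueSold      REAL
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO DimProduct VALUES (?, ?, ?)", [
    (1, "Chai", "Beverages"),
    (2, "Chang", "Beverages"),
    (3, "Tofu", "Produce"),
])
conn.executemany("INSERT INTO DimCustomer VALUES (?, ?, ?)", [
    (1, "Customer A", "Europe"),
    (2, "Customer B", "North America"),
])
conn.executemany("INSERT INTO FactSales VALUES (?, ?, ?, ?, ?)", [
    ("2024-01-05", 1, 1, 10, 180.0),
    ("2024-01-05", 3, 1, 4, 93.0),
    ("2024-01-06", 2, 2, 5, 95.0),
])

# "Sales by product category": group fact rows by a dimension attribute.
sales_by_category = conn.execute("""
    SELECT p.CategoryName, SUM(f.UnitsSold), SUM(f.ValueSold)
    FROM FactSales AS f
    JOIN DimProduct AS p ON p.ProductKey = f.ProductKey
    GROUP BY p.CategoryName
    ORDER BY p.CategoryName
""").fetchall()

# "Sales by territory": same fact table, different dimension.
sales_by_territory = conn.execute("""
    SELECT c.SalesTerritory, SUM(f.ValueSold)
    FROM FactSales AS f
    JOIN DimCustomer AS c ON c.CustomerKey = f.CustomerKey
    GROUP BY c.SalesTerritory
    ORDER BY c.SalesTerritory
""").fetchall()

print(sales_by_category)
print(sales_by_territory)
```

Each report needs only one join from the fact table to the dimension of interest, which is part of why the star layout is easy for nontechnical users to picture.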

Slide 12: Snowflake Schema & Dimension Tables
Fewer direct links from dimension tables to fact table
Dimension tables relate to each other
Natural hierarchical relationships in data are preserved
  – Implications for drilldown reports
Increases complexity of data retrieval for nontechnical users
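
A small sketch of the snowflake variant, again with Python's sqlite3 module. Here the category is kept in its own table and reached through the product dimension. The table names (DimCategory, DimProduct, FactSales) and the sample rows are hypothetical and chosen only to illustrate the extra join and the drilldown path.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DimCategory does not join to the fact table directly; it is reached via DimProduct.
conn.executescript("""
CREATE TABLE DimCategory (
    CategoryKey  INTEGER PRIMARY KEY,
    CategoryName TEXT
);
CREATE TABLE DimProduct (
    ProductKey   INTEGER PRIMARY KEY,
    ProductName  TEXT,
    CategoryKey  INTEGER REFERENCES DimCategory(CategoryKey)
);
CREATE TABLE FactSales (
    ProductKey   INTEGER REFERENCES DimProduct(ProductKey),
    UnitsSold    INTEGER
);
INSERT INTO DimCategory VALUES (1, 'Beverages'), (2, 'Produce');
INSERT INTO DimProduct  VALUES (10, 'Chai', 1), (11, 'Chang', 1), (12, 'Tofu', 2);
INSERT INTO FactSales   VALUES (10, 10), (11, 5), (12, 4), (10, 3);
""")

# Top of the hierarchy: two joins are needed to reach the category name.
by_category = conn.execute("""
    SELECT c.CategoryName, SUM(f.UnitsSold)
    FROM FactSales AS f
    JOIN DimProduct  AS p ON p.ProductKey  = f.ProductKey
    JOIN DimCategory AS c ON c.CategoryKey = p.CategoryKey
    GROUP BY c.CategoryName
    ORDER BY c.CategoryName
""").fetchall()

# Drilldown: step one level down the preserved hierarchy for a single category.
beverages_by_product = conn.execute("""
    SELECT p.ProductName, SUM(f.UnitsSold)
    FROM FactSales AS f
    JOIN DimProduct  AS p ON p.ProductKey  = f.ProductKey
    JOIN DimCategory AS c ON c.CategoryKey = p.CategoryKey
    WHERE c.CategoryName = ?
    GROUP BY p.ProductName
    ORDER BY p.ProductName
""", ("Beverages",)).fetchall()

print(by_category)
print(beverages_by_product)
```

The extra join is the retrieval complexity the slide mentions; the separate category table is what makes the category-to-product drilldown path explicit.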

Slide 13: Granularity
The granularity of the fact tables is a critical design decision
There are alternative levels of granularity
  – Finer granularity → more detail, more records
    Use SalesDate instead of SalesMonth
  – Coarser granularity → less detail, fewer records
    Use SalesMonth instead of SalesDate
Finer granularity can be aggregated in the DW to find the coarser granularity values
Coarse granularity cannot be decomposed
Granularity decisions are made for each of the FKs from the dimension tables
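
The one-way nature of granularity can be sketched in a few lines of Python. The function name roll_up_to_month and the sample facts are hypothetical; the point is only that daily rows can be summed into monthly rows, while the monthly totals alone no longer say which day each sale happened on.

```python
from collections import defaultdict

# Fine-grained facts: one row per (SalesDate, product). Sample values are made up.
daily_facts = [
    ("2024-01-05", "Chai", 10),
    ("2024-01-20", "Chai", 7),
    ("2024-02-02", "Chai", 4),
    ("2024-02-02", "Tofu", 6),
]


def roll_up_to_month(facts):
    """Aggregate daily UnitsSold to (SalesMonth, product) granularity."""
    monthly = defaultdict(int)
    for sales_date, product, units_sold in facts:
        sales_month = sales_date[:7]  # 'YYYY-MM-DD' -> 'YYYY-MM'
        monthly[(sales_month, product)] += units_sold
    return dict(monthly)


monthly_facts = roll_up_to_month(daily_facts)
print(monthly_facts)

# The reverse is not possible: ("2024-01", "Chai") -> 17 does not record
# how the 17 units were split between January 5 and January 20.
```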

Slide 14: Design Steps
It is impractical to design a one-source DW as the first deliverable
Identify initial scope of DW
  – Problem Statement
  – Business Requirements
Build DW Data Model
  – Business Processes to address requirements
  – Level of Detail
  – Fact Tables (what we are measuring)
  – Dimension Tables (how we look at the data)

Slide 15: Design Steps (cont.)
Design Integration Services
Design Analysis Services
Design Reports
Deploy and Manage DW
Add further business requirements
  – Repeat process for new requirements
  – Add new dimensions to the DW

Slide 16: Fact Tables (Part 2)
Identifying Fact Tables and their facts is an art
No obvious mapping from OLTP tables to Fact or Dimension Tables
The same DB table can contribute to multiple fact tables
Requires analysis to discover central concepts that will become fact tables
  – Decision maker interviews
  – Reporting requirements

Slide 17: Fact Tables (Part 2, cont.)
Look for a logical concept or event that the measures of interest describe
  – A sale (invoice)
  – An order (purchase order)
  – An enrollment (college DB)
The concept/event should support the requirements
The event is likely to be based on an OLTP table
  – Not every OLTP table will become a fact table
This concept/event will form the foundation for a fact table

Slide 18: Fact Tables: Measures
Measures are the facts to be recorded for each row in the fact table
Measures are often additive
  – UnitsSold, TotalDiscounts, ValueSold
Some are not additive
  – SalesPrice
Sometimes nonadditive measures are transformed into additive measures
  – ValueSold = (UnitsSold * SalesPrice) - TotalDiscounts
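
A short worked example of additive and nonadditive measures, using the ValueSold formula from this slide. The two fact rows and the helper name value_sold are made up for illustration.

```python
# Two hypothetical fact rows for the same product.
fact_rows = [
    {"UnitsSold": 10, "SalesPrice": 18.0, "TotalDiscounts": 5.0},
    {"UnitsSold": 2, "SalesPrice": 20.0, "TotalDiscounts": 0.0},
]


def value_sold(row):
    """ValueSold = (UnitsSold * SalesPrice) - TotalDiscounts, as on the slide."""
    return row["UnitsSold"] * row["SalesPrice"] - row["TotalDiscounts"]


for row in fact_rows:
    row["ValueSold"] = value_sold(row)

# Additive measures: summing across rows gives a meaningful total.
total_units = sum(row["UnitsSold"] for row in fact_rows)
total_discounts = sum(row["TotalDiscounts"] for row in fact_rows)
total_value = sum(row["ValueSold"] for row in fact_rows)

# Nonadditive measure: the sum of unit prices is not the price of anything.
summed_price = sum(row["SalesPrice"] for row in fact_rows)

# A meaningful price figure has to be rebuilt from additive measures instead.
average_list_price = (total_value + total_discounts) / total_units

print(total_units, total_value, summed_price, round(average_list_price, 2))
```

Storing ValueSold on each row means later aggregations by any dimension can simply sum it, without needing SalesPrice at query time.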

Slide 19: Fact Tables: Measures (cont.)
Measures may come from several sources, often not just values from a single OLTP source table
Other candidates in our example
  – COGS
  – CurrentInterestRate
  – CompetitorPrice
  – GrossMargin
  – NetMargin
  – ShippingCost
  – ShippingWeight

Slide 20: Fact Tables: Dimensions
Dimensions are ways of looking at the data
  – Users may indicate they look at {fact table subject} "by" {dimension name}
  – Sales by week
  – Sales by customer
  – Sales by product category
Dimensions lead us to Dimension Tables
  – Descriptive attributes about the dimension
  – Key referenced by a foreign key in the fact table

Slide 21: Dimension Tables
Dimension tables are often based on an OLTP entity
Denormalized to include descriptive attributes from other tables
  – Product might include
    SupplierName
    CategoryName
    SubCategoryName
    SupplierCountry
In Snowflake dimension tables, related hierarchical information may be retained in the hierarchical tables

Slide 22: Dimension Tables: Primary Keys
Dimension tables should always be given an artificial identity PK, even if there is a suitable OLTP table PK
If tables are ever loaded from multiple sources the natural PK may become invalid
  – E.g., merging sales data from two business units with different databases
Retain the business PK as an attribute in the dimension table
Possibly include source system identifier for the row
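
The reason for an artificial key can be sketched with two hypothetical source systems whose customer IDs collide. Everything named here (UnitA, UnitB, load_customer_dimension, the column names, and the customer rows) is an assumption for illustration only.

```python
from itertools import count

# Two business units, each with its own database and its own customer IDs.
# Business key 1 means a different customer in each source.
unit_a_customers = [(1, "Acme Corp"), (2, "Globex")]
unit_b_customers = [(1, "Initech"), (2, "Umbrella")]


def load_customer_dimension(sources):
    """Assign a new surrogate CustomerKey to every row, keeping the
    business key and the source system as ordinary attributes."""
    next_key = count(1)
    dimension = []
    for source_system, rows in sources.items():
        for business_key, name in rows:
            dimension.append({
                "CustomerKey": next(next_key),   # artificial identity PK
                "SourceSystem": source_system,   # where the row came from
                "BusinessKey": business_key,     # original OLTP PK, retained
                "CustomerName": name,
            })
    return dimension


dim_customer = load_customer_dimension({
    "UnitA": unit_a_customers,
    "UnitB": unit_b_customers,
})

# During fact loading, (source system, business key) maps to the surrogate key.
key_lookup = {
    (row["SourceSystem"], row["BusinessKey"]): row["CustomerKey"]
    for row in dim_customer
}

print(key_lookup)
```

Had the business key been used as the dimension PK, the two customers numbered 1 would have collided when the sources were merged.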

Slide 23: Dimension Tables: Time
Time is a hugely common "by" dimension
Decide on time granularity
  – Daily, Weekly, Hourly?
You might consider two time dimensions
  – Daily for the coarsest categorization
  – Hour for additional precision

Slide 24: Dimensions: Time (cont.)
The time dimension table maps from the measured time attribute associated with the fact table record to various labels and aggregations associated with that value
Facilitates summarizing by various aggregates with a single time dimension measure
TimeKey PK is often a datetime data type truncated to date-level precision
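
A minimal sketch of how a daily time dimension might be generated, assuming one row per calendar date and a TimeKey stored as an ISO date string. The column names other than TimeKey, and the function name build_time_dimension, are hypothetical; a real DW would pick labels (fiscal periods, holidays, and so on) to match its reporting needs.

```python
from datetime import date, timedelta

MONTH_NAMES = ("January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December")
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
             "Saturday", "Sunday")


def build_time_dimension(start, end):
    """Return one dimension row per date from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "TimeKey": day.isoformat(),            # date-level precision
            "Year": day.year,
            "Quarter": (day.month - 1) // 3 + 1,
            "Month": day.month,
            "MonthName": MONTH_NAMES[day.month - 1],
            "DayOfWeek": DAY_NAMES[day.weekday()],  # weekday(): Monday is 0
            "IsWeekend": day.weekday() >= 5,
        })
        day += timedelta(days=1)
    return rows


dim_time = build_time_dimension(date(2024, 1, 1), date(2024, 12, 31))

# A fact row carrying TimeKey '2024-01-06' can now be summarized by year,
# quarter, month, or weekday through a single join to this table.
by_key = {row["TimeKey"]: row for row in dim_time}
print(by_key["2024-01-06"])
```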

Slide 25: Fact Tables: Granularity
In the olden days granularity decisions were made at the DW DB design stage
Granularity decisions traded off
  – Number of records and computational overhead associated with more detailed granularity
  – Lack of precision with coarser granularity
Modern computational power supports finer granularity
Analysis Services provides support for fast computation over large data sets
Just don't go overboard

Slide 26: Fact Table Exercise #1
Are there any fact tables beyond the one illustrated on Slide 11 for the NorthWind DB?
Are there additional facts that you might add to this table?
Are there additional dimension tables you might add?

Slide 27: Fact Table Exercise #2
Expand entities around the core of our University ERD
  – See next slide
Consider two business goals
  – Understand real credit hour revenue
  – Understand classroom utilization
Identify and design Fact and Dimension Tables

Slide 28: Fact Table Exercise #2 (Cont.)

Slide 29: External Data
What external data might you want to have in a sales-oriented DW?

Slide 30: Next Time
Transformations to load the DW from the source OLTP (and other) data sources
  – Automated support
  – Do it yourself
Analysis Services: putting our DW to work