Case Projects in Data Warehousing and Data Mining Mohammad A. Rob & Michael E. Ellis University of Houston-Clear Lake Houston, Texas 77058.

Slides:



Advertisements
Similar presentations
Supervisor : Prof . Abbdolahzadeh
Advertisements

By: Mr Hashem Alaidaros MIS 211 Lecture 4 Title: Data Base Management System.
Data Warehousing M R BRAHMAM.
Chapter 3 Database Management
Data Sources Data Warehouse Analysis Results Data visualisation Analytical tools OLAP Data Mining Overview of Business Intelligence Data visualisation.
COMP 578 Data Warehousing And OLAP Technology Keith C.C. Chan Department of Computing The Hong Kong Polytechnic University.
How Business Intelligence Software Works and a Brief Overview of Leading Products Jai Windsor MIS 5973 December 8, 2005.
DATA WAREHOUSE (Muscat, Oman).
Data Warehousing: Defined and Its Applications Pete Johnson April 2002.
Online Analytical Processing (OLAP) Hweichao Lu CS157B-02 Spring 2007.
Data Conversion to a Data warehouse Presented By Sanjay Gunasekaran.
Chetan Bhirud Raza Mohammad Abinash Sahoo Online Marketing Giant.
Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke1 Decision Support Chapter 23.
What is Business Intelligence? Business intelligence (BI) –Range of applications, practices, and technologies for the extraction, translation, integration,
DATA WAREHOUSING IN SQL SERVER 2005/2008 BUSINESS INTELLIGENCE.
©Silberschatz, Korth and Sudarshan18.1Database System Concepts - 5 th Edition, Aug 26, 2005 Buzzword List OLTP – OnLine Transaction Processing (normalized,
Data Warehouse & Data Mining
1 Brett Hanes 30 March 2007 Data Warehousing & Business Intelligence 30 March 2007 Brett Hanes.
Introduction to the Orion Star Data
DW-1: Introduction to Data Warehousing. Overview What is Database What Is Data Warehousing Data Marts and Data Warehouses The Data Warehousing Process.
Cube Intro. Decision Making Effective decision making Goal: Choice that moves an organization closer to an agreed-on set of goals in a timely manner Goal:
Business Intelligence Zamaneh Jahed. What is Business Intelligence? Business Intelligence (BI) is a broad category of applications and technologies for.
Presented By: Muhammad Rizvi Raghuram Vempali Surekha Vemuri.
Data Warehouse and Business Intelligence Dr. Minder Chen Fall 2009.
Data warehousing and online analytical processing- Ref Chap 4) By Asst Prof. Muhammad Amir Alam.
1 Data Warehouses BUAD/American University Data Warehouses.
OLAP & DSS SUPPORT IN DATA WAREHOUSE By - Pooja Sinha Kaushalya Bakde.
MIS2502: Data Analytics The Information Architecture of an Organization.
October 28, Data Warehouse Architecture Data Sources Operational DBs other sources Analysis Query Reports Data mining Front-End Tools OLAP Engine.
Data Warehousing. Databases support: Transaction Processing Systems –operational level decision –recording of transactions Decision Support Systems –tactical.
Decision Support and Date Warehouse Jingyi Lu. Outline Decision Support System OLAP vs. OLTP What is Date Warehouse? Dimensional Modeling Extract, Transform,
6.1 © 2010 by Prentice Hall 6 Chapter Foundations of Business Intelligence: Databases and Information Management.
Data Warehouse. Group 5 Kacie Johnson Summer Bird Washington Farver Jonathan Wright Mike Muchane.
DATABASES AND DATA WAREHOUSES
By N.Gopinath AP/CSE. There are 5 categories of Decision support tools, They are; 1. Reporting 2. Managed Query 3. Executive Information Systems 4. OLAP.
Chapter 5 DATA WAREHOUSING Study Sections 5.2, 5.3, 5.5, Pages: & Snowflake schema.
Management Information Systems, 4 th Edition 1 Chapter 8 Data and Knowledge Management.
CMPE 226 Database Systems October 21 Class Meeting Department of Computer Engineering San Jose State University Fall 2015 Instructor: Ron Mak
Business Intelligence Transparencies 1. ©Pearson Education 2009 Objectives What business intelligence (BI) represents. The technologies associated with.
Pooja Sharma Shanti Ragathi Vaishnavi Kasala. BUSINESS BACKGROUND Lowe's started as a single hardware store in North Carolina in 1946 and since then has.
© 2003 Prentice Hall, Inc.3-1 Chapter 3 Database Management Information Systems Today Leonard Jessup and Joseph Valacich.
Business Intelligence Training Siemens Engineering Pakistan Zeeshan Shah December 07, 2009.
Advanced Database Concepts
The Data Warehouse Chapter Operational Databases = transactional database  designed to process individual transaction quickly and efficiently.
Houston E-Retailers Presented BY: Bala AnuDeep Guduri (LEAD)
CS 157B: Database Management Systems II April 10 Class Meeting Department of Computer Science San Jose State University Spring 2013 Instructor: Ron Mak.
1 Copyright © 2009, Oracle. All rights reserved. Oracle Business Intelligence Enterprise Edition: Overview.
Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke1 Data Warehousing and Decision Support.
1 Copyright © Oracle Corporation, All rights reserved. Business Intelligence and Data Warehousing.
Introduction to OLAP and Data Warehouse Assoc. Professor Bela Stantic September 2014 Database Systems.
1 Management Information Systems M Agung Ali Fikri, SE. MM.
Building the Corporate Data Warehouse Pindaro Demertzoglou Data Resource Management.
The Concepts of Business Intelligence Microsoft® Business Intelligence Solutions.
Data Warehouse Project Business Definition Presented by: Mike Ellis Vinh Ngo.
Data Mining and Data Warehousing: Concepts and Techniques What is a Data Warehouse? Data Warehouse vs. other systems, OLTP vs. OLAP Conceptual Modeling.
Business Intelligence Overview
Data Warehousing CIS 4301 Lecture Notes 4/20/2006.
Data warehouse and OLAP
Fundamentals & Ethics of Information Systems IS 201
Data Warehouse.
Data Warehouse Project Star Schema and Physical Design
Data Warehouse and OLAP
University of Houston-Clear Lake Kaiser Permanente San Jose
Unidad II Data Warehousing Interview Questions
Introduction of Week 9 Return assignment 5-2
Data Warehousing Concepts
Data Warehouse and OLAP
Data Warehousing & DATA MINING (SE-409) Lecture-1 Introduction and Background Huma Ayub Software Engineering department University of Engineering and Technology,
Data Warehousing.
Presentation transcript:

Case Projects in Data Warehousing and Data Mining Mohammad A. Rob & Michael E. Ellis University of Houston-Clear Lake Houston, Texas 77058

Data warehousing and data mining became a significant part of many organization’s IT infrastructure The historical data in the warehouse play an important role in providing Business Intelligence It helps companies to streamline workflows, provide better customer services, and target market their products and services Introduction

Software development companies are developing new tools and technologies for data warehouse engines: To provide data transfer services from traditional TPS to data warehouses To perform data analysis for business intelligence To generate reports and ah hoc queries & execute data mining algorithms Introduction

Microsoft, Oracle and IBM have significantly modified their databases such as SQL Server, Oracle and DB2 to accommodate data warehouse tools Companies better known for their BI tools are SAP, SAS, PeopleSoft, Crystal Reports, Business Objects, Teradata, Informatica, DataMirror, Hyperion, Cognos, & etc. Introduction

Demands for data warehousing and data mining jobs are increasing in retail, pharmaceutical, insurance, airlines, etc. Universities are not lagging behind – they are implementing DW tools to track enrollment & improve student services However, faculty members in Computing and MIS are lagging in teaching and training students in data warehousing, data mining, and BI tools Introduction

No good textbook on data warehousing or data mining is available – mostly research-oriented or tool-specific We have developed a graduate course that provides students with both the theoretical knowledge as well as hands- on practice with data warehousing tools and techniques Introduction

The purpose of the course is to: Acquaint students with fundamental knowledge of data warehouse modeling and design The tools and techniques of data analysis using OLAP (Online Analytical Processing) techniques To acquaint students with data mining concepts, techniques and query language To prepare students for future careers in data warehouse planning, analysis, design, and implementation Course Objective

Specifically, the course provides an understanding of the concepts and knowledge in the following key areas: The difference between a database, a data mart, and a data warehouse operational versus decision support systems the difference between Online Transactional Processing (OLTP) and Online Analytical Processing (OLAP) The data warehouse architecture and environment Course Objective

Specific concepts ….. The Dimensional Model versus Entity- Relationship Model Data warehouse modeling with Star- and Snow-flake schema. The concepts of cube and OLAP processing through roll-up, drill-down, slicing, dicing, and pivot table The ETL process – the concept of data extraction, transformation and loading The concept of data mining techniques such as classification, clustering, estimation and prediction. Course Objective

Application of concepts: Designing and developing a data warehouse using Microsoft SQL Server Analysis Services Development of Data Cubes using SQL Server Analysis Services Application of ETL, OLAP, and data mining tools on cubes using SQL Server Analysis Services The application of knowledge is achieved through a Group Project on a Case problem – specific to an industry Course Objective

Students are required to work in groups to model, design, and develop a data warehouse Each group needs to choose a business industry such as airline, education, retail, financial, insurance, hospitality, investment, and healthcare. They will then collect day-to-day business data found in files, databases, spreadsheets, or text documents in various data formats. Next they will develop a strategy to convert these data into a common format as well as in summarized form. Developing the Case Project

They will then design and develop a multi- dimensional data warehouse Then transfer the data to the warehouse Using the data, they will then develop a multidimensional cube using the SQL Server Data Analysis Services Finally, students will apply OLAP and data mining tools to extract meaningful business intelligence on customers, products, purchases, and so on. Further details of the project requirements are outlined in the course web site. Developing the Case Project …..

It is a Heating, Ventilation, Air Conditioning and Refrigeration (HVAC/R) wholesale company The 50-year-old company has about 260 employees divided into its 215,000 square foot distribution center and 30 branch locations around the state of Texas The company sells equipment, parts, and supplies to licensed contractors who perform work on heating, air conditioning, and refrigeration systems It sells approximately 16,000 line items that come from 2,000 stocking manufacturers All items are invoiced at the branch locations These sales are processed by an online transaction processing (OLTP) system The Case of a HVAC/R Wholesale Company

A simplified representation of the existing OLTP system The Case of a HVAC/R Wholesale Company

Why a Data Warehouse? The OLTP system is adequate for the day-to- day operations of the company But the it is painfully inadequate when it comes to strategic decision support Information requests from the management must be dealt with individually by the IT staff Data aggregations are programmed into reports, but any comparisons across time or products must be done manually Data history in the OLTP system typically goes back 2 years, even though the company has been generating computerized data for over 20 years The Case of a HVAC/R Wholesale Company

Why a Data Warehouse? It will provide a central repository for historical data, provide an integrated platform for historical analysis of sales data It will allow the application of online analytical processing (OLAP) techniques by users themselves With a data warehouse and OLAP, users will perform their own roll-up and drill-down operations to analyze sales across product categories, sub-categories, store regions, individual stores, or any combination desired They will have the flexibility to view data and immediately look at data in another form without sending a request to IT for a new report They will enjoy a true user-friendly decision support system The Case of a HVAC/R Wholesale Company

Dimensional Modeling: To create a data warehouse system we must determine how we are going to extract meaningful data and logically group the data. First step is to prepare an information package, that allows the data warehouse’s designers to layout the requirements for the dimension tables, their hierarchies, and the facts to be modeled Designing the Data Warehouse

Dimension Tables : The information package is then used to create the dimension tables Designing the Data Warehouse

The Fact Table : The fact table used for this project was based on sales information. It contains ten thousand randomly generated sales records for 500 dates beginning on January 1, 2005 Designing the Data Warehouse

Data Hierarchies: All 16,000 products are grouped into categories and each category if further divided into several subcategories This allows related items to be grouped and summarized for high level analysis while retaining the ability to drill down to more specific product detail Designing the Data Warehouse

Data Hierarchies……. Customers are organized into three hierarchies: size, license type and category; and each has further hierarchies The A/C sales are high during the summer, but heating sales are high in winter, thus Time dimension is categorized in two categories Designing the Data Warehouse CustomerTime SizeLicense TypeCategoryCalendarSeasonal Small (2-3 techs) Medium (4-10 techs) Large (11-20 techs) Corporate (21+ techs) “A” license (HVAC and refrigeration) “B” license (HVAC only) HVAC Builder Government Refrigeration Maintenance Year Quarter Month Day Year Season Month Day of the week

Database Schema & Implementation Since the dimension tables were not normalized and the size of these dimensions was not too large, the STAR schema was implemented - first in Microsoft Access database Developing the Data Warehouse

Dimension Implementation Customer dimension table shown along with some actual data implemented in Microsoft Access Developing the Data Warehouse

Dimension Implementation …… The Product Dimension with categories and sub-categories Developing the Data Warehouse

Dimension Implementation …… The Store Dimension with various categories Developing the Data Warehouse

Dimension Implementation …… The Time Dimension with hierarchies Developing the Data Warehouse

Fact Implementation The SalesFact table is implemented according to the design and sales data were populated according to the discussion before. All four fields of the primary key are shown. Note that TimeID values begin with 3 instead of 1. TimeID = 1 is 1/1/2005, a Saturday, and TimeID = 2 is 1/2/2005, a Sunday. All stores were closed on weekends and there were no sales. Developing the Data Warehouse

The dimension and the fact tables in Access database are used to develop a cube using SQL Server Analysis Services Cube Implementation in SQL Server

An example of an OLAP operation on the cube, displaying facts for customers in various categories, which can be further viewed in detail sub-categories OLAP Analysis in SQL Server

View measures by customer category for a specific product OLAP Analysis in SQL Server

View measures by License Type within Branch & Region (slicing & Dicing) OLAP Analysis in SQL Server

We have discussed the process of development of a graduate course on Data Warehousing and Data Mining that: Acquaints students with fundamental knowledge of data warehouse modeling and design Provides concepts of cube and OLAP processing through roll-up, drill-down, slicing, dicing, and pivot table Provides hands-on practice through a Case Problem on the development of a data warehouse, data cube, and OLAP processing using Microsoft SQL Server Analysis Services Conclusion

Data Warehousing Fundamentals - by Paulraj Ponniah, Wiley Publications Microsoft SQL Server 2000 Analysis Services Step by Step - by Reed Jacobson, Microsoft Publications Other reference books used for the course are found in the course web site: Resources for the Course