University of Houston-Clear Lake Kaiser Permanente San Jose

Presentation transcript:

University of Houston-Clear Lake Kaiser Permanente San Jose Trend in the Leading Causes of Death in the USA: A Case Study using a Data Warehouse and OLAP Cube Mohammad A. Rob University of Houston-Clear Lake rob@uhcl.edu Farhana Rob Kaiser Permanente San Jose Farhana.rob@kp.org

Presentation Outline Introduction The Raw Data Why Data Warehousing? Designing the Data Warehouse Designing the OLAP Cube OLAP Reports Conclusion

INTRODUCTION This paper presents how a large amount of unstructured mortality data can be organized into a data warehouse and how key information can then be presented using an OLAP cube. The OLAP reports show the shifting trend in the leading causes of death in the USA. The reports present the top six causes of death by location, time, age group, and race. Knowing these leading causes of death will help the general public take preventive action.

The Raw Data The Centers for Disease Control and Prevention (CDC) publishes data on various causes of death in the USA. https://www.healthdata.gov/dataset/community-health-status-indicators-chsi-combat-obesity-heart-disease-and-cancer The data cover 3141 US counties across all states, for many years, with various ages and races. However, these data are not organized in a way that supports conclusions by state, disease, race, year, or age group.

The Raw Data Source

The Raw Data Data can be downloaded in CSV format for a selected number of years and opened in Microsoft Excel. There are 13 data files with thousands of records and about a hundred attributes. We downloaded data for three years (2012-2014) as a proof of concept.

Sample Raw Data in Excel

Why Data Warehousing? A data warehouse allows a large amount of data to be stored in a consistent format so that users can query it in a variety of ways to obtain business intelligence. Typically, a dimensional model or star schema is used to design the data warehouse, which simplifies query processing over a large amount of data. An Online Analytical Processing (OLAP) tool can then be used to present the data; it provides an interactive interface that lets top management create reports on an ad hoc basis.

Why Data Warehousing? The concept of the data warehouse as the back end and the OLAP cube as the front end. [Figure: OLAP cube (front end) built on the data warehouse (back end)]

Designing the Data Warehouse Before designing the data warehouse, the raw data needed to be cleaned and formatted to fit into dimensions and facts. We filtered the raw data and dropped many columns so that we could focus on the important dimensions: cause of death, time, location, race, and age group. The fact is the number of deaths. The dimensions and facts are organized into separate Excel sheets, with a primary key (PK) for each dimension and foreign keys (FKs) in the fact sheet. All Excel data are then transferred into a Microsoft Access database.
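The split into dimension sheets with primary keys and a fact sheet with foreign keys can be sketched in plain Python. This is only an illustration under stated assumptions, not the Excel/Access workflow used in the project: the column names (county, state, cause, year, deaths, source_note) and the sample rows are hypothetical, and only three of the five dimensions are shown.

```python
# Hypothetical raw records; column names and values are illustrative only.
raw_rows = [
    {"county": "Harris", "state": "Texas", "cause": "Cancer",
     "year": 2012, "deaths": 10, "source_note": "a"},
    {"county": "Harris", "state": "Texas", "cause": "Heart disease",
     "year": 2012, "deaths": 12, "source_note": "b"},
    {"county": "Fulton", "state": "Georgia", "cause": "Cancer",
     "year": 2013, "deaths": 7, "source_note": "c"},
]


def build_dimension(rows, columns):
    """Assign a surrogate primary key (1, 2, ...) to each distinct member."""
    keys = {}
    for row in rows:
        member = tuple(row[c] for c in columns)
        if member not in keys:
            keys[member] = len(keys) + 1
    return keys


# One lookup per dimension: member -> primary key.
location_dim = build_dimension(raw_rows, ["county", "state"])
cause_dim = build_dimension(raw_rows, ["cause"])
time_dim = build_dimension(raw_rows, ["year"])

# The fact rows keep only foreign keys and the measure; the unused
# source_note column is dropped, mirroring the column filtering step.
fact_rows = [
    {
        "location_id": location_dim[(r["county"], r["state"])],
        "cause_id": cause_dim[(r["cause"],)],
        "time_id": time_dim[(r["year"],)],
        "deaths": r["deaths"],
    }
    for r in raw_rows
]
```

Each descriptive value is stored once in its dimension, and the fact rows carry only integer keys plus the number of deaths.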

Designing the Data Warehouse From the Access database, the data were then transferred to a Microsoft SQL Server data warehouse. The dimensional hierarchies allow browsing summarized data at various levels of detail. The Location dimension has the hierarchy County -> State. The Time dimension has the hierarchy Month -> Quarter -> Year.
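How a hierarchy such as Month -> Quarter -> Year supports summaries at different levels can be sketched as follows. The function names and the (year, month, deaths) sample rows are hypothetical and are not taken from the project data.

```python
from collections import defaultdict


def quarter_of(month):
    """Map a month number (1-12) to its quarter (1-4)."""
    return (month - 1) // 3 + 1


# Hypothetical (year, month, deaths) rows at the finest level of the hierarchy.
monthly = [(2012, 1, 5), (2012, 2, 3), (2012, 4, 4), (2013, 12, 6)]


def roll_up(rows, level):
    """Sum deaths at the requested level of the Month -> Quarter -> Year hierarchy."""
    totals = defaultdict(int)
    for year, month, deaths in rows:
        if level == "month":
            key = (year, quarter_of(month), month)
        elif level == "quarter":
            key = (year, quarter_of(month))
        elif level == "year":
            key = (year,)
        else:
            raise ValueError(f"unknown level: {level}")
        totals[key] += deaths
    return dict(totals)


by_quarter = roll_up(monthly, "quarter")
by_year = roll_up(monthly, "year")
```

Moving from "month" to "quarter" to "year" is a roll-up; moving in the opposite direction is a drill-down over the same rows.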

Designing the Data Warehouse Dimensions and Hierarchies

Designing the Data Warehouse The fact table contains the foreign keys from the dimension tables and the measures (the number of deaths and any other measurable quantity, such as population in our case).
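A fact table of this shape can be sketched with Python's built-in sqlite3 module standing in for SQL Server. The table names, column names, and rows below are assumptions made for illustration, not the project's actual schema, and the race and age group dimensions are omitted for brevity.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the SQL Server warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, county TEXT, state TEXT);
CREATE TABLE dim_cause    (cause_id INTEGER PRIMARY KEY, cause TEXT);
CREATE TABLE dim_time     (time_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER, month INTEGER);
CREATE TABLE fact_deaths (
    location_id INTEGER REFERENCES dim_location(location_id),
    cause_id    INTEGER REFERENCES dim_cause(cause_id),
    time_id     INTEGER REFERENCES dim_time(time_id),
    deaths      INTEGER,   -- measure
    population  INTEGER    -- measure
);
""")

# Hypothetical sample rows, not values from the CDC files.
conn.executemany("INSERT INTO dim_location VALUES (?, ?, ?)",
                 [(1, "Harris", "Texas"), (2, "Dallas", "Texas"), (3, "Fulton", "Georgia")])
conn.executemany("INSERT INTO dim_cause VALUES (?, ?)",
                 [(1, "Cancer"), (2, "Heart disease")])
conn.execute("INSERT INTO dim_time VALUES (1, 2012, 1, 1)")
conn.executemany("INSERT INTO fact_deaths VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, 1, 10, 1000), (2, 1, 1, 5, 800),
                  (3, 2, 1, 7, 900), (1, 2, 1, 4, 1000)])

# A typical star-schema query: join the fact table to its dimensions
# and aggregate the measure by dimension attributes.
state_totals = conn.execute("""
    SELECT l.state, c.cause, SUM(f.deaths)
    FROM fact_deaths AS f
    JOIN dim_location AS l ON l.location_id = f.location_id
    JOIN dim_cause    AS c ON c.cause_id = f.cause_id
    GROUP BY l.state, c.cause
    ORDER BY l.state, c.cause
""").fetchall()
```

Because every descriptive attribute lives in a dimension table, any report reduces to joining the fact table to the relevant dimensions and grouping by the attributes of interest.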

Designing the Data Warehouse The Dimensional Model or Star Schema of Our Project

Designing the OLAP Cube The OLAP cube was created using the Microsoft Visual Studio Business Intelligence tool in conjunction with SQL Server Analysis Services. It allows data from the data warehouse to be summarized in a variety of ways. The cube supports browsing summarized data through operations such as slicing, dicing, roll-up, drill-down, and pivoting. Data from the cube are exported to a Microsoft Excel PivotTable for analysis and reporting.
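The cube operations named above can be illustrated on a handful of in-memory records. This is a pure-Python sketch of the concepts, not how SQL Server Analysis Services implements them; the helper names and sample cells are hypothetical.

```python
from collections import defaultdict

# Hypothetical cube cells; values are illustrative, not from the CDC data.
cells = [
    {"state": "Texas", "county": "Harris", "cause": "Cancer", "year": 2012, "deaths": 10},
    {"state": "Texas", "county": "Dallas", "cause": "Cancer", "year": 2013, "deaths": 5},
    {"state": "Texas", "county": "Harris", "cause": "Injuries", "year": 2012, "deaths": 3},
    {"state": "Georgia", "county": "Fulton", "cause": "Cancer", "year": 2012, "deaths": 7},
]


def slice_cube(rows, dim, value):
    """Slice: fix one dimension to a single value."""
    return [r for r in rows if r[dim] == value]


def dice_cube(rows, **allowed):
    """Dice: keep cells whose value is in the allowed set for each named dimension."""
    return [r for r in rows if all(r[d] in vals for d, vals in allowed.items())]


def aggregate(rows, dims):
    """Sum deaths grouped by dims; fewer dims is a roll-up, more dims a drill-down."""
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[d] for d in dims)] += r["deaths"]
    return dict(totals)


def pivot(rows, row_dim, col_dim):
    """Pivot: arrange totals with one dimension as rows and another as columns."""
    table = defaultdict(dict)
    for (row_value, col_value), total in aggregate(rows, [row_dim, col_dim]).items():
        table[row_value][col_value] = total
    return dict(table)


cancer = slice_cube(cells, "cause", "Cancer")
by_state = aggregate(cancer, ["state"])                       # roll-up to state
texas_counties = aggregate(slice_cube(cancer, "state", "Texas"),
                           ["state", "county"])               # drill-down to county
texas_2012 = dice_cube(cells, state={"Texas"}, year={2012})   # dice on two dimensions
cause_by_year = pivot(cells, "cause", "year")
```

The same aggregate function serves both roll-up and drill-down; only the list of grouping dimensions changes.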

A View of the OLAP Cube

OLAP Reports Overall Summary Report: Total Number of Deaths by Year: Somewhat more in 2014

OLAP Reports Number of Deaths by Year for the Top Six Causes: Heart disease and cancer are the leading causes, followed by injuries

OLAP Reports Number of Deaths due to Cancer for All States over Three Years: Texas is leading, followed by Georgia, Virginia…

OLAP Reports Further Drill-Down on Texas Counties: Number of Deaths due to Cancer for All Counties in the State of Texas

OLAP Reports Drill-Down to Time Dimension: Number of Deaths due to Injuries in Various Quarters for the Year 2012

OLAP Reports Drill-Down to Age Group Dimension: Number of Deaths due to Various Causes for Different Age Groups: Heart disease and cancer are the leading causes, followed by injuries.

OLAP Reports Drill-Down to Race Dimension: Number of Deaths due to Various Causes for Various Race Groups: Again, heart disease and cancer are the leading causes of death for each race, followed by injuries.

Conclusion We have discussed how a data warehouse can be used to store a large amount of data in a suitable format after the data go through a cleaning and formatting process. We have also discussed how an OLAP cube can be created from the data warehouse to display various ad hoc reports at various levels of detail. In our example problem, we have shown summarized reports in a variety of ways using various dimensional attributes.

Conclusion The number of deaths is found to increase each year. Heart disease is the #1 cause of death, accounting for about 28% of total deaths. About 21% of deaths are due to cancer. Among the younger age groups (<35 years), the major cause of death is injuries. Among the older age groups (>40 years), the major causes of death are heart disease and cancer.

Thank You