CSC 177 Data Warehouse and Mining Project, Pooja Vora and Vishma Shah, Guided by: Prof. Meiliu Lu


Agenda
Data Warehouse Project
- Introduction
- Background
- Scope of study
- Implementation
- Data Cleaning and Preprocessing
- Data Mart
Data Mining Project
- Introduction
- Background
- Scope of study
- Implementation
- Data mining
Learning Experience
Future Scope
References

Data Warehouse Introduction
The objective of our project is to create a data mart with a star schema. The data mart will be used to answer questions about key company factors and statistics.

Background
- Source website: Navathe COMPANY schema
- Dataset: COMPANY dataset
- Fact table: 7 attributes, 1,000 entries

Scope of Study
- Data preprocessing: Microsoft Office Excel, Microsoft SQL Server
- Data mart: Microsoft SQL Server, Visio, convertCSVtoSQL
- OLAP operations: SQL Server queries

Implementation: Data Cleaning & Preprocessing, Data Mart, OLAP Operations

Data Cleaning & Preprocessing
The COMPANY schema, as given in Elmasri and Navathe, consists of several tables. We added a few dimensions for analytical processing and created a fact table arranged in a star schema.

Data Mart
Our data mart has five dimension tables and one fact table, which together form a star schema. The fact table contains about 1,000 rows with details such as ssn, project, and work_id.
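A minimal sketch of how one fact table and five dimension tables fit together in a star schema. It uses Python's built-in sqlite3 instead of SQL Server so it runs anywhere. EmpFactTable, DimTime, DimEmp_work_record, ssn, work_id, date_key and NumberOfProduct come from the slides; DimEmployee, DimProject, DimProduct and every other column are hypothetical placeholders.

```python
import sqlite3

# Hypothetical star schema: the fact table holds only foreign keys, and each
# key points at one dimension table. Names not shown in the slides are assumed.
DDL = """
CREATE TABLE DimTime (
    date_key   INTEGER PRIMARY KEY,
    date_year  INTEGER,
    date_month INTEGER,
    date_day   INTEGER
);
CREATE TABLE DimEmp_work_record (
    work_id         INTEGER PRIMARY KEY,
    NumberOfProduct INTEGER
);
CREATE TABLE DimEmployee (ssn TEXT PRIMARY KEY, name TEXT);                 -- assumed
CREATE TABLE DimProject  (project_key INTEGER PRIMARY KEY, pname TEXT);      -- assumed
CREATE TABLE DimProduct  (product_key INTEGER PRIMARY KEY, product_name TEXT); -- assumed
CREATE TABLE EmpFactTable (
    ssn         TEXT    REFERENCES DimEmployee(ssn),
    project_key INTEGER REFERENCES DimProject(project_key),
    work_id     INTEGER REFERENCES DimEmp_work_record(work_id),
    date_key    INTEGER REFERENCES DimTime(date_key),
    product_key INTEGER REFERENCES DimProduct(product_key)
);
"""


def build_star_schema() -> sqlite3.Connection:
    """Create the empty star schema in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    return conn


def dimension_tables(conn: sqlite3.Connection) -> list[str]:
    """List the dimension tables (the points of the star), sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name LIKE 'Dim%' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    print(dimension_tables(build_star_schema()))
```

Every analytical query then joins the fact table to only the dimensions it needs, which keeps the queries in the following slides to one join per dimension.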

Star Schema

Data Mart Question-Answers
- How many products were produced over the months? Roll-up
- How do we find an employee's current project? Slicing on the employee dimension
- How do we find statistics for the days on which more than 5 products were produced? Dicing on the product and work dimensions
- On which days, and how many units of a particular product, were produced? Scoping
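A minimal sketch of slicing and dicing on an in-memory list of fact rows, under the assumption that each row has already been joined with its dimension attributes. A slice fixes one dimension to a single value, and a dice keeps a subset of values on one or more dimensions. The FactRow fields, helper names and sample rows are all hypothetical; the project itself answered these questions with SQL Server queries.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FactRow:
    """One fact row joined with its (hypothetical) dimension attributes."""
    employee: str  # employee dimension
    project: str   # project the employee worked on
    day: str       # time dimension, ISO date string
    product: str   # product dimension
    units: int     # measure: number of products produced


def slice_cube(rows, **fixed):
    """Slice: fix exactly one dimension to a single value, e.g. employee="E1"."""
    if len(fixed) != 1:
        raise ValueError("a slice fixes exactly one dimension")
    (dim, value), = fixed.items()
    return [r for r in rows if getattr(r, dim) == value]


def dice_cube(rows, **allowed):
    """Dice: keep rows whose values lie in a chosen subset on each given dimension."""
    return [r for r in rows if all(getattr(r, d) in vals for d, vals in allowed.items())]


def current_project(rows, employee):
    """Slice on the employee dimension, then return the project of the latest day."""
    emp_rows = slice_cube(rows, employee=employee)
    return max(emp_rows, key=lambda r: r.day).project if emp_rows else None


def busy_days(rows, threshold=5, **allowed):
    """Dice, then return the days whose total units exceed the threshold."""
    totals = defaultdict(int)
    for r in dice_cube(rows, **allowed):
        totals[r.day] += r.units
    return sorted(day for day, total in totals.items() if total > threshold)


SAMPLE_ROWS = [  # made-up data for illustration only
    FactRow("E1", "Alpha", "2014-09-01", "P1", 4),
    FactRow("E1", "Beta", "2014-09-03", "P2", 7),
    FactRow("E2", "Alpha", "2014-09-01", "P1", 3),
    FactRow("E2", "Alpha", "2014-09-02", "P2", 2),
]
```

With the sample rows, `current_project(SAMPLE_ROWS, "E1")` returns `"Beta"`, and `busy_days(SAMPLE_ROWS, 5, product={"P1"})` returns `["2014-09-01"]` because 4 + 3 units of P1 were produced that day.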

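OLAP Operations Example: Roll-Up

```sql
-- Products produced per month, with a subtotal per year and a grand total (T-SQL).
-- GROUP BY ROLLUP (...) is the ISO form of the original GROUP BY ... WITH ROLLUP.
SELECT t.date_year,
       t.date_month,
       SUM(w.NumberOfProduct) AS [No. Of Products]
FROM EmpFactTable AS f
JOIN DimTime AS t
  ON f.date_key = t.date_key
JOIN DimEmp_work_record AS w
  ON f.work_id = w.work_id
GROUP BY ROLLUP (t.date_year, t.date_month);
```

The last rows of the output are the year subtotal (date_month = NULL, 3255 products) and the grand total (date_year = NULL, date_month = NULL, 3255 products).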
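SQL Server computes ROLLUP natively. SQLite, which ships with Python, does not support ROLLUP, so this sketch spells out the three grouping levels with UNION ALL to show which rows the roll-up query produces. Table and column names follow the slide; the sample rows are made up.

```python
import sqlite3

SCHEMA = """
CREATE TABLE DimTime (date_key INTEGER PRIMARY KEY, date_year INTEGER, date_month INTEGER);
CREATE TABLE DimEmp_work_record (work_id INTEGER PRIMARY KEY, NumberOfProduct INTEGER);
CREATE TABLE EmpFactTable (date_key INTEGER, work_id INTEGER);
"""

# SQLite has no ROLLUP, so the grouping levels are written out and unioned:
# (year, month) detail rows, per-year subtotals, and the grand total.
ROLLUP_SQL = """
WITH base AS (
    SELECT t.date_year AS y, t.date_month AS m, w.NumberOfProduct AS n
    FROM EmpFactTable AS f
    JOIN DimTime AS t ON f.date_key = t.date_key
    JOIN DimEmp_work_record AS w ON f.work_id = w.work_id
)
SELECT y, m, SUM(n) FROM base GROUP BY y, m
UNION ALL
SELECT y, NULL, SUM(n) FROM base GROUP BY y
UNION ALL
SELECT NULL, NULL, SUM(n) FROM base
"""


def rollup_products(conn):
    """Return (year, month, total) rows; None marks a rolled-up level, like NULL in SQL Server."""
    rows = conn.execute(ROLLUP_SQL).fetchall()
    # Each year's months first, then that year's subtotal, then the grand total.
    return sorted(rows, key=lambda r: (r[0] is None, r[0] or 0, r[1] is None, r[1] or 0))


def demo_connection():
    """In-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO DimTime VALUES (?, ?, ?)", [(1, 2014, 9), (2, 2014, 10)])
    conn.executemany("INSERT INTO DimEmp_work_record VALUES (?, ?)", [(10, 5), (11, 7), (12, 3)])
    conn.executemany("INSERT INTO EmpFactTable VALUES (?, ?)", [(1, 10), (1, 11), (2, 12)])
    return conn
```

`rollup_products(demo_connection())` returns `[(2014, 9, 12), (2014, 10, 3), (2014, None, 15), (None, None, 15)]`. The last two rows have the same shape as the subtotal and grand-total rows on the slide.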

Quiz: Which dimension was used for slicing the cube? Employee / Time / Work / Product
Answer: Employee

Data Mining Project

Introduction
- Perform data mining on a dataset to discover knowledge
- Apply data mining algorithms using tools
- Compare the performance of the algorithms in these tools
- Compare the performance of the tools themselves

Background
- Source website: …
- Dataset: Consumer complaints
- Data: 14 attributes, … entries (data from 2012 to 2014)

Scope of Study
- Data preprocessing: Microsoft Office Excel, tools (Weka, RapidMiner)
- Data mining tools: Weka, RapidMiner
- Algorithms: K-Means, Naïve Bayes

Implementation: Data Cleaning & Preprocessing, Data Mining, Tools Comparison

Data Cleaning & Preprocessing
- Data cleaning: replaced missing values with "unknown"
- Data selection: selected the consumer complaint data for two months (September and October) for mining
- Sampling: selected a sample of 3,000 rows
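A minimal sketch of the three preprocessing steps, assuming each complaint is a dict with a hypothetical `date_received` field in `YYYY-MM-DD` form. The slides do not say how the 3,000 rows were chosen, so a seeded simple random sample is an assumption here. The project itself did this step in Excel.

```python
import random

MISSING = ("", None)


def fill_missing(record):
    """Replace empty or absent values with the string "unknown"."""
    return {k: ("unknown" if v in MISSING else v) for k, v in record.items()}


def month_of(value):
    """Month number from an assumed 'YYYY-MM-DD' string, or None if it cannot be parsed."""
    try:
        return int(value[5:7])
    except (TypeError, ValueError):
        return None


def preprocess(records, months=(9, 10), n=3000, seed=42, date_field="date_received"):
    """Keep September/October complaints, fill missing values, then sample n rows."""
    selected = [r for r in records if month_of(r.get(date_field)) in months]
    cleaned = [fill_missing(r) for r in selected]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(cleaned, min(n, len(cleaned)))  # all rows if fewer than n
```

The month filter runs before missing values are filled. That way, a complaint with no date is dropped instead of being kept with `"unknown"` as its date.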

Data Mining
We used one classification algorithm and one clustering algorithm:
- Classification: Naïve Bayes
- Clustering: K-Means
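To show what the tools compute under the hood, here is a minimal from-scratch sketch of both algorithms. It is not Weka's or RapidMiner's implementation. The Naïve Bayes part handles categorical attributes (like most complaint fields) and uses Laplace smoothing. The K-Means part is Lloyd's iteration on numeric tuples; Weka and RapidMiner also handle nominal attributes in their own ways. All function names and example values are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class priors and per-attribute value frequencies for categorical rows."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, attribute index) -> value counts
    vocab = defaultdict(set)             # attribute index -> values seen in training
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab


def predict_naive_bayes(model, row):
    """Pick the class maximising log P(c) + sum_i log P(x_i | c), Laplace-smoothed."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)
        for i, value in enumerate(row):
            count = value_counts[(c, i)][value]
            score += math.log((count + 1) / (n_c + len(vocab[i])))
        if score > best_score:
            best, best_score = c, score
    return best


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: assign each point to its nearest centroid, recompute means, repeat."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct input points
    assignment = None
    for _ in range(iters):
        new_assignment = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_assignment == assignment:  # converged: no point changed cluster
            break
        assignment = new_assignment
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centroids, assignment
```

For example, after training on `[("mortgage", "web"), ("mortgage", "phone"), ("card", "web"), ("card", "web")]` with labels `["yes", "yes", "no", "no"]`, the classifier predicts `"yes"` for `("mortgage", "phone")`.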

Data Mining Demo

Tools Comparison: K-Means (RapidMiner vs. Weka)

Tools Comparison: Naïve Bayes (RapidMiner vs. Weka)

Quiz: Which clustering algorithm was used for data mining? K-Means / EM
Answer: K-Means

Learning Experience
- Learned analytical processing through the data mart project
- Improved our knowledge of database statistics
- Learned to extract information from query results
- Learned to use data mining tools such as Weka and RapidMiner
- Improved our understanding of various algorithms and their practical implementation in these tools
- Learned to interpret the results produced by the tools

Future Scope
- Data warehouse: create a snowflake schema by introducing a dimension such as employee type (contractor or full-time), then extend the analytical processing to more statistics
- Data mining: implement other algorithms and tools, such as Orange
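A minimal sketch of the proposed snowflake change, assuming hypothetical table names DimEmployee and DimEmployeeType and a hypothetical `units` measure. Employee type moves out of the employee dimension into its own normalised table, so the fact table reaches it through one extra join. It uses Python's sqlite3 so it runs without SQL Server.

```python
import sqlite3

# Hypothetical snowflaked employee dimension: DimEmployee references DimEmployeeType.
SNOWFLAKE_DDL = """
CREATE TABLE DimEmployeeType (
    emp_type_key INTEGER PRIMARY KEY,
    emp_type     TEXT NOT NULL          -- e.g. 'Contractor' or 'Fulltime'
);
CREATE TABLE DimEmployee (
    ssn          TEXT PRIMARY KEY,
    emp_type_key INTEGER REFERENCES DimEmployeeType(emp_type_key)
);
CREATE TABLE EmpFactTable (
    ssn   TEXT REFERENCES DimEmployee(ssn),
    units INTEGER                       -- hypothetical measure
);
"""

# Fact -> employee -> employee type: the extra hop is what makes it a snowflake.
UNITS_BY_TYPE = """
SELECT et.emp_type, SUM(f.units) AS total_units
FROM EmpFactTable AS f
JOIN DimEmployee     AS e  ON f.ssn = e.ssn
JOIN DimEmployeeType AS et ON e.emp_type_key = et.emp_type_key
GROUP BY et.emp_type
ORDER BY et.emp_type
"""


def units_by_employee_type(conn):
    """Total units produced per employee type."""
    return conn.execute(UNITS_BY_TYPE).fetchall()


def demo():
    """In-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SNOWFLAKE_DDL)
    conn.executemany("INSERT INTO DimEmployeeType VALUES (?, ?)", [(1, "Contractor"), (2, "Fulltime")])
    conn.executemany("INSERT INTO DimEmployee VALUES (?, ?)", [("111", 1), ("222", 2), ("333", 2)])
    conn.executemany("INSERT INTO EmpFactTable VALUES (?, ?)", [("111", 4), ("222", 6), ("333", 5), ("111", 1)])
    return conn
```

`units_by_employee_type(demo())` returns `[("Contractor", 5), ("Fulltime", 11)]`. The trade-off compared with the star schema is one more join per query in exchange for storing each employee type only once.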

References
- Elmasri and Navathe, Fundamentals of Database Systems, 6th Edition, Addison-Wesley Publishing
- OLAP Courseware
- DM dataset
- Data Mining Courseware
- pidMinerInAcademicUse_en.pdf

Questions….