CSC 177 Data Warehouse and Mining Project. Pooja Vora, Vishma Shah. Guided by Prof. Meiliu Lu.


1 CSC 177 Data Warehouse and Mining Project
Pooja Vora, Vishma Shah
Guided by Prof. Meiliu Lu

2 Agenda
Data Warehouse Project
- Introduction
- Background
- Scope of study
- Implementation
- Data Cleaning and Preprocessing
- Data Mart
Data Mining Project
- Introduction
- Background
- Scope of study
- Implementation
- Data mining
- Learning experience
- Future Scope
- References

3 Data Warehouse Introduction
The objective of our project is to create a data mart with a star schema. The data mart will be used to answer questions about key company factors and statistics.

4 Background
Source website: Navathe company schema
Dataset: Company dataset
Company dataset: fact table with 7 attributes, 1000 entries

5 Scope Of Study
Data Preprocessing: Microsoft Office Excel, Microsoft SQL Server
Data Mart: Microsoft SQL Server, Visio, convertCSVtoSQL
OLAP Operations: SQL Server queries

6 Implementation
- Data Cleaning & Preprocessing
- Data Mart
- OLAP Operations

7 Data Cleaning & Preprocessing
The company schema had different tables, as per Navathe. We also added a few dimensions for analytical processing and created a fact table with a star schema.

8 Data Mart
We have 5 dimension tables and one fact table in our data mart, which together form a star schema. The fact table consists of around 1000 rows holding details such as ssn, project, work_id, etc.
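To make the shape of such a star schema concrete, here is a minimal sketch using Python's built-in sqlite3 module instead of the Microsoft SQL Server used in the project. Only EmpFactTable, DimTime, DimEmp_work_record and the columns ssn, work_id, date_key, date_year, date_month and NumberOfProduct appear in the slides; the other three dimension tables and all of their columns are hypothetical placeholders.

```python
import sqlite3

# In-memory database for illustration; the project itself used SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables whose names appear in the slides
CREATE TABLE DimTime (
    date_key   INTEGER PRIMARY KEY,
    date_year  INTEGER,
    date_month INTEGER
);
CREATE TABLE DimEmp_work_record (
    work_id         INTEGER PRIMARY KEY,
    NumberOfProduct INTEGER
);

-- Hypothetical dimension tables (names and columns are assumptions)
CREATE TABLE DimEmployee (ssn TEXT PRIMARY KEY, emp_name TEXT);
CREATE TABLE DimProject  (project_id INTEGER PRIMARY KEY, project_name TEXT);
CREATE TABLE DimProduct  (product_id INTEGER PRIMARY KEY, product_name TEXT);

-- Fact table: one foreign key per dimension, the centre of the star
CREATE TABLE EmpFactTable (
    ssn        TEXT    REFERENCES DimEmployee(ssn),
    project_id INTEGER REFERENCES DimProject(project_id),
    product_id INTEGER REFERENCES DimProduct(product_id),
    work_id    INTEGER REFERENCES DimEmp_work_record(work_id),
    date_key   INTEGER REFERENCES DimTime(date_key)
);
""")

# List the tables that were created: five dimensions plus one fact table.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

Every dimension is reachable from the fact table through a single join, which is what distinguishes a star schema from a snowflake schema.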

9 Star Schema

10 Data Mart Question-Answers
- How many products were produced over the months? Rollup
- How to find an employee's current working project? Slicing on the employee dimension
- How to find the statistics of days where more than 5 products were produced? Dicing on the product and work dimensions
- How to find on which days, and in what quantity, a particular product was produced? Scoping
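The difference between slicing and dicing can be sketched as follows, again with Python's sqlite3 module rather than SQL Server. The single flattened table work, its columns other than ssn and NumberOfProduct, and all of its rows are made up for illustration and are not the project's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened view of the fact and dimension tables.
conn.execute("""
    CREATE TABLE work (
        ssn TEXT, project TEXT, product TEXT,
        work_date TEXT, NumberOfProduct INTEGER
    )
""")
rows = [  # made-up sample rows
    ("111", "ProjectX", "Widget", "2014-01-05", 7),
    ("111", "ProjectX", "Gadget", "2014-01-06", 3),
    ("222", "ProjectY", "Widget", "2014-01-05", 6),
    ("222", "ProjectY", "Widget", "2014-01-07", 2),
]
conn.executemany("INSERT INTO work VALUES (?, ?, ?, ?, ?)", rows)

# Slice: fix a single value on one dimension (here, one employee).
slice_rows = conn.execute(
    "SELECT DISTINCT project FROM work WHERE ssn = ?", ("111",)
).fetchall()

# Dice: restrict two dimensions at once (product and work record).
dice_rows = conn.execute("""
    SELECT work_date, product, NumberOfProduct
    FROM work
    WHERE product = 'Widget' AND NumberOfProduct > 5
    ORDER BY work_date, ssn
""").fetchall()

print(slice_rows)
print(dice_rows)
```

A slice keeps the full cube along every dimension except the one that is pinned, while a dice carves out a smaller sub-cube by filtering several dimensions together.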

11 OLAP Operations Example
Roll Up

    select t.date_year, t.date_month, sum(w.NumberOfProduct) as 'No. Of Products'
    from EmpFactTable f, DimTime t, DimEmp_work_record w
    where f.date_key = t.date_key
      and f.work_id = w.work_id
    group by date_year, date_month with rollup

    date_year  date_month  No. Of Products
    2014       1           980
    2014       2           761
    2014       3           1274
    2014       4           240
    2014       NULL        3255
    NULL       NULL        3255
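The subtotal rows that WITH ROLLUP adds can be reproduced by hand. The sketch below is plain Python, not the project's SQL Server query; it starts from the four monthly totals reported on the slide and derives the per-year subtotal and the grand total, using None where SQL Server prints NULL. The function name rollup is a hypothetical helper.

```python
from collections import defaultdict

# Monthly totals taken from the result set shown on the slide.
monthly = {(2014, 1): 980, (2014, 2): 761, (2014, 3): 1274, (2014, 4): 240}


def rollup(monthly_totals):
    """Return (year, month, total) rows in the order GROUP BY year, month
    WITH ROLLUP would list them, with None standing in for NULL."""
    per_year = defaultdict(int)
    for (year, _month), total in monthly_totals.items():
        per_year[year] += total

    rows = []
    for year in sorted(per_year):
        # Detail rows for this year, ordered by month.
        for (y, month), total in sorted(monthly_totals.items()):
            if y == year:
                rows.append((year, month, total))
        # Subtotal row: the month column is rolled up.
        rows.append((year, None, per_year[year]))
    # Grand total row: both grouping columns are rolled up.
    rows.append((None, None, sum(per_year.values())))
    return rows


result = rollup(monthly)
for row in result:
    print(row)
```

Because the data covers a single year, the year subtotal and the grand total coincide, which is why the last two rows of the slide's result carry the same value.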

12 Quiz
Which dimension was used for slicing the cube?
- Employee
- Time
- Work
- Product
Answer: Employee

13 Data Mining Project

14 Introduction
- Perform data mining on a data set to discover knowledge
- Apply data mining algorithms using tools
- Compare the performance of the algorithms using these tools
- Compare the performance of the tools

15 Background
Source website: www.data.gov
Dataset: Consumer complaints
Data: 14 attributes, 55000 entries (data from 2012 to 2014)

16 Scope Of Study
Data Preprocessing: Microsoft Office Excel, tools (Weka, RapidMiner)
Data Mining
- Tools: Weka, RapidMiner
- Algorithms: K-Means, Naïve Bayes

17 Implementation
- Data Cleaning & Preprocessing
- Data Mining
- Tools Comparison

18 Data Cleaning & Preprocessing
- Data cleaning: replaced missing values with "unknown"
- Data selection: selected two months (Sept, Oct) of consumer complaints data for mining
- Sample data: 3000 rows selected
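The three preprocessing steps can be sketched in plain Python as below. The project carried them out in Excel and in the mining tools, so this is only an illustration; the field names date_received, product and state, the helper function names, and the sample records are all hypothetical.

```python
import random
from datetime import date


def clean(records):
    """Replace missing (None or empty) field values with 'unknown'."""
    return [
        {key: (value if value not in (None, "") else "unknown")
         for key, value in record.items()}
        for record in records
    ]


def select_months(records, months=(9, 10), date_field="date_received"):
    """Keep only records whose date falls in the given months (Sept, Oct)."""
    return [
        r for r in records
        if isinstance(r[date_field], date) and r[date_field].month in months
    ]


def sample_rows(records, n=3000, seed=0):
    """Draw at most n records at random; return all if fewer are available."""
    if len(records) <= n:
        return list(records)
    return random.Random(seed).sample(records, n)


# Made-up records for illustration only.
raw = [
    {"date_received": date(2014, 9, 3), "product": "Mortgage", "state": ""},
    {"date_received": date(2014, 10, 12), "product": None, "state": "CA"},
    {"date_received": date(2014, 7, 1), "product": "Credit card", "state": "NY"},
]

prepared = sample_rows(select_months(clean(raw)))
print(prepared)
```

Fixing the random seed keeps the sample reproducible, which matters when the same rows have to be loaded into two different tools for comparison.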

19 Data Mining
We used one classification algorithm and one clustering algorithm.
- Classification: Naïve Bayes
- Clustering: K-means

20 Data Mining Demo

21 Tools Comparison: K-Means (RapidMiner, Weka)
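For readers unfamiliar with what both tools compute here, the following is a minimal sketch of the standard K-means (Lloyd's) iteration in plain Python. It is not the Weka or RapidMiner implementation, and it assumes numeric points with squared Euclidean distance, whereas the complaints data is largely categorical and the tools handle such attributes in their own way. The sample points are made up.

```python
import random


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])),
    )


def kmeans(points, k, iterations=20, seed=0):
    """Alternate assignment and centroid update until the centroids settle."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iterations):
        # Assignment step: put every point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    # Final assignment against the centroids that are returned.
    clusters = [[] for _ in range(k)]
    for p in points:
        clusters[nearest(p, centroids)].append(p)
    return centroids, clusters


# Two well-separated made-up groups of 2-D points.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

The result depends on the initial centroids, so two tools can report different clusters for the same data unless the seed and initialisation method match.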

22 Tools Comparison: Naïve Bayes (RapidMiner, Weka)
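As a reminder of what a Naïve Bayes classifier does with categorical attributes, here is a minimal sketch in plain Python with add-one (Laplace) smoothing. It is not the Weka or RapidMiner implementation, and their smoothing and attribute handling may differ. The attribute values, the Yes/No label and the training rows are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train(rows, labels):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (label, attribute index) -> value counts
    vocab = defaultdict(set)             # attribute index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab


def predict(model, row):
    """Return the label with the highest log posterior for row."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior
        for i, value in enumerate(row):
            count = value_counts[(label, i)][value]
            # Add-one smoothing so unseen values do not zero out the product.
            score += math.log((count + 1) / (n + len(vocab[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training data: (product, submission channel) -> hypothetical label.
rows = [
    ("Mortgage", "Web"), ("Mortgage", "Phone"), ("Credit card", "Web"),
    ("Credit card", "Phone"), ("Mortgage", "Web"),
]
labels = ["Yes", "Yes", "No", "No", "Yes"]

model = train(rows, labels)
prediction_a = predict(model, ("Mortgage", "Phone"))
prediction_b = predict(model, ("Credit card", "Web"))
print(prediction_a, prediction_b)
```

The "naïve" part is the assumption that attributes are independent given the class, which is why the per-attribute log probabilities are simply added.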

23 Quiz
Which clustering algorithm was used for data mining?
- K-Means
- EM
Answer: K-Means

24 Learning Experience
- Learned analytical processing through the data mart project
- Improved our knowledge of database statistics
- Learned to extract information from query results
- Learned different data mining tools such as Weka and RapidMiner
- Improved our understanding of various algorithms and their practical implementation through tools
- Learned to make sense of the results obtained from the tools

25 Future Scope
Data Warehouse: create a snowflake schema by introducing dimensions such as employee type (contractor/full-time), then take it further with analytical processing for different statistics.
Data Mining: implement other algorithms and try other tools such as Orange.

26 References
- Elmasri and Navathe, Fundamentals of Database Systems, 6th Edition, Addison-Wesley Publishing
- OLAP Courseware: http://athena.ecs.csus.edu/~olap/olap/introduction.php
- DM dataset: http://www.data.gov/consumer/
- Data Mining Courseware: http://athena.ecs.csus.edu/~datamini
- https://rapidminer.com/wpcontent/uploads/2013/10/RapidMiner_RapidMinerInAcademicUse_en.pdf

27 Questions….

