1
Data Mining
Chun-Hung Chou
g834008@alumni.nthu.edu.tw

2
Outline
- Data Mining Overview
- Functionalities
- Examples
- Q & A

3
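Knowledge Discovery Process
1. Data cleaning - remove noise and inconsistent data
2. Data integration - combine multiple data sources
3. Data selection - retrieve the data relevant to the analysis task
4. Data transformation - transform or consolidate the data into forms appropriate for mining
5. Data mining - apply methods that extract patterns from the data
6. Pattern evaluation - identify the truly interesting patterns
7. Knowledge presentation - present the mined knowledge to the user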
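
The seven steps above can be sketched as a chain of small Python functions. This is a minimal illustration under stated assumptions, not the presenter's implementation: the toy records, the field names (`id`, `amount`, `region`), the "mean amount per region" stand-in for the mining step and the 0.5 interestingness threshold are all hypothetical.

```python
# Hypothetical toy data: two sources that share a customer id.
sales = [
    {"id": 1, "amount": 120.0},
    {"id": 2, "amount": None},   # missing value, dropped during cleaning
    {"id": 3, "amount": 80.0},
    {"id": 3, "amount": 80.0},   # exact duplicate, dropped during cleaning
    {"id": 4, "amount": 300.0},
]
regions = {1: "north", 3: "south", 4: "north"}


def clean(records):
    """1. Data cleaning: drop records with missing values and exact duplicates."""
    seen, out = set(), []
    for r in records:
        if None in r.values():
            continue
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out


def integrate(records, lookup):
    """2. Data integration: join a second source on the shared id."""
    return [{**r, "region": lookup[r["id"]]} for r in records if r["id"] in lookup]


def select(records, fields):
    """3. Data selection: keep only the attributes relevant to the task."""
    return [{f: r[f] for f in fields} for r in records]


def transform(records, field):
    """4. Data transformation: min-max normalize one numeric attribute to [0, 1]."""
    values = [r[field] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    return [{**r, field: (r[field] - lo) / span} for r in records]


def mine(records):
    """5. Data mining (toy stand-in): mean normalized amount per region."""
    groups = {}
    for r in records:
        groups.setdefault(r["region"], []).append(r["amount"])
    return {region: sum(v) / len(v) for region, v in groups.items()}


def evaluate(patterns, threshold):
    """6. Pattern evaluation: keep patterns that pass an interestingness threshold."""
    return {k: v for k, v in patterns.items() if v >= threshold}


prepared = transform(select(integrate(clean(sales), regions), ["region", "amount"]), "amount")
patterns = evaluate(mine(prepared), threshold=0.5)
print(patterns)  # 7. Knowledge presentation (here simply printed)
```

On this toy data only the `north` region passes the threshold; in practice each step would be far richer, but the order of the steps is the point.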

4
What is Data Mining?
- Searching for knowledge (interesting patterns) in your data
- A process that uses a variety of data analysis tools to discover patterns and relationships in data
- Uses tools from computer science and artificial intelligence as well as statistics

5
Why do we need data mining?
– Large number of records (cases): 10^8 to 10^12 bytes
– High-dimensional data (variables): 10 to 10^4 attributes
– Only a small portion, typically 5% to 10%, of the collected data is ever analyzed.
– Data that may never be explored continues to be collected, out of fear that something that may prove important in the future would otherwise be missing.
– The magnitude of the data precludes most traditional analysis (ANOVA/PC/)

6
Goals of Data Mining
Prediction
- Use some variables or fields in the data set to predict unknown or future values of other variables of interest
- Produce a model, expressed as executable code, that can be used to perform classification, prediction, estimation or other similar tasks
Description
- Find patterns describing the data that can be interpreted by humans
- Gain an understanding of the analyzed system by uncovering patterns and relationships in large data sets

7
Procedure of Data Mining
1. State the problem
2. Collect the data
3. Perform preprocessing
4. Estimate the model (mine the data)
5. Interpret the model & draw the conclusions

8
State the problem
– Domain-specific knowledge and experience are necessary in order to come up with a meaningful problem statement
– A close interaction between the data mining expert and the application expert is required
– This cooperation does not stop in the initial phase; it continues during the entire data mining process

9
Collect the data
– Designed experiment: the data-generation process is under the control of an expert
– Observational approach: the expert cannot influence the data-generation process (random data generation)

10
Preprocessing the data
Outlier detection
a) Detect and, where appropriate, remove outliers as part of the preprocessing phase
b) Develop robust modeling methods that are insensitive to outliers
Scaling, encoding and selecting features
a) Scale variables that have different ranges
b) Reduce dimensionality
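
The scaling, encoding and feature-selection steps can be illustrated with a few standard-library helpers. This is a hedged sketch: the function names and sample values are hypothetical, and the zero-variance filter is only a trivial stand-in for real dimensionality reduction such as PCA.

```python
from statistics import mean, pstdev


def min_max(values):
    """Rescale to [0, 1] so variables measured on different scales become comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Standardize to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def one_hot(labels):
    """Encode a categorical variable as one 0/1 indicator column per category."""
    categories = sorted(set(labels))
    return categories, [[1 if label == c else 0 for c in categories] for label in labels]


def drop_constant(columns):
    """A very simple feature-selection step: drop columns with zero variance."""
    return {name: col for name, col in columns.items() if pstdev(col) > 0}


# Hypothetical variables on very different scales.
temperature = [20.0, 25.0, 30.0]
pressure = [1000.0, 1500.0, 2000.0]
print(min_max(temperature), min_max(pressure))
print(one_hot(["red", "blue", "red"]))
```

Both variables map to `[0.0, 0.5, 1.0]` after min-max scaling, so neither dominates a distance computation merely because of its units.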

11
Estimate the model Selection and implementation of the appropriate data mining technique

12
Interpret the model & draw conclusions
- Decision making
- Validate the results

13
Potential Applications
– Fraud detection
– Manufacturing processes
– Targeting markets
– Scientific data analysis
– Risk management
– Web intelligence
– Bioinformatics
– …

14
Data Mining Myths
- Data mining tools need no guidance.
- Data mining models explain behavior.
- Data mining requires no data analysis skill.
- Data mining eliminates the need to understand your business and your data.
- Data mining tools are “different” from statistics.

15
Data Mining Functionalities
- Concept/Class Description
- Association Analysis
- Classification Analysis
- Cluster Analysis
- Outlier Analysis
- Evolution Analysis

16
Concept Description
Generate descriptions for characterization and comparison of data
- Characterization: summarizes and describes a collection of data, e.g. mean, distribution, percentile, ...
- Comparison: summarizes and distinguishes one collection of data from other collection(s) of data
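
Characterization and comparison can be sketched with Python's `statistics` module. This is a minimal illustration, assuming numeric data and two hypothetical classes `class_a` and `class_b`; the chosen statistics (count, mean, median, quartiles, a fixed-width histogram) are examples, not a complete concept-description method.

```python
from collections import Counter
from statistics import mean, median, quantiles


def characterize(values):
    """Characterization: summarize a single collection of numeric data."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles; needs at least two values
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "q1": q1,
        "q3": q3,
    }


def histogram(values, width):
    """A coarse view of the distribution: counts per bin of the given width."""
    return dict(sorted(Counter(int(v // width) * width for v in values).items()))


def compare(target, contrast):
    """Comparison: each statistic for the target class, the contrasting class, and their difference."""
    t, c = characterize(target), characterize(contrast)
    return {k: (t[k], c[k], t[k] - c[k]) for k in t}


# Hypothetical measurements for two classes of records.
class_a = [3.1, 3.5, 3.8, 3.9]
class_b = [2.5, 2.9, 3.0, 3.4]
for stat, (a, b, diff) in compare(class_a, class_b).items():
    print(f"{stat:>6}: {a:6.3f} {b:6.3f} {diff:+6.3f}")
print(histogram(class_a, 0.5))
```

The side-by-side table is the "comparison"; each column on its own is a "characterization" of one class.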

17
Association Analysis
Goal: find interesting relationships among items in a given data set

18
Association Analysis Example: Market Basket Analysis (an example of rule-based machine learning)
Customer Analysis
– Market Basket Analysis uses information about what a customer purchases to give us insight into who they are and why they make certain purchases
Product Analysis
– Market Basket Analysis gives us insight into the merchandise by telling us which products tend to be purchased together and which are most amenable to purchase
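
Market basket analysis is commonly expressed as association rules scored by support (how often the items appear together) and confidence (how often the right-hand item appears when the left-hand one does). The sketch below is deliberately simplified: it only considers single-item rules (A => B) and counts pairs by brute force instead of running a full algorithm such as Apriori; the transactions, function name and thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, one set of items per customer basket.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def pair_rules(baskets, min_support, min_confidence):
    """Return single-item rules lhs => rhs as (lhs, rhs, support, confidence)."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    # Count every unordered pair of items that appears together in a basket.
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (x, y), together in pair_counts.items():
        support = together / n  # fraction of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((x, y), (y, x)):
            confidence = together / item_counts[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # Strongest rules first; ties broken alphabetically for a stable order.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


for lhs, rhs, s, c in pair_rules(baskets, min_support=0.5, min_confidence=0.8):
    print(f"{lhs} => {rhs}  support={s:.2f}  confidence={c:.2f}")
```

With these thresholds only `beer => diapers` survives (support 0.60, confidence 1.00); lowering `min_confidence` to 0.7 admits seven more rules with confidence 0.75.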

19
Classification Analysis
Goal: build a model that describes a predetermined set of data classes or concepts, and use the model for prediction

20
Classification Analysis Method:
- Decision tree
- Bayesian network
- Bayesian belief network
- Neural network
- k-nearest neighbor
- Case-based reasoning
- Genetic algorithm
- Rough sets
- Fuzzy logic
- SVM/SOM
- …
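
Of the methods listed, k-nearest neighbor is the easiest to show in a few lines: a new record gets the majority label of the k training records closest to it. A minimal sketch, assuming numeric feature vectors, Euclidean distance and hypothetical labels `"good"`/`"bad"`; real use would also need feature scaling and a choice of k.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training examples."""
    # Sort training examples by Euclidean distance to the query and keep the k closest.
    neighbors = sorted(train, key=lambda example: dist(example[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical labeled examples: (feature vector, class label).
train = [
    ((1.0, 1.0), "good"), ((1.2, 0.8), "good"), ((0.9, 1.1), "good"),
    ((4.0, 4.2), "bad"), ((4.1, 3.9), "bad"), ((3.8, 4.0), "bad"),
]
print(knn_predict(train, (1.1, 1.0)))  # good
print(knn_predict(train, (3.9, 4.1)))  # bad
```

An odd k avoids tied votes when there are two classes.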

21
Cluster Analysis
Goal: group a set of physical or abstract objects into classes of similar objects

22
Cluster Analysis Method:
- Partitioning methods: k-means
- Hierarchical methods: top-down, bottom-up
- Density-based methods: clusters of arbitrary shape
- Grid-based methods: quantize the object space into cells
- Model-based methods: best fit of the data to a given model
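
The partitioning method named above, k-means, alternates between assigning each object to its nearest centroid and moving each centroid to the mean of its assigned objects. A minimal sketch of this standard (Lloyd-style) iteration, assuming numeric tuples, Euclidean distance and random initialization from the data; the points and the `seed` parameter are hypothetical.

```python
import random
from math import dist


def k_means(points, k, max_iterations=100, seed=0):
    """Alternate nearest-centroid assignment and centroid updates until nothing changes."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k randomly chosen input points
    for _ in range(max_iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = mean of its cluster; keep the old one if a cluster is empty.
        new_centroids = [
            tuple(sum(coords) / len(c) for coords in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments can no longer change
            break
        centroids = new_centroids
    return centroids, clusters


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]  # hypothetical 2-D data
centroids, clusters = k_means(points, k=2)
print(sorted(centroids))
```

k-means needs k up front and tends to find roughly spherical clusters; the density-based methods in the list exist precisely for clusters of arbitrary shape.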

23
Outlier Analysis
Outlier: a data object that is inconsistent with the rest of the given data set
Goal: find an efficient method to mine the outliers

24
Outlier Analysis Method:
- Statistical-based outlier detection
- Distance-based outlier detection
- Deviation-based outlier detection
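
The first two approaches can be sketched in a few lines each. This is a hedged illustration with hypothetical data and parameter values: the statistical test flags values far from the mean in units of standard deviation (a z-score rule), and the distance-based test flags an object when at least a fraction `p` of all objects lie farther than distance `d` from it. Deviation-based detection is not shown.

```python
from math import dist
from statistics import mean, pstdev


def zscore_outliers(values, threshold=3.0):
    """Statistical approach: flag values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


def distance_outliers(points, p, d):
    """Distance-based approach: flag a point if at least a fraction p of all points lie farther than d."""
    n = len(points)
    outliers = []
    for x in points:
        far = sum(1 for y in points if dist(x, y) > d)  # x itself is at distance 0, never "far"
        if far >= p * n:
            outliers.append(x)
    return outliers


# With n values, a population z-score cannot exceed sqrt(n - 1), so this
# 8-value toy sample uses a threshold of 2.0 instead of the usual 3.0.
readings = [10, 11, 9, 10, 12, 10, 11, 50]  # hypothetical sensor readings
print(zscore_outliers(readings, threshold=2.0))  # [50]

points = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8)]
print(distance_outliers(points, p=0.75, d=3.0))  # [(8, 8)]
```

The z-score rule assumes a roughly unimodal distribution, and the outlier itself inflates the standard deviation; the distance-based rule makes no distributional assumption but costs O(n^2) distance computations here.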

25
Evolution Analysis
Goal: describe and model regularities or trends for objects whose behavior changes over time

26
Evolution Analysis Method:
- Statistical methods
- Trend analysis
- Similarity search in time-series analysis
- Sequential pattern mining
- Periodicity analysis
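
Trend and periodicity analysis on an evenly spaced time series can be sketched with plain Python. These are simple stand-ins chosen for illustration, not the specific methods the slide refers to: a moving average for smoothing, a least-squares straight line for the trend, and the lag with the highest autocorrelation as the dominant period. Series values and function names are hypothetical.

```python
def moving_average(series, window):
    """Smooth a series with a simple moving average of the given window length."""
    return [sum(series[i:i + window]) / window for i in range(len(series) - window + 1)]


def linear_trend(series):
    """Least-squares line y = a + b*t over t = 0..n-1; returns (a, b)."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    b = sxy / sxx
    return y_mean - b * t_mean, b


def dominant_period(series, max_lag):
    """Periodicity: the lag (1..max_lag) with the highest autocorrelation; assumes a non-constant series."""
    n = len(series)
    mu = sum(series) / n
    dev = [y - mu for y in series]
    var = sum(x * x for x in dev)

    def acf(lag):
        return sum(dev[i] * dev[i + lag] for i in range(n - lag)) / var

    return max(range(1, max_lag + 1), key=acf)


print(moving_average([1, 3, 5, 7, 9], 3))            # [3.0, 5.0, 7.0]
print(linear_trend([1, 3, 5, 7, 9]))                 # (1.0, 2.0)
print(dominant_period([1, 5, 1, 5, 1, 5, 1, 5], 4))  # 2
```

The slope `b` summarizes the direction of change per time step, while the autocorrelation peak picks up repeating behavior that a straight-line trend would miss.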

27
Example
V6A event: DT 1 poly Dep BLC
Root cause: FUR-DPA-02 (VPLPDT1: DT#1 POLY DEP)

28
Example Result

29
Question & Suggestion

30
Thanks!
