Multivariate Discretization of Continuous Variables for Set Mining. Author: Stephen D. Bay. Advisor: Dr. Hsu. Graduate: Kuo-wei Chen.


1 Multivariate Discretization of Continuous Variables for Set Mining. Author: Stephen D. Bay. Advisor: Dr. Hsu. Graduate: Kuo-wei Chen.

2 Outline: Motivation; Objective; Introduction (1)~(2); Multivariate Discretization Approach (1)~(5); Experiment (1)~(6); Conclusions; Opinion

3 Motivation Most discretization methods are univariate and consider only a single feature at a time. This is a sub-optimal approach for knowledge discovery, because univariate discretization can destroy hidden patterns in the data.

4 Objective Describe why univariate discretization compares poorly with multivariate discretization. Present a bottom-up merging algorithm called "MVD". Present an experiment showing that MVD's execution time compares well with that of univariate approaches.

5 Introduction(1) In knowledge discovery, improving predictive accuracy is not the most important goal; the emphasis is on finding previously unknown and insightful patterns. The discretized intervals should not hide patterns, and they should be semantically meaningful. Multivariate discretization considers how all the variables interact before deciding on the discretized intervals.

6 Introduction(2) Example

7 Multivariate Discretization Approach(1) Past discretization approaches are univariate: they miss interactions among several variables, their execution time is long, O(n^2), and they produce many rules.

8 Multivariate Discretization Approach(2) STUCCO finds large differences between two probability distributions. The mining objectives of STUCCO are $P(C \mid G_1) \neq P(C \mid G_2)$ ... (1) and $|\mathrm{support}(C \mid G_1) - \mathrm{support}(C \mid G_2)| \geq \delta$ ... (2). It is used to control the merging process.
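Criterion (2) can be illustrated with a small sketch, assuming it is read as "the absolute difference in support between the two groups is at least a threshold delta". The record layout, the `group` key, and the function names below are hypothetical and chosen only for illustration. The statistical test behind criterion (1) is not shown.

```python
def support(records, contrast_set, group):
    """Fraction of records in `group` matching every attribute=value pair."""
    in_group = [r for r in records if r["group"] == group]
    if not in_group:
        return 0.0
    hits = sum(
        all(r.get(attr) == val for attr, val in contrast_set.items())
        for r in in_group
    )
    return hits / len(in_group)


def is_large_difference(records, contrast_set, g1, g2, delta):
    """Criterion (2), as assumed here: |support in g1 - support in g2| >= delta."""
    diff = abs(support(records, contrast_set, g1) - support(records, contrast_set, g2))
    return diff >= delta


# Toy data (made up): attribute x equals 1 for all of group A but half of group B.
records = [
    {"group": "A", "x": 1},
    {"group": "A", "x": 1},
    {"group": "B", "x": 1},
    {"group": "B", "x": 0},
]
print(is_large_difference(records, {"x": 1}, "A", "B", delta=0.4))
```

In this toy data the supports are 1.0 and 0.5, so the difference of 0.5 passes a threshold of 0.4 and fails a threshold of 0.6.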

9 Multivariate Discretization Approach(3) Algorithm
Step 1. Partition all continuous attributes into n basic intervals.
Step 2. Select the adjacent intervals X and Y that have the minimum combined support.
Step 3. If Fx ~ Fy (the instances in X and Y have similar multivariate distributions), merge X and Y.
Step 4. If there are no eligible intervals, stop. Otherwise go to step 2.
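The merging loop on this slide can be sketched roughly as follows. This is a minimal illustration for a single attribute, not the author's implementation. The similarity test is passed in as a predicate, and the `label_rate_similar` predicate is a hypothetical stand-in for the STUCCO-based test. "Support" is taken to be the number of instances in an interval, and a pair that fails the test is assumed to become ineligible.

```python
from itertools import count


def mvd_merge(basic_intervals, similar):
    """Bottom-up merge of ordered, non-empty intervals (lists of instances)."""
    ids = count()  # unique keys so rejected pairs stay valid after merges
    intervals = [(next(ids), list(iv)) for iv in basic_intervals]
    rejected = set()  # adjacent pairs already tested and found different
    while True:
        # Step 2: candidate adjacent pairs, keyed by combined support.
        candidates = [
            (len(a) + len(b), i)
            for i, ((ka, a), (kb, b)) in enumerate(zip(intervals, intervals[1:]))
            if (ka, kb) not in rejected
        ]
        if not candidates:  # Step 4: no eligible pair left
            break
        _, i = min(candidates)
        (ka, a), (kb, b) = intervals[i], intervals[i + 1]
        if similar(a, b):  # Step 3: merge when distributions look alike
            intervals[i:i + 2] = [(next(ids), a + b)]
        else:
            rejected.add((ka, kb))
    return [iv for _, iv in intervals]


def label_rate_similar(a, b, tol=0.2):
    """Hypothetical stand-in test: compare the rate of label 1 in each interval."""
    def rate(iv):
        return sum(label for _, label in iv) / len(iv)
    return abs(rate(a) - rate(b)) < tol


# Instances are (value, label) pairs; Step 1 (basic intervals) is given by hand.
basic = [[(1, 0), (2, 0)], [(3, 0), (4, 0)], [(5, 1), (6, 1)], [(7, 1), (8, 1)]]
merged = mvd_merge(basic, label_rate_similar)
print(len(merged))
```

With this toy data the four basic intervals collapse to two, separated at the point where the label changes.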

10 Multivariate Discretization Approach(4) Efficiency STUCCO runs efficiently on many datasets. The problems given to STUCCO are often easier than those faced by the main mining program, because it only needs to find a single difference between the groups. Calling STUCCO repeatedly will result in many passes over the database.

11 Multivariate Discretization Approach(5) Sensitivity to Hidden Patterns: Parity R+I example
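The slide gives no detail on the parity data, so the following is only a sketch under an assumption: the class is the parity (XOR) of the signs of two continuous features. The grid values and names are made up. The sketch shows why such a pattern is invisible to a univariate view and visible once a second feature is taken into account.

```python
from itertools import product

# Deterministic toy grid; class is 1 when exactly one of x, y is positive.
values = [-0.75, -0.25, 0.25, 0.75]
data = [(x, y, int((x > 0) != (y > 0))) for x, y in product(values, repeat=2)]


def positive_rate(rows):
    """Fraction of rows whose class (third field) is 1."""
    return sum(r[2] for r in rows) / len(rows)


# Univariate view: splitting on x alone shows no difference in class rate.
left = [r for r in data if r[0] <= 0]
right = [r for r in data if r[0] > 0]

# Multivariate view: the same split, restricted to y > 0, separates the classes.
left_y = [r for r in left if r[1] > 0]
right_y = [r for r in right if r[1] > 0]

print(positive_rate(left), positive_rate(right))
print(positive_rate(left_y), positive_rate(right_y))
```

Judged on x alone, a cutpoint at 0 looks useless because both sides have the same class rate. It matters only in combination with y.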

12 Experiment(1) Sun Ultra-5 with 128MB Parameter settings

13 Experiment(2) Discretization Time in CPU seconds

14 Experiment(3) Qualitative Results Discretization Cutpoints for Age on the Adult Census Data

15 Experiment(4) Qualitative Results Discretization Cutpoints for Capital-Loss on the Adult Census Data

16 Experiment(5) Qualitative Results Discretization Cutpoints for Parental Income on the UCI Admission Data

17 Experiment(6) Qualitative Results Discretization Cutpoints for GPA on the UCI Admission Data

18 Conclusions The MVD algorithm finely partitions continuous variables and then merges adjacent intervals only if their instances have similar multivariate distributions. Experimental results indicate that the MVD algorithm detects high-dimensional interactions between features and discretizes the data appropriately. The MVD algorithm runs in time comparable to a popular univariate recursive approach.

19 Opinion If adjacent intervals do not have similar distributions, the MVD algorithm will not be efficient. In general, this condition occurs often.

