Lazy Associative Classification


1 Lazy Associative Classification
A. Veloso, W. Meira Jr., and M. J. Zaki, ICDM 2006. Advisor: Dr. Koh Jia-Ling. Speaker: Liu Yu-Jiun. Date: 2007/3/8.

2 Outline
Introduction
Information Gain
Decision Tree
Eager Associative Classifier
DT vs. EAC
Lazy Associative Classifier
LAC vs. EAC
Experiment

3 Introduction
Classification problem.
Models of classification: Decision Tree, Associative Classifier, Neural Network, Genetic Algorithm, Lazy Associative Classifier.
A DT lacks a global view of feature correlations (it is local); an AC may generate too many rules (it is global).
LAC aims to retain the accuracy of AC without producing too many rules; "lazy" means the work is focused on the features that are actually useful for the test instance.

4 Information Gain
S: any subset of training instances.
s_i: the number of instances in S with class c_i.
|S|: the total number of training instances.
p_i = s_i / |S|: the probability of class c_i in S.
Entropy(S) = -Σ_i p_i log2(p_i): the entropy of S.
Gain(S, A) = Entropy(S) - Σ_{v ∈ Values(A)} (|S_v| / |S|) Entropy(S_v): the information gain of splitting S on attribute A.
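To make the definitions concrete, here is a minimal sketch (not from the slides) that computes entropy and information gain over a list of labeled training instances; the data and attribute names are illustrative assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum_i p_i * log2(p_i) over the class distribution of S."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(instances, labels, attribute):
    """Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v)."""
    total = len(labels)
    # Partition S by the value of attribute A.
    partitions = {}
    for inst, label in zip(instances, labels):
        partitions.setdefault(inst[attribute], []).append(label)
    remainder = sum((len(part) / total) * entropy(part)
                    for part in partitions.values())
    return entropy(labels) - remainder

# Toy usage on a weather-style data set (values are illustrative only).
instances = [{"outlook": "sunny", "windy": "false"},
             {"outlook": "sunny", "windy": "true"},
             {"outlook": "overcast", "windy": "false"}]
labels = ["no", "no", "yes"]
print(information_gain(instances, labels, "outlook"))
```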

5 Decision Tree A DT is built using a greedy, recursive splitting strategy. Each internal node is split according to the information gain. One rule per leaf.
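A minimal sketch of that greedy, recursive splitting strategy, assuming the entropy/information_gain helpers from the previous sketch; attribute selection and stopping criteria are simplified for illustration.

```python
from collections import Counter  # information_gain is defined in the previous sketch

def build_tree(instances, labels, attributes):
    """Greedy recursive splitting: pick the attribute with the highest
    information gain, branch on each of its values, recurse on the subsets.
    A leaf stores the majority class of its subset (one rule per leaf)."""
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]  # leaf = majority class
    best = max(attributes, key=lambda a: information_gain(instances, labels, a))
    node = {"attribute": best, "branches": {}}
    remaining = [a for a in attributes if a != best]
    # One branch (and eventually one rule) per value of the chosen attribute.
    for value in {inst[best] for inst in instances}:
        subset = [(i, c) for i, c in zip(instances, labels) if i[best] == value]
        sub_instances, sub_labels = zip(*subset)
        node["branches"][value] = build_tree(list(sub_instances),
                                             list(sub_labels), remaining)
    return node
```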

6 Example

7 Decision Tree Classifier
Rule from the DT: {outlook=sunny and humidity=high → play=no}
Test instance: {outlook=sunny, temperature=cool, humidity=high, windy=false}
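A minimal sketch of how such a rule falls out of the tree, assuming the node/leaf structure produced by the build_tree sketch above: classifying a test instance follows a single root-to-leaf path, and that path is the one rule the DT uses.

```python
def classify_with_rule(tree, instance):
    """Walk the tree from build_tree above; the root-to-leaf path is the one
    rule the decision tree applies to this test instance.
    Assumes every attribute value of the instance was seen during training."""
    antecedent = []
    node = tree
    while isinstance(node, dict):
        attr = node["attribute"]
        antecedent.append((attr, instance[attr]))
        node = node["branches"][instance[attr]]
    # e.g. ([("outlook", "sunny"), ("humidity", "high")], "no")
    return antecedent, node
```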

8 Eager Associative Classifier

9 CARs from EAC
{windy=false and temperature=cool → play=yes}
{outlook=sunny and humidity=high → play=no}
{outlook=sunny and temperature=cool → play=yes}
Test instance: {outlook=sunny, temperature=cool, humidity=high, windy=false}
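As a rough illustration (not the paper's actual miner) of how an eager associative classifier derives such CARs, the sketch below enumerates short antecedents over the whole training set and keeps those meeting minimum support and confidence; the thresholds, max_len, and function name are hypothetical.

```python
from itertools import combinations

def mine_cars(instances, labels, min_sup=0.4, min_conf=0.7, max_len=2):
    """Return CARs of the form antecedent -> class that satisfy minimum
    support and confidence over the full training data (eager mining)."""
    n = len(instances)
    cars = []
    items = {(a, v) for inst in instances for a, v in inst.items()}
    for size in range(1, max_len + 1):
        for antecedent in combinations(sorted(items), size):
            # Classes of the training instances covered by this antecedent.
            covered = [c for inst, c in zip(instances, labels)
                       if all(inst.get(a) == v for a, v in antecedent)]
            if not covered:
                continue
            for cls in set(covered):
                sup = covered.count(cls) / n
                conf = covered.count(cls) / len(covered)
                if sup >= min_sup and conf >= min_conf:
                    cars.append((antecedent, cls, sup, conf))
    return cars
```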

10 DT vs. EAC

11 Lazy Associative Classifier

12 Projected Training Data
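A minimal sketch of the projection step, under the assumption that projection simply keeps, for each training instance, only the attribute-value pairs that also appear in the test instance:

```python
def project(training_instances, test_instance):
    """Keep only the attribute-value pairs shared with the test instance,
    so that rule mining is focused on features useful for this prediction."""
    return [{a: v for a, v in inst.items() if test_instance.get(a) == v}
            for inst in training_instances]

# Attribute-value pairs not shared with the test instance are dropped.
test = {"outlook": "overcast", "temperature": "hot", "windy": "true"}
train = [{"outlook": "overcast", "temperature": "cool", "windy": "true"},
         {"outlook": "sunny", "temperature": "hot", "windy": "false"}]
print(project(train, test))
# [{'outlook': 'overcast', 'windy': 'true'}, {'temperature': 'hot'}]
```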

13 Prediction Results of EAC and LAC
minsup = 40%. Test instance: {o=overcast, t=hot, h=low, w=true}
CARs mined by EAC from the full training data (none match the test instance):
{windy=false and humidity=normal → play=yes}
{windy=false and temperature=cool → play=yes}
{temperature=cool and humidity=normal → play=yes}
CARs mined by LAC from the projected training data (all match the test instance):
{outlook=overcast → play=yes}
{temperature=hot → play=yes}
{windy=true → play=no}
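A minimal sketch of the prediction step, assuming a CAR is applicable when its whole antecedent appears in the test instance and that the class of the highest-confidence applicable rule is returned (a single-rule simplification; CMAR-style classifiers combine several matching rules instead):

```python
def predict(cars, test_instance, default_class=None):
    """Among the CARs (antecedent, cls, sup, conf) whose whole antecedent
    appears in the test instance, return the class of the highest-confidence one."""
    applicable = [car for car in cars
                  if all(test_instance.get(a) == v for a, v in car[0])]
    if not applicable:
        return default_class  # caller supplies a fallback, e.g. the majority class
    return max(applicable, key=lambda car: car[3])[1]
```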

14 LAC vs. EAC

15 Two Characteristics
Missing CARs
Highly Disjunctive Spaces

16 Experiment
26 datasets from the UCI Machine Learning Repository
min_conf = 50%, min_sup = 1%
Linux-based PC: Intel Pentium III 1.0 GHz, 1 GB RAM

17 Error Rates
EAC with information gain is consistently better than C4.5, while the other EAC variants are not. CBA does better on sparse data spaces, but on average EAC with information gain beats CBA. CMAR does better still because it uses multiple rules when predicting the class, whereas EAC with information gain picks only the top-ranked rule.

18 Rule-Set Utilization

19 Execution Times
Cache size: 10,000 CARs

