1
Lazy Associative Classification
A. Veloso, W. Meira Jr., and M. J. Zaki, ICDM 2006
Advisor: Dr. Koh Jia-Ling
Speaker: Liu Yu-Jiun
Date: 2007/3/8
2
Outline
Introduction
Information Gain
Decision Tree
Eager Associative Classifier
DT vs. EAC
Lazy Associative Classifier
LAC vs. EAC
Experiment
3
Introduction
Classification problem
Models of classification: Decision Tree, Associative Classifier, Neural Network, Genetic Algorithm, Lazy Associative Classifier
A decision tree (DT) lacks a global view of feature correlations (it is local); an associative classifier (AC) can generate far too many rules (it is global). LAC aims to keep the AC's accuracy without producing an excessive number of rules; "lazy" means rule generation is forced onto the features that are actually useful for the instance being classified.
4
Information gain
S: any subset of training instances.
s_i: the number of instances in S with class c_i.
|S|: the total number of training instances in S.
p_i = s_i / |S|: the probability of class c_i in S.
E(S) = - Σ_i p_i log2(p_i): the entropy of S.
IG(S, A) = E(S) - Σ_{v in values(A)} (|S_v| / |S|) E(S_v): the information gain of splitting S on attribute A, where S_v is the subset of S with value v for A.
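A minimal Python sketch of these definitions (the per-instance attribute dictionaries and the function names are illustrative assumptions, not taken from the paper):

import math
from collections import Counter

def entropy(labels):
    # E(S) = - sum_i p_i * log2(p_i), computed from the class counts of S.
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(instances, labels, attribute):
    # IG(S, A) = E(S) - sum_v (|S_v| / |S|) * E(S_v), splitting S on attribute A.
    total = len(labels)
    partitions = {}
    for inst, label in zip(instances, labels):
        partitions.setdefault(inst[attribute], []).append(label)
    remainder = sum(len(part) / total * entropy(part)
                    for part in partitions.values())
    return entropy(labels) - remainder

# Example: gain of splitting a toy subset of the weather data on "outlook".
weather = [{"outlook": "sunny", "humidity": "high"},
           {"outlook": "overcast", "humidity": "high"},
           {"outlook": "sunny", "humidity": "normal"}]
print(information_gain(weather, ["no", "yes", "yes"], "outlook"))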
5
Decision Tree A DT is built using a greedy, recursive splitting strategy. Each internal node is split according to the information gain. One rule per leaf.
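An ID3-style sketch of that strategy, reusing the hypothetical information_gain helper from the previous sketch (an illustrative outline, not necessarily the exact procedure behind the slides):

from collections import Counter

def build_tree(instances, labels, attributes):
    # Greedy, recursive splitting: pick the attribute with the highest
    # information gain at each internal node; each leaf yields one rule.
    if len(set(labels)) == 1:            # pure node -> leaf predicting that class
        return labels[0]
    if not attributes:                   # no attributes left -> majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(instances, labels, a))
    node = {"split_on": best, "children": {}}
    for value in {inst[best] for inst in instances}:
        pairs = [(i, l) for i, l in zip(instances, labels) if i[best] == value]
        sub_insts, sub_labels = zip(*pairs)
        node["children"][value] = build_tree(list(sub_insts), list(sub_labels),
                                             [a for a in attributes if a != best])
    return node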
6
Example
7
Decision Tree Classifier
Rule from the decision tree: {outlook=sunny and humidity=high → play=no}
Test instance: {outlook=sunny, temperature=cool, humidity=high, windy=false}
8
Eager Associative Classifier
9
CARs from EAC
{windy=false and temperature=cool → play=yes}
{outlook=sunny and humidity=high → play=no}
{outlook=sunny and temperature=cool → play=yes}
Test instance: {outlook=sunny, temperature=cool, humidity=high, windy=false}
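A rough sketch of how CARs like these can be mined eagerly over the whole training set (the brute-force enumeration, the default thresholds, and the length cap are illustrative assumptions; production miners use Apriori- or FP-growth-style search):

from collections import Counter
from itertools import combinations

def mine_cars(instances, labels, min_sup=0.4, min_conf=0.7, max_len=2):
    # Enumerate class association rules {antecedent -> class} whose support
    # and confidence clear the given thresholds.
    n = len(labels)
    cars = []
    for size in range(1, max_len + 1):
        ant_class, ant_only = Counter(), Counter()
        for inst, label in zip(instances, labels):
            items = sorted(inst.items())
            for ant in combinations(items, size):
                ant_only[ant] += 1
                ant_class[(ant, label)] += 1
        for (ant, label), count in ant_class.items():
            support, confidence = count / n, count / ant_only[ant]
            if support >= min_sup and confidence >= min_conf:
                cars.append((dict(ant), label, support, confidence))
    return cars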
10
DT vs. EAC
11
Lazy Associative Classifier
12
Projected Training Data
13
Prediction results of EAC and LAC
minsup = 40%
Test instance: {o=overcast, t=hot, h=low, w=true}
{windy=false and humidity=normal → play=yes}
{windy=false and temperature=cool → play=yes}
{temperature=cool and humidity=normal → play=yes}
{outlook=overcast → play=yes}
{temperature=hot → play=yes}
{windy=true → play=no}
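By contrast, a sketch of the lazy strategy, reusing the hypothetical mine_cars helper above: the training data is first projected onto the test instance's attribute-values, so every CAR mined from the projection matches the test case by construction (choosing the single highest-confidence rule is an illustrative tie-breaking policy):

from collections import Counter

def lazy_predict(instances, labels, test_instance, min_sup=0.4, min_conf=0.7):
    # Keep, in every training instance, only the items it shares with the test case.
    projected = [{a: v for a, v in inst.items() if test_instance.get(a) == v}
                 for inst in instances]
    # Mine CARs from the projected data only (uses mine_cars from the EAC sketch).
    cars = mine_cars(projected, labels, min_sup, min_conf)
    if not cars:                                   # no matching rule: majority class
        return Counter(labels).most_common(1)[0][0]
    return max(cars, key=lambda rule: rule[3])[1]  # class of the most confident rule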
14
LAC vs. EAC
15
Two characteristics
Missing CARs (an eager classifier may have mined no rule that matches a given test instance)
Highly Disjunctive Spaces (many small disjuncts that globally mined rules cover poorly)
16
Experiment
26 datasets from the UCI Machine Learning Repository
min_conf = 50%, min_sup = 1%
Linux-based PC, Intel Pentium III 1.0 GHz, 1 GB RAM
17
Error Rates
EAC with information gain is consistently better than C4.5, while the other EAC variants are not always better. CBA does relatively well on sparse data spaces, but on average EAC with information gain beats CBA. CMAR performs better still because it uses multiple rules when predicting the class, whereas EAC with information gain uses only the single highest-ranked rule.
18
Rule-Set Utilization
19
Execution Times Cache size: 10,000 CARs
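The slide gives only the cache size; one plausible sketch, assuming the cache is keyed by the test instance's projected feature set and bounded with an LRU policy (an assumption for illustration, not the authors' stated design):

from collections import OrderedDict

class CARCache:
    # Bounded store of already-mined CARs so that test instances whose
    # projections repeat can reuse earlier work instead of re-mining rules.
    def __init__(self, capacity=10_000):
        self.capacity = capacity
        self._store = OrderedDict()

    def get_or_mine(self, key, mine_fn):
        if key in self._store:
            self._store.move_to_end(key)        # mark entry as recently used
            return self._store[key]
        rules = mine_fn()                       # e.g. lambda: mine_cars(projected, labels)
        self._store[key] = rules
        if len(self._store) > self.capacity:    # evict the least recently used entry
            self._store.popitem(last=False)
        return rules

The cache size of 10,000 CARs on the slide bounds how many such rules are kept in memory at once.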