1
Lazy Associative Classification
A. Veloso, W. Meira Jr., and M. J. Zaki, ICDM 2006
Advisor: Dr. Koh Jia-Ling
Speaker: Liu Yu-Jiun
Date: 2007/3/8
2
Outline
Introduction
Information Gain
Decision Tree
Eager Associative Classifier
DT vs. EAC
Lazy Associative Classifier
LAC vs. EAC
Experiment
3
Introduction
Classification problem
Models of classification: Decision Tree, Associative Classifier, Neural Network, Genetic Algorithm, Lazy Associative Classifier
A decision tree (DT) lacks a global view of feature correlations (it is local); an associative classifier (AC) can generate far too many rules (it is global). LAC aims to keep the AC's accuracy without producing an excessive number of rules; "lazy" means rule generation is forced onto the features that are actually useful for the instance being classified.
4
Information gain
S: any subset of training instances.
s_i: the number of instances in S with class c_i.
|S|: the total number of training instances in S.
p_i = s_i / |S|: the probability of class c_i in S.
E(S) = - Σ_i p_i log2(p_i): the entropy of S.
IG(S, A) = E(S) - Σ_{v in values(A)} (|S_v| / |S|) E(S_v): the information gain of splitting S on attribute A, where S_v is the subset of S with value v for A.
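A minimal Python sketch of these definitions (the per-instance attribute dictionaries and the function names are illustrative assumptions, not taken from the paper):

import math
from collections import Counter

def entropy(labels):
    # E(S) = - sum_i p_i * log2(p_i), computed from the class counts of S.
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(instances, labels, attribute):
    # IG(S, A) = E(S) - sum_v (|S_v| / |S|) * E(S_v), splitting S on attribute A.
    total = len(labels)
    partitions = {}
    for inst, label in zip(instances, labels):
        partitions.setdefault(inst[attribute], []).append(label)
    remainder = sum(len(part) / total * entropy(part)
                    for part in partitions.values())
    return entropy(labels) - remainder

# Example: gain of splitting a toy subset of the weather data on "outlook".
weather = [{"outlook": "sunny", "humidity": "high"},
           {"outlook": "overcast", "humidity": "high"},
           {"outlook": "sunny", "humidity": "normal"}]
print(information_gain(weather, ["no", "yes", "yes"], "outlook"))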
5
Decision Tree A DT is built using a greedy, recursive splitting strategy. Each internal node is split according to the information gain. One rule per leaf.
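An ID3-style sketch of that strategy, reusing the hypothetical information_gain helper from the previous sketch (an illustrative outline, not necessarily the exact procedure behind the slides):

from collections import Counter

def build_tree(instances, labels, attributes):
    # Greedy, recursive splitting: pick the attribute with the highest
    # information gain at each internal node; each leaf yields one rule.
    if len(set(labels)) == 1:            # pure node -> leaf predicting that class
        return labels[0]
    if not attributes:                   # no attributes left -> majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(instances, labels, a))
    node = {"split_on": best, "children": {}}
    for value in {inst[best] for inst in instances}:
        pairs = [(i, l) for i, l in zip(instances, labels) if i[best] == value]
        sub_insts, sub_labels = zip(*pairs)
        node["children"][value] = build_tree(list(sub_insts), list(sub_labels),
                                             [a for a in attributes if a != best])
    return node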
6
Example
7
Decision Tree Classifier
Rule from the decision tree: {outlook=sunny and humidity=high → play=no}
Test instance: {outlook=sunny, temperature=cool, humidity=high, windy=false}
8
Eager Associative Classifier
9
CARs from EAC
{windy=false and temperature=cool → play=yes}
{outlook=sunny and humidity=high → play=no}
{outlook=sunny and temperature=cool → play=yes}
Test instance: {outlook=sunny, temperature=cool, humidity=high, windy=false}
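A rough sketch of how CARs like these can be mined eagerly over the whole training set (the brute-force enumeration, the default thresholds, and the length cap are illustrative assumptions; production miners use Apriori- or FP-growth-style search):

from collections import Counter
from itertools import combinations

def mine_cars(instances, labels, min_sup=0.4, min_conf=0.7, max_len=2):
    # Enumerate class association rules {antecedent -> class} whose support
    # and confidence clear the given thresholds.
    n = len(labels)
    cars = []
    for size in range(1, max_len + 1):
        ant_class, ant_only = Counter(), Counter()
        for inst, label in zip(instances, labels):
            items = sorted(inst.items())
            for ant in combinations(items, size):
                ant_only[ant] += 1
                ant_class[(ant, label)] += 1
        for (ant, label), count in ant_class.items():
            support, confidence = count / n, count / ant_only[ant]
            if support >= min_sup and confidence >= min_conf:
                cars.append((dict(ant), label, support, confidence))
    return cars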
10
DT vs. EAC
11
Lazy Associative Classifier
12
Projected Training Data
13
Prediction results of EAC and LAC
minsup = 40%
Test instance: {o=overcast, t=hot, h=low, w=true}
{windy=false and humidity=normal → play=yes}
{windy=false and temperature=cool → play=yes}
{temperature=cool and humidity=normal → play=yes}
{outlook=overcast → play=yes}
{temperature=hot → play=yes}
{windy=true → play=no}
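By contrast, a sketch of the lazy strategy, reusing the hypothetical mine_cars helper above: the training data is first projected onto the test instance's attribute-values, so every CAR mined from the projection matches the test case by construction (choosing the single highest-confidence rule is an illustrative tie-breaking policy):

from collections import Counter

def lazy_predict(instances, labels, test_instance, min_sup=0.4, min_conf=0.7):
    # Keep, in every training instance, only the items it shares with the test case.
    projected = [{a: v for a, v in inst.items() if test_instance.get(a) == v}
                 for inst in instances]
    # Mine CARs from the projected data only (uses mine_cars from the EAC sketch).
    cars = mine_cars(projected, labels, min_sup, min_conf)
    if not cars:                                   # no matching rule: majority class
        return Counter(labels).most_common(1)[0][0]
    return max(cars, key=lambda rule: rule[3])[1]  # class of the most confident rule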
14
LAC vs. EAC
15
Two characteristics
Missing CARs (an eager classifier may have mined no rule that matches a given test instance)
Highly Disjunctive Spaces (many small disjuncts that globally mined rules cover poorly)
16
Experiment
26 datasets from the UCI Machine Learning Repository
min_conf = 50%, min_sup = 1%
Linux-based PC, Intel Pentium III 1.0 GHz, 1 GB RAM
17
Error Rates
EAC with information gain is consistently better than C4.5, while the other EAC variants are not always better. CBA does relatively well on sparse data spaces, but on average EAC with information gain beats CBA. CMAR performs better still because it uses multiple rules when predicting the class, whereas EAC with information gain uses only the single highest-ranked rule.
18
Rule-Set Utilization
19
Execution Times Cache size: 10,000 CARs
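The slide gives only the cache size; one plausible sketch, assuming the cache is keyed by the test instance's projected feature set and bounded with an LRU policy (an assumption for illustration, not the authors' stated design):

from collections import OrderedDict

class CARCache:
    # Bounded store of already-mined CARs so that test instances whose
    # projections repeat can reuse earlier work instead of re-mining rules.
    def __init__(self, capacity=10_000):
        self.capacity = capacity
        self._store = OrderedDict()

    def get_or_mine(self, key, mine_fn):
        if key in self._store:
            self._store.move_to_end(key)        # mark entry as recently used
            return self._store[key]
        rules = mine_fn()                       # e.g. lambda: mine_cars(projected, labels)
        self._store[key] = rules
        if len(self._store) > self.capacity:    # evict the least recently used entry
            self._store.popitem(last=False)
        return rules

The cache size of 10,000 CARs on the slide bounds how many such rules are kept in memory at once.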