Presentation is loading. Please wait.

Presentation is loading. Please wait.

Classification - CBA CS 485: Special Topics in Data Mining Jinze Liu.

Similar presentations


Presentation on theme: "Classification - CBA CS 485: Special Topics in Data Mining Jinze Liu."— Presentation transcript:

1 Classification - CBA CS 485: Special Topics in Data Mining Jinze Liu

2 Association Rules Itemset X = {x 1, …, x k } Find all the rules X  Y with minimum support and confidence support, s, is the probability that a transaction contains X  Y confidence, c, is the conditional probability that a transaction having X also contains Y Let sup min = 50%, conf min = 50% Association rules: A  C (60%, 100%) C  A (60%, 75%) Customer buys diaper Customer buys both Customer buys beer Transaction- id Items bought 100f, a, c, d, g, I, m, p 200a, b, c, f, l,m, o 300b, f, h, j, o 400b, c, k, s, p 500a, f, c, e, l, p, m, n 2

3 Classification based on Association Classification rule mining versus Association rule mining Aim A small set of rules as classifier All rules according to minsup and minconf Syntax X  y X  Y 3

4 Why & How to Integrate Both classification rule mining and association rule mining are indispensable to practical applications. The integration is done by focusing on a special subset of association rules whose right-hand-side are restricted to the classification class attribute. CARs: class association rules 4

5 CBA: Three Steps Discretize continuous attributes, if any Generate all class association rules (CARs) Build a classifier based on the generated CARs. 5

6 Our Objectives To generate the complete set of CARs that satisfy the user-specified minimum support (minsup) and minimum confidence (minconf) constraints. To build a classifier from the CARs. 6

7 Rule Generator: Basic Concepts Ruleitem :condset is a set of items, y is a class label Each ruleitem represents a rule: condset->y condsupCount The number of cases in D that contain condset rulesupCount The number of cases in D that contain the condset and are labeled with class y Support =(rulesupCount/|D|)*100% Confidence =(rulesupCount/condsupCount)*100% 7

8 RG: Basic Concepts (Cont.) Frequent ruleitems A ruleitem is frequent if its support is above minsup Accurate rule A rule is accurate if its confidence is above minconf Possible rule For all ruleitems that have the same condset, the ruleitem with the highest confidence is the possible rule of this set of ruleitems. The set of class association rules (CARs) consists of all the possible rules (PRs) that are both frequent and accurate. 8

9 RG: An Example A ruleitem: assume that the support count of the condset (condsupCount) is 3, the support of this ruleitem (rulesupCount) is 2, and |D|=10 then (A,1),(B,1) -> (class,1) supt=20% (rulesupCount/|D|)*100% confd=66.7% (rulesupCount/condsupCount)*100% 9

10 RG: The Algorithm 1 F 1 = {large 1-ruleitems}; 2 CAR 1 = genRules (F 1 ); 3 prCAR 1 = pruneRules (CAR 1 ); //count the item and class occurrences to determine the frequent 1-ruleitems and prune it 4 for (k = 2; F k-1  Ø; k++) do 5C k = candidateGen (F k-1 ); //generate the candidate ruleitems C k using the frequent ruleitems F k-1 6 for each data case d  D do //scan the database 7C d = ruleSubset (C k, d); //find all the ruleitems in C k whose condsets are supported by d 8 for each candidate c  C d do 9 c.condsupCount++; 10 if d.class = c.class then c.rulesupCount++; //update various support counts of the candidates in C k 11 end 12 end 10

11 RG: The Algorithm(cont.) 13F k = {c  C k | c.rulesupCount  minsup}; //select those new frequent ruleitems to form F k 14 CAR k = genRules(F k ); //select the ruleitems both accurate and frequent 15 prCAR k = pruneRules(CAR k ); 16 end 17 CARs =  k CAR k ; 18 prCARs =  k prCAR k ; 11


Download ppt "Classification - CBA CS 485: Special Topics in Data Mining Jinze Liu."

Similar presentations


Ads by Google