6/23/2015 CSE591: Data Mining by H. Liu

Association Rules
- Transactional data
- Algorithm
- Applications

Market Basket Analysis
- Transactional data
  - Sparse matrix: thousands of columns, each row has only dozens of values
  - Items
  - Itemsets: transactions (TID)
- A most cited example: “diapers and beer”
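The transactional view and the sparse-matrix view describe the same data. The sketch below shows both for a handful of made-up baskets; the transaction IDs, item names, and the variable names `transactions`, `items`, `matrix`, and `density` are illustrative assumptions, not data from the lecture.

```python
# Hypothetical toy baskets keyed by transaction ID (TID).
transactions = {
    "T1": {"bread", "milk"},
    "T2": {"bread", "diapers", "beer", "eggs"},
    "T3": {"milk", "diapers", "beer", "cola"},
    "T4": {"bread", "milk", "diapers", "beer"},
    "T5": {"bread", "milk", "diapers", "cola"},
}

# One column per distinct item, in a fixed (sorted) order.
items = sorted(set().union(*transactions.values()))

# Binary matrix view: one row per transaction, 1 where the item is present.
matrix = {
    tid: [1 if item in basket else 0 for item in items]
    for tid, basket in transactions.items()
}

# Fraction of non-zero cells. With thousands of items and dozens of items
# per basket this fraction becomes very small, which is why storing the
# baskets as sets (or item lists) is usually preferred over the full matrix.
density = sum(len(b) for b in transactions.values()) / (len(transactions) * len(items))

for tid in transactions:
    print(tid, matrix[tid])
print("density:", density)
```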

Association rule mining
- Finding interesting association or correlation relationships
- Defining interesting association rules
  - Support, P(AB): the fraction of transactions that contain both A and B
  - Confidence, P(B|A): the fraction of transactions containing A that also contain B
- An association rule: A -> B
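As a rough illustration of the two measures, the sketch below computes support and confidence by counting transactions. The function names `count`, `support`, and `confidence` and the toy baskets are assumptions made for this example only.

```python
def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Support = P(itemset): fraction of transactions containing the itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence of A -> B = P(B|A) = count(A and B) / count(A)."""
    a = frozenset(antecedent)
    ab = a | frozenset(consequent)
    return count(ab, transactions) / count(a, transactions)


# Hypothetical toy baskets (not from the lecture).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

# Rule {diapers} -> {beer}: 3 of 5 baskets contain both,
# and 3 of the 4 baskets with diapers also contain beer.
print(support({"diapers", "beer"}, transactions))
print(confidence({"diapers"}, {"beer"}, transactions))
```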

Finding association rules
- Finding frequent itemsets
  - Downward closure (anti-monotone) property
- Finding association rules from frequent itemsets
- Frequent itemsets
  - Satisfy minimum support (minsup)
  - Level-wise search from 1-itemsets to k-itemsets
  - Anti-monotone property: every subset of a frequent itemset is frequent, so an itemset with an infrequent subset cannot be frequent
- Association rules
  - Satisfy minimum confidence (minconf)
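The level-wise search can be sketched as follows. This is a simplified illustration, not the lecture's implementation: candidates are formed by taking unions of frequent (k-1)-itemsets rather than by the sorted join, and the function name `frequent_itemsets`, the toy baskets, and the choice of `minsup` as a fraction are all assumptions.

```python
from itertools import combinations


def frequent_itemsets(transactions, minsup):
    """Return {k: set of frequent k-itemsets}, searching level by level."""
    n = len(transactions)

    def is_frequent(itemset):
        return sum(1 for t in transactions if itemset <= t) / n >= minsup

    # Level 1: frequent single items.
    all_items = set().union(*transactions)
    level = {frozenset([i]) for i in all_items if is_frequent(frozenset([i]))}

    result = {}
    k = 1
    while level:
        result[k] = level
        k += 1
        # Candidate k-itemsets: unions of two frequent (k-1)-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Anti-monotone pruning: drop a candidate if any (k-1)-subset is
        # not frequent, before scanning the data for its support.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = {c for c in candidates if is_frequent(c)}
    return result


# Hypothetical toy baskets (not from the lecture).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

for size, itemsets in frequent_itemsets(transactions, minsup=0.6).items():
    print(size, sorted(sorted(s) for s in itemsets))
```

With these baskets and `minsup=0.6`, the search stops after level 2: the only 3-itemset that survives pruning, {bread, diapers, milk}, appears in just two of the five baskets.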

Apriori candidate set generation
- For k = 1, C_1 = all 1-itemsets.
- For k > 1, generate C_k from L_{k-1} as follows:
  - The join step
    - C_k = join of L_{k-1} with itself on the first k-2 items
    - If both {a_1, …, a_{k-2}, a_{k-1}} and {a_1, …, a_{k-2}, a_k} are in L_{k-1}, then add {a_1, …, a_{k-2}, a_{k-1}, a_k} to C_k (we keep items sorted).
  - The prune step
    - Remove {a_1, …, a_{k-2}, a_{k-1}, a_k} if it contains a non-frequent (k-1)-subset
- An example
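The join and prune steps can be sketched as below, assuming each itemset is stored as a sorted tuple. The function name `apriori_gen` and the small set `L3` are assumptions chosen for illustration; they are not taken from the slides.

```python
from itertools import combinations


def apriori_gen(prev_frequent, k):
    """Build candidate k-itemsets C_k from the frequent (k-1)-itemsets.

    `prev_frequent` is a set of sorted tuples, each of length k-1.
    """
    prev = sorted(prev_frequent)
    candidates = []
    for i, a in enumerate(prev):
        for b in prev[i + 1:]:
            # Join step: the two itemsets agree on their first k-2 items.
            if a[:k - 2] == b[:k - 2]:
                c = a + (b[-1],)  # still sorted, because a < b
                # Prune step: every (k-1)-subset must itself be frequent.
                if all(s in prev_frequent for s in combinations(c, k - 1)):
                    candidates.append(c)
    return candidates


# Hypothetical frequent 3-itemsets over items 1..5.
L3 = {(1, 2, 3), (1, 2, 4), (1, 3, 4), (1, 3, 5), (2, 3, 4)}

# The join produces (1, 2, 3, 4) and (1, 3, 4, 5);
# the prune removes (1, 3, 4, 5) because (1, 4, 5) is not in L3.
print(apriori_gen(L3, 4))  # [(1, 2, 3, 4)]
```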

Derive rules from frequent itemsets
- Frequent itemsets != association rules
- One more step is required to find association rules
- For each frequent itemset X,
  - For each proper nonempty subset A of X,
    - Let B = X - A
    - A -> B is an association rule if confidence(A -> B) ≥ minconf,
      where support(A -> B) = support(AB) and confidence(A -> B) = support(AB) / support(A)
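The rule-derivation loop above can be sketched as follows. The function name `generate_rules` and the support table are assumptions for illustration; the table holds support counts rather than fractions, which gives the same confidence because the ratio is unchanged.

```python
from itertools import combinations


def generate_rules(support, minconf):
    """Derive rules A -> B from a table of frequent-itemset supports.

    `support` maps each frequent itemset (a frozenset) to its support.
    By downward closure, every subset of a frequent itemset is also in it.
    """
    rules = []
    for itemset, sup_x in support.items():
        if len(itemset) < 2:
            continue  # a rule needs a nonempty A and a nonempty B
        # Enumerate every proper nonempty subset A of X.
        for r in range(1, len(itemset)):
            for subset in combinations(sorted(itemset), r):
                a = frozenset(subset)
                b = itemset - a
                conf = sup_x / support[a]  # support(AB) / support(A)
                if conf >= minconf:
                    rules.append((a, b, sup_x, conf))
    return rules


# Hypothetical support counts (out of 5 transactions).
support_counts = {
    frozenset(["diapers"]): 4,
    frozenset(["beer"]): 3,
    frozenset(["diapers", "beer"]): 3,
}

for a, b, sup, conf in generate_rules(support_counts, minconf=0.8):
    print(sorted(a), "->", sorted(b), "support:", sup, "confidence:", conf)
```

In this toy table, {beer} -> {diapers} has confidence 3/3 and is kept, while {diapers} -> {beer} has confidence 3/4 and falls below the assumed threshold of 0.8.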

Issues
- Efficiency and thresholding for minsup
- Number of association rules
  - Size of data vs. size of association rules
- Post-processing
- Applications
  - Combining association rules with classification
  - Emerging patterns

Types of association rules
- Single-dimensional association rules
- Multi-dimensional association rules
- Multi-level association rules
- Many other research activities on association rules
  - Creating association rules (ARs) without candidate set generation
  - Speeding up rule generation
  - Interestingness measures