Data Mining Association Rule Classification Clustering
Data Mining: Association Rule
What Is Association Mining?
Association rule mining
– Finding frequent patterns, associations, correlations, or causal structures among itemsets in transaction databases, relational databases, and other information repositories
Applications
– Market basket analysis (marketing strategy: items to put on sale at reduced prices), cross-marketing, catalog design, shelf space layout design, etc.
Examples
– Rule form: Body → Head [Support, Confidence]
– buys(x, “Computer”) → buys(x, “Software”) [2%, 60%]
– major(x, “CS”) ^ takes(x, “DB”) → grade(x, “A”) [1%, 75%]
Market Basket Analysis
Typically, association rules are considered interesting if they satisfy both a minimum support threshold and a minimum confidence threshold.
Rule Measures: Support and Confidence
With a minimum support of 50% and a minimum confidence of 50%, we have
– A → C [50%, 66.6%]
– C → A [50%, 100%]
Support & Confidence
Association Rule: Basic Concepts
Given
– (1) a database of transactions
– (2) each transaction is a list of items (purchased by a customer in a visit)
Find all rules that correlate the presence of one set of items with that of another set of items
Find all the rules A → B with minimum confidence and support
– support, s: P(A ∪ B), the probability that a transaction contains both A and B
– confidence, c: P(B|A), the conditional probability that a transaction containing A also contains B
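The two measures can be sketched in a few lines of Python. This is only an illustration: the four-transaction database and the function names `support` and `confidence` are assumptions made for this sketch, not taken from the slides.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical transactions (an assumption for illustration only).
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"A", "B", "C"}),
    frozenset({"A", "C"}),
    frozenset({"A", "D"}),
    frozenset({"B", "E", "F"}),
]


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(body: Iterable[str], head: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Estimate P(head | body) as support(body and head) / support(body)."""
    body_set, head_set = frozenset(body), frozenset(head)
    return support(body_set | head_set, transactions) / support(body_set, transactions)


if __name__ == "__main__":
    print(support({"A", "C"}, TRANSACTIONS))       # support of A -> C
    print(confidence({"A"}, {"C"}, TRANSACTIONS))  # confidence of A -> C
    print(confidence({"C"}, {"A"}, TRANSACTIONS))  # confidence of C -> A
```

On this assumed database, A → C has support 50% and confidence about 66.7%, while C → A has the same support and confidence 100%. The sketch does not guard against a body that never occurs (support 0), which would cause a division by zero.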
Terminologies
Item
– I1, I2, I3, …
– A, B, C, …
Itemset
– {I1}, {I1, I7}, {I2, I3, I5}, …
– {A}, {A, G}, {B, C, E}, …
1-Itemset
– {I1}, {I2}, {A}, …
2-Itemset
– {I1, I7}, {I3, I5}, {A, G}, …
Terminologies
K-Itemset
– An itemset that contains K items
Frequent (Large) K-Itemset
– A K-itemset that satisfies a minimum support threshold
Association Rule
– A rule that satisfies both a minimum support threshold and a minimum confidence threshold
Analysis
The number of candidate itemsets grows exponentially with the number of items: n distinct items give C(n, k) possible k-itemsets and 2^n − 1 non-empty itemsets in total.
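To get a feel for this growth, the counts can be computed directly. The helper names below are hypothetical and the item counts are arbitrary choices for illustration.

```python
from math import comb


def count_k_itemsets(n_items: int, k: int) -> int:
    """Number of distinct k-itemsets over n_items items: C(n, k)."""
    return comb(n_items, k)


def count_all_itemsets(n_items: int) -> int:
    """Number of non-empty itemsets over n_items items: 2^n - 1."""
    return 2 ** n_items - 1


if __name__ == "__main__":
    for n in (10, 20, 100):
        print(n, count_k_itemsets(n, 2), count_all_itemsets(n))
```

Even 100 items already yield 4950 candidate 2-itemsets and far more itemsets overall than could ever be counted one by one, which is why pruning the search space matters.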
Fast Algorithms for Mining Association Rules
Mining Association Rules: Apriori Principle
Min. support 50%, min. confidence 50%
For rule A → C:
– support = support({A, C}) = 50%
– confidence = support({A, C}) / support({A}) = 66.6%
The Apriori principle:
– Any subset of a frequent itemset must be frequent
Mining Frequent Itemsets: the Key Step
Find the frequent itemsets: the sets of items that have minimum support
– Every subset of a frequent itemset must also be a frequent itemset, i.e., if {A, B} is a frequent itemset, both {A} and {B} must be frequent itemsets
– Iteratively find frequent itemsets with cardinality from 1 to k (k-itemsets)
Use the frequent itemsets to generate association rules
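The two steps above (level-wise search for frequent itemsets, then rule generation) can be sketched as follows. This is a minimal, unoptimized illustration rather than a reference implementation: the database `DB`, the function names, and the simple pairwise join are all assumptions made for the sketch.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]

# Hypothetical transactions (an assumption for illustration only).
DB: List[Itemset] = [
    frozenset({"A", "B", "C"}),
    frozenset({"A", "C"}),
    frozenset({"A", "D"}),
    frozenset({"B", "E", "F"}),
]


def frequent_itemsets(transactions: List[Itemset],
                      min_support: float) -> Dict[Itemset, float]:
    """Level-wise search: frequent 1-itemsets first, then 2-itemsets, and so on."""
    n = len(transactions)
    candidates = [frozenset({i}) for i in sorted({i for t in transactions for i in t})]
    frequent: Dict[Itemset, float] = {}
    k = 1
    while candidates:
        # Scan the database once per level to count each candidate.
        level = {}
        for c in candidates:
            s = sum(1 for t in transactions if c <= t) / n
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemsets ...
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # ... and keep only those whose (k-1)-subsets are all frequent.
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


def generate_rules(frequent: Dict[Itemset, float],
                   min_confidence: float) -> List[Tuple[Itemset, Itemset, float, float]]:
    """Split each frequent itemset into body -> head and keep confident rules."""
    rules = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for body_items in combinations(sorted(itemset), r):
                body = frozenset(body_items)
                conf = s / frequent[body]  # body is frequent because itemset is
                if conf >= min_confidence:
                    rules.append((body, itemset - body, s, conf))
    return rules


if __name__ == "__main__":
    freq = frequent_itemsets(DB, min_support=0.5)
    for body, head, s, c in generate_rules(freq, min_confidence=0.5):
        print(sorted(body), "->", sorted(head), f"[{s:.0%}, {c:.1%}]")
```

Rule generation can look up `frequent[body]` without rescanning the database precisely because every subset of a frequent itemset is itself frequent and was therefore recorded at an earlier level.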
Another Example 1
[Figure: Apriori trace on database D. Scan D to count C1 and generate L1; generate C2 from L1, scan D to count C2 and generate L2; generate C3 from L2, scan D to count C3 and generate L3.]
Example of Generating Candidates
L3 = {abc, abd, acd, ace, bcd}
Self-joining: L3 * L3
– abcd from abc and abd
– acde from acd and ace
Pruning:
– acde is removed because ade is not in L3
C4 = {abcd}
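The join and prune steps of this example can be reproduced with a short sketch. Itemsets are represented as sorted tuples of single-character items, and the function names `self_join`, `prune`, and `apriori_gen` are assumptions chosen for this illustration.

```python
from itertools import combinations
from typing import Set, Tuple

Itemset = Tuple[str, ...]  # items kept in sorted order


def self_join(frequent_k: Set[Itemset]) -> Set[Itemset]:
    """Join pairs of k-itemsets that agree on their first k-1 items."""
    prev = sorted(frequent_k)
    joined = set()
    for i, a in enumerate(prev):
        for b in prev[i + 1:]:
            if a[:-1] == b[:-1]:
                joined.add(a + (b[-1],))  # b[-1] > a[-1], so the result stays sorted
    return joined


def prune(candidates: Set[Itemset], frequent_k: Set[Itemset]) -> Set[Itemset]:
    """Drop candidates that have a k-subset missing from the frequent k-itemsets."""
    return {
        c for c in candidates
        if all(sub in frequent_k for sub in combinations(c, len(c) - 1))
    }


def apriori_gen(frequent_k: Set[Itemset]) -> Set[Itemset]:
    """Candidate (k+1)-itemsets: self-join followed by pruning."""
    return prune(self_join(frequent_k), frequent_k)


if __name__ == "__main__":
    L3 = {tuple(s) for s in ["abc", "abd", "acd", "ace", "bcd"]}
    print(sorted(self_join(L3)))    # joined candidates before pruning
    print(sorted(apriori_gen(L3)))  # C4 after pruning
```

Joining only itemsets that share their first k-1 items avoids producing the same candidate more than once; pruning then applies the Apriori principle before any database scan is spent on the candidate.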
Example
Apriori Algorithm
Another Example 2
Demo: IBM Intelligent Miner
Demo Database
Multi-Dimensional Association
Single-dimensional (intra-dimension) rules: a single dimension (predicate) with multiple occurrences
– buys(X, “milk”) → buys(X, “bread”)
Multi-dimensional rules: 2 or more dimensions (predicates)
– Inter-dimension association rules (no repeated predicates)
  age(X, “19-25”) ^ occupation(X, “student”) → buys(X, “coke”)
– Hybrid-dimension association rules (repeated predicates)
  age(X, “19-25”) ^ buys(X, “popcorn”) → buys(X, “coke”)
Categorical (nominal) attributes
– finite number of possible values, no ordering among values
Quantitative attributes
– numeric, implicit ordering among values
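The distinction between the three rule types depends only on which predicates appear in a rule and how often. A minimal sketch, assuming each predicate is stored as a (name, value) pair and using a hypothetical helper `classify_rule`:

```python
from collections import Counter
from typing import List, Tuple

Predicate = Tuple[str, str]  # (predicate name, value), e.g. ("buys", "milk")


def classify_rule(body: List[Predicate], head: List[Predicate]) -> str:
    """Classify a rule by the predicates (dimensions) it mentions."""
    counts = Counter(name for name, _ in body + head)
    if len(counts) == 1:
        return "single-dimensional"  # one predicate, occurring several times
    if any(c > 1 for c in counts.values()):
        return "hybrid-dimension"    # several predicates, at least one repeated
    return "inter-dimension"         # several predicates, none repeated


if __name__ == "__main__":
    print(classify_rule([("buys", "milk")], [("buys", "bread")]))
    print(classify_rule([("age", "19-25"), ("occupation", "student")],
                        [("buys", "coke")]))
    print(classify_rule([("age", "19-25"), ("buys", "popcorn")],
                        [("buys", "coke")]))
```

The three calls correspond to the milk/bread, student/coke, and popcorn/coke rules from the slide and come out as single-dimensional, inter-dimension, and hybrid-dimension respectively.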
An Example