# Data Mining Techniques Association Rule

What Is Association Mining?
Association rule mining: finding frequent patterns, associations, correlations, or causal structures among itemsets in transaction databases, relational databases, and other information repositories.

Applications: market basket analysis (marketing strategy: which items to put on sale at reduced prices), cross-marketing, catalog design, shelf-space layout design, etc.

Examples (rule form: Body ⇒ Head [support, confidence]):

- buys(x, "Computer") ⇒ buys(x, "Software") [2%, 60%]
- major(x, "CS") ∧ takes(x, "DB") ⇒ grade(x, "A") [1%, 75%]

Typically, association rules are considered interesting if they satisfy both a minimum support threshold and a minimum confidence threshold.

Rule Measures: Support and Confidence
With a minimum support of 50% and a minimum confidence of 50%, we have:

- A ⇒ C [50%, 66.6%]
- C ⇒ A [50%, 100%]
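Both rules share the numerator support(A ∪ C), so the quoted figures alone determine how often A and C occur, using confidence(X ⇒ Y) = support(X ∪ Y) / support(X):

$$
\text{support}(A) = \frac{\text{support}(A \cup C)}{\text{confidence}(A \Rightarrow C)} = \frac{50\%}{66.6\%} \approx 75\%,
\qquad
\text{support}(C) = \frac{\text{support}(A \cup C)}{\text{confidence}(C \Rightarrow A)} = \frac{50\%}{100\%} = 50\%
$$

A occurs in more transactions than C, which is why the same co-occurrence gives A ⇒ C the lower confidence.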

Support & Confidence

Association Rule: Basic Concepts
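Given (1) a database of transactions, where (2) each transaction is a list of items (e.g., those purchased by a customer in one visit):

- Find all rules that correlate the presence of one set of items with that of another set of items.
- In particular, find all rules A ⇒ B that meet a minimum support and a minimum confidence, where
  - support, $s = P(A \cup B)$, is the probability that a transaction contains both A and B;
  - confidence, $c = P(B \mid A)$, is the conditional probability that a transaction containing A also contains B.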
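A minimal Python sketch of the two measures, assuming transactions are represented as `frozenset`s of item names. The function names and the four-basket database are hypothetical, chosen only for illustration; `is_strong` applies the criterion that a rule is interesting only if it clears both thresholds.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]


def support(db: List[Itemset], itemset: Itemset) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t) / len(db)


def confidence(db: List[Itemset], a: Itemset, b: Itemset) -> float:
    """P(B | A) = support(A ∪ B) / support(A); 0.0 if A never occurs."""
    sup_a = support(db, a)
    return support(db, a | b) / sup_a if sup_a else 0.0


def is_strong(db: List[Itemset], a: Itemset, b: Itemset,
              min_sup: float, min_conf: float) -> bool:
    """Keep A => B only if it meets both the support and confidence thresholds."""
    return support(db, a | b) >= min_sup and confidence(db, a, b) >= min_conf


# Hypothetical baskets, for illustration only.
db = [
    frozenset({"computer", "software", "printer"}),
    frozenset({"computer", "software"}),
    frozenset({"computer", "mouse"}),
    frozenset({"software"}),
]
a, b = frozenset({"computer"}), frozenset({"software"})
print(support(db, a | b))      # 0.5
print(confidence(db, a, b))    # 0.6666666666666666
print(is_strong(db, a, b, min_sup=0.5, min_conf=0.5))  # True
```

Using subset tests (`<=`) on frozensets keeps "transaction contains the itemset" a single expression.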

Terminology

| Term | Examples |
|---|---|
| Item | I1, I2, I3, …; A, B, C, … |
| Itemset | {I1}, {I1, I7}, {I2, I3, I5}, …; {A}, {A, G}, {B, C, E}, … |
| 1-itemset | {I1}, {I2}, {A}, … |
| 2-itemset | {I1, I7}, {I3, I5}, {A, G}, … |

Terminology (continued)

| Term | Definition |
|---|---|
| K-itemset | An itemset of length K |
| Frequent (large) K-itemset | A K-itemset that satisfies a minimum support threshold |
| Association rule | A rule that satisfies both a minimum support threshold and a minimum confidence threshold |
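A direct, deliberately naive way to find frequent K-itemsets is to count every K-combination of items against the minimum support. The function name and the toy database are hypothetical; this exhaustive loop is exactly what Apriori is designed to avoid.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def frequent_k_itemsets(db: List[FrozenSet[str]], k: int,
                        min_sup: float) -> Dict[FrozenSet[str], float]:
    """Return every K-itemset whose support is at least `min_sup`.

    Naive: every k-combination of the item universe is counted.
    """
    items = sorted(set().union(*db))
    result = {}
    for combo in combinations(items, k):
        itemset = frozenset(combo)
        sup = sum(1 for t in db if itemset <= t) / len(db)
        if sup >= min_sup:
            result[itemset] = sup
    return result


# Hypothetical toy database: each string is one transaction's set of items.
db = [frozenset(t) for t in ["ACD", "BCE", "ABCE", "BE"]]
for itemset, sup in sorted(frequent_k_itemsets(db, 2, 0.5).items(),
                           key=lambda kv: sorted(kv[0])):
    print("".join(sorted(itemset)), sup)
# AC 0.5
# BC 0.5
# BE 0.75
# CE 0.5
```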

Analysis

The number of possible itemsets grows exponentially with the number of distinct items, since every subset of the items is a potential itemset.
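For d distinct items, counting subsets makes the growth explicit. The rule count assigns each item to the body, the head, or neither (3^d ways), then removes the assignments with an empty body or an empty head by inclusion-exclusion:

$$
\#\{k\text{-itemsets}\} = \binom{d}{k}, \qquad
\#\{\text{non-empty itemsets}\} = \sum_{k=1}^{d} \binom{d}{k} = 2^d - 1, \qquad
\#\{\text{rules } X \Rightarrow Y\} = 3^d - 2^{d+1} + 1
$$

For d = 10 this is already 1,023 itemsets and 57,002 possible rules; for d = 100 the itemset count exceeds 10^30.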

Fast Algorithms for Mining Association Rules

Mining Association Rules: Apriori Principle
Min. support 50%, min. confidence 50%. For the rule A ⇒ C:

- support = $\text{support}(A \cup C) = 50\%$
- confidence = $\text{support}(A \cup C) / \text{support}(A) = 66.6\%$

The Apriori principle: any subset of a frequent itemset must be frequent.
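The principle follows from a one-line argument: if X ⊆ Y, every transaction that contains Y also contains X.

$$
X \subseteq Y \implies \{t \in D : Y \subseteq t\} \subseteq \{t \in D : X \subseteq t\} \implies \text{support}(Y) \le \text{support}(X)
$$

So if Y is frequent, support(X) ≥ support(Y) ≥ min_sup, and X is frequent as well. Read in reverse, this is what Apriori exploits: a candidate with any infrequent subset can be discarded without scanning the database.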

Mining Frequent Itemsets: the Key Step
- Find the frequent itemsets: the sets of items that have minimum support.
  - Any subset of a frequent itemset must also be a frequent itemset; e.g., if {A, B} is frequent, both {A} and {B} must be frequent.
  - Iteratively find frequent itemsets with cardinality from 1 to k (k-itemsets).
- Use the frequent itemsets to generate association rules.
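The second step, turning frequent itemsets into rules, can be sketched in Python as follows. It assumes `freq` already maps every frequent itemset to its support; because every subset of a frequent itemset is frequent, the support of any rule body is guaranteed to be in `freq`. The function name is hypothetical, and the supports in the usage lines are illustrative values chosen to match the A ⇒ C and C ⇒ A figures quoted earlier (support(A) = 75% is implied by the 66.6% confidence).

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

# (body, head, support, confidence)
Rule = Tuple[FrozenSet[str], FrozenSet[str], float, float]


def generate_rules(freq: Dict[FrozenSet[str], float], min_conf: float) -> List[Rule]:
    """Emit S => (L - S) for every frequent L and non-empty proper subset S."""
    rules: List[Rule] = []
    for itemset, sup in freq.items():
        for r in range(1, len(itemset)):  # skipped for 1-itemsets
            for body_items in combinations(sorted(itemset), r):
                body = frozenset(body_items)
                conf = sup / freq[body]  # body is frequent, so it is in freq
                if conf >= min_conf:
                    rules.append((body, itemset - body, sup, conf))
    return rules


# Hypothetical supports matching the A => C / C => A example.
freq = {frozenset("A"): 0.75, frozenset("C"): 0.5, frozenset("AC"): 0.5}
for body, head, sup, conf in generate_rules(freq, min_conf=0.5):
    print(f"{''.join(sorted(body))} => {''.join(sorted(head))} [{sup:.0%}, {conf:.1%}]")
# A => C [50%, 66.7%]
# C => A [50%, 100.0%]
```

No database scan is needed here: every confidence is a ratio of supports already counted while finding the frequent itemsets.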

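Example

Database D (four transactions): {1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}. Minimum support count: 2 (50%).

- Scan D to count C1: {1}: 2, {2}: 3, {3}: 3, {4}: 1, {5}: 3. Generate L1 = {1}, {2}, {3}, {5}.
- Generate C2 from L1: {1, 2}, {1, 3}, {1, 5}, {2, 3}, {2, 5}, {3, 5}. Scan D to count C2: 1, 2, 1, 2, 3, 2. Generate L2 = {1, 3}, {2, 3}, {2, 5}, {3, 5}.
- Generate C3 from L2: {2, 3, 5}. Scan D to count C3: 2. Generate L3 = {2, 3, 5}.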
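The level-wise trace on database D can be reproduced with a compact Apriori sketch in Python. Assumptions: transactions are `frozenset`s, the threshold is an absolute support count, and candidates come from pairwise unions of frequent (k-1)-itemsets followed by pruning. That join is a simplification of the sorted-prefix self-join, but after pruning both yield the same candidates: exactly the k-itemsets whose (k-1)-subsets are all frequent.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

Counts = Dict[FrozenSet[int], int]


def apriori(db: List[FrozenSet[int]], min_count: int) -> List[Counts]:
    """Return [L1, L2, ...]; each maps a frequent k-itemset to its support count."""

    def scan(candidates) -> Counts:
        # One pass over D per level, keeping candidates that reach min_count.
        counts = {c: sum(1 for t in db if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_count}

    levels = [scan({frozenset([i]) for t in db for i in t})]
    k = 2
    while levels[-1]:
        prev = set(levels[-1])
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        cands = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: drop candidates with an infrequent (k-1)-subset.
        cands = {c for c in cands
                 if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        levels.append(scan(cands))
        k += 1
    return levels[:-1]  # the final level is empty


# Database D from the example.
db = [frozenset({1, 3, 4}), frozenset({2, 3, 5}), frozenset({1, 2, 3, 5}), frozenset({2, 5})]
for k, level in enumerate(apriori(db, min_count=2), start=1):
    print(f"L{k}:", sorted((sorted(s), n) for s, n in level.items()))
# L1: [([1], 2), ([2], 3), ([3], 3), ([5], 3)]
# L2: [([1, 3], 2), ([2, 3], 2), ([2, 5], 3), ([3, 5], 2)]
# L3: [([2, 3, 5], 2)]
```

At level 3 the join also proposes {1, 2, 3} and {1, 3, 5}; both are pruned because {1, 2} and {1, 5} are not in L2, so only {2, 3, 5} is counted.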

Example of Generating Candidates
- L3 = {abc, abd, acd, ace, bcd}
- Self-joining L3 * L3:
  - abcd from abc and abd
  - acde from acd and ace
- Pruning: acde is removed because ade is not in L3
- C4 = {abcd}
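The join-and-prune step can be sketched in Python as follows, assuming itemsets are stored as lexicographically sorted tuples; `apriori_gen` is a hypothetical name used here for illustration. Two (k-1)-itemsets are joined only when they agree on their first k-2 items, which is how abcd and acde arise in the example.

```python
from itertools import combinations
from typing import List, Set, Tuple

Itemset = Tuple[str, ...]  # items kept in sorted order


def apriori_gen(prev: Set[Itemset]) -> List[Itemset]:
    """Build C_k from L_{k-1} by self-joining and pruning."""
    ordered = sorted(prev)
    candidates = []
    for i, p in enumerate(ordered):
        for q in ordered[i + 1:]:
            # Self-join: p and q must share their first k-2 items.
            if p[:-1] != q[:-1]:
                continue
            cand = p + (q[-1],)  # sorted, because p < q in lexicographic order
            # Prune: every (k-1)-subset of the candidate must be in L_{k-1}.
            if all(sub in prev for sub in combinations(cand, len(cand) - 1)):
                candidates.append(cand)
    return candidates


L3 = {tuple(s) for s in ["abc", "abd", "acd", "ace", "bcd"]}
print(apriori_gen(L3))  # [('a', 'b', 'c', 'd')]  (acde is pruned: ade is not in L3)
```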

Example

Apriori Algorithm

Exercise 4

min-sup = 20%, min-conf = 80%

Demo: IBM Intelligent Miner

Demo Database

Multi-Dimensional Association
- Single-dimensional (intra-dimension) rules: a single dimension (predicate) with multiple occurrences.
  - buys(X, "milk") ⇒ buys(X, "bread")
- Multi-dimensional rules: ≥ 2 dimensions (predicates).
  - Inter-dimension association rules (no repeated predicates): age(X, "19-25") ∧ occupation(X, "student") ⇒ buys(X, "coke")
  - Hybrid-dimension association rules (repeated predicates): age(X, "19-25") ∧ buys(X, "popcorn") ⇒ buys(X, "coke")
- Categorical (nominal) attributes: a finite number of possible values, with no ordering among values.
- Quantitative attributes: numeric, with an implicit ordering among values.
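One common way to reuse a single-dimensional miner for these rules is to treat each (predicate, value) pair as an item. The Python sketch below assumes that encoding; the records and helper names are hypothetical, and the quantitative attribute age is assumed to be already discretized into ranges such as "19-25".

```python
from typing import Dict, FrozenSet, List, Tuple

Item = Tuple[str, str]  # (predicate, value), e.g. ("age", "19-25")


def to_transaction(record: Dict[str, object]) -> FrozenSet[Item]:
    """Flatten one row into (predicate, value) items.

    A set-valued field such as "buys" yields one item per value, which is
    what allows a predicate to repeat (hybrid-dimension rules).
    """
    items = set()
    for attr, value in record.items():
        values = value if isinstance(value, (set, frozenset, list)) else [value]
        items.update((attr, str(v)) for v in values)
    return frozenset(items)


# Hypothetical customer rows; "age" is assumed to be pre-binned into ranges.
records = [
    {"age": "19-25", "occupation": "student", "buys": {"coke", "popcorn"}},
    {"age": "19-25", "occupation": "student", "buys": {"coke"}},
    {"age": "26-35", "occupation": "engineer", "buys": {"popcorn"}},
    {"age": "19-25", "occupation": "clerk", "buys": {"milk"}},
]
db: List[FrozenSet[Item]] = [to_transaction(r) for r in records]


def support(itemset: FrozenSet[Item]) -> float:
    return sum(1 for t in db if itemset <= t) / len(db)


# Inter-dimension rule: age(X, "19-25") ∧ occupation(X, "student") => buys(X, "coke")
body = frozenset({("age", "19-25"), ("occupation", "student")})
head = frozenset({("buys", "coke")})
print(support(body | head), support(body | head) / support(body))  # 0.5 1.0
```

Because the attribute name travels with every value, ("age", "19-25") and ("buys", "coke") never collide, and any frequent-itemset algorithm can run on `db` unchanged.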

Exercise 5

min-sup = 20%, min-conf = 80%

Research Topics

- Quantitative association rules, e.g., buys(bread, 5) ⇒ buys(milk, 3)
- Weighted association rules
- High-utility association rules
- Non-redundant association rules
- Constrained association rules
- Mining multi-dimensional association rules
- Generalized association rules
- Negative association rules
- Incremental mining of association rules
- Data stream association rule mining
- Interactive mining of association rules