Fast Algorithms for Mining Association Rules Rakesh Agrawal and Ramakrishnan Srikant VLDB '94 presented by kurt partridge cse 590db oct 4, 1999.

2 Mining Association Rules
DB of "Basket Data"
  TID   items
  100   1 3 4
  200   2 3 5
  300   1 2 3 5
  400   2 5
association rules
  {1} => {3}
  {2,3} => {5}
  {2,5} => {3}
association rule metrics:
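The two metrics that the minsup and minconf thresholds on the later slides refer to are support and confidence. A minimal sketch of both follows, assuming the flattened table on this slide reads as TIDs 100 to 400 with the item lists shown in the code; the function names `support` and `confidence` are illustrative, not taken from the slides.

```python
# Basket data as read from the slide (assumed reading of the flattened table).
transactions = {
    100: {1, 3, 4},
    200: {2, 3, 5},
    300: {1, 2, 3, 5},
    400: {2, 5},
}


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


for lhs, rhs in [({1}, {3}), ({2, 3}, {5}), ({2, 5}, {3})]:
    print(lhs, "=>", rhs,
          "support:", support(lhs | rhs, transactions),
          "confidence:", confidence(lhs, rhs, transactions))
```

On this data, {1} => {3} and {2,3} => {5} both have support 0.5 and confidence 1.0, while {2,5} => {3} has support 0.5 and confidence 2/3.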

3 General Strategy
Step I: Find all itemsets with minimum support (minsup)
Step II: Generate rules from minsup'ed itemsets

4 Step I: Finding Minsup Itemsets
Key fact: Adding items to an itemset never increases its support
General Strategy: Proceed inductively on itemset size
Apriori Algorithm:
1. Base case: Begin with all minsup itemsets of size 1 (L_1)
2. Without peeking at the DB, generate candidate itemsets of size k (C_k) from L_{k-1}
3. Remove candidate itemsets that contain unsupported subsets
4. Further refine C_k using the database to produce L_k
Repeat steps 2-4 for increasing k until L_k is empty
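The four steps can be sketched as a level-wise loop. This is a simplified illustration rather than the paper's implementation: candidates are built by unioning pairs of frequent itemsets instead of using a prefix join, counting is a plain scan with no hash tree, and minsup is given as an absolute count. The function name `apriori` and the sample baskets (the assumed reading of the slide 2 data) are choices made for this example.

```python
from itertools import combinations


def apriori(transactions, minsup_count):
    """Return {frozenset: support count} for every itemset meeting minsup_count."""
    transactions = [frozenset(t) for t in transactions]

    # Step 1: count single items and keep the frequent ones (L_1).
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= minsup_count}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Step 2: build size-k candidates from L_{k-1} without reading the DB.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Step 3: drop candidates that have a (k-1)-subset outside L_{k-1}.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Step 4: one pass over the DB to count the surviving candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= minsup_count}
        result.update(frequent)
        k += 1
    return result


baskets = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
frequent = apriori(baskets, minsup_count=2)  # 2 of 4 baskets, i.e. minsup = 50%
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With a minimum count of 2, the loop stops after size 3, where {2, 3, 5} is the only surviving itemset.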

5 Algorithm to Guess Itemsets
Naïve way: Extend all itemsets with all possible items
More sophisticated:
  Join L_{k-1} with itself, adding only a single, final item
    e.g.: {1 2 3}, {1 2 4}, {1 3 4}, {1 3 5}, {2 3 4} produces {1 2 3 4} and {1 3 4 5}
  Remove itemsets with an unsupported subset
    e.g.: {1 3 4 5} has an unsupported subset: {1 4 5} if minsup = 50%
  Use the database to further refine C_k
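The join and the subset filter can be sketched on the slide's own example. The sketch assumes itemsets are kept as sorted tuples so that "adding only a single, final item" becomes a comparison of the first k-2 positions; `generate_candidates` is a name chosen for this illustration.

```python
from itertools import combinations


def generate_candidates(prev_frequent):
    """Join L_{k-1} with itself on a shared prefix, then prune by subsets.

    Returns (joined, pruned) so both stages can be inspected.
    """
    prev = sorted(tuple(sorted(s)) for s in prev_frequent)
    if not prev:
        return [], []
    prev_set = set(prev)
    size = len(prev[0])  # this is k-1

    joined = []
    for i, a in enumerate(prev):
        for b in prev[i + 1:]:
            # Join: identical except for the last item. Because prev is
            # sorted, a's last item is smaller than b's, so the result
            # stays in sorted order.
            if a[:-1] == b[:-1]:
                joined.append(a + (b[-1],))

    # Prune: every (k-1)-subset of a candidate must itself be in L_{k-1}.
    pruned = [
        c for c in joined
        if all(s in prev_set for s in combinations(c, size))
    ]
    return joined, pruned


L3 = [{1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {1, 3, 5}, {2, 3, 4}]
joined, pruned = generate_candidates(L3)
print(joined)  # [(1, 2, 3, 4), (1, 3, 4, 5)]
print(pruned)  # [(1, 2, 3, 4)]
```

The candidate (1, 3, 4, 5) is dropped because (1, 4, 5) is not among the input itemsets; only the pruned list would then be counted against the database.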

6 Example

7 Part II: Generating Rules
Key fact: Moving items from the antecedent to the consequent never changes support, and never increases confidence
Algorithm
For each itemset IS with minsup:
  Find all minconf rules with a single consequent of the form (IS - L_1 => L_1)
  Guess candidate consequents C_k by appending items from IS - L_{k-1} to L_{k-1}
  Verify confidence of each rule IS - C_k => C_k using known itemset support values
  Repeat with larger consequents until no candidate meets minconf
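A minimal sketch of this rule-generation loop follows. It assumes the support counts of every minsup itemset are already known (here, the counts for the assumed reading of the slide 2 data with a minimum count of 2), and it grows consequents by unioning the ones that passed minconf, which is a simplification of the candidate step described above. The name `generate_rules` is illustrative.

```python
def generate_rules(supports, minconf):
    """supports: {frozenset: support count} for all minsup itemsets.

    Returns a list of (antecedent, consequent, confidence) tuples.
    """
    rules = []
    for itemset, sup in supports.items():
        if len(itemset) < 2:
            continue
        # Start with every single-item consequent.
        consequents = {frozenset([i]) for i in itemset}
        k = 1
        # Stop before the consequent swallows the whole itemset.
        while consequents and k < len(itemset):
            kept = set()
            for cons in consequents:
                antecedent = itemset - cons
                # Confidence needs no DB pass: both supports are known.
                conf = sup / supports[antecedent]
                if conf >= minconf:
                    rules.append((antecedent, cons, conf))
                    kept.add(cons)
            # Larger consequents are built only from consequents that passed,
            # since moving items to the consequent never raises confidence.
            k += 1
            consequents = {a | b for a in kept for b in kept if len(a | b) == k}
    return rules


supports = {
    frozenset({1}): 2, frozenset({2}): 3, frozenset({3}): 3, frozenset({5}): 3,
    frozenset({1, 3}): 2, frozenset({2, 3}): 2,
    frozenset({2, 5}): 3, frozenset({3, 5}): 2,
    frozenset({2, 3, 5}): 2,
}
rules = generate_rules(supports, minconf=0.9)
for antecedent, consequent, conf in rules:
    print(sorted(antecedent), "=>", sorted(consequent), round(conf, 2))
```

At minconf = 0.9 this keeps {1} => {3} and {2,3} => {5} and rejects {2,5} => {3}, whose confidence is 2/3.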

8 Other Details
Itemset hash trees for subset testing
Buffering
Variations
  Fewer database passes, itemsets from multiple iterations
  AprioriTID -- exclude unnecessary database records
  AprioriHybrid -- use either Apriori or AprioriTID
Future Work:
  Multiple ISA Taxonomies
  Constraints on rules (e.g. # of items)

9 Subsequent Papers
Mining sequenced rules
Finding "interesting" rules
Efficiently handling long itemsets
Integration with query optimizers
Adjustments to handle dense/relational databases
Apply constraints to further filter association rules

10 Questions
How are rules ranked?
Do the minsup and minconf find interesting rules? Do they omit any interesting rules?
What about maximum support?
How well will this approach work for other problems (e.g. clustering, classification)?

11 Apriori

12 Apriori
Join operation
Subset filtering
