Presentation is loading. Please wait.

Presentation is loading. Please wait.

ICDM'06 Panel 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant (description by C. Faloutsos)

Similar presentations


Presentation on theme: "ICDM'06 Panel 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant (description by C. Faloutsos)"— Presentation transcript:

1 ICDM'06 Panel 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant (description by C. Faloutsos)

2 ICDM'06 Panel2 Association rules - idea [Agrawal+SIGMOD93] Consider ‘market basket’ case: Consider ‘market basket’ case: (milk, bread) (milk) (milk, chocolate) (milk, bread) Find ‘interesting things’, eg., rules of the form: Find ‘interesting things’, eg., rules of the form: milk, bread -> chocolate | 90% milk, bread -> chocolate | 90%

3 ICDM'06 Panel3 Association rules - idea In general, for a given rule Ij, Ik,... Im -> Ix | c ‘s’ = support: how often people buy Ij,... Im, Ix ‘c’ = ‘confidence’ (how often people by Ix, given that they have bought Ij,... Im Ix 40% 20% Eg.: s = 20% c= 20/40= 50%

4 ICDM'06 Panel4 Association rules - idea Problem definition: given given – a set of ‘market baskets’ (=binary matrix, of N rows/baskets and M columns/products) –min-support ‘s’ and –min-confidence ‘c’ find find –all the rules with higher support and confidence

5 ICDM'06 Panel5 Association rules - idea Closely related concept: “large itemset” Ij, Ik,... Im, Ix is a ‘large itemset’, if it appears more than ‘min-support’ times Observation: once we have a ‘large itemset’, we can find out the qualifying rules easily Thus, we focus on finding ‘large itemsets’

6 ICDM'06 Panel6 Association rules - idea Naive solution: scan database once; keep 2**|I| counters Drawback?Improvement?

7 ICDM'06 Panel7 Association rules - idea Naive solution: scan database once; keep 2**|I| counters Drawback? 2**1000 is prohibitive... Improvement? scan the db |I| times, looking for 1-, 2-, etc itemsets Eg., for |I|=4 items only (a,b,c,d), we have

8 ICDM'06 Panel8 What itemsets do you count? Anti-monotonicity: Any superset of an infrequent itemset is also infrequent (SIGMOD ’93). – –If an itemset is infrequent, don’t count any of its extensions. Flip the property: All subsets of a frequent itemset are frequent. Need not count any candidate that has an infrequent subset (VLDB ’94) – –Simultaneously observed by Mannila et al., KDD ’94 Broadly applicable to extensions and restrictions.

9 ICDM'06 Panel9 Apriori Algorithm: Breadth First Search { } abcd say, min-sup =

10 ICDM'06 Panel10 Apriori Algorithm: Breadth First Search { } abcd say, min-sup =

11 ICDM'06 Panel11 Apriori Algorithm: Breadth First Search { } abcd say, min-sup = 10

12 ICDM'06 Panel12 Apriori Algorithm: Breadth First Search { } a a ba d b b d cd

13 ICDM'06 Panel13 Apriori Algorithm: Breadth First Search { } a a ba d b b d cd

14 ICDM'06 Panel14 Apriori Algorithm: Breadth First Search { } a a b a b d a d b b d cd

15 ICDM'06 Panel15 Apriori Algorithm: Breadth First Search { } a a b a b d a d b b d cd

16 ICDM'06 Panel16 Subsequent Algorithmic Innovations Reducing the cost of checking whether a candidate itemset is contained in a transaction: Reducing the cost of checking whether a candidate itemset is contained in a transaction: –TID intersection. –Database projection, FP Growth Reducing the number of passes over the data: Reducing the number of passes over the data: –Sampling & Dynamic Counting Reducing the number of candidates counted: Reducing the number of candidates counted: –For maximal patterns & constraints. Many other innovative ideas … Many other innovative ideas …

17 ICDM'06 Panel17 Impact Concepts in Apriori also applied to many generalizations, e.g., taxonomies, quantitative Associations, sequential Patterns, graphs, … Concepts in Apriori also applied to many generalizations, e.g., taxonomies, quantitative Associations, sequential Patterns, graphs, … Over 3600 citations in Google Scholar. Over 3600 citations in Google Scholar.


Download ppt "ICDM'06 Panel 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant (description by C. Faloutsos)"

Similar presentations


Ads by Google