Download presentation

Presentation is loading. Please wait.

Published byJulius Flowers Modified over 2 years ago

1
ICDM'06 Panel 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant (description by C. Faloutsos)

2
ICDM'06 Panel2 Association rules - idea [Agrawal+SIGMOD93] Consider ‘market basket’ case: Consider ‘market basket’ case: (milk, bread) (milk) (milk, chocolate) (milk, bread) Find ‘interesting things’, eg., rules of the form: Find ‘interesting things’, eg., rules of the form: milk, bread -> chocolate | 90% milk, bread -> chocolate | 90%

3
ICDM'06 Panel3 Association rules - idea In general, for a given rule Ij, Ik,... Im -> Ix | c ‘s’ = support: how often people buy Ij,... Im, Ix ‘c’ = ‘confidence’ (how often people by Ix, given that they have bought Ij,... Im Ix 40% 20% Eg.: s = 20% c= 20/40= 50%

4
ICDM'06 Panel4 Association rules - idea Problem definition: given given – a set of ‘market baskets’ (=binary matrix, of N rows/baskets and M columns/products) –min-support ‘s’ and –min-confidence ‘c’ find find –all the rules with higher support and confidence

5
ICDM'06 Panel5 Association rules - idea Closely related concept: “large itemset” Ij, Ik,... Im, Ix is a ‘large itemset’, if it appears more than ‘min-support’ times Observation: once we have a ‘large itemset’, we can find out the qualifying rules easily Thus, we focus on finding ‘large itemsets’

6
ICDM'06 Panel6 Association rules - idea Naive solution: scan database once; keep 2**|I| counters Drawback?Improvement?

7
ICDM'06 Panel7 Association rules - idea Naive solution: scan database once; keep 2**|I| counters Drawback? 2**1000 is prohibitive... Improvement? scan the db |I| times, looking for 1-, 2-, etc itemsets Eg., for |I|=4 items only (a,b,c,d), we have

8
ICDM'06 Panel8 What itemsets do you count? Anti-monotonicity: Any superset of an infrequent itemset is also infrequent (SIGMOD ’93). – –If an itemset is infrequent, don’t count any of its extensions. Flip the property: All subsets of a frequent itemset are frequent. Need not count any candidate that has an infrequent subset (VLDB ’94) – –Simultaneously observed by Mannila et al., KDD ’94 Broadly applicable to extensions and restrictions.

9
ICDM'06 Panel9 Apriori Algorithm: Breadth First Search { } abcd say, min-sup = 10 120

10
ICDM'06 Panel10 Apriori Algorithm: Breadth First Search { } abcd say, min-sup = 10 120 8030570

11
ICDM'06 Panel11 Apriori Algorithm: Breadth First Search { } abcd 8030570 say, min-sup = 10

12
ICDM'06 Panel12 Apriori Algorithm: Breadth First Search { } a a ba d b b d cd

13
ICDM'06 Panel13 Apriori Algorithm: Breadth First Search { } a a ba d b b d cd

14
ICDM'06 Panel14 Apriori Algorithm: Breadth First Search { } a a b a b d a d b b d cd

15
ICDM'06 Panel15 Apriori Algorithm: Breadth First Search { } a a b a b d a d b b d cd

16
ICDM'06 Panel16 Subsequent Algorithmic Innovations Reducing the cost of checking whether a candidate itemset is contained in a transaction: Reducing the cost of checking whether a candidate itemset is contained in a transaction: –TID intersection. –Database projection, FP Growth Reducing the number of passes over the data: Reducing the number of passes over the data: –Sampling & Dynamic Counting Reducing the number of candidates counted: Reducing the number of candidates counted: –For maximal patterns & constraints. Many other innovative ideas … Many other innovative ideas …

17
ICDM'06 Panel17 Impact Concepts in Apriori also applied to many generalizations, e.g., taxonomies, quantitative Associations, sequential Patterns, graphs, … Concepts in Apriori also applied to many generalizations, e.g., taxonomies, quantitative Associations, sequential Patterns, graphs, … Over 3600 citations in Google Scholar. Over 3600 citations in Google Scholar.

Similar presentations

OK

Carnegie Mellon Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications C. Faloutsos Data Warehousing / Data Mining.

Carnegie Mellon Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications C. Faloutsos Data Warehousing / Data Mining.

© 2018 SlidePlayer.com Inc.

All rights reserved.

Ads by Google

Ppt on bank lending rates Ppt on history of olympics in canada Ppt on depth first search vs breadth Working of raster scan display ppt on tv Ppt on product advertising letter Ppt on selling techniques Ppt on file tracking system Ppt on iron and steel industry in india A ppt on loch ness monster pictures Ppt on column chromatography set