Published by Haleigh Truelove. Modified over 3 years ago.

1
Zeev Dvir – dvirzeev@post.tau.ac.il
GenMax
From: “Efficiently Mining Maximal Frequent Itemsets” by Karam Gouda & Mohammed J. Zaki

2
The Problem
Given a large database of item transactions, find all frequent itemsets.
A frequent itemset is a set of items that occurs in at least a user-specified percentage of the transactions in the database.
We call this percentage min_sup (for minimum support).

3
A Maximal Frequent Itemset is a frequent itemset that has no frequent proper superset.
FI := frequent itemsets
MFI := maximal frequent itemsets
Fact: |MFI| << |FI|
GenMax is an algorithm that finds the exact MFI.

4
Example

Tid / Item  A  B  C  D
1           x  x  x
2                 x  x
3           x  x  x
4           x  x  x  x
5           x
6              x     x
7                 x

Itemset lattice:
ABCD
ABC  ABD  ACD  BCD
AB  AC  AD  BC  BD  CD
A  B  C  D

Min_sup = 3
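
To make the definitions concrete, here is a minimal brute-force sketch that enumerates FI and MFI for the example database. It is not GenMax: it simply tests every subset, which is only feasible for tiny inputs. The names `DB`, `support`, `frequent_itemsets` and `maximal` are illustrative, and min_sup is treated as an absolute count (3), as in the example.

```python
from itertools import combinations

# Example database from the slide: transaction id -> set of items.
DB = {
    1: {"A", "B", "C"},
    2: {"C", "D"},
    3: {"A", "B", "C"},
    4: {"A", "B", "C", "D"},
    5: {"A"},
    6: {"B", "D"},
    7: {"C"},
}
MIN_SUP = 3  # absolute count, as in the example


def support(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for items in db.values() if itemset <= items)


def frequent_itemsets(db, min_sup):
    """Brute force: test every non-empty subset of the item universe."""
    universe = sorted(set().union(*db.values()))
    result = []
    for size in range(1, len(universe) + 1):
        for combo in combinations(universe, size):
            if support(set(combo), db) >= min_sup:
                result.append(frozenset(combo))
    return result


def maximal(itemsets):
    """Keep only the itemsets with no proper superset in the list."""
    return [s for s in itemsets if not any(s < t for t in itemsets)]


FI = frequent_itemsets(DB, MIN_SUP)
MFI = maximal(FI)
print(sorted("".join(sorted(s)) for s in FI))   # ['A', 'AB', 'ABC', 'AC', 'B', 'BC', 'C', 'D']
print(sorted("".join(sorted(s)) for s in MFI))  # ['ABC', 'D']
```

Eight itemsets are frequent, but only two are maximal, which illustrates why mining the MFI directly can be much cheaper than mining all of FI.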

5
Some Useful Definitions
The combine-set of an itemset I is the set of items that can be added to I to create a frequent itemset.
For instance, in the previous example, the combine-set of the itemset {A} is {B, C}.
The combine-set of the empty itemset is called F1; it is the set of frequent items, that is, the frequent itemsets of size 1.
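
The definition can be checked directly against the example database. The sketch below is a naive illustration that rescans the database for every candidate item; the function name `combine_set` and the absolute-count threshold are assumptions made for the illustration.

```python
# Example database from the earlier slide: transaction id -> set of items.
DB = {
    1: {"A", "B", "C"},
    2: {"C", "D"},
    3: {"A", "B", "C"},
    4: {"A", "B", "C", "D"},
    5: {"A"},
    6: {"B", "D"},
    7: {"C"},
}
MIN_SUP = 3  # absolute count


def support(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for items in db.values() if itemset <= items)


def combine_set(itemset, db, min_sup):
    """Items x outside `itemset` such that itemset + {x} is still frequent."""
    itemset = set(itemset)
    universe = set().union(*db.values())
    return {x for x in universe - itemset
            if support(itemset | {x}, db) >= min_sup}


print(sorted(combine_set({"A"}, DB, MIN_SUP)))  # ['B', 'C']
print(sorted(combine_set(set(), DB, MIN_SUP)))  # F1: ['A', 'B', 'C', 'D']
```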

8
Improvement
At each level, sort the combine-set (C) in increasing order of support.
An itemset with low support has a smaller chance of producing a large combine-set in the next level.
The sooner we prune the tree, the more work we save.
This heuristic was first used in MaxMiner.
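
The ordering heuristic is easiest to see inside a search loop. The following is a simplified backtracking sketch written for this purpose, not the authors' code: the structure of `backtrack` (extend the itemset, prune when everything reachable is already covered by a known maximal set, recurse on the new combine-set) is an assumption about how such a search is usually organised, and all names are illustrative.

```python
# Example database from the earlier slide: transaction id -> set of items.
DB = {
    1: {"A", "B", "C"}, 2: {"C", "D"}, 3: {"A", "B", "C"},
    4: {"A", "B", "C", "D"}, 5: {"A"}, 6: {"B", "D"}, 7: {"C"},
}
MIN_SUP = 3  # absolute count


def support(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for items in db.values() if itemset <= items)


def backtrack(itemset, combine, db, min_sup, mfi):
    """Depth-first search that appends maximal frequent itemsets to `mfi`."""
    # Heuristic from the slide: try low-support extensions first
    # (ties broken by item name so the run is deterministic).
    ordered = sorted(combine, key=lambda x: (support(itemset | {x}, db), x))
    for i, x in enumerate(ordered):
        new_itemset = itemset | {x}
        possible = ordered[i + 1:]
        # Prune: everything reachable from here (and from later siblings)
        # is a subset of an already-known maximal set.
        if any(new_itemset | set(possible) <= m for m in mfi):
            return
        new_combine = [y for y in possible
                       if support(new_itemset | {y}, db) >= min_sup]
        if new_combine:
            backtrack(new_itemset, new_combine, db, min_sup, mfi)
        elif not any(new_itemset <= m for m in mfi):
            mfi.append(frozenset(new_itemset))


def find_mfi(db, min_sup):
    """Start the search from the empty itemset and its combine-set F1."""
    items = set().union(*db.values())
    f1 = [x for x in items if support({x}, db) >= min_sup]
    mfi = []
    backtrack(set(), f1, db, min_sup, mfi)
    return mfi


print(sorted("".join(sorted(m)) for m in find_mfi(DB, MIN_SUP)))  # ['ABC', 'D']
```

On the example, D has the lowest support, so it is tried first and immediately becomes a leaf. Once ABC is found, the branches starting at AC and B are pruned by the superset check without any further support counting.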

9
Bottlenecks
1. Superset checking: the best algorithms for superset checking give an amortized bound of [bound missing in source] per operation. That is bad if we have many itemsets in the MFI.
2. Frequency testing: how can we make frequency testing faster?

10
Optimizing Superset Checking
A technique called “Progressive Focusing” is used to narrow down the group of potential supersets as the recursive calls are made.
LMFI := Local MFI
Before each recursive call, we construct the LMFI for the next call from the current LMFI and the newly added item.
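
A minimal sketch of the narrowing step, assuming the next LMFI is obtained by keeping the maximal sets in the current LMFI that contain the newly added item. Any superset of the extended itemset must contain that item, so nothing relevant is lost. The function names and the sample maximal sets are hypothetical, and merging newly found maximal sets back into the parent's LMFI after the recursive call is left out.

```python
def next_lmfi(lmfi, new_item):
    """Keep only the known maximal sets that contain `new_item`."""
    return [m for m in lmfi if new_item in m]


def has_superset(candidate, lmfi):
    """Superset check restricted to the (smaller) local list."""
    return any(candidate <= m for m in lmfi)


# Hypothetical maximal itemsets found so far (not taken from the slides).
mfi = [frozenset("ABC"), frozenset("ADE"), frozenset("BD")]

lmfi_a = next_lmfi(mfi, "A")      # candidates once A has been added
lmfi_ad = next_lmfi(lmfi_a, "D")  # candidates once D has been added as well

print(["".join(sorted(m)) for m in lmfi_a])     # ['ABC', 'ADE']
print(["".join(sorted(m)) for m in lmfi_ad])    # ['ADE']
print(has_superset(frozenset("ADE"), lmfi_ad))  # True
```

Each level only filters the list it inherited, so the superset check deep in the recursion touches far fewer sets than a scan of the whole MFI would.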

11
LMFI Example
[Figure: search tree with nodes FG …; FGH, FGI …; FGHI, FGHJ …]

13
Frequency Testing Optimization
GenMax uses a “vertical database format”: for each item, we keep the set of all transactions that contain this item.
This set is called a tidset (Transaction ID Set).
This makes support computations cheaper, because we do not have to scan the entire database.

14
Vertical Database

Tid / Item  A  B  C  D
1           x  x  x
2                 x  x
3           x  x  x
4           x  x  x  x
5           x
6              x     x
7                 x

Tidsets:
A: {1, 3, 4, 5}
B: {1, 3, 4, 6}
C: {1, 2, 3, 4, 7}
D: {2, 4, 6}

t(A) = {1, 3, 4, 5}
t(AC) = {1, 3, 4}
supp(I) = |t(I)|
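
The conversion and the support computation can be sketched in a few lines. This is an illustration using plain Python sets; the helper names `to_vertical`, `tidset` and `supp` are assumptions, and `tidset` expects a non-empty itemset.

```python
from collections import defaultdict

# Horizontal format: transaction id -> set of items (the example database).
DB = {
    1: {"A", "B", "C"}, 2: {"C", "D"}, 3: {"A", "B", "C"},
    4: {"A", "B", "C", "D"}, 5: {"A"}, 6: {"B", "D"}, 7: {"C"},
}


def to_vertical(db):
    """Build item -> tidset from the horizontal transaction list."""
    tidsets = defaultdict(set)
    for tid, items in db.items():
        for item in items:
            tidsets[item].add(tid)
    return dict(tidsets)


def tidset(itemset, tidsets):
    """t(I): intersection of the tidsets of the items in a non-empty I."""
    return set.intersection(*(tidsets[item] for item in itemset))


def supp(itemset, tidsets):
    """supp(I) = |t(I)|."""
    return len(tidset(itemset, tidsets))


TIDSETS = to_vertical(DB)
print(sorted(TIDSETS["A"]))            # [1, 3, 4, 5]
print(sorted(tidset("AC", TIDSETS)))   # [1, 3, 4]
print(supp("AC", TIDSETS))             # 3
```

Computing supp(AC) touches only the two tidsets involved, not the seven transactions.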

15
[Figure: AB with extensions ABC, ABD, ABE; … = { C, E }; t(ABC), t(ABE)]
Each item y in the combine-set of an itemset I actually represents the itemset I ∪ {y}, and stores the tidset associated with it.

16
Additional Optimization
Diffsets: don’t store the entire tidsets, only the differences between tidsets (described in “Fast Vertical Mining Using Diffsets”).
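
The slide only names the idea, so the following is a sketch based on the usual formulation of diffsets, stated here as an assumption: for a prefix P and an item X, d(PX) = t(P) − t(X); for two extensions PX and PY, d(PXY) = d(PY) − d(PX) and supp(PXY) = supp(PX) − |d(PXY)|. The function names are illustrative.

```python
# Tidsets of the single items in the example database.
TIDSETS = {
    "A": {1, 3, 4, 5},
    "B": {1, 3, 4, 6},
    "C": {1, 2, 3, 4, 7},
    "D": {2, 4, 6},
}


def diffset_from_tidsets(t_prefix, t_item):
    """d(PX) = t(P) - t(X): transactions of P that do not contain X."""
    return t_prefix - t_item


def extend(d_px, sup_px, d_py):
    """From the diffsets of PX and PY, return (d(PXY), supp(PXY))."""
    d_pxy = d_py - d_px
    return d_pxy, sup_px - len(d_pxy)


# Prefix P = C, extended with A and with D.
t_c = TIDSETS["C"]
d_ca = diffset_from_tidsets(t_c, TIDSETS["A"])
d_cd = diffset_from_tidsets(t_c, TIDSETS["D"])
sup_ca = len(t_c) - len(d_ca)

d_cad, sup_cad = extend(d_ca, sup_ca, d_cd)
print(sorted(d_ca), sup_ca)    # [2, 7] 3
print(sorted(d_cad), sup_cad)  # [1, 3] 1

# Cross-check against a direct tidset intersection.
assert sup_cad == len(TIDSETS["A"] & TIDSETS["C"] & TIDSETS["D"])
```

The diffsets shrink as the itemsets grow, whereas each tidset intersection would have to be stored in full.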

17
Experimental Results
GenMax is compared with MaxMiner, MAFIA, and MAFIA-PP.
MaxMiner and MAFIA-PP give the exact MFI, while MAFIA gives a superset of the MFI.
The databases used in the experiments are grouped according to their MFI length distribution.

18
Type I Datasets

19
Type II Datasets

20
Type III Datasets

21
Type IV Datasets
