LCM: An Efficient Algorithm for Enumerating Frequent Closed Item Sets (Linear time Closed itemset Miner)

Presentation transcript:

1 LCM: An Efficient Algorithm for Enumerating Frequent Closed Item Sets (Linear time Closed itemset Miner)
Takeaki Uno, Tatsuya Asai, Hiroaki Arimura, Yuzo Uchida
National Institute of Informatics, Kyushu University
19/Nov/2003, FIMI 2003

2 Motivation
- We want to solve difficult problems in short time
[Figure: map of benchmark datasets (retail, accidents, IBMdatas, chess, connect, mushroom, kosarak, pumsb*, pumsb, BMS POS, BMS web1,2). Axis labels: small supports, large supports; #closed set = #freq. set, #closed set << #freq. set. Region labels: "Few solutions for small support", "Many solutions for even large support". Technique labels: database reduction; remove infrequent items; sparse/dense (occ-deliv/diffsets); exact enumeration of closed item sets; generation of all/maximal item sets from closed item sets.]

3 Outline of Our Research
- Exact enumeration of closed item sets (no sophisticated pruning, post processing, nor memory for obtained closed item sets)
- Enumerate all/maximal frequent item sets using closed item sets
- Algorithms for updating occurrences/maximality check in dense/sparse cases, and their adaptive hybrid
- Save additional memory use (right first sweep, adjacency matrix only for large transactions)

4 Exact Enumeration of Closed Item Sets
- Introduce an acyclic parent-child relationship on freq. closed sets (it induces a tree-shaped traversal route)
- Traverse the route in depth-first manner (find a child, and go to it)
=> Exact enumeration (linear time in #closed sets)
=> Any child is found by taking closure (in short time)
=> No need to store obtained item sets (small memory)
=> Can enumerate all closed item sets (even without min. support)
[Figure: enumeration tree with root (= φ)]
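The closure operation that this slide relies on can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the usual definition (the closure of an item set is the intersection of all transactions that contain it), and the names `occurrences`, `closure`, and the toy database `db` are hypothetical.

```python
from functools import reduce


def occurrences(transactions, itemset):
    """Return the transactions that contain every item of `itemset`."""
    return [t for t in transactions if itemset <= t]


def closure(transactions, itemset):
    """Intersect all transactions that contain `itemset`.

    Returns None when no transaction contains the item set,
    since the closure is not meaningful in that case.
    """
    occ = occurrences(transactions, itemset)
    if not occ:
        return None
    return frozenset(reduce(frozenset.intersection, occ))


# Toy database (hypothetical data, for illustration only).
db = [frozenset(t) for t in ([1, 2, 3], [1, 2, 4], [1, 2, 3, 4], [2, 5])]

print(sorted(closure(db, frozenset({1}))))  # items shared by every transaction containing 1
print(sorted(closure(db, frozenset())))     # closure of the empty set: the root of the tree
```

An item set is closed exactly when it equals its own closure, so this one function is enough to test closedness and to jump from an item set to the closed set that represents it.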

5 X: closed item set
parent of X = closure of X ∩ {1,…,i}, where i is the maximum s.t. X ≠ closure of X ∩ {1,…,i}
=> parent of X ⊆ X, acyclic
X' = child of X ⇔ X' is the closure of X ∪ {i} for some i, and (cond) X' \ X includes no item
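A depth-first traversal of this parent-child tree might look like the sketch below. Two points are assumptions rather than statements from the slide: the truncated condition (cond) is read as "X' \ X includes no item smaller than i", and each child is only extended by items larger than the item that generated it, so that every closed set is reached once. Support counting and closure are done by naive database scans here, so the sketch does not show the linear-time behaviour the slides claim. All function names and the toy data are hypothetical.

```python
def closure(db, itemset):
    """Intersection of all transactions containing `itemset` (None if there are none)."""
    occ = [t for t in db if itemset <= t]
    return frozenset.intersection(*occ) if occ else None


def support(db, itemset):
    """Number of transactions containing `itemset`."""
    return sum(1 for t in db if itemset <= t)


def enumerate_closed(db, min_support=1):
    """Enumerate frequent closed item sets depth-first, storing no visited set for lookups."""
    if len(db) < min_support:
        return []
    items = sorted(set().union(*db))
    results = []

    def visit(x, core):
        results.append(x)
        for i in items:
            # Assumption: only extend with items larger than the generating item.
            if i <= core or i in x:
                continue
            if support(db, x | {i}) < min_support:
                continue
            child = closure(db, x | {i})
            # Assumed reading of (cond): the closure adds no item smaller than i.
            if any(j < i for j in child - x):
                continue
            visit(child, i)

    visit(closure(db, frozenset()), float("-inf"))
    return results


# Toy database (hypothetical data, for illustration only).
db = [frozenset(t) for t in ([1, 2, 3], [1, 2, 4], [1, 2, 3, 4], [2, 5])]
for closed in enumerate_closed(db, min_support=2):
    print(sorted(closed))
```

The `results` list only collects output; the traversal itself never consults previously found sets, which is the "no need to store obtained item sets" property.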

6 Computation of Occurrences of X ∪ {i} for Sparse and Dense Cases
- In sparse case, by tracing items of each occurrence of X (occurrence deliver: maybe a known technique)
- In dense case, use diffsets (proposed by Zaki)
Adaptive Hybrid Algorithm
We choose the best one according to estimations of computation time in each iteration
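The diffset idea for the dense case can be sketched as below. This assumes the standard definition (the diffset of X ∪ {i} relative to X is the set of occurrences of X that do not contain i) and is not the LCM code itself. The function names and toy data are hypothetical, and the time estimation used by the adaptive hybrid is not shown because the slides do not specify it.

```python
def tidset(db, itemset):
    """Ids of the transactions that contain `itemset`."""
    return {tid for tid, t in enumerate(db) if itemset <= t}


def diffsets_for_extensions(db, x, candidates):
    """Return |T(X)| and, for each candidate i, T(X) minus T(X ∪ {i})."""
    t_x = tidset(db, x)
    diffs = {i: {tid for tid in t_x if i not in db[tid]} for i in candidates}
    return len(t_x), diffs


def extend(sup_xi, d_xi, d_xj):
    """Support and diffset of X ∪ {i, j} relative to X ∪ {i}.

    Uses only the two diffsets relative to X: transactions that hold X and i
    but not j are exactly d_xj minus d_xi.
    """
    d = d_xj - d_xi
    return sup_xi - len(d), d


# Toy database (hypothetical data, for illustration only).
db = [frozenset(t) for t in ([1, 2, 3], [1, 2, 4], [1, 2, 3, 4], [2, 5])]
sup_x, diffs = diffsets_for_extensions(db, frozenset({2}), [1, 3, 4])
sup_x1 = sup_x - len(diffs[1])                      # support of {1, 2}
sup_x13, d_x13 = extend(sup_x1, diffs[1], diffs[3])  # support of {1, 2, 3}
print(sup_x1, sup_x13)
```

In a dense database most occurrences of X also contain i, so the diffsets stay small even when the occurrence lists are long, which is why the technique suits that case.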

7 Maximal and All Frequent Sets
- Maximal frequent sets => generated from closed item sets
- All frequent sets (hypercube decomposition) =>
-- decompose classes of closed item sets into complete sublattices
-- enumerate pairs of greatest/least elements of sublattices
-- generate others from the pairs
[Figure labels: closed item set, class, 01 lattice]
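The relationship between closed, maximal, and all frequent sets can be illustrated with a naive sketch. It is deliberately not the hypercube decomposition named on the slide: it only uses the facts that a maximal frequent set is a frequent closed set with no frequent closed proper superset, and that every frequent set is a subset of some frequent closed set. The function names and the input list are hypothetical.

```python
from itertools import combinations


def maximal_from_closed(closed_sets):
    """Keep the frequent closed sets that have no proper superset in the list."""
    closed = [frozenset(c) for c in closed_sets]
    return [c for c in closed if not any(c < other for other in closed)]


def all_frequent_from_closed(closed_sets):
    """Naive generation: every subset of a frequent closed set is frequent.

    A set collects the subsets so that ones shared by several closed sets
    appear once. The hypercube decomposition avoids this duplicate work;
    this sketch does not.
    """
    frequent = set()
    for c in closed_sets:
        members = sorted(c)
        for size in range(len(members) + 1):
            for subset in combinations(members, size):
                frequent.add(frozenset(subset))
    return frequent


# Frequent closed sets of a toy database at min. support 2 (hypothetical data).
closed = [frozenset({2}), frozenset({1, 2}), frozenset({1, 2, 3}), frozenset({1, 2, 4})]
print([sorted(m) for m in maximal_from_closed(closed)])
print(len(all_frequent_from_closed(closed)))
```

The input must already be restricted to frequent closed sets; the functions do not look at the database.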

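8 Result
[Figure: the dataset map from slide 2 (retail, accidents, IBMdatas, chess, connect, mushroom, kosarak, pumsb*, pumsb, BMS POS, BMS web1,2), with axis labels "large supports" and "small supports" and region labels "fast", "fast if support is small", "fast or usual", "slower than others".]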

9 Conclusion
- For data sets s.t. #freq. closed sets << #freq. sets
  - large business datasets: BMS-web1,2, retail
  - machine learning datasets with small supports: UCI repository
  exact enumeration of closed item sets and hypercube decomposition perform well
- These techniques are orthogonal to other techniques (database reduction, pruning infrequent items, …) => we can do better for large supports / accidents (blue area)
- Parameter of hybrid is not tuned => not fast for kosarak, IBMdatas => now faster
- Fast without pruning, trie, or other existing methods
For further speed up

10 We think…
● What is the real problem (bottleneck)?
---- Mining structured item sets (closed item sets, association rules with threshold, …)
● Is it only a counting problem?
---- For all frequent item set mining, yes. The problem is how to make the occurrences of an item set from other item sets (choose best way, represent
● Is maximal item set useful?
---- Closed item set is useful!! It has applications in classification and association rule mining

11 Some Observations
- Computing occurrences for infrequent items from X: usually, < 1/2. Really need to prune?
[Figure: frequency of X, X ∪ {1}, X ∪ {2}, X ∪ {3}, X ∪ {4}, X ∪ {5}]
- Almost all computation is for updating occurrences
- There is a best e to get the occurrences of X from X - e
Can we design an algorithm choosing e in each iteration? How do we find this e? Does this accelerate? (we can evaluate the lower bound of occurrence computation)
Is pruning of infrequent sets really necessary? Is there a need for accelerating occurrence computation?

12 Some Observations
- Computing occurrences for infrequent items from X: usually, < 1/2. Really need to prune?
[Figure: frequency of X, X ∪ {10}, X ∪ {11}, X ∪ {12}, X ∪ {13}, X ∪ {14}]

13 Right First Sweep
- Generate recursive calls in decreasing order of items
- Clear memory after the recursive call
- Re-use the memory in the following recursive calls
Child iterations need no memory
[Figure: X ∪ {10}, X ∪ {11}, X ∪ {12}, X ∪ {13}, X ∪ {14} with entries labeled A to E]

14 Occurrence deliver
Compute T(X ∪ {i}) by tracing each occurrence of X
In sparse cases, fast
[Figure: occurrences A to E of X delivered to X ∪ {10}, X ∪ {11}, X ∪ {12}, X ∪ {13}, X ∪ {14}]
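Occurrence deliver, as described on this slide, can be sketched as a single pass over the occurrences of X that drops each transaction id into a bucket per item. This is a minimal illustration under the assumption that occurrences are kept as (id, transaction) pairs; the function name and toy data are hypothetical, and the memory reuse of right first sweep is not modelled.

```python
from collections import defaultdict


def occurrence_deliver(occ_x, x):
    """Compute T(X ∪ {i}) for every item i outside X in one scan of T(X).

    `occ_x` is a list of (transaction id, transaction) pairs for the
    occurrences of X. Each transaction is traced once, and its id is
    appended to the bucket of every item it contains.
    """
    buckets = defaultdict(list)
    for tid, transaction in occ_x:
        for item in transaction:
            if item not in x:
                buckets[item].append(tid)
    return dict(buckets)


# Toy database (hypothetical data, for illustration only).
db = [frozenset(t) for t in ([1, 2, 3], [1, 2, 4], [1, 2, 3, 4], [2, 5])]
x = frozenset({2})
occ_x = [(tid, t) for tid, t in enumerate(db) if x <= t]
delivered = occurrence_deliver(occ_x, x)
for item in sorted(delivered):
    print(item, delivered[item])
```

The cost is proportional to the total size of the occurrences of X, which is small when transactions are short. That is consistent with the slide's remark that the method is fast in sparse cases.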

15 - Check (cond) closure of X ∪ {i} \ X includes no item

16 Results
[Figure: result panels labeled "all", "closed", "maximal"]

