Published by Marvin James. Modified over 2 years ago.

1
LCM: An Efficient Algorithm for Enumerating Frequent Closed Item Sets
(LCM = Linear time Closed itemset Miner)
Takeaki Uno, Tatsuya Asai, Hiroaki Arimura, Yuzo Uchida
National Institute of Informatics; Kyushu University
19/Nov/2003, FIMI 2003

2
Motivation
- We want to solve difficult problems in a short time

[Figure: datasets arranged between "small supports" and "large supports"]
- Region labels: few solutions for small support; many solutions even for large support
- Region labels: #closed sets = #freq. sets; #closed sets << #freq. sets
- Datasets: retail, accidents, IBMdatas, chess, connect, mushroom, kosarak, pumsb*, pumsb, BMS POS, BMS web1,2
- Techniques: database reduction; remove infrequent items; sparse/dense (occ-deliv/diffsets); exact enumeration of closed item sets; generation of all/maximal item sets from closed item sets

3
Outline of Our Research
- Exact enumeration of closed item sets (no sophisticated pruning, post processing, nor memory for obtained closed item sets)
- Enumerate all/maximal frequent item sets using closed item sets
- Algorithms for updating occurrences / maximality check in dense/sparse cases, and their adaptive hybrid
- Save additional memory use (right first sweep, adjacency matrix only for large transactions)

4
Exact Enumeration of Closed Item Sets
- Introduce an acyclic parent-child relationship on freq. closed sets (it induces a tree-shaped traversal route)
- Traverse the route in depth-first manner (find a child, and go to it)
Exact enumeration (linear time in #closed sets)
- Any child is found by taking a closure (in short time)
- No need to store obtained item sets (small memory)
- Can enumerate all closed item sets (even without min. support)
[Figure: tree of closed item sets with root (= φ)]
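The closure operation that the slide relies on can be sketched as follows. This is a minimal illustration, not the authors' implementation: the database `TRANSACTIONS` and the function names are hypothetical, and returning the item set unchanged when it has no occurrence is a convention chosen only for this sketch.

```python
# Toy transaction database (hypothetical data; items are integers).
TRANSACTIONS = [
    frozenset({1, 2, 5}),
    frozenset({1, 2, 3, 5}),
    frozenset({2, 3, 4}),
    frozenset({1, 2, 5, 6}),
]


def occurrences(itemset, transactions):
    """Return the transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return [t for t in transactions if itemset <= t]


def closure(itemset, transactions):
    """Intersect all occurrences of `itemset`; the result is its closure.

    Assumption of this sketch: an item set with no occurrence is
    returned unchanged.
    """
    occ = occurrences(itemset, transactions)
    if not occ:
        return frozenset(itemset)
    return frozenset.intersection(*occ)


def is_closed(itemset, transactions):
    """An item set is closed when it equals its own closure."""
    return frozenset(itemset) == closure(itemset, transactions)


CLOSURE_OF_1 = closure({1}, TRANSACTIONS)    # items shared by all transactions with item 1
CLOSURE_OF_EMPTY = closure(set(), TRANSACTIONS)  # items shared by every transaction
```

In this toy database every transaction containing item 1 also contains items 2 and 5, so `{1}` is not closed while `{1, 2, 5}` is.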

5
X: closed item set
- parent of X = closure of X ∩ {1,…,i}, where i is the maximum s.t. X ≠ closure of X ∩ {1,…,i}
- parent of X ⊆ X (acyclic)
- X' = child of X ⇔ X' is the closure of X ∪ {i} for some i, and (cond) X' ＼ X includes no item …
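The parent definition and the depth-first search over children can be sketched as below. The text of (cond) is cut off in this transcript, so the sketch assumes the prefix-preserving form: the closure must add no item smaller than i. It also assumes that only items i larger than the item that generated X are tried. Both assumptions, the toy data, and all names are illustrative and are not taken from the slides.

```python
# Toy database (hypothetical); items are positive integers.
TRANSACTIONS = [
    frozenset({1, 2, 5}),
    frozenset({1, 2, 3, 5}),
    frozenset({2, 3, 4}),
    frozenset({1, 2, 5, 6}),
]
ITEMS = sorted(set().union(*TRANSACTIONS))


def support(itemset):
    """Number of transactions containing `itemset`."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)


def closure(itemset):
    """Intersection of all occurrences (unchanged if there is none)."""
    occ = [t for t in TRANSACTIONS if itemset <= t]
    return frozenset.intersection(*occ) if occ else frozenset(itemset)


def parent(x):
    """Closure of X ∩ {1..i} for the largest i where it differs from X.

    Returns None for the root, which has no parent.
    """
    for i in range(max(ITEMS), -1, -1):
        candidate = closure(frozenset(e for e in x if e <= i))
        if candidate != x:
            return candidate
    return None


def enumerate_closed(min_support=1):
    """Yield each frequent closed item set once, depth first.

    No set of already found item sets is kept. `min_support` must be
    at least 1 in this sketch.
    """
    def visit(x, last_item):
        yield x
        for i in ITEMS:
            if i <= last_item or i in x:
                continue  # assumption: only try items above the generating item
            y = closure(x | {i})
            if support(y) < min_support:
                continue
            # Assumed (cond): the closure adds no item smaller than i.
            if all(e >= i for e in y - x):
                yield from visit(y, i)

    root = closure(frozenset())
    if support(root) >= min_support:
        yield from visit(root, 0)


ALL_CLOSED = list(enumerate_closed())
FREQUENT_CLOSED = list(enumerate_closed(min_support=2))
```

On the toy database the search visits six closed item sets, and `parent` maps each non-root set back to the set from which the search reached it.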

6
Computation of Occurrences of X ∪ {i} for Sparse and Dense Cases
- In the sparse case, by tracing the items of each occurrence of X (occurrence deliver: maybe a known technique)
- In the dense case, use diffsets (proposed by Zaki)
Adaptive Hybrid Algorithm
- We choose the better one according to estimates of the computation time in each iteration
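The diffset idea for the dense case can be sketched as below: instead of storing the occurrences of X ∪ {i}, store the occurrences of X that are lost when i is added. The toy data and function names are hypothetical, and the hybrid's cost estimate is not modelled here.

```python
# Toy database (hypothetical): transaction id -> items.
TRANSACTIONS = {
    0: {1, 2, 5},
    1: {1, 2, 3, 5},
    2: {2, 3, 4},
    3: {1, 2, 5, 6},
}


def tidset(itemset):
    """Ids of the transactions containing every item of `itemset`."""
    return {tid for tid, t in TRANSACTIONS.items() if set(itemset) <= t}


def diffset(prefix, item):
    """Occurrences of `prefix` that are lost when `item` is added."""
    return tidset(prefix) - tidset(set(prefix) | {item})


def extend_diffset(diff_px, diff_py):
    """Diffset of P ∪ {x, y} relative to P ∪ {x}.

    It is computed from the two diffsets relative to P alone,
    without touching the database again.
    """
    return diff_py - diff_px


P = {2}
SUPPORT_P = len(tidset(P))
D5 = diffset(P, 5)                      # occurrences of P without item 5
D3 = diffset(P, 3)                      # occurrences of P without item 3
SUPPORT_P5 = SUPPORT_P - len(D5)        # support of P ∪ {5}
D53 = extend_diffset(D5, D3)            # lost when 3 is added to P ∪ {5}
SUPPORT_P53 = SUPPORT_P5 - len(D53)     # support of P ∪ {5, 3}
```

When most transactions contain most items, the diffsets stay small while the occurrence lists stay long, which is why this representation suits dense data.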

7
Maximal and All Frequent Sets
- Maximal frequent sets: generated from closed item sets
- All frequent sets (hypercube decomposition)
-- decompose classes of closed item sets into complete sublattices
-- enumerate pairs of greatest/least elements of the sublattices
-- generate the others from the pairs
[Figure labels: 000, 0, 111, 1; closed item set class; 01 lattice]
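One simple way to obtain the maximal frequent sets from the frequent closed sets is a superset filter, sketched below. It relies on the fact that every maximal frequent item set is closed. This is a naive quadratic filter with hypothetical names and data, not the maximality check used in LCM, and it does not cover the hypercube decomposition.

```python
def maximal_from_closed(closed_sets):
    """Keep the closed sets that have no proper superset in `closed_sets`."""
    closed = [frozenset(c) for c in closed_sets]
    return [c for c in closed if not any(c < other for other in closed)]


# Frequent closed sets of a hypothetical database at min. support 2.
FREQUENT_CLOSED = [{2}, {1, 2, 5}, {2, 3}]
MAXIMAL = maximal_from_closed(FREQUENT_CLOSED)
```

Here `{2}` is dropped because both `{1, 2, 5}` and `{2, 3}` contain it.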

8
Result
[Figure: the datasets retail, accidents, IBMdatas, chess, connect, mushroom, kosarak, pumsb*, pumsb, BMS POS, BMS web1,2 arranged between "small supports" and "large supports"; regions are marked "fast", "fast if support is small", "fast or usual", and "slower than others"]

9
Conclusion
Exact enumeration of closed item sets and hypercube decomposition perform well:
- for data sets s.t. #freq. closed sets << #freq. sets
- for large business datasets: BMS-web1,2, retail
- for machine learning datasets with small supports: UCI repository
For further speed up:
- These techniques are orthogonal to other techniques (database reduction, pruning infrequent items, …), so we can do better for large supports / accidents (blue area)
- The parameter of the hybrid is not tuned: not fast for kosarak, IBMdatas (now faster)
Fast without pruning, trie, or other existing methods

10
We think…
● What is the real problem (bottleneck)?
---- Mining structured item sets (closed item sets, association rules with threshold, …)
● Is it only a counting problem?
---- For all frequent item set mining, yes. The problem is how to make the occurrences of an item set from other item sets (choose best way, represent …
● Is the maximal item set useful?
---- The closed item set is useful!! It has applications in classification and association rule mining

11
Some Observations
- Really need to prune?
-- Computing occurrences for infrequent items from X: usually < 1/2
[Figure: frequency of X, X ∪ {1}, X ∪ {2}, X ∪ {3}, X ∪ {4}, X ∪ {5}]
- Almost all computation is for updating occurrences
- There is a best e to get the occurrences of X from X - e
-- Can we design an algorithm choosing e in each iteration? How do we find this e? Does this accelerate? (We can evaluate the lower bound of occurrence computation.)
- Is pruning of infrequent sets really necessary? Is there a need for accelerating occurrence computation?

12
Some Observations
- Really need to prune?
-- Computing occurrences for infrequent items from X: usually < 1/2
[Figure: frequency of X, X ∪ {10}, X ∪ {11}, X ∪ {12}, X ∪ {13}, X ∪ {14}]

13
Right First Sweep
- Generate recursive calls in decreasing order of items
- Clear memory after the recursive call
- Re-use the memory in the following recursive calls
Child iterations need no memory
[Figure: X ∪ {10}, X ∪ {11}, X ∪ {12}, X ∪ {13}, X ∪ {14} with entries labelled A to E]

14
Occurrence deliver
- Compute T(X ∪ {i}) by tracing each occurrence of X
- In sparse cases, fast
[Figure: occurrences A to E of X traced into X ∪ {10}, X ∪ {11}, X ∪ {12}, X ∪ {13}, X ∪ {14}]
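Occurrence deliver as described here can be sketched in a few lines: scan each occurrence of X once and drop its id into the bucket of every item it contains, so that each bucket ends up holding T(X ∪ {i}). The toy data and names below are hypothetical.

```python
from collections import defaultdict

# Toy database (hypothetical): transaction id -> items.
TRANSACTIONS = {
    0: {1, 2, 5},
    1: {1, 2, 3, 5},
    2: {2, 3, 4},
    3: {1, 2, 5, 6},
}


def occurrence_deliver(x, occurrence_ids):
    """Return {i: T(X ∪ {i})} for every item i outside X.

    Each occurrence of X is traced once, so the cost is proportional
    to the total size of the occurrences, which is small on sparse data.
    """
    buckets = defaultdict(list)
    for tid in occurrence_ids:
        for item in TRANSACTIONS[tid]:
            if item not in x:
                buckets[item].append(tid)
    return dict(buckets)


X = {2}
OCC_X = [tid for tid, t in sorted(TRANSACTIONS.items()) if X <= t]
BUCKETS = occurrence_deliver(X, OCC_X)
```

Items that never co-occur with X get no bucket at all, so nothing is spent on them.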

15
- Check (cond): closure of X ∪ {i} ＼ X includes no item …

16
Results
[Figure panels: all, closed, maximal]
