1 Efficient Algorithms for Mining Share-Frequent Itemsets
Authors: Y. C. Li, J. S. Yeh and C. C. Chang
Speaker: Yu-Chiang Li
Date: July 28, 2005

2 Outline
- Introduction
- Related Work
- Enhanced Fast Share Measure (EFSM) Algorithm
- Support-Counted Fast Share Measure (SuFSM) Algorithm
- Share-Counted Fast Share Measure (ShFSM) Algorithm
- Experimental Results
- Conclusions

3 Introduction (1/2)
- Goal: discover the buying patterns of customers
- Itemset: a group of items (products) bought together in a transaction
- Support: the ratio of the number of transactions containing the itemset to the total number of transactions (it conveys limited information, since it ignores purchased quantities)
- Share: the ratio of the total count of the itemset's items, over the transactions containing the itemset, to the total count of all items in the database
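The contrast between support and share can be illustrated with a minimal sketch. The database below is hypothetical (it does not come from the slides), and the function names `support` and `share` are illustrative only.

```python
# Hypothetical toy database: each transaction maps item -> purchased quantity.
DB = [
    {"A": 1},
    {"A": 1},
    {"A": 1},
    {"B": 10},
]

def support(itemset, db):
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in db if all(i in t for i in itemset))
    return hits / len(db)

def share(itemset, db):
    """Quantity of the itemset's items (in transactions containing it)
    divided by the total quantity of all items in the database."""
    total = sum(sum(t.values()) for t in db)
    local = sum(sum(t[i] for i in itemset)
                for t in db if all(i in t for i in itemset))
    return local / total

# A is bought often but in small quantities; B is bought once in bulk.
print(support({"A"}, DB), share({"A"}, DB))
print(support({"B"}, DB), share({"B"}, DB))
```

In this toy data, item A has the higher support while item B has the higher share, which is the kind of information the support measure alone does not expose.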

4 Introduction (2/2)
- Share-confidence framework: provides useful information about the numerical values associated with transaction items (Carter et al., 1997)
- Share-frequent (SH-frequent) itemset: usually includes some infrequent subsets
- The Fast Share Measure (FSM) algorithm discovers share-frequent itemsets efficiently on small datasets
- This study proposes Enhanced FSM (EFSM), SuFSM and ShFSM, which discover share-frequent itemsets more efficiently than FSM

5 Related Work
- Support-confidence framework (Agrawal et al., 1993)
  - Each item is a binary variable denoting whether the item was purchased
  - Apriori (Agrawal & Srikant, 1994) and Apriori-like algorithms
  - Pattern-growth algorithms (Han et al., 2000; Han et al., 2004)
- Share-confidence framework (Carter et al., 1997)
  - The support-confidence framework does not analyze the exact number of products purchased
  - The support count does not measure the profit or cost of an itemset
  - Exhaustive search algorithm (Carter et al., 2000)
  - FSM algorithm (Li et al., 2005)

6 Related Work Apriori algorithm (Agrawal and Srikant, 1994): minSup = 40%

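7 Share-Confidence Framework
- Measure value of item $i_p$ in transaction $T_q$: $mv(i_p, T_q)$; e.g. $mv(\{D\}, T01) = 1$, $mv(\{C\}, T03) = 3$
- Transaction measure value: $tmv(T_q) = \sum_{i_p \in T_q} mv(i_p, T_q)$; e.g. $tmv(T02) = 9$
- Total measure value: $Tmv(DB) = \sum_{T_q \in DB} tmv(T_q)$; here $Tmv(DB) = 44$
- Itemset measure value: $imv(X, T_q) = \sum_{i_p \in X} mv(i_p, T_q)$ for $X \subseteq T_q$; e.g. $imv(\{A, E\}, T02) = 4$
- Local measure value: $lmv(X) = \sum_{T_q \in db_X} imv(X, T_q)$, where $db_X$ is the set of transactions containing X; e.g. $lmv(\{BC\}) = 2 + 4 + 5 = 11$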

8 minShare = 30%
- Itemset share: $SH(X) = lmv(X) / Tmv(DB)$; e.g. $SH(\{BC\}) = 11/44 = 25\%$
- SH-frequent: if $SH(X) \geq minShare$, X is a share-frequent (SH-frequent) itemset
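The measure-value definitions and the share test can be sketched in a few lines. The sample database of the slides is not reproduced in this transcript, so the database below is hypothetical, and the function names (`tmv`, `total_mv`, `imv`, `lmv`, `sh`, `is_sh_frequent`) are illustrative choices.

```python
# Hypothetical toy database (NOT the one used on the slides):
# transaction id -> {item: measure value, i.e. purchased quantity}.
DB = {
    "T1": {"A": 1, "B": 2, "C": 1},
    "T2": {"B": 1, "C": 3, "D": 2},
    "T3": {"A": 2, "D": 1},
}

def tmv(t):
    """Transaction measure value: sum of the measure values in one transaction."""
    return sum(t.values())

def total_mv(db):
    """Total measure value Tmv(DB): sum of tmv over all transactions."""
    return sum(tmv(t) for t in db.values())

def imv(itemset, t):
    """Itemset measure value of X in T; 0 if T does not contain X."""
    if not all(i in t for i in itemset):
        return 0
    return sum(t[i] for i in itemset)

def lmv(itemset, db):
    """Local measure value: sum of imv over the transactions containing X."""
    return sum(imv(itemset, t) for t in db.values())

def sh(itemset, db):
    """Itemset share SH(X) = lmv(X) / Tmv(DB)."""
    return lmv(itemset, db) / total_mv(db)

def is_sh_frequent(itemset, db, min_share):
    return sh(itemset, db) >= min_share

print(lmv({"B", "C"}, DB), total_mv(DB), is_sh_frequent({"B", "C"}, DB, 0.30))
```

On this toy data, {B, C} collects 3 units in T1 and 4 units in T2, so its local measure value is 7 out of a total of 13.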

9 Existing algorithms
- ZP (Zero Pruning) and ZSP (Zero Subset Pruning)
  - Variants of exhaustive search
  - Prune the candidate itemsets whose local measure values are exactly zero
- FSM (Fast Share Measure) (Li et al., 2005)
  - Fast on small datasets
  - Generates too many candidates
- Existing algorithms are inefficient on large datasets

10 ZP Algorithm

11 ZSP Algorithm

12 FSM: Fast Share Measure Algorithm
- ML: maximum transaction length in DB
- MV: maximum measure value in DB
- Let $min\_lmv = minShare \times Tmv$
- Let $CF_{FSM}(X) = lmv(X) + (lmv(X)/k) \times MV \times (ML - k)$, where k is the length of X
- If $CF_{FSM}(X) < min\_lmv$, all supersets of X are infrequent

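13 FSM: Fast Share Measure Algorithm
- minShare = 30%, ML = 6, MV = 3, Tmv = 44, so $min\_lmv = 14$ ($0.3 \times 44 = 13.2$, rounded up because measure values are integers)
- Prune X if $CF_{FSM}(X) < min\_lmv$
- Let X = {ABC}: $CF_{FSM}(X) = 3 + (3/3) \times 3 \times (6 - 3) = 12 < 14 = min\_lmv$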
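The worked example above can be checked with a short sketch. The function name `cf_fsm` and its parameter names are illustrative, and rounding min_lmv up to the next integer is an assumption made here to match the value 14 given on the slide.

```python
import math

def cf_fsm(lmv_x, k, mv_max, ml):
    """FSM critical function: lmv(X) + (lmv(X)/k) * MV * (ML - k)."""
    return lmv_x + (lmv_x / k) * mv_max * (ml - k)

# Values taken from the slide: minShare = 30%, Tmv = 44, ML = 6, MV = 3.
# Assumption: min_lmv is rounded up, since measure values are integers.
min_lmv = math.ceil(0.30 * 44)

# X = {A, B, C} with lmv(X) = 3 and k = 3, as on the slide.
cf = cf_fsm(lmv_x=3, k=3, mv_max=3, ml=6)
prune = cf < min_lmv  # True means no superset of X can be SH-frequent
print(cf, min_lmv, prune)
```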

14 Enhanced FSM (EFSM) Algorithm
- Instead of joining two arbitrary itemsets in $RC_{k-1}$, EFSM joins each itemset of $RC_{k-1}$ with a single item in $RC_1$ to generate $C_k$ efficiently
- Reduces the time complexity of the join step from $O(n^{2k-2})$ to $O(n^k)$
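A minimal sketch of the join step described above, assuming itemsets are represented as frozensets and duplicates are removed with a set. The slides do not specify these details, and the name `efsm_join` is hypothetical.

```python
def efsm_join(rc_prev, rc_1):
    """Build C_k by extending each itemset of RC_{k-1} with one item of RC_1.

    rc_prev: set of frozensets, each of length k-1 (the retained candidates)
    rc_1:    set of single items retained after the first pass
    """
    candidates = set()
    for itemset in rc_prev:
        for item in rc_1:
            if item not in itemset:
                # frozenset | set yields a frozenset, so it can be stored in a set
                candidates.add(itemset | {item})
    return candidates

# Hypothetical retained candidates for illustration.
rc_2 = {frozenset("AB"), frozenset("AC")}
rc_1 = {"A", "B", "C", "D"}
c_3 = efsm_join(rc_2, rc_1)
print(sorted("".join(sorted(c)) for c in c_3))
```

With n distinct items, $RC_{k-1}$ holds at most $O(n^{k-1})$ itemsets, so pairing each of them with at most n single items costs $O(n^k)$, whereas pairing the itemsets with each other costs $O(n^{2k-2})$.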

15 SuFSM (Support-counted FSM)
- SuFSM and ShFSM are derived from EFSM and prune candidates more efficiently than FSM
- $X^{k+1}$: an arbitrary superset of X with length k+1 in DB
- $S(X^{k+1})$: the set containing all $X^{k+1}$ in DB
- $db_{S(X^{k+1})}$: the set of transactions that each contain at least one $X^{k+1}$
- Theorem 1. If $lmv(X) + Sup(S(X^{k+1})) \times MV \times (ML - k) < min\_lmv$, all supersets of X are infrequent

16 SuFSM (Support-counted FSM)
- $lmv(X)/k \geq Sup(X) \geq Sup(S(X^{k+1}))$
- Ex. $lmv(\{BCD\})/k = 15/3 = 5$, $Sup(\{BCD\}) = 3$, $Sup(S(\{BCD\}^{k+1})) = 2$
- Each of the following three inequalities is sufficient to conclude that no superset of X is an SH-frequent itemset; each uses a tighter bound than the one before:
  - $lmv(X) + (lmv(X)/k) \times MV \times (ML - k) < min\_lmv$
  - $lmv(X) + Sup(X) \times MV \times (ML - k) < min\_lmv$
  - $lmv(X) + Sup(S(X^{k+1})) \times MV \times (ML - k) < min\_lmv$
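The SuFSM bound can be sketched as follows, treating Sup as a transaction count (as in the example above) and counting a transaction toward $Sup(S(X^{k+1}))$ when it contains X plus at least one more item. The database is hypothetical and all function names are illustrative.

```python
# Hypothetical toy database: transaction id -> {item: measure value}.
DB = {
    "T1": {"A": 1, "B": 2, "C": 1},
    "T2": {"B": 1, "C": 3, "D": 2},
    "T3": {"A": 2, "D": 1},
}

def contains(t, itemset):
    return all(i in t for i in itemset)

def lmv(itemset, db):
    """Local measure value of X over the transactions containing X."""
    return sum(sum(t[i] for i in itemset)
               for t in db.values() if contains(t, itemset))

def sup_supersets(itemset, db):
    """Count of transactions that hold X and at least one additional item,
    i.e. transactions containing some superset of X of length k+1."""
    return sum(1 for t in db.values()
               if contains(t, itemset) and len(t) > len(itemset))

def cf_fsm(itemset, db, mv_max, ml):
    k = len(itemset)
    return lmv(itemset, db) + (lmv(itemset, db) / k) * mv_max * (ml - k)

def cf_sufsm(itemset, db, mv_max, ml):
    k = len(itemset)
    return lmv(itemset, db) + sup_supersets(itemset, db) * mv_max * (ml - k)

MV = max(q for t in DB.values() for q in t.values())  # maximum measure value
ML = max(len(t) for t in DB.values())                 # maximum transaction length
X = {"B", "C"}
print(cf_fsm(X, DB, MV, ML), cf_sufsm(X, DB, MV, ML))
```

On this toy data the SuFSM value is smaller than the FSM value, so SuFSM can prune X at thresholds where FSM would still keep it.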

17 ShFSM (Share-counted FSM)
- $db_{S(X^{k+1})}$: the set of transactions that each contain at least one $X^{k+1}$
- Theorem 2. If $Tmv(db_{S(X^{k+1})}) < min\_lmv$, all supersets of X are infrequent
- Pruning conditions:
  - FSM: $lmv(X) + (lmv(X)/k) \times MV \times (ML - k) < min\_lmv$
  - SuFSM: $lmv(X) + Sup(S(X^{k+1})) \times MV \times (ML - k) < min\_lmv$
  - ShFSM: $Tmv(db_{S(X^{k+1})}) < min\_lmv$
- $CF_{FSM}(X) \geq CF_{SuFSM}(X) \geq CF_{ShFSM}(X)$

18
- FSM: $lmv(X) + (lmv(X)/k) \times MV \times (ML - k) < min\_lmv$
- SuFSM: $lmv(X) + Sup(S(X^{k+1})) \times MV \times (ML - k) < min\_lmv$
- ShFSM: $Tmv(db_{S(X^{k+1})}) < min\_lmv$
- Ex. X = {BCD}
  - $CF_{FSM}(X) = 9 + (9/3) \times 3 \times (6 - 3) = 36$
  - $CF_{SuFSM}(X) = 9 + 2 \times 3 \times (6 - 3) = 27$
  - $CF_{ShFSM}(X) = 6 + 8 = 14$

19 ShFSM (Share-counted FSM)
- Ex. X = {AB}: $Tmv(db_{S(X^{k+1})}) = tmv(T01) + tmv(T05) = 6 + 6 = 12 < 14 = min\_lmv$
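The ShFSM test can be sketched in the same style. The database below is hypothetical: it was constructed so that the two transactions extending {A, B} each have a transaction measure value of 6 and the total is 44, mirroring the numbers of the example, but it is not the database from the slides. The names `cf_shfsm` and `min_lmv` follow the notation above, and rounding min_lmv up is an assumption.

```python
import math

# Hypothetical database: transaction id -> {item: measure value}.
DB = {
    "t1": {"A": 1, "B": 1, "C": 4},   # contains X plus C, tmv = 6
    "t2": {"A": 1, "B": 1},           # contains X but no extra item
    "t3": {"C": 5, "D": 5, "E": 5},
    "t4": {"C": 5, "E": 5, "F": 5},
    "t5": {"A": 2, "B": 1, "D": 3},   # contains X plus D, tmv = 6
}

def cf_shfsm(itemset, db):
    """Tmv of the transactions that contain X and at least one more item."""
    return sum(sum(t.values()) for t in db.values()
               if all(i in t for i in itemset) and len(t) > len(itemset))

total = sum(sum(t.values()) for t in DB.values())  # Tmv(DB)
min_lmv = math.ceil(0.30 * total)                  # minShare = 30%, rounded up

X = {"A", "B"}
cf = cf_shfsm(X, DB)
print(cf, min_lmv, cf < min_lmv)  # a True result means every superset of X is pruned
```

Transaction t2 is excluded because it contains X but no further item, so it cannot contribute to any superset of X.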

20 Experimental Results (1/3)
- PC: Pentium IV 1.5 GHz, 1.5 GB SDRAM, running Windows XP Professional
- All algorithms were coded in VC++ 6.0
[Figure 1] [Figure 2]

21 Experimental Results (2/3)
- minShare = 0.1%
[Figure 3] [Figure 4]

22 Experimental Results (3/3)
Dataset: T6.I4.D100k.N200.S10; minShare = 0.1%; ML = 20, MV = 10; Tmv = 2,302,443
Method columns: FSM, EFSM, SuFSM, ShFSM; last column: F_k

Pass (k) | Set  | Counts (in source order)  | F_k
k=1      | C_k  | 200                       | 159
         | RC_k | 200, 199, 197             |
k=2      | C_k  | 19900, 19701, 19306       | 1844
         | RC_k | 16214, 13312, 7199        |
k=3      | C_k  | 829547, 564324, 190607    | 101
         | RC_k | 251877, 99765, 9792       |
k=4      | C_k  | 3290296, 793042, 20913    | 0
         | RC_k | 332877, 41057, 1420       |
k=5      | C_k  | 393833, 25003, 1050       | 5
         | RC_k | 71420, 19720, 959         |
k=6      | C_k  | 26137, 11582, 518         | 8
         | RC_k | 25562, 11045, 506         |
k=7      | C_k  | 11141, 5940, 204          | 7
         | RC_k | 11099, 5827, 196          |
k=8      | C_k  | 4426, 2797, 58            | 1
         | RC_k | 4423, 2750, 54            |
k>=9     | C_k  | 2036, 15671212            | 0
         | RC_k | 2030, 1513, 10            |
Time (sec): 13610.471.5529.6710.95

23 Conclusions
- This study proposes the Enhanced FSM (EFSM) algorithm, which reduces the time complexity of the join step
- SuFSM and ShFSM were developed from EFSM
- SuFSM and ShFSM prune candidates efficiently and significantly improve performance
- The experimental results indicate that ShFSM has the best performance
- Future work: develop more advanced algorithms to further accelerate the identification of all share-frequent itemsets

24 Thank You
