
Vincent S. Tseng, Cheng-Wei Wu, Bai-En Shie, and Philip S. Yu SIG KDD 2010 UP-Growth: An Efficient Algorithm for High Utility Itemset Mining 2010/8/25.




1 UP-Growth: An Efficient Algorithm for High Utility Itemset Mining. Vincent S. Tseng, Cheng-Wei Wu, Bai-En Shie, and Philip S. Yu. SIGKDD 2010, 2010/8/25

2 Outline: Motivation; Problem Definition; Method; UP-Tree Structure; UP-Growth Method; Experimental Results; Conclusions

3 Motivation: Frequent itemset mining does not take the unit profits and purchased quantities of items into consideration. Utility basically means the interestingness, importance, or profitability of items to users.

4 (Cont.) The utility of items in a transaction database consists of two aspects. External utility: the importance of distinct items. Internal utility: the importance of the items in the transaction. The utility of an itemset is defined as the external utility multiplied by the internal utility. A high utility itemset is an itemset whose utility is no less than a user-specified threshold.

5 (Cont.) Mining high utility itemsets from databases is not easy, because the downward closure property used in frequent itemset mining cannot be applied to utility. The challenge is to prune the search space effectively while still capturing every high utility itemset.

6 Problem Definition
Utility of an item $i_p$ in a transaction $T_d$: $u(i_p, T_d) = p(i_p) \times q(i_p, T_d)$, e.g. $u(\{A\}, T_1) = 5 \times 1 = 5$.
Utility of an itemset in a transaction: $u(\{AC\}, T_1) = u(\{A\}, T_1) + u(\{C\}, T_1) = 5 + 1 = 6$.
Utility of an itemset in the database: $u(\{AD\}) = u(\{AD\}, T_1) + u(\{AD\}, T_3) = 7 + 17 = 24$.
An itemset is called a high utility itemset if its utility is no less than min_util.
Transaction utility: $TU(T_1) = u(\{ACD\}, T_1) = 8$.
Transaction-weighted utilization: $TWU(\{AD\}) = TU(T_1) + TU(T_3) = 8 + 30 = 38$.
If TWU(X) is no less than the minimum utility threshold, X is called a high transaction-weighted utilization itemset (abbreviated as HTWUI).
Transaction-weighted downward closure (TWDC): for any itemset X, if X is not an HTWUI, then every superset of X is a low utility itemset.
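The definitions on this slide can be checked with a few lines of Python. The transaction and profit tables from the slide were not preserved in this transcript, so the toy database below is an assumption, chosen only because it reproduces every number quoted above; the function names are illustrative.

```python
# Assumed toy data (not in the transcript): unit profit p(i) per item and
# purchase quantity q(i, T) per transaction.
PROFIT = {"A": 5, "B": 2, "C": 1, "D": 2, "E": 3, "F": 1, "G": 1}
DB = {
    "T1": {"A": 1, "C": 1, "D": 1},
    "T2": {"A": 2, "C": 6, "E": 2, "G": 5},
    "T3": {"A": 1, "B": 2, "C": 1, "D": 6, "E": 1, "F": 5},
    "T4": {"B": 4, "C": 3, "D": 3, "E": 1},
    "T5": {"B": 2, "C": 2, "E": 1, "G": 2},
}

def u_in(itemset, tid):
    """Utility of an itemset in one transaction (0 if not fully contained)."""
    t = DB[tid]
    if not set(itemset) <= set(t):
        return 0
    return sum(PROFIT[i] * t[i] for i in itemset)

def u(itemset):
    """Utility of an itemset in the whole database."""
    return sum(u_in(itemset, tid) for tid in DB)

def tu(tid):
    """Transaction utility: utility of all items in the transaction."""
    return sum(PROFIT[i] * q for i, q in DB[tid].items())

def twu(itemset):
    """Transaction-weighted utilization: sum of TU over containing transactions."""
    return sum(tu(tid) for tid, t in DB.items() if set(itemset) <= set(t))

print(u_in({"A"}, "T1"), u_in({"A", "C"}, "T1"))  # item and itemset utility in T1
print(u({"A", "D"}), tu("T1"), twu({"A", "D"}))   # database utility, TU, TWU
```

On this data u({A}) is smaller than u({AD}), which shows why utility itself is not downward closed, whereas TWU({AD}) can never exceed TWU({A}).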

7 Proposed Method: Construction of the UP-Tree; generation of potential high utility itemsets (PHUIs) from the UP-Tree by UP-Growth.

8 Construction of UP-Tree: The UP-Tree can be constructed with two scans of the original database. First scan: the TU of each transaction is computed, and the TWU of each single item is accumulated. Discarding global unpromising items: unpromising items are removed from each transaction and their utilities are subtracted from the TU of the transaction. The remaining promising items in each transaction are sorted in descending order of TWU. Second scan: the transactions are inserted into the UP-Tree.

9 (Cont.) Example with min_util = 40: the first scan identifies the unpromising items, and the remaining items are sorted in descending order of TWU. [Tables not preserved.]
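The first scan and the removal of unpromising items might look like the sketch below. The slide's tables are missing from this transcript, so the database is an assumed toy example (consistent with the numbers quoted on the other slides), and `first_scan` is a hypothetical helper name, not the authors' code.

```python
# Assumed toy data (not in the transcript).
PROFIT = {"A": 5, "B": 2, "C": 1, "D": 2, "E": 3, "F": 1, "G": 1}
DB = {
    "T1": {"A": 1, "C": 1, "D": 1},
    "T2": {"A": 2, "C": 6, "E": 2, "G": 5},
    "T3": {"A": 1, "B": 2, "C": 1, "D": 6, "E": 1, "F": 5},
    "T4": {"B": 4, "C": 3, "D": 3, "E": 1},
    "T5": {"B": 2, "C": 2, "E": 1, "G": 2},
}

def first_scan(db, profit, min_util):
    """Compute TU and TWU, drop unpromising items, sort the rest by TWU."""
    tu = {tid: sum(profit[i] * q for i, q in t.items()) for tid, t in db.items()}
    twu = {}
    for tid, t in db.items():
        for item in t:
            twu[item] = twu.get(item, 0) + tu[tid]
    promising = {i for i, v in twu.items() if v >= min_util}
    reorganized = {}
    for tid, t in db.items():
        # Descending TWU; ties broken alphabetically (an arbitrary choice).
        kept = sorted((i for i in t if i in promising), key=lambda i: (-twu[i], i))
        # The utilities of removed items no longer count toward the TU.
        reduced_tu = sum(profit[i] * t[i] for i in kept)
        reorganized[tid] = (kept, reduced_tu)
    return twu, reorganized

twu, reorganized = first_scan(DB, PROFIT, min_util=40)
for tid, (items, reduced_tu) in reorganized.items():
    print(tid, items, reduced_tu)
```

With min_util = 40, items F and G fall below the threshold on this data and are dropped before any transaction would be inserted into the tree.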

10-13 (Cont.) Second scan. [Figures of the UP-Tree during the second scan not preserved.]

14 (Cont.) Strategy 1: Discarding global unpromising items (DGU).

15 Generating PHUIs from the global UP-Tree: Consider {D}'s conditional pattern base ({D}-CPB). An item $i_p$ is called a local promising item in $\{a_i\}$-CPB if $pu(i_p, \{a_i\}\text{-CPB})$ is no smaller than min_util; otherwise it is a local unpromising item. {A} is a local unpromising item in {D}-CPB, so no superset of {A} generated from {D}-CPB is a high utility itemset.
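A small sketch of the local promising test follows. The {D}-CPB figure is not preserved in this transcript, so the three paths and their path utilities below are a reconstruction (an assumption) that agrees with the values 8, 25, and 20 and with the PHUI values quoted on later slides; `path_utilities` is an illustrative name.

```python
MIN_UTIL = 40

# Assumed reconstruction of {D}-CPB: (items on the path, path utility).
D_CPB = [
    (["C", "A"], 8),
    (["C", "E", "A", "B"], 25),
    (["C", "E", "B"], 20),
]

def path_utilities(cpb):
    """pu(i, CPB): sum of the path utilities of all paths that contain item i."""
    pu = {}
    for items, util in cpb:
        for item in items:
            pu[item] = pu.get(item, 0) + util
    return pu

pu = path_utilities(D_CPB)
promising = {i for i, v in pu.items() if v >= MIN_UTIL}
unpromising = set(pu) - promising
print(pu)
print("local promising:", sorted(promising), "local unpromising:", sorted(unpromising))
```

Under these assumed paths, only {A} falls below min_util in {D}-CPB, which matches the statement on the slide.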

16 (Cont.) Generating PHUIs from {D}-Tree: {{D}:58, {DE}:45, {DEB}:45, {DEC}:45, {DEBC}:45, {DB}:45, {DBC}:45, {DC}:53}. The resulting set of PHUIs is {{D}:58, {DE}:45, {DEB}:45, {DEC}:45, {DEBC}:45, {DB}:45, {DBC}:45, {DC}:53, {B}:61, {BE}:54, {BEC}:54, {BC}:54, {A}:65, {AC}:55, {ACE}:47, {AE}:47, {E}:88, {EC}:76, {C}:96}.

17 Decreasing global node utilities (DGN) during construction of a global UP-Tree: Strategy 2: Decreasing global node utilities (DGN). During the construction of a global UP-Tree, the utilities of a node's descendants are discarded from the utility of that node. [Figure of {B}-CPB not preserved.]

18-20 (Cont.) [Figures not preserved.]

21 (Cont.) $\{C\}.nu = 1 + p(\{C\}) \times q(\{C\}, T_2') = 1 + 1 \times 6 = 7$

22 (Cont.) $\{E\}.nu = p(\{C\}) \times q(\{C\}, T_2') + p(\{E\}) \times q(\{E\}, T_2') = 1 \times 6 + 3 \times 2 = 12$

23 (Cont.) $\{A\}.nu = p(\{C\}) \times q(\{C\}, T_2') + p(\{E\}) \times q(\{E\}, T_2') + p(\{A\}) \times q(\{A\}, T_2') = 1 \times 6 + 3 \times 2 + 5 \times 2 = 22$
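The node-utility updates on the last three slides can be reproduced with a minimal tree insert. This is a sketch under assumptions: the `Node` class and `insert_dgn` are hypothetical names, header tables and node links are omitted, and the item utilities of T1' are inferred from the Problem Definition slide rather than read from a table.

```python
class Node:
    """A bare UP-Tree node: item name, support count, node utility, children."""
    def __init__(self, item):
        self.item = item
        self.count = 0
        self.nu = 0
        self.children = {}

def insert_dgn(root, path):
    """Insert one reorganized transaction, given as [(item, utility), ...].

    DGN: a node receives only the utilities of the items from the start of
    the path down to itself; utilities of its descendants are left out.
    """
    node = root
    running = 0
    for item, util in path:
        running += util
        node = node.children.setdefault(item, Node(item))
        node.count += 1
        node.nu += running

root = Node(None)
insert_dgn(root, [("C", 1), ("A", 5), ("D", 2)])   # T1' (utilities inferred)
insert_dgn(root, [("C", 6), ("E", 6), ("A", 10)])  # T2' (from the slides)

c = root.children["C"]
e = c.children["E"]
a = e.children["A"]
print(c.nu, e.nu, a.nu)  # node utilities along the path C, E, A
```

Without DGN each node on the T2' path would have received the whole reduced transaction utility of 22, so the strategy tightens the overestimate for every node except the last one on the path.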

24 (Cont.) With DGN, the set of PHUIs is {{D}:58, {DE}:45, {DEB}:45, {DEBC}:45, {DEC}:45, {DB}:45, {DBC}:45, {DC}:53, {B}:61, {A}:65, {E}:88, {C}:96}.

25 UP-Growth: PHUIs are generated efficiently from the global UP-Tree with two strategies: DLU (Discarding local unpromising items) and DLN (Decreasing local node utilities).

26 DLU: Because memory is limited, the exact utility values of the items in the conditional pattern base are not maintained; instead, a minimum item utility table (MIUT) is maintained. Strategy 3: Discarding local unpromising items (DLU). During the construction of a local UP-Tree, the minimum item utilities of unpromising items are discarded from the path utilities of the paths.

27 (Cont.) $8 - miu(\{A\}) \times \{AC\}.count = 8 - 5 \times 1 = 3$; $25 - miu(\{A\}) \times \{BAEC\}.count = 25 - 5 \times 1 = 20$
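The DLU adjustment can be sketched as follows. The conditional pattern base and the MIUT entries are assumptions reconstructed from the arithmetic on the slides (miu({A}) = 5, miu({B}) = 4, miu({E}) = 3), and `apply_dlu` is a hypothetical helper name.

```python
# Assumed minimum item utilities, inferred from the slide arithmetic.
MIUT = {"A": 5, "B": 4, "E": 3}

# Assumed {D}-CPB: (items on the path, path utility, path count).
D_CPB = [
    (["C", "A"], 8, 1),
    (["C", "E", "A", "B"], 25, 1),
    (["C", "E", "B"], 20, 1),
]

def apply_dlu(cpb, miut, unpromising):
    """Drop local unpromising items and subtract miu(item) * count for each."""
    reduced = []
    for items, util, count in cpb:
        removed = [i for i in items if i in unpromising]
        kept = [i for i in items if i not in unpromising]
        new_util = util - sum(miut[i] * count for i in removed)
        reduced.append((kept, new_util, count))
    return reduced

reduced = apply_dlu(D_CPB, MIUT, unpromising={"A"})
for items, util, count in reduced:
    print(items, util, count)
```

Because only the minimum utility of {A} is subtracted, the reduced path utility remains an overestimate of the true remaining utility, so no high utility itemset is lost.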

28 DLN: Strategy 4: Decreasing local node utilities (DLN). During the construction of a local UP-Tree, the minimum item utilities of a node's descendants are subtracted from the utility of that node.

29 DLN (Cont.) $3 + (20 - miu(\{B\}) \times 1 - miu(\{E\}) \times 1) = 3 + 13 = 16$; $20 - miu(\{E\}) \times 1 = 20 - 3 = 17$. Node labels in the figure (count, utility): (2, 16), (1, 17), (1, 20).

30 DLN (Cont.) $16 + (20 - miu(\{B\}) \times 1 - miu(\{E\}) \times 1) = 16 + 13 = 29$; $17 + (20 - miu(\{E\}) \times 1) = 17 + 17 = 34$. Node labels in the figure (count, utility): (3, 29), (2, 34), (2, 40).
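The DLN arithmetic above can be replayed with a small local-tree insert. As before this is only a sketch: the reduced paths, the item order C, B, E, and the minimum item utilities are inferred from the slide arithmetic, and `Node` and `insert_dln` are hypothetical names.

```python
class Node:
    """A bare local UP-Tree node: item name, count, node utility, children."""
    def __init__(self, item):
        self.item = item
        self.count = 0
        self.nu = 0
        self.children = {}

def insert_dln(root, items, path_util, count, miut):
    """Insert one reduced path into the local tree.

    DLN: each node gets the path utility minus miu(d) * count for every
    descendant d that follows it on the path.
    """
    node = root
    for k, item in enumerate(items):
        descendants = items[k + 1:]
        node = node.children.setdefault(item, Node(item))
        node.count += count
        node.nu += path_util - sum(miut[d] * count for d in descendants)

MIUT = {"B": 4, "E": 3}  # assumed, inferred from the slides
# Assumed reduced paths of {D}-CPB after DLU: (items, path utility, count).
paths = [(["C"], 3, 1), (["C", "B", "E"], 20, 1), (["C", "B", "E"], 20, 1)]

root = Node(None)
for items, util, count in paths:
    insert_dln(root, items, util, count, MIUT)

c = root.children["C"]
b = c.children["B"]
e = b.children["E"]
print((c.count, c.nu), (b.count, b.nu), (e.count, e.nu))
```

The final (count, utility) pairs agree with the node labels on the last DLN slide.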

31 Experimental Results [Figures not preserved.]

32 Scalability [Figures not preserved.]

33 Conclusions: This paper proposed UP-Growth, an efficient algorithm for mining high utility itemsets. A UP-Tree structure is proposed for maintaining the information of high utility itemsets. With the four strategies, the mining performance is enhanced significantly, since both the search space and the number of candidates are effectively reduced.

