UP-Growth: An Efficient Algorithm for High Utility Itemset Mining
Vincent S. Tseng, Cheng-Wei Wu, Bai-En Shie, and Philip S. Yu
SIGKDD 2010

Outline
- Introduction
- Background
  - Problem Definition
  - Related Work
- Proposed Method
  - UP-Tree Structure
  - UP-Growth Method
- Experimental Results
- Conclusions and Discussion

Introduction
In the real world, each item in a supermarket has a different profit, and a single transaction may contain several units of the same item. Traditional frequent pattern mining has two limitations:
- It treats all items as if they had the same price.
- Within a transaction each item is binary (0/1), i.e. either present or absent.

(Cont.)
The utility of items in a transaction database consists of two aspects:
- External utility: the importance of distinct items.
- Internal utility: the importance of the items in the transaction.
The utility of an item in a transaction is its external utility multiplied by its internal utility; the utility of an itemset is the sum of these products over its items.
High utility itemset: an itemset whose utility is no less than a user-specified threshold.

(Cont.)
Mining high utility itemsets from databases is not an easy task, because utility is not downward closed: a superset of a low utility itemset may be a high utility itemset. Existing studies therefore prune with overestimated utilities to improve mining performance.

Background
This section first defines the problem of utility mining and then describes previous work on utility mining.

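Problem Definition
- Utility of an item $i_p$ in a transaction $T_d$: $u(i_p, T_d) = p(i_p) \times q(i_p, T_d)$, where $p$ is the external utility and $q$ the internal utility. Example: $u(\{A\}, T_1) = 5 \times 1 = 5$.
- Utility of an itemset in a transaction is the sum of its item utilities: $u(\{AC\}, T_1) = u(\{A\}, T_1) + u(\{C\}, T_1) = 5 + 1 = 6$.
- Utility of an itemset in the database is the sum over the transactions that contain it: $u(\{AD\}) = u(\{AD\}, T_1) + u(\{AD\}, T_3) = 7 + 17 = 24$.
- An itemset is called a high utility itemset if its utility is no less than min_util.
- Transaction utility: $TU(T_1) = u(\{ACD\}, T_1) = 8$.
- Transaction-weighted utilization: $TWU(\{AD\}) = TU(T_1) + TU(T_3) = 8 + 30 = 38$.
- If $TWU(X)$ is no less than the minimum utility threshold, $X$ is called a high transaction-weighted utilization itemset (abbreviated as HTWUI).
- Transaction-weighted downward closure (TWDC): for any itemset $X$, if $X$ is not an HTWUI, any superset of $X$ is a low utility itemset.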
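The definitions above can be checked with a few lines of Python. This is only a sketch: the profit table and the five transactions are assumptions, since the slide's data table did not survive in the transcript; they were chosen so that the figures quoted on the slide (5, 6, 24, 8, 38) come out. The function names are illustrative.

```python
# Assumed toy data (not taken from the transcript): unit profits and
# per-transaction quantities chosen to reproduce the slide's numbers.
PROFIT = {"A": 5, "B": 2, "C": 1, "D": 2, "E": 3, "F": 1, "G": 1}
DB = {
    "T1": {"A": 1, "C": 1, "D": 1},
    "T2": {"A": 2, "C": 6, "E": 2, "G": 5},
    "T3": {"A": 1, "B": 2, "C": 1, "D": 6, "E": 1, "F": 5},
    "T4": {"B": 4, "C": 3, "D": 3, "E": 1},
    "T5": {"B": 2, "C": 2, "E": 1, "G": 2},
}


def item_utility(item, tid):
    """u(i_p, T_d) = p(i_p) * q(i_p, T_d)."""
    return PROFIT[item] * DB[tid][item]


def itemset_utility_in(itemset, tid):
    """u(X, T_d): sum of the item utilities of X inside one transaction."""
    return sum(item_utility(i, tid) for i in itemset)


def utility(itemset):
    """u(X): sum of u(X, T_d) over all transactions that contain X."""
    return sum(itemset_utility_in(itemset, tid)
               for tid, t in DB.items() if set(itemset).issubset(t))


def tu(tid):
    """TU(T_d): utility of the whole transaction."""
    return itemset_utility_in(DB[tid], tid)


def twu(itemset):
    """TWU(X): sum of TU over all transactions that contain X."""
    return sum(tu(tid) for tid, t in DB.items() if set(itemset).issubset(t))
```

With min_util = 40, {AD} has TWU 38 and is therefore not an HTWUI, so by TWDC none of its supersets needs to be examined.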

Related Work
Two-Phase
- Phase 1: generates candidate itemsets of length k from HTWUIs of length k-1 and prunes candidate itemsets using the TWDC property.
- Phase 2: identifies high utility itemsets and their utilities from the HTWUIs by scanning the original database once.
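A minimal level-wise sketch of the two phases, assuming an Apriori-style join for candidate generation. It is not the original Two-Phase implementation; the function name `two_phase` and the toy database are assumptions (the data is chosen to agree with the figures quoted elsewhere in the slides).

```python
from itertools import combinations

# Assumed toy data, for illustration only.
PROFIT = {"A": 5, "B": 2, "C": 1, "D": 2, "E": 3, "F": 1, "G": 1}
DB = {
    "T1": {"A": 1, "C": 1, "D": 1},
    "T2": {"A": 2, "C": 6, "E": 2, "G": 5},
    "T3": {"A": 1, "B": 2, "C": 1, "D": 6, "E": 1, "F": 5},
    "T4": {"B": 4, "C": 3, "D": 3, "E": 1},
    "T5": {"B": 2, "C": 2, "E": 1, "G": 2},
}


def two_phase(db, profit, min_util):
    """Return (high utility itemsets with utilities, list of HTWUIs)."""
    tu = {tid: sum(profit[i] * q for i, q in t.items()) for tid, t in db.items()}

    def twu(x):
        return sum(tu[tid] for tid, t in db.items() if x.issubset(t))

    def util(x):
        return sum(sum(profit[i] * t[i] for i in x)
                   for t in db.values() if x.issubset(t))

    # Phase 1: level-wise generation of HTWUIs, pruned by TWDC.
    items = sorted({i for t in db.values() for i in t})
    level = [frozenset([i]) for i in items if twu(frozenset([i])) >= min_util]
    htwuis = list(level)
    while level:
        prev = set(level)
        k = len(level[0]) + 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # TWDC: a candidate survives only if every (k-1)-subset is an HTWUI.
        level = [c for c in candidates
                 if all(frozenset(s) in prev for s in combinations(c, k - 1))
                 and twu(c) >= min_util]
        htwuis.extend(level)

    # Phase 2: compute exact utilities of the HTWUIs against the database.
    high = {x: util(x) for x in htwuis if util(x) >= min_util}
    return high, htwuis
```

Calling `two_phase(DB, PROFIT, 40)` returns far more HTWUIs than high utility itemsets on this toy data, which is the overestimation problem the following slides address.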

(Cont.)
IHUP
- Step 1: construction of the IHUP-Tree. Items in each transaction are rearranged in descending order of TWU.
- Step 2: generation of HTWUIs. HTWUIs are generated from the IHUP-Tree by applying the FP-Growth algorithm.
- Step 3: identification of high utility itemsets. High utility itemsets and their utilities are identified from the HTWUIs by scanning the original database once.

(Cont.)
These models may overestimate the utility of too many low utility itemsets, treating them as HTWUIs and producing too many candidates in phase 1. The more HTWUIs are generated in phase 1, the more execution time is required to identify high utility itemsets in phase 2.

Proposed Method
- Construction of the UP-Tree: the UP-Tree can be constructed with two scans of the original database.
- Generation of potential high utility itemsets (PHUIs) from the UP-Tree by UP-Growth.

DGU
Strategy 1. Discarding global unpromising items (DGU).
[Figure: first database scan with min_util = 40; unpromising items are identified and the remaining items are listed in descending order of TWU.]
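The first scan and the reorganization step of DGU can be sketched as follows. The database and profit table are assumptions (the slide's tables are not in the transcript), and the function name `dgu` and the alphabetical tie-break are illustrative choices. Each reorganized transaction keeps only promising items, sorted by descending TWU, together with its reorganized transaction utility (RTU).

```python
# Assumed toy data, for illustration only.
PROFIT = {"A": 5, "B": 2, "C": 1, "D": 2, "E": 3, "F": 1, "G": 1}
DB = {
    "T1": {"A": 1, "C": 1, "D": 1},
    "T2": {"A": 2, "C": 6, "E": 2, "G": 5},
    "T3": {"A": 1, "B": 2, "C": 1, "D": 6, "E": 1, "F": 5},
    "T4": {"B": 4, "C": 3, "D": 3, "E": 1},
    "T5": {"B": 2, "C": 2, "E": 1, "G": 2},
}


def dgu(db, profit, min_util):
    """Return (TWU per item, global item order, reorganized transactions)."""
    # First scan: transaction utilities and the TWU of every single item.
    tu = {tid: sum(profit[i] * q for i, q in t.items()) for tid, t in db.items()}
    twu = {}
    for tid, t in db.items():
        for item in t:
            twu[item] = twu.get(item, 0) + tu[tid]

    # Items whose TWU is below min_util are global unpromising items.
    promising = [i for i in twu if twu[i] >= min_util]
    order = sorted(promising, key=lambda i: (-twu[i], i))

    # Reorganize: drop unpromising items (and their utilities), sort the rest.
    reorganized = {}
    for tid, t in db.items():
        kept = [i for i in order if i in t]
        rtu = sum(profit[i] * t[i] for i in kept)
        reorganized[tid] = (kept, rtu)
    return twu, order, reorganized
```

On this data F and G fall below min_util = 40, so T2 shrinks from utility 27 to an RTU of 22 and T3 from 30 to 25.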

(Cont.)
[Figures: second database scan, in which the reorganized transactions are inserted into the UP-Tree one at a time. Surviving node labels (count, node utility): "1, 8"; "2, 30"; "1, 22".]

Generating PHUIs from the global UP-Tree
{D}'s conditional pattern base ({D}-CPB)
An item $i_p$ is called a local promising item in {$a_i$}-CPB if $pu(i_p, \{a_i\}\text{-CPB})$ is no smaller than min_util; otherwise it is a local unpromising item.
{A} is a local unpromising item in {D}-CPB, so no itemset containing both {A} and {D} is a high utility itemset.

(Cont.)
Generating PHUIs from the {D}-Tree:
{D}:58, {DE}:45, {DEB}:45, {DEC}:45, {DEBC}:45, {DB}:45, {DBC}:45, {DC}:53

DGN
Strategy 2. Discarding global node utilities (DGN): for each node, the utilities of its descendants are discarded from the node's utility during the construction of the global UP-Tree.
[Figure: {B}-CPB]

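(Cont.)
Node utilities along the path when the reorganized transaction $T_2'$ is inserted with DGN:
$\{C\}.nu = 1 + p(\{C\}) \times q(\{C\}, T_2') = 1 + 1 \times 6 = 7$
$\{E\}.nu = p(\{C\}) \times q(\{C\}, T_2') + p(\{E\}) \times q(\{E\}, T_2') = 1 \times 6 + 3 \times 2 = 12$
$\{A\}.nu = p(\{C\}) \times q(\{C\}, T_2') + p(\{E\}) \times q(\{E\}, T_2') + p(\{A\}) \times q(\{A\}, T_2') = 1 \times 6 + 3 \times 2 + 5 \times 2 = 22$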
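The node-utility updates above amount to adding, at each node on the path, the running sum of item utilities up to and including that node. A minimal sketch follows; the `Node` class and `insert_dgn` are hypothetical names, and the two reorganized transactions used in the usage note are assumed data chosen to match the slide's numbers.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """UP-Tree node: item name, support count, node utility, children."""
    item: str
    count: int = 0
    nu: int = 0
    children: dict = field(default_factory=dict)


def insert_dgn(root, items, utils):
    """Insert one reorganized transaction into the global UP-Tree.

    items: promising items in global (descending TWU) order.
    utils: utility of each item in this transaction, in the same order.
    With DGN, the utilities of items that come after a node on the path
    (its descendants) are not added to that node's utility.
    """
    node = root
    prefix = 0
    for item, u in zip(items, utils):
        prefix += u  # utility of this item and all of its ancestors
        child = node.children.setdefault(item, Node(item))
        child.count += 1
        child.nu += prefix
        node = child


# Assumed data: T1' = C, A, D and T2' = C, E, A with their item utilities.
root = Node("root")
insert_dgn(root, ["C", "A", "D"], [1, 5, 2])
insert_dgn(root, ["C", "E", "A"], [6, 6, 10])
```

After both insertions the {C} node has count 2 and node utility 7, and the {E} and {A} nodes on the second path have node utilities 12 and 22.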

UP-Growth
Two strategies for efficiently generating PHUIs from the global UP-Tree:
- DLU (Discarding local unpromising items)
- DLN (Decreasing local node utilities)
Strategies DGU and DGN cannot be applied during the construction of a local UP-Tree, because the individual items and their utilities are not maintained in the conditional pattern base.

DLU
Because of memory limits, exact utility values of the items are not maintained in the conditional pattern base; a minimum item utility table (MIUT) is maintained instead.
Strategy 3. Discarding local unpromising items (DLU): the minimum item utilities of unpromising items are discarded from the path utilities of the paths during the construction of a local UP-Tree.

(Cont.)
$8 - miu(\{A\}) \times \{AC\}.count = 8 - 5 \times 1 = 3$
$25 - miu(\{A\}) \times \{BAEC\}.count = 25 - 5 \times 1 = 20$
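A sketch of DLU on {D}'s conditional pattern base. The three paths and the minimum item utilities are assumptions reconstructed from the numbers on the slides (the figure itself is not in the transcript); `dlu` is a hypothetical name, and ties in path utility are broken alphabetically here.

```python
# Assumed {D}-CPB: (items on the path, path utility, support count).
D_CPB = [
    (["C", "A"], 8, 1),
    (["C", "E", "A", "B"], 25, 1),
    (["C", "E", "B"], 20, 1),
]
# Assumed minimum item utility table (MIUT).
MIU = {"A": 5, "B": 4, "C": 1, "E": 3}


def dlu(cpb, miu, min_util):
    """Return (path utility per item, reorganized paths) for one CPB."""
    # Path utility of an item: sum of the utilities of the paths containing it.
    pu = {}
    for items, util, _count in cpb:
        for i in items:
            pu[i] = pu.get(i, 0) + util

    promising = {i for i in pu if pu[i] >= min_util}
    reorganized = []
    for items, util, count in cpb:
        # Subtract miu(item) * count for every local unpromising item.
        dropped = [i for i in items if i not in promising]
        new_util = util - sum(miu[i] * count for i in dropped)
        # Keep promising items, in descending order of path utility.
        kept = sorted((i for i in items if i in promising),
                      key=lambda i: (-pu[i], i))
        reorganized.append((kept, new_util, count))
    return pu, reorganized
```

With min_util = 40, {A} has path utility 33 and is dropped, which lowers the first two path utilities from 8 to 3 and from 25 to 20.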

DLN
Strategy 4. Decreasing local node utilities (DLN): for each node, the minimum item utilities of its descendant nodes are subtracted from the node's utility during the construction of a local UP-Tree.

(Cont.)
Node utilities after inserting a reorganized path with path utility 20 (figure labels are count, node utility):
- {C} (2, 16): $3 + \{20 - miu(\{B\}) \times 1 - miu(\{E\}) \times 1\} = 3 + 13 = 16$
- {B} (1, 17): $20 - miu(\{E\}) \times 1 = 20 - 3 = 17$
- {E} (1, 20): $20$

(Cont.)
Node utilities after inserting a second reorganized path with path utility 20:
- {C} (3, 29): $16 + \{20 - miu(\{B\}) \times 1 - miu(\{E\}) \times 1\} = 16 + 13 = 29$
- {B} (2, 34): $17 + \{20 - miu(\{E\}) \times 1\} = 17 + 17 = 34$
- {E} (2, 40): $20 + 20 = 40$
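The DLN arithmetic can be reproduced with a small local-tree insert routine. This is a sketch under assumptions: the reorganized paths and the minimum item utilities are reconstructed from the numbers on the slides, and `Node` and `insert_dln` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Local UP-Tree node: item name, support count, node utility, children."""
    item: str
    count: int = 0
    nu: int = 0
    children: dict = field(default_factory=dict)


def insert_dln(root, items, path_util, count, miu):
    """Insert one reorganized path into a local UP-Tree with DLN.

    Exact item utilities are not available in a conditional pattern base,
    so each node receives the path utility minus miu(item) * count for
    every item that follows it on the path (its descendants).
    """
    node = root
    for k, item in enumerate(items):
        below = sum(miu[j] * count for j in items[k + 1:])
        child = node.children.setdefault(item, Node(item))
        child.count += count
        child.nu += path_util - below
        node = child


# Assumed minimum item utilities and reorganized paths of {D}-CPB.
MIU = {"B": 4, "C": 1, "E": 3}
local_root = Node("root")
insert_dln(local_root, ["C"], 3, 1, MIU)
insert_dln(local_root, ["C", "B", "E"], 20, 1, MIU)
insert_dln(local_root, ["C", "B", "E"], 20, 1, MIU)
```

After the three insertions the nodes carry (count, node utility) = (3, 29) for {C}, (2, 34) for {B}, and (2, 40) for {E}.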

Experimental Results
[Figures: experimental result plots]

Conclusions
- This paper proposed an efficient algorithm, UP-Growth, for mining high utility itemsets.
- A UP-Tree structure is proposed for maintaining the information of high utility itemsets.
- With the four strategies (DGU, DGN, DLU, DLN), mining performance is enhanced significantly, since both the search space and the number of candidates are effectively reduced.

Discussions
- Strongest part of this paper: it outperforms other algorithms substantially in terms of execution time, especially when the database contains many long transactions.
- Possible improvement: DGU can be applied repeatedly until all reorganized transactions contain no unpromising items.
- Possible extensions and applications: finding rare high utility itemsets. In medical applications, rare combinations of symptoms can provide useful insights for doctors.
