Efficient Mining of High Utility Itemsets from Large Datasets
Alva Erwin, Raj P. Gopalan, and N.R. Achuthan
Department of Computing; Department of Mathematics and Statistics
Curtin University of Technology, Kent St., Bentley, Western Australia
PAKDD08

Outline
Introduction
Preliminaries
Method – Compressed Transaction Utility-Pro
Experiments
Conclusions

Introduction
Frequent itemset mining finds items that co-occur in a transaction database more often than a user-given frequency threshold, without considering the quantity or weight (such as profit) of the items. Quantity and weight matter for real-world decision problems that require maximizing utility in an organization. For high utility itemset mining, TwoPhase, which is based on Apriori, suits sparse datasets with short patterns, while CTU-Mine, which is based on pattern growth, suits dense data.

Definition
Utility of an itemset: u({3, 4}, t1) = $60 and u({3, 4}, t3) = $60, so u({3, 4}) = $120.

Definition
Transaction utility: tu(t1) = $80
Transaction-weighted utility: twu({3, 4}) = $190
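
The three measures defined above can be sketched in a few lines of Python. This is only an illustration: the profits are the ones listed in the deck's item table, but the transactions `t1` to `t3` and their quantities are hypothetical, chosen so that the results match the figures quoted on the slides. The function names are likewise assumptions made for this sketch.

```python
# Profit per original item id, as listed in the deck's item table.
PROFIT = {1: 10, 2: 150, 3: 25, 4: 35, 5: 5}

# HYPOTHETICAL transactions (item id -> quantity), picked to reproduce
# the slide values; the deck's real transaction table is not available.
DB = {
    "t1": {1: 2, 3: 1, 4: 1},
    "t2": {2: 1, 3: 1},
    "t3": {1: 5, 3: 1, 4: 1},
}


def u(itemset, tid):
    """Utility of an itemset in one transaction (0 if not fully contained)."""
    t = DB[tid]
    if not set(itemset) <= t.keys():
        return 0
    return sum(PROFIT[i] * t[i] for i in itemset)


def u_total(itemset):
    """Utility of an itemset over the whole database."""
    return sum(u(itemset, tid) for tid in DB)


def tu(tid):
    """Transaction utility: total utility of every item in the transaction."""
    return sum(PROFIT[i] * q for i, q in DB[tid].items())


def twu(itemset):
    """Transaction-weighted utility: sum of tu over transactions containing the itemset."""
    return sum(tu(tid) for tid, t in DB.items() if set(itemset) <= t.keys())


print(u((3, 4), "t1"), u((3, 4), "t3"), u_total((3, 4)))  # 60 60 120
print(tu("t1"), twu((3, 4)))                              # 80 190
```

Because every transaction that contains an itemset contributes its full transaction utility, twu is always at least as large as u for the same itemset, which is what makes it usable as a pruning bound.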

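Compressed Transaction Utility-Pro
Global item table, ordered by descending TWU. The last item gets no index because its TWU of 99 < min_Utility (129.9).

GlobalItem index    1    2    3    4    5    -
Original item id    5    1    2    4    3    6
Profit              5   10  150   35   25    2
Quantity           60   12    4    5    4    2
TWU               987  964  810  595  422   99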
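
A minimal sketch of how such a global item table could be derived: drop every item whose TWU is below the threshold, then number the survivors in descending TWU order. The TWU values come from the slide (item 6 is taken as 99, per the slide's annotation); the function name `build_global_index` is an assumption for this illustration and not the authors' code.

```python
MIN_UTILITY = 129.9

# Original item id -> TWU, as read from the slide's table.
TWU = {1: 964, 2: 810, 3: 422, 4: 595, 5: 987, 6: 99}


def build_global_index(twu, min_utility):
    """Map original item ids to 1-based global indexes, ordered by descending TWU.

    Items whose TWU is below min_utility are pruned: TWU is an upper bound
    on the utility of any itemset containing the item, so such items cannot
    appear in a high utility itemset.
    """
    kept = [item for item, value in twu.items() if value >= min_utility]
    kept.sort(key=lambda item: twu[item], reverse=True)
    return {item: index for index, item in enumerate(kept, start=1)}


global_index = build_global_index(TWU, MIN_UTILITY)
print(global_index)  # {5: 1, 1: 2, 2: 3, 4: 4, 3: 5}
```

Transactions can then be rewritten in terms of these compact indexes, which keeps the most promising items near the root of a prefix tree.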

Compressed Utility Pattern-Tree
Parallel projection of transaction database

CUP-tree
Traverse: index 1 (110) from 5; index 2 (310) from (2,3,4); index 3 (195) from 2; index 4 (190) from (3,5)

ProCUP-tree
Index 1 (110) from 5 is dropped because 110 < min_Utility (129.9). Remaining: index 2 (310) from (2,3,4); index 3 (195) from 2; index 4 (190) from (3,5)

ProCUP-tree
Utility = oriUtility * itemQuantity + proUtility * proQuantity
35*2 + 25*2 = 120
150*1 + 25*1 = 175
10*5 + 25*3 = 125
High_Utility_Itemset = (3,2), (3,2,1)

GlobalItem index    1    2    3    4    5
Original item id    5    1    2    4    3
ProItem index       -    -    1    2    3
Profit              5   10  150   35   25
Quantity           60   12    4    5    4
TWU               987  964  810  595  422
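
The pairwise arithmetic on this slide can be checked with a short script. This is a sketch under assumptions: the profits and quantities are the slide's numbers, the mapping of each profit to an original item id is read off the item table (35 is item 4, 150 is item 2, 10 is item 1, 25 is the projected item 3), and the names below are made up for the example. The slide does not show the arithmetic for (3,2,1), so it is not reproduced here.

```python
MIN_UTILITY = 129.9

PRO_ITEM = 3      # projected item (original id), profit 25 in the item table
PRO_PROFIT = 25

# Original item id -> (item profit, item quantity, projected-item quantity),
# taken from the three sums shown on the slide.
CANDIDATES = {
    4: (35, 2, 2),
    2: (150, 1, 1),
    1: (10, 5, 3),
}


def pair_utility(ori_profit, item_qty, pro_profit, pro_qty):
    """oriUtility * itemQuantity + proUtility * proQuantity, as on the slide."""
    return ori_profit * item_qty + pro_profit * pro_qty


utilities = {
    (PRO_ITEM, item): pair_utility(profit, qty, PRO_PROFIT, pro_qty)
    for item, (profit, qty, pro_qty) in CANDIDATES.items()
}
high_utility = [pair for pair, value in utilities.items() if value >= MIN_UTILITY]

print(utilities)     # {(3, 4): 120, (3, 2): 175, (3, 1): 125}
print(high_utility)  # [(3, 2)]
```

Only (3,2) clears the 129.9 threshold among the pairs, which agrees with the first itemset the slide reports as high utility.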

Experiments

Conclusion
The CTU-Pro algorithm mines the complete set of high utility itemsets from both sparse and relatively dense datasets, with short or longer high utility patterns. The algorithm adapts to large data by constructing parallel subdivisions on disk that can be mined independently.
