A Fast High Utility Itemsets Mining Algorithm Ying Liu,Wei-keng Liao,and Alok Choudhary KDD’05 Advisor : Jia-Ling Koh Speaker : Tsui-Feng Yen.

Slides:



Advertisements
Similar presentations
Association Rules Evgueni Smirnov.
Advertisements

Vincent S. Tseng, Cheng-Wei Wu, Bai-En Shie, and Philip S. Yu SIG KDD 2010 UP-Growth: An Efficient Algorithm for High Utility Itemset Mining 2010/8/25.
Huffman Codes and Asssociation Rules (II) Prof. Sin-Min Lee Department of Computer Science.
Data Mining Techniques Association Rule
Query Optimization of Frequent Itemset Mining on Multiple Databases Mining on Multiple Databases David Fuhry Department of Computer Science Kent State.
IT 433 Data Warehousing and Data Mining Association Rules Assist.Prof.Songül Albayrak Yıldız Technical University Computer Engineering Department
Association Rule Mining. 2 The Task Two ways of defining the task General –Input: A collection of instances –Output: rules to predict the values of any.
Adaptive Frequency Counting over Bursty Data Streams Bill Lin, Wai-Shing Ho, Ben Kao and Chun-Kit Chui Form CIDM07.
Rakesh Agrawal Ramakrishnan Srikant
1 IncSpan :Incremental Mining of Sequential Patterns in Large Database Hong Cheng, Xifeng Yan, Jiawei Han Proc Int. Conf. on Knowledge Discovery.
Association Analysis. Association Rule Mining: Definition Given a set of records each of which contain some number of items from a given collection; –Produce.
Mining Frequent Itemsets from Uncertain Data Presented by Chun-Kit Chui, Ben Kao, Edward Hung Department of Computer Science, The University of Hong Kong.
Data Mining Techniques So Far: Cluster analysis K-means Classification Decision Trees J48 (C4.5) Rule-based classification JRIP (RIPPER) Logistic Regression.
Data Mining Association Analysis: Basic Concepts and Algorithms Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Association Analysis (2). Example TIDList of item ID’s T1I1, I2, I5 T2I2, I4 T3I2, I3 T4I1, I2, I4 T5I1, I3 T6I2, I3 T7I1, I3 T8I1, I2, I3, I5 T9I1, I2,
Association Rule Mining Part 2 (under construction!) Introduction to Data Mining with Case Studies Author: G. K. Gupta Prentice Hall India, 2006.
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining Association Analysis: Basic Concepts and Algorithms
Association Rules Presented by: Anilkumar Panicker Presented by: Anilkumar Panicker.
6/23/2015CSE591: Data Mining by H. Liu1 Association Rules Transactional data Algorithm Applications.
DATA MINING -ASSOCIATION RULES-
1 ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis) Association Rule Mining Olivia R. Liu Sheng, Ph.D. Emma Eccles Jones Presidential.
Fast Algorithms for Association Rule Mining
Mining Association Rules
1 Fast Algorithms for Mining Association Rules Rakesh Agrawal Ramakrishnan Srikant Slides from Ofer Pasternak.
1 UP-Growth: An Efficient Algorithm for High Utility Itemset Mining Vincent S. Tseng, Cheng-Wei Wu, Bai-En Shie, and Philip S. Yu SIG KDD 2010.
Association Discovery from Databases Association rules are a simple formalism for expressing positive connections between columns in a 0/1 matrix. A classical.
Mining Association Rules between Sets of Items in Large Databases presented by Zhuang Wang.
Mining Sequential Patterns: Generalizations and Performance Improvements R. Srikant R. Agrawal IBM Almaden Research Center Advisor: Dr. Hsu Presented by:
USpan: An Efficient Algorithm for Mining High Utility Sequential Patterns Authors: Junfu Yin, Zhigang Zheng, Longbing Cao In: Proceedings of the 18th ACM.
Mining High Utility Itemsets without Candidate Generation Date: 2013/05/13 Author: Mengchi Liu, Junfeng Qu Source: CIKM "12 Advisor: Jia-ling Koh Speaker:
Approximate Frequency Counts over Data Streams Loo Kin Kong 4 th Oct., 2002.
Approximate Frequency Counts over Data Streams Gurmeet Singh Manku, Rajeev Motwani Standford University VLDB2002.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Intelligent Database Systems Lab Advisor : Dr. Hsu Graduate : Keng-Wei Chang Author : Anthony K.H. Tung Hongjun Lu Jiawei Han Ling Feng 國立雲林科技大學 National.
Efficient Data Mining for Calling Path Patterns in GSM Networks Information Systems, accepted 5 December 2002 SPEAKER: YAO-TE WANG ( 王耀德 )
1 A Theoretical Framework for Association Mining based on the Boolean Retrieval Model on the Boolean Retrieval Model Peter Bollmann-Sdorra.
MINING FREQUENT ITEMSETS IN A STREAM TOON CALDERS, NELE DEXTERS, BART GOETHALS ICDM2007 Date: 5 June 2008 Speaker: Li, Huei-Jyun Advisor: Dr. Koh, Jia-Ling.
Mining High Utility Itemset in Big Data
Alva Erwin Department ofComputing Raj P. Gopalan, and N.R. Achuthan Department of Mathematics and Statistics Curtin University of Technology Kent St. Bentley.
1 Discovering Robust Knowledge from Databases that Change Chun-Nan HsuCraig A. Knoblock Arizona State UniversityUniversity of Southern California Journal.
Mining Top-K High Utility Itemsets Date: 2013/04/08 Author: Cheng Wei Wu, Bai-En Shie, Philip S. Yu, Vincent S. Tseng Source: KDD ’12 Advisor: Dr. Jia-Ling.
Detecting Group Differences: Mining Contrast Sets Author: Stephen D. Bay Advisor: Dr. Hsu Graduate: Yan-Cheng Lin.
Outline Introduction – Frequent patterns and the Rare Item Problem – Multiple Minimum Support Framework – Issues with Multiple Minimum Support Framework.
Mining Frequent Itemsets from Uncertain Data Presenter : Chun-Kit Chui Chun-Kit Chui [1], Ben Kao [1] and Edward Hung [2] [1] Department of Computer Science.
MINING COLOSSAL FREQUENT PATTERNS BY CORE PATTERN FUSION FEIDA ZHU, XIFENG YAN, JIAWEI HAN, PHILIP S. YU, HONG CHENG ICDE07 Advisor: Koh JiaLing Speaker:
Data Mining Find information from data data ? information.
Association Rule Mining
1/24 Novel algorithm for mining high utility itemsets Shankar, S. Purusothaman, T. Jayanthi, S. International Conference on Computing, Communication and.
1 AC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery Advisor : Dr. Koh Jia-Ling Speaker : Tu Yi-Lang Date : Hong.
Intelligent Database Systems Lab Advisor : Dr.Hsu Graduate : Keng-Wei Chang Author : Salvatore Orlando Raffaele Perego Claudio Silvestri 國立雲林科技大學 National.
Elsayed Hemayed Data Mining Course
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
1 Discovering Calendar-based Temporal Association Rules SHOU Yu Tao May. 21 st, 2003 TIME 01, 8th International Symposium on Temporal Representation and.
Data Mining Association Rules Mining Frequent Itemset Mining Support and Confidence Apriori Approach.
1 Data Mining Lecture 6: Association Analysis. 2 Association Rule Mining l Given a set of transactions, find rules that will predict the occurrence of.
Chapter 3 Data Mining: Classification & Association Chapter 4 in the text box Section: 4.3 (4.3.1),
Searching for Pattern Rules Guichong Li and Howard J. Hamilton Int'l Conf on Data Mining (ICDM),2006 IEEE Advisor : Jia-Ling Koh Speaker : Tsui-Feng Yen.
Mining General Temporal Association Rules for Items with Different Exhibition Cheng-Yue Chang, Ming-Syan Chen, Chang-Hung Lee, Proc. of the 2002 IEEE international.
UP-Growth: An Efficient Algorithm for High Utility Itemset Mining
Association Rules Repoussis Panagiotis.
Market Basket Analysis and Association Rules
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining II: Association Rule mining & Classification
Data Mining Association Rules Assoc.Prof.Songül Varlı Albayrak
A Parameterised Algorithm for Mining Association Rules
DENSE ITEMSETS JOUNI K. SEPPANEN, HEIKKI MANNILA SIGKDD2004
Dynamically Maintaining Frequent Items Over A Data Stream
Presentation transcript:

A Fast High Utility Itemsets Mining Algorithm Ying Liu,Wei-keng Liao,and Alok Choudhary KDD’05 Advisor : Jia-Ling Koh Speaker : Tsui-Feng Yen

Introduction Traditional ARM model treat all the items in the database equally by only considering if an item is present in a transaction or not. Frequent itemsets may only contribute a small portion of the overall profit, whereas non-frequent itemsets may contribute a large portion of the profit. Frequency is not sufficient to answer questions, such as whether an itemset is highly profitable, or whether an itemset has a strong impact.

Introduction (conti.) Why utility mining ? support threshold = 10% Utility mining is likely to be useful in a wide range of practical applications. Frequent or not supportprofit {milk,bread}Y40%4% {birthday cake, birthday card} N8%

Introduction (conti.) What is utility mining ? The utility of an item or itemset is based on local transaction utility and external utility. (b) The utility table. (a) Transaction table. u({B, D}) = (6×10+1×6) + (10×10+1×6) = 172  high utility itemset > utility threshold 120

Definition I = {i1, i2, …, im} is a set of items. D = {T1, T2, …, Tn} be a transaction database where each transaction Ti ∈ D o(ip, Tq), local transaction utility value, represents the quantity of item ip in transaction Tq. For example, o(A, T8) = 3, in Table 1(a). s(ip), external utility, is the value associated with item ip in the Utility Table. This value reflects the importance of an item, which is independent of transactions. For example, in Table 1(b), the external utility of item A, s(A), is 3.

Definition (conti.)

u(A, T8) = 3×3 = 9 u({A, D, E}, T8) = u(A, T8) + u(D, T8) + u(E, T8) = 3×3 + 3×6 + 1×5 = 32 u({A,D, E}) = u({A, D, E}, T4) + u({A, D, E}, T8) = = 46. If ε= 120, {A, D, E} is a low utility itemset.

MEU(Mining using Expected Utility) MEU prunes the search space by predicting the high utility k-itemset, I k, with the expected utility value, denoted as u’(I k ).

MEU u’({B,D,E})

MEU Drawbacks : (1)pruning the candidate : When m is small, the term (k-m)/(k-1) is close to 1,or even greater then 1,so u’(I k ) is likely greater than ε, Therefore, this estimation does not prune the candidates effectively at the beginning stages. (2)Accuracy : if ε = 40 in our example, the expected utility of itemset u’( {C, D, E} ) is u({C, D, E}) = 48 > ε,it is indeed a high utility itemset

Two-phase Algorithm—Phase 1 Definition 1. (Transaction Utility) The transaction utility of transaction Tq, denoted as tu(Tq), is the sum of the utilities of all item in Tq, Definition 1. (Transaction-weighted Utilization) The transaction-weighted utilization of an itemset X, denoted as twu(X), is the sum of the transaction utilities of all the transactions containing X:

Phase 1 tw(A) = tu(T3) + tu(T4) + tu(T6) + tu(T8) + tu(T9) = = 109 twu({A, D}) =tu(T4) + tu(T8) = = 71.

Phase1 Definition 3. (High Transaction-weighted Utilization Itemset) For a given itemset X, X is a high transaction-weighted utilization itemset if twu(X) >=ε’, where ε’ is the user specified threshold.

Phase1 Theorem 1. (Transaction-weighted Downward Closure Property) Let I k be a k-itemset and I k-1 be a (k-1)-itemset such that I k-1 I k. If I k is a high transaction-weighted utilization itemset, I k-1 is a high transaction-weighted utilization itemset. proof:

Phase1

Advantage : (1)Less candidate : Whenε’ is large, the search space can be significantly reduced at the second level and higher levels. (2)Accuracy Based on Theorem 2, if we let ε’=ε, the complete set of high utility itemsets is a subset of the high transaction-weighted utilization itemsets discovered by our transaction-weighted utilization mining model.

Phase 2 In Phase II, one database scan is required to select the high utility itemsets from high transaction-weighted utilization itemsets identified in Phase I. high utility itemsets ({B}, {B, D}, {B, E} and {B,D, E}) (in solid black circles) are covered by the high transaction weighted utilization itemsets Nine itemsets in circles are maintained after Phase I, and one database scan is performed in Phase II to prune 5 of the 9 itemsets since they are not high utility itemsets.

Phase1

Experimental Results OS: Linux CPU: 4 Gbytes memory, RAM: 512MB Two databases : -Synthetic Data from IBM Quest Data Generator T10.I6.DX000K(average transaction size is 10); T20.I6.DX000K(average transaction size is 20); -Real-World Market Data -a real world data from a major grocery chain store in California. -1,112,949 transactions and 46,086 items in the database. -Each transaction consists of the products and the sales volume of each product purchased by a customer at a time point.

Experimental Results