
Similar presentations
Vincent S. Tseng, Cheng-Wei Wu, Bai-En Shie, and Philip S. Yu SIG KDD 2010 UP-Growth: An Efficient Algorithm for High Utility Itemset Mining 2010/8/25.

Association Analysis (2). Example TIDList of item ID’s T1I1, I2, I5 T2I2, I4 T3I2, I3 T4I1, I2, I4 T5I1, I3 T6I2, I3 T7I1, I3 T8I1, I2, I3, I5 T9I1, I2,
Frequent Itemset Mining Methods. The Apriori algorithm Finding frequent itemsets using candidate generation Seminal algorithm proposed by R. Agrawal and.
Query Optimization of Frequent Itemset Mining on Multiple Databases Mining on Multiple Databases David Fuhry Department of Computer Science Kent State.
Frequent Closed Pattern Search By Row and Feature Enumeration
Mining Frequent Patterns in Data Streams at Multiple Time Granularities CS525 Paper Presentation Presented by: Pei Zhang, Jiahua Liu, Pengfei Geng and.
FP (FREQUENT PATTERN)-GROWTH ALGORITHM ERTAN LJAJIĆ, 3392/2013, School of Electrical Engineering, University of Belgrade.
Data Mining Association Analysis: Basic Concepts and Algorithms
Rakesh Agrawal Ramakrishnan Srikant
1 IncSpan :Incremental Mining of Sequential Patterns in Large Database Hong Cheng, Xifeng Yan, Jiawei Han Proc Int. Conf. on Knowledge Discovery.
Data Mining Techniques So Far: Cluster analysis K-means Classification Decision Trees J48 (C4.5) Rule-based classification JRIP (RIPPER) Logistic Regression.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Association Analysis: Basic Concepts and Algorithms
Association Analysis (2). Example TIDList of item ID’s T1I1, I2, I5 T2I2, I4 T3I2, I3 T4I1, I2, I4 T5I1, I3 T6I2, I3 T7I1, I3 T8I1, I2, I3, I5 T9I1, I2,
Data Mining Association Analysis: Basic Concepts and Algorithms
Temporal Pattern Matching of Moving Objects for Location-Based Service GDM Ronald Treur14 October 2003.
Association Analysis: Basic Concepts and Algorithms.
Data Mining Association Analysis: Basic Concepts and Algorithms
6/23/2015CSE591: Data Mining by H. Liu1 Association Rules Transactional data Algorithm Applications.
Mining Sequences. Examples of Sequence Web sequence:  {Homepage} {Electronics} {Digital Cameras} {Canon Digital Camera} {Shopping Cart} {Order Confirmation}
Association Rule Mining. Mining Association Rules in Large Databases  Association rule mining  Algorithms Apriori and FP-Growth  Max and closed patterns.
CS401 presentation1 Effective Replica Allocation in Ad Hoc Networks for Improving Data Accessibility Takahiro Hara Presented by Mingsheng Peng (Proc. IEEE.
Apriori algorithm Seminar of Popular Algorithms in Data Mining and Machine Learning, TKK Presentation Lauri Lahti.
What Is Sequential Pattern Mining?
USpan: An Efficient Algorithm for Mining High Utility Sequential Patterns Authors: Junfu Yin, Zhigang Zheng, Longbing Cao In: Proceedings of the 18th ACM.
Mining High Utility Itemsets without Candidate Generation Date: 2013/05/13 Author: Mengchi Liu, Junfeng Qu Source: CIKM "12 Advisor: Jia-ling Koh Speaker:
October 2, 2015 Data Mining: Concepts and Techniques 1 Data Mining: Concepts and Techniques — Chapter 8 — 8.3 Mining sequence patterns in transactional.
Sequential PAttern Mining using A Bitmap Representation
Approximate Frequency Counts over Data Streams Loo Kin Kong 4 th Oct., 2002.
Approximate Frequency Counts over Data Streams Gurmeet Singh Manku, Rajeev Motwani Stanford University VLDB2002.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining By Tan, Steinbach, Kumar Lecture.
Modul 7: Association Analysis. 2 Association Rule Mining  Given a set of transactions, find rules that will predict the occurrence of an item based on.
Association Rules. CS583, Bing Liu, UIC 2 Association rule mining Proposed by Agrawal et al in Initially used for Market Basket Analysis to find.
Chapter 2: Association Rules & Sequential Patterns.
Mining Multidimensional Sequential Patterns over Data Streams Chedy Raїssi and Marc Plantevit DaWak_2008.
Mining High Utility Itemset in Big Data
Sequential Pattern Mining
Mining Frequent Patterns without Candidate Generation.
Expert Systems with Applications 34 (2008) 459–468 Multi-level fuzzy mining with multiple minimum supports Yeong-Chyi Lee, Tzung-Pei Hong, Tien-Chin Wang.
Stefan Mutter, Mark Hall, Eibe Frank University of Freiburg, Germany University of Waikato, New Zealand The 17th Australian Joint Conference on Artificial.
CSE4334/5334 DATA MINING CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai.
Mining Frequent Itemsets from Uncertain Data Presenter : Chun-Kit Chui Chun-Kit Chui [1], Ben Kao [1] and Edward Hung [2] [1] Department of Computer Science.
CanTree: a tree structure for efficient incremental mining of frequent patterns Carson Kai-Sang Leung, Quamrul I. Khan, Tariqul Hoque ICDM '05. Presenter: 林靜怡.
Association Rule Mining
Intelligent Database Systems Lab, National Yunlin University of Science and Technology. Advisor: Dr. Hsu. Graduate: Yu Cheng Chen. Author: Manoranjan.
1 AC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery Advisor : Dr. Koh Jia-Ling Speaker : Tu Yi-Lang Date : Hong.
Interactive Discovery of Influential Friends from Social Networks By: Behzad Rezaie In the Name of God Professor: Dr. Mashayekhi May 11, 2014
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
1 Data Mining Lecture 6: Association Analysis. 2 Association Rule Mining l Given a set of transactions, find rules that will predict the occurrence of.
1 Discriminative Frequent Pattern Analysis for Effective Classification Presenter: Han Liang COURSE PRESENTATION:
Chapter 3 Data Mining: Classification & Association Chapter 4 in the text box Section: 4.3 (4.3.1),
Approach to Data Mining from Algorithm and Computation Takeaki Uno, ETH Switzerland, NII Japan Hiroki Arimura, Hokkaido University, Japan.
TITLE What should be in Objective, Method and Significant
Reducing Number of Candidates
Data Mining Association Analysis: Basic Concepts and Algorithms
Association rule mining
Frequent Pattern Mining
Mining Frequent Subgraphs
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining Association Analysis: Basic Concepts and Algorithms
DIRECT HASHING AND PRUNING (DHP) ALGORITHM
Association Rule Mining
A Parameterised Algorithm for Mining Association Rules
Transactional data Algorithm Applications
Data Mining Association Analysis: Basic Concepts and Algorithms
Effective Replica Allocation
Discriminative Frequent Pattern Analysis for Effective Classification
FP-Growth Wenlong Zhang.
Association Analysis: Basic Concepts
Presentation transcript:

Outline
Introduction
– Frequent patterns and the Rare Item Problem
– Multiple Minimum Support Framework
– Issues with Multiple Minimum Support Framework
Related Work
Proposed Approaches
– Methodology to specify items' MIS values
– An algorithm to mine frequent patterns effectively
– Mining frequent patterns in databases in which items' frequencies vary widely
– Mining rare periodic-frequent patterns
Conclusions and Future Work

Related Work


Outline
Introduction
– Frequent patterns and the Rare Item Problem
– Multiple Minimum Support Framework
– Issues with Multiple Minimum Support Framework
Related Work
Proposed Approaches
– Methodology to specify items' MIS values
– An algorithm to mine frequent patterns effectively
– Mining frequent patterns in databases in which items' frequencies vary widely
– Mining rare periodic-frequent patterns
Conclusions and Future Work

Methodologies to Specify Items' MIS Values
Liu et al. (KDD '99) introduced the percentage-based methodology to specify items' MIS values.
Percentage-based methodology:
– An item's MIS value is a fixed percentage of its support:
  MIS(i_j) = max(S(i_j) × β, LS)
  where S(i_j) is the support of item i_j in I, LS is the lowest MIS value an item can have, and β is a user-specified constant in [0, 1].
This methodology still suffers from the rare item problem.
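A minimal sketch of the percentage-based assignment described above, assuming item supports are supplied as a dictionary of fractions; the function name and example values are illustrative only, not taken from the slides.

```python
def percentage_based_mis(supports, beta, ls):
    """Assign each item a MIS value as a fraction (beta) of its support,
    bounded below by LS: MIS(i) = max(S(i) * beta, LS)."""
    return {item: max(sup * beta, ls) for item, sup in supports.items()}

# Illustrative supports for three items whose frequencies differ widely.
supports = {"bread": 0.40, "milk": 0.25, "caviar": 0.02}
print(percentage_based_mis(supports, beta=0.5, ls=0.01))
# -> {'bread': 0.2, 'milk': 0.125, 'caviar': 0.01}
```

Because MIS is proportional to support, the gap between an item's support and its MIS value is large for frequent items and small for rare ones; the later summary slide identifies this non-uniform difference as the reason the rare item problem persists.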

Rare Item Problem in the Percentage-Based Methodology

Proposed Methodology to Specify Items' MIS Values

Experimental Results
Datasets:
1. Synthetic dataset – total items: 870; total number of transactions: 1,00,
2. Real-world dataset – total items: 83; total number of transactions: 298
Parameter values:
– LS = 0.1
– α = mean of the support of all frequent items
– β = varied at 0.25, 0.5 and 0.9
Algorithms:
– Apriori
– MSApriori – uses the percentage-based methodology
– IMSApriori – uses the support-difference-based methodology
Table 3: SD values used in different datasets.
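For contrast, a sketch of the support-difference-based assignment that IMSApriori uses. The exact formula is not spelled out on these slides, so the form MIS(i) = max(S(i) − SD, LS), with SD derived from α and β as SD = α(1 − β), is an assumption drawn from the IMSApriori literature; all names and values below are illustrative.

```python
def support_difference_mis(supports, sd, ls):
    """Assumed support-difference methodology: subtract a fixed support
    difference SD from each item's support, bounded below by LS:
    MIS(i) = max(S(i) - SD, LS)."""
    return {item: max(sup - sd, ls) for item, sup in supports.items()}

# The slides use LS = 0.1, take alpha as the mean support of the frequent
# items, and vary beta; SD = alpha * (1 - beta) is assumed here.
supports = {"bread": 0.40, "milk": 0.25, "caviar": 0.02}
alpha = sum(supports.values()) / len(supports)   # stand-in for the mean support
beta = 0.5
print(support_difference_mis(supports, sd=alpha * (1 - beta), ls=0.1))
```

Because the same SD is subtracted from every item's support, the difference between support and MIS stays uniform across items, which is the property the proposed methodology aims for.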

Experiment 1: Analysis of MIS Values Specified by Both Methodologies
Figure: MIS values specified by the percentage-based methodology in the synthetic dataset.
Figure: MIS values specified by the support-difference-based methodology in the synthetic dataset.

Experiment 2: Generation of Frequent Patterns
Figure: Generation of frequent patterns in the synthetic and retail datasets.

Outline
Introduction
– Frequent patterns and the Rare Item Problem
– Multiple Minimum Support Framework
– Issues with Multiple Minimum Support Framework
– Contribution of this thesis
Related Work
Proposed Approaches
– Methodology to specify items' MIS values
– An algorithm to mine frequent patterns effectively
– Mining frequent patterns in databases in which items' frequencies vary widely
– Mining rare periodic-frequent patterns
Conclusions and Future Work

Improved Multiple Minimum Support Based Frequent Pattern Mining Approaches

CFP-growth Algorithm – Example
1. The items of the running example (a, c, b, d, e, f, g, h) are listed with their user-specified MIS values and sorted accordingly (Items/MIS table on the slide).

CFP-growth Algorithm
2. Using the sorted list of items, an FP-tree-like structure known as the MIS-tree is constructed by scanning the transactional database.
Figure 19: Construction of the MIS-tree. (a) Before scanning the database. (b) After scanning the first transaction. (c) After scanning the second transaction. (d) After scanning every transaction.
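A compact sketch of the construction this step describes: each transaction is inserted into an FP-tree-like prefix tree whose items are kept in descending order of their MIS values. The class and function names and the toy MIS values are illustrative, and the header-table node links maintained by the real algorithm are omitted.

```python
from collections import defaultdict

class Node:
    """One node of the MIS-tree; children keyed by item."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_mis_tree(transactions, mis):
    """Insert every transaction into a prefix tree whose items are kept in
    descending order of their MIS values (ties broken alphabetically)."""
    order = {item: rank for rank, item in
             enumerate(sorted(mis, key=lambda i: (-mis[i], i)))}
    root, support = Node(None), defaultdict(int)
    for transaction in transactions:
        items = sorted((i for i in transaction if i in order), key=order.get)
        node = root
        for item in items:
            support[item] += 1
            if item not in node.children:
                node.children[item] = Node(item, node)
            node = node.children[item]
            node.count += 1
    return root, support

# Toy usage; the MIS values are made up, not the ones in the slide's table.
mis = {"a": 5, "c": 4, "b": 3, "d": 3, "e": 2, "f": 2, "g": 2, "h": 2}
transactions = [{"a", "b", "h"}, {"a", "c", "e"}, {"c", "d", "g"}]
root, support = build_mis_tree(transactions, mis)
print(dict(support))    # item frequencies observed during the scan
```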

CFP-growth Algorithm
3. From the MIS-tree, items that cannot generate any frequent pattern are removed using the following criterion: "Items whose support is less than the lowest MIS value among all items cannot generate any frequent pattern." In the example the lowest MIS value is 2, so item 'h', whose support is less than 2, is removed from the MIS-tree. (Items/MIS/Support table for items a, c, b, d, e, f, g, h shown on the slide.)
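The quoted criterion translates directly into a filter over the item supports; a small self-contained sketch with illustrative names:

```python
def prune_below_lowest_mis(support, mis):
    """CFP-growth's criterion: an item whose support is below the lowest
    MIS value among *all* items cannot appear in any frequent pattern."""
    lowest_mis = min(mis.values())
    return {item for item, sup in support.items() if sup < lowest_mis}

# In the slide's example the lowest MIS value is 2, so an item such as 'h'
# with support below 2 is returned here and removed from the MIS-tree.
```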

CFP-growth Algorithm
4. The resultant tree is known as the compact MIS-tree.
Figure: Compact MIS-tree created after pruning item 'h' from the MIS-tree.

CFP-growth Algorithm
5. The compact MIS-tree is mined using conditional pattern bases to discover the complete set of frequent patterns.
6. Since the downward closure property no longer holds, CFP-growth keeps building conditional pattern bases for a suffix pattern until the conditional pattern base becomes empty.
Figure: Mining frequent patterns from the MIS-tree.

Performance Issues in CFP-growth
1. The criterion CFP-growth uses to prune items from the MIS-tree still retains some items that cannot generate any frequent pattern. In the example, CFP-growth prunes item 'h' and keeps items a, b, c, d, e, f and g for generating frequent patterns; however, 'g' cannot generate any frequent pattern, as its support is less than the lowest MIS value among all remaining items.
2. It also searches some infrequent suffix patterns that cannot generate any frequent pattern at any higher order.
(Items/MIS/Support table for items a, c, b, d, e, f, g, h shown on the slide.)

An Improved CFP-growth Algorithm: CFP-growth++

Correctness of the Observations

Four Pruning Techniques: "least minimum support", "infrequent leaf node pruning", "conditional minimum support" and the "conditional closure property" (applied in the steps that follow).


Working of the CFP-growth++ Algorithm


Step 1: Construction of the MIS-tree
The algorithm constructs the MIS-tree using the user-specified items' MIS values.
Figure: MIS-tree constructed after scanning every transaction in the database.

Step 2: Construction of the Compact MIS-tree
Using the least minimum support, CFP-growth++ prunes all items that cannot generate any frequent pattern at any higher order.
Figure: MIS-tree after completely pruning the items 'g' and 'h'. Note that 'g' is not pruned in CFP-growth.
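A sketch of this tighter criterion. In the CFP-growth++ literature the "least minimum support" (LMS) is the lowest MIS value among the items that are themselves frequent; that definition is treated as an assumption here, since the slide does not spell it out, and the function names are illustrative.

```python
def least_minimum_support(support, mis):
    """Lowest MIS value among the frequent items, i.e. the items whose
    support reaches their own MIS value (assumed definition of LMS)."""
    return min((mis[i] for i, s in support.items() if s >= mis[i]),
               default=float("inf"))

def prune_with_lms(support, mis):
    """CFP-growth++ style pruning: items whose support is below LMS cannot
    take part in any frequent pattern at any higher order."""
    lms = least_minimum_support(support, mis)
    return {item for item, s in support.items() if s < lms}
```

The justification: any frequent pattern must contain at least one item that is itself frequent (the item supplying the pattern's minimum MIS), so every item of a frequent pattern has support at least LMS. On the running example this is why 'g' is removed here even though the plain CFP-growth criterion keeps it.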

Step 2 (continued): Construction of the Compact MIS-tree
Using infrequent leaf node pruning, the leaf nodes of infrequent items are pruned from the MIS-tree. The resultant tree is known as the compact MIS-tree.
Figure: Compact MIS-tree generated after infrequent leaf node pruning.
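A sketch of the leaf-level pruning described above, reusing the Node class from the MIS-tree sketch earlier; whether the pruning is applied once or repeated until no infrequent-item leaf remains is treated as an implementation choice here.

```python
def prune_infrequent_leaves(root, infrequent_items):
    """Remove leaf nodes whose item is infrequent: in an MIS-descending tree,
    a branch ending in an infrequent item cannot yield a frequent pattern
    with that item as its deepest (lowest-MIS) element."""
    changed = True
    while changed:                              # removing a leaf may expose another
        changed = False
        stack = [root]
        while stack:
            node = stack.pop()
            for item, child in list(node.children.items()):
                if not child.children and item in infrequent_items:
                    del node.children[item]     # drop the infrequent leaf
                    changed = True
                else:
                    stack.append(child)
    return root
```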

Step 3: Mining the Compact MIS-tree
Using the conditional minimum support and the conditional closure property, the compact MIS-tree is mined with conditional pattern bases to discover the complete set of frequent patterns.
Figure: Mining the compact MIS-tree using conditional pattern bases.

Experimental Results
Table 4: Dataset characteristics.
The percentage-based methodology is used for specifying items' MIS values, with LS = minsup = 0.1, β = 1/α, and α varied from 1 to 20.

Experiment 1: Generation of Frequent Patterns
Figure: Generation of frequent patterns in different datasets.

Experiment 2: Runtime Requirements
Figure: Runtime taken by various algorithms in different datasets.

Experiment 3: Scalability Test
Parameters: β = 0.5 and LS = 0.1
Experimental procedure
– Dataset: Kosarak
– The dataset is divided into five portions of 0.2 million transactions each.
– The portions are added cumulatively, one part at a time.
Figure: Runtime taken by different algorithms.
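A small sketch of the incremental procedure described above: the database is split into equal portions and each portion is appended cumulatively before re-running the miner. The dataset variable and the runner callback are placeholders, not names from the slides.

```python
def cumulative_portions(transactions, parts=5):
    """Yield increasingly large prefixes of the dataset: the first portion,
    then the first two portions together, and so on."""
    size = len(transactions) // parts
    for k in range(1, parts + 1):
        yield transactions[: k * size]

# Illustrative use (kosarak_transactions and run_miner are placeholders):
# for db in cumulative_portions(kosarak_transactions, parts=5):
#     run_miner(db)    # time the algorithms on 0.2M, 0.4M, ... transactions
```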

Summary of the Contributions

Topic: Specifying items' MIS values
– Existing methodology: Percentage-based methodology
– Performance problem: Causes the rare item problem, as it does not maintain a uniform difference between items' support and MIS values
– Proposed methodology: Support-difference-based methodology

Topic: Patterns do not satisfy the downward closure property
– Existing methodology: CFP-growth
– Performance problem: (1) The constructed tree is not efficient. (2) The search space is huge, as the algorithm searches using items that cannot generate any frequent pattern at a higher order
– Proposed methodology: CFP-growth++, which uses "least minimum support" and "infrequent leaf node pruning" to construct the tree effectively and, in addition, "conditional minimum support" and the "conditional closure property" to effectively reduce the search space

Summary of the Contributions (continued)

Topic: Mining frequent patterns in databases of widely varying items' frequencies
– Existing methodology: Multiple minimum support framework (not sufficient for databases of widely varying items' frequencies)
– Performance problem: Generates uninteresting frequent patterns containing both very high and very low frequency items; the items within the pattern are not correlated
– Proposed methodology: The framework is extended with a new interestingness measure, "item-to-pattern difference", to prune such uninteresting frequent patterns

Topic: Periodic-frequent pattern mining
– Existing methodology: Single minimum support and single maximum periodicity framework
– Performance problem: The rare item problem
– Proposed methodology: (1) A multiple minimum supports and maximum periodicity framework; (2) a pattern-growth algorithm

Outline
Introduction
– Frequent patterns and the Rare Item Problem
– Multiple Minimum Support Framework
– Issues with Multiple Minimum Support Framework
Related Work
Proposed Approaches
– Methodology to specify items' MIS values
– An algorithm to mine frequent patterns effectively
– Mining frequent patterns in databases in which items' frequencies vary widely
– Mining rare periodic-frequent patterns
Conclusions and Future Work

Conclusions and Future Work


References