Association Analysis (3)


Alternative Methods for Frequent Itemset Generation
Traversal of the itemset lattice: general-to-specific vs. specific-to-general (top-down vs. bottom-up).
Apriori is general-to-specific: from the frequent (k-1)-itemsets it creates candidate k-itemsets (more "specific").
Specific-to-general search is useful for finding maximal frequent itemsets: if a k-itemset in the lattice is frequent, then none of its (k-1)-subsets can be maximal, so they need not be examined.
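The general-to-specific step Apriori uses can be sketched as a candidate-generation join: two frequent (k-1)-itemsets that share their first k-2 items are merged, and any candidate with an infrequent (k-1)-subset is pruned. A minimal sketch, with itemsets as sorted tuples (`apriori_gen` is an illustrative name, not a library function):

```python
from itertools import combinations

def apriori_gen(freq_km1):
    """Join frequent (k-1)-itemsets (sorted tuples) that share their
    first k-2 items, then prune candidates with an infrequent subset."""
    freq = set(freq_km1)
    candidates = set()
    for a in freq_km1:
        for b in freq_km1:
            # join step: same prefix, differing last item (a < b avoids duplicates)
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                cand = a + (b[-1],)
                # prune step: every (k-1)-subset must itself be frequent
                if all(s in freq for s in combinations(cand, len(cand) - 1)):
                    candidates.add(cand)
    return candidates

# Frequent 2-itemsets over items a-d; only ('a','b','c') survives the prune
f2 = [('a', 'b'), ('a', 'c'), ('b', 'c'), ('b', 'd')]
print(sorted(apriori_gen(f2)))  # [('a', 'b', 'c')]
```

('b','c','d') is generated by the join but pruned because its subset ('c','d') is not frequent.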

Alternative Methods for Frequent Itemset Generation
Equivalence classes: search within one equivalence class first, before moving to another.
The Apriori algorithm (implicitly) partitions itemsets into equivalence classes based on their length (same length, same class).
Alternatively, we can partition (implicitly) according to the prefix or suffix labels of the itemsets.

Alternative Methods for Frequent Itemset Generation
Breadth-first vs. depth-first traversal of the itemset lattice.
The Apriori algorithm traverses the lattice breadth-first: it first discovers all the frequent 1-itemsets, then the frequent 2-itemsets, and so on.
The lattice can also be traversed depth-first. An algorithm starts from, say, node a and counts its support to determine whether it is frequent. If so, it progressively expands the next level of nodes, i.e., ab, abc, and so on, until an infrequent node is reached, say, abcd. It then backtracks to another branch, say, abce, and continues from there.
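The depth-first traversal just described can be sketched as a recursion that extends a prefix one item at a time and backtracks at the first infrequent extension. This is a naive sketch with a linear-scan support count; the function names are illustrative, not from any library:

```python
def support(itemset, transactions):
    """Count transactions (sets) that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def dfs_frequent(prefix, remaining, transactions, minsup, out):
    """Depth-first lattice walk: extend the prefix by one item at a time;
    backtrack as soon as an extension is infrequent (anti-monotonicity)."""
    for i, item in enumerate(remaining):
        cand = prefix | {item}
        if support(cand, transactions) >= minsup:
            out.append(frozenset(cand))
            # expand deeper only with items after `item`, to avoid duplicates
            dfs_frequent(cand, remaining[i + 1:], transactions, minsup, out)

transactions = [frozenset('abc'), frozenset('abd'), frozenset('ab'), frozenset('cd')]
result = []
dfs_frequent(frozenset(), list('abcd'), transactions, 2, result)
print(sorted(''.join(sorted(s)) for s in result))  # ['a', 'ab', 'b', 'c', 'd']
```

As in the slide's description, the branch below ab is abandoned as soon as abc and abd both prove infrequent.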

The depth-first approach is often used by algorithms designed to find maximal frequent itemsets, because it allows the frequent-itemset border to be detected more quickly. Once a maximal frequent itemset is found, substantial pruning can be performed on its subsets. E.g., if bcde is maximal frequent, then the algorithm does not have to visit the subtrees rooted at bd, be, c, d, and e, because they cannot contain any maximal frequent itemsets.

Alternative Methods for Frequent Itemset Generation
Representation of the database: horizontal vs. vertical data layout.
Apriori uses the horizontal layout: each transaction stores its list of items.
In the vertical layout, each item stores the list of IDs of the transactions that contain it (its TID-list). ECLAT uses the vertical layout.
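Converting the horizontal layout into the vertical one is a single pass over the transactions. A minimal sketch (`to_vertical` is an illustrative name):

```python
from collections import defaultdict

def to_vertical(transactions):
    """Convert horizontal transactions (tid -> set of items) into the
    vertical layout (item -> set of tids) used by ECLAT."""
    tidlists = defaultdict(set)
    for tid, items in transactions.items():
        for item in items:
            tidlists[item].add(tid)
    return dict(tidlists)

horizontal = {1: {'A', 'B'}, 2: {'B', 'C'}, 3: {'A', 'B', 'C'}}
print(to_vertical(horizontal))  # {'A': {1, 3}, 'B': {1, 2, 3}, 'C': {2, 3}}
```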

ECLAT
Determine the support of any k-itemset by intersecting the TID-lists of two of its (k-1)-subsets.
Advantage: very fast support counting.
Disadvantage: the intermediate TID-lists may become too large for memory.
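The intersection idea can be sketched as a small recursive miner: each extension's TID-list is the intersection of the current itemset's list with another item's list, and its support is simply that list's size. This is a simplified sketch of the ECLAT idea, not a reference implementation:

```python
def eclat(prefix, items_with_tids, minsup, out):
    """ECLAT sketch: the support of an extended itemset is the size of
    the intersection of TID-lists; recurse depth-first over extensions."""
    while items_with_tids:
        item, tids = items_with_tids.pop(0)
        if len(tids) >= minsup:
            out[tuple(prefix + [item])] = len(tids)
            # conditional vertical database: intersect with remaining items
            suffix = [(other, tids & otids) for other, otids in items_with_tids]
            eclat(prefix + [item], suffix, minsup, out)

vertical = [('A', {1, 3, 4}), ('B', {1, 2, 3}), ('C', {2, 3, 4})]
supports = {}
eclat([], vertical, 2, supports)
print(supports)
# {('A',): 3, ('A', 'B'): 2, ('A', 'C'): 2, ('B',): 3, ('B', 'C'): 2, ('C',): 3}
```

Note how {A,B,C} is never counted: its TID-list {3} falls below minsup during the intersection, illustrating the pruning.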

FP-Tree/FP-Growth Algorithm
Use a compressed representation of the database, the FP-tree. Once the FP-tree has been constructed, a recursive divide-and-conquer approach mines the frequent itemsets.
Building the FP-tree:
Scan the data once to determine the support count of each item. Infrequent items are discarded; the frequent items are sorted in decreasing order of support count.
Make a second pass over the data to construct the FP-tree. As each transaction is read, its items are sorted according to the above order before the transaction is inserted.
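The two preparatory passes can be sketched as follows (illustrative names; ties in support are broken alphabetically here, which is one possible convention):

```python
from collections import Counter

def preprocess(transactions, minsup):
    """First pass: count item supports and drop infrequent items.
    Second pass: reorder each transaction by decreasing support."""
    counts = Counter(item for t in transactions for item in t)
    frequent = {i: c for i, c in counts.items() if c >= minsup}
    # rank items: higher support first, ties broken alphabetically
    order = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {item: r for r, item in enumerate(order)}
    sorted_tx = [sorted((i for i in t if i in rank), key=rank.get)
                 for t in transactions]
    return frequent, sorted_tx

tx = [{'A', 'C', 'E', 'B', 'F', 'O'}, {'A', 'C', 'G'}, {'E', 'I'}]
freq, stx = preprocess(tx, 2)
print(freq)  # {'A': 2, 'C': 2, 'E': 2}
print(stx)   # [['A', 'C', 'E'], ['A', 'C'], ['E']]
```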

First scan – determine the frequent 1-itemsets, then build the header table:
B: 8
A: 7
C: 7
D: 5
E: 3

FP-tree construction
(Figure: after reading TID=1, the tree is the single path null → B:1 → A:1.)

FP-Tree Construction
(Figure: the transaction database and the resulting FP-tree, with the header table chained into the tree.)
Chain pointers help in quickly finding all the paths of the tree containing some given item.

FP-Tree Size
The size of an FP-tree is typically smaller than the size of the uncompressed data, because many transactions often share items in common.
Best-case scenario: all transactions have the same set of items, and the FP-tree contains only a single branch of nodes.
Worst-case scenario: every transaction has a unique set of items. Since no transactions have any items in common, the size of the FP-tree is effectively the same as the size of the original data.
The size of an FP-tree also depends on how the items are ordered. If the ordering scheme in the preceding example is reversed, i.e., from lowest to highest support, the resulting FP-tree is likely to be denser (shown in the next slide). Not always, though: the ordering is just a heuristic.

An FP-tree representation of the data set under a different item-ordering scheme.

FP-Growth (I)
FP-growth generates frequent itemsets from an FP-tree by exploring the tree in a bottom-up fashion. Given the example tree, the algorithm looks for frequent itemsets ending in E first, followed by D, C, A, and finally B.
Since every transaction is mapped onto a path in the FP-tree, we can derive the frequent itemsets ending with a particular item, say E, by examining only the paths containing node E. These paths can be accessed rapidly using the pointers associated with node E.

Paths containing node E
(Figure: the full FP-tree on the left; on the right, only the prefix paths that end in an E node.)

Conditional FP-Tree for E
We now need to build the conditional FP-tree for E, the tree for the itemsets ending in E. It is not simply the tree from the previous slide with the E nodes deleted. Why? Because the order of the items changes: in this example, C has a higher count than B in the conditional database.

Conditional FP-Tree for E
(Figure: the set of paths containing E with a new header table; each path, after truncating E, is inserted into a new tree, giving the conditional FP-tree for E.)
Adding up the counts for D we get 2, so {E,D} is a frequent itemset. We continue recursively.
Base of the recursion: when the tree consists of a single path only.

FP-Tree: Another Example
Transactions:
A B C E F O
A C G
E I
A C D E G
A C E G L
E J
A B C E F P
A C D
A C E G M
A C E G N

Frequent 1-itemsets (supp. count ≥ 2): A:8, C:8, E:8, G:5, B:2, D:2, F:2

Transactions with items sorted by decreasing frequency, ignoring the infrequent items:
A C E B F
A C G
E
A C E G D
A C E G
E
A C E B F
A C D
A C E G
A C E G

FP-Tree after reading transactions 1 through 10
(Figures: ten slides show the FP-tree growing as each sorted transaction is read, with the header table A:8, C:8, E:8, G:5, B:2, D:2, F:2 chained into the tree. After the 10th transaction the tree consists of the main path A:8 → C:8 → E:6, where E:6 has children G:4 (with D:1 below it) and B:2 → F:2; C:8 also has children G:1 and D:1; and a separate path E:2 hangs directly under the root.)

Conditional FP-Trees
Build the conditional FP-tree for each of the items. To do so:
Find the paths containing the focus item; these paths form the conditional FP-tree for that item.
Read the tree again to determine the new counts of the items along those paths.
Build a new header.
Insert the paths into the conditional FP-tree according to the new order.
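Putting the pieces together, here is a compact, simplified sketch of FP-growth (not the algorithm's exact pseudocode): it rebuilds a conditional tree from each item's prefix paths, as the steps above describe, and recurses with the item appended to the suffix.

```python
from collections import Counter, defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}

def build_tree(transactions, minsup):
    """One pass to count supports, a second to insert each transaction
    with its frequent items sorted by decreasing support."""
    counts = Counter(i for t in transactions for i in t)
    order = {i: (-c, i) for i, c in counts.items() if c >= minsup}
    root, header = Node(None, None), defaultdict(list)
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in order), key=order.get):
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += 1
    return header

def fpgrowth(transactions, minsup, suffix=()):
    result = {}
    for item, nodes in build_tree(transactions, minsup).items():
        itemset = (item,) + suffix
        result[itemset] = sum(n.count for n in nodes)
        # conditional pattern base: the prefix path of every `item` node,
        # repeated once per occurrence of that node
        base = []
        for n in nodes:
            path, p = [], n.parent
            while p.item is not None:
                path.append(p.item)
                p = p.parent
            base.extend([path] * n.count)
        result.update(fpgrowth(base, minsup, itemset))
    return result

freq = fpgrowth([list('ACEBF'), list('ACG'), ['E'], list('ACEGD')], 2)
print(freq[('A', 'C')], freq[('A', 'C', 'E')])  # 3 2
```

Rebuilding the conditional tree from scratch on each recursion keeps the sketch short; real implementations avoid the repeated path materialization.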

Conditional FP-Tree for F
There is only a single path containing F.
(Figure: after truncating F and updating the counts, the conditional FP-tree for F is the single path A:2 → C:2 → E:2 → B:2, with the new header A:2, C:2, E:2, B:2.)

Recursion
We continue recursively on the conditional FP-tree for F. However, a tree that is just a single path is the base case for the recursion, so we simply produce all the subsets of the items on this path, merged with F:
{F}, {A,F}, {C,F}, {E,F}, {B,F}, {A,C,F}, …, {A,C,E,B,F}
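The base case can be sketched directly: enumerate every subset of the single path's items and append the suffix (`single_path_itemsets` is an illustrative helper name):

```python
from itertools import chain, combinations

def single_path_itemsets(path_items, suffix):
    """Base case of FP-growth: when the conditional tree is a single path,
    every subset of the path's items, merged with the suffix, is frequent."""
    subsets = chain.from_iterable(
        combinations(path_items, r) for r in range(len(path_items) + 1))
    return [s + suffix for s in subsets]

# The path A-C-E-B with suffix item F, as in the slide's example
for itemset in single_path_itemsets(('A', 'C', 'E', 'B'), ('F',)):
    print(itemset)  # ('F',), ('A', 'F'), ..., ('A', 'C', 'E', 'B', 'F')
```

With 4 items on the path this yields 2^4 = 16 itemsets, all ending in F.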

Conditional FP-Tree for D
(Figure: the paths containing D, after updating the counts, give a conditional tree with the new header A:2, C:2.)
The other items on those paths (E and G, each with count 1) are removed as infrequent. The resulting tree is just a single path, the base case for the recursion, so we produce all the subsets of the items on this path merged with D:
{D}, {A,D}, {C,D}, {A,C,D}
Exercise: complete the example.