Frequent Item Mining. What is data mining? Is it pattern mining? What patterns? Why are they useful?

Presentation transcript:

Frequent Item Mining

What is data mining? Is it pattern mining? What patterns? Why are they useful?

Definition: Frequent Itemset
Itemset
– A collection of one or more items, e.g. {Milk, Bread, Diaper}
– k-itemset: an itemset that contains k items
Support count (σ)
– Frequency of occurrence of an itemset
– E.g. σ({Milk, Bread, Diaper}) = 2
Support (s)
– Fraction of transactions that contain an itemset
– E.g. s({Milk, Bread, Diaper}) = 2/5
Frequent itemset
– An itemset whose support is greater than or equal to a minsup threshold
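The definitions above can be sketched in a few lines of Python. The five-transaction table used here is an assumption: the slide's own table is not part of the transcript, so a small market-basket table was chosen that reproduces the quoted values σ = 2 and s = 2/5. The function names are illustrative only.

```python
# Hypothetical market-basket table (an assumption, not taken from the slide).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def support_count(itemset, transactions):
    """sigma(X): number of transactions that contain every item of X."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """s(X): fraction of transactions that contain X."""
    return support_count(itemset, transactions) / len(transactions)


def is_frequent(itemset, transactions, minsup):
    """X is frequent when s(X) >= minsup."""
    return support(itemset, transactions) >= minsup


x = {"Milk", "Bread", "Diaper"}
print(support_count(x, transactions), support(x, transactions))
```

With this table the itemset is frequent for a minsup of 0.4 but not for a minsup of 0.5.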

Frequent Itemsets Mining
TID    Transaction
100    {A, B, E}
200    {B, D}
300    {A, B, E}
400    {A, C}
500    {B, C}
600    {A, C}
700    {A, B}
800    {A, B, C, E}
900    {A, B, C}
1000   {A, C, E}
Minimum support level 50%
– Frequent itemsets: {A}, {B}, {C}, {A,B}, {A,C}
How to link this to Data Cube?

Three Different Views of FIM
Transactional database
– How do we store a transactional database? Horizontal, vertical, or as transaction-item pairs
Binary matrix
Bipartite graph
How is FIM formulated in these different settings?
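One way to read the three views is sketched below, using the ten-transaction database from the previous slide. The variable names are illustrative, and the formulations in the comments are one plausible answer to the slide's question, not necessarily the one given in the lecture.

```python
# Horizontal layout: each transaction id maps to its items.
DB = {
    100: {"A", "B", "E"}, 200: {"B", "D"}, 300: {"A", "B", "E"},
    400: {"A", "C"}, 500: {"B", "C"}, 600: {"A", "C"}, 700: {"A", "B"},
    800: {"A", "B", "C", "E"}, 900: {"A", "B", "C"}, 1000: {"A", "C", "E"},
}
items = sorted(set().union(*DB.values()))

# Transaction-item pairs; these are also the edges of the bipartite graph
# between transaction nodes and item nodes.
pairs = sorted((tid, item) for tid, t in DB.items() for item in t)

# Vertical layout: each item maps to the list of tids that contain it.
vertical = {i: sorted(tid for tid, t in DB.items() if i in t) for i in items}

# Binary matrix: one row per transaction, one column per item.
matrix = [[1 if i in DB[tid] else 0 for i in items] for tid in sorted(DB)]

# Support count of {A, B} in each view.
horizontal_count = sum(1 for t in DB.values() if {"A", "B"} <= t)
vertical_count = len(set(vertical["A"]) & set(vertical["B"]))
a, b = items.index("A"), items.index("B")
matrix_count = sum(1 for row in matrix if row[a] and row[b])
print(horizontal_count, vertical_count, matrix_count)
```

In the matrix view an itemset is a set of columns and its support count is the number of rows that are all ones on those columns; in the graph view it is the number of transaction nodes adjacent to every chosen item node.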

Frequent Itemset Generation
Given d items, there are $2^d$ possible candidate itemsets

Frequent Itemset Generation
Brute-force approach:
– Each itemset in the lattice is a candidate frequent itemset
– Count the support of each candidate by scanning the database
– Match each transaction against every candidate
– Complexity ~ $O(NMw)$, where N is the number of transactions, M the number of candidates and w the maximum transaction width; this is expensive since $M = 2^d$
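A minimal sketch of the brute-force approach, run on the ten-transaction database shown earlier in the deck. The function name and return shape are assumptions made for illustration. It enumerates every non-empty itemset in the lattice and matches it against every transaction, which is exactly why the cost grows with $2^d$.

```python
from itertools import combinations

DB = [
    {"A", "B", "E"}, {"B", "D"}, {"A", "B", "E"}, {"A", "C"}, {"B", "C"},
    {"A", "C"}, {"A", "B"}, {"A", "B", "C", "E"}, {"A", "B", "C"},
    {"A", "C", "E"},
]


def brute_force_frequent(transactions, minsup):
    """Return ({itemset: count}, number of candidates examined)."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    frequent = {}
    candidates = 0
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            candidates += 1
            # One pass over the database for every single candidate.
            count = sum(1 for t in transactions if set(cand) <= t)
            if count / n >= minsup:
                frequent[cand] = count
    return frequent, candidates


frequent, examined = brute_force_frequent(DB, 0.5)
print(examined, sorted(frequent))
```

With d = 5 items the lattice has $2^5 - 1 = 31$ non-empty candidates, and at 50% minimum support only {A}, {B}, {C}, {A,B} and {A,C} survive.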

Reducing Number of Candidates
Apriori principle:
– If an itemset is frequent, then all of its subsets must also be frequent
The Apriori principle holds due to the following property of the support measure:
– The support of an itemset never exceeds the support of its subsets
– This is known as the anti-monotone property of support
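The anti-monotone property can be checked empirically on a small database. The sketch below reuses the ten transactions from the earlier slide; the helper names are hypothetical. It compares every itemset with each of its immediate subsets, which is enough because the property then extends to all subsets by transitivity.

```python
from itertools import combinations

DB = [
    {"A", "B", "E"}, {"B", "D"}, {"A", "B", "E"}, {"A", "C"}, {"B", "C"},
    {"A", "C"}, {"A", "B"}, {"A", "B", "C", "E"}, {"A", "B", "C"},
    {"A", "C", "E"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)


def check_anti_monotone(transactions):
    """True if no itemset has a larger count than any immediate subset."""
    items = sorted(set().union(*transactions))
    for k in range(2, len(items) + 1):
        for y in combinations(items, k):
            count_y = support_count(y, transactions)
            for x in combinations(y, k - 1):
                if support_count(x, transactions) < count_y:
                    return False
    return True


print(check_anti_monotone(DB))
```

Apriori uses the contrapositive: once an itemset is found to be infrequent, none of its supersets needs to be counted.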

Illustrating Apriori Principle
[Figure: itemset lattice in which one itemset is found to be infrequent and its supersets are pruned]

Illustrating Apriori Principle
[Tables: support counts for items (1-itemsets), pairs (2-itemsets) and triplets (3-itemsets); minimum support = 3]
– No need to generate candidates involving Coke or Eggs
– If every subset is considered: $\binom{6}{1} + \binom{6}{2} + \binom{6}{3} = 6 + 15 + 20 = 41$ candidates
– With support-based pruning: $6 + 6 + 1 = 13$ candidates
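The two candidate counts on this slide can be reproduced with a short calculation. The sketch assumes six items in total, of which Coke and Eggs are infrequent, so pairs are built from the four remaining items. The single triplet candidate is taken from the slide's total of 13 rather than derived, and is marked as such in the code.

```python
from math import comb

n_items = 6

# Without pruning: every itemset of size 1, 2 or 3 is a candidate.
exhaustive = sum(comb(n_items, k) for k in (1, 2, 3))

# With support-based pruning: all 6 items are counted once, pairs are
# generated only from the 4 frequent items (Coke and Eggs are dropped).
frequent_items = n_items - 2
pair_candidates = comb(frequent_items, 2)
triplet_candidates = 1  # assumption: implied by the slide's total of 13
pruned = n_items + pair_candidates + triplet_candidates

print(exhaustive, pruned)
```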

Apriori
R. Agrawal and R. Srikant. Fast algorithms for mining association rules. VLDB, 1994.

How to Generate Candidates?
Suppose the items in L_{k-1} are listed in an order
Step 1: self-joining L_{k-1}
    insert into C_k
    select p.item_1, p.item_2, …, p.item_{k-1}, q.item_{k-1}
    from L_{k-1} p, L_{k-1} q
    where p.item_1 = q.item_1, …, p.item_{k-2} = q.item_{k-2}, p.item_{k-1} < q.item_{k-1}
Step 2: pruning
    forall itemsets c in C_k do
        forall (k-1)-subsets s of c do
            if (s is not in L_{k-1}) then delete c from C_k
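The join and prune steps translate fairly directly into Python when itemsets are stored as sorted tuples. The sketch below is one possible rendering, not the reference implementation from the paper; `apriori_gen` and `apriori` are illustrative names, and the driver counts support with a plain database scan per level.

```python
from itertools import combinations


def apriori_gen(prev_frequent):
    """Build C_k from L_{k-1}; itemsets are sorted tuples."""
    prev = sorted(prev_frequent)
    prev_set = set(prev)
    candidates = []
    for i, p in enumerate(prev):
        for q in prev[i + 1:]:
            # Step 1 (self-join): equal on the first k-2 items,
            # and the last item of p precedes the last item of q.
            if p[:-1] == q[:-1] and p[-1] < q[-1]:
                c = p + (q[-1],)
                # Step 2 (prune): every (k-1)-subset must be in L_{k-1}.
                if all(s in prev_set for s in combinations(c, len(c) - 1)):
                    candidates.append(c)
    return candidates


def apriori(transactions, min_count):
    """Level-wise search; returns {itemset tuple: support count}."""
    items = sorted(set().union(*transactions))
    level = {}
    for i in items:
        n = sum(1 for t in transactions if i in t)
        if n >= min_count:
            level[(i,)] = n
    frequent = dict(level)
    while level:
        next_level = {}
        for c in apriori_gen(level):
            n = sum(1 for t in transactions if set(c) <= t)
            if n >= min_count:
                next_level[c] = n
        frequent.update(next_level)
        level = next_level
    return frequent


DB = [
    {"A", "B", "E"}, {"B", "D"}, {"A", "B", "E"}, {"A", "C"}, {"B", "C"},
    {"A", "C"}, {"A", "B"}, {"A", "B", "C", "E"}, {"A", "B", "C"},
    {"A", "C", "E"},
]
print(sorted(apriori(DB, 5)))
```

For example, from L_3 = {abc, abd, acd, ace, bcd} the join produces abcd and acde, and pruning removes acde because ade is not in L_3.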

Challenges of Frequent Itemset Mining
Challenges
– Multiple scans of the transaction database
– Huge number of candidates
– Tedious workload of support counting for candidates
Improving Apriori: general ideas
– Reduce the number of transaction database scans
– Shrink the number of candidates
– Facilitate support counting of candidates

Alternative Methods for Frequent Itemset Generation
Representation of database
– Horizontal vs vertical data layout

ECLAT
For each item, store a list of transaction ids (tids): the TID-list

ECLAT
Determine the support of any k-itemset by intersecting the tid-lists of two of its (k-1)-subsets.
3 traversal approaches:
– top-down, bottom-up and hybrid
Advantage: very fast support counting
Disadvantage: intermediate tid-lists may become too large for memory
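A compact sketch of the tid-list idea follows. It uses a depth-first traversal over prefix classes, which is only one of the possible traversal orders, and it is not meant to mirror any particular published implementation. The function name and the dictionary-based input format are assumptions.

```python
def eclat(transactions, min_count):
    """transactions: {tid: set of items}. Returns {itemset tuple: count}."""
    # Build the vertical layout: item -> set of tids.
    tidsets = {}
    for tid, items in transactions.items():
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # candidates: (item, tidset) pairs, each frequent together with prefix.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix + (item,)
            frequent[itemset] = len(tids)
            # Support of a longer itemset = size of the tidset intersection.
            longer = []
            for other, other_tids in candidates[i + 1:]:
                common = tids & other_tids
                if len(common) >= min_count:
                    longer.append((other, common))
            extend(itemset, longer)

    start = [(i, t) for i, t in sorted(tidsets.items()) if len(t) >= min_count]
    extend((), start)
    return frequent


DB = {
    100: {"A", "B", "E"}, 200: {"B", "D"}, 300: {"A", "B", "E"},
    400: {"A", "C"}, 500: {"B", "C"}, 600: {"A", "C"}, 700: {"A", "B"},
    800: {"A", "B", "C", "E"}, 900: {"A", "B", "C"}, 1000: {"A", "C", "E"},
}
print(sorted(eclat(DB, 5)))
```

No database scan happens after the vertical layout is built; the price is that the intermediate tidsets have to be kept in memory during the recursion.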

FP-growth Algorithm
Uses a compressed representation of the database, the FP-tree
Once an FP-tree has been constructed, it uses a recursive divide-and-conquer approach to mine the frequent itemsets

FP-tree construction
[Figure: the FP-tree after reading TID=1 (null, A:1, B:1) and after reading TID=2 (nodes A:1, B:1, C:1, D:1)]

FP-Tree Construction
[Figure: transaction database, header table and the resulting FP-tree; node labels include A:7, B:5, B:3, C:3, C:1, D:1, E:1]
Pointers are used to assist frequent itemset generation
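FP-tree construction can be sketched as follows. Because the transaction database behind the figure is not part of the transcript, the example uses the ten-transaction A to E database from the earlier slide instead. The class and function names are illustrative, items are ordered by descending count with alphabetical tie-breaking (an assumption), and the `next` field plays the role of the pointers mentioned on the slide.

```python
from collections import Counter


class FPNode:
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}
        self.next = None  # link to another node carrying the same item


def build_fp_tree(transactions, min_count):
    """Return (root, header), where header maps item -> first linked node."""
    counts = Counter(item for t in transactions for item in t)
    kept = [i for i in counts if counts[i] >= min_count]
    rank = {i: r for r, i in enumerate(sorted(kept, key=lambda i: (-counts[i], i)))}
    root = FPNode(None, None)
    header = {}
    for t in transactions:
        node = root
        # Drop infrequent items and insert the rest in global rank order,
        # so that transactions sharing a prefix share a path.
        for item in sorted((i for i in t if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                child.next = header.get(item)
                header[item] = child
            child.count += 1
            node = child
    return root, header


def chain_total(header, item):
    """Sum the counts along an item's node links (its support count)."""
    total, node = 0, header.get(item)
    while node is not None:
        total += node.count
        node = node.next
    return total


DB = [
    {"A", "B", "E"}, {"B", "D"}, {"A", "B", "E"}, {"A", "C"}, {"B", "C"},
    {"A", "C"}, {"A", "B"}, {"A", "B", "C", "E"}, {"A", "B", "C"},
    {"A", "C", "E"},
]
root, header = build_fp_tree(DB, 5)
print(root.children["A"].count, chain_total(header, "C"))
```

With a minimum count of 5 only A, B and C are kept, and the ten transactions collapse into six tree nodes below the root.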

FP-growth
[Figure: the FP-tree paths that end in D]
Conditional pattern base for D:
P = {(A:1,B:1,C:1), (A:1,B:1), (A:1,C:1), (A:1), (B:1,C:1)}
Recursively apply FP-growth on P
Frequent itemsets found (with sup > 1): AD, BD, CD, ABD, ACD, BCD
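The recursion on the conditional pattern base for D can be illustrated without building a tree. The sketch below is a simplification: it keeps each pattern base as a plain list of weighted prefix paths instead of a compressed conditional FP-tree, but it follows the same divide-and-conquer structure. The function name and argument layout are assumptions.

```python
from collections import Counter


def mine_pattern_base(pattern_base, suffix, min_count, order):
    """pattern_base: list of (prefix path tuple, count), paths in `order`.

    Returns {itemset tuple: count} for frequent itemsets ending in suffix.
    """
    counts = Counter()
    for path, n in pattern_base:
        for item in path:
            counts[item] += n
    found = {}
    for item in order:
        if counts[item] < min_count:
            continue
        new_suffix = (item,) + suffix
        found[new_suffix] = counts[item]
        # Conditional pattern base for `item`: the part of each path
        # that precedes it, carrying the same count.
        cond = []
        for path, n in pattern_base:
            if item in path:
                prefix = path[:path.index(item)]
                if prefix:
                    cond.append((prefix, n))
        found.update(mine_pattern_base(cond, new_suffix, min_count, order))
    return found


# Conditional pattern base for D, as given on the slide.
P = [(("A", "B", "C"), 1), (("A", "B"), 1), (("A", "C"), 1),
     (("A",), 1), (("B", "C"), 1)]
result = mine_pattern_base(P, ("D",), 2, ["A", "B", "C"])
print(sorted(result.items()))
```

With sup > 1 (a minimum count of 2) the sketch returns AD, BD, CD, ABD, ACD and BCD; ABD qualifies because the paths (A:1,B:1,C:1) and (A:1,B:1) together give it a count of 2.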

Compact Representation of Frequent Itemsets
Some itemsets are redundant because they have the same support as their supersets
The number of frequent itemsets can be very large, so a compact representation is needed

Maximal Frequent Itemset
An itemset is maximal frequent if none of its immediate supersets is frequent
[Figure: itemset lattice showing the border between frequent and infrequent itemsets, with the maximal itemsets marked]

Closed Itemset
An itemset is closed if none of its immediate supersets has the same support as the itemset
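Both definitions can be applied mechanically once all frequent itemsets and their counts are known. The sketch below is illustrative: it obtains the frequent itemsets by brute force on the ten-transaction A to E database from the earlier slide, with a minimum count of 2 chosen for the example, and then filters them. Only frequent supersets need to be inspected, because an infrequent superset cannot have the same support as a frequent itemset.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Brute-force enumeration; returns {frozenset: support count}."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            n = sum(1 for t in transactions if set(cand) <= t)
            if n >= min_count:
                result[frozenset(cand)] = n
    return result


def closed_and_maximal(frequent):
    """Split frequent itemsets into closed ones and maximal ones."""
    closed, maximal = set(), set()
    for x, count_x in frequent.items():
        # Immediate supersets of x that are themselves frequent.
        supersets = [y for y in frequent if len(y) == len(x) + 1 and x < y]
        if not supersets:
            maximal.add(x)
        if all(frequent[y] != count_x for y in supersets):
            closed.add(x)
    return closed, maximal


DB = [
    {"A", "B", "E"}, {"B", "D"}, {"A", "B", "E"}, {"A", "C"}, {"B", "C"},
    {"A", "C"}, {"A", "B"}, {"A", "B", "C", "E"}, {"A", "B", "C"},
    {"A", "C", "E"},
]
closed, maximal = closed_and_maximal(frequent_itemsets(DB, 2))
print(len(closed), sorted("".join(sorted(m)) for m in maximal))
```

Every maximal frequent itemset is also closed, so the maximal itemsets form a subset of the closed ones. In this database {E} is not closed because {A, E} has the same support count of 4.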

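Maximal vs Closed Itemsets
[Figure: itemset lattice annotated with transaction ids; some itemsets are not supported by any transaction]

Maximal vs Closed Frequent Itemsets
[Figure: the same lattice with minimum support = 2, marking itemsets that are closed and maximal and itemsets that are closed but not maximal; # closed = 9, # maximal = 4]

Maximal vs Closed Itemsets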

Beyond Itemsets
Sequence mining
– Finding frequent subsequences from a collection of sequences
Graph mining
– Finding frequent (connected) subgraphs from a collection of graphs
Tree mining
– Finding frequent (embedded) subtrees from a set of trees/graphs
Geometric structure mining
– Finding frequent substructures from 3-D or 2-D geometric graphs
Among others…

Frequent Pattern Mining
[Figure: example structures with nodes labeled A to F]

Why Is Frequent Pattern Mining So Important?
Application domains
– Business, biology, chemistry, WWW, computer/networking security, …
Summarizing the underlying datasets, providing key insights
Basic tools for other data mining tasks
– Association rule mining
– Classification
– Clustering
– Change detection
– etc.

Network motifs: recurring patterns that occur significantly more often than in randomized networks
Do motifs have specific roles in the network?
Many possible distinct subgraphs

The 13 three-node connected subgraphs

199 4-node directed connected subgraphs
And the number grows fast for larger subgraphs: … 5-node subgraphs, 1,530,843 6-node subgraphs…

Finding network motifs: an overview
Generation of a suitable random ensemble (reference networks)
Network motifs detection process:
– Count how many times each subgraph appears
– Compute the statistical significance for each subgraph: the probability of appearing in random networks as often as in the real network (P-value or Z-score)

[Figure: the real network next to an ensemble of randomized networks]
Real = 5, Rand = 0.5 ± 0.6, Z-score (number of standard deviations) = 7.5
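The detection process can be sketched for a single subgraph type. The example below counts feed-forward loops (a→b, b→c, a→c) and compares the real count with an ensemble of random graphs. Two simplifications are assumed here: the ensemble consists of uniformly random directed graphs with the same number of nodes and edges, whereas motif studies usually use degree-preserving randomization, and graphs are assumed to have no self-loops. All function names are hypothetical.

```python
import random
from statistics import mean, pstdev


def count_ffl(edges):
    """Count feed-forward loops a->b, b->c, a->c in a directed graph."""
    edge_set = set(edges)
    successors = {}
    for a, b in edge_set:
        successors.setdefault(a, set()).add(b)
    total = 0
    for a, b in edge_set:
        for c in successors.get(b, ()):
            if c != a and (a, c) in edge_set:
                total += 1
    return total


def random_graph(n_nodes, n_edges, rng):
    """Uniform random directed graph without self-loops (a simplification)."""
    possible = [(a, b) for a in range(n_nodes) for b in range(n_nodes) if a != b]
    return rng.sample(possible, n_edges)


def z_from_stats(real, rand_mean, rand_std):
    """Number of standard deviations between real and random counts."""
    return (real - rand_mean) / rand_std


def motif_z_score(edges, n_nodes, n_samples=200, seed=0):
    rng = random.Random(seed)
    counts = [count_ffl(random_graph(n_nodes, len(edges), rng))
              for _ in range(n_samples)]
    sigma = pstdev(counts)
    if sigma == 0:
        return float("inf")  # the ensemble shows no variation at all
    return z_from_stats(count_ffl(edges), mean(counts), sigma)


# The numbers quoted on the slide: Real = 5, Rand = 0.5 +/- 0.6.
print(z_from_stats(5, 0.5, 0.6))
```

The slide's figures give (5 - 0.5) / 0.6 = 7.5 standard deviations.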

References
R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. SIGMOD, 1993.
R. Agrawal and R. Srikant. Fast algorithms for mining association rules. VLDB, 1994.
R. J. Bayardo. Efficiently mining long patterns from databases. SIGMOD, 85-93, 1998.

References:
Christian Borgelt. Efficient Implementations of Apriori and Eclat. FIMI’03.
Ferenc Bodon. A fast APRIORI implementation. FIMI’03.
Ferenc Bodon. A Survey on Frequent Itemset Mining. Technical Report, Budapest University of Technology and Economics, 2006.

Important websites:
– FIMI workshop: not only Apriori and FIM, but also FP-tree, ECLAT, closed and maximal itemsets
– Christian Borgelt’s website
– Ferenc Bodon’s website