SEG4630 2009-2010 Tutorial 2 – Frequent Pattern Mining.

Slides:

Advertisements

Similar presentations

Mining Association Rules

Advertisements

Graph Mining Laks V.S. Lakshmanan

732A02 Data Mining - Clustering and Association Analysis ………………… Jose M. Peña FP grow algorithm Correlation analysis.

FP-Growth algorithm Vasiljevic Vladica,

Data Mining Association Analysis: Basic Concepts and Algorithms

CPS : Information Management and Mining

Association Analysis. Association Rule Mining: Definition Given a set of records each of which contain some number of items from a given collection; –Produce.

Data Mining Techniques So Far: Cluster analysis K-means Classification Decision Trees J48 (C4.5) Rule-based classification JRIP (RIPPER) Logistic Regression.

Data Mining Association Analysis: Basic Concepts and Algorithms Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction.

Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,

Data Mining: Association Rule Mining CSE880: Database Systems.

Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,

732A02 Data Mining - Clustering and Association Analysis ………………… Jose M. Peña Association rules Apriori algorithm FP grow algorithm.

Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,

Data Mining Association Analysis: Basic Concepts and Algorithms

Mining Association Rules in Large Databases

FP-growth. Challenges of Frequent Pattern Mining Improving Apriori Fp-growth Fp-tree Mining frequent patterns with FP-tree Visualization of Association.

Data Mining Association Analysis: Basic Concepts and Algorithms

1 Mining Frequent Patterns Without Candidate Generation Apriori-like algorithm suffers from long patterns or quite low minimum support thresholds. Two.

Mining Frequent patterns without candidate generation Jiawei Han, Jian Pei and Yiwen Yin.

Association Analysis: Basic Concepts and Algorithms.

Association Rule Mining. Generating assoc. rules from frequent itemsets  Assume that we have discovered the frequent itemsets and their support  How.

Data Mining Association Analysis: Basic Concepts and Algorithms

Chapter 4: Mining Frequent Patterns, Associations and Correlations

FPtree/FPGrowth. FP-Tree/FP-Growth Algorithm Use a compressed representation of the database using an FP-tree Then use a recursive divide-and-conquer.

© Vipin Kumar CSci 8980 Fall CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance Computing Research Center Department of Computer.

Frequent-Pattern Tree. 2 Bottleneck of Frequent-pattern Mining  Multiple database scans are costly  Mining long patterns needs many passes of scanning.

Association Analysis (3). FP-Tree/FP-Growth Algorithm Use a compressed representation of the database using an FP-tree Once an FP-tree has been constructed,

1 1 Data Mining: Concepts and Techniques (3 rd ed.) — Chapter 6 — Jiawei Han, Micheline Kamber, and Jian Pei University of Illinois at Urbana-Champaign.

Mining Frequent Patterns without Candidate Generation Presented by Song Wang. March 18 th, 2009 Data Mining Class Slides Modified From Mohammed and Zhenyu’s.

Jiawei Han, Jian Pei, and Yiwen Yin School of Computing Science Simon Fraser University Mining Frequent Patterns without Candidate Generation SIGMOD 2000.

AR mining Implementation and comparison of three AR mining algorithms Xuehai Wang, Xiaobo Chen, Shen chen CSCI6405 class project.

Data Mining: Concepts and Techniques (3rd ed.) — Chapter 6 —

Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar.

Data Mining Frequent-Pattern Tree Approach Towards ARM Lecture

Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,

EFFICIENT ITEMSET EXTRACTION USING IMINE INDEX By By U.P.Pushpavalli U.P.Pushpavalli II Year ME(CSE) II Year ME(CSE)

Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining By Tan, Steinbach, Kumar Lecture.

Modul 7: Association Analysis. 2 Association Rule Mining  Given a set of transactions, find rules that will predict the occurrence of an item based on.

Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,

Mining Frequent Patterns without Candidate Generation.

Chapter 4 – Frequent Pattern Mining Shuaiqiang Wang ( 王帅强 ) School of Computer Science and Technology Shandong University of Finance and Economics Homepage:

Mining Frequent Patterns without Candidate Generation : A Frequent-Pattern Tree Approach 指導教授：廖述賢博士報告人：朱佩慧班級：管科所博一.

Frequent Pattern  交易資料庫中頻繁的被一起購買的產品  可以做為推薦產品、銷售決策的依據  兩大演算法 Apriori FP-Tree.

Parallel Mining Frequent Patterns: A Sampling-based Approach Shengnan Cong.

CSE4334/5334 DATA MINING CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai.

1 Inverted Matrix: Efficient Discovery of Frequent Items in Large Datasets in the Context of Interactive Mining -SIGKDD’03 Mohammad El-Hajj, Osmar R. Zaïane.

1 Data Mining: Mining Frequent Patterns, Association and Correlations.

Association Analysis (3)

What is Frequent Pattern Analysis?

Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,

Reducing Number of Candidates Apriori principle: – If an itemset is frequent, then all of its subsets must also be frequent Apriori principle holds due.

1 Data Mining Lecture 6: Association Analysis. 2 Association Rule Mining l Given a set of transactions, find rules that will predict the occurrence of.

Association Rule Mining CENG 514 Data Mining July 2,

DATA MINING: ASSOCIATION ANALYSIS (2) Instructor: Dr. Chun Yu School of Statistics Jiangxi University of Finance and Economics Fall 2015.

Data Mining Association Analysis: Basic Concepts and Algorithms

Data Mining: Concepts and Techniques

Data Mining: Concepts and Techniques

Frequent Pattern Mining

Data Mining Association Analysis: Basic Concepts and Algorithms

Data Mining Association Analysis: Basic Concepts and Algorithms

Mining Association Rules in Large Databases

Data Mining Association Analysis: Basic Concepts and Algorithms

COMP5331 FP-Tree Prepared by Raymond Wong Presented by Raymond Wong

Association Analysis: Basic Concepts and Algorithms

732A02 Data Mining - Clustering and Association Analysis

Mining Frequent Patterns without Candidate Generation

Frequent-Pattern Tree

FP-Growth Wenlong Zhang.

Mining Association Rules in Large Databases

Presentation transcript:

SEG Tutorial 2 – Frequent Pattern Mining

2 Frequent Patterns  Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set itemset: A set of one or more items k-itemset: X = {x 1, …, x k } Mining algorithms  Apriori  FP-growth TidItems bought 10Beer, Nuts, Diaper 20Beer, Coffee, Diaper 30Beer, Diaper, Eggs 40Nuts, Eggs, Milk 50Nuts, Coffee, Diaper, Eggs, Beer

3 Support & Confidence  Support (absolute) support, or, support count of X: Frequency or occurrence of an itemset X (relative) support, s, is the fraction of transactions that contains X (i.e., the probability that a transaction contains X) An itemset X is frequent if X’s support is no less than a minsup threshold  Confidence (association rule: X  Y ) sup(XY)/sup(x) (conditional prob.: Pr(Y|X) = Pr(X^Y)/Pr(X) ) confidence, c, conditional probability that a transaction having X also contains Y Find all the rules X  Y with minimum support and confidence  sup(XY) ≥ minsup  sup(XY)/sup(X) ≥ minconf

4 Apriori Principle  If an itemset is frequent, then all of its subsets must also be frequent  If an itemset is infrequent, then all of its supersets must be infrequent too frequent infrequent (X  Y) (¬Y  ¬X)

5 Apriori: A Candidate Generation & Test Approach  Initially, scan DB once to get frequent 1- itemset  Loop Generate length (k+1) candidate itemsets from length k frequent itemsets Test the candidates against DB Terminate when no frequent or candidate set can be generated

6 Generate candidate itemsets  Example Frequent 3-itemsets: {1, 2, 3}, {1, 2, 4}, {1, 2, 5}, {1, 3, 4}, {1, 3, 5}, {2, 3, 4}, {2, 3, 5} and {3, 4, 5} Candidate 4-itemset: {1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 4, 5}, {1, 3, 4, 5}, {2, 3, 4, 5} Which need not to be counted? {1, 2, 4, 5} & {1, 3, 4, 5} & {2, 3, 4, 5}

7 Maximal vs Closed Frequent Itemsets  An itemset X is a max-pattern if X is frequent and there exists no frequent super-pattern Y כ X  An itemset X is closed if X is frequent and there exists no super-pattern Y כ X, with the same support as X Closed Frequent Itemsets are Lossless: the support for any frequent itemset can be deduced from the closed frequent itemsets

8 Maximal vs Closed Frequent Itemsets # Closed = 9 # Maximal = 4 Closed and maximal frequent Closed but not maximal minsup=2

9 Algorithms to find frequent pattern  Apriori: uses a generate-and-test approach – generates candidate itemsets and tests if they are frequent Generation of candidate itemsets is expensive (in both space and time) Support counting is expensive  Subset checking (computationally expensive)  Multiple Database scans (I/O)  FP-Growth: allows frequent itemset discovery without candidate generation. Two step: 1.Build a compact data structure called the FP-tree  2 passes over the database 2.extracts frequent itemsets directly from the FP-tree  Traverse through FP-tree

10 Pattern-Growth Approach: Mining Frequent Patterns Without Candidate Generation  The FP-Growth Approach Depth-first search (Apriori: Breadth-first search) Avoid explicit candidate generation Fp-tree construatioin: Scan DB once, find frequent 1-itemset (single item pattern) Sort frequent items in frequency descending order, f-list Scan DB again, construct FP- tree FP-Growth approach: For each frequent item, construct its conditional pattern-base, and then its conditional FP-tree Repeat the process on each newly created conditional FP-tree Until the resulting FP-tree is empty, or it contains only one path—single path will generate all the combinations of its sub-paths, each of which is a frequent pattern

11 FP-tree Size  The size of an FPtree is typically smaller than the size of the uncompressed data because many transactions often share a few items in common Bestcase scenario: All transactions have the same set of items, and the FPtree contains only a single branch of nodes. Worstcase scenario: Every transaction has a unique set of items. As none of the transactions have any items in common, the size of the FPtree is effectively the same as the size of the original data.  The size of an FPtree also depends on how the items are ordered

12 Example  FP-tree with item descending ordering  FP-tree with item ascending ordering

13 Find Patterns Having p From P-conditional Database  Starting at the frequent item header table in the FP-tree  Traverse the FP-tree by following the link of each frequent item p  Accumulate all of transformed prefix paths of item p to form p’s conditional pattern base Conditional pattern bases itemcond. pattern base cf:3 afc:3 bfca:1, f:1, c:1 mfca:2, fcab:1 pfcam:2, cb:1 {} f:4c:1 b:1 p:1 b:1c:3 a:3 b:1m:2 p:2m:1 Header Table Item frequency head f4 c4 a3 b3 m3 p3

14 + p + m + b + a FP-Growth

15 + p + m+ b+ a f: 1,2,3,5 (1) (2) (3)(4) (5) (6) + c FP-Growth

16 {} f:4c:1 b:1 p:1 b:1c:3 a:3 b:1m:2 p:2m:1 {} f:2c:1 b:1 p:1 c:2 a:2 m:2 {} f:3 c:3 a:3 b:1 {} f:2c:1 a:1 {} f:3 c:3 {} f:3 +p+p +m+m +b+b +a+a +c+c f:4 (1) (2) (3)(4)(5)(6)

17 + p + m+ b+ a f: 1,2,3,5 + p p: 3 cp: 3 + m m: 3 fm: 3 cm: 3 am: 3 fcm: 3 fam: 3 cam: 3 fcam: 3 b: 3 f: 4 a: 3 fa: 3 ca: 3 fca: 3 c: 4 fc: 3 + c min_sup = 3