Efficiently Mining Long Patterns from Databases Roberto J. Bayardo Jr. IBM Almaden Research Center.

Slides:



Advertisements
Similar presentations
Mining Association Rules
Advertisements

Recap: Mining association rules from large datasets
Huffman Codes and Asssociation Rules (II) Prof. Sin-Min Lee Department of Computer Science.
Association Analysis (2). Example TIDList of item ID’s T1I1, I2, I5 T2I2, I4 T3I2, I3 T4I1, I2, I4 T5I1, I3 T6I2, I3 T7I1, I3 T8I1, I2, I3, I5 T9I1, I2,
Frequent Itemset Mining Methods. The Apriori algorithm Finding frequent itemsets using candidate generation Seminal algorithm proposed by R. Agrawal and.
Data Mining Techniques Association Rule
Data Mining (Apriori Algorithm)DCS 802, Spring DCS 802 Data Mining Apriori Algorithm Spring of 2002 Prof. Sung-Hyuk Cha School of Computer Science.
Mining Multiple-level Association Rules in Large Databases
FP-Growth algorithm Vasiljevic Vladica,
FP (FREQUENT PATTERN)-GROWTH ALGORITHM ERTAN LJAJIĆ, 3392/2013 Elektrotehnički fakultet Univerziteta u Beogradu.
1 of 25 1 of 45 Association Rule Mining CIT366: Data Mining & Data Warehousing Instructor: Bajuna Salehe The Institute of Finance Management: Computing.
Data Mining Association Analysis: Basic Concepts and Algorithms
Kuo-Yu HuangNCU CSIE DBLab1 The Concept of Maximal Frequent Itemsets NCU CSIE Database Laboratory Kuo-Yu Huang
Rakesh Agrawal Ramakrishnan Srikant
Association Analysis. Association Rule Mining: Definition Given a set of records each of which contain some number of items from a given collection; –Produce.
Mining Frequent Itemsets from Uncertain Data Presented by Chun-Kit Chui, Ben Kao, Edward Hung Department of Computer Science, The University of Hong Kong.
Data Mining Techniques So Far: Cluster analysis K-means Classification Decision Trees J48 (C4.5) Rule-based classification JRIP (RIPPER) Logistic Regression.
Data Mining Association Analysis: Basic Concepts and Algorithms Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction.
1 Efficient Algorithms for Mining Share-Frequent Itemsets Authors: Y. C. Li, J. S. Yeh and C. C. Chang Speaker: Yu-Chiang Li Date :July 28, 2005.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Association Analysis (2). Example TIDList of item ID’s T1I1, I2, I5 T2I2, I4 T3I2, I3 T4I1, I2, I4 T5I1, I3 T6I2, I3 T7I1, I3 T8I1, I2, I3, I5 T9I1, I2,
Data Mining Association Analysis: Basic Concepts and Algorithms
Association Analysis: Basic Concepts and Algorithms.
Data Mining Association Analysis: Basic Concepts and Algorithms
1 ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis) Association Rule Mining Olivia R. Liu Sheng, Ph.D. Emma Eccles Jones Presidential.
© Vipin Kumar CSci 8980 Fall CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance Computing Research Center Department of Computer.
Association Rule Mining - MaxMiner. Mining Association Rules in Large Databases  Association rule mining  Algorithms Apriori and FP-Growth  Max and.
2/8/00CSE 711 data mining: Apriori Algorithm by S. Cha 1 CSE 711 Seminar on Data Mining: Apriori Algorithm By Sung-Hyuk Cha.
Fast Algorithms for Association Rule Mining
Mining Sequences. Examples of Sequence Web sequence:  {Homepage} {Electronics} {Digital Cameras} {Canon Digital Camera} {Shopping Cart} {Order Confirmation}
Mining Association Rules
SEG Tutorial 2 – Frequent Pattern Mining.
Mining Association Rules between Sets of Items in Large Databases presented by Zhuang Wang.
Sequential PAttern Mining using A Bitmap Representation
Data Mining Frequent-Pattern Tree Approach Towards ARM Lecture
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining By Tan, Steinbach, Kumar Lecture.
Modul 7: Association Analysis. 2 Association Rule Mining  Given a set of transactions, find rules that will predict the occurrence of an item based on.
Data Mining Association Analysis Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/
Pattern-Growth Methods for Sequential Pattern Mining Iris Zhang
An Efficient Polynomial Delay Algorithm for Pseudo Frequent Itemset Mining 2/Oct/2007 Discovery Science 2007 Takeaki Uno (National Institute of Informatics)
Mining Frequent Itemsets from Uncertain Data Presenter : Chun-Kit Chui Chun-Kit Chui [1], Ben Kao [1] and Edward Hung [2] [1] Department of Computer Science.
Association Rule Mining
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Association Analysis This lecture node is modified based on Lecture Notes for.
1 AC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery Advisor : Dr. Koh Jia-Ling Speaker : Tu Yi-Lang Date : Hong.
Mining Frequent Patterns, Associations, and Correlations Compiled By: Umair Yaqub Lecturer Govt. Murray College Sialkot.
M. Sulaiman Khan Dept. of Computer Science University of Liverpool 2009 COMP527: Data Mining ARM: Improvements March 10, 2009 Slide.
Association Analysis (3)
Mining Sequential Patterns © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 Slides are adapted from Introduction to Data Mining by Tan, Steinbach,
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Δ-Tolerance Closed Frequent Itemsets James Cheng,Yiping Ke,and Wilfred Ng ICDM ’ 06 報告者:林靜怡 2007/03/15.
Reducing Number of Candidates Apriori principle: – If an itemset is frequent, then all of its subsets must also be frequent Apriori principle holds due.
CS685 : Special Topics in Data Mining, UKY The UNIVERSITY of KENTUCKY Association Rule Mining CS 685: Special Topics in Data Mining Jinze Liu.
1 Data Mining Lecture 6: Association Analysis. 2 Association Rule Mining l Given a set of transactions, find rules that will predict the occurrence of.
Chapter 3 Data Mining: Classification & Association Chapter 4 in the text box Section: 4.3 (4.3.1),
Reducing Number of Candidates
Data Mining Association Analysis: Basic Concepts and Algorithms
A Research Oriented Study Report By :- Akash Saxena
Association rule mining
Association Rules Repoussis Panagiotis.
Frequent Pattern Mining
The Concept of Maximal Frequent Itemsets
CARPENTER Find Closed Patterns in Long Biological Datasets
Market Basket Analysis and Association Rules
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining Association Analysis: Basic Concepts and Algorithms
A Parameterised Algorithm for Mining Association Rules
Data Mining Association Analysis: Basic Concepts and Algorithms
Association Rule Mining
COMP5331 FP-Tree Prepared by Raymond Wong Presented by Raymond Wong
Presentation transcript:

Efficiently Mining Long Patterns from Databases Roberto J. Bayardo Jr. IBM Almaden Research Center

Abstract Max-Miner : scale roughly linearly in the number of maximal patterns, irrespective of the length of the longest pattern previous algorithms : scale exponentially with longest pattern length

Introduction (Cont’d) pattern mining algorithms have been developed to operate on databases where the longest patterns are relatively short Interesting data-sets with long patterns –sales transactions detailing the purchases made by regular customers over a large time window –biological data from the fields of DNA and protein analysis

Introduction (Cont’d) Apriori-like algorithms are inadequate on data-sets with long patterns a bottom-up search –enumerates every single frequent itemset –exponential complexity in order to produce a frequent itemset of length l, it must produce all 2 l of its subsets since they too must be frequent –restricts Apriori-like algorithms to discovering only short patterns

Introduction Max-Miner algorithm –for efficiently extracting only the maximal frequent itemsets –roughly linear in the number of maximal frequent itemsets –“look ahead”, not bottom-up search –can prune all its subsets from consideration, by identifying a long frequent itemset early on

Max-Miner (Cont’d) Rymon’s generic set-enumeration tree search frame work ex. Figure 1. breadth-first search –in order to limit the number of passes pruning strategies –subset infrequency pruning (as does Apriori) –superset frequency pruning

Figure 1. Complete set-enumeration tree over four items { } ,21,31,42,32,43,4 1,2,31,3,42,3,4 1,2,3,4

Max-Miner (Cont’d) candidate group g, –head, h(g) represents the itemsets enumerated by the node –tail, t(g) an ordered set contains all items not in h(g) that can potentially appear in any sub-node –ex. the node enumerating itemset {1}  h(g) = {1}, t(g) = {2, 3, 4}

Max-Miner counting the support of a candidate group g, –computing the support of itemsets h(g), h(g)  t(g) and h(g)  {i} for all i  t(g) superset-frequency pruning –halting sub-node expansion at any candidate group g for which h(g)  t(g) is frequent subset-infrequency pruning –removing any such tail item from candidate group before expanding its sub-nodes

Figure 2. Max-Miner at its top level MAX-MINER (Data-set T) ;; Returns the set of maximal frequent itemsets present in T Set of Candidate Groups C  { } Set of Itemsets F  {GEN-INITIAL-GROUP(T, C)} while C is non-empty do scan T to count the support of all candidate groups in C for each g  C such that h(g)  t(g) is frequent do F  F  {h(g)  t(g)} Set of Candidate Groups C new  { } for each g  C such that h(g)  t(g) is infrequent do F  F  {GEN-SUB-NODES(g, C new )} C  C new remove from F any itemset with a proper superset in F remove from C any group g such that h(g)  t(g) has a superset in F return F

Figure 3. Generating the initial candidate groups GEN-INITIAL-GROUPS (Data-set T, Set of Candidate Groups C) ;; C is passed by reference and returns the candidate groups ;; The return value of the function is a frequent 1-itemset scan T to obtain F 1, the set of frequent 1-itemsets impose an ordering on the items in F 1 for each item i in F 1 other than the greatest item do let g be a new candidate with h(g) = {i} and t(g) = {j | j follows i in the ordering} C  C  {g} return the itemset in F 1 containing the greatest item

Figure 4. Generating sub-nodes GEN-SUB-NODES (Candidate Group g, Set of Cand. Groups C) ;; C is passed by reference and returns the sub-nodes of g ;; The return value of the function is a frequent itemset remove any item i from t(g) if h(g)  {i} is infrequent reorder the items in t(g) for each i  t(g) other than greatest do let g’ be a new candidate with h(g’) = h(g)  {i} and t(g’) = { j | j  t(g) and j follows i in t(g)} C  C  {g’} return h(g)  {m} where m is the greatest item in t(g), or h(g) if t(g) is empty

Item Ordering Policies to increase the effectiveness of superset- frequency pruning to position the most frequent items last ordering : Gen-Initial-Group –in increasing order of sup({i}) reordering : Gen-Sub-Nodes –in increasing order of sup(h(g)  {i}) consider only the subset of transactions relevant to the given node