1 Association Graphs Selim Mimaroglu University of Massachusetts Boston

2 This Presentation is Built On:  A Graph-Based Approach for Discovering Various Types of Association Rules (MAIN) by Show-Jane Yen and A.L.P. Chen, IEEE Transactions on Knowledge and Data Engineering, 2001  Mining Multiple-Level Association Rules from Large Databases by Jiawei Han and Yongjian Fu, VLDB ’95  Mining Multilevel Association Rules from Transaction Databases, section 6.3, ISBN  Mining Generalized Association Rules by R. Srikant and R. Agrawal, VLDB ’95  Introduction to Data Mining by Tan, Steinbach, Kumar, ISBN:

3 Organization  Primitive Association Rules Definition Association Graphs for finding large (frequent) itemsets  Multiple-Level Association Rules Definition Why is it important? Association Graphs for finding large itemsets Interest Measure  Generalized Association Rules Definition Association Graphs for finding large itemsets

4 Association Rules  Discovering patterns from a large database (generally a data warehouse) is computationally expensive  The goal is to find all rules X → Y that satisfy minsupport and minconf  X → Y means that transactions in the database that contain the items in X tend to also contain the items in Y.

5 Definitions  Let I = {i1, i2, …, id} be the set of all items in the market basket data  Let T = {t1, t2, …, tN} be the set of all transactions  A collection of items is termed an itemset

Binary (bit-matrix) representation of the transactions:
TID  i1  i2  …  id
t1    1   0  …   1
t2    1   0  …   0
…
tN    1   1  …   1

6 Definitions Let X and Y be two disjoint itemsets  Support Count of X: σ(X) = the number of transactions that contain every item of X  Support of X → Y: s(X → Y) = σ(X ∪ Y) / N  Confidence of X → Y: c(X → Y) = σ(X ∪ Y) / σ(X)
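A minimal sketch of these three definitions, not from the original slides: transactions are represented as Python sets and the toy basket data is purely illustrative.

```python
# Support count, support, and confidence computed directly from a transaction list.
def support_count(itemset, transactions):
    """sigma(X): number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def support(X, Y, transactions):
    """support(X -> Y) = sigma(X u Y) / N"""
    return support_count(X | Y, transactions) / len(transactions)

def confidence(X, Y, transactions):
    """confidence(X -> Y) = sigma(X u Y) / sigma(X)"""
    return support_count(X | Y, transactions) / support_count(X, transactions)

transactions = [{"bread", "milk"}, {"bread", "beer"},
                {"milk", "beer"}, {"bread", "milk", "beer"}]
X, Y = {"bread"}, {"milk"}
print(support(X, Y, transactions), confidence(X, Y, transactions))   # 0.5  0.666...
```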

7 Taxonomy (or Concept Hierarchy)  This is created by the domain expert, e.g. the store manager.  There may be more than one taxonomy for the same items  is-a relation  “Folgers Coffee Classic Roast Singles - 19 ct” (barcode ) is-a “Regular Ground Coffee”  “Regular Ground Coffee” is-a “Coffee”  “Coffee” is-a “Beverage”  “Beverage” is-a “General Grocery Item”

8 Taxonomy Example [figure: example taxonomy tree]

9 Primitive Association Rules  Deals with the lowest-level items of the taxonomy (most widely used)  Very well studied: algorithms using the Apriori principle, FP-Growth algorithms  Steps: large (frequent) itemset generation, then rule generation

10 Frequent Itemset Generation (Apriori) [figure: itemset lattice — once an itemset is found to be infrequent, all of its supersets are pruned]

11 Motivation  Many database scans are needed to check whether the candidate itemsets qualify (i.e., support ≥ minsupport)  Reducing the number of database scans: FP-Growth: 2 scans; Association Graphs: 1 scan?

12 Mining Primitive Association Rules Create Bit Vectors [table: transaction database with columns TID, A, B, C, D, E] Number the attributes: A:1, B:2, C:3, D:4, E:5 [table: bit vectors BV1–BV5, one per item; the values are not preserved in this transcript]

13 Bit Vectors (Column Store)  This is a column store (as opposed to a row store)  Read efficient (row stores are write efficient)  For an item or itemset, a 64-bit processor can combine 64 rows with a single logical AND/OR instruction; the support count is the number of 1s in the result minsupport = 50% (2 transactions) Is item 1 (column A) frequent? Yes (count the 1s in BV1) Is the itemset {1, 3} frequent? Yes (count the 1s in BV{1,3} = BV1 ∧ BV3)
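A minimal sketch of this column-store support counting, assuming each item's bit vector is stored as a Python int (bit i = 1 iff transaction i contains the item). The original transaction table is not preserved in this transcript, so the toy data below is hypothetical; it is chosen only so that its large 2-itemsets match Example 2 on slide 18.

```python
# Bit-vector (column-store) support counting.
transactions = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
item_number = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}   # attribute numbering from the slide
N = len(transactions)
minsupport = 2                                           # 50% of 4 transactions

def bitvector(item):
    """Bit i is set iff transaction i contains the item."""
    bv = 0
    for row, t in enumerate(transactions):
        if item in t:
            bv |= 1 << row
    return bv

BV = {num: bitvector(name) for name, num in item_number.items()}

def support_count(itemset):
    """Popcount of the AND of the members' bit vectors (Property 1 on slide 14)."""
    bv = (1 << N) - 1
    for i in itemset:
        bv &= BV[i]
    return bin(bv).count("1")

print(support_count({1}) >= minsupport)      # is item 1 (column A) frequent?  True
print(support_count({1, 3}) >= minsupport)   # is itemset {1, 3} frequent?     True
```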

14 Association Graph Construction Property 1: The support count for the itemset {i1, i2, …, ik} is the number of 1s in BVi1 ∧ BVi2 ∧ … ∧ BVik, where the notation “∧” is the logical AND operator AGC (Association Graph Construction): For every two large items i and j (i < j), if the number of 1s in BVi ∧ BVj reaches the user-specified minimum support, a directed edge from item i to item j is created (the association graph is a directed acyclic graph; proven in Appendix A). The itemset (i, j) is then also a large 2-itemset
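A rough sketch of the AGC step under the bit-vector representation: a directed edge i → j (i < j) is added whenever the popcount of BVi ∧ BVj reaches minsupport. The bit-vector values below are the same hypothetical ones used in the previous sketch.

```python
# Association graph construction over hypothetical bit vectors.
BV = {1: 0b0101, 2: 0b1110, 3: 0b0111, 4: 0b0001, 5: 0b1110}
minsupport = 2

def ones(bv):
    return bin(bv).count("1")

large_items = sorted(i for i, bv in BV.items() if ones(bv) >= minsupport)

graph = {i: [] for i in large_items}            # adjacency lists of the DAG
for i in large_items:
    for j in large_items:
        if i < j and ones(BV[i] & BV[j]) >= minsupport:
            graph[i].append(j)                  # (i, j) is also a large 2-itemset

print(graph)   # {1: [3], 2: [3, 5], 3: [5], 5: []}
```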

15 Example 1: [figure: example bit vectors BV1–BV5 and the association graph built from them; the values are not preserved in this transcript]

16 Lemma 1: If an itemset is not a large itemset, then any itemset which contains this itemset cannot be a large itemset (proof: the Apriori principle). Lemma 2: For a large itemset (i1, i2, …, ik), if there is no directed edge from ik to an item v, then the itemset (i1, i2, …, ik, v) cannot be a large itemset. Proof of Lemma 2: If there is no edge from ik to v, then (ik, v) is not a large itemset. By Lemma 1, none of the supersets of (ik, v) can be a large itemset; in particular, (i1, i2, …, ik, v) is not a large itemset.

17 Finding All Large Itemsets by Using an Association Graph  Suppose (i1, i2, …, ik) is a large k-itemset. If there is no directed edge from ik to an item v, then the itemset need not be extended into a (k+1)-itemset, because (i1, i2, …, ik, v) cannot be a large itemset according to Lemma 2  If there is a directed edge from ik to an item u, then the itemset (i1, i2, …, ik) is extended into the (k+1)-itemset (i1, i2, …, ik, u). The itemset (i1, i2, …, ik, u) is a large (k+1)-itemset if the number of 1s in BVi1 ∧ BVi2 ∧ … ∧ BVik ∧ BVu reaches minsupport Fig 1: Association Graph
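A sketch of this extension procedure (not the paper's code): a large k-itemset is only extended along the outgoing edges of its last item, and each candidate is verified with one more AND over bit vectors. The bit vectors and graph are the hypothetical ones from the previous sketches.

```python
# Traverse the association graph to enumerate all large itemsets of size >= 2.
BV = {1: 0b0101, 2: 0b1110, 3: 0b0111, 5: 0b1110}    # large items only (hypothetical)
graph = {1: [3], 2: [3, 5], 3: [5], 5: []}           # association graph from the AGC sketch
minsupport = 2

def ones(bv):
    return bin(bv).count("1")

def all_large_itemsets(graph, BV, minsupport):
    large = []
    # start from the large 2-itemsets, carrying the AND of their bit vectors
    frontier = [((i, j), BV[i] & BV[j]) for i in graph for j in graph[i]]
    while frontier:
        large.extend(itemset for itemset, _ in frontier)
        extended = []
        for itemset, bv in frontier:
            last = itemset[-1]
            for u in graph.get(last, []):           # Lemma 2: only follow edges out of ik
                new_bv = bv & BV[u]
                if ones(new_bv) >= minsupport:      # Property 1: popcount of the AND
                    extended.append((itemset + (u,), new_bv))
        frontier = extended
    return large

print(all_large_itemsets(graph, BV, minsupport))
# [(1, 3), (2, 3), (2, 5), (3, 5), (2, 3, 5)] with the toy data above
```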

18 Example 2: Finding all large itemsets by using the association graph of Example 1, with minsupport 50% (2 rows). Large 2-itemsets: (1, 3), (2, 3), (2, 5), (3, 5) Fig 1: Association Graph

19 Large 3-itemset Generation  Candidate (1, 3, 5): BV(1, 3, 5) = BV1 ∧ BV3 ∧ BV5  Candidate (2, 3, 5): BV(2, 3, 5) = BV2 ∧ BV3 ∧ BV5 [the resulting bit-vector values are not preserved in this transcript]

20 Is it really 1 (one) scan?  Creating the association graph may itself be computationally expensive (every pair of large items is tested), so this really is not just one scan  However, bit vectors make reading extremely efficient  This is almost as good as it gets Fig 2: An itemset lattice (at level 2) Fig 1: An Association Graph

21 Multilevel Association Rules [taxonomy:]
all
  computer
    desktop: IBM, Dell
    laptop: Sony, Toshiba
  software
    educational: Microsoft, …
    financial management: …
  printer
    color: HP, …
    b/w: Sony, …
  computer accessory
    wrist pad: Ergoway
    mouse: Logitech

22 Why Multilevel Association Rules?  Support at the lower concept levels is low, so association rules at higher levels (the bigger picture) can be missed  Minimum-support strategies: Uniform support for all levels Reduced minimum support at lower levels Level-by-level independent thresholds Other approaches

23 Mining Multiple-Level Association Rules  Replace the items/concepts at level k with the concepts at level k-1  Apply an association-rule algorithm to the items at level k-1 [figure: computer → desktop (IBM, Dell), laptop (Sony, Toshiba)]

24 Example 3 for Multilevel Associations  For level 4: generate association graphs on Table 1  Upgrade the items in Table 1 to level 3 (by looking at the taxonomy tree), which gives Table 2

Table 1: Market basket data (at the lowest taxonomy level: level 4)
T1: IBM desktop computer, Sony b/w printer
T2: Microsoft educational software, Microsoft financial software
T3: Logitech mouse, Microsoft financial software
T4: IBM desktop computer, Microsoft financial software
T5: IBM desktop computer

Table 2: Market basket data (at level 3)
T1: desktop computer, b/w printer
T2: educational software, financial software
T3: mouse, financial software
T4: desktop computer, financial software
T5: desktop computer
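A small sketch of the "upgrade" step in this example: each level-4 item in Table 1 is replaced by its level-3 ancestor, read off the taxonomy tree. The parent map below is a hand-built stand-in for that taxonomy.

```python
# Generalize the transactions from level 4 (Table 1) to level 3 (Table 2).
level3_parent = {
    "IBM desktop computer": "desktop computer",
    "Sony b/w printer": "b/w printer",
    "Microsoft educational software": "educational software",
    "Microsoft financial software": "financial software",
    "Logitech mouse": "mouse",
}

table1 = {
    "T1": ["IBM desktop computer", "Sony b/w printer"],
    "T2": ["Microsoft educational software", "Microsoft financial software"],
    "T3": ["Logitech mouse", "Microsoft financial software"],
    "T4": ["IBM desktop computer", "Microsoft financial software"],
    "T5": ["IBM desktop computer"],
}

# Replace each item by its level-3 concept; duplicates within a transaction collapse.
table2 = {tid: sorted({level3_parent[item] for item in items})
          for tid, items in table1.items()}
print(table2)   # reproduces Table 2
```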

25 Redundant Multilevel Rules desktop computer → b/w printer [support = 8%, confidence = 70%] (R1) IBM desktop computer → b/w printer [support = 2%, confidence = 72%] (R2)  A rule is interesting if it has no parent rule, or if it cannot be deduced from its parent rule Suppose that one quarter of all “desktop computers” are “IBM desktop computers”; then we expect the confidence of R2 to be around 70% and the support of R2 to be around 8% × ¼ = 2% R2 is redundant (uninteresting) because it conveys no information beyond R1 All redundant rules should be pruned
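A sketch of this redundancy test, not from the slides: the child rule's expected support is the parent rule's support scaled by the fraction of the parent's antecedent covered by the child's antecedent, and the expected confidence is the parent's confidence. The function name and the slack factor R (here 1.3, i.e. "within 30% of expectation") are illustrative choices, not values from the cited papers.

```python
# Flag a specialized (child) rule as redundant when its observed support and
# confidence do not exceed R times what its parent rule predicts.
def is_redundant(child_supp, child_conf, parent_supp, parent_conf,
                 descendant_fraction, R=1.3):
    expected_supp = parent_supp * descendant_fraction
    expected_conf = parent_conf
    return child_supp <= R * expected_supp and child_conf <= R * expected_conf

# R1: desktop computer -> b/w printer      [support 8%, confidence 70%]
# R2: IBM desktop computer -> b/w printer  [support 2%, confidence 72%]
print(is_redundant(0.02, 0.72, 0.08, 0.70, descendant_fraction=0.25))   # True
```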

26 Mining Generalized Association Rules [figure: the full taxonomy from slide 21 — all; computer (desktop: IBM, Dell; laptop: Sony, Toshiba); software (educational: Microsoft, …; financial management: …); printer (color: HP, …; b/w: Sony, …); computer accessory (wrist pad: Ergoway; mouse: Logitech)]

27 Rule Generation  To generate generalized association patterns one can add all ancestors of each item to every transaction and then apply the basic algorithm  {IBM desktop computer, Sony b/w printer} becomes {IBM desktop computer, desktop computer, computer, Sony b/w printer, b/w printer, printer}  This works, but it is inefficient: if {IBM desktop computer} is a large itemset, then {IBM desktop computer, desktop computer}, {IBM desktop computer, computer} and {desktop computer, computer} are also large itemsets, but they are redundant
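A minimal sketch of this naive extension step. The parent map is a hypothetical fragment of the taxonomy; ancestors() simply walks it up toward the root.

```python
# Extend a transaction with all ancestors of its items.
parent = {
    "IBM desktop computer": "desktop computer",
    "desktop computer": "computer",
    "Sony b/w printer": "b/w printer",
    "b/w printer": "printer",
}

def ancestors(item):
    out = []
    while item in parent:
        item = parent[item]
        out.append(item)
    return out

def extend(transaction):
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item))
    return extended

print(extend({"IBM desktop computer", "Sony b/w printer"}))
# {'IBM desktop computer', 'desktop computer', 'computer',
#  'Sony b/w printer', 'b/w printer', 'printer'}
```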

28 Definition  A generalized association rule is an implication of the form X → Y, where X ⊂ I, Y ⊂ I, X ∩ Y = ∅, and no item in Y is an ancestor of any item in X  X → ancestor(X) is trivially true with 100% confidence and hence redundant

29 Support for an item and its ancestor Lemma 3: The support for an itemset X that contains both an item xi and its ancestor yi is the same as the support for the itemset X − {yi} Proof of Lemma 3: Let X = {x1, …, xi, yi, …, xn}. The support count of X is the number of 1s in BVx1 ∧ … ∧ BVxi ∧ BVyi ∧ … ∧ BVxn. Note that BVxi = BVxi ∧ BVyi, since every transaction that contains xi also contains its ancestor yi. Hence the support of X is the same as the support of X − {yi}, the number of 1s in BVx1 ∧ … ∧ BVxi ∧ … ∧ BVxn

30 Post Order Numbering  Number the items using the Post Order Numbering (PON) method  Lemma 4: For every two items i and j (i < j), if item v is an ancestor of item i but not an ancestor of item j, then v < j [figure: an example concept hierarchy]
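A small sketch of post-order numbering: every node is numbered after all of its children, so an ancestor always receives a larger number than any of its descendants. The tree below is a hypothetical fragment of the taxonomy.

```python
# Post-order numbering (PON) of a taxonomy tree.
taxonomy = {
    "computer": ["desktop", "laptop"],
    "desktop": ["IBM", "Dell"],
    "laptop": ["Sony", "Toshiba"],
}

def post_order_number(root, tree, numbering=None, counter=None):
    if numbering is None:
        numbering, counter = {}, [0]
    for child in tree.get(root, []):          # number all children first
        post_order_number(child, tree, numbering, counter)
    counter[0] += 1
    numbering[root] = counter[0]              # then number the node itself
    return numbering

print(post_order_number("computer", taxonomy))
# {'IBM': 1, 'Dell': 2, 'desktop': 3, 'Sony': 4, 'Toshiba': 5, 'laptop': 6, 'computer': 7}
```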

31 Support for an ancestor  Lemma 5: Suppose items i1, i2, …, im are all the specific descendants of the generalized item in. The bit vector BVin associated with item in is BVi1 ∨ BVi2 ∨ … ∨ BVim, and the number of 1s in this bit vector is the support count of item in, where ∨ stands for the logical OR operation  Lemma 6: If an itemset X is a large itemset, then any itemset generated by replacing an item in itemset X with its ancestor is also a large itemset
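A sketch of Lemma 5 in the same bit-vector style as the earlier sketches: a generalized item's bit vector is the OR of its specific descendants' bit vectors, and its support count is the popcount of that OR. The leaf values are hypothetical.

```python
# Support of a generalized item via OR of its descendants' bit vectors.
def ones(bv):
    return bin(bv).count("1")

leaf_BV = {"IBM": 0b0101, "Dell": 0b0010, "Sony": 0b1000, "Toshiba": 0b0000}

def ancestor_bv(descendants):
    bv = 0
    for d in descendants:
        bv |= leaf_BV[d]                      # logical OR over all specific descendants
    return bv

BV_desktop = ancestor_bv(["IBM", "Dell"])     # 0b0111
print(ones(BV_desktop))                       # support count of "desktop" = 3
```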

32 Creating the Generalized Association Graph  Lemma 7: If (the number of 1s in BVi ∧ BVj) ≥ minsupport, then for each ancestor u of item i and for each ancestor v of item j, (the number of 1s in BVu ∧ BVj) ≥ minsupport and (the number of 1s in BVi ∧ BVv) ≥ minsupport From Lemma 7, if an edge from item i to item j is created, the edges from item i to the ancestors of item j which are not ancestors of item i are also created From Lemma 4, the ancestors of item i which are not ancestors of item j are all numbered less than j. Hence, if an edge from item i to item j is created, the edges from the ancestors of item i, which are not ancestors of j, to item j are also created (the generalized association graph construction procedure)
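A rough sketch of this construction over PON-numbered items, not the paper's code: an edge i → j (i < j) is added when BVi ∧ BVj reaches minsupport, and no edge is created between an item and its ancestor. For simplicity the sketch checks every pair directly instead of propagating edges to ancestors as Lemma 7 allows; all names and values are illustrative.

```python
# Generalized association graph: pairwise bit-vector check with ancestor exclusion.
def ones(bv):
    return bin(bv).count("1")

def build_gag(BV, is_ancestor, minsupport):
    """BV: PON number -> bit vector; is_ancestor(u, v): True iff u is an ancestor of v."""
    large = sorted(i for i, bv in BV.items() if ones(bv) >= minsupport)
    graph = {i: [] for i in large}
    for i in large:
        for j in large:
            if i < j and not is_ancestor(i, j) and not is_ancestor(j, i):
                if ones(BV[i] & BV[j]) >= minsupport:
                    graph[i].append(j)
    return graph

# Tiny example: item 3 is the ancestor of items 1 and 2 (so BV3 = BV1 | BV2).
BV = {1: 0b0101, 2: 0b0011, 3: 0b0111, 4: 0b1110}
anc = {(3, 1), (3, 2)}
print(build_gag(BV, lambda u, v: (u, v) in anc, minsupport=2))
# {1: [], 2: [], 3: [4], 4: []} -- no edges between items 1/2 and their ancestor 3
```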

33 Example of Generalized Association Rules with minsupport 40% (2 transactions) [figure: the generalized association graph]

34 Finally Theorem 1: Any itemset generated by traversing the Generalized Association Graph (GAG) will not contain both an item and its ancestor Proof of Theorem 1: Basis of induction: in the GAG there is no edge between an item and its ancestor, therefore all large 2-itemsets are free of ancestor/descendant pairs Inductive hypothesis: assume that any large k-itemset (i1, i2, …, ik) does not contain both an item and its ancestor Inductive step: a large k-itemset (i1, i2, …, ik) is extended to the (k+1)-itemset (i1, i2, …, ik, w) only along an edge from ik, so w cannot be an ancestor of ik itself (no such edge exists in the GAG). Suppose v1, v2, …, vk-1 are ancestors of i1, i2, …, ik-1 respectively, and none of them is an ancestor of item ik. Because the items are numbered by the PON method, ik > vj (1 ≤ j ≤ k-1) by Lemma 4, so there are no edges from item ik to the ancestors of items i1, i2, …, ik-1. Hence item w cannot be an ancestor of any of i1, i2, …, ik