1 Mining Quantitative Association Rules in Large Relational Database Presented by Jin Jin April 1, 2004.


2 Review Association rules capture interesting association relationships among items in large sets of transactions. An association rule is an expression of the form X => Y, where X and Y are sets of items. Goal of association analysis: find all association rules that satisfy the user-specified minimum support and minimum confidence thresholds.

3 Outline Introduction 5 steps of discovering quantitative association rules Partitioning quantitative attributes Interest Algorithm Conclusion

4 Introduction Boolean Association Rules Problem – finding associations between the “1” values in a relational table where all attributes are boolean. E.g. transactions {A,B,C}, {A,C}, {B,C}, represented as a table with columns TID, A, B, C.

5 Introduction, Cont. Most databases have richer attribute types, e.g. quantitative (Age, Income) and categorical (Zip, Make of Car). Quantitative Association Rules Problem – mining association rules over quantitative and categorical attributes.

6 Mapping the Quantitative Association Rules Problem into the Boolean Association Rules Problem If all attributes are categorical or the quantitative attributes have only a few values, we could map each ⟨attribute, value⟩ pair to a boolean attribute. If the domain of values for a quantitative attribute is large, first partition the values into intervals and then map each ⟨attribute, interval⟩ pair to a boolean attribute.
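The interval-to-boolean mapping described above can be sketched as follows (a minimal illustration, not from the slides; the function name, column-name format, and example intervals are assumptions):

```python
# Sketch: map a quantitative attribute to boolean columns, one per interval.
def to_boolean(records, attribute, intervals):
    """Add one boolean column per <attribute, interval> pair."""
    out = []
    for rec in records:
        mapped = dict(rec)
        for lo, hi in intervals:
            # the boolean column is true when the raw value falls in the interval
            mapped[f"{attribute}: {lo}..{hi}"] = lo <= rec[attribute] <= hi
        out.append(mapped)
    return out

records = [{"Age": 23}, {"Age": 31}]
booleanized = to_boolean(records, "Age", [(20, 29), (30, 39)])
```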

7 Example The original table (RecordID, Age, Married, NumCars) – e.g. record 100: Age 23, Married: No – is mapped to a boolean table whose columns are the mapped pairs: two Age intervals, Married: Yes, Married: No, NumCars: 0, NumCars: …

8 Mapping Problems MinSup – if the number of intervals for a quantitative attribute is large, the support for any single interval can be low. MinConf – information is lost by partitioning values into intervals, and this information loss increases as the intervals become larger.

9 Catch-22 If intervals are too large, rules may not have MinConf. If intervals are too small, rules may not have MinSup. How do we solve it?

10 Solve Catch-22 Consider all possible continuous ranges over the values of the quantitative attribute, or over the partitioned intervals Solve minimum support – combine adjacent intervals/values Solve minimum confidence – increase number of intervals
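Enumerating every contiguous range over a set of base intervals, as described above, can be sketched as follows (illustrative function name and example intervals, not from the slides):

```python
def all_ranges(intervals):
    """Every contiguous combination of adjacent base intervals:
    range (i, j) spans from interval i's lower bound to interval j's upper bound."""
    n = len(intervals)
    return [(intervals[i][0], intervals[j][1])
            for i in range(n) for j in range(i, n)]

# three base intervals yield 3 + 2 + 1 = 6 candidate ranges
ranges = all_ranges([(20, 29), (30, 39), (40, 49)])
```

This quadratic blow-up is exactly the "Exec Time" problem noted on the next slide.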

11 Unfortunately, More Problems Exec Time – if a quantitative attribute has n values (or intervals), there are on average O(n²) ranges that include a specific value or interval. Many Rules – if a value (or interval) has MinSup, then any range containing it also has MinSup, thus producing many uninteresting rules.

12 Our Approach Maximum Support – stop combining adjacent intervals if their combined support exceeds this value Partial Completeness – quantify information lost due to partitioning Interest Measure – help prune out uninteresting rules

13 Problem Definition The rule X => Y holds in the record set D with confidence c if c% of records in D that support X also support Y. The rule X => Y has support s in the record set D if s% of records in D support X ∪ Y.
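These two definitions translate directly into code (a minimal sketch; records are modeled as plain Python sets of items, which is an assumption of this illustration):

```python
def support(D, itemset):
    """Fraction of records in D containing every item of `itemset`."""
    return sum(itemset <= rec for rec in D) / len(D)

def confidence(D, X, Y):
    """Fraction of records supporting X that also support Y."""
    return support(D, X | Y) / support(D, X)

D = [{"A", "B"}, {"A", "B", "C"}, {"A"}, {"B", "C"}]
s = support(D, {"A", "B"})        # 2 of 4 records contain both A and B
c = confidence(D, {"A"}, {"B"})   # 2 of the 3 records with A also have B
```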

14 Formal Problem Statement “Given a set of records D, the problem of mining quantitative association rules is to find all quantitative association rules that have support and confidence greater than the user-specified minimum support and minimum confidence.”

15 5 steps of discovering quantitative association rules 1) Determine the number of partitions for each quantitative attribute. 2) Map the values of each attribute to a set of consecutive integers, such that the order of the values is preserved. 3) Find the support for each value of both quantitative and categorical attributes; for quantitative attributes, adjacent values are combined as long as their support is less than the user-specified maximum support. Then generate the frequent itemsets. 4) Use the frequent itemsets to generate association rules. 5) Determine the interesting rules.
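Steps 1 and 2 can be sketched as follows (an illustrative equi-depth partitioner; the function name, boundary convention, and example values are assumptions, not from the slides):

```python
def equi_depth_partition(values, num_intervals):
    """Split sorted values into intervals of roughly equal depth and return
    a mapper from a raw value to a consecutive, order-preserving integer."""
    ordered = sorted(values)
    depth = len(ordered) // num_intervals
    # lower boundary of every interval after the first
    boundaries = [ordered[i * depth] for i in range(1, num_intervals)]

    def to_int(v):
        # the number of boundaries at or below v is the interval index
        return sum(v >= b for b in boundaries)

    return boundaries, to_int

boundaries, to_int = equi_depth_partition([23, 25, 29, 34, 38, 41], 3)
```

Because larger values always land in equal-or-higher intervals, the integer mapping preserves order, as step 2 requires.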


17 Partitioning Quantitative Attributes Partial Completeness – gives a handle on the amount of information lost by partitioning; the lower the level of partial completeness, the less information is lost. Equi-Depth Partitioning – minimizes the number of intervals required to satisfy a given partial completeness level.

18 Partial Completeness R – set of rules generated by considering all ranges over the raw values. R’ – set of rules generated by considering all ranges over the partitions. Measure the information loss: for each rule in R, how “far” the “closest” rule in R’ is, using the ratio of the supports of the rules as the measure of how far apart they are.

19 Partial Completeness Over Itemsets Let C denote the set of all frequent itemsets in D. For any K >= 1, we call P K-complete w.r.t. C if: P ⊆ C; X ∈ P and X’ ⊆ X imply X’ ∈ P; and for every itemset X in C there is a generalization X̂ in P with support(X̂) <= K × support(X), such that for every X’ ⊆ X there is a generalization X̂’ ⊆ X̂ with support(X̂’) <= K × support(X’).

20 Sample Partial Completeness
Number | Itemset | Support
1 | ⟨Age: 20..30⟩ | 5%
2 | ⟨Age: 20..40⟩ | 6%
3 | ⟨Age: 20..50⟩ | 8%
4 | ⟨Cars: 1..2⟩ | 5%
5 | ⟨Cars: 1..3⟩ | 6%
6 | ⟨Age: 20..30, Cars: 1..2⟩ | 4%
7 | ⟨Age: 20..40, Cars: 1..3⟩ | 5%
Itemsets 2, 3, 5, 7 form a 1.5-complete set.
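The K-completeness support condition can be spot-checked on the rows whose supports survive in this slide, itemsets 4–7 (a toy verification of support(generalization) <= K × support(itemset); not code from the paper):

```python
K = 1.5
pairs = [  # (itemset support %, support % of its generalization kept in P)
    (5, 6),  # <Cars: 1..2> (5%) generalized by <Cars: 1..3> (6%)
    (4, 5),  # <Age: 20..30, Cars: 1..2> (4%) by <Age: 20..40, Cars: 1..3> (5%)
]
# every generalization's support stays within a factor K of the original
ok = all(gen <= K * spec for spec, gen in pairs)
```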

21 Close Rule Given a set of frequent itemsets P that is K-complete w.r.t. the set of all frequent itemsets, the minimum confidence when generating rules from P must be set to 1/K times the desired level to guarantee that a close rule will be generated.

22 Determining the Number of Partitions Given a partial completeness level K and equi-depth partitioning, we get: Number of Intervals = 2n / (m(K - 1)), where n = number of quantitative attributes, m = minimum support (as a fraction), K = partial completeness level.
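Plugging numbers into this formula is a one-liner (the example values are made up for illustration):

```python
def num_intervals(n, m, K):
    """Intervals needed for K-completeness under equi-depth partitioning:
    n quantitative attributes, minimum support m (fraction), level K."""
    return 2 * n / (m * (K - 1))

# one quantitative attribute, 5% minimum support, 1.5-completeness
intervals_needed = num_intervals(1, 0.05, 1.5)  # about 80 intervals
```

Note how lowering K (less information loss) or lowering the minimum support drives the required number of intervals up.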

23 Interest Consider the following rules, where about a quarter of the people in the age group 20..30 are in the age group 20..25: Age: 20..30 => Cars: 1..2 (8% Supp, 70% Conf) Age: 20..25 => Cars: 1..2 (2% Supp, 70% Conf) The second rule is redundant. Capture such rules with a “greater than expected” interest measure.

24 Expected Values E_Pr(Z’)[Pr(Z)] – the “expected” value of Pr(Z) based on Pr(Z’), where Z’ is a generalization of Z. E_Pr(Y’|X’)[Pr(Y|X)] – the “expected” confidence of the rule X => Y based on the rule X’ => Y’, where X’ and Y’ are generalizations of X and Y, respectively.

25 Expected Values, Cont. For Z = {z1, ..., zn} with generalization Z’ = {z’1, ..., z’n}: E_Pr(Z’)[Pr(Z)] = Pr(z1)/Pr(z’1) × ... × Pr(zn)/Pr(z’n) × Pr(Z’), and similarly E_Pr(Y’|X’)[Pr(Y|X)] = Pr(y1)/Pr(y’1) × ... × Pr(ym)/Pr(y’m) × Pr(Y’|X’).

26 Interest Measure A rule X => Y is R-interesting w.r.t. X’ => Y’ if the support of X => Y is R times the expected support based on X’ => Y’, or the confidence is R times the expected confidence based on X’ => Y’, and the itemset X ∪ Y is R-interesting w.r.t. X’ ∪ Y’.
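The support/confidence part of this test transcribes directly (a sketch; the recursive condition on the itemset X ∪ Y is omitted for brevity, and the example figures reuse the redundant-rule slide above, where the expected values equal the actual ones):

```python
def r_interesting(sup_xy, expected_sup, conf_xy, expected_conf, R):
    """True if support or confidence exceeds R times its expected value."""
    return sup_xy >= R * expected_sup or conf_xy >= R * expected_conf

# Age: 20..25 => Cars: 1..2 has exactly the support and confidence its
# generalization predicts, so it is not 1.1-interesting (i.e. redundant).
redundant_rule = r_interesting(0.02, 0.02, 0.70, 0.70, 1.1)
```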

27 Algorithm – Finding Frequent Itemsets Based on the Apriori algorithm for finding boolean association rules. Candidate generation: join phase, subset prune phase, interest prune phase. Counting support of candidates.

28 Algorithm, Cont. A k-itemset is an itemset having k items. L_k: set of frequent k-itemsets. L_{k-1} is used to generate C_k, the candidate k-itemsets. Scan the database, determine which of the candidates in C_k are contained in each record, and increment their support by one. At the end of the pass, C_k is examined to yield L_k.
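The level-wise loop can be sketched for the plain boolean case (a simplified Apriori without the subset, interest, or super-candidate machinery of the following slides; all names and the toy data are illustrative):

```python
def apriori(D, min_sup):
    """Level-wise frequent itemset mining over boolean transactions."""
    items = {i for rec in D for i in rec}
    # L1: frequent single items
    L = [frozenset([i]) for i in items
         if sum(i in rec for rec in D) >= min_sup]
    frequent = list(L)
    k = 2
    while L:
        # join: unions of frequent (k-1)-itemsets that form a k-itemset
        C = {a | b for a in L for b in L if len(a | b) == k}
        # count support and keep only frequent candidates
        L = [c for c in C if sum(c <= rec for rec in D) >= min_sup]
        frequent += L
        k += 1
    return frequent

D = [{"A", "B"}, {"A", "B", "C"}, {"A", "C"}]
freq = apriori(D, min_sup=2)
```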

29 Candidate Generation Join phase: L_{k-1} is joined with itself; two itemsets are joined if their first k-2 items are the same and the attributes of their last two items are different. E.g. joining L_2 with itself produces the candidate 3-itemsets C_3.

30 Candidate Generation Subset prune phase: join results having some (k-1)-subset that is not in L_{k-1} are deleted. E.g. delete (Married: Yes; Age; NumCars: 0..1) because (Age; NumCars: 0..1) is not in L_2. Interest prune phase: further prune the candidate set according to the user-specified interest level.

31 Counting Support of Candidates Partition candidates into groups such that candidates in each group have the same attributes and the same values for their categorical attributes. Replace each group with a single “super-candidate” consisting of: 1) the common categorical attribute values, and 2) a data structure representing the set of values of the quantitative attributes.
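The grouping step might look like this (a sketch; the per-group data structure is left as a plain list rather than the specialized structure the paper uses, and the candidates are illustrative):

```python
from collections import defaultdict

def group_super_candidates(candidates):
    """Group candidates sharing the same categorical part; each group's
    quantitative parts would be stored in one shared structure."""
    groups = defaultdict(list)
    for cat_part, quant_part in candidates:
        # the frozen categorical attribute/value pairs identify the group
        groups[frozenset(cat_part.items())].append(quant_part)
    return groups

cands = [({"Married": "Yes"}, {"Age": (20, 24)}),
         ({"Married": "Yes"}, {"Age": (20, 29)}),
         ({"Married": "No"},  {"Age": (20, 24)})]
groups = group_super_candidates(cands)
```

Each key of `groups` plays the role of one super-candidate's categorical part, so the categorical check per record is done once per group instead of once per candidate.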

32 Counting Support of Candidates, Cont. Find which “super-candidates” are supported by the categorical attributes in each record. If the categorical attributes of a “super-candidate” are supported by a given record, find which of the candidates within it are supported by the quantitative attributes.

33 Conclusion Partitioning and combining adjacent partitions Partial Completeness “Greater-than-expected-value” interest measure

34 Questions??