
1 FINDING FUZZY SETS FOR QUANTITATIVE ATTRIBUTES FOR MINING OF FUZZY ASSOCIATION RULES. By H.N.A. Pham, T.W. Liao, and E. Triantaphyllou, Department of Industrial Engineering, 3128 CEBA Building, Louisiana State University, Baton Rouge, LA

2 Outline: Introduction; Background; A fuzzy approach for mining fuzzy association rules; Experimental evaluation; Conclusions

3 Introduction. Association analysis is an attractive research area in data mining. The Apriori algorithm (R. Agrawal, IBM, 1993) is a key technique for association analysis. Although the Apriori principle considerably reduces the search space, the technique still requires heavy computation, particularly for large databases. This research proposes an approach that finds fuzzy sets for the quantitative attributes in a database by using clustering techniques, and then applies techniques for mining fuzzy association rules.

4 Outline: Introduction; Background (Association rules and the Apriori algorithm; Necessity to find fuzzy sets for quantitative attributes); A fuzzy approach for mining fuzzy association rules; Experimental evaluation; Conclusions

5 Association rules: Market basket analysis. Analyzes customer buying habits by finding associations between the different items that customers place in their "shopping baskets", expressed as rules of the form X -> Y, where X and Y are sets of items. Example items: I = {I1 = beer, I2 = cake, I3 = onigiri}. A transactional database: TID1: {I1, I2, I3}, TID2: {I1, I2}, TID3: {I2, I3}, TID4: {I2}, TID5: {I1, I2}. An association rule: {I1} -> {I3}, i.e., how often do people who buy beer also buy onigiri?

6 Rule measures: Support and confidence. For an association rule X -> Y: support s = probability that a transaction contains both X and Y; confidence c = conditional probability that a transaction containing X also contains Y. Examples: A -> C (s = 50%, c = 66.6%); C -> A (s = 50%, c = 100%). (Venn diagram: customers who buy beer, customers who buy onigiri, and customers who buy both.)
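Restated as formulas (standard definitions consistent with the slide; the set notation is ours):

\mathrm{support}(X \Rightarrow Y) \;=\; P(X \cup Y) \;=\; \frac{\lvert \{\, t \in D : X \cup Y \subseteq t \,\} \rvert}{\lvert D \rvert},
\qquad
\mathrm{confidence}(X \Rightarrow Y) \;=\; P(Y \mid X) \;=\; \frac{\mathrm{support}(X \cup Y)}{\mathrm{support}(X)}.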

7 Association mining: the Apriori algorithm. It is composed of two steps: 1. Find all frequent itemsets: by definition, each of these itemsets occurs at least as frequently as a pre-determined minimum support count. 2. Generate strong association rules from the frequent itemsets: by definition, these rules must satisfy both minimum support and minimum confidence. (Agrawal, 1993)

8 Association mining: the Apriori principle. With minimum support 50% and minimum confidence 50%, for the rule A -> C: support = support({A, C}) = 50%; confidence = support({A, C}) / support({A}) = 66.6%. The Apriori principle: any subset of a frequent itemset must also be frequent (equivalently, if an itemset is not frequent, neither are its supersets).

9 The Apriori algorithm: finding frequent itemsets using candidate generation. 1. Find the frequent itemsets, i.e., the sets of items whose support is at least the minimum support. A subset of a frequent itemset must also be a frequent itemset; e.g., if {A, B} is a frequent itemset, then both {A} and {B} must be frequent itemsets. Iteratively find the frequent k-itemsets L_k, for k = 1, 2, …, from the candidate itemsets C_k (L_k ⊆ C_k), alternating candidate generation and support counting: C_1 -> L_1 -> C_2 -> L_2 -> … -> C_k -> L_k. 2. Use the frequent itemsets to generate association rules. A sketch of this level-wise loop is given below.
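A minimal sketch of the level-wise loop on the slide's running example (the data layout, function names, and output format are our own illustration, not the authors' implementation):

// Level-wise frequent-itemset mining in the spirit of the Apriori algorithm.
#include <algorithm>
#include <iostream>
#include <map>
#include <vector>

using Itemset  = std::vector<int>;      // sorted item ids
using Database = std::vector<Itemset>;  // one itemset per transaction

// Count how many transactions contain every item of 'candidate'.
static int support_count(const Database& db, const Itemset& candidate) {
    int count = 0;
    for (const Itemset& t : db)
        if (std::includes(t.begin(), t.end(), candidate.begin(), candidate.end()))
            ++count;
    return count;
}

// Join step: merge two frequent (k-1)-itemsets sharing their first k-2 items,
// then prune candidates with an infrequent subset (the Apriori principle).
static std::vector<Itemset> generate_candidates(const std::vector<Itemset>& Lprev) {
    std::vector<Itemset> C;
    for (size_t i = 0; i < Lprev.size(); ++i)
        for (size_t j = i + 1; j < Lprev.size(); ++j) {
            const Itemset& a = Lprev[i];
            const Itemset& b = Lprev[j];
            if (!std::equal(a.begin(), a.end() - 1, b.begin())) continue;
            Itemset cand = a;
            cand.push_back(b.back());
            std::sort(cand.begin(), cand.end());
            bool ok = true;                       // every (k-1)-subset must be frequent
            for (size_t drop = 0; drop < cand.size() && ok; ++drop) {
                Itemset sub = cand;
                sub.erase(sub.begin() + drop);
                ok = std::binary_search(Lprev.begin(), Lprev.end(), sub);
            }
            if (ok) C.push_back(cand);
        }
    return C;
}

int main() {
    // The slide's example database: items I1..I5 encoded as 1..5.
    Database db = {{1,2,5},{2,4},{2,3},{1,2,4},{1,3},{2,3},{1,3},{1,2,3,5},{1,2,3}};
    const int min_sup = 2;

    // L1: frequent 1-itemsets.
    std::map<int,int> single;
    for (const Itemset& t : db) for (int it : t) ++single[it];
    std::vector<Itemset> L;
    for (auto& [item, cnt] : single) if (cnt >= min_sup) L.push_back({item});

    while (!L.empty()) {
        for (const Itemset& s : L) {
            for (int it : s) std::cout << "I" << it << ' ';
            std::cout << "(support " << support_count(db, s) << ")\n";
        }
        std::vector<Itemset> C = generate_candidates(L);  // C_k from L_{k-1}
        std::vector<Itemset> Lnext;
        for (const Itemset& c : C)
            if (support_count(db, c) >= min_sup) Lnext.push_back(c);
        std::sort(Lnext.begin(), Lnext.end());
        L = std::move(Lnext);
    }
}

On the example database with min_sup_count = 2 this prints the same L1, L2, and L3 that the next two slides derive by hand.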

10 Example (min_sup_count = 2). Transactional data D (TID: list of item IDs):
T100: I1, I2, I5; T200: I2, I4; T300: I2, I3; T400: I1, I2, I4; T500: I1, I3; T600: I2, I3; T700: I1, I3; T800: I1, I2, I3, I5; T900: I1, I2, I3.
Scan D for the count of each candidate. C1: {I1}: 6, {I2}: 7, {I3}: 6, {I4}: 2, {I5}: 2.
Compare each candidate support count with the minimum support count. L1: {I1}: 6, {I2}: 7, {I3}: 6, {I4}: 2, {I5}: 2.

11 Example (min_sup_count = 2), continued.
Generate candidates C2 from L1 using the Apriori principle: C2 = {I1,I2}, {I1,I3}, {I1,I4}, {I1,I5}, {I2,I3}, {I2,I4}, {I2,I5}, {I3,I4}, {I3,I5}, {I4,I5}.
Scan D for the count of each candidate: {I1,I2}: 4, {I1,I3}: 4, {I1,I4}: 1, {I1,I5}: 2, {I2,I3}: 4, {I2,I4}: 2, {I2,I5}: 2, {I3,I4}: 0, {I3,I5}: 1, {I4,I5}: 0.
Compare the candidate support counts with the minimum support count: L2 = {I1,I2}: 4, {I1,I3}: 4, {I1,I5}: 2, {I2,I3}: 4, {I2,I4}: 2, {I2,I5}: 2.
Generate candidates C3 from L2 using the Apriori principle: C3 = {I1,I2,I3}, {I1,I2,I5}. Scan D for the count of each candidate: {I1,I2,I3}: 2, {I1,I2,I5}: 2. Compare with the minimum support count: L3 = {I1,I2,I3}: 2, {I1,I2,I5}: 2.

12 Necessity to find fuzzy sets for quantitative attributes. (Example table with columns Transaction ID, Age, Married, NumCars.) A quantitative association rule with min_sup = min_conf = 50%: (Age = 33 or 39) and (Married = Yes) -> (NumCars = 2). A quantitative association rule with min_sup = min_conf = 50%: (Age = ) and (Married = Yes) -> (NumCars = 2). A fuzzy association rule with min_sup = min_conf = 50%: (Age = middle-aged) and (Married = Yes) -> (NumCars = 2).

13 Solution: Sharp-boundary intervals (Srikant and Agrawal, 1996). It is composed of two steps: 1. Partition the attribute domains into small intervals and combine adjacent intervals into larger ones so that the combined intervals have enough support. 2. Replace the original attribute by its attribute-interval pairs, so that the quantitative problem is transformed into a Boolean one. A rough sketch of the combine step is given below.
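A rough sketch of the "combine adjacent intervals" idea in step 1, assuming the base intervals and their support counts are already available (this greedy merge is only an illustration, not Srikant and Agrawal's partial-completeness based procedure):

#include <vector>

struct Interval { double lo, hi; int count; };  // a base interval and its support count

std::vector<Interval> combine_adjacent(const std::vector<Interval>& base, int min_sup_count) {
    std::vector<Interval> out;
    for (const Interval& iv : base) {
        if (!out.empty() && out.back().count < min_sup_count) {
            out.back().hi = iv.hi;            // extend the previous interval
            out.back().count += iv.count;     // and accumulate its support
        } else {
            out.push_back(iv);
        }
    }
    // A trailing interval that is still too small is merged backwards.
    if (out.size() > 1 && out.back().count < min_sup_count) {
        out[out.size() - 2].hi = out.back().hi;
        out[out.size() - 2].count += out.back().count;
        out.pop_back();
    }
    return out;
}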

14 Example: Sharp-boundary intervals. (Example tables: the original table with columns Transaction ID, Age, Married, NumCars is mapped to a Boolean table whose columns are attribute-interval pairs such as Age intervals, Married, NumCars: 0-1, and NumCars: 2-3.) Algorithms ignore or over-emphasize the elements near the boundaries of the intervals during mining. The use of sharp-boundary intervals is also not intuitive with respect to human perception.

15 Solution: Experts. A user or domain expert must provide the algorithm with the required fuzzy sets of the quantitative attributes and their corresponding membership functions. However, fuzzy sets and membership functions provided by experts may not be suitable for mining fuzzy association rules in the particular database.

16 Solution: Fuzzy sets for quantitative attributes (Ada, 1998). It is composed of three steps: Step 1: Transform the original database into one with positive integers. Step 2: For each attribute, cluster the values of the i-th attribute into k medoids, classify the i-th attribute into k fuzzy sets, and generate a membership function for each fuzzy set. Step 3: Transform the database based on the fuzzy sets. Drawback: because each attribute is clustered separately, this approach loses the associations between attributes in the mining step.

17 Outline: Introduction; Background; A fuzzy approach for mining fuzzy association rules (Fuzzy approach; Mining fuzzy association rules); Experimental evaluation; Conclusions

18 Fuzzy approach. It is composed of five steps: Step 1: Transform the original database into one with positive integers. Step 2: Cluster the attribute values into k medoids. Step 3: Classify each attribute into k fuzzy sets. Step 4: Generate a membership function for each fuzzy set. Step 5: Transform the database based on the fuzzy sets.

19 Fuzzy approach: Step 2, clustering. The clustering method treats a database with n attributes as an n-dimensional search space and clusters whole records, using the Matlab fuzzy toolbox. Because records are clustered jointly rather than attribute by attribute, the associations between attributes are not lost in the mining step. A simplified k-medoid sketch is given below.
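The slide relies on the Matlab fuzzy toolbox for this step; purely to illustrate the k-medoid idea over whole n-dimensional records, a minimal alternating-update sketch in C++ might look as follows (the seeding, the Euclidean distance, and all names are our assumptions):

#include <cmath>
#include <limits>
#include <vector>

using Record = std::vector<double>;   // one transaction, n numeric attributes

static double dist(const Record& a, const Record& b) {
    double s = 0.0;
    for (size_t j = 0; j < a.size(); ++j) s += (a[j] - b[j]) * (a[j] - b[j]);
    return std::sqrt(s);
}

// Returns the indices (into db) of the k medoids; assumes db.size() >= k.
std::vector<int> k_medoids(const std::vector<Record>& db, int k, int iters = 20) {
    std::vector<int> medoid(k);
    for (int c = 0; c < k; ++c) medoid[c] = c * static_cast<int>(db.size()) / k;  // crude seeding

    std::vector<int> assign(db.size(), 0);
    for (int it = 0; it < iters; ++it) {
        // 1. Assign each record to its nearest medoid.
        for (size_t i = 0; i < db.size(); ++i) {
            double best = std::numeric_limits<double>::max();
            for (int c = 0; c < k; ++c) {
                double d = dist(db[i], db[medoid[c]]);
                if (d < best) { best = d; assign[i] = c; }
            }
        }
        // 2. Within each cluster, pick the record minimising the total distance
        //    to the other members as the new medoid.
        for (int c = 0; c < k; ++c) {
            double best = std::numeric_limits<double>::max();
            for (size_t i = 0; i < db.size(); ++i) {
                if (assign[i] != c) continue;
                double total = 0.0;
                for (size_t j = 0; j < db.size(); ++j)
                    if (assign[j] == c) total += dist(db[i], db[j]);
                if (total < best) { best = total; medoid[c] = static_cast<int>(i); }
            }
        }
    }
    return medoid;
}

Because each medoid is an actual record, its coordinates give one mid-point per attribute, which is exactly what Step 3 consumes.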

20 Fuzzy approach: Step 3, classify. Let {m_1, m_2, …, m_k} be the k medoids found in Step 2, where m_i = (a_i1, a_i2, …, a_in) is the i-th medoid. Let the j-th attribute have range [min_j, max_j], and let {a_1j, a_2j, …, a_kj} (assumed sorted in increasing order) be the mid-points of the j-th attribute taken from the medoids. The k fuzzy sets of this attribute are then supported on the overlapping ranges [min_j, a_2j], [a_1j, a_3j], …, [a_(i-1)j, a_(i+1)j], …, [a_(k-1)j, max_j]. (Figure: the i-th fuzzy set spans [a_(i-1)j, a_(i+1)j] with its peak at a_ij, inside the attribute range [min_j, max_j].)

21 Fuzzy approach: Step 4, generate a triangular membership function for each fuzzy set; a reconstruction of the triangular form is given below.
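A standard triangular form consistent with the Step 3 ranges, with a_(i-1)j, a_ij, and a_(i+1)j as left end, peak, and right end (our reconstruction; the slide showed the function only as a figure):

f_{ij}(x) \;=\; \max\!\Bigl(0,\; \min\!\Bigl(\frac{x - a_{(i-1)j}}{a_{ij} - a_{(i-1)j}},\; \frac{a_{(i+1)j} - x}{a_{(i+1)j} - a_{ij}}\Bigr)\Bigr),

with the first and last fuzzy sets of an attribute anchored at min_j and max_j respectively (an assumed boundary treatment).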

22 Fuzzy approach: Step 5, transform the database based on the fuzzy sets. Let T_ij be the value of the i-th transaction at the j-th attribute. T_ij is replaced by the label of the i-th fuzzy set of attribute j if f_ij(T_ij) = max_k f_kj(T_ij), i.e., each value is mapped to the fuzzy set in which it has the highest membership. A small sketch of this assignment is given below.
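A small sketch of the max-membership assignment; the FuzzySet fields follow the triangular reconstruction above, and the struct and function names are ours:

#include <string>
#include <vector>

struct FuzzySet { std::string label; double left, peak, right; };  // triangular parameters

static double membership(const FuzzySet& f, double x) {
    if (x <= f.left || x >= f.right) return 0.0;
    return x <= f.peak ? (x - f.left) / (f.peak - f.left)
                       : (f.right - x) / (f.right - f.peak);
}

// Returns the label of the fuzzy set with maximum membership at value x.
std::string fuzzy_label(const std::vector<FuzzySet>& sets, double x) {
    size_t best = 0;
    for (size_t i = 1; i < sets.size(); ++i)
        if (membership(sets[i], x) > membership(sets[best], x)) best = i;
    return sets[best].label;
}

Applying fuzzy_label to every cell T_ij, with the fuzzy sets of attribute j, produces the transformed (labelled) database used in the mining step.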

23 Example of the fuzzy approach. (Example tables: a small database with attributes Salary and IQ. Step 2 yields, for each attribute, mid-points, ranges, and fuzzy labels: Low_S, Medium_S, High_S for Salary and Low_I, Medium_I, High_I for IQ. Steps 3, 4, and 5 then replace each transaction's values by fuzzy labels such as Low_I/Low_S or Medium_I/Medium_S together with their membership degrees.)

24 Mining fuzzy association rules (Attilia, 2000). It is composed of two steps: 1. Find all itemsets whose fuzzy support (FS) is above the user-specified minimum support; these itemsets are called frequent itemsets. 2. Use the frequent itemsets to generate the desired rules: for frequent itemsets X and Y, the rule X => Y holds if its fuzzy confidence FC is larger than the user-specified minimum confidence.

25 Mining fuzzy association rules, continued. Let D = {t_1, t_2, …, t_n} be the set of transactions, let X be a set of attributes, and let A be the corresponding fuzzy sets on the attributes of X (similarly Y and B for the consequent). Define Z = X ∪ Y and C = A ∪ B. A standard formulation of FS and FC is given below.
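A standard formulation of fuzzy support and fuzzy confidence from the fuzzy association rule literature, which we assume matches the one intended on the slide (the product can be replaced by the min operator as the t-norm):

\mathrm{FS}(\langle Z, C\rangle) \;=\; \frac{1}{n}\sum_{i=1}^{n} \;\prod_{(z_j,\, c_j) \in \langle Z, C\rangle} \mu_{c_j}\bigl(t_i[z_j]\bigr),
\qquad
\mathrm{FC}\bigl(\langle X, A\rangle \Rightarrow \langle Y, B\rangle\bigr) \;=\; \frac{\mathrm{FS}(\langle Z, C\rangle)}{\mathrm{FS}(\langle X, A\rangle)},

where \mu_{c_j} is the membership function of fuzzy set c_j and t_i[z_j] is the value of transaction t_i on attribute z_j.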

26 Outline: Introduction; Background; A fuzzy approach for mining fuzzy association rules; Experimental evaluation; Conclusions

27 Experiments: Synthetic datasets. Synthetic datasets of varying sizes were used:
D100k.T10: |D| = 100K, |T| = 10, size 3 MB
D100k.T20: |D| = 100K, |T| = 20, size 6 MB
D320k.T30: |D| = 320K, |T| = 30, size 18 MB
|D| = number of transactions; |T| = average number of items per transaction.

28 Experiment environment. Software: database Microsoft Access 2003; languages C++, Visual Basic, and Matlab; platform Windows. Hardware: PC Pentium IV, 2.66 GHz, 1 GB RAM.

29 Evaluating the mean of the rules. From the Salary and IQ database, the proposed approach yields the following rules with minimum support = 43% and minimum confidence = 50%: Rule 1: if the 1st variable is low, approximately 7000 [4000, 10000], then the 2nd variable is low, approximately 100 [50, 120]. Rule 2: if the 1st variable is medium, approximately [7000, 20000], then the 2nd variable is medium, approximately 140 [100, 165]. Comparison at minimum support = 43%: the Apriori algorithm finds no frequent itemsets, whereas the quantitative mining algorithm with the fuzzy approach finds Frequent Itemset 1 (1st variable low, approximately 7000 [4000, 10000]; 2nd variable low, approximately 100 [50, 120]) and Frequent Itemset 2 (1st variable medium, approximately [7000, 20000]; 2nd variable medium, approximately 140 [100, 165]).

30 Evaluating the mean of the rules, continued (minimum support = 15%). The Apriori algorithm finds: Frequent Itemset 1 (1st variable is 5000, 2nd variable is 85); Frequent Itemset 2 (1st variable is 7000, 2nd variable is 100); Frequent Itemset 3 (1st variable is 9000, 2nd variable is 110); Frequent Itemset 4 (1st variable is 10000, 2nd variable is 120); Frequent Itemset 5 (1st variable is 15000, 2nd variable is 140); Frequent Itemset 6 (1st variable is 20000, 2nd variable is 165); Frequent Itemset 7 (1st variable is 30000, 2nd variable is 183). The quantitative mining algorithm finds: Frequent Itemset 1 (1st variable low, approximately 7000 [4000, 10000]; 2nd variable low, approximately 100 [50, 120]); Frequent Itemset 2 (1st variable high, approximately [15000, 32000]; 2nd variable high, approximately 183 [140, 200]); Frequent Itemset 3 (1st variable medium, approximately [7000, 20000]; 2nd variable medium, approximately 140 [100, 165]).

31 Evaluating fuzziness. (Tables: per-transaction membership degrees for IQ and Salary under Ada's approach and under the new approach.) Using Yager's fuzziness measure with p = 1: Ada_fuzziness_Salary ≈ … ≤ NewApproach_fuzziness_Salary ≈ …; Ada_fuzziness_IQ ≈ 0.51 ≤ NewApproach_fuzziness_IQ ≈ 0.59. The new approach is fuzzier than Ada's approach.
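For reference, Yager's measure of fuzziness of a fuzzy set A over n elements (our transcription of the standard definition) is

f_p(A) \;=\; 1 \;-\; \frac{D_p(A, \bar{A})}{n^{1/p}},
\qquad
D_p(A, \bar{A}) \;=\; \Bigl(\sum_{i=1}^{n} \bigl|\,2\,\mu_A(x_i) - 1\,\bigr|^{p}\Bigr)^{1/p},

so with p = 1 it reduces to 1 - (1/n) \sum_i |2\mu_A(x_i) - 1|; larger values mean the membership degrees lie farther from 0 and 1, i.e., the fuzzy partition is fuzzier.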

32 Evaluating fuzziness, continued (minimum support = 15%). Ada's approach finds: Frequent Itemset 1 (1st variable low, approximately 5000 [4000, 10000]; 2nd variable low, approximately 85 [50, 120]); Frequent Itemset 2 (1st variable high, approximately [15000, 32000]; 2nd variable high, approximately 165 [140, 200]); Frequent Itemset 3 (1st variable medium, approximately [7000, 20000]; 2nd variable medium, approximately 120 [100, 165]). The new approach finds: Frequent Itemset 1 (1st variable low, approximately 7000 [4000, 10000]; 2nd variable low, approximately 100 [50, 120]); Frequent Itemset 2 (1st variable high, approximately [15000, 32000]; 2nd variable high, approximately 183 [140, 200]); Frequent Itemset 3 (1st variable medium, approximately [7000, 20000]; 2nd variable medium, approximately 140 [100, 165]). In Ada's approach, the mid-points of the ranges are shifted away from the centre values, which changes the means of the frequent itemsets.

33 Execution time (sec.) with different minimum support thresholds. (Table: for each dataset D100k.T10, D100k.T20, and D320k.T30, the execution times of Apriori and of the fuzzy approach* are reported at min_sup = 35%, 40%, and 50%; *the fuzzy times do not include the transfer time. A second table reports the time needed to transfer each database into fuzzy sets.)

34 Execution time (sec.) with different minimum support thresholds, continued. The execution time (transfer + mining time) of the fuzzy method is better than that of Apriori. Moreover, the means of the rules are more understandable.

35 Conclusions. Proposed an approach for finding fuzzy sets for quantitative attributes for mining association rules. An experimental evaluation shows that both the means of the rules and the execution time of the fuzzy approach for mining association rules are better than those of the other algorithms. Future work: improve the fuzzy mining approach; develop incremental algorithms for association analysis using Support Vector Machines.

36 THANK YOU. H.N.A. Pham, T.W. Liao, and E. Triantaphyllou, Department of Industrial Engineering, 3128 CEBA Building, Louisiana State University, Baton Rouge, LA