Published by Janis Rose. Modified over 9 years ago.
Multi-level fuzzy mining with multiple minimum supports
Yeong-Chyi Lee, Tzung-Pei Hong, Tien-Chin Wang
Expert Systems with Applications 34 (2008) 459–468
Presenter: Huai-Ping Chu
2008/11/15
Outline
Abstract
Introduction
Review of related mining algorithms
The proposed algorithm
An example
Conclusion
Abstract
In real applications, different items may have different support criteria for judging their importance, taxonomic relationships may exist among items, and data may have quantitative values. This paper proposes a fuzzy multiple-level mining algorithm for extracting implicit knowledge from quantitative transactions in which items have multiple minimum supports. The algorithm derives large itemsets and discovers cross-level fuzzy association rules under the maximum-itemset minimum-taxonomy support constraint.
Introduction
An association rule is expressed in the form A → B, where A and B are sets of items, meaning that the presence of A in a transaction implies the presence of B in the same transaction.
Srikant & Agrawal proposed a method for mining association rules from data sets with quantitative and categorical attributes.
Hong et al. proposed a fuzzy mining algorithm for handling quantitative data.
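To make the A → B notation concrete, the following sketch computes the ordinary (crisp, non-fuzzy) support and confidence of a rule over a handful of transactions. The transactions and item names are hypothetical, and support is taken here as the fraction of transactions containing the itemset; the fuzzy algorithm later in these slides replaces such crisp counts with sums of membership values.

```python
# Minimal sketch: crisp support and confidence of a rule A -> B.
# The transactions below are hypothetical, chosen only for illustration.

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support of A together with B, divided by the support of A."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)

transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread"},
    {"milk", "butter"},
]

print(support({"milk", "bread"}, transactions))       # 0.5
print(confidence({"milk"}, {"bread"}, transactions))  # about 0.667
```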
Introduction (cont.)
Liu et al. proposed an approach for mining association rules with non-uniform minimum support values. It allows users to specify different minimum supports for different items and uses the lowest minimum support among the items in an itemset as the minimum support of that itemset.
Lee, Hong & Lin proposed a simple and efficient Apriori-based algorithm that generates large itemsets under the maximum constraint of multiple minimum supports.
Introduction (cont.)
Han et al. and Agrawal et al. each proposed algorithms for discovering association rules over multiple-level taxonomic relationships among items.
This paper therefore proposes a fuzzy multiple-level mining algorithm with multiple minimum supports of items for extracting implicit knowledge from transactions stored as quantitative values. It integrates fuzzy-set concepts, data-mining techniques, and multiple-level taxonomies to find fuzzy association rules.
Review of related mining algorithms
1. Mining multiple-level association rules
2. Mining association rules with multiple minimum supports
1. Mining multiple-level association rules
Relevant item taxonomies are usually predefined in real-world applications and can be represented as hierarchy trees. Terminal nodes of the trees represent actual items appearing in transactions; internal nodes represent classes or concepts formed from lower-level nodes.
The method of Han & Fu: nodes in predefined taxonomies are first encoded using sequences of numbers and the symbol “*” according to their positions in the hierarchy tree.
[Figure: example taxonomy. Level 1: 1**, 2**. Level 2: 11*, 12* (under 1**); 21*, 22* (under 2**). Level 3: 111, 112 (under 11*); 211, 212 (under 21*).]
A top-down, progressively deepening search approach is used, and exploration of “level-crossing” association relationships is allowed. Candidate itemsets at a given level may thus contain items from levels with smaller level numbers, that is, from more general levels.
Example: large items at level 2 may be paired with large items at level 1 to form candidate 2-itemsets at level 2 (such as {11*, 2**}).
2. Mining association rules with multiple minimum supports
Liu et al. proposed an approach for mining association rules with non-uniform minimum support values, allowing users to specify different minimum supports for different items. The minimum support value of an itemset is defined as the lowest minimum support among the items in the itemset.
The minimum support of an item means that the occurrence frequency of the item must be larger than or equal to it for the item to be considered in the next mining steps. If the support of an item is smaller than its threshold, the item is not worth considering. When the minimum support of an itemset is defined as the lowest minimum support of its items, the itemset may be large even though some items in it are small.
Example: the minimum support of item A is 20% and that of item B is 40%. If the support of item B is 30%, which is smaller than its minimum support of 40%, then the 2-itemset {A, B} should not be worth considering. Under the lowest-minimum-support definition, however, {A, B} only needs to reach 20%, so it could still be judged large. It is therefore more meaningful to set the minimum support of an itemset as the maximum of the minimum supports of the items it contains.
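The example can be checked numerically. This sketch compares the lowest-minimum-support rule of Liu et al. with the maximum constraint, using the 20%, 40%, and 30% figures above; the function names are illustrative only.

```python
# Sketch comparing two ways of deriving an itemset's threshold from
# per-item minimum supports, using the figures from the example.

min_sup = {"A": 0.20, "B": 0.40}
support = {"B": 0.30}  # observed support of item B

def threshold_lowest(itemset):
    """Liu et al.: lowest minimum support among the items."""
    return min(min_sup[x] for x in itemset)

def threshold_maximum(itemset):
    """Maximum constraint: highest minimum support among the items."""
    return max(min_sup[x] for x in itemset)

pair = {"A", "B"}
print(threshold_lowest(pair))   # 0.2
print(threshold_maximum(pair))  # 0.4
# support({A, B}) <= support(B) = 0.30 < 0.40, so {A, B} cannot be large
# under the maximum constraint.
print(support["B"] >= threshold_maximum(pair))  # False
```

Because an itemset's support never exceeds the support of any item in it, the maximum constraint guarantees that every item of a large itemset is itself large.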
The proposed algorithm
The mining algorithm for fuzzy multiple-level association rules under the maximum-itemset minimum-taxonomy support constraint of multiple minimum supports:
INPUT: a set of quantitative transaction data, a taxonomy whose primitive items are assigned their own minimum supports, a set of membership functions, and a minimum confidence value.
OUTPUT: a set of fuzzy multiple-level association rules under the maximum constraint of multiple minimum supports.
Step 1: Encode the taxonomy using a sequence of numbers and the symbol “*”.
Step 2: Translate the item names in the transaction data according to the encoding scheme.
Step 3: For each level k, group the items with the same first k digits in each transaction $D_i$, and add up the amounts of the items in the same group in $D_i$.
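Steps 2 and 3 can be sketched as follows, assuming Step 2 has already replaced item names with their taxonomy codes, that each code uses one digit per level, and that a transaction is a list of (code, quantity) pairs. The function name group_by_level and the sample transaction are hypothetical.

```python
from collections import defaultdict

# Sketch of Step 3. ASSUMPTIONS: items are already encoded (Step 2), one
# digit per level, and a transaction is a list of (code, quantity) pairs.

def group_by_level(transaction, k):
    """Sum the quantities of items sharing the same first k code digits."""
    groups = defaultdict(int)
    for code, qty in transaction:
        group_code = code[:k] + "*" * (len(code) - k)
        groups[group_code] += qty
    return dict(groups)

d1 = [("111", 5), ("112", 3), ("211", 4), ("212", 2)]  # hypothetical D_1
print(group_by_level(d1, 1))  # {'1**': 8, '2**': 6}
print(group_by_level(d1, 2))  # {'11*': 8, '21*': 6}
print(group_by_level(d1, 3))  # {'111': 5, '112': 3, '211': 4, '212': 2}
```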
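Step 4: Calculate the occurrence count of each group over all the transactions. Remove the groups whose counts are less than their respective support thresholds.
Step 5: Transform the quantitative value of each remaining group in each transaction into a fuzzy set $f_{ij}^{k} = f_{ij1}^{k}/R_{j1}^{k} + f_{ij2}^{k}/R_{j2}^{k} + \cdots + f_{ijh}^{k}/R_{jh}^{k}$, where k is the level number, h is the number of fuzzy regions for $I_j^k$, $R_{jl}^k$ is the l-th fuzzy region of $I_j^k$, and $f_{ijl}^k$ is the membership value of the quantity of $I_j^k$ in $D_i$ for region $R_{jl}^k$. (In this fuzzy-set notation, “/” pairs a membership value with its region and “+” joins the terms.)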
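Step 5 can be illustrated with three linguistic terms, Low, Middle, and High. The triangular membership functions below (breakpoints at 1, 6, and 11) are an assumption for illustration only; the paper's example supplies its own membership functions, which are not reproduced on these slides.

```python
# Sketch of Step 5 with three linguistic terms (Low, Middle, High).
# ASSUMPTION: the membership functions (breakpoints 1, 6, 11) are
# hypothetical placeholders, not the ones used in the paper's example.

def triangle(x, left, peak, right):
    """Triangular membership: 0 at left, 1 at peak, 0 at right."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def fuzzify(quantity):
    """Map a quantity to {region: membership}, keeping nonzero values only."""
    memberships = {
        "Low": max(0.0, min(1.0, (6 - quantity) / 5)),   # 1 up to 1, 0 from 6
        "Middle": triangle(quantity, 1, 6, 11),
        "High": max(0.0, min(1.0, (quantity - 6) / 5)),  # 0 up to 6, 1 from 11
    }
    return {region: m for region, m in memberships.items() if m > 0}

def as_fuzzy_set(memberships):
    """Render in the f/R + f/R notation used in Step 5."""
    return " + ".join(f"{m:g}/{region}" for region, m in memberships.items())

print(as_fuzzy_set(fuzzify(8)))  # 0.6/Middle + 0.4/High
```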
Step 6: Collect the fuzzy regions (linguistic terms) with membership values > 0 to form the candidate set $C_1^k$.
Step 7: Check whether the count $count_{jl}^k = \sum_{i=1}^{n} f_{ijl}^k$ of each region $R_{jl}^k$ in $C_1^k$ is ≥ its threshold, which is the minimum of the minimum supports of the primitive items descending from it. If $R_{jl}^k$ satisfies the threshold, put it into the large 1-itemset $L_1^k$ for level k.
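A sketch of Steps 6 and 7, assuming (hypothetically) that minimum supports are stated on the same scale as the fuzzy counts, so a region's count can be compared with its threshold directly. A region's count is the sum of its membership values over the transactions, and a group's threshold is the smallest minimum support among the primitive items whose codes start with the group's prefix. All numbers are made up.

```python
# Sketch of Steps 6-7. ASSUMPTIONS: one digit per level; minimum supports
# are on the same scale as counts; all values are hypothetical.

def region_count(memberships):
    """Scalar cardinality: sum of membership values over transactions."""
    return sum(memberships)

def group_threshold(group_code, item_min_sup):
    """Minimum of the minimum supports of primitive items under the group."""
    prefix = group_code.rstrip("*")
    return min(s for code, s in item_min_sup.items() if code.startswith(prefix))

item_min_sup = {"111": 2.0, "112": 3.0, "211": 2.5, "212": 4.0}

# Candidate regions (membership > 0 somewhere) with values in D_1..D_4.
region_memberships = {
    ("1**", "Middle"): [0.5, 0.75, 0.25, 1.0],  # count 2.5
    ("21*", "Low"): [0.25, 0.5, 0.0, 0.75],     # count 1.5
}

large_1 = []
for (group, region), values in region_memberships.items():
    if region_count(values) >= group_threshold(group, item_min_sup):
        large_1.append((group, region))
print(large_1)  # [('1**', 'Middle')]
```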
Step 8: Generate the candidate set $C_2^k$ from $L_1^1, L_1^2, \ldots, L_1^k$ to find “level-crossing” large itemsets satisfying the following conditions:
(a) Each 2-itemset in $C_2^k$ must contain at least one item in $L_1^k$.
(b) The two regions in a 2-itemset may not have the same item name.
(c) The two item names in a 2-itemset may not have an ancestor-descendant relation in the taxonomy.
(d) The support values of both large 1-itemsets comprising a candidate 2-itemset must be ≥ the maximum of the minimum supports of the two large 1-itemsets.
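The four conditions of Step 8 can be written as filters over pairs of large 1-itemsets. This sketch assumes one digit per taxonomy level and represents each large 1-itemset as a hypothetical tuple (code, region, count, min_support), where min_support is the threshold the region passed in Step 7. With the sample data, the level-crossing pair {2**, 11*} survives while {2**, 12*} is dropped by condition (d).

```python
from itertools import combinations

# Sketch of Step 8. ASSUMPTIONS: one digit per level; each large
# 1-itemset is a hypothetical tuple (code, region, count, min_support).

def level(code):
    """Level of a code = number of digits before the '*' padding."""
    return len(code.rstrip("*"))

def related(a, b):
    """True if the codes are equal or one is an ancestor of the other."""
    pa, pb = a.rstrip("*"), b.rstrip("*")
    return pa.startswith(pb) or pb.startswith(pa)

def candidate_2_itemsets(large_1, k):
    pool = [x for x in large_1 if level(x[0]) <= k]  # L_1^1 ... L_1^k
    candidates = []
    for (cx, rx, nx, mx), (cy, ry, ny, my) in combinations(pool, 2):
        if level(cx) != k and level(cy) != k:
            continue  # (a) needs at least one item from L_1^k
        if related(cx, cy):
            continue  # (b) same item name, or (c) ancestor-descendant pair
        bound = max(mx, my)
        if nx >= bound and ny >= bound:  # (d) both pass the larger threshold
            candidates.append(((cx, rx), (cy, ry)))
    return candidates

large_1 = [
    ("1**", "Middle", 3.0, 2.0),  # level 1
    ("2**", "Low", 2.8, 2.5),     # level 1
    ("11*", "High", 2.6, 2.0),    # level 2
    ("12*", "Low", 2.1, 2.0),     # level 2
]
print(candidate_2_itemsets(large_1, k=2))
# [(('2**', 'Low'), ('11*', 'High')), (('11*', 'High'), ('12*', 'Low'))]
```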
Step 9: Do the following substeps for each newly formed candidate 2-itemset s with regions $(s_1, s_2)$ in $C_2^k$:
(a) Calculate the fuzzy value of s in each transaction $D_i$ as $f_{is} = f_{is_1} \wedge f_{is_2}$, where ∧ is the minimum operator.
(b) Calculate the scalar cardinality of s over all the transactions as $count_s = \sum_{i=1}^{n} f_{is}$, where n is the number of transactions.
(c) If $count_s$ is ≥ the maximum of the minimum supports of the items contained in s, put s into $L_2^k$.
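Step 9 amounts to taking, per transaction, the minimum of the two regions' membership values and summing those minima. The sketch below uses hypothetical membership values and minimum supports for a 2-itemset with regions s1 and s2; because itemset_count accepts any number of lists, the same helper also covers larger itemsets.

```python
# Sketch of Step 9. ASSUMPTION: the membership values and minimum supports
# below are hypothetical; counts and thresholds are on the same scale.

def itemset_count(*membership_lists):
    """Scalar cardinality: sum over transactions of the minimum membership."""
    return sum(min(values) for values in zip(*membership_lists))

f_s1 = [0.5, 1.0, 0.25, 0.75]  # membership of region s1 in D_1..D_4
f_s2 = [0.75, 0.5, 0.0, 1.0]   # membership of region s2 in D_1..D_4
min_sup = {"s1": 1.5, "s2": 1.25}

count_s = itemset_count(f_s1, f_s2)       # 0.5 + 0.5 + 0.0 + 0.75
in_L2 = count_s >= max(min_sup.values())  # maximum constraint
print(count_s, in_L2)  # 1.75 True
```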
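Step 10: Repeat similar steps to generate all large q-itemsets.
Step 11: Construct the fuzzy association rules for each large q-itemset s with regions $(s_1, s_2, \ldots, s_q)$ by the following substeps:
(a) Form all possible association rules $s_1 \wedge \cdots \wedge s_{r-1} \wedge s_{r+1} \wedge \cdots \wedge s_q \rightarrow s_r$, for r = 1 to q.
(b) Calculate the confidence value of each association rule by
$$\frac{\sum_{i=1}^{n} f_{is}}{\sum_{i=1}^{n} \left(f_{is_1} \wedge \cdots \wedge f_{is_{r-1}} \wedge f_{is_{r+1}} \wedge \cdots \wedge f_{is_q}\right)}$$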
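A sketch of Step 11 for a single large itemset: each region in turn becomes the consequent, and the rule's confidence is computed as the itemset's count divided by the count of the remaining (antecedent) regions, both with the minimum operator. The region names A.Middle and B.Low and their membership values are hypothetical.

```python
# Sketch of Step 11 for one large itemset. ASSUMPTION: region names and
# membership values are hypothetical. Both counts use the minimum operator.

def count(membership_lists):
    """Sum over transactions of the minimum membership across the lists."""
    return sum(min(values) for values in zip(*membership_lists))

def rules_with_confidence(itemset):
    """itemset maps region name -> memberships in D_1..D_n.

    For each r, region r becomes the consequent and the rest the antecedent.
    """
    names = list(itemset)
    whole = count([itemset[name] for name in names])
    rules = []
    for r in names:
        antecedent = [name for name in names if name != r]
        conf = whole / count([itemset[name] for name in antecedent])
        rules.append((antecedent, r, conf))
    return rules

itemset = {
    "A.Middle": [1.0, 0.5, 0.75, 0.0],  # count 2.25
    "B.Low": [0.5, 0.5, 1.0, 1.0],      # count 3.0; joint count 1.75
}
for antecedent, consequent, conf in rules_with_confidence(itemset):
    print(f"If {' and '.join(antecedent)}, then {consequent}: conf = {conf:.2f}")
# If B.Low, then A.Middle: conf = 0.58
# If A.Middle, then B.Low: conf = 0.78
```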
Step 12: Output the rules with confidence values ≥ the predefined minimum confidence value.
An example
All possible association rules are formed as follows:
If 2** = Middle, then 3** = Middle;
If 3** = Middle, then 2** = Middle;
If 21* = Middle, then 22* = Low;
If 22* = Low, then 21* = Middle;
If 22* = Low, then 32* = Middle;
If 32* = Middle, then 22* = Low.
The confidence values of the above association rules are calculated as follows:
If 2** = Middle, then 3** = Middle, with conf = 0.74.
If 3** = Middle, then 2** = Middle, with conf = 0.69.
If 21* = Middle, then 22* = Low, with conf = 0.82.
If 22* = Low, then 21* = Middle, with conf = 0.46.
If 22* = Low, then 32* = Middle, with conf = 0.97.
If 32* = Middle, then 22* = Low, with conf = 1.0.
Assume the minimum confidence is set at 0.8 in this example. The following three association rules are then output:
If 21* = Middle, then 22* = Low, with conf = 0.82.
If 22* = Low, then 32* = Middle, with conf = 0.97.
If 32* = Middle, then 22* = Low, with conf = 1.0.
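Step 12 applied to the example can be reproduced by filtering the six rules at the 0.8 threshold. The confidence values are copied from the example, with 0.46 taken for the rule “If 22* = Low, then 21* = Middle”; the variable names are illustrative only.

```python
# The six rules and confidence values from the example, filtered as in
# Step 12 with a minimum confidence of 0.8.

rules = [
    ("2** = Middle", "3** = Middle", 0.74),
    ("3** = Middle", "2** = Middle", 0.69),
    ("21* = Middle", "22* = Low", 0.82),
    ("22* = Low", "21* = Middle", 0.46),
    ("22* = Low", "32* = Middle", 0.97),
    ("32* = Middle", "22* = Low", 1.0),
]
MIN_CONF = 0.8

kept = [rule for rule in rules if rule[2] >= MIN_CONF]
for antecedent, consequent, conf in kept:
    print(f"If {antecedent}, then {consequent}, with conf = {conf}")
# If 21* = Middle, then 22* = Low, with conf = 0.82
# If 22* = Low, then 32* = Middle, with conf = 0.97
# If 32* = Middle, then 22* = Low, with conf = 1.0
```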
Conclusion
This algorithm offers a solution to three issues that commonly arise in real mining applications: using different criteria to judge the importance of different items, managing taxonomic relationships among items, and dealing with quantitative data sets. In this algorithm, the minimum support of an item at a higher taxonomic concept is set as the minimum of the minimum supports of the items belonging to it, and the minimum support of an itemset is set as the maximum of the minimum supports of the items contained in the itemset.
THANK YOU !!