
1 Outline
Introduction
  – Frequent patterns and the Rare Item Problem
  – Multiple Minimum Support Framework
  – Issues with Multiple Minimum Support Framework
Related Work
Proposed Approaches
  – Methodology to specify items’ MIS values
  – An algorithm to mine frequent patterns effectively
  – Mining frequent patterns in databases in which items’ frequencies vary widely
  – Mining rare periodic-frequent patterns
Conclusions and Future Work

2 Related Work

8 Methodologies to Specify Items’ MIS values
Liu et al. (KDD ’99) introduced a percentage-based methodology to specify items’ MIS values.
Percentage-based methodology:
  – Each item’s MIS value is a fixed percentage of its support:
    $MIS(i_j) = \max(\beta \cdot S(i_j),\ LS)$
    where $S(i_j)$ is the support of item $i_j \in I$, $LS$ is the lowest MIS an item can have, and $\beta \in [0, 1]$ is a user-specified constant.
This methodology still suffers from the rare item problem.
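The percentage-based formula can be sketched in a few lines of Python. The function name `percentage_based_mis` and the item supports are hypothetical, chosen only for illustration; supports and LS are taken as fractions of the number of transactions.

```python
# Percentage-based MIS assignment (Liu et al., KDD '99):
#   MIS(i) = max(beta * S(i), LS)
# Supports and LS are fractions of the number of transactions.


def percentage_based_mis(supports, beta, ls):
    """Return {item: MIS} with MIS(i) = max(beta * S(i), LS)."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return {item: max(beta * s, ls) for item, s in supports.items()}


# Hypothetical item supports, for illustration only.
supports = {"bread": 0.40, "milk": 0.20, "caviar": 0.02}
mis = percentage_based_mis(supports, beta=0.5, ls=0.01)
for item, s in supports.items():
    print(f"{item}: S={s:.2f}  MIS={mis[item]:.2f}  S-MIS={s - mis[item]:.2f}")
```

The printed S-MIS column shows that the gap between an item's support and its MIS value shrinks with the item's support.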

9 Rare Item Problem in the Percentage-Based Methodology
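Under the percentage-based formula, whenever $\beta \cdot S(i_j) \ge LS$, the gap between an item's support and its MIS value is proportional to the support itself. The numbers below are illustrative only (assuming $\beta = 0.5$ and $LS \le 0.01$), not taken from the thesis:

```latex
\begin{aligned}
S(i_j) - MIS(i_j) &= S(i_j) - \beta\, S(i_j) = (1 - \beta)\, S(i_j) \\
S(i_1) = 0.40 &\;\Rightarrow\; S(i_1) - MIS(i_1) = 0.20 \\
S(i_2) = 0.02 &\;\Rightarrow\; S(i_2) - MIS(i_2) = 0.01
\end{aligned}
```

The absolute slack is large for frequent items and very small for rare ones, which is one way to read the "uniform difference" remark on slide 36.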

10 Proposed Methodology to Specify Items’ MIS values
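Slide 36 describes the proposed support difference-based methodology as keeping a uniform difference between an item's support and its MIS value. A minimal sketch, assuming the rule MIS(i) = max(S(i) − SD, LS) with a user-chosen constant SD. How SD is derived from α and β (Table 3) is not reproduced here, so SD is passed in directly, and the function name is hypothetical:

```python
# ASSUMPTION: MIS(i) = max(S(i) - SD, LS), i.e. every item's MIS sits a
# constant SD below its support and is floored at LS. The exact formula
# used in the thesis is not shown on this slide.


def support_difference_mis(supports, sd, ls):
    """Return {item: MIS} with MIS(i) = max(S(i) - SD, LS)."""
    if sd < 0:
        raise ValueError("SD must be non-negative")
    return {item: max(s - sd, ls) for item, s in supports.items()}


# Hypothetical item supports, for illustration only.
supports = {"bread": 0.40, "milk": 0.20, "caviar": 0.02}
mis = support_difference_mis(supports, sd=0.05, ls=0.01)
for item, s in supports.items():
    print(f"{item}: S={s:.2f}  MIS={mis[item]:.2f}")
```

Here "bread" and "milk" both keep a gap of 0.05 below their supports, while the MIS of the rare item "caviar" is floored at LS.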

11 Experimental Results
Datasets:
  1. Synthetic dataset: 870 items, 100,000 transactions.
  2. Real-world dataset: 83 items, 298 transactions.
Parameter values:
  – LS = 0.1
  – α = mean of the supports of all frequent items
  – β = 0.25, 0.5 and 0.9
Algorithms:
  – Apriori
  – MSApriori (uses the percentage-based methodology)
  – IMSApriori (uses the support difference-based methodology)
Table 3: SD values used in different datasets.

12 Experiment 1: Analysis of MIS values specified by both methods
Figure: MIS values specified by the percentage-based methodology in the synthetic dataset.
Figure: MIS values specified by the support difference-based methodology in the synthetic dataset.

13 Experiment 2: Generation of Frequent Patterns
Figure: Generation of frequent patterns in the synthetic and retail datasets.

14 Outline
Introduction
  – Frequent patterns and the Rare Item Problem
  – Multiple Minimum Support Framework
  – Issues with Multiple Minimum Support Framework
  – Contribution of this thesis
Related Work
Proposed Approaches
  – Methodology to specify items’ MIS values
  – An algorithm to mine frequent patterns effectively
  – Mining frequent patterns in databases in which items’ frequencies vary widely
  – Mining rare periodic-frequent patterns
Conclusions and Future Work

15 Improved Multiple Minimum Support Based Frequent Pattern Mining Approaches

16 CFP-growth Algorithm
Example:
Items: ac bd e f g h
MIS: 10 87 3 33 2

17 CFP-growth Algorithm
2. Using the sorted list of items, an FP-tree-like structure called the MIS-tree is constructed by inserting each transaction as the transactional database is scanned.
Figure 19: Construction of the MIS-tree. (a) Before scanning the database. (b) After scanning the first transaction. (c) After scanning the second transaction. (d) After scanning every transaction.
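A minimal sketch of MIS-tree construction as a prefix tree, assuming the "sorted list of items" is ordered by descending MIS value, with ties broken alphabetically (the tie-break is an assumption). All names and data are hypothetical, and the header table and node links of a full implementation are omitted:

```python
from collections import defaultdict


class Node:
    """One MIS-tree node: an item, a count and child links."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_mis_tree(transactions, mis):
    """Insert every transaction, items sorted by descending MIS, into a
    prefix tree in one pass. Every item must have an MIS value.
    Returns the root and the support count of every item."""
    order = sorted(mis, key=lambda i: (-mis[i], i))
    rank = {item: r for r, item in enumerate(order)}
    root = Node(None, None)
    support = defaultdict(int)
    for t in transactions:
        node = root
        for item in sorted(set(t), key=rank.__getitem__):
            support[item] += 1
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item, node)
            child.count += 1
            node = child
    return root, dict(support)


def paths(node, prefix=()):
    """Yield (path, count) for every node below `node`, for inspection."""
    for child in node.children.values():
        path = prefix + (child.item,)
        yield path, child.count
        yield from paths(child, path)


T = [["a", "b", "c"], ["a", "c"], ["b", "d"]]  # hypothetical transactions
mis = {"a": 3, "b": 2, "c": 2, "d": 1}           # hypothetical MIS counts
root, support = build_mis_tree(T, mis)
for path, count in paths(root):
    print("-".join(path), count)
```

Unlike an FP-tree built for a single minsup, this tree keeps every item, because under multiple minimum supports an item that is infrequent on its own can still appear in a frequent pattern.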

18 CFP-growth Algorithm
3. Items that cannot generate any frequent pattern are removed from the MIS-tree using the following criterion: “Items whose support is less than the lowest MIS value among all items cannot generate any frequent pattern.” Here the lowest MIS value is 2, so item ‘h’, whose support is less than 2, is removed from the MIS-tree.
Items: ac bd e f g h
MIS: 10 87 3 33 2
Sup.: 129 98 5 32 1
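The quoted CFP-growth criterion amounts to a single comparison against the lowest MIS value. A sketch with hypothetical items and counts (not the slide's table):

```python
def cfp_growth_keep(support, mis):
    """Items kept by the quoted criterion: support >= lowest MIS among all items."""
    lowest_mis = min(mis.values())
    return {item for item, s in support.items() if s >= lowest_mis}


# Hypothetical values: the lowest MIS is 2, so only 's' (support 1) is pruned.
mis = {"p": 5, "q": 3, "r": 3, "s": 2}
support = {"p": 8, "q": 4, "r": 2, "s": 1}
print(sorted(cfp_growth_keep(support, mis)))
```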

19 CFP-growth Algorithm
4. The resultant tree is known as the compact MIS-tree.
Figure: Compact MIS-tree created after pruning item ‘h’ from the MIS-tree.

20 CFP-growth Algorithm
5. The compact MIS-tree is mined using conditional pattern bases to discover the complete set of frequent patterns.
6. Since the downward closure property no longer holds, CFP-growth keeps building conditional pattern bases for a suffix pattern until its conditional pattern base becomes empty.
Figure: Mining frequent patterns from the MIS-tree.
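Why downward closure fails can be shown on a tiny hypothetical database. Under multiple minimum supports, a pattern is frequent when its support reaches the lowest MIS among its items, so a superset can be frequent while one of its subsets is not:

```python
def support_of(pattern, transactions):
    """Number of transactions containing every item of `pattern`."""
    return sum(1 for t in transactions if set(pattern) <= t)


def is_frequent(pattern, transactions, mis):
    """Frequent iff support >= the lowest MIS among the pattern's items."""
    return support_of(pattern, transactions) >= min(mis[i] for i in pattern)


transactions = [{"a", "b"}, {"a", "b"}, {"a"}, {"c"}]  # hypothetical
mis = {"a": 4, "b": 2, "c": 1}                         # hypothetical MIS counts
print(is_frequent(("a",), transactions, mis))      # S(a) = 3 < MIS(a) = 4
print(is_frequent(("a", "b"), transactions, mis))  # S(ab) = 2 >= min(4, 2)
```

Because {a, b} is frequent although {a} is not, a miner cannot stop extending a suffix pattern just because it is infrequent.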

21 Performance Issues in CFP-growth
1. The criterion CFP-growth uses to prune items from the MIS-tree still retains some items that cannot generate any frequent pattern. CFP-growth prunes item ‘h’ and keeps items ‘a, b, c, d, e, f and g’ for generating frequent patterns. However, ‘g’ cannot generate any frequent pattern, as its support is less than the lowest MIS value among all remaining items.
2. CFP-growth also searches some infrequent suffix patterns that cannot generate any frequent pattern at any higher order.
Items: ac bd e f g h
MIS: 10 87 3 33 2
Sup.: 129 98 5 32 1

22 An Improved CFP-growth Algorithm: CFP-growth++

23 Correctness of the observations

24 Four pruning techniques

25 Four pruning techniques

26 Working of CFP-growth++ Algorithm

27 Working of CFP-growth++ Algorithm

28 Step 1: Construction of MIS-tree
The algorithm constructs the MIS-tree using the user-specified items’ MIS values.
Figure: MIS-tree constructed after scanning every transaction in the database.

29 Step 2: Construction of Compact MIS-tree
Using the least minimum support, CFP-growth++ prunes all items that cannot generate any frequent pattern at a higher order.
Figure: MIS-tree after completely pruning items ‘g’ and ‘h’. Note that ‘g’ is not pruned by CFP-growth.
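A sketch of least-minimum-support (LMS) pruning. The slide does not define LMS formally; the sketch assumes LMS is the lowest MIS among items that are themselves frequent (support ≥ own MIS). That assumption yields a provably safe rule, argued in the docstring. Names and data are hypothetical:

```python
def least_minimum_support(support, mis):
    """ASSUMED formalization of LMS: the lowest MIS among items that are
    themselves frequent (support >= own MIS). None if no item is frequent."""
    frequent = [mis[i] for i in mis if support.get(i, 0) >= mis[i]]
    return min(frequent) if frequent else None


def lms_keep(support, mis):
    """Keep only items whose support reaches LMS.

    Why this is safe: if a pattern X is frequent, its lowest-MIS item j
    satisfies S(j) >= S(X) >= MIS(j), so j is frequent and MIS(j) >= LMS.
    Every item i in X then has S(i) >= S(X) >= MIS(j) >= LMS."""
    lms = least_minimum_support(support, mis)
    if lms is None:
        return set()  # no frequent item means no frequent pattern
    return {item for item, s in support.items() if s >= lms}


# Hypothetical values (same as for the CFP-growth criterion sketch).
mis = {"p": 5, "q": 3, "r": 3, "s": 2}
support = {"p": 8, "q": 4, "r": 2, "s": 1}
print(least_minimum_support(support, mis))
print(sorted(lms_keep(support, mis)))
```

On these values the "lowest MIS among all items" criterion (2) keeps 'r', whereas LMS = 3 also prunes it, mirroring how CFP-growth++ removes ‘g’ in addition to ‘h’.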

30 Step 2: Construction of Compact MIS-tree
Using infrequent leaf node pruning, the leaf nodes of infrequent items are pruned from the MIS-tree. The resultant tree is known as the compact MIS-tree.
Figure: Compact MIS-tree generated after infrequent leaf node pruning.
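A sketch of infrequent leaf node pruning on a hand-built hypothetical tree. It rests on the following reasoning, which is an assumption rather than a restatement of the thesis. Items along a path are in descending MIS order, so transactions ending at a leaf contain no item with a lower MIS than the leaf's item. Any pattern they support that includes that item therefore has support at most the item's support, which is below its MIS. The ancestors' counts already include the leaf's count. The sketch also assumes that pruning repeats when a parent becomes an infrequent leaf.

```python
class Node:
    """A prefix-tree node: item, count and a list of children."""

    def __init__(self, item, count=0, children=None):
        self.item = item
        self.count = count
        self.children = children if children is not None else []


def prune_infrequent_leaves(node, support, mis):
    """Remove leaf nodes whose item is infrequent (support < own MIS).

    Children are processed first (post-order), so a node that becomes a
    leaf after its subtree is pruned is tested as well."""
    kept = []
    for child in node.children:
        prune_infrequent_leaves(child, support, mis)
        if not child.children and support[child.item] < mis[child.item]:
            continue  # infrequent leaf: drop it
        kept.append(child)
    node.children = kept


def shape(node):
    """Nested (item, count, children) tuples, for inspection."""
    return [(c.item, c.count, shape(c)) for c in node.children]


# Hypothetical tree for transactions {a,c,b}, {a,d,e}, {a}, {b},
# items ordered by descending MIS: a, c, d, e, b.
mis = {"a": 3, "c": 2, "d": 2, "e": 2, "b": 1}
support = {"a": 3, "b": 2, "c": 1, "d": 1, "e": 1}
tree = Node(None, 0, [
    Node("a", 3, [Node("c", 1, [Node("b", 1)]), Node("d", 1, [Node("e", 1)])]),
    Node("b", 1),
])
prune_infrequent_leaves(tree, support, mis)
print(shape(tree))
```

The branch a→d→e disappears (first 'e', then the newly exposed 'd'), while the infrequent 'c' survives because it still has the frequent 'b' below it.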

31 Step 3: Mining Compact MIS-tree
Using the conditional minimum support and the conditional closure property, the compact MIS-tree is mined with conditional pattern bases to discover the complete set of frequent patterns.
Figure: Mining the compact MIS-tree using conditional pattern bases.
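A simplified sketch of this mining step. It assumes that the conditional minimum support of a suffix item is that item's own MIS. Every item in the suffix's prefix paths has an equal or higher MIS, so any pattern grown from that suffix needs exactly that support, and ordinary downward closure holds inside the conditional pattern base. For brevity, conditional pattern bases are plain lists of (prefix, count) pairs rather than trees, and LMS pruning uses the same assumed formalization as before. All names and data are hypothetical:

```python
from collections import Counter


def mine_mms(transactions, mis):
    """Pattern growth over conditional pattern bases under multiple minimum
    supports (MIS as absolute counts). Returns {frozenset: support}."""
    # Order items by descending MIS (ties alphabetical; an assumption).
    order = sorted(mis, key=lambda i: (-mis[i], i))
    rank = {item: r for r, item in enumerate(order)}
    db = [(sorted(set(t), key=rank.__getitem__), 1) for t in transactions]

    support = Counter()
    for items, c in db:
        for i in items:
            support[i] += c

    # Least-minimum-support pruning (assumed formalization).
    frequent_mis = [mis[i] for i in support if support[i] >= mis[i]]
    if not frequent_mis:
        return {}
    lms = min(frequent_mis)
    db = [([i for i in items if support[i] >= lms], c) for items, c in db]

    result = {}
    for suffix in order:
        cmin = mis[suffix]  # conditional minimum support for this suffix
        if support[suffix] < cmin:
            continue  # an infrequent suffix item cannot end a frequent pattern
        base = [(items[: items.index(suffix)], c) for items, c in db if suffix in items]
        _grow((suffix,), support[suffix], base, cmin, result)
    return result


def _grow(pattern, count, base, cmin, result):
    """Record `pattern`, then extend it with items from its conditional base.
    Every item in `base` has MIS >= cmin, so within this base an infrequent
    pattern has no frequent extension (conditional closure)."""
    result[frozenset(pattern)] = count
    counts = Counter()
    for items, c in base:
        for i in items:
            counts[i] += c
    for item, cnt in counts.items():
        if cnt < cmin:
            continue
        sub = [(items[: items.index(item)], c) for items, c in base if item in items]
        _grow(pattern + (item,), cnt, sub, cmin, result)


T = [{"a", "b", "c"}, {"a", "b"}, {"a", "c", "d"}, {"b", "c"}, {"a", "b", "c", "d"}, {"d"}]
mis = {"a": 4, "b": 3, "c": 3, "d": 2}  # hypothetical MIS counts
found = mine_mms(T, mis)
for pat, sup in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(pat)), sup)
```

In this example {a, b, c} is not frequent (support 2 < 3), while {a, c, d} is (support 2 ≥ MIS(d) = 2), so the result cannot be produced by a single-minsup Apriori-style cut.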

32 Experimental Results
Table 4: Dataset characteristics.
The percentage-based methodology is used to specify items’ MIS values, with LS = minsup = 0.1 and β = 1/α, where α is varied from 1 to 20.

33 Experiment 1: Generation of Frequent Patterns
Figure: Generation of frequent patterns in different datasets.

34 Experiment 2: Runtime Requirements
Figure: Runtime taken by various algorithms in different datasets.

35 Experiment 3: Scalability Test
β = 0.5 and LS = 0.1
Experimental procedure:
  – Dataset: Kosarak
  – The dataset is divided into five portions of 0.2 million transactions each.
  – The portions are added incrementally, one at a time.
Figure: Runtime taken by different algorithms.

36 Summary of the Contributions
Topic: Specifying items’ MIS values
Existing methodology: Percentage-based methodology
Performance problem: Causes the rare item problem, as it does not maintain a uniform difference between items’ support and MIS values.
Proposed methodology: Support difference-based methodology

Topic: Patterns do not satisfy the downward closure property
Existing methodology: CFP-growth
Performance problem: 1. The constructed tree is not efficient. 2. The search space is huge, as it searches using items that cannot generate frequent patterns at higher order.
Proposed methodology: CFP-growth++ uses “least minimum support” and “infrequent leaf node pruning” to construct the tree effectively. In addition, it uses “conditional minsup” and the “conditional closure property” to reduce the search space.

37 Summary of Contributions
Topic: Mining frequent patterns in databases with widely varying items’ frequencies
Existing methodology: Multiple minimum support framework (not sufficient for such databases)
Performance problem: Generates uninteresting frequent patterns containing both very high and very low frequency items; the items within such a pattern are not correlated.
Proposed methodology: A new interestingness measure, “item-to-pattern difference”, to prune such uninteresting frequent patterns.

Topic: Periodic-frequent pattern mining
Existing methodology: Single minimum support and single maximum periodicity framework
Performance problem: The rare item problem
Proposed methodology: 1. The multiple minimum supports and maximum periodicity framework. 2. A pattern-growth algorithm.
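The two measures named in this summary can be sketched as follows. The item-to-pattern difference is not defined on the slide. The form used here (highest single-item support in the pattern minus the pattern's support) is an assumption that matches the stated goal of flagging patterns that mix very frequent and very rare items. Periodicity follows the usual periodic-frequent definition: the largest gap between consecutive occurrences, counting the gaps from the start and to the end of the database. How per-item thresholds combine into a pattern's min_sup and max_per is not shown, so they are passed in directly. Names and data are hypothetical:

```python
def support_of(pattern, transactions):
    """Number of transactions containing every item of `pattern`."""
    return sum(1 for t in transactions if set(pattern) <= t)


def item_to_pattern_difference(pattern, transactions):
    """ASSUMED form of ipd: the highest support of any single item in the
    pattern minus the support of the pattern itself."""
    top = max(support_of((i,), transactions) for i in pattern)
    return top - support_of(pattern, transactions)


def periodicity(tids, n_transactions):
    """Largest gap between consecutive occurrences of a pattern, including
    the gap from the start of the database and the gap to its end.
    `tids` are 1-based transaction ids."""
    points = [0] + sorted(tids) + [n_transactions]
    return max(b - a for a, b in zip(points, points[1:]))


def is_periodic_frequent(tids, n_transactions, min_sup, max_per):
    """Periodic-frequent: support >= min_sup and periodicity <= max_per."""
    return len(tids) >= min_sup and periodicity(tids, n_transactions) <= max_per


# Hypothetical data.
db = [{"bread"}] * 7 + [{"bread", "caviar"}, {"caviar"}, {"milk"}]
print(item_to_pattern_difference(("bread", "caviar"), db))
print(periodicity([2, 4, 7, 9], 10))
print(is_periodic_frequent([2, 4, 7, 9], 10, min_sup=3, max_per=3))
```

A large ipd (here 8 − 1 = 7) marks {bread, caviar} as a pattern whose items rarely occur together relative to how common 'bread' is.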

39 Conclusions and Future Work

40 Conclusions and Future Work

41 References
