Mining Negative Rules in Large Databases using GRD Dhananjay R Thiruvady Supervisor: Professor Geoffrey Webb.

1 Mining Negative Rules in Large Databases using GRD Dhananjay R Thiruvady Supervisor: Professor Geoffrey Webb

2 Overview: Aims, Association Rule Discovery, Generalized Rule Discovery, Tidsets and Diffsets, Conclusion

3 Aims
1. To mine negative rules in a large database using GRD.
2. To assess whether the negative rules are of potential interest to a user.

4 Association Rule Discovery
Rule: A => B (e.g. tea => coffee), where A is the antecedent and B is the consequent.
Aim: search the database for strong associations between itemsets.
Itemsets are subsets of the set of items in the data, e.g. {tea} or {tea, coffee} in a supermarket's transactions.

5 Association Rule Discovery (Contd.)
Support of Tea => Coffee: (transactions containing both Tea and Coffee) / (total number of transactions)
Confidence of Tea => Coffee: (transactions containing both Tea and Coffee) / (transactions containing Tea)
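To make the two definitions concrete, here is a minimal Python sketch on a small, made-up set of five supermarket transactions (the data and the helper names `count`, `support` and `confidence` are illustrative assumptions, not part of the talk). Both measures share the same numerator, the number of transactions that contain Tea and Coffee together; they differ only in the denominator.

```python
# Hypothetical toy data: each transaction is the set of items bought together.
transactions = [
    {"tea", "coffee", "milk"},
    {"tea", "milk"},
    {"tea", "coffee"},
    {"coffee", "milk"},
    {"milk"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)


def support(antecedent, consequent, transactions):
    """Support of A => B: fraction of all transactions containing both A and B."""
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence of A => B: fraction of transactions containing A that also contain B.

    Assumes A occurs in at least one transaction.
    """
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / count(antecedent, transactions)


print(support({"tea"}, {"coffee"}, transactions))     # 0.4 (2 of 5 transactions)
print(confidence({"tea"}, {"coffee"}, transactions))  # 0.6666666666666666 (2 of 3 tea transactions)
```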

6 Association Rule Discovery (Contd.)
Rules are generated from itemsets that meet a minimum support (frequent itemsets).
Further constraints can be applied, e.g. a minimum confidence or other measures of interest.
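The two-stage process (find frequent itemsets, then keep rules that pass a confidence threshold) can be sketched as below. This is a hypothetical, simplified level-wise (Apriori-style) illustration on made-up data, not the method used in the talk; `frequent_itemsets` and `generate_rules` are illustrative names. It relies on the fact that every subset of a frequent itemset is itself frequent, which is what makes the subset-pruning step safe.

```python
from itertools import combinations

# Hypothetical toy data.
transactions = [
    {"tea", "coffee", "milk"},
    {"tea", "milk"},
    {"tea", "coffee"},
    {"coffee", "milk"},
    {"milk"},
]


def frequent_itemsets(transactions, min_support):
    """Level-wise search for all itemsets whose support is >= min_support."""
    n = len(transactions)
    frequent = {}
    level = [frozenset([item]) for item in sorted(set().union(*transactions))]
    while level:
        # Keep only the candidates that meet the minimum support.
        current = {}
        for candidate in level:
            sup = sum(1 for t in transactions if candidate <= t) / n
            if sup >= min_support:
                current[candidate] = sup
        frequent.update(current)
        # Join frequent k-itemsets into (k+1)-item candidates, then drop any
        # candidate with an infrequent k-subset (it cannot be frequent).
        size = len(next(iter(current))) + 1 if current else 0
        joined = {a | b for a, b in combinations(current, 2) if len(a | b) == size}
        level = [c for c in joined
                 if all(frozenset(s) in current for s in combinations(c, size - 1))]
    return frequent


def generate_rules(frequent, min_confidence):
    """Split each frequent itemset into A => B and keep rules meeting min_confidence."""
    rules = []
    for itemset, sup in frequent.items():
        for k in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), k):
                a = frozenset(antecedent)
                conf = sup / frequent[a]  # subsets of frequent itemsets are frequent
                if conf >= min_confidence:
                    rules.append((a, itemset - a, sup, conf))
    return rules


for a, b, sup, conf in generate_rules(frequent_itemsets(transactions, 0.4), 0.6):
    print(sorted(a), "=>", sorted(b), f"support={sup:.2f}", f"confidence={conf:.2f}")
```

With a minimum support of 0.4, the three-item set {tea, coffee, milk} (support 0.2) is discarded, and rules such as milk => tea (confidence 0.5) fail the 0.6 confidence threshold.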

7 Generalized Rule Discovery
An alternative approach to association rule discovery.
Uses the OPUS algorithm for unordered search [Webb, 95].
Generates a large number of rules based on user-specified constraints, such as minimum support and minimum confidence.
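The OPUS search itself, with its own pruning and reordering strategies, is not reproduced here. As a much simplified, hypothetical sketch of the underlying idea, the Python below searches the unordered space of antecedents depth-first so that each set is visited exactly once, and applies minimum support and minimum confidence as the user-specified constraints. The name `search_rules`, the single-item consequent and the toy data are all assumptions made for illustration.

```python
# Hypothetical toy data.
transactions = [
    {"tea", "coffee", "milk"},
    {"tea", "milk"},
    {"tea", "coffee"},
    {"coffee", "milk"},
    {"milk"},
]


def count(itemset, transactions):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def search_rules(transactions, consequent, min_support, min_confidence):
    """Depth-first search over antecedents X for rules X => consequent.

    Each antecedent is generated exactly once: items are only added in a
    fixed (sorted) order, so {milk, tea} is reached as milk then tea, never
    as tea then milk. A branch is pruned when support(X + consequent) falls
    below min_support, since adding items to X can only lower that support.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions) - {consequent})
    found = []

    def expand(antecedent, start):
        for i in range(start, len(items)):
            x = antecedent | {items[i]}
            both = count(x | {consequent}, transactions)
            if both == 0 or both / n < min_support:
                continue  # prune: no superset of x can meet min_support
            conf = both / count(x, transactions)
            if conf >= min_confidence:
                found.append((set(x), consequent, both / n, conf))
            expand(x, i + 1)  # only extend with items later in the order

    expand(frozenset(), 0)
    return found


print(search_rules(transactions, "coffee", min_support=0.4, min_confidence=0.6))
```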

8 Tidsets and Diffsets [Zaki, Gouda, 01]
Every itemset is stored with its corresponding set of transaction identifiers (its tidset).
Vertical mining (storing a tidset per itemset) has been reported to be more efficient than horizontal mining (storing the items of each transaction).
[Table: tidsets of Tea, Coffee and Milk]
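A minimal Python sketch of the vertical layout, assuming the same kind of hypothetical toy transactions numbered from 1: each item is mapped to its tidset, and the tidset of a larger itemset is the intersection of its items' tidsets, so support is just the size of that intersection divided by the number of transactions. `to_vertical` and `itemset_tidset` are illustrative names only.

```python
# Hypothetical toy data in horizontal layout: one item set per transaction.
transactions = [
    {"tea", "coffee", "milk"},  # transaction 1
    {"tea", "milk"},            # transaction 2
    {"tea", "coffee"},          # transaction 3
    {"coffee", "milk"},         # transaction 4
    {"milk"},                   # transaction 5
]


def to_vertical(transactions):
    """Vertical layout: map each item to its tidset (ids of transactions containing it)."""
    tidsets = {}
    for tid, items in enumerate(transactions, start=1):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def itemset_tidset(itemset, tidsets):
    """Tidset of a non-empty itemset = intersection of the tidsets of its items."""
    return set.intersection(*(tidsets[item] for item in itemset))


tidsets = to_vertical(transactions)
print(tidsets["tea"])                              # {1, 2, 3}
print(itemset_tidset({"tea", "coffee"}, tidsets))  # {1, 3}
print(len(itemset_tidset({"tea", "coffee"}, tidsets)) / len(transactions))  # 0.4
```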

9 Tidsets and Diffsets (Contd.)
The diffset of an itemset is the set of transactions in which the itemset does not appear.
A diffset is therefore the tidset of the itemset's negation (a negative association).
[Table: tidsets of Tea, Coffee and Milk alongside Diffset (Tea), Diffset (Coffee) and Diffset (Milk)]
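The sketch below follows the definition on this slide: a diffset is the complement of an item's tidset with respect to all transactions. (Zaki and Gouda's own diffsets are defined relative to a prefix itemset P, as the transactions containing P but not the extended itemset; that variant is not shown here.) The toy data and the `diffset` helper are hypothetical.

```python
# Hypothetical toy data, transactions numbered from 1.
transactions = [
    {"tea", "coffee", "milk"},
    {"tea", "milk"},
    {"tea", "coffee"},
    {"coffee", "milk"},
    {"milk"},
]

all_tids = set(range(1, len(transactions) + 1))
# Tidset of each item: ids of the transactions that contain it.
tidsets = {
    item: {tid for tid, t in enumerate(transactions, start=1) if item in t}
    for item in set().union(*transactions)
}


def diffset(item, tidsets, all_tids):
    """Diffset as used on these slides: transactions in which `item` does NOT appear."""
    return all_tids - tidsets[item]


for item in sorted(tidsets):
    print(item, "tidset:", sorted(tidsets[item]),
          "diffset:", sorted(diffset(item, tidsets, all_tids)))
```

Every transaction id lands in exactly one of an item's tidset and diffset, so the diffset of Tea doubles as the tidset of ~Tea.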

10 Tidsets and Diffsets (Contd.)
GRD calculates the tidset of each itemset, so the corresponding diffset can be computed at very little extra cost.
[Table: tidsets of A, B and C alongside those of ~A, ~B and ~C]
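One way to see why the extra cost is small: assuming the tidsets of A, B and A together with B are already at hand, the supports of all four positive/negative combinations follow from three set sizes, with no further pass over the data. The Python sketch below uses hypothetical tidsets for tea (A) and coffee (B); `four_way_supports` is an illustrative name. The last entry uses inclusion-exclusion.

```python
def four_way_supports(t_a, t_b, n):
    """Supports of A&B, A&~B, ~A&B and ~A&~B from the tidsets of A and B.

    Only |t(A)|, |t(B)|, |t(A) & t(B)| and the number of transactions n
    are used, so no extra scan of the transactions is required.
    """
    ab = len(t_a & t_b)
    return {
        "A & B": ab / n,
        "A & ~B": (len(t_a) - ab) / n,
        "~A & B": (len(t_b) - ab) / n,
        "~A & ~B": (n - len(t_a) - len(t_b) + ab) / n,  # inclusion-exclusion
    }


# Hypothetical tidsets for tea (A) and coffee (B) over transactions 1..5.
tea, coffee, n = {1, 2, 3}, {1, 3, 4}, 5
print(four_way_supports(tea, coffee, n))
```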

11 Conclusion
Find negative correlations between itemsets in a database, e.g. rules of the form tea => ~coffee, ~tea => coffee and ~tea => ~coffee.
This will be achieved by extending the GRD technique.
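As a hedged illustration of the three negative rule forms, the Python below evaluates tea => ~coffee, ~tea => coffee and ~tea => ~coffee on hypothetical tidsets, treating a negated item "~x" as having the diffset of x as its tidset. The "~" prefix convention and the helper names `literal_tidset` and `rule_stats` are assumptions for this sketch, not part of GRD.

```python
# Hypothetical tidsets over transactions 1..5.
tidsets = {"tea": {1, 2, 3}, "coffee": {1, 3, 4}, "milk": {1, 2, 4, 5}}
all_tids = {1, 2, 3, 4, 5}


def literal_tidset(literal, tidsets, all_tids):
    """Tidset of an item ('tea') or of its negation ('~tea')."""
    if literal.startswith("~"):
        return all_tids - tidsets[literal[1:]]  # negation: the item's diffset
    return tidsets[literal]


def rule_stats(antecedent, consequent, tidsets, all_tids):
    """(support, confidence) of a rule whose sides may be negated items."""
    t_a = literal_tidset(antecedent, tidsets, all_tids)
    t_ab = t_a & literal_tidset(consequent, tidsets, all_tids)
    sup = len(t_ab) / len(all_tids)
    conf = len(t_ab) / len(t_a) if t_a else 0.0
    return sup, conf


for a, b in [("tea", "~coffee"), ("~tea", "coffee"), ("~tea", "~coffee")]:
    sup, conf = rule_stats(a, b, tidsets, all_tids)
    print(f"{a} => {b}: support={sup:.2f}, confidence={conf:.2f}")
```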

12 Conclusion (Contd.)
Using diffsets: tidset(A) = diffset(~A), so negative associations can be calculated with very little additional computational overhead.
Assess whether the negative correlations found are potentially interesting.
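The slides do not say which measure will be used to judge whether a negative rule is interesting. As one hypothetical choice, the sketch below uses leverage, support(A and B) minus support(A) times support(B), which is positive when the two sides co-occur more often than independence would predict. Tidsets of negated items are taken from the identity on this slide (the tidset of ~x is the diffset of x); the data and helper names are assumptions. Since support(A and ~B) = support(A) - support(A and B), leverage(A => ~B) works out to exactly -leverage(A => B), so under this measure a negative rule stands out precisely when the corresponding positive rule has negative leverage.

```python
# Hypothetical tidsets over transactions 1..5.
tidsets = {"tea": {1, 2, 3}, "coffee": {1, 3, 4}}
all_tids = {1, 2, 3, 4, 5}


def tids(literal, tidsets, all_tids):
    """Tidset of 'x' or '~x'; the tidset of '~x' is the diffset of 'x'."""
    if literal.startswith("~"):
        return all_tids - tidsets[literal[1:]]
    return tidsets[literal]


def leverage(antecedent, consequent, tidsets, all_tids):
    """support(A and B) - support(A) * support(B).

    > 0: A and B co-occur more often than expected under independence.
    < 0: they co-occur less often than expected.
    """
    n = len(all_tids)
    t_a = tids(antecedent, tidsets, all_tids)
    t_b = tids(consequent, tidsets, all_tids)
    return len(t_a & t_b) / n - (len(t_a) / n) * (len(t_b) / n)


for a, b in [("tea", "coffee"), ("tea", "~coffee"), ("~tea", "~coffee")]:
    print(f"{a} => {b}: leverage={leverage(a, b, tidsets, all_tids):+.2f}")
```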

13 Any questions?
