Elsayed Hemayed Data Mining Course


1 Association Rules
Elsayed Hemayed, Data Mining Course

2 Outline
Introduction
Association Rules Mining
Measures of Rule Interestingness
Market Basket Analysis
Apriori Algorithm
Acknowledgement: some of the material in these slides is from [Max Bramer, “Principles of Data Mining”, Springer-Verlag London Limited, 2007]

3 Introduction
An association rule represents an association between the values of certain attributes and those of others.
Example: if we have a financial dataset, one of the rules extracted might be:
IF Has-Mortgage = yes AND Bank Account Status = In credit
THEN Job Status = Employed AND Age Group = Adult under 65

4 Association Rule Discovery: Application 1
Marketing and sales promotion. Let the rule discovered be {Bagels, …} → {Potato Chips}.
Potato Chips as consequent: can be used to determine what should be done to boost its sales.
Bagels in the antecedent: can be used to see which products would be affected if the store discontinues selling bagels.
Bagels in the antecedent and Potato Chips in the consequent: can be used to see what products should be sold with bagels to promote sales of Potato Chips.

5 Association Rule Discovery: Application 2
Supermarket shelf management.
Goal: identify items that are bought together by sufficiently many customers.
Approach: process the point-of-sale data collected with barcode scanners to find dependencies among items.
Example: if a customer buys diapers, then he is very likely to buy milk as well.

6 Association Rule Mining
ARM: Association Rule Mining. GRI: Generalised Rule Induction.
There is a very large number of possible association rules for a given dataset, but a high proportion of them are of little (if any) value. The main difficulty with association rule mining is computational efficiency.
If there are, say, 10 attributes and each attribute can have 5 different values, how many rules can we have?

7 Measures of Rule Interestingness
To distinguish between one rule and another we need some measure of rule quality. A single high-quality rule linking the values of attributes in a financial dataset, or the purchases made by a supermarket customer, may be of significant commercial value.

8 Notation Used
For a rule of the form IF LEFT THEN RIGHT, we define the following counters:
NLEFT: number of instances matching LEFT
NRIGHT: number of instances matching RIGHT
NBOTH: number of instances matching both LEFT and RIGHT
NTOTAL: total number of instances

9 Basic Measures of Rule Interestingness
Confidence = NBOTH / NLEFT (also called predictive accuracy or reliability): the proportion of instances matching the left-hand side for which the right-hand side is correctly predicted.
Support = NBOTH / NTOTAL: the proportion of the training set correctly predicted by the rule.
Completeness = NBOTH / NRIGHT: the proportion of the matching right-hand sides that are correctly predicted by the rule.

10 Discriminability
This measures how well a rule discriminates between one class and another.
Discriminability = 1 − (NLEFT − NBOTH) / (NTOTAL − NRIGHT)
= 1 − (number of misclassifications produced by the rule) / (number of instances with other classifications)
If the rule predicts perfectly, i.e. NLEFT = NBOTH, the value of discriminability is 1.

11 RI: Rule Interestingness
RI = NBOTH − (NLEFT × NRIGHT / NTOTAL)
RI measures the difference between the actual number of matches and the expected number if the left- and right-hand sides of the rule were independent. Generally the value of RI is positive. A value of zero would indicate that the rule is no better than chance; a negative value would imply that the rule is less successful than chance.

12 Example
If we have NLEFT = 65, NRIGHT = 54, NBOTH = 50 and NTOTAL = 100, then:
Confidence = NBOTH / NLEFT = 50/65 = 0.77
Support = NBOTH / NTOTAL = 50/100 = 0.5
Completeness = NBOTH / NRIGHT = 50/54 = 0.93
Discriminability = 1 − (65 − 50)/(100 − 54) = 0.67
RI = 50 − (65 × 54/100) = 14.9
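The worked example above can be checked with a short script. This is a minimal sketch (the function name `rule_measures` is mine, not from the slides) implementing the measures exactly as defined on the preceding slides:

```python
def rule_measures(n_left, n_right, n_both, n_total):
    """Compute the rule-interestingness measures defined in the slides
    from the four instance counters of a rule IF LEFT THEN RIGHT."""
    return {
        "confidence": n_both / n_left,
        "support": n_both / n_total,
        "completeness": n_both / n_right,
        "discriminability": 1 - (n_left - n_both) / (n_total - n_right),
        "RI": n_both - n_left * n_right / n_total,
    }

# The example from this slide: NLEFT=65, NRIGHT=54, NBOTH=50, NTOTAL=100.
m = rule_measures(n_left=65, n_right=54, n_both=50, n_total=100)
print({k: round(v, 2) for k, v in m.items()})
# {'confidence': 0.77, 'support': 0.5, 'completeness': 0.93,
#  'discriminability': 0.67, 'RI': 14.9}
```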

13 Market Basket Example

Transaction_Id  Time  Items_bought
101    6:35   Milk, bread, cookies, juice
792    7:38   Milk, juice
1130   8:05   Milk, eggs
1735   8:40   Bread, cookies, coffee

Counters for some candidate rules:

Rule            Nleft  Nright  Nboth  Ntotal
Milk → Juice      3      2       2      4
Bread → Juice     2      2       1      4
Milk → Eggs       3      1       1      4
Milk → Cookies    3      2       1      4

14 Rule Interestingness Measures
Confidence = NBOTH / NLEFT; Support = NBOTH / NTOTAL; Completeness = NBOTH / NRIGHT; Discriminability = 1 − (NLEFT − NBOTH)/(NTOTAL − NRIGHT)

Rule            Nleft  Nright  Nboth  Ntotal  Conf  Supp  Compl  Discr  RI
Milk → Juice      3      2       2      4     0.67  0.5   1.0    0.5    0.5
Bread → Juice     2      2       1      4     0.5   0.25  0.5    0.5    0.0
Milk → Eggs       3      1       1      4     0.33  0.25  1.0    0.33   0.25
Milk → Cookies    3      2       1      4     0.33  0.25  0.5    0.0   −0.5

15 Measures Analysis
(The transaction table and the measures table from the two preceding slides are repeated here for discussion.)
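The counters in the tables above can be derived mechanically from the four transactions. A minimal sketch (the helper name `counters` is my own):

```python
# The four transactions from the market-basket example.
transactions = [
    {"milk", "bread", "cookies", "juice"},  # 101
    {"milk", "juice"},                      # 792
    {"milk", "eggs"},                       # 1130
    {"bread", "cookies", "coffee"},         # 1735
]

def counters(left, right):
    """Return (NLEFT, NRIGHT, NBOTH, NTOTAL) for the rule left -> right,
    where left and right are sets of items."""
    n_left = sum(1 for t in transactions if left <= t)
    n_right = sum(1 for t in transactions if right <= t)
    n_both = sum(1 for t in transactions if left <= t and right <= t)
    return n_left, n_right, n_both, len(transactions)

nl, nr, nb, nt = counters({"milk"}, {"juice"})
print(nl, nr, nb, nt, round(nb / nl, 2), nb / nt, nb / nr)
# Milk -> Juice: 3 2 2 4, conf 0.67, supp 0.5, compl 1.0
```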

16 Market Basket Analysis
The rules generated for market basket analysis are all of a certain restricted kind. We are interested in any rules that relate the purchases made by customers in a shop.
Similar applications: analysis of items purchased by credit card, patients’ medical records, crime data, and data from satellites.

17 Terminology
A database comprises n transactions (i.e. records), each of which is a set of items, e.g. {bread, cheese, milk}.
The items in an itemset are written in a standard order: {cheese, fish, meat}, not {meat, fish, cheese}.
There are m possible items that can be bought, and I denotes the set of all possible items.
Rule: L → R, where L and R are disjoint sets each containing at least one member. So the minimum cardinality of L ∪ R is two.

18 Market Basket Example
n = 8, m = 5, and I = {a, b, c, d, e}.

19 Basic ARM
Generate all supported itemsets L ∪ R (support ≥ minsup) with cardinality at least two. For each such itemset, generate all the possible rules with at least one item on each side, and retain those for which confidence ≥ minconf.
For m items there are 2^m − m − 1 possible itemsets of cardinality at least two. If m = 20, this number is 1,048,555; if m = 100, it is about 10^30.
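The count quoted above is easy to verify: it is all 2^m subsets of the m items, minus the m singletons and the empty set. A quick check:

```python
def num_itemsets(m):
    """Number of itemsets of cardinality at least two over m items:
    all 2**m subsets minus the m singletons and the empty set."""
    return 2**m - m - 1

print(num_itemsets(20))   # 1048555
print(num_itemsets(100))  # about 1.27e30
```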

20 Apriori Algorithm
The Apriori algorithm shows how association rules can be generated in a realistic timescale, at least for relatively small databases. Its idea is based on the following theorem:
If there are no supported itemsets of cardinality k, then there are no supported itemsets of cardinality k + 1 or larger.

21 Apriori Algorithm Idea
Generate the supported itemsets in ascending order of cardinality, i.e. all those with one element first, then all those with two elements, then all those with three elements, etc. At each stage, the set Lk of supported itemsets of cardinality k is generated from the previous set Lk−1. If Lk is ∅, there is no need to generate Lk+1 or higher.

22 Generating Supported Itemsets: Example
For a database with 100 items:
Construct C1 (the one-element itemsets); we have 100 itemsets.
Count the support in the database to calculate L1, the set of supported itemsets.
Let L1 be {a}, {b}, {c}, {d}, {e}, {f}, {g} and {h}.
Generate C2 from L1.
Count the support and calculate L2, the set of supported itemsets.

23 Generating C2
There are 28 possible itemsets of cardinality 2 that can be formed from the items a, b, c, …, h. They are:
{a, b}, {a, c}, {a, d}, {a, e}, {a, f}, {a, g}, {a, h}, {b, c}, {b, d}, {b, e}, {b, f}, {b, g}, {b, h}, {c, d}, {c, e}, {c, f}, {c, g}, {c, h}, {d, e}, {d, f}, {d, g}, {d, h}, {e, f}, {e, g}, {e, h}, {f, g}, {f, h}, {g, h}

24 Generating Supported Itemsets: Example (cont.)
Assume L2 = {{a, c}, {a, d}, {a, h}, {c, g}, {c, h}, {g, h}}.
Then C3 = {a, c, d}, {a, c, h}, {a, d, h} and {c, g, h}.
But the itemsets {a, c, d} and {a, d, h} cannot be supported, because their subsets {c, d} and {d, h} are not members of L2. So C3 is only {a, c, h} and {c, g, h}.
Assume L3 = {{a, c, h}, {c, g, h}}. Then C4 is empty, and so are L4, L5, L6, etc., and the process ends.
The set of all supported itemsets with at least two members is the union of L2 and L3, i.e. {{a, c}, {a, d}, {a, h}, {c, g}, {c, h}, {g, h}, {a, c, h}, {c, g, h}}.
Generate the candidate rules from each of these and determine which of them have a confidence value greater than or equal to minconf.
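The join-and-prune step walked through above can be written down directly. The following is a minimal illustration (a naive pairwise join rather than an optimised prefix join; `apriori_gen` is my own name), reproducing this slide's C3 from its L2:

```python
from itertools import combinations

def apriori_gen(l_prev):
    """Given L_{k-1} as a set of frozensets of size k-1, return the
    candidate set C_k: unions of size k whose (k-1)-subsets are all
    supported (the Apriori pruning condition)."""
    k = len(next(iter(l_prev))) + 1
    candidates = set()
    for a in l_prev:
        for b in l_prev:
            union = a | b
            if len(union) == k:
                candidates.add(union)
    # Prune: every (k-1)-subset of a candidate must itself be in l_prev.
    return {c for c in candidates
            if all(frozenset(s) in l_prev for s in combinations(c, k - 1))}

L2 = {frozenset(p) for p in
      [("a", "c"), ("a", "d"), ("a", "h"), ("c", "g"), ("c", "h"), ("g", "h")]}
C3 = apriori_gen(L2)
print(sorted("".join(sorted(c)) for c in C3))  # ['ach', 'cgh']
```

{a, c, d} and {a, d, h} are generated by the join but removed by the pruning test, exactly as on the slide.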

25 Generating Rules for a Supported Itemset
If a supported itemset L ∪ R has k elements, we can generate all the possible rules L → R systematically from it and then check the value of confidence for each one.
Generate all possible right-hand sides in turn. Each one must have at least one and at most k − 1 elements. Having generated the right-hand side of a rule, all the unused items in L ∪ R must then be on the left-hand side.

26 Generating Rules Example
For the itemset {c, d, e} there are 6 possible rules that can be generated: cd → e, ce → d, de → c, c → de, d → ce, and e → cd. Only one of them has a confidence value greater than or equal to minconf (here 0.8).
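Enumerating the candidate rules for an itemset follows the recipe on the previous slide: each non-empty proper subset in turn becomes the right-hand side, and the remaining items form the left-hand side. A sketch (function name mine):

```python
from itertools import combinations

def rules_from_itemset(itemset):
    """Return all candidate rules (lhs, rhs) from a supported itemset:
    every non-empty proper subset becomes a right-hand side."""
    items = sorted(itemset)
    rules = []
    for r in range(1, len(items)):          # RHS size 1 .. k-1
        for rhs in combinations(items, r):
            lhs = tuple(x for x in items if x not in rhs)
            rules.append((lhs, rhs))
    return rules

for lhs, rhs in rules_from_itemset({"c", "d", "e"}):
    print("".join(lhs), "->", "".join(rhs))
# prints the six candidate rules:
# de -> c, ce -> d, cd -> e, e -> cd, d -> ce, c -> de
```

For an itemset of k elements this yields 2^k − 2 rules, so 6 for k = 3.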

27 Speeding Up the Generation Process
Transferring members of a supported itemset from the left-hand side of a rule to the right-hand side cannot increase the value of rule confidence. If the original rule is A ∪ B → C, a new rule is A → B ∪ C. Since support(A) ≥ support(A ∪ B), we have confidence(A → B ∪ C) ≤ confidence(A ∪ B → C). Thus:
Any superset of an unconfident right-hand itemset is unconfident.
Any (non-empty) subset of a confident right-hand itemset is confident.
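The inequality above can be illustrated numerically. The support counts below are made up for illustration (not from the slides); the relation holds for any counts because support(A) ≥ support(A ∪ B):

```python
# Hypothetical support counts for an itemset {a, b, c} and its subsets.
support = {frozenset("a"): 50, frozenset("ab"): 30, frozenset("abc"): 20}

conf_ab_c = support[frozenset("abc")] / support[frozenset("ab")]  # A∪B -> C
conf_a_bc = support[frozenset("abc")] / support[frozenset("a")]   # A -> B∪C

print(round(conf_ab_c, 2), conf_a_bc)  # 0.67 0.4
# Moving b to the right-hand side grows the denominator, so confidence drops.
assert conf_a_bc <= conf_ab_c
```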

28 Speeding Up the Generation Process
For the previous example, there is no need to consider c → de or e → cd, since ce → d is unconfident and their right-hand sides are supersets of its right-hand side {d}. What about the others? The process stops when no more confident rules can be generated.

29 Summary
Introduction
Association Rules Mining
Measures of Rule Interestingness
Market Basket Analysis
Apriori Algorithm

