
1 Chap 6: Association Rules

2 Rule Rules!
- Motivation ~ recent progress in data mining and warehousing has made it possible to collect huge amounts of data. Example: supermarket transactions => barcode scanners and websites automatically record purchase data.
- These data capture possible interactions among items.
- Supermarket transaction data might reveal consumer buying patterns!

3 Association Analysis
- Association analysis is a popular data mining technique aimed at discovering novel and interesting relationships between the data objects in a database.
- It is used to estimate the probability that a person will purchase a product given that they own a particular product or group of products.
- "Market Basket Analysis" looks at transactions to see which products get bought together.

4 Association Rules
- Also known as "market-basket analysis".
- Aim: to find regularities in behavior ~ to find sets of products that are frequently bought together!
- Rule structure: {X ^ Y} → Z, where the left-hand side ({X ^ Y}) is the antecedent and the right-hand side (Z) is the consequent.
- Example: "if a customer bought milk and eggs, they often bought sugar too!" As an association rule: (milk ^ eggs) → {sugar}. A small sketch of how such a rule can be represented is shown below.
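
The following is a minimal Python sketch, not taken from the slides, showing one way to represent the (milk ^ eggs) → {sugar} rule as an antecedent/consequent pair of item sets and to test whether a single transaction supports it; the variable names and the sample transaction are illustrative.

```python
# Minimal sketch: an association rule as two item sets (names are illustrative).
antecedent = {"milk", "eggs"}
consequent = {"sugar"}

transaction = {"milk", "eggs", "sugar", "bread"}

# A transaction supports the rule if it contains every item
# of both the antecedent and the consequent.
supports_rule = antecedent <= transaction and consequent <= transaction
print(supports_rule)  # True
```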

5 Apriori Algorithm
- A method to find frequent patterns, associations and causal structures among sets of items.
- Main concept ~ frequent itemsets: itemsets that appear often in the data and together with other items. How? Using support and confidence values.
- Given the rule X ^ Y → Z over items {X, Y, Z}:
  - Support (S): the probability that a transaction contains all of the items (X, Y, Z). Measures how often the rule occurs in the database.
  - Confidence (C): the conditional probability that a transaction containing X and Y also contains Z. Sometimes known as "accuracy". Measures the strength of the rule.

6 Support and Confidence Concept
- Given itemsets X and Y over T transactions, for the rule X → Y:
  - Support(X → Y) = (number of transactions that contain every item in X and Y) / (total number of transactions)
  - Confidence(X → Y) = (number of transactions that contain every item in X and Y) / (number of transactions that contain the items in X)
- The support and confidence values range between 0 and 1.
- Only rules that exceed the minimum support will be generated.
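
As a sketch, not part of the slides, the two definitions above translate directly into Python over transactions represented as sets; the function names and the toy data are illustrative.

```python
# Sketch of the support and confidence definitions above.
# Transactions are modeled as Python sets of items.

def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction
    that also contain the consequent."""
    has_antecedent = [t for t in transactions if antecedent <= t]
    hits = sum(1 for t in has_antecedent if consequent <= t)
    return hits / len(has_antecedent)

# Toy data echoing the (milk ^ eggs) -> {sugar} example from slide 4.
toy = [
    {"milk", "eggs", "sugar"},
    {"milk", "eggs"},
    {"milk", "bread"},
]
print(support(toy, {"milk", "eggs", "sugar"}))       # 0.333...
print(confidence(toy, {"milk", "eggs"}, {"sugar"}))  # 0.5
```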

7 Support and Confidence Concept
- Example: the rule (A ^ B) → D over the transactions below.

  ID | Items
  ---+-----------
   1 | A, D, E
   2 | A, B, C
   3 | A, B, C, D
   4 | A, B, E, C
   5 | A, C, B, D

- Support = 2/5 = 0.4 (transactions 3 and 5 contain A, B and D).
- Confidence = 2/4 = 0.5 (of the 4 transactions containing A and B, 2 also contain D).
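
The short sketch below, not from the slides, reproduces these two numbers; the only data it uses is the five transactions in the table above.

```python
# Verifying support and confidence for (A ^ B) -> D on the slide's transactions.
transactions = [
    {"A", "D", "E"},
    {"A", "B", "C"},
    {"A", "B", "C", "D"},
    {"A", "B", "E", "C"},
    {"A", "C", "B", "D"},
]

antecedent, consequent = {"A", "B"}, {"D"}

both = sum(1 for t in transactions if (antecedent | consequent) <= t)
ante = sum(1 for t in transactions if antecedent <= t)

print(both / len(transactions))  # support    = 2/5 = 0.4
print(both / ante)               # confidence = 2/4 = 0.5
```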

8 Illustrating Apriori Algorithm Principles
- Collect the single-item counts, then form candidate k-itemsets and evaluate them, repeating until no new frequent itemsets are found.
- Given a minimum support count, s = 3:

  ID | Items
  ---+----------------------------
   1 | Bread, Milk
   2 | Cheese, Diaper, Bread, Eggs
   3 | Cheese, Coke, Diaper, Milk
   4 | Cheese, Bread, Diaper, Milk
   5 | Coke, Bread, Diaper, Milk

9 Illustrating Apriori Algorithm Principles (cont.)
- Count the single itemsets, prune those below the support threshold (s >= 3), then count the candidate 2-itemsets built from the surviving items. A sketch of this generate-and-prune loop follows below.

  1st itemsets:
    Item   | Count
    -------+------
    Bread  |   4
    Coke   |   2
    Cheese |   3
    Milk   |   4
    Diaper |   4
    Eggs   |   1

  2nd itemsets (built from the items with count >= 3):
    Itemset        | Count
    ---------------+------
    Bread, Milk    |   3
    Bread, Cheese  |   2
    Bread, Diaper  |   3
    Milk, Cheese   |   2
    Milk, Diaper   |   3
    Cheese, Diaper |   3

- Prune the items COKE and EGGS because their counts are < 3.
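
As an illustrative sketch, not the slides' code, the counting-and-pruning loop above can be written in a few lines of Python. It uses a simplified candidate-generation step (all k-combinations of the items still appearing in frequent (k-1)-itemsets) rather than the full Apriori join-and-prune, and the names are my own.

```python
# Sketch of Apriori-style frequent-itemset mining on the data from slide 8,
# with minimum support count s = 3.
from itertools import combinations

transactions = [
    {"Bread", "Milk"},
    {"Cheese", "Diaper", "Bread", "Eggs"},
    {"Cheese", "Coke", "Diaper", "Milk"},
    {"Cheese", "Bread", "Diaper", "Milk"},
    {"Coke", "Bread", "Diaper", "Milk"},
]
min_count = 3

def count(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)

# 1-itemsets: count every item and prune those below min_count
# (Coke and Eggs are dropped here).
items = sorted({i for t in transactions for i in t})
frequent = [frozenset([i]) for i in items if count([i]) >= min_count]
print([sorted(s) for s in frequent])  # [['Bread'], ['Cheese'], ['Diaper'], ['Milk']]

# k-itemsets: build candidates from the surviving items, keep the frequent ones.
k = 2
while frequent:
    survivors = sorted({i for s in frequent for i in s})
    candidates = [frozenset(c) for c in combinations(survivors, k)]
    frequent = [c for c in candidates if count(c) >= min_count]
    for c in frequent:
        print(sorted(c), count(c))  # e.g. ['Bread', 'Milk'] 3
    k += 1
```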

10 Illustrating Apriori Algorithm Principles (cont.)
- Support and confidence of the resulting rules (with s = 3):

  Relations | Lift | Support (%) | Confidence (%) | Transaction Count | Rule
  ----------+------+-------------+----------------+-------------------+-----------------
      2     | 0.9  |     60      |       75       |         3         | Milk → Diaper
      2     | 0.9  |     60      |       75       |         3         | Diaper → Milk
      2     | 0.9  |     60      |       75       |         3         | Milk → Bread
      2     | 0.9  |     60      |       75       |         3         | Bread → Milk
      2     | 1.3  |     60      |       75       |         3         | Diaper → Cheese
      2     | 1.3  |     60      |      100       |         3         | Cheese → Diaper
      2     | 0.9  |     60      |       75       |         3         | Diaper → Bread
      2     | 0.9  |     60      |       75       |         3         | Bread → Diaper

11 Interpreting Support and Confidence
- Confidence measures the strength of a rule, whereas support measures how often it occurs in the database.
- For example, look at Diaper → Cheese. The confidence of 75% indicates that this rule holds 75% of the time it could; that is, 3 of the 4 times that Diaper occurs, so does Cheese. The support of 60% indicates that this rule holds in 60% of all transactions. The sketch below recomputes these numbers.
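
As a sketch, not from the slides, the Diaper → Cheese and Cheese → Diaper rows of the table can be recomputed directly from the transactions on slide 8; lift is taken here as confidence divided by expected confidence, matching the definition given on slide 14.

```python
# Recomputing support, confidence and lift for two rules from slide 10.
transactions = [
    {"Bread", "Milk"},
    {"Cheese", "Diaper", "Bread", "Eggs"},
    {"Cheese", "Coke", "Diaper", "Milk"},
    {"Cheese", "Bread", "Diaper", "Milk"},
    {"Coke", "Bread", "Diaper", "Milk"},
]
n = len(transactions)

def metrics(antecedent, consequent):
    both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    ante = sum(1 for t in transactions if antecedent <= t)
    cons = sum(1 for t in transactions if consequent <= t)
    support = both / n
    confidence = both / ante
    lift = confidence / (cons / n)  # confidence / expected confidence
    return support, confidence, lift

print(metrics({"Diaper"}, {"Cheese"}))  # ≈ (0.6, 0.75, 1.25); lift shown as 1.3 on the slide
print(metrics({"Cheese"}, {"Diaper"}))  # ≈ (0.6, 1.0, 1.25)
```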

12 Example (Association Rules)
- Transactions: {A, B, C}, {A, C, D}, {B, C, D}, {A, D, E}, {B, C, E}

  Rule      | Support | Confidence
  ----------+---------+-----------
  A → D     |   2/5   |    2/3
  C → A     |   2/5   |    2/4
  A → C     |   2/5   |    2/3
  B & C → D |   1/5   |    1/3
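
For completeness, a short sketch (not part of the slides) that checks the four rules above against the listed transactions; the printed fractions match the table.

```python
# Checking the four rules of slide 12 against its five transactions.
transactions = [
    {"A", "B", "C"}, {"A", "C", "D"}, {"B", "C", "D"},
    {"A", "D", "E"}, {"B", "C", "E"},
]
rules = [({"A"}, {"D"}), ({"C"}, {"A"}), ({"A"}, {"C"}), ({"B", "C"}, {"D"})]

for antecedent, consequent in rules:
    both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    ante = sum(1 for t in transactions if antecedent <= t)
    print(f"{sorted(antecedent)} -> {sorted(consequent)}: "
          f"support {both}/{len(transactions)}, confidence {both}/{ante}")
```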

13 Implication?
- Cross-tabulation of Saving Account vs. Checking Account:

                            Checking Account
                            No       Yes      Total
  Saving Account   No         500    3,500     4,000
                   Yes      1,000    5,000     6,000
                   Total                      10,000

- Support(SVG → CK) = 5,000 / 10,000 = 50%
- Confidence(SVG → CK) = 5,000 / 6,000 = 83%
- Expected confidence = P(CK) = 8,500 / 10,000 = 85%
- Lift(SVG → CK) = 0.83 / 0.85 < 1
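
As a sketch, not from the slides, the four numbers above follow directly from the cell counts of the table; the variable names are mine, and each cell is treated simply as a count of records.

```python
# Recomputing the SVG -> CK numbers from the 2 x 2 table above.
# Rows = saving account (No/Yes), columns = checking account (No/Yes).
no_no, no_yes = 500, 3_500      # records without a savings account
yes_no, yes_yes = 1_000, 5_000  # records with a savings account
total = no_no + no_yes + yes_no + yes_yes          # 10,000

support = yes_yes / total                          # 5,000 / 10,000 = 0.50
confidence = yes_yes / (yes_no + yes_yes)          # 5,000 /  6,000 ≈ 0.83
expected_confidence = (no_yes + yes_yes) / total   # 8,500 / 10,000 = 0.85
lift = confidence / expected_confidence            # ≈ 0.98 < 1

print(support, round(confidence, 2), expected_confidence, round(lift, 2))
```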

14 Apriori Algorithm Principles (cont.)
- Lift is equal to the confidence factor divided by the expected confidence. Lift is the factor by which the likelihood of the consequent increases given the antecedent.
- Expected confidence is equal to the number of transactions containing the consequent divided by the total number of transactions.
- A creditable rule has a large confidence factor, a large level of support, and a value of lift greater than 1. Rules having a high level of confidence but little support should be interpreted with caution.
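
As a final sketch, not from the slides, these screening criteria can be bundled into a small helper; the numeric thresholds are illustrative placeholders, not values the slides specify.

```python
# Sketch of a rule-screening check based on the criteria above.
# The min_support and min_confidence thresholds are illustrative, not from the slides.
def is_creditable(support, confidence, lift, min_support=0.1, min_confidence=0.6):
    """Large support, large confidence, and lift greater than 1."""
    return support >= min_support and confidence >= min_confidence and lift > 1.0

print(is_creditable(support=0.60, confidence=0.75, lift=1.25))  # Diaper -> Cheese: True
print(is_creditable(support=0.50, confidence=0.83, lift=0.98))  # SVG -> CK: False (lift < 1)
```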

