
1 Association Rule.

2 Association rule mining
• It is an important data mining model studied extensively by the database and data mining community.
• It assumes all data are categorical.
• It was initially used for market basket analysis, to find how items purchased by customers are related.

3 Transaction data: supermarket data
• Market basket transactions:
  t1: {bread, cheese, milk}
  t2: {apple, eggs, salt, yogurt}
  …
  tn: {biscuit, eggs, milk}
• Concepts:
  An item: an item/article in a basket
  I: the set of all items sold in the store
  A transaction: the items purchased in a basket; it may have a TID (transaction ID)
  A transactional dataset: a set of transactions

4 The model: rules
• A transaction t contains X, a set of items (itemset) in I, if X ⊆ t.
• An association rule is an implication of the form X → Y, where X, Y ⊂ I and X ∩ Y = ∅.
• An itemset is a set of items. E.g., X = {milk, bread, cereal} is an itemset.
• A k-itemset is an itemset with k items. E.g., {milk, bread, cereal} is a 3-itemset.

5 Rule strength measures
• Support: the rule X → Y holds with support sup in T (the transaction data set) if sup% of transactions contain X ∪ Y, i.e. every item of both X and Y.
  sup = Pr(X ∪ Y) = count(X ∪ Y) / total count
• Confidence: the rule holds in T with confidence conf if conf% of the transactions that contain X also contain Y.
  conf = Pr(Y | X) = support(X ∪ Y) / support(X)
• An association rule is a pattern stating that when X occurs, Y occurs with a certain probability.
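As a minimal sketch of the two measures on this slide, the Python below computes support and confidence by counting transactions. The function names `support` and `confidence` are illustrative, not taken from any library, and the toy data is just the three baskets that slide 3 spells out (the elided transactions are not reconstructed).

```python
from fractions import Fraction

# Toy data: only the three baskets written out on slide 3 (an assumption
# made for illustration; the real dataset has more transactions).
transactions = [
    {"bread", "cheese", "milk"},
    {"apple", "eggs", "salt", "yogurt"},
    {"biscuit", "eggs", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def confidence(x, y, transactions):
    """conf(X -> Y) = support(X ∪ Y) / support(X)."""
    sup_x = support(x, transactions)
    if sup_x == 0:
        raise ValueError("X never occurs, so confidence is undefined")
    return support(set(x) | set(y), transactions) / sup_x


print(support({"milk"}, transactions))               # 2/3
print(confidence({"eggs"}, {"milk"}, transactions))  # 1/2
```

Exact fractions are used instead of floats so the results can be compared directly with the count-based ratios used on the slides.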

6 Goal of association rule mining.
• Find all rules that satisfy the user-specified minimum support (minsup) and minimum confidence (minconf).

7 An Example.
• Transaction data:
  t1: Beef, Chicken, Milk
  t2: Beef, Cheese
  t3: Cheese, Boots
  t4: Beef, Chicken, Cheese
  t5: Beef, Chicken, Clothes, Cheese, Milk
  t6: Chicken, Clothes, Milk
  t7: Chicken, Milk, Clothes
• Assume: minsup = 30%, minconf = 80%
• An example frequent itemset: {Chicken, Clothes, Milk} [sup = 3/7]
• Association rules from the itemset:
  Clothes → Milk, Chicken [sup = 3/7, conf = 3/3]
  ……
  Clothes, Chicken → Milk [sup = 3/7, conf = 3/3]
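The example on this slide can be reproduced with a short brute-force sketch in Python. It enumerates every candidate itemset rather than using a pruning algorithm such as Apriori, which is only practical because the data has six distinct items. The variable names (`frequent`, `rules`, `MINSUP_PCT`, `MINCONF_PCT`) are illustrative assumptions, not part of the original analysis.

```python
from itertools import combinations

# The seven transactions from the slide.
transactions = [
    {"Beef", "Chicken", "Milk"},
    {"Beef", "Cheese"},
    {"Cheese", "Boots"},
    {"Beef", "Chicken", "Cheese"},
    {"Beef", "Chicken", "Clothes", "Cheese", "Milk"},
    {"Chicken", "Clothes", "Milk"},
    {"Chicken", "Milk", "Clothes"},
]
# Thresholds from the slide, kept as integer percentages so that the
# comparisons below avoid floating-point rounding.
MINSUP_PCT, MINCONF_PCT = 30, 80

n = len(transactions)
items = sorted(set().union(*transactions))


def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


# Step 1: try every candidate itemset and keep those that meet minsup.
frequent = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        c = count(frozenset(combo))
        if 100 * c >= MINSUP_PCT * n:
            frequent[frozenset(combo)] = c

# Step 2: split each frequent itemset into body -> head and keep the
# rules that meet minconf.
rules = []
for itemset, c in frequent.items():
    for r in range(1, len(itemset)):
        for body in map(frozenset, combinations(sorted(itemset), r)):
            # Every subset of a frequent itemset is itself frequent,
            # so this lookup always succeeds.
            body_count = frequent[body]
            if 100 * c >= MINCONF_PCT * body_count:
                rules.append((body, itemset - body, c, body_count))

for body, head, c, body_count in rules:
    print(f"{', '.join(sorted(body))} -> {', '.join(sorted(head))} "
          f"[sup = {c}/{n}, conf = {c}/{body_count}]")
```

On these seven transactions the sketch finds 11 frequent itemsets and 7 rules, including the two rules listed on the slide.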

8 Data set.
• This data set relates to the retail industry.
• It contains the information for each transaction, along with the transaction IDs.
• Each row represents a single transaction, i.e. the purchases of a single customer.
• For example, if a row reads {Bread sandwich, Milk, Egg, Butter}, the customer bought those items in a single transaction.

9 Objective.
• Our main objective is to find buying patterns in this large database.
• Discovering such association rules can help in developing marketing strategies, by giving insight into which items customers frequently purchase together.
• We used the following parameters:
  Minsup = 0.08
  Minconf = 0.40
  Mincorr = 0.30

10 Analysis.
• The spreadsheet shows the frequent itemsets with their support values.
• From the table it is clear that Fluid milk has the highest frequency, followed by Bananas, Salad vegetables, Eggs, etc.
• This means these are the items customers most often put into their baskets.

11
• The fifth rule has the highest confidence value, 58.83424%, which means that about 58.8% of customers who buy Eggs also buy Fluid milk.
• Similarly, 54% of customers who buy Tomatoes also buy Salad vegetables.
• Likewise, 52% of customers who buy Bread sandwiches also buy Fluid milk.

12 Rule Graph.
• The rule graph presents all the association rules graphically, so the whole result can be understood in a single snapshot.
• The support values for the Body and Head portions of each association rule are indicated by the sizes and colors of the circles.
• The thickness of each line indicates the confidence value (the conditional probability of Head given Body) for the respective association rule.
• The sizes and colors of the circles in the center, above the Implies label, indicate the joint support (the co-occurrence) of the Body and Head components of the respective association rule.

13
• In the graphical summary, the strongest support values were found for Fluid milk associated with Bananas, Bread sandwiches, and Eggs.
• The graph also shows that the rule linking Fluid milk and Eggs has the highest confidence value (its line is the thickest).

14 3D Rule Graph.
• This graph is the 3D version of the earlier rule graph.
• It shows that the rule linking Fluid milk and Eggs has the highest confidence value of all the rules.

15 Conclusion.
• According to the rules, Fluid milk, Bananas, Bread sandwiches, Eggs, Salad vegetables, Grapes, and Fruit juice are the items customers most frequently put into their baskets.
• The rules also suggest that more than 50% of customers who buy Eggs or Bread sandwiches also buy Fluid milk.
• All of this information can be used to build better marketing strategies.
• For example, a retailer can place the frequently bought items close to each other in the supermarket so that customers can find them easily.
• Some new products related to these items can also be placed nearby to attract customers.

16 Thank You.
Krishnendu Kundu (Statistician), StatSoft India.
Email: kkundu@statsoftindia.com
Mobile: +919873119520

