
1 60-520: Seminar
Presented by Robyat Hossain, University of Windsor, February 10, 2006
Market Basket Analysis: An Approach to Association Rule Mining in Multiple Store Environment

2 Market Basket Analysis - Agenda
- Introduction
- General concept
- Discussion on proposed ideas
- Conclusion

3 Market Basket Analysis - Introduction
- Market basket analysis (MBA)
- Purpose: people's attitude
- Benefits
  - Inside the store environment: layout changes
  - Outside the store: product bundling, direct marketing
  - Outside the realm of marketing: stock inventory, operation strategies

4 Market Basket Analysis - General Concept
- The approach
- Methods, measures, results
- Literature overview on traditional association rules

5 Market Basket Analysis - General Concept: Methods
Method:
Transaction 1: Frozen pizza, cola, milk
Transaction 2: Milk, potato chips
Transaction 3: Cola, frozen pizza
Transaction 4: Milk, pretzels
Transaction 5: Cola, pretzels

Co-occurrence table (number of transactions containing both the row item and the column item):

              Frozen Pizza   Milk   Cola   Potato Chips   Pretzels
Frozen Pizza       2          1      2          0            0
Milk               1          3      1          1            1
Cola               2          1      3          0            1
Potato Chips       0          1      0          1            0
Pretzels           0          1      1          0            2

The table hints that frozen pizza and cola may sell well together and should be placed side by side in the convenience store.

Results: we could derive the association rules:
- If a customer purchases Frozen Pizza, then they will probably purchase Cola.
- If a customer purchases Cola, then they will probably purchase Frozen Pizza.
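The co-occurrence table on this slide can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `cooccurrence` and the data layout (one set of item names per transaction) are assumptions made for illustration, not part of the presentation.

```python
# The five transactions listed on the slide, one set of items per basket.
transactions = [
    {"Frozen Pizza", "Cola", "Milk"},
    {"Milk", "Potato Chips"},
    {"Cola", "Frozen Pizza"},
    {"Milk", "Pretzels"},
    {"Cola", "Pretzels"},
]
items = ["Frozen Pizza", "Milk", "Cola", "Potato Chips", "Pretzels"]


def cooccurrence(transactions, items):
    """Count, for every pair (a, b), the baskets that contain both a and b.

    The diagonal entry (a, a) is simply the number of baskets containing a.
    """
    counts = {a: {b: 0 for b in items} for a in items}
    for basket in transactions:
        for a in basket:
            for b in basket:
                counts[a][b] += 1
    return counts


table = cooccurrence(transactions, items)
for a in items:
    print(f"{a:<13}", [table[a][b] for b in items])
```

Each printed row corresponds to one row of the slide's table, in the same column order as `items`.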

6 Market Basket Analysis - General Concept: Measures
Measures:
- Support
  Support = (number of records containing the item combination) / (total number of records).
  Take the rule "If a customer purchases Cola, then they will purchase Frozen Pizza". Its support is 2 (the number of transactions that include both Cola and Frozen Pizza) / 5 (total records) = 40%.
- Confidence
  Confidence of a rule = (support for the combination) / (support for the condition).
  For the rule "If a customer purchases Milk, then they will purchase Potato Chips", the support for the combination (Potato Chips + Milk) is 20% and the support for the condition (Milk) is 60%, so confidence = 20% / 60% = 33%.
Support and confidence are used to select the association rules.
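The two measures can be checked against the five example transactions with a short Python sketch. The helper names `support` and `confidence` are illustrative assumptions; the formulas are the ones stated on the slide.

```python
# The five example transactions from the previous slide.
transactions = [
    {"Frozen Pizza", "Cola", "Milk"},
    {"Milk", "Potato Chips"},
    {"Cola", "Frozen Pizza"},
    {"Milk", "Pretzels"},
    {"Cola", "Pretzels"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


def confidence(condition, result, transactions):
    """Support of (condition + result) divided by support of the condition."""
    combined = set(condition) | set(result)
    return support(combined, transactions) / support(condition, transactions)


sup_cola_pizza = support({"Cola", "Frozen Pizza"}, transactions)
conf_milk_chips = confidence({"Milk"}, {"Potato Chips"}, transactions)
print(f"support(Cola, Frozen Pizza)      = {sup_cola_pizza:.0%}")
print(f"confidence(Milk => Potato Chips) = {conf_milk_chips:.0%}")
```

Note that `confidence` divides by the support of the condition, so it is undefined for a condition that never occurs in the data.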

7 Market Basket Analysis - Discussion on Proposed Ideas
- Problem definition
  - Assumptions
  - Four definitions (Definitions 1, 2, 3, 4)
- Algorithm
  - General concepts in developing the algorithm
  - Discussion on several key steps
- Performance evaluation
  - Data generation
  - Performance measures
  - Simulation study

8 Market Basket Analysis - Proposed Ideas: Problem Definition
Assumptions:
- D is a database containing transactional records from multiple stores over a time period T.
- Let I = {I_1, I_2, ..., I_r} be the set of product items in D, where I_k (1 ≤ k ≤ r) is the identifier of the kth item.
- Let X be a set of items in I. We refer to X as a k-itemset if |X| = k.
- Let W(X, D) denote the set of transactions in D that contain itemset X.
- Let {T_1, T_2, ..., T_m} be the set of mutually disjoint time intervals (periods).
- Let P = {P_1, P_2, ..., P_q} be the set of stores, where P_j (1 ≤ j ≤ q) denotes the jth store in the store chain.

Definition 1. The support of X, denoted by sup(X, D), is the fraction of transactions containing X in database D; i.e., sup(X, D) = |W(X, D)| / |D|. For a specified support threshold σ_s, X is a frequent itemset if sup(X, D) ≥ σ_s.

Definition 2. Let X be an itemset in I with context V_X, and let D_{V_X} be the subset of transactions in D whose timestamps t and store identifiers p satisfy V_X. We define the relative support of X with respect to the context V_X, denoted by rel_sup(X, D_{V_X}), as rel_sup(X, D_{V_X}) = |W(X, D_{V_X})| / |D_{V_X}|. For a given relative-support threshold σ_r, if a frequent itemset X satisfies rel_sup(X, D_{V_X}) ≥ σ_r, we call X a relative-frequent (RF) itemset.

9 Market Basket Analysis - Problem Definition
Let S_k and R_k be the sets of stores and times at which item I_k is sold. We define V_{I_k} = S_k × R_k as the context of item I_k (the combinations of stores and times where it is sold). The context of an itemset X consisting of two items I_k and I_k' is V_X = V_{I_k} ∩ V_{I_k'} (the stores and times shared by the items in X).

Definition 3. Consider two itemsets X and Y. The relative support of X with respect to the context V_{X∪Y}, denoted by rel_sup(X, D_{V_{X∪Y}}), is defined as |W(X, D_{V_{X∪Y}})| / |D_{V_{X∪Y}}|. The confidence of rule X => Y, denoted by conf(X => Y), is defined as rel_sup(X∪Y, D_{V_{X∪Y}}) / rel_sup(X, D_{V_{X∪Y}}).

Definition 4. Let Z be an RF itemset, where Z = X∪Y and X∩Y = ∅. Given a confidence threshold σ_c, if conf(X => Y) ≥ σ_c, we call X => Y a store-chain (SC) association rule, and V_{X∪Y} the context of the rule.

10 Market Basket Analysis - Algorithm

11 Market Basket Analysis - Algorithm: General Concepts
We use RF_k, F_k, and C_k to denote the sets of all relative-frequent k-itemsets, frequent k-itemsets, and candidate k-itemsets, respectively.
As the first step of the algorithm, we build a table called the PT table.
In the first phase, we scan the database for the first time and build a two-dimensional table called the TS table. Each entry, denoted by TS(Ti, Pj), records the number of transactions that occur at store Pj in period Ti.
Using the TS table and the PT table for a given itemset X, we can determine |D_{V_X}|, the number of transactions that fall within the context of X.
In the kth phase of the algorithm, we first derive C_k. We then generate F_k by scanning the database, evaluating the supports of the candidates, and removing all infrequent itemsets. Finally, we generate RF_k from F_k.

12 Market Basket Analysis - Algorithm: Key Steps
Methods of
- Building the PT (store-time) table
- Building the TS (transaction) table
- Finding RF_k (the relative-frequent itemsets)
- Generating the store-chain association rules

13 Market Basket Analysis - Algorithm: Building the PT Table
The PT table holds time and store information for each item. It is built from the bit matrices of the individual items.

Bit matrices for items I1, I2, I3:

Bit matrix for item I1
     T1 T2 T3 T4 T5 T6
P1    1  1  1  0  0  0
P2    0  1  1  1  0  0
P3    1  1  1  1  1  0
P4    0  0  1  1  1  0
P5    0  1  1  1  0  0
P6    1  1  1  1  0  0

Bit matrix for item I2
     T1 T2 T3 T4 T5 T6
P1    1  1  1  1  1  1
P2    1  1  0  0  0  0
P3    0  0  0  1  1  1
P4    1  1  0  0  0  1
P5    0  1  1  1  1  1
P6    0  0  0  0  1  1

Bit matrix for item I3
     T1 T2 T3 T4 T5 T6
P1    0  0  1  1  1  1
P2    1  1  1  0  0  1
P3    0  0  1  1  1  1
P4    0  0  0  1  1  1
P5    1  1  1  1  1  1
P6    0  1  1  1  1  0

PT table for items I1, I2, I3 (columns I1, I2, I3):
P1  401 3
P2  1 2 534 6
P3  61 41 3
P4  1 3 63 61 4
P5  1 2 51 20
P6  51 51 2 6

14 Market Basket Analysis - Algorithm: Building the PT Table
The method to compute the jth row of the PT table for itemset X.

PT table for items I1, I2 (columns I1, I2):
P1  40
P2  1 2 53
P3  61 4
P4  1 3 63 6
P5  1 2 51 2
P6  51 5

PT table for itemset X(I1, I2):
P1
P2  1 2 3
P3
P4
P5
P6

15 Market Basket Analysis - Algorithm: Building the TS Table
The TS table
- In the first phase of the algorithm, build the TS table.
- Each entry, denoted by TS(Ti, Pj), records the number of transactions that occur at store Pj in period Ti.
- Using the TS and PT tables for itemset X, we can determine |D_{V_X}|, the number of transactions in the context of X, by summing the TS entries for the stores and periods in that context.
The process of constructing the table is described in lines 2 through 4 of the algorithm.

An example of a TS table (columns T1 to T6):
P1  23 5422356
P2  934212394723
P3  43418704359
P4  213243341021
P5  324230643432
P6  451216901265

16 Market Basket Analysis - Algorithm: Computing RF_k, F_k
Here is an example of the computation process.
Suppose there are 15 periods (T1 to T15), and the numbers of transactions in these periods are 19, 17, 14, 25, 20, 17, 15, 27, 21, 20, 22, 18, 25, 21, and 19.
- Product A is sold in periods T1 to T10 and is involved in 60 transactions.
- Product B is sold in periods T6 to T15 and is involved in 80 transactions.
- 50 transactions contain both A and B; both products are sold together in periods T6 to T10.
To compute supports and relative supports for itemsets {A}, {B}, and {A, B}:
- |W({A}, D)| = 60, |W({B}, D)| = 80, |W({A, B}, D)| = 50, and |D| = 300.
- sup({A}, D) = 60/300 = 0.2, sup({B}, D) = 80/300 = 0.267, and sup({A, B}, D) = 50/300 = 0.167.
- The bases for relative support are |D_{V_{A}}| = 195, |D_{V_{B}}| = 205, and |D_{V_{A,B}}| = 100.
- The relative supports are 60/195 = 0.308 for {A}, 80/205 = 0.39 for {B}, and 50/100 = 0.5 for {A, B}.
Suppose we set σ_s at 0.1 and σ_r at 0.35. Then {A}, {B}, and {A, B} are all frequent. {A} is not relative-frequent, but {B} and {A, B} are relative-frequent.
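The arithmetic of this example can be replayed in Python. The period counts, selling periods, transaction counts, and thresholds are those given on the slide; the dictionary layout and the `evaluate` helper are assumptions made for this sketch.

```python
# Transactions per period T1..T15, as given in the example.
period_counts = [19, 17, 14, 25, 20, 17, 15, 27, 21, 20, 22, 18, 25, 21, 19]

# Selling periods (1-based) and number of transactions containing each itemset.
itemsets = {
    "A": {"periods": set(range(1, 11)), "count": 60},   # T1..T10
    "B": {"periods": set(range(6, 16)), "count": 80},   # T6..T15
}
# The context of {A, B} is the intersection of the two selling periods (T6..T10).
itemsets["A,B"] = {
    "periods": itemsets["A"]["periods"] & itemsets["B"]["periods"],
    "count": 50,
}

SIGMA_S = 0.1   # support threshold
SIGMA_R = 0.35  # relative-support threshold

total = sum(period_counts)


def evaluate(info):
    """Compute support, relative support, and the two frequency flags."""
    base = sum(period_counts[t - 1] for t in info["periods"])
    sup = info["count"] / total
    rel_sup = info["count"] / base
    frequent = sup >= SIGMA_S
    return {
        "base": base,
        "sup": sup,
        "rel_sup": rel_sup,
        "frequent": frequent,
        "relative_frequent": frequent and rel_sup >= SIGMA_R,
    }


results = {name: evaluate(info) for name, info in itemsets.items()}
for name, r in results.items():
    print(f"{{{name}}}: base={r['base']} sup={r['sup']:.3f} "
          f"rel_sup={r['rel_sup']:.3f} frequent={r['frequent']} "
          f"RF={r['relative_frequent']}")
```

Following Definition 2, an itemset is flagged relative-frequent only if it is frequent in the first place.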

17 Market Basket Analysis - Performance Evaluation: Data
Data generation
- D_ij is the number of transactions of store i in period j.
- N_i is the number of products in store i.

18 Market Basket Analysis - Performance Evaluation: Measures
Performance measures
Three measures (error types A, B, and C) assess the magnitudes of the deviations in support, confidence, and the number of association rules when the traditional association rules are used for store-chain data.
- The type A error measures the relative difference in the support levels, e.g., (0.03 - 0.02)/0.03 = 33.33%.
- The type B error compares the difference in confidence levels: (conf(X => Y) - conf_V(X => Y)) / conf(X => Y), where conf_V(X => Y) is the rule confidence computed by the traditional methods.
- The type C error compares the relative difference in the numbers of rules generated by the two methods.
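A small Python sketch of the type A and type B errors follows. The slide gives the type B formula explicitly; for type A only the numeric example (0.03 - 0.02)/0.03 survives, so treating it as (proposed - traditional) / proposed is an assumption made by analogy with type B. The helper name `relative_error` and the confidence values 0.60 and 0.45 are hypothetical.

```python
def relative_error(proposed, traditional):
    """Relative difference between the value from the proposed method and
    the value from the traditional method, normalised by the proposed value."""
    return (proposed - traditional) / proposed


# Type A: support levels, using the numbers from the slide's example.
# Assumption: 0.03 is the proposed (relative) support, 0.02 the traditional one.
type_a = relative_error(0.03, 0.02)

# Type B: confidence levels, with hypothetical confidences 0.60 and 0.45.
type_b = relative_error(0.60, 0.45)

print(f"type A error: {type_a:.2%}")
print(f"type B error: {type_b:.2%}")
```

The type C error is left out because the slide does not state how the difference in rule counts is normalised.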

19 Market Basket Analysis - Performance Evaluation: Simulation
Simulation study
- The effect of the number of stores and periods
- The effect of store size
- The effect of different product replacement rates
- Summary

20 Market Basket Analysis - Performance Evaluation: Simulation
Data_Set_Table lists the data sets used in the different cases of the simulation study.

Data_Set_Table:
Data set   # of stores   # of periods   Range of store sizes   Product replacement rate
1               5              5              50–100                 0.001
2              10             10              50–100                 0.001
3              50             50              50–100                 0.001
4              50             50              10–100                 0.001
5              50             50              50–100                 0.001
6              50             50              90–100                 0.001
7              50             50              50–100                 0.001
8              50             50              50–100                 0.005
9              50             50              50–100                 0.010

21 Market Basket Analysis - Performance Evaluation: Simulation
Effect of the number of stores and periods on type A, B, and C errors, based on data sets 1, 2, and 3 in Data_Set_Table.

22 Market Basket Analysis - Performance Evaluation: Simulation
Effect of store size, based on data sets 4, 5, and 6 in Data_Set_Table.

23 Market Basket Analysis - Performance Evaluation: Simulation
Effect of the product replacement rate, based on data sets 7, 8, and 9 in Data_Set_Table.

24 Market Basket Analysis - Performance Evaluation: Simulation
- The traditional association rules may not be able to extract all important purchasing patterns for a multistore chain when there are large numbers of stores and periods, a large variation in store sizes, or high product replacement rates.
- We find that the proposed algorithm requires longer processing times, but the differences are not substantial. This result is reasonable, because the proposed algorithm requires one more scan of the data than the Apriori algorithm does, and also requires additional basic operations in each phase of the algorithm.

25 Market Basket Analysis - Conclusion
- Traditional association rule mining, introduced in 1993 by Agrawal et al., fails in a multi-store environment.
- To overcome this, the store-chain association rule is proposed, in which store and period are taken into account.
- A simulation is used to compare the proposed and traditional methods. We have seen that the algorithm is computationally efficient.
- The outcomes can be used for general or local marketing strategies, and also for product procurement, inventory, and distribution strategies for a store chain.
- This is a promising research area in data mining. A lot of work remains, such as generating store-chain association rules accurately and efficiently and extending the algorithm to multiple levels in a distributed environment.

26 Market Basket Analysis - References
[1] R. Agrawal, R. Srikant, Fast algorithms for mining association rules, Proceedings of the 20th VLDB Conference, Santiago, Chile, 1994, pp. 487–499.
[2] J.M. Ale, G.H. Rossi, An approach to discovering temporal association rules, Proceedings of the 2000 ACM Symposium on Applied Computing (Vol. 1), Villa Olmo, Como, Italy, 2000, pp. 294–300.
[3] Y.-L. Chen, K. Tang, R.-J. Shen, Y.-H. Hu, Market basket analysis in a multiple store environment, Decision Support Systems 40 (2) (2005) 339–354.
[4] H. Lu, L. Feng, J. Han, Beyond intra-transaction association analysis: mining multi-dimensional inter-transaction association rules, ACM Transactions on Information Systems 18 (4) (2000) 423–454.
[5] J.F. Roddick, M. Spiliopoulou, A survey of temporal knowledge discovery paradigms and methods, IEEE Transactions on Knowledge and Data Engineering 14 (2002) 750–767.
[6] E. Clementini, P.D. Felice, K. Koperski, Mining multiple-level spatial association rules for objects with a broad boundary, Data and Knowledge Engineering 34 (3) (2000) 251–270.

