CS 349: Market Basket Data Mining All about beer and diapers.


1 CS 349: Market Basket Data Mining All about beer and diapers.

2 Overview
- What is Data Mining?
- Market Baskets
- How fast does it run?
- What does it do?

3 What is Data Mining?
- Statistics
- Data Analysis
- Machine Learning
- Databases

4 Types of Data that can be Mined
- market basket
- classification
- time series
- text

5 Applications of Market Basket
- supermarkets
- data with boolean attributes
  - census data: single vs married
- word occurrence

6 Some Measures of the Data
- number of baskets: N
- number of items: M
- average number of items per basket: W (width)
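
As a rough illustration of the three measures above, the sketch below computes N, M, and W for a small made-up data set. The basket contents and the helper name `basket_measures` are assumptions for the example, not part of the slides.

```python
# Hypothetical toy data: each basket is a set of item names.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "milk", "diapers", "bread"},
]


def basket_measures(baskets):
    """Return (N, M, W): basket count, distinct item count, mean basket width."""
    n = len(baskets)
    # M counts distinct items that appear in at least one basket.
    items = set().union(*baskets) if baskets else set()
    m = len(items)
    # W is the average number of items per basket.
    w = sum(len(b) for b in baskets) / n if n else 0.0
    return n, m, w


N, M, W = basket_measures(baskets)
print(N, M, W)  # 4 5 2.75
```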

7 Aspects of Market Basket Mining
- What is interesting?
- How do you make it run fast?

8 What is Interesting? (first try)
- Itemset I = set of items
- association rule: A -> B
- support(I) = fraction of baskets that contain I
- confidence(A -> B) = probability that a basket contains B given that it contains A
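
The two definitions above translate directly into counting. This is a minimal sketch on made-up baskets; the function names `support` and `confidence` and the data are assumptions for illustration.

```python
# Hypothetical toy data: each basket is a set of item names.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "milk", "diapers", "bread"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for b in baskets if itemset <= b) / len(baskets)


def confidence(a, b, baskets):
    """Estimate P(B | A) as support(A and B together) / support(A).

    Undefined (raises ZeroDivisionError) when A never occurs.
    """
    return support(set(a) | set(b), baskets) / support(a, baskets)


print(support({"beer", "diapers"}, baskets))       # 0.75
print(confidence({"beer"}, {"diapers"}, baskets))  # 1.0
print(confidence({"milk"}, {"beer"}, baskets))     # 0.5
```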

9 How do you find Itemsets with high support?
- Apriori algorithm, Agrawal et al. (1993)
- Find all itemsets with support > s
- 1-itemset = itemset with 1 item
- …
- k-itemset = itemset with k items
- large itemset = itemset with support > s
- candidate itemset = itemset that may have support > s

10 Apriori Algorithm
- start with all 1-itemsets
- go through data and count their support and find all “large” 1-itemsets
- combine them to form “candidate” 2-itemsets
- go through data and count their support and find all “large” 2-itemsets
- combine them to form “candidate” 3-itemsets
- …
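
The level-by-level loop above can be sketched in a few lines. This is an in-memory illustration under stated assumptions, not a tuned implementation: the function name `apriori`, the toy baskets, and the threshold are made up, and the subset check used when forming candidates is the usual Apriori pruning step, which the slide does not spell out.

```python
from itertools import combinations

# Hypothetical toy data: each basket is a set of item names.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "milk", "diapers", "bread"},
]


def apriori(baskets, min_support):
    """Return {itemset: support} for every itemset with support > min_support."""
    n = len(baskets)
    # Start with all 1-itemsets as candidates.
    candidates = {frozenset([item]) for b in baskets for item in b}
    large = {}
    k = 1
    while candidates:
        # One pass over the data: count the support of each candidate.
        counts = {c: 0 for c in candidates}
        for b in baskets:
            for c in candidates:
                if c <= b:
                    counts[c] += 1
        # Keep the "large" k-itemsets (strictly above the threshold).
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n > min_support}
        large.update(level)
        # Combine large k-itemsets into candidate (k+1)-itemsets.
        k += 1
        candidates = set()
        for x, y in combinations(level, 2):
            union = x | y
            # Assumed pruning step: every (k-1)-subset must itself be large.
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return large


result = apriori(baskets, min_support=0.4)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

On this toy data the loop stops after the second level, because the two large 2-itemsets share no items and so cannot be combined into a 3-itemset.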

11 Run Time
- k passes over data, where k is the size of the largest candidate itemset
- Memory chunking algorithm ==> 2 passes over data on disk but multiple in memory
- Toivonen 1996 gives statistical technique: 1 + ε passes (but more memory)
- Brin 1997, Dynamic Itemset Counting: 1 + ε passes (less memory)
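
The slide does not describe the memory chunking algorithm in detail. One plausible reading, sketched below as an assumption, is a partition-style scheme: the first pass mines each memory-sized chunk on its own and pools the locally large itemsets as candidates, and the second pass counts those candidates over the full data. An itemset with global support > s must have support > s in at least one chunk, so no large itemset is missed. The function names, chunk size, and toy data are hypothetical, and the brute-force local miner is only suitable for tiny baskets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each basket is a set of item names.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "milk", "diapers", "bread"},
]


def local_large_itemsets(chunk, min_support):
    """Brute-force miner for one in-memory chunk (exponential in basket width)."""
    counts = Counter()
    for b in chunk:
        items = sorted(b)
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return {s for s, c in counts.items() if c / len(chunk) > min_support}


def two_pass_large_itemsets(baskets, min_support, chunk_size):
    """Return {itemset: support} using two sequential scans of `baskets`."""
    # Pass 1: read the data chunk by chunk and pool locally large itemsets.
    candidates = set()
    for start in range(0, len(baskets), chunk_size):
        chunk = baskets[start:start + chunk_size]
        candidates |= local_large_itemsets(chunk, min_support)
    # Pass 2: count every pooled candidate over the whole data set.
    counts = Counter()
    for b in baskets:
        for c in candidates:
            if c <= b:
                counts[c] += 1
    n = len(baskets)
    return {c: counts[c] / n for c in candidates if counts[c] / n > min_support}


result = two_pass_large_itemsets(baskets, min_support=0.4, chunk_size=2)
print(sorted(sorted(s) for s in result))
```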

12 But what is really interesting?
- Rule: A -> B
- Support = P(AB)
- Confidence = P(B|A)
- Interest = P(AB) / (P(A) P(B))
- Implication Strength = P(A) P(~B) / P(A~B)
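
To make the four formulas concrete, the sketch below estimates each probability by counting baskets. The function name `rule_measures` and the toy data are assumptions for illustration. Implication strength is reported as infinity when no basket contains A without B, which is a convention chosen here rather than something the slide specifies.

```python
# Hypothetical toy data: each basket is a set of item names.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "milk", "diapers", "bread"},
]


def rule_measures(a, b, baskets):
    """Support, confidence, interest, and implication strength for A -> B.

    Assumes A occurs in at least one basket and B in at least one basket.
    """
    a, b = frozenset(a), frozenset(b)
    n = len(baskets)
    p_a = sum(1 for x in baskets if a <= x) / n
    p_b = sum(1 for x in baskets if b <= x) / n
    p_ab = sum(1 for x in baskets if a <= x and b <= x) / n
    # P(A~B): baskets that contain A but do not contain all of B.
    p_a_not_b = sum(1 for x in baskets if a <= x and not b <= x) / n
    implication = (
        p_a * (1 - p_b) / p_a_not_b if p_a_not_b > 0 else float("inf")
    )
    return {
        "support": p_ab,
        "confidence": p_ab / p_a,
        "interest": p_ab / (p_a * p_b),
        "implication_strength": implication,
    }


print(rule_measures({"milk"}, {"bread"}, baskets))
print(rule_measures({"beer"}, {"milk"}, baskets))
```

An interest above 1 means A and B occur together more often than independence would predict, and a value below 1 means less often.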

13 But what is really really interesting?
- Causality
- Surprise

14 Summary
- What is Data Mining?
- Market Baskets
- Finding Itemsets with high support
- Finding Interesting Rules

