Frequent-Itemset Mining. Market-Basket Model A large set of items, e.g., things sold in a supermarket. A large set of baskets, each of which is a small.


1 Frequent-Itemset Mining

2 Market-Basket Model A large set of items, e.g., things sold in a supermarket. A large set of baskets, each of which is a small set of the items, e.g., the things one customer buys on one day. Fundamental problem: What sets of items are often bought together? Application: If a large number of baskets contain both hot dogs and mustard, we can use this information in several ways. How?

3 Association Rule Mining: Definition Given a set of records, each of which contains some number of items from a given collection, produce dependency rules that predict the occurrence of an item based on the occurrences of other items. Rules discovered: {Milk} --> {Coke}; {Diaper, Milk} --> {Beer}.

4 Beer and Diapers The story (urban legend?) tells us that Walmart discovered the rule {Diapers} --> {Beer}. What’s the explanation here?

5 On-Line Purchases Amazon.com offers several million different items for sale, and has several tens of millions of customers. Baskets = customers; Items = books, DVDs, etc. Motivation: Find out what items are bought together. Alternatively, Baskets = books, DVDs, etc.; Items = customers. Motivation: Find similar customers. Slide based on www.mmds.org

6 Words and Documents Baskets = sentences; Items = words in those sentences. Motivation: Find words that appear together unusually frequently, i.e., linked concepts. Baskets = sentences, Items = documents containing those sentences. Motivation: Items that appear together too often could represent plagiarism. Slide based on www.mmds.org

7 Genes Baskets = people; Items = genes or blood-chemistry factors. Motivation: Detect combinations of genes that result in diabetes. Slide based on www.mmds.org

8 Support and Confidence

9 Why Use Support and Confidence? Support: – A rule that has very low support may occur simply by chance. – Support is often used to eliminate uninteresting rules. – Support also has a desirable property that can be exploited for the efficient discovery of association rules. Confidence: – Measures the reliability of the inference made by a rule. – For a rule X → Y, the higher the confidence, the more likely it is for Y to be present in transactions that contain X. – Confidence provides an estimate of the conditional probability of Y given X.
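The two measures can be written compactly. This is a sketch using the standard definitions, which agree with the worked examples on the later slides; the symbols supp, conf, and the basket collection \mathcal{B} are notation assumed here, not taken from the slides.

```latex
\operatorname{supp}(X) = \bigl|\{\, B \in \mathcal{B} : X \subseteq B \,\}\bigr|
\qquad
\operatorname{conf}(X \rightarrow Y)
  = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)}
  \approx P(Y \mid X)
```

Here support is a count of baskets, as in the "3 baskets" threshold used in the frequent-itemset example; dividing by the number of baskets gives the fractional form.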

10 Example: Frequent Itemsets Items = {milk, coke, pepsi, beer, juice}. Support threshold = 3 baskets. B1 = {m, c, b}, B2 = {m, p, j}, B3 = {m, b}, B4 = {c, j}, B5 = {m, p, b}, B6 = {m, c, b, j}, B7 = {c, b, j}, B8 = {b, c}. Frequent itemsets: {m}, {c}, {b}, {j}, {m,b}, {b,c}, {c,j}. Slide based on www.mmds.org
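The example can be checked by brute force. The following is a minimal sketch, not an efficient mining algorithm: it enumerates candidate itemsets by size and counts the baskets containing each one. The function name `frequent_itemsets` and its parameters are illustrative choices, not from the slides.

```python
from itertools import combinations

# Baskets from the slide: m=milk, c=coke, p=pepsi, b=beer, j=juice.
baskets = [
    {"m", "c", "b"}, {"m", "p", "j"}, {"m", "b"}, {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"}, {"c", "b", "j"}, {"b", "c"},
]

def frequent_itemsets(baskets, threshold):
    """Return {itemset: support} for itemsets found in at least `threshold` baskets."""
    items = sorted(set().union(*baskets))
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            # Support = number of baskets that contain every item of the candidate.
            support = sum(1 for basket in baskets if set(candidate) <= basket)
            if support >= threshold:
                result[frozenset(candidate)] = support
                found_any = True
        if not found_any:
            # Every subset of a frequent set is frequent, so if no set of this
            # size is frequent, no larger set can be either.
            break
    return result

frequent = frequent_itemsets(baskets, threshold=3)
for itemset, support in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

With a threshold of 3 this finds four single items and three pairs, and no triple reaches the threshold. The early exit relies on the property mentioned on the previous slide: support can only shrink as items are added to a set.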

11 Association Rules If-then rules about the contents of baskets. {i1, i2, …, ik} → j means: “if a basket contains all of i1, …, ik then it is likely to contain j.” Confidence of this association rule is the probability of j given i1, …, ik. Example: B1 = {m, c, b}, B2 = {m, p, j}, B3 = {m, b}, B4 = {c, j}, B5 = {m, p, b}, B6 = {m, c, b, j}, B7 = {c, b, j}, B8 = {b, c}. An association rule: {m, b} → c. – Confidence = 2/4 = 50%. Slide based on www.mmds.org
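The confidence figure can be reproduced in a few lines. This is a sketch assuming confidence is computed as the support of the left-hand side together with j, divided by the support of the left-hand side alone; the helper names `support` and `confidence` are illustrative.

```python
# Baskets from the slide: m=milk, c=coke, p=pepsi, b=beer, j=juice.
baskets = [
    {"m", "c", "b"}, {"m", "p", "j"}, {"m", "b"}, {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"}, {"c", "b", "j"}, {"b", "c"},
]

def support(itemset, baskets):
    """Number of baskets that contain every item in `itemset`."""
    return sum(1 for basket in baskets if itemset <= basket)

def confidence(lhs, rhs_item, baskets):
    """Fraction of the baskets containing `lhs` that also contain `rhs_item`."""
    return support(lhs | {rhs_item}, baskets) / support(lhs, baskets)

lhs = {"m", "b"}
print(support(lhs, baskets))          # baskets containing both m and b
print(support(lhs | {"c"}, baskets))  # of those, the ones that also contain c
print(confidence(lhs, "c", baskets))  # ratio of the two counts
```

Four baskets (B1, B3, B5, B6) contain both m and b, and two of them (B1, B6) also contain c, which gives the 2/4 on the slide.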

12 Scale of Problem WalMart sells 100,000 items and can store billions of baskets. The Web has over 100,000,000 words and billions of pages. Slide based on www.mmds.org
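To get a feel for these numbers, one can count how many pairs of items would need a counter if every pair were tracked. This is a back-of-the-envelope sketch; the 4-byte counter size is an assumption made here, not a figure from the slide.

```python
import math

n_items = 100_000          # item count quoted on the slide
bytes_per_counter = 4      # assumption: one 32-bit integer per pair

n_pairs = math.comb(n_items, 2)             # number of unordered item pairs
memory_gb = n_pairs * bytes_per_counter / 1e9

print(f"{n_pairs:,} candidate pairs")
print(f"about {memory_gb:.0f} GB to hold one counter per pair")
```

Under these assumptions there are roughly five billion pairs, which is why naive pair counting becomes a memory problem long before larger itemsets are considered.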

13 Interest The interest of an association rule X → Y is the absolute value of the amount by which the confidence differs from the probability of Y being in a given basket. Example: B1 = {m, c, b}, B2 = {m, p, j}, B3 = {m, b}, B4 = {c, j}, B5 = {m, p, b}, B6 = {m, c, b, j}, B7 = {c, b, j}, B8 = {b, c}. For association rule {m, b} → c, item c appears in 5/8 of the baskets. Interest = |2/4 - 5/8| = 1/8, which is not very interesting. Slide based on www.mmds.org
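The interest calculation follows directly from the definition. This is a minimal sketch using exact fractions so the result matches the slide's arithmetic; the function names are illustrative, not from the slides.

```python
from fractions import Fraction

# Baskets from the slide: m=milk, c=coke, p=pepsi, b=beer, j=juice.
baskets = [
    {"m", "c", "b"}, {"m", "p", "j"}, {"m", "b"}, {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"}, {"c", "b", "j"}, {"b", "c"},
]

def support(itemset, baskets):
    """Number of baskets that contain every item in `itemset`."""
    return sum(1 for basket in baskets if itemset <= basket)

def interest(lhs, rhs_item, baskets):
    """|confidence(lhs -> rhs_item) - fraction of all baskets containing rhs_item|."""
    conf = Fraction(support(lhs | {rhs_item}, baskets), support(lhs, baskets))
    base_rate = Fraction(support({rhs_item}, baskets), len(baskets))
    return abs(conf - base_rate)

print(interest({"m", "b"}, "c", baskets))
```

Here the confidence (1/2) is slightly below the base rate of c (5/8), so knowing that a basket holds milk and beer tells us little about whether it holds coke.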

