Frequent-Itemset Mining. Market-Basket Model A large set of items, e.g., things sold in a supermarket. A large set of baskets, each of which is a small.

Slides:



Advertisements
Similar presentations
1 CPS : Information Management and Mining Association Rules and Frequent Itemsets.
Advertisements

Data Mining of Very Large Data
IT 433 Data Warehousing and Data Mining Association Rules Assist.Prof.Songül Albayrak Yıldız Technical University Computer Engineering Department
IDS561 Big Data Analytics Week 6.
MIS2502: Data Analytics Association Rule Mining. Uses What products are bought together? Amazon’s recommendation engine Telephone calling patterns Association.
Introduction to Data Mining
1 Association Rules Market Baskets Frequent Itemsets A-Priori Algorithm.
1 of 25 1 of 45 Association Rule Mining CIT366: Data Mining & Data Warehousing Instructor: Bajuna Salehe The Institute of Finance Management: Computing.
Association Analysis. Association Rule Mining: Definition Given a set of records each of which contain some number of items from a given collection; –Produce.
Data Mining Association Analysis: Basic Concepts and Algorithms Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining, Frequent-Itemset Mining
1 ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis) Introduction to Data Mining Olivia R. Liu Sheng, Ph.D. Emma Eccles Jones Presidential.
1 Association Rules Market Baskets Frequent Itemsets A-priori Algorithm.
Association Rules Presented by: Anilkumar Panicker Presented by: Anilkumar Panicker.
Asssociation Rules Prof. Sin-Min Lee Department of Computer Science.
Data Mining, Frequent-Itemset Mining. Data Mining Some mining problems Find frequent itemsets in "market-basket" data – "50% of the people who buy hot.
1 ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis) Association Rule Mining Olivia R. Liu Sheng, Ph.D. Emma Eccles Jones Presidential.
© Vipin Kumar CSci 8980 Fall CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance Computing Research Center Department of Computer.
Association Rule Mining Part 1 Introduction to Data Mining with Case Studies Author: G. K. Gupta Prentice Hall India, 2006.
Fast Algorithms for Association Rule Mining
1 “Association Rules” Market Baskets Frequent Itemsets A-priori Algorithm.
MIS 451 Building Business Intelligence Systems Association Rule Mining (1)
Knowledge Discovery & Data Mining process of extracting previously unknown, valid, and actionable (understandable) information from large databases Data.
Data Mining An Introduction.
Market Basket Analysis 포항공대 산업공학과 PASTA Lab. 석사과정 신원영.
Data Mining By Fu-Chun (Tracy) Juang. What is Data Mining? ► The process of analyzing LARGE databases to find useful patterns. ► Attempts to discover.
Association Rule By Kenneth Leung. Data Mining The process of extracting valid, previously unknown, comprehensible, and actionable information from large.
Frequent Itemsets and Association Rules 1 Wu-Jun Li Department of Computer Science and Engineering Shanghai Jiao Tong University Lecture 3: Frequent Itemsets.
ASSOCIATION RULE DISCOVERY (MARKET BASKET-ANALYSIS) MIS2502 Data Analytics Adapted from Tan, Steinbach, and Kumar (2004). Introduction to Data Mining.
Final Exam Review. The following is a list of items that you should review in preparation for the exam. Note that not every item in the following slides.
Supermarket shelf management – Market-basket model:  Goal: Identify items that are bought together by sufficiently many customers  Approach: Process.
Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to.
EXAM REVIEW MIS2502 Data Analytics. Exam What Tool to Use? Evaluating Decision Trees Association Rules Clustering.
CS 8751 ML & KDDSupport Vector Machines1 Mining Association Rules KDD from a DBMS point of view –The importance of efficiency Market basket analysis Association.
The Three Analytics Techniques. Decision Trees – Determining Probability.
1 What is Association Analysis: l Association analysis uses a set of transactions to discover rules that indicate the likely occurrence of an item based.
Association Rule Mining
ASSOCIATION RULES (MARKET BASKET-ANALYSIS) MIS2502 Data Analytics Adapted from Tan, Steinbach, and Kumar (2004). Introduction to Data Mining.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Association Analysis This lecture node is modified based on Lecture Notes for.
1 CPS216: Advanced Database Systems Data Mining Slides created by Jeffrey Ullman, Stanford.
Jeffrey D. Ullman Stanford University.  2% of your grade will be for answering other students’ questions on Piazza.  18% for Gradiance.  Piazza code.
Mining Frequent Patterns. What Is Frequent Pattern Analysis? Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs.
Elsayed Hemayed Data Mining Course
Data Analytics CMIS Short Course part II Day 1 Part 1: Clustering Sam Buttrey December 2015.
Chapter 8 Association Rules. Data Warehouse and Data Mining Chapter 10 2 Content Association rule mining Mining single-dimensional Boolean association.
Association Rules Carissa Wang February 23, 2010.
Chap 6: Association Rules. Rule Rules!  Motivation ~ recent progress in data mining + warehousing have made it possible to collect HUGE amount of data.
COMP53311 Association Rule Mining Prepared by Raymond Wong Presented by Raymond Wong
Elective-I Examination Scheme- In semester Assessment: 30 End semester Assessment :70 Text Books: Data Mining Concepts and Techniques- Micheline Kamber.
Jerry Post Copyright © Database Management Systems: Data Mining Market Baskets Association Rules.
MIS2502: Data Analytics Association Rule Mining David Schuff
Introduction to Data Mining Mining Association Rules Reference: Tan et al: Introduction to data mining. Some slides are adopted from Tan et al.
MIS2502: Data Analytics Association Rule Mining Jeremy Shafer
Data Mining – Association Rules
A Research Oriented Study Report By :- Akash Saxena
The Shopping Basket Analysis Tool
Frequent Itemsets Association Rules
CPS216: Advanced Database Systems Data Mining
Market Basket Many-to-many relationship between different objects
Data Mining Association Analysis: Basic Concepts and Algorithms
Market Baskets Frequent Itemsets A-Priori Algorithm
Data Mining Association Rules Assoc.Prof.Songül Varlı Albayrak
Data Mining Association Analysis: Basic Concepts and Algorithms
Frequent patterns and Association Rules
MIS2502: Data Analytics Association Rule Mining
Market Basket Analysis and Association Rules
MIS2502: Data Analytics Association Rule Mining
MIS2502: Data Analytics Association Rule Learning
Association Analysis: Basic Concepts
Presentation transcript:

Frequent-Itemset Mining

Market-Basket Model A large set of items, e.g., things sold in a supermarket. A large set of baskets, each of which is a small set of the items, e.g., the things one customer buys on one day. Fundamental problem What sets of items are often bought together? Application If a large number of baskets contain both hot dogs and mustard, we can use this information in several ways. How?

Association Rule Mining: Definition Given a set of records each of which contain some number of items from a given collection; – Produce dependency rules which will predict occurrence of an item based on occurrences of other items. Rules Discovered: {Milk} --> {Coke} {Diaper, Milk} --> {Beer} Rules Discovered: {Milk} --> {Coke} {Diaper, Milk} --> {Beer}

Beer and Diapers The story (urban legend?) tells us that Walmart discovered the rule {Diapers} --> {Beer} What’s the explanation here?

On-Line Purchases Amazon.com offers several million different items for sale, and has several tens of millions of customers. Baskets = Customers, Items = Books, DVDs, etc. Motivation: Find out what items are bought together. Baskets = Books, DVDs, etc. Items = Customers Motivation: Find out similar customers. Slide based on

Words and Documents Baskets = sentences; Items = words in those sentences. Motivation: Find words that appear together unusually frequently, i.e., linked concepts. Baskets = sentences, Items = documents containing those sentences. Motivation: Items that appear together too often could represent plagiarism. Slide based on

Genes Baskets = people; Items = genes or blood-chemistry factors. Motivation: Detect combinations of genes that result in diabetes Slide based on

Support and Confidence

Why Use Support and Confidence? Support – A rule that has very low support may occur simply by chance. – Support is often used to eliminate uninteresting rules. – Support also has a desirable property that can be exploited for the efficient discovery of association rules. Confidence – Measures the reliability of the inference made by a rule. – For a rule X  Y, the higher the confidence, the more likely it is for Y to be present in transactions that contain X. – Confidence provides an estimate of the conditional probability of Y given X.

Example: Frequent Itemsets Items={milk, coke, pepsi, beer, juice}. Support = 3 baskets. B 1 = {m, c, b}B 2 = {m, p, j} B 3 = {m, b}B 4 = {c, j} B 5 = {m, p, b}B 6 = {m, c, b, j} B 7 = {c, b, j}B 8 = {b, c} Frequent itemsets: {m}, {c}, {b}, {j},, {b,c}, {c,j}. {m,b} Slide based on

Association Rules If-then rules about the contents of baskets. {i 1, i 2,…,i k } → j means: “if a basket contains all of i 1,…,i k then it is likely to contain j.” Confidence of this association rule is the probability of j given i 1,…,i k. Example B 1 = {m, c, b}B 2 = {m, p, j} B 3 = {m, b}B 4 = {c, j} B 5 = {m, p, b}B 6 = {m, c, b, j} B 7 = {c, b, j}B 8 = {b, c} An association rule: {m, b} → c. – Confidence = 2/4 = 50%. Slide based on

Scale of Problem WalMart sells 100,000 items and can store billions of baskets. The Web has over 100,000,000 words and billions of pages. Slide based on

Interest The interest of an association rule X → Y is the absolute value of the amount by which the confidence differs from the probability of Y being in a given basket. Example B 1 = {m, c, b}B 2 = {m, p, j} B 3 = {m, b}B 4 = {c, j} B 5 = {m, p, b}B 6 = {m, c, b, j} B 7 = {c, b, j}B 8 = {b, c} For association rule {m, b} → c, item c appears in 5/8 of the baskets. Interest = |2/4 - 5/8| = 1/8 --- not very interesting. Slide based on