10 -1 Lecture 10 Association Rules Mining Topics –Basics –Mining Frequent Patterns –Mining Frequent Sequential Patterns –Applications.


10 -2 Basics
Data Mining vs. KDD (Knowledge Discovery in Databases)
KDD Process

10 -3 Basics
Data mining: discovery of patterns and relationships among the variables in a database.
Association rules: represent the patterns or correlations discovered by data mining.
Basic form: x_1, x_2, …, x_n → y, written X → y with X = {x_1, x_2, …, x_n}, where each x_i and y is an item, and X or X ∪ {y} is a transaction.
Examples:
Bread, Milk → Eggs
Temp_High, Cloud_None → Burn_High

10 -4 Basics
Evaluating the strength of associations
–support, sup(X): number of transactions that contain X
–confidence, conf(X → y): probability that a transaction containing X also contains y, i.e. sup(X ∪ {y}) / sup(X)
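The two measures can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: the function names `sup` and `conf` mirror the slide's notation, and the four grocery transactions are made up for the example.

```python
def sup(itemset, transactions):
    """Support: number of transactions that contain every item of itemset."""
    wanted = frozenset(itemset)
    return sum(1 for t in transactions if wanted <= t)


def conf(x, y, transactions):
    """Confidence of the rule X -> y: sup(X ∪ {y}) / sup(X)."""
    x = frozenset(x)
    support_x = sup(x, transactions)
    if support_x == 0:
        return 0.0  # X never occurs, so the rule has no evidence at all
    return sup(x | {y}, transactions) / support_x


# Hypothetical transactions (illustration only, not from the lecture).
transactions = [
    frozenset({"Bread", "Milk", "Eggs"}),
    frozenset({"Bread", "Milk"}),
    frozenset({"Bread", "Milk", "Eggs", "Butter"}),
    frozenset({"Milk", "Eggs"}),
]

print(sup({"Bread", "Milk"}, transactions))           # support of the antecedent
print(conf({"Bread", "Milk"}, "Eggs", transactions))  # confidence of Bread, Milk -> Eggs
```

Here the antecedent {Bread, Milk} occurs in three transactions and two of them also contain Eggs, so the confidence is 2/3.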

10 -5 Basics
Example association rule R1: Burger, Fries → Cola
Burger, Fries and Cola appear together 100 times in the dataset: sup({Burger, Fries, Cola}) = 100.
Burger and Fries appear together, with or without Cola, 125 times: sup({Burger, Fries}) = 125.
The rule confidence is then calculated as:
conf(R1) = sup({Burger, Fries, Cola}) / sup({Burger, Fries}) × 100% = (100 / 125) × 100% = 80%

10 -6 Basics
Association rule mining can be translated to frequent pattern mining
–Calculate sup(X ∪ {y})
–Calculate sup(X)
Mining frequent patterns
–A transaction (example) is treated as a set of items (item set)
Mining frequent sequential patterns
–A transaction is treated as a sequence of items (item sequence)

10 -7 Mining Frequent Patterns
Apriori algorithm (Agrawal & Srikant, 1994)
Step 1: Database scan to produce C_1
–i = 1; produce C_i, the set of candidate patterns of length i
–Associate a support with each element in C_i
Step 2: Frequent patterns generation
–Produce L_i from C_i by keeping the elements with sup ≥ minsup
Step 3: Candidate patterns generation
–Produce C_{i+1} by self-joining L_i
–Prune patterns in C_{i+1} using the rule that follows from minimum support: all subsets of a frequent pattern must themselves be frequent patterns

10 -8 Mining Frequent Patterns
Step 4: Database scan to compute support
–Associate a support with each element in C_{i+1}
Step 5: Iteration
–i++; repeat Steps 2, 3 and 4 until no more L_i can be generated
Step 6: Frequent patterns collection
Step 7: Rule generation and pruning
–Generate rules
–Prune a rule if conf < minconf
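Steps 1 to 6 can be sketched as a short Python function. This is an illustrative sketch rather than the lecture's own code: the name `apriori`, the dictionary return type and the four-transaction database `D` are assumptions made for the example. The self-join here simply unions pairs of frequent patterns and keeps the unions of length i + 1, which is simple rather than efficient.

```python
from itertools import combinations


def apriori(transactions, minsup):
    """Return {pattern: support} for every pattern with support >= minsup."""
    transactions = [frozenset(t) for t in transactions]

    def scan(candidates):
        # One database scan: associate a support with each candidate.
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    # Step 1: scan the database to produce C_1 (all single items).
    items = {item for t in transactions for item in t}
    c_i = scan({frozenset([item]) for item in items})
    frequent = {}
    i = 1
    while c_i:
        # Step 2: keep the candidates with sup >= minsup to form L_i.
        l_i = {c: s for c, s in c_i.items() if s >= minsup}
        if not l_i:
            break
        frequent.update(l_i)  # Step 6: collect the frequent patterns.
        # Step 3: self-join L_i, then prune any candidate that has an
        # infrequent subset of length i.
        joined = {a | b for a in l_i for b in l_i if len(a | b) == i + 1}
        c_next = {c for c in joined
                  if all(frozenset(sub) in l_i for sub in combinations(c, i))}
        # Step 4: scan the database to compute supports for C_{i+1}.
        c_i = scan(c_next)
        i += 1  # Step 5: iterate.
    return frequent


# Hypothetical database (an assumption for illustration).
D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
frequent = apriori(D, minsup=2)
for pattern in sorted(frequent, key=lambda p: (len(p), sorted(p))):
    print(sorted(pattern), frequent[pattern])
```

With this database and minsup = 2, the candidates {1 2 3} and {1 3 5} are pruned before any scan because {1 2} and {1 5} are not frequent, which is the saving the pruning rule in Step 3 is meant to provide.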

10 -9 Mining Frequent Patterns
Example of frequent patterns mining (minsup = 2)
[Figure: scan D to produce C_1 and L_1; self-join L_1 to produce C_2; scan D to produce L_2; self-join L_2 and prune to produce C_3; scan D to produce L_3. D: database.]

Mining Frequent Patterns
Mined frequent patterns: L = L_1 ∪ L_2 ∪ L_3 = {{1}, {2}, {3}, {5}, {1 3}, {2 3}, {2 5}, {3 5}, {2 3 5}}
Rule generation:
{2 5} → 3 [conf = 2/3 = 67%]
{3 5} → 2 [conf = 2/2 = 100%]
{2 3} → 5
{1} → 3; {2} → 3; {5} → 3
{2} → 5; {3} → 5
{3} → 2; {5} → 2
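Rule generation with confidence-based pruning (Step 7) can be sketched as follows. The database `D` is an assumption: the transcript does not preserve the example's transactions, so a four-transaction database that yields the same frequent patterns and the two confidences quoted above is used instead. The threshold `minconf = 0.7` and the function name `generate_rules` are also hypothetical.

```python
# Assumed database: not recoverable from the transcript, chosen so that the
# frequent patterns and the quoted confidences (2/3 and 2/2) come out the same.
D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]

# Frequent patterns as listed on the slide.
patterns = [{1}, {2}, {3}, {5}, {1, 3}, {2, 3}, {2, 5}, {3, 5}, {2, 3, 5}]

# Support of each frequent pattern, counted directly from D.
frequent = {frozenset(p): sum(1 for t in D if p <= t) for p in patterns}


def generate_rules(frequent, minconf):
    """Return (X, y, confidence) for each rule X -> y with confidence >= minconf."""
    rules = []
    for pattern, support in frequent.items():
        if len(pattern) < 2:
            continue  # a rule needs an antecedent and a consequent
        for y in pattern:
            x = pattern - {y}
            # X is frequent because every subset of a frequent pattern is.
            confidence = support / frequent[x]
            if confidence >= minconf:
                rules.append((x, y, confidence))
    return rules


rules = generate_rules(frequent, minconf=0.7)  # minconf is a hypothetical choice
for x, y, confidence in rules:
    print(sorted(x), "->", y, round(confidence, 2))
```

Under these assumptions {3 5} → 2 is kept (confidence 2/2) while {2 5} → 3 is pruned (confidence 2/3 is below 0.7).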

Mining Frequent Patterns
Rule generalization
Original rule: X → y
Generalization (figure annotations; their placement on the rule is not recoverable from the transcript): can be beyond binary value; can be a set of values; can be a set of items

Mining Frequent Patterns
Advantages
–Offers a different approach from other data mining methods, such as classification and clustering
–Does not 'generalize' into a class, but predicts actual items
–Uses accurate statistical measures to evaluate rules; no inherent 'error rate'
–Rules take a very simple form and are intuitive

Mining Frequent Patterns
Disadvantages
–The amount of computation depends critically on the minimum support, which must be specified before mining and often ends up as a 'guess'
–Computational cost from the number of sweeps required over the dataset, especially if the dataset is too large to read into memory; 'thrashing' must be avoided

Mining Frequent Sequential Patterns
Apriori algorithm on frequent sequential patterns mining
–Each frequent pattern is a sequence of items
Step 3: Candidate patterns generation by joining has to consider sequential order
–Given two sequential patterns: R = <r_1, r_2, …, r_{k-1}> and S = <s_1, s_2, …, s_{k-1}>
–If s_{j-1} = r_j, j = 2, …, k-1, then generate sequential pattern <r_1, r_2, …, r_{k-1}, s_{k-1}>
–If r_{j-1} = s_j, j = 2, …, k-1, then generate sequential pattern <s_1, s_2, …, s_{k-1}, r_{k-1}>
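The order-aware join can be sketched in Python with tuples as sequences. This is an illustration under assumptions: both inputs are taken to have the same length k-1, and the names `join_sequences` and `candidate_sequences` as well as the sample patterns are hypothetical.

```python
def join_sequences(r, s):
    """Join two sequential patterns of length k-1 into candidates of length k."""
    joined = []
    # s_{j-1} = r_j for j = 2..k-1: S without its last item equals R without
    # its first item, so S's last item is appended to R.
    if r[1:] == s[:-1]:
        joined.append(r + s[-1:])
    # r_{j-1} = s_j for j = 2..k-1: the symmetric case, R's last item is
    # appended to S.
    if s[1:] == r[:-1]:
        joined.append(s + r[-1:])
    return joined


def candidate_sequences(l_prev):
    """Self-join a set of frequent sequences; the set removes duplicates."""
    return {c for r in l_prev for s in l_prev for c in join_sequences(r, s)}


# Hypothetical frequent 2-sequences (illustration only).
l_2 = {("A", "B"), ("B", "C"), ("B", "E")}
print(sorted(candidate_sequences(l_2)))
```

Unlike the item set case, ("A", "B") and ("B", "C") join only in the order that respects the overlap, so ("A", "B", "C") is generated but ("B", "C", "A") is not.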

Mining Frequent Sequential Patterns
Step 7: Rule generation has to consider sequential order
–Given two sequential patterns: S_1 = <t_1, t_2, …, t_k> and S_2 = <t_1, t_2, …, t_{k-1}>
–we produce a sequential rule: t_1, t_2, …, t_{k-1} → t_k with conf(t_1, t_2, …, t_{k-1} → t_k) = sup(S_1) / sup(S_2)
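A small Python sketch of the sequential confidence follows. It assumes that a transaction supports a pattern when the pattern's items occur in the same order, with gaps allowed; the lecture does not state which notion of containment it uses. The database `db` and the function names are hypothetical.

```python
def contains(sequence, pattern):
    """True if pattern occurs in sequence in order (gaps allowed, an assumption)."""
    remaining = iter(sequence)
    # "item in remaining" advances the iterator, so the order is enforced.
    return all(item in remaining for item in pattern)


def seq_sup(pattern, db):
    """Number of transaction sequences that contain the pattern."""
    return sum(1 for sequence in db if contains(sequence, pattern))


def seq_conf(pattern, db):
    """Confidence of t_1..t_{k-1} -> t_k, i.e. sup(S_1) / sup(S_2)."""
    antecedent_support = seq_sup(pattern[:-1], db)
    if antecedent_support == 0:
        return 0.0
    return seq_sup(pattern, db) / antecedent_support


# Hypothetical sequence database (illustration only).
db = [
    ("A", "B", "C", "E"),
    ("B", "E"),
    ("A", "C"),
    ("B", "C", "E"),
]

print(seq_conf(("B", "C", "E"), db))  # rule B, C -> E
print(seq_conf(("B", "C"), db))       # rule B -> C
```

In this database B, C → E has confidence 2/2 because both sequences containing B followed by C also continue with E, while B → C has confidence 2/3.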

Mining Frequent Sequential Patterns Frequent sequential patterns mining based on database partitioning and perfect hashing

Mining Frequent Sequential Patterns
Rule generation

Sequential Rule    Confidence
B, C → E           1
A → C              1
B → C              0.67
B → E              1
C → E              0.67

Applications
–Wal-Mart has used the technique for years to mine POS (point-of-sale) data and arrange its stores to maximize sales
–Medical databases: discovering commonly occurring diseases among groups of people
–Lottery results databases: discovering those 'lucky' combinations of numbers

Discussions
Clustering algorithm in Open Sesame!
–Attribute-based representation of events
–Attribute-based similarity measure for clusters
–Hierarchical clustering of event sequences
–Generalization, e.g., “A ∧ B ∧ C” generalized to “A ∧ B”; “A ∨ B” generalized to “A ∨ B ∨ C”; ontology
–Specialization