Data Mining Association Rules: Advanced Concepts and Algorithms Lecture Notes Introduction to Data Mining by Tan, Steinbach, Kumar.

Slides:



Advertisements
Similar presentations
Mining Sequence Data. © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Sequence Data ObjectTimestampEvents A102, 3, 5 A206, 1 A231 B114,
Advertisements

gSpan: Graph-based substructure pattern mining
Association Rule Mining. 2 The Task Two ways of defining the task General –Input: A collection of instances –Output: rules to predict the values of any.
Mining Graphs.
Data Mining Association Analysis: Basic Concepts and Algorithms
Association Analysis. Association Rule Mining: Definition Given a set of records each of which contain some number of items from a given collection; –Produce.
Data Mining Association Analysis: Basic Concepts and Algorithms Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Generalized Sequential Pattern (GSP) Step 1: – Make the first pass over the sequence database D to yield all the 1-element frequent sequences Step 2: Repeat.
Association Analysis (7) (Mining Graphs)
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining Association Analysis: Basic Concepts and Algorithms
© Tan,Steinbach, Kumar Introduction to Data Mining 1/17/ Data Mining Cluster Analysis: Advanced Concepts and Algorithms Figures for Chapter 9 Introduction.
Temporal Pattern Matching of Moving Objects for Location-Based Service GDM Ronald Treur14 October 2003.
Data Mining Association Analysis: Basic Concepts and Algorithms
© Tan,Steinbach, Kumar Introduction to Data Mining 1/17/ Data Mining Cluster Analysis: Basic Concepts and Algorithms Figures for Chapter 8 Introduction.
© Tan,Steinbach, Kumar Introduction to Data Mining 1/17/ Data Mining Anomaly Detection Figures for Chapter 10 Introduction to Data Mining by Tan,
ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis)
Data Mining Association Rules: Advanced Concepts and Algorithms
Mining Sequences. Examples of Sequence Web sequence:  {Homepage} {Electronics} {Digital Cameras} {Canon Digital Camera} {Shopping Cart} {Order Confirmation}
Data Mining Association Rules: Advanced Concepts and Algorithms Lecture Notes for Chapter 7 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
© Tan,Steinbach, Kumar Introduction to Data Mining 1/17/ Data Mining: Exploring Data Figures for Chapter 3 Introduction to Data Mining by Tan, Steinbach,
1 Fast Algorithms for Mining Association Rules Rakesh Agrawal Ramakrishnan Srikant Slides from Ofer Pasternak.
© Tan,Steinbach, Kumar Introduction to Data Mining 1/17/ Data Mining Association Analysis: Advanced Concepts Figures for Chapter 7 Introduction to.
Mining Sequential Patterns: Generalizations and Performance Improvements R. Srikant R. Agrawal IBM Almaden Research Center Advisor: Dr. Hsu Presented by:
Data Mining Association Rules: Advanced Concepts and Algorithms
What Is Sequential Pattern Mining?
Advanced Association Rule Mining and Beyond. Continuous and Categorical Attributes Example of Association Rule: {Number of Pages  [5,10)  (Browser=Mozilla)}
October 2, 2015 Data Mining: Concepts and Techniques 1 Data Mining: Concepts and Techniques — Chapter 8 — 8.3 Mining sequence patterns in transactional.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining By Tan, Steinbach, Kumar Lecture.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Efficient Data Mining for Calling Path Patterns in GSM Networks Information Systems, accepted 5 December 2002 SPEAKER: YAO-TE WANG ( 王耀德 )
Modul 8: Sequential Pattern Mining. Terminology  Item  Itemset  Sequence (Customer-sequence)  Subsequence  Support for a sequence  Large/frequent.
Data Mining Association Rules: Advanced Concepts and Algorithms Lecture Notes for Chapter 7 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Association Rules: Advanced Concepts and Algorithms
Data Mining Association Analysis Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/
Data Mining Association Rules: Advanced Concepts and Algorithms Lecture Notes for Chapter 7 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Modul 8: Sequential Pattern Mining
Mining Association Rules: Advanced Concepts and Algorithms Lecture Notes for Chapter 7 By Gun Ho Lee Intelligent Information Systems Lab.
Data & Text Mining1 Introduction to Association Analysis Zhangxi Lin ISQS 3358 Texas Tech University.
Sequential Pattern Mining
1 Frequent Subgraph Mining Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY June 12, 2010.
Frequent Subgraph Discovery Michihiro Kuramochi and George Karypis ICDM 2001.
Data Mining Association Rules: Advanced Concepts and Algorithms
1 1 MSCIT 5210: Knowledge Discovery and Data Mining Acknowledgement: Slides modified by Dr. Lei Chen based on the slides provided by Tan, Steinbach, Kumar.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Association Analysis This lecture node is modified based on Lecture Notes for.
S EQUENTIAL P ATTERNS & THE GSP A LGORITHM BY : J OE C ASABONA.
Intelligent Database Systems Lab Advisor : Dr.Hsu Graduate : Keng-Wei Chang Author : Salvatore Orlando Raffaele Perego Claudio Silvestri 國立雲林科技大學 National.
Mining Sequential Patterns © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 Slides are adapted from Introduction to Data Mining by Tan, Steinbach,
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
COMP53311 Association Rule Mining Prepared by Raymond Wong Presented by Raymond Wong
Business Intelligence Technologies – Data Mining ` Lecture 2 Market Basket Analysis, Association Rules 1.
Gspan: Graph-based Substructure Pattern Mining
Sequential Pattern Mining
Mining Association Rules: Advanced Concepts and Algorithms
Data Mining Association Rules: Advanced Concepts and Algorithms
A new algorithm for gap constrained sequence mining
Frequent Pattern Mining
Advanced Pattern Mining 02
Association Analysis: Advance Concepts
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining Association Rules: Advanced Concepts and Algorithms
CS 548 Sequence Mining Showcase By Bian Du, Wa Gao, and Cam Jones
Data Mining Association Analysis: Basic Concepts and Algorithms
Presentation transcript:

Data Mining Association Rules: Advanced Concepts and Algorithms Lecture Notes Introduction to Data Mining by Tan, Steinbach, Kumar

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Sequence Data ObjectTimestampEvents A102, 3, 5 A206, 1 A231 B114, 5, 6 B172 B217, 8, 1, 2 B281, 6 C141, 8, 7 Sequence Database:

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Examples of Sequence Data Sequence Database SequenceElement (Transaction) Event (Item) CustomerPurchase history of a given customer A set of items bought by a customer at time t Books, diary products, CDs, etc Web DataBrowsing activity of a particular Web visitor A collection of files viewed by a Web visitor after a single mouse click Home page, index page, contact info, etc Event dataHistory of events generated by a given sensor Events triggered by a sensor at time t Types of alarms generated by sensors Genome sequences DNA sequence of a particular species An element of the DNA sequence Bases A,T,G,C Sequence E1 E2 E1 E3 E2 E3 E4 E2 Element (Transaction) Event (Item)

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Formal Definition of a Sequence l A sequence is an ordered list of elements (transactions) s = –Each element contains a collection of events (items) e i = {i 1, i 2, …, i k } –Each element is attributed to a specific time or location l Length of a sequence, |s|, is given by the number of elements of the sequence l A k-sequence is a sequence that contains k events (items)

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Examples of Sequence l Web sequence: l Sequence of initiating events causing the nuclear accident at 3-mile Island: ( l Sequence of books checked out at a library:

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Formal Definition of a Subsequence l A sequence is contained in another sequence (m ≥ n) if there exist integers i 1 < i 2 < … < i n such that a 1  b i1, a 2  b i1, …, a n  b in l The support of a subsequence w is defined as the fraction of data sequences that contain w l A sequential pattern is a frequent subsequence (i.e., a subsequence whose support is ≥ minsup) Data sequenceSubsequenceContain? Yes No Yes

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Sequential Pattern Mining: Definition l Given: –a database of sequences –a user-specified minimum support threshold, minsup l Task: –Find all subsequences with support ≥ minsup

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Sequential Pattern Mining: Challenge l Given a sequence: –Examples of subsequences:,,, etc. l How many k-subsequences can be extracted from a given n-sequence? n = 9 k=4: Y _ _ Y Y _ _ _ Y

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Sequential Pattern Mining: Example Minsup = 50% Examples of Frequent Subsequences: s=60% s=80% s=60%

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Extracting Sequential Patterns l Given n events: i 1, i 2, i 3, …, i n l Candidate 1-subsequences:,,, …, l Candidate 2-subsequences:,, …,,, …, l Candidate 3-subsequences:,, …,,, …,,, …,,, …

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Generalized Sequential Pattern (GSP) l Step 1: –Make the first pass over the sequence database D to yield all the 1- element frequent sequences l Step 2: Repeat until no new frequent sequences are found –Candidate Generation:  Merge pairs of frequent subsequences found in the (k-1)th pass to generate candidate sequences that contain k items –Support Counting:  Make a new pass over the sequence database D to find the support for these candidate sequences –Candidate Elimination:  Eliminate candidate k-sequences whose actual support is less than minsup

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ GSP Example

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Timing Constraints (I) {A B} {C} {D E} <= m s <= x g >n g x g : max-gap n g : min-gap m s : maximum span Data sequenceSubsequenceContain? Yes No Yes No x g = 2, n g = 0, m s = 4

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Mining Sequential Patterns with Timing Constraints l Approach 1: –Mine sequential patterns without timing constraints –Postprocess the discovered patterns l Approach 2: –Modify GSP to directly prune candidates that violate timing constraints

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Frequent Subgraph Mining l Extend association rule mining to finding frequent subgraphs l Useful for Web Mining, computational chemistry, bioinformatics, spatial data sets, etc

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Graph Definitions

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Representing Transactions as Graphs l Each transaction is a clique of items

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Representing Graphs as Transactions

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Edge Growing

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Apriori-like Algorithm l Find frequent 1-subgraphs l Repeat –Candidate generation  Use frequent (k-1)-subgraphs to generate candidate k-subgraph –Candidate pruning  Prune candidate subgraphs that contain infrequent (k-1)-subgraphs –Support counting  Count the support of each remaining candidate –Eliminate candidate k-subgraphs that are infrequent In practice, it is not as easy. There are many other issues

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Example: Dataset

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Example

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Adjacency Matrix Representation The same graph can be represented in many ways

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Graph Isomorphism l A graph is isomorphic if it is topologically equivalent to another graph

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Graph Isomorphism l Test for graph isomorphism is needed: –During candidate generation step, to determine whether a candidate has been generated –During candidate pruning step, to check whether its (k-1)-subgraphs are frequent –During candidate counting, to check whether a candidate is contained within another graph