Mining Sequential Patterns Rakesh Agrawal Ramakrishnan Srikant Proc. of the Int’l Conference on Data Engineering (ICDE) March 1995 Presenter: Phil Schlosser.


Topics ► Introduction ► Problem Statement ► Finding Sequential Patterns (The Algorithm) ► Sequence Phase (AprioriAll, AprioriSome, DynamicSome) ► Conclusion (Performance/FE Questions/Future work)

Introduction ► Bar code data allows us to store massive amounts of sales data (basket data). ► We would like to extract sequential buying patterns. ► E.g. Video rental: Star Wars; Empire Strikes Back; Return of the Jedi. ► Can include multiple elements. ► E.g. Sheet and pillow case; comforter; drapes.

Problem Statement ► Given a database of transactions: ► Each transaction: - Customer ID - Transaction Time - Set of items purchased * No two transactions have the same customer ID and transaction time. * Quantities are not considered: an item was either purchased or not purchased.

Problem Statement (terms) ► Itemset: non-empty set of items. Each itemset is mapped to an integer. ► Sequence: ordered list of itemsets. ► Customer-Sequence: list of a customer's transactions ordered by increasing transaction time. ► Containment: a sequence <a1 … an> is contained in a sequence <b1 … bm> if there are indices i1 < … < in such that each aj is a subset of b(ij). ► (A customer supports a sequence if the sequence is contained in the customer-sequence.) ► Support for a sequence: fraction of total customers that support the sequence. ► Maximal Sequence: a sequence, within a set of sequences, that is not contained in any other sequence of that set. ► Large Sequence: a sequence whose support meets the minimum support (minisup).

Example
Customer ID   Transaction Time   Items Bought
1             June 25 '93        30
1             June 30 '93        90
2             June 10 '93        10,20
2             June 15 '93        30
2             June 20 '93        40,60,70
3             June 25 '93        30,50,70
4             June 25 '93        30
4             June 30 '93        40,70
4             July 25 '93        90
5             June 12 '93        90

Customer ID   Customer Sequence
1             { (30) (90) }
2             { (10 20) (30) (40 60 70) }
3             { (30 50 70) }
4             { (30) (40 70) (90) }
5             { (90) }

Maximal sequences with support > 25% (minisup = 25%):
{ (30) (90) }
{ (30) (40 70) }
{ (10 20) (30) } does not have minisup (it is supported only by customer 2).
{ (30) }, { (70) }, { (30) (40) } … have minisup but are not maximal.
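The containment and support definitions can be checked against this example with a small Python sketch. The function names `is_contained` and `support` are illustrative choices, not names from the paper, and customer 3 is modelled as a single transaction of items 30, 50 and 70, which is what the transaction table implies.

```python
def is_contained(pattern, customer_sequence):
    """Return True if every itemset of `pattern` is a subset of a distinct
    transaction of `customer_sequence`, with the order preserved."""
    position = 0
    for itemset in pattern:
        # Skip transactions that do not contain this itemset.
        while (position < len(customer_sequence)
               and not itemset <= customer_sequence[position]):
            position += 1
        if position == len(customer_sequence):
            return False
        position += 1  # the next itemset must match a later transaction
    return True


def support(pattern, customer_sequences):
    """Fraction of customers whose sequence contains `pattern`."""
    hits = sum(is_contained(pattern, seq) for seq in customer_sequences)
    return hits / len(customer_sequences)


# Customer sequences from the example (one list of item sets per customer).
customers = [
    [{30}, {90}],
    [{10, 20}, {30}, {40, 60, 70}],
    [{30, 50, 70}],
    [{30}, {40, 70}, {90}],
    [{90}],
]

print(support([{30}, {90}], customers))      # customers 1 and 4
print(support([{10, 20}, {30}], customers))  # customer 2 only
```

Matching each itemset against the earliest possible transaction is enough here, because taking an earlier match never rules out a later one.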

The Algorithm ► Sort Phase ► Litemset Phase ► Transformation Phase ► Sequence Phase (The Biggie) ► Maximal Phase

Sort Phase ► Sort the database: ► Customer ID as the major key. ► Transaction-Time as the minor key.
Customer ID   Transaction Time   Items Bought
1             June 25 '93        30
1             June 30 '93        90
2             June 10 '93        10,20
2             June 15 '93        30
2             June 20 '93        40,60,70
3             June 25 '93        30,50,70
4             June 25 '93        30
4             June 30 '93        40,70
4             July 25 '93        90
5             June 12 '93        90
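As a rough illustration of the sort phase, the Python sketch below sorts a shuffled copy of the example transactions by customer ID and then transaction time, and groups them into customer sequences. The tuple layout and variable names are assumptions made for this sketch.

```python
from datetime import date
from itertools import groupby

# (customer_id, transaction_time, items), deliberately out of order.
transactions = [
    (4, date(1993, 7, 25), {90}),
    (2, date(1993, 6, 20), {40, 60, 70}),
    (1, date(1993, 6, 30), {90}),
    (5, date(1993, 6, 12), {90}),
    (2, date(1993, 6, 10), {10, 20}),
    (4, date(1993, 6, 25), {30}),
    (3, date(1993, 6, 25), {30, 50, 70}),
    (1, date(1993, 6, 25), {30}),
    (4, date(1993, 6, 30), {40, 70}),
    (2, date(1993, 6, 15), {30}),
]

# Customer ID is the major key, transaction time the minor key.
transactions_sorted = sorted(transactions, key=lambda t: (t[0], t[1]))

# After sorting, each customer's transactions are adjacent and in time order,
# so grouping by customer ID yields the customer sequences directly.
customer_sequences = {
    customer_id: [items for _, _, items in group]
    for customer_id, group in groupby(transactions_sorted, key=lambda t: t[0])
}

for customer_id, sequence in customer_sequences.items():
    print(customer_id, sequence)
```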

Litemset Phase ► Litemset (Large Itemset): ► Supported by fraction of customers greater than minisup.
Customer ID   Transaction Time   Items Bought
1             June 25 '93        30
1             June 30 '93        90
2             June 10 '93        10,20
2             June 15 '93        30
2             June 20 '93        40,60,70
3             June 25 '93        30,50,70
4             June 25 '93        30
4             June 30 '93        40,70
4             July 25 '93        90
5             June 12 '93        90

Large Itemsets   Mapped To
(30)             1
(40)             2
(70)             3
(40 70)          4
(90)             5
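The litemsets on this slide can be reproduced with a brute-force Python sketch that enumerates every subset of every transaction and counts each itemset at most once per customer. This is a simplification for illustration only and is not the itemset-mining procedure the paper uses; `large_itemsets` is a hypothetical name.

```python
from itertools import combinations


def large_itemsets(customer_sequences, min_support):
    """Return itemsets (as sorted tuples) supported by at least
    `min_support` (a fraction) of the customers."""
    counts = {}
    for sequence in customer_sequences:
        supported = set()  # itemsets this customer supports, counted once
        for transaction in sequence:
            items = sorted(transaction)
            for size in range(1, len(items) + 1):
                supported.update(combinations(items, size))
        for itemset in supported:
            counts[itemset] = counts.get(itemset, 0) + 1
    total = len(customer_sequences)
    large = [s for s, n in counts.items() if n / total >= min_support]
    return sorted(large, key=lambda s: (len(s), s))


customers = [
    [{30}, {90}],
    [{10, 20}, {30}, {40, 60, 70}],
    [{30, 50, 70}],
    [{30}, {40, 70}, {90}],
    [{90}],
]

result = large_itemsets(customers, 0.25)
print(result)
```

Support is counted per customer rather than per transaction, so a customer who buys item 30 in two separate transactions would still contribute only one count to (30).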

Transformation Phase
Customer ID   Original Customer Sequence    Transformed Customer Sequence          After Mapping
1             { (30) (90) }                 <{(30)} {(90)}>                        <{1} {5}>
2             { (10 20) (30) (40 60 70) }   <{(30)} {(40),(70),(40 70)}>           <{1} {2,3,4}>
3             { (30 50 70) }                <{(30),(70)}>                          <{1,3}>
4             { (30) (40 70) (90) }         <{(30)} {(40),(70),(40 70)} {(90)}>    <{1} {2,3,4} {5}>
5             { (90) }                      <{(90)}>                               <{5}>
► Replace each transaction with all litemsets contained in the transaction. ► Transactions with no litemsets are dropped. (The customer still counts toward the total number of customers when computing support.)
Note: (10 20) is dropped because it lacks support. (40 60 70) is replaced with the set of litemsets {(40),(70),(40 70)} (60 does not have minisup).
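The transformation rule can be sketched in Python as follows, using the litemset-to-integer mapping from the Litemset Phase slide. The helper name `transform` and the dictionary layout are assumptions made for this sketch.

```python
# Mapping from the Litemset Phase slide: large itemset -> integer.
litemset_ids = {
    frozenset({30}): 1,
    frozenset({40}): 2,
    frozenset({70}): 3,
    frozenset({40, 70}): 4,
    frozenset({90}): 5,
}


def transform(customer_sequence, litemset_ids):
    """Replace each transaction by the ids of all litemsets it contains.
    Transactions containing no litemset are dropped."""
    transformed = []
    for transaction in customer_sequence:
        ids = {i for itemset, i in litemset_ids.items() if itemset <= transaction}
        if ids:
            transformed.append(ids)
    return transformed


customers = [
    [{30}, {90}],
    [{10, 20}, {30}, {40, 60, 70}],
    [{30, 50, 70}],
    [{30}, {40, 70}, {90}],
    [{90}],
]

for sequence in customers:
    print(sequence, "->", transform(sequence, litemset_ids))
```

A caller computing support should keep dividing by the original number of customers, even if a customer's transformed sequence turns out empty.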

Sequence Phase ► Finds Large Sequences: ► AprioriAll ► AprioriSome ► DynamicSome ► Stay Tuned….

Maximal Phase ► Find maximal sequences among large sequences. ► k-sequence: sequence of length k ► S: set of all large sequences ► n: length of the longest large sequence
for (k = n; k > 1; k--) do
    foreach k-sequence s_k do
        Delete from S all subsequences of s_k
► The authors state that data structures (hash trees) and an algorithm exist to do this efficiently.
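A simple way to see the effect of the maximal phase is the quadratic Python sketch below, which keeps a sequence only if no other large sequence contains it. It does not use the hash-tree structure mentioned on the slide, and the list of large sequences is written out by hand from the running five-customer example (minisup = 25%); both are assumptions of this sketch.

```python
def is_contained(short, long):
    """True if each itemset of `short` is a subset of a distinct itemset
    of `long`, in the same order."""
    position = 0
    for itemset in short:
        while position < len(long) and not itemset <= long[position]:
            position += 1
        if position == len(long):
            return False
        position += 1
    return True


def maximal_sequences(large_sequences):
    """Keep the sequences that are not contained in any other sequence."""
    return [
        s for s in large_sequences
        if not any(s != t and is_contained(s, t) for t in large_sequences)
    ]


# Large sequences of the running example, expressed as item sets.
large = [
    [{30}], [{40}], [{70}], [{90}], [{40, 70}],
    [{30}, {40}], [{30}, {70}], [{30}, {40, 70}], [{30}, {90}],
]

maximal = maximal_sequences(large)
print(maximal)
```

Containment is tested on the item sets themselves rather than on litemset integers, so that { (30) (40) } is recognised as a subsequence of { (30) (40 70) }.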

Sequence Phase ► Sequence phase finds the large sequences. ► Two families of algorithms: CountSome and CountAll.

Final Exam Question #2 ► There were two types of algorithms presented to find sequential patterns, CountSome and CountAll. What was the main difference between the two algorithms? ► CountAll (AprioriAll) is careful with respect to minimum support, careless with respect to maximality. CountSome (AprioriSome) is careful with respect to maximality, careless with respect to minimum support.

AprioriAll
L_1 = {large 1-sequences}
for (k = 2; L_{k-1} ≠ {}; k++) do begin
    C_k = New candidates generated from L_{k-1}
    foreach customer-sequence c in the database do
        Increment the count of all candidates in C_k that are contained in c.
    L_k = Candidates in C_k with minimum support.
end
Answer = Maximal Sequences in ∪_k L_k
Notation:
L_k: Set of all large k-sequences
C_k: Set of candidate k-sequences

AprioriAll (Example)
Minisup = 25%
L1 (1-Seq: Support): <1>: 4, <2>: 2, <3>: 4, <4>: 4, <5>: 4
[Tables for Customer Sequences, L2, L3, L4 and the Answer sequences: contents not preserved in the transcript]
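The AprioriAll loop can be sketched in Python as below. Because the example tables on this slide are incomplete, the sketch runs on the five transformed customer sequences derived from the Transformation Phase rule instead. Candidate generation here is a simplified join-and-prune over tuples of litemset ids, and all function names are illustrative rather than taken from the paper.

```python
from itertools import combinations


def contains(customer_sequence, pattern):
    """True if the litemset ids in `pattern` occur, in order, in distinct
    transactions of the transformed `customer_sequence`."""
    position = 0
    for litemset_id in pattern:
        while (position < len(customer_sequence)
               and litemset_id not in customer_sequence[position]):
            position += 1
        if position == len(customer_sequence):
            return False
        position += 1
    return True


def generate_candidates(previous_large, k):
    """Join (k-1)-sequences that share their first k-2 ids, then prune
    candidates that have a (k-1)-subsequence that is not large."""
    joined = {p + (q[-1],) for p in previous_large for q in previous_large
              if p[:-1] == q[:-1]}
    return {c for c in joined
            if all(sub in previous_large for sub in combinations(c, k - 1))}


def apriori_all(database, min_customers):
    """Return {large sequence: number of supporting customers}."""
    ids = {i for seq in database for transaction in seq for i in transaction}
    counts = {(i,): sum(contains(seq, (i,)) for seq in database) for i in ids}
    large = {s: n for s, n in counts.items() if n >= min_customers}
    all_large = dict(large)
    k = 2
    while large:
        candidates = generate_candidates(set(large), k)
        counts = {c: sum(contains(seq, c) for seq in database) for c in candidates}
        large = {c: n for c, n in counts.items() if n >= min_customers}
        all_large.update(large)
        k += 1
    return all_large


# Transformed database of the five-customer example (litemset ids 1..5).
database = [[{1}, {5}], [{1}, {2, 3, 4}], [{1, 3}], [{1}, {2, 3, 4}, {5}], [{5}]]

# With five customers, minisup = 25% means at least two supporting customers.
result = apriori_all(database, min_customers=2)
for sequence in sorted(result, key=lambda s: (len(s), s)):
    print(sequence, result[sequence])
```

The result still contains non-maximal sequences such as <1 2>; a maximal phase over the underlying item sets is needed afterwards to reduce it to { (30) (90) } and { (30) (40 70) }.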

AprioriSome (Forward Phase)
L_1 = {large 1-sequences}
C_1 = L_1
last = 1
for (k = 2; C_{k-1} ≠ {} and L_last ≠ {}; k++) do begin
    if (L_{k-1} known) then
        C_k = New candidates generated from L_{k-1}
    else
        C_k = New candidates generated from C_{k-1}
    if (k == next(last)) then begin // (next k to count?)
        foreach customer-sequence c in the database do
            Increment the count of all candidates in C_k that are contained in c.
        L_k = Candidates in C_k with minimum support.
        last = k;
    end
end

AprioriSome (Backward Phase)
for (k--; k >= 1; k--) do
    if (L_k not found in forward phase) then begin
        Delete all sequences in C_k contained in some L_i, i > k;
        foreach customer-sequence c in D_T do
            Increment the count of all candidates in C_k that are contained in c
        L_k = Candidates in C_k with minimum support
    end
    else // L_k already known
        Delete all sequences in L_k contained in some L_i, i > k;
Answer = ∪_k L_k // (Maximal Phase not needed)
Notation:
D_T: Transformed database

DynamicSome (Just the basics) ► Similar to AprioriSome ► AprioriSome generates C_k from C_{k-1} ► DynamicSome generates C_k “on the fly”

Final Exam Question #1 ► What was the greatest hardware concern regarding the algorithms contained in the paper? ► Main memory capacity. When there is little main memory, or many potentially large sequences, the benefits of AprioriSome vanish.

Final Exam Question #3 ► How did the two best sequence mining algorithms (AprioriAll and AprioriSome) perform compared with each other? Take into consideration memory, speed, and usefulness of the data.

Final Exam Question #3 ► Memory: In terms of main memory usage, AprioriAll is better. In terms of secondary storage access, AprioriSome is better.

Final Exam Question #3 ► Speed: With sufficient memory, the difference between AprioriAll and AprioriSome increases as minimum support decreases. (AprioriSome is better.) At lower support, more large sequences that are not maximal are generated.

Final Exam Question #3 ► Usefulness of the data: For the problem of finding maximal large sequences, the answer is “Precisely the same.” However, AprioriAll finds all large sequences, while AprioriSome discards some large sequences that aren’t maximal. AprioriAll, then, generates more “useful” data. “The user may want to know the ratio of the number of people who bought the first k + 1 items in a sequence to the number of people who bought the first k items.”