Data Mining: Concepts and Techniques 1 Mining Sequence Patterns in Transactional Databases CS240B --UCLA Notes by Carlo Zaniolo Based on those by J. Han.



2 Sequence Databases & Sequential Patterns
- Transaction databases, time-series databases vs. sequence databases
- Frequent patterns vs. (frequent) sequential patterns
- Applications of sequential pattern mining
  - Customer shopping sequences:
    - First buy a computer, then a CD-ROM, and then a digital camera, within 3 months.
  - Medical treatments, natural disasters (e.g., earthquakes), science & engineering processes, stocks and markets, etc.
  - Telephone calling patterns, Weblog click streams
  - DNA sequences and gene structures

3 What Is Sequential Pattern Mining?
- Given a set of sequences, find the complete set of frequent subsequences.
- A sequence database: [table with columns SID and sequence; rows not preserved]
- A sequence: [example not preserved]
  - An element may contain a set of items.
  - Items within an element are unordered, and we list them alphabetically.
- [sequence not preserved] is a subsequence of [sequence not preserved]

4 Subsequence
- [sequence not preserved] is a subsequence of [sequence not preserved]
- Def: S1 is a subsequence of S2 if S1 can be obtained from S2 by eliminating some of its elements.
- This is a partial order, not a lattice: there are no proper union and intersection operations.
- A sequence database: [table with columns SID and sequence; rows not preserved]
- The pattern [not preserved] has support 2 in our database.
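The subsequence test and the support count can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the slide's own example: the toy database is made up, each element is written as a string of single-letter items, and an element of the pattern is taken to match any element of the sequence that contains it as a subset (the usual reading when elements are itemsets). The helper names `seq`, `is_subsequence`, and `support` are hypothetical.

```python
from typing import FrozenSet, List, Sequence

Element = FrozenSet[str]
Seq = List[Element]


def seq(*elements: str) -> Seq:
    """Build a sequence; each string is one element (a set of one-letter items)."""
    return [frozenset(e) for e in elements]


def is_subsequence(s1: Seq, s2: Seq) -> bool:
    """True if the elements of s1 map, in order, into distinct elements of s2.

    Assumption: an element of s1 matches an element of s2 when it is a subset of it.
    Greedy left-to-right matching is sufficient for this test.
    """
    i = 0
    for element in s2:
        if i < len(s1) and s1[i] <= element:
            i += 1
    return i == len(s1)


def support(pattern: Seq, database: Sequence[Seq]) -> int:
    """Number of database sequences that contain the pattern."""
    return sum(is_subsequence(pattern, s) for s in database)


# Hypothetical toy database (not the one from the slides).
database = [
    seq("a", "bc", "d"),
    seq("ab", "c", "d"),
    seq("a", "c", "bd"),
    seq("b", "a"),
]

print(support(seq("a", "c"), database))  # 3
print(support(seq("ab"), database))      # 1
```

With a threshold of min_sup = 2, the pattern "a then c" would count as a sequential pattern in this toy database, while the single element (ab) would not.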

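5 The Apriori Property of Sequential Patterns
- A basic property: Apriori (Agrawal & Srikant '94)
  - If a sequence S is not frequent,
  - then none of the super-sequences of S is frequent: antimonotonicity
  - E.g., if [sequence not preserved] is infrequent, so are [super-sequence not preserved] and [super-sequence not preserved]
- Sequence database: [table with columns Seq. ID and Sequence; rows not preserved]
- Given support threshold min_sup = 2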

6 GSP: Generalized Sequential Pattern Mining
- GSP (Generalized Sequential Pattern) mining algorithm
  - proposed by Srikant and Agrawal, EDBT'96
- Outline of the method
  - Initially, every item in DB is a candidate of length 1
  - for each level (i.e., sequences of length k) do
    - scan database to collect support count for each candidate sequence
    - generate candidate length-(k+1) sequences from length-k frequent sequences using Apriori
  - repeat until no frequent sequence or no candidate can be found
- Major strength: candidate pruning by Apriori
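The level-wise outline above can be sketched as follows. This is a simplified illustration, not the candidate join from the GSP paper: candidates of length k+1 are produced by extending each frequent length-k sequence with one frequent item (either as a new element or inside the last element), and a candidate is kept only if every sequence obtained by deleting one of its items is frequent (the Apriori prune). The database, the function names, and the tuple-of-tuples pattern encoding are all assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple

Pattern = Tuple[Tuple[str, ...], ...]  # sequence of elements, each a sorted tuple of items


def contains(pattern: Pattern, sequence: List[str]) -> bool:
    """Greedy subsequence test; each database element is a string of one-letter items."""
    i = 0
    for element in sequence:
        if i < len(pattern) and set(pattern[i]) <= set(element):
            i += 1
    return i == len(pattern)


def extend(pattern: Pattern, items: List[str]) -> Set[Pattern]:
    """All candidates that are one item longer than `pattern`."""
    out = set()
    last = pattern[-1]
    for item in items:
        out.add(pattern + ((item,),))                   # item starts a new element
        if item > last[-1]:                             # item joins the last element
            out.add(pattern[:-1] + (last + (item,),))
    return out


def drop_one(pattern: Pattern) -> List[Pattern]:
    """All sequences obtained by deleting exactly one item."""
    subs = []
    for i, element in enumerate(pattern):
        for j in range(len(element)):
            rest = element[:j] + element[j + 1:]
            subs.append(pattern[:i] + ((rest,) if rest else ()) + pattern[i + 1:])
    return subs


def gsp_sketch(database: List[List[str]], min_sup: int) -> Dict[Pattern, int]:
    items = sorted({i for s in database for e in s for i in e})
    candidates = {((i,),) for i in items}               # every item is a length-1 candidate
    frequent: Dict[Pattern, int] = {}
    frequent_items = None
    while candidates:
        # One database scan per level: count support of each candidate.
        counts = {c: sum(contains(c, s) for s in database) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_sup}
        if frequent_items is None:
            frequent_items = sorted(p[0][0] for p in level)
        frequent.update(level)
        # Generate length-(k+1) candidates and prune with the Apriori property.
        candidates = {
            cand
            for p in level
            for cand in extend(p, frequent_items)
            if all(sub in level for sub in drop_one(cand))
        }
    return frequent


# Hypothetical toy database; each inner string is one element.
db = [["a", "bc", "d"], ["ab", "c", "d"], ["a", "c", "bd"], ["b", "a"]]
result = gsp_sketch(db, min_sup=3)
for pattern, sup in sorted(result.items(), key=lambda kv: (sum(map(len, kv[0])), kv[0])):
    print(pattern, sup)
```

On this toy input the loop stops after the fourth level, because the only length-4 extensions contain an infrequent length-3 subsequence and are pruned before any scan.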

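7 Finding Length-1 Sequential Patterns
- Examine GSP using an example
- Initial candidates: all singleton sequences
  - [eight singleton sequences; items not preserved]
- Scan database once, count support for candidates
- [tables not preserved: sequence database (Seq. ID, Sequence) and candidate supports (Cand, Sup)]
- min_sup = 2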

8 GSP: Generating Length-2 Candidates
- 51 length-2 candidates
- Without the Apriori property: 8*8 + 8*7/2 = 92 candidates
- Apriori prunes 44.57% of the candidates
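The counts on this slide can be checked with a short calculation. The sketch assumes the usual breakdown: with n items there are n*n two-element candidates (ordered pairs, repeats allowed) and n*(n-1)/2 single-element candidates (unordered pairs of distinct items). Using n = 8 for all items and n = 6 for the frequent items, as in the scan counts reported for this example, reproduces the three figures. The function name is hypothetical.

```python
def length2_candidates(n_items: int) -> int:
    """Two-element candidates (n*n) plus single-element pairs (n*(n-1)/2)."""
    return n_items * n_items + n_items * (n_items - 1) // 2


without_apriori = length2_candidates(8)  # all 8 items
with_apriori = length2_candidates(6)     # only the 6 frequent items
pruned_percent = round(100 * (without_apriori - with_apriori) / without_apriori, 2)

print(without_apriori, with_apriori, pruned_percent)  # 92 51 44.57
```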

9 The GSP Mining Process
- 1st scan: 8 cand., 6 length-1 seq. pat.
- 2nd scan: 51 cand., 19 length-2 seq. pat.; 10 cand. not in DB at all
- 3rd scan: 46 cand., 19 length-3 seq. pat.; 20 cand. not in DB at all
- 4th scan: 8 cand., 6 length-4 seq. pat.
- 5th scan: 1 cand., 1 length-5 seq. pat.
- Legend: cand. cannot pass sup. threshold; cand. not in DB at all
- [sequence database table (Seq. ID, Sequence) not preserved]; min_sup = 2

10 Candidate Generate-and-test: Drawbacks
- A huge set of candidate sequences is generated.
  - Especially 2-item candidate sequences.
- Multiple scans of the database are needed.
  - The length of each candidate grows by one at each database scan.
- Inefficient for mining long sequential patterns.
  - A long pattern grows from short patterns
  - The number of short patterns is exponential in the length of the mined patterns
  - Windows can be used to limit the search
  - Maximum intervals can be imposed between items.
- No efficient algorithm at hand for data streams.

11 From Sequential Patterns to Structured Patterns
- Sets, sequences, trees, graphs, and other structures
  - Transaction DB: sets of items
    - {{i_1, i_2, …, i_m}, …}
  - Seq. DB: sequences of sets
    - {[sequence not preserved], …}
  - Sets of sequences
    - {{[sequences not preserved]}, …}
  - Sets of trees: {t_1, t_2, …, t_n}
  - Sets of graphs (mining for frequent subgraphs)
    - {g_1, g_2, …, g_n}
- Mining structured patterns in XML documents, biochemical structures, etc.

12 Episodes and Episode Pattern Mining
- Other methods for specifying the kinds of patterns
  - Serial episodes: A → B
  - Parallel episodes: A & B
  - Regular expressions: (A | B)C*(D → E)
- Methods for episode pattern mining
  - Variations of Apriori-like algorithms, e.g., GSP
  - Database projection-based pattern growth
    - Similar to frequent pattern growth without candidate generation
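The difference between a serial episode (A before B) and a parallel episode (A and B in any order) can be illustrated with a sliding-window count. This is a rough sketch under assumptions: events are (timestamp, type) pairs with integer timestamps, the window start slides over every time point from the first to the last event, and the frequency is the fraction of windows containing the episode. The event list and all names are hypothetical, and the window convention is a simplification of the one used in the episode-mining literature.

```python
from typing import Callable, List, Tuple

Event = Tuple[int, str]  # (timestamp, event type)


def has_parallel(window: List[Event], a: str, b: str) -> bool:
    """Both event types occur in the window, in any order."""
    types = {e for _, e in window}
    return a in types and b in types


def has_serial(window: List[Event], a: str, b: str) -> bool:
    """Some occurrence of a is strictly earlier than some occurrence of b."""
    times_a = [t for t, e in window if e == a]
    times_b = [t for t, e in window if e == b]
    return bool(times_a) and bool(times_b) and min(times_a) < max(times_b)


def frequency(events: List[Event], width: int,
              predicate: Callable[[List[Event]], bool]) -> float:
    """Fraction of windows [t, t + width) that satisfy the predicate."""
    starts = range(events[0][0], events[-1][0] + 1)
    hits = 0
    for t in starts:
        window = [(u, e) for u, e in events if t <= u < t + width]
        hits += predicate(window)
    return hits / len(starts)


# Hypothetical event sequence, sorted by timestamp.
events = [(1, "A"), (2, "B"), (4, "B"), (5, "A"), (6, "C")]

parallel = frequency(events, 3, lambda w: has_parallel(w, "A", "B"))
serial = frequency(events, 3, lambda w: has_serial(w, "A", "B"))
print(parallel, serial)
```

In this toy input the parallel episode is found in three of the six windows, while the serial episode is found in only one, since in the later windows B precedes A.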

13 Periodicity Analysis
- Periodicity is everywhere: tides, seasons, daily power consumption, etc.
- Full periodicity
  - Every point in time contributes (precisely or approximately) to the periodicity
- Partial periodicity: a more general notion
  - Only some segments contribute to the periodicity
    - Jim reads the NY Times 7:00-7:30 am every weekday
- Cyclic association rules
  - Associations which form cycles
- Methods
  - Full periodicity: FFT, other statistical analysis methods
  - Partial and cyclic periodicity: variations of Apriori-like mining methods
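Partial periodicity can be illustrated by measuring, for each offset within a period, how often an event occurs at that offset. The sketch below is an assumption-laden toy, not one of the mining methods listed above: the series is a made-up four-week daily log in the spirit of the newspaper example, the period is fixed at 7, and an offset is called periodic when its confidence reaches a hypothetical threshold `min_conf`.

```python
from typing import List


def offset_confidence(series: List[str], period: int, event: str) -> List[float]:
    """For each offset in the period, the fraction of periods where `event` occurs."""
    n_periods = len(series) // period
    confidences = []
    for offset in range(period):
        hits = sum(series[k * period + offset] == event for k in range(n_periods))
        confidences.append(hits / n_periods)
    return confidences


# Hypothetical log: "read" on the five weekdays, nothing on the weekend, for 4 weeks.
series = (["read"] * 5 + ["-"] * 2) * 4
series[8] = "-"  # one missed day (second week, offset 1)

conf = offset_confidence(series, period=7, event="read")
min_conf = 0.7  # assumed threshold
periodic_offsets = [o for o, c in enumerate(conf) if c >= min_conf]

print(conf)              # [1.0, 0.75, 1.0, 1.0, 1.0, 0.0, 0.0]
print(periodic_offsets)  # [0, 1, 2, 3, 4]
```

Only offsets 0 to 4 contribute to the pattern, which is what makes it partial: the weekend offsets carry no regularity for this event.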

14 Sequential Pattern Mining Algorithms
- Concept introduction and an initial Apriori-like algorithm
  - Agrawal & Srikant. Mining sequential patterns, ICDE'95
- Apriori-based method: GSP (Generalized Sequential Patterns: Srikant & Agrawal, EDBT'96)
- Pattern-growth methods: FreeSpan & PrefixSpan (Han et al.; Pei et al., ICDE'01)
- Vertical format-based mining: SPADE (Zaki, Machine Learning'00)
- Constraint-based sequential pattern mining (SPIRIT: Garofalakis, Rastogi, et al.; Pei, Han & Wang, CIKM'02)
- Mining closed sequential patterns: CloSpan (Yan, Han & Afshar, SDM'03)

15 Ref: Mining Sequential Patterns
- R. Srikant and R. Agrawal. Mining sequential patterns: Generalizations and performance improvements. EDBT'96.
- H. Mannila, H. Toivonen, and A. I. Verkamo. Discovery of frequent episodes in event sequences. DAMI'97.
- M. Zaki. SPADE: An Efficient Algorithm for Mining Frequent Sequences. Machine Learning.
- J. Pei, J. Han, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu. PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth. ICDE'01 (TKDE'04).
- J. Pei, J. Han, and W. Wang. Constraint-Based Sequential Pattern Mining in Large Databases. CIKM'02.
- X. Yan, J. Han, and R. Afshar. CloSpan: Mining Closed Sequential Patterns in Large Datasets. SDM'03.
- J. Wang and J. Han. BIDE: Efficient Mining of Frequent Closed Sequences. ICDE'04.
- H. Cheng, X. Yan, and J. Han. IncSpan: Incremental Mining of Sequential Patterns in Large Database. KDD'04.
- J. Han, G. Dong, and Y. Yin. Efficient Mining of Partial Periodic Patterns in Time Series Database. ICDE'99.
- J. Yang, W. Wang, and P. S. Yu. Mining asynchronous periodic patterns in time series data. KDD'00.