The UNIVERSITY of Kansas EECS 800 Research Seminar Mining Biological Data Instructor: Luke Huan Fall, 2006.



Mining Biological Data KU EECS 800, Luke Huan, Fall’06 slide2 9/11/2006 Sequential Patterns
Administrative
Paper presentation schedule:
  Han, Bin: Kernel method in Analyzing Biological Data, Nov 6th
  Barker, Brett: Data Mining in Systems Biology, Nov 8th
  Leung, Daniel: High performance in Data Mining, Nov 13th
  Ku, Matthew: Data Mining in Proteomics, Nov 15th
  Lin, Cindy: Integrating Biological Data, Nov 20th
  Jia, Yi: Analyzing Bionetworks, Nov 22nd

Slide 3: Sequential Pattern Mining
Why sequential pattern mining?
GSP algorithm
FreeSpan and PrefixSpan
Border Collapsing
Constraints and extensions

Slide 4: Sequence Databases and Sequential Pattern Analysis
(Temporal) order is important in many situations
  Time-series databases and sequence databases
  Frequent patterns -> (frequent) sequential patterns
Applications of sequential pattern mining
  Customer shopping sequences: first buy a computer, then a CD-ROM, and then a digital camera, within 3 months
  Medical treatment, natural disasters (e.g., earthquakes), science and engineering processes, stocks and markets, telephone calling patterns, Weblog click streams, DNA sequences and gene structures

Slide 5: What Is Sequential Pattern Mining?
Given a set of sequences, find the complete set of frequent subsequences
A sequence database is a table of sequences, each identified by a sequence ID (SID)
A sequence is an ordered list of elements
  An element may contain a set of items
  Items within an element are unordered and are listed alphabetically
A sequence is a subsequence of another when its elements can be matched, in order, to elements of the other sequence that contain them
Given support threshold min_sup = 2, a subsequence contained in at least 2 sequences of the database is a sequential pattern
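The definitions above can be made concrete with a minimal sketch. The four-sequence database, the compact text notation (parentheses around multi-item elements), and the helper names `parse`, `is_subsequence`, and `support` are all assumptions chosen for illustration; they are not recovered from the slide.

```python
from typing import FrozenSet, List, Sequence

Element = FrozenSet[str]   # one element: an unordered set of items
Seq = Sequence[Element]    # a sequence: an ordered list of elements


def parse(text: str) -> List[Element]:
    """Parse a compact form such as 'a(abc)(ac)d(cf)' into a list of elements."""
    elements: List[Element] = []
    i = 0
    while i < len(text):
        if text[i] == "(":
            j = text.index(")", i)
            elements.append(frozenset(text[i + 1:j]))
            i = j + 1
        else:
            elements.append(frozenset(text[i]))
            i += 1
    return elements


def is_subsequence(sub: Seq, seq: Seq) -> bool:
    """True if every element of `sub` is contained in a distinct element of
    `seq`, with the matched elements appearing in the same order."""
    pos = 0
    for element in sub:
        # Greedy scan: take the earliest element of `seq` that contains it.
        while pos < len(seq) and not element <= seq[pos]:
            pos += 1
        if pos == len(seq):
            return False
        pos += 1
    return True


def support(pattern: Seq, database: List[Seq]) -> int:
    """Number of database sequences that contain `pattern`."""
    return sum(is_subsequence(pattern, s) for s in database)


# Hypothetical sequence database (illustrative only).
DATABASE = [parse(s) for s in
            ("a(abc)(ac)d(cf)", "(ad)c(bc)(ae)", "(ef)(ab)(df)cb", "eg(af)cbc")]

MIN_SUP = 2
print(is_subsequence(parse("a(bc)dc"), DATABASE[0]))
print(support(parse("(ab)c"), DATABASE) >= MIN_SUP)
```

In this toy database the pattern `(ab)c` is contained in the first and third sequences, so with min_sup = 2 it counts as a sequential pattern.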

Slide 6: Challenges on Sequential Pattern Mining
A huge number of possible sequential patterns are hidden in databases
A mining algorithm should
  find the complete set of patterns satisfying the minimum support (frequency) threshold
  be highly efficient and scalable, involving only a small number of database scans
  be able to incorporate various kinds of user-specific constraints

Slide 7: A Basic Property of Sequential Patterns: Apriori
A basic property: Apriori (Agrawal & Srikant'94)
  If a sequence S is not frequent, then none of the super-sequences of S is frequent
Example setting: a sequence database (columns Seq. ID, Sequence) with support threshold min_sup = 2

Slide 8: Basic Algorithm: Breadth-First Search (GSP)
L = 1
While (Result_L != NULL)
  Candidate Generate
  Prune
  Test
  L = L + 1
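The generate / prune / test loop can be sketched as follows. This is a simplified illustration, not the full GSP algorithm: it assumes every element holds a single item (so a sequence is just a string of items), and the function names and the toy database are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]


def contains(seq: str, pattern: Pattern) -> bool:
    """True if `pattern` occurs in `seq` as an order-preserving subsequence."""
    it = iter(seq)
    return all(item in it for item in pattern)


def gsp(database: List[str], min_sup: int) -> Dict[Pattern, int]:
    """Level-wise (breadth-first) mining of single-item-element sequences."""
    items = sorted({x for seq in database for x in seq})
    candidates: List[Pattern] = [(x,) for x in items]   # level L = 1
    result: Dict[Pattern, int] = {}
    while candidates:
        # Test: one pass over the database counts every candidate.
        counts = {c: sum(contains(s, c) for s in database) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_sup}
        result.update(frequent)
        # Candidate generate: join patterns that overlap in all but one item.
        joined = {a + b[-1:] for a in frequent for b in frequent
                  if a[1:] == b[:-1]}
        # Prune (Apriori): every one-item-shorter subsequence must be frequent.
        candidates = sorted(
            c for c in joined
            if all(sub in frequent for sub in combinations(c, len(c) - 1))
        )
    return result


patterns = gsp(["abcd", "acd", "bd", "abd"], min_sup=2)
for pattern, sup in sorted(patterns.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print("".join(pattern), sup)
```

Each pass of the `while` loop corresponds to one database scan, which is why the number of scans grows with the length of the longest pattern.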

Slide 9: Finding Length-1 Sequential Patterns
Initial candidates: all singleton sequences
Scan the database once and count the support of each candidate
Candidates with support of at least min_sup are the length-1 sequential patterns
Tables: sequence database (Seq. ID, Sequence) and candidate supports (Cand, Sup)

Slide 10: The Mining Process (min_sup = 2)
1st scan: 8 candidates, 6 length-1 sequential patterns
2nd scan: 51 candidates, 19 length-2 sequential patterns; 10 candidates not in the DB at all
3rd scan: 46 candidates, 19 length-3 sequential patterns; 20 candidates not in the DB at all
4th scan: 8 candidates, 6 length-4 sequential patterns
5th scan: 1 candidate, 1 length-5 sequential pattern
Legend: a rejected candidate either cannot pass the support threshold or is not in the DB at all

Slide 11: Generating Length-2 Candidates
51 length-2 candidates
Without the Apriori property, 8*8 + 8*7/2 = 92 candidates
Apriori prunes 44.57% of the candidates
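The counts above can be checked with a few lines of arithmetic. The reading of the formula is an assumption: n*n counts ordered two-element candidates (including a repeated item) and n*(n-1)/2 counts unordered pairs placed in one element. Applying it to the 6 frequent length-1 patterns reproduces the 51 candidates on the slide.

```python
def length2_candidates(n_items: int) -> int:
    """Length-2 candidates from n items: ordered pairs across two elements
    (n * n) plus unordered pairs inside one element (n * (n - 1) / 2)."""
    return n_items * n_items + n_items * (n_items - 1) // 2


without_apriori = length2_candidates(8)   # all 8 singleton candidates
with_apriori = length2_candidates(6)      # only the 6 frequent ones
pruned_fraction = (without_apriori - with_apriori) / without_apriori

print(without_apriori, with_apriori, f"{pruned_fraction:.2%}")
# prints: 92 51 44.57%
```

The same formula gives 1,499,500 length-2 candidates for 1,000 frequent length-1 sequences.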

Slide 12: The SPADE Algorithm
SPADE (Sequential PAttern Discovery using Equivalence classes), developed by Zaki, 2001
A vertical-format sequential pattern mining method
A sequence database is mapped to a large set of entries of the form Item: <SID, EID>
Sequential pattern mining is performed by growing the subsequences (patterns) one item at a time by Apriori candidate generation
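A minimal sketch of the vertical format follows. It assumes each item maps to a list of (SID, EID) pairs, where EID is the position of the element inside its sequence, and shows a simplified temporal join for a 2-sequence "x then y". The function names and the toy database are hypothetical, and the real SPADE algorithm also handles itemset joins and equivalence-class decomposition, which are omitted here.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

IdList = List[Tuple[int, int]]   # (SID, EID) pairs


def to_vertical(database: Dict[int, List[Set[str]]]) -> Dict[str, IdList]:
    """Map a horizontal sequence database to per-item id-lists."""
    vertical: Dict[str, IdList] = defaultdict(list)
    for sid, sequence in database.items():
        for eid, element in enumerate(sequence, start=1):
            for item in sorted(element):
                vertical[item].append((sid, eid))
    return dict(vertical)


def temporal_join(first: IdList, second: IdList) -> IdList:
    """Id-list of 'first then second': occurrences of the second item that
    come strictly after some occurrence of the first in the same sequence."""
    earliest: Dict[int, int] = {}
    for sid, eid in first:
        earliest[sid] = min(eid, earliest.get(sid, eid))
    return [(sid, eid) for sid, eid in second
            if sid in earliest and eid > earliest[sid]]


def support(id_list: IdList) -> int:
    """Support = number of distinct sequences appearing in the id-list."""
    return len({sid for sid, _ in id_list})


# Hypothetical database: SID -> ordered list of elements (item sets).
db = {
    1: [{"a"}, {"a", "b", "c"}, {"a", "c"}],
    2: [{"a", "d"}, {"c"}],
    3: [{"b"}, {"a"}],
}
vertical = to_vertical(db)
a_then_c = temporal_join(vertical["a"], vertical["c"])
print(a_then_c, support(a_then_c))
```

Support is read off the id-list directly, so growing a pattern needs only joins of id-lists rather than a rescan of the original sequences.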

Slide 13: The SPADE Algorithm

Slide 14: Bottlenecks of GSP and SPADE
A large set of candidates could be generated
  1,000 frequent length-1 sequences generate a huge number of length-2 candidates!
Multiple scans of the database in mining
Breadth-first search
Mining long sequential patterns

Slide 15: Pattern Growth (PrefixSpan)
Prefix and suffix (projection)
  A sequence can have several prefixes
  Given a sequence and one of its prefixes, the suffix is what remains of the sequence after the prefix (prefix-based projection)
  Table columns: Prefix, Suffix (Prefix-Based Projection)

Slide 16: Example
An example (min_sup = 2):
  Sequence database with columns Sequence_id and Sequence
  Result table with columns Prefix and Sequential Patterns

Slide 17: PrefixSpan (the example to be continued)
Step 1: Find length-1 sequential patterns; six patterns with supports 4, 4, 4, 3, 3, 3
Step 2: Divide the search space into six subsets according to the six prefixes
Step 3: Find subsets of sequential patterns by constructing the corresponding projected databases and mining each recursively

Slide 18: Example
Find sequential patterns having a given length-1 prefix:
  Scan sequence database S once. Sequences in S containing the prefix are projected w.r.t. it to form the prefix-projected database.
  Scan the projected database once to get six length-2 sequential patterns having the prefix, with supports 2, 4, 2, 4, 2, 2
  Recursively, all sequential patterns having the prefix can be further partitioned into 6 subsets. Construct the respective projected databases and mine each.
  E.g., one of these projected databases has two sequences.

Slide 19: Example (to be continued)
Table with columns Prefix, Projected (suffix) databases, and Sequential Patterns, next to the sequence database (columns Sequence_id, Sequence)

Slide 20: PrefixSpan Algorithm
PrefixSpan(α, i, S|α)
1. Initially, α is a single frequent element in S.
2. Scan S|α once and find the set of frequent items b such that b can be assembled to the last element of α to form a sequential pattern, or b can be appended to α to form a sequential pattern.
3. For each frequent item b, append it to α to form a sequential pattern α', and output α'.
4. For each α', construct the α'-projected database S|α', and call PrefixSpan(α', i+1, S|α').
Main idea: use frequent prefixes to divide the search space and to project sequence databases, so that only the relevant sequences are searched.
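The recursion can be sketched in a few lines. This is a simplified illustration under a stated assumption: every element holds a single item, so only the "append b to the prefix" case of step 2 applies and the "assemble into the last element" case is omitted. The function names and the toy database are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]


def prefixspan(database: List[str], min_sup: int) -> Dict[Pattern, int]:
    """Pattern growth over single-item-element sequences (strings of items)."""
    patterns: Dict[Pattern, int] = {}

    def mine(prefix: Pattern, projected: List[str]) -> None:
        # Scan the projected database once; count each item once per suffix.
        counts = Counter(item for suffix in projected for item in set(suffix))
        for item, count in sorted(counts.items()):
            if count < min_sup:
                continue
            new_prefix = prefix + (item,)
            patterns[new_prefix] = count           # output the grown pattern
            # Project: keep what follows the first occurrence of the item.
            new_projected = [s[s.index(item) + 1:] for s in projected
                             if item in s]
            mine(new_prefix, new_projected)        # recurse on the projection

    mine((), database)
    return patterns


found = prefixspan(["abcd", "acd", "bd", "abd"], min_sup=2)
for pattern, sup in sorted(found.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print("".join(pattern), sup)
```

No candidate sequences are generated here: each recursive call only looks at the suffixes that actually follow the current prefix, which is the point of the projection.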

Slide 21: CloSpan: Mining Closed Sequential Patterns
A closed sequential pattern s: there exists no superpattern s' such that s' ⊃ s and s' and s have the same support
Motivation: reduces the number of (redundant) patterns while retaining the same expressive power
Uses backward subpattern and backward superpattern pruning to prune redundant search space
CloSpan: Mining Closed Sequential Patterns in Large Datasets, Yan et al., SDM'03
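The definition of a closed pattern can be illustrated with a naive post-filter. Note that this is not CloSpan itself, which prunes the search space during mining; the sketch only applies the definition to an already mined result. The pattern set and its supports are hypothetical, and elements are assumed to be single items.

```python
from typing import Dict, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(sub: Pattern, seq: Pattern) -> bool:
    """True if `sub` occurs in `seq` as an order-preserving subsequence."""
    it = iter(seq)
    return all(item in it for item in sub)


def closed_patterns(patterns: Dict[Pattern, int]) -> Dict[Pattern, int]:
    """Keep a pattern only if no proper superpattern has the same support."""
    return {
        p: sup for p, sup in patterns.items()
        if not any(len(q) > len(p) and sup_q == sup and is_subsequence(p, q)
                   for q, sup_q in patterns.items())
    }


# Hypothetical frequent patterns with their supports.
frequent = {
    ("a",): 3, ("b",): 3, ("c",): 2, ("d",): 4,
    ("a", "b"): 2, ("a", "c"): 2, ("a", "d"): 3, ("b", "d"): 3, ("c", "d"): 2,
    ("a", "b", "d"): 2, ("a", "c", "d"): 2,
}
closed = closed_patterns(frequent)
print(sorted(closed.items()))
```

In this toy input 11 frequent patterns reduce to 5 closed ones, and the support of every dropped pattern can still be read off a closed superpattern.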

Slide 22: CloSpan: Performance Comparison with PrefixSpan

Slide 23: Noise-Tolerant Sequence Patterns
Real-world sequence data contain noise
  Biological sequences
  Gene expression profiles
  Web-log collections
A compatibility matrix is introduced to tolerate a certain level of noise
Yang et al., Mining Long Sequential Patterns in a Noisy Environment, SIGMOD'02

Slide 24: Approximate Match
When you observe d1, spread the count as d1: 90%, d2: 5%, d3: 5%
Compatibility matrix

Slide 25: Match
The degree to which pattern P is retained/reflected in sequence S:
  $M(P, S) = \Pr(P \mid S)$
  $M(P, S) = \prod_{i=1}^{l_P} C(p_i, s_i)$ when $l_S = l_P$
  $M(P, S) = \max_{S'} M(P, S')$ over all possible subsequences $S'$ of $S$ with $l_{S'} = l_P$, when $l_S > l_P$
Example:
  P       S         M
  d1d1    d1d3      0.9*0
  d1d2              0.9*0.8
  d1d2    d1d3      0.9*0.05
  d1d2    d2d3      0.1*0.05
  d1d2    d1d2d3    0.9*0.8

Slide 26: Calculate Max over All
Dynamic programming:
  $M(p_1 p_2 \ldots p_i,\ s_1 s_2 \ldots s_j) = \max\{\, M(p_1 p_2 \ldots p_{i-1},\ s_1 s_2 \ldots s_{j-1}) \cdot C(p_i, s_j),\ \ M(p_1 p_2 \ldots p_i,\ s_1 s_2 \ldots s_{j-1}) \,\}$
Cost: $O(l_P \cdot l_S)$
When the compatibility matrix is sparse: $O(l_S)$
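The recurrence translates directly into a small dynamic program. The sketch below is a plain O(l_P * l_S) version without the sparse-matrix optimisation. The compatibility values are assumptions inferred from the products in the match example (C(d1,d1) = 0.9, C(d1,d2) = 0.1, C(d2,d2) = 0.8, C(d2,d3) = 0.05, missing entries treated as 0), and the helper names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# compat[(p, s)]: compatibility of pattern symbol p with observed symbol s.
Compat = Dict[Tuple[str, str], float]


def match(pattern: Sequence[str], sequence: Sequence[str],
          compat: Compat) -> float:
    """M(P, S): best product of compatibilities over all order-preserving
    alignments of the pattern to a subsequence of the sequence."""
    # prev[j] holds M(p_1..p_{i-1}, s_1..s_j); the empty pattern matches with 1.
    prev = [1.0] * (len(sequence) + 1)
    for p in pattern:
        row = [0.0] * (len(sequence) + 1)      # row[0]: nothing left to align
        for j, s in enumerate(sequence, start=1):
            use = prev[j - 1] * compat.get((p, s), 0.0)   # align p with s_j
            skip = row[j - 1]                             # leave s_j unused
            row[j] = max(use, skip)
        prev = row
    return prev[-1]


def match_in_database(pattern: Sequence[str], database: List[Sequence[str]],
                      compat: Compat) -> float:
    """Match of a pattern in a database: average over all its sequences."""
    return sum(match(pattern, s, compat) for s in database) / len(database)


# Assumed compatibility entries (see the note above).
C = {("d1", "d1"): 0.9, ("d1", "d2"): 0.1,
     ("d2", "d2"): 0.8, ("d2", "d3"): 0.05}

print(match(["d1", "d2"], ["d1", "d2", "d3"], C))
print(match(["d1", "d2"], ["d2", "d3"], C))
```

Under these assumed values the two calls return the products 0.9*0.8 and 0.1*0.05 (up to floating-point rounding).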

Slide 27: Match in a Sequence

Slide 28: Match in D
The match of a pattern in a database D is its match averaged over all sequences in D

Slide 29: Anti-Monotone
If the compatibility matrix is the identity matrix, match = support
Theorem: the match of a pattern P in a symbol sequence S is less than or equal to the match of any subpattern of P in S
Corollary: the match of a pattern P in a sequence database D is less than or equal to the match of any subpattern of P in D
Any support-based algorithm can be used
More patterns match, so an efficient solution is required
  Sample-based algorithms
  Border collapsing of ambiguous patterns

Slide 30: Chernoff Bound
Given sample size n, sample mean μ, and the range R of the data, the population mean is $\mu \pm \varepsilon$ with probability $1 - \delta$ (almost certain), where $\varepsilon = \sqrt{R^2 \ln(1/\delta) / (2n)}$
Can the estimate be replaced by a normal approximation, by the law of large numbers?
  The Chernoff bound is distribution free
  It is more conservative
Sample size: chosen to fit in memory
Restricted spread: for pattern $P = p_1 p_2 \ldots p_L$, $R = \min(\mathrm{match}[p_i])$ for all $1 \le i \le L$
Classification by sample match: above min_match + ε, frequent patterns; below min_match - ε, infrequent patterns; in between, ambiguous patterns
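The bound and the three-way classification can be sketched as below. The function names, the sample size, the confidence level, and the threshold values are hypothetical and chosen only to show the arithmetic; `spread` plays the role of R.

```python
import math


def chernoff_epsilon(spread: float, n: int, delta: float) -> float:
    """Half-width eps such that the population mean lies within
    sample mean +/- eps with probability 1 - delta."""
    return math.sqrt(spread ** 2 * math.log(1.0 / delta) / (2.0 * n))


def classify(sample_match: float, min_match: float, epsilon: float) -> str:
    """Label a pattern from its match in the sample."""
    if sample_match > min_match + epsilon:
        return "frequent"
    if sample_match < min_match - epsilon:
        return "infrequent"
    return "ambiguous"   # needs verification against the full database


# Hypothetical setting: spread R = 1, 10,000 sampled sequences, delta = 0.0001.
eps = chernoff_epsilon(spread=1.0, n=10_000, delta=1e-4)
print(round(eps, 4))
print(classify(0.10, min_match=0.05, epsilon=eps))
print(classify(0.06, min_match=0.05, epsilon=eps))
```

A smaller restricted spread R shrinks ε linearly, which narrows the band of ambiguous patterns without enlarging the sample; quadrupling the sample size only halves ε.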

Slide 31: Algorithm
1. Scan DB: $O(N \cdot L_s \cdot m)$
     Find the match of each individual symbol
     Take a random sample of sequences
     (N: number of sequences; $L_s$: average sequence length; m: number of symbols)
2. Identify borders that embrace the set of ambiguous patterns: $O(m^{L_p} \cdot |S| \cdot L_p \cdot n)$
     min_match ± ε
     Existing methods for association rule mining
     ($L_p$: length of the largest pattern; |S|: average length of a sample sequence; n: number of samples)
3. Locate the border of frequent patterns via border collapsing

Slide 32: Border Collapsing
If memory cannot hold the counters for all ambiguous patterns
  Probe-and-collapse: binary search
  Probe patterns with the highest collapsing power until memory is filled
If memory can hold all patterns up to the 1/x layer, the space of ambiguous patterns can be narrowed to at least 1/x of the original one, where x is a power of 2
If a level-wise search takes y scans of the DB, only $O(\log_x y)$ scans are necessary when the border collapsing technique is employed

Slide 33: Border Collapsing

Slide 34: Episodes and Episode Pattern Mining
Other methods for specifying the kinds of patterns
  Serial episodes: A -> B
  Parallel episodes: A & B
  Regular expressions: (A | B)C
Methods for episode pattern mining
  First find all frequent serial and parallel episodes
  Combine frequent serial and parallel episodes to derive general episodes or regular expressions
Discovery of Frequent Episodes in Event Sequences, Mannila et al., Data Mining and Knowledge Discovery, 1, 1997

Slide 35: Periodicity Analysis
Periodicity is everywhere: tides, seasons, daily power consumption, etc.
Full periodicity
  Every point in time contributes (precisely or approximately) to the periodicity
Partial periodicity: a more general notion
  Only some segments contribute to the periodicity
  Jim reads the NY Times 7:00-7:30 am every weekday
Cyclic association rules
  Associations which form cycles
Methods
  Full periodicity: FFT, other statistical analysis methods
  Partial and cyclic periodicity: variations of Apriori-like mining methods

Slide 36: Periodic Pattern
Full periodic pattern: ABC ABC ABC
Partial periodic pattern: ABC ADC ACC ABC
Pattern hierarchy: ABC ABC ABC DE DE DE DE ABC ABC ABC DE DE DE DE ABC ABC ABC DE DE DE DE

Slide 37: Periodic Pattern
Recent achievements
  Partial periodic pattern
  Asynchronous periodic pattern
  Meta pattern
  InfoMiner / InfoMiner+ / STAMP

Slide 38: Constraint-Based Seq. Pattern Mining
Constraint-based sequential pattern mining
  Constraints: user-specified, for focused mining of desired patterns
  How to explore efficient mining with constraints? Optimization
Classification of constraints
  Anti-monotone: e.g., value_sum(S) 10
  Monotone: e.g., count (S) > 5, S {PC, digital_camera}
  Succinct: e.g., length(S) 10, S {Pentium, MS/Office, MS/Money}
  Convertible: e.g., value_avg(S) 160, max(S)/avg(S) 5
  Inconvertible: e.g., avg(S) – median(S) = 0

Slide 39: From Sequential Patterns to Structured Patterns
Sets, sequences, trees, graphs, and other structures
  Transaction DB: sets of items {{i_1, i_2, …, i_m}, …}
  Sets of sequences
  Sets of trees: {t_1, t_2, …, t_n}
  Sets of graphs (mining for frequent subgraphs): {g_1, g_2, …, g_n}
Mining structured patterns in XML documents, bio-molecule structures, etc.

Slide 40: References: Sequential Pattern Mining Methods
R. Agrawal and R. Srikant. Mining sequential patterns. ICDE'95, 3-14, Taipei, Taiwan.
R. Srikant and R. Agrawal. Mining sequential patterns: Generalizations and performance improvements. EDBT'96.
J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M.-C. Hsu. FreeSpan: Frequent Pattern-Projected Sequential Pattern Mining. Proc. Int. Conf. on Knowledge Discovery and Data Mining (KDD'00), Boston, MA, August 2000.
H. Mannila, H. Toivonen, and A. I. Verkamo. Discovery of frequent episodes in event sequences. Data Mining and Knowledge Discovery, 1, 1997.
J. Pei, J. Han, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu. PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth. Proc. Int. Conf. on Data Engineering (ICDE'01), Heidelberg, Germany, April 2001.

Slide 41: References: Sequential Pattern Mining Methods
B. Ozden, S. Ramaswamy, and A. Silberschatz. Cyclic association rules. ICDE'98, Orlando, FL.
S. Ramaswamy, S. Mahajan, and A. Silberschatz. On the discovery of interesting patterns in association rules. VLDB'98, New York, NY.
M. J. Zaki. Efficient enumeration of frequent sequences. CIKM'98, November 1998.
M. N. Garofalakis, R. Rastogi, and K. Shim. SPIRIT: Sequential Pattern Mining with Regular Expression Constraints. VLDB 1999, Edinburgh, Scotland.
Wei Wang, Jiong Yang, and Philip S. Yu. Mining Patterns in Long Sequential Data with Noise. SIGKDD Explorations 2(2), 2000.
Jiong Yang, Wei Wang, Philip S. Yu, and Jiawei Han. Mining Long Sequential Patterns in a Noisy Environment. SIGMOD Conference 2002.

Slide 42: References: Periodic Pattern Mining Methods
Jiawei Han, Wan Gong, and Yiwen Yin. Mining Segment-Wise Periodic Patterns in Time-Related Databases. KDD 1998.
Jiawei Han, Guozhu Dong, and Yiwen Yin. Efficient Mining of Partial Periodic Patterns in Time Series Database. ICDE 1999.
Jiong Yang, Wei Wang, and Philip S. Yu. Mining asynchronous periodic patterns in time series data. KDD 2000.
Wei Wang, Jiong Yang, and Philip S. Yu. Meta-patterns: Revealing Hidden Periodic Patterns. ICDM 2001.
Jiong Yang, Wei Wang, and Philip S. Yu. Infominer: mining surprising periodic patterns. KDD 2001.