Intelligent Database Systems Lab Advisor : Dr.Hsu Graduate : Keng-Wei Chang Author : Salvatore Orlando Raffaele Perego Claudio Silvestri 國立雲林科技大學 National.

Presentation transcript:

Intelligent Database Systems Lab Advisor : Dr.Hsu Graduate : Keng-Wei Chang Author : Salvatore Orlando Raffaele Perego Claudio Silvestri 國立雲林科技大學 National Yunlin University of Science and Technology A new algorithm for gap constrained sequence mining 2004 ACM Symposium on Applied Computing

Intelligent Database Systems Lab Outline Motivation Objective Introduction Sequential Patterns Mining The CCSM Algorithm Experimental Evaluation Conclusions N.Y.U.S.T. I.M.

Motivation
- Sequence mining: finding frequent sequential patterns in a database of time-stamped events.
- Patterns can be constrained by the temporal gap between the occurring events.
- However, pushing such a constraint down into the mining process is critical for most sequence mining algorithms.

Objective
- Describe CCSM (Cache-based Constrained Sequence Miner):
  - a new level-wise algorithm that overcomes the difficulties usually associated with this kind of constraint;
  - it intersects idlists to compute the support of candidate sequences;
  - it uses an effective cache that stores intermediate idlists for future reuse.

Introduction
- The problem of mining frequent sequential patterns was introduced by Agrawal and Srikant; GSP is their algorithm for it.
- Frequent Pattern (FP) mining vs. Frequent Sequential Pattern (FSP) mining:
  - FP support counts the transactions in which an itemset occurs; FSP support counts the input sequences that contain the pattern as a subsequence.

Introduction (continued)
- FP vs. FSP:
  - FP: intra-transaction patterns
  - FSP: inter-transaction sequential patterns
- FSP algorithms:
  1. support is computed in a count-based or an intersection-based way;
  2. GSP uses count-based support with a level-wise visit.

Sequential Patterns Mining
- Problem statement
- Apriori property and constraints
- Contiguous sequences
- Constraints enforcement

Problem statement
DEFINITION 1. (Sequence of Events)
- Let $I = \{i_1, i_2, \ldots, i_m\}$ be a set of m distinct items.
- An event (itemset) is a non-empty subset of $I$.
- A sequence is a temporally ordered list of events, $\alpha = \langle \alpha_1 \rightarrow \alpha_2 \rightarrow \cdots \rightarrow \alpha_n \rangle$.
- The length $|\alpha|$ of a sequence $\alpha$ is the number of items it contains; a sequence of length k is called a k-sequence.

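Problem statement
DEFINITION 2. (Subsequence)
- A sequence $\alpha = \langle \alpha_1 \rightarrow \cdots \rightarrow \alpha_n \rangle$ is contained in $\beta = \langle \beta_1 \rightarrow \cdots \rightarrow \beta_m \rangle$ (denoted as $\alpha \preceq \beta$) if there exist integers $1 \le i_1 < i_2 < \cdots < i_n \le m$ such that $\alpha_1 \subseteq \beta_{i_1}, \ldots, \alpha_n \subseteq \beta_{i_n}$.
DEFINITION 3. (Database)
- A temporal database $D$ is a collection of input sequences.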
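The containment test of Definition 2 can be sketched in a few lines of Python. This is only an illustration, not code from the presentation: events are modelled as frozensets of item names, and the function name `is_contained` is an assumption.

```python
from typing import FrozenSet, Sequence

Event = FrozenSet[str]


def is_contained(alpha: Sequence[Event], beta: Sequence[Event]) -> bool:
    """Return True if alpha is a subsequence of beta (no gap constraints)."""
    j = 0  # index of the next event of alpha still to be matched
    for event in beta:
        # Greedy matching is safe here: without gap constraints, taking the
        # earliest event of beta that covers alpha[j] never blocks a match.
        if j < len(alpha) and alpha[j] <= event:
            j += 1
    return j == len(alpha)


beta = [frozenset("AB"), frozenset("B"), frozenset("CD")]
print(is_contained([frozenset("A"), frozenset("C")], beta))  # True
print(is_contained([frozenset("C"), frozenset("A")], beta))  # False
```

The second call fails because the order of events matters: C never appears before A in the input sequence.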

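Problem statement
DEFINITION 4. (Gap constrained occurrence of a sequence)
- Let $\beta = \langle \beta_1 \rightarrow \cdots \rightarrow \beta_m \rangle$ be an input sequence whose events are time-stamped with $t_1 < t_2 < \cdots < t_m$.
- The gap between two consecutive events $\beta_i$ and $\beta_{i+1}$ is thus defined as $t_{i+1} - t_i$.
- $\alpha = \langle \alpha_1 \rightarrow \cdots \rightarrow \alpha_n \rangle$ occurs in $\beta$ under max and min gap constraints, denoted as $\alpha \preceq_c \beta$, if there exist integers $1 \le i_1 < \cdots < i_n \le m$ such that $\alpha_j \subseteq \beta_{i_j}$ for $1 \le j \le n$, and $min\_gap \le t_{i_{j+1}} - t_{i_j} \le max\_gap$ for $1 \le j < n$.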
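A gap-constrained occurrence check can be sketched as follows. This is a minimal illustration under stated assumptions: the input sequence is a list of (timestamp, itemset) pairs sorted by timestamp, both gap bounds are treated as inclusive, and the name `occurs_with_gaps` is hypothetical. Unlike the unconstrained case, greedy matching is not enough, so the sketch backtracks over the possible positions.

```python
from typing import FrozenSet, List, Sequence, Tuple

TimedEvent = Tuple[int, FrozenSet[str]]  # (timestamp, items of the event)


def occurs_with_gaps(alpha: Sequence[FrozenSet[str]], beta: List[TimedEvent],
                     min_gap: int, max_gap: int) -> bool:
    """True if alpha occurs in beta with every gap in [min_gap, max_gap]."""

    def match(j: int, prev: int) -> bool:
        # Try to match alpha[j:] given that alpha[j-1] was matched at beta[prev].
        if j == len(alpha):
            return True
        for i in range(prev + 1, len(beta)):
            gap = beta[i][0] - beta[prev][0]
            if gap > max_gap:
                break  # beta is sorted by time, so later events are too far
            if gap >= min_gap and alpha[j] <= beta[i][1] and match(j + 1, i):
                return True
        return False

    if not alpha:
        return True
    return any(alpha[0] <= beta[i][1] and match(1, i) for i in range(len(beta)))


beta = [(1, frozenset("A")), (2, frozenset("B")),
        (5, frozenset("A")), (6, frozenset("C"))]
a, b, c = frozenset("A"), frozenset("B"), frozenset("C")
print(occurs_with_gaps([a, c], beta, 1, 2))     # True: A at t=5, C at t=6
print(occurs_with_gaps([a, b, c], beta, 1, 2))  # False: B at t=2, C at t=6
print(occurs_with_gaps([a, b, c], beta, 1, 4))  # True with a looser max_gap
```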

Problem statement
DEFINITION 5. (Support and Constraints)
- The support of a sequential pattern $\alpha$, denoted as $\sigma(\alpha)$, is the number of distinct input sequences $\beta$ such that $\alpha \preceq \beta$.
- If a max/min gap constraint is specified, the "occurrence" is the gap constrained one: $\alpha \preceq_c \beta$.
DEFINITION 6. (Sequential pattern mining)
- Given a sequential database and a positive integer min_sup (a user-specified threshold),
- the sequential mining problem is finding all patterns $\alpha$, along with their corresponding supports, such that $\sigma(\alpha) \ge$ min_sup.
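Definitions 5 and 6 can be illustrated with a brute-force sketch over a toy database. The names `contains`, `support` and `frequent_patterns` are assumptions made for this example; the occurrence test is passed as a parameter so that a gap-constrained test could be substituted. A real miner would generate candidates level by level instead of receiving them as input.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple

Pattern = Tuple[FrozenSet[str], ...]


def contains(alpha: Pattern, beta: Pattern) -> bool:
    """Plain subsequence containment (no gap constraints)."""
    it = iter(beta)  # shared iterator, so matches respect the event order
    return all(any(a <= b for b in it) for a in alpha)


def support(alpha: Pattern, database: List[Pattern],
            occurs: Callable[[Pattern, Pattern], bool] = contains) -> int:
    """Number of distinct input sequences in which alpha occurs."""
    return sum(1 for beta in database if occurs(alpha, beta))


def frequent_patterns(candidates: Iterable[Pattern], database: List[Pattern],
                      min_sup: int) -> Dict[Pattern, int]:
    """Keep the candidates whose support reaches min_sup."""
    result = {}
    for alpha in candidates:
        s = support(alpha, database)
        if s >= min_sup:
            result[alpha] = s
    return result


A, B, C = frozenset("A"), frozenset("B"), frozenset("C")
database = [(A, B, C), (A, C), (B, A)]
print(frequent_patterns([(A, C), (A, B), (B,)], database, min_sup=2))
```

On this toy database the patterns A→C and B each have support 2, while A→B has support 1 and is discarded.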

Apriori property and constraints
- Apriori property: all the subsequences of a frequent sequence are frequent.
- An FSP constraint C is anti-monotone if and only if, for any sequence β satisfying C, all the subsequences α of β satisfy C as well.
- The constraint on min gap is anti-monotone,
- but the constraint on max gap is not anti-monotone.
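A small counterexample shows why max gap is not anti-monotone. The sketch below is illustrative only: events are simplified to single items, the input sequence is a list of (timestamp, item) pairs sorted by time, and the helper name `occurs` is an assumption.

```python
from typing import List, Sequence, Tuple


def occurs(pattern: Sequence[str], seq: List[Tuple[int, str]], max_gap: int) -> bool:
    """True if pattern occurs in seq with consecutive gaps of at most max_gap."""

    def match(j: int, prev: int) -> bool:
        if j == len(pattern):
            return True
        for i in range(prev + 1, len(seq)):
            if seq[i][0] - seq[prev][0] > max_gap:
                break  # seq is sorted by time
            if seq[i][1] == pattern[j] and match(j + 1, i):
                return True
        return False

    return any(seq[i][1] == pattern[0] and match(1, i) for i in range(len(seq)))


seq = [(1, "A"), (2, "B"), (3, "C")]
print(occurs(["A", "B", "C"], seq, max_gap=1))  # True: both gaps are 1
print(occurs(["A", "C"], seq, max_gap=1))       # False: the gap from A to C is 2
```

The sequence A→B→C satisfies max_gap = 1, yet its subsequence A→C does not, because removing the middle event widens the gap.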

Contiguous sequences
DEFINITION 7. (Contiguous subsequence)
- Given a sequence $\beta = \langle \beta_1 \rightarrow \cdots \rightarrow \beta_n \rangle$ and a subsequence $\alpha$,
- α is a contiguous subsequence of β if one of the following holds:
  1. α is obtained from β by dropping an item from either $\beta_1$ or $\beta_n$;
  2. α is obtained from β by dropping an item from $\beta_i$, where $|\beta_i| \ge 2$;
  3. α is a contiguous subsequence of $\alpha'$, and $\alpha'$ is a contiguous subsequence of β.
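The three rules of Definition 7 can be turned into a short enumeration sketch. It assumes sequences are tuples of frozensets and that an event left empty by a drop disappears from the sequence; the function names are hypothetical.

```python
from typing import FrozenSet, Set, Tuple

Seq = Tuple[FrozenSet[str], ...]


def drop_one_item(beta: Seq) -> Set[Seq]:
    """Sequences obtained from beta by a single drop (rules 1 and 2)."""
    result = set()
    last = len(beta) - 1
    for i, event in enumerate(beta):
        # Rule 1: any item of the first or last event may be dropped.
        # Rule 2: an item of another event may be dropped only if the
        # event has at least two items.
        if i in (0, last) or len(event) >= 2:
            for item in event:
                rest = event - {item}
                new = beta[:i] + ((rest,) if rest else ()) + beta[i + 1:]
                if new:
                    result.add(new)
    return result


def contiguous_subsequences(beta: Seq) -> Set[Seq]:
    """Closure of single drops (rule 3), excluding beta itself."""
    seen: Set[Seq] = set()
    frontier = {beta}
    while frontier:
        frontier = {g for s in frontier for g in drop_one_item(s)} - seen
        seen |= frontier
    return seen


A, B, C = frozenset("A"), frozenset("B"), frozenset("C")
subs = contiguous_subsequences((A, B, C))
print((A, B) in subs, (B, C) in subs)  # True True
print((A, C) in subs)                  # False: B cannot be dropped from the middle
```

The sequence A→C is an ordinary subsequence of A→B→C but not a contiguous one, which is exactly the case that breaks anti-monotonicity of max gap.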

Contiguous sequences
LEMMA 8.
- If we use the concept of contiguous subsequence, the max gap constraint becomes anti-monotone as well.
- So, if β is a frequent sequential pattern that satisfies the max_gap constraint, then every contiguous subsequence α of β is frequent as well.

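Contiguous sequences
DEFINITION 9. (Prefix/Suffix subsequence)
- Given a sequence α of length k = |α|, let (k-1)-prefix(α) (respectively (k-1)-suffix(α)) be the sequence obtained from α by removing the last item of its last event (respectively the first item of its first event).
- The first and last items of an event are identified without ambiguity, due to the order of items within events.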
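A minimal sketch of the prefix and suffix operations follows. It rests on an assumption about the definition, namely that the (k-1)-prefix drops the last item of the last event and the (k-1)-suffix drops the first item of the first event. Events are modelled as tuples of items kept in sorted order, and the function names are hypothetical.

```python
from typing import Tuple

Event = Tuple[str, ...]    # items of one event, kept in sorted order
Seq = Tuple[Event, ...]


def prefix(alpha: Seq) -> Seq:
    """(k-1)-prefix: alpha without the last item of its last event."""
    *head, last = alpha
    last = last[:-1]
    return tuple(head) + ((last,) if last else ())


def suffix(alpha: Seq) -> Seq:
    """(k-1)-suffix: alpha without the first item of its first event."""
    first, *tail = alpha
    first = first[1:]
    return ((first,) if first else ()) + tuple(tail)


alpha = (("A", "B"), ("C",))  # a 3-sequence: event {A, B} followed by event {C}
print(prefix(alpha))  # (('A', 'B'),)
print(suffix(alpha))  # (('B',), ('C',))
```

Keeping the items of each event in a fixed order is what makes "first item" and "last item" well defined.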

Constraints enforcement
- We generate a candidate k-sequence α from a pair of frequent (k-1)-sequences
- that share with α either a (k-2)-prefix or a (k-2)-suffix.

The CCSM Algorithm
- CCSM starts with a count-based phase that extracts $F_1$ and $F_2$.
- Then the intersection-based phase can start:
  - candidate generation
  - idlist intersection
  - idlist caching

Candidate generation (example)

Idlist intersection (example)
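The transcript does not preserve the example from this slide, so the following is only a generic sketch of how support can be computed by intersecting idlists, in the style of vertical miners; it is not taken from CCSM itself. It assumes an idlist maps each sequence id to the sorted timestamps at which an occurrence of the pattern ends, that gap bounds are inclusive, and that the pattern is extended by a new single-item event. The names `temporal_join` and `Idlist` are hypothetical.

```python
from typing import Dict, List

Idlist = Dict[int, List[int]]  # sequence id -> end timestamps of occurrences


def temporal_join(prefix: Idlist, item: Idlist,
                  min_gap: int, max_gap: int) -> Idlist:
    """Idlist of 'prefix followed by item' under the gap constraints."""
    joined: Idlist = {}
    for sid in sorted(prefix.keys() & item.keys()):
        # Keep an occurrence of the item if some occurrence of the prefix
        # ends within the allowed gap before it.
        ends = [t for t in item[sid]
                if any(min_gap <= t - p <= max_gap for p in prefix[sid])]
        if ends:
            joined[sid] = ends
    return joined


idlist_a = {1: [1, 5], 2: [3]}
idlist_b = {1: [2, 9], 2: [1], 3: [4]}
idlist_ab = temporal_join(idlist_a, idlist_b, min_gap=1, max_gap=2)
print(idlist_ab)       # {1: [2]}
print(len(idlist_ab))  # support of A -> B: 1
```

The support of the extended pattern is simply the number of sequence ids left in the joined idlist, so no further scan of the database is needed.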

Idlist caching (example)
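The idea of storing intermediate idlists for reuse can be sketched with a plain dictionary keyed by pattern. This is an illustration under assumptions, not CCSM's actual cache: patterns are tuples of single items, the cache never evicts anything, and `IdlistCache`, `temporal_join` and the `joins` counter are hypothetical names.

```python
from typing import Dict, List, Tuple

Idlist = Dict[int, List[int]]  # sequence id -> end timestamps of occurrences


def temporal_join(prefix: Idlist, item: Idlist, min_gap: int, max_gap: int) -> Idlist:
    joined: Idlist = {}
    for sid in sorted(prefix.keys() & item.keys()):
        ends = [t for t in item[sid]
                if any(min_gap <= t - p <= max_gap for p in prefix[sid])]
        if ends:
            joined[sid] = ends
    return joined


class IdlistCache:
    """Memoizes the idlists of prefixes so later candidates can reuse them."""

    def __init__(self, item_idlists: Dict[str, Idlist], min_gap: int, max_gap: int):
        self.min_gap, self.max_gap = min_gap, max_gap
        self.cache: Dict[Tuple[str, ...], Idlist] = {
            (item,): idl for item, idl in item_idlists.items()}
        self.joins = 0  # number of joins actually performed

    def idlist(self, pattern: Tuple[str, ...]) -> Idlist:
        if pattern not in self.cache:
            if len(pattern) <= 1:
                raise KeyError(pattern)  # unknown item
            prefix = self.idlist(pattern[:-1])  # reused if already cached
            last = self.cache[(pattern[-1],)]
            self.joins += 1
            self.cache[pattern] = temporal_join(prefix, last, self.min_gap, self.max_gap)
        return self.cache[pattern]


items = {"A": {1: [1]}, "B": {1: [2]}, "C": {1: [3]}, "D": {1: [3]}}
cache = IdlistCache(items, min_gap=1, max_gap=1)
cache.idlist(("A", "B", "C"))  # two joins: A->B, then (A->B)->C
cache.idlist(("A", "B", "D"))  # one more join: the idlist of A->B is reused
print(cache.joins)  # 3
```

Because candidates at the same level often share a prefix, caching the prefix idlist saves one join per additional candidate in this sketch.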

Experimental Evaluation
- Platform: a Linux box equipped with a 450 MHz Pentium II processor, 512 MB of RAM and an IDE hard disk.
- The datasets used were CS11 and CS21, two synthetic datasets generated with the publicly available IBM Quest dataset generator.

dataset   customers   avg. transactions   avg. length
CS11      100,
CS21      100,000     20                  5

Experimental Evaluation: different values of the max_gap constraint

Experimental Evaluation: execution times of CCSM and cSPADE

Conclusions
- CCSM is a new FSP algorithm that combines:
  - a level-wise visit with intersection-based support counting;
  - an idlist cache.

Personal opinion

Review
- FP vs. FSP
- Problem statement
- CCSM
  - First: count-based phase
  - Second: intersection-based phase
    - candidate generation
    - idlist intersection
    - idlist caching