Intelligent Database Systems Lab. Advisor: Dr. Hsu. Graduate: Keng-Wei Chang. Authors: Salvatore Orlando, Raffaele Perego, Claudio Silvestri. 國立雲林科技大學 National Yunlin University of Science and Technology.


1 Intelligent Database Systems Lab
國立雲林科技大學 National Yunlin University of Science and Technology
A new algorithm for gap constrained sequence mining
2004 ACM Symposium on Applied Computing
Advisor: Dr. Hsu
Graduate: Keng-Wei Chang
Authors: Salvatore Orlando, Raffaele Perego, Claudio Silvestri

2 Outline
- Motivation
- Objective
- Introduction
- Sequential Patterns Mining
- The CCSM Algorithm
- Experimental Evaluation
- Conclusions
N.Y.U.S.T. I.M.

3 Motivation
- Sequence mining: finding frequent sequential patterns in a database of time-stamped events.
- Users often need to constrain the temporal gap between the events of a pattern.
- However, pushing such a constraint down into the mining process is troublesome for most sequence mining algorithms.

4 Objective
- Describe CCSM (Cache-based Constrained Sequence Miner), a new level-wise algorithm that overcomes the troubles usually related to this kind of constraint.
- CCSM intersects idlists to compute the support of candidate sequences.
- It uses an effective cache that stores intermediate idlists for future reuse.

5 Introduction
- The problem of mining frequent sequential patterns was introduced by Agrawal and Srikant, who also proposed the GSP algorithm.
- Frequent Sequential Patterns (FSP) vs. Frequent Patterns (FP):
  - FP support counts the transactions in which a pattern occurs; FSP support counts the input sequences that contain a subsequence.

6 Introduction
- FP vs. FSP:
  - FP: intra-transaction patterns
  - FSP: inter-transaction sequential patterns
- FSP algorithms:
  1. compute support either by counting occurrences (count-based) or by intersecting idlists (intersection-based);
  2. GSP uses count-based support counting together with a level-wise visit of the search space.

7 Sequential Patterns Mining
- Problem statement
- Apriori property and constraints
- Contiguous sequences
- Constraints enforcement

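8 Problem statement
DEFINITION 1. (Sequence of Events)
- Let $I = \{i_1, i_2, \dots, i_m\}$ be a set of m distinct items.
- An event (itemset) is a non-empty subset of $I$.
- A sequence is a temporally ordered list of events, $\alpha = \langle \alpha_1, \alpha_2, \dots, \alpha_j \rangle$, where each $\alpha_i$ is an event.
- The length $|\alpha|$ of a sequence $\alpha$ is the number of items it contains, $|\alpha| = \sum_i |\alpha_i|$; a sequence of length k is called a k-sequence.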
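To make Definition 1 concrete, here is a minimal Python sketch. The encoding (an event as a frozenset of item names, a sequence as a list of events) and the helpers `make_sequence` and `length` are assumptions for illustration, not the paper's data structures.

```python
from typing import FrozenSet, List

# Hypothetical encoding (not from the paper): an event is a frozenset of
# item names and a sequence is a temporally ordered list of events.
Event = FrozenSet[str]
Sequence = List[Event]


def make_sequence(*events: str) -> Sequence:
    """Build a sequence from strings, one per event: "AB" becomes {A, B}."""
    if any(not e for e in events):
        raise ValueError("an event must be a non-empty itemset")
    return [frozenset(e) for e in events]


def length(seq: Sequence) -> int:
    """|alpha|: the number of items summed over all events."""
    return sum(len(event) for event in seq)


alpha = make_sequence("AB", "C", "D")  # <(A B), (C), (D)>
print(length(alpha))  # 4, so alpha is a 4-sequence
```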

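9 Problem statement
DEFINITION 2. (Subsequence)
- $\alpha = \langle \alpha_1, \dots, \alpha_n \rangle$ is contained in $\beta = \langle \beta_1, \dots, \beta_m \rangle$ (denoted as $\alpha \sqsubseteq \beta$) if there exist integers $1 \le j_1 < j_2 < \dots < j_n \le m$ such that $\alpha_1 \subseteq \beta_{j_1}, \alpha_2 \subseteq \beta_{j_2}, \dots, \alpha_n \subseteq \beta_{j_n}$.
DEFINITION 3. (Database)
- A temporal database $D$ is a collection of input sequences.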
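Definition 2 can be checked with a greedy scan: matching each event of α to the earliest event of β that still contains it never rules out a later match. A sketch under the same hypothetical encoding (events as frozensets); `is_subsequence` and `seq` are illustrative names.

```python
from typing import FrozenSet, List

Sequence = List[FrozenSet[str]]


def is_subsequence(alpha: Sequence, beta: Sequence) -> bool:
    """True if alpha is contained in beta: each event of alpha is a subset of a
    distinct, strictly later event of beta (indices j_1 < j_2 < ... < j_n)."""
    j = 0
    for event in alpha:
        # Greedily match the earliest event of beta that still contains it;
        # taking the leftmost match never rules out later matches.
        while j < len(beta) and not event <= beta[j]:
            j += 1
        if j == len(beta):
            return False
        j += 1  # the next event of alpha must map to a later event of beta
    return True


def seq(*events: str) -> Sequence:
    return [frozenset(e) for e in events]


beta = seq("AB", "C", "DE")
print(is_subsequence(seq("A", "E"), beta))  # True
print(is_subsequence(seq("AC"), beta))      # False: A and C are in different events
```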

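10 Problem statement
DEFINITION 4. (Gap constrained occurrence of a sequence)
- Let $\beta = \langle \beta_1, \dots, \beta_m \rangle$ be an input sequence, where each event $\beta_i$ is time-stamped with $t(\beta_i)$.
- The gap between two consecutive matched events $\beta_{j_k}$ and $\beta_{j_{k+1}}$ is thus defined as $t(\beta_{j_{k+1}}) - t(\beta_{j_k})$.
- $\alpha = \langle \alpha_1, \dots, \alpha_n \rangle$ occurs in $\beta$ under max and min gap constraints, denoted as $\alpha \sqsubseteq_c \beta$, if there exist integers $1 \le j_1 < \dots < j_n \le m$ such that $\alpha_k \subseteq \beta_{j_k}$ for $1 \le k \le n$, and $\mathit{min\_gap} \le t(\beta_{j_{k+1}}) - t(\beta_{j_k}) \le \mathit{max\_gap}$ for $1 \le k < n$.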
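With gap constraints the greedy scan is no longer safe: the earliest match of an event may be too far from the next one under max_gap, while a later match would work. The sketch below (hypothetical helper `occurs_with_gaps`, integer timestamps assumed strictly increasing) therefore keeps every position where a constrained partial occurrence can end.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical encoding: an input sequence is a list of (timestamp, event)
# pairs with strictly increasing timestamps; a pattern is a list of events.
TimedSequence = List[Tuple[int, FrozenSet[str]]]
Pattern = List[FrozenSet[str]]


def occurs_with_gaps(alpha: Pattern, beta: TimedSequence,
                     min_gap: int, max_gap: int) -> bool:
    """True if alpha occurs in beta with every gap between consecutive
    matched events inside [min_gap, max_gap]."""
    if not alpha:
        return True
    # ends: indices of beta where some constrained occurrence of the events
    # matched so far can end. All of them are kept, not just the leftmost,
    # because a max_gap makes greedy matching unsafe.
    ends = [j for j, (_, ev) in enumerate(beta) if alpha[0] <= ev]
    for event in alpha[1:]:
        ends = [
            j for j, (t, ev) in enumerate(beta)
            if event <= ev and any(
                e < j and min_gap <= t - beta[e][0] <= max_gap for e in ends)
        ]
        if not ends:
            return False
    return bool(ends)


beta = [(1, frozenset("A")), (2, frozenset("B")), (5, frozenset("C"))]
print(occurs_with_gaps([frozenset("A"), frozenset("C")], beta, 1, 4))  # True: gap 4
print(occurs_with_gaps([frozenset("A"), frozenset("C")], beta, 1, 3))  # False: gap 4 > 3
```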

11 Problem statement
DEFINITION 5. (Support and Constraints)
- The support of a sequence pattern α, denoted as σ(α), is the number of distinct input sequences β ∈ D such that α ⊑ β, i.e. $\sigma(\alpha) = |\{\beta \in D : \alpha \sqsubseteq \beta\}|$.
- If a max/min gap constraint is enforced, the "occurrence" is the gap constrained one, $\alpha \sqsubseteq_c \beta$.
DEFINITION 6. (Sequential pattern mining)
- Given a sequential database D and a positive integer min_sup (a user-specified threshold), the sequential pattern mining problem consists in finding all patterns α, along with their corresponding supports, such that σ(α) >= min_sup.
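A sketch of σ(α) and of the min_sup filter from Definition 6. It assumes a database given as a dict from sequence id to a list of (timestamp, event) pairs, and it uses a brute-force occurrence test over all index tuples, which is exponential and only meant for tiny examples; the defaults min_gap = 0 and max_gap = infinity give the unconstrained case. All names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Tuple

TimedSequence = List[Tuple[int, FrozenSet[str]]]
Pattern = Tuple[FrozenSet[str], ...]


def occurs(alpha: Pattern, beta: TimedSequence,
           min_gap: float = 0, max_gap: float = math.inf) -> bool:
    """Brute-force check of the (gap constrained) occurrence of alpha in beta."""
    for js in combinations(range(len(beta)), len(alpha)):  # j_1 < ... < j_n
        if all(a <= beta[j][1] for a, j in zip(alpha, js)) and all(
                min_gap <= beta[j2][0] - beta[j1][0] <= max_gap
                for j1, j2 in zip(js, js[1:])):
            return True
    return False


def support(alpha: Pattern, database: Dict[int, TimedSequence], **gaps) -> int:
    """sigma(alpha): number of distinct input sequences containing alpha.
    A sequence that contains alpha several times still counts once."""
    return sum(1 for beta in database.values() if occurs(alpha, beta, **gaps))


def frequent(candidates: Iterable[Pattern], database: Dict[int, TimedSequence],
             min_sup: int, **gaps) -> Dict[Pattern, int]:
    """Keep the candidates whose support reaches min_sup."""
    result = {}
    for alpha in candidates:
        s = support(alpha, database, **gaps)
        if s >= min_sup:
            result[alpha] = s
    return result


def P(*events: str) -> Pattern:
    return tuple(frozenset(e) for e in events)


db = {
    1: [(1, frozenset("AB")), (2, frozenset("C")), (4, frozenset("AC"))],
    2: [(1, frozenset("A")), (7, frozenset("C"))],
    3: [(3, frozenset("B"))],
}
print(support(P("A", "C"), db))             # 2: sequence 1 contains A -> C twice, counted once
print(support(P("A", "C"), db, max_gap=3))  # 1: in sequence 2 the gap is 6
print(frequent([P("A", "C"), P("B", "C")], db, min_sup=2))
```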

12 Apriori property and constraints
- Apriori property: all the subsequences of a frequent sequence are frequent.
- An FSP constraint C is anti-monotone if and only if, for any sequence β satisfying C, all the subsequences α of β satisfy C as well.
- The constraint on min gap is anti-monotone, but max gap is not: dropping an inner event from an occurrence merges two gaps into a larger one, which preserves a min gap but may violate a max gap.
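A tiny worked example of the max gap failure, with timestamps made up for illustration (not taken from the paper), writing $\sqsubseteq_c$ for the gap constrained occurrence of Definition 4:

```latex
\begin{aligned}
&\beta = \langle \{A\}, \{B\}, \{C\} \rangle, \quad t(\beta_1) = 1,\ t(\beta_2) = 2,\ t(\beta_3) = 5, \quad \mathit{min\_gap} = 1,\ \mathit{max\_gap} = 3 \\
&\langle \{A\}, \{B\}, \{C\} \rangle \sqsubseteq_c \beta: \quad 1 \le 2 - 1 \le 3 \ \text{and}\ 1 \le 5 - 2 \le 3 \\
&\langle \{A\}, \{C\} \rangle \not\sqsubseteq_c \beta: \quad 5 - 1 = 4 > \mathit{max\_gap}
\end{aligned}
```

The subsequence $\langle \{A\}, \{C\} \rangle$ still satisfies the min gap ($4 \ge 1$): removing the inner event replaces the gaps 1 and 3 by their sum, which can only be larger.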

13 Contiguous sequences
DEFINITION 7. (Contiguous subsequence)
- Given a sequence $\beta = \langle \beta_1, \dots, \beta_n \rangle$ and a subsequence $\alpha \sqsubseteq \beta$, α is a contiguous subsequence of β, denoted as $\alpha \preceq \beta$, if one of the following holds:
  1. α is obtained from β by dropping an item from either $\beta_1$ or $\beta_n$;
  2. α is obtained from β by dropping an item from $\beta_i$, where $1 < i < n$ and $|\beta_i| \ge 2$;
  3. α is a contiguous subsequence of α', and α' is a contiguous subsequence of β.

14 Contiguous sequences
LEMMA 8.
- If we use the concept of contiguous subsequence ($\preceq$), the max gap constraint becomes anti-monotone as well.
- So, if β is a frequent sequential pattern that satisfies the max_gap constraint, then every α with $\alpha \preceq \beta$ is frequent.
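Definition 7 and Lemma 8 can be explored with a small generator of contiguous subsequences. A sketch, assuming events are encoded as tuples of items and that an event emptied by rule 1 is removed from the sequence; `one_step` applies rules 1 and 2, and `contiguous_subsequences` takes their transitive closure (rule 3). Both names are hypothetical.

```python
from typing import Set, Tuple

Seq = Tuple[Tuple[str, ...], ...]  # a sequence of events, each a tuple of items


def one_step(beta: Seq) -> Set[Seq]:
    """Sequences obtained from beta by rule 1 or rule 2 of Definition 7."""
    out = set()
    n = len(beta)
    for i, event in enumerate(beta):
        # Rule 1: any item may be dropped from the first or last event.
        # Rule 2: from an inner event only if the event has at least 2 items.
        if i not in (0, n - 1) and len(event) < 2:
            continue
        for item in event:
            rest = tuple(x for x in event if x != item)
            # An emptied first/last event disappears from the sequence.
            new = beta[:i] + ((rest,) if rest else ()) + beta[i + 1:]
            if new:
                out.add(new)
    return out


def contiguous_subsequences(beta: Seq) -> Set[Seq]:
    """All alpha with alpha contiguous in beta (rule 3 = transitive closure)."""
    seen: Set[Seq] = set()
    frontier = {beta}
    while frontier:
        nxt = set()
        for s in frontier:
            for t in one_step(s):
                if t not in seen:
                    seen.add(t)
                    nxt.add(t)
        frontier = nxt
    return seen


abc = (("A",), ("B",), ("C",))
print(sorted(one_step(abc)))                              # [(('A',), ('B',)), (('B',), ('C',))]
print((("A",), ("C",)) in contiguous_subsequences(abc))  # False
```

For ⟨{A}, {B}, {C}⟩ the middle event has a single item, so ⟨{A}, {C}⟩ is not a contiguous subsequence; Lemma 8 therefore never requires it to satisfy max_gap, which is how contiguity restores anti-monotonicity for the max gap constraint.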

15 Contiguous sequences
DEFINITION 9. (Prefix/Suffix subsequence)
- Given a sequence α of length k = |α|, let (k-1)-prefix(α) ((k-1)-suffix(α)) be the sequence obtained from α by removing the last item of its last event (the first item of its first event).
- The item to remove is identified without ambiguity, because the items within each event are kept in a fixed order.
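A sketch of Definition 9, assuming events are tuples whose items are already in sorted order (so the "first" and "last" item of an event are well defined) and that an emptied event is dropped; `prefix` and `suffix` are hypothetical names.

```python
from typing import Tuple

Seq = Tuple[Tuple[str, ...], ...]  # events are tuples of items in sorted order


def prefix(alpha: Seq) -> Seq:
    """(k-1)-prefix: drop the last item of the last event."""
    *head, last = alpha
    rest = last[:-1]
    return tuple(head) + ((rest,) if rest else ())


def suffix(alpha: Seq) -> Seq:
    """(k-1)-suffix: drop the first item of the first event."""
    first, *tail = alpha
    rest = first[1:]
    return ((rest,) if rest else ()) + tuple(tail)


alpha = (("A", "B"), ("C",), ("D", "E"))
print(prefix(alpha))  # (('A', 'B'), ('C',), ('D',))
print(suffix(alpha))  # (('B',), ('C',), ('D', 'E'))
```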

16 Constraints enforcement
- We generate a candidate k-sequence α from a pair of frequent (k-1)-sequences that share with α either a (k-2)-prefix or a (k-2)-suffix.
[Figure: example]
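One way to realise this step, borrowed from GSP's join and not necessarily CCSM's exact procedure: take two frequent (k-1)-sequences a and b such that dropping the first item of a gives the same sequence as dropping the last item of b; the candidate α is then a extended with the last item of b, so a and b are the (k-1)-prefix and (k-1)-suffix of α. Both are contiguous subsequences of α (rule 1 of Definition 7), so Lemma 8 guarantees they are frequent whenever α is frequent under max_gap. The function names are hypothetical; the sketch assumes k >= 3 (F2 comes from the count-based phase) and omits GSP's further pruning of candidates.

```python
from typing import Iterable, Optional, Set, Tuple

Seq = Tuple[Tuple[str, ...], ...]  # events are tuples of items in sorted order


def prefix(a: Seq) -> Seq:
    """Drop the last item of the last event (an emptied event disappears)."""
    *head, last = a
    return tuple(head) + ((last[:-1],) if len(last) > 1 else ())


def suffix(a: Seq) -> Seq:
    """Drop the first item of the first event (an emptied event disappears)."""
    first, *tail = a
    return ((first[1:],) if len(first) > 1 else ()) + tuple(tail)


def join(a: Seq, b: Seq) -> Optional[Seq]:
    """Candidate whose (k-1)-prefix is a and (k-1)-suffix is b, or None."""
    if suffix(a) != prefix(b):
        return None
    last_event = b[-1]
    item = last_event[-1]
    if len(last_event) == 1:
        return a + ((item,),)           # item forms a new, later event
    return a[:-1] + (a[-1] + (item,),)  # item joins the last event of a


def generate_candidates(frequent_prev: Iterable[Seq]) -> Set[Seq]:
    prev = list(frequent_prev)
    candidates = set()
    for a in prev:
        for b in prev:
            c = join(a, b)
            if c is not None:
                candidates.add(c)
    return candidates


f2 = [(("A",), ("B",)), (("B",), ("C",)), (("A", "B"),)]
print(sorted(generate_candidates(f2)))
```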

17 The CCSM Algorithm
- CCSM starts with a count-based phase that extracts $F_1$ and $F_2$ (the frequent 1-sequences and 2-sequences).
- Then the intersection-based phase can start:
  - Candidate generation
  - Idlist intersection
  - Idlist caching
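A sketch of a count-based pass that extracts F1 and F2 under the gap constraints. It assumes input sequences as lists of (timestamp, event) pairs with strictly increasing timestamps, encodes 2-sequences as tuples of item tuples, and counts each input sequence at most once per pattern; `count_f1_f2` is a hypothetical name, and the slides do not say how CCSM actually implements this phase.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

TimedSequence = List[Tuple[int, FrozenSet[str]]]  # strictly increasing timestamps
Seq = Tuple[Tuple[str, ...], ...]


def count_f1_f2(database: List[TimedSequence], min_sup: int, min_gap: int,
                max_gap: int) -> Tuple[Dict[str, int], Dict[Seq, int]]:
    c1: Counter = Counter()
    c2: Counter = Counter()
    for beta in database:
        seen1, seen2 = set(), set()  # count each input sequence at most once
        for i, (t, event) in enumerate(beta):
            seen1.update(event)
            # 2-sequences made of a single event (x y)
            for x, y in combinations(sorted(event), 2):
                seen2.add(((x, y),))
            # 2-sequences x -> y whose gap respects the constraints
            for t2, later in beta[i + 1:]:
                if min_gap <= t2 - t <= max_gap:
                    for x in event:
                        for y in later:
                            seen2.add(((x,), (y,)))
        c1.update(seen1)
        c2.update(seen2)
    f1 = {item: s for item, s in c1.items() if s >= min_sup}
    f2 = {seq: s for seq, s in c2.items() if s >= min_sup}
    return f1, f2


db = [
    [(1, frozenset("AB")), (3, frozenset("C"))],
    [(1, frozenset("A")), (2, frozenset("C"))],
    [(1, frozenset("B")), (9, frozenset("C"))],
]
f1, f2 = count_f1_f2(db, min_sup=2, min_gap=1, max_gap=5)
print(f2)  # {(('A',), ('C',)): 2}
```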

18 Candidate generation
[Figure: example]

19 Idlist intersection
[Figure: example]
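The slides do not spell out CCSM's idlist format, so the sketch below uses a simpler, cSPADE-style idlist as an assumption: for each sequence id it stores the timestamps at which a gap-constrained occurrence of the pattern ends. Extending a pattern with a new single-item event then becomes a temporal join of two idlists, and the support is the number of sequence ids that survive. `item_idlist`, `temporal_join` and `support` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

TimedSequence = List[Tuple[int, FrozenSet[str]]]
IdList = Dict[int, List[int]]  # sid -> timestamps where an occurrence ends


def item_idlist(database: Dict[int, TimedSequence], item: str) -> IdList:
    """Idlist of the 1-sequence <{item}>: every timestamp where item appears."""
    idl: Dict[int, List[int]] = defaultdict(list)
    for sid, beta in database.items():
        for t, event in beta:
            if item in event:
                idl[sid].append(t)
    return dict(idl)


def temporal_join(prefix: IdList, item: IdList,
                  min_gap: int, max_gap: int) -> IdList:
    """Idlist of 'prefix -> item': keep the item timestamps that follow some
    end of the prefix by a gap in [min_gap, max_gap]."""
    out: IdList = {}
    for sid, ends in prefix.items():
        times = item.get(sid, [])  # sids missing on either side drop out
        new_ends = [t2 for t2 in times
                    if any(t2 > t1 and min_gap <= t2 - t1 <= max_gap
                           for t1 in ends)]
        if new_ends:
            out[sid] = new_ends
    return out


def support(idlist: IdList) -> int:
    return len(idlist)  # number of distinct input sequences


db = {
    0: [(1, frozenset("AB")), (3, frozenset("C"))],
    1: [(1, frozenset("A")), (2, frozenset("C"))],
    2: [(1, frozenset("B")), (9, frozenset("C"))],
}
ac = temporal_join(item_idlist(db, "A"), item_idlist(db, "C"), 1, 5)
print(ac, support(ac))  # {0: [3], 1: [2]} 2
```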

20 Idlist caching
[Figure: example]

21 Idlist caching

22 Idlist caching
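A minimal sketch of the caching idea, under strong simplifying assumptions: patterns are tuples of single-item events, idlists follow a cSPADE-style layout (sequence id to end timestamps), and the cache is an unbounded dict keyed by pattern. When two candidates share a prefix, the prefix idlist is computed once and reused. `IdListCache` and `temporal_join` are hypothetical names; the actual CCSM cache organisation and replacement policy are not described in these slides.

```python
from typing import Dict, List, Tuple

IdList = Dict[int, List[int]]  # sid -> timestamps where an occurrence ends


def temporal_join(prefix: IdList, item: IdList,
                  min_gap: int, max_gap: int) -> IdList:
    out: IdList = {}
    for sid, ends in prefix.items():
        new_ends = [t2 for t2 in item.get(sid, [])
                    if any(t2 > t1 and min_gap <= t2 - t1 <= max_gap
                           for t1 in ends)]
        if new_ends:
            out[sid] = new_ends
    return out


class IdListCache:
    """Hypothetical prefix cache; patterns are tuples of single-item events."""

    def __init__(self, item_idlists: Dict[str, IdList],
                 min_gap: int, max_gap: int):
        self.items = item_idlists
        self.min_gap, self.max_gap = min_gap, max_gap
        self.cache: Dict[Tuple[str, ...], IdList] = {}
        self.hits = self.misses = 0

    def idlist(self, pattern: Tuple[str, ...]) -> IdList:
        if len(pattern) == 1:
            return self.items.get(pattern[0], {})
        if pattern in self.cache:
            self.hits += 1
            return self.cache[pattern]
        self.misses += 1
        prefix = self.idlist(pattern[:-1])  # reuses a cached prefix if present
        result = temporal_join(prefix, self.items.get(pattern[-1], {}),
                               self.min_gap, self.max_gap)
        self.cache[pattern] = result
        return result


items = {"A": {0: [1]}, "B": {0: [2]}, "C": {0: [4]}, "D": {0: [3]}}
cache = IdListCache(items, min_gap=1, max_gap=3)
cache.idlist(("A", "B", "C"))  # computes and caches A->B and A->B->C
cache.idlist(("A", "B", "D"))  # reuses the cached idlist of A->B
print(cache.hits, cache.misses)  # 1 3
```

A real implementation would need to bound the cache's memory, for example by evicting idlists of prefixes that no remaining candidate shares.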

23 Experimental Evaluation
- Linux box equipped with a 450 MHz Pentium II processor, 512 MB of RAM and an IDE hard disk.
- The datasets used were CS11 and CS21, two synthetic datasets generated with the publicly available IBM Quest dataset generator.

Dataset   Customers   Avg. transactions   Avg. length
CS11      100,000     10                  5
CS21      100,000     20                  5

24 Experimental Evaluation
[Figure: Different values of the max_gap constraint]

25 Experimental Evaluation
[Figure: Execution times of CCSM and cSPADE]

26 Experimental Evaluation

27 Conclusions
- CCSM is a new FSP algorithm.
- It combines a level-wise visit with intersection-based support counting.
- It uses a cache of intermediate idlists.

28 Personal opinion

29 Review
- FP vs. FSP
- Problem statement
- CCSM
  - First: count-based phase
  - Second: intersection-based phase
    - Candidate generation
    - Idlist intersection
    - Idlist caching
