
1 Data Mining Techniques: Sequential Patterns

2 Sequential Pattern Mining
Progress in bar-code technology has made it possible for retail organizations to collect and store massive amounts of sales data, referred to as basket data. A record in such data typically consists of the transaction date and the items bought in the transaction. Very often, data records also contain a customer ID, particularly when the purchase has been made using a credit card or a frequent-buyer card. Catalog companies also collect such data using the orders they receive.

3 Sequential Pattern Mining
An example of a sequential pattern is that customers typically rent “Star Wars (星際大戰)”, then “Empire Strikes Back (帝國大反擊)”, and then “Return of the Jedi (絕地大反攻)”.
– These rentals need not be consecutive: customers who rent some other videos in between also support this sequential pattern.
– Elements of a sequential pattern need not be simple items: “Computer Science and Programming Language”, followed by “Data Structure”, followed by “System Programs and Operating Systems” is an example of a sequential pattern in which the elements are sets of items.
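The “need not be consecutive” rule can be checked with a few lines of Python. This is a minimal sketch assuming each customer's rental history is a plain list of titles in rental order; the function `supports` and the rental history are hypothetical illustrations, not part of the slides, and elements that are sets of items are not handled here.

```python
def supports(history, pattern):
    """Return True if the items of `pattern` occur in `history` in the same order,
    with any number of other items allowed in between."""
    remaining = iter(history)
    # `item in remaining` consumes the iterator up to the first match,
    # so each later pattern item must be found after the previous one.
    return all(item in remaining for item in pattern)


trilogy = ["Star Wars", "Empire Strikes Back", "Return of the Jedi"]
# Hypothetical rental history with an unrelated rental in between.
history = ["Star Wars", "Alien", "Empire Strikes Back", "Return of the Jedi"]

print(supports(history, trilogy))        # True: the gap does not matter
print(supports(history[::-1], trilogy))  # False: the order does matter
```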

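4 Sequential Pattern Mining
Given: a database of customer transactions, each recording a Customer ID, a Transaction Time and the Items Bought.
Find: all sequential patterns, i.e., the maximal sequences among all sequences that have a user-specified minimum support.
[Tables: Original Database (Transaction Time, Customer ID, Items Bought); Answer Set]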

5 Definition
The length of a sequence is the number of itemsets in the sequence. A sequence of length k is called a k-sequence. The support for an itemset i is defined as the fraction of customers who bought the items in i in a single transaction. Thus the itemset i and the 1-sequence <i> have the same support. An itemset with minimum support is called a large (frequent) itemset, or litemset.
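As a sketch of these definitions, the Python below computes sequence length and itemset support, with support measured as a fraction of customers. The five customer sequences are illustrative data assumed for this example (the slides' own tables are not reproduced), and `length` and `itemset_support` are hypothetical helper names.

```python
from fractions import Fraction

# Hypothetical customer sequences: each customer's transactions, in time order.
customer_sequences = {
    1: [{30}, {90}],
    2: [{10, 20}, {30}, {40, 60, 70}],
    3: [{30, 50, 70}],
    4: [{30}, {40, 70}, {90}],
    5: [{90}],
}


def length(sequence):
    """Length of a sequence = number of itemsets in it (a k-sequence has k)."""
    return len(sequence)


def itemset_support(itemset, sequences):
    """Fraction of customers who bought all items of `itemset` in one transaction.
    This is also the support of the 1-sequence <itemset>."""
    buyers = sum(any(itemset <= t for t in seq) for seq in sequences.values())
    return Fraction(buyers, len(sequences))


print(length(customer_sequences[2]))                  # 3, so this is a 3-sequence
print(itemset_support({40, 70}, customer_sequences))  # 2/5
print(itemset_support({30, 40}, customer_sequences))  # 0 (never bought together)
```

With 5 customers, {40, 70} has support 2/5 = 40%, so under a 25% minimum support it would be a litemset.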

6 AprioriAll Algorithm
Each itemset in a large sequence must have minimum support, so any large sequence must be a list of litemsets.
All sequential patterns are found in five phases:
– Sort Phase
– Litemset Phase
– Transformation Phase
– Sequence Phase
– Maximal Phase

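7 AprioriAll Algorithm: Sort Phase
The database is sorted with customer ID as the major key and transaction time as the minor key, which converts the original transaction database into a database of customer sequences.
[Table: Customer-Sequence Version of the Database]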
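A minimal Python sketch of the sort phase, assuming each record is a `(customer_id, transaction_time, items_bought)` tuple. The records are hypothetical illustrative data, and `sort_phase` is a name chosen for this example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records (customer_id, transaction_time, items_bought), in arrival order.
records = [
    (2, date(1993, 6, 10), {10, 20}),
    (5, date(1993, 6, 12), {90}),
    (2, date(1993, 6, 15), {30}),
    (2, date(1993, 6, 20), {40, 60, 70}),
    (1, date(1993, 6, 25), {30}),
    (3, date(1993, 6, 25), {30, 50, 70}),
    (4, date(1993, 6, 25), {30}),
    (1, date(1993, 6, 30), {90}),
    (4, date(1993, 6, 30), {40, 70}),
    (4, date(1993, 7, 25), {90}),
]


def sort_phase(records):
    """Sort by customer id (major key) and transaction time (minor key), then
    collect each customer's transactions, in time order, into one sequence."""
    sequences = defaultdict(list)
    # Sort on (id, time) only, so the item sets themselves are never compared.
    for customer_id, _, items in sorted(records, key=lambda r: (r[0], r[1])):
        sequences[customer_id].append(frozenset(items))
    return dict(sequences)


customer_sequences = sort_phase(records)
for customer_id, sequence in customer_sequences.items():
    print(customer_id, [sorted(t) for t in sequence])
```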

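8 AprioriAll Algorithm: Litemset Phase
The set of all litemsets is found with a frequent-itemset algorithm such as Apriori, DHP or FP-Growth (here with min_sup_count = 2). Unlike in association-rule mining, support is counted per customer: a customer who buys an itemset in several transactions is counted only once.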
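The sketch below illustrates the per-customer counting of the litemset phase. It assumes the hypothetical customer sequences shown and, for brevity, enumerates every subset of every transaction instead of running Apriori, DHP or FP-Growth, so it is only suitable for tiny examples.

```python
from collections import Counter
from itertools import combinations

# Hypothetical customer sequences, as produced by the sort phase.
customer_sequences = {
    1: [{30}, {90}],
    2: [{10, 20}, {30}, {40, 60, 70}],
    3: [{30, 50, 70}],
    4: [{30}, {40, 70}, {90}],
    5: [{90}],
}


def litemset_phase(sequences, min_sup_count=2):
    """Return {itemset: customer count} for every itemset bought within a single
    transaction by at least `min_sup_count` customers. Brute-force subset
    enumeration stands in for Apriori, DHP or FP-Growth here."""
    counts = Counter()
    for sequence in sequences.values():
        bought = set()  # itemsets this customer bought; a set, so each counts once
        for transaction in sequence:
            items = sorted(transaction)
            for k in range(1, len(items) + 1):
                bought.update(frozenset(c) for c in combinations(items, k))
        counts.update(bought)
    return {s: n for s, n in counts.items() if n >= min_sup_count}


litemsets = litemset_phase(customer_sequences)
print(sorted((sorted(s), n) for s, n in litemsets.items()))
# [([30], 4), ([40], 2), ([40, 70], 2), ([70], 3), ([90], 3)]
```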

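9 AprioriAll Algorithm: Transformation Phase
Each transaction is replaced by the set of all litemsets it contains, with each litemset mapped to an integer ID. Transactions that contain no litemset are dropped, and a customer sequence that contains no litemset is dropped as well (although that customer still counts toward the total number of customers).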
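A sketch of the transformation phase under stated assumptions: the customer sequences and litemsets are hypothetical (the litemsets are the ones a min_sup_count of 2 yields for this toy data), and the integer IDs are assigned in an arbitrary sorted order chosen for this example.

```python
# Hypothetical inputs: customer sequences from the sort phase and the
# litemsets found for them in the litemset phase (min_sup_count = 2).
customer_sequences = {
    1: [{30}, {90}],
    2: [{10, 20}, {30}, {40, 60, 70}],
    3: [{30, 50, 70}],
    4: [{30}, {40, 70}, {90}],
    5: [{90}],
}
litemsets = [frozenset(s) for s in ({30}, {40}, {40, 70}, {70}, {90})]


def transformation_phase(sequences, litemsets):
    """Map each litemset to an integer id, replace every transaction by the ids
    of the litemsets it contains, and drop empty transactions and sequences."""
    ids = {l: i for i, l in enumerate(sorted(litemsets, key=sorted), start=1)}
    transformed = {}
    for customer_id, sequence in sequences.items():
        new_sequence = [frozenset(i for l, i in ids.items() if l <= t) for t in sequence]
        new_sequence = [t for t in new_sequence if t]  # no litemset: drop the transaction
        if new_sequence:  # no litemset at all: drop the customer sequence
            transformed[customer_id] = new_sequence
    # Support is still measured against len(sequences), not len(transformed).
    return transformed, ids


transformed, ids = transformation_phase(customer_sequences, litemsets)
for customer_id, sequence in transformed.items():
    print(customer_id, [sorted(t) for t in sequence])
```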

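10 AprioriAll Algorithm: Sequence Phase
The large sequences are found level by level: candidate k-sequences are generated from the large (k-1)-sequences, their support is counted in one pass over the customer sequences, and those with minimum support become the large k-sequences.
[Figure: Customer Sequences; Large 1-Sequences; Large 2-Sequences; Large 3-Sequences; Large 4-Sequences; Maximal Large Sequences]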
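The level-wise structure of the sequence phase can be sketched as follows. Assumptions: the transformed customer sequences (over litemset IDs 1 to 5) and the minimum support count of 2 are hypothetical, and for brevity candidates are generated naively by extending each large k-sequence with every large litemset, which produces a superset of the candidates AprioriAll's own join-and-prune generation would produce.

```python
# Hypothetical transformed customer sequences: every transaction is a set of litemset ids.
customers = [
    [{1, 5}, {2}, {3}, {4}],
    [{1}, {3}, {4}, {3, 5}],
    [{1}, {2}, {3}, {4}],
    [{1}, {3}, {5}],
    [{4}, {5}],
]
MIN_SUP_COUNT = 2  # assumed minimum support: 2 of the 5 customers


def contains(customer, candidate):
    """True if the litemset ids of `candidate` occur in distinct transactions of
    `customer`, in the same order (greedy earliest match)."""
    pos = 0
    for lit in candidate:
        while pos < len(customer) and lit not in customer[pos]:
            pos += 1
        if pos == len(customer):
            return False
        pos += 1
    return True


def support_count(candidate):
    """Number of customers whose sequence contains `candidate`."""
    return sum(contains(c, candidate) for c in customers)


def sequence_phase():
    """Return {k: set of large k-sequences}, built level by level."""
    ids = sorted({lit for c in customers for t in c for lit in t})
    large = {1: {(i,) for i in ids if support_count((i,)) >= MIN_SUP_COUNT}}
    k = 1
    while large[k]:
        # Naive candidates: extend each large k-sequence by every large litemset.
        candidates = {s + l for s in large[k] for l in large[1]}
        k += 1
        large[k] = {c for c in candidates if support_count(c) >= MIN_SUP_COUNT}
    del large[k]  # the last level is empty
    return large


large = sequence_phase()
for k, level in large.items():
    print(k, sorted(level))
```

For this data the search stops after the large 4-sequences, which reduce to <1 2 3 4>; the maximal large sequences are then <1 2 3 4>, <1 3 5> and <4 5>.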

11 Sequence Phase: Candidate Generation
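Candidate generation can be sketched as a join followed by a prune, in the spirit of Apriori. This is a minimal sketch assuming sequences are tuples of litemset IDs: two different large (k-1)-sequences that agree on their first k-2 litemsets are joined by appending the last litemset of the second to the first, and a candidate is dropped if deleting any one litemset yields a sequence that is not large. Requiring the two joined sequences to be different is an assumption of this sketch; as a consequence, a litemset repeated back to back (such as <1 1>) is never generated. The function names and the input set `L3` are hypothetical.

```python
def join(large_prev):
    """Pair two different large (k-1)-sequences that agree on their first k-2
    litemsets and append the last litemset of the second to the first."""
    return {p + (q[-1],) for p in large_prev for q in large_prev
            if p != q and p[:-1] == q[:-1]}


def prune(candidates, large_prev):
    """Drop candidates having a (k-1)-subsequence (one litemset deleted) that is not large."""
    return {c for c in candidates
            if all(c[:i] + c[i + 1:] in large_prev for i in range(len(c)))}


def apriori_generate(large_prev):
    """Candidate k-sequences from the large (k-1)-sequences."""
    return prune(join(large_prev), large_prev)


# Hypothetical large 3-sequences over litemset ids.
L3 = {(1, 2, 3), (1, 2, 4), (1, 3, 4), (1, 3, 5), (2, 3, 4)}
print(sorted(join(L3)))              # [(1, 2, 3, 4), (1, 2, 4, 3), (1, 3, 4, 5), (1, 3, 5, 4)]
print(sorted(apriori_generate(L3)))  # [(1, 2, 3, 4)]
```

Here <1 2 4 3> is pruned because <2 4 3> is not large, and <1 3 4 5> and <1 3 5 4> are pruned because <3 4 5> and <3 5 4> are not large.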

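12 AprioriAll Algorithm: Maximal Phase
A sequence <a_1 a_2 ... a_n> is contained in another sequence <b_1 b_2 ... b_m> if there exist integers i_1 < i_2 < ... < i_n such that a_1 ⊆ b_{i_1}, a_2 ⊆ b_{i_2}, ..., a_n ⊆ b_{i_n}.
– The sequence <(3) (4 5) (8)> is contained in <(7) (3 8) (9) (4 5 6) (8)>, since (3) ⊆ (3 8), (4 5) ⊆ (4 5 6) and (8) ⊆ (8).
– The sequence <(3) (5)> is not contained in <(3 5)> (and vice versa): the former represents items 3 and 5 being bought one after the other, while the latter represents items 3 and 5 being bought together.
In a set of sequences, a sequence s is maximal if s is not contained in any other sequence.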
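The containment test and the maximal phase translate almost directly into Python. This is a sketch assuming a sequence is a tuple of frozensets; `is_contained`, `maximal`, `seq` and the list of large sequences are hypothetical names and data for illustration. Greedy earliest matching is enough for the containment test, because matching each element as early as possible never rules out a later match.

```python
def is_contained(a, b):
    """True if <a_1 ... a_n> is contained in <b_1 ... b_m>: there are indices
    i_1 < ... < i_n with each a_j a subset of b_{i_j}."""
    pos = 0
    for element in a:
        while pos < len(b) and not element <= b[pos]:
            pos += 1
        if pos == len(b):
            return False
        pos += 1
    return True


def maximal(sequences):
    """Keep only the sequences that are not contained in any other sequence."""
    return [s for s in sequences
            if not any(t != s and is_contained(s, t) for t in sequences)]


def seq(*elements):
    """Build a sequence as a tuple of frozensets, e.g. seq([3], [4, 5])."""
    return tuple(frozenset(e) for e in elements)


print(is_contained(seq([3], [4, 5], [8]), seq([7], [3, 8], [9], [4, 5, 6], [8])))  # True
print(is_contained(seq([3], [5]), seq([3, 5])))  # False
print(is_contained(seq([3, 5]), seq([3], [5])))  # False

# Hypothetical large sequences (support >= 2 customers) to be filtered.
large = [seq([30]), seq([40]), seq([70]), seq([90]), seq([40, 70]),
         seq([30], [40]), seq([30], [70]), seq([30], [90]), seq([30], [40, 70])]
for s in maximal(large):
    print([sorted(e) for e in s])
```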

13 AprioriAll Algorithm
With minimum support set to 25%, i.e., a minimum support of 2 customers:
– <(30) (90)> and <(30) (40 70)> are maximal.
– <(10 20) (30)>, which is only supported by customer 2, does not have minimum support.
– <(30)>, <(40)>, <(70)>, <(90)>, <(30) (40)>, <(30) (70)> and <(40 70)>, though having minimum support, are not in the answer because they are not maximal.
[Table: Answer Set]
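The phases can be strung together into a compact end-to-end sketch. It assumes the sort phase has already produced the customer sequences below, which are illustrative data chosen to be consistent with the outcome described above (the slides' own tables are not reproduced). Litemsets are found by brute force rather than Apriori, DHP or FP-Growth, and the candidate join requires two different sequences; treat it as a minimal sketch, not a definitive implementation.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical customer sequences (the sort phase is assumed already done).
DB = {
    1: [{30}, {90}],
    2: [{10, 20}, {30}, {40, 60, 70}],
    3: [{30, 50, 70}],
    4: [{30}, {40, 70}, {90}],
    5: [{90}],
}


def contains(sequence, pattern):
    """True if each element of `pattern` is a subset of a distinct transaction
    of `sequence`, in the same order (greedy earliest match)."""
    pos = 0
    for element in pattern:
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def apriori_all(db, min_sup):
    min_count = math.ceil(min_sup * len(db))  # 25% of 5 customers -> 2

    # Litemset phase (brute force): count each itemset once per customer.
    counts = Counter()
    for seq in db.values():
        counts.update({frozenset(c) for t in seq
                       for k in range(1, len(t) + 1)
                       for c in combinations(sorted(t), k)})
    litemsets = [s for s, n in counts.items() if n >= min_count]

    # Transformation phase: transactions become sets of litemset ids
    # (empty transactions are kept but can never match anything).
    ids = {l: i for i, l in enumerate(litemsets)}
    tdb = [[{ids[l] for l in litemsets if l <= t} for t in seq] for seq in db.values()]

    def large(candidates):
        # Support count; ids are wrapped as 1-element sets so `contains` applies.
        return {c for c in candidates
                if sum(contains(s, [{i} for i in c]) for s in tdb) >= min_count}

    # Sequence phase: join + prune candidate generation, level by level.
    level = large({(i,) for i in ids.values()})
    all_large = set(level)
    while level:
        joined = {p + (q[-1],) for p in level for q in level if p != q and p[:-1] == q[:-1]}
        level = large({c for c in joined
                       if all(c[:i] + c[i + 1:] in level for i in range(len(c)))})
        all_large |= level

    # Maximal phase: map ids back to itemsets, keep sequences contained in no other.
    back = {i: l for l, i in ids.items()}
    seqs = [tuple(back[i] for i in s) for s in all_large]
    return {s for s in seqs if not any(s != t and contains(t, s) for t in seqs)}


answer = apriori_all(DB, min_sup=0.25)
for s in sorted(answer, key=lambda s: [sorted(e) for e in s]):
    print([sorted(e) for e in s])
# [[30], [40, 70]]
# [[30], [90]]
```

Mapping the IDs back to itemsets before the maximal phase matters: in ID space <(40)> and <(40 70)> are unrelated symbols, but as itemsets (40) ⊆ (40 70).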

14 Summary

15 Discussions
– The AprioriAll algorithm generates a huge set of candidate sequences: if there are 1000 frequent sequences of length 1, it generates 1000 × 1000 + (1000 × 999) / 2 = 1,499,500 candidate sequences containing two items (1000 × 1000 of the form <(a) (b)>, where a may equal b, plus 1000 × 999 / 2 of the form <(a b)>).
– It requires many scans of the database.
– It has difficulty mining long sequential patterns.
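The candidate count can be checked by brute force. This sketch, with hypothetical function names, enumerates both kinds of two-item candidates for small n, compares them with the closed form, and then evaluates the closed form at n = 1000.

```python
from itertools import combinations, product


def count_two_item_candidates(n):
    """Closed form: n * n candidates <(a) (b)> (ordered, a may equal b)
    plus n * (n - 1) / 2 candidates <(a b)> (an unordered pair bought together)."""
    return n * n + n * (n - 1) // 2


def enumerate_two_item_candidates(items):
    """Brute force: list both kinds of two-item candidates explicitly."""
    items = list(items)
    separate = [((a,), (b,)) for a, b in product(items, repeat=2)]
    together = [(pair,) for pair in combinations(items, 2)]
    return separate + together


# The brute force agrees with the closed form on small inputs...
for n in range(1, 8):
    assert len(enumerate_two_item_candidates(range(n))) == count_two_item_candidates(n)

# ...and the closed form reproduces the slide's figure for 1000 frequent items.
print(count_two_item_candidates(1000))  # 1499500
```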

16 Research Topics
– Time-Interval Sequential Patterns
– Time-Gap Sequential Patterns
– Non-redundant Sequential Patterns
– Constrained Sequential Pattern Mining
– Multi-dimensional Sequential Patterns
– Generalized Sequential Patterns
– Incremental Mining of Sequential Patterns
– Data Stream Sequential Pattern Mining
– Interactive Mining of Sequential Patterns

17 Exercise 6
[Table: A Sequence Database (min-sup = 50%), with columns SID and Customer sequence, for SIDs 10, 20, 30 and 40]

