ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis)

Presentation on theme: "ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis)"— Presentation transcript:

1 ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis)
Sequential Pattern Mining
Olivia R. Liu Sheng, Ph.D.
Emma Eccles Jones Presidential Chair of Business

2 Sequential Patterns
Given: a transaction database { cid, tid, date, item }
Find: inter-transaction patterns among customers
Example: customers typically rent “Star Wars”, then “Empire Strikes Back”, and then “Return of the Jedi”.
Part 1: Research Area (1 slide, 3-3) Lots of research has been done to solve the first two problems. Little has been done for the third problem -- incremental data mining.

3 Sequential Patterns
cid  tid  date        item
1    1    01/01/2000  30
1    …    …/02/…      90
2    …    …/01/…      40,70
2    …    …/02/…      30
2    …    …/03/…      40,60,70
3    …    …/01/…      30,50,70
4    …    …/01/…      30
4    …    …/02/…      40,70
4    …    …/03/…      90
5    …    …/01/…      90

4 Sequential Patterns
Itemset: a non-empty set of items, e.g., {30}, {40,70}.
Sequence: an ordered list of itemsets, e.g., <{30} {40,70}>, <{40,70} {30}>.
The size of a sequence is the number of itemsets in that sequence.
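One way to hold these definitions in code is sketched below. The choice of `frozenset` for itemsets and `tuple` for sequences, and the helper name `size`, are assumptions made for illustration; the slides do not prescribe a representation.

```python
# An itemset is a non-empty set of items; frozenset makes it hashable.
itemset_a = frozenset({30})
itemset_b = frozenset({40, 70})

# A sequence is an ordered list of itemsets; a tuple keeps the order fixed.
seq_1 = (itemset_a, itemset_b)  # <{30} {40,70}>
seq_2 = (itemset_b, itemset_a)  # <{40,70} {30}>


def size(sequence):
    """Size of a sequence: the number of itemsets it contains."""
    return len(sequence)


print(size(seq_1))      # 2
print(seq_1 == seq_2)   # False, because order matters in a sequence
```

Within an itemset the order of items is irrelevant, so `frozenset({70, 40})` equals `itemset_b`, while the two sequences differ because their itemsets appear in a different order.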

5 Sequential Patterns
cid  tid  date        item
1    1    01/01/2000  30
1    …    …/02/…      90
2    …    …/01/…      40,70
2    …    …/02/…      30
2    …    …/03/…      40,60,70
3    …    …/01/…      30,50,70
4    …    …/01/…      30
4    …    …/02/…      40,70
4    …    …/03/…      90
5    …    …/01/…      90
Each transaction of a customer can be viewed as an itemset.
A customer's sequence contains the customer's itemsets, ordered by transaction date.
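The conversion from transactions to customer sequences can be sketched as follows. This is a minimal illustration, not the course's implementation: the function name `build_customer_sequences` is hypothetical, and only the first row's tid and date are legible in the transcript, so every other tid and date below is an assumed placeholder chosen to preserve the per-customer order.

```python
from collections import defaultdict

# Rows of (cid, tid, date, items). Only the first row's tid and date come from
# the slide; all other tids and dates are assumed placeholders.
transactions = [
    (1, 1, "2000-01-01", {30}),
    (1, 2, "2000-01-02", {90}),
    (2, 3, "2000-01-01", {40, 70}),
    (2, 4, "2000-01-02", {30}),
    (2, 5, "2000-01-03", {40, 60, 70}),
    (3, 6, "2000-01-01", {30, 50, 70}),
    (4, 7, "2000-01-01", {30}),
    (4, 8, "2000-01-02", {40, 70}),
    (4, 9, "2000-01-03", {90}),
    (5, 10, "2000-01-01", {90}),
]


def build_customer_sequences(rows):
    """Group transactions by customer and order each group by date, then tid."""
    by_customer = defaultdict(list)
    for cid, tid, date, items in rows:
        by_customer[cid].append((date, tid, frozenset(items)))
    return {
        cid: tuple(items for _, _, items in sorted(txs, key=lambda t: t[:2]))
        for cid, txs in sorted(by_customer.items())
    }


sequences = build_customer_sequences(transactions)
for cid, seq in sequences.items():
    print(cid, [sorted(itemset) for itemset in seq])
```

ISO-formatted date strings are used so that plain string sorting matches chronological order.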

6 Sequential Patterns
cid  customer sequence
1    <{30} {90}>
2    <{40,70} {30} {40,60,70}>
3    <{30,50,70}>
4    <{30} {40,70} {90}>
5    <{90}>

7 Sequential Patterns
Sequence <a_1 a_2 … a_n> is contained in sequence <b_1 b_2 … b_m> if there exist indexes i_1 < i_2 < … < i_n such that a_1 ⊆ b_{i_1}, a_2 ⊆ b_{i_2}, …, and a_n ⊆ b_{i_n}.
E.g., <{3} {4,5} {8}> is contained in <{3,8} {4,5,6} {8}>.
Is <{3} {4,5} {8}> contained in <{7} {3,8} {9} {4,5,6} {8}>?
Is <{3} {4,5} {8}> contained in <{7} {9} {4,5,6} {3,8} {8}>?
Is <{3} {4,5} {8}> contained in <{7} {9} {3,8} {4,5,6}>?
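The containment test can be checked mechanically. The sketch below uses a greedy left-to-right scan, which is sufficient for this definition because matching each itemset at its earliest possible position never rules out a later match. The function name `is_contained` is an assumption for illustration.

```python
def is_contained(a, b):
    """Return True if sequence a is contained in sequence b.

    Each itemset of a must be a subset of a distinct itemset of b,
    and the matched itemsets of b must appear in the same order.
    """
    j = 0
    for itemset in a:
        # Advance through b until an itemset that covers `itemset` is found.
        while j < len(b) and not set(itemset) <= set(b[j]):
            j += 1
        if j == len(b):
            return False
        j += 1  # the next itemset of a must match strictly later in b
    return True


pattern = [{3}, {4, 5}, {8}]
print(is_contained(pattern, [{3, 8}, {4, 5, 6}, {8}]))            # True
print(is_contained(pattern, [{7}, {3, 8}, {9}, {4, 5, 6}, {8}]))  # True
print(is_contained(pattern, [{7}, {9}, {4, 5, 6}, {3, 8}, {8}]))  # False
print(is_contained(pattern, [{7}, {9}, {3, 8}, {4, 5, 6}]))       # False
```

The third case fails because {4,5} never occurs after the itemset containing 3, and the fourth fails because nothing containing 8 follows {4,5,6}.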

8 Sequential Patterns
cid  customer sequence
1    <{30} {90}>
2    <{40,70} {30} {40,60,70}>
3    <{30,50,70}>
4    <{30} {40,70} {90}>
5    <{90}>
A customer supports sequence s if s is contained in the sequence for this customer.
E.g., customers 1 and 4 support sequence <{30} {90}>.

9 Sequential Patterns
cid  customer sequence
1    <{30} {90}>
2    <{40,70} {30} {40,60,70}>
3    <{30,50,70}>
4    <{30} {40,70} {90}>
5    <{90}>
The support for a sequence s is defined as the fraction of total customers who support s.
E.g., customers 1 and 4 support sequence <{30} {90}>.
Supp(<{30} {90}>) = 2/5 = 40%
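The support calculation for <{30} {90}> can be reproduced with a short sketch. The names `is_contained` and `support` are assumptions for illustration, and `Fraction` is used only so the result prints as 2/5 rather than a float.

```python
from fractions import Fraction

customer_sequences = {
    1: [{30}, {90}],
    2: [{40, 70}, {30}, {40, 60, 70}],
    3: [{30, 50, 70}],
    4: [{30}, {40, 70}, {90}],
    5: [{90}],
}


def is_contained(a, b):
    """Greedy check that sequence a is contained in sequence b."""
    j = 0
    for itemset in a:
        while j < len(b) and not set(itemset) <= set(b[j]):
            j += 1
        if j == len(b):
            return False
        j += 1
    return True


def support(sequence, db):
    """Return the supporting customers and the fraction of all customers."""
    supporters = [cid for cid, seq in db.items() if is_contained(sequence, seq)]
    return supporters, Fraction(len(supporters), len(db))


supporters, supp = support([{30}, {90}], customer_sequences)
print(supporters, supp)  # [1, 4] 2/5
```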

10 Sequential Patterns
cid  customer sequence
1    <{30} {90}>
2    <{40,70} {30} {40,60,70}>
3    <{30,50,70}>
4    <{30} {40,70} {90}>
5    <{90}>
Supp(<{40,70}>) = 2/5 = 40% (sequence support: customers 2 and 4, out of 5 customers)
Supp({40,70}) = 3/10 = 30% (itemset support: 3 of the 10 transactions)
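The two numbers differ because the denominators differ: sequence support counts customers, while itemset support (as in association rule mining) counts transactions. A minimal sketch, with the function names `sequence_support` and `itemset_support` assumed for illustration:

```python
from fractions import Fraction

customer_sequences = {
    1: [{30}, {90}],
    2: [{40, 70}, {30}, {40, 60, 70}],
    3: [{30, 50, 70}],
    4: [{30}, {40, 70}, {90}],
    5: [{90}],
}


def sequence_support(itemset, db):
    """Customer-level support of the 1-sequence <itemset>."""
    hits = sum(1 for seq in db.values() if any(itemset <= t for t in seq))
    return Fraction(hits, len(db))


def itemset_support(itemset, db):
    """Transaction-level support of the itemset."""
    transactions = [t for seq in db.values() for t in seq]
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


print(sequence_support({40, 70}, customer_sequences))  # 2/5
print(itemset_support({40, 70}, customer_sequences))   # 3/10
```

Customer 2 contributes two matching transactions but counts only once toward sequence support.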

11 Sequential Patterns Mining
Given: a transaction database { cid, tid, date, item }
Find: all sequences whose support is at least the user-specified minimum support
Apriori property: if a sequence is large, then all sequences contained in that sequence are also large.

12 Sequential Patterns Mining
Identify all large 1-sequences.
Repeat until there are no more candidate k-sequences:
  Identify all candidate k-sequences using large (k-1)-sequences.
    Join: two large (k-1)-sequences, L1 and L2, are joinable if they satisfy the following conditions: L1(1) = L2(1), L1(2) = L2(2), …, L1(k-2) = L2(k-2), and L1(k-1) ≠ L2(k-1).
    Prune: remove candidate k-sequences generated by the join that have sub-sequences that are not large.
  Determine large k-sequences from candidate k-sequences.
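The loop above can be sketched end to end as follows. This is an illustrative sketch under stated assumptions, not the course's implementation: the function names are hypothetical, the last join condition is taken to be L1(k-1) ≠ L2(k-1) (the relation symbol is missing in the transcript), large 1-sequences are found by brute-force enumeration of transaction subsets (workable only for tiny data), and a sequence is treated as large when its support is at least the minimum.

```python
from fractions import Fraction
from itertools import combinations


def is_contained(a, b):
    """Greedy check that sequence a is contained in sequence b."""
    j = 0
    for itemset in a:
        while j < len(b) and not itemset <= b[j]:
            j += 1
        if j == len(b):
            return False
        j += 1
    return True


def support(seq, db):
    return Fraction(sum(is_contained(seq, s) for s in db.values()), len(db))


def large_1_sequences(db, min_support):
    # Enumerate every non-empty subset of every transaction.
    itemsets = set()
    for seq in db.values():
        for t in seq:
            items = sorted(t)
            for r in range(1, len(items) + 1):
                itemsets.update(frozenset(c) for c in combinations(items, r))
    return {(i,) for i in itemsets if support((i,), db) >= min_support}


def join(large_prev):
    # Same first k-2 elements, different last element.
    return {a + (b[-1],) for a in large_prev for b in large_prev
            if a[:-1] == b[:-1] and a[-1] != b[-1]}


def prune(candidates, large_prev):
    # Keep a candidate only if dropping any one element leaves a large sequence.
    return {c for c in candidates
            if all(c[:i] + c[i + 1:] in large_prev for i in range(len(c)))}


def mine(db, min_support):
    db = {cid: tuple(frozenset(t) for t in seq) for cid, seq in db.items()}
    large = {1: large_1_sequences(db, min_support)}
    k = 2
    while True:
        candidates = prune(join(large[k - 1]), large[k - 1])
        found = {c for c in candidates if support(c, db) >= min_support}
        if not found:  # no candidates, or none of them is large
            return large
        large[k] = found
        k += 1


customer_sequences = {
    1: [{30}, {90}],
    2: [{40, 70}, {30}, {40, 60, 70}],
    3: [{30, 50, 70}],
    4: [{30}, {40, 70}, {90}],
    5: [{90}],
}
result = mine(customer_sequences, Fraction(2, 5))
for k, seqs in result.items():
    print(k, len(seqs))
```

With a 40% minimum support on the five example customers, the sketch finds five large 1-sequences and four large 2-sequences and then stops.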

13 Sequential Patterns Mining
cid  customer sequence
1    <{30} {90}>
2    <{40,70} {30} {40,60,70}>
3    <{30,50,70}>
4    <{30} {40,70} {90}>
5    <{90}>
Minimum Support: 40%

14 Sequential Patterns Mining
cid  customer sequence
1    <{30} {90}>
2    <{40,70} {30} {40,60,70}>
3    <{30,50,70}>
4    <{30} {40,70} {90}>
5    <{90}>
Minimum Support: 40%
Large 1-Sequence:
<{30}>     support = 4/5 = 80%
<{40}>     support = 2/5 = 40%
<{70}>     support = 3/5 = 60%
<{90}>     support = 3/5 = 60%
<{40,70}>  support = 2/5 = 40%
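The large 1-sequences listed here can be recomputed by counting, for each itemset, the customers with at least one transaction containing it. The sketch below enumerates all subsets of each transaction, which is an assumption made for brevity and is only practical for small examples; the names `customer_support` and `large_itemsets` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

customer_sequences = {
    1: [{30}, {90}],
    2: [{40, 70}, {30}, {40, 60, 70}],
    3: [{30, 50, 70}],
    4: [{30}, {40, 70}, {90}],
    5: [{90}],
}


def customer_support(itemset, db):
    """Fraction of customers with at least one transaction containing itemset."""
    hits = sum(1 for seq in db.values() if any(itemset <= t for t in seq))
    return Fraction(hits, len(db))


def large_itemsets(db, min_support):
    """Map each itemset meeting the minimum support to its support."""
    candidates = set()
    for seq in db.values():
        for t in seq:
            items = sorted(t)
            for r in range(1, len(items) + 1):
                candidates.update(frozenset(c) for c in combinations(items, r))
    supports = {c: customer_support(c, db) for c in candidates}
    return {c: s for c, s in supports.items() if s >= min_support}


large_1 = large_itemsets(customer_sequences, Fraction(2, 5))
for itemset, supp in sorted(large_1.items(),
                            key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), supp)
```

Itemsets such as {50}, {60}, and {30,70} are each supported by a single customer (20%) and therefore fall below the 40% threshold.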

15 Sequential Patterns Mining
Large 1-Sequence:
<{30}>     support = 4/5 = 80%
<{40}>     support = 2/5 = 40%
<{70}>     support = 3/5 = 60%
<{90}>     support = 3/5 = 60%
<{40,70}>  support = 2/5 = 40%
Candidate 2-Sequence:
<{30} {40}>     <{30} {70}>     <{30} {90}>     <{30} {40,70}>
<{40} {30}>     <{40} {70}>     <{40} {90}>     <{40} {40,70}>
<{70} {30}>     <{70} {40}>     <{70} {90}>     <{70} {40,70}>
<{90} {30}>     <{90} {40}>     <{90} {70}>     <{90} {40,70}>
<{40,70} {30}>  <{40,70} {40}>  <{40,70} {70}>  <{40,70} {90}>
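The 20 candidates are the ordered pairs of distinct large 1-sequences, which `itertools.permutations` generates directly. A small sketch, with the helper name `fmt` assumed for illustration:

```python
from itertools import permutations

large_1 = [
    frozenset({30}),
    frozenset({40}),
    frozenset({70}),
    frozenset({90}),
    frozenset({40, 70}),
]

# Every ordered pair of two different large 1-sequences is a candidate.
candidates_2 = list(permutations(large_1, 2))


def fmt(sequence):
    """Render a sequence in the slide notation, e.g. <{30} {40,70}>."""
    parts = ("{" + ",".join(str(i) for i in sorted(s)) + "}" for s in sequence)
    return "<" + " ".join(parts) + ">"


print(len(candidates_2))  # 20
for candidate in candidates_2:
    print(fmt(candidate))
```

Five large 1-sequences give 5 × 4 = 20 ordered pairs; pairs that repeat the same itemset, such as <{30} {30}>, are not generated, matching the list on the slide.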

16 Sequential Patterns Mining
Candidate 2-Sequence:
<{30} {40}>     <{30} {70}>     <{30} {90}>     <{30} {40,70}>
<{40} {30}>     <{40} {70}>     <{40} {90}>     <{40} {40,70}>
<{70} {30}>     <{70} {40}>     <{70} {90}>     <{70} {40,70}>
<{90} {30}>     <{90} {40}>     <{90} {70}>     <{90} {40,70}>
<{40,70} {30}>  <{40,70} {40}>  <{40,70} {70}>  <{40,70} {90}>
Large 2-Sequence:
<{30} {40}>     support = 2/5 = 40%
<{30} {70}>     support = 2/5 = 40%
<{30} {90}>     support = 2/5 = 40%
<{30} {40,70}>  support = 2/5 = 40%
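Counting support for each of the 20 candidates against the five customer sequences leaves the four large 2-sequences. The following sketch assumes the hypothetical helpers `is_contained` and `support` and treats a support of exactly 40% as meeting the minimum, as the slide does.

```python
from fractions import Fraction
from itertools import permutations

customer_sequences = {
    1: [{30}, {90}],
    2: [{40, 70}, {30}, {40, 60, 70}],
    3: [{30, 50, 70}],
    4: [{30}, {40, 70}, {90}],
    5: [{90}],
}
large_1 = [frozenset(s) for s in ({30}, {40}, {70}, {90}, {40, 70})]


def is_contained(a, b):
    """Greedy check that sequence a is contained in sequence b."""
    j = 0
    for itemset in a:
        while j < len(b) and not itemset <= b[j]:
            j += 1
        if j == len(b):
            return False
        j += 1
    return True


def support(seq, db):
    return Fraction(sum(is_contained(seq, s) for s in db.values()), len(db))


min_support = Fraction(2, 5)
supports = {c: support(c, customer_sequences) for c in permutations(large_1, 2)}
large_2 = {c: s for c, s in supports.items() if s >= min_support}

for seq, supp in large_2.items():
    print([sorted(itemset) for itemset in seq], supp)
```

Every rejected candidate is supported by at most one customer; for example, no customer buys anything after a transaction containing 90, so all candidates starting with {90} have support 0.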

17 Sequential Patterns Mining
Large 2-Sequence:
<{30} {40}>     support = 2/5 = 40%
<{30} {70}>     support = 2/5 = 40%
<{30} {90}>     support = 2/5 = 40%
<{30} {40,70}>  support = 2/5 = 40%
Candidate 3-Sequence (after join):
<{30} {40} {70}>     <{30} {40} {40,70}>  <{30} {70} {40}>     <{30} {70} {40,70}>
<{30} {40,70} {40}>  <{30} {40,70} {70}>  <{30} {40} {90}>     <{30} {90} {40}>
<{30} {70} {90}>     <{30} {90} {70}>     <{30} {90} {40,70}>  <{30} {40,70} {90}>
Prune: all sub-sequences of a candidate k-sequence should be large. Every candidate above has a sub-sequence that does not begin with {30} (e.g., <{40} {70}>), and no such 2-sequence is large, so all candidates are pruned.
Candidate 3-Sequence (after prune): no candidate 3-sequence. Stop.
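The join and prune on the four large 2-sequences can be replayed with the sketch below. The function names `join` and `prune` are assumptions, and the join uses the condition that the two sequences share their first k-2 elements and differ in the last one, which reproduces the 12 candidates listed on the slide.

```python
f = frozenset
large_2 = {
    (f({30}), f({40})),
    (f({30}), f({70})),
    (f({30}), f({90})),
    (f({30}), f({40, 70})),
}


def join(large_prev):
    """Extend a with the last element of b when they agree on all but the last."""
    return {a + (b[-1],) for a in large_prev for b in large_prev
            if a[:-1] == b[:-1] and a[-1] != b[-1]}


def prune(candidates, large_prev):
    """Keep a candidate only if every (k-1)-sub-sequence is large."""
    return {c for c in candidates
            if all(c[:i] + c[i + 1:] in large_prev for i in range(len(c)))}


joined = join(large_2)
print(len(joined))             # 12
print(prune(joined, large_2))  # set()
```

Dropping the first element of any joined candidate leaves a 2-sequence that does not start with {30}, which is never in `large_2`, so the pruned set is empty and mining stops.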

18 Summary
What is a sequential pattern?
What is support for a sequential pattern?
How to mine sequential patterns?
What are the similarities and dissimilarities between association rules and sequential patterns mining?

