Published by Anaya Lutter. Modified about 1 year ago.

1
Mining Probabilistically Frequent Sequential Patterns in Uncertain Databases
Zhou Zhao, Da Yan and Wilfred Ng
The Hong Kong University of Science and Technology

2
Outline
- Background
- Problem Definition
- Sequential-Level U-PrefixSpan
- Element-Level U-PrefixSpan
- Experiments
- Conclusion

4
Background
Uncertain data are inherent in many real-world applications, such as sensor networks and RFID tracking.
[Figure: sensor-network example. Readings: Sensor 1: BC; Sensor 2: AB; annotated Prob. = 0.9 and Prob. = 0.1; labels C, B, A.]

5
Background
Uncertain data are inherent in many real-world applications, such as sensor networks and RFID tracking.
[Figure: RFID tracking with Readers A, B and C. Readings: t1: (A, 0.95); t2: (B, 0.95), (C, 0.05).]

8
Problem Definition
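
The formal definition on this slide did not survive extraction. As a hedged sketch, assume the usual possible-world definition: with a support threshold τ_sup and a probability threshold τ_prob, a pattern α is a p-FSP when Pr{sup(α) ≥ τ_sup} ≥ τ_prob, where sup(α) counts the sequences containing α and sequences are independent. If p_i = Pr(α ⊑ s_i), the support is Poisson-binomial, and its upper tail can be computed by dynamic programming in O(n·τ_sup) time rather than by summing over possible worlds. The names `frequentness_prob` and `is_p_frequent` are hypothetical.

```python
from typing import Sequence


def frequentness_prob(p: Sequence[float], min_sup: int) -> float:
    """Pr(sup >= min_sup) when sequence i contains the pattern independently with prob p[i]."""
    if min_sup <= 0:
        return 1.0
    # dist[j] = Pr(exactly j supporting sequences so far) for j < min_sup;
    # dist[min_sup] accumulates the mass of "at least min_sup".
    dist = [0.0] * (min_sup + 1)
    dist[0] = 1.0
    for pi in p:
        new = [0.0] * (min_sup + 1)
        for j, mass in enumerate(dist):
            if j == min_sup:
                new[j] += mass  # threshold already reached; stays there
            else:
                new[j] += mass * (1.0 - pi)   # sequence i does not contain the pattern
                new[j + 1] += mass * pi       # sequence i contains the pattern
        dist = new
    return dist[min_sup]


def is_p_frequent(p: Sequence[float], min_sup: int, min_prob: float) -> bool:
    """Assumed p-FSP test: Pr(sup >= min_sup) >= min_prob."""
    return frequentness_prob(p, min_sup) >= min_prob


print(round(frequentness_prob([0.9, 0.5, 0.2], min_sup=2), 4))  # 0.55
```

Here 0.55 is 0.46 (exactly two supporting sequences) plus 0.09 (all three).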

9
Pruning rules for p-FSP

10
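Early Validating
Suppose that pattern α is p-frequent on a sub-database D′ ⊆ D; then α is also p-frequent on D, because in every possible world the support of α on D is at least its support on D′. For example, if α is a p-FSP in D_11 ⊆ D, then α is a p-FSP in D.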
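
The early-validating rule can be checked numerically by enumerating every containment outcome (which sequences contain α) for a sub-database D′ and a larger D ⊇ D′. This is a minimal sketch assuming independent sequences; the containment probabilities p_i = Pr(α ⊑ s_i) and the helper name are hypothetical. Brute force is exponential and only illustrates the inequality.

```python
from itertools import product
from typing import Sequence


def frequentness_prob_bruteforce(p: Sequence[float], min_sup: int) -> float:
    """Sum Pr(outcome) over all 2^n containment outcomes whose support reaches min_sup."""
    total = 0.0
    for outcome in product((False, True), repeat=len(p)):
        prob = 1.0
        for contains, pi in zip(outcome, p):
            prob *= pi if contains else 1.0 - pi
        if sum(outcome) >= min_sup:
            total += prob
    return total


# Hypothetical containment probabilities Pr(alpha in s_i).
p_sub = [0.9, 0.8]            # D' = {s1, s2}
p_full = p_sub + [0.3, 0.6]   # D = D' plus two more sequences

min_sup, min_prob = 2, 0.7
pr_sub = frequentness_prob_bruteforce(p_sub, min_sup)
pr_full = frequentness_prob_bruteforce(p_full, min_sup)
print(f"Pr on D' = {pr_sub:.4f}, Pr on D = {pr_full:.4f}")
# alpha is p-frequent on D' here, so it is validated on D without further work.
print(pr_sub >= min_prob, pr_full >= min_prob)
```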

13
Sequence-level probabilistic model
DB:
Sequence ID | Instances | Probability
s1 | s1^1 = ABC | 1
s2 | s2^1 = AB | 0.9
   | s2^2 = BC | 0.05

Possible World Space:
Possible World | Probability
pw1 = {s1^1, s2^1} | 0.9
pw2 = {s1^1, s2^2} | 0.05
pw3 = {s1^1} | 0.05
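
A minimal sketch of how the possible-world space follows from the DB, assuming sequences are independent and that probability mass not assigned to any instance means the sequence is absent (which is how pw3 = {s1^1} arises with probability 1 - 0.9 - 0.05 = 0.05). `possible_worlds` is a hypothetical helper; worlds are shown as tuples of instance strings rather than instance IDs.

```python
from itertools import product
from typing import Dict, Iterator, List, Optional, Tuple

# The sequence-level DB from the slide: sid -> list of (instance, probability).
db: Dict[str, List[Tuple[str, float]]] = {
    "s1": [("ABC", 1.0)],
    "s2": [("AB", 0.9), ("BC", 0.05)],
}


def possible_worlds(db) -> Iterator[Tuple[Tuple[str, ...], float]]:
    """Yield (world, probability), assuming independent sequences."""
    choices: List[List[Tuple[Optional[str], float]]] = []
    for instances in db.values():
        opts: List[Tuple[Optional[str], float]] = list(instances)
        absent = 1.0 - sum(pr for _, pr in instances)
        if absent > 1e-12:
            opts.append((None, absent))  # the sequence does not occur in this world
        choices.append(opts)
    for combo in product(*choices):
        world = tuple(inst for inst, _ in combo if inst is not None)
        prob = 1.0
        for _, pr in combo:
            prob *= pr
        yield world, prob


for world, prob in possible_worlds(db):
    print(world, round(prob, 4))
```

This prints the three worlds with probabilities 0.9, 0.05 and 0.05, matching the table.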

14
Prefix-projection of PrefixSpan
D:
SID | Sequence
s1 | ABCBC
s2 | BABC
s3 | AB
s4 | BC

D|A (project D on A):
SID | Sequence
s1 | _BCBC
s2 | _BC
s3 | _B

D|AB (project D|A on B):
SID | Sequence
s1 | _CBC
s2 | _C
s3 | _
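
The projection step can be sketched in a few lines, simplified to sequences of single items stored as strings (PrefixSpan proper also handles itemset elements). Each sequence keeps only the suffix after the first occurrence of the item, and sequences without the item are dropped; the name `project` is hypothetical.

```python
from typing import Dict


def project(db: Dict[str, str], item: str) -> Dict[str, str]:
    """Prefix-projection on one item: keep the suffix after its first occurrence."""
    projected = {}
    for sid, seq in db.items():
        pos = seq.find(item)
        if pos != -1:  # sequences that lack the item are dropped
            projected[sid] = seq[pos + 1:]
    return projected


D = {"s1": "ABCBC", "s2": "BABC", "s3": "AB", "s4": "BC"}
D_A = project(D, "A")
D_AB = project(D_A, "B")
print(D_A)   # {'s1': 'BCBC', 's2': 'BC', 's3': 'B'}
print(D_AB)  # {'s1': 'CBC', 's2': 'C', 's3': ''}
```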

15
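p-FSP anti-monotonicity: if a pattern is a p-FSP, then every subsequence of it is also a p-FSP; equivalently, no super-pattern of a p-infrequent pattern can be a p-FSP.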
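
A small numeric illustration of why the property holds under the sequence-level model: if α is a subsequence of β, every instance containing β also contains α, so Pr(β ⊑ s_i) ≤ Pr(α ⊑ s_i) for each sequence, and the frequentness probability cannot grow as a pattern grows. The instance distribution below is the one from the sequence-projection example (s_i^1 to s_i^4); the helper names are hypothetical.

```python
def is_subsequence(pattern: str, seq: str) -> bool:
    """True if the items of pattern appear in seq in order (not necessarily contiguous)."""
    it = iter(seq)
    return all(ch in it for ch in pattern)  # `in` consumes the iterator up to the match


def containment_prob(instances, pattern: str) -> float:
    """Pr(pattern is contained in s_i) = total probability of instances containing it."""
    return sum(pr for inst, pr in instances if is_subsequence(pattern, inst))


s_i = [("ABCBC", 0.3), ("BABC", 0.2), ("AB", 0.4), ("BC", 0.1)]
for pattern in ["A", "AB", "ABC", "ABCB"]:
    print(pattern, round(containment_prob(s_i, pattern), 4))
```

The printed probabilities (0.9, 0.9, 0.5, 0.3) never increase as the pattern is extended.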

16
SeqU-PrefixSpan Algorithm
SeqU-PrefixSpan recursively performs pattern growth from the previous pattern α to the current pattern β = αe by appending a p-frequent element e ∈ D|α. We can stop growing a pattern α as soon as we find that α is p-infrequent.
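
A compact sketch of this recursion under stated assumptions: single-item elements stored as strings, the sequence-level model with independent sequences, and an assumed p-FSP test Pr{sup ≥ τ_sup} ≥ τ_prob computed by a Poisson-binomial DP. Each call works on the projected database, and a pattern's containment probability in s_i is the probability mass of s_i's instances that survive projection. All names are hypothetical; this is not the authors' implementation.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: sid -> list of (seq-instance string, probability).
SeqDB = Dict[str, List[Tuple[str, float]]]


def frequentness_prob(p: List[float], min_sup: int) -> float:
    """Pr(at least min_sup of the independent sequences contain the pattern)."""
    dist = [1.0] + [0.0] * min_sup  # the last slot absorbs ">= min_sup"
    for pi in p:
        new = [0.0] * (min_sup + 1)
        for j, mass in enumerate(dist):
            if j == min_sup:
                new[j] += mass
            else:
                new[j] += mass * (1.0 - pi)
                new[j + 1] += mass * pi
        dist = new
    return dist[min_sup]


def project(db: SeqDB, item: str) -> SeqDB:
    """Project every instance on `item`; instances (and sequences) lacking it are dropped."""
    out: SeqDB = {}
    for sid, instances in db.items():
        kept = [(inst[inst.index(item) + 1:], pr) for inst, pr in instances if item in inst]
        if kept:
            out[sid] = kept
    return out


def seq_u_prefixspan(db: SeqDB, min_sup: int, min_prob: float, prefix: str = "",
                     results: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    if results is None:
        results = {}
    # Candidate items are those still present in the projected database D|prefix.
    items = sorted({ch for insts in db.values() for inst, _ in insts for ch in inst})
    for e in items:
        proj = project(db, e)
        # Pr(prefix + e is contained in s_i) = surviving instance mass of s_i.
        p = [sum(pr for _, pr in insts) for insts in proj.values()]
        prob = frequentness_prob(p, min_sup)
        if prob >= min_prob:
            beta = prefix + e
            results[beta] = prob
            seq_u_prefixspan(proj, min_sup, min_prob, beta, results)
        # Otherwise beta is p-infrequent: by anti-monotonicity none of its
        # extensions can be a p-FSP, so this branch is not grown further.
    return results


db: SeqDB = {"s1": [("ABC", 1.0)], "s2": [("AB", 0.9), ("BC", 0.05)]}
print(seq_u_prefixspan(db, min_sup=2, min_prob=0.5))
```

On the two-sequence DB of the sequence-level model slide with τ_sup = 2 and τ_prob = 0.5, this sketch reports A, AB and B (frequentness probabilities 0.9, 0.9 and 0.95); C, AC, BC and ABC are pruned.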

17
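Sequence Projection
s_i:
Seq-Instance | Prob.
s_i^1 = ABCBC | 0.3
s_i^2 = BABC | 0.2
s_i^3 = AB | 0.4
s_i^4 = BC | 0.1

s_i|A (project s_i on A; s_i^4 = BC contains no A and is dropped):
Seq-Instance | Prob.
s_i^1 = _BCBC | 0.3
s_i^2 = _BC | 0.2
s_i^3 = _B | 0.4

s_i|AB (project s_i|A on B):
Seq-Instance | Prob.
s_i^1 = _CBC | 0.3
s_i^2 = _C | 0.2
s_i^3 = _ | 0.4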
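
These tables can be reproduced with a short sketch: each seq-instance is projected independently, instances lacking the item are dropped, and probabilities are carried along unchanged. The helper name is hypothetical. The surviving mass, 0.3 + 0.2 + 0.4 = 0.9, is then Pr(AB ⊑ s_i).

```python
from typing import List, Tuple

Instances = List[Tuple[str, float]]


def project_instances(instances: Instances, item: str) -> Instances:
    """Project each seq-instance on `item`; instances that lack the item are dropped."""
    return [(inst[inst.index(item) + 1:], pr) for inst, pr in instances if item in inst]


s_i = [("ABCBC", 0.3), ("BABC", 0.2), ("AB", 0.4), ("BC", 0.1)]
s_i_A = project_instances(s_i, "A")
s_i_AB = project_instances(s_i_A, "B")
print(s_i_A)   # [('BCBC', 0.3), ('BC', 0.2), ('B', 0.4)]
print(s_i_AB)  # [('CBC', 0.3), ('C', 0.2), ('', 0.4)]
# Pr(pattern is contained in s_i) is the probability mass that survives projection.
print(round(sum(pr for _, pr in s_i_AB), 4))  # 0.9
```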

18
s_i|A:
Seq-Instance | Prob.
s_i^1 = _BCBC | 0.3
s_i^2 = _BC | 0.2
s_i^3 = _B | 0.4

21
Element-level probabilistic model
DB:
Sequence ID | Probabilistic Elements
s1 | s1[1] = {(A, 0.95)}, s1[2] = {(B, 0.95), (C, 0.05)}
s2 | s2[1] = {(A, 1)}, s2[2] = {(B, 1)}

Possible World Space:
Possible World | Probability
pw1 = {B, AB} | 0.0475
pw2 = {C, AB} | 0.0025
pw3 = {AB, AB} | 0.9025
pw4 = {AC, AB} | 0.0475
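
A sketch that derives these possible-world probabilities, assuming elements and sequences are independent and that probability mass missing from an element means the element is absent (so s1[1] yields A with probability 0.95 and nothing with probability 0.05). The helper name `seq_instances` is hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

# Element-level DB from the slide: each sequence is a list of probabilistic
# elements, and each element is a list of (item, probability).
db: Dict[str, List[List[Tuple[str, float]]]] = {
    "s1": [[("A", 0.95)], [("B", 0.95), ("C", 0.05)]],
    "s2": [[("A", 1.0)], [("B", 1.0)]],
}


def seq_instances(elements) -> Dict[str, float]:
    """Return {instance string: probability} for one sequence (elements independent)."""
    options = []
    for elem in elements:
        opts = list(elem)
        absent = 1.0 - sum(pr for _, pr in elem)
        if absent > 1e-12:
            opts.append(("", absent))  # element absent: contributes no item
        options.append(opts)
    dist: Dict[str, float] = {}
    for combo in product(*options):
        inst = "".join(item for item, _ in combo)
        prob = 1.0
        for _, pr in combo:
            prob *= pr
        dist[inst] = dist.get(inst, 0.0) + prob
    return dist


per_seq = [seq_instances(elems) for elems in db.values()]
worlds: Dict[Tuple[str, ...], float] = {}
for combo in product(*(d.items() for d in per_seq)):
    world = tuple(inst for inst, _ in combo)
    prob = 1.0
    for _, pr in combo:
        prob *= pr
    worlds[world] = prob

for world, prob in worlds.items():
    print(world, round(prob, 4))
```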

22
Possible world explosion
Probabilistic elements: s_i[1] = {(A, 0.7), (B, 0.3)}; s_i[2] = {(B, 0.2), (C, 0.8)}; s_i[3] = {(C, 0.4), (A, 0.6)}; s_i[4] = {(B, 0.1), (A, 0.9)}

Seq-Instance | Prob. | Seq-Instance | Prob.
pw1(s_i) = ABCB | 0.0056 | pw9(s_i) = BBCB | 0.0024
pw2(s_i) = ABCA | 0.0504 | pw10(s_i) = BBCA | 0.0216
pw3(s_i) = ABAB | 0.0084 | pw11(s_i) = BBAB | 0.0036
pw4(s_i) = ABAA | 0.0756 | pw12(s_i) = BBAA | 0.0324
pw5(s_i) = ACCB | 0.0224 | pw13(s_i) = BCCB | 0.0096
pw6(s_i) = ACCA | 0.2016 | pw14(s_i) = BCCA | 0.0864
pw7(s_i) = ACAB | 0.0336 | pw15(s_i) = BCAB | 0.0144
pw8(s_i) = ACAA | 0.3024 | pw16(s_i) = BCAA | 0.1296

The number of possible instances is exponential in the sequence length.
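
Brute-force expansion of s_i makes the blow-up concrete: with two alternatives per element and four elements there are 2^4 = 16 instances, and in general k alternatives over n elements give up to k^n. A minimal sketch, assuming independent elements:

```python
from itertools import product

# Probabilistic elements of s_i from the slide (each element's probabilities sum to 1).
elements = [
    [("A", 0.7), ("B", 0.3)],
    [("B", 0.2), ("C", 0.8)],
    [("C", 0.4), ("A", 0.6)],
    [("B", 0.1), ("A", 0.9)],
]

instances = {}
for combo in product(*elements):
    inst = "".join(item for item, _ in combo)
    prob = 1.0
    for _, pr in combo:
        prob *= pr  # independent elements: multiply the chosen items' probabilities
    instances[inst] = prob

print(len(instances))                     # 16 = 2 ** 4
print(round(instances["ACAA"], 4))        # 0.3024
print(round(sum(instances.values()), 6))  # 1.0
```

`itertools.product` varies the last element fastest, so the instances come out in the same order as the table (ABCB, ABCA, ABAB, ...).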

23
ElemU-PrefixSpan Algorithm

24
Sequence Projection
Probabilistic elements: s_i[1] = {(A, 0.7), (B, 0.3)}; s_i[2] = {(B, 0.2), (C, 0.8)}; s_i[3] = {(C, 0.4), (A, 0.6)}; s_i[4] = {(B, 0.1), (A, 0.9)}

s_i (empty prefix):
pos | suffix | Pr.
0 | _s_i[1]s_i[2]s_i[3]s_i[4] | 1

s_i|B (after appending B):
pos | suffix | Pr.
1 | _s_i[2]s_i[3]s_i[4] |
2 | _s_i[3]s_i[4] |
4 | _ |

25
Sequence Projection
Probabilistic elements: s_i[1] = {(A, 0.7), (B, 0.3)}; s_i[2] = {(B, 0.2), (C, 0.8)}; s_i[3] = {(C, 0.4), (A, 0.6)}; s_i[4] = {(B, 0.1), (A, 0.9)}

s_i|B:
pos | suffix | Pr.
1 | _s_i[2]s_i[3]s_i[4] |
2 | _s_i[3]s_i[4] |
4 | _ |

26
Sequence Projection
Probabilistic elements: s_i[1] = {(A, 0.7), (B, 0.3)}; s_i[2] = {(B, 0.2), (C, 0.8)}; s_i[3] = {(C, 0.4), (A, 0.6)}; s_i[4] = {(B, 0.1), (A, 0.9)}

s_i|B:
pos | suffix | Pr.
1 | _s_i[2]s_i[3]s_i[4] |
2 | _s_i[3]s_i[4] |
4 | _ |

s_i|BA (after appending A):
pos | suffix | Pr.
3 | _s_i[4] |

27
Sequence Projection
Probabilistic elements: s_i[1] = {(A, 0.7), (B, 0.3)}; s_i[2] = {(B, 0.2), (C, 0.8)}; s_i[3] = {(C, 0.4), (A, 0.6)}; s_i[4] = {(B, 0.1), (A, 0.9)}

s_i|B:
pos | suffix | Pr.
1 | _s_i[2]s_i[3]s_i[4] |
2 | _s_i[3]s_i[4] |
4 | _ |

s_i|BA (after appending A):
pos | suffix | Pr.
3 | _s_i[4] |
4 | _ | 0.1584
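
The Pr. column in this walkthrough appears to hold, for each position, the probability that the earliest match of the current prefix ends there. The sketch below reproduces the visible value 0.1584 for prefix BA at position 4 under that reading. The propagation rule is a reconstruction, not necessarily the authors' exact procedure; it assumes independent elements, and the function name `project` is hypothetical.

```python
from typing import Dict, List

Element = Dict[str, float]  # item -> probability within one probabilistic element


def project(elements: List[Element], proj: Dict[int, float], item: str) -> Dict[int, float]:
    """Extend a projection by `item`.

    proj maps pos -> Pr(earliest match of the current prefix ends at pos), where
    pos 0 stands for the empty prefix. The result maps pos' -> Pr(earliest match
    of prefix + item ends at pos'); positions are 1-based.
    """
    out: Dict[int, float] = {}
    for pos, pr in proj.items():
        miss = pr  # Pr(prefix ends at pos and item is absent from pos+1 .. k-1)
        for k in range(pos + 1, len(elements) + 1):
            p_item = elements[k - 1].get(item, 0.0)
            if p_item > 0.0:
                out[k] = out.get(k, 0.0) + miss * p_item
            miss *= 1.0 - p_item
    return out


s_i = [
    {"A": 0.7, "B": 0.3},
    {"B": 0.2, "C": 0.8},
    {"C": 0.4, "A": 0.6},
    {"B": 0.1, "A": 0.9},
]
proj_B = project(s_i, {0: 1.0}, "B")
proj_BA = project(s_i, proj_B, "A")
print({k: round(v, 4) for k, v in proj_B.items()})   # {1: 0.3, 2: 0.14, 4: 0.056}
print({k: round(v, 4) for k, v in proj_BA.items()})  # {3: 0.264, 4: 0.1584}
# Pr(pattern is contained in s_i) is the total mass of the projection.
print(round(sum(proj_BA.values()), 4))  # 0.4224
```

The work per projection is proportional to (number of stored positions) × (sequence length), rather than to the exponential number of seq-instances.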

31
Efficiency of SeqU-PrefixSpan
[Figure panels: effect of database size; effect of the number of seq-instances; effect of sequence length.]

32
Efficiency of ElemU-PrefixSpan
[Figure panels: effect of database size; effect of the number of element-instances; effect of sequence length.]

33
ElemU-PrefixSpan vs. Full Expansion
[Figure panels: effect of database size; effect of the number of element-instances; effect of sequence length.]

36
Conclusion
- We formulate the problem of mining p-FSPs in uncertain databases.
- We propose two new U-PrefixSpan algorithms to mine p-FSPs from data that conform to our probabilistic models.
- Experiments show that our algorithms effectively avoid the problem of “possible world explosion”.

37
Thank you!
