Mining Probabilistically Frequent Sequential Patterns in Uncertain Databases. Zhou Zhao, Da Yan and Wilfred Ng, The Hong Kong University of Science and Technology.

1 Mining Probabilistically Frequent Sequential Patterns in Uncertain Databases. Zhou Zhao, Da Yan and Wilfred Ng, The Hong Kong University of Science and Technology.

2 Outline: Background; Problem Definition; Sequential-Level U-PrefixSpan; Element-Level U-PrefixSpan; Experiments; Conclusion

4 Background: Uncertain data are inherent in many real-world applications, such as sensor networks and RFID tracking. Example (sensor network), figure labels: Readings: Sensor 1: BC; Sensor 2: AB; Prob. = 0.9; Prob. = 0.1; CBA.

5 Background: Uncertain data are inherent in many real-world applications, such as sensor networks and RFID tracking. Example (RFID tracking with Reader A, Reader B and Reader C): t1: (A, 0.95); t2: (B, 0.95), (C, 0.05).

6 Outline: Background; Problem Definition; Sequential-Level U-PrefixSpan; Element-Level U-PrefixSpan; Experiments; Conclusion

8 Problem Definition

9 Pruning rules for p-FSP

10 Early Validating: Suppose that pattern α is p-frequent on D’ ⊆ D; then α is also p-frequent on D. If α is a p-FSP in D11, then α is a p-FSP in D.
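The early-validating rule can be illustrated with a small sketch. It assumes the usual definition, which the transcript does not spell out: α is p-frequent when Pr[sup(α) ≥ min_sup] ≥ min_prob, with sequences containing α independently of one another. The names `early_validate`, `contain_probs`, `min_sup` and `min_prob` are hypothetical, not taken from the slides.

```python
def early_validate(contain_probs, min_sup, min_prob):
    """Scan sequences one by one and stop as soon as the pattern is
    already p-frequent on the scanned subset D' of D.

    contain_probs[i] is the (assumed independent) probability that
    sequence i contains the pattern. Returns (is_p_frequent, examined).
    """
    if min_sup <= 0:
        return True, 0
    # dist[k] = Pr[support == k] for k < min_sup;
    # dist[min_sup] = Pr[support >= min_sup] on the sequences seen so far.
    dist = [1.0] + [0.0] * min_sup
    examined = 0
    for p in contain_probs:
        examined += 1
        for k in range(min_sup, 0, -1):
            keep = dist[k] if k == min_sup else dist[k] * (1.0 - p)
            dist[k] = keep + dist[k - 1] * p
        dist[0] *= 1.0 - p
        # Adding more sequences can only increase Pr[support >= min_sup],
        # so reaching the threshold on a subset is enough.
        if dist[min_sup] >= min_prob:
            return True, examined
    return False, examined


print(early_validate([0.9, 0.9, 0.5, 0.1], min_sup=2, min_prob=0.8))
```

In this made-up input the pattern is validated after the first two sequences, so the remaining ones need not be examined.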

11 Outline: Background; Problem Definition; Sequential-Level U-PrefixSpan; Element-Level U-PrefixSpan; Experiments; Conclusion

13 Sequence-level probabilistic model
DB:
Sequence ID    Instance       Probability
s1             s1^1 = ABC     1
s2             s2^1 = AB      0.9
s2             s2^2 = BC      0.05
Possible World Space:
Possible World             Probability
pw1 = {s1^1, s2^1}         0.9
pw2 = {s1^1, s2^2}         0.05
pw3 = {s1^1}               0.05
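A minimal sketch of how the possible world space follows from the DB on this slide. It assumes that sequences are independent of each other, that the instances of one sequence are mutually exclusive, and that any leftover probability mass means the sequence is absent. The function name `possible_worlds` and the dictionary layout are hypothetical.

```python
from itertools import product

# DB from the slide: sequence id -> list of (instance, probability).
db = {
    "s1": [("ABC", 1.0)],
    "s2": [("AB", 0.9), ("BC", 0.05)],
}


def possible_worlds(db):
    """Enumerate (world, probability) pairs under the assumptions above."""
    per_sequence = []
    for sid, instances in db.items():
        options = [(sid, inst, p) for inst, p in instances]
        absent = 1.0 - sum(p for _, p in instances)
        if absent > 1e-12:
            # Leftover mass: the sequence does not occur in this world.
            options.append((sid, None, absent))
        per_sequence.append(options)

    worlds = []
    for combo in product(*per_sequence):
        prob = 1.0
        world = {}
        for sid, inst, p in combo:
            prob *= p
            if inst is not None:
                world[sid] = inst
        worlds.append((world, prob))
    return worlds


worlds = possible_worlds(db)
for world, prob in worlds:
    print(world, round(prob, 4))
```

The three worlds produced correspond to pw1, pw2 and pw3, and their probabilities sum to 1.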

14 Prefix-projection of PrefixSpan
D:
SID    Sequence
s1     ABCBC
s2     BABC
s3     AB
s4     BC
D|A (projection of D on A):
SID    Sequence
s1     _BCBC
s2     _BC
s3     _B
D|AB (projection of D|A on B):
SID    Sequence
s1     _CBC
s2     _C
s3     _
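The projection on this slide can be reproduced with a short sketch, assuming each element is a single character so that a sequence can be stored as a string. The helper name `project` is hypothetical; the leading "_" of the slide is implicit, so only the suffix is stored.

```python
def project(db, item):
    """Keep, for every sequence containing `item`, the suffix that
    follows its first occurrence; drop sequences without `item`."""
    projected = {}
    for sid, seq in db.items():
        i = seq.find(item)
        if i != -1:
            projected[sid] = seq[i + 1:]
    return projected


D = {"s1": "ABCBC", "s2": "BABC", "s3": "AB", "s4": "BC"}
D_A = project(D, "A")      # D|A
D_AB = project(D_A, "B")   # D|AB, obtained by projecting D|A on B
print(D_A)
print(D_AB)
```

Projecting the already projected database, rather than rescanning D, is what keeps pattern growth cheap: each step only looks at the remaining suffixes.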

15 P-FSP anti-monotonicity.

16 SeqU-PrefixSpan Algorithm: SeqU-PrefixSpan recursively performs pattern growth from the previous pattern α to the current pattern β = αe by appending a p-frequent element e ∈ D|α. We can stop growing a pattern α as soon as we find that α is p-infrequent.

17 Sequence Projection
si:
Seq-Instance      Prob.
si^1 = ABCBC      0.3
si^2 = BABC       0.2
si^3 = AB         0.4
si^4 = BC         0.1
si|A:
Seq-Instance      Prob.
si^1 = _BCBC      0.3
si^2 = _BC        0.2
si^3 = _B         0.4
si|B:
Seq-Instance      Prob.
si^1 = _CBC       0.3
si^2 = _BC        0.2
si^3 = _          0.4

18 si|A:
Seq-Instance      Prob.
si^1 = _BCBC      0.3
si^2 = _BC        0.2
si^3 = _B         0.4
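A minimal sketch of projecting one uncertain sequence at the sequence level, using the instances of si shown above. It assumes single-character elements and mutually exclusive instances, so that the probability that si contains A is the sum of the probabilities of the instances that survive the projection. The name `project_instances` is hypothetical.

```python
def project_instances(instances, item):
    """Project every instance on `item`, keeping its probability.
    Instances that do not contain `item` are dropped."""
    projected = []
    for seq, prob in instances:
        i = seq.find(item)
        if i != -1:
            projected.append((seq[i + 1:], prob))
    return projected


si = [("ABCBC", 0.3), ("BABC", 0.2), ("AB", 0.4), ("BC", 0.1)]
si_A = project_instances(si, "A")

# Under the mutual-exclusion assumption, Pr[A is contained in si] is the
# total probability of the instances kept in si|A.
pr_contains_A = sum(prob for _, prob in si_A)
print(si_A, pr_contains_A)
```

The instance BC is dropped because it contains no A, which leaves the three rows of si|A.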

19 Outline: Background; Problem Definition; Sequential-Level U-PrefixSpan; Element-Level U-PrefixSpan; Experiments; Conclusion

21 Element-level probabilistic model
DB:
Sequence ID    Probabilistic Elements
s1             s1[1] = {(A,0.95)}, s1[2] = {(B,0.95), (C,0.05)}
s2             s2[1] = {(A,1)}, s2[2] = {(B,1)}
Possible World Space:
Possible World         Probability
pw1 = {B, AB}          0.0475
pw2 = {C, AB}          0.0025
pw3 = {AB, AB}         0.9025
pw4 = {AC, AB}         0.0475

22 Possible world explosion
Probabilistic Elements:
si[1] = {(A,0.7), (B,0.3)}
si[2] = {(B,0.2), (C,0.8)}
si[3] = {(C,0.4), (A,0.6)}
si[4] = {(B,0.1), (A,0.9)}
Seq-Instance         Prob.      Seq-Instance          Prob.
pw1(si) = ABCB       0.0056     pw9(si) = BBCB        0.0024
pw2(si) = ABCA       0.0504     pw10(si) = BBCA       0.0216
pw3(si) = ABAB       0.0084     pw11(si) = BBAB       0.0036
pw4(si) = ABAA       0.0756     pw12(si) = BBAA       0.0324
pw5(si) = ACCB       0.0224     pw13(si) = BCCB       0.0096
pw6(si) = ACCA       0.2016     pw14(si) = BCCA       0.0864
pw7(si) = ACAB       0.0336     pw15(si) = BCAB       0.0144
pw8(si) = ACAA       0.3024     pw16(si) = BCAA       0.1296
The number of possible instances is exponential in the sequence length.
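The table above can be regenerated by brute force, which also makes the blow-up concrete. This sketch assumes the four elements of si are independent and that every position is always occupied (each element's probabilities sum to 1, as they do here). The name `expand` is hypothetical.

```python
from itertools import product

si = [
    {"A": 0.7, "B": 0.3},
    {"B": 0.2, "C": 0.8},
    {"C": 0.4, "A": 0.6},
    {"B": 0.1, "A": 0.9},
]


def expand(seq):
    """Enumerate every instance of an element-level uncertain sequence
    together with its probability (product over positions)."""
    instances = {}
    for combo in product(*(elem.items() for elem in seq)):
        word = "".join(item for item, _ in combo)
        prob = 1.0
        for _, p in combo:
            prob *= p
        instances[word] = instances.get(word, 0.0) + prob
    return instances


instances = expand(si)
print(len(instances))                 # 2 choices per position, 4 positions
print(round(instances["ABCB"], 4))
```

With two choices at each of n positions the enumeration has 2^n entries, which is why full expansion does not scale with sequence length.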

23 ElemU-PrefixSpan Algorithm

24 Sequence Projection
Probabilistic Elements:
si[1] = {(A,0.7), (B,0.3)}
si[2] = {(B,0.2), (C,0.8)}
si[3] = {(C,0.4), (A,0.6)}
si[4] = {(B,0.1), (A,0.9)}
Before projection:
pos    suffix                     Pr.
0      _si[1]si[2]si[3]si[4]      1
After projecting on B:
pos    suffix                     Pr.
1      _si[2]si[3]si[4]
2      _si[3]si[4]
4      _

25 Sequence Projection
Probabilistic Elements:
si[1] = {(A,0.7), (B,0.3)}
si[2] = {(B,0.2), (C,0.8)}
si[3] = {(C,0.4), (A,0.6)}
si[4] = {(B,0.1), (A,0.9)}
pos    suffix                     Pr.
1      _si[2]si[3]si[4]
2      _si[3]si[4]
4      _

26 Sequence Projection
Probabilistic Elements:
si[1] = {(A,0.7), (B,0.3)}
si[2] = {(B,0.2), (C,0.8)}
si[3] = {(C,0.4), (A,0.6)}
si[4] = {(B,0.1), (A,0.9)}
Before projecting on A:
pos    suffix                     Pr.
1      _si[2]si[3]si[4]
2      _si[3]si[4]
4      _
After projecting on A:
pos    suffix                     Pr.
3      _si[4]

27 Sequence Projection
Probabilistic Elements:
si[1] = {(A,0.7), (B,0.3)}
si[2] = {(B,0.2), (C,0.8)}
si[3] = {(C,0.4), (A,0.6)}
si[4] = {(B,0.1), (A,0.9)}
Before projecting on A:
pos    suffix                     Pr.
1      _si[2]si[3]si[4]
2      _si[3]si[4]
4      _
After projecting on A:
pos    suffix                     Pr.
3      _si[4]
4      _                          0.1584
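One way to read the pos/Pr. tables is sketched below; it is a reconstruction, not the authors' code. It assumes the elements of si are independent and that Pr. at position pos is the probability that the earliest match of the current pattern ends exactly at pos. Only the value 0.1584 appears in the transcript; the other probabilities are whatever this sketch computes. The name `project` and the dictionary layout are hypothetical.

```python
from collections import defaultdict

si = [
    {"A": 0.7, "B": 0.3},
    {"B": 0.2, "C": 0.8},
    {"C": 0.4, "A": 0.6},
    {"B": 0.1, "A": 0.9},
]


def project(seq, projection, item):
    """Extend a projection {pos: Pr.} by one more pattern element.

    For each entry (pos, pr), the next match ends at position j > pos
    when positions pos+1 .. j-1 are not `item` and position j is.
    """
    out = defaultdict(float)
    for pos, pr in projection.items():
        miss = 1.0  # probability that `item` has not appeared since pos
        for j in range(pos, len(seq)):  # 0-based index j is position j + 1
            p_item = seq[j].get(item, 0.0)
            if p_item > 0.0:
                out[j + 1] += pr * miss * p_item
            miss *= 1.0 - p_item
    return dict(out)


root = {0: 1.0}                    # pos 0, Pr. 1: nothing matched yet
proj_b = project(si, root, "B")    # rows pos 1, 2, 4
proj_ba = project(si, proj_b, "A") # rows pos 3, 4
print(proj_b)
print(proj_ba)
```

Under these assumptions the entry for pos 4 after projecting on A comes out as 0.3 × 0.4 × 0.9 + 0.14 × 0.4 × 0.9 = 0.1584, matching the value on the slide, and no instance of si is ever enumerated explicitly.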

29 Outline: Background; Problem Definition; Sequential-Level U-PrefixSpan; Element-Level U-PrefixSpan; Experiments; Conclusion

31 Efficiency of SeqU-PrefixSpan: efficiency with respect to the size of the database, the number of seq-instances, and the length of sequences.

32 Efficiency of ElemU-PrefixSpan: efficiency with respect to the size of the database, the number of element-instances, and the length of sequences.

33 ElemU-PrefixSpan vs. Full Expansion: efficiency with respect to the size of the database, the number of element-instances, and the length of sequences.

34 Outline: Background; Problem Definition; Sequential-Level U-PrefixSpan; Element-Level U-PrefixSpan; Experiments; Conclusion

36 We formulate the problem of mining p-FSPs in uncertain databases. We propose two new U-PrefixSpan algorithms to mine p-FSPs from data that conform to our probabilistic models. Experiments show that our algorithms effectively avoid the problem of “possible world explosion”.

37 Thank you!

