

Temporal Database Paper Reading, R95922007, 馬智釗 (first-year master's student, Computer Science and Information Engineering). Paper: Efficient Mining Strategy for Frequent Serial Episodes in Temporal Database, K. Huang, C. Chang.




1 Temporal Database Paper Reading, R95922007, 馬智釗 (first-year master's student, Computer Science and Information Engineering). Paper: Efficient Mining Strategy for Frequent Serial Episodes in Temporal Database, K. Huang, C. Chang

2 Introduction
Discover frequent serial episodes to find relationships between events, in order to:
- explain the problems that cause a particular event
- predict future results
Episode: a partially ordered collection of events occurring together.
- the user defines "how close is close enough"
- win: the width of the time window

3 Three classes of episodes
Introduced by Mannila et al.
Serial episodes
- patterns with a total order in the sequence
Parallel episodes
- no constraints on the relative order of events
Composite episodes
- serial combinations of parallel episodes
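To make the serial/parallel distinction concrete, here is a minimal Python sketch over a simple sequence stored as (event, time) pairs sorted by time. The names `contains_serial` and `contains_parallel` and the pair representation are hypothetical illustrations, not taken from the paper, and composite episodes are not covered.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical representation: a simple sequence as (event_type, time) pairs sorted by time.
Sequence = List[Tuple[str, int]]


def contains_serial(seq: Sequence, episode: Tuple[str, ...]) -> bool:
    """Serial episode: its events must appear in this exact order, at strictly increasing times."""
    k = 0
    last_time = None
    for event, t in seq:
        # Greedily match the next episode event as early as possible.
        if k < len(episode) and event == episode[k] and (last_time is None or t > last_time):
            last_time = t
            k += 1
    return k == len(episode)


def contains_parallel(seq: Sequence, episode: Tuple[str, ...]) -> bool:
    """Parallel episode: only the multiset of event types matters, not their order."""
    have = Counter(event for event, _ in seq)
    need = Counter(episode)
    return all(have[e] >= n for e, n in need.items())


window = [("A", 3), ("B", 4)]
print(contains_serial(window, ("A", "B")))    # True
print(contains_serial(window, ("B", "A")))    # False: wrong order
print(contains_parallel(window, ("B", "A")))  # True: order is ignored
```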

4 Examples: episodes

5 Algorithms (old)
Presented by Mannila et al. for finding parallel and serial episodes that are frequent enough.
WINEPI
- considers the support of an episode
MINEPI
- considers the number of minimal occurrences of an episode

6 WINEPI
Consider the sequence S = A3 A4 B5 B6 (A3 means event A at time 3).
Support: the number of sliding windows of width win that contain the episode.
Given win = 3, there are six windows:
W1 = A3, W2 = A3 A4, W3 = A3 A4 B5, W4 = A4 B5 B6, W5 = B5 B6, W6 = B6.
For example, the serial episode ⟨A, B⟩ (A followed by B) is supported by two windows, W3 and W4.
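A minimal Python sketch of WINEPI-style support counting on the example above. It assumes that windows are [start, start + win - 1] with start running from (first time - win + 1) to the last time, so that windows partially overlapping either end are counted; this reproduces the six windows listed on the slide. The function names are hypothetical.

```python
from typing import List, Tuple

Sequence = List[Tuple[str, int]]  # (event_type, time), sorted by time


def contains_serial(seq: Sequence, episode: Tuple[str, ...]) -> bool:
    """True if the events of `episode` occur in order at strictly increasing times."""
    k, last_time = 0, None
    for event, t in seq:
        if k < len(episode) and event == episode[k] and (last_time is None or t > last_time):
            last_time, k = t, k + 1
    return k == len(episode)


def sliding_windows(seq: Sequence, win: int) -> List[Sequence]:
    """Every window [start, start + win - 1] that overlaps the sequence (assumed rule)."""
    first, last = seq[0][1], seq[-1][1]
    return [
        [(e, t) for e, t in seq if start <= t <= start + win - 1]
        for start in range(first - win + 1, last + 1)
    ]


def winepi_support(seq: Sequence, episode: Tuple[str, ...], win: int) -> int:
    """WINEPI support: number of windows that contain the serial episode."""
    return sum(contains_serial(w, episode) for w in sliding_windows(seq, win))


S = [("A", 3), ("A", 4), ("B", 5), ("B", 6)]
print(len(sliding_windows(S, 3)))        # 6 windows, W1..W6
print(winepi_support(S, ("A", "B"), 3))  # 2 (W3 and W4)
```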

7 MINEPI
Consider the sequence S = A3 A4 B5 B6.
Minimal occurrence (mo) of an episode α: an interval that contains α such that no proper sub-interval contains α.
⟨A⟩ has mo support 2
- intervals [3,3] and [4,4]
⟨A, B⟩ has mo support 1
- interval [4,5]
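A minimal Python sketch of minimal occurrences for a serial episode in a simple sequence. The reasoning it relies on: a minimal occurrence must start at an occurrence of the first event and end at the earliest completion from that start, so we build one candidate per start and then drop any candidate that strictly contains another. Function names are hypothetical; this is not the paper's implementation.

```python
from typing import List, Optional, Tuple

Sequence = List[Tuple[str, int]]  # (event_type, time), sorted by time
Interval = Tuple[int, int]


def earliest_end(seq: Sequence, start: int, episode: Tuple[str, ...]) -> Optional[int]:
    """Earliest time at which `episode` completes when its first event is seq[start]."""
    last_time, k = seq[start][1], 1
    for event, t in seq[start + 1:]:
        if k == len(episode):
            break
        if event == episode[k] and t > last_time:
            last_time, k = t, k + 1
    return last_time if k == len(episode) else None


def minimal_occurrences(seq: Sequence, episode: Tuple[str, ...]) -> List[Interval]:
    """Intervals containing the serial episode such that no proper sub-interval does."""
    candidates = set()
    for i, (event, t) in enumerate(seq):
        if event == episode[0]:
            end = earliest_end(seq, i, episode)
            if end is not None:
                candidates.add((t, end))
    return sorted(
        (s, e) for s, e in candidates
        if not any((s2, e2) != (s, e) and s <= s2 and e2 <= e for s2, e2 in candidates)
    )


S = [("A", 3), ("A", 4), ("B", 5), ("B", 6)]
print(minimal_occurrences(S, ("A",)))      # [(3, 3), (4, 4)] -> mo support 2
print(minimal_occurrences(S, ("A", "B")))  # [(4, 5)]         -> mo support 1
```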

8 Complex sequences
Several events may occur at one time point.
Example: a temporal database is a complex sequence with temporal attributes.
[Figure: example complex sequence with several events per time point]

9 Algorithms (new)
Extend the algorithm to deal with complex sequences.
MINEPI+
- depth-first enumeration that generates the frequent episodes with equalJoin and temporalJoin
EMMA (Episodes Mining using Memory Anchor)
- utilizes memory anchors to accelerate the mining task

10 More about MINEPI
Breadth-first manner
- enumerates longer episodes from shorter ones
Parameters
- maxwin: maximum window width for an episode
- minsup: minimum support for an episode to be considered frequent
Temporal join
- connects events at different times

11 Example: MINEPI
S = A1 A2 B3 A4 B5, maxwin = 4, minsup = 2
Find the frequent 1-episodes first
- mo(A) = {[1,1],[2,2],[4,4]}, mo(B) = {[3,3],[5,5]}
Temporal join with maxwin = 4
- possible occurrences of ⟨A, B⟩: [1,3], [2,3], [2,5], [4,5]
- mo(⟨A, B⟩) = {[2,3],[4,5]} (keep only the minimal ones)
- support(⟨A, B⟩) = {[1,4],[2,5],[4,5]}
- support count = 3, counting distinct start points
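The temporal join in this example can be sketched in Python as below. The window rule is an assumption inferred from the example: an interval [ts, te] fits when te - ts < maxwin (it covers at most maxwin time points), which is why [1,5] is not a possible occurrence for maxwin = 4. The helper names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]


def temporal_join(occ_a: List[Interval], occ_b: List[Interval], maxwin: int) -> List[Interval]:
    """Append event B after each occurrence of A, keeping the span within maxwin.

    Assumed window rule: te - ts < maxwin, i.e. at most maxwin time points.
    """
    return sorted(
        (ts, tb)
        for ts, te in occ_a
        for tb, _ in occ_b
        if te < tb and tb - ts < maxwin
    )


def keep_minimal(occ: List[Interval]) -> List[Interval]:
    """Drop every interval that strictly contains another one."""
    return [
        (s, e) for s, e in occ
        if not any((s2, e2) != (s, e) and s <= s2 and e2 <= e for s2, e2 in occ)
    ]


mo_a = [(1, 1), (2, 2), (4, 4)]
mo_b = [(3, 3), (5, 5)]
occ_ab = temporal_join(mo_a, mo_b, maxwin=4)
print(occ_ab)                         # [(1, 3), (2, 3), (2, 5), (4, 5)]
print(keep_minimal(occ_ab))           # [(2, 3), (4, 5)]
print(len({ts for ts, _ in occ_ab}))  # 3 distinct start points
```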

12 MINEPI+
Must deal with complex sequences.
Depth-first manner, to save memory
Equal join
- connects events at the same time point
Bound list
- for a serial episode P: $\{[ts_i, te_i] : S \text{ contains } P \text{ within } [ts_i, te_i]\}$
- for an event Y: $\{[t_i, t_i] : S \text{ contains } Y \text{ at time } t_i\}$
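A minimal Python sketch of event bound lists in a complex sequence, assuming the sequence is stored as a dict from time point to the set of events at that time. The toy data and function names are hypothetical (the slide's own example figure is not reproduced here); `itemset_boundlist` shows the same idea for several events required at one time point.

```python
from typing import Dict, FrozenSet, List, Tuple

Interval = Tuple[int, int]
# Hypothetical toy complex sequence: time point -> set of events occurring at that time.
ComplexSequence = Dict[int, FrozenSet[str]]


def event_boundlist(seq: ComplexSequence, event: str) -> List[Interval]:
    """Bound list of a single event Y: {[t, t] : S contains Y at time t}."""
    return [(t, t) for t in sorted(seq) if event in seq[t]]


def itemset_boundlist(seq: ComplexSequence, itemset: FrozenSet[str]) -> List[Interval]:
    """Same idea for several events that must all occur at one time point."""
    return [(t, t) for t in sorted(seq) if itemset <= seq[t]]


S = {1: frozenset("AD"), 2: frozenset("B"), 3: frozenset("AB"), 4: frozenset("E")}
print(event_boundlist(S, "A"))                # [(1, 1), (3, 3)]
print(itemset_boundlist(S, frozenset("AB")))  # [(3, 3)]
```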

13 Example: bound list
maxwin = 4
Bound list of a serial episode: {[1,4],[3,6]}
Bound list of an event: {[4,4],[6,6]}
[Figure: complex sequence over time points 1 to 8]

14 Operations
Given a serial episode P and an event f:
- $P.\mathit{boundlist} = \{[ts_1, te_1], \ldots, [ts_n, te_n]\}$
- $f.\mathit{boundlist} = \{[ts'_1, ts'_1], \ldots, [ts'_m, ts'_m]\}$
Equal join: $P_1 = P \odot f$ (f occurs at the same time as the last element of P)
- $P_1.\mathit{boundlist}$ consists of the $[ts_i, te_i]$ such that $te_i = ts'_j$ for some $j$ ($1 \le j \le m$)
Temporal join: $P_2 = P \cdot f$ (f occurs after P)
- $P_2.\mathit{boundlist}$ consists of the $[ts_i, ts'_j]$ such that $te_i < ts'_j$ and $ts'_j - ts_i < \mathit{maxwin}$ for some $j$ ($1 \le j \le m$)
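Both operations can be sketched directly on bound lists in Python. The temporal-join window test (te < t' and t' - ts < maxwin) is an assumption consistent with the worked examples in this deck; the bound lists below are hypothetical toy data.

```python
from typing import List, Tuple

Interval = Tuple[int, int]


def equal_join(p_bl: List[Interval], f_bl: List[Interval]) -> List[Interval]:
    """Equal join of P with f: keep bounds of P whose end time is also a time of f."""
    f_times = {t for t, _ in f_bl}
    return [(ts, te) for ts, te in p_bl if te in f_times]


def temporal_join(p_bl: List[Interval], f_bl: List[Interval], maxwin: int) -> List[Interval]:
    """Temporal join of P with f: extend each bound of P to a later occurrence of f.

    Window rule assumed here: te < t' and t' - ts < maxwin.
    """
    return [
        (ts, tf)
        for ts, te in p_bl
        for tf, _ in f_bl
        if te < tf and tf - ts < maxwin
    ]


p_bl = [(1, 1), (3, 3)]          # hypothetical bound list of P
f_bl = [(2, 2), (3, 3), (5, 5)]  # hypothetical bound list of f
print(equal_join(p_bl, f_bl))               # [(3, 3)]
print(temporal_join(p_bl, f_bl, maxwin=3))  # [(1, 2), (1, 3), (3, 5)]
```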

15 Drawbacks of MINEPI+
Huge number of combinations
- with |I| frequent 1-episodes, O(|I|^2) checks are needed for temporal joins and equal joins
Unnecessary joins
- temporal joins for a prefix should be skipped if the number of extendable matching bounds < minsup × |TDB|
Duplicate joins
- the same joins are repeated; an episode in the example needs 4+1 joins

16 EMMA
Divided into three phases:
(I) Mine the frequent itemsets in the complex sequence.
(II) Encode each frequent itemset with a unique ID, and construct an encoded horizontal database.
(III) Mine episodes in the encoded database.
Depth-first search
Memory anchor
- uses the boundlists to access information
- the timelists of frequent itemsets serve as their boundlists
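Phase (II) can be sketched in Python as follows, assuming phase (I) has already produced the list of frequent itemsets. IDs are assigned in list order starting at 1; the toy data, the dict-based layout, and the name `encode` are hypothetical. The returned timelists are what the slide calls memory anchors (the boundlists of the 1-tuple episodes).

```python
from typing import Dict, FrozenSet, List, Tuple

ComplexSequence = Dict[int, FrozenSet[str]]  # time point -> events at that time


def encode(seq: ComplexSequence, frequent: List[FrozenSet[str]]
           ) -> Tuple[Dict[int, List[int]], Dict[int, List[int]]]:
    """Give each frequent itemset an ID (1, 2, ...) and rewrite the data with IDs.

    Returns (encoded horizontal database: time -> IDs present at that time,
             timelists: ID -> sorted times at which the itemset occurs).
    """
    ids = {itemset: i for i, itemset in enumerate(frequent, start=1)}
    encoded: Dict[int, List[int]] = {}
    timelists: Dict[int, List[int]] = {i: [] for i in ids.values()}
    for t in sorted(seq):
        present = [i for itemset, i in ids.items() if itemset <= seq[t]]
        encoded[t] = present
        for i in present:
            timelists[i].append(t)
    return encoded, timelists


S = {1: frozenset("AB"), 2: frozenset("A"), 3: frozenset("BC")}
frequent = [frozenset("A"), frozenset("B"), frozenset("AB")]  # assumed output of phase I
encoded, timelists = encode(S, frequent)
print(encoded)    # {1: [1, 2, 3], 2: [1], 3: [2]}
print(timelists)  # {1: [1, 2], 2: [1, 3], 3: [1]}
```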

17 Example: database (minsup = 5)

18 Combine episodes
Only combine existing episodes with a "local" frequent 1-tuple episode
- overcomes the huge number of generated candidates
Projected boundlist (PBL)
- episode #3 has boundlist {[1,1],[2,2],[4,4],[8,8],[11,11],[14,14],[15,15]}
- given maxwin = 4, its projected boundlist is {[2,4],[3,5],[5,7],[9,11],[12,14],[15,16],[16,16]}
- each bound [ts, te] is projected to [te + 1, min(ts + maxwin − 1, |TDB|)], the range where the next event may occur
- note that |TDB| = 16

19 Example: PBL
#3.timelist = {1,2,4,8,11,14,15}, with maxwin = 4 and |TDB| = 16:
1 → [2,4]
2 → [3,5]
4 → [5,7]
8 → [9,11]
11 → [12,14]
14 → [15,16]
15 → [16,16]
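A minimal Python sketch of the projection, assuming the rule [te + 1, min(ts + maxwin - 1, |TDB|)] that reproduces every PBL in this deck; empty projections are dropped. The sketch keeps ts next to each projected range so that a later join can rebuild the bound [ts, t']. The function name is hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]
Projected = Tuple[int, int, int]  # (ts of the original bound, lo, hi)


def projected_boundlist(bl: List[Interval], maxwin: int, tdb_len: int) -> List[Projected]:
    """Project each bound [ts, te] to [te + 1, min(ts + maxwin - 1, |TDB|)].

    Empty projections (lo > hi) are dropped; ts is kept for later joins.
    """
    out = []
    for ts, te in bl:
        lo, hi = te + 1, min(ts + maxwin - 1, tdb_len)
        if lo <= hi:
            out.append((ts, lo, hi))
    return out


timelist_3 = [1, 2, 4, 8, 11, 14, 15]
pbl = projected_boundlist([(t, t) for t in timelist_3], maxwin=4, tdb_len=16)
print([(lo, hi) for _, lo, hi in pbl])
# [(2, 4), (3, 5), (5, 7), (9, 11), (12, 14), (15, 16), (16, 16)]
```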

20 Local frequent ID
A local frequent ID has a boundlist whose bounds fall into another episode's PBL.
- #3.PBL = {[2,4],[3,5],[5,7],[9,11],[12,14],[15,16],[16,16]}
- #4.BL = {[3,3],[5,5],[6,6],[9,9],[12,12],[13,13],[16,16]}
Record the boundlist of an ID while examining it
- the boundlist is then available immediately at the temporal join
- ⟨#3, #4⟩.boundlist = {[1,3],[2,3],[4,5],[8,9],[11,12],[14,16],[15,16]}

21 Example: temporal join
#4.BL = {[3,3],[5,5],[6,6],[9,9],[12,12],[13,13],[16,16]}
Recall the construction of #3.PBL:
1 → [2,4]: [3,3] falls in it
2 → [3,5]: [3,3] falls in it (take the minimal one)
4 → [5,7]: [5,5] falls in it
8 → [9,11]: [9,9] falls in it
11 → [12,14]: [12,12] falls in it
14 → [15,16]: [16,16] falls in it
15 → [16,16]: [16,16] falls in it
Result: {[1,3],[2,3],[4,5],[8,9],[11,12],[14,16],[15,16]}
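The PBL-based temporal join above can be sketched in Python: for each projected range, binary-search the ID's sorted timelist for the earliest time inside the range. Two assumptions are labeled here: an ID counts as "local frequent" when the number of matched ranges reaches minsup, and minsup is treated as an absolute count, as in this deck's examples. All names are hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple

Interval = Tuple[int, int]
Projected = Tuple[int, int, int]  # (ts, lo, hi)


def projected_boundlist(bl: List[Interval], maxwin: int, tdb_len: int) -> List[Projected]:
    """Project [ts, te] to [te + 1, min(ts + maxwin - 1, |TDB|)], dropping empty ranges."""
    out = []
    for ts, te in bl:
        lo, hi = te + 1, min(ts + maxwin - 1, tdb_len)
        if lo <= hi:
            out.append((ts, lo, hi))
    return out


def pbl_temporal_join(pbl: List[Projected], timelist: List[int]) -> List[Interval]:
    """For each projected range, take the earliest time of the ID inside it."""
    out = []
    for ts, lo, hi in pbl:
        k = bisect_left(timelist, lo)  # first time >= lo (timelist must be sorted)
        if k < len(timelist) and timelist[k] <= hi:
            out.append((ts, timelist[k]))
    return out


def is_local_frequent(pbl: List[Projected], timelist: List[int], min_count: int) -> bool:
    """Assumed rule: enough projected ranges are matched by the ID's timelist."""
    return len(pbl_temporal_join(pbl, timelist)) >= min_count


pbl_3 = projected_boundlist([(t, t) for t in [1, 2, 4, 8, 11, 14, 15]], maxwin=4, tdb_len=16)
timelist_4 = [3, 5, 6, 9, 12, 13, 16]
print(pbl_temporal_join(pbl_3, timelist_4))
# [(1, 3), (2, 3), (4, 5), (8, 9), (11, 12), (14, 16), (15, 16)]
print(is_local_frequent(pbl_3, timelist_4, min_count=5))  # True
```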

22 Procedure: emmajoin
Recursively extend the episodes
- until no more serial episodes can be extended
Avoid the unnecessary checking of MINEPI+
- stop when the number of extendable bounds for a serial episode is less than minsup × |TDB|
Example: #2
- #2.BL = {[3,3],[6,6],[9,9],[12,12],[16,16]}
- #2.PBL = {[4,6],[7,9],[10,12],[13,15]} (|TDB| = 16)
- #2 does not need to be extended if minsup = 5, since its PBL has only 4 bounds
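The pruning test can be sketched as a single count in Python: a bound [ts, te] is extendable when its projected range [te + 1, min(ts + maxwin - 1, |TDB|)] is non-empty. As in the #2 example, the threshold is treated as an absolute count (`min_count`, a hypothetical name) rather than a fraction of |TDB|.

```python
from typing import List, Tuple

Interval = Tuple[int, int]


def extendable_bounds(bl: List[Interval], maxwin: int, tdb_len: int) -> int:
    """Number of bounds whose projection [te + 1, min(ts + maxwin - 1, |TDB|)] is non-empty."""
    return sum(1 for ts, te in bl if te + 1 <= min(ts + maxwin - 1, tdb_len))


def should_extend(bl: List[Interval], maxwin: int, tdb_len: int, min_count: int) -> bool:
    """Pruning rule: skip all temporal joins when too few bounds can still be extended."""
    return extendable_bounds(bl, maxwin, tdb_len) >= min_count


bl_2 = [(3, 3), (6, 6), (9, 9), (12, 12), (16, 16)]
print(extendable_bounds(bl_2, maxwin=4, tdb_len=16))           # 4
print(should_extend(bl_2, maxwin=4, tdb_len=16, min_count=5))  # False
```

The bound [16,16] is not extendable because its projection would start at 17, past |TDB| = 16.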

23 Example: emmajoin
#3.BL = {[1,1],[4,4],[8,8],[11,11],[14,14],[15,15]}
#7.BL = {[1,1],[4,4],[8,8],[11,11],[14,14]}
#9.BL = {[3,3],[6,6],[9,9],[12,12],[16,16]}
Call emmajoin to extend each 1-tuple episode
#3.PBL = {[2,4],[5,7],[9,11],[12,14],[15,16],[16,16]}
Find the local frequent IDs in #3.PBL.

24 Example: emmajoin (cont.)
minsup = 5, maxwin = 4
By temporal join:
- ⟨#3, …⟩.BL = {}
- ⟨#3, #3⟩.BL = {[1,4],[8,11],[11,14],[14,15]}
- ⟨#3, …⟩.BL = {}
- ⟨#3, #7⟩.BL = {[1,4],[8,11],[11,14]}
- ⟨#3, #9⟩.BL = {[1,3],[4,6],[8,9],[11,12],[14,16]}
⟨#3, #9⟩ is generated from prefix #3
- recursively call emmajoin to extend it
- ⟨#3, #9⟩.PBL = {[4,4],[7,7],[10,11],[13,14]}
- there are no local frequent IDs, since minsup = 5 and the PBL has only 4 bounds
Back to calling emmajoin for episode #7.
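A minimal depth-first sketch of the emmajoin recursion in Python, run on the #3/#7/#9 timelists from the previous slide. It is not the paper's implementation: it simply tries every ID against the prefix's PBL instead of maintaining memory anchors, treats minsup as an absolute count (`min_count`), and keeps one bound per start time. Because of that last choice it also keeps [15,16] for the ⟨#3, #9⟩ join, which the slide's five-bound list omits; either way that episode reaches minsup = 5 and its PBL of 4 bounds stops the recursion. All names are hypothetical.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

Interval = Tuple[int, int]
Episode = Tuple[int, ...]


def projected_boundlist(bl: List[Interval], maxwin: int, tdb_len: int):
    """(ts, lo, hi) with [lo, hi] = [te + 1, min(ts + maxwin - 1, |TDB|)], empty ones dropped."""
    out = []
    for ts, te in bl:
        lo, hi = te + 1, min(ts + maxwin - 1, tdb_len)
        if lo <= hi:
            out.append((ts, lo, hi))
    return out


def pbl_temporal_join(pbl, timelist: List[int]) -> List[Interval]:
    """Earliest time of the ID inside each projected range."""
    out = []
    for ts, lo, hi in pbl:
        k = bisect_left(timelist, lo)
        if k < len(timelist) and timelist[k] <= hi:
            out.append((ts, timelist[k]))
    return out


def emmajoin(prefix: Episode, bl: List[Interval], timelists: Dict[int, List[int]],
             maxwin: int, tdb_len: int, min_count: int,
             out: Dict[Episode, List[Interval]]) -> None:
    """Depth-first extension of `prefix` by temporal joins."""
    pbl = projected_boundlist(bl, maxwin, tdb_len)
    if len(pbl) < min_count:  # too few extendable bounds: prune
        return
    for eid, times in timelists.items():
        new_bl = pbl_temporal_join(pbl, times)
        if len(new_bl) >= min_count:  # eid is local frequent for this prefix
            episode = prefix + (eid,)
            out[episode] = new_bl
            emmajoin(episode, new_bl, timelists, maxwin, tdb_len, min_count, out)


def mine(timelists: Dict[int, List[int]], maxwin: int, tdb_len: int,
         min_count: int) -> Dict[Episode, List[Interval]]:
    """Start emmajoin from every frequent 1-tuple episode."""
    out: Dict[Episode, List[Interval]] = {}
    for eid, times in timelists.items():
        if len(times) >= min_count:
            bl = [(t, t) for t in times]
            out[(eid,)] = bl
            emmajoin((eid,), bl, timelists, maxwin, tdb_len, min_count, out)
    return out


timelists = {3: [1, 4, 8, 11, 14, 15], 7: [1, 4, 8, 11, 14], 9: [3, 6, 9, 12, 16]}
result = mine(timelists, maxwin=4, tdb_len=16, min_count=5)
print(sorted(result))  # [(3,), (3, 9), (7,), (7, 9), (9,)]
```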

25 Experiments
On a dataset composed of 10 stocks.
Parameters: maxwin / minsup
- running time increases as maxwin increases
- running time increases as minsup decreases
- in both cases because the number of frequent episodes increases
EMMA runs faster than MINEPI+.
MINEPI+ uses less memory than EMMA.
- EMMA needs more memory as minsup decreases

26 Conclusion
Modify MINEPI into MINEPI+
- for mining episodes in a complex sequence
Propose EMMA
- avoids the drawbacks of MINEPI+
EMMA is more efficient than MINEPI+.
Future work
- only serial episodes are discussed
- parallel and composite episodes remain to be solved




