Discovering Calendar-based Temporal Association Rules. SHOU Yu Tao, May 21st, 2003. TIME '01, 8th International Symposium on Temporal Representation and Reasoning


1 Discovering Calendar-based Temporal Association Rules • SHOU Yu Tao • May 21st, 2003 • TIME '01, 8th International Symposium on Temporal Representation and Reasoning

2 Outline of the Presentation • Background • Temporal Association Rule Mining w.r.t. Precise Match • Temporal Association Rule Mining w.r.t. Fuzzy Match • Experiments • Conclusions • References • Q & A

3 Background • Temporal association rule: an association rule together with its temporal interval. E.g., "turkey ⇒ pumpkin pie" is a temporal association rule with the temporal interval "within the week before Thanksgiving". • Why mine temporal association rules? Different association rules may hold in different time intervals: some rules hold during some intervals but not during others, and this may reveal useful information.

4 Calendar Schema • A relational schema $R = (f_n : D_n, f_{n-1} : D_{n-1}, \ldots, f_1 : D_1)$ together with a constraint valid, where $f_i$ is a calendar unit name such as year or month, and $D_i$ is a finite subset of the positive integers. • The constraint valid is a Boolean function on $D_n \times \cdots \times D_1$ that specifies which combinations of values in $D_n \times \cdots \times D_1$ are "valid".

5 Calendar Schema • E.g., a calendar schema (year: {1995, 1996, …, 2002}, month: {1, 2, …, 12}, day: {1, 2, …, 31}) with the constraint valid that evaluates to True only if the combination gives a valid date, so a combination naming a nonexistent date is not valid. • Simply stated, a calendar schema is determined by a hierarchy of calendar concepts, e.g., (year, month, day).
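As a rough illustration, the example schema and its valid constraint can be sketched in Python. This assumes valid is implemented by letting the standard library's `datetime.date` reject nonexistent dates; the names `SCHEMA`, `valid` and `basic_time_intervals` are hypothetical, not taken from the paper.

```python
from datetime import date
from itertools import product

# Hypothetical encoding of the example schema: each calendar unit name f_i
# maps to its finite domain D_i.
SCHEMA = {
    "year": range(1995, 2003),  # {1995, ..., 2002}
    "month": range(1, 13),      # {1, ..., 12}
    "day": range(1, 32),        # {1, ..., 31}
}


def valid(year: int, month: int, day: int) -> bool:
    """Constraint 'valid': True only if the combination is a real date."""
    try:
        date(year, month, day)  # raises ValueError for e.g. April 31
    except ValueError:
        return False
    return True


def basic_time_intervals() -> list:
    """All valid combinations in D_year x D_month x D_day."""
    return [combo for combo in product(*SCHEMA.values()) if valid(*combo)]


if __name__ == "__main__":
    print(valid(2000, 2, 29))           # True (2000 is a leap year)
    print(valid(1999, 2, 29))           # False
    print(len(basic_time_intervals()))  # 2922, one per day of 1995-2002
```

Filtering the full Cartesian product with valid mirrors the definition directly; a larger schema would more sensibly generate only the valid combinations.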

6 Calendar Pattern • A calendar pattern defines a set of time intervals based on the calendar schema. • E.g., ⟨2000, *, 16⟩ is a calendar pattern based on the calendar schema (year, month, day); it corresponds to the time intervals consisting of the 16th day of every month in 2000. • Time intervals or periodic cycles can be easily described by calendar patterns with appropriate calendar schemas. E.g., the periodic cycle "every seven days" can be expressed by a calendar pattern ⟨*, i⟩, where $1 \le i \le 7$, under the calendar schema R = (week, day), with i depending on the day on which the cycle starts.
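A minimal sketch of how a calendar pattern with wildcards could be matched against basic time intervals, assuming patterns are Python tuples and the wildcard is the string `"*"`; `covers`, `star_count` and `days_of_year` are hypothetical helper names.

```python
from datetime import date, timedelta
from typing import Iterator, Sequence, Tuple, Union

STAR = "*"                 # wildcard symbol
Field = Union[int, str]    # a pattern field: an integer or STAR


def covers(pattern: Sequence[Field], interval: Sequence[int]) -> bool:
    """True if every non-wildcard field of the pattern equals the interval's."""
    return len(pattern) == len(interval) and all(
        p == STAR or p == v for p, v in zip(pattern, interval)
    )


def star_count(pattern: Sequence[Field]) -> int:
    """Number of wildcards; 0 means the pattern is a basic time interval."""
    return sum(1 for p in pattern if p == STAR)


def days_of_year(year: int) -> Iterator[Tuple[int, int, int]]:
    """Basic time intervals (year, month, day) of one year, in order."""
    d = date(year, 1, 1)
    while d.year == year:
        yield (d.year, d.month, d.day)
        d += timedelta(days=1)


if __name__ == "__main__":
    pattern = (2000, STAR, 16)  # the 16th day of every month in 2000
    covered = [t for t in days_of_year(2000) if covers(pattern, t)]
    print(len(covered), covered[0], covered[-1])
    # 12 (2000, 1, 16) (2000, 12, 16)

    # "Every seven days" under (week, day): one 1-star pattern per start day i.
    weekly = [(STAR, i) for i in range(1, 8)]
    print(covers(weekly[2], (4, 3)))  # True: <*, 3> covers day 3 of week 4
```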

7 Problem Formulation • Given a calendar schema R, a set T of timestamped transactions and (optionally) a match ratio, we want to discover all interesting temporal association rules w.r.t. precise match and w.r.t. fuzzy match. • Assumption: we are not interested in association rules that hold only during basic time intervals, since such rules reveal little interesting information about time. E.g., if the calendar schema is (year, month, day), we do not look for association rules that hold during a single day. • Basic time interval: a calendar pattern with no wildcard symbol.

8 Problem Formulation • Temporal association rule w.r.t. precise match: given a calendar schema R and a set T of timestamped transactions, a temporal association rule (r, e) holds if and only if the association rule r holds in every basic time interval t covered by the star calendar pattern e. • Star calendar pattern: a calendar pattern with at least one wildcard symbol. • E.g., given the calendar schema (year, month, Thursday), we may have a temporal association rule (turkey ⇒ pumpkin pie, ⟨*, 11, 4⟩) that holds w.r.t. precise match. It means that the association rule turkey ⇒ pumpkin pie holds on every Thanksgiving Day, i.e., the 4th Thursday of November in every year.

9 Problem Formulation • Temporal association rule w.r.t. fuzzy match: given a calendar schema R, a set T of timestamped transactions and a match ratio m, a temporal association rule (r, e) holds if and only if the association rule r holds in at least 100m% of the basic time intervals t covered by the star calendar pattern e. • E.g., given the calendar schema (year, month, Thursday) and match ratio m = 0.8, we may have a temporal association rule (turkey ⇒ pumpkin pie, ⟨*, 11, 4⟩) that holds w.r.t. fuzzy match. This means that the association rule turkey ⇒ pumpkin pie holds on at least 80% of Thanksgiving Days.
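The difference between the two match modes can be sketched as follows, assuming we already know, for each basic time interval covered by e, whether rule r held there. The observations in the usage example are invented for illustration, and `nth_weekday`, `holds_precise` and `holds_fuzzy` are hypothetical names; `nth_weekday` only shows which date the basic interval ⟨year, 11, 4⟩ denotes under the (year, month, Thursday) schema.

```python
from datetime import date
from fractions import Fraction


def nth_weekday(year: int, month: int, weekday: int, n: int) -> date:
    """Date of the n-th given weekday (Monday=0 ... Sunday=6) of a month."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7
    return date(year, month, 1 + offset + 7 * (n - 1))


def holds_precise(holds: dict) -> bool:
    """Precise match: r holds in every basic interval covered by e."""
    return all(holds.values())


def holds_fuzzy(holds: dict, m) -> bool:
    """Fuzzy match: r holds in at least 100m% of the covered intervals."""
    ratio = Fraction(str(m))  # exact ratio, avoids float rounding at the boundary
    return sum(holds.values()) >= ratio * len(holds)


if __name__ == "__main__":
    THURSDAY = 3
    print(nth_weekday(2000, 11, THURSDAY, 4))  # 2000-11-23, i.e. <2000, 11, 4>
    # Invented observations for the basic intervals covered by <*, 11, 4>:
    # the rule is assumed to hold every year from 1995 to 2002 except 1997.
    holds = {(y, 11, 4): y != 1997 for y in range(1995, 2003)}
    print(holds_precise(holds))     # False: it fails in 1997
    print(holds_fuzzy(holds, 0.8))  # True: 7 of 8 >= 0.8 * 8
```

A rule that holds w.r.t. precise match also holds w.r.t. fuzzy match for any m ≤ 1, but not conversely, as the 1997 gap shows.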

10 Temporal Association Rule Mining • Two sub-problems: (1) finding all large itemsets for all star calendar patterns on the given calendar schema (based on Apriori [AS94]), which is the crux of discovering temporal association rules; (2) generating temporal association rules from the large itemsets and their calendar patterns, which is the same as the traditional association rule generation approach [AS94].
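Sub-problem (2) reuses ordinary rule generation, applied to the large itemsets of each calendar pattern. A minimal sketch, assuming the large itemsets of one pattern are given with their support counts; it simply enumerates every antecedent instead of using the more efficient rule-generation procedure of [AS94], and `generate_rules` is a hypothetical name.

```python
from itertools import combinations


def generate_rules(support: dict, min_conf: float) -> list:
    """Return (antecedent, consequent, confidence) for every rule X => Y with
    X ∪ Y a large itemset and support(X ∪ Y) / support(X) >= min_conf.

    `support` maps frozenset itemsets to support counts; by the Apriori
    property every subset of a large itemset is also a key.
    """
    rules = []
    for itemset, count in support.items():
        for size in range(1, len(itemset)):  # proper, non-empty antecedents
            for lhs in map(frozenset, combinations(sorted(itemset), size)):
                conf = count / support[lhs]
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, conf))
    return rules


if __name__ == "__main__":
    # Hypothetical support counts of the large itemsets for one pattern e.
    support = {
        frozenset({"turkey"}): 4,
        frozenset({"pumpkin pie"}): 5,
        frozenset({"turkey", "pumpkin pie"}): 3,
    }
    for lhs, rhs, conf in generate_rules(support, min_conf=0.7):
        print(sorted(lhs), "=>", sorted(rhs), conf)
    # ['turkey'] => ['pumpkin pie'] 0.75
```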

11 Outline of the Algorithm (for both precise and fuzzy match) [Figure: algorithm outline. Annotations: candidate generation is the critical step, because fewer candidate large itemsets means less time is needed for Phase II; the remaining steps are the same as in the traditional approach.]

12 Phase III for Precise Match • After the basic time interval $e_0$ is processed in pass k, the large k-itemsets of every calendar pattern e that covers $e_0$ are updated as follows: if $L_k(e)$ is updated for the first time (i.e., $L_k(e)$ = NULL), let $L_k(e) = L_k(e_0)$; otherwise, let $L_k(e) = L_k(e) \cap L_k(e_0)$. • E.g., given the calendar patterns ⟨1995, *, 1⟩ and ⟨*, 2, *⟩ with $L_2(\langle 1995, *, 1 \rangle) = \{AB, DE\}$ and $L_2(\langle *, 2, * \rangle) = \{AB, BC, DE\}$, suppose that processing the basic time interval ⟨1995, 2, 1⟩ yields $L_2(\langle 1995, 2, 1 \rangle) = \{AC, BC, DE\}$. Then $L_2(\langle 1995, *, 1 \rangle) = \{DE\}$ and $L_2(\langle *, 2, * \rangle) = \{BC, DE\}$. • So after all basic time intervals are processed, the set of large k-itemsets for each calendar pattern is known.
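A small sketch of the precise-match update, assuming tentative sets are kept in a dict keyed by pattern tuples (with `"*"` as the wildcard) and itemsets are written as strings such as `"AB"`; `update_precise` and `covers` are hypothetical names. The usage example replays the slide's numbers, plus one pattern that does not cover ⟨1995, 2, 1⟩.

```python
STAR = "*"


def covers(pattern: tuple, interval: tuple) -> bool:
    """A calendar pattern covers a basic interval if all non-wildcards match."""
    return all(p == STAR or p == v for p, v in zip(pattern, interval))


def update_precise(tentative: dict, e0: tuple, large_e0: set) -> None:
    """Phase III update for precise match after basic interval e0 in pass k.

    `tentative` maps each star calendar pattern e to its tentative L_k(e),
    or to None if L_k(e) has not been updated yet. Updated in place.
    """
    for e, current in tentative.items():
        if not covers(e, e0):
            continue                           # e does not cover e0: unchanged
        if current is None:
            tentative[e] = set(large_e0)       # first update: L_k(e) = L_k(e0)
        else:
            tentative[e] = current & large_e0  # L_k(e) ∩ L_k(e0)


if __name__ == "__main__":
    L2 = {
        (1995, STAR, 1): {"AB", "DE"},
        (STAR, 2, STAR): {"AB", "BC", "DE"},
        (STAR, 3, STAR): {"AB"},  # does not cover (1995, 2, 1)
    }
    update_precise(L2, (1995, 2, 1), {"AC", "BC", "DE"})
    print(sorted(L2[(1995, STAR, 1)]))  # ['DE']
    print(sorted(L2[(STAR, 2, STAR)]))  # ['BC', 'DE']
```

Because each update is an intersection, a tentative $L_k(e)$ can only shrink, which is what makes the pruning on later slides safe.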

13 Phase III for Fuzzy Match • Associate a counter c_update with each candidate itemset for each star calendar pattern. • Counters are initially set to 1. • When $L_k(e_0)$ is used to update $L_k(e)$ in Phase III, the counters of the itemsets in $L_k(e)$ that are also in $L_k(e_0)$ are incremented by 1. • Suppose e covers N basic time intervals in total and this is the n-th update to $L_k(e)$. Since at most $N - n$ updates remain and each increments a counter by at most 1, an itemset cannot be large for e if its counter does not satisfy $c_{update} + (N - n) \ge mN$.

14 Phase III for Fuzzy Match • Example: calendar schema R = (week, day) and fuzzy match ratio m = 0.8. Consider a star calendar pattern e that covers only 5 basic time intervals (N = 5), and suppose this is the 3rd time $L_2(e)$ is updated (n = 3). • So we keep only the itemsets with $c_{update} \ge mN - (N - n) = 0.8 \times 5 - (5 - 3) = 2$.
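The counter bound can be sketched as below. `survival_threshold` and `prune_fuzzy` are hypothetical names, the counter values in the usage example are invented, and the match ratio is converted to an exact `Fraction` so that a boundary case such as 0.8 × 5 = 4 is not affected by floating-point rounding.

```python
from fractions import Fraction


def survival_threshold(N: int, n: int, m) -> Fraction:
    """Smallest c_update that can still reach m*N after the n-th update of
    L_k(e): at most N - n updates remain, each adding at most 1."""
    return Fraction(str(m)) * N - (N - n)  # str() keeps 0.8 exactly 4/5


def prune_fuzzy(counters: dict, N: int, n: int, m) -> dict:
    """Drop itemsets whose counter violates c_update + (N - n) >= m*N."""
    threshold = survival_threshold(N, n, m)
    return {itemset: c for itemset, c in counters.items() if c >= threshold}


if __name__ == "__main__":
    print(survival_threshold(N=5, n=3, m=0.8))  # 2
    # Hypothetical counters of three itemsets in L_2(e) after the 3rd update.
    print(prune_fuzzy({"AB": 3, "AC": 2, "BD": 1}, N=5, n=3, m=0.8))
    # {'AB': 3, 'AC': 2}
```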

15 Candidate Generation (Phase I) • Direct-Apriori: a naïve approach that treats each basic time interval individually and directly applies Apriori's candidate generation to it. • Applies to both precise match and fuzzy match.

16 Candidate Generation for Precise Match: Temporal AprioriGen • Since we are not interested in large itemsets for basic time intervals, if an itemset in $C_k(e_0)$ cannot be large for any of the star calendar patterns that cover the basic time interval $e_0$, we simply ignore it. • So we can generate the candidates $C_k$ (k > 1) as follows: [Figure: Temporal AprioriGen algorithm]

17 Candidate Generation for Precise Match: Temporal AprioriGen • Example: consider the calendar schema R = (week: {1, …, 5}, day: {1, …, 7}), a basic time interval $e_0$, and the two 1-star patterns $e_1$ and $e_2$ that cover it. Suppose we already have $L_2(e_0) = \{AB, AC, AD, AE, BC, BD, CD, CE\}$, $L_2(e_1) = \{AB, AC, AD, BC, BD, CE\}$ and $L_2(e_2) = \{AB, AC, AD, BD, CD\}$. • Using temporal AprioriGen: $C_3(e_1) = \{ABC, ABD\}$, $C_3(e_2) = \{ABD, ACD\}$, and $C_3(e_0) = C_3(e_1) \cup C_3(e_2) = \{ABC, ABD, ACD\}$. • Using Direct-Apriori: $C_3(e_0) = \{ABC, ABD, ACD, ACE, BCD\}$.
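The example can be reproduced with a short sketch. `apriori_gen` is a simple, unoptimized rendering of Apriori's join-and-prune step [AS94] (it returns every k-itemset whose (k−1)-subsets are all large), and `temporal_apriori_gen` is inferred from the worked example: it takes the union of `apriori_gen` over the large (k−1)-itemsets of the 1-star patterns covering the basic time interval. The paper's exact algorithm may differ, and both names are hypothetical.

```python
from itertools import combinations


def apriori_gen(large_prev) -> set:
    """Apriori candidate generation [AS94], written as a simple join + prune.

    Returns every k-itemset all of whose (k-1)-subsets are in `large_prev`.
    Itemsets are written as strings of single-letter items, e.g. "AB".
    """
    prev = {frozenset(s) for s in large_prev}
    if not prev:
        return set()
    k = len(next(iter(prev))) + 1
    candidates = set()
    for a in prev:                          # join step: union of two (k-1)-sets
        for b in prev:
            union = a | b
            if len(union) == k and all(     # prune step: all subsets large
                frozenset(sub) in prev for sub in combinations(union, k - 1)
            ):
                candidates.add("".join(sorted(union)))
    return candidates


def temporal_apriori_gen(large_one_star) -> set:
    """Sketch of C_k(e0): union of apriori_gen over the L_{k-1}(e) of the
    1-star patterns e covering e0."""
    result = set()
    for large in large_one_star:
        result |= apriori_gen(large)
    return result


if __name__ == "__main__":
    L2_e0 = ["AB", "AC", "AD", "AE", "BC", "BD", "CD", "CE"]
    L2_e1 = ["AB", "AC", "AD", "BC", "BD", "CE"]
    L2_e2 = ["AB", "AC", "AD", "BD", "CD"]
    print(sorted(temporal_apriori_gen([L2_e1, L2_e2])))  # ['ABC', 'ABD', 'ACD']
    print(sorted(apriori_gen(L2_e0)))  # ['ABC', 'ABD', 'ACD', 'ACE', 'BCD']
```

The second line corresponds to Direct-Apriori; the two extra candidates ACE and BCD are exactly what temporal AprioriGen avoids counting.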

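18 Candidate Generation for Precise Match: Horizontal Pruning • If an itemset l in $C_k(e_0)$ does not appear in any of the tentative $L_k(e_1)$, where $e_1$ is a 1-star pattern that covers $e_0$, then l cannot be large for any star pattern e that covers $e_0$: every such e covers some 1-star pattern $e_1$ that covers $e_0$, and under precise match an itemset that is large for e must also be large for $e_1$. Therefore, we drop l from $C_k(e_0)$.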

19 Candidate Generation for Precise Match: Horizontal Pruning • Example: suppose that when the basic time interval $e_0$ is being processed, we already have $L_3(e_1) = \{ABD\}$ and $L_3(e_2) = \{ABD, ACD\}$ for the 1-star patterns $e_1$ and $e_2$ covering $e_0$. After applying temporal AprioriGen we get $C_3(e_0) = \{ABC, ABD, ACD\}$, which horizontal pruning further reduces to $C_3(e_0) = C_3(e_0) \cap (L_3(e_1) \cup L_3(e_2)) = \{ABD, ACD\}$.
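Horizontal pruning amounts to intersecting the candidates with the union of the tentative large itemsets of the covering 1-star patterns. In this sketch `horizontal_prune` is a hypothetical name, itemsets are strings, and skipping the pruning while some 1-star pattern has no tentative set yet (`None`) is an assumption of the sketch rather than something the slides state.

```python
def horizontal_prune(candidates, tentative_large) -> set:
    """Horizontal pruning for precise match.

    `tentative_large` holds the tentative L_k(e1) of each 1-star pattern e1
    covering e0 (None if it has not been updated yet). A candidate survives
    only if it appears in at least one of them.
    """
    if any(large is None for large in tentative_large):
        # Assumption: an e1 without a tentative set gives no evidence yet,
        # so nothing can safely be pruned.
        return set(candidates)
    keep = set().union(*tentative_large)
    return set(candidates) & keep


if __name__ == "__main__":
    C3_e0 = {"ABC", "ABD", "ACD"}
    L3_e1 = {"ABD"}
    L3_e2 = {"ABD", "ACD"}
    print(sorted(horizontal_prune(C3_e0, [L3_e1, L3_e2])))  # ['ABD', 'ACD']
```

The same set arithmetic also matches the fuzzy-match example on slide 24, {ABD, ACD, ACE} ∩ ({ABD, ABE} ∪ {ABD, ACD} ∪ {ABD}) = {ABD, ACD}, although the criterion for choosing which sets to intersect with differs there.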

20 Candidate Generation for Fuzzy Match: Temporal AprioriGen • Temporal AprioriGen for precise match cannot be applied directly to the fuzzy match problem, because an itemset may be large for a star calendar pattern e even if it is not large for any 1-star pattern covered by e. • For example, consider the schema R = (week, day) and fuzzy match ratio m = 0.8. [Figure: worked example under this schema]

21 Candidate Generation for Fuzzy Match: Temporal AprioriGen • Change temporal AprioriGen to apply to fuzzy match as follows: [Figure: modified algorithm] • Change the blue underlined part to $L_{k-1}(e)$ when memory is the critical resource.

22 Candidate Generation for Fuzzy Match: Temporal AprioriGen • Example: suppose we already have $L_2(\ ) = \{AB, AC, AD, AE, BD, CD, CE\}$; $L_2(\ ) = \{AB, AC, AD, BC, BD, CE\}$; $L_2(\ ) = \{AB, AC, AD, BD, CD\}$; $L_2(\ ) = \{AB, AD, BD, CD, AC, AE\}$. • $L_T = L_2(\ ) \cap L_2(\ ) = \{AB, AC, AD, BD, CE\}$, so $C_3(\ ) = \mathrm{aprioriGen}(L_T) = \{ABD\}$. Similarly, we can get $C_3(\ ) = \{ABD, ACD\}$ and $C_3(\ ) = \{ABD, ACE\}$. • $C_3(\ ) = C_3(\ ) \cup C_3(\ ) \cup C_3(\ ) = \{ABD, ACD, ACE\}$

23 Candidate Generation for Fuzzy Match: Horizontal Pruning • The idea is to discard candidate itemsets that cannot be large for any star calendar pattern e covering $e_0$, even if they are large for the basic time interval $e_0$.

24 Candidate Generation for Fuzzy Match: Horizontal Pruning • Example: suppose we already have $C_3(\ ) = \{ABD, ACD, ACE\}$, and $L_3(\ )$, $L_3(\ )$ and $L_3(\ )$ have each been updated once; $C_3(\ ) = \{ABD, ABE\}$, $C_3(\ ) = \{ABD, ACD\}$ and $C_3(\ ) = \{ABD\}$. Then $C_3(\ )$ can be pruned as $C_3(\ ) = C_3(\ ) \cap (C_3(\ ) \cup C_3(\ ) \cup C_3(\ )) = \{ABD, ACD\}$.

25 Experiments • Real data set: the data file consists of homepage request records, each of which contains attribute values describing the request and the person who sent it. The records span Jan 30 to Mar 31, 2000. • Calendar schema used: R = (week, day, timeofday), where timeofday contains … • The data set contains 777,480 transactions, with 23.4 items per transaction on average.

26 Experiments • Real data set results: [Figure: results on the real data set]

27 Experiments • Synthetic data set: we extend the data generator proposed in [AS94] to incorporate temporal features.

28 Experiments • Synthetic data set results: [Figure: results on the synthetic data set]

29 Conclusions • We develop a new representation mechanism for temporal association rules based on calendars and identify two classes of interesting temporal association rules: w.r.t. precise match and w.r.t. fuzzy match. • The representation requires less prior knowledge, and the resulting time intervals are easier to understand. • We extend the Apriori algorithm and develop two optimization techniques (temporal AprioriGen and horizontal pruning) to discover both classes of temporal association rules. • Experiments show that the optimization techniques are effective.

30 Possible Future Work • The approach requires that, for a calendar schema $(f_n, f_{n-1}, \ldots, f_1)$, each calendar unit of $f_i$ be uniquely contained in a unit of $f_{i+1}$, for $0 < i < n$. E.g., (year, month, week) is NOT allowed because a week may not be contained in a single month. Relaxing this requirement is possible future work. • Consider temporal patterns in other data mining problems, such as clustering.

31 References • Y. Li, P. Ning, X. S. Wang, and S. Jajodia. Discovering calendar-based temporal association rules. In the Eighth International Symposium on Temporal Representation and Reasoning (TIME '01), 2001. • [AS94] R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. In VLDB '94, 1994. • S. Ramaswamy, S. Mahajan, and A. Silberschatz. On the discovery of interesting patterns in association rules. In VLDB '98, 1998.

32 Questions and Answers • Any questions?

