Published by Dennis Gibbs. Modified about 1 year ago.

1
TEMPORAL ASSOCIATION RULE MINING
Prepared by: Ajit Padukone, Komal Kapoor

2
Outline
Association Rule Mining
Applications
Temporal Association Rule Mining
Existing Techniques and their Limitations
Problem Statement
Proposed Approach
  - Finding Maximal Valid Time Intervals
  - Finding All Temporally Frequent Itemsets
Future Work

3
Motivation: Association Rule Mining
Rules: {onion, potatoes} => {burgers}, {bread, milk} => {butter}
Transaction Data
Frequent itemsets: {onion, potatoes, burgers}, {bread, milk, butter}

4
Applications
Retail Data Analysis
Web Usage Mining
Intrusion Detection
Bioinformatics

5
Spatial Association Rule Mining
Extract spatial predicates
Find all frequent patterns/predicates/sets
Generate strong rules
E.g. {Contains(Port), crosses(WaterBody)}
Source: Vania Bogorny, Enhancing Spatial Association Rule Mining in Geographic Databases, lume.ufrgs.br

6
Temporal Association Rule Mining
Chapter 10 of the reference book defines two types of temporal references:
  - Transaction Time
  - Valid Time
The time attribute for association rules can be defined in an analogous way.

7
Existing Technique: Apriori Algorithm
The Apriori algorithm finds the frequent itemsets in a set of transactions, i.e. the itemsets that satisfy the minimum support threshold.
The support of an itemset is the proportion of transactions in the data set that contain the itemset.
Algorithm:
1. Find all k-itemsets whose transaction support is at or above the minimum support (frequent k-itemsets).
2. Generate candidate (k+1)-itemsets from the frequent (large) k-itemsets.
3. Prune the candidate (k+1)-itemsets to obtain the frequent (k+1)-itemsets, i.e. those whose transaction support is at or above the minimum support.
4. If at least one frequent (k+1)-itemset was found, repeat from step 2 with k+1.
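The four steps above can be sketched in Python. This is a minimal illustration, not the presenters' implementation: the function name `apriori`, the `min_count` parameter (an absolute count rather than a percentage) and the toy transactions are assumptions made for the example.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset: count} for every itemset found in at least min_count transactions."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    # Step 1: frequent 1-itemsets.
    items = {frozenset([i]) for t in transactions for i in t}
    frequent = {c: n for c, n in count(items).items() if n >= min_count}
    result = dict(frequent)
    k = 1
    while frequent:  # Step 4: repeat while the last level was non-empty.
        # Step 2: join frequent k-itemsets into (k+1)-item candidates.
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k + 1}
        # Step 3a: drop candidates that have an infrequent k-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k))}
        # Step 3b: keep candidates that meet the support threshold.
        frequent = {c: n for c, n in count(candidates).items() if n >= min_count}
        result.update(frequent)
        k += 1
    return result

# Hypothetical toy data reusing item names from the Motivation slide.
toy = [{"bread", "milk", "butter"}, {"bread", "milk"}, {"bread", "milk", "butter"}, {"onion"}]
found = apriori(toy, min_count=2)
print(found[frozenset({"bread", "milk", "butter"})])  # 2
```

The subset check in step 3a is what lets Apriori skip counting most candidates: an itemset can only be frequent if all of its subsets are.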

8
Apriori Algorithm (contd.)

Table 1: Transaction Database
Transaction   Items
1             A, B, C
2             B, C, F
3             B, F, G
4             A, C, D, F
5             C, D, E, G
6             A, B, E, G
7             B, C, F, G
8             A, B, G
9             A, B, F, G
10            C, F, G

Universal set of items = {A, B, C, D, E, F, G}
Minimum support = 30% (3 transactions)

Step 1: 1-itemsets. Itemsets with a count of at least 3 are frequent.
Itemset   Count
{A}       5
{B}       7
{C}       6
{D}       2
{E}       2
{F}       6
{G}       7

Step 2: 2-itemsets. All 2-itemsets that contain D or E are pruned. Of the rest, those with a count of at least 3 are frequent.
Itemset   Count
{A,B}     4
{A,C}     2
{A,E}     1
{A,F}     2
{A,G}     3
{B,C}     3
{B,E}     1
{B,F}     4
{B,G}     5
{C,E}     1
{C,F}     4
{C,G}     3
{E,F}     0
{E,G}     2
{F,G}     4

Step 3: 3-itemsets. All 3-itemsets that have a non-frequent 2-itemset as a subset have been pruned. Of the rest, those with a count of at least 3 are frequent.
Itemset   Count
{A,B,G}   3
{B,F,G}   3
{C,F,G}   2
{B,C,F}   2
{B,C,G}   1
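The frequent itemsets of Table 1 can be cross-checked with a brute-force count. The sketch below deliberately skips Apriori's pruning and simply counts every k-item combination, so it serves as an independent check. The names `TABLE_1`, `MIN_COUNT` and `frequent_itemsets` are illustrative.

```python
from itertools import combinations

# The ten transactions of Table 1, one string of item letters per transaction.
TABLE_1 = ["ABC", "BCF", "BFG", "ACDF", "CDEG", "ABEG", "BCFG", "ABG", "ABFG", "CFG"]
MIN_COUNT = 3  # 30% of 10 transactions

def frequent_itemsets(transactions, k, min_count):
    """Count every k-item combination and return the frequent ones, sorted."""
    counts = {}
    for t in transactions:
        for combo in combinations(sorted(t), k):
            counts[combo] = counts.get(combo, 0) + 1
    return sorted("".join(c) for c, n in counts.items() if n >= min_count)

for k in (1, 2, 3):
    print(k, frequent_itemsets(TABLE_1, k, MIN_COUNT))
# 1 ['A', 'B', 'C', 'F', 'G']
# 2 ['AB', 'AG', 'BC', 'BF', 'BG', 'CF', 'CG', 'FG']
# 3 ['ABG', 'BFG']
```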

9
Limitation
The Apriori algorithm finds the itemsets that satisfy the minimum support threshold over the entire transaction database.
What about itemsets that are highly frequent over a limited period of time but not over the entire set of transactions? E.g. Turkey -> Pumpkin Pie (Thanksgiving).
Conversely, the itemsets extracted by the Apriori algorithm might not be valid for the entire period over which association rule mining was performed.
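A small numeric illustration of this limitation, using made-up counts: an itemset that appears in 6 of 10 transactions during one month and never otherwise falls far below a 30% threshold over the whole year, yet clears it easily within that month. All figures and names below are hypothetical.

```python
# Hypothetical data: (month, transactions containing the itemset, total transactions).
monthly = [(m, 6 if m == 11 else 0, 10) for m in range(1, 13)]
MIN_SUP = 0.30

global_support = sum(hits for _, hits, _ in monthly) / sum(total for _, _, total in monthly)
peak_support = next(hits / total for m, hits, total in monthly if m == 11)

print(global_support >= MIN_SUP)  # False: 6 / 120 = 0.05
print(peak_support >= MIN_SUP)    # True: 6 / 10 = 0.6
```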

10
Related Work
X. Chen and I. Petrounias, Mining Temporal Features in Association Rules, Proc. Third European Conf. Principles and Practice of Knowledge Discovery in Databases (PKDD '99).
Yingjiu Li, Peng Ning, X. Sean Wang, Sushil Jajodia, Discovering Calendar-based Temporal Association Rules, Data & Knowledge Engineering, Special issue: Temporal representation and reasoning, Volume 44, Issue 2, February
Kang et al., Discovering Flow Anomalies: A SWEET Approach, Eighth IEEE International Conference on Data Mining, ICDM

11
Temporal Association Rule Mining
The book also defines 'time instants' or 'time intervals', 'chronon' and 'duration'.
E.g. transactions stamped with time instants:
12th Dec 2009, 11:20:24   {bread, milk, butter, cheese, chips}
12th Dec 2009, 11:27:04   {onion, capsicum, potatoes, burgers}
12th Dec 2009, 12:05:44   {soap, shampoo, comb, toothbrush}
The same transactions grouped into hour-long time units:
12th Dec 2009, 11th hr   {{bread, milk, butter, cheese, chips}, {onion, capsicum, potatoes, burgers}}
12th Dec 2009, 12th hr   {{soap, shampoo, comb, toothbrush}}
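Grouping timestamped transactions into time units, as in the example above, can be sketched as follows. The hour is assumed to be the time unit (the slide's "11th hr" and "12th hr"); the function name `group_by_hour` and the ISO-style timestamp strings are choices made for this illustration.

```python
from collections import defaultdict
from datetime import datetime

# The three timestamped transactions from the slide's example.
raw = [
    ("2009-12-12 11:20:24", {"bread", "milk", "butter", "cheese", "chips"}),
    ("2009-12-12 11:27:04", {"onion", "capsicum", "potatoes", "burgers"}),
    ("2009-12-12 12:05:44", {"soap", "shampoo", "comb", "toothbrush"}),
]

def group_by_hour(records):
    """Map each hour-long time unit to the list of transactions that fall inside it."""
    units = defaultdict(list)
    for stamp, items in records:
        t = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        units[t.replace(minute=0, second=0)].append(items)
    return dict(sorted(units.items()))

for unit, txns in group_by_hour(raw).items():
    print(unit.isoformat(), len(txns))
# 2009-12-12T11:00:00 2
# 2009-12-12T12:00:00 1
```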

13
Problem Statement
Definitions:
Support of an itemset I over the interval (t_i, t_j) = (number of transactions containing I during (t_i, t_j)) / (total number of transactions during (t_i, t_j))
Valid time interval for itemset I: a time interval over which the support of I is greater than a threshold (lmin_sup).
Maximal valid time interval: a valid time interval for an itemset I that is not contained in any other valid time interval for I.
Temporally frequent itemset: an itemset that has at least one valid time interval associated with it.
[Figure: valid time intervals for a given lmin_sup]
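The first two definitions translate directly into code. In this sketch, D is assumed to be a list of time units, each holding a list of transactions (sets of items), and intervals are given as inclusive 0-based indices. The function names and the sample data are illustrative only.

```python
def interval_support(itemset, D, i, j):
    """Support of itemset over time units i..j (inclusive, 0-based)."""
    itemset = frozenset(itemset)
    txns = [t for unit in D[i:j + 1] for t in unit]
    if not txns:
        return 0.0
    return sum(1 for t in txns if itemset <= t) / len(txns)

def is_valid_time_interval(itemset, D, i, j, lmin_sup):
    """An interval is valid when the support over it is strictly greater than lmin_sup."""
    return interval_support(itemset, D, i, j) > lmin_sup

# Hypothetical data: three time units with two transactions each.
D = [
    [{"a", "b"}, {"a"}],
    [{"a", "b"}, {"a", "b"}],
    [{"c"}, {"b"}],
]
print(interval_support({"a", "b"}, D, 0, 1))             # 0.75
print(is_valid_time_interval({"a", "b"}, D, 0, 2, 0.5))  # False: 3 / 6 is not greater than 0.5
```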

14
Problem Statement
Definitions: as on the previous slide.
[Figure: maximal valid time intervals for a given lmin_sup]

15
Problem Statement (contd.)
Given: transaction data D in the format (TU, {T_1, T_2, …, T_k}), where
  TU -> time unit
  T_i -> transaction
Find: all temporally frequent itemsets along with their maximal valid time intervals.

16
Problem Statement (contd.)
In addition to finding the frequent itemsets, we now have to find the maximal valid time intervals for each frequent itemset.
Complexity of the naive approach for finding the maximal valid time intervals for each frequent itemset: O(n^2), where n = |D|.
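A sketch of what such a naive approach might look like: try every start, extend the end one time unit at a time, and keep the farthest valid end per start. Each time unit is assumed to be summarised as a pair (hits, total), i.e. the number of transactions containing the itemset and the total number of transactions in that unit. The function name and sample data are hypothetical.

```python
def naive_maximal_valid_intervals(units, lmin_sup):
    """units[k] = (hits, total) for time unit k. Returns maximal valid (i, j), inclusive."""
    n = len(units)
    best_end = -1  # farthest valid end reached by any earlier start
    maximal = []
    for i in range(n):
        hits = total = 0
        last_valid = None
        for j in range(i, n):  # O(n^2) (start, end) pairs overall
            hits += units[j][0]
            total += units[j][1]
            if total and hits / total > lmin_sup:
                last_valid = j
        # (i, last_valid) is maximal unless an earlier start already reaches at least as far.
        if last_valid is not None and last_valid > best_end:
            maximal.append((i, last_valid))
            best_end = last_valid
    return maximal

units = [(0, 4), (3, 4), (4, 4), (0, 4), (0, 4), (3, 4)]
print(naive_maximal_valid_intervals(units, 0.5))  # [(0, 2), (1, 3), (5, 5)]
```

Note that maximal valid intervals may overlap, as (0, 2) and (1, 3) do here: their union (0, 3) has support 7/16 and is not valid.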

17
Finding Maximal Valid Time Intervals
Definitions:
Valid (supporting) time unit for I: a time unit during which the support of I is greater than lmin_sup.
Non-valid (non-supporting) time unit for I: a time unit during which the support of I is not greater than lmin_sup.
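As a minimal sketch, time units can be split into the two classes like this. Each unit is again assumed to be a (hits, total) pair for the itemset under consideration, and a unit whose support equals the threshold is treated as non-supporting, which is an assumption where the threshold case is left open.

```python
def split_time_units(units, lmin_sup):
    """Return (supporting, non_supporting) index lists for units[k] = (hits, total)."""
    supporting, non_supporting = [], []
    for k, (hits, total) in enumerate(units):
        if total and hits / total > lmin_sup:
            supporting.append(k)
        else:
            non_supporting.append(k)
    return supporting, non_supporting

units = [(0, 4), (3, 4), (4, 4), (0, 4), (0, 4), (3, 4)]
print(split_time_units(units, 0.5))  # ([1, 2, 5], [0, 3, 4])
```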

18
Finding Maximal Valid Time Intervals
Lemma 1: Every valid time interval (TU_i, TU_j) contains at least one valid (supporting) time unit for I.
Lemma 2: If an interval (TU_i, TU_j) is not valid for I and TU_{j+1} is a non-valid time unit, then the interval (TU_i, TU_{j+1}) cannot be valid.
Lemma 3: If an interval (TU_i, TU_j) is valid for I and TU_{j+1} is a valid time unit, then the interval (TU_i, TU_{j+1}) is also valid.
Using Lemma 3, collapse each continuous run of supporting time units into a single unit with the run's average density.
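Lemma 3 holds because pooling two groups that each exceed the threshold s cannot drop below it: if h1 > s*t1 and h2 > s*t2, then h1 + h2 > s*(t1 + t2). The collapsing step could be sketched as below. "Average density" is interpreted here as the pooled support (total hits over total transactions) of the run, which is an assumption. The block layout (start, end, hits, total) is also illustrative.

```python
from itertools import groupby

def collapse_supporting_runs(units, lmin_sup):
    """Merge each run of consecutive supporting units into one (start, end, hits, total) block."""
    def supporting(indexed_unit):
        hits, total = indexed_unit[1]
        return total > 0 and hits / total > lmin_sup

    blocks = []
    for is_supporting, run in groupby(enumerate(units), key=supporting):
        run = list(run)
        if is_supporting:
            # Pool the whole run; by Lemma 3 the pooled block is still supporting.
            blocks.append((run[0][0], run[-1][0],
                           sum(u[0] for _, u in run), sum(u[1] for _, u in run)))
        else:
            # Non-supporting units are kept one by one.
            blocks.extend((k, k, u[0], u[1]) for k, u in run)
    return blocks

units = [(0, 4), (3, 4), (4, 4), (0, 4), (0, 4), (3, 4)]
print(collapse_supporting_runs(units, 0.5))
# [(0, 0, 0, 4), (1, 2, 7, 8), (3, 3, 0, 4), (4, 4, 0, 4), (5, 5, 3, 4)]
```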

19
Finding Maximal Valid Time Intervals (contd.)
Given: itemset I, transaction data D, lmin_sup
Part 1: Find_maximal_valid_time_intervals(I, D, lmin_sup)
  Find STU = {TU_{a_1}, TU_{a_2}, …, TU_{a_n}} such that each TU_{a_k} is a supporting time unit for I
  For i = 1 to n
    For j = n down to i+1
      If is_valid_time_interval(TU_{a_i}, TU_{a_j}, D, lmin_sup)
        break
  End
Uses: Lemma 1

20
Finding Maximal Valid Time Intervals (contd.)
Given: itemset I, transaction data D, lmin_sup
Part 2:
  start = TU_{a_{i-1}} + 1, finish = TU_{a_{j+1}} - 1
  low = start, high = TU_{a_j}
  While low <= TU_{a_i} and high <= finish
    If is_valid_time_interval(low, high)
      high = high + 1
    Else
      low = low + 1
  End
Uses: Lemma 2
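The pseudocode leaves several details open, so the following is only one possible reading of Parts 1 and 2, not a confirmed reconstruction. It assumes each time unit is a (hits, total) pair, records every valid (low, high) pair met during the Part 2 scan, uses the first and last time unit as bounds when there is no neighbouring supporting unit, and finishes with a containment filter. It is not verified to find every maximal interval in general.

```python
def support_ok(units, lo, hi, lmin_sup):
    """True when the pooled support of units lo..hi (inclusive) exceeds lmin_sup."""
    hits = sum(h for h, _ in units[lo:hi + 1])
    total = sum(t for _, t in units[lo:hi + 1])
    return total > 0 and hits / total > lmin_sup

def find_maximal_valid_time_intervals(units, lmin_sup):
    n = len(units)
    stu = [k for k in range(n) if support_ok(units, k, k, lmin_sup)]  # supporting units
    found = set()
    for p, a_i in enumerate(stu):
        # Part 1: farthest supporting unit a_j such that (a_i, a_j) is valid.
        # q == p always qualifies because a supporting unit is valid on its own.
        q = next(q for q in range(len(stu) - 1, p - 1, -1)
                 if support_ok(units, a_i, stu[q], lmin_sup))
        # Part 2: slide (low, high) through the neighbouring non-supporting gaps.
        start = stu[p - 1] + 1 if p > 0 else 0
        finish = stu[q + 1] - 1 if q + 1 < len(stu) else n - 1
        low, high = start, stu[q]
        while low <= a_i and high <= finish:
            if support_ok(units, low, high, lmin_sup):
                found.add((low, high))
                high += 1
            else:
                low += 1
    # Assumed final step: drop intervals contained in another recorded interval.
    return sorted(iv for iv in found
                  if not any(o != iv and o[0] <= iv[0] and iv[1] <= o[1] for o in found))

units = [(0, 4), (3, 4), (4, 4), (0, 4), (0, 4), (3, 4)]
print(find_maximal_valid_time_intervals(units, 0.5))  # [(0, 2), (1, 3), (5, 5)]
```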

21
Finding Maximal Valid Time Intervals (contd.)

22
Finding Maximal Valid Time Intervals (contd.)
Complexity: O(n'^2 + n)
Further iterations…

23
Finding All Temporally Frequent Itemsets
Given: transaction data D, lmin_sup, UI (universal itemset)
  C -> Generate_1-item_candidate_sets(UI, D)
  Interval = (1, |D|)
  While (|C| > 0)
    For each candidate set c in C
      max_valid_intervals -> find_maximal_valid_time_interval(c, D, lmin_sup)
      If |max_valid_intervals| > 0
        temp_freq_sets.add( )
    End
    If |temp_freq_sets| > 0
      C -> generate_new_candidate_sets(temp_freq_sets, D, lmin_sup)
    Else
      C -> null
  End
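The level-wise loop above might be sketched in Python as follows. This is an illustration under assumptions: D is a list of time units, each a list of transaction sets; the interval search is the naive quadratic scan rather than the two-part procedure; and candidates are generated Apriori-style, which is sound here because any interval that is valid for an itemset is also valid for each of its subsets. All names and the sample data are hypothetical.

```python
from itertools import combinations

def maximal_valid_intervals(itemset, D, lmin_sup):
    """Naive scan over all (start, end) pairs of time units; indices are 0-based, inclusive."""
    best_end, out = -1, []
    for i in range(len(D)):
        hits = total = 0
        last = None
        for j in range(i, len(D)):
            hits += sum(1 for t in D[j] if itemset <= t)
            total += len(D[j])
            if total and hits / total > lmin_sup:
                last = j
        if last is not None and last > best_end:
            out.append((i, last))
            best_end = last
    return out

def temporally_frequent_itemsets(D, lmin_sup):
    """Return {itemset: maximal valid intervals} for every temporally frequent itemset."""
    universe = sorted({item for unit in D for t in unit for item in t})
    candidates = [frozenset([item]) for item in universe]
    result, k = {}, 1
    while candidates:
        level = {}
        for c in candidates:
            intervals = maximal_valid_intervals(c, D, lmin_sup)
            if intervals:  # at least one valid interval: temporally frequent
                level[c] = intervals
        result.update(level)
        # New candidates: (k+1)-itemsets whose k-subsets are all temporally frequent.
        joined = {a | b for a in level for b in level if len(a | b) == k + 1}
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k))]
        k += 1
    return result

# Hypothetical data: three time units with two transactions each.
D = [
    [{"bread", "milk"}, {"bread"}],
    [{"turkey", "pie"}, {"turkey", "pie", "bread"}],
    [{"bread", "milk"}, {"milk"}],
]
result = temporally_frequent_itemsets(D, 0.6)
print(result[frozenset({"turkey", "pie"})])   # [(1, 1)]
print(frozenset({"bread", "milk"}) in result)  # False
```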

24
Pruning in Candidate Set Generation
[Table: columns are the item sets and transactions T1 to T9; rows are L-2: a-b, a-c and C-3: a-b-c. Cell contents are not recoverable.]

25
Future Work
Find cyclic valid time intervals
Identify interesting maximal valid time intervals

26
Questions?
