1
**TEMPORAL ASSOCIATION RULE MINING**

Prepared by: Ajit Padukone, Komal Kapoor

2
**Outline**

- Association Rule Mining
- Applications
- Temporal Association Rule Mining
- Existing Techniques and their Limitations
- Problem Statement
- Proposed Approach
  - Finding Maximal Valid Time Intervals
  - Finding All Temporally Frequent Itemsets
- Future Work

3
**Motivation: Association Rule Mining**

Association rule mining is a method for discovering interesting relations between variables in large databases, such as large-scale transaction data from supermarkets. The supermarket industry was particularly interested in analyzing customers' shopping behavior to find patterns that could guide promotional pricing or product placement.

Transaction data:

| Transaction ID | Items |
|---|---|
| 1 | bread, milk, butter, cheese, chips |
| 2 | onion, capsicum, potatoes, burgers |
| 3 | bread, milk, yogurt, butter |
| 4 | onion, potatoes, ketchup, burgers |
| 5 | soap, shampoo, comb, toothbrush |

Frequent itemsets: {onion, potatoes, burgers}, {bread, milk, butter}

Rules: {onion, potatoes} => {burgers}; {bread, milk} => {butter}
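A minimal Python sketch of the two quantities behind these rules, computed on the five transactions above. Support is the fraction of transactions containing an itemset; confidence, which the slide does not define, is assumed here to be the standard ratio support(X ∪ Y) / support(X).

```python
# Transactions from the motivation slide
transactions = [
    {"bread", "milk", "butter", "cheese", "chips"},
    {"onion", "capsicum", "potatoes", "burgers"},
    {"bread", "milk", "yogurt", "butter"},
    {"onion", "potatoes", "ketchup", "burgers"},
    {"soap", "shampoo", "comb", "toothbrush"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """Standard rule confidence: support(lhs | rhs) / support(lhs)."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)

print(support({"onion", "potatoes", "burgers"}, transactions))       # 0.4
print(confidence({"onion", "potatoes"}, {"burgers"}, transactions))  # 1.0
print(confidence({"bread", "milk"}, {"butter"}, transactions))       # 1.0
```

Both frequent itemsets occur in 2 of the 5 transactions, and every transaction containing the left-hand side of each rule also contains its right-hand side.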

4
**Applications**

- Retail Data Analysis
- Web Usage Mining
- Intrusion Detection
- Bioinformatics

5
**Spatial Association Rule Mining**

1. Extract spatial predicates.
2. Find all frequent patterns (predicate sets).
3. Generate strong rules.

Example pattern: {Contains(Port), crosses(WaterBody)}

Spatial predicates represent materialized spatial relationships between geographic entities, such as close, far, contains, within and touches.

Source: Vania Bogorny, Enhancing Spatial Association Rule Mining in Geographic Databases, lume.ufrgs.br
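Once the spatial predicates have been materialized (step 1), each geographic entity can be treated as a transaction whose items are its predicates, and step 2 becomes ordinary frequent-set counting. A minimal sketch under that assumption; the cities and the touches(Coast) predicate below are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical entities, each described by its materialized spatial predicates
cities = {
    "city1": {"contains(Port)", "crosses(WaterBody)", "touches(Coast)"},
    "city2": {"contains(Port)", "crosses(WaterBody)"},
    "city3": {"crosses(WaterBody)"},
    "city4": {"contains(Port)", "crosses(WaterBody)", "touches(Coast)"},
}

def frequent_predicate_sets(rows, size, min_count):
    """Count every predicate set of the given size and keep the frequent ones."""
    counts = Counter()
    for predicates in rows.values():
        for combo in combinations(sorted(predicates), size):
            counts[combo] += 1
    return {combo: c for combo, c in counts.items() if c >= min_count}

print(frequent_predicate_sets(cities, size=2, min_count=3))
```

With a minimum count of 3, only the pair {contains(Port), crosses(WaterBody)} survives.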

6
**Temporal Association Rule Mining**

Chapter 10 of the reference book defines two types of temporal reference:

- Transaction time
- Valid time

A time attribute for association rules can be defined in an analogous way.

7
**Existing Technique – Apriori Algorithm**

The Apriori algorithm finds the itemsets in a set of transactions that satisfy the minimum support threshold. The support of an itemset is the proportion of transactions in the data set that contain the itemset.

Algorithm:

1. Find all k-itemsets (starting with k = 1) whose transaction support is at least the minimum support (the frequent k-itemsets).
2. Generate candidate (k+1)-itemsets from the frequent k-itemsets.
3. Prune the candidate (k+1)-itemsets to obtain the frequent (k+1)-itemsets, i.e. those whose transaction support is at least the minimum support.
4. If any frequent (k+1)-itemsets were found, increment k and repeat from step 2.
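A minimal Python sketch of the level-wise loop above. The join-and-prune candidate generation (merge two frequent k-itemsets, then drop any candidate with an infrequent k-subset) is the usual textbook choice and an assumption here, since the slide does not spell it out. An itemset counts as frequent when its count is at least `min_count`, which matches the worked example on Table 1 (next slide), where a count of 3 meets a 30% threshold on 10 transactions.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {frozenset(itemset): count} for every itemset contained in
    at least min_count transactions."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Support count of each candidate: transactions that contain it
        return {c: sum(c <= t for t in transactions) for c in candidates}

    def keep_frequent(counts):
        return {c: n for c, n in counts.items() if n >= min_count}

    # Step 1: frequent 1-itemsets
    items = {i for t in transactions for i in t}
    level = keep_frequent(count({frozenset([i]) for i in items}))
    frequent = dict(level)
    k = 1
    while level:
        # Step 2: join frequent k-itemsets into candidate (k+1)-itemsets
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k + 1}
        # Step 3: prune candidates with an infrequent k-subset, then count the rest
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k))}
        level = keep_frequent(count(candidates))
        frequent.update(level)
        k += 1  # Step 4: repeat while new frequent itemsets were found
    return frequent

# Table 1, one character per item
table1 = ["ABC", "BCF", "BFG", "ACDF", "CDEG", "ABEG", "BCFG", "ABG", "ABFG", "CFG"]
result = apriori(table1, min_count=3)
for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
    print("".join(sorted(itemset)), result[itemset])
```

On Table 1 the loop stops after the 3-itemsets: the only 4-item join, {A,B,F,G}, is pruned because {A,B,F} is not frequent.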

8
**Apriori Algorithm (contd.)**

Universal set of items = {A, B, C, D, E, F, G}; minimum support = 30% (3 transactions).

Table 1: Transaction Database

| Transaction | Items |
|---|---|
| 1 | A, B, C |
| 2 | B, C, F |
| 3 | B, F, G |
| 4 | A, C, D, F |
| 5 | C, D, E, G |
| 6 | A, B, E, G |
| 7 | B, C, F, G |
| 8 | A, B, G |
| 9 | A, B, F, G |
| 10 | C, F, G |

Step 1: 1-itemsets. Itemsets with a count of at least 3 are frequent.

| Item | Count | Frequent |
|---|---|---|
| {A} | 5 | yes |
| {B} | 7 | yes |
| {C} | 6 | yes |
| {D} | 2 | no |
| {E} | 2 | no |
| {F} | 6 | yes |
| {G} | 7 | yes |

Step 2: 2-itemsets. All 2-itemsets with {D} or {E} as a subset are pruned.

| Itemset | Count | Frequent |
|---|---|---|
| {A,B} | 4 | yes |
| {A,C} | 2 | no |
| {A,F} | 2 | no |
| {A,G} | 3 | yes |
| {B,C} | 3 | yes |
| {B,F} | 4 | yes |
| {B,G} | 5 | yes |
| {C,F} | 4 | yes |
| {C,G} | 3 | yes |
| {F,G} | 4 | yes |

Step 3: 3-itemsets. All 3-itemsets with a non-frequent 2-itemset as a subset have been pruned.

| Itemset | Count | Frequent |
|---|---|---|
| {A,B,G} | 3 | yes |
| {B,F,G} | 3 | yes |
| {C,F,G} | 2 | no |
| {B,C,F} | 2 | no |
| {B,C,G} | 1 | no |
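A brute-force recount of the Step 3 candidates directly from the transaction table, useful for checking the worked example. Note that {A,B,G} cannot occur more often than its subset {A,G}, which appears only in transactions 6, 8 and 9.

```python
# Table 1: each string lists the items of one transaction (one character per item)
table1 = ["ABC", "BCF", "BFG", "ACDF", "CDEG", "ABEG", "BCFG", "ABG", "ABFG", "CFG"]

def count(itemset):
    """Number of transactions in Table 1 that contain every item of itemset."""
    return sum(set(itemset) <= set(t) for t in table1)

# Candidate 3-itemsets that survive subset pruning in Step 3
step3 = {cand: count(cand) for cand in ["ABG", "BFG", "CFG", "BCF", "BCG"]}
print(step3)  # {'ABG': 3, 'BFG': 3, 'CFG': 2, 'BCF': 2, 'BCG': 1}
```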

9
**Limitation**

The Apriori algorithm finds the itemsets that satisfy the minimum support threshold over the entire transaction database. What about itemsets that are highly frequent over a limited period of time but not over the entire set of transactions? For example: turkey -> pumpkin pie (Thanksgiving). Moreover, the itemsets extracted by the Apriori algorithm might not be valid for the entire period over which association rule mining has been performed.
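A small numeric illustration of this limitation, using entirely hypothetical monthly counts for a seasonal itemset: its support over the whole period falls below the threshold, so Apriori discards it, even though it is clearly frequent within one month.

```python
# Hypothetical data: (month, transactions, transactions containing the seasonal itemset)
months = [("Sep", 100, 2), ("Oct", 100, 5), ("Nov", 100, 60), ("Dec", 100, 4)]
min_sup = 0.3

# Support over the entire database, as Apriori sees it
global_sup = sum(h for _, _, h in months) / sum(n for _, n, _ in months)
# Support within each month
local_sup = {m: h / n for m, n, h in months}

print(f"support over all months: {global_sup:.2f}")
for m, s in local_sup.items():
    print(m, s, "frequent" if s >= min_sup else "")
```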

10
**Related Work**

- X. Chen and I. Petrounias, "Mining Temporal Features in Association Rules," Proc. Third European Conf. on Principles and Practice of Knowledge Discovery in Databases (PKDD '99).
- Y. Li, P. Ning, X. S. Wang and S. Jajodia, "Discovering Calendar-based Temporal Association Rules," Data & Knowledge Engineering, Special Issue: Temporal Representation and Reasoning, vol. 44, no. 2, February 2003.
- Kang et al., "Discovering Flow Anomalies: A SWEET Approach," Eighth IEEE International Conference on Data Mining (ICDM).

12
**Temporal Association Rule Mining**

The book also defines 'time instants' or 'time intervals', 'chronon' and 'duration'. For example, individual transactions are recorded at time instants:

- 12th Dec 2009, 11:20:24: {bread, milk, butter, cheese, chips}
- 12th Dec 2009, 11:27:04: {onion, capsicum, potatoes, burgers}
- 12th Dec 2009, 12:05:44: {soap, shampoo, comb, toothbrush}

Taking one hour as the time unit (chronon) groups them as:

- 12th Dec 2009, 11th hr: {{bread, milk, butter, cheese, chips}, {onion, capsicum, potatoes, burgers}}
- 12th Dec 2009, 12th hr: {{soap, shampoo, comb, toothbrush}}
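A minimal sketch of the grouping step: each timestamped transaction is assigned to the time unit obtained by truncating its timestamp to the hour. The one-hour chronon follows the example; the ISO-style timestamp strings are an assumed input format.

```python
from collections import defaultdict
from datetime import datetime

# Timestamped transactions from the slide
log = [
    ("2009-12-12 11:20:24", {"bread", "milk", "butter", "cheese", "chips"}),
    ("2009-12-12 11:27:04", {"onion", "capsicum", "potatoes", "burgers"}),
    ("2009-12-12 12:05:44", {"soap", "shampoo", "comb", "toothbrush"}),
]

def group_by_hour(log):
    """Map each transaction to its hourly time unit (chronon = 1 hour)."""
    units = defaultdict(list)
    for stamp, items in log:
        t = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        unit = t.replace(minute=0, second=0)  # truncate to the start of the hour
        units[unit].append(items)
    return dict(units)

for unit, txns in sorted(group_by_hour(log).items()):
    print(unit.strftime("%d %b %Y, %H:00"), len(txns), "transactions")
```

Changing the chronon (for example to a day) only changes the truncation line.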

13
**Problem Statement Definitions: Valid Time Intervals**

Support of an itemset I over the interval $(t_i, t_j)$:

$$\mathrm{sup}_{(t_i, t_j)}(I) = \frac{\text{number of transactions in } (t_i, t_j) \text{ containing } I}{\text{total number of transactions during } (t_i, t_j)}$$

Valid time interval for itemset I: a time interval over which the support of I is greater than a threshold (lmin_sup).

Maximal valid time interval: a valid time interval for I that is not contained in any other valid time interval for I.

Temporally frequent itemset: an itemset that has at least one valid time interval associated with it.

Example: lmin_sup = 0.5; per-time-unit supports 0.3, 0.4, 0.5, 0.7, 0.6, 0.2, 0.8.

[Figure: valid time intervals]
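A minimal sketch of these definitions. The per-unit counts are hypothetical: 10 transactions per time unit, chosen so that the per-unit supports equal the example values 0.3 to 0.8. Exact fractions avoid floating-point ties at the threshold, and "valid" uses the strict "greater than" from the definition.

```python
from fractions import Fraction

# Hypothetical counts for one itemset I in time units TU1..TU7:
# (transactions containing I, total transactions)
units = [(3, 10), (4, 10), (5, 10), (7, 10), (6, 10), (2, 10), (8, 10)]
lmin_sup = Fraction(1, 2)

def interval_support(units, i, j):
    """Support of I over time units i..j (1-based, inclusive)."""
    hits = sum(h for h, _ in units[i - 1:j])
    total = sum(n for _, n in units[i - 1:j])
    return Fraction(hits, total)

def is_valid(units, i, j, lmin_sup):
    """Valid time interval: support strictly greater than lmin_sup."""
    return interval_support(units, i, j) > lmin_sup

n = len(units)
valid = [(i, j) for i in range(1, n + 1) for j in range(i, n + 1)
         if is_valid(units, i, j, lmin_sup)]
print(valid)
```

Under these counts TU3 (support exactly 0.5) is not valid on its own, yet it lies inside valid intervals such as (TU2, TU7), whose support is 32/60.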

14
**Problem Statement Definitions: Maximal Valid Time Intervals**

Maximal valid time interval: a valid time interval for an itemset I that is not contained in any other valid time interval for I. The remaining definitions (support over an interval, valid time interval, temporally frequent itemset) are as on the previous slide.

Example: lmin_sup = 0.5; per-time-unit supports 0.3, 0.4, 0.5, 0.7, 0.6, 0.2, 0.8.

[Figure: maximal valid time intervals]

15
**Problem Statement (contd.)**

Given: Transaction data D in the format (TU, {T1, T2, …, Tk}), where TU is a time unit and each Ti is a transaction.

Find: All temporally frequent itemsets along with their maximal valid time intervals.

16
**Problem Statement (contd.)**

So, along with finding the frequent itemsets, we now also have to find the maximal valid time intervals of each frequent itemset. The complexity of the naive approach to finding the maximal valid time intervals of a frequent itemset is $O(n^2)$, where $n = |D|$.
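One way to realize the naive $O(n^2)$ bound, sketched in Python under two assumptions: $n = |D|$ is the number of time units, and prefix sums make each interval's support test $O(1)$. For each start $i$ the scan finds the farthest valid end; since every valid interval starting at $i$ lies inside that one, it is maximal exactly when its end exceeds every end found for an earlier start.

```python
from itertools import accumulate

def maximal_valid_intervals_naive(units, lmin_sup):
    """units: (hits, total) per time unit. Returns 1-based maximal valid intervals."""
    n = len(units)
    hits = [0, *accumulate(h for h, _ in units)]
    totals = [0, *accumulate(t for _, t in units)]

    def valid(i, j):  # O(1) support test via prefix sums
        return hits[j] - hits[i - 1] > lmin_sup * (totals[j] - totals[i - 1])

    result, best_end = [], 0
    for i in range(1, n + 1):
        # Farthest end j with (i, j) valid; scanning all j costs O(n)
        ends = [j for j in range(i, n + 1) if valid(i, j)]
        if ends and ends[-1] > best_end:
            # Not contained in any interval that starts earlier, hence maximal
            result.append((i, ends[-1]))
            best_end = ends[-1]
    return result

# Hypothetical counts with per-unit supports 0.3, 0.4, 0.5, 0.7, 0.6, 0.2, 0.8
units = [(3, 10), (4, 10), (5, 10), (7, 10), (6, 10), (2, 10), (8, 10)]
print(maximal_valid_intervals_naive(units, 0.5))
```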

17
**Finding Maximal Valid Time Intervals**

Definitions:

- Valid (supporting) time unit for I: a time unit during which the support of I is greater than lmin_sup.
- Non-valid (non-supporting) time unit for I: a time unit during which the support of I is less than lmin_sup.

[Figure: time-unit supports 0.3, 0.4, 0.5, 0.7, 0.6, 0.2, 0.8]
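A short sketch classifying the example's time units, using lmin_sup = 0.5 from the problem-statement example. The definitions leave a support exactly equal to lmin_sup unassigned; this sketch assumes such a unit is non-supporting.

```python
# Per-time-unit supports of I from the example
supports = [0.3, 0.4, 0.5, 0.7, 0.6, 0.2, 0.8]
lmin_sup = 0.5

# Supporting: strictly greater than lmin_sup (1-based time-unit indices).
# Assumption: a support equal to lmin_sup is treated as non-supporting.
supporting = [k for k, s in enumerate(supports, start=1) if s > lmin_sup]
non_supporting = [k for k, s in enumerate(supports, start=1) if s <= lmin_sup]
print(supporting, non_supporting)  # [4, 5, 7] [1, 2, 3, 6]
```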

18
**Finding Maximal Valid Time Intervals**

Lemma 1: Each valid time interval $(TU_i, TU_j)$ must contain at least one valid (supporting) time unit for I.

Lemma 2: If the interval $(TU_i, TU_j)$ is not valid for I, then the interval $(TU_i, TU_{j+1})$, where $TU_{j+1}$ is a non-valid time unit, cannot be valid.

Lemma 3: If the interval $(TU_i, TU_j)$ is valid for I, then the interval $(TU_i, TU_{j+1})$, where $TU_{j+1}$ is a valid time unit, is also valid.

Using Lemma 3, continuous runs of supporting time units are collapsed into a single unit with the average density.

[Figure: time-unit supports 0.3, 0.4, 0.5, 0.7, 0.6, 0.2, 0.8 before collapsing, and 0.3, 0.4, 0.6, 0.2, 0.75 after collapsing]
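The lemmas rest on one inequality: if $h_1/n_1 > \theta$ and $h_2/n_2 > \theta$, then $h_1 > \theta n_1$ and $h_2 > \theta n_2$, so $(h_1+h_2)/(n_1+n_2) > \theta$ (Lemma 3); the same argument with $\le$ gives Lemma 2. A minimal sketch of the collapsing step, assuming units carry hypothetical (hits, total) counts so that a run's density is its pooled support, and using the strict "greater than" definition. Under those assumptions, the run TU4 and TU5 is the only one merged.

```python
def collapse_supporting_runs(units, lmin_sup):
    """Merge each maximal run of consecutive supporting time units into one unit.

    units: (hits, total) per time unit, in time order.
    Returns (hits, total, first_unit, last_unit, is_supporting) with 1-based indices.
    """
    out = []
    for k, (h, n) in enumerate(units, start=1):
        supporting = h > lmin_sup * n
        if supporting and out and out[-1][4]:
            # Extend the current run; pooled support stays above lmin_sup (Lemma 3)
            ph, pn, first, _, _ = out[-1]
            out[-1] = (ph + h, pn + n, first, k, True)
        else:
            out.append((h, n, k, k, supporting))
    return out

# Hypothetical counts with per-unit supports 0.3, 0.4, 0.5, 0.7, 0.6, 0.2, 0.8
units = [(3, 10), (4, 10), (5, 10), (7, 10), (6, 10), (2, 10), (8, 10)]
for h, n, first, last, sup in collapse_supporting_runs(units, 0.5):
    print(f"TU{first}-TU{last}: {h}/{n}", "supporting" if sup else "")
```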

19
**Finding Maximal Valid Time Intervals (contd.)**

Given: itemset I, transaction data D = $\langle TU_i, \{T_1, T_2, \ldots, T_n\}\rangle$, lmin_sup

Part 1: Find_maximal_valid_time_intervals(I, D, lmin_sup)

1. Find STU = $\{TU_{a_1}, TU_{a_2}, \ldots, TU_{a_n}\}$ such that each $TU_{a_k}$ is a supporting time unit for I.
2. For i = 1 to n:
   - For j = n down to i+1: if is_valid_time_interval($TU_{a_i}$, $TU_{a_j}$, D, lmin_sup), break.

(Based on Lemmas 1 and 3.)

[Figure: example with time-unit supports 0.3, 0.4, 0.6, 0.7, 0.2, 0.75]
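A sketch of Part 1 as read here: for each supporting unit $TU_{a_i}$, the descending inner loop stops at the farthest supporting unit $TU_{a_j}$ for which $(TU_{a_i}, TU_{a_j})$ is valid, and a single supporting unit is its own fallback. The (hits, total) input, prefix sums, and function name are assumptions; unlike the slides, it runs on uncollapsed units.

```python
from itertools import accumulate

def farthest_valid_partner(units, lmin_sup):
    """Part 1 sketch: for each supporting unit a_i, return (a_i, a_j) where a_j is
    the farthest supporting unit such that the interval a_i..a_j is valid."""
    hits = [0, *accumulate(h for h, _ in units)]
    totals = [0, *accumulate(n for _, n in units)]

    def valid(i, j):  # support over units i..j (1-based) above lmin_sup
        return hits[j] - hits[i - 1] > lmin_sup * (totals[j] - totals[i - 1])

    stu = [k for k in range(1, len(units) + 1) if valid(k, k)]  # supporting units
    pairs = []
    for i in range(len(stu)):
        end = stu[i]  # a single supporting unit is always valid
        for j in range(len(stu) - 1, i, -1):  # j = n down to i+1
            if valid(stu[i], stu[j]):
                end = stu[j]
                break
        pairs.append((stu[i], end))
    return pairs

# Hypothetical counts with per-unit supports 0.3, 0.4, 0.5, 0.7, 0.6, 0.2, 0.8
units = [(3, 10), (4, 10), (5, 10), (7, 10), (6, 10), (2, 10), (8, 10)]
print(farthest_valid_partner(units, 0.5))
```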

20
**Finding Maximal Valid Time Intervals (contd.)**

Given: itemset I, transaction data D = $\langle TU_i, \{T_1, T_2, \ldots, T_n\}\rangle$, lmin_sup

Part 2:

1. start = $TU_{a_{i-1}} + 1$, finish = $TU_{a_{j+1}} - 1$
2. low = start, high = $TU_{a_j}$
3. While low ≤ $TU_{a_i}$ and high ≤ finish:
   - if is_valid_time_interval(low, high), then high = high + 1
   - else low = low + 1

(Based on Lemma 2.)
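A simplified sketch of Parts 1 and 2 together; it favours clarity over speed and does not reproduce the slides' loop or their $O(n'^2 + n)$ bound. It relies on the observation behind the lemmas: if a valid interval's first and last supporting units are $TU_{a_p}$ and $TU_{a_q}$, then $(TU_{a_p}, TU_{a_q})$ must itself be valid, since padding with non-supporting units on either side cannot raise support (Lemma 2). So only starts in $(TU_{a_{p-1}}, TU_{a_p}]$ and ends in $[TU_{a_q}, TU_{a_{q+1}})$ are tried, mirroring Part 2's start and finish bounds. The input format and names are assumptions.

```python
from itertools import accumulate

def maximal_valid_intervals(units, lmin_sup):
    """Sketch of Parts 1 and 2: search only intervals anchored on supporting units
    a_p..a_q with (a_p, a_q) valid, extended into neighbouring non-supporting units."""
    n = len(units)
    hits = [0, *accumulate(h for h, _ in units)]
    totals = [0, *accumulate(t for _, t in units)]

    def valid(i, j):
        return hits[j] - hits[i - 1] > lmin_sup * (totals[j] - totals[i - 1])

    stu = [k for k in range(1, n + 1) if valid(k, k)]  # supporting time units
    found = set()
    for p in range(len(stu)):
        start = stu[p - 1] + 1 if p > 0 else 1  # just after the previous supporting unit
        for q in range(p, len(stu)):
            if not valid(stu[p], stu[q]):
                continue  # Lemma 2: padding with non-supporting units cannot help
            finish = stu[q + 1] - 1 if q + 1 < len(stu) else n  # just before the next one
            for low in range(start, stu[p] + 1):
                for high in range(stu[q], finish + 1):
                    if valid(low, high):
                        found.add((low, high))
    # Keep intervals not strictly contained in another valid interval
    return sorted(iv for iv in found
                  if not any(a <= iv[0] and iv[1] <= b and (a, b) != iv for a, b in found))

# Hypothetical counts with per-unit supports 0.3, 0.4, 0.5, 0.7, 0.6, 0.2, 0.8
units = [(3, 10), (4, 10), (5, 10), (7, 10), (6, 10), (2, 10), (8, 10)]
print(maximal_valid_intervals(units, 0.5))
```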

21
**Finding Maximal Valid Time Intervals (contd.)**

[Figure: worked example on time units with supports 0.3, 0.4, 0.6, 0.7, 0.2, 0.75]

22
**Finding Maximal Valid Time Intervals (contd.)**

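Further iterations…

[Figure: further iterations on time units with supports 0.3, 0.4, 0.6, 0.7, 0.2, 0.75]

Complexity: $O(n'^2 + n)$, where $n'$ is the number of (collapsed) supporting time units in STU and $n = |D|$.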

23
**Finding All Temporally Frequent Itemsets**

Given: transaction data D = $\langle TU_i, \{T_1, T_2, \ldots, T_n\}\rangle$, lmin_sup, UI (universal itemset)

1. C ← Generate_1-item_candidate_sets(UI, D); Interval = (1, |D|)
2. While |C| > 0:
   - temp_freq_sets ← ∅
   - For each candidate set c in C: max_valid_intervals ← find_maximal_valid_time_intervals(c, D, lmin_sup); if |max_valid_intervals| > 0, add ⟨c, max_valid_intervals⟩ to temp_freq_sets.
   - If |temp_freq_sets| > 0: output temp_freq_sets and set C ← generate_new_candidate_sets(temp_freq_sets, D, lmin_sup); else C ← null.
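A compact Python sketch of this level-wise loop. Two stand-ins are assumptions, not the slides' own procedures: `maximal_intervals` is the brute-force prefix-sum search rather than the Part 1/Part 2 method, and candidate generation is the usual Apriori join plus subset check. The per-time-unit data `D` is hypothetical.

```python
from itertools import accumulate, combinations

def maximal_intervals(itemset, D, lmin_sup):
    """Brute-force stand-in for find_maximal_valid_time_intervals (1-based units)."""
    units = [(sum(itemset <= t for t in txns), len(txns)) for txns in D]
    hits = [0, *accumulate(h for h, _ in units)]
    totals = [0, *accumulate(n for _, n in units)]
    result, best_end = [], 0
    for i in range(1, len(units) + 1):
        ends = [j for j in range(i, len(units) + 1)
                if hits[j] - hits[i - 1] > lmin_sup * (totals[j] - totals[i - 1])]
        if ends and ends[-1] > best_end:  # not inside an earlier-starting interval
            result.append((i, ends[-1]))
            best_end = ends[-1]
    return result

def temporally_frequent_itemsets(D, lmin_sup):
    """D[k] is the list of transactions (sets) in time unit k+1."""
    universe = {item for txns in D for t in txns for item in t}
    candidates = {frozenset([i]) for i in universe}
    results = {}
    while candidates:
        level = {}  # temp_freq_sets for this iteration
        for c in candidates:
            intervals = maximal_intervals(c, D, lmin_sup)
            if intervals:
                level[c] = intervals
        results.update(level)
        # Assumed generate_new_candidate_sets: Apriori join + subset check
        k = len(next(iter(candidates)))
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k + 1}
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k))}
    return results

# Hypothetical data: four time units with two transactions each
D = [
    [{"a", "b"}, {"c"}],
    [{"a", "b", "c"}, {"a", "c"}],
    [{"a", "b", "c"}, {"b"}],
    [{"c"}, {"d"}],
]
for itemset, intervals in temporally_frequent_itemsets(D, 0.5).items():
    print(sorted(itemset), intervals)
```

In this toy data {a} and {b} are each temporally frequent while {a, b} is not, whereas {a, c} is valid only over time units 2 to 3.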

24
**Pruning in Candidate Set Generation**

[Figure: transactions T1 to T9; frequent 2-itemsets (L-2): a-b, a-c; candidate 3-itemset (C-3): a-b-c]
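One possible reading of this figure, sketched in Python as an assumption rather than the slides' exact procedure. An interval valid for a-b-c is also valid for each of its 2-subsets, because adding items can only lower support. So a candidate is pruned if some subset is not temporally frequent, and its search can be restricted to intersections of one maximal interval from each subset. Note the standard subset check also requires b-c, which the figure does not show. The intervals below are hypothetical, with T1 to T9 as time units.

```python
from itertools import combinations, product

def intersect(intervals):
    """Common part of several (start, end) intervals, or None if they do not overlap."""
    lo = max(i for i, _ in intervals)
    hi = min(j for _, j in intervals)
    return (lo, hi) if lo <= hi else None

def generate_candidates(level):
    """level: {frozenset k-itemset: [maximal valid intervals]}.
    Returns {candidate (k+1)-itemset: [regions worth searching]}."""
    if not level:
        return {}
    k = len(next(iter(level)))
    out = {}
    for a, b in combinations(level, 2):
        c = a | b
        if len(c) != k + 1 or c in out:
            continue
        subsets = [frozenset(s) for s in combinations(c, k)]
        if not all(s in level for s in subsets):
            continue  # prune: some k-subset is not temporally frequent
        # A valid interval of c lies inside one maximal interval of every subset
        regions = {intersect(choice) for choice in product(*(level[s] for s in subsets))}
        regions.discard(None)
        if regions:
            out[c] = sorted(regions)
    return out

# Hypothetical maximal intervals of the 2-itemsets over time units T1..T9
level2 = {frozenset("ab"): [(1, 5)], frozenset("ac"): [(3, 9)], frozenset("bc"): [(2, 6)]}
print(generate_candidates(level2))
```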

25
**Future Work**

- Find cyclic valid time intervals
- Identify interesting maximal valid time intervals

26
Questions?
