Adaptive Load Shedding for Mining Frequent Patterns from Data Streams. Xuan Hong Dang, Wee-Keong Ng, and Kok-Leong Ong (DaWaK 2006). Yi-Chun Chen, 2008/3/19.

1 Adaptive Load Shedding for Mining Frequent Patterns from Data Streams
Xuan Hong Dang, Wee-Keong Ng, and Kok-Leong Ong (DaWaK 2006)
Yi-Chun Chen, 2008/3/19

2 Outline
– Motivation
– Objective
– Definition
– Adaptive Load Shedding in Data Streams
– Performance Results
– Conclusion

3 Motivation
– Finding frequent itemsets plays an important role in analyzing data streams
– Existing algorithms, however, assume that the machine itself is fast enough to handle all incoming transactions without incurring any unwanted latency

4 (Cont.)
– The arrival rate of a data stream usually exceeds the system capacity
– Algorithms that mine data streams must therefore cope with system overload

5 Objective
– Given a mining system with processing capacity C and a data stream DS with a high arrival rate
– Load(DS): the workload that DS places on the system
– If Load(DS) > C, load shedding is invoked
– Guarantee: the discovered set of patterns closely approximates the set of actual frequent itemsets

6 (Cont.)
– How to determine overload situations?
– How much load to shed?
– How to approximate frequent patterns under the introduction of load shedding?

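7 Definition
– Occurrence count of X: the number of times itemset X occurs in DS up to a given transaction (symbol lost in transcript)
– MFIs: maximal frequent itemsets, i.e. frequent itemsets that have no frequent proper superset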

8 Adaptive Load Shedding in Data Streams
– Overload Detection
– Load Shedding by Sampling Transactions

9 Overload Detection
– To estimate the system workload quickly, we propose an approximate method based on MFIs
– Every frequent itemset is a subset of some MFI, so the MFIs also cover all frequent itemsets
– The number of MFIs is smaller than the number of frequent itemsets
– The support of an MFI is always the closest to the minimum support threshold

10 (Cont.)
– Load coefficient of a transaction (formula lost in transcript):
  – k: the number of MFIs in the transaction
  – each MFI in the transaction and its index range (symbols lost in transcript)
– Suppose we measure the above statistics for n transactions over one time unit
  – r: the current rate of the data stream
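
The overload check can be sketched as follows. The slide's load-coefficient formula did not survive in the transcript, so this sketch substitutes a stand-in: the number of distinct non-empty itemsets covered by the MFIs found in a transaction. The stand-in, the rule "load = r × mean coefficient", and all function names are assumptions for illustration only, not the paper's definitions.

```python
from itertools import combinations


def load_coefficient(mfis_in_txn):
    """Count distinct non-empty itemsets covered by the MFIs of one transaction.

    ASSUMPTION: this is a stand-in for the slide's lost formula.
    """
    covered = set()
    for mfi in mfis_in_txn:
        items = sorted(mfi)
        for size in range(1, len(items) + 1):
            # Each subset of an MFI is an itemset the miner would have to update.
            covered.update(combinations(items, size))
    return len(covered)


def estimated_load(coefficients, rate):
    """Rate r (transactions per time unit) times the mean coefficient of n transactions.

    ASSUMPTION: the way the slide combines r with the measured statistics is lost.
    """
    return rate * sum(coefficients) / len(coefficients)


def is_overloaded(coefficients, rate, capacity):
    """Load shedding is invoked when the estimated load exceeds the capacity C."""
    return estimated_load(coefficients, rate) > capacity
```

Counting through the MFIs rather than through every frequent itemset keeps the estimate cheap, which is the point the Overload Detection slide makes about MFIs being fewer in number.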

11 Load Shedding by Sampling Transactions
– To estimate how much load to shed:
  – P: a parameter expressing the fraction of transactions that should be discarded
  – If P < 1, we use the Hoeffding bound to discard transactions and to approximate the frequent patterns

12 (Cont.)
– Hoeffding bound (formula lost in transcript):
  – r: the number of times that X occurs in these transactions
  – sup(X) = p: the true support of X
  – the estimated support of X (symbol lost in transcript)
  – We want the estimate to satisfy the inequality, so the number of sampled transactions must be at least a lower bound (lost in transcript)
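
As a rough illustration of how a sample size follows from the Hoeffding bound, the sketch below uses the standard two-sided form for n independent 0/1 observations, Pr(|estimate − p| ≥ ε) ≤ 2·exp(−2nε²), which gives n ≥ ln(2/δ) / (2ε²) when the failure probability is capped at δ. The symbols ε and δ and the function name are assumptions; the slide's own formula is missing, so the paper's exact bound may differ.

```python
import math


def hoeffding_sample_size(epsilon, delta):
    """Smallest n with 2 * exp(-2 * n * epsilon**2) <= delta.

    epsilon: tolerated absolute error on the estimated support (assumed symbol).
    delta:   tolerated probability of exceeding that error (assumed symbol).
    """
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))
```

For example, `hoeffding_sample_size(0.01, 0.1)` returns 14979: the sample size grows with 1/ε², so halving the error tolerance roughly quadruples the number of transactions to sample.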

13 (Cont.)
– Sample batch: each incoming transaction is chosen with probability P until enough transactions have been sampled
– Local patterns: the frequent itemsets found in a sample batch, which reflect only part of the stream
– Global patterns: the itemsets that are frequent in the entire stream
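
Building one sample batch can be sketched like this, following this slide's reading of P as the probability of keeping a transaction. The function name, the `batch_size` parameter (the required number of sampled transactions), and the optional `rng` argument are assumptions made for the example.

```python
import random


def sample_batch(stream, p, batch_size, rng=None):
    """Keep each incoming transaction with probability p until batch_size are kept.

    Returns fewer than batch_size transactions if the stream ends first.
    """
    rng = rng or random.Random()
    batch = []
    for txn in stream:
        # random() is uniform on [0, 1), so this keeps txn with probability p.
        if rng.random() < p:
            batch.append(txn)
            if len(batch) == batch_size:
                break
    return batch
```

Transactions that are not kept are simply skipped, which is where the load is shed; the frequent itemsets mined from the returned batch are the local patterns.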

14 (Cont.)
– Because the stream is not uniformly distributed, sample batches can yield false global patterns
– Significant support, and the maximum support error of each pattern (symbols lost in transcript)
– Each pattern is classed as frequent, sub-frequent, or infrequent (conditions lost in transcript)
– Significant patterns
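
The three-way split can be illustrated with a small sketch. The slide's actual conditions are lost, so the thresholds here are purely hypothetical: a pattern is taken as frequent when its estimated support reaches a minimum support `min_sup`, sub-frequent when it falls short by no more than a maximum error `max_err`, and infrequent otherwise.

```python
def classify_pattern(est_support, min_sup, max_err):
    """Label a pattern by its estimated support.

    ASSUMPTION: both thresholds are illustrative; the slide's conditions
    for frequent / sub-frequent / infrequent did not survive.
    """
    if est_support >= min_sup:
        return "frequent"
    if est_support >= min_sup - max_err:
        return "sub-frequent"
    return "infrequent"
```

Keeping a sub-frequent band is one way to avoid dropping a pattern that looks slightly too rare in one sample batch but may still be frequent over the entire stream.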

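15 (Cont.)
– The required number of sampled transactions is at least a lower bound (lost in transcript)
– For some parameter values (lost in transcript), this number is too large
– We assume that each itemset appears in more than 0.01% of the transactions; then, under a further condition (lost in transcript), every itemset will be chosen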

16 Performance Results
– Accuracy measurements and adaptability
– Recall: true frequent patterns found / actual true frequent patterns
– Precision: true frequent patterns found / total frequent patterns found
– Synthetic data: T5I3D1000K and T8I4D1000K, with 10000 unique items
– Real-life data: “BMS-POS” (T6.5, D515597), with 1657 distinct items
– Two parameters are set for the experiments, one fixed and one selected (names and values lost in transcript)
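
The two accuracy measures can be computed from the set of patterns the miner reports and the set of truly frequent patterns. This is a minimal sketch; the function name and the choice to return 0.0 for an empty denominator are assumptions.

```python
def recall_precision(found, actual):
    """Return (recall, precision) for two collections of itemsets.

    found:  patterns reported as frequent by the miner.
    actual: patterns that are truly frequent in the stream.
    """
    found = {frozenset(p) for p in found}
    actual = {frozenset(p) for p in actual}
    true_found = len(found & actual)
    recall = true_found / len(actual) if actual else 0.0
    precision = true_found / len(found) if found else 0.0
    return recall, precision
```

With `found = [{"a"}, {"b"}, {"a", "b"}]` and `actual = [{"a"}, {"b"}, {"c"}, {"a", "c"}]`, two of the four true patterns are found (recall 0.5) and two of the three reported patterns are true (precision 2/3).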

19 Conclusion
– The work addresses the problem of finding frequent patterns in data streams when the mining system may not keep up with the arrival rate of the stream

