Monitoring Methods for Topic Drift in Message Streams By Christopher Ross & S. Muthu Muthukrishnan.


1 Monitoring Methods for Topic Drift in Message Streams By Christopher Ross & S. Muthu Muthukrishnan

2 Topic Bursts Define a continued period of popularity as a “burst”. Given a series of N items, a “burst pattern” on that series is a binary string of length N.
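As a small illustration of the definition above, a burst pattern can be held as a string of 0s and 1s, one character per item, with 1 marking an item that falls inside a burst. The helper name `burst_intervals` and the sample pattern are assumptions made for this sketch; they do not come from the presentation.

```python
from itertools import groupby


def burst_intervals(pattern: str) -> list[tuple[int, int]]:
    """Return inclusive (start, end) index pairs for each run of '1' in a burst pattern."""
    intervals = []
    position = 0
    for bit, run in groupby(pattern):
        length = len(list(run))
        if bit == "1":
            intervals.append((position, position + length - 1))
        position += length
    return intervals


pattern = "0011100110"  # hypothetical burst pattern over N = 10 items
print(burst_intervals(pattern))  # [(2, 4), (7, 8)]
```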

3 Applications Astrophysics: observable phenomena. Stock market: correlated patterns. News items: www.daypop.com. National security.

4 Methods Two algorithms: J. Kleinberg, “Bursty and Hierarchical Structure in Streams”; Y. Zhu and D. Shasha, “Efficient Elastic Burst Detection in Data Streams”.

5 Kleinberg Method Compares local frequency to mean frequency. Utilizes weighted finite state automata and dynamic programming. Approximately O(4*n).

6 Kleinberg Method, contd. The algorithm minimizes two costs: one dependent upon the data stream, and one dependent upon the current state.
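The two-cost minimization can be sketched as a two-state automaton solved with dynamic programming. The transcript does not preserve the slide's cost formulas, so the ones below are assumptions: a binomial negative log-likelihood for the data cost and a charge of `gamma * ln(n)` for entering the burst state. The function name, the parameters `s` and `gamma`, and the sample counts are hypothetical.

```python
import math


def two_state_bursts(relevant, totals, s=2.0, gamma=1.0):
    """Label each batch 0 (baseline) or 1 (burst) by minimizing total cost with DP."""
    n = len(relevant)
    p0 = sum(relevant) / sum(totals)   # baseline rate: the overall mean frequency
    if not 0.0 < p0 < 1.0:
        raise ValueError("mean frequency must lie strictly between 0 and 1")
    rates = (p0, min(p0 * s, 0.9999))  # assumed: burst state scales the rate by s

    def data_cost(state, r, d):
        # Assumed data-dependent cost: negative log-likelihood of r hits in d items.
        p = rates[state]
        return -(math.log(math.comb(d, r)) + r * math.log(p) + (d - r) * math.log(1 - p))

    def state_cost(prev, nxt):
        # Assumed state-dependent cost: moving up into the burst state is charged.
        return gamma * math.log(n) if nxt > prev else 0.0

    cost = [0.0, math.inf]             # start in the baseline state
    back = []
    for r, d in zip(relevant, totals):
        new_cost, choices = [], []
        for nxt in (0, 1):
            options = [cost[prev] + state_cost(prev, nxt) for prev in (0, 1)]
            best_prev = 0 if options[0] <= options[1] else 1
            new_cost.append(options[best_prev] + data_cost(nxt, r, d))
            choices.append(best_prev)
        cost = new_cost
        back.append(choices)

    # Trace the cheapest state sequence backwards from the cheaper final state.
    state = 0 if cost[0] <= cost[1] else 1
    labels = []
    for choices in reversed(back):
        labels.append(state)
        state = choices[state]
    return labels[::-1]


# Hypothetical counts: 10 messages per batch, the topic spikes in batches 3 to 5.
print(two_state_bursts([1, 1, 1, 8, 9, 8, 1, 1], [10] * 8))  # [0, 0, 0, 1, 1, 1, 0, 0]
```

The backward trace needs the whole series before it can commit to any label, which is one way to read the later remark that the method is unsuited for real-time calculations.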

7 Kleinberg Method, contd.
Word: burst interval
americans: 1994 -
medicare: 1994 -
school: 1994 -
welfare: 1994 - 1997
bipartisan: 1995 -
college: 1995 -
communities: 1995 -
working: 1995 - 1996
america: 1996 -
challenge: 1996 -
schools: 1996 -
teachers: 1996 -
21st: 1997 -
ask: 1997 -
century: 1997 -
help: 1998 -

8 Kleinberg Method, contd. Advantages: Adaptable to multiple levels of “burstiness”. Disadvantages: Unsuited for real-time calculations.

9 Zhu/Shasha Method Compares local frequency to a predefined parameter. Utilizes shifted wavelet trees. Approximately O(2*n).

10 Zhu/Shasha Method, contd. After creating the wavelet tree, it is not necessary to search every possible window, merely the appropriate level of the tree.
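The idea of checking only the appropriate level can be sketched as follows. This is a simplified stand-in, not a full shifted wavelet tree: it uses prefix sums and a single level of half-overlapping windows, chosen so that every window of the requested size lies inside one of them. It assumes non-negative counts; the function name, parameters, and sample data are hypothetical.

```python
def detect_bursts(data, w, threshold):
    """Return start indices of length-w windows whose sum is at least threshold."""
    n = len(data)
    prefix = [0]
    for x in data:
        prefix.append(prefix[-1] + x)

    # Pick the smallest level whose windows (size 2*step, shifted by step)
    # are guaranteed to contain every window of length w.
    size = 2
    while size // 2 + 1 < w:
        size *= 2
    step = size // 2

    hits = set()
    for start in range(0, n, step):
        end = min(start + size, n)
        if prefix[end] - prefix[start] < threshold:
            continue  # with non-negative data, nothing inside can reach the threshold
        # Detailed search only inside level windows that pass the filter.
        for p in range(start, end - w + 1):
            if prefix[p + w] - prefix[p] >= threshold:
                hits.add(p)
    return sorted(hits)


counts = [1, 0, 2, 1, 9, 8, 7, 1, 0, 1, 2, 0]  # hypothetical per-interval message counts
print(detect_bursts(counts, w=3, threshold=20))  # [4]
```

Both the window size `w` and the threshold must be supplied by the caller, one pair for each window size of interest.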

11 Zhu/Shasha Method, contd. Advantages: Easily adapted to streaming data. Flexible. Disadvantages: Requires many input parameters.

12 Drawbacks Both algorithms are rather arbitrary: Kleinberg’s has its cost functions; Zhu/Shasha’s has its windows and thresholds. In essence, there is no universal definition of a burst.

13 Future Considerations Alternative algorithms exist. After all, if the definition is arbitrary… Research into Haar wavelets: only capture the k largest coefficients. Is this sufficient?
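To make the Haar question concrete, the sketch below transforms a series, keeps only the k largest-magnitude coefficients, and reconstructs the series from them. It is an illustration under stated assumptions: the transform is the unnormalized averaging/differencing form, the input length must be a power of two, and the function names and sample signal are hypothetical.

```python
def haar_transform(values):
    """Unnormalized Haar transform via pairwise averages and half-differences."""
    coeffs = [float(v) for v in values]
    length = len(coeffs)
    while length > 1:
        half = length // 2
        averages = [(coeffs[2 * i] + coeffs[2 * i + 1]) / 2 for i in range(half)]
        details = [(coeffs[2 * i] - coeffs[2 * i + 1]) / 2 for i in range(half)]
        coeffs[:length] = averages + details
        length = half
    return coeffs


def inverse_haar(coeffs):
    """Invert haar_transform by expanding averages with their detail coefficients."""
    values = list(coeffs)
    length = 1
    while length < len(values):
        merged = []
        for a, d in zip(values[:length], values[length:2 * length]):
            merged += [a + d, a - d]
        values[:2 * length] = merged
        length *= 2
    return values


def keep_top_k(coeffs, k):
    """Zero every coefficient except the k with the largest magnitude."""
    ranked = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    keep = set(ranked[:k])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]


signal = [2, 2, 2, 2, 10, 10, 2, 2]  # hypothetical counts with one short burst
coeffs = haar_transform(signal)
print(inverse_haar(keep_top_k(coeffs, 3)))  # exact: the signal has 3 nonzero coefficients
print(inverse_haar(keep_top_k(coeffs, 2)))  # lossy: the burst is flattened
```

With k = 3 the sample signal is recovered exactly, while k = 2 blurs the burst, so whether k coefficients suffice depends on the shape of the data.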

14 The End

