1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura.

Slides:



Advertisements
Similar presentations
Estimating Distinct Elements, Optimally
Advertisements

Rectangle-Efficient Aggregation in Spatial Data Streams Srikanta Tirthapura David Woodruff Iowa State IBM Almaden.
Optimal Approximations of the Frequency Moments of Data Streams Piotr Indyk David Woodruff.
Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services Authored by: Seth Gilbert and Nancy Lynch Presented by:
3/13/2012Data Streams: Lecture 161 CS 410/510 Data Streams Lecture 16: Data-Stream Sampling: Basic Techniques and Results Kristin Tufte, David Maier.
Algorithms Analysis Lecture 6 Quicksort. Quick Sort Divide and Conquer.
ABSTRACT We consider the problem of computing information theoretic functions such as entropy on a data stream, using sublinear space. Our first result.
Maintaining Variance and k-Medians over Data Stream Windows Brian Babcock, Mayur Datar, Rajeev Motwani, Liadan O’Callaghan Stanford University.
Maintaining Variance over Data Stream Windows Brian Babcock, Mayur Datar, Rajeev Motwani, Liadan O ’ Callaghan, Stanford University ACM Symp. on Principles.
Mining Data Streams.
From Counting Sketches to Equi-Depth Histograms CS240B Notes from a EDBT11 paper entitled: A Fast and Space-Efficient Computation of Equi-Depth Histograms.
Algorithms for data streams Foundations of Data Science 2014 Indian Institute of Science Navin Goyal.
CS 253: Algorithms Chapter 8 Sorting in Linear Time Credit: Dr. George Bebis.
Convergent and Correct Message Passing Algorithms Nicholas Ruozzi and Sekhar Tatikonda Yale University TexPoint fonts used in EMF. Read the TexPoint manual.
1 Distributed Streams Algorithms for Sliding Windows Phillip B. Gibbons, Srikanta Tirthapura.
September 21, 2010Neural Networks Lecture 5: The Perceptron 1 Supervised Function Approximation In supervised learning, we train an ANN with a set of vector.
Time-Decaying Sketches for Sensor Data Aggregation Graham Cormode AT&T Labs, Research Srikanta Tirthapura Dept. of Electrical and Computer Engineering.
A survey on stream data mining
Estimating Set Expression Cardinalities over Data Streams Sumit Ganguly Minos Garofalakis Rajeev Rastogi Internet Management Research Department Bell Labs,
EE 685 presentation Optimization Flow Control, I: Basic Algorithm and Convergence By Steven Low and David Lapsley Asynchronous Distributed Algorithm Proof.
1 Analysis of Link Reversal Routing Algorithms Srikanta Tirthapura (Iowa State University) and Costas Busch (Renssaeler Polytechnic Institute)
CS 580S Sensor Networks and Systems Professor Kyoung Don Kang Lecture 7 February 13, 2006.
One-Pass Wavelet Decompositions of Data Streams TKDE May 2002 Anna C. Gilbert,Yannis Kotidis, S. Muthukrishanan, Martin J. Strauss Presented by James Chan.
Cloud and Big Data Summer School, Stockholm, Aug Jeffrey D. Ullman.
Neighbourhood Sampling for Local Properties on a Graph Stream A. Pavan, Iowa State University Kanat Tangwongsan, IBM Research Srikanta Tirthapura, Iowa.
Maintaining Variance and k-Medians over Data Stream Windows Paper by Brian Babcock, Mayur Datar, Rajeev Motwani and Liadan O’Callaghan. Presentation by.
Approximate Frequency Counts over Data Streams Gurmeet Singh Manku, Rajeev Motwani Standford University VLDB2002.
1 Efficient Computation of Frequent and Top-k Elements in Data Streams.
Streaming Algorithms Piotr Indyk MIT. Data Streams A data stream is a sequence of data that is too large to be stored in available memory Examples: –Network.
1 Oblivious Routing in Wireless networks Costas Busch Rensselaer Polytechnic Institute Joint work with: Malik Magdon-Ismail and Jing Xi.
1 Oblivious Routing in Wireless networks Costas Busch Rensselaer Polytechnic Institute Joint work with: Malik Magdon-Ismail and Jing Xi.
Mathematical Induction II Lecture 21 Section 4.3 Mon, Feb 27, 2006.
Dave McKenney 1.  Introduction  Algorithms/Approaches  Tiny Aggregation (TAG)  Synopsis Diffusion (SD)  Tributaries and Deltas (TD)  OPAG  Exact.
1 Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565.
PODC Distributed Computation of the Mode Fabian Kuhn Thomas Locher ETH Zurich, Switzerland Stefan Schmid TU Munich, Germany TexPoint fonts used in.
1 Chapter 7 Sampling Distributions. 2 Chapter Outline  Selecting A Sample  Point Estimation  Introduction to Sampling Distributions  Sampling Distribution.
Data Stream Algorithms Ke Yi Hong Kong University of Science and Technology.
Analysis of Link Reversal Routing Algorithms for Mobile Ad Hoc Networks Costas Busch (RPI) Srikanth Surapaneni (RPI) Srikanta Tirthapura (Iowa State University)
Load Shedding Techniques for Data Stream Systems Brian Babcock Mayur Datar Rajeev Motwani Stanford University.
EE 685 presentation Optimization Flow Control, I: Basic Algorithm and Convergence By Steven Low and David Lapsley.
1 Costas Busch Õ(Congestion + Dilation) Hot-Potato Routing on Leveled Networks Costas Busch Rensselaer Polytechnic Institute.
1 Online Computation and Continuous Maintaining of Quantile Summaries Tian Xia Database CCIS Northeastern University April 16, 2004.
Calculating frequency moments of Data Stream
Mining of Massive Datasets Ch4. Mining Data Streams
Database Management Systems, R. Ramakrishnan 1 Algorithms for clustering large datasets in arbitrary metric spaces.
June 16, 2004 PODS 1 Approximate Counts and Quantiles over Sliding Windows Arvind Arasu, Gurmeet Singh Manku Stanford University.
1 Bottleneck Routing Games on Grids Costas Busch Rajgopal Kannan Alfred Samman Department of Computer Science Louisiana State University.
1 Concurrent Counting is harder than Queuing Costas Busch Rensselaer Polytechnic Intitute Srikanta Tirthapura Iowa State University.
ICDCS 05Adaptive Counting Networks Srikanta Tirthapura Elec. And Computer Engg. Iowa State University.
REU 2009-Traffic Analysis of IP Networks Daniel S. Allen, Mentor: Dr. Rahul Tripathi Department of Computer Science & Engineering Data Streams Data streams.
Discrete Methods in Mathematical Informatics Kunihiko Sadakane The University of Tokyo
Continuous Monitoring of Distributed Data Streams over a Time-based Sliding Window MADALGO – Center for Massive Data Algorithmics, a Center of the Danish.
Discrete Methods in Mathematical Informatics Kunihiko Sadakane The University of Tokyo
Sorting.
Mining Data Streams (Part 1)
Algorithms for Big Data: Streaming and Sublinear Time Algorithms
Clustering Data Streams
The Stream Model Sliding Windows Counting 1’s
Finding Frequent Items in Data Streams
Efficient Bufferless Routing on Leveled Networks Costas Busch Shailesh Kelkar Malik Magdon-Ismail Rensselaer Polytechnic Institute.
Introduction to Algorithms
Streaming & sampling.
Analysis of Link Reversal Routing Algorithms
Lecture 4: CountSketch High Frequencies
Range-Efficient Counting of Distinct Elements
Brian Babcock, Shivnath Babu, Mayur Datar, and Rajeev Motwani
Range-Efficient Computation of F0 over Massive Data Streams
Maintaining Stream Statistics over Sliding Windows
Mathematical Induction II
Author: Ramana Rao Kompella, Kirill Levchenko, Alex C
Presentation transcript:

1 A Deterministic Algorithm for Summarizing Asynchronous Streams over a Sliding Window Costas Busch Rensselaer Polytechnic Institute Srikanta Tirthapura Iowa State University

2 Introduction Algorithm Analysis Outline of Talk

3 Time Data stream: For simplicity assume unit valued elements

4 Current time Most recent time window of duration Compute the sum of elements with time stamps in time window Goal: Data stream:

5 Example I: All packets on a network link, maintain the number of different ip sources in the last one hour Example II: Large database, continuously maintain averages and frequency moments

6 Synchronous stream t i : In ascending order Asynchronous stream t i : No order guaranteed Data stream:

7 Why Asynchronous Data Streams? Synchronous streamAsynchronous stream Synchronous Asynchronous Merge w/o control Network delay & multi-path routing

8 Processing Requirements: One pass processing Small workspace: poly-logarithmic in the size of data Fast processing time per element Approximate answers are ok

9 Our results: A deterministic data aggregation algorithm Time: Space: Relative Error:

10 Previous Work: [Datar, Gionis, Indyk, Motwani. SIAM Journal on Computing, 2002] Deterministic, Synchronous [Tirthapura, Xu, Busch, PODC, 2006] Randomized, Asynchronous Merging buckets Random sampling

11 Introduction Algorithm Analysis Outline of Talk

12 Current time Time Data stream: For simplicity assume unit valued elements

13 Current time Most recent time window of duration Data stream: Compute the sum of elements with time stamps in time window Goal:

14 Divide time into periods of duration

15 The sliding window may span at most two time periods sliding window

16 sliding window Sum can be written as two sub-sums In two time periods

17 sliding window Data structure that maintains an estimate of In left time period

18 Without loss of Generality, Consider data structure in time period

19 Data structure consists of various levels is an upper bound of the sum in a period

20 Counts up to elements Time period Bucket at Level Consider level

21 Increase counter value Stream:

22 Increase counter value Stream:

23 Stream: Increase counter value

24 Increase counter value Stream:

25 Stream: Counter threshold of reached Split bucket

26 Stream: New buckets have threshold also

27 Stream: Increase appropriate bucket

28 Stream: Increase appropriate bucket

29 Stream: Increase appropriate bucket

30 Stream: Split bucket

31 Stream:

32 Stream: Increase appropriate bucket

33 Stream: Split bucket

34 Stream:

35 Splitting Tree

36 Leaf buckets of duration 1 are not split any further Max depth =

37 The initial bucket may be split into many buckets Leaf buckets

38 Due to space limitations we only keep the last buckets Leaf buckets

39 Suppose we want to find the sum of elements in time period

40 of splitting threshold Consider various levels

41 First level with a leaf bucket that intersects timeline

42 Estimate of S: Consider buckets on right of timeline

43 First level with a leaf bucket On right timeline OR

44 Introduction Algorithm Analysis Outline of Talk

45 Suppose that we use level in order to compute the estimate

46 Stream: A data element is counted in the appropriate bucket Consider splitting threshold level

47 Stream: We can assume that the element is placed in the respective bucket

48 Stream: We can assume that when bucket splits the element is placed in an arbitrary child bucket

49 Stream: If: GOOD! Element counted in correct bucket

50 Stream: If: BAD! Element counted in wrong bucket

51 Consider Leaf Buckets If GOOD!

52 Consider Leaf Buckets If BAD! Element counted in wrong bucket

53 :elements of left part counted on right Consider Leaf Buckets :elements of right part counted on left

54 elements of left part counted on right Must have been initially inserted in one of these buckets

55 Since tree depth

56 Since tree depth Similarly, we can prove Therefore:

57 It can be proven Since

58 It can be proven Since Combined with We obtain relative error :