DMSN 2011 Cagri Balkesen & Nesime Tatbul Scalable Data Partitioning Techniques for Parallel Sliding Window Processing over Data Streams.


1 DMSN 2011 Cagri Balkesen & Nesime Tatbul Scalable Data Partitioning Techniques for Parallel Sliding Window Processing over Data Streams

2 Talk Outline
– Intro & Motivation
– Stream Partitioning Techniques: basic window partitioning, batch partitioning, pane-based partitioning
– Ring-based Query Evaluation
– Experimental Evaluation
– Conclusions & Future Work

3 DSMS Intro & Motivation

4 Architectural Overview
Classical Split-Merge pattern from parallel DBs
– Adjustable parallelism level, d
– QoS on max latency & order (e.g., latency < 5 seconds, disorder < 3 tuples)
(Diagram: input stream → Split node → d parallel Query nodes → Merge node → output stream)
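The split-merge pattern above can be sketched as a toy pipeline. This is a hedged illustration only: the function names and the round-robin routing policy are assumptions, and the talk's actual Split is window- or pane-aware rather than round-robin.

```python
from itertools import cycle

def split(stream, d):
    """Split stage: fan a stream out to d query partitions.
    Round-robin here for illustration; the talk's Split routes
    tuples by window, batch, or pane membership."""
    partitions = [[] for _ in range(d)]
    nxt = cycle(range(d))
    for tup in stream:
        partitions[next(nxt)].append(tup)
    return partitions

def merge(results):
    """Merge stage: collect the partial outputs back into one stream."""
    merged = []
    for part in results:
        merged.extend(part)
    return merged

# d = 3 query nodes; the per-node query is the identity here
parts = split(range(10), d=3)
out = merge(parts)
```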

5 Related Work: How to Partition?
Content-sensitive
– Flux: fault-tolerant, load-balancing Exchange [1,2]
– Uses the group-by values from the query to partition
– Needs explicit load balancing due to skewed data
Content-insensitive
– GSDM: window-based parallelization (fixed-size tumbling windows) [3]
– Win-Distribute: partition at window boundaries
– Win-Split: partition each window into equi-length subwindows
The Problem:
– How to handle sliding windows?
– How to handle queries without a group-by, or with only a few groups?
[1] Flux: An Adaptive Partitioning Operator for Continuous Query Systems, ICDE 03
[2] Highly-Available, Fault-Tolerant, Parallel Dataflows, SIGMOD 04
[3] Customizable Parallel Execution of Scientific Stream Queries, VLDB

6 Stream Partitioning Techniques

7 Approach 1: Basic Sliding Window Partitioning
Independently processable chunking
– Window-aware splitting of the stream
– Each window has an id & tuples are marked with (first-winid, last-winid, is-win-closer)
– Tuples are replicated for each of their windows
(Diagram: w = 6 units, s = 2 units, replication = 6/2 = 3; Split routes windows W1–W4 to Nodes 1–3.)
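The marking scheme can be sketched for time-based windows over unit-spaced tuples. This is a toy model under stated assumptions: `mark_and_replicate` is a hypothetical name, and window j is assumed to cover the interval [j*s, j*s + w).

```python
def mark_and_replicate(t, w, s):
    """Mark a tuple at time t with its window range and replicate it
    once per window it belongs to. Window j covers [j*s, j*s + w)."""
    first = max(0, (t - w) // s + 1)    # earliest window containing t
    last = t // s                       # latest window containing t
    copies = []
    for win in range(first, last + 1):
        closes = (t == win * s + w - 1) # t is the last time unit of `win`
        copies.append((t, win, closes))
    return copies

# w = 6, s = 2: a tuple past the warm-up is replicated w/s = 3 times
copies = mark_and_replicate(t=9, w=6, s=2)
```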

8 Approach 1: Basic Sliding Window Partitioning (cont.)
The problem with basic sliding window partitioning:
– Tuples belong to many windows, depending on the slide
– Excessive replication of tuples for each window
– Increase in the output data volume of Split
(Same diagram: w = 6 units, s = 2 units, replication = 6/2 = 3)

9 Approach 2: Batch-based Partitioning
Batch several windows together to reduce replication
– Batch-window: w_b = w + (B-1)*s ; s_b = B*s
– All the tuples in a batch go to the same partition
– Only tuples overlapping between batches are replicated
Replication reduced to w_b/s_b instead of w/s
(Example: w = 3, s = 1, B = 3 gives w_b = 5, s_b = 3; replication drops from 3 to 5/3.)
Definitions: w : window size, s : slide size, B : batch size

10 The Panes Technique
Divide overlapping windows into disjoint panes
– Reduce cost by sub-aggregation and sharing
– Each window has w/gcd(w,s) panes of size gcd(w,s)
– Query is decomposed into pane-level (PLQ) & window-level (WLQ) queries
(Diagram: windows w1, w2, ... sliding over disjoint panes p1, p2, ...)
[1] No Pane, No Gain: Efficient Evaluation of Sliding Window Aggregates over Data Streams, SIGMOD Record 05
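The PLQ/WLQ decomposition can be illustrated for a sliding-window sum. A minimal sketch (hypothetical function name, count-based windows assumed): each pane of size p = gcd(w, s) is summed once, then each window reuses w/p consecutive pane results.

```python
from math import gcd

def sliding_sums_via_panes(values, w, s):
    """Sliding-window sums via panes: the PLQ sums each disjoint pane
    of size p = gcd(w, s); the WLQ combines w//p consecutive pane
    results per window, advancing s//p panes per slide."""
    p = gcd(w, s)
    # PLQ: one partial aggregate per disjoint pane
    panes = [sum(values[i:i + p]) for i in range(0, len(values) - p + 1, p)]
    wp, sp = w // p, s // p  # window size / slide measured in panes
    # WLQ: combine pane results; each pane is shared by w//p windows
    return [sum(panes[j:j + wp]) for j in range(0, len(panes) - wp + 1, sp)]
```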

11 Approach 3: Pane-based Partitioning
Mark each tuple with pane-id + win-id
– Treat panes as tumbling windows with w_p = s_p = gcd(w,s)
Route tuples to a node based on pane-id
Nodes compute the PLQ over their pane tuples
Combine all PLQ results of a window to form the WLQ result
– Needs an organized topology of nodes
– We propose organizing the nodes in a ring
(Diagram: Split routes panes to Nodes 1–3; w = 6 units, s = 2 units)
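Because panes are disjoint, routing needs no replication. A hedged sketch of the per-tuple decision (hypothetical name; the mapping of pane ids to the d nodes is assumed round-robin here):

```python
from math import gcd

def route(t, w, s, d):
    """Pane-based split: panes are tumbling windows of size gcd(w, s);
    a tuple at time t is stamped with its pane id and sent, without
    replication, to one of d nodes (round-robin over pane ids)."""
    p = gcd(w, s)
    pane_id = t // p
    node = pane_id % d
    return pane_id, node

# w = 6, s = 2 -> pane size 2; consecutive panes hit consecutive nodes
```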

12 Ring-based Query Evaluation
High amount of pipelined result sharing among nodes
Organized communication topology
(Diagram: w = 6 tuples, s = 4 tuples, pane size p = gcd(6,4) = 2 tuples. Split routes panes P1, P2, ... from the input source to Nodes 1–3 arranged in a ring; pane results R are pipelined to ring successors, and window results W1, W2, W3 flow to Merge.)
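A compressed simulation of the idea, under loudly stated assumptions: pane ids map to nodes round-robin, the node owning a window's last pane is taken to be where the window result is completed, and the ring pipelining itself is abstracted away into a direct sum over the pane results that node would have received.

```python
from math import gcd

def ring_eval(values, w, s, d):
    """Sketch of ring-based evaluation for a sliding-window sum:
    PLQ pane results are computed where each pane lands (pane_id % d),
    and each window result is completed at the node that owns the
    window's closing pane, using partials pipelined around the ring."""
    p = gcd(w, s)
    wp, sp = w // p, s // p
    panes = [sum(values[i:i + p]) for i in range(0, len(values) - p + 1, p)]
    results = []
    for j in range((len(panes) - wp) // sp + 1):
        first = j * sp                      # index of the window's first pane
        owner = (first + wp - 1) % d        # node owning the closing pane
        window_sum = sum(panes[first:first + wp])  # partials received via ring
        results.append((owner, window_sum))
    return results
```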

13 Assignment of Windows and Panes to Nodes
– All pane results arrive only from predecessors
– Pane results sent to the successor are only the local panes
– Each node is assigned n consecutive windows
– Minimum n s.t. (constraint given as a formula on the slide)
Definitions: w_w : window size in # of panes, s_w : slide size in # of panes

14 Flexible Result Merging
Fully-ordered: FIFO (* k = 0)
k-ordered: k-ordering constraint [1], a certain amount of disorder allowed
– Defn: any tuple s′ that arrives at least k+1 tuples after s satisfies s′.A ≥ s.A
[1] Exploiting k-Constraints to Reduce Memory Overhead in Continuous Queries over Data Streams. ACM TODS
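One way the Merge stage could exploit a k-ordering constraint is heap-based reordering with a bounded buffer: once k+1 tuples are buffered, the smallest can no longer be displaced by later arrivals and is safe to emit. This is a sketch, not necessarily the paper's implementation, and the function name is hypothetical.

```python
import heapq

def k_ordered_merge(stream, k):
    """Restore full order from a k-ordered stream using a heap of at
    most k+1 tuples; k = 0 degenerates to a FIFO passthrough."""
    heap, out = [], []
    for x in stream:
        heapq.heappush(heap, x)
        if len(heap) > k:            # k+1 buffered -> min is final
            out.append(heapq.heappop(heap))
    out.extend(sorted(heap))         # flush the tail at end of stream
    return out
```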

15 Experimental Evaluation
– Implementation of the techniques in Borealis
– Workload adapted from the Linear Road Benchmark: slightly modified segment statistics queries
– Basic aggregation functions with different window/slide ratios

16 Scalability of the Split Operator
– Pane-partitioning: cost & throughput stay constant regardless of the overlap ratio
– Window- & batch-partitioning: cost grows and throughput drops as overlap increases
– The excessive replication of window-partitioning is reduced by batching
(Plot: maximum input rate (tuples/second) vs. window-size/slide ratio (window overlap))

17 Scalability of Partitioning Techniques
– Pane-based scales close to linearly until Split is saturated: per-tuple cost is constant
– Window- & batch-based: extremely high replication; Split is not saturated, but they scale very slowly
(* w/s = overlap ratio = 100)

18 Summary & Conclusions
Pane-partitioning is the partitioning technique of choice
– Avoids tuple replication
– Incurs less overhead in Split and in the aggregate
– Scales close to linearly
(Figure panels: 1) Window-based, 2) Batch-based, 3) Pane-based)

19 Ongoing & Future Work
– Generalization of the framework
– Support for adaptivity at runtime
– Extending the complexity of query plans
– Extending the performance analysis & experiments

20 Thank You!

