Brian Babcock, Shivnath Babu, Mayur Datar, and Rajeev Motwani

Chain: Operator Scheduling for Memory Minimization in Data Stream Systems

Data Streams are Bursty
Data stream arrival rates are often:
  Fast
  Irregular
Examples:
  Network traffic (IP, telephony, etc.) messages
  Web page access patterns
Peak rate much higher than average rate
  1-2 orders of magnitude
Impractical to provision system for peak rate

Bursts Create Backlogs
Arrival rate temporarily exceeds throughput
  Queues of unprocessed elements build up
Two options when memory fills up:
  Page to disk
    Slows system, lowers throughput
  Admission control (i.e., drop packets)
    Data is lost, answer quality suffers
Our goal: reduce memory needed for queueing

Outline
  Problem Definition
  Intuition Behind the Solution
  Chain Scheduling Algorithm
  Near-Optimality of Chain Scheduling
  Open Problems

Problem Definition
Inputs:
  Data flow path(s) consisting of sequences of operators
  For each operator we know:
    Execution time (per block)
    Selectivity
[Figure: two query plans (Query #1 and Query #2) over two input streams, built from σ and Σ operators; each operator is annotated with its time and selectivity (t1, s1) through (t4, s4).]

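Progress charts
[Figure: progress chart plotting block size against time. Corner points: (0,1), (1,0.5), (4,0.25), (6,0). Opt1 covers the segment from (0,1) to (1,0.5), Opt2 from (1,0.5) to (4,0.25), and Opt3 from (4,0.25) to (6,0).]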
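A progress chart can be derived from per-operator execution times and selectivities. The sketch below assumes operator parameters read off the corner points on the slide (Opt1: time 1, selectivity 0.5; Opt2: time 3, selectivity 0.5; Opt3: time 2, selectivity 0); the function name `progress_chart` is illustrative and does not come from the talk.

```python
from typing import List, Tuple

def progress_chart(ops: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return the corner points (cumulative time, remaining block size).

    ops lists (time_per_block, selectivity) for each operator in pipeline order.
    """
    points = [(0.0, 1.0)]          # an unprocessed block has size 1 at time 0
    elapsed, size = 0.0, 1.0
    for time_per_block, selectivity in ops:
        elapsed += time_per_block  # time spent so far on this block
        size *= selectivity        # fraction of the original block still alive
        points.append((elapsed, size))
    return points

# Assumed parameters, inferred from the chart's corner points.
OPS = [(1.0, 0.5), (3.0, 0.5), (2.0, 0.0)]
chart = progress_chart(OPS)
print(chart)
```

Each segment of the resulting chart corresponds to one operator: its horizontal extent is the operator's time, and its vertical drop is the memory the operator frees.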

Problem Definition
Inputs:
  Data flow path(s) consisting of sequences of operators
  For each operator we know:
    Execution time (per block)
    Selectivity
At each time step:
  Adversary may add blocks of tuples to initial input queue(s)
  Scheduler selects one block of tuples
  Selected block moves one step on its progress chart
Objective: Minimize peak memory usage (sum of queue sizes)
[Figure: two query plans (Query #1 and Query #2) over two input streams, built from σ and Σ operators; each operator is annotated with its time and selectivity (t1, s1) through (t4, s4).]

Main Solution Idea
Fast, selective operators release memory quickly
Therefore, to minimize memory:
  Give preference to fast, selective operators
  Postpone slow, unselective operators
Greedy algorithm:
  Operator priority = selectivity per unit time (si/ti)
  Always schedule the highest-priority available operator
Greedy doesn’t quite work…
  A “good” operator that follows a “bad” operator rarely runs
  The “bad” operator doesn’t get scheduled
  Therefore there is no input available for the “good” operator
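The greedy rule and its weakness can be illustrated with a small sketch. It assumes that an operator's priority is the steepness of its progress chart segment (the share of an original block it frees per unit of time), and it uses the corner points (0,1), (1,0.5), (4,0.25), (6,0) from the progress chart example. The helper names are hypothetical.

```python
# Corner points (time, block size) of the example progress chart.
CHART = [(0.0, 1.0), (1.0, 0.5), (4.0, 0.25), (6.0, 0.0)]

def segment_priorities(chart):
    """Memory freed per unit time for each operator (one per chart segment)."""
    return [(y0 - y1) / (x1 - x0)
            for (x0, y0), (x1, y1) in zip(chart, chart[1:])]

def greedy_pick(priorities, queue_lengths):
    """Index of the highest-priority operator that has input, or None."""
    ready = [i for i, n in enumerate(queue_lengths) if n > 0]
    if not ready:
        return None
    return max(ready, key=lambda i: priorities[i])

priorities = segment_priorities(CHART)

# While new blocks keep arriving, Opt1 always wins, so blocks pile up
# in front of Opt2 and Opt3 never receives any input.
print(greedy_pick(priorities, [1, 4, 0]))  # index of Opt1
# Opt2 only runs once Opt1's queue is empty.
print(greedy_pick(priorities, [0, 4, 0]))  # index of Opt2
```

In this example Opt3 has a higher priority than Opt2, but it can only run after the slower Opt2 has produced output for it, which is the situation the slide describes.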

Bad Example for Greedy
[Figure: progress chart of block size against time for Opt1, Opt2, and Opt3, with the annotation "Tuples build up here".]

Chain Scheduling Algorithm
[Figure: progress chart of block size against time for Opt1, Opt2, and Opt3, with the lower envelope drawn beneath it.]

Chain Scheduling Algorithm
Calculate lower envelope
Priority = |slope of lower envelope segment|
Always schedule highest-priority available operator
Break ties using operator order in pipeline
  Favor later operators
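One way to compute the lower envelope is to start at the first corner of the progress chart, repeatedly jump to the later corner reached by the steepest downward line, and give every operator spanned by that jump the slope as its priority. The sketch below follows that construction under the assumption that all operator times are positive; the function names are illustrative, and the corner points are those of the example chart, (0,1), (1,0.5), (4,0.25), (6,0).

```python
# Corner points (time, block size) of the example progress chart.
CHART = [(0.0, 1.0), (1.0, 0.5), (4.0, 0.25), (6.0, 0.0)]

def chain_priorities(chart):
    """Priority of each operator = |slope| of the envelope segment covering it."""
    priorities = [0.0] * (len(chart) - 1)
    i = 0
    while i < len(chart) - 1:
        x0, y0 = chart[i]
        best_j, best_slope = i + 1, None
        for j in range(i + 1, len(chart)):
            x1, y1 = chart[j]
            slope = (y0 - y1) / (x1 - x0)
            # ">=" extends the segment to the farthest corner on ties.
            if best_slope is None or slope >= best_slope:
                best_j, best_slope = j, slope
        for op in range(i, best_j):
            priorities[op] = best_slope
        i = best_j
    return priorities

def chain_pick(priorities, queue_lengths):
    """Highest-priority operator with input; ties favor later operators."""
    ready = [i for i, n in enumerate(queue_lengths) if n > 0]
    if not ready:
        return None
    return max(ready, key=lambda i: (priorities[i], i))

priorities = chain_priorities(CHART)
print(priorities)
```

For the example chart, Opt2 and Opt3 end up on the same envelope segment and share one priority, so the tie-breaking rule pushes a block through Opt3 as soon as Opt2 has finished with it.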

FIFO: Example
[Figure: progress chart of block size against time with corner points (0,1), (1,0.5), (4,0.25), (6,0) and operators Opt1, Opt2, Opt3.]

Chain: Example
[Figure: progress chart of block size against time with corner points (0,1), (1,0.5), (4,0.25), (6,0), operators Opt1, Opt2, Opt3, and the lower envelope.]

Memory Usage
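The effect on memory can be reproduced with a small tick-based simulation of the example progress chart. Everything here is an assumption made for illustration: a hypothetical burst of one block per tick for four ticks, one unit of work per tick, a block's size changing only when it finishes an operator, and Chain priorities of 0.5, 0.1, 0.1 (the lower-envelope slopes of the example chart).

```python
CUM_TIME = [0, 1, 4, 6]            # corner times of the progress chart
SIZE_AT_OP = [1.0, 0.5, 0.25]      # block size while waiting at Opt1, Opt2, Opt3
CHAIN_PRIORITY = [0.5, 0.1, 0.1]   # lower-envelope slope per operator (assumed)

def operator_of(progress):
    """Operator a block is at, given the work already done on it."""
    for op in range(len(CUM_TIME) - 1):
        if progress < CUM_TIME[op + 1]:
            return op
    raise ValueError("block already finished")

def fifo(blocks):
    return 0  # always work on the oldest unfinished block

def chain(blocks):
    # Highest envelope slope, then later operator, then most progress, then oldest.
    return max(range(len(blocks)),
               key=lambda i: (CHAIN_PRIORITY[operator_of(blocks[i])],
                              operator_of(blocks[i]), blocks[i], -i))

def simulate(policy, arrivals, ticks):
    """Return total queue memory at the start of every tick."""
    blocks = []   # work done on each unfinished block, oldest first
    memory = []
    for t in range(ticks):
        blocks.extend([0] * arrivals.get(t, 0))
        memory.append(sum(SIZE_AT_OP[operator_of(p)] for p in blocks))
        if blocks:
            i = policy(blocks)
            blocks[i] += 1                  # move one step on the progress chart
            if blocks[i] == CUM_TIME[-1]:
                blocks.pop(i)               # block fully processed
    return memory

BURST = {0: 1, 1: 1, 2: 1, 3: 1}            # hypothetical arrival pattern
fifo_mem = simulate(fifo, BURST, 24)
chain_mem = simulate(chain, BURST, 24)
print("FIFO peak:", max(fifo_mem), "Chain peak:", max(chain_mem))
```

Under these assumptions Chain runs Opt1 on every new block first, halving it right away, while FIFO leaves whole blocks waiting behind the one it is finishing.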

Chain is Near-Optimal
Theorem:
  Given a system with k queries, all operator selectivities ≤ 1
  Let C(t) = # of blocks of memory used by Chain at time t
  At every time t, any algorithm must use ≥ C(t) - k memory
Memory within constant factor of optimal offline algorithm
Proof sketch:
  Greedy scheduling is optimal for convex progress charts
    “Best” operators are immediately available
  Lower envelope is convex
  Lower envelope closely approximates actual progress chart
    Proof on next slide…

Lemma: Lower Envelope is Close to Actual Progress Chart
At most one block in the middle of each lower envelope segment
  Due to tie-breaking rule
(Lower envelope + 1) gives upper bound on actual memory usage
  Additive error of 1 block per progress chart
[Figure: progress chart and lower envelope, with the gap between them labeled "Difference".]

Performance Comparison
[Figure annotation: spike in memory due to burst]

Open Problems
Avoid starvation
  Introduce deadlines (maximum allowable latency)
Handle sharing between query plans
  Shared computation
  Shared data (queues store pointers)
Sliding-window joins between streams
  Time synchronization complicates scheduling
  Heuristics proposed in SIGMOD ’03 paper

Thanks for Listening!
