Scalable Data Partitioning Techniques for Parallel Sliding Window Processing over Data Streams. DMSN 2011. Cagri Balkesen & Nesime Tatbul.

Presentation transcript:

Scalable Data Partitioning Techniques for Parallel Sliding Window Processing over Data Streams
DMSN 2011
Cagri Balkesen & Nesime Tatbul

Talk Outline
- Intro & Motivation
- Stream Partitioning Techniques
  - Basic window partitioning
  - Batch partitioning
  - Pane-based partitioning
- Ring-based Query Evaluation
- Experimental Evaluation
- Conclusions & Future Work

Intro & Motivation
[Figure: DSMS (data stream management system) overview]

Architectural Overview
[Figure: input stream -> Split node (split stage) -> parallel Query nodes -> Merge node (merge stage) -> output stream; example QoS: latency < 5 seconds, disorder < 3 tuples]
- Classical split-merge pattern from parallel databases
- Adjustable parallelism level, d
- QoS constraints on maximum latency and output order (a skeleton of the pattern is sketched below)
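A minimal sketch of the split-merge pattern in Python (not the Borealis implementation). The `route` function stands in for the partitioning policy the rest of the talk develops, and `CountNode` is a hypothetical placeholder for a parallel query node:

```python
# Minimal sketch of the split-merge pattern with adjustable parallelism d.
import itertools

class CountNode:
    """Toy query node: counts the tuples routed to it."""
    def __init__(self):
        self.count = 0
    def process(self, t):
        self.count += 1
    def results(self):
        return [self.count]

def make_round_robin_route(d):
    """Content-insensitive placeholder routing over d nodes."""
    counter = itertools.count()
    return lambda t: [next(counter) % d]

def run_split_merge(input_stream, make_node, d, route):
    nodes = [make_node() for _ in range(d)]
    for t in input_stream:            # split stage
        for i in route(t):            # a tuple may be sent to several nodes
            nodes[i].process(t)
    merged = []                       # merge stage (ordering/QoS handling omitted)
    for node in nodes:
        merged.extend(node.results())
    return merged

print(run_split_merge(range(10), CountNode, d=3, route=make_round_robin_route(3)))
```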

Related Work: How to Partition?
- Content-sensitive
  - Flux: fault-tolerant, load-balancing Exchange [1,2]
  - Uses the group-by values from the query to partition
  - Needs explicit load balancing due to skewed data
- Content-insensitive
  - GSDM: window-based parallelization (fixed-size tumbling windows) [3]
  - Win-Distribute: partition at window boundaries
  - Win-Split: partition each window into equal-length sub-windows
- The problem: How to handle sliding windows? How to handle queries without group-by, or with only a few groups?
[1] Flux: An Adaptive Partitioning Operator for Continuous Query Systems. ICDE '03
[2] Highly-Available, Fault-Tolerant, Parallel Dataflows. SIGMOD '04
[3] Customizable Parallel Execution of Scientific Stream Queries. VLDB '05

Stream Partitioning Techniques

Approach 1: Basic Sliding Window Partitioning
- Chunk the stream into independently processable units
- Window-aware splitting of the stream
- Each window has an id; tuples are marked with (first-winid, last-winid, is-win-closer) as sketched below
- Tuples are replicated for each of their windows
[Figure: Split routes tuples t1..t10 to Node1, Node2, Node3 by windows W1..W4; w = 6 units, s = 2 units, replication = 6/2 = 3]
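A sketch of the per-tuple marking for count-based positions; the position-based window layout and the helper names are illustrative assumptions, not the paper's code:

```python
import math

def window_ids(p, w, s):
    """Windows covering count-based position p, where window i spans [i*s, i*s + w)."""
    first = max(0, math.ceil((p - w + 1) / s))
    last = p // s
    return first, last

def mark_tuple(p, w, s):
    """Annotate a tuple with (first-winid, last-winid, is-win-closer); the split
    then replicates the tuple to the node owning each window in that range."""
    first, last = window_ids(p, w, s)
    is_win_closer = p >= w - 1 and (p - (w - 1)) % s == 0
    return first, last, is_win_closer

# Slide example: w = 6, s = 2 -> every tuple belongs to w/s = 3 windows.
for p in range(10):
    print(p, mark_tuple(p, w=6, s=2))
```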

Approach 1: Basic Sliding Window Partitioning (continued)
The problem with basic sliding window partitioning:
- Tuples belong to many windows, depending on the slide
- Excessive replication of tuples, once per window
- Increase in the output data volume of the split
[Figure: same example as above; w = 6 units, s = 2 units, replication = 6/2 = 3]

Approach 2: Batch-based Partitioning
- Batch several windows together to reduce replication
- "Batch-window": wb = w + (B-1)*s ; sb = B*s
- All tuples in a batch go to the same partition
- Only tuples overlapping between batches are replicated
- Replication is reduced to wb/sb partitions instead of w/s (see the sketch below)
Definitions: w = window size, s = slide size, B = batch size
[Figure: tuples t1..t10 grouped into windows w1..w8 and batches B1, B2; with w = 3, s = 1, B = 3: wb = 5, sb = 3, replication drops from 3 to 5/3]
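A sketch of the batch arithmetic under the same count-based model as above; the helper names are invented for illustration:

```python
import math

def batch_params(w, s, B):
    """Batch-window size and slide: wb = w + (B-1)*s, sb = B*s."""
    return w + (B - 1) * s, B * s

def batch_ids(p, w, s, B):
    """Batches whose batch-window covers count-based position p; the split sends
    the tuple once to the node owning each of these batches."""
    wb, sb = batch_params(w, s, B)
    first = max(0, math.ceil((p - wb + 1) / sb))
    return first, p // sb

# Slide example: w = 3, s = 1, B = 3 -> wb = 5, sb = 3;
# replication drops from w/s = 3 to roughly wb/sb = 5/3.
print(batch_params(3, 1, 3))
for p in range(10):
    print(p, batch_ids(p, w=3, s=1, B=3))
```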

The Panes Technique [1]
- Divide overlapping windows into disjoint panes
- Reduce cost by sub-aggregation and sharing
- Each window has w/gcd(w,s) panes of size gcd(w,s)
- The query is decomposed into a pane-level query (PLQ) and a window-level query (WLQ), as sketched below
[Figure: windows w1..w5 sliding over panes p1..p8]
[1] No Pane, No Gain: Efficient Evaluation of Sliding Window Aggregates over Data Streams. SIGMOD Record '05
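The PLQ/WLQ decomposition can be illustrated with SUM over a count-based stream; this is a sketch with made-up function names, not the cited paper's algorithm:

```python
from math import gcd

def pane_sums(stream, w, s):
    """PLQ: sub-aggregate disjoint panes of size gcd(w, s); SUM is the example
    aggregate, but any function with mergeable partials works the same way."""
    g = gcd(w, s)
    return [sum(stream[i:i + g]) for i in range(0, len(stream) - g + 1, g)]

def window_sums(pane_results, w, s):
    """WLQ: combine the pane-level partials that make up each sliding window."""
    g = gcd(w, s)
    panes_per_win, panes_per_slide = w // g, s // g
    return [sum(pane_results[i:i + panes_per_win])
            for i in range(0, len(pane_results) - panes_per_win + 1, panes_per_slide)]

# With w = 6, s = 2: pane size gcd(6, 2) = 2 and 3 panes per window.
data = list(range(1, 13))
plq = pane_sums(data, w=6, s=2)
print(plq)                      # pane-level partial sums
print(window_sums(plq, 6, 2))   # window results assembled from shared panes
```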

Approach 3: Pane-based Partitioning
- Mark each tuple with a pane-id and a window-id
- Treat panes as tumbling windows with wp = sp = gcd(w, s)
- Route tuples to a node based on pane-id (a routing sketch follows)
- Nodes compute the PLQ over their pane tuples
- All PLQ results of a window are combined to form the WLQ result
- Requires an organized topology of nodes; we propose organizing the nodes in a ring
[Figure: Split routes panes to Node1, Node2, Node3; w = 6 units, s = 2 units]
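A minimal sketch of pane-based routing; the round-robin pane-to-node mapping is only an illustrative placeholder, since the actual assignment follows the ring scheme on the next slides:

```python
from math import gcd

def pane_route(p, w, s, num_nodes):
    """Map count-based position p to its pane and the pane to a node; no tuple
    is replicated."""
    pane_size = gcd(w, s)
    pane_id = p // pane_size
    return pane_id, pane_id % num_nodes

# Slide setting: w = 6 units, s = 2 units -> pane size 2.
for p in range(12):
    print(p, pane_route(p, w=6, s=2, num_nodes=3))
```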

Ring-based Query Evaluation
- High amount of pipelined result sharing among nodes
- Organized communication topology (see the simulation sketch below)
[Figure: W = 6, S = 4 tuples, pane size P = gcd(6, 4) = 2 tuples; Window1 = panes 1-3, Window2 = panes 3-5, Window3 = panes 5-7. The split routes panes P1-P3, P8-P9, ... to Node1, P4-P5, P10-P11, ... to Node2, and P6-P7, P12-P13, ... to Node3, which are organized in a ring; each node forwards boundary pane results (R3, R5, R7, R9, R11, R13, ...) to its successor and sends window results W1, W2, W3 to the merge.]
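The pane flow can be simulated centrally for SUM as a sketch, assuming one window per node per ring round as in the slide's example; a real deployment would run the PLQ at each node and forward partials over the ring:

```python
from math import gcd

def ring_evaluate(stream, w, s, num_nodes):
    """Centralized simulation of the pane flow for SUM: each window combines
    the pane partials forwarded by the predecessor with the local ones."""
    g = gcd(w, s)
    ww, sw = w // g, s // g                              # sizes in panes
    panes = [sum(stream[i:i + g]) for i in range(0, len(stream) - g + 1, g)]
    results = []
    for win in range((len(panes) - ww) // sw + 1):
        node = win % num_nodes                           # window-to-node assignment
        first = win * sw
        forwarded = panes[first:first + (ww - sw)] if win else []
        local = panes[first + (ww - sw):first + ww] if win else panes[:ww]
        results.append((win + 1, node + 1, sum(forwarded) + sum(local)))
    return results

# W = 6, S = 4 tuples -> pane size 2; Window2 at Node2 reuses pane result R3
# forwarded by Node1, Window3 at Node3 reuses R5 forwarded by Node2, and so on.
print(ring_evaluate(list(range(1, 15)), w=6, s=4, num_nodes=3))
```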

Assignment of Windows and Panes to Nodes
- All pane results a node needs arrive only from its predecessor
- Pane results sent to the successor are only the node's local panes
- Each node is assigned n consecutive windows, with the minimum n such that n * sw >= ww - sw (see below)
Definitions: ww = window size in number of panes, sw = slide size in number of panes
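A tiny sketch of that minimality condition, assuming the reading n * sw >= ww - sw derived from the two constraints above; this is an interpretation, not a formula quoted from the paper:

```python
import math

def min_windows_per_node(w, s):
    """Smallest n with n * sw >= ww - sw, i.e. enough consecutive windows per
    node that a window's non-local panes all come from the immediate ring
    predecessor (an interpretation of the constraints on this slide)."""
    g = math.gcd(w, s)
    ww, sw = w // g, s // g
    return max(1, math.ceil((ww - sw) / sw))

print(min_windows_per_node(6, 4))   # slide example: ww = 3, sw = 2 -> n = 1
print(min_windows_per_node(6, 2))   # ww = 3, sw = 1 -> n = 2
```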

Flexible Result Merging
- FIFO
- Fully-ordered (the special case k = 0)
- k-ordered: a k-ordering constraint [1] that allows a bounded amount of disorder (a merge sketch follows)
  - Definition: for any tuple s, every tuple s' that arrives at least k+1 tuples after s satisfies s'.A >= s.A
[1] Exploiting k-Constraints to Reduce Memory Overhead in Continuous Queries over Data Streams. ACM TODS '04
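One way to exploit the k-ordering constraint at the merge is a buffer of k+1 result tuples; this is a sketch assuming results carry a timestamp attribute to order by, and with k = 0 it degenerates to FIFO pass-through:

```python
import heapq
import itertools

def sort_k_ordered(stream, key, k):
    """Restore full order from a k-ordered stream: any out-of-order tuple
    arrives within the next k tuples, so a buffer of k+1 tuples (a min-heap on
    the ordering attribute) suffices."""
    heap, seq = [], itertools.count()
    for t in stream:
        heapq.heappush(heap, (key(t), next(seq), t))
        if len(heap) > k:
            yield heapq.heappop(heap)[2]
    while heap:
        yield heapq.heappop(heap)[2]

# Example: merged window results whose timestamps are 2-ordered.
results = [{"ts": 1}, {"ts": 3}, {"ts": 2}, {"ts": 5}, {"ts": 4}, {"ts": 6}]
print(list(sort_k_ordered(results, key=lambda r: r["ts"], k=2)))
```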

Experimental Evaluation
- Techniques implemented in Borealis
- Workload adapted from the Linear Road Benchmark
- Slightly modified segment statistics queries
- Basic aggregation functions with different window/slide ratios

Scalability of the Split Operator
[Chart: maximum input rate (tuples/second) vs. window-size/slide ratio (window overlap)]
- Pane-based partitioning: split cost and throughput stay constant regardless of the overlap ratio
- Window- and batch-based partitioning: cost increases and throughput decreases as the overlap grows
- The excessive replication of window-based partitioning is reduced by batching

Scalability of the Partitioning Techniques (w/s = overlap ratio = 100)
- Pane-based: scales close to linearly until the split is saturated; per-tuple cost is constant
- Window- and batch-based: extremely high replication; the split is not saturated, but throughput scales very slowly

Summary & Conclusions
- Three partitioning techniques: 1) window-based, 2) batch-based, 3) pane-based
- Pane-based partitioning is the technique of choice:
  - Avoids tuple replication
  - Incurs less overhead in the split and aggregate operators
  - Scales close to linearly

Ongoing & Future Work
- Generalization of the framework
- Support for adaptivity during runtime
- Extending the complexity of query plans
- Extending the performance analysis and experiments

Thank You! cagri.balkesen@inf.ethz.ch