Integrating Scale Out and Fault Tolerance in Stream Processing using Operator State Management Author: Raul Castro Fernandez, Matteo Migliavacca, et al.

Integrating Scale Out and Fault Tolerance in Stream Processing using Operator State Management
Authors: Raul Castro Fernandez, Matteo Migliavacca, et al.
Published conference: SIGMOD’13
Reporter: Ma Yuanwen

Introduction
Stream
– A sequence of tuples
– Examples: sensor networks, stock trading systems
Query plan
– A query is specified as a directed acyclic graph
Distributed stream processing
– A query is deployed on a set of nodes
State management
– Checkpoint, back up, restore, partition
Scale out
– Split an operator instance when it becomes overloaded
Fault tolerance
– Recover from failures without affecting processing results

Outline
Background
– Problem statement
– System model
State management
– Query state
– State operations
Scale out and fault tolerance
– Fault-tolerant scale out algorithm
– System architecture
– Bottleneck detection and scaling policy
Evaluation
Conclusions

Problem Statement
Operators
– Stateless operators (e.g. filter or map) and stateful operators (e.g. join or aggregation)
– Sliding-window-based state
– Entire-history-based state
Intra-query parallelism
– Query graph and execution graph
Fault tolerance
– Passive standby strategy
– Active standby strategy
– Upstream backup strategy
Example query
– Report the word frequencies over the most recent hour, once every 10 minutes
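The example query above (word frequencies over the most recent hour, reported every 10 minutes) is a case of sliding-window-based state. A minimal sketch follows, assuming integer timestamps in seconds; the class name `SlidingWordCount` and its methods are hypothetical and do not come from the slides.

```python
from collections import Counter, deque

WINDOW_SECONDS = 60 * 60   # window size: the most recent hour
SLIDE_SECONDS = 10 * 60    # report interval: every 10 minutes


class SlidingWordCount:
    """Hypothetical stateful operator: word frequencies over a time window."""

    def __init__(self, window=WINDOW_SECONDS):
        self.window = window
        self.events = deque()    # (timestamp, word) pairs in arrival order
        self.counts = Counter()  # processing state: word -> frequency

    def process(self, timestamp, word):
        self.events.append((timestamp, word))
        self.counts[word] += 1

    def report(self, now):
        # Evict tuples that have fallen out of the window, then snapshot.
        while self.events and self.events[0][0] <= now - self.window:
            _, old = self.events.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]
        return dict(self.counts)


op = SlidingWordCount()
op.process(0, "stream")
op.process(1200, "state")
op.process(3000, "stream")
first = op.report(now=5 * SLIDE_SECONDS)   # all three tuples are in the window
second = op.report(now=6 * SLIDE_SECONDS)  # the tuple at t=0 has expired
```

The `counts` dictionary is the state that would have to be checkpointed, restored, or partitioned if this operator failed or became a bottleneck.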

System model (1)
name     age  sex
Li Lei   16   male
Han Mei  15   female
Jim      17   male
Each row as a separate tuple:
name     age  sex
Li Lei   16   male
name     age  sex
Han Mei  15   female
name     age  sex
Jim      17   male

System model(2)

Query state (1)

Query state (2)

State operations
Operator state backup and restore
– Checkpoint the state of an operator and back it up to an upstream operator
– Restore the state after a failure or during scale out
Operator state partitioning
– When a stateful operator scales out, its processing state must be split across the new partitioned operators

Operator state backup and restore
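The figure for this slide is not preserved in the transcript. As a rough illustration of checkpointing an operator, backing the checkpoint up to an upstream operator, and restoring it on a replacement, here is a minimal sketch. The `Operator` class, its fields, and the word-count state are all assumptions made for illustration, not the system's actual interface.

```python
import copy


class Operator:
    """Hypothetical operator with explicit processing state."""

    def __init__(self, name):
        self.name = name
        self.state = {}     # processing state, e.g. word -> count
        self.backups = {}   # checkpoints held on behalf of downstream operators
        self.last_ts = -1   # timestamp of the last tuple reflected in state

    def process(self, ts, word):
        self.state[word] = self.state.get(word, 0) + 1
        self.last_ts = ts

    def checkpoint(self):
        # Deep copy so later processing cannot mutate the checkpoint.
        return {"ts": self.last_ts, "state": copy.deepcopy(self.state)}

    def store_backup(self, downstream_name, checkpoint):
        self.backups[downstream_name] = checkpoint

    def restore(self, checkpoint):
        self.state = copy.deepcopy(checkpoint["state"])
        self.last_ts = checkpoint["ts"]


upstream = Operator("source")
counter = Operator("counter")
counter.process(1, "a")
counter.process(2, "b")
upstream.store_backup("counter", counter.checkpoint())
counter.process(3, "a")  # processed after the checkpoint was taken

# Simulate a failure of "counter": a replacement restores the backed-up state.
replacement = Operator("counter-replacement")
replacement.restore(upstream.backups["counter"])
```

The restored state only reflects tuples up to the checkpoint timestamp, so in this sketch the tuple at `ts=3` would still have to be replayed to the replacement.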

Operator state partitioning
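The figure for this slide is not preserved in the transcript. One way to picture splitting keyed processing state across new partitioned operators is sketched below. Hash partitioning with `zlib.crc32` is an assumption chosen for simplicity and may differ from the scheme used in the paper; the function names are hypothetical.

```python
import zlib


def partition_of(key, n):
    """Map a key to one of n partitions with a stable hash (an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % n


def partition_state(state, n):
    """Split keyed state into n disjoint pieces, one per new partition."""
    parts = [{} for _ in range(n)]
    for key, value in state.items():
        parts[partition_of(key, n)][key] = value
    return parts


state = {"stream": 4, "state": 2, "scale": 7, "fault": 1}
parts = partition_state(state, 2)

# Merging the pieces back must give the original state: nothing lost or duplicated.
merged = {}
for part in parts:
    merged.update(part)
```

The same `partition_of` function would also have to route incoming tuples, so that each tuple reaches the partition holding the state for its key.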

Scale out and Fault Tolerance
Scale out
– The stream processing system (SPS) partitions operators on demand in response to bottleneck operators
Fault tolerance
– If a node hosting an operator fails, the SPS must replace it with an operator on a new node
Operator recovery becomes a special case of scale out, in which a failed operator is scaled out to a parallelization level of 1

Fault-tolerant scale out algorithm
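The algorithm listing for this slide is not preserved in the transcript. The sketch below is a simplified illustration of the idea stated earlier, that recovery is scale out to a parallelization level of 1; it is not the paper's algorithm. It assumes word-count state, a checkpoint held upstream, an upstream buffer of tuples, and hash partitioning; all names are hypothetical.

```python
import zlib


def partition_of(key, n):
    """Stable hash partitioning (an assumption, as is the word-count state)."""
    return zlib.crc32(key.encode("utf-8")) % n


def scale_out(checkpoint, buffered, n):
    """Rebuild an operator as n partitions from a backed-up checkpoint.

    checkpoint: {"ts": int, "state": {key: count}} held by the upstream
    buffered:   list of (ts, key) tuples the upstream still holds
    """
    # 1. Partition the backed-up state across the new operators.
    parts = [dict() for _ in range(n)]
    for key, value in checkpoint["state"].items():
        parts[partition_of(key, n)][key] = value
    # 2. Replay buffered tuples newer than the checkpoint, routed by key.
    for ts, key in buffered:
        if ts > checkpoint["ts"]:
            part = parts[partition_of(key, n)]
            part[key] = part.get(key, 0) + 1
    return parts


def recover(checkpoint, buffered):
    """Failure recovery expressed as scale out to a parallelization level of 1."""
    return scale_out(checkpoint, buffered, 1)[0]


checkpoint = {"ts": 2, "state": {"a": 1, "b": 1}}
buffered = [(1, "a"), (2, "b"), (3, "a"), (4, "c")]

recovered = recover(checkpoint, buffered)
split = scale_out(checkpoint, buffered, 3)
```

Because both paths share `scale_out`, the union of the three partitions in `split` holds the same counts as `recovered`.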

System architecture
Query manager
– Maps query operators to nodes and maintains the execution graph
Deployment manager
– Uses the execution graph to initialize nodes, deploy operators, set up stream communication and start processing

Bottleneck detection and scaling policy
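The content of this slide is not preserved in the transcript. As one plausible shape for such a policy, the sketch below assumes that operators periodically report utilisation and that an operator is scaled out after several consecutive reports above a threshold. The threshold value, the number of reports, and all names are hypothetical.

```python
from collections import defaultdict

THRESHOLD = 0.70   # hypothetical utilisation threshold
K = 2              # hypothetical number of consecutive high reports required


class BottleneckDetector:
    """Hypothetical threshold-based scaling policy."""

    def __init__(self, threshold=THRESHOLD, k=K):
        self.threshold = threshold
        self.k = k
        self.streak = defaultdict(int)  # operator -> consecutive high reports

    def report(self, operator, utilisation):
        """Record one report; return True if the operator should scale out."""
        if utilisation > self.threshold:
            self.streak[operator] += 1
        else:
            self.streak[operator] = 0
        if self.streak[operator] >= self.k:
            self.streak[operator] = 0   # start counting again after a trigger
            return True
        return False


detector = BottleneckDetector()
decisions = [detector.report("counter", u) for u in (0.9, 0.5, 0.8, 0.95)]
```

Requiring consecutive reports keeps a single utilisation spike, such as the first report here, from triggering a scale out.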

Goals and deployment of evaluation
The goals of the experimental evaluation are to investigate
– The effectiveness of the stateful operator scale out approach
– The recovery time of the stateful recovery mechanism
– The impact of the state management approach on tuple processing latency
Experiment deployment

Experiment data
Linear Road Benchmark (LRB)
– It models a road toll network
– Queries: (1) provide toll notifications to vehicles within 5s; (2) detect accidents within 5s; (3) answer account balance queries about paid toll amounts
– The input rate for a single expressway (L=1) begins at 15 tuples/s and increases to 1700 tuples/s
Wikipedia
– A map/reduce-style top-k query that outputs, every 30 seconds, the ranking of the most visited Wikipedia language versions, based on Wikipedia data traces
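To make the map/reduce-style top-k query concrete, here is a minimal sketch of one evaluation over a single window of visit records. The record format `(language, page)`, the sample data, and the function names are assumptions for illustration; they are not taken from the Wikipedia traces used in the evaluation.

```python
from collections import Counter


def map_visits(records):
    """Map step: emit one (language, 1) pair per page-visit record."""
    for language, _page in records:
        yield language, 1


def reduce_top_k(pairs, k):
    """Reduce step: sum visits per language and rank the k most visited."""
    totals = Counter()
    for language, n in pairs:
        totals[language] += n
    # Sort by count descending, then by name, so ties are deterministic.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Hypothetical records falling into one 30-second window.
window = [("en", "Main_Page"), ("de", "Berlin"), ("en", "Stream"),
          ("fr", "Paris"), ("en", "SIGMOD"), ("de", "Datenbank")]
ranking = reduce_top_k(map_visits(window), k=2)
```

The per-language totals in the reduce step are keyed state, which is what makes this query a candidate for partitioned scale out.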

Dynamic scale out (1)

Dynamic scale out (2)

Failure recovery Word count

State management overhead

Conclusions
Provide state management for stateful operators
– Checkpoint, back up, restore, partition
Present an integrated approach for scale out and failure recovery
