Published by Katrina Shaw. Modified over 6 years ago.
1
From Rivulets to Rivers: Elastic Stream Processing in Heron
Bill Graham, Twitter
Ashvin Agrawal, Microsoft
Avrilia Floratou, Microsoft
2
Prediction is very difficult, especially if it’s about the future.
Niels Bohr
We cannot direct the wind, but we can adjust the sails.
Dolly Parton
3
Outline
Heron Overview
Elastic Scaling Challenges
Current Implementation
Work in Progress – Auto-scaling
4
Heron
A realtime, distributed, fault-tolerant stream processing engine.
5
About Heron
Developed by Twitter in 2014
Open sourced in May 2016
Storm API compatible
Isolation at all levels:
  Topology
  Container
  Task (process-based)
At-least-once, at-most-once semantics
Backpressure
Low resource overhead (< 10%)
6
Logical Topology
[Diagram: topology graph with Spout 1, Spout 2, and Bolt 1 through Bolt 5]
7
Physical Execution
[Diagram: physical execution of Spout 1, Spout 2, and Bolt 1 through Bolt 5]
8
Packing Plan How to distribute instances onto containers?
IPacking.pack()
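One simple answer to the question on this slide is round-robin placement. The sketch below is a standalone Python illustration, not Heron's `IPacking` interface: the function name `round_robin_pack`, its arguments, and the plan layout are all assumptions made for the example. Containers are numbered from 1 because the submission diagram shows Container 0 hosting the Topology Master.

```python
def round_robin_pack(parallelism, num_containers):
    """Assign component instances to containers in round-robin order.

    parallelism: dict mapping component name -> number of instances.
    Returns a dict mapping container id (1-based) -> list of
    (component, instance_index) tuples.
    Hypothetical helper for illustration; not part of Heron.
    """
    if num_containers < 1:
        raise ValueError("need at least one container")
    plan = {cid: [] for cid in range(1, num_containers + 1)}
    slot = 0
    # Sort component names so the result is deterministic.
    for component in sorted(parallelism):
        for index in range(parallelism[component]):
            container_id = slot % num_containers + 1
            plan[container_id].append((component, index))
            slot += 1
    return plan


plan = round_robin_pack({"spout1": 2, "bolt1": 4}, num_containers=3)
```

Round-robin keeps the number of instances per container within one of each other, but it ignores per-instance CPU and memory needs, which a real packing algorithm would have to account for.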
9
Topology Submission
Containers Allocated
Processes Initialize
Instances Register
Stream Manager Registers
Data Flows
[Diagram: heron submit, Heron Client, PackingPlan, Heron Scheduler; Container 0 runs the Topology Master; three further containers each run a Stream Manager plus spout and bolt instances labeled S1, S2 and B2 through B6]
10
Data Rate Variations
11
Parallelism Challenges
Anticipating component parallelism is difficult
Changing parallelism is costly: O(hour)
  code change, review, merge, build, kill, submit
Tuning for load spikes or valleys is manual: O(day)
Under-provisioning leads to backpressure, which leads to support costs
Over-provisioning is the norm
12
Over-provisioning
[Chart: CPU Requested vs. CPU Used; values shown: 40% and 25%]
13
Elastic Scaling Opportunity
Reduce administration cost
Reduce support cost
Reduce hardware cost
Provide better SLA
14
Ordinary Topology Management Process
[Diagram legend: User Tasks, Heron System Tasks, Time Consuming Tasks]
Steps shown: Releases Resources, Kill Topology, Submit Topology, Create Packing, Acquire Resources, Monitor / Estimate, Build State, Start Topology, Install Topology
15
Low-cost Topology “update”
16
Optimized Topology Scale-up Process
[Diagram legend: User Tasks, Heron System Tasks]
Steps shown: Kill Topology, Submit Topology, Create Packing, Acquire Resources, Update Topology, Pause Topology, Add / Reduce Resources, Un-Pause Topology, Prepare Components, Monitor / Estimate, Build State, Start Topology, Install Topology
17
heron “update”
    $ heron update my_cluster/user/dev MyTopology \
        --component-parallelism=bolt1:20 \
        --component-parallelism=bolt2:40
Available in
Aims to Maintain Uniform Component Distribution
Execution Time O(mins)
Aggressively Prunes Containers
Minimizes Disruption
Customizable Through IRepacking.repack()
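To make the repacking goals on this slide concrete, here is a minimal Python sketch of one greedy strategy. It is not Heron's `IRepacking.repack()` implementation: the function `repack`, the plan layout (container id mapped to a list of component names), and the tie-breaking rules are assumptions chosen for illustration. Instances that already exist stay where they are, new instances go to the least-loaded container, and containers left empty are dropped.

```python
def repack(plan, changes):
    """Return a new plan after applying component parallelism changes.

    plan: non-empty dict, container id -> list of component names
          (one entry per running instance).
    changes: dict, component name -> desired total instance count.
    Hypothetical greedy sketch; it never allocates new containers.
    """
    new_plan = {cid: list(instances) for cid, instances in plan.items()}
    for component, target in changes.items():
        current = sum(insts.count(component) for insts in new_plan.values())
        # Scale up: put each new instance in the container that currently
        # holds the fewest instances (lowest id wins ties).
        for _ in range(target - current):
            cid = min(new_plan, key=lambda c: (len(new_plan[c]), c))
            new_plan[cid].append(component)
        # Scale down: remove from the container that holds the most
        # instances of this component (lowest id wins ties).
        for _ in range(current - target):
            cid = max(new_plan, key=lambda c: (new_plan[c].count(component), -c))
            new_plan[cid].remove(component)
    # Prune containers that no longer hold any instance.
    return {cid: insts for cid, insts in new_plan.items() if insts}


before = {1: ["spout1", "bolt1"], 2: ["spout1", "bolt1"], 3: ["bolt2"]}
after = repack(before, {"bolt1": 4, "bolt2": 0})
```

Because untouched instances keep their container, only the added or removed instances are disturbed, which is one way to read "Minimizes Disruption". A fuller version would also add containers when existing ones run out of capacity.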
18
Current Limitations
Automated state transition not yet supported
  Component scaling event notification: IUpdatable.update()
  Example: KafkaSpout queue partition mappings
Fields grouping routing might change
  Workaround: pause the topology for longer than the cache flush interval before scaling
Algorithmic Auto-Scaling
  Modifying an existing packing plan can be more complex than creating one from scratch
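The fields grouping limitation can be illustrated with a small Python sketch. It assumes hash-modulo routing, where a tuple's key is hashed and taken modulo the downstream parallelism; the helper names `route` and `moved_keys` are hypothetical, and `zlib.crc32` stands in for whatever hash the engine actually uses.

```python
import zlib


def route(key, parallelism):
    """Pick a downstream task index for a key using hash-modulo routing."""
    # crc32 keeps the result stable across runs (Python's hash() is salted).
    return zlib.crc32(key.encode("utf-8")) % parallelism


def moved_keys(keys, old_parallelism, new_parallelism):
    """Return the keys whose target task changes with the new parallelism."""
    return [
        key for key in keys
        if route(key, old_parallelism) != route(key, new_parallelism)
    ]


keys = [f"user-{i}" for i in range(100)]
moved = moved_keys(keys, old_parallelism=4, new_parallelism=5)
print(f"{len(moved)} of {len(keys)} keys now route to a different task")
```

Any bolt that caches per-key state would see some of its keys arrive at a different task after scaling, which is why the slide suggests pausing long enough for caches to flush first.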
19
Algorithmic Auto-Scaling …
[Diagram legend: User Tasks, Heron System Tasks]
Steps shown: Submit Topology, Create Packing, Acquire Resources, Update Topology, Pause Topology, Add / Reduce Resources, Un-Pause Topology, Prepare Components, Monitor / Estimate, Build State, Start Topology, Install Topology
20
Auto-Scaling
Heron uses Dhalion to adjust to external shocks.
Dhalion is a framework that provides self-regulating capabilities to Heron and will be open-sourced in the near future.
Dhalion periodically observes the state of the topology and determines whether resources should be scaled up or down.
Heron should automatically identify variations in the incoming load and react to them.
21
Using Dhalion to Auto-Scale
Dhalion scales the topology resources up and down as needed while keeping the topology in a steady state where backpressure is not observed.
[Diagram: three phases]
Symptom Detection: Metrics feed the Pending Packets Detector, Backpressure Detector, and Processing Rate Skew Detector, which produce Symptoms
Diagnosis Generation: Resource Overprovisioning Diagnoser, Resource Underprovisioning Diagnoser, Data Skew Diagnoser, and Slow Instances Diagnoser, which produce a Diagnosis
Resolution: Resolver Invocation selects among the Bolt Scale Down Resolver, Bolt Scale Up Resolver, Data Skew Resolver, and Restart Instances Resolver
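The three phases on this slide (symptom detection, diagnosis generation, resolution) can be sketched as a small Python pipeline. This is not Dhalion's API: the function names, metric keys (`backpressure_ms`, `pending_packets`), thresholds, and the one-instance scaling step are all assumptions made for illustration, and only the scale-up and scale-down paths are modeled.

```python
def detect_symptoms(metrics, pending_high=1000, pending_low=10):
    """Symptom detection: turn raw metrics into a set of symptom names.

    Metric keys and thresholds are hypothetical.
    """
    symptoms = set()
    if metrics.get("backpressure_ms", 0) > 0:
        symptoms.add("backpressure")
    pending = metrics.get("pending_packets", 0)
    if pending >= pending_high:
        symptoms.add("pending_packets_high")
    elif pending <= pending_low:
        symptoms.add("pending_packets_low")
    return symptoms


def diagnose(symptoms):
    """Diagnosis generation: map a set of symptoms to at most one diagnosis."""
    if "backpressure" in symptoms and "pending_packets_high" in symptoms:
        return "underprovisioned"
    if "backpressure" not in symptoms and "pending_packets_low" in symptoms:
        return "overprovisioned"
    return None


# Resolution: each diagnosis maps to a function that adjusts bolt parallelism.
RESOLVERS = {
    "underprovisioned": lambda parallelism: parallelism + 1,
    "overprovisioned": lambda parallelism: max(1, parallelism - 1),
}


def run_policy_once(metrics, parallelism):
    """Run one detect -> diagnose -> resolve cycle; return the new parallelism."""
    diagnosis = diagnose(detect_symptoms(metrics))
    resolver = RESOLVERS.get(diagnosis)
    return resolver(parallelism) if resolver else parallelism


new_parallelism = run_policy_once(
    {"backpressure_ms": 500, "pending_packets": 5000}, parallelism=4
)
```

Keeping detectors, diagnosers, and resolvers as separate steps means a new symptom or a new corrective action can be added without touching the others, which matches the modular layout of the diagram.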
22
Initial Results
Dhalion is able to adjust the topology resources on-the-fly when workload spikes occur.
Our policy eventually reaches a healthy state where backpressure is not observed and the overall throughput is maximized.
23
Future Plans
Use Dhalion to enforce throughput and latency SLOs and to auto-tune Heron topologies.
Open-source Dhalion and the auto-scaling policy as part of Heron.
Combine scaling with stateful stream processing.
24
Get Involved
25
Up Next
Anomaly detection in real-time data streams using Heron
Arun Kejariwal, Machine Zone
Karthik Ramasamy, Twitter
26
Questions?