Very Large Scale Stream Processing inside Alibaba

longda@alibaba, Alibaba


Team Introduction
- Apache Storm PMC
- The first Storm team in China
- Storm versions: 0.5.1, 0.5.4, 0.6.0, 0.6.2, 0.7.0, 0.7.1
- JStorm versions: 0.7.1, 0.9.0, 0.9.1, 0.9.2, 0.9.3, 0.9.3.1, 0.9.4, 0.9.4.1, 0.9.5, 0.9.5.1, 0.9.6, 0.9.6.1, 0.9.6.2, 0.9.6.3, 0.9.7, 0.9.7.1, 0.9.7.2, 0.9.8, 2.0.4, 2.1.0
- Our job is to do everything: application development, evolving the JStorm platform, JStorm/Storm technical support, and maintaining all clusters

In Alibaba: Everywhere
- 1,600 machines today; deployment on 70K machines is planned
- More than 1,000 applications and 1,500 topologies
- 1.5 PB of data, 2 trillion messages

Large-Scale Applications
- Tlog/EagleEye: 1,000 billion messages, 700 TB of logs, monitors the logs of 200K machines
- RDS Monitor: 200 TB of logs
- CTU Security: 200 billion messages, monitors all trade/user actions, 5 million (500w)
- DB Monitor: 200 billion messages, 5 million (500w)
- BI Realtime Monitor: 200 billion messages, more than 2,000 KPIs
- Alimama Anti-Cheat: 100 billion messages
- Living Room: 11.11 Living Room, 12.12 Living Room, Spring Festival Living Room
- Others: all kinds of monitoring systems

Advanced Features
- User-side functionality
- Stability enhancement
- Performance improvement

Stable
Customer feedback: not a single incident in the Alimama cluster since the switch to JStorm.

Advanced Features: Improve Stability
- Redesigned metric system
- Backpressure
- Resource isolation
- Nimbus HA
- Topology manager
- Redesigned ZK usage
- Modified OS settings in the RPM

Redesign Metric System
- Key point: the response time (RT) of every tuple stage, including wait time between stages and network cost
- Avoid noise
- Pluggable
- Provides an API to fetch all metrics
- Koala: simple, directly displays all metrics
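To make the "every tuple stage RT" idea concrete, here is a minimal sketch of how per-stage timings might be split into transfer, wait, and processing time. It is not JStorm's metric code: the `StageEvent` record, its field names, and the stage names are all hypothetical, and the gap between one stage finishing and the next stage enqueueing the tuple is only used as a stand-in for network cost.

```python
from dataclasses import dataclass


@dataclass
class StageEvent:
    """Timestamps (milliseconds) recorded for one tuple at one stage (hypothetical record)."""
    stage: str
    enqueued_ms: float   # tuple arrived in the stage's input queue
    started_ms: float    # stage began processing the tuple
    finished_ms: float   # stage finished and emitted downstream


def stage_breakdown(events):
    """Split a tuple's end-to-end latency into per-stage parts.

    Returns a list of (stage, transfer_ms, wait_ms, process_ms). transfer_ms is
    the gap between the previous stage finishing and this stage enqueueing the
    tuple, used here as a rough proxy for network cost.
    """
    rows = []
    previous_finish = None
    for ev in events:
        transfer = 0.0 if previous_finish is None else ev.enqueued_ms - previous_finish
        wait = ev.started_ms - ev.enqueued_ms
        process = ev.finished_ms - ev.started_ms
        rows.append((ev.stage, transfer, wait, process))
        previous_finish = ev.finished_ms
    return rows


# Example tuple passing through three made-up stages.
events = [
    StageEvent("spout", 0.0, 0.0, 1.0),
    StageEvent("parse-bolt", 3.0, 5.0, 9.0),
    StageEvent("count-bolt", 10.0, 10.5, 12.0),
]
breakdown = stage_breakdown(events)
total_ms = events[-1].finished_ms - events[0].enqueued_ms
```

Because every gap is attributed to exactly one bucket, the parts add up to the end-to-end latency, which makes it easy to see whether time is lost in queues, on the network, or in user code.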

New UI

Backpressure
- The description in the Heron paper is too simple to use directly
- The design is complicated
- Works well on our online system, at up to 6 times the normal level
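The slides do not describe the actual backpressure design, so the following is only a generic high/low watermark sketch of the general idea: a task flags itself as congested when its input queue fills up, a central coordinator remembers which tasks are congested, and spouts pause while any task is flagged. All class names, method names, and thresholds are hypothetical.

```python
from collections import deque


class BackpressureCoordinator:
    """Remembers which tasks are congested and tells spouts whether to pause."""

    def __init__(self):
        self.congested = set()

    def report(self, task_id, is_congested):
        if is_congested:
            self.congested.add(task_id)
        else:
            self.congested.discard(task_id)

    def spouts_should_pause(self):
        return bool(self.congested)


class TaskQueue:
    """Input queue that raises and clears a congestion flag using two watermarks."""

    def __init__(self, task_id, coordinator, high=8, low=2):
        self.task_id = task_id
        self.coordinator = coordinator
        self.high = high      # flag congestion at or above this length
        self.low = low        # clear the flag at or below this length
        self.items = deque()
        self.flagged = False

    def put(self, item):
        self.items.append(item)
        if not self.flagged and len(self.items) >= self.high:
            self.flagged = True
            self.coordinator.report(self.task_id, True)

    def get(self):
        item = self.items.popleft()
        if self.flagged and len(self.items) <= self.low:
            self.flagged = False
            self.coordinator.report(self.task_id, False)
        return item


coordinator = BackpressureCoordinator()
queue = TaskQueue("bolt-1", coordinator, high=4, low=1)
for i in range(4):
    queue.put(i)
paused_after_fill = coordinator.spouts_should_pause()
for _ in range(3):
    queue.get()
paused_after_drain = coordinator.spouts_should_pause()
```

Using two thresholds instead of one avoids flapping: the flag is cleared only after the queue has drained well below the level that raised it.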

Resource Isolation
- Cluster isolation, controlled through one unified portal, Koala
- Within one cluster: cgroups, combining a CPU share with a CPU limit
- User-defined scheduler: force a topology to run on specific nodes
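As an illustration of "share + limit", the sketch below computes the values one might write into the Linux cgroup v1 CPU controller files `cpu.shares`, `cpu.cfs_period_us`, and `cpu.cfs_quota_us`. How JStorm actually maps worker settings onto cgroups is not stated in the slides; the helper name and the convention of 1024 shares per core of weight are assumptions.

```python
CFS_PERIOD_US = 100_000   # default CFS period in cgroup v1 (100 ms)
DEFAULT_SHARES = 1024     # default cpu.shares weight of a cgroup


def cpu_cgroup_settings(weight_cores, limit_cores):
    """Return cgroup v1 CPU settings for one worker (hypothetical helper).

    weight_cores: relative weight under contention (the "share" part),
                  assuming 1024 shares represent one core of weight.
    limit_cores:  hard ceiling on CPU time (the "limit" part).
    """
    if weight_cores <= 0 or limit_cores <= 0:
        raise ValueError("weight_cores and limit_cores must be positive")
    return {
        "cpu.shares": int(DEFAULT_SHARES * weight_cores),
        "cpu.cfs_period_us": CFS_PERIOD_US,
        # quota / period = number of cores the group may use per period
        "cpu.cfs_quota_us": int(CFS_PERIOD_US * limit_cores),
    }


settings = cpu_cgroup_settings(weight_cores=2, limit_cores=4)
```

The share only matters when the machine is busy, while the quota caps the worker even on an idle machine, which is why the two are typically combined.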

Nimbus HA
- Has been running for more than 20 months
- Stable

TopologyMaster
- The topology's central control; takes over some jobs from Nimbus
- Backpressure coordinator
- Metrics collector/calculator
- Heartbeat collector

Redesign ZK Usage
- No dynamic data is stored on ZK, especially metrics and heartbeats
- With Storm, ZK cannot support more than 400 nodes
- With JStorm, ZK can support 2,000 nodes; currently in Alibaba, many JStorm ZK clusters support 800 nodes

RPM Setting
- Easy installation of JStorm
- Modifies: the local ephemeral port range, ulimit, cron jobs, environment variables

Advanced Features: From the User Side
User-side functionality:
- User-defined scheduler
- User-defined log
- User-defined metrics
- Gentle shutdown
- Dynamic expand/reload/restart
- Customized memory usage
- Different Netty policies
- Classloader

User-Defined Scheduler
Using only the API:
- Customize every worker's CPU/memory usage
- Customize topology assignment
- Assign topologies by user
- Bind several components (such as a spout and a bolt) into one worker
- Bind upstream/downstream components into one worker
- Force one component to run on specific machines
- Force one component's tasks to run on different machines
- Force a topology to run on specific machines
- Force the old assignment to be reused
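Two of the constraints above, pinning a component to specific machines and spreading its tasks over different machines, can be illustrated with a small placement function. This is a toy sketch and not the JStorm scheduler API; `place_component` and its parameters are hypothetical names.

```python
def place_component(task_ids, hosts, allowed_hosts=None, one_task_per_host=False):
    """Map each task of one component to a host (hypothetical helper).

    allowed_hosts:     if given, only these hosts may be used ("specific machines").
    one_task_per_host: if True, every task must land on a different host.
    """
    candidates = [h for h in hosts if allowed_hosts is None or h in allowed_hosts]
    if not candidates:
        raise ValueError("no candidate hosts for this component")
    if one_task_per_host and len(task_ids) > len(candidates):
        raise ValueError("not enough hosts to put every task on a different machine")
    # Round-robin over the candidates; when one_task_per_host is set, the check
    # above guarantees that no host is used twice.
    return {task: candidates[i % len(candidates)] for i, task in enumerate(task_ids)}


hosts = ["host-a", "host-b", "host-c", "host-d"]
pinned = place_component([1, 2, 3], hosts, allowed_hosts={"host-b", "host-d"})
spread = place_component([4, 5, 6], hosts, one_task_per_host=True)
```

Failing loudly when a constraint cannot be met, rather than silently relaxing it, keeps the placement predictable for whoever declared the constraint.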

User-Defined Log
- Switch to the user's log configuration
- Switch between logback and log4j
- Redirect System.out to any file
- Add tags (clustername/hostname/topologyname/workerid/taskid)
- Change log settings dynamically: enable/disable debug, debug log sample rate

User-Defined Metrics
- Using Java metrics: user-defined metrics, displayed in the Web UI
- Using Alimonitor: all metrics are sent to Alimonitor; user-defined alarms; history display
- Koala system (the JStorm portal): all metrics are sent to Koala; history display; user-defined alarms

Gentle Shutdown
Problems resolved:
- No data loss during shutdown
- All workers must be killed
- ZK is left clean
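The slides list the goals of gentle shutdown but not the mechanism. One common way to meet them, shown here purely as an assumption, is to stop accepting new tuples, drain what is already queued, and only then remove the worker's registration. The `Worker` class and the `registry` dictionary (a stand-in for coordination state such as ZK nodes) are hypothetical.

```python
from collections import deque


class Worker:
    """Toy worker that drains its queue before going away (hypothetical)."""

    def __init__(self, worker_id, registry):
        self.worker_id = worker_id
        self.registry = registry          # stand-in for coordination state
        self.registry[worker_id] = "alive"
        self.accepting = True
        self.pending = deque()
        self.processed = []

    def submit(self, item):
        """Queue an item; refuse it once shutdown has started."""
        if not self.accepting:
            return False
        self.pending.append(item)
        return True

    def shutdown(self):
        self.accepting = False                    # 1. stop taking new tuples
        while self.pending:                       # 2. drain what is already queued
            self.processed.append(self.pending.popleft())
        self.registry.pop(self.worker_id, None)   # 3. leave no stale registration


registry = {}
worker = Worker("worker-1", registry)
for n in range(3):
    worker.submit(n)
worker.shutdown()
accepted_after_shutdown = worker.submit(99)
```

The ordering is the point: refusing new input first bounds the amount of work left to drain, and cleaning up the registration last means other components never see a worker that is gone but still holds data.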

Dynamic Expand/Reload/Restart
- Expand: does not kill current workers and does not impact the current data flow
- Restart: resets all configuration; modifies worker/component parallelism
- Reload: reloads the binary; reloads the configuration

Customized Memory Usage
- Customize worker memory: worker.memory.size
- Modify GC options: worker.gc.childopts
- Use the user-defined scheduler API
- Queue mode: capacity limited/unlimited
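For illustration, the two keys named above could be assembled into a configuration map as follows. Only the key names come from the slide; the helper function, the assumption that `worker.memory.size` is expressed in bytes, and the particular JVM flags are illustrative choices rather than recommended settings.

```python
GIB = 1024 ** 3


def worker_memory_conf(heap_gib, gc_opts):
    """Build the memory-related part of a topology configuration (hypothetical helper).

    Assumes worker.memory.size is given in bytes and worker.gc.childopts is a
    single space-separated string of JVM options.
    """
    if heap_gib <= 0:
        raise ValueError("heap_gib must be positive")
    return {
        "worker.memory.size": int(heap_gib * GIB),
        "worker.gc.childopts": " ".join(gc_opts),
    }


conf = worker_memory_conf(4, ["-XX:+UseG1GC", "-XX:MaxGCPauseMillis=200"])
```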

Advanced Netty Features
- Sync/async mode
- Async mode blocking policy
- Async cache policy

Classloader
Resolves class conflicts between the application and JStorm.

6 servers (24 cores/98 GB), 18 spouts/18 bolts/18 ackers

Performance Improvement
1. Smart batch policy
2. Added one thread in every task to deserialize tuples
3. Removed the total send/receive stage
4. Separated the send and receive operations in the spout
5. Fixed several bugs that caused the CPU to spin idly
6. Reduced the performance impact of the metrics system
7. Tuned the acker code
8. Tuned GC
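The slides do not say how the smart batch policy works. As a generic illustration of batching, the sketch below flushes a batch either when it reaches a maximum size or when its oldest item has waited too long, which is the usual trade-off between throughput and latency. The `Batcher` class and its parameters are hypothetical, and timestamps are passed in explicitly so the example is deterministic.

```python
class Batcher:
    """Collects items and flushes on size or age, whichever comes first."""

    def __init__(self, max_size, max_wait_ms):
        self.max_size = max_size
        self.max_wait_ms = max_wait_ms
        self.buffer = []
        self.first_ts = None   # arrival time of the oldest buffered item

    def add(self, item, now_ms):
        """Add an item; return a full batch if one is ready, else None."""
        if not self.buffer:
            self.first_ts = now_ms
        self.buffer.append(item)
        if len(self.buffer) >= self.max_size:
            return self._flush()
        return None

    def poll(self, now_ms):
        """Return a partial batch if its oldest item has waited long enough."""
        if self.buffer and now_ms - self.first_ts >= self.max_wait_ms:
            return self._flush()
        return None

    def _flush(self):
        batch, self.buffer = self.buffer, []
        self.first_ts = None
        return batch


batcher = Batcher(max_size=3, max_wait_ms=10)
results = [batcher.add("a", 0), batcher.add("b", 1), batcher.add("c", 2)]
batcher.add("d", 5)
too_early = batcher.poll(10)   # "d" has waited only 5 ms
timed_out = batcher.poll(15)   # "d" has now waited 10 ms
```

Under heavy load the size limit dominates and batches are full; under light load the timeout dominates and latency stays bounded.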

Architecture
Components: ZooKeeper, UI, Nimbus, Supervisor, worker, task

Next: Merge into Storm
- Replace the Clojure core

Next: Redesign Our SQL Engine
- The current SQL engine is customized, not general-purpose

Future: Where Should Storm/JStorm Go?
1. A more powerful SQL engine
2. A more powerful high-level programming framework
   1. Easier to learn and to debug
   2. Provides higher throughput
3. A high-level scheduler
   1. I do not favor offline systems such as Hadoop/Spark/YARN
   2. I prefer online systems: elastic online scheduler, Docker, virtual machines
   3. More lightweight

Thanks! You are welcome to join us. QQ/WeChat: 32147704. Alibaba
