Very Large Scale Stream Processing inside Alibaba


1 Very Large Scale Stream Processing inside Alibaba
longda@alibaba

2 Current

3 Team Introduction
- Apache Storm PMC
- The first Storm team in China
- Storm 0.5.1/0.5.4/0.6.0/0.6.2/0.7.0/0.7.1
- JStorm 0.7.1/0.9.0/0.9.1/0.9.2/0.9.3/0.9.3.1/0.9.4/0.9.4.1/0.9.5/0.9.5.1/0.9.6/0.9.6.1/0.9.6.2/0.9.6.3/0.9.7/0.9.7.1/0.9.7.2/0.9.8/2.0.4/2.1.0
- Our job: do everything, from application development and the JStorm platform to evolving JStorm/Storm, technical support, and maintaining all clusters

4 In Alibaba, Everywhere
- 1600 machines; 70K machines will be deployed
- More than 1000 applications, 1500 topologies
- 1.5 PB
- 2 trillion messages

5 Large Scale Applications
- Tlog/eagleeye: 1000 billion messages, 700 TB of logs, monitoring the logs of 200K machines
- RDS Monitor: 200 TB of logs
- CTU Security: 200 billion messages, monitoring all trade/user actions, 5 million
- DB Monitor: 200 billion messages, 5 million
- BI Realtime Monitor: 200 billion messages, more than 2000 KPIs
- Alimama Anti-Cheat: 100 billion messages
- Living Room: 11.11 Living Room, 12.12 Living Room, Spring Festival Living Room
- Others: all kinds of monitoring systems

6 Advanced Features: user-side functionality, stability enhancement, performance improvement

7 Stability: Customer Feedback
Not a single incident since the Alimama cluster switched to JStorm.

8 Advanced Features: Improving Stability
- Redesigned metric system
- Backpressure
- Resource isolation
- Nimbus HA
- Topology Master
- Redesigned ZK usage
- OS settings modified in the RPM

9 Redesigned Metric System
Key points:
- Response time (RT) of every tuple stage, including wait time between stages and network cost
- Avoid noise
- Pluggable
- An API to fetch all metrics
Koala: simple, displays all metrics directly
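The core idea on this slide is that a tuple's latency is attributed stage by stage, with queue wait time counted separately from processing time. The Java sketch below shows that bookkeeping for a single stage. `StageTimer` and its three timestamps (enqueue, dequeue, finish) are hypothetical names, not the JStorm metrics API; a network hop could be tracked the same way as a stage of its own.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical per-stage latency tracker: separates queue wait time from processing time. */
class StageTimer {
    private final String stage;
    private final LongAdder count = new LongAdder();
    private final LongAdder waitNanos = new LongAdder();    // enqueue -> dequeue
    private final LongAdder processNanos = new LongAdder(); // dequeue -> finished

    StageTimer(String stage) { this.stage = stage; }

    String stage() { return stage; }

    /** Record one tuple from the three timestamps taken around this stage. */
    void record(long enqueuedAt, long dequeuedAt, long finishedAt) {
        count.increment();
        waitNanos.add(dequeuedAt - enqueuedAt);
        processNanos.add(finishedAt - dequeuedAt);
    }

    double avgWaitNanos() {
        long n = count.sum();
        return n == 0 ? 0 : (double) waitNanos.sum() / n;
    }

    double avgProcessNanos() {
        long n = count.sum();
        return n == 0 ? 0 : (double) processNanos.sum() / n;
    }

    /** Stage RT as the tuple experiences it: waiting plus processing. */
    double avgRtNanos() { return avgWaitNanos() + avgProcessNanos(); }
}
```

Splitting the two components is what makes a slow stage diagnosable: a high wait time points at an overloaded upstream queue, while a high processing time points at the component's own code.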

10 New UI

11 Backpressure
- The design described in the Heron paper is too simple to use as is
- The design is complicated
- Works well on our online system, even at 6 times the normal level
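The slide does not spell out JStorm's trigger logic. A common way to decide when to throttle upstream is a pair of queue-occupancy watermarks with hysteresis, so the signal does not flap on and off around a single threshold. The class name and thresholds below are hypothetical; in JStorm the resulting signal would be handled by the TopologyMaster acting as backpressure coordinator (slide 14).

```java
/** Hypothetical watermark-based backpressure trigger with hysteresis. */
class BackpressureTrigger {
    private final double high; // start throttling when occupancy >= high
    private final double low;  // stop throttling when occupancy <= low
    private boolean throttling = false;

    BackpressureTrigger(double high, double low) {
        if (!(0 <= low && low < high && high <= 1)) {
            throw new IllegalArgumentException("need 0 <= low < high <= 1");
        }
        this.high = high;
        this.low = low;
    }

    /** Called periodically with the receive queue's size; returns whether upstream should be throttled. */
    boolean update(int queueSize, int capacity) {
        double occupancy = (double) queueSize / capacity;
        if (!throttling && occupancy >= high) {
            throttling = true;  // e.g. ask the coordinator to slow down the spouts
        } else if (throttling && occupancy <= low) {
            throttling = false; // release only once the queue has really drained
        }
        return throttling;
    }
}
```

With `high = 0.8` and `low = 0.05`, a queue that drops from 85% to 50% stays throttled; only when it falls to 5% or below is the throttle released.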

12 Resource Isolation
- Cluster isolation, controlled through one unified portal: Koala
- Within one cluster: cgroups, with CPU share + limit
- User-defined scheduler, forcing a topology to run on specific nodes
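With the cgroup v1 CPU controller, a "share" is a relative weight (`cpu.shares`, default 1024) that only matters under contention, and a "limit" is a hard cap (`cpu.cfs_quota_us` microseconds of CPU time per `cpu.cfs_period_us`). This sketch derives both values from a worker's requested cores; the 1024-shares-per-core scaling and the `CgroupCpu` helper are assumptions, not necessarily what JStorm does.

```java
/** Hypothetical mapping from requested CPU cores to cgroup v1 cpu controller values. */
final class CgroupCpu {
    static final long SHARES_PER_CORE = 1024;      // assumption: one default weight per core
    static final long DEFAULT_PERIOD_US = 100_000; // cgroup v1 default cpu.cfs_period_us (100 ms)

    private CgroupCpu() {}

    /** "Share": relative weight used when the machine is contended. */
    static long cpuShares(double cores) {
        return Math.round(cores * SHARES_PER_CORE);
    }

    /** "Limit": CPU microseconds allowed per DEFAULT_PERIOD_US period. */
    static long cfsQuotaUs(double cores) {
        return Math.round(cores * DEFAULT_PERIOD_US);
    }
}
```

For a worker requesting 1.5 cores this gives `cpu.shares` of 1536 and a quota of 150000 us per 100000 us period: the worker competes by weight when the host is busy and can never use more than 1.5 cores.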

13 Nimbus HA: has run stably for more than 20 months

14 TopologyMaster
The topology's central controller, taking over some jobs from Nimbus:
- Backpressure coordinator
- Metrics collector/calculator
- Heartbeat collector
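Keeping heartbeats in the TopologyMaster rather than in ZK (slide 15) turns dead-task detection into a simple in-memory check. A minimal sketch, assuming hypothetical integer task ids and caller-supplied timestamps so the logic is easy to test; `HeartbeatCollector` is not a JStorm class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical heartbeat collector: remembers the last report per task and flags silent tasks. */
class HeartbeatCollector {
    private final long timeoutMs;
    private final Map<Integer, Long> lastSeen = new ConcurrentHashMap<>();

    HeartbeatCollector(long timeoutMs) { this.timeoutMs = timeoutMs; }

    /** Called when a worker reports a heartbeat for one of its tasks. */
    void onHeartbeat(int taskId, long nowMs) {
        lastSeen.put(taskId, nowMs);
    }

    /** Tasks whose last heartbeat is older than the timeout, sorted by task id. */
    List<Integer> deadTasks(long nowMs) {
        List<Integer> dead = new ArrayList<>();
        for (Map.Entry<Integer, Long> e : lastSeen.entrySet()) {
            if (nowMs - e.getValue() > timeoutMs) {
                dead.add(e.getKey());
            }
        }
        dead.sort(null); // natural ordering of task ids
        return dead;
    }
}
```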

15 Redesigned ZK Usage
- No dynamic data is stored in ZK, especially metrics and heartbeats
- ZK can't support more than 400 Storm nodes
- ZK can support 2000 JStorm nodes; currently at Alibaba, many JStorm ZK clusters support 800 nodes

16 RPM Settings
- Easy JStorm installation
- Modifies the local ephemeral port range, ulimit, cron jobs, and environment variables

17 Advanced Features: From the User Side
- User-defined scheduler
- User-defined log
- User-defined metrics
- Graceful shutdown
- Dynamic expand/reload/restart
- Customized memory usage
- Different Netty policies
- Classloader

18 User-Defined Scheduler
Just using the API:
- Customize every worker's CPU/memory usage
- Customized topology assignment:
  - Assign topology by used
  - Bind several components into one worker (such as spout/bolt)
  - Bind upstream/downstream components into one worker
  - Force one component to run on specific machines
  - Force one component's tasks to run on different machines
  - Force a topology to run on specific machines
  - Force using the old assignment
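To illustrate two of these constraints together (a component's tasks on different machines, optionally restricted to a set of specific machines), the sketch below gives each task its own host. `SpreadScheduler` is hypothetical and far simpler than a real scheduler, which would also weigh CPU/memory usage and existing assignments.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical placement rule: every task of one component lands on a different host. */
final class SpreadScheduler {
    private SpreadScheduler() {}

    /**
     * Returns taskId -> host. If allowedHosts is non-empty, only those hosts are used
     * ("force run on specific machines"). Throws if there are fewer candidate hosts than tasks.
     */
    static Map<Integer, String> spread(List<Integer> taskIds, List<String> hosts, Set<String> allowedHosts) {
        List<String> candidates = new ArrayList<>();
        for (String h : hosts) {
            if (allowedHosts.isEmpty() || allowedHosts.contains(h)) {
                candidates.add(h);
            }
        }
        if (taskIds.size() > candidates.size()) {
            throw new IllegalStateException(taskIds.size() + " tasks but only " + candidates.size() + " hosts");
        }
        Map<Integer, String> assignment = new LinkedHashMap<>();
        for (int i = 0; i < taskIds.size(); i++) {
            assignment.put(taskIds.get(i), candidates.get(i)); // i-th task -> i-th distinct host
        }
        return assignment;
    }
}
```

Failing loudly when there are too few hosts is a deliberate choice here: silently co-locating two tasks would violate the constraint the user asked for.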

19 User-Defined Log
- Switch to the user's log configuration
- Switch between logback and log4j
- Redirect System.out to any file
- Add tags (clustername/hostname/topologyname/workerid/taskid)
- Dynamically change log settings: enable/disable debug, debug log sample rate
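A sketch of two of these features: tagging every line with cluster/host/topology/worker/task, and a debug switch with a sample rate that can be changed at runtime. `TaggedLogger` is hypothetical and writes to any `PrintStream`; the real feature sits on top of logback/log4j. Redirecting `System.out` to a file can be done with the standard `System.setOut(new PrintStream(new FileOutputStream(path, true), true))`.

```java
import java.io.PrintStream;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical logger that prefixes tags and samples debug lines at a runtime-adjustable rate. */
class TaggedLogger {
    private final String tags;
    private final PrintStream out;
    private volatile boolean debugEnabled = false;
    private volatile int debugSampleEvery = 1; // print 1 out of every N debug calls
    private final AtomicLong debugCalls = new AtomicLong();

    TaggedLogger(String cluster, String host, String topology, String workerId, int taskId, PrintStream out) {
        this.tags = String.format("[%s/%s/%s/%s/%d]", cluster, host, topology, workerId, taskId);
        this.out = out;
    }

    /** Dynamic setting, e.g. applied from a config push without restarting the worker. */
    void setDebug(boolean enabled, int sampleEvery) {
        this.debugSampleEvery = Math.max(1, sampleEvery);
        this.debugEnabled = enabled;
    }

    void info(String msg) {
        out.println(tags + " INFO " + msg);
    }

    void debug(String msg) {
        if (!debugEnabled) {
            return; // disabled calls are not counted toward sampling
        }
        if (debugCalls.getAndIncrement() % debugSampleEvery == 0) {
            out.println(tags + " DEBUG " + msg);
        }
    }
}
```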

20 User-Defined Metrics
- Using Java metrics: user-defined metrics, displayed in the web UI
- Using Alimonitor: all metrics are sent to Alimonitor; user-defined alarms; history display
- Koala system (JStorm portal): all metrics are sent to Koala; history display; user-defined alarms

21 Graceful Shutdown
Problems resolved:
- No data loss during shutdown
- All workers must be killed
- ZK is left clean
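One plausible ordering that meets all three requirements is: stop intake, drain in-flight tuples, kill workers, then clean ZK. The sketch below models only that ordering; `GracefulShutdown`, its in-flight counter and the step names are hypothetical, and it kills workers even when draining times out, which is a design choice rather than something the slide states.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical graceful-shutdown sequence: stop intake, drain in-flight tuples, then kill and clean up. */
class GracefulShutdown {
    final AtomicInteger inFlight = new AtomicInteger(); // tuples emitted but not yet fully processed
    volatile boolean spoutsActive = true;
    final List<String> steps = new ArrayList<>();       // records the order of actions for inspection

    /** Returns true if all in-flight tuples finished before the timeout. */
    boolean shutdown(long drainTimeoutMs) {
        spoutsActive = false;                            // 1. spouts stop emitting new tuples
        steps.add("deactivate spouts");
        long deadline = System.currentTimeMillis() + drainTimeoutMs;
        while (inFlight.get() > 0 && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(10);                        // 2. let bolts finish what is already in flight
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        boolean drained = inFlight.get() == 0;
        steps.add(drained ? "drained" : "drain timeout");
        steps.add("kill workers");                       // 3. every worker is killed, drained or not
        steps.add("clean zk");                           // 4. remove the topology's ZK entries last
        return drained;
    }
}
```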

22 Dynamic Expand/Reload/Restart
- Expand: doesn't kill current workers or impact the current data flow
- Restart: resets all configuration; can modify worker/component parallelism
- Reload: reloads the binary; reloads the configuration

23 Customized Memory Usage
- Customize worker memory: worker.memory.size
- Modify GC options: worker.gc.childopts
- Use the user-defined scheduler API
- Queue mode: capacity limited/unlimited
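The queue-mode option trades memory safety against blocking. A sketch using the standard `java.util.concurrent` queues; the `TaskQueues` factory and its `Mode` names are hypothetical, while `worker.memory.size` and `worker.gc.childopts` above are the configuration keys named on the slide.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical queue-mode switch for a task's receive queue. */
final class TaskQueues {
    enum Mode { CAPACITY_LIMITED, UNLIMITED }

    private TaskQueues() {}

    static <T> BlockingQueue<T> create(Mode mode, int capacity) {
        switch (mode) {
            case CAPACITY_LIMITED:
                return new ArrayBlockingQueue<>(capacity); // offer() fails and put() blocks when full
            case UNLIMITED:
                return new LinkedBlockingQueue<>();        // never full; grows until the heap runs out
            default:
                throw new IllegalArgumentException("unknown mode " + mode);
        }
    }
}
```

A capacity-limited queue bounds per-worker memory and pushes back on producers; an unlimited one never blocks, but a slow consumer can grow it until the worker exhausts its heap.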

24 Advanced Netty Features
- Sync/async mode
- Async mode blocking policy
- Async cache policy

25 Classloader: resolves class conflicts between the application and JStorm
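A common way to resolve such conflicts is a child-first (parent-last) class loader: the application's jars are searched before the framework's classpath, while JDK classes always come from the parent. This is a generic sketch of that technique, not JStorm's actual classloader.

```java
import java.net.URL;
import java.net.URLClassLoader;

/** Child-first class loader: application classes win over framework copies; JDK classes come from the parent. */
class ChildFirstClassLoader extends URLClassLoader {
    ChildFirstClassLoader(URL[] appJars, ClassLoader parent) {
        super(appJars, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null && !name.startsWith("java.") && !name.startsWith("javax.")) {
                try {
                    c = findClass(name);            // 1. look in the application's own jars first
                } catch (ClassNotFoundException ignored) {
                    // not in the application jars: fall back to the parent
                }
            }
            if (c == null) {
                c = super.loadClass(name, false);   // 2. JDK and framework classes
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

If the application bundles its own version of a library that JStorm also uses, the application's copy is loaded inside this loader and the framework keeps its own copy, so the two versions no longer collide.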

26 6 servers (24 cores/98 GB); 18 spouts/18 bolts/18 ackers

27 Performance Improvement
1. Smart batch policy
2. Added one thread per task to deserialize tuples
3. Removed the global send/receive stage
4. Separated send and receive operations in the spout
5. Fixed several bugs that caused the CPU to spin idly
6. Reduced the performance impact of the metrics system
7. Tuned the acker code
8. Tuned GC
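The slide does not describe the smart batch policy itself. A typical batching rule flushes when a batch is full or when its oldest tuple has waited too long, which raises throughput under load without adding much latency when traffic is light. `SmartBatcher` and its parameters are hypothetical; timestamps are passed in to keep the sketch testable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical batcher: flush when the batch is full or when its oldest item has waited too long. */
class SmartBatcher<T> {
    private final int maxSize;
    private final long maxDelayMs;
    private final Consumer<List<T>> sink; // e.g. one network send per batch
    private List<T> batch = new ArrayList<>();
    private long firstAddedAt = -1;

    SmartBatcher(int maxSize, long maxDelayMs, Consumer<List<T>> sink) {
        this.maxSize = maxSize;
        this.maxDelayMs = maxDelayMs;
        this.sink = sink;
    }

    void add(T item, long nowMs) {
        if (batch.isEmpty()) {
            firstAddedAt = nowMs; // the delay is measured from the oldest item in the batch
        }
        batch.add(item);
        if (batch.size() >= maxSize) {
            flush();
        }
    }

    /** Called from a timer so a half-full batch is not held back under low traffic. */
    void tick(long nowMs) {
        if (!batch.isEmpty() && nowMs - firstAddedAt >= maxDelayMs) {
            flush();
        }
    }

    void flush() {
        if (batch.isEmpty()) {
            return;
        }
        List<T> out = batch;
        batch = new ArrayList<>();
        sink.accept(out);
    }
}
```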

28 Architecture: ZooKeeper, UI, Nimbus, Supervisor, Worker, Task

29 Merge into Storm: replace the Clojure core

30 Redesign Our SQL Engine: the current SQL engine is customized, not general-purpose

31 Where Should Storm/JStorm Go?
1. A more powerful SQL engine
2. A more powerful high-level programming framework
   1. Easier to learn and to debug
   2. Higher throughput
3. A high-level scheduler
   1. I don't favor offline systems such as Hadoop/Spark/YARN
   2. I favor online systems: an elastic online scheduler, Docker, virtual machines
   3. More lightweight

32 Thanks! Welcome to join us: QQ/WeChat: 32147704

