How Alluxio (formerly Tachyon) brings a 300x performance improvement to Qunar’s streaming processing
Xueyan Li (Qunar) & Chunming Li (Garena)

Contents
Part 01 Introduction to Qunar Hotel Data Services and data processing platform
Part 02 Qunar Hotel Data Acceleration with Alluxio
Part 03 Qunar Hotel Data Use Alluxio to enable data sharing between Batch / Streaming

Part 01 Introduction to Qunar Hotel Data Services and data processing platform

Hotel price data (sensitive data)
Raw message: 4000 QPS
Daily data volume: 4T
After compression: 500G

Price Center Data Landing
Use Storm to extract data and convert it to protobuf
Use Spark Streaming to run the batch
ORC compression

Application of data
Analyst/PM/Operations
Downstream application
Direct queries
Price center monitor
Real-time / off-line model training

System architecture
Everything is deployed uniformly with Marathon + Docker.

After upgrading to Spark 2.0.x

Part 02 Qunar Hotel Data Acceleration with Alluxio

Receiver balance problem
Conclusion: Each Executor runs only one Receiver for the highest performance.

Basic tuning of Spark
Increase streaming duration: The longer the batch duration, the more data each batch receives and the greater the storage requirements.
Kafka Partition = Spark Receiver: With the Spark high-level API, the number of Kafka partitions must equal the number of Receivers in order to make full use of resources.
Increase block size: A longer block interval generates larger blocks and fewer ORC files. Processing performance improves, but memory requirements rise.
Modify Mesos resource scheduling: Spark has a node-locality problem, so a reasonable scheduling scheme is needed to make sure resources are not wasted.
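The tuning rules above can be sketched as simple arithmetic. This is a minimal sketch, assuming Spark Streaming's receiver-based model, in which each receiver cuts its input into one block per block interval (the `spark.streaming.blockInterval` setting) and each block becomes one task of the batch. The function name and the sample numbers are hypothetical, not Qunar's actual settings.

```python
def plan_receivers(kafka_partitions, batch_interval_ms, block_interval_ms):
    """Derive receiver/executor counts and blocks per batch from the tuning rules."""
    if block_interval_ms <= 0 or batch_interval_ms < block_interval_ms:
        raise ValueError("block interval must be positive and no longer than the batch interval")
    receivers = kafka_partitions      # rule: Kafka partitions = Spark receivers
    receiver_executors = receivers    # rule: each executor runs only one receiver
    blocks_per_receiver = batch_interval_ms // block_interval_ms
    return {
        "receivers": receivers,
        "receiver_executors": receiver_executors,
        "blocks_per_batch": receivers * blocks_per_receiver,
    }


# Hypothetical workload: 8 Kafka partitions, 60-second batches.
small_blocks = plan_receivers(8, 60_000, 200)    # 200 ms is Spark's default block interval
large_blocks = plan_receivers(8, 60_000, 2_000)  # longer interval: larger but fewer blocks
print(small_blocks["blocks_per_batch"], large_blocks["blocks_per_batch"])  # 2400 240
```

In this sketch, raising the block interval tenfold cuts the number of blocks per batch tenfold, which corresponds to the trade-off described above: fewer, larger blocks at the cost of more memory per block.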

Existing problems
Large amount of data: The daily data volume is too large for Hive SQL and Spark batch jobs to run well.
Real-time: Data cannot be analyzed in real time. The hot data set only uses results from the current day or the day before.
Checkpoint: Without checkpointing, data is lost when a task fails or restarts.

Why use Alluxio?
01 Save the data cache: When a Spark executor fails or exits, the computed data is not lost because of the "drifting" of the executor.
02 Garbage Collection: Keeping Spark RDD data in Alluxio reduces GC overhead and saves time.
03 Data sharing: Zeppelin, Flink, Spark, and MapReduce can share data at memory speed.
04 Tiered storage: Local storage media, including memory, SSD, and disk, are managed as a hierarchical storage layer.

Tiered storage separates cold and hot data
Most hot data is used only for the current day's results. We deployed an Alluxio Worker on each compute node to manage the local storage media, including memory, SSDs, and disks, as a hierarchical storage tier. Data related to the upstream computation on each node is stored locally as far as possible, which avoids consuming network resources. Alluxio also provides LRU, LFU, and other efficient replacement strategies to keep hot data in the faster memory layer and improve the data access rate. Even cold data is stored on local disk, so the remote HDFS storage cluster does not have to be accessed.
Storage tiers: MEM, SSD, HDD
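The behaviour described above can be illustrated with a toy model. This is a minimal sketch, assuming a three-tier MEM/SSD/HDD hierarchy with LRU demotion between tiers. It is not Alluxio's implementation: the class name, method names, and capacities (counted in blocks) are all hypothetical.

```python
from collections import OrderedDict


class TieredLruStore:
    """Toy MEM/SSD/HDD hierarchy with LRU demotion (illustrative, not Alluxio code)."""

    def __init__(self, capacities):
        # capacities: list of (tier_name, max_blocks), fastest tier first.
        self.tiers = [(name, cap, OrderedDict()) for name, cap in capacities]

    def _insert(self, level, block_id):
        # Put the block into the given tier; if the tier overflows,
        # demote its least recently used block to the next slower tier.
        if level >= len(self.tiers):
            return  # dropped from the slowest tier; a later read goes to remote storage
        _, cap, blocks = self.tiers[level]
        blocks[block_id] = True
        blocks.move_to_end(block_id)
        if len(blocks) > cap:
            victim, _ = blocks.popitem(last=False)
            self._insert(level + 1, victim)

    def location(self, block_id):
        """Return the tier currently holding the block, or "REMOTE" if none does."""
        for name, _, blocks in self.tiers:
            if block_id in blocks:
                return name
        return "REMOTE"

    def access(self, block_id):
        """Return the tier that served the read, then promote the block to the top tier."""
        served_from = self.location(block_id)
        for _, _, blocks in self.tiers:
            blocks.pop(block_id, None)
        self._insert(0, block_id)
        return served_from


store = TieredLruStore([("MEM", 2), ("SSD", 2), ("HDD", 2)])
for block in ["a", "b", "c"]:
    store.access(block)           # "a" is demoted to SSD when "c" arrives
print(store.location("a"))        # SSD
print(store.access("a"))          # SSD (served locally, then promoted back to MEM)
```

In this model a block that falls out of memory is still served from a local tier rather than from the remote cluster, which is the point made above about cold data.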

System data flow

Average processing time

Average processing message

Other benefits of Alluxio
Web UI and CLI: Alluxio's command-line tools and web UI facilitate validation and debugging during development, shortening the overall system development cycle.
Simple and easy-to-use API: Alluxio provides an easy-to-use API. Its native API is a set of file input and output interfaces similar to java.io, so developing with it does not involve a steep learning curve.
For example, early each morning we use Chronos to run the Alluxio loadufs command, which loads the data computed by MapReduce the day before into Alluxio so that subsequent jobs can read these files directly.

Part 03 Qunar Hotel Data Use Alluxio to enable data sharing between Batch / Streaming

Spark/Zeppelin on Alluxio
Tool chain: We use Zeppelin as a tool for development, debugging, and analysis.
Reduce development costs: Code can be written and run directly on the results, and the results can be used directly in Spark code.
Computational framework interconnection: In addition to Spark, Flink and other computational frameworks can also use the computed data.
Memory speed increase: Downstream applications take the same computed data directly from memory for machine learning.
Cross-data-center data synchronization: Asynchronous data synchronization can be used for acceleration when writing is the bottleneck.
Algorithms shown: HMM, LR, SVM, CRT, EM

Unified Namespace
The unified namespace is transparent to upper-layer applications and computing frameworks.
HDFS and Alluxio's own storage space are managed in a unified way, which avoids complex input and output logic.
Alluxio's mount function manages the remote HDFS storage cluster.
At Qunar, we use the account name as the data directory in HDFS, we use Swift to store the jar packages of Spark, Storm, and Flink programs, and for checkpoints we use checkpoint/appcode as the path.
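The idea of a mount table behind a unified namespace can be sketched as a longest-prefix lookup. This is a toy illustration under stated assumptions, not Alluxio's API: the function name, the host names, and the HDFS directories are hypothetical, and only the checkpoint/appcode layout comes from the text above.

```python
def resolve_ufs_path(mount_table, namespace_path):
    """Map a path in the unified namespace to its under-storage URI (longest-prefix match)."""
    best_key, best_base = None, None
    for mount_point in mount_table:
        base = mount_point.rstrip("/")  # the root mount "/" becomes ""
        if namespace_path == base or namespace_path.startswith(base + "/"):
            if best_base is None or len(base) > len(best_base):
                best_key, best_base = mount_point, base
    if best_key is None:
        raise KeyError(f"no mount point covers {namespace_path}")
    ufs_root = mount_table[best_key].rstrip("/")
    return ufs_root + namespace_path[len(best_base):]


# Hypothetical mount table: namespace path -> remote HDFS location.
mounts = {
    "/": "hdfs://namenode:8020/alluxio",
    "/checkpoint": "hdfs://namenode:8020/spark/checkpoint",
}

# "appcode" stands in for an application's code name.
print(resolve_ufs_path(mounts, "/checkpoint/appcode"))
print(resolve_ufs_path(mounts, "/account_name/prices.orc"))
```

Applications only ever see the left-hand paths. Where each path actually lives is decided by the mount table, which is what keeps the input and output logic simple for the computing frameworks.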

Unified Namespace
Diagram labels: Calculation framework, Unified Namespace, Storage framework

The benefits of data sharing
Spark MLlib: Part of the intermediate results can be shared between different Spark MLlib pipelines, greatly improving computational efficiency.
Spark SQL: Spark SQL can provide partial query results directly to downstream applications, improving efficiency.

Summary
Pricing system
Alluxio data sharing
Spark checkpoint
Alluxio data synchronization
Spark block
