1 Spark on Entropy: A Reliable & Efficient Scheduler for Low-latency Parallel Jobs in Heterogeneous Cloud. Huankai Chen, PhD Student at University of Kent (HC269@KENT.AC.UK)

2 Entropy Theory Based Cloud Scheduler

3 Weather Prediction

4 Cloud Resource Performance Prediction. Resource performance prediction is needed for better job scheduling in the cloud, but it is hard, or even impossible, to obtain an accurate prediction without gathering enough reliable information about a resource in a short time. Why?

5 The Need for Resource Performance Prediction in Cloud Scheduling. Without knowing the performance of all the available resources, how do we know which resource is a better fit for a job? Yet, most of the time, resource performance is "unpredictable"!

6 Unpredictable – Information Overload. We cannot simply count the number of available CPU cores to predict performance. Resource performance is affected by many factors, e.g. CPU speed, CPU utilization, memory usage, disk I/O, network I/O, … The result is information overload as the cloud scales up.

7 Unpredictable – Scheduling Overhead. It takes time to analyze a large amount of related information. Traditional cloud resource prediction methods try to collect as much related information as they can, which introduces considerable overhead as the volume of information grows.

8 Unpredictable – Highly Dynamic. The cloud changes every second! Resource performance is highly dynamic, and information gathered from a highly dynamic resource is less reliable for use in performance prediction.

9 Cloud Computing Evolutions

10 Cloud Analysis As a Service

11 Cloud Scheduling Challenge. When facing highly concurrent service requests, cloud scheduling becomes less reliable: increased average response time, increased request failure rate, and increased variance of response time.

12 Better Fulfill the New Demand? Schedulers in production (YARN, Mesos, Borg, …) might work better if they took resource performance into account. Schedulers in research: is there a way to analyze less information and still obtain better predictions with lower scheduling overhead?

13 Entropy Theory. Entropy, as a measure of the degree of disorder in a system, can serve to measure a cloud resource's reliability.
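The slides do not give the exact entropy measure used; as a reference point only (an assumption, not confirmed by the source), the standard Shannon entropy over the distribution of observed resource states is:

```latex
H(X) = -\sum_{i=1}^{n} p_i \log_2 p_i
```

where p_i is the probability of observing the resource in state i. A lower H indicates steadier, more predictable behavior, and hence measurements that can be trusted more.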

14 Scheduling Optimization. Resource performance prediction must take the following into consideration: (1) the characteristics and activity of the individual resource; (2) the reliability of the information gained from the resource.

15 Resource CPU Utilization. CPU utilization represents how efficiently the operator thread uses the CPU throughout a job's execution. It is highly relevant for making scheduling decisions, as it is directly related to the resource's performance at run time.

16 Predict the "Unpredictable" - RAV. Resource Activity Vector (RAV): to obtain the RAV values, we run a resource monitor on each resource. The monitor captures the resource's CPU utilization and updates the RAV with the change in CPU utilization every second.
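A minimal sketch of such a monitor, assuming psutil as the CPU-utilization source and a fixed-length sliding window of per-second deltas as the RAV (the window length and storage format are illustrative assumptions, not details from the slides):

```python
import time
from collections import deque

import psutil  # assumed utilization source; any per-host CPU metric would do


def monitor_rav(window_size=60, interval=1.0):
    """Maintain a Resource Activity Vector (RAV) of per-second CPU-utilization changes."""
    rav = deque(maxlen=window_size)            # the RAV: most recent utilization deltas
    prev = psutil.cpu_percent(interval=None)   # prime the counter; first reading is a baseline
    while True:
        time.sleep(interval)
        curr = psutil.cpu_percent(interval=None)  # % CPU used since the previous call
        rav.append(curr - prev)                   # record the change, not the absolute level
        prev = curr
        yield list(rav)                           # snapshot for the entropy calculation
```

Each yielded snapshot can then feed the entropy-level calculation sketched under the next slide.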

17 Predict the “Unpredictable” - REL Resource Entropy Level (REL)
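The slide names the Resource Entropy Level but not its formula; a plausible sketch, assuming REL is the Shannon entropy of a histogram of the RAV's utilization deltas (the bin width and the use of base-2 logs are assumptions):

```python
import math
from collections import Counter


def resource_entropy_level(rav, bin_width=5.0):
    """Estimate the Resource Entropy Level (REL) as the Shannon entropy of the
    distribution of CPU-utilization changes held in the RAV."""
    if not rav:
        return 0.0
    # Bucket each delta into coarse bins (here 5 percentage points wide) and count them.
    bins = Counter(int(delta // bin_width) for delta in rav)
    total = sum(bins.values())
    # Shannon entropy over the observed bins: H = -sum(p * log2(p)).
    return -sum((c / total) * math.log2(c / total) for c in bins.values())
```

A low REL suggests a steady resource whose measurements can be trusted; a high REL flags a noisy resource whose reported performance is less reliable.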

18 Predict the "Unpredictable" - RPR. Resource Performance Ranking (RPR): each resource first calculates its RPR from its CPU speed, current CPU utilization, and entropy level, and then sends a heartbeat to the scheduler carrying that ranking for use in scheduling decisions.
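The slides do not give the combination formula; one illustrative ranking, in which faster, less-utilized, lower-entropy resources score higher (the weighting and the max_rel normalizer are assumptions):

```python
def resource_performance_ranking(cpu_speed_ghz, cpu_util_pct, rel, max_rel=6.0):
    """Illustrative Resource Performance Ranking (RPR): reward spare CPU capacity,
    discounted by how unreliable (high-entropy) the resource has recently been."""
    spare_capacity = cpu_speed_ghz * (1.0 - cpu_util_pct / 100.0)
    reliability = max(0.0, 1.0 - rel / max_rel)   # 1.0 = perfectly steady, 0.0 = very noisy
    return spare_capacity * reliability


def build_heartbeat(resource_id, cpu_speed_ghz, cpu_util_pct, rel):
    """Bundle the ranking into the heartbeat message sent to the scheduler."""
    return {
        "resource": resource_id,
        "rpr": resource_performance_ranking(cpu_speed_ghz, cpu_util_pct, rel),
    }
```

The scheduler can then simply prefer the resources whose latest heartbeat carries the highest RPR.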

19 Spark on Entropy – Experiment Setup. Apache Spark is one of the fastest-growing big data projects in the history of the Apache Software Foundation. With its memory-oriented architecture, flexible processing libraries, and ease of use, Spark has emerged as a leading distributed computing framework for real-time analytics.

20 Better performance and a higher degree of satisfaction of QoS requirements

21 Improved overall Spark analysis server throughput

22 More capable of running a Cloud Analysis Service that provides web services with QoS guarantees

23 Significantly fewer failed requests compared with the Fair Scheduler

24 The Entropy Scheduler outperforms the native Fair Scheduler in terms of both efficiency and reliability.

25 CONTRIBUTIONS We show that scheduling low-latency parallel analysis jobs on the cloud is a nontrivial problem. We identify its performance implications and characterize the elements needed to solve it. We introduce the concept of entropy as a means to characterize and quantify the reliability of a resource. We present a novel scheduler that performs three important tasks: (i) captures the resource's CPU utilization and entropy level, (ii) computes a ranking of resources according to both CPU utilization and entropy, and (iii) suggests a schedule that takes this resource ranking into account.

26 CONCLUSIONS & FUTURE WORK Future work should further examine and expand the entropy method presented in this paper: a) we would like to load-test the Entropy Scheduler on a larger-scale cloud with more complex query workloads to ensure it is robust in more complex situations; b) we would also like to draw on ideas from Omega, Mesos, Sparrow, ..., and transform the Entropy Scheduler from centralized to decentralized to remove the bottleneck under highly concurrent query workloads; c) we are also currently developing a simulation tool to benchmark all the schedulers supported by Spark, such as YARN and Mesos; d) finally, we plan to research its application in other, similar cloud engines, such as MapReduce, Hive, Dremel, …
