The Performance of Big Data Workloads in Cloud Datacenters

The Performance of Big Data Workloads in Cloud Datacenters
Alexandru Uta, Alexandru Custura, Harry Obaseki Vrije Universiteit Amsterdam Massivizing Computer Systems June 11, 2018

Massivizing Computer Systems
Big Science Education for Everyone (Online) Business Services Online Gaming Grid Computing Datacenters Daily Life In massivizing computer systems we study the complex and non-trivial interactions Between applications and the large-scale ecosystems they are running on To elaborate on this problem, in this presentation I’m going to present The interaction between big data processing systems and applications and clouds

Convenient to use big data + cloud
I think we all agree it is very convenient to run big data workloads in the cloud Very easy to spawn a couple of VMs and run your big data framework Even easier, cloud providers offer big data frameworks as a service (e.g., Elastic MR in AWS) So, not need to worry anymore about managing a cluster and dealing with hardware And if it’s so easy, …

Wide variety of frameworks running together
What happens when everybody runs big data in the cloud? Large variety of big data apps and frameworks We have here the big data landscape of Matt Turck Most of these systems already run in clouds, either as services, or deployed independently by users As a result: clouds see very diverse workloads running next to each other Image courtesy of mattturck.com

Resource contention produces performance variability in clouds!
Co-location induces (resource) performance variability How does resource interference affect performance? Resource contention produces performance variability in clouds! Let’s assume a physical machine in a cloud It’s quite possible that it runs VMs that perform very different tasks (hadoop, spark, tensorflow) What does the interference entail? Resource contention produces perf. variability

Workload variability produces performance variability!
Co-location induces (resource) performance variability How does workload variability affect performance? Workload variability produces performance variability! Utilization Same setup, but let’s assume all apps use same resource With highly different utilization patterns

Cloud (resource) performance is highly variable!
Due to: Co-location Virtualization Workload variability Network congestion Affects all possible resources: Emergent behavior in large-scale ecosystems! Multiple reasons for perf variability in the cloud: co-location + wkld var. (as in the example, sharing the same hardware) Virtualization (adding a layer of abstraction between the physical hw and the VMs slows down Congestion is mostly about networking [overcommiting resources] The most impacted resources are the most limited ones [disk, netw] MSR Cambridge measured the netw bw variability on 8 different clouds, at infrastructure level the distributions are very different (var between clouds and within clouds) Ballani et al., SIGCOMM 2011 Iosup et al., CCGrid 2011

Convenient to use big data + cloud, but...
Variability entails: Poor performance predictions Poor scheduling decisions Over-provisioning Extra costs How to study performance variability? How to control the variability?

How to study performance variability?
Traditional performance analysis: (1) Trace analysis (2) Benchmarking (3) Performance modeling Current models, benchmarks do not consider resource variability! No study on resource performance variability and big data Variability within clouds and between clouds (performance portability issues) Resource variability leads to performance variability? Why is perf variability bad? Because we ideally want to be able to predict performance Example with the train to airport: we want to catch plane at 10am, we assume bus comes on time, or that security checks don’t take more than 1h The same stands in computing: we need performance predictability

A Framework for Studying Performance Variability
Fallback to empirical evaluation based on previous observations Controlled environment that emulates real-world variability scenarios Multiple classes of big data applications Statistical analysis and performance modeling to understand correlations 1 2 3 (1) Traces (2) Benchmark (3) Modeling

Benchmarking Performance Variability

Quantifying network variability impact on Big Data
Systematic study using A-H cloud bandwidth distributions Run a series of big data applications - So the solution is to take these 8 distributions Emulate them in our cluster Perform a systematic study using the distributions and a series of representative big data apps We do this on spark, which has become the de-facto platform for big data analytics

Cloud network bandwidth emulation
For each distribution: Cluster Vary bandwidth How do we do this? Take each distribution at a time Draw values from those distributions (25% in that interval, 25% in the other etc.) Assign these values to our nodes In the cluster

Big Data Workloads HiBench suite, MapReduce-style apps
6 real-world applications from various domains Each app having different resource usage Application Wordcount ++ -- Sort Terasort Naïve Bayes K-means PageRank We use the HiBench benchmark suite, from which we select 6 representative applications We characterize all these apps in terms of the resources they use ++ means high -- means low 0 means medium They cover a large spectrum of resource usage

Variable network = Variable Runtime (Terasort)
We ran an experiment using the terasort app on the 8 bw distributions And one exp on a 1Gbps homogeneous network setup We see that the homogeneous achieves little to no runtime variability Whereas, heterogeneous netw also gives runtime variability

Surprisingly, non-network-intensive Wordcount slowed down
We use the wordcount app, which is mostly cpu bound And we compute the slowdown in comparison with the no limitation 1gbps setup notice the slowdown given by two of the cloud setups Notice that bandwidth impacts performance quite a lot

Most apps are slowed down on real clouds
All apps get quite a significant slowdown, mostly on the A and D bandwidth distributions Surprinsing result: not only the network-intensive applications get to be slowed down As a consequence, network variability breaks performance predictability Moreover, network variability breaks performance portability (results on cloud A differ significantly from results on cloud F)

Take-home message Alexandru Uta
Network variability leads to high slowdown for big data in the cloud Network variability also affects performance portability Surprisingly, also apps not network-bound applications slow down Future work: In-depth statistical analysis Performance modeling tools Control through better scheduling Alexandru Uta Vrije Universiteit Amsterdam Massivizing Computer Systems

The Performance of Big Data Workloads in Cloud Datacenters

Similar presentations

Presentation on theme: "The Performance of Big Data Workloads in Cloud Datacenters"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

The Performance of Big Data Workloads in Cloud Datacenters

Similar presentations

Presentation on theme: "The Performance of Big Data Workloads in Cloud Datacenters"— Presentation transcript:

Similar presentations

About project

Feedback