Scientific days, June 16th & 17th, 2014. This work has been partially supported by the LabEx PERSYVAL-Lab (ANR-11-LABX-0025-01) funded by the French program.


1 Scientific days, June 16th & 17th, 2014
A Control Approach for Performance of Big Data Systems
Mihaly Berekmeri, Sara Bouchenak, Damian Serrano, Bogdan Robu, Nicolas Marchand
GIPSA - LIG - Grenoble University, France
Pervasive Computing Systems, PERSYVAL-Lab
This work has been partially supported by the LabEx PERSYVAL-Lab (ANR-11-LABX-0025-01) funded by the French program Investissement d’avenir.

2 Big Data - Big Problems
Problem: vast amounts of data are generated daily
– Facebook: 3.2 × 10^9 likes and comments/day
– CERN’s LHC: up to 1 PB/s during experiments
How do we store it? How do we process it?

3 Solution: use the cloud
– Cloud computing: fast, on-demand assignment of a group of shared hardware resources to applications
– “Unlimited” storage and computing capacity

4 Challenges in the cloud
Challenges:
– Current cloud solutions don’t guarantee performance
– Provisioning is difficult under a changing workload
– Interference and concurrency problems: I/O and network skews, failures, node heterogeneity
-> Assuring performance objectives poses considerable challenges

5 Our approach
Develop and apply control strategies to cloud software systems
Control theory is everywhere: automotive, robotics, energy, microelectronics, etc.
“Except” in software systems

6 Our approach
But why would software need control theory?
-> Dealing with the dynamics: “The journey is just as important as the destination”
-> Mathematical tools: handle complexity safely, theoretically guaranteed results, flexibility, robustness

7 Challenges in building control theory for software systems
No physics behind algorithms and applications
-> difficult to use classical techniques to build models
-> sensors can disappear with a system update
Language difficulties: e.g. the term “response time”

8 Objectives
Develop a dynamical model for a distributed software framework dealing with Big Data
Build a test framework for control strategies
Devise new control strategies that improve performance and reliability
Consideration:
– Implementations evolve rapidly -> remain agnostic to implementation

9 MapReduce
Programming model introduced by J. Dean and S. Ghemawat (Google) in 2004
Wide range of applications: log analysis, data mining, web search engines, scientific computing, business intelligence, ...
Used by the biggest companies: Amazon, eBay, Facebook, LinkedIn, Twitter, Yahoo, Microsoft, ...
Automatic features: data partitioning and replication, task scheduling, fault tolerance
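To make the programming model concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps using the classic word-count example. It is only an illustration: the function names are hypothetical, and a real framework would distribute these phases across nodes and handle partitioning, scheduling, and fault tolerance automatically.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases sequentially over a list of text documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

For example, `word_count(["big data", "Big problems"])` counts "big" twice because the map step lowercases each word before emitting it.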

10 MapReduce

11 State of the Art
Existing models
– static models: not suitable for control-theoretic design
– assume that jobs are isolated: they don’t deal with concurrent job executions, which is unrealistic in real-life scenarios
For modeling, we essentially started from zero.

12 State of the Art
Existing controls
– Focus on static, off-line configuration -> not robust enough
– Dedicated cluster or job priorities -> bad performance for low-priority jobs
– Job-level controllers (off-line profile, online adjustment based on job progress) -> large profile database, modifying schedulers

13 Sensors & Actuators
Problem: most metrics are not available for measurement or control
Solution: we built all the online sensors and actuators
– Measure: average performance, availability, throughput in the last time window
– Control: number of computing nodes
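A windowed sensor of the kind described on this slide can be sketched as follows. This is an assumption-laden illustration, not the sensor built for the talk: the class name `WindowSensor`, its methods, and the choice of seconds as the time unit are all hypothetical.

```python
from collections import deque


class WindowSensor:
    """Average of the samples observed in the last `window` seconds.

    Hypothetical helper for illustration; not the authors' implementation.
    """

    def __init__(self, window):
        self.window = window
        self.samples = deque()  # (timestamp, value) pairs, oldest first

    def record(self, timestamp, value):
        """Store one measurement, e.g. the service time of a finished job."""
        self.samples.append((timestamp, value))

    def read(self, now):
        """Return the windowed average, or None if the window is empty."""
        # Discard samples that have fallen out of the time window.
        while self.samples and self.samples[0][0] <= now - self.window:
            self.samples.popleft()
        if not self.samples:
            return None
        return sum(value for _, value in self.samples) / len(self.samples)
```

The same structure could serve for throughput or availability by changing what is recorded and how the window is aggregated.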

14 The test framework we developed

15 MapReduce Benchmark Suite
Developed by Sangroya et al. (2012)
Performance and dependability benchmark
Advantages:
– realistic multiuser workloads
– comprehensive test data
– fault injection

16 Experimental setup
Cluster             CPU                          Memory  Storage  Network
60 nodes, Grid5000  4 cores/CPU, Intel 2.53 GHz  15 GB   298 GB   InfiniBand 20G
Business intelligence benchmark:
– consists of a decision support system for a wholesale supplier
– requests are typical business queries over a large amount of data (10 GB)

17 Modeling challenges & Insights
Capturing system dynamics
– we define a sliding window over time
– take the average over the window
Handling complexity
– linearize around an operating point defined by a baseline number of nodes and clients
– the point of full utilization is the set-point
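Linearizing around an operating point can be illustrated numerically. The sketch below estimates the local gains of a performance function with respect to nodes and clients by central differences. Everything here is an assumption for illustration: `toy_service_time` is a made-up static map, not the model identified in this work, and `linearize` is a hypothetical helper.

```python
def linearize(f, n0, c0, h=1e-3):
    """Central-difference gains of f(nodes, clients) around (n0, c0).

    Returns (k_nodes, k_clients, approx), where approx(n, c) is the
    first-order approximation y0 + k_nodes*(n - n0) + k_clients*(c - c0).
    """
    y0 = f(n0, c0)
    k_nodes = (f(n0 + h, c0) - f(n0 - h, c0)) / (2 * h)
    k_clients = (f(n0, c0 + h) - f(n0, c0 - h)) / (2 * h)

    def approx(n, c):
        return y0 + k_nodes * (n - n0) + k_clients * (c - c0)

    return k_nodes, k_clients, approx


def toy_service_time(nodes, clients):
    # Hypothetical static map, chosen only so the gains are easy to check;
    # it is NOT the model from the presentation.
    return 10.0 + 40.0 * clients / nodes
```

With a baseline of 20 nodes and 10 clients, this toy map has a gain of about -1 per added node and about +2 per added client, and the linear approximation is only trustworthy close to that baseline.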

18 Model structure
Grey-box modeling technique
Predicts MapReduce cluster performance based on the number of nodes and the number of clients

19 Identification
Both models were identified using step response identification (prediction error estimation)
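As a simplified stand-in for step response identification, the sketch below fits a first-order discrete-time model with a known input delay by least squares on the one-step-ahead prediction error. The model order, the known delay, and the names `simulate` and `identify` are assumptions made for illustration; the slides do not specify the model structure or the estimation tool that was used.

```python
def simulate(a, b, u, delay=0, y0=0.0):
    """Simulate y[k+1] = a*y[k] + b*u[k-delay] in deviation variables."""
    y = [y0]
    for k in range(len(u)):
        u_delayed = u[k - delay] if k >= delay else 0.0
        y.append(a * y[-1] + b * u_delayed)
    return y


def identify(u, y, delay=0):
    """Least-squares estimate of (a, b) from input u and output y.

    Minimizes the sum of squared one-step-ahead prediction errors
    y[k+1] - (a*y[k] + b*u[k-delay]) by solving the 2x2 normal equations.
    """
    s_yy = s_yu = s_uu = s_ty = s_tu = 0.0
    for k in range(delay, len(u)):
        yk, uk, target = y[k], u[k - delay], y[k + 1]
        s_yy += yk * yk
        s_yu += yk * uk
        s_uu += uk * uk
        s_ty += target * yk
        s_tu += target * uk
    det = s_yy * s_uu - s_yu * s_yu
    a = (s_ty * s_uu - s_tu * s_yu) / det
    b = (s_tu * s_yy - s_ty * s_yu) / det
    return a, b
```

In a step experiment, `u` would be the change in the number of nodes (or clients) relative to the baseline and `y` the resulting change in the windowed performance measurement; a negative `b` for the node input would mean that adding nodes reduces service time.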

20 Control architecture
Control challenges:
– Large dead time
– Many points of concurrency and interference
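The transcript does not preserve the control law itself, so the following is only a generic sketch of a feedback loop that adjusts the number of nodes from a measured service time. It assumes a discrete PI law with output clamping and conditional-integration anti-windup; the class `NodeController`, its parameters, and the gains are hypothetical. With a large dead time, the gains of such a loop generally have to be chosen conservatively.

```python
class NodeController:
    """Discrete PI controller mapping service-time error to a node count.

    Illustrative assumption only; not the controller from the presentation.
    """

    def __init__(self, kp, ki, base_nodes, min_nodes, max_nodes):
        self.kp = kp
        self.ki = ki
        self.base_nodes = base_nodes
        self.min_nodes = min_nodes
        self.max_nodes = max_nodes
        self.integral = 0.0

    def update(self, measured, setpoint):
        """Return the node count for the next control period."""
        # Positive error means the system is too slow, so nodes are added.
        error = measured - setpoint
        candidate = self.integral + error
        raw = self.base_nodes + self.kp * error + self.ki * candidate
        # Anti-windup: accumulate the error only while the output is unsaturated.
        if self.min_nodes <= raw <= self.max_nodes:
            self.integral = candidate
        return int(max(self.min_nodes, min(self.max_nodes, round(raw))))
```

The output is rounded and clamped because the actuator is a whole number of computing nodes bounded by the cluster size.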

21 Baseline experiment

22 Relaxed performance - Minimal resource

23 Strict performance control

24 Conclusions
Results:
– design, implementation and evaluation of the first dynamic model for MapReduce systems
– development and successful implementation of a control framework for assuring service time constraints
The control architecture is implemented on a real Hadoop cluster
Published at IFAC World Congress 2014 and ComPAS 2014, presented at several international workshops

25 Future Work
Add other metrics to our model: throughput, availability, reliability
Online identification techniques
Minimize the number of changes in the control input -> event-based control
Test with several on-line cloud frameworks and more complex workload scenarios

26 Thank you for your attention! Questions?

