Scientific days, June 16th & 17th, 2014. This work has been partially supported by the LabEx PERSYVAL-Lab (ANR-11-LABX-0025-01) funded by the French program Investissements d'avenir.

Presentation transcript:

A Control Approach for Performance of Big Data Systems
Mihaly Berekmeri, Sara Bouchenak, Damian Serrano, Bogdan Robu, Nicolas Marchand
GIPSA - LIG - Grenoble University, France
Pervasive Computing Systems, PERSYVAL-Lab

Big Data - Big Problems
Problem: vast amounts of data are generated daily
- Facebook: 3.2 x 10^9 likes and comments/day
- CERN's LHC: up to 1 PB/s during experiments
How do we store it? How do we process it?

Solution: use the cloud
- Cloud computing: fast, on-demand assignment of a group of shared hardware resources to applications
- "Unlimited" storage and computing capacity

Challenges in the cloud
Challenges:
- Current cloud solutions don't assure performance
- Difficult to provision with a changing workload
- Interference and concurrency problems: I/O and network skews, failures, node heterogeneity
-> Assuring performance objectives poses considerable challenges

Our approach
Develop and apply control strategies to cloud software systems
Control theory is everywhere: automotive, robotics, energy, microelectronics, etc.
"Except" in software systems

Our approach
But why would software need control theory?
-> Dealing with the dynamics: "The journey is just as important as the destination"
-> Mathematical tools: handle complexity safely, theoretically guaranteed results, flexibility, robustness

Challenges in building control theory for software systems
No physics behind algorithms and applications
-> difficult to use classical techniques to build models
-> sensors can disappear with a system update
Language difficulties: e.g. "response time" does not mean the same thing to control engineers and to software engineers

Objectives
- Develop a dynamical model for a distributed software framework dealing with Big Data
- Build a test framework for control strategies
- Devise new control strategies that improve performance and reliability
Consideration:
- Implementations evolve rapidly -> remain agnostic to implementation

MapReduce
- Programming model introduced by J. Dean and S. Ghemawat (Google) in 2004
- Wide range of applications: log analysis, data mining, web search engines, scientific computing, business intelligence, ...
- Used by the biggest companies: Amazon, eBay, Facebook, LinkedIn, Twitter, Yahoo, Microsoft, ...
- Automatic features: data partitioning and replication, task scheduling, fault tolerance
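
To make the programming model concrete, here is a minimal in-memory sketch of the map, shuffle and reduce phases, using word counting as the example. It is an illustration only: the function names are hypothetical, everything runs in one process, and none of the automatic features listed on the slide (partitioning, replication, scheduling, fault tolerance) are modeled.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases sequentially over a list of text records."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

In a real framework the user supplies only the map and reduce functions; the grouping step and the distribution of work across nodes are handled by the runtime.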

MapReduce (figure)

State of the Art
Existing models
- Static models: not suitable for control using control theory
- Assume that jobs are isolated: they don't deal with concurrent job executions, an assumption that is unlikely to hold in real-life scenarios
For modeling, we've essentially started from zero.

State of the Art
Existing controls
- Focus on static, off-line configuration -> not robust enough
- Dedicated cluster or job priorities -> bad performance for low-priority jobs
- Job-level controllers (off-line profile, online adjustment based on job progress) -> large profile database, modifying schedulers

Sensors & Actuators
Problem: most metrics are not available for measurement or control
Solution: we built all the online sensors and actuators
- Measure: average performance, availability, throughput in the last time window
- Control: number of computing nodes
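
The windowed measurements can be sketched as follows. This is a hypothetical sensor, not the one built for the talk: the class name, the record format and the definitions of availability (fraction of successful requests) and throughput (successful requests per second) are assumptions made for illustration.

```python
from collections import deque


class SlidingWindowSensor:
    """Keeps request records from the last `window_seconds` (assumed sensor design)."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        # Each sample is (completion_time, service_time, succeeded).
        # Samples are assumed to be recorded in increasing time order.
        self.samples = deque()

    def record(self, completion_time, service_time, succeeded=True):
        self.samples.append((completion_time, service_time, succeeded))

    def read(self, now):
        """Return the windowed metrics, or None if the window is empty."""
        # Drop samples that have fallen out of the window.
        while self.samples and self.samples[0][0] <= now - self.window_seconds:
            self.samples.popleft()
        total = len(self.samples)
        if total == 0:
            return None
        ok = [sample for sample in self.samples if sample[2]]
        average = sum(sample[1] for sample in ok) / len(ok) if ok else None
        return {
            "avg_service_time": average,
            "availability": len(ok) / total,
            "throughput": len(ok) / self.window_seconds,
        }
```

Averaging over a window smooths out request-level noise, at the cost of adding lag to the measurement; the window length is therefore a tuning parameter.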

The test framework we developed (figure)

MapReduce Benchmark Suite
Developed by Sangroya et al. (2012)
Performance and dependability benchmark
Advantages:
- realistic multiuser workloads
- comprehensive test data
- fault injection

Experimental setup
Cluster: 60 nodes, Grid
CPU: cores/CPU, Intel 2.53GHz
Memory: 15GB
Storage: 298GB
Network: Infiniband 20G
Business intelligence benchmark:
- consists of a decision support system for a wholesale supplier
- requests are typical business queries over a large amount of data (10GB)

Modeling challenges & Insights
Capturing system dynamics
- we define a sliding window over time
- take the average over the window
Handling complexity
- linearize around an operating point defined by a baseline number of nodes and clients
- the point of full utilization is the set-point

Model structure
- Grey-box modeling technique
- Predicts MapReduce cluster performance based on the number of nodes and the number of clients
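
The transcript does not preserve the model equations, so the following is only a sketch of what a model linearized around an operating point can look like. It assumes a discrete first-order structure with a shared input delay; the class names, the coefficients `a`, `b_nodes`, `b_clients` and the `delay` parameter are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OperatingPoint:
    nodes: float         # baseline number of nodes
    clients: float       # baseline number of clients
    service_time: float  # service time observed at the baseline


@dataclass
class LinearModel:
    """Assumed structure: dy[k] = a*dy[k-1] + b_nodes*dn + b_clients*dc,
    where dy, dn, dc are deviations from the operating point."""
    op: OperatingPoint
    a: float
    b_nodes: float    # expected negative: more nodes, lower service time
    b_clients: float  # expected positive: more clients, higher service time
    delay: int = 0    # dead time in samples, applied to both inputs (assumption)

    def simulate(self, nodes, clients, steps):
        """Predict the service time for per-sample input sequences."""
        dy = 0.0
        predicted = []
        for k in range(steps):
            j = max(k - self.delay, 0)  # index of the delayed input sample
            dn = nodes[j] - self.op.nodes
            dc = clients[j] - self.op.clients
            dy = self.a * dy + self.b_nodes * dn + self.b_clients * dc
            predicted.append(self.op.service_time + dy)
        return predicted
```

Working in deviations from the operating point is what makes a linear model usable here: the prediction is only expected to hold near the baseline number of nodes and clients.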

Identification
Both models were identified using step response identification (prediction error estimation)
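
As a rough illustration of the idea, the sketch below fits a first-order discrete model y[k+1] = a*y[k] + b*u[k] to recorded step-response data by minimizing the one-step-ahead prediction error with ordinary least squares. The model order, the absence of dead time and the function names are assumptions; the identification procedure actually used in the talk is not detailed in the transcript.

```python
import math


def fit_first_order(u, y):
    """Least-squares fit of y[k+1] = a*y[k] + b*u[k].

    u and y are equally sampled input and output sequences, expressed as
    deviations from the operating point.
    """
    s_yy = s_yu = s_uu = s_ny = s_nu = 0.0
    for k in range(len(y) - 1):
        s_yy += y[k] * y[k]
        s_yu += y[k] * u[k]
        s_uu += u[k] * u[k]
        s_ny += y[k + 1] * y[k]
        s_nu += y[k + 1] * u[k]
    # Solve the 2x2 normal equations with Cramer's rule.
    det = s_yy * s_uu - s_yu * s_yu
    if abs(det) < 1e-12:
        raise ValueError("data not informative enough to separate a and b")
    a = (s_ny * s_uu - s_nu * s_yu) / det
    b = (s_nu * s_yy - s_ny * s_yu) / det
    return a, b


def steady_state_gain(a, b):
    """Final change in output per unit step in input (requires |a| < 1)."""
    return b / (1.0 - a)


def time_constant(a, sample_period):
    """Equivalent continuous time constant (requires 0 < a < 1)."""
    return -sample_period / math.log(a)
```

For example, applying a step in the number of nodes and recording the windowed service time gives the `u` and `y` sequences; the fitted gain then says how much the service time eventually changes per added node.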

Control architecture
Control challenges:
- Large dead time
- Many points of concurrency and interference
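
The architecture diagram is not preserved in the transcript. As a generic illustration of feedback control on the node count, and not a reconstruction of the authors' controller, here is a proportional-integral (PI) sketch with output clamping and a simple anti-windup rule. The class name and all gains and limits are hypothetical.

```python
class NodeCountPI:
    """PI feedback from measured service time to number of nodes (illustrative)."""

    def __init__(self, target, kp, ki, base_nodes, min_nodes, max_nodes):
        self.target = target          # service time objective
        self.kp = kp                  # proportional gain
        self.ki = ki                  # integral gain
        self.base_nodes = base_nodes  # node count at the operating point
        self.min_nodes = min_nodes
        self.max_nodes = max_nodes
        self.integral = 0.0

    def update(self, measured):
        """Return the node count to apply for the next control period."""
        # Positive error means the system is too slow, so nodes are added.
        error = measured - self.target
        candidate = self.integral + error
        raw = self.base_nodes + self.kp * error + self.ki * candidate
        # Anti-windup: only accumulate the integral while not saturated.
        if self.min_nodes < raw < self.max_nodes:
            self.integral = candidate
        return min(max(round(raw), self.min_nodes), self.max_nodes)
```

With a large dead time, a controller like this has to be tuned conservatively (low gains, long control period), because the effect of adding nodes only shows up in the measurement several samples later.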

Baseline experiment (figure)

Relaxed performance - Minimal resource (figure)

Strict performance control (figure)

Conclusions
Results:
- design, implementation and evaluation of the first dynamic model for MapReduce systems
- development and successful implementation of a control framework for assuring service time constraints
The control architecture is implemented on a real Hadoop cluster
Published at IFAC World Congress 2014 and ComPAS 2014, presented at several international workshops

Future Work
- Add other metrics to our model: throughput, availability, reliability
- Online identification techniques
- Minimize the number of changes in the control input -> event-based control
- Test with several on-line cloud frameworks and more complex workload scenarios
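
One simple way to reduce the number of changes in the control input is to recompute it only when the measurement leaves a tolerance band around the target. The sketch below shows that idea; it is an assumption about what event-based control could look like here, not the authors' design, and `controller` stands for any hypothetical callable that maps a measurement to a node count.

```python
def event_based(controller, measurements, target, tolerance, initial_nodes):
    """Apply `controller` only when |measured - target| exceeds `tolerance`.

    Returns the node count applied at each sample and the number of times
    the node count actually changed.
    """
    nodes = initial_nodes
    applied = []
    changes = 0
    for measured in measurements:
        if abs(measured - target) > tolerance:
            # Event: the measurement left the tolerance band.
            new_nodes = controller(measured)
            if new_nodes != nodes:
                nodes = new_nodes
                changes += 1
        applied.append(nodes)
    return applied, changes
```

Fewer changes matter in this setting because every change in the number of nodes has a cost: nodes take time to start, and resizing the cluster disturbs running jobs.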

Thank you for your attention! Questions?