SAN FRANCISCO, CA, USA Adaptive Energy-efficient Resource Sharing for Multi-threaded Workloads in Virtualized Systems Can Hankendi, Ayse K. Coskun Boston.

Presentation transcript:

SAN FRANCISCO, CA, USA
Adaptive Energy-efficient Resource Sharing for Multi-threaded Workloads in Virtualized Systems
Can Hankendi, Ayse K. Coskun
Boston University, Electrical and Computer Engineering Department
This project has been partially funded by:

Computing in Heterogeneous, Autonomous 'N' Goal-oriented Environments

Energy Efficiency in Computing Clusters
- Energy-related costs are among the biggest contributors to the total cost of ownership.
- Consolidating multiple workloads on the same physical node improves energy efficiency.
(Source: International Data Corporation (IDC), 2009)

Multi-threaded Applications in the Cloud
- HPC applications are expected to shift towards cloud resources.
- Resource allocation decisions significantly affect the energy efficiency of server nodes.
- Energy efficiency is a function of application characteristics.

Outline
- Background
- Methodology
- Adaptive Resource Sharing
- Results
- Conclusions

Background
Cluster-level VM management
- Consolidation policies across server nodes
- VM migration techniques
[Srikantaiah, HotPower’08] [Bonvin, CCGrid’11]
Node-level management: recent co-scheduling policies
- Co-scheduling contrasting workloads
- Balancing performance events across nodes
  - Cache misses
  - IPC
  - Bus accesses
[Dhiman, ISLPED’09] [Bhadauria, ICS’10]
- Co-scheduling based on thread communication
- Identifying the best thread mixes to co-schedule
[Frachtenberg, TPDS’05] [McGregor, IPDPS’05]

Virtualized System Setup
12-core AMD Magny Cours server
- 2x 6-core dies attached side by side in the same package
- Private L1 and L2 caches for each core
- 6 MB shared L3 cache for each 6-core die
Virtualized through the VMware vSphere 5 ESXi hypervisor
- 2 virtual machines (VMs) with Ubuntu Server guest OS

Methodology: Measurement Setup
- System-level power measurements at a 1 s sampling interval
- Performance counter collection through vmkperf at a 1 s sampling interval
  - Counters: CPU cycles, retired instructions, L3-cache misses
- VM-level CPU and memory utilization data collection through esxtop at a 2 s sampling interval
[Figure: measurement setup; labels: system-level power measurement, logger, esxtop, vmkperf]
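
The counters listed on this slide are commonly combined into derived per-interval metrics. The following is a minimal sketch, assuming the counter values for one sampling interval have already been parsed into plain numbers. The function name `derive_metrics`, the sample values, and the misses-per-kilo-instruction (MPKI) metric are illustrative assumptions; the slides do not describe the vmkperf output format or name MPKI.

```python
def derive_metrics(cycles, instructions, l3_misses):
    """Return (IPC, L3 misses per kilo-instruction) for one sampling interval.

    All three arguments are counter deltas over the same interval.
    """
    if cycles <= 0 or instructions <= 0:
        raise ValueError("cycles and instructions must be positive")
    ipc = instructions / cycles                  # instructions per cycle
    mpki = 1000.0 * l3_misses / instructions     # L3 misses per 1000 instructions
    return ipc, mpki


# Hypothetical 1 s sample (not measured data):
# 2.0e9 cycles, 1.5e9 retired instructions, 3.0e6 L3-cache misses.
ipc, mpki = derive_metrics(2.0e9, 1.5e9, 3.0e6)
print(f"IPC={ipc:.2f}, L3 MPKI={mpki:.2f}")
```

Because esxtop is sampled every 2 s and the counters every 1 s, the two streams would also need to be aligned on a common time base before being combined; that step is omitted here.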

Parallel Workloads
PARSEC 2.1 benchmark suite [Bienia et al., 2008]

Benchmark      Application          IPC      Memory Acc.
blackscholes   Financial Analysis   Low*
bodytrack      Computer Vision      High     Medium
canneal        VLSI Design          Low      High
dedup          Enterprise Storage   Medium   Low
ferret         Similarity Search    Medium   Low
freqmine       Data Mining          High     Low
swaptions      Financial Analysis   High     Low
streamcluster  Data Mining          Low      High
vips           Media Processing     High     Low
x264           Media Processing     Medium*

* Only one rating survives in the transcript for this row; whether it belongs to IPC or Memory Acc. is unclear.

Tracking Parallel Phases
consolmgmt: consolidation management interface
- Synchronizes the ROI (region of interest) of multiple workloads
[Figure: ROI synchronization for Benchmark A and Benchmark B; labels: consolmgmt, parsecmgmt, hooks.c, roi-Trigger(), start-Logging(), end-Logging(), sleep(), Input (Serial), Output (Serial)]
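
The idea of synchronizing regions of interest can be illustrated with a barrier: each workload finishes its serial input phase at its own pace, then waits until every other workload is ready before entering its ROI. This is only an in-process analogy using Python threads. How consolmgmt actually coordinates workloads across VMs is not described in the slides, and the names `run_workload` and `run_synchronized` are hypothetical.

```python
import threading
import time


def run_workload(name, serial_input_s, barrier, events, lock):
    time.sleep(serial_input_s)               # stand-in for the serial input phase
    with lock:
        events.append((name, "input_done"))
    barrier.wait()                           # block until every workload is ready
    with lock:
        events.append((name, "roi_start"))   # ROI begins; logging would start here


def run_synchronized(workloads):
    """workloads: list of (name, serial_input_seconds) pairs."""
    barrier = threading.Barrier(len(workloads))
    events, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=run_workload, args=(name, secs, barrier, events, lock))
        for name, secs in workloads
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events


# Hypothetical serial-phase durations for two co-scheduled benchmarks.
events = run_synchronized([("Benchmark A", 0.05), ("Benchmark B", 0.01)])
for name, kind in events:
    print(name, kind)
```

No workload records `roi_start` until all of them have recorded `input_done`, so the measured regions overlap even when the serial phases differ in length.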

Performance Impact of Consolidation
- Consolidating multiple workloads can degrade performance due to resource contention.
- Virtualization provides performance isolation by managing memory and NUMA node affinities.
- With a native OS, performance variation is 2.5x higher.
[Figure: Average throughput of streamcluster when co-scheduled with another PARSEC benchmark]

Outline
- Background
- Methodology
- Adaptive Resource Sharing
- Results
- Conclusions

Impact of Application Selection
- Previous co-scheduling policies focus on application selection to improve energy efficiency.
- Application selection is based on balancing memory operations and CPU usage.

Predicting Power Efficiency
- To improve energy efficiency, we need to allocate more CPU resources to power-efficient workloads.
- The IPC*CPU Utilization metric shows a strong correlation with power efficiency.
[Figure axis label: IPC*CPU Utilization]
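
One way to check a claim like this on one's own measurements is to compute the metric per application and correlate it with measured throughput-per-watt. The sketch below assumes CPU utilization is expressed as a fraction between 0 and 1 and uses the Pearson coefficient. The sample tuples are made-up illustrative values, not data from the presentation, and the slides do not state which correlation measure was used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Hypothetical per-application samples: (IPC, CPU utilization, throughput-per-watt).
samples = [
    (0.4, 0.60, 1.0),
    (0.8, 0.90, 2.9),
    (1.2, 0.95, 4.6),
    (0.6, 0.70, 1.7),
    (1.0, 0.85, 3.5),
]

metric = [ipc * util for ipc, util, _ in samples]   # IPC*CPU Utilization
efficiency = [tpw for _, _, tpw in samples]
r = pearson(metric, efficiency)
print(f"correlation = {r:.3f}")
```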

Application Classification
- The IPC*CPU Utilization metric is used to classify applications according to their power efficiency levels.
- We use a density-based clustering algorithm (DBSCAN) to group applications by power-efficiency class.
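
To make the clustering step concrete, here is a minimal pure-Python sketch of DBSCAN on one-dimensional values, as would apply to a single metric such as IPC*CPU Utilization. The `eps` and `min_pts` settings and the input values are hypothetical; the slides do not give the parameters used or the number of resulting classes.

```python
def dbscan_1d(values, eps, min_pts):
    """Label each value with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(values)
    labels = [None] * n                     # None means "not visited yet"

    def neighbors(i):
        # Indices within eps of values[i], including i itself.
        return [j for j in range(n) if abs(values[i] - values[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1                  # provisional noise
            continue
        cluster += 1                        # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster         # noise reachable from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:      # j is also a core point: keep expanding
                queue.extend(j_nbrs)
    return labels


# Hypothetical IPC*CPU Utilization values for ten applications.
metric = [0.20, 0.24, 0.28, 0.70, 0.74, 0.78, 1.30, 1.34, 1.38, 2.50]
labels = dbscan_1d(metric, eps=0.1, min_pts=2)
print(labels)
```

A density-based method does not need the number of classes in advance, which suits a setting where the set of applications, and therefore the number of efficiency groups, can change.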

Application Classification (continued)
[Figure: VM configuration; Case 1 and Case 2, each showing benchmarks on VM0 and VM1 running on ESXi]

Reconfiguring Resource Allocations
CPU hot-plugging:
- Adding/removing vCPUs during runtime
- Cons: Removing a vCPU is not supported in some OSes
Resource allocation adjustment:
- Allocating/limiting CPU resources for VMs
- Pros: Fine granularity (the resource allocation unit is MHz)
Both techniques have low overhead (less than 1%).
[Figure: Resource Configuration Comparison]

Reconfiguration Runtime Behavior
- Resource allocation limits can be dynamically adjusted according to application classes.
- CPU allocation limits can be effectively reconfigured within a second.
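
As an illustration of class-driven limits, the sketch below splits a node's CPU capacity, in MHz, between VMs in proportion to a weight assigned to each VM's efficiency class. The proportional rule, the class names, the weights, and the capacity figure are all assumptions made for this example; the slides do not specify the allocation policy, and no hypervisor API is called here.

```python
def cpu_limits_mhz(total_mhz, vm_classes, class_weights):
    """Split total_mhz across VMs in proportion to their class weights.

    vm_classes:    {vm_name: class_name}
    class_weights: {class_name: positive weight}
    Returns {vm_name: limit in whole MHz}.
    """
    weights = {vm: class_weights[cls] for vm, cls in vm_classes.items()}
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("class weights must sum to a positive value")
    return {vm: int(total_mhz * w / total_weight) for vm, w in weights.items()}


# Hypothetical node capacity and classes: VM0 runs a power-efficient workload,
# VM1 a less efficient one, so VM0 receives the larger share.
limits = cpu_limits_mhz(
    total_mhz=24000,
    vm_classes={"VM0": "high", "VM1": "low"},
    class_weights={"high": 3, "low": 1},
)
print(limits)
```

Recomputing the limits whenever a VM's class changes would correspond to the dynamic adjustment described on this slide.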

Results
- The proposed approach improves throughput-per-watt by up to 25% and by 9% on average.
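
For reference, a percentage improvement in throughput-per-watt can be computed as follows. This is a worked arithmetic sketch with made-up throughput and power numbers, assuming throughput is measured in work units per unit time; it does not reproduce any result from the presentation.

```python
def throughput_per_watt(throughput, power_w):
    """Energy-efficiency metric: work completed per unit time, per watt."""
    if power_w <= 0:
        raise ValueError("power must be positive")
    return throughput / power_w


def improvement_pct(baseline, proposed):
    """Relative improvement of proposed over baseline, in percent."""
    return 100.0 * (proposed / baseline - 1.0)


# Hypothetical measurements at the same average power draw.
baseline = throughput_per_watt(150.0, 250.0)
proposed = throughput_per_watt(165.0, 250.0)
gain = improvement_pct(baseline, proposed)
print(f"improvement = {gain:.1f}%")
```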

Results
- We generate 50 workload sets, each consisting of 10 randomly selected PARSEC applications.
Set 1: 4x blackscholes, 2x vips, 1x bodytrack, 1x freqmine, 1x streamcluster, 1x swaptions
Set 2: 3x canneal, 3x ferret, 2x bodytrack, 1x dedup, 1x vips
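
Workload sets of this shape can be generated as sketched below. Sampling with replacement is inferred from the example sets, where an application appears several times. The fixed seed and the function name `generate_workload_sets` are assumptions added for reproducibility, and the output will not match the sets listed on the slide.

```python
import random
from collections import Counter

PARSEC_APPS = [
    "blackscholes", "bodytrack", "canneal", "dedup", "ferret",
    "freqmine", "swaptions", "streamcluster", "vips", "x264",
]


def generate_workload_sets(num_sets=50, apps_per_set=10, seed=0):
    """Return a list of Counters mapping application name to instance count."""
    rng = random.Random(seed)   # private generator so results are repeatable
    return [
        Counter(rng.choices(PARSEC_APPS, k=apps_per_set))   # with replacement
        for _ in range(num_sets)
    ]


sets = generate_workload_sets()
for app, count in sorted(sets[0].items()):
    print(f"{count}x {app}")
```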

Results (continued)
- Across the 50 workload sets, the proposed resource sharing technique improves throughput-per-watt by 12% on average compared with co-scheduling techniques based on application selection.

Conclusions & Future Work
- Consolidation is a powerful technique for improving energy efficiency in data centers.
- The energy efficiency of parallel workloads varies significantly depending on application characteristics.
- Adaptive VM configuration for parallel workloads improves energy efficiency by 12% on average over existing co-scheduling algorithms.
Future research directions include:
- Investigating the effect of memory allocation decisions on energy efficiency
- Utilizing application-level instrumentation to explore power/energy optimization opportunities
- Expanding the application space

Performance Comparison