Power-Aware Parallel Job Scheduling


Power-Aware Parallel Job Scheduling
Maja Etinski, Julita Corbalan, Jesus Labarta, Mateo Valero
{maja.etinski,julita.corbalan,jesus.labarta,mateo.valero}@bsc.es

Power Consumption of Supercomputing Systems
Striving for performance has led to enormous power dissipation in HPC centers (Top500 list).
[Figure: power in kW (Top500 list)]
EEHiPC'10

Power reduction approaches in HPC
Application level:
- Runtime systems:
  - exploit certain application characteristics (load imbalance, communication-intensive regions)
  - based on very fine-grained DVFS application
System level:
- Turning off idle nodes:
  - resource allocation such that there are more completely idle nodes
  - determining the number of online nodes
- Operating system power management via DVFS:
  - Linux governors: per core, unaware of the rest of the system
- DVFS that takes the entire system workload into account?

Parallel Job Scheduling
The job scheduler has a global view of the whole system.
[Figure: job submission, job with its requirements, HPC job scheduler, wait queue with queued jobs, job scheduling, resource manager]

DVFS and Job Scheduling
[Figure: job submission, job with its requirements, HPC job scheduler, wait queue with queued jobs, job scheduling, resource manager, power-aware component]
Power-aware component: job CPU frequency assignment based on goals/constraints.

Outline
- Parallel job scheduling:
  - short introduction to parallel job scheduling
  - the EASY backfilling policy
- Power and run time modelling:
  - first we need to understand how frequency scaling affects CPU power dissipation and run time
- Energy-saving parallel job scheduling policies:
  - Utilization-driven power-aware scheduling [Maja Etinski, Julita Corbalan, Jesus Labarta, and Mateo Valero. Utilization driven power-aware parallel job scheduling. Energy Aware High Performance Computing Conference, Hamburg, September 2010]
  - BSLD-driven power-aware scheduling [Maja Etinski, Julita Corbalan, Jesus Labarta, and Mateo Valero. BSLD threshold driven power management policy for HPC centers. IEEE Parallel and Distributed Processing Symposium, HPPAC Workshops 2010, Atlanta, GA, April 2010]
- Power budgeting: how to maximize job performance under a given power budget? [Maja Etinski, Julita Corbalan, Jesus Labarta, and Mateo Valero. Optimizing job performance under a given power constraint in HPC centers. IEEE International Green Computing Conference, Chicago, IL, August 2010]

About Parallel Job Scheduling
Parallel job scheduling can be seen as finding a free rectangle for the job being scheduled:
- the FCFS policy was used in the beginning
- backfilling policies were introduced to improve system utilization
Job performance metrics:
- Response time: WaitTime(J) + RunTime(J)
- Slowdown: (WaitTime(J) + RunTime(J)) / RunTime(J)
- Bounded slowdown: max((WaitTime(J) + RunTime(J)) / max(Th, RunTime(J)), 1)
[Figure: jobs 1-6 drawn as rectangles, with CPUs on one axis and time on the other; job performance is shown as wait time plus run time]
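The three metrics above translate directly into code. The following is a minimal sketch; the function names and the 600-second threshold Th used in the example are illustrative assumptions, not values taken from the slides.

```python
def response_time(wait_time, run_time):
    """Response time: time spent waiting plus time spent running."""
    return wait_time + run_time


def slowdown(wait_time, run_time):
    """Slowdown: response time relative to the job's own run time."""
    return (wait_time + run_time) / run_time


def bounded_slowdown(wait_time, run_time, threshold):
    """Bounded slowdown: the threshold Th keeps very short jobs from
    producing huge slowdown values; the result is never below 1."""
    return max((wait_time + run_time) / max(threshold, run_time), 1.0)


# Hypothetical short job: waits 300 s, runs 60 s, Th assumed to be 600 s.
short_job = (300.0, 60.0)
print(slowdown(*short_job))                 # plain slowdown
print(bounded_slowdown(*short_job, 600.0))  # bounded slowdown
```

For the short job in the example, the plain slowdown is large although the job waited only five minutes, while the bounded variant clamps it to 1.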

The EASY backfilling policy
- Jobs are executed in FCFS order except when the first job in the wait queue cannot start.
- Users have to submit an estimate of the job's run time (the requested time).
- When the first job in the wait queue (WQ) cannot start, a reservation is made for it based on the requested times of the running jobs.
- A job is executed before previously arrived ones only if it does not delay the first job in the wait queue.
[Figure: CPUs versus time for jobs 1-6, showing the arrival of Job 5 with MakeJobReservation(Job5) and the arrival of Job 6 with BackfillJob(Job6)]
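The reservation and backfilling rules above can be sketched as follows. This is a simplified illustration, not the simulator's implementation: it ignores which specific processors are used, and the function names and the `(cpus, requested_end_time)` tuple layout are assumptions made for the example.

```python
def head_reservation(total_cpus, running, head_cpus, now):
    """Earliest start for the first queued job, based on the requested
    end times of running jobs. `running` is a list of
    (cpus, requested_end_time). Returns (start_time, spare_cpus), where
    spare_cpus are CPUs left over once the head job has started."""
    free = total_cpus - sum(cpus for cpus, _ in running)
    if free >= head_cpus:
        return now, free - head_cpus
    for cpus, end in sorted(running, key=lambda r: r[1]):
        free += cpus  # this job is expected to have finished by `end`
        if free >= head_cpus:
            return end, free - head_cpus
    raise ValueError("head job requests more CPUs than the machine has")


def can_backfill(total_cpus, running, head_cpus, cand_cpus, cand_requested, now):
    """A later job may jump ahead only if it does not delay the head job:
    it fits in the currently free CPUs and either finishes before the
    reservation or uses only CPUs the head job will not need."""
    free_now = total_cpus - sum(cpus for cpus, _ in running)
    if cand_cpus > free_now:
        return False
    start, spare = head_reservation(total_cpus, running, head_cpus, now)
    return now + cand_requested <= start or cand_cpus <= spare


# Hypothetical 8-CPU machine: one running job holds 6 CPUs until t=100,
# and the head job needs all 8 CPUs.
running = [(6, 100.0)]
print(can_backfill(8, running, 8, cand_cpus=2, cand_requested=50.0, now=0.0))
print(can_backfill(8, running, 8, cand_cpus=2, cand_requested=150.0, now=0.0))
```

The first candidate finishes before the head job's reservation and may be backfilled; the second would still hold CPUs at the reserved start time and may not.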

High-Level DVFS modelling

Power Model
CPU power is one of the main system power components. It consists of dynamic and static power:
  Pcpu = Pdynamic + Pstatic
  Pdynamic = A C f V^2
  Pstatic = α V
The fraction of static power in total CPU power is a model parameter:
  Pstatic(Vtop) = X (Pstatic(Vtop) + Pdynamic(ftop, Vtop))   (X = 25% in our experiments)
The average activity factor is assumed to be the same for all jobs (2.5 times higher than the idle activity).
Idle processors: two scenarios are considered; they either consume no power or consume power at the lowest frequency.
DVFS gear set: [table not preserved in the transcript]
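As a rough illustration of how the static fraction X fixes the model constants, the sketch below normalizes total CPU power at the top gear to 1 and derives α and the product A·C from it. The gear values (2.3 GHz at 1.5 V, 1.4 GHz at 1.2 V) are hypothetical, since the gear set is not preserved in the transcript, and the function names are made up for the example.

```python
def calibrate(f_top, v_top, static_fraction):
    """Derive the model constants from the static fraction X, with total
    CPU power at the top gear normalized to 1.0.
    Returns (ac, alpha), where ac stands for the product A*C."""
    p_static_top = static_fraction          # Pstatic(Vtop) = X * Ptotal
    p_dynamic_top = 1.0 - static_fraction   # the remainder is dynamic
    alpha = p_static_top / v_top            # from Pstatic = alpha * V
    ac = p_dynamic_top / (f_top * v_top ** 2)  # from Pdynamic = A*C*f*V^2
    return ac, alpha


def cpu_power(f, v, ac, alpha):
    """Normalized CPU power at frequency f and voltage v."""
    return ac * f * v ** 2 + alpha * v


# Hypothetical gears (GHz, V); X = 25% as stated in the slides.
F_TOP, V_TOP = 2.3, 1.5
ac, alpha = calibrate(F_TOP, V_TOP, 0.25)
print(cpu_power(F_TOP, V_TOP, ac, alpha))  # top gear, normalized
print(cpu_power(1.4, 1.2, ac, alpha))      # a lower gear
```

Because dynamic power scales with f·V² while static power scales only with V, lowering the gear reduces the dynamic part much faster than the static part.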

Time Model
The dependence of execution time on frequency is captured by the following model [Hsu, Feng SC05: A Power-Aware Run-Time System for High-Performance Computing]:
  F(f, β) = T(f) / T(ftop) = β (ftop / f - 1) + 1
The global application β depends on the communication/computation ratio.
β is assumed to follow these normal distributions:
  Number of CPUs            Distribution
  Less than or equal to 4   N(0.5, 0.01)
  Between 4 and 32          N(0.4, 0.01)
  More than 32              N(0.3, 0.064)
Two β scenarios:
- β is known in advance (at the moment of scheduling)
- β is not known in advance (at the moment of scheduling the worst case, β = 1, is assumed)
[Figure: curves for β = 0.7, β = 0.5 and β = 0.3]
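The time model is a one-line function. The sketch below evaluates it for a hypothetical top frequency of 2.3 GHz (the actual gear set is not preserved in the transcript) and shows why β = 1 is the worst case: run time then scales inversely with frequency.

```python
def time_penalty(f, beta, f_top):
    """F(f, beta) = T(f) / T(f_top) = beta * (f_top / f - 1) + 1."""
    return beta * (f_top / f - 1.0) + 1.0


F_TOP = 2.3  # GHz, assumed for illustration only

for beta in (0.3, 0.5, 0.7, 1.0):
    # Relative run time when the job runs at 1.4 GHz instead of F_TOP.
    print(beta, round(time_penalty(1.4, beta, F_TOP), 3))
```

With β = 0 the run time does not depend on frequency at all, and at f = ftop the penalty is 1 for any β.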

Energy Saving Parallel Job Scheduling Policies

Utilization-Driven Policy
- Frequency is assigned once (at job start time) for the entire job execution, based on system utilization.
- Utilization is computed for each interval T: [equation not preserved in the transcript]
- WQthreshold gives additional control over system load:
  - if there are more than WQthreshold jobs in the wait queue, no frequency scaling is applied
  - otherwise, a job started during interval k runs at frequency Fk
[Figure: Fk as a function of the previous interval's utilization Uk-1, with frequency levels flower, fupper and ftop and utilization thresholds Ulower and Uupper]
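One plausible reading of the policy is a step function of the previous interval's utilization, sketched below. The step shape and the handling of values exactly on a threshold are assumptions inferred from the figure's axis labels; the default thresholds and reduced frequencies are the ones listed in the evaluation parameters, and the 2.3 GHz top frequency is hypothetical.

```python
def utilization_driven_frequency(prev_util, wq_len, wq_threshold=None,
                                 u_lower=0.5, u_upper=0.8,
                                 f_lower=1.4, f_upper=2.0, f_top=2.3):
    """Pick the frequency (GHz) for a job starting in the current interval.

    prev_util    -- system utilization measured over the previous interval
    wq_len       -- number of jobs currently in the wait queue
    wq_threshold -- None means "no limit" on the wait queue length
    """
    # Too many queued jobs: do not scale, run at the top frequency.
    if wq_threshold is not None and wq_len > wq_threshold:
        return f_top
    if prev_util < u_lower:
        return f_lower   # lightly loaded system: lowest reduced gear
    if prev_util < u_upper:
        return f_upper   # moderately loaded: intermediate gear
    return f_top         # highly loaded: no frequency scaling


print(utilization_driven_frequency(0.30, wq_len=0, wq_threshold=4))
print(utilization_driven_frequency(0.65, wq_len=0, wq_threshold=4))
print(utilization_driven_frequency(0.30, wq_len=5, wq_threshold=4))
```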

Evaluation
The Alvio simulator, a C++ event-driven parallel job scheduling simulator, has been upgraded.
Policy parameters:
- utilization thresholds: Ulower = 50%, Uupper = 80%
- reduced frequencies: flower = 1.4 GHz, fupper = 2.0 GHz
- utilization computation interval: T = 10 min
- wait queue length threshold: WQthreshold = 0, 4, 16, no limit
Metric of job performance: Bounded Slowdown (BSLD)
- BSLD at frequency f: [equation not preserved in the transcript]

Workloads
Five workloads from production use have been simulated:
- Cornell Theory Center: large jobs with a relatively low level of parallelism
- San Diego Supercomputer Center: fewer sequential jobs than CTC, similar runtime distribution
- Lawrence Livermore National Lab: small to medium size jobs
- Lawrence Livermore National Lab: large parallel jobs
- San Diego Supercomputer Center: no sequential jobs
Parallel workload archive: http://www.cs.huji.ac.il/labs/parallel/workload

Results: Normalized CPU Energy
- short wait queues
- very similar results for both energy scenarios
- savings of up to 12% for workloads that are not highly loaded

Results: Normalized Performance
- high penalty in the least conservative case for the highly loaded workload
- WQthreshold has almost no impact
- an increase in the number of backfilled jobs

Average frequency - SDSCBlue

BSLD-Driven Policy
- Frequency is assigned based on the job's predicted performance.
- Lower frequency -> longer execution time -> worse job performance metric.
- BSLDth controls the allowable performance penalty ("target BSLD").
- To be run at a lower frequency f, a job has to satisfy the BSLD condition at frequency f: if the job's predicted BSLD at frequency f is lower than BSLDth, then it satisfies the BSLD condition at frequency f.
- Predicted BSLD: [equation not preserved in the transcript]
Flowchart for job Ji:
1. If WQsize ≤ WQthreshold does not hold, run Ji at Ftop.
2. Otherwise set f = Flowest.
3. Find an allocation Alloc.
4. If satisfiesBSLD(Alloc, Ji, f) or f = Ftop, run job Ji at frequency f.
5. Otherwise set f to the next higher frequency and repeat from step 3.
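The flowchart above can be sketched as a loop over the available gears. Two things here are assumptions rather than content from the slides: the predicted-BSLD formula (the slide's equation is missing, so the sketch applies the time model's penalty to the requested time inside the bounded slowdown definition) and the `wait_for` callback, a hypothetical stand-in for finding an allocation that returns the wait time the job would see at frequency f. The gear values and the 600-second threshold are likewise illustrative.

```python
def predicted_bsld(wait, requested, f, beta, f_top, th):
    """Assumed form: bounded slowdown with the run time stretched by
    the time model F(f, beta) = beta * (f_top / f - 1) + 1."""
    penalized = requested * (beta * (f_top / f - 1.0) + 1.0)
    return max((wait + penalized) / max(th, requested), 1.0)


def bsld_driven_frequency(wait_for, requested, beta, gears, bsld_th,
                          wq_size, wq_threshold, th=600.0):
    """Return the frequency for a job. `gears` is sorted ascending and
    its last element is the top frequency."""
    f_top = gears[-1]
    if wq_size > wq_threshold:
        return f_top                      # queue too long: no scaling
    for f in gears:                       # start from the lowest gear
        wait = wait_for(f)                # "find an allocation"
        if f == f_top or predicted_bsld(wait, requested, f, beta, f_top, th) < bsld_th:
            return f
    return f_top


GEARS = [1.4, 2.0, 2.3]                   # GHz, hypothetical gear set
no_wait = lambda f: 0.0                   # job could start immediately
print(bsld_driven_frequency(no_wait, 1000.0, 0.5, GEARS, 1.5, 0, 4))
print(bsld_driven_frequency(no_wait, 1000.0, 0.5, GEARS, 1.2, 0, 4))
```

A looser threshold lets the job run at the lowest gear, while a tighter one pushes it to the next gear up.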

Results: Normalized CPU Energy
- Normalized energies in the two energy scenarios behave in the same way.
- Average savings in the most aggressive case: 5% - 23%.
- The difference in savings per workload between the most conservative and the most aggressive threshold combinations ranges from 5% (SDSC) to 15% (LLNLThunder).
- WQthreshold controls DVFS aggressiveness much better than BSLDthreshold.
- BSLDthreshold has a stronger impact when WQthreshold is higher.

Average BSLD
- Strong impact on performance in the most aggressive case.
- The impact of WQthreshold is higher than that of BSLDthreshold.
- BSLDthreshold has a stronger impact when WQthreshold is higher.
- The decrease in performance is proportional to the energy savings.
[Figure: average BSLD per workload; value labels 24.91, 1, 4.66, 5.15 and 1.08]

Reduced jobs (out of 5000)
- Performance depends on the number of reduced jobs.
- It depends on the frequencies used as well.
- The performance of jobs that ran at the nominal frequency was affected as well.
- When the load is very high (SDSC), no DVFS is applied (to apply it, the thresholds have to be set to higher values).

Wait time
Main problem observed: high impact on wait time.
[Figure: zoom of SDSCBlue wait time behavior]

Power-Budgeting Policy

PB-Guided Policy: How DVFS can improve overall job performance
[Figure: schedules of jobs J1-J5 over time (T1, T2) under a power budget, comparing the no-DVFS case (jobs at ftop, remaining jobs in the wait queue) with the DVFS case (jobs at flower and ftop)]
There is a penalty in run time due to frequency scaling, but more jobs can run simultaneously.

Power Budgeting: PB-Guided Policy
- Frequency assignment is guided by predicted job performance and current power draw.
- Prediction of BSLD when selecting frequency: [equation not preserved in the transcript]
- BSLD condition: a job satisfies the BSLD condition at reduced frequency f if its predicted BSLD at frequency f is lower than the current value of the BSLD threshold.
- The policy is power conservative: a job will be scheduled at the lowest frequency at which both the BSLD condition and the power limit are satisfied.
- The closer to the PB limit, the higher the BSLD threshold.
- The higher the BSLD threshold, the lower the frequency that will be selected.
[Figure: BSLD threshold as a function of Pcurrent, with Plower, Pupper and the power budget marked]
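The slides state only that the BSLD threshold rises as the current power draw approaches the budget. One way to realize that, sketched below, is a piecewise-linear ramp between a lower and an upper threshold value. The linear shape, the clamping outside the two power thresholds, and the reading of Plower and Pupper as fractions of the power budget are all assumptions made for this illustration.

```python
def bsld_threshold(p_current, p_budget, bsld_lower, bsld_upper,
                   p_lower=0.6, p_upper=0.9):
    """Current BSLD threshold as a function of power draw.

    Below p_lower (fraction of the budget) the threshold stays at
    bsld_lower; above p_upper it stays at bsld_upper; in between it is
    interpolated linearly (assumed shape)."""
    load = p_current / p_budget
    if load <= p_lower:
        return bsld_lower
    if load >= p_upper:
        return bsld_upper
    frac = (load - p_lower) / (p_upper - p_lower)
    return bsld_lower + frac * (bsld_upper - bsld_lower)


# Hypothetical values: budget normalized to 1.0, thresholds 2.0 and 4.0.
for draw in (0.5, 0.75, 0.95):
    print(draw, bsld_threshold(draw, 1.0, 2.0, 4.0))
```

A higher threshold makes the BSLD condition easier to satisfy at low frequencies, so jobs started close to the budget tend to get lower gears.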

Power Budgeting: PB-Guided Policy
A job can be scheduled with one of the two functions:

MakeJobReservation(J)
    scheduled <-- false
    shiftInTime <-- 0
    nextFinishJob <-- next(OrderedRunningQueue)
    while (!scheduled) {
        f <-- FlowestReduced
        while (f < Fnominal) {
            Alloc = findAllocation(J, currentTime + shiftInTime, f)
            if (satisfiesBSLD(Alloc, J, f) and satisfiesPowerLimit(Alloc, J, f)) {
                schedule(J, Alloc)
                scheduled <-- true
                break
            }
            f <-- nextHigherFrequency
        }
        if (f == Fnominal) {
            Alloc = findAllocation(J, currentTime + shiftInTime, Fnominal)
            if (satisfiesPowerLimit(Alloc, J, Fnominal)) {
                schedule(J, Alloc)
                break
            }
        }
        shiftInTime <-- FinishTime(nextFinishJob) - currentTime
        nextFinishJob <-- next(OrderedRunningQueue)
    }

BackfillJob(J)
    f <-- Flowest
    while (f < Fnominal) {
        Alloc = TryToFindBackfilledAllocation(J, f)
        if (correct(Alloc) and satisfiesBSLD(Alloc, J, f) and satisfiesPowerLimit(Alloc, J, f)) {
            schedule(J, Alloc)
            break
        }
        f <-- nextHigherFrequency
    }
    if (f == Fnominal) {
        Alloc = TryToFindBackfilledAllocation(J, Fnominal)
        if (correct(Alloc) and satisfiesPowerLimit(Alloc, J, Fnominal))
            schedule(J, Alloc)
    }

BSLD condition: the lowest frequency that satisfies it will be selected.
Power limit: the power budget must not be violated during the entire job execution.
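The BackfillJob pseudocode above can be rendered as a small runnable function. In this sketch the scheduler internals are passed in as callbacks (`try_backfill`, `satisfies_bsld`, `satisfies_power_limit`, `schedule`); these are hypothetical stand-ins for the functions named in the pseudocode, and `try_backfill` is assumed to return `None` when no correct allocation exists.

```python
def backfill_job(job, gears, try_backfill, satisfies_bsld,
                 satisfies_power_limit, schedule):
    """Try to backfill `job` at the lowest acceptable frequency.

    `gears` is sorted ascending and its last element is the nominal
    frequency. Returns the frequency used, or None if the job could
    not be backfilled."""
    f_nominal = gears[-1]
    # Reduced frequencies need both the BSLD condition and the power limit.
    for f in gears[:-1]:
        alloc = try_backfill(job, f)
        if (alloc is not None and satisfies_bsld(alloc, job, f)
                and satisfies_power_limit(alloc, job, f)):
            schedule(job, alloc)
            return f
    # At the nominal frequency only the power limit has to hold.
    alloc = try_backfill(job, f_nominal)
    if alloc is not None and satisfies_power_limit(alloc, job, f_nominal):
        schedule(job, alloc)
        return f_nominal
    return None


# Toy usage with stub callbacks: the BSLD condition holds from 2.0 GHz up.
started = []
used = backfill_job(
    "J6", [1.4, 2.0, 2.3],
    try_backfill=lambda job, f: ("alloc", f),
    satisfies_bsld=lambda alloc, job, f: f >= 2.0,
    satisfies_power_limit=lambda alloc, job, f: True,
    schedule=lambda job, alloc: started.append((job, alloc)),
)
print(used, started)
```

Returning the chosen frequency (or `None`) replaces the pseudocode's `break` and makes the outcome easy to check from the caller.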

Evaluation
Policy parameters:
- Power budget thresholds: Plower = 0.6, Pupper = 0.9
- BSLD threshold values which have been used:
  - BSLDlower = avg(BSLD) without power budgeting
  - BSLDupper = 2 * BSLDlower
- Power budget set to 80% of the total CPU power consumed by the whole system when running at Fnominal
Four workloads from production use have been simulated:
  Workload - # CPUs     Jobs (K)   Avg BSLD   Utilization   Over PB
  CTC - 430                        4.66       70%           72%
  SDSC - 128            40 - 45    24.91      85%           95%
  SDSCBlue - 1152       20 - 25    5.15       69%           74%
  LLNLThunder - 4008    20 - 25    1          80%           89%

Baseline Power Budgeting Policy
Power limited without DVFS:
- No job will start if it would violate the budget, even though there are available processors.
- This case is equivalent to EASY scheduling on a smaller machine.
[Figure: CPUs versus time; Job 4, Job 5 and Job 6 arrive in turn, MakeJobReservation(Job5) and BackfillJob(Job6) are called, and Job 6 cannot start because of the power budget]

Results: Performance
- Oracle case: it is assumed that β values are known at scheduling time.
- The PB-guided policy shows better performance for all workloads!
- Average wait time decreases with DVFS under the power constraint.

Results: Normalized CPU Energy (idle=0)
Oracle case: it is assumed that β values are known at scheduling time.

Utilization Over Time

Power Budget Consumed

Comparison of Unknown and Known β
Avg. BSLD, Avg. WT and Avg. Energy values are normalized with respect to the corresponding baseline values (EASY backfilling with power limit and without DVFS).

Conclusions
- The energy-performance trade-off must be made carefully: DVFS affects not only job run time but can also significantly affect job wait time, which further decreases job performance.
- The performance-energy trade-off needs to be made at the job scheduling level, since it affects jobs in the wait queue and only the scheduler can estimate the potential negative impact on queued jobs.
- Applying DVFS to highly loaded workloads (SDSC) leads to a very high performance penalty.
- Parallel job scheduling policies can be designed to maximize job performance under a given power constraint.
- It has been shown that DVFS can improve performance in power-constrained HPC centers (using lower CPU frequencies allows more jobs to run simultaneously).
- It is not necessary to know β values in advance; moreover, assuming the worst case at scheduling time can give better performance than knowing them in advance.

Thank you for your attention!
HPPAC 2010 Atlanta