
Ziliang Zong, Adam Manzanares, and Xiao Qin Department of Computer Science and Software Engineering Auburn University Energy Efficient Scheduling for High-Performance Clusters

Where is Auburn University? Ph.D. '04, U. of Nebraska-Lincoln; 04-07, New Mexico Tech; 07-09, Auburn University.

Storage Systems Research Group at New Mexico Tech ( )

Storage Systems Research Group at Auburn (2008)

Storage Systems Research Group at Auburn (2009)

Investigators: Ziliang Zong, Ph.D., Assistant Professor, South Dakota School of Mines and Technology; Adam Manzanares, Ph.D. Candidate, Auburn University; Xiao Qin, Ph.D., Assistant Professor, Auburn University.

Introduction - Applications

Introduction – Data Centers

Motivation – Electricity Usage EPA Report to Congress on Server and Data Center Energy Efficiency, 2007

Motivation – Energy Projections EPA Report to Congress on Server and Data Center Energy Efficiency, 2007

Motivation – Design Issues: Energy Efficiency, Performance, Reliability & Security

Outline: Introduction & Motivation; General Architecture for High-Performance Computing Platforms; Energy-Efficient Scheduling for Clusters; Energy-Efficient Scheduling for Grids; Energy-Efficient Storage Systems; Conclusions

Architecture – Multiple Layers

Energy Efficient Devices

Multiple Design Goals for High-Performance Computing Platforms: Performance, Energy Efficiency, Reliability, and Security

Outline: Introduction & Motivation; General Architecture for High-Performance Computing Platforms; Energy-Efficient Scheduling for Clusters; Energy-Efficient Scheduling for Grids; Energy-Efficient Storage Systems; Conclusions

Energy-Aware Scheduling for Clusters

Parallel Applications

Motivational Example: an example of duplication. Linear schedule: 39 s. No Duplication Schedule (NDS): 32 s. Task Duplication Schedule (TDS): 29 s. (Gantt charts of the three schedules omitted.)

Motivational Example (cont.): With CPU_Energy = 6 W and Network_Energy = 1 W, the linear schedule takes 39 s and 234 J; the No Duplication Schedule (MCP) takes 32 s and 242 J; the Task Duplication Schedule (TDS) takes 29 s and 284 J. (The task graph is omitted; its nodes carry (time, energy) weights (10,60), (8,48), (15,90), (6,36) and its edges carry communication weights (6,6), (5,5), (2,2), (4,4).)
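The energy figures above follow from a simple model: a schedule's energy is CPU power times total computation time plus network power times total communication time, using the slide's CPU_Energy = 6 W and Network_Energy = 1 W. A minimal sketch (function name is illustrative):

```python
CPU_POWER = 6  # watts while computing (from the slide)
NET_POWER = 1  # watts per unit of communication time (from the slide)

def schedule_energy(exec_times, comm_times):
    """Energy (J) = CPU power * total computation time
                  + network power * total communication time."""
    return CPU_POWER * sum(exec_times) + NET_POWER * sum(comm_times)

# Linear schedule: all four tasks (8 s, 15 s, 10 s, 6 s) run back-to-back
# on one processor, so there is no inter-processor communication.
print(schedule_energy([8, 15, 10, 6], []))  # 234 J, matching the slide
```

The 242 J and 284 J figures for MCP and TDS arise the same way once communication time and duplicated work are added in.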

Motivational Example (cont.): The energy cost of duplicating T1 is 48 J on the CPU side and -6 J on the network side, for a total of 42 J. The performance benefit of duplicating T1 is 6 s, giving an energy-performance tradeoff of 42/6 = 7. With Threshold = 10, should T1 be duplicated? EAD: no (the 42 J energy increase exceeds the threshold). PEBD: yes (the ratio 7 is within the threshold). Resulting schedules: EAD, 32 s and 242 J; PEBD, 29 s and 284 J.
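The two duplication tests above can be sketched as small functions; the numbers (48 J CPU cost, 6 J network saving, 6 s benefit, threshold 10) come from the slide, while the function names are illustrative:

```python
def ead_duplicate(cpu_cost_j, net_saving_j, threshold):
    """EAD: duplicate only if the absolute energy increase
    stays within the threshold."""
    more_energy = cpu_cost_j - net_saving_j
    return more_energy <= threshold

def pebd_duplicate(cpu_cost_j, net_saving_j, time_saving_s, threshold):
    """PEBD: duplicate only if the energy increase per second of
    schedule time saved stays within the threshold."""
    ratio = (cpu_cost_j - net_saving_j) / time_saving_s
    return ratio <= threshold

# Duplicating T1: CPU side +48 J, network side -6 J, benefit 6 s.
print(ead_duplicate(48, 6, threshold=10))      # False -> EAD: no
print(pebd_duplicate(48, 6, 6, threshold=10))  # True  -> PEBD: yes
```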

Basic Steps of Energy-Aware Scheduling. Step 1: DAG Generation. Task description: task set {T1, T2, ..., T9, T10}. T1 is the entry task; T10 is the exit task; T2, T3, and T4 cannot start until T1 finishes; T5 and T6 cannot start until T2 finishes; T7 cannot start until both T3 and T4 finish; T8 cannot start until both T5 and T6 finish; T9 cannot start until both T6 and T7 finish; T10 cannot start until both T8 and T9 finish.
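The task description above maps directly onto a precedence DAG. A minimal sketch as a Python adjacency list (the helper name is illustrative):

```python
# Precedence DAG for the ten-task example: each task maps to the
# tasks that must wait for it to finish.
dag = {
    "T1": ["T2", "T3", "T4"],
    "T2": ["T5", "T6"],
    "T3": ["T7"],
    "T4": ["T7"],
    "T5": ["T8"],
    "T6": ["T8", "T9"],
    "T7": ["T9"],
    "T8": ["T10"],
    "T9": ["T10"],
    "T10": [],  # exit task
}

def predecessors(dag, task):
    """Tasks that must finish before `task` can start."""
    return [t for t, succs in dag.items() if task in succs]

print(predecessors(dag, "T7"))  # ['T3', 'T4']
```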

Basic Steps of Energy-Aware Scheduling. Step 2: Parameter Calculation. For each task, compute: Level (total execution time from the current task to the exit task), EST (Earliest Start Time), ECT (Earliest Completion Time), LAST (Latest Allowable Start Time), LACT (Latest Allowable Completion Time), and FP (Favorite Predecessor).
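The Level parameter defined above is the length of the longest execution-time path from a task to the exit task, so it can be computed by a simple recursion over the DAG. A sketch using hypothetical execution times (the slide does not list them):

```python
# Successor lists for the ten-task DAG from Step 1.
dag = {
    "T1": ["T2", "T3", "T4"], "T2": ["T5", "T6"], "T3": ["T7"],
    "T4": ["T7"], "T5": ["T8"], "T6": ["T8", "T9"], "T7": ["T9"],
    "T8": ["T10"], "T9": ["T10"], "T10": [],
}

# Hypothetical execution times in seconds (not from the slide).
exec_time = {"T1": 4, "T2": 3, "T3": 5, "T4": 2, "T5": 4,
             "T6": 3, "T7": 4, "T8": 2, "T9": 3, "T10": 1}

def level(task):
    """Total execution time of the longest path from `task`
    to the exit task (the task's own time included)."""
    succs = dag[task]
    if not succs:
        return exec_time[task]
    return exec_time[task] + max(level(s) for s in succs)

print(level("T10"))  # 1: the exit task's level is its own time
```

Scheduling then processes tasks in ascending level order, which is why the exit task T10 heads the task list in Step 3.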

Basic Steps of Energy-Aware Scheduling. Step 3: Scheduling. Using the computed parameters, the tasks are ordered into the original task list: {10, 9, 8, 5, 6, 2, 7, 4, 3, 1}.

Basic Steps of Energy-Aware Scheduling. Step 4: Duplication Decision. Walking the original task list {10, 9, 8, 5, 6, 2, 7, 4, 3, 1}, the algorithm makes a decision at each step: Decision 1: duplicate T1? Decision 2: duplicate T2? duplicate T1? Decision 3: duplicate T1?

The EAD and PEBD Algorithms: (1) Generate the DAG of the given task set. (2) Find all the critical paths in the DAG. (3) Generate the scheduling queue by level, ascending; select the unscheduled task with the lowest level as the starting task. (4) For each task on the same critical path as the starting task, check whether it is already scheduled; if not, allocate it to the same processor as the other tasks on that critical path. (5) If duplicating a task would save time, decide whether to duplicate it. EAD calculates the energy increase and duplicates the task only if more_energy <= Threshold. PEBD calculates the energy increase and the time decrease, sets Ratio = energy increase / time decrease, and duplicates the task only if Ratio <= Threshold. After a duplication, continue with the next task on the same critical path. (6) Repeat until the entry task is reached.

Energy Dissipation in Processors

Parallel Scientific Applications: Fast Fourier Transform, Gaussian Elimination

Large-Scale Parallel Applications: Robot Control, Sparse Matrix Solver

Impact of CPU Power Dissipation. Energy consumption for different processors (Gaussian, CCR = 0.4, and FFT, CCR = 0.4): savings range from 19.4% down to 3.7%. CPU power characteristics (busy / idle / gap): 104 W / 15 W / 89 W; 75 W / 14 W / 61 W; 47 W / 11 W / 36 W; 44 W / 26 W / 18 W. Observation: CPUs with a large gap between CPU_busy and CPU_idle obtain greater energy savings.
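The observation above follows from a two-state power model: the energy saved by moving a CPU from busy to idle for some interval is proportional to the busy-idle gap. A small sketch using the busy/idle figures in the table (the 60 s / 40 s split is an illustrative assumption):

```python
def energy(p_busy, p_idle, busy_s, idle_s):
    """Two-state model: E = P_busy * t_busy + P_idle * t_idle (joules)."""
    return p_busy * busy_s + p_idle * idle_s

# Two CPUs from the table, each busy 60 s and idle 40 s of a 100 s run.
high_gap = energy(104, 15, 60, 40)  # 104 W busy / 15 W idle (gap 89 W)
low_gap  = energy(44, 26, 60, 40)   #  44 W busy / 26 W idle (gap 18 W)

# Converting 10 s of busy time into idle time saves gap * 10 s:
print(high_gap - energy(104, 15, 50, 50))  # 890 J saved (89 W * 10 s)
print(low_gap - energy(44, 26, 50, 50))    # 180 J saved (18 W * 10 s)
```

Hence the wider the busy-idle gap, the more each second of avoided computation or extended idling is worth.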

Impact of Interconnect Power Dissipation. Energy consumption for Robot Control: savings of 16.7% and 13.3% with Myrinet (33.6 W) versus 5% and 3.1% with Infiniband (65 W). Observation: the energy savings of EAD and PEBD degrade when the interconnect has a high power consumption rate.

Parallelism Degrees. Energy consumption of Robot Control (Myrinet): savings of 17% and 15.8%. Energy consumption of Sparse Matrix Solver (Myrinet): savings of 6.9% and 5.4%. Observation: Robot Control has more task dependencies, so EAD and PEBD have more opportunities to save energy by judiciously duplicating tasks.

Communication-Computation Ratio (CCR). Energy consumption under different CCRs (0.1, 0.5, 1, 5, 10), with an AMD Athlon processor, Myrinet interconnect, and the Robot Control application simulated. Observations: the overall energy consumption of EAD and PEBD is less than that of MCP and TDS; EAD and PEBD are very sensitive to CCR; MCP provides the greatest energy savings when CCR is less than 1, but consumes much more energy when CCR is large.

Performance. Schedule length of Gaussian Elimination and of Sparse Matrix Solver. Performance degradation relative to TDS: Gaussian Elimination, 5.7% (EAD) and 2.2% (PEBD); Sparse Matrix Solver, 2.92% (EAD) and 2.02% (PEBD). Observation: it is worth trading a marginal degradation in schedule length for significant energy savings in cluster systems.

Heterogeneous Clusters - Motivational Example

Motivational Example (cont.): Energy calculation for a tentative schedule across computing nodes C1-C4 (figure omitted).

Experimental Settings (simulation environments). Trees examined: Gaussian elimination, Fast Fourier Transform. Execution times of Gaussian Elimination: {5, 4, 1, 1, 1, 1, 10, 2, 3, 3, 3, 7, 8, 6, 6, 20, 30, 30} (fixed), random (varied). Execution times of Fast Fourier Transform: {15, 10, 10, 8, 8, 1, 1, 20, 20, 40, 40, 5, 5, 3, 3} (fixed), random (varied). Computing node types: AMD Athlon 64 X with 85 W TDP (Type 1), AMD Athlon 64 X with 65 W TDP (Type 2), AMD Athlon 64 X with 35 W TDP (Type 3), Intel Core 2 Duo E6300 (Type 4). CCR set: between 0.1 and 10. Computing node heterogeneity (# of Types 1-4): Environment 1: 4, 4, 4, 4; Environment 2: 6, 2, 2, 6; Environment 3: 5, 3, 3, 5; Environment 4: 7, 1, 1, 7. Network energy consumption rate: 20 W, 33.6 W, 60 W.

Communication-Computation Ratio CCR sensitivity for Gaussian Elimination

Heterogeneity. Computational-node heterogeneity experiments across environments E1-E4 (per-CPU-type results omitted). Observation: CPUs with a large gap between CPU_busy and CPU_idle obtain greater energy savings.

Conclusions: Architecture for high-performance computing platforms; Energy-Efficient Scheduling for Clusters; Energy-Efficient Scheduling for Heterogeneous Systems; How to measure energy consumption (Kill-A-Watt).

Questions