Ziliang Zong, Adam Manzanares, and Xiao Qin Department of Computer Science and Software Engineering Auburn University Energy Efficient Scheduling for High-Performance.


1 Ziliang Zong, Adam Manzanares, and Xiao Qin Department of Computer Science and Software Engineering Auburn University Energy Efficient Scheduling for High-Performance Clusters

2 Where is Auburn University? Ph.D. '04, U. of Nebraska-Lincoln; '04-'07, New Mexico Tech; '07-'09, Auburn University

3 3 2015-6-21 Storage Systems Research Group at New Mexico Tech (2004-2007)

4 Storage Systems Research Group at Auburn (2008)

5 Storage Systems Research Group at Auburn (2009)

6 Investigators
  Ziliang Zong, Ph.D., Assistant Professor, South Dakota School of Mines and Technology
  Adam Manzanares, Ph.D. Candidate, Auburn University
  Xiao Qin, Ph.D., Assistant Professor, Auburn University

7 Introduction - Applications

8 Introduction – Data Centers

9 Motivation – Electricity Usage (source: EPA Report to Congress on Server and Data Center Energy Efficiency, 2007)

10 Motivation – Energy Projections (source: EPA Report to Congress on Server and Data Center Energy Efficiency, 2007)

11 Motivation – Design Issues: Energy Efficiency, Performance, Reliability & Security

12 Outline
  Introduction & Motivation
  General Architecture for High-Performance Computing Platforms
  Energy-Efficient Scheduling for Clusters
  Energy-Efficient Scheduling for Grids
  Energy-Efficient Storage Systems
  Conclusions

13 Architecture – Multiple Layers

14 Energy Efficient Devices

15 Multiple Design Goals
  High-Performance Computing Platforms: Performance, Energy Efficiency, Reliability, Security

16 Outline
  Introduction & Motivation
  General Architecture for High-Performance Computing Platforms
  Energy-Efficient Scheduling for Clusters
  Energy-Efficient Scheduling for Grids
  Energy-Efficient Storage Systems
  Conclusions

17 Energy-Aware Scheduling for Clusters

18 Parallel Applications

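19 Motivational Example: an example of duplication
  [Figure: four-task DAG. Execution times: T1 = 8, T2 = 10, T3 = 15, T4 = 6; communication times on the edges: 6, 5, 2, 4]
  Linear Schedule: T1 0-8, T3 8-23, T2 23-33, T4 33-39. Time: 39s
  No Duplication Schedule (NDS): T1 0-8, T3 8-23, T4 26-32 on one processor; T2 14-24 on the other (communication delays of 6 and 2). Time: 32s
  Task Duplication Schedule (TDS): T1 0-8, T3 8-23, T4 23-29 on one processor; T1 0-8, T2 8-18 on the other (communication delay of 2). Time: 29s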

20 Motivational Example (cont.): an example of duplication
  [Figure: the same DAG with (time, energy) labels. Tasks: T1 (8,48), T2 (10,60), T3 (15,90), T4 (6,36); edges: (6,6), (5,5), (2,2), (4,4)]
  CPU_Energy = 6W, Network_Energy = 1W
  Linear Schedule: Time: 39s, Energy: 234J
  No Duplication Schedule (MCP): Time: 32s, Energy: 242J
  Task Duplication Schedule (TDS): Time: 29s, Energy: 284J
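The three energy figures on this slide can be reproduced with a small sketch. It assumes a simple model that the slide's numbers are consistent with but that the transcript does not state explicitly: every task instance (including duplicates) costs execution time times CPU power, a message costs communication time times network power only when the consumer's processor holds no copy of the producer, and idle energy is ignored. The function name `schedule_energy` and the processor labels are hypothetical; the assignment of the communication times 5 and 4 to particular edges is inferred from the figure layout and does not affect the totals.

```python
CPU_POWER_W = 6       # CPU_Energy on the slide
NETWORK_POWER_W = 1   # Network_Energy on the slide

EXEC_TIME_S = {"T1": 8, "T2": 10, "T3": 15, "T4": 6}
# Edge -> communication time in seconds (edge assignment partly inferred).
COMM_TIME_S = {("T1", "T2"): 6, ("T1", "T3"): 5, ("T2", "T4"): 2, ("T3", "T4"): 4}


def schedule_energy(placement):
    """Energy in joules for a placement {processor: [tasks]}; duplicates allowed."""
    cpu = CPU_POWER_W * sum(
        EXEC_TIME_S[task] for tasks in placement.values() for task in tasks
    )
    net = 0
    for (src, dst), seconds in COMM_TIME_S.items():
        for tasks in placement.values():
            # A message is needed only if dst runs where no copy of src ran.
            if dst in tasks and src not in tasks:
                net += NETWORK_POWER_W * seconds
    return cpu + net


linear = {"P1": ["T1", "T3", "T2", "T4"]}
mcp = {"P1": ["T1", "T3", "T4"], "P2": ["T2"]}
tds = {"P1": ["T1", "T3", "T4"], "P2": ["T1", "T2"]}

for name, placement in [("Linear", linear), ("MCP", mcp), ("TDS", tds)]:
    print(name, schedule_energy(placement), "J")
```

Under these assumptions the duplicate of T1 adds 48J of CPU energy while removing the 6J message from T1 to T2, which is why TDS is faster but more expensive than MCP.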

21 Motivational Example (cont.)
  The energy cost of duplicating T1: CPU side: 48J; Network side: -6J; Total: 42J
  The performance benefit of duplicating T1: 6s
  Energy-performance tradeoff: 42/6 = 7
  EAD schedule: Time: 32s, Energy: 242J
  PEBD schedule: Time: 29s, Energy: 284J
  If Threshold = 10, duplicate T1? EAD: No. PEBD: Yes.
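The two decisions on this slide can be sketched as a pair of predicates. This is a minimal illustration using the slide's numbers, not the authors' implementation; the function names `ead_should_duplicate` and `pebd_should_duplicate` are hypothetical. Note that the same threshold value of 10 is compared against joules in EAD and against joules per second in PEBD.

```python
def ead_should_duplicate(energy_increase_j, threshold):
    """EAD: duplicate only if the extra energy itself stays within the threshold."""
    return energy_increase_j <= threshold


def pebd_should_duplicate(energy_increase_j, time_decrease_s, threshold):
    """PEBD: duplicate if the energy paid per second saved stays within the threshold."""
    if time_decrease_s <= 0:
        return False  # duplication that saves no time is never worthwhile
    return energy_increase_j / time_decrease_s <= threshold


cpu_cost_j = 48        # extra CPU energy for the second copy of T1
network_saving_j = 6   # message from T1 to T2 is no longer sent
energy_increase_j = cpu_cost_j - network_saving_j   # 42 J
time_decrease_s = 6    # performance benefit stated on the slide
threshold = 10

print("EAD duplicates T1: ", ead_should_duplicate(energy_increase_j, threshold))
print("PEBD duplicates T1:", pebd_should_duplicate(energy_increase_j, time_decrease_s, threshold))
```

With these inputs EAD rejects the duplication (42 > 10) while PEBD accepts it (42/6 = 7 <= 10), matching the slide.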

22 Basic Steps of Energy-Aware Scheduling
  Step 1: DAG Generation
  Algorithm implementation:
  Task description: Task Set {T1, T2, …, T9, T10}
  T1 is the entry task; T10 is the exit task.
  T2, T3 and T4 cannot start until T1 has finished.
  T5 and T6 cannot start until T2 has finished.
  T7 cannot start until both T3 and T4 have finished.
  T8 cannot start until both T5 and T6 have finished.
  T9 cannot start until both T6 and T7 have finished.
  T10 cannot start until both T8 and T9 have finished.

23 Basic Steps of Energy-Aware Scheduling
  Step 2: Parameters Calculation
  Algorithm implementation:
  Task  Level  EST  ECT  LAST  LACT  FP
  1     40     0    3    0     3     --
  2     28     3    6    4     7     1
  3     37     3    7    3     7     1
  4     35     3    5    3     5     1
  5     16     6    7          17    2
  6     25     6    16   7     17    2
  7     33     7    27   7           3
  8     15     16   23   18    25    6
  9     13     27   32   27    32    7
  10    8      32   40   32    40    9
  Level: total execution time from the current task to the exit task
  EST: Earliest Start Time
  ECT: Earliest Completion Time
  LAST: Latest Allowable Start Time
  LACT: Latest Allowable Completion Time
  FP: Favorite Predecessor
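The Level column can be recomputed from the task dependencies in Step 1. The sketch below assumes each task's execution time equals ECT minus EST from the table (for example, T7 runs for 27 - 7 = 20), and that Level counts execution time only, with no communication cost; both assumptions reproduce the table's values but are not stated in the transcript. The names `EXEC_TIME`, `SUCCESSORS`, and `level` are hypothetical.

```python
from functools import lru_cache

# Execution times derived as ECT - EST from the table (an assumption).
EXEC_TIME = {1: 3, 2: 3, 3: 4, 4: 2, 5: 1, 6: 10, 7: 20, 8: 7, 9: 5, 10: 8}

# Successor lists taken from the task description in Step 1.
SUCCESSORS = {
    1: [2, 3, 4], 2: [5, 6], 3: [7], 4: [7], 5: [8],
    6: [8, 9], 7: [9], 8: [10], 9: [10], 10: [],
}


@lru_cache(maxsize=None)
def level(task):
    """Longest execution-time path from this task to the exit task, inclusive."""
    longest_tail = max((level(s) for s in SUCCESSORS[task]), default=0)
    return EXEC_TIME[task] + longest_tail


levels = {task: level(task) for task in EXEC_TIME}
print(levels)
```

The entry task's level (40) equals the exit task's ECT, which is the length of the critical path T1, T3, T7, T9, T10.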

24 Basic Steps of Energy-Aware Scheduling
  Step 3: Scheduling
  Algorithm implementation: uses the same parameter table as Step 2 (Task, Level, EST, ECT, LAST, LACT, FP)
  Original Task List: {10, 9, 8, 5, 6, 2, 7, 4, 3, 1}
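The original task list is consistent with sorting the tasks by ascending Level. A short sketch, with the Level values copied from the parameter table (the variable names are hypothetical):

```python
# Level of each task, copied from the parameter table.
LEVEL = {1: 40, 2: 28, 3: 37, 4: 35, 5: 16, 6: 25, 7: 33, 8: 15, 9: 13, 10: 8}

# Scheduling queue: lowest level first, so the exit task comes first.
task_list = sorted(LEVEL, key=LEVEL.get)
print(task_list)
```

Because the exit task has the lowest level, scheduling starts from T10 and walks back toward the entry task.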

25 Basic Steps of Energy-Aware Scheduling
  Step 4: Duplication Decision
  Algorithm implementation:
  Original Task List: {10, 9, 8, 5, 6, 2, 7, 4, 3, 1}
  Decision 1: Duplicate T1?
  Decision 2: Duplicate T2? Duplicate T1?
  Decision 3: Duplicate T1?

26 The EAD and PEBD Algorithms
  1. Generate the DAG of the given task set.
  2. Find all the critical paths in the DAG.
  3. Generate the scheduling queue based on level (ascending).
  4. Select the task with the lowest level that has not been scheduled yet as the starting task.
  5. For each task on the same critical path as the starting task, check whether it is already scheduled.
     No: allocate it to the same processor as the other tasks on that critical path.
     Yes: check whether duplicating this task would save time. The walk ends when the entry task is met.
  6. If duplicating the task saves time:
     EAD: calculate the energy increase. If more_energy <= Threshold, duplicate this task and select the next task on the same critical path.
     PEBD: calculate the energy increase and the time decrease. Ratio = energy increase / time decrease. If Ratio <= Threshold, duplicate this task and select the next task on the same critical path.

27 Energy Dissipation in Processors (source: http://www.xbitlabs.com)

28 Parallel Scientific Applications: Fast Fourier Transform, Gaussian Elimination

29 Large-Scale Parallel Applications: Robot Control, Sparse Matrix Solver (source: http://www.kasahara.elec.waseda.ac.jp/schedule/)

30 Impact of CPU Power Dissipation
  [Figure: Energy consumption for different processors (Gaussian, CCR=0.4)]
  [Figure: Energy consumption for different processors (FFT, CCR=0.4)]
  Values annotated on the charts: 19.4%, 3.7%
  CPU Type  Power (busy)  Power (idle)  Gap
            104W          15W           89W
            75W           14W           61W
            47W           11W           36W
            44W           26W           18W
  Observation: CPUs with a large gap between CPU_busy and CPU_idle can obtain greater energy savings.
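One way to read the observation is that every second a processor idles instead of running a duplicated task saves the difference between its busy and idle power. The sketch below illustrates this with the four busy/idle pairs from the table; the model and the 10-second figure are assumptions made for illustration, not measurements from the talk, and `energy_saved_j` is a hypothetical helper.

```python
# (busy W, idle W) pairs from the table; CPU names are not in the transcript.
CPUS = [(104, 15), (75, 14), (47, 11), (44, 26)]


def energy_saved_j(busy_w, idle_w, avoided_busy_s):
    """Energy saved when a processor idles rather than computes for avoided_busy_s seconds."""
    return (busy_w - idle_w) * avoided_busy_s


AVOIDED_BUSY_S = 10  # hypothetical amount of duplicated work that is skipped

savings = [energy_saved_j(busy, idle, AVOIDED_BUSY_S) for busy, idle in CPUS]
for (busy, idle), saved in zip(CPUS, savings):
    print(f"busy={busy}W idle={idle}W gap={busy - idle}W saved={saved}J")
```

Under this model the saving scales linearly with the gap, so the 89W-gap processor benefits about five times more than the 18W-gap one for the same avoided work.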

31 Impact of Interconnect Power Dissipation
  [Figure: Energy consumption (Robot Control, Myrinet)]
  [Figure: Energy consumption (Robot Control, Infiniband)]
  Values annotated on the charts: 16.7%, 5%, 13.3%, 3.1%
  Interconnection  Power
  Myrinet          33.6W
  Infiniband       65W
  Observation: The energy savings of EAD and PEBD are degraded if the interconnection has a high power consumption rate.

32 Parallelism Degrees
  [Figure: Energy consumption of Robot Control (Myrinet)]
  [Figure: Energy consumption of Sparse Matrix (Myrinet)]
  Values annotated on the charts: 17%, 15.8%, 6.9%, 5.4%
  Application           Parallelism
  Robot Control         4.363796
  Sparse Matrix Solver  15.868853
  Observation: Robot Control has more task dependencies, so there are more opportunities for EAD and PEBD to conserve energy by judiciously duplicating tasks.

33 Communication-Computation Ratio
  [Figure: Energy consumption under different CCRs]
  Processor type: Athlon 3800+ 35W
  Interconnection: Myrinet
  Simulated application: Robot Control
  CCR: (0.1, 0.5, 1, 5, 10)
  Observations:
  The overall energy consumption of EAD and PEBD is less than that of MCP and TDS.
  EAD and PEBD are very sensitive to CCR.
  MCP provides the greatest energy savings if CCR is less than 1.
  MCP consumes much more energy when CCR is large.
  CCR: Communication-Computation Ratio

34 Performance
  [Figure: Schedule length of Gaussian Elimination]
  [Figure: Schedule length of Sparse Matrix Solver]
  Application           EAD Performance Degradation (vs. TDS)  PEBD Performance Degradation (vs. TDS)
  Gaussian Elimination  5.7%                                   2.2%
  Sparse Matrix Solver  2.92%                                  2.02%
  Observation: It is worth trading a marginal degradation in schedule length for significant energy savings in cluster systems.

35 Heterogeneous Clusters - Motivational Example

36 Motivational Example (cont.)
  Energy calculation for tentative schedule: C1, C2, C3, C4

37 Experimental Settings: Simulation Environments
  Parameters: Value (Fixed) - (Varied)
  Different trees to be examined: Gaussian Elimination, Fast Fourier Transform
  Execution time of Gaussian Elimination: {5, 4, 1, 1, 1, 1, 10, 2, 3, 3, 3, 7, 8, 6, 6, 20, 30, 30} - (random)
  Execution time of Fast Fourier Transform: {15, 10, 10, 8, 8, 1, 1, 20, 20, 40, 40, 5, 5, 3, 3} - (random)
  Computing node type:
    Type 1: AMD Athlon 64 X2 4600+ with 85W TDP
    Type 2: AMD Athlon 64 X2 4600+ with 65W TDP
    Type 3: AMD Athlon 64 X2 3800+ with 35W TDP
    Type 4: Intel Core 2 Duo E6300 processor
  CCR set: between 0.1 and 10
  Computing node heterogeneity (number of nodes of each type):
    Environment  Type 1  Type 2  Type 3  Type 4
    1            4       4       4       4
    2            6       2       2       6
    3            5       3       3       5
    4            7       1       1       7
  Network energy consumption rate: 20W, 33.6W, 60W

38 Communication-Computation Ratio
  [Figure: CCR sensitivity for Gaussian Elimination]

39 Heterogeneity
  [Figure: Computational nodes heterogeneity experiments]
  CPU Type  E1  E2  E3  E4
  Type 1    4   6   5   7
  Type 2    4   2   3   1
  Type 3    4   2   3   1
  Type 4    4   6   5   7
  Observation: CPUs with a large gap between CPU_busy and CPU_idle can obtain greater energy savings.

40 Conclusions
  Architecture for high-performance computing platforms
  Energy-Efficient Scheduling for Clusters
  Energy-Efficient Scheduling for Heterogeneous Systems
  How to measure energy consumption? Kill-A-Watt

41 http://www.auburn.edu/~xzq0001

42 Questions http://www.eng.auburn.edu/~xqin

