Dynamic Scheduling Monte-Carlo Framework for Multi-Accelerator Heterogeneous Clusters Authors: Anson H.T. Tse, David B. Thomas, K.H. Tsoi, Wayne Luk Source:


1 Dynamic Scheduling Monte-Carlo Framework for Multi-Accelerator Heterogeneous Clusters
Authors: Anson H.T. Tse, David B. Thomas, K.H. Tsoi, Wayne Luk
Source: Field-Programmable Technology (FPT), pp. 233-240, Dec. 2010
Presenter: Ming-Chih Li, ESL, Dept. of CSIE, CCU
2011/09/16

2 Outline
- Introduction
- Heterogeneous Framework
- Scheduling Policies
- Applications
- Performance Evaluation
- Conclusions

4 Introduction
- To increase the raw computation capacity of a system
  - Computational power
  - Number of processing units
- High Performance Computing (HPC) systems
  - Co-processing accelerators
    - FPGA, GPU
  - Distributed computing
    - Several nodes in a cluster

5 Introduction (cont’)
- Challenges of design
  - Hardware accelerators are customized for specific computation and communication patterns
  - High non-recurring engineering cost
  - Communication overhead

6 Introduction (cont’)
- The research focuses on Monte-Carlo (MC) simulation problems
- Contributions
  - A scalable distributed Monte-Carlo framework for multi-accelerator heterogeneous clusters
  - Load balancing schemes
    - Dynamic runtime scheduling
  - Mapped to two applications

7 Introduction (cont’)
- What is a Monte-Carlo simulation problem?
  - A class of computational algorithms that rely on repeated random sampling to compute their results
  - Financial applications in banks
- Example: calculating the value of PI
  - Random points (x, y) with x, y in [0, 1]
  - Area of square: 1 * 1 = 1
  - (# of in-circle points / total points) * area of square ≈ area of circle
  - Area of circle = PI * r^2
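
The PI example can be sketched in a few lines of Python. This is only an illustration, not the presenters' code: it assumes the circle in the lost figure is the quarter disc of radius 1 centred at the origin, so the in-circle fraction approximates PI / 4. The function name `estimate_pi` and the `seed` parameter are made up for this sketch.

```python
import random


def estimate_pi(num_samples, seed=0):
    """Estimate PI by sampling points uniformly in the unit square."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        # Assumption: "in circle" means inside the quarter disc of radius 1.
        if x * x + y * y <= 1.0:
            inside += 1
    # inside / num_samples approximates the quarter-disc area PI / 4
    # (the area of the square is 1).
    return 4.0 * inside / num_samples


print(estimate_pi(1_000_000))
```

Each sample is independent of every other, which is why this kind of workload can be cut into tasks of arbitrary size and handed to different workers.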

9 Heterogeneous Framework
- Three major concerns
  - Application programmer productivity
    - No new languages and tool chains
  - Scalability of approach
    - Hierarchical model
  - Resource utilization efficiency
    - Extensible dynamic scheduling policies
    - Based on computational performance or energy consumption

10 Heterogeneous Framework (cont’)

11 Heterogeneous Framework (cont’)

12 Heterogeneous Framework (cont’)

14 Scheduling Policies
- The computational performance differs between nodes and between accelerators of the same node
- Improper task distribution -> drastic performance reduction
- For example:
  - Computing rate
    - FPGA = 1000 tasks/s
    - CPU = 1 task/s
  - [Figure: one node with an MC distributor, an FPGA and a CPU; labels 2000, 1000, 1, 1]
  - Total time: 1000s

15 Scheduling Policies (cont’)
- The computational performance differs between nodes and between accelerators of the same node
- Improper task distribution -> drastic performance reduction
- For example:
  - Computing rate
    - FPGA = 1000 tasks/s
    - CPU = 1 task/s
  - [Figure: one node with an MC distributor, an FPGA and a CPU; labels 2000, 1, 1, 1998, 2]
  - Total time: 2s
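
The two totals can be checked with a short calculation. The sketch below assumes the 2000 tasks are split 1000/1000 in the first case and 1998/2 in the second, which is the reading consistent with the stated totals; the figures themselves are not in the transcript. The helper names `makespan` and `total_time` are hypothetical.

```python
# Computing rates in tasks per second, as given on the slides.
RATES = {"FPGA": 1000.0, "CPU": 1.0}


def makespan(tasks_per_worker, rate_per_worker):
    """Time until the slowest worker finishes when all workers run in parallel."""
    return max(tasks / rate for tasks, rate in zip(tasks_per_worker, rate_per_worker))


def total_time(split):
    """Total time for a given {worker: task count} split (assumed splits only)."""
    workers = list(RATES)
    return makespan([split[w] for w in workers], [RATES[w] for w in workers])


even = total_time({"FPGA": 1000, "CPU": 1000})          # CPU needs 1000 s
proportional = total_time({"FPGA": 1998, "CPU": 2})     # both need about 2 s
print(even, proportional)
```

The total is set by the slowest worker, so the even split is dominated by the CPU while the rate-proportional split lets both devices finish at about the same time.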

16 Scheduling Policies (cont’)
- One static and two dynamic scheduling policies are proposed
  A. Constant-Size policy
  B. Linear-Incremental policy
  C. Exponential-Incremental policy
- Definitions:
  - $TS_{init}$: initial task size for all child processes
  - $TS_i^j$: task size for child i at the jth time of simulation
  - $R_d$: remaining uncompleted task size

17 Scheduling Policies (cont’)
A. Constant-Size policy
- For example:
  - If total simulation task size = 120 and $TS_{init}$ = 50
  - Then
    $TS_i^1 = 50$, $R_d = 70$
    $TS_i^2 = 50$, $R_d = 20$
    $TS_i^3 = 20$, $R_d = 0$

18 Scheduling Policies (cont’)
B. Linear-Incremental policy
- For example:
  - If total simulation task size = 120, $TS_{init}$ = 50, and c = 5
  - Then
    $TS_i^1 = 50$, $R_d = 70$
    $TS_i^2 = 55$, $R_d = 15$
    $TS_i^3 = 15$, $R_d = 0$

19 Scheduling Policies (cont’)
C. Exponential-Incremental policy
- For example:
  - If total simulation task size = 500, $TS_{init}$ = 50, and m = 2
  - Then
    $TS_i^1 = 50$, $R_d = 450$
    $TS_i^2 = 100$, $R_d = 350$
    $TS_i^3 = 200$, $R_d = 150$
    $TS_i^4 = 150$, $R_d = 0$
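
The three worked examples above can be reproduced with one small function. The policy formulas did not survive in the transcript, so the update rules here are inferred from the examples: the size stays fixed (Constant-Size), grows by c per request (Linear-Incremental), or is multiplied by m per request (Exponential-Incremental), and the last task is capped by the remaining work. The function name `schedule` and its parameters are hypothetical.

```python
def schedule(total, ts_init, policy="constant", c=0, m=1):
    """Return [(task size, remaining work)] for a single child process.

    Update rules are inferred from the slide examples, not taken from the paper.
    """
    if policy not in ("constant", "linear", "exponential"):
        raise ValueError(f"unknown policy: {policy}")
    if ts_init <= 0:
        raise ValueError("ts_init must be positive")
    steps = []
    remaining, size = total, ts_init
    while remaining > 0:
        granted = min(size, remaining)   # never hand out more than is left
        remaining -= granted
        steps.append((granted, remaining))
        if policy == "linear":
            size += c                    # grow by a fixed increment
        elif policy == "exponential":
            size *= m                    # grow by a fixed factor
    return steps


print(schedule(120, 50))                       # Constant-Size example
print(schedule(120, 50, "linear", c=5))        # Linear-Incremental example
print(schedule(500, 50, "exponential", m=2))   # Exponential-Incremental example
```

The sketch tracks one child only, as the slide examples do; in a real run several children would draw from the same remaining pool.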

20 Scheduling Policies (cont’)
- Other possible policies
  - Mixed scheduling policy
    - Use the Linear-Incremental policy at the beginning, then switch to Constant-Size after a certain number of iterations
  - Energy-Equal scheduling policy
    - Each MC worker consumes the same amount of computational energy
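
A minimal sketch of the mixed policy, under two assumptions the slide does not state: the switch happens after a fixed number of requests (`switch_after`), and the constant size used afterwards is whatever size the linear phase had reached. The function name and all parameters are hypothetical.

```python
def mixed_schedule(total, ts_init, c, switch_after):
    """Task sizes for one child: linear growth first, then constant size.

    Assumption: the size grows by c for the first `switch_after` requests
    and is then frozen at the last value reached.
    """
    if ts_init <= 0:
        raise ValueError("ts_init must be positive")
    sizes = []
    remaining, size = total, ts_init
    requests = 0
    while remaining > 0:
        granted = min(size, remaining)   # cap the final task at what is left
        remaining -= granted
        sizes.append(granted)
        requests += 1
        if requests < switch_after:
            size += c                    # still in the Linear-Incremental phase
    return sizes


print(mixed_schedule(400, 50, 5, 3))
```

With these made-up inputs the sizes grow 50, 55, 60 and then stay at 60 until the remaining work runs out.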

22 Applications
- The authors implemented two applications in the proposed framework:
  - Asian option pricing using the control variate method
  - GARCH asset simulation
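
To give a feel for the second workload, here is a rough sketch of a GARCH-style path simulation. The slides do not say which GARCH variant or parameters the authors used, so this assumes a plain GARCH(1,1) model with log-return shocks; every name and parameter below (`simulate_garch_path`, `omega`, `alpha`, `beta`, `s0`) is illustrative only.

```python
import math
import random


def simulate_garch_path(steps, omega, alpha, beta, s0=100.0, seed=0):
    """Simulate one price path under an assumed GARCH(1,1) variance model."""
    if alpha + beta >= 1.0:
        raise ValueError("alpha + beta must be < 1 for a finite long-run variance")
    rng = random.Random(seed)
    variance = omega / (1.0 - alpha - beta)   # start at the long-run variance
    price = s0
    prices = [price]
    for _ in range(steps):
        shock = math.sqrt(variance) * rng.gauss(0.0, 1.0)
        price *= math.exp(shock)              # treat the shock as a log-return
        prices.append(price)
        # GARCH(1,1) recursion: next variance from the last shock and variance.
        variance = omega + alpha * shock * shock + beta * variance
    return prices


def mean_terminal_price(num_paths, **model):
    """Monte-Carlo average of the final price over independent paths."""
    total = 0.0
    for path in range(num_paths):
        total += simulate_garch_path(seed=path, **model)[-1]
    return total / num_paths


print(mean_terminal_price(1000, steps=50, omega=1e-5, alpha=0.1, beta=0.85))
```

Paths are independent of one another, so a distributor can hand out any number of paths as one task, which is what makes the task-size policies applicable.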

23 Applications (cont’)
- FPGA kernel
  - The Constant-Size scheduling policy is the best choice, as all MC cores finish the computation in exactly the same cycle

24 Applications (cont’)
- The number of pipeline stages must be identical for all the pipelined loops in order to guarantee a consistent computation schedule

25 Applications (cont’)
- Xilinx Virtex-5 xc5vlx330t FPGA

26 Applications (cont’)
- GPU kernel
  - Single Instruction Multiple Data (SIMD) computing devices
  - Design CUDA kernels
- CPU kernel
  - C language
  - Intel Math Kernel Library (MKL)
  - Compiled with Intel compiler (icc) 11.1 with -O3
  - OpenMP
    - parallel for #pragma

28 Performance Evaluation
- An accelerator cluster
  - Consists of 8 server nodes
    - two AMD Phenom 9650 Quad-Core 2.3GHz CPUs
    - one nVidia Tesla C1060 GPU
    - one Xilinx Virtex-5 xc5vlx330t FPGA

29 Performance Evaluation (cont’)
- Dynamic scheduling analysis of a single node
  - The number of Monte-Carlo simulations is 10,000,000
  - Using the Linear-Incremental policy with $TS_{init}$ = 1000

30 Performance Evaluation (cont’)
- Dynamic scheduling analysis of a single node

31 Performance Evaluation (cont’)
- Performance, energy and efficiency analysis of accelerator allocation of a cluster
  - Acceleration performance versus energy consumption
  - Power monitor
  - Additional Power Consumption for Computation (APCC)
    - APCC = Run-time Power - Static Power
  - Additional Energy Consumption for Computation (AECC)
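
The two metrics can be written out as a small calculation. The slide gives the APCC formula but no formula for AECC; this sketch assumes AECC is APCC multiplied by the execution time, which holds when the additional power is constant over the run. The function names and the sample numbers are made up for illustration and are not measurements from the paper.

```python
def apcc(run_time_power_w, static_power_w):
    """Additional Power Consumption for Computation, in watts (slide formula)."""
    return run_time_power_w - static_power_w


def aecc(run_time_power_w, static_power_w, execution_time_s):
    """Additional Energy Consumption for Computation, in joules.

    Assumption: AECC = APCC * execution time (constant additional power).
    """
    return apcc(run_time_power_w, static_power_w) * execution_time_s


# Hypothetical readings: 300 W while computing, 200 W idle, 10 s run.
print(apcc(300, 200), aecc(300, 200, 10))
```

Subtracting the static power isolates the cost of the computation itself, so a fast device with high power draw can still come out ahead on energy if it finishes soon enough.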

32 Performance Evaluation (cont’)
- The number of Monte-Carlo simulations is 100,000,000
- Using the Linear-Incremental policy with $TS_{init}$ = 1000
- The Constant-Size scheduling policy is employed at the higher-level MC distributor with $TS_{init}$ = 100M, 50M, 25M, 12.5M for a cluster with 1, 2, 4, 8 nodes, respectively

33 Performance Evaluation (cont’)

34 Performance Evaluation (cont’)

36 Conclusions
- Proposed a dynamic scheduling Monte-Carlo framework for collaborative computation in a multi-accelerator heterogeneous cluster
- The load balancing process is automated by employing dynamic scheduling policies in the proposed framework
- The framework is scalable and extensible to a variety of dynamic scheduling policies
- The authors showed that the proposed framework is viable by mapping two applications involving financial computation

37 Conclusions
- Future work
  - Automation of design development in this framework
  - Applications involving data dependency will be tested
  - The authors also intend to collaborate with other institutes to form a “cluster of heterogeneous clusters” for solving practical scientific problems

38 Thanks for your attention!

