
1 Predictive Runtime Code Scheduling for Heterogeneous Architectures

2 Outline
 Introduction
 Scheduling Algorithm
 Experiment method
 Experiment results
 Conclusions
 Future work


4 Heterogeneous systems
 Currently almost every desktop system is a heterogeneous system.
 Such systems combine a CPU and a GPU, two processing elements (PEs) with different characteristics but undeniable amounts of processing power.
 The GPU is often used in a restricted way, for domain-specific applications such as scientific computing and games.

5 Current trend in programming heterogeneous systems
1. Profile the application to be ported to a GPU, and detect the parts that are most expensive in execution time and most amenable to the GPU's style of computing.
2. Port those code fragments to CUDA kernels (or any other framework for general-purpose programming on GPUs).
3. Iteratively optimize the kernels until the desired performance is achieved.
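Step 1 can be sketched in Python: the deck does not prescribe a profiling tool, so cProfile stands in here for whatever profiler fits the real application, and matmul_naive is a made-up hotspot of the kind one would port to a GPU kernel.

```python
# Illustrative sketch of step 1: profile the host application to find the
# hottest functions before porting them to GPU kernels. Tool choice and
# function names are assumptions, not the authors' setup.
import cProfile
import io
import pstats

def matmul_naive(a, b):
    """Toy O(n^3) square-matrix multiply: the kind of hotspot worth porting."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def run_app():
    a = [[1.0] * 32 for _ in range(32)]
    matmul_naive(a, a)

profiler = cProfile.Profile()
profiler.enable()
run_app()
profiler.disable()

# Rank functions by cumulative time; matmul_naive dominates the report.
stats = pstats.Stats(profiler, stream=io.StringIO()).sort_stats("cumulative")
stats.print_stats(5)
```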

6 Objective
 Exploring and understanding the effect of different scheduling algorithms for heterogeneous architectures.
 Fully exploiting the computing power available in current CPU/GPU-like heterogeneous systems.
 Increasing overall system performance.


8 Heterogeneous scheduling process
 PE selection: the process of deciding on which PE a new task should be executed.
 Task selection: the mechanism that chooses which task is executed next on a given PE.

9 PE selection: scheduling algorithms
1. First-Free algorithm family
2. First-Free Round Robin (k)
3. Performance history-based scheduler

10 Term explanation
 pe : a processing element
 PElist : the set of all processing elements
 k[pe] : the number of tasks given to pe
 history[pe, f] : the recorded performance for every pair of PE and task
 allowedPE : the set of PEs whose load is not too unbalanced

11 Big unbalance
Example queue lengths: PE1(3), PE2(10), PE3(5)
allowedPE = { pe │ ∄ pe' : history[pe, f] / history[pe', f] > maxUnbalance }
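The allowedPE set (the PEs whose predicted performance for a task is not too unbalanced against any other PE) might be computed as follows; this is a sketch, not the paper's code, and the threshold name and value are illustrative.

```python
# A PE is allowed unless its recorded time for task f is more than
# MAX_UNBALANCE times worse than some other PE's time for f.
# MAX_UNBALANCE is an assumed name/value; the deck only calls it
# "a big unbalance" without giving the constant.
MAX_UNBALANCE = 5.0

def allowed_pes(pe_list, history, f):
    """Return the PEs whose history[pe, f] is not too unbalanced vs. any other PE."""
    return [pe for pe in pe_list
            if not any(history[pe, f] / history[other, f] > MAX_UNBALANCE
                       for other in pe_list if other != pe)]

# Example: the GPU runs "matmul" 10x faster than the CPU, so with
# MAX_UNBALANCE = 5 the CPU is excluded for that task.
history = {("cpu", "matmul"): 10.0, ("gpu", "matmul"): 1.0}
print(allowed_pes(["cpu", "gpu"], history, "matmul"))  # -> ['gpu']
```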

12 Term explanation (cont.)
 g(PElist) looks for the first idle PE in that set; if there is no such PE, it selects the GPU as the target.
 h(allowedPE) estimates the waiting time in each queue and schedules the task to the queue with the smallest waiting time (if both queues are empty, it chooses the GPU). For that purpose the scheduler uses the performance history for every (task, PE) pair to predict how long it will take until all the tasks in a queue complete their execution.
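The two helpers above can be sketched as follows, assuming one pending-task queue per PE; the data structures and names are my own illustration of the described behavior, not the authors' implementation.

```python
def g(pe_list, queues, gpu="gpu"):
    """First idle PE in pe_list; if no PE is idle, fall back to the GPU."""
    for pe in pe_list:
        if not queues[pe]:  # empty queue => idle PE
            return pe
    return gpu

def h(allowed, queues, history, gpu="gpu"):
    """PE whose queue has the smallest predicted waiting time.

    history[pe, task] is the recorded execution time of task on pe, so the
    waiting time of a queue is the sum over its pending tasks.
    """
    if all(not queues[pe] for pe in allowed):
        return gpu  # all queues empty: prefer the GPU
    return min(allowed, key=lambda pe: sum(history[pe, t] for t in queues[pe]))

# Example: the GPU queue (2 x 1.0s) finishes before the CPU queue (4.0s).
queues = {"cpu": ["ftdock"], "gpu": ["matmul", "matmul"]}
history = {("cpu", "ftdock"): 4.0, ("gpu", "matmul"): 1.0}
print(h(["cpu", "gpu"], queues, history))  # -> 'gpu'
```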

13 First-Free algorithm family

14 First-Free Round Robin (k)
 Parameter k = (k1, …, kn)
 Example: k = (1, 4), i.e. one task is assigned to the first PE for every four assigned to the second.
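A hedged sketch of the round-robin part of FFRR(k): when no PE is free, tasks are dealt out cyclically with PE i receiving k[i] consecutive tasks per round. The implementation is my own reading of the slide's k = (1, 4) example, not the paper's code.

```python
import itertools

def ffrr_order(pe_list, k):
    """Infinite weighted round-robin over pe_list: k[i] slots for PE i per round."""
    one_round = [pe for pe, ki in zip(pe_list, k) for _ in range(ki)]
    return itertools.cycle(one_round)

# With k = (1, 4), the CPU gets 1 of every 5 tasks and the GPU gets 4.
order = ffrr_order(["cpu", "gpu"], k=(1, 4))
print([next(order) for _ in range(10)])
# -> ['cpu', 'gpu', 'gpu', 'gpu', 'gpu', 'cpu', 'gpu', 'gpu', 'gpu', 'gpu']
```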

15 Performance History Scheduling

16 Task selection
 First-come, first-served (FCFS).
 It would also be possible to implement more advanced techniques, such as work stealing, to improve load balance across the different PEs.
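The FCFS policy above could be extended with work stealing as hinted: an idle PE takes a task from the tail of another PE's queue. A minimal sketch with illustrative names; the deck only mentions this as a possible extension, so the details are assumptions.

```python
from collections import deque

def next_task(pe, queues):
    """FCFS from pe's own queue; if it is empty, steal from the longest queue."""
    if queues[pe]:
        return queues[pe].popleft()      # first-come, first-served
    victim = max(queues, key=lambda p: len(queues[p]))
    if queues[victim]:
        return queues[victim].pop()      # steal from the tail of the busiest PE
    return None                          # nothing to run anywhere

# Example: the CPU is idle while the GPU has a backlog, so the CPU steals.
queues = {"cpu": deque(), "gpu": deque(["sad", "cp", "matmul"])}
print(next_task("cpu", queues))  # -> 'matmul' (stolen from the GPU's tail)
```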


18 Benchmarks
 matmul - performs multiple square-matrix multiplications
 ftdock - computes the interaction between two molecules (docking)
 cp - computes the Coulombic potential at each grid point over one plane in a 3D grid in which point charges have been randomly distributed
 sad - used in MPEG video encoders to perform a sum of absolute differences between frames
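For illustration, the core operation of the sad benchmark, a sum of absolute differences between two frames, reduces to a one-liner (frames here are flat lists of pixel values, a simplification of the real MPEG encoder kernel):

```python
def sad(frame_a, frame_b):
    """Sum of absolute differences between two equally sized frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b))

print(sad([10, 20, 30], [12, 18, 30]))  # -> 4
```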

19 Performance

20 Experiment setup
 A machine with an Intel Core 2 E6600 processor running at 2.40 GHz and 2 GB of RAM has been used.
 The GPU is an NVIDIA 8600 GTS with 512 MB of memory.
 The operating system is Red Hat Enterprise Linux 5.


22 CPU vs. GPU performance

23 Algorithms’ performance

24 Algorithms’ performance (cont.)

25 Benchmarks run on the heterogeneous system

26 Effect of the number of tasks on the scheduler


28 Conclusions
 CPU/GPU-like systems consistently achieve speedups ranging from 30% to 40% compared to using just the GPU in single-application mode.
 Performance-predicting algorithms, which better balance the system load, perform consistently better.


30 Future work
 We intend to study new algorithms in order to further improve overall system performance.
 Other benchmarks with different characteristics will also be tried. We expect that with a more realistic set of benchmarks (not only GPU-biased ones) the benefits of our system would increase.

31 Future work (cont.)
 Use and extend techniques such as clustering, code versioning, and runtime program-phase adaptation to improve the utilization and adaptation of all available resources in future heterogeneous computing systems.

32 Thanks for listening!

