Parallel Computing Explained: How to Parallelize a Code


1 Parallel Computing Explained: How to Parallelize a Code
Slides prepared from the CI-Tutor courses at NCSA by S. Masoud Sadjadi, School of Computing and Information Sciences, Florida International University, March 2009

2 Agenda
1 Parallel Computing Overview
2 How to Parallelize a Code
   2.1 Automatic Compiler Parallelism
   2.2 Data Parallelism by Hand
   2.3 Mixing Automatic and Hand Parallelism
   2.4 Task Parallelism
   2.5 Parallelism Issues
3 Porting Issues
4 Scalar Tuning
5 Parallel Code Tuning
6 Timing and Profiling
7 Cache Tuning
8 Parallel Performance Analysis
9 About the IBM Regatta P690

3 How to Parallelize a Code
This chapter describes how to turn a single-processor program into a parallel one, focusing on shared memory machines. Both automatic compiler parallelization and parallelization by hand are covered, along with the details of accomplishing both data parallelism and task parallelism.

4 Automatic Compiler Parallelism
Automatic compiler parallelism lets you specify a single compiler option and have the compiler do the work. Its advantage is that it is easy to use. Its disadvantages: the compiler performs only loop-level parallelism, not task parallelism, and it tries to parallelize every do loop in your code; if you have hundreds of do loops, this creates far too much parallel overhead.

5 Automatic Compiler Parallelism
To use automatic compiler parallelism on a Linux system with the Intel compilers, specify the following:
   ifort -parallel -O2 ... prog.f
The compiler creates conditional code that will run with any number of threads. Use setenv to specify the number of threads, and verify that you still get the right answers:
   setenv OMP_NUM_THREADS 4
   a.out > results
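For illustration, here is the kind of loop the compiler can parallelize on its own: every iteration is independent of the others. This is a minimal, hypothetical sketch (the program and array names are not from the course materials):

   program autopar
   implicit none
   integer, parameter :: n = 1000000
   real :: a(n), b(n), c(n)
   integer :: i
   b = 1.0
   c = 2.0
   ! Each iteration writes a different a(i) and reads only
   ! b(i) and c(i), so -parallel can split the loop across
   ! threads without any source changes.
   do i = 1, n
      a(i) = b(i) + c(i)
   end do
   print *, a(1), a(n)
   end program autopar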

6 Data Parallelism by Hand
First identify the loops that use most of the CPU time (the Profiling lecture describes how to do this). Then, by hand, insert OpenMP directives into the code just before the loops you want to make parallel. Some code modifications may be needed to remove data dependencies and other inhibitors of parallelism; use your knowledge of the code and data to assist the compiler. For the SGI Origin2000 computer, insert an OpenMP directive just before the loop that you want to make parallel:
   !$OMP PARALLEL DO
   do i = 1, n
      ... lots of computation ...
   end do
   !$OMP END PARALLEL DO
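To see why data dependencies matter, consider this hypothetical loop (not from the course materials): each iteration reads the value written by the previous one, so the iterations cannot safely run on different threads as written.

   ! Loop-carried dependence: a(i) needs a(i-1), which
   ! another thread may not have computed yet. This loop
   ! must run serially unless the recurrence is
   ! restructured or removed.
   do i = 2, n
      a(i) = a(i-1) + b(i)
   end do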

7 Data Parallelism by Hand
Compile with the mp compiler option:
   f90 -mp ... prog.f
As before, the compiler generates conditional code that will run with any number of threads. To rerun your program with a different number of threads, you do not need to recompile; just re-issue the setenv command:
   setenv OMP_NUM_THREADS 8
   a.out > results2
The setenv command can be placed anywhere before the a.out command, and it must be typed exactly as shown: if you make a typo, you will not receive a warning or error message. To check that the variable is set correctly, type setenv with no arguments; it produces a listing of your environment variable settings.

8 Mixing Automatic and Hand Parallelism
You can have one source file parallelized automatically by the compiler and another source file parallelized by hand. Suppose you split your code into two files named prog1.f and prog2.f:
   f90 -c -apo … prog1.f   (automatic parallelization for prog1.f)
   f90 -c -mp … prog2.f    (hand parallelization for prog2.f)
   f90 prog1.o prog2.o     (creates one executable)
   a.out > results         (runs the executable)

9 Task Parallelism
You can accomplish task parallelism as follows:
!$OMP PARALLEL
!$OMP SECTIONS
!$OMP SECTION
   ... lots of computation in part A ...
!$OMP SECTION
   ... lots of computation in part B ...
!$OMP SECTION
   ... lots of computation in part C ...
!$OMP END SECTIONS
!$OMP END PARALLEL
Compile with the mp compiler option:
   f90 -mp … prog.f
Use the setenv command to specify the number of threads:
   setenv OMP_NUM_THREADS 3
   a.out > results
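Putting the pieces together, a self-contained sketch of the sections construct might look like the following; the subroutines part_a, part_b, and part_c are hypothetical placeholders for the three independent computations:

   program taskpar
   implicit none
!$OMP PARALLEL
!$OMP SECTIONS
!$OMP SECTION
   call part_a()   ! one thread runs part A
!$OMP SECTION
   call part_b()   ! another thread runs part B concurrently
!$OMP SECTION
   call part_c()   ! a third thread runs part C
!$OMP END SECTIONS
!$OMP END PARALLEL
   contains
   subroutine part_a()
      print *, 'part A done'
   end subroutine
   subroutine part_b()
      print *, 'part B done'
   end subroutine
   subroutine part_c()
      print *, 'part C done'
   end subroutine
   end program taskpar

With OMP_NUM_THREADS set to 3, each section can execute on its own thread; with fewer threads, the sections are simply distributed among the threads available.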

10 Parallelism Issues
There are some issues to consider when parallelizing a program:
Should data parallelism or task parallelism be used?
Should automatic compiler parallelism or parallelism by hand be used?
Which loop in a nested loop situation should be the one that becomes parallel? (A sketch follows below.)
How many threads should be used?
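On the nested-loop question, the usual rule of thumb is to parallelize the outermost loop that is safe to parallelize, so the thread start-up overhead is paid once rather than on every outer iteration. A minimal sketch (program and array names are hypothetical):

   program nest
   implicit none
   integer, parameter :: n = 1000, m = 1000
   real :: a(n,m), b(n,m)
   integer :: i, j
   b = 1.0
   ! Parallelizing the outer loop pays the fork/join cost
   ! once; parallelizing the inner loop would pay it m times.
!$OMP PARALLEL DO PRIVATE(i)
   do j = 1, m
      do i = 1, n
         a(i,j) = 2.0 * b(i,j)
      end do
   end do
!$OMP END PARALLEL DO
   print *, a(1,1), a(n,m)
   end program nest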

