Parallel Computing Explained How to Parallelize a Code

Parallel Computing Explained: How to Parallelize a Code. Slides prepared from the CI-Tutor courses at NCSA (http://ci-tutor.ncsa.uiuc.edu/) by S. Masoud Sadjadi, School of Computing and Information Sciences, Florida International University, March 2009.

Agenda
1 Parallel Computing Overview
2 How to Parallelize a Code
  2.1 Automatic Compiler Parallelism
  2.2 Data Parallelism by Hand
  2.3 Mixing Automatic and Hand Parallelism
  2.4 Task Parallelism
  2.5 Parallelism Issues
3 Porting Issues
4 Scalar Tuning
5 Parallel Code Tuning
6 Timing and Profiling
7 Cache Tuning
8 Parallel Performance Analysis
9 About the IBM Regatta P690

How to Parallelize a Code This chapter describes how to turn a single-processor program into a parallel one, focusing on shared-memory machines. Both automatic compiler parallelization and parallelization by hand are covered, along with the details of accomplishing both data parallelism and task parallelism.

Automatic Compiler Parallelism Automatic compiler parallelism lets you specify a single compiler option and leave the work to the compiler. Its advantage is that it is easy to use. The disadvantages are: The compiler performs only loop-level parallelism, not task parallelism. The compiler tries to parallelize every do loop in your code; if you have hundreds of do loops, this creates far too much parallel overhead.
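
As a rough sketch (not from the original slides; the array names and sizes are invented for illustration), the first loop below has independent iterations and is the kind of do loop an auto-parallelizing compiler handles well, while the second has a loop-carried dependence that normally prevents automatic parallelization.

      program autopar_example
         implicit none
         integer, parameter :: n = 100000
         real :: a(n), b(n), c(n), x(n), y(n)
         integer :: i
         a = 1.0
         b = 2.0
         x = 0.0
         y = 1.0
         ! Independent iterations: a good candidate for automatic parallelization.
         do i = 1, n
            c(i) = a(i) + b(i)
         end do
         ! Loop-carried dependence: x(i) needs x(i-1) from the previous
         ! iteration, so the compiler normally cannot parallelize this loop.
         do i = 2, n
            x(i) = x(i-1) + y(i)
         end do
         print *, c(n), x(n)
      end program autopar_example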

Automatic Compiler Parallelism To use automatic compiler parallelism on a Linux system with the Intel compilers, specify the following:
ifort -parallel -O2 ... prog.f
The compiler creates conditional code that will run with any number of threads. Specify the number of threads with setenv, then run the program and make sure you still get the right answers:
setenv OMP_NUM_THREADS 4
a.out > results

Data Parallelism by Hand First identify the loops that use most of the CPU time (the Profiling lecture describes how to do this). Then insert OpenMP directives into the code just before the loops you want to make parallel. Some code modifications may be needed to remove data dependencies and other inhibitors of parallelism; use your knowledge of the code and data to assist the compiler. For example, on the SGI Origin2000 computer the directives placed around the loop look like this:
!$OMP PARALLEL DO
do i=1,n
… lots of computation ...
end do
!$OMP END PARALLEL DO
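
As a concrete sketch of the kind of modification mentioned above (not from the original slides; the variable names are invented), the fragment below removes a data dependency on a shared running total by declaring it a REDUCTION variable and making the scratch variable PRIVATE.

      program reduction_example
         implicit none
         integer, parameter :: n = 100000
         real :: a(n), b(n), tmp, total
         integer :: i
         a = 1.0
         b = 2.0
         total = 0.0
         ! The shared running total would otherwise be a data dependence;
         ! declaring it as a REDUCTION variable and tmp as PRIVATE removes
         ! the conflict between threads.
!$OMP PARALLEL DO PRIVATE(tmp) REDUCTION(+:total)
         do i = 1, n
            tmp = a(i) * b(i)
            total = total + tmp
         end do
!$OMP END PARALLEL DO
         print *, 'dot product = ', total
      end program reduction_example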

Data Parallelism by Hand Compile with the -mp compiler option:
f90 -mp ... prog.f
As before, the compiler generates conditional code that will run with any number of threads. If you want to rerun your program with a different number of threads, you do not need to recompile; just re-issue the setenv command:
setenv OMP_NUM_THREADS 8
a.out > results2
The setenv command can be placed anywhere before the a.out command, and it must be typed exactly as shown; if you make a typo, you will not receive a warning or error message. To check that the environment variable is set correctly, type:
setenv
This produces a listing of your environment variable settings.

Mixing Automatic and Hand Parallelism You can have one source file parallelized automatically by the compiler and another source file parallelized by hand. Suppose you split your code into two files named prog1.f and prog2.f:
f90 -c -apo … prog1.f (automatic parallelization for prog1.f)
f90 -c -mp … prog2.f (hand parallelization for prog2.f)
f90 prog1.o prog2.o (creates one executable)
a.out > results (runs the executable)
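
To make the split concrete, here is a minimal sketch of what the two files might contain (the routine names and contents are assumptions for illustration, not from the original slides): prog1.f holds a plain loop left entirely to the auto-parallelizer, while prog2.f carries explicit OpenMP directives.

! prog1.f -- compiled with -apo; no directives, the compiler decides.
      subroutine scale(a, n)
         implicit none
         integer :: n, i
         real :: a(n)
         do i = 1, n
            a(i) = 2.0 * a(i)
         end do
      end subroutine scale

! prog2.f -- compiled with -mp; parallelized by hand.
      subroutine add(a, b, c, n)
         implicit none
         integer :: n, i
         real :: a(n), b(n), c(n)
!$OMP PARALLEL DO
         do i = 1, n
            c(i) = a(i) + b(i)
         end do
!$OMP END PARALLEL DO
      end subroutine add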

Task Parallelism You can accomplish task parallelism as follows:
!$OMP PARALLEL
!$OMP SECTIONS
… lots of computation in part A …
!$OMP SECTION
… lots of computation in part B ...
!$OMP SECTION
… lots of computation in part C ...
!$OMP END SECTIONS
!$OMP END PARALLEL
Compile with the -mp compiler option:
f90 -mp … prog.f
Use the setenv command to specify the number of threads:
setenv OMP_NUM_THREADS 3
a.out > results
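
A minimal, self-contained version of this pattern is sketched below (the subroutine names and their work are invented for the example). Each section is typically executed by a different thread, which is why three threads are requested above.

      program sections_example
         implicit none
!$OMP PARALLEL
!$OMP SECTIONS
!$OMP SECTION
         call part_a()   ! typically executed by one thread
!$OMP SECTION
         call part_b()   ! typically executed by another thread
!$OMP SECTION
         call part_c()   ! typically executed by a third thread
!$OMP END SECTIONS
!$OMP END PARALLEL
      contains
         subroutine part_a()
            print *, 'part A done'
         end subroutine part_a
         subroutine part_b()
            print *, 'part B done'
         end subroutine part_b
         subroutine part_c()
            print *, 'part C done'
         end subroutine part_c
      end program sections_example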

Parallelism Issues There are some issues to consider when parallelizing a program:
Should data parallelism or task parallelism be used?
Should automatic compiler parallelism or parallelism by hand be used?
Which loop in a nested loop situation should be the one that becomes parallel?
How many threads should be used?
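
For the nested-loop question, a common rule of thumb is to parallelize the outermost loop so the parallel region is entered only once and its start-up overhead is amortized over the whole inner loop. The sketch below (array names and sizes invented for illustration, not from the original slides) places the directive on the outer loop rather than the inner one.

      program nested_loop_example
         implicit none
         integer, parameter :: n = 500, m = 500
         real :: a(n, m), b(n, m), c(n, m)
         integer :: i, j
         a = 1.0
         b = 2.0
         ! Parallelize the outer loop: each thread gets a block of j
         ! iterations, and the parallel region is entered only once.
!$OMP PARALLEL DO PRIVATE(i)
         do j = 1, m
            do i = 1, n
               c(i, j) = a(i, j) + b(i, j)
            end do
         end do
!$OMP END PARALLEL DO
         print *, c(1, 1), c(n, m)
      end program nested_loop_example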