Parallel Programming Using OpenMP
by Tiago Sommer Damasceno

Introduction

This module teaches the basic concepts of parallel programming using a multi-threading API called OpenMP.

Requirements:
- Knowledge of the C programming language.
- A C compiler that supports OpenMP.
- Familiarity with a Unix/Linux terminal.
- Access to a multi-core computer.

Goals:
- Learn the basics of OpenMP, such as compiler directives, functions, and environment variables.
- Understand how OpenMP works with shared memory.

What is it?

OpenMP is a standard API for shared memory programs written in Fortran, C, and C++. It consists of:
- Compiler directives.
- Functions.
- Environment variables.

It requires a supporting compiler, such as: GNU, IBM, Oracle, Intel, HP, Microsoft, Cray, Absoft Pro Fortran, Portland Group Compilers and Tools, Lahey/Fujitsu Fortran 95, or PathScale.

The latest version, OpenMP 3.0, was released in May 2008. A summary of the directives, functions, and environment variables can be found on the Petascale website or on the OpenMP website.

Pros and Cons

Pros:
- No need for major changes to the serial code.
- Portable (supported by many compilers).
- Scales well, if implemented correctly and the computer's architecture permits.
- Data decomposition is handled automatically.
- No need to deal with message passing.

Cons:
- Requires a compiler that supports OpenMP.
- Currently runs efficiently only on shared-memory multiprocessor architectures.
- Scalability is limited by the computer's architecture.
- Cannot be used on GPUs.

Main Components

Directives:
- Initialize a parallel region.
- Divide the work.
- Synchronize.
- Set data-sharing attributes.

Functions:
- Define the number of threads.
- Get the thread ID.
- Support nested parallelism.

Environment variables:
- Scheduling type.
- Number of threads.
- Dynamic adjustment of threads.
- Maximum number of threads.

For a more detailed explanation of these three components, go to the OpenMP website and look for the OpenMP Specifications. A minimal sketch of all three component types appears below.
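For illustration only (a hypothetical sketch, not part of the original module), the following program touches all three component types: the parallel directive, the runtime functions omp_set_num_threads, omp_get_num_threads, and omp_get_thread_num, and the OMP_NUM_THREADS environment variable mentioned in the closing comment.

/* components.c -- hypothetical illustration, not from the original module */
#include <stdio.h>
#include <omp.h>

int main() {
    /* Function: request a team of 4 threads at runtime. */
    omp_set_num_threads(4);

    /* Directive: open a parallel region. */
    #pragma omp parallel
    {
        /* Functions: query the team size and this thread's ID. */
        int id = omp_get_thread_num();
        int n  = omp_get_num_threads();
        printf("Thread %d of %d\n", id, n);
    }
    return 0;
}

/* Environment variable: if the call to omp_set_num_threads() above were
   removed, the same team size could be requested from the shell instead,
   e.g. OMP_NUM_THREADS=4 ./a.out */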

Shared Memory

In OpenMP all of the threads can access shared memory data. Private memory data can only be accessed by the thread that owns it.

[Diagram: a shared memory block accessed by several threads, each thread also holding its own private memory.]
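To make the distinction concrete, here is a small hypothetical sketch (not from the original module): total is shared and visible to every thread, while local, declared inside the parallel region, is private to each thread. The #pragma omp atomic directive is needed precisely because total is shared.

#include <stdio.h>
#include <omp.h>

int main() {
    int total = 0;                            /* shared: one copy, seen by all threads */

    #pragma omp parallel
    {
        int local = omp_get_thread_num();     /* private: one copy per thread */

        #pragma omp atomic                    /* safe update of the shared variable */
        total += local;
    }

    printf("Sum of thread IDs: %d\n", total);
    return 0;
}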

Fork and Join Model

When a program starts to run, it consists of just one thread: the master thread, which has ID 0. The master thread then reaches an OpenMP directive such as #pragma omp parallel, which forks a team of parallel threads to execute the parallel section. When all of the threads finish executing their instructions, the results are synchronized (this join is performed implicitly at the end of each parallel section) and the program continues with just the master thread until completion.

[Diagram: the master thread forks into a team of OpenMP parallel threads at the start of a parallel section, and the threads join back into the master thread at the end of the section.]
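Here is a minimal, hypothetical sketch of that structure, not taken from the original module: serial work, a forked parallel section, then serial work again after the implicit join.

#include <stdio.h>
#include <omp.h>

int main() {
    printf("Serial part: only the master thread runs here.\n");

    #pragma omp parallel                      /* fork: a team of threads is created */
    {
        printf("Parallel section: thread %d is running.\n",
               omp_get_thread_num());
    }                                         /* implicit join and synchronization here */

    printf("Serial part again: back to just the master thread.\n");
    return 0;
}

Everything outside the #pragma omp parallel block is executed by the master thread alone; everything inside is executed by every thread in the team.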

helloWorld.c

In a Unix terminal window, go to your home directory by typing:
cd ~

Then edit your .bashrc file using your preferred editor, such as vim:
vim .bashrc

Add the following line, which creates an alias for compiling OpenMP code:
alias ompcc='gcc44 -fopenmp'

helloWorld.c

Let's create a Hello World program.

In your terminal type:
vim helloWorld.c

Include the OpenMP library by typing the following inside your helloWorld.c file:
#include <omp.h>

Add the following compiler directive inside your main function to create multiple threads, each of which will display a hello world message:
#pragma omp parallel
{
    /* Parallel Section */
}

The code should look like the sample below:
#include <stdio.h>
#include <omp.h>

int main() {
    #pragma omp parallel
    {
        printf("Hello World!\n");
    }
    return 0;
}

Compile and run the code using the following commands in the Unix terminal window:
Compile: ompcc helloWorld.c
Run: ./a.out

Expected output: multiple hello world messages. Each message line corresponds to a thread created by OpenMP. Since we did not specify how many threads we wanted, OpenMP creates one thread for each processor available in the run.
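The number of threads can also be controlled from the terminal without editing the source. This is a hypothetical shell example (bash syntax) using the OMP_NUM_THREADS environment variable, which the original module does not cover:

Run with 4 threads: OMP_NUM_THREADS=4 ./a.out

This setting only takes effect when the program does not fix the thread count itself with omp_set_num_threads(), a function introduced on a later slide.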

helloWorld.c

Instead of counting how many hello world messages were printed, the following function can tell how many threads OpenMP created in the run:
omp_get_num_threads();

Let's see how many processors are available. The code should look similar to this:
#include <stdio.h>
#include <omp.h>

int main() {
    int numThreads;
    #pragma omp parallel
    {
        numThreads = omp_get_num_threads();
        printf("Hello World! There are %d threads\n", numThreads);
    }
    return 0;
}

Compile and run the code:
Compile: ompcc helloWorld.c
Run: ./a.out

helloWorld.c

Expected output: the number of threads reported should correspond to the number of hello world messages.

It is also possible to specify the number of threads that OpenMP will create. To do so, use the following function:
void omp_set_num_threads(int num);

helloWorld.c

In this example 5 threads will be created. With that modification, the code should correspond to the example below:
#include <stdio.h>
#include <omp.h>

int main() {
    omp_set_num_threads(5);
    int numThreads;
    #pragma omp parallel
    {
        numThreads = omp_get_num_threads();
        printf("Hello World! There are %d threads\n", numThreads);
    }
    return 0;
}

Compile and run the code:
Compile: ompcc helloWorld.c
Run: ./a.out

helloWorld.c

Expected output: 5 hello world messages.

If there are 5 threads in this run, then each thread should have an ID. To display which thread produced each hello world message, use the following function call to get the individual thread number:
omp_get_thread_num();

helloWorld.c

Let's output the thread number with the hello world message. The code should look like this:
#include <stdio.h>
#include <omp.h>

int main() {
    omp_set_num_threads(5);
    int numThreads, tNum;
    #pragma omp parallel
    {
        numThreads = omp_get_num_threads();
        tNum = omp_get_thread_num();
        printf("Hello World! I am thread %d. There are %d threads\n",
               tNum, numThreads);
    }
    return 0;
}

Compile and run the code:
Compile: ompcc helloWorld.c
Run: ./a.out

helloWorld.c

Expected output: each hello world message should contain its own thread ID. (Note: thread numbering starts from zero.)

Try running the code a few times. Is there anything unexpected? Are there two or more messages with the same thread ID number? If the answer is "yes," then congratulations: you have found a form of data contention called a race condition!

Race condition: occurs when multiple threads change the value of a variable at the same time. Here, the variable can be overwritten before it is printed by the thread; in this case, tNum is the variable being overwritten.

There are two ways to fix this race condition. The first is to add a critical directive inside the parallel region, but that makes the enclosed code execute one thread at a time, defeating the purpose of parallel programming.
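For reference, a hypothetical sketch of that first fix is shown below; #pragma omp critical removes the race by letting only one thread at a time execute the enclosed block, at the cost of serializing it.

#include <stdio.h>
#include <omp.h>

int main() {
    omp_set_num_threads(5);
    int numThreads, tNum;
    #pragma omp parallel
    {
        #pragma omp critical     /* only one thread at a time runs this block */
        {
            numThreads = omp_get_num_threads();
            tNum = omp_get_thread_num();
            printf("Hello World! I am thread %d. There are %d threads\n",
                   tNum, numThreads);
        }
    }
    return 0;
}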

helloWorld.c

The other way is to add the private() clause to the #pragma omp parallel statement. For every variable listed inside private's parentheses, OpenMP creates a separate instance for each thread, so each thread has its own copy and will not overwrite another thread's variable. Your code should look similar to this now:
#include <stdio.h>
#include <omp.h>

int main() {
    omp_set_num_threads(5);
    int numThreads, tNum;
    #pragma omp parallel private(tNum)
    {
        numThreads = omp_get_num_threads();
        tNum = omp_get_thread_num();
        printf("Hello World! I am thread %d. There are %d threads\n",
               tNum, numThreads);
    }
    return 0;
}

Compile and run the program.

helloWorld.c

Expected output: if everything was done correctly, the output should be the same as before. This time there is no race condition, regardless of how many times the program is run, because tNum is no longer overwritten; each thread has its own tNum.

It probably seems excessive to output the number of threads with each hello world message, doesn't it? How can the code be modified to display this information only once? Each thread has its own number, so it is possible to give instructions to a specific thread. By simply creating an if-else statement, it is possible to tell thread zero, for example, to print how many threads there are, and let the other threads just display the hello world message.

Why is this important to learn? In computational science, scientists work with large problems. Before the calculations are done, the work has to be divided among the processors. To increase performance, one thread is put in charge of distributing the work and eventually gathering the results, while the rest of the threads do the calculations.

The following code tells thread zero to display how many threads there are and leaves the other threads to say hello world:
#include <stdio.h>
#include <omp.h>

int main() {
    omp_set_num_threads(5);
    int numThreads, tNum;
    #pragma omp parallel private(tNum)
    {
        tNum = omp_get_thread_num();
        if (tNum == 0) {
            numThreads = omp_get_num_threads();
            printf("Hello World! I am thread %d. There are %d threads.\n",
                   tNum, numThreads);
        } else {
            printf("Hello World from thread %d.\n", tNum);
        }
    }
    return 0;
}

helloWorld.c

Compile and run the program.

Expected output: five hello world messages, with thread zero displaying:
"Hello World! I am thread 0. There are 5 threads."

Parallel Programming

If everything was done correctly, then it is time to move on to something more advanced: for example, finding the area under a curve in parallel! Using everything learned in this module, let's try to find the area under the x^2 curve on the domain x in [0, 1]. A serial version of code that finds the area under the curve is on the next page. Try to parallelize this code using OpenMP.

integration.c

Serial code:
#include <stdio.h>

float f(float x) {
    return (x*x);
}

int main() {
    int i, SECTIONS = 1000;
    float area = 0.0, y = 0.0, x = 0.0;
    float dx = 1.0/(float)SECTIONS;
    for (i = 0; i < SECTIONS; i++) {
        x = i*dx;
        y = f(x);
        area += y*dx;
    }
    printf("Area under the curve is %f\n", area);
    return 0;
}

Compile and run the serial version using the same commands as before. The area under the x^2 curve on [0, 1] should be approximately 0.33 (the exact value of the integral is 1/3).

Turn this serial code into a parallel code by simply adding one directive on the line above the for loop:
#pragma omp parallel for private(var) reduction(operation:var)
(Note: the directive should be all on one line.)

The two new parts of this directive are:

for: tells the compiler that the next line is a for loop and divides the loop iterations into equal parts for each thread. The order in which the iterations are performed is nondeterministic; in other words, the loop is not performed in sequential order.

reduction: creates a private copy of the listed variable for each thread. After the parallel section has ended, all of the copies are combined using the operator specified, such as "+" (addition), "-" (subtraction), or "*" (multiplication).

integration.c

The code should be similar to the one below:
#include <stdio.h>

float f(float x) {
    return (x*x);
}

int main() {
    int i, PRECISION = 1000;
    float area = 0.0, y = 0.0, x = 0.0;
    float dx = 1.0/(float)PRECISION;
    #pragma omp parallel for private(x, y) reduction(+: area)
    for (i = 0; i < PRECISION; i++) {
        x = i*dx;
        y = f(x);
        area += y*dx;
    }
    printf("Area under the curve is %f\n", area);
    return 0;
}

Expected output: the area under the curve should be approximately 0.33.
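As an optional extension not covered in the original module, the parallel loop can be timed with omp_get_wtime(), a standard OpenMP runtime function, to check whether parallelization actually helps. With only 1000 iterations the difference will be tiny, so you may want to raise PRECISION to see a measurable effect. A hypothetical sketch:

#include <stdio.h>
#include <omp.h>

float f(float x) {
    return (x*x);
}

int main() {
    int i, PRECISION = 1000;
    float area = 0.0, y = 0.0, x = 0.0;
    float dx = 1.0/(float)PRECISION;
    double start, elapsed;

    start = omp_get_wtime();                  /* wall-clock time before the loop */
    #pragma omp parallel for private(x, y) reduction(+: area)
    for (i = 0; i < PRECISION; i++) {
        x = i*dx;
        y = f(x);
        area += y*dx;
    }
    elapsed = omp_get_wtime() - start;        /* wall-clock time spent in the loop */

    printf("Area %f computed in %f seconds\n", area, elapsed);
    return 0;
}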

More about OpenMP

To learn more about OpenMP, the following reference cards are useful; they include good summaries of all of OpenMP's directives, functions, and environment variables, and are available on the OpenMP website:
OpenMP3.0-SummarySpec.pdf
OpenMP3.0-FortranCard.pdf