Mohsan Jameel, Department of Computing, NUST School of Electrical Engineering and Computer Science

Outline I. Introduction to OpenMP II. OpenMP Programming Model III. OpenMP Directives IV. OpenMP Clauses V. Run-Time Library Routine VI. Environment Variables VII. Summary 2

What is OpenMP?
An application program interface (API) used to explicitly direct multi-threaded, shared-memory parallelism.
Consists of: compiler directives, run-time routines, and environment variables.
The specification is maintained by the OpenMP Architecture Review Board (openmp.org). Version 3.0 was released in May 2008.

What OpenMP is Not
Not automatic parallelization: the user explicitly specifies parallel execution, and the compiler does not ignore user directives even if they are wrong.
Not just loop-level parallelism: it also provides functionality for coarse-grained parallelism.
Not meant for distributed-memory parallel systems.
Not necessarily implemented identically by all vendors.
Not guaranteed to make the most efficient use of shared memory.

History of OpenMP
In the early 1990s, vendors of shared-memory machines supplied similar, directive-based Fortran programming extensions: the user would augment a serial Fortran program with directives specifying which loops were to be parallelized.
The first attempt at a standard was the draft for ANSI X3H5 in 1994. It was never adopted, largely due to waning interest as distributed-memory machines became popular.
The OpenMP standard specification started in the spring of 1997, taking over where ANSI X3H5 had left off, as newer shared-memory machine architectures started to become prevalent.

Goals of OpenMP
Standardization: provide a standard among a variety of shared-memory architectures/platforms.
Lean and mean: establish a simple and limited set of directives for programming shared-memory machines.
Ease of use: provide the capability to incrementally parallelize a serial program, and to implement both coarse-grain and fine-grain parallelism.
Portability: support Fortran (77, 90, and 95), C, and C++.

Outline I. Introduction to OpenMP II. OpenMP Programming Model III. OpenMP Directives IV. OpenMP Clauses V. Run-Time Library Routine VI. Environment Variables VII. Summary 7

OpenMP Programming Model
Thread-based parallelism
Explicit parallelism
Compiler-directive based
Dynamic threads
Nested parallelism support
Task parallelism support (OpenMP specification 3.0)

Shared Memory Model 9

Execution Model (fork-join): the master thread (ID=0) forks a team of worker threads (ID=1,2,3,…,N-1).

Terminology
OpenMP team = master + workers.
A parallel region is a block of code executed by all threads simultaneously.
The master thread always has thread ID = 0.
Thread adjustment (if any) is done before entering a parallel region.
An "if" clause can be used with the parallel construct; if the condition evaluates to FALSE, the parallel region is skipped and the code runs serially (see the sketch below).
A work-sharing construct divides work among the threads of a parallel region.
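A minimal sketch of the "if" clause on a parallel construct (the variable n and the threshold value are illustrative, not from the slides):

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        int n = 100;                       /* problem size (illustrative) */
        /* The parallel region is created only if the condition is TRUE;
           otherwise the enclosed block runs serially on the master thread. */
        #pragma omp parallel if (n > 1000)
        {
            printf("Running with %d thread(s)\n", omp_get_num_threads());
        }
        return 0;
    }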

Example OpenMP Code Structure 12
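The code on this slide is not captured in the transcript; the following is a hedged sketch of the typical serial-parallel-serial structure of an OpenMP C program (array contents and sizes are illustrative):

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        int i, n = 8;
        int a[8];

        /* Serial code: executed by the initial (master) thread only */
        for (i = 0; i < n; i++)
            a[i] = i;

        /* Fork: a team of threads executes the enclosed structured block */
        #pragma omp parallel shared(a, n) private(i)
        {
            #pragma omp for
            for (i = 0; i < n; i++)
                a[i] = 2 * a[i];
        }   /* Join: implicit barrier, execution continues on one thread */

        /* Serial code again */
        for (i = 0; i < n; i++)
            printf("%d ", a[i]);
        printf("\n");
        return 0;
    }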

Components of OpenMP 13

I. Introduction to OpenMP II. OpenMP Programming Model III. OpenMP Directives IV. OpenMP Clauses V. Run-Time Library Routine VI. Environment Variables VII. Summary 14

Go to helloworld.c
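helloworld.c itself is not reproduced in the transcript; a minimal C sketch that would produce output like the next slide (assuming a team of 3 threads) is:

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        int nthreads, tid;
        /* Fork a team of threads, each with its own private tid */
        #pragma omp parallel private(nthreads, tid)
        {
            tid = omp_get_thread_num();
            printf("Hello world from thread = %d\n", tid);
            /* Only the master thread reports the team size */
            if (tid == 0) {
                nthreads = omp_get_num_threads();
                printf("Number of threads = %d\n", nthreads);
            }
        }
        return 0;
    }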

C/C++ Parallel Region Example
Fortran form of the same parallel region:
!$OMP PARALLEL
write (*,*) "Hello"
!$OMP END PARALLEL
Sample output with 3 threads (threads 0, 1, and 2 interleaved):
Hello world from thread = 0
Number of threads = 3
Hello world from thread = 1
Hello world from thread = 2

OpenMP Directives 17

OpenMP Scoping
Static extent: the code textually enclosed between the beginning and end of the structured block; the static extent does not span other routines.
Orphaned directive: an OpenMP directive that appears on its own, outside the static extent of any enclosing parallel region (e.g. in a called routine).
Dynamic extent: includes both the static extent and any orphaned directives reached at run time.
A short illustration of an orphaned directive appears below.
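A hedged illustration of an orphaned directive (the routine name scale and the array are made up for this sketch):

    #include <stdio.h>
    #include <omp.h>

    #define N 16

    /* The "omp for" below is an orphaned directive: it binds to whichever
       parallel region is active when scale() is called (dynamic extent). */
    void scale(double *a, int n) {
        int i;
        #pragma omp for
        for (i = 0; i < n; i++)
            a[i] *= 2.0;
    }

    int main(void) {
        double a[N];
        int i;
        for (i = 0; i < N; i++)
            a[i] = i;

        #pragma omp parallel        /* static extent: only this block */
        {
            scale(a, N);            /* orphaned "omp for" runs in parallel */
        }

        printf("a[N-1] = %f\n", a[N - 1]);
        return 0;
    }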

OpenMP Parallel Regions
A block of code that will be executed by multiple threads.
Properties:
- Fork-join model
- The number of threads won't change inside a parallel region
- SPMD execution within the region
- The enclosed block of code must be structured: no branching into or out of the block
Format: #pragma omp parallel [clause1] [clause2] …

OpenMP Threads
How many threads?
- Use of the omp_set_num_threads() library function
- Setting of the OMP_NUM_THREADS environment variable
- Implementation default
Dynamic threads: by default, the same number of threads is used to execute each parallel region. Two methods enable dynamic adjustment of the thread count:
- Use of the omp_set_dynamic() library function
- Setting of the OMP_DYNAMIC environment variable
A small sketch follows.
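A small sketch of controlling the thread count and enabling dynamic threads from code (the value 4 is illustrative):

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        /* Request 4 threads from code (takes precedence over OMP_NUM_THREADS) */
        omp_set_num_threads(4);

        /* Allow the runtime to adjust the team size dynamically
           (same effect as setting OMP_DYNAMIC=true in the environment) */
        omp_set_dynamic(1);

        #pragma omp parallel
        {
            if (omp_get_thread_num() == 0)
                printf("Team size = %d\n", omp_get_num_threads());
        }
        return 0;
    }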

OpenMP Work-sharing constructs: data parallelism (for), functional parallelism (sections), serialize a section (single).

Example: Count3s in an array
Let's assume we have an array of N integers. We want to find how many 3s are in the array. We need a for loop, an if statement, and a count variable. Let's look at its serial and parallel versions.

Serial: Count3s in an array

    int i, count = 0, n = 100;
    int array[n];
    // initialize array
    for (i = 0; i < n; i++) {
        if (array[i] == 3)
            count++;
    }

Work-sharing construct: "for loop"
The "for loop" work-sharing construct can be thought of as a data-parallelism construct.

Parallelize, 1st attempt: Count3s in an array

    int i, count = 0, n = 100;
    int array[n];
    // initialize array
    #pragma omp parallel for default(none) shared(n,array,count) private(i)
    for (i = 0; i < n; i++) {
        if (array[i] == 3)
            count++;    // unprotected update of shared count: data race
    }

Work-sharing construct: Example of "for loop"

    #pragma omp parallel for default(none) shared(n,a,b,c) private(i)
    for (i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }

Work-sharing construct: "section"
The "sections" work-sharing construct can be thought of as a functional-parallelism construct.

Parallelize, 2nd attempt: Count3s in an array
Say we also want to count 4s in the same array. Now we have two different functions, i.e. counting 3s and counting 4s.

    int i, count3 = 0, count4 = 0, n = 100;
    int array[n];
    // initialize array
    #pragma omp parallel sections default(none) shared(n,array,count3,count4) private(i)
    {
        #pragma omp section
        for (i = 0; i < n; i++) {
            if (array[i] == 3)
                count3++;
        }
        #pragma omp section
        for (i = 0; i < n; i++) {
            if (array[i] == 4)
                count4++;
        }
    }

No data race condition in this example. Why?

Work-sharing construct: Example 1 of "section"

    #pragma omp parallel sections default(none) shared(a,b,c,d,e,n) private(i)
    {
        #pragma omp section
        {
            printf("Thread %d executes 1st loop\n", omp_get_thread_num());
            for (i = 0; i < n; i++)
                a[i] = 3 * b[i];
        }
        #pragma omp section
        {
            printf("Thread %d executes 2nd loop\n", omp_get_thread_num());
            for (i = 0; i < n; i++)
                e[i] = 2 * c[i] + d[i];
        }
    }   /*--- End of parallel sections ---*/
    final_sum = sum(a, n) + sum(e, n);
    printf("FINAL_SUM is %d\n", final_sum);

Work-sharing construct: Example 2 of “section” 1/2 30

Work-sharing construct: Example 2 of “section” 2/2 31

Work-sharing construct: Example of "single"
Within a parallel region, a "single" block specifies that the enclosed block is executed by only one thread in the team. Let's look at an example.
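The example slide is not captured in the transcript; a minimal sketch of the "single" construct could look like this:

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        #pragma omp parallel
        {
            /* Executed by exactly one thread of the team; the others
               wait at the implicit barrier at the end of "single". */
            #pragma omp single
            {
                printf("Initialization done by thread %d\n",
                       omp_get_thread_num());
            }

            /* Executed by every thread in the team */
            printf("Thread %d continues after the single block\n",
                   omp_get_thread_num());
        }
        return 0;
    }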

I. Introduction to OpenMP II. OpenMP Programming Model III. OpenMP Directives IV. OpenMP Clauses V. Run-Time Library Routine VI. Environment Variables VII. Summary 33

OpenMP Clauses: Data sharing 1/2
shared(list): the shared clause specifies which data is shared among threads. All threads can read and write a shared variable. By default, all variables are shared.
private(list): private variables are local to each thread. A typical example of a private variable is a loop counter, since each thread has its own loop counter initialized at the entry point.

OpenMP Clauses: Data sharing 2/2
A private variable is defined between the entry and exit points of the parallel region and has no scope outside of it. The firstprivate and lastprivate clauses extend the scope of a variable beyond the parallel region.
firstprivate: all variables in the list are initialized with the original value the object had before entering the parallel region.
lastprivate: the thread that executes the last iteration or section updates the value of the object in the list.

Example: firstprivate and lastprivate

    int main() {
        int i, n = 10;              /* n assumed; not shown on the slide */
        int C, B, A = 10;
        /*--- Start of parallel region ---*/
        #pragma omp parallel for default(none) shared(n) firstprivate(A) lastprivate(B) private(i)
        for (i = 0; i < n; i++) {
            …
            B = i + A;
            …
        }
        /*--- End of parallel region ---*/
        C = B;                      /* B holds the value from the last iteration */
    }

OpenMP Clauses: nowait
The nowait clause is used to avoid the implicit synchronization (barrier) at the end of a work-sharing construct. A minimal sketch follows.
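A minimal sketch of nowait (arrays and loop bodies are illustrative; the second loop does not read the results of the first, so skipping the barrier is safe):

    #include <omp.h>

    #define N 1000

    int main(void) {
        double a[N], b[N], c[N], d[N];
        int i;
        for (i = 0; i < N; i++) { a[i] = i; c[i] = 2.0 * i; }

        #pragma omp parallel shared(a, b, c, d) private(i)
        {
            /* No implicit barrier at the end of this loop */
            #pragma omp for nowait
            for (i = 0; i < N; i++)
                b[i] = a[i] + 1.0;

            /* Threads that finish early start here immediately */
            #pragma omp for
            for (i = 0; i < N; i++)
                d[i] = c[i] * 3.0;
        }
        return 0;
    }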

OpenMP Clause: schedule
The schedule clause is supported on the loop construct only. It controls the manner in which loop iterations are distributed over the threads.
Syntax: schedule(kind[, chunk_size])
Kinds:
static[, chunk]: distribute iterations in blocks of size "chunk" over the threads in a round-robin fashion.
dynamic[, chunk]: fixed portions of work whose size is controlled by chunk; when a thread finishes its portion it starts on the next one.
guided[, chunk]: same as dynamic, but the size of the portion of work decreases exponentially.
runtime: the iteration scheduling scheme is set at run time through the environment variable OMP_SCHEDULE.

The Experiment with schedule clause 39
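The experiment itself is not reproduced in the transcript; a small sketch one could use to observe the effect of different schedule kinds (the chunk size and loop body are illustrative) is:

    #include <stdio.h>
    #include <omp.h>

    #define N 16

    int main(void) {
        int i;
        /* Try replacing "static,4" with "dynamic,2", "guided" or "runtime"
           (with OMP_SCHEDULE set) and observe which thread gets which i. */
        #pragma omp parallel for schedule(static, 4)
        for (i = 0; i < N; i++)
            printf("iteration %2d done by thread %d\n",
                   i, omp_get_thread_num());
        return 0;
    }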

OpenMP Critical construct
Example: summation of a vector (this version has a race condition on sum).

    int main() {
        int i, sum = 0, n = 5;
        int a[5] = {1, 2, 3, 4, 5};
        /*--- Start of parallel region ---*/
        #pragma omp parallel for default(none) shared(sum,a,n) private(i)
        for (i = 0; i < n; i++) {
            sum += a[i];    /* unprotected update of shared sum: race condition */
        }
        /*--- End of parallel region ---*/
        printf("sum of vector a = %d", sum);
    }

OpenMP Critical construct

    int main() {
        int i, sum = 0, local_sum, n = 5;
        int a[5] = {1, 2, 3, 4, 5};
        /*--- Start of parallel region ---*/
        #pragma omp parallel default(none) shared(sum,a,n) private(local_sum,i)
        {
            local_sum = 0;              /* each thread starts from zero */
            #pragma omp for
            for (i = 0; i < n; i++) {
                local_sum += a[i];
            }
            #pragma omp critical
            {
                sum += local_sum;       /* one thread at a time updates sum */
            }
        }   /*--- End of parallel region ---*/
        printf("sum of vector a = %d", sum);
    }

Parallelize, 3rd attempt: Count3s in an array

    int i, local_count, count = 0, n = 100;
    int array[n];
    // initialize array
    #pragma omp parallel default(none) shared(n,array,count) private(i,local_count)
    {
        local_count = 0;
        #pragma omp for
        for (i = 0; i < n; i++) {
            if (array[i] == 3)
                local_count++;
        }
        #pragma omp critical
        {
            count += local_count;
        }
    }   /*--- End of parallel region ---*/

OpenMP Clause: reduction
OpenMP provides a reduction clause which is used with the for-loop and sections directives. The reduction variable must be shared among the threads; the race condition is avoided implicitly.

    int main() {
        int i, sum = 0, n = 5;
        int a[5] = {1, 2, 3, 4, 5};
        /*--- Start of parallel region ---*/
        #pragma omp parallel for default(none) shared(a,n) private(i) \
                                 reduction(+:sum)
        for (i = 0; i < n; i++) {
            sum += a[i];
        }
        /*--- End of parallel region ---*/
        printf("sum of vector a = %d", sum);
    }

Parallelize, 4th attempt: Count3s in an array

    int i, count = 0, n = 100;
    int array[n];
    // initialize array
    #pragma omp parallel for default(none) shared(n,array) private(i) \
                             reduction(+:count)
    for (i = 0; i < n; i++) {
        if (array[i] == 3)
            count++;
    }
    /*--- End of parallel region ---*/

Tasking in OpenMP 45

Tasking in OpenMP
In OpenMP 3.0 the concept of tasks was added to the OpenMP execution model.
The task model is useful in cases where the number of parallel pieces, and the work involved in each piece, varies and/or is unknown.
Before the inclusion of the task model, OpenMP was not well suited to unstructured problems.
Tasks are often set up within a single construct, in a manager-worker style.

Task Parallelism Approach 1/2
Threads line up as workers, go through the queue of work to be done, and execute a task.
Threads do not wait, as in loop parallelism; rather they go back to the queue and take more tasks.
Each task is executed serially by the worker thread that encounters it in the queue.
Load balancing occurs as short and long tasks are picked up as threads become available.

Task Parallelism Approach 2/2 48

Example: Task parallelism 49
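The example slide is not captured in the transcript; a minimal sketch of OpenMP 3.0 tasks in the manager-worker style described above (the linked-list type and the process() routine are made up for this sketch):

    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    struct node { int value; struct node *next; };

    void process(struct node *p) {
        printf("node %d processed by thread %d\n",
               p->value, omp_get_thread_num());
    }

    int main(void) {
        /* Build a small list 0 -> 1 -> 2 -> 3 -> 4 */
        struct node *head = NULL, *p;
        int i;
        for (i = 4; i >= 0; i--) {
            p = malloc(sizeof *p);
            p->value = i; p->next = head; head = p;
        }

        #pragma omp parallel
        {
            /* One thread walks the list and creates tasks ... */
            #pragma omp single
            {
                for (p = head; p != NULL; p = p->next) {
                    #pragma omp task firstprivate(p)
                    process(p);
                }
            }
            /* ... while the whole team picks tasks off the queue */
        }
        return 0;
    }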

Best Practices
Optimize barrier use
Avoid the ordered construct
Avoid large critical regions
Maximize parallel regions
Avoid multiple use of parallel regions
Address poor load balance

I. Introduction to OpenMP II. OpenMP Programming Model III. OpenMP Directives IV. OpenMP Clauses V. Run-Time Library Routine VI. Environment Variables VII. Summary 51

List of run-time library routines
The run-time library routines are declared in the omp.h header file:
void omp_set_num_threads(int num);
int omp_get_num_threads();
int omp_get_max_threads();
int omp_get_thread_num();
int omp_get_thread_limit();
int omp_get_num_procs();
double omp_get_wtime();
int omp_in_parallel();   // returns 0 for false, non-zero for true
There are a few more.
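A small sketch exercising a few of the routines listed above:

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        double t0 = omp_get_wtime();

        printf("procs = %d, max threads = %d, in parallel? %d\n",
               omp_get_num_procs(), omp_get_max_threads(), omp_in_parallel());

        #pragma omp parallel
        {
            printf("thread %d of %d (in parallel? %d)\n",
                   omp_get_thread_num(), omp_get_num_threads(),
                   omp_in_parallel());
        }

        printf("elapsed = %f seconds\n", omp_get_wtime() - t0);
        return 0;
    }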

More run-time library routines
These routines are new with OpenMP 3.0.

I. Introduction to OpenMP II. OpenMP Programming Model III. OpenMP Directives IV. OpenMP Clauses V. Run-Time Library Routine VI. Environment Variables VII. Summary 54

Environment Variables
OMP_NUM_THREADS
OMP_DYNAMIC
OMP_THREAD_LIMIT
OMP_STACKSIZE

I. Introduction to OpenMP II. OpenMP Programming Model III. OpenMP Directives IV. OpenMP Clauses V. Run-Time Library Routine VI. Environment Variables VII. Summary 56

Summary
OpenMP provides a small yet powerful programming model.
Compilers with OpenMP support are widely available.
OpenMP is a directive-based shared-memory programming model.
The OpenMP API is a general-purpose parallel programming API with an emphasis on the ability to parallelize existing programs.
Scalable parallel programs can be written by using parallel regions.
Work-sharing constructs enable efficient parallelization of the computationally intensive portions of a program.

Thank You and Exercise Session 58