
High-Performance Parallel Scientific Computing 2008 Purdue University OpenMP Tutorial Seung-Jai Min School of Electrical and Computer Engineering Purdue University, West Lafayette, IN

Shared Memory Parallel Programming in the Multi-Core Era
Desktop and laptop
– 2, 4, 8 cores and … ?
A single node in distributed-memory clusters
– Steele cluster node: 2 × 8 (16) cores
– /proc/cpuinfo
Shared memory hardware accelerators
– Cell processors: 1 PPE and 8 SPEs
– Nvidia Quadro GPUs: 128 processing units

OpenMP: Some syntax details to get us started
Most of the constructs in OpenMP are compiler directives or pragmas.
– For C and C++, the pragmas take the form:
    #pragma omp construct [clause [clause]...]
– For Fortran, the directives take one of the forms:
    C$OMP construct [clause [clause]...]
    !$OMP construct [clause [clause]...]
    *$OMP construct [clause [clause]...]
Include file:
    #include "omp.h"
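To make the C form concrete, here is a minimal sketch (not from the slides); the directive is "parallel for" and the two clauses are chosen purely for illustration:

    #include <stdio.h>
    #include "omp.h"                        /* OpenMP runtime declarations */

    int main(void) {
        int i, n = 8;
        double a[8];
        /* construct = parallel for; clauses = private(i), shared(a) */
        #pragma omp parallel for private(i) shared(a)
        for (i = 0; i < n; i++) {
            a[i] = 2.0 * i;
        }
        printf("a[7] = %f\n", a[7]);
        return 0;
    }

Compiled with OpenMP enabled (for example, gcc -fopenmp or icc -openmp), the loop iterations are divided among the threads.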

How is OpenMP typically used?
OpenMP is usually used to parallelize loops:
– Find your most time-consuming loops.
– Split them up between threads.

Sequential program:
    void main() {
        int i, k, N=1000;
        double A[N], B[N], C[N];
        for (i=0; i<N; i++) {
            A[i] = B[i] + k*C[i];
        }
    }

Parallel program:
    #include "omp.h"
    void main() {
        int i, k, N=1000;
        double A[N], B[N], C[N];
        #pragma omp parallel for
        for (i=0; i<N; i++) {
            A[i] = B[i] + k*C[i];
        }
    }

How is OpenMP typically used? (Cont.)
Single Program Multiple Data (SPMD): the parallel program

    #include "omp.h"
    void main() {
        int i, k, N=1000;
        double A[N], B[N], C[N];
        #pragma omp parallel for
        for (i=0; i<N; i++) {
            A[i] = B[i] + k*C[i];
        }
    }

behaves as if each of the four threads ran its own copy of the loop over one quarter of the iteration space:

    Thread 0: lb = 0;   ub = 250;
    Thread 1: lb = 250; ub = 500;
    Thread 2: lb = 500; ub = 750;
    Thread 3: lb = 750; ub = 1000;

    for (i=lb; i<ub; i++) {
        A[i] = B[i] + k*C[i];
    }

OpenMP Fork-and-Join model

    printf("program begin\n");        /* serial   */
    N = 1000;
    #pragma omp parallel for          /* parallel */
    for (i=0; i<N; i++)
        A[i] = B[i] + C[i];
    M = 500;                          /* serial   */
    #pragma omp parallel for          /* parallel */
    for (j=0; j<M; j++)
        p[j] = q[j] - r[j];
    printf("program done\n");         /* serial   */

OpenMP Constructs
OpenMP's constructs:
1. Parallel regions
2. Worksharing (for/DO, sections, ...)
3. Data environment (shared, private, ...)
4. Synchronization (barrier, flush, ...)
5. Runtime functions/environment variables (omp_get_num_threads(), ...)
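As a quick illustration (a minimal sketch, not taken from the slides), the fragment below touches each of the five categories; the array and its size are made up:

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        int n = 8;
        double a[8];

        #pragma omp parallel                      /* 1. parallel region         */
        {
            int nthreads = omp_get_num_threads(); /* 5. runtime function        */

            #pragma omp for                       /* 2. worksharing             */
            for (int i = 0; i < n; i++)           /* i is private, a is shared  */
                a[i] = (double)i / nthreads;      /* 3. data environment        */

            #pragma omp barrier                   /* 4. synchronization         */
        }
        printf("a[%d] = %f\n", n - 1, a[n - 1]);
        return 0;
    }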

OpenMP: Structured blocks (C/C++)
Most OpenMP constructs apply to structured blocks.
– Structured block: a block with one point of entry at the top and one point of exit at the bottom.
– The only "branches" allowed are STOP statements in Fortran and exit() in C/C++.

Not a structured block (branches into and out of the block):
    if (count == 1) goto more;
    #pragma omp parallel
    {
    more:  do_big_job(id);
           if (++count > 1) goto done;
    }
    done:  if (!really_done()) goto more;

A structured block (the branch stays inside the block):
    #pragma omp parallel
    {
    more:  do_big_job(id);
           if (++count > 1) goto more;
    }
    printf("All done\n");

Structured Block Boundaries
In C/C++: a block is a single statement or a group of statements between braces {}.

    #pragma omp for
    for (I=0; I<N; I++) {
        res[I] = big_calc(I);
        A[I] = B[I] + res[I];
    }

    #pragma omp parallel
    {
        id = omp_get_thread_num();
        A[id] = big_compute(id);
    }

In Fortran: a block is a single statement or a group of statements between directive/end-directive pairs.

C$OMP PARALLEL
10    W(id) = garbage(id)
      res(id) = W(id)**2
      if (res(id) .gt. 1.) goto 10
C$OMP END PARALLEL

C$OMP PARALLEL DO
      do I=1,N
         res(I) = bigComp(I)
      end do
C$OMP END PARALLEL DO

OpenMP Parallel Regions
Each thread executes the same code redundantly.

    double A[1000];
    omp_set_num_threads(4);
    #pragma omp parallel
    {
        int ID = omp_get_thread_num();
        pooh(ID, A);
    }
    printf("all done\n");

After omp_set_num_threads(4), the parallel region forks four threads that execute pooh(0,A), pooh(1,A), pooh(2,A), and pooh(3,A) concurrently. A single copy of A is shared between all threads. Threads wait at the end of the region for all threads to finish before proceeding (i.e. a barrier); then a single thread continues with the printf.
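A self-contained version of the fragment above (a sketch; the pooh routine is just a placeholder, since the slide does not define it):

    #include <stdio.h>
    #include <omp.h>

    /* Placeholder body for the slide's pooh(): each thread writes to
       a distinct element of the shared array. */
    void pooh(int id, double A[]) {
        A[id] = 1.0 * id;
    }

    int main(void) {
        double A[1000];
        omp_set_num_threads(4);
        #pragma omp parallel
        {
            int ID = omp_get_thread_num();  /* private: declared inside the region  */
            pooh(ID, A);                    /* A is shared: one copy for all threads */
        }
        /* Implicit barrier at the closing brace of the region. */
        printf("all done\n");
        return 0;
    }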

The OpenMP API: Combined parallel work-share
OpenMP shortcut: put the "parallel" and the work-share on the same line.

    int i;
    double res[MAX];
    #pragma omp parallel
    {
        #pragma omp for
        for (i=0; i<MAX; i++) {
            res[i] = huge();
        }
    }

    int i;
    double res[MAX];
    #pragma omp parallel for
    for (i=0; i<MAX; i++) {
        res[i] = huge();
    }

These two forms are equivalent.

Shared Memory Model
– Data can be shared or private.
– Shared data is accessible by all threads.
– Private data can be accessed only by the thread that owns it.
– Data transfer is transparent to the programmer.
(Slide figure: threads 1-5, each with its own private data, all attached to one shared memory.)

Data Environment: Default storage attributes
Shared memory programming model:
– Most variables are shared by default.
Global variables are SHARED among threads:
– Fortran: COMMON blocks, SAVE variables, MODULE variables
– C: file-scope variables, static variables
But not everything is shared...
– Stack variables in subprograms called from parallel regions are PRIVATE.
– Automatic variables within a statement block are PRIVATE.
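A small C sketch of how these defaults play out (not from the slides; the names are made up for illustration):

    #include <stdio.h>
    #include <omp.h>

    double global_scale = 2.0;             /* file scope: SHARED by all threads */

    void work(int id) {
        double local = id * global_scale;  /* stack variable in a routine called
                                              from the parallel region: PRIVATE */
        printf("thread %d: local = %f\n", id, local);
    }

    int main(void) {
        static int calls = 0;              /* static: SHARED, so protect updates */
        #pragma omp parallel
        {
            int id = omp_get_thread_num(); /* automatic, inside the block: PRIVATE */
            work(id);
            #pragma omp critical
            calls++;
        }
        printf("parallel region entered by %d threads\n", calls);
        return 0;
    }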

Critical Construct

    sum = 0;
    #pragma omp parallel private(lsum)
    {
        lsum = 0;
        #pragma omp for
        for (i=0; i<N; i++) {
            lsum = lsum + A[i];
        }
        #pragma omp critical
        {
            sum += lsum;
        }
    }

Threads wait their turn; only one thread at a time executes the critical section.

Reduction Clause

    sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (i=0; i<N; i++) {
        sum = sum + A[i];
    }

Here sum is a shared variable.
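For comparison, the same summation written as a complete function (a sketch, not from the slides; the function name and signature are illustrative):

    #include <omp.h>

    /* Each thread accumulates into a private copy of sum; OpenMP combines
       the copies when the loop ends. This is equivalent to the explicit
       lsum/critical pattern on the previous slide, but shorter. */
    double array_sum(const double *A, int N) {
        double sum = 0.0;
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++) {
            sum += A[i];
        }
        return sum;
    }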

OpenMP example: PI

!$OMP PARALLEL PRIVATE(X, i)
      write (*,1004) omp_get_thread_num()
!$OMP DO REDUCTION (+:sum)
      DO i = 1, , 1
         x = step*((-0.5)+i)
         sum = sum + 4.0/(1.0+x**2)
      ENDDO
!$OMP END DO NOWAIT
!$OMP END PARALLEL
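The slide shows only a fragment: the loop's upper bound (the number of steps) and the setup of step and sum are not visible. A complete C version of the same classic computation might look like this (num_steps is chosen arbitrarily here):

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        const long num_steps = 1000000;           /* not given on the slide */
        const double step = 1.0 / (double)num_steps;
        double sum = 0.0;

        #pragma omp parallel
        {
            printf("hello from thread %d\n", omp_get_thread_num());
            #pragma omp for reduction(+:sum)
            for (long i = 1; i <= num_steps; i++) {
                double x = step * ((double)i - 0.5);  /* midpoint of strip i */
                sum += 4.0 / (1.0 + x * x);
            }
        }
        printf("pi is approximately %.12f\n", step * sum);
        return 0;
    }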

OpenMP example: Running OpenMP applications on Steele

    qsub -I -l nodes=1:ppn=8
    module avail
    module load intel                  (provides icc/ifort)
    ifort omp_pi.f -o omp_pi -openmp
    setenv OMP_NUM_THREADS 4
    time ./omp_pi

Summary
OpenMP is great for parallel programming:
– It allows parallel programs to be written incrementally.
– Sequential programs can be enhanced with OpenMP directives, leaving the original program essentially intact.
– Compared to MPI: you don't need to partition data and insert message-passing calls in OpenMP programs.

Resources