Introduction to OpenMP Eric Aubanel Advanced Computational Research Laboratory Faculty of Computer Science, UNB Fredericton, New Brunswick.


Shared Memory
[Diagram: several processes sharing a single address space]

Shared Memory Multiprocessor

Distributed vs. DSM
[Diagram: distributed memory shows separate address spaces, each with its own processes, connected by a network; distributed shared memory (DSM) shows memories connected by a network to form a single global address space]

Parallel Programming Alternatives
Use a new programming language
Use an existing sequential language modified to handle parallelism
Use a parallelizing compiler
Use library routines/compiler directives with an existing sequential language
–Shared memory (OpenMP) vs. distributed memory (MPI)

What is Shared Memory Parallelization?
All processors can access all the memory in the parallel system (one address space).
The time to access the memory may not be equal for all processors
–not necessarily a flat memory
Parallelizing on an SMP does not reduce CPU time
–it reduces wallclock time
Parallel execution is achieved by generating multiple threads which execute in parallel
The number of threads is (in principle) independent of the number of processors
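As a small C sketch of the last point (an illustration, not part of the original slides), a program may request any number of threads, regardless of how many processors exist:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    omp_set_num_threads(8);   /* request 8 threads, even on fewer processors */
    #pragma omp parallel
    {
        /* each thread reports its id; the count need not match the hardware */
        printf("thread %d of %d\n", omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}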

Threads: The Basis of SMP Parallelization
Threads are not full UNIX processes. They are lightweight, independent "collections of instructions" that execute within a UNIX process.
All threads created by the same process share the same address space.
–a blessing and a curse: "inter-thread" communication is efficient, but it is easy to stomp on memory and create race conditions.
Because they are lightweight, they are (relatively) inexpensive to create and destroy.
–Creation of a thread can take three orders of magnitude less time than process creation!
Threads can be created and assigned to multiple processors: this is the basis of SMP parallelism!

Processes vs. Threads
[Diagram: separate processes, each with its own instruction pointer (IP), stack, code, and heap; a multithreaded process in which the threads share the code and heap but each thread has its own IP and stack]

Methods of SMP Parallelism
1. Explicit use of threads
–Pthreads: see "Pthreads Programming" from O'Reilly & Associates, Inc.
2. Using a parallelizing compiler and its directives, you can generate pthreads "under the covers."
–can use vendor-specific directives (e.g. !SMP$)
–can use industry-standard directives (e.g. !$OMP and OpenMP)
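A minimal Pthreads sketch of option 1, explicit thread use (the thread count and function names are illustrative, not from the slides):

#include <stdio.h>
#include <pthread.h>

/* work done by each explicitly created thread */
static void *work(void *arg)
{
    int id = *(int *)arg;
    printf("hello from thread %d\n", id);
    return NULL;
}

int main(void)
{
    pthread_t tid[4];
    int ids[4];
    for (int i = 0; i < 4; i++) {
        ids[i] = i;
        pthread_create(&tid[i], NULL, work, &ids[i]);  /* create each thread explicitly */
    }
    for (int i = 0; i < 4; i++)
        pthread_join(tid[i], NULL);                    /* wait for all threads to finish */
    return 0;
}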

OpenMP
1997: a group of hardware and software vendors announced their support for OpenMP, a new API for multi-platform shared-memory programming (SMP) on UNIX and Microsoft Windows NT platforms.
OpenMP provides compiler directives, embedded in C/C++ or Fortran source code (as comments in Fortran, as pragmas in C/C++), for
–scoping data
–specifying work load
–synchronization of threads
OpenMP provides function calls for obtaining information about threads.
–e.g., omp_get_num_threads(), omp_get_thread_num()
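A minimal C sketch of these three kinds of functionality (the array, its size, and the variable names are illustrative, not from the original slides): a work-sharing loop with explicit data scoping and a synchronized update.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int i, n = 100, sum = 0;
    int a[100];
    for (i = 0; i < n; i++) a[i] = i;

    /* work sharing + data scoping: the loop is split among threads;
       i is private to each thread, while a, n and sum are shared */
    #pragma omp parallel for shared(a, n, sum) private(i)
    for (i = 0; i < n; i++) {
        /* synchronization: only one thread at a time updates sum */
        #pragma omp critical
        sum += a[i];
    }

    printf("sum = %d, threads available = %d\n", sum, omp_get_max_threads());
    return 0;
}

Compile with an OpenMP-aware compiler (e.g. gcc -fopenmp).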

OpenMP example
subroutine saxpy(z, a, x, y, n)
integer i, n
real z(n), a, x(n), y
!$omp parallel do
do i = 1, n
  z(i) = a * x(i) + y
end do
return
end

OpenMP Threads
1. All OpenMP programs begin as a single process: the master thread
2. FORK: the master thread then creates a team of parallel threads
3. Parallel region statements are executed in parallel among the various team threads
4. JOIN: threads synchronize and terminate, leaving only the master thread
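As an illustration of this fork/join behaviour, a small hedged C sketch (the messages are assumptions, not from the slides):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    printf("before: only the master thread runs\n");    /* serial part */

    #pragma omp parallel                                /* FORK: team of threads created */
    {
        printf("inside: thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }                                                   /* JOIN: threads synchronize and end */

    printf("after: only the master thread remains\n");  /* serial again */
    return 0;
}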

Private vs. Shared Variables
[Diagram: serial execution — z, a, x, y, n, and i all live in global shared memory, and all data references go to global shared memory; parallel execution — references to z, a, x, y, n still go to global shared memory, while each thread has a private copy of i and references to i go to that private copy]

Division of Work
subroutine saxpy(z, a, x, y, n)
integer i, n
real z(n), a, x(n), y
!$omp parallel do
do i = 1, n
  z(i) = a * x(i) + y
end do
return
end
[Diagram: with n = 40 and 4 threads, the iterations are divided as i = 1–10, 11–20, 21–30, and 31–40; z(1:40), x(1:40), a, y, and n live in global shared memory, while each thread's loop index i lives in its local (private) memory]

Variable Scoping
The most difficult part of shared memory parallelization:
–What memory is shared
–What memory is private (i.e. each processor has its own copy)
–How private memory is treated vis-à-vis the global address space
Variables are shared by default, except for the loop index in a parallel do.
This must mesh with the Fortran view of memory:
–Global: shared by all routines
–Local: local to a given routine
–Saved vs. non-saved variables (through the SAVE statement or -save option)
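A hedged C illustration of these default rules (the variable names and sizes are assumptions): everything visible at the directive is shared unless scoped otherwise, the index of a parallel loop is private, and scratch variables must be made private explicitly:

#include <stdio.h>
#include <omp.h>

#define N 8

int main(void)
{
    double x[N], scale = 2.0;   /* shared by default */
    double tmp;                 /* must be made private: each thread needs its own copy */
    int i;                      /* the index of a parallel for is private automatically */

    #pragma omp parallel for private(tmp)
    for (i = 0; i < N; i++) {
        tmp = scale * i;        /* private scratch value */
        x[i] = tmp + 1.0;       /* shared array; each thread writes its own elements */
    }

    for (i = 0; i < N; i++)
        printf("x[%d] = %f\n", i, x[i]);
    return 0;
}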

Static vs. Automatic Variables
The Fortran 77 standard allows subprogram local variables to become undefined between calls, unless saved with a SAVE statement.

        STATIC      AUTOMATIC
AIX     (default)   -qnosave
IRIX    -static     -automatic (default)
SunOS   (default)   -stackvar

OpenMP Directives in Fortran
Line continuation:
Fixed form:
!$OMP PARALLEL DO
!$OMP&PRIVATE (JMAX)
!$OMP&SHARED(A, B)
Free form:
!$OMP PARALLEL DO &
!$OMP PRIVATE (JMAX) &
!$OMP SHARED(A, B)

OpenMP in C
Same functionality as OpenMP for Fortran
Differences in syntax:
–#pragma omp for
Differences in variable scoping:
–variables "visible" when #pragma omp parallel is encountered are shared by default
–static variables declared within a parallel region are also shared
–heap-allocated memory (malloc) is shared (but the pointer can be private)
–automatic storage declared within a parallel region is private (i.e., on the stack)
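A short C sketch of those scoping rules (the array size and names are illustrative, not from the slides):

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(void)
{
    int n = 16;
    double *z = malloc(n * sizeof *z);    /* heap memory: shared (the pointer itself may be private) */

    #pragma omp parallel
    {
        static int visits = 0;            /* static inside the parallel region: still shared */
        int id = omp_get_thread_num();    /* automatic inside the region: private (on the stack) */

        #pragma omp atomic
        visits++;                         /* shared counter, so the update must be synchronized */

        #pragma omp for
        for (int i = 0; i < n; i++)
            z[i] = (double)id;            /* each thread fills its own chunk of the shared array */

        #pragma omp single
        printf("threads that entered the region: %d\n", visits);
    }

    free(z);
    return 0;
}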

OpenMP Overhead
The overhead for parallelization is large (e.g., the cycles consumed by a parallel do over 16 processors of an SGI Origin 2000)
–the parallel work construct must be large enough to overcome the overhead
–rule of thumb: it takes about 10 kFLOPS of work to amortize the overhead
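One rough, hedged way to check whether a loop is big enough to amortize the fork/join overhead is simply to time the serial and parallel versions (the loop size and names are assumptions, not from the slides):

#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void)
{
    static double x[N];
    double t;

    t = omp_get_wtime();
    for (int i = 0; i < N; i++)          /* serial version */
        x[i] = i * 0.5;
    printf("serial:   %f s\n", omp_get_wtime() - t);

    t = omp_get_wtime();
    #pragma omp parallel for             /* parallel version pays the fork/join overhead */
    for (int i = 0; i < N; i++)
        x[i] = i * 0.5;
    printf("parallel: %f s\n", omp_get_wtime() - t);

    return 0;
}

For a loop this trivial, the parallel time may not beat the serial time unless N is large.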

OpenMP Use
How is OpenMP typically used?
OpenMP is usually used to parallelize loops:
–Find your most time-consuming loops.
–Split them up between threads.
Better scaling can be obtained using OpenMP parallel regions, but this can be tricky!
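A typical use, sketched in C (the dot-product loop and its size are illustrative assumptions): find the time-consuming loop and split it among threads, here with a reduction to combine per-thread partial sums:

#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void)
{
    static double a[N], b[N];
    double dot = 0.0;

    for (int i = 0; i < N; i++) { a[i] = 1.0; b[i] = 2.0; }

    /* the time-consuming loop, split among threads; reduction gives each
       thread a private partial sum that is combined at the end */
    #pragma omp parallel for reduction(+:dot)
    for (int i = 0; i < N; i++)
        dot += a[i] * b[i];

    printf("dot = %f\n", dot);
    return 0;
}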

OpenMP vs. MPI
OpenMP:
–Only for shared memory computers
–Easy to incrementally parallelize
–More difficult to write highly scalable programs
–Small API based on compiler directives and limited library routines
–Same program can be used for sequential and parallel execution
–Shared vs. private variables can cause confusion
MPI:
–Portable to all platforms
–Parallelize all or nothing
–Vast collection of library routines
–Possible, but difficult, to use the same program for serial and parallel execution
–Variables are local to each processor

References
Parallel Programming in OpenMP, by Chandra et al. (Morgan Kaufmann)
Multimedia tutorial at Boston University:
–scv.bu.edu/SCV/Tutorials/OpenMP/
Lawrence Livermore online tutorial
European Workshop on OpenMP (EWOMP)