OpenMP-1
Computer Engg, IIT(BHU)
3/12/2013

OpenMP is a portable, multiprocessing API for shared memory computers. OpenMP is not a "language"; instead, it specifies a set of compiler directives and library routines for an existing language (Fortran, C/C++) to support parallel programming on a shared memory machine.

The Idea of OpenMP
On a single processor, multithreading generally occurs by time-division multiplexing (as in multitasking): the processor switches between different threads. This context switching generally happens frequently enough that the user perceives the threads or tasks as running at the same time. On a multiprocessor or multi-core system, the threads or tasks actually run at the same time, with each processor or core running a particular thread or task.

The Goals of OpenMP
- Standardization
- Lean and Mean
- Ease of Use
- Portability

Programming Language
OpenMP is a multithreading method of parallelization in which a master thread "forks" a specified number of slave threads and a task is divided among them. The threads then run concurrently, with the runtime environment allocating threads to different processors. The section of code that is meant to run in parallel is marked with a preprocessor directive that causes the threads to form before the section is executed. After the parallelized code has executed, the threads "join" back into the master thread, which continues onward to the end of the program.
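As a minimal sketch of this fork-join behaviour (an illustration, not part of the original slides), the following C program forks a team with #pragma omp parallel and joins it again at the closing brace; compile with an OpenMP-aware compiler, e.g. gcc -fopenmp hello.c:

/* Minimal fork-join sketch (illustration, not from the slides). */
#include <stdio.h>
#include <omp.h>

int main(void)
{
    printf("before: only the master thread runs here\n");

    #pragma omp parallel            /* fork: a team of threads is created */
    {
        printf("hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }                               /* join: the team disbands, master continues */

    printf("after: back to a single thread\n");
    return 0;
}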

The Main Advantages
- No message passing.
- OpenMP directives or library calls may be incorporated incrementally.
- The code remains, in effect, a serial code.
- Code size increase is generally smaller.
- OpenMP-enabled codes tend to be more readable.
- Vendor involvement.

OpenMP Programming Model
- Shared memory model
- Thread-based parallelism
- Explicit parallelism
- Fork-join model
- Compiler-directive based
- Nested parallelism
- Dynamic threads

OpenMP API
The OpenMP API consists of three distinct components:
- Compiler directives
- Runtime library routines
- Environment variables

Compiler Directives
OpenMP compiler directives are used for various purposes:
- Spawning a parallel region
- Dividing blocks of code among threads
- Distributing loop iterations between threads
- Serializing sections of code
- Synchronization of work among threads

Compiler directives have the following syntax:

sentinel directive-name [clause, ...]
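As a sketch of that anatomy (not from the slides): in C/C++ the sentinel is #pragma omp, and in Fortran it is !$omp, so a complete directive with clauses looks like this:

/* Directive anatomy in C (illustration): sentinel, directive name, clauses. */
#include <omp.h>

int main(void)
{
    double a[100];

    /* sentinel "#pragma omp", directive-name "parallel for",
     * clauses "shared(a) schedule(static)" */
    #pragma omp parallel for shared(a) schedule(static)
    for (int i = 0; i < 100; i++)
        a[i] = 2.0 * i;

    return 0;
}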

Run-Time Library Routines
These routines are used for a variety of purposes:
- Setting and querying the number of threads
- Querying a thread's unique identifier (thread ID), a thread's ancestor's identifier, and the thread team size
- Setting and querying the dynamic threads feature
- Querying whether execution is in a parallel region, and at what level
- Setting and querying nested parallelism
- Setting, initializing and terminating locks and nested locks
- Querying wall clock time and resolution
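A short sketch (illustration, not from the slides) using a few of these routines, all of which are standard OpenMP calls:

/* Runtime routines: thread IDs, team size, dynamic threads, wall-clock timing. */
#include <stdio.h>
#include <omp.h>

int main(void)
{
    omp_set_dynamic(0);           /* disable dynamic adjustment of team size */
    omp_set_num_threads(4);       /* request a team of four threads */

    double t0 = omp_get_wtime();

    #pragma omp parallel
    {
        printf("thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }

    printf("elapsed %f s, timer resolution %g s\n",
           omp_get_wtime() - t0, omp_get_wtick());
    return 0;
}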

Environment Variables
These environment variables can be used to control such things as:
- Setting the number of threads
- Specifying how loop iterations are divided
- Binding threads to processors
- Enabling/disabling nested parallelism; setting the maximum levels of nested parallelism
- Enabling/disabling dynamic threads
- Setting thread stack size
- Setting thread wait policy
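As a sketch (not from the slides), the same binary reacts to whatever is set in the environment before it runs; the variable names in the comment are standard OpenMP names, while the values are only examples:

/* Run as, e.g. (bash):
 *   OMP_NUM_THREADS=8 OMP_SCHEDULE="dynamic,4" OMP_DYNAMIC=false ./a.out
 * OMP_NUM_THREADS, OMP_SCHEDULE, OMP_DYNAMIC, OMP_NESTED, OMP_STACKSIZE,
 * OMP_WAIT_POLICY and OMP_PROC_BIND are standard; the values above are examples. */
#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel
    {
        #pragma omp single        /* one thread reports for the whole team */
        printf("team size chosen from the environment: %d\n",
               omp_get_num_threads());
    }
    return 0;
}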

OpenMP Directives
Thread creation
- omp parallel: used to fork additional threads to carry out the work enclosed in the construct in parallel. The original process will be denoted as master thread, with thread ID 0.
Work-sharing constructs: used to specify how to assign independent work to one or all of the threads.
- omp for or omp do: used to split up loop iterations among the threads; also called loop constructs.
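A sketch (illustration, not from the slides) combining the two: the parallel directive forks a team, and omp for divides the loop iterations among its threads:

/* Thread creation plus loop work sharing. */
#include <stdio.h>
#include <omp.h>

#define N 16

int main(void)
{
    int a[N];

    #pragma omp parallel          /* thread creation: fork the team */
    {
        #pragma omp for           /* work sharing: split the iterations */
        for (int i = 0; i < N; i++)
            a[i] = i * i;
    }                             /* implicit barrier, then join */

    printf("a[%d] = %d\n", N - 1, a[N - 1]);
    return 0;
}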

Motivation

Work Sharing Constructs (continued)
- sections: assigns consecutive but independent code blocks to different threads.
- single: specifies a code block that is executed by only one thread; a barrier is implied at the end.
- master: similar to single, but the code block is executed by the master thread only and no barrier is implied at the end.
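A sketch of these three constructs inside one parallel region (illustration, not from the slides):

/* sections, single and master inside a parallel region. */
#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel
    {
        #pragma omp sections          /* independent blocks, one per thread */
        {
            #pragma omp section
            printf("section A on thread %d\n", omp_get_thread_num());

            #pragma omp section
            printf("section B on thread %d\n", omp_get_thread_num());
        }                             /* implicit barrier at the end */

        #pragma omp single            /* exactly one thread; implied barrier */
        printf("single block on thread %d\n", omp_get_thread_num());

        #pragma omp master            /* master thread only; no barrier */
        printf("master block on thread %d\n", omp_get_thread_num());
    }
    return 0;
}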

OpenMP Clauses
Data sharing attribute clauses
- shared: the data within a parallel region is shared, which means visible and accessible by all threads simultaneously. By default, all variables in the work sharing region are shared except the loop iteration counter.
- private: the data within a parallel region is private to each thread, which means each thread will have a local copy and use it as a temporary variable. By default, the loop iteration counters in the OpenMP loop constructs are private.
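A sketch of the two attributes (illustration, not from the slides): n is shared by the whole team, tmp is private so each thread works on its own copy, and the loop counter i is private by default:

/* shared vs. private data-sharing attributes. */
#include <stdio.h>
#include <omp.h>

int main(void)
{
    int n = 8;
    int tmp = -1;

    #pragma omp parallel for shared(n) private(tmp)
    for (int i = 0; i < n; i++) {
        tmp = i * i;                          /* each thread's own tmp */
        printf("thread %d computed %d\n", omp_get_thread_num(), tmp);
    }

    /* The private copies are discarded at the join, so the original tmp
     * is left untouched here. */
    printf("after the loop, tmp = %d\n", tmp);
    return 0;
}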

OpenMP Clauses (continued)
- default: allows the programmer to state that the default data scoping within a parallel region will be either shared or none (for C/C++). The none option forces the programmer to declare each variable in the parallel region using the data sharing attribute clauses.
- firstprivate: like private, except initialized to the original value.
- lastprivate: like private, except the original value is updated after the construct.
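A small sketch of default(none) (illustration, not from the slides): every variable referenced in the region must then be given an explicit attribute, or compilation fails:

/* default(none) forces explicit data-sharing declarations. */
#include <stdio.h>
#include <omp.h>

#define N 8

int main(void)
{
    int flags[N] = {0};
    int n = N;

    /* Removing shared(flags, n) below would make the compiler reject the loop. */
    #pragma omp parallel for default(none) shared(flags, n)
    for (int i = 0; i < n; i++)
        flags[i] = 1;                 /* distinct elements: no race */

    printf("flags[0] = %d, flags[%d] = %d\n", flags[0], n - 1, flags[n - 1]);
    return 0;
}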

OpenMP Clauses (continued)
Synchronization clauses
- critical: the enclosed code block is executed by only one thread at a time, never simultaneously by multiple threads. It is often used to protect shared data from race conditions.
- atomic: similar to critical, but advises the compiler to use special hardware instructions for better performance. Compilers may choose to ignore this suggestion and use a critical section instead.
- ordered: the structured block is executed in the order in which iterations would be executed in a sequential loop.
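A sketch of critical and atomic protecting shared counters (illustration, not from the slides):

/* critical vs. atomic: both serialize the update of a shared counter. */
#include <stdio.h>
#include <omp.h>

int main(void)
{
    int hits_critical = 0, hits_atomic = 0;

    #pragma omp parallel for
    for (int i = 0; i < 1000; i++) {
        #pragma omp critical          /* one thread at a time in this block */
        hits_critical++;

        #pragma omp atomic            /* hardware atomic update where possible */
        hits_atomic++;
    }

    printf("%d %d\n", hits_critical, hits_atomic);   /* both print 1000 */
    return 0;
}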

OpenMP Clauses (continued)
- barrier: each thread waits until all of the other threads of the team have reached this point. A work-sharing construct has an implicit barrier synchronization at the end.
- nowait: specifies that threads completing assigned work can proceed without waiting for all threads in the team to finish. In the absence of this clause, threads encounter a barrier synchronization at the end of the work sharing construct.
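A sketch combining the two (illustration, not from the slides): the first two loops are independent, so their implicit barriers are removed with nowait, and an explicit barrier is placed where all of a[] and b[] really must be ready:

/* nowait removes implicit barriers; an explicit barrier is added where needed. */
#include <stdio.h>
#include <omp.h>

#define N 1024

double a[N], b[N], c[N];

int main(void)
{
    #pragma omp parallel
    {
        #pragma omp for nowait        /* no barrier: threads fall through */
        for (int i = 0; i < N; i++)
            a[i] = 1.0;

        #pragma omp for nowait        /* independent of the first loop */
        for (int i = 0; i < N; i++)
            b[i] = 2.0;

        #pragma omp barrier           /* wait here before reading a[] and b[] */

        #pragma omp for
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];
    }

    printf("c[0] = %f\n", c[0]);
    return 0;
}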

OpenMP Clauses (continued)
Scheduling clauses
- schedule(type, chunk): the iterations in the work sharing construct are assigned to threads according to the scheduling method defined by this clause. The kinds (compared in the sketch below, after dynamic and guided) are:
- static: all the threads are allocated iterations before they execute the loop. The iterations are divided among threads equally by default; however, specifying an integer for the parameter "chunk" allocates "chunk" contiguous iterations to a particular thread.

OpenMP Clauses (continued)
- dynamic: iterations are handed out to threads at run time; once a particular thread finishes its allocated iterations, it returns to get more from those that are left. The parameter "chunk" defines the number of contiguous iterations allocated to a thread at a time.
- guided: a large chunk of contiguous iterations is allocated to each thread dynamically (as above). The chunk size decreases exponentially with each successive allocation, down to the minimum size specified in the parameter "chunk".
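A sketch comparing the schedule kinds on a loop whose per-iteration cost grows with i (illustration, not from the slides; work() is an artificial helper): static tends to load-imbalance here, while dynamic and guided hand out chunks as threads become free:

/* Swap schedule(static, 4) for schedule(dynamic, 4) or schedule(guided, 4)
 * and compare the reported times. */
#include <stdio.h>
#include <omp.h>

#define N 200

double result[N];

static double work(int i)             /* artificial, irregular amount of work */
{
    double s = 0.0;
    for (int k = 0; k < i * 10000; k++)
        s += k * 1e-9;
    return s;
}

int main(void)
{
    double t0 = omp_get_wtime();

    #pragma omp parallel for schedule(static, 4)   /* change the kind here */
    for (int i = 0; i < N; i++)
        result[i] = work(i);

    printf("last = %f, time = %f s\n", result[N - 1], omp_get_wtime() - t0);
    return 0;
}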

OpenMP Clauses (continued)
Initialization
- firstprivate(variables): the data is private to each thread, but initialized using the value of the variable of the same name from the master thread.
- lastprivate(variables): the data is private to each thread. The value of this private data is copied to a global variable of the same name outside the parallel region if the current iteration is the last iteration in the parallelized loop. A variable can be both firstprivate and lastprivate.
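A sketch of both clauses on one loop (illustration, not from the slides): offset starts in every thread with the master's value, and last keeps the value written in the sequentially final iteration after the loop ends:

/* firstprivate initializes the private copies; lastprivate writes back the
 * value from the sequentially last iteration. */
#include <stdio.h>
#include <omp.h>

#define N 10

int main(void)
{
    int offset = 100;        /* copied into each thread's private offset */
    int last = -1;
    int a[N];

    #pragma omp parallel for firstprivate(offset) lastprivate(last)
    for (int i = 0; i < N; i++) {
        a[i] = offset + i;
        last = a[i];         /* the value from iteration N-1 survives the loop */
    }

    printf("last = %d (expected %d)\n", last, 100 + N - 1);
    return 0;
}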

OpenMP Clauses (continued)
Reduction
- reduction(operator | intrinsic : list): each variable in the list has a local copy in each thread, and the values of the local copies are combined (reduced) into a global shared variable. This is very useful when an operation (specified by "operator") is applied to a variable iteratively, so that its value at a particular iteration depends on its value at a previous iteration. Basically, the steps that lead up to each update are parallelized, but the threads combine their partial results into the shared variable in a controlled way, avoiding a race condition.
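A sketch of a sum reduction (illustration, not from the slides): every thread accumulates into its own private copy of sum, and the copies are combined with "+" when the threads join, so there is no race on the shared variable:

/* reduction(+:sum) gives each thread a private partial sum. */
#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void)
{
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += 1.0 / (i + 1.0);   /* partial sums stay private during the loop */

    printf("harmonic sum of the first %d terms: %f\n", N, sum);
    return 0;
}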