Parallel Programming 0024 Week 10 Thomas Gross Spring Semester 2010 May 20, 2010
Outline Evaluation Discussion of Homework 09 Presentation of Homework 10 – OpenMP revisited – JOMP – Block Matrix Multiplication Questions?
Evaluation – Many thanks! 16 questionnaires + well prepared + competent + enthusiastic + friendly and helpful + good questions - English pronunciation + interesting and informative examples + not only PowerPoint + good working atmosphere + good explanations
OpenMP in a Nutshell OpenMP is an API that consists of three parts – Directive-based language extensions – Runtime library routines – Environment variables Three categories of language extensions – Control structures to express parallelism – Data environment constructs to express communication – Synchronization constructs to coordinate threads
Parallel Control Structures Alter the flow of control in a program – fork/join model: a single master thread forks worker threads at the start of a parallel region; when the region ends the workers join and the master thread continues (see the sketch below)
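As an illustration, a minimal sketch of a JOMP program that exercises all three parts of the API and follows the fork/join model. The //omp comment syntax is the one used on the assignment slides below; the runtime call jomp.runtime.OMP.getThreadNum() and the jomp.threads property are assumptions to be checked against the JOMP documentation:

// HelloOmp.jomp; a sketch only, run it through the JOMP compiler first
public class HelloOmp {
    public static void main(String[] args) {
        System.out.println("single master thread");
        // Directive: fork worker threads for the parallel region
        //omp parallel
        {
            // Runtime library routine (assumed name): query the thread id
            int id = jomp.runtime.OMP.getThreadNum();
            System.out.println("parallel region, thread " + id);
        }
        // Implicit join: workers terminate, master continues
        System.out.println("master thread continues");
        // The thread count comes from the environment, e.g. (assumed):
        //   java -Djomp.threads=4 HelloOmp
    }
}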
Parallel Control Structures Two kinds of parallel constructs – Create multiple threads (parallel directive) – Divide work among an existing set of threads Parallel directive – Start a parallel region For directive – Exploit data-level parallelism (parallelize loops) Sections directive – Exploit task-level parallelism (parallelize tasks) (Task directive (OpenMP 3.0)) – Tasks with ordering (not possible with sections)
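A sketch of the two worksharing directives, assuming JOMP accepts the standard for and sections syntax as //omp comments (the task directive is OpenMP 3.0 and not available in JOMP):

public class Worksharing {
    public static void main(String[] args) {
        double[] a = new double[1000];
        //omp parallel shared(a)
        {
            // for directive: data-level parallelism; the iterations
            // are divided among the threads of the parallel region
            //omp for
            for (int i = 0; i < 1000; i++) {
                a[i] = i * i;
            }
            // sections directive: task-level parallelism; each
            // section is executed by one thread
            //omp sections
            {
                //omp section
                { System.out.println("task 1"); }
                //omp section
                { System.out.println("task 2, independent of task 1"); }
            }
        }
    }
}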
Communication & Data Environment Master thread (MT) exists the entire execution MT encounters a parallel construct – Create a set of worker threads – Stack is private to each thread Data Scoping – Shared variable: single storage location – Private variable: multiple storage locations (1 per thread)
Synchronization Coordination of the execution of multiple threads Critical directive: implement mutual exclusion – Exclusive access for a single thread at a time Barrier directive: event synchronization – Signal the occurrence of an event (all threads have reached the barrier)
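A sketch of barrier-based event synchronization, assuming JOMP supports the standard //omp barrier directive; phaseOne and phaseTwo are hypothetical helpers:

public class Phases {
    public static void main(String[] args) {
        double[] data = new double[100];
        //omp parallel shared(data)
        {
            phaseOne(data);  // every thread produces some values
            // Event: no thread passes until all have finished phase one
            //omp barrier
            phaseTwo(data);  // now safe to read the values produced
                             // by the other threads
        }
    }
    static void phaseOne(double[] d) { /* hypothetical work */ }
    static void phaseTwo(double[] d) { /* hypothetical work */ }
}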
Exploiting Loop-Level Parallelism Important: program correctness Data dependence: – Two threads access the same location and at least one of them writes to it – Example for (i = 1; i < N; i++) a[i] = a[i] + a[i - 1]; Loop-carried dependence
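Unrolling the first iterations makes the dependence chain explicit: every iteration reads the element written by the previous one, so the iterations cannot run concurrently:

a[1] = a[1] + a[0];  // writes a[1]
a[2] = a[2] + a[1];  // reads the a[1] written just before
a[3] = a[3] + a[2];  // reads the a[2] written just before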
Can the loops be parallelized?
Loop 1: for (i = 1; i < n; i += 2) a[i] = a[i] + a[i - 1] – No dependence: only odd elements are written, only even elements are read
Loop 2: for (i = 0; i < n/2; i++) a[i] = a[i] + a[i + n/2] – No dependence: the written half a[0..n/2-1] and the read half a[n/2..n-1] do not overlap
Loop 3: for (i = 0; i < n/2+1; i++) a[i] = a[i] + a[i + n/2] – Dependence: iteration 0 reads a[0+n/2], iteration n/2 writes a[n/2]
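A dependence-free loop such as loop 2 can be handed to the for directive directly; a minimal sketch in JOMP syntax (array size and thread setup assumed):

public class HalfAdd {
    public static void main(String[] args) {
        int n = 1000;
        double[] a = new double[n];
        //omp parallel shared(a, n)
        {
            // Safe: the written half a[0..n/2-1] and the read half
            // a[n/2..n-1] never overlap, so iterations are independent
            //omp for
            for (int i = 0; i < n / 2; i++) {
                a[i] = a[i] + a[i + n / 2];
            }
        }
    }
}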
Important directives for the assignment //omp parallel shared(a,b) private(c,d) – Starts a parallel region – Shared: the variable is shared across all threads – Private: each thread maintains a private copy //omp for schedule(dynamic or static) – Distributes loop iterations to the worker threads – Dynamic: chunks of iterations are assigned to threads at runtime – Static: chunks are assigned before the loop executes
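A sketch contrasting the two schedules; cheapWork and costlyWork are hypothetical stand-ins for uniform and irregular per-iteration work:

public class Schedules {
    public static void main(String[] args) {
        double[] a = new double[1000];
        //omp parallel shared(a)
        {
            // static: chunks are fixed before the loop runs; low
            // overhead, fine when every iteration costs the same
            //omp for schedule(static)
            for (int i = 0; i < 1000; i++) {
                a[i] = cheapWork(i);
            }
            // dynamic: threads grab the next chunk at runtime; more
            // overhead, but balances irregular iterations
            //omp for schedule(dynamic)
            for (int i = 0; i < 1000; i++) {
                a[i] += costlyWork(i);
            }
        }
    }
    static double cheapWork(int i) { return 2.0 * i; }
    static double costlyWork(int i) {
        double s = 0;
        for (int k = 0; k <= i; k++) s += Math.sqrt(k);
        return s;
    }
}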
Important directives for the assignment //omp critical Code section is executed by a single thread at a time
Assignment 10 Task 1 – Parallelize an existing implementation with OpenMP – Which loop nest would you parallelize? Do you need a critical section? Task 2 – Implement a block matrix multiplication – Divide the source matrices into sub-matrices – Assign a thread to each sub-matrix (see the sketch below) Which one performs better? Due: 1 week
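A minimal sketch of the task 2 idea, not the assignment solution: square n x n matrices, a block size that divides n, and the block rows of C distributed over the threads; all sizes here are assumptions:

public class BlockMatMul {
    static final int N = 512, BS = 64;  // assumed problem and block size
    public static void main(String[] args) {
        double[][] a = new double[N][N], b = new double[N][N],
                   c = new double[N][N];
        //omp parallel shared(a, b, c)
        {
            // Each thread gets whole block rows of C; distinct threads
            // write distinct c[i][j], so no critical section is needed
            //omp for
            for (int ii = 0; ii < N; ii += BS) {
                for (int jj = 0; jj < N; jj += BS)
                    for (int kk = 0; kk < N; kk += BS)
                        // multiply sub-matrix A(ii,kk) by B(kk,jj) and
                        // accumulate into sub-matrix C(ii,jj)
                        for (int i = ii; i < ii + BS; i++)
                            for (int j = jj; j < jj + BS; j++) {
                                double s = c[i][j];
                                for (int k = kk; k < kk + BS; k++)
                                    s += a[i][k] * b[k][j];
                                c[i][j] = s;
                            }
            }
        }
    }
}

Timing this version against the loop-parallel version from task 1 answers the "which one performs better?" question.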
OpenMP in Java OpenMP is not natively supported by Java JOMP: a source-to-source compiler How to use? – Download the jar file from the course page – Add the external jar to your project (classpath) – Perform the following steps:
java jomp.compiler.Jomp file (translates file.jomp into file.java)
javac file.java
java file
Any questions?