Parallel Processing (CS 667) Lecture 5: Shared Memory Parallel Programming with OpenMP* Jeremy R. Johnson

Introduction
Objective: To further study the shared memory model of parallel programming and to introduce the OpenMP standard for shared memory parallel programming.
Topics:
  OpenMP vs. Pthreads (hello_pthreads.c, hello_openmp.c)
  Parallel regions and the execution model
  Data parallelism with loops
  Shared vs. private variables
  Scheduling and chunk size
  Synchronization and reduction variables
  Functional parallelism with parallel sections
  Case studies

OpenMP
  Extension to FORTRAN and C/C++ for the shared memory model
  Uses directives (comments in FORTRAN, pragmas in C/C++) that are ignored without compiler support; some library support is also required
  Shared memory model:
    parallel regions and loop-level parallelism
    implicit thread model; communication via a shared address space
    private vs. shared variables (declaration)
    explicit synchronization via directives (e.g. critical)
    library routines for returning thread information (e.g. omp_get_num_threads(), omp_get_thread_num())
    environment variables used to provide system info (e.g. OMP_NUM_THREADS)
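A minimal sketch tying these pieces together (illustrative, not from the original slides): a parallel directive, two library routines, and compilation with OpenMP support.

  #include <stdio.h>
  #include <omp.h>                /* OpenMP library routines */

  int main(void)
  {
    #pragma omp parallel          /* directive: fork a team of threads */
    {
      int id = omp_get_thread_num();        /* library routine */
      int nthreads = omp_get_num_threads(); /* library routine */
      printf("thread %d of %d\n", id, nthreads);
    }                             /* implicit barrier and join */
    return 0;
  }

Compile with gcc -fopenmp; without OpenMP support the pragma is ignored and the program runs with a single thread.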

Benefits
  Provides incremental parallelism
  Small increase in code size
  Simpler model than message passing
  Easier to use than a thread library
  With hardware and compiler support, allows finer granularity than message passing

Further Information
  Initiated by SGI; adopted as a standard in 1997
  www.openmp.org
  computing.llnl.gov/tutorials/openMP
  Chandra, Dagum, Kohr, Maydan, McDonald, and Menon, "Parallel Programming in OpenMP," Morgan Kaufmann Publishers, 2001.
  Chapman, Jost, and van der Pas, "Using OpenMP: Portable Shared Memory Parallel Programming," The MIT Press, 2008.

Shared vs. Distributed Memory
  [Figure: shared memory: processors P0, P1, ..., Pn all access a single shared Memory; distributed memory: processors P0, P1, ..., Pn each have a local memory M0, M1, ..., Mn and communicate over an Interconnection Network.]

Shared Memory Programming Model
  Shared memory programming does not require physically shared memory, as long as logically shared memory is supported (in either hardware or software).
  With logically shared memory, the cost of a memory access may differ depending on the physical location of the data.
  UMA - uniform memory access
    SMP - symmetric multiprocessor; typically memory is connected to the processors via a bus
  NUMA - non-uniform memory access
    typically physically distributed memory connected via an interconnection network
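As a small aside (not from the original slides), the OpenMP runtime can report how many processors the shared-memory machine exposes via omp_get_num_procs():

  #include <stdio.h>
  #include <omp.h>

  int main(void)
  {
    int nprocs = omp_get_num_procs();   /* processors available to the program */
    omp_set_num_threads(nprocs);        /* request one thread per processor */
    printf("%d processors available\n", nprocs);
    return 0;
  }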

Hello_openmp.c
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(int argc, char **argv)
{
  int n;
  if (argc > 1) {
    n = atoi(argv[1]);
    omp_set_num_threads(n);
  }
  printf("Number of threads = %d\n", omp_get_num_threads());
  #pragma omp parallel
  {
    int id = omp_get_thread_num();
    printf("Hello World from %d\n", id);
    if (id == 0)
      printf("Number of threads = %d\n", omp_get_num_threads());
  }
  exit(0);
}

Compiling & Running Hello_openmp
% gcc -fopenmp hello_openmp.c -o hello
% ./hello 4
Number of threads = 1
Hello World from 1
Hello World from 0
Hello World from 3
Number of threads = 4
Hello World from 2

The order of the print statements is nondeterministic. Note that omp_get_num_threads() returns 1 outside the parallel region, which is why the first line reports one thread.
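The thread count can also come from the OMP_NUM_THREADS environment variable instead of the command-line argument (standard OpenMP behavior; the shell syntax below assumes a POSIX shell):

  % export OMP_NUM_THREADS=4
  % ./hello

In that case omp_set_num_threads() is never called and the runtime uses OMP_NUM_THREADS as the default team size.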

Execution Model
  [Figure: fork-join execution. A single master thread runs until it reaches a parallel region; master and slave threads are created implicitly (fork) and execute the region; an implicit barrier synchronization (join) ends the region, after which only the master thread continues. The pattern repeats for each parallel region.]
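A minimal sketch of this fork-join pattern (illustrative, not from the original slides):

  #include <stdio.h>
  #include <omp.h>

  int main(void)
  {
    printf("before: master thread only\n");
    #pragma omp parallel                      /* fork: master and slave threads */
    {
      printf("inside: thread %d\n", omp_get_thread_num());
    }                                         /* join: implicit barrier, slaves finish */
    printf("after: master thread only\n");
    return 0;
  }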

Explicit Barrier
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(int argc, char **argv)
{
  int n;
  if (argc > 1) {
    n = atoi(argv[1]);
    omp_set_num_threads(n);
  }
  printf("Number of threads = %d\n", omp_get_num_threads());
  #pragma omp parallel
  {
    int id = omp_get_thread_num();
    printf("Hello World from %d\n", id);
    #pragma omp barrier
    if (id == 0)
      printf("Number of threads = %d\n", omp_get_num_threads());
  }
  exit(0);
}

Output with Barrier
% ./hellob 4
Number of threads = 1
Hello World from 1
Hello World from 0
Hello World from 2
Hello World from 3
Number of threads = 4

The order of the "Hello World" print statements is nondeterministic; however, because of the barrier, the "Number of threads" print statement always comes at the end.

Hello_pthreads.c
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <errno.h>
#define MAXTHREADS 32

int main(int argc, char **argv)
{
  int error, i, n;
  void hello(int *pid);
  pthread_t tid[MAXTHREADS];
  int pid[MAXTHREADS];

  if (argc > 1) {
    n = atoi(argv[1]);
    if (n > MAXTHREADS) {
      printf("Too many threads\n");
      exit(1);
    }
  }
  pthread_setconcurrency(n);
  printf("Number of threads = %d\n", pthread_getconcurrency());
  for (i = 0; i < n; i++) {
    pid[i] = i;
    error = pthread_create(&tid[i], NULL, (void *(*)(void *))hello, &pid[i]);
  }
  for (i = 0; i < n; i++)
    error = pthread_join(tid[i], NULL);
  exit(0);
}

Hello_pthreads.c
void hello(int *pid)
{
  pthread_t tid;
  tid = pthread_self();
  printf("Hello World from %d (tid = %u)\n", *pid, (unsigned int) tid);
  if (*pid == 0)
    printf("Number of threads = %d\n", pthread_getconcurrency());
}

% gcc -pthread hello.c -o hello
% ./hello 4
Number of threads = 4
Hello World from 0 (tid = 1832728912)
Hello World from 1 (tid = 1824336208)
Hello World from 3 (tid = 1807550800)
Hello World from 2 (tid = 1815943504)

The order of the print statements is nondeterministic.

Types of Parallelism
  Data parallelism: threads execute the same instructions, but on different data (FORK ... LOOP ... JOIN, with the iteration space split among threads).
  Functional parallelism: threads execute different instructions (e.g. functions F1, F2, F3, F4 run concurrently); they can read the same data but should write different data.

Parallel Loop

Serial version:
int a[1000], b[1000];
int main()
{
  int i;
  int N = 1000;
  for (i = 0; i < N; i++) {
    a[i] = i;
    b[i] = N - i;
  }
  for (i = 0; i < N; i++) {
    a[i] = a[i] + b[i];
  }
}

OpenMP version:
int a[1000], b[1000];
int main()
{
  int i;
  int N = 1000;
  // Serial initialization
  for (i = 0; i < N; i++) {
    a[i] = i;
    b[i] = N - i;
  }
  // Combined parallel for: creates the thread team and splits the iterations
  #pragma omp parallel for shared(a,b) private(i) schedule(static)
  for (i = 0; i < N; i++) {
    a[i] = a[i] + b[i];
  }
}

Scheduling of Parallel Loop
  [Figure: stripmining. The iterations of the a[i] = a[i] + b[i] loop are interleaved across threads 0, 1, 2, ..., Nthreads-1: thread tid handles iterations tid, tid + Nthreads, tid + 2*Nthreads, ...]

Implementation of Parallel Loop
void vadd(int *id)
{
  int i;
  // Each thread handles the iterations i = *id, *id + numthreads, ...
  for (i = *id; i < N; i += numthreads) {
    a[i] = a[i] + b[i];
  }
}

for (i = 0; i < numthreads; i++) {
  id[i] = i;
  error = pthread_create(&tid[i], NULL, (void *(*)(void *))vadd, &id[i]);
}
for (i = 0; i < numthreads; i++)
  error = pthread_join(tid[i], NULL);

Scheduling Chunks of Parallel Loop
  [Figure: the iteration space of the a[i] = a[i] + b[i] loop is divided into chunks (chunk 0, chunk 1, chunk 2, ..., chunk Nthreads-1, ...); chunks are assigned to threads 0, 1, 2, ..., tid, ... in round-robin fashion, so each thread processes a contiguous chunk of iterations at a time.]

Implementation of Chunking

OpenMP version:
#pragma omp parallel for shared(a,b) private(i) schedule(static,CHUNK)
for (i = 0; i < N; i++) {
  a[i] = a[i] + b[i];
}

Pthreads version:
void vadd(int *id)
{
  int i, j;
  // Each thread handles every numthreads-th chunk of CHUNK iterations
  for (i = *id * CHUNK; i < N; i += numthreads * CHUNK) {
    for (j = 0; j < CHUNK; j++)
      a[i+j] = a[i+j] + b[i+j];
  }
}
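As an aside not on the original slides, OpenMP also offers dynamic and guided scheduling, where chunks are handed out at run time rather than in a fixed round-robin order (CHUNK is the same illustrative constant used above):

#pragma omp parallel for shared(a,b) private(i) schedule(dynamic,CHUNK)
for (i = 0; i < N; i++) {
  a[i] = a[i] + b[i];    /* each idle thread grabs the next CHUNK iterations */
}

Dynamic scheduling helps when iterations have uneven cost, at the price of extra scheduling overhead; schedule(guided,CHUNK) starts with larger chunks that shrink toward CHUNK.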

Race Condition
int x[10000000];

int main(int argc, char **argv)
{
  int sum = 0;
  …….
  omp_set_num_threads(numcounters);
  for (i = 0; i < numcounters*limit; i++)
    x[i] = 1;
  #pragma omp parallel for schedule(static) private(i) shared(sum,x)
  for (i = 0; i < numcounters*limit; i++) {
    sum = sum + x[i];   // unsynchronized update of the shared variable sum: a race condition
    if (i == 0)
      printf("num threads = %d\n", omp_get_num_threads());
  }
}

Critical Sections
int x[10000000];

int main(int argc, char **argv)
{
  int sum = 0;
  …….
  #pragma omp parallel for schedule(static) private(i) shared(sum,x)
  for (i = 0; i < numcounters*limit; i++) {
    #pragma omp critical(sum)
    sum = sum + x[i];
  }
}
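For a simple update like this, the atomic directive is a lighter-weight alternative to a named critical section (an aside, not from the original slides); it still serializes updates to sum, so the reduction clause on the next slide generally performs better:

#pragma omp parallel for schedule(static) private(i) shared(sum,x)
for (i = 0; i < numcounters*limit; i++) {
  #pragma omp atomic
  sum = sum + x[i];      /* atomic update of the shared variable sum */
}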

Reduction Variables
int x[10000000];

int main(int argc, char **argv)
{
  int sum = 0;
  …….
  #pragma omp parallel for schedule(static) private(i) shared(x) reduction(+:sum)
  for (i = 0; i < numcounters*limit; i++) {
    sum = sum + x[i];
  }
}

Reduction
  [Figure: the elements of x[] are summed in parallel; each thread computes a partial sum over its portion, and the partial sums are then combined (+) into the total sum.]

Implementing Reduction
#pragma omp parallel shared(sum,x)
{
  int i;
  int localsum = 0;
  int id;
  id = omp_get_thread_num();
  // Each thread accumulates its own private partial sum
  for (i = id; i < numcounters*limit; i += numcounters)
    localsum = localsum + x[i];
  // Partial sums are combined under mutual exclusion
  #pragma omp critical(sum)
  sum = sum + localsum;
}

Functional Parallelism Example
int main()
{
  int i;
  double a[N], b[N], c[N], d[N];
  // Parallel sections: each section is executed by a different thread
  #pragma omp parallel shared(a,b,c,d) private(i)
  {
    #pragma omp sections
    {
      #pragma omp section
      for (i = 0; i < N; i++)
        c[i] = a[i] + b[i];
      #pragma omp section
      for (i = 0; i < N; i++)
        d[i] = a[i] * b[i];
    }
  }
}