Parallel Programming On the IUCAA Clusters
Sunu Engineer

IUCAA Clusters
- The Cluster – cluster of Intel machines running Linux
- Hercules – cluster of HP ES45 quad-processor nodes
- References:

The Cluster
- Four single-processor nodes with 100 Mbps Ethernet interconnect
- 1.4 GHz Intel Pentium 4
- 512 MB RAM
- Linux 2.4 kernel (Red Hat 7.2 distribution)
- MPI – LAM
- PVM – 3.4.3

Hercules
- Four quad-processor nodes with Memory Channel interconnect
- 1.25 GHz Alpha 21264D RISC processor
- 4 GB RAM
- Tru64 5.1A with TruCluster software
- Native MPI
- LAM 7.0
- PVM 3.4.3

Expected Computational Performance
- Intel cluster
  - Processor – 512/590 (SPECint/SPECfp)
  - System GFLOPS ~ 2
  - Benchmarks used – SPECint, SPECfp, HPL
- ES45 cluster
  - Processor ~ 679/960 (SPECint/SPECfp)
  - System GFLOPS ~ 30
  - Benchmarks used – SPECint, SPECfp, HPL

Parallel Programs
- A move towards large-scale distributed programs
- A larger class of problems at higher resolution
- Enhanced levels of detail to be explored
- …

The Starting Point
- Model → single-processor program → multiprocessor program (parallelize an existing serial code)
- Model → multiprocessor program (design for parallelism from the start)

Decomposition of a Single Processor Program
- Temporal
  - Initialization
  - Control
  - Termination
- Spatial
  - Functional
  - Modular
  - Object based

Multi Processor Programs
- Spatial delocalization – dissolving the boundary
- A single spatial coordinate is no longer valid
- A single time coordinate is no longer valid
- Temporal multiplicity
  - Multiple streams running at different rates with respect to an external clock

In Comparison
- Multiple points of initialization
- Distributed control
- Multiple points and times of termination
- Distribution of the activity in space and time

Breaking up a problem

Yet Another way

And another

Amdahl’s Law
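Stated briefly: if a fraction P of a program's work can be parallelized and the remainder is serial, the speedup on N processors is

    S(N) = 1 / ((1 - P) + P/N)

so even with unlimited processors the speedup is bounded by 1/(1 - P).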

Degrees of Refinement
- Fine parallelism
  - Instruction level
  - Program statement level
  - Loop level
- Coarse parallelism
  - Process level
  - Task level
  - Region level

Patterns and Frameworks
- Patterns – documented solutions to recurring design problems
- Frameworks – software and hardware structures implementing the infrastructure

Processes and Threads
- From heavyweight multitasking to lightweight multitasking on a single processor
- From isolated memory spaces to a shared memory space

Posix Threads in Brief
- pthread_create(pthread_t *id, const pthread_attr_t *attributes, void *(*thread_function)(void *), void *arguments)
- pthread_exit
- pthread_join
- pthread_self
- pthread_mutex_init
- pthread_mutex_lock / pthread_mutex_unlock
- Link with -lpthread
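A minimal, self-contained sketch of these calls (the worker function and thread count are illustrative):

#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4   /* illustrative thread count */

/* Each thread prints its argument and its own thread id. */
void *worker(void *arg)
{
    long id = (long) arg;
    printf("Hello from thread %ld (self=%lu)\n",
           id, (unsigned long) pthread_self());
    return NULL;
}

int main(void)
{
    pthread_t threads[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&threads[i], NULL, worker, (void *) i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(threads[i], NULL);   /* wait for all workers */
    return 0;
}

Build with, e.g., cc hello.c -lpthread.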

Multiprocessing Architectures
- Symmetric multiprocessing
- Shared memory
- Space unified
- Different temporal streams
- OpenMP standard

OpenMP Programming
- A set of directives to the compiler expressing shared memory parallelism
- A small library of functions
- Environment variables
- Standard language bindings defined for Fortran, C and C++

OpenMP Example

#include <stdio.h>
#include <omp.h>

int main(int argc, char **argv)
{
    #pragma omp parallel
    {
        printf("Hello World from %d\n", omp_get_thread_num());
    }
    return 0;
}

C An openMP program
      program openmp
      integer omp_get_thread_num
!$OMP PARALLEL
      print *, "Hello world from", omp_get_thread_num()
!$OMP END PARALLEL
      stop
      end
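Building is compiler specific; one common recipe, assuming GNU compilers with OpenMP support (file names illustrative):

gcc -fopenmp hello.c -o hello
OMP_NUM_THREADS=4 ./hello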

OpenMP Directives – Parallel and Work Sharing
- OMP parallel [clauses]
- OMP do [clauses]
- OMP sections [clauses]
- OMP section
- OMP single

Combined work sharing:
- OMP parallel do
- OMP parallel sections
Synchronization:
- OMP master
- OMP critical
- OMP barrier
- OMP atomic
- OMP flush
- OMP ordered
- OMP threadprivate
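A small sketch of sections and critical together in C (the two tasks are placeholders):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int total = 0;

    #pragma omp parallel sections
    {
        #pragma omp section
        {
            /* first independent task */
            #pragma omp critical
            total += 1;
        }
        #pragma omp section
        {
            /* second independent task, may run concurrently with the first */
            #pragma omp critical
            total += 2;
        }
    }
    printf("total = %d\n", total);
    return 0;
}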

OpenMP Directive Clauses
- shared(list)
- private(list) / threadprivate
- firstprivate(list) / lastprivate(list)
- default(private|shared|none) (Fortran)
- default(shared|none) (C/C++)
- reduction(operator|intrinsic : list)
- copyin(list)
- if(expr)
- schedule(type[,chunk])
- ordered / nowait
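For instance, a reduction clause combined with a schedule clause, sketched in C (array size and data are illustrative):

#include <stdio.h>
#include <omp.h>

#define N 1000

int main(void)
{
    double x[N], sum = 0.0;
    for (int i = 0; i < N; i++)
        x[i] = 1.0;                /* illustrative data */

    /* each thread accumulates a private partial sum; OpenMP
       combines the partial sums into 'sum' at the end of the loop */
    #pragma omp parallel for reduction(+:sum) schedule(static)
    for (int i = 0; i < N; i++)
        sum += x[i];

    printf("sum = %f\n", sum);
    return 0;
}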

OpenMP Library Functions
- omp_get_num_threads() / omp_set_num_threads()
- omp_get_max_threads()
- omp_get_thread_num()
- omp_get_num_procs()
- omp_in_parallel()
- omp_get_dynamic() / omp_set_dynamic(), omp_get_nested() / omp_set_nested()
- omp_init_lock() / omp_destroy_lock() / omp_test_lock()
- omp_set_lock() / omp_unset_lock()
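A short sketch of the query functions (the requested thread count is illustrative):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    omp_set_num_threads(4);        /* request four threads (illustrative) */
    printf("procs available: %d\n", omp_get_num_procs());
    printf("in parallel? %d\n", omp_in_parallel());   /* 0 here */

    #pragma omp parallel
    {
        printf("thread %d of %d (in parallel? %d)\n",
               omp_get_thread_num(), omp_get_num_threads(),
               omp_in_parallel());                    /* 1 inside the region */
    }
    return 0;
}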

OpenMP Environment Variables
- OMP_SCHEDULE
- OMP_NUM_THREADS
- OMP_DYNAMIC
- OMP_NESTED
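Typical usage from the shell, assuming a program built with OpenMP support (program name illustrative):

export OMP_NUM_THREADS=4        # run with four threads
export OMP_SCHEDULE="dynamic,3" # used by loops with schedule(runtime)
./myprog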

OpenMP Reduction and Atomic Operators
- Reduction: +, -, *, &, |, &&, ||
- Atomic: ++, --, +, *, -, /, &, >>, <<, |
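An atomic directive protects a single update more cheaply than a full critical section; a minimal sketch:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int hits = 0;
    #pragma omp parallel for
    for (int i = 0; i < 1000; i++) {
        #pragma omp atomic
        hits++;            /* the read-modify-write is made atomic */
    }
    printf("hits = %d\n", hits);
    return 0;
}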

Simple Loops

do I = 1, N
   z(I) = a * x(I) + y
end do

!$OMP parallel do
do I = 1, N
   z(I) = a * x(I) + y
end do
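The same loop sketched in C, for comparison (size and data are illustrative):

#include <omp.h>

#define N 1000

int main(void)
{
    double x[N], z[N], a = 2.0, y = 1.0;
    for (int i = 0; i < N; i++)
        x[i] = i;                  /* illustrative data */

    /* the parallel do of the slide: iterations are split among threads */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        z[i] = a * x[i] + y;
    return 0;
}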

Data Scoping
- The loop index is private by default
- Declare other variables as shared, private or reduction

Private Variables

!$OMP parallel do private(a,b,c)
do I = 1, m
   do j = 1, n
      b = f(I)
      c = k(j)
      call abc(a, b, c)
   end do
end do

#pragma omp parallel for private(a,b,c)

Without private(a,b,c), all threads would share a single copy of a, b and c, and the concurrent writes would race.

Dependencies
- Data dependencies (lexical/dynamic extent)
- Flow dependencies
- Classifying and removing the dependencies
- Non-removable dependencies
- Examples:

do I = 2, n
   a(I) = a(I) + a(I-1)
end do

do I = 2, N, 2
   a(I) = a(I) + a(I-1)
end do

The first loop carries a flow dependence – iteration I reads a(I-1), which iteration I-1 has just written – so it cannot be parallelized as written. In the second loop the stride of 2 means a(I-1) is never written by any iteration, so the iterations are independent.

Making Sure Everyone Has Enough Work
- Parallel overhead – creating threads and synchronizing vs. the work done in the loop

!$OMP parallel do schedule(dynamic,3)

- Schedule types: static, dynamic, guided, runtime
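A sketch of dynamic scheduling on a loop with uneven iteration costs (the work function is a stand-in):

#include <stdio.h>
#include <omp.h>

/* stand-in for work whose cost grows with i */
double work(int i)
{
    double s = 0.0;
    for (int k = 0; k < i; k++)
        s += k * 0.5;
    return s;
}

int main(void)
{
    double total = 0.0;
    /* hand out iterations in chunks of 3 as threads become free,
       so the expensive iterations don't leave some threads idle */
    #pragma omp parallel for schedule(dynamic,3) reduction(+:total)
    for (int i = 0; i < 100; i++)
        total += work(i);
    printf("total = %f\n", total);
    return 0;
}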

Parallel Regions – from Fine to Coarse Parallelism
- !$OMP parallel
- threadprivate and copyin
- Work-sharing constructs: do, sections, section, single
- Synchronization: critical, atomic, barrier, ordered, master
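A coarse-grained sketch in C: one parallel region enclosing setup, a work-shared loop and a barrier (all names illustrative):

#include <stdio.h>
#include <omp.h>

#define N 100

int main(void)
{
    double a[N];

    #pragma omp parallel
    {
        /* one thread prints; the others wait at single's implicit barrier */
        #pragma omp single
        printf("running on %d threads\n", omp_get_num_threads());

        /* work-shared loop inside the enclosing region */
        #pragma omp for
        for (int i = 0; i < N; i++)
            a[i] = i * 0.5;

        #pragma omp barrier   /* all threads reach here before continuing */

        #pragma omp master
        printf("a[N-1] = %f\n", a[N-1]);
    }
    return 0;
}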

To Distributed Memory Systems
- MPI, PVM, BSP, …
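For contrast with the shared memory examples above, a minimal MPI sketch (each process has its own address space; launch details depend on the MPI implementation):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);                 /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */

    printf("Hello from process %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}

Run with, e.g., mpirun -np 4 ./hello under LAM/MPI on the clusters above.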

Some Parallel Libraries

Existing parallel libraries and toolkits include:
- PUL, the Parallel Utilities Library, from EPCC
- The Multicomputer Toolbox, from Tony Skjellum and colleagues at LLNL and MSU
- PETSc, the Portable, Extensible Toolkit for Scientific Computation, from ANL
- ScaLAPACK, from ORNL and UTK
- ESSL and PESSL on AIX
- PBLAS, PLAPACK, ARPACK