Parallel Programming in Java with Shared Memory Directives

Overview
• API specification
• JOMP compiler and runtime library
• Performance
• Lattice-Boltzmann application

Why directives for Java?
• Implementing parallel loops with raw Java threads is messy:
  – thread fork/join is expensive, so threads must be kept running and a task pool implemented
  – a new class must be defined with a method containing the loop body, and an instance of it passed to the task pool (see the sketch below)
• It is relatively simple to automate this process with compiler directives
• OpenMP is becoming increasingly familiar to Fortran and C/C++ programmers in HPC
• Directives allow easy maintenance of a single version of the source code
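
For comparison, here is a minimal hand-threaded sketch of the kind of code the directives replace: an averaging loop distributed over plain java.lang.Thread objects, forking and joining on every use (a real task pool, as described above, would keep the threads alive and reuse them). All names here are illustrative only.

    // Hand-coded parallel loop using plain Java threads (illustrative sketch).
    public class ManualLoop {
        public static void main(String[] args) throws InterruptedException {
            final int n = 1000000, nthreads = 4;
            final double[] a = new double[n], b = new double[n];
            Thread[] workers = new Thread[nthreads];
            for (int t = 0; t < nthreads; t++) {
                final int me = t;
                // Each worker needs its own object wrapping the loop body.
                workers[t] = new Thread(new Runnable() {
                    public void run() {
                        // Block distribution of iterations 1..n-1 across threads.
                        int chunk = (n - 1 + nthreads - 1) / nthreads;
                        int lo = 1 + me * chunk;
                        int hi = Math.min(n, lo + chunk);
                        for (int i = lo; i < hi; i++)
                            b[i] = (a[i] + a[i - 1]) * 0.5;
                    }
                });
                workers[t].start();              // fork
            }
            for (Thread w : workers) w.join();   // join
        }
    }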

JOMP
• JOMP:
  – an OpenMP-like interface for Java
  – a research project developed at EPCC
  – freely available
  – fully portable
• JOMP API:
  – based heavily on the C/C++ OpenMP standard
  – directives embedded as comments (as in Fortran), introduced by //omp
  – library functions are class methods of an OMP class
  – Java system properties take the place of environment variables

API
• Most OpenMP directives are supported:
  – PARALLEL
  – FOR
  – SECTIONS
  – CRITICAL
  – SINGLE
  – MASTER
  – BARRIER
  – ONLY (conditional compilation)
• Data attribute scoping:
  – DEFAULT, SHARED, PRIVATE, FIRSTPRIVATE, LASTPRIVATE and REDUCTION clauses (a reduction sketch follows)
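
As a sketch of how these clauses combine, a scalar sum reduction might be written as below. The clause syntax is assumed from the C/C++ OpenMP API, on which the slides say JOMP is based; note that REDUCTION on whole arrays is listed later as not yet implemented.

    // Illustrative sketch (assumed syntax): scalar sum reduction.
    sum = 0.0;
    //omp parallel shared(a,n) reduction(+:sum)
    {
        //omp for
        for (i = 0; i < n; i++) {
            sum += a[i];
        }
    }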

API (cont.)
• Library routines (a usage sketch follows):
  – get and set the number of threads
  – get the thread id
  – determine whether in a parallel region
  – enable/disable nested parallelism
  – simple and nested locks
• System properties:
  – set the number of threads (java -Djomp.threads=8 MyProg)
  – set loop scheduling options
  – enable/disable nested parallelism
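
A minimal usage sketch: OMP.getThreadNum() appears later in this deck, and by analogy with OpenMP's omp_get_num_threads() a thread-count query presumably exists, but the exact method name below is an assumption.

    import jomp.runtime.*;

    // Run with: java -Djomp.threads=8 MyProg  (sets the thread count)
    public class MyProg {
        public static void main(String argv[]) {
            int id, nt;
            //omp parallel private(id,nt)
            {
                id = OMP.getThreadNum();   // shown later in this deck
                nt = OMP.getNumThreads();  // assumed name, mirroring OpenMP's
                                           // omp_get_num_threads()
                System.out.println("Hello from thread " + id + " of " + nt);
            }
        }
    }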

API (cont.)
• Some differences from the C/C++ API:
  – no ATOMIC directive
  – no FLUSH directive
  – no THREADPRIVATE directive
  – no REDUCTION for arrays (not implemented yet)
  – no function to return the number of processors

Example

    //omp parallel shared(a,b,n)
    {
        //omp for
        for (i = 1; i < n; i++) {
            b[i] = (a[i] + a[i-1]) * 0.5;
        }
    }

JOMP compiler
• Built using JavaCC, and based on the free Java 1.1 grammar distributed with JavaCC
• JOMP is written in Java, so it is fully portable
• Java source code is parsed to produce an abstract syntax tree and symbol table
• Directives are added to the grammar
• To implement them, JOMP overrides methods in the unparsing phase
• Output is pure Java with calls to the runtime library

JOMP system (diagram slide)

Implementing a parallel region
• On encountering a parallel region, the compiler creates a new class
• The class has a go() method containing the code inside the region, together with declarations of the private variables
• The class contains data members corresponding to shared and reduction variables:
  – care is needed with initialisation (Java compilers are somewhat pedantic!)
  – more copying is required than in C (there is no varargs equivalent)
• A new instance of the class is created and passed to the runtime library, which causes the go() method to be executed on each thread

Parallel “Hello World”

    public class Hello {
        public static void main(String argv[]) {
            int myid;
            //omp parallel private(myid)
            {
                myid = OMP.getThreadNum();
                System.out.println("Hello from " + myid);
            }
        }
    }

13 “Hello World” implementation import jomp.runtime.*; public class Hello { public static void main (String argv[]) { int myid; __omp_class_0 __omp_obj_0 = new __omp_class_0(); try { jomp.runtime.OMP.doParallel(__omp_obj_0); } catch (Throwable __omp_exception) { jomp.runtime.OMP.errorMessage(); }

14 private static class __omp_class_0 extends jomp.runtime.BusyTask { public void go (int __omp_me) throws Throwable { int myid; myid = OMP.grtThreadNum(); System.out.println(“Hello from “ + myid); }

Implementation (cont.)
• Local and instance variables are used to simulate the original name scope, so the original code block can be reused verbatim
• Worksharing directives are replaced with additional code, e.g. for loop scheduling (see the sketch below)
• An inner class is used for DEFAULT(SHARED), a normal class for DEFAULT(NONE)
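
As an illustration of the worksharing transformation, here is a hand-written sketch of what a statically scheduled go() method for the earlier averaging loop could look like. The block-distribution arithmetic is standard, but the method's shape, the OMP.getNumThreads() name, and the barrier comment are assumptions, not JOMP's actual generated code.

    // Sketch: static scheduling of iterations 1..n-1 across threads.
    // n, a and b would be shared data members of the generated class.
    public void go(int __omp_me) throws Throwable {
        int nthreads = jomp.runtime.OMP.getNumThreads();  // assumed name
        int chunk = (n - 1 + nthreads - 1) / nthreads;    // ceiling division
        int lo = 1 + __omp_me * chunk;
        int hi = Math.min(n, lo + chunk);
        for (int i = lo; i < hi; i++) {
            b[i] = (a[i] + a[i-1]) * 0.5;                 // original loop body
        }
        // the runtime would supply the implicit barrier that ends a FOR construct
    }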

Runtime library
• Performs thread management and assigns tasks to the threads
• Implements fast barrier synchronisation (a lock-free F-way tournament algorithm)
• Uses a variant of the barrier code to implement fast reductions
• Supports static and dynamic loop scheduling, and ordered sections within a loop
• Implements locks and critical regions using synchronized blocks (see the sketch below)
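
The last point is straightforward to picture: a CRITICAL directive maps naturally onto a synchronized block guarding a shared lock object. The sketch below is an invented illustration of that mapping, not JOMP's actual generated code.

    // Hypothetical translation of a critical region: the slides say JOMP maps
    // CRITICAL onto synchronized blocks; the naming here is invented.
    class CriticalSketch {
        private static final Object __omp_critical_lock = new Object();
        private static int counter = 0;

        static void body() {
            // source: //omp critical { counter++; }
            synchronized (__omp_critical_lock) {
                counter++;
            }
        }
    }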

Summary
• Advantages:
  – simpler and neater than Java threads: requires less modification of the sequential code, with minimal performance penalty
  – OpenMP offers a familiar interface, used in Fortran and C codes for a number of years
  – directives allow easy maintenance of a single version of the code
• Disadvantages:
  – still only a research project
  – not yet a defined standard