Hybrid Programming with OpenMP and MPI


Hybrid Programming with OpenMP and MPI. Kushal Kedia, Reacting Gas Dynamics Laboratory. 18.337J Final Project Presentation, May 13th 2009. kushal@mit.edu

Motivation – Flame dynamics. With pure OpenMP on 8 processors of a single shared-memory machine, simulating 2 cycles of a 200 Hz flame oscillation (0.01 seconds of physical time) takes approximately 5 days!

MPI? OpenMP?
MPI – Message Passing Interface: programs a cluster of computers.
OpenMP – Open Multi-Processing: programs a desktop or a symmetric multiprocessing (SMP) node.

Modern clusters (multiple SMPs): several SMP nodes joined by a network connection.

MPI vs. OpenMP: Pros and Cons

MPI
Pros: portable to distributed- and shared-memory machines; scales beyond one node; no data placement problem.
Cons: difficult to develop and debug; high latency, low bandwidth; explicit communication; difficult load balancing.

OpenMP
Pros: easy to implement parallelism; low latency, high bandwidth; implicit communication; dynamic load balancing.
Cons: runs only on shared-memory machines; scales only within one node; possible data placement problem; no specific thread order.

Why Hybridization? – the best of both paradigms. Introducing MPI into OpenMP applications lets them scale across multiple SMP nodes. Introducing OpenMP into MPI applications makes more efficient use of the shared memory on each SMP node, removing the need for explicit intra-node communication. Introducing MPI and OpenMP together during the design of a new application helps maximize efficiency, performance, and scaling.
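To make the composition concrete, here is a minimal hybrid skeleton (an illustrative sketch, not code from the slides; the build command is an assumption, e.g. mpicc -fopenmp hybrid.c): MPI provides processes across nodes, and each process opens an OpenMP parallel region on its own node.

    #include <stdio.h>
    #include <mpi.h>
    #include <omp.h>

    int main(int argc, char **argv)
    {
        /* Request FUNNELED support: only the master thread will call MPI. */
        int provided;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* e.g. one MPI process per node */
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Each process spawns threads that share that node's memory. */
        #pragma omp parallel
        printf("MPI rank %d of %d, OpenMP thread %d of %d\n",
               rank, size, omp_get_thread_num(), omp_get_num_threads());

        MPI_Finalize();
        return 0;
    }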

Problem Statement: a steady-state heat-transfer-like problem on a 2D grid.
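The slides show only figures here, but a problem of this type is classically solved by Jacobi iteration: each interior grid point is repeatedly replaced by the average of its four neighbours until the field stops changing. A minimal serial sketch (function and variable names are illustrative assumptions, not from the slides):

    #include <math.h>

    /* One serial Jacobi sweep (illustrative sketch). The (n+2) x (n+2)
     * grid is stored flat; row 0, row n+1, column 0 and column n+1 hold
     * fixed boundary values. Returns the maximum point-wise change,
     * which a driver loop would test against a convergence tolerance. */
    double jacobi_sweep(const double *u, double *unew, int n)
    {
        int ncols = n + 2;
        double maxdiff = 0.0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= n; j++) {
                unew[i * ncols + j] = 0.25 *
                    (u[(i - 1) * ncols + j] + u[(i + 1) * ncols + j] +
                     u[i * ncols + j - 1]   + u[i * ncols + j + 1]);
                double d = fabs(unew[i * ncols + j] - u[i * ncols + j]);
                if (d > maxdiff) maxdiff = d;
            }
        }
        return maxdiff;
    }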

Solution

Grid and Parallel Decomposition: the grid is split into chunks and distributed among 4 processors, as sketched below.
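In code, such a row-wise decomposition might be computed per rank as follows (a sketch assuming the interior rows divide evenly among the ranks; names are illustrative):

    /* 1D row decomposition (illustrative sketch). Splits n interior rows
     * among `size` ranks, assumed to divide evenly, and writes the global
     * indices of this rank's first and last rows (rows are 1-based; row 0
     * is a boundary/ghost row). */
    void decompose_rows(int n, int rank, int size, int *first, int *last)
    {
        int local_n = n / size;        /* interior rows owned per rank */
        *first = rank * local_n + 1;
        *last  = *first + local_n - 1;
    }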

MPI Scenario. Each chunk of the grid goes to a separate processor. Explicit communication calls are made after every iteration, regardless of whether the communicating processors sit on the same SMP node or on different ones.
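The per-iteration communication amounts to a halo (ghost-row) exchange between neighbouring ranks. A sketch under the 1D row decomposition above (the use of MPI_Sendrecv and all names are illustrative choices, not taken from the slides):

    #include <mpi.h>

    /* Halo exchange (illustrative sketch). Each rank stores local_n
     * interior rows plus one ghost row above (row 0) and one below
     * (row local_n + 1); ncols is the full row length. Boundary ranks
     * exchange with MPI_PROC_NULL, which makes the call a no-op. */
    void exchange_halos(double *u, int local_n, int ncols,
                        int rank, int size, MPI_Comm comm)
    {
        int up   = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int down = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        /* Send my first interior row up; receive my bottom ghost row. */
        MPI_Sendrecv(&u[1 * ncols],             ncols, MPI_DOUBLE, up,   0,
                     &u[(local_n + 1) * ncols], ncols, MPI_DOUBLE, down, 0,
                     comm, MPI_STATUS_IGNORE);

        /* Send my last interior row down; receive my top ghost row. */
        MPI_Sendrecv(&u[local_n * ncols], ncols, MPI_DOUBLE, down, 1,
                     &u[0],               ncols, MPI_DOUBLE, up,   1,
                     comm, MPI_STATUS_IGNORE);
    }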

Hybrid Scenario. A single MPI process runs on each SMP node. Each process spawns 4 OpenMP threads, which carry out the iterations. The master thread of each SMP node communicates after every iteration.
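A sketch of one hybrid iteration under this scheme, assuming MPI_THREAD_FUNNELED initialization and the exchange_halos sketch above (the structure is illustrative, not the project's actual code):

    #include <mpi.h>

    void exchange_halos(double *u, int local_n, int ncols,
                        int rank, int size, MPI_Comm comm); /* sketch above */

    /* One hybrid Jacobi iteration (illustrative sketch): OpenMP threads
     * share the sweep over this node's rows; afterwards only the master
     * thread talks to the neighbouring nodes (MPI_THREAD_FUNNELED). */
    void hybrid_iteration(const double *u, double *unew, int local_n,
                          int ncols, int rank, int size, MPI_Comm comm)
    {
        #pragma omp parallel
        {
            /* Threads divide the row sweep among themselves; the omp for
             * ends with an implicit barrier, so unew is complete below. */
            #pragma omp for
            for (int i = 1; i <= local_n; i++)
                for (int j = 1; j < ncols - 1; j++)
                    unew[i * ncols + j] = 0.25 *
                        (u[(i - 1) * ncols + j] + u[(i + 1) * ncols + j] +
                         u[i * ncols + j - 1]   + u[i * ncols + j + 1]);

            /* Only the master thread communicates; the others wait. */
            #pragma omp master
            exchange_halos(unew, local_n, ncols, rank, size, comm);
            #pragma omp barrier
        }
        /* The caller swaps u and unew before the next iteration. */
    }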

Schematic of the hybrid scenario: the master threads of the respective SMP nodes handle the inter-node communication.

Computational resources. The Pharos cluster (a combined shared- and distributed-memory architecture), operated by the Reacting Gas Dynamics Laboratory (Dept. of Mechanical Engineering, MIT), was used for the parallel simulations. Pharos consists of about 60 Intel Xeon Harpertown nodes, each with dual quad-core CPUs (8 cores) running at 2.66 GHz.

Fixed grid size of 2500 × 2500.

Constant number of processors = 20.

The hybrid program is slower than pure MPI! This is seen with many problems. Cases where a hybrid program beats pure MPI are few, but when the problem suits the model, hybridization works very well.

Why? Possible arguments:
Lower scalability of OpenMP, due to its implicit parallelism.
All threads except one sit idle while inter-node communication takes place.
A high cache-miss rate, caused by the larger per-process dataset and the data placement problem.
OpenMP libraries are less optimized than mature MPI libraries.
The communication-to-computation ratio is low.
