The hybrid approach to programming clusters of multi-core architectures

High-performance computing
 Cluster computing: To gain high performance, many computers are networked together to work on computation-heavy problems as a whole.
 Multi-core computers: Once maximum single-core speeds began to level off, multi-core computers became the popular solution.

High-performance computing
 Multi-core clusters: With multi-core computers now commonplace, cluster nodes began to adopt this design.
 Multi-core nodes promise more powerful nodes while saving power and space.
 Programming these multi-core nodes, however, may require a different approach.

Programming
 Cluster computers: The standard model for programming clusters is MPI, the Message Passing Interface, which is designed for distributed memory.
 Multi-core computers: Multi-core computers have shared memory, and an efficient way to program shared memory is OpenMP (see the sketch below).
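A minimal sketch of the shared-memory style (not from the slides; the array size and loop body are illustrative): OpenMP threads on one node divide a loop over data they all share. Compile with an OpenMP-enabled compiler, e.g. `gcc -fopenmp`.

```c
/* Minimal shared-memory example: OpenMP threads on one node divide a loop. */
#include <stdio.h>
#include <omp.h>

#define N 1000000   /* illustrative problem size */

int main(void) {
    static double a[N];
    double sum = 0.0;

    /* All threads share the array; the reduction clause merges partial sums. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++) {
        a[i] = 0.5 * i;
        sum += a[i];
    }

    printf("sum = %f, max threads = %d\n", sum, omp_get_max_threads());
    return 0;
}
```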

Multi-core clusters
 When programming for multi-core clusters, MPI treats cores on different nodes and cores on the same node in the same way.
 MPI completely ignores the fact that cores on a single node share memory.
 This causes problems: data is duplicated across the multiple cores of a single node.

MPI
 MPI uses communicator objects, which connect groups of processes in an MPI session.
 MPI supports point-to-point communication between two specific processes.
 Collective functions involve communication among all processes in a process group.
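A small illustrative program (the token value and message tag are arbitrary) showing both kinds of communication in the default communicator MPI_COMM_WORLD: a point-to-point send/receive pair between two ranks, then a collective in which every process participates.

```c
/* Point-to-point and collective communication inside MPI_COMM_WORLD. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, size, token = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0)
        token = 42;

    /* Point-to-point: rank 0 sends the token to rank 1 only. */
    if (size > 1) {
        if (rank == 0)
            MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
    }

    /* Collective: every process in the communicator takes part. */
    MPI_Bcast(&token, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("rank %d of %d sees token %d\n", rank, size, token);

    MPI_Finalize();
    return 0;
}
```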

Problems with MPI
 Collectives: Some collectives take arguments that are arrays whose size equals the number of processes in the communicator.
○ Examples of such collectives are MPI_Gatherv, MPI_Scatterv, and MPI_Alltoallv.
 MPI_Alltoallv takes these arrays for both sends and receives; with a million processes each such integer array is already 4 MB on every process (see the sketch below).
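A hedged sketch of what this looks like in code. The helper function exchange() and the uniform one-element-per-peer counts are illustrative, but the four length-nprocs argument arrays are exactly what MPI_Alltoallv requires, which is where the per-process memory cost comes from.

```c
/* Sketch of MPI_Alltoallv's per-process argument arrays. */
#include <stdlib.h>
#include <mpi.h>

void exchange(int *sendbuf, int *recvbuf, MPI_Comm comm) {
    int nprocs;
    MPI_Comm_size(comm, &nprocs);

    /* Four arrays of length nprocs must be built on every process; at very
       large process counts this metadata alone consumes megabytes per rank. */
    int *sendcounts = malloc(nprocs * sizeof(int));
    int *sdispls    = malloc(nprocs * sizeof(int));
    int *recvcounts = malloc(nprocs * sizeof(int));
    int *rdispls    = malloc(nprocs * sizeof(int));

    for (int p = 0; p < nprocs; p++) {
        sendcounts[p] = 1;        /* one element to every peer (illustrative) */
        recvcounts[p] = 1;
        sdispls[p]    = p;
        rdispls[p]    = p;
    }

    MPI_Alltoallv(sendbuf, sendcounts, sdispls, MPI_INT,
                  recvbuf, recvcounts, rdispls, MPI_INT, comm);

    free(sendcounts); free(sdispls); free(recvcounts); free(rdispls);
}
```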

Solution
 The combination of OpenMP and MPI is a worthy solution.
 MPI handles the communication between nodes across distributed memory.
 OpenMP handles the work within a single node through shared memory.

Implementation
 Since OpenMP is not an all-or-nothing model, it can be injected into selected parts of the program.
 One can identify the most time-consuming loops and place OpenMP directives on them (see the sketch below).
 One can also place directives on loops over undistributed dimensions.
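An illustrative sketch of this injection, not a prescription: only the hot loop receives an OpenMP directive, while the surrounding MPI structure is left untouched. LOCAL_N and the loop body are placeholders.

```c
/* Hybrid sketch: each MPI rank owns a block of data; OpenMP parallelizes
   the most time-consuming loop inside every rank. */
#include <stdio.h>
#include <mpi.h>
#include <omp.h>

#define LOCAL_N 100000   /* illustrative block size per rank */

int main(int argc, char **argv) {
    static double local[LOCAL_N];
    double local_sum = 0.0, global_sum = 0.0;
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* The OpenMP directive is dropped onto just this hot loop; the rest of
       the MPI program stays pure MPI. */
    #pragma omp parallel for reduction(+:local_sum)
    for (int i = 0; i < LOCAL_N; i++) {
        local[i] = rank + 0.001 * i;
        local_sum += local[i];
    }

    /* MPI still handles the part of the computation that spans nodes. */
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
               0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("global sum = %f\n", global_sum);

    MPI_Finalize();
    return 0;
}
```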

Hybrid masteronly
 In this model only one MPI process is used per node.
 All data exchange within the node happens through OpenMP shared memory.
 Problems:
○ The other threads in the node idle while communication by MPI is taking place.
○ MPI bandwidth is not always fully used with a single communicating thread.
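A masteronly sketch under assumed details (the 1-D halo exchange and the arrays u and unew are illustrative): MPI is called only outside OpenMP parallel regions, by a single thread, so MPI_THREAD_FUNNELED support is sufficient. During the Sendrecv the node's other cores do nothing, which is exactly the idling problem noted above.

```c
/* Masteronly sketch: one MPI process per node; MPI calls happen only
   outside OpenMP parallel regions, on the master thread. */
#include <mpi.h>
#include <omp.h>

#define N 1024   /* illustrative local array length, u[0] and u[N-1] are halos */

int main(int argc, char **argv) {
    static double u[N], unew[N];
    int provided, rank, nprocs;

    /* FUNNELED is enough: only the thread that initialized MPI calls it. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int up   = (rank + 1) % nprocs;
    int down = (rank + nprocs - 1) % nprocs;

    /* Communication phase: a single thread exchanges boundary values
       while the node's other cores sit idle. */
    MPI_Sendrecv(&u[N - 2], 1, MPI_DOUBLE, up,   0,
                 &u[0],     1, MPI_DOUBLE, down, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* Computation phase: all cores of the node join in via OpenMP. */
    #pragma omp parallel for
    for (int i = 1; i < N - 1; i++)
        unew[i] = 0.5 * (u[i - 1] + u[i + 1]);

    MPI_Finalize();
    return 0;
}
```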

Hybrid with overlap
 This method avoids idling compute threads during communication.
 Communication is assigned to one or more OpenMP threads, which handle it in parallel with useful computation (see the sketch below).
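One possible shape of the overlap scheme, again with an illustrative 1-D halo exchange and a hypothetical overlap_step() helper: thread 0 drives MPI while the remaining threads update points that do not depend on the incoming halo values. The work is split by hand because a regular omp-for cannot be used when one thread of the team skips it.

```c
/* Overlap sketch: thread 0 communicates while the other threads compute
   on halo-independent data.  u has n points; u[0] and u[n-1] are halos. */
#include <mpi.h>
#include <omp.h>

void overlap_step(double *u, double *unew, int n, int up, int down,
                  MPI_Comm comm) {
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        int nth = omp_get_num_threads();

        if (tid == 0) {
            /* Communication thread (needs at least MPI_THREAD_FUNNELED). */
            MPI_Sendrecv(&u[n - 2], 1, MPI_DOUBLE, up,   0,
                         &u[0],     1, MPI_DOUBLE, down, 0,
                         comm, MPI_STATUS_IGNORE);
            MPI_Sendrecv(&u[1],     1, MPI_DOUBLE, down, 1,
                         &u[n - 1], 1, MPI_DOUBLE, up,   1,
                         comm, MPI_STATUS_IGNORE);
        }
        if (tid > 0 || nth == 1) {
            /* Remaining threads split the halo-independent points by hand. */
            int workers = (nth > 1) ? nth - 1 : 1;
            int id      = (nth > 1) ? tid - 1 : 0;
            int first = 2, last = n - 2;          /* excludes halo neighbours */
            int chunk = (last - first + workers - 1) / workers;
            int lo = first + id * chunk;
            int hi = (lo + chunk < last) ? lo + chunk : last;
            for (int i = lo; i < hi; i++)
                unew[i] = 0.5 * (u[i - 1] + u[i + 1]);
        }
    }   /* implicit barrier: halo exchange and interior update are both done */

    /* Boundary points that need the received halos are updated afterwards. */
    unew[1]     = 0.5 * (u[0] + u[2]);
    unew[n - 2] = 0.5 * (u[n - 3] + u[n - 1]);
}
```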

Conclusion
 The hybrid approach has advantages over pure MPI in some cases.
 This does not hold for all cases; in some, the effect is reversed.
 If programmed well, hybrid applications can be very scalable and show large performance gains.