PERFORMANCE OF THE OPENMP AND MPI IMPLEMENTATIONS ON ULTRASPARC SYSTEM


Alexander Bogdanov (Institute for High-Performance Computing and Integrated Systems)
Pyae Sone Ko Ko (Saint-Petersburg State Marine Technical University)
Kyaw Zaya (Saint-Petersburg State Marine Technical University)

Abstract

This paper is intended for programmers and developers who want to use parallel programming techniques to enhance application performance. The Oracle Solaris Studio software provides state-of-the-art optimizing and parallelizing compilers for C, C++ and Fortran, an advanced debugger, and optimized mathematical and performance libraries. Also included are a powerful performance analysis tool for profiling serial and parallel applications, a thread analysis tool to detect data races and deadlocks in shared memory parallel programs, and an Integrated Development Environment (IDE). The Oracle Message Passing Toolkit software provides the high-performance MPI libraries and the associated run-time environment needed for message passing applications that can run on a single system or across multiple compute systems connected with high-performance networking, including Gigabit Ethernet, 10 Gigabit Ethernet, InfiniBand and Myrinet. Examples of OpenMP and MPI are provided throughout the paper, including their usage via the Oracle Solaris Studio and Oracle Message Passing Toolkit products for the development and deployment of both serial and parallel applications on SPARC and x86/x64 based systems, demonstrating how to develop and deploy an application parallelized with OpenMP and/or MPI.

Parallel Architectures & Parallel Programming Models

The following are generic descriptions without any specific information on systems available today:
 The Symmetric Multiprocessor (SMP) Architecture
 The Non-Uniform Memory Access (NUMA) Architecture
 The Cache Coherent Non-Uniform Memory Access (cc-NUMA) Architecture
 The Hybrid Architecture

There are many choices when it comes to selecting a programming model for a parallel system (minimal sketches of the OpenMP and MPI models are shown after the "Why Parallelization?" section below):
 Automatic Parallelization
 The OpenMP Parallel Programming Model
 The Message Passing Interface (MPI) Parallel Programming Model
 The Hybrid Parallel Programming Model

Multicore Processor Technology

In a multicore processor architecture there are multiple independent processing units available to execute an instruction stream. Such a unit is generally referred to as a core. A processor might consist of multiple cores, with each core capable of executing an instruction stream. Since each core can operate independently, different instruction streams can be executed simultaneously. Nowadays all major chip vendors offer various types of multicore processors. A block diagram of a generic multicore architecture is shown in Figure 1.

Figure 1. Block diagram of a generic multicore architecture.

Why Parallelization?

Parallelization is an optimization technique to further enhance performance. The goal is to reduce the total execution time proportionally to the number of cores used. If the serial execution time is 20 seconds, for example, executing the parallel version on a quad-core system ideally reduces this to 20/4 = 5 seconds. This is illustrated in Figure 2.

Figure 2. Parallelization reduces the execution time.
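To make the OpenMP model concrete, the following is a minimal sketch of a loop parallelized with an OpenMP directive. It is an illustration only, not code from the paper's benchmark; the array names and the problem size are hypothetical.

#include <stdio.h>
#include <omp.h>

#define N 1000000   /* hypothetical problem size */

int main(void)
{
    static double a[N], b[N], c[N];

    /* Serial initialization of the input arrays. */
    for (int i = 0; i < N; i++) {
        a[i] = 1.0;
        b[i] = 2.0 * i;
    }

    /* The iterations of this loop are divided among the available
       threads; each thread computes its own chunk independently. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[N-1] = %f (max threads: %d)\n", c[N-1], omp_get_max_threads());
    return 0;
}

With the Oracle Solaris Studio compilers, such a program is typically compiled with the -xopenmp option (automatic parallelization of suitable loops is requested with -xautopar instead), and the number of threads is controlled at run time through the OMP_NUM_THREADS environment variable.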
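For comparison, here is a minimal sketch of the MPI model, in which the parallelism is expressed through communicating processes rather than threads. Again, this is an illustration under the same caveats, not the paper's actual implementation.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    /* Start the MPI run-time environment. */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id    */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total process count  */

    printf("Hello from process %d of %d\n", rank, size);

    MPI_Finalize();
    return 0;
}

With the Oracle Message Passing Toolkit, such a program is typically built with the mpicc compiler wrapper and launched with mpirun -np <number of processes>. In the hybrid model, OpenMP directives such as the one shown earlier are used inside each MPI process.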
Performance Results

The results were obtained on a Sun SPARC Enterprise T5120 server from Oracle. The system had a single UltraSPARC T2 processor with 8 cores and 8 hardware threads per core. In Figure 3 the elapsed times in seconds for the Automatically Parallelized and OpenMP implementations are plotted as a function of the number of threads used. Note that a log scale is used on the vertical axis.

Figure 3. Performance of the Automatically Parallelized and OpenMP implementations.

In Figure 4 the performance of the MPI implementation is plotted as a function of the number of processes. Both the computational time (solid line) and the total time spent in the MPI functions (bar chart) are shown.

Figure 4. Performance of the MPI implementation.

Conclusion

The goal of parallel computing is to reduce the elapsed time of an application. To this end, multiple processors, or cores, are used to execute the application. The expected speedup depends on the number of threads used, but also on the fraction of the execution time that can be parallelized. The choice of programming model has substantial consequences for the implementation, execution and maintenance of the application. We strongly recommend considering these factors carefully before making a choice.
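The dependence of the speedup on the parallelizable fraction noted above is commonly quantified by Amdahl's law; the following formula is a standard result included here for reference and is not part of the original paper. If a fraction f of the serial execution time can be parallelized across P threads, the ideal speedup is

S(P) = \frac{1}{(1 - f) + f/P}

For example, with f = 0.95 and P = 8 this gives S(8) = 1 / (0.05 + 0.95/8) ≈ 5.9, noticeably below the ideal factor of 8, consistent with the observation that the serial fraction limits the achievable speedup.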