Program Analysis and Tuning – The German High Performance Computing Centre for Climate and Earth System Research (DKRZ) – Panagiotis Adamidis



Climate Simulation We use a computer model of the climate system: a computer program that simulates an abstract (mathematical) model of the climate system, reproducing its relevant features based on theoretical principles (e.g. laws of nature) and observed relationships.

„Blizzard“ – IBM Power6 System
– Peak performance: 158 TeraFlop/s (158 trillion floating-point operations per second)
– 264 IBM Power6 nodes, 16 dual-core CPUs per node (altogether 8,448 compute cores)
– more than 20 TeraByte of memory
– 7,000 TeraByte of disk space until 2011
– InfiniBand network: 7.6 TeraByte/s (aggregated)
[Photo: the high performance computing system „Blizzard“ at DKRZ – compute nodes (orange), InfiniBand switch (red), disks (green)]

The Hybrid World: message passing (MPI) between the nodes, OpenMP within a node.

Parallel Compiler Why can't I just say f90 -Parallel mycode.f and have everything work? Because of logical dependencies and data dependencies.

Multiprocessor – Shared Memory [diagram: several CPUs connected through a network to a set of shared memory modules]

Concepts – Shared Memory Directives: a single process starts with a master thread; at each parallel region the master forks a team of threads, and the team joins back into the master thread at the end of the region (fork-join model).

Amdahl's Law: the speedup of a program on n processors is limited by its serial fraction: S(n) = 1 / ((1 - p) + p/n), where p is the fraction of the runtime that can be parallelized.

The Hybrid World: message passing (MPI) between the nodes, OpenMP within a node.

Processes and Threads: message passing (MPI) works with processes, OpenMP with threads.


Bottlenecks of Massively Parallel Computing Systems:
– memory bandwidth
– communication network
– idle processors

Memory Hierarchy: registers → L1/L2/L3 caches → main memory.

Data Movement

Data Movement in Parallel Systems

The Hybrid World: message passing (MPI) between the nodes, OpenMP within a node.

The World of MPI [diagram: each CPU has its own memory module; the CPUs are connected by a network and exchange data only via messages]

Processes and Threads: message passing (MPI) works with processes, OpenMP with threads.

Motivation: improve the efficiency of a parallel program running on high performance computers. Typical workflow (a cycle): development of a parallel program → measurement and runtime analysis of the code → optimizing the code.

Performance Engineering
Profiling – summarizes performance data per process/thread during execution – a „statistical“ analysis.
Tracing – one trace record with performance data and a timestamp per event per process/thread – e.g. individual MPI messages.

Optimization: compilers cannot optimize everything automatically, and optimization is not just finding the right compiler flag – often major algorithmic changes are necessary.