Code Motion for MPI Performance Optimization
The most common optimization in MPI applications is to post MPI communication earlier so that the communication can be overlapped with computation. Although there have been many attempts to move MPI communication code automatically, few have been successful, because most compilers cannot reason about user-level libraries. We construct an MPI Code Motion optimization using the ROSE framework, a tool for building source-to-source translators. By safely moving MPI communication calls upward, the optimization creates an overlap between communication and computation. The final product is optimized MPI code in which the communication calls are placed at better locations, improving overall performance.

Han Suk Kim, University of California, San Diego
Daniel J. Quinlan, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory

3. Discussion and Future Work

The ROSE framework represents programs with abstract syntax trees, which helps optimization modules understand high-level user-defined libraries. The Code Motion optimization developed in this work is largely based on the ROSE framework. Future work will include 1) applying the optimization to real MPI applications developed at Lawrence Livermore National Laboratory, 2) formally proving that the transformation is safe, i.e., that the semantics of the program stay the same, and 3) generalizing the optimization so that other, similar optimizations can be implemented easily in the ROSE framework.

UCRL-POST. This work was performed under the auspices of the U.S. Department of Energy by University of California, Lawrence Livermore National Laboratory under contract No. W-7405-Eng-48.

In our MPI Code Motion translator, four representations of a program are used extensively: 1) the system dependence graph, 2) the control flow graph, 3) the call graph, and 4) the abstract syntax tree.
The system dependence graph provides data and control dependences among variables. The abstract syntax tree in ROSE is used to detect MPI communication patterns in the code. The control flow graph is traversed to analyze the order of execution. Using the call graph, interprocedural analysis finds the relationships between callers and callees.

2. Analysis and Transformation

[Figure: Four Program Representations for the MPI Code Motion Optimization — the control flow graph, system dependence graph, abstract syntax tree, and interprocedural analysis feed the MPI Code Motion module built on the ROSE framework.]

Original Code:

    int main() {
        ...
        MPI_Barrier(MPI_COMM_WORLD);
        // long computation
        for (...) { ... }
        MPI_Irecv(buf1, n, MPI_DOUBLE, src, tag, MPI_COMM_WORLD, &req[0]);
        MPI_Isend(buf2, n, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD, &req[1]);
        ...
    }

Optimized Code (automatic transform):

    int main() {
        ...
        MPI_Barrier(MPI_COMM_WORLD);
        MPI_Irecv(buf1, n, MPI_DOUBLE, src, tag, MPI_COMM_WORLD, &req[0]);
        MPI_Isend(buf2, n, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD, &req[1]);
        // overlap with long computation
        for (...) { ... }
        MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
        ...
    }

Code Motion Example: The original code issues communication after a long computation. The optimized code posts the non-blocking communication before the computation, so that computation and communication can proceed simultaneously.

1. Introduction

Code Motion is a compiler optimization technique that changes the order of execution to make code perform better. In scientific applications implemented with the MPI library, overlapping communication with computation can significantly improve execution time. Therefore, finding communication code and moving it to more appropriate locations has been regarded as an important optimization. Compiler communities, however, have not successfully supported this optimization, so application scientists often have to analyze and modify their MPI code by hand.
In this work, with the help of the ROSE framework, we construct a source-to-source translator that takes an arbitrary MPI code and transforms it into code that runs faster by exploiting such overlap. The contributions of this work are 1) tedious hand optimization during code development is no longer needed, and 2) since the translator produces its output as MPI source code, not a binary, the code can be verified immediately by programmers.