Code Motion for MPI Performance Optimization

The most common optimization in MPI applications is to post MPI communication earlier so that the communication can be overlapped with computation. Although there have been many attempts to move MPI communication code automatically, few have been reported as successful, because most compilers cannot reason about user-level libraries. We construct an MPI Code Motion optimization using the ROSE framework, a tool for building source-to-source translators. By safely moving MPI communication calls upward, the optimization creates an overlap between communication and computation. The output is optimized MPI source code in which the communication calls are placed at better locations, improving overall performance.

Han Suk Kim, University of California, San Diego
Daniel J. Quinlan, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory

UCRL-POST-233388
This work was performed under the auspices of the U.S. Department of Energy by University of California Lawrence Livermore National Laboratory under contract No. W-7405-Eng-48.

1. Introduction

Code Motion is a compiler optimization technique that changes the order of execution to make the code perform better. In scientific applications implemented with the MPI library, overlapping communication with computation can significantly reduce execution time. Finding communication code and moving it to a more appropriate location has therefore been regarded as an important skill. Compiler communities, however, have not successfully supported this optimization. Accordingly, application scientists often have to analyze and modify their MPI codes by hand.

In this work, with the help of the ROSE framework, we construct a source-to-source translator that takes an arbitrary MPI code and transforms it into a code that runs faster by exploiting overlaps. The contributions of this work are 1) tedious manual code optimizations during development are no longer needed, and 2) since the translator produces its output as MPI source code, not a binary, programmers can verify the result immediately.

2. Analysis and Transformation

Four Program Representations for MPI Code Motion Optimization

In our MPI Code Motion translator, four representations of a program are used extensively: 1) the system dependence graph, 2) the control flow graph, 3) the call graph, and 4) the abstract syntax tree.

The system dependence graph provides data and control dependences among variables. The abstract syntax tree in ROSE is used to detect MPI communication patterns in the code. The control flow graph is traversed to analyze the order of execution. Using the call graph, interprocedural analysis finds the relationships between callers and callees.

[Figure: MPI Code Motion in the ROSE Framework. Panels: Control Flow Graph, System Dependence Graph, Abstract Syntax Tree, Interprocedural Analysis. Image source: http://www.ida.liu.se/~vaden/]

Code Motion Example

Original Code:

    int main() {
      …
      MPI_Barrier();
      // long computation
      for(…) { … }
      MPI_Irecv(buf1, dest);
      MPI_Isend(buf2, src);
      …
    }

Optimized Code (after the automatic transformation):

    int main() {
      …
      MPI_Barrier();
      MPI_Irecv(buf1, dest);
      MPI_Isend(buf2, src);
      // overlap with
      // long computation
      for(…) { … }
      …
    }

Code Motion Example: The original code issues communication after a long computation. The optimized code instead posts the non-blocking communication before the computation, so that computation and communication can proceed simultaneously.

3. Discussion and Future Work

The ROSE framework represents programs as abstract syntax trees, and it helps optimization modules understand high-level, user-defined libraries. The Code Motion optimization developed in this work is largely based on the ROSE framework. Future work will include 1) applying the optimization to many real MPI applications developed at Lawrence Livermore National Laboratory, 2) a formal proof of whether the transformation is safe, meaning that the semantics of the program are preserved, and 3) generalizing the optimization so that other similar optimizations can be implemented easily in the ROSE framework.
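The idea of moving a communication call upward only while it is safe can be illustrated with a toy model. The C sketch below is not the ROSE-based translator and uses no ROSE or MPI APIs; the Stmt type, the bit-mask read/write sets, and the functions independent and hoist are hypothetical names introduced here. It assumes a straight-line list of statements and conservatively refuses to move a call across a barrier, across another communication call, or across any statement that has a data dependence on it. It also ignores the matching MPI_Wait that a real non-blocking call requires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical statement kinds for this toy model. */
typedef enum { COMPUTE, COMM, BARRIER } Kind;

/* Each statement reads and writes a set of variables,
   encoded as bit masks (bit i stands for variable i). */
typedef struct {
    const char *label;
    Kind kind;
    unsigned reads;
    unsigned writes;
} Stmt;

/* True if a and b may be swapped without changing the result.
   Assumed conservative rules: nothing crosses a barrier, two
   communication calls keep their relative order, and there must
   be no flow, anti, or output dependence between the statements. */
static bool independent(const Stmt *a, const Stmt *b) {
    if (a->kind == BARRIER || b->kind == BARRIER) return false;
    if (a->kind == COMM && b->kind == COMM) return false;
    return (a->writes & b->reads) == 0 &&
           (a->reads & b->writes) == 0 &&
           (a->writes & b->writes) == 0;
}

/* Move stmts[idx] upward past every preceding statement it is
   independent of. Returns the statement's new index. */
static size_t hoist(Stmt *stmts, size_t n, size_t idx) {
    assert(idx < n);
    while (idx > 0 && independent(&stmts[idx - 1], &stmts[idx])) {
        Stmt tmp = stmts[idx - 1];
        stmts[idx - 1] = stmts[idx];
        stmts[idx] = tmp;
        idx--;
    }
    return idx;
}
```

Applied to the statement sequence of the Code Motion Example (barrier, loop, MPI_Irecv, MPI_Isend), hoisting the two calls in order gives barrier, MPI_Irecv, MPI_Isend, loop. If the loop instead wrote the send buffer, the MPI_Isend would stay after the loop.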
