University of Houston. Extending Global Optimizations in the OpenUH Compiler for OpenMP. Open64 Workshop, CGO '08.


Goals
- Exploit compiler analysis and optimizations for OpenMP programs
- Enable high-level optimizations by taking OpenMP semantics into consideration
- Build a general framework for OpenMP compiler optimizations

OpenUH Compiler: based on Open64
[diagram: the Open64 compiler infrastructure as used in OpenUH]
- FRONTENDS (C/C++, Fortran 90, OpenMP): parse source code with OpenMP directives
- IPA (Inter-Procedural Analyzer)
- OMP_PRELOWER (preprocess OpenMP)
- LNO (Loop Nest Optimizer)
- LOWER_MP (transformation of OpenMP)
- WOPT (global scalar optimizer)
- WHIRL2C & WHIRL2F (IR-to-source for non-Itanium targets): emit source code with runtime library calls, compiled by a native compiler
- CG (code generation for IA-32, IA-64, Opteron): emits object files
- Linking against a portable OpenMP runtime library produces the executables

Motivation
Compiler flags: -O3 vs. -O3 -mp3
[chart: performance of a PRE example, NAS FT, and NAS UA under each flag set]
Why the different performance?

A PRE Example
[figure: the example code shown twice, once with copy propagation applied and once with no copy propagation]

Parallel Data Flow Analysis
Compilers need to further optimize OpenMP code, but most current OpenMP compilers perform optimizations only after OpenMP constructs have been lowered to threaded code:
- traditional optimizations must be restricted to the inside of an OpenMP construct and cannot cross synchronizations
- opportunities to perform high-level OpenMP optimizations, such as barrier elimination, are missed
We need to enable global optimizations.

Solution Method
Based on the OpenMP memory model:
- relaxed consistency
- flush is the key operation
Design a Parallel Control Flow Graph (PCFG) to represent an OpenMP program.

A: an OpenMP sections example; B: the corresponding PCFG

    a = 0; b = 0;
    #pragma omp parallel sections
    {
      #pragma omp section
      {
        a = 1;
        #pragma omp flush(a,b)
        if (b == 0) {
          critical1;
          a = 0;
          #pragma omp flush(a)
        } else
          else1;
      }
      #pragma omp section
      {
        b = 1;
        #pragma omp flush(a,b)
        if (a == 0) {
          critical2;
          b = 0;
          #pragma omp flush(b)
        } else
          else2;
      }
    }

[diagram B: the PCFG, built from super nodes, composite nodes, and basic nodes connected by sequential edges, parallel edges, and conflict edges; the entry node, the flush operations, and the closing barrier appear explicitly]

Extending WOPT
[diagram: the original WOPT pipeline and the extended version, side by side]
Original pipeline, from input WHIRL tree to output WHIRL tree:
- construct the CFG; control flow analyses
- flow-free alias analysis ("flow free copy propagation")
- construct the HSSA representation; points-to and pointer alias analysis
- create the CODEMAP representation; PREOPT SSA-based optimizations (IVR, CP, DCE)
- emit new WHIRL from the optimized CFG/SSA
Extended pipeline:
- construct the PCFG; parallel control flow analysis
- phi insertion for conflict edges during HSSA construction
- SSAPRE: perform PRE on OpenMP code
- emit new WHIRL from the optimized PCFG/SSA

Conclusion
- Implemented in the OpenUH compiler
- Improves the scalability of OpenMP programs
- Provides a framework for conducting more aggressive optimizations for Cluster OpenMP
- Can be used in conjunction with data race detection tools