Parallelizing Compilers Presented by Yiwei Zhang.

Reference Paper: Rudolf Eigenmann and Jay Hoeflinger, "Parallelizing and Vectorizing Compilers", Purdue University School of ECE, High-Performance Computing Lab, ECE-HPCLab-99201, January 2000.

Importance of Parallelizing: With the rapid development of multi-core processors, parallelized programs can take advantage of the additional cores to run much faster than serial programs. Compilers that convert serial programs to run in parallel are called parallelizing compilers.

Role of Parallelizing Compilers: Parallelizing compilers attempt to relieve the programmer from dealing with machine details. They allow the programmer to concentrate on solving the problem at hand, while the compiler concerns itself with the complexities of the machine.

Role of Parallelizing Compilers (cont.): Much more sophisticated analysis is required of the compiler to generate efficient machine code for different types of machines.

How to Parallelize a Program: Find the dependences in the program. Try to avoid or eliminate those dependences. Reduce overhead costs.

Dependence Elimination and Avoidance: A data dependence between two sections of a program indicates that during execution those two sections of code must run in a certain order. Anti dependence: READ before WRITE. Flow dependence: WRITE before READ. Output dependence: WRITE before WRITE.
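For illustration, a minimal sketch of the three dependence types; the statements and variable names are hypothetical, not from the original slides:

    a = x + 1.0;   /* S1: writes a                                       */
    b = a * 2.0;   /* S2: reads a  -> flow dependence on S1 (WRITE
                      before READ)                                        */
    a = c - 3.0;   /* S3: writes a -> anti dependence on S2 (READ before
                      WRITE) and output dependence on S1 (WRITE before
                      WRITE)                                              */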

Data Privatization and Expansion: Data privatization can remove anti and output dependences. These dependences are not due to a computation having to wait for data values produced by another; instead, it waits because it wants to assign a value to a variable that is still in use. The basic idea is to use a new storage location so that the new assignment does not overwrite the old value too soon.

Data Privatization and Expansion Example:
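The example from the original slide did not survive the transcript; the following is a minimal sketch of the idea, assuming a loop that reuses a temporary scalar t across iterations (all names are illustrative):

    /* Before: every iteration writes and then reads the shared
       scalar t, creating anti and output dependences between
       iterations.                                                */
    double t;
    for (int i = 0; i < n; i++) {
        t = a[i] + b[i];
        c[i] = t * t;
    }

    /* After privatization: each iteration gets its own copy of t,
       so the iterations can run in parallel.                      */
    #pragma omp parallel for private(t)
    for (int i = 0; i < n; i++) {
        t = a[i] + b[i];
        c[i] = t * t;
    }

Expansion achieves the same effect by turning the scalar into an array (t[i]), giving every iteration its own storage location.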

Idiom Recognition (Inductions, Reductions, Recurrences): These transformations can remove true dependences, e.g., flow dependences. This is the case where one computation has to wait for another to produce a needed data value. Elimination is only possible if we can express the computation in a different way.

Induction Variable Substitution: Finds variables that form arithmetic or geometric progressions and can therefore be expressed as functions of the indices of the enclosing loops, and replaces these variables with expressions involving the loop indices.

Induction Variable Substitution Example: We can replace the variable ind with an arithmetic expression in the loop index i.
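The example code from the original slide did not survive the transcript; a minimal sketch of the transformation, assuming ind starts at 0 and is incremented by 2 in every iteration:

    /* Before: ind carries a flow dependence across iterations.  */
    int ind = 0;
    for (int i = 0; i < n; i++) {
        ind = ind + 2;
        a[ind] = b[i];
    }

    /* After substitution: ind is a closed-form function of i,
       so the iterations are independent.                         */
    for (int i = 0; i < n; i++) {
        a[2 * (i + 1)] = b[i];
    }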

Identification of Induction Variables: Through pattern matching (the compiler finds statements that modify variables in the described way), or through abstract interpretation (identifying the sequence of values assumed by a variable in a loop).

Reductions: Reduction operations abstract the values of an array into a form of lower dimensionality. The typical example is an array being summed up into a scalar variable.

Reductions (cont.): We can split the array into p parts, sum them up individually on different processors, and then combine the results. The transformed code has two additional loops, one for initializing and one for combining the partial results.
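A minimal sketch of the transformed sum reduction, assuming p processors and n divisible by p (names are illustrative):

    double partial[p];                 /* one partial sum per processor */
    for (int t = 0; t < p; t++)        /* additional loop 1: initialize */
        partial[t] = 0.0;

    /* parallel part: each processor t sums its own chunk of a[]       */
    for (int t = 0; t < p; t++)
        for (int i = t * (n / p); i < (t + 1) * (n / p); i++)
            partial[t] += a[i];

    double sum = 0.0;
    for (int t = 0; t < p; t++)        /* additional loop 2: combine    */
        sum += partial[t];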

Recurrences: A recurrence uses the result of one or several previous loop iterations to compute the value of the next iteration. However, for certain forms of linear recurrences, parallel algorithms are known.

Recurrences (cont.): When the compiler recognizes a pattern of linear recurrence for which a parallel solver is known, it replaces the code with a call to a mathematical library that contains the corresponding parallel solver algorithm.
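A sketch of the recognized pattern and its replacement; solve_first_order_recurrence is a hypothetical library routine, not a real API:

    /* Pattern the compiler can recognize: a first-order
       linear recurrence.                                   */
    for (int i = 1; i < n; i++)
        x[i] = a[i] * x[i - 1] + b[i];

    /* Replacement: a call to a parallel solver in a math
       library (hypothetical name).                         */
    solve_first_order_recurrence(x, a, b, n);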

Parallel Loop Restructuring: A parallel computation usually incurs an overhead when starting and terminating. The larger the computation inside the loop, the better this overhead can be amortized. Techniques such as loop fusion, loop coalescing, and loop interchange are used to optimize performance.

Loop Fusion: Combines two adjacent loops into a single loop.
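A minimal sketch (arrays are illustrative):

    /* Before: two adjacent loops, each paying its own parallel
       startup and termination overhead.                         */
    for (int i = 0; i < n; i++) a[i] = b[i] + 1.0;
    for (int i = 0; i < n; i++) c[i] = a[i] * 2.0;

    /* After fusion: one loop, and the overhead is paid once.    */
    for (int i = 0; i < n; i++) {
        a[i] = b[i] + 1.0;
        c[i] = a[i] * 2.0;
    }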

Loop Coalescing: Merges two nested loops into a single loop.
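A minimal sketch, assuming a simple two-dimensional initialization (names are illustrative):

    /* Before: a doubly nested loop over n and m iterations.      */
    for (int i = 0; i < n; i++)
        for (int j = 0; j < m; j++)
            a[i][j] = 0.0;

    /* After coalescing: a single loop over n*m iterations; the
       original indices are recovered from the single index k.    */
    for (int k = 0; k < n * m; k++) {
        int i = k / m;
        int j = k % m;
        a[i][j] = 0.0;
    }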

Loop Interchange: Moves an inner parallel loop to an outer position in the loop nest. As a result, the parallel startup overhead is incurred only once instead of on every iteration of the enclosing loop.
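A minimal sketch using OpenMP-style annotations (illustrative; the loop body has no cross-iteration dependences, so the interchange is legal):

    /* Before: the parallel loop (over j) is innermost, so the
       parallel startup overhead is paid n times.                */
    for (int i = 0; i < n; i++)
        #pragma omp parallel for
        for (int j = 0; j < m; j++)
            a[j][i] = a[j][i] + 1.0;

    /* After interchange: the parallel loop is outermost, so the
       overhead is paid once.                                    */
    #pragma omp parallel for
    for (int j = 0; j < m; j++)
        for (int i = 0; i < n; i++)
            a[j][i] = a[j][i] + 1.0;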

Questions?