
1 Removing Impediments to Loop Fusion Through Code Transformations
Bob Blainey 1, Christopher Barton 2, and José Nelson Amaral 2
1 IBM Toronto Software Laboratory, Toronto, Canada
2 Department of Computing Science, University of Alberta, Edmonton, Canada
Yao-Huei Fang. Adviser: Professor Chung Yung

2 Outline
1. Introduction
2. Overview of Loop Optimizations
3. Loop Fusion Algorithm
4. Results

3 Introduction
Two important transformations typically performed in a loop restructuring compiler are loop fusion and loop distribution.
Advantages of loop fusion:
– decreases the number of loop branches executed
– creates opportunities for data reuse
– offers more instructions for the scheduler to balance the use of functional units
Disadvantages of loop fusion:
– increased code size
– increased register pressure
– potential over-committing of hardware resources, and
– the formation of more complex control flow
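To make these trade-offs concrete, here is a minimal C sketch of loop fusion (an illustrative example, not one taken from the paper): two adjacent loops over the same iteration space are merged into a single loop, halving the loop-control overhead and reusing a[i] while it is still in a register or cache.

```c
#define N 1000

/* Before fusion: two adjacent, conforming loops over the same range. */
void before(double *a, double *b, double *c) {
    for (int i = 0; i < N; i++)
        a[i] = b[i] + 1.0;          /* loop 1 writes a[i] */
    for (int i = 0; i < N; i++)
        c[i] = a[i] * 2.0;          /* loop 2 reads a[i]  */
}

/* After fusion: one loop body, fewer branch instructions executed,
   and a[i] is reused immediately instead of being re-fetched in a
   second pass over the array. */
void after(double *a, double *b, double *c) {
    for (int i = 0; i < N; i++) {
        a[i] = b[i] + 1.0;
        c[i] = a[i] * 2.0;
    }
}
```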

4 Overview of Loop Optimizations

5 Example of aggressive copy propagation
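The slide's figure is not reproduced in this transcript. As a stand-in, the following hypothetical C fragment (variable names invented for illustration) shows the kind of situation aggressive copy propagation addresses: a scalar copy between two loops prevents them from being adjacent; propagating the copy into its use and deleting the now-dead assignment makes the loops adjacent and therefore candidates for fusion.

```c
/* Before: the copy "t = s;" sits between the loops, so they are not
   adjacent and cannot be fused directly. */
void before(double *a, double *b, double s, int n) {
    for (int i = 0; i < n; i++)
        a[i] = a[i] + s;

    double t = s;                 /* intervening copy statement */

    for (int i = 0; i < n; i++)
        b[i] = b[i] * t;
}

/* After copy propagation: the use of t is replaced by s, the dead
   assignment is eliminated, and the two loops become adjacent. */
void after(double *a, double *b, double s, int n) {
    for (int i = 0; i < n; i++)
        a[i] = a[i] + s;

    for (int i = 0; i < n; i++)
        b[i] = b[i] * s;          /* t propagated into its use */
}
```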

6 Loop Fusion Algorithm
We divide loops into two classes: loops that are eligible for fusion and loops that are not.
Two loops that are eligible for fusion must satisfy the following conditions:
– they must be conforming (they execute the same number of iterations),
– they must be control equivalent (one executes if and only if the other does),
– they must be adjacent (no intervening code between them), and
– there can be only forward dependences between the loop bodies.
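A minimal sketch of how these four tests could be combined into a single eligibility check; the loop_t type and the four predicate functions are hypothetical placeholders for the compiler's actual analyses, not the paper's implementation.

```c
#include <stdbool.h>

typedef struct loop loop_t;   /* opaque handle to a loop in the compiler IR (hypothetical) */

/* Hypothetical predicates standing in for the compiler's analyses. */
bool same_iteration_count(const loop_t *l1, const loop_t *l2);     /* conforming          */
bool control_equivalent(const loop_t *l1, const loop_t *l2);       /* l1 runs iff l2 runs */
bool adjacent(const loop_t *l1, const loop_t *l2);                 /* no intervening code */
bool only_forward_dependences(const loop_t *l1, const loop_t *l2); /* no backward deps    */

/* Two loops are eligible for fusion only if all four conditions hold. */
bool eligible_for_fusion(const loop_t *l1, const loop_t *l2) {
    return same_iteration_count(l1, l2)
        && control_equivalent(l1, l2)
        && adjacent(l1, l2)
        && only_forward_dependences(l1, l2);
}
```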

7 Loop Fusion Algorithm (continued)

8

9 Example of Algorithm

10 Example of Algorithm (continued)

11 Example of Algorithm (continued)

12

13 Results

14 Results (continued)