Predicated Static Single Assignment (PSSA) Presented by AbdulAziz Al-Shammari

Papers
- Path Analysis and Renaming for Predicated Instruction Scheduling. L. Carter, E. Simon, B. Calder, L. Carter, and J. Ferrante.
- Predicated Static Single Assignment. L. Carter, E. Simon, B. Calder, L. Carter, and J. Ferrante.

Motivation
Increasing instruction-level parallelism (ILP) is needed to exploit the potential parallelism available in wide-issue architectures (e.g., EPIC).
Limitations to ILP:
- Control-flow dependencies.
- Data-flow dependencies.

Motivation (cont.)
Static Single Assignment (SSA) removes false data dependencies across basic block boundaries in a CFG, revealing more ILP. But a larger pool of sequential (straight-line) instructions is still needed to expose even more ILP. How?
- Eliminate control-flow dependencies using predicated execution.
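To make "false data dependencies" concrete, here is a minimal Python sketch (an illustration, not code from the papers) that handles straight-line code only, so no φ-functions are needed. The helpers `false_deps` and `rename_ssa` are hypothetical, and instructions are modeled as `(dest, op, srcs)` tuples.

```python
from collections import defaultdict


def false_deps(block):
    """List (i, j, kind) pairs where a later instruction j overwrites a name
    that instruction i wrote (WAW) or read (WAR). These 'false' dependencies
    only exist because a name is reused."""
    deps = []
    for i, (dest_i, _, srcs_i) in enumerate(block):
        for j in range(i + 1, len(block)):
            dest_j = block[j][0]
            if dest_j == dest_i:
                deps.append((i, j, "WAW"))
            if dest_j in srcs_i:
                deps.append((i, j, "WAR"))
    return deps


def rename_ssa(block):
    """Give every assignment target a fresh version (x -> x1, x2, ...) and
    rewrite each use to the version that is current at that point."""
    version = defaultdict(int)  # next version number per variable
    current = {}                # variable -> its current SSA name
    renamed = []
    for dest, op, srcs in block:
        new_srcs = [current.get(s, s) for s in srcs]  # rewrite uses first
        version[dest] += 1
        current[dest] = f"{dest}{version[dest]}"
        renamed.append((current[dest], op, new_srcs))
    return renamed


block = [
    ("x", "+", ["a", "b"]),
    ("y", "*", ["x", "2"]),
    ("x", "+", ["c", "d"]),  # reuses x: WAW with 0, WAR with 1
    ("z", "+", ["x", "1"]),
]
print(false_deps(block))              # [(0, 2, 'WAW'), (1, 2, 'WAR')]
print(false_deps(rename_ssa(block)))  # []
```

After renaming, `x2 = c + d` no longer has to wait for the first two instructions, which is the extra ILP the slide refers to.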

Motivation (cont.)
Problem: predication introduces predicate dependencies, which exist between every operation and the definition(s) of its guarding predicate. SSA does not deal with such dependencies.
Solution: a predicate-sensitive implementation of SSA, PSSA.

Outline
- Predicated Execution
- Predicated Static Single Assignment (PSSA)
- Code Optimizations
  - Predicated Speculation
  - Control Height Reduction
- Experimental Results
- Future Work
- Conclusions

Predicated Execution
Each operation is guarded by one of the predicate registers (provided by the hardware architecture), which holds the value of its guarding predicate; the operation is committed only if that predicate is true.
Predicate registers:
- a hardware feature that supports predicated code.
Advantages:
- removes hard-to-predict branches.
- provides a larger pool of instructions for ILP (by combining several smaller basic blocks into one larger hyperblock).
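The commit rule can be sketched in a few lines of Python. This is a toy model under stated assumptions: the predicate names `p` / `not_p`, the register dictionaries, and the choice to compute every value before deciding whether to commit are all hypothetical (real hardware may nullify a guarded-off operation earlier).

```python
import operator


def execute(ops, regs, preds):
    """Run predicated operations in order.

    Each op is (guard, dest, fn, srcs). The value fn(*srcs) is computed, but
    it is committed to `dest` only if the guarding predicate is true;
    otherwise the op is nullified. The guard "T" means always true."""
    for guard, dest, fn, srcs in ops:
        value = fn(*(regs[s] for s in srcs))
        if guard == "T" or preds[guard]:
            regs[dest] = value  # commit
        # else: result discarded, modeling a nullified operation
    return regs


# if (a > b) x = a - b; else x = b - a;  with both arms as guarded ops
ops = [
    ("p", "x", operator.sub, ["a", "b"]),
    ("not_p", "x", operator.sub, ["b", "a"]),
]
a, b = 3, 5
regs = execute(ops, {"a": a, "b": b}, {"p": a > b, "not_p": not (a > b)})
print(regs["x"])  # 2
```

Both arms are issued with no branch at all; only the arm whose predicate is true changes the register state.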

Predicated Execution: Hyperblock
A hyperblock is “a predicated region of code consisting of a straight-line sequence of instructions with a single entry point and possibly multiple exit points”.
If-conversion: “The process of replacing branches with compare operations and associating operations with predicate defined by that compare”.
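A minimal sketch of if-conversion for a single if-then-else, following the quoted definition: the branch becomes a compare that defines a predicate and its complement, and each operation is tagged with the predicate of its arm. The `Op` class, the `cmp(...)` notation, and the predicate names `p1` / `!p1` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Op:
    guard: str  # guarding predicate; "T" means always true
    text: str   # the operation, e.g. "x = a - b"


def if_convert(cond, then_ops, else_ops, pred="p1"):
    """Turn `if cond: then_ops else: else_ops` into one straight-line block.

    The branch is replaced by a compare defining `pred` and its complement;
    each operation is associated with the predicate of the arm it came from."""
    not_pred = "!" + pred
    block = [Op("T", f"{pred}, {not_pred} = cmp({cond})")]
    block += [Op(pred, s) for s in then_ops]
    block += [Op(not_pred, s) for s in else_ops]
    return block


for op in if_convert("a > b", ["x = a - b"], ["x = b - a"]):
    print(f"({op.guard}) {op.text}")
# (T) p1, !p1 = cmp(a > b)
# (p1) x = a - b
# (!p1) x = b - a
```

Note that both arms now write `x` inside one block: plain SSA renaming does not know that the two writes are mutually exclusive, which is the gap PSSA addresses.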

Predicated Execution: Example

PSSA
PSSA introduces full-path predicates to extend SSA so that it handles predicate dependencies and the multiple control paths that are merged together in a single predicated region.
It enables effective predicated scheduling by (1) eliminating false dependencies via renaming, (2) creating full-path predicates, and (3) providing path-sensitive data-flow analysis.
Goal: to accomplish the same objectives as SSA, but for a predicated hyperblock.
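A simplified sketch of full-path predicates, assuming the hyperblock came from nested if-then-else code with no internal merges: each block's full-path predicate is the conjunction of every branch outcome on the path from the region entry. The tree encoding and the helpers below are hypothetical, and the papers' exact representation may differ.

```python
def full_path_predicates(tree, path=()):
    """Map each leaf block of a nested if-then-else region to its full-path
    predicate: the conjunction of branch outcomes from the region entry.

    A tree is a block name (str) or a tuple (cond, then_tree, else_tree);
    a predicate is a tuple of literals such as ("c1", "!c2")."""
    if isinstance(tree, str):
        return {tree: path}
    cond, then_tree, else_tree = tree
    result = full_path_predicates(then_tree, path + (cond,))
    result.update(full_path_predicates(else_tree, path + ("!" + cond,)))
    return result


def negate(lit):
    return lit[1:] if lit.startswith("!") else "!" + lit


def disjoint(p, q):
    """True if p and q contain complementary literals, so no execution can
    satisfy both (sufficient, not necessary, for mutual exclusion)."""
    return any(negate(lit) in q for lit in p)


# if (c1) { if (c2) B1 else B2 } else B3
fpp = full_path_predicates(("c1", ("c2", "B1", "B2"), "B3"))
print(fpp)  # {'B1': ('c1', 'c2'), 'B2': ('c1', '!c2'), 'B3': ('!c1',)}
print(disjoint(fpp["B1"], fpp["B2"]))  # True
```

Because B1 and B2 can never both execute, a definition in one and a use in the other do not really depend on each other; this is the kind of path-sensitive fact that plain SSA cannot express inside a single hyperblock.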

PSSA Transformation
Operations are processed in turn, starting at the top of the hyperblock and proceeding to the end.
PSSA form:
- First, PSSA assigns each target of an assignment operation in the hyperblock a unique variable name.
- Second, PSSA summarizes, using full-path predicates, under what conditions each of the multiple definitions of a variable reaches a join point in the hyperblock.
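The two steps above can be sketched as follows. This is a simplified illustration, not the papers' algorithm: as an assumption, each full-path predicate is represented as the set of path ids through the hyperblock on which it is true, so conjunction is set intersection and "killed on these paths" is set difference. `to_pssa` and the path ids are hypothetical.

```python
from collections import defaultdict


def to_pssa(ops):
    """Simplified PSSA-style renaming of a predicated hyperblock.

    ops: list of (paths, dest, srcs) in hyperblock order, where `paths` is a
    frozenset of path ids on which the op executes (its full-path predicate).
    Returns the renamed ops and, per variable, [(ssa_name, paths)] saying
    which definition is current at the end of the block on which paths."""
    version = defaultdict(int)
    reaching = {}  # var -> [(ssa_name, paths on which it is the current def)]
    renamed = []
    for paths, dest, srcs in ops:
        new_srcs = []
        for s in srcs:
            hits = [(n, p & paths) for n, p in reaching.get(s, []) if p & paths]
            uncovered = paths - frozenset().union(*(p for _, p in hits))
            if uncovered:
                hits.append((s, uncovered))  # live-in value on remaining paths
            # One reaching def: an ordinary SSA use. Several: a join point,
            # summarized as (name, full-path predicate) pairs.
            new_srcs.append(hits[0][0] if len(hits) == 1 else tuple(hits))
        version[dest] += 1
        name = f"{dest}{version[dest]}"  # step 1: unique target name
        # Step 2: the new definition replaces older ones exactly on `paths`.
        older = [(n, p - paths) for n, p in reaching.get(dest, []) if p - paths]
        reaching[dest] = older + [(name, paths)]
        renamed.append((paths, name, new_srcs))
    return renamed, reaching


P, NOT_P, ALL = frozenset({0}), frozenset({1}), frozenset({0, 1})
ops = [
    (ALL, "x", ["a"]),    # x = a
    (P, "x", ["b"]),      # (p)  x = b
    (NOT_P, "y", ["x"]),  # (!p) y = x : only x1 can reach here
    (ALL, "z", ["x"]),    # z = x      : x1 on path 1, x2 on path 0
]
renamed, reaching = to_pssa(ops)
print(renamed[2][2])  # ['x1']
```

The use of `x` under `!p` resolves to a single name, while the use after the merge gets a summary pairing each reaching definition with the paths on which it reaches.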

Example

PSSA Transformation (cont.)
Two phases:
- Hyperblocks are converted to PSSA form before optimization.
- After optimization, PSSA inserts clean-up code on the edges leaving the hyperblock.
Optimizations:
- Predicated Speculation
- Control Height Reduction
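One plausible form of the clean-up phase, sketched under assumptions: on an exit edge, each renamed definition is copied back to the original variable name, guarded by the paths on which that definition is current and which leave through that edge. Paths are encoded as sets of path ids (a hypothetical encoding), and `cleanup_copies` is a hypothetical helper, not the papers' code.

```python
def cleanup_copies(var, reaching, exit_paths):
    """Guarded copies that restore the original name `var` on an exit edge.

    reaching: [(ssa_name, paths)] giving the definition of `var` that is
    current on each set of paths; exit_paths: paths that leave through
    this edge. Each copy is (guard_paths, dest, src)."""
    copies = []
    for name, paths in reaching:
        guard = paths & exit_paths
        if guard:
            copies.append((guard, var, name))
    return copies


reaching_x = [("x1", frozenset({1})), ("x2", frozenset({0}))]
for guard, dest, src in cleanup_copies("x", reaching_x, frozenset({0, 1})):
    print(f"({sorted(guard)}) {dest} = {src}")
# ([1]) x = x1
# ([0]) x = x2
```

Because the copies are emitted only on exit edges, the renamed names stay free of false dependencies inside the hyperblock while code outside it still sees the original variable.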

Predicated Speculation (PSpec)
PSpec schedules an operation at its earliest schedulable cycle. The speculative operation is scheduled earlier than the operation it is control dependent on, and it is predicated on true.
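A small scheduling sketch shows why predicating on true helps: a guarded operation normally waits for the compare that defines its predicate, while a speculated one waits only for its data. The op names, latencies, and unlimited issue width are hypothetical, and the sketch assumes speculation is safe (for example, the destination was renamed by PSSA and the load cannot fault).

```python
def asap(ops, latency, speculate=()):
    """Earliest start cycle of each op with unlimited issue width.

    ops: dict name -> (guard, deps), listed in dependence order. `guard` is
    the op that defines the guarding predicate (None if unguarded) and
    `deps` are data dependences. Ops in `speculate` are predicated on true,
    so they no longer wait for their guard."""
    start = {}
    for name, (guard, deps) in ops.items():
        waits_for = list(deps)
        if guard is not None and name not in speculate:
            waits_for.append(guard)  # predicate dependence
        start[name] = max((start[d] + latency[d] for d in waits_for), default=0)
    return start


# cmp defines p; under p: t = load(...); u = t + 1
ops = {"cmp": (None, []), "ld": ("cmp", []), "add": ("cmp", ["ld"])}
latency = {"cmp": 1, "ld": 3, "add": 1}
print(asap(ops, latency))          # {'cmp': 0, 'ld': 1, 'add': 4}
print(asap(ops, latency, {"ld"}))  # {'cmp': 0, 'ld': 0, 'add': 3}
```

Speculating the load removes one cycle from the critical path here; `add` still waits for `p`, so only the correct path commits a result.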

Control Height Reduction (CHR)
CHR eases the control constraints between multiple control statements. It allows successive control operations on a control path to be scheduled in the same cycle, effectively reducing the control dependence height.
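The effect on control height can be sketched by comparing a serial predicate chain with predicates computed directly from the branch conditions. Assumptions (hypothetical): unit latency, all conditions available at cycle 0, and a single multi-input AND per predicate; with only two-input ANDs the reduced height would be logarithmic rather than 1.

```python
def chained(conds):
    """Serial predicate chain: p_k = p_(k-1) AND c_k.

    Returns (values, ready) where ready[k] is the cycle p_k becomes
    available, assuming unit-latency ANDs and all c_k ready at cycle 0."""
    values, ready = [], []
    p, t = True, 0
    for c in conds:
        p, t = p and c, t + 1
        values.append(p)
        ready.append(t)
    return values, ready


def height_reduced(conds):
    """Each full-path predicate p_k = c_1 AND ... AND c_k is computed directly
    from the conditions, so no p_k waits for p_(k-1). Assumes a single
    unit-latency multi-input AND per predicate (hypothetical hardware)."""
    values = [all(conds[:k + 1]) for k in range(len(conds))]
    return values, [1] * len(conds)


conds = [True, True, False, True]
print(chained(conds))         # ([True, True, False, False], [1, 2, 3, 4])
print(height_reduced(conds))  # ([True, True, False, False], [1, 1, 1, 1])
```

Both versions compute the same predicate values; only the dependence height changes, which is what lets successive control operations share a cycle.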

Experimental Results
Executed cycles, normalized to the number of cycles needed to execute the original code produced by Trimaran, for a 16-issue machine.
PSSA reduces execution time by 12% to 68%.
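Relating the normalized-cycle metric to the stated reduction (an arithmetic reading of the slide, not additional data; the symbols $C_{\mathrm{PSSA}}$ and $C_{\mathrm{orig}}$ are introduced here for illustration):

```latex
r = \frac{C_{\mathrm{PSSA}}}{C_{\mathrm{orig}}}, \qquad
0.12 \le 1 - r \le 0.68 \iff 0.32 \le r \le 0.88
```

Equivalently, $1/r$ ranges from $1/0.88 \approx 1.14$ to $1/0.32 = 3.125$, i.e., speedups of roughly 1.14x to 3.1x.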

Experimental Results (cont.)
Weighted average number of operations scheduled per cycle for hyperblocks when using PSSA with PSpec and CHR.

Experimental Results (cont.)
Weighted average register pressure in hyperblocks when using PSSA with PSpec and CHR.

Experimental Results (cont.)
Static and dynamic code expansion, normalized to the original code size.

Future Work
- Examine different PSSA representations to reduce code duplication and the number of full-path predicates created.
- Apply more optimization techniques.
- Study the advantages of implementing φ-functions for non-critical-path names.
- Create a more efficient implementation of PSSA.

Conclusions
- PSSA is an extension of SSA for predicated code.
- PSSA enables code optimizations such as:
  - Predicated Speculation
  - Control Height Reduction
- (+) Using PSSA to enable Predicated Speculation and Control Height Reduction reduces execution time by 12% to 68%.
- (-) PSSA increases code size significantly.