CS 201 Compiler Construction


CS 201 Compiler Construction Instruction Scheduling: Trace Scheduler

Instruction Scheduling Modern processors can exploit Instruction-Level Parallelism (ILP) by simultaneously executing multiple instructions. Instruction scheduling influences how effectively ILP is exploited. Pipelined processors (e.g., ARM): reordering of instructions avoids delays due to hazards. EPIC/VLIW processors (e.g., Itanium): a single long instruction is packed with multiple operations (conventional instructions) that can be executed simultaneously.

Compiler Support Analyze dependences and rearrange the order of instructions, i.e., perform instruction scheduling. Pipelined: only a limited amount of ILP is required -- it can be uncovered by reordering instructions within each basic block. EPIC/VLIW: much more ILP is required -- it must be uncovered by examining code from multiple basic blocks.

Compiler Support Two techniques that go beyond basic block boundaries to uncover ILP: (Acyclic Schedulers) Trace Scheduling: examines a trace – a sequence of basic blocks along an acyclic program path; instruction scheduling can result in movement of instructions across basic block boundaries. (Cyclic Schedulers) Software Pipelining: examines basic blocks corresponding to consecutive loop iterations; instruction scheduling can result in movement of instructions across loop iterations.

Trace Scheduling A trace is a sequence of basic blocks that does not extend across loop boundaries. 1. Select a trace. 2. Determine the instruction schedule for the trace. 3. Introduce compensation code to preserve program semantics. Repeat the above steps until no part of the program remains unscheduled.

Trace Selection Selection of traces is extremely important for overall performance – traces should represent paths that are executed frequently. A fast instruction schedule for one path is obtained at the expense of a slower schedule for the other path due to speculative code motion.

Picking Traces o – an operation/instruction. Count(o) – the number of times o is expected to be executed during an entire program run. Prob(e) – the probability that edge e is taken -- important for conditional branches. Count(e) = Count(branch) x Prob(e). Counts are estimated using profiling – measure counts by running the program on a representative input.
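The estimate Count(e) = Count(branch) x Prob(e) can be sketched in a few lines; the function and edge names below are illustrative, not part of any real profiler:

```python
def edge_counts(branch_count, probs):
    """Given the execution count of a branch and the probability of each
    outgoing edge, estimate Count(e) = Count(branch) * Prob(e)."""
    return {edge: branch_count * p for edge, p in probs.items()}

# A branch executed 1000 times, taken 90% of the time:
counts = edge_counts(1000, {"taken": 0.9, "fallthrough": 0.1})
```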

Algorithm for Trace Construction Pick an operation with the largest execution count as the seed of the trace. Grow the trace backward from the seed. Grow the trace forward from the seed: given that block p is in the trace, include its successor s (reached via edge e = p-s) iff: of all the edges leaving p, e has the largest execution count; and of all the edges entering s, e has the highest execution count. The same approach is taken to grow the trace backward.

Algorithm Contd.. The trace stops growing forward when Count(e1) < Count(e2), i.e., the best edge leaving the last block is not also the best edge entering its target. Premature termination of the trace can occur in this algorithm; a slight modification is required to prevent it.

Algorithm Contd.. Let's say A-B-C-D has been included in the current trace. Count(D-E) > Count(D-F) => add E. Count(C-E) > Count(D-E) => do not add E. Premature termination occurs because the trace that could include C-E can no longer be formed -- C is already in the current trace. Modification: when testing the edges entering E, consider only edges P-E such that P is not already part of the current trace.
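The forward-growth step, including the modification above, might be sketched as follows; all names (grow_forward, succ_counts, pred_counts) are illustrative, not from a real compiler:

```python
def grow_forward(seed, succ_counts, pred_counts):
    """Grow a trace forward from `seed`. succ_counts[b] maps each
    successor s of b to the profiled count of edge b->s; pred_counts[s]
    maps each predecessor p of s to the count of edge p->s."""
    trace = [seed]
    b = seed
    while True:
        succs = succ_counts.get(b, {})
        if not succs:
            break
        # Of all edges leaving b, pick the one with the largest count.
        s, c = max(succs.items(), key=lambda kv: kv[1])
        # Modification from the slide: when checking that b->s is also the
        # best edge entering s, ignore predecessors already in the trace.
        rivals = [n for p, n in pred_counts.get(s, {}).items()
                  if p != b and p not in trace]
        if s in trace or any(n > c for n in rivals):
            break
        trace.append(s)
        b = s
    return trace

# Example: edge counts A->B: 10, B->C: 8, B->X: 2, C->D: 8, Y->D: 3.
succ_counts = {"A": {"B": 10}, "B": {"C": 8, "X": 2}, "C": {"D": 8}}
pred_counts = {"B": {"A": 10}, "C": {"B": 8}, "D": {"C": 8, "Y": 3}}
trace = grow_forward("A", succ_counts, pred_counts)
```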

Algorithm Contd.. A trace cannot cross loop boundaries: stop growing the trace if the edge encountered is a loop back edge, or if the edge enters or leaves a loop. In the figure, blocks 1 and 2 cannot be placed in the same trace because the edge directly connecting them is a loop back edge and the edges indirectly connecting them cross loop boundaries.

Instruction Scheduling Construct a DAG for the selected trace. Generate an instruction schedule using a scheduling heuristic: list scheduling with the critical path first. After the instruction schedule has been generated, compensation code may need to be introduced to preserve program semantics.

DAG and List Scheduling DAG – nodes are statements, edges are dependences. List scheduling – critical (longest) path first.

Statements (k is live after the block):
  i = j / 2
  k = i + 4
  if i < 3
  m = x + 4
  n = m + 1

Two possible schedules:
  i = j / 2        m = x + 4
  m = x + 4        i = j / 2
  n = m + 1        if i < 3
  if i < 3         k = i + 4
  k = i + 4        n = m + 1
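The critical-path-first heuristic can be sketched as a toy list scheduler; the DAG representation and all names below are illustrative:

```python
from functools import lru_cache

def list_schedule(succs, nodes):
    """succs[n] lists the statements that depend on n. Returns a
    dependence-respecting order, scheduling ready nodes with the
    longest path to a leaf (the critical path) first."""
    @lru_cache(maxsize=None)
    def height(n):  # length of the longest path from n to a leaf
        return 1 + max((height(s) for s in succs.get(n, ())), default=0)

    # pending[n] = predecessors of n not yet scheduled
    pending = {n: {p for p in nodes if n in succs.get(p, ())} for n in nodes}
    order = []
    ready = {n for n in nodes if not pending[n]}
    while ready:
        n = max(ready, key=height)      # critical path first
        order.append(n)
        ready.remove(n)
        for s in succs.get(n, ()):
            pending[s].discard(n)
            if not pending[s]:
                ready.add(s)
    return order

# The slide's example: k = i + 4 and the branch depend on i = j / 2;
# n = m + 1 depends on m = x + 4.
succs = {"i=j/2": ["k=i+4", "if i<3"], "m=x+4": ["n=m+1"]}
nodes = ["i=j/2", "k=i+4", "if i<3", "m=x+4", "n=m+1"]
order = list_schedule(succs, nodes)
```

Ties between equally critical ready nodes are broken arbitrarily here, which is why more than one valid schedule exists for the example above.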

Compensation Code Consider movement of instructions across basic block boundaries, i.e., past splits and merges in the control flow graph. 1. Movement of a statement past/below a split:

  i = n + 1
  k = j + 4
  if e

Moving k = j + 4 below the branch (if e) requires compensation code: a copy of k = j + 4 must be placed on the off-trace edge, which leads to k = i + 1.

Compensation Code Contd.. 2. Movement of a statement above a join: moving a statement (e.g., d = c - 2 in the sequence i = i + 1; d = c - 2 below the merge of c = a + b and c = a / 2) up into the on-trace predecessor requires compensation code -- a copy of the moved statement on the off-trace predecessor.

Compensation Code Contd.. 3. Movement of a statement above a split:

  i = j + 1
  i = i + 2
  if e

No compensation code is introduced when i = i + 2 is moved above the split -- this is speculation. Note that i = i + 2 can be moved above the split only if i is dead along the off-trace path (the path containing k = j + 1).

Compensation Code Contd.. 4. Movement of a statement below a join: in the sequence i = i + 1; d = c - 2 below the merge of c = a + b and c = a / 2, it is illegal to move i = i + 1 below the join unless i = i + 1 is dead code. This case will not arise assuming dead code has been removed.

Compensation Code Contd.. 5. Movement of a branch below a split: moving the branch if e1 (preceded by i = i + 1, with targets C and D) below the split if e2 requires duplicating the moved code -- a copy of i = i + 1; if e1 is placed on each outcome of if e2 so that both targets C and D remain reachable.

Compensation Code Contd.. 6. Movement of a branch above a join: moving the branch if e (with targets C and D) above the join of i = j + 1 and x = y + z requires a compensation copy of the branch along the off-trace path entering the join, so that both paths still test e.

Negatives: Redundant Code Compensation code replicates statements: a single on-trace statement such as A = B + C can end up duplicated along several off-trace paths.

Negatives: Code Explosion (Figure: a trace with operations A1..An, B1..Bn, C1..Cn, showing the order of instructions along the trace after scheduling.)

Code Explosion Contd.. O(n^2) after one step; O(n^n) after processing the off-trace paths. One trace of length n creates n more traces of length (n-1); each of those gives rise to (n-1) traces of length (n-2), and so on: n + n(n-1) + n(n-1)(n-2) + ... = O(n^n) traces.
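The series above can be checked numerically with a small illustrative function:

```python
def trace_count(n):
    """Sum the series n + n(n-1) + n(n-1)(n-2) + ... + n! from the slide,
    the number of traces created by repeated off-trace expansion."""
    total, product = 0, 1
    for k in range(n, 0, -1):   # successive factors n, n-1, ..., 1
        product *= k            # product = n(n-1)...(k)
        total += product
    return total

# trace_count(3) = 3 + 3*2 + 3*2*1 = 15
```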

Building a DAG for Scheduling The DAG contains the following edges: 1. Write-After-Read (WAR) data dependence. 2. Write-After-Write (WAW) data dependence. 3. Read-After-Write (RAW) data dependence. 4. Write-after-conditional-read: an edge between if e and x = c + d prevents movement of x = c + d above if e, which would clobber the x (defined by x = a - b) read by z = x + 1 along the off-trace path.

Building a DAG Contd.. 5. Conditional jumps: introduce an off-live edge between x = a - b and if e (in the sequence x = a - b; y = c + d; if e, where z = x + 1 lies on the off-trace path). This edge does not constrain movement past if e; it indicates that if x = a - b is moved past if e, it can be eliminated from the trace, but a copy must be placed along the off-trace path.
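The three data-dependence edge kinds can be sketched with a small classifier; modelling each statement as a pair of (defs, uses) sets is an assumption for illustration only:

```python
def dependence(earlier, later):
    """Return the dependence kinds that force `later` to stay after
    `earlier`. Each statement is a (defs, uses) pair of variable sets."""
    e_def, e_use = earlier
    l_def, l_use = later
    kinds = []
    if e_def & l_use:
        kinds.append("RAW")   # read-after-write (true dependence)
    if e_use & l_def:
        kinds.append("WAR")   # write-after-read (anti dependence)
    if e_def & l_def:
        kinds.append("WAW")   # write-after-write (output dependence)
    return kinds

# x = a - b  then  z = x + 1: a RAW dependence on x.
# x = a - b  then  x = c + d: a WAW dependence on x.
```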

Sample Problem: Introduce Compensation Code

  A = B + C            A = B + C
  D = C + 1            A = A + 1
  X = Y + 1            if ()
  if ()                D = C + 1
  A = A + 1            Z = D + 1
  Z = D + 1            P = Q + 1
  P = Q + 1            X = Y + 1
  ????? -> S           S