Instruction Scheduling Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.

Slides:

Advertisements

Similar presentations

Target Code Generation

Advertisements

P3 / 2004 Register Allocation. Kostis Sagonas 2 Spring 2004 Outline What is register allocation Webs Interference Graphs Graph coloring Spilling Live-Range.

ECE 667 Synthesis and Verification of Digital Circuits

U NIVERSITY OF D ELAWARE C OMPUTER & I NFORMATION S CIENCES D EPARTMENT Optimizing Compilers CISC 673 Spring 2009 Instruction Scheduling John Cavazos University.

Architecture-dependent optimizations Functional units, delay slots and dependency analysis.

1 Pipelining Part 2 CS Data Hazards Data hazards occur when the pipeline changes the order of read/write accesses to operands that differs from.

1 CS 201 Compiler Construction Machine Code Generation.

11/21/2002© 2002 Hal Perkins & UW CSEO-1 CSE 582 – Compilers Instruction Scheduling Hal Perkins Autumn 2002.

CS6290 Tomasulo’s Algorithm. Implementing Dynamic Scheduling Tomasulo’s Algorithm –Used in IBM 360/91 (in the 60s) –Tracks when operands are available.

Stanford University CS243 Winter 2006 Wei Li 1 Register Allocation.

Pipeline Computer Organization II 1 Hazards Situations that prevent starting the next instruction in the next cycle Structural hazards – A required resource.

Evaluating Heuristics for the Fixed-Predecessor Subproblem of Pm | prec, p j = 1 | C max.

ECE 2162 Tomasulo’s Algorithm. Implementing Dynamic Scheduling Tomasulo’s Algorithm –Used in IBM 360/91 (in the 60s) –Tracks when operands are available.

Optimal Instruction Scheduling for Multi-Issue Processors using Constraint Programming Abid M. Malik and Peter van Beek David R. Cheriton School of Computer.

Components of representation Control dependencies: sequencing of operations –evaluation of if & then –side-effects of statements occur in right order Data.

9. Code Scheduling for ILP-Processors TECH Computer Science {Software! compilers optimizing code for ILP-processors, including VLIW} 9.1 Introduction 9.2.

Limits on ILP. Achieving Parallelism Techniques – Scoreboarding / Tomasulo’s Algorithm – Pipelining – Speculation – Branch Prediction But how much more.

6/9/2015© Hal Perkins & UW CSEU-1 CSE P 501 – Compilers SSA Hal Perkins Winter 2008.

Intermediate Representations Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.

U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science Emery Berger University of Massachusetts, Amherst Advanced Compilers CMPSCI 710.

1 Handling nested procedures Method 1 : static (access) links –Reference to the frame of the lexically enclosing procedure –Static chains of such links.

Improving code generation. Better code generation requires greater context Over expressions: optimal ordering of subtrees Over basic blocks: Common subexpression.

Instruction Selection Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.

Instruction Selection, II Tree-pattern matching Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in.

Improving Code Generation Honors Compilers April 16 th 2002.

Wrapping Up Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.

Introduction to Code Generation Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.

Improving code generation. Better code generation requires greater context Over expressions: optimal ordering of subtrees Over basic blocks: Common subexpression.

Instruction Scheduling II: Beyond Basic Blocks Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp.

Data Flow Analysis Compiler Design Nov. 8, 2005.

Saman Amarasinghe ©MIT Fall 1998 Simple Machine Model Instructions are executed in sequence –Fetch, decode, execute, store results –One instruction.

Instruction Selection Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University.

Please see “portrait orientation” PowerPoint file for Chapter 8 Figure 8.1. Basic idea of instruction pipelining.

Fall 2002 Lecture 14: Instruction Scheduling. Saman Amarasinghe ©MIT Fall 1998 Outline Modern architectures Branch delay slots Introduction to.

Spring 2014Jim Hogg - UW - CSE - P501O-1 CSE P501 – Compiler Construction Instruction Scheduling Issues Latencies List scheduling.

1 Code Generation Part II Chapter 8 (1 st ed. Ch.9) COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University,

CALTECH CS137 Winter DeHon CS137: Electronic Design Automation Day 12: February 13, 2002 Scheduling Heuristics and Approximation.

1 Code Generation Part II Chapter 9 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University, 2005.

Local Register Allocation & Lab Exercise 1 Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp.

Instruction Selection and Scheduling. The Problem Writing a compiler is a lot of work Would like to reuse components whenever possible Would like to automate.

1 EECS 6083 Compiler Theory Based on slides from text web site: Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.

Lecture 3: Uninformed Search

Local Instruction Scheduling — A Primer for Lab 3 — Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in.

Local Instruction Scheduling — A Primer for Lab 3 — Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled.

Introduction to Code Generation Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice.

1 March 16, March 16, 2016March 16, 2016March 16, 2016 Azusa, CA Sheldon X. Liang Ph. D. Azusa Pacific University, Azusa, CA 91702, Tel: (800)

Instruction Scheduling: Beyond Basic Blocks Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp.

Advanced Architectures

Local Register Allocation & Lab Exercise 1

Pipeline Implementation (4.6)

Local Instruction Scheduling

Instruction Scheduling for Instruction-Level Parallelism

Instruction Scheduling Hal Perkins Summer 2004

Intermediate Representations

Introduction to Code Generation

Wrapping Up Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit.

Instruction Scheduling: Beyond Basic Blocks

Instruction Scheduling Hal Perkins Winter 2008

Local Instruction Scheduling — A Primer for Lab 3 —

CS 201 Compiler Construction

Intermediate Representations

Static Code Scheduling

Local Register Allocation & Lab Exercise 1

Lecture 16: Register Allocation

Instruction Scheduling: Beyond Basic Blocks

Instruction Scheduling Hal Perkins Autumn 2005

Code Generation Part II

Target Code Generation

CSE P 501 – Compilers SSA Hal Perkins Autumn /31/2019

Instruction Scheduling Hal Perkins Autumn 2011

Presentation transcript:

Instruction Scheduling Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.

What Makes Code Run Fast? Many operations have non-zero latencies Modern machines can issue several operations per cycle Execution time is order-dependent (and has been since the 60’s) Assumed latencies (conservative) Operation Cycles load3 store3 loadI1 add1 mult2 fadd1 fmult2 shift 1 branch 0 to 8 Loads & stores may or may not block > Non-blocking  fill those issue slots Branch costs vary with path taken Branches typically have delay slots > Fill slots with unrelated operations > Percolates branch upward Scheduler should hide the latencies

Example w  w * 2 * x * y * z Cycles Simple scheduleCycles Schedule loads early 2 registers, 20 cycles3 registers, 13 cycles Reordering operations for speed is called instruction scheduling

Instruction Scheduling (Engineer’s View) The Problem Given a code fragment for some target machine and the latencies for each individual operation, reorder the operations to minimize execution time The Concept Scheduler slow code fast code Machine description The task Produce correct code Minimize wasted cycles Avoid spilling registers Operate efficiently

Instruction Scheduling (The Abstract View) To capture properties of the code, build a dependence graph G Nodes n  G are operations with type(n) and delay(n) An edge e = (n 1,n 2 )  G if & only if n 2 uses the result of n 1 The Code a b c d e f g h i The Dependence Graph

Instruction Scheduling (Definitions) A correct schedule S maps each n  N into a non-negative integer representing its cycle number, and 1. S(n) ≥ 0, for all n  N, obviously 2. If (n 1,n 2 )  E, S(n 1 ) + delay(n 1 ) ≤ S(n 2 ) 3. For each type t, there are no more operations of type t in any cycle than the target machine can issue The length of a schedule S, denoted L(S), is L(S) = max n  N (S(n) + delay(n)) The goal is to find the shortest possible correct schedule. S is time-optimal if L(S) ≤ L(S 1 ), for all other schedules S 1 A schedule might also be optimal in terms of registers, power, or space….

Instruction Scheduling (What’s so difficult?) Critical Points All operands must be available Multiple operations can be ready Moving operations can lengthen register lifetimes Placing uses near definitions can shorten register lifetimes Operands can have multiple predecessors Together, these issues make scheduling hard ( NP-C omplete) Local scheduling is the simple case Restricted to straight-line code Consistent and predictable latencies

Instruction Scheduling The big picture 1. Build a dependence graph, P 2. Compute a priority function over the nodes in P 3. Use list scheduling to construct a schedule, one cycle at a time a. Use a queue of operations that are ready b. At each cycle I. Choose a ready operation and schedule it II. Update the ready queue Local list scheduling The dominant algorithm for twenty years A greedy, heuristic, local technique

Local List Scheduling Cycle  1 Ready  leaves of P Active  Ø while (Ready  Active  Ø) if (Ready  Ø) then remove an op from Ready S( op )  Cycle Active  Active  op Cycle  Cycle + 1 for each op  Active if (S( op ) + delay( op ) ≤ Cycle) then remove op from Active for each successor s of op in P if (s is ready) then Ready  Ready  s Removal in priority order op has completed execution If successor’s operands are ready, put it on Ready

Scheduling Example 1. Build the dependence graph The Code a b c d e f g h i The Dependence Graph

Scheduling Example 1. Build the dependence graph 2. Determine priorities: longest latency-weighted path The Code a b c d e f g h i The Dependence Graph

Scheduling Example 1. Build the dependence graph 2. Determine priorities: longest latency-weighted path 3. Perform list scheduling  r1 1) a: addr1,r1  r14) b:  r2 2) c: multr1,r2  r15) d:  r3 3) e: multr1,r3  r17)  r26) g: multr1,r2  r1 9) h: 11) i:storeAIr1  The Code a b c d e f g h i The Dependence Graph New register name used

More List Scheduling List scheduling breaks down into two distinct classes Forward list scheduling Start with available operations Work forward in time Ready  all operands available Backward list scheduling Start with no successors Work backward in time Ready  result >= all uses