Fast optimal instruction scheduling for single-issue processors with arbitrary latencies Peter van Beek, University of Waterloo Kent Wilken, University of California, Davis

Presentation transcript:

Fast optimal instruction scheduling for single-issue processors with arbitrary latencies Peter van Beek, University of Waterloo Kent Wilken, University of California, Davis CP 2001 · Paphos, Cyprus November 2001

2 Local instruction scheduling
Schedule a basic block
· straight-line sequence of code with single entry, single exit
Single-issue pipelined processors
· a single instruction can begin execution each clock cycle
· delay or latency before result is available
Classic problem
· lots of attention in literature
Remains important
· single-issue RISC processors used in embedded systems

3 Example: evaluate (a + b) + c
instructions
A: r1 ← a
B: r2 ← b
C: r3 ← c
D: r1 ← r1 + r2
E: r1 ← r1 + r3
[Figure: dependency DAG over instructions A, B, C, D, E]

4 Example: evaluate (a + b) + c
non-optimal schedule
A: r1 ← a
B: r2 ← b
nop
D: r1 ← r1 + r2
C: r3 ← c
nop
E: r1 ← r1 + r3
[Figure: dependency DAG over instructions A, B, C, D, E]

5 Example: evaluate (a + b) + c
optimal schedule
A: r1 ← a
B: r2 ← b
C: r3 ← c
nop
D: r1 ← r1 + r2
E: r1 ← r1 + r3
[Figure: dependency DAG over instructions A, B, C, D, E]

6 Local instruction scheduling problem
Given a labeled dependency DAG G = (N, E) for a basic block, find a schedule S that specifies a start time S(i) for each instruction such that
S(i) ≠ S(j), ∀ i, j ∈ N, i ≠ j, and
S(j) ≥ S(i) + latency(i, j), ∀ (i, j) ∈ E, and
max{ S(i) | i ∈ N } is minimized.
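
The two feasibility conditions and the objective can be checked mechanically. The following is a minimal sketch, not the authors' code: the function names are hypothetical, the latencies are the ones listed for the running example on the minimal-model slide, and start times are assumed to count cycles from 1.

```python
from itertools import combinations

# Latency-labelled edges of the running example (a + b) + c:
# (i, j) -> latency(i, j). Taken from the example's constraints.
EDGES = {("A", "D"): 3, ("B", "D"): 3, ("C", "E"): 3, ("D", "E"): 1}


def is_valid_schedule(schedule, edges):
    """Check the two feasibility conditions of the problem definition."""
    # Single issue: no two instructions share a start time.
    if any(schedule[i] == schedule[j] for i, j in combinations(schedule, 2)):
        return False
    # Every dependent instruction waits out the latency of its predecessor.
    return all(schedule[j] >= schedule[i] + lat for (i, j), lat in edges.items())


def schedule_length(schedule):
    """The objective to minimize: the latest start time, max{S(i)}."""
    return max(schedule.values())


# Assumed cycle numbers for the order A, B, C, nop, D, E.
candidate = {"A": 1, "B": 2, "C": 3, "D": 5, "E": 6}
# Dropping the nop violates the latency between B and D.
too_tight = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}
```

Here `is_valid_schedule(candidate, EDGES)` holds while `too_tight` is rejected, because D would start before B's result is available.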

7 Previous work
NP-complete if arbitrary latencies (Hennessy & Gross, 1983; Palem & Simons, 1993)
Polynomial special cases (Bernstein & Gertner, 1989; Palem & Simons, 1993; Wu et al., 2000)
Optimal algorithms
· dynamic programming (e.g., Kessler, 1998)
· integer linear programming (e.g., Wilken et al., 2000)
· constraint programming (e.g., Ertl & Krall, 1991)

8 Minimal constraint model
variables A, B, C, D, E
domains {1, …, m}
constraints
D ≥ A + 3
D ≥ B + 3
E ≥ C + 3
E ≥ D + 1
all-diff(A, B, C, D, E)
[Figure: dependency DAG over instructions A, B, C, D, E]
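
To make the model concrete, here is a small sketch that solves it by plain enumeration for a fixed m. It is only an illustration of what the constraints mean, not the propagation-based solver the talk describes; `solve_minimal_model` is a hypothetical name.

```python
from itertools import permutations

VARIABLES = ["A", "B", "C", "D", "E"]
# (i, j, latency) triples, one per latency constraint j >= i + latency.
LATENCY = [("A", "D", 3), ("B", "D", 3), ("C", "E", 3), ("D", "E", 1)]


def solve_minimal_model(m):
    """Return one assignment over {1, ..., m} satisfying the model, or None."""
    # permutations() only yields tuples of distinct values,
    # which is exactly the all-diff constraint.
    for values in permutations(range(1, m + 1), len(VARIABLES)):
        s = dict(zip(VARIABLES, values))
        if all(s[j] >= s[i] + lat for i, j, lat in LATENCY):
            return s
    return None
```

With these latencies, `solve_minimal_model(5)` finds nothing and `solve_minimal_model(6)` returns a schedule, so 6 is the shortest feasible length for the example.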

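9 Bounds consistency
variables A, B, C, D, E, each with initial domain [1, 6]
constraints
D ≥ A + 3
D ≥ B + 3
E ≥ C + 3
E ≥ D + 1
all-diff(A, B, C, D, E)
Propagating D ≥ A + 3 first narrows A to [1, 3] and D to [4, 6]. At the fixed point the domains are A [1, 2], B [1, 2], C [3, 3], D [4, 5], E [6, 6].
A model is bounds consistent if, for each constraint C and for each variable x in C, the minimum and the maximum of the domain of x each have a support in C.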
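
The latency part of this propagation is easy to sketch. The code below is an assumption-laden illustration: `propagate_latency` is a hypothetical name, m = 6 is taken from the initial domain on the slide, and the all-diff propagator is deliberately left out, so the bounds it returns can be looser than full bounds consistency would give.

```python
def propagate_latency(bounds, latency_edges):
    """Tighten [lo, hi] bounds until every constraint j >= i + lat is stable."""
    bounds = {v: list(b) for v, b in bounds.items()}
    changed = True
    while changed:
        changed = False
        for i, j, lat in latency_edges:
            # j cannot start before the earliest start of i plus the latency.
            if bounds[j][0] < bounds[i][0] + lat:
                bounds[j][0] = bounds[i][0] + lat
                changed = True
            # i cannot start later than the latest start of j minus the latency.
            if bounds[i][1] > bounds[j][1] - lat:
                bounds[i][1] = bounds[j][1] - lat
                changed = True
    return {v: tuple(b) for v, b in bounds.items()}


LATENCY = [("A", "D", 3), ("B", "D", 3), ("C", "E", 3), ("D", "E", 1)]
initial = {v: (1, 6) for v in "ABCDE"}
tightened = propagate_latency(initial, LATENCY)
```

On the example this narrows A and B to [1, 2] and D to [4, 5] but leaves C at [1, 3] and E at [5, 6]; closing that remaining gap is the job of the all-diff propagator.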

10 Three improvements to minimal model
1. Initial distance constraints defined over nodes which define regions
2. Improved distance constraints for small regions
3. Predecessor and successor constraints defined over nodes with multiple predecessors or multiple successors

12 Distance constraints: Regions
A pair of nodes i, j defines a region in a DAG G if:
(i) there is more than one path from i to j, and
(ii) not all paths from i to j go through some node k distinct from i and j, that is, no single intermediate node lies on every path.
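
The definition can be tested directly on a small DAG by enumerating paths. This is a sketch under stated assumptions: the successor map `SUCC` is a made-up graph, not the one from the slides, and exhaustive path enumeration is only reasonable for small DAGs.

```python
def all_paths(succ, i, j):
    """Enumerate every path from i to j in a DAG given as a successor map."""
    if i == j:
        return [[j]]
    return [[i] + rest for k in succ.get(i, ()) for rest in all_paths(succ, k, j)]


def defines_region(succ, i, j):
    """Check conditions (i) and (ii) of the region definition."""
    paths = all_paths(succ, i, j)
    # Condition (i): more than one path from i to j.
    if len(paths) < 2:
        return False
    # Condition (ii): no node other than i and j lies on every path.
    interiors = [set(p[1:-1]) for p in paths]
    return not set.intersection(*interiors)


# Hypothetical DAG: two diamonds joined at D.
SUCC = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E", "F"], "E": ["G"], "F": ["G"]}
```

In this graph A, D and D, G each define a region, while A, G does not, because every path from A to G passes through D.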

13 Distance constraints: Initial estimate
[Figure: DAG with nodes A, B, C, D, E, F, G, H]

14 Distance constraints: Initial estimate
[Figure: the same DAG with a timeline j, j+1, …, j+5; labels A, F and 5]

15 Distance constraints: Initial estimate
[Figure: the same DAG with a timeline j, j+1, …, j+5; labels E, H and 5]

16 Distance constraints: Initial estimate
[Figure: the same DAG with a timeline j, j+1, …, j+9; labels A and H]

18 Improved distance constraints for small regions
Given H ≥ A + 9
· Extract region from DAG
· Post constraints
· Test consistency of A = 1, H = 10
· propagate latency
· propagate all-diff
[Figure: DAG with nodes A to H annotated with the domains [1,1], [10,10], [2,3], [5,6], [6,7], [2,3]]

19 Improved distance constraints for small regions
Given H ≥ A + 9
· Extract region from DAG
· Post constraints
· Test consistency of A = 1, H = 10
· propagate latency
· propagate all-diff
· inconsistent
Repeat with H ≥ A + 10
[Figure: DAG with nodes A to H annotated with the domains [1,1], [10,10], [2,3], [5,6], [6,7], [2,3]]
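
The test-and-increment loop can be sketched as follows. This is an illustration only: the slides test consistency by propagating the latency and all-diff constraints, whereas this sketch swaps in exhaustive enumeration, which is only practical for small regions. The function names and the four-node region in the example are hypothetical, and latencies are assumed to be at least 1.

```python
from itertools import permutations


def distance_is_feasible(interior, edges, src, dst, d):
    """Can the region be scheduled with S(src) = 1 and S(dst) = 1 + d?"""
    slots = range(2, d + 1)  # cycles strictly between src and dst
    for values in permutations(slots, len(interior)):
        s = dict(zip(interior, values))
        s[src], s[dst] = 1, 1 + d
        if all(s[j] >= s[i] + lat for i, j, lat in edges):
            return True
    return False


def tighten_distance(interior, edges, src, dst, d):
    """Raise the distance estimate d until the region becomes schedulable."""
    while not distance_is_feasible(interior, edges, src, dst, d):
        d += 1
    return d


# Hypothetical region: A feeds B and C, which both feed H.
REGION_EDGES = [("A", "B", 1), ("A", "C", 1), ("B", "H", 2), ("C", "H", 2)]
```

For this region the longest path gives an initial estimate of 3, but B and C cannot share a cycle, so `tighten_distance(["B", "C"], REGION_EDGES, "A", "H", 3)` raises the distance to 4.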

21 Predecessor constraints
[Figure: DAG with nodes A to H annotated with latencies and domain bounds, including [5,9], [8,12], [9,12], [6,9], [5,8]]

22 Predecessor constraints
[Figure: the same DAG with a timeline over cycles 5 to 9 and the derived bound [9,12]]

23 Predecessor constraints
[Figure: the same DAG with the derived bounds [9,12] and [12,14]]

24 Successor constraints
[Figure: the same DAG with a timeline over cycles 6 to 9 and the derived bound [4,6]]

25 Solving instances of the model
Use constraints to establish:
· lower bound on length m of optimal schedule
· lower and upper bounds of variables
Backtracking search
· maintains bounds consistency
· Puget's (1998) all-diff propagator and optimizations
· Leconte's (1996) optimizations
· branches on lower(x), lower(x)+1, …
If no solution is found, increment m and repeat search
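
The outer loop and the branching order can be sketched in a few lines. This is a simplified illustration, not the authors' solver: it does not maintain bounds consistency or use any all-diff propagator, it starts from the trivial lower bound of one cycle per instruction, and it assumes `order` lists the instructions in topological order. The names `search` and `optimal_schedule` are hypothetical.

```python
def search(order, edges, m, assignment=None):
    """Depth-first search for a schedule whose start times all lie in 1..m."""
    assignment = {} if assignment is None else assignment
    if len(assignment) == len(order):
        return dict(assignment)
    x = order[len(assignment)]
    # Earliest start permitted by the already-scheduled predecessors of x.
    lower = max(
        [assignment[i] + lat for i, j, lat in edges if j == x and i in assignment],
        default=1,
    )
    used = set(assignment.values())
    for t in range(lower, m + 1):  # branch on lower(x), lower(x)+1, ...
        if t in used:  # single issue: one instruction per cycle
            continue
        assignment[x] = t
        result = search(order, edges, m, assignment)
        if result is not None:
            return result
        del assignment[x]
    return None


def optimal_schedule(order, edges):
    """Try m = lower bound, then m + 1, ... until a schedule exists."""
    m = len(order)  # trivial lower bound: one cycle per instruction
    while True:
        result = search(order, edges, m)
        if result is not None:
            return m, result
        m += 1


EDGES = [("A", "D", 3), ("B", "D", 3), ("C", "E", 3), ("D", "E", 1)]
```

Because the first m that admits a schedule is returned, the result is optimal; a tighter initial lower bound only reduces the number of failed searches.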

26 Experimental results
Embedded in GNU Compiler Collection (GCC)
Compared with:
· GCC's critical path list scheduling
· ILP scheduler (Wilken et al., 2000)
SPEC95 floating point benchmarks
· compiled using highest level of optimization (-O3)
Target processor:
· single-issue
· latency of 3 for loads, 2 for floating point, 1 for integer ops

27 Experimental results: SPEC95 floating point benchmarks
[Table with rows: Total basic blocks (BB); BB passed to CSP scheduler; BB solved optimally by CSP scheduler; BB with improved schedule; Static cycles improved; Total benchmark cycles; CSP scheduling time (sec.); Baseline compile time (sec.)]

28 Scheduling time for CSP and ILP schedulers

29 Quantifying contributions of three model improvements
[Figure: problems solved (out of 15)]

30 Conclusions
CP approach to local instruction scheduling
· single-issue processors
· arbitrary latencies
Optimal and fast on very large, real problems
· experimental evaluation on SPEC95 benchmarks
· 20-fold improvement over previous best approach
Key was an improved constraint model

31 Good ideas not included
Cycle cutsets (e.g., Dechter, 1990)
· most larger problems had small cutsets (2 to 20 nodes) that split the problem into equal-sized independent subproblems
Singleton consistency (e.g., Prosser et al., 2000)
· often reduced domains dramatically prior to search
Symmetry breaking constraints
· many symmetric (non-)schedules