Fast optimal instruction scheduling for single-issue processors with arbitrary latencies Peter van Beek, University of Waterloo Kent Wilken, University.

1 Fast optimal instruction scheduling for single-issue processors with arbitrary latencies
Peter van Beek, University of Waterloo
Kent Wilken, University of California, Davis
CP 2001 · Paphos, Cyprus · November 2001

2 Local instruction scheduling
Schedule basic block
· straight-line sequence of code with single entry, single exit
Single-issue pipelined processors
· a single instruction can begin execution each clock cycle
· delay or latency before result is available
Classic problem
· lots of attention in literature
Remains important
· single-issue RISC processors used in embedded systems

3 Example: evaluate (a + b) + c
instructions
A  r1 ← a
B  r2 ← b
C  r3 ← c
D  r1 ← r1 + r2
E  r1 ← r1 + r3
[Figure: dependency DAG with edges A → D, B → D, C → E, D → E]

4 Example: evaluate (a + b) + c
non-optimal schedule
A  r1 ← a
B  r2 ← b
nop
D  r1 ← r1 + r2
C  r3 ← c
nop
E  r1 ← r1 + r3
[Figure: dependency DAG]

5 Example: evaluate (a + b) + c
optimal schedule
A  r1 ← a
B  r2 ← b
C  r3 ← c
nop
D  r1 ← r1 + r2
E  r1 ← r1 + r3
[Figure: dependency DAG]

6 Local instruction scheduling problem
Given a labeled dependency DAG G = (N, E) for a basic block, find a schedule S that specifies a start time S(i) for each instruction such that
S(i) ≠ S(j), ∀ i, j ∈ N, i ≠ j, and
S(j) ≥ S(i) + latency(i, j), ∀ (i, j) ∈ E, and
max{ S(i) | i ∈ N } is minimized.
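
The two conditions in the problem statement can be checked mechanically. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and the edge latencies are an assumption read off the constraints of the minimal constraint model for the (a + b) + c example (latency 3 out of each load, latency 1 from D to E).

```python
from typing import Dict, Tuple

# Assumed latencies for the (a + b) + c example: (i, j) -> latency(i, j).
EDGES: Dict[Tuple[str, str], int] = {
    ("A", "D"): 3, ("B", "D"): 3, ("C", "E"): 3, ("D", "E"): 1,
}

def is_valid(schedule: Dict[str, int], edges: Dict[Tuple[str, str], int] = EDGES) -> bool:
    """True if start times are pairwise distinct and respect every latency."""
    times = list(schedule.values())
    if len(set(times)) != len(times):            # S(i) != S(j) for i != j
        return False
    return all(schedule[j] >= schedule[i] + lat  # S(j) >= S(i) + latency(i, j)
               for (i, j), lat in edges.items())

def length(schedule: Dict[str, int]) -> int:
    """The objective to minimize: the latest start time in the schedule."""
    return max(schedule.values())

# Start times of the optimal schedule A, B, C, nop, D, E.
optimal = {"A": 1, "B": 2, "C": 3, "D": 5, "E": 6}
# Dropping the nop violates D >= B + 3.
too_tight = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}

print(is_valid(optimal), length(optimal))  # True 6
print(is_valid(too_tight))                 # False
```

A nop in a schedule is simply a cycle to which no instruction is assigned, so it does not need its own variable.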

7 Previous work
NP-complete if arbitrary latencies (Hennessy & Gross, 1983; Palem & Simons, 1993)
Polynomial special cases (Bernstein & Gertner, 1989; Palem & Simons, 1993; Wu et al., 2000)
Optimal algorithms
· dynamic programming (e.g., Kessler, 1998)
· integer linear programming (e.g., Wilken et al., 2000)
· constraint programming (e.g., Ertl & Krall, 1991)

8 Minimal constraint model
variables A, B, C, D, E
domains {1, …, m}
constraints
D ≥ A + 3
D ≥ B + 3
E ≥ C + 3
E ≥ D + 1
all-diff(A, B, C, D, E)
[Figure: dependency DAG]

9 Bounds consistency
For each constraint C and for each variable x in C, min has a support in C and max has a support in C
variables A, B, C, D, E
initial domain [1, 6]
constraints
D ≥ A + 3
D ≥ B + 3
E ≥ C + 3
E ≥ D + 1
all-diff(A, B, C, D, E)
[Figure: domains narrowing under propagation, through intermediate bounds [1, 3] and [4, 6] to the final bounds [1, 2], [3, 3], [4, 5], [6, 6]]
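
To make the narrowing concrete, here is a small sketch that propagates the example's constraints to a fixpoint. It is an illustration under assumptions, not the propagator used in the talk: the all-diff step is a naive Hall-interval rule (if k variables fit inside an interval of k values, every other variable's bounds are pushed out of that interval), and the name `propagate` is hypothetical.

```python
from typing import Dict, Optional, Tuple

# Latency constraints of the example: (i, j): lat encodes j >= i + lat.
EDGES = {("A", "D"): 3, ("B", "D"): 3, ("C", "E"): 3, ("D", "E"): 1}

def propagate(m: int, edges=EDGES) -> Optional[Dict[str, Tuple[int, int]]]:
    """Shrink [lo, hi] bounds to a fixpoint; return None if inconsistent."""
    names = sorted({v for edge in edges for v in edge})
    dom = {v: [1, m] for v in names}
    changed = True
    while changed:
        changed = False
        # Latency constraints j >= i + lat: raise lo(j), lower hi(i).
        for (i, j), lat in edges.items():
            if dom[j][0] < dom[i][0] + lat:
                dom[j][0] = dom[i][0] + lat
                changed = True
            if dom[i][1] > dom[j][1] - lat:
                dom[i][1] = dom[j][1] - lat
                changed = True
        if any(lo > hi for lo, hi in dom.values()):
            return None
        # all-diff: look for Hall intervals [a, b] among the current bounds.
        for a in sorted({lo for lo, _ in dom.values()}):
            for b in sorted({hi for _, hi in dom.values()}):
                if b < a:
                    continue
                inside = [v for v in names if a <= dom[v][0] and dom[v][1] <= b]
                if len(inside) > b - a + 1:
                    return None          # more variables than values
                if len(inside) == b - a + 1:
                    for v in names:
                        if v in inside:
                            continue
                        if a <= dom[v][0] <= b:
                            dom[v][0] = b + 1
                            changed = True
                        if a <= dom[v][1] <= b:
                            dom[v][1] = a - 1
                            changed = True
        if any(lo > hi for lo, hi in dom.values()):
            return None
    return {v: (lo, hi) for v, (lo, hi) in dom.items()}

print(propagate(6))
# {'A': (1, 2), 'B': (1, 2), 'C': (3, 3), 'D': (4, 5), 'E': (6, 6)}
print(propagate(5))  # None: no schedule of length 5 exists
```

With m = 6 the latency constraints alone give C the bounds [1, 3]; it is the all-diff step (A and B occupy cycles 1 and 2) that forces C to 3, which in turn forces E to 6.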

10 Three improvements to minimal model
1. Initial distance constraints defined over nodes which define regions
2. Improved distance constraints for small regions
3. Predecessor and successor constraints defined over nodes with multiple predecessors or multiple successors

12 Distance constraints: Regions
A pair of nodes i, j defines a region in a DAG G if:
(i) there is more than one path from i to j, and
(ii) not all paths from i to j go through some node k distinct from i and j.
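
The region test can be sketched directly from the two conditions. This is an illustration only: condition (ii) is read here as "no single node k, other than i and j, lies on every path from i to j", the DAG below is hypothetical (it is not the DAG from the figures), and the path counting is the simplest possible recursion, suitable only for small graphs.

```python
from typing import Dict, List, Optional

# Hypothetical DAG: two diamonds, A..D and D..G, joined at D.
SUCC: Dict[str, List[str]] = {
    "A": ["B", "C"], "B": ["D"], "C": ["D"],
    "D": ["E", "F"], "E": ["G"], "F": ["G"], "G": [],
}

def count_paths(succ: Dict[str, List[str]], i: str, j: str,
                banned: Optional[str] = None) -> int:
    """Number of paths from i to j that avoid the node `banned`."""
    if i == j:
        return 1
    return sum(count_paths(succ, s, j, banned)
               for s in succ.get(i, []) if s != banned)

def defines_region(succ: Dict[str, List[str]], i: str, j: str) -> bool:
    """Conditions (i) and (ii) for the pair (i, j)."""
    if count_paths(succ, i, j) < 2:              # (i) more than one path
        return False
    nodes = set(succ) | {s for ss in succ.values() for s in ss}
    # (ii) removing any single intermediate node k must leave some path.
    return all(count_paths(succ, i, j, banned=k) > 0
               for k in nodes - {i, j})

print(defines_region(SUCC, "A", "D"))  # True: two paths, no shared inner node
print(defines_region(SUCC, "A", "G"))  # False: every path goes through D
print(defines_region(SUCC, "A", "B"))  # False: only one path
```

In this hypothetical graph the pair (A, G) is not a region because it decomposes at D into the two smaller regions (A, D) and (D, G).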

13 Distance constraints: Initial estimate
[Figure: DAG with nodes A, B, C, D, E, F, G, H]

14 Distance constraints: Initial estimate
[Figure: the same DAG with a timeline j, j+1, …, j+5; labels A, F, and 5]

15 Distance constraints: Initial estimate
[Figure: the same DAG with a timeline j, j+1, …, j+5; labels E, H, and 5]

16 Distance constraints: Initial estimate
[Figure: the same DAG with a timeline j, j+1, …, j+9; labels A and H]

18 Improved distance constraints for small regions
Given H ≥ A + 9
Extract region from DAG
Post constraints
Test consistency of A = 1, H = 10
· propagate latency
· propagate all-diff
[Figure: region with nodes A to H; domain labels [1,1], [10,10], [2,3], [5,6], [6,7], [2,3]]

19 Improved distance constraints for small regions
Given H ≥ A + 9
Extract region from DAG
Post constraints
Test consistency of A = 1, H = 10
· propagate latency
· propagate all-diff
inconsistent
Repeat with H ≥ A + 10
[Figure: region with nodes A to H; domain labels [1,1], [10,10], [2,3], [5,6], [6,7], [2,3]]

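21 Predecessor constraints
[Figure: DAG with nodes A to H; domain labels [4, ], [,14], [5,9], [8,12], [9,12], [5,9], [6,9], [5,8]; other labels 3, 1, 7, 11]

22 Predecessor constraints
[Figure: the same DAG; domain labels [4, ], [,14], [5,9], [8,12], [9,12], [5,9], [6,9], [5,8], [9,12]; other labels 3, 3, 7, 11; timeline 5, 6, 7, 8, 9]

23 Predecessor constraints
[Figure: the same DAG; domain labels [4, ], [,14], [5,9], [8,12], [9,12], [5,9], [6,9], [5,8], [9,12], [12,14]; other labels 3, 1, 7, 11]

24 Successor constraints
[Figure: the same DAG; domain labels [4, ], [,14], [5,9], [8,12], [9,12], [5,9], [6,9], [5,8], [9,12], [12,14], [4,6]; other labels 3, 1, 7, 11; timeline 6, 7, 8, 9]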

25 Solving instances of the model
Use constraints to establish:
· lower bound on length m of optimal schedule
· lower and upper bounds of variables
Backtracking search
· maintains bounds consistency, using Puget's (1998) all-diff propagator and optimizations and Leconte's (1996) optimizations
· branches on lower(x), lower(x)+1, …
If no solution found, increment m and repeat search
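
The outer loop (start from a lower bound on m, search, increment m on failure) can be sketched as follows. This is a deliberately naive illustration: it does not maintain bounds consistency during search and does not use the Puget or Leconte propagators; the lower bound is just the larger of the instruction count and the critical-path length; and the names `solve` and `search` are hypothetical. It assumes the node list is given in topological order.

```python
from typing import Dict, List, Optional, Tuple

NODES: List[str] = ["A", "B", "C", "D", "E"]          # topological order
EDGES: Dict[Tuple[str, str], int] = {
    ("A", "D"): 3, ("B", "D"): 3, ("C", "E"): 3, ("D", "E"): 1,
}

def search(assigned, nodes, edges, est, m) -> Optional[Dict[str, int]]:
    """Assign start times in topological order, smallest value first."""
    if len(assigned) == len(nodes):
        return dict(assigned)
    v = nodes[len(assigned)]
    # All predecessors of v are already assigned, so lower(v) is known.
    lo = max([est[v]] + [assigned[i] + lat
                         for (i, j), lat in edges.items() if j == v])
    used = set(assigned.values())
    for t in range(lo, m + 1):          # branch on lower(v), lower(v)+1, ...
        if t in used:                   # all-diff: one instruction per cycle
            continue
        assigned[v] = t
        found = search(assigned, nodes, edges, est, m)
        if found is not None:
            return found
        del assigned[v]
    return None

def solve(nodes, edges) -> Tuple[int, Dict[str, int]]:
    """Return (m, schedule) for the smallest feasible schedule length m."""
    est = {v: 1 for v in nodes}         # earliest start times (critical path)
    for v in nodes:
        for (i, j), lat in edges.items():
            if j == v:
                est[v] = max(est[v], est[i] + lat)
    m = max(len(nodes), max(est.values()))   # simple lower bound on m
    while True:
        found = search({}, nodes, edges, est, m)
        if found is not None:
            return m, found
        m += 1                          # no solution: increment m and repeat

print(solve(NODES, EDGES))
# (6, {'A': 1, 'B': 2, 'C': 3, 'D': 5, 'E': 6})
```

Because m only grows from a valid lower bound, the first schedule found is optimal. On the example the search fails at m = 5 and succeeds at m = 6.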

26 Experimental results
Embedded in GNU Compiler Collection (GCC)
Compared with:
· GCC's critical path list scheduling
· ILP scheduler (Wilken et al., 2000)
SPEC95 floating point benchmarks
· compiled using highest level of optimization (-O3)
Target processor:
· single-issue
· latency of 3 for loads, 2 for floating point, 1 for integer ops

27 Experimental results: SPEC95 floating point benchmarks
[Table with entries for: Total basic blocks (BB); BB passed to CSP scheduler; BB solved optimally by CSP scheduler; BB with improved schedule; Static cycles improved; Total benchmark cycles; CSP scheduling time (sec.); Baseline compile time (sec.)]

28 Scheduling time for CSP and ILP schedulers
[Figure]

29 Quantifying contributions of three model improvements
[Figure: problems solved (/15)]

30 Conclusions
CP approach to local instruction scheduling
· single-issue processors
· arbitrary latencies
Optimal and fast on very large, real problems
· experimental evaluation on SPEC95 benchmarks
· 20-fold improvement over previous best approach
Key was an improved constraint model

31 Good ideas not included
Cycle cutsets (e.g., Dechter, 1990)
· most larger problems had small cutsets (2 to 20 nodes) that split problem into equal-sized independent subproblems
Singleton consistency (e.g., Prosser et al., 2000)
· often reduced domains dramatically prior to search
Symmetry breaking constraints
· many symmetric (non) schedules

