1 Optimal Superblock Scheduling Using Enumeration
Ghassan Shobaki, CS Dept.
Kent Wilken, ECE Dept.
University of California, Davis
www.ece.ucdavis.edu/aco/

2 Outline
- Background
- Existing Solutions
- Optimal Solution
- Experimental Results
- Summary and Future Work

3 Overview
- "Instruction scheduling is the most fundamental ILP-oriented phase." [Josh Fisher et al., "Embedded Computing"]
- The scheduler tries to find an instruction order that minimizes pipeline stalls
- The schedule must preserve the program's semantics and honor hardware constraints

4 Elements of Instruction Scheduling
- Region Formation
- Schedule Construction (the focus of our research)

5 Region Formation
- The scheduler's scope is a sub-graph of the program's control flow graph (CFG)
- Local scheduling: single basic block
- Global scheduling: multiple basic blocks:
  - Trace
  - Superblock and hyperblock
  - Treegion
  - General acyclic, e.g. Wavefront (2000)

6 Schedule Construction
- NP-hard problem for realistic machines
- Heuristic solutions: virtually all production compilers and most research
- Optimal approaches: recent research
  - Local: integer programming and enumeration
  - Global: integer programming

7 The Superblock
- Single-entry, multiple-exit sequence of basic blocks
- Data and control dependences and allowed code motions are represented by a directed acyclic graph (DAG)
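The following is a minimal sketch of how such a superblock DAG might be represented in code; the class and field names are illustrative assumptions, not the authors' implementation.

```python
class Instruction:
    """One node of a superblock dependence DAG (illustrative encoding)."""
    def __init__(self, name, is_exit=False, exit_prob=0.0):
        self.name = name
        self.is_exit = is_exit      # True for a branch that may leave the superblock
        self.exit_prob = exit_prob  # probability that this exit is taken
        self.succs = {}             # successor Instruction -> edge latency
        self.preds = {}             # predecessor Instruction -> edge latency

def add_dep(src, dst, latency):
    """Record a data or control dependence: dst may not issue earlier than
    'latency' cycles after src."""
    src.succs[dst] = latency
    dst.preds[src] = latency
```

Zero-latency edges can then encode pure ordering constraints, for example an instruction that may not be moved above a side exit.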

8 Example Superblock DAG
[Figure: a superblock DAG over instructions A through I; the side exits G and H are taken with probabilities 0.3 and 0.2, the final exit I with probability 0.5; edges are labeled with latencies.]

9 List Scheduling
- Most common method in practice
- Approximate greedy algorithm that runs fast in practice
- Data-ready instructions stored in a priority list
- Priorities assigned according to heuristics
- If the ready list is not empty, schedule the top-priority instruction
- Else schedule a stall
- Advance to the next issue slot
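Below is a compact sketch of this list-scheduling loop for a single-issue machine; the adjacency-list DAG encoding, the priority map, and the treatment of stalls are assumptions made for illustration rather than the scheduler used in the paper.

```python
import heapq

def list_schedule(dag, priority):
    """dag: node -> list of (successor, latency); priority: node -> number,
    larger means schedule earlier. Returns a single-issue schedule as a list
    of node names and 'stall' entries."""
    preds_left = {n: 0 for n in dag}
    for succs in dag.values():
        for s, _ in succs:
            preds_left[s] += 1

    earliest = {n: 0 for n in dag}       # earliest cycle each node may issue
    ready = [(-priority[n], n) for n in dag if preds_left[n] == 0]
    heapq.heapify(ready)
    schedule, cycle = [], 0

    while ready:
        # Pick the highest-priority instruction whose operands are available
        # by the current cycle; defer the rest.
        issued, deferred = None, []
        while ready:
            prio, n = heapq.heappop(ready)
            if earliest[n] <= cycle:
                issued = n
                break
            deferred.append((prio, n))
        for item in deferred:
            heapq.heappush(ready, item)

        if issued is None:
            schedule.append("stall")     # nothing is ready: schedule a stall
        else:
            schedule.append(issued)
            for s, lat in dag[issued]:   # release successors
                earliest[s] = max(earliest[s], cycle + lat)
                preds_left[s] -= 1
                if preds_left[s] == 0:
                    heapq.heappush(ready, (-priority[s], s))
        cycle += 1                       # advance to the next issue slot
    return schedule

# Hypothetical 4-instruction DAG: prints ['A', 'B', 'stall', 'C', 'D']
dag = {"A": [("B", 1), ("C", 3)], "B": [("D", 1)], "C": [("D", 0)], "D": []}
print(list_schedule(dag, {"A": 4, "B": 2, "C": 3, "D": 0}))
```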

10 Critical-Path Heuristic
[Figure: the example DAG annotated with each instruction's critical-path distance.]
Resulting schedule:
Cycle       0  1  2  3  4  5  6  7  8
Instruction A  B  G  C  D  H  E  F  I
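One common way to compute the critical-path priority is the longest latency-weighted path from each instruction to a leaf of the DAG; a small sketch, again assuming an adjacency-list encoding:

```python
from functools import lru_cache

def critical_path_priorities(dag):
    """dag: node -> list of (successor, latency). Returns node -> length of the
    longest latency-weighted path from that node to any leaf."""
    @lru_cache(maxsize=None)
    def cp(node):
        succs = dag[node]
        if not succs:
            return 0                 # a leaf (e.g. the final exit) has distance 0
        return max(lat + cp(s) for s, lat in succs)
    return {n: cp(n) for n in dag}
```

These values can be passed as the priority map to the list scheduler sketched above; ties are typically broken by a secondary heuristic.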

11 Superblock Heuristics
- Critical Path
- Successive Retirement
- Dependence Height and Speculative Yield (DHASY)
- G*
- Speculative Hedge
- Balance Scheduling

12 Optimal Scheduling
- Can improve on heuristic schedules
- Accurate heuristic methods are already complex
- In some applications, longer compile times can be tolerated
- Provides a reference for evaluating the accuracy of heuristics and for studying ILP limits

13 Objective
S: a given schedule
P_i: probability of exit i
D_i: delay of exit i from its lower bound L_i
E: number of side exits
Find a schedule with minimum cost:
Cost(S) = Σ_i P_i · D_i, summed over the E side exits and the final exit

14 Cost Function Example: Critical Path
[Figure: the example DAG annotated with each instruction's scheduling range [lower bound, upper bound].]
CP schedule:
Cycle       0  1  2  3  4  5  6  7  8
Instruction A  B  G  C  D  H  E  F  I
Cost = 0.3*1 + 0.2*1 + 0.5*0 = 0.5
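A small sketch of this cost computation; the per-exit lower bounds used in the example call are inferred from the slide's arithmetic and should be read as assumptions.

```python
def superblock_cost(cycle_of, exits):
    """cycle_of: instruction -> cycle in which it is scheduled.
    exits: list of (exit_instruction, probability, lower_bound).
    The cost is the probability-weighted sum of each exit's delay
    beyond its lower bound."""
    return sum(p * max(0, cycle_of[ins] - lb) for ins, p, lb in exits)

# The CP schedule above places the exits G, H, I in cycles 2, 5, 8.
# Assuming lower bounds of 1, 4, 8 and probabilities 0.3, 0.2, 0.5:
#   0.3*1 + 0.2*1 + 0.5*0 = 0.5
cycles = {"G": 2, "H": 5, "I": 8}
print(superblock_cost(cycles, [("G", 0.3, 1), ("H", 0.2, 4), ("I", 0.5, 8)]))  # 0.5
```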

15 [Flowchart: compute a heuristic solution and the lower bounds; if its cost is 0, done; otherwise run the optimal algorithm: fix a branch (exit) combination and enumerate; if a feasible schedule is found, done; otherwise try the next combination.]
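A sketch of the flow in this diagram, with the heuristic scheduler, lower-bound computation, branch-combination generator, and fixed-exit enumerator passed in as assumed helpers rather than spelled out:

```python
def schedule_superblock(dag, heuristic, lower_bounds,
                        branch_combinations, enumerate_fixed):
    """Return an optimal (or best-found) schedule for a superblock DAG.
    heuristic(dag, bounds) -> (schedule, cost); branch_combinations(...) yields
    exit-delay combinations in order of increasing cost; enumerate_fixed(...)
    returns a feasible schedule for a fixed combination, or None."""
    bounds = lower_bounds(dag)
    sched, cost = heuristic(dag, bounds)
    if cost == 0:                          # heuristic already meets the lower bounds
        return sched
    for combo in branch_combinations(dag, bounds, cost):
        result = enumerate_fixed(dag, bounds, combo)
        if result is not None:             # first feasible combination is optimal,
            return result                  # since combinations come in cost order
    return sched                           # nothing cheaper is feasible: keep heuristic
```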

16 Enumeration
- List scheduling with backtracking
- Explores one target length at a time
- A subset of instructions can be fixed
- Branch-and-bound approach with four feasibility tests (pruning techniques):
  - Node superiority
  - LB tightening
  - History-based domination
  - Relaxed scheduling
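A stripped-down sketch of the enumeration itself, for a single-issue machine with one target length and without the four pruning tests listed above; the DAG encoding and the handling of stalls are illustrative assumptions.

```python
def enumerate_schedule(dag, target_length):
    """dag: node -> list of (successor, latency). Try to build a schedule that
    fits in target_length issue slots (instructions plus stalls); return the
    slot list, or None if no such schedule exists."""
    preds_left = {n: 0 for n in dag}
    for succs in dag.values():
        for s, _ in succs:
            preds_left[s] += 1
    earliest = {n: 0 for n in dag}
    n_nodes = len(dag)

    def extend(slots, scheduled):
        if len(scheduled) == n_nodes:
            return slots                       # every instruction has been placed
        cycle = len(slots)
        if n_nodes - len(scheduled) > target_length - cycle:
            return None                        # not enough slots left: backtrack
        ready = [n for n in dag
                 if n not in scheduled and preds_left[n] == 0
                 and earliest[n] <= cycle]
        for choice in ready + [None]:          # branch on each ready node, then a stall
            if choice is None:
                result = extend(slots + ["stall"], scheduled)
            else:
                saved = {s: (preds_left[s], earliest[s]) for s, _ in dag[choice]}
                for s, lat in dag[choice]:
                    preds_left[s] -= 1
                    earliest[s] = max(earliest[s], cycle + lat)
                result = extend(slots + [choice], scheduled | {choice})
                for s, (p, e) in saved.items():
                    preds_left[s], earliest[s] = p, e   # undo on backtrack
            if result is not None:
                return result
        return None                            # every branch failed: backtrack further

    return extend([], frozenset())
```

In the full algorithm the four pruning tests cut this search down dramatically; this sketch keeps only the trivial "not enough slots left" check.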

17 Enumeration Example
[Figure: enumeration over a five-instruction DAG (I1–I5) with edge latencies of 2 and target length 4; a partial schedule starting I1, I2, I3 followed by a stall cannot be completed, so it is infeasible and the enumerator backtracks.]

18 Branch Combinations & Subset Sum
- The branch combination problem is NP-complete!
- Can be reduced to Subset Sum
- In practice, the number of branches and the ranges are small
- Solved efficiently using dynamic programming
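A small illustration of generating branch (exit) delay combinations in order of increasing cost; the plain Cartesian product below is a stand-in for the dynamic-programming formulation mentioned above, which is reasonable only because the number of branches and the delay ranges are small. The ranges, probabilities, and budget in the example call mirror the running example.

```python
from itertools import product

def branch_combinations(delay_ranges, probs, budget):
    """delay_ranges: for each side exit, the extra delays it may take beyond its
    lower bound. probs: the matching exit probabilities. Returns (delays, cost)
    pairs with cost strictly below the budget, sorted by increasing cost."""
    combos = []
    for delays in product(*delay_ranges):
        cost = sum(p * d for p, d in zip(probs, delays))
        if cost < budget - 1e-9:
            combos.append((delays, cost))
    combos.sort(key=lambda c: c[1])
    return combos

# Two side exits with probabilities 0.3 and 0.2, each allowed a delay of 0 or 1,
# with the heuristic cost of 0.5 as the budget:
for delays, cost in branch_combinations([range(2), range(2)], [0.3, 0.2], 0.5):
    print(delays, round(cost, 2))
# (0, 0) 0.0
# (0, 1) 0.2
# (1, 0) 0.3
```

These are the same three combinations examined on the next slide.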

19 Complete Example
[Figure: the example DAG annotated with each instruction's scheduling range.]
Start with the CP heuristic: cost = 0.5
Only length 8 is interesting
Branch Comb | C | F | Cost
(0, 0)      | 2 | 6 | 0.0
(0, 1)      | 2 | 7 | 0.2
(1, 0)      | 3 | 6 | 0.3

20 Branch Combination (0, 0), Cost = 0.0
[Figure: the example DAG with ranges tightened for this combination; the relaxed schedule 0: A, 1: B, 2: C, 3: D, 4: G, 5: E cannot place H in time, so this combination is infeasible.]

21 Branch Combination (0, 1), Cost = 0.2
[Figure: the enumeration for this combination on the example DAG, with its tightened ranges.]
Optimal schedule: A, B, C, G, D, H, E, F, I, with cost 0.2

22 Experimental Results
- Superblocks imported from GCC using SPEC CPU2000, FP and INT
- Scheduled for 4 machine models: single-issue, dual-issue, quad-issue, and six-issue
- Time limit set to 1 second per problem

23 Superblock Statistics
Metric                     | FP2000 Max | FP2000 Avg | INT2000 Max | INT2000 Avg
DAG Size                   | 1236       | 24         | 454         | 17
Exit Count                 | 31         | 2.8        | 42          | 3.3
Final-Exit Probability (%) | 99         | 68         | 99          | 66
Side-Exit Probability (%)  | 48         | 17         | 49          | 14

24 INT2000 Results
Issue Rate          | 1    | 2    | 4    | 6   | Avg
Hard Blocks         | 2513 | 2131 | 1685 | 573 | 1726
% Timeouts          | 1.4  | 0.8  | 1.1  | 0.9 | 1.1
Avg Soln Time (ms)  | 5    | 5    | 9    | 9   | 7
% Improved Blocks   | 85   | 70   | 82   | 81  | 79
% Cycle Improvement | 2.9  | 2.4  | 3.5  | 4.1 | 3

25 Summary & Future Work
- An optimal superblock scheduling technique has been developed
- About 99% of hard problems solved within 1 second
- About 80% of hard problems improved
- Next goal: explore other global regions; trace is the strongest candidate

