Presentation is loading. Please wait.

Presentation is loading. Please wait.

Instruction Scheduling: Beyond Basic Blocks Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp.

Similar presentations


Presentation on theme: "Instruction Scheduling: Beyond Basic Blocks Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp."— Presentation transcript:

1 Instruction Scheduling: Beyond Basic Blocks Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission to make copies of these materials for their personal use.

2 Local Scheduling As long as we stay within a single block List scheduling does well Problem is hard, so tie-breaking matters  More descendants in dependence graph  Prefer operation with a last use over one with none  Breadth first makes progress on all paths  Tends toward more ILP & fewer interlocks  Depth first tries to complete uses of a value  Tends to use fewer registers Classic work on this is Gibbons & Muchnick

3 Local Scheduling Forward and backward can produce different results cbr cmpstore 1 store 2 store 3 store 4 store 5 add 1 add 2 add 3 add 4 addI loadI 1 lshiftloadI 2 loadI 3 loadI 4 Block from SPEC benchmark “go” Operation loadloadIaddaddIstorecmp Latency 112141 1 255555 77777 88888 Latency to the cbr Subscript to identify

4 Local Scheduling Int Mem 1 loadI 1 lshift 2 loadI 2 loadI 3 3 loadI 4 add 1 4 add 2 add 3 5 add 4 addIstore 1 6 cmpstore 2 7 store 3 8 store 4 9 store 5 10 11 12 13 cbr ForwardScheduleForwardSchedule Int Mem 1 loadI 4 2 addIlshift 3 add 4 loadI 3 4 add 3 loadI 2 store 5 5 add 2 loadI 1 store 4 6 add 1 store 3 7 store 2 8 store 1 9 10 11 cmp 12 cbr 13 BackwardScheduleBackwardSchedule Using latency to root as the priority

5 Local Scheduling Schielke’s RBF algorithm Run 5 passes of forward list scheduling and 5 passes of backward list scheduling Break each tie randomly Keep the best schedule  Shortest time to completion  Other metrics are possible ( shortest time + fewest registers ) In practice, this does very well Randomized Backward & Forward

6 Scheduling Larger Regions Superlocal Scheduling Work EBB at a time Example has four EBBs abcdabcd g efef hihi l jkjk B1B1 B2B2 B4B4 B6B6 B5B5 B3B3

7 Scheduling Larger Regions Superlocal Scheduling Work EBB at a time Example has four EBBs Only two have nontrivial paths  {B 1,B 2,B 4 } & {B 1,B 3 } Having B 1 in both causes conflicts  Moving an op out of B 1 causes problems abcdabcd g efef hihi l jkjk B1B1 B2B2 B4B4 B6B6 B5B5 B3B3

8 Scheduling Larger Regions Superlocal Scheduling Work EBB at a time Example has four EBBs Only two have nontrivial paths  {B 1,B 2,B 4 } & {B 1,B 3 } Having B 1 in both causes conflicts  Moving an op out of B 1 causes problems abcdabcd g c,e f hihi l jkjk B1B1 B2B2 B4B4 B6B6 B5B5 B3B3 no c here !

9 Scheduling Larger Regions Superlocal Scheduling Work EBB at a time Example has four EBBs Only two have nontrivial paths  {B 1,B 2,B 4 } & {B 1,B 3 } Having B 1 in both causes conflicts  Moving an op out of B 1 causes problems  Must insert “compensation” code in B 3  Increases code space abcdabcd cgcg c,e f hihi l jkjk B1B1 B2B2 B4B4 B6B6 B5B5 B3B3 This one wasn’t done for speed!

10 Scheduling Larger Regions Superlocal Scheduling Work EBB at a time Example has four EBBs Only two have nontrivial paths  {B 1,B 2,B 4 } & {B 1,B 3 } Having B 1 in both causes conflicts  Moving an op into B 1 causes problems abcdabcd g efef hihi l jkjk B1B1 B2B2 B4B4 B6B6 B5B5 B3B3

11 Scheduling Larger Regions Superlocal Scheduling Work EBB at a time Example has four EBBs Only two have nontrivial paths  {B 1,B 2,B 4 } & {B 1,B 3 } Having B 1 in both causes conflicts  Moving an op into B 1 causes problems  Lengthens {B 1,B 3 }  Adds computation to {B 1,B 3 }  May need compensation code, too  Renaming may avoid “undo f ” a b c d,f undo f g efef hihi l jkjk B1B1 B2B2 B4B4 B6B6 B5B5 B3B3 This makes the path even longer!

12 Scheduling Larger Regions Superlocal Scheduling How much can we get?  Schielke saw 11 to 12% speed ups  Constrained away compensation code Why was this harder than DVNT ?  DVNT moved information  Scheduling moves ops  DVNT moves forward  Scheduling moves both ways  Value tables partition nicely  Dependence graph does not abcdabcd g efef hihi l jkjk B1B1 B2B2 B4B4 B6B6 B5B5 B3B3 Value numbering is the best case for superlocal scope

13 Scheduling Larger Regions More Aggressive Superlocal Scheduling Clone blocks to create more context abcdabcd g efef hihi l jkjk B1B1 B2B2 B4B4 B6B6 B5B5 B3B3 Join points create blocks that must work in multiple contexts 2 paths 3 paths

14 Scheduling Larger Regions More Aggressive Superlocal Scheduling Clone blocks to create more context Some blocks can combine  Single successor, single predecessor abcdabcd g efef hihi l jkjk B1B1 B2B2 B4B4 B 6a B 5a B3B3 jkjk B 5b ll B 6b B 6c

15 Scheduling Larger Regions More Aggressive Superlocal Scheduling Clone blocks to create more context Some blocks can combine  Single successor, single predecessor abcdabcd g efef hihi l jkjk B1B1 B2B2 B4B4 B 6a B 5a B3B3 jkjk B 5b ll B 6b B 6c

16 Scheduling Larger Regions More Aggressive Superlocal Scheduling Clone blocks to create more context Some blocks can combine  Single successor, single predecessor Now schedule EBBs {B 1,B 2,B 4 }, {B 1,B 2,B 5q }, {B 1,B 3,B 5b }  Pay heed to compensation code Works well for forward motion Backward motion still has off-path problems  Speeding up one path can slow down others (undo) abcdabcd g efef hilhil jkljkl B1B1 B2B2 B4B4 B 5a B3B3 jkljkl B 5b

17 Scheduling Larger Regions Trace Scheduling Start with execution counts for edges  Obtained by profiling abcdabcd g efef hihi l jkjk B1B1 B2B2 B4B4 B6B6 B5B5 B3B3

18 Scheduling Larger Regions Trace Scheduling Start with execution counts for edges  Obtained by profiling Pick the “hot” path abcdabcd g efef hihi l jkjk B1B1 B2B2 B4B4 B6B6 B5B5 B3B3 10 3 7 5 55 3 2 Block counts could mislead us — see B 5

19 Scheduling Larger Regions Trace Scheduling Start with execution counts for edges  Obtained by profiling Pick the “hot” path  B 1,B 2,B 4,B 6 Schedule it  Compensation code in B 3,B 5 if needed  Get the hot path right! If we picked the right path, the other blocks do not matter as much  Places a premium on quality profiles abcdabcd g efef hihi l jkjk B1B1 B2B2 B4B4 B6B6 B5B5 B3B3 10 3 7 5 55 3 2

20 Scheduling Larger Regions Trace Scheduling Entire CFG Pick & schedule hot path Insert compensation code Remove hot path from CFG Repeat the process until CFG is empty Idea Hot paths matter Farther off hot path, less it matters abcdabcd g efef hihi l jkjk B1B1 B2B2 B4B4 B6B6 B5B5 B3B3 10 3 7 5 55 3 2


Download ppt "Instruction Scheduling: Beyond Basic Blocks Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved. Students enrolled in Comp."

Similar presentations


Ads by Google