Exploiting Postdominance for Speculative Parallelization


1 Exploiting Postdominance for Speculative Parallelization
Mayank Agarwal, Kshitiz Malik, Kevin Woley, Sam Stone, Matthew Frank
Implicitly Parallel Architectures Group, University of Illinois at Urbana-Champaign
Originally in HPCA-13
Modified and presented by: Borys Bradel

2 Outline
- Motivation
- Introduction
- PolyFlow Architecture
- Evaluations
- Conclusions
CARG March 14, 2007

3 Speculative Parallelization
- Parallelize single-threaded applications
- Dynamically break execution into concurrent tasks
- Multi-threaded and multi-core systems
- Maintain sequential semantics
Notes: The current trend in microprocessors is towards multi-core and multi-threaded designs, while single-threaded performance is not increasing. Speculative parallelization attempts to automatically parallelize hard sequential applications. An important aspect is deciding how to choose the concurrent speculative tasks.
[Figure: a sequential stream of tasks A, B, C, D distributed across processing units PU1 to PU4]

4 Task Extraction Policies
- Identify possible points for task creation
- Critical to successful parallelization
- Desirable features
  - Large set of possible tasks
  - Restrict amount of speculation
  - Exploit different kinds of parallelism
  - Work for varying application behaviors
Notes: Balance the benefit of speculative concurrency against the cost of misspeculation. Different kinds of parallelism include loop-level parallelism and memory-level parallelism.

5 Limitations of Branch Prediction
- Branch mispredicts limit the exploitable amount of ILP
- Superscalars discard all instructions fetched after a mispredicted branch
  - Not all need to be discarded
- Immediate postdominator
  - Earliest control-equivalent point
  - Control flow guaranteed to reconverge at E
[Figure: control-flow graph with blocks A, B, C, D, E, F]
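The notion of an immediate postdominator can be made concrete with a small sketch. The code below is only an illustration, not the analysis used in the talk: it runs the textbook iterative postdominator computation on a hypothetical control-flow graph whose edges (A to B, B to C or D, both to E, E to F) are an assumption, since the slide only names the blocks. The function name `immediate_postdominators` and the dictionary layout are likewise invented for this example.

```python
def immediate_postdominators(succ, exit_node):
    """Map each non-exit node to its immediate postdominator.

    succ maps every node to its list of successors; every node is
    assumed to reach exit_node, and only exit_node has no successors.
    """
    nodes = set(succ)
    # pdom[n] = nodes that lie on every path from n to the exit (n included).
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            # n is postdominated by itself plus whatever postdominates
            # all of its successors.
            new = set.intersection(*(pdom[s] for s in succ[n])) | {n}
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    ipdom = {}
    for n in nodes - {exit_node}:
        strict = pdom[n] - {n}
        # The immediate postdominator is the closest strict postdominator:
        # all other strict postdominators of n also postdominate it.
        ipdom[n] = next(c for c in strict if strict - {c} <= pdom[c])
    return ipdom


# Hypothetical CFG: B is a branch whose two arms (C and D) reconverge at E.
cfg = {"A": ["B"], "B": ["C", "D"], "C": ["E"], "D": ["E"], "E": ["F"], "F": []}
ipdom = immediate_postdominators(cfg, "F")
```

Under these assumed edges, `ipdom["B"]` is `"E"`: whichever way the branch at B goes, control reaches E, so work starting at E does not depend on the branch outcome.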

6 Control-Equivalent Spawning
- Start new task at the immediate postdominator of a branch
- Spawn E as a new task at B
  - Control-equivalent to B
- Main thread can speculate past B
- Spawned thread is as (control) speculative as branch B
[Figure: control-flow graph with blocks A to F; B is the spawner and E is the spawnee]

7 Control-Equivalent Spawning
- Resolve several mispredicts in parallel
- Reduce wastage from a single mispredict
[Figure: execution timelines of blocks A to F on PU1, PU2, and PU3, marking the events Task Spawn, Resolve Mispredict, and Reconnect for the spawned task]

8 Managing Data Dependences
- Spawned tasks
  - Control-equivalent to spawner
  - Data dependent
- Restrict data speculation
  - Delay dependent instructions
  - Register and memory
  - Until data becomes available
- Independent instructions can execute in parallel
[Figure: the spawner contains Branch, Prod1, and Prod2; the spawned task contains Cons1, Cons2, and Cons3]
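The policy of delaying dependent instructions while letting independent ones proceed can be sketched as a simple partition over register names. This is a simplified illustration, not the hardware mechanism from the talk: the function `partition_spawned`, the instruction tuples, and the register names are all hypothetical, and memory dependences are ignored.

```python
def partition_spawned(spawner_writes, spawned_instrs):
    """Split a spawned task's instructions into ready and delayed.

    spawner_writes: registers the spawner has yet to produce.
    spawned_instrs: (name, reads, writes) tuples in program order.
    """
    pending = set(spawner_writes)  # values that are not available yet
    ready, delayed = [], []
    for name, reads, writes in spawned_instrs:
        if pending & set(reads):
            # Needs a value that is still outstanding, so it must wait;
            # anything it writes is outstanding as well.
            delayed.append(name)
            pending |= set(writes)
        else:
            # Independent of the spawner: may execute in parallel.
            # A register it overwrites no longer depends on the spawner.
            ready.append(name)
            pending -= set(writes)
    return ready, delayed


# Hypothetical example: the spawner will still produce r1 and r2.
ready, delayed = partition_spawned(
    {"r1", "r2"},
    [
        ("I1", ["r1"], ["r3"]),  # consumes r1, so it is delayed
        ("I2", ["r4"], ["r5"]),  # independent of the spawner
        ("I3", ["r3"], ["r6"]),  # consumes I1's result, delayed too
    ],
)
```

Here only `I2` can run ahead; `I1` waits for the spawner and `I3` waits transitively.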

9 Control-Equivalent Parallelization
- Spawn immediate postdominator of branch
  - Task control-equivalent to spawner
- Benefits
  - Subsumes heuristics based on program structures
  - Better performance than hybrid heuristic policies
  - Amenable to dynamic implementations

10 Outline
- Motivation
- Introduction
- PolyFlow Architecture
- Evaluations
- Conclusions

11 Immediate Postdominator Spawns
- Broad classification into 4 categories:
  - Hammocks
  - Loop fall-throughs
  - Procedure fall-throughs
  - Others

12 Hammocks
- A ends in if-then-else branch
  - D postdominates A
- Upon reaching A
  - Spawn new task starting at D
  - Main task resolves branch
- Merits
  - Spawns across mispredicts
  - Finds useful work beyond mispredicts
  - Parallelize inner loops
  - Not directly exploited in most systems
[Figure: control-flow graph with blocks A to E; the main task starts at A, and D (Imm PDom) starts the spawned task]

13 Loop Fall-Throughs
- D ends in a loop branch
- Upon reaching D
  - Start new task at E
  - Main task executes loop
  - New task executes fall-through
- Merits
  - Exploit parallelism in outer loops
  - Reduce wastage from mispredicted loop branch
[Figure: control-flow graph with blocks A to E; the main task covers A to D, and E (Imm PDom) starts the spawned task]

14 Procedure Fall-Throughs
- C postdominates call instruction
- Upon reaching B
  - Spawn new task at C
  - Main task executes procedure
  - New task executes fall-through
- Merits
  - Spawns tasks in distant regions
  - Warms up ICache
[Figure: blocks A, B (call x), and C, plus procedure X; the main task executes Proc X while C (Imm PDom) starts the spawned task]

15 Others
- Remaining immediate postdoms
  - Postdominators of indirect calls and jumps
  - Complex control flow
- ~5-10% of static postdominators
- Important in several programs
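The four spawn categories can be summarized as a small classifier keyed on the instruction that ends the spawning block. This is a rough sketch under stated assumptions: the terminator labels (`"cond_branch"`, `"call"`, and so on) are hypothetical, and treating a backward conditional branch as a loop branch is a common heuristic rather than something the slides specify.

```python
def classify_spawn(terminator, is_backward_branch=False):
    """Classify an immediate-postdominator spawn point.

    terminator: hypothetical label for the instruction ending the block,
                e.g. "cond_branch", "call", "indirect_call", "indirect_jump".
    is_backward_branch: assumed marker for a loop branch.
    """
    if terminator == "call":
        # The postdominator is the code following the call.
        return "procedure fall-through"
    if terminator == "cond_branch":
        if is_backward_branch:
            # The postdominator is the code following the loop.
            return "loop fall-through"
        # if-then-else: the postdominator is the join point.
        return "hammock"
    # Indirect calls and jumps, and other complex control flow.
    return "other"


kinds = [
    classify_spawn("cond_branch"),
    classify_spawn("cond_branch", is_backward_branch=True),
    classify_spawn("call"),
    classify_spawn("indirect_jump"),
]
```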

16 Dynamic Spawn Distribution
- Hammock and Others constitute ~65% of dynamic spawns
- Not captured by most Speculative Parallelization Systems

17 Twolf new_dbox_a
[Figure: task spawns in Twolf's new_dbox_a across Processors 1 to 5, with spawns labeled 9dbc, 9dc8, 9dd8, and 9dec]

18 Outline
- Motivation
- Introduction
- PolyFlow Architecture
- Evaluations
- Conclusions

19 PolyFlow
[Figure: PolyFlow pipeline diagram. Components shown: Task Spawn Unit (if (nextPC==x) spawn y), Fetch PC 1-8, Unified Scheduler, Divert Queue, Execute 1-8, Retire, Flush]

20 The PolyFlow Architecture
- Speculative parallelization system
  - Current evaluations on wide SMT core
- Extend SMT system with task spawn unit
  - Manage task spawn, reconnection
  - Learn dependence and handle misspeculation
- Use compiler-generated postdominators
  - Passed as hints to dynamic system
  - Stored in a separate "spawn hint cache"
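The spawn rule quoted on the PolyFlow diagram, `if (nextPC==x) spawn y`, amounts to a table lookup on the next fetch PC. The sketch below models that lookup in software. It is an illustration only: the class `TaskSpawnUnit`, its method names, and the PC values are hypothetical, and the default context limit of 8 is an assumption that mirrors the 8-context SMT configuration in the evaluation.

```python
class TaskSpawnUnit:
    """Toy model of a spawn hint lookup: if next_pc == x, spawn at y."""

    def __init__(self, hints, max_contexts=8):
        # hints maps a trigger PC (x) to a spawn target PC (y), standing in
        # for compiler-generated postdominator hints.
        self.hints = dict(hints)
        self.max_contexts = max_contexts
        self.active = 1  # the main task occupies one context

    def on_fetch(self, next_pc):
        """Return the PC of a newly spawned task, or None if none is spawned."""
        target = self.hints.get(next_pc)
        if target is None or self.active >= self.max_contexts:
            return None
        self.active += 1
        return target


# Hypothetical hint: reaching PC 0x100 spawns a task at PC 0x140.
unit = TaskSpawnUnit({0x100: 0x140}, max_contexts=2)
first = unit.on_fetch(0x100)   # a free context exists, so the task is spawned
second = unit.on_fetch(0x100)  # both contexts are now busy
miss = unit.on_fetch(0x104)    # no hint for this PC
```

A dictionary stands in for the hint cache here; a real cache would have limited capacity and could miss on hints that were evicted.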

21 Evaluation Environment
- Baseline superscalar
  - 8-wide fetch/issue OOO core
  - 64-entry scheduler, 512-entry ROB
  - 8K 2-way assoc L1 ICache, 16K 4-way assoc L1 DCache
  - 512K 8-way assoc L2 cache
- Speculative parallelization system
  - 8-context SMT

22 Limitations
- Each thread can spawn one successor
  - Only outermost branch in if-else nest
- 512 entries in reorder buffer
  - Cannot reclaim resources
  - Limits parallelism
- Superscalar: fetch 1 taken branch per cycle
- PolyFlow: fetch from 2 tasks per cycle, 1 taken branch/c

23 Outline
- Motivation
- Introduction
- PolyFlow Architecture
- Evaluations
- Conclusions

24 Individual Spawn Heuristics
Legend: FT = fall-through; C-eq = control-equivalent spawning
- C-eq is NOT better than the best individual heuristics
- No single heuristic is suitable for all applications
- Control-equivalent spawning performs well overall

25 Hybrid Spawn Policies

26 Dynamic Implementation
- Dynamic Reconvergence Analysis*
  - Learns immediate postdominators dynamically
  - Trains quickly
- Can drive control-equivalent spawning
  - Spawn reconvergence point of branches
  - Alternative to compiler hints
* J. D. Collins et al., "Control Flow Optimization Via Dynamic Reconvergence Prediction," MICRO 2004

27 Outline
- Motivation
- Introduction
- PolyFlow Architecture
- Evaluations
- Conclusions

28 Conclusions
- Control-equivalent spawning
  - Reduces control speculation in spawned tasks
  - Generalizes common heuristics
- For an SMT-based system
  - Over twice the speedups of best heuristics
  - Better than an aggressive hybrid policy
- Amenable to dynamic implementations

29 Thank You

