
1 Toward a More Accurate Understanding of the Limits of the TLS Execution Paradigm
Nikolas Ioannou, Jeremy Singer, Salman Khan, Polychronis Xekalakis, Paraskevas Yiapanis, Adam Pocock, Gavin Brown, Mikel Lujan, Ian Watson, and Marcelo Cintra
University of Edinburgh cts/VESPA
University of Manchester cts/iTLS

2 Introduction
Intl. Symp. on Workload Characterization - December
• Thermal/power constraints, complexity and time-to-market reasons lead to CMPs
• Many simple cores = high TLP but low ILP
  – OK for throughput computing and embarrassingly parallel applications
• Problem:
  – No benefits for sequential applications
  – Parallel applications with large sequential parts are still limited by Amdahl's law
• => Thread Level Speculation (TLS)
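The Amdahl limit mentioned on this slide can be made concrete with a short calculation. The sketch below is illustrative only: the function name and the sample fractions are assumptions, not figures from the talk.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only `parallel_fraction` of the
    sequential run time can be spread over `cores` cores (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical parallel fractions, chosen only to show the trend.
    for fraction in (0.5, 0.9, 0.99):
        bounds = [round(amdahl_speedup(fraction, n), 2) for n in (2, 8, 64)]
        print(fraction, bounds)
```

Even with a very large core count, a program that is 90% parallel stays below a 10x speedup, which is why the sequential parts are the target of TLS.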

3 Motivation
• Shortcoming of prior work in assessing TLS performance potential
  – Evaluations often tied to particular TLS architectural configuration
  – Proposals of new extensions naturally focused on particular extensions, not investigating interplay with other features
  – Workload choice often limited to one particular domain or programming style

4 Contributions
• In-depth implementation-independent study of TLS performance potential
• Evaluate TLS architectural features
• Evaluate workloads from a variety of domains
• Investigate load imbalance and coverage within the context of TLS

5 Outline
• Introduction
• Background
• Methodology
• Results
• Conclusions

6 Thread Level Speculation
• Compiler deals with:
  – Task selection
  – Code generation
• HW deals with:
  – Different context
  – Spawn threads
  – Detecting violations
  – Replaying
  – Arbitrate commit
[Figure: Thread 1 and speculative Thread 2 on a time axis]
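The hardware duties listed here (detecting violations, replaying, committing in order) can be illustrated with a toy model. This is a minimal sketch under strong assumptions, not the mechanism evaluated in the talk: every task is summarised by its set of exposed reads and its set of writes, all tasks start together and see only committed state, and the names `Task` and `find_violations` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """One speculative task (for example a loop iteration).

    `reads` holds exposed reads only, i.e. addresses read before the
    task itself has written them.
    """
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)


def find_violations(tasks):
    """Return indices of tasks that would be squashed and replayed.

    Assumption: tasks are listed in sequential program order, all run
    concurrently, and no values are forwarded between them. A task that
    reads an address written by any logically earlier task has therefore
    consumed a stale value (a read-after-write violation).
    """
    written_by_predecessors = set()
    squashed = []
    for index, task in enumerate(tasks):  # program order = commit order
        if task.reads & written_by_predecessors:
            squashed.append(index)
        written_by_predecessors |= task.writes
    return squashed


if __name__ == "__main__":
    tasks = [
        Task(reads={"a"}, writes={"b"}),
        Task(reads={"b"}, writes={"c"}),  # depends on the first task
        Task(reads={"x"}, writes={"y"}),  # independent
    ]
    print(find_violations(tasks))
```

Real TLS hardware tracks this per cache line and at run time; the sketch only shows which dependence pattern triggers a squash.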

7 Architectural Extensions
• Multiversioned caches
• Support for out-of-order spawning
• Dynamic dependence synchronization
• Intermediate checkpointing
• Data value prediction

8 Outline
• Introduction
• Background
• Methodology
• Results
• Conclusions

9 Methodology
• Benchmarks
  – Imperative:
    • SPEC CPU 2006
    • Mediabench II
  – Object oriented:
    • SPEC JVM 98
    • DaCapo
• Instrumentation
  – GCC4 pass
    • Annotate loop iterations and method bodies
    • Mark induction, reduction variables and use of return values
    • Operate after the intermediate optimizations
  – Jikes RVM modification

10 Methodology
• Trace Generation
  – Simics, full-system functional simulator
  – Non-intrusive trace of memory accesses
• Trace-Driven Simulation
  – In-house Simulator-tool
    • Extracts threads out of loop iterations and/or method call cont.
    • Simulates: multi-versioned caches, OoO spawning, dynamic dependence synch, and value prediction

11 Methodology
• Task Selection
  – In-order loop-level speculation
    • Innermost loops
    • Best loops out of three dynamic depth levels
  – In-order method and Out-of-Order speculation
    • Dynamic thread spawning policy favoring safer threads
    • Maximum thread size heuristic
  – All loops and/or methods are candidates

12 Outline
• Introduction
• Background
• Methodology
• Results
• Conclusions

13 Loop-level speculation - Innermost
[Figure: loop iterations Iter. 1, Iter. 2, … Iter. n, marked Speculative]
for(i=0;i

14 Loop-level speculation - Innermost

15 [Figure: loop iterations Iter. 1, Iter. 2, Iter. n, marked Speculative]
for(i=0;i

16 Loop-level speculation – Best loop depth

17 Method-level speculation - In-Order
[Figure: method and its continuation (Cont.), marked Speculative]
pid = spawn_thread();
if (pid != 0)
    method();
method_cont

18 Method-level speculation - In-Order

19 Method-level speculation - OoO
[Figure: method1, method2 and their continuations (Cont.) on a time axis, marked Speculative]
pid = spawn_thread();
if (pid != 0)
    method1();
method1_cont

method1() {
    method1_body1
    pid = spawn_thread();
    if (pid != 0)
        method2();
    method2_cont
}

20 Method-level speculation - OoO

21 Mixed speculation - In-Order

22 Mixed speculation - OoO

23 Load Imbalance and Coverage

24 Results – Multi-versioning to the rescue?

25 Outline
• Introduction
• Background
• Methodology
• Results
• Conclusions

26 Conclusions
• Load imbalance and limited coverage important factors in realizing TLS performance
• Support for OoO spawning not providing significant benefits for the task policy employed
• Multi-versioned caches unlock performance in some cases but not panacea
• Task selection critical
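To see why load imbalance alone can cap TLS speedup, consider the simplified model below. It is a sketch under stated assumptions rather than the simulator used in the study: tasks are assigned round-robin, there are no dependence violations and no spawn or commit overheads, tasks commit strictly in order, and a core holding a finished task stays busy until that task commits. The function names and the task durations are hypothetical.

```python
def tls_makespan(durations, cores):
    """Completion time of in-order speculative tasks on `cores` cores.

    Task i runs on core i % cores and may commit only after task i - 1
    has committed; the core is released at commit time, not finish time.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    core_free = [0.0] * cores   # time at which each core can take a new task
    last_commit = 0.0           # commit time of the previous task
    for index, duration in enumerate(durations):
        core = index % cores
        finish = core_free[core] + duration
        commit = max(finish, last_commit)  # wait for the predecessor
        core_free[core] = commit
        last_commit = commit
    return last_commit


def tls_speedup(durations, cores):
    """Sequential time divided by the modelled TLS completion time."""
    if not durations:
        raise ValueError("durations must not be empty")
    return sum(durations) / tls_makespan(durations, cores)


if __name__ == "__main__":
    balanced = [1.75] * 8
    imbalanced = [4, 1, 1, 1, 4, 1, 1, 1]  # same total work as `balanced`
    print(tls_speedup(balanced, 4), tls_speedup(imbalanced, 4))
```

With the same total work on four cores, the balanced task list reaches the ideal 4x in this model while the imbalanced one reaches 1.75x, because short tasks wait behind long predecessors before they can commit.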

27 Also in the paper
• In-depth analysis of high coverage loops for selected benchmarks
• Comparison of TLS loop-level speculation with a state-of-the-art auto-parallelizing compiler
• OoO Loop-level speculation
• Outline most of the proposed architectural and compiler extensions for TLS systems

28 Toward a More Accurate Understanding of the Limits of the TLS Execution Paradigm
Nikolas Ioannou, Jeremy Singer, Salman Khan, Polychronis Xekalakis, Paraskevas Yiapanis, Adam Pocock, Gavin Brown, Mikel Lujan, Ian Watson, and Marcelo Cintra
University of Edinburgh cts/VESPA
University of Manchester cts/iTLS

29 Backup slides – Auto parallelizing compiler comparison

30 Backup slides – OoO loop

