Presentation is loading. Please wait.

Presentation is loading. Please wait.

Dynamic Performance Tuning for Speculative Threads Yangchun Luo, Venkatesan Packirisamy, Nikhil Mungre, Ankit Tarkas, Wei-Chung Hsu, and Antonia Zhai Dept.

Similar presentations


Presentation on theme: "Dynamic Performance Tuning for Speculative Threads Yangchun Luo, Venkatesan Packirisamy, Nikhil Mungre, Ankit Tarkas, Wei-Chung Hsu, and Antonia Zhai Dept."— Presentation transcript:

1 Dynamic Performance Tuning for Speculative Threads Yangchun Luo, Venkatesan Packirisamy, Nikhil Mungre, Ankit Tarkas, Wei-Chung Hsu, and Antonia Zhai Dept. of Computer Science and Engineering Dept. of Electronic Computer Engineering University of Minnesota – Twin Cities

2 Motivation Multicore processors brought forth large potentials of computing power Exploiting these potentials demands thread-level parallelism 2 Intel Core i7 die photo

3 Exploiting Thread-Level Parallelism Potentially more parallelism with speculation 3 dependence Sequential Store *p Load *q Store *p Time p != q Thread-Level Speculation (TLS) Traditional Parallelization Load *q p != q ?? Store 88 Load 20 Parallel execution Store 88 Load 88 Speculation Failure Time Load 88 Compiler gives up p == q But Unpredictable

4 Addressing Unpredictability State-of-the-art approach: Profiling-directed optimization Compiler analysis Source code Profiling run Profiling info Region selection Parallel code generation Feedback Machine code A typical compilation framework 4 Speculative code optimization

5 Problem #1 0 1 2 3 3 Time signal allow commit wait spawning thread inter-thread synchronization speculation failure …… load imbalance 5 4 Hard to model TLS overhead statically, even with profiling

6 Problem #2 6 Profiling Behavior Training Input Actual Input Actual Behavior Profile-based decision may not be appropriate for actual input

7 Problem #3 7 Time phase1phase2 Metric phase3 Behavior of the same code changes over time

8 Problem #4 8 Load *p=0x88 Prefetching Extra misses Pollution Load 0x22 miss Load 0x22 hit Load 0x22 miss SequentialParallel PPP How important are cache impacts? L1$ P Load *p=0x88 P L1$ P P Load *p=0x80 Load *p=0x08

9 Impact on Cache Performance 9 60% load *p=0x88 Cache: p == p Prefetched (+) load *p=0x88 Cache impact is difficult to model statically A major loop from benchmark art (scanner.c:594) 2.5X

10 Our proposal Proposed Performance Tuning Framework Compiler AnalysisRuntime Analysis 10 Source code Profiling info Region selection Parallel code generation Machine code Speculative code optimization Accurate Timing & Cache Impacts Adaptations for inputs and phase changes Performance Evaluation Tuning Policy Compiler Annotation Runtime Information Dynamic Execution Performance Profile Table

11 Parallelization Decisions at Runtime 11 … Evaluate performance Thread spawningEnd of a parallel execution spawn parallelize spawn serialize … spawn Parallel executionSequential execution Parallel Execution New execution model facilitates dynamic tuning Performance Profile Table

12 Evaluating Performance Impact 12 Runtime Analysis Simple metric: cost of speculative failure Comprehensive evaluation: sequential performance prediction Performance Evaluation Tuning Policy Runtime Information Dynamic Execution Performance Profile Table

13 Sequential Performance Prediction 13 Predict PP Programmable Cycle Classifier Information from ROB Hardware Counters iFetch tick Cache Busy ExeStall Speculative Parallel Predicted sequential Busy iFetch ExeStall Cache TLS overhead Performance Impact

14 Cache Behavior: Scenario #1 14 TLS Count the miss in squashed execution load p (unmodified) miss load p hit Predicted load p hit SEQ load p miss T0 Squash

15 Cache Behavior: Scenario #2 15 TLS load p miss load p miss Predicted load p miss*2 SEQ load p miss store p invalidate p store p Not count a miss if tag matched invalid status Not count a miss if tag matched invalid status store p T0

16 Tune the performance 16 Exhaustive search Prioritized search Runtime Analysis Performance Evaluation Tuning Policy Runtime Information Dynamic Execution Performance Profile Table

17 How to prioritize the search 17 Search order: Inside-Out Compiler suggestion Outside-In Performance summary metric: Speedup or not? Improvement in cycles Search order PerformanceSummary metric Speedup or not? Inside-Out Compiler suggestion Improve. in cycles More Aggressive Quantitative +StaticHint Quantitative Simple Suboptimal

18 Evaluation Infrastructure Benchmarks SPEC2000 written in C, -O3 optimization Underlying architecture 4-core, chip-multiprocessor (CMP) speculation supported by coherence Simulator Superscalar with detailed memory model simulates communication latency models bandwidth and contention Detailed, cycle-accurate simulation C C P C P Interconnect C P C P 18

19 Comparing Tuning Policies 19 1.17x 1.23x 1.37x Parallel Code Overhead

20 Profile-Based vs. Dynamic 20 Dynamic outperformed static by 10% (1.37x/1.25x) 1.25x Potentially go up to 15% (1.46x/1.27x) 1.27x 1.46x

21 Related works: Region Selection Runtime Loop Detection [Tubella HPCA98] [Marcuello ICS98] Compiler Heuristics [Vijaykumar Micro98] 21 StaticallyDynamically Saving Power from Useless Re- spawning [Renau ICS05] Simple Profiling Pass [Liu PPoPP06] (POSH) Extensive Profiling [Wang LCPC05] Balanced Min-Cut [Johnson PLDI04] Profile-Based Search [Johnson PPoPP07] Lack of adaptability Lack of high-level information Dynamic Performance Tuning [Luo ISCA09] (this work) Dynamic Performance Tuning [Luo ISCA09] (this work)

22 Conclusions Deciding how to extract speculative threads at runtime Quantitatively summarize performance profile Compiler hints avoid suboptimal decisions Compiler and hardware work together. 37% speedup over sequential execution 10% speedup over profile-based decisions 22 Dynamic performance tuning can improve TLS efficiency Configurable HPMs enable efficient dynamic optimizations


Download ppt "Dynamic Performance Tuning for Speculative Threads Yangchun Luo, Venkatesan Packirisamy, Nikhil Mungre, Ankit Tarkas, Wei-Chung Hsu, and Antonia Zhai Dept."

Similar presentations


Ads by Google