Increasing the Energy Efficiency of TLS Systems Using Intermediate Checkpointing
Salman Khan 1, Nikolas Ioannou 2, Polychronis Xekalakis 3 and Marcelo Cintra 2
1 University of Manchester
2 University of Edinburgh
3 Intel Labs Barcelona - UPC
Introduction
Power efficiency, complexity and time-to-market reasons lead to CMPs
Problem:
–No benefits for sequential applications
–Even for mostly parallel applications, Amdahl’s Law limits performance gains with many cores
Solution: Thread Level Speculation (TLS)
–But performance gained through TLS costs energy
Can we reduce the wastefulness of re-execution due to misspeculation without losing performance?
HiPC 2011
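To make the Amdahl's Law point concrete, here is a minimal sketch of the speedup bound. The function name and the 90% parallel fraction are illustrative assumptions, not figures from the talk.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only `parallel_fraction` of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part runs at the same speed no matter how many cores exist.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Illustrative assumption: 90% of the program is parallel.
    for cores in (4, 16, 64):
        print(cores, round(amdahl_speedup(0.9, cores), 2))
```

Under that assumption the bound can never reach 10x, however many cores are added, which is why speeding up the sequential part (for example through TLS) matters.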
Key Contributions
Propose checkpointing to improve efficiency of speculative execution
Evaluate dependence prediction techniques to guide checkpoint placement
Our approach results in an energy saving of up to 14%, with 7% on average over normal TLS execution, with no significant effect on speedup.
Outline
Introduction
Checkpointing
Dependence Predictors
Checkpointing Policy
Experimental Setup and Results
Conclusions
Thread Level Speculation
Thread Level Speculation with Checkpointing
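The benefit of an intermediate checkpoint can be sketched as follows: on a violation, the task rolls back to the latest saved state at or before the offending load, rather than to the task start. This is a simplified model, not the mechanism from the paper. The function name and the notion of "position" (instructions executed since the task started) are assumptions made for illustration.

```python
from bisect import bisect_right


def reexecuted_instructions(violation_at, load_at, checkpoints=()):
    """Instructions squashed when a violation is detected.

    violation_at: instructions executed by the task when the violation is detected.
    load_at:      position of the load that consumed the stale value.
    checkpoints:  positions at which state was saved (hypothetical model).
    """
    if not 0 <= load_at <= violation_at:
        raise ValueError("expected 0 <= load_at <= violation_at")
    # The task start (position 0) is always a valid restart point.
    restart_points = sorted({0, *checkpoints})
    # Latest restart point at or before the load; later ones hold tainted state.
    idx = bisect_right(restart_points, load_at) - 1
    return violation_at - restart_points[idx]
```

For example, with a violation detected at position 1000 on a load at position 700, plain TLS squashes 1000 instructions, while a checkpoint at position 650 reduces that to 350. A checkpoint at position 900 does not help, because it was taken after the stale value was read.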
Placing Checkpoints
Stride
Dependence Prediction
–Address based
–Program Counter based
–Hybrid
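Stride placement, the simplest of the options above, can be sketched as taking a checkpoint every fixed number of instructions. The function and parameter names are hypothetical, and the cap on checkpoints is an assumed hardware budget.

```python
def stride_checkpoints(task_length, stride, max_checkpoints):
    """Positions of checkpoints taken every `stride` instructions.

    max_checkpoints models an assumed limit on checkpoints per task.
    """
    if stride <= 0:
        raise ValueError("stride must be positive")
    if max_checkpoints < 0:
        raise ValueError("max_checkpoints must be non-negative")
    # No checkpoint at position 0: the task start is already a restart point.
    positions = list(range(stride, task_length, stride))
    return positions[:max_checkpoints]
```

Stride placement ignores where dependences actually occur, so a checkpoint may sit far before the offending load. Dependence prediction addresses that.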
Dependence Prediction
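A dependence predictor of the address-based or PC-based kind can be sketched as a table of saturating counters. This is a generic design chosen for illustration; the slides do not give the actual table organization. The class name, table size, 2-bit counters and index function are all assumptions.

```python
class CounterTablePredictor:
    """Direct-mapped table of 2-bit saturating counters (illustrative design).

    Index it with the load's PC for a PC-based predictor, or with the
    load's effective address for an address-based predictor.
    """

    def __init__(self, entries=1024, threshold=2):
        if entries <= 0 or entries & (entries - 1):
            raise ValueError("entries must be a power of two")
        self.mask = entries - 1
        self.threshold = threshold
        self.counters = [0] * entries

    def _index(self, key):
        # Assumption: keys are word aligned, so the low two bits carry no information.
        return (key >> 2) & self.mask

    def predict(self, key):
        """True if a load with this key is predicted to cause a violation."""
        return self.counters[self._index(key)] >= self.threshold

    def train(self, key, violated):
        """Strengthen the entry after a violation, weaken it otherwise."""
        i = self._index(key)
        if violated:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

In this sketch a load is predicted dependent only after it has caused two violations, which filters out one-off conflicts.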
Hybrid Dependence Predictor
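One plausible way to combine PC-based and address-based prediction is sketched below. How the hybrid predictor in the talk actually combines the two is not recoverable from the slide, so the OR/AND switch (`require_both`), the table sizes and the 2-bit counters are assumptions.

```python
class HybridDependencePredictor:
    """Two counter tables, one indexed by load PC and one by load address."""

    def __init__(self, entries=1024, threshold=2, require_both=False):
        if entries <= 0 or entries & (entries - 1):
            raise ValueError("entries must be a power of two")
        self.mask = entries - 1
        self.threshold = threshold
        self.require_both = require_both  # hypothetical combination switch
        self.by_pc = [0] * entries
        self.by_addr = [0] * entries

    def _index(self, value):
        # Assumption: word-aligned PCs and addresses.
        return (value >> 2) & self.mask

    def predict(self, pc, addr):
        pc_hit = self.by_pc[self._index(pc)] >= self.threshold
        addr_hit = self.by_addr[self._index(addr)] >= self.threshold
        if self.require_both:
            return pc_hit and addr_hit
        return pc_hit or addr_hit

    def train(self, pc, addr, violated):
        step = 1 if violated else -1
        for table, key in ((self.by_pc, pc), (self.by_addr, addr)):
            i = self._index(key)
            # Saturate each 2-bit counter in the range 0..3.
            table[i] = max(0, min(3, table[i] + step))
```

The OR form favours coverage (more violating loads are caught), while the AND form favours fewer false positives and hence fewer checkpoints.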
Placing Checkpoints
Limited number of checkpoints
Placing a checkpoint has a cost
Checkpointing on every positive prediction results in too many checkpoints
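The three constraints above suggest a policy that filters positive predictions. The sketch below is one possible filter, not the policy evaluated in the paper. The budget and the minimum-gap rule, including their default values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CheckpointPolicy:
    max_checkpoints: int = 4   # hypothetical per-task checkpoint budget
    min_gap: int = 50          # hypothetical minimum distance between checkpoints
    taken: int = 0
    last_position: int = 0     # the task start counts as the first restart point

    def should_checkpoint(self, position, predicted_dependence):
        """Decide whether to checkpoint before the load at `position`."""
        if not predicted_dependence:
            return False
        if self.taken >= self.max_checkpoints:
            return False  # budget exhausted
        if position - self.last_position < self.min_gap:
            return False  # too close to the previous restart point to pay off
        self.taken += 1
        self.last_position = position
        return True
```

The minimum gap reflects the cost argument: a checkpoint taken shortly after another saves few instructions on a squash but still pays the full placement cost.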
Setup
Simulator, Compiler and Benchmarks:
–SESC (
–POSH (Liu et al. PPoPP ‘06)
–SPEC 2000 Int.
Architecture:
–Four-way CMP, 4-issue cores
–16KB L1 Data (multi-versioned) and Instruction Caches
–1MB unified L2 Caches
–Cycles from Violation to Kill/Restart: 12
–Cycles to Spawn: 12
Measuring Dependence Prediction
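The metrics plotted on this slide are not recoverable from the text. As a hedged illustration of how a dependence predictor could be scored, the sketch below compares each prediction with whether the load actually caused a violation. The function name and the choice of coverage and precision as metrics are assumptions.

```python
def score_predictions(events):
    """Score a predictor from (predicted, violated) boolean pairs, one per load."""
    tp = fp = fn = 0
    for predicted, violated in events:
        if predicted and violated:
            tp += 1          # violation correctly anticipated
        elif predicted:
            fp += 1          # checkpoint would have been unnecessary
        elif violated:
            fn += 1          # violation missed, full squash
    coverage = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return {"coverage": coverage, "precision": precision,
            "false_positives": fp, "missed": fn}
```

Low coverage means violations still roll back to the task start. Low precision means checkpoints are spent on loads that never violate.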
Wasted Instructions: unnecessarily squashed instructions.
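One way to read this definition is sketched below. It assumes that, of everything squashed on a violation, only the instructions executed before the offending load were squashed unnecessarily. That interpretation, the function name and the tuple layout are assumptions rather than the paper's exact accounting.

```python
def wasted_fraction(squashes):
    """Fraction of squashed instructions that did not need to be squashed.

    squashes: iterable of (restart_point, load_at, violation_at) positions,
    measured in instructions since the task started.
    """
    wasted = squashed = 0
    for restart_point, load_at, violation_at in squashes:
        if not 0 <= restart_point <= load_at <= violation_at:
            raise ValueError("expected restart_point <= load_at <= violation_at")
        # Work between the restart point and the offending load was correct.
        wasted += load_at - restart_point
        squashed += violation_at - restart_point
    return wasted / squashed if squashed else 0.0
```

Under this reading, a checkpoint placed just before the offending load drives the wasted count for that squash towards zero.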
Conclusions
Effective checkpointing improves the efficiency of TLS
Placing checkpoints by stride is not sufficient to reduce waste significantly
Checkpointing using dependence prediction obtains energy savings of up to 14%, with 7% on average over normal TLS execution, with no significant effect on speedup.
Read the paper for…
Complete results
Microarchitectural issues that arise from checkpointing running tasks
Modified squash/restart mechanism that is needed to avoid performance degradation from checkpointing