Dynamic Branch Prediction During Context Switches
Jonathan Creekmore, Nicolas Spiegelberg

Presentation transcript:

1 Dynamic Branch Prediction During Context Switches
Jonathan Creekmore, Nicolas Spiegelberg

2 Overview
- Branch Prediction Techniques
- Context Switching
- Compression of Branch Tables
- Simulation
- Hardware Model
- Results
- Analysis

3 Case for Branch Prediction
- Multiple instructions in flight at one time
  - Between 15 and 20
- Branches occur every 5 instructions
  - if, while, for, function calls, etc.
- Stalling the pipeline is unacceptable
  - Loses all advantage of multiple instruction issue

4 Context Switch Time
- Causes program execution to be paused
  - State of the program is saved
  - A new program is executed
- Eventually, the original program begins executing again
- Not all of the CPU state is saved
  - Such as the branch predictor tables

5 Context Switch Time
- 1 set of branch predictor state
- A context switch causes a new application to use the previous application's branch predictor state
  - Degrades performance for all applications
- Solution: save the state of the branch predictor at context switch time

6 Saving Branch State Table
- Even simple branch predictors have a large number of bits
- Storing and restoring the branch predictor should not take too long
  - The gain of storing/restoring is lost if it takes longer than the "warm-up" time of the branch predictor
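As a back-of-the-envelope sketch (not from the slides' figures): using the 64 bits/cycle bandwidth assumed in the timing-comparison slides later in the deck, even the smallest predictor table here costs well over a hundred cycles to save and restore uncompressed.

```python
# Cycles to naively save and restore an uncompressed predictor table.
# The 64 bits/cycle bandwidth figure comes from the timing-comparison
# slides later in this deck; table sizes come from the predictor-types slide.
def naive_save_restore_cycles(state_bits, bandwidth_bits_per_cycle=64):
    # One transfer to save at switch-out, one to restore at switch-in.
    return 2 * (state_bits // bandwidth_bits_per_cycle)

print(naive_save_restore_cycles(4096))     # 2048-entry bimodal: 128 cycles
print(naive_save_restore_cycles(2097152))  # largest two-level table: 65536 cycles
```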

7 Compression
- Compression is the key
  - Requires less storage
- Needs to be done carefully
  - Some lossless compression schemes can inflate the number of bits
  - Luckily, lossy compression is acceptable

8 Semi-Lossy Compression
- Applies to 2-bit predictors
- Key is to store just the taken/not-taken state
  - Ignores strong/weak
[State-diagram figure: strong (S) and weak (W) taken/not-taken states collapse to a single T/NT bit]

9 Semi-Lossy Decompression
[State-diagram figure: each stored T/NT bit expands back to one of the 2-bit states (SNT, WNT, WT, ST)]
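A minimal software sketch of the scheme on slides 8-9. The counter encoding and the choice to restore into the weak states are assumptions, since the state diagrams did not survive the transcript:

```python
# 2-bit saturating counter states, encoded (assumption) as:
# 0 = strong not-taken (SNT), 1 = weak not-taken (WNT),
# 2 = weak taken (WT),        3 = strong taken (ST).

def semi_lossy_compress(counters):
    # Keep only the direction (taken/not-taken) bit; drop strong/weak.
    return [c >> 1 for c in counters]

def semi_lossy_decompress(bits):
    # Expand each saved direction back to a full 2-bit state; restoring
    # into the weak state (WNT or WT) is one plausible choice.
    return [1 + b for b in bits]

print(semi_lossy_compress([0, 1, 2, 3]))  # [0, 0, 1, 1]
print(semi_lossy_decompress([0, 1]))      # [1, 2]  (WNT, WT)
```

This halves the stored bits (2:1) while never changing a counter's predicted direction, only its strength.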

10 Lossy Compression
- Branch prediction is just an educated guess
- A higher compression ratio is achieved if some information is lost
- Majority rules
  - Used by the correlating branch predictor

11 Lossy Compression
[Figure: 4x majority-rules compression; rows of mostly-T entries, one containing an NT, still compress to a single T]

12 Lossy Decompression
- Reinitialize all elements for an address to the stored value
- Best case: all elements are correct
- Worst case: 50% of elements are correct
- Remember: branch predictors are just educated guesses
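Slides 10-12 can be sketched as follows. The 4:1 grouping matches the "4x" figure; breaking ties toward taken is an assumption:

```python
def lossy_compress(direction_bits, ratio=4):
    # Collapse each group of `ratio` direction bits to its majority vote.
    out = []
    for i in range(0, len(direction_bits), ratio):
        group = direction_bits[i:i + ratio]
        out.append(1 if 2 * sum(group) >= len(group) else 0)  # tie -> taken (assumption)
    return out

def lossy_decompress(majority_bits, ratio=4):
    # Reinitialize every element of the group to the stored value.
    return [b for b in majority_bits for _ in range(ratio)]

# T T T NT compresses to T; decompression leaves 3 of 4 entries correct.
print(lossy_compress([1, 1, 1, 0]))  # [1]
print(lossy_decompress([1]))         # [1, 1, 1, 1]
```

A 2-2 split within a group is the worst case: whichever value is stored, decompression restores only 50% of the entries correctly, matching slide 12.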

13 Simulation
- Modified SimpleScalar's sim-bpred to support context switching
  - Not necessary to actually switch between programs
  - On a context switch, corrupt the branch predictor table according to a "dirty" percentage to simulate another program running
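The corruption step might look like this in miniature (a hypothetical Python analogue; the actual modification was to sim-bpred's C code, whose details are not in the slides):

```python
import random

def corrupt_predictor(table, dirty_fraction, rng=None):
    # Overwrite a `dirty_fraction` share of entries with random 2-bit
    # counter values, mimicking the pollution another program would
    # cause if it actually ran between context switches.
    rng = rng or random.Random(0)   # fixed seed for reproducible runs
    n_dirty = int(len(table) * dirty_fraction)
    for i in rng.sample(range(len(table)), n_dirty):
        table[i] = rng.randrange(4)
    return table

table = [3] * 2048                      # predictor warmed up to strong taken
corrupt_predictor(table, dirty_fraction=0.5)
print(sum(1 for c in table if c != 3))  # count of disturbed entries
```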

14 Simulation
- Testing compression/decompression becomes simple
  - Instead of corrupting the branch predictor table, replace entries with their value after compression/decompression
  - Tested with:
    - 2-bit semi-lossy compression
    - 4-bit lossy compression
    - 8-bit lossy compression

15 Hardware Model
- Compression and decompression blocks are fully pipelined
- Compression and decompression blocks can handle n bits of compressed data at a time
- Compression and decompression occur simultaneously

16 Hardware Model
- Utilizes data independence
  - Compresses 128 bits into 64 bits at one time
  - Pipeline overhead should be minimal compared to the clock cycle savings

17 Programs Simulated
- Several SPEC CINT2000 programs simulated
  - 164.gzip: compression
  - 175.vpr: FPGA place and route
  - 181.mcf: combinatorial optimization
  - 197.parser: word processing
  - 256.bzip2: compression

18 Predictor Types
- 2048-entry bimodal predictor (4096 bits)
- 4096-entry bimodal predictor (8192 bits)
- 1024-entry two-level predictor with 4-bit history size (16384 bits)
- 4096-entry two-level predictor with 8-bit history size (1048576 bits)
- 8192-entry two-level predictor with 8-bit history size (2097152 bits)
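The bit counts quoted above are consistent with 2 bits per bimodal entry and, for the two-level predictors, entries times 2^(history bits). The latter formula is inferred from the quoted numbers, not stated on the slide:

```python
def bimodal_bits(entries):
    return entries * 2  # one 2-bit saturating counter per entry

def two_level_bits(entries, history_bits):
    # Inferred from the quoted sizes: entries * 2^(history bits).
    return entries * (1 << history_bits)

print(bimodal_bits(2048))       # 4096
print(bimodal_bits(4096))       # 8192
print(two_level_bits(1024, 4))  # 16384
print(two_level_bits(4096, 8))  # 1048576
print(two_level_bits(8192, 8))  # 2097152
```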

19 2048 Entry Bimodal Predictor
[Slides 19-21: result charts; figures not recoverable from the transcript]

22 4096 Entry Bimodal Predictor
[Slides 22-24: result charts; figures not recoverable from the transcript]

25 1024 entry two-level predictor with 4-bit history size
[Slides 25-27: result charts; figures not recoverable from the transcript]

28 4096 entry two-level predictor with 8-bit history size
[Slides 28-30: result charts; figures not recoverable from the transcript]

31 8192 entry two-level predictor with 8-bit history size
[Slides 31-33: result charts; figures not recoverable from the transcript]

34 Timing Comparison
- Miss penalty: 10 clock cycles
- Bandwidth: 64 bits per clock cycle

35 Timing Equations
- General timing equation
- Special case for a ratio of 0
[The equations were presented as figures and are not recoverable from the transcript]
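Since the equations themselves did not survive, here is one plausible reconstruction of the trade-off they describe, under stated assumptions: total cost = cycles to transfer the compressed state in both directions, plus the extra mispredictions the lossy scheme introduces times the miss penalty, with "ratio of 0" read as not saving state at all. This is a hypothetical form, not the authors' exact equation; `extra_misses=20` below is an illustrative value.

```python
def switch_cost_cycles(state_bits, ratio, bandwidth, extra_misses, miss_penalty):
    # Hypothetical reconstruction of the timing model from slide 35.
    if ratio == 0:
        # Special case: state is not saved, so the whole cost is the
        # mispredictions incurred while the predictor re-warms.
        return extra_misses * miss_penalty
    compressed_bits = state_bits // ratio
    save_restore = 2 * (compressed_bits // bandwidth)  # save + restore
    return save_restore + extra_misses * miss_penalty

# Slide 34's parameters: 10-cycle miss penalty, 64 bits/cycle bandwidth;
# a 2048-entry bimodal table (4096 bits) with 2:1 semi-lossy compression.
print(switch_cost_cycles(4096, ratio=2, bandwidth=64,
                         extra_misses=20, miss_penalty=10))  # 264
```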

36 Timing Comparison
- Miss penalty: 15 clock cycles
- Bandwidth: 64 bits per clock cycle

37 Timing Comparison
- Miss penalty: 10 clock cycles
- Bandwidth: 128 bits per clock cycle

38 Summary
- Dynamic branch prediction is necessary for modern high-performance processors
- Context switches reduce the effectiveness of dynamic branch prediction
- Naively saving the branch predictor state is costly

39 Summary
- Compression can be used to reduce the cost of saving branch predictor state
- Higher compression ratios reduce the fixed save/restore time at the cost of more mispredictions
  - For low-frequency context switches, this yields an improvement in performance

40 Questions

