Statistical Simulation of Superscalar Architectures using Commercial Workloads Lieven Eeckhout and Koen De Bosschere Dept. of Electronics and Information.

1 Statistical Simulation of Superscalar Architectures using Commercial Workloads
Lieven Eeckhout and Koen De Bosschere
Dept. of Electronics and Information Systems (ELIS), Ghent University, Belgium
CAECW’01, January 21, 2001

2 Outline
Introduction
Statistical Simulation
– Statistical profiling
– Synthetic trace generation
Methodology
Evaluation
Conclusion

3 Introduction
Architectural simulation
– trace-driven or execution-driven
– accurate
– long simulation times
– long traces to be stored
Need for fast simulation techniques
– take part of a full trace
– analytical modeling
– trace sampling
– statistical simulation

4 Goal
Previous work used SPEC benchmarks to evaluate statistical simulation
In this talk we use both commercial and scientific workloads
– SPECint, SPECfp, system traces, multimedia, X graphics, database

5 Statistical Simulation
Three steps:
– extract a statistical profile from a program execution
– generate a synthetic trace from it
– simulate the synthetic trace on a trace-driven simulator
Two major advantages:
– the statistical profile is more compact than a full trace
– fast simulation due to its statistical nature
→ design space exploration in limited time

6 Statistical Simulation
[Flow diagram] real trace (e.g. SPEC benchmark) → branch profiling, cache profiling, instruction profiling → statistical profile (branch statistics, cache statistics, instruction statistics) → synthetic trace generator → synthetic trace → trace-driven simulator

7 Statistical Profiling
Microarchitecture-independent statistics
– instruction statistics
Microarchitecture-dependent statistics
– branch statistics
– cache statistics
Result: statistical simulation can only be used to explore design options of the processor core (cache and branch predictor are fixed)

8 Statistical Profiling: Instruction Statistics
Instruction mix (13 classes)
Number of register operands
Age of register operands
– probability that a register operand was produced δ instructions before it in the trace (only RAW)
Memory dependencies
– probability that a load is memory-dependent on the δ-th store before it in the trace (only RAW)
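The age-of-register-operand statistic above can be illustrated with a short Python sketch. The trace format (a list of `(dest_reg, src_regs)` tuples) and the function name `raw_distance_histogram` are assumptions made for this example; the slides do not describe the actual profiling tool.

```python
from collections import Counter

def raw_distance_histogram(trace):
    """Estimate P(register operand was produced d instructions earlier).

    trace: list of (dest_reg or None, [src_regs]) tuples.
    This trace format is hypothetical, chosen only for the illustration.
    """
    last_writer = {}   # register -> index of its most recent producer
    hist = Counter()   # RAW distance -> number of occurrences
    for i, (dest, srcs) in enumerate(trace):
        for reg in srcs:
            if reg in last_writer:           # only read-after-write pairs count
                hist[i - last_writer[reg]] += 1
        if dest is not None:
            last_writer[dest] = i
    total = sum(hist.values())
    return {d: c / total for d, c in hist.items()} if total else {}

# Tiny made-up trace: r2 reads r1; r3 reads r1 and r2; the last instruction reads r3.
example = [("r1", []), ("r2", ["r1"]), ("r3", ["r1", "r2"]), (None, ["r3"])]
dist = raw_distance_histogram(example)
```

The same bookkeeping, keyed on memory addresses and counting intervening stores instead of instructions, would give the memory-dependency statistic.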

9 Statistical Profiling: Branch Statistics
Six branch types
– conditional branch, unconditional branch, call with offset, indirect jump, indirect call, return
Distinction
– branch prediction accuracy: refill pipeline on branch misprediction
– branch target prediction accuracy: single-cycle bubble in pipeline on correct branch prediction but target misprediction
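As a rough illustration of the distinction above, the following Python sketch draws one of three outcomes for a synthetic branch. The function name, the outcome labels, and the choice to treat the two accuracies as independent probabilities are assumptions for this example, not details taken from the talk.

```python
import random

def sample_branch_event(pred_accuracy, target_accuracy, rng):
    """Return the pipeline effect assigned to one synthetic branch.

    "refill": the branch direction was mispredicted (pipeline refill)
    "bubble": direction correct but target mispredicted (single-cycle bubble)
    "none":   both predictions were correct
    """
    if rng.random() >= pred_accuracy:
        return "refill"
    if rng.random() >= target_accuracy:
        return "bubble"
    return "none"

rng = random.Random(0)
# Hypothetical accuracies, chosen only to exercise the function.
events = [sample_branch_event(0.95, 0.99, rng) for _ in range(1000)]
```

In a fuller model the two accuracies would be kept per branch type, since the slide lists six types.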

10 Statistical Profiling: Cache Statistics
D-cache statistics
– L1 D-cache miss rate
– L2 D-cache miss rate
I-cache statistics
– L1 I-cache miss rate
– L2 I-cache miss rate

11 Synthetic Trace Generation
Instruction-by-instruction through random number generation
Determine
– instruction type
– number of operands
– age of register operands
– memory dependency
– branch behavior
– D-cache behavior
– I-cache behavior
[Figure: example synthetic trace with instructions st, add, ld, br and the annotations "mispredicted", "D-cache miss", "I-cache miss"]
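The instruction-by-instruction generation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' generator: the profile layout, all numbers in `EXAMPLE_PROFILE`, and the function name are hypothetical; the operand count is fixed per instruction type instead of being drawn from a distribution; only loads receive a D-cache outcome; and memory dependencies and L2 miss rates are left out for brevity.

```python
import random

# Hypothetical statistical profile; every number here is made up for illustration.
EXAMPLE_PROFILE = {
    "mix": {"add": 0.5, "ld": 0.2, "st": 0.1, "br": 0.2},   # instruction mix
    "num_operands": {"add": 2, "ld": 1, "st": 2, "br": 1},  # register operands per type
    "dep_age": {1: 0.5, 2: 0.3, 4: 0.2},                    # P(operand produced d instrs ago)
    "l1d_miss_rate": 0.05,
    "l1i_miss_rate": 0.01,
    "br_mispredict_rate": 0.08,
}

def generate_synthetic_trace(profile, n, seed=0):
    """Draw n synthetic instructions from the profile's distributions."""
    rng = random.Random(seed)
    types = list(profile["mix"])
    type_weights = [profile["mix"][t] for t in types]
    ages = list(profile["dep_age"])
    age_weights = [profile["dep_age"][a] for a in ages]
    trace = []
    for i in range(n):
        itype = rng.choices(types, weights=type_weights)[0]
        n_ops = profile["num_operands"][itype]
        # Sample one RAW distance per operand; drop distances that would
        # point before the start of the trace.
        sampled = rng.choices(ages, weights=age_weights, k=n_ops)
        instr = {
            "type": itype,
            "deps": [d for d in sampled if d <= i],
            "icache_miss": rng.random() < profile["l1i_miss_rate"],
        }
        if itype == "ld":
            instr["dcache_miss"] = rng.random() < profile["l1d_miss_rate"]
        if itype == "br":
            instr["mispredicted"] = rng.random() < profile["br_mispredict_rate"]
        trace.append(instr)
    return trace

synthetic = generate_synthetic_trace(EXAMPLE_PROFILE, 1000, seed=1)
```

Seeding the generator makes a synthetic trace reproducible, and different seeds give independent traces from the same profile.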

12 Methodology: microarchitecture
Out-of-order processor
– 8 and 16 issue
– windows of 64 and 128 instructions
McFarling branch predictor
‘small’ cache configuration
– 8KB DM L1 I-cache, 8KB DM L1 D-cache, 64KB 2WSA unified L2 cache
‘large’ cache configuration
– 32KB DM L1 I-cache, 64KB 2WSA L1 D-cache, 512KB 4WSA unified L2 cache
Access time
– L1 I-cache (1 cycle), L1 D-cache (2 cycles), L2 cache (10 cycles), main memory (80 cycles)

13 Methodology: benchmarks
8 SPECint95 benchmarks
5 SPECfp95 benchmarks (hydro2d, su2cor, swim, tomcatv, wave5)
8 IBS system traces (mpeg, jpeg, gs, verilog, gcc, sdet, nroff, groff)
4 MediaBench applications (g721, gs, gsm, mpeg2)
4 X graphics benchmarks (DooM, POVRay, Xanim, Quake)
2 TPC-D queries running on Postgres 6.3
→ ~200 million instructions per trace

14 Evaluation
$$\text{IPC prediction error} = \frac{\mathrm{IPC}_{\text{real trace}} - \mathrm{IPC}_{\text{synthetic trace}}}{\mathrm{IPC}_{\text{real trace}}}$$
– $\mathrm{IPC}_{\text{real trace}}$ = IPC when running the real trace on the trace-driven simulator
– $\mathrm{IPC}_{\text{synthetic trace}}$ = IPC when running the synthetic trace generated from the statistical profile of the real trace
Simulation speed: $s_{\mathrm{IPC}} / \bar{x}_{\mathrm{IPC}}$ is less than 1% after simulating 1 million instructions
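The two quantities on this slide are simple to compute; the Python sketch below shows one way. The function names and sample IPC values are hypothetical, and reading s and x̄ as the sample standard deviation and mean of the IPC over several synthetic traces is an interpretation of the slide's notation, not something the transcript states.

```python
from statistics import mean, stdev

def ipc_prediction_error(ipc_real, ipc_synthetic):
    """Relative error as defined on the slide: (real - synthetic) / real."""
    return (ipc_real - ipc_synthetic) / ipc_real

def coefficient_of_variation(ipcs):
    """Sample standard deviation over mean of a list of IPC values.

    Assumption: one IPC value per synthetic trace generated from the
    same statistical profile (e.g. with different random seeds).
    """
    return stdev(ipcs) / mean(ipcs)

# Made-up numbers, only to exercise the functions.
err = ipc_prediction_error(2.0, 1.8)
cov = coefficient_of_variation([1.79, 1.80, 1.81, 1.80])
```

With this sign convention, a positive error means the synthetic trace underestimates the IPC of the real trace.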

15 IPC prediction error (1)
[Bar chart: IPC prediction error per benchmark, grouped by suite (SPECint95, SPECfp95, IBS, MediaBench, X graphics, TPC-D); 16-issue, 128-entry window, ‘small’ cache configuration. Y-axis: IPC prediction error, -30% to 40%. Two off-scale values are labeled 157% and 135%; the chart carries two "high D-cache miss rate" annotations.]

16 IPC prediction error (2)
[Bar chart: IPC prediction error per benchmark, grouped by suite (SPECint95, SPECfp95, IBS, MediaBench, X graphics, TPC-D); 16-issue, 128-entry window, ‘large’ cache configuration. Y-axis: IPC prediction error, -30% to 30%.]

17 IPC prediction error vs. static instruction count
[Scatter plot: IPC prediction error (-40% to 160%) versus static instruction count (number of instructions executed at least once; 0 to 160,000). Four series: w = 64, i = 8, 'small' cache; w = 128, i = 16, 'small' cache; w = 64, i = 8, 'large' cache; w = 128, i = 16, 'large' cache. Labeled points include DooM, Quake, gs (IBS), gcc, gcc (IBS), mpeg (IBS), groff, nroff, jpeg (IBS), verilog, sdet, TPC-D, vortex, go.]

18 Conclusion (1)
Higher IPC prediction errors for applications with smaller static instruction count:
– MediaBench applications
– SPECfp95 benchmarks
– 2 X graphics benchmarks (POVRay and Xanim)
– 5 SPECint95 benchmarks

19 Conclusion (2)
Smaller IPC prediction errors for applications with larger instruction footprint:
– IBS system traces
– TPC-D traces
– 2 X graphics benchmarks (DooM and Quake)
– 3 SPECint95 benchmarks (go, gcc, vortex)
→ IPC prediction error between -1% and 25%

20 Conclusion (3)
Statistical simulation is a useful fast simulation technique for commercial workloads
– due to higher variability in instructions
– since commercial workloads have a larger instruction footprint
– which makes a statistical technique more powerful

