Enabling Efficient On-the-fly Microarchitecture Simulation Thierry Lafage September 2000.

Enabling Efficient On-the-fly Microarchitecture Simulation Thierry Lafage Thierry.Lafage@irisa.fr September 2000

Introduction
Microarchitecture simulation:
– Accurate, but slow (execution time × 1000-10000)
– “On-the-fly” (vs. trace-driven):
   Enables execution-driven simulation (complex microprocessors)
   Simulation of long-running workloads
Complete microprocessor simulation requires:
– Realistic workloads and working sets
– Huge amount of CPU time

Introduction (2)
Realistic simulations in an affordable time → simulations of a reduced number of instructions:
– One “big slice” (e.g. after the program start-up phase)
– Trace sampling
→ Representativeness of the simulated execution slices?
On-the-fly simulations → fast forwarding
→ Current tools' “fast” forwarding mode: >20× execution slowdown
[Figure: instruction-count timelines (0, 500M, 1B and 0, 500M, 1B, 1.5B, ...) marking the simulated slices]

Outline
1. Speeding up the fast forwarding mode
– Approach
– Implementation
– Performance on the SPEC95 benchmarks
– Conclusion
2. Selecting representative execution slices
– Approach
– Application to data cache simulations
– Conclusion
Conclusion and Future Work

Speeding up the fast forwarding mode
Two execution modes:
A really fast mode (static code annotation)
→ Direct execution rapidly positions the run at the point where the simulation should begin
An emulation mode (embedded instruction-set emulator)
→ Calls to user-provided analysis routines
→ At run time: dynamic switches between the two modes

Implementation
[Diagram labels: original code (SPARC V9 assembly code); calvin2 (static code annotation tool); checkpoint; switching event; emulation mode; DICE (host ISA emulator); user analysis routines]

Performance on the SPEC95 Benchmarks
calvin2+DICE:
– Average slowdown in fast mode: 1.31 (checkpoints at procedure calls and inside loops)
– Average slowdown in emulation mode (instruction and data address trace): 117.47
Shade (instruction and data address generation enabled):
– Average slowdown in “fast forward” mode: 17.07 (empty analysis routine)
– Average slowdown in emulation mode: 82.19 (tracing analysis routine)

A Simple Example of Microprocessor Simulation
Simulation of 1% of a 1-hour workload
Additional × 1000 slowdown for the simulation itself
→ With calvin2+DICE (direct execution, then emulation + simulation): 0.99 × 1.31 + 0.01 × (117.45 + 1000) ≈ 12.5 hours
→ With Shade (fast forward, then emulation + simulation): 0.99 × 17.07 + 0.01 × (82.19 + 1000) ≈ 27.7 hours
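
The arithmetic on this slide can be reproduced with a short Python sketch. The function name `total_hours` and its parameter names are illustrative assumptions; the numbers are the ones shown on the slide (including the 117.45 emulation slowdown used there).

```python
def total_hours(workload_hours, simulated_fraction, forward_slowdown,
                emulation_slowdown, simulation_slowdown):
    """Estimated wall-clock time for a sampled on-the-fly simulation.

    The non-simulated part of the run pays only the fast-forward slowdown;
    the simulated part pays the emulation slowdown plus the simulator's own.
    """
    forwarded = (1.0 - simulated_fraction) * forward_slowdown
    simulated = simulated_fraction * (emulation_slowdown + simulation_slowdown)
    return workload_hours * (forwarded + simulated)


# 1% of a 1-hour workload is simulated, with an additional x1000 slowdown.
calvin2_dice = total_hours(1.0, 0.01, 1.31, 117.45, 1000.0)
shade = total_hours(1.0, 0.01, 17.07, 82.19, 1000.0)

print(f"calvin2+DICE: {calvin2_dice:.1f} hours")
print(f"Shade:        {shade:.1f} hours")
```

With only 1% of the run simulated, the fast-forward term accounts for about 1.3 hours of the calvin2+DICE total but about 16.9 hours of the Shade total, which is where the gap between the two comes from.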

Conclusion for calvin2+DICE
Performance of the emulator: not an issue
For long-running workloads, overall performance is determined by the performance of the fast forwarding mode
→ calvin2+DICE enables simulations on slices spread over a whole application

Outline
1. Speeding up the fast forwarding mode
– Approach
– Implementation
– Performance on the SPEC95 benchmarks
– Conclusion
2. Selecting representative execution slices
– Approach
– Application to cache simulations
– Conclusion
Conclusion and Future Work

Introduction
On-the-fly simulations using realistic applications in an affordable time → simulations of a reduced number of instructions
– Before: one “big slice” (after program start-up phase)
– With calvin2+DICE: on-the-fly statistical sampling
Number of simulated instructions often determined by:
– The simulation time
– Empirical results
→ Representativeness of the simulated instructions?
[Figure: instruction-count timelines (0, 500M, 1B and 0, 500M, 1B, 1.5B, ...) marking the simulated slices]

Our Approach
Dynamic characterization of the target programs
Select representative execution slices for simulations (classification)
Aim:
→ Tune a per-program amount of simulated activity
→ Reduce simulation time or increase simulation result accuracy

Dynamic Characterization of the Target Programs
[Figure: execution slices 0, 1, 2, ..., N, each with its program characterization]
Metrics are independent of the implementation details of the simulated components

Selection of Representative Execution Slices
[Figure: execution slices 0 to 4 → hierarchical classification → classes {2,1,3} and {0,4} → two slices selected]
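
A minimal sketch of how a hierarchical classification could group slices into classes such as {2,1,3} and {0,4}. The slide does not state the distance measure or the linkage rule, so the Euclidean distance, the average linkage, and the `slice_metrics` values below are assumptions chosen only for illustration.

```python
import math


def average_linkage(class_a, class_b, vectors):
    """Mean pairwise Euclidean distance between the members of two classes."""
    total = sum(math.dist(vectors[i], vectors[j])
                for i in class_a for j in class_b)
    return total / (len(class_a) * len(class_b))


def hierarchical_classes(vectors, n_classes):
    """Agglomerative clustering: merge the two closest classes until
    only n_classes remain. Returns lists of slice indices."""
    classes = [[i] for i in range(len(vectors))]
    while len(classes) > n_classes:
        best = None
        for a in range(len(classes)):
            for b in range(a + 1, len(classes)):
                d = average_linkage(classes[a], classes[b], vectors)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        classes[a] = classes[a] + classes[b]
        del classes[b]
    return sorted(sorted(c) for c in classes)


# Hypothetical per-slice characterization vectors for slices 0..4.
slice_metrics = [(0.90, 0.10), (0.20, 0.80), (0.25, 0.75),
                 (0.10, 0.90), (0.85, 0.20)]

classes = hierarchical_classes(slice_metrics, n_classes=2)
print(classes)
```

With these made-up vectors, slices 1, 2 and 3 end up in one class and slices 0 and 4 in the other, so one slice per class would be enough to cover both behaviours.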

Selection of Class Representatives
→ Wmdc indicator: weighted mean of distances from class centers
[Figure: class centers and class representatives]
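
One possible reading of the Wmdc indicator, sketched in Python. The slide only says "weighted mean of distances from class centers"; taking the class center as the mean vector, picking the member closest to it as the representative, and weighting each representative's distance by its class size are all assumptions of this sketch, as are the function names and the sample data.

```python
import math


def class_center(members, vectors):
    """Component-wise mean of the vectors belonging to one class."""
    dims = len(vectors[members[0]])
    return tuple(sum(vectors[i][d] for i in members) / len(members)
                 for d in range(dims))


def select_representatives(classes, vectors):
    """For each class, pick the member closest to the class center.

    Returns a list of (representative, distance_to_center, class_size).
    """
    selection = []
    for members in classes:
        center = class_center(members, vectors)
        rep = min(members, key=lambda i: math.dist(vectors[i], center))
        selection.append((rep, math.dist(vectors[rep], center), len(members)))
    return selection


def wmdc(selection):
    """Weighted mean of representative-to-center distances
    (weights = class sizes, an assumption)."""
    total_size = sum(size for _, _, size in selection)
    return sum(dist * size for _, dist, size in selection) / total_size


# Hypothetical characterization vectors and classes for slices 0..4.
slice_metrics = [(0.90, 0.10), (0.20, 0.80), (0.25, 0.75),
                 (0.10, 0.90), (0.85, 0.20)]
slice_classes = [[0, 4], [1, 2, 3]]

selection = select_representatives(slice_classes, slice_metrics)
indicator = wmdc(selection)
print(selection, indicator)
```

Under this reading, a small value means the chosen slices sit close to the centers of the classes they stand for.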

Application to the Data Stream
Data stream characterization:
– Temporal locality: data reuse distances
– Spatial locality: data reuse distances with several line sizes
[Figure: relative frequency (%) versus data reuse distance (in instructions)]
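
A small sketch of how data reuse distances, measured in instructions, could be collected from an address trace. The trace format (pairs of instruction index and data address), the function names, and the sample addresses are assumptions. Here a reuse distance is the number of instructions between two consecutive accesses to the same line; increasing the line size makes nearby addresses count as reuses, which is how the same measure can reflect spatial locality.

```python
from collections import Counter


def reuse_distances(trace, line_size):
    """Distances (in instructions) between consecutive accesses to a line.

    trace: iterable of (instruction_index, data_address) pairs.
    line_size: bytes per line; 1 means exact-address reuse.
    """
    last_seen = {}
    distances = []
    for instr_index, address in trace:
        line = address // line_size
        if line in last_seen:
            distances.append(instr_index - last_seen[line])
        last_seen[line] = instr_index
    return distances


def relative_frequencies(distances):
    """Relative frequency (%) of each reuse distance."""
    counts = Counter(distances)
    return {d: 100.0 * n / len(distances) for d, n in sorted(counts.items())}


# Hypothetical trace: (instruction index, data address).
trace = [(0, 0x1000), (2, 0x1008), (3, 0x1000), (7, 0x1040), (9, 0x1008)]

small_lines = reuse_distances(trace, line_size=8)
large_lines = reuse_distances(trace, line_size=32)
print(small_lines, large_lines, relative_frequencies(large_lines))
```

In this toy trace, 8-byte lines see two reuses while 32-byte lines see three, and at shorter distances, because 0x1000 and 0x1008 then fall in the same line.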

Results for Trained Cache Simulations on the SPEC95 Benchmarks
Cache configurations: 4-way set associative, LRU, write back, write allocate
→ sizes from 4KB to 512KB
→ line sizes from 16B to 128B
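
For readers unfamiliar with the cache policy named above, here is a minimal sketch of a 4-way set-associative, LRU, write-back, write-allocate data cache model. It is not the simulator used for the results on this slide; the class name, the counters, and the sample accesses are assumptions made for illustration.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """LRU, write-back, write-allocate cache model (hit/miss counting only)."""

    def __init__(self, size_bytes, line_bytes, ways=4):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        # One OrderedDict per set: tag -> dirty flag, oldest entry first.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = self.misses = self.writebacks = 0

    def access(self, address, is_write):
        line = address // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            self.hits += 1
            lines.move_to_end(tag)           # mark as most recently used
        else:
            self.misses += 1
            if len(lines) == self.ways:      # set full: evict the LRU line
                _, dirty = lines.popitem(last=False)
                if dirty:
                    self.writebacks += 1     # write back only dirty lines
            lines[tag] = False               # allocate on read or write miss
        if is_write:
            lines[tag] = True                # line now differs from memory


# 4KB cache with 16B lines: 64 sets, so addresses 1024 bytes apart share a set.
cache = SetAssociativeCache(size_bytes=4 * 1024, line_bytes=16)
cache.access(0, is_write=True)               # miss, line becomes dirty
for addr in (1024, 2048, 3072):
    cache.access(addr, is_write=False)       # three misses fill the set
cache.access(4096, is_write=False)           # miss, evicts dirty line 0
cache.access(0, is_write=False)              # miss, evicts clean line 1024
cache.access(4096, is_write=False)           # hit
print(cache.hits, cache.misses, cache.writebacks)
```

Sweeping `size_bytes` from 4KB to 512KB and `line_bytes` from 16B to 128B with the same class would cover the configuration range listed on the slide.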

Conclusion for representative slice selection
Similar results with:
– Branch characterization for branch predictor simulations
– Data stream characterization, branch characterization, instruction mix and basic block sizes for data cache simulations and branch predictor simulations
→ Program characterization actually helps in tuning the amount of simulated activity

General Conclusion
calvin2+DICE enables simulations on slices spread over a whole application
Our approach makes it possible to select representative execution slices
Future Work
Complete execution-driven simulations (complex microprocessor)
Operating system activity: LiKE, a Linux Kernel Emulator

Static Code Annotation with calvin2
Light instrumentation:
– Use of the SALTO library (assembly language level)
– Instrumentation of SPARC V9 code
Checkpoint code insertion:
– At the beginning of each procedure
– Inside each loop

DICE: A Dynamic Inner Code Emulator
Host and target: SPARC V9 ISA
Architectural resources (registers) modeling
DICE: an archive library
– Able to receive control or return to direct execution at any moment
– Access to complete target program state (registers, memory, …)
User-defined analysis routines called for each emulated instruction (trace information passed as parameter)

LiKE: A Linux Kernel Emulator
Derived from DICE
Host and target: SPARC V9 ISA (full 64 bits)
Dynamically loaded module
Receives control at the beginning of system calls
Returns to direct execution at the end of system calls
Not yet implemented:
– Support for all system calls and other OS activity
– Interface with on-the-fly simulator shared with user-space emulated program
– Full debugging
