Published by Caren Hopkins. Modified over 9 years ago.
1
Techniques for Speeding up Pin-based Simulation
Harish Patil
Copyright © 2006 Intel Corporation. All Rights Reserved.
2
Objective
- IS: High-level techniques for speeding up Pin-based simulation
- IS NOT: Low-level optimizations (inlining etc.) of Pintools
Two usage models [figure: Pin-tool and Simulator]
3
Outline
Two techniques:
1. Selective simulation
2. Conditional instrumentation
PinPoints: Selecting simulation regions with Pin and SimPoint
Case Study: Pin + SimpleScalar.x86
4
Instruction Counts: Some IPF Applications
5
Problem: Whole-Program Simulation is Slow
6
Solution: Select Simulation Points
Select One Point
- At the beginning (no skip)
- After 1 billion instructions
- After skipping a random number of instructions
Select Multiple Points
- Manually by looking at performance data
- Randomly anywhere
- Randomly from uniform regions
- By program phase analysis (SimPoint: UCSD)
- Fine-grain sampling (SMARTS: CMU)
[Figure: timeline alternating Fast-forward and Simulation]
7
How Does Pin Support Selective Simulation?
Class CONTROL: in InstLib/control.H (via instlib.H)
- The Pintool includes the class and provides a “Handler” for “start and end of region”
Provides a number of switches:
- For specifying “start of region”: -skip, -start_address, …
- For specifying “end of region”: -length, -stop_address, …
8
InstLibExamples/control
$ pin -t control -skip 100 -length 500 -- hello
ip: 0x40000e00 104 Start
ip: 0x4000105e 598 Stop
Hello world
Other example switches:
One region:
1. -start_address foo:10 -length 500
Multiple regions:
2. -uniform_period 1000 -uniform_length 200
3. -ppfile foo.pp
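The meaning of -skip and -length can be modeled without Pin. The following is a minimal sketch, not the CONTROL class itself: `RegionController`, `RegionEvent`, and `BeforeInstruction` are hypothetical names, and the model assumes exact per-instruction counting, so its counts need not match the 104 and 598 values printed by the real tool.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the CONTROL_START / CONTROL_STOP events.
enum class RegionEvent { None, Start, Stop };

// Toy model of "-skip S -length L": the region covers instructions
// S .. S+L-1 (0-based). An illustration only, not Pin's CONTROL class.
class RegionController {
public:
    RegionController(std::uint64_t skip, std::uint64_t length)
        : skip_(skip), length_(length) {
        assert(length_ > 0 && "a region must contain at least one instruction");
    }

    // Call once before each instruction executes.
    RegionEvent BeforeInstruction() {
        RegionEvent ev = RegionEvent::None;
        if (executed_ == skip_) {
            in_region_ = true;           // first instruction of the region
            ev = RegionEvent::Start;
        } else if (executed_ == skip_ + length_) {
            in_region_ = false;          // first instruction after the region
            ev = RegionEvent::Stop;
        }
        ++executed_;
        return ev;
    }

    bool InRegion() const { return in_region_; }

private:
    std::uint64_t skip_;
    std::uint64_t length_;
    std::uint64_t executed_ = 0;  // instructions seen before the current one
    bool in_region_ = false;
};
```

A simulator built on such a controller would run its detailed model only while `InRegion()` is true and fast-forward otherwise.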
9
InstLibExamples/control.C

    #include "instlib.H"
    using namespace INSTLIB;

    // Contains knobs and instrumentation to recognize start/stop points
    // (slide callout: "Instrumentation (hidden)")
    CONTROL control;

    // Slide callout: "analysis routine"
    VOID Handler(CONTROL_EVENT ev, VOID *v, CONTEXT *ct, VOID *ip, VOID *tid)
    {
        std::cout << "ip: " << ip << " " << icount.Count();
        switch (ev) {
          case CONTROL_START:
            std::cout << "Start" << endl;
            break;
          case CONTROL_STOP:
            std::cout << "Stop" << endl;
            break;
          default:
            ASSERTX(false);
            break;
        }
    }

    main()
    {
        ...
        control.CheckKnobs(Handler, 0);
    }
10
Recap: Instrumentation vs. Analysis
Instrumentation routines define where instrumentation is inserted
- e.g. before instruction
- Occurs the first time an instruction is executed
Analysis routines define what to do when instrumentation is activated
- e.g. increment counter
- Occurs every time an instruction is executed
11
Selective Simulation: Naive Approach: Conditional Analysis

    LOCALVAR INT32 enabled = 0;

    // Conditional analysis routine
    VOID Simulation()
    {
        if (!enabled) return;
        // Analysis code for detailed simulation
    }

    VOID Handler(CONTROL_EVENT ev, VOID *v, CONTEXT *ct, VOID *ip, VOID *tid)
    {
        switch (ev) {
          case CONTROL_START:
            enabled = 1;
            break;
          case CONTROL_STOP:
            enabled = 0;
            break;
        }
    }

Instrumentation always present!
12
Changing Instrumentation on-the-fly
PIN_RemoveInstrumentation()
- All instrumentation is removed. When application code is executed, the instrumentation routines will be called to re-instrument all code
- Removes old instrumentation, forces instrumentation to be done again (after a delay)
PIN_ExecuteAt(const CONTEXT *ctxt)
- Starts execution at an arbitrary point given the architectural state
- CONTEXT passed in to Handler()
- Currently only on IA32 and IA32E
13
Selective Simulation: Faster Approach: Conditional Instrumentation
DebugTrace/debugtrace.C

    LOCALVAR INT32 enabled = 0;

    // Conditional instrumentation routine
    VOID Trace()
    {
        if (!enabled) return;
        // Add instrumentation for detailed simulation
    }

    VOID Handler(... CONTEXT *ctxt ...)
    {
        switch (ev) {
          case CONTROL_START:
            enabled = 1;
            PIN_RemoveInstrumentation();
            if (ctxt) PIN_ExecuteAt(ctxt);  // Only on IA32/IA32E
            break;
          case CONTROL_STOP:
            enabled = 0;
            PIN_RemoveInstrumentation();
            if (ctxt) PIN_ExecuteAt(ctxt);  // Only on IA32/IA32E
            break;
        }
    }

Instrumentation only in simulation regions
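Why the conditional-instrumentation approach avoids work outside the region can be seen in a toy model. This sketch does not use the Pin API: `ToyJit` and its members are hypothetical, and the "code cache" is reduced to a map recording whether each static instruction was compiled with an analysis call. `Flush()` plays the role that PIN_RemoveInstrumentation() plays in the handler.

```cpp
#include <cstdint>
#include <unordered_map>

// Toy model (not Pin): per static instruction address, remember whether an
// analysis call was inserted when the instruction was first compiled.
struct ToyJit {
    bool enabled = false;                            // set by the start/stop handler
    std::unordered_map<std::uint64_t, bool> cache;   // address -> has analysis call
    std::uint64_t analysis_calls = 0;                // dynamic calls into analysis code
    std::uint64_t simulated = 0;                     // instructions actually simulated

    // Drop all compiled code so it is re-instrumented on next execution.
    void Flush() { cache.clear(); }

    // Handler counterpart: change the flag and optionally flush the cache.
    void SetEnabled(bool on, bool flush) {
        enabled = on;
        if (flush) Flush();
    }

    // Conditional instrumentation: the flag is consulted once, at compile time.
    void ExecuteConditionalInstrumentation(std::uint64_t addr) {
        auto it = cache.find(addr);
        if (it == cache.end()) it = cache.emplace(addr, enabled).first;
        if (it->second) {        // call exists only if compiled inside a region
            ++analysis_calls;
            ++simulated;
        }
    }

    // Conditional analysis: a call is always present; the callee checks the flag.
    void ExecuteConditionalAnalysis(std::uint64_t /*addr*/) {
        ++analysis_calls;        // cost is paid on every executed instruction
        if (!enabled) return;
        ++simulated;
    }
};
```

In this model both approaches simulate the same instructions, but the conditional-analysis variant pays for a call on every executed instruction, while the other pays only inside the region. Without the flush, code compiled while `enabled` was false would stay uninstrumented after the region starts.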
14
Comparing Naïve vs. Fast Approach
naïve_debugtrace vs. debugtrace
Switches: -skip 100000000 -length 1000 -instruction -memory -early_out
- Naïve approach: Conditional Analysis
- Fast approach (default): Conditional Instrumentation
15
debugtrace: Conditional Analysis vs Conditional Instrumentation
Fast-forwarding is 5X faster with conditional instrumentation!
[Chart legend: Fast-forward, Simulation]
16
Simulation Point Selection: Re-visited
Select One Point
- At the beginning (no skip)
- After 1 billion instructions
- After skipping a random number of instructions
Select Multiple Points
- Manually by looking at performance data
- Randomly anywhere
- Randomly from uniform regions
- By program phase analysis (SimPoint: UCSD)
- Fine-grain sampling (SMARTS: CMU)
Question: Are the simulation points representative?
17
CPI: Average Error, SPEC2000 (IA32), Whole Program vs. Selected Points
18
PinPoints
http://rogue.colorado.edu/Pin/PinPoints/
Pin (Intel) + SimPoint (UCSD)
What are PinPoints? Representative regions of programs
- Automatically chosen
- Validated (represent whole-program behavior)
- For trace-driven or execution-driven simulation
Found/validated PinPoints for long-running (trillions of instructions) programs [IA-32, EM64T, Itanium]
19
Phase Detection + PinPoint Selection
- Profile with isimpoint: intervals of 100 million instructions each, producing BB-vectors
- Analyze with SimPoint: find phases
- Choose one simulation point per phase and record it in the PinPoints file
Example: two phases => two PinPoints (PinPoint 1: weight 30%; PinPoint 2: weight 70%)
20
Inside a PinPoints file
Region-number Slice-number Weight Start-address Count1 End-address Count2
- Start-of-region: When Start-address is reached Count1 times
- End-of-region: When End-address is reached Count2 times
Example usage: pin -t simulator -ppfile foo.pp -- foo
[Figure: timeline alternating Fast-forward and Simulation]
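The start and end conditions of a region can be sketched as two address/count triggers. This is an illustration, not the CONTROL implementation: the struct and class names are hypothetical, the field names simply mirror the slide, and the sketch assumes both counts are measured from the beginning of execution, which the slide does not state.

```cpp
#include <cstdint>

// One region record; field names follow the slide, types are assumptions.
struct PinPointRegion {
    int region_number;
    std::uint64_t slice_number;
    double weight;
    std::uint64_t start_address;
    std::uint64_t count1;
    std::uint64_t end_address;
    std::uint64_t count2;
};

// Fires exactly once: when `address` has been reached `count` times.
struct AddressCountTrigger {
    std::uint64_t address;
    std::uint64_t count;
    std::uint64_t seen;

    AddressCountTrigger(std::uint64_t a, std::uint64_t c)
        : address(a), count(c), seen(0) {}

    bool Reached(std::uint64_t ip) {
        return ip == address && ++seen == count;
    }
};

// Tracks whether execution is currently inside the region.
class RegionTracker {
public:
    explicit RegionTracker(const PinPointRegion& r)
        : start_(r.start_address, r.count1), end_(r.end_address, r.count2) {}

    // Feed every executed instruction address; returns true while inside.
    bool Observe(std::uint64_t ip) {
        // Both triggers count from program start (an assumption of this sketch).
        const bool started = start_.Reached(ip);
        const bool ended = end_.Reached(ip);
        if (started) inside_ = true;
        if (ended) inside_ = false;
        return inside_;
    }

private:
    AddressCountTrigger start_;
    AddressCountTrigger end_;
    bool inside_ = false;
};
```

In this model the instruction that triggers the start counts as inside the region and the one that triggers the end counts as outside.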
21
PinPoints: Estimating Total Execution Time
Total Execution Time = Total Cycles / Frequency
- We know the simulated Frequency; we need to know Total Cycles for a *full* run of the binary on the Simulator
Total Cycles Simulated = (Weighted CPI) * (Total Instructions)
- PinPoints provides the total number of instructions in the PinPoints file. Weighted CPI can be determined through simulation of PinPoints regions and weighting of results:
  Weighted CPI = Σ_i (Weight_i * CPI_i)
CAUTION: Use the formula only for statistics normalized by instructions: CPI computation OK; IPC computation is NOT OK
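The estimate above is a short computation. The sketch below is one way to write it down; `RegionResult`, `WeightedCpi`, and `EstimatedSeconds` are hypothetical names, and weights are assumed to be fractions that sum to 1.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Simulation result for one PinPoint region.
struct RegionResult {
    double weight;  // fraction of execution represented by the region
    double cpi;     // cycles per instruction measured by the simulator
};

// Weighted CPI = sum over regions of weight_i * cpi_i.
double WeightedCpi(const std::vector<RegionResult>& regions) {
    double weight_sum = 0.0;
    double weighted = 0.0;
    for (const RegionResult& r : regions) {
        weight_sum += r.weight;
        weighted += r.weight * r.cpi;
    }
    assert(std::fabs(weight_sum - 1.0) < 1e-6 && "weights are expected to sum to 1");
    return weighted;
}

// Total execution time = (weighted CPI * total instructions) / frequency.
double EstimatedSeconds(double weighted_cpi, double total_instructions,
                        double frequency_hz) {
    assert(frequency_hz > 0.0);
    return weighted_cpi * total_instructions / frequency_hz;
}
```

With weights 0.30 and 0.70 and made-up CPIs of 2.0 and 1.0, the weighted CPI is 0.30 * 2.0 + 0.70 * 1.0 = 1.3. Weighting the corresponding IPCs instead gives 0.30 * 0.5 + 0.70 * 1.0 = 0.85, which is not 1/1.3 (about 0.77); this is why the formula should be applied only to statistics normalized by instructions.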
22
PinPoints: Usage Model
[Diagram components: Pin-based profiler; BB Profile; Simulation Point Selection; PinPoints; CONTROL; Pin-based Trace Generator; Pin-based Branch Predictor; Your Simulator Here]
23
A Case Study: Pin + SimpleScalar.x86
24
User-level Simulation with SimpleScalar (Alpha): Old Approach
Ad-hoc system call side-effect emulation

    switch (syscall_id)
      case SC1: // Action for SC1
      case SC2: // Action for SC2

SimpleScalar (Alpha) emulates 80+ syscalls (enough to run SPEC2000 only)
[Diagram components: Host Machine; Host Operating System; User Level Simulator, containing the Architecture Simulation Engine and the System Call Emulation Engine; syscall(id, arg 1, …, arg n); Register and memory updates; Executes syscall natively]
25
pinSEL: A tool for Automatic System-call Side-effect Logging
- No ad-hoc processing of system calls needed
- Ease of porting to newer OSes (MacOS/Windows)
- Simulation of many more applications (non-SPEC) feasible
[Diagram: pinSEL produces a log of syscall side-effects; simulator side: "At a system call, set memory locations as specified in the log"]
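The replay idea, setting memory locations from a log when a system call is reached, can be sketched as follows. The slides do not describe the SEL file format, so every type here (`MemoryEffect`, `SyscallEffects`, `EffectReplayer`) is hypothetical and the simulated memory is just a map from address to byte.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical record: one memory byte changed by a system call.
struct MemoryEffect {
    std::uint64_t address;
    std::uint8_t value;
};

// Hypothetical log entry: all memory side-effects of one system call.
struct SyscallEffects {
    std::vector<MemoryEffect> writes;
};

// Replays logged side-effects in order, one entry per system call.
class EffectReplayer {
public:
    explicit EffectReplayer(std::vector<SyscallEffects> log)
        : log_(std::move(log)) {}

    // Called when the simulator reaches a system call: instead of emulating
    // the call, apply the next logged set of memory updates.
    // Returns false when the log has no entry left.
    bool OnSystemCall(std::map<std::uint64_t, std::uint8_t>& memory) {
        if (next_ >= log_.size()) return false;
        for (const MemoryEffect& w : log_[next_].writes) {
            memory[w.address] = w.value;
        }
        ++next_;
        return true;
    }

private:
    std::vector<SyscallEffects> log_;
    std::size_t next_ = 0;
};
```

Because the replayer never interprets the system call itself, no per-syscall switch is needed on the simulator side, which is the point made in the first bullet.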
26
Coming Soon: pinSEL + SimpleScalar-x86
pinSEL: Pin-based “System Effects Log” generator (alternative to EIO traces)
Key Advantages
- Automated system-call effect analysis
- Easy port to MacOS and Windows
[Diagram components: PinPoints; CONTROL; pinSEL; SELs; SimpleScalar-x86]
27
Example: pinSEL for SimpleScalar.x86
$ pin -t pinSEL -ppfile perlbmk.makerand.pp -tracefile perlbmk.makerand -- perlbmk.exe -I lib makerand.pl
START:icount:13 do_trace: 1
PinPoint #: 1 phase id: 2 weight: 25.64 slice_size: 30000000
SEL file names: perlbmk.makerand_1_0.sel perlbmk.makerand_1_0.ssi
END: icount:30000786 do_trace: 0
[Slide callouts: Selective Simulation; Conditional Instrumentation]
28
Summary
Techniques for speeding up Pin-based simulation
1. Be selective: choose simulation regions
2. Instrument conditionally: only in “regions of interest”
Coming Soon [from UCSD]: pinSEL + SimpleScalar-x86
29
Resources
Pin Manual:
- Instrumentation Library: Library for common instrumentation tasks
- Controller: Identify start and stop points for instrumentation
PinPoints: Harish Patil, Robert Cohn, Mark Charney, Rajiv Kapoor, Andrew Sun, and Anand Karunanidhi. “Pinpointing Representative Portions of Large Intel Itanium Programs with Dynamic Instrumentation”, MICRO-37 (2004)
pinSEL: Satish Narayanasamy, Cristiano Pereira, Harish Patil, Robert Cohn, and Brad Calder. “Automatic Logging of Operating System Effects to Guide Application-Level Architecture Simulation”, SIGMETRICS’06