Measurement tools and techniques: fundamental strategies, interval timers, program profiling, tracing, indirect measurement. Copyright 2004 David J. Lilja.


1 Measurement tools and techniques. Fundamental strategies; interval timers; program profiling; tracing; indirect measurement.

2 Events. Most measurement tools are based on events. An event is some predefined change to system state; its definition depends on the metric being measured. Examples: a memory reference, a disk access, a change in a register's state, a network message, a processor interrupt.

3 Event Classification: count metrics. The number of times event X occurs, e.g. the number of cache misses or the number of I/O operations.

4 Event Classification: secondary-event metrics. Record a value when triggered by some event. Example: record the block size for each I/O operation and count the number of operations, then find the average I/O transfer size.

5 Event Classification: profiles. A characterization of overall behavior; an aggregate, big-picture view of an application program, e.g. the time spent in each function.

6 Event-Driven Strategies. Record the necessary information only when the selected event occurs. Modify the system to record the event, and dump the data when the program terminates; intermediate dumps may also be needed. Example: a simple counter in the page-fault routine.

7 Event-Driven Strategies: system overhead. Overhead is incurred only when the event of interest actually occurs. Infrequent events → little perturbation; frequent events → high perturbation. Perturbation changes the system being measured, so is the observed behavior still "typical"?

8 Event-Driven Strategies. Inter-event time is unpredictable because it depends on when events actually occur. That makes it hard to estimate the perturbation and to decide how long to measure. Event-driven measurement tools → good for low-frequency events.

9 Event-Driven Strategies. [Figure: timeline of 8 events, each incrementing the counter (+1); all 8 events are counted exactly.]

10 Tracing. Similar to event-driven measurement, but records additional system state: not only that an event has occurred (the count), but also the information needed to uniquely identify the event, e.g. the addresses that cause page faults. Overhead: additional memory or disk storage, and time to save the state. Relatively large system perturbation.

11 Tracing. [Figure: timeline of 8 events, each recording "+1; Addr"; all 8 events are counted and extra data is stored for each one.]
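The difference between event counting and tracing can be sketched in a few lines. This is only an illustration: the `PageFaultMonitor` class, its `on_page_fault` hook, and the fault addresses are hypothetical stand-ins for instrumentation that would really live inside the operating system's page-fault routine.

```python
from dataclasses import dataclass, field


@dataclass
class PageFaultMonitor:
    """Hypothetical instrumentation hook, called once per page fault."""
    count: int = 0                              # event-driven: a single counter
    trace: list = field(default_factory=list)   # tracing: extra state per event

    def on_page_fault(self, address: int) -> None:
        self.count += 1              # the "+1" of the event-driven strategy
        self.trace.append(address)   # the "+1; Addr" of the tracing strategy


monitor = PageFaultMonitor()
# Eight made-up faulting addresses, matching the eight events on the slides.
for addr in [0x1000, 0x2000, 0x1000, 0x3000, 0x4000, 0x2000, 0x5000, 0x1000]:
    monitor.on_page_fault(addr)

print(monitor.count)       # exact event count, constant storage
print(len(monitor.trace))  # one stored record per event, storage grows with the count
```

The counter needs constant storage, while the trace grows with every event, which is why tracing carries the larger overhead.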

12 Sampling. Record the necessary state at fixed time intervals. Overhead is independent of the specific event frequency and depends only on the sampling frequency. Sampling misses some events (in particular, it may miss infrequent events) and produces a statistical summary. Each replication will produce different results.

13 Sampling. [Figure: timeline with 5 sample points; 3 of the 5 samples observe an event (+1).]

14 Comparisons
                Event count    Tracing          Sampling
Resolution      Exact count    Detailed info    Statistical summary
Overhead        Low            High             Constant
Perturbation    ~ #events      High             Fixed

15 Comparison. Event counting: best for low-frequency events; required if exact counts are needed. Sampling: best for high-frequency events, if a statistical summary is adequate. Tracing: when additional detail is required.

16 Indirect Measurements. Used when the desired metric is not directly accessible: measure one thing directly, then derive or deduce the desired metric. Highly dependent on the creativity of the performance analyst.

17 Interval Timers. The fundamental tool of performance measurement. They measure the execution time of any portion of a program and provide the time basis for sampling.

18 Interval Timers. A timer actually counts clock pulses between two events. [Figure: clock with period $T_c$; the counter reads $x_1$ at Event 1 and $x_2$ at Event 2.] Elapsed time: $T_e = (x_2 - x_1) T_c$

19 Using an Interval Timer. Within an application program: start_count = read_timer(); <portion of program to be measured> stop_count = read_timer(); elapsed_time = (stop_count - start_count) * clock_period;
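A minimal runnable version of this pattern, assuming Python's standard `time.perf_counter_ns()` stands in for the slide's `read_timer()`. Because that call already returns nanoseconds, the multiplication by `clock_period` is implicit. The helper name `measure_ns` and the workload are illustrative only.

```python
import time


def measure_ns(func, *args):
    """Return (result, elapsed nanoseconds) for one call of func(*args)."""
    start_count = time.perf_counter_ns()   # plays the role of read_timer()
    result = func(*args)                   # portion of program to be measured
    stop_count = time.perf_counter_ns()    # second read_timer()
    return result, stop_count - start_count


# Illustrative workload: summing a range of integers.
result, elapsed_ns = measure_ns(sum, range(100_000))
print(f"sum = {result}, elapsed = {elapsed_ns / 1e3:.1f} us")
```

The reported time includes the cost of the timer reads themselves, which is the overhead issue the later slides discuss.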

20 Hardware Timer. [Figure: a clock with period $T_c$ drives an n-bit counter, whose value goes to a CPU input port.] $T_e = (x_2 - x_1) T_c$

21 Software Timer. [Figure: clock → prescaler (divide-by-n) → CPU interrupt input → software counter; the clock periods are labeled $T_c$ and $T'_c$.] $T_e = (x_2 - x_1) T_c$

22 Quantization Errors

23 Quantization Error. Timer resolution → quantization error. Over repeated measurements, $n T_c < T_e < (n+1) T_c$, so $T_e$ is rounded by ± one clock tick. The rounding is completely unpredictable → want $T_c$ to be as small as possible.

24 Timer Rollover. An n-bit counter takes count values in $[0, 2^n - 1]$. Rollover = transition from $2^n - 1$ → 0. If rollover occurs between the start and stop events, then count = $(x_2 - x_1) < 0$. Check for count < 0, then either measure again or add $2^n$ to the count.
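The "add 2^n" correction can be sketched as follows. The function name `elapsed_ticks` is hypothetical, and the sketch assumes the counter rolled over at most once between the two reads; a longer interval cannot be recovered from the two counter values alone.

```python
def elapsed_ticks(x1: int, x2: int, n_bits: int) -> int:
    """Ticks between two reads of an n-bit counter, assuming at most one rollover."""
    count = x2 - x1
    if count < 0:              # the counter wrapped from 2**n - 1 back to 0
        count += 1 << n_bits   # add 2**n to undo the wrap
    return count


print(elapsed_ticks(100, 250, 16))    # no rollover
print(elapsed_ticks(65530, 4, 16))    # rollover between start and stop
```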

25 Timer Rollover. Time until rollover, by counter width n (bits) and resolution $T_c$:
Resolution (T_c)    n = 16     n = 32     n = 64
10 ns               655 us     43 s       58.5 centuries
1 us                65.5 ms    1.2 h      5,850 centuries
100 us              6.55 s     5 days     585,000 centuries
1 ms                1.1 min    50 days    5,850,000 centuries
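The entries in this table follow from one formula: an n-bit counter rolls over after $2^n$ ticks, i.e. after $2^n T_c$ seconds. A short sketch that recomputes them; the function name and the 365.25-day year used for the century conversion are assumptions made here.

```python
SECONDS_PER_CENTURY = 100 * 365.25 * 86_400   # assumes a 365.25-day year


def rollover_seconds(n_bits: int, t_c_seconds: float) -> float:
    """Time until an n-bit counter with tick period t_c_seconds wraps to 0."""
    return (2 ** n_bits) * t_c_seconds


for label, t_c in [("10 ns", 10e-9), ("1 us", 1e-6), ("100 us", 100e-6), ("1 ms", 1e-3)]:
    t16 = rollover_seconds(16, t_c)
    t32 = rollover_seconds(32, t_c)
    t64 = rollover_seconds(64, t_c) / SECONDS_PER_CENTURY
    print(f"{label:>6}: 16-bit {t16:.3g} s, 32-bit {t32:.3g} s, 64-bit {t64:.3g} centuries")
```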

26 Timer Overhead. start_count = read_timer(); <portion of program to be measured> stop_count = read_timer(); elapsed_time = (stop_count - start_count) * clock_period; Accessing the timer costs anywhere from a minimum of one memory read up to a subroutine call, and storing the value costs anywhere from a minimum of one memory write up to a subroutine call. This cost is paid once at the start and again at the stop.

27 Timer Overhead. [Figure: timeline divided into consecutive intervals $T_1$, $T_2$, $T_3$, $T_4$. Start of $T_1$: initiate read_timer(). End of $T_1$: current time actually read. End of $T_2$: event being measured begins. End of $T_3$: event ends; initiate read_timer(). End of $T_4$: current time actually read.]

28 Timer Overhead. $T_1$ = time to read the counter value; $T_2$ = time to store the counter value; $T_3$ = time of the event we are measuring; $T_4$ = time to read the counter value, so $T_4 = T_1$.

29 Timer Overhead. Event time: $T_e = T_3$. What is actually measured: $T_m = T_2 + T_3 + T_4$. Therefore $T_e = T_m - (T_2 + T_4) = T_m - (T_1 + T_2)$, and the timer overhead is $T_{ovhd} = T_1 + T_2$.

30 Timer Overhead. If $T_e \gg T_{ovhd}$, ignore the timer overhead. If $T_e \approx T_{ovhd}$, measurements will be highly suspect because of potentially large variations in $T_{ovhd}$. Good rule of thumb: $T_e$ should be 100-1000x larger than $T_{ovhd}$.
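One common way to estimate $T_{ovhd}$, sketched here as an assumption rather than as the method the slides prescribe, is to time an empty interval: two back-to-back timer reads with nothing in between. The function names, the choice of the median, and the default factor of 100 are illustrative.

```python
import time


def estimate_timer_overhead_ns(trials: int = 10_000) -> int:
    """Median cost, in ns, of two back-to-back timer reads with nothing in between."""
    deltas = []
    for _ in range(trials):
        t_start = time.perf_counter_ns()
        t_stop = time.perf_counter_ns()
        deltas.append(t_stop - t_start)
    deltas.sort()
    return deltas[len(deltas) // 2]   # median is less sensitive to outliers than the mean


def long_enough(t_e_ns: float, t_ovhd_ns: float, factor: float = 100.0) -> bool:
    """Rule of thumb: the measured event should be at least `factor` times the overhead."""
    return t_e_ns >= factor * t_ovhd_ns


t_ovhd = estimate_timer_overhead_ns()
print(f"estimated timer overhead: {t_ovhd} ns")
print("1 ms event trustworthy?", long_enough(1_000_000, t_ovhd))
```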

31 Approximate Measures of Short Intervals. How can we measure an event that is shorter than the resolution of the clock? Events with $T_e < T_c$ cannot be measured directly, and overhead makes measurement hard even when $T_e > n T_c$ for a small integer n.

32 Approximate Measures of Short Intervals. [Figure: an event of duration $T_e$ shorter than the clock period $T_c$. Case 1: the event spans a clock tick, count + 1. Case 2: the event falls between ticks, count + 0.]

33 Approximate Measures of Short Intervals. Each measurement is a Bernoulli experiment: outcome = +1 with probability p, outcome = +0 with probability (1 - p). This is equivalent to flipping a biased coin. Repeating it n times approximates a binomial distribution. It is only approximate since the measurements cannot be guaranteed to be independent, but it is usually close enough in practice.

34 Approximate Measures of Short Intervals. m = number of times Case 1 occurs (count + 1); n = total number of measurements. The average duration is the ratio m/n, in units of the clock period. Use a confidence interval for proportions.

35 Example. Clock resolution = 10 us; n = 8764 measurements; m = 467 clock ticks counted (Case 1: 467, Case 2: 8297). Find the 95% confidence interval for the duration of the event.

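36 Example. With p = m/n = 467/8764 ≈ 0.0533, the 95% confidence interval for the proportion is $p \mp 1.96 \sqrt{p(1-p)/n} \approx (0.0486, 0.0580)$. Scale by the clock period of 10 us: there is a 95% chance that the duration of the measured event is in (0.49, 0.58) us.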
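The arithmetic of this example can be checked with a short sketch using the normal approximation to the binomial. The function name `proportion_ci` is an assumption made here; `statistics.NormalDist` is from the Python standard library.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(m: int, n: int, confidence: float) -> tuple:
    """Two-sided confidence interval for a proportion m/n (normal approximation)."""
    p = m / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # e.g. about 1.96 for 95%
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


t_c_us = 10                              # clock resolution from the example
low, high = proportion_ci(467, 8764, 0.95)
print(f"event duration: ({low * t_c_us:.2f}, {high * t_c_us:.2f}) us")   # (0.49, 0.58) us
```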

37 Profiling. An overall view of a program's execution-time behavior: the fraction of total time spent in specific states, e.g. in each subroutine, in the OS kernel, or doing I/O. Used to find bottlenecks and code hot spots, so that those sections are optimized first.

38 Statistical Sampling. Select a random subset of a population, gather information on only this subset, and extrapolate this information to the overall population. Results are a statistical summary with corresponding error probabilities.

39 PC Sampling. Periodically interrupt the program at fixed intervals, record the appropriate state information in the interrupt service routine (+1), and post-process to obtain the overall profile.

40 PC Sampling. At each interrupt: examine the PC on the return-address stack, use the address map to translate this PC to subroutine i, and increment array element H[i]. Example address map: 0-1298: Subr 1; 1299-3455: Subr 2; 3456-5567: Subr 3; 5568-9943: Subr 4. For PC = 4582, the histogram update is H[3] = H[3] + 1.

41 PC Sampling

42 PC Sampling. Post-processing step, with n total interrupts: H[i]/n = fraction of time executing in subroutine i; (H[i]/n) * (total execution time) = H[i] * (interrupt period) = time in subroutine i.
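The address-map lookup and the post-processing step can be sketched together. The address ranges come from the example address map on the earlier slide; the sampled PC values, the 10 ms interrupt period, and the function names are made up for illustration.

```python
from bisect import bisect_right

SUBR_STARTS = [0, 1299, 3456, 5568]   # first address of Subr 1..4 (from the address map)
LAST_ADDRESS = 9943                   # last address of Subr 4


def subroutine_for_pc(pc: int) -> int:
    """Translate a sampled PC into a 1-based subroutine index."""
    if not 0 <= pc <= LAST_ADDRESS:
        raise ValueError(f"PC {pc} is outside the address map")
    return bisect_right(SUBR_STARTS, pc)


def profile(pc_samples, interrupt_period_s: float) -> dict:
    """Map subroutine index -> (fraction of samples, estimated time in seconds)."""
    if not pc_samples:
        raise ValueError("need at least one sample")
    H = [0] * (len(SUBR_STARTS) + 1)      # H[0] is unused; H[i] counts Subr i
    for pc in pc_samples:
        H[subroutine_for_pc(pc)] += 1     # the work done in the interrupt service routine
    n = len(pc_samples)
    return {i: (H[i] / n, H[i] * interrupt_period_s) for i in range(1, len(H))}


samples = [4582, 120, 4000, 6000, 3500, 1300, 5000, 9000]   # hypothetical sampled PCs
report = profile(samples, interrupt_period_s=0.010)
for subr, (fraction, seconds) in report.items():
    print(f"Subr {subr}: {fraction:.1%} of samples, about {seconds * 1e3:.0f} ms")
```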

43 PC Sampling. This is a statistical process: the counts differ each time the experiment is performed. We infer the behavior of the entire program from a small sample, so apply confidence intervals to quantify the precision of the results.

44 Example. Interrupt period = 40 us; 36,128 interrupts occur in subroutine A; the program runs for 10 seconds. How much time is spent in this subroutine, with a 90% confidence interval? m = 36,128; n = 10 s / 40 us = 250,000; p = m/n ≈ 0.1445.

45 Example. $p \mp 1.645 \sqrt{p(1-p)/n} \approx (0.1434, 0.1457)$, so there is a 90% chance that the program spent about 14.3-14.6% of its time in subroutine A.

46 Example. Interrupt period = 10 ms; 12 interrupts occur in subroutine A; n = 800 samples; 8 seconds total execution time. How much time is spent in this subroutine, with a 99% confidence interval? p = m/n = 0.015.

47 Example. There is a 99% chance that the program spent 31-210 ms in subroutine A. A pretty wide range! But it is still less than 3% of the total execution time, so start optimizing somewhere else first.
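Both PC-sampling examples use the same confidence interval for a proportion, scaled by the total run time where a time is wanted. A sketch under the normal approximation; `fraction_ci` is a name chosen here, not one from the slides.

```python
from math import sqrt
from statistics import NormalDist


def fraction_ci(m: int, n: int, confidence: float) -> tuple:
    """Confidence interval for the fraction of samples m/n that landed in a subroutine."""
    p = m / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# First example: 36,128 of 250,000 samples, 90% confidence.
lo_a, hi_a = fraction_ci(36_128, 250_000, 0.90)
print(f"fraction of time in A: {lo_a:.4f} to {hi_a:.4f}")

# Second example: 12 of 800 samples, 99% confidence, 8 s total run time.
total_time_s = 8.0
lo_b, hi_b = fraction_ci(12, 800, 0.99)
print(f"time in A: {lo_b * total_time_s * 1e3:.0f} ms to {hi_b * total_time_s * 1e3:.0f} ms")
```

With only 12 hits out of 800 samples the interval is wide relative to the estimate, which is the point the second example makes.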

48 Reducing the Interval Size. Use a lower confidence level, or obtain more samples. To obtain more samples: run the program longer (may not be possible); increase the sample rate (may be fixed by the system, and will increase overhead and perturbation); or run multiple times and add the samples from each run.

49 PC Sampling. Interrupts must occur asynchronously with respect to any program events, and samples must be independent of each other; otherwise, events synchronous with the interrupt are over- or under-sampled. Periodic versus random sampling.

50 Basic Block Counting. A basic block is a sequence of instructions with no branches into or out of the block: single entry, single exit. When the first instruction is executed, all instructions in the block are guaranteed to be executed.

51 Basic Block Counting. Generate a program profile by inserting additional instructions in each block that increment a unique counter each time the block is entered. This produces a histogram of program execution, which can be post-processed to find instruction execution frequencies.

52 Comparison
                 PC sampling                    Basic block counting
Output           Statistical estimate           Exact count
Overhead         Interrupt service routine      Extra instructions per block
Perturbation     Randomly distributed           High
Repeatability    Within statistical variance    Perfect

53 Event Tracing. A profile shows overall frequency-of-execution behavior but ignores the time-ordering of events. A program trace is a dynamic list of events generated by the program, where an event is anything you want to instrument, e.g. the sequence of memory addresses or the I/O blocks accessed. Typically used to drive a simulator.

54 Trace Generation. [Figure: application program (modified to generate trace) → compress → uncompress → trace consumer.]

55 Trace Generation. [Figure: application program (modified to generate trace) → compress → uncompress → trace consumer, plus an online trace consumption path between the application program and the trace consumer.]

56 Trace Generation: source-code modification. Allows precise control of what events are traced and what data is recorded. Typically a manual process. [Figure: source code → compiler → object code → processor → trace.]

57 Trace Generation: software exceptions. Hardware forces an exception before each instruction; the exception routine decodes the instruction and stores the instruction type, PC, operand addresses, etc. Many processors provide a "trace" bit for this. Tremendous slowdown.

58 Trace Generation: emulation. Make a system appear to be something else, and modify the emulator to generate the trace. E.g. the Java Virtual Machine.

59 Trace Generation: microcode modification. Modify instruction execution directly. Allows tracing of all instructions, including the operating system. Depends on access to the lower levels of the processor, e.g. the Transmeta Crusoe processor.

60 Trace Generation: compiler modification. Insert trace code directly in the object file. Requires access to the compiler itself.

61 Trace Generation: compiler modification. Insert trace code directly in the object file. Requires access to the compiler itself; otherwise, write a post-compilation binary editor/rewrite tool.

62 Trace Data. Tracing generates a tremendous volume of data. Example: tracing 100,000,000 instructions/sec with 16 bits of data per event produces about 190 Mbytes of data per second, or 11 Gbytes per minute. Huge perturbations result from the tracing code itself and from the time needed to store the trace data.
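The data-volume figures can be reproduced with a few lines of arithmetic. The sketch assumes binary units (1 Mbyte = 2^20 bytes, 1 Gbyte = 2^30 bytes), which is the reading that matches the slide's rounded numbers.

```python
events_per_second = 100_000_000   # traced instructions per second
bits_per_event = 16               # data recorded per event

bytes_per_second = events_per_second * bits_per_event // 8
mbytes_per_second = bytes_per_second / 2**20        # binary megabytes (assumption)
gbytes_per_minute = bytes_per_second * 60 / 2**30   # binary gigabytes (assumption)

print(f"{mbytes_per_second:.1f} Mbytes/s, {gbytes_per_minute:.1f} Gbytes/min")
```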

63 Trace Data Compression. Apply standard compression algorithms as the trace is written to disk, and uncompress when reading. Typical reduction is 20-70%. The tradeoff is the compress/uncompress time. [Figure: application program (modified to generate trace) → compress → uncompress → trace consumer.]

64 Online Trace Consumption. Use the trace data as it is generated; it is never stored on disk. Multitasking may lead to non-deterministic behavior, which raises a repeatability issue for before-and-after comparison tests: is a difference due to the change in the system or to a change in the trace? The comparison becomes a statistical comparison with n runs. [Figure: application program (modified to generate trace) → online trace consumption → trace consumer.]

65 Abstract Execution. Use higher-level information to intelligently compress trace information. Two-step process: (1) compiler-style analysis to find the critical subset of the trace, storing only the control-flow information sufficient to reconstruct the trace later; (2) produce trace-regeneration code for subsequent use of the trace.

66 Abstract Execution. Example code: 1. if (i > 5) 2. then a = a + i; 3. else b = b + i; 4. i = i + 1; [Figure: control flow graph in which block 1 branches to block 2 or block 3, and both lead to block 4.] The trace will be either 1-2-4 or 1-3-4, so store only 2 or 3 and combine it with the compiler-generated control flow graph to regenerate the trace. Slowdown = 2-10x; compression = 10-100x.
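A toy version of this idea for the four-statement example: the instrumented run stores only which branch (statement 2 or 3) was taken, and a separate regeneration step expands that record back into the full statement trace. Both function names are hypothetical, and the known control flow graph (1, then 2 or 3, then 4) is hard-coded rather than derived by a compiler.

```python
def run_and_record(i_values):
    """Instrumented run: store only the branch taken (2 or 3) for each execution."""
    return [2 if i > 5 else 3 for i in i_values]


def regenerate_trace(branch_record):
    """Rebuild the full statement trace from the stored branch outcomes."""
    trace = []
    for branch in branch_record:
        trace.extend([1, branch, 4])   # statements 1 and 4 always execute
    return trace


record = run_and_record([7, 3, 9])   # made-up values of i at each execution
print(record)                        # one stored item per execution
print(regenerate_trace(record))      # three trace entries per execution
```

Here the stored record is one third the size of the regenerated trace; the larger compression factors quoted on the slide come from real programs with longer straight-line sections.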

67 Trace Sampling. Save only subsequences of the overall trace and drive the simulator with these samples; results should be statistically similar to driving it with the complete trace. One sample = k consecutive events; sampling interval = P (period). [Figure: samples of k events repeated every P events.]
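Periodic trace sampling with sample length k and period P can be sketched as a slicing loop. The function name and the choice to start the first sample at event 0 are assumptions made for illustration.

```python
def sample_trace(trace, k: int, period: int):
    """Keep k consecutive events out of every `period` events of the trace."""
    if not 0 < k <= period:
        raise ValueError("need 0 < k <= period")
    # The final sample may be shorter than k if the trace ends inside it.
    return [trace[start:start + k] for start in range(0, len(trace), period)]


full_trace = list(range(20))                     # stand-in for 20 traced events
samples = sample_trace(full_trace, k=2, period=5)
print(samples)
```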

68 SimPoint. Find "representative" program samples by matching basic block execution frequencies; a clustering tool automates the process. Perform detailed timing simulation on only these samples and fast-forward (functional simulation) between samples. [Sherwood et al., ASPLOS, 2002]

69 SimPoint. Weight each sample's result by its execution frequency to produce the overall result. A relatively small number (10s) of SimPoints produced ≈3% error in IPC on SPEC.

70 SMARTS. Uses systematic sampling with a fixed sample interval. Apply statistical sampling techniques to determine j, k, and P. [Figure: timeline with period P, alternating functional simulation and detailed simulation, with intervals labeled j and k.]

71 Indirect Ad Hoc Techniques. Sometimes the desired metric cannot be measured directly. Use your creativity to measure one thing and then derive or infer the desired value.

72 Example: System Load. What is system load? The number of jobs in the run queue? The number of jobs actively time-sharing? The fraction of time the processor is not in the idle loop? Others? How to measure it? Modify the OS? PC sampling? Indirectly?

73 Example. Let the system run for a fixed time T and note the value of the counter. [Figure: only the monitor runs during T; count = n.]

74 Example. Let the system run for a fixed time T and compare the monitor counter value on the loaded system to the count value on the unloaded system. [Figure: the monitor time-shares with App 1 during T; the count drops from n to n/2.]

75 Example. Let the system run for a fixed time T and compare the monitor counter value on the loaded system to the count value on the unloaded system. [Figure: the monitor time-shares with App 1 and App 2 during T; the count is n unloaded, n/2 with one application, n/3 with two.]
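The inference in this example can be written down directly. Assuming the scheduler shares the processor equally, a monitor competing with j other jobs gets 1/(j+1) of the time, so its count falls from n to n/(j+1) and j = n/count - 1. The function name and the counter values are made up for illustration.

```python
def estimated_competing_jobs(unloaded_count: float, loaded_count: float) -> float:
    """Estimate how many jobs time-share with the monitor, assuming equal time slices."""
    if loaded_count <= 0:
        raise ValueError("loaded_count must be positive")
    return unloaded_count / loaded_count - 1


n = 600_000   # hypothetical monitor count on the unloaded system during time T
print(estimated_competing_jobs(n, 600_000))   # monitor alone
print(estimated_competing_jobs(n, 300_000))   # count n/2
print(estimated_competing_jobs(n, 200_000))   # count n/3
```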

76 Perturbation. To obtain more information (higher resolution) → use more instrumentation points. More instrumentation points → greater perturbation.

77 Perturbation. Computer performance measurement uncertainty principle: accuracy is inversely proportional to resolution. [Figure: plot of accuracy versus resolution, with axes running from low to high.]

78 Perturbation. Superposition does not work here: perturbation is non-linear and non-additive. Double the instrumentation ≠ double the impact on performance. Some instrumentation cancels out, some multiplies the impact. No way to predict!

79 Instrumentation Code. Changes memory access patterns. Affects memory banking optimizations. Generates additional load/store instructions. Causes more frequent cache flushes and replacements, but may reduce set-associativity conflicts. Generates more I/O operations. Will increase overall execution time. Causes more time-sharing context switches. Alters virtual memory paging behavior.

80 Important Points: event types. Simple counts of a primary event; secondary events triggered by some primary event; overall profiles.

81 Important Points: measurement strategies. Event-driven; tracing; sampling; indirect approaches.

82 Important Points: interval timers. Stopwatch functionality; rollover problem; overhead; quantization errors; statistical measures of short intervals.

83 Important Points: profiling. PC sampling gives a statistical view; basic block counting gives exact behavior, with high overhead and perturbation.

84 Important Points: trace generation. Source code modification; forced exceptions; emulation; microcode modification; compiler modification; object code editor. Online trace consumption. Trace sampling.

85 Important Points. Indirect measurements when all else fails (system load example). Perturbations: nobody likes them, but we have to learn to live with them.

