A Case for Vertical Profiling

1 A Case for Vertical Profiling
Peter Sweeney, Michael Hind (IBM Thomas J. Watson Research Center)
Matthias Hauswirth, Amer Diwan (University of Colorado at Boulder)

2 Finding Causes of Performance Phenomena
C program: Application, Operating System, Hardware
Java / .NET program: Application, Framework, Java Library, Virtual Machine, Native Library, Operating System, Hardware

3 Methodology
Benchmark: SPECjbb2000
Virtual machine: JikesRVM
[Timeline: initialization, followed by warehouse transactions on 1 thread; 120,000 transactions, 50 transactions per time slice]

4 Expected Performance of Warehouse Thread
[Plot: Inst/Cyc over Cycles, from 9,792 million to 39,816 million cycles]

5 Observed Performance of Warehouse Thread
[Plot: Inst/Cyc (0 to 0.622) over Cycles, from 9,792 million to 39,816 million cycles]

6 Investigation: Why this Difference?
Correlate IPC with more than 100 other hardware performance metrics
– No significant overall correlation
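
The correlation step on this slide could be sketched as follows. This is a minimal illustration, not the authors' tooling: it assumes one value per time slice for each metric, uses Pearson's coefficient (the slides do not say which statistic was used), and the function names and sample values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / math.sqrt(var_x * var_y)

def rank_by_correlation(ipc, metrics):
    """Return (metric name, r) pairs sorted by |r|, strongest first."""
    scores = [(name, pearson(ipc, values)) for name, values in metrics.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)

# Hypothetical per-time-slice values, made up for illustration only.
ipc = [0.50, 0.49, 0.46, 0.50, 0.51]
metrics = {
    "EeOff/Cyc": [0.02, 0.02, 0.08, 0.02, 0.02],
    "LsuFlush/Cyc": [0.03, 0.01, 0.02, 0.03, 0.02],
}
for name, r in rank_by_correlation(ipc, metrics):
    print(f"{name}: r = {r:+.3f}")
```

Sorting by the absolute value keeps strongly negative correlations near the top, which matters here because a dip in IPC may coincide with a rise in another metric.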

7 Investigation: Correlate with GC Activity
[Plot: Inst/Cyc (0 to 0.622) over Cycles, from 9,792 million to 39,816 million cycles]

8 Phenomenon 1: Pre-GC Dip
[Plot: Inst/Cyc (0 to 0.622) over Cycles, from 9,792 million to 39,816 million cycles]

9 Phenomenon 1: Pre-GC Dip
Correlate with OS-Level Metric
[Plot: Inst/Cyc (0 to 0.622) and EEOff/Cyc (0 to 0.219) over Cycles, from 9,792 million to 39,816 million cycles; annotations: +300%, -6%]

10 Phenomenon 1: Pre-GC Dip
Next Steps
We have not found the root cause yet…
Need metrics from different levels:
– Allocation
– Synchronization
– System calls
– Interrupts

11 Observed Performance
[Plot: Inst/Cyc (0 to 0.622) over Cycles, from 9,792 million to 39,816 million cycles]

12 Phenomenon 2: Continuous increase
[Plot: Inst/Cyc (0 to 0.622) over Cycles, from 9,792 million to 39,816 million cycles]

13 Phenomenon 2: Continuous increase
Correlate with HW-Level Metric
[Plot: Inst/Cyc (0 to 0.622) and LsuFlush/Cyc (0 to 0.037) over Cycles, from 9,792 million to 39,816 million cycles]

14 Phenomenon 2: Continuous increase
Correlate with VM-Level Metric
               Non-Opt   AOS Start   AOS End   Opt
IPC            0.3479    0.4091      0.4890    0.5082
LsuFlush/Cyc   0.0533    0.0250      0.0017    0.0007

15 Phenomenon 2: Continuous increase
Next Steps
We have not verified the root cause yet…
Need metrics from different levels:
– Recompilation activity
– Time spent executing non-optimized vs. optimized code

16 Vertical Profiling
Gather data about multiple levels:
– Application
– Framework
– Java Library
– Virtual Machine
– Native Library
– Operating System
– Hardware
Phenomena: 1 Pre-GC Dip, 2 Continuous increase

17 Vertical Event Trace

18 Challenges & Possible Approaches
Huge difference in event frequencies
– E.g. 7 GCs, but 20 billion instructions completed
– Idea: Count high-frequency events, trace low-frequency events
Large number of possible metrics
– Trace everything: impossible to anticipate, too expensive
– Write many specialized profilers: error prone, large effort
– Idea: Generate profilers from specification
Overhead
– E.g. tracing every memory access is very expensive
– Idea: Provide tunable profiling parameters for least overhead
Perturbation
– E.g. instrumenting every memory access perturbs HPMs
– Idea: Use separate runs for interfering metrics
Separate Traces
– E.g. handling non-determinism
– Idea: Combine traces using intervals to summarize
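
The first idea above (count high-frequency events, trace low-frequency events) could look roughly like this. It is a sketch under stated assumptions: the `Profiler` class, its method names, and the loop sizes are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass, field

@dataclass
class Profiler:
    """Counts frequent events; records a trace entry only for rare ones."""
    counters: dict = field(default_factory=dict)
    trace: list = field(default_factory=list)

    def count(self, name, amount=1):
        # High-frequency path: a cheap counter update, no record is stored.
        self.counters[name] = self.counters.get(name, 0) + amount

    def trace_event(self, name, **attributes):
        # Low-frequency path: store a record with a snapshot of all counters.
        record = {"event": name, **attributes, "counters": dict(self.counters)}
        self.trace.append(record)
        return record

profiler = Profiler()
for gc_number in range(3):
    for _ in range(1000):  # stands in for a very large number of instructions
        profiler.count("Inst")
    profiler.trace_event("GC", number=gc_number)

print(len(profiler.trace), profiler.counters["Inst"])
```

Because each trace record carries a counter snapshot, the difference between two consecutive records gives the number of high-frequency events between two rare events without ever storing them individually.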

19 Architecture
[Diagram. Inputs: Specification (what), Parameters (how). Components: Generator, Instrumenters, Tracer, Trace Reader, Trace Analyzer, Visualizer. Data flows: Instrumentations; Event creations, Counter updates; Event Stream; Interval Stream; Aggregated Profiles]

20 Vertical Profiling Specification: What to Profile
[Callouts on the slide: Counters, Events, Event Attributes, Intervals, Interval Metrics]

  specification IPC_And_BytesAllocated {
    hardware counter long Cyc;
    hardware counter long Inst;
    software counter long BytesAllocated;

    event ThreadSwitch {
      int fromThread;
      int toThread;
      long cyc = Cyc;
      long inst = Inst;
      long bytesAllocated = BytesAllocated;
    }

    interval TimeSlice {
      starts with ThreadSwitch;
      ends with ThreadSwitch where end.fromThread == start.toThread;
      double ipc = (end.inst - start.inst) / (end.cyc - start.cyc);
      long bytesAllocated = end.bytesAllocated - start.bytesAllocated;
    }
  }
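
To make the interval semantics concrete, the following sketch derives `TimeSlice` intervals from a list of `ThreadSwitch` events. It is an assumption-laden Python analogue of the specification, not the generated profiler: the class and function names are hypothetical, and it assumes a single, time-ordered event stream.

```python
from dataclasses import dataclass

@dataclass
class ThreadSwitch:
    from_thread: int
    to_thread: int
    cyc: int              # cycle counter value at the switch
    inst: int             # instructions-completed counter value at the switch
    bytes_allocated: int  # software counter value at the switch

@dataclass
class TimeSlice:
    thread: int
    ipc: float
    bytes_allocated: int

def time_slices(events):
    """Pair each switch *to* a thread with the next switch *from* that thread."""
    open_starts = {}  # thread id -> ThreadSwitch that started its current slice
    slices = []
    for end in events:
        start = open_starts.pop(end.from_thread, None)
        if start is not None and end.cyc > start.cyc:
            slices.append(TimeSlice(
                thread=end.from_thread,
                ipc=(end.inst - start.inst) / (end.cyc - start.cyc),
                bytes_allocated=end.bytes_allocated - start.bytes_allocated,
            ))
        open_starts[end.to_thread] = end
    return slices

# Hypothetical counter values, for illustration only.
events = [
    ThreadSwitch(0, 1, cyc=1000, inst=400, bytes_allocated=0),
    ThreadSwitch(1, 2, cyc=3000, inst=1400, bytes_allocated=4096),
    ThreadSwitch(2, 1, cyc=4000, inst=1700, bytes_allocated=4096),
]
for ts in time_slices(events):
    print(ts)
```

The dictionary lookup implements the `where end.fromThread == start.toThread` condition: an interval closes only when the thread that was switched in is switched out again.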

21 Status
Profiling
– Hardware Performance Monitors [VM’04]
– Software Performance Monitors
– Specification-driven (early prototype)
Visualization & Analysis
– IBM Performance Explorer

22 Future Work
Evaluate utility
– Find root causes of phenomena
Evaluate perturbation
– Intra-level perturbation (e.g. HPM → HPM)
– Inter-level perturbation (e.g. lock tracing → HPM)
Semi-automate investigative process
– Statistics / Machine learning

23 Related Work
Trace Analyzer
– [Perl 92] Performance Assertion Checking
– [Perl et al. 98] Continuous Monitoring
Software Performance Counters
– [Microsoft] Windows Management Instrumentation
HPM and JikesRVM
– [Sweeney et al. 04] Using Hardware Performance Monitors to Understand the Behavior of Java Applications

24 Questions?

25 EXTRAS

26 Profiling HPMs: Infrastructure
Hardware: Power4 Performance Monitors
OS: AIX 5.x pmsvc Kernel Extension
C Library: AIX 5.x pmapi Library
VM: JikesRVM 2.3.0.1+ HPM Facility

27 Profiling HPMs: Samples
A sample represents a time slice
– Start and end time (in time-base or “decrementer” ticks)
– 8 event counts
– Processor id
– Java thread id
– Preempted or yielding
– Java method ending the sample
[Timeline: VP (CPU) 1 and VP (CPU) 2, divided into 10 ms time slices]

28 Profiling HPMs: Benchmark
SPEC JBB
Modified to execute a given number of transactions (120,000)
Startup phase (ca. 8 sec)
– 1 main thread
Steady-state phase (ca. 24 sec)
– N warehouse threads
Configurations
– {1,2,3,4} warehouses on {1,2,3,4} processors
Steady-state behavior
– Ca. 50 transactions per 10 ms time slice

29 Performance Explorer
Visualizer for JikesRVM hardware performance counter traces
Built-in information about all Power4 performance events
Support for creating computed metrics (e.g. Inst/Cyc, given Cyc and Instr counter values)
Multiple visualizations, like time chart and scatter plot (for correlation of metrics)

30 Performance Explorer: Power4 Event Information

31 Performance Explorer: Creation of Computed Metrics

32 Performance Explorer: Overview of Java Threads

33 Performance Explorer: Time Chart

34 Performance Explorer: Scatter Plot

35 Phenomenon 1: Pre-GC Dip in IPC
Other Correlated Metrics
Metric                  Normal    Dip       Increase
IPC                     0.4924    0.46095   -6.4%
EeOff/Cyc               0.01965   0.0785    +300%
HvCyc/Cyc               0.02387   0.12489   +423%
GrpDispBlkSbCyc/Cyc     0.00595   0.02577   +333%
LsuSrqSyncCyc/Cyc       0.00612   0.017     +178%
StcxFail/StcxPassFail   0.00086   0.00395   +362%
LsuLrqFullCyc/Cyc       0.00077   0.00271   +250%
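
The "Increase" column is the relative change from the normal value to the dip value. The sketch below recomputes it for three rows; the function name is hypothetical, and because the table shows rounded values, recomputed percentages can differ slightly from the ones on the slide.

```python
def percent_change(normal, dip):
    """Relative change from the normal value to the dip value, in percent."""
    if normal == 0:
        raise ValueError("normal value must be non-zero")
    return (dip - normal) / normal * 100.0

# (normal, dip) pairs copied from the table above.
rows = {
    "IPC": (0.4924, 0.46095),
    "EeOff/Cyc": (0.01965, 0.0785),
    "HvCyc/Cyc": (0.02387, 0.12489),
}
for name, (normal, dip) in rows.items():
    print(f"{name}: {percent_change(normal, dip):+.1f}%")
```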

36 Vertical Profiling Matrix
[Matrix. Columns (Instrument): Hardware, Machine code, Byte code, Source code. Rows (Observe): Hardware, OS, Native libs, VM, Java libs, Framework, Application. The cell markings did not survive extraction.]

37 Vertical Profiling Matrix
Two “vertical” dimensions
– What we observe
– What we instrument
We may observe higher level behavior by instrumenting a lower level, or vice versa
– Instrument HW, observe OS time
– Instrument byte code, observe branch misses

38 Vertical Profiling Specification: How to Profile
Parameter                       Possible Values
Buffer size                     100000, 1000000, 10000000, …
Buffer type                     Java byte[], Java int[], native
Buffer ownership                Global, Processor, Thread
Buffer access synchronization   None, Lock-free, Locked
Buffer access                   Java, Magic
Buffer overflow handling        Flush, Disable, Ignore
Buffer flushing                 Explicit, Seg fault, Each thread switch
Buffer flush target             File, Socket, C routine

