September 2006 Speaker: Stephen Williams Caliper Development Team

1 Update on HP Caliper, the Performance Tool for Itanium® HP-UX and Linux Systems
September 2006 Speaker: Stephen Williams Caliper Development Team Hewlett-Packard

2 Previous webcasts
An introduction to HP Caliper, what it is, and how to use it. Webcast: September 9, 2003
An update on HP Caliper for HP-UX and Linux Itanium. Webcast: September 21, 2004
Yet more HP Caliper: an update on the Itanium HP-UX and Linux Performance Tool. Webcast: September 20, 2005

3 Agenda
Quick overview of HP Caliper
New features in HP Caliper 3.9, 4.0, and 4.1
Future directions
Hints and tips
Summary
DSPP information
Q & A

4 What is HP Caliper?
Per-process or system-wide performance measurement tool for any native Itanium®/Itanium® 2 application
For both HP-UX and Linux Integrity servers
“Swiss Army knife”: many different measurements, with a common user interface and options
Multiple report formats: text, CSV, HTML
Graphical user interface (new at 4.0)
Uses Performance Monitor Unit (PMU) hardware and dynamic instrumentation as needed

5 Example command lines
caliper [measurement] [options] application [app-opts]
caliper [measurement] [options] PID1 [PID2 …]
caliper [measurement] [options] -w
Examples:
caliper fprof --html dir_name sweep3d
caliper dcache -t -p all cc himom.c
caliper cpu -w -o out.txt --dur 10
caliper scgprof -p myproc
caliper icache -o out.txt

6 Measurements
Used for: What? Where? Details? (instrumented)
Overview: cpu, ecount
Profiles: alat, branch, dcache, dtlb, fprof, icache, itlb, cycles
Traces: pmu_trace
Call graph: scgprof, cgprof*
Coverage: fcover*
Counts: acount*, fcount*
* not in Linux version

7 New features since HP Caliper 3.9
Improved command line usability
Quick Start reference card
Improved reports for multi-process applications
New ‘cycles’ measurement (dual-core Itanium 2 only)
Richer sets of PMU events (dual-core Itanium 2 only)
System-wide measurements
Graphical user interface

8 Improved command line usability
scgprof is now the default measurement:
$ caliper myprog (collects scgprof data on myprog)
-a is no longer required for attaching to processes:
$ caliper 1234 (collects scgprof data on process 1234)
Re-reporting the most recently recorded data is simple:
$ caliper report [options]
Reporting from an HP Caliper database is simplified:
$ caliper mydb.db
New default: reports go down to the source level, but not the instruction level (use -r all to get disassembly)
New default: --process all (-p all)
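The slide says a bare numeric argument such as 1234 now means "attach to that process", while other arguments name a program to launch. The following is a minimal sketch of one way a front end could make that decision. The helper name `classify_target` and the all-digits rule are assumptions for illustration, not HP Caliper's actual logic. Database files (`caliper mydb.db`) and subcommands (`caliper report`) are deliberately not handled.

```shell
#!/bin/sh
# classify_target: hypothetical helper (not part of HP Caliper).
# Assumption: an argument made only of digits is a PID to attach to;
# anything else is treated as a program to launch.
classify_target() {
    case $1 in
        '' | *[!0-9]*) echo "program $1" ;;  # empty or contains a non-digit
        *) echo "pid $1" ;;                   # digits only
    esac
}

classify_target 1234    # prints "pid 1234"
classify_target myprog  # prints "program myprog"
```

A real tool might also need to handle a program whose file name happens to be all digits. This sketch ignores that case.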

9 Improved command line usability (short options)
More short options have been added. Here is the complete list:
Short form   Long form
-d           --database
-e           --duration (for elapsed time)
-f           --options-file
-H           --help (long-form help)
-m           --metrics
-o           --output-file
-p           --process
-r           --report-details
-s           --sampling-spec
-t           --threads all
-v           --version
-w           --scope system,attr-mod
-h or -?     (short-form help; no long-form equivalent)

10 Improved command line usability (short measurement names)
Measurement names have been shortened:
New name   Old name
alat       alat_miss
acount     arc_count
branch     branch_prediction
cpu        cpu_metrics
dcache     dcache_miss
dtlb       dtlb_miss
fcount     func_count
fcover     func_cover
icache     icache_miss
itlb       itlb_miss
ecount     total_cpu
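Scripts written against older releases may still use the long measurement names. The sketch below translates them using exactly the pairs in the table above. The helper name `new_measurement_name` is hypothetical, and the slide does not say whether the old names are still accepted.

```shell
#!/bin/sh
# new_measurement_name: hypothetical helper (not part of HP Caliper) that maps
# an old measurement name to its shortened form. Only the name pairs come from
# the slide; unknown names are echoed back unchanged.
new_measurement_name() {
    case $1 in
        alat_miss)         echo alat ;;
        arc_count)         echo acount ;;
        branch_prediction) echo branch ;;
        cpu_metrics)       echo cpu ;;
        dcache_miss)       echo dcache ;;
        dtlb_miss)         echo dtlb ;;
        func_count)        echo fcount ;;
        func_cover)        echo fcover ;;
        icache_miss)       echo icache ;;
        itlb_miss)         echo itlb ;;
        total_cpu)         echo ecount ;;
        *)                 echo "$1" ;;
    esac
}

new_measurement_name dcache_miss   # prints "dcache"
```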

11 Improved command line usability (simplified merge and diff syntax)
--join is deprecated. Instead, use:
$ caliper merge -o out.txt db1 [db …]
$ caliper diff -o out.txt db1 db2
Note that you can merge per-process data in a single database:
$ caliper merge -o out.txt mydb

12 Quick Start reference card

13 Quick Start reference card (back side)

14 Improved reports for multi-process applications
Caliper can now report:
Across-process CPU events
Histograms of processes and associated metrics:
$ caliper report -o out.txt mydb
Histograms of executables and associated metrics:
$ caliper merge -o out.txt mydb
Use --process-cutoff to change the number of processes or executables reported in the process or executable histogram.

15 Improved reports for multi-process applications (cont.)
Example of a merged process (executable) summary:
Process Summary (columns: % Total IP Samples, Cumulative % of Total, IP Samples, Process)
be (1 instances)
ecom (1 instances)
u2comp (1 instances)
ld (1 instances)
sh (4 instances)
[Minimum process entries: 5, percent cutoff: 2.00, cumulative percent cutoff: ]
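A merged summary groups processes by executable, counts instances, and gives each executable's share of all IP samples plus a running cumulative percentage. The awk sketch below reproduces only that arithmetic. The helper name `merge_by_executable` and its input format (one `executable samples` pair per process) are assumptions. The minimum-entry and cutoff rules shown in brackets are not modeled.

```shell
#!/bin/sh
# merge_by_executable: hypothetical helper (not part of HP Caliper).
# Input on stdin: one "executable samples" pair per process.
# Output, hottest executable first:
#   name (N instances) samples percent_of_total cumulative_percent
# The first awk sums samples and counts instances per executable, sort
# orders executables by total samples (largest first), and the second awk
# computes each share of the grand total and the running cumulative share.
merge_by_executable() {
    awk '{ samples[$1] += $2; inst[$1]++ }
         END { for (e in samples) print e, inst[e], samples[e] }' |
    sort -k3,3nr |
    awk '{ name[NR] = $1; inst[NR] = $2; n[NR] = $3; total += $3 }
         END {
             for (i = 1; i <= NR; i++) {
                 pct = 100 * n[i] / total
                 cum += pct
                 printf "%s (%d instances) %d %.2f %.2f\n", name[i], inst[i], n[i], pct, cum
             }
         }'
}
```

For example, `printf 'be 60\nsh 5\nld 30\nsh 5\n' | merge_by_executable` prints `be (1 instances) 60 60.00 60.00`, then `ld (1 instances) 30 30.00 90.00`, then `sh (2 instances) 10 10.00 100.00`.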

16 New measurement: cycles
On dual-core Itanium 2 systems, HP Caliper can now report average cycles per bundle:
$ caliper cycles -o out.txt -r all myprog
The resulting report resembles an fprof report (showing IP sample hits), but adds the following information at the disassembly level:
Average cycles used to retire each bundle. (With no stalls, a bundle should retire in one cycle.)
Instructions that were split-issued (that is, instructions not issued at the same time as the instruction that precedes them).

17 Richer PMU events sets
On dual-core Itanium 2 systems, HP Caliper now reports many more PMU events (and derivations) in one run. An example from an IP Sample (fprof) report:
Metrics Summed for Entire Run
Event Name              PLM U..K  TH AC AT  Count
BE_L1D_FPU_BUBBLE.ALL   x___      T F
BE_RSE_BUBBLE.ALL       x___      T F
BE_FLUSH_BUBBLE.ALL     x___      T F
BACK_END_BUBBLE.FE      x___      F F
CPU_OP_CYCLES.ALL       x___      T F
BE_EXE_BUBBLE.ALL       x___      F F
BE_L1D_FPU_BUBBLE.L1D   x___      T F
BE_EXE_BUBBLE.GRALL     x___      F F
BE_EXE_BUBBLE.FRALL     x___      F F
BE_EXE_BUBBLE.GRGR      x___      F F
CPU_CPL_CHANGES.ALL     x___      F F

18 Richer PMU events sets (cont.)
% Unstalled execution (higher is better): 47.44 = % Unstalled execution
% of Cycles lost due to Front end stalls (lower is better): 6.43 = % stalls due to ICACHE, ITLB and branch execution
% of Cycles lost due to Pipeline flush stalls (lower is better): 9.23 = % stalls due to branch misprediction or interruption flush
% of Cycles lost due to data access stalls (lower is better): 33.23 = % stalls due to DCACHE and DTLB (includes FR/FR stalls)
% of Cycles lost due to RSE stalls (lower is better): 1.45 = % stalls due to RSE spilling/filling registers to/from memory
% of Cycles lost due to Scoreboard stalls (lower is better): 2.22 = % stalls due to FPU and register dependency (excludes FR/FR stalls)
Number of privilege level changes to/from all privileges: 73385 = CPU_CPL_CHANGES.ALL

% of Cycles lost due to Front end stalls: 6.43 = 100 * (BACK_END_BUBBLE.FE / CPU_OP_CYCLES.ALL)
% of Cycles lost due to Pipeline flush stalls: 9.23 = 100 * (BE_FLUSH_BUBBLE.ALL / CPU_OP_CYCLES.ALL)
% of Cycles lost due to data access stalls (includes FR/FR stalls): 33.23 = % register load stalls (includes FR/FR) + % stalls due to L1D
% of Cycles lost due to RSE stalls: 1.45 = 100 * (BE_RSE_BUBBLE.ALL / CPU_OP_CYCLES.ALL)
% of Cycles lost due to Scoreboard stalls (excludes FR/FR stalls): 2.22 = % stalls due to FPU + % register dependency stalls
% of Cycles lost due to register load stalls (includes FR/FR stalls): 26.81 = % GR/load dependency stalls + % FR/load or FR/FR dependency stalls
% of Cycles lost due to FR/load or FR/FR dependency stalls: 0.20 = 100 * BE_EXE_BUBBLE.FRALL / CPU_OP_CYCLES.ALL
% of Cycles lost due to GR/load dependency stalls: 26.61 = 100 * (BE_EXE_BUBBLE.GRALL - BE_EXE_BUBBLE.GRGR) / CPU_OP_CYCLES.ALL
% of Cycles lost due to stalls in L1D cache and L1/L2 DTLB: 6.42 = 100 * (BE_L1D_FPU_BUBBLE.L1D / CPU_OP_CYCLES.ALL)
% of Cycles lost due to register dependency stalls (excludes FR/FR stalls): 2.22 = (100 * BE_EXE_BUBBLE.ALL / CPU_OP_CYCLES.ALL) - % register load stalls
% of Cycles lost due to GR/GR dependency stalls: 2.14 = 100 * BE_EXE_BUBBLE.GRGR / CPU_OP_CYCLES.ALL
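Each derived metric above is a plain ratio of event counts to CPU_OP_CYCLES.ALL. Some metrics are sums of others: register load stalls 26.81 = 26.61 + 0.20, and data access stalls 33.23 = 26.81 + 6.42. The awk sketch below recomputes those formulas from raw counts so the relationships can be checked by hand. The function name and argument order are hypothetical. Unstalled execution, FPU stalls, and Scoreboard stalls are left out because the report does not show their formulas.

```shell
#!/bin/sh
# stall_breakdown: hypothetical helper (not part of HP Caliper) that recomputes
# derived stall percentages from raw PMU event counts, using the formulas
# printed in the report. Arguments, in order:
#   1 CPU_OP_CYCLES.ALL      2 BACK_END_BUBBLE.FE    3 BE_FLUSH_BUBBLE.ALL
#   4 BE_RSE_BUBBLE.ALL      5 BE_EXE_BUBBLE.ALL     6 BE_EXE_BUBBLE.GRALL
#   7 BE_EXE_BUBBLE.GRGR     8 BE_EXE_BUBBLE.FRALL   9 BE_L1D_FPU_BUBBLE.L1D
stall_breakdown() {
    awk -v cyc="$1" -v fe="$2" -v flush="$3" -v rse="$4" -v exe="$5" \
        -v grall="$6" -v grgr="$7" -v frall="$8" -v l1d="$9" 'BEGIN {
        gr_load  = 100 * (grall - grgr) / cyc   # GR/load dependency stalls
        fr_dep   = 100 * frall / cyc            # FR/load or FR/FR stalls
        reg_load = gr_load + fr_dep             # register load stalls
        l1d_dtlb = 100 * l1d / cyc              # L1D cache and L1/L2 DTLB
        printf "front_end %.2f\n",      100 * fe / cyc
        printf "pipeline_flush %.2f\n", 100 * flush / cyc
        printf "rse %.2f\n",            100 * rse / cyc
        printf "data_access %.2f\n",    reg_load + l1d_dtlb
        printf "register_load %.2f\n",  reg_load
        printf "l1d_dtlb %.2f\n",       l1d_dtlb
        # Register dependency stalls exclude FR/FR, so subtract register load.
        printf "register_dep %.2f\n",   100 * exe / cyc - reg_load
        printf "gr_gr %.2f\n",          100 * grgr / cyc
    }'
}
```

For example, with hypothetical counts of 10000 cycles and 643 front-end bubble cycles, the first output line is `front_end 6.43`.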

19 System-wide measurements
Most measurements can now be made system-wide, across all processes and CPUs in both user and kernel space.
Three levels of sample attribution: --scope system[,attr-mod|attr-proc|attr-none]
-w is equivalent to: --scope system,attr-mod
Privilege level (PLM): --event-defaults user|kernel|all
Sample command (collects IP samples in both kernel and user space for 20 seconds):
$ caliper fprof -o o.txt --ev all -w -e 20

20 System-wide measurements (cont.)
Limitations on HP-UX:
You must be logged in as the root user.
Caliper may not be able to locate some executables and shared libraries, resulting in many “unattributed” samples. Workaround: use --module-search-path.
Limitations on Linux:
You cannot exclude idle time and the caliper process (though we hope to provide this feature in the future).
Limitations on both HP-UX and Linux:
While caliper runs in system-wide mode, no other caliper process can run on the same system.

21 New graphical user interface
An Eclipse RCP application that makes it easy to:
Perform measurement collections
Browse Caliper databases
See measurement data, with easy drill-down
Can be run on a remote Integrity server, with the display shown on your desktop X server (not recommended over a wide-area network), via:
$ caliper -g
Can be run locally on a Windows or Linux x86-based system (the local GUI client communicates with the Caliper server via ssh or rexec)

22 New graphical user interface (Projects view and Collect view)
Callouts: Saved collection setup; Start process; System wide; Attach process; Previously collected data; Start data collection; Required fields and tabs in red; Only applicable collection tabs enabled

23 New graphical user interface (Measurement tab of Collect view)
Callouts: Data cache misses selected; Stop data collection; Collection in progress

24 New graphical user interface (viewing data)
Callouts: Analyze view; Saved collection specification; Process tree tab opened; Available data sets; Application output

25 New graphical user interface (CPU event counts)
Callouts: Show data for entire application; Show CPU events tab

26 New graphical user interface (metrics derived from CPU events)
CPU events tab scrolled to show derived metrics

27 New graphical user interface (histogram viewer)
Callouts: Maximize or minimize by double-clicking Analyze view tab; Hottest process (double-click to drill down); Overview of entire histogram; Percent of application’s total misses in process be

28 New graphical user interface (drill down to functions)
Callouts: Use stacking bars; Popups for long function names; Show ‘local’ percents (percent of total for be); DagNode::dagConstMarkPredArc(DagNode *, DagNode *, Dag*); Area viewed in table highlighted in Overview; Previous levels visited

29 New graphical user interface (drill down to disassembly)
Callouts: Show: Source, Source/disasm, Sorted by address, Disassembly; Click to show hotspots in table

30 New graphical user interface (sorting)
Sort bundles by misses

31 New graphical user interface (call graph viewer)
Callouts: Multiple Analyze views allowed; Callees visited; Current function; Callers; Callees

32 Future directions
Expected new features at HP Caliper 4.2 (January 2007):
Load module-centric reports (e.g., across process profile of
Call stack profiling (with wall-clock sampling)
Bucketing of data cache miss latencies (to help ascertain cache levels accessed)
Trap profiling
Merge/diff capability in graphical user interface
Caliper Advisor integrated with graphical user interface
Features beyond HP Caliper 4.2:
Caliper Advisor cheatsheets in graphical user interface
Data-centric cache miss reports
Integration with Ktrace/Kprofile
More data visualization aids in graphical user interface
Per-CPU/per-thread CPU metrics

33 Load modules as top level (v4.2)
View load modules as top level

34 Call-stack profile (v4.2)
Graph hot call paths by running time, blocked time, or both

35 CPU metrics overview (v4.2)
Overview of metrics collected by cpu measurement (default metrics)

36 Call-stack samples display (potential future display)
Callouts: Overview of running and stopped threads; Sample cursor (drag to any point); Call stacks at sample 754; “Playback” controls

37 Data-centric cache miss profile display (potential future display)
Callouts: Double-click row to see function’s disassembly; Double-click row (below) to view instruction addresses (above); Double-click row (below) to view data addresses (above)

38 3D histograms (potential future display)
Figure from CxPerf User’s Guide

39 Hints and tips: caliper command
Getting CPU event names from caliper:
Dump all event names and descriptions:
$ caliper info all
List all event names (no other fields):
$ caliper info all -d name
List names of all events containing the string “L3”:
$ caliper info L3 -d name
Or, use an ambiguous event name:
$ caliper ecount --metrics L3_READ myprog
HP Caliper: usage error: Ambiguous event name ("L3_READ") specified for "--metrics". Matches L3_READS.ALL.ALL, L3_READS.ALL.HIT, L3_READS.ALL.MISS, L3_READS.DATA_READ.ALL, L3_READS.DATA_READ.HIT, L3_READS.DATA_READ.MISS, L3_READS.DINST_FETCH.ALL, L3_READS.DINST_FETCH.HIT, L3_READS.DINST_FETCH.MISS, L3_READS.INST_FETCH.ALL, L3_READS.INST_FETCH.HIT, L3_READS.INST_FETCH.MISS.
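The error above shows that a partial event name is checked against the full list of event names and rejected when it matches more than one. The sketch below models that rule with a prefix match. Several things are assumptions, not documented Caliper behavior: the helper name `resolve_event`, the rule that an exact name wins, and prefix (rather than substring) matching. Note that `caliper info L3` is described as a substring search.

```shell
#!/bin/sh
# resolve_event: hypothetical sketch (not HP Caliper source). Reads known event
# names from stdin, one per line, and resolves $1 against them.
# Assumptions: an exact match wins; otherwise $1 must be a prefix of exactly
# one known name, or an error listing all matches is printed.
resolve_event() {
    want=$1
    names=$(cat)
    if printf '%s\n' "$names" | grep -Fxq -- "$want"; then
        printf '%s\n' "$want"
        return 0
    fi
    matches=$(printf '%s\n' "$names" | awk -v p="$want" 'index($0, p) == 1')
    count=$(printf '%s\n' "$matches" | grep -c .)
    case $count in
        0) echo "Unknown event name (\"$want\")" >&2; return 1 ;;
        1) printf '%s\n' "$matches" ;;
        *) echo "Ambiguous event name (\"$want\"). Matches" \
                "$(printf '%s\n' "$matches" | paste -sd, -)" >&2
           return 1 ;;
    esac
}
```

With the names `L3_READS.ALL.ALL` and `L3_READS.ALL.HIT` on stdin, `resolve_event L3_READS.ALL.H` prints `L3_READS.ALL.HIT`. By contrast, `resolve_event L3_READ` fails with an ambiguity error.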

40 Hints and tips: caliper command (cont.)
Getting report help:
Dump the help file for the cycles measurement:
$ caliper info -r cycles
Append help to a report:
$ caliper cycles --info -o out.txt myprog
Providing command options using a file:
$ caliper fprof -f myOptionsFile
Helping Caliper find:
Source code: --source-path-map dir|map[:dir|map:…]*
Symbols and disassembly: --module-search-path dir[:dir:…]
* Where map == old_path,new_path
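The --source-path-map value is a colon-separated list whose entries are either a directory or an `old_path,new_path` map. The sketch below expands such a list into candidate locations for one recorded source path. The semantics are assumptions, not taken from Caliper's documentation: a map entry is treated as a prefix substitution, and a plain directory is searched for the file's base name. The helper name `source_path_candidates` is also hypothetical.

```shell
#!/bin/sh
# source_path_candidates: hypothetical helper (not part of HP Caliper).
#   $1: spec in the form dir|map[:dir|map:...], where map is old_path,new_path
#   $2: source file path as recorded in the program's debug information
# Prints candidate paths, one per line, in spec order.
# Assumptions: a map entry rewrites a leading old_path to new_path; a plain
# dir entry is searched for the file's base name.
source_path_candidates() {
    spec=$1
    path=$2
    saved_ifs=$IFS
    IFS=:
    set -f                        # no globbing while splitting the spec
    for entry in $spec; do
        case $entry in
            *,*)                  # map entry: old_path,new_path
                old=${entry%%,*}
                new=${entry#*,}
                case $path in
                    "$old"/*) printf '%s\n' "$new${path#"$old"}" ;;
                esac
                ;;
            *)                    # plain directory entry
                printf '%s/%s\n' "$entry" "${path##*/}"
                ;;
        esac
    done
    set +f
    IFS=$saved_ifs
}
```

For example, `source_path_candidates '/build/src,/home/me/src:/tmp/extra' /build/src/foo/bar.c` prints `/home/me/src/foo/bar.c`, then `/tmp/extra/bar.c`.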

41 Hints and tips: using views
Callouts: Close; Restore views; Minimize; Restore default locations; Maximize; Local view menu; Common view menu (right-click on tab); Detached view (not supported by Motif)

42 Summary
Itanium execution performance tool
Measures production applications
Measures entire system
Wide range of performance metrics available
Explore performance data using textual or graphical reports
Help available from
Available on HP-UX and Linux

43 DSPP Tools & Resources for Itanium®2 Architecture Set You Up for Success
Community: Itanium® architecture forums, source code repository, document sharing and mailing lists
Training and Education: online and classroom training
News & Events
Software development environments, compilers, operating systems, installation/configuration tools, performance tools and more
Technical documentation: white papers, tutorials, reference documents and manuals, FAQs, known problems, sample code, etc.
Partner Resources: webconferencing services, podcast production services, trade show discounts
Equipment rentals and purchase discounts

44 Where to go … Software Developer Resource Kit for the Intel® Itanium®2 microarchitecture: Development and Business Resources from HP & Intel for HP Integrity-based solutions: Contact points for additional information: Americas telephone Europe telephone Asia-Pac or go to for local country phone numbers

45 Complete Survey to Win
HP & Intel are giving away an HP laptop to one lucky winner!
Promotion period ends November 19, 2006.
Attend a webcast AND complete the post-event survey. Full promotion details can be found on DSPP at:

46 More Events
Tuesday, October 24: New Dual-Core Processor and Server Hardware
Tuesday, November 28: OpenMP
Tuesday, December 19: HP-MPI
Sign up for the DSPP newsletter to get the latest webcast information sent to you directly.
Webcast replays may also be found at:
Did you know that your company can use this same webconferencing tool, at a discounted price, to promote your HP Integrity solutions to your staff and customers? For members only...

47 Intel® Early Access Program - Technology
The Early Access Program (EAP) gives you access to Intel® technology to support your current development cycle, as well as early access to tools and information on new technologies. Your membership includes:
Early access to pre-release software development platforms
Access to Intel and 3rd party software and testing tools
Training through Intel® Software College and Web events
Technical content and how-to articles
Protected remote access to easily evaluate and develop software safely and securely on platforms over the Internet

48 Intel® Early Access Program: Marketing Opportunities and Support
Extensive marketing and business development opportunities:
Inclusion in online and print versions of the Intel® Developer Solutions Catalog
Intel quotes to support your PR
Case studies
Access to Intel’s event marketing asset kit
Participation in selected industry events and trade shows
Support in your development efforts provided through:
Access to an Intel Account Representative who will act as your primary contact
Intel® Premier Support for confidential technical support
24/7 online support via

49 Related Intel® Resources
Intel® Early Access Program
Intel® Software Network
Intel® Software College
Intel® Software Development Tools
Experience Intel® Itanium® 2 Architecture

50 Q&A Session: To ask a question over the phone, press *1 on your touch-tone telephone.
