Update on HP Caliper, the Performance Tool for Itanium ® HP-UX and Linux Systems September 2006 Speaker: Stephen Williams Caliper Development Team Hewlett-Packard.

1 Update on HP Caliper, the Performance Tool for Itanium ® HP-UX and Linux Systems September 2006 Speaker: Stephen Williams Caliper Development Team Hewlett-Packard

2 Previous webcasts
  – An introduction to HP Caliper, what it is, and how to use it. Webcast: September 9, 2003 Slides:
  – An update on HP Caliper for HP-UX and Linux Itanium. Webcast: September 21, 2004 Slides:
  – Yet more HP Caliper: an update on the Itanium HP-UX and Linux Performance Tool. Webcast: September 20, 2005 Slides:

3 Agenda Quick overview of HP Caliper New features in HP Caliper 3.9, 4.0, and 4.1 Future directions Hints and tips Summary DSPP information Q & A

4 What is HP Caliper?
  Per-process or system-wide performance measurement tool, for any Itanium®/Itanium®2 native applications
  For both HP-UX and Linux Integrity servers
  “Swiss army knife”
    - Many different measurements
    - Common user interface and options
    - Multiple report formats: text, CSV, HTML
    - Graphical user interface (new at 4.0)
  Uses Performance Monitor Unit (PMU) hardware and dynamic instrumentation as needed

5 Example command lines
    caliper [measurement] [options] application [ app-opts ]
    caliper [measurement] [options] PID1 [PID2 …]
    caliper [measurement] [options] -w
  Examples:
    caliper fprof --html dir_name sweep3d
    caliper dcache -t -p all cc himom.c
    caliper cpu -w -o out.txt --dur 10
    caliper scgprof -p myproc
    caliper icache -o out.txt

6 Measurements Overview: cpu, ecount Profiles: alat, branch, dcache, dtlb, fprof, icache, itlb, cycles Traces: pmu_trace Call graph: scgprof, cgprof* Coverage: fcover* Counts: acount*, fcount* * not in Linux version Used for: What? Where? Details? (instrumented)

7 New features since HP Caliper 3.9 Improved command line usability Quick Start reference card Improved reports for multi-process applications New ‘cycles’ measurement (dual-core Itanium 2 only) Richer sets of PMU events (dual-core Itanium 2 only) System-wide measurements Graphical user interface

8 Improved command line usability
  – scgprof is now the default measurement:
      $ caliper myprog        (collects scgprof data on myprog)
  – -a is no longer required for attaching to processes:
      $ caliper 1234          (collects scgprof data on process 1234)
  – Re-reporting of the last recorded data is simple:
      $ caliper report [options]
  – Reporting from an HP Caliper database is simplified:
      $ caliper mydb.db
  – New default: report down to source level, but not instruction level (use -r all to get disassembly)
  – New default: --process all ( -p all )

9 Improved command line usability (short options)
  More short options added. Here is the complete list:

    Short Form                        Long Form
    -d                                --database
    -e (for elapsed time)             --duration
    -f                                --options-file
    -H (long form help)               --help
    -m                                --metrics
    -o                                --output-file
    -p                                --process
    -r                                --report-details
    -s                                --sampling-spec
    -t                                --threads all
    -v                                --version
    -w                                --scope system,attr_mod
    -h or -? (short form help)        no equivalent

10 Improved command line usability (short measurement names)
  Measurement names have been shortened:

    New Name    Old Name
    alat        alat_miss
    acount      arc_count
    branch      branch_prediction
    cpu         cpu_metrics
    dcache      dcache_miss
    dtlb        dtlb_miss
    fcount      func_count
    fcover      func_cover
    icache      icache_miss
    itlb        itlb_miss
    ecount      total_cpu
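Scripts written for earlier releases may still use the old measurement names. The following is a minimal Python sketch of how such command lines could be translated using the table above. The helper name modernize and the assumption that the measurement is always the first argument after caliper are illustrative only and are not part of HP Caliper.

```python
# Old-to-new measurement names, copied from the table on this slide.
OLD_TO_NEW = {
    "alat_miss": "alat",
    "arc_count": "acount",
    "branch_prediction": "branch",
    "cpu_metrics": "cpu",
    "dcache_miss": "dcache",
    "dtlb_miss": "dtlb",
    "func_count": "fcount",
    "func_cover": "fcover",
    "icache_miss": "icache",
    "itlb_miss": "itlb",
    "total_cpu": "ecount",
}


def modernize(argv):
    """Return a copy of argv with an old measurement name replaced.

    Assumption (hypothetical helper): the measurement, if present,
    is the first argument after the 'caliper' command itself.
    """
    out = list(argv)
    if len(out) > 1 and out[1] in OLD_TO_NEW:
        out[1] = OLD_TO_NEW[out[1]]
    return out


if __name__ == "__main__":
    print(" ".join(modernize(["caliper", "dcache_miss", "-o", "out.txt", "myprog"])))
```

Names that are already short (for example fprof) pass through unchanged, so the helper can be applied to any existing command line.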

11 Improved command line usability (simplified merge and diff syntax) --join deprecated. Instead, use: $ caliper merge -o out.txt db1 [ db2...] $ caliper diff -o out.txt db1 db2 Note that you can merge per-process data in a single database: $ caliper merge -o out.txt mydb

12 Quick Start reference card

13 Quick Start reference card (back side)

14 Improved reports for multi-process applications Caliper can now report: –Across-process CPU events –Histograms of processes and associated metrics: $ caliper report -o out.txt mydb –Histograms of executables and associated metrics: $ caliper merge -o out.txt mydb Use --process-cutoff to change the number of processes or executables reported in the process or executable histogram.

15 Improved reports for multi-process applications (cont.)
  Example of a merged process (executable) summary:

    Process Summary
    Columns: % Total IP Samples | Cumulat % of Total | IP Samples | Process
    Processes listed:
      be (1 instances)
      ecom (1 instances)
      u2comp (1 instances)
      ld (1 instances)
      sh (4 instances)
    [Minimum process entries: 5, percent cutoff: 2.00, cumulative percent cutoff: ]

16 New measurement: cycles On dual-core Itanium 2 systems, HP Caliper can now report average cycles per bundle: $ caliper cycles -o out.txt -r all myprog The resulting report resembles an fprof report (showing IP sample hits), but provides the following additional information at disassembly level: –Average cycles used to retire bundles. (With no stalls, a bundle should be retired in one cycle.) –Instructions that were split-issued (i.e., instructions not issued at the same time as the instruction that precedes them).

17 Richer PMU event sets
  On dual-core Itanium 2 systems, HP Caliper now reports many more PMU events (and derivations) in one run. An example from an IP Sample (fprof) report:

    Metrics Summed for Entire Run
    Event Name               PLM (U..K)  TH  AC  AT  Count
    BE_L1D_FPU_BUBBLE.ALL    x___        0   T   F
    BE_RSE_BUBBLE.ALL        x___        0   T   F   3250
    BE_FLUSH_BUBBLE.ALL      x___        0   T   F
    BACK_END_BUBBLE.FE       x___        0   F   F
    CPU_OP_CYCLES.ALL        x___        0   T   F
    BE_EXE_BUBBLE.ALL        x___        0   F   F
    BE_L1D_FPU_BUBBLE.L1D    x___        0   T   F
    BE_EXE_BUBBLE.GRALL      x___        0   F   F
    BE_EXE_BUBBLE.FRALL      x___        0   F   F   8014
    BE_EXE_BUBBLE.GRGR       x___        0   F   F   67
    CPU_CPL_CHANGES.ALL      x___        0   F   F

18 Richer PMU event sets (cont.)
  Summary metrics:
    % Unstalled execution (higher is better): = % Unstalled execution
    % of Cycles lost due to Front end stalls (lower is better): 6.43 = % stalls due to ICACHE, ITLB and branch execution
    % of Cycles lost due to Pipeline flush stalls (lower is better): 9.23 = % stalls due to branch misprediction or interruption flush
    % of Cycles lost due to data access stalls (lower is better): = % stalls due to DCACHE and DTLB (includes FR/FR stalls)
    % of Cycles lost due to RSE stalls (lower is better): 1.45 = % stalls due to RSE spilling/filling registers to/from memory
    % of Cycles lost due to Scoreboard stalls (lower is better): 2.22 = % stalls due to FPU and register dependency (excludes FR/FR stalls)
    Number of privilege level changes to/from all privileges: = CPU_CPL_CHANGES.ALL
  Derivations:
    % of Cycles lost due to Front end stalls: 6.43 = 100 * (BACK_END_BUBBLE.FE / CPU_OP_CYCLES.ALL)
    % of Cycles lost due to Pipeline flush stalls: 9.23 = 100 * (BE_FLUSH_BUBBLE.ALL / CPU_OP_CYCLES.ALL)
    % of Cycles lost due to data access stalls (includes FR/FR stalls): = % register load stalls (includes FR/FR) + % stalls due to L1D
    % of Cycles lost due to RSE stalls: 1.45 = 100 * (BE_RSE_BUBBLE.ALL / CPU_OP_CYCLES.ALL)
    % of Cycles lost due to Scoreboard stalls (excludes FR/FR stalls): 2.22 = % stalls due to FPU + % register dependency stalls
    % of Cycles lost due to register load stalls (includes FR/FR stalls): = % GR/load dependency stalls + % FR/load or FR/FR dependency stalls
    % of Cycles lost due to FR/load or FR/FR dependency stalls: 0.20 = 100 * BE_EXE_BUBBLE.FRALL / CPU_OP_CYCLES.ALL
    % of Cycles lost due to GR/load dependency stalls: = 100 * (BE_EXE_BUBBLE.GRALL - BE_EXE_BUBBLE.GRGR) / CPU_OP_CYCLES.ALL
    % of Cycles lost due to stalls in L1D cache and L1/L2 DTLB: 6.42 = 100 * (BE_L1D_FPU_BUBBLE.L1D / CPU_OP_CYCLES.ALL)
    % of Cycles lost due to register dependency stalls (excludes FR/FR stalls): 2.22 = (100 * BE_EXE_BUBBLE.ALL / CPU_OP_CYCLES.ALL) - % register load stalls
    % of Cycles lost due to GR/GR dependency stalls: 2.14 = 100 * BE_EXE_BUBBLE.GRGR / CPU_OP_CYCLES.ALL
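The derivations on this slide are simple ratios against CPU_OP_CYCLES.ALL, so they can be reproduced outside the tool from raw event counts. The Python sketch below implements only the formulas that the slide spells out. The event counts in the example are hypothetical values chosen for illustration, not data from the report, and the function and key names are assumptions. The Scoreboard metric is omitted because the slide does not give a formula for its FPU term.

```python
def pct(count, cycles):
    """Percentage of total cycles represented by an event count."""
    return 100.0 * count / cycles


def derived_metrics(c):
    """Compute stall percentages from a dict of PMU event counts.

    Each formula follows the derivation printed on the slide.
    """
    cycles = c["CPU_OP_CYCLES.ALL"]
    fr_load = pct(c["BE_EXE_BUBBLE.FRALL"], cycles)
    gr_load = pct(c["BE_EXE_BUBBLE.GRALL"] - c["BE_EXE_BUBBLE.GRGR"], cycles)
    register_load = gr_load + fr_load
    l1d = pct(c["BE_L1D_FPU_BUBBLE.L1D"], cycles)
    return {
        "front_end": pct(c["BACK_END_BUBBLE.FE"], cycles),
        "pipeline_flush": pct(c["BE_FLUSH_BUBBLE.ALL"], cycles),
        "rse": pct(c["BE_RSE_BUBBLE.ALL"], cycles),
        "fr_load": fr_load,
        "gr_load": gr_load,
        "gr_gr": pct(c["BE_EXE_BUBBLE.GRGR"], cycles),
        "l1d": l1d,
        "register_load": register_load,
        "data_access": register_load + l1d,
        "register_dependency": pct(c["BE_EXE_BUBBLE.ALL"], cycles) - register_load,
    }


# Hypothetical counts, for illustration only.
EXAMPLE_COUNTS = {
    "CPU_OP_CYCLES.ALL": 1_000_000,
    "BACK_END_BUBBLE.FE": 64_300,
    "BE_FLUSH_BUBBLE.ALL": 92_300,
    "BE_RSE_BUBBLE.ALL": 14_500,
    "BE_EXE_BUBBLE.ALL": 40_000,
    "BE_EXE_BUBBLE.GRALL": 30_000,
    "BE_EXE_BUBBLE.GRGR": 21_400,
    "BE_EXE_BUBBLE.FRALL": 2_000,
    "BE_L1D_FPU_BUBBLE.L1D": 64_200,
}

if __name__ == "__main__":
    for name, value in derived_metrics(EXAMPLE_COUNTS).items():
        print(f"{name:20s} {value:6.2f}")
```

Keeping the intermediate terms (register_load, l1d) as named values mirrors the way the report builds the data access and register dependency figures from smaller pieces.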

19 System-wide measurements
  Most measurements can now be made system-wide, across all processes and CPUs, in both user and kernel space.
  Three levels of sample attribution:
    --scope system [,attr-mod | attr-proc | attr-none ]
    -w is equivalent to: --scope system,attr-mod
  PLM: --event-defaults user|kernel|all
  Sample command (collect IP samples in both kernel and user space for 20 seconds):
    $ caliper fprof -o o.txt --ev all -w -e 20

20 System-wide measurements (cont.) Limitations on HP-UX: – You must be logged in as the root user – Caliper may not be able to locate some executables and shared libraries, resulting in many “unattributed” samples. Workaround: use --module-search-path Limitations on Linux: You cannot exclude idle time and the caliper process (though we hope to provide this feature in the future). Limitations on both HP-UX and Linux: While caliper runs in system-wide mode, no other caliper process can be run on the same system.

21 New graphical user interface An Eclipse RCP application Makes it easy to: –Perform measurement collections –Browse Caliper databases –See measurement data, with easy drill down Can be run on remote Integrity server, with display shown on your desktop X server (not recommended on wide-area network) via: $ caliper -g Can be run locally on a Windows or Linux x86-based system (local GUI client communicates with Caliper server via ssh or rexec )

22 New graphical user interface (Projects view and Collect view) Required fields and tabs in red Only applicable collection tabs enabled Saved collection setup Previously collected data Start process System wide Attach process Start data collection

23 New graphical user interface (Measurement tab of Collect view) Collection in progress Stop data collection Data cache misses selected

24 New graphical user interface (viewing data) Saved collection specification Process tree tab opened Analyze view Available data sets Application output

25 New graphical user interface (CPU event counts) Show CPU events tab Show data for entire application

26 New graphical user interface (metrics derived from CPU events) CPU events tab scrolled to show derived metrics

27 New graphical user interface (histogram viewer) Hottest process (double-click to drill down) Overview of entire histogram Maximize or minimize by double-clicking Analyze view tab Percent of application’s total misses in process be

28 New graphical user interface (drill down to functions) Previous levels visited Show ‘local’ percents (percent of total for be ) Use stacking bars DagNode::dagConstMarkPredArc(DagNode *, DagNode *, Dag*) Popups for long function names Area viewed in table highlighted in Overview

29 New graphical user interface (drill down to disassembly) Sorted by address Show: Click to show hotspots in table Source Source/disasm Disassembly

30 New graphical user interface (sorting) Sort bundles by misses

31 New graphical user interface (call graph viewer) Current function Callees visited Multiple Analyze views allowed Callers Callees

32 Future directions
  Expected new features at HP Caliper 4.2 (January 07):
    – Load module-centric reports (e.g., across process profile of
    – Call stack profiling (with wall-clock sampling)
    – Bucketing of data cache miss latencies (to help ascertain cache levels accessed)
    – Trap profiling
    – Merge/diff capability in graphical user interface
    – Caliper Advisor integrated with graphical user interface
  Features beyond HP Caliper 4.2:
    – Caliper Advisor cheatsheets in graphical user interface
    – Data-centric cache miss reports
    – Integration with Ktrace/Kprofile
    – More data visualization aids in graphical user interface
    – Per-CPU/per-thread CPU metrics

33 Load modules as top level (v4.2) View load modules as top level

34 Call-stack profile (v4.2) Graph hot call paths by running time, blocked time, or both

35 CPU metrics overview (v4.2) Overview of metrics collected by cpu measurement (default metrics)

36 Call-stack samples display (potential future display) Overview of running and stopped threads Call stacks at sample 754 “Playback” controls Sample cursor (drag to any point)

37 Data-centric cache miss profile display (potential future display) Double-click row (below) to view data addresses (above) Double-click row (below) to view instruction addresses (above) Double-click row to see function’s disassembly

38 3D histograms (potential future display) Figure from CxPerf User’s Guide

39 Hints and tips: caliper command
  Getting CPU event names from caliper:
    – Dump all event names and descriptions:
        $ caliper info all
    – List all event names (no other fields):
        $ caliper info all -d name
    – List names of all events containing string “L3”:
        $ caliper info L3 -d name
    – Or, use an ambiguous event name:
        $ caliper ecount --metric L3_READ myprog
        HP Caliper: usage error: Ambiguous event name ("L3_READ") specified for "--metrics". Matches
        L3_READS.ALL.ALL, L3_READS.ALL.HIT, L3_READS.ALL.MISS,
        L3_READS.DATA_READ.ALL, L3_READS.DATA_READ.HIT, L3_READS.DATA_READ.MISS,
        L3_READS.DINST_FETCH.ALL, L3_READS.DINST_FETCH.HIT, L3_READS.DINST_FETCH.MISS,
        L3_READS.INST_FETCH.ALL, L3_READS.INST_FETCH.HIT, L3_READS.INST_FETCH.MISS.
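The ambiguous-name behavior shown above can be pictured as a lookup that succeeds only when a query identifies exactly one event. The Python sketch below illustrates that idea using the twelve event names from the error message. The slides do not describe how Caliper matches names internally, so the substring rule, the function name resolve_event, and the wording of the errors are all assumptions.

```python
# Event names taken from the error message on this slide.
L3_READ_EVENTS = [
    "L3_READS.ALL.ALL", "L3_READS.ALL.HIT", "L3_READS.ALL.MISS",
    "L3_READS.DATA_READ.ALL", "L3_READS.DATA_READ.HIT", "L3_READS.DATA_READ.MISS",
    "L3_READS.DINST_FETCH.ALL", "L3_READS.DINST_FETCH.HIT", "L3_READS.DINST_FETCH.MISS",
    "L3_READS.INST_FETCH.ALL", "L3_READS.INST_FETCH.HIT", "L3_READS.INST_FETCH.MISS",
]


def find_matches(query, known_events):
    """Return all known events that contain query (assumed substring rule)."""
    return sorted(e for e in known_events if query in e)


def resolve_event(query, known_events):
    """Return the one event identified by query, or raise ValueError.

    An exact name always wins; otherwise the query must match
    exactly one known event.
    """
    if query in known_events:
        return query
    matches = find_matches(query, known_events)
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f'Unknown event name ("{query}")')
    raise ValueError(
        f'Ambiguous event name ("{query}"). Matches ' + ", ".join(matches) + "."
    )


if __name__ == "__main__":
    try:
        resolve_event("L3_READ", L3_READ_EVENTS)
    except ValueError as err:
        print(err)
```

As on the slide, a deliberately vague query doubles as a quick way to list the candidate event names.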

40 Hints and tips: caliper command (cont.)
  Getting report help:
    – Dump help file for cycles measurement:
        $ caliper info -r cycles
    – Append help to a report:
        $ caliper cycles --info -o out.txt myprog
  Providing command options using a file:
        $ caliper fprof -f myOptionsFile
  Helping Caliper find:
    – Source code: --source-path-map dir|map[:dir|map:…] *
    – Symbols and disassembly: --module-search-path dir[:dir:…]
  * Where map == old_path,new_path
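The --source-path-map value is a colon-separated list in which each entry is either a directory or an old_path,new_path pair. The Python sketch below parses that syntax and shows one plausible way the entries could be applied to a recorded source path. Only the syntax comes from the slide. The lookup behavior (a directory entry is tried with the file's base name, a map entry rewrites a matching path prefix) and both function names are assumptions, not documented Caliper behavior.

```python
import posixpath


def parse_source_path_map(spec):
    """Split 'dir|map[:dir|map:...]' into ('dir', d) and ('map', old, new) tuples."""
    entries = []
    for item in spec.split(":"):
        if not item:
            continue  # tolerate empty fields such as a trailing colon
        if "," in item:
            old, new = item.split(",", 1)
            entries.append(("map", old, new))
        else:
            entries.append(("dir", item))
    return entries


def candidate_paths(recorded_path, entries):
    """List the locations where a source file might be looked up.

    Assumed behavior, for illustration only.
    """
    candidates = []
    for entry in entries:
        if entry[0] == "dir":
            # Try the file's base name inside the given directory.
            candidates.append(posixpath.join(entry[1], posixpath.basename(recorded_path)))
        else:
            old = entry[1].rstrip("/")
            new = entry[2].rstrip("/")
            # Rewrite the prefix only on a whole-component match.
            if recorded_path == old or recorded_path.startswith(old + "/"):
                candidates.append(new + recorded_path[len(old):])
    return candidates


if __name__ == "__main__":
    parsed = parse_source_path_map("/src:/build/old,/home/me/new")
    print(parsed)
    print(candidate_paths("/build/old/lib/a.c", parsed))
```

A map entry is useful when the sources were moved after the build, for example from a build server path to a local checkout.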

41 Hints and tips: using views Detached view Close Minimize Maximize Local view menu Restore views Restore default locations Common view menu (right-click on tab) (not supported by Motif)

42 Summary Itanium execution performance tool Measures production applications Measures entire system Wide range of performance metrics available Explore performance data using textual or graphical reports Help available from Available on HP-UX and Linux

43 DSPP Tools & Resources for Itanium ® 2 Architecture Set You Up for Success
  Software
    –development environments, compilers, operating systems, installation/configuration tools, performance tools and more
  Technical documentation
    –white papers, tutorials, reference documents and manuals, FAQs, known problems, sample code, etc.
  Partner Resources
    –webconferencing services
    –podcast production services
    –trade show discounts
  Equipment
    –rentals and purchase discounts
  Community
    –Itanium ® architecture forums, source code repository, document sharing and mailing lists
  Training and Education
    –online and classroom training
  News & Events

44 Where to go … Software Developer Resource Kit for the Intel® Itanium®2 microarchitecture: Development and Business Resources from HP & Intel for HP Integrity- based solutions: Contact points for additional information: Americas telephone Europe telephone Asia-Pac or go to for local country phone

45 Complete Survey to Win HP & Intel are giving away an HP laptop to one (1) lucky winner! Promotion period ends November 19, 2006. Attend a webcast AND complete the post-event survey. Full promotion details can be found on DSPP at: lPage_IDX/1,1252,9284,00.html

46 More Events Tuesday, October 24 – New Dual-Core Processor and Server Hardware Tuesday, November 28 – OpenMP Tuesday, December 19 – HP-MPI Sign up for the DSPP newsletter to get the latest webcast information sent to you directly. Webcast replays may also be found at: For members only... Did you know that your company can use this same webconferencing tool, at a discounted price, to promote your HP Integrity solutions to your staff and customers?

47 Intel® Early Access Program - Technology The Early Access Program (EAP) gives you access to Intel® technology to support your current development cycle as well as early access to tools and information on new technologies. Your membership includes: –Early access to pre-release software development platforms –Access to Intel and 3rd party software and testing tools –Training through Intel® Software College and Web events –Technical content and how–to articles –Protected remote access to easily evaluate and develop software safely and securely on platforms over the Internet

48 Intel® Early Access Program -Marketing Opportunities and Support Extensive marketing and business development opportunities: –Inclusion in online and print versions of the Intel® Developer Solutions Catalog –Intel quotes to support your PR –Case studies –Access to Intel’s event marketing asset kit –Participation in selected industry events and trade shows Support in your development efforts provided through: –Access to an Intel Account Representative who will act as your primary contact –Intel ® Premier Support for confidential technical support –24/7 online support via

49 Related Intel ® Resources Intel® Early Access Program – Intel® Software Network – Intel® Software College – Intel® Software Development Tools – Experience Intel® Itanium® 2 Architecture –

50 Q&A Session: To ask a question over the phone, press *1 on your touch-tone telephone.

51 September 2006 Speaker: Stephen Williams Caliper Development Team Hewlett-Packard Q&A Session: To ask a question over the phone, press *1 on your touch-tone telephone.
