1. Update on HP Caliper, the Performance Tool for Itanium® HP-UX and Linux Systems
September 2006
Speaker: Stephen Williams, Caliper Development Team, Hewlett-Packard
2. Previous webcasts
- An introduction to HP Caliper, what it is, and how to use it. Webcast: September 9, 2003. Slides:
- An update on HP Caliper for HP-UX and Linux Itanium. Webcast: September 21, 2004
- Yet more HP Caliper: an update on the Itanium HP-UX and Linux Performance Tool. Webcast: September 20, 2005
3. Agenda
- Quick overview of HP Caliper
- New features in HP Caliper 3.9, 4.0, and 4.1
- Future directions
- Hints and tips
- Summary
- DSPP information
- Q & A
4. What is HP Caliper?
- Per-process or system-wide performance measurement tool for any native Itanium®/Itanium® 2 application
- For both HP-UX and Linux Integrity servers
- “Swiss army knife”:
  - Many different measurements
  - Common user interface and options
  - Multiple report formats: text, CSV, HTML
  - Graphical user interface (new at 4.0)
- Uses Performance Monitor Unit (PMU) hardware and dynamic instrumentation as needed
6. Measurements
Used for: What? Where? Details? (instrumented)
- Overview: cpu, ecount
- Profiles: alat, branch, dcache, dtlb, fprof, icache, itlb, cycles
- Traces: pmu_trace
- Call graph: scgprof, cgprof*
- Coverage: fcover*
- Counts: acount*, fcount*
* Not in Linux version
7. New features since HP Caliper 3.9
- Improved command line usability
- Quick Start reference card
- Improved reports for multi-process applications
- New ‘cycles’ measurement (dual-core Itanium 2 only)
- Richer sets of PMU events (dual-core Itanium 2 only)
- System-wide measurements
- Graphical user interface
8. Improved command line usability
- scgprof is now the default measurement:
    $ caliper myprog          (collects scgprof data on myprog)
- -a is no longer required for attaching to processes:
    $ caliper 1234            (collects scgprof data on process 1234)
- Re-reporting of the last recorded data is simple:
    $ caliper report [options]
- Reporting from an HP Caliper database is simplified:
    $ caliper mydb.db
- New default: reports go down to source level, but not instruction level (use -r all to get disassembly)
- New default: --process all (-p all)
9. Improved command line usability (short options)
More short options have been added. Here is the complete list:
Short form                     Long form
-d                             --database
-e (for elapsed time)          --duration
-f                             --options-file
-H (long form help)            --help
-m                             --metrics
-o                             --output-file
-p                             --process
-r                             --report-details
-s                             --sampling-spec
-t                             --threads all
-v                             --version
-w                             --scope system,attr_mod
-h or -? (short form help)     no equivalent
10. Improved command line usability (short measurement names)
Measurement names have been shortened:
New name     Old name
alat         alat_miss
acount       arc_count
branch       branch_prediction
cpu          cpu_metrics
dcache       dcache_miss
dtlb         dtlb_miss
fcount       func_count
fcover       func_cover
icache       icache_miss
itlb         itlb_miss
ecount       total_cpu
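The renaming table can be turned into a small migration helper for scripts that still use the old measurement names. The following is a minimal sketch in Python, not part of HP Caliper: the function names `modernize_measurement` and `modernize_command` are hypothetical, and only the name pairs themselves come from the slide.

```python
# Old-to-new measurement names, copied from the renaming table on this slide.
OLD_TO_NEW = {
    "alat_miss": "alat",
    "arc_count": "acount",
    "branch_prediction": "branch",
    "cpu_metrics": "cpu",
    "dcache_miss": "dcache",
    "dtlb_miss": "dtlb",
    "func_count": "fcount",
    "func_cover": "fcover",
    "icache_miss": "icache",
    "itlb_miss": "itlb",
    "total_cpu": "ecount",
}


def modernize_measurement(name: str) -> str:
    """Return the shortened name for an old measurement name.

    Names that are not in the table (for example fprof) come back unchanged.
    """
    return OLD_TO_NEW.get(name, name)


def modernize_command(argv: list[str]) -> list[str]:
    """Rewrite old measurement names in a caliper argument list.

    Hypothetical helper: it rewrites every matching argument, so a program
    or file that happens to be named like an old measurement would be
    rewritten too.
    """
    return [modernize_measurement(arg) for arg in argv]


if __name__ == "__main__":
    print(modernize_command(["caliper", "dcache_miss", "-o", "out.txt", "myprog"]))
```

Because the helper is a plain dictionary lookup, it is safe to run repeatedly: names that are already short pass through untouched.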
11. Improved command line usability (simplified merge and diff syntax)
--join is deprecated. Instead, use:
    $ caliper merge -o out.txt db1 [db ]
    $ caliper diff -o out.txt db1 db2
Note that you can merge per-process data in a single database:
    $ caliper merge -o out.txt mydb
14. Improved reports for multi-process applications
Caliper can now report:
- Across-process CPU events
- Histograms of processes and associated metrics:
    $ caliper report -o out.txt mydb
- Histograms of executables and associated metrics:
    $ caliper merge -o out.txt mydb
Use --process-cutoff to change the number of processes or executables reported in the process or executable histogram.
15. Improved reports for multi-process applications (cont.)
Example of a merged process (executable) summary:
Process Summary
Columns: % Total IP Samples | Cumulat % of Total | IP Samples | Process
(numeric values not preserved in this transcript)
Processes listed:
- be (1 instances)
- ecom (1 instances)
- u2comp (1 instances)
- ld (1 instances)
- sh (4 instances)
[Minimum process entries: 5, percent cutoff: 2.00, cumulative percent cutoff: ]
16. New measurement: cycles
On dual-core Itanium 2 systems, HP Caliper can now report average cycles per bundle:
    $ caliper cycles -o out.txt -r all myprog
The resulting report resembles an fprof report (showing IP sample hits), but provides the following additional information at the disassembly level:
- Average cycles used to retire bundles. (With no stalls, a bundle should be retired in one cycle.)
- Instructions that were split-issued (i.e., instructions not issued at the same time as the instruction that precedes them).
17. Richer PMU event sets
On dual-core Itanium 2 systems, HP Caliper now reports many more PMU events (and derivations) in one run. An example from an IP Sample (fprof) report:
Metrics Summed for Entire Run
                           PLM
Event Name                 U..K   TH AC AT   Count
BE_L1D_FPU_BUBBLE.ALL      x___   T F
BE_RSE_BUBBLE.ALL          x___   T F
BE_FLUSH_BUBBLE.ALL        x___   T F
BACK_END_BUBBLE.FE         x___   F F
CPU_OP_CYCLES.ALL          x___   T F
BE_EXE_BUBBLE.ALL          x___   F F
BE_L1D_FPU_BUBBLE.L1D      x___   T F
BE_EXE_BUBBLE.GRALL        x___   F F
BE_EXE_BUBBLE.FRALL        x___   F F
BE_EXE_BUBBLE.GRGR         x___   F F
CPU_CPL_CHANGES.ALL        x___   F F
(numeric values not preserved in this transcript)
18. Richer PMU event sets (cont.)
% Unstalled execution (higher is better):
    47.44 = % Unstalled execution
% of Cycles lost due to Front end stalls (lower is better):
    6.43 = % stalls due to ICACHE, ITLB and branch execution
% of Cycles lost due to Pipeline flush stalls (lower is better):
    9.23 = % stalls due to branch misprediction or interruption flush
% of Cycles lost due to data access stalls (lower is better):
    33.23 = % stalls due to DCACHE and DTLB (includes FR/FR stalls)
% of Cycles lost due to RSE stalls (lower is better):
    1.45 = % stalls due to RSE spilling/filling registers to/from memory
% of Cycles lost due to Scoreboard stalls (lower is better):
    2.22 = % stalls due to FPU and register dependency (excludes FR/FR stalls)
Number of privilege level changes to/from all privileges:
    73385 = CPU_CPL_CHANGES.ALL

% of Cycles lost due to Front end stalls:
    6.43 = 100 * (BACK_END_BUBBLE.FE / CPU_OP_CYCLES.ALL)
% of Cycles lost due to Pipeline flush stalls:
    9.23 = 100 * (BE_FLUSH_BUBBLE.ALL / CPU_OP_CYCLES.ALL)
% of Cycles lost due to data access stalls (includes FR/FR stalls):
    33.23 = % register load stalls (includes FR/FR) + % stalls due to L1D
% of Cycles lost due to RSE stalls:
    1.45 = 100 * (BE_RSE_BUBBLE.ALL / CPU_OP_CYCLES.ALL)
% of Cycles lost due to Scoreboard stalls (excludes FR/FR stalls):
    2.22 = % stalls due to FPU + % register dependency stalls
% of Cycles lost due to register load stalls (includes FR/FR stalls):
    26.81 = % GR/load dependency stalls + % FR/load or FR/FR dependency stalls
% of Cycles lost due to FR/load or FR/FR dependency stalls:
    0.20 = 100 * BE_EXE_BUBBLE.FRALL / CPU_OP_CYCLES.ALL
% of Cycles lost due to GR/load dependency stalls:
    26.61 = 100 * (BE_EXE_BUBBLE.GRALL - BE_EXE_BUBBLE.GRGR) / CPU_OP_CYCLES.ALL
% of Cycles lost due to stalls in L1D cache and L1/L2 DTLB:
    6.42 = 100 * (BE_L1D_FPU_BUBBLE.L1D / CPU_OP_CYCLES.ALL)
% of Cycles lost due to register dependency stalls (excludes FR/FR stalls):
    2.22 = (100 * BE_EXE_BUBBLE.ALL / CPU_OP_CYCLES.ALL) - % register load stalls
% of Cycles lost due to GR/GR dependency stalls:
    2.14 = 100 * BE_EXE_BUBBLE.GRGR / CPU_OP_CYCLES.ALL
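The derivations on this slide can be reproduced by hand from raw event counts. The sketch below, in Python, applies only the formulas that the slide spells out; the FPU-stall and unstalled-execution percentages are left out because their formulas are not shown. The raw counts in `EXAMPLE_COUNTS` are invented so that the results match the slide's percentages; they are not the counts from the actual report, and `derived_metrics` is a hypothetical helper name, not a Caliper API.

```python
def pct(numerator: float, cycles: float) -> float:
    """Express an event count as a percentage of CPU_OP_CYCLES.ALL."""
    return 100.0 * numerator / cycles


def derived_metrics(c: dict[str, int]) -> dict[str, float]:
    """Apply the stall formulas listed on the slide to raw PMU counts."""
    cycles = c["CPU_OP_CYCLES.ALL"]
    fr_load = pct(c["BE_EXE_BUBBLE.FRALL"], cycles)
    gr_load = pct(c["BE_EXE_BUBBLE.GRALL"] - c["BE_EXE_BUBBLE.GRGR"], cycles)
    l1d = pct(c["BE_L1D_FPU_BUBBLE.L1D"], cycles)
    register_load = gr_load + fr_load
    return {
        "front_end": pct(c["BACK_END_BUBBLE.FE"], cycles),
        "pipeline_flush": pct(c["BE_FLUSH_BUBBLE.ALL"], cycles),
        "rse": pct(c["BE_RSE_BUBBLE.ALL"], cycles),
        "fr_load": fr_load,
        "gr_load": gr_load,
        "gr_gr": pct(c["BE_EXE_BUBBLE.GRGR"], cycles),
        "l1d": l1d,
        "register_load": register_load,
        "register_dependency": pct(c["BE_EXE_BUBBLE.ALL"], cycles) - register_load,
        "data_access": register_load + l1d,
    }


# Invented counts (assumption), chosen to reproduce the slide's percentages.
EXAMPLE_COUNTS = {
    "CPU_OP_CYCLES.ALL": 1_000_000,
    "BACK_END_BUBBLE.FE": 64_300,
    "BE_FLUSH_BUBBLE.ALL": 92_300,
    "BE_RSE_BUBBLE.ALL": 14_500,
    "BE_EXE_BUBBLE.ALL": 290_300,
    "BE_EXE_BUBBLE.GRALL": 287_500,
    "BE_EXE_BUBBLE.GRGR": 21_400,
    "BE_EXE_BUBBLE.FRALL": 2_000,
    "BE_L1D_FPU_BUBBLE.L1D": 64_200,
}

if __name__ == "__main__":
    for name, value in derived_metrics(EXAMPLE_COUNTS).items():
        print(f"{name:22s} {value:6.2f}")
```

Note how the composite metrics are built from the leaf ones: register load stalls are the sum of the GR/load and FR stalls, and data access stalls add the L1D stalls on top of that.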
19. System-wide measurements
Most measurements can now be made system-wide: across all processes and CPUs, in both user and kernel space.
- Three levels of sample attribution:
    --scope system[,attr-mod|attr-proc|attr-none]
    -w is equivalent to: --scope system,attr-mod
- PLM: --event-defaults user|kernel|all
- Sample command (collect IP samples in both kernel and user space for 20 seconds):
    $ caliper fprof -o o.txt --ev all -w -e 20
20. System-wide measurements (cont.)
Limitations on HP-UX:
- You must be logged in as the root user
- Caliper may not be able to locate some executables and shared libraries, resulting in many “unattributed” samples. Workaround: use --module-search-path
Limitations on Linux:
- You cannot exclude idle time and the caliper process (though we hope to provide this feature in the future).
Limitations on both HP-UX and Linux:
- While caliper runs in system-wide mode, no other caliper process can be run on the same system.
21. New graphical user interface
- An Eclipse RCP application
- Makes it easy to:
  - Perform measurement collections
  - Browse Caliper databases
  - See measurement data, with easy drill-down
- Can be run on a remote Integrity server, with the display shown on your desktop X server (not recommended over a wide-area network), via:
    $ caliper -g
- Can be run locally on a Windows or Linux x86-based system (the local GUI client communicates with the Caliper server via ssh or rexec)
22. New graphical user interface (Projects view and Collect view)
Screenshot callouts:
- Saved collection setup
- Start process / System wide / Attach process
- Previously collected data
- Start data collection
- Required fields and tabs in red
- Only applicable collection tabs enabled
23. New graphical user interface (Measurement tab of Collect view)
Screenshot callouts:
- Data cache misses selected
- Stop data collection
- Collection in progress
24. New graphical user interface (viewing data)
Screenshot callouts:
- Analyze view
- Saved collection specification
- Process tree tab opened
- Available data sets
- Application output
25. New graphical user interface (CPU event counts)
Screenshot callouts:
- Show data for entire application
- Show CPU events tab
26. New graphical user interface (metrics derived from CPU events)
Screenshot callout:
- CPU events tab scrolled to show derived metrics
27. New graphical user interface (histogram viewer)
Screenshot callouts:
- Maximize or minimize by double-clicking the Analyze view tab
- Hottest process (double-click to drill down)
- Overview of entire histogram
- Percent of application’s total misses in process be
28. New graphical user interface (drill down to functions)
Screenshot callouts:
- Use stacking bars
- Popups for long function names (e.g., DagNode::dagConstMarkPredArc(DagNode *, DagNode *, Dag*))
- Show ‘local’ percents (percent of total for be)
- Area viewed in table highlighted in Overview
- Previous levels visited
29. New graphical user interface (drill down to disassembly)
Screenshot callouts:
- Show: Source, Source/disasm, Sorted by address, Disassembly
- Click to show hotspots in table
30. New graphical user interface (sorting)
Screenshot callout:
- Sort bundles by misses
32. Future directions
Expected new features at HP Caliper 4.2 (January 2007):
- Load module-centric reports (e.g., across-process profile of libc.so)
- Call stack profiling (with wall-clock sampling)
- Bucketing of data cache miss latencies (to help ascertain cache levels accessed)
- Trap profiling
- Merge/diff capability in graphical user interface
- Caliper Advisor integrated with graphical user interface
Features beyond HP Caliper 4.2:
- Caliper Advisor cheatsheets in graphical user interface
- Data-centric cache miss reports
- Integration with Ktrace/Kprofile
- More data visualization aids in graphical user interface
- Per-CPU/per-thread CPU metrics
33. Load modules as top level (v4.2)
- View load modules as top level
34. Call-stack profile (v4.2)
- Graph hot call paths by running time, blocked time, or both
35. CPU metrics overview (v4.2)
- Overview of metrics collected by the cpu measurement (default metrics)
36. Call-stack samples display (potential future display)
Screenshot callouts:
- Overview of running and stopped threads
- Sample cursor (drag to any point)
- Call stacks at sample 754
- “Playback” controls
37. Data-centric cache miss profile display (potential future display)
Screenshot callouts:
- Double-click row to see function’s disassembly
- Double-click row (below) to view instruction addresses (above)
- Double-click row (below) to view data addresses (above)
38. 3D histograms (potential future display)
Figure from CxPerf User’s Guide
39. Hints and tips: caliper command
Getting CPU event names from caliper:
- Dump all event names and descriptions:
    $ caliper info all
- List all event names (no other fields):
    $ caliper info all -d name
- List names of all events containing the string “L3”:
    $ caliper info L3 -d name
- Or, use an ambiguous event name:
    $ caliper ecount --metric L3_READ myprog
    HP Caliper: usage error:
    Ambiguous event name ("L3_READ") specified for "--metrics".
    Matches L3_READS.ALL.ALL, L3_READS.ALL.HIT, L3_READS.ALL.MISS, L3_READS.DATA_READ.ALL, L3_READS.DATA_READ.HIT, L3_READS.DATA_READ.MISS, L3_READS.DINST_FETCH.ALL, L3_READS.DINST_FETCH.HIT, L3_READS.DINST_FETCH.MISS, L3_READS.INST_FETCH.ALL, L3_READS.INST_FETCH.HIT, L3_READS.INST_FETCH.MISS.
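The ambiguous-name trick works because a partial event name is matched against the full event list. The Python sketch below illustrates that idea with the twelve L3_READS events from the error message. It assumes prefix matching, which is an inference from this one example and not a documented description of how Caliper resolves names; `resolve_event` and `AmbiguousEventError` are hypothetical names.

```python
# The twelve event names listed in the error message on this slide.
EVENTS = [
    f"L3_READS.{kind}.{result}"
    for kind in ("ALL", "DATA_READ", "DINST_FETCH", "INST_FETCH")
    for result in ("ALL", "HIT", "MISS")
]


class AmbiguousEventError(ValueError):
    """Raised when a partial event name matches more than one event."""

    def __init__(self, name: str, matches: list[str]):
        self.matches = matches
        super().__init__(
            f'Ambiguous event name ("{name}"). Matches {", ".join(matches)}.'
        )


def resolve_event(name: str, known_events: list[str]) -> str:
    """Resolve a full or partial event name (assumed rule: prefix match)."""
    if name in known_events:
        return name  # an exact name always wins
    matches = sorted(e for e in known_events if e.startswith(name))
    if not matches:
        raise KeyError(f"Unknown event name: {name}")
    if len(matches) > 1:
        raise AmbiguousEventError(name, matches)
    return matches[0]


if __name__ == "__main__":
    print(resolve_event("L3_READS.DATA_READ.M", EVENTS))
    try:
        resolve_event("L3_READ", EVENTS)
    except AmbiguousEventError as exc:
        print(exc)
```

The error path is the useful one here: deliberately passing a short prefix yields the full list of candidate events, which is what the slide's tip relies on.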
40. Hints and tips: caliper command (cont.)
Getting report help:
- Dump the help file for the cycles measurement:
    $ caliper info -r cycles
- Append help to a report:
    $ caliper cycles --info -o out.txt myprog
Providing command options using a file:
    $ caliper fprof -f myOptionsFile
Helping Caliper find:
- Source code:
    --source-path-map dir|map[:dir|map:…]*
- Symbols and disassembly:
    --module-search-path dir[:dir:…]*
Where map == old_path,new_path
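The --source-path-map syntax is a colon-separated list in which each entry is either a directory or an old_path,new_path pair. The Python sketch below parses such a string and shows one plausible way the entries could be applied. The lookup semantics are an assumption made for illustration (a pair replaces a leading old_path with new_path; a bare directory is searched for the file's base name) and are not taken from the Caliper documentation; both function names are hypothetical.

```python
import posixpath
from typing import Optional

# One parsed entry: (old_path, new_path) for a map, (None, directory) for a dir.
Entry = tuple[Optional[str], str]


def parse_source_path_map(spec: str) -> list[Entry]:
    """Split 'dir|map[:dir|map:...]' into entries, where map is 'old,new'."""
    entries: list[Entry] = []
    for item in spec.split(":"):
        if not item:
            continue  # tolerate empty fields such as a trailing colon
        if "," in item:
            old, new = item.split(",", 1)
            entries.append((old, new))
        else:
            entries.append((None, item))
    return entries


def candidate_paths(recorded_path: str, entries: list[Entry]) -> list[str]:
    """List places to look for a source file (ASSUMED semantics, see lead-in)."""
    candidates = []
    for old, new in entries:
        if old is None:
            # Bare directory: look for the file's base name inside it.
            candidates.append(posixpath.join(new, posixpath.basename(recorded_path)))
            continue
        prefix = old.rstrip("/")
        if recorded_path == prefix or recorded_path.startswith(prefix + "/"):
            # Map entry: swap the leading old_path for new_path.
            candidates.append(new.rstrip("/") + recorded_path[len(prefix):])
    return candidates


if __name__ == "__main__":
    parsed = parse_source_path_map("/build/src,/home/me/src:/opt/extra")
    print(parsed)
    print(candidate_paths("/build/src/lib/a.c", parsed))
```

The paths in the example are invented. The parsing half follows the syntax shown on the slide directly; only the lookup half depends on the stated assumption.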
41. Hints and tips: using views
Screenshot callouts:
- Close
- Restore views
- Minimize
- Restore default locations
- Maximize
- Local view menu
- Common view menu (right-click on tab)
- Detached view (not supported by Motif)
42. Summary
- Itanium execution performance tool
- Measures production applications
- Measures entire system
- Wide range of performance metrics available
- Explore performance data using textual or graphical reports
- Help available from
- Available on HP-UX and Linux
43. DSPP Tools & Resources for Itanium® 2 Architecture Set You Up for Success
- Community: Itanium® architecture forums, source code repository, document sharing and mailing lists
- Training and Education: online and classroom training
- News & Events
- Software: development environments, compilers, operating systems, installation/configuration tools, performance tools and more
- Technical documentation: white papers, tutorials, reference documents and manuals, FAQs, known problems, sample code, etc.
- Partner Resources: webconferencing services, podcast production services, trade show discounts
- Equipment: rentals and purchase discounts
44. Where to go …
- Software Developer Resource Kit for the Intel® Itanium® 2 microarchitecture:
- Development and Business Resources from HP & Intel for HP Integrity-based solutions:
- Contact points for additional information:
  - Americas telephone
  - Europe telephone
  - Asia-Pac
  - or go to for local country phone numbers
45. Complete Survey to Win
- HP & Intel are giving away an HP laptop to 1 (one) lucky winner!!
- Promotion period ends November 19, 2006
- Attend a webcast AND complete the post-event survey.
- Full promotion details can be found on DSPP at:
46. More Events
- Tuesday, October 24: New Dual-Core Processor and Server Hardware
- Tuesday, November 28: OpenMP
- Tuesday, December 19: HP-MPI
Sign up for the DSPP newsletter to get the latest webcast information sent to you directly.
Webcast replays may also be found at: www.hp.com/go/itaniumwebcasts
Did you know that your company can use this same webconferencing tool, at a discounted price, to promote your HP Integrity solutions to your staff and customers? For members only...
47. Intel® Early Access Program - Technology
The Early Access Program (EAP) gives you access to Intel® technology to support your current development cycle as well as early access to tools and information on new technologies. Your membership includes:
- Early access to pre-release software development platforms
- Access to Intel and 3rd party software and testing tools
- Training through Intel® Software College and Web events
- Technical content and how-to articles
- Protected remote access to easily evaluate and develop software safely and securely on platforms over the Internet
48. Intel® Early Access Program - Marketing Opportunities and Support
Extensive marketing and business development opportunities:
- Inclusion in online and print versions of the Intel® Developer Solutions Catalog
- Intel quotes to support your PR
- Case studies
- Access to Intel’s event marketing asset kit
- Participation in selected industry events and trade shows
Support in your development efforts provided through:
- Access to an Intel Account Representative who will act as your primary contact
- Intel® Premier Support for confidential technical support
- 24/7 online support via
49. Related Intel® Resources
- Intel® Early Access Program
- Intel® Software Network
- Intel® Software College
- Intel® Software Development Tools
- Experience Intel® Itanium® 2 Architecture
50. Q&A Session
To ask a question over the phone, press *1 on your touch-tone telephone.