Performance Monitoring Update Daniele Francesco Kruse April 2010.

1 Performance Monitoring Update Daniele Francesco Kruse April 2010

2 Summary

1. Monitoring CMSSW on Nehalem
2. SDL substitute candidates
3. Monitoring Geant4

3 A performance model for Nehalem: Overview

Total_cycles: CPU_CLK_UNHALTED:THREAD_P
Useless_uops: (UOPS_EXECUTED:PORT015 + UOPS_EXECUTED:PORT234_CORE) - UOPS_RETIRED:ANY
PORT015_uop_execution_rate: UOPS_EXECUTED:PORT015 / UOPS_EXECUTED:PORT015(CMASK=1)
PORT234_CORE_uop_execution_rate: UOPS_EXECUTED:PORT234_CORE / UOPS_EXECUTED:PORT234_CORE(CMASK=1)
Uop_execution_rate: PORT015_uop_execution_rate + PORT234_CORE_uop_execution_rate
Useless_cycles: Useless_uops / Uop_execution_rate
Useful_cycles: UOPS_RETIRED:ANY / Uop_execution_rate
Active_cycles: Useless_cycles + Useful_cycles
Stalled_cycles: Total_cycles - Active_cycles
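The formulas above can be sketched as a small Python function. This is only an illustration under stated assumptions: the function and parameter names are hypothetical, the raw counter values are assumed to have been collected already (for example with pfmon), and the two `*_busy_cycles` arguments stand for the CMASK=1 variants of the corresponding events.

```python
def cycle_accounting(total_cycles, uops_retired,
                     port015, port234_core,
                     port015_busy_cycles, port234_core_busy_cycles):
    """Apply the slide's cycle accounting formulas to raw counter values.

    All argument names are hypothetical; they map to the events as follows:
      total_cycles             -> CPU_CLK_UNHALTED:THREAD_P
      uops_retired             -> UOPS_RETIRED:ANY
      port015                  -> UOPS_EXECUTED:PORT015
      port234_core             -> UOPS_EXECUTED:PORT234_CORE
      port015_busy_cycles      -> UOPS_EXECUTED:PORT015 (CMASK=1)
      port234_core_busy_cycles -> UOPS_EXECUTED:PORT234_CORE (CMASK=1)
    """
    useless_uops = (port015 + port234_core) - uops_retired
    port015_rate = port015 / port015_busy_cycles
    port234_core_rate = port234_core / port234_core_busy_cycles
    uop_execution_rate = port015_rate + port234_core_rate

    useless_cycles = useless_uops / uop_execution_rate
    useful_cycles = uops_retired / uop_execution_rate
    active_cycles = useless_cycles + useful_cycles
    stalled_cycles = total_cycles - active_cycles

    return {
        "useless_uops": useless_uops,
        "uop_execution_rate": uop_execution_rate,
        "useless_cycles": useless_cycles,
        "useful_cycles": useful_cycles,
        "active_cycles": active_cycles,
        "stalled_cycles": stalled_cycles,
    }


if __name__ == "__main__":
    # Made-up counter values, chosen only to make the arithmetic easy to follow.
    result = cycle_accounting(
        total_cycles=1000, uops_retired=900,
        port015=600, port234_core=400,
        port015_busy_cycles=400, port234_core_busy_cycles=400,
    )
    for name, value in result.items():
        print(f"{name}: {value}")
```

The sketch does not guard against zero busy-cycle counts; a real tool would need to handle that case.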

4 Cycle Accounting Analysis for Nehalem (Intel Core i7)

Total Cycles (application total execution time): CPU_CLK_UNHALTED:THREAD_P
- Issuing μops: Active_cycles = Useless_cycles + Useful_cycles
  - Retiring μops (useful work): UOPS_RETIRED:ANY / Uop_execution_rate
  - Not retiring μops (useless work): Useless_uops / Uop_execution_rate
- Not issuing μops
  - Stalled (no work): Total_cycles - Active_cycles

5 Nehalem: Overview of memory and cache stalls

Memory and cache related stalls:
- MEM_LOAD_RETIRED:DTLB_MISS // ~10 cycles
- MEM_LOAD_RETIRED:L1D_HIT // too small: penalty hidden
- MEM_LOAD_RETIRED:L2_HIT // ~14.5 cycles
- MEM_LOAD_RETIRED:L3_MISS // ~180 cycles (arch. dependent)
- MEM_LOAD_RETIRED:L3_UNSHARED_HIT // ~42 cycles
- MEM_LOAD_RETIRED:OTHER_CORE_L2_HIT_HITM // ~74 cycles
- ITLB_MISS_RETIRED // too small: penalty hidden

Other stalls:
- ILD_STALL:ANY // BROKEN?!?!
- RAT_STALLS
- RESOURCE_STALLS
- SEG_RENAME_STALLS
- SQ_FULL_STALL_CYCLES
- STORE_BLOCKS

And finally, what happened to store-forward stalls?
- Loads spanning across cache lines cause almost no stalls anymore
- Loads blocked by unknown address stores and loads blocked because they are not completely contained in a preceding store still cause stalls
- Unfortunately there is no direct event to count these situations
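One way to use the approximate penalties listed above is to weight each event count by its per-occurrence cost. The following Python sketch does that; it assumes that count times penalty is an acceptable first estimate, which likely overstates the real cost because out-of-order execution can overlap some of these latencies. The function name and the example counts are hypothetical.

```python
# Approximate per-occurrence penalties in cycles, taken from the slide.
# L1D_HIT and ITLB_MISS_RETIRED are left out because the slide describes
# their penalty as too small to be visible.
PENALTY_CYCLES = {
    "MEM_LOAD_RETIRED:DTLB_MISS": 10,
    "MEM_LOAD_RETIRED:L2_HIT": 14.5,
    "MEM_LOAD_RETIRED:L3_MISS": 180,  # architecture dependent
    "MEM_LOAD_RETIRED:L3_UNSHARED_HIT": 42,
    "MEM_LOAD_RETIRED:OTHER_CORE_L2_HIT_HITM": 74,
}


def estimate_memory_stall_cycles(counts):
    """Return (per-event estimate, total estimate) in cycles.

    `counts` maps event names to raw counter values; events that are
    missing from `counts` contribute zero, and events without a listed
    penalty are ignored.
    """
    per_event = {
        name: counts.get(name, 0) * penalty
        for name, penalty in PENALTY_CYCLES.items()
    }
    return per_event, sum(per_event.values())


if __name__ == "__main__":
    # Made-up counts for illustration only.
    sample = {
        "MEM_LOAD_RETIRED:L2_HIT": 100,
        "MEM_LOAD_RETIRED:L3_MISS": 10,
    }
    per_event, total = estimate_memory_stall_cycles(sample)
    print(per_event)
    print("estimated stall cycles:", total)
```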

6 Monitoring CMSSW on Nehalem

- First Nehalem results with CMSSW 3.6.0 pre2 (here) (compare with Core results here)
- Tool discovers architecture at runtime (CPUID)

First performance considerations on Nehalem:
- Faster (generally a 3-15% improvement over the Core cycle count)
- Lower CPI (the number and type of instructions obviously stay the same)
- Stalled cycles cut down to around 30% of Core values (but we need to verify coverage accurately)
- Same percentage of mispredicted branches
- More useful cycles seem to be required to do the same job
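Runtime architecture discovery can be illustrated with a short Python sketch. This is not the tool's own code: it assumes a Linux host and parses the text of /proc/cpuinfo as a stand-in for executing CPUID directly. The function name is hypothetical and the model lists are deliberately not exhaustive.

```python
# Intel family 6 display-model numbers (not an exhaustive list).
NEHALEM_MODELS = {26, 30, 46}
CORE_MODELS = {15, 23}


def detect_microarchitecture(cpuinfo_text):
    """Classify the first processor described in /proc/cpuinfo-style text.

    Returns "nehalem", "core" or "unknown".
    """
    fields = {}
    for line in cpuinfo_text.splitlines():
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        # setdefault keeps the values of the first processor entry.
        fields.setdefault(key.strip(), value.strip())

    try:
        family = int(fields.get("cpu family", -1))
        model = int(fields.get("model", -1))
    except ValueError:
        return "unknown"

    if fields.get("vendor_id") != "GenuineIntel" or family != 6:
        return "unknown"
    if model in NEHALEM_MODELS:
        return "nehalem"
    if model in CORE_MODELS:
        return "core"
    return "unknown"


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            print(detect_microarchitecture(handle.read()))
    except OSError:
        print("unknown (no /proc/cpuinfo on this system)")
```

A tool could use the returned label to choose the event set that matches the machine, for example the Nehalem events listed on the earlier slides.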

7 Version with graphs: Structure and libraries

[Diagram] Workflow boxes: Start, Analysis Configuration, Performance Data Taking, Program Run, Performance Data Output, Performance Data Analysis, Browsable HTML results, End
[Diagram] Libraries shown: libpng, zlib, libSDL, libSDL_ttf, libpfm

8 SDL substitute candidates

libSDL_ttf is not part of the standard SLC5 installation.

SDL substitute candidates (both successfully tested):
- HTML5's canvas tag:
  - Supported by Firefox, Opera, Safari & Chrome
  - Text drawing supported only by Firefox (Gecko), Safari & Chrome
  - Internet Explorer also supports it through Mozilla's plugin
- ROOT:
  - A little heavier and more difficult to adapt
  - Works the same way as the current SDL implementation (PNG output)

9 Monitoring Geant4

- Overall and symbol analysis already possible
  - pfmon command line tool & FullCMS simulation example
- Modular analysis through User Actions
  - Probably RunAction and EventAction combined
  - Type of particle, direction and energy determine complexity and type of event
  - This triple may be used to describe the "module" of the analysis
  - Proposal: event-level granularity
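The idea of using the (particle, direction, energy) triple as the "module" can be sketched in Python as a simple aggregation of per-event measurements. Everything here is an assumption for illustration: the function name, the tuple layout, and the example labels are hypothetical, and the per-event cycle counts are assumed to have been recorded elsewhere (for instance around an event's begin and end hooks).

```python
from collections import defaultdict


def aggregate_by_module(event_samples):
    """Group per-event cycle counts by the (particle, direction, energy) triple.

    `event_samples` is an iterable of (particle, direction, energy, cycles)
    tuples, one per simulated event. The triple is used as an opaque key.
    """
    totals = defaultdict(lambda: {"events": 0, "cycles": 0})
    for particle, direction, energy, cycles in event_samples:
        module = (particle, direction, energy)
        totals[module]["events"] += 1
        totals[module]["cycles"] += cycles
    return dict(totals)


if __name__ == "__main__":
    # Made-up samples; the labels are placeholders, not real Geant4 output.
    samples = [
        ("e-", "barrel", "10 GeV", 1200),
        ("e-", "barrel", "10 GeV", 1300),
        ("pi-", "endcap", "50 GeV", 5000),
    ]
    for module, stats in aggregate_by_module(samples).items():
        average = stats["cycles"] / stats["events"]
        print(module, stats, "avg cycles/event:", average)
```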

10 Conclusions

- CMSSW 3.6.0 has been successfully monitored on a Nehalem machine
- Two proposed substitutes for results graphics display have been successfully tested for suitability: ROOT & HTML5's canvas tag
- A study to apply modular monitoring of Geant4 is underway

11 What's next?

- Further study stall impacts on Nehalem and validate Cycle Accounting Analysis (possibly with David Levinthal in May)
- Implement graphics display without SDL dependency, using ROOT or HTML5's canvas tag
- Run Geant4 monitoring exercises with simple examples and later with the FullCMS application

12 Thank you. Questions?
