GDC’99: Performance Tuning with Intel® Graphics Tools. Larry Wickstrom, Sr. Software Engineer; Judith Stanley, Application Engineer. Intel Corporation, March 17, 1999.


2 GDC’99: Performance Tuning with Intel® Graphics Tools
Larry Wickstrom, Sr. Software Engineer
Judith Stanley, Application Engineer
Intel Corporation
March 17, 1999

3 Purpose
To provide two tools that give more performance information than you can get anywhere else!

4 Tuning D3D App. Perf. Using IPEAK/GPT
Finding FPS problems in your Game
Measuring concurrency in your Game
Pinpointing performance thru API logging in your Game

5 The Tool Family Tree
[Diagram: Your Game, DirectX*, GFX Driver, Intel® Graphics Hardware; tools: IPEAK GPT, Intel® Graphics Profiler in VTune™ Analyzer 4.0]
*Third party marks and brands are the property of their respective owners

6 Demo
Half-Life* FPS
GPT finds frame rate problems
* Other brands and names are property of their respective owners.

7 GPT and the Graphics Pipeline
GPT Intercepts DX6.1: DirectDraw* and Direct3D Immediate Mode*
  – Retained Mode* partially supported
  – OpenGL* planned
[Diagram: Application, GPT interceptor, API (DirectDraw*/Direct3D IM*), Driver, Graphics Controller, Display]

8 Now let’s take control...
Remove Graphics Load
  – To measure load balance of CPU vs Graphics
Remove Parallelism
  – To measure concurrency

9 Measuring Load Balance
GPT Can Remove...
[Diagram: Application, GPT interceptor, API (DirectDraw*/Direct3D IM*), Driver, Graphics Controller, Display]

10 Measuring Load Balance
GPT Can Remove...
  – Graphics Controller
[Diagram: same pipeline, with Graphics Controller and Display removed]

11 Measuring Load Balance
GPT Can Remove...
  – Driver CPU Load
  – Graphics Controller
[Diagram: same pipeline, with Driver also removed]

12 Measuring Load Balance
GPT Can Remove...
  – API CPU Load
  – Driver CPU Load
  – Graphics Controller
[Diagram: same pipeline, with API (DirectDraw*/Direct3D IM*) also removed]

13 Measuring Load Balance
GPT Can Remove...
  – API CPU Load
  – Driver CPU Load
  – Graphics Controller
… and keep the App happy
[Diagram: Application and GPT interceptor only]

14 Comparison of NULL API to Normal
[Chart: fps over Time; series: Unmodified, NULL API; annotation: API Overhead]

15 Comparison of NULL API to Normal
[Chart: fps over Time; series: Unmodified, NULL API; annotation: App Bound]

16 Comparison of NULL API to Normal
[Chart: fps over Time; series: Unmodified, NULL API]
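The gap between the two curves can also be read as time per frame rather than fps. A minimal sketch, assuming you have one fps reading from the unmodified run and one from the NULL API run; the function names and the 30/120 fps figures are illustrative and do not come from the talk or from GPT. When the CPU and the graphics controller overlap their work, the difference is only an approximation of the removed cost.

```python
def frame_time_ms(fps: float) -> float:
    """Convert a frame rate into milliseconds spent per frame."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def api_overhead_ms(normal_fps: float, null_api_fps: float) -> float:
    """Per-frame time that vanishes when API, driver and graphics work is removed."""
    return frame_time_ms(normal_fps) - frame_time_ms(null_api_fps)


if __name__ == "__main__":
    # Hypothetical measurements: unmodified run vs. NULL API run.
    normal_fps, null_api_fps = 30.0, 120.0
    print(f"app-bound time: {frame_time_ms(null_api_fps):.2f} ms/frame")
    print(f"API overhead:   {api_overhead_ms(normal_fps, null_api_fps):.2f} ms/frame")
```

Working in milliseconds keeps the two parts additive, which fps values are not.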

17 What can be inferred...
If performance increases dramatically
  – Too much graphics
  – Too little app
  – add more AI/Physics/...
If performance doesn’t increase
  – Too much App
  – could HW do more?
  – Too little graphics
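The inference on this slide can be written down as a small decision rule. This is a sketch under stated assumptions: the slide gives no numeric cut-offs, so the `dramatic` and `flat` thresholds below are hypothetical defaults, and `load_balance_hint` is an illustrative name rather than anything GPT provides.

```python
def load_balance_hint(normal_fps: float, null_api_fps: float,
                      dramatic: float = 2.0, flat: float = 1.1) -> str:
    """Classify load balance from the speedup seen when graphics load is removed.

    The thresholds are assumptions; tune them for your own title.
    """
    if normal_fps <= 0 or null_api_fps <= 0:
        raise ValueError("fps values must be positive")
    speedup = null_api_fps / normal_fps
    if speedup >= dramatic:
        # Performance increased dramatically: too much graphics, too little app.
        return "graphics-heavy"
    if speedup <= flat:
        # Performance did not increase: too much app, too little graphics.
        return "app-heavy"
    return "mixed"


if __name__ == "__main__":
    print(load_balance_hint(30.0, 120.0))
    print(load_balance_hint(30.0, 31.0))
```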

18 Concurrency
Parallel Performance
  – CPU & GC work at same time
Serial Performance
  – CPU waits on GC, vice versa
[Diagram: CPU and 3D HW activity timelines]
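An idealized model makes the difference between the two cases concrete. It assumes each frame needs a fixed amount of CPU work and a fixed amount of graphics controller (GC) work, with perfect overlap in the parallel case; real frames fall somewhere in between, and the function names and timings are illustrative only.

```python
def serial_frame_ms(cpu_ms: float, gc_ms: float) -> float:
    """Serial case: CPU waits on GC (or vice versa), so the costs add up."""
    return cpu_ms + gc_ms


def parallel_frame_ms(cpu_ms: float, gc_ms: float) -> float:
    """Parallel case: CPU and GC work at the same time, so the slower one sets the pace."""
    return max(cpu_ms, gc_ms)


if __name__ == "__main__":
    cpu_ms, gc_ms = 10.0, 15.0  # hypothetical per-frame costs
    print(f"serial:   {serial_frame_ms(cpu_ms, gc_ms):.1f} ms/frame")
    print(f"parallel: {parallel_frame_ms(cpu_ms, gc_ms):.1f} ms/frame")
```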

19 Measuring Concurrency
GPT Can Introduce Locks here
[Diagram: Application, GPT interceptor, API (DirectDraw*/Direct3D IM*), Driver, Graphics Controller, Display; "//" marks the lock point]

20 Measuring Concurrency
GPT Can Introduce Locks here
that Serialize CPU & Graphics Hardware activity here
[Diagram: same pipeline; "//" marks the lock point]

21 Comparison of Serialize to Normal
[Chart: fps over Time; series: Unmodified, Serial; annotation: Concurrency]

22 Comparison of Serialize to Normal
[Chart: fps over Time; series: Unmodified, Serial; annotation: Lack of Concurrency]

23 What can be Inferred...
If Serial << Normal
  – Good. Wider gap means more concurrency
If Serial == Normal
  – Application isn’t benefiting from CPU/GC concurrency
  – App is causing CPU & GC to serialize
  – Extreme load imbalance
  – Either no graphics load, or no CPU load
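The comparison on this slide reduces to the size of the gap between the normal and serialized frame rates. A minimal sketch, assuming two measured fps values; the 5% tolerance used to decide that "Serial == Normal" is a hypothetical choice, and the names are illustrative rather than part of GPT.

```python
def concurrency_gap(normal_fps: float, serial_fps: float) -> float:
    """Fraction of the normal frame rate that is lost when CPU and GC are serialized."""
    if normal_fps <= 0:
        raise ValueError("normal_fps must be positive")
    return (normal_fps - serial_fps) / normal_fps


def interpret_gap(normal_fps: float, serial_fps: float, tolerance: float = 0.05) -> str:
    """Map the gap onto the two cases from the slide (tolerance is an assumption)."""
    if concurrency_gap(normal_fps, serial_fps) <= tolerance:
        return "serialized-or-imbalanced"  # Serial == Normal
    return "benefiting-from-concurrency"   # Serial << Normal


if __name__ == "__main__":
    print(interpret_gap(60.0, 36.0))
    print(interpret_gap(60.0, 59.0))
```

A wider gap means the unmodified run was overlapping more CPU and graphics work.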

24 Demo
Half-Life* Load Balance/Concurrency
GPT finds frame concurrency problems

25 API Logging
[Screenshot labels: Direct3D calling DirectDraw; calling DirectDraw]

26 Coverage

27 Duration (Frame Marking)
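The idea behind API logging with frame marking can be illustrated with a generic call interceptor. This is only a sketch of the general technique, not how GPT is implemented: `LoggingInterceptor` and `FakeGraphicsApi` are hypothetical names, and the fake API stands in for a real graphics interface.

```python
import time


class LoggingInterceptor:
    """Wrap an object and record (frame, call name, seconds) for each method call."""

    def __init__(self, target):
        self._target = target
        self.frame = 0
        self.log = []

    def mark_frame(self):
        """Advance the frame counter so later calls are attributed to the next frame."""
        self.frame += 1

    def __getattr__(self, name):
        # Only reached for names not found on the interceptor itself.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr

        def timed(*args, **kwargs):
            start = time.perf_counter()
            try:
                return attr(*args, **kwargs)
            finally:
                self.log.append((self.frame, name, time.perf_counter() - start))

        return timed


class FakeGraphicsApi:
    """Hypothetical stand-in for a graphics API; not DirectDraw or Direct3D."""

    def draw(self, triangles):
        return len(triangles)

    def flip(self):
        return None


if __name__ == "__main__":
    api = LoggingInterceptor(FakeGraphicsApi())
    api.draw([(0, 1, 2)])
    api.flip()
    api.mark_frame()
    for frame, name, seconds in api.log:
        print(frame, name, f"{seconds * 1e6:.1f} us")
```

Grouping the log by frame number gives per-frame duration, and the set of call names seen gives coverage.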

28 Demo
Half-Life* Load Balance/Concurrency
GPT pinpoints performance problems

29 GPT Summary
GPT quickly finds FPS problems in your game
GPT measures Concurrency & Load Balance
GPT pinpoints API level performance problems

30 Intel® Graphics Profiling Capability of VTune™ Performance Analyzer 4.0
What Is It?
What’s It Do?
Show Me How...

31 VTune™ Performance Analyzer 4.0 (What Is It?)
System monitoring
Software execution examination
Dynamic simulation and analysis

32 Intel® Graphics Profiling Capability (What Is It?)
Integrated into VTune™ Performance Analyzer 4.0
3D application profiling

33 The Tool Family Tree (What Is It?)
[Diagram: Your Game, DirectX*, GFX Driver, Intel Graphics Hardware; tools: IPEAK GPT, Intel® Graphics Profiler in VTune™ Analyzer 4.0]

34 Architecture (What Is It?)
Select and view events
  – Intel® Graphics Hardware Driver
  – Intel® Graphics Chip
[Diagram: CPU, L2 Cache, Chip Set, Sys Mem, PCI Bus, AGP, Intel® Graphics Accelerator (3D Pipe: Setup, Pix Fill; 2D Engine), Local Vid Memory, Intel740™ Driver. Events shown: Frames/Sec, CPU Utilization, State Changes, Tri/Sec and Utilization, Pix/Sec and Utilization]

35 Analyze Intel® Graphics Hardware (What’s It Do?)
Maximum fill rate
Clocks app sits idle
3D Clocks can be recovered

36 Watch Intel® Graphics D3D*/OpenGL* Drivers (What’s It Do?)
Total time in driver
Duty cycle for average triangle
Frames per second
Total time in each driver call back

37 Reports Bottlenecks (What’s It Do?)
Triangle packet size
CPU/Intel740™ chip concurrency
Locks to render targets

38 Get Started (Show Me How...)
Profile your app with VTune™ Analyzer 4.0
Look for hot-spots
Look at HW/Driver Counter graphs
Find the problem then “drill down” to the CPU time frame

39 Find the Bottleneck (Show Me How...)
Serialization vs Concurrency
  – The CPU sits idle (waits for HW)
  – The graphics HW sits idle (CPU busy)
Why?
  – Improperly placed 2D instructions
  – Triangle-at-a-time methodology
[Diagram: one-frame timelines. Gfx HW: Raster Triangles, Raster Triangles, ...; Processor: GfxHW Drv, Light/Transform/Game Control, GfxHW Drv, Light/Transform…; label: Driver Duty Cycle]
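One common way to move away from a triangle-at-a-time methodology is to group triangles into larger packets before handing them to the driver, so the per-call cost is paid less often. The sketch below only counts submissions; `packets` is an illustrative helper, the packet size of 64 is arbitrary, and no real Direct3D call is shown.

```python
from typing import Iterable, Iterator, List, Tuple

Triangle = Tuple[int, int, int]  # three vertex indices


def packets(triangles: Iterable[Triangle], size: int) -> Iterator[List[Triangle]]:
    """Yield triangles in packets of at most `size` for a single submission each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[Triangle] = []
    for tri in triangles:
        batch.append(tri)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, partially filled packet


if __name__ == "__main__":
    mesh = [(i, i + 1, i + 2) for i in range(1000)]  # hypothetical mesh
    print("triangle-at-a-time submissions:", len(list(packets(mesh, 1))))
    print("packets of 64 submissions:     ", len(list(packets(mesh, 64))))
```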

40 Demo: Guess What the Bottleneck Is? (Show Me How...)

41 What to Look For (Show Me How...)
An app requires triple buffering...
An app requires MipMapping…
You can gather 3D statistics…

42 Demo: Guess What the Bottleneck Is? (Show Me How...)

43 Summary
Intel® Graphics Profiler is a new capability of VTune™ Analyzer 4.0
Intel Graphics Profiler monitors graphics HW and driver performance
What you learn can apply to other graphics hardware
Usage: find the problem, then drill down!

44 Support & Information
IPEAK GPT
  – Questions, comments: ipeak@intel.com
  – IPEAK Web site
  – http://developer.intel.com/design/ipeak
Intel® Graphics Profiling Capability in VTune™ Analyzer 4.0
  – http://intel.com/vtune
  – http://developer.intel.com/design/graphics/swdev/index.htm
Download the Demo!!!

45 BACKUP

46 Installation
Included in VTune™ 4.0 Analyzer Installation
  – Select the Intel740™ Graphics Accelerator counters at the component installation configuration menu
Enabling Graphics Profiling
  – Under “Configure”, under “Options” and “Sampling”, select “Chronology Objects”
  – Enable the Intel® Graphics Counters (Intel740™ Graphics Accelerator)
  – Double click on the Intel740 Chip Counters in the same menu to configure individual counters
  – Finally, under “Sampling”, select “Advanced” and enable “Collect Chronology Data”
OA Profiler Capability is Included with VTune 4.0 Analyzer

47 Tuning your App with the OA Tool Set: Accounting for lost clocks
Intel® Graphics Hardware
  – Maximum fill rate is 1 Pix/Clock (66 Meg Pixels/Sec)
  – Clocks between Cmd_Stream_Busy and 66M are clocks the Intel740™ chip sits idle
  – 3D clocks not producing a pixel can potentially be recovered by modifying application code
D3D*/OpenGL* Driver
  – Total number of CPU clocks used by the driver
  – Duty cycle for average triangle sizes listed in the SUG can be used to predict where your game should be running
  – Totals from each callback can be observed to narrow down bottlenecks
  – Typical bottlenecks: triangle packet size, CPU/Intel740 chip concurrency, locks to render targets
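The clock accounting on this slide is simple arithmetic. A minimal sketch, assuming per-second counts as inputs: the 66 MHz figure follows from the slide's 1 pixel/clock and 66 Meg pixels/sec, while the function name, the input shape and the sample numbers are hypothetical (the actual counters may be reported differently, for example as percentages). It also treats Cmd_Stream_Busy as the busy-clock total, which is an approximation for 3D-only work.

```python
CORE_CLOCK_HZ = 66_000_000  # 1 pixel/clock at this rate gives the 66 Mpixels/s peak


def account_for_clocks(cmd_stream_busy_clocks: int, pixels_written: int) -> dict:
    """Split one second of graphics clocks into idle and busy-without-pixel parts."""
    if not 0 <= pixels_written <= cmd_stream_busy_clocks <= CORE_CLOCK_HZ:
        raise ValueError("expected 0 <= pixels <= busy clocks <= core clock")
    return {
        # Clocks between Cmd_Stream_Busy and 66M: the chip sits idle.
        "idle": CORE_CLOCK_HZ - cmd_stream_busy_clocks,
        # Busy clocks that produced no pixel: potentially recoverable.
        "busy_no_pixel": cmd_stream_busy_clocks - pixels_written,
        # Achieved fraction of the peak fill rate.
        "fill_efficiency": pixels_written / CORE_CLOCK_HZ,
    }


if __name__ == "__main__":
    # Hypothetical one-second sample.
    print(account_for_clocks(50_000_000, 20_000_000))
```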

48 What Intel® Graphics Counters Tell About Your App
% 2D or 3D Cmd Stream Busy
  – Total amount of time the graphics hardware is in use
% 3D Fill Engine Busy vs % 3D Fill Engine Stall
  – If very high, app can be fill rate limited (very large tris)
  – Contrast to see if busy but stalled, indicating either waiting for pixel data or waiting for info to finish pixel calculation
% 3D Pipeline Busy
  – If higher than % 3D Fill Engine Busy, indicates too many small triangles and setup limited
Graphics Counters Correlate GFX Hardware Events
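These rules of thumb can be combined into a simple checker. It is a sketch only: the slide says "very high" without a number, so the 80% threshold is a hypothetical default, and `diagnose_3d` and its hint strings are illustrative names.

```python
from typing import List


def diagnose_3d(fill_busy_pct: float, fill_stall_pct: float,
                pipeline_busy_pct: float, high: float = 80.0) -> List[str]:
    """Turn three percentage counters into hints, following the slide's rules.

    `high` is an assumed cut-off for "very high".
    """
    hints = []
    if fill_busy_pct >= high:
        hints.append("fill-limited")   # very large triangles
    if fill_stall_pct >= high:
        hints.append("fill-stalled")   # waiting for pixel data or pixel info
    if pipeline_busy_pct > fill_busy_pct:
        hints.append("setup-limited")  # too many small triangles
    return hints


if __name__ == "__main__":
    print(diagnose_3d(fill_busy_pct=90.0, fill_stall_pct=10.0, pipeline_busy_pct=50.0))
    print(diagnose_3d(fill_busy_pct=30.0, fill_stall_pct=5.0, pipeline_busy_pct=70.0))
```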

49 What Intel® Graphics Counters Tell About Your App
Pixels (Z Tested) and (Z Failed)
  – Number of pixels processed by the gfx card
  – Z failed / Z tested gives % Z buffer depth
Z Writes to Z Buffer
  – Counts the number of 16-bit Z writes
Pixel Reads from Render Buffer
  – You can check what % of your scene gets alpha blended when contrasted with Pixel Writes
Color Calculator Stalled by Color Read
  – If this is high, alpha blending could be causing a bottleneck for local memory bandwidth
Counters Used in Combination Uncover Added Information
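Two of the combinations mentioned here are plain ratios. The sketch below assumes raw counts as inputs and reads "contrasted with Pixel Writes" as reads divided by writes, which is an interpretation rather than something the slide states; the function names and sample counts are illustrative.

```python
def z_fail_pct(z_tested: int, z_failed: int) -> float:
    """Percentage of Z-tested pixels that failed the Z test."""
    return 100.0 * z_failed / z_tested if z_tested else 0.0


def alpha_blend_pct(pixel_reads: int, pixel_writes: int) -> float:
    """Rough share of written pixels that also read the render buffer (assumed ratio)."""
    return 100.0 * pixel_reads / pixel_writes if pixel_writes else 0.0


if __name__ == "__main__":
    # Hypothetical counter values.
    print(f"Z failed: {z_fail_pct(1000, 250):.1f}%")
    print(f"alpha blended: {alpha_blend_pct(300, 1200):.1f}%")
```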

50 What Intel® Graphics Counters Tell About Your App
“Triangles Processed” & “Triangles Rendered”
  – Triangles per second. A large discrepancy indicates zero pixel triangles
“AGP Texture Data Bytes Read”
  – This is AGP bandwidth being used for textures in bytes.
“Texture Cache Busy” & “Texture Cache Fetch Stall”
  – All texel data goes through the texture cache so this indicates texture usage.
  – Texture Cache Fetch Stall: very high indicates AGP texture bandwidth is overrun; need mipmapping.
What You Learn Can Apply to Other Graphics Hardware
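The triangle discrepancy and the AGP texture bandwidth are both easy to derive from raw counts. A minimal sketch with hypothetical inputs; the function names are illustrative, and the sample values are not measurements from the talk.

```python
def zero_pixel_triangle_pct(processed: int, rendered: int) -> float:
    """Percentage of processed triangles that were never rendered."""
    if processed <= 0:
        return 0.0
    return 100.0 * (processed - rendered) / processed


def agp_texture_mb_per_s(bytes_read: int, seconds: float) -> float:
    """AGP texture traffic in megabytes (10**6 bytes) per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return bytes_read / seconds / 1e6


if __name__ == "__main__":
    print(f"zero-pixel triangles: {zero_pixel_triangle_pct(10_000, 8_000):.1f}%")
    print(f"AGP texture traffic:  {agp_texture_mb_per_s(133_000_000, 1.0):.1f} MB/s")
```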

