Tools for Investigating Graphics System Performance


Presentation transcript:

Tools for Investigating Graphics System Performance
Matthew Fisher, Steve Pronovost

Goal
A video game runs slowly, skips frames, has high latency, etc., and the developers want to fix the problem. The problem is almost always a cascade of bottlenecks at the application, CPU, and GPU levels that is very challenging to investigate locally. We want tools that let programmers solve these problems faster.

Approaches
Profiling: rig the game events with logging or use an automatic profiler.
PIX (for Windows and Xbox 360): all calls by the game to the graphics API are logged.
GPUView: the OS logs all CPU, graphics kernel, and graphics driver events.

Profiling
Manual profiling requires a significant amount of development effort. Polling-based automatic profiling can work reasonably well for CPU applications but doesn't capture graphics or memory transfer events well. Percentage-based statistics ("you spent 45% of the time in function X") can sometimes be useful and sometimes extremely misleading.
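As an illustration of the manual approach, here is a minimal sketch of hand-placed timing instrumentation. The names `timed`, `totals`, and `percentages`, and the labels used with them, are hypothetical and not from the presentation. It also shows where percentage statistics come from: they are totals divided by a sum, so they hide whether the time was spread evenly or concentrated in a few slow frames.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per label (labels are hypothetical).
totals = defaultdict(float)

@contextmanager
def timed(label):
    """Add the wall-clock duration of the enclosed block to totals[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[label] += time.perf_counter() - start

def percentages(measured):
    """Return each label's share of the total measured time, in percent."""
    whole = sum(measured.values())
    if whole == 0:
        return {}
    return {label: 100.0 * seconds / whole for label, seconds in measured.items()}

if __name__ == "__main__":
    for _ in range(3):
        with timed("update"):
            sum(range(10_000))      # stand-in for game logic
        with timed("render"):
            sum(range(50_000))      # stand-in for draw submission
    print(percentages(totals))
```

Every region of interest has to be wrapped by hand, which is the development cost the slide refers to, and nothing here observes work that runs asynchronously on the GPU.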

PIX
Released by Microsoft as part of the DirectX SDK. Multiple modes for investigating performance, targeted at game developers: interactive mode, frame logging, and frame capture and playback.

PIX – Interactive Mode
Various counters stream by as the game runs. You can change which counters are shown; the hope is to find that the observed problem correlates with one of them.

PIX – Interactive Mode

Commonly Used Counter Types
Number, type, and size of draw primitive calls; number of texture and vertex/index buffer locks, and which memory pool was locked; object creation and destruction events; allocated system and video memory; frame latency and seconds per frame; page faults.
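The idea of looking for a counter that correlates with the observed problem can be sketched offline, assuming the per-frame counter values have been exported as plain lists. This is not a PIX feature or API; the function names and the sample data are made up for illustration, and the Pearson coefficient is just one reasonable choice of correlation measure.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / math.sqrt(var_x * var_y)

def rank_counters(counters, frame_ms):
    """Sort counter names by |correlation| with frame time, strongest first."""
    scores = {name: pearson(values, frame_ms) for name, values in counters.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

if __name__ == "__main__":
    # Hypothetical per-frame samples: frame time in ms and two counters.
    frame_ms = [16, 16, 33, 16, 33, 16]
    counters = {
        "buffer_locks": [2, 2, 9, 2, 9, 2],
        "draw_calls": [500, 510, 505, 495, 500, 490],
    }
    for name, score in rank_counters(counters, frame_ms):
        print(f"{name}: {score:+.2f}")
```

A high score only flags a counter worth inspecting; it does not show that the counted events cause the slow frames.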

PIX – Frame Capture Mode

PIX – Debug Pixel

Questions PIX is good at
Are object locks causing the frame skipping problem users are experiencing?
Are we allocating too many resources we don't use?
What are the API calls that are taking the longest time to execute?
Why was this pixel in the sky green?

GPUView

Windows Display Driver Model
The XP Display Driver Model required applications to cede control of the graphics infrastructure and was largely designed assuming a single 3D application would be running. The Vista Display Driver Model added standard scheduling principles, forcing applications to share control of graphics memory and compute resources.

GPUView
The graphics model switch induced a variety of constraints on graphics applications and forced highly optimized graphics drivers to be restructured. Many games ran more slowly on Vista than they did on XP (roughly 5% to 30% slower). GPUView was designed to help investigate these problems and see which stage was causing the speed difference.

Event Tracing
The GPUView logger enables logging of a vast set of events in the OS, such as: all calls to the Windows graphics kernel; all resource creation, lock, and destruction events; all command buffer submissions; context switches (with stack trace and reason); and kernel mode entries and exits (with stack trace). World of Warcraft generates approximately 1 GB of event data every 3 seconds.
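To put the quoted data rate in perspective, the arithmetic below extrapolates from the single figure on the slide (about 1 GB per 3 seconds). The helper names are hypothetical, and the sketch assumes the rate stays constant, which a real capture need not do.

```python
def trace_size_gb(capture_seconds, gb_per_interval=1.0, interval_seconds=3.0):
    """Estimate trace size in GB for a capture of the given length."""
    return capture_seconds * gb_per_interval / interval_seconds

def max_capture_seconds(disk_budget_gb, gb_per_interval=1.0, interval_seconds=3.0):
    """Longest capture, in seconds, that fits in the given disk budget."""
    return disk_budget_gb * interval_seconds / gb_per_interval

if __name__ == "__main__":
    print(trace_size_gb(60))        # one minute of capture -> 20.0 GB
    print(max_capture_seconds(8))   # an 8 GB budget -> 24.0 seconds
```

At that rate, captures are practical only for short windows around the problem being investigated.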

GPUView Without Any Graphics

Windows Display Driver Model
Applications build up local command buffers. When these command buffers get big enough, they are submitted to the application's local graphics queue for processing. The graphics scheduler selects which application should be running on which graphics card and submits work to the corresponding hardware queue.
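The submission flow described above can be modeled with a toy simulation. This is a conceptual sketch only and does not reflect the real Windows graphics kernel: the class `AppContext`, the function `schedule`, the flush threshold, the number of hardware slots, and the round-robin policy are all assumptions chosen to keep the example small.

```python
from collections import deque

class AppContext:
    """One application's view: a local command buffer plus its own queue."""

    def __init__(self, name, flush_threshold):
        self.name = name
        self.flush_threshold = flush_threshold  # assumed "big enough" size
        self.command_buffer = []                # recorded, not yet submitted
        self.queue = deque()                    # submitted, awaiting the scheduler

    def record(self, command):
        """Append a command; submit the buffer once it is big enough."""
        self.command_buffer.append(command)
        if len(self.command_buffer) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Move the current command buffer into this application's queue."""
        if self.command_buffer:
            self.queue.append(tuple(self.command_buffer))
            self.command_buffer = []

def schedule(contexts, hardware_queue, hardware_slots=2):
    """Fill free hardware slots round-robin from the per-application queues."""
    moved = True
    while len(hardware_queue) < hardware_slots and moved:
        moved = False
        for ctx in contexts:
            if len(hardware_queue) >= hardware_slots:
                break
            if ctx.queue:
                hardware_queue.append((ctx.name, ctx.queue.popleft()))
                moved = True
    return hardware_queue

if __name__ == "__main__":
    game = AppContext("game", flush_threshold=3)
    desktop = AppContext("desktop", flush_threshold=2)
    for i in range(4):
        game.record(f"draw{i}")
    for i in range(2):
        desktop.record(f"blit{i}")
    print(list(schedule([game, desktop], deque())))
```

The three stages in the model (local buffer, per-application queue, hardware queue) correspond to the three places the slide says work can sit, which is useful to keep in mind when reading the traces that follow.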

One Second of a Game

Setup

Multiple Applications Fighting

Simple Problems

Relatively Normal Execution

GPU Starvation

GPU Idle

Sleepy App

Huge Render Times (GPU Bound)

GPU and CPU Starvation

Answering Questions

Why Did Our Thread Context Switch?

Does Surface Allocation Cause Frame Stuttering?

Thoughts
Surprisingly, the overhead of GPUView logging is pretty minimal, and the traces often reflect the underlying problem well. The biggest advantage of GPUView over PIX is that PIX can't tell you crucial things like when the GPU is blocked on the CPU. GPUView is excellent for telling you what part of the application needs optimization.
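One way to quantify "the GPU is blocked on the CPU" from a timeline is to measure the gaps between intervals of GPU work. The sketch below assumes the busy intervals have already been extracted as `(start, end)` pairs in some time unit; it does not read GPUView's trace format, and `idle_gaps` and `idle_fraction` are hypothetical helper names.

```python
def idle_gaps(busy_intervals, window_start, window_end):
    """Return (start, end) gaps inside the window where no GPU work ran."""
    gaps = []
    cursor = window_start
    for start, end in sorted(busy_intervals):
        # Clip each interval to the window and skip those entirely outside it.
        start, end = max(start, window_start), min(end, window_end)
        if end <= start:
            continue
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < window_end:
        gaps.append((cursor, window_end))
    return gaps

def idle_fraction(busy_intervals, window_start, window_end):
    """Fraction of the window during which the GPU had nothing to run."""
    idle = sum(end - start
               for start, end in idle_gaps(busy_intervals, window_start, window_end))
    return idle / (window_end - window_start)

if __name__ == "__main__":
    # Hypothetical busy intervals, in milliseconds, within a 16 ms frame.
    busy = [(0, 4), (6, 10), (9, 12)]
    print(idle_gaps(busy, 0, 16))       # [(4, 6), (12, 16)]
    print(idle_fraction(busy, 0, 16))   # 0.375
```

A large idle fraction while the application is still missing frames suggests the GPU is being starved of work rather than overloaded.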

Driver Perspective
GPUView provides a lot of detail to let display driver writers and the DirectX graphics kernel team diagnose problems with task submission, the command buffer submission threads, GPU preemption, video skipping, etc.