Maximizing Multi-GPU Performance

Presentation transcript:

Maximizing Multi-GPU Performance
Thomas Fortier, ISV Relations, AMD Graphics Products Group
thomas.fortier@amd.com

Topics Covered in this Session
- Why multi-GPU solutions matter.
- Hardware & driver considerations.
- Impact on game design.
- Profiling & performance gains.

Why Multi-GPU Solutions Matter
- Dual-GPU boards
- Multi-board systems
- Hybrid graphics

Why Support Multi-GPU in Your Game
- Growing market share of multi-GPU solutions.
- All game and hardware reviews integrate multi-GPU solutions.
- Gamers expect game framerate to "just scale" with additional GPUs.
- The competition is doing it!
[Chart: market trend]

Crossfire Technical Overview
[Diagram: timeline of Frame 1 through Frame 8]

Alternate Frame Rendering
Alternate frame rendering leads to two types of problems:
- Interframe dependencies
- CPU/GPU synchronization points
In each case, parallelism between the CPU and the GPUs is lost.

Querying the Number of GPUs
Statically link to:
- atimgpud_s_x86.lib (32-bit version)
- atimgpud_s_x64.lib (64-bit version)
Include header file: atimgpud.h
Call this function: INT count = AtiMultiGPUAdapters();
In windowed mode, set count to 1.
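The windowed-mode rule above can be sketched as a small helper. This is only an illustration: EffectiveGpuCount is a hypothetical name, not part of atimgpud.h, and the call to AtiMultiGPUAdapters() appears only in a comment so the sketch compiles without the AMD library. Clamping the result to at least 1 is an added assumption, not something the slide states.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not part of atimgpud.h): decides how many per-GPU
// copies of a resource the application should keep.
//   detected: the value the application obtained from AtiMultiGPUAdapters()
//   windowed: the application's own display-mode flag
int EffectiveGpuCount(int detected, bool windowed) {
    assert(detected >= 0 && "adapter count cannot be negative");
    // Rule from the slide: in windowed mode, treat the system as single-GPU.
    if (windowed) return 1;
    // Assumption: clamp to at least 1 so callers can safely use the result
    // as a divisor or modulus.
    return std::max(detected, 1);
}

// Intended call site, assuming atimgpud.h is included and the static
// library is linked:
//   int numGpus = EffectiveGpuCount(AtiMultiGPUAdapters(), isWindowed);
```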

Interframe Dependencies
When are interframe dependencies a problem?
- Depends on the frequency of P2P blits.
Solutions:
- Create n copies of the resource triggering P2P blits (n = number of GPUs).
- Associate each copy of the resource with a specific GPU: resource[frame_num % num_gpus]
- Repeat resource updates for n frames, so that every copy receives the update.
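The per-GPU copy scheme above can be sketched as follows. PerGpuResource and its method names are hypothetical, and Resource stands in for whatever texture or buffer handle the application uses; only the resource[frame_num % num_gpus] indexing and the "repeat updates for n frames" rule come from the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wrapper: keeps one copy of a resource per GPU so that each
// GPU only ever touches its own copy and no P2P blit is needed.
template <typename Resource>
class PerGpuResource {
public:
    explicit PerGpuResource(std::size_t numGpus) : copies_(numGpus) {
        assert(numGpus > 0);
    }

    // The slide's indexing rule: resource[frame_num % num_gpus].
    Resource& ForFrame(std::size_t frameNum) {
        return copies_[frameNum % copies_.size()];
    }

    // Call when the source data changes: every copy is now stale.
    void MarkDirty() { pendingUpdates_ = copies_.size(); }

    // Call once per frame. Returns true while the copy used by the current
    // frame still needs the update, i.e. for n frames after MarkDirty().
    bool ConsumeUpdate() {
        if (pendingUpdates_ == 0) return false;
        --pendingUpdates_;
        return true;
    }

private:
    std::vector<Resource> copies_;
    std::size_t pendingUpdates_ = 0;
};
```

In a render loop, the application would call ConsumeUpdate() each frame and, when it returns true, re-render into ForFrame(frameNum) before using it.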

Interframe Dependencies
There are many ways to update resources using the GPU:
- Drawing to Vertex / Index Buffers
- Stream Out
- CopyResource()
- CopySubresourceRegion()
- GenerateMips()
- ResolveSubresource()
- Etc…

CPU/GPU Synchronization Points
[Diagram: timeline of Frame 1 through Frame 5]

CPU/GPU Syncs - Queries
Having the driver block on a query starves the GPU queues and limits parallelism.
Solutions:
- Don't block on query results.
- Don't let queries straddle frames.
- For queries issued every frame, create a query object for each GPU.
- Pick up query results n frames after they were issued.
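The "one query object per GPU, read n frames later" pattern can be sketched as a small ring. DeferredQueryRing is a hypothetical name, and QueryHandle stands in for the graphics API's query object; the sketch covers only the bookkeeping, not the API calls that issue or read a query.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bookkeeping for a query issued every frame. One slot per GPU;
// the slot for frame f is reused at frame f + numGpus, which is when the
// query stored there is handed back to be read.
template <typename QueryHandle>
class DeferredQueryRing {
public:
    explicit DeferredQueryRing(std::size_t numGpus) : slots_(numGpus) {
        assert(numGpus > 0);
    }

    // Stores the query issued this frame. If the slot already held a query
    // (issued numGpus frames earlier), writes it to *oldest and returns true.
    bool Rotate(std::size_t frameNum, QueryHandle issued, QueryHandle* oldest) {
        assert(oldest != nullptr);
        Slot& slot = slots_[frameNum % slots_.size()];
        const bool hadQuery = slot.valid;
        if (hadQuery) *oldest = slot.query;
        slot.query = issued;
        slot.valid = true;
        return hadQuery;
    }

private:
    struct Slot {
        QueryHandle query{};
        bool valid = false;
    };
    std::vector<Slot> slots_;
};
```

When Rotate returns true, the application would read the handed-back query with a non-blocking call and simply skip the result if it is not yet available, rather than spinning on it.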

CPU/GPU Syncs - CPU Access to GPU Resources
- Triggers pipeline stalls because the driver blocks waiting on the GPU at the lock/map call.
- This is followed by a P2P blit at the unlock/unmap call.
- Often results in negative scaling.
Solutions:
- DX10/DX11: stream to and copy from staging textures.
- DX9: stream to and copy from system-memory textures.
- DX9: never lock static vertex/index buffers or textures.

Multi-GPU Performance Gains
What kind of performance scaling should you expect from multi-GPU systems?
- Function of CPU/GPU workload balance.
- Typical for 2 GPUs is 2X scaling.
- For 3 & 4 GPUs, varies from game to game.
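Why scaling depends on the CPU/GPU workload balance can be illustrated with a deliberately simplified model. This is an assumption for illustration, not a measurement or an AMD formula: under AFR, n GPUs work on n frames at once, so the GPU cost per delivered frame is roughly gpuMs / n, the CPU cost per frame is unchanged, and the slower of the two sets the frame time. EstimatedAfrSpeedup is a hypothetical name, and the model ignores synchronization and P2P overhead.

```cpp
#include <algorithm>
#include <cassert>

// Simplified AFR scaling model (an assumption, see the text above).
//   cpuMs:   CPU time needed to submit one frame
//   gpuMs:   time one GPU needs to render one frame
//   numGpus: number of GPUs rendering alternate frames
// Returns the estimated framerate multiplier relative to a single GPU.
double EstimatedAfrSpeedup(double cpuMs, double gpuMs, int numGpus) {
    assert(cpuMs > 0.0 && gpuMs > 0.0 && numGpus >= 1);
    const double singleGpuFrameMs = std::max(cpuMs, gpuMs);
    const double multiGpuFrameMs = std::max(cpuMs, gpuMs / numGpus);
    return singleGpuFrameMs / multiGpuFrameMs;
}
```

With 5 ms of CPU work and 20 ms of GPU work per frame, the model gives 2x for two GPUs and 4x for four. With 10 ms of CPU work it gives 2x for both, because the CPU becomes the limit once the second GPU is added.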

Crossfire Profiling
- Make sure to be GPU bound.
  - Test framerate scaling with resolution change.
- Test for multi-GPU scaling.
  - Rename app exe to ForceSingleGPU.exe.
- Test for texture interframe dependencies.
  - Rename app exe to AFR-FriendlyD3D.exe.
- Remove queries.
- Check for CPU locks of GPU resources.

Key Takeaways
- Multi-GPU solutions matter!
- Test and profile with multi-GPU systems.
- Properly handle interframe dependencies.
- Check for CPU locks of GPU resources.
- Don't block on queries.
- Refer to the AMD Crossfire SDK samples (CrossFire Detect & AFR-Friendly projects): ati.amd.com/developer

Thank You Thomas Fortier – thomas.fortier@amd.com