GDC March 1999: Scalability, Advanced D3D Programming (Richard Huddy)


1 Scalability: Advanced D3D Programming
Richard Huddy
RichardH@nvidia.com

2 Basic Objectives
- To produce the best experience on every user’s machine
- To exploit all of the resources available
- To cope with a broad spread of hardware
- To avoid ‘bottoming out’ during the shelf-life of the game / engine

3 What is a high-end PC?
- A 125+ mega-texel device
- A 125+ mega-pixel device
- A fast CPU ( >= 350MHz)
- AGP 2X/4X Bus
- Lots of system RAM ( >= 64MB)
- Huge frame buffers (16 to 32 MB)
- Multi-Texture at low cost

4 Power Trends
[Chart: CPU Speed, Fill Rate]
Appreciate the absolute values and the ratios.

5 So what’s the problem?
[Diagram: CPU and Graphics timelines between BeginScene() and EndScene(), shown for second generation hardware and for third generation hardware]
Third generation hardware: Wow, 10% faster!

6 What can you do to help?
Scalability is the key:
- Run at higher screen resolutions
- Run at higher color depths
- Use more complex rendering techniques on good hardware
- Ship multiple geometry models
- Protect your CPU
- Unlock the frame rate

7 Higher Screen Resolutions
1) Include direct support for higher resolution modes (uses lots of disk space).
2) Store high resolution art and filter down to produce lower resolution art.
3) Store low resolution art and pixel double:
   - If you have art at 512x384 use it for 1024x768
   - If you have art at 640x480 use it on 1280x1024 (but only use a 1280x960 viewport)
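Option 3 above amounts to a nearest-neighbour 2x upscale. The following is a minimal sketch, assuming a single-channel 8-bit image stored row by row; the function name `pixel_double` and the layout are illustrative assumptions, not taken from the talk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pixel-double an 8-bit image: every source pixel becomes a 2x2 block.
   dst must hold (2 * src_w) * (2 * src_h) bytes.
   Hypothetical helper for illustration only. */
void pixel_double(const uint8_t *src, size_t src_w, size_t src_h, uint8_t *dst)
{
    assert(src != NULL && dst != NULL);
    size_t dst_w = 2 * src_w;
    for (size_t y = 0; y < 2 * src_h; ++y) {
        const uint8_t *src_row = src + (y / 2) * src_w;  /* two output rows per source row */
        uint8_t *dst_row = dst + y * dst_w;
        for (size_t x = 0; x < dst_w; ++x) {
            dst_row[x] = src_row[x / 2];                 /* two output columns per source pixel */
        }
    }
}
```

Doubling 640x480 art gives 1280x960, which is why the slide suggests a 1280x960 viewport inside the 1280x1024 mode.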

8 Higher Color Depths
- Runs at much the same speed but gives the user a much richer experience
- Uses frame buffer memory constructively
- You can re-use the previous 16 bit assets
- The main performance loss in true color is often due to texture management
- But beware the Frame Buffer + Z Buffer depth constraint on Riva TNT

9 Complex Rendering Techniques - I
- Environment Mapping
  – Beware of spending too much CPU on this.
- Dual Texture Lighting
- Bump Mapping
- Use more alpha transparency
  – But see also “Alpha sort issues” later on…
- Please try to use the extra fill rate!

10 Complex Rendering Techniques - II
- Trilinear mipmapping for almost everything
- Use Detail textures
- Large textures for extra realism
- 32 bit textures - where it’s a quality win
- Compressed textures as long as quality is not compromised

11 Protect your CPU
The big ones:
- __ftol and other ‘type conversion’ nightmares
- sqrt()
  – that’ll be seventy cycles please...
- Reciprocal square root
  – One hundred and nine cycles through the FPU…
- Transform and lighting (more on that later)

12 Removing __ftol
- Remember that the compiler doesn’t have a choice, but you can check the output
- Write your own inline assembler conversion routine if…
  – You can accept differing rounding rules
- This doesn’t break the optimiser!
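The slide recommends an inline assembler routine, which is not reproduced in the transcript. As a portable stand-in, here is a sketch of a well-known bias trick that converts without a cast and rounds to nearest instead of truncating, which is the kind of differing rounding rule the slide refers to. It assumes IEEE-754 single precision and the default rounding mode; the name `fast_float_to_int` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Round-to-nearest float-to-int conversion without a C cast.
   Adding 1.5 * 2^23 pushes the integer part of f into the low mantissa bits.
   Valid only for |f| < 2^22. Ties round to even, unlike the truncation
   performed by (int)f. Illustrative sketch, not the routine from the talk. */
int32_t fast_float_to_int(float f)
{
    assert(f > -4194304.0f && f < 4194304.0f);
    volatile float biased = f + 12582912.0f;  /* volatile forces a store as a 32-bit float */
    float stored = biased;
    int32_t bits;
    memcpy(&bits, &stored, sizeof bits);
    return bits - 0x4B400000;                 /* 0x4B400000 is the bit pattern of 12582912.0f */
}
```

Note the difference in results: `(int)-1.7f` truncates to -1, while this routine rounds to -2.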

13 Replacement for sqrt()
Sqrt seems ‘natural’ if you are normalising vectors, calculating environment map coordinates or calculating distances - but it’s sloooow
Sample code is available from the developer web site or from me directly and will be in future versions of the SDK.
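The sample code mentioned on the slide is not part of this transcript. For illustration only, here is a sketch of a widely known reciprocal square root approximation (an integer initial guess followed by one Newton-Raphson step) applied to vector normalisation. It assumes IEEE-754 single precision, gives results accurate to a fraction of a percent, and should not be read as the author's replacement; `approx_rsqrt` and `normalize3` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x) for x > 0: integer initial guess plus one
   Newton-Raphson refinement. Illustrative, not the talk's sample code. */
float approx_rsqrt(float x)
{
    assert(x > 0.0f);
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);       /* initial guess derived from the exponent */
    float y;
    memcpy(&y, &bits, sizeof y);
    return y * (1.5f - 0.5f * x * y * y);   /* one refinement step */
}

/* Normalise a 3-vector in place without calling sqrt(). */
void normalize3(float v[3])
{
    float inv_len = approx_rsqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
    v[0] *= inv_len;
    v[1] *= inv_len;
    v[2] *= inv_len;
}
```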

14 Saturation Arithmetic (C)
Limiting a floating point number to lie in the range 0.0 to 1.0 inclusive (traditional method):
    if (f < 0.0) f = 0.0;
    else if (f > 1.0) f = 1.0;

15 Saturation Arithmetic (Pentium)
    if (*(long *)&f < 0) *(long *)&f = 0;
    else if (*(long *)&f > 0x3f800000) *(long *)&f = 0x3f800000;
This is faster on a Pentium class processor since the FPU is “non-optimal” (i.e. slow) and the integer unit is much faster.
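The pointer cast above assumes a 32-bit `long`, which no longer holds on many 64-bit platforms, and it also runs into strict-aliasing rules in modern compilers. Here is a sketch of the same integer-compare idea written with `int32_t` and `memcpy`, assuming IEEE-754 single precision; the function name `clamp01_bits` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Clamp f to [0.0, 1.0] using integer compares on its IEEE-754 bit pattern.
   Any value with the sign bit set (including -0.0f) has a negative signed
   pattern; any value above 1.0f compares greater than 0x3f800000. */
float clamp01_bits(float f)
{
    assert(sizeof(float) == sizeof(int32_t));
    int32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* well-defined alternative to *(long *)&f */
    if (bits < 0)
        bits = 0;                     /* 0x00000000 is 0.0f */
    else if (bits > 0x3f800000)
        bits = 0x3f800000;            /* 0x3f800000 is 1.0f */
    memcpy(&f, &bits, sizeof f);
    return f;
}
```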

16 Saturation Arithmetic (Pentium II)
Use the “cmov” instructions:
    cmp   [f], 0
    cmovb [f], 0
    cmp   [f], 3f800000
    cmova [f], 3f800000
Faster since unpredictable branches are the bottleneck here. Unavailable on a Pentium.
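The listing on the slide is schematic. Real CMOVcc instructions need a register destination and cannot take an immediate operand, and the signed conditions (`cmovl`, `cmovg`) are the ones that match a signed comparison of the bit pattern. The branch-free idea can be sketched in C with bit masks, assuming IEEE-754 single precision. A compiler may or may not lower this to `cmov`, and the name `clamp01_branchless` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Branch-free clamp of f to [0.0, 1.0] using masks instead of if/else. */
float clamp01_branchless(float f)
{
    assert(sizeof(float) == sizeof(uint32_t));
    uint32_t u;
    memcpy(&u, &f, sizeof u);

    uint32_t negative = 0u - (u >> 31);                  /* all ones if the sign bit is set */
    u &= ~negative;                                      /* negative input becomes 0.0f */

    uint32_t over = 0u - (uint32_t)(u > 0x3f800000u);    /* all ones if above 1.0f */
    u = (u & ~over) | (0x3f800000u & over);              /* select 1.0f when over */

    memcpy(&f, &u, sizeof f);
    return f;
}
```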

17 Unlock the Frame Rate
- It’s essential that your physics model can run at high refresh rates.
  – At least 100fps
- 30 or 60 fps limits are not acceptable and lead to flat performance on high end hardware
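The slide does not prescribe a mechanism. One common way to keep rendering uncapped is to step the physics at a fixed rate and let frames arrive as fast as the hardware allows. The sketch below assumes a 100 Hz physics step and integer microsecond timing; the `Simulation` type, `simulation_advance` and the toy state are all illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PHYSICS_STEP_US 10000   /* assumed rate: 100 physics updates per second */

typedef struct {
    int64_t accumulator_us;  /* frame time not yet consumed by physics */
    long    steps_run;       /* total physics steps executed so far */
    float   position;        /* toy state: one body moving at constant velocity */
    float   velocity;
} Simulation;

/* Advance the simulation by however long the last frame took.
   Physics moves in fixed steps; rendering is free to run faster or slower. */
void simulation_advance(Simulation *sim, int64_t frame_time_us)
{
    assert(sim != NULL && frame_time_us >= 0);
    sim->accumulator_us += frame_time_us;
    while (sim->accumulator_us >= PHYSICS_STEP_US) {
        sim->position += sim->velocity * (PHYSICS_STEP_US / 1e6f);
        sim->accumulator_us -= PHYSICS_STEP_US;
        sim->steps_run++;
    }
}
```

A 4 ms frame (250 fps) simply renders without running a physics step, so the frame rate is never tied to the simulation rate.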

18 The Value of Batching
Case Specifics:
- The average # of ‘Polys Per Call’ (PPC) to DrawPrimitive was 2.6, producing 40fps
- Removing state changes to raise the average PPC to ~50 produced 58fps
  – Most of the removed state changes were “reasonable”, i.e. not logically redundant
  – The changes did not reduce visual quality at all
  – PPC of 200 is optimal
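To illustrate why grouping by state raises polys per call, here is a small sketch that counts how many draw calls a list needs before and after sorting by texture. The `DrawItem` type and function names are hypothetical, and `texture_id` stands in for any state change that would force a separate DrawPrimitive call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    int texture_id;   /* stands in for any render state that forces a new call */
    int poly_count;
} DrawItem;

static int compare_by_texture(const void *a, const void *b)
{
    const DrawItem *x = a, *y = b;
    return (x->texture_id > y->texture_id) - (x->texture_id < y->texture_id);
}

/* Number of draw calls needed if consecutive items sharing a texture are merged. */
size_t count_draw_calls(const DrawItem *items, size_t n)
{
    size_t calls = 0;
    for (size_t i = 0; i < n; ++i)
        if (i == 0 || items[i].texture_id != items[i - 1].texture_id)
            ++calls;
    return calls;
}

/* Sort by texture so that each texture can be submitted as one batch. */
void sort_for_batching(DrawItem *items, size_t n)
{
    assert(items != NULL || n == 0);
    qsort(items, n, sizeof *items, compare_by_texture);
}
```

With four items alternating between two textures, sorting halves the call count from 4 to 2, so the average polys per call doubles.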

19 Alpha Sort Issues
The “standard” solution is…
1) Draw all non-alpha polys (sort by texture)
2) Draw all alpha polys in back to front order with Z compare enabled and Z update disabled.
This copes with overlapping alpha polys but you can’t sort by texture. (Intersection requires decimation).
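The two-pass ordering above can be sketched as a single sort key. This is a minimal illustration assuming each polygon carries a texture id, a view-space depth and an alpha flag; the `Poly` type and function names are hypothetical, and the Z compare and Z update render states are not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    int   texture_id;
    float view_depth;   /* distance from the camera; larger is farther away */
    int   has_alpha;    /* 0 for opaque, 1 for alpha-blended */
} Poly;

/* Opaque polys first, grouped by texture; then alpha polys, far to near. */
static int compare_draw_order(const void *a, const void *b)
{
    const Poly *x = a, *y = b;
    int xa = !!x->has_alpha, ya = !!y->has_alpha;
    if (xa != ya)
        return xa - ya;
    if (!xa)
        return (x->texture_id > y->texture_id) - (x->texture_id < y->texture_id);
    return (x->view_depth < y->view_depth) - (x->view_depth > y->view_depth);
}

void sort_draw_order(Poly *polys, size_t n)
{
    assert(polys != NULL || n == 0);
    qsort(polys, n, sizeof *polys, compare_draw_order);
}
```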

20 Alpha Sort with Bounding Boxes
When you are ready to draw your alpha polys then draw non-overlapping sets using the sort-by-texture technique as before
[Diagram: bounding boxes A, B and C in the Viewport]
Here, you can safely draw all of A before any of B or C… B&C need sorting
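Deciding which sets are non-overlapping comes down to a rectangle test on their screen-space bounding boxes. The following is a minimal sketch; the `ScreenRect` type and `rects_overlap` are hypothetical names, and the coordinates in the usage note are made up to mirror the A, B, C arrangement on the slide.

```c
#include <assert.h>

typedef struct { float min_x, min_y, max_x, max_y; } ScreenRect;

/* Returns 1 if the two screen-space rectangles share any area, else 0. */
int rects_overlap(ScreenRect a, ScreenRect b)
{
    assert(a.min_x <= a.max_x && a.min_y <= a.max_y);
    assert(b.min_x <= b.max_x && b.min_y <= b.max_y);
    return a.min_x < b.max_x && b.min_x < a.max_x &&
           a.min_y < b.max_y && b.min_y < a.max_y;
}
```

If A overlaps neither B nor C, its alpha polys can be drawn sorted by texture, while B and C, which overlap each other, still need depth sorting together.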

21 Geometry - Part 1
- Use the DX6 Transform and Clip engine - it’ll be nearly as fast as your best efforts
- It takes advantage of CPU specific optimisations done by Intel, AMD etc.
- It uses the guard band clipping region to enhance performance
- Use the DX7 interface ASAP

22 Geometry - Part 2
- This gets you ready for hardware which can do the job much faster than the CPU
- Tell the chip designers if you need anything non-standard
- If you think DX is too slow then use a run-time benchmark to select between DX and your own code

23 Geometry - Part 3
DIPVB()
- Use the DX pipeline for geometry which may be rendered
- Use your own transform for bounding boxes, collisions, portals etc
- Treat hardware T&L as
  – Write only
  – Not necessarily pixel identical to CPU T&L

24 Geometry - Part 4
- Consider choosing between models at game start-up time
- More complex Geometry should be several times more complex
- Introduce some LOD management
- Your artists are probably generating more complex models and then throwing them away

25 Lighting - Part 1
- If the DX Lighting model is good enough then there are people who want to help you
- Multi-texture shadow maps and light maps can be very fast now
  – remember that (multi-pass != multi-texture)
- Tell the chip companies what you need

26 Lighting - Part 2
- Support more lights
- Use a richer set of light types
- Scale with available power
- If you have more complex geometry you get better lighting quality

27 Summary
- Use the D3D pipeline as much as possible
- ‘Use’ the CPU carefully, ‘Abuse’ the fill rate
- Get on board with DX7
- Offer the richest experience possible
- You may have to treat the PC as two distinct platforms, ‘High-end’ and ‘Low-end’

28 Questions?
Richard Huddy
RichardH@nvidia.com
www.nvidia.com

