CSL 859: Advanced Computer Graphics Dept of Computer Sc. & Engg. IIT Delhi.


Adrianne Demo
Skin shader
1,400 instructions per pixel
15 render passes
Five bump maps
Physically-based lighting with sub-surface scattering
Three skin layers with different scattering properties
Complex anisotropic hair shader
Real geometry
GPU-accelerated character skinning
Blendshapes
Sculpt deformers
Skeletal-driven bump maps

Graphics Pipeline
[Figure: pipeline diagram. Labels: Geometry, Transform, Framebuffer, Clip, Setup, Rasterize, Z-test, Light, Texture, Blend, Picture]

Graphics Pipeline
[Figure: pipeline diagram. Labels: Vertex, Connectivity, Textures, Vertex Shader, Framebuffer, Clip & Setup, Rasterize, Fragment Shader, Raster OPs, Primitive Assembly, Texture, Blend, Picture]

Bottlenecks
Too many operations
  Parallelize
Too many memory accesses
  Parallelize
[Figure labels: GEOMETRY OPERATIONS, FRAGMENT OPERATIONS, XBAR, SCREEN TILE (three tiles)]

Parallelization
Distribute computation to processors
  Work allocation
Distribute texture to memory banks
Tile screen pixels into memory banks
Do all processors have access to all memory?
  Distribute access / replicate data

Sorting Taxonomy
Sort first
  Allocate to a processor, which is responsible for only a given area of the screen
Sort middle
  Optimally perform geometry ops and then distribute to the responsible processor
Sort last
  No screen subdivision
  Optimally perform geometry and fragment ops and then compose results
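To make the sort-last case concrete, here is a minimal Python sketch of the final composition step. It assumes each processor has rendered its own share of the primitives into a full-screen buffer of (depth, color) pairs; the function name `composite` and the buffer layout are illustrative assumptions, not something the slides specify.

```python
# Sort-last sketch: every processor renders its own primitives over the
# whole screen, and the partial images are merged with a per-pixel depth test.
# The buffer layout and all names here are illustrative assumptions.

FAR = float("inf")  # depth value meaning "nothing drawn at this pixel"

def composite(buffers):
    """Merge full-screen buffers; each pixel is a (depth, color) tuple."""
    height = len(buffers[0])
    width = len(buffers[0][0])
    result = [[(FAR, None)] * width for _ in range(height)]
    for buf in buffers:
        for y in range(height):
            for x in range(width):
                # Keep the fragment closest to the viewer (smallest depth).
                if buf[y][x][0] < result[y][x][0]:
                    result[y][x] = buf[y][x]
    return result

# Two processors and a 2-pixel-wide, 1-pixel-high screen.
a = [[(0.5, "red"), (FAR, None)]]
b = [[(0.2, "blue"), (0.9, "green")]]
merged = composite([a, b])
print(merged)  # [[(0.2, 'blue'), (0.9, 'green')]]
```

Sort first and sort middle avoid this merge by routing work to the processor that owns the screen area, at the cost of having to know screen positions earlier in the pipeline.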

Memory Considerations
Highly pipelined
  Guard against stalls
Memory bandwidth
  How many accesses per second?
Latency
  Latency hiding buffers
Larger memory atoms
  e.g., 32 byte atoms
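The "how many accesses per second?" question can be answered with back-of-the-envelope arithmetic. The sketch below is only that: the resolution, frame rate, depth complexity, and accesses per fragment are assumed values chosen for illustration, and the helper names are hypothetical.

```python
# Rough memory traffic estimate. All workload numbers below are
# illustrative assumptions, not figures from the lecture.

def accesses_per_second(width, height, fps, depth_complexity, accesses_per_fragment):
    """Memory accesses per second needed to keep the pipeline fed."""
    fragments_per_second = width * height * fps * depth_complexity
    return fragments_per_second * accesses_per_fragment

def atoms_needed(num_bytes, atom_size=32):
    """Memory is transferred in whole atoms, so round up."""
    return (num_bytes + atom_size - 1) // atom_size

# Assumed workload: 1280x1024 at 60 frames/s, each pixel covered 3 times,
# 4 accesses per fragment (e.g. texture read, Z read, Z write, color write).
rate = accesses_per_second(1280, 1024, 60, 3, 4)
print(rate)  # 943718400

# A lone 4-byte pixel still costs one full 32-byte atom ...
print(atoms_needed(4))      # 1
# ... while 8 adjacent, aligned 4-byte pixels share that single atom.
print(atoms_needed(8 * 4))  # 1
```

This is why larger atoms pay off only when neighbouring data is used together: the transfer cost is per atom, not per byte.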

Graphics Architecture: A Brief History
Evans & Sutherland
Ikonas
UNC Chapel Hill
Silicon Graphics
(Mushroom: Smart VGA controllers)
nVIDIA, AMD

IKONAS
32 bit data, 24 bit address bus backbone
Everything memory mapped
Host interface = address registers to access anything on the bus
Frame buffer resolution and timing could be set via control registers
Graphics processor
  (Micro)programmable
  32 bit integer ALU and 16x16 bit integer multiplier
  Address counters, loop counters, and 64 bit instruction word
Plug-in boards
  16 bit graphics processor with 16 pixel-at-once parallel write
  Microprogrammed 16x16 bit matrix multiplier
  Microprogrammed floating point matrix multiplier
  Hardware Z-buffer
  Real-time alpha-blend hardware for two RGB images
  Real-time RGB video frame grabber

IKONAS 1981

Pixel-planes 5 (1989)
Up to 32 GPs (i860) and up to 8 Renderers
2 GPs per board
One 128x128 array per board

Pixel-planes 5 Renderer
One board had 64 mini-chips, each with 2 columns of 128 pixel processors (with memory)

Renderer
64 chips of 256 pixel processing elements (PEs)
Each PE has 208 bits of memory
The chip contains a Quadratic Expression Evaluator (QEE), which evaluates $Ax + By + C + Dx^2 + Exy + Fy^2$ simultaneously at each pixel
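As a rough software analogue of the QEE, the following Python sketch evaluates the quadratic expression at every pixel of a 128x128 array. The hardware does this for all pixels at once; the nested loops here only stand in for that parallelism. The function name `evaluate_qee` and the disc example are illustrative assumptions.

```python
# Software stand-in for the Quadratic Expression Evaluator:
# Q(x, y) = A*x + B*y + C + D*x^2 + E*x*y + F*y^2 at every pixel.
# Names and the example shape are illustrative assumptions.

def evaluate_qee(coeffs, size=128):
    """Return a size x size grid of Q values, indexed as grid[y][x]."""
    A, B, C, D, E, F = coeffs
    return [
        [A * x + B * y + C + D * x * x + E * x * y + F * y * y
         for x in range(size)]
        for y in range(size)
    ]

# Example: a disc of radius 10 centred at (64, 64).
# (x - 64)^2 + (y - 64)^2 - 100 expands to
# -128x - 128y + 8092 + x^2 + y^2, i.e. A=-128, B=-128, C=8092, D=1, E=0, F=1.
values = evaluate_qee((-128, -128, 8092, 1, 0, 1))

# A pixel lies inside the disc where the expression is <= 0.
inside = sum(1 for row in values for v in row if v <= 0)
print(values[64][64])  # -100 (the centre)
print(inside)          # 317 pixels covered
```

With D, E, and F set to zero the same evaluator yields linear expressions, which is enough for per-pixel tests against straight edges.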

Basic Algorithm
Host app transmits model database and new frame requests to the MGP
Screen is divided statically into bins of 128x128 pixels
MGP allocates Renderers to screen regions
MGP broadcasts database commands to all GPs
GPs generate Renderer commands for each primitive
Commands are inserted into the appropriate bins
GPs send the bins round-robin
Renderers send computed pixels to the frame buffer
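The binning steps above can be sketched in a few lines of Python. This is a simplified model under stated assumptions: primitives are represented only by a name and an inclusive pixel bounding box, a primitive goes into every bin its box overlaps, and "round-robin" is read as handing bins to Renderers in rotation. None of the function names come from the Pixel-planes 5 system.

```python
# Binning sketch: the screen is split into 128x128-pixel bins and each
# primitive's command is inserted into every bin its bounding box overlaps.
# Data layout and names are illustrative assumptions.

BIN = 128  # bin edge length in pixels (from the slide)

def bins_for_bbox(xmin, ymin, xmax, ymax, screen_w, screen_h):
    """Return (col, row) for each bin overlapped by an inclusive pixel bbox."""
    xmin, ymin = max(xmin, 0), max(ymin, 0)
    xmax, ymax = min(xmax, screen_w - 1), min(ymax, screen_h - 1)
    return [(c, r)
            for r in range(ymin // BIN, ymax // BIN + 1)
            for c in range(xmin // BIN, xmax // BIN + 1)]

def bin_primitives(prims, screen_w, screen_h):
    """Map each bin to the list of primitive names that touch it."""
    bins = {}
    for name, bbox in prims:
        for key in bins_for_bbox(*bbox, screen_w, screen_h):
            bins.setdefault(key, []).append(name)
    return bins

def assign_round_robin(bin_keys, num_renderers):
    """Hand bins to Renderers in rotation (one reading of 'round-robin')."""
    return {key: i % num_renderers for i, key in enumerate(sorted(bin_keys))}

prims = [("tri0", (10, 10, 100, 100)), ("tri1", (100, 100, 200, 130))]
bins = bin_primitives(prims, 1280, 1024)
print(bins[(0, 0)])  # ['tri0', 'tri1']
print(assign_round_robin(bins, 2))
# {(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 1}
```

A primitive that straddles a bin boundary, like `tri1` here, is processed once per bin it touches, which is the main overhead of screen-space binning.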

SGI RealityEngine
Kurt Akeley, 1993:
The implementation is near-massively parallel, employing 353 independent processors in its fullest configuration, resulting in a measured fill rate of over 240 million antialiased, texture mapped pixels per second. Rendering performance exceeds 1 million antialiased, texture mapped triangles per second.

RealityEngine Architecture
Input FIFO, Command Processor
6, 8, or 12 Geometry Engines
1, 2, or 4 raster boards
5 Fragment Generators (each has a texture replica)
80 Image Engines
1280x1024 framebuffer, 256 bits/pixel

RealityEngine Algorithm
FIFO geometry distributed by CP to GEs
GEs do geometry ops including setup
GEs broadcast triangles to FG (Raster)
Finely interleaved pixel assignment
FG distribute fragments to IE
IEs do raster ops
IEs are the framebuffer
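To illustrate why a finely interleaved pixel assignment balances load, here is a small Python sketch. The slide does not give the actual RealityEngine interleave pattern, so the 4x4 repeating grid of 16 engines used below is purely an assumption, as are the function names.

```python
from collections import Counter

# Interleaving sketch: neighbouring pixels belong to different engines,
# so even a small primitive spreads its fragments across many engines.
# The 4x4 repeating pattern is an assumption, not the real hardware layout.

def owner(x, y, nx=4, ny=4):
    """Index of the engine that owns pixel (x, y)."""
    return (y % ny) * nx + (x % nx)

def load_per_engine(xmin, ymin, xmax, ymax, nx=4, ny=4):
    """Count the pixels each engine receives for an inclusive pixel rectangle."""
    counts = Counter()
    for y in range(ymin, ymax + 1):
        for x in range(xmin, xmax + 1):
            counts[owner(x, y, nx, ny)] += 1
    return counts

print(owner(5, 2))  # 9

# An 8x8-pixel rectangle anywhere on screen touches all 16 engines equally.
load = load_per_engine(10, 20, 17, 27)
print(len(load), set(load.values()))  # 16 {4}
```

With coarse tiles instead, the same 8x8 rectangle could land entirely on one engine, leaving the others idle.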

RealityEngine
[Figure labels: GE, FG, IE]

PC Architecture
[Figure labels: CPU, FSB, North Bridge, MEM BUS, South Bridge, PCI BUS, ATA BUS]
PCI Express (up to 2.5 Gbps bidirectional per lane)
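The per-lane figure can be turned into a usable payload bandwidth with a short calculation. The sketch below assumes first-generation PCI Express, which uses 8b/10b line coding (8 payload bits for every 10 bits on the wire), and treats 2.5 Gbps as the rate in each direction; the function name is hypothetical.

```python
# PCI Express payload bandwidth from the raw per-lane signalling rate.
# Assumes 8b/10b line coding (first-generation PCI Express).

RAW_BITS_PER_LANE = 2_500_000_000  # 2.5 Gbps per lane, taken per direction

def payload_bytes_per_second(lanes, raw_bits_per_lane=RAW_BITS_PER_LANE):
    """Payload bytes per second in one direction for a link of `lanes` lanes."""
    payload_bits = raw_bits_per_lane * 8 // 10  # strip 8b/10b overhead
    return lanes * payload_bits // 8

print(payload_bytes_per_second(1))   # 250000000  (250 MB/s per lane)
print(payload_bytes_per_second(16))  # 4000000000 (4 GB/s for an x16 link)
```

Protocol overhead (packet headers, flow control) reduces the achievable rate further, so these numbers are upper bounds.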

nVIDIA 8800
Process: 90nm
Die Size: 484mm² (681 million transistors), 21.5mm x 22.5mm
Chip Package: Flipchip
Basic Pipeline Config: 32 / 24 / 192 (Textures / Pixels / Z)
Memory Config: 384-bit, 6x 64-bit (GDDR – GDDR4)
System Interconnect: PCI Express x16
FSAA: Multisampling, Supersampling, Coverage samp., Transparency; 2x1/2x2/4x2 (on a 16x16 grid)

Texture
Textures Per Pass: 128
Texture Filtering Methods: Bilinear, Trilinear, 2-16x Anisotropic
Texture Compression: DXTC 1-5, 3Dc+
Fragment Processors: 128x FP32 scalar MADD+MUL
Further Details
API Compliance: DX10.0
System Interconnect: PCI Express x16
Display Pipeline: None (NVIO)
Multi GPU Support: Multi cascade
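The specification entries translate into peak rates once clock speeds are known. The slides give no clocks, so the sketch below uses assumed values (a 1.35 GHz shader clock and a 1.8 GT/s effective memory data rate, figures commonly quoted for the 8800 GTX) purely for illustration; the function names are hypothetical.

```python
# Peak-rate arithmetic from the spec table. The clock and data-rate values
# are assumptions for illustration; they do not appear on the slides.

def peak_flops(processors, flops_per_clock, clock_hz):
    """Peak floating point operations per second."""
    return processors * flops_per_clock * clock_hz

def memory_bandwidth(bus_bits, transfers_per_second):
    """Peak memory bandwidth in bytes per second."""
    return bus_bits // 8 * transfers_per_second

# 128 scalar processors, each issuing a MADD (2 flops) plus a MUL (1 flop).
flops = peak_flops(128, 3, 1_350_000_000)  # assumed 1.35 GHz shader clock
print(flops)  # 518400000000 (518.4 GFLOPS)

# The 384-bit bus is six 64-bit channels.
assert 6 * 64 == 384
bandwidth = memory_bandwidth(384, 1_800_000_000)  # assumed 1.8 GT/s effective
print(bandwidth)  # 86400000000 (86.4 GB/s)
```

Both results are theoretical peaks; sustained rates depend on how well the workload keeps every processor and memory channel busy.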
