From Turing Machine to Global Illumination

Outline
My first computer (CASIO fx3600)
Turing machine and von Neumann architecture
GPU pipeline
Local and global illumination
Shadow and reflection through texture
Programmable GPUs

Calculator vs. Computer
What is the difference between a calculator and a computer? Doesn’t a compute-r just “compute”? The Casio fx3600p calculator can be programmed (38 steps allowed).

Turing Machine
A Turing machine can be adapted to simulate the logic of any computer that could possibly be constructed. The von Neumann architecture implements a universal Turing machine. Look them up on Wikipedia!
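To make the idea concrete, here is a minimal sketch of a single-tape Turing machine simulator in Python. The function name `run_turing_machine`, the rule format, and the binary-increment rule table are all assumptions made for illustration; they are not from the slides.

```python
from collections import defaultdict

def run_turing_machine(rules, tape, state="start", blank="_", max_steps=10_000):
    """Run a single-tape Turing machine and return the non-blank tape contents.

    rules maps (state, symbol) -> (new_symbol, move, new_state), move is -1 or +1.
    The machine halts when no rule matches the current (state, symbol).
    """
    cells = defaultdict(lambda: blank, enumerate(tape))
    head = 0
    for _ in range(max_steps):
        key = (state, cells[head])
        if key not in rules:
            break  # no applicable rule: halt
        new_symbol, move, state = rules[key]
        cells[head] = new_symbol
        head += move
    used = [i for i, s in cells.items() if s != blank]
    if not used:
        return ""
    return "".join(cells[i] for i in range(min(used), max(used) + 1))

# Hypothetical rule table: add 1 to a binary number written on the tape.
INCREMENT = {
    ("start", "0"): ("0", +1, "start"),   # scan right to the end of the number
    ("start", "1"): ("1", +1, "start"),
    ("start", "_"): ("_", -1, "carry"),   # step back onto the last digit
    ("carry", "1"): ("0", -1, "carry"),   # 1 + 1 = 0, keep carrying
    ("carry", "0"): ("1", -1, "done"),    # absorb the carry
    ("carry", "_"): ("1", -1, "done"),    # overflow: grow the number by one digit
}

print(run_turing_machine(INCREMENT, "1011"))
```

The same loop runs any rule table, which is the point of the slide: the "program" is data (the table), and the machinery that interprets it stays fixed.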

Simplified View
The data flow: 3D polygons (+ colors, lights, normals, texture coordinates, etc.) -> Transform (& Lighting) -> 2D polygons -> Rasterization -> 2D pixels (i.e., output images)
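As a rough illustration of the transform step in this data flow, the sketch below maps one camera-space vertex to pixel coordinates. It assumes a pinhole camera looking down the negative z axis and ignores aspect ratio and clipping; the function name `project_vertex` and its default parameters are hypothetical.

```python
def project_vertex(x, y, z, focal=1.0, width=640, height=480):
    """Map a camera-space point (camera looks down -z) to pixel coordinates."""
    if z >= 0:
        raise ValueError("point must be in front of the camera (z < 0)")
    # Perspective divide: 3D point -> 2D point on the image plane.
    ndc_x = focal * x / -z
    ndc_y = focal * y / -z
    # Viewport mapping: [-1, 1] -> pixels, with y flipped so row 0 is the top.
    px = (ndc_x + 1.0) * 0.5 * width
    py = (1.0 - ndc_y) * 0.5 * height
    return px, py

# A point straight ahead of the camera lands at the image center.
print(project_vertex(0.0, 0.0, -5.0))
```

Rasterization then fills in the pixels covered by the 2D polygon whose corners were produced this way.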

Global Effects
Shadow, multiple reflection, translucent surface

Local vs. Global

How Does GPU Draw This?

Quiz
Q1: A straightforward GPU pipeline gives us local illumination only. Why?
Q2: What typical effects are missing?
Hint: How is an object drawn? Does the pipeline consider its relationship with other objects?
Shadow, reflection, and refraction…

Wait, but I’ve seen shadow and reflection in games before…
Figure: with shadows vs. without shadows

Adding “Memory” to the GPU Computation
Modern GPUs allow the use of multiple textures and rendering algorithms that use multiple passes.
Pipeline stages shown: Transform (& Lighting), Rasterization, Textures

Faked Global Illumination
Shadow, reflection, BRDF, etc. In theory, real global illumination is not possible in the current graphics pipeline: it is conceptually a loop over individual polygons, with no interaction between polygons. Can this be changed by multi-pass rendering?
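The "loop over individual polygons" can be sketched as follows. This is a minimal illustration that assumes simple Lambert (diffuse) shading with one directional light; the function names are hypothetical.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def lambert(normal, light_dir):
    """Diffuse term max(0, N.L), using only this polygon's normal and the light."""
    n = normalize(normal)
    l = normalize(light_dir)
    return max(0.0, sum(a * b for a, b in zip(n, l)))

def shade_scene(polygon_normals, light_dir):
    # One independent iteration per polygon. Nothing here looks at any other
    # polygon, so an occluder between the light and a surface cannot darken it.
    return [lambert(n, light_dir) for n in polygon_normals]

# Three polygons: facing the light, facing away, and edge-on.
print(shade_scene([(0, 0, 1), (0, 0, -1), (1, 0, 0)], (0, 0, 1)))
```

Because each polygon is shaded from its own data alone, shadows and inter-reflections need extra information, such as a texture produced by an earlier pass.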

Case Study: Shadow Map
Using two textures: color and depth.
Relatively straightforward design using pixel (fragment) shaders on GPUs.

Figure panels: Eye’s View, Light’s View, Depth/Shadow Map. Image source: Cass Everitt et al., “Hardware Shadow Mapping,” NVIDIA SDK White Paper

Basic Steps of Shadow Maps
1. Render the scene from the light’s point of view.
2. Use the light’s depth buffer as a texture (shadow map).
3. Projectively texture the shadow map onto the scene.
4. Use the “texture color” (comparison result) in fragment shading.
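The two passes can be sketched on the CPU as below. This is a simplified illustration, not the GPU implementation from the white paper: it assumes a directional light shining straight down (so the "projection" into the shadow map is just dropping the y coordinate), a dictionary standing in for the depth texture, and hypothetical names such as `LIGHT_HEIGHT`, `build_shadow_map`, and `is_lit`.

```python
LIGHT_HEIGHT = 10.0  # assumed height of the light plane above the scene

def texel(p, cell=1.0):
    """Shadow-map texel that a point (x, y, z) falls into, as seen from the light."""
    x, _, z = p
    return int(x // cell), int(z // cell)

def light_depth(p):
    """Distance from the light plane to the point, measured along the light direction."""
    return LIGHT_HEIGHT - p[1]

def build_shadow_map(scene_points):
    # Pass 1: "render" from the light, keeping the nearest depth per texel.
    shadow_map = {}
    for p in scene_points:
        key = texel(p)
        d = light_depth(p)
        if d < shadow_map.get(key, float("inf")):
            shadow_map[key] = d
    return shadow_map

def is_lit(p, shadow_map, bias=1e-3):
    # Pass 2: look up the stored depth and compare it with this point's depth.
    nearest = shadow_map.get(texel(p), float("inf"))
    return light_depth(p) <= nearest + bias

ground_under_box = (0.5, 0.0, 0.5)
box_top = (0.5, 3.0, 0.5)
open_ground = (2.5, 0.0, 0.5)
shadow_map = build_shadow_map([ground_under_box, box_top, open_ground])
print(is_lit(ground_under_box, shadow_map), is_lit(open_ground, shadow_map))
```

The small `bias` guards against a surface shadowing itself because of rounding in the stored depth, a common precaution in shadow-map comparisons.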

PC Graphics Architecture
Two buses on a PC: the System Bus (CPU-Memory) and the Peripheral I/O Bus. Before AGP, there was only a narrow path (the I/O bus) between main memory and graphics memory (for frame buffer, Z buffer, texture, vertex data, etc.). AGP and PCI-e speed up the link between the host PC and the graphics processor (GPU).

Source: http://www.karbosguide.com/hardware/module2d03a.htm

NVIDIA GeForce 6800

NVIDIA GeForce 8800

NVIDIA Fermi (GeForce 400 and 500 Series)
From NVIDIA Fermi Architecture Whitepaper

How to Program a GPU?
Writing a 3D graphics application program: typically in DirectX or OpenGL; still CPU programming in C/C++; the APIs and drivers do the dirty work for you.
Writing GPU shaders: typically in GLSL or Cg; still drawing 3D objects; working like plug-ins to the 3D rendering pipeline.

GPGPU: General-purpose GPU computing
No longer restricted to graphics applications; the goal is to utilize the abundant “GFLOPs” in the GPU. It could be implemented in GPU shaders by clever transformation of problem domains, with textures storing the data structures. However, shaders could not perform memory writes with calculated addresses (a.k.a. scatter operations).
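The difference between the read pattern that shaders could do (gather, like a texture fetch at a computed coordinate) and the write pattern they could not do (scatter) can be shown with plain lists. This is only an illustration of the two access patterns; the function names are hypothetical.

```python
def gather(src, indices):
    """out[i] = src[indices[i]]: each output reads from a computed address."""
    return [src[j] for j in indices]

def scatter(values, indices, size, fill=0):
    """out[indices[i]] = values[i]: each input writes to a computed address."""
    out = [fill] * size
    for v, j in zip(values, indices):
        out[j] = v
    return out

print(gather([10, 20, 30, 40], [3, 0, 0, 2]))
print(scatter([10, 20, 30], [2, 0, 3], size=4))
```

In a fragment shader the output location is fixed by the pixel being rasterized, so only the gather form maps directly onto it; algorithms that need scatter had to be reformulated.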

GPU as a Parallel Computing Platform
Treating GPUs as parallel machinery: not quite the same as a shared-memory multi-processor, with a special kind of memory hierarchy.
NVIDIA CUDA: widely adopted in real-world applications.
OpenCL: for non-NVIDIA GPUs and multi-core CPUs.

Branch Divergence on GPU
½ performance for each branch!

    if x1 - x0 > y1 - y0:
        xMajorIteration()
    else:
        yMajorIteration()

GPUs group pixels into “warps” or “wavefronts” of adjacent pixels, each of which runs on a single core of a superscalar processor. Here we assume the ray is in the first quadrant of the plane. Green threads represent rays whose slope in screen space is less than one, and blue threads have a slope greater than one.

At a statement where pixels within a group branch in different directions, the GPU must compute both sides of the branch, issuing no operation to a portion of the vector lanes for each side. This is the case even if only a single thread in a warp takes a particular branch. So, while branch instructions themselves are often relatively expensive compared to arithmetic instructions, their most important cost is that divergent execution reduces throughput by one half for each nested branch.

Note that this code handles only the first quadrant; a naïve implementation needs two more nested branches to handle all four quadrants, and more branches to handle clipping to the viewport. The total is an 8x reduction in performance for handling all cases, plus overhead for the clipping branches against the viewport. We rederive the algorithm to make it branchless except for the actual hit detection.
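The arithmetic behind the "one half per nested branch" rule can be sketched with a toy model. It assumes the worst case in which every nested branch splits the warp, and a warp size of 32; `simd_utilization` and `passes_for_branch` are hypothetical names, and the model ignores everything else a real GPU does.

```python
def simd_utilization(nested_divergent_branches):
    """Fraction of lanes doing useful work if every nested branch splits the warp."""
    return 1.0 / (2 ** nested_divergent_branches)

def passes_for_branch(lane_takes_if_side):
    """Number of sides of one if/else the warp must issue (1 if uniform, 2 if divergent)."""
    lanes = list(lane_takes_if_side)
    passes = 0
    if any(lanes):
        passes += 1  # "if" side issued; lanes on the other side are masked off
    if not all(lanes):
        passes += 1  # "else" side issued; lanes on the "if" side are masked off
    return passes

print(simd_utilization(3))                      # three nested branches
print(passes_for_branch([True] * 32))           # uniform warp
print(passes_for_branch([True] * 31 + [False])) # one lane disagrees
```

Three nested divergent branches give 1/8 utilization, matching the 8x figure quoted for handling all four quadrants, and a single disagreeing lane is enough to make the whole warp pay for both sides.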

Examples

GPU Shading Effects
Reflection and refraction; relief on surface; ambient occlusion and lighting

Real-Time Rendering of Splashing Water
Particle system simulation for real-time interaction with terrains and dynamic objects. Reconstruction of the splash surface with 2D metaballs
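For readers unfamiliar with metaballs, the sketch below shows one common 2D formulation: each particle contributes a field of r²/d², and the surface is where the summed field crosses a threshold. The slides do not say which field function or threshold was used, so the formula, the threshold of 1.0, and the function names here are assumptions.

```python
def metaball_field(x, y, balls):
    """Sum of r^2 / d^2 over all balls; each ball is (cx, cy, r)."""
    total = 0.0
    for cx, cy, r in balls:
        d2 = (x - cx) ** 2 + (y - cy) ** 2
        if d2 == 0.0:
            return float("inf")  # exactly at a ball center
        total += (r * r) / d2
    return total

def inside_surface(x, y, balls, threshold=1.0):
    return metaball_field(x, y, balls) >= threshold

one_ball = [(0.0, 0.0, 1.0)]
two_balls = [(0.0, 0.0, 1.0), (2.5, 0.0, 1.0)]
# The midpoint is outside a single ball but inside the blended pair.
print(inside_surface(1.25, 0.0, one_ball), inside_surface(1.25, 0.0, two_balls))
```

The blending between nearby particles is what lets a cloud of discrete splash particles read as a continuous water surface.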

Ray Tracing on GPU
Using OpenCL or NVIDIA CUDA, or use NVIDIA OptiX
