Tesla Architecture
NVIDIA's entry into high-performance computing and general-purpose GPU (GPGPU) computing
Presented by: KHENE Soraya, ZEMOUCHI Nafila Raihana
M2 HPC, 2024/2025

Table of contents
01 Introduction
02 Definitions
03 Architecture
04 Architecture improvements
05 GPU Series implementing Tesla
06 Conclusion

Introduction 01

Introduction
Pre-Tesla era:
● GPUs were built around a graphics-only pipeline: first fixed-function, later with separate programmable vertex and pixel shaders, but not suited to general-purpose computing.
● Curie architecture: the predecessor of Tesla, aimed primarily at gaming and graphics.
Tesla architecture (2006):
● Shifted GPUs toward general-purpose computing (GPGPU).
● Named after Nikola Tesla, a pioneering electrical engineer.
● Introduced CUDA for parallel processing.
Applications: designed for scientific research, machine learning, and HPC.
Impact: opened new possibilities for solving complex computational problems.

Definitions 02

Definitions: Shader
A program that runs on the GPU to perform graphics-related tasks.
Vertex shader:
● Processes each vertex of a shape.
● Transforms 3D positions into 2D screen coordinates.
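To make the vertex-shader transformation concrete, here is a minimal Python sketch of a 3D-to-2D projection. It is only an illustration: the pinhole camera model, the focal length, the 640x480 viewport and the function name `project_vertex` are assumptions, not something taken from the slides or from a real shader API.

```python
def project_vertex(x, y, z, focal=1.0, width=640, height=480):
    """Map a camera-space vertex (z > 0 is in front of the camera) to pixel coordinates.

    All parameters besides the vertex itself are illustrative assumptions.
    """
    if z <= 0:
        raise ValueError("vertex must be in front of the camera (z > 0)")
    # Perspective divide: points farther away land closer to the image centre.
    ndc_x = focal * x / z
    ndc_y = focal * y / z
    # Viewport transform: normalised range [-1, 1] -> pixel grid, with y pointing down.
    px = (ndc_x + 1.0) * 0.5 * width
    py = (1.0 - ndc_y) * 0.5 * height
    return px, py
```

A vertex on the camera axis, such as `project_vertex(0, 0, 5)`, lands at the centre of the viewport. A real vertex shader would typically apply full model, view and projection matrices instead of this single divide.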

Definitions: Shader
A program that runs on the GPU to perform graphics-related tasks.
Pixel shader:
● Processes each fragment (pixel candidate).
● Takes interpolated output from the vertex shader as input and outputs the fragment's color.
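As a rough illustration of how a fragment's color can be derived from vertex data, the sketch below blends three per-vertex colors with barycentric weights. The function name `shade_fragment` and the choice of plain color interpolation are assumptions made for this example; real pixel shaders can run arbitrary programs.

```python
def shade_fragment(bary, vertex_colors):
    """Blend three per-vertex RGB colors with barycentric weights (w0 + w1 + w2 == 1)."""
    w0, w1, w2 = bary
    c0, c1, c2 = vertex_colors
    # Each output channel is the weighted sum of the same channel at the three vertices.
    return tuple(w0 * a + w1 * b + w2 * c for a, b, c in zip(c0, c1, c2))
```

A fragment sitting exactly on a vertex (weights `(1, 0, 0)`) simply takes that vertex's color; fragments inside the triangle get a smooth mix.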

Definitions Graphics pipeline

Definitions
OpenGL:
● Cross-platform, open-standard API for rendering 2D and 3D graphics.
● Widely used in gaming, simulations, and CAD software.
● First released on June 30, 1992.
DirectX:
● Microsoft’s proprietary API for Windows and Xbox platforms.
● Includes tools for 3D graphics, multimedia, and gaming.
● First released on September 30, 1995.

Architecture 03

Early 1990s (pre-GPU)

● In the first phase, from the early 1980s to the late 1990s, graphics were rendered entirely on CPUs.
● Interactive software rendering (no GPUs yet), for example Wolfenstein 3D (1992) and Doom (1993).

First era - Fixed function pipeline

GPU origins
● The term "GPU" was introduced by NVIDIA in 1999 with the GeForce 256 (Celsius architecture).
● Integrates transform, lighting, triangle manipulation, and rendering on a single chip.
● Processes at least 10 million polygons per second.
● Supports the OpenGL and Direct3D APIs.

First era - Fixed function pipeline
Characteristics
● Predefined tasks: the graphics pipeline is hardwired for specific operations.
● Limited flexibility: developers could adjust parameters but not algorithms.
● Use of OpenGL/Direct3D: developers could configure effects like lighting and textures, but only within hardware limits.
● Triangle-based rendering: objects are drawn as triangles; smaller triangles improve image quality.

First era - Fixed function pipeline A fixed-function NVIDIA GeForce graphics pipeline.

First era - Fixed function pipeline
A fixed-function NVIDIA GeForce graphics pipeline.
Host interface:
● Receives commands and data from the CPU via APIs (e.g., DirectX, OpenGL).
● The data includes coordinates, textures, and colors.

First era - Fixed function pipeline
A fixed-function NVIDIA GeForce graphics pipeline.
Vertex control stage:
● Receives and formats triangle data for GPU processing.
● Prepares data for the next stages.

First era - Fixed function pipeline
A fixed-function NVIDIA GeForce graphics pipeline.
Vertex shading, transform, and lighting (VS/T&L):
● Transforms vertices and assigns per-vertex values (e.g., colors).
● Computes lighting to simulate the interaction of light with objects.
● Uses computationally intensive mathematical operations.
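The per-vertex lighting step can be sketched with the classic Lambertian diffuse term, where brightness depends on the angle between the surface normal and the direction to the light. This is a simplified model chosen for illustration; the slides do not say which lighting equations the hardware used, and the names `normalize` and `lambert_diffuse` are hypothetical.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def lambert_diffuse(normal, to_light, base_color):
    """Per-vertex diffuse color: base_color scaled by max(0, N . L)."""
    n = normalize(normal)
    l = normalize(to_light)
    # Surfaces facing away from the light receive no diffuse contribution.
    intensity = max(0.0, sum(a * b for a, b in zip(n, l)))
    return tuple(intensity * c for c in base_color)
```

A vertex whose normal points straight at the light keeps its full base color, while one facing away goes black. The square root and dot products per vertex are an example of the math-heavy work this stage performs.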

First era - Fixed function pipeline
Summary
The graphics card pipeline with its three most important stages: the vertex shader, the geometry shader, and the rasterizer/pixel shader.

First era - Fixed function pipeline
Summary
The pipeline is run for millions of triangles.

Second era - Programmable pipeline (shader programs): 2000-2003

Second era - Programmable pipeline (shader programs): 2000-2003
GeForce 3 (2001) - Kelvin architecture
● Key feature: introduced programmable shaders, allowing programmers to customize parts of the graphics pipeline.
● Programmable pipeline: enabled sending vertex programs (shaders) that process data within the pipeline.
● Specs:
○ 4 pixel shaders
○ 1 vertex shader
○ 8 texture mapping units (TMUs)
○ 4 raster operations units (ROPs)

Second era - Programmable pipeline (shader programs): 2000-2003
Pipeline view: fixed-function pipeline vs. programmable pipeline.
An example of separate vertex processor and fragment processor in a programmable graphics pipeline.

Second era - Programmable pipeline (shader programs): 2000-2003
Architecture view: GeForce 6800 (GeForce 6 Series) - NV40, Curie architecture
Key feature: significant advancement in graphics capabilities with increased programmability.
Specs:
● 6 vertex shaders
● 16 pixel shaders
● 16 texture mapping units
● 16 raster operations units

Second era - Programmable pipeline (shader programs): 2000-2003
Problem
● Frames with many vertices (detailed geometry) require more vertex shaders.
● Frames whose primitives cover many pixels require more pixel shaders.
⇒ With a fixed split between vertex and pixel shaders, the unequal task distribution leads to inefficient hardware usage.

Second era - Programmable pipeline (shader programs): 2000-2003
Solution: Unified Shader Architecture
● Geometry, pixel, and vertex shaders run on the same cores, enabling flexibility.

Second era - Programmable pipeline (shader programs): 2000-2003
Solution: benefits
● Eliminates idle shader cores.
● Dynamic allocation of shader tasks ensures better GPU utilization.
● Programmer control.
● Better efficiency and performance.
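The utilization argument can be checked with a toy model. The sketch below compares a fixed split of vertex and pixel units against a unified pool of the same total size. It assumes, purely for illustration, that every work item costs one cycle on any unit; the function names and the workload numbers are made up for the example.

```python
import math


def fixed_cycles(vertex_work, pixel_work, vertex_units, pixel_units):
    """Cycles needed when vertex and pixel work can only run on their own unit type."""
    return max(math.ceil(vertex_work / vertex_units),
               math.ceil(pixel_work / pixel_units))


def unified_cycles(vertex_work, pixel_work, total_units):
    """Cycles needed when any unit can run any kind of work."""
    return math.ceil((vertex_work + pixel_work) / total_units)


def utilization(total_work, units, cycles):
    """Fraction of available unit-cycles that did useful work."""
    return total_work / (units * cycles)
```

With a 6 + 16 split (the unit counts quoted for the GeForce 6800) and a hypothetical geometry-heavy frame of 600 vertex items and 160 pixel items, the fixed design needs 100 cycles while a unified pool of 22 units needs 35, so utilization rises from roughly 35% to roughly 99% in this toy model.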

Third era - Fully programmable GPU : The birth of Tesla

Third era - Fully programmable GPU: The birth of Tesla
● Launched in November 2006 with the GeForce 8800 GPU.
● Non-traditional pipeline: breaks from the traditional graphics pipeline structure.
● Unified processing: combines vertex and pixel processors for high-performance parallel computing (GPGPU).
● CUDA support: introduced the Compute Unified Device Architecture (CUDA), enabling GPU programming in C.
● Scalable design:
○ Processor array: a scalable Streaming Processor Array (SPA) handles all programmable calculations.
○ Memory system: includes external DRAM control and ROPs for efficient frame-buffer operations (color and depth).

Third era - Fully programmable GPU: The birth of Tesla
Architecture view: GeForce 8800 GPU
128 streaming-processor (SP) cores organized as 16 streaming multiprocessors (SMs) in 8 independent processing units called texture/processor clusters (TPCs).
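The per-level counts follow directly from the three totals on this slide. The short sketch below derives them; the function name `describe_hierarchy` is hypothetical and the defaults are simply the GeForce 8800 figures quoted above.

```python
def describe_hierarchy(total_sps=128, total_sms=16, total_tpcs=8):
    """Derive how many SPs sit in each SM and how many SMs sit in each TPC."""
    if total_sps % total_sms or total_sms % total_tpcs:
        raise ValueError("totals must divide evenly across the hierarchy")
    sps_per_sm = total_sps // total_sms
    sms_per_tpc = total_sms // total_tpcs
    return {
        "sps_per_sm": sps_per_sm,
        "sms_per_tpc": sms_per_tpc,
        "sps_per_tpc": sps_per_sm * sms_per_tpc,
    }
```

For the default values this gives 8 SPs per SM, 2 SMs per TPC and 16 SPs per TPC.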

Third era - Fully programmable GPU: The birth of Tesla
Architecture view: GeForce 8800 GPU
Host interface:
● Communicates with the CPU.
● Responds to commands from the CPU and fetches data from system memory.
● Checks command consistency and performs context switching (between kernels).

Third era - Fully programmable GPU: The birth of Tesla
Architecture view: GeForce 8800 GPU

Third era - Fully programmable GPU: The birth of Tesla
Architecture view: GeForce 8800 GPU
Work distribution unit:
● Forwards kernel parameters, such as grid size and block size, to the processor array.
● Prepares the TPCs.
Work distribution mechanisms:
● Vertex/compute work: delivered in a round-robin scheme.
● Pixel work: distributed based on pixel location.
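The two distribution schemes can be sketched as follows. The round-robin function mirrors the scheme named on the slide. The pixel mapping, by contrast, is a hypothetical stand-in: the slides only say that pixel work is distributed by location, so the 16-pixel tile size and the interleaving formula in `tpc_for_pixel` are assumptions for illustration.

```python
def round_robin(work_items, num_tpcs=8):
    """Assign items to TPC queues in arrival order, cycling through the TPCs."""
    queues = [[] for _ in range(num_tpcs)]
    for i, item in enumerate(work_items):
        queues[i % num_tpcs].append(item)
    return queues


def tpc_for_pixel(x, y, num_tpcs=8, tile=16):
    """Hypothetical location-based mapping: screen tiles are interleaved across TPCs.

    All pixels in the same tile map to the same TPC; neighbouring tiles map to
    different TPCs. The real hardware mapping is not described in the slides.
    """
    tile_x, tile_y = x // tile, y // tile
    return (tile_x + tile_y) % num_tpcs
```

Round-robin suits vertex and compute work because any TPC can take any item, whereas a location-based rule keeps all fragments of one screen region on the same cluster.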

Third era - Fully programmable GPU: The birth of Tesla
Architecture view: GeForce 8800 GPU
Streaming Processor Array (SPA):
● A set of texture/processor clusters (TPCs).
● Number of TPCs:
○ 1 TPC in small GPUs.
○ 8 or more TPCs in high-performance GPUs.

Third era - Fully programmable GPU: The birth of Tesla
Architecture view: GeForce 8800 GPU
Texture/Processor Cluster (TPC):
● TPCs receive work from the different work distribution units.

Third era - Fully programmable GPU: The birth of Tesla
Architecture view: GeForce 8800 GPU
Texture/Processor Cluster (TPC): the SM Controller (SMC)
● Manages and coordinates work across the Streaming Multiprocessors (SMs) of its TPC.
● Controls access to shared resources such as the texture unit.
● Packs incoming work into warps of 32 threads.
● Can serve three graphics workloads at the same time: vertex processing, geometry processing, and pixel processing.
● Each type of input data (vertex, geometry, pixel) has its own independent I/O path, but the SMC is in charge of load balancing and resource allocation between them.

Third era - Fully programmable GPU: The birth of Tesla
Architecture view: GeForce 8800 GPU
Texture/Processor Cluster (TPC): texture unit
● Processes 1 group of 4 threads (vertex, geometry, pixel, or compute) per cycle.
● Inputs: texture coordinates.
● Outputs: filtered samples (e.g., RGBA color).
● 4 address generators and 8 filter units.
● Streams hits and misses without stalling.
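To illustrate what "texture coordinates in, filtered sample out" means, here is a minimal bilinear filtering sketch in Python. It is a software model only: the function name `bilinear_sample`, the clamping at the edges, and the simple mapping of coordinates onto texel indices are assumptions, and the slides do not specify which filter modes the hardware units implement.

```python
import math


def bilinear_sample(texture, u, v):
    """Return a filtered RGBA tuple for normalised coordinates u, v in [0, 1].

    texture is a list of rows, each row a list of RGBA tuples.
    """
    height, width = len(texture), len(texture[0])
    # Map normalised coordinates to texel space, clamping at the edges.
    x = min(max(u, 0.0), 1.0) * (width - 1)
    y = min(max(v, 0.0), 1.0) * (height - 1)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0

    def lerp(a, b, t):
        # Channel-wise linear interpolation between two texels.
        return tuple(p + (q - p) * t for p, q in zip(a, b))

    top = lerp(texture[y0][x0], texture[y0][x1], fx)
    bottom = lerp(texture[y1][x0], texture[y1][x1], fx)
    return lerp(top, bottom, fy)
```

Sampling halfway between a black and a white texel returns mid-grey, which is the kind of filtered RGBA value the slide lists as the unit's output.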

Third era - Fully programmable GPU: The birth of Tesla
Architecture view: GeForce 8800 GPU
Texture/Processor Cluster (TPC): two Streaming Multiprocessors (SMs)
An SM is a unified multiprocessor that handles both graphics and computing tasks. It executes:
● Vertex, geometry, and pixel-fragment shader programs.
● Parallel computing programs.

Third era - Fully programmable GPU: The birth of Tesla
Architecture view: GeForce 8800 GPU
Streaming Multiprocessor (SM):
● 8 streaming processor (SP) cores per SM.
● 2 special function units (SFUs): handle transcendental functions and attribute interpolation; fully pipelined, interpolating 4 samples per cycle.
● Multithreaded instruction fetch/issue unit (MT Issue).
● Instruction cache.
● Read-only constant cache.
● 16 KB read/write shared memory for input buffers or shared parallel data.
● Low-latency interconnect for efficient communication between the SPs and shared memory.

Third era - Fully programmable GPU: The birth of Tesla
Architecture view: GeForce 8800 GPU
Streaming Multiprocessor (SM), simplified:

Third era - Fully programmable GPU: The birth of Tesla
Architecture view: GeForce 8800 GPU
Streaming Multiprocessor (SM): the Streaming Processor (SP) is the primary thread processor in the SM.

Third era - Fully programmable GPU: The birth of Tesla
Architecture view: GeForce 8800 GPU
Streaming Multiprocessor (SM): SIMT architecture overview
● SIMT multithreading:
○ Executes threads in groups of 32 (warps).
○ 24 warps per SM, totaling 768 threads.
○ Each warp executes independently, with possible branching.
● Thread execution:
○ Threads in a warp start together but may branch independently.
○ A SIMT instruction is broadcast to the parallel threads; inactive threads are disabled during branch divergence.
● Efficient execution:
○ Full efficiency when all threads in a warp take the same path.
○ A branch synchronization stack manages divergence and convergence.
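Branch divergence can be modelled in a few lines of Python. The sketch below runs an if/else over one warp of 32 lanes using an active mask, and counts how many times the warp has to issue a branch body. It is a conceptual model, not hardware behaviour in detail: the function name `run_branch_on_warp` is hypothetical and the branch synchronization stack is reduced to two sequential passes.

```python
WARP_SIZE = 32


def run_branch_on_warp(values, predicate, then_fn, else_fn):
    """Execute `then_fn(v) if predicate(v) else else_fn(v)` across one warp.

    Returns (results, passes). passes is 1 when all lanes agree on the branch
    and 2 when the warp diverges and both paths must be issued in turn.
    """
    if len(values) != WARP_SIZE:
        raise ValueError("a warp holds exactly 32 threads")
    taken = [predicate(v) for v in values]  # active mask for the 'then' path
    results = list(values)
    passes = 0
    if any(taken):
        passes += 1  # issue the 'then' body; lanes with taken == False are disabled
        for lane in range(WARP_SIZE):
            if taken[lane]:
                results[lane] = then_fn(values[lane])
    if not all(taken):
        passes += 1  # issue the 'else' body; lanes with taken == True are disabled
        for lane in range(WARP_SIZE):
            if not taken[lane]:
                results[lane] = else_fn(values[lane])
    return results, passes
```

When every lane takes the same path the warp issues one pass; when even and odd lanes disagree it issues two, which is why uniform branching gives full efficiency.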

Third era - Fully programmable GPU: The birth of Tesla
Architecture view: GeForce 8800 GPU
Streaming Multiprocessor (SM): SIMT warp scheduling
● Warp types: vertex, geometry, pixel, or compute threads.
● Scheduler: operates at half of the 1.5 GHz processor clock rate.
● Concurrent execution: supports executing different warp types simultaneously (e.g., vertex and pixel).
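The figures on the SIMT slides can be cross-checked with a few lines of arithmetic. The helper names below are hypothetical; the default values are the ones quoted in the slides (24 warps of 32 threads per SM, 16 SMs, a 1.5 GHz processor clock).

```python
def sm_thread_capacity(warps_per_sm=24, threads_per_warp=32):
    """Maximum number of threads resident on one SM."""
    return warps_per_sm * threads_per_warp


def gpu_thread_capacity(num_sms=16, warps_per_sm=24, threads_per_warp=32):
    """Maximum number of threads resident on the whole GPU."""
    return num_sms * sm_thread_capacity(warps_per_sm, threads_per_warp)


def scheduler_clock_ghz(processor_clock_ghz=1.5):
    """The warp scheduler runs at half the processor clock."""
    return processor_clock_ghz / 2
```

This gives 768 threads per SM, 12,288 threads across 16 SMs, and a 0.75 GHz scheduler clock.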

Third era - Fully programmable GPU: The birth of Tesla
Architecture view: GeForce 8800 GPU
Streaming Multiprocessor (SM): SIMT instruction set
● Scalar instructions: simpler and compiler-friendly (floating-point, integer, bit operations, etc.).
● Vector instructions: texture instructions remain vector-based.
● Transcendental functions: cosine, sine, exponential, logarithm, etc.

Third era - Fully programmable GPU: The birth of Tesla
Architecture view: Memory
SP-level memory:
○ Local storage for temporary results, provided by a 32x32 register file.

Third era - Fully programmable GPU: The birth of Tesla
Architecture view: GeForce 8800 GPU
SM-level memory:
○ Instruction cache
○ Constant cache
○ Shared memory

Third era - Fully programmable GPU: The birth of Tesla
Architecture view: GeForce 8800 GPU
Cluster-level memory:
○ Texture memory
○ L2 texture cache
○ L2 constant cache

Third era - Fully programmable GPU: The birth of Tesla
Architecture view: GeForce 8800 GPU
Memory: synchronization and coalescing
● Barrier synchronization:
○ A fast barrier instruction synchronizes the threads of a block running on an SM that communicate through shared and global memory; in CUDA this is __syncthreads().
● Memory coalescing:
○ Addresses that are aligned and fall within the same memory block are merged into a single memory access.
○ Significantly boosts effective memory bandwidth and reduces overhead.
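The effect of coalescing can be estimated with a simplified model that counts how many aligned memory segments a group of threads touches. This is a sketch under stated assumptions: the 64-byte segment size, the group of 16 threads used in the examples, and the function name `count_transactions` are illustrative, and the actual coalescing rules of Tesla-generation hardware are stricter than "same segment".

```python
def count_transactions(addresses, segment_bytes=64):
    """Count the distinct aligned segments touched by a group of byte addresses.

    Each distinct segment is modelled as one memory transaction.
    """
    return len({addr // segment_bytes for addr in addresses})
```

Sixteen threads reading consecutive 4-byte words from an aligned base, as in `count_transactions([4 * i for i in range(16)])`, fit in a single segment. The same threads reading with a 64-byte stride touch sixteen segments, and a base that is misaligned by 32 bytes spills into a second segment.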

Architecture improvements

GPU Series implementing Tesla: GPUs for consumer graphics

GPU Series implementing Tesla: GPUs for professional graphics
CERIST’s IbnBadis cluster has a visualisation node equipped with a Quadro 4000 GPU.

GPU Series implementing Tesla: GPUs for High Performance Computing
The Tesla architecture should not be confused with the Tesla product series: not all Tesla GPUs implement the Tesla architecture. For example, the Tesla V100 uses the Volta architecture.

Conclusion Thank You!
