Best Practices for Multi-threading
Eric Young, Developer Technology

Presentation transcript:

Copyright © NVIDIA Corporation 2004

Overview
- Benefits of Dual-Core
- NVIDIA Multi-threaded Driver
- OpenGL
- Direct3D
- Dual-Core Performance Gains
- Summary

Benefits of Dual-Core
- NVIDIA multi-threaded driver
  - Producer & Consumer threads dispatch commands to GPU
  - Improves parallelism between GPU/CPU
  - Performance gains from 10 to 40%
- Application has more computation power available!
  - Handles even more complex math, physics, and AI
- Add Dual-Core optimizations to your application, and you can achieve up to 2X the performance
- Best combination: application has Dual-Core optimizations and runs with the multi-threaded driver!
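
The producer/consumer arrangement mentioned above can be pictured with a small sketch. This is not NVIDIA's driver code; it is a minimal illustration that assumes a simple mutex-protected queue, and the names Command, CommandQueue and run_pipeline are hypothetical. The application thread (producer) enqueues commands and returns immediately, while a second thread (consumer) drains the queue, which is where the extra CPU/GPU parallelism would come from.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical stand-in for a GPU command; real drivers use their own formats.
struct Command {
    int id;
};

// Thread-safe FIFO: the application thread produces, a worker thread consumes.
class CommandQueue {
public:
    void push(Command c) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(c);
        }
        cv_.notify_one();
    }

    // Signals that no more commands will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Blocks until a command is available; returns false once closed and drained.
    bool pop(Command& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return false;
        out = q_.front();
        q_.pop();
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Command> q_;
    bool closed_ = false;
};

// Runs one producer and one consumer; returns the ids in the order consumed.
std::vector<int> run_pipeline(int command_count) {
    assert(command_count >= 0);
    CommandQueue queue;
    std::vector<int> consumed;

    // Consumer: plays the role of the driver thread feeding the GPU.
    std::thread consumer([&] {
        Command c;
        while (queue.pop(c)) consumed.push_back(c.id);
    });

    // Producer: plays the role of the application thread issuing API calls.
    for (int i = 0; i < command_count; ++i) queue.push(Command{i});
    queue.close();

    consumer.join();  // after join, reading `consumed` is safe
    return consumed;
}
```

Calling run_pipeline(n) from an application's main function returns the n command ids in submission order. In this model, any call that needs an answer from the consumer side would have to wait for the queue to drain, which is one way to read the later advice about avoiding calls that return state.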

OpenGL on Dual-Core
- Both OpenGL runtime and driver can be completely offloaded
- Application gains 20 to 40% performance
- Driver can dynamically switch between multi-thread and single-thread mode

OpenGL Performance Tips
- Immediate mode for small amounts of vertex data
  - glVertex3f(x, y, z)
- VBO(s) & Display Lists for large amounts of vertex data
- Avoid passing Vertex Pointers
  - Driver will need to copy extra data
  - Use Buffer Objects instead for better performance
- When using VBOs, use element array buffers
  - glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, index_buffer);
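
The slide does not say why element array buffers help. One general reason is that indexed geometry stores each shared vertex once. The sketch below makes no OpenGL calls; it only builds an indexed grid on the CPU and compares its size with the unindexed equivalent. The names Vertex, IndexedMesh, build_grid, indexed_bytes and unindexed_bytes are hypothetical, and 32-bit indices are an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vertex {
    float x, y, z;
};

struct IndexedMesh {
    std::vector<Vertex> vertices;
    std::vector<std::uint32_t> indices;  // three indices per triangle
};

// Builds a flat grid of n x n quads, two triangles per quad.
IndexedMesh build_grid(std::uint32_t n) {
    assert(n > 0);
    IndexedMesh mesh;
    const std::uint32_t stride = n + 1;  // vertices per row

    for (std::uint32_t y = 0; y <= n; ++y)
        for (std::uint32_t x = 0; x <= n; ++x)
            mesh.vertices.push_back(
                Vertex{static_cast<float>(x), static_cast<float>(y), 0.0f});

    for (std::uint32_t y = 0; y < n; ++y) {
        for (std::uint32_t x = 0; x < n; ++x) {
            const std::uint32_t i0 = y * stride + x;  // corner (x, y)
            const std::uint32_t i1 = i0 + 1;          // corner (x+1, y)
            const std::uint32_t i2 = i0 + stride;     // corner (x, y+1)
            const std::uint32_t i3 = i2 + 1;          // corner (x+1, y+1)
            const std::uint32_t quad[6] = {i0, i1, i2, i1, i3, i2};
            mesh.indices.insert(mesh.indices.end(), quad, quad + 6);
        }
    }
    return mesh;
}

// Bytes needed when vertices are shared through an index list.
std::size_t indexed_bytes(const IndexedMesh& m) {
    return m.vertices.size() * sizeof(Vertex) +
           m.indices.size() * sizeof(std::uint32_t);
}

// Bytes needed when every triangle corner carries its own vertex copy.
std::size_t unindexed_bytes(const IndexedMesh& m) {
    return m.indices.size() * sizeof(Vertex);
}
```

For a 64 x 64 grid the indexed form comes to roughly half the size of the unindexed one, and 16-bit indices would shrink it further. In a real application the contents of mesh.indices would be uploaded to the buffer bound with the glBindBufferARB call shown above.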

OpenGL Performance Tips (contd.)
- Avoid glGetXXX calls or Driver Queries
  - CPU Thread synchronization needed
  - Release builds should not have many glGetError() calls
- Do not lock application threads to single CPU
  - OS can do a better job scheduling CPU resources
- Avoid using Vertex Arrays without Buffer Objects
  - Requires a data copy from pointer for every call
  - Driver will have to wait until it is finished using data
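
One common way to follow the first tip is for the application to remember the state it has set instead of asking the driver for it. The sketch below is an illustration under that assumption, not something taken from the slides. FakeDriver is a hypothetical stand-in for the graphics API (it only counts calls), and StateCache is a hypothetical application-side shadow copy.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the graphics API; it counts calls so the
// difference between the two paths is visible.
struct FakeDriver {
    std::uint32_t bound_buffer = 0;
    int set_calls = 0;
    int query_calls = 0;

    void bind_buffer(std::uint32_t id) {
        bound_buffer = id;
        ++set_calls;
    }

    // Models a glGet-style query, the kind of call the slide says
    // needs CPU thread synchronization.
    std::uint32_t query_bound_buffer() {
        ++query_calls;
        return bound_buffer;
    }
};

// Application-side shadow of the state the application itself has set.
// Assumes nothing else changes the binding behind the cache's back.
class StateCache {
public:
    explicit StateCache(FakeDriver& driver) : driver_(driver) {}

    void bind_buffer(std::uint32_t id) {
        if (valid_ && id == current_) return;  // redundant bind: skip the call
        driver_.bind_buffer(id);
        current_ = id;
        valid_ = true;
    }

    // Answers from the shadow copy; never calls into the driver.
    std::uint32_t bound_buffer() const {
        assert(valid_);
        return current_;
    }

private:
    FakeDriver& driver_;
    std::uint32_t current_ = 0;
    bool valid_ = false;
};
```

With this arrangement the application reads cache.bound_buffer() wherever it would otherwise have issued a query, so query_calls stays at zero. Skipping redundant binds is a side benefit of keeping the shadow copy, not something the slide asks for.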

Direct3D on Dual-Core
- Also uses Producer & Consumer threads
- Only driver can be offloaded onto the second CPU
  - Note: Direct3D runtime runs in the application thread
- 10 to 30% performance gains with a multi-threaded NVIDIA driver

Direct3D Performance Tips
- Pack your DrawPrimitive2 calls together
- Frequently creating & destroying shaders, VB, IB, and surfaces will impact performance
- Avoid allocating too many system memory resources
- Minimize use of any locks/unlocks
- Avoid any calls that return GPU state information
  - Requires a CPU thread synchronization
  - Driver Queries are OK (calls are asynchronous)
- Do not lock threads to a specific CPU!

Total Gains with Dual-Core

Summary
- NVIDIA multi-threaded drivers available now
  - Download drivers from
- On a dual-core system 10 to 40% improvement
- Avoid any API calls that return driver state
- Do not lock application threads to a specific CPU
- Applications optimized for dual-core will get even better performance
- We will continue to improve dual-core driver performance
