Multi-threading basics

1 Multi-threading basics
- The main process forks additional processing threads.
- This takes advantage of multiple processors, or of CPU dead time while waiting for data.
- Synchronization: needed when one thread needs the results of processing happening in another thread (i.e. one thread will wait).
- Locks: multiple threads might need to access the same data. Each thread has to lock it, manipulate it, and unlock it (as quickly as possible).
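A minimal sketch of these three ideas (forking threads, waiting for their results, and lock/manipulate/unlock on shared data). It uses portable C++ std::thread and std::mutex as stand-ins for the Win32 calls shown later in the talk; the summing workload and the names worker and sum_with_threads are illustrative only.

```cpp
#include <mutex>
#include <thread>
#include <vector>

std::mutex g_lock;   // guards g_total
long g_total = 0;    // data shared by all worker threads

// Each worker does its heavy work on private data, then holds the lock only
// long enough to publish the result: lock / manipulate / unlock.
void worker(int first, int count) {
    long local = 0;
    for (int i = first; i < first + count; ++i) local += i;  // no lock needed
    std::lock_guard<std::mutex> guard(g_lock);              // lock
    g_total += local;                                       // manipulate
}                                                           // unlock (guard destroyed)

// The main thread forks num_threads workers, then waits for all of them.
long sum_with_threads(int n, int num_threads) {
    g_total = 0;
    std::vector<std::thread> threads;
    int chunk = n / num_threads;
    for (int t = 0; t < num_threads; ++t) {
        int first = t * chunk;
        int count = (t == num_threads - 1) ? n - first : chunk;  // last takes the remainder
        threads.emplace_back(worker, first, count);
    }
    for (std::thread& th : threads) th.join();  // synchronization: main thread waits
    return g_total;
}
```

Each worker holds the lock for a single addition, so the threads spend almost all their time running independently.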

2 Why multi-threading/multi-core?
4/23/2017 2:45 AM
- Clock rates are stagnant.
- Future CPUs will be predominantly multi-threaded/multi-core:
  - Xbox 360 has 3 cores.
  - PS3 has a stream architecture with eight cores.
  - Almost all new PCs are dual- or quad-core.
- Two performance possibilities:
  - Single-threaded? Minimal performance growth.
  - Multi-threaded? Exponential performance growth.

Notes: This topic is important because the free ride is over. Per-hardware-thread performance is stagnant, but processor improvement continues. The >70% multi-core penetration figure applies to servers (>85%), desktops, and laptops: everything, even Celeron, is dual-core. Moore's law lives on, but since single-processor clock rates can no longer be raised, the growing transistor counts go into more cores. More performance requires multi-threading.

©2004 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.

3 Design for Multithreading
- Good design is critical.
- Bad multithreading can be worse than no multithreading: deadlocks, synchronization bugs, poor performance, etc.
- Comments can help a lot!

Notes: Why this talk? Multi-threading is hard: to get a benefit you need to plan for it, and you will hit subtle bugs. Effective multi-threading can be really hard, and you may hit problems where threading actually hurts performance. Done properly, the benefits are huge. Good multi-threading always starts with good design.

4 Bad Multithreading
[Diagram: Threads 1 to 5 spawning one another and communicating haphazardly]
Haphazard design:
- Start with one thread; it spawns a couple more.
- Then they spawn a couple more.
- Then you start adding communication between threads, then more communication, and still more.
- Then you add synchronization points, where threads need data from other threads or shared resources.
- End result: many of your threads spend a lot of time waiting, you need many synchronization objects, and you're prone to resource contention and synchronization bugs.

5 Good Multithreading
[Diagram: Main Thread split into a Game Thread and a Rendering Thread, with Physics, Particle Systems, and Animation/Skinning tasks, plus Networking and File I/O]
- Start with the main thread and look for major tasks.
- Split it into game and rendering threads.
- Add synchronization points; other than at those points, both threads can run independently.
- Look for additional parallelizable tasks; physics might be a good candidate, with synchronization points before and after.
- Break out other parallelizable tasks.
- Look for tasks that can run independently of the main threads, such as service requests.
- Add communication, but keep it to a minimum.
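The last steps (a task that runs independently of the main threads and services requests, with communication kept to a minimum) could be sketched as a service thread fed by a single request queue. This is a hypothetical illustration in portable C++; ServiceThread and post() are invented names, not part of any engine or Windows API.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// A service thread that processes requests (e.g. file I/O or networking work)
// posted by other threads. The request queue is its only point of contact.
class ServiceThread {
public:
    ServiceThread() : worker_([this] { run(); }) {}
    ~ServiceThread() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join();  // remaining requests are drained before the thread exits
    }
    void post(std::function<void()> request) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            requests_.push(std::move(request));
        }
        cv_.notify_one();
    }
private:
    void run() {
        for (;;) {
            std::function<void()> request;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !requests_.empty(); });
                if (requests_.empty()) return;  // stopping, nothing left to do
                request = std::move(requests_.front());
                requests_.pop();
            }
            request();  // run the request without holding the lock
        }
    }
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> requests_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so the members above exist before run() starts
};
```

The lock is held only to push or pop a request, never while a request runs, so callers of post() are rarely blocked.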

6 Another Paradigm: Cascades
[Diagram: Frames 1 to 4 staggered across Threads 1 to 5, with stages Input, Physics, AI, Rendering, and Present]
Advantages:
- Synchronization points are few and well-defined.
Disadvantages:
- Increases latency (for a constant frame rate).
- Needs simple (one-way) data flow.

Notes: Also, each stage's time slot must be as long as the slowest stage. Probably not well-suited for games; it can work if you have very few stages. At 30 Hz, the latency becomes intolerable.
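A cascade can be sketched as a chain of stages, each on its own thread, passing frames one way through blocking queues. The three stages and the arithmetic inside them are placeholders, and Channel and run_cascade are hypothetical helpers (C++17, for std::optional). The latency cost follows directly: if each stage takes one frame at 30 Hz (about 33 ms), a frame's input reaches the last stage roughly (number of stages) × 33 ms later.

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal blocking queue that hands frames from one stage to the next.
// An empty optional marks the end of the stream.
template <typename T>
class Channel {
public:
    void push(std::optional<T> v) {
        { std::lock_guard<std::mutex> lock(m_); q_.push(std::move(v)); }
        cv_.notify_one();
    }
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        std::optional<T> v = std::move(q_.front());
        q_.pop();
        return v;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::optional<T>> q_;
};

struct Frame { int id; int input; int physics; };

// Three stages (input -> physics -> render), each on its own thread and each
// working on a different frame at the same time. Data flows one way only.
std::vector<Frame> run_cascade(int frame_count) {
    Channel<Frame> to_physics, to_render;
    std::vector<Frame> presented;  // touched only by the render stage until join

    std::thread input([&] {
        for (int i = 0; i < frame_count; ++i) to_physics.push(Frame{i, i * 10, 0});
        to_physics.push(std::nullopt);
    });
    std::thread physics([&] {
        while (auto f = to_physics.pop()) { f->physics = f->input + 1; to_render.push(f); }
        to_render.push(std::nullopt);
    });
    std::thread render([&] {
        while (auto f = to_render.pop()) presented.push_back(*f);
    });
    input.join();
    physics.join();
    render.join();
    return presented;
}
```

The queues are the only synchronization points, which is the advantage the slide lists; the price is that every frame passes through every stage in turn.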

7 Available Synchronization Objects
- Events
- Semaphores
- Mutexes
- Critical sections
- Don't use SuspendThread():
  - Some titles have used this for synchronization.
  - It can easily lead to deadlocks.
  - It interacts badly with the Visual Studio debugger.

8 Exclusive Access: Mutex
    #include <windows.h>

    // Initialize: an unnamed mutex, not initially owned (returns NULL on failure)
    HANDLE mutex = CreateMutex(NULL, FALSE, NULL);

    // Use
    void ManipulateSharedData()
    {
        WaitForSingleObject(mutex, INFINITE);  // block until this thread owns the mutex
        // Manipulate stuff...
        ReleaseMutex(mutex);                   // only the owning thread may release it
    }

    // Destroy
    CloseHandle(mutex);

This guarantees that ManipulateSharedData() is only executed by one thread at a time. But mutexes are not the cheapest option...
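The same pattern in portable C++, as a sketch: std::mutex stands in for the Win32 mutex, and g_sharedData and the negative-value check are hypothetical. std::lock_guard releases the lock when it goes out of scope, including on an early return or exception, which is easy to forget with a manual ReleaseMutex call.

```cpp
#include <mutex>
#include <stdexcept>
#include <vector>

std::mutex g_dataMutex;         // guards g_sharedData
std::vector<int> g_sharedData;  // hypothetical shared data

void ManipulateSharedData(int value) {
    std::lock_guard<std::mutex> lock(g_dataMutex);  // blocks until acquired
    if (value < 0)
        throw std::invalid_argument("negative value");  // lock is still released
    g_sharedData.push_back(value);
}   // lock released here on the normal path
```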

9 Exclusive Access: CRITICAL_SECTION
    #include <windows.h>

    // Initialize
    CRITICAL_SECTION cs;
    InitializeCriticalSection(&cs);

    // Use
    void ManipulateSharedData()
    {
        EnterCriticalSection(&cs);
        // Manipulate stuff...
        LeaveCriticalSection(&cs);
    }

    // Destroy
    DeleteCriticalSection(&cs);

Critical sections are much cheaper: on Xbox 360 and on Windows they run roughly 20x faster. Two restrictions: they cannot be used between processes, and they cannot be used with WaitForMultipleObjects.

Notes: Mutexes are kernel objects, so they require a kernel transition, whereas critical sections are user-space objects. Mutexes are more robust in the face of thread death. CRITICAL_SECTION is a good optimization, but the key optimization is: don't synchronize too often.
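The closing point, don't synchronize too often, can be illustrated by comparing per-item locking with batching: each thread accumulates results privately and publishes them in one short critical section. std::mutex stands in for CRITICAL_SECTION, and SharedResults with its lock_acquisitions counter is a hypothetical structure used only to make the difference visible.

```cpp
#include <mutex>
#include <vector>

struct SharedResults {
    std::mutex lock;
    std::vector<int> items;
    int lock_acquisitions = 0;  // demo instrumentation only
};

// Per-item locking: one lock acquisition per element.
void publish_each(SharedResults& shared, const std::vector<int>& work) {
    for (int v : work) {
        std::lock_guard<std::mutex> guard(shared.lock);
        ++shared.lock_acquisitions;
        shared.items.push_back(v * v);
    }
}

// Batched: compute into a private buffer, then lock once to append it.
void publish_batched(SharedResults& shared, const std::vector<int>& work) {
    std::vector<int> local;
    local.reserve(work.size());
    for (int v : work) local.push_back(v * v);  // no lock held while working
    std::lock_guard<std::mutex> guard(shared.lock);
    ++shared.lock_acquisitions;
    shared.items.insert(shared.items.end(), local.begin(), local.end());
}
```

Both functions produce the same items; the batched version takes the lock once instead of once per element, so even a cheap lock matters less.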

10 How Many Threads?
- No more than one CPU-intensive software thread per core:
  - 3-6 on Xbox 360
  - 1-? on PC (1-4 for now; you need to query)
- Too many busy threads adds complexity and lowers performance; context switches are not free.
- You can have many non-CPU-intensive threads: I/O threads that block, or intermittent tasks.

Notes: It is reasonable to have additional threads that are not CPU intensive, such as threads blocking on I/O. Segue: one per hardware thread, or one per core?
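Querying on PC can be sketched with std::thread::hardware_concurrency(), which reports hardware threads (SMT threads count individually, so this is "one per hardware thread", not "one per core") and may return 0 when the value is unknown. The policy below (leave one hardware thread for the main thread, always start at least one worker) is an assumption for illustration, not a rule from the talk.

```cpp
#include <algorithm>
#include <thread>

// Number of CPU-heavy worker threads to start for a given hardware thread count.
unsigned worker_count_for(unsigned hardware_threads) {
    if (hardware_threads == 0) hardware_threads = 1;  // 0 means "unknown"
    return std::max(1u, hardware_threads - 1);        // reserve one for the main thread
}

unsigned choose_worker_count() {
    return worker_count_for(std::thread::hardware_concurrency());
}
```

Blocking I/O threads are not counted here, since they are not CPU intensive.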

11 Typical Threaded Tasks
- File decompression
- Rendering
- Graphics fluff
- Physics

Notes: The update loop should generally be single-threaded. You may be able to pull out some parts, like path-finding, but synchronization concerns limit your options.

12 File Decompression
- Most common CPU-heavy thread on the Xbox 360
- Easy to multithread
- Allows use of aggressive compression to improve load times
- Don't throw a thread at a problem better solved by offline processing: texture compression, file packing, etc.

Notes: File I/O is often put on a separate thread; this can avoid stalls that asynchronous I/O can't always hide. Normally file I/O is not CPU heavy, but that can change now: file reads and writes are cheap, and spare threads allow the use of aggressive compression.
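A minimal sketch of moving decompression off the main thread with std::async. The compressed format here is a hypothetical run-length encoding chosen only to keep the example short; a real title would use a much stronger codec, and load_async is an invented name.

```cpp
#include <cstddef>
#include <cstdint>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Hypothetical format, for illustration only: a list of (count, byte) pairs.
std::string rle_decompress(const std::vector<std::uint8_t>& packed) {
    std::string out;
    for (std::size_t i = 0; i + 1 < packed.size(); i += 2)
        out.append(packed[i], static_cast<char>(packed[i + 1]));  // count copies of byte
    return out;
}

// Start decompression on another thread and return immediately; the caller
// collects the result with get() when it actually needs the data.
std::future<std::string> load_async(std::vector<std::uint8_t> packed) {
    return std::async(std::launch::async, [data = std::move(packed)] {
        return rle_decompress(data);
    });
}
```

The main thread keeps running until it calls get() on the future, which blocks only if decompression has not finished yet.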

13 Rendering
- Separate update and render threads.
- Rendering on multiple threads (D3DCREATE_MULTITHREADED) works poorly.
  - Exception: Xbox 360 command buffers.
- Special case of the cascades paradigm:
  - Pass render state from update to render.
  - With a constant workload: same latency, better frame rate.
  - With an increased workload: same frame rate, worse latency.

Notes: Rendering is usually quite expensive: D3D overhead adds up, and so do scene traversal costs. There is a limited number of primitives per second (on modern Windows machines, we recommend expecting about 300 draws per frame at 60 FPS). Simple in theory: double-buffer all state that affects rendering. Sometimes complicated in practice. Synchronize once per frame.
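"Double-buffer all state that affects rendering, synchronize once per frame" can be sketched as follows: the update thread writes one copy while the render thread reads the other, and the two meet once per frame, where the last thread to arrive swaps the buffers. RenderState, DoubleBuffer, FrameSync, and run_frames are hypothetical names, and the "work" in each frame is a placeholder.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

struct RenderState { int frame = -1; };  // which update frame produced this state

// Update writes buffers_[write_index_]; render reads the other one.
// swap() must only run while neither thread touches the buffers.
class DoubleBuffer {
public:
    RenderState& write_buffer() { return buffers_[write_index_]; }
    const RenderState& read_buffer() const { return buffers_[1 - write_index_]; }
    void swap() { write_index_ = 1 - write_index_; }
private:
    RenderState buffers_[2];
    int write_index_ = 0;
};

// Two-party rendezvous: the last thread to arrive runs on_both_arrived()
// (the swap) and releases the other. The single sync point per frame.
class FrameSync {
public:
    template <typename F>
    void arrive_and_wait(F on_both_arrived) {
        std::unique_lock<std::mutex> lock(m_);
        const unsigned my_generation = generation_;
        if (++arrived_ == 2) {
            on_both_arrived();
            arrived_ = 0;
            ++generation_;
            cv_.notify_all();
        } else {
            cv_.wait(lock, [&] { return generation_ != my_generation; });
        }
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    unsigned arrived_ = 0;
    unsigned generation_ = 0;
};

// Runs frame_count frames; returns which update frame the renderer drew each frame.
std::vector<int> run_frames(int frame_count) {
    DoubleBuffer buffers;
    FrameSync sync;
    std::vector<int> drawn;  // touched only by the render thread until join
    auto swap_buffers = [&buffers] { buffers.swap(); };

    std::thread update([&] {
        for (int i = 0; i < frame_count; ++i) {
            buffers.write_buffer().frame = i;           // "simulate" frame i
            sync.arrive_and_wait(swap_buffers);
        }
    });
    std::thread render([&] {
        for (int i = 0; i < frame_count; ++i) {
            drawn.push_back(buffers.read_buffer().frame);  // "draw" the previous frame
            sync.arrive_and_wait(swap_buffers);
        }
    });
    update.join();
    render.join();
    return drawn;
}
```

The renderer always draws the state from the previous update frame, which is the one frame of extra latency the cascade trade-off describes.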

14 Graphics Fluff
- Extra graphics that don't affect play:
  - Procedurally generated animating cloud textures
  - Cloth simulations
  - Dynamic ambient occlusion
  - Procedurally generated vegetation, etc.
  - Extra particles, better particle physics, etc.
- Easy to synchronize
- Potentially expensive, but if the core is otherwise idle...?

Notes: Graphics fluff is a good candidate because it has few interactions with other data. It may not need to run at the same frame rate as the game. Some games are spending 100% of a core on cloth animation. "That's crazy!", or is it brilliant? The main loop of your game may be impossible to multi-thread, in which case the other threads will sit idle unless you add new features. On PC, graphics fluff can be dropped on single-core machines without affecting gameplay, or replaced with cheaper alternatives.

15 Physics?
- Could cascade from update to physics to rendering:
  - Makes use of three threads
  - May be too much latency
- Could run physics on many threads:
  - Uses many threads while doing physics
  - May leave threads mostly idle elsewhere
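Running physics on many threads can be sketched as a parallel loop: split the bodies into contiguous chunks, one per thread, so no locking is needed inside the loop and the joins are the only synchronization point. Body and integrate_parallel are hypothetical; spawning threads per call keeps the sketch short, whereas an engine would more likely keep persistent worker threads, which is exactly where the "idle elsewhere" problem shows up.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

struct Body { float position; float velocity; };  // hypothetical 1-D body

// Each thread integrates its own contiguous chunk; chunks never overlap.
void integrate_parallel(std::vector<Body>& bodies, float dt, unsigned num_threads) {
    num_threads = std::max(1u, num_threads);
    std::size_t chunk = (bodies.size() + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        std::size_t begin = t * chunk;
        std::size_t end = std::min(bodies.size(), begin + chunk);
        if (begin >= end) break;  // fewer bodies than threads
        workers.emplace_back([&bodies, dt, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                bodies[i].position += bodies[i].velocity * dt;
        });
    }
    for (std::thread& w : workers) w.join();  // single sync point: physics done
}
```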

16 Overcommitted Multithreading?
[Diagram: Game Thread and Rendering Thread, with extra threads for Physics, Particle Systems, and Animation/Skinning]

Notes: This diagram does show good multithreading, but probably not perfect. It relies on spawning extra threads for physics, animation, and particle systems. This system could turn out to demand ten hardware threads at some times and two at others. Ideally you should try to have the same number of CPU-heavy threads running at all times. Amdahl's law: speeding up part of your calculations just leaves the remainder as the single-threaded bottleneck. Middleware needs to be flexible enough to adapt to the needs of different games; physics may be allowed one core, or not.
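Amdahl's law in its usual form: if a fraction p of the work can be spread across n hardware threads and the rest stays serial, the best-case speedup is S(n). The numbers below (80% parallel work, six hardware threads as on Xbox 360) are illustrative, not measurements from any title.

```latex
S(n) = \frac{1}{(1 - p) + \frac{p}{n}}, \qquad
S(6)\Big|_{p = 0.8} = \frac{1}{0.2 + \frac{0.8}{6}} = \frac{1}{\frac{1}{3}} = 3, \qquad
\lim_{n \to \infty} S(n) = \frac{1}{1 - p} = 5
```

Even with unlimited threads, the serial 20% caps the speedup at 5x, which is why the remaining single-threaded work becomes the bottleneck.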

17 Synchronization tips/costs:
- Synchronization is moderately expensive when there is no contention: hundreds to thousands of cycles.
- Synchronization can be arbitrarily expensive when there is contention!
- Goals:
  - Synchronize rarely
  - Hold locks briefly
  - Minimize shared data

Notes: Requiring exclusive access to a popular resource can make multi-threading a complex way of doing single-threading on multiple threads. Ideally you want to use synchronization primitives to guarantee that multiple threads won't modify resources simultaneously, while designing so that they generally won't anyway. Sometimes it is worth doing a short spin-lock on resources that are likely to be held for only a short time; InitializeCriticalSectionAndSpinCount supports this.

18 Threading File I/O & Decompression
- First: use large reads and asynchronous I/O.
- Then: consider compression to accelerate loading.
- Don't do format conversions etc. that are better done at build time!
- Have resource proxies to allow rendering to continue.
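A resource proxy could look like the sketch below: the renderer always gets something to draw, the placeholder until the loader thread publishes the real resource. Texture and TextureProxy are hypothetical types; the release/acquire pair on the flag makes the fully constructed resource visible to the render thread, and the sketch assumes set_loaded() is called at most once.

```cpp
#include <atomic>
#include <memory>
#include <string>
#include <utility>

struct Texture { std::string name; };  // hypothetical resource type

class TextureProxy {
public:
    explicit TextureProxy(std::shared_ptr<const Texture> placeholder)
        : placeholder_(std::move(placeholder)) {}

    // Loader thread: publish the finished resource (call at most once).
    void set_loaded(std::shared_ptr<const Texture> real) {
        real_ = std::move(real);
        loaded_.store(true, std::memory_order_release);
    }

    // Render thread: never blocks; returns the placeholder until loaded.
    const Texture& get() const {
        if (loaded_.load(std::memory_order_acquire)) return *real_;
        return *placeholder_;
    }
private:
    std::shared_ptr<const Texture> placeholder_;
    std::shared_ptr<const Texture> real_;  // written once, before loaded_ is set
    std::atomic<bool> loaded_{false};
};
```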

19 File I/O Implementation Details
vector<Resource*> g_resources;
- Worst design: the decompressor locks g_resources while decompressing.
- Better design: the decompressor adds resources to the vector after decompressing.
  - Still requires the renderer to synchronize on every resource access.
- Best design: two Resource* vectors.
  - The renderer has a private vector; no locking required.
  - The decompressor uses a shared vector and synchronizes when adding a new Resource*.
  - The renderer moves Resource* entries from the shared vector to its private vector once per frame.

Notes: g_resources holds a list of pointers to all loaded resources. It is referenced frequently by the render thread as it needs meshes, textures, shaders, etc. The load thread needs to make resources available once they are loaded. If the decompression thread locks g_resources while it decompresses, or while it does file I/O, the render thread may be locked out for long periods. If g_resources is shared at all, every reference by the render thread requires synchronization, wasting time on acquiring and releasing locks. The best design is two (or more) vectors, to insulate the threads from each other. Private data is good.
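The "best design" can be sketched as a small hand-off object: the decompressor appends to a shared vector under a lock, and once per frame the renderer swaps the shared vector out (constant time under the lock) and appends the arrivals to its private vector, which it then reads with no locking at all. Resource and ResourceHandoff are hypothetical, and resource ownership and lifetime are out of scope here.

```cpp
#include <mutex>
#include <string>
#include <vector>

struct Resource { std::string name; };  // hypothetical loaded resource

class ResourceHandoff {
public:
    // Decompressor thread: called after a resource is fully loaded.
    void add(Resource* r) {
        std::lock_guard<std::mutex> lock(m_);
        shared_.push_back(r);
    }
    // Render thread, once per frame: move new arrivals into its private list.
    void drain_into(std::vector<Resource*>& private_list) {
        std::vector<Resource*> arrivals;
        {
            std::lock_guard<std::mutex> lock(m_);
            arrivals.swap(shared_);  // the lock is held only for the swap
        }
        private_list.insert(private_list.end(), arrivals.begin(), arrivals.end());
    }
private:
    std::mutex m_;
    std::vector<Resource*> shared_;  // the only structure both threads touch
};
```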

20 Profiling multi-threaded apps
- Need thread-aware profilers.
- Profiling may hide many synchronization stalls.
- Home-grown spin locks make profiling harder.
- Consider instrumenting calls to synchronization functions.
  - Don't use locks in instrumentation.
- Windows: Intel VTune, AMD CodeAnalyst, and the Visual Studio Team System Profiler.
- Xbox 360: PIX, XbPerfView, etc.

Notes: Anecdote: a profile capture completely hid a critical issue (the code was waiting on the GPU, but only when not profiling; the same thing happened with waiting on the load thread). I actually saw a title that had instrumented a ton of functions but then stored the results in a shared array, using critical sections to guard it. About 90% of their synchronization was in the profile functions. Synchronization stalls are hard to locate. Use Timing Capture on Xbox 360 to visualize threading behavior, and add instrumentation to make visualization easier.

21 Windows tips
- Avoid using wglMakeCurrent or this.Invoke(); it is best to do all rendering calls from a single thread.
- Test on multiple machines and configurations: single-core, SMT (i.e. Hyper-Threading), dual-core, Intel and AMD chips, multi-socket multicore (4+ cores).
- If your multi-threaded code is not tested on multi-processor systems, it will fail!

22 Ogre-specific
- Ogre has a class to load resources in the background, on a separate thread.

