1
Introduction to Multithreading
Prof. Aaron Lanterman School of Electrical and Computer Engineering Georgia Institute of Technology
2
References (1) “C# Essentials” by Ben Albahari, Peter Drayton, and Brad Merrill, 2001
“XNA Game Studio Express” by Joseph Hall, 2008
3
References (2) “Visual C# 2005: The Base Class Library” by Francisco Balena, Microsoft Press
4
References (3) Tons of stuff from Microsoft’s Bruce Dawson
5
Threading example 1

public static void SyncProb() {
    // Create 10 secondary threads
    for (int i = 0; i <= 9; i++) {
        Thread t = new Thread(SyncProb_Task);
        t.Start(i);
    }
}

static void SyncProb_Task(object obj) {
    int number = (int) obj;
    for (int i = 1; i <= 1000; i++) {
        Console.Write(" ");
        Console.Write(number);
    }
}

The argument passed to t.Start(i) will get boxed (remember: boxing is implicit, but unboxing must be done explicitly, as in the (int) cast above)
Sometimes digits will bump up against each other, in addition to being jumbled
From F. Balena, “Visual C# 2005: The Base Class Library,” p. 468.
6
Threading example 1, with lock

// The lock object; any non-null reference value
// shared by all threads that need to be synchronized
// will do.
static Object consoleLock = new Object();

static void SyncProb_Task(object obj) {
    int number = (int) obj;
    for (int i = 1; i <= 1000; i++) {
        lock (consoleLock) {
            Console.Write(" ");
            Console.Write(number);
        }
    }
}

From F. Balena, “Visual C# 2005: The Base Class Library,” p. 469.
7
Threading example 2

using System;
using System.Threading;

class ThreadTest {
    static void Main() {
        Thread t = new Thread(new ThreadStart(Go));
        t.Start();
        Go();
    }
    static void Go() {
        for (char c = 'a'; c <= 'z'; c++) Console.Write(c);
    }
}

static methods are part of the class, not particular instances
The ThreadStart delegate is not required (I think it was required in earlier versions of C#, and maybe now in current versions of managed C++)
Example from “C# Essentials,” pp
8
Threading example 2 output
using System;
using System.Threading;

class ThreadTest {
    static void Main() {
        Thread t = new Thread(new ThreadStart(Go));
        t.Start();
        Go();
    }
    static void Go() {
        for (char c = 'a'; c <= 'z'; c++) Console.Write(c);
    }
}

Output:
abcdabcdefghijklmnopqrsefg
hjiklmnopqrstuvwxyztuvwxyz

Example from “C# Essentials,” pp
9
Threading example 2, with lock
using System;
using System.Threading;

class LockTest {
    static void Main() {
        LockTest lt = new LockTest();
        Thread t = new Thread(new ThreadStart(lt.Go));
        t.Start();
        lt.Go();
    }
    void Go() {
        lock (this)
            for (char c = 'a'; c <= 'z'; c++) Console.Write(c);
    }
}

this references the current instance of the class (can’t use this in static methods)
Example from “C# Essentials,” p. 108
10
Threading ex. 2 output, w/lock
using System;
using System.Threading;

class LockTest {
    static void Main() {
        LockTest lt = new LockTest();
        Thread t = new Thread(new ThreadStart(lt.Go));
        t.Start();
        lt.Go();
    }
    void Go() {
        lock (this)
            for (char c = 'a'; c <= 'z'; c++) Console.Write(c);
    }
}

Output:
abcdefghijklmnopqrstuvwxyz

Example from “C# Essentials,” p. 108
11
Lock: behind the curtain
lock (expression) {
    // mycode
}

is syntactic sugar for

System.Threading.Monitor.Enter(expression);
try {
    // mycode
}
finally {
    System.Threading.Monitor.Exit(expression);
}

From “C# Essentials,” pp
12
Lock advice from MSDN “In general, avoid locking on a public type, or instances beyond your code's control… lock(this) is a problem if the instance can be accessed publicly. lock(typeof(MyType)) is a problem if MyType is publicly accessible. lock(“myLock”) is a problem since any other code in the process using the same string, will share the same lock.”
13
Lock advice from Rico Mariani
class MyClass {
    private static String myLock = "MyLock";
    public void Foo() {
        lock (myLock) { ... }
    }
}

“This is bad because string literals are normally interned, meaning that there is one instance of any given string literal for the entire program. The exact same object represents the literal…on all threads. So if someone else comes along and locks a literal named “MyLock” his literal will interfere with yours.”

Recommendation: private static Object myLock = new Object();
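A minimal sketch of Mariani’s recommendation, with a hypothetical class and counter used only for illustration:

```csharp
using System;

class Cache {
    // A private object no outside code can lock on; readonly so it
    // can never be reassigned while some thread holds the lock.
    private static readonly object myLock = new Object();

    private static int hits = 0;

    public static void RecordHit() {
        lock (myLock) {   // never lock(this), a Type, or a string literal
            hits++;
        }
    }
}
```

The readonly modifier is a cheap extra safeguard: swapping the lock object mid-flight would silently break mutual exclusion.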
14
Don’t lock on value types
Value types can be “boxed” to act as reference types… …but each lock construct will create a different box So don’t lock on an integer, etc.
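A small sketch of why locking on a value type can’t work (names are illustrative). C# actually refuses to compile lock on a value type, but the boxing behavior is easy to demonstrate directly:

```csharp
using System;

class BoxedLockDemo {
    static int counter = 0;   // a value type – lock (counter) won't even compile

    static void Main() {
        object boxA = counter;  // boxing copies the int into a new heap object
        object boxB = counter;  // a second, different box of the same value
        // Two distinct objects: locking on them would synchronize nothing.
        Console.WriteLine(ReferenceEquals(boxA, boxB));  // False
    }
}
```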
15
Grrrrrrrrrrr!!!!! XNA on Xbox 360 uses the Compact Framework, not the full .NET Framework as on Windows
The Compact Framework has a Monitor class (so you can use locks), but it doesn’t implement Pulse/Wait and their variations
Also missing: Semaphores
Available in “pro Xbox 360 development,” i.e. the C++ XDK
According to a former student who asked about it during a job interview with EA
Was embarrassing in a previous semester since I spent a bunch of time talking about Pulse/Wait and Semaphores only to find they are unavailable on the Xbox 360
16
One Mutex

// This Mutex object must be accessible to all threads.
Mutex m = new Mutex();

public void WaitOneExample() {
    // Attempt to enter the synchronized section,
    // but give up after 0.1 seconds
    if (m.WaitOne(100, false)) {
        // Enter the synchronized section.
        …
        // Exit the synchronized section, and release the Mutex.
        m.ReleaseMutex();
    }
}

A mutex is called “signaled” if no thread currently owns it
Everyone is on their honor to release the mutex when they can; also everyone must agree on what resources are being protected
This usage isn’t really any different from a Monitor lock, in terms of what you can do with it
(The “false” parameter is referring to something complicated and obscure.)
From F. Balena, “Visual C# 2005: The Base Class Library,” p. 478.
17
Many Mutexes - WaitAny

static Mutex[] mutexes =
    { new Mutex(), new Mutex(), new Mutex() };

public void WaitAnyExample() {
    // Wait until a resource becomes available.
    // (Returns the index of the available resource.)
    int mutexNdx = Mutex.WaitAny(mutexes);
    // Enter the synchronized section.
    // (This code should use only the
    // resource corresponding to mutexNdx.)
    …
    // Exit the synchronized section, and release the Mutex.
    mutexes[mutexNdx].ReleaseMutex();
}

Useful when waiting for communication ports, disk drives, etc. to become available
From F. Balena, “Visual C# 2005: The Base Class Library,” p. 479.
18
Many Mutexes - WaitAll
Mutex.WaitAll(mutexes)
Wait until all resources have been released
Useful if you can’t proceed until all the other threads are done
From F. Balena, “Visual C# 2005: The Base Class Library,” p. 480.
19
Naming a Mutex (available on Windows)
Mutex m = new Mutex(false, "mutexname");
If a Mutex with that name already exists, the caller gets a reference to it; otherwise a new Mutex is created
Lets you share Mutex objects among different applications
Not too relevant to video game programming
Can’t do this with monitor locks
From F. Balena, “Visual C# 2005: The Base Class Library,” p. 480.
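A sketch of the classic use of a named mutex: a single-instance check shared across processes. The mutex name is made up for illustration; the three-argument constructor reports whether this call actually created the kernel object:

```csharp
using System;
using System.Threading;

class SingleInstance {
    static void Main() {
        bool createdNew;
        // The name is visible OS-wide; a second process opening the
        // same name gets a handle to the same kernel mutex.
        using (Mutex m = new Mutex(false, "MyGame.SingleInstance", out createdNew)) {
            if (!createdNew) {
                Console.WriteLine("Another instance is already running.");
                return;
            }
            // ... run the application while holding the handle ...
        }
    }
}
```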
20
Mutexes vs. Monitor locks
Mutexes are slower than locks (around 20 times slower!)
Monitor locks operate at the level of the CLR
Mutexes operate at the OS level
Mutexes are generally reserved for interprocess communication (vs. interthread)
Info from B. Dawson, “Coding For Multiple Cores on Xbox 360 and Microsoft Windows,”
21
Thread safety Some .NET objects are thread-safe Some aren’t
Some .NET objects have some methods that are thread-safe and some that aren’t
Check the documentation
If using on Xbox 360, be careful to note .NET vs. “Compact .NET” differences
Info from F. Balena, “Visual C# 2005: The Base Class Library,” pp
22
Synchronized types
Some .NET types that aren’t ordinarily thread-safe offer a thread-safe version

// Create an ArrayList object, and add some values to it.
ArrayList al = new ArrayList();
al.Add(1);
al.Add(2);
al.Add(3);

// Create a synchronized, thread-safe version.
ArrayList syncAl = ArrayList.Synchronized(al);

// Prove that the new object is thread-safe.
Console.WriteLine(al.IsSynchronized);      // => False
Console.WriteLine(syncAl.IsSynchronized);  // => True

// You can share the syncAl object among different threads.

From F. Balena, “Visual C# 2005: The Base Class Library,” pp
23
Synchronized types - disadvantages
Accessing synchronized objects is slower than accessing the original nonsynchronized object Generally better (in terms of speed) to use regular types and synchronize via locks Info from F. Balena, “Visual C# 2005: The Base Class Library,” p. 474.
24
Problems with locks (1)
Overhead: acquiring and releasing locks takes time, so don’t acquire locks too often
Deadlocks: lock acquisition order must be consistent to avoid these, so don’t have very many locks, or only acquire one at a time
Contention: sometimes somebody else has the lock, so never hold locks for too long; so have lots of little locks
From B. Dawson, “Lockless Programming in Games,”
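The deadlock bullet can be made concrete with a sketch. Everything here (the class, the two lock objects) is hypothetical; the point is that every thread acquires the locks in the same fixed order:

```csharp
using System;

class Transfer {
    // Two resources, each guarded by its own lock object.
    static readonly object lockA = new object();
    static readonly object lockB = new object();

    // Both code paths take lockA before lockB. If one path took
    // B-then-A while another took A-then-B, each thread could end up
    // holding one lock and waiting forever for the other: deadlock.
    static void MoveAtoB() { lock (lockA) lock (lockB) { /* ... */ } }
    static void MoveBtoA() { lock (lockA) lock (lockB) { /* ... */ } }
}
```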
25
Problems with locks (2) Priority inversions: if a thread is swapped out while holding a lock, progress may stall Changing thread priorities can lead to this Xbox 360 system threads can briefly cause this From B. Dawson, “Lockless Programming in Games,”
26
Sensible reaction Use locks carefully Or, try lockless
Don't lock too frequently Don't lock for too long Don't use too many locks Don't have one central lock Or, try lockless From B. Dawson, “Lockless Programming in Games,”
27
Lockless programming
Techniques for safe multi-threaded data sharing without locks
Pros:
May have lower overhead
Avoids deadlocks
May reduce contention
Avoids priority inversions
Cons:
Very limited abilities
Extremely tricky to get right
Generally non-portable
28
Polling Main thread checks flag variables set by the worker threads when they finish Useful if main thread can do some stuff (e.g., eye-candy animation in a turn-based strategy game) independently of the worker threads (e.g. AI), but needs worker threads to finish before continuing (e.g. making the computer’s move)
29
Polling example

bool done = false;
while (!done) {
    Thread.Sleep(0);
    done = true;
    for (int i = 0; i < m_ThreadDone.Length; i++)
        done &= m_ThreadDone[i];
}

Code from Joseph Hall, “XNA Game Studio Express,” p. 608
Thread.Sleep(timespan) sleeps for timespan milliseconds – what does it mean to sleep for 0 milliseconds? Basically says “hey, I don’t really have much interesting stuff to do, anyone else want to run?”
Worker thread i sets m_ThreadDone[i] = true before it exits
30
The problem with polling
Polling takes up CPU cycles
If your main thread only needs to wait until its worker threads are done, the Wait/Pulse approach is better
Let the .NET runtime handle it! Uh… oh, but only on Windows.
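A sketch of the Wait/Pulse alternative to the polling loop (remember: not available in the Compact Framework on Xbox 360). The class, counter, and method names are hypothetical:

```csharp
using System;
using System.Threading;

class WaitPulseDemo {
    static readonly object doneLock = new object();
    static int workersRemaining = 3;   // hypothetical worker count

    // Each worker calls this when it finishes, instead of setting a flag.
    static void WorkerFinished() {
        lock (doneLock) {
            workersRemaining--;
            Monitor.Pulse(doneLock);   // wake the waiting main thread
        }
    }

    // The main thread calls this instead of spinning in a polling loop.
    static void WaitForWorkers() {
        lock (doneLock) {
            while (workersRemaining > 0)   // loop guards against spurious wakeups
                Monitor.Wait(doneLock);    // releases the lock and sleeps; no CPU burned
        }
    }
}
```

Monitor.Wait atomically releases the lock and blocks, then reacquires the lock before returning, which is why the count can be checked safely in the loop.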
31
True or False? “If all you are doing is reading or writing a shared integer variable, nothing can go wrong and you don’t need any lock blocks, since reads and writes correspond to a single CPU instruction… right?” Info from F. Balena, “Visual C# 2005: The Base Class Library,” p. 472.
32
Beware enregistering

private bool Done = false;

void TheTask() {
    // Exit the loop when another thread has set the Done
    // flag or when the task being performed is complete.
    while (this.Done == false) {
        // Do some stuff
        if (nothingMoreToDo) {
            this.Done = true;
            break;
        }
    }
}

Maybe this is a “fluff” routine
Enregistering: the compiler caches the variable in a register, not in L2 or main memory
From F. Balena, “Visual C# 2005: The Base Class Library,” p. 472.
33
volatile fields private volatile bool Done = false;
volatile tells the compiler that other threads may be reading or writing the variable, so don’t enregister it
Does not ensure operations are carried out atomically for classes, structs, arrays…
Does not ensure atomic read+write for anything: increment, decrement, test & set
“Works” in .NET, but can still be problematic when doing “real C++ XDK” Xbox 360 programming (we’ll return to this later)
Info from F. Balena, “Visual C# 2005: The Base Class Library,” p. 474.
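A sketch of the “volatile is not atomic” point. The counters and thread counts are made up; the idea is that `++` on a volatile field is still a read-modify-write, so concurrent increments can be lost, while Interlocked.Increment is atomic:

```csharp
using System;
using System.Threading;

class VolatileVsInterlocked {
    static volatile int volatileCounter = 0;  // won't be enregistered...
    static int atomicCounter = 0;

    static void Main() {
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.Length; i++) {
            threads[i] = new Thread(() => {
                for (int n = 0; n < 100000; n++) {
                    volatileCounter++;                        // ...but ++ is read-modify-write: updates can be lost
                    Interlocked.Increment(ref atomicCounter); // atomic read-modify-write
                }
            });
            threads[i].Start();
        }
        foreach (Thread th in threads) th.Join();
        // atomicCounter is exactly 400000; volatileCounter is usually less.
        Console.WriteLine("{0} {1}", volatileCounter, atomicCounter);
    }
}
```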
34
Interlocked.X (1)
Atomic increment and decrement:

int lockCounter = 0;

// Increment the counter and execute some code if
// its previous value was zero.
if (Interlocked.Increment(ref lockCounter) == 1) {
    …
}

// Decrement the shared counter.
Interlocked.Decrement(ref lockCounter);

Can also add or subtract an arbitrary amount (Interlocked.Add takes the amount as a second argument)
From F. Balena, “Visual C# 2005: The Base Class Library,” p. 485.
35
Interlocked.X (2)
Can assign a value and return the previous value as a single atomic operation:

string s1 = “123”;
string s2 = Interlocked.Exchange(ref s1, “abc”);

After execution, s2 = “123”, s1 = “abc”
Think about how a non-atomic swap can get messed up midway: temp = a; a = b; b = temp;
CompareExchange is a variation that performs the assignment only if a and c are equal (reference equality in the case of objects):

Interlocked.CompareExchange(ref a, b, c);

From F. Balena, “Visual C# 2005: The Base Class Library,” p. 485.
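CompareExchange is the building block of most lock-free algorithms. A sketch of the standard compare-and-swap retry loop, here implementing a hypothetical “record the maximum sample seen” without a lock:

```csharp
using System;
using System.Threading;

class CasDemo {
    static int maxSeen = 0;

    // Lock-free "update max" using a compare-and-swap retry loop.
    static void RecordSample(int sample) {
        while (true) {
            int observed = maxSeen;              // snapshot the current value
            if (sample <= observed) return;      // nothing to do
            // Install sample only if maxSeen is still the value we saw;
            // if another thread changed it first, re-read and retry.
            if (Interlocked.CompareExchange(ref maxSeen, sample, observed) == observed)
                return;
        }
    }
}
```

CompareExchange returns the value it found, so comparing the return value against the snapshot tells you whether your update won the race.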
36
Out-of-order read/writes (1)
“CPUs employ performance optimizations that can result in out-of-order execution, including memory load and store operations.” “Memory operation reordering normally goes unnoticed within a single thread of execution, but causes unpredictable behaviour in concurrent programs and device drivers unless carefully controlled.” It’s not necessarily the case that the CPU is “reordering the instructions,” but the apparent effects of those instructions may complete at different times
37
Out-of-order read/writes (2)
“When a program runs on a single CPU, the hardware performs the necessary book-keeping to ensure that programs execute as if all memory operations were performed in program order, hence memory barriers are not necessary.” Most of what you think is going on isn’t really going on under the hood; it’s a contract between the CPU designer and you
38
Out-of-order read/writes (3)
“However, when the memory is shared with multiple devices, such as other CPUs in a multiprocessor system, or memory mapped peripherals, out-of-order access may affect program behavior.” “For example a second CPU may see memory changes made by the first CPU in a sequence which differs from program order.”
39
Simple CPU/compiler model
[Slide diagram: two processors, each issuing its own stream of reads and writes (Read pC, Write pA, Write pB, Read pD, …) to shared storage]
The simple model is that reads and writes happen instantaneously, one after another. This is how computers worked in the old days: the 6502 and 68000 were so slow that memory could keep up, and loads/stores happened exactly when you thought they did, with minimal pipelining. But actually, reads and writes take a really long time – too long.
Note that “pass” (as in “reads pass writes”) means a newer operation (a read) passes an older operation (a write). Note also that this refers to reads and writes from a single CPU core passing each other.
Each processor has an independent stream of events
A write doesn’t take effect for other processors until it reaches shared storage
From B. Dawson, “Lockless Programming in Games,”
40
Alternate CPU model – writes pass writes
Program order: Write pA, Write pB, Write pC
Visible order: Write pB, Write pA, Write pC
Writes can hit L2 at different times because of the gather/store buffers. Reads and writes take a really long time – too long – so instructions don’t wait for them. Note that all of the instructions complete before any of the writes reach shared storage (L2), and that the writes don’t reach shared storage in order. A write doesn’t take effect for other processors until it reaches shared storage.
From B. Dawson, “Lockless Programming in Games,”
41
Alternate CPU – reads pass reads
Program order: Read A1, Read A2 – but the visible order can differ, e.g. because A1 is satisfied from the L1 cache
A read takes effect when the data leaves shared storage. The time that the data arrives at the processor core is not the relevant metric.
From B. Dawson, “Lockless Programming in Games,”
42
Alternate CPU – writes pass reads
Program order: Read A1, Write A2 – the write can become visible before the read takes effect
From B. Dawson, “Lockless Programming in Games,”
43
Alternate CPU – reads pass writes
Program order: Read A1, Write A2, Read A2 – the read of A2 can be satisfied before the write of A2 reaches shared storage
Note that at this point the CPU that wrote to A2 will see the new value, but no other CPU will!!!
Reads passing writes is the most expensive type of reordering for CPU designers to avoid.
From B. Dawson, “Lockless Programming in Games,”
44
Memory models
“Pass” means “visible before”

                          x86/x64   PowerPC   ARM    IA64
store can pass store?     No        Yes*      Yes*   Yes*
load can pass load?       No        Yes       Yes    Yes
store can pass load?      No        Yes       Yes    Yes
load can pass store?**    Yes       Yes       Yes    Yes

* Only stores to different addresses can pass each other
** Loads from a previously stored address will load that value

These are the four basic types of reordering. Actual causes may vary. We’ll see how they can all cause problems.
It is the job of the compiler and of the CPU to only do rearrangements which are safe for single-threaded code. If they rearrange your reads and writes in such a way that it breaks your single-threaded code (with the exception of device drivers) then they are broken.
This also doesn’t affect code that uses locks. Locks are always also memory barriers that prevent reordering, so if you use, for instance, critical sections you can ignore these reordering problems.
Memory models are actually more complex than this
May vary for cacheable/non-cacheable, etc.
This only affects multi-threaded lock-free code!!!
From B. Dawson, “Lockless Programming in Games,”
45
Improbable CPU – Reads Don’t Pass Writes
Program order: Read A1, Write A2, Read A1 – each read waits for preceding writes to reach shared storage
Note the painfully slow read/probe of A1 that is not allowed to pass the write of A2. Such a CPU would be so slow that it would be impractical – you’d basically be killing your L1 cache.
From B. Dawson, “Lockless Programming in Games,”
46
Reads must pass writes! Reads not passing writes would mean L1 cache is frequently disabled Every read that follows a write would stall for shared storage latency Huge performance impact Therefore, on x86 and x64 (on all modern CPUs) reads can pass writes From B. Dawson, “Lockless Programming in Games,”
47
Memory barriers “a class of instructions which cause a central processing unit (CPU) to enforce an ordering constraint on memory operations issued before and after the barrier instruction.”
48
PowerPC memory barriers
Assembly instructions:
lwsync: lightweight sync (still lets reads pass writes)
sync, i.e. hwsync: heavyweight sync (stops all reordering)
eieio: “Enforce In-Order Execution of I/O”
Further information from Bruce Dawson
49
MyExportBarrier(): prevents reordering of writes by the compiler or CPU
Used when allowing access to data
x86/x64: _ReadWriteBarrier(); – compiler intrinsic, prevents compiler reordering
PowerPC: __lwsync(); – hardware barrier, prevents CPU write reordering
ARM: __dmb(); – full hardware barrier
IA64: __mf(); – full hardware barrier
Positioning is crucial! Write the data, MyExportBarrier, write the control value
Export-barrier followed by write is known as write-release semantics
We need to make sure that g_data is written before g_dataReady, so the export barrier has to go between them. It’s possible that we could use _WriteBarrier instead of _ReadWriteBarrier on x86/x64, but it’s not worth the risk. __lwsync also works to prevent the compiler from reordering across it, so it is both a compiler and a CPU memory barrier.
From B. Dawson, “Lockless Programming in Games,”
50
MyImportBarrier(): prevents reordering of reads by the compiler or CPU
Used when gaining access to data
x86/x64: _ReadWriteBarrier(); – compiler intrinsic, prevents compiler reordering
PowerPC: __lwsync(); or isync; – hardware barrier, prevents CPU read reordering
ARM: __dmb(); – full hardware barrier
IA64: __mf(); – full hardware barrier
Positioning is crucial! Read the control value, MyImportBarrier, read the data
Read followed by import-barrier is known as read-acquire semantics
We need to make sure that g_dataReady is read before g_data, so we place the import barrier between them. isync can also be used to prevent read reordering, but lwsync is recommended on Xbox 360. It’s possible that we could use _ReadBarrier instead of _ReadWriteBarrier on x86/x64, but it’s not worth the risk.
__lwsync prevents read passing read, write passing write, and write passing read; it does not prevent read passing write. Adding __lwsync after every instruction gives you the x86/x64 memory model. __sync is a full barrier.
From B. Dawson, “Lockless Programming in Games,”
51
Full memory barrier: MemoryBarrier();
x86: __asm xchg Barrier, eax
x64: __faststorefence();
Xbox 360: __sync();
ARM: __dmb();
IA64: __mf();
Prevents all reordering – including preventing reads passing writes. Most expensive barrier type.
If you need a full memory barrier then you’re in luck: there is an existing macro which gives you this on x86, x64, and Xbox 360. The regular memory barriers are not sufficient for this – MyExportBarrier and MyImportBarrier don’t prevent reads from passing writes, on any platform. Note that entering and exiting a lock such as a critical section does a full memory barrier.
From B. Dawson, “Lockless Programming in Games,”
52
Reordering implications
Publisher/Subscriber model

Thread A:
    g_data = data;
    g_dataReady = true;

Thread B:
    if (g_dataReady)
        process(g_data);

Is it safe?
This algorithm is particularly nice because it has the lowest possible communication overhead. Interlocked operations are usually needed for lockless algorithms, and they can be expensive, especially as processor counts rise. This algorithm can be thread-safe while just using normal reads and writes. But, as written above, it is not safe.
Common question: can threads see each other’s data? Yes, of course. Follow-up: when? Certainly not instantly. But that’s not useful. A more useful question is: how can I tell when my data is visible to another thread?
Common implementation: write some data, then set a flag. Concrete question: if I write some data and then set a flag, will any threads be able to see the flag being set without seeing the data? Now we’re getting somewhere. The question is still too hard, so let’s break it into two parts: 1) If I write some data and then set a flag, is it possible for the flag to be set in shared memory before the data gets there? 2) If I read a flag and then, if it is set, read some data, is it possible for the data to have come from shared memory before the flag came from shared memory? (harder to think about)
Short answer to both: yes.
From B. Dawson, “Lockless Programming in Games,”
53
Publisher/Subscriber on PowerPC (1)
Proc 1: Write g_data, Write g_dataReady
Proc 2: Read g_dataReady, Read g_data
Writes may reach L2 out of order
Proc 1 only does writes, and Proc 2 only does reads, so write/read and read/write reordering is irrelevant. What happens with read/read and write/write reordering? Both can cause problems. The writes to g_data and g_dataReady can be reordered, which can allow stale data to be read.
From B. Dawson, “Lockless Programming in Games,”
54
Publisher/Subscriber on PowerPC (2)
Proc 1: Write g_data, MyExportBarrier(), Write g_dataReady
Proc 2: Read g_dataReady, Read g_data
Writes now reach L2 in order
Inserting an export barrier avoids this problem – writes now reach L2/shared storage in order.
From B. Dawson, “Lockless Programming in Games,”
55
Publisher/Subscriber on PowerPC (3)
Proc 1: Write g_data, MyExportBarrier(), Write g_dataReady
Proc 2: Read g_dataReady, Read g_data
Reads may leave L2 out of order – g_data may be stale
However, reads can also be reordered, even on in-order machines. The second read may be satisfied from L1, meaning that it may end up with stale data. More precisely, it may end up with data that is staler than the first read. Stale data is unavoidable because the speed of light is finite, but inconsistently stale data is a problem: g_dataReady may be fresher than g_data.
What has happened is that the reads have left L2 out of order – g_data left L2 a long time ago, before this piece of code even executed. Other types of read-reordering may be possible.
From B. Dawson, “Lockless Programming in Games,”
56
Publisher/Subscriber on PowerPC (4)
Proc 1: Write g_data, MyExportBarrier(), Write g_dataReady
Proc 2: Read g_dataReady, MyImportBarrier(), Read g_data
It's all good!
Inserting an import barrier prevents read reordering. The actual implementation is not documented, but the effect is to make sure that reads following the barrier are at least as fresh as reads before the barrier. One possible implementation would be to send a message down to L2 that takes long enough to return that any pending invalidates are guaranteed to have had time to come back from L2.
The crucial takeaways:
Reordering can happen both on the publisher and subscriber sides of data sharing, so barriers are generally needed in both places.
Barrier positioning is crucial – an export barrier prevents new writes from passing older writes; an import barrier ensures that new reads are at least as fresh as old reads. The positioning of the barrier determines what are “new” reads/writes and what are “old” reads/writes.
This reordering can happen on in-order CPUs. Even though instructions are executed in order, the load/store subsystem may have significant reordering built in. It depends on the CPU.
From B. Dawson, “Lockless Programming in Games,”
57
x86/x64 FTW!!! Not so fast… Compilers can be just as evil as processors Compilers will rearrange your code as much as legally possible And compilers assume your code is single threaded Compiler and CPU reordering barriers needed x86/x64 CPUs do not let reads pass reads or writes pass writes. But, this same problem can happen even on x86/x64 systems, because the compiler may do these rearrangements for you. Compilers will do all four types of rearrangement (maybe more) From B. Dawson, “Lockless Programming in Games,”
58
Barrier summary
MyExportBarrier when publishing data, to prevent write reordering
MyImportBarrier when acquiring data, to prevent read reordering
MemoryBarrier to stop all reordering, including reads passing writes
Identify where you are publishing/releasing and where you are subscribing/acquiring
These three barriers should be portable and sufficient for all of your reordering-prevention needs.
From B. Dawson, “Lockless Programming in Games,”
59
What about “volatile” in C++?
Standard volatile semantics were not designed for multi-threading
The compiler can move normal reads/writes past volatile reads/writes
Also doesn’t prevent CPU reordering
VC++’s volatile is better…
Acts as read-acquire/write-release on x86/x64 and Itanium
Doesn’t prevent hardware reordering on Xbox 360
Watch for atomic<T> in C++0x
Sequentially consistent by default, but you can choose from four memory models
Note that on Itanium VC++ will add the necessary hardware memory barriers; on x86/x64 it doesn’t need to. On those platforms, marking g_dataReady as volatile is sufficient for publisher/subscriber. On Xbox 360 it needs to add hardware memory barriers, but doesn’t, so it’s almost useless.
So, don’t use volatile for multi-threading in C++. Its behavior is far too inconsistent, and it is not what it is intended for. But volatile in .NET plays nice.
From B. Dawson, “Lockless Programming in Games,”
60
Interlocked.X in C++
InterlockedXxx is a full barrier on Windows for x86, x64, and Itanium
Not a barrier at all on Xbox 360 – oops. Still atomic, just not a barrier
Using InterlockedXxx unfortunately means that your code is not portable to Xbox 360. You should always (with extremely rare exceptions) use InterlockedXxx Acquire or Release – whichever is appropriate – if you need Xbox 360 portability.
InterlockedIncrement for reference counting is the most common time for wanting to avoid the cost of the Acquire and Release modifiers and the lwsync that they imply. If in doubt, use InterlockedIncrementAcquire (because you are gaining access to the object and want to see other CPUs’ changes) and InterlockedDecrementRelease (because you are giving up access to the object and want your changes to be visible). For immutable objects this isn’t necessary, but even immutable objects are modified at creation time, so be careful.
From B. Dawson, “Lockless Programming in Games,”
61
Problems with C++ on Xbox 360
Interlocked.X & volatile-type operations are very fast
Safe on Windows (because of the Intel memory model)
When doing “real C++ XDK” Xbox 360 development, Interlocked.X and the volatile keyword will prevent the compiler from reordering reads and writes, but not the CPU!
The Xbox 360 CPU is broken
Info from B. Dawson, “Lockless Programming Considerations for Xbox 360 and Microsoft Windows,” msdn2.microsoft.com/en-us/library/bb aspx
62
Danger of the Xbox 360 CPU
Can still do native lockless programming on the Xbox 360, but you have to really know what you’re doing
Info from B. Dawson, “Lockless Programming Considerations for Xbox 360 and Microsoft Windows,” msdn2.microsoft.com/en-us/library/bb aspx
63
Playing it safe Locks and Mutexes provide needed memory barriers Makes them easier to use than lockless programming
64
C#: MemoryBarrier() “Synchronizes memory access as follows: The processor executing the current thread cannot reorder instructions in such a way that memory accesses prior to the call to MemoryBarrier execute after memory accesses that follow the call to MemoryBarrier.”
65
Notes on MemoryBarrier()
“MemoryBarrier is required only on multiprocessor systems with weak memory ordering (for example, a system employing multiple Intel Itanium processors).” “For most purposes, the C# lock statement…the Monitor class provide easier ways to synchronize data.”
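A sketch of the publisher/subscriber pattern from the earlier slides written in C#, using Thread.MemoryBarrier(). Note this is a full barrier, which is stronger (and more expensive) than the export/import barriers the pattern strictly needs, but it is what .NET exposes; the class and field names are illustrative:

```csharp
using System.Threading;

class PublisherSubscriber {
    static int g_data;
    static bool g_dataReady;

    // Thread A: publish the data, then set the flag.
    static void Publish(int data) {
        g_data = data;
        Thread.MemoryBarrier();   // data must be visible before the flag
        g_dataReady = true;
    }

    // Thread B: check the flag, then read the data.
    static bool TryConsume(out int data) {
        if (g_dataReady) {
            Thread.MemoryBarrier();  // flag read must not pass the data read
            data = g_data;
            return true;
        }
        data = 0;
        return false;
    }
}
```

The barrier placement mirrors the PowerPC slides: between the data write and the flag write on the publishing side, and between the flag read and the data read on the subscribing side.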
66
Compact Framework to the rescue? (1)
“Now, we have access to a fair few Interlocked.xxx methods in the framework, which would do fine if I were programming on Windows, however on the 360 I need to be sure that I am not going to be caught out by write-reordering by the CLR or CPU. (i.e the reading thread spins until Interlocked.xxx sees a flag change, but the writing thread's CPU hasn't finished writing out its data to its cache, causing the reading thread to see old data).” - CosmicFlux, 7/9/2007 Creator’s Club community forum post, “Lightweight locking on the 360”
67
Compact Framework to the rescue? (2)
“From the CF guys who implemented these methods: The Interlocked functions in NETCF provide a memory barrier on both sides of the interlocked operation. (This is different than native Xbox360 programming.) In addition, we provide the Thread.MemoryBarrier api if the customer needs to place an explicit memory barrier. Also, the Monitor functions are generally a higher performance operation than using a Mutex unless there are many many collisions on the lock. They were quite impressed that someone actually understood the issues involved :-)” - Shawn Hargreaves, 7/10/2007 Creator’s Club community forum post, “Lightweight locking on the 360”
68
Partial memory barriers in C#
Don’t have to declare a variable volatile
Instead, you can use
value = Thread.VolatileRead(ref sharedvalue);
Thread.VolatileWrite(ref sharedvalue, value);
as needed
Memory barriers are slow, so manually using VolatileRead & VolatileWrite only where needed lets you optimize, but you have to know what you’re doing
volatile variables perform an implicit VolatileRead on every read and an implicit VolatileWrite on every write
Info from F. Balena, “Visual C# 2005: The Base Class Library,” p. 474.
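A sketch of using VolatileRead/VolatileWrite for a done flag instead of declaring the field volatile. An int is used because Thread.VolatileRead/VolatileWrite have no bool overload; the class and method names are hypothetical:

```csharp
using System.Threading;

class DoneFlag {
    static int done = 0;   // int: VolatileRead/VolatileWrite have no bool overload

    // Worker thread: publish completion with a volatile write.
    static void Worker() {
        // ... do the work ...
        Thread.VolatileWrite(ref done, 1);   // write is immediately visible to readers
    }

    // Main thread: poll with volatile reads so the value is re-fetched
    // every iteration rather than cached in a register.
    static void WaitLoop() {
        while (Thread.VolatileRead(ref done) == 0)
            Thread.Sleep(0);   // yield while spinning (see the polling slide)
    }
}
```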
69
Setting thread priority in C#
t.Priority = ThreadPriority.Normal; or Highest, AboveNormal, BelowNormal, Lowest Defaults to normal OS may ignore you Be careful about boosting thread priority If the priority is too high, you could cause the system to hang and become unresponsive If the priority is too low, the thread may starve Final bullet from Bruce Dawson & Chuck Walbourn, Microsoft Game Technology Group, “Coding for Multiple Cores,” PowerPoint presentation
70
Locating your threads on the Xbox 360
Thread.CurrentThread.SetProcessorAffinity (new int[] {index}); Set thread affinity within the worker thread immediately after starting it Don’t forget to call it, or your worker thread will be running on the same hardware thread as your main thread Only available on Xbox 360 XNA
71
Check to see if you’re on an Xbox 360
#if XBOX360 Thread.CurrentThread.SetProcessorAffinity (new int[] {index}); #endif No way I know of in C# to manually set processor affinity in Windows like on the Xbox 360 Windows decides what threads run where
72
Xbox 360 hardware threads

Index  CPU  Thread  Comment
0      0    0       Not available in XNA
1      0    1       Available; main thread; game runs here by default
2      1    0       Not available in XNA
3      1    1       Available; parts of the Guide and Dashboard live here
4      2    0       Available; Xbox Live Marketplace downloads
5      2    1       Available; parts of the …

Table from Joseph Hall, “XNA Game Studio Express,” p. 608
73
Xbox 360 specific notes (1) “If a program holds a lock for too long—because of poor design or because the thread has been swapped out by a higher priority thread—then other threads may be blocked for a long time.” “This risk is particularly great on Xbox 360, because the software threads are assigned a hardware thread by the developer, and the operating system won't move them to another hardware thread, even if one is idle.” Info from B. Dawson, “Lockless Programming Considerations for Xbox 360 and Microsoft Windows,” msdn2.microsoft.com/en-us/library/bb aspx
74
Xbox 360 specific notes (2) The Xbox 360 also has no protection against priority inversion, where a high-priority thread spins in a loop while waiting for a low-priority thread to release a lock Info from B. Dawson, “Lockless Programming Considerations for Xbox 360 and Microsoft Windows,” msdn2.microsoft.com/en-us/library/bb aspx
75
Advice More than one thread per core isn’t bad…
…but more than one processor-intensive task per core is! Put most intensive tasks on separate cores, and some less-demanding tasks on those same cores (threads that work in short bursts, disk I/O, etc.) Advice from Joseph Hall, “XNA Game Studio Express,” p. 610
76
More advice
Limit the number of synchronization points
Don’t lock resources longer than necessary
Avoid sharing data when possible
Profile your code before and after to make sure you’re getting the performance benefits you expect
Very easy to write multithreaded code that performs worse than single-threaded!
Advice from Joseph Hall, “XNA Game Studio Express,” p. 611