Practical Reduction for Store Buffers Ernie Cohen, Microsoft Norbert Schirmer, DFKI.


1 Practical Reduction for Store Buffers Ernie Cohen, Microsoft Norbert Schirmer, DFKI

2 problem
practical reasoning about imperative code is based on state assertions and invariants
such reasoning tacitly assumes sequential consistency (SC) …
… but real MP hardware doesn’t provide SC
needed: a programming discipline that
  – guarantees SC
  – is flexible enough to handle real software
  – is practical to check

3 x86/x64 hardware model: TSO
FIFO store buffer (SB) between each processor (P) and the (shared, SC) memory
  – P writes are queued onto its SB
  – concurrently, writes leave SBs and are applied to memory
  – a read by P reads from P’s SB if possible; otherwise, it reads from memory (“SB forwarding”)
  – P can flush its own SB (expensive)
note: TSO != “load-acquire, store-release”
  – a read can move backward past a write to the same location, turning into a read of a constant
note: UP TSO machines are SC, but …
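The store-buffer model on this slide can be made concrete with a small executable sketch. The class and method names below (`TSOMachine`, `drain_one`, and so on) are illustrative assumptions, not taken from the talk, and unwritten locations are assumed to read as 0.

```python
from collections import deque


class TSOMachine:
    """Toy TSO model: one FIFO store buffer (SB) per processor over shared memory."""

    def __init__(self, num_procs):
        self.memory = {}  # shared, SC memory
        self.buffers = [deque() for _ in range(num_procs)]  # one SB per P

    def write(self, p, addr, value):
        # A write by P is queued onto P's SB; memory is not touched yet.
        self.buffers[p].append((addr, value))

    def read(self, p, addr):
        # SB forwarding: the newest buffered write by P to addr wins.
        for a, v in reversed(self.buffers[p]):
            if a == addr:
                return v
        # Otherwise read from memory (assumption: unwritten locations hold 0).
        return self.memory.get(addr, 0)

    def drain_one(self, p):
        # The oldest buffered write of P leaves the SB and is applied to memory.
        if self.buffers[p]:
            addr, value = self.buffers[p].popleft()
            self.memory[addr] = value

    def flush(self, p):
        # P empties its own SB in FIFO order.
        while self.buffers[p]:
            self.drain_one(p)


m = TSOMachine(2)
m.write(0, "x", 1)
own_view = m.read(0, "x")      # P0 sees its own buffered write
other_view = m.read(1, "x")    # P1 still sees the old memory value
m.flush(0)
after_flush = m.read(1, "x")   # now visible to P1
```

In this run P0 observes its write immediately through forwarding, while P1 observes it only after the write has left P0's store buffer.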

4 TSO is not SC
TSO is not SC, because of the delay in writes becoming visible to other processors, e.g.
P0: P1:
both Ps can complete under TSO, but not under SC (whichever thread writes second gets stuck)
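The slide's two-processor program is not preserved in this transcript. As a stand-in, the sketch below uses the classic store-buffering litmus test (an assumption, not necessarily the slide's example): P0 writes x then reads y, and P1 writes y then reads x. The function `outcomes` is a hypothetical helper that enumerates every interleaving, with or without store buffers, and collects the values read.

```python
def outcomes(tso):
    """Return every reachable (r0, r1) for the litmus test.

    tso=True models per-processor FIFO store buffers; tso=False models SC.
    """
    # Program of each processor: write 1 to its own flag, then read the other's.
    progs = [[("w", "x"), ("r", "y")], [("w", "y"), ("r", "x")]]
    results = set()

    def step(pcs, mem, sbs, regs):
        done = True
        for p in (0, 1):
            # Option 1: the oldest write in P's store buffer hits memory.
            if sbs[p]:
                done = False
                new_sbs = list(sbs)
                new_sbs[p] = sbs[p][1:]
                step(pcs, {**mem, sbs[p][0]: 1}, tuple(new_sbs), regs)
            # Option 2: P executes its next instruction.
            if pcs[p] < 2:
                done = False
                op, addr = progs[p][pcs[p]]
                new_pcs = tuple(pc + (i == p) for i, pc in enumerate(pcs))
                if op == "w" and tso:
                    # Buffered write (every write in this test stores 1).
                    new_sbs = list(sbs)
                    new_sbs[p] = sbs[p] + (addr,)
                    step(new_pcs, mem, tuple(new_sbs), regs)
                elif op == "w":
                    # SC: the write is applied to memory immediately.
                    step(new_pcs, {**mem, addr: 1}, sbs, regs)
                else:
                    # Read: forward from own SB if possible, else from memory.
                    val = 1 if addr in sbs[p] else mem[addr]
                    new_regs = list(regs)
                    new_regs[p] = val
                    step(new_pcs, mem, sbs, tuple(new_regs))
        if done:
            results.add(regs)

    step((0, 0), {"x": 0, "y": 0}, ((), ()), (None, None))
    return results


sc_results = outcomes(tso=False)
tso_results = outcomes(tso=True)
```

Under SC at least one processor must see the other's write, so `(0, 0)` is absent from `sc_results`. Under TSO both writes can still be sitting in store buffers when the reads execute, so `(0, 0)` appears in `tso_results`.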

5 a simple SC discipline
make sure that P reads only when P’s SB is empty
  – writes dirty the SB; flushes clean it
  – read allowed only when the SB is clean
  – (lazy caching uses a similar trick to achieve SC)
proof of SC:
  – each P simulates a virtual P (that might fall behind)
  – virtual P takes a write step when that write hits memory
  – real and virtual P are in sync on read steps
but this discipline isn’t practical
  – disjoint concurrency shouldn’t require any flushes!
idea: distinguish private and shared memory
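The dirty/clean rule can be checked mechanically over a single processor's sequence of operations. The following is a minimal sketch; the function name `follows_simple_discipline` and the string encoding of operations are assumptions made for illustration.

```python
def follows_simple_discipline(trace):
    """Check one processor's trace of "write", "read" and "flush" operations.

    Returns True if every read happens while the store buffer is clean.
    """
    dirty = False  # ghost flag: may the SB still hold an unflushed write?
    for op in trace:
        if op == "write":
            dirty = True       # writes dirty the SB
        elif op == "flush":
            dirty = False      # flushes clean it
        elif op == "read":
            if dirty:
                return False   # read with a dirty SB violates the discipline
        else:
            raise ValueError(f"unknown operation: {op}")
    return True


ok = follows_simple_discipline(["write", "flush", "read"])
bad = follows_simple_discipline(["write", "read"])
reads_first = follows_simple_discipline(["read", "read", "write"])
```

The checker tracks only a conservative flag, never the actual buffer contents, which is why the discipline forces a flush even when the buffered write and the later read touch unrelated locations.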

6 ownership
each location can be either owned (by a unique processor) or unowned
each access is volatile or nonvolatile
modified discipline:
  – nonvolatile access requires ownership of the location
  – volatile writes dirty the SB
  – volatile reads allowed only when SB is clean
simulation proof is similar, but nonvolatile accesses happen as soon as there are no volatile writes in front of them
  – they’re guaranteed to see the same values when they hit the SB, because other Ps don’t modify locations that P owns
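The modified discipline can be sketched as a checker over one processor's trace. The function name `first_violation` and the tuple encoding `(op, addr, volatile)` are assumptions for illustration, and the set of owned locations is held fixed here for simplicity.

```python
def first_violation(owned, trace):
    """Check one processor's trace against the ownership discipline.

    owned: set of locations this processor owns (fixed for the whole trace).
    trace: list of (op, addr, volatile) with op in {"read", "write", "flush"}.
    Returns the index of the first violating operation, or None if there is none.
    """
    dirty = False  # set by volatile writes, cleared by flushes
    for i, (op, addr, volatile) in enumerate(trace):
        if op == "flush":
            dirty = False
        elif not volatile:
            # Nonvolatile access requires ownership; it never dirties the SB.
            if addr not in owned:
                return i
        elif op == "write":
            dirty = True  # volatile write dirties the SB
        elif dirty:
            return i      # volatile read while the SB is dirty
    return None


private_only = first_violation(
    {"a", "b"},
    [("write", "a", False), ("read", "b", False), ("read", "a", False)],
)
racy_read = first_violation(
    {"a"},
    [("write", "flag", True), ("read", "a", False), ("read", "flag", True)],
)
fixed = first_violation(
    {"a"},
    [("write", "flag", True), ("flush", None, False), ("read", "flag", True)],
)
unowned = first_violation(set(), [("read", "a", False)])
```

The first trace touches only owned locations and needs no flush at all, which is the point of separating private from shared memory.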

7 moving ownership around
use ghost operations to take and release ownership
  – P can take ownership of unowned locations
  – P can release ownership of locations P owns
(this fits with ownership in VCC, where “unowned” means owned by a data object rather than a thread)
discipline in the paper also adds unowned read-only locations, which allows shared nonvolatile reading
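As a rough illustration of the two ghost operations, the sketch below keeps an ownership table and rejects transfers that the rules above do not allow. `OwnershipMap`, `OwnershipError` and the method names are hypothetical; read-only locations are not modeled.

```python
class OwnershipError(Exception):
    """Raised when a ghost ownership operation is not permitted."""


class OwnershipMap:
    """Ghost state: maps each location to its owning processor, or None if unowned."""

    def __init__(self, locations):
        self.owner = {loc: None for loc in locations}

    def take(self, p, loc):
        # P can take ownership only of unowned locations.
        if self.owner[loc] is not None:
            raise OwnershipError(f"{loc} is already owned by P{self.owner[loc]}")
        self.owner[loc] = p

    def release(self, p, loc):
        # P can release ownership only of locations P owns.
        if self.owner[loc] != p:
            raise OwnershipError(f"P{p} does not own {loc}")
        self.owner[loc] = None

    def may_access_nonvolatile(self, p, loc):
        # Nonvolatile access requires ownership of the location.
        return self.owner[loc] == p


ghost = OwnershipMap(["data"])
ghost.take(0, "data")
p0_ok = ghost.may_access_nonvolatile(0, "data")
p1_ok = ghost.may_access_nonvolatile(1, "data")
ghost.release(0, "data")
ghost.take(1, "data")  # ownership has moved from P0 to P1
```

Because the table is ghost state, none of these operations correspond to real memory traffic; they only record who is allowed to make nonvolatile accesses.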

8 ex: spinlocks
typedef … struct _SPIN_LOCK {
    volatile int Lock;
    _(ghost \object prot_obj;)
    _(invariant !Lock ==> \mine(prot_obj))
} SPIN_LOCK;

void Acquire(SPIN_LOCK *SpinLock …) …
{
    int stop;
    do {
        …{ //atomic
            stop = (__interlockedcompareexchange(&SpinLock->Lock, 1, 0) == 0);
            _(if (stop) \giveup_closed_owner(SpinLock->prot_obj, SpinLock);)
        }
    } while (!stop);
}
Microsoft confidential

9 key points
discipline follows some basic VCC methodology
  – discipline expressed in terms of ghost state
  – ghost code “witnesses” conformance to the discipline (much as ghost code is used to witness simulations)
  – by replacing proof obligations with programming obligations, we’re more likely to get programmers to do it
when checking the discipline, we get to assume an SC execution, so we never have to think about the SBs.

10 the only tricky part of the proof
key observation: ownership changes cannot race on their own
  – if they do, there are executions that violate the discipline
therefore, we can pretend that ownership doesn’t get released until the next volatile write

11 a note on ghosts
VCC requires lots of ghost code, including racy operations on volatile ghost state
why doesn’t this introduce flushing?
   SC code follows discipline on real data
=> {SC stripped code simulates SC code}
   SC stripped code follows discipline on real data
=> {reduction theorem}
   stripped code simulates SC stripped code
=> {SC stripped code simulates SC code}
   stripped code simulates SC code

12 how close is this to practice?
discipline followed almost everywhere in the Hv codebase
  – even non-interlocked volatile writes are fairly rare
exceptions (outside of device ops) are writes where
  – the write doesn’t race with other writes
  – racing reads can safely read the old value
  – ex: releasing a spinlock, broadcasting signals
a solution: introduce a new kind of volatile
  – one reader, multiple writers
  – keep track of an upper and lower bound
  – writes must be above the upper bound,
  – writes raise the upper bound
  – flush raises the lower bound to the upper bound
  – reads by other processors raise the lower bound to the value read
  – this works, but is kind of gross
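One possible reading of the bound-tracking rules is sketched below as ghost bookkeeping for a single location whose written values only increase. The class `BoundedVolatile`, its method names, and the check that an observed value lies between the two bounds are assumptions made for illustration; the slide does not spell out these details.

```python
class BoundedVolatile:
    """Ghost bounds for one volatile location with monotonically increasing writes."""

    def __init__(self, initial=0):
        self.lower = initial  # value known to have reached memory or been observed
        self.upper = initial  # newest value written (possibly still in the SB)

    def write(self, value):
        # Writes must be above the upper bound, and they raise it.
        if value <= self.upper:
            raise ValueError("write must be above the upper bound")
        self.upper = value

    def flush(self):
        # After a flush every buffered write has reached memory.
        self.lower = self.upper

    def observe(self, value):
        # A read by another processor returned `value` (assumed to lie in bounds).
        if not (self.lower <= value <= self.upper):
            raise ValueError("observed value outside the tracked bounds")
        self.lower = value


v = BoundedVolatile(0)
v.write(1)
v.write(3)
before = (v.lower, v.upper)
v.observe(1)                 # another processor read the value 1
after_read = (v.lower, v.upper)
v.flush()
after_flush = (v.lower, v.upper)
```

Between a write and the next flush, the interval from `lower` to `upper` captures what a racing reader might still see, which matches the slide's condition that racing reads can safely read an old value.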

