Decomposing Hardware Lock Elision


1 Decomposing Hardware Lock Elision
Stephan Diestelhorst, Christof Fetzer (TU Dresden)

2 My View of Microprocessor and OS Complexity
OS Compatibility, Arch, Uarch, Transactional Memory } > 0 (!) · Hardware Verification Cost
Images: AMD (extremetech.com), Intel (softpedia.com)
Speaker notes: A short intro to the problem: processors are complex. I have been there, so I cannot propose crazily complicated features. OS: no changes, hence careful architectural adaptation. Uarch: only small changes, hence careful microarchitectural extensions. Question: with all this cutting off, is there still room for innovation, is there still interesting stuff left over? Yes!

3 Hardware Lock Elision Primer
while ( !CAS(lock, FREE, TAKEN) )
Critical Section
lock := FREE
(shown for three contending threads)
Speaker notes: Quickly introduce the HLE mechanisms recently proposed by Intel.
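The spin lock on this slide can be sketched as a small plain-Python model. Python exposes no CAS instruction, so the hypothetical helper class `AtomicWord` emulates one by guarding the compare-then-store step with a `threading.Lock`; `acquire`, `release` and `worker` are illustrative names, not part of any real API.

```python
import threading
import time

FREE, TAKEN = 0, 1


class AtomicWord:
    """Model of a memory word with an atomic compare-and-swap (CAS).

    Python has no user-visible CAS, so this model makes the
    compare-then-store step atomic with an internal threading.Lock.
    """

    def __init__(self, value):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        return self._value

    def store(self, value):
        with self._guard:
            self._value = value

    def cas(self, expected, new):
        """If the word holds `expected`, replace it with `new`; report success."""
        with self._guard:
            if self._value == expected:
                self._value = new
                return True
            return False


def acquire(lock):
    # while ( !CAS(lock, FREE, TAKEN) ): spin until we flip FREE to TAKEN.
    while not lock.cas(FREE, TAKEN):
        time.sleep(0)  # yield the GIL so the lock holder can make progress


def release(lock):
    lock.store(FREE)  # lock := FREE


lock = AtomicWord(FREE)
counter = 0


def worker(iterations):
    global counter
    for _ in range(iterations):
        acquire(lock)
        counter += 1  # Critical Section
        release(lock)


threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 3000
```

Every thread writes the lock word on each acquire, so the critical sections of the three threads are serialised even when they touch disjoint data; that is the cost lock elision sets out to remove.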

4 Hardware Lock Elision Primer
while ( ACQUIRE !CAS(lock, FREE, TAKEN) )
Critical Section
RELEASE lock := FREE
(shown for three threads; the CAS now carries an ACQUIRE prefix and the releasing store a RELEASE prefix)

5 Hardware Lock Elision Primer
Acquire Magic → Transaction → Release Magic
(shown for two threads)
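The "acquire magic" and "release magic" can be illustrated with a toy single-threaded model of what happens to the lock word. The class `ElidedSection` and its methods are hypothetical; conflict detection between threads is deliberately left out, so this only shows that the lock is read but never globally written, while the owning thread still sees it as TAKEN.

```python
FREE, TAKEN = 0, 1


class ElidedSection:
    """Toy model of one thread's HLE-elided critical section.

    Not real hardware: conflict detection between threads is not modelled.
    The point is what happens to the lock word itself.
    """

    def __init__(self, memory, lock_addr):
        self.memory = memory          # shared memory: address -> value
        self.lock_addr = lock_addr
        self.read_set = set()         # addresses the transaction depends on
        self.write_buffer = {}        # stores kept private until commit

    def load(self, addr):
        if addr in self.write_buffer:     # a thread sees its own stores
            return self.write_buffer[addr]
        self.read_set.add(addr)
        return self.memory[addr]

    def store(self, addr, value):
        self.write_buffer[addr] = value

    def acquire(self):
        # "Acquire magic": the ACQUIRE-prefixed CAS is not performed globally.
        # The lock joins the read set and only this thread sees it TAKEN.
        if self.load(self.lock_addr) != FREE:
            raise RuntimeError("lock really held: fall back to a normal acquire")
        self.store(self.lock_addr, TAKEN)

    def release(self):
        # "Release magic": the RELEASE store restores FREE, which ends the
        # transaction; the lock word itself is never written to memory.
        self.store(self.lock_addr, FREE)
        for addr, value in self.write_buffer.items():
            if addr != self.lock_addr:
                self.memory[addr] = value


memory = {"lock": FREE, "x": 0, "y": 0}
a = ElidedSection(memory, "lock")
b = ElidedSection(memory, "lock")
a.acquire()
b.acquire()                      # both threads are inside the critical section
print(a.load("lock"), b.load("lock"), memory["lock"])  # 1 1 0
a.store("x", 1)
b.store("y", 2)
a.release()
b.release()
print(memory)  # {'lock': 0, 'x': 1, 'y': 2}
```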

6 Transactions vs. Lock Elision: Abort Handling Comparison
Transactions: TX.start() → Transaction → Abort Handler
Lock Elision: ACQUIRE → Transaction
Flexible contention management?
Speaker notes:
* Lock elision does more than TM: special handling of the lock variable, which needs more HW.
* Lock elision is less flexible than TM: there are no visible aborts. The current efforts in the Linux kernel and glibc to elide locks therefore all use the transactional mode to work around the limitations of HLE.
* Result: people wanting flexibility have to work around the missing access to the advanced lock elision features.
=> I propose to expose the additional features one by one and make them available to programmers.
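The flexibility that the transactional mode offers can be sketched as a retry-then-fallback loop, similar in spirit to the glibc and Linux kernel approach mentioned in the notes (the exact policies used there are not reproduced). Everything here is a hypothetical plain-Python model: `TxAbort` stands in for a hardware abort, `run_elided` and `make_section` are illustrative names, and unlike a real transaction the model cannot roll back side effects, so the sample critical sections abort before doing any work.

```python
import threading


class TxAbort(Exception):
    """Raised by the (simulated) hardware when a transaction aborts."""

    def __init__(self, retryable=True):
        super().__init__()
        self.retryable = retryable


def run_elided(critical_section, fallback_lock, max_retries=3):
    """TM-style elision: the abort is visible, so software picks the policy.

    Returns (path, attempts) where path is "transaction" or "lock".
    """
    attempts = 0
    while attempts < max_retries:
        attempts += 1
        try:
            # TX.start(): a real transaction would read the lock word here and
            # abort if another thread holds the lock non-transactionally.
            if fallback_lock.locked():
                raise TxAbort(retryable=True)
            critical_section(in_tx=True)
            return ("transaction", attempts)      # TX.commit()
        except TxAbort as abort:
            # Abort handler: contention management lives in software.
            if not abort.retryable:
                break
    with fallback_lock:                           # give up: take the real lock
        critical_section(in_tx=False)
    return ("lock", attempts)


def make_section(aborts_before_success, retryable=True):
    """Hypothetical critical section that aborts a fixed number of times."""
    remaining = [aborts_before_success]

    def section(in_tx):
        if in_tx and remaining[0] > 0:
            remaining[0] -= 1
            raise TxAbort(retryable)

    return section


lock = threading.Lock()
print(run_elided(make_section(2), lock))                   # ('transaction', 3)
print(run_elided(make_section(5), lock))                   # ('lock', 3)
print(run_elided(make_section(1, retryable=False), lock))  # ('lock', 1)
```

With HLE alone, the equivalent of the `except` branch is hidden in hardware: the processor simply re-executes the acquire non-transactionally, and software gets no chance to choose a different policy.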

7 Transactions vs. Lock Elision: Lock Variable Secret Sauce
Transactions: TX.start(); lock := TAKEN; assert(lock == TAKEN);
Lock Elision: ACQUIRE; lock := TAKEN; assert(lock == TAKEN);
What lock elision adds: special treatment of the lock variable; prediction.
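The special treatment of the lock variable can be shown with a toy model of three ways to enter a critical section. The names `Tx`, `acquire` and `conflicts` and the three mode strings are hypothetical. Plain TM can either only read the lock (then `assert(lock == TAKEN)` fails inside the section) or really write it (then two threads conflict on the lock word); the HLE-style elided store gives the owning thread a TAKEN view without creating a conflict.

```python
FREE, TAKEN = 0, 1


class Tx:
    """Toy transaction: a read set plus a private write buffer."""

    def __init__(self, memory):
        self.memory = memory
        self.read_set = set()
        self.writes = {}
        self.elided = set()   # stores that stay thread-local (HLE's lock trick)

    def load(self, addr):
        if addr in self.writes:
            return self.writes[addr]
        self.read_set.add(addr)
        return self.memory[addr]

    def store(self, addr, value, elided=False):
        self.writes[addr] = value
        if elided:
            self.elided.add(addr)

    def visible_writes(self):
        # Only non-elided stores take part in conflict detection.
        return self.writes.keys() - self.elided


def conflicts(a, b):
    """Two transactions conflict if one writes what the other has read."""
    return bool(a.visible_writes() & b.read_set or b.visible_writes() & a.read_set)


def acquire(memory, mode):
    tx = Tx(memory)
    if tx.load("lock") != FREE:        # subscribe: the lock joins the read set
        raise RuntimeError("lock is held: abort")
    if mode == "tm_write":             # TX.start(); lock := TAKEN
        tx.store("lock", TAKEN)
    elif mode == "hle":                # ACQUIRE: the store is elided
        tx.store("lock", TAKEN, elided=True)
    return tx                          # "tm_read": lock left FREE in own view


memory = {"lock": FREE}
for mode in ("tm_read", "tm_write", "hle"):
    a, b = acquire(memory, mode), acquire(memory, mode)
    # Does assert(lock == TAKEN) hold, and can the two threads run concurrently?
    print(mode, a.load("lock") == TAKEN, not conflicts(a, b))
# tm_read False True
# tm_write True False
# hle True True
```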

8 Complications of Using Transactions for Lock Elision
* Memcached: short transactions, low-overhead SW prediction [Transact 2010]
* Hotspot JVM: advanced, multi-mode locks; assert(lock == TAKEN); TAKEN1 vs. TAKEN2
Speaker notes: The memcached work transparently replaces pthread mutex lock/unlock through LD_PRELOAD. The predictor has a significant impact on performance, and the prediction is only semi-correctable. Java lock elision (not yet published): they roll their own advanced locks, and transactions cannot acquire the lock for writing. Many code paths check whether the lock is held by the current thread; if not, some try to reacquire it, others use assert(lock == locked). With multi-modal locks, where should the prediction statistics be kept and updated?
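A software elision predictor of the kind discussed here can be sketched as per-lock saturating counters kept in a side table, which leaves the lock word's own encoding (for example TAKEN1 vs. TAKEN2) untouched. This is a generic sketch, not the predictor from the cited memcached work: the class `ElisionPredictor`, the 2-bit counters and the periodic re-probe policy are all assumptions for illustration.

```python
class ElisionPredictor:
    """Software predictor deciding per lock whether elision is worth trying.

    Counters live in a side table keyed by lock identity, so the lock word's
    own encoding (for example TAKEN1 vs. TAKEN2 modes) is left untouched.
    """

    def __init__(self, bits=2, probe_every=8):
        self.max = (1 << bits) - 1    # saturating counter range 0..max
        self.probe_every = probe_every
        self.counters = {}            # lock id -> counter
        self.skipped = {}             # lock id -> elisions skipped so far

    def should_elide(self, lock_id):
        if self.counters.get(lock_id, self.max) > self.max // 2:
            return True
        # Predicted to abort: still re-probe now and then, otherwise a lock
        # that stops conflicting would never be elided again.
        n = self.skipped.get(lock_id, 0) + 1
        self.skipped[lock_id] = n
        return n % self.probe_every == 0

    def update(self, lock_id, committed):
        c = self.counters.get(lock_id, self.max)
        self.counters[lock_id] = min(c + 1, self.max) if committed else max(c - 1, 0)


p = ElisionPredictor()
print(p.should_elide("stats_lock"))   # True: optimistic start
p.update("stats_lock", committed=False)
p.update("stats_lock", committed=False)
print(p.should_elide("stats_lock"))   # False: two aborts in a row
print(p.should_elide("cache_lock"))   # True: other locks are unaffected
```

Keying the table by lock identity is one answer to the "where to put the statistics" question; for multi-modal locks the key could just as well be a (lock, mode) pair, at the cost of more table entries.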

9 Combining HLE Features and TM Flexibility
Combine a low-overhead HW fast path with a flexible SW handler. No extra HW cost.
Mechanisms: Prediction, Lazy Writes, Silent Store Chains
Speaker notes: This is not a HW cost: all of these features are likely already there for HLE. It does not disrupt the incremental upgrade path either: each of them has a trivial fall-back. Split out HLE's features and make them available to SW transactions separately.

10 Software-visible Generic Hardware Prediction
Diagram: the CPU's Branch Predictor, exposed through three instructions:
branch_on_pred <target>, <id>
pred_good <id>
pred_bad <id>
Speaker notes: Advantages: no additional memory traffic; can correlate with other branch events; no additional instructions; early in the pipeline, hence no delay.
Image credit: Gaetan Lee
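The three proposed instructions can be modelled as operations on a small table of 2-bit saturating counters. The semantics below are assumptions, not a real ISA: `branch_on_pred` is taken (jumps to `<target>`) when the entry for `<id>` predicts failure, and `pred_good`/`pred_bad` train that entry. The class name `SoftwareVisiblePredictor` is hypothetical; the modulo indexing mimics the aliasing a finite hardware table would show.

```python
class SoftwareVisiblePredictor:
    """Toy model of the proposed branch_on_pred / pred_good / pred_bad
    instructions, backed by a small table of 2-bit saturating counters.

    Assumed semantics (not from a real ISA): branch_on_pred branches to
    <target> when the entry for <id> predicts failure; pred_good/pred_bad
    train that entry.
    """

    def __init__(self, entries=64):
        self.entries = entries
        self.counters = [3] * entries     # 3 = strongly "good"

    def _index(self, pred_id):
        return pred_id % self.entries     # like a real predictor, ids can alias

    def branch_on_pred(self, pred_id):
        """Return True if the branch to <target> is taken."""
        return self.counters[self._index(pred_id)] < 2

    def pred_good(self, pred_id):
        i = self._index(pred_id)
        self.counters[i] = min(self.counters[i] + 1, 3)

    def pred_bad(self, pred_id):
        i = self._index(pred_id)
        self.counters[i] = max(self.counters[i] - 1, 0)


bp = SoftwareVisiblePredictor()
print(bp.branch_on_pred(17))   # False: fall through to the fast path
bp.pred_bad(17)
bp.pred_bad(17)
print(bp.branch_on_pred(17))   # True: branch to the software path
print(bp.branch_on_pred(81))   # True: 81 aliases with 17 in a 64-entry table
```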

11 Lazy Conflict Detection with Software Control
Diagram: four transactions. One store is marked LAZY (LAZY foo := 1); other transactions read foo (i := foo) or store foo := 1.
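The difference between eager and LAZY conflict detection can be sketched with a tiny single-threaded transactional-memory model. `Tx`, `ToyTM` and `scenario` are hypothetical names, and the "reader loses" policy for eager conflicts is an assumption. In the scenario, a writer stores to foo first and two readers then execute i := foo and commit before the writer does: an eager store aborts the readers immediately, while a LAZY store is only checked at commit, so the readers serialise before the writer and everyone commits.

```python
class Tx:
    def __init__(self, name):
        self.name = name
        self.read_set = set()
        self.eager_writes = {}    # conflict-checked as soon as they happen
        self.lazy_writes = {}     # LAZY: conflict-checked only at commit
        self.status = "active"


class ToyTM:
    """Tiny single-threaded model contrasting eager and LAZY conflict detection."""

    def __init__(self, memory):
        self.memory = memory
        self.txs = []

    def begin(self, name):
        tx = Tx(name)
        self.txs.append(tx)
        return tx

    def _others(self, tx):
        return [o for o in self.txs if o is not tx and o.status == "active"]

    def read(self, tx, addr):
        # An eager store by another live transaction is a conflict right away;
        # this model lets the reader lose (a policy assumption).
        if any(addr in o.eager_writes for o in self._others(tx)):
            tx.status = "aborted"
            return None
        tx.read_set.add(addr)
        if addr in tx.lazy_writes:
            return tx.lazy_writes[addr]
        return tx.eager_writes.get(addr, self.memory[addr])

    def write(self, tx, addr, value, lazy=False):
        if lazy:
            tx.lazy_writes[addr] = value
            return
        for o in self._others(tx):
            if addr in o.read_set:
                o.status = "aborted"      # eager store kills earlier readers now
        tx.eager_writes[addr] = value

    def commit(self, tx):
        if tx.status != "active":
            return False
        # LAZY stores become visible (and conflict-checked) only now.
        for o in self._others(tx):
            if tx.lazy_writes.keys() & o.read_set:
                o.status = "aborted"
        self.memory.update(tx.eager_writes)
        self.memory.update(tx.lazy_writes)
        tx.status = "committed"
        return True


def scenario(lazy):
    tm = ToyTM({"foo": 0})
    writer = tm.begin("writer")
    tm.write(writer, "foo", 1, lazy=lazy)     # foo := 1  /  LAZY foo := 1
    readers = [tm.begin(f"reader{i}") for i in range(2)]
    for r in readers:
        tm.read(r, "foo")                     # i := foo
        tm.commit(r)
    tm.commit(writer)
    return [t.status for t in tm.txs], tm.memory["foo"]


print(scenario(lazy=False))  # (['committed', 'aborted', 'aborted'], 1)
print(scenario(lazy=True))   # (['committed', 'committed', 'committed'], 1)
```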

12 Globally Invisible Store Chains
One transaction: CYCLE foo := 1; CYCLE foo := 2; CYCLE foo := 3
Concurrent transaction: if (foo != 3) TX.abort
Speaker notes: In a transaction, only the last store to a specific region will become visible. If the final store turns the value back to the one it had before the transaction, the global effects of these stores can be discarded.
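The rule in the notes can be written as a commit-time reduction over a transaction's stores: keep only the last store per address, and drop any whose final value equals the pre-transaction value. The function `commit_stores` and the abort-on-mismatch behaviour for CYCLE-marked chains are assumptions made for this sketch.

```python
class TxAbort(Exception):
    pass


def commit_stores(initial, stores):
    """Reduce a transaction's stores to the globally visible ones.

    stores: (address, value, cycle) tuples in program order; cycle=True marks
    a CYCLE store. Assumed semantics: a CYCLE chain must end on the value the
    address held before the transaction, otherwise the transaction aborts.
    """
    final, cycle_addrs = {}, set()
    for addr, value, cycle in stores:
        final[addr] = value               # only the last store per address counts
        if cycle:
            cycle_addrs.add(addr)
    for addr in cycle_addrs:
        if final[addr] != initial[addr]:
            raise TxAbort(f"CYCLE chain on {addr!r} did not return to {initial[addr]}")
    # A store that leaves the pre-transaction value in place is silent.
    return {addr: v for addr, v in final.items() if v != initial[addr]}


initial = {"foo": 3, "bar": 0}
chain = [("foo", 1, True), ("foo", 2, True), ("foo", 3, True), ("bar", 5, False)]
print(commit_stores(initial, chain))  # {'bar': 5}
```

Because the foo chain is never published, a concurrent transaction evaluating if (foo != 3) TX.abort never observes the intermediate values 1 and 2.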

13 Putting It All Together
ACQUIRE:
  branch_on_pred <sw_pred>, 17
  TX.start <abort_hnd>
  if (lock != FREE) jmp <abort_hnd>
  LAZY CYCLE lock := TAKEN
RELEASE:
  CYCLE lock := FREE
  TX.commit
  pred_good <id>
Speaker notes: Show how the just-introduced primitives can be used to implement lock elision with flexible prediction logic.
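The ACQUIRE/RELEASE sequence above can be traced in a single-threaded Python model. Everything here is hypothetical: `Predictor` assumes 2-bit counters where `branch_on_pred` is taken on a predicted failure, `elided` and `sw_path` are illustrative names, a `collections.ChainMap` stands in for the transactional write buffer (reads fall through to memory, stores stay private), and aborts caused by other cores are simulated by the critical section raising `TxAbort`.

```python
from collections import ChainMap

FREE, TAKEN = 0, 1


class TxAbort(Exception):
    pass


class Predictor:
    """2-bit counters per prediction id (assumed semantics)."""

    def __init__(self):
        self.counters = {}

    def branch_on_pred(self, pid):          # True: branch to the SW path
        return self.counters.get(pid, 3) < 2

    def pred_good(self, pid):
        self.counters[pid] = min(self.counters.get(pid, 3) + 1, 3)

    def pred_bad(self, pid):
        self.counters[pid] = max(self.counters.get(pid, 3) - 1, 0)


def sw_path(memory, body):
    # <sw_pred> / fallback: really take the lock (single-threaded model,
    # so there is no other holder to wait for).
    memory["lock"] = TAKEN
    body(memory, in_tx=False)
    memory["lock"] = FREE
    return "locked"


def elided(memory, pred, pred_id, body):
    # ACQUIRE
    if pred.branch_on_pred(pred_id):        # branch_on_pred <sw_pred>, 17
        return sw_path(memory, body)
    tx_writes = {}                          # TX.start <abort_hnd>
    view = ChainMap(tx_writes, memory)      # reads fall through, stores buffered
    try:
        if view["lock"] != FREE:            # if (lock != FREE) jmp <abort_hnd>
            raise TxAbort
        view["lock"] = TAKEN                # LAZY CYCLE lock := TAKEN
        body(view, in_tx=True)              # critical section
        # RELEASE
        view["lock"] = FREE                 # CYCLE lock := FREE
    except TxAbort:                         # <abort_hnd>: SW contention management
        pred.pred_bad(pred_id)              # buffered stores are simply dropped
        return sw_path(memory, body)
    for addr, value in tx_writes.items():   # TX.commit: silent stores vanish
        if memory[addr] != value:
            memory[addr] = value
    pred.pred_good(pred_id)                 # pred_good <id>
    return "elided"


memory = {"lock": FREE, "x": 0}
pred = Predictor()


def increment(view, in_tx):
    assert view["lock"] == TAKEN            # holds on both paths
    view["x"] += 1


print(elided(memory, pred, 17, increment), memory)  # elided {'lock': 0, 'x': 1}

tx_attempts = []


def contended(view, in_tx):
    if in_tx:
        tx_attempts.append(1)
        raise TxAbort                       # pretend another core hit our read set
    view["x"] += 1


print([elided(memory, pred, 17, contended) for _ in range(4)])
print(len(tx_attempts), memory["x"])        # 2 5
```

After two aborts the predictor steers later calls straight to the locked path, so only the first two of the four contended calls attempt a transaction.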

14 Summary
HLE adds interesting HW capabilities. We propose to make these available to general (transactional) programming.

15 My Questions
* Are decomposed lock elision primitives useful?
* What are additional workloads?
* Can we increase usability by small tweaks?
Upcoming Things
Sneak peek: Resurrecting aborted transactions without changing the OS-visible state or the microarchitecture
Speaker notes: Which of this is useful to (S)TM library, compiler, and SW developers? Can these features be effectively exposed to SW? What are other use cases (apart from emulating lock elision)? I can already think of crazy use cases for the prediction feature. Are there architectural tweaks that would make their SW adoption easier? Transactional resurrection and Alert-On-Update (without OS changes, tiny HW adaptation).

