Practical non-blocking data structures. Tim Harris, Computer Laboratory.


1. Practical non-blocking data structures
Tim Harris (tim.harris@cl.cam.ac.uk), Computer Laboratory

2. Overview
- Introduction
  - Lock-free data structures
  - Correctness requirements
- Linked lists using CAS
- Multi-word CAS
- Conclusions

3. Introduction

    class Counter {
      int next = 0;
      int getNumber() {
        int t;
        t = next;
        next = t + 1;
        return t;
      }
    }

What can go wrong here? Starting from next = 0:
- Thread 1 calls getNumber() and reads t = 0.
- Thread 2 calls getNumber() and also reads t = 0.
- Both threads write next = 1, and both return 0.

4. Introduction (2)

    class Counter {
      int next = 0;
      synchronized int getNumber() {
        int t;
        t = next;
        next = t + 1;
        return t;
      }
    }

What about now? Starting from next = 0:
- Thread 1 calls getNumber(), acquires the lock, reads t = 0, sets next = 1, releases the lock and returns 0.
- Thread 2 calls getNumber() meanwhile but must wait; once the lock is released it acquires it, sets next = 2 and returns 1.
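
For readers following along in C, here is a minimal sketch of the locked counter, assuming POSIX threads. A pthread mutex stands in for the lock that Java's synchronized takes. The names counter_t, get_number, worker and run_workers are hypothetical, not from the talk.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical C analogue of the synchronized Counter: the mutex plays the
   part of the lock that `synchronized` acquires and releases. */
typedef struct {
  pthread_mutex_t lock;
  int next;
} counter_t;

static int get_number(counter_t *c) {
  pthread_mutex_lock(&c->lock);    /* "Lock acquired" */
  int t = c->next;
  c->next = t + 1;
  pthread_mutex_unlock(&c->lock);  /* "Lock released" */
  return t;
}

#define THREADS 4
#define CALLS 10000

static counter_t shared = { PTHREAD_MUTEX_INITIALIZER, 0 };

static void *worker(void *arg) {
  (void)arg;
  for (int i = 0; i < CALLS; i++) get_number(&shared);
  return NULL;
}

/* Runs THREADS workers concurrently and returns the final value of next. */
static int run_workers(void) {
  pthread_t tid[THREADS];
  for (int i = 0; i < THREADS; i++) pthread_create(&tid[i], NULL, worker, NULL);
  for (int i = 0; i < THREADS; i++) pthread_join(tid[i], NULL);
  return shared.next;  /* safe to read: pthread_join synchronizes with each worker */
}
```

Every increment happens under the lock, so run_workers() should return THREADS * CALLS. The price is that a thread which cannot run while holding the lock stops everyone else, which is the subject of the next slide.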

5. Introduction (3)

(Same synchronized Counter as on slide 4.)

Now the problem is liveness:
- Priority inversion: thread 1 has low priority and thread 2 high priority, but some other thread 3 (of medium priority) prevents 1 from making any progress. If 1 holds the lock, 2 is stuck waiting for it.
- Sharing: suppose that these operations may be invoked both in ordinary code and in interrupt handlers. A handler that calls getNumber() while the code it interrupted holds the lock waits for a lock that cannot be released until the handler returns.
- Failure: what if thread 1 fails while holding the lock? The lock is still held and the state may be inconsistent.

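6. Introduction (4)

    class Counter {
      int next = 0;
      int getNumber() {
        int t;
        do {
          t = next;
        } while (CAS (&next, t, t + 1) != t);
        return t;
      }
    }

In this case a non-blocking design is easy. CAS (location, expected value, new value) is an atomic compare-and-swap. If the location holds the expected value, CAS replaces it with the new value. In either case, CAS returns the value it found. If another thread changed next between the read and the CAS, the CAS fails and the loop retries with a fresh value.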
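
The same loop can be sketched in C11, assuming `<stdatomic.h>` and POSIX threads. The slide's CAS returns the value it found. C11's atomic_compare_exchange_strong instead returns a bool and writes the observed value back into its expected argument, so the hypothetical helper cas below adapts one convention to the other. The test harness (hits, worker, run_workers) is also illustrative.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Slide-style CAS: if *loc == expected, set *loc = new_value.
   Either way, return the value that was in *loc. */
static int cas(atomic_int *loc, int expected, int new_value) {
  atomic_compare_exchange_strong(loc, &expected, new_value);
  return expected;  /* unchanged on success; the observed value on failure */
}

static atomic_int next = 0;

static int get_number(void) {
  int t;
  do {
    t = atomic_load(&next);
  } while (cas(&next, t, t + 1) != t);  /* another thread got in first: retry */
  return t;
}

#define THREADS 4
#define CALLS 10000
static atomic_int hits[THREADS * CALLS];  /* times each number was handed out */

static void *worker(void *arg) {
  (void)arg;
  for (int i = 0; i < CALLS; i++) atomic_fetch_add(&hits[get_number()], 1);
  return NULL;
}

/* Returns 1 if every number 0 .. THREADS*CALLS-1 was handed out exactly once. */
static int run_workers(void) {
  pthread_t tid[THREADS];
  for (int i = 0; i < THREADS; i++) pthread_create(&tid[i], NULL, worker, NULL);
  for (int i = 0; i < THREADS; i++) pthread_join(tid[i], NULL);
  for (int i = 0; i < THREADS * CALLS; i++)
    if (atomic_load(&hits[i]) != 1) return 0;
  return atomic_load(&next) == THREADS * CALLS;
}
```

No thread ever waits for another here. A thread whose CAS fails retries only because some other thread's CAS succeeded, so the system as a whole keeps making progress.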

7. Correctness
- Safety: we usually want a ‘linearizable’ implementation (Herlihy 1990):
  - The data structure is only accessed through a well-defined interface.
  - Operations on the data structure appear to occur atomically at some point between invocation and response.
- Liveness: usually one of two requirements:
  - A ‘wait-free’ implementation guarantees per-thread progress.
  - A ‘non-blocking’ implementation guarantees only system-wide progress.

8. Overview
- Introduction
- Linked lists using CAS
  - Basic list operations
  - Alternative implementations
  - Extensions
- Multi-word CAS
- Conclusions

9. Lists using CAS
Insert 20:
[Figure: the list H, 10, 30, T. A new node 20 is created with its next pointer set to 30. A CAS then changes 10's next pointer from 30 to 20.]

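10. Lists using CAS (2)
Insert 20:
[Figure: while 20 is being inserted after 10, a node 25 that also points at 30 is inserted concurrently. Both threads try to CAS 10's next pointer from 30. Only one CAS can succeed, and the other thread must retry.]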
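
Here is a minimal sketch of CAS-based insertion into a sorted list, in C11. It assumes sentinel nodes H (key INT_MIN) and T (key INT_MAX), keys strictly between the two, and no deletion yet. The names node_t, new_node and list_insert are hypothetical. The new node is fully initialized before the single CAS that makes it reachable. A failed CAS, for example because 25 was linked in first, simply restarts the search.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdlib.h>

typedef struct node {
  int key;
  _Atomic(struct node *) next;
} node_t;

static node_t *new_node(int key, node_t *next) {
  node_t *n = malloc(sizeof *n);
  n->key = key;
  atomic_init(&n->next, next);
  return n;
}

/* Insert key unless present; returns 1 if inserted, 0 if already there.
   Assumes INT_MIN < key < INT_MAX so the walk always stops at or before T. */
static int list_insert(node_t *head, int key) {
  node_t *n = new_node(key, NULL);
  for (;;) {
    node_t *left = head;
    node_t *right = atomic_load(&left->next);
    while (right->key < key) {  /* find left->key < key <= right->key */
      left = right;
      right = atomic_load(&right->next);
    }
    if (right->key == key) { free(n); return 0; }
    atomic_store(&n->next, right);  /* new node points at its successor first */
    /* Swing left->next from right to n; fails if left->next changed meanwhile. */
    if (atomic_compare_exchange_strong(&left->next, &right, n)) return 1;
    /* CAS failed: search again from the head. */
  }
}
```

Retrying from the head is the simplest choice. Restarting from left would usually be faster, but in this sketch that would need a guarantee that left is still in the list, which only holds because nothing is ever deleted.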

11. Lists using CAS (3)
Delete 10:
[Figure: the list H, 10, 30, T. A CAS changes H's next pointer from 10 to 30, unlinking 10.]

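12. Lists using CAS (4)
Delete 10 & insert 20:
[Figure: one thread deletes 10 with a CAS on H's next pointer (10 to 30). At the same time, another thread inserts 20 with a CAS on 10's next pointer (30 to 20). Both CASes succeed, but 20 is now reachable only from the deleted node 10, so the insertion is lost.]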
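
The lost insertion can be replayed deterministically. The sketch below is in C11 and uses hypothetical names. It performs the two threads' steps one after the other in a single thread, in the order the slide shows, and then checks whether 20 is still reachable from H. It is a replay harness, not a real scheduler.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdlib.h>

typedef struct node {
  int key;
  _Atomic(struct node *) next;
} node_t;

static node_t *new_node(int key, node_t *next) {
  node_t *n = malloc(sizeof *n);
  n->key = key;
  atomic_init(&n->next, next);
  return n;
}

/* Returns 1 if node 20 is reachable from H after the interleaving, else 0. */
static int replay_delete_and_insert(void) {
  node_t *t = new_node(INT_MAX, NULL);
  node_t *n30 = new_node(30, t);
  node_t *n10 = new_node(10, n30);
  node_t *h = new_node(INT_MIN, n10);

  /* Both threads search first. A (delete 10) saw H->next == 10.
     B (insert 20) saw 10->next == 30 and built node 20 -> 30. */
  node_t *a_expect = n10, *b_expect = n30;
  node_t *n20 = new_node(20, n30);

  /* A: CAS H->next from 10 to 30.  B: CAS 10->next from 30 to 20. */
  int a_ok = atomic_compare_exchange_strong(&h->next, &a_expect, n30);
  int b_ok = atomic_compare_exchange_strong(&n10->next, &b_expect, n20);
  assert(a_ok && b_ok);  /* neither CAS notices the other */

  for (node_t *p = atomic_load(&h->next); p != NULL; p = atomic_load(&p->next))
    if (p == n20) return 1;
  return 0;
}
```

Each CAS checks only the single word it updates. Nothing in B's CAS on 10's next field can tell that 10 itself has just been unlinked.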

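13. Logical vs physical deletion
Use a ‘spare’ bit to indicate logically deleted nodes:
[Figure: deleting 10 first sets the mark bit in 10's next pointer (shown as 30X), which logically deletes 10. A second CAS then changes H's next pointer from 10 to 30, deleting it physically. A concurrent insertion of 20 after 10 now fails, because its CAS expects an unmarked pointer to 30.]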
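
Here is a sketch of logical-then-physical deletion in C11. It makes three assumptions. Nodes are at least 2-byte aligned, so bit 0 of a next pointer can serve as the spare mark bit. Keys lie strictly between the sentinels. Nodes are never freed, since safe memory reclamation is not covered here. The helper names, and the choice to unlink marked nodes one at a time during search, are simplifications made for this sketch rather than the talk's exact algorithm.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Sketch only: bit 0 of next is the "logically deleted" mark. */
typedef struct node {
  int key;
  _Atomic uintptr_t next;  /* successor pointer | mark bit */
} node_t;

#define MARK ((uintptr_t)1)
static node_t *ptr(uintptr_t w) { return (node_t *)(w & ~MARK); }
static int marked(uintptr_t w) { return (w & MARK) != 0; }

static node_t *new_node(int key, node_t *next) {
  node_t *n = malloc(sizeof *n);
  n->key = key;
  atomic_init(&n->next, (uintptr_t)next);
  return n;
}

/* Find left and right, adjacent when observed, with left->key < key <= right->key
   and right unmarked; marked nodes met on the way are physically unlinked. */
static void search(node_t *head, int key, node_t **left, node_t **right) {
retry:
  *left = head;
  node_t *cur = ptr(atomic_load(&head->next));
  for (;;) {
    uintptr_t cnext = atomic_load(&cur->next);
    if (marked(cnext)) {  /* cur is logically deleted: unlink it */
      uintptr_t expect = (uintptr_t)cur;
      if (!atomic_compare_exchange_strong(&(*left)->next, &expect, (uintptr_t)ptr(cnext)))
        goto retry;       /* left changed or was itself marked */
      cur = ptr(cnext);
    } else if (cur->key >= key) {
      *right = cur;
      return;
    } else {
      *left = cur;
      cur = ptr(cnext);
    }
  }
}

static int list_insert(node_t *head, int key) {
  node_t *n = new_node(key, NULL);
  for (;;) {
    node_t *left, *right;
    search(head, key, &left, &right);
    if (right->key == key) { free(n); return 0; }
    atomic_store(&n->next, (uintptr_t)right);
    uintptr_t expect = (uintptr_t)right;  /* unmarked: fails if left was deleted */
    if (atomic_compare_exchange_strong(&left->next, &expect, (uintptr_t)n)) return 1;
  }
}

static int list_delete(node_t *head, int key) {
  for (;;) {
    node_t *left, *right;
    search(head, key, &left, &right);
    if (right->key != key) return 0;
    uintptr_t rnext = atomic_load(&right->next);
    if (marked(rnext)) continue;  /* another delete got there first */
    /* Logical deletion: set the mark bit in right's next field. */
    if (!atomic_compare_exchange_strong(&right->next, &rnext, rnext | MARK)) continue;
    /* Physical deletion: unlink right; if this CAS fails, a later search does it. */
    uintptr_t expect = (uintptr_t)right;
    atomic_compare_exchange_strong(&left->next, &expect, rnext);
    return 1;
  }
}
```

Once 10's next field is marked, any CAS that expects the unmarked pointer to 30 fails. An insertion after 10 therefore cannot succeed, and its retry finds 10 already unlinked.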

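14. Implementation problems
Also need to consider visibility of updates:
[Figure: inserting 20 between 10 and 30. A write barrier is needed after node 20 is initialized and before the CAS that links it in. Otherwise another thread could follow the new pointer and see 20's fields uninitialized.]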

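15. Implementation problems (2)
…and the ordering of reads too:
[Figure: a reader traversing H, 10, 30, T while 20 is inserted between 10 and 30.]

    while (val < seek) {
      p = p->next;
      val = p->val;
    }

val = ??? Without ordering between the two reads, the read of p->val may return memory older than the read of p->next that reached node 20. In that case val need not be 20.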
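
In C11 terms, the write barrier and the read ordering correspond to release and acquire memory orderings. The sketch below is a hypothetical two-thread example, not taken from the talk. The writer initializes node 20 and publishes it with a release CAS. The reader follows next pointers with acquire loads, so a reader that reaches 20 is guaranteed to see val == 20. The names writer and find_at_least are illustrative.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct node {
  int val;
  _Atomic(struct node *) next;
} node_t;

static node_t n30 = { 30, NULL };
static node_t n10 = { 10, &n30 };
static node_t n20;  /* initialized by the writer before it is published */

static void *writer(void *arg) {
  (void)arg;
  n20.val = 20;  /* ordinary store */
  atomic_store_explicit(&n20.next, &n30, memory_order_relaxed);
  /* Release CAS = the write barrier: the stores above become visible
     before any thread can observe n10.next == &n20. */
  node_t *expect = &n30;
  atomic_compare_exchange_strong_explicit(&n10.next, &expect, &n20,
                                          memory_order_release, memory_order_relaxed);
  return NULL;
}

/* Reader: walk from 10 until val >= seek. Each acquire load of next orders
   the following read of p->val after the read of the pointer. */
static int find_at_least(int seek) {
  node_t *p = &n10;
  int val = p->val;
  while (val < seek) {
    p = atomic_load_explicit(&p->next, memory_order_acquire);
    val = p->val;
  }
  return val;
}
```

A reader running during the insertion may see either 20 or 30. Which one it sees depends only on whether its load of 10's next field happens before or after the CAS, never on stale contents of node 20.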

16. Overview
- Introduction
- Linked lists using CAS
- Multi-word CAS
  - Design
  - Results
- Conclusions

17. Multi-word CAS
- Atomic read-modify-write to a set of locations
- A useful building block:
  - Many existing designs (queues, stacks, etc.) use CAS2 directly (e.g. Detlefs ’00)
  - More generally, it can be used to move a structure between consistent states
- We’d like it to be non-blocking, disjoint-access parallel, linearizable, and efficient with natural data
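
To pin down what multi-word CAS means, here is a blocking reference version in C, assuming a single global pthread mutex. It shows only the intended semantics: either every comparison succeeds and every write happens, or nothing is written. It is neither non-blocking nor disjoint-access parallel, which is exactly what the design that follows aims to fix. casn_spec and casn_entry_t are hypothetical names, and readers of these words would also have to take the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One location in a multi-word CAS: if *addr == old, it becomes new_val. */
typedef struct {
  uintptr_t *addr;
  uintptr_t old, new_val;
} casn_entry_t;

static pthread_mutex_t casn_lock = PTHREAD_MUTEX_INITIALIZER;

/* Reference semantics only: all n comparisons and writes happen as one
   atomic step because a single global lock serializes every call. */
static bool casn_spec(casn_entry_t *e, size_t n) {
  bool ok = true;
  pthread_mutex_lock(&casn_lock);
  for (size_t i = 0; i < n && ok; i++) ok = (*e[i].addr == e[i].old);
  if (ok)
    for (size_t i = 0; i < n; i++) *e[i].addr = e[i].new_val;
  pthread_mutex_unlock(&casn_lock);
  return ok;
}
```

For example, swapping two vector elements (the kind of update used in the evaluation later) is a CAS2 with entries { &v[0], 1, 2 } and { &v[1], 2, 1 }. Repeating the same call fails and leaves both elements untouched, because the old values no longer match.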

18. Previous work
Lots of designs…

    Design         Parallel   Requires       Reserved bits
    Anderson ’95   Yes        Strong LL/SC   p(w+l)+l, where l = log2 p + log2 a
    I+R ’95        Yes        CAS            p + log2 p
    Herlihy ’93    No         CAS            0
    (unlabelled)   Yes        CAS            0 or 2
    Moir ’97       Yes        Strong LL/SC   log2 p + log2 n
    I+R ’95        Yes        Strong LL/SC   log2 p

(p processors, word size w, max n locations, max a addresses)

…none of them practicable.

19. Design
[Figure: the list H, 10, 20, T with nodes at addresses 0x100, 0x108, 0x110 and 0x118, and next fields at 0x104, 0x10C, 0x114 and 0x11C.]

The example CAS2 unlinks node 20 and clears its next field. Its descriptor is:

    status=UNDECIDED  locations=2
    a1=0x10C  o1=0x110  n1=0x118
    a2=0x114  o2=0x118  n2=null

- Build descriptor
- Acquire locations:
    DCSS (&status, UNDECIDED, 0x10C, 0x110, &descriptor)
    DCSS (&status, UNDECIDED, 0x114, 0x118, &descriptor)
- Decide outcome:
    CAS (&status, UNDECIDED, SUCCEEDED)   giving status=SUCCEEDED
- Release locations:
    CAS (0x10C, &descriptor, 0x118)
    CAS (0x114, &descriptor, null)

20. Reading
[Figure: the list and descriptor from slide 19, with status still UNDECIDED.]

A read must cope with finding a descriptor pointer in the location:

    word_t read (addr_t a) {
      word_t val = *a;
      if (!isDescriptor(val)) return val;
      // val is a descriptor that owns a: find a's entry in it
      if (status == SUCCEEDED) return new value for a;
      else return old value for a;   // UNDECIDED or FAILED
    }

21. Whither DCSS?
Now we need DCSS (double-compare single-swap) built from CAS. This is easier than a full CAS2 for three reasons. The ‘control’ and ‘update’ addresses must not overlap. Only the ‘update’ address may be changed. And we don’t need the result.

The DCSS descriptor for the first acquisition on slide 19, with the CASN descriptor (and hence its status field) at 0x200:

    ac=0x200  oc=0 (UNDECIDED)  au=0x10C  ou=0x110  nu=0x200

DCSS (&status, UNDECIDED, 0x10C, 0x110, &descriptor) then runs:

    CAS (0x10C, 0x110, &DCSSDescriptor);
    if (*0x200 == 0)
      CAS (0x10C, &DCSSDescriptor, 0x200);
    else
      CAS (0x10C, &DCSSDescriptor, 0x110);
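
Putting slides 19 to 21 together, a compact C11 sketch of CASN built on DCSS built on CAS might look as follows. It is not taken from the paper, and its assumptions are labeled in the code:
- The low two bits of every word are reserved for tagging descriptor pointers (bit 0 marks a DCSS descriptor, bit 1 a CASN descriptor).
- An operation touches at most MAX_N locations.
- Entries are listed in increasing address order, so recursive helping cannot cycle.
- Descriptors are never freed.

It follows the slides' phases (acquire with DCSS, decide with a CAS on status, release with CAS) and slide 20's read.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: application values keep their low two bits clear. */
#define DCSS_TAG ((uintptr_t)1)
#define CASN_TAG ((uintptr_t)2)
#define TAGS ((uintptr_t)3)
#define MAX_N 8

enum { UNDECIDED, SUCCEEDED, FAILED };
typedef _Atomic uintptr_t word_t;
typedef struct { word_t *a; uintptr_t o, n; } entry_t;  /* a_i, o_i, n_i */
typedef struct { _Atomic int status; int count; entry_t e[MAX_N]; } casn_t;
typedef struct { _Atomic int *ac; int oc; word_t *au; uintptr_t ou, nu; } dcss_t;

static bool tagged(uintptr_t v, uintptr_t tag) { return (v & TAGS) == tag; }

/* Finish a DCSS whose descriptor sits in *d->au: nu if *ac still holds oc, else ou. */
static void dcss_complete(dcss_t *d) {
  uintptr_t mine = (uintptr_t)d | DCSS_TAG;
  uintptr_t v = atomic_load(d->ac) == d->oc ? d->nu : d->ou;
  atomic_compare_exchange_strong(d->au, &mine, v);
}

/* Set *au = nu only if *ac == oc and *au == ou; returns the value seen at au. */
static uintptr_t dcss(dcss_t *d) {
  uintptr_t mine = (uintptr_t)d | DCSS_TAG;
  for (;;) {
    uintptr_t seen = d->ou;
    if (atomic_compare_exchange_strong(d->au, &seen, mine)) {
      dcss_complete(d);
      return d->ou;
    }
    if (!tagged(seen, DCSS_TAG)) return seen;
    dcss_complete((dcss_t *)(seen & ~TAGS));  /* help the other DCSS, then retry */
  }
}

static uintptr_t dcss_read(word_t *a) {
  for (;;) {
    uintptr_t v = atomic_load(a);
    if (!tagged(v, DCSS_TAG)) return v;
    dcss_complete((dcss_t *)(v & ~TAGS));
  }
}

static bool casn(casn_t *cd) {
  uintptr_t mine = (uintptr_t)cd | CASN_TAG;
  if (atomic_load(&cd->status) == UNDECIDED) {
    int s = SUCCEEDED;
    /* Acquire locations, each only while status is still UNDECIDED. */
    for (int i = 0; i < cd->count && s == SUCCEEDED; i++) {
      dcss_t *d = malloc(sizeof *d);  /* never freed: helpers may still read it */
      *d = (dcss_t){ &cd->status, UNDECIDED, cd->e[i].a, cd->e[i].o, mine };
      uintptr_t v = dcss(d);
      if (tagged(v, CASN_TAG)) {
        if (v != mine) { casn((casn_t *)(v & ~TAGS)); i--; }  /* help, retry entry i */
      } else if (v != cd->e[i].o) {
        s = FAILED;
      }
    }
    /* Decide outcome: only the first CAS on status takes effect. */
    int expect = UNDECIDED;
    atomic_compare_exchange_strong(&cd->status, &expect, s);
  }
  /* Release locations still holding our descriptor, to the new or old value. */
  bool ok = atomic_load(&cd->status) == SUCCEEDED;
  for (int i = 0; i < cd->count; i++) {
    uintptr_t expect = mine;
    atomic_compare_exchange_strong(cd->e[i].a, &expect, ok ? cd->e[i].n : cd->e[i].o);
  }
  return ok;
}

/* Slide 20's read: a location owned by a CASN holds n_i if it SUCCEEDED, else o_i. */
static uintptr_t casn_read(word_t *a) {
  uintptr_t v = dcss_read(a);
  if (!tagged(v, CASN_TAG)) return v;
  casn_t *cd = (casn_t *)(v & ~TAGS);
  int i = 0;
  while (cd->e[i].a != a) i++;  /* a must be one of cd's locations */
  return atomic_load(&cd->status) == SUCCEEDED ? cd->e[i].n : cd->e[i].o;
}
```

A thread that finds another operation's descriptor helps that operation finish instead of waiting for it. That helping is what makes the scheme non-blocking. Because DCSS installs a descriptor only while the status is UNDECIDED, a slow helper cannot re-acquire a location after the outcome has been decided.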

22. Evaluation: method
Attempt to permute elements in a vector. We can control:
- Level of concurrency
- Length of the vector
- Number of elements being permuted
- Padding between elements
- Management of descriptors

23. Evaluation: small systems
gargantubrain.cl: 4-processor IA-64 (Itanium); vector=1024, width=2-64, no padding.
µs per successful update, by algorithm used (rows) and CASn width, i.e. words permuted per update (columns):

    Algorithm      2     4     8    16    32    64
    HF           1.6   2.8   6.0    17    71   280
    HF-RC        1.5   2.6   5.6    16    68   270
    IR           3.4   4.4   7.9    19    76   300
    MCS          5.6   8.2    13    24    46    92
    MCS-FG       1.4   2.8   6.0    14    42   130

24. Evaluation: large systems
hodgkin.hpcf: 64-processor Origin-2000, MIPS R12000; vector=1024, width=2, one element per cache line.
[Graph: ms per successful update against number of processors, for HF-RC, IR and MCS.]

25. Overview
- Introduction
- Linked lists using CAS
- Multi-word CAS
- Conclusions

26. Conclusions
Some general techniques:
- The descriptor pointers serve two purposes:
  - They allow ‘helpers’ to find the information needed to complete their work.
  - They indicate ownership of locations.
- Correctness seems clearest when thinking about the state of the shared memory, not the state of individual threads.
- Unlike previous work, we need only a small and constant number of reserved bits (e.g. 2 to identify descriptor pointers if there’s no type information available at run time).

27. Conclusions (2)
Our scheme is the first practical one:
- Can operate on general pointer-based data structures
- Competitive with lock-based schemes
- Can operate on highly parallel systems
- Disjoint-access parallel, non-blocking, linearizable

http://www.cl.cam.ac.uk/~tlh20/papers/hfp-casn-submitted.pdf

