Copyright © 2010, Oracle and/or its affiliates. All rights reserved. Who’s Afraid of a Big Bad Lock Nir Shavit Sun Labs at Oracle Joint work with Danny.

1

2 Who’s Afraid of a Big Bad Lock. Nir Shavit, Sun Labs at Oracle. Joint work with Danny Hendler, Itai Incze, and Moran Tzafrir. Copyright © 2010, Oracle and/or its affiliates. All rights reserved.

3 Multicore Software Scaling. [Figure: user code on multicore hardware, with speedups of 1.8x, 3.6x, and 7x.] Unfortunately, not so simple…

4 Amdahl’s Law: Speedup = 1/(ParallelPart/N + SequentialPart). Pay for N = 8 cores with SequentialPart = 25%, and the speedup is only 2.9 times! Why? As the number of cores grows, the effect of the 25% becomes more acute: 2.3 on 4 cores, 2.9 on 8, 3.4 on 16, 3.7 on 32, … and 4.0 in the limit of infinitely many cores.
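
The numbers on this slide can be reproduced directly from the formula. The following is a minimal sketch; the function name `amdahl_speedup` is introduced here for illustration and does not come from the talk.

```python
def amdahl_speedup(sequential_part: float, n_cores: float) -> float:
    """Speedup = 1 / (ParallelPart / N + SequentialPart)."""
    parallel_part = 1.0 - sequential_part
    return 1.0 / (parallel_part / n_cores + sequential_part)


# SequentialPart = 25%, as on the slide.
for n in (4, 8, 16, 32):
    print(n, round(amdahl_speedup(0.25, n), 1))
# prints 2.3, 2.9, 3.4, 3.7 for 4, 8, 16, 32 cores

# With unboundedly many cores the parallel term vanishes,
# leaving 1 / SequentialPart.
print(amdahl_speedup(0.25, float("inf")))  # 4.0
```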

5 Amdahl and Shared Data Structures. [Figure: two panels, each splitting the work of the cores (c) into 75% unshared and 25% shared. Coarse grained: the reason we get only a 2.9 speedup. Fine grained: fine-grained parallelism has a huge performance benefit.]

6 But… can we always draw the right conclusions from Amdahl’s law? Claim: sometimes the overhead of using fine-grained synchronization is so high that it is better to have a single thread do all the work sequentially in order to avoid it.

7 Software Combining Tree [Yew et al.]: n requests in log n time. [Figure: a combining tree above the object.] The tree requires a major coordination effort: multiple CAS operations, cache misses, etc.

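8 Oyama et al. Mutex. [Figure: an object guarded by a lock; requests a, b, c, and d are linked in at Head with CAS().] The lock holder applies a, b, c, and d to the object, returns the responses, and releases the lock. Every request involves a CAS.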

9 Flat Combining: have a single lock holder collect and perform the requests of all others, without using CAS operations to coordinate requests, and with combining of requests (if the cost of k batched operations is less than that of k operations in sequence, we win).

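10 Flat-Combining. [Figure: an object guarded by a lock with a counter (54), and a publication list reached from Head whose records hold requests such as Enq(d) and Deq() with time stamps 12, 54, and 53; records are linked in at Head with CAS().] The combiner collects requests from the publication list, applies them to the object, and then tries again to collect requests. Most requests do not involve a CAS, in fact, not even a memory barrier.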
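
The flow on this slide can be sketched in a few dozen lines. This is an illustrative sketch only, not the authors' implementation: the class and method names are hypothetical, a small `_publish_lock` stands in for the CAS that links a new record at Head, and Python's interpreter hides the memory-ordering details that the slide's "not even a memory barrier" remark is about.

```python
import threading
import time
from collections import deque


class _Record:
    """One thread's slot in the publication list (hypothetical layout)."""

    def __init__(self):
        self.request = None    # (op, arg) while pending, None once served
        self.response = None


class FlatCombiningQueue:
    def __init__(self):
        self._items = deque()                   # plain sequential FIFO
        self._combiner_lock = threading.Lock()  # the single "big" lock
        self._records = []                      # publication list
        self._publish_lock = threading.Lock()   # stand-in for the CAS on Head
        self._local = threading.local()

    def _my_record(self):
        rec = getattr(self._local, "rec", None)
        if rec is None:                         # first call from this thread
            rec = self._local.rec = _Record()
            with self._publish_lock:
                self._records.append(rec)
        return rec

    def _combine(self):
        with self._publish_lock:
            snapshot = list(self._records)
        for rec in snapshot:                    # collect and apply requests
            req = rec.request
            if req is None:
                continue
            op, arg = req
            if op == "enq":
                self._items.append(arg)
                rec.response = None
            else:
                rec.response = self._items.popleft() if self._items else None
            rec.request = None                  # written last: marks "served"

    def _submit(self, op, arg=None):
        rec = self._my_record()
        rec.request = (op, arg)                 # publish with a plain write
        while rec.request is not None:
            if self._combiner_lock.acquire(blocking=False):
                try:
                    self._combine()             # serve everyone, including us
                finally:
                    self._combiner_lock.release()
            else:
                time.sleep(0)                   # yield while a combiner works
        return rec.response

    def enqueue(self, item):
        self._submit("enq", item)

    def dequeue(self):
        return self._submit("deq")              # None means "queue was empty"
```

After its first operation, a thread only writes its own record and waits; only the thread that wins the lock touches the shared structure, which is the point the slide makes.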

11 Flat-Combining Pub-List Cleanup. [Figure: publication list reached from Head with records Enq(d), Deq(), and Enq(d), time stamps 12, 54, and 53, and counter 54.] Every combiner increments the counter and updates a record’s time stamp when returning a response. Traverse the list and remove the records with an old time stamp. If a thread reappears, it must add itself to the pub list again. Cleanup requires no CAS, only reads and writes.
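
A small sketch of the aging rule described on this slide, run sequentially as the combiner would. The `Record` fields, the `max_age` threshold, the `active` flag, and the pairing of requests with time stamps are assumptions made for illustration; the slide only says that records with an old time stamp are removed and that a returning thread re-adds itself.

```python
from dataclasses import dataclass


@dataclass
class Record:
    request: str
    time_stamp: int      # value of the combiner counter when last served
    active: bool = True  # False once unlinked from the publication list


def cleanup(pub_list, counter, max_age):
    """Return the records that are still fresh; mark the rest inactive."""
    kept = []
    for rec in pub_list:
        if counter - rec.time_stamp > max_age:
            rec.active = False          # plain write, no CAS
        else:
            kept.append(rec)
    return kept


def republish(pub_list, rec, counter):
    """A thread that reappears checks its flag and re-adds its record."""
    if not rec.active:
        rec.active = True
        rec.time_stamp = counter
        pub_list.append(rec)


# Time stamps 54, 12, 53 and counter 54, as in the slide's figure.
pub_list = [Record("Enq(d)", 54), Record("Deq()", 12), Record("Enq(d)", 53)]
stale = pub_list[1]
pub_list = cleanup(pub_list, counter=54, max_age=10)
print([r.time_stamp for r in pub_list])   # [54, 53]
republish(pub_list, stale, counter=54)
print(len(pub_list))                      # 3
```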

12 Fine-Grained FIFO Queue: the lock-free algorithm by Michael and Scott, used in JDK 6.0 (on > 10 million desktops). [Figure: a linked list a, b, c, d between Head and Tail; P: Dequeue() => a and Q: Enqueue(d) each use CAS().]

13 Flat-Combining FIFO Queue. [Figure: a publication list with requests Enq(a), Enq(b), and Deq() in front of a lock-protected sequential FIFO queue a, b, c, d between Head and Tail.] OK, but we can do better by combining: collect all items into a “fat node” and enqueue it in one step.

14 Flat-Combining FIFO Queue. [Figure: the same publication list in front of a sequential “fat node” FIFO queue, whose nodes between Head and Tail each hold several items.] OK, but we can do better by combining: collect all items into a “fat node” and enqueue it in one step. A “fat node” is easy to build sequentially but cannot be done in a concurrent algorithm without CAS.
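
The “fat node” idea can be illustrated with a purely sequential queue, which is all the combiner needs because it runs under the lock. This is a sketch under assumptions: the class name `FatNodeQueue` and its methods are hypothetical, and each fat node is modeled as a `deque` of items.

```python
from collections import deque


class FatNodeQueue:
    """Sequential FIFO whose nodes each hold a batch of items."""

    def __init__(self):
        self._nodes = deque()   # every entry is one non-empty "fat node"

    def enqueue_batch(self, items):
        """Link all items collected in one combiner pass as a single node."""
        if items:
            self._nodes.append(deque(items))

    def dequeue(self):
        if not self._nodes:
            return None         # queue is empty
        head = self._nodes[0]
        item = head.popleft()
        if not head:            # drop the node once it is drained
            self._nodes.popleft()
        return item


q = FatNodeQueue()
q.enqueue_batch(["a", "b", "c"])   # three Enq requests, one step
q.enqueue_batch(["d", "e"])
print([q.dequeue() for _ in range(6)])
# ['a', 'b', 'c', 'd', 'e', None]
```

The batch is appended with one operation regardless of how many enqueue requests the combiner collected, which is where the saving over k separate enqueues comes from.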

15 Linearizable FIFO Queue. [Plot: curves labeled Flat Combining; Combining tree; and MS queue, Oyama, and Log-Synch. An arrow marks the “better” direction.]

16 Benefits of Flat Combining. [Plots: Flat Combining in red; an arrow marks the “better” direction; log scale.]

17 Linearizable Stack. [Plot: curves labeled Flat Combining, Elimination Stack, and Treiber Lock-free Stack; an arrow marks the “better” direction.]

18 Priority Queue. [Plot: curves labeled Lotan Shavit lock-based SkipQueue, Lotan Shavit lock-free SkipQueue, and flat combining with a sequential pairing heap plugged in; an arrow marks the “better” direction.]

19 Parallel FC Synchronous Queues. [Plot: curves labeled Single Flat Combining, Parallel Flat Combining, JDK 6.0, JDK no parks, and Elimination tree; an arrow marks the “better” direction.] Parallel FC single-thread performance is still better than JDK.

20 Why? [Plots: Parallel Flat Combining in blue; an arrow marks the “better” direction; log scale.]

21 Summary: FC provides superior linearizable implementations of quite a few structures. Parallel FC, when applicable, allows FC + scalability. Good fit with heterogeneous architectures. But FC and Parallel FC are not always applicable, for example, for search trees (we tried).

22 In the Future: Data-Structures Will have to Adapt

23 But How? Randomized, Relaxed, and Flat. A lot more randomization… (hash tables, skiplists, randomized collections). Relaxed fairness… (unordered pools instead of queues and stacks). Move away from comparison-based algorithms… (tries and hash tables instead of trees).

24 Figuring out what these future structures will look like is part of what we do at ScaleSynch… Thanks

