Published by Destin Saker. Modified over 2 years ago.

1
Eliminating Read Barriers through Procrastination and Cleanliness
KC Sivaramakrishnan, Lukasz Ziarek, Suresh Jagannathan

2
Big Picture
Lightweight user-level threads: lots of concurrency!
[Diagram: scheduler; threads t1, t2, ..., tn; Core 1, Core 2, ..., Core n; heap]

3
Big Picture
Lots of concurrency! Expendable resource?
[Diagram: scheduler; threads t1, t2, ..., tn; heap]

4
Big Picture
Lots of concurrency! Expendable resource?
GC operation: alleviate MM cost?
Exploit program concurrency to eliminate read barriers from thread-local collectors
[Diagram: scheduler; threads t1, t2, ..., tn; heap]

5
MultiMLton
Goals
– Safety, scalability, ready for future manycore processors
Parallel extension of the MLton SML compiler and runtime
Parallel extension of Concurrent ML
– Lots of concurrency!
– Threads interact by sending messages over first-class channels
[Diagram: channel C; one thread calls send (c, v), another obtains v from recv (c)]

6
MultiMLton GC: Considerations
Standard ML – functional PL with side-effects
– Most objects are small and ephemeral ⇒ independent generational GC
– # Mutations << # Reads ⇒ keep the cost of reads low
Minimize NUMA effects
Run on non-cache-coherent HW

7
MultiMLton GC: Design
Thread-local GC
– NUMA awareness
– Circumvent cache-coherence issues
[Diagram: each core has its own local heap; all cores share one shared heap]

8
Invariant Preservation
Read and write barriers for preserving invariants
Exporting write r := x: the target r is in the shared heap, the source x is in the local heap
– The transitive closure of x is moved to the shared heap, leaving forwarding pointers (FWD) behind in the local heap
– Mutator needs read barriers!
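The exporting write on this slide can be sketched in a few lines of Python. This is only an illustrative model, not MultiMLton's runtime code: the `Obj` class and the functions `lift`, `read_barrier`, and `exporting_write` are hypothetical names introduced here. It shows why a mutator that still holds the old local address needs a read barrier once the closure has been moved.

```python
class Obj:
    """Toy heap object; fields may hold plain values or other Obj instances."""
    def __init__(self, heap, *fields):
        self.heap = heap          # "local" or "shared"
        self.fields = list(fields)
        self.forward = None       # forwarding pointer (FWD), set once the object moves

def read_barrier(obj):
    """Return the current copy of obj, following a forwarding pointer if present."""
    return obj.forward if obj.forward is not None else obj

def lift(obj):
    """Copy the transitive closure of obj into the shared heap, leaving FWDs behind."""
    if not isinstance(obj, Obj) or obj.heap == "shared":
        return obj
    if obj.forward is not None:   # already moved (also makes cycles terminate)
        return obj.forward
    copy = Obj("shared")
    obj.forward = copy
    copy.fields = [lift(f) for f in obj.fields]
    return copy

def exporting_write(ref, index, value):
    """Write barrier for ref.fields[index] := value, where ref lives in the shared heap."""
    assert ref.heap == "shared"
    ref.fields[index] = lift(value)
```

After `exporting_write(r, 0, x)`, any stale reference to the local `x` reaches the shared copy only through `read_barrier(x)`, which is the cost the rest of the talk sets out to remove.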

9
Challenge
Object reads are pervasive
– RB overhead ∝ cost (RB) * frequency (RB)
Read barrier optimization
– Stacks and registers never point to forwarded objects
[Chart: read barrier overhead (%); mean overheads of 20.1%, 15.3%, and 21.3%]

10
Mutator and Forwarded Objects
# Encountered forwarded objects / # RB invocations < 0.00001
Eliminate read barriers altogether

11
RB Elimination
Visibility invariant
– Mutator does not encounter forwarded objects
Observation
– No forwarded objects created ⇒ visibility invariant ⇒ no read barriers
Exploit concurrency: Procrastination!

12
Procrastination
[Diagram: r1 and r2 in the shared heap; x1 and x2 in the local heap; thread T1 is about to perform r1 := x1 and thread T2 r2 := x2. Legend: a thread is running, suspended, or blocked.]

13
Procrastination
[Diagram: r1 := x1 is placed on the delayed write list and T1 is blocked]
Control switches to T2

14
Procrastination
[Diagram: r1 := x1 and r2 := x2 are both on the delayed write list]

15
Procrastination
[Diagram: x1 and x2 are moved to the shared heap, leaving FWD pointers in the local heap]

16
Procrastination
Force local GC
[Diagram: the delayed writes r1 := x1 and r2 := x2 are completed]
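The sequence on the Procrastination slides can be modeled roughly as follows. This is a sketch under stated assumptions: `Core`, `runnable`, `delayed`, and `forced_gcs` are hypothetical names, a one-element list stands in for a shared-heap reference, and the real local GC (which moves the closure and repairs every reference to it) is reduced to performing the pending assignment.

```python
from collections import deque

class Core:
    """Toy model of one core's scheduler with a delayed write list."""
    def __init__(self):
        self.runnable = deque()   # threads ready to run
        self.delayed = []         # delayed write list: (thread, ref, value)
        self.forced_gcs = 0       # local GCs forced because nothing was runnable

    def exporting_write(self, thread, ref, value):
        """Block the writer and delay the write instead of moving value now."""
        self.delayed.append((thread, ref, value))
        return self.schedule()

    def schedule(self):
        """Return the next runnable thread, forcing a local GC if there is none."""
        if not self.runnable:
            self.forced_gcs += 1
            self.local_gc()
        return self.runnable.popleft() if self.runnable else None

    def local_gc(self):
        """Complete the delayed writes and make the blocked threads runnable again."""
        for thread, ref, value in self.delayed:
            ref[0] = value        # stand-in for lifting value's closure during GC
            self.runnable.append(thread)
        self.delayed.clear()
```

With one other runnable thread, T1's exporting write simply switches control to T2; only when T2 also performs an exporting write and nothing else can run does the model force a local GC, which matches the final slide of the sequence.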

17
Correctness
Does Procrastination introduce deadlocks?
– Threads can be procrastinated while holding a lock!
[Diagram: threads T1 and T2. Legend: a thread is running, suspended, or blocked.]

18
Correctness
Does Procrastination introduce deadlocks?
– Threads can be procrastinated while holding a lock!
Is Procrastination safe?
– Yes. Forcing a local GC unblocks the threads.
– No deadlocks or livelocks!
[Diagram: threads T1 and T2]

19
Correctness
[Diagram: threads T1 and T2; the slide text repeats the previous slide]

20
Is Procrastination alone enough?
Efficacy (Procrastination) ∝ # available runnable threads
[Diagram: execution timeline with nodes M, F, W1, and J, marking serial phases (low thread availability) and concurrent phases (high thread availability)]
With Procrastination, half of local major GCs were forced
Goal: eager exporting writes that still preserve the visibility invariant

21
Cleanliness
A clean object closure can be lifted to the shared heap without breaking the visibility invariant
r := x with inSharedHeap (r) && inLocalHeap (x) && isClean (x) ⇒ eager write (no Procrastination)
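The rule on this slide amounts to a three-way dispatch in the write barrier. The sketch below is illustrative only: `write_barrier` and its predicate parameters are hypothetical stand-ins for the slide's inSharedHeap, inLocalHeap, and isClean checks, and the "procrastinate" fallback for unclean sources is taken from the later slides of this talk.

```python
def write_barrier(r, x, in_shared_heap, in_local_heap, is_clean):
    """Decide how to handle r := x; the three predicates are supplied by the caller."""
    if in_shared_heap(r) and in_local_heap(x):
        # Exporting write: lift a clean closure right away, otherwise delay the write.
        return "eager" if is_clean(x) else "procrastinate"
    # Not an exporting write: no special handling is needed.
    return "plain"
```

Passing the predicates as arguments keeps the sketch independent of any particular heap layout.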

22
Cleanliness: Intuition
lift (x) to shared heap
[Diagram: object x in the local heap and its copy in the shared heap]

23
Cleanliness: Intuition
Find all references to FWD
[Diagram: x has been moved to the shared heap; a FWD pointer remains in the local heap]

24
Cleanliness: Intuition
Need to scan the entire local heap

25
Cleanliness: Simpler question
Do all references originate from heap region h?
sizeof (h) << sizeof (local heap)
[Diagram: region h inside the local heap; x moved to the shared heap, FWD left behind]

26
Cleanliness: Simpler question
Only scan the heap region h. Heap session!
sizeof (h) << sizeof (local heap)

27
Heap Sessions
Source of an exporting write is often
– Young
– Rarely referenced from outside the closure
Current session closed & new session opened
– After an exporting write, a user-level context switch, or a local GC
[Diagram: local heap from Start: previous session (old objects), then current session from SessionStart to Frontier (young objects), then free space]

28
Heap Sessions
Source of an exporting write is often
– Young
– Rarely referenced from outside the closure
Current session closed & new session opened
– After an exporting write, a user-level context switch, or a local GC
– SessionStart is moved to Frontier
Average session size < 4KB
[Diagram: local heap from Start: previous session, then Frontier and SessionStart at the same position, then free space]
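The session bookkeeping described on the Heap Sessions slides needs only two pointers on a bump-allocated local heap. The following sketch assumes integer addresses and uses hypothetical names (`LocalHeap`, `alloc`, `close_session`, `in_current_session`); it is not the MultiMLton allocator.

```python
class LocalHeap:
    """Toy bump allocator that tracks the current heap session."""
    def __init__(self, size):
        self.size = size
        self.session_start = 0    # SessionStart: first address of the current session
        self.frontier = 0         # Frontier: next free address

    def alloc(self, nbytes):
        """Bump-allocate nbytes and return the object's address."""
        if self.frontier + nbytes > self.size:
            raise MemoryError("local heap exhausted; a local GC would run here")
        addr = self.frontier
        self.frontier += nbytes
        return addr

    def in_current_session(self, addr):
        """True if addr was allocated since the current session was opened."""
        return self.session_start <= addr < self.frontier

    def close_session(self):
        """Close the session (after an exporting write, context switch, or local GC)."""
        self.session_start = self.frontier
```

Because closing a session is a single pointer assignment, sessions can be closed frequently, which is consistent with the small average session size quoted on the slide.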

29
Cleanliness: Eager exporting writes
A clean object closure
– is fully contained within the current session
– has no references from the previous session
[Diagram: objects X, Y, Z in the current session of the local heap; r in the shared heap; write r := x]

30
Cleanliness: Eager exporting writes
A clean object closure
– is fully contained within the current session
– has no references from the previous session
Walk and fix FWD
[Diagram: X, Y, Z are lifted to the shared heap; FWD pointers left in the current session are walked and fixed]

31
Avoid tracing current session?
Many SML objects are tree-structured (List, Tree, etc.)
– Specialize for no pointers from outside the closure
ref_count (x) = 0 && ∀ x' ∊ transitive closure (x) \ {x}, ref_count (x') = 1
– No refs from outside
– ref_count does not consider pointers from the stack or registers
Eager exporting write
– No current session tracing needed!
[Diagram: local heap with x(0), y(1), z(1)]

32
Cleanliness: Reference Count
States: Zero, One, LocalMany, Global
[Diagram: object X in the current session in each state: X(0) Zero, X(1) One, X(LM) LocalMany, X(G) Global, the last with a pointer from the previous session]
Purpose
– Track pointers from the previous session to the current session
– Identify tree-structured objects
Does not track pointers from the stack and registers
– Reference count only triggered during object initialization and mutation
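One way to read the four states is as a saturating counter that is updated whenever a heap object stores a pointer. The transition function below is an assumption made for illustration (the slide names the states but does not spell out the transitions), and `RefCount` and `on_pointer_store` are hypothetical names.

```python
from enum import IntEnum

class RefCount(IntEnum):
    ZERO = 0        # no heap references recorded
    ONE = 1         # exactly one reference, from the current session
    LOCAL_MANY = 2  # several references, all from the current session
    GLOBAL = 3      # at least one reference from outside the current session

def on_pointer_store(target_count, source_in_current_session):
    """New count for the target after a heap object stores a pointer to it.

    Assumed behaviour: the count only grows, and stack or register
    references never reach this function.
    """
    if target_count is RefCount.GLOBAL or not source_in_current_session:
        return RefCount.GLOBAL
    if target_count is RefCount.ZERO:
        return RefCount.ONE
    return RefCount.LOCAL_MANY
```

Under this reading the counter needs only two bits per object and is touched only on initialization and mutation, never on reads.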

33
Bringing it all together
For all x' ∊ transitive closure (x), if max (ref_count (x')) is
– One & ref_count (x) = 0 ⇒ clean, tree-structured ⇒ session tracing not needed
– LocalMany ⇒ clean ⇒ trace current session
– Global ⇒ 1+ pointer from previous session ⇒ Procrastinate
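The three cases above can be written as a small decision function. This is a sketch, assuming the closure is already known to lie entirely inside the current session; the integer encoding of the states and the name `exporting_write_strategy` are hypothetical.

```python
ZERO, ONE, LOCAL_MANY, GLOBAL = range(4)

def exporting_write_strategy(root_count, other_counts):
    """Choose how to perform r := x.

    root_count is ref_count (x); other_counts holds the counts of every
    other object in the transitive closure of x.
    """
    highest = max([root_count, *other_counts])
    if highest == GLOBAL:
        return "procrastinate"                  # referenced from a previous session
    if root_count == ZERO and all(c == ONE for c in other_counts):
        return "eager, no session tracing"      # clean and tree-structured
    return "eager, trace current session"       # clean, but shared within the session
```

The closures x(0), y(1), z(1); x(0), y(LM), z(1); and x(0), y(1), z(G) used in the examples later in the talk fall into the first, second, and third case respectively.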

34
Cleanliness: Tree-structured Closure
[Diagram: closure x(0), y(1), z(1) in the current session of the local heap; thread T1 with its current stack; r in the shared heap; write r := x]

35
Cleanliness: Tree-structured Closure
Walk current stack
[Diagram: x, y, z are lifted to the shared heap, leaving FWD in the local heap]

36
Cleanliness: Tree-structured Closure
No need to walk current session!
[Diagram: r points to x, y, z in the shared heap]

37
Cleanliness: Tree-structured Closure
[Diagram: T1 with the current stack and T2 with the next stack; FWD in the local heap]

38
Cleanliness: Tree-structured Closure
Context switch: walk target stack
[Diagram: T1's stack is now the previous stack and T2's stack is the current stack]

39
Cleanliness: Object graph
[Diagram: closure x(0), y(LM), z(1) and object a in the local heap; current stack; r in the shared heap; write r := x]

40
Cleanliness: Object graph
Walk current stack
Walk current session
[Diagram: x, y, z are lifted to the shared heap, leaving FWD in the local heap]

41
Cleanliness: Object graph
Walk current stack
Walk current session
[Diagram: references to x, y, z now point into the shared heap]

42
Cleanliness: Global Reference
[Diagram: closure x(0), y(1), z(G) and object a in the local heap; thread T1 with its current stack; r in the shared heap; write r := x]

43
Cleanliness: Global Reference
Procrastinate
[Diagram: same layout; z(G) is referenced from the previous session]

44
Immutable Objects
Specialize exporting writes
If immutable object in previous session
– Copy to shared heap
– Original object unmodified
Immutable objects in SML do not have identity
Avoid space leaks
– Treat large immutable objects as mutable

45
Cleanliness: Summary
Cleanliness allows eager exporting writes while preserving the visibility invariant
With Procrastination + Cleanliness, <1% of local GCs were forced
Additional niceties
– Completely dynamic, portable
– Does not impose any restriction on the GC strategy

46
Evaluation
Variants
– RB- : TLC with Procrastination and Cleanliness
– RB+ : TLC with read barriers
Sansom's dual-mode GC
– Cheney's 2-space copying collection, Jonkers' sliding mark-compacting
– Generational, 2 generations, no aging
Target architectures
– 16-core AMD Opteron server (NUMA)
– 48-core Intel SCC (non-cache-coherent)
– 864-core Azul Vega3

47
Results
Speedup: at 3X min heap size, RB- faster than RB+
– AMD 32% (2X faster than STW collector)
– SCC 20%
– AZUL 30%
Concurrency
– During an exporting write, 8 runnable user-level threads/core!

48
Cleanliness Impact
RB- MU- : RB- GC ignoring mutability for Cleanliness
RB- CL- : RB- GC ignoring Cleanliness (only Procrastination)
[Chart: average slowdown; values 11.4%, 28.2%, 31.7%]

49
Conclusion
Eliminate the need for read barriers by preserving the visibility invariant
– Procrastination: exploit concurrency to delay exporting writes
– Cleanliness: exploit the generational property to perform exporting writes eagerly

50
Questions?
http://multimlton.cs.purdue.edu
