Download presentation

Presentation is loading. Please wait.

Published byDestin Saker Modified about 1 year ago

1
Eliminating Read Barriers through Procrastination and Cleanliness KC Sivaramakrishnan Lukasz Ziarek Suresh Jagannathan

2
Big Picture 2 Lightweight user-level threads Scheduler 1 t1 t2 tn Lots of concurrency! Core 1 Core n Core 2 Heap

3
Big Picture 3 Expendable resource? Big Picture 3 Scheduler 1 t1 t2 tn Lots of concurrency! Heap

4
Big Picture 4 Expendable resource? Big Picture 4 Scheduler 1 t1 t2 tn Lots of concurrency! Heap Exploit program concurrency to eliminate read barriers from thread-local collectors GC Operation Alleviate MM cost?

5
MultiMLton Goals – Safety, Scalability, ready for future manycore processors Parallel extension of MLton SML compiler and runtime Parallel extension of Concurrent ML – Lots of Concurrency! – Interact by sending messages over first-class channels 5 C send (c, v) v recv (c)

6
MultiMLton GC: Considerations Standard ML – functional PL with side-effects – Most objects are small and ephemeral Independent generational GC – # Mutations << # Reads Keep cost of reads to be low Minimize NUMA effects Run on non-cache coherent HW 6

7
MultiMLton GC: Design 7 Core Local Heap Core Local Heap Core Local Heap Core Local Heap Shared Heap Thread-local GC NUMA Awareness Circumvent cache-coherence issues

8
Invariant Preservation Read and write barriers for preserving invariants 8 Shared Heap r r Local Heap x x Target Source Exporting writes r := x Shared Heap r r Local Heap x x FWD Transitive closure of x Mutator needs read barriers!

9
Challenge Object reads are pervasive – RB overhead ∝ cost (RB) * frequency (RB) Read barrier optimization – Stacks and Registers never point to forwarded objects % 15.3 % 21.3 % Mean Overhead Read barrier overhead (%)

10
Mutator and Forwarded Objects 10 # RB invocations # Encountered forwarded objects < Eliminate read barriers altogether

11
RB Elimination Visibility Invariant – Mutator does not encounter forwarded objects Observation – No forwarded objects created ⇒ visibility invariant ⇒ No read barriers Exploit concurrency Procrastination! 11

12
Procrastination Shared Heap r1 Local Heap x1 T1T2 r2 x2 r1 := x1 r2 := x2 T T is running T T is suspended T T is blocked 12

13
Procrastination Shared Heap r1 Local Heap x1 r1 := x1 T1T2 r2 := x2 r2 x2 Delayed write list Control switches to T2 T T is running T T is suspended T T is blocked 13

14
Procrastination Shared Heap r1 Local Heap x1 T1T2 r2 x2 Delayed write list r1 := x1 r2 := x2 T T is running T T is suspended T T is blocked 14

15
Procrastination Shared Heap r1 Local Heap T1T2 r2 x2 Delayed write list r1 := x1 r2 := x2 x1 T T is running T T is suspended T T is blocked FWD 15 FWD

16
Procrastination Shared Heap r1 Local Heap T1T2 r2 x2 Delayed write list x1 Force local GC T T is running T T is suspended T T is blocked 16 r1 := x1 r2 := x2

17
Correctness Does Procrastination introduce deadlocks? – Threads can be procrastinated while holding a lock! 17 T1 T2 T T is running T T is suspended T T is blocked

18
Correctness 18 T1 Is Procrastination safe? – Yes. Forcing a local GC unblocks the threads. – No deadlocks or livelocks! T2T T is running T T is suspended T T is blocked Does Procrastination introduce deadlocks? – Threads can be procrastinated while holding a lock!

19
Correctness 19 T1 T2 Does Procrastination introduce deadlocks? – Threads can be procrastinated while holding a lock! T T is running T T is suspended T T is blocked Is Procrastination safe? – Yes. Forcing a local GC unblocks the threads. – No deadlocks or livelocks!

20
Efficacy (Procrastination) ∝ # Available runnable threads Is Procrastination alone enough? 20 M W1W1 W1W1 W1W1 F J Serial (low thread availability) Concurrent (high thread availability) With Procrastination, half of local major GCs were forced Eager exporting writes while preserving visibility invariant

21
Cleanliness A clean object closure can be lifted to the shared heap without breaking the visibility invariant 21 r := x inSharedHeap (r) inLocalHeap (x) && isClean (x) inLocalHeap (x) && isClean (x) Eager write (no Procrastination)

22
Cleanliness: Intuition 22 Shared Heap Local Heap x x lift (x) to shared heap

23
Shared Heap Local Heap Cleanliness: Intuition 23 x x FWD find all references to FWD

24
Shared Heap Local Heap Cleanliness: Intuition 24 x x Need to scan the entire local heap

25
Local Heap h h Shared Heap Cleanliness: Simpler question 25 x x FWD Do all references originate from heap region h? sizeof (h) << sizeof (local heap)

26
Local Heap h h Shared Heap Cleanliness: Simpler question 26 x x Only scan the heap region h. Heap session! sizeof (h) << sizeof (local heap)

27
Current session closed & new session opened – After an exporting write, a user-level context switch, a local GC Heap Sessions Source of an exporting write is often – Young – rarely referenced from outside the closure 27 Previous Session Current Session Free Local Heap SessionStart Frontier Young Objects Old Objects Old Objects Start

28
Current session closed & new session opened – After an exporting write, a user-level context switch, a local GC – SessionStart is moved to Frontier Heap Sessions Source of an exporting write is often – Young – rarely referenced from outside the closure 28 Average session size < 4KB Previous SessionFree Local Heap Frontier & SessionStart Start

29
Cleanliness: Eager exporting writes 29 A clean object closure – is fully contained within the current session – has no references from previous session Previous Session Current Session Free Local Heap X X Y Y Z Z r := x r r Shared Heap

30
Cleanliness: Eager exporting writes 30 A clean object closure – is fully contained within the current session – has no references from previous session Previous Session Current Session Free Local Heap X X Y Y Z Z r := x r r Shared Heap Walk and fix FWD

31
Avoid tracing current session? Many SML objects are tree-structured (List, Tree, etc,.) – Specialize for no pointers from outside the closure ∀ x’ ∊ transitive closure (x), ref_count (x) = 0 && ref_count (x’) = 1 31 Local Heap x(0) y(1) z(1) Eager exporting write – No current session tracing needed! No refs from outside – ref_count does not consider pointers from stack or registers

32
Cleanliness: Reference Count 32 Current Session X(0) Current Session X(1) Current Session X(LM) Current Session X(G) Prev Sess Zero One LocalMany Global Purpose – Track pointers from previous session to current session – Identify tree-structured object Does not track pointers from stack and registers – Reference count only triggered during object initialization and mutation

33
Bringing it all together ∀ x’ ∊ transitive closure (x), if max (ref_count (x’)) – One & ref_count (x) = 0 ⇒ Clean tree-structured ⇒ Session tracing not needed – LocalMany ⇒ Clean ⇒ Trace current session – Global ⇒ 1+ pointer from previous session ⇒ Procrastinate 33

34
Cleanliness: Tree-structured Closure 34 Previous Session Current Session x(0) y(1) z(1) T1 Local Heap Shared heap r := x r r current stack

35
Shared heap Cleanliness: Tree-structured Closure 35 Previous Session Current Session x x y y z z T1 current stack Local Heap r := x FWD r r Walk current stack

36
Shared heap Cleanliness: Tree-structured Closure 36 Previous Session Current Session x x y y z z T1 current stack Local Heap r := x r r No need to walk current session!

37
Shared heap Cleanliness: Tree-structured Closure 37 Previous Session Current Session x x y y z z T1 Local Heap r := x r r T2 Next stack FWD current stack

38
Shared heap Cleanliness: Tree-structured Closure 38 Previous Session Current Session x x y y z z T1 previous stack Local Heap r := x r r T2 current stack Context Switch Walk target stack

39
Cleanliness: Object graph 39 Previous Session Current Session x(0) y ( LM ) z(1) current stack Local Heap Shared heap r := x r r a a

40
Shared heap Cleanliness: Object graph 40 Previous Session Current Session x x y y z z current stack Local Heap r := x r r a a FWD Walk current stack Walk current session

41
Shared heap Cleanliness: Object graph 41 Previous Session Current Session x x y y z z current stack Local Heap r := x r r a a Walk current stack Walk current session

42
Cleanliness: Global Reference 42 Previous Session Current Session x(0) y(1) z(G) T1 current stack Local Heap Shared heap r := x r r a a

43
Cleanliness: Global Reference 43 Previous Session Current Session x(0) y(1) z(G) T1 current stack Local Heap Shared heap r := x r r a a Procrastinate

44
Immutable Objects Specialize exporting writes If immutable object in previous session – Copy to shared heap Immutable objects in SML do not have identity – Original object unmodified Avoid space leaks – Treat large immutable objects as mutable 44

45
Cleanliness: Summary Cleanliness allows eager exporting writes while preserving visibility invariant With Procrastination + Cleanliness, <1% of local GCs were forced Additional niceties – Completely dynamic Portable – Does not impose any restriction on the GC strategy 45

46
Evaluation 46 Variants – RB- : TLC with Procrastination and Cleanliness – RB+ : TLC with read barriers Sansom’s dual-mode GC – Cheney’s 2-space copying collection Jonker’s sliding mark-compacting – Generational, 2 generations, No aging Target Architectures: – 16-core AMD Opteron server (NUMA) – 48-core Intel SCC (non-cache coherent) – 864-core Azul Vega3

47
Results Speedup: At 3X min heap size, RB- faster than RB+ – AMD 32% (2X faster than STW collector) – SCC 20% – AZUL 30% Concurrency – During exporting write, 8 runnable user-level threads/core! 47

48
Cleanliness Impact RB- MU- : RB- GC ignoring mutability for Cleanliness RB- CL- :RB- GC ignoring Cleanliness (Only Procrastination) 48 Avg. slowdown % 28.2% 31.7%

49
Conclusion Eliminate the need for read barriers by preserving the visibility invariant – Procrastination: Exploit concurrency for delaying exporting writes – Cleanliness: Exploit generational property for eagerly perform exporting writes 49

50
Questions? 50

Similar presentations

© 2017 SlidePlayer.com Inc.

All rights reserved.

Ads by Google