Presentation is loading. Please wait.

Presentation is loading. Please wait.

Eliminating Read Barriers through Procrastination and Cleanliness KC Sivaramakrishnan Lukasz Ziarek Suresh Jagannathan.

Similar presentations


Presentation on theme: "Eliminating Read Barriers through Procrastination and Cleanliness KC Sivaramakrishnan Lukasz Ziarek Suresh Jagannathan."— Presentation transcript:

1 Eliminating Read Barriers through Procrastination and Cleanliness KC Sivaramakrishnan Lukasz Ziarek Suresh Jagannathan

2 Big Picture 2 Lightweight user-level threads Scheduler 1 t1 t2 tn Lots of concurrency! Core 1 Core n Core 2 Heap

3 Big Picture 3 Expendable resource? Big Picture 3 Scheduler 1 t1 t2 tn Lots of concurrency! Heap

4 Big Picture 4 Expendable resource? Big Picture 4 Scheduler 1 t1 t2 tn Lots of concurrency! Heap Exploit program concurrency to eliminate read barriers from thread-local collectors GC Operation Alleviate MM cost?

5 MultiMLton Goals – Safety, Scalability, ready for future manycore processors Parallel extension of MLton SML compiler and runtime Parallel extension of Concurrent ML – Lots of Concurrency! – Interact by sending messages over first-class channels 5 C send (c, v) v  recv (c)

6 MultiMLton GC: Considerations Standard ML – functional PL with side-effects – Most objects are small and ephemeral Independent generational GC – # Mutations << # Reads Keep cost of reads to be low Minimize NUMA effects Run on non-cache coherent HW 6

7 MultiMLton GC: Design 7 Core Local Heap Core Local Heap Core Local Heap Core Local Heap Shared Heap Thread-local GC NUMA Awareness Circumvent cache-coherence issues

8 Invariant Preservation Read and write barriers for preserving invariants 8 Shared Heap r r Local Heap x x Target Source Exporting writes r := x Shared Heap r r Local Heap x x FWD Transitive closure of x Mutator needs read barriers!

9 Challenge Object reads are pervasive – RB overhead ∝ cost (RB) * frequency (RB) Read barrier optimization – Stacks and Registers never point to forwarded objects % 15.3 % 21.3 % Mean Overhead Read barrier overhead (%)

10 Mutator and Forwarded Objects 10 # RB invocations # Encountered forwarded objects < Eliminate read barriers altogether

11 RB Elimination Visibility Invariant – Mutator does not encounter forwarded objects Observation – No forwarded objects created ⇒ visibility invariant ⇒ No read barriers Exploit concurrency  Procrastination! 11

12 Procrastination Shared Heap r1 Local Heap x1 T1T2 r2 x2  r1 := x1 r2 := x2 T  T is running T  T is suspended T  T is blocked 12

13 Procrastination Shared Heap r1 Local Heap x1 r1 := x1 T1T2  r2 := x2 r2 x2 Delayed write list  Control switches to T2 T  T is running T  T is suspended T  T is blocked 13

14 Procrastination Shared Heap r1 Local Heap x1 T1T2 r2 x2 Delayed write list  r1 := x1 r2 := x2 T  T is running T  T is suspended T  T is blocked 14

15 Procrastination Shared Heap r1 Local Heap T1T2 r2 x2 Delayed write list  r1 := x1 r2 := x2 x1 T  T is running T  T is suspended T  T is blocked FWD 15 FWD

16 Procrastination Shared Heap r1 Local Heap T1T2 r2 x2 Delayed write list  x1 Force local GC T  T is running T  T is suspended T  T is blocked 16  r1 := x1 r2 := x2

17 Correctness Does Procrastination introduce deadlocks? – Threads can be procrastinated while holding a lock! 17 T1 T2 T  T is running T  T is suspended T  T is blocked

18 Correctness 18 T1 Is Procrastination safe? – Yes. Forcing a local GC unblocks the threads. – No deadlocks or livelocks! T2T  T is running T  T is suspended T  T is blocked Does Procrastination introduce deadlocks? – Threads can be procrastinated while holding a lock!

19 Correctness 19 T1 T2 Does Procrastination introduce deadlocks? – Threads can be procrastinated while holding a lock! T  T is running T  T is suspended T  T is blocked Is Procrastination safe? – Yes. Forcing a local GC unblocks the threads. – No deadlocks or livelocks!

20 Efficacy (Procrastination) ∝ # Available runnable threads Is Procrastination alone enough? 20 M W1W1 W1W1 W1W1 F J Serial (low thread availability) Concurrent (high thread availability) With Procrastination, half of local major GCs were forced Eager exporting writes while preserving visibility invariant

21 Cleanliness A clean object closure can be lifted to the shared heap without breaking the visibility invariant 21 r := x inSharedHeap (r) inLocalHeap (x) && isClean (x) inLocalHeap (x) && isClean (x) Eager write (no Procrastination)

22 Cleanliness: Intuition 22 Shared Heap Local Heap x x lift (x) to shared heap

23 Shared Heap Local Heap Cleanliness: Intuition 23 x x FWD find all references to FWD

24 Shared Heap Local Heap Cleanliness: Intuition 24 x x Need to scan the entire local heap

25 Local Heap h h Shared Heap Cleanliness: Simpler question 25 x x FWD Do all references originate from heap region h? sizeof (h) << sizeof (local heap)

26 Local Heap h h Shared Heap Cleanliness: Simpler question 26 x x Only scan the heap region h. Heap session! sizeof (h) << sizeof (local heap)

27 Current session closed & new session opened – After an exporting write, a user-level context switch, a local GC Heap Sessions Source of an exporting write is often – Young – rarely referenced from outside the closure 27 Previous Session Current Session Free Local Heap SessionStart Frontier Young Objects Old Objects Old Objects Start

28 Current session closed & new session opened – After an exporting write, a user-level context switch, a local GC – SessionStart is moved to Frontier Heap Sessions Source of an exporting write is often – Young – rarely referenced from outside the closure 28 Average session size < 4KB Previous SessionFree Local Heap Frontier & SessionStart Start

29 Cleanliness: Eager exporting writes 29 A clean object closure – is fully contained within the current session – has no references from previous session Previous Session Current Session Free Local Heap X X Y Y Z Z r := x r r Shared Heap

30 Cleanliness: Eager exporting writes 30 A clean object closure – is fully contained within the current session – has no references from previous session Previous Session Current Session Free Local Heap X X Y Y Z Z r := x r r Shared Heap Walk and fix FWD

31 Avoid tracing current session? Many SML objects are tree-structured (List, Tree, etc,.) – Specialize for no pointers from outside the closure ∀ x’ ∊ transitive closure (x), ref_count (x) = 0 && ref_count (x’) = 1 31 Local Heap x(0) y(1) z(1) Eager exporting write – No current session tracing needed! No refs from outside – ref_count does not consider pointers from stack or registers

32 Cleanliness: Reference Count 32 Current Session X(0) Current Session X(1) Current Session X(LM) Current Session X(G) Prev Sess Zero One LocalMany Global Purpose – Track pointers from previous session to current session – Identify tree-structured object Does not track pointers from stack and registers – Reference count only triggered during object initialization and mutation

33 Bringing it all together ∀ x’ ∊ transitive closure (x), if max (ref_count (x’)) – One & ref_count (x) = 0 ⇒ Clean tree-structured ⇒ Session tracing not needed – LocalMany ⇒ Clean ⇒ Trace current session – Global ⇒ 1+ pointer from previous session ⇒ Procrastinate 33

34 Cleanliness: Tree-structured Closure 34 Previous Session Current Session x(0) y(1) z(1) T1 Local Heap Shared heap r := x r r current stack

35 Shared heap Cleanliness: Tree-structured Closure 35 Previous Session Current Session x x y y z z T1 current stack Local Heap r := x FWD r r Walk current stack

36 Shared heap Cleanliness: Tree-structured Closure 36 Previous Session Current Session x x y y z z T1 current stack Local Heap r := x r r No need to walk current session!

37 Shared heap Cleanliness: Tree-structured Closure 37 Previous Session Current Session x x y y z z T1 Local Heap r := x r r T2 Next stack FWD current stack

38 Shared heap Cleanliness: Tree-structured Closure 38 Previous Session Current Session x x y y z z T1 previous stack Local Heap r := x r r T2 current stack Context Switch Walk target stack

39 Cleanliness: Object graph 39 Previous Session Current Session x(0) y ( LM ) z(1) current stack Local Heap Shared heap r := x r r a a

40 Shared heap Cleanliness: Object graph 40 Previous Session Current Session x x y y z z current stack Local Heap r := x r r a a FWD Walk current stack Walk current session

41 Shared heap Cleanliness: Object graph 41 Previous Session Current Session x x y y z z current stack Local Heap r := x r r a a Walk current stack Walk current session

42 Cleanliness: Global Reference 42 Previous Session Current Session x(0) y(1) z(G) T1 current stack Local Heap Shared heap r := x r r a a

43 Cleanliness: Global Reference 43 Previous Session Current Session x(0) y(1) z(G) T1 current stack Local Heap Shared heap r := x r r a a Procrastinate

44 Immutable Objects Specialize exporting writes If immutable object in previous session – Copy to shared heap Immutable objects in SML do not have identity – Original object unmodified Avoid space leaks – Treat large immutable objects as mutable 44

45 Cleanliness: Summary Cleanliness allows eager exporting writes while preserving visibility invariant With Procrastination + Cleanliness, <1% of local GCs were forced Additional niceties – Completely dynamic  Portable – Does not impose any restriction on the GC strategy 45

46 Evaluation 46 Variants – RB- : TLC with Procrastination and Cleanliness – RB+ : TLC with read barriers Sansom’s dual-mode GC – Cheney’s 2-space copying collection  Jonker’s sliding mark-compacting – Generational, 2 generations, No aging Target Architectures: – 16-core AMD Opteron server (NUMA) – 48-core Intel SCC (non-cache coherent) – 864-core Azul Vega3

47 Results Speedup: At 3X min heap size, RB- faster than RB+ – AMD 32% (2X faster than STW collector) – SCC 20% – AZUL 30% Concurrency – During exporting write, 8 runnable user-level threads/core! 47

48 Cleanliness Impact RB- MU- : RB- GC ignoring mutability for Cleanliness RB- CL- :RB- GC ignoring Cleanliness (Only Procrastination) 48 Avg. slowdown % 28.2% 31.7%

49 Conclusion Eliminate the need for read barriers by preserving the visibility invariant – Procrastination: Exploit concurrency for delaying exporting writes – Cleanliness: Exploit generational property for eagerly perform exporting writes 49

50 Questions? 50


Download ppt "Eliminating Read Barriers through Procrastination and Cleanliness KC Sivaramakrishnan Lukasz Ziarek Suresh Jagannathan."

Similar presentations


Ads by Google