Portable, mostly-concurrent, mostly-copying GC for multi-processors Tony Hosking Secure Software Systems Lab Purdue University.

1 Portable, mostly-concurrent, mostly-copying GC for multi-processors Tony Hosking Secure Software Systems Lab Purdue University

2 Platform assumptions
- Symmetric multi-processor (SMP/CMP)
- Multiple mutator threads
- (Large heaps)

3 Desirable properties
- Maximize throughput
- Minimize collector pauses
- Scalability

4 Exploiting parallelism
- Avoid contention
- (Mostly-)Concurrent allocation
- (Mostly-)Concurrent collection

5 Concurrent allocation
- Use thread-private allocation “pages”
- Threads contend for free pages
- Each thread allocates from its own page
  - multiple small objects per page, or
  - multiple pages per large object
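The allocation scheme on this slide can be sketched roughly as follows. This is a minimal illustration, not the Modula-3 run-time code: the names `PagePool`, `ThreadAllocator`, and `PAGE_SIZE` are hypothetical, addresses are modelled as `(page, offset)` pairs, and large objects simply take the next free pages in order.

```python
import threading

PAGE_SIZE = 4096  # hypothetical page size in bytes (assumption)


class PagePool:
    """Shared pool of free pages: the only point where threads contend."""

    def __init__(self, n_pages):
        self._lock = threading.Lock()
        self._free = list(range(n_pages))

    def take(self, count=1):
        # Threads contend here, but only when they need fresh pages.
        with self._lock:
            if len(self._free) < count:
                raise MemoryError("out of free pages")
            taken = self._free[:count]
            del self._free[:count]
            return taken


class ThreadAllocator:
    """Per-thread bump allocator; the common path takes no lock."""

    def __init__(self, pool):
        self.pool = pool
        self.page = None
        self.offset = PAGE_SIZE  # forces a page grab on the first allocation

    def alloc(self, size):
        if size > PAGE_SIZE:
            # Large object: multiple pages per object.
            n = -(-size // PAGE_SIZE)  # ceiling division
            pages = self.pool.take(n)
            return (pages[0], 0)
        if self.offset + size > PAGE_SIZE:
            # Current page is full: fetch a new private page.
            self.page = self.pool.take(1)[0]
            self.offset = 0
        addr = (self.page, self.offset)
        self.offset += size  # small objects share the thread's page
        return addr
```

Each mutator thread would own one `ThreadAllocator`, so the lock in `PagePool.take` is touched only once per page rather than once per object.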

6 Concurrent collection: The tricolour abstraction
- Black
  - “live”
  - scanned
  - cannot refer to white
- Grey
  - “live” wavefront
  - still to be scanned
  - may refer to any color
- White
  - hypothetical garbage

7 Garbage collection
- White = whole heap
- Shade root targets grey
- While grey nonempty
  - Shade one grey object black
  - Shade its white children grey
- At end, white objects are garbage
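The loop on this slide can be written out directly. The sketch below is illustrative only: the heap is modelled as a dictionary from object name to a list of child names, and the function name `collect` is an assumption.

```python
from collections import deque


def collect(heap, roots):
    """Tricolour marking over heap: dict mapping object -> list of children.

    Returns (black, white): the live set and the garbage set.
    """
    white = set(heap)   # White = whole heap
    grey = deque()      # wavefront still to be scanned
    black = set()       # scanned, cannot refer to white

    # Shade root targets grey.
    for r in roots:
        if r in white:
            white.remove(r)
            grey.append(r)

    # While grey is nonempty, blacken one object and grey its white children.
    while grey:
        obj = grey.popleft()
        black.add(obj)
        for child in heap[obj]:
            if child in white:
                white.remove(child)
                grey.append(child)

    # At end, whatever is still white is garbage.
    return black, white
```

For example, with `heap = {"a": ["b"], "b": ["a", "c"], "c": [], "d": ["c"], "e": ["d"]}` and roots `["a"]`, the objects `d` and `e` stay white even though `d` points at a live object.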

8 Copying collection
- Partition white from black by copying
- Reclaim white partition wholesale
- At next GC, “flip” black to white
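One common way to realise this partitioning is a Cheney-style scan, where copied but unscanned objects in to-space play the role of grey. The slides do not say which copying algorithm is used, so the following is a sketch under that assumption; `copy_collect` and its data layout (from-space as a dictionary of address to child addresses) are hypothetical.

```python
def copy_collect(from_space, roots):
    """Copy everything reachable from roots out of from_space.

    from_space: dict mapping address -> list of child addresses.
    Returns (to_space, new_roots), where to_space is a list of objects and
    every reference is an index into that list.
    """
    to_space = []   # black objects sit below `scan`, grey at or above it
    forward = {}    # forwarding table: from-space address -> to-space index

    def copy(addr):
        # Copy an object at most once; later visits reuse the forwarding entry.
        if addr not in forward:
            forward[addr] = len(to_space)
            to_space.append(list(from_space[addr]))  # fields still point to from-space
        return forward[addr]

    new_roots = [copy(r) for r in roots]

    scan = 0
    while scan < len(to_space):
        fields = to_space[scan]
        for i, child in enumerate(fields):
            fields[i] = copy(child)  # redirect each field into to-space
        scan += 1                    # the object is now black

    # Anything left only in from_space is white and is reclaimed wholesale.
    return to_space, new_roots
```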

9 Incremental collection [figure labels: Mutator threads]

10 Concurrent collection [figure labels: Mutator threads; Background GC thread]

11 Concurrent mutators
- Mutation changes reachability during GC
- Loss of black/grey reference is safe
  - Non-white object losing its last reference will be garbage at next GC
- New reference from black to white
  - New reference may make target live
  - Collector may never see new reference
- Mutations may require compensation

12 Compensation options
- Prevent mutator from creating black-to-white references
  - write barrier on black
  - read barrier on grey to prevent mutator obtaining white refs
- Prevent destruction of any path from a grey object to a white object without telling GC
  - write barrier on grey
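As an illustration of the first option (a write barrier on black), the sketch below shades the target grey whenever a white reference is stored into a black object, so the collector still gets to scan it. The `Obj` class, the colour constants, and `write_barrier_store` are hypothetical names for this example and not part of the system described in the talk.

```python
WHITE, GREY, BLACK = "white", "grey", "black"


class Obj:
    """Toy heap object carrying an explicit colour (illustration only)."""

    def __init__(self, name):
        self.name = name
        self.colour = WHITE
        self.fields = {}


def write_barrier_store(src, field, target, grey_worklist):
    """Perform src.field = target without ever creating a black-to-white edge."""
    if src.colour == BLACK and target is not None and target.colour == WHITE:
        # Compensate: put the target on the collector's grey wavefront.
        target.colour = GREY
        grey_worklist.append(target)
    src.fields[field] = target
```

Stores into grey or white objects need no compensation here, because the collector has not finished scanning those sources yet.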

13 Mostly-copying GC [Bartlett]
- Copying collection with ambiguous roots
  - Uncooperative compilers
  - Untidy references
  - Explicit pinning
- Pin ambiguously-referenced objects
  - Shade their page grey without copying
- Assume heap accuracy
  - Copy remaining heap-referenced objects
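The pinning step can be sketched as a conservative scan: any root word that could be an address inside the heap pins the page it falls on, and that page is then treated as grey in place instead of being copied. The function name `pinned_pages`, the flat address model, and `PAGE_SIZE` are assumptions made for this illustration.

```python
PAGE_SIZE = 4096  # hypothetical page size in bytes (assumption)


def pinned_pages(ambiguous_words, heap_start, heap_pages):
    """Return the set of heap page indices hit by ambiguous root words.

    ambiguous_words: integers found in stacks/registers that may or may not
    be pointers. Words outside the heap are ignored; words inside it pin
    their page, because the collector cannot safely update an ambiguous root.
    """
    heap_end = heap_start + heap_pages * PAGE_SIZE
    pinned = set()
    for word in ambiguous_words:
        if heap_start <= word < heap_end:
            pinned.add((word - heap_start) // PAGE_SIZE)
    return pinned
```

Objects on the remaining pages are reached only through accurate heap references, so they can be copied and their referrers updated.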

14 Incremental MCGC [DeTreville]
- Enforce grey mutator invariant
  - Stop-the-world (STW) phase greys ambiguously-referenced pages
  - Read barrier on grey using VM page protection
- Read barrier
  - Stop mutator threads
  - Unprotect page
  - Copy white targets to grey
  - Shade page black
  - Restart threads
- Atomic system call wrappers unprotect parameter targets (otherwise traps in OS return error)

15 Concurrent MCGC?
- Stopping all threads at each increment is prohibitive on SMP and impedes concurrency
- BUT: barriers are difficult to place on ambiguous references with uncooperative compilers
- ALSO: preemptive scheduling may break wrapper atomicity

16 Mostly-concurrent MCGC
- Enforce black mutator invariant
  - STW blackens ambiguously-referenced pages
  - Read barrier on load of accurate (tidy) grey reference
- Read barrier: blacken grey references as they are loaded
- No system call wrappers: arguments are always black

17 Read barrier on load of grey
- Object header bit marks grey objects
- Inline fast path checks grey bit in target header, calls out to slow path if set
- Out-of-line slow path:
  - Lock heap meta-data
  - For each (grey) source object in target page
    - Copy white targets to grey
    - Clear grey header bit
  - Shade target page black
  - Unlock heap meta-data
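The fast-path and slow-path split on this slide might look roughly like the following. This is a sketch under stated assumptions, not the compiler-emitted barrier: `Obj`, `Page`, `load_ref`, `clean_page`, and the `copy_white_target` callback (which stands in for the collector's copy routine) are all hypothetical names.

```python
import threading


class Page:
    def __init__(self):
        self.colour = "white"
        self.objects = []


class Obj:
    def __init__(self, page):
        self.grey = False  # models the grey bit in the object header
        self.page = page
        self.fields = []


heap_lock = threading.Lock()  # stands in for the heap meta-data lock


def load_ref(ref, copy_white_target):
    """Read barrier applied to every loaded reference."""
    # Inline fast path: a single header-bit test.
    if ref is not None and ref.grey:
        clean_page(ref.page, copy_white_target)  # out-of-line slow path
    return ref


def clean_page(page, copy_white_target):
    """Slow path: blacken every grey object on the target's page."""
    with heap_lock:
        for src in page.objects:
            if not src.grey:
                continue  # already cleaned, possibly by another thread
            # Copy white targets to grey and redirect the fields to the copies.
            src.fields = [
                copy_white_target(t) if t is not None else None
                for t in src.fields
            ]
            src.grey = False
        page.colour = "black"
```

If two threads race into the slow path for the same page, the second one finds every grey bit already cleared and only re-marks the page black, so the repeated call does no extra copying.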

18 Coherence for fast path
- STW phase synchronizes mutators’ views of heap state
- Grey bits are set only in newly-copied objects (i.e., newly-allocated grey pages) since most recent STW
- Mutators can never see a cleared grey header unless the page is also black
- Seeing a spurious grey header due to weak ordering is benign: slow path will synchronize

19 Implementation
- Modula-3:
  - gcc-based compiler back-end
  - No tricky target-specific stack-maps
  - Compiler front-end emits barriers
- M3 threads map to preemptively-scheduled POSIX pthreads
- Stop/start threads: signals + semaphores, or OS primitives if available
- Simple to port: Darwin (OS X), Linux, Solaris, Alpha/OSF

20 Experiments
- Parallelized GCOld benchmark to permit throughput measurements for multiple mutators
  - Measures steady-state GC throughput
- 2 platforms:
  - 2 x 2.3GHz PowerPC Macintosh Xserve running OS X 10.4.4
  - 8 x 700MHz Intel Pentium 3 SMP running Linux 2.6

21 Read Barriers: STW 1 user-level mutator thread, work=1

22 Elapsed time (s) 1 system-level mutator thread, work=1

23 Heap size 1 system-level mutator thread

24 BMU 1 system-level mutator thread, work=1000, ratio=1

25 Scalability work=1000, ratio=1, 8xP3

26 Java Hotspot server work=1000, 8xP3

27 Conclusions
- Mostly-concurrent, mostly-copying collection is feasible for multi-processors (proof-of-existence)
- Performance is good (scalable)
- Portable: changes only to compiler front-end to introduce barriers, and to GC run-time system
- Compiler back-end unchanged: full-blown optimizations enabled, no stack-map overheads

28 Future work
- Convert read barrier to “clean” only target object instead of whole page

29 BMU 1 system-level mutator thread, work=10, ratio=1

30 Scalability work=10, ratio=1, 8xP3

31 Java Hotspot server work=10, 8xP3
