Reducing Pause Time of Conservative Collectors Toshio Endo (National Institute of Informatics) Kenjiro Taura (Univ. of Tokyo)


1 Reducing Pause Time of Conservative Collectors Toshio Endo (National Institute of Informatics) Kenjiro Taura (Univ. of Tokyo)

2 Incremental GC for soft-realtime applications [Steele 75] [Yuasa 90] [Doligez 93] Target: multimedia, games, etc. – Pauses should be <10ms Collection tasks are divided into small pieces Success: pauses of <5ms [Cheng 01] – They assume compiler cooperation Reduction of pause time for ‘conservative’ GCs is insufficient

3 Conservative GC [Boehm et al. 88] Mark sweep GC for C/C++ programs No compiler cooperation (e.g., write barriers) Mostly parallel GC [Boehm et al. 91] Incremental, conservative Pauses >100ms fairly common

4 Write barriers in conservative GCs No fine-grain write barrier by compiler Use the VM’s write protection instead Coarse grain – Page level – Detects only the first update after protection Restricts design
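
The page-level behavior described on this slide can be sketched as a toy simulation. This is only a model under stated assumptions: `PageBarrier`, `PAGE_SIZE`, and the addresses are hypothetical names and values chosen for illustration, and no real memory protection is involved. It shows why only the first write to a protected page is observed.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (hypothetical value)


class PageBarrier:
    """Toy model of a VM (page-protection) write barrier."""

    def __init__(self):
        self.protected = set()  # pages currently write-protected
        self.dirty = set()      # pages written since they were protected
        self.traps = 0          # number of simulated write faults

    def protect(self, pages):
        self.protected |= set(pages)

    def write(self, addr):
        page = addr // PAGE_SIZE
        if page in self.protected:
            # Simulated fault handler: remember the page, then unprotect it.
            self.traps += 1
            self.dirty.add(page)
            self.protected.discard(page)
        # Any later write to the same page gets here without trapping.


barrier = PageBarrier()
barrier.protect(range(4))
for addr in (8, 16, 4000, 5000):  # three writes to page 0, one to page 1
    barrier.write(addr)
print(barrier.traps, sorted(barrier.dirty))
```

Four writes cause only two traps, and the collector learns which pages changed but not which words, which is the coarse granularity the slide refers to.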

5 Incremental mark sweep algorithms Snapshot at beginning & DLG [Yuasa 90] [Doligez 93] – Make (conceptual) heap snapshot before marking – Promise short pause – Large space overhead with VM write barrier Incremental update [Steele 75] [Dijkstra 78] – Maintain consistency after marking Need final marking before finish Unboundedly long! Only choice with VM

6 Contributions Analyze why previous algorithms fail Propose techniques to bound pauses & guarantee progress Show a ‘stress-test’ benchmark: iukiller Demonstrate experimental results – < 5ms in applications – < 12ms in the stress-test benchmark (constant across all heap sizes) (This talk omits parallel issues)

7 Overview of presentation Mostly parallel GC Techniques to reduce pause time Experimental results Related work Summary

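8 Mostly parallel garbage collector (1) [Diagram] Start GC → write-protect heap → incremental mark (interleaved with user program) → final marking → incremental sweep (interleaved with user program) → End GC. Trap handler, on a user write fault: remember the address of the dirty (= updated) page, then unprotect it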

9 Mostly parallel garbage collector (2) Second update is un-trapped – Mark r in final phase Need final marking [Figure: a writer updating pointers among objects p, q, r]

10 Final marking 1. Scan all dirty pages + root 2. Mark all unmarked objects from scanned region The amount of work is unbounded: – # of dirty pages – Objects reachable from a dirty page Makes pauses >100ms [Figure: heap and root]
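
The two steps of final marking can be sketched on a toy heap. This is a minimal model, not the collector's code: the heap is a dictionary from object name to child names, `page_of` assigns each object to a page, and all names (`final_marking`, `children`, `page_of`) are hypothetical. The example hangs a 1000-object chain off a single dirty page to show that the work done during the pause depends on what is reachable, not on the number of dirty pages.

```python
def final_marking(children, page_of, roots, marked, dirty_pages):
    """Toy final marking; returns how many objects were newly marked."""
    # Step 1: rescan marked objects on dirty pages (they may hold new pointers).
    stack = [o for o in sorted(marked) if page_of[o] in dirty_pages]
    newly = 0
    # Root referents that are still unmarked are marked directly.
    for r in roots:
        if r not in marked:
            marked.add(r)
            newly += 1
            stack.append(r)
    # Step 2: mark every unmarked object reachable from the scanned region.
    while stack:
        obj = stack.pop()
        for child in children[obj]:
            if child not in marked:
                marked.add(child)
                newly += 1
                stack.append(child)
    return newly


# p -> q was marked incrementally; then q (on dirty page 1) was updated
# to point at a long chain n0 -> n1 -> ... -> n999 that is still unmarked.
children = {"p": ["q"], "q": ["n0"]}
page_of = {"p": 0, "q": 1}
for i in range(1000):
    children[f"n{i}"] = [f"n{i + 1}"] if i < 999 else []
    page_of[f"n{i}"] = 2
marked = {"p", "q"}
newly = final_marking(children, page_of, ["p"], marked, {1})
print(newly)
```

One dirty page is enough to force the pause to traverse the whole chain, which is the second source of unbounded work listed on the slide.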

11 Overview of presentation Mostly parallel garbage collector Techniques to reduce pause time Experimental results Related work Summary

12 Goal of our collector Bound pause time (< constant) – Mutator utilization is important, but focus on pause Guarantee progress of collection Combine two techniques: Bound dirty pages (BD) Retry incremental marking (RI)

13 Bounding dirty pages (1) Basic collector produces many dirty pages Keep # of dirty pages < a given limit – If the limit is exceeded, choose a dirty page – Re-protect, scan, clean it – Good: Reduces the work left for final marking – Bad: More protection cost
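
A minimal sketch of the bounding step follows. The slide does not say which dirty page is chosen, so the oldest-first (FIFO) choice here is an assumption, and `BoundedDirtyPages` and its fields are hypothetical names. Scanning is represented only by recording the page number.

```python
from collections import deque


class BoundedDirtyPages:
    """Toy model: keep the number of dirty pages below `limit`."""

    def __init__(self, pages, limit):
        self.limit = limit
        self.protected = set(pages)  # all heap pages start write-protected
        self.dirty = deque()         # dirty pages, oldest first (assumed policy)
        self.scanned = []            # pages cleaned during incremental marking
        self.max_dirty = 0

    def write(self, page):
        if page not in self.protected:
            return  # page is already dirty, so the write does not trap
        self.protected.discard(page)
        self.dirty.append(page)
        while len(self.dirty) >= self.limit:
            victim = self.dirty.popleft()
            self.protected.add(victim)   # re-protect first so later writes trap again
            self.scanned.append(victim)  # stand-in for scanning the page's objects
        self.max_dirty = max(self.max_dirty, len(self.dirty))


heap = BoundedDirtyPages(pages=range(10), limit=4)
for page in range(10):
    heap.write(page)
print(list(heap.dirty), heap.scanned)
```

Re-protecting before scanning matters in this model: a write that lands while or after the page is scanned traps again and puts the page back in the dirty set, at the price of the extra protection cost the slide mentions.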

14 Bounding dirty pages (2) Is pause now bounded? … No! Unmarked objects reachable from a dirty page are not bounded [Figure: heap and root]

15 Retrying incremental marking (1) [Diagram] Start GC → write-protect heap → incremental mark (interleaved with user program and trap handler) → final marking → finished before limit? No: retry incremental mark. Yes: incremental sweep (interleaved with user program) → End GC. Keep the work of each final marking < a given limit

16 Retrying incremental marking (2) Good: Bound length of single final marking Bad: Risk of starvation (no progress) – Final marking may abort before finishing scanning (unbounded) dirty pages – Unmarked objects may ‘escape’ from collector
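
The retry loop can be sketched as follows. This is a toy model under several assumptions: work is counted in scanned objects rather than time, the mutator's pointer stores are scripted per round in `writes_per_round`, and the names `drain` and `collect` are hypothetical. Each final marking stops once it has scanned `budget` objects, and the leftover work returns to the incremental phase.

```python
def drain(stack, children, marked, budget=None):
    """Scan objects until the stack is empty or `budget` objects were scanned.

    Returns (finished, work)."""
    work = 0
    while stack:
        if budget is not None and work >= budget:
            return False, work
        obj = stack.pop()
        work += 1
        for child in children[obj]:
            if child not in marked:
                marked.add(child)
                stack.append(child)
    return True, work


def collect(children, roots, budget, writes_per_round):
    marked, stack = set(roots), list(roots)
    pauses = []  # work done inside each final marking (the pause)
    for writes in writes_per_round:
        drain(stack, children, marked)       # incremental marking, no pause
        for src, dst in writes:              # scripted mutator stores
            children[src].append(dst)
            if src in marked:
                stack.append(src)            # dirty object: must be rescanned
        finished, work = drain(stack, children, marked, budget)  # final marking
        pauses.append(work)
        if finished:
            return marked, pauses
        # Aborted: loop around and retry incremental marking.
    raise RuntimeError("ran out of scripted rounds")


children = {"root": ["a"], "a": []}
for i in range(10):
    children[f"c{i}"] = [f"c{i + 1}"] if i < 9 else []
marked, pauses = collect(children, ["root"], budget=4,
                         writes_per_round=[[("a", "c0")], []])
print(pauses, len(marked))
```

No pause exceeds the budget, but the model also makes the starvation risk visible: if every round's writes expose more than `budget` objects of new work, the loop never finishes, which is why the slide lists it as the downside.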

17 The worst case Abort a final marking with no progress [Figure: repeating cycle of a write, incremental marking finishing, and final marking aborting]

18 Ensuring bounded pause and progress Either alone is insufficient … Need two techniques: – Bounding dirty pages (BD) – Retrying incremental marking (RI) BD → Every final marking can scan all dirty pages → It finds some unmarked objects, if any

19 Overview of presentation Mostly parallel garbage collector Techniques to reduce pause time Experimental results Related work Summary

20 Experimental Environments 400MHz UltraSPARC, Solaris 8 Four GCs – Stop: Stop-the-world GC – Basic: Basic incremental GC – BD: Use bounding dirty pages – BD+R: Use bounding dirty pages + retrying incremental marking Basic/BD/BD+R: GC starts when heap usage > 75% BD/BD+R: # of dirty pages < 16

21 The iukiller synthetic benchmark ‘Stress-test’ benchmark for mostly parallel GC Trees tend to escape from collector Final marking tends to be long [Figure: root, large binary trees, repeat]

22 Results of iukiller benchmark: the maximum pause time Previous collectors fail – > 1.8 seconds – The larger the heap, the longer the pause BD+R achieves <12ms pause – independent of heap size

23 Application benchmarks Programs written in C/C++ – deltablue: an incremental constraint solver (25MB) – espresso: a logic optimizer for PLA (10MB) – N-Body: an N-Body solver with Barnes-Hut (15MB) – CKY: a context free grammar parser (40MB) – Cube: a Rubik’s cube puzzle solver (8MB)

24 Results of application benchmarks: the maximum pause time BD+R achieves <5ms pause in five applications BD is also OK (< 16ms) [Chart labels: 215ms, 283ms]

25 Results of application benchmarks: overhead BD/BD+R is <9% slower than Basic – More protection All incr. GCs are 1–53% slower than Stop – VM write barrier – Floating garbage – More GC cycles [Chart: total execution times (‘Stop’ = 1)]

26 Related work [Appel et al. 88] – Copy GC with VM read barrier. Slower than write barrier [Furuso et al. 91] – Snapshot-at-beginning on VM. Large space overhead Recent version of [Boehm et al. 91] – Time limit on final marking. Risks of starvation [Printezis et al. 00] [Ossia et al. 02] – Keep # of dirty cards small. Final marking is still unbounded

27 Summary An incremental conservative GC Short pause (<5ms in 5 applications) GC progress Use both techniques: – Bounding dirty pages – Retrying incremental marking

28 Future direction Reducing overhead of BD – Strategy for proper limit for dirty pages Bounding roots to be scanned – Protect stacks partially

29 Mostly parallel garbage collector (cont. 1) [Timeline diagram] Stop-the-world GC: marking, sweeping. Mostly parallel GC cycle: initialization & protection, concurrent marking, final marking, concurrent sweeping. Legend: User, GC

30 Mostly parallel garbage collector (cont. 2) Protect heap and start marking from roots Proceed concurrent marking User program may – update pointers – create new objects Concurrent marking finishes – But some reachable objects are still unmarked!! Perform final marking atomically from – marked objects in dirty pages – roots [Figure: heap and root]

35 Technique 2: Retrying concurrent marking Instead of a single final marking, we repeat concurrent marking and termination check – If a termination check takes longer than a given limit, it aborts and restarts concurrent marking Boehm’s implementation on the Web repeats the termination check up to twice [Timeline: GC cycle = initialization, concurrent marking, termination check, concurrent sweeping]

36 Discussion on techniques Neither technique is novel on its own, but combining the two is essential Without retrying, final marking may still be long Without bounding, progress of termination check may be insufficient [Figure: w/o bounding, termination check aborted; with bounding, termination check found unmarked objects]

37 Other techniques Concurrent protecting – Atomic protecting takes O(heap-size) time! Allocating black in later stages of GC cycle – Always allocating black retains many short-lived objects – Always allocating white (unmarked) may prevent GC progress – Allocate white first, and black in later stages

38 Results: Minimum mutator utilization (MMU) Window sizes are on a log scale The optimized collector shows good MMUs for small windows
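
For readers unfamiliar with the metric, here is a minimal sketch of how MMU can be computed from a pause trace, using the usual definition: for a window length w, the minimum over all windows of that length of the fraction of time the mutator runs. The function name `mmu` and the pause trace below are hypothetical and are not data from the talk.

```python
def mmu(pauses, total, window):
    """Minimum mutator utilization for one window length.

    pauses: list of (start, end) GC pauses, non-overlapping, within [0, total].
    """
    def gc_time(lo, hi):
        # Total pause time that overlaps the window [lo, hi].
        return sum(max(0.0, min(e, hi) - max(s, lo)) for s, e in pauses)

    # The worst window starts at a pause start or ends at a pause end
    # (or sits at either boundary of the run).
    candidates = {0.0, total - window}
    for s, e in pauses:
        candidates.add(s)
        candidates.add(e - window)
    worst = max(gc_time(t, t + window)
                for t in candidates if 0 <= t <= total - window)
    return 1.0 - worst / window


trace = [(10, 15), (18, 23), (60, 64)]  # hypothetical pauses, in ms
for w in (5, 10, 100):
    print(w, round(mmu(trace, total=100, window=w), 2))
```

In this made-up trace two 5ms pauses fall close together, so a 10ms window sees a much lower utilization than any single pause would suggest. That clustering effect is what MMU captures and maximum pause time alone does not.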

39 Results of application benchmarks: the number of repetitions BD+R: Repetitions of incr. marking per GC Usually <2 times No infinite loop The worst case is 5 times. Need improvement?
