380C Lecture 17 Where are we & where we are going –Managed languages Dynamic compilation Inlining Garbage collection –Why you need to care about workloads & experimental methodology –Alias analysis –Dependence analysis –Loop transformations –EDGE architectures 1
2 Today Garbage Collection –Why use garbage collection? –What is garbage? Reachable vs live, stack maps, etc. –Allocators and their collection mechanisms Semispace Marksweep Performance comparisons –Incremental age based collection Write barriers: Friend or foe? Generational Beltway –More performance
3 Basic VM Structure Executing Program Program/Bytecode Dynamic Compilation Subsystem Class Loader Verifier, etc. Heap Thread Scheduler Garbage Collector
5 True or False? Real programmers use languages with explicit memory management. –I can optimize my memory management much better than any garbage collector –Scope of effort?
6 Why Use Garbage Collection? Software engineering benefits –Less user code compared to explicit memory management (MM) –Less user code to get correct –Protects against some classes of memory errors No free(), thus no premature free(), no double free(), and no forgetting to free() Not perfect, memory can still leak –Programmers still need to eliminate all pointers to objects the program no longer needs Performance: space time tradeoff –Time proportional to dead objects (explicit MM, reference counting) or live objects (semispace, marksweep) –Throughput versus pause time Less frequent collection typically reduces total time but increases space requirements and pause times –Hidden locality benefits?
7 What is Garbage? In theory, any object the program will never reference again –But compiler & runtime system cannot figure that out In practice, any object the program cannot reach is garbage –Approximate liveness with reachability Managed languages couple GC with “safe” pointers –Programs may not access arbitrary addresses in memory –The compiler can identify and provide to the garbage collector all the pointers, thus –“Once garbage, always garbage” –Runtime system can move objects by updating pointers –“Unsafe” languages can do non-moving GC by assuming anything that looks like a pointer is one.
8 Reachability [Diagram: roots (stack, globals, registers) pointing into a heap containing objects A, B, C; code fragment { ... r0 = obj; PC -> p.f = obj; ... }] Compiler produces a stack-map at GC safe-points and Type Information Blocks GC safe points: new(), method entry, method exit, & back-edges (thread switch points) Stack-map: enumerate global variables, stack variables, live registers -- This code is hard to get right! Why? Type Information Blocks: identify reference fields in objects
9 Reachability [Diagram: roots (stack, globals, registers) pointing into a heap containing objects A, B, C; code fragment { ... r0 = obj; PC -> p.f = obj; ... }; an example map labeled TIB i] Compiler produces a stack-map at GC safe-points and Type Information Blocks Type Information Blocks: identify reference fields in objects; for each type i (class) in the program, a map TIB i
10 Reachability Tracing collector (semispace, marksweep) –Marks the objects reachable from the roots live, and then performs a transitive closure over them [Diagram: mark phase tracing from the roots (stack, globals, registers) into a heap containing objects A, B, C]
13 Reachability Tracing collector (semispace, marksweep) –Marks the objects reachable from the roots live, and then performs a transitive closure over them –All unmarked objects are dead, and can be reclaimed [Diagram: mark phase followed by sweep phase over a heap containing objects A, B, C]
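A minimal sketch of the mark and sweep steps described on these slides, assuming a toy heap in which each object is a name mapped to the list of objects its reference fields point to. The pointer structure among A, B, and C is hypothetical, since the diagram's edges are not recoverable from the slide text.

```python
def mark(roots, fields):
    """Return the set of objects reachable from the roots (transitive closure)."""
    marked = set()
    worklist = list(roots)
    while worklist:
        obj = worklist.pop()
        if obj in marked:
            continue                   # already visited
        marked.add(obj)
        worklist.extend(fields[obj])   # follow every reference field
    return marked


def sweep(fields, marked):
    """Return the unmarked (dead) objects, which can be reclaimed."""
    return {obj for obj in fields if obj not in marked}


# Hypothetical heap: A is a root and points to B; C points to A but nothing points to C.
heap = {"A": ["B"], "B": [], "C": ["A"]}
live = mark(roots=["A"], fields=heap)
dead = sweep(heap, live)
```

Here C is reclaimed even though it holds a pointer to a live object: reachability is decided by incoming paths from the roots, not by outgoing pointers.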
16 Semispace Fast bump pointer allocation Requires copying collection Cannot incrementally reclaim memory, must free en masse Reserves 1/2 the heap to copy into, in case all objects are live [Diagram: heap divided into from space and to space]
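Bump pointer allocation can be sketched as follows. This is an illustrative model only: the class name, the byte-addressed cursor, and the heap size are assumptions, and a real allocator would also handle alignment and object headers.

```python
class BumpAllocator:
    """Bump-pointer allocation in the active half of a semispace heap (illustrative)."""

    def __init__(self, heap_bytes):
        self.limit = heap_bytes // 2   # the other half is reserved as the copy target
        self.cursor = 0                # next free address in the active half

    def alloc(self, size):
        """Return the start address of a new object, or None if a collection is needed."""
        if self.cursor + size > self.limit:
            return None
        addr = self.cursor
        self.cursor += size            # "bump" the pointer past the new object
        return addr


allocator = BumpAllocator(heap_bytes=64)
first = allocator.alloc(16)
second = allocator.alloc(16)
third = allocator.alloc(16)   # does not fit in the 32-byte active half
```

Allocation is a bounds check plus an addition, which is why it is fast; the price is that only half of the heap is ever available to the program.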
20 Semispace Mark phase: –copies object when collector first encounters it –installs forwarding pointers –performs transitive closure, updating pointers as it goes –reclaims “from space” en masse –starts allocating again into “to space” [Diagram: heap divided into from space and to space]
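The copying steps above can be sketched with a Cheney-style scan, in which the to space doubles as the work queue. The Obj class and its forward field are hypothetical names for illustration; a real collector copies raw memory and stores the forwarding pointer in the old object's header.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.fields = []      # references to other Obj instances
        self.forward = None   # forwarding pointer, set once the object is copied


def copy(obj, to_space):
    """Copy obj on first encounter; afterwards return its forwarded copy."""
    if obj.forward is None:
        clone = Obj(obj.name)
        clone.fields = list(obj.fields)   # still point into from space for now
        obj.forward = clone               # install the forwarding pointer
        to_space.append(clone)
    return obj.forward


def collect(roots):
    """Return (new_roots, to_space) after copying everything reachable."""
    to_space = []
    new_roots = [copy(r, to_space) for r in roots]
    scan = 0
    while scan < len(to_space):           # objects past `scan` are not yet updated
        obj = to_space[scan]
        obj.fields = [copy(f, to_space) for f in obj.fields]
        scan += 1
    return new_roots, to_space


a, b, c = Obj("a"), Obj("b"), Obj("c")
a.fields = [b, b]     # two pointers to b: b must be copied only once
c.fields = [a]        # c is unreachable from the roots
new_roots, to_space = collect([a])
```

After collect returns, everything left in from space (here, c and the stale originals of a and b) can be discarded en masse.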
27 Semispace Notice: ✓ fast allocation ✓ locality of contemporaneously allocated objects ✓ locality of objects connected by pointers ✗ wasted space [Diagram: heap divided into from space and to space]
28 Marksweep Free-lists organized by size –blocks of same size, or –individual objects of same size Most objects are small (< 128 bytes) [Diagram: free lists pointing into the heap]
29 Marksweep Allocation –Grab a free object off the free list –No more memory of the right size triggers a collection –Mark phase - find the live objects –Sweep phase - put free ones on the free list [Diagram: free lists pointing into the heap]
33 Marksweep Mark phase –Transitive closure marking all the live objects Sweep phase –sweep the memory for free objects, populating the free list –can be made incremental by organizing the heap in blocks and sweeping one block at a time on demand [Diagram: free lists pointing into the heap]
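A minimal sketch of size-segregated free-list allocation and sweeping, assuming fixed-size cells per size class. The class and method names are hypothetical, and the mark phase is replaced by a caller-supplied list of live cells; requests larger than the biggest class are not handled.

```python
class FreeListHeap:
    """Size-segregated free lists over fixed-size cells (illustrative)."""

    def __init__(self, cells_per_class):
        # cells_per_class maps a cell size in bytes to the number of cells of that size.
        self.free = {size: list(range(n)) for size, n in cells_per_class.items()}
        self.in_use = set()   # (size, index) pairs handed out by alloc

    def size_class(self, nbytes):
        """Smallest cell size that fits the request."""
        return min(s for s in self.free if s >= nbytes)

    def alloc(self, nbytes):
        """Pop a cell off the matching free list; None means a collection is needed."""
        size = self.size_class(nbytes)
        if not self.free[size]:
            return None
        cell = (size, self.free[size].pop())
        self.in_use.add(cell)
        return cell

    def sweep(self, live):
        """Return every allocated cell that was not found live to its free list."""
        live = set(live)
        for size, index in sorted(self.in_use - live):
            self.free[size].append(index)
        self.in_use &= live


heap = FreeListHeap({16: 2, 32: 1})
x = heap.alloc(10)     # served from the 16-byte class
y = heap.alloc(16)
z = heap.alloc(12)     # the 16-byte class is exhausted
heap.sweep(live=[x])   # pretend the mark phase found only x live
w = heap.alloc(8)      # reuses the cell that y occupied
```

Unlike semispace, a dead cell becomes reusable as soon as it is swept, and no copy reserve is needed; the cost is the free-list lookup and the scattering of newly allocated objects across the heap.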
37 Marksweep ✓ space efficiency ✓ incremental object reclamation ✗ relatively slower allocation time ✗ poor locality of contemporaneously allocated objects [Diagram: free lists pointing into the heap]
38 How do these differences play out in practice? Marksweep ✓ space efficiency ✓ incremental object reclamation ✗ relatively slower allocation time ✗ poor locality of contemporaneously allocated objects Semispace ✓ fast allocation ✓ locality of contemporaneously allocated objects ✓ locality of objects connected by pointers ✗ wasted space
39 Methodology [SIGMETRICS 2004] Compare Marksweep (MS) and Semispace (SS) Mutator time, GC time, total time Jikes RVM & MMTk: replay compilation, measure second iteration without compilation Platforms: 1.6GHz G5 (PowerPC 970), 1.9GHz AMD Athlon, 2.6GHz and 3.2GHz Intel P4 Linux with perfctr patch & libraries –Separate accounting of GC & Mutator counts SPECjvm98 & pseudojbb
40 Allocation Mechanism Bump pointer – ~70 bytes IA32 instructions, 726MB/s Free list – ~140 bytes IA32 instructions, 654MB/s Bump pointer 11% faster in tight loop – < 1% in practical setting – No significant difference (?)
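The 11% figure follows directly from the two tight-loop throughput numbers on this slide; a quick check (variable names are illustrative):

```python
bump_mb_per_s = 726        # bump pointer throughput from the slide
free_list_mb_per_s = 654   # free list throughput from the slide

# Relative advantage of the bump pointer in the tight allocation loop.
speedup = bump_mb_per_s / free_list_mb_per_s - 1
speedup_percent = round(speedup * 100)
```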
41 Mutator Time [Plots omitted: jess, javac, pseudojbb, and geometric mean mutator time]
49 Garbage Collection Time [Plots omitted: geometric mean, pseudojbb, jess, javac]
51 Total Time [Plots omitted: pseudojbb, geometric mean, jess, javac]
53 MS/SS Crossover [Plots omitted: 1.6GHz PPC, 1.9GHz AMD, 2.6GHz P4, 3.2GHz P4, and a combined plot of all four platforms with regions labeled "locality" and "space"]
58 Today Garbage Collection –Why use garbage collection? –What is garbage? Reachable vs live, stack maps, etc. –Allocators and their collection mechanisms Semispace Marksweep Performance comparisons –Incremental age based collection Enabling mechanisms –write barrier & remembered sets Heap organizations –Generational –Beltway –Performance comparisons
59 One Big Heap? Pause times –it takes too long to trace the whole heap at once Throughput –the heap contains lots of long lived objects, why collect them over and over again? Incremental collection –divide up the heap into increments and collect one at a time. [Diagram: Increment 1 and Increment 2, each with a to space and a from space]
60 Incremental Collection Ideally: perfect knowledge of live pointers between increments –but this requires scanning the whole heap, which defeats the purpose Mechanism: a write barrier records pointers between increments when the mutator installs them, giving a conservative approximation of reachability [Diagram: Increment 1 and Increment 2, each with a to space and a from space]
64 Write barrier Compiler inserts code that records pointers between increments when the mutator installs them. Original program: p.f = o; With compiler support for incremental collection: if (incr(p) != incr(o)) { remset(incr(o)) = remset(incr(o)) ∪ {p.f}; } p.f = o; [Diagram: Increment 1 holds objects a to g, Increment 2 holds objects t to z, each with a to space and a from space; remset 1 = {w}, remset 2 = {f, g}]
65 Write barrier Install new pointer d -> v. Original program: p.f = o; With compiler support for incremental collection: if (incr(p) != incr(o)) { remset(incr(o)) = remset(incr(o)) ∪ {p.f}; } p.f = o; [Diagram: same heap; remset 1 = {w}, remset 2 = {f, g}]
66 Write barrier Install new pointer d -> v, then update d -> y. Original program: p.f = o; With compiler support for incremental collection: if (incr(p) != incr(o)) { remset(incr(o)) = remset(incr(o)) ∪ {p.f}; } p.f = o; [Diagram: same heap; remset 1 = {w}, remset 2 = {f, g, d}]
67 Write barrier Install new pointer d -> v, then update d -> y. Original program: p.f = o; With compiler support for incremental collection: if (incr(p) != incr(o)) { remset(incr(o)) = remset(incr(o)) ∪ {p.f}; } p.f = o; [Diagram: same heap; remset 1 = {w}, remset 2 = {f, g, d, d}]
68 Write barrier At collection time the collector re-examines all entries in the remset for the increment, treating them like roots. Collect Increment 2 [Diagram: Increment 1 holds objects a to g, Increment 2 holds objects t to z; remset 1 = {w}, remset 2 = {f, g, d, d}]
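A minimal sketch of the write barrier and of collecting one increment with its remset treated as roots. The IncrementalHeap class is hypothetical; it records the source object p rather than the slot p.f, and it allows duplicate entries, as in the {f, g, d, d} example on the slides. The object names d, v, and y follow the slides, while z and the field names are made up for illustration.

```python
from collections import defaultdict


class IncrementalHeap:
    def __init__(self, increment_of):
        self.increment_of = increment_of   # object name -> increment number
        self.fields = defaultdict(dict)    # object name -> {field name: target name}
        self.remset = defaultdict(list)    # increment -> sources recorded by the barrier

    def write(self, p, f, o):
        """Write barrier for p.f = o."""
        if self.increment_of[p] != self.increment_of[o]:
            self.remset[self.increment_of[o]].append(p)   # may record duplicates
        self.fields[p][f] = o

    def targets_in(self, obj, incr):
        """Objects in increment `incr` that obj currently points to."""
        return [t for t in self.fields[obj].values() if self.increment_of[t] == incr]

    def collect(self, incr, roots):
        """Trace one increment, treating its remset entries as extra roots."""
        work = [r for r in roots if self.increment_of[r] == incr]
        for src in self.remset[incr]:
            work += self.targets_in(src, incr)   # re-examine the current field values
        live = set()
        while work:
            obj = work.pop()
            if obj not in live:
                live.add(obj)
                work += self.targets_in(obj, incr)
        return live


heap = IncrementalHeap({"d": 1, "v": 2, "y": 2, "z": 2})
heap.write("d", "f", "v")   # d -> v crosses increments: record d
heap.write("d", "f", "y")   # overwrite with d -> y: d is recorded again
heap.write("y", "g", "z")   # same increment: nothing recorded
live2 = heap.collect(incr=2, roots=[])
```

Because the collector re-reads d's field at collection time, the overwritten target v is not retained. If d itself were dead, y and z would still be kept alive, which is the excess retention cost of incremental collection.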
70 Summary of the costs of incremental collection –write barrier to catch pointer stores crossing boundaries –remsets to store crossing pointers –processing remembered sets at collection time –excess retention [Diagram: Increment 1 holds objects a to g, Increment 2 holds objects t to z; remset 1 = {w}, remset 2 = {f, g, d, d}]
71 Heap Organization What objects should we put where? Generational hypothesis –young objects die more quickly than older ones [Lieberman & Hewitt’83, Ungar’84] –most pointers are from younger to older objects [Appel’89, Zorn’90] ⇒ Organize the heap into young and old, collect young objects preferentially [Diagram: Young space (to space, from space) and Old space]
72 Generational Heap Organization Divide the heap into two spaces: young and old Allocate into the young space When the young space fills up, –collect it, copying into the old space When the old space fills up –collect both spaces –Generalizing to m generations: if space n < m fills up, collect spaces 1 through n [Diagram: Young space (to space, from space) and Old space]
84 Generational Heap Organization Divide the heap into two spaces: young and old Allocate into the young space When the young space fills up, –collect it, copying into the old space When the old space fills up –collect both spaces - ignore remembered sets –Generalizing to m generations: if space n < m fills up, collect spaces 1 through n [Diagram: Young space and Old space, each with a to space and a from space]
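The two-generation policy can be sketched as follows. This is a simplified model under stated assumptions: capacities are counted in objects, tracing is replaced by a caller-supplied is_live oracle, and an old space that is still over capacity after a full collection is not handled.

```python
class GenerationalHeap:
    def __init__(self, young_capacity, old_capacity):
        self.young, self.old = [], []
        self.young_capacity = young_capacity
        self.old_capacity = old_capacity
        self.log = []   # which kind of collection ran, in order

    def alloc(self, obj, is_live):
        """Allocate into the young space, collecting it first if it is full."""
        if len(self.young) == self.young_capacity:
            self.collect(is_live)
        self.young.append(obj)

    def collect(self, is_live):
        """Promote young survivors; also collect the old space if they do not fit."""
        survivors = [o for o in self.young if is_live(o)]
        if len(self.old) + len(survivors) > self.old_capacity:
            self.old = [o for o in self.old if is_live(o)]   # collect both spaces
            self.log.append("full")
        else:
            self.log.append("minor")
        self.old += survivors
        self.young = []


live = {1, 3, 5}
heap = GenerationalHeap(young_capacity=2, old_capacity=2)
for obj in range(1, 7):
    heap.alloc(obj, lambda o: o in live)
live.discard(1)                        # object 1 dies after being promoted
heap.alloc(7, lambda o: o in live)     # the old space cannot take another survivor
```

The first two collections touch only the young space; the third must also collect the old space, and only then is the dead object 1 reclaimed.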
86 Generational Write Barrier Unidirectional barrier ✓ record only older to younger pointers ✗ no need to record younger to older pointers, since we never collect the old space independently –most pointers are from younger to older objects [Appel’89, Zorn’90] –track the barrier between young objects and old spaces [Diagram: Young space (to space, from space) and Old space, separated by an address barrier]
87 Generational Write Barrier Unidirectional boundary barrier. Original program: p.f = o; With compiler support for incremental collection: if (p > barrier && o < barrier) { remset(nursery) = remset(nursery) ∪ {p.f}; } p.f = o; [Diagram: Young space (to space, from space) and Old space]
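A minimal sketch of the boundary barrier, assuming objects are identified by integer addresses, with the young space below the barrier address and the old space above it. The BARRIER value, the store function, and the dictionary heap are all hypothetical.

```python
BARRIER = 1000         # hypothetical address separating young (below) from old (above)
nursery_remset = []    # slots in old objects that point into the young space


def store(heap, p, f, o):
    """Perform p.f = o with a unidirectional boundary barrier; p and o are addresses."""
    if p > BARRIER and o < BARRIER:      # an old object now points at a young one
        nursery_remset.append((p, f))    # remember the slot p.f
    heap.setdefault(p, {})[f] = o


heap = {}
store(heap, 2000, "next", 10)    # old -> young: recorded
store(heap, 20, "next", 3000)    # young -> old: not recorded
store(heap, 30, "next", 40)      # young -> young: not recorded
```

The test is two address comparisons against a single boundary, cheaper than looking up the increment of both objects as the general incremental barrier does.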
88 Generational Write Barrier Unidirectional ✓ record only older to younger pointers ✗ no need to record younger to older pointers, since we never collect the old space independently –most pointers are from younger to older objects [Appel’89, Zorn’90] –most mutations are to young objects [Stefanovic et al.’99] [Diagram: Young space (to space, from space) and Old space]
89 Results [Plots omitted: Garbage Collection Time, Mutator Time, Total Time]
McKinley, UT Recap Copying improves locality Incrementality improves responsiveness Generational hypothesis –Young objects: most are very short lived Infant mortality: ~90% die young (within 4MB of allocation) –Old objects: most are very long lived (bimodal) Mature mortality: ~5% die each 4MB of new allocation Help from pointer mutations –In Java, pointers go in both directions, but older to younger pointers across many objects are rare (less than 1%) –Most mutations are among young objects (92 to 98% of pointer mutations)
380C Where are we & where we are going –Managed languages Dynamic compilation Inlining Garbage collection –Can we get mutator locality, space efficiency, and collector efficiency all in one collector? –Read: Blackburn and McKinley, Immix: A Mark-Region Garbage Collector with Space Efficiency, Fast Collection, and Mutator Performance, ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pp , Tucson AZ, June –Why you need to care about workloads & methodology –Alias analysis –Dependence analysis –Loop transformations –EDGE architectures 94