1 Lecture 22 Garbage Collection Mark and Sweep, Stop and Copy, Reference Counting Ras Bodik Shaon Barman Thibaud Hottelier Hack Your Language! CS164: Introduction.

Slides:



Advertisements
Similar presentations
Garbage collection David Walker CS 320. Where are we? Last time: A survey of common garbage collection techniques –Manual memory management –Reference.
Advertisements

1 CSIS 7102 Spring 2004 Lecture 9: Recovery (approaches) Dr. King-Ip Lin.
Dynamic Memory Management
Introduction to Memory Management. 2 General Structure of Run-Time Memory.
CMSC 330: Organization of Programming Languages Memory and Garbage Collection.
PZ10B Programming Language design and Implementation -4th Edition Copyright©Prentice Hall, PZ10B - Garbage collection Programming Language Design.
Lecture 10: Heap Management CS 540 GMU Spring 2009.
Garbage Collection What is garbage and how can we deal with it?
CMSC 330: Organization of Programming Languages Memory and Garbage Collection.
CSC 213 – Large Scale Programming. Today’s Goals  Consider what new does & how Java works  What are traditional means of managing memory?  Why did.
Prof. Necula CS 164 Lecture 141 Run-time Environments Lecture 8.
Garbage Collection  records not reachable  reclaim to allow reuse  performed by runtime system (support programs linked with the compiled code) (support.
5. Memory Management From: Chapter 5, Modern Compiler Design, by Dick Grunt et al.
Run-time organization  Data representation  Storage organization: –stack –heap –garbage collection Programming Languages 3 © 2012 David A Watt,
Garbage Collection CSCI 2720 Spring Static vs. Dynamic Allocation Early versions of Fortran –All memory was static C –Mix of static and dynamic.
CS 326 Programming Languages, Concepts and Implementation Instructor: Mircea Nicolescu Lecture 18.
CPSC 388 – Compiler Design and Construction
Ceg860 (Prasad)L9GC1 Memory Management Garbage Collection.
Mark and Sweep Algorithm Reference Counting Memory Related PitFalls
CS 536 Spring Automatic Memory Management Lecture 24.
Chapter 8 Runtime Support. How program structures are implemented in a computer memory? The evolution of programming language design has led to the creation.
Memory Management. History Run-time management of dynamic memory is a necessary activity for modern programming languages Lisp of the 1960’s was one of.
1 The Compressor: Concurrent, Incremental and Parallel Compaction. Haim Kermany and Erez Petrank Technion – Israel Institute of Technology.
CS 1114: Data Structures – memory allocation Prof. Graeme Bailey (notes modified from Noah Snavely, Spring 2009)
CS 61C L07 More Memory Management (1) Garcia, Fall 2004 © UCB Lecturer PSOE Dan Garcia inst.eecs.berkeley.edu/~cs61c CS61C.
Adapted from Prof. Necula CS 169, Berkeley1 Memory Management and Debugging V Software Engineering Lecture 20.
Memory Allocation. Three kinds of memory Fixed memory Stack memory Heap memory.
CS 536 Spring Run-time organization Lecture 19.
CS510 Advanced OS Seminar Class 10 A Methodology for Implementing Highly Concurrent Data Objects by Maurice Herlihy.
Runtime The optimized program is ready to run … What sorts of facilities are available at runtime.
Memory Allocation and Garbage Collection. Why Dynamic Memory? We cannot know memory requirements in advance when the program is written. We cannot know.
Garbage collection (& Midterm Topics) David Walker COS 320.
Linked lists and memory allocation Prof. Noah Snavely CS1114
Reference Counters Associate a counter with each heap item Whenever a heap item is created, such as by a new or malloc instruction, initialize the counter.
CS61C L07 More Memory Management (1) Garcia, Spring 2008 © UCB Lecturer SOE Dan Garcia inst.eecs.berkeley.edu/~cs61c CS61C.
UniProcessor Garbage Collection Techniques Paul R. Wilson University of Texas Presented By Naomi Sapir Tel-Aviv University.
SEG Advanced Software Design and Reengineering TOPIC L Garbage Collection Algorithms.
CS3012: Formal Languages and Compilers The Runtime Environment After the analysis phases are complete, the compiler must generate executable code. The.
Dynamic Memory Allocation Questions answered in this lecture: When is a stack appropriate? When is a heap? What are best-fit, first-fit, worst-fit, and.
Storage Management. The stack and the heap Dynamic storage allocation refers to allocating space for variables at run time Most modern languages support.
Runtime Environments. Support of Execution  Activation Tree  Control Stack  Scope  Binding of Names –Data object (values in storage) –Environment.
1 Records Record aggregate of data elements –Possibly heterogeneous –Elements/slots are identified by names –Elements in same fixed order in all records.
Garbage Collection and Memory Management CS 480/680 – Comparative Languages.
UniProcessor Garbage Collection Techniques Paul R. Wilson University of Texas Presented By Naomi Sapir Tel-Aviv University.
ISBN Chapter 6 Data Types Pointer Types Reference Types Memory Management.
University of Washington Wouldn’t it be nice… If we never had to free memory? Do you free objects in Java? 1.
Memory Management -Memory allocation -Garbage collection.
Runtime The optimized program is ready to run … What sorts of facilities are available at runtime.
2/4/20161 GC16/3011 Functional Programming Lecture 20 Garbage Collection Techniques.
CS61C L07 More Memory Management (1) Garcia, Spring 2010 © UCB Lecturer SOE Dan Garcia inst.eecs.berkeley.edu/~cs61c CS61C.
COMP091 – Operating Systems 1 Memory Management. Memory Management Terms Physical address –Actual address as seen by memory unit Logical address –Address.
CS412/413 Introduction to Compilers and Translators April 21, 1999 Lecture 30: Garbage collection.
Runtime Environments Chapter 7. Support of Execution  Activation Tree  Control Stack  Scope  Binding of Names –Data object (values in storage) –Environment.
GARBAGE COLLECTION Student: Jack Chang. Introduction Manual memory management Memory bugs Automatic memory management We know... A program can only use.
Eliminating External Fragmentation in a Non-Moving Garbage Collector for Java Author: Fridtjof Siebert, CASES 2000 Michael Sallas Object-Oriented Languages.
Garbage Collection What is garbage and how can we deal with it?
Storage 18-May-18.
Inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 7 – More Memory Management Lecturer PSOE Dan Garcia
Storage Management.
Run-time organization
Concepts of programming languages
Automatic Memory Management
Memory Management and Garbage Collection Hal Perkins Autumn 2011
Simulated Pointers.
Strategies for automatic memory management
Chapter 12 Memory Management
CS703 - Advanced Operating Systems
Garbage collection Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Section
Inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 7 – More Memory Management Lecturer PSOE Dan Garcia
Garbage Collection What is garbage and how can we deal with it?
Presentation transcript:

1 Lecture 22 Garbage Collection Mark and Sweep, Stop and Copy, Reference Counting Ras Bodik Shaon Barman Thibaud Hottelier Hack Your Language! CS164: Introduction to Programming Languages and Compilers, Spring 2012 UC Berkeley

Lecture Outine Why Automatic Memory Management? Garbage Collection Three Techniques –Mark and Sweep –Stop and Copy –Reference Counting 2

Why Automatic Memory Management? Storage management is still a hard problem in modern programming C and C++ programs have many storage bugs –forgetting to free unused memory –dereferencing a dangling pointer –overwriting parts of a data structure by accident –and so on... Storage bugs are hard to find –a bug can lead to a visible effect far away in time and program text from the source 3

Type Safety and Memory Management Some storage bugs can be prevented in a strongly typed language –e.g., you cannot overrun the array limits Can types prevent errors in programs with manual allocation and deallocation of memory? – Some fancy type systems (linear types) were designed for this purpose but they complicate programming significantly If you want type safety then you must use automatic memory management 4

Automatic Memory Management This is an old problem: –Studied since the 1950s for LISP There are several well-known techniques for performing completely automatic memory management Until 90’s they were unpopular outside the Lisp family of languages –just like type safety used to be unpopular 5

The Basic Idea When an object that takes memory space is created, unused space is automatically allocated new objects are created by new X After a while there is no more unused space new X can no longer allocate objects Some space occupied by objects is never to be used again This space can be freed to be reused 6

The Basic Idea (Cont.) How can we tell whether an object will “never be used again”? –In general it is impossible to tell –We will have to use a heuristic to find many (but not all) objects that will never be used again Observation: a program can use only the objects that it can find: x = new A(); x = y; After x  y, no way to access the newly allocated object 7

Garbage An object x is reachable if and only if: –A register contains a pointer to x, or –Another reachable object y contains a pointer to x You can find all reachable objects by starting from registers and following all the pointers Clearly, an unreachable object can never by referred by the program These objects are called garbage 8

Reachability is an approximation Consider the program: x = new A(); y = new B(); x = y; if alwaysTrue() { x = new A() } else { x.foo(); } After x = y (assuming y becomes dead there) –The object A is not reachable anymore –The object B is reachable (through x) –Thus B is not garbage and is not collected –But object B is never going to be used 9

Tracing Reachable Values In a simple implementation, the only register is the accumulator –it points to an object –and this object may point to other objects, etc. The stack (as in 164) is more complex –each stack frame contains pointers e.g., method parameters –each stack frame also contains non-pointers e.g., return address –if we know the layout of the frame we can find the pointers in it 10

A Simple Example We start tracing from acc and stack –they are called the roots B and D are unreachable from acc or the stack. Thus we can reuse their storage. 11 A BC Frame 1 Frame 2 D Eacc SP

Elements of Garbage Collection Every garbage collector has the following steps 1.Allocate space as needed for new objects 2.When space runs out: a)Compute what objects might be used again (generally by tracing objects reachable from a set of “root” registers) b)Free the space used by objects not found in (a) Some strategies perform garbage collection before the space actually runs out 12

Mark and Sweep When memory runs out, GC executes two phases –the mark phase: traces reachable objects –the sweep phase: collects garbage objects Every object has an extra bit: the mark bit –reserved for memory management –initially the mark bit is 0 –set to 1 for the reachable objects in the mark phase 13

Mark and Sweep Example 14 A BC D FrootE free A BC D FrootE free After mark: A BC D FrootE free After sweep:

The Mark Phase let todo = { all roots } while todo   do pick v  todo todo  todo - { v } if mark(v) = 0 then (* v is unmarked yet *) mark(v)  1 let v 1,...,v n be the pointers contained in v todo  todo  {v 1,...,v n } fi od 15

The Sweep Phase The sweep phase scans the heap looking for objects with mark bit 0 –these objects have not been visited in the mark phase –they are garbage Any such object is added to the free list Objects with a set mark bit 1 have their mark bit reset 16

The Sweep Phase (Cont.) /* sizeof(p) is the size of block starting at p */ p  bottom of heap while p < top of heap do if mark(p) = 1 then mark(p)  0 else add block p...(p+sizeof(p)-1) to freelist fi p  p + sizeof(p) od 17

Details While conceptually simple, this algorithm has a number of tricky details –this is typical of GC algorithms A serious problem with the mark phase –it is invoked when we are out of space –yet it needs space to construct the todo list –the size of the todo list is unbounded so we cannot reserve space for it a priori 18

Mark and Sweep: Details The todo list is used as an auxiliary data structure to perform the reachability analysis There is a trick that allows the auxiliary data to be stored in the objects themselves –pointer reversal: when a pointer is followed it is reversed to point to its parent Similarly, the free list is stored in the free objects themselves 19

Mark and Sweep. Evaluation Space for a new object is allocated from the new list –a block large enough is picked –an area of the necessary size is allocated from it –the left-over is put back in the free list Mark and sweep can fragment the memory Advantage: objects are not moved during GC –no need to update the pointers to objects –works for languages like C and C++ 20

Another Technique: Stop and Copy Memory is organized into two areas –Old space: used for allocation –New space: used as a reserve for GC 21 old spacenew space heap pointer The heap pointer points to the next free word in the old space. Allocation just advances the heap pointer

Stop and Copy Garbage Collection Starts when the old space is full Copies all reachable objects from old space into new space –garbage is left behind –after the copy phase the new space uses less space than the old one before the collection After the copy the roles of the old and new spaces are reversed and the program resumes 22

Stop and Copy Garbage Collection. Example 23 A BCDFrootE Before collection: new space ACF root new space After collection: free heap pointer

Implementation of Stop and Copy We need to find all the reachable objects, as for mark and sweep As we find a reachable object we copy it into the new space –And we have to fix ALL pointers pointing to it! As we copy an object we store in the old copy a forwarding pointer to the new copy –when we later reach an object with a forwarding pointer we know it was already copied 24

Implementation of Stop and Copy (Cont.) We still have the issue of how to implement the traversal without using extra space. The following trick solves the problem: –partition the new space in three contiguous regions 25 copied and scanned scan copied objects whose pointer fields were followed and fixed copied objects whose pointer fields were NOT followed empty copied allocstart

Stop and Copy. Example (1) Before garbage collection 26 A BCDFrootEnew space start scan alloc

Stop and Copy. Example (3) Step 1: Copy the objects pointed by roots and set forwarding pointers (dotted arrow) 27 A BCDFroot EA start scan alloc

Stop and Copy. Example (3) Step 2: Follow the pointer in the next unscanned object (A) –copy the pointed objects (just C in this case) –fix the pointer in A –set forwarding pointer 28 A BCDFroot EA scan alloc C start

Stop and Copy. Example (4) Follow the pointer in the next unscanned object (C) –copy the pointed objects (F in this case) 29 A BCDFroot EA scan alloc CF start

Stop and Copy. Example (5) Follow the pointer in the next unscanned object (F) –the pointed object (A) was already copied. Set the pointer same as the forwading pointer 30 A BCDFroot EA scan alloc CF start

Stop and Copy. Example (6) Since scan caught up with alloc we are done Swap the role of the spaces and resume the program 31 root A scan alloc CFnew space

The Stop and Copy Algorithm while scan <> alloc do let O be the object at scan pointer for each pointer p contained in O do find O’ that p points to if O’ is without a forwarding pointer copy O’ to new space (update alloc pointer) set 1st word of old O’ to point to the new copy change p to point to the new copy of O’ else set p in O equal to the forwarding pointer fi end for increment scan pointer to the next object od 32

Stop and Copy. Details. As with mark and sweep, we must be able to tell how large is an object when we scan it –And we must also know where are the pointers inside the object We must also copy any objects pointed to by the stack and update pointers in the stack –This can be an expensive operation 33

Stop and Copy. Evaluation Stop and copy is generally believed to be the fastest GC technique Allocation is very cheap –Just increment the heap pointer Collection is relatively cheap –Especially if there is a lot of garbage –Only touch reachable objects But some languages do not allow copying (C, C++) 34

Why Doesn’t C Allow Copying? Garbage collection relies on being able to find all reachable objects –And it needs to find all pointers in an object In C or C++ it is impossible to identify the contents of objects in memory –E.g., how can you tell that a sequence of two memory words is a list cell (with data and next fields) or a binary tree node (with a left and right fields)? –Thus we cannot tell where all the pointers are 35

Conservative Garbage Collection But it is Ok to be conservative: –If a memory word looks like a pointer it is considered a pointer it must be aligned it must point to a valid address in the data segment –All such pointers are followed and we overestimate the reachable objects But we still cannot move objects because we cannot update pointers to them –What if what we thought to be a pointer is actually an account number? 36

Reference Counting Rather that wait for memory to be exhausted, try to collect an object when there are no more pointers to it Store in each object the number of pointers to that object –This is the reference count Each assignment operation has to manipulate the reference count 37

Implementation of Reference Counting new returns an object with a reference count of 1 If x points to an object then let rc(x) refer to the object’s reference count Every assignment x = y must be changed: rc(y) = rc(y) + 1 rc(x) = rc(x) - 1 if(rc(x) == 0) then mark x as free x = y 38

Reference Counting. Evaluation Advantages: –Easy to implement –Collects garbage incrementally without large pauses in the execution Disadvantages: –Manipulating reference counts at each assignment is very slow –Cannot collect circular structures 39

Garbage Collection. Evaluation Automatic memory management avoids some serious storage bugs But it takes away control from the programmer –e.g., layout of data in memory –e.g., when is memory deallocated Most garbage collection implementation stop the execution during collection –not acceptable in real-time applications 40

Garbage Collection. Evaluation Garbage collection is going to be around for a while Researchers are working on advanced garbage collection algorithms: –Concurrent: allow the program to run while the collection is happening –Generational: do not scan long-lived objects at every collection –Parallel: several collectors working in parallel 41