1 A Parallel, Real-Time Garbage Collector Author: Perry Cheng, Guy E. Blelloch Presenter: Jun Tao

2 Outline Introduction Background and definitions Theoretical algorithm Extended algorithm Evaluation Conclusion

3 Introduction First garbage collectors: – Non-incremental, non-parallel Recent collectors – Incremental – Concurrent – Parallel

4 Introduction Scalably parallel and real-time collector – All aspects of the collector are incremental – Parallel Arbitrary number of application and collector threads – Tight theoretical bounds on Pause time for any application Total memory usage – Asymptotically but not practically efficient

5 Introduction Extended collector algorithm – Work with generations – Increase the granularity of the incremental steps – Separately handle global variables – Delay the copy on write – Reduce the synchronization cost of copying small objects – Parallelize the processing of large objects – Reduce double allocation during collection – Allow program stacks

6 Background and Definitions A Semispace Stop-Copy Collector – Divide heap memory into two equally sized semispaces: from-space and to-space – Suspend the mutator and copy reachable objects into to-space when from-space is full – Update root values and reverse the roles of from-space and to-space
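
A minimal sketch of a semispace stop-copy collector in C may help fix the terminology. The object layout, forwarding-pointer field, and helper names here are assumptions for illustration, not the paper's runtime code.

    #include <stddef.h>
    #include <string.h>

    /* Hypothetical object layout: a header with the field count and a
     * forwarding pointer, followed by pointer fields. */
    typedef struct obj {
        size_t      nfields;
        struct obj *forward;      /* set once the object has been copied */
        struct obj *field[];
    } obj_t;

    /* Bump pointer into to-space; assumed to be set to the start of
     * to-space before collect() runs. */
    static char *to_alloc;

    /* Copy one object into to-space (unless already copied) and return the copy. */
    static obj_t *forward_obj(obj_t *o)
    {
        if (o == NULL)
            return NULL;
        if (o->forward)
            return o->forward;
        size_t bytes = sizeof(obj_t) + o->nfields * sizeof(obj_t *);
        obj_t *copy = (obj_t *)to_alloc;
        to_alloc += bytes;
        memcpy(copy, o, bytes);
        copy->forward = NULL;
        o->forward = copy;        /* leave a forwarding pointer behind */
        return copy;
    }

    /* Stop-and-copy: forward the roots, then scan to-space left to right
     * (Cheney's scan); objects between 'scan' and 'to_alloc' are the gray set. */
    void collect(obj_t **roots, size_t nroots)
    {
        char *scan = to_alloc;
        for (size_t i = 0; i < nroots; i++)
            roots[i] = forward_obj(roots[i]);
        while (scan < to_alloc) {
            obj_t *o = (obj_t *)scan;
            for (size_t i = 0; i < o->nfields; i++)
                o->field[i] = forward_obj(o->field[i]);
            scan += sizeof(obj_t) + o->nfields * sizeof(obj_t *);
        }
        /* the runtime then flips the roles of from-space and to-space */
    }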

7 Background and Definitions Types of Garbage Collectors

8 Background and Definitions Types of Garbage Collectors (continued)

9 Background and Definitions Real-time Collector – Maximum pause time – Utilization The fraction of time that the mutator executes – Minimum Mutator Utilization (MMU) A function of window size: the minimum utilization over all windows of that size Equals 0 when the window size <= the maximum pause time
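
As an illustration of the definition (not from the slides), MMU can be computed from a trace of collector pauses. The trace format and names below are assumptions; the code simply applies the definition of utilization to every window of the given size.

    #include <stddef.h>

    typedef struct { double start, end; } pause_t;   /* one collector pause, in seconds */

    /* Fraction of the window [w0, w0+w) spent in the mutator (outside all pauses). */
    static double utilization(const pause_t *p, int n, double w0, double w)
    {
        double paused = 0.0;
        for (int i = 0; i < n; i++) {
            double lo = p[i].start > w0 ? p[i].start : w0;
            double hi = p[i].end < w0 + w ? p[i].end : w0 + w;
            if (hi > lo)
                paused += hi - lo;
        }
        return (w - paused) / w;
    }

    /* Minimum utilization over all windows of size w, sampled on a fine grid
     * over a run of length 'total' seconds. */
    double mmu(const pause_t *p, int n, double w, double total)
    {
        double min = 1.0;
        for (double w0 = 0.0; w0 + w <= total; w0 += 0.001) {
            double u = utilization(p, n, w0, w);
            if (u < min)
                min = u;
        }
        return min;   /* approaches 0 whenever w <= the maximum pause time */
    }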

10 Theoretical Algorithm A parallel, incremental and concurrent collector – Based on Cheney's simple copying collector – All objects are stored in a shared global pool of memory – Two atomic instructions FetchAndAdd CompareAndSwap – The collector interfaces with the application for: Allocating space for a new object Initializing the fields of a new object Modifying the field of an existing object
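
The semantics of the two atomic primitives can be expressed with C11 atomics; this is only a sketch of what they do, not the runtime code from the paper.

    #include <stdatomic.h>
    #include <stdbool.h>

    /* FetchAndAdd: atomically add 'n' to *p and return the old value.
     * Used, e.g., to reserve allocation space or stack slots. */
    long fetch_and_add(_Atomic long *p, long n)
    {
        return atomic_fetch_add(p, n);
    }

    /* CompareAndSwap: if *p == expected, store 'desired' and return true;
     * otherwise return false.  Used to claim exclusive access to an object. */
    bool compare_and_swap(_Atomic long *p, long expected, long desired)
    {
        return atomic_compare_exchange_strong(p, &expected, desired);
    }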

11 Theoretical Algorithm Scalable Parallelism – Maintain the set of gray objects – Cheney’s technique Keeping them in contiguous locations in to-space Pros – Simple Cons – Restricts the traversal order to breadth-first – Difficult to implement in a parallel setting

12 Theoretical Algorithm Scalable Parallelism (continued) – Explicitly managed local stacks Each processor maintains a local stack A shared stack of gray objects Periodically transfer gray objects between the local and shared stacks Avoids idleness – Pushes (or pops) can proceed in parallel Reserve a target region before the transfer Pushes and pops are never concurrent Room synchronization
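
A hedged sketch of the transfer step (layout and names are assumptions, not the paper's code): a processor reserves a private region of the shared stack with one FetchAndAdd, so concurrent pushes never interfere. Pops work symmetrically, and the room synchronization of the next slides guarantees that pushes and pops never run at the same time.

    #include <stdatomic.h>
    #include <stddef.h>

    #define SHARED_CAP (1 << 20)
    #define LOCAL_CAP  256

    typedef void *gray_t;                       /* pointer to a gray object */

    static gray_t shared_stack[SHARED_CAP];
    static _Atomic size_t shared_top;           /* next free slot in the shared stack */

    typedef struct {
        gray_t items[LOCAL_CAP];
        size_t top;
    } local_stack_t;                            /* one per processor */

    /* Transfer the whole local stack to the shared stack.  The FetchAndAdd
     * reserves a private target region, so the copies need no lock.  All
     * processors are in the push room while this runs. */
    void push_to_shared(local_stack_t *ls)
    {
        size_t base = atomic_fetch_add(&shared_top, ls->top);
        for (size_t i = 0; i < ls->top; i++)
            shared_stack[base + i] = ls->items[i];
        ls->top = 0;
    }

    /* Symmetric pop: reserve 'want' entries from the top of the shared stack.
     * This sketch assumes at least 'want' entries are present; the real
     * collector detects emptiness separately. */
    void pop_from_shared(local_stack_t *ls, size_t want)
    {
        size_t old = atomic_fetch_sub(&shared_top, want);   /* old top */
        for (size_t i = 0; i < want; i++)
            ls->items[ls->top++] = shared_stack[old - want + i];
    }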

13 Theoretical Algorithm Scalable Parallelism (continued) – Avoid white objects being copied twice Exclusive access by atomic instructions Copy-copy synchronization
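
A hedged sketch of copy-copy synchronization: before copying a white object, a collector thread claims it with CompareAndSwap on a status word, so at most one thread performs the copy. The status values, object layout, and copy_to_tospace helper are illustrative assumptions, not the paper's code.

    #include <stdatomic.h>

    enum { WHITE = 0, BUSY = 1, FORWARDED = 2 };

    typedef struct object {
        _Atomic int            status;   /* WHITE, BUSY, or FORWARDED    */
        struct object *_Atomic replica;  /* to-space copy once FORWARDED */
        /* ... payload ... */
    } object_t;

    /* Hypothetical helper that allocates and fills the to-space copy. */
    object_t *copy_to_tospace(object_t *o);

    /* Ensure 'o' gets exactly one to-space replica even if several collector
     * threads reach it at the same time. */
    object_t *claim_and_copy(object_t *o)
    {
        int expected = WHITE;
        if (atomic_compare_exchange_strong(&o->status, &expected, BUSY)) {
            object_t *r = copy_to_tospace(o);        /* we won the race: copy it */
            atomic_store(&o->replica, r);
            atomic_store(&o->status, FORWARDED);
            return r;
        }
        /* another thread is copying: wait until the replica is published */
        while (atomic_load(&o->status) != FORWARDED)
            ;                                        /* spin (simplified) */
        return atomic_load(&o->replica);
    }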

14 Theoretical Algorithm Incremental and Replicating Collection – Baker's incremental collector Copy k units of data when allocating a unit of data – Bounds the pause time The mutator can only see copied objects in to-space – A read barrier is needed – Modification to avoid the read barrier The mutator can only see the original objects in from-space – A write barrier is needed
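
A hedged sketch of the incremental discipline: each word of allocation pays for k words of copying work, which is what keeps the collector from falling behind and bounds the work per operation. The helpers allocate_words and do_copy_work are illustrative, not the paper's API.

    #include <stddef.h>

    extern void  *allocate_words(size_t n);    /* illustrative bump allocator        */
    extern size_t do_copy_work(size_t words);  /* copies up to 'words' words of
                                                  from-space data; returns the
                                                  amount actually copied             */

    static const size_t k = 2;                 /* words copied per word allocated    */

    /* Allocation entry point used while a collection is in progress. */
    void *alloc_during_gc(size_t n)
    {
        do_copy_work(k * n);                   /* pay the incremental copying debt   */
        return allocate_words(n);
    }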

15 Theoretical Algorithm Concurrency – Program and collector execute simultaneously – The program manipulates the primary memory graph – The collector manipulates the replica graph – A copy-write synchronization is needed Replica objects should be updated correspondingly Avoid race conditions – Mark objects being copied – Mutator updates to the replica should be delayed – A write-write synchronization is needed Prohibits different mutator threads from modifying the same memory location concurrently
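
A hedged sketch of the copy-write idea: the mutator writes to the from-space (primary) object and, if a replica already exists, mirrors the write into it; if the object is currently being copied, the update is deferred. Object layout, status values, and defer_write are illustrative assumptions.

    #include <stdatomic.h>

    enum { WHITE = 0, BUSY = 1, FORWARDED = 2 };

    typedef struct object {
        _Atomic int            status;
        struct object *_Atomic replica;
        void          *_Atomic field[8];         /* illustrative fixed layout */
    } object_t;

    /* Hypothetical queue of deferred writes, replayed by the collector. */
    void defer_write(object_t *o, int i, void *v);

    void write_barrier(object_t *o, int i, void *v)
    {
        atomic_store(&o->field[i], v);           /* update the primary object */
        int s = atomic_load(&o->status);
        if (s == FORWARDED) {
            object_t *r = atomic_load(&o->replica);
            atomic_store(&r->field[i], v);       /* mirror into the replica   */
        } else if (s == BUSY) {
            defer_write(o, i, v);                /* object in flight: delay   */
        }
    }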

16 Theoretical Algorithm Space and Time Bounds – Time bound on each memory operation: c·k – c: a constant – k: the number of words we collect per word allocated – Space bound: 2(R(1+1.5/k)+N+5PD) ≈ 2R(1+1.5/k) – R: reachable space – N: maximum number of reachable objects – P: P-way multiprocessor – D: maximum memory graph depth
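
To make the bound concrete (an illustration, not a result from the slides): when the reachable space R dominates the N and 5PD terms, choosing k = 2 gives roughly

    \[
    2\Bigl(R\bigl(1+\tfrac{1.5}{k}\bigr)+N+5PD\Bigr)
    \;\approx\; 2R\bigl(1+\tfrac{1.5}{2}\bigr)
    \;=\; 3.5R,
    \]

so a larger k trades a longer per-operation time bound (c·k) for a smaller space overhead.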

17 Extended Algorithm Globals, Stacks and Stacklets – Globals Updated when collection ends Arbitrarily many -> unbounded time Replicate globals like other heap objects Every global has two locations A single flag is used for all globals – Stacks and Stacklets Divide stacks into fixed-size stacklets At most one stacklet is active and the others can be replicated safely Also bounds the wasted space per stack
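
A hedged sketch of the stacklet idea (the layout is an assumption): the program stack is a chain of fixed-size stacklets, and only the stacklet currently being pushed into is off-limits to the collector, so all the others can be replicated safely.

    #include <stddef.h>

    #define STACKLET_WORDS 256

    typedef struct stacklet {
        struct stacklet *prev;               /* older part of the stack       */
        size_t           used;               /* words in use in this stacklet */
        int              active;             /* 1 only for the topmost one    */
        void            *slot[STACKLET_WORDS];
    } stacklet_t;

    /* Only the active stacklet may be mutated by its thread; the collector
     * may scan and replicate every stacklet with active == 0. */
    int collector_may_copy(const stacklet_t *s)
    {
        return !s->active;
    }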

18 Extended Algorithm Granularity – Block Allocation and Free Initialization Avoid calling FetchAndAdd for every memory allocation Each processor maintains a local pool in from-space, and a local pool in to-space when the collector is on A FetchAndAdd is needed only when allocating a local pool – Write Barrier Avoid updating copied objects on every write Record a triple in a write log and defer the update Invoke the collector when the write log is full Eliminates frequent context switches
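
A hedged sketch of block allocation (names and sizes are assumptions): each processor grabs a whole pool with one FetchAndAdd and then bump-allocates locally, so the shared counter is touched once per pool rather than once per object.

    #include <stdatomic.h>
    #include <stddef.h>

    #define POOL_WORDS 4096

    static _Atomic size_t heap_next;    /* shared allocation cursor, in words      */
    static char *heap_base;             /* assumed to point at the start of from-space */

    typedef struct {
        char *cur, *end;                /* per-processor local pool                */
    } pool_t;

    /* Refill the local pool: the only synchronized step. */
    static void refill(pool_t *p)
    {
        size_t base = atomic_fetch_add(&heap_next, POOL_WORDS);
        p->cur = heap_base + base * sizeof(void *);
        p->end = p->cur + POOL_WORDS * sizeof(void *);
    }

    /* Fast path: a plain bump allocation with no atomic instruction. */
    void *pool_alloc(pool_t *p, size_t bytes)
    {
        if (p->cur + bytes > p->end)
            refill(p);                  /* assumes bytes <= the pool size          */
        void *r = p->cur;
        p->cur += bytes;
        return r;
    }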

19 Extended Algorithm Small and Large Objects – Original Algorithm One field at a time – Reinterpretation of the tag word – Transferring the object to and from the local stack – Extended Algorithm Small objects – Locked down and copied all at once Large objects – Divided into segments – Copied one segment at a time
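
A hedged sketch of segmented copying for large objects (all identifiers are illustrative): the object is carved into fixed-size segments, each of which is an independent unit of collector work, so several threads can copy one large object in parallel.

    #include <stddef.h>
    #include <string.h>

    #define SEGMENT_BYTES 4096

    /* One unit of collector work: copy one segment of a large object from
     * its from-space original to its to-space replica. */
    typedef struct {
        const char *from;
        char       *to;
        size_t      size;      /* total object size in bytes          */
        size_t      seg;       /* which segment this work item covers */
    } seg_work_t;

    void copy_segment(const seg_work_t *w)
    {
        size_t off = w->seg * SEGMENT_BYTES;
        size_t len = w->size - off < SEGMENT_BYTES ? w->size - off : SEGMENT_BYTES;
        memcpy(w->to + off, w->from + off, len);
    }

    /* Number of work items a large object contributes to the gray stack. */
    size_t segment_count(size_t size)
    {
        return (size + SEGMENT_BYTES - 1) / SEGMENT_BYTES;
    }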

20 Extended Algorithm Algorithmic Modifications – Reducing double allocation One allocation by the mutator and one by the collector Defer the double allocation – Rooms and Better Rooms A push room and a pop room Only one room can be non-empty at a time Rooms – Enter the pop room, fetch and perform work, transition to the push room, push objects back onto the shared stack – Graying objects is time-consuming – Threads wait to enter the push room
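
This is a simplified, counter-based sketch of the room discipline, not the paper's room-synchronization algorithm, and all names are illustrative: a thread is admitted to a room only while the other room is empty, so the pop phase (fetching work) and the push phase (returning objects to the shared stack) never overlap.

    #include <stdatomic.h>

    static _Atomic int in_pop_room, in_push_room;   /* occupancy counters */

    /* Enter 'mine' only while 'other' is empty; back out and retry on a race.
     * With seq_cst atomics, no thread is admitted to a room while the other
     * room is occupied (spin-based, simplified). */
    static void enter_room(_Atomic int *mine, _Atomic int *other)
    {
        for (;;) {
            while (atomic_load(other) != 0)
                ;                                   /* wait for the other room to drain */
            atomic_fetch_add(mine, 1);
            if (atomic_load(other) == 0)
                return;
            atomic_fetch_sub(mine, 1);              /* lost the race: retry */
        }
    }

    static void exit_room(_Atomic int *mine) { atomic_fetch_sub(mine, 1); }

    /* One collector round: fetch work in the pop room, process it, then move
     * to the push room before returning objects to the shared stack. */
    void collector_round(void)
    {
        enter_room(&in_pop_room, &in_push_room);
        /* ... pop gray objects from the shared stack and process them ... */
        exit_room(&in_pop_room);

        enter_room(&in_push_room, &in_pop_room);
        /* ... push newly grayed objects back onto the shared stack ... */
        exit_room(&in_push_room);
    }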

21 Extended Algorithm Algorithmic Modifications (continued) – Rooms and Better Rooms (continued) Better rooms – Leave the pop room right after fetching work from the shared stack – Detect that the shared stack is empty by maintaining a borrow counter – Generational Collection Nursery and tenured space Trigger a minor collection when the nursery is full Trigger a major collection when the tenured space is full Tenured references may not be modified during collection Hold two fields for each mutable pointer – one for the mutator to use, the other for the collector to update
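
A hedged sketch of the two-field idea for mutable pointers in tenured space (the layout is an assumption, not the paper's exact scheme): the mutator reads and writes one slot while the collector rewrites the other to point at the to-space replica, and the roles are swapped when the collection finishes.

    #include <stdatomic.h>

    /* A mutable pointer cell with two slots; 'which' selects the slot the
     * mutator currently uses. */
    typedef struct {
        void *_Atomic slot[2];
        _Atomic int   which;
    } mut_ptr_t;

    void *mut_read(mut_ptr_t *c)
    {
        return atomic_load(&c->slot[atomic_load(&c->which)]);
    }

    void mut_write(mut_ptr_t *c, void *v)
    {
        atomic_store(&c->slot[atomic_load(&c->which)], v);
        /* a write barrier would also log this store for the collector */
    }

    /* The collector updates only the slot the mutator is not using. */
    void collector_update(mut_ptr_t *c, void *replica)
    {
        atomic_store(&c->slot[1 - atomic_load(&c->which)], replica);
    }

    /* At the end of the collection, the roles of the two slots are swapped. */
    void flip_at_end_of_gc(mut_ptr_t *c)
    {
        atomic_store(&c->which, 1 - atomic_load(&c->which));
    }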

22 Evaluation

29 Conclusion Implements a scalably parallel, concurrent, real-time garbage collector Thread synchronization is minimized

