Garbage Collecting the World. Bernard Lang, Christian and Jose. Presented by Shikha Khanna, COEN 317. Date: May 25, 2005
Index
- Introduction
- Terminology
- Basic Algorithm
- Handling Failures
- Group Contention
- Modified marking scheme
- Conclusion
Introduction
Computations performed by a collection of processes are increasingly common, e.g. distributed symbolic computation and distributed databases. Such computations create remote references: objects on one node reference memory in another node's address space. When these references are dropped, memory can be left unused but unreclaimed. The paper presents a distributed garbage collection algorithm that reclaims such unused memory.
Terminology
- Node: a processor, or a process on a processor, that is able to manage its own memory space.
- Mutator: a process that allocates chunks of memory (cells). Cells contain references to cells on the same or on other nodes.
- Root: a reference to a cell that the node considers useful; each node has its own roots. Example: all cell references held in CPU registers or on an execution stack are roots.
- Reachable (live): a cell referenced by a root, directly or indirectly through other cells.
- Unreachable: wasted or unused memory.
Terminology (contd)
- Local reference: a reference to a cell (memory) on the same node or processor.
- Remote reference: a reference to a cell on another node.
- Entry and exit items: a remote reference to a cell is represented by a reference to an exit item on the same node, which in turn references an entry item on the other node.
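A minimal Python sketch of entry and exit items, assuming each entry item carries a reference counter (the counter is used later for the initial marking). The names `EntryItem`, `ExitItem` and `make_remote_ref` are illustrative and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class EntryItem:
    node: str           # node that owns the referenced cell
    cell: str           # local cell kept alive by this entry item
    ref_count: int = 0  # number of exit items that reference this entry item


@dataclass
class ExitItem:
    node: str           # node that holds the remote reference
    target: EntryItem   # entry item on the other node


def make_remote_ref(from_node: str, entry: EntryItem) -> ExitItem:
    """Create an exit item on from_node that references entry."""
    entry.ref_count += 1
    return ExitItem(node=from_node, target=entry)


# Two nodes (N1 and N3) hold a remote reference to cell "a" on node N2.
entry_a = EntryItem(node="N2", cell="a")
exit_1 = make_remote_ref("N1", entry_a)
exit_2 = make_remote_ref("N3", entry_a)
```

Local cells on N1 and N3 only ever point at their own exit item, so a local collector never has to follow a pointer off its node.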
The basic Algorithm
1. Group Negotiation
2. Initial marking
3. Local propagation
4. Global propagation
5. Stabilization
6. Dead cycles removal
7. Group disbanding
Group Negotiation
When a node decides to participate in a new group GC, it first determines which group can be set up for that purpose.
- Why? The node may have been idle, its entry items may not have been accessed for a long time, or it may not currently be involved in any group GC.
- How? Groups can be created based on geographic distance. Once a group is created, a unique identifier is associated with its group GC and is made known to each node in the group.
Initial Marking
Within each group, every entry item has a mark, either soft or hard.
- Hard: the entry item is referenced from outside the group (or from roots).
- Soft: the entry item is referenced only from within the group.
Christopher's algorithm is used for the marking. Look at the reference counter of the entry item; say it is K. If the number of exit items within the group that reference this entry item is also K, every reference comes from inside the group and the entry item is marked soft. Otherwise there is a reference from outside the group, so the entry item is marked hard.
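The soft/hard test can be sketched as follows. The function name and the input layout (a dict of reference counters and a list of `(node, entry)` pairs, one per exit item) are assumptions made for this illustration.

```python
def initial_marks(ref_counts, exit_refs, group):
    """Mark each entry item soft or hard with respect to one group.

    ref_counts: entry item -> total reference counter K
    exit_refs:  one (node holding the exit item, entry item) pair per exit item
    group:      set of node names that form the group
    """
    inside = {entry: 0 for entry in ref_counts}
    for node, entry in exit_refs:
        if node in group and entry in inside:
            inside[entry] += 1  # a reference that comes from within the group
    return {
        entry: "soft" if inside[entry] == k else "hard"
        for entry, k in ref_counts.items()
    }


ref_counts = {"a": 2, "b": 1}
exit_refs = [("N1", "a"), ("N4", "a"), ("N2", "b")]
marks = initial_marks(ref_counts, exit_refs, group={"N1", "N2", "N3"})
# "a" is also referenced from N4, which is outside the group, so it is hard.
```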
Local Propagation
The local GCs are responsible for propagating marks from entry items to the exit items they reference locally, directly or indirectly. Marking has two phases:
- Trace from the entry items marked hard and from the root set. Every exit item reached by this tracing is marked hard.
- Trace from the soft entry items. Every exit item reached by this tracing that is not already hard is marked soft.
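A sketch of the two-phase marking on one node, assuming the local object graph is given as an adjacency dict. The names `trace` and `local_propagation` are hypothetical helpers for this example.

```python
def trace(starts, edges):
    """Return every item reachable from starts by following local references."""
    seen, stack = set(), list(starts)
    while stack:
        item = stack.pop()
        if item not in seen:
            seen.add(item)
            stack.extend(edges.get(item, ()))
    return seen


def local_propagation(edges, roots, entry_marks, exit_items):
    """Mark the exit items of one node hard or soft."""
    hard_starts = set(roots) | {e for e, m in entry_marks.items() if m == "hard"}
    soft_starts = {e for e, m in entry_marks.items() if m == "soft"}
    hard_reached = trace(hard_starts, edges)  # phase 1: hard entry items and roots
    soft_reached = trace(soft_starts, edges)  # phase 2: soft entry items
    exit_marks = {}
    for item in exit_items:
        if item in hard_reached:
            exit_marks[item] = "hard"         # hard wins over soft
        elif item in soft_reached:
            exit_marks[item] = "soft"
    return exit_marks                         # unreached exit items get no mark


edges = {
    "root": ["c1"], "c1": ["x1"],
    "e_hard": ["x3"],
    "e_soft": ["c2"], "c2": ["x2", "c1"],
}
exit_marks = local_propagation(
    edges,
    roots=["root"],
    entry_marks={"e_hard": "hard", "e_soft": "soft"},
    exit_items=["x1", "x2", "x3", "x4"],
)
```

Here `x1` is reachable from both a root and a soft entry item and ends up hard, while `x4` is reached by neither phase and receives no mark.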
Global Propagation
Hard marks of exit items are propagated to their corresponding entry items within the group.
Stabilization
A group is said to be stable if:
- Its nodes are stable, i.e. they have no new data that could justify hardening more entry items, locally or elsewhere in the group.
- No messages requesting the hardening of some entry item are in transit.
How? Group stability can be detected using any distributed termination detection algorithm.
Dead cycles removal
After stabilization, remove all entry items that are still marked soft. They are reachable neither from a root nor from outside the group.
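The propagation, stabilization and removal steps can be put together in a small single-process simulation. This is only a sketch: a sequential fixed-point loop stands in for the distributed termination detection, all nodes share one adjacency dict, and the names `group_gc` and `exit_target` are assumptions.

```python
def trace(starts, edges):
    """Return every item reachable from starts by following local references."""
    seen, stack = set(), list(starts)
    while stack:
        item = stack.pop()
        if item not in seen:
            seen.add(item)
            stack.extend(edges.get(item, ()))
    return seen


def group_gc(edges, roots, exit_target, entry_marks):
    """Return the entry items that are still soft once the group is stable."""
    marks = dict(entry_marks)
    changed = True
    while changed:  # stand-in for stabilization: loop until nothing hardens
        changed = False
        hard_starts = set(roots) | {e for e, m in marks.items() if m == "hard"}
        for item in trace(hard_starts, edges):           # local propagation
            target = exit_target.get(item)               # set only for exit items
            if target is not None and marks.get(target) == "soft":
                marks[target] = "hard"                   # global propagation
                changed = True
    return sorted(e for e, m in marks.items() if m == "soft")


# Node N1: root r -> c1 -> exit x1 (to eB on N2), entry eP -> p -> exit xQ.
# Node N2: entry eB -> b, entry eQ -> q -> exit xP (back to eP on N1).
edges = {"r": ["c1"], "c1": ["x1"], "eB": ["b"],
         "eP": ["p"], "p": ["xQ"], "eQ": ["q"], "q": ["xP"]}
exit_target = {"x1": "eB", "xQ": "eQ", "xP": "eP"}
dead = group_gc(edges, roots=["r"], exit_target=exit_target,
                entry_marks={"eB": "soft", "eP": "soft", "eQ": "soft"})
```

`eP` and `eQ` keep each other alive across the two nodes, so reference counting alone never frees them. Neither is ever hardened, so both are removed, while `eB` is hardened through the root and survives.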
Group disbanding
When a group GC is finished, its associated group may be disbanded. All data structures related to this group can then be reclaimed.
Failure Handling
A node can detect the failure of other nodes based on acknowledgements and time-outs. A node that detects a failure can either:
- Decide that the failure is temporary and wait for the failed node to wake up.
- Reorganize the group, i.e. create a new group that excludes the failed node.
Failure Handling (2)
A transmission link may fail and divide a group into subgroups. These subgroups then start independent GCs. Result: dead memory still gets cleaned up.
Failure Handling (3)
When a node has a non-recoverable failure, what happens to the entry items referenced by the failed node?
- For group G, count the references to each entry item. Suppose entry item A has 4 references.
- Do the same for group G' (G without the failed node). Entry item A now has 3 references.
- So A was being referenced by the failed node. Send a decrement message to that entry item.
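The comparison between G and G' can be sketched as below, reusing the slide's numbers (4 references before, 3 after). The function names and the `(node, entry)` pair layout are assumptions made for illustration.

```python
def count_refs(exit_refs, group):
    """Count, per entry item, the references held by nodes of the group."""
    counts = {}
    for node, entry in exit_refs:
        if node in group:
            counts[entry] = counts.get(entry, 0) + 1
    return counts


def decrements_after_failure(exit_refs, group, failed):
    """Return entry item -> number of decrement messages owed for the failed node."""
    before = count_refs(exit_refs, group)            # counts in G
    after = count_refs(exit_refs, group - {failed})  # counts in G' = G - failed node
    return {
        entry: before[entry] - after.get(entry, 0)
        for entry in before
        if before[entry] != after.get(entry, 0)
    }


exit_refs = [("N1", "A"), ("N2", "A"), ("N3", "A"), ("N4", "A"), ("N2", "B")]
group = {"N1", "N2", "N3", "N4"}
owed = decrements_after_failure(exit_refs, group, failed="N4")
# A drops from 4 references to 3, so one decrement message is sent to A.
```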
Simultaneous Group Collections
A node may belong to more than one group.
- Aim: the results obtained by a local garbage collector on a node can be used in the other groups to which the node belongs, i.e. markings can be used across groups.
- Advantage: in a large network with variations in connectivity and communication speed, GC is much faster and more efficient if groups are broken down into subgroups.
Group Contention (1)
Consider a subgroup G' of G. If an entry item is marked hard in G, it can also be marked hard in G'. However, some entry items that are soft w.r.t. G will be hard w.r.t. G'.
[Figure: group G containing subgroup G']
- The GC of only one group can take place at a time.
- Problem: markings cannot be used across groups.
Group Contention (2)
Conversely, if a local garbage collector works for the group GC of G', its hard markings cannot be used for G. Its soft markings, however, may be used.
[Figure: group G containing subgroup G']
Group Contention (3)
The situation is worse if a node belongs to overlapping groups A and B. The markings (hard or soft) made by the local garbage collector for one of the two groups cannot be used at all for the other.
[Figure: overlapping groups A and B]
Contd…
This makes a strictly hierarchical embedding of groups necessary, to avoid contention over the services of the local garbage collector.
Hierarchical cooperation of group GCs
The group contention problem is solved by modifying the marking scheme.
- Groups are organized in a strictly hierarchical order by inclusion.
- Each group is assigned a level index: the number of groups it is strictly embedded in.
- The universal group has level index 0.
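A sketch of the level index and of the strict-hierarchy requirement, with groups modelled as Python `frozenset`s of node names. Both function names are hypothetical.

```python
def is_strict_hierarchy(groups):
    """True if any two groups are either nested or disjoint (no partial overlap)."""
    return all(
        a <= b or b <= a or a.isdisjoint(b)
        for a in groups
        for b in groups
    )


def level_index(group, groups):
    """Number of groups that strictly contain the given group."""
    return sum(1 for other in groups if group < other)  # "<" is proper subset


g0 = frozenset({"N1", "N2", "N3", "N4", "N5"})  # universal group
g1 = frozenset({"N1", "N2", "N3"})
g2 = frozenset({"N1", "N2"})
h1 = frozenset({"N4", "N5"})                    # sibling of g1
groups = [g0, g1, g2, h1]
levels = [level_index(g, groups) for g in groups]
```

Two sibling groups such as `g1` and `h1` share the same level index, which is fine because no node belongs to both.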
Contd..
In this scheme, integer marks are used instead of binary hard and soft marks. The rest of the algorithm remains the same. The mark of an entry item is the least level for which the entry item could be marked hard.
[Figure: for group 3, the left side is hard and the right side is soft]
Marking scheme
[Figure: node N1 inside nested groups g0, g1, g2, g3]
- The entry item at N1 is hard w.r.t. G2.
- The entry item at N1 is soft w.r.t. G1.
- So the mark of the entry item is 2.
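One way to compute the initial integer mark of an entry item, assuming the nested groups around its node are given as a list indexed by level. The function name, the node sets, and the convention of returning `len(chain)` for an entry item that is soft at every level are assumptions of this sketch.

```python
def initial_level_mark(chain, referencing_nodes):
    """Least level at which the entry item is hard.

    chain[level] is the set of nodes of the level-`level` group that contains
    the entry item's node (chain[0] is the universal group).
    """
    for level, members in enumerate(chain):
        if any(node not in members for node in referencing_nodes):
            return level  # some reference comes from outside this group
    return len(chain)     # assumption: soft with respect to every group


g0 = {"N1", "N2", "N3", "N4", "N5"}
g1 = {"N1", "N2", "N3", "N4"}
g2 = {"N1", "N2", "N3"}
g3 = {"N1", "N2"}
chain = [g0, g1, g2, g3]

# The entry item at N1 is referenced from N4, which is in g1 but not in g2:
mark = initial_level_mark(chain, ["N4"])
```

With this input the entry item is soft w.r.t. g1 and hard w.r.t. g2, so its mark is 2, as on the slide.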
Contd..
Local propagation: instead of hard/soft marks, levels are propagated from entry items to exit items. The rest of the algorithm is exactly the same.
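A sketch of level propagation on one node. It assumes that an exit item takes the smallest mark among the entry items that reach it, and that roots count as level 0 because they are hard for every group. Both rules are this example's reading of the scheme, and the function names are hypothetical.

```python
def trace(starts, edges):
    """Return every item reachable from starts by following local references."""
    seen, stack = set(), list(starts)
    while stack:
        item = stack.pop()
        if item not in seen:
            seen.add(item)
            stack.extend(edges.get(item, ()))
    return seen


def propagate_levels(edges, roots, entry_marks, exit_items):
    """Give each reached exit item the least level among its sources."""
    sources = {root: 0 for root in roots}  # assumption: roots are hard everywhere
    sources.update(entry_marks)
    exit_marks = {}
    for start, mark in sources.items():
        for item in trace([start], edges):
            if item in exit_items:
                exit_marks[item] = min(mark, exit_marks.get(item, mark))
    return exit_marks


edges = {"r": ["x0"], "e2": ["c", "x1"], "c": ["x2"], "e3": ["x2"]}
exit_marks = propagate_levels(
    edges,
    roots=["r"],
    entry_marks={"e2": 2, "e3": 3},
    exit_items={"x0", "x1", "x2", "x3"},
)
```

One trace per source keeps the sketch short; a real collector would avoid re-tracing shared cells.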
Final GC
[Figure: node N1 inside nested groups g0, g1, g2, g3]
After all marks have been propagated, every entry item with a mark > 2 is soft w.r.t. G2 and is therefore garbage collected.
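The final test is a single comparison per entry item. In this sketch the function name and the sample marks are made up for illustration.

```python
def collectable(entry_marks, level):
    """Entry items that are soft w.r.t. the group at the given level."""
    return sorted(entry for entry, mark in entry_marks.items() if mark > level)


entry_marks = {"e1": 0, "e2": 2, "e3": 3, "e4": 4}
dead_for_g2 = collectable(entry_marks, level=2)  # e3 and e4 are soft w.r.t. G2
dead_for_g3 = collectable(entry_marks, level=3)  # only e4 is soft w.r.t. G3
```

The same table of marks answers the question for every level of the hierarchy, which is what lets one local collector serve several nested group GCs.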
Conclusion
We saw a distributed GC algorithm that:
- Is fault tolerant.
- Does not need centralized control.
- Allows multiple concurrent active GCs.
- Eventually reclaims all inaccessible objects, including distributed cycles.