© Richard Jones, 2000. Directions for Distributed Garbage Collection. Microsoft Research, 7 August 2000.


1 Directions for Distributed Garbage Collection
  Richard Jones
  Computing Laboratory, University of Kent at Canterbury
  Microsoft Research, Cambridge
  Monday 7 August 2000
  © Richard Jones. All rights reserved.

2 Outline
  Motivation
  Model
  An ideal GC
  Why conventional taxonomies are unsatisfactory
  A new taxonomy
  What this taxonomy offers
  Examples
  DGC in practice
  Research directions

3 Motivation: GC
  Why GC?
    “Illusion of infinite memory” ??
    A safety net?
    Language requirement
    Problem requirement (ownership)
  Software engineering
    Liveness is a global question
    Modularity
    Abstraction

4 Motivation: DGC
  Arguments above apply
  Liveness is now an even harder problem
  Open systems
  Location transparency
  Lack of control over components
  Fault tolerance

5 Motivation: this talk
  Several previous attempts at DGC survey
    [abdu92, abdu98]
      – Quite full, little structure or rationale, ?accuracy
    [plai95]
      – Better structure but incomplete
    Lins in [jone96]
      – Short on detail
  Towards a well-structured, complete survey
  Avoid centralised GC legacy
  Insight into new areas for research
  References:

6 Model and terminology
  Processes exchange messages
  Failure model is fail-stop: no Byzantine failures
  Mutators, local collectors, distributed collectors
  Liveness by reachability
  Entry and exit items
  Local and global roots
    Local: roots for the process
    Global: entry items which may be reachable from a local root of another process

7 Root set
  [Figure labels: Local roots; Global roots; Remotely reachable]

8 Properties of an ideal DGC
  safety: only garbage should be reclaimed.
  completeness: all objects, including components of distributed cycles, that are garbage at the start of a collection cycle should be reclaimed by its end.
  concurrency: neither mutator nor local collector processes should be suspended; distinct distributed collection processes should run concurrently.
  promptness: garbage should be reclaimed promptly.
  efficiency: time and space costs should be minimised.
  locality: inter-process communication should be minimised.
  expediency: garbage should be reclaimed despite the unavailability of parts of the system.
  scalability: the collector should scale to networks of many processes.
  fault tolerance: it should be robust against message delay, loss or replication, or process failure.

9 Strategy/policy/mechanism
  Wilson suggests classification by strategy, policy and mechanism [wils95].
  Malloc example:
    Strategy: “don’t let small objects prevent reclamation of a larger contiguous area”
    Policy: best-fit
  Most taxonomies are based on mechanisms

10 Conventional mechanisms
  Reference counting
  Mark-sweep
  Mark-compact
  Copying
  [Figure: Conventional taxonomy. Mechanism: RC, MS, MC, Copy. Region organisation: Single, Large object area (LOA), Generations. Parallelism: Sequential, Incremental, Concurrent.]

11 Consequences
  (Almost) all direct mechanisms are variants of simple reference counting.
  All indirect mechanisms are tracing collectors.
    Conventional conclusion: indirect ≡ tracing
  RC cannot reclaim garbage cycles.
    Conventional conclusion: all complete algorithms are indirect

12 What’s the problem?
  Indirect collectors are better called “live object detectors”
  They are set-difference algorithms: they must provide an estimate of the set of live objects.
  Depending on the conservatism of this estimate, they are either
    Not scalable: every site must participate, or
    Not complete: assume live if no other information
  Synchronisation of phases is a bottleneck

13 Direct algorithms
  Direct algorithms are inherently scalable.
    E.g. simple RC requires cooperation of only 3 objects
    Only necessary to visit objects that might be garbage
  It is always safe for a direct algorithm to ‘give up’ early in discovering garbage
    At worst this defers reclamation (e.g. [weiz69])

14 Barrier technology
  Appropriateness of barrier technology changes as we move from centralised to distributed systems.
  Read barriers are conventionally held to be expensive (as reads are much more common than writes).
  But this overhead is diminished in the context of message passing.
  Combinations of read and write barriers become viable.

15 Tricolour abstraction
  Black: the object and its immediate descendants have been visited.
    The GC has finished with black objects and need not visit them again.
  Grey: the object has been visited but its components may not have been scanned,
    or, for an incremental/concurrent GC, the mutator has rearranged the connectivity of the graph.
    In either case, the collector must visit it again.
  White: the object is unvisited and, at the end of the phase, garbage.
  A collection terminates when no grey objects remain, i.e. all live objects have been blackened.
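
The colouring rules above can be illustrated with a minimal, non-incremental sketch. The heap is assumed to be a plain dictionary from object name to the names it references; `tricolour_mark` and the example heap are hypothetical names chosen for illustration and do not come from the talk.

```python
from collections import deque

WHITE, GREY, BLACK = "white", "grey", "black"

def tricolour_mark(graph, roots):
    """Colour every object in graph (dict: object -> list of children)."""
    colour = {obj: WHITE for obj in graph}
    grey = deque()
    for root in roots:
        if colour[root] == WHITE:
            colour[root] = GREY          # visited, children not yet scanned
            grey.append(root)
    while grey:                          # terminate when no grey objects remain
        obj = grey.popleft()
        for child in graph[obj]:
            if colour[child] == WHITE:
                colour[child] = GREY
                grey.append(child)
        colour[obj] = BLACK              # obj and its immediate descendants visited
    return colour

# Hypothetical heap: a -> b -> c is live; d and e form an unreachable cycle.
heap = {"a": ["b"], "b": ["c"], "c": [], "d": ["e"], "e": ["d"]}
colours = tricolour_mark(heap, roots=["a"])
garbage = sorted(obj for obj, c in colours.items() if c == WHITE)
```

At the end of the phase the objects still white, here `d` and `e`, are the garbage; every live object has been blackened.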

16 Barrier technology
  There are two ways to prevent the mutator from interfering with a collection by writing white pointers into black objects.
  1) Ensure the mutator never sees a white object
    When the mutator attempts to access a white object, the object is visited by the collector
    Protect white objects with a read-barrier
  2) Record where the mutator writes black-white pointers
    GC can (re)visit modified objects
    Protect objects with a write-barrier
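
The second option can be sketched as follows, under the assumption of a single-process incremental marker. The `Heap` class and its methods are hypothetical; the barrier shown simply re-greys a black object when a white pointer is stored into it, which is one of several possible write-barrier designs.

```python
WHITE, GREY, BLACK = "white", "grey", "black"

class Heap:
    """Toy incremental marker (hypothetical; not taken from the talk)."""

    def __init__(self, graph):
        self.fields = {obj: list(children) for obj, children in graph.items()}
        self.colour = {obj: WHITE for obj in graph}
        self.grey = []

    def shade(self, obj):
        if self.colour[obj] == WHITE:
            self.colour[obj] = GREY
            self.grey.append(obj)

    def mark_step(self):
        """Scan one grey object; return False once no grey objects remain."""
        if not self.grey:
            return False
        obj = self.grey.pop()
        for child in self.fields[obj]:
            self.shade(child)
        self.colour[obj] = BLACK
        return True

    def write(self, src, dst, barrier):
        """Mutator stores a pointer to dst in src."""
        self.fields[src].append(dst)
        # Write barrier: record the black-to-white store so src is revisited.
        if barrier and self.colour[src] == BLACK and self.colour[dst] == WHITE:
            self.colour[src] = GREY
            self.grey.append(src)

def run(barrier):
    heap = Heap({"a": ["b"], "b": ["c"], "c": []})
    heap.shade("a")                  # a is the only root
    heap.mark_step()                 # now a is black, b grey, c white
    heap.write("a", "c", barrier)    # white pointer written into a black object
    heap.fields["b"].remove("c")     # the only other path to c is erased
    while heap.mark_step():
        pass
    return heap.colour["c"]

with_barrier = run(barrier=True)
without_barrier = run(barrier=False)
```

Without the barrier, `c` stays white although it is reachable from `a`, so it would be reclaimed unsafely; with the barrier, `a` is revisited and `c` is blackened.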

17 Additional goals
  Necessity for compromises
    scalability, fault-tolerance and efficiency may only be achievable at the expense of completeness,
    concurrency introduces synchronisation overheads.
  Lack of empirical data
  A further goal:
    Flexibility: the collector should be configurable, guided by heuristics or hints

18 A more appropriate taxonomy
  A simple, orthogonal taxonomy, but one that captures all proposed DGC algorithms
    Indirect: Non-tracing, Tracing
    Direct: Non-tracing, Tracing
  Note also Louboutin’s Proactive/Reactive taxonomy [loub98] (more later)

19 Indirect, non-tracing DGC
  Global-root graph reconstruction
  Liskov & Ladin algorithms [lisk86, ladi92]
  (Replicated) central Service + Clients
  Client local GC passes Service lists
    – Acc: all non-resident objects reachable from local roots
    – Paths: all pairs (g1,g2) where g2 is a remote global root reachable from locally unreachable global root g1
    – Trans: references in transit
  Service reconstructs graph of global roots
  Periodically, each Client queries the Service, asking which of its global roots are no longer globally reachable
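
A rough sketch of the reconstruction step, assuming the Service receives one report per client as a dictionary with `acc`, `paths` and `trans` entries. The function name, the report layout and the object names are hypothetical, and replication of the Service, message ordering and failures are ignored; this is not the published protocol.

```python
def globally_reachable(reports):
    """Rebuild the graph of global roots and return those still reachable.

    reports: dict client -> {"acc": set, "paths": set of (g1, g2), "trans": set}
    """
    edges = {}
    frontier = []
    for report in reports.values():
        # Anything reachable from a local root, or in transit, is a starting point.
        frontier.extend(report["acc"])
        frontier.extend(report["trans"])
        for g1, g2 in report["paths"]:
            edges.setdefault(g1, set()).add(g2)
    live = set()
    while frontier:
        g = frontier.pop()
        if g in live:
            continue
        live.add(g)
        frontier.extend(edges.get(g, ()))
    return live

# Hypothetical reports from two clients A and B.
reports = {
    "A": {"acc": {"B.x"}, "paths": {("A.p", "B.y")}, "trans": set()},
    "B": {"acc": set(), "paths": {("B.x", "A.q"), ("B.y", "A.p")}, "trans": set()},
}
live = globally_reachable(reports)

# Client A's query: which of its global roots are no longer globally reachable?
dead_for_a = {g for g in {"A.p", "A.q"} if g not in live}
```

In this example `A.p` and `B.y` reference only each other across processes, so the reconstructed graph shows them to be a distributed garbage cycle, while `A.q` stays live through `B.x`.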

20 Leases
  Provide fault tolerance by preventing leaks in the face of remote process failure
  Clients take out a lease on a remote object
  Until this lease expires, object is protected from local collector
  Java RMI:
    – Lease default is 10 minutes
    – Leases renewed every 5 minutes
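
The lease idea can be sketched with an explicit clock so the behaviour is easy to follow. `LeaseTable` and its methods are hypothetical names and are not the Java RMI API; only the 10-minute duration and the 5-minute renewal interval are taken from the slide.

```python
LEASE_SECONDS = 10 * 60      # lease duration quoted on the slide
RENEW_SECONDS = 5 * 60       # renewal interval quoted on the slide

class LeaseTable:
    """Server-side record of which clients hold leases on which objects."""

    def __init__(self, duration=LEASE_SECONDS):
        self.duration = duration
        self.expiry = {}                     # (client, obj) -> expiry time

    def grant(self, client, obj, now):
        """Take out a lease, or renew an existing one."""
        self.expiry[(client, obj)] = now + self.duration

    def protected(self, obj, now):
        """True while at least one unexpired lease covers obj."""
        return any(o == obj and t > now for (_, o), t in self.expiry.items())

table = LeaseTable()
table.grant("client-1", "obj", now=0)
table.grant("client-1", "obj", now=RENEW_SECONDS)    # renewal after 5 minutes
# Assume client-1 now crashes and never renews again.
still_protected = table.protected("obj", now=899)    # lease runs until t = 900
after_expiry = table.protected("obj", now=900)
```

Once the crashed client's lease lapses, the object is no longer protected from the local collector, so a failed client cannot cause a permanent leak.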

21 Indirect, tracing DGC
  Classify by the degree of synchronisation required.
  Centralised: single initiating process
    Examples: [huda82, augu87, juul92]
  Partitioned: a partition of processes cooperates to collect independently of other processes
    Example: [lang92a]
  Autonomous: multiple, simultaneous collections
    Timestamp propagation: pipelined collections
    Examples: [hugh85, fess98]

22 Augusteijn
  Process initiates collection (active-disquiet)
  Sends scan request to remote processes for which it holds references
  On receipt of a scan request, a process that is
    disquiet: adds request to its work queue, ACKs immediately
    quiet: processes request (disquiet), ACKs on completion
  Stable algorithm.
    Only disquiet processes can send requests.
    Always a chain of responsibility from each disquiet process back to the active process.
  Marking terminates when the initiator has received an ACK of each scan request sent
  [Figure: state diagram with states Active Disquiet, Passive Quiet and Passive Disquiet; transitions labelled “received mark request” and “received all ACKs”]

23 Garbage collecting the world
  1. Processes negotiate partitions
  2. Processes send decrement messages from each exit-item
     At end, entry-items with positive counters (hard) are reachable from outside the group; other entry-items are soft.
  3. Global mark within the group from
     1. local roots and black entry-items, propagating black
     2. soft entry-items, marking unvisited items soft
  4. Detect termination and reclaim soft entry-items
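
A heavily simplified, single-threaded sketch of steps 2 to 4, assuming the group has already been negotiated. Each remote reference is modelled as a pair (holder, entry-item), where the holder is the local root or entry-item from which the exit-item is locally reachable; local marking, messages and termination detection are collapsed into ordinary loops. All names are hypothetical.

```python
def collect_group(group, owner, roots, remote_refs):
    """Return the entry-items of the group that end up soft (reclaimable)."""
    # Counter per entry-item in the group: one per exit-item referring to it.
    count = {}
    for _, dst in remote_refs:
        if owner[dst] in group:
            count[dst] = count.get(dst, 0) + 1
    # Step 2: exit-items inside the group send decrements to entry-items in the group.
    for src, dst in remote_refs:
        if owner[src] in group and owner[dst] in group:
            count[dst] -= 1
    hard = {entry for entry, c in count.items() if c > 0}   # referenced from outside
    # Step 3: mark within the group from local roots and hard entry-items.
    work = [r for r in roots if owner[r] in group] + list(hard)
    marked = set()
    while work:
        node = work.pop()
        if node in marked:
            continue
        marked.add(node)
        work.extend(d for s, d in remote_refs if s == node and owner[d] in group)
    # Step 4: entry-items never reached by the hard mark remain soft.
    return {entry for entry in count if entry not in marked}

# Hypothetical system: processes P and Q form the group, R stays outside it.
owner = {"P.root": "P", "P.b": "P", "P.d": "P", "Q.a": "Q", "Q.c": "Q", "R.x": "R"}
remote_refs = [
    ("P.root", "Q.a"),   # live: held from a local root of P
    ("P.b", "Q.c"),      # P.b and Q.c form a cycle inside the group
    ("Q.c", "P.b"),
    ("R.x", "P.d"),      # referenced from outside the group, so hard
]
soft = collect_group({"P", "Q"}, owner, ["P.root"], remote_refs)
```

The cycle between `P.b` and `Q.c` is reclaimed because it lies wholly inside the group, while `P.d` is conservatively kept because its only reference comes from a process that did not take part.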

24 Timestamp algorithms
  All global objects contain time-stamps.
    Time-stamp of new object is local time.
  Local GC propagates time-stamps to remote objects
    Time-stamp of remote object is increased if lower than value in message
  Intuition: time-stamp of garbage never increases
  Process p has time-stamp redo_p ≤ time-stamp of any live object in this process
  minredo = min { redo_p | p ∈ processes }
  Any object with time-stamp < minredo is garbage.

25 Direct, non-tracing DGC
  Reference Counting
    Standard RC algorithm insufficient: race conditions
    Simple protocol to avoid premature reclamation [lerm86]
    Weighted RC avoids race: doesn’t send INC messages [beva87, wats87]
    Diffusion tree algorithms [piqu91, more98a]
  Reference Listing
    Maintain lists of processes holding reference to global root rather than a count
    More fault tolerant
    Examples: Network Objects [birr93], SSP chains [shap92a]
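
A minimal sketch of weighted reference counting, to show why no increment message is needed: copying a reference only splits the weight held by the copied reference. The `Target` and `Ref` classes and the initial weight of 64 are hypothetical choices, and the usual indirection cells for exhausted weights are omitted.

```python
class Target:
    """Remote object that tracks the total weight of outstanding references."""

    def __init__(self, total=64):
        self.weight = total
        self.messages = 0            # messages received from reference holders

    def decrement(self, amount):
        self.messages += 1
        self.weight -= amount

    @property
    def is_garbage(self):
        return self.weight == 0

class Ref:
    def __init__(self, target, weight):
        self.target = target
        self.weight = weight

    def copy(self):
        """Duplicate the reference locally: no message is sent to the target."""
        if self.weight < 2:
            raise ValueError("weight exhausted; an indirection cell would be needed")
        half = self.weight // 2
        self.weight -= half
        return Ref(self.target, half)

    def delete(self):
        """Drop the reference: one decrement message carries its weight."""
        self.target.decrement(self.weight)
        self.weight = 0

obj = Target(64)
r1 = Ref(obj, 64)
r2 = r1.copy()                       # weights 32 and 32
r3 = r2.copy()                       # weights 32, 16 and 16
messages_after_copies = obj.messages
for ref in (r1, r2, r3):
    ref.delete()
```

The object's weight always equals the sum of the weights of its outstanding references, so it reaches zero exactly when the last reference is deleted, and only decrement messages ever travel to the object.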

26 Copying
  ‘Move’ locally unreachable global objects to site that references them: requires an ordering on sites
  ‘Move’ may be real or virtual [bish77, vest87, shap90, huds97]
Causal dependency tracking
  Analyse mutator’s computation graph directly [sche89, loub98]

27 Direct, tracing GC
  Completeness requires tracing.
  Direct algorithms offer scalability.
  How can we combine these ideas to produce effective DGCs?
  Back-tracing
    Examples: [fuch95, mahe97]
  Partial tracing
    Example: [rodr98]

28 Back-tracing
  Identify suspects
  Back-step(o) for some exit-item o
  Back-step(e)
    If e is not a suspect, return Live
    If e is marked, return Garbage
    Mark e
    For each remote object r pointing to e
      if Back-step(r) is Live, return Live
    Return Garbage
  Problem of multiple overlapping traces
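
The Back-step procedure above translates almost line for line into code. This sketch assumes a single address space in which `referrers` maps each object to the objects pointing at it; in a distributed setting each step to a remote referrer would be a message. The function and variable names are hypothetical, and overlapping traces are not handled.

```python
LIVE, GARBAGE = "Live", "Garbage"

def back_step(e, suspects, referrers, marked):
    """Trace backwards from e towards anything that is not a suspect."""
    if e not in suspects:
        return LIVE                  # reached an object believed to be live
    if e in marked:
        return GARBAGE               # already on this trace: a cycle of suspects
    marked.add(e)
    for r in referrers.get(e, ()):
        if back_step(r, suspects, referrers, marked) == LIVE:
            return LIVE
    return GARBAGE

# Hypothetical graph: x and y reference only each other; z is also held by w.
referrers = {"x": ["y"], "y": ["x"], "z": ["x", "w"]}
suspects = {"x", "y", "z"}           # w is not a suspect (e.g. locally reachable)

cycle_verdict = back_step("x", suspects, referrers, set())
held_verdict = back_step("z", suspects, referrers, set())
```

The trace visits only suspects and their referrers, which is why it stays local to the suspected garbage rather than touching the whole system.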

29 Partial tracing
  Identify suspects
  1. Mark red from these suspects
     Construct ‘red sets’ (akin to ‘client sets’)
     Dynamically forms a group
  2. Scan suspects whose red and client sets differ
     Rescues objects inadvertently marked red: mark them green
     Run all scans concurrently
  3. Reclaim any red objects
  Group merger scheme permits multiple, overlapping collections

30 Mark-red

31 Scan

32 Benefits
  Both schemes
    Are direct: attempt to trace only garbage
    Are scalable: can limit extent of trace (by process, by object, by hop-count, …)
    No global synchronisation
    Can take advantage of heuristics
  Partial tracing
    Mark-red does not have to synchronise with mutators
    Scan synchronisation through read and write barriers, DGC piggy-backs on mutator messages for fault tolerance
    [rodr98] shows how to manage overlapping traces

33 What this taxonomy offers
  Simple and orthogonal approach
  Offers complete taxonomy
  Not distracted by legacy of centralised GC
  Identifies new, scalable approaches

34 Louboutin taxonomy
  Global garbage detection
    Proactive
      – In-situ graph colouring
      – Global-root graph reconstruction
    Reactive
      – Time-stamp packet distribution
      – IRC
      – WRC
      – RL
  comprehensive

35 In practice
  Many direct, acyclic schemes
  Tracing is only needed to recover cycles
  How do these arise in practice?
  Stereotypes
    – A holds reference to B, and B holds reference to A
    – E.g. Callbacks
    – Use a Design Pattern to manage these by explicitly dropping references, e.g. Client sends Disconnect; Server drops callback
  General patterns
    – Do these arise?

36 Further research
  Object demographics
    Study real applications
    How do distributed objects behave?
    How much of this behaviour is imposed by limits of DGC technology?
  Frameworks for DGC
    Build a framework into which component DGCs could be plugged
    cf. Sun’s RVM for Java
  Comparative analysis
    How do different DGCs perform against different applications?
    Allow developers to pick
