
1 Message Analysis-Guided Allocation and Low-Pause Incremental Garbage Collection in a Concurrent Language
Konstantinos Sagonas, Jesper Wilhelmsson
Uppsala University, Sweden

2 Goals of this work
Efficiently implement concurrency through asynchronous message-passing
Memory management with real-time characteristics
o Short stop-times
o High mutator utilization
Design for multithreading

3 Our context: Erlang
Designed for highly concurrent applications
Soft real-time
Light-weight processes
No destructive updates
Data types: atoms, numbers, PIDs, tuples, cons cells (lists), binaries; the last three are heap data

4 Our context: the Erlang/OTP system
Industrial-strength implementation
Used in embedded applications
Three memory architectures [ISMM'02]:
o Private
o Shared
o Hybrid

5 Private heaps (diagram: each process P has its own stack and heap)

6 (diagram: with private heaps, sending a message between processes requires an O(|message|) copy)

7 Private heaps
Garbage collection is a private business
Fast memory reclamation of terminated processes

8 Shared heap
O(1) message passing
Global synchronization
Longer stop-times
No fast reclamation of process-local data

9 Hybrid architecture (diagram)
Process-local heaps
Message area
Big objects area
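
As a rough illustration of the three kinds of memory on this slide, the sketch below shows how the hybrid architecture might be laid out in C. All type and field names here are hypothetical and do not come from the actual Erlang/OTP sources.

    /* Hypothetical sketch of the hybrid architecture: a private heap and
     * stack per process, plus two areas shared by all processes. */
    #include <stddef.h>
    #include <stdint.h>

    typedef uintptr_t term_t;                 /* one tagged Erlang term (one word) */

    typedef struct process {                  /* one per Erlang process */
        term_t *stack;
        term_t *local_heap;                   /* process-local data, collected privately */
        term_t *local_heap_top;
        struct process *next;
    } process_t;

    typedef struct {                          /* shared between all processes */
        term_t *message_area;                 /* data that is (or may be) sent as messages */
        term_t *big_objects_area;             /* large objects such as big binaries */
    } shared_memory_t;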

10 Allocating messages in the message area
Several possible methods:
o User annotations
o Dynamic monitoring [Petrank et al. ISMM'02]
o Static analysis guided allocation

11 Static message analysis [SAS'03]
Similar to escape analysis
Allocation is process-local by default
o Possible messages allocated on the message area
o Copy on demand
Analysis is quite precise
o Typically finds 99% of all messages

12 Garbage collection in the hybrid architecture
Process-local heaps: private business, no synchronization required
Message area: two generations
o Copying collector in the young generation: fast allocation
o Mark-and-sweep in the old generation: prevents repeated copying of old objects
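
To make the "copying collector in the young generation" concrete, here is a minimal Cheney-style copying collection in C. It is only an illustration of the general technique: the real collector promotes survivors into the mark-and-sweep old generation rather than a symmetric to-space, and all names here are hypothetical (every object field is assumed to be a pointer).

    #include <stddef.h>
    #include <string.h>

    typedef struct obj {
        size_t      size;          /* number of pointer fields */
        struct obj *forward;       /* forwarding pointer, NULL until copied */
        struct obj *fields[];      /* child pointers */
    } obj_t;

    static char *to_top;           /* next free byte in the destination area */

    static obj_t *evacuate(obj_t *o)
    {
        if (o == NULL)
            return NULL;
        if (o->forward != NULL)    /* already copied: follow forwarding pointer */
            return o->forward;
        size_t bytes = sizeof(obj_t) + o->size * sizeof(obj_t *);
        obj_t *copy  = (obj_t *)to_top;
        to_top += bytes;
        memcpy(copy, o, bytes);
        o->forward = copy;         /* leave a forwarding pointer behind */
        return copy;
    }

    /* Copy the roots, then scan the copied objects breadth-first, using the
     * destination area itself as the work list (Cheney's algorithm). */
    void minor_collection(obj_t **roots, size_t nroots)
    {
        char *scan = to_top;
        for (size_t i = 0; i < nroots; i++)
            roots[i] = evacuate(roots[i]);
        while (scan < to_top) {
            obj_t *o = (obj_t *)scan;
            for (size_t i = 0; i < o->size; i++)
                o->fields[i] = evacuate(o->fields[i]);
            scan += sizeof(obj_t) + o->size * sizeof(obj_t *);
        }
    }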

13 GC of the message area is a bottleneck
1. Generational process scanning
2. Remembered set in local heaps
The root-set for the message area consists of all stacks and process-local heaps
This is not enough...
We need an incremental collector in the message area!

14 Properties of the incremental collector
No overhead on the mutator
No space overhead on heap objects
Short stop-times
High mutator utilization

15 Organization of the Message Area
Young generation
o Nursery and from-space: always have a constant size, Σ (= 100k words)
o Fwd: storage area for forwarding pointers, size bound by Σ (currently = Σ)
Old generation
o List of arbitrary sized areas
o Free-list, first-fit allocation
o Black-map: bit-array used to mark objects in mark-and-sweep
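
The layout on this slide might be mirrored by data structures along the following lines. This is a guess at the shape only, not the actual Erlang/OTP definitions; SIGMA stands for the constant size Σ (100k words).

    #include <stddef.h>
    #include <stdint.h>

    #define SIGMA (100 * 1024)                  /* words */

    typedef uintptr_t term_t;

    typedef struct free_block {                 /* old-generation free-list node */
        size_t             size;                /* block size in words */
        struct free_block *next;
    } free_block_t;

    typedef struct {
        /* Young generation: copying collector */
        term_t  nursery[SIGMA];                 /* new allocations */
        term_t  from_space[SIGMA];              /* being evacuated incrementally */
        term_t *fwd;                            /* forwarding pointers, size bound by SIGMA */

        /* Old generation: mark-and-sweep over arbitrary-sized areas */
        free_block_t *free_list;                /* first-fit allocation */
        uint8_t      *black_map;                /* mark bits used during mark-and-sweep */
    } message_area_t;

    /* First-fit allocation from the old generation's free-list (sketch;
     * block splitting and coalescing omitted). */
    term_t *old_gen_alloc(message_area_t *ma, size_t words)
    {
        free_block_t **prev = &ma->free_list;
        for (free_block_t *b = ma->free_list; b != NULL; prev = &b->next, b = b->next) {
            if (b->size >= words) {             /* first block that is big enough */
                *prev = b->next;                /* unlink it */
                return (term_t *)b;
            }
        }
        return NULL;                            /* no fit: trigger a major collection */
    }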

16 Organization of the Message Area (diagram: the nursery with its N_top, N_limit, and allocation limit pointers)

17 Incremental collector
Two approaches to choose from:
Work-based: reclaim n live words each step
Time-based: a step takes no more than t ms
n and t are user-specified

18 Work-based collection
The mutator wants to allocate need words
reclaim = max(n, need)
Allocation limit = N_top + reclaim
(diagram: nursery with N_top, N_limit, and the new allocation limit)
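
In code, the rule on this slide could look roughly like the sketch below; gc_step_reclaim and the pointer names are assumptions, not the runtime's real API.

    #include <stddef.h>
    #include <stdint.h>

    typedef uintptr_t term_t;

    extern term_t *N_top;          /* first free word in the nursery         */
    extern term_t *N_limit;        /* end of the nursery                     */
    extern term_t *alloc_limit;    /* mutator may allocate up to this point  */

    void gc_step_reclaim(size_t words);   /* assumed: reclaims `words` live words */

    /* Work-based incremental step: when the mutator hits its allocation limit
     * needing `need` words, reclaim at least max(n, need) live words and
     * advance the limit by that amount. */
    void work_based_step(size_t n, size_t need)
    {
        size_t reclaim = need > n ? need : n;   /* reclaim = max(n, need) */
        gc_step_reclaim(reclaim);

        alloc_limit = N_top + reclaim;          /* Allocation limit = N_top + reclaim */
        if (alloc_limit > N_limit)
            alloc_limit = N_limit;              /* never beyond the nursery itself */
    }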

19 Time-based collection
1. User annotations (as in Metronome)
2. Dynamic worst-case calculation
How much can the mutator allocate? How much live data is there?

20 Time-based collection
Δ_GC = reclaimed after GC step − reclaimed before GC step
GC steps = (Σ − reclaimed after GC step) / Δ_GC
w_M = N_free / GC steps
Allocation limit = N_top + w_M
(diagram: nursery with N_top, N_limit, and the new allocation limit)
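
The worst-case calculation above could be coded along these lines: estimate the remaining GC steps from the progress of the current step and spread the free nursery space over them. A sketch under the assumption that Σ is the from-space size; names are illustrative.

    #include <stddef.h>

    #define SIGMA (100 * 1024)     /* words, the constant size from slide 15 */

    size_t mutator_quota_w_M(size_t reclaimed_before,   /* words reclaimed before this step */
                             size_t reclaimed_after,    /* words reclaimed after this step  */
                             size_t n_free)             /* free words left in the nursery   */
    {
        size_t delta_gc = reclaimed_after - reclaimed_before;     /* progress this step     */
        if (delta_gc == 0)
            delta_gc = 1;                                         /* avoid division by zero */

        size_t gc_steps = (SIGMA - reclaimed_after) / delta_gc;   /* steps left to finish   */
        if (gc_steps == 0)
            gc_steps = 1;

        return n_free / gc_steps;   /* w_M: how much the mutator may allocate per step */
    }
    /* The mutator's new allocation limit is then N_top + w_M. */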

21–26 Collecting the Message Area (animation frames: processes P1, P2, P3; the Fwd area, from-space, and nursery; a Process Queue into which P1 is placed)

27 Collecting the Message Area
Cheap write barrier: link the receiver to a list in the send operation
(animation frame: P1 in the Process Queue; Fwd, from-space, nursery, allocation limit)
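
A minimal sketch of such a send-time barrier is shown below. The process structure and queue are hypothetical, but it illustrates why the barrier is cheap: a flag test and a list link per send, and no work on ordinary heap stores.

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct process {
        struct process *queue_next;     /* link in the collector's process queue */
        bool            in_queue;       /* avoid linking the same process twice  */
        /* ... mailbox, local heap, stack ... */
    } process_t;

    static process_t *process_queue;    /* processes the collector still has to scan */

    /* Called from the send operation after the message has been placed in the
     * message area: remember the receiver so its new message gets scanned. */
    void send_barrier(process_t *receiver)
    {
        if (!receiver->in_queue) {      /* constant-time check and link */
            receiver->in_queue   = true;
            receiver->queue_next = process_queue;
            process_queue        = receiver;
        }
    }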

28–37 Collecting the Message Area (further animation frames showing the Process Queue, P1–P3, the Fwd area, from-space, nursery, and allocation limit as the collection proceeds)

38 Performance evaluation: settings
Intel Xeon 2.4 GHz, 1 GB RAM, Linux
Start with small process-local heaps (233 words, grows when needed)
Measure active CPU time
o using hardware performance monitors

39 Performance evaluation: benchmarks
Mnesia – distributed database system: 1,109 processes, 2,892,855 messages
Yaws – HTTP web server: 420 processes, 2,275,467 messages
Adhoc – data mining application: 137 processes, 246,021 messages

40 Stop-times – Time-based (plots for Mnesia and Yaws, t = 1 ms)

41 Stop-times – Work-based, n = 2 words (plots for Adhoc and Yaws; Adhoc: mean 3, geo. mean 2; Yaws: mean 9, geo. mean 1)

42 Stop-times – Work-based, n = 100 words (plots of time in µs for Adhoc and Yaws; Adhoc: mean 53, geo. mean 46; Yaws: mean 268, geo. mean 36)

43 Message area total GC times, incremental vs. non-incremental (times in ms)

Benchmark    MA GC, n = 2    MA GC, n = 100    MA GC, n = 1000    Non-inc. MA GC
Mnesia       182             164               156                88
Yaws         373             374               242                153
Adhoc        244             203               78                 27

44 Runtimes – Incremental (times in ms)

Benchmark    Mutator    Local GC    MA, n = 2    MA, n = 100    MA, n = 1000
Mnesia       52,906     4,439       182          164            156
Yaws         237,629    11,728      373          374            242
Adhoc        61,045     8,194       244          203            78

45 Minimum Mutator Utilization
The minimum fraction of time that the mutator executes in any time window of a given size [Cheng & Blelloch PLDI 2001]
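
For reference, minimum mutator utilization can be computed from a trace of GC pauses roughly as follows. This is an illustrative sketch, not code from the paper; it assumes 0 < w <= total_time and sorted, non-overlapping pauses.

    #include <stddef.h>

    typedef struct { double start, end; } pause_t;   /* one GC pause, in seconds */

    /* Total pause time overlapping the window [w0, w0 + w]. */
    static double pause_in_window(const pause_t *p, size_t n, double w0, double w)
    {
        double sum = 0.0;
        for (size_t i = 0; i < n; i++) {
            double lo = p[i].start > w0     ? p[i].start : w0;
            double hi = p[i].end   < w0 + w ? p[i].end   : w0 + w;
            if (hi > lo)
                sum += hi - lo;
        }
        return sum;
    }

    /* MMU for window size w: the smallest fraction of any window of length w
     * in which the mutator (not the collector) runs. The worst window either
     * starts at a pause start or ends at a pause end, so testing those
     * alignments is enough. */
    double mmu(const pause_t *pauses, size_t n, double total_time, double w)
    {
        double worst = 1.0;
        for (size_t i = 0; i < n; i++) {
            double candidates[2] = { pauses[i].start, pauses[i].end - w };
            for (int k = 0; k < 2; k++) {
                double w0 = candidates[k];
                if (w0 < 0.0)            w0 = 0.0;
                if (w0 + w > total_time) w0 = total_time - w;
                double util = 1.0 - pause_in_window(pauses, n, w0, w) / w;
                if (util < worst)
                    worst = util;
            }
        }
        return worst;
    }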

46 Mutator Utilization – Work-based (plots for Adhoc and Yaws, n = 100 words)

47 Concluding remarks
Memory allocator is guided by the intended use of data
Incremental garbage collector:
o High mutator utilization
o Small overhead on total runtime
o No mutator overhead
o Small space overhead
o Really short stop-times!

48 Runtimes, incremental vs. non-incremental (times in ms)

Benchmark    Inc. Mutator    Non-inc. Mutator
Mnesia       52,906          53,276
Yaws         237,629         240,985
Adhoc        61,045          61,578

49 Total GC times, incremental vs. non-incremental (times in ms)

Benchmark    Inc. Local GC    Non-inc. Local GC
Mnesia       4,439            4,487
Yaws         11,728           11,359
Adhoc        8,194            7,848

50 Mutator Utilization – Time-based (plots for Mnesia and Yaws, t = 1 ms)

