Smart Memory for Smart Phones. Chris Clack, University College London


1 Smart Memory for Smart Phones. Chris Clack, University College London, clack@cs.ucl.ac.uk

2 Outline: Target Architecture; Problems; Focus on Fragmentation; Results from UT; A fast allocator (not embedded): Doug Lea’s Allocator; Can We Do Better?; Overheads; Results

3 Target Architecture: Small hand-held integrated phone/PDA devices. Soft real-time, “open box”, constrained applications heap. Competition pressure for more, more flexible, and better (larger) applications.

4 Problems (1): To compact: copy when nearly full. Memory overhead. Compaction delay. [Diagram labels: TOP, live, free, a free fragment]

5 Problems (2): To compact: do sliding compaction when nearly full. Compaction delay. [Diagram labels: TOP, live, free]

6 Problems (3): To compact: do sliding compaction when allocation fails. Compaction delay. [Diagram labels: live, free, FREE LIST]

7 Focus on Fragmentation: What happens in real programs? Great paper by Mark Johnstone and Paul Wilson (UT): “The Memory Fragmentation Problem: Solved?”, M. Johnstone & P. Wilson, 1997. Fragmentation experiments using real programs running on real data.

8 [Table column headings: Max live at any time; Max Kb at any time; Average lifetime of an allocated byte]

9 RESULTS: No difference within experimental error.

10 MEASURE OF FRAGMENTATION [figure labels: #4, #3], e.g. %frag #4 = (value_at_3 – value_at_2) * 100 / value_at_2
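The percentage formula on this slide can be written out as a small function. This is only a sketch: the names `value_at_3` and `value_at_2` are taken from the slide, the function name is hypothetical, and which points of the figure they refer to is not recoverable from the transcript.

```python
def percent_fragmentation(value_at_3: float, value_at_2: float) -> float:
    """Slide formula: (value_at_3 - value_at_2) * 100 / value_at_2."""
    if value_at_2 <= 0:
        raise ValueError("value_at_2 must be positive")
    return (value_at_3 - value_at_2) * 100 / value_at_2


# Illustrative numbers only (not taken from the presentation):
# 105 units used against 100 units requested.
example = percent_fragmentation(105, 100)
```

With these made-up inputs the result is 5.0, i.e. 5% fragmentation.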

11 No difference within experimental error.

12 Johnstone & Wilson’s conclusion: The best free-list management policy in terms of fragmentation behaviour on real programs is BEST-FIT (Knuth notwithstanding).

13 A Fast Best-Fit Allocator. IMPLICATION: use best-fit allocation and we (maybe?) won’t ever need to compact. At least, compaction delays will be minimized. BUT: best-fit allocation is S-L-O-W. Worst case: have to scan the entire free list. Let’s look at a widely-used best-fit allocator: Doug Lea’s malloc, (arguably) the fastest best-fit allocator.

14 Boundary tag: used for coalescing. [Diagram label: Boundary tag]

15 [Diagram labels: exact-fit bins; fixed-width bins, sorted by size; W] Costs time to sort. Worst case: all free blocks in one bin, which reduces to O(n) search.

16 Can we do better? Support boundary tags and coalescing. Simple Idea (1) (of 4): Probability of fragmentation triggering compaction depends on the RANGE of allocatable block sizes. Very large block allocations are more likely to fail due to fragments. Very small free blocks create fragments. (NB if all blocks are the same size, fragmentation is zero!)

17 Restrict range of allocatable sizes and create an exact-fit table: lb, lb+1, lb+2, lb+3, …, ub-2, ub-1, ub. No need to sort. Worst case: O(n) search for next highest occupied bin.

18 Old idea: use an occupancy bitmap. If (ub-lb) = 31, bitmap is just one word. To search/allocate: read bitmap; AND with mask; find highest set bit; maybe modify bit and write. [Diagram: exact-fit table lb, lb+1, lb+2, lb+3, …, ub-2, ub-1, ub above the bitmap word 00110000000000000000000000000101]
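The search steps listed on this slide (read bitmap, AND with mask, find highest set bit, maybe modify and write) can be sketched as below. This is a minimal illustration, not the presenter's code: it assumes the bin for size lb sits in the most significant bit, so that "highest set bit" after masking means "smallest occupied size that is large enough". All function names are hypothetical.

```python
WORD_BITS = 32  # one machine word, as on the slide when (ub - lb) = 31


def bin_bit(index: int) -> int:
    """Bit for bin `index`; bin 0 (size lb) is the most significant bit (assumed)."""
    return 1 << (WORD_BITS - 1 - index)


def find_bin(bitmap: int, index: int):
    """Return the smallest occupied bin >= index, or None if there is none."""
    mask = (1 << (WORD_BITS - index)) - 1   # keeps the bits for bins index..31
    candidates = bitmap & mask              # AND with mask
    if candidates == 0:
        return None
    highest = candidates.bit_length() - 1   # find highest set bit
    return WORD_BITS - 1 - highest


def clear_bin(bitmap: int, index: int) -> int:
    """Clear a bin's bit once its chain is empty (the 'modify bit and write')."""
    return bitmap & ~bin_bit(index)


# Illustrative occupancy: bins 2, 3, 29 and 31 hold free blocks.
bitmap = 0
for occupied in (2, 3, 29, 31):
    bitmap |= bin_bit(occupied)
```

With that occupancy, `find_bin(bitmap, 4)` returns 29: bins 4 to 28 are empty, so the request is served from the next larger occupied bin without scanning a list.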

19 Problem: What if the range is very large? E.g. Nikhil wants to allocate blocks that vary from 2 words to 2^12 words: 2^12 different block sizes. Worst case = linear search of 128 bitmap words (128 reads + …). Two solutions: use more efficient bitmapping; use unconstrained hybrid scheme (see later).

20 More efficient bitmapping. Simple Idea (2): use a bitmap tree. Requires 128 + 4 + 1 words. Requires worst case 5 reads, 3 tests for zero, 3 masks, 3 finds of greatest set bit, 3 modify&writes. Generally: O(log_32((ub-lb)/32)). (Depends what you are counting … but it is fast!) Ten times faster than any other scheme we know.
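A bitmap tree of the kind named on this slide can be sketched as follows. This is an illustration under assumptions, not the presenter's implementation: each bit in an upper-level word records whether the corresponding lower-level word is non-zero, position 0 is taken to be the most significant bit, and the class and method names are hypothetical. For 4096 bins it allocates 128 + 4 + 1 words, and a worst-case search reads one word per level going up and one per level coming down (5 reads).

```python
WORD_BITS = 32


def first_at_or_after(word: int, pos: int):
    """Smallest position >= pos whose bit is set (position 0 = MSB), else None."""
    if pos >= WORD_BITS:
        return None
    masked = word & ((1 << (WORD_BITS - pos)) - 1)
    if masked == 0:
        return None
    return WORD_BITS - masked.bit_length()


class BitmapTree:
    def __init__(self, n_bins: int = 4096):
        self.levels = []                      # levels[0] is the leaf level
        n = n_bins
        while True:
            n_words = -(-n // WORD_BITS)      # ceiling division
            self.levels.append([0] * n_words)
            if n_words == 1:
                break
            n = n_words

    def mark(self, i: int) -> None:
        """Record that bin i is occupied, updating every level."""
        for level in self.levels:
            word, pos = divmod(i, WORD_BITS)
            level[word] |= 1 << (WORD_BITS - 1 - pos)
            i = word

    def unmark(self, i: int) -> None:
        """Record that bin i is empty; stop once a word stays non-zero."""
        for level in self.levels:
            word, pos = divmod(i, WORD_BITS)
            level[word] &= ~(1 << (WORD_BITS - 1 - pos))
            if level[word] != 0:
                break
            i = word

    def find_at_or_above(self, i: int):
        """Smallest occupied bin >= i, or None."""
        depth, idx = 0, i
        while True:                           # climb until a candidate is found
            word, pos = divmod(idx, WORD_BITS)
            if word >= len(self.levels[depth]):
                return None
            hit = first_at_or_after(self.levels[depth][word], pos)
            if hit is not None:
                idx = word * WORD_BITS + hit
                break
            if depth == len(self.levels) - 1:
                return None
            depth, idx = depth + 1, word + 1  # look strictly past the empty word
        while depth > 0:                      # descend to the leaf level
            depth -= 1
            idx = idx * WORD_BITS + first_at_or_after(self.levels[depth][idx], 0)
        return idx


tree = BitmapTree(4096)
tree.mark(5)
tree.mark(4095)
```

Here `tree.find_at_or_above(6)` has to climb to the root and descend again, which is the worst-case path the slide counts.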

21 LIFO/FIFO? Simple Idea (3): Although J&W found no difference between LIFO/FIFO/AO best fit, this might be different for embedded apps. So far, we can only do LIFO. We can achieve FIFO if we double-link ALL free blocks into one big chain. Drawback: now free takes as long as malloc (but still O(log_32((ub-lb)/32))).

22 [Diagram: bitmap tree above the exact-fit table lb, lb+1, lb+2, lb+3, …, ub-2, ub-1, ub] Freed blocks placed at heads of chains. If requested size not available, for LIFO: search bitmap tree to the right. Or for FIFO: search bitmap tree to the left, then follow link to next highest free block.

23 Simple Idea (4): We can trivially also support worst-fit by adding a pointer that always refers to the biggest block. And this is where we put our wilderness block! We have no data on fragmentation behaviour of worst-fit. If it turns out to be similar to best fit, it would be preferable because we would have O(1) alloc and O(log_32((ub-lb)/32)) free.

24 [Diagram: bitmap tree above the exact-fit table lb, lb+1, lb+2, lb+3, …, ub-2, ub-1, ub; labels: max, W]

25 Overheads. Dynamic per-block overhead: depends on (ub-lb), can be very small. Example (total 32 bits per live block): 16-bit signed int for size and availability of current block; 16-bit signed int for size and availability of previous block. Could optimize for live block overhead: 1 bit in header + free blocks also hold size at end of block. But, if 4-byte aligned and ANY overhead per block, can’t do better than this! Free blocks additionally need to hold two pointers: minimum block size = header + 2 pointers.
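The 32-bit header example on this slide can be sketched with Python's `struct` module. The slide only says that each 16-bit signed int carries "size and availability"; encoding a free block as a negative size is an assumption made here for illustration, as are the function names and the little-endian layout.

```python
import struct

# Two 16-bit signed ints: current block, then previous block (4 bytes total).
HEADER = struct.Struct("<hh")


def encode(size_words: int, is_free: bool) -> int:
    """Fold size and availability into one signed value (negative = free, assumed)."""
    if not 0 < size_words <= 32767:
        raise ValueError("size must fit in a positive 16-bit signed int")
    return -size_words if is_free else size_words


def pack_header(size: int, is_free: bool, prev_size: int, prev_is_free: bool) -> bytes:
    return HEADER.pack(encode(size, is_free), encode(prev_size, prev_is_free))


def unpack_header(raw: bytes):
    """Return (size, is_free, prev_size, prev_is_free)."""
    cur, prev = HEADER.unpack(raw)
    return abs(cur), cur < 0, abs(prev), prev < 0


# A live 12-word block whose predecessor is a free 40-word block.
raw = pack_header(12, False, 40, True)
```

Because the header records the previous block's size and availability, a block being freed can locate and test its predecessor for coalescing without a separate footer on live blocks.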

26 Static overheads. Code. A few registers (e.g. max). Data structures: bitmap tree: 133 words; table: (ub-lb) words. NOTE: if (ub-lb = heap) then table size is the size of the heap! (same overhead as semi-space). So we don’t want to use this scheme for large size ranges!!! Instead use a hybrid.

27 Hybrid scheme. Most used range of block sizes: use the bitmap tree and exact-fit bins as described. Bigger block sizes: these are all kept on the double-linked chain above the biggest exact-fit block. Can use fixed-width bins like Lea, together with a separate bitmap tree. We lose the worst-case property of the primary scheme.

28 RESULTS: Re-run Johnstone and Wilson’s tests, using our allocator on their trace files.

29 Test 1 [Chart legend: memory required by gmalloc; memory required by new allocator; memory requested by the program] Memory requirement halved! Roughly 5% fragmentation?

30 Test 2 [Chart legend: memory required by gmalloc; memory required by new allocator; memory requested by the program]

31 Test 3 [Chart legend: memory required by gmalloc; memory required by new allocator; memory requested by the program]

32 Test 4 [Chart legend: memory required by gmalloc; memory required by new allocator; memory requested by the program] Memory requirements consistently halved! Fragmentation consistently ~5% (?)

33 Status: Currently working with Symbian to conduct malloc-replacement trials using real smartphone applications.

