Published by Rowan Cantwell. Modified over 2 years ago.

1
Multiple Choice Hash Tables with Moves on Deletes and Inserts
Adam Kirsch, Michael Mitzenmacher

2
Hashing: Modern Perspective
For many situations (e.g., hardware for routers), multiple choice hash tables are state-of-the-art.
–Each item gets d possible hash locations, placed in one.
Moving items among choices (e.g., cuckoo hashing) greatly improves space utilization.
–Only cost: may take many moves per insert.

3
Previously
Schemes that move at most 1 item per insertion.
–Limit cost of cuckoo hashing.
Schemes that batch move operations in a queue.
–Amortize cost of cuckoo hashing.
Using content addressable memories (CAMs) to reduce chance of overflow.
–Small CAMs yield big gains.

4
Contributions
Consider potential of moving items on deletions.
–Focus on one move per deletion/insertion.
Examine alternative approach using weaker hashing from [KTC, Peacock Hashing].
–Analyze limits of performance.

5
Multilevel Hash Table [BK90]
Use a multilevel hash table (MHT).
–Can store n elements with d = log log n + O(1) levels in O(n) space with high probability.
–Example with d = 4 hash functions.
Skew: more elements placed by early hash functions (double exponential decay).
[Figure: item x hashed into a table with levels 1 to 4.]
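The top-down fill of an MHT can be sketched in a few lines of Python. This is a minimal toy under stated assumptions, not the construction from [BK90]: one item per cell, level sizes halving (borrowed from the simulation setup later in the deck), integer keys, a per-level salt standing in for d independent hash functions, and a plain list standing in for overflow storage. The class and method names are hypothetical.

```python
import random


class MultilevelHashTable:
    """Toy MHT: d levels, one item per cell, each level half the previous size."""

    def __init__(self, top_size, d, seed=0):
        rng = random.Random(seed)
        self.sizes = [max(1, top_size >> i) for i in range(d)]
        self.levels = [[None] * s for s in self.sizes]
        # Assumption: one random salt per level plays the role of hash function h_i.
        self.salts = [rng.getrandbits(64) for _ in range(d)]
        self.stash = []  # overflow list (assumption; stands in for a small CAM)

    def _slot(self, level, key):
        return hash((self.salts[level], key)) % self.sizes[level]

    def insert(self, key):
        """Place key in the first level whose cell is free; return that level."""
        for level in range(len(self.levels)):
            slot = self._slot(level, key)
            if self.levels[level][slot] is None:
                self.levels[level][slot] = key
                return level
        self.stash.append(key)  # every level collided
        return None

    def contains(self, key):
        """Check the one candidate cell per level, then the stash."""
        for level in range(len(self.levels)):
            if self.levels[level][self._slot(level, key)] == key:
                return True
        return key in self.stash


if __name__ == "__main__":
    table = MultilevelHashTable(top_size=1024, d=4)
    for k in range(1000):
        table.insert(k)
    # Occupancy per level; earlier levels should hold most of the items.
    print([sum(c is not None for c in lvl) for lvl in table.levels], len(table.stash))
```

Because an item only reaches level i+1 after colliding at level i, the per-level occupancy printed at the end should fall off quickly, which is the skew the slide refers to.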

6
Second Chance (SC) Scheme
Standard MHT fills from top down.
–Elements cascade from table to table.
–We try to slow cascade at every step.
[Figure: standard MHT insertion of item x.]
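The slide does not spell out the insertion rule, so the following Python sketch is one plausible reading of "slow the cascade at every step", not a statement of the authors' exact algorithm. When the new item collides at level i, the resident of that cell is offered its own cell at level i+1; if that cell is free, the resident moves down and the new item takes its place. At most one item moves per insertion. All names (`make_table`, `slot`, `insert_second_chance`, `lookup`) and the salt-based hashing of integer keys are assumptions for illustration.

```python
import random


def make_table(top_size, d, seed=0):
    """Build an empty toy MHT with halving level sizes (assumption)."""
    rng = random.Random(seed)
    sizes = [max(1, top_size >> i) for i in range(d)]
    return {
        "sizes": sizes,
        "levels": [[None] * s for s in sizes],
        "salts": [rng.getrandbits(64) for _ in range(d)],
        "stash": [],
    }


def slot(t, level, key):
    return hash((t["salts"][level], key)) % t["sizes"][level]


def insert_second_chance(t, key):
    """Insert key, moving at most one resident item one level down."""
    d = len(t["levels"])
    for i in range(d):
        s = slot(t, i, key)
        if t["levels"][i][s] is None:
            t["levels"][i][s] = key
            return "placed"
        if i + 1 < d:
            # Second chance: can the resident step down to make room here?
            resident = t["levels"][i][s]
            r = slot(t, i + 1, resident)
            if t["levels"][i + 1][r] is None:
                t["levels"][i + 1][r] = resident
                t["levels"][i][s] = key
                return "moved"
    t["stash"].append(key)
    return "stashed"


def lookup(t, key):
    """An item is always in its own candidate cell at some level, or in the stash."""
    for i in range(len(t["levels"])):
        if t["levels"][i][slot(t, i, key)] == key:
            return True
    return key in t["stash"]
```

The move keeps lookups correct because a resident is only ever placed in a cell that its own hash function for that level selects.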

9
CAMs
Last few collisions hard to stop.
–Can waste lots of space on few items.
Solution: content addressable memory.
–CAMs are fully associative.
–Hold small numbers of items.

10
Moves on Deletions
Harder to manage. What item to move up?
[Figure: item x in a table with levels 1 to 4.]

11
Hint-Based Approach
Each cell stores a hint for where an item to move on delete is held.
Hints can be kept fairly small.
–About log n bits.
Various hint approaches possible.
–We found that replacing the hint on any collision works well.
–May depend on item lifetime distribution, etc.
–One move, recursive move variations.
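A rough Python sketch of the one-move variant follows. The slide gives only the outline, so the details here are assumptions: a hint is stored as a (level, slot) pair, every cell an item collides with has its hint overwritten with that item's final location ("replace hint on any collision"), and a delete follows the freed cell's hint once, pulling the hinted item up only if it actually hashes to the freed cell. The class name `HintedMHT`, the salt-based hashing of integer keys, and the list used as overflow storage are hypothetical.

```python
import random


class HintedMHT:
    """Toy MHT where each cell keeps a hint to a deeper item that collided there."""

    def __init__(self, top_size, d, seed=0):
        rng = random.Random(seed)
        self.sizes = [max(1, top_size >> i) for i in range(d)]
        self.items = [[None] * s for s in self.sizes]
        self.hints = [[None] * s for s in self.sizes]  # (level, slot) or None
        self.salts = [rng.getrandbits(64) for _ in range(d)]
        self.stash = []

    def _slot(self, level, key):
        return hash((self.salts[level], key)) % self.sizes[level]

    def insert(self, key):
        collided = []
        for i in range(len(self.items)):
            s = self._slot(i, key)
            if self.items[i][s] is None:
                self.items[i][s] = key
                # Replace the hint in every cell this key collided with.
                for ci, cs in collided:
                    self.hints[ci][cs] = (i, s)
                return
            collided.append((i, s))
        self.stash.append(key)  # no hint is recorded for stashed items here

    def delete(self, key):
        for i in range(len(self.items)):
            s = self._slot(i, key)
            if self.items[i][s] == key:
                self.items[i][s] = None
                self._pull_up(i, s)
                return True
        if key in self.stash:
            self.stash.remove(key)
            return True
        return False

    def _pull_up(self, i, s):
        """One move: follow the hint and move the item up if it still fits."""
        hint, self.hints[i][s] = self.hints[i][s], None
        if hint is None:
            return
        j, r = hint
        cand = self.items[j][r]
        # Hints can be stale, so verify the candidate really hashes to (i, s).
        if cand is not None and self._slot(i, cand) == s:
            self.items[j][r] = None
            self.items[i][s] = cand

    def contains(self, key):
        for i in range(len(self.items)):
            if self.items[i][self._slot(i, key)] == key:
                return True
        return key in self.stash
```

The recursive variant mentioned on the slide would presumably call `_pull_up` again on the cell the moved item vacated; that is left out here to keep to one move per deletion.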

12
Simulation Data
No current method of analysis for hints.
–Use simulations.
10,000 trials per data point.
MHT levels decreasing in size by factor of 2, plus small CAM.
With n items, top level has size n.
–Space usage just above 50%.
Load table to n elements, alternate inserts/deletes for 2^18 steps.
–Exponentially distributed lifetimes.
Goal: how many hash functions needed?
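The "just above 50%" figure follows from the geometric level sizes: with top level n and each level half the previous one, d levels use n(2 - 2^(1-d)) cells in total, which is always a little under 2n. A short Python check, using n = 32768 as in the results table and ignoring the small CAM (the helper name `mht_load` is hypothetical):

```python
def mht_load(n, d):
    """Return the level sizes and the load n / (total cells) for d halving levels."""
    sizes = [n >> i for i in range(d)]
    return sizes, n / sum(sizes)


sizes, load = mht_load(32768, 4)
print(sizes)           # [32768, 16384, 8192, 4096]
print(round(load, 4))  # 0.5333
```

More levels push the total closer to 2n, so the load approaches 50% from above as d grows.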

13
Simulation Results
Scheme             Items   Hash Functions   Average Stash   Max Stash
No moves           32768   13               4.225           31
Second Chance      32768   6                0.001           2
Hint, 1 Move       32768   7                0.013           3
Hint, Moves        32768   6                0.246           7
Hint, 1 Move + SC  32768   4                4.678           18
Hint, Moves + SC   32768   4                0.911           9

14
Lessons from Simulations
No moves very weak.
Second Chance (move on insert) more powerful than hint-based move on delete.
But the two combine well.
–Four hash functions: better than 50% load, small CAM.

15
Alternative: Weak Hashes
To avoid hints, overflow at each bucket splits to two buckets at next level.
–Each bucket receives from four buckets.
Less spreading of items, but know where to look on deletes.
Conjecture: loss of randomness implies weak performance.
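The fan-out of two and fan-in of four imply that each level has half as many buckets as the one above it. The slide does not give the actual bucket mapping, so the index arithmetic below is purely a hypothetical example of a mapping with those two properties; it shows why no hint is needed, since the places to look on a delete are computable from the bucket index alone.

```python
def children(b):
    """Hypothetical mapping: the two next-level buckets that take overflow from bucket b."""
    base = 2 * (b // 4)
    return (base, base + 1)


def parents(c):
    """The four previous-level buckets whose overflow can land in bucket c."""
    base = 4 * (c // 2)
    return tuple(range(base, base + 4))


if __name__ == "__main__":
    # With 16 buckets at one level there are 8 at the next.
    for b in range(16):
        print(b, "->", children(b))
```

Under this mapping an item that overflows bucket b can only be in one of two known buckets at the next level, which is the "know where to look on deletes" property, at the price of spreading items over far fewer locations than an independent hash would.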

16
Picturing Weak Hashes

17
Two Idealized Schemes
Random: each bucket holds a random item, splits the rest.
Greedy: each bucket counts items passed to bucket A and bucket B at next level, greedily holds item from bucket with larger count.
Assume invariants kept over insertions/deletions at all times.
Can be analyzed recursively level by level.
–Get distribution of bucket loads at each level.
–Obtain average case performance.

18
Results
Scheme          Items   Hash Functions   Average Stash
Random          32768   6                2.470
Greedy          32768   6                0.182
Second Chance   32768   6                0.001

19
Conclusions
Weak hashes, based on buckets, much less effective than hints.
–Even under optimistic assumptions.
One move approaches effective.
–Move on insert/delete complement each other.
Need methods for analysis.
–Challenging dependencies; hard to get exact numbers.
