Published by Larry Merritt. Modified over 2 years ago.

1
Real-Time Parallel Hashing on the GPU
Dan A. Alcantara, Andrei Sharf, Fatemeh Abbasinejad, Shubhabrata Sengupta, Michael Mitzenmacher*, John D. Owens, Nina Amenta
University of California, Davis; *Harvard University

2
Motivation
Real-Time KD-Tree Construction on Graphics Hardware, Zhou et al. [2008]
Perfect Spatial Hashing, Lefebvre and Hoppe [2006]
Glift: Generic, Efficient, Random-Access GPU Data Structures, Lefohn et al. [2006]

3
Motivation
Voxelized Lucy mesh: 1024^3 voxel grid (~1 billion cells), 3.5 million voxels (0.33%)
Image features: 140K pixels, 946 feature points (0.67%)

4
Motivation

5
Outline
Two-level structure
–Input key/value pairs (items), with unique keys
1. Split into buckets
–Fits in fast shared memory
2. Parallel Cuckoo Hashing
–Ensures O(1) retrieval
[Figure labels: keys & values (voxelized Lucy, 3.5M voxels); hash tables & functions; our structure (5M slots); 26.6 ms (GTX 280)]

6
Traditional Hashing: Probing
[Figure: keys, hash function h(k), hash table; keys E and F probe for a slot while G waits]
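The probing idea on this slide can be illustrated with a small sequential sketch. It assumes linear probing and the toy hash function `key % m`; neither is stated on the slide, and the names `hash_slot`, `probe_insert`, and `probe_lookup` are made up for the example.

```python
def hash_slot(key, m):
    # Toy hash function (an assumption for this sketch).
    return key % m

def probe_insert(table, key, value):
    """Insert with linear probing; return how many slots were examined."""
    m = len(table)
    start = hash_slot(key, m)
    for step in range(m):
        slot = (start + step) % m
        if table[slot] is None or table[slot][0] == key:
            table[slot] = (key, value)
            return step + 1
    raise RuntimeError("table is full")

def probe_lookup(table, key):
    """Walk the same probe sequence; an empty slot means the key is absent."""
    m = len(table)
    start = hash_slot(key, m)
    for step in range(m):
        entry = table[(start + step) % m]
        if entry is None:
            return None
        if entry[0] == key:
            return entry[1]
    return None

table = [None] * 8
probes = [probe_insert(table, k, v) for k, v in [(3, "a"), (11, "b"), (19, "c")]]
```

Keys 3, 11, and 19 all hash to slot 3, so each insertion examines one more slot than the previous one. That variable cost, plus the need to serialize writers that land on the same slot, is what makes this scheme awkward to run in parallel.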

7
Perfect Spatial Hashing
Lefebvre & Hoppe [2006]
Perfect mapping gives O(1) retrieval
Constructs collision-free mapping:
–h1(k) indexes into auxiliary offset table
–Offset removes collisions from h2(k)
Offset table built in specific order
[Figure: input data, h1(k), h2(k)]
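A one-dimensional toy version of the offset-table idea is sketched below. It is a simplification and not Lefebvre and Hoppe's construction: `h1(k) = k % r`, `h2(k) = k % m`, and the largest-group-first greedy search are assumptions chosen to keep the example short.

```python
def build_offset_table(keys, r, m):
    """Greedy build: slot(k) = (h2(k) + offsets[h1(k)]) % m, with no collisions."""
    groups = [[] for _ in range(r)]
    for k in keys:
        groups[k % r].append(k)          # h1(k) picks the offset-table entry
    table = [None] * m
    offsets = [0] * r
    # Assumed ordering: entries with the most keys are placed first.
    for i in sorted(range(r), key=lambda i: -len(groups[i])):
        for off in range(m):
            slots = [(k % m + off) % m for k in groups[i]]   # h2(k) + offset
            if len(set(slots)) == len(slots) and all(table[s] is None for s in slots):
                for k, s in zip(groups[i], slots):
                    table[s] = k
                offsets[i] = off
                break
        else:
            raise RuntimeError("no valid offset found; try a larger r or m")
    return table, offsets

def find(table, offsets, key):
    """O(1) retrieval: one offset-table read plus one hash-table read."""
    r, m = len(offsets), len(table)
    return table[(key % m + offsets[key % r]) % m] == key

table, offsets = build_offset_table([0, 3, 5, 10, 16, 20], r=3, m=7)
```

Every lookup touches exactly two memory locations, which is where the O(1) retrieval comes from. The cost sits in construction, because each offset depends on the slots claimed by the entries placed before it.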

8
Cuckoo Hashing
Pagh and Rodler [2001]
Use d sub-tables, each with randomly generated hash function
Two keys unlikely to always collide
Tries to find permutation without conflicts
Retrieve by looking at d possible locations
[Figure: input keys A, B, C, D, E hashed by h1(k) into sub-table T1 and by h2(k) into sub-table T2]

9
Cuckoo Hashing
Pagh and Rodler [2001]
Sequential insertion:
1. Try empty slots first
2. Evict if none available
3. Evicted key checks its other locations
4. Recursively evict
Assume impossible after O(lg n) iterations
–Rebuild using new hash functions
[Figure: keys A, B, C, D, E inserted using h1(k) and h2(k)]

10
Cuckoo Hashing (continued)
Pagh and Rodler [2001]
[Figure: same insertion steps as the previous slide; inserting E with h1(k) and h2(k) triggers a chain of evictions among A, B, D, and E]

11
Cuckoo Hashing (continued)
Pagh and Rodler [2001]
[Figure: same insertion steps as the previous slides; the table is rebuilt for keys A, B, C, D, E using new hash functions h1(k)* and h2(k)*]
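The sequential insertion procedure from these slides can be sketched as follows for d=2. The hash family `((a*k + b) % P) % size`, the prime `P`, the iteration cap of `4 * ceil(log2(n + 1))`, and all function names are assumptions made for illustration; the slides only say to give up after O(lg n) iterations and rebuild with new hash functions.

```python
import math
import random

P = 2147483647  # the Mersenne prime 2**31 - 1 (assumed modulus)

def make_hash(rng, size):
    a, b = rng.randrange(1, P), rng.randrange(P)
    return lambda k: ((a * k + b) % P) % size

def insert(tables, hashes, key, max_iters):
    # 1. Try empty slots first.
    for t in (0, 1):
        slot = hashes[t](key)
        if tables[t][slot] is None:
            tables[t][slot] = key
            return True
    # 2-4. Evict the occupant; the evicted key then tries its other sub-table.
    t = 0
    for _ in range(max_iters):
        slot = hashes[t](key)
        key, tables[t][slot] = tables[t][slot], key
        if key is None:
            return True
        t = 1 - t
    return False  # treated as impossible with these hash functions

def build_cuckoo(keys, sub_size, seed=0, max_rebuilds=100):
    rng = random.Random(seed)
    max_iters = max(1, 4 * math.ceil(math.log2(len(keys) + 1)))
    for _ in range(max_rebuilds):
        hashes = [make_hash(rng, sub_size), make_hash(rng, sub_size)]
        tables = [[None] * sub_size, [None] * sub_size]
        if all(insert(tables, hashes, k, max_iters) for k in keys):
            return tables, hashes
        # A failed insertion means: rebuild using new hash functions.
    raise RuntimeError("no conflict-free placement found")

def lookup(tables, hashes, key):
    # Retrieval looks at the d = 2 possible locations only.
    return any(tables[t][hashes[t](key)] == key for t in (0, 1))

keys = list(range(100, 150))
tables, hashes = build_cuckoo(keys, sub_size=64)
```

Here 50 keys go into 2 x 64 slots, which stays above the 2n slots the next slide cites for d=2.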

12
Cuckoo Hashing
For d=2 sub-tables:
–Proven high chance of success with 2n+ε slots
–Expect O(1) iterations
For d=3 sub-tables:
–Hard to get theoretical bounds
–In practice, high chance of success with 1.1n+ε slots
[Figure: input hashed by h1(k), h2(k), and h3(k)]

13
Pipeline
Cuckoo Hashing issues:
1. Reads & writes throughout table
2. Expensive rebuilds
Two-level structure
–Group into buckets with < 512 items
–Utilize thread blocks
–Each cuckoo table fits in shared memory
[Figure: input, Phase 1, Phase 2]

14
Phase 1: Partitioning
Group into buckets of < 512 items using h(k)
Allocate enough buckets to get average 80% load
Rearranges data to coalesce reads in Phase 2
[Figure: items, buckets, rearranged data]
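As a quick arithmetic check of the 80% figure, the sketch below assumes that "load" means the average number of items per bucket divided by a bucket capacity of 512. The slide does not spell out that reading, and the constant and function names are illustrative.

```python
import math

BUCKET_CAPACITY = 512   # assumed per-bucket limit, from the slide
TARGET_LOAD = 0.8       # assumed meaning: average fill fraction per bucket

def num_buckets(num_items):
    """Smallest bucket count whose average occupancy is at most 80% of 512."""
    return math.ceil(num_items / (BUCKET_CAPACITY * TARGET_LOAD))

buckets_for_5m = num_buckets(5_000_000)
average_items = 5_000_000 / buckets_for_5m
```

Under that assumption, 5 million items need 12,208 buckets with about 409.6 items each on average. The headroom below 512 is what keeps overflows, and therefore restarts, rare.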

15
Phase 1: Partitioning
Initially:
–h(k) = k mod |buckets|
Re-distribute if any bucket gets > 512 items
–125 restarts/25000 trials (0.5%) for 5 million random items
–h(k) = ((a+bk) mod p) mod |buckets|
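The restart rule can be sketched sequentially as below. The prime `P`, the way `a` and `b` are drawn, the restart cap, and the function name are assumptions; the slide gives only the two formulas and the 512-item limit.

```python
import random
from collections import Counter

P = 2147483647     # assumed prime modulus (2**31 - 1)
MAX_BUCKET = 512

def assign_buckets(keys, n_buckets, seed=0, max_restarts=100):
    """Return (bucket index per key, number of restarts that were needed)."""
    rng = random.Random(seed)
    h = lambda k: k % n_buckets                      # initial hash function
    for restart in range(max_restarts + 1):
        buckets = [h(k) for k in keys]
        if max(Counter(buckets).values()) <= MAX_BUCKET:
            return buckets, restart
        # Some bucket overflowed: draw new constants and re-distribute.
        a, b = rng.randrange(P), rng.randrange(1, P)
        h = lambda k, a=a, b=b: ((a + b * k) % P) % n_buckets
    raise RuntimeError("bucket distribution kept overflowing")

# Adversarial input for k mod 16: every key is a multiple of 16.
keys = [16 * i for i in range(2000)]
buckets, restarts = assign_buckets(keys, n_buckets=16)
```

All 2000 keys land in bucket 0 under `k mod 16`, so at least one restart is forced here. With random input the slide reports restarts in only 0.5% of trials.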

16
Phase 1: Partitioning
1. Allocate buckets
2. Compute item buckets using h(k)
3. Determine bucket sizes
–Orders items in same bucket
4. Reserve contiguous chunk for each bucket
5. Move items
[Figure: keys ABCDEFGHIJKLMNO pass through h(k); an atomic add gives each item its position within its bucket and yields bucket sizes 5, 6, 4; a prefix sum yields bucket offsets 0, 5, 11; items are moved into the packed bucket data]
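The five steps can be sketched sequentially as below, with a plain counter standing in for the GPU atomic add and `itertools.accumulate` standing in for the parallel prefix sum. The function name and the lookup-list hash used in the example are hypothetical.

```python
from itertools import accumulate

def partition(keys, n_buckets, h):
    """Pack keys so that each bucket's items are contiguous."""
    bucket_of = [h(k) for k in keys]              # 2. compute item buckets
    sizes = [0] * n_buckets                       # 1. allocate buckets
    position = []
    for b in bucket_of:                           # 3. counter plays the atomic add:
        position.append(sizes[b])                 #    the old value orders the item
        sizes[b] += 1                             #    inside its bucket
    # 4. exclusive prefix sum of the sizes gives each bucket's start offset
    offsets = [0] + list(accumulate(sizes))[:-1]
    packed = [None] * len(keys)
    for k, b, pos in zip(keys, bucket_of, position):
        packed[offsets[b] + pos] = k              # 5. move items
    return packed, sizes, offsets

# 15 keys spread over 3 buckets so that the sizes match the slide (5, 6, 4).
assignment = [0, 1, 2, 1, 0, 1, 2, 0, 0, 1, 2, 1, 0, 1, 2]
packed, sizes, offsets = partition(list(range(15)), 3, lambda k: assignment[k])
```

On the GPU the order of items inside a bucket depends on which thread's atomic add lands first, so it is not deterministic the way this loop is. Only the bucket boundaries are fixed.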

17
Phase 2: Cuckoo Hashing
Thread block per bucket
Performed in shared memory to reduce overhead
Three sub-tables for better occupancy
[Figure: bucket data in global memory is loaded into a single bucket's cuckoo tables T1, T2, T3 in shared memory]

18
Phase 2: Cuckoo Hashing
Generate hash functions
Parallelized construction
1. Simultaneously insert
2. Synchronize block
3. If evicted, repeat for other sub-tables
Fail after 25 iterations through all 3 sub-tables
[Figure: a single bucket's cuckoo tables T1, T2, T3 in shared memory, filled using g1(k), g2(k), g3(k)]
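A sequential simulation of the parallel construction for a single bucket is sketched below. Each pass of the inner loop plays one synchronized round, and "last writer wins" stands in for whichever thread's write survives. The hash family, the retry wrapper, and all names are assumptions, not the authors' CUDA kernel.

```python
import random

P = 2147483647      # assumed prime modulus (2**31 - 1)
NUM_TABLES = 3
MAX_ITERS = 25      # fail after 25 iterations through all 3 sub-tables

def build_bucket(keys, sub_size, rng):
    """Return (tables, hashes) for distinct keys, or None if construction fails."""
    hashes = [(rng.randrange(1, P), rng.randrange(P)) for _ in range(NUM_TABLES)]
    g = lambda t, k: ((hashes[t][0] * k + hashes[t][1]) % P) % sub_size
    tables = [[None] * sub_size for _ in range(NUM_TABLES)]
    placed_in = {k: None for k in keys}   # sub-table currently holding each key
    for _ in range(MAX_ITERS):
        for t in range(NUM_TABLES):
            pending = [k for k in keys if placed_in[k] is None]
            if not pending:
                return tables, hashes
            for k in pending:             # 1. simultaneously insert
                tables[t][g(t, k)] = k    #    (a later write overwrites an earlier one)
                placed_in[k] = t
            for k in keys:                # 2. synchronize, then every key checks its slot
                s = placed_in[k]
                if s is not None and tables[s][g(s, k)] != k:
                    placed_in[k] = None   # 3. evicted: retry in the next sub-table
    if all(s is not None for s in placed_in.values()):
        return tables, hashes
    return None

def build_with_retries(keys, sub_size, seed=0, max_attempts=50):
    rng = random.Random(seed)
    for _ in range(max_attempts):
        result = build_bucket(keys, sub_size, rng)   # new hash functions each attempt
        if result is not None:
            return result
    raise RuntimeError("bucket could not be built")

keys = list(range(1000, 1100))
tables, hashes = build_with_retries(keys, sub_size=64)
```

The example places 100 keys into 3 x 64 slots, a lighter load than a full 512-item bucket would see, so it is expected to settle after only a few rounds.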

19
Phase 2: Cuckoo Hashing
In trials, average of 5.5 iterations
–Nearly all converge with first functions
–Succeeded with < 2 new sets of functions
[Figure: keys A to H placed in a single bucket's cuckoo tables T1, T2, T3 in shared memory using g1(k), g2(k), g3(k)]

20
Phase 2: Cuckoo Hashing
At end of phase, save out to global memory:
1. Cuckoo hash functions
2. Rearranged sub-tables
[Figure: buckets' tables in shared memory are written to global memory as interleaved cuckoo tables, alongside the hash functions]

21
Hash Retrievals
Look in the 3 possible locations:
1. Compute bucket
2. Retrieve hash functions
3. Check each slot, stopping early if item found
[Figure: a query key k returns its value v]
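The three retrieval steps can be traced on a tiny hand-built structure. Everything concrete here is assumed for illustration: two buckets chosen by `key % 2`, sub-tables of two slots, hash functions `((a*k + b) % 11) % 2`, and the stored key/value pairs.

```python
P = 11          # toy prime (assumed)
SUB_SIZE = 2
N_BUCKETS = 2

# Per-bucket hash constants (a, b) for g1, g2, g3, as saved at the end of Phase 2.
HASHES = [
    [(1, 0), (2, 1), (3, 2)],
    [(1, 0), (2, 1), (3, 2)],
]
# Per-bucket cuckoo sub-tables T1, T2, T3 holding (key, value) pairs.
SUBTABLES = [
    [[(4, "a"), None], [(10, "b"), None], [None, None]],
    [[None, (7, "c")], [None, None], [None, None]],
]

def slot_of(params, key):
    a, b = params
    return ((a * key + b) % P) % SUB_SIZE

def retrieve(key):
    bucket = key % N_BUCKETS                            # 1. compute bucket
    for params, sub in zip(HASHES[bucket], SUBTABLES[bucket]):  # 2. its hash functions
        entry = sub[slot_of(params, key)]               # 3. check each slot
        if entry is not None and entry[0] == key:
            return entry[1]                             #    stop early once found
    return None                                         # not in any of the 3 locations
```

Key 4 is found on the first probe, key 10 on the second, and a key that was never inserted, such as 6, costs all three probes before the lookup can report a miss.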

22
Pipeline: Lucy Dataset
[Figure: voxelized Lucy; input voxels; Phase 1: item distribution, rearranged data; Phase 2: cuckoo sub-tables, cuckoo hash tables]

23
Timing Results: Lucy Dataset
Timed on EVGA GTX 280 SSC
All items retrieved in shuffled order, in parallel

24
Timing Results: Randomized Data

25
Timing Results: Step Breakdown

26
Hash Variations
[Figure: keys (voxels) and values (points)]

27
Multi-value Hash
[Figure: voxels, points, multi-value hash]

28
Compacting Hash
[Figure: voxels mapped by the compacting hash to indices 0 to 9, each storing average normal, average color, and number of points]

29
Spatial Hashing

30
Geometric Hashing

31
Geometric Hashing

32
Trade-offs
Space utilization vs. construction & retrieval speed:
1. Bucket size & occupancy
2. Number of sub-tables
3. Cuckoo table sizes
Ordered vs. random retrieval
[Figure: hash table and sorted array compared on space utilization, construction speed, and retrieval speed]

33
Summary
Introduced method for building large hash tables in real-time using CUDA
–O(1) random access to sparse data
–Balances space usage, construction speed, and retrieval speed
Generalized construction to handle non-unique keys
Demonstrated use with spatial and geometric hashing
Future work
–Decrease restart penalty for bucket distribution
–Reduce atomic usage to speed up construction

34
Acknowledgments
Thanks to our funding agencies:
–National Science Foundation (awards 0541448, 0625744, 0635250, and 0721491)
–SciDAC Institute for Ultrascale Visualization
Companies:
–NVIDIA for equipment donations & Shubho's Graduate Fellowship
–Cisco and Google for research grants
Data sources:
–Daniel Vlasic
–The Stanford 3D Scanning Repository
–The CAVIAR project
–Matthew Harding (http://www.wherethehellismatt.com/)
Timothy Lee for his help in the early stages of the project
2006 Matt Harding Dancing Video is provided courtesy of Cadbury Adams USA LLC. ©2006 Cadbury Adams USA LLC. All Rights Reserved. Stride is a registered trademark of Cadbury Adams USA LLC.

36
Background
[Figure: threads grouped into a block, with shared memory and global memory; each thread runs WHILE not found { check hash table }]

37
Traditional Hashing
Collision handling
Insertion needs access to entire table
Parallel insertions need locks to ensure write
Extra space for list info
Variable retrieval times
[Figure: key E hashed by h(k) into a table holding A, B, C, D, handled by probing and by chaining]

38
FKS Perfect Hashing
Fredman, Komlós, and Szemerédi [1984]
Can find a perfect hash function given n^2 slots
Cut space usage to O(n) by using two-level scheme
1. Split into n ideally tiny buckets
2. Build perfect hash for each bucket b with |b|^2 slots
Needs up to 6n space for basic version
[Figure: keys A, B, C, D, E split into buckets by h(k), then placed by per-bucket functions g1(k), g3(k), g5(k)]
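The two-level scheme can be sketched as below. The hash family `((a*k + b) % P) % size` and the function names are assumptions. The sketch also skips the step that re-draws the top-level function until the total space bound holds, so it does not by itself guarantee the 6n figure.

```python
import random

P = 2147483647  # assumed prime modulus (2**31 - 1); keys must be smaller than P

def make_hash(rng, size):
    a, b = rng.randrange(1, P), rng.randrange(P)
    return lambda k: ((a * k + b) % P) % size

def build_fks(keys, seed=0):
    rng = random.Random(seed)
    n = len(keys)
    top = make_hash(rng, n)                  # 1. split into n buckets
    buckets = [[] for _ in range(n)]
    for k in keys:
        buckets[top(k)].append(k)
    second = []
    for bucket in buckets:                   # 2. perfect hash per bucket, |b|^2 slots
        size = len(bucket) ** 2
        while True:
            g = make_hash(rng, size) if size else None
            slots = [None] * size
            collided = False
            for k in bucket:
                s = g(k)
                if slots[s] is not None:
                    collided = True          # retry with a new function
                    break
                slots[s] = k
            if not collided:
                break
        second.append((g, slots))
    return top, second

def contains(top, second, key):
    g, slots = second[top(key)]
    return bool(slots) and slots[g(key)] == key

keys = random.Random(1).sample(range(1, 1_000_000), 200)
top, second = build_fks(keys)
```

With |b|^2 slots, a randomly drawn function from a universal family is collision-free for the bucket with probability at least 1/2, so each inner retry loop is expected to finish after about two draws.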

39
2-way Chaining
Azar et al. [1994]
Generalized chaining: key added to smallest of 2 lists
With high probability, expected access O(1)
–Still O(lg lg n)
Shares issues with regular chaining
–Hard to ensure that chosen list remains smaller between size check and insertion
[Figure: keys A, B, C, D, E, F, G placed using h1(k) and h2(k)]
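A minimal sequential sketch of 2-way chaining follows. The two hash functions `k % 4` and `(k // 4) % 4` and the tie-break toward the first list are assumptions picked to keep the example easy to trace by hand.

```python
def insert(lists, h1, h2, key):
    """Append the key to the shorter of its two candidate lists."""
    first, second = lists[h1(key)], lists[h2(key)]
    # Size check and append happen as two separate steps. In a parallel
    # setting another thread could grow the chosen list in between.
    (first if len(first) <= len(second) else second).append(key)

def contains(lists, h1, h2, key):
    return key in lists[h1(key)] or key in lists[h2(key)]

h1 = lambda k: k % 4
h2 = lambda k: (k // 4) % 4
lists = [[] for _ in range(4)]
for key in (0, 4, 8, 1):
    insert(lists, h1, h2, key)
```

Keys 0, 4, and 8 share the same first choice (list 0), but the second choice spreads them over lists 0, 1, and 2, which is how the scheme keeps lists short.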

40
Hash Variations: Method
Phase 1: Distribution
–Mostly the same as for basic hash, except copies allowed
–Items with the same key put in same bucket
–Allow > 512 items per bucket to account for multiples
–Overflow chance smaller since expecting < 512 unique keys per bucket
[Figure: keys AGXFAXYAFGYMMAC are distributed in Phase 1 into bucket data AAFAAFGMCMGYXXY]

41
Hash Variations: Method
Phase 2: Cuckoo hashing
–Single thread may manage multiple items if > 512 items fell into bucket
–Evict only if items with different keys collide
–Cuckoo tables contain only unique keys
[Figure: bucket data AAFAAFGMCMGYXXY; copies of A and F within a bucket]

42
Compacting Hash
Compacting, Phase 2:
–Valid keys in Cuckoo table given value of 1, invalid entries get 0
–Prefix-sum on values gives unique IDs to valid Cuckoo table keys
–Compact cuckoo table keys to get list of unique keys
[Figure: bucket data AAFAAFGBCBGYXXY goes through Phase 2 into the cuckoo tables; a prefix-sum produces the unique keys AFGBCXY]
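The flag, prefix-sum, and compaction steps can be sketched as below, with `itertools.accumulate` standing in for the parallel prefix sum. The function name and the use of `None` for an empty cuckoo slot are assumptions.

```python
from itertools import accumulate

def compact_keys(cuckoo_slots):
    """Return (unique keys, ID computed for every slot)."""
    flags = [1 if s is not None else 0 for s in cuckoo_slots]   # valid = 1, invalid = 0
    inclusive = list(accumulate(flags))
    ids = [inc - f for inc, f in zip(inclusive, flags)]         # exclusive prefix sum
    unique = [None] * (inclusive[-1] if inclusive else 0)
    for slot, flag, i in zip(cuckoo_slots, flags, ids):
        if flag:
            unique[i] = slot          # the ID doubles as the output position
    return unique, ids

unique, ids = compact_keys(["A", None, "F", "G", None, "B"])
```

Only the IDs computed at valid slots are meaningful. Each one is both the key's unique ID and its index in the compacted key list, so the scatter needs no coordination between threads.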

43
Multi-value Hash
Multi-value hash, Phase 2:
–Rearrange bucket contents so that all values for a key are contiguous
–Process similar to phase 1, but done within the bucket
Retrieval shows where values start, and how many values there are
[Figure: bucket data (A1 F3 A2 A3 F1 F2 C2 B1 B2 C1 S2 M1 R1 T1 S1) is rearranged in Phase 2; the cuckoo tables store, for keys A F C B S M R T, the location (0 3 6 8 10 11 13 14) and count (3 3 2 2 1 2 1 1) of each key's values]
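The rearrangement within one bucket can be sketched as a count, prefix-sum, scatter pass, mirroring Phase 1. The function name is made up, and a Python dict stands in for the cuckoo table that would hold each key's (location, count) pair.

```python
def group_values(items):
    """items: list of (key, value). Returns (values, index) with
    index[key] = (location of the key's first value, number of values)."""
    counts = {}
    for key, _ in items:
        counts[key] = counts.get(key, 0) + 1
    index, start = {}, 0
    for key, count in counts.items():      # running sum of the counts
        index[key] = (start, count)
        start += count
    values = [None] * len(items)
    cursor = {key: loc for key, (loc, _) in index.items()}
    for key, value in items:               # scatter each value into its key's run
        values[cursor[key]] = value
        cursor[key] += 1
    return values, index

items = [("A", 1), ("F", 3), ("A", 2), ("A", 3), ("F", 1), ("F", 2), ("C", 2), ("B", 1)]
values, index = group_values(items)
location, count = index["F"]
f_values = values[location:location + count]
```

A retrieval for key F returns its (location, count) pair, and the caller then reads that many consecutive entries from the value array.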

44
Timing: Multi-value Hash

45
Timing: Compacting Hash

46
Timing: Non-existent Queries
