Published by Larry Merritt. Modified about 1 year ago.

1
Real-Time Parallel Hashing on the GPU
Dan A. Alcantara, Andrei Sharf, Fatemeh Abbasinejad, Shubhabrata Sengupta, Michael Mitzenmacher*, John D. Owens, Nina Amenta
University of California, Davis; *Harvard University

2
Motivation
Real-Time KD-Tree Construction on Graphics Hardware (Zhou et al. [2008])
Perfect Spatial Hashing (Lefebvre and Hoppe [2006])
Glift: Generic, Efficient, Random-Access GPU Data Structures (Lefohn et al. [2006])

3
Motivation
Voxelized Lucy mesh: voxel grid of ~1 billion cells, of which … million voxels (0.33%) are occupied
Image features: 140K pixels, of which 946 are feature points (0.67%)

4
Motivation

5
Outline: hash tables & functions
Two-level structure
– Input: key/value pairs (items) with unique keys
1. Split into buckets, each of which fits in fast shared memory
2. Parallel cuckoo hashing, which ensures O(1) retrieval
[Figure: keys & values. Voxelized Lucy, 3.5M voxels; our structure, 5M slots, 26.6 ms (GTX 280)]

6
Traditional hashing: probing
[Figure: keys hashed by the hash function h(k) into a hash table; colliding keys (E, F, G) must probe other slots and wait]

7
Perfect spatial hashing (Lefebvre & Hoppe [2006])
Perfect mapping gives O(1) retrieval
Constructs a collision-free mapping:
– h1(k) indexes into an auxiliary offset table
– The offset removes collisions from h2(k)
The offset table is built in a specific order
[Figure: input data placed at h2(k) + offset[h1(k)]]
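A minimal Python sketch of the offset-table idea, assuming 1-D integer keys and the hypothetical hash functions h1(k) = k mod |offset table| and h2(k) = k mod |table| (Lefebvre and Hoppe work with 2-D/3-D positions and their own functions). Keys that share an offset entry are placed largest group first, standing in for the "specific order" on the slide; the names build_offset_table and perfect_lookup are illustrative only.

```python
def build_offset_table(keys, table_size, offset_size):
    """Build a collision-free table where slot(k) = (h2(k) + offset[h1(k)]) mod table_size."""
    h1 = lambda k: k % offset_size   # hypothetical: indexes the offset table
    h2 = lambda k: k % table_size    # hypothetical: base position in the hash table
    groups = {}
    for k in keys:
        groups.setdefault(h1(k), []).append(k)
    offsets = [0] * offset_size
    table = [None] * table_size
    # Fill the offset table in a fixed order: largest groups first.
    for g in sorted(groups, key=lambda g: -len(groups[g])):
        for off in range(table_size):
            slots = [(h2(k) + off) % table_size for k in groups[g]]
            if len(set(slots)) == len(slots) and all(table[s] is None for s in slots):
                for k, s in zip(groups[g], slots):
                    table[s] = k
                offsets[g] = off
                break
        else:
            raise ValueError("no collision-free offset; pick new hash functions or sizes")
    return table, offsets


def perfect_lookup(k, table, offsets):
    """O(1) retrieval: one offset read, then exactly one table probe."""
    s = (k % len(table) + offsets[k % len(offsets)]) % len(table)
    return table[s] == k


keys = [3, 17, 42, 58, 91, 100, 123]
table, offsets = build_offset_table(keys, table_size=16, offset_size=3)
```

Every stored key is found with a single probe. If two keys in one group share the same h2 value, no offset can separate them, so the sketch raises and a caller would retry with different functions or sizes.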

8
Cuckoo hashing (Pagh and Rodler [2001])
Use d sub-tables, each with a randomly generated hash function
Two keys are unlikely to collide in every sub-table
Tries to find a permutation without conflicts
Retrieve by looking at the d possible locations
[Figure: input keys A to E inserted into sub-tables T1 and T2 using h1(k) and h2(k); key E finds no free slot]

9
Cuckoo hashing (Pagh and Rodler [2001])
Sequential insertion:
1. Try empty slots first
2. Evict if none are available
3. The evicted key checks its other locations
4. Recursively evict
Assume insertion is impossible after O(lg n) iterations
– Rebuild using new hash functions
[Figure: keys A to E placed into two sub-tables using h1(k) and h2(k)]

10
Cuckoo hashing (Pagh and Rodler [2001]), continued
[Figure: sequential insertion of keys A to E using h1(k) and h2(k), continued]

11
Cuckoo hashing (Pagh and Rodler [2001]), continued
[Figure: keys A to E rebuilt using new hash functions h1(k)* and h2(k)*]

12
Cuckoo hashing
For d = 2 sub-tables:
– Proven high chance of success with 2n + ε slots
– Expect O(1) iterations
For d = 3 sub-tables:
– Hard to get theoretical bounds
– In practice, high chance of success with 1.1n + ε slots
[Figure: input keys mapped by h1(k), h2(k), and h3(k)]
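The sequential insertion procedure from the preceding cuckoo hashing slides can be sketched in Python. This is an illustrative version, not the authors' code: it assumes integer keys below the prime 2^32 - 5, uses the hash family ((a·k + b) mod p) mod m as an example choice, and uses a fixed limit of 32 evictions as a stand-in for the O(lg n) bound. The parameter d selects the number of sub-tables.

```python
import random

P = 4294967291  # the prime 2**32 - 5; assumes integer keys below P


class SequentialCuckoo:
    """Cuckoo hashing with d sub-tables of m slots each, built one key at a time."""

    def __init__(self, m, d=2, max_evictions=32, seed=0):
        self.m, self.d, self.max_evictions = m, d, max_evictions
        self.rng = random.Random(seed)
        self._new_hash_functions()
        self.tables = [[None] * m for _ in range(d)]

    def _new_hash_functions(self):
        # One randomly generated function ((a*k + b) mod P) mod m per sub-table.
        self.coeffs = [(self.rng.randrange(1, P), self.rng.randrange(P))
                       for _ in range(self.d)]

    def _h(self, i, k):
        a, b = self.coeffs[i]
        return ((a * k + b) % P) % self.m

    def _place(self, key):
        """Return None on success, or the key left homeless after max_evictions."""
        for i in range(self.d):                  # 1. try empty slots first
            s = self._h(i, key)
            if self.tables[i][s] is None:
                self.tables[i][s] = key
                return None
        i = 0
        for _ in range(self.max_evictions):      # 4. recursively evict, up to a limit
            s = self._h(i, key)
            key, self.tables[i][s] = self.tables[i][s], key   # 2. evict the occupant
            if key is None:
                return None
            i = (i + 1) % self.d                 # 3. evicted key tries its next location
        return key                               # assume insertion is impossible

    def build(self, keys):
        while True:
            self.tables = [[None] * self.m for _ in range(self.d)]
            if all(self._place(k) is None for k in keys):
                return
            self._new_hash_functions()           # rebuild using new hash functions

    def contains(self, k):
        # Retrieval looks at the d possible locations.
        return any(self.tables[i][self._h(i, k)] == k for i in range(self.d))


keys = list(range(0, 500, 10))                   # 50 keys
cuckoo = SequentialCuckoo(m=64, d=2)             # 128 slots, more than 2n
cuckoo.build(keys)
```

With d = 2, the example keeps the total slot count above 2n, in line with the guideline on the slide; passing d=3 lets the same sketch explore the tighter 1.1n + ε regime, although the slide notes that regime lacks theoretical bounds.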

13
Pipeline
Cuckoo hashing issues:
1. Reads & writes throughout the table
2. Expensive rebuilds
Two-level structure:
– Group into buckets with < 512 items
– Utilize thread blocks
– Each cuckoo table fits in shared memory
[Figure: input → Phase 1 → Phase 2]

14
Phase 1: Partitioning
Group items into buckets of < 512 items using h(k)
Allocate enough buckets to get an average load of 80%
Rearranges data to coalesce reads in Phase 2
[Figure: items → buckets → rearranged data]

15
Phase 1: Partitioning
Initially:
– h(k) = k mod |buckets|
Redistribute if any bucket gets > 512 items:
– 125 restarts / 25,000 trials (0.5%) for 5 million random items
– h(k) = ((a + bk) mod p) mod |buckets|
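The bucket-count rule and the restart loop on the two partitioning slides can be sketched as follows. Assumptions not taken from the slides: "80% load" is read as an average of 0.8 × 512 items per bucket, p is taken to be the prime 2^32 - 5, and the random coefficients a and b come from Python's random module. The function names are hypothetical.

```python
import math
import random

BUCKET_CAPACITY = 512
TARGET_LOAD = 0.8
P = 4294967291  # the prime 2**32 - 5 (assumed choice of p)


def num_buckets(n_items):
    # Assumes "80% load" means an average of 0.8 * 512 items per bucket.
    return math.ceil(n_items / (TARGET_LOAD * BUCKET_CAPACITY))


def choose_bucket_hash(keys, seed=0):
    """Return (h, |buckets|, bucket sizes, number of restarts)."""
    rng = random.Random(seed)
    nb = num_buckets(len(keys))
    h = lambda k: k % nb                          # initial function
    restarts = 0
    while True:
        counts = [0] * nb
        for k in keys:
            counts[h(k)] += 1
        if max(counts) <= BUCKET_CAPACITY:
            return h, nb, counts, restarts
        restarts += 1                             # a bucket overflowed: redistribute
        a, b = rng.randrange(P), rng.randrange(1, P)
        h = lambda k, a=a, b=b: ((a + b * k) % P) % nb


# Well-spread keys succeed with k mod |buckets|; keys that are all multiples
# of |buckets| overflow one bucket and force at least one restart.
h, nb, counts, restarts = choose_bucket_hash(list(range(100_000)))
h2, nb2, counts2, restarts2 = choose_bucket_hash([3 * i for i in range(1000)])
```

Under this reading, 5 million items give num_buckets(5_000_000) = 12,208 buckets.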

16
Phase 1: Partitioning
1. Allocate buckets
2. Compute each item's bucket using h(k)
3. Determine bucket sizes with an atomic add
– This also orders the items within the same bucket
4. Reserve a contiguous chunk for each bucket (a prefix sum over the bucket sizes gives the bucket offsets)
5. Move items
[Figure: keys A to O and their positions, bucket sizes, bucket offsets, and the packed bucket data]
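A sequential Python stand-in for steps 2 to 5 of the partitioning slide, assuming one loop iteration per GPU thread: a running per-bucket counter plays the role of the atomic add (its old value is the item's rank within the bucket), and an exclusive prefix sum over the bucket sizes gives each bucket's offset. The function name pack_buckets is hypothetical.

```python
from itertools import accumulate


def pack_buckets(keys, values, h, nb):
    """Scatter (key, value) items so each bucket occupies a contiguous chunk."""
    sizes = [0] * nb
    rank = []
    for k in keys:                                  # 2. compute each item's bucket
        b = h(k)
        rank.append(sizes[b])                       # 3. "atomic add": old count = rank in bucket
        sizes[b] += 1
    offsets = [0] + list(accumulate(sizes))[:-1]    # 4. exclusive prefix sum
    packed = [None] * len(keys)
    for i, k in enumerate(keys):                    # 5. move items into their chunks
        packed[offsets[h(k)] + rank[i]] = (k, values[i])
    return packed, sizes, offsets


packed, sizes, offsets = pack_buckets(
    [5, 2, 9, 4, 7, 1], [50, 20, 90, 40, 70, 10], h=lambda k: k % 3, nb=3)
```

On a GPU the atomic adds complete in an arbitrary order, so the order of items inside a bucket can differ between runs; only the grouping into contiguous chunks is fixed.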

17
Phase 2: Cuckoo hashing
One thread block per bucket
Performed in shared memory to reduce overhead
Three sub-tables for better occupancy
[Figure: bucket data in global memory loaded into a single bucket's cuckoo tables T1, T2, T3 in shared memory]

18
Phase 2: Cuckoo hashing
Generate hash functions
Parallelized construction:
1. Simultaneously insert
2. Synchronize the block
3. If evicted, repeat for the other sub-tables
Fail after 25 iterations through all 3 sub-tables
[Figure: a single bucket's cuckoo tables T1, T2, T3 in shared memory, with hash functions g1(k), g2(k), g3(k)]
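The parallel construction loop on this slide can be simulated sequentially to show how simultaneous writes and post-synchronization checks interact. In this sketch each key plays the role of one thread, and "last write wins" stands in for whichever of several simultaneous writes the hardware happens to keep. The hash functions in demo_funcs are hypothetical and deliberately simple; they are not the functions used in the paper.

```python
def parallel_cuckoo(keys, funcs, slots_per_table, max_iterations=25):
    """Simulate one thread block building a bucket's three cuckoo sub-tables."""
    tables = [[None] * slots_per_table for _ in range(3)]
    where = {k: None for k in keys}   # sub-table currently holding each key, or None
    nxt = {k: 0 for k in keys}        # sub-table a homeless key tries next
    for _ in range(max_iterations):
        for t in range(3):
            # 1. Simultaneously insert: every homeless key whose turn it is writes its slot.
            for k in keys:
                if where[k] is None and nxt[k] == t:
                    tables[t][funcs[t](k)] = k
                    where[k] = t
            # 2. Synchronize the block, then each key checks whether it survived.
            for k in keys:
                t_k = where[k]
                if t_k is not None and tables[t_k][funcs[t_k](k)] != k:
                    where[k] = None
                    nxt[k] = (t_k + 1) % 3    # 3. evicted: repeat with the next sub-table
        if all(w is not None for w in where.values()):
            return tables
    return None   # failure: generate new hash functions and try again


# Hypothetical, deliberately simple hash functions for 5-slot sub-tables.
demo_funcs = [lambda k: k % 5, lambda k: (k // 5) % 5, lambda k: (k // 25) % 5]
tables = parallel_cuckoo([0, 1, 2, 5, 6, 10], demo_funcs, slots_per_table=5)
```

For these six keys, one pass over the three sub-tables suffices. Four keys that hash to slot 0 in every sub-table, such as 0, 125, 250, and 375, can never all fit, so the function gives up after 25 iterations and returns None.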

19
Phase 2: Cuckoo hashing
In trials, an average of 5.5 iterations
– Nearly all converge with the first set of functions
– Succeeded with < 2 new sets of functions
[Figure: a single bucket's cuckoo tables T1, T2, T3 in shared memory, filled using g1(k), g2(k), g3(k)]

20
Phase 2: Cuckoo hashing
At the end of the phase, save out to global memory:
1. Cuckoo hash functions
2. Rearranged sub-tables
[Figure: buckets' tables in shared memory written out to interleaved cuckoo tables and hash functions in global memory]

21
Hash retrievals
Look in the 3 possible locations:
1. Compute the bucket
2. Retrieve its hash functions
3. Check each slot, stopping early if the item is found
[Figure: a query for key k returns value v_k]
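The three retrieval steps can be sketched in Python as follows. The global layout here, with entries indexed as [bucket][sub-table][slot] in one flat array holding (key, value) pairs, is an assumption made for illustration and may differ from the interleaved layout the slides describe. The small table is built by hand with hypothetical hash functions.

```python
def retrieve(key, bucket_hash, bucket_funcs, table, slots_per_subtable):
    b = bucket_hash(key)                          # 1. compute the bucket
    funcs = bucket_funcs[b]                       # 2. retrieve that bucket's hash functions
    base = b * 3 * slots_per_subtable             # assumed layout: [bucket][sub-table][slot]
    for t, g in enumerate(funcs):                 # 3. check each slot ...
        entry = table[base + t * slots_per_subtable + g(key)]
        if entry is not None and entry[0] == key:
            return entry[1]                       # ... stopping early if found
    return None                                   # key not present


# Tiny hand-built example: 2 buckets, 3 sub-tables of S = 2 slots each.
S = 2
bucket_hash = lambda k: k % 2
g = [lambda k: (k // 2) % 2, lambda k: (k // 4) % 2, lambda k: (k // 8) % 2]
bucket_funcs = [g, g]                             # same functions in both buckets, for brevity
table = [None] * (2 * 3 * S)
table[0 * 3 * S + 0 * S + g[0](6)] = (6, "a")     # key 6: bucket 0, sub-table 0
table[1 * 3 * S + 2 * S + g[2](3)] = (3, "b")     # key 3: bucket 1, sub-table 2
```

Key 6 is found on the first probe. Key 3 takes all three probes. An absent key such as 4 also costs three probes before returning None.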

22
Pipeline: Lucy dataset
[Figure: voxelized Lucy input voxels and item distribution; Phase 1 produces the rearranged data; Phase 2 produces the cuckoo hash tables and cuckoo sub-tables]

23
Timing results: Lucy dataset
Timed on an EVGA GTX 280 SSC
All items retrieved in shuffled order, in parallel

24
Timing results: randomized data

25
Timing results: step breakdown

26
Hash variations
[Figure: keys and values; voxels and points]

27
Multi-value hash
[Figure: voxels and points stored in a multi-value hash]

28
Compacting hash
[Figure: voxels stored in a compacting hash, with average normal, average color, and number of points]

29
Spatial hashing

30
Geometric hashing

31
Geometric hashing

32
Trade-offs
Space utilization vs. construction & retrieval speed:
1. Bucket size & occupancy
2. Number of sub-tables
3. Cuckoo table sizes
Ordered vs. random retrieval
[Figure: hash table vs. sorted array, compared on space utilization, construction speed, and retrieval speed]

33
Summary
Introduced a method for building large hash tables in real time using CUDA
– O(1) random access to sparse data
– Balances space usage, construction speed, and retrieval speed
Generalized the construction to handle non-unique keys
Demonstrated use with spatial and geometric hashing
Future work
– Decrease the restart penalty for bucket distribution
– Reduce atomic usage to speed up construction

34
Acknowledgments
Thanks to our funding agencies:
– National Science Foundation (awards , , , and )
– SciDAC Institute for Ultrascale Visualization
Companies:
– NVIDIA for equipment donations & Shubho's Graduate Fellowship
– Cisco and Google for research grants
Data sources:
– Daniel Vlasic
– The Stanford 3D Scanning Repository
– The CAVIAR project
– Matthew Harding (http://www.wherethehellismatt.com/)
2006 Matt Harding Dancing Video is provided courtesy of Cadbury Adams USA LLC. ©2006 Cadbury Adams USA LLC. All Rights Reserved. Stride is a registered trademark of Cadbury Adams USA LLC.
Timothy Lee for his help in the early stages of the project

35

36
Background
[Figure: threads and blocks; shared memory and global memory]
WHILE not found { check hash table }

37
Traditional hashing
Collision handling:
– Insertion needs access to the entire table
– Parallel insertions need locks to ensure writes succeed
– Extra space for list info
– Variable retrieval times
[Figure: probing and chaining with h(k)]

38
FKS perfect hashing (Fredman, Komlós, and Szemerédi [1984])
Can find a perfect hash function given $n^2$ slots
Cut space usage to O(n) by using a two-level scheme:
1. Split into n ideally tiny buckets
2. Build a perfect hash for each bucket b with $|b|^2$ slots
Needs up to 6n space for the basic version
[Figure: keys hashed by h(k) into buckets, each with its own perfect hash function g_i(k)]
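A compact Python sketch of the two-level FKS scheme, under stated assumptions: integer keys below the prime 2^32 - 5, the hash family ((a·k + b) mod p) mod m as an illustrative choice, and a cutoff of 4n on the sum of squared bucket sizes, a common textbook choice for keeping second-level space linear. This cutoff is not the exact accounting behind the 6n figure on the slide.

```python
import random

P = 4294967291  # the prime 2**32 - 5; assumes integer keys below P


def random_hash(m, rng):
    # Carter-Wegman style ((a*k + b) mod P) mod m; an illustrative hash family.
    a, b = rng.randrange(1, P), rng.randrange(P)
    return lambda k: ((a * k + b) % P) % m


def build_fks(keys, seed=0):
    rng = random.Random(seed)
    n = len(keys)
    # Level 1: n buckets; retry until the squared bucket sizes sum to O(n).
    while True:
        h = random_hash(n, rng)
        buckets = [[] for _ in range(n)]
        for k in keys:
            buckets[h(k)].append(k)
        if sum(len(b) ** 2 for b in buckets) <= 4 * n:
            break
    # Level 2: a collision-free hash per bucket b, using |b|^2 slots.
    second = []
    for b in buckets:
        m = len(b) ** 2
        while True:
            g = random_hash(max(m, 1), rng)
            slots = [None] * m
            for k in b:
                if slots[g(k)] is not None:
                    break                 # collision: try another function
                slots[g(k)] = k
            else:
                break                     # every key landed in its own slot
        second.append((g, slots))
    return h, second


def fks_contains(k, h, second):
    g, slots = second[h(k)]
    return bool(slots) and slots[g(k)] == k


keys = list(range(0, 1000, 7))
h, second = build_fks(keys)
```

Each lookup costs two hash evaluations and one probe. The retry loops terminate quickly in expectation, because each attempt succeeds with probability at least one half for a universal hash family.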

39
2-way chaining (Azar et al. [1994])
Generalized chaining: each key is added to the shorter of its 2 lists
Expected access is O(1); with high probability, the longest list is still O(lg lg n)
Shares issues with regular chaining:
– Hard to ensure that the chosen list remains the shorter one between the size check and the insertion
[Figure: keys A to G hashed by h1(k) and h2(k) into lists]
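A sequential Python sketch of 2-way chaining, with hypothetical hash functions supplied by the caller. In this version the size check and the append happen back to back; a comment marks where a parallel version would race, which is the issue raised on the slide.

```python
def two_way_chaining(keys, h1, h2, n_lists):
    lists = [[] for _ in range(n_lists)]
    for k in keys:
        a, b = h1(k) % n_lists, h2(k) % n_lists
        # In parallel, another thread could grow the chosen list between
        # this size comparison and the append below.
        target = a if len(lists[a]) <= len(lists[b]) else b
        lists[target].append(k)
    return lists


def contains(lists, k, h1, h2):
    n = len(lists)
    return k in lists[h1(k) % n] or k in lists[h2(k) % n]


h1 = lambda k: k          # hypothetical: sends every key below to list 0
h2 = lambda k: k // 4     # hypothetical second choice
lists = two_way_chaining([0, 4, 8, 12], h1, h2, 4)
```

With h1 alone, all four keys would share list 0. The second choice spreads them out, one key per list.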

40
Hash variations: method
Phase 1: Distribution
– Mostly the same as for the basic hash, except that copies are allowed
– Items with the same key are put in the same bucket
– Allow > 512 items per bucket to account for multiples
– The chance of overflow is smaller, since we expect < 512 unique keys per bucket
[Figure: Phase 1 groups the keys AGXFAXYAFGYMMAC into the bucket data AAFAAFGMCMGYXXY]

41
Hash variations: method
Phase 2: Cuckoo hashing
– A single thread may manage multiple items if > 512 items fell into the bucket
– Evict only if items with different keys collide
– Cuckoo tables contain only unique keys
[Figure: bucket data AAFAAFGMCMGYXXY inserted into the cuckoo tables]

42
Compacting hash
Compacting, Phase 2:
– Valid keys in the cuckoo table are given a value of 1; invalid entries get 0
– A prefix sum on these values gives unique IDs to the valid cuckoo table keys
– Compact the cuckoo table keys to get the list of unique keys
[Figure: bucket data AAFAAFGBCBGYXXY → cuckoo tables → prefix sum → unique keys AFGBCXY]
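The flag, prefix-sum, and compact steps for the compacting hash can be sketched in Python. Here a cuckoo table is modeled as a flat list with None marking empty slots, and the function name compact_keys is hypothetical. Because duplicates never evict one another, each valid slot holds a distinct key.

```python
from itertools import accumulate


def compact_keys(cuckoo_table):
    flags = [0 if slot is None else 1 for slot in cuckoo_table]   # valid: 1, empty: 0
    ids = [0] + list(accumulate(flags))[:-1]                       # exclusive prefix sum
    unique = [None] * sum(flags)
    for slot, flag, i in zip(cuckoo_table, flags, ids):
        if flag:
            unique[i] = slot                                        # scatter to its unique ID
    return ids, unique


ids, unique = compact_keys(["A", None, "F", None, None, "G", "B", None])
```

The ID assigned to each valid slot is also a dense index that per-key data could use, such as the averaged normals and colors on the compacting hash slide.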

43
Multi-value hash
Multi-value hash, Phase 2:
– Rearrange the bucket contents so that all values for a key are contiguous
– The process is similar to Phase 1, but done within the bucket
Retrieval shows where a key's values start and how many values there are
[Figure: bucket data with values A1 A2 A3, F1 F2 F3, B1 B2, C1 C2, S1 S2, M1, R1, T1; after Phase 2 the cuckoo tables store a location and a count for each key A, F, C, B, S, M, R, T]
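A sequential sketch of the within-bucket regrouping for the multi-value hash, mirroring the count, prefix-sum, and scatter pattern from Phase 1. The dictionary key_index is a hypothetical stand-in for each unique key's slot in the bucket's cuckoo tables, and the returned (start, count) pairs correspond to the location and count that retrieval reports.

```python
from itertools import accumulate


def group_values(bucket_items):
    """bucket_items: (key, value) pairs that landed in one bucket."""
    key_index = {}                                  # stand-in for each key's cuckoo slot
    for k, _ in bucket_items:
        key_index.setdefault(k, len(key_index))
    counts = [0] * len(key_index)
    rank = []
    for k, _ in bucket_items:                       # "atomic add" per key: rank within key
        rank.append(counts[key_index[k]])
        counts[key_index[k]] += 1
    starts = [0] + list(accumulate(counts))[:-1]    # exclusive prefix sum
    values = [None] * len(bucket_items)
    for (k, v), r in zip(bucket_items, rank):
        values[starts[key_index[k]] + r] = v        # values for a key become contiguous
    index = {k: (starts[i], counts[i]) for k, i in key_index.items()}
    return values, index


def lookup_all(key, values, index):
    start, count = index.get(key, (0, 0))
    return values[start:start + count]


values, index = group_values([("A", 1), ("F", 2), ("A", 3), ("C", 4), ("F", 5), ("A", 6)])
```

A lookup for "A" returns the slice [1, 3, 6], and a key that is not present returns an empty slice.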

44
Timing: multi-value hash

45
Timing: compacting hash

46
Timing: non-existent queries
