Published by Sheila Atherley. Modified over 2 years ago.

1
A Hardware Processing Unit For Point Sets
S. Heinzle, G. Guennebaud, M. Botsch, M. Gross
Graphics Hardware 2008

2
Motivation
Point-based graphics established
Powerful algorithms
– Representation
– Processing
– Manipulation
– Rendering
Decomposition
– Get neighborhood
– Operate on neighbors

3
Motivation
GPUs not suited for getting neighborhood
– SIMD
– Incoherent branching
– Dynamic data structures slow
– Recursive calls not supported
CPUs
– Small number of FPUs
– Inflexible memory caches
(Images courtesy of NVIDIA and Intel)

4
Contributions
Hardware architecture for point sets
– Neighbor search module
– Novel advanced caching mechanism
– Reconfigurable processing module
– Programmability using FPGA compiler
FPGA prototype and measurements
Small & Lean
Integration into multi-core CPU/GPU possible

5
Outline
Related Work
Spatial Searching and Caching
Architecture and Prototype
Results
Conclusion

6
Related Work
Kd-Tree [Bentley 75]
kNN on GPUs [Ma and McCool 02]
Kd-Tree Hardware [Woop et al. 05] [Woop et al. 06]
Kd-Tree on GPUs [Popov et al. 07]

7
Related Work
Adaptive SPH Fluid Simulation [Adams et al. ’07]
Linear Moving Least Squares [Adamson and Alexa ’04]
Algebraic Moving Least Squares [Guennebaud and Gross ’07]

8
Linear Moving Least Squares
Implicit surface defined by a set of points

9
Linear Moving Least Squares
Implicit surface defined by a set of points
Figure label: x

10
Linear Moving Least Squares
Figure labels: x, p_i, n_i

11
Linear Moving Least Squares
Iterative projections onto plane
Figure label: x

12
Linear Moving Least Squares
Iterative projections onto plane
Figure labels: x, x’

13
Linear Moving Least Squares
Iterative projections onto plane
Figure labels: x, x’’

14
Linear Moving Least Squares
Iterative projections onto plane
Figure labels: x, x’’’

15
Linear Moving Least Squares
Surface defined by points projecting onto themselves
Figure label: x
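
The iterative projection on these slides can be sketched as follows. This is a minimal illustration, assuming the common formulation in which the local plane passes through the weighted average of the points p_i and takes the normalized weighted average of the normals n_i as its normal. The Gaussian weight, the bandwidth `h`, and the name `mls_project` are assumptions for illustration and do not come from the slides.

```python
import math

def mls_project(x, points, normals, h=1.0, iterations=10, tol=1e-9):
    """Iteratively project x onto a locally fitted plane (illustrative sketch)."""
    for _ in range(iterations):
        # Gaussian weights centred on the current position (assumed weight function).
        w = [math.exp(-sum((xc - pc) ** 2 for xc, pc in zip(x, p)) / (h * h))
             for p in points]
        wsum = sum(w)
        # Weighted average of the points: a point on the local plane.
        a = tuple(sum(wi * p[d] for wi, p in zip(w, points)) / wsum for d in range(3))
        # Normalized weighted average of the normals: the plane normal.
        n = tuple(sum(wi * nrm[d] for wi, nrm in zip(w, normals)) for d in range(3))
        length = math.sqrt(sum(c * c for c in n))
        n = tuple(c / length for c in n)
        # Signed distance from x to the plane, then one projection step.
        f = sum(nc * (xc - ac) for nc, xc, ac in zip(n, x, a))
        x = tuple(xc - f * nc for xc, nc in zip(x, n))
        if abs(f) < tol:
            break  # x now (approximately) projects onto itself
    return x
```

A point that no longer moves under this step lies on the surface, which matches the "points projecting onto themselves" definition above.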

16
Outline
Related Work
Spatial Searching and Caching
Architecture & Prototype
Results
Conclusion

17
Spatial Search
Spatial search: kNN and NN
– Common in most point operations
– Based on kd-tree
Example NN: (figure on original slide)
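
Since the figure for the NN example is not preserved, here is a small software sketch of a nearest-neighbor search on a kd-tree. The median split, the `leaf_size` parameter, and the dictionary layout are illustrative assumptions; the slides do not describe how the tree in the hardware is built.

```python
import math

def build_kdtree(points, depth=0, leaf_size=4):
    """Recursively split the points at the median, cycling through the axes."""
    if len(points) <= leaf_size:
        return {"leaf": True, "points": list(points)}
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {
        "leaf": False,
        "axis": axis,
        "split": pts[mid][axis],
        "left": build_kdtree(pts[:mid], depth + 1, leaf_size),
        "right": build_kdtree(pts[mid:], depth + 1, leaf_size),
    }

def nearest(node, query, best=(math.inf, None)):
    """Return (distance, point) for the nearest neighbor of query."""
    if node["leaf"]:
        for p in node["points"]:
            d = math.dist(query, p)
            if d < best[0]:
                best = (d, p)
        return best
    diff = query[node["axis"]] - node["split"]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best hit.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best
```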

18
Spatial Search
kNN search similar to NN search:
– Start with infinite radius
– Sort leaf points into priority queue
– Shrink radius with every point sorted
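
The three steps above can be sketched in software as follows: the search radius starts at infinity, leaf points go into a priority queue of size k, and the radius shrinks to the current k-th best distance. The explicit traversal stack, the tuple-based tree layout, and the function names are assumptions for illustration (with k >= 1), not a description of the hardware.

```python
import heapq
import math

def build(points, depth=0, leaf_size=4):
    """Median-split kd-tree stored as nested tuples (illustrative layout)."""
    if len(points) <= leaf_size:
        return ("leaf", list(points))
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return ("node", axis, pts[mid][axis],
            build(pts[:mid], depth + 1, leaf_size),
            build(pts[mid:], depth + 1, leaf_size))

def knn(root, query, k):
    """Return the k nearest (distance, point) pairs, closest first (k >= 1)."""
    heap = []              # max-heap via negated distances: (-distance, point)
    radius = math.inf      # start with an infinite search radius
    stack = [(root, 0.0)]  # (subtree, lower bound on its distance to the query)
    while stack:
        node, bound = stack.pop()
        if bound >= radius:
            continue       # nothing in this subtree can improve the result
        if node[0] == "leaf":
            for p in node[1]:
                d = math.dist(query, p)
                if len(heap) < k:
                    heapq.heappush(heap, (-d, p))
                elif d < radius:
                    heapq.heapreplace(heap, (-d, p))
                if len(heap) == k:
                    radius = -heap[0][0]  # shrink to the k-th best distance
            continue
        _, axis, split, left, right = node
        diff = query[axis] - split
        near, far = (left, right) if diff < 0 else (right, left)
        stack.append((far, max(bound, abs(diff))))  # popped second
        stack.append((near, bound))                 # popped first
    return sorted((-nd, p) for nd, p in heap)
```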

19
Coherent Neighbor Cache (εNN)
Find neighbors in slightly bigger radius
Re-use result for spatially close query
Re-use if: (condition given as a formula on the original slide)
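
The re-use condition on this slide is not preserved in the transcript. As a sketch under stated assumptions, the triangle inequality gives one sufficient condition: if the cache holds every point within radius R of an earlier query c, a new query q with radius r can be answered from the cache whenever |q - c| + r <= R. The class name, the `grow` factor, and the brute-force search standing in for the kd-tree are all hypothetical.

```python
import math

class RangeCache:
    """Caches one enlarged range query; brute force stands in for the kd-tree."""

    def __init__(self, points, grow=0.2):
        self.points = points
        self.grow = grow            # hypothetical radius enlargement factor
        self.center = None
        self.cached_radius = 0.0
        self.cached = []
        self.hits = 0

    def query(self, q, r):
        """Return all points within distance r of q."""
        covered = (self.center is not None
                   and math.dist(q, self.center) + r <= self.cached_radius)
        if covered:
            self.hits += 1          # the cached ball contains the query ball
        else:
            # Miss: search a slightly bigger radius and remember the result.
            self.center = q
            self.cached_radius = r * (1.0 + self.grow)
            self.cached = [p for p in self.points
                           if math.dist(q, p) <= self.cached_radius]
        # Filter the (small) cached set down to the requested radius.
        return [p for p in self.cached if math.dist(q, p) <= r]
```

A larger `grow` yields more cache hits at the cost of more candidates to filter per query.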

20
Coherent Neighbor Cache (kNN, exact)
Find (k+1) neighbors
Re-use result for spatially close query
Re-use if: (condition given as a formula on the original slide)
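
The slide's re-use condition is not preserved here. One sufficient test that follows from the triangle inequality, shown as an assumption rather than as the authors' formula: let d_k and d_(k+1) be the distances of the k-th and (k+1)-th neighbors from the cached query, and e the distance between the old and new query. If 2e < d_(k+1) - d_k, the cached k neighbors are exactly the k nearest neighbors of the new query. The function names and the brute-force search are illustrative.

```python
import math

def k_plus_one_nearest(points, q, k):
    """Brute-force stand-in for the kd-tree: the k+1 nearest as (distance, point)."""
    return sorted((math.dist(q, p), p) for p in points)[:k + 1]

def can_reuse(cached, q_old, q_new, k):
    """True if the k nearest neighbors of q_old are provably those of q_new.

    Every cached neighbor is at most d_k + e from q_new and every other point
    is at least d_(k+1) - e away, so 2e < d_(k+1) - d_k is sufficient.
    Assumes cached holds k+1 entries sorted by distance.
    """
    e = math.dist(q_old, q_new)
    d_k = cached[k - 1][0]
    d_k1 = cached[k][0]
    return 2.0 * e < d_k1 - d_k
```

The test is conservative: it can reject a re-use that would have been valid, but it never accepts an invalid one.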

21
Coherent Neighbor Cache (kNN, approximation)
Approximation error
– Enlarge radius
Re-use if: (condition given as a formula on the original slide)

22
Outline
Related Work
Spatial Searching and Caching
Architecture & Prototype
Results
Conclusion

23
The Architecture
(Block diagram on original slide; surviving label: Host)

24
Coherent Neighbor Cache
Eight cached neighborhoods
Problem: parallel queries in kd-tree module
Interleave spatially similar queries

25
Kd-Tree Traversal

26
Kd-tree structure on chip
16 threads
Pipelining and multi-threading
Node Recurse

27
Stacks
16 stacks
Parallel read/write
Bounded in depth
6 bytes per thread per recursion
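
As a rough worked example of what these numbers imply for on-chip storage: assuming one stack per thread and a depth bound D, the total is 16 × D × 6 bytes. The slide does not give the depth bound, so the value D = 32 below is purely hypothetical.

```latex
\text{stack storage} = 16 \times D \times 6\ \text{bytes} = 96\,D\ \text{bytes},
\qquad D = 32 \;\Rightarrow\; 3072\ \text{bytes} = 3\ \text{KiB}
```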

28
Leaf
16 parallel priority queues (1-cycle ops)
Queues store pointers and distances
Bandwidth bottleneck
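
A software sketch of one such queue, holding (distance, index) pairs where the index plays the role of the pointer. The class and method names are hypothetical. The insertion here shifts entries sequentially; presumably the hardware compares entries in parallel to reach one cycle per operation, which this sketch does not model.

```python
class BoundedQueue:
    """Keeps the k smallest (distance, index) pairs in ascending order."""

    def __init__(self, k):
        self.k = k
        self.items = []

    def radius(self):
        """Current search radius: the k-th distance, or infinity until full."""
        return self.items[-1][0] if len(self.items) == self.k else float("inf")

    def insert(self, distance, index):
        """Insert a candidate; return False if it lies outside the radius."""
        if distance >= self.radius():
            return False
        # Find the slot that keeps the entries sorted by distance.
        pos = len(self.items)
        while pos > 0 and self.items[pos - 1][0] > distance:
            pos -= 1
        self.items.insert(pos, (distance, index))
        if len(self.items) > self.k:
            self.items.pop()  # drop the farthest entry
        return True
```

Storing an index rather than the point itself keeps each entry small; the point data is fetched only for the final neighbors.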

29
Processing Module
Multithreaded quad-port bank of 16 registers
128 threads
Programmability using FPGA technology

30
Further Data
Implemented on two FPGAs
– 64-bit DDR DRAM
– Interconnection: no overhead
Resource usage (registers and LUTs)
– Virtex 2 Pro 100 (kNN): 26% registers, 38% LUTs
– Virtex 2 Pro 70 (MLS): 47% registers, 52% LUTs
Clock frequency: 75 MHz

31
Outline
Related Work
Spatial Searching and Caching
Architecture & Prototype
Results
Conclusion

32
Applications
Tested on various applications
PCI interface of prototype slow
[Weyrich et al. 04] [Adams et al. 07]

33
Results kNN
(Plots on original slide; axes: Number of Neighbors, Number of queries)
Clock frequencies shown: 75 MHz, 1200 MHz, 2200 MHz
Plot annotations, in slide order:
– CUDA: x4, CPU: x1.5, FPGA: x1
– CUDA: x2.4, CPU: x1.4, FPGA: x1, CUDA w/o sort: x4.0
– CUDA: x1.6, CPU: x1.1, FPGA: x1, CUDA w/o sort: x3.1
ASIC estimate, 500 MHz: x6.6

34
Results kNN
(Plots on original slide; axes: Number of Neighbors, Number of queries)
Clock frequencies shown: 75 MHz, 1200 MHz, 2200 MHz
Plot annotations, in slide order:
– CUDA: x4, CPU: x1.5, FPGA: x1
– CUDA: x2.4, CPU: x1.4, FPGA: x1, CUDA w/o sort: x4.0
– CUDA: x1.6, CPU: x1.1, FPGA: x1, CUDA w/o sort: x3.1
ASIC estimate, 500 MHz: x6.6
Small hardware footprint
FPGA slightly slower
Realistic clock frequency
Prototype faster than CPU/GPU
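
The slide does not say how the ASIC estimate was obtained. As an observation only, the listed factor is close to what plain clock scaling from the 75 MHz prototype to 500 MHz would give, assuming identical work per cycle:

```latex
\frac{500\ \text{MHz}}{75\ \text{MHz}} \approx 6.67
```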

35
Results MLS
(Plots on original slide; axes: Number of Neighbors, Number of queries)
Clock frequencies shown: 75 MHz, 1200 MHz, 2200 MHz
Plot annotations: FPGA: x1, MLS CPU: x0.4, MLS CUDA: x3.8
FPGA faster than CPU
kNN bottleneck
– FPGA
– GPU

36
Coherent Neighbor Cache
(Plot on original slide; axes: Level of coherence, Number of queries)
Series: CPU, [symbol lost] = 0.1; FPGA, exact; FPGA, [symbol lost] = 0.1

37
Results
Approximation Error (MLS projection)
(Plot on original slide; labels: approximation, MLS Error, no approx.)

38
Results
Approximation Error (MLS projection)
(Plot on original slide; labels: Cache hits, approximation)

39
Approximation Error (visual)
(Images on original slide)

40
Approximation Error (visual)
Coherent Neighbor Cache:
Not optimal for exact queries
Approximate queries
– Can be tolerated in most cases
– Greatly increases performance
– Even for small approximations

41
Outline
Related Work
Spatial Searching and Caching
Architecture & Prototype
Results
Conclusion

42
Conclusion
Novel hardware architecture for
– Nearest-neighbor searches
– Generic meshless processing operators
Cache exploiting spatial coherence
Good performance considering resources
Possible GPU integration

43
Future Work
Programmable data structure
– Support different data structures
– Programmability in data structure
– Construction on-chip
‘Real’ programmability in point processing module

44
A Hardware Processing Unit For Point Sets
S. Heinzle, G. Guennebaud, M. Botsch, M. Gross
Graphics Hardware 2008
