A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.

1 A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008

2 Motivation
Point-based graphics established
Powerful algorithms
–Representation
–Processing
–Manipulation
–Rendering
Decomposition
–Get neighborhood
–Operate on neighbors

3 Motivation
GPUs not suited for getting neighborhood
–SIMD
–Incoherent branching
–Dynamic data structures slow
–Recursive calls not supported
CPUs
–Small number of FPUs
–Inflexible memory caches
(Images courtesy of NVIDIA and Intel)

4 Contributions
Hardware architecture for point sets
–Neighbor search module
–Novel advanced caching mechanism
–Reconfigurable processing module
–Programmability using FPGA compiler
FPGA prototype and measurements
Small & Lean → Integration into multi-core CPU/GPU possible

5 Outline
Related Work
Spatial Searching and Caching
Architecture and Prototype
Results
Conclusion

6 Related Work
Kd-Tree [Bentley 75]
kNN on GPUs [Ma and McCool 02]
Kd-Tree Hardware [Woop et al. 05] [Woop et al. 06]
Kd-Tree on GPUs [Popov et al. 07]

7 Related Work
Adaptive SPH Fluid Simulation [Adams et al. ’07]
Linear Moving Least Squares [Adamson and Alexa ’04]
Algebraic Moving Least Squares [Guennebaud and Gross ’07]

8 Linear Moving Least Squares
Implicit surface defined by a set of points

9 Linear Moving Least Squares
Implicit surface defined by a set of points
[Figure labels: x]

10 Linear Moving Least Squares
[Figure labels: x, p_i, n_i]

11 Linear Moving Least Squares
Iterative projections onto plane
[Figure labels: x]

12 Linear Moving Least Squares
Iterative projections onto plane
[Figure labels: x, x′]

13 Linear Moving Least Squares
Iterative projections onto plane
[Figure labels: x, x″]

14 Linear Moving Least Squares
Iterative projections onto plane
[Figure labels: x, x‴]

15 Linear Moving Least Squares
Surface defined by points projecting onto themselves
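
The iterative projection on the preceding slides can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Gaussian weight, the bandwidth `h`, the iteration limit, and the function name `mls_project` are all assumptions, and the loop runs over every input point rather than over a neighborhood returned by a neighbor search.

```python
import math

def mls_project(x, points, normals, h=1.0, iterations=10, tol=1e-9):
    """Project x onto the surface implied by oriented points (illustrative sketch)."""
    x = list(x)
    for _ in range(iterations):
        wsum = 0.0
        a = [0.0, 0.0, 0.0]  # weighted average of the positions p_i
        n = [0.0, 0.0, 0.0]  # weighted average of the normals n_i
        for p, pn in zip(points, normals):
            d2 = sum((xc - pc) ** 2 for xc, pc in zip(x, p))
            w = math.exp(-d2 / (h * h))  # Gaussian weight (an assumption)
            wsum += w
            for c in range(3):
                a[c] += w * p[c]
                n[c] += w * pn[c]
        a = [ac / wsum for ac in a]
        length = math.sqrt(sum(nc * nc for nc in n))
        n = [nc / length for nc in n]
        # Signed distance from x to the plane through a with normal n.
        f = sum(nc * (xc - ac) for nc, xc, ac in zip(n, x, a))
        # Step onto that plane, then repeat with the plane fitted at the new x.
        x = [xc - f * nc for xc, nc in zip(x, n)]
        if abs(f) < tol:
            break
    return x
```

A point that is returned unchanged by this loop lies on the surface, which matches the definition on slide 15.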

16 Outline
Related Work
Spatial Searching and Caching
Architecture & Prototype
Results
Conclusion

17 Spatial Search
Spatial search: kNN and εNN
–Common in most point operations
–Based on kd-tree
Example: εNN [figure]
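
As a rough software illustration of a fixed-radius query on a kd-tree, the sketch below builds a small tree and collects all points within radius `r` of a query `q`. The dictionary node layout, the median split, and the names `build` and `range_query` are assumptions made for this example; the hardware traversal described later in the talk is organized differently.

```python
def build(points, depth=0, leaf_size=2):
    """Build a kd-tree from a list of equal-length tuples (median split)."""
    if len(points) <= max(leaf_size, 1):
        return {"leaf": list(points)}
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {
        "axis": axis,
        "split": pts[mid][axis],
        "left": build(pts[:mid], depth + 1, leaf_size),   # coordinates <= split
        "right": build(pts[mid:], depth + 1, leaf_size),  # coordinates >= split
    }

def range_query(node, q, r, out=None):
    """Return all points within distance r of q."""
    if out is None:
        out = []
    if "leaf" in node:
        for p in node["leaf"]:
            if sum((a - b) ** 2 for a, b in zip(p, q)) <= r * r:
                out.append(p)
        return out
    delta = q[node["axis"]] - node["split"]
    near, far = (node["left"], node["right"]) if delta < 0 else (node["right"], node["left"])
    range_query(near, q, r, out)
    # The far side can only hold results if the splitting plane is within r of q.
    if abs(delta) <= r:
        range_query(far, q, r, out)
    return out
```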

18 Spatial Search
kNN search similar to εNN search:
–Start with infinite radius
–Sort leaf points into priority queue
–Shrink radius with every point sorted
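
The three steps above can be sketched in software with a bounded priority queue. This is an illustrative sketch only: the tree layout, the use of Python's `heapq` as a max-heap over negated squared distances, and the names `build`, `knn`, and `k_nearest` are assumptions, and `k` is assumed to be at least 1.

```python
import heapq
import math

def build(points, depth=0, leaf_size=2):
    """Build a kd-tree from a list of equal-length tuples (median split)."""
    if len(points) <= max(leaf_size, 1):
        return {"leaf": list(points)}
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {"axis": axis, "split": pts[mid][axis],
            "left": build(pts[:mid], depth + 1, leaf_size),
            "right": build(pts[mid:], depth + 1, leaf_size)}

def knn(node, q, k, heap=None):
    """Fill heap with (-squared_distance, point); heap[0] is the current worst."""
    if heap is None:
        heap = []
    if "leaf" in node:
        for p in node["leaf"]:
            d2 = sum((a - b) ** 2 for a, b in zip(p, q))
            if len(heap) < k:
                heapq.heappush(heap, (-d2, p))      # queue not full yet
            elif d2 < -heap[0][0]:
                heapq.heapreplace(heap, (-d2, p))   # replaces the worst, shrinking the radius
        return heap
    delta = q[node["axis"]] - node["split"]
    near, far = (node["left"], node["right"]) if delta < 0 else (node["right"], node["left"])
    knn(near, q, k, heap)
    # The search radius stays infinite until k candidates have been found.
    radius2 = -heap[0][0] if len(heap) == k else math.inf
    if delta * delta <= radius2:
        knn(far, q, k, heap)
    return heap

def k_nearest(tree, q, k):
    """Return the k nearest points to q, closest first."""
    return [p for _, p in sorted((-nd2, p) for nd2, p in knn(tree, q, k))]
```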

19 Coherent Neighbor Cache (εNN)
Find neighbors in slightly bigger radius
Re-use result for spatially close query
Re-use if: [formula]
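
The re-use formula from the slide is not captured in this transcript. The sketch below uses the condition that follows from the triangle inequality, which may differ in form from the slide's: if the cached search around `center` used radius `(1 + alpha) * r`, then any query within `alpha * r` of `center` finds all of its radius-`r` neighbors in the cache. The enlargement factor `alpha`, the class name `RangeCache`, the single-entry cache, and the final filtering step are assumptions made for illustration.

```python
import math

class RangeCache:
    """Single-entry neighbor cache for fixed-radius queries (illustrative sketch)."""

    def __init__(self, search, r, alpha):
        self.search = search  # callable(q, radius) -> list of points
        self.r = r
        self.alpha = alpha
        self.center = None
        self.neighbors = []
        self.hits = 0

    def query(self, q):
        if self.center is None or math.dist(q, self.center) > self.alpha * self.r:
            # Miss: run the real search with the slightly bigger radius.
            self.center = q
            self.neighbors = self.search(q, (1 + self.alpha) * self.r)
        else:
            self.hits += 1
        # Filtering keeps the answer exact; skipping it would return a superset.
        return [p for p in self.neighbors if math.dist(p, q) <= self.r]
```

Any range search can be passed as `search`, for example a kd-tree query or a brute-force scan.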

20 Coherent Neighbor Cache (kNN, exact)
Find (k+1) neighbors
Re-use result for spatially close query
Re-use if: [formula]
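
The slide's re-use formula is missing from this transcript, so the sketch below uses a condition derived from the triangle inequality, which may not match the slide's exact form. Let r_k and r_(k+1) be the distances from the cached query to its k-th and (k+1)-th neighbors, and d the distance between the new and the cached query. The cached k points lie within r_k + d of the new query, while every other point lies at least r_(k+1) - d away, so the cached set is still a valid k-nearest set whenever 2d <= r_(k+1) - r_k. The names `KnnCache` and `brute_knn` are hypothetical, and the data set is assumed to hold at least k+1 points.

```python
import math

def brute_knn(points, q, k):
    """Stand-in for the real search: the k nearest points, closest first."""
    return sorted(points, key=lambda p: math.dist(p, q))[:k]

class KnnCache:
    """Single-entry cache for exact kNN queries (illustrative sketch)."""

    def __init__(self, search, k):
        self.search = search  # callable(q, k) -> k nearest points, closest first
        self.k = k
        self.center = None
        self.cached = []      # the k+1 nearest neighbors of self.center
        self.hits = 0

    def query(self, q):
        if self.center is not None:
            d = math.dist(q, self.center)
            r_k = math.dist(self.cached[self.k - 1], self.center)
            r_k1 = math.dist(self.cached[self.k], self.center)
            if 2 * d <= r_k1 - r_k:
                self.hits += 1
                return sorted(self.cached[:self.k], key=lambda p: math.dist(p, q))
        # Miss: search for k+1 neighbors so the gap r_(k+1) - r_k is known.
        self.center = q
        self.cached = self.search(q, self.k + 1)
        return self.cached[:self.k]
```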

21 Coherent Neighbor Cache (kNN, approximation)
Approximation error ε
–Enlarge radius
Re-use if: [formula]

22 Outline
Related Work
Spatial Searching and Caching
Architecture & Prototype
Results
Conclusion

23 The Architecture
[Block diagram; label: Host]

24 Coherent Neighbor Cache
Eight cached neighborhoods
Problem: parallel queries in kd-tree module → Interleave spatially similar queries

25 Kd-Tree Traversal

26 Node Recurse
Kd-tree structure on chip
16 threads
Pipelining and multi-threading

27 Stacks
16 stacks
Parallel read/write
Bounded in depth
6 bytes per thread per recursion

28 Leaf
16 parallel priority queues (1-cycle ops)
Queues store pointers and distances
Bandwidth bottleneck

29 Processing Module
Multithreaded quad-port bank of 16 registers
128 threads
Programmability using FPGA technology

30 Further Data
Implemented on two FPGAs
–64 bit DDR DRAM
–Interconnection: no overhead
Resource usage (registers and LUTs)
–Virtex 2 Pro 100 (kNN): 26% registers, 38% LUTs
–Virtex 2 Pro 70 (MLS): 47% registers, 52% LUTs
Clock frequency: 75 MHz

31 Outline
Related Work
Spatial Searching and Caching
Architecture & Prototype
Results
Conclusion

32 Applications
Tested on various applications
PCI interface of prototype slow
[Weyrich et al. 04] [Adams et al. 07]

33 Results kNN
[Chart; axes: number of neighbors, number of queries; clock frequencies shown: 75 MHz, 1200 MHz, 2200 MHz]
CUDA: x4, CPU: x1.5, FPGA: x1
CUDA: x2.4, CPU: x1.4, FPGA: x1, CUDA w/o sort: x4.0
CUDA: x1.6, CPU: x1.1, FPGA: x1, CUDA w/o sort: x3.1
ASIC estimate, 500 MHz: x6.6

34 Results kNN
[Same chart as slide 33]
Small hardware footprint
FPGA slightly slower
Realistic clock frequency → Prototype faster than CPU/GPU

35 Results MLS
[Chart; axes: number of neighbors, number of queries; clock frequencies shown: 75 MHz, 1200 MHz, 2200 MHz]
MLS: FPGA x1, CPU x0.4, CUDA x3.8
FPGA faster than CPU
kNN bottleneck
–FPGA
–GPU

36 Coherent Neighbor Cache
[Chart; axes: level of coherence, number of queries; series: CPU, ε=0.1; FPGA, exact; FPGA, ε=0.1]

37 Results Approximation Error (MLS projection)
[Chart; axes: ε approximation, MLS error; reference: no approx.]

38 Results Approximation Error (MLS projection)
[Chart; axes: ε approximation, cache hits]

39 Approximation Error (visual)

40 Approximation Error (visual)
Coherent Neighbor Cache:
Not optimal for exact queries
Approximate queries
–Can be tolerated in most cases
–Greatly increases performance
–Even for small approximations

41 Outline
Related Work
Spatial Searching and Caching
Architecture & Prototype
Results
Conclusion

42 Conclusion
Novel hardware architecture for
–Nearest-neighbor searches
–Generic meshless processing operators
Cache exploiting spatial coherence
Good performance considering resources
Possible GPU integration

43 Future Work
Programmable data structure
–Support different data structures
–Programmability in data structure
–Construction on-chip
‘Real’ programmability in point processing module

44 A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008

