A Hardware Processing Unit For Point Sets S. Heinzle, G. Guennebaud, M. Botsch, M. Gross Graphics Hardware 2008.

Motivation
Point-based graphics established
Powerful algorithms
– Representation
– Processing
– Manipulation
– Rendering
Decomposition
– Get neighborhood
– Operate on neighbors

Motivation
GPUs not suited for getting neighborhood
– SIMD
– Incoherent branching
– Dynamic data structures slow
– Recursive calls not supported
CPUs
– Small number of FPUs
– Inflexible memory caches
(Images courtesy of NVIDIA and Intel)

Contributions
Hardware architecture for point sets
– Neighbor search module
– Novel advanced caching mechanism
– Reconfigurable processing module
– Programmability using FPGA compiler
FPGA prototype and measurements
Small and lean: integration into multi-core CPU/GPU possible

Outline
Related Work
Spatial Searching and Caching
Architecture and Prototype
Results
Conclusion

Related Work
Kd-Tree [Bentley 75]
kNN on GPUs [Ma and McCool 02]
Kd-Tree Hardware [Woop et al. 05] [Woop et al. 06]
Kd-Tree on GPUs [Popov et al. 07]

Related Work
Adaptive SPH Fluid Simulation [Adams et al. '07]
Linear Moving Least Squares [Adamson and Alexa '04]
Algebraic Moving Least Squares [Guennebaud and Gross '07]

Linear Moving Least Squares
Implicit surface defined by a set of points
(Figure labels: query point x, input points p_i with normals n_i)
Iterative projections onto plane: x, x', x'', x'''
Surface defined by points projecting onto themselves
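The iterative projection can be illustrated with a minimal Python sketch. It assumes a Gaussian weight with bandwidth `h`, a weighted-average position and normal defining the local plane, and a loop over all input points instead of a neighbor query; the name `mls_project` and all parameters are hypothetical and not taken from the slides.

```python
import math

def mls_project(x, points, normals, h=1.0, iterations=10, tol=1e-9):
    """Iteratively project x onto the local plane fitted to oriented points."""
    x = list(x)
    for _ in range(iterations):
        wsum = 0.0
        a = [0.0, 0.0, 0.0]  # weighted average position
        n = [0.0, 0.0, 0.0]  # weighted average normal
        for p, pn in zip(points, normals):
            d2 = sum((xc - pc) ** 2 for xc, pc in zip(x, p))
            w = math.exp(-d2 / (h * h))  # assumed Gaussian weight
            wsum += w
            for c in range(3):
                a[c] += w * p[c]
                n[c] += w * pn[c]
        a = [c / wsum for c in a]
        norm = math.sqrt(sum(c * c for c in n))
        n = [c / norm for c in n]
        # Signed distance from x to the plane through a with normal n.
        f = sum(nc * (xc - ac) for nc, xc, ac in zip(n, x, a))
        # One projection step: x' = x - f * n.
        x = [xc - f * nc for xc, nc in zip(x, n)]
        if abs(f) < tol:  # x now (numerically) projects onto itself
            break
    return tuple(x)
```

For points sampled from a plane with a constant normal, a single step already lands on the plane; on curved data the loop repeats until the step length falls below `tol`.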

Spatial Search
Spatial search: kNN and εNN
– Common in most point operations
– Based on kd-tree
Example εNN: (figure)
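A fixed-radius (εNN) query on a kd-tree can be sketched in a few lines of Python. This is only an illustration under assumptions: a median-split tree with small leaf buckets stored as nested dicts. The names `build_kdtree`, `eps_neighbors` and `leaf_size` are hypothetical and do not describe the on-chip data layout.

```python
def build_kdtree(points, depth=0, leaf_size=4):
    """Median-split kd-tree stored as nested dicts (illustrative layout)."""
    if len(points) <= leaf_size:
        return {"points": list(points)}
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {"axis": axis, "split": pts[mid][axis],
            "left": build_kdtree(pts[:mid], depth + 1, leaf_size),
            "right": build_kdtree(pts[mid:], depth + 1, leaf_size)}

def eps_neighbors(node, q, radius, out=None):
    """Collect all points within `radius` of the query point q."""
    if out is None:
        out = []
    if "points" in node:  # leaf: test every stored point
        r2 = radius * radius
        for p in node["points"]:
            if sum((a - b) ** 2 for a, b in zip(p, q)) <= r2:
                out.append(p)
        return out
    diff = q[node["axis"]] - node["split"]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    eps_neighbors(near, q, radius, out)
    # Visit the far child only if the query ball crosses the split plane.
    if abs(diff) <= radius:
        eps_neighbors(far, q, radius, out)
    return out
```

The recursion descends into the child containing the query first and prunes the other child whenever the split plane lies farther away than the search radius.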

Spatial Search
kNN search similar to εNN search:
– Start with infinite radius
– Sort leaf points into priority queue
– Shrink radius with every point sorted
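The three kNN steps can be sketched in Python as follows, assuming a median-split kd-tree with leaf buckets and a binary heap as the priority queue. All names (`build_kdtree`, `knn_search`, `knn`, `dist2`) are hypothetical; the hardware queues described later in the talk are not modeled here.

```python
import heapq
import math

def build_kdtree(points, depth=0, leaf_size=4):
    """Median-split kd-tree stored as nested dicts (illustrative layout)."""
    if len(points) <= leaf_size:
        return {"points": list(points)}
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {"axis": axis, "split": pts[mid][axis],
            "left": build_kdtree(pts[:mid], depth + 1, leaf_size),
            "right": build_kdtree(pts[mid:], depth + 1, leaf_size)}

def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_search(node, q, k, heap=None):
    """Return a max-heap of (-squared_distance, point) holding the k nearest."""
    if heap is None:
        heap = []
    if "points" in node:  # leaf: sort its points into the priority queue
        for p in node["points"]:
            d2 = dist2(p, q)
            if len(heap) < k:
                heapq.heappush(heap, (-d2, p))
            elif d2 < -heap[0][0]:
                heapq.heapreplace(heap, (-d2, p))  # search radius shrinks
        return heap
    diff = q[node["axis"]] - node["split"]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    knn_search(near, q, k, heap)
    # The radius stays infinite until the queue holds k points.
    radius2 = -heap[0][0] if len(heap) == k else math.inf
    if diff * diff <= radius2:
        knn_search(far, q, k, heap)
    return heap

def knn(tree, q, k):
    """Convenience wrapper: the k nearest points, closest first (k >= 1)."""
    return [p for _, p in sorted(knn_search(tree, q, k), reverse=True)]
```

Storing negated distances turns Python's min-heap into a max-heap, so the current search radius is always available at the top of the queue.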

Coherent Neighbor Cache (εNN)
Find neighbors in slightly bigger radius
Re-use result for spatially close query
Re-use if (formula not preserved)
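The re-use formula from the slide did not survive in this transcript. The Python sketch below therefore uses a condition that follows from the triangle inequality and is an assumption here: if a search was run with the enlarged radius `alpha * radius` around a cached position, any later query within `(alpha - 1) * radius` of that position finds all of its neighbors in the cached list. The class `EpsNeighborCache`, the factor `alpha`, the brute-force `range_search` stand-in and the final filtering step are all hypothetical.

```python
import math

def range_search(points, q, radius):
    """Brute-force stand-in for the kd-tree radius search."""
    return [p for p in points if math.dist(p, q) <= radius]

class EpsNeighborCache:
    """Single-entry cache for fixed-radius neighbor queries (sketch)."""

    def __init__(self, points, radius, alpha=1.5):
        self.points = points
        self.radius = radius
        self.alpha = alpha      # enlargement factor, assumed > 1
        self.center = None      # query position of the cached result
        self.cached = []        # neighbors within alpha * radius of center
        self.hits = 0
        self.misses = 0

    def query(self, q):
        slack = (self.alpha - 1.0) * self.radius
        if self.center is not None and math.dist(q, self.center) <= slack:
            self.hits += 1      # close to the cached query: re-use
        else:
            self.misses += 1    # otherwise run the enlarged search
            self.center = q
            self.cached = range_search(self.points, q,
                                       self.alpha * self.radius)
        # Filter the superset down to the exact neighborhood of q.
        return [p for p in self.cached if math.dist(p, q) <= self.radius]
```

A larger `alpha` raises the hit rate for coherent query sequences but makes each miss more expensive, because the enlarged search returns more points.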

Coherent Neighbor Cache (kNN, exact)
Find (k+1) neighbors
Re-use result for spatially close query
Re-use if (formula not preserved)
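The slide's re-use formula is missing from this transcript, so the following Python sketch relies on a condition derived from the triangle inequality, stated here as an assumption. Let r_k and r_(k+1) be the distances from the cached query to its k-th and (k+1)-th neighbor. A new query at distance d from the cached one sees the cached k points within r_k + d and every other point at least r_(k+1) - d away, so the same k points remain the nearest whenever 2d <= r_(k+1) - r_k. The class `KnnCache` and its brute-force search are hypothetical.

```python
import math

class KnnCache:
    """Single-entry cache for exact kNN queries (sketch, needs len(points) > k)."""

    def __init__(self, points, k):
        self.points = points
        self.k = k
        self.center = None     # query position of the cached result
        self.neighbors = []    # k nearest neighbors of center
        self.r_k = 0.0         # distance to the k-th neighbor
        self.r_k1 = 0.0        # distance to the (k+1)-th neighbor
        self.hits = 0
        self.misses = 0

    def query(self, q):
        if self.center is not None:
            d = math.dist(q, self.center)
            if 2.0 * d <= self.r_k1 - self.r_k:
                self.hits += 1
                return list(self.neighbors)
        self.misses += 1
        # Brute-force stand-in for the kd-tree search of k+1 neighbors.
        ranked = sorted(self.points, key=lambda p: math.dist(p, q))
        ranked = ranked[:self.k + 1]
        self.center = q
        self.neighbors = ranked[:self.k]
        self.r_k = math.dist(ranked[self.k - 1], q)
        self.r_k1 = math.dist(ranked[self.k], q)
        return list(self.neighbors)
```

On a hit the returned set is exact, but its order reflects distances to the cached query, not to the new one.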

Coherent Neighbor Cache (kNN, approximation)
Approximation error ε
– Enlarge radius
Re-use if (formula not preserved)

The Architecture
(Block diagram; surviving label: Host)

Coherent Neighbor Cache
Eight cached neighborhoods
Problem: parallel queries in kd-tree module
Interleave spatially similar queries

Kd-Tree Traversal

Node Recurse
Kd-tree structure on chip
16 threads
Pipelining and multi-threading

Stacks
16 stacks
Parallel read/write
Bounded in depth
6 bytes per thread per recursion

Leaf
16 parallel priority queues (1-cycle ops)
Queues store pointers and distances
Bandwidth bottleneck

Processing Module
Multithreaded quad-port bank of 16 registers
128 threads
Programmability using FPGA technology

Further Data
Implemented on two FPGAs
– 64-bit DDR DRAM
– Interconnection: no overhead
Resource usage (registers and LUTs)
– Virtex 2 Pro 100 (kNN): 26% registers, 38% LUTs
– Virtex 2 Pro 70 (MLS): 47% registers, 52% LUTs
Clock frequency: 75 MHz

Applications
Tested on various applications [Weyrich et al. 04] [Adams et al. 07]
PCI interface of prototype slow

Results kNN
(Chart axes: Number of Neighbors, Number of queries; labels give performance relative to the FPGA)
– CUDA: x4, CPU: x1.5, FPGA: x1
– CUDA: x2.4, CUDA w/o sort: x4.0, CPU: x1.4, FPGA: x1
– CUDA: x1.6, CUDA w/o sort: x3.1, CPU: x1.1, FPGA: x1
– ASIC estimate, 500 MHz: x6.6
– Clock frequencies shown: 75 MHz, 1200 MHz, 2200 MHz
Small hardware footprint
FPGA slightly slower
Realistic clock frequency: prototype faster than CPU/GPU

Results MLS
(Chart axes: Number of Neighbors, Number of queries; labels give MLS performance relative to the FPGA)
– FPGA: x1, CPU: x0.4, CUDA: x3.8
– Clock frequencies shown: 75 MHz, 1200 MHz, 2200 MHz
FPGA faster than CPU
kNN bottleneck
– FPGA
– GPU

Coherent Neighbor Cache
(Chart axes: Level of coherence, Number of queries; series: CPU, ε=0.1; FPGA, exact; FPGA, ε=0.1)

Results Approximation Error (MLS projection)
(Charts: MLS error and cache hits plotted against the ε approximation, compared with no approximation)

Approximation Error (visual)
Coherent Neighbor Cache:
Not optimal for exact queries
Approximate queries
– Can be tolerated in most cases
– Greatly increases performance
– Even for small approximations

Conclusion
Novel hardware architecture for
– Nearest-neighbor searches
– Generic meshless processing operators
Cache exploiting spatial coherence
Good performance considering resources
Possible GPU integration

Future Work
Programmable data structure
– Support different data structures
– Programmability in data structure
– Construction on-chip
'Real' programmability in point processing module
