Optimization of Mesh Locality for Transparent Vertex Caching
Hugues Hoppe, Microsoft Research
SIGGRAPH 99
Triangle meshes
System architecture
[Diagram labels: CPU with L1 cache and L2 cache; mesh in memory (geometry: vertices, faces; texture image); bus (e.g. AGP), marked as the bottleneck; graphics processor with geometric processing, rasterization, and texture cache; frame buffer; vertices: ?]
Previous work
• 16-entry FIFO buffer [Deering95, Chow97]
• stack buffer [BarYehuda-Gotsman96]
• mesh compression [Taubin-Rossignac98] [Gumhold-Strasser98] …
[Diagram labels: compressed geometry stream (v1 c v2-v1 c v3-v2 c v4-v3 c v5-v4 c c c), bus, graphics processor with parsing logic, mesh buffer, and geometric processing]
Previous work
Drawbacks:
  - only static geometry
  - new API
  - not backward compatible
[Diagram repeated from the previous slide: compressed geometry stream, bus, parsing logic, mesh buffer, geometric processing]
Our approach
[Diagram labels: traditional mesh API (vertex array, indexed strips); texture image; bus; graphics processor with vertex cache, texture cache, geometric processing, and rasterization]
Optimize ordering! No explicit cache management.
Transparent vertex caching
• Pros:
  - animated geometry
  - application program unchanged
  - backward compatible on legacy hardware
• Cons:
  - less compression (but still a factor ~2)
[Diagram labels: application; traditional mesh API (vertex array, indexed strips); graphics system]
Indexed triangle strips
[Diagram: a triangle strip over vertices v1, v2, v3, v4, v5, v6, v7]
• vertex array entry, ~32 bytes: position (x y z), normal (nx ny nz), color (rgba), texture1 (u v), texture2 (u v)
• strip index, ~2 bytes
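
As a rough illustration of the indexed-strip representation on this slide, the sketch below expands one strip of vertex indices into its triangles and totals the bytes spent on indices. The name `strip_to_triangles`, the alternating-winding convention, and the `INDEX_BYTES` constant are assumptions made for this example (the slide gives only "~2 bytes" per index); they are not taken from the talk.

```python
INDEX_BYTES = 2  # assumption: the slide's "~2 bytes" per strip index

def strip_to_triangles(strip):
    """Expand one indexed triangle strip into a list of (a, b, c) triangles."""
    triangles = []
    for i in range(len(strip) - 2):
        a, b, c = strip[i], strip[i + 1], strip[i + 2]
        if a == b or b == c or a == c:
            continue  # skip degenerate triangles
        # Swap the first two indices of every other triangle so that
        # all faces keep a consistent winding (assumed convention).
        triangles.append((a, b, c) if i % 2 == 0 else (b, a, c))
    return triangles

strip = [1, 2, 3, 4, 5, 6, 7]           # v1 .. v7 from the slide
triangles = strip_to_triangles(strip)   # five triangles
index_bytes = len(strip) * INDEX_BYTES  # bytes spent on indices alone
```

Seven indices describe five triangles here, so under the assumed index size the index stream costs 14 bytes, far less than resending ~32 bytes of vertex data per reference.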
Cache parameters
• size?
• replacement policy?
Vertex cache: 16 entries, FIFO
Vertex data access
[Figure legend: cache hit, cache miss; annotation: "assume in cache"]
• traditional strips: transfer ~1.0 vertex/tri
• with caching: transfer ~0.5 vertex/tri
Vertex data access
[Figure: # misses]
• traditional strips: transfer ~1.0 vertex/tri
• with caching: transfer ~0.5 vertex/tri
Example
[Figure: # misses before optimization and after optimization]
47% bandwidth reduction
Optimization problem
Given mesh, find strips minimizing bus bandwidth (strips correspond to an ordering of the faces F).
[Cost terms labeled on the slide: cache miss rate, # vertex indices]
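
One way to make the two cost terms concrete is sketched below: it replays the strips' index stream through a FIFO vertex cache and charges each miss a full vertex and each index its own size. The function name `bandwidth_per_triangle`, the byte constants (taken from the "~32 bytes" and "~2 bytes" figures on the indexed-strip slide), and the choice to ignore any cost for restarting a strip are assumptions of this sketch, not the exact cost model of the talk.

```python
from collections import deque

VERTEX_BYTES = 32  # assumption: "~32 bytes" per vertex
INDEX_BYTES = 2    # assumption: "~2 bytes" per index

def bandwidth_per_triangle(strips, num_faces, cache_size=16):
    """Estimate bus bytes per triangle for a list of indexed strips."""
    cache = deque()  # FIFO vertex cache, oldest entry on the left
    misses = 0
    num_indices = 0
    for strip in strips:
        for v in strip:
            num_indices += 1
            if v not in cache:
                misses += 1          # vertex data must cross the bus
                cache.append(v)
                if len(cache) > cache_size:
                    cache.popleft()  # evict the oldest vertex
    # Strip-restart overhead is ignored here (simplifying assumption).
    total_bytes = misses * VERTEX_BYTES + num_indices * INDEX_BYTES
    return total_bytes / num_faces

# Two adjacent strips that share vertices 1, 3 and 5 (8 faces in total).
strips = [[0, 1, 2, 3, 4, 5], [1, 6, 3, 7, 5, 8]]
cost_large_cache = bandwidth_per_triangle(strips, num_faces=8, cache_size=16)
cost_tiny_cache = bandwidth_per_triangle(strips, num_faces=8, cache_size=2)
```

With the 16-entry cache the shared vertices are still resident when the second strip needs them. With the 2-entry cache every reference misses, so the same face ordering costs more per triangle.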
Two reordering techniques
• Greedy strip-growing
  - fast: 40,000 faces/sec
• Local optimization
  - improve initial greedy solution
  - very slow
Greedy strip-growing
• Inspired by [Chow97]
• To decide when to restart a strip, perform lookahead cache simulations
When to restart strip?
[Figure (cache size 4): good strip length]
When to restart strip?
[Figure (cache size 4): good strip length; strip too long, jump in miss rate!]
Lookahead simulations
• Perform s simulations:
  (a) restart immediately, after 0 faces
  (b) restart after i faces, for each 0 < i < s
• If (a) is best, restart the strip
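
The restart test above could be sketched roughly as follows. Everything beyond the slide's outline is an assumption of this sketch: faces are modeled as tuples of three vertex indices, each simulation visits a window of s faces in total, the cost of a simulation is cache misses per visited face, and "best" is read as strictly lowest cost. The names `simulate` and `should_restart`, and the two precomputed face lists, are hypothetical.

```python
from collections import deque

def simulate(cache, faces, cache_size):
    """Count FIFO cache misses when visiting faces, starting from a copy of cache."""
    cache = deque(cache)  # copy, so the caller's cache state is untouched
    misses = 0
    for face in faces:
        for v in face:
            if v not in cache:
                misses += 1
                cache.append(v)
                if len(cache) > cache_size:
                    cache.popleft()
    return misses

def should_restart(cache, continue_faces, restart_faces, s, cache_size=16):
    """Return True if restarting immediately beats every delayed restart.

    continue_faces: faces visited if the current strip keeps growing.
    restart_faces:  faces visited by a freshly started strip.
    Assumes s >= 2 and that restart_faces holds at least s faces.
    """
    costs = []
    for i in range(s):
        # Simulation i: grow the current strip for i faces, then restart.
        visited = continue_faces[:i] + restart_faces[:s - i]
        costs.append(simulate(cache, visited, cache_size) / len(visited))
    return all(costs[0] < c for c in costs[1:])

cache = [0, 1, 2, 3]                     # vertices currently cached
growing = [(10, 11, 12), (11, 12, 13)]   # continuing needs uncached vertices
fresh = [(0, 1, 2), (1, 2, 3)]           # a new strip reuses cached vertices
restart_now = should_restart(cache, growing, fresh, s=2, cache_size=4)
keep_going = not should_restart(cache, fresh, growing, s=2, cache_size=4)
```

In the first call, restarting lets the new strip reuse cached vertices, so option (a) wins. In the second call the roles are swapped and the strip keeps growing.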
Result
[Figure: traditional long strips; legend: face order within strip, strip restart]
Result
[Figure: traditional long strips vs. greedy strip-growing]
Result
• before: 45.8 bytes/triangle
• after: 25.5 bytes/triangle
Local optimization
Apply perturbations to the face ordering if cost is lowered:
  Initial order F:        F_{1..x-1}  F_{x..y}  F_{y+1..m}
  F' = Reflect_{x,y}(F):  F_{1..x-1}  F_{y..x}  F_{y+1..m}
  F' = Insert1_{x,y}(F):  F_{1..x-1}  F_y  F_{x..y-1}  F_{y+1..m}
  F' = Insert2_{x,y}(F):  F_{1..x-1}  F_{y-1..y}  F_{x..y-2}  F_{y+1..m}
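
The three perturbations can be written as list operations, as in the sketch below. It keeps the slide's 1-based, inclusive indices x and y and treats the face ordering F as a Python list. The helper `best_perturbation` and the toy cost function in the usage lines are hypothetical stand-ins. In the talk the cost is bus bandwidth, and how candidate pairs (x, y) are chosen is not covered here.

```python
def reflect(F, x, y):
    """F_{1..x-1}, then F_{x..y} reversed, then F_{y+1..m} (1-based x <= y)."""
    return F[:x - 1] + F[x - 1:y][::-1] + F[y:]

def insert1(F, x, y):
    """Move the single face F_y so that it comes just before F_x."""
    return F[:x - 1] + [F[y - 1]] + F[x - 1:y - 1] + F[y:]

def insert2(F, x, y):
    """Move the pair F_{y-1..y} so that it comes just before F_x."""
    return F[:x - 1] + F[y - 2:y] + F[x - 1:y - 2] + F[y:]

def best_perturbation(F, x, y, cost):
    """Return the cheapest of F and its three perturbations at (x, y)."""
    best = F
    for perturb in (reflect, insert1, insert2):
        candidate = perturb(F, x, y)
        if cost(candidate) < cost(best):  # accept only if cost is lowered
            best = candidate
    return best

def toy_cost(F):
    # Toy stand-in for bandwidth: penalize jumps between neighbors.
    return sum(abs(b - a) for a, b in zip(F, F[1:]))

order = [1, 4, 3, 2, 5]
improved = best_perturbation(order, x=2, y=4, cost=toy_cost)
```

On this toy input, reflecting positions 2 through 4 gives the lowest cost, so `improved` is the reflected ordering.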
Result
• greedy strip-growing: 25.5 bytes/triangle
• local optimization: 24.2 bytes/triangle
Bandwidth results
Improvement by a factor of 1.6 to 1.9
Choice of cache size
Size 16 is sufficient for most of the gain.
Cache replacement policy
[Figure (cache size 4): FIFO vs. LRU; all is OK]
Cache replacement policy
[Figure (cache size 4): FIFO vs. LRU; strips twice as long]
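
To experiment with the two replacement policies on a given index stream, a small simulator along the following lines may help. It is only a sketch: the function name `count_misses` and its `policy` parameter are hypothetical, and it does not reproduce the figures on these slides. It replays a sequence of vertex indices and counts misses under FIFO or LRU.

```python
from collections import OrderedDict

def count_misses(indices, cache_size, policy="fifo"):
    """Count vertex-cache misses for an index stream under FIFO or LRU."""
    if policy not in ("fifo", "lru"):
        raise ValueError("policy must be 'fifo' or 'lru'")
    cache = OrderedDict()  # front entry is the next one to be evicted
    misses = 0
    for v in indices:
        if v in cache:
            if policy == "lru":
                cache.move_to_end(v)  # only LRU refreshes an entry on a hit
            continue
        misses += 1
        cache[v] = True
        if len(cache) > cache_size:
            cache.popitem(last=False)  # evict the front entry
    return misses

stream_a = [0, 1, 2, 0, 3, 1]
stream_b = [0, 1, 2, 0, 3, 0]
fifo_a = count_misses(stream_a, 3, "fifo")
lru_a = count_misses(stream_a, 3, "lru")
fifo_b = count_misses(stream_b, 3, "fifo")
lru_b = count_misses(stream_b, 3, "lru")
```

Neither policy wins on every stream. With a 3-entry cache, `stream_a` gives 4 misses under FIFO and 5 under LRU, while `stream_b` gives 5 under FIFO and 4 under LRU. So the comparison has to be made on the face orderings that the strip optimizer actually produces.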
Comparison
[Figure: FIFO vs. LRU]
Comparison
[Figure: FIFO vs. LRU]
Summary
• Vertex caching reduces geometry bandwidth by a factor of 1.6 to 1.9
• Transparent to application: simply pre-process the models (fast)
• Still efficient on legacy hardware
• Supports dynamic geometry
Future work
• Issue of cache size
  - Find face ordering good for all sizes?
  - Standardize on size 16?
  - Reprocess mesh at load time
• Interaction with texture caching
• Cache efficiency during runtime LOD