Optimization of Mesh Locality for Transparent Vertex Caching. Hugues Hoppe, Microsoft Research. SIGGRAPH 99.

Presentation transcript:

Optimization of Mesh Locality for Transparent Vertex Caching
Hugues Hoppe, Microsoft Research, SIGGRAPH 99

Triangle meshes

System architecture
[Diagram labels: CPU with L1 cache and L2 cache; mesh in memory (geometry: vertices, faces; texture image); bus (e.g. AGP), marked as the bottleneck; graphics processor with geometric processing, rasterization, texture cache, and frame buffer; "vertices ??" on the graphics processor side]

Previous work
- 16-entry FIFO buffer [Deering95, Chow97]
- stack buffer [BarYehuda-Gotsman96]
- mesh compression [Taubin-Rossignac98] [Gumhold-Strasser98] …
[Diagram labels: compressed geometry stream (entries v1, v2-v1, v3-v2, v4-v3, v5-v4, interleaved with codes c); bus; graphics processor with parsing logic, mesh buffer, and geometric processing]

Previous work
Drawbacks:
  - only static geometry
  - new API
  - not backward compatible
[Diagram labels: same compressed-geometry-stream diagram as on the previous slide]

Our approach
[Diagram labels: traditional mesh API (vertex array, indexed strips); bus; graphics processor with vertex cache, geometric processing, rasterization, and texture cache; texture image]
Optimize ordering! No explicit cache management.

Transparent vertex caching
- Pros:
  - animated geometry
  - application program unchanged
  - backward compatible on legacy hardware
- Cons:
  - less compression (but still a factor ~2)
[Diagram labels: application; traditional mesh API (vertex array, indexed strips); graphics system]

Indexed triangle strips
Strip indices: v1 v2 v3 v4 v5 v6 v7 (~2 bytes each)
Vertex record (~32 bytes):
  - position: x y z
  - normal: nx ny nz
  - color: rgba
  - texture1: u v
  - texture2: u v
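
A minimal Python sketch of how one indexed strip expands into triangles may help here. The function name `strip_to_triangles` and the winding convention (swap the first two indices on odd triangles) are assumptions made for illustration, not something stated on the slide; the byte sizes are the approximate figures the slide gives.

```python
VERTEX_BYTES = 32  # approximate vertex record size quoted on the slide
INDEX_BYTES = 2    # approximate index size quoted on the slide


def strip_to_triangles(indices):
    """Expand one indexed triangle strip into (a, b, c) index triples."""
    triangles = []
    for i in range(len(indices) - 2):
        a, b, c = indices[i], indices[i + 1], indices[i + 2]
        if i % 2 == 1:
            a, b = b, a  # assumed convention: flip odd triangles to keep winding consistent
        if a != b and b != c and a != c:  # drop degenerate triangles
            triangles.append((a, b, c))
    return triangles


strip = [1, 2, 3, 4, 5, 6, 7]                  # v1..v7 as on the slide
triangles = strip_to_triangles(strip)          # five triangles from seven indices
index_bytes = len(strip) * INDEX_BYTES         # bytes spent on the index stream
vertex_bytes = len(set(strip)) * VERTEX_BYTES  # bytes if every vertex crosses the bus once
```

Seven indices describe five triangles, so the index stream is small next to the vertex records it points at. The bandwidth question is how often each ~32-byte record has to cross the bus.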

Cache parameters
  - size?
  - replacement policy?
Vertex cache: 16 entries, FIFO

Vertex data access
[Figure legend: cache hit, cache miss]
  - traditional strips: transfer ~1.0 vertex/tri
  - with caching: transfer ~0.5 vertex/tri (vertices of the previous strip assumed in cache)

Vertex data access
[Figure: # misses per vertex for the same two cases]
  - traditional strips: transfer ~1.0 vertex/tri
  - with caching: transfer ~0.5 vertex/tri
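
The two transfer rates can be reproduced with a small simulation. This is only a sketch under stated assumptions: the mesh is a regular grid rendered as one strip per row of quads (not a mesh from the talk), the cache is a FIFO that inserts only on a miss, and the helper names `fifo_misses`, `grid_strip_indices`, and `vertices_per_triangle` are hypothetical.

```python
from collections import deque


def fifo_misses(indices, cache_size):
    """Count misses of a FIFO vertex cache over a stream of vertex indices."""
    cache = deque()
    misses = 0
    for v in indices:
        if v not in cache:
            misses += 1
            if len(cache) == cache_size:
                cache.popleft()  # evict the oldest entry
            cache.append(v)      # hits leave the FIFO order unchanged
    return misses


def grid_strip_indices(width, height):
    """Index stream for a grid of width x height quads, one strip per row."""
    row = width + 1  # vertices per grid row
    stream = []
    for y in range(height):
        for x in range(row):
            stream.append(y * row + x)        # vertex on the strip's upper edge
            stream.append((y + 1) * row + x)  # vertex on the strip's lower edge
    return stream


def vertices_per_triangle(width, height, cache_size=None):
    """Vertices transferred per triangle; cache_size=None means no reuse at all."""
    stream = grid_strip_indices(width, height)
    triangles = 2 * width * height
    sent = len(stream) if cache_size is None else fifo_misses(stream, cache_size)
    return sent / triangles


uncached = vertices_per_triangle(7, 50)        # every index transfers a vertex
cached = vertices_per_triangle(7, 50, 16)      # 16-entry FIFO, rows of 7 quads
too_wide = vertices_per_triangle(8, 50, 16)    # rows one quad wider
```

Under these assumptions, rows of 7 quads give about 0.58 vertices per triangle with the cache against about 1.14 without it. Widening the rows to 8 quads overflows the 16-entry FIFO in this naive row order, and every index misses again.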

Example
[Figure: # misses before optimization and after optimization]
47% bandwidth reduction

Optimization problem
Given mesh, find strips minimizing bus bandwidth (strips correspond to ordering of faces F).
Bandwidth depends on: cache miss rate, # vertex indices.
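
One way to turn the two bandwidth terms into a single number is to weight them by the approximate sizes from the indexed-strip slide (~32 bytes per vertex, ~2 bytes per index). The sketch below assumes exactly that weighting; the function name `bytes_per_triangle` and the sample counts are hypothetical, and the talk's precise cost definition may differ.

```python
def bytes_per_triangle(cache_misses, num_indices, num_triangles,
                       vertex_bytes=32, index_bytes=2):
    """Bus traffic per triangle: missed vertices plus the index stream."""
    if num_triangles <= 0:
        raise ValueError("num_triangles must be positive")
    total = cache_misses * vertex_bytes + num_indices * index_bytes
    return total / num_triangles


# Made-up counts for illustration: 100 triangles, 120 indices, 50 misses.
example = bytes_per_triangle(cache_misses=50, num_indices=120, num_triangles=100)
```

Because a vertex record is roughly sixteen times larger than an index, the miss term dominates, which is why reordering faces for cache hits pays off even when it adds a few strip restarts.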

Two reordering techniques
- Greedy strip-growing
  - fast: 40,000 faces/sec
- Local optimization
  - improve initial greedy solution
  - very slow

Greedy strip-growing
- Inspired by [Chow97]
- To decide when to restart strip, perform lookahead cache simulations

When to restart strip? (cache size 4)
[Figure: good strip length]

When to restart strip? (cache size 4)
[Figure: good strip length vs. strip too long]
Strip too long -> jump in miss rate!

Lookahead simulations
- Perform s simulations:
  (a) restart immediately, after 0 faces
  (b) restart after 0 < i < s faces
- If (a) is best, restart strip

Result
[Figure: traditional long strips; legend: face order within strip, strip restart]

Result
[Figure: traditional long strips vs. greedy strip-growing]

Result
  - before: 45.8 bytes/triangle
  - after: 25.5 bytes/triangle

Local optimization
Apply perturbations to face ordering if cost is lowered:
  Initial order F:          F_{1..x-1}, F_{x..y}, F_{y+1..m}
  F' = Reflect_{x,y}(F):    F_{1..x-1}, F_{y..x}, F_{y+1..m}
  F' = Insert1_{x,y}(F):    F_{1..x-1}, F_{y}, F_{x..y-1}, F_{y+1..m}
  F' = Insert2_{x,y}(F):    F_{1..x-1}, F_{y-1..y}, F_{x..y-2}, F_{y+1..m}
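
The three perturbations, as far as they can be read from the slide, are easy to express on a Python list of faces. The sketch below uses 1-based `x` and `y` to match the slide's subscripts. The `improve` helper and the toy cost used in the example are assumptions made for illustration; a real cost would come from a cache simulation.

```python
def reflect(F, x, y):
    """Reverse the run F_x..F_y (1-based, inclusive)."""
    return F[:x - 1] + F[x - 1:y][::-1] + F[y:]


def insert1(F, x, y):
    """Move the single face F_y to just before F_x."""
    return F[:x - 1] + [F[y - 1]] + F[x - 1:y - 1] + F[y:]


def insert2(F, x, y):
    """Move the pair F_{y-1}, F_y to just before F_x (needs y > x)."""
    return F[:x - 1] + F[y - 2:y] + F[x - 1:y - 2] + F[y:]


def improve(F, x, y, cost):
    """Keep whichever perturbation lowers the cost most, else return F."""
    best = F
    for perturb in (reflect, insert1, insert2):
        candidate = perturb(F, x, y)
        if cost(candidate) < cost(best):
            best = candidate
    return best


# Toy cost (hypothetical): total jump between consecutive face labels.
def toy_cost(order):
    return sum(abs(a - b) for a, b in zip(order, order[1:]))


faces = [1, 2, 6, 5, 4, 3, 7, 8]
better = improve(faces, 3, 6, toy_cost)
```

With the toy cost, reflecting positions 3 to 6 restores the order 1 to 8, which beats both insertions, so `improve` returns that ordering.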

Result
  - greedy strip-growing: 25.5 bytes/triangle
  - local optimization: 24.2 bytes/triangle

Bandwidth results
Improvement by a factor of 1.6 to 1.9

Choice of cache size
Size 16 is sufficient for most of the gain

Cache replacement policy (cache size 4)
[Figure: FIFO vs. LRU; all is OK]

Cache replacement policy (cache size 4)
[Figure: FIFO vs. LRU; strips twice as long]
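
For readers who want to experiment with the two policies, here is a small simulator sketch. The function name `count_misses` and the access trace are made up for illustration. The trace only shows that the two policies evict differently; it does not reproduce the strip-length result in the figure.

```python
from collections import OrderedDict


def count_misses(trace, cache_size, policy):
    """Count cache misses for policy 'fifo' or 'lru' over a vertex index trace."""
    if policy not in ("fifo", "lru"):
        raise ValueError("policy must be 'fifo' or 'lru'")
    cache = OrderedDict()  # oldest (or least recently used) entry first
    misses = 0
    for v in trace:
        if v in cache:
            if policy == "lru":
                cache.move_to_end(v)  # a hit refreshes the entry under LRU only
            continue
        misses += 1
        if len(cache) == cache_size:
            cache.popitem(last=False)  # evict the front entry
        cache[v] = True
    return misses


trace = [1, 2, 3, 1, 4, 1]  # hypothetical access pattern
fifo = count_misses(trace, 3, "fifo")
lru = count_misses(trace, 3, "lru")
```

The only difference between the policies is the `move_to_end` call on a hit: FIFO evicts by insertion time, LRU by last use. On this particular trace that costs FIFO one extra miss, but which policy wins depends entirely on the access order.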

ComparisonComparison FIFOFIFOLRULRU

FIFOFIFOLRULRU ComparisonComparison

Summary
- Vertex caching reduces geometry bandwidth by a factor of 1.6 to 1.9
- Transparent to application: simply pre-process the models (fast)
- Still efficient on legacy hardware
- Supports dynamic geometry

Future work
- Issue of cache size
  - Find face ordering good for all sizes?
  - Standardize on size 16?
  - Reprocess mesh at load time
- Interaction with texture caching
- Cache efficiency during runtime LOD