
1 Status – Week 226 Victor Moya

2 Summary
  - Recursive descent.
  - Hierarchical Z Buffer.

3 Recursive Rasterization
  [Diagram blocks: Triangle Setup, Tile FIFO, four Tile Eval units, HZ Test, Fragment FIFO.]

4 Recursive Rasterization
  Tile FIFO:
  - Stores the start position of tiles to test/split.
  - Each tile has the following information:
    - 4 c values (3 edge, 1 z/w): 4 x 32 bits.
    - Tile level/size: log2(max(maxH, maxV)) bits. Ex: for 2048x2048, 12 bits.
    - Expand bit: set if the tile must be expanded.
  - For N tile evaluators it could be arranged as an NxM queue.
  - Triangle setup could add 1 tile (the full viewport) or N tiles (reduces the traversal depth by 1).
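The tile entry described above can be sketched as a small record. This is only an illustration: the names `TileEntry`, `top_level` and `tile_fifo` are hypothetical, the sample c values are made up, and the slides do not say how the expand bit is initialized, so it is simply set to False here.

```python
from collections import deque
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TileEntry:
    # 3 edge values + 1 z/w value, evaluated at the tile start position
    c: Tuple[int, int, int, int]
    # tile covers (1 << level) x (1 << level) pixels
    level: int
    # expand bit from the slide (initial value is an assumption)
    expand: bool


def top_level(max_h: int, max_v: int) -> int:
    """Smallest level whose square tile covers the whole viewport."""
    return (max(max_h, max_v) - 1).bit_length()


# Triangle setup pushes one tile that covers the full viewport.
tile_fifo = deque()
tile_fifo.append(TileEntry(c=(5, -3, 7, 100),
                           level=top_level(2048, 2048),
                           expand=False))
```

For a 2048x2048 viewport the top tile sits at level 11, since 1 << 11 is 2048.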

5 Recursive Rasterization
  Expand Tile: new tiles generated.
  [Figure labels: start sample; generated sample; tiles at level, no expand; tile at level - 1, expand.]

6 Recursive Rasterization
  No Expand Tile: new tiles.
  [Figure labels: start sample; generated sample; tiles at level - 1, expand.]

7 Recursive Rasterization
  Tile evaluator: N = 1
  - 1x1 tiles (no subtiles tested at tile evaluator).
  - Calculates three new sample positions.
  - Tests if any triangle fragment is inside the tile:
    - The tile is outside the triangle if all 4 tile corners are negative (outside) for any of the edge equations.
  - Performs HZ test:
    - Only top level.
    - Top level and N middle levels.
    - All levels.
  - Generates a 2x2 fragment stamp.
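The corner test above can be sketched in a few lines. The function names and the example triangle are illustrative assumptions; edges are given as (a, b, c) coefficients of u = ax + by + c, with negative values meaning outside.

```python
def edge_value(a, b, c, x, y):
    """Evaluate the linear edge equation u = a*x + b*y + c."""
    return a * x + b * y + c


def tile_outside(edges, x0, y0, size):
    """True if all 4 tile corners are negative for at least one edge."""
    corners = [(x0, y0), (x0 + size, y0),
               (x0, y0 + size), (x0 + size, y0 + size)]
    for a, b, c in edges:
        if all(edge_value(a, b, c, x, y) < 0 for x, y in corners):
            return True
    return False


# Hypothetical triangle: x >= 0, y >= 0, x + y <= 8.
triangle = [(1, 0, 0), (0, 1, 0), (-1, -1, 8)]
```

Note that the test is conservative: a tile that is not rejected may still contain no covered fragment, which is why the descent continues down to the stamp level.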

8 Recursive Rasterization
  Tile Evaluator: N = 1
  - 3 x 4 equation evaluators:
    - Linear equations: u = ax + by + c
    - 3 edge equations.
    - 1 z/w parameter equation.
    - Incremental update:
      - c_new = c_start + (a << level) + (b << level).
  - 4 x 4 x (e >= 0) tests:
    - Sample inside/outside triangle.
  - 4 x Z tests:
    - Against the proper HZ level.
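A minimal sketch of the incremental update, assuming that the stored c value is the equation evaluated at the tile start position and that the three new samples lie one tile side to the right, below, and diagonally from it. The function name `step_samples` is hypothetical.

```python
def step_samples(a, b, c_start, level):
    """Return the equation value at the three new sample positions.

    The tile side is (1 << level), so moving one tile side in x adds
    (a << level) and moving one tile side in y adds (b << level).
    """
    dx = a << level
    dy = b << level
    right = c_start + dx
    below = c_start + dy
    diagonal = c_start + dx + dy  # the c_new formula from the slide
    return right, below, diagonal


def direct(a, b, c, x, y):
    """Reference evaluation of u = a*x + b*y + c."""
    return a * x + b * y + c
```

Only shifts and adds are needed per sample, which is what makes one evaluator per equation and sample cheap in hardware.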

9 Recursive Rasterization
  [Diagram: Equation Evaluator. Labels: C, A, B; shift (<<); add (+); Only for N = 2; Samples NxN.]

10 Recursive Rasterization
  Tile Evaluator: N = 2
  - 2x2 tile (subtiles/fragments).
  - 8 new samples generated.
  - 8 x 4 equation evaluators.
  - 8 x (e >= 0) tests.
  - 8 x Z tests.
  - Generates 2x2 or 3x3 fragment stamps.
  - EXPAND TILES ARE NO LONGER REQUIRED.

11 Recursive Rasterization
  [Diagram: HZ test. Labels: Tile Start Position; Level; HZ Level i (only one value read); 4 x Z/W; CMP; CMP; AND; Tile Passes.]

12 Recursive Rasterization
  Tile evaluator critical path?
  - Z/W parameter evaluation | HZ access.
  - Z Compare/Test (>).
  - But could be pipelined:
    - Same throughput.
    - Longer latency.
    - Larger tile queue?

13 Recursive Rasterization
  Each tile evaluator could also work on more than one triangle at a time.
  Tile evaluator for 2 triangles: N = 1
  - 2 x 3 x 4 equation evaluators.
  - 2 x 3 x 4 (e >= 0) tests.
  - 2 x 4 Z tests.
  - Generates two 2x2 fragment stamps.

14 Recursive Rasterization
  Benefits:
  - Single access to the HZ hierarchy.
  - Increases throughput.
  - Shares latency for first fragment.
  Problems:
  - Overlapping triangles.
  - Produces more tiles.
  - Produces more fragments per cycle (but that would also happen with N = 2).

15 Recursive Rasterization
  [Diagram: Simulator Boxes. Triangle Setup, Recursive Rasterization, Hierarchical Z, Fragment FIFO.]

16 Recursive Rasterization
  [Diagram: Recursive Rasterization Box and Signals. Signals: newTriangle, newTile, newFragment, HZ Update (Tile level).]

17 Hierarchical Z Buffer
  Multiple levels.
  - Level 0:
    - 1 to 16 registers.
  - Level 1:
    - Fixed size ~64 KB.
    - Maps to 4x4, 8x8 or 16x16 tiles (relative to viewport size).
  - Z-Buffer.
  - Add additional level(s) between 0 and 1:
    - More memory.
    - Latency for updates?
    - Comparators?

18 Hierarchical Z Buffer
  Z-Buffer could be accessed at the fragment level.
  - Good for prefetching?
  - But would require N reads.
  - Provide the full Z cache line.
  - Fragment stamps (NxM) map to a single Z cache line.

19 Hierarchical Z Buffer
  Update mechanism:
  - Write to Z-Buffer.
  - At cache misses pack/compress the cache line and calculate the largest (max) Z value for that line.
  - Propagate the Z value of the line upwards.
  - Could require a lot of comparisons for top levels.
  - Expensive?
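The cost concern above can be made concrete with a naive sketch that recomputes a parent level by taking the max over every child. This brute-force reduction is an assumption used for illustration, not the mechanism the slides settle on; the function names are hypothetical.

```python
def line_max_z(cache_line):
    """Largest Z value in an evicted cache line."""
    return max(cache_line)


def reduce_level(grid, factor):
    """Build the parent level: each parent is the max of factor x factor children."""
    n = len(grid) // factor
    return [
        [
            max(grid[y * factor + dy][x * factor + dx]
                for dy in range(factor) for dx in range(factor))
            for x in range(n)
        ]
        for y in range(n)
    ]


def comparisons_per_parent(factor):
    """Comparisons needed to find the max of factor x factor children."""
    return factor * factor - 1
```

With 128x128 level 1 blocks under one level 0 block, a full recomputation needs 16383 comparisons for a single level 0 value, which is the expense the last bullets point at.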

20 Hierarchical Z Buffer
  Access:
  - Tiles:
    - At tile evaluators.
  - For fragments:
    - At tile evaluators:
      - Shares hardware.
      - Larger latency for tile evaluators at fragment level.
      - Use HZ Level 1 for tiles larger than a stamp but smaller than an HZ level 1 block.
    - At an HZ test stage before Fragment FIFO:
      - Smaller latency for tile evaluators.
      - Access to Z buffer.

21 Hierarchical Z Buffer
  Location:
  - Level 0:
    - At tile evaluators (duplicated):
      - Better latency.
      - Broadcast for updates.
    - At a separate HZ memory:
      - Worse latency?
      - Shared => multiported!!!!

22 Hierarchical Z Buffer
  Location:
  - Level 1:
    - At tile evaluators:
      - Too large.
    - At a separate on-die HZ memory:
      - Better solution?
      - Must be multiported:
        - 1 access per tile evaluator.
        - Multiple sets?
        - Set conflict?
    - At video memory:
      - For very large HZ buffers or very small precision?
      - Access time?
      - HZ cache?

23 Hierarchical Z Buffer
  Location:
  - Z-Buffer:
    - On video memory:
      - Compressed.
      - Reduce read/write bandwidth usage.
    - Cache on die:
      - Uncompressed.
    - Hardware packer/unpacker:
      - Used also for the HZ update.

24 Hierarchical Z Buffer
  Sizes
  - Level 0:
    - Register-like access.
    - 1 - 2 cycles max.
    - 1 - 32 values.
    - Block size depends on the viewport resolution:
      - 2048x2048:
        - 1 register: 2048x2048 block.
        - 16 registers: 128x128 blocks.
        - 32 registers: 64x64 blocks.

25 Hierarchical Z Buffer
  - Level 1:
    - On die memory:
      - Fixed size.
      - Limited by technology.
      - Estimation:
        - 2048x2048 viewport.
        - 8x8 blocks.
        - 16 bits Z value.
        - 128 KB.
    - Video memory:
      - Unlimited?
      - Variable size.
      - But larger access time!!!
      - Requires cache.
      - Update!!
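The 128 KB estimate follows directly from the numbers on the slide: one 16-bit Z value per 8x8 block of a 2048x2048 viewport. A short check, with a hypothetical helper name:

```python
def hz_level1_bytes(viewport_w, viewport_h, block, z_bits):
    """Storage for one Z value per block x block pixel tile."""
    blocks = (viewport_w // block) * (viewport_h // block)
    return blocks * z_bits // 8


# Numbers from the slide: 2048x2048 viewport, 8x8 blocks, 16-bit Z.
estimate_kb = hz_level1_bytes(2048, 2048, 8, 16) // 1024
```

Doubling the block side to 16x16 at the same viewport cuts the storage to a quarter, which is why the block size can be made relative to the viewport size when the memory is fixed.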

26 Hierarchical Z Buffer
  - Z-Buffer:
    - Video memory.
    - Unlimited size.
    - Large access time.
    - 32 bits (with stencil) per value.
    - Cache line size:
      - HZ block 8x8 (tiled): 2048 bits.
      - HZ block row/column 8: 256 bits.
      - Less than an HZ block row/column.

27 Hierarchical Z Buffer
  Packer/Unpacker:
  - Differential ('whatever it is called') compression.
  - Calculate max Z value in cache line.
  - Z block type:
    - 00: cleared Z line.
    - 01: uncompressed.
    - 10: compression 1.
    - 11: compression 2.
  - Two compression levels:
    - 4 bits per value.
    - 16 bits per value.
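The slide does not define the compression scheme, so the following is only a guess at how a packer might pick the block type. Assumptions, none of them from the slides: deltas are taken against the smallest Z in the line, Z is 24 bits with the clear value 0xFFFFFF, and "compression 1" is the 4-bit level while "compression 2" is the 16-bit level.

```python
# 2-bit Z block type codes from the slide
CLEARED, UNCOMPRESSED, COMPRESSION_1, COMPRESSION_2 = 0b00, 0b01, 0b10, 0b11

CLEAR_Z = 0xFFFFFF  # assumed 24-bit clear value


def classify_line(z_values):
    """Return (block_type, max_z) for one Z cache line.

    Assumed scheme: store a base value plus one delta per Z value,
    and pick the narrowest delta width that fits every value.
    """
    max_z = max(z_values)
    if all(z == CLEAR_Z for z in z_values):
        return CLEARED, max_z
    max_delta = max_z - min(z_values)
    if max_delta < (1 << 4):    # every delta fits in 4 bits
        return COMPRESSION_1, max_z
    if max_delta < (1 << 16):   # every delta fits in 16 bits
        return COMPRESSION_2, max_z
    return UNCOMPRESSED, max_z
```

The max Z comes out of the same pass over the line, which fits the note elsewhere in the deck that the packer is also used for the HZ update.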

28 Hierarchical Z Buffer
  HZ update hardware.
  - If a cache line has the size of an HZ level 1 block:
    - Nothing, just store.
  - If a cache line is smaller than an HZ level 1 block:
    - Must be stored in a combine cache:
      - Stores the current largest value for the HZ level 1 block.
      - Stores whether an HZ block has been fully updated.
    - When a combine cache line is full the HZ level 1 can be updated.
    - Use a FIFO policy for the combining cache (only spatial locality).
    - Combining cache size?
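A minimal software model of such a combine cache, assuming one mask bit per Z cache line of the block and assuming that a FIFO-evicted partial entry is simply dropped (the HZ value then stays at its older, more conservative value). The class and method names are hypothetical.

```python
from collections import OrderedDict


class CombineCache:
    """Collects per-line max Z values until a whole HZ block is covered."""

    def __init__(self, lines_per_block, max_entries):
        self.full_mask = (1 << lines_per_block) - 1
        self.max_entries = max_entries
        # block id -> [tmp_z, bitmask]; insertion order gives FIFO order
        self.entries = OrderedDict()

    def write_line(self, block_id, line_index, line_max_z):
        """Record one evicted Z line; return (block_id, z) once the block is full."""
        if block_id not in self.entries:
            if len(self.entries) >= self.max_entries:
                # FIFO replacement: the oldest partial entry is dropped (assumption)
                self.entries.popitem(last=False)
            self.entries[block_id] = [line_max_z, 0]
        entry = self.entries[block_id]
        entry[0] = max(entry[0], line_max_z)  # largest Z written so far
        entry[1] |= 1 << line_index           # mark this line as written
        if entry[1] == self.full_mask:
            del self.entries[block_id]
            return block_id, entry[0]          # HZ level 1 can now be updated
        return None
```

Each entry needs one Z value, one comparator and one mask, so the open question of the combining cache size is mainly a question of how many partially written blocks are in flight at once.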

29 Hierarchical Z Buffer
  How to update HZ level 0?
  - For a 2048x2048 viewport, each block of a 2x2 level 0 covers 128x128 HZ level 1 blocks.
  - Comparing the new HZ level 1 value against all the other HZ level 1 values would require too much hardware or too much time.
  - Combining cache for level 0:
    - 4 combining lines (2x2).
    - Stores the furthest Z value written for the level 0 block.
    - Update when the full level 0 block is written.
      - !!! Could never happen !!!

30 Hierarchical Z Buffer
  L0 Combine Cache.
  - Size:
    - 2x2 HZ Level 0 buffer.
    - 256x256 HZ Level 1 buffer.
      - At 2048x2048 the HZ L1 block is 8x8.
    - 128x128 L1 blocks per L0 block.
      - 16 Kbit mask for each L0 combine cache entry !!!!

31 Hierarchical Z Buffer
  Solution:
  - Add an L0 line combine cache.
    - 128-bit mask per line combine cache entry.
    - Size: 4 entries?
    - Update to the L0 combine cache when the line is full.
    - FIFO?
  - 128-bit mask per L0 combine cache entry.
    - 1 Z value per entry, 1 Z comparator per entry.
    - No replacement!!
    - Only update.
    - 1 Kbit for bitmasks.
    - 128 bits for Z values (16 bits).
    - 8 Z comparators (16 bits).
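The storage figures can be checked with a little arithmetic. The sketch assumes 4 line combine entries (the slide marks this with a question mark) plus 4 L0 combine entries for the 2x2 level 0 buffer; with those counts the totals match the 1 Kbit, 128 bit and 8 comparator figures above.

```python
L1_BLOCKS_PER_L0_SIDE = 128   # 128x128 L1 blocks per L0 block
Z_BITS = 16

# One mask bit per L1 block: the size that motivated the two-step scheme.
flat_mask_bits = L1_BLOCKS_PER_L0_SIDE * L1_BLOCKS_PER_L0_SIDE

# Two-step scheme: a 128-bit mask per row of L1 blocks, then a 128-bit
# mask per L0 block with one bit per completed row.
line_entries = 4              # assumed, "4 entries?" on the slide
l0_entries = 2 * 2            # one per L0 block
entries = line_entries + l0_entries

mask_bits = entries * L1_BLOCKS_PER_L0_SIDE
z_value_bits = entries * Z_BITS
comparators = entries
```

The flat mask would cost 16 Kbit per entry, while the two-step version needs 1 Kbit in total for all eight entries.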

32 Hierarchical Z Buffer
  L1 Combine Cache:
  - Size:
    - Z cache line: 8 Z values (256 bits).
    - 2048x2048 viewport:
      - 8x8 L1 blocks.
      - 8-bit bitmask per L1 combine cache entry.
    - 4096x4096 viewport:
      - 16x16 L1 blocks.
      - 16-bit bitmask per L1 combine cache entry.
    - 1 Z value and 1 Z comparator per entry.
    - FIFO replacement policy?

33 Hierarchical Z Buffer
  - Number of entries: 256
    - 4 Kbits for bitmask.
    - 4 Kbits for Z values (16 bit).
    - 256 Z comparators.
  - Fully associative.

34 Hierarchical Z Buffer
  [Diagram: Combine Cache. Labels: Z cache line; max Z; Z lines; Level 1 combine cache; Level 1 block Z; HZ Level 1 Buffer; blocks; Level 0 combine cache; Level 0 block; HZ Level 0 Buffer.]

35 Hierarchical Z Buffer
  Combine cache line: TmpZ | BitMask
  - TmpZ stores the max Z value written in the block.
  - BitMask stores a mask with the written positions in the block.

36 Hierarchical Z Buffer
  Optimized access to HZ from the Tile Evaluators.
  - Child tiles can reuse the HZ Z value read for parent tiles.
  - Add a Z value to each tile in the Tile Buffer.
    - Initialized with the HZ L0 Z value at the proper tile level.
  - Reuse the tile Z value until the tile level is the same as HZ L1.
    - At 8x8, for example, for 8x8 HZ L1 blocks.
  - Final fragment stamps can be smaller than HZ L1 blocks.

37 Hierarchical Z Buffer
  Optimized access from Tile Evaluators:
  - Fetch of HZ L1 is done at 8x8 tile level and reused for the fragment stamp.
  - That implies that there is no latency penalty for fragments.
    - In fact it doesn't have to access the HZ L1 buffer at stamp level.
    - 0 cycles for fragment stamp HZ test.
  - Reduces accesses to the HZ buffer.
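The reuse idea can be illustrated with a toy descent that carries a Z value in each tile and reads HZ L1 only when the tile reaches the L1 block size. Triangle coverage tests are left out to keep the sketch short, and all names (`CountingHZ`, `descend`) and the sample Z values are hypothetical.

```python
class CountingHZ:
    """Toy HZ level 1 buffer that counts how often it is read."""

    def __init__(self, l1_values):
        self.l1_values = l1_values  # (block_x, block_y) -> Z value
        self.reads = 0

    def read_l1(self, x, y, l1_level):
        self.reads += 1
        return self.l1_values[(x >> l1_level, y >> l1_level)]


def descend(x, y, level, tile_z, hz, l1_level, stamp_level, stamps):
    """Split a tile down to stamp size, reusing the Z value stored in the tile."""
    if level == l1_level:
        tile_z = hz.read_l1(x, y, l1_level)  # the only HZ L1 access on this path
    if level == stamp_level:
        stamps.append((x, y, tile_z))         # stamp test uses the inherited Z
        return
    half = 1 << (level - 1)
    for dy in (0, half):
        for dx in (0, half):
            descend(x + dx, y + dy, level - 1, tile_z,
                    hz, l1_level, stamp_level, stamps)


hz = CountingHZ({(0, 0): 10, (1, 0): 20, (0, 1): 30, (1, 1): 40})
stamps = []
# 16x16 tile (level 4) seeded with an L0 value, 8x8 L1 blocks, 2x2 stamps.
descend(0, 0, 4, 99, hz, l1_level=3, stamp_level=1, stamps=stamps)
```

In this run the 16x16 tile produces 64 stamps but only 4 HZ L1 reads, one per 8x8 tile, instead of one read per stamp.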

38 Rasterization

39 Rasterization

40 Rasterization

41 Rasterization

42 Rasterization

43 Rasterization

44 Rasterization

45 Rasterization

