
1 Modular Refinement of H.264
Kermin Fleming

2 What is H.264?
Mobile Devices Low bit-rate Video Decoder
  – Follow on to MPEG-2 and H.26x
Operates on pixel blocks
  – Smaller blocks 4x4, 8x4, 4x8
In-loop deblocking filter
Base profile Bluespec implementation
  – Works on FPGA!

3 H.264 Overview

4 H.264 Modules
NAL unwrap
  – Unwraps network packets
  – Byte stream separated by special tags
Entropy Decoder
  – Decodes various slices, parameters
  – Primarily Golomb encoded
  – Residual data uses CAVLC
Inverse Transform
  – Reconstructs whole blocks
  – Quantized frequency coefficients
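To make the "Golomb encoded" bullet concrete: H.264 syntax elements are largely coded with unsigned Exp-Golomb codes, where N leading zeros and a 1 are followed by an N-bit suffix. The sketch below is only an illustration in Python, not the Bluespec decoder from the talk. The function names and the use of a '0'/'1' string as the bitstream are assumptions made for readability.

```python
def decode_ue(bits, pos=0):
    """Decode one unsigned Exp-Golomb code from a string of '0'/'1' characters.

    Returns (value, next_pos). The string-based bitstream is an illustrative
    assumption; a hardware decoder would work on a shift register instead.
    """
    # Count the leading zeros; their number gives the suffix length.
    leading_zeros = 0
    while bits[pos + leading_zeros] == "0":
        leading_zeros += 1

    # Skip the zeros and the terminating '1', then read the suffix bits.
    start = pos + leading_zeros + 1
    suffix = bits[start:start + leading_zeros]
    if len(suffix) != leading_zeros:
        raise ValueError("bitstream ends inside an Exp-Golomb code")

    value = (1 << leading_zeros) - 1 + (int(suffix, 2) if suffix else 0)
    return value, start + leading_zeros


def decode_all(bits):
    """Decode back-to-back Exp-Golomb codes until the string is consumed."""
    values, pos = [], 0
    while pos < len(bits):
        value, pos = decode_ue(bits, pos)
        values.append(value)
    return values
```

For example, the codes `1`, `010`, `011` and `00100` stand for the values 0, 1, 2 and 3.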

5 H.264 Modules
Intra-prediction
  – Prediction based on previously decoded blocks
  – Corrected by residual
Inter-prediction
  – Correlation between frames
  – Motion vectors
Deblocking filter
  – Removes prediction artifacts
Frame Buffer
  – Maintains cache of previous frames

6 Modular Refinement
Latency insensitive design
  – Data centric
  – Swap functionally equivalent modules
  – Design exploration easy
Bluespec generates control
  – Design timing change?
  – No problem.
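The latency-insensitive idea can be sketched in software: modules talk only through bounded FIFOs and act only when input is available and output has room, so a stage can be swapped for a slower or faster equivalent without changing the results. The Python model below is a rough illustration under that assumption. The `Fifo`, `Stage` and `run` names, the FIFO depths and the two arithmetic functions are all hypothetical, and none of it is taken from the Bluespec design.

```python
from collections import deque


class Fifo:
    """Bounded FIFO; the only channel between stages (hypothetical model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()

    def can_enq(self):
        return len(self.items) < self.capacity

    def can_deq(self):
        return bool(self.items)


class Stage:
    """Applies fn to each token and takes `latency` cycles per token."""

    def __init__(self, fn, latency, inp, out):
        self.fn, self.latency, self.inp, self.out = fn, latency, inp, out
        self.pending = None   # result waiting to be emitted
        self.remaining = 0    # cycles left before the result is ready

    def step(self):
        # Accept a new token only when idle and input is available.
        if self.pending is None and self.inp.can_deq():
            self.pending = self.fn(self.inp.items.popleft())
            self.remaining = self.latency
        # Emit only when the latency has elapsed and the output has room.
        if self.pending is not None:
            self.remaining -= 1
            if self.remaining <= 0 and self.out.can_enq():
                self.out.items.append(self.pending)
                self.pending = None


def run(second_stage_latency, data):
    """Push data through two stages; return (outputs, cycles taken)."""
    src, mid, dst = Fifo(len(data)), Fifo(2), Fifo(len(data))
    src.items.extend(data)
    stages = [
        Stage(lambda x: x + 1, 1, src, mid),
        Stage(lambda x: x * 2, second_stage_latency, mid, dst),
    ]
    cycles = 0
    while len(dst.items) < len(data):
        for stage in reversed(stages):  # downstream first, like a clock edge
            stage.step()
        cycles += 1
    return list(dst.items), cycles
```

Calling `run(1, [1, 2, 3])` and `run(5, [1, 2, 3])` yields the same output list; only the cycle count differs, which is the property that makes swapping modules safe.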

7 Deblocking Filter Details
Block prediction leaves artifacts
Apply a smoothing filter across macroblock boundaries
Highly configurable
Macroblock Filter Order

8 Original Implementation
Store the whole macroblock
Iteratively filter the macroblock
Store and stream left macroblock
Simple to reason about, very like software
BAD!!!!
  – Highly sequential
  – Large storage requirements
  – Wiring:

9 Pipelining
Sequential execution was a problem
Unclear how to pipeline the design
  – Data stored in row-major order
  – Can be rotated to column-major order
16-stage pipeline
  – Horizontal Filter
  – Row-to-Column
  – Vertical Filter
  – Column-to-Row
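The four pipeline steps listed above can be mimicked in a few lines: filter along rows, rotate rows to columns, run the same row filter again (now acting on columns), and rotate back. This is a minimal Python sketch on a small block held as a list of rows. `smooth_row` is a placeholder 3-tap average chosen for illustration, not the H.264 edge filter, and all function names are assumptions.

```python
def transpose(block):
    """Rotate a block from row-major to column-major (and back)."""
    return [list(column) for column in zip(*block)]


def smooth_row(row):
    """Placeholder 1-D filter: rounded (1, 2, 1)/4 average with edge replication.

    This stands in for the real deblocking filter, which is more involved.
    """
    n = len(row)
    return [
        (row[max(i - 1, 0)] + 2 * row[i] + row[min(i + 1, n - 1)] + 2) // 4
        for i in range(n)
    ]


def filter_block(block):
    """Horizontal filter -> row-to-column -> vertical filter -> column-to-row."""
    horizontal = [smooth_row(row) for row in block]   # horizontal pass
    rotated = transpose(horizontal)                   # row-to-column
    vertical = [smooth_row(row) for row in rotated]   # same filter, now on columns
    return transpose(vertical)                        # column-to-row
```

Because the rotation turns columns into rows, a single row-oriented filter serves both directions.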

10 Pipelining
Parallelism Improved
  – Two filtrations per cycle
Memory Reduced
  – 5/8 of macroblock stored
  – Accesses simplified
Fewer Filters
  – Only need one…
Design now far more complex
  – 2x code size

11 Pipeline Issues
Throughput improved, but not perfect
Structural Hazards
  – Loads and Stores to the Above memory
  – Third and Fourth Macroblocks conflict
    Both need to be rotated at the same time
  – Outputting Left Blocks
Pipeline drain
  – Control data shared: pipeline control state

12 Relaxed Memory Ordering
Original Sequential Ordering too conservative
Above data is not immediately used
  – Allowing stores to bypass loads
  – Separate load and store request queues
Stalls eliminated
  – Design complexity stays the same
  – Artificial dependency removed
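A toy timing model shows why separate load and store request queues remove stalls. It assumes a single memory port that issues one request per cycle, and loads that cannot issue before some cycle (for instance because their consumer is not ready). That stall cause, the request tuples and both function names are hypothetical. The reordering is only safe when no store targets an address that a queued load reads, which the sketch checks explicitly.

```python
from collections import deque


def run_single_queue(requests, loads_ready_at):
    """One in-order queue: a blocked load at the head stalls everything behind it."""
    queue = deque(requests)
    cycle = 0
    while queue:
        kind = queue[0][0]
        if kind == "store" or cycle >= loads_ready_at:
            queue.popleft()          # issue the request at the head
        cycle += 1                   # otherwise the port idles this cycle
    return cycle


def run_split_queues(requests, loads_ready_at):
    """Separate queues: stores may bypass loads that are not ready yet."""
    loads = deque(r for r in requests if r[0] == "load")
    stores = deque(r for r in requests if r[0] == "store")

    # Bypassing is only legal when the dependency really is artificial.
    load_addrs = {r[1] for r in loads}
    if any(r[1] in load_addrs for r in stores):
        raise ValueError("store aliases a load; reordering would not be safe")

    cycle = 0
    while loads or stores:
        if loads and cycle >= loads_ready_at:
            loads.popleft()          # loads go first once they can issue
        elif stores:
            stores.popleft()         # otherwise a store uses the port
        cycle += 1
    return cycle
```

With the requests `[("load", 0), ("store", 1), ("store", 2), ("load", 3)]` and loads blocked until cycle 3, the single queue needs 7 cycles and the split queues need 5.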

13 Side Buffering
Frequent conflicts between 4x4 blocks
Store one of them in a side buffer
When the resource is available, release the stored data
  – Sometimes ordering matters, sometimes not
  – Memory acts as a reorder buffer
Encode priority in rule
Deadlock can be a problem…

14 Other Refinement
Pipelined Interpredict rules
  – Chroma interpolation
Improved Interpolator filter implementation
Improved memory subsystem
  – Previously too general
  – Needless crossbar
Interpolation Sampling

15 Results

16 Results
Nearly 60 fps at 1080p
Power, area, and throughput improvements
Fast Deblocking filter implementation
  – Faster than any known implementation
  – Does it really matter?
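To put "60 fps at 1080p" in terms of decoder work, the frame rate can be converted to macroblocks per second. H.264 macroblocks cover 16x16 luma pixels, so a 1920x1080 frame is padded to 68 macroblock rows. The helper names below are assumptions, and the slides give no clock frequency, so the sketch stops at the macroblock rate.

```python
MB_SIZE = 16  # H.264 macroblocks cover 16x16 luma pixels


def macroblocks_per_frame(width, height):
    """Round each dimension up to a whole number of macroblocks."""
    mbs_wide = -(-width // MB_SIZE)   # ceiling division
    mbs_high = -(-height // MB_SIZE)
    return mbs_wide * mbs_high


def macroblocks_per_second(width, height, fps):
    return macroblocks_per_frame(width, height) * fps
```

For 1920x1080 this gives 120 x 68 = 8160 macroblocks per frame, or 489,600 macroblocks per second at 60 fps.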

17 Questions?

