2 Design of a 125 μW, Fully-Scalable MPEG-2 and H.264/AVC Video Decoder for Mobile Applications
Tsu-Ming Liu (1), Ching-Che Chung (1), Chen-Yi Lee (1), Ting-An Lin (2), and Sheng-Zen Wang (2)
(1) National Chiao-Tung University, Hsin-Chu, Taiwan
(2) MediaTek Inc., Hsin-Chu, Taiwan
2006/7/26
3 Outline
Introduction
System Specification
Improved Memory Hierarchy
Low-Power Architectures
Design Flow
Measured Results
Conclusion
4 Motivation
Low-power demands
The power consumption of existing solutions is still too high for portable devices.
The memory system has become a critical factor in the power budget.
High-speed requirements
H.264/AVC requires high-speed modules to handle the extensive data transfers between memory and logic.
[Figure: H.264/AVC core power profiling, divided between SRAM and miscellaneous logic; labeled shares 70% and 30%]
5 Design Contributions
To reduce power consumption:
We exploit a memory hierarchy to reduce memory power consumption.
We develop low-power architectures that lower the working frequency at the cost of only a few additional buffers and one additional logic unit.
Beyond these architecture-level savings, an efficient design flow further reduces power dissipation.
6 Target Specification
Dual standard: H.264/AVC Baseline Profile, Level 4; MPEG-2 Simple Profile, Main Level
High-quality decoding (30 fps, 4:2:0)
Format   Resolution
QCIF     176x144
CIF      352x288
D1       720x…
HD       1280x720
HD       1920x1088
7 System Block Diagram
[Figure: system block diagram. Blocks: system bus; 8MB SDRAM with SDRAM I/F; syntax parser; entropy decoder; 4x4/8x8 IDCT; intra and inter prediction; in-loop/post-loop filter; line-pixel-lookahead unit; slice pixel SRAM; display engine with display I/F]
8 Improved Memory Hierarchy
[Figure: proposed three-level memory hierarchy (1st, 2nd, and 3rd levels), comprising pipeline registers in the functional units (intra prediction, motion compensation, …), the slice SRAM with the LPL unit, and the external SDRAM behind the I/O interface; a 32-bit bypass request path is shown]
The slice SRAM stores rows of pixels.
9 Improved Memory Hierarchy
Line-Pixel-Lookahead (LPL) Unit
We use an LPL unit to eliminate redundant line-pixel data and thereby reduce the required memory space.
Functional unit      Condition
Deblocking filter    bS = 0
Intra prediction     Horizontal and Horizontal-Up prediction
Slice SRAM size: 153.6 kb without the LPL unit, 19.2 kb with the LPL unit (an 8x reduction)
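One way to read the condition table is as a per-unit test on the block just below a stored pixel line: the deblocking filter needs that line only when the top edge has bS > 0, and Intra_4x4 prediction needs it unless the mode is Horizontal (mode 1) or Horizontal-Up (mode 8), which read only the left column. The sketch below encodes that reading. The names `LowerBlockInfo` and `line_needs` are hypothetical, it covers only inter-coded and luma Intra_4x4 blocks (Intra_16x16 is ignored), and it is not the authors' LPL logic.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# H.264 Intra_4x4 modes that predict only from the left column.
INTRA4X4_HORIZONTAL = 1
INTRA4X4_HORIZONTAL_UP = 8
LEFT_ONLY_MODES = {INTRA4X4_HORIZONTAL, INTRA4X4_HORIZONTAL_UP}


@dataclass
class LowerBlockInfo:
    """Hypothetical summary of the 4x4 block directly below a stored line."""
    top_edge_bs: int              # deblocking boundary strength of its top edge (0..4)
    intra4x4_mode: Optional[int]  # Intra_4x4 mode 0..8, or None if inter-coded


def line_needs(info: LowerBlockInfo) -> Tuple[bool, bool]:
    """Return (needed_by_deblocking, needed_by_intra) for the upper pixel line."""
    # bS = 0: the top edge is not filtered, so the deblocking filter skips it.
    needed_by_deblocking = info.top_edge_bs > 0
    # Horizontal / Horizontal-Up prediction never reads the row above.
    needed_by_intra = (
        info.intra4x4_mode is not None
        and info.intra4x4_mode not in LEFT_ONLY_MODES
    )
    return needed_by_deblocking, needed_by_intra


if __name__ == "__main__":
    print(line_needs(LowerBlockInfo(top_edge_bs=0, intra4x4_mode=None)))
    print(line_needs(LowerBlockInfo(top_edge_bs=3, intra4x4_mode=8)))
```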
10 Improved Memory Hierarchy
[Figure: memory power consumption (mW), split into SRAM power and DRAM power, for three cases: without memory hierarchy, with the 3-level memory hierarchy, and with the 3-level memory hierarchy plus the LPL scheme; labeled values 44%, 11%, and 51%]
11 Low-Power Architectures
Motion Compensation (MC)
We exploit data reuse within the interpolation window by allocating content buffers.
[Figure: 4x4 sub-block, 9x9 content buffers, and SDRAM; the content buffers cost 1% of the MC unit]
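The reuse comes from the H.264 six-tap luma interpolation filter: a 4x4 block with a fractional motion vector needs a (4+5)x(4+5) = 9x9 window of reference pixels, and neighbouring 4x4 sub-blocks that share a motion vector have overlapping windows. The sketch below counts reference pixels fetched with and without sharing one buffered window across a partition. It illustrates the data-reuse argument only, assumes fractional motion vectors in both dimensions, and does not model the authors' actual buffer organisation.

```python
# Extra reference pixels per dimension for the H.264 6-tap luma filter
# (2 before and 3 after the block), assuming fractional MVs in x and y.
FILTER_EXTRA = 5


def window_pixels(w: int, h: int) -> int:
    """Reference pixels needed to interpolate one w x h block."""
    return (w + FILTER_EXTRA) * (h + FILTER_EXTRA)


def fetch_without_reuse(w: int, h: int, sub: int = 4) -> int:
    """Every sub x sub block of a w x h partition loads its own window."""
    blocks = (w // sub) * (h // sub)
    return blocks * window_pixels(sub, sub)


def fetch_with_reuse(w: int, h: int) -> int:
    """All sub-blocks of a partition (one shared MV) reuse one window."""
    return window_pixels(w, h)


if __name__ == "__main__":
    for size in (4, 8, 16):
        a = fetch_without_reuse(size, size)
        b = fetch_with_reuse(size, size)
        print(f"{size}x{size}: {a} -> {b} pixels ({100 * (1 - b / a):.1f}% fewer)")
```

For a 16x16 partition, sharing one window reduces the fetched reference pixels from 1296 to 441.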
12 Low-Power Architectures
Deblocking Filter (DF)
We reduce the memory-access overhead of switching between filtering directions by developing novel filtering orders.
[Figure: filtering order over 4x4 sub-blocks and SRAM, with the resulting SRAM access reduction]
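A toy model shows why the filtering order matters. Assume (hypothetically) that each 4x4 block touched by an edge segment is read from SRAM unless it is still held in a single-block register from the previous segment. Filtering edge by edge, as the H.264 standard describes the process, gives no reuse, while finishing each row of vertical segments (and each column of horizontal segments) before moving on lets the q block of one segment serve as the p block of the next. Both orders complete all vertical edges before any horizontal edge, and segments in different rows (or columns) touch disjoint lines of pixels, so they should produce identical output. The model counts luma block reads only, ignores write-backs and chroma, and is not the authors' filtering order.

```python
from typing import Iterator, Optional, Tuple

Block = Tuple[int, int]        # (row, col) of a 4x4 block; -1 means neighbouring MB
Segment = Tuple[Block, Block]  # (p block, q block) on either side of one edge segment


def edge_order() -> Iterator[Segment]:
    """Edge by edge: vertical edges left to right, then horizontal top to bottom."""
    for e in range(4):
        for r in range(4):
            yield (r, e - 1), (r, e)
    for e in range(4):
        for c in range(4):
            yield (e - 1, c), (e, c)


def row_col_order() -> Iterator[Segment]:
    """Same two passes, but each row (column) of segments is finished first."""
    for r in range(4):
        for e in range(4):
            yield (r, e - 1), (r, e)
    for c in range(4):
        for e in range(4):
            yield (e - 1, c), (e, c)


def sram_reads(order: Iterator[Segment]) -> int:
    """Count block reads with a one-block register holding the last q block."""
    held: Optional[Block] = None
    reads = 0
    for p, q in order:
        if p != held:
            reads += 1  # p block is not in the register: read it from SRAM
        reads += 1      # q block is always read from SRAM
        held = q
    return reads


if __name__ == "__main__":
    print("edge order:      ", sram_reads(edge_order()))
    print("row/column order:", sram_reads(row_col_order()))
```

Under this model the row/column order needs 40 block reads per macroblock instead of 64.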
13 Low-Power Architectures
A lower working frequency is sufficient to meet our design specification.
Test conditions: sequence salesman, 1920x1088, 30 fps, YUV 4:2:0
Pipelined stage            Cycles/MB   Working frequency
Preliminary                920         242 MHz
Improved MC                580         152 MHz
Improved DF (this work)    380         100 MHz
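The frequencies follow from the macroblock rate: a 1920x1088 frame has (1920/16) x (1088/16) = 8160 macroblocks, so 30 fps means 244,800 macroblocks per second, and the clock is that rate times the cycles spent per macroblock. The sketch below computes this raw product. The listed frequencies are roughly 7% above it, which we assume reflects some overhead margin.

```python
def required_mhz(width: int, height: int, fps: float, cycles_per_mb: int) -> float:
    """Minimum clock (MHz) to decode width x height at fps in real time."""
    mbs_per_frame = (width // 16) * (height // 16)  # 16x16 macroblocks
    return mbs_per_frame * fps * cycles_per_mb / 1e6


if __name__ == "__main__":
    stages = (("Preliminary", 920), ("Improved MC", 580), ("Improved DF", 380))
    for stage, cycles in stages:
        mhz = required_mhz(1920, 1088, 30, cycles)
        print(f"{stage:12s} {cycles} cycles/MB -> {mhz:.1f} MHz")
```

This gives about 225, 142, and 93 MHz for 920, 580, and 380 cycles/MB, respectively.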
14 Design Flow
A design flow for this video decoder
Phase 1: architectural design (System SPEC, C/C++ model, RTL description; design loop)
1. Improved memory hierarchy (memory size: C)
2. Motion compensation (working frequency: f)
3. Deblocking filter (working frequency: f)
Result: 73% power reduction
Phase 2: synthesis and P&R (RTL Compiler, SoC Encounter; timing/SI closure loop)
1. Physical wire-load model (timing closure)
2. Low-power synthesis
3. Timing-aware and SI-prevention routing
Result: a further 8.2% power reduction
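The flow does not state whether the further 8.2% is measured against the power left after Phase 1 or against the original power. The sketch below computes the overall reduction under both readings; which one applies is an assumption.

```python
def overall_reduction(phase1: float, phase2: float, relative: bool = True) -> float:
    """Combine two fractional power reductions.

    relative=True: phase2 is a fraction of the power remaining after phase1.
    relative=False: phase2 is a fraction of the original power (additive).
    """
    if relative:
        return 1.0 - (1.0 - phase1) * (1.0 - phase2)
    return phase1 + phase2


if __name__ == "__main__":
    print(f"relative reading: {overall_reduction(0.73, 0.082):.1%}")
    print(f"additive reading: {overall_reduction(0.73, 0.082, relative=False):.1%}")
```

The overall reduction is about 75.2% under the relative reading and 81.2% under the additive one.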
15 Measured Results
[Figure: chip photo; dimension label 3.9 mm]
16 Measured Results
Chip Summary
Specification                  Dual standard: MPEG-2, H.264/AVC
Technology                     0.18 μm 1P6M CMOS
Package                        CQFP208
Logic gates                    303.78K
Memory (internal)              22.75 Kb SRAMs
Memory (external)              4 MB x 2 SDRAMs
Max. system clock              100 MHz
Max. processing throughput     (MPixels/sec)
17 Measured Results
Power Measurement
[Table: core power dissipation for MPEG-2 and H.264/AVC decoding at 1.8 V and 1.0 V]
18 Measured Results
Power Measurement
[Figure: voltage scaling, with measured accuracy noted; maximum working frequency (MHz) and H.264 core power (μW) versus supply voltage (V); labeled points 112 MHz, 31 MHz, 1.15 MHz, 225 μW, and 125 μW]
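Voltage scaling is effective because, to first order, CMOS dynamic power is P = αCV²f: it falls with the square of the supply voltage and linearly with the clock frequency. The sketch below evaluates that ratio. It ignores leakage and the drop in maximum frequency at lower voltage, so it is only a rough guide to the measured curves.

```python
def dynamic_power_ratio(v_new: float, v_old: float, f_new: float, f_old: float) -> float:
    """P_new / P_old for P = alpha * C * V**2 * f, with alpha and C unchanged."""
    return (v_new / v_old) ** 2 * (f_new / f_old)


if __name__ == "__main__":
    # Same clock, supply lowered from 1.8 V to 1.0 V.
    print(f"{dynamic_power_ratio(1.0, 1.8, 1.0, 1.0):.3f}")
```

At a fixed clock, lowering the supply from 1.8 V to 1.0 V alone cuts dynamic power to about 31% of its original value.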
19 Conclusion
An MPEG-2 and H.264/AVC video decoder is developed to meet dual-standard requirements.
Large power savings are obtained through both the improved memory hierarchy and the low-power architectures, and the power is reduced further by an EDA-based design flow.
Sub-mW power consumption can be achieved for real-time decoding of MPEG-2 or H.264/AVC video sequences in mobile applications at a 1 V operating voltage.
20 Thanks for your attention!