Design of a 125 µW, Fully-Scalable MPEG-2 and H.264/AVC Video Decoder for Mobile Applications


2 Design of a 125 µW, Fully-Scalable MPEG-2 and H.264/AVC Video Decoder for Mobile Applications
Tsu-Ming Liu (1), Ching-Che Chung (1), Chen-Yi Lee (1), Ting-An Lin (2), and Sheng-Zen Wang (2)
(1) National Chiao-Tung University, Hsin-Chu, Taiwan
(2) MediaTek Inc., Hsin-Chu, Taiwan
2006/7/26

3 Outline
- Introduction
- System Specification
- Improved Memory Hierarchy
- Low-Power Architectures
- Design Flow
- Measured Results
- Conclusion

4 Motivation
Low power demands
- The power consumption of existing solutions is still too high for portable devices.
- The memory system becomes a critical factor in the power budget.
High speed requirements
- H.264/AVC requires high-speed modules to sustain the extensive traffic between memory and logic.
[Figure: H.264/AVC core power profiling, a pie chart with SRAM and Misc. segments labelled 70% and 30%]

5 Design Contributions
To reduce power consumption:
- We exploit the memory hierarchy to reduce memory power consumption.
- We develop low-power architectures that lower the working frequency at the cost of only a few additional buffers and one additional logic unit.
- Beyond the architecture-level power reduction, an efficient design flow further reduces power dissipation.

6 Target Specification

Format    Resolution    MPixels/sec
QCIF      176x144       0.57
CIF       352x288       4.56
D1        720x480       15.5
720HD     1280x720      41.5
1080HD    1920x1088     94.0

Dual Standard
- H.264/AVC Baseline Profile, Level 4
- MPEG-2 Simple Profile, Main Level
High Quality Decoding (30fps, 4:2:0)
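The MPixels/sec column can be reproduced with a short calculation. The sketch below is a reading of the table, not something the slide states: it assumes the figures count luma plus chroma samples under 4:2:0 sampling (1.5 samples per luma pixel), and that QCIF is taken at 15 fps, as in the later QCIF@15fps measurement, while the other formats use 30 fps. The names `TARGET_FORMATS` and `throughput_mpixels` are hypothetical. Under those assumptions the results agree with the table to within about 0.1 MPixels/sec.

```python
# Target formats from the specification table: (width, height, frames per second).
# QCIF at 15 fps is an assumption that reproduces the 0.57 figure;
# the remaining formats use the 30 fps stated on the slide.
TARGET_FORMATS = {
    "QCIF": (176, 144, 15),
    "CIF": (352, 288, 30),
    "D1": (720, 480, 30),
    "720HD": (1280, 720, 30),
    "1080HD": (1920, 1088, 30),
}

# 4:2:0 carries one luma sample plus two quarter-resolution chroma samples
# per pixel position, i.e. 1.5 samples per luma pixel.
SAMPLES_PER_PIXEL_420 = 1.5


def throughput_mpixels(width, height, fps, samples_per_pixel=SAMPLES_PER_PIXEL_420):
    """Return the number of decoded samples per second, in millions."""
    return width * height * samples_per_pixel * fps / 1e6


if __name__ == "__main__":
    for name, (w, h, fps) in TARGET_FORMATS.items():
        rate = throughput_mpixels(w, h, fps)
        print(f"{name:7s} {w}x{h} @ {fps}fps: {rate:.2f} MPixels/sec")
```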

7 System Block Diagram
[Figure: system block diagram. Blocks: System Bus, 8MB SDRAM, SDRAM I/F, Display I/F, Display Engine, Syntax Parser, Entropy Decoder, 4x4/8x8 IDCT, Intra/Inter Prediction, In/Post-Loop Filter, Line-Pixel-Lookahead, Slice Pixel SRAM]

8 Improved Memory Hierarchy
Proposed three-level memory hierarchy
[Figure: three-level memory hierarchy (1st, 2nd and 3rd level) comprising SDRAM, I/O interface, Slice SRAM with LPL unit, and pipeline registers feeding Intra Pred., Motion Comp., and other units; a 32-b bypass path and a request signal; width labels 16 and 24]
Slice SRAM stores rows of pixels.

9 Improved Memory Hierarchy
Line-Pixel-Lookahead (LPL) Unit
- We exploit an LPL unit to eliminate redundant data and thereby reduce memory space.

Line-Pixel-Lookahead
Functional Unit      Condition
Deblocking Filter    bS=0
Intra Prediction     Horizontal Prediction, Horizontal-Up Prediction

Slice SRAM size: 153.6kb w/o LPL unit, 19.2kb w/ LPL unit

10 Improved Memory Hierarchy
Memory Power Consumption
[Figure: bar chart of memory power consumption (mW, axis 20 to 60), split into SRAM power and DRAM power, for three configurations: w/o memory hierarchy, 3-level memory hierarchy, and 3-level memory hierarchy + LPL scheme. Percentage labels: 44%, 11%, 51%]

11 Low-Power Architectures
Motion Compensation (MC)
- We exploit data reuse within the interpolation window by allocating content buffers.
[Figure: 4x4 sub-blocks 0 to 7 fetched from SDRAM into 6x9 content buffers. Label: 1% cost of MC]

12 Low-Power Architectures
Deblocking Filter (DF)
- We reduce the access overhead of different filtering directions by developing novel filtering orders.
[Figure: numbered edge filtering orders over 4x4 sub-blocks, with the corresponding SRAM accesses. Label: 50% access reduction]

13 Low-Power Architectures
A lower working frequency is sufficient to meet our design specification.
Test conditions: sequence salesman, resolution 1920x1088, frame rate 30fps, YUV 4:2:0

Design                     Pipelined Stage    Working Frequency
Preliminary                920 cycles/MB      242MHz
Improved MC                580 cycles/MB      152MHz
Improved DF (This Work)    380 cycles/MB      100MHz
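The link between cycles per macroblock and working frequency can be checked with a small calculation. This is a sketch under stated assumptions rather than the authors' own derivation: it assumes 16x16 macroblocks and pairs the cycle counts with the clock figures in the order 920/242MHz, 580/152MHz, 380/100MHz. The function names are hypothetical. The result is only a lower bound (cycles per MB times MBs per second); the clock figures quoted on the slide sit roughly 7% above these bounds, and the slide does not say what accounts for the difference.

```python
def macroblocks_per_second(width=1920, height=1088, fps=30, mb_size=16):
    """Number of macroblocks to decode per second (16x16 MBs assumed)."""
    return (width // mb_size) * (height // mb_size) * fps


def min_clock_mhz(cycles_per_mb, width=1920, height=1088, fps=30):
    """Lower bound on the clock needed to sustain the given cycles per MB."""
    return cycles_per_mb * macroblocks_per_second(width, height, fps) / 1e6


if __name__ == "__main__":
    # (cycles/MB, clock quoted on the slide in MHz); the pairing is an assumption.
    design_points = [(920, 242), (580, 152), (380, 100)]
    for cycles, quoted in design_points:
        bound = min_clock_mhz(cycles)
        print(f"{cycles} cycles/MB: lower bound {bound:.1f} MHz, slide quotes {quoted} MHz")
```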

14 Design Flow
A design flow for this video decoder
Stages: System SPEC, Architectural Design, Synthesis, P&R
Models and tools: C/C++ Model, RTL Description, RTL Compiler, SoC Encounter

Phase 1 (Design Loop): 73% power reduction
1. Improved Memory Hierarchy (memory size: C)
2. Motion Compensation (working frequency: f)
3. Deblocking Filter (working frequency: f)

Phase 2 (Timing/SI Closure Loop): further 8.2% power reduction
1. Physical wire-load model (timing closure)
2. Low-power synthesis
3. Timing-aware and SI-prevention routing

15 Measured Results
[Figure: chip micrograph; dimension label 3.9 mm]

16 Measured Results
Chip Summary

Specification                 Dual: MPEG-2 SP@ML, H.264/AVC BL@L4
Technology                    0.18 µm 1P6M CMOS
Package                       CQFP208
Logic Gates                   303.78K
Internal Memory               22.75Kb SRAMs
External Memory               4MB x 2 SDRAMs
Max. System Clock             100MHz
Max. Processing Throughput    101.04 MPixels/sec

17 Measured Results
Power Measurement

Core power dissipation         MPEG-2      H.264/AVC
1.8V Core   1080HD@100MHz      89.46mW     102.3mW
            720HD@45MHz        41.76mW     48.24mW
            D1@16.6MHz         15.60mW     18.54mW
            CIF@4.6MHz         4.68mW      4.86mW
            QCIF@1.15MHz       0.194mW     0.225mW
1.0V Core   QCIF@1.15MHz       0.108mW     0.125mW
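One way to compare these operating points is energy per decoded sample, obtained by dividing core power by throughput. The sketch below is an illustration and not a metric reported in the slides: it combines the H.264/AVC power figures at 1.8V with the MPixels/sec column of the target specification, and the names used are hypothetical. Since 1 mW divided by 1 MPixel/sec is 1 nJ, the ratio comes out directly in nanojoules per sample.

```python
# H.264/AVC core power at 1.8 V, in mW (from the power measurement table).
H264_POWER_MW = {
    "1080HD": 102.3,
    "720HD": 48.24,
    "D1": 18.54,
    "CIF": 4.86,
    "QCIF": 0.225,
}

# Throughput in MPixels/sec (from the target specification table).
THROUGHPUT_MPIXELS = {
    "1080HD": 94.0,
    "720HD": 41.5,
    "D1": 15.5,
    "CIF": 4.56,
    "QCIF": 0.57,
}


def energy_per_sample_nj(power_mw, throughput_mpixels):
    """Energy per decoded sample in nJ: mW / (MPixels/sec) = nJ/sample."""
    return power_mw / throughput_mpixels


if __name__ == "__main__":
    for fmt, power in H264_POWER_MW.items():
        energy = energy_per_sample_nj(power, THROUGHPUT_MPIXELS[fmt])
        print(f"{fmt:7s} {energy:.3f} nJ/sample")
```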

18 Measured Results
Power Measurement
- Measured accuracy:
- Voltage scaling
[Figure: maximum working frequency (MHz) versus core voltage from 1.8V down to 1.0V; labels 112MHz and 31MHz]
[Figure: H.264 core power (µW) versus core voltage from 1.8V down to 1.0V for QCIF@15fps at 1.15MHz; labels 225 µW and 125 µW]
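For context, the measured voltage scaling can be set against the textbook first-order model of CMOS dynamic power, in which power is proportional to C·V²·f. The sketch below is an idealised estimate, not the authors' analysis: it assumes the 225 µW and 125 µW labels correspond to 1.8V and 1.0V (consistent with the QCIF@1.15MHz entries in the power table), holds frequency and switched capacitance fixed, and ignores leakage and any other voltage-independent contribution. The function name is hypothetical. The measured 1.0V figure lies above this idealised estimate; the slide gives no breakdown of the difference.

```python
def scaled_dynamic_power(p_ref, v_ref, v_new):
    """Scale a reference power by (v_new / v_ref)^2.

    First-order dynamic-power model at fixed frequency and capacitance;
    leakage and other components are deliberately not modelled.
    """
    return p_ref * (v_new / v_ref) ** 2


if __name__ == "__main__":
    measured_1v8_uw = 225.0  # H.264/AVC, QCIF@1.15MHz, 1.8 V core
    measured_1v0_uw = 125.0  # H.264/AVC, QCIF@1.15MHz, 1.0 V core

    estimate_uw = scaled_dynamic_power(measured_1v8_uw, v_ref=1.8, v_new=1.0)
    print(f"Idealised V^2 estimate at 1.0 V: {estimate_uw:.1f} uW")
    print(f"Measured at 1.0 V:               {measured_1v0_uw:.1f} uW")
```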

19 Conclusion
- An MPEG-2 SP@ML and H.264/AVC BL@L4 video decoder is developed to meet dual-standard requirements.
- The large saving in power consumption is attained through both the improved memory hierarchy and the low-power architectures, and power can be reduced further through EDA tools.
- Sub-mW power consumption can be achieved when decoding MPEG-2 or H.264/AVC video sequences in real time for mobile applications at a 1V operating voltage.

20 Thanks for your attention!

