Design of a 125 µW, Fully-Scalable MPEG-2 and H.264/AVC Video Decoder for Mobile Applications

Presentation transcript:

1

2 Design of a 125 µW, Fully-Scalable MPEG-2 and H.264/AVC Video Decoder for Mobile Applications
Tsu-Ming Liu (1), Ching-Che Chung (1), Chen-Yi Lee (1), Ting-An Lin (2), and Sheng-Zen Wang (2)
(1) National Chiao-Tung University, Hsin-Chu, Taiwan
(2) MediaTek Inc., Hsin-Chu, Taiwan
2006/7/26

3 Outline
- Introduction
- System Specification
- Improved Memory Hierarchy
- Low-Power Architectures
- Design Flow
- Measured Results
- Conclusion

4 Motivation
Low power demands
- The power consumption of existing solutions is still too high for portable devices. The memory system has become a critical factor in the power budget.
High speed requirements
- H.264/AVC requires high-speed modules to handle the extensive traffic between memory and logic.
[Figure: H.264/AVC core power profiling; pie chart with segments "SRAM" and "Misc.", labeled 70% and 30%]

5 Design Contributions
To reduce power consumption:
- We exploit the memory hierarchy to reduce memory power consumption.
- We develop low-power architectures that lower the working frequency at the cost of only a few additional buffers and one additional logic unit.
- Beyond the power reduction achieved at the architectural level, an efficient design flow further reduces power dissipation.

6 Target Specification
Dual standard
- H.264/AVC Baseline Profile, Level 4
- MPEG-2 Simple Profile, Main Level
High-quality decoding (30 fps, 4:2:0)
Format   Resolution   MPixels/sec
QCIF     176x
CIF      352x
D1       720x
HD       1280x
HD       1920x
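The MPixels/sec column follows directly from resolution and frame rate. The sketch below is an illustration only, not taken from the slides: the function names are hypothetical, and it uses the 1920x1088 resolution quoted on the later throughput slide together with the 30 fps stated here. It assumes 16x16 macroblocks, as in both MPEG-2 and H.264/AVC.

```python
def pixel_rate_mpixels(width, height, fps=30):
    """Luma pixel throughput in MPixels/sec for a given frame size and rate."""
    return width * height * fps / 1e6


def macroblock_rate(width, height, fps=30, mb_size=16):
    """Macroblocks per second, rounding each dimension up to whole macroblocks."""
    mbs_wide = -(-width // mb_size)   # ceiling division
    mbs_high = -(-height // mb_size)
    return mbs_wide * mbs_high * fps


if __name__ == "__main__":
    # 1920x1088 at 30 fps, the largest format in the target specification
    print(pixel_rate_mpixels(1920, 1088))  # MPixels/sec
    print(macroblock_rate(1920, 1088))     # macroblocks/sec
```

For 1920x1088 at 30 fps this gives about 62.7 MPixels/sec, or 244,800 macroblocks per second.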

7 System Block Diagram
[Figure: block diagram. Blocks: System Bus, 8MB SDRAM, SDRAM I/F, Display I/F, Display Engine, Syntax Parser, Entropy Decoder, 4x4/8x8 IDCT, Intra/Inter Prediction, In/Post-Loop Filter, Line-Pixel-Lookahead, Slice Pixel SRAM]

8 Improved Memory Hierarchy
Proposed three-level memory hierarchy
[Figure: 1st level, 2nd level, 3rd level. Blocks: SDRAM, I/O Interface, Slice SRAM, LPL Unit, Pipeline Register, Intra Pred., Motion Comp., ... Signal labels: 16, 24, 32-b bypass, request i]
Slice SRAM stores rows of pixels

9 Improved Memory Hierarchy
Line-Pixel-Lookahead (LPL) Unit
- We exploit an LPL unit to eliminate redundant data and thereby reduce memory space.
Line-Pixel-Lookahead conditions
Functional Unit      Condition
Deblocking Filter    bS=0
Intra Prediction     Horizontal Prediction, Horizontal-Up Prediction
[Figure: intra prediction modes Horizontal and Horizontal-Up]
[Figure: without LPL unit: Slice SRAM (153.6 kb); with LPL unit: Slice SRAM (19.2 kb) plus LPL Unit]

10 Improved Memory Hierarchy
Memory Power Consumption
[Figure: bar chart of memory power consumption (mW), split into SRAM power and DRAM power, for three configurations: w/o Memory Hierarchy, 3-level Memory Hierarchy, and 3-level Memory Hierarchy + LPL Scheme. Percentage labels: 44%, 11%, 51%]

11 Low-Power Architectures
Motion Compensation (MC)
- We exploit data reuse within the interpolation window by allocating content buffers.
[Figure: x4 sub-block, x9 content buffers, SDRAM; label: 1% cost of MC]
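To see why buffering the interpolation window pays off, the following sketch counts how many reference pixels two neighboring sub-blocks share. It is an illustration under stated assumptions, not the authors' buffer organization: it assumes the H.264 6-tap luma half-sample filter, fractional motion in both directions, and two vertically adjacent 4x4 sub-blocks that use the same motion vector. The names `window` and `reuse_fraction` are hypothetical.

```python
TAPS = 6  # H.264 luma half-sample interpolation uses a 6-tap filter


def window(x, y, block=4, taps=TAPS):
    """Reference pixel coordinates needed to interpolate a block at (x, y).

    A 6-tap filter reaches 2 pixels before and 3 pixels after the block
    in each direction, so a 4x4 block needs a 9x9 window.
    """
    before = taps // 2 - 1
    after = taps // 2
    return {
        (i, j)
        for i in range(x - before, x + block + after)
        for j in range(y - before, y + block + after)
    }


def reuse_fraction(first, second):
    """Fraction of the second window already fetched for the first."""
    return len(first & second) / len(second)


if __name__ == "__main__":
    upper = window(0, 0)
    lower = window(0, 4)  # sub-block directly below, same motion vector assumed
    print(len(upper))                    # pixels in one window
    print(len(upper & lower))            # pixels shared by both windows
    print(reuse_fraction(upper, lower))  # share of the second fetch avoided
```

Under these assumptions each window holds 81 pixels and 45 of them are shared, so more than half of the second sub-block's reference fetch could be served from an on-chip buffer instead of SDRAM.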

12 Low-Power Architectures
Deblocking Filter (DF)
- We reduce the access overhead of switching between filtering directions by developing novel filtering orders.
[Figure: 4x4 sub-block, SRAM; label: % access reduction!!]

13 Low-Power Architectures
A lower working frequency is sufficient to meet our design specification.
Sequence      salesman
Resolution    1920x1088
Frame rate    30 fps
YUV           4:2:0
[Figure: pipelined stage cost and working frequency. Cycle counts: 920 cycles/MB, 580 cycles/MB, 380 cycles/MB. Frequencies: 100 MHz, 152 MHz, 242 MHz. Labels: This Work, Preliminary, Improved MC, Improved DF]
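The link between cycles per macroblock and working frequency can be checked with a short calculation. This is a back-of-the-envelope sketch, not the authors' derivation: it assumes 16x16 macroblocks, the 1920x1088, 30 fps test condition above, and no per-frame or per-slice overhead. The function name `required_mhz` is hypothetical.

```python
def required_mhz(cycles_per_mb, width=1920, height=1088, fps=30, mb_size=16):
    """Lower bound on clock frequency (MHz) to decode in real time.

    Assumes every macroblock costs exactly cycles_per_mb cycles and that
    there is no additional per-frame overhead.
    """
    mbs_per_second = (width // mb_size) * (height // mb_size) * fps
    return cycles_per_mb * mbs_per_second / 1e6


if __name__ == "__main__":
    for cycles in (920, 580, 380):
        print(cycles, round(required_mhz(cycles), 1))
```

The bounds come out at roughly 225 MHz, 142 MHz, and 93 MHz. The three frequencies on the slide (242, 152, and 100 MHz) stand in nearly the same ratio to 920, 580, and 380 cycles/MB, which suggests that pairing, with each slide figure a few percent above the bound; the slides do not say what accounts for the difference.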

14 Design Flow
A design flow for this video decoder
Flow: System SPEC -> Architectural Design -> Synthesis -> P&R
Representations and tools: C/C++ Model, RTL Description, RTL Compiler, SoC Encounter
Phase 1 (Design Loop): 73% power reduction
1. Improved Memory Hierarchy (memory size: C)
2. Motion Compensation (working frequency: f)
3. Deblocking Filter (working frequency: f)
Phase 2 (Timing/SI Closure Loop): further 8.2% power reduction
1. Physical wire-load model (timing closure)
2. Low-power synthesis
3. Timing-aware and SI-prevention routing

15 Measured Results
[Figure: chip micrograph; dimension label: 3.9 mm]

16 Measured Results
Chip Summary
Specification                 Dual MPEG-2, H.264/AVC
Technology                    0.18 µm 1P6M CMOS
Package                       CQFP208
Logic Gates                   303.78K
Internal Memory               22.75 Kb SRAMs
External Memory               4 MB x 2 SDRAMs
Max. System Clock             100 MHz
Max. Processing Throughput    MPixels/sec

17 Measured Results
Power Measurement
Core power dissipation    MPEG-2    H.264/AVC
1.8 V
1.0 V

18 Measured Results
Power Measurement
- Measured accuracy:
- Voltage scaling
[Figure: max. working frequency (MHz) versus supply voltage (V), and H.264 core power (µW) versus supply voltage (V). Data labels: 225 µW, 125 µW, 1.15 MHz, 31 MHz, 112 MHz]
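As background for the voltage-scaling measurement, the standard first-order model of CMOS dynamic power is sketched below. It is a textbook approximation, not a formula from the slides, and it ignores leakage and short-circuit power. Here \alpha is the switching activity, C the switched capacitance, V_{DD} the supply voltage, and f the clock frequency.

```latex
P_{\mathrm{dyn}} \approx \alpha\, C\, V_{DD}^{2}\, f
\qquad\Rightarrow\qquad
\frac{P_{\mathrm{dyn}}(1.0\,\mathrm{V})}{P_{\mathrm{dyn}}(1.8\,\mathrm{V})}
= \left(\frac{1.0}{1.8}\right)^{2} \approx 0.31
\quad (\text{same } \alpha,\ C,\ f)
```

Under this model, lowering the supply from 1.8 V to 1.0 V at an unchanged clock frequency would cut dynamic power to roughly 31% of its original value, and any reduction in working frequency lowers it further in proportion.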

19 Conclusion
- An MPEG-2 and H.264/AVC video decoder has been developed to meet dual-standard requirements.
- The large saving in power consumption comes from both the improved memory hierarchy and the low-power architectures, and power is reduced further through the EDA tool flow.
- Sub-mW power consumption can be achieved when decoding MPEG-2 or H.264/AVC video sequences in real time for mobile applications at a 1 V operating voltage.

20 Thanks for your attention!