NCTU, EE, Vision Lab
Implementation and Parallelization of H.264 Based System on Multi-DSPs Board
陳奕安
2008.06.11

Presentation transcript:


Outline
- System architecture
- Multithreading of this system
- Reference Framework 5
- Parallelism of H.264
- Memory issue

System Architecture
[Figure: PC 1 with MEX Board 1 (Capture Frame, H.264 Encode, Send to Network) and PC 2 with MEX Board 2 (Receive from Network, H.264 Decode, Display).]

System Architecture
[Figure: task pipeline between camera and computer: Input task, H.264 encode processing task, TX networking task, RX networking task, H.264 decode processing task, Output task.]

Host/MEX Communication
Data transfer from the 4 DSPs (SDRAM) to PCI [7]. Steps as listed in the flowchart:

MEX (DSP side):
- DSP started: fill memory
- Initialize transfer
- DSP to PCI transfer request
- Start transfer
- Transfer finished
- Set DSP FIFO direction
- Set FIFO Full Flag value
- DSP FIFO is reset
- Start EDMA
- Unreset DSP1 FIFO
- Clear PCI interrupt

PC (PCI side):
- PCI started: wait for interrupt
- Initialize transfer
- PCI to DSP start transfer request
- Wait for transfer finished
- Transfer finished
- Set transfer size
- Set PCI FIFO direction
- Select DSP data sources
- Set transfer destination address
- Start PCI FIFO
- Clear DSP interrupt

Host/MEX Communication
[Figure labels: Data, Image.]


Networking of H.264 Video
- H.264 high-level architecture
[Figure: H.264 VCL and NAL [6]. Blocks shown: Application, Video Coding Layer (VCL), Network Abstraction Layer (NAL), bitstream adaptation, packet adaptation, AVC/H.264 transport. Data shown: reconstructed picture, VCL data, parameter sets, supplemental enhancement information, NAL unit. Transport targets: H.320 system, MPEG-2 system, AVC storage, RTP payload.]

Networking of H.264 Video
- Video packetization (TMS320C6000 Network Developer's Kit)

Layer               Packet contents
Application layer   Video packet (NAL unit of H.264)
Session layer       RTP header + video packet
Transport layer     UDP header + RTP header + video packet
Network layer       IP header + UDP header + RTP header + video packet
Data link layer     MAC header + IP header + UDP header + RTP header + video packet
Physical layer
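The packetization slide stacks an RTP header on each H.264 NAL unit before handing it to UDP. A minimal sketch of that step follows. It assumes one NAL unit per RTP packet, the 12-byte fixed RTP header defined in RFC 3550, and a dynamic payload type of 96. The function names are hypothetical, and the slides do not say which payload format the system actually uses.

```python
import struct

RTP_VERSION = 2
PAYLOAD_TYPE = 96  # assumption: a dynamic payload type, not taken from the slides


def build_rtp_packet(nal_unit, seq, timestamp, ssrc, marker=False):
    """Prefix one NAL unit with a 12-byte fixed RTP header (hypothetical helper)."""
    byte0 = RTP_VERSION << 6                      # no padding, no extension, zero CSRCs
    byte1 = (int(marker) << 7) | PAYLOAD_TYPE     # marker bit + payload type
    header = struct.pack("!BBHII", byte0, byte1,
                         seq & 0xFFFF,            # 16-bit sequence number
                         timestamp & 0xFFFFFFFF,  # 32-bit media timestamp
                         ssrc & 0xFFFFFFFF)       # 32-bit stream source id
    return header + nal_unit


def parse_rtp_packet(packet):
    """Split a packet built above back into header fields and the NAL unit."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "nal_unit": packet[12:],
    }
```

The resulting bytes would be the payload of a UDP datagram; the IP and MAC headers in the table are added by the network stack below.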


I/O buffer management
- Input buffers
- Output buffers
[Figure: buffer queues with Head and Tail pointers; buffers being filled are marked "Inputting" and buffers being drained are marked "Outputting".]

I/O buffer management
- Input / output buffers
[Figure: successive states of the input and output buffer queues, each with Head and Tail pointers and the buffers currently marked "Inputting" and "Outputting".]
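The buffer-management slides show queues tracked by Head and Tail pointers. One common way to realize this is a fixed-size ring buffer, sketched below. The class name, the slot count, and the convention that Tail marks the next slot to fill while Head marks the next slot to drain are assumptions; the slides do not define them.

```python
class FrameRing:
    """Fixed-size ring of buffer slots (hypothetical illustration)."""

    def __init__(self, slots):
        self.buf = [None] * slots
        self.head = 0   # assumption: next slot to output (oldest filled slot)
        self.tail = 0   # assumption: next slot to input (first free slot)
        self.count = 0  # number of filled slots

    def put(self, frame):
        """Input side: store a frame at Tail; return False when the ring is full."""
        if self.count == len(self.buf):
            return False
        self.buf[self.tail] = frame
        self.tail = (self.tail + 1) % len(self.buf)
        self.count += 1
        return True

    def get(self):
        """Output side: take the frame at Head; return None when the ring is empty."""
        if self.count == 0:
            return None
        frame = self.buf[self.head]
        self.buf[self.head] = None
        self.head = (self.head + 1) % len(self.buf)
        self.count -= 1
        return frame
```

Returning False or None instead of blocking lets the input task drop or retry a frame when the encoder falls behind; a real system might block on a semaphore instead.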

System Architecture
- Multithreading of this system
[Figure: task pipeline between camera and computer: Input task, H.264 encode processing task, TX networking task, RX networking task, H.264 decode processing task, Output task.]

Reference framework for DSP
- Reference Framework 5 (RF5): DSP/BIOS, TMS320 DSP Algorithm Standard
- Processing flow of RF5
[Figure: within a task, a Split stage feeds three channels made of cells (F0 V0, F1 V1, F2 V2), followed by a Join stage. Legend: cell, channel, task; Fi and Vi are XDAIS algorithms.]

Reference framework for DSP
- Data communication of RF5
  - SIO: between a task and a device
  - SCOM: between a task and a task
[Figure: a device driver and a task exchange data through an SIO object (data buffer, data pointer). A writer task and a reader task exchange SCOM messages through an SCOM queue; each message carries a data pointer to a data buffer.]

Reference framework for DSP
- Data communication of RF5
  - ICC: between a cell and a cell
[Figure: cells with "in" and "out" lists. Legend: data buffer; data pointer; cell; ICC object describing a buffer; element in a list of pointers to ICC objects.]

Reference framework for DSP
- Application control of RF5
  - A task receives both SCOM messages and control messages.
[Figure: a task with an SCOM queue for data messages (SCOM message) and an MBX mailbox for control messages.]
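The pattern on this slide, one task fed by a data-message queue and a separate control mailbox, can be illustrated with a small host-side simulation. This is only a sketch using Python's standard `queue` module, not the DSP/BIOS SCOM or MBX API. The function name, the message layout, and the `qp` setting are hypothetical.

```python
import queue


def run_task_once(data_queue, control_mailbox, settings):
    """One iteration of a task that serves both a control mailbox and a data queue."""
    # Apply every pending control message first, without blocking.
    while True:
        try:
            name, value = control_mailbox.get_nowait()
        except queue.Empty:
            break
        settings[name] = value

    # Then take at most one data message; it carries a reference to a buffer.
    try:
        message = data_queue.get_nowait()
    except queue.Empty:
        return None
    return (message["buffer"], settings["qp"])
```

Keeping control traffic in its own mailbox means a parameter change takes effect before the next data buffer is processed, without mixing the two message types in one queue.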

System Architecture
- The present system
[Figure: Input task, H.264 encode processing task, and TX networking task, with a Control task and Rx. Data labels between tasks: Frame i, Frame i+1, Slice, NAL.]

System Architecture
- Multithreading of this system
[Figure: Input task, H.264 encode processing task, and TX networking task, with a Control task and Rx. Data labels between tasks: Frame i, Frame i+1, MB, NAL.]

Parallelizing H.264
- Task-level decomposition
  - Divide the algorithm into balanced tasks
  - Accelerate each task
- Data-level decomposition
  - GOP-level parallelism
  - Frame-level parallelism
  - Slice-level parallelism
  - Macroblock-level parallelism

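H.264 Encoder Block Diagram
[Figure: encoder block diagram.
Forward path: Fn (current) minus the prediction P gives the residual Dn, then T, Q, X, reorder, entropy encode, NAL.
Prediction P: inter (ME and MC from the reference F'n-1) or intra (choose intra prediction, intra prediction).
Reconstruction path: X, Q^-1, T^-1 give D'n; D'n plus P gives uF'n, then filter, F'n (reconstructed).]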

H.264 Decoder Block Diagram
[Figure: decoder block diagram. NAL, entropy decode, reorder, Q^-1, T^-1 give D'n; D'n plus the prediction P (inter: MC from the reference F'n-1; intra: intra prediction) gives uF'n, then filter, F'n (reconstructed).]

Task-level Decomposition
- Task profile for H.264 [2]
[Figure: task profile chart.]

Parallelizing H.264
- H.264 data structure

Level               Contents
Video sequence      GOP0, GOP1, GOP2, ..., GOPn
Group of pictures   F0, F1, F2, ..., Fn
Frame               Slice 0, Slice 1, Slice 2, ..., Slice 3
Slice               MB0, MB1, MB2, ..., MBn
Macroblock          Y, Cb, Cr

Data-level Decomposition
- GOP-level parallelism
  - High latency, large memory
- Frame-level parallelism
  - I, P, B frame imbalance
- Slice-level parallelism
  - Bitrate increases
- Macroblock-level parallelism

Macroblock-level Parallelism
- Spatial parallelism
- Temporal parallelism
- Spatial & temporal parallelism
- Possible data dependencies for a macroblock
[Figure: the current MB in frame i + 1 depends on neighboring MBs in the same frame (intra prediction, MV prediction, deblocking filter) and on the search window in frame i.]

Macroblock-level Parallelism
- Spatial parallelism

Time slot in which each macroblock MB(x, y) is processed (x = column, y = row):

        x=0   x=1   x=2   x=3   x=4
y=0     T1    T2    T3    T4    T5
y=1     T3    T4    T5    T6    T7
y=2     T5    T6    T7    T8    T9
y=3     T7    T8    T9    T10   T11
y=4     T9    T10   T11   T12   T13

Legend: MBs processed, MBs processing, MBs to be processed.
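The time slots in the spatial-parallelism grid follow the pattern T = x + 2y + 1. A short sketch can reproduce them from data dependencies. It assumes each macroblock waits for its left, upper-left, upper, and upper-right neighbors, the usual H.264 neighbor set, which the slide does not list explicitly. The function name is hypothetical.

```python
from collections import Counter


def wavefront_schedule(width, height):
    """Earliest time slot per macroblock; result is indexed as slots[y][x]."""
    slots = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Assumed dependencies: left, upper-left, upper, upper-right.
            neighbors = [(x - 1, y), (x - 1, y - 1), (x, y - 1), (x + 1, y - 1)]
            ready = [slots[ny][nx] for nx, ny in neighbors
                     if 0 <= nx < width and 0 <= ny < height]
            # A macroblock runs one slot after its latest dependency finishes.
            slots[y][x] = 1 + max(ready, default=0)
    return slots


schedule = wavefront_schedule(5, 5)
# Number of macroblocks that share each time slot, i.e. the usable parallelism.
per_slot = Counter(t for row in schedule for t in row)
```

For the 5x5 grid this gives 13 time slots instead of 25, with at most three macroblocks active in any slot.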

Macroblock-level Parallelism
- Temporal parallelism

Time slot in which each macroblock MB(x, y) is processed (x = column, y = row).

Frame i:

        x=0   x=1   x=2   x=3   x=4
y=0     T1    T2    T3    T4    T5
y=1     T6    T7    T8    T9    T10
y=2     T11   T12   T13   T14   T15
y=3     T16   T17   T18   T19   T20
y=4     T21   T22   T23   T24   T25

Frame i + 1:

        x=0   x=1   x=2   x=3   x=4
y=0     T11   T12   T13   T14   T15
y=1     T16   T17   T18   T19   T20
y=2     T21   T22   T23   T24   T25
y=3     T26   T27   T28   T29   T30
y=4     T31   T32   T33   T34   T35

Legend: MBs processed, MBs processing, MBs to be processed.

Macroblock-level Parallelism
- Spatial & temporal parallelism

Time slot in which each macroblock MB(x, y) is processed (x = column, y = row).

Frame i:

        x=0   x=1   x=2   x=3   x=4
y=0     T1    T2    T3    T4    T5
y=1     T3    T4    T5    T6    T7
y=2     T5    T6    T7    T8    T9
y=3     T7    T8    T9    T10   T11
y=4     T9    T10   T11   T12   T13

Frame i + 1:

        x=0   x=1   x=2   x=3   x=4
y=0     T5    T6    T7    T8    T9
y=1     T7    T8    T9    T10   T11
y=2     T9    T10   T11   T12   T13
y=3     T11   T12   T13   T14   T15
y=4     T13   T14   T15   T16   T17
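In the combined spatial and temporal schedule, frame i + 1 runs four time slots behind frame i. The sketch below reproduces that offset under two assumptions that the slide does not state: the same left, upper-left, upper, and upper-right neighbor dependencies within a frame, and a motion search window of plus or minus one macroblock in the reference frame. The function and parameter names are hypothetical.

```python
def schedule_frame(width, height, ref=None, search_range=1):
    """Earliest time slot per macroblock, indexed as slots[y][x].

    ref is the schedule of the reference frame, or None when the frame
    has no temporal dependency. search_range is in macroblocks (assumed).
    """
    slots = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Spatial dependencies inside the current frame (assumed set).
            neighbors = [(x - 1, y), (x - 1, y - 1), (x, y - 1), (x + 1, y - 1)]
            ready = [slots[ny][nx] for nx, ny in neighbors
                     if 0 <= nx < width and 0 <= ny < height]
            if ref is not None:
                # Temporal dependency: the whole search window in the
                # reference frame must already be reconstructed.
                for ry in range(max(0, y - search_range),
                                min(height, y + search_range + 1)):
                    for rx in range(max(0, x - search_range),
                                    min(width, x + search_range + 1)):
                        ready.append(ref[ry][rx])
            slots[y][x] = 1 + max(ready, default=0)
    return slots


frame_i = schedule_frame(5, 5)
frame_next = schedule_frame(5, 5, ref=frame_i)
```

Under these assumptions both 5x5 frames finish in 17 time slots, compared with 50 for strictly sequential processing.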


Memory Issue
- Limited memory of DM642
- Use a memory buffer to reduce memory access
[Figure: two-level cache architecture of DM642. DM642 DSP core; L1P cache: direct mapped, 16 Kbytes total; L1D cache: 2-way set associative, 16 Kbytes total; L2 cache/memory: 256 Kbytes total; EDMA controller; peripherals.]

Memory Issue
- Memory hierarchy for inter prediction
[Figure: memory hierarchy [4].]

Memory Issue
- Slice memory buffer for intra prediction and deblocking filter
[Figure: slice memory [5].]

References
[1] Texas Instruments, Incorporated, "Reference Frameworks for eXpressDSP Software: RF5, An Extensive, High-Density System" (spru795a).
[2] T. C. Chen, H. C. Fang, C. J. Lian, C. H. Tsai, "Algorithm analysis and architecture design for HDTV applications - a look at the H.264/AVC video compressor system," IEEE Circuits & Devices Magazine, May/June 2006.
[3] Cor Meenderinck, Arnaldo Azevedo, and Ben Juurlink, "Parallel Scalability of Video Decoders," April 29,
[4] Denolf, K., De Vleeschouwer, et al., "Memory centric design of an MPEG-4 video encoder," IEEE Trans. CSVT, Vol. 15, No. 5, pp , May
[5] Tsu-Ming Liu et al., "A 125μW, Fully Scalable MPEG-2 and H.264/AVC Video Decoder for Mobile Applications," ISSCC Digest of Technical Papers, pp , Feb
[6] T. Wiegand et al., "Overview of H.264/AVC Video Coding Standard," IEEE Trans. on Circ. and Sys. for Video Technology, Vol. 13, No. 7, pp. 560–576, July
[7] VITEC MULTIMEDIA, "MEX User manual Revision 1.7".