Implementation and Parallelization of H.264 Based System on Multi-DSPs Board
NCTU, EE, Vision Lab
陳奕安
2008.06.11


Slide 2: Outline
- System architecture
- Multithreading of this system
- Reference Framework 5 (RF5)
- Parallelism of H.264
- Memory issues

Slide 3: System Architecture
Figure: two PCs (PC 1 and PC 2), each with a MEX board (MEX Board 1 and MEX Board 2). Sending path: Capture Frame → H.264 Encode → Send to Network. Receiving path: Receive from Network → H.264 Decode → Display.

Slide 4: System Architecture
Figure: the tasks between the camera and the computer: input task, H.264 encode processing task, TX networking task, RX networking task, H.264 decode processing task, and output task.

Slide 5: Host/MEX Communication
Figure: data transfer from the 4 DSPs (SDRAM) to PCI [7]. The two flowcharts (PC and MEX) contain these steps:
- Starting from "DSP started: fill memory": initialize transfer; DSP-to-PCI transfer request; start transfer; transfer finished; set DSP FIFO direction; set FIFO full-flag value; DSP FIFO is reset; start EDMA; unreset DSP1 FIFO; clear PCI interrupt.
- Starting from "PCI started: wait for interrupt": initialize transfer; PCI-to-DSP start-transfer request; wait for transfer finished; transfer finished; set transfer size; set PCI FIFO direction; select DSP data sources; set transfer destination address; start PCI FIFO; clear DSP interrupt.

Slide 6: Host/MEX Communication
Figure: data and image transfer between the host and the MEX board.

Slide 7: System Architecture
Figure: same task diagram as slide 4.

Slide 8: Networking of H.264 Video
Figure: H.264 high-level architecture, VCL and NAL [6]. Labels: application; Video Coding Layer (VCL); Network Abstraction Layer (NAL); VCL data; parameter sets; supplemental enhancement information; reconstructed picture; NAL unit; bitstream adoption; packet adoption; AVC/H.264 transport over H.320, MPEG-2 Systems, AVC storage, and RTP payload.
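
The NAL unit is what the encoder hands to the transport. As a minimal sketch (not taken from the presentation; the function names are hypothetical), the C code below finds NAL units in an H.264 Annex B byte stream by their start codes and decodes the one-byte NAL unit header defined by the standard. Emulation-prevention bytes and error handling are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of the one-byte H.264 NAL unit header. */
typedef struct {
    unsigned forbidden_zero_bit; /* always 0 in a valid stream */
    unsigned nal_ref_idc;        /* 0 means "not used for reference" */
    unsigned nal_unit_type;      /* e.g. 5 = IDR slice, 7 = SPS, 8 = PPS */
} NalHeader;

static NalHeader parse_nal_header(uint8_t b)
{
    NalHeader h;
    h.forbidden_zero_bit = (b >> 7) & 0x01u;
    h.nal_ref_idc        = (b >> 5) & 0x03u;
    h.nal_unit_type      = b & 0x1Fu;
    return h;
}

/* Scan an Annex B byte stream from 'pos' for the next 0x00 0x00 0x01 start
 * code and return the index of the NAL header byte that follows it, or
 * 'len' if there is none. A 4-byte start code (0x00 0x00 0x00 0x01) is
 * found too, because its last three bytes form a 3-byte start code. */
static size_t next_nal_start(const uint8_t *buf, size_t len, size_t pos)
{
    assert(buf != NULL || len == 0);
    for (size_t i = pos; i + 3 < len; i++) {
        if (buf[i] == 0x00 && buf[i + 1] == 0x00 && buf[i + 2] == 0x01)
            return i + 3;
    }
    return len;
}
```

In single NAL unit packetization, the bytes from one header up to the next start code form the payload of one network packet.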

Slide 9: Networking of H.264 Video
Video packetization of H.264 NAL units (network stack: TMS320C6000 Network Developer's Kit):
- Application layer: video packet
- Session layer: RTP header + video packet
- Transport layer: UDP header + RTP header + video packet
- Network layer: IP header + UDP header + RTP header + video packet
- Data link layer: MAC header + IP header + UDP header + RTP header + video packet
- Physical layer
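
The RTP header in this stack is the fixed 12-byte header defined by RFC 3550. The sketch below (a hypothetical function, not the Network Developer's Kit API) writes that header in front of one NAL unit, following the single NAL unit mode of RFC 6184. It assumes the dynamic payload type 96 and uses the 90 kHz timestamp clock that RFC 6184 specifies for H.264; the UDP, IP and MAC headers would be added underneath by the network stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RTP_HEADER_LEN 12

/* Write a 12-byte RTP fixed header (RFC 3550) followed by one NAL unit
 * (single NAL unit mode, RFC 6184). The NAL unit is passed without its
 * Annex B start code. Returns the packet length, or 0 if 'cap' is too small.
 * 'marker' should be set on the last packet of an access unit (frame). */
static size_t rtp_pack_nal(uint8_t *pkt, size_t cap,
                           const uint8_t *nal, size_t nal_len,
                           uint16_t seq, uint32_t ts_90khz, uint32_t ssrc,
                           uint8_t payload_type, int marker)
{
    assert(pkt != NULL && nal != NULL);
    if (nal_len > cap || cap - nal_len < RTP_HEADER_LEN)
        return 0;
    pkt[0] = 0x80;  /* V=2, P=0, X=0, CC=0 */
    pkt[1] = (uint8_t)((marker ? 0x80 : 0x00) | (payload_type & 0x7F));
    pkt[2] = (uint8_t)(seq >> 8);          /* sequence number, big-endian */
    pkt[3] = (uint8_t)seq;
    pkt[4] = (uint8_t)(ts_90khz >> 24);    /* timestamp, big-endian */
    pkt[5] = (uint8_t)(ts_90khz >> 16);
    pkt[6] = (uint8_t)(ts_90khz >> 8);
    pkt[7] = (uint8_t)ts_90khz;
    pkt[8] = (uint8_t)(ssrc >> 24);        /* synchronization source id */
    pkt[9] = (uint8_t)(ssrc >> 16);
    pkt[10] = (uint8_t)(ssrc >> 8);
    pkt[11] = (uint8_t)ssrc;
    memcpy(pkt + RTP_HEADER_LEN, nal, nal_len);
    return RTP_HEADER_LEN + nal_len;
}
```

At 30 frames per second the timestamp advances by 90000 / 30 = 3000 ticks per frame, while the sequence number advances by one per packet.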

Slide 10: System Architecture
Figure: same task diagram as slide 4.

Slide 11: I/O Buffer Management
- Input buffers
- Output buffers
Figure: each buffer queue has a head and a tail pointer; buffers are marked as being filled (inputting) or emptied (outputting).

Slide 12: I/O Buffer Management
- Input/output buffers
Figure: successive positions of the head and tail pointers as buffers are filled (inputting) and emptied (outputting).
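
The head/tail diagrams describe a buffer queue shared by a producing task and a consuming task. Below is a minimal single-threaded sketch in C, assuming a fixed pool of NUM_BUFS frame buffers (all names are hypothetical). In the real system, where input and processing run as separate DSP/BIOS tasks, the count would also need protection, for example with a pair of counting semaphores.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BUFS 4  /* hypothetical number of frame buffers */

/* Circular queue of frame buffers between a producer task (e.g. input)
 * and a consumer task (e.g. encoder). */
typedef struct {
    void  *buf[NUM_BUFS];
    size_t head;   /* oldest filled buffer: next one to output */
    size_t tail;   /* next empty buffer: next one to input */
    size_t count;  /* number of filled buffers */
} FrameRing;

static void ring_init(FrameRing *r, void *const bufs[NUM_BUFS])
{
    for (size_t i = 0; i < NUM_BUFS; i++)
        r->buf[i] = bufs[i];
    r->head = r->tail = r->count = 0;
}

/* Producer: get the buffer to fill, or NULL if every buffer is full. */
static void *ring_acquire_input(const FrameRing *r)
{
    return r->count < NUM_BUFS ? r->buf[r->tail] : NULL;
}

/* Producer: mark the buffer at the tail as filled and advance the tail. */
static void ring_commit_input(FrameRing *r)
{
    assert(r->count < NUM_BUFS);
    r->tail = (r->tail + 1) % NUM_BUFS;
    r->count++;
}

/* Consumer: get the oldest filled buffer, or NULL if none is ready. */
static void *ring_acquire_output(const FrameRing *r)
{
    return r->count > 0 ? r->buf[r->head] : NULL;
}

/* Consumer: hand the buffer at the head back to the empty pool. */
static void ring_release_output(FrameRing *r)
{
    assert(r->count > 0);
    r->head = (r->head + 1) % NUM_BUFS;
    r->count--;
}
```

Because the producer only advances the tail and the consumer only advances the head, a buffer is never being filled and drained at the same time.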

Slide 13: System Architecture
- Multithreading of this system
Figure: same task diagram as slide 4.

Slide 14: Reference Framework for DSP
- Reference Framework 5 (RF5): DSP/BIOS and the TMS320 DSP Algorithm Standard
- Processing flow of RF5
Figure: a task with Split and Join stages and cells F0, F1, F2 and V0, V1, V2 organized in channels; each Fi and Vi is an XDAIS algorithm.
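
In RF5 a task runs channels, and a channel executes its cells one after another, each cell wrapping one XDAIS algorithm. The sketch below models only that execution order in plain C. The Cell and Channel types and channel_execute are hypothetical stand-ins, not the RF5 library API, and the two toy algorithms replace real modules such as F0 or V0. The Split/Join stages are not modeled.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CELLS 4

/* Hypothetical cell: one algorithm instance behind a uniform call. */
typedef struct {
    void (*process)(void *state, const int *in, int *out, size_t n);
    void *state;  /* algorithm instance data */
} Cell;

/* Hypothetical channel: an ordered list of cells. */
typedef struct {
    Cell   cells[MAX_CELLS];
    size_t num_cells;
} Channel;

/* Run the cells in order; cell k reads what cell k-1 wrote. The two work
 * buffers alternate so that the last cell always writes to 'out'.
 * 'scratch' must hold n ints and must not alias 'in' or 'out'. */
static void channel_execute(const Channel *ch, const int *in, int *out,
                            int *scratch, size_t n)
{
    assert(ch->num_cells > 0 && ch->num_cells <= MAX_CELLS);
    const int *src = in;
    for (size_t k = 0; k < ch->num_cells; k++) {
        int *dst = ((ch->num_cells - 1 - k) % 2 == 0) ? out : scratch;
        ch->cells[k].process(ch->cells[k].state, src, dst, n);
        src = dst;
    }
}

/* Two toy "algorithms" standing in for XDAIS modules. */
static void add_offset(void *state, const int *in, int *out, size_t n)
{
    int off = *(const int *)state;
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] + off;
}

static void scale(void *state, const int *in, int *out, size_t n)
{
    int f = *(const int *)state;
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * f;
}
```

Keeping every algorithm behind the same call signature is what lets a channel chain arbitrary cells without knowing what each one computes.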

Slide 15: Reference Framework for DSP
- Data communication of RF5
  - SIO: between a task and a device
  - SCOM: between two tasks
Figure: a device driver and a task exchange data buffers through an SIO object holding data pointers; a writer task and a reader task exchange SCOM messages holding data pointers through an SCOM queue.
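
Both SIO and SCOM pass pointers to data buffers rather than the data itself. Here is a minimal single-threaded sketch of that idea for the task-to-task case, using hypothetical Msg and MsgQueue types rather than the RF5 SCOM API. The writer enqueues a message that points at a filled buffer, and the reader dequeues it and works on the same memory. A second queue, assumed here as one common arrangement, returns the message so the writer can reuse the buffer.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_DEPTH 8  /* hypothetical queue capacity */

/* Hypothetical message: refers to a data buffer instead of copying it. */
typedef struct {
    int   *buf;
    size_t len;
} Msg;

/* Hypothetical bounded FIFO of message pointers between two tasks. */
typedef struct {
    Msg   *slot[QUEUE_DEPTH];
    size_t head;
    size_t count;
} MsgQueue;

static void msgq_init(MsgQueue *q)
{
    q->head = 0;
    q->count = 0;
}

/* Writer side: enqueue a message; returns 0 if the queue is full. */
static int msgq_put(MsgQueue *q, Msg *m)
{
    assert(m != NULL);
    if (q->count == QUEUE_DEPTH)
        return 0;
    q->slot[(q->head + q->count) % QUEUE_DEPTH] = m;
    q->count++;
    return 1;
}

/* Reader side: dequeue the oldest message, or NULL if the queue is empty
 * (a real task would block here until a message arrives). */
static Msg *msgq_get(MsgQueue *q)
{
    if (q->count == 0)
        return NULL;
    Msg *m = q->slot[q->head];
    q->head = (q->head + 1) % QUEUE_DEPTH;
    q->count--;
    return m;
}
```

Only the pointer travels through the queue, so a frame-sized buffer costs the same to pass as a single integer.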

Slide 16: Reference Framework for DSP
- Data communication of RF5
  - ICC: between two cells
Figure: each cell has in and out arrays whose elements are pointers to ICC objects; each ICC object describes one data buffer through its data pointer.

Slide 17: Reference Framework for DSP
- Application control of RF5
  - A task receives both SCOM messages and control messages
Figure: a task with an SCOM queue for data messages and an MBX mailbox for control messages.

Slide 18: System Architecture
- The present system
Figure: input task, H.264 encode processing task and TX networking task, with a control task (Rx); frames i and i+1 are handled at slice granularity and output as NAL units.

Slide 19: System Architecture
- Multithreading of this system
Figure: input task, H.264 encode processing task and TX networking task, with a control task (Rx); frames i and i+1 are handled at macroblock (MB) granularity and output as NAL units.

Slide 20: Parallelizing H.264
- Task-level decomposition
  - Divide the algorithm into balanced tasks
  - Accelerate each task
- Data-level decomposition
  - GOP-level parallelism
  - Frame-level parallelism
  - Slice-level parallelism
  - Macroblock-level parallelism

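Slide 21: H.264 Encoder Block Diagram
Figure: the current frame F_n minus the prediction P gives the residual D_n, which is transformed (T), quantized (Q), reordered, entropy encoded, and output as NAL units. The quantized coefficients X are also inverse quantized (Q^-1) and inverse transformed (T^-1) into D'_n, added to P to form uF'_n, and deblocking filtered into the reconstructed frame F'_n. P comes either from motion estimation (ME) and motion compensation (MC) on the reference frame F'_{n-1} (inter) or from intra prediction, selected by "choose intra prediction" (intra).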

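Slide 22: H.264 Decoder Block Diagram
Figure: NAL units are entropy decoded, reordered, inverse quantized (Q^-1), and inverse transformed (T^-1) into D'_n, which is added to the prediction P to form uF'_n and deblocking filtered into the reconstructed frame F'_n. P comes from motion compensation (MC) on the reference frame F'_{n-1} (inter) or from intra prediction (intra).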
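
Written as equations, with T and Q the transform and quantizer, P the inter or intra prediction, and X the quantized coefficients that are reordered and entropy coded into NAL units, the encoder loop of the block diagrams is:

```latex
\begin{aligned}
D_n   &= F_n - P \\
X     &= Q\bigl(T(D_n)\bigr) \\
D'_n  &= T^{-1}\bigl(Q^{-1}(X)\bigr) \\
uF'_n &= D'_n + P \\
F'_n  &= \mathrm{Filter}\bigl(uF'_n\bigr)
\end{aligned}
```

The decoder evaluates only the last three lines, starting from the X it recovers by entropy decoding and reordering. Because X is quantized, D'_n generally differs from D_n; the encoder therefore predicts from the reconstructed F'_{n-1} rather than from the original frame, so that encoder and decoder use identical references. Intra prediction uses the unfiltered uF'_n.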

Slide 23: Task-level Decomposition
- Task profile for H [2]

Slide 24: Parallelizing H.264
- H.264 data structure
Figure: a video sequence is divided into groups of pictures (GOP0, GOP1, …, GOPn); a GOP into frames (F0, F1, …, Fn); a frame into slices (slice 0, slice 1, …); a slice into macroblocks (MB0, MB1, …, MBn); and a macroblock into Y, Cb and Cr samples.

Slide 25: Data-level Decomposition
- GOP-level parallelism: high latency, large memory
- Frame-level parallelism: imbalance among I, P and B frames
- Slice-level parallelism: bitrate increases
- Macroblock-level parallelism
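
Slice-level parallelism gives each processor its own slice. The sketch below assumes slices made of whole macroblock rows and, for the example, a CIF frame (22 × 18 macroblocks) split into four slices, one per DSP; the function name is hypothetical. The bitrate increase occurs because, in H.264, intra prediction, motion-vector prediction and entropy-coding context do not cross slice boundaries, and each slice also adds its own slice header.

```c
#include <assert.h>

/* Split a frame of mb_w x mb_h macroblocks into n slices of whole MB rows,
 * as evenly as possible. first_mb[k] receives the raster-scan address of
 * the first MB in slice k; first_mb[n] is the total number of MBs. */
static void partition_rows(int mb_w, int mb_h, int n, int first_mb[])
{
    assert(n > 0 && n <= mb_h);
    for (int k = 0; k <= n; k++)
        first_mb[k] = (k * mb_h / n) * mb_w;
}
```

For the CIF example the slices start at macroblocks 0, 88, 198 and 286 (rows 0, 4, 9 and 13), so each DSP gets either 4 or 5 macroblock rows.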

Slide 26: Macroblock-level Parallelism
- Spatial parallelism
- Temporal parallelism
- Spatial and temporal parallelism
- Possible data dependencies of a macroblock
Figure: the current MB in frame i+1 depends on neighboring MBs of the same frame (intra prediction, MV prediction, deblocking filter) and on a search window in frame i.

Slide 27: Macroblock-level Parallelism
- Spatial parallelism
Time step at which each MB(x, y) is processed (x = column, y = row):
y = 0: T1, T2, T3, T4, T5
y = 1: T3, T4, T5, T6, T7
y = 2: T5, T6, T7, T8, T9
y = 3: T7, T8, T9, T10, T11
y = 4: T9, T10, T11, T12, T13
Legend: MBs processed, MBs being processed, MBs to be processed.
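
The table follows T(x, y) = x + 2y + 1 for macroblock MB(x, y). Each MB starts one step after its left neighbor and one step after its top-right neighbor, so the left, top-left, top and top-right dependencies of slide 26 are already resolved when it runs. The C sketch below (hypothetical function names) computes this wavefront schedule, checks those dependencies, and counts how many MBs can run concurrently.

```c
#include <assert.h>

/* Time step of MB(x, y) in the wavefront: T(x, y) = x + 2y + 1. */
static int wavefront_step(int x, int y)
{
    return x + 2 * y + 1;
}

/* Total number of time steps for an mb_w x mb_h frame. */
static int wavefront_steps(int mb_w, int mb_h)
{
    return wavefront_step(mb_w - 1, mb_h - 1);
}

/* Largest number of MBs scheduled in the same time step. */
static int wavefront_max_parallel(int mb_w, int mb_h)
{
    int best = 0;
    for (int t = 1; t <= wavefront_steps(mb_w, mb_h); t++) {
        int n = 0;
        for (int y = 0; y < mb_h; y++) {
            int x = t - 1 - 2 * y;  /* the only x with T(x, y) == t */
            if (x >= 0 && x < mb_w)
                n++;
        }
        if (n > best)
            best = n;
    }
    return best;
}

/* 1 if every MB comes strictly after its left, top-left, top and
 * top-right neighbors, 0 otherwise. */
static int wavefront_respects_deps(int mb_w, int mb_h)
{
    static const int dx[4] = {-1, -1, 0, 1};
    static const int dy[4] = {0, -1, -1, -1};
    for (int y = 0; y < mb_h; y++)
        for (int x = 0; x < mb_w; x++)
            for (int k = 0; k < 4; k++) {
                int nx = x + dx[k], ny = y + dy[k];
                if (nx < 0 || ny < 0 || nx >= mb_w)
                    continue;
                if (wavefront_step(nx, ny) >= wavefront_step(x, y))
                    return 0;
            }
    return 1;
}
```

For the 5 × 5 example this reproduces the 13 steps of the table, with at most 3 MBs in flight. For a 1920 × 1088 frame (120 × 68 macroblocks), the same schedule needs 254 steps for 8160 macroblocks and never has more than 60 in flight, which bounds the number of MB processors that can be kept busy.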

Slide 28: Macroblock-level Parallelism
- Temporal parallelism
Frame i, processed in raster order; time step of each MB(x, y) (x = column, y = row):
y = 0: T1, T2, T3, T4, T5
y = 1: T6, T7, T8, T9, T10
y = 2: T11, T12, T13, T14, T15
y = 3: T16, T17, T18, T19, T20
y = 4: T21, T22, T23, T24, T25
Frame i+1 is processed in the same raster order, one MB per step, starting later (MB(2,0) at T13, MB(0,1) at T16, MB(4,4) at T35).
Legend: MBs processed, MBs being processed, MBs to be processed.

Slide 29: Macroblock-level Parallelism
- Spatial and temporal parallelism
Frame i, same wavefront as slide 27 (x = column, y = row):
y = 0: T1, T2, T3, T4, T5
y = 1: T3, T4, T5, T6, T7
y = 2: T5, T6, T7, T8, T9
y = 3: T7, T8, T9, T10, T11
y = 4: T9, T10, T11, T12, T13
Frame i+1, same wavefront four steps later:
y = 0: T5, T6, T7, T8, T9
y = 1: T7, T8, T9, T10, T11
y = 2: T9, T10, T11, T12, T13
y = 3: T11, T12, T13, T14, T15
y = 4: T13, T14, T15, T16, T17
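
Here frame i+1 runs the same wavefront as frame i (T(x, y) = x + 2y + 1), four steps later. The slide does not state the search range, so the following reading of that offset is an assumption. Suppose each macroblock of frame i+1 needs every macroblock of frame i within ±r macroblocks. The smallest safe lag for this wavefront is then 3r + 1 steps in the frame interior, which gives 4 for r = 1. The C sketch below (hypothetical names) finds the lag by brute force over all macroblocks.

```c
#include <assert.h>

/* Wavefront step of MB(x, y) within one frame. */
static int mb_step(int x, int y)
{
    return x + 2 * y + 1;
}

/* Smallest lag L such that frame i+1, running the same wavefront shifted
 * by L steps, starts each MB(x, y) only after frame i has finished every MB
 * within +/- r macroblocks of (x, y), i.e. its whole search window. */
static int min_frame_lag(int mb_w, int mb_h, int r)
{
    int lag = 1;  /* at least one step after the co-located MB */
    for (int y = 0; y < mb_h; y++)
        for (int x = 0; x < mb_w; x++)
            for (int dy = -r; dy <= r; dy++)
                for (int dx = -r; dx <= r; dx++) {
                    int nx = x + dx, ny = y + dy;
                    if (nx < 0 || ny < 0 || nx >= mb_w || ny >= mb_h)
                        continue;
                    /* require mb_step(nx, ny) < mb_step(x, y) + lag */
                    int need = mb_step(nx, ny) - mb_step(x, y) + 1;
                    if (need > lag)
                        lag = need;
                }
    return lag;
}
```

Under this model a wider search window delays the next frame further, which reduces how much the two frames can overlap.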

Slide 30: System Architecture
- Multithreading of this system
Figure: same task diagram as slide 19.

Slide 31: Memory Issue
Figure: two-level cache architecture of the DM642: DSP core; L1P cache (direct mapped, 16 KB total); L1D cache (2-way set associative, 16 KB total); L2 cache/memory (256 KB total); EDMA controller; peripherals.
- Limited memory on the DM642
- Use memory buffers to reduce memory accesses
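
To show why limited memory matters, the sketch below computes raw frame sizes for 8-bit 4:2:0 video at CIF (352 × 288) and D1 (720 × 480) resolution. These resolutions are examples chosen here, not taken from the presentation. A current frame plus one reference frame at CIF already needs 304,128 bytes, more than the 262,144-byte L2, so whole frames have to stay in external SDRAM and be brought on chip in smaller pieces, for example by EDMA.

```c
#include <assert.h>

/* Sizes of the DM642 on-chip memories as listed on the slide. */
enum {
    L1P_BYTES = 16 * 1024,   /* program cache, direct mapped */
    L1D_BYTES = 16 * 1024,   /* data cache, 2-way set associative */
    L2_BYTES  = 256 * 1024   /* L2 cache / memory */
};

/* Bytes in one 8-bit 4:2:0 frame: a full-size luma plane plus two chroma
 * planes of a quarter of its size each. */
static long frame_bytes_420(long width, long height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    return width * height * 3 / 2;
}
```

A single D1 frame (518,400 bytes) is already about twice the size of the whole L2.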

Slide 32: Memory Issue
- Memory hierarchy for inter prediction
Figure: memory hierarchy [4].
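
A common way to organize the inter-prediction hierarchy, though not necessarily the exact scheme of [4], is to keep the current macroblock's search area on chip and reuse the part it shares with the next macroblock's search area. The sketch below counts the luma bytes fetched per macroblock row with and without that reuse. It assumes a ±16-pixel search range, 8-bit samples, and a CIF row of 22 macroblocks; the function names are hypothetical.

```c
#include <assert.h>

#define MB_SIZE 16  /* macroblock width/height in luma pixels */

/* Luma bytes in the search area of one MB for a +/- range search. */
static long window_bytes(int range)
{
    long side = MB_SIZE + 2L * range;
    return side * side;
}

/* Luma bytes fetched from external memory for one row of MBs.
 * Without reuse, each MB loads its whole search area. With reuse, the
 * areas of horizontally adjacent MBs overlap, so after the first MB only
 * a new strip MB_SIZE pixels wide is loaded. Frame-edge clipping is
 * ignored. */
static long row_fetch_bytes(int mbs_per_row, int range, int reuse)
{
    long side = MB_SIZE + 2L * range;
    assert(mbs_per_row > 0 && range >= 0);
    if (!reuse)
        return mbs_per_row * window_bytes(range);
    return window_bytes(range) + (long)(mbs_per_row - 1) * MB_SIZE * side;
}
```

Under these assumptions one 48 × 48 search area (2,304 bytes) fits easily in the 16 KB L1D. Reuse also cuts the external traffic per row from 50,688 to 18,432 bytes, a reduction of about 64%.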

Slide 33: Memory Issue
- Slice memory buffer for intra prediction and the deblocking filter
Figure: slice memory [5].
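
Intra prediction of a macroblock needs pixels from the row above it, and filtering its top edge reads up to four luma rows of the macroblock above. A codec therefore only has to keep a few pixel rows of the previous macroblock row on chip, not the whole frame. The C sketch below covers only the intra-prediction part: a one-row line buffer holding the bottom luma row of each macroblock in the previous row, plus the saved top-left pixel. It assumes raster-order processing and ignores slice-boundary availability rules. The names and MAX_WIDTH are hypothetical, and this is not necessarily the slice memory organization of [5].

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MB_SIZE   16
#define MAX_WIDTH 1920  /* hypothetical maximum frame width in pixels */

/* Bottom luma row of every MB in the most recent MB row, plus the one
 * pixel the next MB in raster order needs as its top-left neighbor. */
typedef struct {
    uint8_t pix[MAX_WIDTH];
    int     width;   /* frame width in pixels, a multiple of MB_SIZE */
    uint8_t corner;  /* top-left neighbor of MB (mb_x + 1) in this row */
} LineBuffer;

/* Call after MB (mb_x, y) is reconstructed, with its bottom pixel row. */
static void lb_store(LineBuffer *lb, int mb_x, const uint8_t bottom_row[MB_SIZE])
{
    assert((mb_x + 1) * MB_SIZE <= lb->width && lb->width <= MAX_WIDTH);
    /* The last pixel of this slot still belongs to the row above and is
     * the top-left neighbor of the next MB; save it before overwriting. */
    lb->corner = lb->pix[mb_x * MB_SIZE + MB_SIZE - 1];
    memcpy(&lb->pix[mb_x * MB_SIZE], bottom_row, MB_SIZE);
}

/* Copy the pixels directly above MB (mb_x, y): 16 from the MB above and,
 * if it exists, 16 from the MB above-right. Returns the count copied. */
static int lb_top_neighbors(const LineBuffer *lb, int mb_x, uint8_t out[2 * MB_SIZE])
{
    int n = ((mb_x + 1) * MB_SIZE < lb->width) ? 2 * MB_SIZE : MB_SIZE;
    memcpy(out, &lb->pix[mb_x * MB_SIZE], (size_t)n);
    return n;
}
```

For each MB, read its top neighbors before storing its own bottom row. The left neighbor column comes from the MB just processed and can be kept in a separate 16-pixel buffer.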

Slide 34: References
[1] Texas Instruments, Incorporated, "Reference Frameworks for eXpressDSP Software: RF5, An Extensive, High-Density System" (SPRU795A).
[2] T. C. Chen, H. C. Fang, C. J. Lian, C. H. Tsai, "Algorithm analysis and architecture design for HDTV applications - a look at the H.264/AVC video compressor system," IEEE Circuits & Devices Magazine, May/June 2006.
[3] C. Meenderinck, A. Azevedo, B. Juurlink, "Parallel Scalability of Video Decoders," April 29.
[4] K. Denolf, C. De Vleeschouwer, et al., "Memory centric design of an MPEG-4 video encoder," IEEE Trans. CSVT, Vol. 15, No. 5, May.
[5] T.-M. Liu et al., "A 125μW, Fully Scalable MPEG-2 and H.264/AVC Video Decoder for Mobile Applications," ISSCC Digest of Technical Papers, Feb.
[6] T. Wiegand et al., "Overview of H.264/AVC Video Coding Standard," IEEE Trans. on Circuits and Systems for Video Technology, Vol. 13, No. 7, pp. 560–576, July.
[7] VITEC MULTIMEDIA, "MEX User Manual, Revision 1.7."

