Agenda Video/Image/Graphics System in iMX6

Slides:



Advertisements
Similar presentations
Network II.5 simulator ..
Advertisements

Computer Architecture
3D Graphics Content Over OCP Martti Venell Sr. Verification Engineer Bitboys.
System Integration and Performance
Introduction to H.264 / AVC Video Coding Standard Multimedia Systems Sharif University of Technology November 2008.
GStreamer as multimedia framework in Android: a new alternative.
Embedded Streaming Media with GStreamer and BeagleBoard ESC-228 Presented by Santiago Nunez santiago.nunez (at) ridgerun.com.
Media Player for the i.MX31 Advanced Embedded Systems Architecture Class Project May 14, 2011 Rafael Castro Ryan Ugland Carlos Cabral.
Multimedia with VPU/IPU HW acceleration in Android
StreamBlade SOE TM Initial StreamBlade TM Stream Offload Engine (SOE) Single Board Computer SOE-4-PCI Rev 1.2.
Programmable Interval Timer
H.264 Intra Frame Coder System Design Özgür Taşdizen Microelectronics Program at Sabanci University 4/8/2005.
Basics of MPEG Picture sizes: up to 4095 x 4095 Most algorithms are for the CCIR 601 format for video frames Y-Cb-Cr color space NTSC: 525 lines per frame.
Design center Vienna Donau-City-Str. 1 A-1220 Vienna Vers SVEN Scalable Video Engine Gerald Krottendorfer.
-1/20- MPEG 4, H.264 Compression Standards Presented by Dukhyun Chang
Internet Video By Mo Li. Video over the Internet Introduction Video & Internet: the problems Solutions & Technologies in use Discussion.
H.264/AVC Baseline Profile Decoder Complexity Analysis Michael Horowitz, Anthony Joch, Faouzi Kossentini, and Antti Hallapuro IEEE TRANSACTIONS ON CIRCUITS.
Conversion Between Video Compression Protocols Performed by: Dmitry Sezganov, Vitaly Spector Instructor: Stas Lapchev, Artyom Borzin Cooperated with:
ATSC Digital Television
1 Input/Output Chapter 3 TOPICS Principles of I/O hardware Principles of I/O software I/O software layers Disks Clocks Reference: Operating Systems Design.
Statistical Multiplexer of VBR video streams By Ofer Hadar Statistical Multiplexer of VBR video streams By Ofer Hadar.
Multicore Design Considerations. Multicore: The Forefront of Computing Technology “We’re not going to have faster processors. Instead, making software.
EEL 6935 Embedded Systems Long Presentation 2 Group Member: Qin Chen, Xiang Mao 4/2/20101.
An Introduction to H.264/AVC and 3D Video Coding.
MPEG-2 Digital Video Coding Standard
GallagherP188/MAPLD20041 Accelerating DSP Algorithms Using FPGAs Sean Gallagher DSP Specialist Xilinx Inc.
Samsung Poland R&D Center © Samsung Electronics Co., LTD S/W Platform Team | Ver.DateDescriptionAuthorReviewer /09/18Initial VersionMarek.
USB host for web camera connection
H.264 Deblocking Filter Irfan Ullah Department of Information and Communication Engineering Myongji university, Yongin, South Korea Copyright © solarlits.com.
MPEG: (Moving Pictures Expert Group) A Video Compression Standard for Multimedia Applications Seo Yeong Geon Dept. of Computer Science in GNU.
Embedded Streaming Media with GStreamer and BeagleBoard
Profiles and levelstMyn1 Profiles and levels MPEG-2 is intended to be generic, supporting a diverse range of applications Different algorithmic elements.
A DSP-Based Platform for Wireless Video Compression Patrick Murphy, Vinay Bharadwaj, Erik Welsh & J. Patrick Frantz Rice University November 18, 2002.
1 Lecture 20: I/O n I/O hardware n I/O structure n communication with controllers n device interrupts n device drivers n streams.
Samsung ARM S3C4510B Product overview System manager
© Janice Regan, CMPT 300, May CMPT 300 Introduction to Operating Systems Principles of I/0 hardware.
ScreenPlay Director Training By Erik Collett
1-1 Embedded Network Interface (ENI) API Concepts Shared RAM vs. FIFO modes ENI API’s.
SPCA554A Mobile Camera Multimedia Processor By Harrison Tsou.
Adaptive Multi-path Prediction for Error Resilient H.264 Coding Xiaosong Zhou, C.-C. Jay Kuo University of Southern California Multimedia Signal Processing.
242/102/49 0/51/59 181/172/166 Primary colors 248/152/29 PMS 172 PMS 137 PMS 546 PMS /206/ /227/ /129/123 Secondary colors 114/181/204.
MULTIMEDIA INPUT / OUTPUT TECHNOLOGIES
Lecture 12: Reconfigurable Systems II October 20, 2004 ECE 697F Reconfigurable Computing Lecture 12 Reconfigurable Systems II: Exploring Programmable Systems.
UNDER THE GUIDANCE DR. K. R. RAO SUBMITTED BY SHAHEER AHMED ID : Encoding H.264 by Thread Level Parallelism.
A Programmable Single Chip Digital Signal Processing Engine MAPLD 2005 Paul Chiang, MathStar Inc. Pius Ng, Apache Design Solutions.
MOTOROLA and the Stylized M Logo are registered in the US Patent & Trademark Office. All other product or service names are the property of their respective.
IntroductiontMyn1 Introduction MPEG, Moving Picture Experts Group was started in 1988 as a working group within ISO/IEC with the aim of defining standards.
Proposal for an Open Source Flash Failure Analysis Platform (FLAP) By Michael Tomer, Cory Shirts, SzeHsiang Harper, Jake Johns
1 Modular Refinement of H.264 Kermin Fleming. 2 What is H.264? Mobile Devices Low bit-rate Video Decoder –Follow on to MPEG-2 and H.26x Operates on pixel.
Different Microprocessors Tamanna Haque Nipa Lecturer Dept. of Computer Science Stamford University Bangladesh.
DDRIII BASED GENERAL PURPOSE FIFO ON VIRTEX-6 FPGA ML605 BOARD PART B PRESENTATION STUDENTS: OLEG KORENEV EUGENE REZNIK SUPERVISOR: ROLF HILGENDORF 1 Semester:
PRESENTED BY: MOHAMAD HAMMAM ALSAFRJALANI UFL ECE Dept. 3/31/2010 UFL ECE Dept 1 CACHE OPTIMIZATION FOR AN EMBEDDED MPEG-4 VIDEO DECODER.
Introduction to MPEG Video Coding Dr. S. M. N. Arosha Senanayake, Senior Member/IEEE Associate Professor in Artificial Intelligence Room No: M2.06
Development of a Bluetooth based web camera module.
242/102/49 0/51/59 181/172/166 Primary colors 248/152/29 PMS 172 PMS 137 PMS 546 PMS /206/ /227/ /129/123 Secondary colors 114/181/204.
DaVinci Overview (features and programming) Kim dong hyouk.
Introduction to H.264 / AVC Video Coding Standard Multimedia Systems Sharif University of Technology November 2008.
Software and Communication Driver, for Multimedia analyzing tools on the CEVA-X Platform. June 2007 Arik Caspi Eyal Gabay.
Lab 4 HW/SW Compression and Decompression of Captured Image
Video Compression - MPEG
Progress Report Chester Liu 2013/11/29.
Highly Efficient and Flexible Video Encoder on CPU+FPGA Platform
AT91RM9200 Boot strategies This training module describes the boot strategies on the AT91RM9200 including the internal Boot ROM and the U-Boot program.
ENEE 631 Project Video Codec and Shot Segmentation
VLIW DSP vs. SuperScalar Implementation of a Baseline H.263 Encoder
I/O Systems I/O Hardware Application I/O Interface
What Choices Make A Killer Video Processor Architecture?
Chapter 13: I/O Systems.
Chapter 13: I/O Systems “The two main jobs of a computer are I/O and [CPU] processing. In many cases, the main job is I/O, and the [CPU] processing is.
Presentation transcript:

Video/Image Codec and Data Pipeline AMF-CON-T0165 Jones He| Multimedia System & Architecture Mar, 2015

Agenda Video/Image/Graphics System in iMX6 VPU Performance/Capability Overview Measured Performance VPU Architecture Overview VPU Programmable Engine VPU Decoder VPU Encoder Tile Format Support JPEG Processing Unit (JPU) VPU Software Structure Stereo 3D VPU with Multimedia Framework VPU encode/decode with IPU Pre-/Post-process Use-case Demos and Bandwidth Profiling Tool

Video/Graphics System in i.MX6 Dual/Quad (D/Q) Outline Video/Graphics Subsystem in i.MX6 D/Q VPU-IPU-GPU Dataflow

Video/Graphics Subsystem in i.MX6 D/Q i.MX6 Dual/Quad Data Control ARM CPU GPUs Video Sources Memory Interface IPUs External Memories Bridges DCICs VDOA Displays VPU IRAM

VPU-IPU-GPU – Dataflow Image Sensor Display Image Processing Unit (IPU) Display Enhancement Video/Graphics Combining Image Conversions Camera Preview Image Conversions VDOA Frame Buffers Frame Buffers Frame Buffers Frame Buffers Frame Buffers Frame Buffers Video Processing Unit Graphic Processing Unit (GPU) and ARM 2D/3D Graphics Generation Compression Decompression with post processing Memory Combining with Audio Separation from Audio Communication Network HW Accelerated Audio Compression Decompression

VPU Performance/Capability Overview Outline Performance & capability for decoder Performance & capability for encoder and simultaneous encode/decode Performance & capability for multi-streams (decodes, encodes) Performance & capability for transcoding i.MX6x VPU vs i.MX53 VPU

Performance Overview—iMX6Q/D (Decoder) Standard Profile Performance (DDR=532MHz) (VPU=266MHz if not specified) Bitrate HW Decoder MPEG-2 Main-High 2D 1080i/p+720p@30fps, or 2x 1080p@24fps, or 2x 1080p@30fps (VPU=352MHz) 50Mbps H.264 BP/MP/HP-L4.1 VC1 SP/MP/AP-L3 45Mbps MPEG4 SP/ASP 40Mbps DivX/XviD 3/4/5/6 AVS Jizhun H.263 P0/P3 1080p+720p@30fps, or 20Mbps RV10 8/9/10 Sorenson -- MJPEG Baseline 8k x 8k 120Mpel/s On2 VP8 1080p@30fps (VPU =352MHz) H.264-MVC for 3D (FW/HW) H.264-MVC SHP 3D 720p@30fps each view 1080i/p@24fps each view 1080p@30fps (VPU=352MHz) Simulcast for 3D Two independent streams 720p@60fps (30fps each view) Frame-packing for 3D Combine two frames into one 1080p@30fps decode  1080p@60fps playback (30fps each view) HW Post-proc rotation, mirror, deblocking/deringing

Performance Overview—iMX6DL/S (Decoder) Standard Profile Performance (DDR=400MHz) (VPU=266MHz if not specified) Bitrate HW Decoder MPEG-2 Main-High 2D 1080i/p+D1@30fps, 1080i/p+720p@30fps (content depend.) Dual 1080p @24fps, (Content depend.) 50Mbps H.264 BP/MP/HP-L4.1 VC1 SP/MP/AP-L3 45Mbps MPEG4 SP/ASP 40Mbps DivX/XviD 3/4/5/6 AVS Jizhun H.263 P0/P3 1080p+D1@30fps, 1080p+720p@30fps, (content depend.) Dual 1080p @24fps, (content depend.) 20Mbps RV10 8/9/10 Sorenson -- MJPEG Baseline 8k x 8k 120Mpel/s On2 VP8 1080p@30fps H.264-MVC for 3D (FW/HW) H.264-MVC SHP 3D 720p@30fps each view 1080i/p@24fps each view (content depend.) Simulcast for 3D Two independent streams 720p@60fps (30fps each view) Frame-packing for 3D Combine two frames into one 1080p@30fps decode  1080p@60fps playback (30fps each view) HW Post-proc rotation, mirror, deblocking/deringing

Performance Overview—iMX6Q/D (Encoder & Full-duplex) Standard Profile Performance (VPU=266MHz if not specified) Bitrate HW Encoder H.264 BP 2D 1080p@30fps 720p@60fps 20Mbps MJPEG Baseline 8k x 8k 160Mpel/s MPEG4 Simple 720p@30fps (1080p@30fps is doable) 15Mbps H.263 P0/P3 H.264-MVC for 3D Stereo HP (no interview prediction) 3D 1080p@48fps (24fps/view, VPU=352MHz) Simulcast for 3D All VPU encoder supported profiles Frame-packing 1080p@30fps encoding  1080p@60fps capture (30fps for each view) Full-duplex HW Codec 720p@30fps 1080p@24fps 1080p@30fps (VPU = 352MHz)

Performance Overview—i.MX6Q/D (Multi-streams) HW Decoder Standard Profile Max # Streams @ 30fps D1@ 30fps 720p@ 1080p@ 24fps H.264 BP/MP/HP 8 3 2 2 (VPU >300MHz) On2 VP8 -- 4 1 VC1 SP/MP/AP 2 (VPU=352MHz) MPEG4 SP/ASP H.263 P0/P3 HW Encoder BP 6 MPEG4-SP/H.263 MPEG4-SP H.263-P0/P3 1* 2* (VPU=352MHz) *: MPEG4/H263 1080fps is doable in HW but may not enabled in SW

Performance Overview—iMX6Q/D (Transcoding) Source Resolution (decoded streaming) Max # Streams @ 30fps Target Resolution (encoded streaming) SD (720x480) HD720p (1280x720) HD1080p@24fps (1920x1080) HD1080p@30fps 4 -- 2 2 (24fps, VPU=266MHz) 2 (30fps, VPU=352MHz) HD1080p 1 ( VPU = 352MHz)

i.MX6x VPU vs i.MX53 VPU iMX53 Enhancements iMX6x Clock rate 200MHz 266MHz (Will change the VPU spec to 350MHz) Video Decoder Perf. 1080i/p@30fps, No 3D support 1080i/p@60fps, 3D support at 30fps per view Video Encoder Perf. 720p@30fps 1080p@30fps VP8 decoder No Supported AVS decoder Theora decoder Partial HW support (no plan to enable it yet) JPEG Decoder Performance 40MPel/sec at YUV444 format (400, 420, 422, 444) 120MPel/sec at YUV444 format (400, 420, 422, 444) JPEG Encoder Performance 80MPel/sec at YUV422 format (400, 420, 422) 160MPel/sec at YUV444 format (400, 420, 422, 444) Decoding of H.264-MVC S3D and other S3D streams Supported at 1080i/p@30fps for each view Encoding of H.264 MVC S3D video Support ed at 720p@30fps for each view (no interview prediction) Tiled format Supported (for bandwidth reduction) 2D cache Cache used for bandwidth reduction for encoder (motion estimation) and decoder (motion compensation)

Measured Performance Outline Measured performance for video playback Measured performance for video encoder Measured performance for transcoding

Measured Performance (Decoder) Video clips Video content complexity Measured perf. at 264MHz (linear format) Measured perf. at 264MHz (tiled format) Measured perf. at 352MHz (tiled format) Sunflower (self-generated) H264-HP, 40Mbps, 1080p Effective Dec : 41fps Actual Display: 40fps Effective Dec : 57fps Actual Display: 56fps Effective Dec : 68fps Actual Display: 60fps A Blu-ray clip H264-HP, 37Mbps, 1080p Effective Dec : 56 fps Actual Display : 55fps Effective Dec : 59fps Actual Display : 59fps Effective Dec : 74ps Actual Display : 60fps Avatar (Youtube) H264-HP, 3.5Mbps, 1080p Effective Dec: 77fps Effective Dec: 80fps Effective Dec: 100fps Sherlock (A movie Trailer) H264-HP, 11Mbps, 1080p Effective Dec: 64fps Actual Display: 59fps Effective Dec: 70fps Effective Dec: 86fps A Freescale demo clip H264-HP, 10Mbps, 1080p Effective Dec: 73fps Effective Dec: 89fps Parkrun (A 1080i test clip) H264-HP, 20Mbps, 1080i Effective Dec: 38fps Actual Display: 37fps Effective Dec: 52fps Actual Display: 45fps Effective Dec: Actual Display: Note: Performance measured in VPU unit test (without multimedia framework). SabreSD board Tile format used VPU = 264 MHz, 352MHz DDR = 532 MHz Performance for Interlaced video not measured yet

Measured Busload (VPU Decoder + 1080p Display) Video clips Video content complexity Measured bandwidth at 264MHz (linear format) at 30fps display rate Measured bandwidth at 264MHz (tiled format) at 30fps display rate Sunflower (self-generated Blu-ray quality clip) H264-HP, 40Mbps, 1080p Total bus load: ~77% (53 dec fps) Bus utilization efficiency: ~17.5% Total bus load: ~67% (59 dec fps) Bus utilization efficiency: ~19% A Blu-ray quality clip H264-HP, 37Mbps, 1080p Total bus load: ~66% (56 dec fps) Bus utilization efficiency: ~17% Total bus load:~65% (60 dec fps) Bus utilization efficiency: 19% Avatar (Youtube) H264-HP, 3.5Mbps, 1080p Total bus load: ~55% (76 dec fps) Bus utilization efficiency: ~16% Total bus load:~62% (80 dec fps) Bus utilization efficiency: 18% Sherlock (A movie Trailer) H264-HP, 11Mbps, 1080p Total bus load: ~60% (60 dec fps) Total bus load:~62% (69 dec fps) Bus utilization efficiency: ~18% A Freescale demo clip H264-HP, 10Mbps, 1080p Total bus load: ~58% (66 dec fps) Total bus load: ~63% (74 dec fps) Parkrun (A 1080i test clip) H264-HP, 20Mbps, 1080i Total bus load: ~83% (57 dec fps) Total bus load: ~80% (57 dec fps) Note: Performance measured in VPU unit test (without multimedia framework) SabreSD board Tile format used VPU = 264 MHz DDR = 532 MHz Performance for Interlaced video not measured yet

Measured Performance (Encoder) Video clips (250 frames) Bitrate Maximum measured perf. Riverbed H264-BP, 32Mbps, 1080p@30fps 40fps (VPU=264MHz) 50fps (VPU=352MHz) H264-BP, 20Mbps, 1080p@30fps 41fps (VPU=264MHz) 51fps (VPU=352MHz) H264-BP, 10Mbps, 1080p@30fps 42fps (VPU=264MHz) 52fps (VPU=352MHz) Note: Performance measured in VPU unit test (without multimedia framework). SabreSD board Tile format used VPU = 264 MHz, 352MHz DDR = 532 MHz YUV source data in file in SD card (file reading time not count in performance)

Measured Performance and Busload (Transcoding) Transcoding description Maximum measured performance Observed bus-load with display MPEG2H264 Tiled format; Decode 1080p MPEG2 video, and display it via HDMI without resizing, and meanwhile re-encode to H264-BP, 1080p. 27fps (VPU=266MHz) 35fps (VPU=352MHz) ~72% (with ~17% for GUI and others) ~78% (with ~17% for GUI and others) Linear format; Decode 1080p MPEG2 video, and display it via HDMI with resizing to 720p, and meanwhile re-encode to H264-BP, 720p. 22fps (VPU=266MHz) 24fps (VPU=352MHz) Linear format ~82% (with ~17% for GUI and others) ~84% (with ~17% for GUI and others) -Not too much improvement for VPU=352MHz MPEG4H264 VC1H264 Decode 720p MPEG4/VC1, etc, and display it via HDMI without resizing, and meanwhile re-encode to H264-BP, 720p, in <20Mbps >=50fps(VPU=266MHz) >= 60fps(VPU=352MHz) Note: Performance measured in VPU unit test (without multimedia framework). SabreSD board Linear format used (tile format not ready for transcoding) VPU = 264 MHz, 352MHz DDR = 532 MHz Measured in linear & tiled format

VPU Architecture Overview Outline Video coding standard algorithm and process Architecture in top-level view Architecture in software view VPU-IPU interface

Top-view of video encoding/decoding algorithm (An H.264 example) (b) decoder Transform & Quantization Motion Estimation Picture Buffering Compensation Entropy Coding Intra Prediction Intra/Inter Mode Decision Inverse Quantization & Inverse Transform Deblocking Filtering + - Video Input Bitstream Output (a) encoder Decoding Selection Input Video

VPU Architecture Overview (Top-level view) Freescale drives this IP development in terms of roadmap, API’s and Firmware. This is our 6th instantiation of this IP from our vendor (mature technology). Very flexible solution that allows us to customize the feature set of each product to the market requirements without compromising power or die size. Freescale works closely with our VPU vendor to optimally integrate VPU into all i.MX devices . Host processor BIT Proc. Core 6KB Data Mem 20KB Prog Mem On-chip RAM Secondary AXI Decoding process Encoding Pre- & Post-proc APB3 interface Enc, dec, and post process Bitstream parser & control VPU External memory Primary AXI Flexible and optimized area-power accelerator architecture Embedded DSP core providing a certain level of flexibility & programmability (e.g., be able to support H.264- MVC-3D by only firmware change and currently being done in i.MX6x VPU) Shared logic & SRAM for encoder and decoder for optimizing area and minimizing power with clock gating (vs separate enc and dec HW in other vendors ) On-chip RAM with secondary AXI option for reducing memory bandwidth (makes it competitive in performance).

VPU Architecture Overview (SW view) SDRAM Firmware on VPU Host Application Shared Buffer ( Bit - stream Buffers , Frame Buffers etc .) VPU Host I / F Reg D a t C M R S P V U API ’ s VPU Buffer Work Buffer u Code Buffers . Parameter Buffers API Calls with Args Return Codes with Output Info INTERRUPT System Manager Codec Library

VPU-IPU Interface Video Processing Unit Image Processing Unit (IPU) Image Sensor Display Audio Combining with Audio Image Processing Unit (IPU) Camera Preview Memory Communication Network Separation from Audio Conversions Compression Decompression Display Enhancement with post processing Video/Graphics Graphics Generation Video Processing Unit Frame Buffers VDOA HW Accelerated

VPU Programmable Engine Outline Features of the embedded DSP Examples of programmability & capability Programmability is limited to video slice level or above Not encouraged down to macroblock-level for programmability except for some particular reasons. Note: Normally, Freescale does not provide VPU firmware source code to customers. Freescale can work with VPU vendor for implementing the customer’s needs if necessary.

VPU Programmable Engine & Features A highly optimized DSP processor for handling bit streams in various video codecs Supports special instructions for bitstream packing/unpacking with variable length code and Exp- Golomb code Supports program memory up to 128KB address space (20KB in i.MX6x) Supports data memory up to 128KB address space (6KB in i.MX6x)

VPU Programmable Engine Capability & Example Capability of VPU Programmable Engine In general, programmable on slice level and above (e.g., MVC codec implementation) Specifically, programmability can be extended to macroblock-level for some codecs, e.g., VP8, macroblock-level encoder rate control, etc Examples implemented by only firmware change: Implemented MVC-Stereo High Profile for both encoder and decoder Enhanced encoder rate control for achieving better visual quality Enhanced error handling capability for robust streaming and video playback

VPU Decoder Outline Decoder pipeline Decoder API and process flow Decoding operation steps Major differences between i.MX6x VPU and i.MX5x VPU

Video Decoder Process Pipeline (example of H264/RV/AVS/VP8 decoder) Parsing bitstream for macroblocks Loading reference data for inter-prediction Generating inter-predicted macroblock Intra-prediction & Reconstructing macroblocks Deblocking filtering Writing decoded macroblocks Inverse transform/quantization

VPU driver API (decoder) vpu_init() vpu_DecOpen() vpu_DecUpdateStreamBuffer() vpu_DecSetEscSeqInit(handle, 1) vpu_DecGetInitialInfo( ) Vpu_DecSetEscSeqInit(handle, 0) vpu_DecRegisterFrameBuffer() vpu_DecStartOneFrame( ) Vpu_IsBusy() No vpu_DecGetOutputInfo( ) Decode_finished Process output picture vpu_DecClrDisp( ) Exit vpu_DecClose() vpu_UnInit End vpu_DecGetBitStreamBuf( ) Lack of bitstream Fill bitstream ring buffer End of stream vpu_decUpdateBitstreamBuffer (handle, size) vpu_decUpdateBitstreamBuffer (handle, 0) Yes Example VPU Decode Flow for a single instance vpu_Init() vpu_Deinit() vpu_IsBusy(), jpu_IsBusy() vpu_SleepWake() vpu_GetVersionInfo(); vpu_WaitForInt() vpu_SWReset() IOGetPhyMem(),IOFreePhyMem() IOGetVirtMem(), IOFreeVirtMem() IOGetIramBase() vpu_DecOpen(); vpu_DecClose(); vpu_DecSetEscSeqInit(); vpu_DecGetInitialInfo(); vpu_DecRegisterFrameBuffer(); vpu_DecGetBitstreamBuffer(); vpu_DecUpdateBitstreamBuffer(); vpu_DecStartOneFrame(); vpu_DecGetOutputInfo(); vpu_DecBitBufferFlush(); vpu_DecGiveCommand() vpu_DecClrDispFlag()

Decoder Operation Steps Call vpu_Init() to initialize the VPU Open a decoder instance using vpu_DecOpen() To provide the proper amount of bitstream, get the bitstream buffer address using vpu_DecGetBitstreamBuffer() After transferring the decoder input stream, inform the amount of bits transferred into the bitstream buffer using vpu_DecUpdateBitstreamBuffer() Before starting a picture decoder operation, get the crucial parameters for decoder operations such as picture size, frame rate, required frame buffer size using vpu_DecGetInitialInfo() Using the returned frame buffer requirement, allocate the proper size of the frame buffers and convey this data to the i.MX 6Dual/Quad VPU using vpu_DecRegisterFrameBuffer() Start a picture decoder operation picture-by-picture using vpu_DecStartOneFrame() Wait for the completion of the picture decoder operation interrupt event Check the results of the decoder operation using vpu_DecGetOutputInfo() After displaying nth frame buffer, clear the buffer display flag using vpu_DecClrDispFlag() If there is more bitstream to decode, go to Step 7, otherwise go to the next step Terminate the sequence operation by closing the instance using vpu_DecClose() Call vpu_UnInit() to release the system resources

Major API differences from iMX5x VPU “Streaming mode with prescan” in i.MX5x VPU is replaced by “rollback” mode in i.MX6x VPU. Reason: Simplify the firmware Improve the performance Add “MvcPicInfo” for S3D in DecOutputInfo typedef struct { int viewIdxDisplay; //view index of display frame buffer int viewIdxDecoded; //view index of decoded frame buffer } MvcPicInfo Add “AvcFpaSei” for frame-packing for S3D in DecOutputInfo. …… unsigned frame_packing_arrangement_id;; unsigned frame_packing_arrangement_type unsigned frame_packing_arrangement_repetition_period; } AvcFpaSei

VPU Encoder Outline Encoder pipeline Encoder API and process flow Encoder operation steps Encoder visual quality Encoder rate control concept Encoder configuration example

Video Process Flow (example pipeline of H264 encoder) Loading a source MB to encode Loading reference data Motion estimation (1st step) Loading reference data (luma only) Motion estimation (2nd step) Motion estimation (3rd step) Loading reference data (Chroma only) Motion Compensation Intra/Inter-prediction for Q/T Entropy Coding (bitstream packing) Intra/Inter-prediction for inv. Q/T Write-back unfiltered data Deblock filtering Write-back data Intra mode decision (1st step) Intra mode decision (2nd step) Reconstruction path

VPU driver API (encoder) vpu_Init() vpu_Deinit() vpu_IsBusy(), jpu_IsBusy() vpu_SleepWake() vpu_GetVersionInfo() vpu_WaitForInt() vpu_SWReset() IOGetPhyMem(),IOFreePhyMem() IOGetVirtMem(), IOFreeVirtMem() IOGetIramBase() vpu_EncOpen(); vpu_EncClose(); vpu_EncGetInitialInfo(); vpu_EncRegisterFrameBuffer() vpu_EncGetBitstreamBuffer(); vpu_EncUpdateBitstreamBuffer(); vpu_EncStartOneFrame(); vpu_EncGetOutputInfo(); vpu_EncGiveCommand(); vpu_init() vpu_EncOpen() vpu_EncGetInitialInfo( ) vpu_DecRegisterFrameBuffer() vpu_EncGiveCommand( ) Lack of source data ? No Copy source data to source frame buffer vpu_EncGetOutputInfo() Exit vpu_EncClose() vpu_UnInit End Yes Example VPU Encode Flow for a single instance vpu_EncStartOneFrame( ) vpu_IsBusy() ? Process output data

Encoder Operation Steps Call vpu_Init() to initialize the VPU Open an encoder instance using vpu_EncOpen() Before starting a picture encoder operation, get crucial parameters for encoder operations such as required frame buffer size using vpu_EncGetInitialInfo() Using the returned frame buffer requirement, allocate size of frame buffers and convey this information to the VPU using vpu_EncRegisterFrameBuffer() Generate high-level header syntaxes using vpu_EncGiveCommand() Start picture encoder operation picture-by-picture using vpu_EncStartOneFrame() Wait the completion of picture encoder operation interrupt event After encoding a frame is complete, check the results of encoder operation using vpu_EncGetOutputInfo() If there are more frames to encode, go to Step 4, otherwise go to the next step Terminate the sequence operation by closing the instance using vpu_EncClose() Call vpu_UnInit() to release the system resources

i.MX6x VPU encoder—Visual quality At the same frame rate, bitrate, and resolution, the visual quality of i.MX6x VPU has similar visual quality to the i.MX5x VPU. Visual quality can be measured as: Objective such as PSNR, SSIM, etc Subjective Encoder visual quality is determined by: Rate control algorithm for CBR Prediction algorithms Entropy coding methods Encoder configurations Visual quality and rate control accuracy can be improved by fine-tuning the VPU encoder rate control algorithm and parameters in firmware.

i.MX6x VPU encoder— Rate control basic concept, leaking bucket Leaking in constant rate Pouring in variable rate (adjusted by QP) Buffer level QP bits Visual quality Buffer level

i.MX6x VPU encoder— Encoder configuration example //General setting vpu_enc->width=704 // picture width vpu_enc->height=480 // picture height vpu_enc->tgt_framerate=30 // frame rate avc_constrainedIntraPredFlag=0 // constrained_intra_pred_flag encOP->gopSize = vpu_enc->gopsize //e.g., 30, GOP picture number (0 : only first I, 1 : all I, 3 : I,P,P,I,) //DEBLKING FILTER avc_disableDeblk = 0 // disable_deblk (0 : enable, 1 : disable, 2 : disable at slice boundary) avc_deblkFilterOffsetAlpha = 0 // deblk_filter_offset_alpha (-6 ~ 6) avc_deblkFilterOffsetBeta = 0 // deblk_filter_offset_beta (-6 ~ 6) avc_chromaQpOffset = 0 // chroma_qp_offset (-12 ~ 12) //SLICE STRUCTURE slicemode.sliceMode = 0 // slice mode (0 : one slice, 1 : multiple slice) slicemode.sliceSizeMode = 0 // slice size mode (0 : slice bit number, 1 : slice mb number) slicemode.sliceSize = 0 // slice size number (bit count or mb number) //RATE CONTROL vpu_enc->bitrate=1024 //bit rate in kbps (ignored if rate control disable) vpu_enc->encOP->initialDelay=0 // delay in ms (initial decoder buffer delay) (0 : ignore) vpu_enc->encOP->vbvBufferSize=0 // VBV buffer size in bits (0 : ignore) vpu_enc->encOP->rcIntraQp=40 // rcIntraQp, qp value for constant intra frame QP function, vpu_enc->max_qp=45 // userQpMax, maximum qp (13 ~ 51) vpu_enc->min_qp =10 // userQpMin, vpu_enc->gamma=0 // encOP->userGamma, gamma value in RC (0 ~ 0.99999) x 32768 vpu_enc->rc_interval_mode=0 // rate control interval mode (0 - default mode, 1 - frame based, 2 - slice based, 3 - MB interval) Vpu_enc->rc_mb_interval=100 // rate control interval, This value is only valid when mode is 3 // ERROR RESILIENCE vpu_enc->intraRefresh=0 // Intra MB Refresh (0 - None, 1 ~ MbNum-1 //Intra mode selection vpu_enc->intra16x16_mode_only // For enabling or disaling Intra4x4 mode

Tile format support Outline Tiled format concept and benefits Tiled format handling in Video Data Order Adapter (VDOA)

VPU tiled format support 15 1 14 16 30 17 31 47 46 33 32 Linear format storage 15 1 14 16 30 17 31 47 46 33 32 Tiled format storage 15 1 14 16 30 17 31 47 46 33 32 Why tiled format: Much more efficient DDR access i.MX6x VPU supports the following tile format: Linear map (Type 0 ) Tiled macroblock raster frame map (Type 1 ) Tiled macroblock raster field map (Type 2 )

Tiled format handling in Video Data Order Adapter (VDOA) The decoded data from VPU is stored in system memory in tiled format (each tile is 16x16 macroblock) VDOA converts the tiled data (16x16 tile) into raster-scan format VDOA outputs the raster-scan format data to IPU for display 15 1 14 16 30 17 31 47 46 33 32 Tiled format storage 15 1 14 16 30 17 31 47 46 33 32 Linear format storage VDOA

JPEG Processing Unit (JPU) Outline JPU overview and facts JPU process and pipeline

JPEG Processing Unit (JPU) An enhanced JPEG codec with higher performance compared to i.MX5x VPU A separate JPU hardware module without firmware control Decoding up to 120Mpixels/sec at YUV444 format Support YUV4:0:0, 4:2:0, 4:2:2, and 4:4:4 formats Performance will be doubled for the input format of YUV4:2:0 Encoding up to 160Mpixels/sec at YUV444 format JPEG API consideration Linux libjpeg compatible API i.MX5x VPU compatible API

JPEG Processing Unit (JPU)—encoder example Raw image DMA JPEG decoding process is the inverse of encoding process Pre/Post processor (1) Read multiple MCUs for 4:2:0, 4:2:2 and 4:4:4 format (2) Quantized Coefficient memory generate non-zero flag T/Q Quantized Coeff. memory VLC BitStream DMA Getting YUV image data Pre-processing Transform & Quantization

VPU Software Structure The VPU software can be divided into two parts: 1) Kernel driver: takes responsibility for system control and reserving resources(memory/IRQ). It provides an IOCTL interface for the application layer in user-space as a path to access system resources. 2) User space library: the application in user-space calls related IOCTLs and codec library functions to implement a complex codec system. VPU library (e.g., libvpu.so) is located in: /usr/lib/ VPU firmware binary (e.g., vpu_fw_imx6q.bin) is located in: /lib/firmware/vpu/

Source Code Structure (Kernel Driver) The table below lists the kernel space source files available in the following directories: <yocto_dir>/linux/drivers/mxc/vpu/

Source Code Structure (User space) The table below lists the user space library source files available in the following directory: <yocto_dir>/imx-vpu/driver-version/imx-vpu-driver-version/vpu

Stereo 3D Outline S3D coding methods Simulcast method Combined Frame (Frame Packing, or Frame compatible) H.264-MVC S3D

Stereo 3D Coding Methods Name Coding method Simulcast Method Left view and right view coded separately in a simulcast way Frame-packing Method Combination of two views into one frame in various frame packing methods MPEG-2 Multiview profile using temporal L/R interleaving for stereo video H.264 Stereo SEI message and Frame Packing Arrangement SEI message allow various methods of L/R packing (Frame Compatible S3D) Temporal interleaving spatial row/column, spatial side-by-side, Spatial up-and-bottom, checkerboard (quincunx), H.264-MVC S3D Method Coded in H.264-MVC Stereo High Profile with base view and enhanced view, with the exploitation of interview prediction

Simulcast Method Left view video input Encode Decode Left view output Bitstream left and right view Channel Right view video input Encode Decode Right view output

Frame-Packing Method top-bottom row-interleaving side-by-side column-interleaving Temporal interleaving checkerboard t 1st view 2nd view

H.264-MVC-S3D Method S3D can be generated using only two views, S0 and S1

VPU with Multimedia Framework Outline Multimedia Framework Supported Multimedia Format

Video playback using VPU VPU driver Multimedia framework (Gstreamer, Dshow) source plug-in parser audio decoder video sink pipeline architecture Other multimedia frameworks (OpenCORE/stageFright, Flash10, etc) OpenMax-IL VPU HW Video players (Totem, WMP) (Samsung Muon, Android Gallery)

Supported Streaming Containers MP4: Playback: MPEG4, H.264, H.263 Capture, MPEG4, H.264 AVI: Playback: MPEG4, Divx/Xvid, H.264, WMV/VC1 Capture: MPEG4, H.264 MPEG2-TS: Playback: MPEG2, H.264, VC1 MPEG2-PS: Playback: MPEG2, H.264, MPEG4, AVS FLV: Playback: H.264, Sorenson, VP6 ASF: Playback: WMV/VC1 WebM: Playback: VP8 RMVB: Playback: RV8/9/10 Matroska (MKV): Playback: MPEG4, Divx/Xvid, H.264, WMV/VC1 3GP: Playback: MPEG4, H.264 Ogg: Playback: Theora

Streaming Protocol Support File format Supported OS HTTP .mp4/.3gp/.mov, .flv/ .f4v, .avi, .wmv/.asf, .mpg/.vob/.ts, .mp3, .aac, .wma, .mkv Android Linux RTSP .mp4 HTTPLive .m3u8 RTP .ts UDP

Plug-in Support Summary Audio Decoder beepdec: unified audio decoder plugin Supports MP3, AAC, AAC+, WMA, AC3, Vorbis, DD+, AMR Audio Encoder imxmp3enc : MP3 encoder plugin Video Decoder vpudec: VPU-based video decoder plugin Software video deocder plugins: use gst-libav plugins Video Encoder Vpuenc : VPU-base video encoder plugin Demux aiurdemux: aiur universal demuxer plugin supporting Supports AVI, MKV, MP4, MPEG2, ASF, OGG, FLV, WebM Video Render imxv4l2sink: v4l2 based video sink plugin overlaysink : g2d based video sink plugin Video Source imxv4l2src: v4l2 based camera /TVin source plugin AVB streaming avbpcmsrc : avb audio source, de-packetize LPCM data from AVTP package, follow IEC 61883-6. avbpcmsink : avb audio sink, packetize LPCM data into AVTP package, follow IEC 61883-6 avbmpegtssrc : avb video source, de-packetize compressed video TS package from AVTP package, follow IEC 61883-4 avbmpegtssink : avb video sink, Packetize compressed video TS package into AVTP package, follow IEC 61883-4 NOTE: To support WMA, WMV, AAC+, AC3, DD+ decoding and WMA encoding needs to install special and excluded packages. vpudec plugins are only for SoCs with VPU hardware.

VPU encode/decode with IPU pre-/post process Outline The Display Ports In i.MX6 D/Q Max Display Port Resolutions The Video Input Ports In i.MX6 D/Q IPU Internal Structure and Process Flow

The Display Ports In i.MX6 D/Q Displays LVDS Bridge i.MX6Q/D IPU #0 split MIPI/DSI LVDS 1 LVDS 0 Parallel 0 DI00 DI01 HDMI Transmitter Parallel 1 IPU #1 DI10 DI11 In1 In0 DCIC #1 mux DCIC #0 Six ports Two parallel - driven directly by the IPU Two LVDS channels - driven by the LVDS bridge One HDMI – driven by the HDMI transmitter One MIPI/DSI – driven by the MIPI/DSI transmitter Four simultaneous outputs Each IPU has two display ports (DI0 and DI1) Therefore, only up to four external ports can be active at any given time. Additional asynchronous data flows can be sent through the parallel ports and the MIPI/DSI port

Max Display Port Resolutions MIPI DSI, 2 lanes WXGA (1366 x 768) or 720p (1280 x 720) RGB Port 1 – 4XGA (2048 x 1536) Port 2 – 4XGA (2048 x 1536) LVDS Single channel – WXGA (1366 x 768) or 720p (1280 x 720) Dual channel – UXGA (1600 x 1200) or 1080p (1920 x 1080) HDMI 1080p (1920 x 1080) or 4XGA (2048 x 1536) Note: Assuming 30% blanking intervals overhead, 24bpp, 60fps

The Video Input Ports In i.MX6 D/Q Three ports; up to six input channels Two parallel – connected directly to the IPUs One MIPI/CSI-2 – connected to the MIPI/DSI receiver, can transfer up to four concurrent channels Four concurrent channels Each IPU has two input ports (CSI0 and CSI1), each can process an input channel from one of the external ports. The MIPI/CSI-2 bridge sends all its channels to all the IPU input ports and each port can select for processing a different channel, identified by its DI (Data Identifier). Additional channels can be transferred through a CSI transparently – as generic data – directly to the system memory. i.MX6 D/Q IPU #0 CSI0 Parallel 0 mux Video Sources MIPI/CSI-2 Receiver CSI1 MIPI/CSI-2 IPU #1 CSI1 Parallel 1 mux CSI0 Formats supported: - BT.656 - BT.1120 - BT 1358 (not validated) - YUV422, RGB888, YUV444 = over an 8 bit bus RAW format up to 16bpp which will be translated to 8 bit using companding - Generic data up to 20bit

IPUv3H – Internal Structure and Process Flow Cameras CSI (Camera Sensor I/F) SMFC (Sensor Multi FIFO Ctrl.) IDMAC (Image DMA Controller) IPUv3H VDI (Video De-Interlacer) 64-bit DI (Display I/F) IC (Image Converter) Memory AXI Displays DP (Display Processor) DC (Display Contr.) DMFC (Display Multi FIFO Ctrl.) CM (Control Module) IRT (Image Rotator) 32-bit AHB MCU

Use-case Demos and Bandwidth Profiling Tool Outline Bandwidth profiling tool, MMDC2 in unit test Demo for unit test (linear format vs tile format) Single-stream playback Transcoding Demo for Gstreamer (linear format vs tile format) Dual-stream playback 3D demo (with 3D TV and glasses)

Bandwidth Profiling Tool MMDC2 =================MMDCv1.3==================== Usage: mmdc [ARM:DSP1:DSP2:GPU2D:GPU2D1:GPU2D2:GPU3D:GPUVG:VPU:M4:PX P:SUM] [...] export MMDC_SLEEPTIME can be used to define profiling duration.1 by default means 1s export MMDC_LOOPCOUNT can be used to define profiling times. 1 by default. -1 means infinite loop. export MMDC_CUST_MADPCR1 can be used to customize madpcr1. Will ignore it if defined master Note1: More than 1 master can be inputted. They will be profiled one by one. Note2: MX6DL can't profile master GPU2D, GPU2D1 and GPU2D2 are used instead.

Bandwidth Profiling Tool MMDC2 Run command “/unit_tests/mmdc2 VPU”: i.MX6Q detected. MMDC VPU MMDC new Profiling results: *********************** Measure time: 1000ms Total cycles count: 528077600 Busy cycles count: 359091030 Read accesses count: 7009051 Write accesses count: 2027040 Read bytes count: 299036224 Write bytes count: 96798720 Avg. Read burst size: 42 Avg. Write burst size: 47 Read: 285.18 MB/s / Write: 92.31 MB/s Total: 377.50 MB/s Utilization: 6% Overall Bus Load: 67% Bytes Access: 43

Bandwidth Profiling Tool MMDC2 Run command “/unit_tests/mmdc2 VPU”: i.MX6Q detected. MMDC VPU MMDC new Profiling results: *********************** Measure time: 1000ms Total cycles count: 528077600; (madpsr0, total cycle count in the measuring time period, e.g., 1000ms) Busy cycles count: 359091030; (madpsr1, total cycles during which the MMDC state machines were busy for both W/R) Read accesses count: 7009051; (madpsr2, total # of read accesses) Write accesses count: 2027040; (madpsr3, total # of write accesses) Read bytes count: 299036224; (madpsr4, total # of bytes that were transferred during read access) Write bytes count: 96798720; (madpsnr5, total # of bytes that were transferred during write access) Avg. Read burst size: 42; (read-bytes-count/read-accesses-count) Avg. Write burst size: 47; (write-bytes-count/write-accesses-count) Read: 285.18 MB/s / Write: 92.31 MB/s Total: 377.50 MB/s Utilization: 6%; ((read-byte-count + write-byte-count)/(busy-cycles-count*8*2)*100) with 64-bit DDR3 Overall Bus Load: 67%; (busy-cycles-count/total-cycles-count * 100) Bytes Access: 43; (or access utilization: (read-bytes-count+write-bytes-count)/read-accesses-count+write-accesses-count)

#####Demo Case 1, Unit Test, Playback, VPU=264MHz################# #####Unit test with tiled format and display to 1080p hdmi display####### cp /home/FAE/sunflower_2B_2ref_WP_40Mbps.264 /dev/shm/tmp_video /unit_tests/mxc_vpu_test.out -D "-i /dev/shm/tmp_video -f 2 -t 1 -a 60 -y 1" /unit_tests/mxc_vpu_test.out -D "-i /dev/shm/tmp_video -f 2 -t 1 -a 60 -y 0" /unit_tests/mxc_vpu_test.out -D "-i balloons_view01_3d.264 -f 2 -l 2" #####Demo Case 2, Unit Test, Transcoding, VPU=264MHz#################### #####Unit test with linear format and display to 264p or 1080p hdmi display##### #####Unit test with tile format is not ready yet############################## /unit_tests/mxc_vpu_test.out -T "-i /home/FAE/Coral_Reef_Adventure_720_video.wmv3 -f 3 -t 1 -x 0 -y 0 -a 60 -w 1280 -h 720 -o /dev/shm/tmp_video -q 25 -g 30" cp /home/FAE/mpeg2_1080p25_video1.mpv /dev/shm/tmp_video /unit_tests/mxc_vpu_test.out -T "-i /dev/shm/tmp_video -f 4 -t 1 -x 0 -y 0 -a 60 -w 1920 -h 1088 -o /dev/shm/transcode.264 -q 25 -g 30” #####Demo Case 3, Unit Test, video capture, VPU=264MHz#################### #####Unit test with linear format and display to 264p or 1080p hdmi display##### /unit_tests/mxc_vpu_test.out -E "-x 0 -f 2 -w 1280 -h 720 -b 0 -q 20 -g 0 -a 30 -c 600 -y 0 -o /dev/shm/out.264"

#####Demo Case 4: Gstreamer, Single stream decoding########## #####Decoding H264 1080p@10Mbps, and display to 1080p hdmi display###### time gst-launch-1.0 filesrc location= /home/FAE/Container_clips/Avatar_1920x1080_30fpsH264_2x44100AAC_3.6Mbps_246sec.mp4 typefind=true ! video/quicktime ! aiurdemux ! queue max-size-time=0 ! vpudec frame-drop=false output-format=0 ! overlaysink ##### Demo Case 5: Gstreamer, Single stream decoding, <60fps VPU=264MHz########### #####Decoding H264 1080p@37Mbps, display to 1080p hdmi display###### sudo cp /home/FAE/Container_clips/00017.m2ts /dev/shm/tmp_video time gst-launch filesrc location=/dev/shm/tmp_video typefind=true ! aiurdemux ! vpudec output-format=4 framedrop=false ! queue max-size-buffers=2 ! imxv4l2sink sync=false #####Demo Case 6: Gstreamer, Overlay with dual streams####### #####Decoding two H264 1080p@10Mbps, linear format , resized to 720p and overlaid to another 1080p######### sudo cp /home/FAE/Container_clips/FTF20033_1920x1080_30fpsH264_2x44100AAC_9.7mpbs_137sec.mp4 /dev/shm/tmp_video time gst-launch-1.0 playbin uri=file:///dev/shm/tmp_video video-sink="overlaysink overlay-width=1280 overlay-height=720" \ playbin uri=file:///home/FAE/Container_clips/Mosaic_1920x1080_H.264_10mbps_video1_repeat.mp4 video-sink="overlaysink overlay-width=1920 overlay-height=1080" #####Demo Case 7: Gstreamer, dual streams with dual display####### #####Decoding two H264 1080p for HDMI+LVDS, linear format , resized one to XGA display gst-launch-1.0 playbin uri=file:///dev/shm/ava.mp4 video-sink="imxv4l2sink device=/dev/video17" & \ gst-launch-1.0 playbin uri=file:///dev/shm/ftf.mp4 video-sink="imxv4l2sink device=/dev/video18"

#####Demo Case 8, Gstreamer,Transcoding################### #####1080p-MPEG2VGA-H264 with linear format##### time gst-launch-1.0 filesrc location=/home/FAE/Container_clips/mpeg2_1080p25_new.ts typefind=true ! video/mpegts ! aiurdemux ! vpudec ! vpuenc ! matroskamux ! filesink location=/dev/shm/tmp_video #####Demo Case 9, Gstreamer, Transcoding, 27fps for VPU=264MHz################### #####1080p-MPEG21080p-H264 with tiled format and display to 1080p hdmi display##### time gst-launch filesrc location=/home/FAE/Container_clips/mpeg2_1080p25_new.ts typefind=true ! aiurdemux ! vpudec output-format=4 framedrop=false ! queue max-size-buffers=2 ! tee name=t ! vpuenc codec=avc bitrate=20000000 ! matroskamux ! filesink location=/dev/shm/tmp_video t. ! queue max-size-buffers=2 ! imxv4l2sink sync=false #####Demo Case 10, 3D demo 30fps for each view, VPU=264MHz################### gplay /home/FAE/Container_clips/FLIGHT_3D_sideByside.mkv.mp4 #####Demo Case 11, 3D demo 30fps for each view, VPU=264MHz################### gst-launch imxv4l2src device=/dev/video0 fps-n=30 capture-mode=4 ! queue ! vpuenc codec=6 ! matroskamux ! filesink location=/dev/shm/output.mkv sync=false