Multiplexing H.264/AVC Video with MPEG-AAC Audio Harishankar Murugan University of Texas at Arlington
Outline
- Multiplexing: areas of application
- Why H.264 and AAC?
- Multiplexing
- De-multiplexing
- Synchronization and playback
- Results
- Conclusions
- Future work
- References
Multiplexing: Areas of application
- DVB: DVB-C, DVB-T
- ATSC
- IPTV
Why H.264 Video?
- Bit-rate savings of up to 50% compared to H.263v2 (H.263+) or MPEG-2 Simple Profile.
- High-quality video: consistently good video quality at both high and low bit rates.
- Error resilience: H.264 provides the tools necessary to deal with packet loss in packet networks and bit errors in error-prone wireless networks.
- Wide range of applications: streaming, mobile TV, HDTV, and storage options for the home user.
Important features of H.264
- IDR (instantaneous decoder refresh) picture: anchor picture with only I-slices.
- Sequence parameter set:
  - profile and level indicator
  - decoding or playback order
  - number of reference frames
  - aspect ratio or color space details
- Picture parameter set:
  - entropy coding mode used
  - slice data partitioning and macroblock reordering
  - flags indicating the usage of weighted (bi-)prediction
  - quantization parameter details
AAC Audio
Advanced Audio Coding (AAC) is a standardized, lossy compression scheme for audio.
[Figure: Block diagram of the AAC encoder]
AAC Audio
Profiles:
- Low Complexity (LC): the simplest and most widely used
- Main Profile (MAIN): LC profile with backward prediction
- Scalable Sampling Rate (SSR): LC profile with gain control tool
Bit stream formats:
- ADIF (Audio Data Interchange Format): only one header at the beginning of the file, followed by raw data blocks
- ADTS (Audio Data Transport Stream): a separate header for each frame, enabling decoding from any frame
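Because each ADTS frame carries its own header, a reader can step from frame to frame using only the header fields. The following is a minimal Python sketch of that idea. The bit positions follow the fixed 7-byte ADTS header of ISO/IEC 13818-7; the function names `parse_adts_header` and `iter_adts_frames` are illustrative assumptions and do not come from the presentation.

```python
ADTS_HEADER_SIZE = 7  # header length in bytes when no CRC is present
SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000, 24000,
                22050, 16000, 12000, 11025, 8000, 7350]
PROFILES = ["Main", "LC", "SSR", "reserved"]  # index 3 is reserved in MPEG-2 AAC


def parse_adts_header(buf, pos=0):
    """Return the header fields of the ADTS frame at buf[pos:], or None."""
    if len(buf) - pos < ADTS_HEADER_SIZE:
        return None
    b = buf[pos:pos + ADTS_HEADER_SIZE]
    # The 12-bit syncword 0xFFF marks the start of every ADTS frame.
    if b[0] != 0xFF or (b[1] & 0xF0) != 0xF0:
        return None
    rate_index = (b[2] >> 2) & 0x0F
    return {
        "profile": PROFILES[b[2] >> 6],
        "sample_rate": SAMPLE_RATES[rate_index] if rate_index < len(SAMPLE_RATES) else None,
        "channels": ((b[2] & 0x01) << 2) | (b[3] >> 6),
        # 13-bit frame length in bytes, header included.
        "frame_length": ((b[3] & 0x03) << 11) | (b[4] << 3) | (b[5] >> 5),
    }


def iter_adts_frames(buf):
    """Yield (offset, header) for consecutive ADTS frames in buf."""
    pos = 0
    while True:
        header = parse_adts_header(buf, pos)
        if header is None or header["frame_length"] < ADTS_HEADER_SIZE:
            return
        yield pos, header
        pos += header["frame_length"]
```

An ADIF file, by contrast, would need its single header read first, which is why ADTS suits a stream that may be joined at any point.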
Why AAC Audio?
- Supports sampling frequencies from 8 kHz to 96 kHz (official MP3: 16 kHz to 48 kHz).
- Higher coding efficiency and a simpler filter bank (pure MDCT) compared to MP3 (hybrid filter bank).
- Improved compression provides higher-quality audio at lower bit rates.
- Superior performance at bit rates above 64 kbps and at bit rates as low as 16 kbps.
Factors to be considered for Multiplexing and Transmission
- Split the coded video and audio bit streams into smaller data packets.
- Multiplex with equal priority given to all elementary streams.
- Detect packet losses and errors.
- Carry additional information to help synchronize audio and video.
Packetization
[Figure: Block diagram: video source, audio source and data source; H.264 encoder and AAC encoder; packetizers producing PES; multiplexer producing the transport stream]
Two layers of packetization:
- PES (packetized elementary stream)
- Transport stream
Packetized Elementary Stream (PES)
- Elementary streams (ES):
  - encoded video stream
  - encoded audio stream
  - data stream (optional)
- A PES contains access units that are sequentially separated and packetized.
- PES headers distinguish the different ES and contain time stamp information.
- Packet size varies with the size of the access units.
[Figure: Packetized elementary stream (PES): audio or video elementary stream, PES header, payload]
PES Header Description
- 3 bytes of start code (0x…)
- 1 byte of stream ID
- 2 bytes of packet length
- 2 bytes of time stamp (frame number)
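The 8-byte header described on this slide can be sketched with Python's `struct` module. This is only an illustration under stated assumptions: the start code value `0x000001` is the standard MPEG-2 PES prefix and is assumed here because the slide's value is not legible, the packet length is assumed to count payload bytes only, and the stream ID `0xE0` in the usage example is a hypothetical choice.

```python
import struct

START_CODE = b"\x00\x00\x01"          # assumption: MPEG-2 PES start code prefix
PES_HEADER = struct.Struct(">3sBHH")  # start code, stream ID, length, frame number


def build_pes(stream_id, frame_number, access_unit):
    """Prefix one access unit with the 8-byte header described above."""
    if len(access_unit) > 0xFFFF or not 0 <= frame_number <= 0xFFFF:
        raise ValueError("length and frame number must each fit in 2 bytes")
    header = PES_HEADER.pack(START_CODE, stream_id, len(access_unit), frame_number)
    return header + access_unit


def parse_pes(packet):
    """Return (stream_id, frame_number, payload) from a packet built by build_pes."""
    start, stream_id, length, frame_number = PES_HEADER.unpack_from(packet)
    if start != START_CODE:
        raise ValueError("missing PES start code")
    payload = packet[PES_HEADER.size:PES_HEADER.size + length]
    return stream_id, frame_number, payload
```

For example, `parse_pes(build_pes(0xE0, 42, b"abc"))` returns the stream ID, frame number and payload that were packed. A 2-byte frame number wraps after 65,536 frames; how the original implementation handles the wrap is not stated in the slides.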
Frame number as time stamp
- Video frame rate: constant (25, 30, ... fps)
  time = frame number / fps
- Audio sampling rate: constant (8 to 96 kHz)
  Number of samples per frame (AAC): 1024
  time = 1024 * frame number / sampling rate
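The two conversions above translate directly into code. In this short Python sketch the function names are illustrative assumptions; the formulas are the ones on the slide.

```python
AAC_SAMPLES_PER_FRAME = 1024


def video_time(frame_number, fps):
    """Playback time in seconds of a video frame at a constant frame rate."""
    return frame_number / fps


def audio_time(frame_number, sampling_rate):
    """Playback time in seconds of an AAC frame (1024 samples per frame)."""
    return AAC_SAMPLES_PER_FRAME * frame_number / sampling_rate
```

At 25 fps, video frame 50 plays at 2.0 s. At 48 kHz each AAC frame lasts 1024/48000 s, about 21.3 ms, so audio frame 94 plays at about 2.005 s.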
Advantages over the method that uses clock samples as time stamps
- Saves the extra header bytes used for sending program clock reference (PCR) information periodically.
- No synchronization problems due to clock jitter.
- No propagation of delay between audio and video.
- Less complex and more suitable for software implementation.
Transport Packets
- PES from the various elementary streams are broken into smaller packets called transport packets.
- Transport packets have a fixed length of 188 bytes.
- Constraints:
  - Each packet can carry data from only one PES.
  - A PES header must start at the first byte of the transport packet payload.
- Stuffing bytes are added where needed to satisfy these constraints.
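A minimal Python sketch of the splitting step follows. It assumes a 3-byte transport header, which leaves 185 payload bytes per packet (the figure used in the slides' own worked example), and it only reports how many fill bytes the last packet needs rather than building full packets. The name `split_pes` is hypothetical.

```python
TS_PACKET_SIZE = 188
TS_HEADER_SIZE = 3                                  # assumption: 24-bit packet header
TS_PAYLOAD_SIZE = TS_PACKET_SIZE - TS_HEADER_SIZE   # 185 bytes


def split_pes(pes):
    """Split one PES into per-packet chunks.

    Returns a list of (chunk, starts_pes, fill) tuples, where fill is the
    number of bytes (offset byte plus stuffing) needed to pad the packet
    to 188 bytes. Only the last chunk can need filling.
    """
    chunks = []
    for offset in range(0, len(pes), TS_PAYLOAD_SIZE):
        chunk = pes[offset:offset + TS_PAYLOAD_SIZE]
        # The PES header sits in the first chunk, so it always lands at the
        # first payload byte of a transport packet.
        chunks.append((chunk, offset == 0, TS_PAYLOAD_SIZE - len(chunk)))
    return chunks
```

Because every PES starts a new packet and no packet mixes two PES, both constraints hold by construction; the cost is the fill in the final packet.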
[Figure: Transport stream packets built from a PES: transport header, PES header, PES payload, stuffing bytes]
Packet Header
Syntax                                     Number of bits
Sync byte                                  8
PID                                        10
Payload unit start indicator               1
Adaptation field control                   1
Continuity counter                         4
if adaptation field control == 1
    payload byte offset                    8
    stuffing bytes or additional header
payload
Packet Header
- PID (packet identifier): each elementary stream has a unique PID. Some values are reserved for NULL packets and PSI.
- PSI (program specific information): the sequence parameter set and picture parameter set are sent as PSI at frequent intervals.
- Payload unit start indicator: 1-bit flag indicating the presence of a PES header in the payload.
- Adaptation field control: 1-bit flag indicating the presence of any data other than PES data in the payload.
- Continuity counter: 4-bit rolling counter, incremented by 1 for each consecutive TS packet of the same PID; used to detect packet loss.
- Payload byte offset: if the adaptation field control bit is '1', this field gives the byte offset of the start of the payload, that is, the length of the adaptation field.
- Adaptation field:
  - stuffing bytes, if the PES data is smaller than the TS packet payload
  - additional header information
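The 24-bit header and the continuity check can be sketched in Python as follows. Several details are assumptions rather than facts from the slides: the sync byte value `0x47` (the MPEG-2 transport stream value), the stuffing value `0xFF`, and the convention that the offset byte holds the number of stuffing bytes that follow it. All function names are hypothetical.

```python
TS_PACKET_SIZE = 188
TS_HEADER_SIZE = 3
TS_PAYLOAD_SIZE = TS_PACKET_SIZE - TS_HEADER_SIZE
SYNC_BYTE = 0x47       # assumption: standard MPEG-2 TS sync byte
STUFFING_BYTE = 0xFF   # assumption: stuffing value


def build_ts_packet(pid, continuity, chunk, starts_pes):
    """Build one 188-byte packet: sync(8) PID(10) PUSI(1) AFC(1) CC(4)."""
    if not 0 <= pid < 1024 or len(chunk) > TS_PAYLOAD_SIZE:
        raise ValueError("PID must fit in 10 bits and chunk in 185 bytes")
    has_adaptation = len(chunk) < TS_PAYLOAD_SIZE
    header = ((SYNC_BYTE << 16) | (pid << 6) | (int(starts_pes) << 5)
              | (int(has_adaptation) << 4) | (continuity & 0x0F))
    packet = header.to_bytes(TS_HEADER_SIZE, "big")
    if has_adaptation:
        stuffing = TS_PAYLOAD_SIZE - 1 - len(chunk)  # one byte holds the offset
        packet += bytes([stuffing]) + bytes([STUFFING_BYTE]) * stuffing
    return packet + chunk


def parse_ts_packet(packet):
    """Return (pid, starts_pes, continuity, payload) of one packet."""
    header = int.from_bytes(packet[:TS_HEADER_SIZE], "big")
    if len(packet) != TS_PACKET_SIZE or header >> 16 != SYNC_BYTE:
        raise ValueError("not a transport packet")
    start = TS_HEADER_SIZE
    if (header >> 4) & 1:                        # adaptation field present
        start += 1 + packet[TS_HEADER_SIZE]      # skip offset byte and stuffing
    return (header >> 6) & 0x3FF, bool((header >> 5) & 1), header & 0x0F, packet[start:]


def lost_packets(previous_continuity, continuity):
    """Number of packets missing between two packets of the same PID."""
    return (continuity - previous_continuity - 1) % 16
```

A 4-bit counter can only reveal up to 15 consecutive losses per PID; a gap of exactly 16 packets would look like no loss at all.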
Multiplexing method adopted
- The multiplexing method affects buffer fullness at the de-multiplexer and, in turn, playback.
- Video and audio timing counters are used to ensure proper multiplexing.
- Timing counters are incremented according to the playback time of each packet multiplexed.
- The PES with the lowest timing counter value is always given preference during packet allocation.
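Multiplexing method adopted
- fps = 25, so each video PES represents 1/25 s = 40 ms of playback.
- PES length = 570 bytes; with 185 payload bytes per TS packet, number of TS packets = ceil(570/185) = 4.
- Each of the 4 TS packets therefore accounts for 40/4 = 10 ms of playback.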
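The timing-counter scheduling can be sketched in Python as below. This is an illustration under assumptions, not the author's implementation: it emits only stream labels ('V' or 'A') rather than real packets, it spreads each PES's playback time evenly over its TS packets as in the 570-byte example, and it breaks ties in favor of video. The function names are hypothetical.

```python
import math

TS_PAYLOAD_SIZE = 185
AAC_SAMPLES_PER_FRAME = 1024


def packet_durations(pes_sizes, pes_duration):
    """Playback time credited to each TS packet of each PES in a stream."""
    durations = []
    for size in pes_sizes:
        count = math.ceil(size / TS_PAYLOAD_SIZE)   # TS packets for this PES
        durations.extend([pes_duration / count] * count)
    return durations


def multiplex_order(video_pes_sizes, audio_pes_sizes, fps, sampling_rate):
    """Return the stream label of each TS packet in transmission order."""
    pending = {
        "V": packet_durations(video_pes_sizes, 1.0 / fps),
        "A": packet_durations(audio_pes_sizes, AAC_SAMPLES_PER_FRAME / sampling_rate),
    }
    counters = {"V": 0.0, "A": 0.0}
    position = {"V": 0, "A": 0}
    order = []
    while True:
        ready = [s for s in pending if position[s] < len(pending[s])]
        if not ready:
            return order
        # The stream with the least timing counter gets the next packet.
        stream = min(ready, key=lambda s: counters[s])
        counters[stream] += pending[stream][position[stream]]
        position[stream] += 1
        order.append(stream)
```

With one 570-byte video PES at 25 fps and two hypothetical 200-byte audio PES at 48 kHz, the packets alternate between the streams, so both buffers at the de-multiplexer fill at roughly the same playback rate.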
[Figure: Multiplexed transport stream: video PES and audio PES interleaved into a transport stream; packet sequence shown: P1 V 0x2, P1 A 0x4, P1 A 0x5, P1 A 0x6, P1 V 0x3, N N N N, P1 A 0x7]
Buffer fullness at demultiplexer
Test criteria                            Video buffer 100 kB   Video buffer 600 kB   Video buffer 600 kB
Start TS packet number
End TS packet number
Video buffer fullness (kB)
Audio buffer fullness (kB)
Number of video frames
Number of audio frames
Video buffer content playback time       1.72 sec              8.44 sec              6.92 sec
Audio buffer content playback time       1.75 sec              8.47 sec              6.94 sec
Synchronization and playback
- During playback, data is loaded from the buffer.
- The video buffer is searched from the top for an IDR frame.
- The frame number of the IDR frame is extracted.
- The corresponding audio frame number is calculated as:
  audio frame number = (video frame number * sampling rate) / (1024 * fps)
Synchronization and playback
- If the result is not an integer, it is rounded and the corresponding audio frame is searched for.
- The audio and video contents from the corresponding frame numbers onward are decoded using the PSI and played back.
- The audio and video buffers are then emptied, incoming data is buffered, and the process continues.
- If the corresponding audio frame is not found, the next IDR frame is searched for and the same process is repeated.
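The search for a synchronization point can be sketched in Python as follows. The formula is the one given in the slides; the function names and the representation of the buffers as plain lists of frame numbers are assumptions made for illustration.

```python
AAC_SAMPLES_PER_FRAME = 1024


def matching_audio_frame(video_frame_number, fps, sampling_rate):
    """Audio frame whose playback time is closest to the given video frame."""
    return round(video_frame_number * sampling_rate / (AAC_SAMPLES_PER_FRAME * fps))


def find_sync_point(idr_frame_numbers, buffered_audio_frames, fps, sampling_rate):
    """Return (video_frame, audio_frame) for the first IDR frame whose
    matching audio frame is in the audio buffer, or None if there is none."""
    available = set(buffered_audio_frames)
    for video_frame in idr_frame_numbers:
        audio_frame = matching_audio_frame(video_frame, fps, sampling_rate)
        if audio_frame in available:
            return video_frame, audio_frame
    return None
```

At 25 fps and 48 kHz, IDR frame 54 maps to 54 * 48000 / (1024 * 25) = 101.25, which rounds to audio frame 101. Since rounding shifts the start by at most half an audio frame, the residual offset stays below about 10.7 ms at 48 kHz.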
Results
Test clip details                        Clip 1
Duration of clip (sec)                   125
YUV file size (kB)                       468,168
WAVE file size (kB)                      25,624
H.264 file size (kB)                     6,566
AAC file size (kB)                       2,066
Video encoder compression ratio          71.30
Audio encoder compression ratio          12.40
H.264 encoder bit rate (kB/s)            52.53
AAC encoder bit rate (kB/s)              16.53
Results
Test clip details                        Clip 1
H.264 encoder bit rate (kB/s)            52.53
AAC encoder bit rate (kB/s)              16.53
Number of TS packets                     50,577
Total transport stream size (kB)         9,508
Compression ratio with header overhead   51.93
Bit rate for transmission (kB/s)         76.06
AVI (*.avi) file size (kB)               19,456
MPEG-2 movie (*.mpg) file size (kB)      15,666
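The derived figures in the results tables follow from the file sizes, the clip duration and the packet count. The short Python check below recomputes them; it assumes that 1 kB means 1000 bytes, which is the convention under which 50,577 packets of 188 bytes give the reported 9,508 kB.

```python
TS_PACKET_SIZE = 188   # bytes
duration_s = 125
yuv_kb, wave_kb = 468_168, 25_624
h264_kb, aac_kb = 6_566, 2_066
ts_packets = 50_577

# Assumption: 1 kB = 1000 bytes.
ts_kb = ts_packets * TS_PACKET_SIZE / 1000

results = {
    "video_compression_ratio": yuv_kb / h264_kb,
    "audio_compression_ratio": wave_kb / aac_kb,
    "h264_rate_kB_per_s": h264_kb / duration_s,
    "aac_rate_kB_per_s": aac_kb / duration_s,
    "ts_size_kB": ts_kb,
    "ratio_with_overhead": (yuv_kb + wave_kb) / round(ts_kb),
    "ts_rate_kB_per_s": round(ts_kb) / duration_s,
    # Bytes added by PES headers, TS headers and stuffing.
    "overhead_kB": round(ts_kb) - h264_kb - aac_kb,
}

for name, value in results.items():
    print(f"{name}: {value:.2f}")
```

The packetization overhead comes to 876 kB, roughly 9% of the transport stream.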
Synchronization results
Columns: Start TS packet number | Synchronized frame numbers chosen (video, audio) | Video frame playback time (sec) | Audio frame playback time (sec) | Delay (msec) | Visual delay
Visual delay: not perceptible in all five test cases.
Conclusions
- Audio and video are synchronized even when de-multiplexing starts from an arbitrary TS packet.
- Visually, there is no perceptible lag between video and audio.
- The bit rate can be changed by using the rate control module in the H.264 encoder.
Test Conditions
- A single-program transport stream is generated.
- Input raw video: YUV format
- Input raw audio: WAVE format
- Profiles used:
  - H.264: Main profile
  - AAC: Low Complexity profile (ADTS format)
- GOP: IBBPBB (IDR forced)
- Video frame rate: 25 fps
- Audio sampling frequency: 48 kHz
Future work
- Extend the algorithm to multiplex multiple program streams.
- Add an error correction method.
- Reduce the initial buffering time.
[Figure: De-multiplexer block diagram: transport stream, demultiplexer, video buffer and audio buffer, H.264 decoder and AAC decoder, time stamp information, synchronized playback]