Analysis of Audio Compression Algorithms Sanjeev Sharma
What will be covered? What are the audio file formats? Why so many? History of the most popular format (MP3) NeoAudio Transcoder MP3 file format explained MP3 Algorithm/ Features/ Issues VQF vs. MP3 Ogg Vorbis vs. MP3
Why so many formats? Different hardware/ operating systems need different file structure/ device drivers – Apple plays AIFF (uncompressed) AIFC (compressed) – Sun or DEC (Unix) play ‘au’, ‘snd’ – PCs (Windows) play ‘RIFF’/‘wav’ (uncompressed), ‘wma’, ‘wmv’ (compressed)
Why so many formats? (Cont’d) Several companies came out with their own Proprietary Technologies – InterWave by VocalTec (www.vocaltec.com) – TrueSpeech by DSP Group, Inc (www.dspg.com) – RealAudio by Real Networks (www.real.com) – ToolVox by VoxWare (www.voxware.com) – Perceptual Audio Coder (PAC) by Lucent (www.lucent.com)
Why so many formats? (Cont’d) Proprietary Technologies (Cont’d) – Adaptive Transform Audio Coding (ATRAC) by Sony (http://www.sony.net/Products/ATRAC3) – TwinVQ or VQF from NTT/ Yamaha (http://www.yamaha-xg.com) – Windows Media Audio by Microsoft (http://www.microsoft.com/windows/windowsmedia)
Why so many formats? (Cont’d) Several companies collaborated to define non proprietary open standards – Specification available to all – But different economics involved In general, MP3 Encoder is not free, has IPR restrictions Ogg Vorbis Encoder is free, and open source
MPEG Stands for Moving Picture Experts Group MPEG-1 – First phase, started in 1988, finalized in 1992 – Three operating modes with increasing complexity and performance Layer 1, Layer 2, Layer 3 MPEG-2 – Originally (1994) only added two extensions to MPEG-1 Backwards compatible multi-channel coding Coding at lower sampling frequencies – Later gave up backwards compatibility in favor of Advanced Audio Coding (AAC)
MPEG (Cont’d) MPEG-3 – Created to define High Definition Television (HDTV) video coding – Later rolled into MPEG-2 itself MPEG-4 – Finished in late 1998 – Emphasis on new functionalities rather than compression efficiency Mobile/ Stationary User Terminal Database Access Communications Interactive Services
MPEG (Cont’d) MPEG-7 – Does NOT define compression algorithm – Content representation standard for multimedia information search, filtering, management and processing
MPEG Layers Layer 1 – possesses the lowest complexity – specifically targeted to applications where the complexity of the encoder plays an important role. Layer 2 – requires a more complex encoder as well as a slightly more complex decoder. – is able to suppress more redundancy in the signal and applies the psychoacoustic model in more efficient way.
MPEG Layers (Cont’d) Layer 3 – increased complexity – targeted to applications needing the lowest data rates, by its suppression of the redundant signal and its improved extraction of feebly audible frequencies using its filter – MP3 stands for MPEG-1/2 Layer 3 and not MPEG-3!!
Personal Car Stereo Installed Sony CDX-MP450X in my car hoping I would be able to enjoy my MP3’s while driving Burnt an mp3 CD to play on car stereo (~150 songs) Most of the mp3’s were skipped, only some actually played Investigated to find the difference Turned out that player was able to decode only high bit rate files Installed free software (NeoAudio) on computer to do the ‘transcoding’ – Conversion from one sampling rate and/or bit rate to another ‘On the fly’ converted files play, but with ‘clicks’ Intermediate conversion to wav and then transcoding to mp3 gave perfect results!
MP3 File Format File itself split into frames – One frame is an audio clip of 24 ms at 48 KHz sampling Each frame has a 4 byte frame header Constant Bit Rate files have similar frame headers Variable Bit Rate (VBR) files have different info in each frame header – Lower bitrates may be used in frames where it will not affect quality
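The 24 ms figure can be checked with a little arithmetic. The sketch below assumes 1152 samples per frame, the value the MPEG audio specification gives for MPEG-1 Layer III; that number is not stated on the slide, and the function name is only illustrative.

```python
# Assumption: 1152 PCM samples per frame (MPEG-1 Layer III, per the MPEG
# audio specification; not stated on the slide).
SAMPLES_PER_FRAME = 1152


def frame_duration_ms(sample_rate_hz: int) -> float:
    """Return the playing time of one frame in milliseconds."""
    return 1000.0 * SAMPLES_PER_FRAME / sample_rate_hz


if __name__ == "__main__":
    for rate in (48000, 44100, 32000):
        print(rate, "Hz ->", round(frame_duration_ms(rate), 2), "ms")
```

At 48 KHz this gives the 24 ms quoted on the slide; at lower sampling rates each frame lasts slightly longer.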
MP3 Frame Header AAAAAAAA AAABBCCD EEEEFFGH IIJJKLMM – A - Frame sync (all 11 bits set) – B - MPEG Audio version ID (2 bit) – C - Layer description (2 bit) – D - Protection bit (1 bit) – E - Bitrate index (4 bit) – F - Sampling rate frequency index (2 bit) – G - Padding bit (1 bit) – H - Private bit (1 bit) – I - Channel Mode (2 bit) – J - Mode Extension (2 bit) – K - Copyright (1 bit) – L - Original (1 bit) – M - Emphasis (2 bit)
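As an illustration of the bit layout above, here is a minimal sketch that splits a 4 byte header into the fields A to M. The function name and dictionary keys are hypothetical, and the sample bytes are just one example header; the code only extracts the raw field values and does not interpret them.

```python
def parse_frame_header(header: bytes) -> dict:
    """Split a 4-byte MP3 frame header into its raw bit fields."""
    if len(header) != 4:
        raise ValueError("a frame header is exactly 4 bytes")
    word = int.from_bytes(header, "big")   # AAAAAAAA AAABBCCD EEEEFFGH IIJJKLMM
    if (word >> 21) & 0x7FF != 0x7FF:      # A: all 11 sync bits must be set
        raise ValueError("frame sync not found")
    return {
        "version_id":     (word >> 19) & 0x3,  # B
        "layer":          (word >> 17) & 0x3,  # C
        "protection":     (word >> 16) & 0x1,  # D
        "bitrate_index":  (word >> 12) & 0xF,  # E
        "sampling_index": (word >> 10) & 0x3,  # F
        "padding":        (word >> 9) & 0x1,   # G
        "private":        (word >> 8) & 0x1,   # H
        "channel_mode":   (word >> 6) & 0x3,   # I
        "mode_extension": (word >> 4) & 0x3,   # J
        "copyright":      (word >> 3) & 0x1,   # K
        "original":       (word >> 2) & 0x1,   # L
        "emphasis":       word & 0x3,          # M
    }


if __name__ == "__main__":
    # Example header bytes chosen for illustration only.
    for name, value in parse_frame_header(bytes([0xFF, 0xFB, 0x90, 0x64])).items():
        print(f"{name:15s} {value:04b}")
```

For the example bytes, the version field is 11 and the layer field is 01, which the following slides decode as MPEG Version 1, Layer III.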
MPEG Audio version ID (B) 00 - MPEG Version 2.5 (unofficial) 01 - reserved 10 - MPEG Version 2 (ISO/IEC ) 11 - MPEG Version 1 (ISO/IEC )
Layer Description (C) 00 – reserved 01 - Layer III 10 - Layer II 11 - Layer I
Protection Bit (D) 0 - Protected by CRC (16bit crc follows header) 1 - Not protected
Bitrate Index (E)
Sampling rate frequency index (F)
Padding Bit (G), Private Bit (H) Padding Bit (G) – 0 - frame is not padded – 1 - frame is padded with one extra slot – Padding is used to fit the bit rates exactly Private Bit (H) – May be freely used for specific needs of an application
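To see why padding is needed, it helps to compute a frame size. The sketch below assumes the commonly cited formula for MPEG-1 Layer III, floor(144 × bitrate / sampling rate) + padding, with one slot equal to one byte; the formula is not given on the slide, and the function name is illustrative.

```python
def frame_length_bytes(bitrate_bps: int, sample_rate_hz: int, padding: int) -> int:
    """Frame size in bytes for MPEG-1 Layer III (assumed formula, see lead-in)."""
    # 144 = 1152 samples per frame / 8 bits per byte
    return 144 * bitrate_bps // sample_rate_hz + padding


if __name__ == "__main__":
    # 128 kbps at 44.1 KHz: the exact size is about 417.96 bytes, so an
    # encoder alternates unpadded and padded frames to hit the bit rate.
    print(frame_length_bytes(128000, 44100, 0))
    print(frame_length_bytes(128000, 44100, 1))
    # At 48 KHz the division is exact, so no padding is ever required.
    print(frame_length_bytes(128000, 48000, 0))
```

Because frames must contain a whole number of bytes, the padded frames make up the fractional remainder and keep the average at the nominal bit rate.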
Mode Extension (J) Applicable to Joint Stereo only Complete frequency range of MPEG file is divided into 32 subbands For Layer I & II these two bits determine frequency range (bands) where intensity stereo is applied. For Layer III these two bits determine which type of joint stereo is used (intensity stereo or Middle/Side stereo).
Copyright (K), Original (L), Emphasis (M) Copyright (K) – 0 - Audio is not copyrighted – 1 - Audio is copyrighted Original (L) – 0 - Copy of original media – 1 - Original media Emphasis (M) – Tells the decoder to 're-equalize' the sound after a Dolby-like noise suppression – 00 - none – 01 - 50/15 µs – 10 - reserved – 11 - CCITT J.17
Perceptual Audio Coder (PAC) Original work attributed to Lucent (http://www.bell-labs.com/org/1133/Research/SpeechAudioCoding/audio.html) Became the framework of MPEG-2 encoders
MP3 Encoder/ Decoder
MP3 Encoder/ Decoder (Cont’d) Filter Bank – Encoder decomposes input signal into subsampled spectral components (time/ frequency domain) – Forms an Analysis/ Synthesis system in combination with the decoder filterbank Perceptual Model – For either time domain signal or the analysis filterbank output Computes an estimate of the actual (time and frequency dependent) masking Uses rules known from psychoacoustics – Psychoacoustics: Relationship between what arrives at the ear and what we hear
MP3 Encoder/ Decoder (Cont’d) Quantization and coding – Spectral components are quantized and coded keeping the quantization noise below the masking threshold Encoding of bitstream – Bitstream formatter assembles the bitstream – Bitstream consists of Quantized and coded spectral coefficients Side information like bit allocation information
MPEG Flexibility Flexibility needed to fit into several applications Flexibility achieved with – Different Operating Modes Single channel Dual channel (two independent channels) Stereo (no joint stereo coding) Joint stereo – Different Sampling frequencies 32 KHz, 44.1 KHz, 48 KHz (MPEG-1) Half of above (MPEG-2) One quarter of the MPEG-1 rates (MPEG-2.5, proprietary Fraunhofer extension)
MPEG Flexibility (Cont’d) Flexibility achieved with – Different Bit rates Bitrate defines the compression ratio Min 32 kbps to Max 320 kbps for MPEG-1 Min 8 kbps to Max 160 kbps for MPEG-2 Low Sampling Frequencies extension (LSF) Variable bit rate also possible (each segment has its own bit rate) Sweet spot – 128 Kbps for stereo signal at 48 KHz sampling rate – Bit rates higher than this improve quality very slowly – Bit rates lower than this degrade quality very fast
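The link between bit rate and compression ratio can be made concrete. The sketch below assumes the uncompressed source is 16 bit PCM (the sample width is not stated on the slide) and uses hypothetical function names.

```python
def pcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def compression_ratio(pcm_bps: int, encoded_bps: int) -> float:
    """How many times smaller the encoded stream is than the PCM source."""
    return pcm_bps / encoded_bps


if __name__ == "__main__":
    # Assumption: 16-bit stereo PCM source.
    for rate in (44100, 48000):
        source = pcm_bitrate_bps(rate, 16, 2)
        print(rate, "Hz:", source, "bps,",
              "ratio at 128 kbps =", compression_ratio(source, 128000))
```

Under these assumptions the 128 Kbps sweet spot corresponds to roughly an 11:1 to 12:1 reduction, and CD-rate stereo works out to about 1411 kbps uncompressed.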
MP3 Quality Not all encoders are created equal Quantization and encoding block forms – Inner control loop to adjust the quantization step with the available Huffman codes (rate loop) – outer control loop with the perceptual block to keep quantization noise under masking threshold (noise control loop) Hence encoder needs to be ‘tuned’ for different bitrates
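The two nested loops can be sketched as a toy model. This is not how a real encoder is written: it assumes a uniform quantizer (MP3 uses a non-uniform one), treats every coefficient as its own band, uses a crude bit count in place of Huffman code lengths, and picks the 1.1 and 1.25 factors arbitrarily. All names are hypothetical.

```python
def quantize(coeffs, step, scales):
    """Uniformly quantize scaled coefficients (toy stand-in for MP3's quantizer)."""
    return [round(c * s / step) for c, s in zip(coeffs, scales)]


def bits_needed(quantized):
    """Crude stand-in for Huffman code length: magnitude bits plus one sign bit."""
    return sum(abs(v).bit_length() + 1 for v in quantized)


def rate_loop(coeffs, scales, bit_budget, step=0.01):
    """Inner loop: coarsen the step size until the frame fits the bit budget."""
    if bit_budget < len(coeffs):
        raise ValueError("budget too small: each value costs at least 1 bit here")
    while True:
        q = quantize(coeffs, step, scales)
        if bits_needed(q) <= bit_budget:
            return q, step
        step *= 1.1


def noise_control_loop(coeffs, thresholds, bit_budget, max_iters=20):
    """Outer loop: amplify bands whose quantization noise exceeds the mask."""
    scales = [1.0] * len(coeffs)
    for _ in range(max_iters):
        q, step = rate_loop(coeffs, scales, bit_budget)
        noisy = [i for i, (c, v, s, t) in enumerate(zip(coeffs, q, scales, thresholds))
                 if (c - v * step / s) ** 2 > t]   # squared error after dequantizing
        if not noisy:
            break
        for i in noisy:
            scales[i] *= 1.25   # spend more precision on the audible error
    return q, step, scales


if __name__ == "__main__":
    coeffs = [0.9, -0.4, 0.05, 0.01]
    thresholds = [1e-3, 1e-3, 1e-2, 1e-2]   # hypothetical masking thresholds
    print(noise_control_loop(coeffs, thresholds, bit_budget=16))
```

The two loops pull in opposite directions: the inner one coarsens quantization to save bits, the outer one demands finer quantization where noise would be audible. How an encoder balances them at a given bit rate is where the 'tuning' comes in.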
MP3 IPR Issues MPEG is an open standard But it is informative only The ISO approved standard is based on work by Fraunhofer Institute, which is protected by several patents. In September 1998, the Fraunhofer Institute sent a letter to several developers of "free" ISO-source based encoders saying that all developers and publishers of MPEG-audio layer 3 (MP3) encoders based on ISO-source must pay a license fee to Fraunhofer. Fraunhofer joined with Thomson Multimedia (AKA RCA) in order to create a joint patents portfolio: mp3licensing.com
Sample MP3/MP3 Patents Digital coding process Digital adaptive transformation coding method Process for the detecting of errors in the transmission of frequency-coded digital signals Process for reducing frequency interlacing during acoustic or optical signal transmission and/or recording Method for reducing data in the transmission and/or storage of digital signals of several dependent channels Process for reducing data in the transmission and/or storage of digital signals of several interdependent channels Etc…etc..
LAME LAME Ain’t an Mp3 Encoder LAME is an educational tool to be used for learning about MP3 encoding The goal of the LAME project is to use the open source model to improve the psycho acoustics, noise shaping and speed of MP3
Free Software? Several free programs, such as NeoAudio, use the LAME plug-in, despite the cryptic note on the official homepage (http://www.mp3dev.org) – “Using the LAME encoding engine (or other mp3 encoding technology) in your software may require a patent license in some countries.” NeoAudio and LAME are open source software under the GNU General Public License
VQF or TwinVQ Started by NTT/ Yamaha Corp Some claim that VQF produces audio files with better compression and better sound quality than MP3. Others say the sound quality of a VQF file is neither better nor worse than that of an MP3 file, just different. Needs more processing power for encoding/ decoding Supported in MPEG-4 Support for VQF has waned as of late
MP3 vs. VQF MP3 128 Kbps Original 1411 Kbps VQF 96 Kbps
MP3 vs. VQF (Cont’d) Colors vary from red (peaks in power spectra) to blue and violet (the lowest signal power). - VIBGYOR
MP3 vs. VQF (Cont’d) 1. The MP3 psychoacoustic model completely excludes some high frequencies (colored blue) when it decides that they are irrelevant. Clearly, VQF designers have decided not to exclude any part of the spectrum. 2. MP3 preserves power spectra peaks (colored red) very well, but it has problems with the "green" and "yellow" parts; this can be heard by a careful listener. VQF does not preserve the peaks at the highest frequencies as well, but it beats MP3 at everything else (especially at mid-frequencies).
MP3 vs. VQF (Cont’d)
VQF vs. MP3 (Cont’d) Conclusion? – It seems that MP3 has a better psychoacoustic model. – VQF sounds (and looks) more natural.
Ogg Vorbis Started in 1993 Development picked up in fall, 1998 after Fraunhofer started asking royalties for MP3 projects Ogg is a container format for audio, video, and metadata Vorbis is the name of a specific audio compression scheme that's designed to be contained in Ogg – other formats are capable of being embedded in Ogg such as FLAC and Speex
MP3 vs. Ogg
Frequencies over 16 KHz are lost in both Cutoff more severe for MP3 around 15 KHz Ogg does maintain, although diminishing, some of higher frequencies