Analysis of Audio Compression Algorithms Sanjeev Sharma.

What will be covered?
– What are the audio file formats?
– Why so many?
– History of the most popular format (MP3)
– NeoAudio Transcoder
– MP3 file format explained
– MP3 algorithm, features, and issues
– VQF vs. MP3
– Ogg Vorbis vs. MP3

Some Audio Formats
Uncompressed
– RIFF: Resource Interchange File Format (Windows)
– AIFF: Audio Interchange File Format (Mac)
– AU: Audio (Unix)
Compressed
– MP3: MPEG-1/2 Layer 3
– VQF: [Transform-domain Weighted Interleave] Vector Quantization Format
– Ogg Vorbis

Why so many formats?
Different hardware and operating systems need different file structures and device drivers
– Apple plays AIFF (uncompressed) and AIFC (compressed)
– Sun or DEC (Unix) play ‘au’ and ‘snd’
– PCs (Windows) play ‘RIFF’/‘wav’ (uncompressed) and ‘wma’ (compressed)

Why so many formats? (Cont’d)
Several companies came out with their own proprietary technologies
– InterWave by VocalTec
– TrueSpeech by DSP Group, Inc.
– RealAudio by Real Networks
– ToolVox by VoxWare
– Perceptual Audio Coder (PAC) by Lucent

Why so many formats? (Cont’d)
Proprietary technologies (Cont’d)
– Adaptive Transform Audio Coding (ATRAC) by Sony
– TwinVQ or VQF from NTT/Yamaha
– Windows Media Audio by Microsoft

Why so many formats? (Cont’d)
Several companies collaborated to define non-proprietary open standards
– Specification available to all
– But different economics involved
  – In general, an MP3 encoder is not free and has IPR restrictions
  – The Ogg Vorbis encoder is free and open source

MPEG
Stands for Moving Picture Experts Group
MPEG-1
– First phase, started in 1988, finalized in 1992
– Three operating modes with increasing complexity and performance: Layer 1, Layer 2, Layer 3
MPEG-2
– Originally (1994) only added two extensions to MPEG-1
  – Backwards-compatible multi-channel coding
  – Coding at lower sampling frequencies
– Later gave up backwards compatibility in favor of Advanced Audio Coding (AAC)

MPEG (Cont’d)
MPEG-3
– Created to define High Definition Television (HDTV) video coding
– Later rolled into MPEG-2 itself
MPEG-4
– Finished in late 1998
– Emphasis on new functionalities rather than compression efficiency
  – Mobile/stationary user terminal
  – Database access
  – Communications
  – Interactive services

MPEG (Cont’d)
MPEG-7
– Does NOT define a compression algorithm
– Content representation standard for multimedia information search, filtering, management, and processing

MPEG Layers
Layer 1
– Has the lowest complexity
– Specifically targeted at applications where the complexity of the encoder plays an important role
Layer 2
– Requires a more complex encoder as well as a slightly more complex decoder
– Is able to suppress more redundancy in the signal and applies the psychoacoustic model in a more efficient way

MPEG Layers (Cont’d)
Layer 3
– Increased complexity
– Targeted at applications needing the lowest data rates, through its suppression of redundancy in the signal and its improved extraction of feebly audible frequencies using its filter
– MP3 stands for MPEG-1/2 Layer 3, not MPEG-3!

Personal Car Stereo
– Installed a Sony CDX-MP450X in my car, hoping I would be able to enjoy my MP3s while driving
– Burned an MP3 CD to play on the car stereo (~150 songs)
– Most of the MP3s were skipped; only some actually played
– Investigated to find the difference
– Turned out that the player was able to decode only high bit rate files
– Installed free software (NeoAudio) on the computer to do the ‘transcoding’
  – Conversion from one sampling rate and/or bit rate to another
– Files converted ‘on the fly’ play, but with ‘clicks’
– Intermediate conversion to wav and then transcoding to mp3 gave perfect results!

NeoAudio Transcoder

Transcoder Settings

Transcoding Options
– Choose Encoder
– Choose MPEG Version
– Choose Bitrate
– Choose Mode
– Choose Quality
– Choose Samplerate

Choose Encoder

Choose Encoder Version

Choose Bitrate

Choose Mode and Quality

Choose Samplerate

MP3 File Format
The file itself is split into frames
– One frame is an audio clip of 24 ms at 48 kHz sampling
Each frame has a 4-byte frame header
Constant Bit Rate (CBR) files have similar frame headers
Variable Bit Rate (VBR) files have different info in each frame header
– Lower bitrates may be used in frames where it will not affect quality
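The 24 ms figure can be checked with a short calculation. The sketch below assumes 1152 samples per frame, the MPEG-1 Layer III frame size; that number and the function name are not stated on the slide.

```python
# Assumption: an MPEG-1 Layer III frame carries 1152 samples per channel.
SAMPLES_PER_FRAME = 1152


def frame_duration_ms(sample_rate_hz: int) -> float:
    """Return how many milliseconds of audio one frame covers."""
    return 1000.0 * SAMPLES_PER_FRAME / sample_rate_hz


if __name__ == "__main__":
    for rate in (32000, 44100, 48000):
        print(f"{rate} Hz: {frame_duration_ms(rate):.2f} ms per frame")
```

At 48 kHz this gives exactly 24 ms; at lower sampling rates each frame covers a slightly longer clip.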

MP3 Frame Header
AAAAAAAA AAABBCCD EEEEFFGH IIJJKLMM
– A - Frame sync (all 11 bits set)
– B - MPEG Audio version ID (2 bits)
– C - Layer description (2 bits)
– D - Protection bit (1 bit)
– E - Bitrate index (4 bits)
– F - Sampling rate frequency index (2 bits)
– G - Padding bit (1 bit)
– H - Private bit (1 bit)
– I - Channel Mode (2 bits)
– J - Mode Extension (2 bits)
– K - Copyright (1 bit)
– L - Original (1 bit)
– M - Emphasis (2 bits)
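The bit layout above can be unpacked with shifts and masks. This is a minimal sketch: the function name and dictionary keys are made up for illustration, and it only splits the raw fields without interpreting them.

```python
def parse_frame_header(header: bytes) -> dict:
    """Split a 4-byte MPEG audio frame header into its raw bit fields."""
    if len(header) != 4:
        raise ValueError("a frame header is exactly 4 bytes")
    word = int.from_bytes(header, "big")  # AAAAAAAA AAABBCCD EEEEFFGH IIJJKLMM
    fields = {
        "sync": (word >> 21) & 0x7FF,           # A: 11 bits
        "version_id": (word >> 19) & 0x3,       # B: 2 bits
        "layer": (word >> 17) & 0x3,            # C: 2 bits
        "protection_bit": (word >> 16) & 0x1,   # D: 1 bit
        "bitrate_index": (word >> 12) & 0xF,    # E: 4 bits
        "sampling_index": (word >> 10) & 0x3,   # F: 2 bits
        "padding_bit": (word >> 9) & 0x1,       # G: 1 bit
        "private_bit": (word >> 8) & 0x1,       # H: 1 bit
        "channel_mode": (word >> 6) & 0x3,      # I: 2 bits
        "mode_extension": (word >> 4) & 0x3,    # J: 2 bits
        "copyright": (word >> 3) & 0x1,         # K: 1 bit
        "original": (word >> 2) & 0x1,          # L: 1 bit
        "emphasis": word & 0x3,                 # M: 2 bits
    }
    if fields["sync"] != 0x7FF:
        raise ValueError("frame sync not found (first 11 bits must all be set)")
    return fields


if __name__ == "__main__":
    # Example header bytes chosen for illustration only.
    print(parse_frame_header(bytes.fromhex("fffb9064")))
```

For the example bytes FF FB 90 64, the version ID is binary 11 and the layer field is binary 01; the slides that follow give the meaning of each code.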

MPEG Audio version ID (B)
– 00 - MPEG Version 2.5 (unofficial)
– 01 - reserved
– 10 - MPEG Version 2 (ISO/IEC )
– 11 - MPEG Version 1 (ISO/IEC )

Layer Description (C)
– 00 - reserved
– 01 - Layer III
– 10 - Layer II
– 11 - Layer I

Protection Bit (D)
– 0 - Protected by CRC (16-bit CRC follows header)
– 1 - Not protected

Bitrate Index (E)

Sampling rate frequency index (F)

Padding Bit (G), Private Bit (H)
Padding Bit (G)
– 0 - frame is not padded
– 1 - frame is padded with one extra slot
– Padding is used to fit the bit rates exactly
Private Bit (H)
– May be freely used for specific needs of an application
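Why padding is needed becomes clearer with a frame-length calculation. The sketch below assumes the commonly cited MPEG-1 Layer III formula, 144 × bitrate / sampling rate, with a one-byte slot; the formula and the function name are assumptions, not taken from the slides.

```python
def frame_length_bytes(bitrate_bps: int, sample_rate_hz: int, padding_bit: int) -> int:
    """Frame length in bytes for MPEG-1 Layer III (assumed formula, 1-byte slot)."""
    return 144 * bitrate_bps // sample_rate_hz + padding_bit


if __name__ == "__main__":
    # 128 kbps at 44.1 kHz needs about 417.96 bytes per frame on average,
    # so the encoder mixes unpadded and padded frames to hit the bit rate.
    print(frame_length_bytes(128000, 44100, 0))
    print(frame_length_bytes(128000, 44100, 1))
    # At 48 kHz the division is exact, so no padding is ever needed.
    print(frame_length_bytes(128000, 48000, 0))
```

When the division leaves a remainder, alternating between the two frame lengths lets the average match the nominal bit rate.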

Channel Mode (I)
– 00 - Stereo
– 01 - Joint stereo (Stereo)
– 10 - Dual channel (2 mono channels)
– 11 - Single channel (Mono)

Mode Extension (J)
– Applicable to Joint Stereo only
– The complete frequency range of an MPEG file is divided into 32 subbands
– For Layers I & II, these two bits determine the frequency range (bands) where intensity stereo is applied
– For Layer III, these two bits determine which type of joint stereo is used (intensity stereo or Middle/Side stereo)
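Middle/Side stereo can be illustrated with a small round trip. This sketch assumes the 1/√2 normalization, which makes the transform its own inverse; the slide does not give a formula, and the function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def ms_encode(left, right):
    """Convert left/right samples to middle (sum) and side (difference)."""
    mid = [(l + r) / SQRT2 for l, r in zip(left, right)]
    side = [(l - r) / SQRT2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recover left/right samples from middle/side."""
    left = [(m + s) / SQRT2 for m, s in zip(mid, side)]
    right = [(m - s) / SQRT2 for m, s in zip(mid, side)]
    return left, right


if __name__ == "__main__":
    left = [0.50, 0.40, -0.30]
    right = [0.48, 0.41, -0.29]  # nearly identical channels
    mid, side = ms_encode(left, right)
    print("side channel:", side)  # small values, cheap to code
    print("round trip:", ms_decode(mid, side))
```

When the two channels are similar, the side signal is close to zero, which is where the bit savings come from.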

Copyright (K), Original (L), Emphasis (M)
Copyright (K)
– 0 - Audio is not copyrighted
– 1 - Audio is copyrighted
Original (L)
– 0 - Copy of original media
– 1 - Original media
Emphasis (M)
– Used to 're-equalize' the sound after a Dolby-like noise suppression
– 00 - none
– 01 - 50/15 ms
– 10 - reserved
– 11 - CCITT J.17

Perceptual Audio Coder (PAC)
– Original work attributed to Lucent (labs.com/org/1133/Research/SpeechAudioCoding/audio.html)
– Became the framework of MPEG-2 encoders

MP3 Encoder/ Decoder

MP3 Encoder/ Decoder (Cont’d)
Filter Bank
– Encoder decomposes the input signal into subsampled spectral components (time/frequency domain)
– Forms an analysis/synthesis system in combination with the decoder filterbank
Perceptual Model
– Works on either the time-domain signal or the analysis filterbank output
  – Computes an estimate of the actual (time- and frequency-dependent) masking threshold
  – Uses rules known from psychoacoustics
– Psychoacoustics: the relationship between what arrives at the ear and what we hear

MP3 Encoder/ Decoder (Cont’d)
Quantization and coding
– Spectral components are quantized and coded, keeping the quantization noise below the masking threshold
Encoding of bitstream
– Bitstream formatter assembles the bitstream
– Bitstream consists of
  – Quantized and coded spectral coefficients
  – Side information such as bit allocation information

MPEG Flexibility
Flexibility is needed to fit into several applications
Flexibility achieved with
– Different operating modes
  – Single channel
  – Dual channel (two independent channels)
  – Stereo (no joint stereo coding)
  – Joint stereo
– Different sampling frequencies
  – 32 kHz, 44.1 kHz, 48 kHz (MPEG-1)
  – Half of the above (MPEG-2)
  – One quarter of the MPEG-1 rates (MPEG-2.5, proprietary Fraunhofer extension)
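The "half" and "one quarter" rules translate into concrete sampling rates as follows. This is a small sketch using the three MPEG-1 rates listed above; the function name is hypothetical.

```python
MPEG1_RATES_HZ = (32000, 44100, 48000)  # MPEG-1 rates as listed on the slide


def sampling_rates(version: str) -> tuple:
    """Return the sampling rates in Hz for 'MPEG-1', 'MPEG-2' or 'MPEG-2.5'."""
    divisor = {"MPEG-1": 1, "MPEG-2": 2, "MPEG-2.5": 4}[version]
    return tuple(rate / divisor for rate in MPEG1_RATES_HZ)


if __name__ == "__main__":
    for version in ("MPEG-1", "MPEG-2", "MPEG-2.5"):
        print(version, sampling_rates(version))
```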

MPEG Flexibility (Cont’d)
Flexibility achieved with
– Different bit rates
  – Bitrate defines the compression ratio
  – Min 32 kbps to max 320 kbps for MPEG-1
  – Min 8 kbps to max 160 kbps for the MPEG-2 Low Sampling Frequencies (LSF) extension
  – Variable bit rate is also possible (each segment has its own bit rate)
Sweet spot
– 128 kbps for a stereo signal at 48 kHz sampling rate
– Bit rates higher than this improve quality very slowly
– Bit rates lower than this degrade quality very fast
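The statement that bitrate defines the compression ratio can be made concrete. The sketch below compares a coded bit rate against uncompressed PCM, assuming CD-style parameters (44.1 kHz, 16 bits, 2 channels); those parameters and the function names are assumptions for illustration.

```python
def pcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bit rate of uncompressed PCM audio."""
    return sample_rate_hz * bits_per_sample * channels


def compression_ratio(pcm_bps: int, coded_bps: int) -> float:
    """How many times smaller the coded stream is than the PCM stream."""
    return pcm_bps / coded_bps


if __name__ == "__main__":
    cd = pcm_bitrate_bps(44100, 16, 2)  # assumed CD-quality source
    for kbps in (320, 128, 32):
        ratio = compression_ratio(cd, kbps * 1000)
        print(f"{kbps} kbps: about {ratio:.1f}:1")
```

Under these assumptions the source runs at about 1411 kbps, so the 128 kbps sweet spot corresponds to roughly 11:1 compression.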

MP3 Quality
Not all encoders are created equal
The quantization and encoding block forms
– An inner control loop that adjusts the quantization step with the available Huffman codes (rate loop)
– An outer control loop with the perceptual block that keeps quantization noise under the masking threshold (noise control loop)
Hence the encoder needs to be ‘tuned’ for different bitrates
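The two nested loops can be sketched in a toy form. Everything here is an illustrative assumption: the bit estimate is a crude stand-in for Huffman code lengths, the per-band gains stand in for scale factors, and the thresholds are made-up numbers rather than the output of a real psychoacoustic model.

```python
def quantize(values, step):
    """Uniform quantizer: map each value to the nearest multiple of step."""
    return [round(v / step) for v in values]


def estimated_bits(quantized):
    """Crude stand-in for Huffman code length; zeros are treated as free."""
    return sum(abs(q).bit_length() + 1 for q in quantized if q != 0)


def rate_loop(values, bit_budget, step=0.01):
    """Inner loop: coarsen the step until the bit estimate fits the budget."""
    while True:
        quantized = quantize(values, step)
        if estimated_bits(quantized) <= bit_budget:
            return quantized, step
        step *= 2 ** 0.25


def encode_bands(bands, thresholds, bit_budget, max_iterations=16):
    """Outer loop: amplify bands whose noise exceeds the threshold, then retry."""
    gains = [1.0] * len(bands)
    for _ in range(max_iterations):
        scaled = [v * g for band, g in zip(bands, gains) for v in band]
        quantized, step = rate_loop(scaled, bit_budget)
        noisy, pos = [], 0
        for i, band in enumerate(bands):
            decoded = [q * step / gains[i] for q in quantized[pos:pos + len(band)]]
            noise = sum((v - d) ** 2 for v, d in zip(band, decoded)) / len(band)
            if noise > thresholds[i]:
                noisy.append(i)
            pos += len(band)
        if not noisy or len(noisy) == len(bands):
            break  # every band is clean, or amplifying all bands cannot help
        for i in noisy:
            gains[i] *= 2 ** 0.5  # give the noisy band finer effective resolution
    return quantized, step, gains, noisy


if __name__ == "__main__":
    bands = [[0.9, -0.7, 0.5], [0.05, -0.04, 0.03]]
    thresholds = [1e-3, 1e-3]  # made-up masking thresholds per band
    for budget in (1000, 12):
        q, step, gains, noisy = encode_bands(bands, thresholds, budget)
        print(f"budget={budget}: step={step:.4f} gains={gains} noisy bands={noisy}")
```

With a generous budget the first pass already keeps the noise under both thresholds; with a tight budget the loops trade bits between bands and may end with some bands still audibly noisy, which is one reason encoders have to be tuned per bitrate.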

MP3 IPR Issues
– MPEG is an open standard, but it is informative only
– The ISO-approved standard is based on work by the Fraunhofer Institute, which is protected by several patents
– In September 1998, the Fraunhofer Institute sent a letter to several developers of "free" ISO-source-based encoders saying that all developers and publishers of MPEG audio Layer 3 (MP3) encoders based on ISO source must pay a license fee to Fraunhofer
– Fraunhofer joined with Thomson Multimedia (AKA RCA) to create a joint patent portfolio: mp3licensing.com

Sample MP3 Patents
– Digital coding process
– Digital adaptive transformation coding method
– Process for the detecting of errors in the transmission of frequency-coded digital signals
– Process for reducing frequency interlacing during acoustic or optical signal transmission and/or recording
– Method for reducing data in the transmission and/or storage of digital signals of several dependent channels
– Process for reducing data in the transmission and/or storage of digital signals of several interdependent channels
– Etc.

LAME
– LAME Ain’t an MP3 Encoder
– LAME is an educational tool to be used for learning about MP3 encoding
– The goal of the LAME project is to use the open source model to improve the psychoacoustics, noise shaping, and speed of MP3

Free Software?
Several free programs such as NeoAudio use the LAME plug-in, despite the cryptic note on the official homepage
– “Using the LAME encoding engine (or other mp3 encoding technology) in your software may require a patent license in some countries.”
NeoAudio and LAME are open source software under the GNU General Public License

VQF or TwinVQ
– Started by NTT/Yamaha Corp.
– Some claim that VQF produces audio files with better compression and better sound quality than MP3
– Others say the sound quality of a VQF file is neither better nor worse than that of an MP3 file, just different
– Needs more processing power for encoding and decoding
– Supported in MPEG-4
– Support for VQF has waned as of late

MP3 vs. VQF
[Figure panels: MP3 128 kbps, Original 1411 kbps, VQF 96 kbps]

MP3 vs. VQF (Cont’d) Colors vary from red (peaks in power spectra) to blue and violet (the lowest signal power). - VIBGYOR

MP3 vs. VQF (Cont’d)
1. The MP3 psychoacoustic model completely excludes some high frequencies (colored blue) when it decides that they are irrelevant. Clearly, the VQF designers decided not to exclude any part of the spectrum.
2. MP3 preserves power spectra peaks (colored red) very well, but it has problems with the "green" and "yellow" parts; this can be heard by a careful listener. VQF does not preserve the peaks at the highest frequencies as well, but it beats MP3 at everything else (especially at mid-frequencies).

MP3 vs. VQF (Cont’d)

VQF vs. MP3 (Cont’d)
Conclusion?
– It seems that MP3 has a better psychoacoustic model
– VQF sounds (and looks) more natural

Ogg Vorbis
– Started in 1993
– Development picked up in fall 1998, after Fraunhofer started asking for royalties on MP3 projects
– Ogg is a container format for audio, video, and metadata
– Vorbis is the name of a specific audio compression scheme that is designed to be contained in Ogg
  – Other formats, such as FLAC and Speex, can also be embedded in Ogg

Why Ogg?

MP3 vs. Ogg

– Frequencies over 16 kHz are lost in both
– Cutoff is more severe for MP3, around 15 kHz
– Ogg does maintain some of the higher frequencies, although diminished

That’s All Folks!