A Tutorial on MPEG/Audio Compression Davis Pan, IEEE Multimedia Journal, Summer 1995 Presented by: Randeep Singh Gakhal CMPT 820, Spring 2004.



Outline: Introduction; Technical Overview; Polyphase Filter Bank; Psychoacoustic Model; Coding and Bit Allocation; Conclusions and Future Work

Introduction: What does MPEG-1 Audio provide? A perceptually transparent lossy audio compression system that exploits the weaknesses of the human ear. It can compress by a factor of 6 while retaining sound quality. It is one part of a three-part standard covering audio, video, and audio/video synchronization.

Technical Overview

MPEG-1 Audio Features: PCM sampling rate of 32, 44.1, or 48 kHz. Four channel modes: monophonic, dual-monophonic, stereo, and joint-stereo. Three modes (called layers in MPEG-1): Layer I: computationally cheapest, bit rates > 128 kbps per channel. Layer II: bit rate ~ 128 kbps per channel, used in VCD. Layer III: most complicated encoding/decoding, bit rates ~ 64 kbps per channel, originally intended for streaming audio.

Human Audio System (ear + brain): Human sensitivity to sound is non-linear across the audible range (20 Hz to 20 kHz). The audible range is divided into critical bands, regions within which humans cannot perceive a difference in frequency.

MPEG-1 Encoder Architecture [1]

MPEG-1 Encoder Architecture: Polyphase Filter Bank: transforms PCM samples to frequency-domain signals in 32 subbands. Psychoacoustic Model: calculates the acoustically irrelevant parts of the signal. Bit Allocator: allots bits to subbands according to input from the psychoacoustic calculation. Frame Creation: generates an MPEG-1 compliant bit stream.

The Polyphase Filter Bank

Polyphase Filter Bank: Divides the audio signal into 32 equal-width subband streams in the frequency domain. The inverse filter at the decoder cannot recover the signal without some, albeit inaudible, loss. Based on work by Rothweiler [2]. The standard specifies a 512-coefficient analysis window, C[n].

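Polyphase Filter Bank: A buffer holds 512 PCM samples, with 32 new samples, X[n], shifted in every computation cycle. Calculate window samples for i = 0…511: $Z[i] = C[i] \cdot X[i]$. Partial calculation for i = 0…63: $Y[i] = \sum_{j=0}^{7} Z[i + 64j]$. Calculate 32 subband samples for i = 0…31: $S[i] = \sum_{k=0}^{63} M[i][k] \cdot Y[k]$, where M is the analysis matrix.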
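The three steps of one computation cycle can be sketched in NumPy as below. This is a minimal sketch, assuming the formulation given in [1]: Z[i] = C[i]·X[i], Y[i] = Σ_j Z[i + 64j], S[i] = Σ_k M[i][k]·Y[k], with M[i][k] = cos((2i + 1)(k − 16)π/64). The real window C[n] is tabulated in the standard and is not reproduced here; `placeholder_window` is a hypothetical stand-in used only so the code runs, and the function names are illustrative.

```python
import numpy as np

def analysis_matrix():
    """M[i][k] = cos((2i + 1)(k - 16) * pi / 64), shape (32, 64)."""
    i = np.arange(32).reshape(-1, 1)
    k = np.arange(64).reshape(1, -1)
    return np.cos((2 * i + 1) * (k - 16) * np.pi / 64)

def subband_samples(x, c):
    """One computation cycle: 512 buffered samples -> 32 subband samples.

    x: buffer of 512 PCM samples (the newest 32 already shifted in)
    c: 512-coefficient analysis window
    """
    z = c * x                          # Z[i] = C[i] * X[i], i = 0..511
    y = z.reshape(8, 64).sum(axis=0)   # Y[i] = sum over j of Z[i + 64j], i = 0..63
    return analysis_matrix() @ y       # S[i] = sum over k of M[i][k] * Y[k]

# Hypothetical stand-in window; NOT the C[n] table from the standard.
placeholder_window = np.hanning(512)

rng = np.random.default_rng(0)
buffer = rng.standard_normal(512)
s = subband_samples(buffer, placeholder_window)
```

Folding the 512 windowed samples down to 64 before the matrix multiply is what keeps the cost low: the matrix stage works on 64 values instead of 512.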

Polyphase Filter Bank Visualization of the filter [1] :

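Polyphase Filter Bank: The net effect: $S[i] = \sum_{k=0}^{63} \sum_{j=0}^{7} M[i][k] \cdot C[k + 64j] \cdot X[k + 64j]$. Analysis matrix: $M[i][k] = \cos\left[\frac{(2i + 1)(k - 16)\pi}{64}\right]$. Requires 512 + 32×64 = 2560 multiplies. Each subband has bandwidth π/32T, centered at odd multiples of π/64T, where T is the sampling period.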
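As a worked example of these figures, the sketch below converts the subband bandwidth and centers from radians per second to hertz for a chosen sampling rate and recomputes the multiply count. It assumes T = 1/fs, so π/32T corresponds to fs/64 Hz and π/64T to fs/128 Hz; the function name is illustrative.

```python
def subband_layout(sample_rate_hz, n_subbands=32):
    """Return (bandwidth_hz, list of center frequencies in Hz)."""
    # pi / (32 T) rad/s divided by 2*pi gives fs / 64 Hz per subband.
    bandwidth = sample_rate_hz / (2 * n_subbands)
    # Centers sit at odd multiples of pi / (64 T), i.e. odd multiples of fs / 128 Hz.
    centers = [(2 * i + 1) * bandwidth / 2 for i in range(n_subbands)]
    return bandwidth, centers

bandwidth, centers = subband_layout(48_000)

# 512 multiplies for the windowing step plus a 32 x 64 matrix multiply.
multiplies = 512 + 32 * 64
```

At 48 kHz this gives 750 Hz wide subbands, the first centered at 375 Hz, and together the 32 subbands cover 0 to 24 kHz.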

Polyphase Filter Bank Shortcomings: Equal-width filters do not correspond to the critical band model of the auditory system. The filter bank and its inverse are NOT lossless. Adjacent subbands overlap in frequency.

Polyphase Filter Bank Comparison of filter banks and critical bands [1]:

Polyphase Filter Bank Frequency response of one subband [1] :

Psychoacoustic Model

The Weakness of the Human Ear: Frequency-dependent resolution: we cannot discern minute differences in frequency within the critical bands. Auditory masking: when two signals of very close frequency are both present, the louder will mask the softer. A masked signal must be louder than some threshold for it to be heard, which gives us room to introduce inaudible quantization noise.

MPEG-1 Psychoacoustic Models: The MPEG-1 standard defines two models. Psychoacoustic Model 1: less computationally expensive; makes some serious compromises in what it assumes a listener cannot hear. Psychoacoustic Model 2: provides more features suited for Layer III coding, assuming, of course, increased processor bandwidth.

Psychoacoustic Model: Convert samples to the frequency domain using a Hann weighting followed by a DFT. The Hann weighting reduces the edge artifacts that a finite window would otherwise introduce into the frequency-domain representation. Model 1 uses a 512-sample (Layer I) or 1024-sample (Layers II and III) window. Model 2 uses a 1024-sample window and two calculations per frame.
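The Hann-then-DFT step can be sketched as follows. This is a minimal sketch, not the standard's exact procedure: it omits the level normalization a real encoder applies, and the 3 kHz test tone, the 48 kHz rate, and the function name are assumptions chosen so the tone lands exactly on a DFT bin.

```python
import numpy as np

def power_spectrum_db(frame):
    """Hann-weight a frame, take its DFT, and return the power spectrum in dB."""
    window = np.hanning(len(frame))
    spectrum = np.fft.rfft(frame * window)
    power = np.abs(spectrum) ** 2
    return 10 * np.log10(power + 1e-12)   # small offset avoids log(0)

fs = 48_000                  # assumed sampling rate
n = 512                      # Layer I window size for Model 1
t = np.arange(n) / fs
tone = np.sin(2 * np.pi * 3_000 * t)   # 3000 / (48000 / 512) = bin 32

spec = power_spectrum_db(tone)
peak_bin = int(np.argmax(spec))
```

A 512-sample frame yields 257 spectral lines from `rfft` (bins 0 through 256), and the tone's energy concentrates around bin 32 with only a narrow skirt on either side.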

Psychoacoustic Model: Need to separate sound into “tone” and “noise” components. Model 1: local peaks are tones; the remaining spectrum is lumped, per critical band, into noise at a representative frequency. Model 2: calculates a “tonality” index to determine the likelihood of each spectral point being a tone, based on the previous two analysis windows.
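Model 1's "local peaks are tones" rule can be illustrated with a simplified peak picker. This is a sketch under stated assumptions: the `prominence_db` and `reach` parameters are hypothetical values chosen for illustration, and the standard's actual neighborhood test varies with frequency, which is not modeled here.

```python
def find_tonal_peaks(spectrum_db, prominence_db=7.0, reach=2):
    """Return indices of spectral lines that look like tones.

    A line counts as tonal if it is a local maximum and exceeds the lines
    `reach` bins away on both sides by at least `prominence_db`.
    """
    peaks = []
    for k in range(reach, len(spectrum_db) - reach):
        is_local_max = (spectrum_db[k] > spectrum_db[k - 1]
                        and spectrum_db[k] >= spectrum_db[k + 1])
        stands_out = all(
            spectrum_db[k] - spectrum_db[k + d] >= prominence_db
            for d in (-reach, reach)
        )
        if is_local_max and stands_out:
            peaks.append(k)
    return peaks

# Toy spectrum in dB: one strong peak at index 4, one weak bump at index 9.
example = [10, 12, 11, 30, 60, 31, 12, 10, 11, 13, 12, 10]
tonal_bins = find_tonal_peaks(example)
```

Only the strong peak qualifies; the weak bump is a local maximum but does not stand out enough, so it would be lumped into the noise component of its critical band.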

Psychoacoustic Model: “Smear” each signal within its critical band, using either a masking function (Model 1) or a spreading function (Model 2). Adjust the calculated threshold by incorporating a “quiet” mask: the masking threshold for each frequency when no other frequencies are present.

Psychoacoustic Model: Calculate a masking threshold for each subband in the polyphase filter bank. Model 1: selects the minimum of the masking threshold values in the range of each subband. This is inaccurate at higher frequencies: recall that subbands are linearly distributed, critical bands are NOT! Model 2: if the subband is wider than the critical band, use the minimal masking threshold in the subband; if the critical band is wider than the subband, use the average masking threshold in the subband.

Psychoacoustic Model: The hard work is done. Now we just calculate the signal-to-mask ratio (SMR) per subband: SMR = signal energy / masking threshold. We pass the result on to the coding unit, which can now produce a compressed bitstream.
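The SMR ratio is usually handled in decibels, since the bit allocator works in dB. A minimal sketch of the conversion follows; the energies and thresholds are made-up illustrative numbers, not measured data.

```python
import math

def smr_db(signal_energy, masking_threshold):
    """Signal-to-mask ratio in dB: 10 * log10(signal energy / masking threshold)."""
    return 10 * math.log10(signal_energy / masking_threshold)

# Illustrative per-subband values (linear energy units).
energies = [1000.0, 50.0, 2.0]
thresholds = [10.0, 50.0, 20.0]
smrs = [smr_db(e, t) for e, t in zip(energies, thresholds)]
```

A positive SMR means the signal rises above the mask and needs bits; a negative SMR means the subband's content sits below the masking threshold and can be coarsely quantized or dropped.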

Psychoacoustic Model (example) Input [1] :

Psychoacoustic Model (example) Transformation to perceptual domain [1] :

Psychoacoustic Model (example) Calculation of masking thresholds [1] :

Psychoacoustic Model (example) Signal-to-mask ratios [1] :

Psychoacoustic Model (example) What we actually send [1] :

Coding and Bit Allocation

Layer Specific Coding Layer specific frame formats [1] :

Layer Specific Coding Stream of samples is processed in groups [1] :

Layer I Coding: Group 12 samples from each subband and encode them in each frame (= 384 samples). Each group is encoded with 0-15 bits/sample. Each group has a 6-bit scale factor.

Layer II Coding: Similar to Layer I except: each frame now holds 3 groups of 12 samples per subband (= 1152 samples per frame). Can have up to 3 scale factors per subband to avoid audible distortion in special cases; how they are shared is signalled by the scale factor selection information (SCFSI).
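The frame sizes for Layers I and II follow directly from 32 subbands times 12 samples per group. The sketch below reproduces that arithmetic and adds the frame duration for a given sampling rate; the constant and function names are illustrative.

```python
SUBBANDS = 32
SAMPLES_PER_GROUP = 12
GROUPS_PER_FRAME = {"Layer I": 1, "Layer II": 3}

def samples_per_frame(layer):
    """PCM samples represented by one frame of the given layer."""
    return SUBBANDS * SAMPLES_PER_GROUP * GROUPS_PER_FRAME[layer]

def frame_duration_ms(layer, sample_rate_hz):
    """Audio time covered by one frame, in milliseconds."""
    return 1000 * samples_per_frame(layer) / sample_rate_hz

layer1_samples = samples_per_frame("Layer I")
layer2_samples = samples_per_frame("Layer II")
layer2_ms_at_48k = frame_duration_ms("Layer II", 48_000)
```

At 48 kHz a Layer I frame covers 8 ms of audio and a Layer II frame covers 24 ms, so Layer II amortizes its header and side information over three times as many samples.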

Layer III Coding: Further subdivides subbands using the Modified Discrete Cosine Transform (MDCT), a lossless transform. Larger frequency resolution => smaller time resolution => possibility of pre-echo. The Layer III encoder can detect and reduce pre-echo by “borrowing bits” from future encodings.
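"Lossless" here means the MDCT itself can be inverted exactly when nothing is quantized: each block alone contains aliasing, but overlapping and adding consecutive inverse-transformed blocks cancels it. The sketch below demonstrates that property. It is an illustration under assumptions, not Layer III's actual configuration: the block size of 8, the sine window, and all function names are choices made for this example.

```python
import numpy as np

def mdct_basis(n_half):
    """Cosine basis of shape (n_half, 2 * n_half)."""
    n = np.arange(2 * n_half).reshape(1, -1)
    k = np.arange(n_half).reshape(-1, 1)
    return np.cos(np.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))

def sine_window(n_half):
    """Window satisfying w[n]^2 + w[n + n_half]^2 = 1."""
    return np.sin(np.pi * (np.arange(2 * n_half) + 0.5) / (2 * n_half))

def mdct(block, n_half):
    """2 * n_half windowed input samples -> n_half coefficients."""
    return mdct_basis(n_half) @ (sine_window(n_half) * block)

def imdct(coeffs, n_half):
    """n_half coefficients -> 2 * n_half windowed, aliased output samples."""
    return (2.0 / n_half) * sine_window(n_half) * (mdct_basis(n_half).T @ coeffs)

def roundtrip(signal, n_half):
    """Transform 50%-overlapping blocks and overlap-add the inverses."""
    out = np.zeros(len(signal))
    for start in range(0, len(signal) - 2 * n_half + 1, n_half):
        block = signal[start:start + 2 * n_half]
        out[start:start + 2 * n_half] += imdct(mdct(block, n_half), n_half)
    return out

rng = np.random.default_rng(1)
signal = rng.standard_normal(64)
rebuilt = roundtrip(signal, n_half=8)
```

Every sample covered by two overlapping blocks is recovered to floating-point precision; only the first and last half-blocks, which lack an overlap partner, remain aliased. In a real codec the loss comes from quantizing the coefficients, not from the transform.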

Bit Allocation: Determine the number of bits to allot to each subband given the SMR from the psychoacoustic model. Layers I and II: Calculate the mask-to-noise ratio: MNR = SNR - SMR (in dB), where the SNR is given by the MPEG-1 standard (as a function of quantization levels). Now iterate until no bits are left to allocate: allocate bits to the subband with the lowest MNR, then re-calculate the MNR for the subband that was allocated more bits.
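The greedy loop for Layers I and II can be sketched as below. Two simplifications are assumptions of this sketch, not part of the standard: the SNR is approximated as roughly 6.02 dB per bit instead of being read from the standard's table, and the bit pool is counted in units of "one more bit per sample for one subband" instead of actual frame bits. The function and parameter names are illustrative.

```python
def allocate_bits(smr_db, bit_pool, max_bits=15, snr_per_bit_db=6.02):
    """Greedy allocation: repeatedly give one more bit to the lowest-MNR subband.

    smr_db:   signal-to-mask ratio per subband, in dB
    bit_pool: number of one-bit increments available (simplified unit)
    """
    bits = [0] * len(smr_db)

    def mnr(i):
        # MNR = SNR - SMR, with SNR approximated from the current bit count.
        return bits[i] * snr_per_bit_db - smr_db[i]

    remaining = bit_pool
    while remaining > 0:
        candidates = [i for i in range(len(bits)) if bits[i] < max_bits]
        if not candidates:
            break                       # every subband is at its maximum
        worst = min(candidates, key=mnr)
        bits[worst] += 1                # MNR for this subband is recomputed on the next pass
        remaining -= 1
    return bits

allocation = allocate_bits([20.0, 5.0, -10.0], bit_pool=5)
```

In this toy run the first subband, whose signal sits 20 dB above its mask, receives most of the bits, while the third subband, already below its mask, receives none.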

Bit Allocation, Layer III: Employs “noise allocation”. Quantizes each spectral value and employs Huffman coding. If the quantization leaves noise in excess of the allowed distortion for a subband, the encoder increases resolution on that subband. The whole process repeats until one of three specified stop conditions is met.

Conclusions and Future Work

Conclusions: MPEG-1 provides tremendous compression for relatively cheap computation. Not suitable for archival or audiophile-grade music, as very seasoned listeners can discern distortion. Modifying or searching MPEG-1 content requires decompression and is not cheap!

Future Work: MPEG-1 audio lays the foundation for all modern audio compression techniques. Lots of progress since then (1994!): MPEG-2 (1996) extends MPEG audio compression to support 5.1 channel audio; MPEG-4 (1998) attempts to code based on perceived audio objects in the stream; finally, MPEG-7 (2001) operates at an even higher level of abstraction, focusing on meta-data coding to make content searchable and retrievable.

References: [1] D. Pan, “A Tutorial on MPEG/Audio Compression”, IEEE Multimedia Journal, Summer 1995. [2] J. H. Rothweiler, “Polyphase Quadrature Filters – a New Subband Coding Technique”, Proc. of the Int. Conf. IEEE ASSP, 27.2, Boston, 1983.