CS :: Fall 2003
Audio Coding
Ketan Mayer-Patel
Overview of Today
– Sampling techniques: PCM (linear, μ-law)
– Generic coding techniques: DPCM, ADPCM
– Psychoacoustic coding: MPEG-1
– Speech-specific techniques: vocoding
Audio Signals
Analog audio is essentially voltage as a continuous function of time. Unlike video, which is 3D, audio is a 1D signal.
– It can be captured without having to discretize any higher dimensions.
Digitizing audio boils down to sampling the signal at discrete times and quantizing each sample's level to one of a fixed set of values.
Digital audio parameters:
– bits per sample
– sampling rate
– number of channels
Sampling
Pulse Amplitude Modulation (PAM)
– Each sample's amplitude is represented by one analog value
Sampling theory (Nyquist)
– If the input signal has maximum frequency (bandwidth) f, the sampling frequency must be at least 2f
– With an ideal low-pass filter to interpolate between samples, the input signal can be fully reconstructed
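The sketch below makes the Nyquist condition concrete. The rates are illustrative and not taken from the slides. Sampling at 8 kHz, a 7 kHz cosine violates the 2f requirement. It produces exactly the same samples as a 1 kHz cosine (8 kHz − 7 kHz), so no reconstruction filter can tell the two apart. `sample_cosine` is a hypothetical helper.

```python
import math

def sample_cosine(freq_hz, rate_hz, count):
    """Sample cos(2*pi*f*t) at t = n / rate for n = 0..count-1."""
    return [math.cos(2 * math.pi * freq_hz * n / rate_hz) for n in range(count)]

rate = 8000                            # 8 kHz sampling, so the limit is 4 kHz
tone = sample_cosine(7000, rate, 16)   # 7 kHz: above rate / 2, will alias
alias = sample_cosine(1000, rate, 16)  # 1 kHz = 8 kHz - 7 kHz
in_band = sample_cosine(3000, rate, 16)  # 3 kHz: below rate / 2
```

Comparing `tone` and `alias` element by element shows they match to rounding error, while `in_band` does not. This is why an anti-aliasing low-pass filter is normally applied before sampling.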
PCM
Pulse Code Modulation (PCM)
– Each sample's amplitude is represented by an integer code word
– Each bit of resolution adds about 6 dB of dynamic range
– The number of bits required depends on how much noise can be tolerated
Quantization error ("noise"): for an n-bit uniform quantizer driven by a full-scale sine wave, $\mathrm{SNR} \approx 6.02\,n + 1.76$ dB
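The "about 6 dB per bit" rule can be checked numerically. This is a minimal sketch under the following assumptions:
– a uniform mid-rise quantizer on [-1, 1];
– a full-scale test sine at an arbitrary, non-harmonic frequency.

It measures the SNR after quantizing with n bits. `quantize` and `sine_snr_db` are hypothetical names.

```python
import math

def quantize(x, bits):
    """Uniform mid-rise quantizer on [-1, 1] with 2**bits levels."""
    levels = 2 ** bits
    step = 2.0 / levels
    index = math.floor((x + 1.0) / step)
    index = min(max(index, 0), levels - 1)  # clamp x = 1.0 into the top cell
    return -1.0 + (index + 0.5) * step       # reconstruct at the cell midpoint

def sine_snr_db(bits, n_samples=50000, cycles_per_sample=0.01234567):
    """Measured SNR (dB) of a full-scale sine after n-bit quantization."""
    signal_power = noise_power = 0.0
    for k in range(n_samples):
        x = math.sin(2 * math.pi * cycles_per_sample * k)
        error = x - quantize(x, bits)
        signal_power += x * x
        noise_power += error * error
    return 10 * math.log10(signal_power / noise_power)
```

For 8 bits the measurement lands near 6.02 × 8 + 1.76 ≈ 49.9 dB. Each extra bit adds roughly 6 dB, because it halves the step size and so quarters the noise power.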
Linear PCM
Uses evenly spaced quantization levels.
– Typically 16 bits per sample
– Provides a large dynamic range
– Quantization noise is difficult for humans to perceive
Compact Discs
– 16-bit linear sampling
– 44.1 kHz sampling rate
– 2 channels
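The three digital audio parameters multiply directly into an uncompressed bit rate. The sketch applies this to the Compact Disc figures above. The 8-bit, 8 kHz mono "telephone" case is an added assumption for comparison (typical of G.711 voice), not a slide figure. `raw_bitrate` is a hypothetical helper.

```python
def raw_bitrate(bits_per_sample, sample_rate_hz, channels):
    """Uncompressed PCM bit rate in bits per second."""
    return bits_per_sample * sample_rate_hz * channels

cd = raw_bitrate(16, 44100, 2)       # Compact Disc audio
telephone = raw_bitrate(8, 8000, 1)  # assumed 8-bit, 8 kHz mono voice
```

CD audio comes to 1,411,200 bit/s (about 1.4 Mb/s). That figure is the baseline the MPEG-1 layers later in these slides compress down to a few hundred kb/s.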
Non-linear Sampling
If we try to use only 8 bits per sample, dynamic range is reduced significantly and quantization noise becomes audible. In particular, we end up with too few levels for low amplitudes.
The solution is to space quantization levels more densely at low amplitudes and less densely at high amplitudes, roughly like a logarithmic scale.
Non-linear Sampling Illustrated
[Figure: non-linear mapping from input amplitude to output quantization level]
μ-law and A-law
– Non-linear sampling is called "companding" (compressing, then expanding).
– 8 companded bits provide a dynamic range equivalent to 12 linear bits.
– μ-law and A-law are companding standards defined in ITU-T G.711.
– They differ in the exact shape of the piecewise-linear companding function.
μ-Law Companding
$f(x) = 127 \cdot \operatorname{sgn}(x) \cdot \dfrac{\ln(1 + \mu |x|)}{\ln(1 + \mu)}$, with x normalized to [-1, 1] and μ = 255
– Provides 14-bit quality (dynamic range) with an 8-bit encoding
– Used in North American & Japanese ISDN voice service
– Simple to compute encoding
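The companding formula can be applied directly, as in the sketch below. It uses the continuous curve with μ = 255. Real G.711 approximates this curve with piecewise-linear segments and a specific bit layout, so these codes are not bit-exact G.711. All function names are hypothetical.

```python
import math

MU = 255.0

def mu_law_compress(x, mu=MU):
    """Continuous mu-law curve: x in [-1, 1] -> y in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

def encode(x):
    """Signed code in -127..127, following f(x) = 127 * sgn(x) * ..."""
    return round(127 * mu_law_compress(x))

def decode(code):
    """Map a code back to an amplitude in [-1, 1]."""
    return mu_law_expand(code / 127)
```

A quiet input of 0.01 encodes to code 29. A linear 8-bit quantizer would spend only `round(127 * 0.01) == 1` level on it. This is the extra low-amplitude resolution the previous slides motivate.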
μ-Law Encoding
[Figure: the sender maps high-resolution PCM (12, 14, or 16 bits) to an 8-bit μ-law code by table lookup; the receiver recovers a 14-bit value by inverse table lookup. Table columns: input amplitude, step size, segment, quantization code, code value.]
μ-Law Decoding
[Figure: the same sender/receiver pipeline (high-resolution PCM in, 8-bit μ-law code, 14-bit decoding out). Table columns: multiplier, μ-law encoding, decode amplitude.]
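The receiver's "inverse table lookup" can be sketched by building all 256 decoded values once. The decoding arithmetic follows the segment/mantissa structure used in widely circulated G.711 reference code:
– an inverted code byte;
– a 3-bit segment and a 4-bit step within it;
– a bias of 0x84.

Treat this as an assumption to check against the G.711 text. Results come out on a 16-bit scale. `ulaw_decode` and `DECODE_TABLE` are hypothetical names.

```python
BIAS = 0x84  # 132: offset used by common G.711 reference decoders

def ulaw_decode(code):
    """Expand an 8-bit mu-law code to a linear sample (16-bit scale)."""
    u = ~code & 0xFF              # codes are transmitted bit-inverted
    mantissa = u & 0x0F           # 4-bit step within the segment
    segment = (u & 0x70) >> 4     # 3-bit segment number
    magnitude = ((mantissa << 3) + BIAS) << segment
    return (BIAS - magnitude) if u & 0x80 else (magnitude - BIAS)

# Inverse table lookup: precompute every output once, then index by code.
DECODE_TABLE = [ulaw_decode(c) for c in range(256)]
```

Each segment doubles the step size of the one before. That gives the dense-at-low-amplitude spacing described earlier while keeping decoding to a single lookup.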
Difference Encoding
Differential PCM (DPCM)
– Exploit temporal redundancy in samples
– The difference between two x-bit samples can usually be represented with significantly fewer than x bits
– Transmit the difference (rather than the sample)
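A lossless DPCM round trip shows why differences need fewer bits. In this sketch the test tone (100 Hz at 8 kHz, peak 30000) is an assumption chosen to resemble 16-bit audio. `dpcm_encode`, `dpcm_decode`, and `bits_needed` are hypothetical helpers.

```python
import math

def bits_needed(values):
    """Bits for a two's-complement integer that can hold every value."""
    return max(abs(v) for v in values).bit_length() + 1  # +1 for the sign

def dpcm_encode(samples):
    """Transmit each sample as its difference from the previous one."""
    prev, diffs = 0, []
    for s in samples:
        diffs.append(s - prev)
        prev = s
    return diffs

def dpcm_decode(diffs):
    """Accumulate the differences to rebuild the samples exactly."""
    prev, out = 0, []
    for d in diffs:
        prev += d
        out.append(prev)
    return out

samples = [round(30000 * math.sin(2 * math.pi * 100 * n / 8000)) for n in range(800)]
diffs = dpcm_encode(samples)
```

The samples need 16 bits, but their differences fit in 13. The saving depends on the signal changing slowly between samples, which is exactly what fails for high frequencies.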
"Slope Overload"
Slope Overload Problem
– Differences between successive samples of high-frequency signals near the Nyquist frequency are large, so they cannot be represented with a smaller number of bits!
– The resulting error causes severe distortion at high frequencies
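Slope overload can be reproduced with a minimal closed-loop DPCM. It uses a fixed step size and 4-bit difference codes. The step of 128, amplitude of 8000, and 8 kHz rate are illustrative assumptions, and `dpcm_fixed_step` and `max_error` are hypothetical names. The encoder tracks the decoder's reconstruction. When a difference exceeds what 4 bits can express, the code is clipped and the output falls behind the input.

```python
import math

def dpcm_fixed_step(samples, step, bits=4):
    """Closed-loop DPCM; returns what the decoder would reconstruct."""
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    pred, recon = 0, []
    for s in samples:
        code = round((s - pred) / step)
        code = max(lo, min(hi, code))  # clipping here is the slope overload
        pred += code * step            # decoder performs the same update
        recon.append(pred)
    return recon

def max_error(freq_hz, rate=8000, amp=8000, step=128, n=400):
    x = [amp * math.sin(2 * math.pi * freq_hz * k / rate) for k in range(n)]
    y = dpcm_fixed_step(x, step)
    return max(abs(a - b) for a, b in zip(x, y))
```

At 50 Hz the error never exceeds half a step (64). At 3500 Hz, close to the 4 kHz Nyquist limit, it runs into the thousands. This motivates adapting the step size, as ADPCM does.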
Adaptive DPCM (ADPCM)
– Use a larger step size to encode differences between high-frequency samples and a smaller step size for differences between low-frequency samples
– Use previous sample values to estimate changes in the signal in the near future
ADPCM
[Figure: ADPCM encoder block diagram. The predicted PCM sample n+1 is subtracted from the y-bit PCM sample; the difference quantizer turns the result into an x-bit ADPCM "difference", which also passes through a dequantizer and step-size adjuster back to the predictor.]
To ensure differences are always small:
– Adaptively change the step size (quanta)
– (Adaptively) attempt to predict the next sample value
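IMA's Proposed ADPCM
[Figure: the ADPCM block diagram specialized to a 16-bit PCM sample in and a 4-bit ADPCM difference out, with a register holding PCM sample n–1 in place of the predictor.]
– The predictor is not adaptive; it simply uses the last (reconstructed) sample value
– Step sizes are spaced logarithmically (each step-size table entry is about 1.1 times the previous one), and the step size grows when differences are large, as they are for high-frequency signals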
IMA Difference Quantization
[Figure: IMA block diagram; the 4-bit ADPCM difference is expressed in step-size units.]
The most significant bit carries the sign of the difference; the other three bits encode its magnitude in multiples of 1/4 step_size:
Quantizer output   Difference (magnitude)
000                difference < 1/4 step_size
001                1/4 step_size ≤ difference < 2/4 step_size
010                2/4 step_size ≤ difference < 3/4 step_size
011                3/4 step_size ≤ difference < step_size
100                step_size ≤ difference < 5/4 step_size
101                5/4 step_size ≤ difference < 6/4 step_size
110                6/4 step_size ≤ difference < 7/4 step_size
111                7/4 step_size ≤ difference
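The quantization table above can be computed by successive approximation. The quantizer tests one step, then half a step, then a quarter step, setting one magnitude bit each time. This sketch follows the structure found in common IMA reference implementations, which is an assumption to verify against the IMA recommendation. `ima_quantize` and `ima_dequantize` are hypothetical names.

```python
def ima_quantize(diff, step):
    """Map a PCM difference to a 4-bit code: sign bit + 3 magnitude bits."""
    code = 0
    if diff < 0:
        code, diff = 8, -diff      # bit 3 holds the sign
    if diff >= step:               # bit 2: one full step
        code |= 4
        diff -= step
    if diff >= step >> 1:          # bit 1: half a step
        code |= 2
        diff -= step >> 1
    if diff >= step >> 2:          # bit 0: a quarter step
        code |= 1
    return code

def ima_dequantize(code, step):
    """Rebuild the difference; the step >> 3 term centres it in its cell."""
    vpdiff = step >> 3
    if code & 4:
        vpdiff += step
    if code & 2:
        vpdiff += step >> 1
    if code & 1:
        vpdiff += step >> 2
    return -vpdiff if code & 8 else vpdiff
```

When the step size is a multiple of 4, the magnitude bits equal `min(7, difference // (step // 4))`, which reproduces the eight rows of the table. The dequantizer returns the middle of each quarter-step cell.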
IMA Step-size Table
[Table: step size for each table index, 0 to 88]
Adaptive Step-size Selection
[Figure: the IMA encoder with its step-size adjuster expanded. The quantizer output selects an index adjustment by table lookup; the adjustment is added to the previous index held in a register, range-limited to 0 to 88, and used to look up the new step size.]
Adaptive Step-size Selection (continued)
After each sample, the quantizer output selects a step-size table index adjustment. Because each table entry is about 1.1 times the previous one, the adjustment scales the step size roughly as follows:
Quantizer output   Difference                                   Index adjustment   Step size
000 to 011         difference < step_size                       -1                 × 0.91
100                step_size ≤ difference < 5/4 step_size        +2                 × 1.21
101                5/4 step_size ≤ difference < 6/4 step_size    +4                 × 1.46
110                6/4 step_size ≤ difference < 7/4 step_size    +6                 × 1.77
111                7/4 step_size ≤ difference                   +8                 × 2.14
IMA ADPCM Example
[Worked example table. Columns: input X_n, difference, quantizer output Q, step size, index adjustment Adj, step-size table index I, predicted value X_{n–1}, step-size multiplier M, reconstituted difference.]
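The quantizer, dequantizer, and index adaptation combine into a complete codec. This sketch rests on the following assumptions:
– the standard IMA step-size and index tables as they appear in widely used reference implementations (check them against the IMA recommendation);
– a starting state of predicted value 0 and index 0 (hypothetical here; container formats typically transmit the initial state);
– clamping to 16 bits.

Encoder and decoder both call `_update_state`, which keeps them in lockstep. All names are hypothetical.

```python
import math

STEP_TABLE = [
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 23, 25, 28, 31,
    34, 37, 41, 45, 50, 55, 60, 66, 73, 80, 88, 97, 107, 118, 130, 143,
    157, 173, 190, 209, 230, 253, 279, 307, 337, 371, 408, 449, 494, 544, 598, 658,
    724, 796, 876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066, 2272, 2499, 2749, 3024,
    3327, 3660, 4026, 4428, 4871, 5358, 5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767,
]
INDEX_ADJUST = [-1, -1, -1, -1, 2, 4, 6, 8]  # indexed by the 3 magnitude bits

def _clamp(v, lo, hi):
    return max(lo, min(hi, v))

def _update_state(code, predicted, index):
    """Dequantize one code, update the prediction, then adapt the step index."""
    step = STEP_TABLE[index]
    vpdiff = step >> 3
    if code & 4:
        vpdiff += step
    if code & 2:
        vpdiff += step >> 1
    if code & 1:
        vpdiff += step >> 2
    predicted = _clamp(predicted - vpdiff if code & 8 else predicted + vpdiff, -32768, 32767)
    index = _clamp(index + INDEX_ADJUST[code & 7], 0, 88)  # range limit 0 to 88
    return predicted, index

def ima_encode(samples):
    predicted, index, codes = 0, 0, []
    for s in samples:
        step, diff, code = STEP_TABLE[index], s - predicted, 0
        if diff < 0:
            code, diff = 8, -diff
        if diff >= step:
            code |= 4
            diff -= step
        if diff >= step >> 1:
            code |= 2
            diff -= step >> 1
        if diff >= step >> 2:
            code |= 1
        codes.append(code)
        predicted, index = _update_state(code, predicted, index)
    return codes

def ima_decode(codes):
    predicted, index, out = 0, 0, []
    for code in codes:
        predicted, index = _update_state(code, predicted, index)
        out.append(predicted)
    return out

def snr_db(original, decoded):
    signal = sum(v * v for v in original)
    noise = sum((a - b) ** 2 for a, b in zip(original, decoded))
    return 10 * math.log10(signal / noise)
```

The encoder computes each difference against the decoder's reconstruction rather than the true previous sample. As a result, quantization errors do not accumulate; each output sample carries only the error of its own 4-bit code.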
Networking Considerations
[Figure: IMA encoder block diagram with step-size adjuster and register.]
– The IMA codec is reasonably robust to errors
– An interval with a low-level signal will correct any step-size error: small differences drive the table index of both encoder and decoder down to its floor of 0, so their step sizes match again
Psychoacoustic Properties
Human perception of sound is a function of both frequency and signal strength.
– MPEG exploits this relationship.
[Figure: sound level (dB) vs. frequency (kHz), with the threshold of hearing separating the audible region from the inaudible one.]
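Auditory Masking
[Figure: sound level (dB) vs. frequency (kHz); a loud masking tone raises the audibility threshold so that a nearby, quieter masked tone falls into the inaudible region.]
The presence of tones at certain frequencies makes us unable to perceive tones at other "nearby" frequencies.
– What counts as "nearby" depends on frequency: humans cannot distinguish between tones within about 100 Hz of each other at low frequencies, or within about 4 kHz at high frequencies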
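MPEG Encoder Block Diagram
[Figure: PCM audio samples (32, 44.1, or 48 kHz) pass through mapping, quantizer and coding, and frame packing to produce the encoded bitstream; a psychoacoustic model guides the quantizer and coding stage, and ancillary data is inserted during frame packing.]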
Subband Filter
Transforms the signal from the time domain to the frequency domain.
– Each block of 32 PCM samples yields 32 subband samples, one per subband.
– The subbands are equal-width frequency bands spanning 0 to the Nyquist frequency.
– The filter actually operates on a window of 512 samples that shifts forward by 32 samples at a time.
The subband coefficients are analyzed with the psychoacoustic model, then quantized and coded.
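The arithmetic behind equal-width subbands and the sliding window is short enough to sketch. This covers only the bookkeeping, not the polyphase filter itself. `subband_layout`, `subband_of`, and `windows_per_sample` are hypothetical helpers.

```python
def subband_layout(sample_rate_hz, num_subbands=32):
    """(low, high) edges in Hz of equal-width subbands from 0 to Nyquist."""
    width = sample_rate_hz / 2 / num_subbands
    return [(k * width, (k + 1) * width) for k in range(num_subbands)]

def subband_of(freq_hz, sample_rate_hz, num_subbands=32):
    """Index of the subband containing freq_hz."""
    width = sample_rate_hz / 2 / num_subbands
    return min(int(freq_hz // width), num_subbands - 1)

def windows_per_sample(window=512, hop=32):
    """How many successive 512-sample windows each input sample falls in."""
    return window // hop
```

At 48 kHz each subband is 750 Hz wide, and each input sample contributes to 16 overlapping windows. At low frequencies a single subband is therefore much wider than the roughly 100 Hz resolution of hearing noted on the masking slide.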
Layer I: 384 samples per frame
– Iterative bit allocation process:
  – For each subband, determine the MNR (mask-to-noise ratio).
  – Increase the number of quantization bits for the subband with the smallest MNR.
  – Iterate until all bits are used.
– Fixed allocation of bits among subbands for a particular frame.
– Up to 448 kb/s
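The iterative allocation loop can be sketched as a greedy search. This is a simplified model under three assumptions:
– each extra bit per sample raises a subband's SNR by about 6.02 dB (the PCM rule of thumb), so MNR = SNR − SMR;
– the budget counts bits per sample rather than actual frame bits;
– side information such as scale factors is ignored.

The real Layer I encoder uses standardized SNR tables instead. `allocate_bits` is a hypothetical name.

```python
def allocate_bits(smr_db, total_bits, max_bits=15, db_per_bit=6.02):
    """Greedy allocation: repeatedly give one more bit to the subband whose
    mask-to-noise ratio (MNR = SNR - SMR) is currently the smallest."""
    alloc = [0] * len(smr_db)

    def mnr(k):
        return db_per_bit * alloc[k] - smr_db[k]

    for _ in range(total_bits):
        candidates = [k for k in range(len(alloc)) if alloc[k] < max_bits]
        if not candidates:
            break
        worst = min(candidates, key=mnr)  # subband with the most audible noise
        alloc[worst] += 1
    return alloc
```

With signal-to-mask ratios of 20, 5, and -10 dB and a budget of 5, the result is `[4, 1, 0]`. The third subband is already masked, so it receives no bits at all.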
Layer II: 1152 samples per frame
– Iterative bit allocation
– Subband allocation is dynamic
– Up to 384 kb/s
Layer III: 1152 samples per frame
– Up to 320 kb/s
– Each subband is further analyzed using an MDCT, creating 576 frequency lines.
– 4 different windowing schemes, depending on whether the samples contain the "attack" of new frequencies.
– Many bit-allocation options for quantizing the frequency coefficients.
– Quantized coefficients are Huffman coded.
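The 576 lines come from 32 subbands × 18 MDCT coefficients per long block. Each MDCT takes 36 inputs (2N) and emits 18 outputs (N). The sketch is a direct O(N²) MDCT and inverse, using the common definition with a 1/N inverse scaling. It runs without any window, whereas real Layer III applies the windows mentioned above. It shows the key property: with 50% overlap-add, time-domain aliasing cancels and the interior samples are rebuilt exactly. Names are hypothetical.

```python
import math
import random

def mdct(x):
    """2N input samples -> N coefficients."""
    n = len(x) // 2
    return [sum(x[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                for i in range(2 * n))
            for k in range(n)]

def imdct(coeffs):
    """N coefficients -> 2N time-aliased samples (1/N scaling)."""
    n = len(coeffs)
    return [sum(coeffs[k] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                for k in range(n)) / n
            for i in range(2 * n)]

def tdac_roundtrip(signal, n):
    """Overlap-add blocks of 2n hopping by n; aliasing cancels between
    neighbouring blocks, so all but the first and last n samples return."""
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n + 1, n):
        for i, v in enumerate(imdct(mdct(signal[start:start + 2 * n]))):
            out[start + i] += v
    return out

rng = random.Random(2)
signal = [rng.uniform(-1.0, 1.0) for _ in range(6 * 18)]
rebuilt = tdac_roundtrip(signal, 18)
```

Only the first and last 18 samples differ from the input, because they are covered by a single block with no neighbour to cancel its aliasing.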
Vocoding
Concept: develop a mathematical model of the vocal cords and throat.
– Derive/compute the model parameters for a short interval and transmit them to the decoder
– Use the parameters to synthesize speech at the decoder
So what is a good model?
– A "buzzer" in a "tube"!
– The buzzer is characterized by its intensity and pitch
– The tube is characterized by its formants
Vocoding: Basic Concepts
[Figure: speech spectrum, amplitude vs. frequency (kHz)]
– Formants: the frequency maxima (resonances) in the spectrum of the speech signal
– Vocoders group and code portions of the signal by amplitude
"Buzzer" and "Tube" Model
[Figure: speech waveform of the utterance "yadda yadda yadda"]
Linear Predictive Coding (LPC)
– A sample is represented as a linear combination of the p previous samples plus a scaled excitation x(n) with gain G:
  $y(n) = \sum_{k=1}^{p} a_k \, y(n-k) + G \cdot x(n)$
Vocoding principles:
– voice = formants + buzz (pitch & intensity)
– voice − estimated formants = "residue"
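The slides do not say how the coefficients a_k are found. One standard approach, used here as an assumption, is the autocorrelation method solved with the Levinson-Durbin recursion. The test signal is synthetic: a 2-pole resonator with a_1 = 1.3 and a_2 = -0.6 stands in for a single formant, driven by unit-variance noise (G = 1). `residue` computes the "voice − estimated formants" left over after prediction. All names are hypothetical.

```python
import random

def autocorrelation(y, max_lag):
    """r[lag] = sum_n y[n] * y[n - lag] (autocorrelation method, no window)."""
    return [sum(y[n] * y[n - lag] for n in range(lag, len(y)))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, p):
    """Solve for a_1..a_p in y(n) ~ sum_k a_k y(n - k).
    Returns (coefficients, final prediction-error energy)."""
    a = [0.0] * (p + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, p + 1):
        refl = (r[i] - sum(a[k] * r[i - k] for k in range(1, i))) / err
        prev = a[:]
        a[i] = refl
        for k in range(1, i):
            a[k] = prev[k] - refl * prev[i - k]
        err *= 1.0 - refl * refl
    return a[1:], err

def residue(y, coeffs):
    """e(n) = y(n) - sum_k a_k y(n - k): what remains after the 'tube'."""
    p = len(coeffs)
    return [y[n] - sum(coeffs[k] * y[n - 1 - k] for k in range(p))
            for n in range(p, len(y))]

# Synthetic "voice": noise excitation through one 2-pole resonance.
rng = random.Random(0)
y = [0.0, 0.0]
for _ in range(20000):
    y.append(1.3 * y[-1] - 0.6 * y[-2] + rng.gauss(0.0, 1.0))
y = y[100:]  # drop the start-up transient
coeffs, _ = levinson_durbin(autocorrelation(y, 2), 2)
```

The estimated coefficients come out close to 1.3 and -0.6. The residue is close to the original unit-variance excitation, so an LPC encoder only has to describe that excitation (pitch, gain, voiced or not) plus the p coefficients.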
LPC
The decoder artificially generates speech via formant synthesis:
– A mathematical simulation of the vocal tract as a series of bandpass filters
– The encoder codes and transmits the filter coefficients, pitch period, gain factor, and nature of the excitation
Standards:
– Regular Pulse Excited Linear Predictive Coder (RPE-LPC): digital cellular standard GSM 06.10 (13 kbps)
– Code Excited Linear Predictive Coder (CELP): US Federal Standard 1016 (4.8 kbps)
– Linear Predictive Coder (LPC): US Federal Standard 1015 (2.4 kbps)
Networking Concerns
Audio bandwidth is actually quite small, but human sensitivity to loss and noise is quite high.
Networking concerns:
– Loss concealment
– Jitter control
These matter especially for telephony applications.