COMP 249 :: Spring 2005 Slide: 1 Audio Coding Ketan Mayer-Patel.


1 COMP 249 :: Spring 2005 Slide: 1 Audio Coding Ketan Mayer-Patel

2 COMP 249 :: Spring 2005 Slide: 2 Overview of Today Sampling Techniques: PCM (Linear, μ-Law). Generic Coding Techniques: DPCM, ADPCM. Psychoacoustic Coding: MPEG-1. Speech Specific Techniques: Vocoding.

3 COMP 249 :: Spring 2005 Slide: 3 Audio Signals Analog audio is basically voltage as a continuous function of time. Unlike video which is 3D, audio is a 1D signal. –Can capture without having to discretize the higher dimensions. Audio sampling basically boils down to quantizing signal level to a set of values. Digital audio parameters: –bits per sample –sampling rate –number of channels.

4 COMP 249 :: Spring 2005 Slide: 4 Sampling Pulse Amplitude Modulation (PAM) –Each sample’s amplitude is represented by 1 analog value Sampling theory (Nyquist) –If input signal has maximum frequency (bandwidth) f, sampling frequency must be at least 2f –With a low-pass filter to interpolate between samples, the input signal can be fully reconstructed

5 COMP 249 :: Spring 2005 Slide: 5 PCM Pulse Code Modulation (PCM) Each sample’s amplitude is represented by an integer code-word. Each bit of resolution adds 6 dB of dynamic range. The number of bits n required depends on the amount of noise that is tolerated: $n = \frac{\mathrm{SNR} - 4.77}{6.02}$ (SNR in dB). [Figure: sampled waveform drawn against 4-bit code-word levels, 0100 down to 0000 and 1001 to 1100, with the quantization error (“noise”) marked.]
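The bits-versus-noise relationship on this slide can be tried out numerically. The following is only a sketch: the function names `snr_for_bits` and `bits_for_snr` are hypothetical, and the constants 6.02 and 4.77 are simply the ones in the slide's formula.

```python
import math

def snr_for_bits(n_bits):
    """SNR in dB that the slide's formula predicts for an n-bit code-word."""
    return 6.02 * n_bits + 4.77

def bits_for_snr(snr_db):
    """Smallest whole number of bits whose predicted SNR reaches snr_db."""
    return max(1, math.ceil((snr_db - 4.77) / 6.02))

print(bits_for_snr(96.0))   # 16
print(snr_for_bits(8))      # predicted SNR for 8-bit samples
```

Rounding up with `math.ceil` reflects that a code-word cannot have a fractional number of bits.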

6 COMP 249 :: Spring 2005 Slide: 6 Linear PCM Uses evenly spaced quantization levels. Typically 16 bits per sample. Provides a large dynamic range. Difficult for humans to perceive quantization noise. Compact Discs –16-bit linear sampling –44.1 kHz sampling rate –2 channels
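The three digital audio parameters (bits per sample, sampling rate, number of channels) multiply to give the raw bit rate. A small sketch for the Compact Disc figures above; the function name `pcm_bit_rate` is hypothetical.

```python
def pcm_bit_rate(bits_per_sample, sample_rate_hz, channels):
    """Uncompressed PCM bit rate in bits per second."""
    return bits_per_sample * sample_rate_hz * channels

cd_rate = pcm_bit_rate(16, 44_100, 2)
print(cd_rate)              # 1411200 bits per second
print(cd_rate // 8 * 60)    # 10584000 bytes per minute of audio
```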

7 COMP 249 :: Spring 2005 Slide: 7 Non-linear Sampling If we try to use 8 bits per sample, dynamic range is reduced significantly and quantization noise can be heard. In particular, we end up with not enough levels for the lower amplitudes. Solution is to sample more densely in the lower amplitudes and less densely for the higher amplitudes. Sort of like a log scale.

8 COMP 249 :: Spring 2005 Slide: 8 Non-linear Sampling Illustrated [Figure: non-linear quantizer transfer curve; axes: Input, Output.]

9 COMP 249 :: Spring 2005 Slide: 9 μ-law and A-law Non-linear sampling is called “companding”. 8 bits companded provide dynamic range equivalent to 12 bits. μ-law and A-law are companding standards defined in G.711. The difference is in the exact shape of the piece-wise linear companding function.

10 COMP 249 :: Spring 2005 Slide: 10 μ-Law companding $f(x) = 127 \cdot \operatorname{sign}(x) \cdot \frac{\ln(1 + \mu |x|)}{\ln(1 + \mu)}$ (x normalized to [-1, 1]) Provides 14-bit quality (dynamic range) with an 8-bit encoding. Used in North American & Japanese ISDN voice service. Simple to compute encoding.
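The companding formula can be evaluated directly. The sketch below assumes μ = 255, which the slide does not state; the function names and the inverse function `mu_law_expand` (obtained by solving the formula for x) are illustrative additions, not part of the slide.

```python
import math

MU = 255  # assumed value of mu; not given on the slide

def mu_law_compress(x, mu=MU):
    """Map x in [-1, 1] to a signed integer code in [-127, 127]."""
    sign = -1 if x < 0 else 1
    return round(127 * sign * math.log(1 + mu * abs(x)) / math.log(1 + mu))

def mu_law_expand(code, mu=MU):
    """Approximate inverse of mu_law_compress, returning a value in [-1, 1]."""
    sign = -1 if code < 0 else 1
    return sign * ((1 + mu) ** (abs(code) / 127) - 1) / mu

for x in (0.001, 0.01, 0.1, 0.5, 1.0):
    code = mu_law_compress(x)
    print(x, code, mu_law_expand(code))
```

Small inputs get many more codes per unit of amplitude than large ones, which is the log-like behaviour the earlier slides describe.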

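11 COMP 249 :: Spring 2005 Slide: 11 μ-Law Encoding
Sender: high-resolution PCM encoding (12, 14, 16 bits), table lookup, 8-bit μ-Law encoding. Receiver: inverse table lookup, 14-bit decoding.
Input Amplitude | Step Size | Segment | Quantization | Code Value
0-1 | 1 | 000 | 0000 | 0
1-3 | 2 | 000 | 0001 | 1
... | 2 | 000 | ... | ...
29-31 | 2 | 000 | 1111 | 15
31-35 | 4 | 001 | 0000 | 16
... | 4 | 001 | ... | ...
91-95 | 4 | 001 | 1111 | 31
95-103 | 8 | 010 | 0000 | 32
... | 8 | 010 | ... | ...
215-223 | 8 | 010 | 1111 | 47
223-239 | 16 | 011 | 0000 | 48
... | 16 | 011 | ... | ...
463-479 | 16 | 011 | 1111 | 63
...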

12 COMP 249 :: Spring 2005 Slide: 12 μ-Law Decoding
Sender: high-resolution PCM encoding (12, 14, 16 bits), table lookup, 8-bit μ-Law encoding. Receiver: inverse table lookup, 14-bit decoding.
μ-Law Encoding | Multiplier | Decode Amplitude
0000000 | 1 | 0
0000001 | 2 | 2
... | 2 | ...
0001111 | 2 | 30
0010000 | 4 | 33
... | 4 | ...
0011111 | 4 | 93
0100000 | 8 | 99
... | 8 | ...
0101111 | 8 | 219
0110000 | 16 | 231
... | 16 | ...
0111111 | 16 | 471
...
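The segment boundaries in the encoding and decoding tables (31, 95, 223, 479) are each 33 less than a power of two, so the lookups can also be computed arithmetically by adding a bias of 33. The sketch below uses that observation; it handles magnitudes only, the function names are hypothetical, and the sign bit and any bit inversion applied on the wire are left out.

```python
BIAS = 33            # shifts segment boundaries onto powers of two
MAX_AMPLITUDE = 8158 # largest magnitude that still fits in segment 111

def mu_law_encode_magnitude(amplitude):
    """Encode a non-negative integer amplitude as 3 segment bits + 4 step bits."""
    biased = min(amplitude, MAX_AMPLITUDE) + BIAS
    segment = biased.bit_length() - 6            # 33..63 -> 0, 64..127 -> 1, ...
    quantization = (biased >> (segment + 1)) & 0x0F
    return (segment << 4) | quantization

def mu_law_decode_magnitude(code):
    """Return the midpoint amplitude of the interval named by a 7-bit code."""
    segment, quantization = code >> 4, code & 0x0F
    return ((2 * quantization + 33) << segment) - BIAS

for code in (0, 1, 15, 16, 31, 32, 47, 48, 63):
    print(format(code, "07b"), mu_law_decode_magnitude(code))
```

The loop prints the decode amplitudes 0, 2, 30, 33, 93, 99, 219, 231 and 471 for the codes listed on the decoding slide. An amplitude that falls exactly on an interval boundary is assigned to the upper interval here, which is a choice of this sketch.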

13 COMP 249 :: Spring 2005 Slide: 13 Difference Encoding Differential-PCM (DPCM) –Exploit temporal redundancy in samples –Difference between 2 x-bit samples can be represented with significantly fewer than x bits –Transmit the difference (rather than the sample) [Figure: sampled waveform drawn against 4-bit code-word levels.]
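A minimal sketch of the idea, assuming the first sample is sent as a difference from zero (the slide does not say how the first sample is handled); the function names are hypothetical.

```python
def dpcm_encode(samples):
    """Replace each sample by its difference from the previous sample."""
    previous = 0
    differences = []
    for sample in samples:
        differences.append(sample - previous)
        previous = sample
    return differences

def dpcm_decode(differences):
    """Rebuild the samples by accumulating the differences."""
    value = 0
    samples = []
    for difference in differences:
        value += difference
        samples.append(value)
    return samples

samples = [100, 102, 105, 104, 101]
encoded = dpcm_encode(samples)
print(encoded)               # [100, 2, 3, -1, -3]
print(dpcm_decode(encoded))  # [100, 102, 105, 104, 101]
```

After the first value, the differences are much smaller than the samples themselves, which is what lets them fit in fewer bits.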

14 COMP 249 :: Spring 2005 Slide: 14 Slope Overload Problem Differences in high frequency signals near the Nyquist frequency cannot be represented with a smaller number of bits! –Error introduced leads to severe distortion in the higher frequencies [Figure: waveform drawn against 4-bit code-word levels, with a steep region labeled “Slope Overload”.]
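Slope overload can be reproduced by limiting each transmitted difference to a fixed number of bits. This sketch assumes the encoder takes differences against the decoder's reconstructed value (a closed loop) and clamps them to a signed range; both choices, and the function name `dpcm_clamped`, are illustrative.

```python
def dpcm_clamped(samples, diff_bits):
    """Return what a decoder reconstructs when differences are clamped to diff_bits."""
    lowest = -(1 << (diff_bits - 1))
    highest = (1 << (diff_bits - 1)) - 1
    reconstructed = samples[0]          # first sample assumed sent in full
    output = [reconstructed]
    for sample in samples[1:]:
        difference = max(lowest, min(highest, sample - reconstructed))
        reconstructed += difference
        output.append(reconstructed)
    return output

slow = [0, 3, 6, 9, 12]
fast = [0, 100, -100, 100, -100]        # alternates every sample, like a tone near Nyquist
print(dpcm_clamped(slow, 4))            # [0, 3, 6, 9, 12]
print(dpcm_clamped(fast, 4))            # [0, 7, -1, 6, -2]
```

The slowly varying input survives intact, while the rapidly alternating input is flattened almost completely because no 4-bit difference can follow it.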

15 COMP 249 :: Spring 2005 Slide: 15 Adaptive DPCM (ADPCM) Use a larger step-size to encode differences between high-frequency samples & a smaller step-size for differences between low-frequency samples. Use previous sample values to estimate changes in the signal in the near future.

16 COMP 249 :: Spring 2005 Slide: 16 ADPCM To ensure differences are always small... –Adaptively change the step-size (quanta) –(Adaptively) attempt to predict next sample value [Block diagram components: y-bit PCM sample (input), Difference Quantizer, x-bit ADPCM “difference” (output), Step-Size Adjuster, Dequantizer, Predictor, predicted PCM sample n+1.]

17 COMP 249 :: Spring 2005 Slide: 17 IMA’s proposed ADPCM Predictor is not adaptive and simply uses the last sample value. Quantization step-size increases logarithmically with signal frequency. [Block diagram components: 16-bit PCM sample (input), Register holding PCM sample n–1, Difference Quantizer, 4-bit ADPCM difference (output), Step-Size Adjuster, Dequantizer.]

18 COMP 249 :: Spring 2005 Slide: 18 IMA Difference Quantization
Difference | Quantizer Output | Step-Size Multiple
difference < 1/4 step_size | 000 | 0.0
1/4 step_size < difference < 1/2 step_size | 001 | 0.25
1/2 step_size < difference < 3/4 step_size | 010 | 0.50
3/4 step_size < difference < step_size | 011 | 0.75
step_size < difference < 5/4 step_size | 100 | 1.0
5/4 step_size < difference < 3/2 step_size | 101 | 1.25
3/2 step_size < difference < 7/4 step_size | 110 | 1.5
7/4 step_size < difference | 111 | 1.75
[Block diagram components: 16-bit PCM sample (input), Register holding PCM sample n–1, Difference Quantizer, 4-bit ADPCM difference (in step-size units), Step-Size Adjuster, Dequantizer.]
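The quantization table amounts to counting how many quarter step-sizes fit into the difference, capped at 7. A sketch under two assumptions not spelled out on the slide: the fourth bit of the 4-bit code carries the sign, and a difference exactly on a boundary goes to the upper interval. The function name is hypothetical.

```python
def quantize_difference(difference, step_size):
    """Return (sign_bit, 3-bit magnitude code, step-size multiple)."""
    sign_bit = 1 if difference < 0 else 0
    code = min(4 * abs(difference) // step_size, 7)   # quarter steps, capped at 111
    return sign_bit, code, code / 4

print(quantize_difference(5, 7))     # (0, 2, 0.5)
print(quantize_difference(13, 7))    # (0, 7, 1.75)
print(quantize_difference(-49, 55))  # (1, 3, 0.75)
```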

19 COMP 249 :: Spring 2005 Slide: 19 IMA Step-size Table
Index 0-17, Step Size: 7 8 9 10 11 12 13 14 16 17 19 21 23 25 28 31 34 37
Index 18-35, Step Size: 41 45 50 55 60 66 73 80 88 97 107 118 130 143 157 173 190 209
Index 36-53, Step Size: 230 253 279 307 337 371 408 449 494 544 598 658 724 796 876 963 1060 1166
Index 54-71, Step Size: 1282 1411 1552 1707 1878 2066 2272 2499 2749 3024 3327 3660 4026 4428 4871 5358 5894 6484
Index 72-88, Step Size: 7132 7845 8630 9493 10442 11487 12635 13899 15289 16818 18500 20350 22385 24623 27086 29794 32767

20 COMP 249 :: Spring 2005 Slide: 20 Adaptive Step-size Selection [Block diagram: the Quantizer Output drives a Step-Size Table Index Adjustment Lookup; the resulting Index Adjustment is added to the Previous Index (held in a Register), Range Limited to 0 to 88, and passed to the Step-Size Table Lookup, which yields the New Step-Size. Remaining codec components: 16-bit PCM Sample (input), Register holding PCM Sample n–1, Difference Quantizer, 4-bit ADPCM difference (in step-size units), Step-Size Adjuster, Dequantizer.]

21 COMP 249 :: Spring 2005 Slide: 21 Adaptive Step-size Selection
Quantizer Output | Step-Size Table Index Adjustment | Quantization Step-Size Adjustment
000, 001, 010, 011 (difference < step_size) | -1 | × 0.91
100 (step_size < difference < 5/4 step_size) | +2 | × 1.21
101 (5/4 step_size < difference < 3/2 step_size) | +4 | × 1.46
110 (3/2 step_size < difference < 7/4 step_size) | +6 | × 1.77
111 (7/4 step_size < difference) | +8 | × 2.14
[Block diagram: Difference Quantizer output, Step-Size Table Index Adjustment Lookup, Index Adjustment added to Previous Index (Register), Range Limit (0 to 88), Step-Size Table Lookup, New Step-Size.]

22 COMP 249 :: Spring 2005 Slide: 22 IMA ADPCM Example
Columns: X = Input; Δ = Difference; Step = Step Size; Q = Quantizer output; Adj = Index Adjustment; I = Step-Size table index; M = Step-size multiplier; Δ' = Reconstituted difference; Decode = Predicted value.
X | Δ | Step | Q | Adj | I | M | Δ' | Decode
150 | - | 7 | - | - | 0 | - | - | 150
155 | 5 | 7 | 010 | -1 | 0 | 0.5 | 3.5 | 154
167 | 13 | 7 | 111 | 8 | 8 | 1.75 | 12 | 166
170 | 4 | 16 | 001 | -1 | 7 | 0.25 | 4 | 170
250 | 80 | 14 | 111 | 8 | 15 | 1.75 | 24.5 | 195
250 | 55 | 31 | 111 | 8 | 23 | 1.75 | 54 | 249
250 | 1 | 66 | 000 | -1 | 22 | 0.0 | 0 | 249
250 | 1 | 60 | 000 | -1 | 21 | 0.0 | 0 | 249
200 | -49 | 55 | 011 | -1 | 20 | 0.75 | -41 | 208
200 | ...
[Block diagram components: input X_n, Register holding X_n–1, Difference Quantizer, Step-Size Adjuster, Dequantizer.]
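The example can be traced in code. This is a sketch of the simplified scheme as the slides present it (quarter-step multipliers, last reconstructed value as the prediction), not the bit-exact IMA reference algorithm. Assumptions: the first sample initializes the predictor, the index starts at 0, reconstructed values are rounded half up, and the step-size entry at index 84 is taken as 22385. The function name `ima_adpcm_trace` is hypothetical.

```python
import math

STEP_SIZES = [
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17,
    19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118,
    130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
    337, 371, 408, 449, 494, 544, 598, 658, 724, 796,
    876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358,
    5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767,
]
INDEX_ADJUST = [-1, -1, -1, -1, 2, 4, 6, 8]   # indexed by 3-bit quantizer output

def ima_adpcm_trace(samples):
    """Return one (input, diff, step, code, new_index, decoded) tuple per sample after the first."""
    predicted = samples[0]      # assumption: first sample sent uncompressed
    index = 0                   # assumption: start at the smallest step size
    rows = []
    for x in samples[1:]:
        step = STEP_SIZES[index]
        diff = x - predicted
        code = min(4 * abs(diff) // step, 7)            # difference quantizer
        sign = -1 if diff < 0 else 1
        reconstituted = sign * code * step / 4          # dequantizer
        predicted = math.floor(predicted + reconstituted + 0.5)
        index = max(0, min(88, index + INDEX_ADJUST[code]))  # range limit 0 to 88
        rows.append((x, diff, step, code, index, predicted))
    return rows

for row in ima_adpcm_trace([150, 155, 167, 170, 250, 250, 250, 250, 200]):
    print(row)
```

With the input column from the example, the decoded values come out as 154, 166, 170, 195, 249, 249, 249, 208, matching the Decode column. The jump from 170 to 250 takes two samples to track because the step size has to grow first.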

23 COMP 249 :: Spring 2005 Slide: 23 Networking Considerations The IMA codec is reasonably robust to errors. An interval with a low-level signal will correct any step-size error.
Quantizer Output | Step-Size Table Index Adjustment
000 | -1
001 | -1
010 | -1
011 | -1
100 | 2
101 | 4
110 | 6
111 | 8
[Block diagram components: Register holding PCM sample n–1, Step-Size Adjuster, Dequantizer.]
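The recovery property follows from the index adjustment table together with the 0 to 88 range limit: during a quiet stretch every code is small, every adjustment is -1, and both ends bottom out at index 0. A sketch with a hypothetical `track_index` helper and made-up starting indices, where the receiver's index is assumed to have been corrupted:

```python
INDEX_ADJUST = [-1, -1, -1, -1, 2, 4, 6, 8]   # indexed by 3-bit quantizer output

def track_index(start_index, codes):
    """Follow the step-size table index through a sequence of quantizer outputs."""
    index = start_index
    history = []
    for code in codes:
        index = max(0, min(88, index + INDEX_ADJUST[code & 0x07]))
        history.append(index)
    return history

quiet_interval = [0] * 40                    # low-level signal: every output is 000
sender = track_index(10, quiet_interval)
receiver = track_index(37, quiet_interval)   # index damaged by a transmission error
print(sender[5], receiver[5])                # 4 31  (still out of step)
print(sender[-1], receiver[-1])              # 0 0   (back in agreement)
```

This only illustrates the step-size error mentioned on the slide; it says nothing about an error in the predicted sample value.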

24 COMP 249 :: Spring 2005 Slide: 24 Psychoacoustic Properties Human perception of sound is a function of frequency and signal strength –(MPEG exploits this relationship.) [Figure: audibility curve; Sound Level (dB), 0 to 100, against Frequency (kHz), 0.02 to 20, with regions labeled Inaudible and Audible.]

25 COMP 249 :: Spring 2005 Slide: 25 Auditory Masking The presence of tones at certain frequencies makes us unable to perceive tones at other “nearby” frequencies –Humans cannot distinguish between tones within 100 Hz at low frequencies and 4 kHz at high frequencies [Figure: audibility curve; Sound Level (dB), 0 to 100, against Frequency (kHz), 0.02 to 20, with regions labeled Inaudible and Audible and a Masking tone next to a Masked tone.]

26 COMP 249 :: Spring 2005 Slide: 26 MPEG Encoder Block Diagram [Block diagram components: PCM Audio Samples (32, 44.1, 48 kHz) as input; Mapping; Quantizer; Coding; Frame Packing; Psychoacoustic Model; Ancillary Data; Encoded Bitstream as output.]

27 COMP 249 :: Spring 2005 Slide: 27 Subband Filter Transforms signal from time domain to frequency domain. –32 PCM samples yields 32 subband samples. Each subband corresponds to a freq. band evenly spaced from 0 to Nyquist freq. –Filter actually works on a window of 512 samples that is shifted over 32 samples at a time. Subband coefficients are analyzed with psychoacoustic model, quantized, and coded.
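Since the 32 subbands are evenly spaced from 0 to the Nyquist frequency, each one's width follows from the sampling rate. A small sketch; the function name `subband_edges` is hypothetical.

```python
def subband_edges(sample_rate_hz, num_subbands=32):
    """Return (low, high) frequency edges in Hz for evenly spaced subbands."""
    nyquist = sample_rate_hz / 2
    width = nyquist / num_subbands
    return [(k * width, (k + 1) * width) for k in range(num_subbands)]

edges = subband_edges(48_000)
print(edges[0])    # (0.0, 750.0)
print(edges[31])   # (23250.0, 24000.0)
```

At 48 kHz each subband is 750 Hz wide, which is coarser than the 100 Hz masking resolution quoted for low frequencies on the Auditory Masking slide.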

28 COMP 249 :: Spring 2005 Slide: 28 Layer 1 384 samples per frame. Iterative bit allocation process: –For each subband, determine the signal-to-mask ratio (SMR). –Increase number of quantization bits for subband with largest SMR. –Iterate until all bits used. Up to 448 kb/s. 19 ms theoretical minimum delay.
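The iterative allocation loop can be sketched as a greedy procedure. This is a simplification for illustration, not the procedure in the MPEG standard: it assumes each added bit lowers a subband's remaining SMR by about 6.02 dB (borrowing the per-bit figure from the PCM slide), hands out one bit per iteration, and caps each subband at a hypothetical `max_bits`. All names and the sample SMR values are made up.

```python
def allocate_bits(smr_db, total_bits, db_per_bit=6.02, max_bits=15):
    """Greedily give bits to the subband whose remaining SMR is largest."""
    bits = [0] * len(smr_db)
    for _ in range(total_bits):
        candidates = [k for k in range(len(bits)) if bits[k] < max_bits]
        if not candidates:
            break   # every subband is already at its cap
        neediest = max(candidates, key=lambda k: smr_db[k] - db_per_bit * bits[k])
        bits[neediest] += 1
    return bits

print(allocate_bits([30.0, 12.0, -5.0, 20.0], total_bits=8))   # [4, 1, 0, 3]
```

The subband with a negative SMR is already below the masking threshold, so it receives no bits while the budget is tight.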

29 COMP 249 :: Spring 2005 Slide: 29 Layer 2 1152 samples per frame. Iterative bit allocation. Analysis/synthesis a bit more complicated. –More efficient. Up to 384 kb/s. 34 ms theoretical minimum delay.

30 COMP 249 :: Spring 2005 Slide: 30 Layer 3 1152 samples –Up to 320 kb/s Each subband further analyzed using MDCT to create 576 frequency lines. Lots of bit allocation options for quantizing frequency coefficients. Quantized coefficients Huffman coded. Encoded frame sizes are variable. 59 ms theoretical minimum delay.

31 COMP 249 :: Spring 2005 Slide: 31 Vo-coding Concept: Develop a mathematical model of the vocal cords & throat –Derive/compute model parameters for a short interval and transmit to the decoder –Use the parameters to synthesize speech at the decoder So what is a good model? –A “buzzer” in a “tube”! –The buzzer is characterized by its intensity & pitch –The tube is characterized by its formants

32 COMP 249 :: Spring 2005 Slide: 32 Vocoding - Basic Concepts Formant –Resonance frequencies of the vocal tract. –Shapes and filters the sound of vocal cords. [Figure: speech spectrum; Amplitude, 0 to 75, against Frequency (kHz).]

33 COMP 249 :: Spring 2005 Slide: 33 Linear Predictive Coding (LPC) A sample is represented as a linear combination of p previous samples: $y(n) = \sum_{k=1}^{p} a_k \, y(n-k) + G \, x(n)$ “Buzzer” and “Tube” Model Vocoding principles: –voice = formants + buzz pitch & intensity –voice – estimated formants = “residue”
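The LPC recurrence can be run directly as a synthesis filter: the excitation x(n) plays the role of the buzzer and the coefficients a_k shape it like the tube. A sketch with made-up coefficient values; samples before the start of the signal are assumed to be zero, and the function name is hypothetical.

```python
def lpc_synthesize(coefficients, gain, excitation):
    """Compute y(n) = sum_{k=1..p} a_k * y(n-k) + G * x(n) for each n."""
    p = len(coefficients)
    output = []
    for n, x_n in enumerate(excitation):
        y_n = gain * x_n
        for k in range(1, p + 1):
            if n - k >= 0:                      # earlier samples treated as zero
                y_n += coefficients[k - 1] * output[n - k]
        output.append(y_n)
    return output

# A single pulse through a one-coefficient filter decays geometrically.
print(lpc_synthesize([0.5], 1.0, [1, 0, 0, 0]))   # [1.0, 0.5, 0.25, 0.125]
```

A decoder in this style only needs the coefficients, the gain and a description of the excitation, rather than the samples themselves.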

34 COMP 249 :: Spring 2005 Slide: 34 LPC Decoder artificially generates speech via formant synthesis –A mathematical simulation of the vocal tract as a series of bandpass filters –Encoder codes & transmits filter coefficients, pitch period, gain factor, & nature of excitation Standards: –Regular Pulse Excited Linear Predictive Coder (RPE-LPC): digital cellular standard GSM 06.10 (13 kbps) –Code Excited Linear Predictive Coder (CELP): US Federal Standard 1016 (4.8 kbps) –Linear Predictive Coder (LPC): US Federal Standard 1015 (2.4 kbps)

35 COMP 249 :: Spring 2005 Slide: 35 Networking Concerns Audio bandwidth is actually quite small. But human sensitivity to loss and noise is quite high. Networking concerns: –Loss concealment –Jitter control Especially for telephony applications.

