Audio Compression
ADPCM
ATRAC (MiniDisc)
MPEG Audio
– 3 layers, referred to as Layers I, II, and III
– The third layer is MP3
The steps of MP3
Input signal in digital form (sampled).
Split the signal into separate frequency bands corresponding to the ear’s critical bands.
Separately calculate the ear’s response to each band.
Increase the quantisation step in bands where we can mask the quantisation noise.
Code the resultant bit stream using Huffman coding.
Output the result.
Sampling
Sample rate of 44.1 kHz, using 16 bits per sample per channel, in stereo.
This gives 1,411,200 bps.
Blocks of 512 samples are taken.
Converted to the frequency domain using a modified DCT.
Split into 32 equal-width bands.
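The figures on this slide can be checked with a few lines of MATLAB. This is only a sketch: the variable names are illustrative, and the band width assumes that the 32 equal bands divide the range from 0 Hz up to the Nyquist frequency (Fs/2).

```matlab
Fs = 44100;          % sample rate in Hz
bitsPerSample = 16;  % bits per sample, per channel
nChannels = 2;       % stereo

% Uncompressed bit rate in bits per second
bitRate = Fs * bitsPerSample * nChannels;

% Duration of one 512-sample block in milliseconds
blockLen = 512;
blockMs = 1000 * blockLen / Fs;

% Width of each of the 32 equal bands (assumption: bands span 0 to Fs/2)
nBands = 32;
bandWidthHz = (Fs / 2) / nBands;

fprintf('Bit rate: %d bps\n', bitRate);
fprintf('Block duration: %.2f ms\n', blockMs);
fprintf('Band width: %.1f Hz\n', bandWidthHz);
```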
Psycho-acoustics and critical bands
The human ear can hear frequencies from 20 Hz to 20,000 Hz.
Y=chirp([0:1/44100:4], 20, 4, 20000, 'logarithmic');
sound(Y, 44100)
The ear’s sensitivity changes with frequency.
Most sensitive about Hz speech MW radio.
Psycho-acoustics and critical bands
More importantly, sensitivity levels are shifted by sounds which are close in frequency.
This gives rise to critical bands.
These are bands or areas around sounds where the sensitivity to nearby frequencies is reduced.
There can be many (20-30) critical bands in the sound spectrum at any time.
The bands vary in width across the audio spectrum, but are typically Hz wide between 1000 and 4000 Hz.
Psycho-acoustics and critical bands
It is the exploitation of the critical bands which allows MP3 to achieve its compression.
(Principles of Digital Audio, Ken Pohlmann)
Masking
Critical bands are exploited.
A psycho-acoustic model is applied.
Frequencies are removed if they are not audible.
A sound is inaudible if it is masked by a louder sound that is close to it in frequency or in time.
Audible Masking
A quieter sound is masked by a louder sound at a nearby frequency.
Temporal Masking
A quieter sound is masked by a louder sound that occurs shortly before or after it.
Post Masking
Around 200 ms: masking continues after the louder sound has stopped.
Pre Masking
Around 20 ms: masking begins before the louder sound starts.
Masking Demo setup
Fs=44100;   % sample rate in Hz
sig1=1000;  % tone frequencies in Hz
sig2=1020;
sig3=200;
S1=0.9*sin(2*pi*sig1*(1:(5*Fs))/Fs);  % 5 seconds of each tone
S2=0.9*sin(2*pi*sig2*(1:(5*Fs))/Fs);
S3=0.9*sin(2*pi*sig3*(1:(5*Fs))/Fs);
Masking Demo setup
Play sounds
– sound(S1, 44100)
– sound(S2, 44100)
– sound(S3, 44100)
Make mixed sounds
– S12=0.5*S1+0.5*S2;
– S13=0.5*S1+0.5*S3;
Play mixed sounds
– sound(S12, 44100)
– sound(S13, 44100)
Audible Masking Demo
A quiet sound is inaudible if it is masked by a louder sound at a nearby frequency.
Make two masked sounds
– S12mask=0.9*S1+0.03*S2;
– S13mask=0.9*S1+0.03*S3;
Play them
– sound(S12mask, 44100)
– sound(S13mask, 44100)
Post Masking Demo
Make 100 ms snippets of S2 and S3.
– S2sht=S2(1:round(100/1000*Fs));
– S3sht=S3(1:round(100/1000*Fs));
Play them.
Add them to the end of S1 and play them.
– sound([S1 S2sht], 44100)
– sound([S1 S3sht], 44100)
Reduce the level of the “snippets” by a factor of 100 and play them:
– sound([S1 0.01*S2sht], 44100)
– sound([S1 0.01*S3sht], 44100)
Quantisation
The bit rate per channel is fixed in advance.
For MP3 this ranges from 32 to 160 kbps, depending on the amount of compression required.
Bits are allocated to the remaining frequencies in each channel.
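To get a feel for what these bit rates imply, the sketch below compares them with the uncompressed rate of one 16-bit, 44.1 kHz channel. It assumes the 32 to 160 kbps range applies per channel, as stated on the slide; the variable names are illustrative.

```matlab
Fs = 44100;                       % sample rate in Hz
bitsPerSample = 16;               % uncompressed bits per sample
rawPerChannel = Fs * bitsPerSample;   % uncompressed bps for one channel

% Lowest and highest target bit rates from the slide (assumed per channel)
targetRates = [32000 160000];     % bps

% Compression ratio needed to reach each target
ratio = rawPerChannel ./ targetRates;

% Average number of bits left for each original sample
bitsPerSampleOut = targetRates / Fs;

for k = 1:numel(targetRates)
    fprintf('%6d bps: ratio %.2f:1, %.2f bits per sample on average\n', ...
        targetRates(k), ratio(k), bitsPerSampleOut(k));
end
```

Even at the highest rate there are fewer than four bits available per sample on average, which is why the allocation of bits between bands matters so much.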
Quantisation Noise
As the quantisation step is increased, the noise level increases.
This is because the error between the actual signal and its quantised value may be regarded as a separate, unwanted noise signal added to the actual signal.
However, if we can mask this unwanted noise, we can use fewer quantisation levels (fewer bits).
Quantisation Noise Demonstration
Input a high quality wav file.
Increase the quantisation step.
– Aquant=round(A(:)*(2^4-1))/(2^4-1)
– For 4 bit
Listen to the sound.
Keep increasing the quantisation step while listening to the quantisation noise.
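The same experiment can be put into numbers. The sketch below is an illustration rather than part of the original demo: it assumes a synthetic 1 kHz tone in place of the wav file, reuses the Aquant scaling from the slide for several bit depths, and reports the signal-to-noise ratio of each. The names bitsList and snrDb are hypothetical.

```matlab
Fs = 44100;
t = (0:Fs-1) / Fs;                 % one second of samples
A = 0.9 * sin(2*pi*1000*t);        % test tone standing in for the wav file

bitsList = [16 8 4 2];             % bit depths to compare
snrDb = zeros(size(bitsList));

for k = 1:numel(bitsList)
    n = bitsList(k);
    scale = 2^n - 1;               % same scaling as Aquant on the slide
    Aquant = round(A * scale) / scale;
    noise = Aquant - A;            % quantisation error as a noise signal
    snrDb(k) = 10 * log10(sum(A.^2) / sum(noise.^2));
    fprintf('%2d bits: SNR = %.1f dB\n', n, snrDb(k));
end
```

To hear the noise on its own, play it with sound(noise / max(abs(noise)), Fs) after the loop; this plays the error of the last bit depth in bitsList.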
Huffman coding
The resultant bitstream is now reduced because of coarser quantisation, but it can be further reduced by the use of Huffman coding.
Due to the nature of sound, high-level components at certain frequencies are less likely than low-level ones, and vice versa.
This statistical bias can be exploited using Huffman coding.
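A rough illustration of the statistical bias: the sketch below quantises a decaying tone, counts how often each level occurs, and builds Huffman code lengths by repeatedly merging the two least probable groups. It is not the coding scheme used inside MP3, which works on quantised frequency components with predefined tables; the test signal and all variable names are assumptions made for the example.

```matlab
Fs = 44100;
t = (0:Fs-1) / Fs;
A = exp(-5*t) .* sin(2*pi*440*t);  % decaying tone: mostly low-level samples
q = round(A * 15);                 % coarse quantisation to integer levels

% Probability of each quantised level
[symbols, ~, idx] = unique(q);
counts = accumarray(idx(:), 1);
p = (counts / sum(counts)).';      % row vector of probabilities
nSym = numel(p);

% Huffman code lengths: merge the two least probable groups until one
% remains; every symbol in a merged group gains one bit.
codeLen = zeros(1, nSym);
groupProb = p;
groups = num2cell(1:nSym);         % symbol indices held by each group
while numel(groupProb) > 1
    [~, order] = sort(groupProb);
    a = order(1);
    b = order(2);
    merged = [groups{a}, groups{b}];
    codeLen(merged) = codeLen(merged) + 1;
    newProb = groupProb(a) + groupProb(b);
    groupProb([a b]) = [];
    groups([a b]) = [];
    groupProb(end+1) = newProb;
    groups{end+1} = merged;
end

avgBits = sum(p .* codeLen);       % average bits per sample with Huffman
fixedBits = ceil(log2(nSym));      % bits per sample with a fixed-length code
entropyBits = -sum(p .* log2(p));  % lower bound on the average

fprintf('Fixed-length: %d bits, Huffman: %.2f bits, entropy: %.2f bits\n', ...
    fixedBits, avgBits, entropyBits);
```

Because small levels dominate this signal, the common levels receive short codes and the average falls below the fixed-length figure.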
Further reading
comp.pdf
Principles of Digital Audio, Ken Pohlmann