ESE250: Digital Audio Basics. Week 7, February 23, 2012: Psychoacoustic Compression. ESE 250 – S’12, Kod & DeHon.

1 ESE250: Digital Audio Basics, Week 7: Psychoacoustic Compression

2 Course Map
[Figure: course map; numbers correspond to course weeks]
Today: audio signal processing, putting it all together

3 Today’s Agenda
How do we compress from the WAV bit rate (per channel) of ~700 kbps @ 44.1 kHz down to the MP3 target (per channel) of ~60 kbps @ 44.1 kHz?
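
The agenda's numbers can be checked with a few lines of arithmetic. This is a minimal sketch that assumes 16-bit PCM samples per channel (the slide gives only the rounded rate); the function name `pcm_bit_rate_kbps` is illustrative.

```python
def pcm_bit_rate_kbps(bits_per_sample: int, sample_rate_hz: float) -> float:
    """Bit rate of uncompressed PCM audio for one channel, in kbps."""
    return bits_per_sample * sample_rate_hz / 1000.0


# Assumption: 16 bits per sample per channel, sampled at 44.1 kHz.
wav_kbps = pcm_bit_rate_kbps(16, 44_100)

# Per-channel MP3 target quoted on the slide.
mp3_target_kbps = 60.0

# How much the encoder has to shrink the stream.
compression_ratio = wav_kbps / mp3_target_kbps

print(f"WAV: {wav_kbps:.1f} kbps per channel")
print(f"Required compression: about {compression_ratio:.1f} : 1")
```

Under that assumption the WAV rate comes out near the slide's ~700 kbps, so the encoder needs roughly a 12:1 reduction.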

4 Where are we?
Week 2
• Received signal is sampled & quantized
• q = PCM[ r ]
Week 3
• Quantized signal is coded
• c = code[ q ]
Week 4
• Sampled signal first transformed into frequency domain
• Q = DFT[ q ]
Week 5
• Signal oversampled & low-pass filtered
• Q = LPF[ DFT(q + n) ]
Week 6
• Transformed signal analyzed
• Using human psychoacoustic models
Week 7
• Acoustically interesting signal is “perceptually coded”
• C = MP3[ Q ]
[Block diagram with elements r(t), Over Sample, q + n, DFT, Q + N, LPF, Q, Perceptual Coding, C, Store / Transmit, Decode, Produce, p(t); stages labeled Week 3 through Week 6]
[Painter & Spanias. Proc. IEEE, 88(4):451–512, 2000]

5 Week 5 Review (Oversampling)
Audio-relevant signal $q(t)$
• Time window: $T_0$ sec per “block”
• Nyquist sample period: $T_A$
• Ideally $n_A = T_0 / T_A$ samples per “block”: $q = (q_{-n_A}, \ldots, q_{-2}, q_{-1}, q_0, q_1, q_2, \ldots, q_{n_A}) = \mathrm{PCM}[q(t)]$
Ambient noise $n(t)$
• Nyquist sample period: $T_N \ll T_A$
Received signal in a block
• $n_N = T_0 / T_N \gg n_A = T_0 / T_A$
• Ultimately, record $r = (r_{-n_N}, \ldots, r_{-2}, r_{-1}, r_0, r_1, r_2, \ldots, r_{n_N}) = \mathrm{PCM}[r(t)] = \mathrm{PCM}[q(t) + n(t)] = q + n = (q_{-n_N} + n_{-n_N}, \ldots, q_{-1} + n_{-1}, q_0 + n_0, q_1 + n_1, \ldots, q_{n_N} + n_{n_N})$
[Figure: $q(t)$, $n(t)$, and $r(t) = q(t) + n(t)$ over the window $-T_0/2$ to $T_0/2$, with sample spacings $T_A$ and $T_N$]

6 Week 5 Review (Anti-Aliasing)
Given $r$ (samples of $r(t) = q(t) + n(t)$ over $-T_0/2$ to $T_0/2$), compute the frequency-domain representation
• $R = \mathrm{DFT}[r] = (R_{-n_N}, \ldots, R_{-1}, R_0, R_1, \ldots, R_{n_N}) = (Q_{-n_N} + N_{-n_N}, \ldots, Q_{-1} + N_{-1}, Q_0 + N_0, Q_1 + N_1, \ldots, Q_{n_N} + N_{n_N})$
Introduce assumptions about frequency content:
• $|k| > n_A = \Omega_A / \Omega_0 \Rightarrow Q_k = 0$
• $|k| < n_A = \Omega_A / \Omega_0 \Rightarrow N_k = 0$
to realize
• $R = (0, \ldots, 0, Q_{-n_A}, \ldots, Q_{-1}, Q_0, Q_1, \ldots, Q_{n_A}, 0, \ldots, 0) + (N_{-n_N}, \ldots, N_{-n_A}, 0, \ldots, 0, N_{n_A}, \ldots, N_{n_N})$
Low-pass filter
• $Q = (Q_{-n_A}, \ldots, Q_{-1}, Q_0, Q_1, \ldots, Q_{n_A})$
• Bit count $= n_A \cdot 32$
• $T_0 \approx 1/44$ sec $\Rightarrow n_A \approx 1000$
• $T_0 \approx 1/88$ sec $\Rightarrow n_A \approx 500$
[Figure: spectrum over frequency indices $-n_N, \ldots, -n_A, \ldots, n_A, \ldots, n_N$, before and after the DFT and LPF stages]
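
The oversample, transform, and low-pass steps reviewed above can be sketched in a few lines. This is an illustrative toy, not the course's implementation: the block length of 64, the cutoff index of 8, and the two cosine test signals are all assumptions chosen so that the audio sits below the cutoff and the noise above it, and `dft`, `idft`, and `ideal_low_pass` are hypothetical helper names.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def ideal_low_pass(X, n_a):
    """Zero every bin whose signed frequency index k has |k| > n_a."""
    n = len(X)
    out = []
    for k, value in enumerate(X):
        signed_k = k if k <= n // 2 else k - n  # bins past n/2 are negative frequencies
        out.append(value if abs(signed_k) <= n_a else 0)
    return out


n_samples = 64  # assumed oversampled block length
n_a = 8         # assumed highest audio-relevant frequency index

# Audio at frequency index 3 (below the cutoff), noise at index 20 (above it).
q = [math.cos(2 * math.pi * 3 * t / n_samples) for t in range(n_samples)]
noise = [0.5 * math.cos(2 * math.pi * 20 * t / n_samples) for t in range(n_samples)]
r = [a + b for a, b in zip(q, noise)]

Q = ideal_low_pass(dft(r), n_a)
q_recovered = [value.real for value in idft(Q)]
max_err = max(abs(a - b) for a, b in zip(q, q_recovered))
print(f"largest reconstruction error: {max_err:.2e}")
```

Because the two signals occupy disjoint frequency indices, zeroing the high bins removes the noise and leaves the audio samples intact up to floating-point error.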

7 Week 6 Review (Hearing Model)
Power Spectrum Model of Hearing:
• Critical Bands: the auditory system contains a finite array of adaptively tunable, overlapping bandpass filters
• Frequency Bins: humans process a signal’s component (against a noisy background) in the one filter with the closest center frequency
• Masking: certain signal components in a given band are “favored” and others are filtered out
Established through decades of psychoacoustic experiments
Model underlying today’s algorithmic thinking
[B.C.J. Moore. Int. Rev. Neurobiol., 70:49–86, 2005]

8 Today: Critical Band Assignments
Acquire & transform “frame”
Assign frequencies to bands
• Use psychoacoustic model lookup
• To determine the frequency bandwidth
• Of each critical band
[Figure: spectrum $|Q|$ over frequency indices, passed through a lookup table (LUT) that groups the components $Q_1, \ldots, Q_{n_A}$ into bands $1, 2, \ldots, k, \ldots$]
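
A lookup-table band assignment might look like the sketch below. The slide does not give its table, so the band edges here are an assumption: they are the commonly tabulated Bark-scale critical-band edges, and the 44 Hz bin spacing assumes a frame of about 1/44 sec. `band_index` and `assign_bins_to_bands` are illustrative names.

```python
from bisect import bisect_right

# Assumption: upper edges (Hz) of 24 Bark-scale critical bands, used as the LUT.
BAND_UPPER_EDGES_HZ = [
    100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720,
    2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000, 15500,
]


def band_index(freq_hz):
    """0-based critical-band index; anything above the last edge joins the last band."""
    return min(bisect_right(BAND_UPPER_EDGES_HZ, freq_hz),
               len(BAND_UPPER_EDGES_HZ) - 1)


def assign_bins_to_bands(magnitudes, bin_spacing_hz):
    """Group DFT magnitudes |Q_k| (k = 0, 1, ...) by critical band."""
    bands = {}
    for k, magnitude in enumerate(magnitudes):
        bands.setdefault(band_index(k * bin_spacing_hz), []).append((k, magnitude))
    return bands


# Ten made-up magnitudes with an assumed bin spacing of 44 Hz.
magnitudes = [0.9, 0.1, 0.4, 0.7, 0.2, 0.3, 0.8, 0.5, 0.6, 0.05]
bands = assign_bins_to_bands(magnitudes, bin_spacing_hz=44.0)
for band, members in sorted(bands.items()):
    print(band, [k for k, _ in members])
```

A binary search over sorted band edges keeps the lookup cheap even when a frame has hundreds of frequency bins.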

9 Today: Code (Compress) Frequencies
Use psychoacoustic model to minimize bits per critical band
• $C_{\mathrm{band}_k(j)} = \mathrm{Round}[Q_{\mathrm{band}_k(j)}, \mathrm{Level}]$
By appeal to “masking” models
• $Q_{\mathrm{band}_k} = C_{\mathrm{band}_k} + D_{\mathrm{band}_k}$
• where the distortion (“perceptual noise”) $D_{\mathrm{band}_k}$
• between the retained signal $C_{\mathrm{band}_k}$
• and the actual signal $Q_{\mathrm{band}_k}$
• should be “masked” by the retained signal
[Figure: Bands → Lossy Coding]

10 Single Band
E.g., look at the $k$th band
• $Q_{\mathrm{band}_k} = (Q_{\mathrm{band}_k(1)}, \ldots, Q_{\mathrm{band}_k(m)})$
• Amplitudes represented as reals
Determine masking paradigm
• Tone-masking-noise
• Noise-masking-tone
• Noise-masking-noise
E.g., for a tone masker,
• Pick the tone frequency, $\mathrm{band}_k(j)$
• At maximal amplitude in the band
Choose quantization level and compute compressed signal
• $C_{\mathrm{band}_k(j)} = \mathrm{Round}[Q_{\mathrm{band}_k(j)}, \mathrm{Level}]$
• $C_{\mathrm{band}_k} = (0_{\mathrm{band}_k(1)}, \ldots, C_{\mathrm{band}_k(j)}, \ldots, 0_{\mathrm{band}_k(m)})$
Assess noise magnitude, $|D_{\mathrm{band}_k}|$
• $D_{\mathrm{band}_k} = (Q_{\mathrm{band}_k(1)}, \ldots, Q_{\mathrm{band}_k(j)}, \ldots, Q_{\mathrm{band}_k(m)}) - (0_{\mathrm{band}_k(1)}, \ldots, C_{\mathrm{band}_k(j)}, \ldots, 0_{\mathrm{band}_k(m)})$
Use psychoacoustic model
• To determine whether the compressed signal
• Will mask the distortion noise for that band
[Figure: SPL vs. frequency panels showing the real input, the 1-bit and 2-bit signals, and the corresponding |noise|, with the SMR for 1 bit and for 2 bits marked; more bits yields a larger signal-to-mask ratio]
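
The single-band steps above can be traced on a toy band. This is a sketch under stated assumptions: `Round[Q, Level]` is read as rounding to the nearest multiple of a step size, the band amplitudes are made up, and the 20 dB margin stands in for whatever threshold a real psychoacoustic model would supply. All function names are illustrative.

```python
import math


def quantize(value, level):
    """Round value to the nearest multiple of the step size `level`."""
    return level * round(value / level)


def code_band_with_tone_masker(band, level):
    """Keep only the largest-amplitude component (the tone masker), quantized."""
    j = max(range(len(band)), key=lambda i: abs(band[i]))
    coded = [0.0] * len(band)
    coded[j] = quantize(band[j], level)
    distortion = [q - c for q, c in zip(band, coded)]  # D = Q - C
    return j, coded, distortion


def energy_db(values):
    """Total energy of a vector in dB (minus infinity for an all-zero vector)."""
    energy = sum(v * v for v in values)
    return 10 * math.log10(energy) if energy > 0 else float("-inf")


band = [0.1, 0.3, 7.6, 0.2]  # made-up amplitudes Q for one critical band
j, coded, distortion = code_band_with_tone_masker(band, level=2.0)

# Ratio of retained-signal energy to distortion energy, in dB.
snr_db = energy_db(coded) - energy_db(distortion)

# Hypothetical margin: treat the distortion as masked if it sits at least
# this far below the retained tone. A real model would compute this per band.
REQUIRED_MARGIN_DB = 20.0
masked = snr_db >= REQUIRED_MARGIN_DB
print(j, coded, f"{snr_db:.1f} dB", masked)
```

Halving `level` corresponds to spending more bits on the masker, which shrinks the distortion and raises the ratio, in line with the slide's remark that more bits yield a larger signal-to-mask ratio.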

11 Overview of Perceptual Coding
Goals
• Digitally represent a signal
• With the minimum number of bits
• With “transparent” reproduction: even the most sensitive human cannot distinguish between the original and the generated signal
Perceptual Entropy
• Uses a psychoacoustic model to estimate the information content of audio signals [J. D. Johnston. IEEE J. Sel. Ar. Comm., 6(2):314–323, 1988]
• Suggested transparency is achievable at ~2 bits per sample
• Or ~88 kbps @ 44.1 kHz
Our present (WAV) bit count = 32 bits per sample
• $T_0 \approx 1/44$ sec $\Rightarrow n_A \approx 1000 \Rightarrow$ about 32 kbits per frame $\times$ 44 frames per sec $\approx 1.4 \cdot 10^3$ kbps
• $T_0 \approx 1/88$ sec $\Rightarrow n_A \approx 500 \Rightarrow$ about 16 kbits per frame $\times$ 88 frames per sec $\approx 1.4 \cdot 10^3$ kbps
Our leverage?
[Painter & Spanias. Proc. IEEE, 88(4):451–512, 2000]

12 What are the “Knobs”?
“Reservoir”
• Are all frames equally full of audio information?
Masking
• How should we exploit the perceptual model?
Local decoupling
• Can we exploit (rough) independence of each band?
Global accounting
• How should we re-impose the average frame rate?

13 MP3 Encoder Design Strategy
Target: ~1024 bits per frame
• ~2 bits per sample
• ~512 samples per frame
Use masker(signal)-masking-maskee(noise) paradigm
• Assume 1 masker “costs” ~$2^K$ bits
• $\Rightarrow$ the retained signal, $C$, should consist of ~$2^{10-K}$ maskers on average
  o at specified amplitudes
  o at specified frequencies
• Allocated to some subset of the ~$32 = 2^5$ critical bands
Rough algorithm for computing the retained signal
(i) frame bit-reservoir: supplies bits to each critical band
(ii) commit to masker(s) within each band
(iii) attempt to mask within-band distortion with available bits
(iv) each band: give bits back to or take more from the reservoir
(v) iterate
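
The give-back and take-more bookkeeping in steps (i) through (iv) can be illustrated with a toy allocator. This is only a sketch: it assumes each band's bit requirement is already known and fixed (so the slide's iteration collapses to one pass), it starts every band with an equal share, and the per-band requirements are made-up numbers. `allocate_bits` is a hypothetical helper, not part of any MP3 encoder.

```python
def allocate_bits(bits_needed, frame_budget=1024):
    """Split a frame's bit budget across bands through a shared reservoir."""
    n_bands = len(bits_needed)
    share = frame_budget // n_bands
    allocation = [share] * n_bands
    reservoir = frame_budget - share * n_bands  # leftover from integer division

    # Bands that need less than their share give the surplus back.
    for b in range(n_bands):
        surplus = allocation[b] - bits_needed[b]
        if surplus > 0:
            allocation[b] -= surplus
            reservoir += surplus

    # Bands still short draw from the reservoir, largest deficit first.
    by_deficit = sorted(range(n_bands),
                        key=lambda b: bits_needed[b] - allocation[b],
                        reverse=True)
    for b in by_deficit:
        deficit = max(bits_needed[b] - allocation[b], 0)
        grant = min(deficit, reservoir)
        allocation[b] += grant
        reservoir -= grant
    return allocation, reservoir


# Four bands with made-up requirements and a 1024-bit frame budget.
allocation, reservoir = allocate_bits([100, 400, 256, 300], frame_budget=1024)
print(allocation, reservoir)
```

In this example the last band ends up short of its requirement because the reservoir runs dry, which is the situation where an encoder would have to accept some audible distortion or borrow from other frames.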

14 Emerging Picture
[Figure: plot with axes “Bits Retained” and “Audio Quality”, with curves labeled Knob #1, Knob #2, and Knobs #1 & #2]

15 Interlude: Guitar Hero

16 The Ultimate Boss
Subjective Quality Scales
International Standards
(a) Absolute impairment
(b) Differential grades
[Painter & Spanias. Proc. IEEE, 88(4):451–512, 2000]

17 General Dimensions of Merit
Bit Rate:
• bits per sample
• samples per second
Complexity:
• computational effort required
• to encode and decode
Delay:
• time required
• to encode and decode

18 MP3 Perceptual Coding Algorithm
Commit to Observation Window
• “Long” frame (complex sound; frequency resolution)
• “Short” frame (transient sound; temporal resolution)
Estimate Perceptual Entropy
• Analyze each critical band
• Characterize masker
• Estimate mask-to-noise threshold
• Update “bit reservoir”
Spectral Quantization/Coding Loop
• Allocate bits per critical band
• Quantize band to bits-allowed levels
• Run Huffman & count actual bits
[Raissi. Technical report, MP3’ Tech, December 2002]

19 MP3 Perceptual Coding Algorithm
Commit to Observation Window
• “Long” frame (complex sound; frequency resolution)
• “Short” frame (transient sound; temporal resolution)
Estimate Perceptual Entropy
• Analyze each critical band
• Characterize masker
• Estimate mask-to-noise threshold
• Update “bit reservoir”
Spectral Quantization/Coding Loop
• Allocate bits per critical band
• Quantize band to bits-allowed levels
• Run Huffman & count actual bits
[Raissi. Technical report, MP3’ Tech, December 2002]

20 Resolution: Time vs. Frequency
Example: two sample plots in the time-frequency plane
• Masking thresholds for (a) castanets and (b) piccolo
• Recall the masking threshold definition: lower-volume signals, in relation to a specified critical band (simultaneous; just before; soon after), are inaudible
Affects choice of observation window (“frame”)
(a) Castanets: prefer ~10 ms time resolution
• Implies blurrier frequency resolution
(b) Piccolo: prefer ~2 critical-band frequency resolution
• Implies blurrier temporal resolution
[Painter & Spanias. Proc. IEEE, 88(4):451–512, 2000]
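
The trade-off between the two window choices follows from the frame length alone. As a minimal sketch, assuming a 44.1 kHz sample rate and a plain DFT over each frame (the frame lengths below are illustrative, not the MP3 block sizes), the time span and the spacing between frequency bins move in opposite directions:

```python
def resolutions(frame_samples, sample_rate_hz=44_100):
    """Return (time resolution in ms, frequency-bin spacing in Hz) for one frame."""
    time_res_ms = 1000.0 * frame_samples / sample_rate_hz
    freq_res_hz = sample_rate_hz / frame_samples
    return time_res_ms, freq_res_hz


# A short frame suits transients such as castanets.
short_time_ms, short_freq_hz = resolutions(441)

# A frame four times longer suits sustained tones such as a piccolo.
long_time_ms, long_freq_hz = resolutions(1764)

print(f"short frame: {short_time_ms:.0f} ms, {short_freq_hz:.0f} Hz per bin")
print(f"long frame:  {long_time_ms:.0f} ms, {long_freq_hz:.0f} Hz per bin")
```

The product of the two resolutions stays fixed, so sharpening one necessarily blurs the other.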

21 MP3 Perceptual Coding Algorithm
Commit to Observation Window
• “Long” frame (complex sound; frequency resolution)
• “Short” frame (transient sound; temporal resolution)
Estimate Perceptual Entropy
• Analyze each critical band
• Characterize masker
• Estimate mask-to-noise threshold
• Update “bit reservoir”
Spectral Quantization/Coding Loop
• Allocate bits per critical band
• Quantize band to bits-allowed levels
• Run Huffman & count actual bits
[Painter & Spanias. Proc. IEEE, 88(4):451–512, 2000]

22 Use of Perceptual Entropy
Transform Coding
• Trade vector quantization for scalar quantization
• By transforming to the frequency domain (decoupled)
Critical Band Analysis
• Compute spectral power in each band, $P_f = |S_f|^2$
• Determine the masker for each band via the SFM (spectral flatness measure)
• Determine the JND (“just noticeable distortion”) threshold for each band
Bit Assignment
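
The spectral flatness measure used to characterize the masker can be computed as the ratio of the geometric to the arithmetic mean of the band's power values. The sketch below follows the tonality coefficient as it is usually stated for Johnston's perceptual noise criteria, with -60 dB as the reference for a fully tonal band; treat that constant, the function names, and the sample power values as assumptions rather than the course's exact procedure. All power values must be strictly positive.

```python
import math


def spectral_flatness_db(power):
    """SFM in dB: 10*log10(geometric mean / arithmetic mean) of the power values."""
    geometric = math.exp(sum(math.log(p) for p in power) / len(power))
    arithmetic = sum(power) / len(power)
    return 10 * math.log10(geometric / arithmetic)


def tonality(power, sfm_db_max=-60.0):
    """0 for a flat (noise-like) band, approaching 1 for a tone-like band."""
    return min(spectral_flatness_db(power) / sfm_db_max, 1.0)


flat_band = [1.0, 1.0, 1.0, 1.0]       # equal power everywhere: noise-like
peaky_band = [1e-6, 1e-6, 1.0, 1e-6]   # one dominant component: tone-like

flat_tonality = tonality(flat_band)
peaky_tonality = tonality(peaky_band)
print(f"flat: {flat_tonality:.2f}, peaky: {peaky_tonality:.2f}")
```

A coefficient near 1 points to the tone-masking-noise case, while a coefficient near 0 points to the noise-masking cases.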

23 MP3 Perceptual Coding Algorithm
Commit to Observation Window
• “Long” frame (complex sound; frequency resolution)
• “Short” frame (transient sound; temporal resolution)
Estimate Perceptual Entropy
• Analyze each critical band
• Characterize masker
• Estimate mask-to-noise threshold
• Update “bit reservoir”
Spectral Quantization/Coding Loop
• Allocate bits per critical band
• Quantize band to bits-allowed levels
• Run Huffman & count actual bits
[You & Chen. Multim. Tools & Appl., 40(3):341–359, 2008]

24 Spectral Quantization/Coding Loop
Inner Loop
• Global gain: overall bit rate control
• Shared across all spectral values
• Larger value
  o increased quantizer step size
  o increased quantization noise (“distortion”)
Outer Loop
• Adjusts scale factors, reallocating bits to bands
• Affects only the spectral values within a critical band
• Ensures the masker for that band will mask the distortion
  o Larger mask signal relative to threshold implies fewer bits needed
  o Smaller mask relative to threshold implies more bits needed
Loop Termination
• When the bit rate constraint is satisfied with no audible distortion
• Or after a set number of iterations (with excessive bits spent)
[Figure: SPL (dB) vs. frequency (Hz) across critical bands k-1, k, and k+1, contrasting a larger and a smaller mask-to-noise ratio]
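
The interplay of the two loops can be sketched with a toy model. Everything here is a simplification made for illustration: the bit count is a crude stand-in for Huffman coding, the step size grows by a factor of 2^(1/4) per inner iteration, noisy bands are amplified by sqrt(2) per outer iteration, and the spectral values and allowed-distortion figures are invented. None of the function names come from an MP3 library.

```python
def quantize_frame(bands, scales, step):
    """Quantize every value: amplify by its band's scale, divide by the shared step."""
    return [[round(x * s / step) for x in band] for band, s in zip(bands, scales)]


def bit_cost(quantized):
    """Crude stand-in for a Huffman bit count: 1 bit plus the magnitude's bit length."""
    return sum(1 + abs(q).bit_length() for band in quantized for q in band)


def band_distortion(band, quantized_band, scale, step):
    """Squared error between a band and its reconstruction."""
    return sum((x - q * step / scale) ** 2 for x, q in zip(band, quantized_band))


def inner_loop(bands, scales, bit_budget):
    """Raise the shared step size until the frame fits the bit budget."""
    step = 1.0
    while True:
        quantized = quantize_frame(bands, scales, step)
        if bit_cost(quantized) <= bit_budget:
            return quantized, step
        step *= 2 ** 0.25


def coding_loop(bands, allowed_distortion, bit_budget, max_outer=16):
    """Outer loop: amplify bands whose distortion exceeds what the masker hides."""
    if bit_budget < sum(len(band) for band in bands):
        raise ValueError("budget is below the minimum cost of one bit per value")
    scales = [1.0] * len(bands)
    for _ in range(max_outer):
        quantized, step = inner_loop(bands, scales, bit_budget)
        noisy = [b for b in range(len(bands))
                 if band_distortion(bands[b], quantized[b], scales[b], step)
                 > allowed_distortion[b]]
        if not noisy:
            break  # budget met with no band over its threshold
        for b in noisy:
            scales[b] *= 2 ** 0.5  # finer effective step for this band only
    return quantized, step, scales


bands = [[40.0, 3.0, -2.0], [5.0, -4.5, 1.0]]  # made-up spectral values
quantized, step, scales = coding_loop(bands, allowed_distortion=[4.0, 0.5],
                                      bit_budget=16)
print(quantized, round(step, 3), scales)
```

The loop returns whatever the last inner pass produced, so the bit budget always holds, but a tight budget can exhaust `max_outer` with some bands still above their threshold, matching the slide's second termination case.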

25 Typical Quantization Control
[Figure: quantized output amplitude and input masker amplitude plotted against frequency (critical band index); the inner loop provides bit rate control and the outer loop provides band-specific distortion control. Two panels of output amplitude (quantized SPL) vs. input amplitude (real SPL): GlobalGain = 220, ScaleFactor = 2 and GlobalGain = 230, ScaleFactor = 2]
[You & Chen. Multim. Tools & Appl., 40(3):341–359, 2008]

26 MP3 Perceptual Coding Algorithm
Commit to Observation Window
• “Long” (complex sound; frequency resolution)
• “Short” (transient sound; temporal resolution)
Estimate Perceptual Entropy
• Analyze each critical band
• Characterize masker
• Estimate mask-to-noise threshold
• Update “bit reservoir”
Spectral Quantization/Coding Loop
• Allocate bits per critical band
• Quantize band to bits-allowed levels
• Run Huffman & count actual bits
[Raissi. Technical report, MP3’ Tech, December 2002]

27 To Probe Further
Tutorials on Psychoacoustic Coding (in increasing order of abstraction and generality)
• D. Pan. A tutorial on MPEG/audio compression. IEEE Multimedia, 2(2):60–74, 1995.
• Nikil Jayant, James Johnston, and Robert Safranek. Signal compression based on models of human perception. Proceedings of the IEEE, 81(10):1385–1422, 1993.
• V. K. Goyal. Theoretical foundations of transform coding. IEEE Signal Processing Magazine, 18(5):9–21, 2001.
Lightweight Overview of MP3
• Rassol Raissi. The theory behind mp3. Technical report, MP3’ Tech, December 2002.
Scientific Basis of MP3 Coding Standard
• J. D. Johnston. Transform coding of audio signals using perceptual noise criteria. IEEE Journal on Selected Areas in Communications, 6(2):314–323, 1988.
