ESE250: Digital Audio Basics. Week 7, February 23, 2012: Psychoacoustic Compression. ESE 250 – S’12, Kod & DeHon.

Course Map
Numbers correspond to course weeks.
Today: audio signal processing – putting it all together

Today’s Agenda
How do we compress from the WAV bit rate (per channel), ~ … kHz, down to the MP3 target (per channel), ~ … kHz?

Where are we?
- Week 2: Received signal is sampled & quantized: q = PCM[ r ]
- Week 3: Quantized signal is coded: c = code[ q ]
- Week 4: Sampled signal is first transformed into the frequency domain: Q = DFT[ q ]
- Week 5: Signal is oversampled & low-pass filtered: Q = LPF[ DFT(q + n) ]
- Week 6: Transformed signal is analyzed using human psychoacoustic models
- Week 7: Acoustically interesting signal is “perceptually coded”: C = MP3[ Q ]
[Figure: signal chain from r(t) through Over Sample, DFT, LPF, Perceptual Coding, Store / Transmit, Decode, and Produce to p(t), with intermediate signals q + n, Q + N, Q, and C; stages are labeled Week 3 through Week 6.]
[Painter & Spanias. Proc. IEEE, 88(4):451–512, 2000]

Week 5 Review (Oversampling)
Audio-relevant signal q(t):
- Time window: T_0 sec per “block”
- Nyquist sampling interval: T_A
- Ideally n_A = T_0 / T_A samples per “block”:
  q = (q_{-n_A}, …, q_{-2}, q_{-1}, q_0, q_1, q_2, …, q_{n_A}) = PCM[ q(t) ]
Ambient noise n(t):
- Nyquist sampling interval: T_N << T_A
Received signal in a block:
- n_S = T_0 / T_N >> n_A = T_0 / T_A
- Ultimately, record
  r = (r_{-n_S}, …, r_{-2}, r_{-1}, r_0, r_1, r_2, …, r_{n_S})
    = PCM[ r(t) ] = PCM[ q(t) + n(t) ] = q + n
    = (q_{-n_S} + n_{-n_S}, …, q_{-1} + n_{-1}, q_0 + n_0, q_1 + n_1, …, q_{n_S} + n_{n_S})
[Figure: q(t), n(t), and r(t) = q(t) + n(t) plotted over one block, −T_0/2 to T_0/2, with sample spacings T_A and T_N.]

Week 5 Review (Anti-Aliasing)
Given r, compute the frequency-domain representation:
  R = DFT[ r ] = (R_{-n_S}, …, R_{-1}, R_0, R_1, …, R_{n_S})
    = (Q_{-n_S} + N_{-n_S}, …, Q_{-1} + N_{-1}, Q_0 + N_0, Q_1 + N_1, …, Q_{n_S} + N_{n_S})
Introduce assumptions about frequency content:
- |k| > n_A = ω_A / ω_0 ⇒ Q_k = 0
- |k| < n_A = ω_A / ω_0 ⇒ N_k = 0
to realize
  R = (0, …, 0, Q_{-n_A}, …, Q_{-1}, Q_0, Q_1, …, Q_{n_A}, 0, …, 0)
    + (N_{-n_S}, …, N_{-n_A}, 0, …, 0, N_{n_A}, …, N_{n_S})
Low Pass Filter:
  Q = (Q_{-n_A}, …, Q_{-2}, Q_{-1}, Q_0, Q_1, Q_2, …, Q_{n_A})
Bit count = n_A · 32
- T_0 ≈ 1/44 sec ⇒ n_A ≈ 1000
- T_0 ≈ 1/88 sec ⇒ n_A ≈ 500
[Figure: spectrum over bins −n_S, …, −n_A, …, n_A, …, n_S after the DFT and after the LPF.]
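The DFT-then-LPF step reviewed on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the block length `n_s = 64`, the cutoff `n_a = 8`, the two test tones, and the helper names `dft` and `ideal_lowpass` are all hypothetical, and the filter is an idealized "zero the bins" operation rather than a practical filter design.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) DFT; bin k holds the component at k cycles per block."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def ideal_lowpass(spectrum, n_a):
    """Zero every bin whose signed frequency index |k| exceeds n_a."""
    n = len(spectrum)
    out = []
    for k, value in enumerate(spectrum):
        signed_k = k if k <= n // 2 else k - n  # map bin index to -n/2 .. n/2
        out.append(value if abs(signed_k) <= n_a else 0j)
    return out

# Hypothetical block: a 3-cycle "audio" tone q plus a 20-cycle "noise" tone n.
n_s = 64
r = [math.cos(2 * math.pi * 3 * t / n_s) + 0.5 * math.cos(2 * math.pi * 20 * t / n_s)
     for t in range(n_s)]
R = dft(r)                    # R = DFT[r] = Q + N
Q = ideal_lowpass(R, n_a=8)   # keep only |k| <= n_A
```

In this toy block the audio tone survives in bin 3 while the out-of-band noise in bin 20 is removed, which mirrors the two frequency-content assumptions on the slide.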

Week 6 Review (Hearing Model)
Power Spectrum Model of Hearing:
- Critical Bands: the auditory system contains a finite array of adaptively tunable, overlapping bandpass filters
- Frequency Bins: humans process a signal’s component (against a noisy background) in the one filter with the closest center frequency
- Masking: certain signal components in a given band are “favored” and others are filtered out
Established through decades of psychoacoustic experiments.
Model underlying today’s algorithmic thinking.
[B.C.J. Moore. Int. Rev. Neurobiol., 70:49–86, 2005]

Today: Critical Band Assignments
- Acquire & transform a “frame”
- Assign frequencies to bands: use a psychoacoustic model lookup to determine the frequency bandwidth of each critical band
[Figure: the magnitude spectrum |Q| over bins −n_S, …, n_S passes through a lookup table (LUT) that groups the coefficients Q_1, Q_2, …, Q_{n_A} into critical bands 1, 2, …, k, …, 22.]
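A lookup-table band assignment of this kind might look like the sketch below. The table is an assumption: it uses the commonly cited Zwicker (Bark scale) critical-band edges, not necessarily the table used in the course, and the names `BARK_EDGES_HZ`, `band_of`, and `assign_bins` are hypothetical.

```python
import bisect

# Upper band edges in Hz for 24 critical bands (Zwicker's Bark scale).
# Assumption: this stands in for the slide's psychoacoustic lookup table.
BARK_EDGES_HZ = [100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720,
                 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500,
                 12000, 15500]

def band_of(freq_hz):
    """Return the 1-based critical band index containing freq_hz."""
    index = bisect.bisect_right(BARK_EDGES_HZ, freq_hz)
    # Anything above the last edge is folded into the top band.
    return min(index, len(BARK_EDGES_HZ) - 1) + 1

def assign_bins(num_bins, bin_width_hz):
    """Group DFT bin indices 0..num_bins-1 by critical band."""
    bands = {}
    for k in range(num_bins):
        bands.setdefault(band_of(k * bin_width_hz), []).append(k)
    return bands

bands = assign_bins(num_bins=8, bin_width_hz=50.0)
```

Because the band edges widen with frequency, a uniform DFT grid puts few bins in the low bands and many in the high bands.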

Today: Code (Compress) Frequencies
- Use the psychoacoustic model to minimize bits per critical band:
  C_{band k(j)} = Round[ Q_{band k(j)}, Level ]
- by appeal to “masking” models:
  Q_{band k} = C_{band k} + D_{band k}
  where the distortion (“perceptual noise”) D_{band k} between the retained signal C_{band k} and the actual signal Q_{band k} should be “masked” by the retained signal
[Figure: Bands → Lossy Coding.]

Single Band
E.g., look at the k-th band:
- Q_{band k} = (Q_{band k(1)}, …, Q_{band k(m)})
- amplitudes represented as reals
Determine the masking paradigm:
- tone-masking-noise
- noise-masking-tone
- noise-masking-noise
E.g., for a tone masker:
- pick the tone frequency, band k(j), at maximal amplitude in the band
Choose a quantization level and compute the compressed signal:
- C_{band k(j)} = Round[ Q_{band k(j)}, Level ]
- C_{band k} = (0_{band k(1)}, …, C_{band k(j)}, …, 0_{band k(m)})
Assess the noise magnitude, |D_{band k}|:
- D_{band k} = (Q_{band k(1)}, …, Q_{band k(j)}, …, Q_{band k(m)}) − (0_{band k(1)}, …, C_{band k(j)}, …, 0_{band k(m)})
Use the psychoacoustic model to determine whether the compressed signal will mask the distortion noise for that band.
[Figure: SPL-versus-frequency panels showing the real input, the 1-bit and 2-bit signals, and the corresponding |Noise|, with the SMR marked for 1 bit and for 2 bits. More bits yield a larger signal-to-mask ratio.]
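The single-band steps can be traced numerically with a small sketch. Everything specific here is an assumption for illustration: the band amplitudes, the uniform quantizer with `2**level_bits` steps over a unit full scale, and above all the masking rule in `is_masked`, which simply requires the retained masker to sit a fixed offset (14.5 dB, a value borrowed from tone-masking-noise models) above the noise.

```python
import math

def code_band_tone_masker(q_band, level_bits, full_scale=1.0):
    """Keep only the largest component, rounded to a grid of 2**level_bits steps."""
    j = max(range(len(q_band)), key=lambda i: abs(q_band[i]))
    step = full_scale / (2 ** level_bits)
    c_band = [0.0] * len(q_band)
    c_band[j] = round(q_band[j] / step) * step        # C = Round[Q, Level]
    d_band = [q - c for q, c in zip(q_band, c_band)]  # D = Q - C
    return j, c_band, d_band

def is_masked(c_band, d_band, offset_db=14.5):
    """Assumed rule: noise is masked if it lies offset_db below the masker power."""
    masker_power = sum(c * c for c in c_band)
    noise_power = sum(d * d for d in d_band)
    if noise_power == 0.0:
        return True
    if masker_power == 0.0:
        return False
    return 10.0 * math.log10(masker_power / noise_power) >= offset_db

q_band = [0.02, 0.81, 0.03, 0.01]  # hypothetical band amplitudes
j1, c1, d1 = code_band_tone_masker(q_band, level_bits=1)
j2, c2, d2 = code_band_tone_masker(q_band, level_bits=2)
masked_1bit = is_masked(c1, d1)
masked_2bit = is_masked(c2, d2)
```

With these made-up numbers the 1-bit version leaves audible distortion under the assumed rule while the 2-bit version does not, matching the slide's remark that more bits yield a larger signal-to-mask ratio.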

Overview of Perceptual Coding
Goals:
- digitally represent a signal
- with the minimum number of bits
- with “transparent” reproduction: the most sensitive human cannot distinguish between the original and the generated signal
Perceptual Entropy:
- uses a psychoacoustic model to estimate the information content of audio signals [J. D. Johnston. IEEE J. Sel. Ar. Comm., 6(2):314–323, 1988]
- suggested transparency is achievable at 2 bits per sample, or … kHz
Our present (WAV) bit count = 32 bits per sample:
- T_0 ≈ 1/44 sec ⇒ n_A ≈ 1000 ⇒ ≈ 32 kbits per frame ⇒ 32 · 44 kbps ≈ 1.4 · 10^3 kbps
- T_0 ≈ 1/88 sec ⇒ n_A ≈ 500 ⇒ ≈ 16 kbits per frame ⇒ 16 · 88 kbps ≈ 1.4 · 10^3 kbps
Our leverage?
[Painter & Spanias. Proc. IEEE, 88(4):451–512, 2000]
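The bit-count arithmetic on this slide can be checked directly. The sketch below uses the slide's figures (32 bits per sample, about 1000 samples in a 1/44 s frame or 500 samples in a 1/88 s frame, and the 2 bits per sample suggested by the perceptual-entropy estimate); the function name `bit_rate_kbps` and the choice to evaluate the 2-bit target at the same frame size are assumptions.

```python
def bit_rate_kbps(samples_per_frame, bits_per_sample, frames_per_second):
    """Bits per frame times frames per second, expressed in kbps."""
    return samples_per_frame * bits_per_sample * frames_per_second / 1000.0

wav_long = bit_rate_kbps(1000, 32, 44)   # T_0 ~ 1/44 s
wav_short = bit_rate_kbps(500, 32, 88)   # T_0 ~ 1/88 s
target = bit_rate_kbps(1000, 2, 44)      # assumed: 2 bits per sample, same frame
leverage = wav_long / target             # how much compression is being asked for
```

Both frame sizes give the same rate, since halving the frame length doubles the number of frames per second.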

What are the “Knobs”?
- “Reservoir”: Are all frames equally full of audio information?
- Masking: How should we exploit the perceptual model?
- Local decoupling: Can we exploit the (rough) independence of each band?
- Global accounting: How should we re-impose the average frame rate?

MP3 Encoder Design Strategy
Target: ~ 1024 = 2^10 bits per frame
- ~ 2 bits per sample
- ~ 512 samples per frame
Use the masker (signal)-masking-maskee (noise) paradigm:
- assume 1 masker “costs” ~ 2^K bits
- ⇒ the retained signal, C, should consist of ~ 2^{10−K} maskers on average
  - specified amplitudes
  - at specified frequencies
- allocated to some subset of the ~ 32 = 2^5 critical bands
Rough algorithm for computing the retained signal:
(i) frame bit-reservoir: supplies bits to each critical band
(ii) commit to masker(s) within each band
(iii) attempt to mask within-band distortion with the available bits
(iv) each band: give bits back to, or take more from, the reservoir
(v) iterate
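One way to picture steps (i) through (v) is a greedy exchange of bits through a shared reservoir. The sketch below is not the MP3 procedure: it assumes each band already knows how many bits it needs to mask its own distortion (the hypothetical `bits_needed` list), and the function name `allocate_bits` is made up for illustration.

```python
def allocate_bits(bits_needed, frame_budget):
    """Greedy sketch: bands trade bits with a shared per-frame reservoir."""
    num_bands = len(bits_needed)
    share = frame_budget // num_bands
    allocation = [share] * num_bands               # (i) equal initial supply
    reservoir = frame_budget - share * num_bands   # remainder of the division
    # (iv) bands that need less than their share give bits back
    for b in range(num_bands):
        if allocation[b] > bits_needed[b]:
            reservoir += allocation[b] - bits_needed[b]
            allocation[b] = bits_needed[b]
    # (iv)/(v) bands that are still short draw from the reservoir, neediest first
    order = sorted(range(num_bands),
                   key=lambda i: bits_needed[i] - allocation[i], reverse=True)
    for b in order:
        grant = min(bits_needed[b] - allocation[b], reservoir)
        allocation[b] += grant
        reservoir -= grant
    return allocation, reservoir

allocation, leftover = allocate_bits([10, 40, 5, 30], frame_budget=80)
```

In the example, two quiet bands return 25 bits, the neediest band is fully served, and the last band gets what remains, so the frame total never exceeds the budget.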

Emerging Picture
[Figure: plot with axes Bits Retained and Audio Quality; curves labeled Knob #1, Knob #2, and Knobs #1 & #2.]

Interlude: Guitar Hero

The Ultimate Boss: Subjective Quality Scales
International Standards:
(a) Absolute impairment
(b) Differential grades
[Painter & Spanias. Proc. IEEE, 88(4):451–512, 2000]

General Dimensions of Merit
Bit Rate:
- bits per sample
- samples per second
Complexity:
- computational effort required to encode and decode
Delay:
- time required to encode and decode

MP3 Perceptual Coding Algorithm
Commit to Observation Window:
- “Long” frame (complex sound; frequency resolution)
- “Short” frame (transient sound; temporal resolution)
Estimate Perceptual Entropy:
- Analyze each critical band
- Characterize masker
- Estimate mask-to-noise threshold
- Update “bit reservoir”
Spectral Quantization/Coding Loop:
- Allocate bits per critical band
- Quantize band to bits-allowed levels
- Run Huffman coding & count actual bits
[Raissi. Technical report, MP3’ Tech, December 2002]

Resolution: Time vs. Frequency
Example: two sample plots in the time-frequency plane
- Masking thresholds for (a) castanets, (b) piccolo
- Recall the masking threshold definition: lower-volume signals, in relation to a specified critical band (simultaneous; just before; soon after), are inaudible
Affects the choice of observation window (“frame”):
- (a) Castanets: prefer ~ 10 ms time resolution; implies blurrier frequency resolution
- (b) Piccolo: prefer ~ 2 critical-band frequency resolution; implies blurrier temporal resolution
[Painter & Spanias. Proc. IEEE, 88(4):451–512, 2000]
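The trade-off between the two kinds of resolution follows from the frame length alone. The sketch below assumes a 44.1 kHz sample rate and two hypothetical frame lengths (1024 and 256 samples) chosen only to make the numbers concrete; they are not the frame sizes of any particular codec.

```python
def frame_resolution(frame_samples, sample_rate_hz):
    """Return (time resolution in ms, DFT bin spacing in Hz) for one frame."""
    time_res_ms = 1000.0 * frame_samples / sample_rate_hz
    freq_res_hz = sample_rate_hz / frame_samples
    return time_res_ms, freq_res_hz

# Hypothetical "long" and "short" observation windows at 44.1 kHz.
long_ms, long_hz = frame_resolution(1024, 44100)    # fine frequency, coarse time
short_ms, short_hz = frame_resolution(256, 44100)   # fine time, coarse frequency
```

The product of the two resolutions is constant, so a frame short enough to follow castanets necessarily has wider frequency bins, and a frame long enough to resolve a piccolo's pitch necessarily smears its timing.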

Use of Perceptual Entropy
Transform Coding:
- Trade vector quantization for scalar quantization
- by transforming to the frequency domain (decoupled)
Critical Band Analysis:
- Compute the spectral power in each band, P_f = |S_f|^2
- Determine the masker for each band via the spectral flatness measure (SFM)
- Determine the JND (“just noticeable distortion”) threshold for each band
Bit Assignment
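The spectral flatness measure used to characterize the masker can be sketched as the ratio of the geometric to the arithmetic mean of a band's power values. The mapping to a tonality coefficient with a −60 dB reference follows the form commonly attributed to Johnston's coder and should be read as an assumption here; the function names and the sample power values are hypothetical, and all power values must be strictly positive.

```python
import math

def spectral_flatness_db(power):
    """10*log10(geometric mean / arithmetic mean) of positive power values."""
    n = len(power)
    log10_gm = sum(math.log10(p) for p in power) / n
    log10_am = math.log10(sum(power) / n)
    return 10.0 * (log10_gm - log10_am)

def tonality(power, sfm_db_max=-60.0):
    """Assumed mapping: 0 for a flat (noise-like) band, 1 for a strongly tonal band."""
    return min(spectral_flatness_db(power) / sfm_db_max, 1.0)

flat_band = [1.0, 1.0, 1.0, 1.0]      # noise-like: equal power everywhere
peaky_band = [1e6, 1e-3, 1e-3, 1e-3]  # tone-like: one dominant component
alpha_flat = tonality(flat_band)
alpha_peaky = tonality(peaky_band)
```

A coefficient near 1 suggests treating the band as a tone masker, and a coefficient near 0 suggests a noise masker, which in turn selects the masking threshold used for the band.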

Spectral Quantization/Coding Loop
Inner Loop:
- Global gain: overall bit rate control
- Shared across all spectral values
- Larger value:
  - increased quantizer step size
  - increased quantization noise (“distortion”)
Outer Loop:
- Adjusts scale factors, reallocating bits to bands
- Affects only the spectral values within a critical band
- Ensures the masker for that band will mask the distortion:
  - larger mask signal relative to threshold implies fewer bits needed
  - smaller mask relative to threshold implies more bits needed
Loop Termination:
- when the bit rate constraint is satisfied with no audible distortion
- or after a set number of iterations (with excessive bits spent)
[Figure: SPL (dB) versus frequency (Hz) across critical bands k−1, k, and k+1, contrasting a larger and a smaller mask-to-noise ratio.]
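The interplay of the two loops can be sketched with a toy quantizer. This is a simplified illustration, not the MP3 reference loop: `count_bits` is a crude stand-in for the Huffman stage, the step sizes (a shared `global_step` divided by `2**scale` per band) are an assumed parameterization, and the per-band `thresholds` are hypothetical allowed noise powers.

```python
def quantize(values, step):
    return [round(v / step) for v in values]

def count_bits(quantized):
    """Stand-in for Huffman coding: magnitude bits plus a sign bit per nonzero value."""
    return sum(abs(q).bit_length() + 1 for q in quantized if q != 0)

def fit_budget(bands, scale, global_step, bit_budget):
    """Inner loop: coarsen the shared step until the frame fits the bit budget."""
    while True:
        steps = [global_step / 2 ** s for s in scale]
        coded = [quantize(b, st) for b, st in zip(bands, steps)]
        if sum(count_bits(c) for c in coded) <= bit_budget:
            return global_step, steps, coded
        global_step *= 2 ** 0.25

def coding_loop(bands, thresholds, bit_budget, max_outer=16):
    scale = [0] * len(bands)   # per-band scale factors
    global_step = 1e-3         # shared quantizer step
    for _ in range(max_outer):
        global_step, steps, coded = fit_budget(bands, scale, global_step, bit_budget)
        # Outer loop: find bands whose noise power exceeds the allowed threshold.
        noisy = [i for i, (b, c, st) in enumerate(zip(bands, coded, steps))
                 if sum((v - q * st) ** 2 for v, q in zip(b, c)) > thresholds[i]]
        if not noisy or len(noisy) == len(bands):
            break              # nothing audible, or no band left to favor
        for i in noisy:
            scale[i] += 1      # finer step for the noisy bands
    else:
        # Iteration limit reached: refit once so the output matches the final scales.
        global_step, steps, coded = fit_budget(bands, scale, global_step, bit_budget)
    return global_step, scale, coded

bands = [[0.9, 0.1], [0.05, 0.02]]   # hypothetical spectral values in two bands
global_step, scale, coded = coding_loop(bands, thresholds=[1e-2, 1e-4], bit_budget=12)
```

Whatever path the loop takes, the returned frame fits the bit budget, because every exit passes through `fit_budget`; whether the distortion is fully masked depends on the budget and the iteration limit, as the termination conditions on the slide indicate.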

Typical Quantization Control
[Figure: two quantizer characteristics, Output Amplitude [quantized SPL] versus Input Amplitude [real SPL], for GlobalGain = 220, ScaleFactor = 2 and for GlobalGain = 230, ScaleFactor = 2, together with the input masker amplitude and quantized output amplitude plotted against frequency (critical band index). The inner loop provides bit rate control; the outer loop provides band-specific distortion control.]
[You & Chen. Multim. Tools & Appl., 40(3):341–359, 2008]

To Probe Further
Tutorials on Psychoacoustic Coding (in increasing order of abstraction and generality):
- D. Pan. A tutorial on MPEG/audio compression. IEEE Multimedia, 2(2):60–74.
- Nikil Jayant, James Johnston, and Robert Safranek. Signal compression based on models of human perception. Proceedings of the IEEE, 81(10):1385–1422.
- V. K. Goyal. Theoretical foundations of transform coding. IEEE Signal Processing Magazine, 18(5):9–21.
Lightweight Overview of MP3:
- Rassol Raissi. The theory behind mp3. Technical report, MP3’ Tech, December 2002.
Scientific Basis of MP3 Coding Standard:
- J. D. Johnston. Transform coding of audio signals using perceptual noise criteria. IEEE Journal on Selected Areas in Communications, 6(2):314–323, 1988.
