3 Psycho-acoustic models
A psycho-acoustic model describes how humans perceive sound.
In the compression context, its main use is to tell us which parts of the signal we can remove without the listener noticing.
4 Hearing Threshold
[Figure: hearing threshold in dB versus frequency in kHz. Sounds below the threshold will not be heard anyway; discard!]
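The slide shows the threshold only as a curve. As a rough sketch, the curve can be approximated with Terhardt's closed-form formula for the threshold in quiet; this formula is an assumption brought in for illustration and is not taken from the slides. The function names and the example components are hypothetical.

```python
import math

def threshold_in_quiet_db(freq_hz):
    """Approximate absolute hearing threshold in dB SPL (Terhardt's formula)."""
    f = freq_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

def audible_components(components):
    """Keep only (freq_hz, level_db) pairs that lie above the threshold."""
    return [(f, level) for f, level in components
            if level > threshold_in_quiet_db(f)]

# Hypothetical spectral components: (frequency in Hz, level in dB SPL).
components = [(100.0, 10.0), (1000.0, 10.0), (3300.0, 0.0), (16000.0, 30.0)]
kept = audible_components(components)  # the 100 Hz and 16 kHz entries are dropped
```

The ear is most sensitive around 3 to 4 kHz, so a quiet component there survives while louder components at very low or very high frequencies can be discarded.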
5 Frequency Masking
[Figure: energy versus frequency]
6 Frequency Masking
[Figure: energy versus frequency]
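A minimal sketch of how a single strong tone could raise the threshold for neighbouring frequencies. The Bark conversion is the common Zwicker and Terhardt approximation; the triangular spreading shape, the slopes, and the 10 dB offset are illustrative assumptions, not values from the slides or from any MPEG model.

```python
import math

def hz_to_bark(freq_hz):
    """Zwicker and Terhardt approximation of the critical-band (Bark) scale."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))

def masked_threshold_db(masker_hz, masker_db, probe_hz,
                        offset_db=10.0, lower_slope=25.0, upper_slope=10.0):
    """Threshold raised by one masker; all numeric defaults are assumptions.

    The threshold falls off faster below the masker (lower_slope, dB/Bark)
    than above it (upper_slope, dB/Bark).
    """
    dz = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)
    slope = upper_slope if dz >= 0 else lower_slope
    return masker_db - offset_db - slope * abs(dz)

def is_masked(masker_hz, masker_db, probe_hz, probe_db):
    """True if the probe lies below the threshold raised by the masker."""
    return probe_db < masked_threshold_db(masker_hz, masker_db, probe_hz)

near = is_masked(1000.0, 80.0, 1100.0, 50.0)  # close to the masker
far = is_masked(1000.0, 80.0, 4000.0, 50.0)   # many critical bands away
```

In this sketch a 50 dB tone next to an 80 dB masker is hidden, while the same tone several critical bands away is not.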
7 Temporal Masking
[Figure: energy versus time around a strong sound ("masker")]
Forward (post) masking
– Approx. 100 ms
Backward (pre) masking
– < 10 ms
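The two time windows on this slide can be written as a simple check. This is only a sketch: it treats the masker as an instantaneous event and says whether a weaker sound falls inside a masking window, not whether it is actually masked (that also depends on the levels). The function and constant names are hypothetical.

```python
FORWARD_MASKING_MS = 100.0   # approximate duration of post-masking (from the slide)
BACKWARD_MASKING_MS = 10.0   # upper bound for pre-masking (from the slide)

def in_temporal_masking_window(masker_ms, probe_ms):
    """True if a probe at probe_ms falls inside the masker's time window."""
    dt = probe_ms - masker_ms
    if dt >= 0:
        # Probe comes after the masker: forward (post) masking.
        return dt <= FORWARD_MASKING_MS
    # Probe comes before the masker: backward (pre) masking.
    return -dt < BACKWARD_MASKING_MS

after_50 = in_temporal_masking_window(0.0, 50.0)
after_150 = in_temporal_masking_window(0.0, 150.0)
before_5 = in_temporal_masking_window(0.0, -5.0)
before_20 = in_temporal_masking_window(0.0, -20.0)
```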
8 Psycho-acoustic Model: Demo
Music without distortion
Music with white noise
Music with perceptually distributed noise
11 Natural Audio Coders
[Figure: quality (Cellular Telephone AM FM CD) versus bitrate (kbit/s) for the coder families below]
Parametric speech (HVXC)
High quality speech (CELP)
General audio (AAC, TwinVQ)
Parametric audio (HILN)
12 MPEG-2/4 AAC: Advanced Audio Coder
DCT-based (MDCT) time/frequency coder.
Typically 16 – 64 kbit/s/channel.
"Expert listener quality" at 128 kbit/s.
Also added to MPEG-2, but without the MPEG-4 features.
Half the bitrate compared to mp3 at comparable quality, mainly due to an improved psycho-acoustic model.
Demo clips (Haydn, Tracy Chapman):
Mode    kbit/s  kHz
Mono    16
Stereo  32      16
Stereo  64      32
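To put the bitrates on this slide in perspective, the following sketch computes the compression ratio relative to uncompressed CD audio. The reference (44.1 kHz, 16 bits per sample) is an assumption, since the slide does not name one, and reading the 128 kbit/s figure as a stereo rate is also an assumption.

```python
CD_SAMPLE_RATE_HZ = 44_100   # assumed uncompressed reference
CD_BITS_PER_SAMPLE = 16      # assumed uncompressed reference

def cd_bitrate_kbps(channels=1):
    """Bitrate of uncompressed CD-style PCM in kbit/s."""
    return CD_SAMPLE_RATE_HZ * CD_BITS_PER_SAMPLE * channels / 1000.0

def compression_ratio(coded_kbps, channels=1):
    """How many times smaller the coded stream is than the PCM reference."""
    return cd_bitrate_kbps(channels) / coded_kbps

per_channel_pcm = cd_bitrate_kbps(1)               # 705.6 kbit/s
ratio_low = compression_ratio(16.0)                # 16 kbit/s/channel
ratio_high = compression_ratio(64.0)               # 64 kbit/s/channel
ratio_stereo = compression_ratio(128.0, channels=2)
```

Under these assumptions the typical range of 16 to 64 kbit/s/channel corresponds to compression ratios between roughly 44:1 and 11:1.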
13 MPEG-4 Extensions to the AAC
TwinVQ (Transform-domain Weighted Interleave Vector Quantization)
– Improves performance for low bitrates (6 – 18 kbit/s).
PNS (Perceptual Noise Substitution)
– Allows coding "noise-like" parts parametrically.
LTP (Long-term prediction)
– Allows "tone-like" parts to be coded with higher accuracy at a lower bitrate.
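The idea behind PNS can be sketched as follows: for a band judged to be noise-like, the encoder transmits only the band energy, and the decoder substitutes random noise with the same energy. This is a simplified illustration, not the MPEG-4 bitstream syntax; the function names, the Gaussian noise source, and the band size are assumptions.

```python
import math
import random

def pns_encode(band):
    """Replace a noise-like band of spectral coefficients by its energy."""
    return sum(c * c for c in band)

def pns_decode(energy, num_coeffs, rng):
    """Generate random coefficients scaled to the transmitted energy."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(num_coeffs)]
    noise_energy = sum(c * c for c in noise)
    if noise_energy == 0.0:
        return [0.0] * num_coeffs
    scale = math.sqrt(energy / noise_energy)
    return [c * scale for c in noise]

# Hypothetical noise-like band of 32 spectral coefficients.
band = [random.Random(0).uniform(-1.0, 1.0) for _ in range(32)]
energy = pns_encode(band)                        # one number instead of 32
decoded = pns_decode(energy, len(band), random.Random(1))
```

The decoded coefficients differ from the original ones, but the band carries the same energy, which is what matters perceptually for noise-like content.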
14 MPEG-4 Extensions to the AAC
BSAC (Bit-sliced Arithmetic Coder)
– Adds scalability to the bitstream.
– 16 – 64 kbit/s in steps of 1 kbit/s.
Demo: kbit/s
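A small sketch of what scalability in 1 kbit/s steps means for a receiver: given the capacity that happens to be available, it can decode the largest rate on the 16 to 64 kbit/s grid that fits. The function name and the rule of returning `None` below the lowest rate are assumptions made for illustration.

```python
def decodable_rate_kbps(available_kbps, base=16, top=64, step=1):
    """Largest rate on the scalable grid that fits the available capacity."""
    if available_kbps < base:
        return None  # not even the lowest rate fits
    layers = int((min(available_kbps, top) - base) // step)
    return base + layers * step

operating_points = len(range(16, 64 + 1))  # number of distinct rates
rate_a = decodable_rate_kbps(40.7)
rate_b = decodable_rate_kbps(100)
rate_c = decodable_rate_kbps(10)
```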
15 Other MPEG-4 Natural Audio Coders
Speech coders
– High bitrate speech coder (CELP)
– Low bitrate speech coder (HVXC)
HILN low bitrate parametric coder
– Harmonic and Individual Lines plus Noise
– kbit/s
– Subband coder that codes each subband as a tone or as shaped noise.
16 MPEG-4 High Bitrate Speech Coder
High quality CELP coder.
8 or 16 kHz sampling (NB or WB mode).
4 – 24 kbit/s.
Demo: PCM (uncompressed), 16 kbit/s, 24 kbit/s
[Figure: Basic principle of CELP coder. Blocks and signals: codebook index k, excitation x_k(n), gain g_k, LPC filter, input s(n), perceptual weighting filter, error e(n)]
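The codebook search in the CELP figure can be sketched as analysis-by-synthesis: each codebook vector x_k(n) is passed through the LPC synthesis filter, the best gain g_k is computed, and the index k with the smallest error energy is kept. This is a simplified illustration under stated assumptions: the perceptual weighting filter is omitted, and the codebook, the one-tap LPC filter, and the function names are hypothetical.

```python
def lpc_synthesis(excitation, lpc):
    """All-pole filter 1/A(z) with A(z) = 1 + sum_i lpc[i-1] * z^-i."""
    out = []
    for n, x in enumerate(excitation):
        y = x
        for i, a in enumerate(lpc, start=1):
            if n - i >= 0:
                y -= a * out[n - i]
        out.append(y)
    return out

def search_codebook(target, codebook, lpc):
    """Return (k, g_k, error_energy) minimising the sum of e(n)^2."""
    best = None
    for k, x_k in enumerate(codebook):
        y = lpc_synthesis(x_k, lpc)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        # Least-squares optimal gain for this codebook entry.
        gain = sum(s * v for s, v in zip(target, y)) / energy
        err = sum((s - gain * v) ** 2 for s, v in zip(target, y))
        if best is None or err < best[2]:
            best = (k, gain, err)
    return best

codebook = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, -1.0, 1.0, -1.0],
]
lpc = [-0.5]  # hypothetical one-tap filter: y[n] = x[n] + 0.5 * y[n-1]
# Build the target from entry 2 with gain 3 so the search has a known answer.
target = [3.0 * v for v in lpc_synthesis(codebook[2], lpc)]
k, gain, err = search_codebook(target, codebook, lpc)
```

Only the index k and the gain need to be transmitted (together with the LPC coefficients); the decoder holds the same codebook and repeats the synthesis step.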
17 MPEG-4 Low Bitrate Speech Coder
HVXC – Harmonic Vector eXcitation Coder.
8 kHz sampling, 2 – 4 kbit/s.
Down to 1.2 kbit/s in variable rate mode.
Sinusoidal coding for voiced parts and CELP coding for unvoiced parts.
HVXC can be combined with HILN.
– Automatic switching between the coders
– Produces one bitstream.
18 MPEG-4 Natural Audio Coders: Demo
Coders: Original audio, Music coder (TwinVQ), Music coder (HILN), Speech coder (CELP), Speech coder (HVXC)
Bitrates: 6 kbit/s, 2 kbit/s
Material: Speech, Simple music, Complex music
19 Speed Change
Audio can be decoded at an arbitrary speed without changing the pitch.
Demo: original music, then ~20% faster.
20 Synthetic Audio
TTS – Text-To-Speech
– MPEG-4 defines an interface, not the TTS itself
SAOL – Structured Audio Orchestra Language
– SAOL describes how to generate instruments
SASL – Structured Audio Score Language
– SASL describes which instruments to play when
– MIDI is a subset of SASL
Demo:
– Orchestra: Initially 80 kB instrument descriptions (SAOL)
– While playing: 1 kbit/s (SASL)
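The split between orchestra and score can be illustrated with a Python analogy. This is not SAOL or SASL syntax; it only mirrors the division of labour: the orchestra says how each instrument generates sound, and the score says which instrument plays when. The instruments, the sample rate, and all names are hypothetical.

```python
import math

SAMPLE_RATE = 8000  # Hz, hypothetical

def tone(freq_hz, duration_s):
    """Plain sine instrument."""
    n = int(duration_s * SAMPLE_RATE)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]

def pluck(freq_hz, duration_s):
    """Sine with an exponential decay, a crude plucked sound."""
    n = int(duration_s * SAMPLE_RATE)
    return [math.exp(-5.0 * i / SAMPLE_RATE)
            * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n)]

# "Orchestra": how to generate each instrument (sent once, up front).
orchestra = {"tone": tone, "pluck": pluck}

# "Score": (start time in s, instrument, frequency in Hz, duration in s).
score = [
    (0.0, "tone", 440.0, 0.5),
    (0.25, "pluck", 660.0, 0.5),
]

def render(orchestra, score):
    """Mix all score events into one list of samples."""
    total = max(start + dur for start, _, _, dur in score)
    out = [0.0] * int(round(total * SAMPLE_RATE))
    for start, name, freq, dur in score:
        offset = int(round(start * SAMPLE_RATE))
        for i, s in enumerate(orchestra[name](freq, dur)):
            if offset + i < len(out):
                out[offset + i] += s
    return out

samples = render(orchestra, score)
```

The score is tiny compared with the rendered samples, which is why the stream needs only a very low bitrate once the instrument descriptions have been sent.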
21 BIFS – Binary Format for Scene Description
All the sound you hear is coded at 16 kbit/s.
Initial voice coded using TTS.
Current voice coded using parametric speech coder (HVXC).
Background "music" coded using Structured Audio.
Post-production specified using BIFS, using the Structured Audio tools.
22 A Scene Graph
[Figure: scene graph with the nodes below]
AudioMix
– Mix the sounds
AudioFX
– Add reverb
AudioSource
– Hand claps (SA decoder)
– Speech (CELP-coder)
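A minimal sketch of how such a scene graph could be evaluated. The node names follow the slide, but the classes are a Python illustration and not the BIFS node definitions. The topology (reverb applied to the hand claps only, then mixed with the speech) is an assumption, since the flattened figure does not show the connections, and the single feedback echo is a crude stand-in for reverb.

```python
class AudioSource:
    """Leaf node: stands in for the output of a decoder."""
    def __init__(self, samples):
        self.samples = list(samples)

    def render(self):
        return list(self.samples)

class AudioFX:
    """Applies an effect to its child; here a feedback echo."""
    def __init__(self, child, delay=3, feedback=0.5):
        self.child = child
        self.delay = delay
        self.feedback = feedback

    def render(self):
        out = self.child.render()
        for n in range(self.delay, len(out)):
            out[n] += self.feedback * out[n - self.delay]
        return out

class AudioMix:
    """Sums the output of all children sample by sample."""
    def __init__(self, children):
        self.children = list(children)

    def render(self):
        rendered = [c.render() for c in self.children]
        length = max(len(r) for r in rendered)
        return [sum(r[n] for r in rendered if n < len(r))
                for n in range(length)]

# Hypothetical decoded signals (a few samples each).
claps = AudioSource([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])   # SA decoder output
speech = AudioSource([0.1, 0.2, 0.1, 0.0, 0.0, 0.0, 0.0])  # CELP decoder output

scene = AudioMix([AudioFX(claps), speech])
mixed = scene.render()
```

Each stream is decoded by its own coder, and the composition into the final sound happens at the receiver by walking the graph from the leaves to the root.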