Part II (MPEG-4) Audio TSBK01 Image Coding and Data Compression Lecture 11, 2003 Jörgen Ahlberg.

2 MPEG-4 Audio - Outline
Psycho-acoustic models
Overview of MPEG-4 Audio
AAC - Advanced Audio Coding
Specialized coders
Synthetic (structured) audio

3 Psycho-acoustic models
A psycho-acoustic model describes how humans perceive sound.
In compression, its main use is to tell us which parts of the signal can be removed without being missed.

4 Hearing Threshold
[Figure: hearing threshold in dB (axis 0 to 40) versus frequency in kHz (axis 2 to 12). Sound below the threshold will not be heard anyway; discard!]
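The threshold curve on this slide can be approximated numerically. The sketch below uses Terhardt's widely quoted approximation of the threshold in quiet. The formula and the names `threshold_in_quiet_db` and `is_audible` are not from the lecture; they are assumptions made for illustration only.

```python
import math


def threshold_in_quiet_db(freq_hz: float) -> float:
    """Approximate hearing threshold in dB SPL (Terhardt's approximation)."""
    f = freq_hz / 1000.0  # the formula is expressed in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def is_audible(freq_hz: float, level_db: float) -> bool:
    """A lone tone below the threshold in quiet can be discarded."""
    return level_db > threshold_in_quiet_db(freq_hz)


if __name__ == "__main__":
    for freq in (100, 1000, 3300, 12000):
        print(freq, "Hz:", round(threshold_in_quiet_db(freq), 1), "dB SPL")
```

The curve is lowest around 3 to 4 kHz, where hearing is most sensitive, and rises steeply towards high frequencies.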

5 Frequency Masking
[Figure: energy versus frequency]

6 Frequency Masking
[Figure: energy versus frequency]

7 Temporal Masking
[Figure: energy versus time around a strong sound (the "masker")]
Backward (pre) masking: < 10 ms
Forward (post) masking: approx. 100 ms
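As a rough illustration of the two time windows on this slide, the sketch below checks whether a weak sound falls close enough in time to a masker to be a candidate for masking. The function name `may_be_masked` is hypothetical, and the check deliberately ignores sound levels and frequency, which a real psycho-acoustic model would also take into account.

```python
PRE_MASKING_MS = 10.0    # backward (pre) masking window from the slide: < 10 ms
POST_MASKING_MS = 100.0  # forward (post) masking window from the slide: approx. 100 ms


def may_be_masked(masker_time_ms: float, sound_time_ms: float) -> bool:
    """True if a weak sound lies inside the masker's temporal window."""
    delta = sound_time_ms - masker_time_ms
    if delta < 0:
        # The weak sound comes before the masker: backward masking.
        return -delta < PRE_MASKING_MS
    # The weak sound comes with or after the masker: forward masking.
    return delta <= POST_MASKING_MS


if __name__ == "__main__":
    for t in (980, 995, 1050, 1200):
        print(t, "ms:", may_be_masked(1000, t))
```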

8 Psycho-acoustic Model: Demo
Music without distortion
Music with white noise
Music with perceptually distributed noise

9 Parts of MPEG-4 Audio
General natural audio
– AAC (with BSAC and TwinVQ)
– HILN (parametric)
Natural speech
– CELP
– HVXC (parametric)
Synthetic audio
– TTS
– SAOL
– SASL
Composition
– Mixing
– Re-sampling
– 3D-rendering

10 Parts of MPEG-4 Audio (cont.)
Error Protection
– CRC
– FEC (block code, convolutional code)
– Interleaving
Error Resilience
– Error resilient bitstreams
– Error concealment
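To show why interleaving helps error protection, here is a minimal sketch of a generic block interleaver. It is not the interleaver defined by MPEG-4; the function names and the 4 x 6 block size are assumptions chosen for illustration.

```python
def interleave(data: bytes, rows: int, cols: int) -> bytes:
    """Write the block row by row, read it out column by column."""
    if len(data) != rows * cols:
        raise ValueError("data must fill the rows x cols block exactly")
    return bytes(data[r * cols + c] for c in range(cols) for r in range(rows))


def deinterleave(data: bytes, rows: int, cols: int) -> bytes:
    """Inverse permutation: interleave again with rows and columns swapped."""
    return interleave(data, cols, rows)


original = bytes(range(24))
sent = bytearray(interleave(original, 4, 6))
for i in (8, 9, 10):  # a burst of three consecutive channel errors
    sent[i] ^= 0xFF
received = deinterleave(bytes(sent), 4, 6)

# Positions where the received block differs from the original.
error_positions = [i for i, (a, b) in enumerate(zip(original, received)) if a != b]

if __name__ == "__main__":
    print(error_positions)
```

After de-interleaving, the three damaged bytes are no longer adjacent, so a block or convolutional code that can correct isolated errors has a better chance of repairing them.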

11 Natural Audio Coders
[Figure: quality (cellular, telephone, AM, FM, CD) versus bitrate (2, 4, 8, 16, 32, 64 kbit/s) for four coder families: parametric speech (HVXC), high quality speech (CELP), general audio (AAC, TwinVQ), parametric audio (HILN)]

12 MPEG-2/4 AAC: Advanced Audio Coding
DCT-based (MDCT) time/frequency coder.
Typically 16 – 64 kbit/s/channel.
"Expert listener quality" at 128 kbit/s.
Added to MPEG-2, but without MPEG-4 features.
Half the bitrate compared to mp3, mainly due to improved psycho-acoustic model.
Demo clips (Haydn, Tracy Chapman): Mono, 16; Stereo, 32 kbit/s, 16 kHz; Stereo, 64 kbit/s, 32 kHz
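To put the AAC bitrates in perspective, the sketch below compares them with uncompressed CD-style PCM. The CD parameters (44.1 kHz, 16 bits per sample) are not stated on the slide and are used here only as an assumed reference point; the function names are hypothetical.

```python
CD_SAMPLE_RATE_HZ = 44_100  # assumed reference: CD sampling rate
CD_BITS_PER_SAMPLE = 16     # assumed reference: CD sample resolution


def pcm_kbit_per_s(channels: int = 1) -> float:
    """Bitrate of uncompressed PCM in kbit/s."""
    return CD_SAMPLE_RATE_HZ * CD_BITS_PER_SAMPLE * channels / 1000.0


def compression_ratio(coded_kbit_per_s: float, channels: int = 1) -> float:
    """How many times smaller the coded stream is than the PCM reference."""
    return pcm_kbit_per_s(channels) / coded_kbit_per_s


if __name__ == "__main__":
    print("PCM per channel:", pcm_kbit_per_s(), "kbit/s")
    for rate in (16, 64):
        print(rate, "kbit/s per channel ->", round(compression_ratio(rate), 1), ": 1")
```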

13 MPEG-4 Extensions to AAC
TwinVQ (Transform-domain Weighted Interleave Vector Quantization)
– Improves performance for low bitrates (6 – 18 kbit/s).
PNS (Perceptual Noise Substitution)
– Allows coding "noise-like" parts parametrically.
LTP (Long-term prediction)
– Allows "tone-like" parts to be coded with higher accuracy at a lower bitrate.
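The idea behind PNS can be sketched as follows: for a band that sounds like noise, the encoder sends only the band's energy, and the decoder substitutes its own random noise scaled to that energy. This is a simplified illustration, not the bitstream procedure from the standard; the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def pns_encode(band: Sequence[float]) -> float:
    """Encoder side: a noise-like band is reduced to a single energy value."""
    return sum(x * x for x in band)


def pns_decode(energy: float, length: int, seed: int = 0) -> List[float]:
    """Decoder side: generate random noise and scale it to the sent energy."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(length)]
    noise_energy = sum(x * x for x in noise)
    if noise_energy == 0.0:
        return [0.0] * length
    scale = math.sqrt(energy / noise_energy)
    return [x * scale for x in noise]


if __name__ == "__main__":
    band = [0.3, -1.2, 0.7, 0.1]
    energy = pns_encode(band)
    print(energy, pns_decode(energy, len(band)))
```

The decoded samples differ from the original ones, but the band carries the same energy, which is what matters perceptually for noise-like content.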

14 MPEG-4 Extensions to AAC
BSAC (Bit-sliced Arithmetic Coder)
– Adds scalability to the bitstream.
– 16 – 64 kbit/s in steps of 1 kbit/s.
Demo: 60, 40, 20 kbit/s
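Bit slicing is what makes this kind of stream scalable: if quantized values are sent one bit plane at a time, most significant first, the decoder can stop early and still get a coarse approximation. The sketch below shows only this slicing idea on non-negative integers. It leaves out the arithmetic coding, signs and scale factors of the real BSAC tool, and the function names are hypothetical.

```python
from typing import List, Sequence


def to_bit_planes(values: Sequence[int], num_bits: int) -> List[List[int]]:
    """Split non-negative integers into bit planes, most significant first."""
    return [[(v >> bit) & 1 for v in values]
            for bit in range(num_bits - 1, -1, -1)]


def from_bit_planes(planes: Sequence[Sequence[int]], num_bits: int) -> List[int]:
    """Rebuild values from the planes received so far; missing low bits are zero."""
    if not planes:
        return []
    values = [0] * len(planes[0])
    for i, plane in enumerate(planes):
        shift = num_bits - 1 - i
        for j, bit in enumerate(plane):
            values[j] |= bit << shift
    return values


if __name__ == "__main__":
    coefficients = [13, 2, 7, 0]
    planes = to_bit_planes(coefficients, 4)
    for kept in range(1, 5):
        print(kept, "planes:", from_bit_planes(planes[:kept], 4))
```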

15 Other MPEG-4 Natural Audio Coders
Speech coders
– High bitrate speech coder (CELP)
– Low bitrate speech coder (HVXC)
HILN low bitrate parametric coder
– Harmonic and Individual Lines plus Noise
– 4 – 16 kbit/s
– Subband coder that codes each subband as a tone or as shaped noise.

16 MPEG-4 High Bitrate Speech Coder
High quality CELP coder.
8 or 16 kHz sampling (NB or WB mode).
4 – 24 kbit/s.
Demo: PCM (uncompressed), 16 kbit/s, 24 kbit/s
[Figure: basic principle of CELP coder: codebook index k, codebook vector x_k(n), gain g_k, LPC filter, input s(n), error e(n), perceptual weighting filter]
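The analysis-by-synthesis principle named in the CELP figure can be sketched as a search over the codebook: each vector x_k(n) is passed through the LPC filter, the best gain g_k is computed, and the index k with the smallest error e(n) against the input s(n) is kept. This is a minimal sketch under simplifying assumptions: the perceptual weighting filter is omitted, the filter sign convention is an assumption, and all function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def lpc_synthesis(excitation: Sequence[float], lpc_coeffs: Sequence[float]) -> List[float]:
    """All-pole filter (assumed convention): y[n] = x[n] - sum_i a_i * y[n - i]."""
    out: List[float] = []
    for n, x in enumerate(excitation):
        y = x
        for i, a in enumerate(lpc_coeffs, start=1):
            if n - i >= 0:
                y -= a * out[n - i]
        out.append(y)
    return out


def celp_search(target: Sequence[float],
                codebook: Sequence[Sequence[float]],
                lpc_coeffs: Sequence[float]) -> Optional[Tuple[int, float, float]]:
    """Return (index k, gain g_k, squared error) of the best codebook entry."""
    best = None
    for k, code_vector in enumerate(codebook):
        synth = lpc_synthesis(code_vector, lpc_coeffs)
        energy = sum(y * y for y in synth)
        if energy == 0.0:
            continue  # an all-zero vector cannot match anything
        # Least-squares optimal gain for this code vector.
        gain = sum(s * y for s, y in zip(target, synth)) / energy
        error = sum((s - gain * y) ** 2 for s, y in zip(target, synth))
        if best is None or error < best[2]:
            best = (k, gain, error)
    return best


if __name__ == "__main__":
    codebook = [[1, 0, 0, 0], [0, 1, 0, 0], [1, -1, 1, -1]]
    lpc = [-0.5]
    target = [2.0 * y for y in lpc_synthesis(codebook[2], lpc)]
    print(celp_search(target, codebook, lpc))
```

Only the index k and the gain need to be transmitted for the excitation, since the decoder holds the same codebook.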

17 MPEG-4 Low Bitrate Speech Coder
HVXC – Harmonic Vector eXcitation Coder.
8 kHz sampling, 2 – 4 kbit/s.
Down to 1.2 kbit/s in variable rate mode.
Sinusoidal coding for voiced parts and CELP coding for unvoiced parts.
HVXC can be combined with HILN.
– Automatic switching between the coders
– Produces one bitstream.

18 MPEG-4 Natural Audio Coders: Demo
[Demo table: original audio, music coder (TwinVQ), music coder (HILN), speech coder (CELP) and speech coder (HVXC), at 6 kbit/s and 2 kbit/s, each applied to speech, simple music and complex music]

19 Speed Change
Possibility to decode at arbitrary speed without changing the pitch.
Demo: original music; ~20% faster

20 Synthetic Audio
TTS – Text-To-Speech
– MPEG-4 defines an interface, not the TTS itself
SAOL – Structured Audio Orchestra Language
– SAOL describes how to generate instruments
SASL – Structured Audio Score Language
– SASL describes which instruments to play when
– MIDI is a subset of SASL
Demo:
– Orchestra: Initially 80 kB instrument descriptions (SAOL)
– While playing: 1 kbit/s (SASL)
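The split between an orchestra (how instruments sound) and a score (which instrument plays when) can be illustrated with a toy renderer. This is plain Python, not SAOL or SASL syntax; the instrument functions, the event format and the 8 kHz sample rate are all assumptions made for illustration.

```python
import math
from typing import Callable, Dict, List, Tuple

SAMPLE_RATE = 8000  # Hz, chosen only to keep the example small


def sine_tone(freq_hz: float, duration_s: float) -> List[float]:
    n = int(duration_s * SAMPLE_RATE)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


def decaying_tone(freq_hz: float, duration_s: float) -> List[float]:
    n = int(duration_s * SAMPLE_RATE)
    return [math.exp(-5.0 * i / SAMPLE_RATE)
            * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


# "Orchestra": sent once, describes how each instrument generates sound.
orchestra: Dict[str, Callable[[float, float], List[float]]] = {
    "flute": sine_tone,
    "pluck": decaying_tone,
}

# "Score": small events (start time in s, instrument, pitch in Hz, duration in s).
score: List[Tuple[float, str, float, float]] = [
    (0.0, "flute", 440.0, 0.5),
    (0.25, "pluck", 220.0, 0.5),
]


def render(orchestra, score) -> List[float]:
    """Mix every score event into one output buffer."""
    end = max(start + dur for start, _, _, dur in score)
    out = [0.0] * int(round(end * SAMPLE_RATE))
    for start, name, freq, dur in score:
        offset = int(round(start * SAMPLE_RATE))
        for i, sample in enumerate(orchestra[name](freq, dur)):
            if offset + i < len(out):
                out[offset + i] += sample
    return out


if __name__ == "__main__":
    print(len(render(orchestra, score)), "samples rendered")
```

The score events are tiny compared with the rendered samples, which mirrors the demo's low bitrate while playing once the instrument descriptions have been sent.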

21 BIFS – Binary Format for Scene Description
All the sound you hear is coded at 16 kbit/s.
Initial voice coded using TTS.
Current voice coded using parametric speech coder (HVXC).
Background "music" coded using Structured Audio.
Post-production specified using BIFS, using the Structured Audio tools.

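22 A Scene Graph
[Figure: scene graph with nodes AudioMix (mix the sounds), AudioFX (add reverb) and AudioSource; sources: hand claps (SA decoder), speech (CELP coder)]

23
[Figure: scene graph with nodes AudioMix, AudioFX, AudioDelay, AudioFX and AudioSource; sources: piano, bass (SA), finger snaps]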

That was the last slide!
