Audio

Henning Schulzrinne
Dept. of Computer Science, Columbia University
Fall 2003

Common narrowband audio codecs

Codec    Rate (kb/s)  Delay (ms)  Multi-rate  Embedded  VBR  Bit-robust/PLC  Remarks
iLBC     15.2, 13.3   20, 30                                 --/X            quality higher than G.729A; no licensing
Speex    2.15-24.6    30          X           X         X    --/X            no licensing
AMR-NB   4.75-12.2    20          X                          X/X             3G wireless
G.729    8            15                                     X/X             TDMA wireless
GSM-FR   13           20                                                     GSM wireless (Cingular)
GSM-EFR  12.2         20                                     X/X             2.5G
G.728    16, 12.8     2.5                                    X/X             H.320 (ISDN videoconferencing)
G.723.1  5.3, 6.3     37.5                                   X/--            H.323, videoconferences

Common wideband audio codecs

Codec   Rate (kb/s)  Delay (ms)   Multi-rate  Embedded  VBR  Bit-robust/PLC  Remarks
Speex   4-44.4       34           X           X         X    --/X            no licensing
AMR-WB  6.6-23.85    20           X                          X/X             3G wireless
G.722   48, 56, 64   0.125 (1.5)                             X/--            2 sub-bands; now dated

iLBC – MOS behavior with packet loss

Recent audio codecs

iLBC: optimized for high packet loss rates (frames encoded independently)
AMR-NB
– 3G wireless codec
– 4.75-12.2 kb/s
– 20 ms coding delay
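The rate and delay figures quoted for iLBC and AMR-NB translate directly into per-frame payload sizes. The following is a minimal sketch, assuming the quoted delay equals the frame duration and ignoring any RTP, UDP, or IP header overhead; the function names are illustrative and not taken from any codec library.

```python
import math


def frame_payload(bit_rate_bps: int, frame_ms: int) -> tuple[int, int]:
    """Return (bits, bytes) of coded speech per frame; bytes are rounded up."""
    bits = bit_rate_bps * frame_ms // 1000
    return bits, math.ceil(bits / 8)


def packets_per_second(frame_ms: int, frames_per_packet: int = 1) -> float:
    """Packet rate when each packet carries a whole number of frames."""
    return 1000 / (frame_ms * frames_per_packet)


if __name__ == "__main__":
    # (codec, nominal bit rate in bit/s, assumed frame duration in ms)
    modes = [
        ("iLBC", 15200, 20),
        ("iLBC", 13300, 30),
        ("AMR-NB", 4750, 20),
        ("AMR-NB", 12200, 20),
    ]
    for name, rate, frame in modes:
        bits, octets = frame_payload(rate, frame)
        print(f"{name:7s} {rate / 1000:5.2f} kb/s, {frame} ms frame: "
              f"{bits} bits = {octets} bytes, "
              f"{packets_per_second(frame):.1f} packets/s")
```

Shorter frames lower the delay but raise the packet rate, so fixed per-packet header overhead takes a larger share of the total bandwidth.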

Speex

Open-source, patent-free speech codec
CELP (code-excited linear prediction) codec
Operating modes:
– narrowband (8 kHz sampling rate): 2.15-24.6 kb/s, delay of 30 ms
– wideband (16 kHz sampling rate): 4-44.2 kb/s, delay of 34 ms
– ultra-wideband (32 kHz sampling rate)
Intensity stereo encoding
Variable bit rate (VBR) possible
Voice activity detection (VAD)

Ogg Vorbis

Similar in application to AAC, MP3, VQF, …, but claims to be free of patents
Ogg = container file format (also for Speex, FLAC)
Vorbis = music/speech codec
Near-CD quality = 160 kb/s
Forward-adaptive modified DCT (discrete cosine transform)
– overlapping windows
– floor: carries the frequency representation as a piecewise-linear interpolated representation on a dB amplitude scale and linear frequency scale
– residue: subtract out floor → cascaded (multi-pass) vector quantization
– entropy (Huffman) coding
Carries codec parameters in header
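The floor/residue split can be illustrated with a toy example. This is only a sketch of the idea described above (a piecewise-linear envelope on a dB scale over linearly spaced frequency bins, subtracted from the spectrum); it is not the actual Vorbis floor encoding, and the control points and helper names are hypothetical.

```python
def floor_curve(points, n_bins):
    """Interpolate (bin, dB) control points linearly over n_bins bins.

    Control points must have distinct bin indices. Bins outside the
    range of the control points are clamped to the nearest end point.
    """
    points = sorted(points)
    curve = []
    for k in range(n_bins):
        if k <= points[0][0]:
            curve.append(float(points[0][1]))
            continue
        if k >= points[-1][0]:
            curve.append(float(points[-1][1]))
            continue
        # Find the segment containing bin k and interpolate within it.
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if x0 <= k <= x1:
                curve.append(y0 + (y1 - y0) * (k - x0) / (x1 - x0))
                break
    return curve


def residue(spectrum_db, floor_db):
    """Subtract the floor from the spectrum, bin by bin, in the dB domain."""
    return [s - f for s, f in zip(spectrum_db, floor_db)]


if __name__ == "__main__":
    # Hypothetical 9-bin magnitude spectrum in dB and three control points.
    spectrum = [-18.0, -26.0, -29.0, -36.0, -41.0, -39.0, -40.5, -42.0, -40.0]
    floor = floor_curve([(0, -20), (4, -40), (8, -40)], len(spectrum))
    print("floor:  ", floor)
    print("residue:", residue(spectrum, floor))
```

Because the floor captures the coarse spectral envelope with only a few control points, the residue has a much smaller dynamic range, which is what makes it a suitable input for vector quantization.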

Sound localization

Human ear uses 3 metrics for stereo localization:
– intensity
– time of arrival (TOA): 7 µs
– direction filtering and spectral shaping by outer ear
For shorter wavelengths (4-20 kHz), the head casts an acoustical shadow, giving rise to a lower sound level at the ear farthest from the sound source
At long wavelengths (20 Hz - 1 kHz), the head is very small compared to the wavelength
– in this case, localization is based on perceived interaural time differences (ITD)
(Source: UCSC CMPE250, Fall 2002)
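The relationship between wavelength, head size, and interaural time difference can be checked numerically. The sketch below assumes a speed of sound of 343 m/s, an ear spacing of 0.18 m, and a simple straight-line path-difference model for ITD; none of these values or the model come from the slides, and real heads produce somewhat larger delays because sound diffracts around them.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 °C (assumed)
EAR_SPACING = 0.18      # m between the ears (assumed typical value)


def wavelength_m(freq_hz: float) -> float:
    """Wavelength in metres of a tone at freq_hz."""
    return SPEED_OF_SOUND / freq_hz


def itd_seconds(azimuth_deg: float) -> float:
    """ITD for a distant source at the given azimuth (0 = straight ahead).

    Uses the straight-line path difference EAR_SPACING * sin(azimuth).
    """
    return EAR_SPACING * math.sin(math.radians(azimuth_deg)) / SPEED_OF_SOUND


def dominant_cue(freq_hz: float) -> str:
    """Map a frequency to the cue named for its band in the text above."""
    if 20 <= freq_hz <= 1000:
        return "ITD"
    if 4000 <= freq_hz <= 20000:
        return "intensity"
    return "unspecified"  # the text does not cover this band


if __name__ == "__main__":
    for f in (100, 1000, 4000, 20000):
        ratio = wavelength_m(f) / EAR_SPACING
        print(f"{f:6d} Hz: wavelength {wavelength_m(f):.4f} m "
              f"({ratio:.2f} x ear spacing), cue: {dominant_cue(f)}")
    for az in (0, 30, 90):
        print(f"azimuth {az:2d} deg: ITD {itd_seconds(az) * 1e6:.0f} µs")
```

Under these assumptions the largest ITD is about half a millisecond, while a 4 kHz tone already has a wavelength shorter than the ear spacing, which is consistent with the head casting a shadow at high frequencies.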

Audio samples

http://www.cs.columbia.edu/~hgs/audio/codecs.html
Speex: http://www.speex.org/audio/samples/
– both narrowband and wideband
