Speech codecs and DCCP with TFRC VoIP mode Magnus Westerlund

Speech codecs and DCCP with TFRC VoIP mode Magnus Westerlund magnus.westerlund@ericsson.com

Important Features of TFRC VoIP mode
Minimum packet interval of 10 ms
Packet rate is penalized:
– X' = X * S_true / (S_true + H)
– H = 40 bytes: header size
– S_true is the complete RTP packet size, i.e. RTP header + payload
Still TFRC: sending is delayed if insufficient bit-rate is available.
Slow start with 4 packets; the size limitation is not an issue for the discussed codecs.
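A minimal sketch of the penalty above, assuming X is the TFRC allowed rate in bytes per second. The function names and the 44-byte example packet are illustrative assumptions, not part of any specification; the sketch only applies the S_true / (S_true + H) scaling and the 10 ms minimum packet interval.

```python
MIN_PACKET_INTERVAL_S = 0.010  # minimum packet interval of 10 ms
HEADER_SIZE_H = 40             # H: header size in bytes


def penalized_rate(x_bytes_per_s: float, s_true: int, h: int = HEADER_SIZE_H) -> float:
    """Scale the TFRC allowed rate X by S_true / (S_true + H)."""
    if s_true <= 0:
        raise ValueError("s_true must be a positive packet size in bytes")
    return x_bytes_per_s * s_true / (s_true + h)


def max_packet_rate(x_bytes_per_s: float, s_true: int, h: int = HEADER_SIZE_H) -> float:
    """Packets per second allowed by the penalized rate, capped by the 10 ms interval."""
    pps = penalized_rate(x_bytes_per_s, s_true, h) / s_true
    return min(pps, 1.0 / MIN_PACKET_INTERVAL_S)


# Hypothetical example: 44-byte RTP packet (12-byte RTP header + 32-byte payload)
print(penalized_rate(8000, 44))   # about 4190.5 bytes/s
print(max_packet_rate(8000, 44))  # about 95.2 packets/s, below the 100 packets/s cap
```

Note that the packet rate simplifies to X / (S_true + H): small packets are charged as if each carried H extra bytes.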

System overview
Contributors to system delay are:
– Sampling buffering
– Encoding delay
– Packetization delay
– Transmission delay
– Transport delay (Internet)
– Receiver buffering delay
– Decoding delay
– Playout delay
The sum of delays should be less than 200 ms for high-quality conversational VoIP, and less than 400 ms to be usable for conversational VoIP.
[Figure: Sender (MIC, Codec, Payload Packetization, DCCP) → Internet → Receiver (DCCP, Jitter Buffer, Codec, Speaker)]
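The delay budget can be checked by summing the components and comparing against the two thresholds. A minimal sketch; the per-component values below are hypothetical illustrations, not measurements from the presentation.

```python
HIGH_QUALITY_MS = 200  # below this: high-quality conversational
USABLE_MS = 400        # below this: still usable for conversational VoIP


def classify_delay(total_ms: float) -> str:
    """Map a total mouth-to-ear delay onto the thresholds from the slide."""
    if total_ms < HIGH_QUALITY_MS:
        return "high quality conversational"
    if total_ms < USABLE_MS:
        return "usable for conversational VoIP"
    return "too long for conversational use"


# Hypothetical per-component delays in ms (illustrative values only)
example_budget = {
    "sampling buffering": 20,
    "encoding": 5,
    "packetization": 20,
    "transmission": 1,
    "transport (Internet)": 50,
    "receiver buffering": 40,
    "decoding": 2,
    "playout": 10,
}

total_ms = sum(example_budget.values())
print(total_ms, classify_delay(total_ms))  # 148 high quality conversational
```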

Problems with TFRC-style packet rate penalties
Varying the packetization directly affects the system delay seen at the receiver.
This requires a jitter buffer capable of handling the increased or decreased system delay.
Frequent changes make it harder for adaptive buffers to estimate the jitter correctly.
Buffer under-runs need to be handled with little impact on voice quality, which requires inserting audio data or invoking error concealment.
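To illustrate why a change in packetization causes under-runs, the sketch below models a receiver that plays each frame a fixed time after the frame was completed at the sender. Assumptions (all hypothetical): 20 ms frames, constant 50 ms network delay, no network jitter, and a packet that is sent as soon as its last frame is complete.

```python
def late_frames(schedule, buffer_ms, network_ms=50, frame_ms=20):
    """Return indices of frames that miss their playout deadline.

    schedule: frames per packet, one entry per packet, in sending order.
    buffer_ms: fixed playout delay, measured from frame completion.
    """
    late = []
    first = 0  # index of the first frame in the current packet
    for k in schedule:
        # The packet leaves when its last frame (index first + k - 1) is complete.
        send_ms = (first + k) * frame_ms
        arrival_ms = send_ms + network_ms
        for f in range(first, first + k):
            complete_ms = (f + 1) * frame_ms
            deadline_ms = complete_ms + buffer_ms
            if arrival_ms > deadline_ms:
                late.append(f)
        first += k
    return late


# Buffer sized for 1 frame per packet, then the sender switches to 3 frames per packet.
print(late_frames([1, 1, 1, 3, 3], buffer_ms=50))  # [3, 4, 6, 7]
# Adding 40 ms of buffering (2 extra frames) absorbs the change, at the cost of delay.
print(late_frames([1, 1, 1, 3, 3], buffer_ms=90))  # []
```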

Speech and Audio Codecs with RTP Payload Formats
Narrowband codecs:
– G.711 (PCMA or PCMU)
– G.723
– G.726
– G.728
– G.729
– GSM
– GSM-EFR
– AMR
– EVRC
– SMV
– QCELP
– BroadVoice 16
– iLBC
Wideband codecs:
– AMR-WB
– VMR-WB
– BroadVoice 32
– G.722
Variable sampling rate:
– DVI4
– VDVI
– L8
– L16
– PCMA
– PCMU

Codec and RTP payload properties
Bit-rate of encoded content
Sample or frame based
Frame lengths: 2.5, 5, 10, 20, 30 ms, etc.
Basically all payload formats support aggregation; however, some have modes where it is restricted.
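Combined with the 10 ms minimum packet interval of TFRC VoIP mode, the frame length determines how many frames must be aggregated per packet. A minimal sketch; the function name is hypothetical.

```python
import math

MIN_PACKET_INTERVAL_MS = 10.0  # TFRC VoIP mode minimum packet interval


def min_frames_per_packet(frame_ms: float, min_interval_ms: float = MIN_PACKET_INTERVAL_MS) -> int:
    """Smallest number of frames per packet that respects the minimum packet interval."""
    return max(1, math.ceil(min_interval_ms / frame_ms))


for frame_ms in (2.5, 5, 10, 20, 30):
    print(frame_ms, min_frames_per_packet(frame_ms))
# 2.5 4, 5 2, 10 1, 20 1, 30 1
```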

DTX and Comfort Noise
DTX is Discontinuous Transmission.
A voice activity detector (VAD) detects whether there is active speech or not.
When there is no active speech, different DTX procedures can be used:
– No transmission at all
– Comfort Noise (CN) using RFC 3389
– Codec built-in CN, like the AMR SID (Silence Descriptor)
The frequency of Comfort Noise packets varies but is usually some fraction of the normal packet rate.

Sample based codecs
Speech bandwidth depends on the sampling rate.
Sample based, and can usually handle any number of samples per packet.
Usually no adaptivity other than packetization; some can vary quantization, like G.726.
Bit-rate depends on sampling rate and sample quantization.
Example: G.711 uses 8 bits per sample and 8 kHz sampling, resulting in a 64 kbps audio data rate.
Comfort noise may be supported using RFC 3389.
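The G.711 arithmetic generalizes to any sample-based codec: bit-rate is sampling rate times bits per sample, and the payload size follows from the packet duration. A minimal sketch with hypothetical function names; packet durations are whole milliseconds and partial bytes are truncated.

```python
def bitrate_bps(sampling_rate_hz: int, bits_per_sample: int) -> int:
    """Audio data rate of a sample-based codec in bits per second."""
    return sampling_rate_hz * bits_per_sample


def payload_bytes(sampling_rate_hz: int, bits_per_sample: int, packet_ms: int) -> int:
    """Payload size in bytes for one packet of packet_ms milliseconds."""
    samples = sampling_rate_hz * packet_ms // 1000
    return samples * bits_per_sample // 8


print(bitrate_bps(8000, 8))        # 64000 (G.711)
print(payload_bytes(8000, 8, 20))  # 160 bytes per 20 ms packet
print(bitrate_bps(8000, 4))        # 32000 (G.726 at 4 bits per sample)
```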

AMR
3GPP defined, mandatory speech codec in UMTS 3G networks
Narrowband codec (8 kHz audio sampling rate)
Frame-based with 20 ms frames
Multi-rate: has 8 encoding modes with bit-rates between 4.75 and 12.2 kbps
Has comfort noise generation (SID) and DTX
The SID (Silence Descriptor) is sent in every 8th frame and is 5 bytes in size.
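The SID interval sets the packet rate during silence. A minimal sketch of the average AMR packet rate, assuming one SID per packet during silence; the voice activity fraction is a hypothetical input, not a value from the presentation.

```python
FRAME_MS = 20             # AMR frame length
SID_INTERVAL_FRAMES = 8   # one SID every 8th frame during silence


def packet_rate(active_fraction: float, frames_per_packet: int = 1) -> float:
    """Average packets per second for a given fraction of active speech."""
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    active_pps = 1000 / FRAME_MS / frames_per_packet
    silence_pps = 1000 / FRAME_MS / SID_INTERVAL_FRAMES  # assumes one SID per packet
    return active_fraction * active_pps + (1 - active_fraction) * silence_pps


print(packet_rate(1.0))  # 50.0 packets/s during active speech
print(packet_rate(0.0))  # 6.25 packets/s during silence (1/8 of active)
print(packet_rate(0.5))  # 28.125
```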

EVRC and SMV
3GPP2 defined, required in CDMA networks
Narrowband codecs (8 kHz audio sampling rate)
Frame-based with 20 ms frames
Encode at 3 (EVRC) or 4 (SMV) different rates, from 0.8 to 8.55 kbps depending on the audio input, resulting in highly variable packet sizes.
The average bit-rate depends on the codec mode: each mode selects among the encoding rates differently to provide a different average rate.
Lack DTX and need to transmit all frames.
One mode in the payload format requires a single frame per packet.

BroadVoice 16
Broadcom defined codec, used in voice over cable
Narrowband codec (8 kHz audio sampling rate)
Frame-based with 5 ms frames, thus needing aggregation of at least 2 frames per packet for TFRC VoIP mode
No rate adaptation; fixed encoding at 16 kbps
No built-in comfort noise or DTX

BroadVoice 32
Broadcom defined codec, used in voice over cable
Wideband codec (16 kHz audio sampling rate)
Frame-based with 5 ms frames, thus needing aggregation of at least 2 frames per packet for TFRC VoIP mode
No rate adaptation; fixed encoding at 32 kbps
No built-in comfort noise or DTX
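With 2 frames per packet, BroadVoice hits the 100 packets/s limit, so header overhead dominates. A minimal sketch of the on-wire rate, assuming the IP (20) + DCCP (12) + RTP (12) header bytes used later for AMR and, as a simplifying assumption, no additional RTP payload header.

```python
HEADER_BYTES = 20 + 12 + 12  # IP + DCCP + RTP


def on_wire_kbps(codec_kbps: float, frame_ms: float, frames_per_packet: int,
                 header_bytes: int = HEADER_BYTES) -> float:
    """Total rate including headers, in kbps (bits per ms)."""
    packet_ms = frame_ms * frames_per_packet
    payload_bits = codec_kbps * packet_ms  # kbps * ms = bits
    total_bits = payload_bits + header_bytes * 8
    return total_bits / packet_ms


print(on_wire_kbps(16, 5, 2))  # BroadVoice 16: about 51.2 kbps
print(on_wire_kbps(32, 5, 2))  # BroadVoice 32: about 67.2 kbps
```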

AMR-WB
3GPP specified codec, mandatory in UMTS 3G if wideband is supported
Wideband codec (16 kHz audio sampling rate)
Frame-based with 20 ms frames
Multi-rate encoding at 9 different rates between 6.6 and 23.85 kbps
Has built-in support for DTX and comfort noise (SID)
The SID (Silence Descriptor) is sent every 8th frame and is 5 bytes in size.

VMR-WB
3GPP2 defined
Wideband codec (16 kHz audio sampling rate)
Frame-based with 20 ms frames
Encodes using 4 different rates (1.0-13.3 kbps)
Has a compatibility mode with AMR-WB (12.65, 8.85, 6.60 kbps)
Has a DTX mode

Summary of codecs
                          AMR        EVRC           SMV            BV16  BV32  AMR-WB     VMR-WB
Sampling rate             8k         8k             8k             8k    16k   16k        16k
Frame size (ms)           20         20             20             5     5     20         20
Bit-rate (kbps)           4.75-12.2  0.8-8.8 (4.2)  0.8-8.8 (4.2)  16    32    6.6-23.85  1.0-13.3
Runtime codec adaptation  Y          Y              Y              N     N     Y          Y
DTX                       Y          N              N              N     N     Y          Y

The effects of codec bit-rate adaptation
Reducing the codec bit-rate always means lower quality.
The actual switching affects user-perceived quality:
– Codec transition effects (varying)
– The change in quality can be noticeable
Switching to a higher codec rate may not improve the user experience:
– Flapping between modes can be more annoying than a constant lower quality

Other codec developments
Audio encoding, rather than speech:
– Greater bit-rate span, 10-300 kbps
Variable frame rate depending on codec mode (AMR-WB+), which is problematic in RTP
Scalability is currently a hot topic:
– For audio, usually not speech
– MPEG has ongoing work
– A European Union research project assumes arbitrary truncation of packets

Effects of packetization
AMR codec bit-rate adaptation has less impact on total bandwidth than the choice of packetization.
Calculated using IP (20) + DCCP (12) + RTP (12) header bytes for each packet.
Not unexpected, considering that a speech frame including payload overhead is 13, 18 and 32 bytes (modes 4.75, 6.7 and 12.2 kbps).
Codec mode (kbps)  Frames per packet  Total (kbps)
4.75               3                  11.2
6.7                3                  13.2
4.75               2                  14.2
6.7                2                  16.2
12.2               3                  18.8
12.2               2                  21.8
4.75               1                  23.2
6.7                1                  25.2
12.2               1                  30.8
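The table can be reproduced from the per-frame sizes and the 44 bytes of IP + DCCP + RTP headers, provided one extra byte of payload header per packet is added (an assumption inferred from the numbers, consistent with a one-byte CMR field). A minimal sketch:

```python
HEADER_BYTES = 20 + 12 + 12          # IP + DCCP + RTP
PER_PACKET_PAYLOAD_HEADER = 1        # assumption: one extra payload-header byte per packet
FRAME_MS = 20
FRAME_BYTES = {4.75: 13, 6.7: 18, 12.2: 32}  # speech frame incl. payload overhead


def total_kbps(mode: float, frames_per_packet: int) -> float:
    """Total bandwidth in kbps (bits per ms) for an AMR mode and packetization."""
    packet_bytes = (HEADER_BYTES + PER_PACKET_PAYLOAD_HEADER
                    + frames_per_packet * FRAME_BYTES[mode])
    packet_ms = FRAME_MS * frames_per_packet
    return packet_bytes * 8 / packet_ms


rows = [(m, n) for m in FRAME_BYTES for n in (1, 2, 3)]
for mode, n in sorted(rows, key=lambda r: total_kbps(*r)):
    print(f"{mode:5} {n}  {total_kbps(mode, n):.1f}")
```

Sorting by total bandwidth gives the same row order as the table, making it visible that 4.75 kbps at 1 frame per packet costs more than 12.2 kbps at 3 frames per packet.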


Delay and Robustness Effects
Although it seems tempting to use 3 frames per packet to save bandwidth, it costs a lot of delay.
For optimal quality, the quality reduction from lower bit-rate modes needs to be traded off against the expected system delay.
For a system that already has a large delay: reduce the codec mode.
For a system with small delays: use more frames per packet, which can be done without much quality cost.
More frames per packet also reduces robustness, since each lost packet removes more speech.
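One way to express this trade-off is to enumerate AMR mode and packetization combinations, keep those that fit both the available bandwidth and a delay budget, and prefer the highest codec mode. A minimal sketch of a hypothetical selection policy, not one proposed in the presentation; it reuses the bandwidth model from the packetization table (including the assumed one-byte per-packet payload header) and simplifies the packetization delay to frames per packet times 20 ms.

```python
FRAME_MS = 20
HEADER_BYTES = 20 + 12 + 12 + 1      # IP + DCCP + RTP + assumed 1-byte payload header
FRAME_BYTES = {4.75: 13, 6.7: 18, 12.2: 32}


def total_kbps(mode, fpp):
    return (HEADER_BYTES + fpp * FRAME_BYTES[mode]) * 8 / (FRAME_MS * fpp)


def choose(available_kbps, base_delay_ms, budget_ms=200, max_fpp=3):
    """Pick (mode, frames_per_packet) within bandwidth and delay budget, or None.

    base_delay_ms: all system delay except packetization (hypothetical input).
    """
    options = []
    for mode in FRAME_BYTES:
        for fpp in range(1, max_fpp + 1):
            delay_ms = base_delay_ms + fpp * FRAME_MS
            if total_kbps(mode, fpp) <= available_kbps and delay_ms <= budget_ms:
                options.append((mode, fpp))
    if not options:
        return None
    # Highest codec mode first; then fewest frames per packet (less delay, more robust).
    return max(options, key=lambda o: (o[0], -o[1]))


print(choose(20, 100))  # (12.2, 3): small delay, so aggregate frames and keep the high mode
print(choose(20, 150))  # (6.7, 2): large delay, so the codec mode is reduced instead
```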

Questions for future studies
How hard is it to maintain periodic transmission with TFRC VoIP mode? Otherwise it will introduce extra jitter, which requires more receiver buffering.
What are the effects of DTX, as in the AMR case, where the packet rate drops to 1/8 of the rate during active speech?
