
Speech Coding Techniques. Introduction Efficient speech-coding techniques Advantages for VoIP Digital streams of ones and zeros The lower the bandwidth,




1 Speech Coding Techniques

2 Introduction Efficient speech-coding techniques Advantages for VoIP Digital streams of ones and zeros The lower the bandwidth, the lower the quality RTP payload types Processing power The better quality (for a given bandwidth) uses a more complex algorithm A balance between quality and cost

3 Voice Quality Bandwidth is easily quantified Voice quality is subjective MOS, Mean Opinion Score ITU-T Recommendation P.800 Excellent – 5 Good – 4 Fair – 3 Poor – 2 Bad – 1 A minimum of 30 people Listen to voice samples or in conversations
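As a rough illustration of how the five-point scale above turns into a single score, the sketch below averages listener ratings. The function name and the sample ratings are hypothetical, and P.800 specifies far more (participant selection, environment, analysis) than a plain mean.

```python
from statistics import mean

# Rating scale from ITU-T P.800 as listed on the slide
SCALE = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Bad"}

def mean_opinion_score(ratings):
    """Average a list of integer listener ratings on the 1..5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r not in SCALE for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    return mean(ratings)

# Hypothetical ratings, for illustration only (a real test needs 30+ listeners)
ratings = [4, 5, 4, 3, 4, 4, 5, 4]
print(mean_opinion_score(ratings))
```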

4 P.800 recommendations The selection of participants The test environment Explanations to listeners Analysis of results Toll quality A MOS of 4.0 or higher

5 About Speech Speech Air pushed from the lungs past the vocal cords and along the vocal tract The basic vibrations – vocal cords The sound is altered by the disposition of the vocal tract ( tongue and mouth) Model the vocal tract as a filter The shape changes relatively slowly The vibrations at the vocal cords The excitation signal

6 Voiced sound The vocal cords vibrate open and close Quasi-periodic pulses of air The rate of the opening and closing – the pitch Unvoiced sounds Forcing air at high velocities through a constriction Noise-like turbulence Show little long-term periodicity Short-term correlations still present Plosive sounds A complete closure in the vocal tract Air pressure is built up and released suddenly Speech sounds

7 Voice Sampling Discrete Time LTI Systems: The Convolution Sum [Figure: plots of h[n], x[n], and y[n] against n]

8  Nyquist sampling theorem
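A small sketch of what the sampling theorem implies for telephony, assuming the 8 kHz sampling rate used by the codecs later in this deck. The helper names are hypothetical; the folding formula applies to a single real-valued tone.

```python
def nyquist_rate(max_freq_hz):
    """The sampling rate must exceed twice the highest frequency present."""
    return 2 * max_freq_hz

def alias_frequency(f_hz, fs_hz):
    """Apparent frequency of a real tone at f_hz after sampling at fs_hz."""
    return abs(f_hz - fs_hz * round(f_hz / fs_hz))

# A 3.4 kHz tone survives 8 kHz sampling; a 5 kHz tone folds back to 3 kHz.
print(alias_frequency(3400, 8000))
print(alias_frequency(5000, 8000))
```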

9 Quantization (Scalar Quantization)
Assume $|x[n]| \le A$ and divide the range $[-A, A]$ into $L = 2^R$ quantization levels $\{J_1, J_2, \ldots, J_k, \ldots, J_L\}$, where $J_k = [m_{k-1}, m_k]$, $m_0 = -A$, and $m_L = A$.
Each quantization level $J_k$ is represented by a value $v_k$: $S = \bigcup_k J_k$, $V = \{v_1, v_2, \ldots, v_k, \ldots, v_L\}$.
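The scheme above can be sketched for the uniform case, where all intervals have equal width and each representative value is taken to be the interval midpoint. Both choices are assumptions made for illustration, and the index returned here is 0-based, whereas the slide numbers the levels from 1.

```python
def uniform_quantize(x, A=1.0, R=3):
    """Map x in [-A, A] to (k, v_k) using L = 2**R equal-width intervals."""
    L = 2 ** R
    step = 2 * A / L                      # width of each interval J_k
    x = max(-A, min(A, x))                # clip so that |x| <= A holds
    k = min(int((x + A) / step), L - 1)   # 0-based interval index
    v = -A + (k + 0.5) * step             # midpoint used as representative
    return k, v

print(uniform_quantize(0.0))   # falls in the interval just above zero
print(uniform_quantize(1.0))   # top of the range maps to the last interval
```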

10 Non-Uniform Quantization
Boundaries: $m_0 = -A, m_1, m_2, \ldots, 0, \ldots, m_L = A$
Concept: small quantization levels for small x, large quantization levels for large x
Goal: constant $\mathrm{SNR}_Q$ for all x

11 Companding
Encoder: x[n] → Compressor F(x) → Uniform Quantization → …1101…
Decoder: …1101… → Uniform Decoder → Expandor $F^{-1}(x)$ → $\hat{x}[n]$
Compressor + Expandor = Compandor
F(x) specifies the non-uniform quantization characteristics

12 Non-Uniform Quantization: μ-law and A-law. Typical values in practice: μ = 255, A = 87.6
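The slide's formulas did not survive extraction, so the sketch below uses the standard continuous μ-law and A-law compressor curves for inputs normalized to [-1, 1]. The function names are hypothetical, and this is not the segmented 8-bit encoding that G.711 actually specifies.

```python
import math

def mu_law_compress(x, mu=255.0):
    """F(x) = sgn(x) * ln(1 + mu*|x|) / ln(1 + mu), for |x| <= 1."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=255.0):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

def a_law_compress(x, A=87.6):
    """Linear below |x| = 1/A, logarithmic above it."""
    ax = abs(x)
    denom = 1.0 + math.log(A)
    if ax < 1.0 / A:
        y = A * ax / denom
    else:
        y = (1.0 + math.log(A * ax)) / denom
    return math.copysign(y, x)

# Small inputs are boosted before uniform quantization
for x in (0.01, 0.1, 1.0):
    print(x, mu_law_compress(x), a_law_compress(x))
```

Applying a uniform quantizer to the compressed value and then expanding gives finer effective steps near zero, which is the "small levels for small x" idea from the previous slides.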

13 Types of Speech Codecs Waveform codecs, source codecs (also known as vocoders), and hybrid codecs.

14 Speech Source Model and Source Coding
Excitation: a periodic pulse train generator (voiced, pitch N) or a random sequence generator (unvoiced), selected by the v/u switch and scaled by the gain G, produces the excitation signal u[n]
Vocal Tract Model: u[n] drives the filter $G(z) = \dfrac{1}{1 - \sum_{k=1}^{P} a_k z^{-k}}$ (also written $G(\omega)$, $g[n]$) to produce x[n]
Excitation parameters: v/u (voiced/unvoiced), N (pitch for voiced), G (signal gain) → excitation signal u[n]
Vocal tract parameters: $\{a_k\}$, the LPC coefficients → formant structure of speech signals
A good approximation, though not precise enough

15 LPC Vocoder (Voice Coder)
Encoder: x[n] → LPC Analysis → $\{a_k\}$, N, G, v/u → …11011… (N by pitch detection, v/u by voicing detection)
Decoder (receiver): …11011… → $\{a_k\}$, N, G, v/u → excitation (Ex) → G(z), g[n] → x[n]
$\{a_k\}$ can be non-uniform or vector quantized to reduce the bit rate further
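The decoder side of the vocoder can be sketched as an all-pole filter driven by either a pulse train or noise. This is a minimal illustration assuming the sign convention $G(z) = 1/(1 - \sum_k a_k z^{-k})$; the function names, coefficient values, and noise range are hypothetical.

```python
import random

def pulse_train(length, pitch_period):
    """Voiced excitation: one unit pulse every pitch_period samples."""
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(length)]

def noise(length, seed=0):
    """Unvoiced excitation: a pseudo-random sequence in [-1, 1]."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(length)]

def lpc_synthesize(a, gain, excitation):
    """x[n] = sum_{k=1..P} a[k-1] * x[n-k] + gain * u[n]."""
    x = []
    for n, u in enumerate(excitation):
        acc = gain * u
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * x[n - k]
        x.append(acc)
    return x

# Hypothetical parameters: voiced frame, pitch period N = 40 samples
voiced = lpc_synthesize([0.5, -0.2], gain=1.0, excitation=pulse_train(80, 40))
unvoiced = lpc_synthesize([0.5, -0.2], gain=0.3, excitation=noise(80))
print(voiced[:5])
```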

16 G.711 The most commonplace codec Used in the circuit-switched telephone network PCM, Pulse-Code Modulation With uniform quantization: 12 bits * 8 k/sec = 96 kbps With non-uniform quantization: 8 bits * 8 k/sec = 64 kbps, the DS0 rate μ-law: North America A-law: other countries, a little friendlier to lower signal levels An MOS of about 4.3
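The bit-rate arithmetic on this slide is bits per sample times samples per second. The sketch below assumes the 8 bits per sample that G.711 uses after companding, which gives the 64 kbps DS0 rate; the helper name is hypothetical.

```python
def pcm_bit_rate_kbps(bits_per_sample, sample_rate_hz=8000):
    """Bit rate of a PCM stream in kbps."""
    return bits_per_sample * sample_rate_hz / 1000

print(pcm_bit_rate_kbps(12))  # uniform quantization
print(pcm_bit_rate_kbps(8))   # companded G.711
```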

17 ADPCM (Adaptive Differential PCM)
DPCM and ADPCM. ADPCM: adaptive prediction in DPCM, plus adaptive quantization
Adaptive quantization: the quantization level $\Delta$ varies with the local signal level, $\Delta[n] = a\,\sigma_x[n]$, where $\sigma_x[n]$ is the locally estimated standard deviation of x[n]
G.721: ADPCM-coded speech at 32 kbps
G.726 (A-law or μ-law): 16, 24, 32, 40 kbps. MOS 4.0 at 32 kbps
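The adaptive step-size rule, step = a times the local standard deviation, can be sketched with a sliding-window estimate. The window length, the constant `a`, and the function name are assumptions for illustration; G.726 itself adapts from past quantized values rather than from the input directly.

```python
from statistics import pstdev

def adaptive_step_sizes(x, a=0.5, window=8, min_step=1e-6):
    """Return one step size per sample: a * (std. dev. of the last `window` samples)."""
    steps = []
    for n in range(len(x)):
        recent = x[max(0, n - window + 1): n + 1]
        sigma = pstdev(recent) if len(recent) > 1 else 0.0
        steps.append(max(a * sigma, min_step))  # floor avoids a zero step
    return steps

# Quiet passage followed by a loud one: the step size grows with the signal
signal = [0.01, -0.01] * 8 + [1.0, -1.0] * 8
steps = adaptive_step_sizes(signal)
print(steps[15], steps[-1])
```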

18 Analysis-by-Synthesis (AbS) Codecs Hybrid codec Fill the gap between waveform and source codecs The most successful and commonly used Time-domain AbS codecs Not a simple two-state, voiced/unvoiced Different excitation signals are attempted Closest to the original waveform is selected MPE, Multi-Pulse Excited RPE, Regular-Pulse Excited CELP, Code-Excited Linear Predictive

19 G.728 LD-CELP CELP codecs A filter; its characteristics change over time A codebook of acoustic vectors A vector = a set of elements representing various characteristics of the excitation Transmit: filter coefficients, gain, a pointer to the vector chosen Low-Delay CELP Backward-adaptive coder Use previous samples to determine filter coefficients Operates on five samples at a time Delay < 1 ms Only the pointer is transmitted

20 1024 vectors in the code book 10-bit pointer (index) 16 kbps LD-CELP encoder Minimize a frequency-weighted mean-square error
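The analysis-by-synthesis idea, trying every codebook vector and keeping the one whose synthesized output is closest to the target, can be sketched as below. This toy version uses a plain squared error and a three-entry codebook, whereas LD-CELP minimizes a frequency-weighted error over 1024 vectors; all names and values are hypothetical.

```python
def synthesize(a, excitation):
    """All-pole synthesis: out[n] = sum_k a[k-1] * out[n-k] + excitation[n]."""
    out = []
    for n, u in enumerate(excitation):
        acc = u
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * out[n - k]
        out.append(acc)
    return out

def search_codebook(target, codebook, a):
    """Return (index, error) of the vector whose synthesis best matches target."""
    best_index, best_err = -1, float("inf")
    for i, vec in enumerate(codebook):
        cand = synthesize(a, vec)
        err = sum((t - c) ** 2 for t, c in zip(target, cand))
        if err < best_err:
            best_index, best_err = i, err
    return best_index, best_err

# Five-sample vectors, as in LD-CELP; the contents are made up
codebook = [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [1, 1, 0, 0, 0]]
target = synthesize([0.5], codebook[2])
print(search_codebook(target, codebook, [0.5]))

# One 10-bit index per five samples at 8000 samples/s
bits_per_second = 10 * 8000 // 5
print(bits_per_second)
```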

21 LD-CELP decoder An MOS score of about 3.9 One-quarter of G.711 bandwidth

22 G.723.1 ACELP 6.3 or 5.3 kbps Both mandatory Can change from one to another during a conversation The coder A band-limited input speech signal Sampled at 8 kHz, 16-bit uniform PCM quantization Operates on blocks of 240 samples at a time A look-ahead of 7.5 ms A total algorithmic delay of 37.5 ms + other delays A high-pass filter to remove any DC component
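The 37.5 ms figure follows from the frame length plus the look-ahead: 240 samples at 8 kHz is 30 ms, and 30 + 7.5 = 37.5. A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
def algorithmic_delay_ms(frame_samples, lookahead_ms, sample_rate_hz=8000):
    """Frame duration plus look-ahead, in milliseconds."""
    frame_ms = frame_samples * 1000 / sample_rate_hz
    return frame_ms + lookahead_ms

print(algorithmic_delay_ms(240, 7.5))  # G.723.1 figures from the slide
```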

23 G.723.1 Annex A Silence Insertion Description (SID) frames of size four octets
The two lsbs of the first octet indicate the frame type:
00: 6.3 kbps, 24 octets/frame
01: 5.3 kbps, 20 octets/frame
10: SID frame, 4 octets/frame
An MOS of about 3.8 At least 37.5 ms delay
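A receiver can tell the three frame types apart by masking the two least significant bits of the first octet. The sketch below covers only the values listed on the slide; the names are hypothetical, and any other bit pattern is rejected rather than guessed at.

```python
# Two lsbs of the first octet -> (frame type, frame size in octets)
FRAME_TYPES = {
    0b00: ("6.3 kbps", 24),
    0b01: ("5.3 kbps", 20),
    0b10: ("SID", 4),
}

def classify_frame(first_octet):
    """Return (type, size_in_octets) for a G.723.1 frame's first octet."""
    bits = first_octet & 0b11
    if bits not in FRAME_TYPES:
        raise ValueError("frame type not covered by this sketch")
    return FRAME_TYPES[bits]

print(classify_frame(0xFC))  # lsbs are 00
print(classify_frame(0x02))  # lsbs are 10
```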

24 G.729 8 kbps Input frames of 10 ms, 80 samples for 8 kHz sampling rate 5 ms look-ahead Algorithmic delay of 15 ms An 80-bit frame for 10 ms of speech A complex codec G.729A (Annex A), a number of simplifications Same frame structure Encoder/decoder interoperable between G.729 and G.729A Slightly lower quality
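The 8 kbps rate follows directly from the frame structure: 80 bits every 10 ms. A minimal check of that arithmetic, with a hypothetical helper name:

```python
def frame_bit_rate_bps(bits_per_frame, frame_ms):
    """Bit rate implied by a fixed-size frame sent every frame_ms milliseconds."""
    return bits_per_frame * 1000 / frame_ms

print(frame_bit_rate_bps(80, 10))  # G.729 frame from the slide
```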

25 G.729B VAD, Voice Activity Detection Based on analysis of several parameters of the input The current frame plus two preceding frames DTX, Discontinuous Transmission Send nothing or send an SID frame SID frame contains information to generate comfort noise CNG, Comfort Noise Generation G.729, an MOS of about 4.0 G.729A, an MOS of about 3.7
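The "send nothing or send an SID frame" choice can be sketched as a per-frame decision driven by the VAD output. This is a deliberately simplified policy, assumed for illustration: one SID frame at the start of each silence period and nothing afterwards. The real G.729B decision logic is more involved, and the VAD flags here are simply given as input.

```python
def dtx_schedule(vad_flags):
    """Map per-frame VAD decisions to 'speech', 'sid', or 'none'."""
    actions = []
    prev_active = True  # treat the stream as active before the first frame
    for active in vad_flags:
        if active:
            actions.append("speech")  # encode and send a normal frame
        elif prev_active:
            actions.append("sid")     # silence begins: describe the noise
        else:
            actions.append("none")    # silence continues: send nothing
        prev_active = active
    return actions

print(dtx_schedule([True, False, False, True, False]))
```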

26 Other Codecs CDMA QCELP defined in IS-733 Variable-rate coder Two most common rates The high rate, 13.3 kbps A lower rate, 6.2 kbps Silence suppression For use with RTP, RFC 2658

27 GSM Enhanced Full-Rate (EFR) GSM 06.60 An enhanced version of GSM Full-Rate ACELP-based codec The same bit rate and the same overall packing structure 12.2 kbps Support discontinuous transmission For use with RTP, RFC 1890

28 GSM Adaptive Multi-Rate (AMR) codec GSM 06.90 Eight different modes 4.75 kbps to 12.2 kbps 12.2 kbps, GSM EFR 7.4 kbps, IS-641 (TDMA cellular systems) Change the mode at any time Offer discontinuous transmission The coding choice of many 3G wireless networks

29 The MOS values are for laboratory conditions G.711 does not deal with lost packets G.729 can accommodate a lost frame by interpolating from previous frames, but this causes errors in subsequent speech frames Processing power: G.728 or G.729, 40 MIPS; G.726, 10 MIPS

30 Cascaded Codecs E.g., G.711 stream -> G.729 encoder/decoder The resulting quality might not even come close to that of G.729 Each coder only generates an approximation of the incoming signal

31 Tones, Signal, and DTMF Digits The hybrid codecs are optimized for human speech Other data may need to be transmitted Tones: fax tones, dial tone, busy tone DTMF digits for two-stage dialing or voice-mail G.711 is OK G.723.1 and G.729 can be unintelligible The ingress gateway needs to intercept the tones and DTMF digits Use an external signaling system

32 Easy at the start of a call Difficult in the middle of a call Encode the tones differently from the speech Send them along the same media path An RTP packet provides the name of the tone and the duration Or, a dynamic RTP profile; an RTP packet containing the frequency, volume and the duration RFC 2198 An RTP payload format for redundant audio data Sending both types of RTP payload

33 RTP Payload Format for DTMF Digits An Internet Draft Both methods described before A large number of tones and events DTMF digits, a busy tone, a congestion tone, a ringing tone, etc. The named events E: the end of the tone, R: reserved

34 Payload format
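The figure for this slide is missing from the transcript. As a sketch, the code below packs a named event assuming the four-octet layout that was later published as RFC 2833: an 8-bit event code, the E (end) bit, the R (reserved) bit, a 6-bit volume, and a 16-bit duration. The function names are hypothetical.

```python
import struct

def pack_tone_event(event, end, volume, duration):
    """Build the 4-octet named-event payload (R bit left at zero)."""
    if not 0 <= event <= 255:
        raise ValueError("event must fit in 8 bits")
    if not 0 <= volume <= 63:
        raise ValueError("volume must fit in 6 bits")
    if not 0 <= duration <= 0xFFFF:
        raise ValueError("duration must fit in 16 bits")
    flags = (0x80 if end else 0x00) | volume
    return struct.pack("!BBH", event, flags, duration)  # network byte order

def unpack_tone_event(payload):
    """Return (event, end, volume, duration) from a 4-octet payload."""
    event, flags, duration = struct.unpack("!BBH", payload)
    return event, bool(flags & 0x80), flags & 0x3F, duration

# DTMF digit "1", end of tone, volume 10, duration 800 timestamp units
payload = pack_tone_event(1, True, 10, 800)
print(payload.hex())
print(unpack_tone_event(payload))
```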

35 Finis

36 Discrete Time LTI Systems: The Convolution Sum [Figure: plots of h[n], x[n], and y[n] against n]

37 Frequency-Domain Representation of Sampling

38 Speech Source Model and Source Coding Vocal Tract Model

