ITU-T G.729 EE8873 Rungsun Munkong March 22, 2004.


2 ITU-T G.729 EE8873 Rungsun Munkong March 22, 2004

3 Outline
- Introduction to ITU-T G.729
- Overall encoder and decoder
- Key components
  - Decoder
  - Perceptual Weighting
  - LP analysis
  - Adaptive codebook
  - Fixed codebook
- Speech Demo

4 Introduction
- Proposed 03/96 by the ITU: low bit rate, low complexity, toll quality (MOS 3.9), using CS-ACELP (Conjugate-Structure Algebraic-Code-Excited Linear Prediction).
- Input is band-limited speech, sampled at 8 kHz as 16-bit PCM (128 kbit/s). Since the output rate is 8 kbit/s, a compression ratio of 16:1 is achieved.
- The short-term synthesis filter is a 10th-order linear prediction (LP) filter, updated every 10 ms frame.
- The long-term (pitch) synthesis filter is implemented using the adaptive codebook.
- Low algorithmic delay: 10 ms current frame + 5 ms lookahead = 15 ms.
- Preferred over G.711 (64 kbit/s) and 32 kbit/s ADPCM coders because it uses bandwidth more efficiently.
- Used in VoIP, wireless communications, and digital satellite systems.
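The rate and delay figures on this slide follow from simple arithmetic. A minimal sketch that uses only the numbers quoted above (the constant names are illustrative, not taken from the Recommendation):

```python
# Rate and delay arithmetic for the figures quoted on the slide.
SAMPLE_RATE_HZ = 8000    # input sampling rate
BITS_PER_SAMPLE = 16     # linear PCM word length
CODEC_RATE_BPS = 8000    # G.729 output rate
FRAME_MS = 10            # frame length
LOOKAHEAD_MS = 5         # lookahead

input_rate_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE       # uncompressed PCM rate
compression_ratio = input_rate_bps / CODEC_RATE_BPS     # input rate / output rate
bits_per_frame = CODEC_RATE_BPS * FRAME_MS // 1000      # coded bits per frame
samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000   # speech samples per frame
algorithmic_delay_ms = FRAME_MS + LOOKAHEAD_MS          # frame + lookahead

print(input_rate_bps, compression_ratio, bits_per_frame, algorithmic_delay_ms)
```

Each 10 ms frame therefore carries 80 input samples and is represented by 80 coded bits, one bit per sample on average.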

5 Coding bit distribution
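The table for this slide did not survive in the transcript. As a rough cross-check, the sketch below adds up the per-frame bit counts that the later slides mention (LSP indices, pitch delays and parity, fixed codebook). The 7 bits per subframe for the gains are not stated in these slides; they are taken from the G.729 Recommendation and are labeled as such in the code.

```python
# Bits per 10 ms frame (two 5 ms subframes), grouped by parameter.
frame_bits = {
    "LSP L0 (MA predictor switch)": 1,
    "LSP L1 (first stage)": 7,
    "LSP L2 (second stage)": 5,
    "LSP L3 (second stage)": 5,
    "pitch delay P1 (subframe 1)": 8,
    "pitch parity P0": 1,
    "pitch delay P2 (subframe 2)": 5,
    "fixed codebook, 2 subframes x 17 bits": 34,
    # Not on the slides: gain bits as given in the Recommendation.
    "gains, 2 subframes x 7 bits": 14,
}

total_bits = sum(frame_bits.values())
frames_per_second = 100                  # one frame every 10 ms
bit_rate_bps = total_bits * frames_per_second

for name, bits in frame_bits.items():
    print(f"{name:40s} {bits:2d}")
print("total bits per frame:", total_bits, "-> bit rate:", bit_rate_bps)
```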

6 Encoder (analysis-by-synthesis)
[Block diagram. Blocks: LP analysis; Perceptual Weighting; Excitation Generator; Synthesis Filter; Excitation, Pitch & Gain Calculator; Parameter encoding; Decoder. Signals: input x[n], LP info, d[n], d'[n], reconstructed x̃[n], transmitted bitstream.]

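7 Synthesis Filter
- 10th-order polynomial.
- Synthesis filter: $\frac{1}{\hat{A}(z)} = \frac{1}{1 + \sum_{i=1}^{10} \hat{a}_i z^{-i}}$, where $\hat{a}_i$ are the quantized LP coefficients obtained from the quantized LSPs.
- 1st subframe: coefficients interpolated between the current and previous frames' values.
- 2nd subframe: current frame values.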
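The per-subframe coefficient handling and the all-pole filtering can be sketched as follows. This is a floating-point illustration, not the fixed-point reference code. It assumes the convention A(z) = 1 + sum of a_i z^-i and an equal-weight (midpoint) interpolation for the first subframe, since the slide only says "interpolated". The function names are hypothetical.

```python
def interpolate_lsp(prev_lsp, curr_lsp):
    """Return the LSP sets used in (subframe 1, subframe 2).

    Assumption: subframe 1 uses the midpoint of the previous and current
    frames' quantized LSPs; subframe 2 uses the current frame's values.
    """
    sub1 = [0.5 * p + 0.5 * c for p, c in zip(prev_lsp, curr_lsp)]
    sub2 = list(curr_lsp)
    return sub1, sub2


def synthesize(excitation, a, memory=None):
    """Filter `excitation` through 1/A(z), A(z) = 1 + sum_i a[i-1] z^-i.

    `a` holds a_1..a_M (10 values for G.729). `memory` holds the last M
    output samples, most recent first, so state carries across subframes.
    Returns (output samples, updated memory).
    """
    order = len(a)
    hist = list(memory) if memory is not None else [0.0] * order
    out = []
    for e in excitation:
        # s[n] = e[n] - sum_i a_i * s[n-i]
        s = e - sum(a[i] * hist[i] for i in range(order))
        out.append(s)
        hist = [s] + hist[:-1]
    return out, hist


# Usage: impulse response of a first-order example filter 1 / (1 - 0.5 z^-1).
response, state = synthesize([1.0, 0.0, 0.0], [-0.5])
print(response)
```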

8 Decoder
[Block diagram. Blocks: Parameter Decoder, Excitation Generator, Synthesis Filter, Post-processing. Signals: input bitstream, LP info, output speech.]

9 Decoder
- Codebook parameters are decoded by table lookup.
- LSP coefficients are interpolated and converted to LP coefficients for the 2 subframes.
- Excitation = sum of the adaptive and fixed codebook vectors, each multiplied by its respective gain, in each subframe.
- Speech = excitation passed through the vocal tract (synthesis) filter.
- Perceived quality is enhanced by adaptive postfiltering:
  - Spectral tilt
  - Formant sharpness
  - Long-term postfilter
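The excitation construction described on this slide is a gain-weighted sum of two vectors. A minimal sketch, with hypothetical names and toy-length vectors (a real subframe has 40 samples):

```python
def build_excitation(adaptive_vec, fixed_vec, pitch_gain, code_gain):
    """Return u[n] = pitch_gain * v[n] + code_gain * c[n] for one subframe.

    adaptive_vec: adaptive-codebook (pitch) vector v[n]
    fixed_vec:    fixed-codebook (algebraic) vector c[n]
    """
    if len(adaptive_vec) != len(fixed_vec):
        raise ValueError("both codebook vectors must cover the same subframe")
    return [pitch_gain * v + code_gain * c
            for v, c in zip(adaptive_vec, fixed_vec)]


# Usage with a two-sample toy subframe.
u = build_excitation([1.0, 2.0], [0.0, -1.0], pitch_gain=0.5, code_gain=2.0)
print(u)
```

The resulting excitation is what the decoder then passes through the synthesis filter.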

10 LP Analysis
[Block diagram: preprocessed input -> Windowing, Autocorrelation, Levinson-Durbin -> A(z) -> LSP -> LSP quantization -> LSP index (L0, L1, L2, L3).]
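The autocorrelation and Levinson-Durbin stages of this chain can be sketched as below. This is a textbook floating-point version under the convention A(z) = 1 + sum of a_i z^-i. It omits the windowing step and any conditioning of the autocorrelation values that the Recommendation applies, and the function names are illustrative.

```python
def autocorrelation(x, order):
    """Return r[0..order], with r[k] = sum_n x[n] * x[n-k]."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n))
            for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for A(z) = 1 + sum_i a[i] z^-i.

    Returns (a, err): a[0] == 1.0, a[1..order] are the LP coefficients,
    and err is the final prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


# Usage: autocorrelation of an ideal first-order process with pole at 0.5.
a, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
print(a, err)
```

For G.729 the order would be 10; the second-order call above only keeps the example easy to check by hand.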

11 Conversion of A(z) -> LSP
- The LSPs are the roots of two polynomials derived from A(z).
- The 5 unique roots of each polynomial are found by evaluating the polynomial at 60 equally spaced frequencies between 0 and π, then refining the estimate inside each interval where the sign changes.
- The difference between these roots and their 4th-order MA prediction is quantized.
- 2-stage VQ: (1) a 7-bit codebook L1; (2) a 10-bit split VQ consisting of a 5-bit codebook L2 and a 5-bit codebook L3.
- The 1-bit index L0 selects which MA predictor set is used.
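The grid search with sign-change refinement can be sketched generically. The sketch below is not the Recommendation's procedure: it takes an arbitrary function `f` instead of the actual LSP polynomials, uses a grid of 60 intervals on [0, π], and refines with four bisection steps plus one linear interpolation. The refinement count and the stand-in test function are assumptions for illustration.

```python
import math


def find_roots_on_grid(f, n_points=60, n_bisect=4):
    """Locate roots of f on [0, pi] from sign changes on a uniform grid."""
    step = math.pi / n_points
    roots = []
    lo, f_lo = 0.0, f(0.0)
    for k in range(1, n_points + 1):
        hi = k * step
        f_hi = f(hi)
        if f_lo * f_hi < 0.0:               # sign change inside [lo, hi]
            a, fa, b, fb = lo, f_lo, hi, f_hi
            for _ in range(n_bisect):       # narrow the bracket by bisection
                mid = 0.5 * (a + b)
                fm = f(mid)
                if fa * fm <= 0.0:
                    b, fb = mid, fm
                else:
                    a, fa = mid, fm
            # Final estimate by linear interpolation across the bracket.
            roots.append(a - fa * (b - a) / (fb - fa))
        lo, f_lo = hi, f_hi
    return roots


# Stand-in function with exactly five roots in (0, pi).
def stand_in(w):
    return math.cos(5.0 * w + 0.1)


roots = find_roots_on_grid(stand_in)
print(roots)
```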

12 Perceptual Weighting
[Figure: LP filter (vocal tract) response, original vs. weighted, shown relative to the unit circle. Two weighting cases are distinguished: flat and tilted.]
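The equations for this slide were lost in the transcript. The sketch below assumes the weighting filter form commonly used in CELP coders, W(z) = A(z/γ1) / A(z/γ2), where A(z/γ) is obtained by scaling the i-th LP coefficient by γ^i. That form and the γ values in the usage lines are assumptions for illustration, not values recovered from the slide.

```python
def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma), given a = [1, a_1, ..., a_M]."""
    return [c * gamma ** i for i, c in enumerate(a)]


def weighting_filter(signal, a, gamma1, gamma2):
    """Apply W(z) = A(z/gamma1) / A(z/gamma2) to `signal` (zero initial state)."""
    num = bandwidth_expand(a, gamma1)
    den = bandwidth_expand(a, gamma2)
    order = len(a) - 1
    x_hist = [0.0] * order      # past inputs, most recent first
    y_hist = [0.0] * order      # past outputs, most recent first
    out = []
    for x in signal:
        y = num[0] * x
        y += sum(num[i] * x_hist[i - 1] for i in range(1, order + 1))
        y -= sum(den[i] * y_hist[i - 1] for i in range(1, order + 1))
        out.append(y)
        x_hist = [x] + x_hist[:-1]
        y_hist = [y] + y_hist[:-1]
    return out


# Usage: first-order toy filter; the gamma values are placeholders.
weighted = weighting_filter([1.0, 0.0, 0.0], [1.0, -0.9], gamma1=1.0, gamma2=0.5)
print(weighted)
```

Scaling the coefficients by γ^i pulls the filter's roots toward the origin, which is the "original vs. weighted" relation to the unit circle that the figure refers to.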

13 Adaptive Codebook
- Determines the pitch delay and pitch gain (the periodic portion of the excitation).
- An open-loop candidate delay T_op is selected as the delay giving the highest correlation in a frame of perceptually weighted speech.
- 1st subframe: T1 is found by searching within 3 samples of T_op (resolution 1/3 in the range 19 1/3 to 84 2/3, integer resolution in the range 85 to 143), then encoded into 8 bits (index P1).
- 2nd subframe: T2 is found by searching the range [int(T1) - 5 2/3, int(T1) + 4 2/3] with resolution 1/3, then encoded into 5 bits.
- Parity bit P0 is computed in the 1st subframe as the XOR of the 6 MSBs of P1, as bit-error protection.
- Gain = normalized correlation of the reconstructed signal and the pitch-shifted reconstructed signal.
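Two of the steps above lend themselves to a short illustration: the open-loop delay estimate and the parity bit. The sketch below is simplified. It picks the single lag with the highest raw correlation over an 80-sample frame and does not include any lag weighting or normalization that a real encoder may apply. The parity function follows the slide's wording (plain XOR of the six MSBs); the bit polarity used by an actual implementation may differ. All names are hypothetical.

```python
def open_loop_delay(sw, frame_len=80, min_lag=20, max_lag=143):
    """Return the lag in [min_lag, max_lag] with the highest correlation.

    sw: weighted speech, i.e. at least `max_lag` history samples followed
        by the current frame (the last `frame_len` samples).
    """
    start = len(sw) - frame_len
    if start < max_lag:
        raise ValueError("need at least max_lag samples of history")

    def corr(lag):
        return sum(sw[n] * sw[n - lag] for n in range(start, len(sw)))

    # max() keeps the first (smallest) lag when several lags tie.
    return max(range(min_lag, max_lag + 1), key=corr)


def parity_bit(p1):
    """XOR of the six most significant bits of the 8-bit delay index P1."""
    bits = (p1 >> 2) & 0x3F
    parity = 0
    while bits:
        parity ^= bits & 1
        bits >>= 1
    return parity


# Usage: an integer sawtooth with a period of 57 samples.
signal = [(n % 57) - 28 for n in range(143 + 80)]
t_op = open_loop_delay(signal)
print(t_op, parity_bit(0b10000011))
```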

14 Fixed Codebook
- Algebraic codebook structure.
- Each vector contains 4 nonzero pulses. The possible values are:

  Pulse   Sign           Positions
  i0      s0 = -1, +1    m0 = 0, 5, 10, 15, 20, 25, 30, 35
  i1      s1 = -1, +1    m1 = 1, 6, 11, 16, 21, 26, 31, 36
  i2      s2 = -1, +1    m2 = 2, 7, 12, 17, 22, 27, 32, 37
  i3      s3 = -1, +1    m3 = 3, 8, 13, 18, 23, 28, 33, 38, 4, 9, 14, 19, 24, 29, 34, 39

- The codebook vector c(n) is 40-dimensional, with four unit pulses at the selected locations with the corresponding signs.
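The pulse table maps directly onto a small data structure. A minimal sketch (names are illustrative) that builds a 40-sample codevector from four positions and signs, and counts the bits needed to index them:

```python
SUBFRAME_LEN = 40

# Allowed positions for pulses i0..i3, as listed in the table.
TRACKS = [
    list(range(0, 40, 5)),                           # i0: 0, 5, ..., 35
    list(range(1, 40, 5)),                           # i1: 1, 6, ..., 36
    list(range(2, 40, 5)),                           # i2: 2, 7, ..., 37
    list(range(3, 40, 5)) + list(range(4, 40, 5)),   # i3: 3, ..., 38, 4, ..., 39
]


def build_codevector(positions, signs):
    """Return c(n): four unit pulses at `positions` with the given `signs`."""
    if len(positions) != len(TRACKS) or len(signs) != len(TRACKS):
        raise ValueError("expected one position and one sign per pulse")
    c = [0] * SUBFRAME_LEN
    for track, pos, sign in zip(TRACKS, positions, signs):
        if pos not in track:
            raise ValueError(f"position {pos} is not allowed for this pulse")
        if sign not in (-1, 1):
            raise ValueError("sign must be -1 or +1")
        c[pos] = sign   # the tracks are disjoint, so pulses never collide
    return c


# Bits to index the positions (8, 8, 8, 16 choices) plus one sign bit each.
position_bits = sum((len(t) - 1).bit_length() for t in TRACKS)
sign_bits = len(TRACKS)

c = build_codevector([5, 11, 37, 4], [1, -1, 1, -1])
print(c, position_bits + sign_bits)
```

The position and sign bits add up to 17 per subframe.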

15 Fixed Codebook
- Encodes the random portion of the excitation signal.
- The periodic portion of the weighted residual is removed first, so only the random portion remains to be coded by the fixed codebook.
- The codebook search minimizes the error between the perceptually weighted input speech and the reconstructed speech.
- For each subframe, the signs and positions of the 4 nonzero pulses are computed and encoded into 17 bits.

16 Speech Demo
[Audio clips: female speech, original and G.729 decoded; male speech, original and G.729 decoded.]

17 G.729 Additions
- Annex A (11/96): uses about half the CPU power, with a minimal reduction in perceived quality (MOS 3.7).
- Annex B (10/96): adds discontinuous transmission (DTX), voice activity detection (VAD), background noise modeling, comfort noise generation (CNG), and silence frame insertion.
- Annexes D, E (09/98): 6.4 kbit/s and 11.8 kbit/s CS-ACELP speech coding algorithms.
- Annexes F, G, H, I (1998-2001): enhance the capabilities of the previous annexes (e.g. DTX/VAD/CNG) and integrate the codecs of different bit rates.

18 References
1. ITU-T G.729 official recommendation, available at: http://www.itu.int/rec/recommendation.asp?type=folders&lang=e&parent=T-REC-G.729
2. Andreas S. Spanias, "Speech Coding: A Tutorial Review", Proc. of the IEEE, Vol. 82, No. 10, pp. 1541-1582, October 1994.
3. GAO Research Inc., G.729 description and demo: http://www.gaoresearch.com/products/speechsoftware/other/g729.php
4. Jade Clayton, Privateline article on G.729 and other standards for VoIP: http://www.privateline.com/clayton/clayton2.htm

