Speech Coding, CELP Coders: Implementation using C54x
Outline
- Speech Coding, CELP Coders
- Implementation using C54x
Outline: Speech Coding
- Generalities on speech and coding
- Linear-prediction-based coders
  - Short-term and long-term prediction
  - Vector quantization
- CELP coders
  - Structure and calculations
  - Standards
Applications of Speech Coding
- Digital transmission
  - On wired telephone networks: multiplexing, integration of services
  - On wireless channels: spectral efficiency, better protection against errors
- Voice mail / messaging
- Storage: telephone answering machines
- Secure telephones
Characteristics of Coders
- Bit rate D: 50 bps < D < 96 kbps
- Coding delay: of the order of the frame duration
- Quality
  - Objective measurements: SNR, PSQM
  - Subjective measurements: MOS (excellent, good, fair, poor, unacceptable)
- Intelligibility: objective measure (STI) or subjective test (DRT)
- Acceptability: E-model (ETSI standard), communicability
- Immunity to noise
- Complexity
SNR = Signal-to-Noise Ratio; PSQM = Perceptual Speech Quality Measure; MOS = Mean Opinion Score, a grade between 1 and 5; STI = Speech Transmission Index; DRT = Diagnostic Rhyme Test.
Objective Evaluation of Quality
The PSQM method:
- Objective evaluation based on a model of auditory perception
- Takes masking effects into account
- Good correlation with the MOS grade in "basic" conditions: low-bit-rate speech coding, tandeming, transmission errors, ...
- But sometimes not very reliable: frame loss, effect of automatic gain control
- Still under development (PSQM+)
PSQM = Perceptual Speech Quality Measure
Subjective Evaluation of Quality: ACR Method Yielding the MOS
- A large number of listeners grade a large number of speech sequences.
- Database of phonetically balanced sentences
- Presentation in random order
- Naive listeners
- Statistical processing of the results gives the MOS.
MOS = Mean Opinion Score; ACR = Absolute Category Rating.
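Simplified Speech Production Model
The speech signal y is the excitation e filtered by the vocal-tract filter h: $y(t) = h(t) * e(t)$, where * denotes convolution, or equivalently $Y(z) = H(z)E(z)$.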
All-Pole Model of the Spectrum-Shaping Filter
Since the excitation has a white (flat) spectrum, the filter H(z) represents the spectral envelope of the speech.
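Short-Term Linear Prediction
The coefficients of $H(z) = 1/A(z)$ can be obtained by linear prediction.
- Short-term analysis of the speech signal x(n) on frames of 10 to 30 ms.
- x(n) is approximated by a linear combination of the past samples x(n-i): $\hat{x}(n) = \sum_{i=1}^{p} a_i x(n-i)$, so that $A(z) = 1 - \sum_{i=1}^{p} a_i z^{-i}$.
- Least-squares error criterion: the $a_i$ minimize $\sum_n e^2(n)$, with $e(n) = x(n) - \sum_{i=1}^{p} a_i x(n-i)$.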
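Determination of the Spectral Envelope by Linear Prediction
The prediction error e(n), called the residual, is nearly white, so the spectral envelope of x(n) can be approximated by
$S_x(f) = \frac{\sigma_e^2}{\left|A(e^{j2\pi f T_e})\right|^2}$
where $\sigma_e^2$ is the power of the residual and $T_e$ the sampling period.
White noise is a random signal with a flat power spectrum, such as the hiss obtained when a radio tuner is off-station.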
Calculation of the Prediction Coefficients
The prediction coefficients $a_i$ are the solution of the "normal equations":
$\sum_{i=1}^{p} a_i R(|k-i|) = R(k), \quad k = 1, \dots, p$
where R(k) is the autocorrelation of x(n). The Levinson-Durbin algorithm is often used to solve these equations.
References for the Levinson-Durbin algorithm:
- N. Levinson, "The Wiener RMS (Root Mean Square) error criterion in filter design and prediction", J. Math. Phys., 25:261–278, 1945.
- J. Durbin, "The fitting of time series models", Rev. Int. Inst. Statist., 28, 1960.
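A sketch of the Levinson-Durbin recursion in C, which solves the normal equations in O(p²) operations by increasing the order one step at a time. It assumes the same predictor sign convention, $e(n) = x(n) - \sum a_i x(n-i)$, returns the reflection coefficients $k_m$ as a by-product, and bounds p by a hypothetical `LPC_MAX_ORDER`.

```c
#define LPC_MAX_ORDER 32

/* Levinson-Durbin recursion for sum_{i=1}^{p} a_i R(|k-i|) = R(k), k = 1..p.
   In: r[0..p]. Out: a[1..p] predictor coefficients, k[1..p] reflection
   coefficients (a_m = k_m at order m). Returns the final prediction-error
   energy, or -1.0 on failure (bad p, r[0] <= 0, or |k_m| >= 1). */
double levinson_durbin(const double *r, int p, double *a, double *k)
{
    double prev[LPC_MAX_ORDER + 1];
    double err = r[0];
    if (p < 1 || p > LPC_MAX_ORDER || err <= 0.0)
        return -1.0;
    for (int m = 1; m <= p; m++) {
        /* k_m = (R(m) - sum_{i<m} a_i R(m-i)) / E_{m-1} */
        double acc = r[m];
        for (int i = 1; i < m; i++)
            acc -= a[i] * r[m - i];
        double km = acc / err;
        k[m] = km;
        /* a_i <- a_i - k_m a_{m-i}, using the order m-1 coefficients */
        for (int i = 1; i < m; i++)
            prev[i] = a[i];
        for (int i = 1; i < m; i++)
            a[i] = prev[i] - km * prev[m - i];
        a[m] = km;
        err *= 1.0 - km * km;   /* E_m = (1 - k_m^2) E_{m-1} */
        if (err <= 0.0)
            return -1.0;
    }
    return err;
}
```

For r = {1, 0.5, 0.25}, the autocorrelation of a first-order process, it returns a1 = 0.5, a2 = 0 and an error energy of 0.75.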
Example of Linear Prediction
[Figure: amplitude of the speech signal and amplitude of the residual signal]
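The residual in the figure is the output of the inverse filter A(z). A minimal sketch, under the same predictor convention, that keeps p samples of history between frames so that frame boundaries do not create transients; the history layout and the name `lpc_residual` are assumptions of this example.

```c
#include <stddef.h>

/* Inverse filtering by A(z) = 1 - sum_{i=1}^{p} a_i z^-i:
   e(n) = x(n) - sum a_i x(n-i) for the n samples of one frame (a[0] unused).
   hist[0..p-1] = x(-1), x(-2), ..., x(-p) from the previous frame;
   it is updated on return so the next frame can be processed. */
void lpc_residual(const double *a, int p, const double *x, double *e,
                  size_t n, double *hist)
{
    for (size_t t = 0; t < n; t++) {
        double pred = 0.0;
        for (int i = 1; i <= p; i++) {
            size_t si = (size_t)i;
            /* x(t - i) comes from this frame or from the stored history */
            double past = (t >= si) ? x[t - si] : hist[si - t - 1];
            pred += a[i] * past;
        }
        e[t] = x[t] - pred;
    }
    /* new history: hist[j] = x(n-1-j), or older history when n <= j */
    for (int j = p - 1; j >= 0; j--) {
        size_t sj = (size_t)j;
        hist[j] = (sj < n) ? x[n - 1 - sj] : hist[sj - n];
    }
}
```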
Example of Linear Prediction: Spectral Envelope Estimation
[Figure: LP spectral envelope and formants]
The frequencies of the maxima of the power spectral density are called formants. They correspond to resonances of the vocal tract.
Estimation of the Pitch Period
- The pitch period T0 is estimated by correlation of the speech signal or of the residual.
- Other methods exist (e.g. cepstrum).
- F0 = fundamental frequency = 1/T0
- Fractional pitch estimation: the pitch period is estimated with a precision finer than the sampling period.
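Long-Term Prediction (LTP)
The idea is to predict one period of the signal from the preceding one: $\hat{x}(n) = b\,x(n-M)$.
- 2 unknowns: b and M.
- M is the pitch period (when the speech is voiced).
- The least-squares error criterion is used: minimize $\sum_n \left[x(n) - b\,x(n-M)\right]^2$.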
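Long-Term Prediction (LTP)
For a given value of M, the optimal b is:
$b = \frac{\sum_n x(n)\,x(n-M)}{\sum_n x^2(n-M)}$
The best M value maximizes:
$\frac{\left[\sum_n x(n)\,x(n-M)\right]^2}{\sum_n x^2(n-M)}$
since the remaining error energy is $\sum_n x^2(n)$ minus this quantity. All possible values of M must be tested.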
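The two formulas translate directly into an exhaustive search over M. In this sketch the lag range [mmin, mmax] and the name `ltp_search` are assumptions; the delayed samples x(n-M) are read from the buffer before x[0].

```c
#include <stddef.h>

/* Open-loop LTP search: for M in [mmin, mmax], num = sum x(n)x(n-M) and
   den = sum x(n-M)^2; keeps the M maximizing num^2/den and returns it,
   with the optimal gain b = num/den in *b_out. x[-mmax .. n-1] must be readable. */
int ltp_search(const double *x, size_t n, int mmin, int mmax, double *b_out)
{
    int best_m = mmin;
    double best_crit = -1.0, best_b = 0.0;
    for (int m = mmin; m <= mmax; m++) {
        double num = 0.0, den = 0.0;
        for (size_t i = 0; i < n; i++) {
            double past = x[(ptrdiff_t)i - m];   /* x(n - M) */
            num += x[i] * past;
            den += past * past;
        }
        if (den <= 0.0)
            continue;                 /* silent delayed segment: skip */
        double crit = num * num / den;
        if (crit > best_crit) {
            best_crit = crit;
            best_m = m;
            best_b = num / den;
        }
    }
    *b_out = best_b;
    return best_m;
}
```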
Example of Long-Term Prediction
[Figure: two prediction-error curves]
- The blue curve is the prediction error obtained with a short-term predictor only. It shows large, nearly periodic values, because short-term prediction cannot predict the next pitch periods.
- The green curve is the prediction error obtained with a short-term predictor plus a long-term predictor. The error is smaller: the pitch period is well predicted by the long-term predictor.
LPC10 Vocoder
One of the oldest speech coders is the LPC10 vocoder.
- The analysis (coder) calculates, for each frame: pitch period, prediction coefficients, energy, voicing.
- The synthesis (decoder) uses these parameters to synthesize speech from the electrical equivalent of the speech production model.
Prediction Spectral Parameters
The $a_i$ coefficients are sensitive to quantization and interpolation. They are replaced by other coefficients:
- reflection coefficients $k_i$, log area ratios $LAR_i$
- line spectrum frequencies $LSF_i$
In the LPC10 vocoder:
- pitch and voicing are coded on 7 bits
- the log of the energy is coded on 5 bits
- the 10 prediction coefficients $a_i$ (transformed into $k_i$ and $LAR_i$) are coded on 41 bits
This gives 7 + 5 + 41 = 53 bits per frame of 22.5 ms; with 1 synchronization bit, 54 bits per frame, i.e. 54 bits / 22.5 ms = 2400 bps.
Vector Quantization (2-Dimensional Example)
[Figure: 2-dimensional vector quantization]
The bit rate can be decreased by applying VQ to the coefficients.
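Vector quantization replaces each coefficient vector by the index of its nearest codebook entry. A minimal full-search sketch with a squared Euclidean distance (one possible distortion measure; coders often use weighted distances); the flat codebook layout and the name `vq_nearest` are assumptions. A codebook of L entries costs $\log_2 L$ bits per vector.

```c
#include <stddef.h>

/* Full-search VQ: returns the index j of the codebook entry
   codebook[j*dim .. j*dim + dim-1] closest to v (squared Euclidean distance). */
size_t vq_nearest(const double *v, const double *codebook, size_t size, size_t dim)
{
    size_t best = 0;
    double best_d = 0.0;
    for (size_t j = 0; j < size; j++) {
        double d = 0.0;
        for (size_t i = 0; i < dim; i++) {
            double diff = v[i] - codebook[j * dim + i];
            d += diff * diff;
        }
        if (j == 0 || d < best_d) {
            best_d = d;
            best = j;
        }
    }
    return best;
}
```

With the 4-entry codebook {(0,0), (1,0), (0,1), (1,1)}, 2 bits per vector, the vector (0.9, 0.2) is coded by index 1.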
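Line Spectrum Frequencies (LSF) and Line Spectrum Pairs (LSP)
The line spectrum frequencies $f_i$ and the line spectrum pairs $\cos(\omega_i)$, with $\omega_i = 2\pi f_i$ ($f_i$ normalized to the sampling frequency), have good properties for quantization and interpolation. The LSF and LSP are derived from the inverse filter A(z). For order 10, build the symmetric and antisymmetric polynomials F1(z) and F2(z):
$F_1(z) = A(z) + z^{-11} A(z^{-1}), \qquad F_2(z) = A(z) - z^{-11} A(z^{-1})$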
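LSF and LSP
The roots of F1 and F2 lie on the unit circle and are interleaved. Apart from the trivial roots z = -1 (for F1) and z = 1 (for F2), each polynomial has 5 pairs of conjugate roots $e^{\pm j\omega_i}$, with $f_i = \omega_i/(2\pi)$.
The roots are searched by evaluating F1(z) and F2(z) at typically 60 points on the unit circle (once a linear-phase factor is removed, their values there are real) and monitoring the sign changes. In each interval with a sign change, the search is refined.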
Coders using Short-Term and Long-Term Prediction: RELP, MPE-LP, CELP
- RELP = Residual Excited Linear Prediction coder: the residual is coded in a scalar way and sent with the spectral parameters given by LP.
- MPE-LP = Multi-Pulse Excited Linear Prediction coder: the residual is represented by a few pulses with well-chosen positions and amplitudes.
- CELP = Code Excited Linear Prediction coder: the residual is coded by vector quantization.
RPE-LTP: GSM Full-Rate Coder
The GSM full-rate coder is called RPE-LTP = Regular Pulse Excited, Long-Term Prediction coder.
- The signal u is the best down-sampled version of the residual signal r: each 40-sample subframe is decimated by a factor of 3, which gives 4 candidate grids of 13 samples, and the grid with the highest energy is kept. Here, down-sampling means sampling-rate reduction by decimation.
In CELP coders (CELP = Code Excited Linear Prediction coder), vector quantization is applied to the residual signal instead: each frame of the residual is compared with signal sequences stored in a codebook. The codebook sequences are white, and the codebook is called a stochastic codebook.
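A sketch of the grid selection in C, assuming the GSM 06.10 arrangement described above (40-sample subframe, decimation by 3, 4 grids of 13 samples, maximum-energy criterion). The real coder works in fixed point on a weighted residual and then quantizes the selected pulses with APCM; none of that is reproduced here, and the names are hypothetical.

```c
#define RPE_SUBFRAME 40
#define RPE_GRIDS 4
#define RPE_PULSES 13

/* Regular-pulse grid selection: candidate m (0..3) is u_m(i) = r(m + 3i),
   i = 0..12. The candidate with the largest energy is copied into u and
   its grid position m is returned. */
int rpe_grid_select(const double r[RPE_SUBFRAME], double u[RPE_PULSES])
{
    int best_m = 0;
    double best_e = -1.0;
    for (int m = 0; m < RPE_GRIDS; m++) {
        double e = 0.0;
        for (int i = 0; i < RPE_PULSES; i++)
            e += r[m + 3 * i] * r[m + 3 * i];
        if (e > best_e) {
            best_e = e;
            best_m = m;
        }
    }
    for (int i = 0; i < RPE_PULSES; i++)
        u[i] = r[best_m + 3 * i];
    return best_m;
}
```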
CELP Coder Basic Scheme
- Analysis by synthesis (closed loop) is used to find the best excitation sequence.
- The sequences in the codebook are normalized in energy.
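Structure of the CELP Coder: Perceptual Filter
Perceptual filter: the reconstruction error is spectrally weighted, exploiting the noise-masking properties of the formants.
$W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)}, \qquad 0 \le \gamma_1, \gamma_2 \le 1$
In practice $\gamma_2 < \gamma_1$, so that W(z) gives less weight to the error near the formants, where the noise is masked.
$A^*(z) = A(z/\gamma)$: the roots of A(z), i.e. the poles of 1/A(z), are moved towards zero.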
CELP Coder with Perceptual Filter
The coder must choose the best sequence in the waveform codebook: the one that minimizes the perceptual distance between the original speech frame and the synthetic one. For each sequence in the codebook, the coder builds a synthetic speech frame by filtering the white codebook sequence in order to give it the same spectral envelope and the same pitch as the original speech.
Basic CELP Structure: Perceptual Filter Inserted in Both Branches
The perceptual filter is inserted in both branches of the difference, so the codebook sequences are filtered by $H(z) = W(z)/A(z)$.
CELP Structure: Memory of H(z)
- Memory of H(z) = output of the filter for a zero input (zero-input response)
- $h_i$ = impulse response of H(z)
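The memory of H(z) matters because the codebook search filters each candidate with zero initial state: the encoder computes the zero-input response once per subframe and subtracts it from the weighted speech to form the target. The sketch below shows this for a plain all-pole filter 1/A(z), a simplification of H(z) = W(z)/A(z), and relies on superposition: output with memory = zero-input response + zero-state response. Names and the bound `ZIR_MAX_ORDER` are assumptions.

```c
#define ZIR_MAX_ORDER 32

/* All-pole filter 1/A(z), A(z) = 1 - sum a_i z^-i: y(n) = u(n) + sum a_i y(n-i).
   mem[0..p-1] = y(-1) .. y(-p), updated in place. */
void allpole_filter(const double *a, int p, const double *u, double *y, int n, double *mem)
{
    for (int t = 0; t < n; t++) {
        double acc = u[t];
        for (int i = 1; i <= p; i++)
            acc += a[i] * mem[i - 1];
        for (int i = p - 1; i > 0; i--)
            mem[i] = mem[i - 1];
        if (p > 0)
            mem[0] = acc;
        y[t] = acc;
    }
}

/* "Memory" of the filter: n output samples for a zero input, computed on
   a copy of the state (p <= ZIR_MAX_ORDER) so the real state is untouched. */
void zero_input_response(const double *a, int p, const double *mem, double *zir, int n)
{
    double st[ZIR_MAX_ORDER];
    for (int i = 0; i < p; i++)
        st[i] = mem[i];
    for (int t = 0; t < n; t++) {
        double acc = 0.0;
        for (int i = 1; i <= p; i++)
            acc += a[i] * st[i - 1];
        for (int i = p - 1; i > 0; i--)
            st[i] = st[i - 1];
        if (p > 0)
            st[0] = acc;
        zir[t] = acc;
    }
}
```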
CELP: Adaptive Codebook
LTP can be realized by an adaptive codebook.
- P1 corresponds to the filtering of past residuals by H.
- P2 corresponds to the filtering of the vectors of the stochastic codebook by H.
The past residuals can be stored in a codebook that is called the "adaptive codebook" because its contents change with time.
CELP with Stochastic Codebook
The adaptive codebook stores the past residual frames; it is called adaptive because its contents change with time.
CELP Equations Example: Searching through Codebooks
The main computational load is the filtering of all the codebook vectors.
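Filtering Matrix H
h(n) is the impulse response corresponding to H(z), and N is the length of the codebook vectors. Filtering a codebook vector c with zero initial state, over N samples, is the product f = Hc with the lower-triangular Toeplitz matrix
$H = \begin{pmatrix} h(0) & 0 & \cdots & 0 \\ h(1) & h(0) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ h(N-1) & h(N-2) & \cdots & h(0) \end{pmatrix}$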
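Multiplying by H is a truncated convolution with h(n). A direct sketch, which costs N(N+1)/2 = 820 multiply-adds per vector for N = 40; this per-vector cost, multiplied by the codebook size, is what structured codebooks try to avoid. The function name is hypothetical.

```c
/* Zero-state filtering of a codebook vector by H: f = H c, with H the
   N x N lower-triangular Toeplitz matrix of h(0..N-1):
   f(n) = sum_{m=0}^{n} h(n-m) c(m). */
void filter_by_H(const double *h, const double *c, double *f, int N)
{
    for (int n = 0; n < N; n++) {
        double acc = 0.0;
        for (int m = 0; m <= n; m++)
            acc += h[n - m] * c[m];
        f[n] = acc;
    }
}
```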
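Finding the Best Excitation in the Coder: Equation of the Solution
J is the least-squares criterion. For a set of 2 vectors $c_{j,i(j)}$ (vector i(j) of codebook j), F is the 2-column matrix of the filtered vectors $f_{j,i(j)} = H c_{j,i(j)}$ and g is the vector of the 2 gains. With p the target vector:
$J = \|p - Fg\|^2$ is minimized by $g = (F^T F)^{-1} F^T p$, which gives $J_{min} = \|p\|^2 - p^T F (F^T F)^{-1} F^T p$.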
CELP Optimal Solution
The optimal algorithm finds the combination of code vectors that maximizes the norm $p^T F (F^T F)^{-1} F^T p$, then computes the optimal gains $g_j$. But the number of combinations of codebook vectors is very high, and so is the complexity. Example:
- M = 1024 for the stochastic codebook
- M = 256 for the adaptive codebook
This leads to 1024 × 256 = 262 144 combinations to test and 1024 + 256 = 1280 vectors to filter.
Iterative Suboptimal Algorithm for 2 Codebooks
First step:
- Target vector = p
- Find the best vector in the adaptive codebook and its gain $g_1$.
- Calculate the new target vector $p_1 = p - g_1 f_1$, where $f_1$ is the filtered best vector.
Second step:
- Target vector = $p_1$
- Find the best vector in the stochastic codebook and its gain.
The optimal solution is too complex, and many suboptimal algorithms have been designed to decrease the complexity; the iterative approach is one of the most common. There can be more than 2 codebooks.
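Operations of the Iterative Algorithm
At step j, with target $p_{j-1}$ ($p_0 = p$), the optimal codebook vector has the index i that maximizes
$\frac{\left(p_{j-1}^T f_{j,i}\right)^2}{f_{j,i}^T f_{j,i}}$
Its gain is $g_j = \frac{p_{j-1}^T f_{j,i}}{f_{j,i}^T f_{j,i}}$, and the next target is $p_j = p_{j-1} - g_j f_{j,i}$.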
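One step of the iterative algorithm in C: scan the filtered vectors, keep the index that maximizes $(p^T f_i)^2 / (f_i^T f_i)$, compute the gain, and update the target for the next codebook. It assumes the filtered vectors are already available (for example from the product Hc) and stored back to back; names are illustrative, and the gain quantization done by real coders is omitted.

```c
#include <stddef.h>

/* One step of the iterative codebook search.
   filtered: `size` filtered code vectors f_i of length N, stored back to back.
   p: target (length N), replaced on return by p - g f_best.
   Returns the index maximizing (p^T f_i)^2 / (f_i^T f_i), or -1 if all f_i
   are zero; *gain = p^T f_best / (f_best^T f_best), or 0 if none. */
int search_codebook(double *p, const double *filtered, int size, int N, double *gain)
{
    int best = -1;
    double best_crit = 0.0;
    *gain = 0.0;
    for (int i = 0; i < size; i++) {
        const double *f = filtered + (size_t)i * (size_t)N;
        double corr = 0.0, energy = 0.0;
        for (int n = 0; n < N; n++) {
            corr += p[n] * f[n];
            energy += f[n] * f[n];
        }
        if (energy <= 0.0)
            continue;
        double crit = corr * corr / energy;
        if (best < 0 || crit > best_crit) {
            best = i;
            best_crit = crit;
            *gain = corr / energy;
        }
    }
    if (best >= 0) {
        const double *f = filtered + (size_t)best * (size_t)N;
        for (int n = 0; n < N; n++)
            p[n] -= *gain * f[n];   /* new target for the next codebook */
    }
    return best;
}
```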
Iterative Algorithm: Numerical Example
- Sampling frequency Fs = 8000 Hz
- M = 256: size of the stochastic codebook
- Ma = 128: size of the adaptive codebook
- Frame size NT = 160 samples (20 ms)
- Frames split into 4 subframes of N = 40 samples
- p = 10: linear prediction order
- 10 Mips to filter the stochastic codebook
Iterative Algorithm
The main processing load is the filtering of the codebook vectors. Many algorithms have been proposed to decrease the computation load:
- Special structures of the codebook:
  - VSELP: vector sum
  - Algebraic codebook: ACELP
  - Linear codebook (the adaptive codebook is linear)
- Structure of H avoiding the filtering: diagonalization of $H^T H$
In the algebraic codebook approach, the structure is based on an interleaved single-pulse permutation design. The codebook vectors contain only a few non-zero pulses, equal to +1 or -1. The N positions of a codebook vector (typically N = 40) are divided into a small number of tracks, for example 5 in the GSM enhanced full rate (EFR) coder, with 10 non-zero pulses out of 40:
- Track 1: 2 pulses i0, i5; positions 0, 5, 10, 15, 20, 25, 30, 35
- Track 2: 2 pulses i1, i6; positions 1, 6, 11, 16, 21, 26, 31, 36
- Track 3: 2 pulses i2, i7; positions 2, 7, 12, 17, 22, 27, 32, 37
- Track 4: 2 pulses i3, i8; positions 3, 8, 13, 18, 23, 28, 33, 38
- Track 5: 2 pulses i4, i9; positions 4, 9, 14, 19, 24, 29, 34, 39
In the VSELP coder, the codebook is generated from a set of basis vectors by linear combination with coefficients +1 or -1.
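With this track structure, a code vector is fully described by 10 positions and 10 signs, so no table of vectors needs to be stored. A sketch that builds the 40-sample vector and checks that pulse j lies on track j mod 5, following the EFR layout listed above; the function name is hypothetical, and the actual index and sign encoding of EFR is not reproduced.

```c
#define ACELP_N 40
#define ACELP_TRACKS 5
#define ACELP_PULSES 10

/* Builds an algebraic code vector: pulse j (0..9) must lie on track j % 5,
   i.e. pos[j] % 5 == j % 5, with sign[j] = +1 or -1.
   Returns 1 on success, 0 (c undefined) if a position or sign is invalid. */
int acelp_build(const int pos[ACELP_PULSES], const int sign[ACELP_PULSES],
                double c[ACELP_N])
{
    for (int n = 0; n < ACELP_N; n++)
        c[n] = 0.0;
    for (int j = 0; j < ACELP_PULSES; j++) {
        if (pos[j] < 0 || pos[j] >= ACELP_N ||
            pos[j] % ACELP_TRACKS != j % ACELP_TRACKS)
            return 0;
        if (sign[j] != 1 && sign[j] != -1)
            return 0;
        c[pos[j]] += sign[j];   /* pulses sharing a position add up */
    }
    return 1;
}
```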
CELP Coding Standards from 4.8 kbps to 16 kbps
Federal Standard (DoD), 4.8 kbps:
- frame = 240 samples (30 ms)
- LPC order 10 (LSP coding on 34 bits)
- adaptive codebook: 256 vectors (fractional pitch)
- stochastic codebook: 512 vectors (values -1, 0, 1)
VSELP (Vector Sum Excited Linear Prediction)
- Codebook vectors v are combinations of basis vectors (b1, b2, ..., bk): $v = \pm b_1 \pm b_2 \pm \dots \pm b_k$
- Only the basis vectors are filtered.
- Motorola (8 kbps), GSM half rate (5.6 kbps)
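Linearity is what makes "only the basis vectors are filtered" work: $H(\pm b_1 \pm \dots \pm b_k) = \pm Hb_1 \pm \dots \pm Hb_k$. A sketch in which bit m of a codeword selects the sign of $b_m$ (an assumed mapping), together with a truncated-convolution helper used to pre-filter the basis vectors; names are illustrative.

```c
/* Truncated convolution f(n) = sum_{m<=n} h(n-m) x(m), n = 0..N-1
   (zero-state filtering by H), used to pre-filter the basis vectors. */
void filter_truncated(const double *h, const double *x, double *f, int N)
{
    for (int n = 0; n < N; n++) {
        double acc = 0.0;
        for (int m = 0; m <= n; m++)
            acc += h[n - m] * x[m];
        f[n] = acc;
    }
}

/* Filtered code vector for codeword `code`: bit m set -> +H b_m, clear -> -H b_m.
   fb holds the k filtered basis vectors back to back (fb[m*N + n]). */
void vselp_filtered_codevector(const double *fb, int k, int N, unsigned code, double *out)
{
    for (int n = 0; n < N; n++)
        out[n] = 0.0;
    for (int m = 0; m < k; m++) {
        double s = ((code >> m) & 1u) ? 1.0 : -1.0;
        for (int n = 0; n < N; n++)
            out[n] += s * fb[m * N + n];
    }
}
```

With k basis vectors, only k filterings per subframe are needed instead of $2^k$.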
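Fractional Pitch
The precision of the pitch period is a fraction of the sampling period Te, so an interpolation filter is used.
$B(z) = 1 - b\,z^{-M_f}$, with $M_f = M + \phi$
$x(n - M - \phi)$ can be written as:
$TF^{-1}\left(X(f)\,e^{-j2\pi f (M+\phi) T_e}\right) = x(n-M) * TF^{-1}\left(e^{-j2\pi f \phi T_e}\right) = x(n-M) * h(n)$
where $TF^{-1}$ is the inverse Fourier transform, * is convolution, and h(n) is the interpolation filter; ideally $h(n) = \frac{\sin(\pi(n-\phi))}{\pi(n-\phi)}$.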
Implementation of CELP Coders on C54x
- Example of G.729 Annex A
- Specific instructions for codebook search
- Some functions of DSPLIB
Profiling Example for G.729 Annex A using the C Compiler
- G.729 is a CS-ACELP coder (ITU, 1995): 8 kbps with the quality of 32 kbps ADPCM (G.726).
- G.729 Annex A: applications such as DSVD (Digital Simultaneous Voice and Data) and voice over Internet
CS-ACELP = Conjugate Structure Algebraic CELP coder. ADPCM = Adaptive Differential Pulse Code Modulation.
G.729 Annex A: Main Blocks of the Coder Algorithm
- Frame = 10 ms = 80 samples.
- Short-term LPC analysis once per frame, on a 30 ms (240-sample) window.
- LSPs derived from the $a_i$ coefficients and quantized using split VQ.
- Long-term (LTP) analysis on 2 subframes of 40 samples: LTP lag and gain. Fractional LTP lag (resolution 1/3), coded on 8 bits for the 1st subframe and 5 bits for the 2nd.
- Fixed-codebook search on 2 subframes of 40 samples: index and gains. Code length = 40, with 4 non-zero pulses equal to ±1.
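The 8 kbps rate can be checked from the G.729 bit allocation (Annex A keeps the same bitstream format): 80 bits per 10 ms frame. A small table-driven sketch; the field labels follow the parameter names used in the Recommendation, while the struct and function names are hypothetical.

```c
#include <stddef.h>

/* G.729 bit allocation for one 10 ms frame (80 samples at 8 kHz). */
struct g729_field {
    const char *name;
    int bits;
};

static const struct g729_field g729_alloc[] = {
    {"LSP: L0 + L1 + L2 + L3",            1 + 7 + 5 + 5},
    {"pitch delay subframe 1 (P1)",       8},
    {"pitch delay parity (P0)",           1},
    {"pitch delay subframe 2 (P2)",       5},
    {"fixed codebook positions (C1, C2)", 13 + 13},
    {"fixed codebook signs (S1, S2)",     4 + 4},
    {"gain codebook stage 1 (GA1, GA2)",  3 + 3},
    {"gain codebook stage 2 (GB1, GB2)",  4 + 4},
};

int g729_bits_per_frame(void)
{
    int total = 0;
    for (size_t i = 0; i < sizeof g729_alloc / sizeof g729_alloc[0]; i++)
        total += g729_alloc[i].bits;
    return total;
}

/* 100 frames of 10 ms per second. */
int g729_bit_rate_bps(void)
{
    return g729_bits_per_frame() * 100;
}
```

The 13 position bits per subframe match the 4-pulse algebraic code: 3 bits for each of three 8-position tracks and 4 bits for a 16-position track.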
G.729 Annex A: Main Blocks of the Decoder Algorithm
The received serial bits are converted into parameters: LSP vector, 2 fractional pitch lags and gains, 2 fixed-codebook indices and gains.
The LSPs are interpolated at each subframe and converted to LP filter coefficients $a_i$.
At each subframe:
- the excitation is constructed and scaled;
- the speech is synthesized by filtering the excitation through the LP synthesis filter;
- post-processing is done by an adaptive postfilter.
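The subframe interpolation can be sketched as in G.729, where it is done on the LSPs in the cosine domain: the first subframe uses the average of the previous and current frame's LSPs, and the second uses the current ones. Averaging two ordered LSP sets gives an ordered set, so the interpolated synthesis filter stays stable, one reason the LSP domain is preferred for interpolation. The function name is hypothetical.

```c
#define LSP_ORDER 10

/* LSP interpolation for the 2 subframes of a 10 ms frame (cosine domain).
   q_sub1 = 0.5 (q_prev + q_curr), q_sub2 = q_curr. Each subframe's LSPs are
   then converted to a_i coefficients. */
void interpolate_lsp(const double q_prev[LSP_ORDER], const double q_curr[LSP_ORDER],
                     double q_sub1[LSP_ORDER], double q_sub2[LSP_ORDER])
{
    for (int i = 0; i < LSP_ORDER; i++) {
        q_sub1[i] = 0.5 * (q_prev[i] + q_curr[i]);
        q_sub2[i] = q_curr[i];
    }
}
```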
Using the C Compiler
Use the C program of the standard and the C compiler with maximum optimization. Cycle counts:
- Autocorrelation: … cycles
- Levinson: … cycles
- Conversion ai to LSF: … cycles
- LSF quantization: … cycles
- Synthesis filtering: … cycles
- Open-loop pitch: … cycles
- Fractional pitch: 2 × … cycles
- Algebraic code search: 2 × … cycles
- Gain quantization: 2 × … cycles
Assembly Language Instructions for Codebook Search
Better results can be obtained with assembly language than with C. Specific instructions for codebook search: conditional stores.
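The search criterion $corr^2/energy$ can be compared without division by cross-multiplying with the current best, and the three resulting updates (index, numerator, denominator) are exactly the kind of operation conditional stores perform without branching. A C sketch of that kernel with 16-bit inputs; the prior scaling of corr and energy to 16 bits, the requirement energy > 0, and the function name are assumptions. On the C54x the relevant instructions include SACCD, SRCCD and STRCD (conditional store of an accumulator, of the block-repeat counter, or of T); the mnemonic instruction set reference gives their exact operands.

```c
#include <stdint.h>

/* Division-free codebook search kernel: returns the index maximizing
   corr[i]^2 / energy[i] (energy[i] > 0 assumed), by testing
   corr_i^2 * energy_best > corr_best^2 * energy_i. Ties keep the first index. */
int best_index_fixed(const int16_t *corr, const int16_t *energy, int size)
{
    int best = 0;
    int32_t best_num = (int32_t)corr[0] * corr[0];   /* at most 2^30 */
    int32_t best_den = energy[0];
    for (int i = 1; i < size; i++) {
        int32_t num = (int32_t)corr[i] * corr[i];
        int64_t lhs = (int64_t)num * best_den;
        int64_t rhs = (int64_t)best_num * energy[i];
        int take = lhs > rhs;                       /* condition computed once */
        best     = take ? i         : best;         /* three conditional updates */
        best_num = take ? num       : best_num;
        best_den = take ? energy[i] : best_den;
    }
    return best;
}
```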