Ranko Pinter Simoco Digital Systems


1 TETRA Voice Coding
Ranko Pinter, Simoco Digital Systems

2 Agenda
Why code speech?
Basic principles of TETRA voice coding
How was the TETRA codec selected?
Operational performance
Future enhancements

3 Analogue transmission
Continuous variations in electric current.
In analogue transmission the analogue signals from the speaker’s microphone can be conveyed directly by the transmission medium to the listener’s earpiece.

4 Digital transmission
Encoder: analogue speech to bit stream. Decoder: bit stream to analogue speech.
In digital transmission an Encoder and a Decoder are needed to match the analogue speech information to the digital transmission bearer.

5 Types of codec
Waveform codec: transmitted bits represent the speech waveform.
Parametric codec: transmitted bits drive a speech synthesiser.
Waveform codecs include PCM and ADPCM (Adaptive Differential PCM). Parametric codecs are also termed vocoders. There are also hybrid codecs.

6 Waveform codecs
Encoder stages: Sample, Quantise, Encode. Decoder stages: Decode, Reconstruct.
Waveform encoding: regular sampling of the speech waveform into short pulses, quantisation of the samples, then encoding into a digital bit stream.
Waveform decoding: decoding the bit stream to recreate the sample pulses, then applying these pulses to a low-pass filter to reconstitute the speech waveform.
Used in fixed line telephony, eg PCM (8k samples/sec x 8 bits/sample = 64 kbit/s) and ADPCM (32 kbit/s). These bit rates are too high for mobile radio channels, and waveform codecs fail to produce quality speech at bit rates below about 16 kbit/s.
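The PCM figure quoted above follows directly from the sampling rate and the word length. A minimal Python sketch of that arithmetic, together with a toy quantiser, is given here. The function names and the [-1, 1) input range are illustrative assumptions, and the quantiser is uniform for simplicity, whereas telephony PCM uses logarithmic companding.

```python
SAMPLE_RATE_HZ = 8000   # 8k samples/sec, as on the slide
BITS_PER_SAMPLE = 8     # 8 bits/sample, as on the slide


def pcm_bit_rate(sample_rate_hz, bits_per_sample):
    """Bit rate of a waveform codec that sends every sample."""
    return sample_rate_hz * bits_per_sample


def quantise_uniform(x, bits):
    """Map a sample in [-1.0, 1.0) to an integer code 0 .. 2**bits - 1.

    Uniform steps are an assumption made for clarity only.
    """
    levels = 2 ** bits
    x = max(-1.0, min(x, 1.0 - 1e-12))  # clamp so the top code is never exceeded
    return int((x + 1.0) / 2.0 * levels)


def reconstruct_uniform(code, bits):
    """Return the mid-point of the quantisation step named by `code`."""
    levels = 2 ** bits
    return (code + 0.5) / levels * 2.0 - 1.0


if __name__ == "__main__":
    print(pcm_bit_rate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE), "bit/s")
    code = quantise_uniform(0.3, BITS_PER_SAMPLE)
    print(code, reconstruct_uniform(code, BITS_PER_SAMPLE))
```

The reconstruction error of such a quantiser is bounded by half a step, which is why halving the bit rate of a waveform codec costs quality so quickly.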

7 Parametric codecs
Parametric model of speech production: transmitted bits drive a speech synthesiser.
Parametric codecs using Linear Predictive Coding (LPC) can produce speech of adequate quality for mobile radio at bit rates below 5 kbit/s.

8 Speech production - vocal tract
Diagram labels: Soft palate, Hard palate, Pharynx, Larynx, Tongue.
The lungs are the source of acoustic power. Air vibrates the vocal cords in the larynx at the top of the trachea. The lungs and vocal cords provide the excitation, which sets the loudness and pitch. This excitation is then modified by the characteristics and positions of all parts of the vocal tract (pharynx, tongue, jaw, teeth, lips etc).

9 Parametric coding - Speech synthesis
Diagram: Excitation generator (lungs, vocal cords), then Synthesis filters (vocal tract), then Synthetic speech.
Speech production can be modelled as an Excitation generator (representing the lungs and vocal cords) exciting Synthesis filters (modelling the vocal tract). The excitation is either a pulse train for voiced sounds or noise for unvoiced sounds. The spectral envelope (produced by resonances of the vocal tract) is characterised by the parameters of frequency-selective filters which have the same characteristics as the vocal tract. The filters have specific, but time-varying, parameters. The filters are implemented digitally.

10 Speech synthesis
Diagram: Excitation generator, then Pitch prediction filter (long term), then LPC synthesis filter (short term), then Synthetic speech, with a Perceptual error weighting filter. LPC - Linear Predictive Coding.
The Excitation generator produces sequences of pulses of varying amplitudes and possibly varying positions (in time).
The Pitch prediction filter has a long time constant and models the spectral fine structure of the speech.
The LPC synthesis filter has a short time constant and models the short-term spectral envelope of the speech.
The Perceptual error weighting filter modifies the noise spectrum so that it has a higher value in regions of the speech spectrum where it is less noticeable, ie around the formant regions, where resonances of the vocal tract cause concentrations of speech energy.
Parametric coding makes use of the redundancy inherent in speech to reduce the amount of information that needs to be transmitted. Redundancy occurs in:
repetition of waveshapes at a periodic rate;
the presence of noise components in some speech sounds, which are not perceptually important for reconstructing the exact speech waveform;
the parameters of both the excitation signal and the filter characteristics changing relatively slowly, with very little change over 5 msec (but almost certain change over 30 msec).
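The synthesis chain described above (excitation, long-term pitch filter, short-term LPC filter) can be sketched in a few lines of Python. This is only an illustration of the source-filter idea: the function names, the filter coefficients and the pitch lag are assumptions chosen for the example, not values from the TETRA codec.

```python
import random


def pulse_train(n, period):
    """Voiced excitation: one unit pulse every `period` samples."""
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def white_noise(n, seed=0):
    """Unvoiced excitation: uniformly distributed noise."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def pitch_filter(x, lag, gain):
    """Long-term filter: y[n] = x[n] + gain * y[n - lag]."""
    y = []
    for n, xn in enumerate(x):
        past = y[n - lag] if n >= lag else 0.0
        y.append(xn + gain * past)
    return y


def lpc_synthesis(x, coeffs):
    """Short-term all-pole filter: y[n] = x[n] + sum_k coeffs[k-1] * y[n-k]."""
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for k, a in enumerate(coeffs, start=1):
            if n >= k:
                acc += a * y[n - k]
        y.append(acc)
    return y


def synthesise(excitation, lag, pitch_gain, lpc_coeffs):
    """Excitation -> pitch prediction filter -> LPC synthesis filter."""
    return lpc_synthesis(pitch_filter(excitation, lag, pitch_gain), lpc_coeffs)


if __name__ == "__main__":
    # 240 samples = 30 msec at 8 kHz; lag and coefficients are made up.
    voiced = synthesise(pulse_train(240, 80), 80, 0.5, [0.9, -0.2])
    unvoiced = synthesise(white_noise(240), 80, 0.0, [0.9, -0.2])
    print(voiced[:5], unvoiced[:5])
```

Only the lag, the gains, the filter coefficients and a description of the excitation would need to be transmitted, which is where the bit-rate saving over waveform coding comes from.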

11 Analysis-by-synthesis predictive coding
Diagram (speech encoder): Speech input, Excitation generator, Synthesis filters, Perceptual error weighting, Error minimisation.
The objective of analysis-by-synthesis codecs is to minimise the mean-squared error between the waveform of the actual speech signal and that of its synthesised version. Firstly, the synthesis filter parameters are calculated in an open-loop process from the input speech. Then synthetic speech is computed for all candidate excitation sequences (different pulse amplitudes and positions) to find the sequence that produces the output closest to the original signal according to a perceptually weighted distortion measure.
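The closed-loop search described above can be illustrated with a brute-force sketch in Python: every candidate excitation is passed through the synthesis filter and the candidate with the smallest squared error against the target is kept. The names are hypothetical, and the perceptual weighting filter is deliberately omitted, so the error measure here is a plain squared error rather than the weighted measure a real codec uses.

```python
def lpc_synthesis(x, coeffs):
    """All-pole synthesis filter: y[n] = x[n] + sum_k coeffs[k-1] * y[n-k]."""
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for k, a in enumerate(coeffs, start=1):
            if n >= k:
                acc += a * y[n - k]
        y.append(acc)
    return y


def squared_error(a, b):
    """Sum of squared differences between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def search_excitation(target, candidates, coeffs):
    """Return (index, error) of the candidate whose synthesis best matches target."""
    best_index, best_error = None, float("inf")
    for index, candidate in enumerate(candidates):
        error = squared_error(target, lpc_synthesis(candidate, coeffs))
        if error < best_error:
            best_index, best_error = index, error
    return best_index, best_error


if __name__ == "__main__":
    coeffs = [0.5]  # illustrative filter, computed open-loop in a real encoder
    candidates = [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, -1.0, 0.0],
    ]
    target = lpc_synthesis(candidates[1], coeffs)
    print(search_excitation(target, candidates, coeffs))
```

Only the winning index needs to be sent, because the decoder holds the same candidate set and the same filter.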

12 TETRA ACELP
Algebraic Code Excited Linear Predictive.
Diagram: Algebraic codebook (Excitation generator), Adaptive codebook (Pitch prediction filter).
TETRA codec uses an Excitation generator based upon a codebook of Gaussian sequences. The codebook is based on highly structured algebraic codes and contains many excitation sequences of pulses with different amplitudes and positions. This algebraic structure has advantages in terms of storage, search complexity and robustness.
The encoder needs to search the algebraic codebook to determine the optimum excitation sequence which, when used to drive the synthesis filters having appropriate filter coefficients, will give an output most closely resembling the actual speech waveform.
The pitch prediction filter is a long-term filter which aims to model the pseudo-periodicity in the speech signal. It estimates pitch parameters of the speech waveform over periods of up to 160 samples (160 x 1/8000 sec = 20 msec of speech at 8000 samples/sec).
The information transmitted is:
information about the excitation sequence (codebook address);
information about the filter coefficients.
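The storage advantage of an algebraic codebook comes from the fact that an excitation sequence can be generated from its address instead of being looked up in a stored table. The Python sketch below shows the idea with a made-up layout: each pulse is restricted to its own track of positions and carries a sign. The track layout, the subframe length and the function names are assumptions for illustration and are not the pulse structure defined for the TETRA codec.

```python
def codebook_size(tracks):
    """Number of distinct codewords: (positions x 2 signs) per track."""
    size = 1
    for track in tracks:
        size *= 2 * len(track)
    return size


def decode_codeword(index, tracks, length):
    """Expand a codebook address into a sparse excitation vector.

    Each track contributes exactly one pulse of amplitude +1 or -1.
    """
    if not 0 <= index < codebook_size(tracks):
        raise ValueError("codebook address out of range")
    excitation = [0.0] * length
    for track in tracks:
        index, choice = divmod(index, 2 * len(track))
        position_index, sign_bit = divmod(choice, 2)
        excitation[track[position_index]] += 1.0 if sign_bit == 0 else -1.0
    return excitation


if __name__ == "__main__":
    # Hypothetical 16-sample subframe with two interleaved tracks.
    tracks = [[0, 4, 8, 12], [1, 5, 9, 13]]
    print(codebook_size(tracks))            # addresses available
    print(decode_codeword(3, tracks, 16))   # one sparse excitation
```

Because the decoder can rebuild any sequence from its address, neither end has to store the codebook, and the regular structure lets the encoder prune its search.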

13 Audio processing (Tx)
Diagram: Speech input, Speech encoder, Speech importance, Frame stealing, Homing function, Encryption, Channel coding, Digital output.
Other audio processing functions are associated with the actual speech encoder.
Frame stealing may be needed for system control purposes or for end-to-end ciphering synchronisation. Frames may also be lost at the receiver due to transmission errors. The Frame stealing function can replace the contents of an encoded speech frame with other information.
The Speech Importance function computes a parameter for each speech frame; this is needed for system control purposes and to indicate the possibility of stealing a speech frame with minimal impact on speech.
In the TETRA codec a given input sequence always produces the corresponding bit-exact output sequence, provided that the internal state variables are always initially reset to a known state. The Homing Function achieves this.
There are also Test Mode control functions (not shown).

14 Audio processing (Rx)
Diagram: Digital input, Channel decoding, Decryption, Missing frame generation, Missing frame substitution, Homing function, Comfort Noise, Speech decoder, Speech output.
Complementary audio processing functions are associated with the speech decoder.
If frames are missing due to transmission errors (the Bad Frame Indicator (BFI) is set), or there is frame stealing for control or end-to-end ciphering synchronisation, the Missing Frame Generation and Missing Frame Substitution functions are used to provide the decoder with an adequate set of parameters to allow a smooth restitution of the speech waveform. The decoder uses the parameters of the last received ‘good’ frame, retains the filter coefficients and interpolates the bad speech frames.
When a significant number of frames are lost at the receiver, better listening comfort is provided by replacing the synthesised speech by ‘comfort noise’. The Comfort Noise function computes specific parameters which can be used to generate appropriate comfort noise at the receiver.
As in the encoder, a Homing Function is used to initially reset the internal state variables to a known state. Also as in the encoder there are Test Mode control functions (not shown).
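The receive-side behaviour described above (reuse the last good frame for short losses, fall back to comfort noise for long ones) can be sketched as a small state machine in Python. The function name, the frame representation and the `max_repeats` threshold are assumptions made for this sketch; the slides do not say how many lost frames count as "significant", and a real decoder interpolates parameters rather than simply repeating them.

```python
def substitute_missing(frames, max_repeats=3):
    """Decide what the decoder is fed for each received frame.

    frames: list of (bad_frame_indicator, parameters) tuples.
    Returns a list of (action, parameters) tuples where action is
    "speech", "substituted" or "comfort_noise".
    max_repeats is a hypothetical limit on consecutive substitutions.
    """
    last_good = None
    lost_run = 0
    decisions = []
    for bad, params in frames:
        if not bad:
            last_good, lost_run = params, 0
            decisions.append(("speech", params))
            continue
        lost_run += 1
        if last_good is not None and lost_run <= max_repeats:
            # Short loss: reuse the parameters of the last good frame.
            decisions.append(("substituted", last_good))
        else:
            # Long loss (or nothing received yet): comfort noise instead.
            decisions.append(("comfort_noise", None))
    return decisions


if __name__ == "__main__":
    received = [(False, "A"), (True, None), (True, None),
                (True, None), (True, None), (False, "B")]
    for decision in substitute_missing(received):
        print(decision)
```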

15 Quality speech @ low bit-rate
Channel coding
Diagram: Analogue speech input, Speech encoder, Channel encoder, Digital output.
Channel coding is used to minimise the effect of transmission errors on the speech.
The ACELP process produces 137 bits per 30 ms of speech (= 4.567 kbit/s). Bits are assigned to sensitivity classes. Two 30 ms speech segments are encoded on the channel together.
3 classes of bits: 0, 1, 2 (most important)
Class 2: 2 x 30 bits + CRC, tail bits and FEC = 162 bits per 60 msec
Class 1: 2 x 56 bits + FEC = 168 bits per 60 msec
Class 0: 2 x 51 bits (no coding) = 102 bits per 60 msec
Total: 432 bits per 60 msec (7.2 kbit/s)
The 432 bits of each normal burst payload are then re-ordered. No interleaving, to avoid unacceptable additional delay to the speech.
274 bits per 60 msec of speech = 4.567 kbit/s
432 bits per 60 msec of speech = 7.2 kbit/s
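The bit budget above can be checked with a few lines of Python. The figures are the ones quoted on the slide; only the variable names are introduced here.

```python
FRAME_MS = 30                 # one speech frame
FRAMES_PER_BLOCK = 2          # two frames are channel coded together

# class: (speech bits per 30 ms frame, coded bits per 60 ms block)
CLASSES = {
    2: (30, 162),   # CRC, tail bits and FEC
    1: (56, 168),   # FEC only
    0: (51, 102),   # no coding
}

speech_bits_per_frame = sum(bits for bits, _ in CLASSES.values())
speech_bits_per_block = speech_bits_per_frame * FRAMES_PER_BLOCK
coded_bits_per_block = sum(coded for _, coded in CLASSES.values())

block_ms = FRAME_MS * FRAMES_PER_BLOCK
net_rate_kbit_s = speech_bits_per_block / block_ms      # bits per ms = kbit/s
gross_rate_kbit_s = coded_bits_per_block / block_ms

if __name__ == "__main__":
    print(speech_bits_per_frame, speech_bits_per_block, coded_bits_per_block)
    print(round(net_rate_kbit_s, 3), round(gross_rate_kbit_s, 3))
```

The difference between the two rates, 158 bits in every 60 msec block, is the protection overhead, and it is spent entirely on the class 1 and class 2 bits.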

16 Complete Codec - Block Diagram
Block diagram. Transmit side: analogue speech, 8 kHz sampling at 16 bits, TETRA Voice Encoder (4.567 kbps output, plus Importance Factor), TETRA Channel Encoding (7.2 kbps bit stream), TX. Receive side: RX, TETRA Channel Decoding (7.2 kbps input, Bad Frame Flag output), TETRA Voice Decoder (4.567 kbps), 8 kHz, 16 bits. Three levels of protection: Hi (FEC + CRC), Med (FEC), No.

17 Usual Questions
How to send four speech channels down one digital pipe?
How to “steal” the 18th time slot to send synchronisation data without losing speech?

18 Secret of a “stolen” Frame
Diagram: one channel's 60 msec encoded speech frames (1 to 17) mapped onto the transmitted slots and transmitted frames (1 to 18) of a 1.02 sec TETRA multiframe; transmitted frame duration 56.67 msec.
Each 60 msec segment of encoded speech is buffered, then transmitted in a burst at a faster rate (by a factor of 4 x 18/17), so that the information in 17 x 60 msec segments (1.02 sec duration) can be accommodated in 17 transmitted frames (0.96 sec duration); this leaves the 18th frame available for signalling - a special feature of TETRA, eg allowing emergency calls to interrupt a transmitting terminal.
The buffering and speeded-up transmission allow each 60 msec segment of encoded speech to be transmitted in 14.167 msec (1 TETRA time-slot).
4 time-slots make 1 TETRA frame: duration 4 x 14.167 = 56.67 msec.
18 TETRA frames make 1 TETRA multi-frame: duration 18 x 56.67 = 1.02 sec.
But 1.02 sec accommodates just 17 x 60 msec of encoded speech, so only 17 of the 18 frames are needed to transmit the speech information, leaving the 18th frame free for signalling.
17 x 60 msec segments of encoded speech bits = 1.02 sec. 17 x 56.67 msec TDMA frames = 0.963 sec; ÷ 4 = about 241 msec per channel. The remaining frame in each 1.02 sec is the 18th, stolen, frame.
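The multiframe timing above can be reproduced with exact fractions in Python, starting only from the figures given on the slide (60 msec speech segments, 4 slots per frame, 18 frames per multiframe, 17 of them carrying speech). The variable names are introduced for this sketch.

```python
from fractions import Fraction

SPEECH_SEGMENT_MS = 60
SLOTS_PER_FRAME = 4
FRAMES_PER_MULTIFRAME = 18
SPEECH_FRAMES_PER_MULTIFRAME = 17

# 17 speech segments must fit in one multiframe of 18 frames.
multiframe_ms = SPEECH_FRAMES_PER_MULTIFRAME * SPEECH_SEGMENT_MS
frame_ms = Fraction(multiframe_ms, FRAMES_PER_MULTIFRAME)
slot_ms = frame_ms / SLOTS_PER_FRAME

# How much faster than real time each segment is sent over the air.
speed_up = Fraction(SPEECH_SEGMENT_MS) / slot_ms

# Time occupied by the 17 speech-carrying frames, and one channel's share.
speech_frames_ms = SPEECH_FRAMES_PER_MULTIFRAME * frame_ms
per_channel_ms = speech_frames_ms / SLOTS_PER_FRAME

if __name__ == "__main__":
    print("multiframe:", multiframe_ms, "ms")
    print("frame:", float(frame_ms), "ms; slot:", float(slot_ms), "ms")
    print("speed-up:", speed_up, "=", float(speed_up))
    print("17 frames:", float(speech_frames_ms), "ms;",
          "per channel:", float(per_channel_ms), "ms")
```

The speed-up comes out as 72/17, which is the 4 x 18/17 factor quoted above: a factor of 4 for sharing the carrier between four channels and 18/17 for freeing the 18th frame.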

19 Codec selection
Speech quality: subjective assessments. Complexity: computational demand of the speech and channel codec.
The codec selection tests compared the subjective performance of a number of candidate digital codecs under various transmission channel conditions and under various conditions of acoustic background noise. The transmission channel conditions represented different channel models (error patterns, Es/No, vehicle speed, BER). The assessment process also considered the computational complexity needed to implement the codec, with the highest score for the lowest complexity (‘simplest’ implementation).

20 Heavy processing demand
Codec complexity C combines MOPS (Million Operations Per Second), RAM (kB) and ROM (kB). In the figures below, RAM and ROM are each converted to a MOPS equivalent and the terms are summed.
Overall encoder complexity 11.9 MOPS. Overall decoder complexity 5.4 MOPS. Very complex.
Operators are given a computational weight, factored by the number of calls. In this case one kbyte equals bytes. RAM is the data memory in kbytes, and is the sum of the (maximum of) Scratch RAM and (sum of) Static RAM. ROM is the memory for data tables in kbytes.
Encoder: speech encoder 9.624 MOPS; channel encoder 0.081 MOPS; RAM 8.34 kbytes, 1.668 MOPS; ROM 11.07 kbytes, 0.550 MOPS. Encoder complexity 11.923 MOPS.
Decoder: speech decoder 1.025 MOPS; channel decoder 3.040 MOPS; RAM 3.84 kbytes, 0.768 MOPS; ROM 11.07 kbytes, 0.550 MOPS. Decoder complexity 5.383 MOPS.
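The encoder and decoder totals quoted above are the sums of their four listed contributions, which a short Python check confirms. The dictionary layout and names are introduced for this sketch; the RAM weighting of 0.2 MOPS per kbyte is simply what the quoted RAM figures imply and is not stated on the slide.

```python
# MOPS contributions as listed on the slide.
ENCODER = {"speech": 9.624, "channel": 0.081, "ram": 1.668, "rom": 0.550}
DECODER = {"speech": 1.025, "channel": 3.040, "ram": 0.768, "rom": 0.550}

# Memory sizes in kbytes as listed on the slide.
ENCODER_RAM_KB = 8.34
DECODER_RAM_KB = 3.84


def total_complexity(parts):
    """Overall complexity in MOPS: processing plus memory equivalents."""
    return round(sum(parts.values()), 3)


def implied_weight(mops_equivalent, kbytes):
    """MOPS charged per kbyte, inferred from a quoted pair of figures."""
    return round(mops_equivalent / kbytes, 3)


if __name__ == "__main__":
    print("encoder:", total_complexity(ENCODER), "MOPS")
    print("decoder:", total_complexity(DECODER), "MOPS")
    print("RAM weight:", implied_weight(ENCODER["ram"], ENCODER_RAM_KB),
          implied_weight(DECODER["ram"], DECODER_RAM_KB))
```

The split also shows where the load sits: speech encoding dominates on the transmit side, while channel decoding dominates on the receive side.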

21 Codec performance
Subjective Assessment - Mean Opinion Score (MOS)
For ‘clean’ speech, with IRS processed input (weighted input frequency characteristic), these results put both TETRA and GSM Full-rate codecs between Scores 3 & 4.
Scale labels shown on the chart: Excellent quality (imperceptible impairment); Good quality (just perceptible impairment, but not annoying).

22 TETRA Codec performance
Very robust.
Factor: Effect on quality
Input level change: Insensitive
Frame stealing: Slight degradation, not significant
Tandeming: Best avoided!
Background noise at Tx: Practical results impressive
Effect of input level: no significant change in performance with input level (-32 dB to -12 dB).
Effect of frame stealing: regular frame stealing at a rate of one speech frame per TETRA multiframe degrades the speech quality slightly, by about 1 dB on average for clean speech. The presence of background noise or transmission errors reduces the audibility of the degradation.
Effect of tandeming: tandeming codecs degrades speech quality and is preferably avoided whenever possible.
Effect of acoustic background noise: practical results have been very impressive, even in acoustically harsh environments like the cockpit of a light aircraft, beside busy roads and in vehicle cabs with sirens and horns operating.

23 Codec performance: Quality (Q) comparison with analogue FM
Chart: Quality Q (dB) against audio input level (dB), for TETRA and FM.
Q (dB) values (a quality measure relating speech power to speech-correlated noise power) are for linear input (not weighted) at various input levels (not RF levels). TETRA quality remains consistent. FM quality drops drastically for low input levels.

24 Codec performance: Comparison with analogue FM
Charts: Quality against Range for TETRA and FM, one for low background noise and one for high background noise.
Close to the base station at high carrier-to-noise ratio (C/N) and in areas of low background noise, FM quality just has the edge over TETRA; however, if there is high background noise, TETRA gives superior quality. As the range from the base station increases and the carrier-to-noise ratio worsens, FM speech quality falls off gradually, but TETRA quality holds up for longer. PMR systems are frequently operating near their limits of coverage and this extra quality with TETRA is valuable.

25 Codec performance: ETSI demo
Chart: Quality against Range for FM and TETRA, with demo samples 1 & 2 at moderate C/N and 3 & 4 at poor C/N.
TETRA channel subjected to various Error Patterns (Typical Urban and Hilly Terrain). Analogue channel noise set for comparable RF quality. Error/noise models are not the same, but are meant to represent equivalent conditions.
2 phrases per sample - first for the Reference Condition (FM, analogue), then for TETRA:
1 Male, medium audio level, moderate RF quality
2 Female, medium audio level, moderate RF quality
3 Male, low audio level, poor RF quality
4 Female, low audio level, poor RF quality

26 Future enhancements
TETRA provision for 4 codecs
Enhanced codec for TETRA telephony
AMR (Adaptive Multi-Rate)
Provision of a new codec for military
The TETRA standard can support up to 4 different voice codecs. In public TETRA networks users have reported no significant difference to the GSM codec under normal conditions. Under some conditions the TETRA codec performs better, especially when the speaker is in a high acoustic noise environment. Nevertheless an Enhanced quality codec is planned for speech quality improvement, especially for telephony use.
An Adaptive Multi-Rate codec operates at a higher bit rate in low-error environments to give higher quality speech. In higher-error environments, where higher speech quality cannot be sustained, the codec reduces its operational bit rate.
Future codec enhancements are in the Work Programmes of the ETSI Future TETRA Group and the TETRA MoU Working Group on Enhancement of the Standard. More radical proposals for a changed TETRA air interface to accommodate even higher data bit rates (> 200 kbit/s) would also offer the opportunity for a reduced speech transmission delay.

27 Conclusions
Using the ACELP technique, the TETRA Codec provides nearly GSM quality at almost half the bit rate.
The TETRA Codec provides superior quality to FM and GSM in a high background noise environment.
The current TETRA Standard has provision for 4 Codecs.
An additional Codec planned for TETRA Release 2 will provide even higher quality for telephony applications.

