
EE2F1 Speech & Audio Technology, Sept. 26, 2002, SLIDE 1
THE UNIVERSITY OF BIRMINGHAM, ELECTRONIC, ELECTRICAL & COMPUTER ENGINEERING, Digital Systems & Vision Processing
EE2F1 Multimedia (1): Speech & Audio Technology
Lecture 9: Speech Coding
Martin Russell
Electronic, Electrical & Computer Engineering, School of Engineering, The University of Birmingham

SLIDE 2: What is speech coding?
• Digitisation of speech for transmission or storage
• Aim: minimise the number of bits per second (bps)…
• …while preserving speech quality:
  – intelligibility and naturalness
• Main kinds of speech coding scheme:
  – waveform coders
  – vocoders

SLIDE 3: Approaches
• Waveform coding
  – Works for all audio signals
  – Uses generic methods for bit reduction
  – Exploits properties of human hearing
• Vocoders
  – Optimised for speech coding
  – Assume that the signal to be encoded is speech

SLIDE 4: Waveform coders
• PCM (Pulse Code Modulation)
• DPCM (Differential PCM)
• ADPCM (Adaptive Differential PCM)
• Delta modulation (1-bit ADPCM)

SLIDE 5: Pulse Code Modulation (PCM)
• How many quantization points?
• How many samples per second (sample rate)?
[Figure: sampled waveform, marking the sample points, the quantization points and the quantization error]
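
The two questions on this slide correspond to two parameters: n bits per sample give 2^n quantization points, and the sample rate sets how often the waveform is measured. The sketch below is a minimal illustration, assuming a uniform (linear) quantizer over a fixed range of ±1; the function names are hypothetical, and it leaves out the amplitude compression (companding) mentioned later for 8-bit PCM.

```python
import math

def pcm_quantise(samples, n_bits, full_scale=1.0):
    """Uniform PCM: map each sample in [-full_scale, full_scale] to one of
    2**n_bits integer codes (the quantization points)."""
    levels = 2 ** n_bits
    step = 2.0 * full_scale / levels                 # width of one quantization interval
    codes = []
    for x in samples:
        x = max(-full_scale, min(full_scale, x))     # clip to the quantizer range
        code = int(math.floor((x + full_scale) / step))
        codes.append(min(code, levels - 1))          # x == full_scale joins the top interval
    return codes, step

def pcm_reconstruct(codes, step, full_scale=1.0):
    """Decoder: each code maps back to the centre of its quantization interval."""
    return [-full_scale + (c + 0.5) * step for c in codes]

# Sample a 440 Hz tone at 8 kHz (8000 samples per second), 8 bits per sample.
fs = 8000
samples = [0.9 * math.sin(2 * math.pi * 440 * n / fs) for n in range(160)]
codes, step = pcm_quantise(samples, n_bits=8)
decoded = pcm_reconstruct(codes, step)
max_error = max(abs(s - d) for s, d in zip(samples, decoded))
print(f"{2 ** 8} quantization points, step {step:.6f}, max error {max_error:.6f}")
```

Because no sample is clipped, the quantization error never exceeds half a step; each extra bit doubles the number of quantization points and halves that bound.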

SLIDE 6: Differential PCM
• Encode the differences between the quantised values at successive sample points, rather than the values themselves
[Figure: sampled waveform, marking the sample points, the quantization points and the quantization error]
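
A minimal DPCM sketch, assuming the simplest predictor (the previous reconstructed sample) and a fixed step size. The encoder quantises the difference between each new sample and the value the decoder has already reconstructed, so quantisation errors do not accumulate. Function names and the step size are illustrative assumptions.

```python
import math

def dpcm_encode(samples, step):
    """Quantise the difference between each sample and the previous
    reconstructed value (the decoder's view), not the previous raw sample."""
    codes, prev = [], 0.0
    for x in samples:
        code = round((x - prev) / step)   # small signed integer
        codes.append(code)
        prev += code * step               # mirror the decoder's reconstruction
    return codes

def dpcm_decode(codes, step):
    """Accumulate the quantised differences."""
    out, prev = [], 0.0
    for code in codes:
        prev += code * step
        out.append(prev)
    return out

fs = 8000
samples = [0.9 * math.sin(2 * math.pi * 100 * n / fs) for n in range(400)]
step = 2.0 / 256                          # same step as 8-bit uniform PCM over +/-1
codes = dpcm_encode(samples, step)
decoded = dpcm_decode(codes, step)
print("largest difference code:", max(abs(c) for c in codes))
print("max error:", max(abs(s - d) for s, d in zip(samples, decoded)))
```

For this slowly varying 100 Hz tone the difference codes fit in a signed 5-bit range, whereas 8-bit PCM with the same step size needs all 256 levels; that is the saving DPCM exploits.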

SLIDE 7: Adaptive DPCM
• Use a small number of bits to encode the differences in DPCM
• Adjust the quantisation step size to accommodate large changes in the signal
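
A sketch of the adaptive idea, assuming a 3-bit difference code and a simple step-size rule chosen for illustration: grow the step by 1.5× whenever the code hits its largest magnitude, otherwise shrink it by 0.9×. Standardised ADPCM coders use their own adaptation tables; none of the constants or names here come from the slide. Because the rule depends only on the transmitted codes, the decoder can track the step size without any side information.

```python
import math

N_BITS = 3
LO, HI = -(2 ** (N_BITS - 1)), 2 ** (N_BITS - 1) - 1     # codes -4..3

def next_step(code, step, grow=1.5, shrink=0.9, min_step=1e-4, max_step=1.0):
    """Hypothetical adaptation rule: a saturated code means the step is too small."""
    if code in (LO, HI):
        return min(step * grow, max_step)
    return max(step * shrink, min_step)

def adpcm_encode(samples, step0=0.02):
    codes, recon, prev, step = [], [], 0.0, step0
    for x in samples:
        code = max(LO, min(HI, round((x - prev) / step)))  # only 3 bits per sample
        codes.append(code)
        prev += code * step                 # decoder's reconstruction
        recon.append(prev)
        step = next_step(code, step)        # adapt after using the current step
    return codes, recon

def adpcm_decode(codes, step0=0.02):
    out, prev, step = [], 0.0, step0
    for code in codes:
        prev += code * step
        out.append(prev)
        step = next_step(code, step)        # same rule, driven by the same codes
    return out

fs = 8000
samples = [0.5 * math.sin(2 * math.pi * 200 * n / fs) for n in range(800)]
codes, recon = adpcm_encode(samples)
decoded = adpcm_decode(codes)
mean_error = sum(abs(s - d) for s, d in zip(samples, decoded)) / len(samples)
print("mean absolute error:", round(mean_error, 4))
```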

SLIDE 8: Delta Modulation
• 1-bit ADPCM
• A sequence of ‘all 1s’ or ‘all 0s’ indicates the need to change the step size
• ‘Slope overload’ is indicated by excessive use of 1s or 0s
[Figure: staircase approximation of a waveform, labelled with the transmitted bits 1 0 11 0000]
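
A sketch of adaptive delta modulation, assuming a rule in the spirit of the slide: if the last three bits are identical (a run of 1s or 0s, the sign of slope overload) the step size is doubled, otherwise it decays towards a minimum. The run length, multipliers and function names are illustrative assumptions. Running the same tone through a fixed-step and an adaptive coder shows the slope-overload problem: the tone changes by up to about 0.16 per sample, far more than the fixed step of 0.01 can follow.

```python
import math

def update_step(history, step, run=3, grow=2.0, decay=0.7,
                min_step=0.01, max_step=0.5):
    """Hypothetical rule: `run` identical bits in a row signal slope overload."""
    tail = history[-run:]
    if len(tail) == run and len(set(tail)) == 1:      # 'all 1s' or 'all 0s'
        return min(step * grow, max_step)
    return max(step * decay, min_step)

def dm_encode(samples, step0=0.01, adaptive=True):
    """One bit per sample: 1 = move the staircase up, 0 = move it down."""
    bits, recon, prev, step = [], [], 0.0, step0
    for x in samples:
        bit = 1 if x >= prev else 0
        bits.append(bit)
        prev += step if bit else -step
        recon.append(prev)
        if adaptive:
            step = update_step(bits, step)
    return bits, recon

def dm_decode(bits, step0=0.01, adaptive=True):
    """Rebuild the staircase, adapting the step from the received bits alone."""
    out, history, prev, step = [], [], 0.0, step0
    for bit in bits:
        history.append(bit)
        prev += step if bit else -step
        out.append(prev)
        if adaptive:
            step = update_step(history, step)
    return out

fs = 16000                       # assumed: 1 bit/sample x 16 kHz = 16 kbps
samples = [0.8 * math.sin(2 * math.pi * 500 * n / fs) for n in range(fs // 10)]
for adaptive in (False, True):
    bits, recon = dm_encode(samples, adaptive=adaptive)
    mean_err = sum(abs(s - r) for s, r in zip(samples, recon)) / len(samples)
    print(f"adaptive={adaptive}: {len(bits)} bits in 0.1 s, mean error {mean_err:.3f}")
```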

SLIDE 9: Waveform coding summarised
• PCM, with 8 bits per sample (amplitude compression) and an 8 kHz sampling rate, gives a bit rate of 64 kbps
• DPCM (a.k.a. Delta PCM): the difference between successive samples needs fewer bits for the same accuracy
• Adaptive DPCM: the quantiser step size is varied depending on the signal's dynamic range
• Delta modulation = 1-bit DPCM
  – can adapt step size to avoid slope overload
  – gives reasonable intelligibility at just 16 kbps
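
Both bit rates on this slide follow from bit rate = bits per sample × samples per second. The slide does not give the delta modulator's sampling rate; 16 kHz is what 1 bit per sample at 16 kbps implies.

```latex
\begin{aligned}
R_{\text{PCM}} &= 8~\text{bits/sample} \times 8000~\text{samples/s} = 64\,000~\text{bps} = 64~\text{kbps} \\
R_{\text{DM}}  &= 1~\text{bit/sample} \times f_s = 16\,000~\text{bps} \;\Rightarrow\; f_s = 16\,000~\text{samples/s}
\end{aligned}
```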

SLIDE 10: Vocoders
• Coders designed specifically for speech
• Sometimes called analysis-synthesis coders
• Exploit the source-filter model of speech

SLIDE 11: Vocoders
• Encoding
  – Estimate and encode the source
  – Estimate and encode the vocal tract filter
  – Store as a feature vector
• Transmission
  – Transmit at a low data rate (~50–100 vectors per second)
  – This is possible because the vocal tract moves relatively slowly
• Decoding
  – Recover source information
  – Recover vocal tract filter information
  – Convert into synthesiser control parameters
  – Synthesise speech

SLIDE 12: Example: Channel Vocoder
• 19 band-pass filters, spanning 0–4 kHz
• Centre frequencies arranged non-linearly on the frequency axis
• Bandwidths increase with frequency, like the ear’s critical bands
• Energies from the filter outputs averaged over 20 ms
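
The analyser stage can be sketched as follows. Instead of 19 separate band-pass filters, this sketch assumes a common shortcut: take a DFT of each 20 ms frame and add up the power of the bins that fall in each band. The band edges are spaced equally on the mel scale, a standard approximation to auditory frequency resolution swapped in here because the slide does not specify its filter design. It is pure Python, so it uses a slow direct DFT; all names are hypothetical.

```python
import bisect
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def band_edges(n_bands=19, f_max=4000.0):
    """Edges equally spaced on the mel scale, so bandwidth grows with frequency."""
    top = hz_to_mel(f_max)
    edges = [mel_to_hz(top * i / n_bands) for i in range(n_bands + 1)]
    edges[-1] = f_max                      # remove floating-point drift at the top edge
    return edges

def band_energies(frame, fs, edges):
    """Energy of one frame in each band, from a direct DFT of the frame."""
    n, n_bands = len(frame), len(edges) - 1
    energies = [0.0] * n_bands
    for k in range(n // 2 + 1):            # bins from 0 Hz up to fs/2
        X = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        band = min(bisect.bisect_right(edges, k * fs / n) - 1, n_bands - 1)
        energies[band] += abs(X) ** 2 / n
    return energies

def channel_vocoder_analysis(samples, fs=8000, frame_ms=20, n_bands=19):
    """One vector of n_bands energies per 20 ms frame (50 frames per second)."""
    frame_len = fs * frame_ms // 1000      # 160 samples at 8 kHz
    edges = band_edges(n_bands, fs / 2)
    return [band_energies(samples[i:i + frame_len], fs, edges)
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

fs = 8000
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs // 10)]  # 0.1 s, 1 kHz
frames = channel_vocoder_analysis(tone, fs)
print(len(frames), "frames; loudest band:", max(range(19), key=lambda b: frames[0][b]))
```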

SLIDE 13: Example: Channel Vocoder
• Spectrum shape (filter-bank energies) coded by DPCM
• Combined with a binary ‘voiced/unvoiced’ flag, plus an estimate of the fundamental frequency f0 if ‘voiced’
• 1 ‘frame’ of data (48 bits) transmitted 50 times per second: 48 × 50 = 2,400 bps

SLIDE 14: Example: Channel Vocoder
• Spectrum shape decoded and used to configure the filterbank
• Voiced/unvoiced flag plus f0 used to select the source
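
On the decoder side, the source selection might look like the sketch below. It assumes the classic two-source model (a periodic impulse train at f0 for voiced frames, white noise for unvoiced frames) and 160-sample frames, i.e. 20 ms at 8 kHz; the function name is hypothetical. Carrying the pulse position from one frame to the next keeps the pitch period continuous. The synthesiser would then pass this excitation through the filterbank configured from the decoded band energies, which is not shown.

```python
import random

def make_excitation(voiced, f0, frame_len=160, fs=8000, next_pulse=0, rng=random):
    """Return (excitation, next_pulse) for one frame.
    voiced: impulse train with period fs/f0; unvoiced: white Gaussian noise."""
    if not voiced:
        return [rng.gauss(0.0, 1.0) for _ in range(frame_len)], 0
    period = max(1, round(fs / f0))            # pitch period in samples
    frame = [0.0] * frame_len
    n = next_pulse
    while n < frame_len:
        frame[n] = 1.0
        n += period
    return frame, n - frame_len                # where the next pulse falls in the next frame

# Three decoded frames: (voiced flag, f0 in Hz); f0 is ignored when unvoiced.
decoded_frames = [(True, 100.0), (True, 150.0), (False, 0.0)]
rng = random.Random(0)
next_pulse = 0
for voiced, f0 in decoded_frames:
    frame, next_pulse = make_excitation(voiced, f0, next_pulse=next_pulse, rng=rng)
    if voiced:
        print(f"voiced, f0 = {f0} Hz: {sum(frame):.0f} pulses in this frame")
    else:
        print("unvoiced: noise excitation")
```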

SLIDE 15: Example: Channel Vocoder
[Figure: block diagrams of the channel vocoder analyser and synthesiser]

SLIDE 16: Linear Predictive Coding (LPC)
• Basic idea
  – Assume that the value of the speech signal s at time t can be predicted as a weighted sum of its values at times t−1, t−2, …, t−N: $s(t) \approx \sum_{k=1}^{N} a_k\, s(t-k)$
  – This is Nth-order Linear Predictive Coding (LPC)
  – The coefficients $a_1, \ldots, a_N$ can be thought of as the parameters of a digital filter (lecture 3)
  – They define the vocal tract filter at time t
  – Used in the LPC vocoder
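
Finding the coefficients is a least-squares problem. One standard route, assumed here because the slide does not say how the coefficients are obtained, is the autocorrelation method solved with the Levinson–Durbin recursion. The sketch checks itself on a synthetic second-order all-pole signal whose true coefficients are known; in a vocoder the same routine would be run on each ~20 ms frame (usually windowed) with a higher order, typically around 10 for 8 kHz speech. Function names are illustrative.

```python
import random

def autocorrelation(x, max_lag):
    """r[k] = sum_n x[n] x[n-k] for k = 0..max_lag (autocorrelation method)."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve for a_1..a_N in s(t) ~ sum_k a_k s(t-k).
    Returns (coefficients, residual prediction-error energy)."""
    a = [0.0] * (order + 1)                  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err

# Synthetic test signal: a 2nd-order all-pole process driven by white noise.
rng = random.Random(0)
true_a = [1.3, -0.6]
s = [0.0, 0.0]
for _ in range(4000):
    s.append(true_a[0] * s[-1] + true_a[1] * s[-2] + rng.gauss(0.0, 1.0))
coeffs, err = levinson_durbin(autocorrelation(s, 2), order=2)
print("estimated a_1, a_2:", [round(c, 3) for c in coeffs])
```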

SLIDE 17: Finite Impulse Response (FIR) digital filter
[Figure: FIR digital filter: the input x(n) passes through a chain of unit delays z^-1; the delayed samples are weighted by a1, a2, …, aN and summed (Σ) to give the output y(n)]
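
In code, the filter in the diagram is a sum of delayed, weighted inputs. This sketch assumes that tap a_k weights the input delayed by k samples, $y(n) = \sum_{k=1}^{N} a_k\, x(n-k)$, which is the form the LPC predictor takes, and treats samples before n = 0 as zero. Subtracting y(n) from x(n) gives the prediction residual, the signal that gives Residual Excited LPC its name. Function names are hypothetical.

```python
def fir_filter(x, a):
    """y(n) = sum_{k=1}^{N} a[k-1] * x(n-k), with x(n) = 0 for n < 0."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, coeff in enumerate(a, start=1):   # tap k sees x delayed by k samples (z^-k)
            if n - k >= 0:
                acc += coeff * x[n - k]
        y.append(acc)
    return y

def prediction_residual(x, a):
    """e(n) = x(n) - predicted x(n): the part the predictor fails to explain."""
    return [xn - yn for xn, yn in zip(x, fir_filter(x, a))]

print(fir_filter([1.0, 0.0, 0.0, 0.0, 0.0], [0.5, 0.25]))   # [0.0, 0.5, 0.25, 0.0, 0.0]
print(prediction_residual([1.0, 0.5, 0.25, 0.125], [0.5]))    # [1.0, 0.0, 0.0, 0.0]
```

The second example is a signal the one-tap predictor describes exactly, so everything after the first sample is predicted perfectly and the residual is zero.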

SLIDE 18: LPC Vocoders
• The quality of LPC-vocoded speech depends critically on the quality of the excitation signal
• Two particular forms of LPC used for speech coding in GSM mobile phones:
  – RELP: Residual Excited LPC
  – CELP: Codebook Excited LPC

SLIDE 19: Example: CELP Vocoders
• Vocal tract filter:
  – LPC analysis conducted over a short (~20 ms) section of speech to give the LPC coefficients
• Source:
  – Excitation source estimated over the same window
  – Compared with a finite set of ‘reference’ excitation signals $e_1, \ldots, e_C$
  – The code for the most similar reference is transmitted
  – The set of references is called a codebook
  – Hence Codebook Excited LPC
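
The codebook search can be sketched as a nearest-neighbour match. Assumptions: the codebook holds C = 64 random Gaussian vectors of 40 samples (so an index costs 6 bits), and "most similar" means lowest squared error after scaling each entry by its best gain, measured directly on the excitation. Practical CELP coders instead compare candidates after passing them through the LPC synthesis filter (analysis-by-synthesis, usually with perceptual weighting) and also transmit the gain; that refinement is omitted. Names are hypothetical.

```python
import math
import random

def make_codebook(size, length, seed=0):
    """Hypothetical codebook: C random Gaussian excitation vectors e_1..e_C."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(length)] for _ in range(size)]

def search_codebook(target, codebook):
    """Return (index, gain, error) of the entry that, after optimal scaling,
    is closest to the target excitation in squared error."""
    best_index, best_gain, best_err = None, 0.0, math.inf
    for i, c in enumerate(codebook):
        energy = sum(v * v for v in c)
        gain = sum(t * v for t, v in zip(target, c)) / energy   # least-squares gain
        err = sum((t - gain * v) ** 2 for t, v in zip(target, c))
        if err < best_err:
            best_index, best_gain, best_err = i, gain, err
    return best_index, best_gain, best_err

codebook = make_codebook(size=64, length=40)
bits_per_index = math.ceil(math.log2(len(codebook)))      # 6 bits for 64 entries
target = [2.5 * v for v in codebook[17]]                   # excitation matching entry 17
index, gain, err = search_codebook(target, codebook)
print(f"send index {index} ({bits_per_index} bits), gain {gain:.2f}")
```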

SLIDE 20: Formant Vocoder
• A formant vocoder exploits the importance of the first three formants, F1, F2 and F3, for speech perception
• Formant frequencies, amplitudes and bandwidths are estimated and used to model the vocal tract filter
• These are transmitted, with V/UV and f0 information, at 50–100 frames per second
• Speech is decoded using a formant synthesiser
• Using 5–6 bits for each of the 10 control parameters results in a bit rate of 2.5–6 kbps
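
The quoted range comes from pairing the smaller bit allocation with the lower frame rate, and the larger with the higher. This treats all 10 control parameters alike and adds no separate overhead, since the slide does not break the parameters down further.

```latex
\begin{aligned}
R_{\min} &= 10~\text{parameters} \times 5~\text{bits} \times 50~\text{frames/s} = 2500~\text{bps} = 2.5~\text{kbps} \\
R_{\max} &= 10~\text{parameters} \times 6~\text{bits} \times 100~\text{frames/s} = 6000~\text{bps} = 6~\text{kbps}
\end{aligned}
```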

SLIDE 21: Recognition-Synthesis Coder
[Figure: input speech “recce report…” → Speech Recognizer → phone-level transcription r E k i r @ p O t… → Transmitter → (50 bps) → Receiver → phone-level transcription r E k i r @ p O t… → Speech Synthesiser → output speech “recce report…”]

SLIDE 22: Recognition-synthesis coders
• New technology: still in research labs
• Very low data rates:
  – the sounds of English (~46 phonemes) can be specified using 6 bits
  – talking at 8 phonemes per second, the linguistic content can be encoded in about 50 bps (8 × 6 = 48 bps)!
• Computationally complex
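
The 50 bps figure is a rounded fixed-length-code calculation: 2^5 = 32 codes are too few for ~46 phonemes, 2^6 = 64 are enough, and 6 bits × 8 phonemes per second gives 48 bps. The sketch reproduces that arithmetic with a hypothetical helper; it counts only phoneme identities, not their durations or pitch.

```python
import math

def phoneme_bit_rate(n_phonemes, phonemes_per_second):
    """Fixed-length binary code: enough bits to give every phoneme its own index."""
    bits_per_phoneme = math.ceil(math.log2(n_phonemes))
    return bits_per_phoneme, bits_per_phoneme * phonemes_per_second

bits, bps = phoneme_bit_rate(46, 8)
print(bits, bps)   # 6 48
```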

SLIDE 23: Use of ‘knowledge’
• Bit rates are reduced by exploiting properties of the speech signal:
  – waveform coders: limited bandwidth
  – vocoders: signal contains resonances
  – recognition-synthesis: signal is speech
• Highest-level models give the lowest bit rates
• Paralinguistic properties of the speech are sacrificed:
  – speaker’s identity
  – state of health
  – emotional/psychological state

SLIDE 24: Summary of coding
• Waveform coders
  – PCM, DPCM, ADPCM
  – Delta modulation
• Vocoders
  – Channel vocoder, RELP, CELP
  – Segment vocoder
• Recognition-synthesis coders
• Trade-offs
