
Fundamentals of Multimedia, Chapter 13
Basic Audio Compression Techniques
13.1 ADPCM in Speech Coding
13.2 G.726 ADPCM
13.3 Vocoders
13.4 Further Exploration
Li & Drew © Prentice Hall 2003

13.1 ADPCM in Speech Coding

ADPCM forms the heart of the ITU's speech compression standards G.721, G.723, G.726, and G.727. The differences between these standards involve the bit-rate (from 3 to 5 bits per sample) and some algorithm details. The default input is µ-law coded PCM 16-bit samples.

Fig. 13.1: Waveform of the word "audio": (a) speech sample, linear PCM at 8 kHz and 16 bits per sample; (b) speech sample, restored from G.721-compressed audio at 4 bits/sample; (c) difference signal between (a) and (b).

13.2 G.726 ADPCM

ITU G.726 supersedes the ITU standards G.721 and G.723. Rationale: it works by adapting a fixed quantizer in a simple way. The different sizes of codewords used amount to bit-rates of 16 kbps, 24 kbps, 32 kbps, or 40 kbps, at an 8 kHz sampling rate.

The standard defines a multiplier constant α that changes for every difference value e_n, depending on the current scale of signals. Define a scaled difference signal g_n as follows:

    e_n = s_n − ŝ_n,
    g_n = e_n / α,    (13.1)

where ŝ_n is the predicted signal value. g_n is then sent to the quantizer for quantization.

Fig. 13.2: G.726 quantizer (output level versus scaled input).

The input value is a ratio of a difference with the factor α. By changing α, the quantizer can adapt to changes in the range of the difference signal — a backward adaptive quantizer.

Backward Adaptive Quantizer

A backward adaptive quantizer works in principle by noticing either of two cases:
– too many values are quantized to values far from zero, which would happen if the quantizer step size were too small;
– too many values fall close to zero too much of the time, which would happen if the quantizer step size were too large.

A Jayant quantizer allows one to adapt a backward quantizer step size after receiving just one single output:
– it simply expands the step size if the quantized input is in the outer levels of the quantizer, and reduces the step size if the input is near zero.

The Step Size of the Jayant Quantizer

The Jayant quantizer assigns a multiplier value M_k to each level, with values smaller than unity for levels near zero, and values larger than 1 for the outer levels. For signal f_n, the quantizer step size ∆ is changed according to the quantized value k, for the previous signal value f_{n−1}, by the simple formula

    ∆ ← M_k ∆    (13.2)

Since it is the quantized version of the signal that is driving the change, this is indeed a backward adaptive quantizer.
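As a concrete illustration, here is a minimal Python sketch of a Jayant-style backward-adaptive quantizer. The 8-level midrise layout, initial step size, and multiplier table M are illustrative assumptions, not the values G.726 actually uses:

    import numpy as np

    def jayant_quantize(x, delta=0.5, M=(0.8, 0.9, 1.2, 1.6)):
        # 8-level midrise quantizer: 4 magnitude levels per sign.
        half = len(M)
        codes, rec = [], []
        for sample in x:
            k = min(int(abs(sample) / delta), half - 1)  # magnitude level
            s = 1 if sample >= 0 else -1                 # sign
            codes.append((s, k))
            rec.append(s * (k + 0.5) * delta)            # mid-level reconstruction
            # Jayant update: shrink the step near zero, grow it at outer levels.
            delta *= M[k]
        return codes, np.array(rec)

    # Example: the step size tracks a signal whose amplitude grows 15-fold.
    t = np.linspace(0, 1, 400)
    x = (0.2 + 3 * t) * np.sin(2 * np.pi * 5 * t)
    _, xq = jayant_quantize(x)
    print("SNR:", 10 * np.log10(np.sum(x**2) / np.sum((x - xq)**2)), "dB")

Because the adaptation is driven only by the transmitted level k, a decoder running the same loop stays in step with the encoder without any side information.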

G.726 — Backward Adaptive Jayant Quantizer

G.726 uses fixed quantizer steps based on the logarithm of the input difference signal, e_n divided by α. For the divisor α, define

    β ≡ log₂ α    (13.3)

To handle both small and large difference values, α is divided into:
– a locked part α_L — a scale factor for small difference values;
– an unlocked part α_U — adapts quickly to larger differences.

These correspond to log quantities β_L and β_U, so that

    β = A β_U + (1 − A) β_L    (13.4)

where A changes so that it is about unity for speech, and about zero for voice-band data.

The "unlocked" part adapts via the equations

    α_U ← M_k α_U,
    β_U ← log₂ M_k + β_U,    (13.5)

where M_k is a Jayant multiplier for the k-th level. The locked part is slightly modified from the unlocked part:

    β_L ← (1 − B) β_L + B β_U    (13.6)

where B is a small number, say 2⁻⁶.

The G.726 predictor is quite complicated: it uses a linear combination of six quantized differences and two reconstructed signal values, from the previous six signal values f_n.
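As a rough sketch, the log-domain updates of Eqs. (13.4)–(13.6) might be coded as below; the multiplier M_k and the speech/data indicator A are placeholders here, since the real standard derives them from the quantizer output and a signal classifier:

    import math

    def update_scale(beta_L, beta_U, M_k, A, B=2**-6):
        # One adaptation step for the split scale factor.
        beta_U = math.log2(M_k) + beta_U        # fast, "unlocked" part, Eq. (13.5)
        beta_L = (1 - B) * beta_L + B * beta_U  # slow, "locked" part, Eq. (13.6)
        beta = A * beta_U + (1 - A) * beta_L    # combined log scale, Eq. (13.4)
        return beta_L, beta_U, 2**beta          # alpha = 2**beta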

13.3 Vocoders

Vocoders — voice coders:
– cannot be usefully applied when other analog signals, such as modem signals, are in use;
– are concerned with modeling speech so that the salient features are captured in as few bits as possible;
– use either a model of the speech waveform in time (LPC (Linear Predictive Coding) vocoding), or break down the signal into frequency components and model these (channel vocoders and formant vocoders).

Vocoder simulation of the voice is not very good yet.

Phase Insensitivity

A complete reconstitution of the speech waveform is really unnecessary, perceptually: all that is needed is for the amount of energy at any time to be about right, and the signal will sound about right.

Phase is a shift in the time argument, inside a function of time.
– Suppose we strike a piano key and generate a roughly sinusoidal sound cos(ωt), with ω = 2πf.
– Now if we wait sufficient time to generate a phase shift π/2 and then strike another key, with sound cos(2ωt + π/2), we generate a waveform like the solid line in Fig. 13.3.
– This waveform is the sum cos(ωt) + cos(2ωt + π/2).

Fig. 13.3: Solid line: superposition of two cosines, with a phase shift. Dashed line: no phase shift. The waveforms are very different, yet the sounds are the same, perceptually.

– If we did not wait before striking the second note, then our waveform would be cos(ωt) + cos(2ωt). But perceptually, the two notes would sound the same, even though in actuality they would be shifted in phase.
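A small numerical check of this claim (the frequencies and sampling rate are arbitrary demo choices): the two superpositions below differ markedly as waveforms, yet their magnitude spectra — and hence the energy at each frequency — are essentially identical:

    import numpy as np

    fs, f0 = 8000, 440.0
    t = np.arange(fs) / fs                # one second of samples
    a = np.cos(2*np.pi*f0*t) + np.cos(2*np.pi*2*f0*t + np.pi/2)  # phase shift
    b = np.cos(2*np.pi*f0*t) + np.cos(2*np.pi*2*f0*t)            # no shift

    # Very different waveforms ...
    print("max waveform difference:", np.max(np.abs(a - b)))
    # ... but essentially identical energy at each frequency.
    print("max spectral difference:",
          np.max(np.abs(np.abs(np.fft.rfft(a)) - np.abs(np.fft.rfft(b)))))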

Channel Vocoder

Vocoders can operate at low bit-rates, 1–2 kbps. To do so, a channel vocoder first applies a filter bank to separate out the different frequency components, as in Fig. 13.4.

Fig. 13.4: Channel vocoder. Analysis filters (low-, mid-, and high-frequency) feed a multiplexer, along with the voiced/unvoiced decision and the pitch period; after transmission and demultiplexing, a noise generator (unvoiced) or pulse generator (voiced) drives the matching low-, mid-, and high-frequency synthesis filters.

Due to phase insensitivity (i.e., only the energy is important):
– The waveform is "rectified" to its absolute value.
– The filter bank derives relative power levels for each frequency range.
– (A subband coder would not rectify the signal, and would use wider frequency bands.)

A channel vocoder also analyzes the signal to determine the general pitch of the speech (low — bass, or high — tenor), and also the excitation of the speech.

A channel vocoder applies a vocal tract transfer model to generate a vector of excitation parameters that describe a model of the sound, and also guesses whether the sound is voiced or unvoiced.

Formant Vocoder

Formants: the salient frequency components that are present in a sample of speech, as shown in Fig. 13.5.

Rationale: encode only the most important frequencies.

Fig. 13.5: The solid line shows frequencies present in the first 40 msec of the speech sample in Fig. 13.1. The dashed line shows that while similar frequencies are still present one second later, these frequencies have shifted. (Horizontal axis: frequency, in units of 8,000/32 Hz; vertical axis: absolute value of the coefficient.)

Linear Predictive Coding (LPC)

LPC vocoders extract salient features of speech directly from the waveform, rather than transforming the signal to the frequency domain.

LPC features:
– uses a time-varying model of vocal tract sound generated from a given excitation;
– transmits only a set of parameters modeling the shape and excitation of the vocal tract, not actual signals or differences ⇒ small bit-rate.

About "linear": the speech signal generated by the output vocal tract model is calculated as a function of the current speech output plus a second term linear in previous model coefficients.

LPC Coding Process

LPC starts by deciding whether the current segment is voiced or unvoiced:
– For unvoiced: a wide-band noise generator is used to create sample values f(n) that act as input to the vocal tract simulator.
– For voiced: a pulse train generator creates values f(n).
– Model parameters a_i: calculated by using a least-squares set of equations that minimize the difference between the actual speech and the speech generated by the vocal tract model, excited by the noise or pulse train generators that capture speech parameters.

LPC Coding Process (cont'd)

If the output values generated are s(n), for input values f(n), the output depends on p previous output sample values:

    s(n) = Σ_{i=1}^{p} a_i s(n − i) + G f(n)    (13.7)

where G is the "gain" factor and the a_i are the values in a linear predictor model.

The LP coefficients can be calculated by solving the following minimization problem:

    min E{ [s(n) − Σ_{j=1}^{p} a_j s(n − j)]² }    (13.8)
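Eq. (13.7) describes an all-pole IIR filter, so the synthesis step can be sketched in a few lines of Python; the second-order coefficients and the noise excitation below are made-up illustrative values, not parameters from any standard:

    import numpy as np
    from scipy.signal import lfilter

    def lpc_synthesize(a, G, f):
        # Run excitation f(n) through the all-pole model of Eq. (13.7):
        # s(n) = sum_i a_i s(n-i) + G f(n), i.e. transfer function G / A(z).
        return lfilter([G], np.concatenate(([1.0], -np.asarray(a))), f)

    # Illustrative stable 2nd-order "vocal tract" driven by white noise
    # (an unvoiced excitation).
    rng = np.random.default_rng(0)
    s = lpc_synthesize(a=[1.3, -0.8], G=0.1, f=rng.standard_normal(8000))

For a voiced segment, the excitation f would instead be a pulse train at the pitch period.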

LPC Coding Process (cont'd)

By taking the derivative with respect to each a_i and setting it to zero, we get a set of p equations:

    E{ [s(n) − Σ_{j=1}^{p} a_j s(n − j)] s(n − i) } = 0,  i = 1, …, p    (13.9)

Letting φ(i, j) = E{ s(n − i) s(n − j) }, this becomes the matrix equation

    φ(1,1)  φ(1,2)  …  φ(1,p)     a_1     φ(0,1)
    φ(2,1)  φ(2,2)  …  φ(2,p)  ×  a_2  =  φ(0,2)
      …       …     …    …         …        …
    φ(p,1)  φ(p,2)  …  φ(p,p)     a_p     φ(0,p)    (13.10)

LPC Coding Process (cont'd)

An often-used method to calculate the LP coefficients is the autocorrelation method:

    φ(i, j) = Σ_{n=p}^{N−1} s_w(n − i) s_w(n − j) / Σ_{n=p}^{N−1} s_w²(n),  i = 0, …, p,  j = 1, …, p    (13.11)

where s_w(n) = s(n + m) w(n) is the windowed speech frame starting from time m.

LPC Coding Process (cont'd)

Since φ(i, j) can be defined as φ(i, j) = R(|i − j|), and when R(0) ≥ 0 the matrix {φ(i, j)} is positive symmetric, there exists a fast scheme (the Levinson-Durbin recursion) to calculate the LP coefficients:

    E(0) = R(0), i = 1
    while i ≤ p:
        k_i = [ R(i) − Σ_{j=1}^{i−1} a_j^{(i−1)} R(i − j) ] / E(i − 1)
        a_i^{(i)} = k_i
        for j = 1 to i − 1:
            a_j^{(i)} = a_j^{(i−1)} − k_i a_{i−j}^{(i−1)}
        E(i) = (1 − k_i²) E(i − 1)
        i ← i + 1
    for j = 1 to p:
        a_j = a_j^{(p)}
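The recursion above translates almost line for line into code. A minimal Python sketch, assuming the autocorrelations R(0), …, R(p) have already been computed, e.g. via Eq. (13.11):

    import numpy as np

    def levinson_durbin(R, p):
        # Solve the Toeplitz normal equations for the LP coefficients,
        # given autocorrelations R[0..p].
        a = np.zeros(p + 1)          # a[1..p] hold the current coefficients
        E = R[0]                     # E(0) = R(0)
        for i in range(1, p + 1):
            # Reflection coefficient k_i.
            k = (R[i] - np.dot(a[1:i], R[i-1:0:-1])) / E
            a_prev = a.copy()
            a[i] = k
            for j in range(1, i):
                a[j] = a_prev[j] - k * a_prev[i - j]
            E *= (1.0 - k * k)       # prediction-error energy shrinks per order
        return a[1:], E

    # Tiny sanity check on a synthetic autocorrelation sequence.
    R = np.array([1.0, 0.5, 0.1, -0.05])
    a, E = levinson_durbin(R, p=3)
    print("LP coefficients:", a, " residual energy:", E)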

LPC Coding Process (cont'd)

The gain G can be calculated as:

    G = E{ [s(n) − Σ_{j=1}^{p} a_j s(n − j)]² }
      = E{ [s(n) − Σ_{j=1}^{p} a_j s(n − j)] s(n) }
      = φ(0, 0) − Σ_{j=1}^{p} a_j φ(0, j)    (13.12)

The pitch P of the current speech frame can be extracted by the correlation method, by finding the index of the peak of:

    v(i) = Σ_{n=m}^{N−1+m} s(n) s(n − i) / [ Σ_{n=m}^{N−1+m} s²(n) · Σ_{n=m}^{N−1+m} s²(n − i) ]^{1/2},  i ∈ [P_min, P_max]    (13.13)
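A brute-force implementation of the peak search in Eq. (13.13); the frame position, frame length, and lag range in the example are arbitrary demo values, and a practical coder would post-process the raw peak (for instance, to avoid pitch doubling):

    import numpy as np

    def detect_pitch(s, m, N, P_min, P_max):
        # Pick the lag maximizing the normalized correlation of Eq. (13.13)
        # over the frame s(m)..s(m+N-1).  Assumes s has at least P_max
        # samples of history before index m.
        frame = s[m:m + N]
        best_lag, best_v = P_min, -1.0
        for i in range(P_min, P_max + 1):
            past = s[m - i:m - i + N]
            v = np.dot(frame, past) / np.sqrt(np.dot(frame, frame) *
                                              np.dot(past, past))
            if v > best_v:
                best_lag, best_v = i, v
        return best_lag

    # A 100 Hz pulse train sampled at 8 kHz should give a lag near 80.
    s = np.zeros(2000); s[::80] = 1.0
    print(detect_pitch(s, m=400, N=320, P_min=40, P_max=160))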

Code Excited Linear Prediction (CELP)

CELP is a more complex family of coders that attempts to mitigate the lack of quality of the simple LPC model.

CELP uses a more complex description of the excitation:
– An entire set (a codebook) of excitation vectors is matched to the actual speech, and the index of the best match is sent to the receiver.
– The complexity increases the bit-rate to 4,800–9,600 bps.
– The resulting speech is perceived as being more similar and continuous.
– The quality achieved this way is sufficient for audio conferencing.

The Predictors for CELP

CELP coders contain two kinds of prediction:
– LTP (long-term prediction): tries to reduce redundancy in speech signals by finding the basic periodicity, or pitch, that causes a waveform to more or less repeat.
– STP (short-term prediction): tries to eliminate the redundancy in speech signals by attempting to predict the next sample from several previous ones.

Relationship Between STP and LTP

STP captures the formant structure of the short-term speech spectrum based on only a few samples.

LTP, following STP, recovers the long-term correlation in the speech signal that represents the periodicity in speech, using whole frames or subframes (1/4 of a frame).
– LTP is often implemented as "adaptive codebook searching".

Fig. 13.6 shows the relationship between STP and LTP.

Fig. 13.6: CELP analysis model with adaptive and stochastic codebooks. The original speech s(n) is passed through the weighting filter W(z) to give the weighted speech s_w(n); the adaptive codebook (gain G_a, implementing LTP) and the stochastic codebook (gain G_s) drive the weighted synthesis filter W(z)/A(z) (which includes STP) to produce the weighted synthesized speech ŝ_w(n); the weighted error e_w(n) between the two is minimized.

Adaptive Codebook Searching

Rationale:
– Look in a codebook of waveforms to find one that matches the current subframe.
– Codeword: a shifted speech residue segment, indexed by the lag τ, corresponding to the current speech frame or subframe in the adaptive codebook.
– The gain corresponding to the codeword is denoted g₀.

Open-Loop Codeword Searching

Tries to minimize the long-term prediction error, but not the perceptually weighted reconstructed speech error:

    E(τ) = Σ_{n=0}^{L−1} [s(n) − g₀ s(n − τ)]²    (13.14)

By setting the partial derivative with respect to g₀ to zero, ∂E(τ)/∂g₀ = 0, we get

    g₀ = Σ_{n=0}^{L−1} s(n) s(n − τ) / Σ_{n=0}^{L−1} s²(n − τ)    (13.15)

and hence a minimum summed-error value

    E_min(τ) = Σ_{n=0}^{L−1} s²(n) − [ Σ_{n=0}^{L−1} s(n) s(n − τ) ]² / Σ_{n=0}^{L−1} s²(n − τ)    (13.16)
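Eqs. (13.14)–(13.16) suggest a direct search loop: for each candidate lag τ, compute the optimal gain in closed form and keep the lag with the smallest residual energy. A minimal Python sketch; the subframe length and lag range in the example are made up:

    import numpy as np

    def open_loop_search(s, L, lags):
        # Open-loop adaptive-codebook search over the last L samples of s.
        cur = s[-L:]                        # current subframe
        best = None
        for tau in lags:                    # all lags assumed >= 1
            past = s[-L - tau:-tau]         # subframe shifted back by tau
            num = np.dot(cur, past)
            den = np.dot(past, past)
            if den == 0.0:
                continue
            g0 = num / den                              # Eq. (13.15)
            E_min = np.dot(cur, cur) - num * num / den  # Eq. (13.16)
            if best is None or E_min < best[2]:
                best = (tau, g0, E_min)
        return best                         # (lag, gain, residual energy)

    # A perfectly periodic signal (period 60) yields lag 60, gain 1.
    rng = np.random.default_rng(1)
    sig = np.tile(rng.standard_normal(60), 8)
    print(open_loop_search(sig, L=60, lags=range(20, 120)))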

Closed-Loop Codeword Searching

Closed-loop search is more often used in CELP coders; it is also called analysis-by-synthesis (A-B-S):
– Speech is reconstructed, and the perceptual error for it is minimized via the adaptive codebook search, rather than simply considering sum-of-squares.
– The best candidate in the adaptive codebook is selected to minimize the distortion of the locally reconstructed speech.
– Parameters are found by minimizing a measure of the difference between the original and the reconstructed speech.

Hybrid Excitation Vocoders

Hybrid excitation vocoders differ from CELP in that they use model-based methods to introduce multi-model excitation.

They include two major types:
– MBE (Multi-Band Excitation): a blockwise codec, in which speech analysis is carried out in speech frame units of about 20 msec to 30 msec.
– MELP (Mixed Excitation Linear Prediction): a speech codec that is a new US Federal standard to replace the old LPC-10 (FS1015) standard, with an application focus on very-low-bit-rate safety communications.

MBE Vocoder

MBE utilizes the A-B-S scheme in parameter estimation:
– The parameters, such as the fundamental frequency, spectrum envelope, and sub-band U/V (voiced/unvoiced) decisions, are all obtained via closed-loop searching.
– The criterion of the closed-loop optimization is based on minimizing the perceptually weighted reconstructed speech error, which can be represented in the frequency domain as

    ε = (1/2π) ∫_{−π}^{+π} G(ω) |S_w(ω) − S_wr(ω)|² dω    (13.29)

where S_w(ω) is the original speech short-time spectrum, S_wr(ω) is the reconstructed speech short-time spectrum, and G(ω) is the spectrum of the perceptual weighting filter.

MELP Vocoder

MELP is also based on LPC analysis; it uses a multiband soft-decision model for the excitation signal.

The LP residue is bandpassed, and a voicing strength parameter is estimated for each band. Speech can then be reconstructed by passing the excitation through the LPC synthesis filter.

Differently from MBE, MELP divides the excitation into five fixed bands: 0–500, 500–1,000, 1,000–2,000, 2,000–3,000, and 3,000–4,000 Hz.

MELP Vocoder (cont'd)

A voicing degree parameter is estimated in each band, based on the normalized correlation function of the speech signal and the smoothed, rectified signal in the non-DC band.

Let s_k(n) denote the speech signal in band k, and u_k(n) the DC-removed, smoothed, rectified signal of s_k(n). The correlation function is

    R_x(P) = Σ_{n=0}^{N−1} x(n) x(n + P) / [ Σ_{n=0}^{N−1} x²(n) · Σ_{n=0}^{N−1} x²(n + P) ]^{1/2}    (13.33)

where P is the pitch of the current frame and N is the frame length. The voicing strength for band k is defined as max(R_{s_k}(P), R_{u_k}(P)).

MELP Vocoder (cont'd)

MELP adopts a jittery voiced state to simulate the marginally voiced speech segments, indicated by an aperiodic flag.

The jittery state is determined by the peakiness of the full-wave rectified LP residue e(n):

    peakiness = [ (1/N) Σ_{n=0}^{N−1} e²(n) ]^{1/2} / [ (1/N) Σ_{n=0}^{N−1} |e(n)| ]    (13.34)

If the peakiness is greater than some threshold, the speech frame is then flagged as jittered.
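Eq. (13.34) is simply the ratio of the RMS value to the mean absolute value of the residue. A small sketch with synthetic inputs (the decision threshold is omitted, since the text does not give one):

    import numpy as np

    def peakiness(e):
        # RMS divided by mean absolute value, per Eq. (13.34).
        return np.sqrt(np.mean(e**2)) / np.mean(np.abs(e))

    rng = np.random.default_rng(2)
    print(peakiness(rng.standard_normal(4000)))  # ~1.25 for Gaussian noise
    pulses = np.zeros(4000); pulses[::100] = 1.0
    print(peakiness(pulses))                     # 10.0: a spiky, jittery residue

A smooth, noise-like residue scores low, while a residue dominated by isolated pulses scores high, which is what the jittery-voiced flag looks for.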

13.4 Further Exploration

Link to Further Exploration for Chapter 13.

A comprehensive introduction to speech coding may be found in Spanias's excellent article in the textbook references.

Sun Microsystems, Inc. has made available the C source code for its implementation of standards G.711, G.721, and G.723. The code can be found via the link in the Chapter 13 file in the Further Exploration section of the text website.

The ITU sells its standards; it is linked to from the text website for this chapter.

More information on speech coding may be found in the speech FAQ files, given as links on the website; links to LPC and CELP code are also included.