Speech Processing Short-Time Fourier Transform Analysis and Synthesis.

2 June 2016Veton Këpuska2 Short-Time Fourier Transform Analysis and Synthesis Minimum-Phase Synthesis
• Speech and audio signals are time-varying and can be considered stochastic signals that carry information.
• This necessitates short-time analysis, since a single Fourier transform (FT) cannot characterize changes in spectral content over time (i.e., time-varying formants and harmonics).
  - The discrete-time short-time Fourier transform (STFT) consists of a separate FT of the signal in the neighborhood of each time instant.
  - When the FT in the STFT analysis is replaced by the discrete FT (DFT), the resulting STFT is discrete in both time and frequency.
  - This discrete STFT is distinct from the discrete-time STFT, which is continuous in frequency.
• In linear prediction and homomorphic processing, an underlying source/filter model is assumed. This leads to model-based analysis/synthesis. Note also that both of those analysis methods implicitly used short-time analysis (to be presented).
• In short-time analysis systems no such model restriction applies.

2 June 2016Veton Këpuska3 Short-Time Analysis (STFT)
• Two views of the STFT are explored:
  1. Fourier-transform view
  2. Filter-bank view

2 June 2016Veton Këpuska4 Fourier-Transform View
• Recall (from Chapter 3):
$$X(n,\omega) = \sum_{m=-\infty}^{\infty} x[m]\,w[n-m]\,e^{-j\omega m}$$
• w[n] is a finite-length, symmetric sequence (i.e., a window) of length N_w, with w[n] ≠ 0 for n in [0, N_w-1].
• w[n] is called the analysis window or analysis filter.

2 June 2016Veton Këpuska5 Fourier-Transform View
• x[n]: time-domain signal.
• f_n[m] = x[m]w[n-m]: the short-time section of x[m] at point n, that is, the signal at frame n.
• X(n,ω): Fourier transform of the short-time windowed section f_n[m].
• Computing the DFT:
$$X(n,k) = X(n,\omega)\big|_{\omega = \frac{2\pi}{N}k}$$

2 June 2016Veton Këpuska6 Fourier-Transform View
• Thus X(n,k) is the STFT sampled at every ω = (2π/N)k.
  - Frequency sampling interval = 2π/N
  - Frequency sampling factor = N
• DFT:
$$X(n,k) = \sum_{m=-\infty}^{\infty} x[m]\,w[n-m]\,e^{-j\frac{2\pi}{N}km}$$
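As a minimal sketch of the discrete STFT, the sum X(n,k) = Σ_m x[m] w[n-m] e^{-j(2π/N)km} can be evaluated directly from the short-time sections f_n[m] = x[m]w[n-m]. The function name `discrete_stft`, its argument names, and the choice to treat x[n] as zero outside its stored samples are assumptions made for illustration; they are not part of the slides.

```python
import numpy as np

def discrete_stft(x, w, N, L=1):
    """Discrete STFT X(pL, k) = sum_m x[m] w[pL - m] exp(-j 2 pi k m / N).

    x : signal, assumed zero outside 0..len(x)-1
    w : analysis window, defined on 0..len(w)-1
    N : DFT length (frequency sampling factor)
    L : temporal decimation factor
    Returns an array of shape (num_frames, N); row p holds X(pL, k).
    """
    x = np.asarray(x, dtype=complex)
    w = np.asarray(w, dtype=float)
    Nw = len(w)
    n_last = len(x) + Nw - 2            # last n for which w[n-m] overlaps x
    frames = range(0, n_last + 1, L)
    k = np.arange(N)
    X = np.zeros((len(frames), N), dtype=complex)
    for p, n in enumerate(frames):
        m_lo = max(0, n - Nw + 1)       # w[n-m] is defined for 0 <= n-m <= Nw-1
        m_hi = min(len(x) - 1, n)
        m = np.arange(m_lo, m_hi + 1)
        f_n = x[m] * w[n - m]           # short-time section f_n[m]
        X[p] = np.exp(-2j * np.pi * np.outer(k, m) / N) @ f_n
    return X
```

With L = 1 the output has len(x) + len(w) - 1 frames along n; for a unit impulse input, every frequency bin of frame n simply equals w[n].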

2 June 2016Veton Këpuska7 Fourier-Transform View

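2 June 2016Veton Këpuska8 Example 7.1
• Let x[n] be a periodic impulse train sequence:
$$x[n] = \sum_{l=-\infty}^{\infty} \delta[n - lP]$$
• Also let w[n] be a triangular window of length P.
[Figure: impulse train x[n] with impulses at n = ..., -P, 0, P, 2P, 3P, ...; triangular window w[n] spanning P points]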

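2 June 2016Veton Këpuska9 Example 7.1
$$X(n,\omega) = \sum_{m=-\infty}^{\infty}\Big[\sum_{l=-\infty}^{\infty}\delta[m-lP]\Big]\,w[n-m]\,e^{-j\omega m} = \sum_{l=-\infty}^{\infty} w[n-lP]\,e^{-j\omega lP}$$
• The inner sum is non-zero only for m = lP.
• Each term is a window located at lP with linear phase -ωlP.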

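2 June 2016Veton Këpuska10 Example 7.1
• Since the shifted windows w[n-lP] do not overlap, |X(n,ω)| is constant in ω and ∠X(n,ω) is linear in ω.
• Computing the DFT for N = P gives:
$$X(n,k) = \sum_{l=-\infty}^{\infty} w[n-lP]\,e^{-j\frac{2\pi}{N}klP} = \sum_{l=-\infty}^{\infty} w[n-lP]$$
  since e^{-j2πkl} = 1. The DFT is a sum of translated, non-overlapping windows with a phase shift of zero (due to the frequency sampling).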

2 June 2016Veton Këpuska11 Spectrogram |X(n,ω)|²
• If the analysis window length is ≤ the pitch period ⇒ wideband spectrogram ⇒ vertical striations.
• Otherwise ⇒ narrowband spectrogram ⇒ horizontal striations.
• How often should the analysis window be applied to the signal? X(n,k) is decimated by a temporal decimation factor L:
  - X(nL,k) = DFT{f_nL[m]}
  - The sections f_nL[m] are a subset of the sections f_n[m].
• How to choose the sampling rates in time (L) and in frequency (N, the FFT length) is addressed in a later section.

2 June 2016Veton Këpuska12
[Figure: analysis window w[pL-m] sliding over x[m] in steps of L samples, shown for p = 1, 2, 3]

2 June 2016Veton Këpuska13 Spectrogram |X(n,ω)|²

2 June 2016Veton Këpuska14 Fourier-Transform View
• As a function of ω, X(n,ω) is periodic with period 2π (like the Fourier transform) and, for real sequences, is Hermitian symmetric:
  - Re{X(n,ω)} and |X(n,ω)| are symmetric.
  - Im{X(n,ω)} and arg{X(n,ω)} are anti-symmetric.
• A time shift results in a linear phase shift (as in the Fourier transform):
$$x[n-n_0] \;\longleftrightarrow\; e^{-j\omega n_0}\,X(n-n_0,\omega)$$
• Thus a shift by n_0 in the original time sequence introduces a linear phase, but also a shift in time, corresponding to a shift of each short-time section by n_0.

2 June 2016Veton Këpuska15 Filtering View
• In this interpretation w[n] is the impulse response of a filter, so w[n] is referred to as the analysis filter.
• Fix the value of ω = ω_0:
$$X(n,\omega_0) = \sum_{m=-\infty}^{\infty} \big(x[m]\,e^{-j\omega_0 m}\big)\,w[n-m]$$
• This equation is the convolution of the sequence x[n]e^{-jω_0 n} with the sequence w[n]. Thus:
$$X(n,\omega_0) = \big(x[n]\,e^{-j\omega_0 n}\big) * w[n]$$

2 June 2016Veton Këpuska16 Filtering View
• The product x[n]e^{-jω_0 n} is a modulation of x[n] that shifts its spectral content at frequency ω_0 down to the origin.

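2 June 2016Veton Këpuska17 Filtering View
• Alternate view:
$$X(n,\omega_0) = e^{-j\omega_0 n}\big(x[n] * w[n]\,e^{j\omega_0 n}\big)$$
• The discrete STFT can also be interpreted from the filtering viewpoint:
$$X(n,k) = e^{-j\frac{2\pi}{N}kn}\big(x[n] * w[n]\,e^{j\frac{2\pi}{N}kn}\big)$$
• This equation gives the interpretation of the discrete STFT as the output of the filter bank shown in the next slide.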
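The equivalence of the two views can be checked numerically. The sketch below computes one STFT channel twice: once as a bandpass filter h_k[n] = w[n]e^{j(2π/N)kn} followed by demodulation with e^{-j(2π/N)kn}, and once as the direct sum Σ_m x[m] w[n-m] e^{-j(2π/N)km}. Both function names are hypothetical, and x[n] is assumed to be zero outside its stored samples.

```python
import numpy as np

def stft_channel_filterbank(x, w, k, N):
    """k-th STFT channel via the filtering view: bandpass filter, then demodulate."""
    x = np.asarray(x, dtype=complex)
    w = np.asarray(w, dtype=float)
    n_w = np.arange(len(w))
    h_k = w * np.exp(2j * np.pi * k * n_w / N)   # bandpass filter h_k[n]
    y = np.convolve(x, h_k)                       # length len(x) + len(w) - 1
    n = np.arange(len(y))
    return np.exp(-2j * np.pi * k * n / N) * y    # shift the band down to the origin

def stft_channel_direct(x, w, k, N):
    """Same channel from the Fourier-transform view (direct sum over m)."""
    x = np.asarray(x, dtype=complex)
    out = np.zeros(len(x) + len(w) - 1, dtype=complex)
    for n in range(len(out)):
        for m in range(len(x)):
            if 0 <= n - m < len(w):
                out[n] += x[m] * w[n - m] * np.exp(-2j * np.pi * k * m / N)
    return out
```

The two functions should agree to within floating-point error for any channel index k.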

2 June 2016Veton Këpuska18 Filtering View

2 June 2016Veton Këpuska19 Filtering View
• General properties:
  1. If x[n] has length N and w[n] has length M, then X(n,ω) has length N+M-1 along n.
  2. The bandwidth of X(n,ω_0) is less than or equal to that of w[n].
  3. The sequence X(n,ω_0) has its spectrum centered at the origin.

2 June 2016Veton Këpuska20 Example 7.2
• Consider a Gaussian window w[n]. [Equation: explicit form of the Gaussian window]
• The discrete STFT with DFT length N can therefore be considered a bank of filters with impulse responses:
$$h_k[n] = w[n]\,e^{j\frac{2\pi}{N}kn}$$
• For x[n] = δ[n]: x[n] * h_k[n] = h_k[n].
• If N = 50, corresponding to bandpass filters spaced by 200 Hz at a sampling rate of 10000 samples/s, then:
$$h_k[n] = w[n]\,e^{j\frac{2\pi}{50}kn}$$
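The 200 Hz spacing follows from fs/N = 10000/50. A small sketch can confirm where each bandpass filter h_k[n] = w[n]e^{j(2π/N)kn} is centered. The Gaussian shape used here (`length`, width parameter `a`, centered in the window) is purely an assumption, since the slide's window equation is not available; only N = 50 and fs = 10000 come from the example.

```python
import numpy as np

def gaussian_bandpass(k, N=50, length=101, a=0.005):
    """h_k[n] = w[n] exp(j 2 pi k n / N) for an assumed Gaussian window w[n]."""
    n = np.arange(length)
    w = np.exp(-a * (n - (length - 1) / 2) ** 2)   # hypothetical Gaussian window
    return w * np.exp(2j * np.pi * k * n / N)

def peak_frequency_hz(h, fs=10000, nfft=500):
    """Frequency (Hz) of the largest-magnitude bin of an nfft-point DFT of h."""
    return np.argmax(np.abs(np.fft.fft(h, nfft))) * fs / nfft
```

For k = 0, 5, 10, 15 the peaks fall at k·fs/N, i.e., 0, 1000, 2000, and 3000 Hz.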

2 June 2016Veton Këpuska21 Example 7.2  For k=0,5,10,15 the following is obtained:

2 June 2016Veton Këpuska22 Example 7.2

2 June 2016Veton Këpuska23 Example 7.3
• Consider the filter bank of Example 7.2, which was designed with a Gaussian window. [Equation: explicit form of the Gaussian window]
• Figure 7.7 shows the Fourier transform magnitudes of the outputs of the four complex bandpass filters h_k[n] for k = 0, 5, 10, and 15, as presented in the previous slide and depicted in Figure 7.6.

2 June 2016Veton Këpuska24 Example 7.3
• After demodulation, the resulting bandpass outputs have the same spectral shape as in the figure, but centered at the origin.

2 June 2016Veton Këpuska25 Time-Frequency Resolution Tradeoffs
• As noted in Chapter 3, the basic issue in analysis-window selection is the compromise between a long window, which shows signal detail in frequency, and a short window, which represents fine temporal structure:
$$X(n,\omega) = \frac{1}{2\pi}\int_{-\pi}^{\pi} W(\theta)\,e^{j\theta n}\,X(\omega+\theta)\,d\theta$$
  - Since both X(ω) and W(ω) are periodic over 2π, this linear convolution is essentially circular.
• From the equation above:
  - W(ω) smears (smooths) X(ω).
  - For good frequency resolution W(ω) should be as narrow as possible, ideally W(ω) = δ(ω).
  - W(ω) = δ(ω) corresponds to an infinitely long w[n], which gives poor time resolution.
  - These are conflicting goals.

2 June 2016Veton Këpuska26 Example 7.4  Figure 7.8 depicts time-frequency resolution tradeoff:

2 June 2016Veton Këpuska27 Time-Frequency Resolution Tradeoffs
• As the previous example shows, the smoothing interpretation of the STFT is not valid for non-stationary sequences.
• For steady signals, long analysis windows are appropriate and yield good frequency resolution, as depicted in the next figure.

2 June 2016Veton Këpuska28 Time-Frequency Resolution Tradeoffs
• However, for short and transient signals (plosive speech, flaps, diphthongs, etc.), short windows are preferred in order to capture temporal events.
• Shorter windows yield poor frequency resolution.

2 June 2016Veton Këpuska29 Short-Time Synthesis
• How can the original sequence be obtained from its discrete-time STFT?
• The inversion is represented mathematically by a synthesis equation, which expresses a sequence in terms of its discrete-time STFT.
• Recall that for f_n[m] = x[m]w[n-m]:
$$f_n[m] = \frac{1}{2\pi}\int_{-\pi}^{\pi} X(n,\omega)\,e^{j\omega m}\,d\omega$$
• Thus:
$$x[m] = \frac{f_n[m]}{w[n-m]}$$
  If w[n-m] ≠ 0, recovery is complete.

2 June 2016Veton Këpuska30 Short-Time Synthesis
• For each n, taking the inverse Fourier transform of X(n,ω) as a function of frequency yields the sequence f_n[m].
• Evaluating f_n[m] at m = n gives x[n]w[0]. For w[0] ≠ 0, x[n] is obtained as f_n[n]/w[0].
• Taking the inverse Fourier transform of X(n,ω) for a specific n and then dividing by w[0] gives the synthesis equation for the discrete-time STFT:
$$x[n] = \frac{1}{2\pi\,w[0]}\int_{-\pi}^{\pi} X(n,\omega)\,e^{j\omega n}\,d\omega$$

2 June 2016Veton Këpuska31 Short-Time Synthesis
• In contrast to the discrete-time STFT X(n,ω), the discrete STFT X(n,k) is not always invertible.
• Example 1. Consider the case when w[n] is bandlimited with bandwidth B.

2 June 2016Veton Këpuska32 Short-Time Synthesis
• If some frequency components of x[n] do not pass through any of the filter regions of the discrete STFT, then the discrete STFT is not a unique representation of x[n], and x[n] cannot be recovered from it.
• Example 2. Consider X(n,k) decimated in time by a factor L, i.e., the STFT is computed every L samples, with w[n] non-zero over its length N_w.
  - If L > N_w, there are gaps in time where x[n] is not represented.
  - In such cases x[n] again cannot be recovered.

2 June 2016Veton Këpuska33
[Figure: the case L > N_w; windows w[pL-m] of length N_w spaced L samples apart leave gaps over x[m]]

2 June 2016Veton Këpuska34 Short-Time Synthesis
• Conclusion: constraints must be adopted to ensure uniqueness and invertibility:
  1. Proper (adequate) frequency sampling: B ≥ 2π/N_w (B is the window bandwidth)
  2. Proper temporal decimation: L ≤ N_w

2 June 2016Veton Këpuska35 Filter Bank Summation (FBS) Method
• The traditional short-time synthesis method is commonly referred to as the Filter Bank Summation (FBS) method.
• FBS is best described in terms of the filtering interpretation of the discrete STFT:
  - The discrete STFT is considered to be the set of outputs of a bank of filters.
  - The output of each filter is modulated with a complex exponential.
  - The modulated filter outputs are summed at each instant of time to obtain the corresponding time sample of the original sequence (see Figure 7.5(b) in slide 18).

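2 June 2016Veton Këpuska36 Filter Bank Summation (FBS) Method
• Recall the synthesis equation given earlier:
$$x[n] = \frac{1}{2\pi\,w[0]}\int_{-\pi}^{\pi} X(n,\omega)\,e^{j\omega n}\,d\omega$$
• The FBS method carries out a discrete version of this equation using the discrete STFT X(n,k):
$$y[n] = \frac{1}{N\,w[0]}\sum_{k=0}^{N-1} X(n,k)\,e^{j\frac{2\pi}{N}nk}$$
• The goal is to derive conditions that ensure y[n] = x[n].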

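2 June 2016Veton Këpuska37 Filter Bank Summation (FBS) Method
• From Figure 7.5 (analysis followed by synthesis, x[n] → y[n]):
$$X(n,k) = \sum_{m=-\infty}^{\infty} x[m]\,w[n-m]\,e^{-j\frac{2\pi}{N}km}$$
• Thus:
$$y[n] = \frac{1}{N\,w[0]}\sum_{k=0}^{N-1}\Big[\sum_{m=-\infty}^{\infty} x[m]\,w[n-m]\,e^{-j\frac{2\pi}{N}km}\Big]e^{j\frac{2\pi}{N}nk}$$
• Interchanging the summations, this equation reduces to:
$$y[n] = \frac{1}{N\,w[0]}\sum_{m=-\infty}^{\infty} x[m]\,w[n-m]\sum_{k=0}^{N-1} e^{j\frac{2\pi}{N}k(n-m)}$$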

2 June 2016Veton Këpuska38 Filter Bank Summation (FBS) Method
• Furthermore:
$$\sum_{k=0}^{N-1} e^{j\frac{2\pi}{N}kn} = N\sum_{r=-\infty}^{\infty}\delta[n-rN]$$

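2 June 2016Veton Këpuska39 Filter Bank Summation (FBS) Method
• Thus:
$$y[n] = \frac{1}{w[0]}\;x[n] * \Big(w[n]\sum_{r=-\infty}^{\infty}\delta[n-rN]\Big)$$
  y[n] is the output of the convolution of x[n] with the product of the analysis window and a periodic impulse sequence.
• Note: (1/w[0]) w[n] Σ_r δ[n-rN] reduces to δ[n] if:
  - the window length N_w ≤ N, or
  - for N_w > N, w[rN] = 0 for r ≠ 0, that is:
$$w[n]\sum_{r=-\infty}^{\infty}\delta[n-rN] = w[0]\,\delta[n]$$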

2 June 2016Veton Këpuska40 Filter Bank Summation (FBS) Method

2 June 2016Veton Këpuska41 Filter Bank Summation (FBS) Method
• This constraint is known as the FBS constraint.
• It must be fulfilled to ensure exact signal synthesis with the FBS method.
• The constraint is commonly expressed in the frequency domain:
$$\sum_{k=0}^{N-1} W\!\left(\omega - \frac{2\pi}{N}k\right) = N\,w[0]$$
• This expression states that the frequency responses of the analysis filters should sum to a constant across the entire bandwidth.
• We conclude this discussion by stating that a filter bank with N filters, based on an analysis filter of length less than or equal to N, is always an all-pass system.
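The FBS constraint can be illustrated with a small round-trip experiment: analyze with the discrete STFT at every sample (L = 1), then form y[n] = (1/(N·w[0])) Σ_k X(n,k) e^{j(2π/N)nk}. The helper name `fbs_roundtrip` is hypothetical, and x[n] is assumed to be zero for n < 0; this is a sketch of the idea, not an efficient implementation.

```python
import numpy as np

def fbs_roundtrip(x, w, N):
    """Analyze x with the discrete STFT (L = 1), then resynthesize with FBS."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    Nw = len(w)
    k = np.arange(N)
    y = np.zeros(len(x), dtype=complex)
    for n in range(len(x)):
        m = np.arange(max(0, n - Nw + 1), n + 1)
        f_n = x[m] * w[n - m]                                   # short-time section
        X_nk = np.exp(-2j * np.pi * np.outer(k, m) / N) @ f_n   # X(n, k)
        # Sum the modulated channel outputs and normalize by N * w[0]
        y[n] = np.sum(X_nk * np.exp(2j * np.pi * n * k / N)) / (N * w[0])
    return y.real
```

With an 8-point Hamming window, N = 8 satisfies N_w ≤ N and the input is recovered; N = 4 violates the constraint (w[4] ≠ 0), so a scaled copy of x[n-4] leaks into the output.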

2 June 2016Veton Këpuska42 Generalized FBS Method
• Note:
$$x[n] = \frac{1}{2\pi}\int_{-\pi}^{\pi}\Big[\sum_{r=-\infty}^{\infty} f[n,n-r]\,X(r,\omega)\Big]e^{j\omega n}\,d\omega$$
• The "smoothing" function f[n,m] is referred to as the time-varying synthesis filter.
• It can be shown that any f[n,m] that fulfills the condition below makes the synthesis equation above valid (Exercise 7.6):
$$\sum_{m=-\infty}^{\infty} f[n,-m]\,w[m] = 1$$
• Note also that the basic FBS method is obtained by setting the synthesis filter to be a non-smoothing filter: f[n,m] = δ[m]

2 June 2016Veton Këpuska43 Generalized FBS Method
• Consider the discrete STFT with decimation factor L. The generalized FBS synthesized signal is given by: [Equation: generalized FBS synthesis with decimation factor L]
• Furthermore, consider a time-invariant smoothing filter: f[n,m] = f[m]
• That is: f[n,n-rL] = f[n-rL]

2 June 2016Veton Këpuska44 Generalized FBS Method
• Thus: [Equation: generalized FBS synthesis with the time-invariant filter f[n-rL]]
• This equation holds when the following constraint is satisfied by the analysis and synthesis filters as well as the temporal decimation and frequency sampling factors: [Equation: generalized FBS constraint]
• For f[m] = δ[m] and L = 1 this method reduces to the basic FBS method.

2 June 2016Veton Këpuska45 Generalized FBS Method
• Of interest is the case L > 1, using f[n] as an interpolator.
• Interpolation FBS methods:
  1. Helical interpolation (Portnoff)
  2. Weighted overlap-add method (Crochiere)

2 June 2016Veton Këpuska46 Overlap-Add (OLA) Method
• The FBS method was motivated by the filtering view of the STFT.
• The OLA method was motivated by the Fourier-transform view of the STFT.
• In the OLA method:
  1. The inverse DFT is taken for each fixed time in the discrete STFT.
  2. An overlap-and-add operation between the short-time sections is performed.
• This works provided that the analysis window is designed such that the overlap-and-add operation effectively eliminates the analysis window from the synthesized sequence.
• The basic idea is that the redundancy within overlapping segments, and the averaging of the redundant samples, removes the effect of windowing.

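2 June 2016Veton Këpuska47 Overlap-Add (OLA) Method
• Recall the short-time synthesis relation:
$$x[n] = \frac{1}{2\pi\,w[0]}\int_{-\pi}^{\pi} X(n,\omega)\,e^{j\omega n}\,d\omega$$
• If x[n] is instead averaged over many short-time segments and normalized by W(0), then
$$x[n] = \frac{1}{2\pi\,W(0)}\int_{-\pi}^{\pi}\sum_{p=-\infty}^{\infty} X(p,\omega)\,e^{j\omega n}\,d\omega$$
  where
$$W(0) = \sum_{n=-\infty}^{\infty} w[n]$$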

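2 June 2016Veton Këpuska48 Overlap-Add (OLA) Method
• The discretized version of OLA is given by:
$$y[n] = \frac{1}{W(0)}\sum_{p=-\infty}^{\infty}\Big[\frac{1}{N}\sum_{k=0}^{N-1} X(p,k)\,e^{j\frac{2\pi}{N}kn}\Big]$$
• The inverse DFT in brackets equals f_p[n] = x[n]w[p-n], provided that N > N_w. The expression for y[n] thus becomes:
$$y[n] = x[n]\,\frac{1}{W(0)}\sum_{p=-\infty}^{\infty} w[p-n]$$
• Provided that
$$\sum_{p=-\infty}^{\infty} w[p-n] = W(0)$$
  we have y[n] = x[n]. This condition is always true, because the sum of the values of a sequence always equals the value of its Fourier transform at ω = 0 (the D.C. value of a signal is by definition the sum of its samples).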

2 June 2016Veton Këpuska49 Overlap-Add (OLA) Method
• For decimation in time by a factor L, it can be shown (Exercise 7.4) that if:
$$\sum_{p=-\infty}^{\infty} w[pL-n] = \frac{W(0)}{L}$$
• then x[n] can be synthesized using:
$$x[n] = \frac{L}{W(0)}\sum_{p=-\infty}^{\infty}\Big[\frac{1}{N}\sum_{k=0}^{N-1} X(pL,k)\,e^{j\frac{2\pi}{N}kn}\Big]$$
• The first equation is the general constraint imposed by the OLA method: the sum of all the analysis windows (obtained by sliding w[n] in L-point increments) must add up to a constant, as shown in the next figure.
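A minimal sketch of OLA synthesis under the constraint Σ_p w[pL-n] = W(0)/L follows. The function name `ola_roundtrip` is hypothetical. Two implementation choices are assumptions rather than part of the slides: each frame's DFT uses a frame-relative phase reference (which cancels between analysis and synthesis), and the signal is zero-padded so that every window position touching x[n] forms a full frame.

```python
import numpy as np

def ola_roundtrip(x, w, N, L):
    """Discrete STFT every L samples, then overlap-add (OLA) synthesis."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    Nw = len(w)
    assert N >= Nw, "DFT length must cover the window to avoid time aliasing"
    # Zero-pad so every window position that touches x is a full frame
    xp = np.concatenate([np.zeros(Nw - 1), x, np.zeros(Nw - 1)])
    y = np.zeros(len(xp))
    for n in range(0, len(x) + Nw - 1, L):        # window positions n = pL
        # Section x[m] w[n-m] for m = n-Nw+1 .. n (xp index = m + Nw - 1)
        seg = xp[n:n + Nw] * w[::-1]
        X = np.fft.fft(seg, N)                    # one frame of the discrete STFT
        y[n:n + Nw] += np.fft.ifft(X).real[:Nw]   # inverse DFT, overlap and add
    y *= L / np.sum(w)                            # multiply by L / W(0)
    return y[Nw - 1:Nw - 1 + len(x)]
```

With a periodic Hann window of length 16, hops of L = 8 or L = 4 satisfy the constraint and the input is recovered; L = 16 leaves the window shape imprinted on the output.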

2 June 2016Veton Këpuska50 Overlap-Add (OLA) Method

2 June 2016Veton Këpuska51 Overlap-Add (OLA) Method
• Duality of the OLA constraint and the FBS constraint:
  - The FBS method requires that finite-length windows have a length N_w less than the number of analysis filters N to satisfy the FBS constraint (N > N_w).
  - Analogously, it can be shown that the OLA constraint is satisfied by all finite-bandwidth analysis windows whose maximum frequency is less than 2π/L (where L is the temporal decimation factor).
  - In addition, this finite-bandwidth constraint can be relaxed by allowing the shifted window transform replicas W(ω - 2πk/L), k ≠ 0, to take on the value zero at the frequency origin ω = 0.
  - This is analogous to the FBS constraint for N_w > N, where the window w[n] is required to take on the value zero at n = N, 2N, 3N, ...

2 June 2016Veton Këpuska52 Overlap-Add (OLA) Method

2 June 2016Veton Këpuska53 Time-Frequency Sampling
• This section gives a different, qualitative view of the time-frequency sampling concepts behind the OLA and FBS constraints, from the perspective of classical time-domain and frequency-domain aliasing.
• The discussion also summarizes the sampling issues for the two methods and motivates the earlier statement that sufficient but not necessary conditions for invertibility of the discrete STFT are:
  1. The analysis window is non-zero over its finite length N_w.
  2. The temporal decimation factor satisfies L ≤ N_w.
  3. The frequency sampling interval satisfies 2π/N ≤ 2π/N_w.

2 June 2016Veton Këpuska54 Time-Frequency Sampling
• Consider the windowed (short-time) signal f_n[m] = x[m]w[n-m], with Fourier transform X(n,ω) and analysis window duration N_w.
• From the Fourier-transform point of view:
  - Reconstruction of f_n[m] from X(n,k) requires a frequency sampling interval of 2π/N_w or finer.
• From the time-domain point of view:
  - The time decimation interval L must meet the Nyquist criterion based on the bandwidth of the window w[n].
  - This implies sampling X(n,k) at a time interval L ≤ 2π/ω_c to avoid frequency-domain aliasing of the time sequence X(n,ω).
  - ω_c is the bandwidth of W(ω), which occupies [-ω_c, ω_c].

2 June 2016Veton Këpuska55 Time-Frequency Sampling

2 June 2016Veton Këpuska56 Time-Frequency Sampling
• Sufficient (but not necessary) conditions for signal reconstruction are:
  1. The window is non-zero over its length N_w.
  2. The temporal decimation factor satisfies L ≤ N_w (L ≤ 2π/ω_c).
  3. The frequency sampling interval satisfies 2π/N ≤ 2π/N_w.
• These conditions avoid aliasing:
  I. in the time domain, by ensuring condition 3;
  II. in the frequency domain, by ensuring condition 2.

2 June 2016Veton Këpuska57 Time Decimation Sampling
• Implications for the use of practical windows:
  I. Rectangular window of length N_w: assuming a bandwidth equal to the extent of the main lobe, B = [-2π/N_w, 2π/N_w] = 4π/N_w ⇒ 50% overlap between windows.
  II. Hamming window of length N_w: bandwidth B = 8π/N_w ⇒ 75% overlap between windows.
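The overlap percentages follow from simple arithmetic if the Nyquist limit on the hop is read as L ≤ 2π/B, with B the full main-lobe width. That reading is an assumption chosen because it reproduces the 50% and 75% figures above; the helper name `max_hop_and_overlap` is likewise hypothetical.

```python
import math

def max_hop_and_overlap(Nw, mainlobe_bandwidth):
    """Largest hop L with L <= 2*pi/B, and the resulting window overlap fraction."""
    # Small epsilon guards against floating-point round-off just below an integer
    L = math.floor(2 * math.pi / mainlobe_bandwidth + 1e-9)
    return L, 1 - L / Nw
```

For N_w = 256, the rectangular window (B = 4π/N_w) gives L = 128 and an overlap of 0.5, while the Hamming window (B = 8π/N_w) gives L = 64 and an overlap of 0.75.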

2 June 2016Veton Këpuska58 Summary
• OLA method (DFT of order N):
  1. No time-domain aliasing occurs if the window length N_w is short enough that 2π/N ≤ 2π/N_w, i.e., N_w ≤ N.
  2. No frequency-domain aliasing occurs if the decimation factor L is small enough that the filter bandwidth satisfies ω_c ≤ 2π/L.
  3. If zeros are allowed in W(ω), then condition 2 can be relaxed. In this case we can under-sample in time and still recover the sequence.

2 June 2016Veton Këpuska59 Summary
• FBS method:
  1. No frequency-domain aliasing occurs if the decimation factor L meets the Nyquist criterion, i.e., L ≤ 2π/ω_c, where ω_c is the bandwidth of w[n].
  2. No time-domain aliasing occurs if 2π/N ≤ 2π/N_w, i.e., N_w ≤ N.
  3. If zeros are allowed in w[n], then condition 2 can be relaxed. In this case we can under-sample in frequency and still recover the sequence.

2 June 2016Veton Këpuska60 Short-Time Fourier Transform Magnitude (STFTM)
• The spectrogram is a major tool in speech applications:
  - The spectrogram is the squared STFT magnitude (STFTM).
  - It has been suggested that the human ear extracts perceptual information strictly from a spectrogram-like representation of speech (J.C. Anderson, "Speech Analysis/Synthesis Based on Perception", PhD Thesis, MIT, 1984).
  - Experienced speech researchers have trained themselves to "read" the spectrogram itself (Victor Zue, MIT).
• This is the primary topic of FIT-ece5528, "Acoustics of American Speech".

2 June 2016Veton Këpuska61 Short-Time Fourier Transform Magnitude (STFTM)
• The STFTM discards phase information, yet it has numerous uses in application areas such as:
  - Time-scale modification
  - Speech enhancement
• In all these applications, estimating the phase of the speech signal is difficult (e.g., because of noise in the signal).
• Furthermore, a number of techniques have been developed to obtain a phase estimate from an STFT magnitude.
• This section introduces the STFTM as an alternative time-frequency signal representation.
• In addition, analysis and synthesis techniques are developed for the STFTM.

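2 June 2016Veton Këpuska62 Short-Time Fourier Transform Magnitude (STFTM)
• Squared-magnitude and autocorrelation relationship:
$$r[n,m] = \frac{1}{2\pi}\int_{-\pi}^{\pi} |X(n,\omega)|^2\,e^{j\omega m}\,d\omega$$
  - m: autocorrelation "lag"
  - r[n,m]: short-time autocorrelation
  - |X(n,ω)|: short-time magnitude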

2 June 2016Veton Këpuska63 Short-Time Fourier Transform Magnitude (STFTM)
• Furthermore, the autocorrelation r[n,m] is given by the convolution of the short-time signal with its time-reversed version: r[n,m] = f_n[m]*f_n[-m], where f_n[m] = x[m]w[n-m].

2 June 2016Veton Këpuska64 Signal Representation
• Under what conditions can the STFTM be used to represent a sequence uniquely?
• Note that |F{x[n]}| = |F{-x[n]}| ⇒ ambiguity; thus the STFTM is not a unique representation in all cases.
• However, by imposing certain mild restrictions on the analysis window and on the signal, unique signal representation is indeed possible with the discrete-time STFTM.

2 June 2016Veton Këpuska65 Signal Representation
• Suppose x[n] is the sum of two signals, x_1[n] and x_2[n], occupying different regions of the n-axis.
• Furthermore, suppose that the gap of zeros between x_1[n] and x_2[n] is large enough that there is no analysis window position for which the corresponding short-time section includes non-zero samples of both x_1[n] and x_2[n].
• Because of the sign ambiguity, the STFTM is the same for x_1[n] + x_2[n], x_1[n] - x_2[n], and -x_1[n] + x_2[n].

2 June 2016Veton Këpuska66 Signal Representation
• Any uniqueness conditions must include a restriction on the length of zero gaps between non-zero portions of the signal x[n].
• Sufficient uniqueness conditions are the following:
  1. The analysis window w[n] is a known sequence of finite length N_w, with no zeros over its duration.
  2. The sequence x[n] is one-sided with at most N_w-2 consecutive zero samples, and the sign of its first non-zero value is known.

2 June 2016Veton Këpuska67 Signal Representation
• If successive STFTM frames correspond to overlapping signal segments, then:
  - If the short-time spectral magnitude of the signal segment at time n is known, the spectral magnitude of the adjacent section at time n+1 must be consistent, in the region of overlap, with the known short-time section.
  ⇒ If the analysis window is non-zero and of length N_w, then after dividing out the analysis window, the first N_w-1 samples of the segment at time n+1 must equal the last N_w-1 samples of the segment at time n (as illustrated in the next slide).
  ⇒ If the last sample of a segment can be extrapolated from its first N_w-1 values, this process can be repeated to obtain the entire signal x[n].

2 June 2016Veton Këpuska68 Signal Representation

2 June 2016Veton Këpuska69 Signal Representation
• To develop the procedure for extrapolating the next sample of a sequence from its STFTM, assume that the first N_w-1 samples under the analysis window positioned at time n are known; that is, the sequence x[n] has been obtained up to time n-1 from its STFTM.
• The goal is to compute the sample x[n] from these initial samples and the STFT magnitude |X(n,ω)|, or equivalently r[n,m].

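2 June 2016Veton Këpuska70 Signal Representation
• Note that r[n,N_w-1], the autocorrelation at the maximum lag, is given by the product of the first and last values of the segment:
$$r[n,N_w-1] = \big(w[0]\,x[n]\big)\big(w[N_w-1]\,x[n-(N_w-1)]\big)$$
  ⇒
$$x[n] = \frac{r[n,N_w-1]}{w[0]\,w[N_w-1]\,x[n-(N_w-1)]}$$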

2 June 2016Veton Këpuska71 Signal Representation
• Note that: [Equation]
• If the first value of the short-time section, x[n-(N_w-1)], happens to be zero, we must find the first non-zero value within the section and again use the product relation, as in the last expression.
• Such a sample can always be found, because it was assumed that there are at most N_w-2 consecutive zero samples between any two non-zero samples of x[n].

2 June 2016Veton Këpuska72 Signal Representation
• Sequential extrapolation algorithm:
  1. Initialize with x[0].
  2. Update the time n.
  3. Compute r[n,N_w-1] from the inverse DFT of |X(n,k)|².
  4. Compute:
$$x[n] = \frac{r[n,N_w-1]}{w[0]\,w[N_w-1]\,x[n-(N_w-1)]}$$
  5. Return to step 2 and repeat.
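The steps above can be sketched in a few lines. Several simplifications here are assumptions rather than part of the slides: x[n] is taken to have no zero samples at all (so the first sample of each section can always serve as the divisor), x[0] is known exactly rather than only its sign, and the DFT length satisfies N ≥ 2N_w-1 so the inverse DFT of |X(n,k)|² gives an unaliased autocorrelation. For n < N_w-1 the section starts at x[0], so the largest usable lag is n instead of N_w-1. The function names are hypothetical.

```python
import numpy as np

def stftm_frames(x, w, N):
    """|X(n,k)|^2 for n = 0..len(x)-1, with x[n] = 0 for n < 0."""
    Nw = len(w)
    xp = np.concatenate([np.zeros(Nw - 1), x])
    return np.array([np.abs(np.fft.fft(xp[n:n + Nw] * w[::-1], N)) ** 2
                     for n in range(len(x))])

def sequential_extrapolation(mag2, w, x0):
    """Rebuild x[n] from |X(n,k)|^2, the window w, and the known first sample x0."""
    Nw = len(w)
    x = np.zeros(len(mag2))
    x[0] = x0
    for n in range(1, len(mag2)):
        r = np.fft.ifft(mag2[n]).real      # short-time autocorrelation r[n, m]
        lag = min(n, Nw - 1)               # largest lag with a non-zero product
        # r[n, lag] = (w[0] x[n]) * (w[lag] x[n - lag])
        x[n] = r[lag] / (w[0] * w[lag] * x[n - lag])
    return x
```

Because each new sample is obtained by dividing by a previously recovered one, errors can accumulate over long signals; the sketch is meant only to show the mechanics of the recursion.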

2 June 2016Veton Këpuska73 Reconstruction from Time-Frequency Samples
• To carry out STFTM analysis on a digital computer, the discrete STFTM must be used.
• The uniqueness theory of the STFTM extends easily to the discrete STFTM:
  - Uniqueness of the STFTM is based on the short-time autocorrelation functions.
  - The autocorrelation functions can be obtained even if the STFTM is sampled in frequency (discrete STFTM), provided the frequency sampling is adequate.
• To consider the effect of temporal decimation by a factor L, note that adjacent short-time sections now have an overlap of N_w-L instead of N_w-1.

2 June 2016Veton Këpuska74 Reconstruction from Time-Frequency Samples
• Sufficient uniqueness conditions for the partial-overlap case:
  1. The analysis window w[n] is a known sequence of finite length N_w, with no zeros over its duration.
  2. The sequence x[n] is one-sided with at most N_w-2L consecutive zero samples, and L consecutive samples of x[n] (starting from the first non-zero sample) are known.
• These conditions are sufficient but not necessary.

2 June 2016Veton Këpuska75 Signal Estimation from the Modified STFT or STFTM
• Synthesis of a signal from a time-frequency function, a modified STFT or STFTM, is required in many applications.
• Modifications may arise from:
  1. Quantization errors (e.g., from speech coding)
  2. Time-varying filtering
  3. Speech enhancement
  4. Signal rate modification
• Limitations:
  - Modifications in frequency should result in time modifications that are restricted to within an analysis window (Figure 7.18, next slide).
  - Overlapping sections must undergo similar modifications (Figure 7.19).

2 June 2016Veton Këpuska76 Signal Estimation from the Modified STFT or STFTM
• Example 7.5. Removal of an interfering tone.
  - Consider modifying a valid X(n,ω) of a short-time segment f_n[m] = x[m]w[n-m] by inserting a zero gap where an unwanted interfering sine-wave component is known to lie.
  - The interfering signal is removed with H(n,ω); the resulting frequency representation is Y(n,ω) = X(n,ω)H(n,ω).
  - The modified short-time sequence g_n[m], obtained by inverse transforming Y(n,ω), is non-zero beyond the extent of the original short-time segment f_n[m] = x[m]w[n-m].

2 June 2016Veton Këpuska77 Signal Estimation from the Modified STFT or STFTM
• Example 7.6.
  - At time n: suppose a time-decimated STFT X(nL,ω) is multiplied by a linear phase factor e^{jωn_0} to obtain Y(nL,ω) = X(nL,ω)e^{jωn_0}.
  - At time n+1: X((n+1)L,ω) is multiplied by the negative of this linear phase factor, e^{-jωn_0}, to obtain Y((n+1)L,ω) = X((n+1)L,ω)e^{-jωn_0}.
  - The overlapping sections of the inverse Fourier transforms, denoted g_nL[m] and g_(n+1)L[m], are then not consistent.

2 June 2016Veton Këpuska78 Heuristic Application of STFT Synthesis Methods
• Although modifications of the STFT or STFTM may violate some principles, the results may still be "reasonable".
• The effect of modifying the STFT with another time-frequency function (for both FBS and OLA) can be shown to be a time-varying convolution between x[n] and a function ĥ[n,m]: x[n]*ĥ[n,m].
• Let X(n,ω) be modified by a function H(n,ω): Y(n,ω) = X(n,ω)H(n,ω)
• This corresponds to a new short-time segment: g_n[m] = f_n[m]*h[n,m]
• h[n,m] is the time-varying system impulse response (Chapter 2).

2 June 2016Veton Këpuska79 Heuristic Application of STFT Synthesis Methods
• Consider the FBS method (discretization in frequency): [Equation: FBS synthesis from Y(n,k)]
• N-point IDFT of H(n,k): [Equation]
• Then the resulting sequence can be written as: [Equation] where [Equation]

2 June 2016Veton Këpuska80 Heuristic Application of STFT Synthesis Methods
• Using the OLA method, it can be shown (see Exercise 7.11) that: [Equation]
• Contrasting FBS with OLA:
  - FBS: multiplication ⇒ instantaneous change
  - OLA: convolution ⇒ smoothing

2 June 2016Veton Këpuska81 Heuristic Application of STFT Synthesis Methods
• Example 7.7. Suppose we want to deliberately introduce reverberation into a signal x[n] by convolution with the filter:
  h[n] = δ[n] + αδ[n-n_0]
  whose Fourier transform is:
  H(ω) = 1 + αe^{-jωn_0}
• The STFT of the resulting signal is given by Y(n,ω) = X(n,ω)H(ω), where [Equation]

2 June 2016Veton Këpuska82 Example 7.7 (cont.)
• Using the OLA method (7.21): [Equation]
• It is then possible to express y[n] in terms of the original sequence: [Equation]

2 June 2016Veton Këpuska83 Example 7.7 (cont.)
• Here the filter in the expression for y[n] is the periodic extension of h[n] with period N, of which only the interval [0, N-1] is considered.
• This implies that the intended reverberated signal is obtained only when n_0 < N; otherwise temporal aliasing occurs (as illustrated in Figure 7.20).

2 June 2016Veton Këpuska84 Example 7.7 (cont.)

2 June 2016Veton Këpuska85 Time-Scale Modification and Enhancement of Speech
• The signal construction methods presented in this chapter can be applied in a variety of speech applications.
• Time-scale modification:
  - For speech, the goal is to change the articulation rate (faster or slower) without changing the pitch.

2 June 2016Veton Këpuska86 Time-Scale Modification

2 June 2016Veton Këpuska87 Time-Scale Modification
• Methods:
  - Cut and paste (Fairbanks method):
    - Discard or duplicate frames in order to speed up or slow down the articulation, respectively.
    - Problem: pitch-period mismatch at adjacent frames causes distortion.
  - Pitch-synchronous OLA (Scott & Gerber):
    - Select the frame size and location synchronously with the pitch periods, which avoids the pitch-period mismatch problem.
    - Problem: pitch synchronization is not always easy.
  - STFTM synthesis:
    - To avoid pitch synchronization problems, use only the magnitude of the STFT (i.e., the STFTM):
      1. Compute |X(nL,ω)| at an appropriate frame interval, i.e., decimation rate L (e.g., L = 128 at Fs = 10000 Hz, with N several T0 long).
      2. Replace the decimation rate with a new rate M (e.g., M = L/2, which scales the duration by a factor of ½, i.e., a speed-up): |Y(nM,ω)| = |X(nL,ω)|
      3. Apply the least-squared-error iterative estimation algorithm until |Y(nM,ω)| has converged.
    - Problem: an occasional reverberant characteristic of the synthesized signal is perceived, due to the lack of STFT phase control.

2 June 2016Veton Këpuska88 Time-Scale Modification

2 June 2016Veton Këpuska89 Noise Reduction
• A number of techniques have been developed to remove or reduce additive noise.
• The noise-corrupted signal is given by: y[n] = x[n] + b[n]
• STFT synthesis:
  - Subtract the noise spectrum estimate Ŝ_b(ω): [Equation: spectral subtraction]
  - The original phase spectrum ∠Y(nL,ω) is retained, because the phase of the noise cannot in general be reliably estimated.
  - The factor α controls the degree of noise reduction.
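One common form of this subtraction, shown here as a sketch, operates on the squared magnitude of each STFT frame: subtract α times the noise power estimate, floor the result at zero, and reattach the phase of the noisy frame. The slide's exact equation is not available, so the power-domain form, the zero floor, and the function name `spectral_subtract` are all assumptions.

```python
import numpy as np

def spectral_subtract(Y, S_b, alpha=1.0):
    """Subtract alpha * S_b from |Y|^2, floor at zero, keep the phase of Y.

    Y     : one frame of the noisy STFT, Y(nL, k)
    S_b   : noise power spectrum estimate per frequency bin
    alpha : control of the degree of noise reduction
    """
    Y = np.asarray(Y, dtype=complex)
    power = np.maximum(np.abs(Y) ** 2 - alpha * np.asarray(S_b, dtype=float), 0.0)
    return np.sqrt(power) * np.exp(1j * np.angle(Y))
```

The modified frames would then be passed to an STFT synthesis method such as OLA to obtain the enhanced time signal.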

2 June 2016Veton Këpuska90 Noise Reduction
• STFTM synthesis:
  - Ignore the phase and use the sequential extrapolation or least-squared-error estimation method to construct the clean signal.
