Microcomputer Systems 2

Digital Systems: Hardware Organization and Design, 9/21/2018

Time Stretching & Pitch Shifting of Audio Signals
Outline: Introduction; Techniques Used for Time Compression/Expansion and Pitch Shifting; Comparison; Timbre and Formants

Outline
- Introduction
  - Frequency Shift vs. Pitch Shift (Audio Examples)
  - Time Compression/Expansion
- Techniques Used for Time Compression/Expansion and Pitch Shifting
  - The Phase Vocoder
  - Related Topics
  - Why Phase
  - Time Domain Harmonic Scaling (TDHS)
  - More recent approaches
- Comparison
  - Which Method to Use
  - Pitch Shifting Considerations
  - Audio Examples
- Timbre and Formants
  - Phase Vocoder and Formants
  - Time Domain Harmonic Scaling and Formants
21 September 2018 Veton Këpuska

Introduction Time Stretching & Pitch Shifting
Time stretching and pitch shifting are two dominant techniques used for speech and sound manipulation. Typical applications entail:
- Changing the speed of playback (altering the length of the signal) without altering the pitch of the voice and/or instruments.
- Changing the pitch of the voice and/or instruments without changing the length of the signal.

Pitch Shifting

As opposed to pitch transposition achieved with a simple sample rate conversion, Pitch Shifting changes the pitch of a signal without changing its length. In practice, this is usually achieved by first changing the length of the sound using one of the methods discussed next and then performing a sample rate conversion to change the pitch.
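The two-step recipe can be illustrated with a minimal MATLAB sketch. It does not implement a time-stretch algorithm: the stretched signal is faked by synthesizing a longer sine at the same frequency, which is what an ideal time stretch of a sine would give. Only the sample rate conversion step is shown, using linear interpolation. The values fs, f0 and alpha are illustrative assumptions.

```matlab
fs    = 8000;      % sample rate in Hz (assumed)
f0    = 220;       % pitch of the test tone in Hz (assumed)
alpha = 1.5;       % desired pitch ratio (assumed)
N     = fs;        % original length: 1 second

% Step 1 (stand-in): an ideal time stretch by alpha keeps f0 but makes
% the signal alpha times longer.
Ns        = round(alpha * N);
stretched = sin(2*pi*f0*(0:Ns-1)'/fs);

% Step 2: sample rate conversion back to N samples by linear interpolation.
% Reading the stretched signal alpha times faster raises the pitch by alpha.
pos     = (0:N-1)' * alpha + 1;          % fractional read positions
pos     = min(pos, Ns);                  % stay inside the stretched signal
shifted = interp1((1:Ns)', stretched, pos, 'linear');

% Check: the spectral peak should sit near alpha*f0.
mag     = abs(fft(shifted));
[~, k]  = max(mag(1:N/2));               % search positive frequencies only
f_peak  = (k-1) * fs / N;
fprintf('length %d -> %d samples, peak near %.0f Hz\n', Ns, N, f_peak);
```

The output has the original length again, while the tone has moved from f0 to roughly alpha*f0.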

Introduction Pitch Shifting is NOT Frequency Shifting:
There is some confusion in terminology in the literature, as Pitch Shifting is often incorrectly called 'Frequency Shifting'. A true Frequency Shift (obtainable by modulating an analytic signal with a complex exponential) shifts the spectrum of a sound by a constant offset, while Pitch Shifting dilates it, preserving the harmonic relationships of the sound. Frequency Shifting yields a metallic, inharmonic sound, which may well be an interesting special effect but is a totally inadequate process for changing the pitch of any harmonic sound except a single sine wave.
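A small numeric sketch of the difference, using made-up partial frequencies (the 100 Hz fundamental, the ratio and the offset are assumptions chosen so that both operations move the fundamental to the same place):

```matlab
f      = 100 * (1:5);      % partials of a harmonic tone, in Hz (assumed)
ratio  = 1.5;              % pitch shifting multiplies every partial
offset = 50;               % frequency shifting adds the same offset, in Hz

pitch_shifted = f * ratio;     % 150 300 450 600 750
freq_shifted  = f + offset;    % 150 250 350 450 550

% Ratio of each partial to the lowest one: integers mean "still harmonic".
harm_pitch = pitch_shifted / pitch_shifted(1);   % 1, 2, 3, 4, 5
harm_freq  = freq_shifted  / freq_shifted(1);    % 1, 1.6667, 2.3333, 3, 3.6667
disp(harm_pitch);
disp(harm_freq);
```

After the frequency shift the partials are no longer integer multiples of the lowest one, which is why the result sounds inharmonic.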

Audio Examples of Pitch Shifting vs. Frequency Shifting
Audio examples (not reproduced here): Original Sound, Pitch Shifted, Frequency Shifted.

Time Compression/Expansion

Time Compression/Expansion, also known as "Time Stretching", is the reciprocal process to Pitch Shifting. It leaves the pitch of the signal intact while changing its speed (tempo). This is useful, for example, when you wish to change the speed of a voiceover without altering the timbre of the voice.

There are several fairly good methods for time compression/expansion and pitch shifting, but most of them do not perform well on all kinds of signals or for every desired shift/stretch ratio. Typically, good algorithms allow pitch shifting by up to 5 semitones on average, or stretching the length by 130%. When time stretching and pitch shifting single-instrument recordings, you might even be able to achieve a 200% time stretch or a one-octave pitch shift with no audible loss in quality.
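To put these limits into numbers, here is a short sketch that converts semitones to a frequency ratio (equal temperament, 12 semitones per octave). It assumes that "stretching by 130%" means the result is 1.3 times as long as the original; the 10 second clip length is made up.

```matlab
semitones = 5;
ratio     = 2^(semitones/12);    % frequency ratio for 5 semitones, about 1.335
octave    = 2^(12/12);           % one octave = 12 semitones = ratio of 2

clip_len    = 10;                        % seconds (assumed)
stretch_pct = 130;                       % percent of the original length (assumed meaning)
new_len     = clip_len * stretch_pct/100;    % 13 seconds

fprintf('5 semitones = ratio %.3f, 10 s at 130%% = %.1f s\n', ratio, new_len);
```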

Time Compression/Expansion of Speech
Typical goals:
- Speed up or slow down a speech signal while maintaining the approximate pitch
Applications:
- Change voice mail playback speed
- Court stenographers: play proceedings back more quickly
- Sound effects
- Etc.

Techniques Used for Time Compression/Expansion & Pitch Shifting
Option 1: Change the sample rate
- If you modify the sample rate, you change the speed, but the pitch changes as well
- Increase sample rate = higher pitch (chipmunk sound)
- Decrease sample rate = lower pitch (drawn-out echo sound)
Option 2: Decimate or interpolate the signal
- If you change the number of samples, the result is the same as modifying the sample rate
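A minimal sketch of Option 2 on a test tone (the sample rate and tone frequency are assumptions). Keeping every second sample and playing the result at the same sample rate halves the duration and doubles the pitch. Real decimation would low-pass filter first to avoid aliasing; that step is skipped here because the tone is far below the new Nyquist frequency.

```matlab
fs = 8000;                              % sample rate in Hz (assumed)
f0 = 200;                               % test tone in Hz (assumed)
x  = sin(2*pi*f0*(0:fs-1)'/fs);         % 1 second of signal
y  = x(1:2:end);                        % keep every second sample

dur_x = length(x)/fs;                   % 1.0 s
dur_y = length(y)/fs;                   % 0.5 s at the same playback rate

% Locate the spectral peak of the decimated signal.
mag    = abs(fft(y));
[~, k] = max(mag(1:floor(length(y)/2)));
f_peak = (k-1) * fs / length(y);        % about 2*f0

fprintf('duration %.1f s -> %.1f s, pitch %d Hz -> %.0f Hz\n', dur_x, dur_y, f0, f_peak);
```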

Option 3: Use more complex methods
- These change the speed of the signal while preserving the pitch
- Short Time Fourier Transform
- Short Time Fourier Transform Magnitude
- Sinusoidal Synthesis
- Linear Prediction Synthesis

Currently, two principal time compression/expansion and pitch shifting schemes are employed in most of today's applications:
- Phase Vocoder
- Time Domain Harmonic Scaling (TDHS)

Phase Vocoder

This method was introduced by Flanagan and Golden in 1966 and digitally implemented by Portnoff ten years later.
- Portnoff, M.R. 1981a. "Short-Time Fourier Analysis of Sampled Speech." IEEE Transactions on Acoustics, Speech and Signal Processing ASSP-29(3):
- Portnoff, M.R. 1981b. "Time-Scale Modification of Speech Based on Short-Time Fourier Analysis." IEEE Transactions on Acoustics, Speech and Signal Processing ASSP-29(3):

The phase vocoder uses a Short Time Fourier Transform (STFT) to convert the audio signal to a complex Fourier representation. Since the STFT returns the frequency domain representation of the signal on a fixed frequency grid, the actual frequencies of the partial bins have to be found by converting the relative phase change between two successive STFT outputs into actual frequency changes. Note that the term 'partial' here has nothing to do with the signal harmonics. In fact, an STFT will never readily give you any information about true harmonics unless you match the STFT length to the fundamental frequency of the signal, and even then the frequency domain resolution is quite different from what our ear and auditory system perceive. The timebase of the signal is changed by calculating the frequency changes in the Fourier domain on a different time basis, and then an inverse STFT (iSTFT) is performed to regain the time domain representation of the signal.
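The phase-to-frequency conversion can be sketched as follows. This is not a full phase vocoder, only the analysis step for one bin: two STFT frames are taken a hop apart, the phase advance expected for the bin centre is subtracted, and the wrapped remainder is turned into a frequency correction. The test tone, FFT size, hop and the hand-built Hann window are all assumptions.

```matlab
fs  = 8000;  N = 1024;  hop = 256;     % sample rate, FFT size, hop (assumed)
f0  = 440;                             % true frequency of the test tone (assumed)
n   = (0:N+hop-1)';
x   = sin(2*pi*f0*n/fs);
w   = 0.5 - 0.5*cos(2*pi*(0:N-1)'/N);  % Hann window, built by hand

X1  = fft(x(1:N) .* w);                % frame at time 0
X2  = fft(x(hop+1:hop+N) .* w);        % frame one hop later

[~, k] = max(abs(X1(1:N/2)));          % strongest bin (1-based index)
bin    = k - 1;                        % 0-based bin number
f_grid = bin * fs / N;                 % frequency of the bin centre

% Phase advance expected if the tone sat exactly on the bin centre.
expected = 2*pi*bin*hop/N;
% Measured advance minus expected advance, wrapped to [-pi, pi).
dphi = angle(X2(k)) - angle(X1(k)) - expected;
dphi = mod(dphi + pi, 2*pi) - pi;

% Convert the phase deviation into a fractional bin offset.
f_est = (bin + dphi*N/(2*pi*hop)) * fs / N;
fprintf('bin centre %.2f Hz, phase-based estimate %.2f Hz\n', f_grid, f_est);
```

The bin centre is several Hz away from the tone, while the phase-based estimate lands much closer. A time-scale change then re-synthesizes these frequencies with a different hop.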

Phase vocoder algorithms are used mainly in scientific and educational software products (to show the use and limitations of the Fourier Transform), but they have gained in popularity over the past few years thanks to improvements that greatly reduce the artifacts of the "original" phase vocoder algorithm. The basic phase vocoder suffers from a severe drawback: it introduces a considerable amount of artifacts, audible as 'smearing' and 'reverberation' (even at low expansion ratios), due to the "non-synchronized vertical coherence of the sine and cosine basis functions" that are used to change the timebase.

Puckette, Laroche and Dolson have shown that the phasiness can be greatly reduced by picking peaks in the Fourier spectrum and keeping the relative phases around the peaks unchanged. Even though this improves the quality considerably, it still renders the result somewhat phasey and diffuse when compared to time domain methods. Current research focuses on improving the phase vocoder by applying intra-frame sinusoidal sweep and ramp rate correction (Bristow-Johnson and Bogdanowicz) and multi-resolution phase vocoder concepts (Bonada).

Links to Publicly Available Vocoders
Pointers - Phase Vocoder:
- The MIT Lab Phase Vocoder
- WaveMasher - GPL/Open Source Phase Vocoder by Kenneth Sturgis
- Sculptor: A Real Time Phase Vocoder by Nick Bailey
- A Phase Vocoder implementation using Matlab
- More reading on the Phase Vocoder
- The IRCAM "Super Phase Vocoder"
- S.M. Bernsee's Pitch Shifting Using The Fourier Transform article (with C code)

Time Domain Harmonic Scaling (TDHS).

This is based on a method proposed by Rabiner and Schafer. It relies heavily on a correct estimate of the fundamental frequency of the sound being processed.
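Since the method depends on the fundamental frequency estimate, here is a minimal sketch of one common way to obtain it, a plain autocorrelation peak search. This is an illustrative assumption, not necessarily the estimator used in the TDHS method itself. The test signal, frame length and 60 to 400 Hz search range are made up.

```matlab
fs = 8000;                              % sample rate in Hz (assumed)
f0 = 125;                               % true fundamental (period = 64 samples)
n  = (0:399)';                          % one 50 ms analysis frame
x  = sin(2*pi*f0*n/fs) + 0.5*sin(2*pi*2*f0*n/fs) + 0.25*sin(2*pi*3*f0*n/fs);

minLag = floor(fs/400);                 % shortest period searched (400 Hz)
maxLag = ceil(fs/60);                   % longest period searched (60 Hz)

r = zeros(maxLag, 1);                   % autocorrelation, indexed by lag
for lag = minLag:maxLag
    r(lag) = sum(x(1:end-lag) .* x(1+lag:end));
end

[~, period] = max(r);                   % lag of the strongest peak, in samples
f0_est = fs / period;
fprintf('estimated period %d samples, f0 = %.1f Hz\n', period, f0_est);
```

A wrong estimate (for example, picking twice the period) would make a time domain method splice the signal at the wrong places, which is why the estimate matters so much.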

Theory Short Time Fourier Transform Methods
- Chapter 7 in our text (Discrete-Time Speech Signal Processing)
- Refer to the in-class notes for the mathematical theory of operation
- I will pick up from where Dr. Këpuska stopped in his notes

How is the Speech/Sound Signal Processed
Link: Ch7-Short-Time_Fourier_Transform_Analysis_and_Synthesis.ppt

Terminology & Basic Idea
- Frame Rate
- Window Size

Short Time Fourier Transform
- Also called the Fairbanks method
- Extract successive short-time segments and then discard the following ones
- Processing chain: STFT -> Decimate Samples -> IFFT -> OLA -> Signal Output

- Frame rate factor: L
- In the frequency domain, after taking the STFT, you get X(nL, ω)
- Form a new signal by Y(nL, ω) = X(snL, ω), where s = compression factor
- Take the inverse Fourier transform
- Use the overlap-and-add (OLA) method to form the new signal
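The steps above can be sketched in a few lines. This is a simplified illustration on a synthetic tone, not the project code shown later in these slides: every s-th frame is kept, transformed, inverted, and overlap-added at the original frame spacing L. The test signal and the hand-built triangular window are assumptions.

```matlab
fs = 8000;                 % sample rate in Hz (assumed)
L  = 100;                  % frame spacing in samples
Nw = 200;                  % window length in samples
s  = 2;                    % compression factor
x  = sin(2*pi*200*(0:fs-1)'/fs);            % 1 s test tone (assumed input)

% Triangular window built by hand (same values as triang(Nw) for even Nw);
% with Nw = 2*L its shifted copies sum to one.
w = 1 - abs(((0:Nw-1)' - (Nw-1)/2) / (Nw/2));

numFrames = floor((length(x) - Nw)/L) + 1;
keep = 1:s:numFrames;                       % frames n = 0, s, 2s, ...
y = zeros((length(keep)-1)*L + Nw, 1);

for m = 1:length(keep)
    first = (keep(m)-1)*L + 1;              % start of input frame s*n
    X     = fft(x(first:first+Nw-1) .* w);  % X(snL, w)
    seg   = real(ifft(X));                  % Y(nL, w) = X(snL, w), back in time
    out   = (m-1)*L;                        % output frames stay L apart
    y(out+1:out+Nw) = y(out+1:out+Nw) + seg;    % overlap and add
end

fprintf('input %d samples, output %d samples\n', length(x), length(y));
```

With s = 2 the output is roughly half as long as the input. Nothing in this sketch lines up the pitch periods of neighbouring frames.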

Figure: original STFT frames X(nL, ω) and the compressed frames Y(nL, ω) = X(2nL, ω).

Figure: original windowed sequence and new sequence.

Problems:
- Pitch synchronization: it is highly likely that the pitch periods will not line up properly

Short Time Fourier Transform Magnitude
- Problems with the STFT method relate directly to the linear phase component of the STFT
- Time shift = phase change
- An alternative approach is to use only the magnitude portion of the STFT: the Short Time Fourier Transform Magnitude (STFTM)

Compression
- With the Fairbanks method, time slices were discarded
- Now we can just compress the time slices
- Form a new signal by |Y(nM, ω)| = |X(nL, ω)|, where M = compression factor = L / speed
- For example, speeding up by two => M = L/2
- Take the inverse Fourier transform
- Use the overlap-and-add method to form the new signal
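A rough sketch of the frame re-spacing idea, under stated assumptions: every analysis frame is kept (spacing L on the input) and overlap-added at the smaller spacing M = L/speed on the output. For simplicity the frames keep their original phase. A true magnitude-only reconstruction would need an iterative phase estimate, which is not attempted here. The test tone and the hand-built window are made up.

```matlab
fs    = 8000;              % sample rate in Hz (assumed)
L     = 100;               % analysis frame spacing in samples
Nw    = 200;               % window length in samples
speed = 2;                 % speed-up factor
M     = L / speed;         % synthesis frame spacing
x     = sin(2*pi*200*(0:fs-1)'/fs);         % 1 s test tone (assumed input)

w  = 1 - abs(((0:Nw-1)' - (Nw-1)/2) / (Nw/2));   % triangular window
W0 = sum(w);                                % window sum, used for gain

numFrames = floor((length(x) - Nw)/L) + 1;
y = zeros((numFrames-1)*M + Nw, 1);

for m = 1:numFrames
    first = (m-1)*L + 1;                    % frames are read L apart ...
    seg   = x(first:first+Nw-1) .* w;
    out   = (m-1)*M;                        % ... and written M apart
    % M/W0 compensates for the heavier overlap at the smaller spacing.
    y(out+1:out+Nw) = y(out+1:out+Nw) + seg * (M/W0);
end

fprintf('input %d samples, output %d samples\n', length(x), length(y));
```

No frames are thrown away, in contrast to the Fairbanks method. The price is that more frames overlap at each output sample.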

Figure: X(nL, ω) and Y(nM, ω) = X(nL, ω) with M = L/2.

Figure: original windowed sequence and new sequence.

Other Methods: Sinusoidal Synthesis (Chapter 9)
- Time-warp the sinewave frequency track and the amplitude function
- This technique has been successful not only with speech but also with music, biological, and mechanical signals
- Problems: it does not maintain the original phase relations and suffers from reverberance
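The time-warping idea can be sketched for a single sinewave. The frequency and amplitude tracks below are invented stand-ins for what a sinusoidal analysis would deliver. The tracks are read at a stretched time axis and the sinewave is re-synthesized by integrating the frequency. A real system would do this for many sinewaves at once.

```matlab
fs  = 8000;                      % sample rate in Hz (assumed)
rho = 2;                         % time-scale factor: 2 = twice as long (assumed)
T   = 1;                         % original duration in seconds

% Assumed analysis result: tracks of one sinewave, sampled every 10 ms.
tt   = (0:0.01:T)';              % track times in seconds
freq = 200 + 200*tt/T;           % frequency track: glide from 200 to 400 Hz
amp  = 1 - 0.5*tt/T;             % amplitude track: fade from 1 to 0.5

% Time-warp: output time t reads the tracks at the original time t/rho.
t = (0:round(rho*T*fs)-1)'/fs;
f = interp1(tt, freq, t/rho, 'linear');
a = interp1(tt, amp,  t/rho, 'linear');

% Integrate the frequency to get the phase, then synthesize.
phase = 2*pi*cumsum(f)/fs;
y     = a .* sin(phase);

fprintf('output: %d samples, %.0f Hz to %.0f Hz\n', length(y), f(1), f(end));
```

The glide covers the same frequencies in twice the time, so the pitch contour is preserved. The phase is rebuilt from the frequency track, so the original phase relations are not kept, which matches the problem noted above.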

Other Methods Linear Prediction Synthesis
- Use homomorphic and linear prediction results to modify the time base
- The book briefly mentions that this is possible, but I ran out of time before I could investigate this process further

Other Methods New Techniques Software
New techniques:
- An Internet search showed several methods that try to improve on what is available now
Software:
- Various software programs will change the speed for you
- Adobe Audition is one of the most all-encompassing right now

Matlab Code - Prepare the Workspace
    %%%%%%%%%%%%%%%%
    % Prepare Workspace
    close all;
    clear all;
    window_size_1 = 200;   % window length in samples
    frame_rate_1 = 100;    % frame spacing in samples
    % Speed-up factor (2 = play back twice as fast)
    speed = 2;

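Matlab Code - Load the Speech Signal
    %%%%%%%%%%%%%%%%
    % Load Data File
    % The 's' flag reads the name as text instead of evaluating it.
    filename = input('Please enter the file name to be used. ', 's');
    % wavread is no longer available in recent MATLAB releases; audioread replaces it.
    [sample_data,sample_rate] = audioread(filename);
    info = audioinfo(filename);
    nbits = info.BitsPerSample;
    loop_time = floor(max(size(sample_data))/frame_rate_1);
    % Zero-pad after the last sample so that the final window is complete.
    sample_data(max(size(sample_data))+1:(loop_time+1)*frame_rate_1)=0;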

Matlab Code - Develop the Window
    %%%%%%%%%%%%%%%%
    % Create Windows
    % File sampled at 10,000 samples/sec, so window_size_1 = 200 samples
    % is a 20 ms window and frame_rate_1 = 100 samples is a 10 ms step.
    % (The variable name below is historical; it does not mean 30 ms.)
    triangle_30ms = triang(window_size_1);
    %triangle_30ms = hamming(window_size_1);
    W0 = sum(triangle_30ms);

Matlab Code - Window the Entire Speech Signal
    %%%%%%%%%%%%%%%%
    % Window the speech
    % Each column holds one windowed frame; frames start every frame_rate_1 samples.
    for i = 0:loop_time-1
        window_data(:,i+1) = sample_data((frame_rate_1*i)+1:((i+2)*frame_rate_1)).*triangle_30ms;
    end

Matlab Code - Perform the Fast Fourier Transform
    %%%%%%%%%%%%%%%%
    % Create FFT
    % 1024-point FFT of each frame (zero-padded from window_size_1 samples)
    for i = 1:loop_time
        window_data_fft(:,i) = fft(window_data(:,i),1024);
    end

Matlab Code - Recreate the Modified Signal
    %%%%%%%%%%%%%%%%
    % Recreate Original Signal
    % Initialize the recreated signals
    reconstructed_signal(1:(loop_time+1)*frame_rate_1)=0;
    real_reconstructed_signal(1:(loop_time+1)*frame_rate_1)=0;
    modified_reconstructed_signal(1:(loop_time+3)*(frame_rate_1/speed))=0;
    modified_reconstructed_signal_compressed(1:(loop_time+3)*(frame_rate_1/speed))=0;

    % Perform the ifft
    for i = 1:loop_time
        recreated_data_ifft(:,i) = ifft(window_data_fft(:,i),1024);
        % Inverse of the magnitude only (phase discarded)
        real_recreated_data_ifft(:,i) = ifft(abs(window_data_fft(:,i)),1024);
        % Keep the first window_size_1 samples and scale by frame_rate_1/W0
        truncated_recreated_data_ifft(:,i) = recreated_data_ifft(1:window_size_1,i).*(frame_rate_1/W0);
        real_truncated_recreated_data_ifft(:,i) = real_recreated_data_ifft(1:window_size_1,i).*(frame_rate_1/W0);
    end

    % Get back to the original signal (overlap and add every frame)
    for i = 0:loop_time-1
        reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) = ...
            reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) + ...
            truncated_recreated_data_ifft(:,i+1)';
        real_reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) = ...
            real_reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) + ...
            real_truncated_recreated_data_ifft(:,i+1)';
    end

    % Get a modified signal by deleting certain parts (STFT)
    % Only every speed-th frame is used; output frames stay frame_rate_1 apart.
    for i = 0:(loop_time-1)/speed
        modified_reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) = ...
            modified_reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) + ...
            real_truncated_recreated_data_ifft(:,i*speed+1)';
    end

    % Initialize the compressed sequence (STFTM)
    modified_reconstructed_signal_compressed(1:frame_rate_1+frame_rate_1/speed+1) = ...
        truncated_recreated_data_ifft(frame_rate_1-frame_rate_1/speed:window_size_1,1)';
    % Get a modified signal by compressing
    % Every frame is used; output frames are frame_rate_1/speed apart.
    for i = 0:(loop_time-2)
        modified_reconstructed_signal_compressed((frame_rate_1/speed*i)+1:(frame_rate_1/speed*i)+window_size_1) = ...
            modified_reconstructed_signal_compressed((frame_rate_1/speed*i)+1:(frame_rate_1/speed*i)+window_size_1) + ...
            real_truncated_recreated_data_ifft(:,i+2)';
    end

Matlab Code - Plot Results
    %%%%%%%%%%%%%%%%
    % Plot Results
    figure;
    subplot(211)
    plot(sample_data)
    title('Original Speech');
    v1=axis;
    hold on;
    subplot(212)
    plot(real(modified_reconstructed_signal))
    title(['STFT Synthesis w/ Speed = ',num2str(speed),'X']);
    v2=axis;
    % Use the same axis limits on both plots
    if speed > 1
        subplot(211); axis(v1)
        subplot(212); axis(v1)
    else
        subplot(211); axis(v2)
        subplot(212); axis(v2)
    end

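Matlab Code - Write Sound Files
    %%%%%%%%%%%%%%%%
    % Write sound files
    % audiowrite replaces wavwrite, which is no longer available in recent MATLAB releases.
    % (:) forces a column so that the data is written as one channel.
    audiowrite('C:\Classes\ECE_5525\tea party fairbanks 2x.wav',real(modified_reconstructed_signal(:)),sample_rate,'BitsPerSample',nbits)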

Examples (sound files not reproduced here)
- Baseline samples: Original File, Sample Rate 2X, Sample Rate .5X, STFT sound file, STFTM sound file
- STFT: Speed 0.5X, Speed 2X, Speed 4X
- STFTM: Speed 0.5X, Speed 2X, Speed 4X

More Results Change in window size
- If the window size becomes too small, a change in pitch will occur
- The window needs to be 2 to 3 pitch periods long
- I generally used 20 to 30 ms windows

More Results Change in frame rate
- If the frame rate (the spacing between frames, in samples) decreases too much, too many samples overlap to give an intelligible signal

More Results Change filter (window) type
- Tried Hamming: not much perceptual difference
- Using the window energy becomes important here
- Frame Rate/W0 is not equal to one
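The last point can be checked numerically. The sketch below builds both windows by hand (so it needs no toolbox; the formulas are the standard symmetric triangular and Hamming definitions) and compares the gain factor Frame Rate/W0 for the window size and frame rate used in the code above.

```matlab
N = 200;                                      % window size in samples
L = 100;                                      % frame rate (spacing) in samples
k = (0:N-1)';

tri = 1 - abs((k - (N-1)/2) / (N/2));         % same values as triang(200)
ham = 0.54 - 0.46*cos(2*pi*k/(N-1));          % same values as hamming(200)

gain_tri = L / sum(tri);                      % equals 1 for the triangular window
gain_ham = L / sum(ham);                      % about 0.93 for the Hamming window

fprintf('L/W0: triangular %.4f, Hamming %.4f\n', gain_tri, gain_ham);
```

With the triangular window the scaling is a no-op, so forgetting it goes unnoticed. With the Hamming window the overlap-added output would come out about 7.5% too loud without it.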

Conclusion Optimum area
- Frame rate is one half of the window size
- Window size needs to be 2 to 3 pitch periods long
- It is possible to easily change the time scale and still maintain the original pitch, although the result is not always natural sounding

Conclusion Further investigation
- What to do when you want to slow down by more than half
  - Using the STFTM means there will be gaps between the sequences
  - Could replicate windowed segments
- Use the other methods to determine quality
  - Implement Sinusoidal Synthesis
  - Implement Linear Predictive Synthesis using linear prediction and homomorphic methods
- Work on synchronizing pitch periods
  - Shift samples so that the peaks line up
  - Scott and Gerber: Synchronized Overlap and Add (SOLA)
  - Cross-correlation of two samples to find peak
  - Use the peaks to line up samples
  - Align the window at same relative location within a pitch period
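The cross-correlation alignment step can be sketched as follows. This is only the search for the best shift, not a complete SOLA implementation, and the signal, overlap length and search range are assumptions. The tail of the output so far is compared with candidate positions of the next segment, and the shift with the highest normalized cross-correlation is chosen.

```matlab
fs = 8000;                        % sample rate in Hz (assumed)
f0 = 125;                         % test tone with a 64-sample pitch period
x  = sin(2*pi*f0*(0:799)'/fs);

ovl      = 100;                   % overlap length in samples (assumed)
tail     = x(301:400);            % end of the output built so far
start    = 437;                   % nominal (misaligned) start of the next segment
maxShift = 80;                    % search range in samples (assumed)

best = -Inf;
bestShift = 0;
for k = 0:maxShift
    cand = x(start+k : start+k+ovl-1);
    % Normalized cross-correlation between the tail and this candidate.
    c = sum(tail .* cand) / sqrt(sum(tail.^2) * sum(cand.^2));
    if c > best
        best = c;
        bestShift = k;
    end
end

fprintf('best shift %d samples, correlation %.3f\n', bestShift, best);
```

Starting the next segment at start + bestShift puts it at the same position within the pitch period as the tail, so the overlap-add no longer cancels or doubles the waveform.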

Questions
Are there any questions?

References
- Quatieri, Thomas E. Discrete-Time Speech Signal Processing. Prentice Hall, Upper Saddle River, NJ, 2002.
- Rabiner, L.R. and Schafer, R.W. Digital Processing of Speech Signals. Prentice Hall, Upper Saddle River, NJ, 1978.
- Oppenheim, A.V. and Schafer, R.W. Digital Signal Processing. Prentice Hall, Englewood Cliffs, NJ, 1975.
- Scott, R. and Gerber, S. "Pitch Synchronous Time-Compression of Speech," Proc. Conf. Speech Communications Processing, pp. 63-85, April 1972.
- Fairbanks, G., Everitt, W.L., and Jaeger, R.P. "Method for Time or Frequency Compression-Expansion of Speech," IEEE Transaction Audio and Electroacoustics, vol. AU-2, pp. 7-12, Jan 1954.

Reference Material: http://www.dspdimension.com/ by Stephan M. Bernsee
