Chapter 4: Pitch estimation for music signal processing


1 Chapter 4: Pitch estimation for music signal processing
KH Wong Ch4. pitch, v4b

2 Introduction (lecture 4)
Pitch estimation is essential to many music signal applications: genre classification; music tutoring (detecting playing faults); music style analysis; and automatic transcription (converting an audio signal into a music score).

3 Techniques in pitch extraction
Time domain approaches:
(1) ACF (Autocorrelation function) and MACF (Modified Autocorrelation function)
(2) Normalized cross correlation function (NCCF)
(3) AMDF (Average magnitude difference function)
Frequency domain approaches:
(4) Cepstrum Pitch Determination (CPD)

4 Definition of pitch: What is the pitch (音高) of a tone?
Answer: The perceived frequency of a sound (Wikipedia).

5 Method 1: ACF (Autocorrelation function)
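The autocorrelation function (ACF) of a frame x(n), n = 0, ..., N-1, at lag m is
$$R(m) = \sum_{n=0}^{N-1-m} x(n)\, x(n+m)$$
It is symmetrical on both sides of m = 0, i.e. R(-m) = R(m).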

6 What is Auto-correlation, R(m)?
E.g. x = [ ], N = 5:
R(0) = x(0)*x(0) + x(1)*x(1) + x(2)*x(2) + x(3)*x(3) + x(4)*x(4) = ( ) = 92
R(1) = x(0)*x(1) + x(1)*x(2) + x(2)*x(3) + x(3)*x(4) = ( ) = 51
And so on… R = [ ]
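The definition can be checked numerically with a minimal Python sketch. The slides themselves use MATLAB. The function names `autocorr` and `autocorr_full` and the example values are hypothetical, not the slide's data. `autocorr_full` lists the lags from -(N-1) to N-1, which is the same order in which MATLAB's `xcorr(x)` returns them.

```python
def autocorr(x):
    """R(m) = sum_{n=0}^{N-1-m} x(n) * x(n+m) for the non-negative lags m = 0..N-1."""
    N = len(x)
    return [sum(x[n] * x[n + m] for n in range(N - m)) for m in range(N)]


def autocorr_full(x):
    """All lags -(N-1)..(N-1), built from the symmetry R(-m) = R(m)."""
    R = autocorr(x)
    return R[:0:-1] + R  # R(N-1), ..., R(1), then R(0), ..., R(N-1)


x = [1, 2, 3, 2, 1]        # hypothetical example values
print(autocorr(x))         # [19, 16, 10, 4, 1]
print(autocorr_full(x))    # [1, 4, 10, 16, 19, 16, 10, 4, 1]
```

R(0) is always the sum of squares of the samples (here 1 + 4 + 9 + 4 + 1 = 19), so it is the largest value.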

7 Exercise 4.1 First, what is auto-correlation?
```matlab
%matlab code
fs = 1;
x = [ ]';                       % test signal (column vector)
auto_corr_x = xcorr(x)          % auto-correlation
figure(1), clf
subplot(2,1,1), plot(x)
grid on, grid(gca,'minor'), hold on
subplot(2,1,2), plot(auto_corr_x)
grid on, grid(gca,'minor')
[pks,locs] = findpeaks(auto_corr_x)
[mm,peak1_ind] = max(pks)       % top peak (zero lag)
disp('peak value1 at location'), pks(peak1_ind), locs(peak1_ind)
disp('peak value2 at location'), pks(peak1_ind+1), locs(peak1_ind+1)  % peak next to the top peak
period = locs(peak1_ind+1) - locs(peak1_ind)
pitch_Hz = fs/period            % display pitch in Hz
% peaks at t=11,15, dt=15-11=4
```
Exercise: Show the steps of calculation.
[Figure: top, x[t] against t; bottom, Auto_correlation(x[t]).] We only look at positive n. The gap between two peaks is 4, so the period of x is about 4 samples.
Ans: ??

8 Autocorrelation: When a segment of a signal is correlated with itself, the distance (the lag time, in samples) between the positions of the maximum and the second maximum correlation is defined as the fundamental period (1/pitch frequency) of the signal, measured in samples.
[Figure: autocorrelation R(j) against lag time j in samples; the maximum R_max is at j1 = 0 and the second maximum R_second_max is at j2.]

9 Then the fundamental frequency can be calculated as:
$$f_0 = \frac{f_s}{j_2 - j_1}$$
where f_s is the sampling frequency. Usually j1 = 0, because the maximum R(j1) is at zero lag, R(0).
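Here is a minimal Python sketch of the ACF pitch estimate f0 = fs/(j2 - j1) with j1 = 0. It assumes that the "second maximum" j2 is the highest local maximum of R(j) at a positive lag. The names `autocorr` and `acf_pitch` and the test signals are hypothetical. The fs = 1 and fs = 8000 calls only mirror the two parts of Class exercise 4.2, using a made-up signal with a period of 3 samples.

```python
import math


def autocorr(x):
    """R(j) = sum_{n=0}^{N-1-j} x(n) * x(n+j) for lags j = 0..N-1."""
    N = len(x)
    return [sum(x[n] * x[n + j] for n in range(N - j)) for j in range(N)]


def acf_pitch(x, fs):
    """Estimate f0 = fs / (j2 - j1) from the autocorrelation of one frame.

    j1 = 0 is the position of the maximum R(0); j2 is the highest local
    maximum of R at a positive lag. Returns None if there is no such peak.
    """
    R = autocorr(x)
    peaks = [j for j in range(1, len(R) - 1)
             if R[j] > R[j - 1] and R[j] >= R[j + 1]]
    if not peaks:
        return None
    j1 = 0
    j2 = max(peaks, key=lambda j: R[j])
    return fs / (j2 - j1)


x = [2, -1, -1] * 4              # hypothetical test signal, period = 3 samples
print(acf_pitch(x, fs=1))        # about 0.333 Hz
print(acf_pitch(x, fs=8000))     # about 2666.67 Hz

fs = 8000                        # a 200 Hz sine sampled at 8 kHz (period 40 samples)
sine = [math.sin(2 * math.pi * 200 * n / fs) for n in range(400)]
print(acf_pitch(sine, fs))       # 200.0
```

A practical estimator would also restrict j2 to a plausible pitch range. This sketch searches all positive lags.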

10 Testing a real sound: A5_flute, 880 Hz (sampled at fs = 44100 Hz)
```matlab
% testing a real sound, matlab code
[xx,fs] = audioread('c:\sounds\A5_flute.wav');  % wavread in older MATLAB; fs = 44100 Hz
xx = xx(:,1);                 % use the first channel if the file is stereo
sound(xx,fs)
fs                            % sampling frequency
start = 10000;                % pick a frame around t = 10000
frame_len = 512;              % not "length", so the built-in length() is not shadowed
x = xx(start:start+frame_len);
auto_corr_x = xcorr(x);       % auto-correlation
figure(1), clf
subplot(2,1,1), plot(x)
title('one frame of the sound A5-flute = 880Hz')
grid on, grid(gca,'minor'), hold on
subplot(2,1,2), plot(auto_corr_x)
title('auto-correlation result')
grid on, grid(gca,'minor')
[pks,locs] = findpeaks(auto_corr_x)
[mm,peak1_ind] = max(pks)
disp('peak value1 at location'), pks(peak1_ind), locs(peak1_ind)
disp('peak value2 at location'), pks(peak1_ind+1), locs(peak1_ind+1)  % peak next to the top peak
period = locs(peak1_ind+1) - locs(peak1_ind)
pitch_Hz = fs/period          % display pitch in Hz
```
[Figure: top, one frame x[t]; bottom, Auto_correlation(x[t]).] The two peaks are at t = 513 and t = 563. You can also use sort( ) in MATLAB to find the two peaks. The gap between the 2 peaks is dt = 563 - 513 = 50, hence the frequency is fs/dt = 44100/50 = 882 Hz. Note: the pitch of a flute sound played by a human may not be very stable.

11 Modified Auto-Correlation Method: Auto-Correlation Method enhanced by Center clipping
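Center clipping gives a more accurate result because higher-frequency components interfere less with the result. The clipping operation clc(x) cuts (removes) the middle part of x(n), that is, the samples lying between -CL and +CL:
$$y(n) = \mathrm{clc}\{x(n)\} = \begin{cases} x(n), & |x(n)| \ge CL \\ 0, & |x(n)| < CL \end{cases}$$
A typical CL is 1/4 of the peak-to-peak amplitude of x.
[Figure: x(n) against n with the clipping levels ±CL; y(n) = clc(x) against n.]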
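The clipping rule can be sketched in Python as follows. It mirrors the loop in the slide-34 MATLAB code: the mean is removed, values strictly inside (-CL, CL) become 0, and the remaining values are kept unchanged. The function name `center_clip` and the `ratio` parameter are hypothetical. With `ratio = 0.25`, CL is the typical 1/4 of the peak-to-peak span.

```python
def center_clip(x, ratio=0.25):
    """y(n) = x(n) if |x(n)| >= CL else 0, applied to the mean-removed signal.

    CL = ratio * (peak-to-peak span of the centred signal).
    """
    mean = sum(x) / len(x)
    xc = [v - mean for v in x]              # centred wave
    cl = ratio * (max(xc) - min(xc))        # clipping level CL
    return [v if abs(v) >= cl else 0.0 for v in xc]


print(center_clip([1, 4, -1, -4, 0]))       # [0.0, 4.0, 0.0, -4.0, 0.0]  (CL = 2)
print(center_clip([11, 14, 9, 6, 10]))      # same result after removing the mean 10
```

Some textbook clippers also shift the kept samples toward zero by CL (x - CL above +CL, x + CL below -CL). This sketch follows the slides and keeps them unchanged.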

12 Finding pitch by center clipping
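[Figure: x(n); y(n) = center-clipped x(n); R(m); R'(m), with the gaps T1, T2, T3 between successive peaks marked.]
In R(m), the autocorrelation of x(n), it is not easy to pick the peaks. In R'(m), the autocorrelation of the clipped signal y(n) = clc{x(n)}, the peaks are easy to pick. The period is the average gap between successive peaks, T = mean(T1, T2, T3) = 1/(pitch frequency).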
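Combining clipping and autocorrelation gives a minimal Python sketch of the period estimate T = mean(T1, T2, ...). It makes two assumptions. First, the zero-lag peak is the starting point for T1. Second, only peaks of R'(m) that are at least 0.3·R'(0) high are counted. That threshold (`min_rel_height`) is a hypothetical choice to skip small ripples. All names are hypothetical.

```python
def autocorr(x):
    """R(m) = sum_{n=0}^{N-1-m} x(n) * x(n+m) for lags m = 0..N-1."""
    N = len(x)
    return [sum(x[n] * x[n + m] for n in range(N - m)) for m in range(N)]


def center_clip(x, ratio=0.25):
    """Remove the mean, then zero the samples with |x| < CL = ratio * peak-to-peak."""
    mean = sum(x) / len(x)
    xc = [v - mean for v in x]
    cl = ratio * (max(xc) - min(xc))
    return [v if abs(v) >= cl else 0.0 for v in xc]


def macf_period(x, min_rel_height=0.3):
    """Period in samples = mean gap between successive peaks of R'(m).

    R'(m) is the autocorrelation of the center-clipped signal.
    Returns None if no usable peak is found.
    """
    R = autocorr(center_clip(x))
    if R[0] <= 0:
        return None                        # frame is all zeros after clipping
    peaks = [m for m in range(1, len(R) - 1)
             if R[m] > R[m - 1] and R[m] >= R[m + 1]
             and R[m] >= min_rel_height * R[0]]
    if not peaks:
        return None
    positions = [0] + peaks                # start counting from the zero-lag peak
    gaps = [b - a for a, b in zip(positions, positions[1:])]  # T1, T2, ...
    return sum(gaps) / len(gaps)


x = [2, -1, -1] * 4                        # hypothetical period-3 test signal
T = macf_period(x)
print(T, 1 / T)                            # 3.0 samples; pitch 1/3 Hz if fs = 1 Hz
```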

13 The MACF (Modified Autocorrelation function) algorithm

14 Example: For each frame, find a pitch.
Plot the pitch against time (blue curve) to see the pitch profile.
[Figure: top, x(n) against time; bottom, pitch(n) (frequency) against time n (frame index).]
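Frame-by-frame tracking can be sketched in Python by running the ACF estimate on consecutive frames. The names `pitch_track`, `frame_len` and `hop` and the two-part test signal are hypothetical. A real system would use longer frames (tens of milliseconds) and usually a window function.

```python
def autocorr(x):
    """R(j) = sum_{n=0}^{N-1-j} x(n) * x(n+j) for lags j = 0..N-1."""
    N = len(x)
    return [sum(x[n] * x[n + j] for n in range(N - j)) for j in range(N)]


def acf_pitch(x, fs):
    """f0 = fs / j2, where j2 is the highest local maximum of R(j) at a positive lag."""
    R = autocorr(x)
    peaks = [j for j in range(1, len(R) - 1) if R[j] > R[j - 1] and R[j] >= R[j + 1]]
    if not peaks:
        return None
    return fs / max(peaks, key=lambda j: R[j])


def pitch_track(x, fs, frame_len, hop):
    """Return (frame start time in seconds, pitch in Hz or None) for each frame."""
    track = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        track.append((start / fs, acf_pitch(frame, fs)))
    return track


# Hypothetical signal: 12 samples with period 3, then 12 samples with period 4.
x = [2, -1, -1] * 4 + [3, -1, -1, -1] * 3
for t, f0 in pitch_track(x, fs=8000, frame_len=12, hop=12):
    print(f"t = {t * 1000:.2f} ms, pitch = {f0:.1f} Hz")
# t = 0.00 ms, pitch = 2666.7 Hz
# t = 1.50 ms, pitch = 2000.0 Hz
```

Plotting the second element of each pair against the first gives the pitch profile described on the slide.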

15 Class exercise 4.2: x = [ ], with Fs = sampling frequency = 1 Hz. (a) Find the pitch of this signal x using the ACF (Autocorrelation function). (b) Repeat the above if Fs = 8 kHz.

16 Method 2: Normalized cross correlation function (NCCF) method [Verteletskaya 2009]
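The NCCF formula from this slide is not preserved in the transcript. As an assumption, the sketch below uses a commonly used form of the normalized cross-correlation at lag k over one frame. By the Cauchy-Schwarz inequality it lies between -1 and 1:
$$\mathrm{NCCF}(k) = \frac{\sum_{n=0}^{N-1-k} x(n)\,x(n+k)}{\sqrt{\sum_{n=0}^{N-1-k} x(n)^2 \,\sum_{n=0}^{N-1-k} x(n+k)^2}}$$
It may differ in detail from the version in [Verteletskaya 2009]. The function names and the lag search range derived from `fmin`/`fmax` are hypothetical. The search range should exclude multiples of the true period, because the NCCF also reaches about 1 there.

```python
import math


def nccf(x, k):
    """Normalized cross-correlation between x(n) and x(n+k), n = 0..N-1-k."""
    N = len(x)
    a, b = x[:N - k], x[k:]
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0 else 0.0


def nccf_pitch(x, fs, fmin, fmax):
    """Pick the lag in [fs/fmax, fs/fmin] with the largest NCCF; return fs / lag (Hz)."""
    min_lag = max(1, int(fs / fmax))
    max_lag = min(len(x) - 2, int(fs / fmin))
    if min_lag > max_lag:
        return None
    best = max(range(min_lag, max_lag + 1), key=lambda k: nccf(x, k))
    return fs / best


fs = 8000
sine = [math.sin(2 * math.pi * 200 * n / fs) for n in range(400)]
print(nccf_pitch(sine, fs, fmin=150, fmax=400))   # close to 200 Hz (best lag 40)
```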

17 Method 3: Average Magnitude Difference Function (AMDF) Method [Verteletskaya 2009]
An intuitive method: just pick the peaks and find the period. Find the negative peaks (minima) of D. The estimated period is the average gap between two neighboring negative peaks.
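Here is a minimal Python sketch of the AMDF method. The slide's formula for D is not in the transcript, so the sketch assumes the usual definition, averaged over the available samples:
$$D(k) = \frac{1}{N-k}\sum_{n=0}^{N-1-k} \left| x(n) - x(n+k) \right|$$
By default, lags are searched up to N/2. A valley (negative peak) is kept only if D(k) is at most 0.3·max(D), which is a hypothetical threshold. D(0) = 0 serves as the first valley. All names are hypothetical.

```python
import math


def amdf(x, max_lag):
    """D(k) = mean over n of |x(n) - x(n+k)|, for k = 0..max_lag."""
    N = len(x)
    return [sum(abs(x[n] - x[n + k]) for n in range(N - k)) / (N - k)
            for k in range(max_lag + 1)]


def amdf_period(x, max_lag=None, rel_depth=0.3):
    """Average gap between neighbouring negative peaks (local minima) of D, in samples."""
    if max_lag is None:
        max_lag = len(x) // 2
    D = amdf(x, max_lag)
    thresh = rel_depth * max(D)
    valleys = [k for k in range(1, max_lag)
               if D[k] < D[k - 1] and D[k] <= D[k + 1] and D[k] <= thresh]
    if not valleys:
        return None
    positions = [0] + valleys               # D(0) = 0 is the first valley
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return sum(gaps) / len(gaps)


fs = 8000
sine = [math.sin(2 * math.pi * 200 * n / fs) for n in range(400)]
print(fs / amdf_period(sine))               # about 200 Hz (valleys at lags 40, 80, ...)
```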

18 Method 4: Cepstrum Pitch Determination (CPD) [Verteletskaya 2009]
[Figure: cepstrum with its peak at quefrency Q'.] The peak is at Q' = 0.006 s, so the pitch = 1/Q' = 1/0.006 ≈ 166.7 Hz. The problem: for human voice, the peak may be the result of glottal excitation.
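The CPD computation can be sketched in Python using the cepstrum formula C(n) = |fft(log|fft(x(n))|)| from these slides. The sketch searches for the peak quefrency Q' inside a pitch range [fmin, fmax]. To keep the example dependency-free, a plain O(N²) `dft` stands in for an FFT. The small `eps` added before the logarithm, the search range, the function names and the pulse-train test signal are all assumptions.

```python
import cmath
import math


def dft(x):
    """Plain O(N^2) discrete Fourier transform (stand-in for an FFT)."""
    N = len(x)
    w = [cmath.exp(-2j * math.pi * m / N) for m in range(N)]
    return [sum(x[n] * w[(k * n) % N] for n in range(N)) for k in range(N)]


def cepstrum(x, eps=1e-10):
    """C(n) = |DFT(log|DFT(x(n))|)|; eps avoids log(0) in empty spectral bins."""
    log_mag = [math.log(abs(X) + eps) for X in dft(x)]
    return [abs(c) for c in dft(log_mag)]


def cepstrum_pitch(x, fs, fmin, fmax):
    """Pitch = 1/Q', with Q' the quefrency of the largest cepstral peak in [1/fmax, 1/fmin]."""
    C = cepstrum(x)
    q_lo = max(1, int(fs / fmax))           # quefrency range in samples
    q_hi = min(len(x) // 2, int(fs / fmin))
    q_peak = max(range(q_lo, q_hi + 1), key=lambda q: C[q])
    return fs / q_peak                      # 1 / (Q' in seconds), Q' = q_peak / fs


# Hypothetical test: a pulse train with period 40 samples at fs = 8 kHz (200 Hz).
pulses = [1.0 if n % 40 == 0 else 0.0 for n in range(320)]
print(cepstrum_pitch(pulses, fs=8000, fmin=150, fmax=1000))   # 200.0
```

The returned value is 1/Q' with Q' in seconds, which is the same arithmetic as Pitch = 1/0.006 on this slide.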

19 For human voice pitch detection (or recognition)
We must study the structure of the vocal system to find out how to get an accurate answer. The vocal system has 2 elements: the glottal excitation (vocal-cord vibration, which sets the pitch) and the vocal tract filter (which shapes the spectral envelope). Liftering separates the two, so that the glottal-excitation part can be used for pitch extraction without interference from the vocal tract filter.

20 Cepstrum of speech: "cepstrum" is a new word made by reversing the first 4 letters of "spectrum". It is the spectrum of the (log-magnitude) spectrum of a signal. Why do we need it? To remove the ripples in the spectrum caused by glottal excitation. Vocal-cord vibration puts many ripples into the spectrum, but for recognition and reproduction we are more interested in the speech envelope.
[Figure: speech signal x, its Fourier transform, and the spectrum of x.]

21 Liftering method: Select the higher and lower samples
For a signal x(n), the cepstrum is
$$C(n) = \left|\mathrm{FFT}\left(\log\left|\mathrm{FFT}(x(n))\right|\right)\right|$$
High-time liftering selects C_high (high quefrency, i.e. lower frequency): the glottal excitation. Low-time liftering selects C_low (low quefrency, i.e. higher frequency): the vocal tract filter response. Quefrency is in the time domain (in seconds), so a higher quefrency corresponds to a lower frequency.

22 Recover glottal excitation and vocal tract spectrum
[Figure: left, cepstrum of the glottal excitation (C_high) and spectrum of the glottal excitation; right, cepstrum of the vocal tract (C_low) and spectrum of the vocal tract filter. Axes: quefrency (sample index) and frequency.]
The peak in the glottal-excitation cepstrum may be the pitch period. The smoothed vocal tract spectrum gives the spectral envelope. For more information see:

23 Measure pitch of musical instruments. Example: find the pitch of the Oboe A4 sound.
[Figure: spectrogram of A4_Oboe]

24 Example: Find pitch of Oboe A4 sound http://www.cse.cuhk.edu
[Figure panels: input Oboe A4 x(n); its Fourier transform X(ω) = fft(x); cepstrum C(n) = |fft(log|fft(x(n))|)| over the full range (from about 30 Hz); cepstrum C(n) over the range 200 to 900 Hz.]
The first peak of the cepstrum, at quefrency time t1, gives F1 = 1/t1 = 440.91 Hz. It has the strongest energy and is the pitch. The second peak, at quefrency time t2, gives F2 = 1/t2. Two peaks were found, corresponding to 440 Hz and 220 Hz. The quefrency axis is in units of 10^-3 s, so the 200 to 900 Hz search range corresponds to quefrencies from 1/900 = 1.11x10^-3 s to 1/200 = 5x10^-3 s.

25 Summary: Methods of pitch extraction have been studied.
The cepstrum and its use for pitch extraction have been discussed.

26 References
[Naotoshi Seo 2007] Naotoshi Seo, "Project: Pitch Detection", 2007.
[Verteletskaya 2009] E. Verteletskaya, B. Šimák, "Performance Evaluation of Pitch Detection Algorithms", 2009.
[Rabiner 1976] L. Rabiner, M. Cheng, A. Rosenberg, C. McGonegal, "A comparative performance study of several pitch detection algorithms", IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 24, no. 5, Oct. 1976.

27 Appendix

28 Music frequency table: http://wc.pima

29 Music frequency table (source: http://www.angelfire)

30 Autocorrelation: In signal processing, given a signal f(t), the continuous autocorrelation is the continuous cross-correlation of f(t) with itself at lag τ. For a real signal it is defined as:
$$R_{ff}(\tau) = \int_{-\infty}^{\infty} f(t)\, f(t+\tau)\, dt$$
In a discrete system, the autocorrelation R at lag j of a signal x(n) is defined as:
$$R(j) = \sum_{n} x(n)\, x(n+j)$$

31 Answer 4.1 (Exercise 4.1: First, what is auto-correlation?)
```matlab
%matlab code
x = [ ]';                     % test signal (column vector)
auto_corr_x = xcorr(x)        % auto-correlation
figure(1), clf
subplot(2,1,1), plot(x)
grid on, grid(gca,'minor'), hold on
subplot(2,1,2), plot(auto_corr_x)
grid on, grid(gca,'minor')
```
Exercise: Show the steps of calculation.
[Figure: x[t] against t; Auto_correlation(x[t]).] We only look at positive n. The gap between two peaks is 4, so the period of x is about 4 samples.
Ans: [ ]

32 Answer 4.2 for exercise 4.2. This answer uses MACF. You can also use ACF; for this example the pitch found is the same.
Question: x = [ ], sampled at 1 Hz. Find the pitch of this signal x using MACF (Modified Autocorrelation function).
Answer:
orginal_x = [ ]
x = centered_wave = orginal_x - mean_x = [ ]
cl = center clipped range = 2
y = center clipped signal = [ ]
(a) If the sampling frequency Fs = 1 Hz: from the autocorrelation result of y in the figure, the distance between 2 peaks is 3 samples. Since the sampling frequency is 1 Hz, the pitch is 1/3 Hz.

33 Answer 4.2: Class exercise 4.2
[Figure: 2nd diagram, R for positive lags only.] Pick 2 peaks: the period is 3, so the frequency = 1/3 Hz.
(b) If Fs = 8 kHz: the sampling period is dt = 1/Fs = (1/8) ms. The period of x is 3 samples, so its actual duration is 3*dt = 3*(1/8) ms = 0.375 ms. The frequency of x is therefore 1/(3*dt) = (8/3) kHz ≈ 2.67 kHz.

34 Matlab
```matlab
%Ver2, MACF (Modified Autocorrelation function) using center clipping
%assume the signal x is voltage against time
clear
%select one of the following
%real_data=1 %1 or 0
real_data = 0
if real_data == 1   % use real sound
    %[orginal_x,fs] = audioread('d:\0music\sounds\violin3.wav');
    [orginal_x,fs] = audioread('violin3.wav');  % wavread in older MATLAB
    orginal_x = orginal_x(10000:11000, 1);      % one frame, first channel
else                % use test data
    orginal_x = [ ]
    fs = 1          % assume the sampling frequency is 1Hz
end
x = orginal_x - mean(orginal_x)   % centered wave
n = length(x)
maxx = max(x)
minx = min(x)
dd = maxx - minx                  % peak-to-peak span
figure(1), clf
plot(x)
%cl = dd/4000
cl = dd/4   % center clipped "cl" length is 1/4 of total peak-to-peak span
pause
%center clipping algorithm for pitch extraction
%center clip means: set the signal values within the clipped region to 0
%center = mean voltage level of the whole signal
%positive peak = maximum of the signal voltage
%negative peak = minimum of the signal voltage
%center clip regions are: (i) from center to 1/2 of center-to-positive-peak
%                         (ii) from center to -1/2 of center-to-negative-peak
y = zeros(size(x));
for t = 1:n
    if x(t) < cl && x(t) > -cl   % those within the center clipped region set to 0
        y(t) = 0;
    else
        y(t) = x(t);
    end
end
auto_corr_y = xcorr(y)   % auto-correlation
figure(2), clf
subplot(3,1,1), plot(x), ylabel('x=centered wave')
subplot(3,1,2), plot(y), ylabel('y=center clipped wave'), hold on
subplot(3,1,3), plot(auto_corr_y), ylabel('auto correlation of y'), xlabel('time')
max_list = max(y)
fs
disp('orginal_x'), orginal_x
disp('x = centered_wave = orginal_x - mean_x'), x
disp('cl = center clipped range'), cl
disp('y = center clipped signal'), y
```

