Pitch Tracking
Jyh-Shing Roger Jang (張智星), MIR Lab, Dept. of CSIE, National Taiwan University


Contact: jang@mirlab.org, http://mirlab.org/jang

Slide 3: Pitch
- Definition of pitch
  - Fundamental frequency (FF, in Hz): the reciprocal of the fundamental period of a quasi-periodic waveform
  - Pitch (in semitones): obtained from the fundamental frequency through a log-based transformation (detailed later)
- Characteristics of pitch
  - Noise and unvoiced sounds do not have a pitch.

Slide 4: Pitch Tracking
- Pitch tracking (PT): the process of computing the pitch vector of a given audio segment
- Sample applications
  - Query by singing/humming
  - Tone recognition for Mandarin
  - Intonation scoring for English
  - Prosody analysis for speech synthesis
  - Pitch scaling and duration modification

Slide 5: Typical Steps for Pitch Tracking
- Pre-processing
  - Filtering
  - Excitation extraction
- Main processing
  - Frame blocking
  - PDF (periodicity detection function) computation
  - Pitch candidates via max picking over the PDF
- Post-processing
  - Unreliable pitch removal via volume/clarity thresholding
  - Pitch refinement via parabolic interpolation
  - Pitch smoothing via median filters, etc.

Slide 6: Frame Blocking
- Sample rate = 16 kHz
- Frame size = 512 samples
- Frame duration = 512/16000 = 0.032 s = 32 ms
- Overlap = 192 samples
- Hop size = frame size − overlap = 512 − 192 = 320 samples
- Frame rate = 16000/320 = 50 frames/sec = pitch rate
(Figure: zoomed-in waveform with consecutive frames and their overlap marked)
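The frame-blocking arithmetic above can be sketched in Python; the slides' own scripts are MATLAB. The helper name `frame_blocking` is illustrative. Dropping a trailing partial frame is an assumption, since other conventions zero-pad it.

```python
def frame_blocking(signal, frame_size, overlap):
    """Split a sequence into overlapping frames; a trailing partial frame is dropped."""
    hop = frame_size - overlap
    if hop <= 0:
        raise ValueError("overlap must be smaller than frame_size")
    if len(signal) < frame_size:
        return []
    frame_count = 1 + (len(signal) - frame_size) // hop
    return [signal[i * hop:i * hop + frame_size] for i in range(frame_count)]


fs, frame_size, overlap = 16000, 512, 192
hop = frame_size - overlap          # 320 samples
frame_duration = frame_size / fs    # 0.032 s
frame_rate = fs / hop               # 50.0 frames/sec, i.e., the pitch rate
frames = frame_blocking(list(range(fs)), frame_size, overlap)  # one second of dummy samples
print(hop, frame_duration, frame_rate, len(frames))
```

With these settings, one second of audio yields 49 complete frames.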

Slide 7: Periodicity Detection Functions
- A PDF (periodicity detection function) is used to detect the period of a waveform.
- Two categories of PDF
  - Time domain
    - ACF (autocorrelation function)
    - NSDF (normalized squared difference function)
    - AMDF (average magnitude difference function)
  - Frequency domain
    - Harmonic product spectrum
    - Cepstrum

Slide 8: ACF: Autocorrelation Function
(Figure: the original frame s(t) and the frame shifted to the right by τ = 30, s(t−τ). The value acf(30) is the inner product of the overlapping part. The ACF peak away from τ = 0 marks the pitch period.)
- To be safe, the frame size should cover at least two fundamental periods. (Quiz candidate!)
- Indexing is 0-based: the frame is [s(0), s(1), …, s(n−1)]. (Quiz candidate!)

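Slide 9: ACF: Formula 1
- Assume a frame is represented by s(t), t = 0, 1, …, n−1.
- ACF formula, with the frame shifted to the right by τ:
  $acf(\tau) = \sum_{t=\tau}^{n-1} s(t)\,s(t-\tau)$
(Figure: s(t) and the right-shifted copy s(t−τ))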

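Slide 10: ACF: Formula 2
- Assume a frame is represented by s(t), t = 0, 1, …, n−1.
- ACF formula, with the frame shifted to the left by τ:
  $acf(\tau) = \sum_{t=0}^{n-1-\tau} s(t)\,s(t+\tau)$
- This formula is the same as the previous one. Substituting u = t − τ in Formula 1 gives exactly this sum.
(Figure: s(t) and the left-shifted copy s(t+τ))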
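The two formulas can be transcribed directly into Python to check numerically that shifting right and shifting left give the same ACF. The helper names are illustrative.

```python
def acf_shift_right(s):
    """Formula 1: acf(tau) = sum_{t=tau}^{n-1} s(t) * s(t - tau)."""
    n = len(s)
    return [sum(s[t] * s[t - tau] for t in range(tau, n)) for tau in range(n)]


def acf_shift_left(s):
    """Formula 2: acf(tau) = sum_{t=0}^{n-1-tau} s(t) * s(t + tau)."""
    n = len(s)
    return [sum(s[t] * s[t + tau] for t in range(n - tau)) for tau in range(n)]


frame = [1, 2, 3, 4]
print(acf_shift_right(frame))  # [30, 20, 11, 4]
print(acf_shift_left(frame))   # [30, 20, 11, 4]
```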

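Slide 11: Example of ACF
- sunday.wav
  - Sample rate = 16 kHz
  - Frame size = 512 (starting from sample 9000)
- Fundamental frequency
  - The max of the ACF (excluding the trivial peak at index 0) occurs at index 131.
  - With zero-based indexing, FF = 16000/131 ≈ 122.14 Hz. If 131 were a one-based MATLAB index, the lag would be 130 and FF = 16000/130 ≈ 123.08 Hz.
- frame2acf01.m
(Figure: ACF with index 0 and index 131 marked; zero-based indexing is assumed.)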

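Slide 12: Locating the Pitch Point
- If the range of human FF is [40, 1000] Hz, then the fundamental period (FP), measured in samples, lies in the interval
  $\frac{f_s}{1000} \le FP \le \frac{f_s}{40}$
  where $f_s$ is the sample rate. For example, the interval is [16, 400] when f_s = 16 kHz. (Quiz candidate!)
- frame2acfPitchPoint01.m
(Figure: ACF with index 0 and index FP marked)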
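The following is a minimal sketch of restricting the max picking to the lag interval [fs/1000, fs/40]. It uses a synthetic test frame of decaying pulses that repeat every 131 samples, not sunday.wav. The helper names are hypothetical.

```python
import math


def acf(s):
    n = len(s)
    return [sum(s[t] * s[t + tau] for t in range(n - tau)) for tau in range(n)]


def acf_pitch(frame, fs, f_min=40.0, f_max=1000.0):
    """Return (pitch_lag, pitch_hz) from the ACF maximum restricted to [fs/f_max, fs/f_min]."""
    values = acf(frame)
    lo = math.ceil(fs / f_max)                         # shortest allowed period (highest pitch)
    hi = min(math.floor(fs / f_min), len(frame) - 1)   # longest allowed period (lowest pitch)
    lag = max(range(lo, hi + 1), key=lambda tau: values[tau])
    return lag, fs / lag


# Hypothetical voiced frame: decaying pulses every 131 samples at 16 kHz.
fs, period = 16000, 131
frame = [math.exp(-(t % period) / 10.0) for t in range(512)]
lag, f0 = acf_pitch(frame, fs)
print(lag, round(f0, 3))   # 131 122.137
```

Restricting the search excludes the trivial maximum at τ = 0. The peaks at multiples of the period (262, 393) are smaller because fewer products are summed there.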

Slide 13: Locating the Fundamental Period (II)
- The assumed human pitch range can be wrong:
  - Pitch too high
    - Vitas (local short clip)
    - Whistling
  - Low-pitch singing/humming requires a large frame size

Slide 14: Example of ACF-Based PT
- Specs
  - Sample rate = 11025 Hz
  - Frame size = 353 samples ≈ 32 ms
  - Overlap = 0
  - Frame rate ≈ 31.25 frames/sec (11025/353 ≈ 31.23)
- Playback
  - Original singing
  - Pitch by ACF
- wave2pitchByAcf01.m

Slide 15: Example of ACF-Based PT (II)
- Specs
  - The previous script is converted into a function, pitchTrackingSimple.m, for easy access.
- ptByAcf01.m
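The whole loop, frame blocking plus ACF max picking per frame, fits in a short Python sketch. This is not pitchTrackingSimple.m itself. The function name, the synthetic two-pitch test signal, and the absence of any voicing decision are all assumptions of this sketch.

```python
import math


def acf(s):
    n = len(s)
    return [sum(s[t] * s[t + tau] for t in range(n - tau)) for tau in range(n)]


def pitch_tracking_acf(signal, fs, frame_size, overlap=0, f_min=40.0, f_max=1000.0):
    """Frame blocking + ACF max picking; returns one pitch (Hz) per complete frame."""
    hop = frame_size - overlap
    lo = math.ceil(fs / f_max)
    hi = min(math.floor(fs / f_min), frame_size - 1)
    pitches = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        values = acf(signal[start:start + frame_size])
        lag = max(range(lo, hi + 1), key=lambda tau: values[tau])
        pitches.append(fs / lag)
    return pitches


# Hypothetical test signal: 4 frames of a 100 Hz pulse train, then 4 frames at 200 Hz.
fs, frame_size = 8000, 256
low = [math.exp(-(t % 80) / 10.0) for t in range(4 * frame_size)]
high = [math.exp(-(t % 40) / 10.0) for t in range(4 * frame_size)]
pitch = pitch_tracking_acf(low + high, fs, frame_size)
print(pitch)
```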

Slide 16: Demo of ACF-Based PT
- Real-time display of the ACF for pitch tracking: goPtByAcf.mdl in the SAP toolbox
- Real-time pitch tracking for microphone input: goPtByAcf2.mdl in the SAP toolbox

Slide 17: ACF Variants to Avoid Tapering
- The plain ACF tapers off as τ grows, because only n − τ products are summed.
- Normalized version (method=2): frame2acf02.m
- Half-frame shifting (method=3): frame2acf03.m
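Both variants can be sketched as follows. The slide does not give the formulas, so the exact definitions here are assumptions:

- The normalized version divides acf(τ) by the number of products, n − τ.
- Half-frame shifting slides only the first n/2 samples, so every lag sums the same number of products.

```python
def acf(s):
    """Plain ACF: tapers as tau grows because only n - tau products are summed."""
    n = len(s)
    return [sum(s[t] * s[t + tau] for t in range(n - tau)) for tau in range(n)]


def acf_normalized(s):
    """Assumed normalized variant: divide each value by the number of products, n - tau."""
    n = len(s)
    return [value / (n - tau) for tau, value in enumerate(acf(s))]


def acf_half_frame(s):
    """Half-frame shifting: slide the first half over the frame; every lag sums n // 2 products."""
    n = len(s)
    half = n // 2
    return [sum(s[t] * s[t + tau] for t in range(half)) for tau in range(n - half + 1)]


frame = [1, -1] * 4                 # period of 2 samples, n = 8
print(acf(frame))                   # [8, -7, 6, -5, 4, -3, 2, -1]
print(acf_normalized(frame))        # [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
print(acf_half_frame(frame))        # [4, -4, 4, -4, 4]
```

For this period-2 frame, both variants give the same peak height at every multiple of the period, whereas the plain ACF decays from 8 to 2.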

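Slide 18: NSDF: ACF Variant with a Normalized Range
- NSDF: normalized squared difference function
  - Formula:
    $nsdf(\tau) = \frac{2\sum_{i=0}^{n-1-\tau} s(i)\,s(i+\tau)}{\sum_{i=0}^{n-1-\tau} \left[s^2(i) + s^2(i+\tau)\right]}$
  - A variant of ACF with values within [−1, 1], based on the inequality
    $\left|2\sum_i a_i b_i\right| \le \sum_i \left(a_i^2 + b_i^2\right)$, which follows from $(a_i \pm b_i)^2 \ge 0$.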

Slide 19: NSDF Example
- frame2nsdf01.m
- Clarity: the height of the NSDF at the pitch point
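The following sketch implements the NSDF formula and computes clarity, taken as the NSDF value at the pitch point. The function names and the synthetic frame (decaying pulses every 131 samples at 16 kHz) are illustrative.

```python
import math


def nsdf(s):
    """nsdf(tau) = 2*sum s(i)s(i+tau) / sum [s(i)^2 + s(i+tau)^2], over i = 0..n-1-tau."""
    n = len(s)
    out = []
    for tau in range(n):
        num = 2 * sum(s[i] * s[i + tau] for i in range(n - tau))
        den = sum(s[i] * s[i] + s[i + tau] * s[i + tau] for i in range(n - tau))
        out.append(num / den if den > 0 else 0.0)   # silent overlap: define as 0
    return out


def pitch_and_clarity(frame, fs, f_min=40.0, f_max=1000.0):
    """Pitch = NSDF max within the allowed lag range; clarity = NSDF height at that lag."""
    values = nsdf(frame)
    lo = math.ceil(fs / f_max)
    hi = min(math.floor(fs / f_min), len(frame) - 1)
    lag = max(range(lo, hi + 1), key=lambda tau: values[tau])
    return fs / lag, values[lag]


fs, period = 16000, 131
frame = [math.exp(-(t % period) / 10.0) for t in range(512)]
f0, clarity = pitch_and_clarity(frame, fs)
print(round(f0, 3), round(clarity, 3))   # 122.137 1.0
```

Because the NSDF does not taper, multiples of the period (262, 393) also reach 1.0 here. Taking the first maximum in the lag range therefore returns the fundamental period.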

Slide 20: AMDF: Average Magnitude Difference Function
(Figure: the original frame s(i) and the frame shifted by τ = 30, s(i−τ). The value amdf(30) is the sum of absolute differences over the overlapping part. The pitch period shows up as a minimum of the AMDF.) (Quiz candidate!)

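Slide 21: Comparison between ACF & AMDF
- Formulas
  - ACF: $acf(\tau) = \sum_{i=\tau}^{n-1} s(i)\,s(i-\tau)$
  - AMDF: $amdf(\tau) = \sum_{i=\tau}^{n-1} \left|s(i) - s(i-\tau)\right|$
- Two major advantages of AMDF over ACF (Quiz candidate!)
  - AMDF requires less computing power, since it uses a subtraction and an absolute value instead of a multiplication.
  - AMDF is less prone to overflow, since its summands grow linearly with the signal amplitude rather than quadratically.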
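Here is the AMDF formula in Python, with an illustrative helper name. The summand uses only a subtraction and an absolute value.

```python
def amdf(s):
    """amdf(tau) = sum_{i=tau}^{n-1} |s(i) - s(i - tau)|: no multiplications needed."""
    n = len(s)
    return [sum(abs(s[i] - s[i - tau]) for i in range(tau, n)) for tau in range(n)]


print(amdf([1, 2, 3, 4]))       # [0, 3, 4, 3]
periodic = [1, -1] * 4
print(amdf(periodic))           # [0, 14, 0, 10, 0, 6, 0, 2]
```

For the period-2 frame, the AMDF drops to 0 at every multiple of the period, which is where the ACF peaks.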

Slide 22: Example of AMDF
- sunday.wav
  - Sample rate = 16 kHz
  - Frame size = 512 (starting from sample 9000)
- Fundamental frequency
  - The pitch point (an AMDF minimum) occurs at index 131, but it is harder to determine.
- frame2amdf01.m
(Figure: AMDF with index 0 and index 131 marked)

Slide 23: Example of AMDF to Pitch
- sunday.wav
  - Sample rate = 16 kHz
  - Frame size = 512 (starting from sample 9000)
- Fundamental frequency
  - The pitch point occurs at index 131, which is now determined correctly.
  - FF = 16000/131 ≈ 122.14 Hz
- frame2amdf4pt01.m
(Figure: AMDF with index 0 and index 131 marked)

Slide 24: Example of AMDF-Based PT
- Specs
  - Sample rate = 11025 Hz
  - Frame size = 353 samples ≈ 32 ms
  - Overlap = 0
  - Frame rate ≈ 31.25 frames/sec (11025/353 ≈ 31.23)
- Playback
  - Original singing
  - Pitch by AMDF
- ptByAmdf01.m

Slide 25: AMDF: Variants to Avoid Tapering
- Normalized version (method=2): frame2amdf02.m
- Half-frame shifting (method=3): frame2amdf03.m

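Slide 26: Combining ACF and AMDF
(Figure panels: frame, ACF, AMDF, and ACF/AMDF. Since the ACF peaks and the AMDF dips at the pitch period, the ratio ACF/AMDF has a sharper pitch peak.)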
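One way to combine the two, as the ACF/AMDF panel suggests, is to divide them pointwise. The small constant `eps`, which guards against a zero AMDF, is an assumption of this sketch.

```python
def acf(s):
    n = len(s)
    return [sum(s[i] * s[i - tau] for i in range(tau, n)) for tau in range(n)]


def amdf(s):
    n = len(s)
    return [sum(abs(s[i] - s[i - tau]) for i in range(tau, n)) for tau in range(n)]


def acf_over_amdf(s, eps=1e-6):
    """ACF peaks and AMDF dips at the pitch period, so their ratio sharpens the pitch peak."""
    return [a / (d + eps) for a, d in zip(acf(s), amdf(s))]


frame = [1, -1] * 4
ratio = acf_over_amdf(frame)
lag = max(range(1, len(frame)), key=lambda tau: ratio[tau])
print(lag)   # 2
```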

Slide 27: Audio Features in the Time Domain
- Audio features presented in the time domain (figure):
  - Intensity
  - Fundamental period
  - Timbre: the waveform within a fundamental period

Slide 28: Audio Features in the Frequency Domain
- Energy: sum of the power spectrum
- Pitch: distance between harmonics
- Timbre: smoothed spectrum
(Figure: spectrum annotated with the pitch frequency, the energy, the first formant F1, and the second formant F2)

Slide 29: About DFT & FFT
- Terminology
  - DFT: discrete Fourier transform
  - FFT: fast Fourier transform, an efficient algorithm for computing the DFT
- More about DFT

Slide 30: Harmonic Product Spectrum (HPS)
- Procedure
  1. Compute the power spectrum of a frame.
  2. Eliminate its trend, obtained by 20th-order polynomial fitting, so that the formants are removed.
  3. Apply exponential weighting to suppress high-frequency harmonics.
  4. Down-sample and add, to enhance the harmonics at the fundamental frequency.
  5. Find the max as the pitch point.
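Below is a simplified HPS sketch in Python. It differs from the procedure above in three ways:

- It skips detrending (step 2) and exponential weighting (step 3).
- It uses a direct O(n²) DFT instead of an FFT.
- It works on the log power spectrum, so "down sample and add" equals a product of linear spectra.

The test frame is synthetic: its fundamental is weaker than its 2nd and 3rd harmonics.

```python
import cmath
import math


def power_spectrum(frame):
    """|DFT|^2 for bins 0..n/2 via a direct O(n^2) DFT (fine for a short demo frame)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n // 2 + 1)]


def hps_pitch(frame, fs, harmonics=3, eps=1e-12):
    """Simplified HPS: log power spectrum, then 'down-sample and add' over `harmonics` factors."""
    log_p = [math.log(p + eps) for p in power_spectrum(frame)]
    n = len(frame)
    max_bin = (n // 2) // harmonics        # keeps harmonics * k within bins 0..n/2
    hps = {k: sum(log_p[r * k] for r in range(1, harmonics + 1)) for k in range(1, max_bin + 1)}
    best = max(hps, key=hps.get)
    return best * fs / n


# Hypothetical frame: fundamental at bin 8 (250 Hz), weaker than its 2nd and 3rd harmonics.
fs, n = 8000, 256
amps = {8: 0.3, 16: 1.0, 24: 1.0, 32: 0.8, 40: 0.6}
frame = [sum(a * math.cos(2 * math.pi * b * t / n) for b, a in amps.items()) for t in range(n)]
print(hps_pitch(frame, fs))   # 250.0
```

Plain max picking on the power spectrum would land on bin 16 or 24 here. HPS recovers bin 8 because only there do all three down-sampled copies line up on harmonics.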

Slide 31: “Down Sample and Add” in HPS

Slide 32: Example of HPS
- frame2hps01.m

Slide 33: Example of PT by HPS
- ptByHps01.m

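Slide 34: PT by Cepstrum
- Formula for the cepstrum: $c(q) = \mathcal{F}^{-1}\left\{\log\left|\mathcal{F}\{s(t)\}\right|^2\right\}$
- Procedure for PT by cepstrum
  1. Compute the power spectrum of a frame, in log scale. (Without the log, the inverse transform would simply give the ACF.)
  2. Eliminate the trend of the power spectrum if necessary.
  3. Take the inverse FFT of the (symmetric) power spectrum. (The result is real. Why?)
  4. Find the position of the max to compute the pitch.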
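The following sketches the procedure (without step 2), using direct O(n²) transforms. The log power spectrum of a real frame is real and even-symmetric, which is why its inverse transform is real; the code keeps the real part only to drop rounding residue.

The toy frame is an assumption: a pulse plus a half-amplitude echo 40 samples later. It was chosen because its power spectrum, 1.25 + cos(40ω), is an exact sinusoidal ripple.

```python
import cmath
import math


def real_cepstrum(frame, eps=1e-12):
    """Inverse DFT of the log power spectrum (direct O(n^2) transforms for a short frame)."""
    n = len(frame)
    spectrum = [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    log_power = [math.log(abs(x) ** 2 + eps) for x in spectrum]
    return [(sum(log_power[k] * cmath.exp(2j * math.pi * k * q / n) for k in range(n)) / n).real
            for q in range(n)]


def cepstrum_pitch(frame, fs, f_min=40.0, f_max=1000.0):
    """Pitch from the largest cepstral peak within the allowed quefrency (period) range."""
    c = real_cepstrum(frame)
    lo = math.ceil(fs / f_max)
    hi = min(math.floor(fs / f_min), len(frame) // 2 - 1)
    q = max(range(lo, hi + 1), key=lambda i: c[i])
    return q, fs / q


# Hypothetical toy frame: a unit pulse plus a half-amplitude copy 40 samples later.
fs, n = 8000, 256
frame = [0.0] * n
frame[0], frame[40] = 1.0, 0.5
q, f0 = cepstrum_pitch(frame, fs)
print(q, f0)   # 40 200.0
```

The cepstrum of this frame has its main peak, 0.5, at quefrency 40. The rahmonics at 80 and 120 are much smaller (about −0.125 and 0.042).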

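Slide 35: PT by Cepstrum: How Does It Work?
(Figure annotations: the harmonic ripple of the log power spectrum is close to a sinusoid, so its inverse transform should ideally be a single pulse located at the pitch period.)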

Slide 36: Example of Cepstrum
- frame2ceps01.m

Slide 37: Example of PT by Cepstrum
- ptByCeps01.m

Slide 38: Two Parts of PT
- PT has two parts:
  - Voicing detection: decide whether a frame has a melody pitch or not
  - Pitch estimation: estimate the most likely melody pitch of a frame
- These two parts can be performed in either order.
- The performance evaluation of PT depends on both parts.

Slide 39: Performance Evaluation of PT
- Several criteria for evaluating PT performance:
  - Raw pitch accuracy: the probability of a correct pitch value (within ±¼ tone, i.e., ±0.5 semitone) over the voiced frames
  - Raw chroma accuracy: the probability that the chroma (i.e., the note name, ignoring the octave) is correct over the voiced frames
  - Overall accuracy: the probability of a correct pitch value (via pitch estimation) and a correct pitched/unpitched decision (via voicing detection) over all frames
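The three criteria can be sketched on semitone pitch vectors. Two conventions here are assumptions, and other evaluation conventions differ:

- A pitch of 0 marks an unvoiced frame.
- A frame counts toward raw pitch/chroma accuracy only if its estimate is voiced.

```python
def is_voiced(p):
    return p > 0   # assumption: 0 marks an unvoiced frame; otherwise pitch is in semitones


def chroma_close(r, e, tol):
    d = (e - r) % 12            # fold octave errors away
    return min(d, 12 - d) <= tol


def raw_pitch_accuracy(ref, est, tol=0.5):
    voiced = [(r, e) for r, e in zip(ref, est) if is_voiced(r)]
    return sum(is_voiced(e) and abs(e - r) <= tol for r, e in voiced) / len(voiced)


def raw_chroma_accuracy(ref, est, tol=0.5):
    voiced = [(r, e) for r, e in zip(ref, est) if is_voiced(r)]
    return sum(is_voiced(e) and chroma_close(r, e, tol) for r, e in voiced) / len(voiced)


def overall_accuracy(ref, est, tol=0.5):
    def correct(r, e):
        if not is_voiced(r):
            return not is_voiced(e)          # unvoiced frame: voicing decision must agree
        return is_voiced(e) and abs(e - r) <= tol
    return sum(correct(r, e) for r, e in zip(ref, est)) / len(ref)


ref = [60, 60, 0, 62, 64]
est = [60.3, 72, 0, 0, 64.6]    # one octave error, one missed voiced frame, one 0.6-semitone error
print(raw_pitch_accuracy(ref, est), raw_chroma_accuracy(ref, est), overall_accuracy(ref, est))
# 0.25 0.5 0.4
```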

Slide 40: Preprocessing for Pitch Tracking
- Some commonly used preprocessing of the audio signal before pitch tracking:
  - Pre-filtering the signal
  - Clipping the signal
  - SIFT (simple inverse filter tracking) of the signal

Slide 41: Preprocessing: Pre-filtering
- Observation
  - Range of human pitch: [40, 1000] Hz
- Idea
  - Low-pass filter the signal with a cutoff frequency between 800 and 1000 Hz.
- Characteristics
  - The effect is yet to be verified.

Slide 42: Preprocessing: Clipping
- Observation
  - Small signal values near zero are likely to cause pitch tracking errors.
- Idea
  - Clip the signal.
- Characteristics
  - Saves computation on embedded systems
  - The overall effect is yet to be verified.
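The slide does not say which clipping is meant. Center clipping and three-level clipping are two common choices, sketched here as assumptions. Three-level clipping also illustrates the computational saving, since products of values in {−1, 0, +1} need no multiplier.

```python
def center_clip(signal, threshold):
    """Center clipping: zero the small samples near 0 and shift the rest toward 0."""
    out = []
    for x in signal:
        if x > threshold:
            out.append(x - threshold)
        elif x < -threshold:
            out.append(x + threshold)
        else:
            out.append(0.0)
    return out


def three_level_clip(signal, threshold):
    """Three-level clipping: map each sample to -1, 0, or +1."""
    return [1 if x > threshold else -1 if x < -threshold else 0 for x in signal]


frame = [0.1, -0.1, 0.8, -0.9, 0.5]
print(center_clip(frame, 0.3))        # approximately [0.0, 0.0, 0.5, -0.6, 0.2]
print(three_level_clip(frame, 0.3))   # [0, 0, 1, -1, 1]
```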

Slide 43: Preprocessing: SIFT
- Observation
  - Channel effects are likely to cause pitch tracking errors.
- Idea of SIFT (simple inverse filter tracking)
  - Identify the excitation via LPC (linear predictive coding).
  - Use the excitation to compute the PDF.
- Characteristics
  - The overall effect is yet to be verified.
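Here is a minimal SIFT-style sketch. It computes LPC coefficients from the frame's autocorrelation via the Levinson-Durbin recursion, then inverse-filters the frame to obtain the excitation (residual). That residual would then be fed to the PDF, e.g., the ACF. The LPC order and the helper names are illustrative assumptions.

```python
def autocorr(x, max_lag):
    n = len(x)
    return [sum(x[t] * x[t + k] for t in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """LPC coefficients a[1..p] with x(t) ~ sum_k a[k] * x(t - k) (autocorrelation method)."""
    a = [0.0] * (order + 1)        # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:
            break                  # degenerate (e.g., silent) frame
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err   # reflection coefficient
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1 - k * k           # prediction error after order i
    return a[1:]


def lpc_residual(x, coeffs):
    """Inverse filter: e(t) = x(t) - sum_k a_k x(t - k), with x(t) = 0 for t < 0."""
    return [x[t] - sum(c * x[t - k] for k, c in enumerate(coeffs, start=1) if t - k >= 0)
            for t in range(len(x))]


def sift_excitation(frame, order=10):
    """SIFT-style preprocessing: LPC from the frame, then its residual (the excitation)."""
    coeffs = levinson_durbin(autocorr(frame, order), order)
    return lpc_residual(frame, coeffs)
```

As a check, inverse-filtering a signal generated by x(t) = e(t) + 0.9·x(t−1) with the coefficient 0.9 returns the original impulse-train excitation e(t).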

Slide 44: Example of SIFT
- siftAcf01.m

Slide 45: Example of PT Based on SIFT & ACF
- ptBySiftAcf01.m

Slide 46: Postprocessing for Pitch Tracking
- Some commonly used postprocessing for pitch tracking:
  - Smoothing to remove abrupt pitch changes
  - Interpolation to increase pitch precision

Slide 47: Postprocessing: Smoothing
- Smoothing by a median filter
- ptWithMedianFilter01.m
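The following is a median-filter sketch. The window length and the edge handling (replicating the first and last values) are assumptions.

```python
from statistics import median


def median_filter(pitch, window=5):
    """Sliding median with edge replication; removes isolated jumps such as octave errors."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    padded = [pitch[0]] * half + list(pitch) + [pitch[-1]] * half
    return [median(padded[i:i + window]) for i in range(len(pitch))]


pitch = [60, 60, 72, 60, 61]
print(median_filter(pitch, window=3))   # [60, 60, 60, 61, 61]
```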

Slide 48: Postprocessing: Interpolation
- Idea
  - Fit a parabola through the pitch point and its two neighbors, and take its vertex as the refined max position.
- ptWithParabolicFit01.m
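The parabolic fit has a closed form. For a peak at integer index k with neighbors y(k−1) and y(k+1), the vertex lies at k + (y(k−1) − y(k+1)) / (2[y(k−1) − 2y(k) + y(k+1)]). Here it is as a sketch with an illustrative function name:

```python
def parabolic_peak(values, k):
    """Vertex of the parabola through indices k-1, k, k+1; returns a fractional index."""
    left, center, right = values[k - 1], values[k], values[k + 1]
    denom = left - 2 * center + right
    if denom == 0:
        return float(k)            # flat neighborhood: keep the integer index
    return k + 0.5 * (left - right) / denom


# Samples of y = -(x - 2.3)^2 at x = 0..4; the integer max is at 2, the true peak at 2.3.
values = [-(x - 2.3) ** 2 for x in range(5)]
print(round(parabolic_peak(values, 2), 6))   # 2.3
```

For an ACF peak at lag k, the refined pitch is then fs divided by the refined lag rather than by k.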

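Slide 49: UPDUDP (1/4)
- UPDUDP: Unbroken Pitch Determination Using DP (dynamic programming)
  - Goal: to take pitch smoothness into consideration
- Objective function, to be minimized over a path $\mathbf{p} = [p_1, \dots, p_n]$:
  $J(\mathbf{p}, \theta, m) = \sum_{i=1}^{n} A_i(p_i) + \theta \sum_{i=1}^{n-1} \left|p_i - p_{i+1}\right|^m$
  - $\mathbf{p}$: a given path in the AMDF matrix
  - $n$: number of frames
  - $\theta$: transition penalty
  - $m$: exponent of the transition difference
  - $A_i(\cdot)$: the AMDF vector of frame i (column i of the AMDF matrix)
- Reference: Jiang-Chun Chen, J.-S. Roger Jang, "TRUES: Tone Recognition Using Extended Segments", ACM Transactions on Asian Language Information Processing, No. 10, Vol. 7, Aug 2008.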

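Slide 50: UPDUDP (2/4)
- Optimum-value function D(i, j): the minimum cost from frame 1 to position (i, j)
- Recurrence formula, where $A_i(j)$ is the AMDF of frame i at lag index j, $\theta$ is the transition penalty, and $m$ is the exponent:
  $D(i, j) = A_i(j) + \min_{k} \left\{ D(i-1, k) + \theta \left|k - j\right|^m \right\}$, for i = 2, …, n
- Initial conditions: $D(1, j) = A_1(j)$ for all j
- Optimum cost: $\min_{j} D(n, j)$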
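The recurrence maps directly onto a small DP routine. This sketch assumes a cost matrix `cost[i][j]` holding the AMDF of frame i at lag index j; for an ACF, negate it so that minimization still applies. The function name and the toy matrix are illustrative.

```python
def dp_pitch_path(cost, theta=1.0, m=1.0):
    """Minimize sum_i cost[i][p_i] + theta * sum_i |p_i - p_{i+1}|^m by dynamic programming.

    D[i][j] is the minimum cost of any path from the first frame that ends at lag j in frame i.
    """
    n_frames, n_lags = len(cost), len(cost[0])
    D = [list(cost[0])]                 # initial condition: D(1, j) = A_1(j)
    back = [[0] * n_lags]               # back-pointers (unused for the first frame)
    for i in range(1, n_frames):
        row, ptr = [], []
        for j in range(n_lags):
            # best predecessor lag for lag j, including the transition penalty
            k = min(range(n_lags), key=lambda c: D[i - 1][c] + theta * abs(c - j) ** m)
            row.append(cost[i][j] + D[i - 1][k] + theta * abs(k - j) ** m)
            ptr.append(k)
        D.append(row)
        back.append(ptr)
    j = min(range(n_lags), key=lambda c: D[-1][c])   # optimum cost: min_j D(n, j)
    best_cost = D[-1][j]
    path = [j]
    for i in range(n_frames - 1, 0, -1):             # trace the back-pointers
        j = back[i][j]
        path.append(j)
    return path[::-1], best_cost


cost = [[1.0, 0.0, 5.0],
        [0.5, 1.0, 0.9],
        [5.0, 0.0, 1.0]]
print(dp_pitch_path(cost, theta=1.0, m=1.0))   # ([1, 1, 1], 1.0)
```

In the toy matrix, the frame-wise minima (lags 1, 0, 1) cost 0.5 but pay a transition penalty of 2. The smooth path [1, 1, 1] with cost 1.0 therefore wins.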

Slide 51: Example of UPDUDP
- A typical example (via AMDF)

Slide 52: Robustness of UPDUDP
- Insensitivity in

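Slide 53: Another Example of UPDUDP
- Example of MATLAB code using UPDUDP (via ACF):
    waveFile='arina_short.wav';
    wObj=waveFile2obj(waveFile);              % read the wave file into a structure (with fields fs and nbits)
    ptOpt=ptOptSet(wObj.fs, wObj.nbits, 1);   % set pitch-tracking options for this sample rate and bit resolution
    pitch=pitchTracking(wObj, ptOpt, 1);      % run pitch tracking and return the pitch vector
- Result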

Slide 54: Frequency to Semitone Conversion
- Semitone: a musical scale based on A440:
  $semitone = 69 + 12\log_2\left(\frac{f}{440}\right)$
- Reasonable pitch range:
  - E2 to C6
  - 82 Hz to 1047 Hz (semitones 40 to 84)
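Here are the conversion and its inverse as a sketch, with illustrative function names:

```python
import math


def freq_to_semitone(freq_hz):
    """Semitone scale anchored at A440 = 69 (the MIDI note numbering)."""
    return 69 + 12 * math.log2(freq_hz / 440.0)


def semitone_to_freq(semitone):
    return 440.0 * 2 ** ((semitone - 69) / 12)


print(round(freq_to_semitone(440.0), 6))    # 69.0
print(round(semitone_to_freq(40), 2))       # 82.41 (E2)
print(round(semitone_to_freq(84), 2))       # 1046.5 (C6)
```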

Slide 55: Unreliable Pitch Removal (1/2)
- Pitch removal via volume thresholding

Slide 56: Unreliable Pitch Removal (2/2)
- Pitch removal via volume and clarity thresholding
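The following is a thresholding sketch. How volume and clarity are measured per frame, and the threshold values used, are assumptions left to the caller.

```python
def remove_unreliable_pitch(pitch, volume, clarity, volume_th, clarity_th):
    """Set pitch to 0 (treated as a rest) where the frame is too quiet or too unclear."""
    return [p if v >= volume_th and c >= clarity_th else 0
            for p, v, c in zip(pitch, volume, clarity)]


pitch = [60.2, 61.0, 75.3, 62.1]
volume = [0.8, 0.9, 0.05, 0.7]
clarity = [0.95, 0.4, 0.9, 0.85]
print(remove_unreliable_pitch(pitch, volume, clarity, volume_th=0.1, clarity_th=0.5))
# [60.2, 0, 0, 62.1]
```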

Slide 57: Rest Handling
(Figure panels)
- Original pitch vectors with rests
- Rests removed: good for DTW (dynamic time warping)
- Rests replaced by the previous nonzero pitch: good for LS (linear scaling)
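Both rest-handling options can be sketched as below. Filling leading rests with the first nonzero pitch is an assumption, since those rests have no previous pitch.

```python
def remove_rests(pitch):
    """Drop rest frames (pitch 0); the shorter vector suits DTW."""
    return [p for p in pitch if p > 0]


def fill_rests(pitch):
    """Replace each rest by the previous nonzero pitch, keeping the length (suits LS)."""
    voiced = [p for p in pitch if p > 0]
    if not voiced:
        return list(pitch)
    last = voiced[0]               # assumption: leading rests take the first nonzero pitch
    out = []
    for p in pitch:
        if p > 0:
            last = p
        out.append(last)
    return out


pitch = [0, 60, 0, 0, 62, 0]
print(remove_rests(pitch))   # [60, 62]
print(fill_rests(pitch))     # [60, 60, 60, 60, 62, 62]
```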

Slide 58: Typical Result of Pitch Tracking
- Pitch tracking via autocorrelation for 茉莉花 (Jasmine Flower)

Slide 59: Comparison of Pitch Vectors
- Yellow line: target pitch vector

Slide 60: Other Pitch-Related Demos
- Pitch scaling
  - pitchShiftDemo/project1.exe
  - pitchShift-multirate/multirate.m

