Harmonic-Temporal Clustering of Speech
Jonathan Le Roux, Hirokazu Kameoka, Nobutaka Ono, Alain de Cheveigné, Shigeki Sagayama
Motivation and Approach
Precise and robust F0 analysis is needed for the analysis of complex and varied acoustical scenes. For speech, applications include speech recognition, prosody analysis, speech enhancement and speaker identification…
Desirable features of a new pitch determination algorithm (PDA):
- The performance should stay high in a wide range of background noises (white noise, pink noise, noise bursts, music, other speech).
- It should be possible to extract the pitch contours of several concurrent voices simultaneously.
- It should rest on an overall speech model: a spectro-temporal model with constraints.
Several existing multi-pitch tracking algorithms first perform a frame-by-frame analysis, then apply post-processing (for example using HMMs) to reduce errors and obtain a smooth pitch contour.
We propose to perform estimation and model-based interpolation simultaneously, using:
- a parametric model of the voiced parts of the power spectrum of speech;
- a noise model, introduced to extract harmonically structured "islands" within a "sea" of unstructured noise.
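The split between harmonic "islands" and the noise "sea" can be illustrated with a minimal sketch. It assumes, for illustration only, that one voiced source is a sum of Gaussians placed at log F0 + log n along log-frequency and that the noise model is a flat floor; the function names, the flat noise level and all parameter values are hypothetical, not the authors' formulation. For each log-frequency bin it returns the fraction of modelled power claimed by the harmonic model, in the manner of an EM-style posterior.

```python
import math


def gaussian(x, mean, sigma):
    """Normalised 1-D Gaussian density."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)


def harmonic_share(log_freqs, log_f0, harmonic_weights, sigma, noise_level):
    """Fraction of modelled power explained by the harmonic model in each bin.

    Hypothetical sketch: harmonic n sits at log_f0 + log(n) on the log-frequency
    axis, and the noise model is assumed to be a flat level across all bins.
    """
    shares = []
    for x in log_freqs:
        harmonic = sum(
            w * gaussian(x, log_f0 + math.log(n), sigma)
            for n, w in enumerate(harmonic_weights, start=1)
        )
        shares.append(harmonic / (harmonic + noise_level))
    return shares


# A bin on the 2nd harmonic of a 200 Hz voice versus a bin between harmonics.
shares = harmonic_share(
    [math.log(400.0), math.log(300.0)],
    log_f0=math.log(200.0),
    harmonic_weights=[0.5, 0.3, 0.2],
    sigma=0.02,
    noise_level=0.1,
)
print(shares)
```

Bins close to a harmonic are attributed almost entirely to the harmonic model, while bins between harmonics fall to the noise model, which is the intuition behind the "islands within a sea" picture.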
Overview of the method
- Express the whole pitch contour as a smooth curve: a cubic spline.
- Distribute audio objects with different acoustical properties.
- Express the harmonic structure (along log-frequency) as a parametric function: a GMM.
- Express the power envelope in the time direction as a parametric function: a GMM.
- Optimize all the parameters simultaneously.
Characteristic: through the harmonicity assumption, the method models the voiced parts of speech.
[Figure: model in the time / log-frequency plane]
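The overview can be made concrete with a minimal sketch of one harmonic-temporal source, under stated assumptions: the log-F0 contour mu(t) is a natural cubic spline through hand-picked knots, harmonic n is a Gaussian centred at mu(t) + log n along log-frequency, and the power envelope is a sum of Gaussians in time centred at tau + y*phi. All names (natural_cubic_spline, source_power, tau, phi, sigma) and the exact kernel shapes are illustrative assumptions, not the authors' code, and the simultaneous parameter optimization itself is not shown.

```python
import bisect
import math


def gaussian(x, mean, sigma):
    """Normalised 1-D Gaussian density."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)


def natural_cubic_spline(knot_t, knot_y):
    """Return a callable evaluating the natural cubic spline through the knots."""
    n = len(knot_t) - 1
    h = [knot_t[i + 1] - knot_t[i] for i in range(n)]
    m = [0.0] * (n + 1)  # second derivatives; natural spline: m[0] = m[n] = 0
    if n > 1:
        # Tridiagonal system for m[1..n-1], solved with the Thomas algorithm.
        a = [h[i - 1] for i in range(1, n)]
        b = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        c = [h[i] for i in range(1, n)]
        d = [6.0 * ((knot_y[i + 1] - knot_y[i]) / h[i]
                    - (knot_y[i] - knot_y[i - 1]) / h[i - 1]) for i in range(1, n)]
        for j in range(1, n - 1):  # forward elimination
            w = a[j] / b[j - 1]
            b[j] -= w * c[j - 1]
            d[j] -= w * d[j - 1]
        sol = [0.0] * (n - 1)
        sol[-1] = d[-1] / b[-1]
        for j in range(n - 3, -1, -1):  # back substitution
            sol[j] = (d[j] - c[j] * sol[j + 1]) / b[j]
        m[1:n] = sol

    def evaluate(t):
        # Locate the knot interval containing t (clamped to the end intervals).
        i = min(max(bisect.bisect_right(knot_t, t) - 1, 0), n - 1)
        A = (knot_t[i + 1] - t) / h[i]
        B = (t - knot_t[i]) / h[i]
        return (A * knot_y[i] + B * knot_y[i + 1]
                + ((A ** 3 - A) * m[i] + (B ** 3 - B) * m[i + 1]) * h[i] ** 2 / 6.0)

    return evaluate


def source_power(x, t, mu, harmonic_weights, temporal_weights, tau, phi, sigma, energy=1.0):
    """Modelled power of one source at log-frequency x and time t.

    Harmonic GMM along log-frequency (centres mu(t) + log n) times a temporal
    GMM (centres tau + y * phi, width phi); both shapes are assumptions.
    """
    spectral = sum(v * gaussian(x, mu(t) + math.log(n), sigma)
                   for n, v in enumerate(harmonic_weights, start=1))
    temporal = sum(u * gaussian(t, tau + y * phi, phi)
                   for y, u in enumerate(temporal_weights))
    return energy * spectral * temporal


# Hypothetical log-F0 contour rising from 200 Hz to 220 Hz and back down.
mu = natural_cubic_spline([0.0, 0.1, 0.2, 0.3],
                          [math.log(200.0), math.log(220.0), math.log(210.0), math.log(200.0)])
on_harmonic = source_power(mu(0.1) + math.log(2), 0.1, mu, [0.5, 0.3, 0.2], [0.6, 0.4], 0.05, 0.05, 0.02)
off_harmonic = source_power(math.log(330.0), 0.1, mu, [0.5, 0.3, 0.2], [0.6, 0.4], 0.05, 0.05, 0.02)
print(on_harmonic > off_harmonic)
```

Because each Gaussian depends either on the harmonic index n or on the temporal index y, never both, the double sum over harmonics and temporal kernels factorises into the product of two single sums, which is what source_power exploits.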
F0 estimation in noisy environments
[Figure: speech mixed with broadband background noise]
[Figure: voiced speech with several types of interference]
[Table: accuracy (%) of the F0 estimation]
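The slide reports accuracy (%) without stating the correctness criterion. A common convention, assumed here purely for illustration, counts a voiced reference frame as correct when the estimate lies within a relative tolerance of the reference F0 (5% by default); the function name, the tolerance and the use of 0 to mark unvoiced frames are all assumptions, not the evaluation protocol of the deck.

```python
def f0_accuracy(reference, estimate, tolerance=0.05):
    """Percentage of voiced reference frames estimated within a relative tolerance.

    Assumptions: frames are aligned, 0 marks an unvoiced frame or a missing
    estimate, and only voiced reference frames are scored.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same number of frames")
    voiced = [(r, e) for r, e in zip(reference, estimate) if r > 0]
    if not voiced:
        raise ValueError("reference contains no voiced frames")
    correct = sum(1 for r, e in voiced if e > 0 and abs(e - r) <= tolerance * r)
    return 100.0 * correct / len(voiced)


print(f0_accuracy([100.0, 200.0, 0.0, 150.0], [102.0, 260.0, 120.0, 0.0]))
```

In this toy call, three reference frames are voiced and only the first estimate falls inside the 5% band, so one frame out of three counts as correct.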
Multi-pitch estimation
Co-channel speech: two speakers speaking simultaneously with equal average power.
Test data: Bagshaw database, 150 mixtures; 16 kHz monaural signals.
Results:
[Figure: spectrogram (50 Hz to 8 kHz, 0 to 1.3 s) with estimated F0 contours for the utterances "a-o-i" and "o-i-o-o-u"; annotation: "No second sound here"]
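One way to build co-channel material at equal average power is sketched below, assuming both utterances are lists of samples at the same rate (16 kHz in the test data) and that the longer one is truncated. The function name and the truncation policy are hypothetical, not taken from the experimental setup described here.

```python
import math


def mix_equal_power(target, interferer):
    """Scale the interferer to the target's average power and add sample-wise.

    Both inputs are assumed to share one sampling rate; the longer signal is
    truncated to the shorter length (an illustrative choice).
    """
    n = min(len(target), len(interferer))
    if n == 0:
        raise ValueError("signals must be non-empty")
    p_target = sum(x * x for x in target[:n]) / n
    p_interf = sum(x * x for x in interferer[:n]) / n
    if p_interf == 0.0:
        raise ValueError("interferer is silent")
    gain = math.sqrt(p_target / p_interf)  # equal average power, i.e. a 0 dB ratio
    return [a + gain * b for a, b in zip(target[:n], interferer[:n])]


mixture = mix_equal_power([1.0, -1.0, 1.0, -1.0], [2.0, 0.0, -2.0, 0.0, 5.0])
print(mixture)
```

Scaling by the square root of the power ratio equalizes the mean squared amplitudes of the two talkers over the mixed span.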