
1 Topics
b.milner@uea.ac.uk
- Recognition results on Aurora noisy speech database
- Proposal of robust formant estimation from MFCCs
- Availability of real in-car speech databases
- Contact from Pi Research


3 Robust Formant Prediction from MFCCs
- One aim of this integrated project is to use the speech recogniser to provide clean speech information to the speech enhancement component
- The proposal is to use the speech recogniser to provide robust formant information from noisy speech
- Review previous work on predicting pitch from MFCC vectors
- Extend this to the proposed prediction of formants

4 Pitch Prediction from MFCCs
- In speech recognition the most commonly extracted feature is the mel-frequency cepstral coefficient (MFCC)
- The MFCC is designed for class discrimination and contains spectral envelope information
- Excitation information (pitch) is lost through smoothing processes
- A project at UEA aimed to reconstruct speech from MFCC vectors, and therefore needed either an additional pitch estimate or a prediction of pitch
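One of the smoothing steps mentioned above is cepstral truncation. The sketch below, a minimal illustration assuming a 23-channel log filterbank vector and an orthonormal DCT-II (the function names are hypothetical, not taken from the UEA system), shows that keeping only the first 13 coefficients and inverting the DCT returns a smoothed envelope: the fine detail that would carry harmonic (pitch) structure sits in the discarded higher coefficients.

```python
import numpy as np

def dct_ortho(n):
    """Orthonormal DCT-II matrix (n x n); its inverse is its transpose."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def smoothed_log_spectrum(log_fbank, n_ceps):
    """Keep the first n_ceps cepstral coefficients, zero the rest, invert the DCT.

    Fine spectral detail (e.g. ripple from pitch harmonics) lives in the
    higher coefficients, so truncation leaves only a smoothed envelope.
    """
    c = dct_ortho(len(log_fbank))
    ceps = c @ log_fbank          # cepstrum of the log filterbank vector
    ceps[n_ceps:] = 0.0           # truncation: discard higher coefficients
    return c.T @ ceps             # back to the (smoothed) log filterbank domain

# Usage: a random 23-channel log filterbank vector truncated to 13 coefficients
rng = np.random.default_rng(0)
log_fbank = rng.normal(size=23)
envelope = smoothed_log_spectrum(log_fbank, n_ceps=13)
```

Because the detail removed by truncation is simply zeroed, it cannot be recovered from the MFCC vector itself, which is why pitch has to be estimated separately or predicted.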

5 MFCC Extraction
- Mel-frequency cepstral coefficients (MFCCs):
  - designed for speech recognisers
  - simulate human perceptual ability
  - currently give the best recognition performance
  - extract vocal tract information
  - discard most speaker information, such as pitch
- Processing chain: speech → framing, pre-emphasis and windowing → FFT and magnitude spectrum → mel filterbank → log( ) → DCT → truncation → 13-D MFCCs
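The processing chain above can be sketched in NumPy as follows. This is a minimal illustration, not the front-end used in the project: the parameter values (8 kHz sampling, 25 ms frames with a 10 ms shift, a 256-point FFT, 23 mel channels, pre-emphasis factor 0.97) are assumptions broadly similar to common Aurora-style settings, and the filterbank design and helper names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters spaced uniformly on the mel scale (assumed design)."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):          # rising edge
            fb[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):         # falling edge
            fb[i - 1, k] = (right - k) / max(right - centre, 1)
    return fb

def dct_matrix(n_out, n_in):
    """Unnormalised DCT-II basis, keeping only the first n_out rows (truncation)."""
    n = np.arange(n_in)[None, :]
    k = np.arange(n_out)[:, None]
    return np.cos(np.pi * k * (2 * n + 1) / (2 * n_in))

def mfcc(signal, fs=8000, frame_len=200, frame_shift=80, n_fft=256,
         n_filters=23, n_ceps=13, preemph=0.97):
    # Framing: one row per frame
    n_frames = 1 + (len(signal) - frame_len) // frame_shift
    idx = np.arange(frame_len)[None, :] + frame_shift * np.arange(n_frames)[:, None]
    frames = signal[idx].astype(float)
    # Pre-emphasis (applied per frame here) and Hamming window
    frames = np.hstack([frames[:, :1], frames[:, 1:] - preemph * frames[:, :-1]])
    frames *= np.hamming(frame_len)
    # FFT and magnitude spectrum
    mag = np.abs(np.fft.rfft(frames, n=n_fft))
    # Mel filterbank, then log (floored to avoid log(0))
    fbank = mag @ mel_filterbank(n_filters, n_fft, fs).T
    log_fbank = np.log(np.maximum(fbank, 1e-10))
    # DCT and truncation to n_ceps coefficients
    return log_fbank @ dct_matrix(n_ceps, n_filters).T

# Usage: one second of noise at 8 kHz gives a (frames x 13) MFCC matrix
sig = np.random.default_rng(0).normal(size=8000)
feats = mfcc(sig)
```

Note that the DCT is applied after the mel filterbank and log, which is the order the transform requires.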

6 Pitch Prediction from MFCC Vectors
- There is clearly no global correlation between pitch frequency and the spectral envelope (or MFCC vector)
- A class-dependent correlation does exist, the classes being different speech sounds
- If this class-based correlation can be modelled, then predicting pitch from the spectral envelope, or MFCC, should be possible
- Two methods for modelling this correlation are investigated:
  - GMM
  - HMM

7 Class-based GMM Pitch Prediction
Training phase
- Introduce the augmented feature vector y = [x, f], where x is the MFCC vector and f is the pitch
- Model the joint distribution by clustering to form a GMM; 64 to 128 clusters were tested
Pitch prediction
- During the prediction stage only the MFCC component x is available
- Pitch is predicted using a MAP algorithm from the means and covariances of the clusters
- This does not fully exploit the class-based correlation between the MFCC vector and pitch
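A minimal sketch of the prediction stage, assuming one common reading of "MAP from the means and covariances": select the cluster k with the highest posterior given x alone, then take the conditional mean (which is also the mode) of f given x within that cluster, f̂ = μ_k^f + Σ_k^fx (Σ_k^xx)^-1 (x − μ_k^x). The function names and the toy two-cluster parameters below are hypothetical; a real system would use the 64 to 128 trained clusters.

```python
import numpy as np

def gaussian_logpdf(x, mean, cov):
    """Log density of a multivariate Gaussian at a single point x."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (len(x) * np.log(2.0 * np.pi) + logdet + maha)

def predict_pitch_gmm(x, weights, means, covs, dx):
    """Predict pitch f from MFCC vector x with a GMM over y = [x, f].

    means[k], covs[k] describe the joint (x, f) Gaussian of cluster k;
    the first dx dimensions are the MFCC part.
    Returns (predicted f, index of the selected cluster).
    """
    log_post = np.empty(len(weights))
    cond_means = []
    for k, (w, mu, cov) in enumerate(zip(weights, means, covs)):
        mu_x, mu_f = mu[:dx], mu[dx:]
        s_xx, s_fx = cov[:dx, :dx], cov[dx:, :dx]
        # Unnormalised log posterior of cluster k given the MFCC part only
        log_post[k] = np.log(w) + gaussian_logpdf(x, mu_x, s_xx)
        # Conditional mean (= mode) of f given x within cluster k
        cond_means.append(mu_f + s_fx @ np.linalg.solve(s_xx, x - mu_x))
    k_best = int(np.argmax(log_post))
    return cond_means[k_best], k_best

# Usage: toy 2-D "MFCC" vectors with a scalar pitch in each cluster
weights = np.array([0.5, 0.5])
means = [np.array([0.0, 0.0, 100.0]), np.array([10.0, 10.0, 200.0])]
covs = [np.array([[1.0, 0.0, 10.0], [0.0, 1.0, 0.0], [10.0, 0.0, 200.0]]),
        np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 5.0], [0.0, 5.0, 100.0]])]
f_hat, k = predict_pitch_gmm(np.array([0.5, 0.0]), weights, means, covs, dx=2)
```

A posterior-weighted sum of the per-cluster conditional means is a common alternative to picking a single cluster.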

8 HMM Pitch Prediction
Training phase
- Model the joint distribution of pitch and MFCCs using a series of HMMs
Pitch prediction
- Perform standard Viterbi decoding of the MFCC stream through the HMMs
- Use the model and state sequence information to locate the mapping for each MFCC vector, then use MAP to predict pitch
Limitations of the GMM approach
- The GMM does not model the temporal correlation of pitch
- GMM clusters are trained unsupervised; it may be better to use supervised training
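The decode-then-map idea can be sketched as below, under simplifying assumptions: a single toy HMM whose states each hold a joint Gaussian over [x, f], Viterbi decoding using only the MFCC part, and then the same per-state conditional-mean mapping used for the GMM. The parameters and function names are hypothetical; the actual system uses a series of trained HMMs.

```python
import numpy as np

def log_gauss(X, mean, cov):
    """Row-wise multivariate Gaussian log density for X of shape (T, d)."""
    diff = X - mean
    sol = np.linalg.solve(cov, diff.T).T
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (X.shape[1] * np.log(2.0 * np.pi) + logdet + np.sum(diff * sol, axis=1))

def viterbi(log_emis, log_trans, log_init):
    """Most likely state path. log_emis: (T, S), log_trans: (S, S), log_init: (S,)."""
    T, S = log_emis.shape
    delta = log_init + log_emis[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans        # scores[i, j]: from state i to j
        back[t] = np.argmax(scores, axis=0)
        delta = scores[back[t], np.arange(S)] + log_emis[t]
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(delta))
    for t in range(T - 1, 0, -1):                  # backtrace
        path[t - 1] = back[t, path[t]]
    return path

def hmm_pitch_predict(X, means, covs, log_trans, log_init, dx):
    """Decode the MFCC stream, then map each frame to pitch via its state's Gaussian."""
    log_emis = np.column_stack([log_gauss(X, m[:dx], c[:dx, :dx])
                                for m, c in zip(means, covs)])
    path = viterbi(log_emis, log_trans, log_init)
    f = np.empty(len(X))
    for t, s in enumerate(path):
        mu, c = means[s], covs[s]
        f[t] = mu[dx] + c[dx, :dx] @ np.linalg.solve(c[:dx, :dx], X[t] - mu[:dx])
    return path, f

# Usage: two-state toy model, 2-D MFCC part plus scalar pitch
means = [np.array([0.0, 0.0, 120.0]), np.array([5.0, 5.0, 180.0])]
covs = [np.array([[1.0, 0.0, 4.0], [0.0, 1.0, 0.0], [4.0, 0.0, 50.0]]),
        np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 3.0], [0.0, 3.0, 50.0]])]
log_trans = np.log(np.array([[0.6, 0.4], [0.01, 0.99]]))
log_init = np.log(np.array([0.99, 0.01]))
X = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.0, 6.0]])
path, f = hmm_pitch_predict(X, means, covs, log_trans, log_init, dx=2)
```

The transition probabilities are what let the decoder use temporal context, which the frame-by-frame GMM cannot.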

9 Pitch Prediction Results
- Aurora database: 200 utterances for training (50 speakers), 90 utterances for testing (23 speakers)
- 42,902 frames in total

10 Reconstructed Speech
Audio examples: original; MFCC + HMM-based pitch; MFCC + reference pitch

11 Extension to Formant Prediction
- Prediction of formants may also be possible from MFCC vectors using a similar strategy of modelling the joint distribution: y = [x, f1, f2, f3, f4, …]
- There is potentially a stronger correlation between formants and MFCCs than between pitch and MFCCs
- Use the Brunel formant estimator to provide the frequency, bandwidth and amplitude of the formants
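Extending the augmented vector to several formants only changes f from a scalar to a vector; the conditional-mean mapping is otherwise unchanged. The sketch below, a minimal illustration with hypothetical names and synthetic data, fits a single joint Gaussian to [x, f1, f2, f3, f4] training pairs and predicts all four formant frequencies at once. In practice one such Gaussian would be kept per cluster or per HMM state, and the training formants would come from the formant estimator.

```python
import numpy as np

def fit_joint_gaussian(X, F):
    """Mean and covariance of augmented vectors y = [x, f1, f2, ...]."""
    Y = np.hstack([X, F])
    return Y.mean(axis=0), np.cov(Y, rowvar=False)

def predict_formants(x, mean, cov, dx):
    """Conditional mean of the formant vector given the MFCC part x."""
    mu_x, mu_f = mean[:dx], mean[dx:]
    s_xx, s_fx = cov[:dx, :dx], cov[dx:, :dx]
    return mu_f + s_fx @ np.linalg.solve(s_xx, x - mu_x)

# Usage: synthetic data where formants depend linearly on a 3-D "MFCC" vector
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
A = rng.normal(size=(3, 4))
b = np.array([500.0, 1500.0, 2500.0, 3500.0])   # offsets in Hz (synthetic)
F = X @ A + b
mean, cov = fit_joint_gaussian(X, F)
x_new = np.array([0.1, -0.2, 0.3])
f_pred = predict_formants(x_new, mean, cov, dx=3)
```

With noiseless linear data this reduces to least-squares regression and recovers the relationship exactly; real formant data would only be approximately linear within each class.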

12 Why Predict Formants?
- Formant estimation from noisy speech is a difficult task and prone to errors
- Predicting formants from MFCCs may be more robust
- Before prediction, noise compensation methods (spectral subtraction/Wiener filtering) can be applied to the MFCCs
- Alternatively, model the joint distribution of noisy MFCCs and formants
- In effect, this utilises the correlation information available inside the speech models themselves
- Formant predictions provide the clean speech information needed by the speech enhancement component of the project
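The two compensation options named above operate on the spectrum before the mel filterbank, log and DCT stages. A minimal sketch, assuming a per-bin noise power estimate is already available (for example averaged over non-speech frames) and using hypothetical parameter names: power spectral subtraction with an over-subtraction factor and a spectral floor, and a Wiener gain whose a priori SNR is crudely estimated as max(P/N − 1, 0).

```python
import numpy as np

def spectral_subtraction(power_spec, noise_power, alpha=1.0, beta=0.01):
    """Subtract alpha * noise estimate; floor at beta * noisy power to avoid negatives."""
    return np.maximum(power_spec - alpha * noise_power, beta * power_spec)

def wiener_gain(power_spec, noise_power, floor=1e-3):
    """Wiener gain xi / (1 + xi) with a crude a priori SNR xi = max(P / N - 1, 0)."""
    xi = np.maximum(power_spec / noise_power - 1.0, 0.0)
    return np.maximum(xi / (1.0 + xi), floor)

# Usage on three frequency bins with unit noise power
P = np.array([2.0, 4.0, 1.0])
N = np.ones(3)
cleaned = spectral_subtraction(P, N)
gain = wiener_gain(P, N)   # multiply the noisy power spectrum by this gain
```

The floor terms matter in practice: without them, bins where the noise estimate exceeds the noisy power would give zero or negative values, and the subsequent log would fail.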

13 Noisy Speech Databases
- Two more noisy speech databases are available:
  - SpeechDat-Car (Danish)
  - SpeechDat-Car (Spanish)
- Connected digit strings recorded in a moving car under different driving conditions
- Both hands-free and close-talking microphones
- Available through the SIG in COST278; availability to the other partners will be requested

14 Pi Research
- Pi Research in Cambridge specialises in data communication in Formula 1 racing
- They made an approach regarding the possibility of reducing noise on driver-to-pit-crew communication
- Example: SNRs down to -30 dB
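To put -30 dB in perspective: SNR in dB is 10·log10(P_speech / P_noise), so -30 dB means the noise power is 1000 times the speech power. The sketch below, with hypothetical helper names and synthetic signals, scales a noise signal so that a mixture hits a target SNR, which is one way such conditions could be simulated for testing.

```python
import numpy as np

def snr_db(speech, noise):
    """SNR in dB from mean signal powers."""
    return 10.0 * np.log10(np.mean(speech ** 2) / np.mean(noise ** 2))

def mix_at_snr(speech, noise, target_db):
    """Scale noise so that speech + scaled noise has the target SNR."""
    p_s = np.mean(speech ** 2)
    p_n = np.mean(noise ** 2)
    scale = np.sqrt(p_s / (p_n * 10.0 ** (target_db / 10.0)))
    scaled_noise = scale * noise
    return speech + scaled_noise, scaled_noise

# Usage: a 200 Hz tone standing in for speech, white noise, target -30 dB
rng = np.random.default_rng(1)
t = np.arange(8000) / 8000.0
speech = np.sin(2 * np.pi * 200 * t)
noise = rng.normal(size=t.size)
noisy, scaled_noise = mix_at_snr(speech, noise, -30.0)
```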

15 Pi Research

16 End

