
1
Current HOARSE-related activities, 6-7 Sept 2002

2
…include the following (and more):

Novel architectures
1. All-combinations HMM/ANN
2. Tandem HMM/ANN hybrid
3. DBNs: exploring new topologies
4. HMM2: formant features for speaker normalisation

Clustering & segmentation
5. Speech/music segmentation
6. Speaker clustering

Evidence weighting
7. Mic arrays for MD mask estimation
8. Entropy-based MS combination
9. Confusion-based entropy correction
10. Noise PDF transformation in MD ASR

3
All-combinations HMM/ANN
[Figure: AC sum rule]
ACMS (all-combinations multi-stream) overcomes the assumption of conditional independence between data streams.
MAP static weighting leads to MAP combination after decoding.
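A minimal sketch of an all-combinations sum rule, assuming one expert per non-empty subset of streams (2 streams give 3 experts) and fixed, static weights. The slide's own AC formula is not recoverable, so the helpers `all_combinations` and `ac_sum_rule`, the posterior values and the uniform weights are all hypothetical. Including experts trained on joint stream combinations is, presumably, how ACMS avoids assuming the streams are conditionally independent.

```python
from itertools import combinations


def all_combinations(n_streams):
    """Every non-empty subset of stream indices, e.g. (0,), (1,), (0, 1)."""
    return [c for r in range(1, n_streams + 1)
            for c in combinations(range(n_streams), r)]


def ac_sum_rule(expert_posteriors, weights):
    """Weighted sum of class posteriors from one expert per stream combination.

    expert_posteriors: dict mapping a stream subset (tuple) to the list of
        class posteriors P(q_k | x_c) from the expert trained on that subset.
    weights: dict mapping the same subsets to non-negative weights summing to 1.
    """
    n_classes = len(next(iter(expert_posteriors.values())))
    combined = [0.0] * n_classes
    for subset, post in expert_posteriors.items():
        w = weights[subset]
        for k in range(n_classes):
            combined[k] += w * post[k]
    return combined


# Hypothetical 2-stream, 2-class example.
subsets = all_combinations(2)                         # [(0,), (1,), (0, 1)]
posteriors = {(0,): [0.7, 0.3], (1,): [0.4, 0.6], (0, 1): [0.8, 0.2]}
weights = {s: 1.0 / len(subsets) for s in subsets}    # static, uniform weights
combined = ac_sum_rule(posteriors, weights)
```

Because the weights sum to 1 and each expert outputs a proper distribution, the combined vector is also a proper distribution.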

4
Tandem HMM/ANN hybrid
Output from one or more MLPs is appended and orthogonalised, then used as discriminative feature data for training a standard HMM/GMM, e.g. to combine MSG with PLP.
Training narrow sub-band MLPs with noisy data results in robust features which are independent of noise type.
Robust sub-band features are concatenated before input to the speech feature extractor.
[Figures: Tandem multi-band; Tandem multi-stream]
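A minimal tandem feature sketch with NumPy, assuming the MLP outputs are taken as log posteriors and that "orthogonalised" means a PCA (Karhunen-Loève) rotation. The Dirichlet-sampled posteriors are random stand-ins for real MSG- and PLP-input MLPs, and `tandem_features` is a hypothetical helper. In practice the PCA basis would be estimated on training data and then reused unchanged on test data.

```python
import numpy as np


def tandem_features(mlp_posteriors, n_components, eps=1e-10):
    """Append log posteriors from several MLPs, then orthogonalise with PCA.

    mlp_posteriors: list of arrays, each of shape (n_frames, n_classes_i).
    Returns an array of shape (n_frames, n_components) with decorrelated columns.
    """
    logs = [np.log(p + eps) for p in mlp_posteriors]   # eps guards log(0)
    stacked = np.hstack(logs)                          # append MLP outputs per frame
    centred = stacked - stacked.mean(axis=0)
    cov = np.cov(centred, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)             # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]   # keep the largest components
    return centred @ eigvecs[:, order]


rng = np.random.default_rng(0)
msg_post = rng.dirichlet(np.ones(3), size=200)   # stand-in for an MSG-input MLP
plp_post = rng.dirichlet(np.ones(4), size=200)   # stand-in for a PLP-input MLP
feats = tandem_features([msg_post, plp_post], n_components=5)
```

The decorrelated output suits the diagonal-covariance Gaussians typically used in a standard HMM/GMM back end.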

5
DBNs: exploring new topologies
[Figures: Baseline DBN for IWR; Topology 1; Topology 2; Topology 3]
Auxiliary variables tested: articulator (quantised) (+); pitch (-); speech rate (-); energy (+)

6
HMM2: formant features for speaker normalisation
WER averaged over SNR:
4 formants = 28.1%
MFCC = 14.8%
formants + MFCC = 14.3%

7
Speech/Music Segmentation
[Figures: Entropy; Dynamism]
Best results came from concatenated entropy and dynamism features. Whether GMM or MLP gives the best results is task dependent. ACMS was not tested.
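Entropy and dynamism can be computed from frame-level MLP phone posteriors. The sketch below uses the definitions commonly found in the speech/music discrimination literature: per-frame posterior entropy, and the summed squared change in posteriors between consecutive frames, each averaged over a window. Whether the slides used exactly these definitions is an assumption, and the helper names and window length are hypothetical. Speech typically gives confident, rapidly changing posteriors (low entropy, high dynamism) and music the reverse.

```python
import math


def frame_entropy(post):
    """Entropy (nats) of one frame of class posteriors; 0 * log 0 is taken as 0."""
    return -sum(p * math.log(p) for p in post if p > 0)


def entropy_dynamism(posteriors, window):
    """One [entropy, dynamism] feature pair per non-overlapping window.

    posteriors: list of frames, each a list of class posteriors summing to 1.
    """
    ent = [frame_entropy(f) for f in posteriors]
    # Dynamism between frame n and n+1: squared posterior change, summed over classes.
    dyn = [sum((a - b) ** 2 for a, b in zip(f, g))
           for f, g in zip(posteriors, posteriors[1:])]
    feats = []
    for start in range(0, len(dyn) - window + 1, window):
        h = sum(ent[start:start + window]) / window
        d = sum(dyn[start:start + window]) / window
        feats.append([h, d])   # concatenated features, as on the slide
    return feats


stream = [[1.0, 0.0], [0.0, 1.0]] * 5   # toy posteriors flipping every frame
feats = entropy_dynamism(stream, window=3)
```

For this toy stream every window has entropy 0 and dynamism 2, the "speech-like" extreme.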

8
Speaker Clustering
The usual distance is the BIC (Bayesian Information Criterion) distance.
Clustering: start with many clusters, then repeatedly merge the cluster pair with the most negative distance until no such pair remains.
New model: the proposed distance avoids estimating the BIC penalty weight λ (lambda) by ensuring K = 0, where K is the difference between the size (number of parameters) of the merged cluster model and the sum of the sizes of the separate cluster models.
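Writing $\mathrm{BIC}(M) = \log L(M) - \frac{\lambda}{2} P_M \log n$, with $P_M$ the number of parameters and $n$ the number of samples, the merge distance is

$$d(a,b) = \mathrm{BIC}_{\text{sep}} - \mathrm{BIC}_{\text{merged}} = \log L_a + \log L_b - \log L_{ab} + \tfrac{\lambda}{2} K \log n, \qquad K = P_{ab} - P_a - P_b,$$

so a negative $d$ favours merging, and with $K = 0$ the $\lambda$ term vanishes. The sketch below is a minimal version of the standard $\lambda$-weighted case only, assuming each cluster is modelled by a single 1-D Gaussian (so $K = 2 - 4 = -2$). The data values and the helpers `gauss_loglik`, `bic_distance` and `agglomerate` are hypothetical; the slide's $K = 0$ model is not implemented here.

```python
import math

N_PARAMS = 2  # mean and variance of one 1-D Gaussian


def gauss_loglik(xs):
    """Log-likelihood of xs under its own ML-fitted 1-D Gaussian."""
    n = len(xs)
    mu = sum(xs) / n
    var = max(sum((x - mu) ** 2 for x in xs) / n, 1e-6)  # floor avoids log(0)
    return (-0.5 * n * math.log(2 * math.pi * var)
            - sum((x - mu) ** 2 for x in xs) / (2 * var))


def bic_distance(a, b, lam=1.0):
    """BIC(separate) - BIC(merged); negative means merging is preferred."""
    n = len(a) + len(b)
    k = N_PARAMS - 2 * N_PARAMS   # merged size minus summed separate sizes
    ll_sep = gauss_loglik(a) + gauss_loglik(b)
    return ll_sep - gauss_loglik(a + b) + 0.5 * lam * k * math.log(n)


def agglomerate(clusters, lam=1.0):
    """Repeatedly merge the most negative-distance pair until none is negative."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = bic_distance(clusters[i], clusters[j], lam)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d >= 0:
            break                      # no pair with negative distance left
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]                # j > i, so index i is unaffected
    return clusters


initial = [[0.1, -0.2, 0.05], [0.0, 0.15, -0.1], [5.0, 5.2, 4.9], [5.1, 4.8, 5.05]]
result = agglomerate(initial, lam=1.0)
```

Here the four initial clusters collapse into two, one near 0 and one near 5, and the distance between those two stays positive.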

9
Mic arrays for MD mask estimation
[Figure: reliability masks for “one”: oracle, 1 chan, 2 chan]
A 4-mic array was used, with a filter-sum beamformer and post-filter.
MA + MD => 40% relative error reduction over MA enhancement alone.
The advantage is still greater with 2 mics.
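A minimal sketch of the two ingredients, under simplifying assumptions. It uses delay-and-sum, the special case of a filter-sum beamformer whose filters are pure integer delays, with no post-filter. It also uses a binary reliability mask that thresholds an estimated local SNR per time-frequency cell. How the per-cell speech and noise powers are estimated from the array is left open, and `delay_and_sum`, `reliability_mask` and the 0 dB threshold are hypothetical.

```python
import math


def delay_and_sum(channels, delays):
    """Delay-and-sum beamformer, the simplest filter-sum case.

    channels: equal-length sample lists, one per microphone.
    delays: integer arrival delay (samples) of the target at each mic; each
            channel is advanced by its delay so the target adds coherently.
    """
    n_out = len(channels[0]) - max(delays)
    return [sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
            for n in range(n_out)]


def reliability_mask(speech_power, noise_power, threshold_db=0.0):
    """Binary MD mask: 1 where the estimated local SNR exceeds threshold_db."""
    return [[1 if 10 * math.log10(max(s, 1e-12) / max(n, 1e-12)) > threshold_db
             else 0
             for s, n in zip(s_row, n_row)]
            for s_row, n_row in zip(speech_power, noise_power)]


target = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
mic0 = target
mic1 = [0.0, 0.0] + target[:-2]          # same signal arriving 2 samples later
aligned = delay_and_sum([mic0, mic1], delays=[0, 2])
```

With correct steering delays the target is recovered exactly, while noise that is uncorrelated across mics would be partly averaged out.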

10
Entropy-based MS combination
Various functions of the stream entropies were tested as stream weights and compared on recognition performance. The combination used the weighted ACMS sum rule.
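One function of the stream entropies found in the literature is inverse-entropy weighting: a stream with confident (low-entropy) posteriors gets a larger weight. The slide does not say which functions were tested, so this choice, the helper names and the posterior values are assumptions. For brevity the weighted sum here runs over individual streams only. In the ACMS rule the same weighting would be applied across all stream-combination experts.

```python
import math


def entropy(post):
    """Entropy (nats) of a posterior vector; 0 * log 0 is taken as 0."""
    return -sum(p * math.log(p) for p in post if p > 0)


def inverse_entropy_weights(stream_posteriors, eps=1e-6):
    """Weights proportional to 1 / entropy, normalised to sum to 1."""
    inv = [1.0 / (entropy(p) + eps) for p in stream_posteriors]  # eps avoids 1/0
    total = sum(inv)
    return [v / total for v in inv]


def combine(stream_posteriors, weights):
    """Weighted sum rule over streams."""
    n = len(stream_posteriors[0])
    return [sum(w * p[k] for w, p in zip(weights, stream_posteriors))
            for k in range(n)]


streams = [[0.9, 0.1], [0.5, 0.5]]      # confident stream, uninformative stream
weights = inverse_entropy_weights(streams)
combined = combine(streams, weights)
```

The confident stream receives roughly two thirds of the weight, so the combined posterior leans towards its decision.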

11
Confusion-based entropy correction
[Figure: confusion matrix for 1/6 fullband MFCC expert, with panels: band 1, band 6, silence, space]
With multi-condition trained narrowband models, entropy first increases with noise level, but then decreases to zero.
Misleading expert entropies can be avoided if posterior probabilities are corrected by a linear transformation obtained from the corresponding cross-validation confusion matrix.
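The slide states only that the correction is a linear transformation obtained from the cross-validation confusion matrix. This sketch assumes one plausible form: column-normalise the counts so that entry (i, j) estimates P(true class i | expert decided j), then multiply the posterior vector by that matrix. The helper names and the counts are hypothetical. The example shows an expert that, in heavy noise, always decides class 0 with full confidence (entropy 0); after correction its posterior becomes uniform and its entropy rises to log 2.

```python
import math


def entropy(post):
    """Entropy (nats) of a posterior vector; 0 * log 0 is taken as 0."""
    return -sum(p * math.log(p) for p in post if p > 0)


def confusion_correction_matrix(counts):
    """Column-normalised cross-validation counts: M[i][j] ~ P(true=i | decided=j).

    counts[i][j]: cross-validation frames of true class i decided as class j.
    Columns with no decisions fall back to the identity.
    """
    n = len(counts)
    col_tot = [sum(counts[i][j] for i in range(n)) for j in range(n)]
    return [[counts[i][j] / col_tot[j] if col_tot[j] else float(i == j)
             for j in range(n)] for i in range(n)]


def correct_posteriors(post, m):
    """Linear correction p' = M p; columns of M sum to 1, so p' sums to 1."""
    n = len(post)
    return [sum(m[i][j] * post[j] for j in range(n)) for i in range(n)]


counts = [[5, 0], [5, 0]]    # hypothetical: in noise the expert always decides class 0
m = confusion_correction_matrix(counts)
raw = [1.0, 0.0]             # over-confident posterior, entropy 0
corrected = correct_posteriors(raw, m)
```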

12
Noise PDF transformation in MD ASR
The usual SMD clean-data mixture pdf has 2 mixture components (uniform and Dirac), split at SNR = 0. But the “max” assumption used here is inaccurate. It is better to use a 3-component mixture pdf: SNR low, SNR high, or neither.
For the case “neither”, with noise pdf $p_N(\cdot)$, compression function $C(\cdot)$ with inverse $B(\cdot)$, and noisy observation $z$, the clean-data mixture pdf component $p_X(\cdot)$ is

$$p_X(x) = p_N\big(B(z) - B(x)\big)\,B'(x), \qquad x \in [0, z]$$

e.g. for $p_N(\cdot)$ uniform on $[0, B(z)]$ and cube-root compression ($B(x) = x^3$, $B'(x) = 3x^2$), $p_X(x) = 3x^2/z^3$.
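A quick numerical check of the "neither" case. It assumes additive noise in the uncompressed domain, $z = C(B(x) + n)$, which is the model the formula $p_X(x) = p_N(B(z) - B(x))\,B'(x)$ corresponds to. It also assumes a uniform noise pdf on $[0, B(z)]$, the normalisation needed for the $3x^2/z^3$ result. The value $z = 2$ and the helper `clean_pdf_neither` are hypothetical. The check confirms that the transformed pdf integrates to 1 over $[0, z]$ and matches the closed form.

```python
def clean_pdf_neither(x, z, p_noise, B, B_prime):
    """p_X(x) = p_N(B(z) - B(x)) * B'(x) for 0 <= x <= z, else 0."""
    if not 0.0 <= x <= z:
        return 0.0
    return p_noise(B(z) - B(x)) * B_prime(x)


z = 2.0  # compressed noisy observation (example value)


def B(v):
    """Inverse of cube-root compression C(y) = y ** (1/3)."""
    return v ** 3


def B_prime(v):
    return 3 * v ** 2


def p_noise(n):
    """Uniform noise pdf on [0, B(z)] in the uncompressed domain."""
    return 1.0 / B(z) if 0.0 <= n <= B(z) else 0.0


# Midpoint-rule integral of p_X over [0, z]; should be close to 1.
steps = 10_000
h = z / steps
area = sum(clean_pdf_neither((i + 0.5) * h, z, p_noise, B, B_prime)
           for i in range(steps)) * h
```

At $x = 1$, $z = 2$ both the transformed pdf and $3x^2/z^3$ give $3/8$.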
