…include the following (+ more)
Novel architectures
1. All-combinations HMM/ANN
2. Tandem HMM/ANN hybrid
3. DBNs: exploring new topologies
4. HMM2: formant ftrs for spkr norm.
Clustering & segmentation
5. Speech/music segmentation
6. Speaker clustering
Evidence weighting
7. Mic arrays for MD mask estimation
8. Entropy based MS combination
9. Confusion based entropy correction
10. Noise PDF transformation in MD ASR
All-combinations HMM/ANN
AC sum rule
ACMS overcomes the assumption of conditional independence between data streams
MAP static weighting leads to MAP combination after decoding
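The slide names the AC sum rule without giving its form. A minimal sketch, assuming the rule is a weighted sum of class posteriors from one expert per combination (subset) of streams, with weights that sum to one. The function names, the inclusion of the empty subset (an expert that returns only the class priors), and the example numbers are illustrative assumptions, not taken from the slides.

```python
from itertools import combinations

import numpy as np


def all_combinations(n_streams):
    """Every subset of stream indices, including the empty set (assumed: prior only)."""
    return [c for r in range(n_streams + 1)
            for c in combinations(range(n_streams), r)]


def ac_sum_rule(expert_posteriors, weights):
    """Weighted sum of per-combination class posteriors.

    expert_posteriors: dict mapping a stream subset to an array of class posteriors
    weights:           dict mapping the same subsets to weights that sum to 1
    """
    total = sum(weights[c] * expert_posteriors[c] for c in expert_posteriors)
    return total / total.sum()  # renormalise against rounding error


# Hypothetical example: 2 streams give 4 combinations, 3 classes.
combos = all_combinations(2)
posteriors = {
    (): np.array([1 / 3, 1 / 3, 1 / 3]),   # no stream: class priors
    (0,): np.array([0.7, 0.2, 0.1]),
    (1,): np.array([0.5, 0.3, 0.2]),
    (0, 1): np.array([0.8, 0.1, 0.1]),
}
weights = {c: 1.0 / len(combos) for c in combos}  # static, equal weights
combined = ac_sum_rule(posteriors, weights)
```

With N streams there are 2^N experts, so this form is only practical for a small number of streams.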
Tandem HMM/ANN hybrid
Output from one or more MLPs is appended and orthogonalised, then used as discriminative feature data for training a standard HMM/GMM, e.g. combine MSG with PLP
Training narrow sub-band MLPs with noisy data results in robust features which are independent of noise type
Robust sub-band features are concatenated before input to the speech feature extractor
Figure panels: Tandem multi-band; Tandem multi-stream
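The append-and-orthogonalise step can be sketched as follows. This is an illustration under assumptions: taking the log of the MLP posteriors and using PCA for the orthogonalisation are common choices but are not stated on the slide, and the function name and parameters are hypothetical.

```python
import numpy as np


def tandem_features(posterior_streams, n_keep, eps=1e-10):
    """Append MLP outputs from several streams and decorrelate them.

    posterior_streams: list of arrays, each (n_frames, n_classes_i)
    n_keep:            number of decorrelated dimensions to keep
    """
    # Assumption: log posteriors are used because they are closer to Gaussian.
    logp = [np.log(p + eps) for p in posterior_streams]
    x = np.concatenate(logp, axis=1)           # append the streams
    x = x - x.mean(axis=0)                     # centre before PCA
    cov = np.cov(x, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1][:n_keep]  # largest-variance directions first
    return x @ eigvec[:, order]                # orthogonalised features


# Hypothetical data standing in for two MLP output streams.
rng = np.random.default_rng(0)
stream_a = rng.dirichlet(np.ones(5), size=200)
stream_b = rng.dirichlet(np.ones(4), size=200)
feats = tandem_features([stream_a, stream_b], n_keep=6)
```

The resulting `feats` array would then be used as the observation vectors for a standard HMM/GMM trainer.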
DBNs: exploring new topologies
Figure panels: Baseline DBN for IWR; Topology 1; Topology 2; Topology 3
Aux variables tested: articulator (quantised) (+); pitch (-); speech rate (-); energy (+)
HMM2: formant ftrs for spkr norm.
WER averaged over SNR: 4 formants = 28.1%, MFCC = 14.8%, formants + MFCC = 14.3%
Speech/Music Segmentation
Features: Entropy; Dynamism
Best results from concatenated Entropy & Dynamism ftrs.
Whether GMM or MLP gives the best results is task dependent
ACMS not tested
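The slide does not define the two features. A minimal sketch, assuming both are computed from frame-level phone posteriors: entropy as the Shannon entropy of each frame's posterior vector, and dynamism as the squared change in posteriors between adjacent frames, each averaged over a short window. These definitions, the window length, and all function names are assumptions for illustration.

```python
import numpy as np


def entropy(post, eps=1e-12):
    """Per-frame Shannon entropy of the posteriors, shape (n_frames,)."""
    return -np.sum(post * np.log(post + eps), axis=1)


def dynamism(post):
    """Per-frame squared change in posteriors between adjacent frames."""
    d = np.sum(np.diff(post, axis=0) ** 2, axis=1)
    return np.append(d, d[-1])  # repeat the last value to keep n_frames entries


def smooth(x, win):
    """Moving average over `win` frames (edges are attenuated by zero padding)."""
    return np.convolve(x, np.ones(win) / win, mode="same")


def segmentation_features(post, win=5):
    """Concatenated entropy and dynamism features, shape (n_frames, 2)."""
    return np.stack([smooth(entropy(post), win),
                     smooth(dynamism(post), win)], axis=1)


# Hypothetical input: 10 frames of flat posteriors over 4 classes.
flat = np.full((10, 4), 0.25)
feats = segmentation_features(flat)
```

The two-column output is what would be passed to the GMM or MLP classifier mentioned on the slide.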
Speaker Clustering
Usual distance is the BIC (Bayesian Information Criterion) distance.
Clustering: start with many clusters. Repeat (merge the cluster pair with the most negative distance) until no such pair remains.
New model: the proposed distance avoids estimation of lambda by ensuring K = 0, where K is the difference between the size (# params) of the merged cluster and the sum of the sizes of the separate clusters.
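The merge loop with the usual BIC distance can be sketched as below. This shows only the baseline that needs lambda, not the proposed distance. It assumes each cluster is modelled by a single full-covariance Gaussian, which is a simplification chosen for brevity; the function names and the synthetic data are hypothetical.

```python
import numpy as np


def logdet_cov(x):
    """Log-determinant of the maximum-likelihood covariance of the rows of x."""
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
    return np.linalg.slogdet(cov)[1]


def delta_bic(a, b, lam=1.0):
    """BIC distance between two clusters; negative values favour merging."""
    n_a, n_b = len(a), len(b)
    n, d = n_a + n_b, a.shape[1]
    merged = np.vstack([a, b])
    gain = 0.5 * (n * logdet_cov(merged)
                  - n_a * logdet_cov(a) - n_b * logdet_cov(b))
    # Penalty for the extra parameters of two models: mean plus covariance.
    penalty = 0.5 * lam * (d + d * (d + 1) / 2) * np.log(n)
    return gain - penalty


def cluster(segments, lam=1.0):
    """Merge the pair with the most negative distance until none is negative."""
    clusters = [np.asarray(s, dtype=float) for s in segments]
    while len(clusters) > 1:
        pairs = [(delta_bic(clusters[i], clusters[j], lam), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        dist, i, j = min(pairs)
        if dist >= 0:
            break
        merged = np.vstack([clusters[i], clusters[j]])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Hypothetical data: four segments from two well-separated "speakers".
rng = np.random.default_rng(0)
segments = [rng.normal(0.0, 1.0, (200, 2)), rng.normal(10.0, 1.0, (200, 2)),
            rng.normal(0.0, 1.0, (200, 2)), rng.normal(10.0, 1.0, (200, 2))]
result = cluster(segments)
```

The `lam` argument is the penalty weight that the proposed distance is designed to make unnecessary.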
Mic arrays for MD mask estimation
Figure labels: Oracle; 1 chan; 2 chan; “one”; Reliability mask
4 mic array used
Filter-sum beamformer with post filter
MA + MD => 40% relative error reduction over MA enhancement
Advantage still greater with 2 mics
Entropy based MS combination
Various functions of the stream entropies were tested for recognition performance.
Combination used the weighted ACMS sum rule.
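The slide does not say which functions of the entropies were tested. As one hypothetical example, the sketch below weights each expert by the inverse of its posterior entropy, so that confident (low-entropy) experts count for more in the weighted sum. The function names and the small constant added to the entropy are assumptions.

```python
import numpy as np


def entropy(p, eps=1e-12):
    """Shannon entropy of one posterior vector."""
    return -np.sum(p * np.log(p + eps))


def inverse_entropy_weights(posteriors, floor=1e-6):
    """Weights proportional to 1 / entropy, normalised to sum to 1."""
    h = np.array([entropy(p) for p in posteriors])
    inv = 1.0 / (h + floor)  # floor avoids division by zero for one-hot outputs
    return inv / inv.sum()


def combine(posteriors):
    """Weighted sum of expert posteriors using inverse-entropy weights."""
    w = inverse_entropy_weights(posteriors)
    combined = sum(wi * p for wi, p in zip(w, posteriors))
    return combined / combined.sum()


# Hypothetical frame: one confident expert, one uninformative expert.
confident = np.array([0.9, 0.05, 0.05])
flat = np.array([1 / 3, 1 / 3, 1 / 3])
w = inverse_entropy_weights([confident, flat])
combined = combine([confident, flat])
```

Because the weights are recomputed for every frame, this is a dynamic weighting, in contrast to static weights fixed in advance.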
Confusion based entropy correction
Figure: confusion matrix for 1/6 fullband MFCC expert with: band 1; band 6 (labels: silence; space)
With multi-condition trained narrowband models, entropy first increases with noise level, but then decreases to zero
Misleading expert entropies can be avoided if posterior probabilities are corrected by a linear transformation obtained from the corresponding cross-validation confusion matrix
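The slide does not specify how the linear transformation is derived from the confusion matrix. One plausible reading, sketched here as an assumption, is to normalise each column of the confusion matrix so that it estimates P(true class | recognised class), then multiply the posterior vector by that matrix. The function names and counts are hypothetical.

```python
import numpy as np


def correction_matrix(confusion):
    """Column-normalise a confusion matrix.

    confusion[i, j] = number of validation frames of true class i recognised as j.
    Returns M with M[i, j] estimating P(true = i | recognised = j).
    """
    confusion = np.asarray(confusion, dtype=float)
    col_sums = confusion.sum(axis=0, keepdims=True)
    return confusion / np.where(col_sums > 0, col_sums, 1.0)


def correct_posteriors(post, m):
    """Apply the linear correction and renormalise."""
    corrected = m @ post
    return corrected / corrected.sum()


# Hypothetical expert that, in heavy noise, outputs class 0 for half of the
# frames that truly belong to class 1.
confusion = [[50, 0],
             [50, 50]]
m = correction_matrix(confusion)
overconfident = np.array([1.0, 0.0])       # zero entropy, but unreliable
corrected = correct_posteriors(overconfident, m)
```

In this example the zero-entropy output for class 0 is mapped to an even split between the two classes, so its entropy now reflects how unreliable that output was on the validation data.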
Noise PDF transf. in MD ASR
Usual SMD clean data mix pdf has 2 mix comps (uniform & dirac) for SNR =0. But the “max” assumption used here is inaccurate.
Better to use 3 mix pdf: SNR SNR hi or neither
For case “neither”, with noise pdf $p_N(\cdot)$, compression function $C(\cdot)$ with inverse $B(\cdot)$, and noisy obs. $z$, the clean data mix pdf $p_X(\cdot)$ is
$$p_X(x) = p_N\big(B(z) - B(x)\big)\, B'(x), \quad x \in [0, z]$$
e.g. for $p_N(\cdot)$ uniform and cube root compression, $p_X(x) = 3x^2 / z^3$
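The cube-root example can be checked numerically. The sketch below evaluates the general formula with B(x) = x^3 (the inverse of cube-root compression) and compares it with the closed form 3x^2/z^3. It assumes the uniform noise pdf has support [0, B(z)], which the slide does not state explicitly; the value of z and the function names are arbitrary choices for illustration.

```python
import numpy as np


def clean_pdf(x, z, p_noise, B, dB):
    """p_X(x) = p_N(B(z) - B(x)) * B'(x) for x in [0, z], and 0 elsewhere."""
    x = np.asarray(x, dtype=float)
    inside = (x >= 0) & (x <= z)
    return np.where(inside, p_noise(B(z) - B(x)) * dB(x), 0.0)


z = 2.0                                    # noisy observation (compressed domain)


def B(x):
    """Inverse of cube-root compression."""
    return x ** 3


def dB(x):
    """Derivative of B."""
    return 3 * x ** 2


def p_uniform(n):
    """Uniform noise pdf; support [0, B(z)] is an assumption."""
    return np.where((n >= 0) & (n <= B(z)), 1.0 / B(z), 0.0)


x = np.linspace(0.0, z, 10001)
pdf = clean_pdf(x, z, p_uniform, B, dB)
closed_form = 3 * x ** 2 / z ** 3

# Trapezoidal integration: a valid pdf should integrate to 1 over [0, z].
area = np.sum(0.5 * (pdf[1:] + pdf[:-1]) * np.diff(x))
```

The general formula and the closed form agree on the grid, and the density integrates to one over [0, z], as expected for a valid pdf.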