HIWIRE meeting ITC-irst Activity report Marco Matassoni, Piergiorgio Svaizer March 9.-10. 2006 Torino.


1 HIWIRE meeting ITC-irst Activity report Marco Matassoni, Piergiorgio Svaizer March 9.-10. 2006 Torino

2 Outline Beamforming and Adaptive Noise Cancellation; Environmental Acoustics Estimation; Audio-Video data collection; Multi-channel pitch estimation; Fixed-platform prototype acquisition module

3 Beamforming: D&S The availability of multi-channel signals makes it possible to selectively capture the desired source. Issues: estimation of reliable TDOAs. Method: CSP analysis over multiple frames. Advantages: robustness, reduced computational power.
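The chain on this slide (TDOA estimation by CSP over multiple frames, then delay-and-sum) can be sketched roughly as follows. This is a minimal illustration, not the ITC-irst implementation: the function names (`dft`, `csp_tdoa`, `delay_and_sum`), the frame length, the lag range, and the synthetic three-channel test signal are all assumptions made for the example, and delays are restricted to whole samples.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Naive O(N^2) DFT, used only to keep the sketch dependency-free."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def csp_tdoa(ref, sig, frame_len=64, max_lag=8):
    """Delay of `sig` relative to `ref` (samples), from CSP summed over frames."""
    acc = [0.0] * frame_len
    for start in range(0, len(ref) - frame_len + 1, frame_len):
        X = dft(ref[start:start + frame_len])
        Y = dft(sig[start:start + frame_len])
        # Keep only the phase of the cross-power spectrum.
        cross = [y * x.conjugate() for x, y in zip(X, Y)]
        phase = [c / (abs(c) + 1e-12) for c in cross]
        for lag, value in enumerate(dft(phase, inverse=True)):
            acc[lag] += value.real
    # Negative lags wrap around to the end of the circular correlation.
    return max(range(-max_lag, max_lag + 1), key=lambda lag: acc[lag % frame_len])


def delay_and_sum(channels, delays):
    """Advance each channel by its delay and average the aligned samples."""
    n = len(channels[0])
    out = []
    for t in range(n):
        vals = [ch[t + d] for ch, d in zip(channels, delays) if 0 <= t + d < n]
        out.append(sum(vals) / len(vals) if vals else 0.0)
    return out


# Hypothetical test signal: one white-noise source, three delayed noisy copies.
random.seed(0)
source = [random.gauss(0.0, 1.0) for _ in range(256)]
true_delays = [0, 3, 5]
channels = [
    [(source[t - d] if t >= d else 0.0) + random.gauss(0.0, 0.3) for t in range(256)]
    for d in true_delays
]
estimated = [csp_tdoa(channels[0], ch) for ch in channels]
enhanced = delay_and_sum(channels, estimated)
```

Summing the CSP function over several frames before picking the peak is what makes the delay estimate less sensitive to a single noisy frame; a real front-end would use an FFT and possibly fractional delays.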

4 D&S with MarkIII Test set: set N1_SNR0 of MC-TIDIGITS (cockpit noise), MarkIII channels; clean models, trained on original TIDIGITS. Results (WRR [%]): C_1 38.5; C_32 50.8; DS_C8 79.9; DS_C16 83.0; DS_C32 85.3; DS_C64 85.4

5 Adaptive Noise Cancellation A remote microphone can be used as reference for noise estimation. [Block diagram; labels: (cockpit) noise, equivalent noise path filter, (beamformed) speech, noisy speech, adaptive filter, filtered noise, denoised speech]

6 NLMS The tested algorithm is Normalized Least Mean Squares (NLMS): iteratively estimate a FIR filter that minimizes the difference between the primary channel and the filtered reference. We implemented two versions: time domain and frequency domain (subband).
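A time-domain NLMS noise canceller of the kind described here might look like the sketch below. It is only an illustration under stated assumptions: the function name `nlms`, the number of taps, the step size `mu`, and the synthetic noise path `h` are all hypothetical and do not come from the slides.

```python
import math
import random


def nlms(primary, reference, taps=8, mu=0.2, eps=1e-8):
    """Return (denoised signal, final filter weights) for a time-domain NLMS."""
    w = [0.0] * taps
    out = []
    for n in range(len(primary)):
        # Most recent `taps` reference samples, newest first (zero-padded).
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        y = sum(wk * xk for wk, xk in zip(w, x))  # filtered noise estimate
        e = primary[n] - y                        # error = denoised sample
        norm = eps + sum(xk * xk for xk in x)     # input power normalisation
        w = [wk + mu * e * xk / norm for wk, xk in zip(w, x)]
        out.append(e)
    return out, w


# Hypothetical data: a sinusoid stands in for speech, white noise for the reference.
random.seed(1)
n_samples = 4000
reference = [random.gauss(0.0, 1.0) for _ in range(n_samples)]
h = [0.6, -0.3, 0.1]  # assumed "equivalent noise path" FIR filter
noise = [sum(h[k] * reference[t - k] for k in range(len(h)) if t - k >= 0)
         for t in range(n_samples)]
speech = [0.5 * math.sin(2 * math.pi * 0.01 * t) for t in range(n_samples)]
primary = [s + v for s, v in zip(speech, noise)]

denoised, weights = nlms(primary, reference)
```

The error signal doubles as the denoised output: once the filter has converged to the noise path, what remains after subtraction is mostly the speech. The subband variant mentioned on the slide applies the same update per frequency band and is not shown here.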

7 D&S + ANC Test set: set N1_SNR0 of MC-TIDIGITS (cockpit noise), MarkIII channels; clean models, trained on original TIDIGITS. Results (WRR [%]), with ANC in the time (T) or frequency (F) domain: C_32 (T) 64.7; C_32 (F) 72.4; DS_C64 (T) 81.8; DS_C64 (F) 88.4

8 Acoustics estimation Idea: realistically simulate an environment (and its noise). Method: measure several impulse responses in the environment with multi-channel equipment (by reproducing chirp signals), preserving relative amplitudes and mutual delays; generate appropriate noisy signals starting from clean data. The derived acoustic models perform better in the given environment, also on real data.
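The data-generation step can be illustrated with a short sketch: clean speech and noise are each convolved with per-channel impulse responses, and a single noise gain is applied to all channels so that relative amplitudes and mutual delays between channels are kept. Everything here is an assumption for illustration: the helper names (`convolve`, `power`, `simulate`), the toy impulse responses, and the choice of setting the SNR on channel 0.

```python
import math
import random


def convolve(x, h):
    """Full linear convolution of signal x with impulse response h."""
    out = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            out[i + j] += xi * hj
    return out


def power(x):
    """Mean power of a signal."""
    return sum(v * v for v in x) / len(x)


def simulate(clean, speech_irs, noise, noise_irs, snr_db):
    """Multi-channel noisy signals from clean speech, noise and measured IRs."""
    n = len(clean)
    speech_ch = [convolve(clean, h)[:n] for h in speech_irs]
    noise_ch = [convolve(noise, h)[:n] for h in noise_irs]
    # One gain for every channel (set on channel 0), so inter-channel
    # amplitude ratios and delays encoded in the IRs are left untouched.
    gain = math.sqrt(power(speech_ch[0]) / (power(noise_ch[0]) * 10 ** (snr_db / 10)))
    return [[s + gain * v for s, v in zip(sc, nc)]
            for sc, nc in zip(speech_ch, noise_ch)]


# Hypothetical two-channel example with toy impulse responses.
random.seed(2)
clean = [random.gauss(0.0, 1.0) for _ in range(500)]
noise = [random.gauss(0.0, 1.0) for _ in range(500)]
speech_irs = [[1.0, 0.0, 0.3], [0.0, 0.0, 0.8, 0.2]]
noise_irs = [[0.5, 0.2], [0.0, 0.4, 0.1]]
target_snr_db = 0.0
noisy = simulate(clean, speech_irs, noise, noise_irs, target_snr_db)
```

In practice the impulse responses would be the ones measured with the chirp procedure, and the noise would be a cockpit recording rather than white noise.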

9 Audio–Video Data Collection Idea: In a noisy environment exploit additional features from video data (collaboration with NTUA and TUC) Design of AV corpus: Task: English connected digits, HIWIRE commands/keywords Channels: 4 audio, 3 video Environment: acoustically-treated room + noise diffusion

10 Audio–Video Setup [Figure: recording setup; labels: cockpit noise, 70-80 cm]

11 Audio–Video Setup Audio: 4 omnidirectional PZM Shure microphones, 16 kHz/16 bits; background noise diffused by 2 loudspeakers. Video: webcam, 640x480, 30 fps, color, Unix timestamps; stereoscopic camera pair, 640x480, 30 fps b/w or 15 fps color, perfectly synchronous. Current data sets: 8 speakers / connected digits; 2 speakers / HIWIRE keyword lists.

12 Fixed prototype acquisition device Hardware platform: 8 Shure microphones + RME Hammerfall. Software environment: Linux, ALSA driver. Acquisition module: synchronously acquires multiple channels (8); writes (to its standard output or a file) the enhanced signal + additional information/features (start/end-of-speech hypotheses, voiced/unvoiced, pitch, …)

13 Multi-channel pitch analysis The basic principle is that we can exploit many observations of the same speech process. Once the speaker has been located, we can take into account the different propagation times to the microphones and perform a time alignment. Pitch analysis can be performed using adjacent time intervals extracted from different microphone signals. Basic correlation techniques: AMDF, AUTOC, WAUTOC.

14 Single Channel Method: Weighted Autocorrelation For every frame of length N: [equation] (see Shimamura-Kobayashi, Trans. on SAP, 2001) [Plot; axis labels: Hz, samples]
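As a rough illustration of the single-channel method, the sketch below assumes the commonly described form of the weighted autocorrelation, in which the autocorrelation at each lag is divided by the AMDF at that lag plus a small constant k. The exact expression on the slide is not reproduced here, so the normalisation, the value of `k`, the search range, and the names `wautoc` and `pitch_hz` should all be read as assumptions.

```python
import math


def wautoc(frame, max_lag, k=1.0):
    """Weighted autocorrelation scores for lags 0..max_lag (lag 0 unused)."""
    n = len(frame)
    scores = [0.0]
    for tau in range(1, max_lag + 1):
        pairs = range(n - tau)  # samples outside the frame are treated as zero
        acf = sum(frame[i] * frame[i + tau] for i in pairs) / n
        amdf = sum(abs(frame[i] - frame[i + tau]) for i in pairs) / n
        # The AMDF dips at the pitch period, so dividing by it sharpens the peak.
        scores.append(acf / (amdf + k))
    return scores


def pitch_hz(frame, fs, fmin=60.0, fmax=400.0, k=1.0):
    """Pitch estimate: the lag in the search range with the highest score."""
    lo, hi = int(fs / fmax), int(fs / fmin)
    scores = wautoc(frame, hi, k)
    best = max(range(lo, hi + 1), key=lambda tau: scores[tau])
    return fs / best


# Hypothetical voiced frame: 200 Hz fundamental plus one harmonic at 8 kHz sampling.
fs = 8000
frame = [math.sin(2 * math.pi * 200 * t / fs) + 0.5 * math.sin(2 * math.pi * 400 * t / fs)
         for t in range(512)]
f0 = pitch_hz(frame, fs)
```

Because the sums run over fewer terms as the lag grows, multiples of the true period score slightly lower than the period itself, which helps against octave errors in this toy case.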

15 A Multichannel WAUTOC Method WAUTOC is computed for each channel and summed over the M channels. For a given frame: [equation] Issues: the weights w_i may represent the channel reliability; possible use of intraframe smoothing of the resulting fundamental frequency contour, which could improve the overall accuracy.
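The multichannel combination can be sketched as a weighted sum of per-channel WAUTOC functions computed on time-aligned frames. As before this is only a sketch under assumptions: the per-channel function assumes the autocorrelation-over-(AMDF + k) form, the delays are taken as already known from speaker localisation, and the names, weights, and synthetic signals are hypothetical.

```python
import math
import random


def wautoc(frame, max_lag, k=1.0):
    """Per-channel weighted autocorrelation (assumed form: ACF / (AMDF + k))."""
    n = len(frame)
    scores = [0.0]
    for tau in range(1, max_lag + 1):
        pairs = range(n - tau)
        acf = sum(frame[i] * frame[i + tau] for i in pairs) / n
        amdf = sum(abs(frame[i] - frame[i + tau]) for i in pairs) / n
        scores.append(acf / (amdf + k))
    return scores


def multichannel_pitch(channels, delays, weights, start, frame_len, fs,
                       fmin=60.0, fmax=400.0):
    """Pitch from the weighted sum of WAUTOC over time-aligned channel frames."""
    lo, hi = int(fs / fmax), int(fs / fmin)
    total = [0.0] * (hi + 1)
    for ch, d, w in zip(channels, delays, weights):
        # Shift the frame by the channel's propagation delay (in samples).
        frame = ch[start + d:start + d + frame_len]
        for tau, score in enumerate(wautoc(frame, hi)):
            total[tau] += w * score
    best = max(range(lo, hi + 1), key=lambda tau: total[tau])
    return fs / best


# Hypothetical data: a 200 Hz periodic source seen by three microphones.
random.seed(3)
fs = 8000
source = [math.sin(2 * math.pi * 200 * t / fs) + 0.5 * math.sin(2 * math.pi * 400 * t / fs)
          for t in range(700)]
delays = [0, 3, 5]
channels = [
    [(source[t - d] if t >= d else 0.0) + random.gauss(0.0, 0.1) for t in range(700)]
    for d in delays
]
weights = [1.0, 1.0, 1.0]  # equal reliability assumed for every channel
f0 = multichannel_pitch(channels, delays, weights, start=50, frame_len=512, fs=fs)
```

Unequal weights would let a channel judged less reliable (for example, one closer to a noise source) contribute less to the summed function.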

16 Video example: distant-talking speech recognition

17 Video example: multi-channel pitch estimation

18 Forthcoming activities More effective combination of beamforming and ANC; test also ANC before D&S beamforming; test post-filtering after D&S; audio-video collection: an improved audio/video synchronization would be advisable; audio-video collection: select the best balance between quality and frame rate; acoustically characterize the target environment (prototype); integrate the selected features in the multi-channel front-end.

