HIWIRE meeting, ITC-irst Activity Report. Marco Matassoni, Piergiorgio Svaizer. March 9-10, 2006, Torino.



Outline
Beamforming and Adaptive Noise Cancellation
Environmental Acoustics Estimation
Audio-Video data collection
Multi-channel pitch estimation
Fixed-platform prototype acquisition module

Beamforming: D&S
The availability of multi-channel signals allows selective capture of the desired source.
Issue: estimation of reliable TDOAs.
Method: CSP analysis over multiple frames.
Advantages: robustness; reduced computational cost.
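The two steps above (CSP-based TDOA estimation, then delay-and-sum) can be sketched as follows. This is a minimal single-frame illustration, not the actual ITC-irst implementation; the function names and the circular-shift alignment are simplifying assumptions:

```python
import numpy as np

def csp_tdoa(ref, sig, max_lag):
    """CSP (GCC-PHAT) analysis: inverse FFT of the phase-only
    cross-spectrum; the peak gives the delay of sig w.r.t. ref."""
    n = len(ref) + len(sig)
    cs = np.fft.rfft(sig, n) * np.conj(np.fft.rfft(ref, n))
    cc = np.fft.irfft(cs / (np.abs(cs) + 1e-12), n)
    cc = np.concatenate((cc[-max_lag:], cc[: max_lag + 1]))
    return int(np.argmax(cc)) - max_lag

def delay_and_sum(signals, tdoas):
    """Advance each channel by its TDOA (in samples; circular shift
    for simplicity) and average, reinforcing the desired source."""
    out = np.zeros(signals.shape[1])
    for ch, d in zip(signals, tdoas):
        out += np.roll(ch, -int(round(d)))
    return out / len(signals)
```

Averaging M time-aligned channels attenuates spatially uncorrelated noise power by roughly a factor of M, which is why recognition improves as more MarkIII channels are combined.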

D&S with MarkIII
Test set: N1_SNR0 of MC-TIDIGITS (cockpit noise), MarkIII channels; clean models trained on original TIDIGITS.
Results (WRR [%]):
C_1: 38.5
DS_C8: 79.9
DS_C64: 85.4

Adaptive Noise Cancellation
A remote microphone can be used as a reference for noise estimation. The (cockpit) noise reaches the primary channel through an equivalent noise path filter and adds to the (beamformed) speech, giving the noisy speech; an adaptive filter estimates the filtered noise from the reference, and subtracting it yields the denoised speech.

NLMS
The tested algorithm is Normalized Least Mean Squares (NLMS): it iteratively estimates a FIR filter that minimizes the difference between the primary channel and the filtered reference. We implemented two variants: time domain and frequency domain (subband).
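The time-domain variant can be sketched as follows. This is a minimal illustration only: the filter length, step size mu, and regularization eps are placeholder choices, not the settings used in the project:

```python
import numpy as np

def nlms_denoise(primary, noise_ref, n_taps=32, mu=0.5, eps=1e-8):
    """Time-domain NLMS adaptive noise cancellation.

    primary   : noisy speech (speech + filtered noise)
    noise_ref : remote-microphone noise reference
    The FIR filter w adapts so that w applied to noise_ref tracks the
    noise in the primary channel; the error e is the denoised output.
    """
    w = np.zeros(n_taps)
    buf = np.zeros(n_taps)          # most recent reference samples
    out = np.zeros(len(primary))
    for n in range(len(primary)):
        buf = np.roll(buf, 1)
        buf[0] = noise_ref[n]
        y = w @ buf                 # filtered noise estimate
        e = primary[n] - y          # error = denoised sample
        w += (mu / (eps + buf @ buf)) * e * buf  # normalized update
        out[n] = e
    return out
```

The normalization by the reference energy (buf @ buf) is what makes the step size robust to the noise level, which matters with non-stationary cockpit noise.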

D&S + ANC
Test set: N1_SNR0 of MC-TIDIGITS (cockpit noise), MarkIII channels; clean models trained on original TIDIGITS.
Results (WRR [%]):
C_32 (T): 64.7
C_32 (F): 72.4
DS_C64 (T): 81.8
DS_C64 (F): 88.4
(T = time domain, F = frequency domain)

Acoustics estimation
Idea: simulate an environment (and its noise) in a realistic way.
Method: measure several impulse responses in the environment with multi-channel equipment (through reproduction of chirp signals), preserving relative amplitudes and mutual delays; then generate appropriate noisy signals starting from clean data.
The acoustic models derived this way perform better in the given environment, also on real data.
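Assuming a measured impulse response and a noise recording from the target environment are available, the contamination step can be sketched as below; the function name and the SNR-scaling convention are illustrative assumptions:

```python
import numpy as np

def simulate_channel(clean, impulse_response, noise, snr_db):
    """Project clean speech into a measured acoustic environment:
    convolve with the room impulse response, then add recorded
    environment noise scaled to the requested SNR."""
    reverberated = np.convolve(clean, impulse_response)[:len(clean)]
    p_sig = np.mean(reverberated ** 2)
    p_noise = np.mean(noise[:len(clean)] ** 2)
    gain = np.sqrt(p_sig / (p_noise * 10 ** (snr_db / 10)))
    return reverberated + gain * noise[:len(clean)]
```

Applying this per channel with the per-microphone impulse responses preserves the relative amplitudes and mutual delays mentioned above.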

Audio–Video Data Collection
Idea: in a noisy environment, exploit additional features from video data (collaboration with NTUA and TUC).
Design of the AV corpus:
Task: English connected digits, HIWIRE commands/keywords
Channels: 4 audio, 3 video
Environment: acoustically treated room + noise diffusion

Audio–Video Setup [slide diagram: loudspeakers diffusing cockpit noise around the sensor layout; distances given in cm]

Audio–Video Setup
Audio: 4 omnidirectional Shure PZM microphones, 16 kHz / 16 bit; background noise diffused by 2 loudspeakers.
Video: webcam (640x480, 30 fps, color, Unix timestamps); stereoscopic camera pair (640x480, 30 fps b/w or 15 fps color, perfectly synchronous).
Current data sets: 8 speakers / connected digits; 2 speakers / HIWIRE keyword lists.

Fixed prototype acquisition device
Hardware platform: 8 Shure microphones + RME Hammerfall.
Software environment: Linux, ALSA driver.
Acquisition module: acquires multiple channels (8) synchronously; writes to its standard output/file the enhanced signal plus additional information/features (start/end-of-speech hypotheses, voiced/unvoiced, pitch, ...).

Multi-channel pitch analysis
The basic principle is that we can exploit many observations of the same speech process. Once the speaker has been located, we can take into account the different propagation times to the microphones and perform a time alignment. Pitch analysis can then be performed on adjacent time intervals extracted from the different microphone signals.
Basic correlation techniques: AMDF, AUTOC, WAUTOC.

Single Channel Method: Weighted Autocorrelation
For every frame of length N, the weighted autocorrelation is computed (see Shimamura and Kobayashi, IEEE Trans. on Speech and Audio Processing, 2001).
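A sketch of the weighting, following the cited paper: the frame autocorrelation is divided by the AMDF, so the peak at the pitch period is sharpened (here k is a small positive constant preventing division by zero; the exact normalization is an assumption):

```latex
\mathrm{AUTOC}(\tau) = \sum_{n=0}^{N-1-\tau} s(n)\, s(n+\tau), \qquad
\mathrm{AMDF}(\tau) = \frac{1}{N-\tau} \sum_{n=0}^{N-1-\tau} \lvert s(n) - s(n+\tau) \rvert
```

```latex
\mathrm{WAUTOC}(\tau) = \frac{\mathrm{AUTOC}(\tau)}{\mathrm{AMDF}(\tau) + k}, \qquad
\hat{T}_0 = \arg\max_{\tau \in [\tau_{\min},\, \tau_{\max}]} \mathrm{WAUTOC}(\tau)
```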

A Multichannel WAUTOC Method
For a given frame, WAUTOC is computed for each channel and summed over the M channels.
Issues: the weights w_i may represent channel reliability; possible intraframe smoothing of the resulting fundamental-frequency contour could improve the overall accuracy.
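The multichannel combination can be sketched in a few lines. This is a minimal illustration: the frame length, the lag search range, and the uniform weights w_i = 1 are placeholder assumptions, not the project's actual choices:

```python
import numpy as np

def wautoc(frame, k=1e-2):
    """Weighted autocorrelation of one frame: autocorrelation divided
    by (AMDF + k), which sharpens the peak at the pitch period."""
    n = len(frame)
    out = np.zeros(n // 2)
    for tau in range(1, n // 2):
        a, b = frame[: n - tau], frame[tau:]
        acf = np.dot(a, b)
        amdf = np.mean(np.abs(a - b))
        out[tau] = acf / (amdf + k)
    return out

def multichannel_pitch(frames, fs, weights=None, fmin=60.0, fmax=400.0):
    """Sum per-channel WAUTOC curves (optionally weighted by channel
    reliability) over time-aligned frames, then pick the peak lag."""
    if weights is None:
        weights = np.ones(len(frames))
    total = sum(w * wautoc(f) for w, f in zip(weights, frames))
    lo, hi = int(fs / fmax), int(fs / fmin)
    tau = lo + int(np.argmax(total[lo:hi]))
    return fs / tau
```

Summing the curves before peak picking lets reliable channels reinforce the true pitch peak even when individual channels are noisy.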

Video example: distant-talking speech recognition

Video example: multi-channel pitch estimation

Forthcoming activities
More effective combination of beamforming and ANC: test ANC also before D&S beamforming; test post-filtering after D&S.
Audio-video collection: an improved audio/video synchronization would be advisable; select the best balance between quality and frame rate.
Acoustically characterize the target environment (prototype).
Integrate the selected features in the multi-channel front-end.