Automatic Speaker Recognition in Military Environment
Corinna Harwardt, Fraunhofer FKIE – 28.10.2009
Overview
- Basics of automatic speaker recognition
- The VerA system
  - GMM-UBM based system
  - Results on militarily relevant audio data
- High-level features for improved speaker recognition
- The problem addressed in the PhD thesis: different degrees of vocal effort in training and test data
Speaker recognition
The goal of speaker recognition is to determine the probability that a given speech signal was uttered by a certain speaker.
Speaker identification – determining which of a set of known speakers produced a given utterance
Speaker verification – deciding whether a given utterance was produced by a claimed speaker (accept/reject)
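Speaker verification is commonly cast as a log-likelihood-ratio test between a speaker model and a background model (the GMM-UBM approach this deck builds on). Below is a minimal numpy sketch under that assumption; the diagonal-covariance GMM parameters and the zero threshold are illustrative placeholders, not the actual VerA system:

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, variances):
    """Average per-frame log-likelihood of frames x (T, D) under a
    diagonal-covariance GMM with weights (K,), means (K, D), variances (K, D)."""
    diff = x[:, None, :] - means[None, :, :]                 # (T, K, D)
    log_comp = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))
    # log-sum-exp over components, weighted by the mixture weights
    weighted = log_comp + np.log(weights)
    m = np.max(weighted, axis=1, keepdims=True)
    ll = m[:, 0] + np.log(np.sum(np.exp(weighted - m), axis=1))
    return float(np.mean(ll))

def verify(x, speaker_gmm, ubm, threshold=0.0):
    """Accept if the log-likelihood ratio (speaker vs. UBM) exceeds the threshold."""
    llr = gmm_log_likelihood(x, *speaker_gmm) - gmm_log_likelihood(x, *ubm)
    return llr > threshold, llr
```

In a real GMM-UBM system the speaker model would be adapted from the UBM (e.g. by MAP adaptation) rather than specified by hand, and the threshold tuned on development data.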
Typical configuration of a speaker recognition system
VerA
- VerA – SprecherVerifikation militärisch relevanter Audiodaten (speaker verification on militarily relevant audio data)
- Baseline: MFCC, GMM-UBM based system
- Energy-based VAD (voice activity detection)
- MFCC (mel-frequency cepstral coefficients)
  - Acoustic features originally developed for speech recognition applications
  - Calculated on short segments of the signal (about 20 ms)
- GMM (Gaussian mixture models)
  - Statistical modeling of the features extracted from the signal (e.g. MFCC)
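The energy-based VAD named above can be sketched in a few lines. This is a minimal illustration, not the VerA implementation; the frame length (320 samples ≈ 20 ms at an assumed 16 kHz sampling rate) and the −30 dB relative threshold are assumptions:

```python
import numpy as np

def energy_vad(signal, frame_len=320, threshold_db=-30.0):
    """Energy-based voice activity detection: mark a frame as speech
    when its log energy exceeds a threshold relative to the loudest frame.
    frame_len = 320 samples ~ 20 ms at 16 kHz (assumed sampling rate)."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.sum(frames ** 2, axis=1) + 1e-12      # avoid log(0) on silence
    log_e = 10.0 * np.log10(energy)
    return log_e > (log_e.max() + threshold_db)       # boolean speech mask
```

The speech mask is then used to discard non-speech frames before MFCC modeling, so that silence does not dilute the speaker model.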
Preliminary results on the Kiel corpus
Preliminary results on militarily relevant audio data
Comparison to other systems

Militarily relevant data (average EER):
  SIDSystem5, EU        18.76%
  SIDSystem4, FU        14.88%
  SIDSystem6 V2, EU     14.97%
  SIDSystem6 V2, FU     17.04%
  SIDSystem7 V2, EU     18.05%
  SIDSystem7 V2, FU     22.48%
  SIDSystem1, FU        20.15%
  SIDSystem2, FU        23.46%
  SIDSystem3, FU        21.33%
  VerA                  16.67%

Kiel corpus (EER):
  SIDSystem5, EU        22.82%
  SIDSystem4, FU        12.23%
  SIDSystem6 V1, EU     31.25%
  SIDSystem6 V1, FU     31.25%
  SIDSystem6 V2, EU      9.74%
  SIDSystem6 V2, FU      9.83%
  SIDSystem7 V1, EU     31.25%
  SIDSystem7 V1, FU     31.25%
  SIDSystem7 V2, EU     10.31%
  SIDSystem7 V2, FU     14.11%
  SIDSystem1, FU         8.71%
  SIDSystem2, FU        44.34%
  SIDSystem3, FU        40.38%
  VerA                   4.71%
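The results above are reported as equal error rate (EER), the operating point at which the false-acceptance rate equals the false-rejection rate. A small sketch of how EER can be estimated from genuine and impostor trial scores (a simple threshold sweep; evaluation toolkits typically interpolate on the DET curve instead):

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Estimate the EER from genuine (target) and impostor (non-target)
    verification scores by sweeping the threshold over all observed scores."""
    genuine = np.asarray(genuine, float)
    impostor = np.asarray(impostor, float)
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        far = np.mean(impostor >= t)   # impostors wrongly accepted
        frr = np.mean(genuine < t)     # genuine trials wrongly rejected
        if abs(far - frr) < best_gap:  # keep the most balanced operating point
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return eer
```

A lower EER means better separation of genuine and impostor scores; perfectly separated score distributions yield an EER of 0.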
High-level features I
High-level features …
- rely on linguistic content, or are calculated on parts of the signal longer than the approximately 20 ms frames normally used in frame-based approaches
- may use, for example, prosodic, phonetic, or idiolectal information
- provide additional information compared to acoustically motivated features such as MFCCs
- should therefore be used in addition to acoustic features, not instead of them
- are relatively robust against distortions
High-level features II
- Goal: pick a high-level feature that does not need a speech recognizer
- High-level features under consideration:
  - F0 statistics, as proposed in Reynolds et al. (2002) and Rose (2002)
  - Formant statistics (Becker et al. 2008)
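As an illustration of F0 statistics as a high-level feature, here is a small sketch that reduces a per-frame F0 contour to a fixed-length statistics vector. The particular statistics (mean, standard deviation, median, and range of log F0) and the convention that unvoiced frames are marked with 0 are assumptions loosely following the F0-statistics idea, not the exact feature set of the cited work:

```python
import numpy as np

def f0_statistics(f0_contour):
    """Reduce a per-frame F0 contour (Hz, 0 = unvoiced) to a small
    speaker-characterizing vector: mean, std, median, range of log F0."""
    f0 = np.asarray(f0_contour, float)
    voiced = f0[f0 > 0]                # statistics over voiced frames only
    if voiced.size == 0:
        return np.zeros(4)             # no voiced speech found
    log_f0 = np.log(voiced)            # log scale: pitch perception is roughly logarithmic
    return np.array([log_f0.mean(), log_f0.std(),
                     np.median(log_f0), log_f0.max() - log_f0.min()])
```

Because such statistics are pooled over a whole utterance, they need no speech recognizer and change only slowly with channel distortions, which is what makes them attractive as a complement to MFCCs.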
Different degrees of vocal effort
- Problem: recognition performance degrades for several speech processing tasks when speech with high vocal effort is used without additional training (Becker et al. 2008).
- Goal: either find features for speaker recognition that are robust across normal and high vocal effort, or find a method to predict the changes in acoustic features caused by raised vocal effort.
References
- D. Reynolds et al.: SuperSID Project Final Report – Exploiting High-Level Information for High-Performance Speaker Recognition. Department of Defense / National Science Foundation, 2002.
- P. Rose: Forensic Speaker Identification. Taylor & Francis, 2002.
- T. Becker, M. Jessen, and C. Grigoras: "Forensic Speaker Verification Using Formant Features and Gaussian Mixture Models." 9th Annual Conference of the International Speech Communication Association (Interspeech), 2008.