Published by Bo Morman
Modified about 1 year ago
© Fraunhofer FKIE Corinna Harwardt Automatic Speaker Recognition in Military Environment
Overview
- Basics of automatic speaker recognition
- The VerA system
  - GMM-UBM based system
  - Results on military-relevant audio data
- High-level features for improved speaker recognition
- The problem addressed in the PhD thesis: different degrees of vocal effort in training and test data
Speaker recognition
The goal of speaker recognition is to determine the probability that a given speech signal was uttered by a certain speaker.
Speaker identification
Speaker verification
Typical configuration of a speaker recognition system
VerA
VerA – SprecherVerifikation militärisch relevanter Audiodaten (speaker verification on military-relevant audio data)
- Baseline: MFCC, GMM-UBM based system
- Energy-based VAD (voice activity detection)
- MFCC (mel-frequency cepstral coefficients)
  - Developed for speech recognition applications
  - Acoustic features
  - Calculated on short parts of the signal (20 ms)
- GMM (Gaussian mixture models)
  - Statistical modeling of the features extracted from the signal (e.g. MFCC)
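The two ends of such a baseline can be sketched in a few lines: an energy-based VAD on 20 ms frames, and a GMM-UBM score computed as the average log-likelihood ratio between a speaker model and the background model. This is a minimal sketch, not VerA's implementation. The function names, the -30 dB threshold relative to the loudest frame, and the diagonal covariances are all assumptions; feature extraction (e.g. MFCC) and model training are left out.

```python
import numpy as np

def energy_vad(signal, sample_rate, frame_ms=20.0, threshold_db=-30.0):
    """Return one boolean per frame, True where the frame is treated as speech.

    Assumption: a frame counts as speech if its log energy lies within
    threshold_db of the loudest frame in the signal.
    """
    signal = np.asarray(signal, dtype=float)
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    log_energy = 10.0 * np.log10(np.sum(frames ** 2, axis=1) + 1e-12)
    return log_energy > (np.max(log_energy) + threshold_db)

def diag_gmm_loglik(features, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    Shapes: features (T, D), weights (K,), means (K, D), variances (K, D).
    """
    features = np.asarray(features, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    diff = features[:, None, :] - means                      # (T, K, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
    log_comp = np.log(weights) + log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)
    # Log-sum-exp over the K components, stabilised by the per-frame maximum.
    peak = np.max(log_comp, axis=1, keepdims=True)
    frame_ll = peak[:, 0] + np.log(np.sum(np.exp(log_comp - peak), axis=1))
    return float(np.mean(frame_ll))

def llr_score(features, speaker_gmm, ubm):
    """Log-likelihood ratio: positive values favour the claimed speaker."""
    return diag_gmm_loglik(features, *speaker_gmm) - diag_gmm_loglik(features, *ubm)
```

Each model is passed as a `(weights, means, variances)` tuple. In a verification setting the score would be compared against a decision threshold.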
Preliminary results on the Kiel corpus
Preliminary results on military-relevant audio data
Comparison to other systems

Military-relevant data        Average EER
SIDSystem5, EU                18.76%
SIDSystem4, FU                14.88%
SIDSystem6 V2, EU             14.97%
SIDSystem6 V2, FU             17.04%
SIDSystem7 V2, EU             18.05%
SIDSystem7 V2, FU             22.48%
SIDSystem1, FU                20.15%
SIDSystem2, FU                23.46%
SIDSystem3, FU                21.33%
VerA                          16.67%

Kiel corpus                   EER
SIDSystem5, EU                22.82%
SIDSystem4, FU                12.23%
SIDSystem6 V1, EU             31.25%
SIDSystem6 V1, FU             31.25%
SIDSystem6 V2, EU              9.74%
SIDSystem6 V2, FU              9.83%
SIDSystem7 V1, EU             31.25%
SIDSystem7 V1, FU             31.25%
SIDSystem7 V2, EU             10.31%
SIDSystem7 V2, FU             14.11%
SIDSystem1, FU                 8.71%
SIDSystem2, FU                44.34%
SIDSystem3, FU                40.38%
VerA                           4.71%
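The EER (equal error rate) reported above is the operating point at which the false acceptance rate equals the false rejection rate. The sketch below approximates it from two lists of verification scores by sweeping the threshold over every observed score. The function name and the simple threshold sweep are assumptions for illustration; the slides do not say how the EER was computed.

```python
import numpy as np

def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= the threshold. The returned value
    is the mean of the two error rates at the threshold where they are closest.
    """
    target_scores = np.asarray(target_scores, dtype=float)
    impostor_scores = np.asarray(impostor_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target_scores, impostor_scores]))
    best_gap, eer = np.inf, 1.0
    for threshold in thresholds:
        far = np.mean(impostor_scores >= threshold)  # false acceptance rate
        frr = np.mean(target_scores < threshold)     # false rejection rate
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, 0.5 * (far + frr)
    return float(eer)
```

With few trials the two error rates may never be exactly equal, which is why the sketch averages them at the closest point.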
High-Level Features I
… are features that rely on linguistic content, or that are calculated on parts of the signal longer than the approximately 20 ms normally used in frame-based approaches
… may use, for example, prosodic, phonetic or idiolectal information
… provide additional information compared to acoustically motivated features such as MFCCs
… should therefore be used in addition to acoustic features, not in place of them
… are relatively robust against distortions
High-Level Features II
Goal: Pick a high-level feature that does not need a speech recognizer
High-level features under consideration:
- F0 statistics as proposed in (Reynolds et al. and Rose 2002)
- Formant statistics (Becker et al. 2008)
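F0 statistics need only a pitch estimate per frame, not a speech recognizer. The sketch below uses a plain autocorrelation pitch estimator as a stand-in and summarises the voiced frames by mean and standard deviation. The estimator, the 60 to 400 Hz search range, the 40 ms frames, the voicing check and the function names are all assumptions. The cited works may extract and summarise F0 differently.

```python
import numpy as np

def estimate_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Estimate F0 of one frame from its autocorrelation peak; None if unvoiced."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - np.mean(frame)
    if np.allclose(frame, 0.0):
        return None
    # One-sided autocorrelation: acf[k] is the correlation at lag k.
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), len(acf) - 1)
    if lag_max <= lag_min:
        return None
    lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    # Crude voicing check (assumption): peak must reach 30 % of the zero-lag energy.
    if acf[lag] < 0.3 * acf[0]:
        return None
    return sample_rate / lag

def f0_statistics(signal, sample_rate, frame_ms=40.0):
    """Mean and standard deviation of F0 over all frames judged voiced."""
    signal = np.asarray(signal, dtype=float)
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    f0_values = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        f0 = estimate_f0(signal[start:start + frame_len], sample_rate)
        if f0 is not None:
            f0_values.append(f0)
    if not f0_values:
        return None
    return {"mean": float(np.mean(f0_values)), "std": float(np.std(f0_values))}
```

The statistics are taken over the whole utterance, which matches the idea of a feature computed on spans much longer than a single 20 ms frame.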
Different degrees of vocal effort
Problem: Recognition performance degrades for several speech processing tasks if speech with high vocal effort is used without additional training (Becker et al. 2008).
The goal is either
- to find robust features for speaker recognition with normal and high vocal effort, or
- to find a method to predict the changes in acoustic features caused by raised vocal effort.
References
- D. Reynolds et al.: SuperSID Project Final Report – Exploiting High-Level Information for High-Performance Speaker Recognition. Department of Defense; National Science Foundation.
- P. Rose: Forensic Speaker Identification. Taylor & Francis.
- T. Becker, M. Jessen, and C. Grigoras: Forensic Speaker Verification Using Formant Features and Gaussian Mixture Models. 9th Annual Conference of the International Speech Communication Association, 2008.