Douglas A. Reynolds, PhD Senior Member of Technical Staff


Automatic Speaker Recognition: Recent Progress, Current Applications, and Future Trends
Douglas A. Reynolds, PhD, Senior Member of Technical Staff, M.I.T. Lincoln Laboratory
Larry P. Heck, PhD, Speaker Verification R&D, Nuance Communications
Notes: Describe the positions of the speakers of this talk.

Outline: Introduction and applications; General theory; Performance; Conclusion and future directions
Notes: The aim of this talk is to provide an overview of automatic speaker recognition. We will first present the definitions and terminology for the core tasks underlying all speaker recognition applications and then provide several concrete examples of speaker recognition applications. Next we will present some general details of the technology underlying speaker recognition systems and give an overview of the performance of automatic systems. Finally, we will present some current limitations and future directions for research and applications.

Extracting Information from Speech
Goal: automatically extract the information transmitted in the speech signal.
[Figure: a speech signal feeds three recognizers. Speech recognition yields the words ("How are you?"), language recognition yields the language name (English), and speaker recognition yields the speaker name (James Wilson).]
Notes: A speaker conveys several levels of information. In addition to the message being spoken (words), there is also information about the language and the speaker. The aim of all automatic speech processing techniques is to extract these levels of information for further processing (database query, information for synthesis, etc.). Although shown as independent extraction paths, knowledge gained from the different levels of information can be used together for different application goals.

Introduction: Identification
- Determines who is talking from a set of known voices
- No identity claim from the user (many-to-one mapping)
- Often assumed that the unknown voice must come from the set of known speakers; referred to as closed-set identification
[Figure: "Whose voice is this?" An unknown voice compared against several known voices.]
Notes: Definition of the identification task and the closed-set condition.

Introduction: Verification/Authentication/Detection
- Determines whether a person is who they claim to be
- User makes an identity claim (one-to-one mapping)
- Unknown voice could come from a large set of unknown speakers; referred to as open-set verification
- Adding a "none-of-the-above" option to closed-set identification gives open-set identification
[Figure: "Is this Bob's voice?"]
Notes: Definition of the verification/authentication/detection task and the open-set condition.
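
The three task definitions can be contrasted in a few lines of Python. This is a minimal sketch, assuming each enrolled speaker's model has already produced a similarity score for the test utterance (higher means more similar); the function names, the score dictionary, and the thresholds are illustrative, not part of any particular system.

```python
def closed_set_identify(scores):
    """Closed-set identification (many-to-one): the test voice is assumed to
    come from one of the enrolled speakers, so return the best-scoring one."""
    return max(scores, key=scores.get)


def open_set_identify(scores, threshold):
    """Open-set identification: closed-set identification plus a
    "none-of-the-above" answer (None) when even the best score is too low."""
    best = closed_set_identify(scores)
    return best if scores[best] >= threshold else None


def verify(scores, claimed_id, threshold):
    """Verification (one-to-one): only the claimed speaker's score is used."""
    return scores[claimed_id] >= threshold


# Hypothetical scores of one test utterance against three enrolled voiceprints.
scores = {"Bob": 2.1, "Sally": -0.4, "Ann": 0.3}
print(closed_set_identify(scores))     # Bob
print(open_set_identify(scores, 3.0))  # None
print(verify(scores, "Bob", 1.0))      # True
```

Identification must compare against every enrolled model, so its cost and error rate grow with the population; verification touches only the claimed model.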

Introduction: Speech Modalities
The application dictates the speech modality:
- Text-dependent recognition
  - Recognition system knows the text spoken by the person
  - Examples: fixed phrase, prompted phrase
  - Used for applications with strong control over user input
  - Knowledge of the spoken text can improve system performance
- Text-independent recognition
  - Recognition system does not know the text spoken by the person
  - Examples: user-selected phrase, conversational speech
  - Used for applications with less control over user input
  - More flexible system, but also a more difficult problem
  - Speech recognition can provide knowledge of the spoken text
Notes: Definition of the speech modalities of input speech.

Introduction: Voice as a Biometric
- Biometric: a human-generated signal or attribute for authenticating a person's identity
- Voice is a popular biometric:
  - natural signal to produce
  - does not require a specialized input device
  - ubiquitous: telephones and microphone-equipped PCs
- Voice biometric combined with other forms of security gives the strongest security:
  - Something you have, e.g., a badge
  - Something you know, e.g., a password
  - Something you are, e.g., your voice
Notes: Definition of voice as a biometric. Potential lead-in to the slide describing the integration of voice and knowledge verification.

Introduction: Applications
- Access control: physical facilities; data and data networks
- Transaction authentication: toll fraud prevention; telephone credit card purchases; bank wire transfers
- Monitoring: remote time and attendance logging; home parole verification; prison telephone usage
- Information retrieval: customer information for call centers; audio indexing (speech skimming device)
- Forensics: voice sample matching
Notes: List of application areas of speaker recognition technology. Define two or three that will be described in more detail.

Outline: Introduction and applications; General theory; Performance; Conclusion and future directions

General Theory: Components of a Speaker Verification System
[Figure: input speech ("My name is Bob") and an identity claim (Bob) pass through feature extraction. The features are scored against Bob's speaker model (his "voiceprint") with a + sign and against an impostor model (impostor "voiceprints") with a - sign; the summing node (Σ) feeds the decision block, which outputs ACCEPT or REJECT.]
Notes: Outline the three main components of all speaker recognition systems: feature extraction, speaker modeling (voiceprint creation), and the verification decision.

General Theory: Phases of a Speaker Verification System
There are two distinct phases to any speaker verification system:
- Enrollment phase: enrollment speech for each speaker (e.g., Bob, Sally) goes through feature extraction and model training to produce a voiceprint (model) for each speaker.
- Verification phase: speech with a claimed identity (Sally) goes through feature extraction and a verification decision against Sally's voiceprint. Result: Accepted!
Notes: There are two distinct phases for automatic speaker verification systems: enrollment, to create a voiceprint (model) for a specific speaker, and verification, to verify the unknown voice against the proffered identity.
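
A minimal sketch of the two phases, assuming each voiceprint is a single diagonal Gaussian over feature vectors and the impostor model is one Gaussian fitted to all enrolled speakers' pooled data. Both are simplifications chosen for illustration (the systems in this talk use HMMs), and the function names, synthetic features, and threshold are hypothetical.

```python
import numpy as np


def train_model(features):
    """Model training: fit a diagonal Gaussian to (num_frames, num_dims) features."""
    mean = features.mean(axis=0)
    var = features.var(axis=0) + 1e-6  # small floor keeps variances positive
    return mean, var


def avg_log_likelihood(features, model):
    """Average per-frame log-likelihood of the features under a diagonal Gaussian."""
    mean, var = model
    per_dim = -0.5 * (np.log(2 * np.pi * var) + (features - mean) ** 2 / var)
    return per_dim.sum(axis=1).mean()


def enroll(enrollment_data):
    """Enrollment phase: one voiceprint per speaker, plus an impostor model
    fitted to the pooled enrollment data (a simplifying assumption)."""
    voiceprints = {name: train_model(f) for name, f in enrollment_data.items()}
    impostor_model = train_model(np.vstack(list(enrollment_data.values())))
    return voiceprints, impostor_model


def verify_claim(features, claimed_id, voiceprints, impostor_model, threshold=0.0):
    """Verification phase: score the test features against the claimed
    speaker's voiceprint relative to the impostor model."""
    score = (avg_log_likelihood(features, voiceprints[claimed_id])
             - avg_log_likelihood(features, impostor_model))
    return bool(score > threshold)


# Synthetic 2-D "features" standing in for real spectral features.
rng = np.random.default_rng(0)
data = {"Bob": rng.normal(0.0, 1.0, (200, 2)),
        "Sally": rng.normal(3.0, 1.0, (200, 2))}
voiceprints, impostor_model = enroll(data)
test_features = rng.normal(3.0, 1.0, (50, 2))  # new speech from Sally
print(verify_claim(test_features, "Sally", voiceprints, impostor_model))
```

Enrollment runs once per speaker and stores only the compact model; verification reuses the same feature extraction and needs just the claimed voiceprint and the impostor model.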

General Theory: Features for Speaker Recognition
Humans use several levels of perceptual cues for speaker recognition.
[Figure: hierarchy of perceptual cues, from high-level cues (learned traits), which are difficult to extract automatically, down to low-level cues (physical traits), which are easy to extract automatically.]
- There are no exclusive speaker-identifying cues
- Low-level acoustic cues are most applicable for automatic systems
Notes: Describe our understanding of how humans recognize speakers from speech and how this leads to information that is suitable for automatic systems.

General Theory: Features for Speaker Recognition
Desirable attributes of features for an automatic system (Wolf '72):
- Practical
  - Occur naturally and frequently in speech
  - Easily measurable
- Robust
  - Not change over time or be affected by the speaker's health
  - Not be affected by reasonable background noise nor depend on specific transmission characteristics
- Secure
  - Not be subject to mimicry
Notes: Desirable attributes for automatic features. No feature has all these attributes. Features derived from the spectrum of speech have proven to be the most effective in automatic systems.

General Theory: Speech Production
- Speech production model: source-filter interaction
- Anatomical structure (vocal tract/glottis) is conveyed in the speech spectrum
[Figure: glottal pulses (source) → vocal tract (filter) → speech signal.]
Notes: Describe the link between the speech signal and speaker-specific information (anatomical structure).
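
The source-filter idea can be illustrated numerically. This sketch assumes an impulse train as a crude stand-in for glottal pulses and a single two-pole resonator as a one-formant "vocal tract"; the 100 Hz pitch, 500 Hz formant, 100 Hz bandwidth, and 8 kHz sampling rate are arbitrary illustrative values, and a real vocal tract has several formants.

```python
import numpy as np


def glottal_pulse_train(f0, fs, duration):
    """Source: a unit impulse every 1/f0 seconds (a crude glottal-pulse model)."""
    source = np.zeros(int(round(duration * fs)))
    source[::int(round(fs / f0))] = 1.0
    return source


def formant_filter(x, formant_hz, bandwidth_hz, fs):
    """Filter: a two-pole resonator, y[n] = x[n] + a1*y[n-1] + a2*y[n-2],
    with its resonance (a single "formant") at formant_hz."""
    r = np.exp(-np.pi * bandwidth_hz / fs)   # pole radius sets the bandwidth
    theta = 2 * np.pi * formant_hz / fs      # pole angle sets the frequency
    a1, a2 = 2 * r * np.cos(theta), -r * r
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = x[n]
        if n >= 1:
            y[n] += a1 * y[n - 1]
        if n >= 2:
            y[n] += a2 * y[n - 2]
    return y


fs = 8000
speech_like = formant_filter(glottal_pulse_train(100, fs, 0.1), 500, 100, fs)
spectrum = np.abs(np.fft.rfft(speech_like))
peak_hz = np.argmax(spectrum) * fs / len(speech_like)
print(peak_hz)  # 500.0
```

The spectrum shows harmonics every 100 Hz (set by the source) under an envelope that peaks at the 500 Hz resonance (set by the filter); it is this filter shape that carries the anatomical, speaker-specific information.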

General Theory: Features for Speaker Recognition
- Speech is a continuous evolution of the vocal tract, so we need to extract a time series of spectra
- Use a sliding window: 20 ms window, 10 ms shift
- Take the Fourier transform magnitude of each windowed frame
- This produces the time-frequency evolution of the spectrum
Notes: Briefly describe how automatic systems extract spectral information from the speech signal using the Fourier transform of a sliding window over continuous speech (the same technique used for speech and language recognition).
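
The sliding-window analysis can be sketched with NumPy. The 20 ms window and 10 ms shift come from the slide; the Hamming taper and the 8 kHz example rate are assumptions added for illustration.

```python
import numpy as np


def magnitude_spectrogram(signal, fs, win_ms=20.0, shift_ms=10.0):
    """Slide a win_ms window along the signal in shift_ms steps and return the
    FFT magnitude of each frame, shape (num_frames, win_len // 2 + 1)."""
    win_len = int(round(fs * win_ms / 1000))
    shift = int(round(fs * shift_ms / 1000))
    if len(signal) < win_len:
        raise ValueError("signal is shorter than one analysis window")
    num_frames = 1 + (len(signal) - win_len) // shift
    window = np.hamming(win_len)  # taper each frame to reduce spectral leakage
    frames = np.stack([signal[i * shift:i * shift + win_len] * window
                       for i in range(num_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))


# One second of a 1 kHz tone sampled at 8 kHz: 160-sample windows, 80-sample shift.
fs = 8000
t = np.arange(fs) / fs
spec = magnitude_spectrogram(np.sin(2 * np.pi * 1000 * t), fs)
print(spec.shape)  # (99, 81)
```

Each row is one 20 ms snapshot of the spectrum; with 160-point frames the bins are 50 Hz apart, so the tone appears in bin 20 of every row.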

General Theory: Speaker Models
[Figure: the components diagram repeated, with the speaker model (Bob's "voiceprint") highlighted.]

General Theory: Speaker Models
- Speaker models (voiceprints) represent the voice biometric in a compact and generalizable form
- Modern speaker verification systems use hidden Markov models (HMMs)
  - HMMs are statistical models of how a speaker produces sounds
  - HMMs represent the underlying statistical variations in the speech state (e.g., phoneme) and the temporal changes of speech between states
  - Fast training algorithms (EM) exist for HMMs; each EM iteration does not decrease the training likelihood, so training is guaranteed to converge (to a local maximum)
[Figure: a three-state HMM for the word "had" (h-a-d).]
Notes: Describe the basic model used to capture speaker characteristics: the HMM.
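
To make "statistical model of how a speaker produces sounds" concrete, here is a sketch of the forward algorithm, which computes the likelihood of an observation sequence under an HMM. It assumes one-dimensional features and one Gaussian per state; the three-state left-to-right model and its parameter values are hypothetical, loosely mirroring the h-a-d example. EM (Baum-Welch) training is not shown.

```python
import numpy as np


def log_gaussian(x, mean, var):
    """Log density of a 1-D Gaussian (vectorized over states)."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)


def hmm_log_likelihood(obs, log_pi, log_A, means, variances):
    """Forward algorithm in the log domain: log p(obs | HMM).

    log_pi[i]   : log probability of starting in state i
    log_A[i, j] : log probability of moving from state i to state j
    means[i], variances[i] : Gaussian output density of state i
    """
    log_alpha = log_pi + log_gaussian(obs[0], means, variances)
    for x in obs[1:]:
        # alpha_t(j) = [sum_i alpha_{t-1}(i) * A[i, j]] * b_j(x_t), in logs
        log_alpha = (np.logaddexp.reduce(log_alpha[:, None] + log_A, axis=0)
                     + log_gaussian(x, means, variances))
    return np.logaddexp.reduce(log_alpha)


# Hypothetical 3-state left-to-right model, one state per sound of "h-a-d".
with np.errstate(divide="ignore"):  # log(0) = -inf marks forbidden moves
    log_pi = np.log([1.0, 0.0, 0.0])
    log_A = np.log([[0.6, 0.4, 0.0],
                    [0.0, 0.6, 0.4],
                    [0.0, 0.0, 1.0]])
means, variances = np.array([0.0, 5.0, 10.0]), np.ones(3)

in_order = hmm_log_likelihood([0.1, -0.2, 5.1, 4.8, 9.9, 10.2], log_pi, log_A, means, variances)
reversed_ = hmm_log_likelihood([10.2, 9.9, 4.8, 5.1, -0.2, 0.1], log_pi, log_A, means, variances)
print(in_order > reversed_)  # True
```

Because the transitions only move forward, the same sounds in the wrong order score far lower: the model captures both what each state sounds like and how the states follow one another.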

General Theory: Speaker Models
The form of the HMM depends on the application:
- Fixed phrase ("Open sesame"): word/phrase models
- Prompted phrases/passwords: phoneme models (/s/ /i/ /x/)
- General speech (text-independent): single-state HMM
Notes: Describe the basic forms of how HMMs are used for different speech modalities.
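
A single-state HMM whose output density is a Gaussian mixture is simply a Gaussian mixture model (GMM): with one state there are no transitions to model, so frame order does not matter. A minimal sketch of scoring frames under a diagonal-covariance GMM follows; the array shapes, parameter names, and example values are our own conventions, not taken from the talk.

```python
import numpy as np


def gmm_frame_log_likelihoods(frames, weights, means, variances):
    """Per-frame log-likelihoods under a diagonal-covariance GMM.

    frames: (T, D) feature vectors; weights: (M,) mixture weights summing to 1;
    means, variances: (M, D) parameters of the M Gaussian components.
    """
    diff = frames[:, None, :] - means[None, :, :]                     # (T, M, D)
    log_components = -0.5 * np.sum(np.log(2 * np.pi * variances)
                                   + diff ** 2 / variances, axis=2)   # (T, M)
    # log sum_m w_m N(x_t; mu_m, Sigma_m), computed stably in the log domain
    return np.logaddexp.reduce(np.log(weights) + log_components, axis=1)


# Hypothetical 2-component, 2-dimensional model.
weights = np.array([0.3, 0.7])
means = np.array([[0.0, 0.0], [3.0, 1.0]])
variances = np.array([[1.0, 1.0], [0.5, 2.0]])
frames = np.array([[0.1, -0.3], [2.8, 1.2], [3.1, 0.7]])

ll = gmm_frame_log_likelihoods(frames, weights, means, variances)
# Frame order carries no information: reversing the frames leaves the average unchanged.
print(np.isclose(ll.mean(), gmm_frame_log_likelihoods(frames[::-1], weights, means, variances).mean()))  # True
```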

General Theory: Verification Decision
[Figure: the components diagram repeated, with the decision stage highlighted: the Σ node combining the speaker-model and impostor-model scores, and the ACCEPT/REJECT output.]

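General Theory: Verification Decision
- Verification decision approaches have roots in signal detection theory
- Two-class hypothesis test:
  H0: the speaker is an impostor
  H1: the speaker is indeed the claimed speaker
- Statistic computed on test utterance S as a likelihood ratio:
  $$L = \log \frac{p(S \mid H_1)}{p(S \mid H_0)} = \log \frac{\text{likelihood that } S \text{ came from the speaker HMM}}{\text{likelihood that } S \text{ did not come from the speaker HMM}}$$
  Decision: accept if $L > \theta$; reject if $L < \theta$.
[Figure: feature extraction feeds the speaker model (+) and the impostor model (-) into Σ, whose output L is compared with θ by the decision block.]
Notes: Describe the basic verification decision: the likelihood ratio.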
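
In the log domain the likelihood ratio test reduces to a subtraction. This sketch assumes one-dimensional features, with a single Gaussian standing in for both the speaker HMM and the impostor model, and it averages the log-likelihoods over frames, a common normalization so that longer utterances do not receive larger scores; the models and the threshold θ are illustrative.

```python
import numpy as np


def frame_log_likelihoods(x, mean, var):
    """Per-frame log-likelihood under a 1-D Gaussian (a stand-in for an HMM)."""
    x = np.asarray(x, dtype=float)
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)


def log_likelihood_ratio(x, speaker_model, impostor_model):
    """L = log p(S | H1) - log p(S | H0), averaged per frame."""
    ll_h1 = frame_log_likelihoods(x, *speaker_model).mean()
    ll_h0 = frame_log_likelihoods(x, *impostor_model).mean()
    return float(ll_h1 - ll_h0)


def decide(score, theta):
    """Accept when L > theta, otherwise reject."""
    return "ACCEPT" if score > theta else "REJECT"


speaker_model = (1.0, 1.0)   # hypothetical (mean, variance) of the claimed speaker
impostor_model = (0.0, 4.0)  # hypothetical broader model of everyone else
score = log_likelihood_ratio([1.0, 1.0, 1.0], speaker_model, impostor_model)
print(round(score, 3), decide(score, theta=0.0))  # 0.818 ACCEPT
```

Raising θ trades more false rejections for fewer false acceptances; the performance slides discuss how that operating point is chosen.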

Outline: Introduction and applications; General theory; Performance; Conclusion and future directions

Verification Performance: Evaluating Speaker Verification Systems
There are many factors to consider in the design of an evaluation of a speaker verification system:
- Speech quality: channel and microphone characteristics; noise level and type; variability between enrollment and verification speech
- Speech modality: fixed/prompted/user-selected phrases; free text
- Speech duration: duration and number of sessions of enrollment and verification speech
- Speaker population: size and composition
Most importantly, the evaluation data and design should match the target application domain of interest.
Notes: Describe some of the dimensions of core speaker verification technology evaluation. Application-specific evaluation, however, will depend on other practical considerations such as throughput and ease of enrollment.

Verification Performance: Evaluating Speaker Verification Systems
[Figure: example performance curve, probability of false reject (in %) vs. probability of false accept (in %).]
The application operating point depends on the relative costs of the two error types:
- High security (e.g., wire transfer): a false acceptance is very costly; users may tolerate rejections for security
- Balance: the equal error rate (EER) point, EER = 1% in this example
- High convenience (e.g., toll fraud): false rejections alienate customers; rejecting any fraud is beneficial
Notes: Example of a DET (ROC) curve and where different applications may operate.
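
Given scores from target (true-speaker) and impostor trials, the error rates at a threshold, the EER, and a cost-based operating point can be computed as below. This is a sketch under stated assumptions: higher scores mean "more likely the claimed speaker", the candidate thresholds are the observed scores plus +∞ (reject everything), and the expected-cost form with a target prior is one common choice, not necessarily the one used in the applications described here. The scores and costs are made up.

```python
import numpy as np


def error_rates(target_scores, impostor_scores, threshold):
    """(P(false reject), P(false accept)) when trials scoring >= threshold are accepted."""
    p_fr = np.mean(np.asarray(target_scores) < threshold)
    p_fa = np.mean(np.asarray(impostor_scores) >= threshold)
    return float(p_fr), float(p_fa)


def candidate_thresholds(target_scores, impostor_scores):
    """Every observed score, plus +inf (reject every trial)."""
    return np.append(np.unique(np.concatenate([target_scores, impostor_scores])), np.inf)


def equal_error_rate(target_scores, impostor_scores):
    """Approximate EER: the threshold where P(FR) and P(FA) are closest."""
    def gap(t):
        p_fr, p_fa = error_rates(target_scores, impostor_scores, t)
        return abs(p_fr - p_fa)
    best = min(candidate_thresholds(target_scores, impostor_scores), key=gap)
    p_fr, p_fa = error_rates(target_scores, impostor_scores, best)
    return (p_fr + p_fa) / 2, float(best)


def min_cost_threshold(target_scores, impostor_scores, c_fr, c_fa, p_target):
    """Threshold minimizing C = c_fr * P(FR) * p_target + c_fa * P(FA) * (1 - p_target)."""
    def cost(t):
        p_fr, p_fa = error_rates(target_scores, impostor_scores, t)
        return c_fr * p_fr * p_target + c_fa * p_fa * (1 - p_target)
    return float(min(candidate_thresholds(target_scores, impostor_scores), key=cost))


targets = [2.0, 3.0, 4.0, 5.0]    # hypothetical true-speaker trial scores
impostors = [0.0, 1.0, 2.5, 3.5]  # hypothetical impostor trial scores
print(equal_error_rate(targets, impostors))                # (0.25, 3.0)
print(min_cost_threshold(targets, impostors, 1, 10, 0.5))  # 4.0  (wire transfer)
print(min_cost_threshold(targets, impostors, 10, 1, 0.5))  # 2.0  (toll fraud)
```

Making false accepts expensive (the wire-transfer case) pushes the threshold up toward high security; swapping the costs (the toll-fraud case) pulls it down toward high convenience.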

Verification Performance: NIST Speaker Verification Evaluations
- Annual NIST evaluations of speaker verification technology (since 1995)
- Aim: provide a common paradigm for comparing technologies
- Focus: conversational telephone speech (text-independent)
[Figure: evaluation cycle. The Linguistic Data Consortium (data provider) supplies the data, NIST (evaluation coordinator) compares technologies on a common task, and technology developers evaluate and improve their systems.]
Notes: Describe the NIST text-independent speaker recognition evaluations. This is an example of the core research evaluation.

Verification Performance: Range of Performance
[Figure: DET curves, probability of false reject (in %) vs. probability of false accept (in %), for four conditions, with an arrow indicating increasing constraints:
- Text-independent (read sentences): military radio data; multiple radios and microphones; moderate amount of training data
- Text-independent (conversational): telephone data; multiple microphones; moderate amount of training data
- Text-dependent (combinations): clean data; single microphone; large amount of train/test speech
- Text-dependent (digit strings): telephone data; multiple microphones; small amount of training data]
Notes: Summary of the performance variability of the core verification system with respect to constraints (vocabulary and environment). This leads into application performance, where various constraints and auxiliary knowledge are used to produce the required performance.

Verification Performance: Human vs. Machine
- Motivation for comparing human to machine: evaluating speech coders and potential forensic applications
- Schmidt-Nielsen and Crystal used the NIST evaluation (DSP Journal, January 2000)
  - Same amount of training data
  - Used 3-second conversational utterances from telephone speech
- Error rates: in matched handset-type tests, humans were 15% worse; in mismatched handset-type tests, humans were 44% better

Verification Performance: Application Deployments
- Application: voice authentication based on a spoken phone number; provides secure access to customer records and credit card information
- Benefits: security, personalization
- Volume: 250K customers currently enrolled (@20K calls/day); 5 million customers will enroll by Q2 '00 (@150K calls/day)
- Implementation: Edify telephony platform
- Performance: @1% EER
Notes: Application-specific performance, where constraints and other knowledge sources are applied to produce the required operating levels.

Verification Performance: Speaker + Knowledge Verification
[Figure: example dialog over the telephone. System: "Please enter your account number." Caller: "5551234". System: "Say your date of birth." Caller: "October 13, 1964". System: "You're accepted by the system." The caller's voice is authenticated against stored voiceprints (biometric) and the answers against account data (knowledge); the result is Accept or Reject.]
Notes: Application-specific performance, where constraints and other knowledge sources are applied to produce the required operating levels.
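
The combination of knowledge and biometric checks in this dialog can be sketched as a simple AND rule. The sketch assumes a speech recognizer has already turned the caller's answers into text and a speaker verifier has already produced a voice score; the field names, the date format, and the threshold are hypothetical.

```python
def authenticate(record, spoken_answers, voice_score, theta):
    """Accept only if every knowledge answer matches the account record AND the
    voice score from speaker verification exceeds theta."""
    knowledge_ok = all(spoken_answers.get(field) == value
                       for field, value in record.items())
    voice_ok = voice_score > theta
    return "Accept" if knowledge_ok and voice_ok else "Reject"


# Hypothetical record looked up from the account number the caller entered.
record = {"date_of_birth": "1964-10-13"}
print(authenticate(record, {"date_of_birth": "1964-10-13"}, voice_score=1.2, theta=0.0))   # Accept
print(authenticate(record, {"date_of_birth": "1964-10-13"}, voice_score=-2.0, theta=0.0))  # Reject
```

With an AND rule an impostor must defeat both checks: knowing the date of birth is not enough without the right voice, and vice versa. A real system might instead fuse the two into a single score, which this sketch does not attempt.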

Outline: Introduction and applications; General theory; Performance; Conclusion and future directions

Conclusions
- Speaker verification is one of the few recognition areas where machines can outperform humans
- Speaker verification technology is a viable technique, currently available for applications
- Speaker verification can be augmented with other authentication techniques to add further security
Notes: Provide a basic wrap-up of the current state of the area.

Future Directions
- Research will focus on using speaker verification techniques in more unconstrained, uncontrolled situations:
  - Audio search and retrieval
  - Increasing robustness to channel variability
  - Incorporating higher levels of knowledge into decisions
- Speaker recognition technology will become an integral part of speech interfaces:
  - Personalization of services and devices
  - Unobtrusive protection of transactions and information