Automatic Speaker Verification: Technologies, Evaluations and Possible Future. Gérard CHOLLET, CNRS-LTCI, GET-ENST. BioSecure, SecurePhone, June 28th, 2004.


1 Automatic Speaker Verification: Technologies, Evaluations and Possible Future
Gérard CHOLLET, CNRS-LTCI, GET-ENST (chollet@tsi.enst.fr)
June 28th, 2004, BioSecure, SecurePhone
Biometrics in Current Security Environments

2 Outline
- State of affairs (tasks, security, forensics, …)
- Speaker characteristics in the speech signal
- Automatic Speaker Verification:
  - Decision theory
  - Text-dependent / text-independent
  - Imposture (occasional, dedicated)
  - Voice transformations
- Audio-visual speaker verification
- Evaluations (algorithms, field tests, ergonomics, …)
- Conclusions, perspectives

3 Why should a computer recognize who is speaking?
- Protection of individual property (home, bank account, personal data, messages, mobile phone, PDA, …)
- Restricted access (secured areas, databases)
- Personalization (only respond to its master's voice)
- Locating a particular person in an audio-visual document (information retrieval)
- Who is speaking in a meeting?
- Is a suspect the criminal? (forensic applications)

4 Tasks in Automatic Speaker Recognition
- Speaker verification (voice biometrics)
  - Are you really who you claim to be?
- Identification (speaker ID):
  - Is this speech segment coming from a known speaker?
  - How large is the set of speakers (the population of the world)?
- Speaker detection, segmentation, indexing, retrieval, tracking:
  - Looking for recordings of a particular speaker
- Combining speech and speaker recognition
  - Adaptation to a new speaker, speaker typology
  - Personalization in dialogue systems

5 Applications
- Access control
  - Physical facilities, computer networks, websites
- Transaction authentication
  - Telephone banking, e-commerce
- Speech data management
  - Voice messaging, search engines
- Law enforcement
  - Forensics, home incarceration

6 Voice Biometrics
- Advantages
  - Often the only modality available over the telephone
  - Low cost (microphone, A/D converter), ubiquity
  - Possible integration on a smart (SIM) card
  - Natural bimodal fusion: the speaking face
- Disadvantages
  - Lack of discretion
  - Possibility of imitation and electronic imposture
  - Lack of robustness to noise, distortion, …
  - Temporal drift

7 Speaker Identity in Speech
- Differences in:
  - Vocal tract shapes and muscular control
  - Fundamental frequency (typical values: 100 Hz for males, 200 Hz for females, 300 Hz for children)
  - Glottal waveform
  - Phonotactics
  - Lexical usage
- The difference between the voices of twins is a limiting case
- Voices can also be imitated or disguised
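A simple pitch estimator can show the typical fundamental-frequency values listed above. The following is a minimal sketch, not the method used in this talk. It picks the autocorrelation peak of a single voiced frame within a search range of 60 to 400 Hz. The function name `estimate_f0`, the search range, the 8 kHz sampling rate and the synthetic three-harmonic test frames are all assumptions made for illustration.

```python
import numpy as np

def estimate_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Estimate the fundamental frequency (Hz) of one voiced frame by
    picking the autocorrelation peak inside a plausible lag range."""
    frame = frame - np.mean(frame)
    # Non-negative lags of the full (un-normalized) autocorrelation. Because it
    # decays with lag, it tends to favor the true period over its multiples.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sample_rate / f0_max)   # shortest period considered
    lag_max = int(sample_rate / f0_min)   # longest period considered
    best_lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sample_rate / best_lag

if __name__ == "__main__":
    sr = 8000                             # telephone-band sampling rate (assumed)
    t = np.arange(int(0.04 * sr)) / sr    # one 40 ms frame
    for label, f0 in [("male", 100.0), ("female", 200.0), ("child", 300.0)]:
        # Synthetic voiced frame: fundamental plus two weaker harmonics
        x = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in (1, 2, 3))
        print(label, round(estimate_f0(x, sr)))
```

The estimate is quantized to whole-sample lags, so it lands near the true value rather than exactly on it. Real speech also needs a voicing decision and smoothing across frames. This sketch assumes the frame is already known to be voiced.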

8 Speaker Identity
[Figure: spectral envelope of /i:/ (amplitude A versus frequency f) for Speaker A and Speaker B]
- Segmental factors (~30 ms)
  - Glottal excitation: fundamental frequency, amplitude, voice quality (e.g., breathiness)
  - Vocal tract: characterized by its transfer function and represented by MFCCs (Mel-Frequency Cepstral Coefficients)
- Suprasegmental factors
  - Speaking rate (timing and rhythm of speech units)
  - Intonation patterns
  - Dialect, accent, pronunciation habits

9 Acoustic features
- Short-term spectral analysis
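Short-term spectral analysis is usually turned into the MFCCs mentioned on slide 8. The following NumPy sketch shows the classic chain: framing, windowing, power spectrum, mel filterbank, log and DCT. The 30 ms frame follows the segmental scale on slide 8. The other settings are assumed typical values: a 10 ms hop, a Hamming window, a 512-point FFT, 24 filters and 13 coefficients. The function names are hypothetical. Pre-emphasis, liftering and delta features are deliberately left out.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centres are equally spaced on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):   # rising edge
            fbank[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):  # falling edge
            fbank[i - 1, k] = (right - k) / max(right - centre, 1)
    return fbank

def dct_ii(x, n_out):
    """Unnormalized DCT-II along the last axis, keeping the first n_out terms."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n) + 1) / (2 * n))
    return x @ basis.T

def mfcc(signal, sample_rate, frame_ms=30, hop_ms=10, n_fft=512, n_filters=24, n_ceps=13):
    """Return an (n_frames, n_ceps) array of MFCCs for a 1-D signal."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Short-term power spectrum of each windowed frame
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    # Energy in each mel band, then log compression
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(np.maximum(energies, 1e-10))
    # DCT of the log energies; low-order terms describe the spectral envelope
    return dct_ii(log_energies, n_ceps)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    one_second = rng.standard_normal(8000)   # placeholder signal at 8 kHz
    print(mfcc(one_second, 8000).shape)      # (98, 13)
```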

10 Intra- and Inter-speaker variability

11 Speaker Verification: Typology of approaches (EAGLES Handbook)
- Text-dependent
  - Public password
  - Private password
  - Customized password
- Text-prompted
- Text-independent
- Incremental enrolment
- Evaluation

12 History of Speaker Recognition

13 Current approaches

14 HMM structure depends on the application

15 Gaussian Mixture Model
- Parametric representation of the probability distribution of observations
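This transcript does not reproduce the slide's formula. The standard GMM density is presumably what the slide showed, and it is given below for reference; the notation is assumed here. An observation vector $\mathbf{o}$ of dimension $D$ is modelled by $\lambda = \{w_i, \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i\}_{i=1}^{M}$ with $M$ components. The densities of individual frames are usually combined over a sequence $O = (\mathbf{o}_1, \dots, \mathbf{o}_T)$ by assuming the frames are independent.

```latex
p(\mathbf{o} \mid \lambda) = \sum_{i=1}^{M} w_i \,
  \mathcal{N}(\mathbf{o};\, \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i),
\qquad w_i \ge 0, \quad \sum_{i=1}^{M} w_i = 1

\mathcal{N}(\mathbf{o};\, \boldsymbol{\mu}, \boldsymbol{\Sigma}) =
  \frac{1}{(2\pi)^{D/2} \lvert \boldsymbol{\Sigma} \rvert^{1/2}}
  \exp\!\Big(-\tfrac{1}{2} (\mathbf{o} - \boldsymbol{\mu})^{\top}
  \boldsymbol{\Sigma}^{-1} (\mathbf{o} - \boldsymbol{\mu})\Big)

\log p(O \mid \lambda) = \sum_{t=1}^{T} \log p(\mathbf{o}_t \mid \lambda)
```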

16 Gaussian Mixture Models
[Figure: 8 Gaussians per mixture]
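The following minimal NumPy sketch makes the slide concrete. It scores a sequence of feature vectors against a GMM with 8 components, the mixture size shown on the slide. The following are assumptions: diagonal covariances (a common simplification), the function name `gmm_log_likelihood`, averaging over frames, and the random placeholder parameters. Training the parameters, normally done with EM, is not shown.

```python
import numpy as np

def gmm_log_likelihood(frames, weights, means, variances):
    """Average per-frame log-likelihood of `frames` (T x D) under a
    diagonal-covariance GMM (weights: M, means and variances: M x D).
    Frames are treated as independent, as in the usual GMM formulation."""
    diff = frames[:, None, :] - means[None, :, :]                    # (T, M, D)
    # log N(o_t; mu_i, diag(var_i)) for every frame/component pair -> (T, M)
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances[None, :, :], axis=2)
                        + np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :])
    weighted = log_gauss + np.log(weights)[None, :]
    # log sum_i exp(...) computed stably with the log-sum-exp trick
    peak = weighted.max(axis=1, keepdims=True)
    per_frame = peak[:, 0] + np.log(np.exp(weighted - peak).sum(axis=1))
    return per_frame.mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M, D = 8, 13                              # 8 Gaussians, 13 features (assumed)
    weights = np.full(M, 1.0 / M)
    means = rng.standard_normal((M, D))
    variances = np.ones((M, D))
    frames = rng.standard_normal((200, D))    # placeholder feature vectors
    print(gmm_log_likelihood(frames, weights, means, variances))
```

Averaging over frames, rather than summing, makes scores from utterances of different lengths easier to compare. That is a design choice of this sketch.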

17 Decision theory for identity verification
- Two types of errors:
  - False rejection (a client is rejected)
  - False acceptance (an impostor is accepted)
- Decision theory: given an observation $O$ and a claimed identity,
  - hypothesis $H_0$: $O$ comes from an impostor
  - hypothesis $H_1$: $O$ comes from our client
- $H_1$ is chosen if and only if $P(H_1 \mid O) > P(H_0 \mid O)$. Bayes' law gives $P(H_i \mid O) = P(O \mid H_i)\,P(H_i) / P(O)$, and $P(O)$ cancels, so the rule can be rewritten as
$$\frac{P(O \mid H_1)}{P(O \mid H_0)} > \frac{P(H_0)}{P(H_1)}$$
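The Bayes decision rule on this slide can be written directly in the log domain, as a log-likelihood ratio compared with a prior-dependent threshold. In this sketch the function name is hypothetical. The two log-likelihoods are assumed to come from a client model and an impostor model, for example a world model.

```python
import math

def accept_claim(log_lik_client, log_lik_impostor, prior_client):
    """Choose H1 (client) iff log p(O|H1) - log p(O|H0) > log(P(H0) / P(H1)).

    log_lik_client   -- log p(O | H1), e.g. from the client model
    log_lik_impostor -- log p(O | H0), e.g. from an impostor/world model
    prior_client     -- P(H1); P(H0) is taken as 1 - P(H1)
    """
    if not 0.0 < prior_client < 1.0:
        raise ValueError("prior_client must be strictly between 0 and 1")
    log_likelihood_ratio = log_lik_client - log_lik_impostor
    threshold = math.log((1.0 - prior_client) / prior_client)
    return log_likelihood_ratio > threshold

if __name__ == "__main__":
    # Hypothetical log-likelihoods giving a log-likelihood ratio of 2.0
    print(accept_claim(-10.0, -12.0, prior_client=0.5))   # True: threshold is 0
    print(accept_claim(-10.0, -12.0, prior_client=0.01))  # False: threshold is log(99), about 4.6
```

Lowering the client prior raises the threshold, so more evidence is needed before a claim is accepted.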

18 Signal detection theory

19 Decision

20 Distribution of scores

21 Detection Error Tradeoff (DET) Curve
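A DET curve is traced by sweeping the decision threshold over the scores. At each threshold, the false rejection rate is plotted against the false acceptance rate, with both axes on a normal-deviate (probit) scale. The sketch below computes those points and a simple equal error rate (EER) estimate. The function names are hypothetical, a claim is assumed to be accepted when its score is at or above the threshold, and the scores are invented purely for illustration.

```python
from statistics import NormalDist

def error_rates(client_scores, impostor_scores, threshold):
    """False rejection and false acceptance rates at one threshold
    (a claim is accepted when score >= threshold)."""
    fr = sum(s < threshold for s in client_scores) / len(client_scores)
    fa = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return fr, fa

def det_points(client_scores, impostor_scores):
    """Sweep the threshold over all observed scores; return (threshold, FR, FA)."""
    thresholds = sorted(set(client_scores) | set(impostor_scores))
    return [(t, *error_rates(client_scores, impostor_scores, t)) for t in thresholds]

def equal_error_rate(client_scores, impostor_scores):
    """Approximate EER: the swept point where FR and FA are closest."""
    _, fr, fa = min(det_points(client_scores, impostor_scores),
                    key=lambda p: abs(p[1] - p[2]))
    return (fr + fa) / 2

def probit(p, eps=1e-6):
    """Normal deviate used for DET axes (clipped so 0 and 1 stay finite)."""
    p = min(max(p, eps), 1 - eps)
    return NormalDist().inv_cdf(p)

if __name__ == "__main__":
    # Invented scores, for illustration only
    clients = [2.1, 1.7, 0.9, 1.2, 0.3, 2.5, 1.9, -0.2]
    impostors = [-1.5, -0.8, 0.4, -2.0, -0.1, 0.8, -1.1, -0.5]
    for t, fr, fa in det_points(clients, impostors):
        print(f"t={t:+.1f}  FR={fr:.3f}  FA={fa:.3f}  DET=({probit(fa):+.2f}, {probit(fr):+.2f})")
    print("EER ~", equal_error_rate(clients, impostors))  # 0.25 for these scores
```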

22 Evaluation
- Decision cost (FA, FR, priors, costs, …)
- Receiver Operating Characteristic (ROC) curve
- Reference systems (open software)
- Evaluations (algorithms, field trials, ergonomics, …)
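The decision cost listed on this slide combines the two error rates with priors and costs. A common form is a weighted sum of the false rejection (miss) and false acceptance probabilities. The default parameters below (C_FR = 10, C_FA = 1, P_client = 0.01) are assumptions, chosen to resemble the cost model of the NIST evaluations of that period. The function names and operating points are invented for illustration.

```python
def detection_cost(p_fr, p_fa, c_fr=10.0, c_fa=1.0, p_client=0.01):
    """Expected decision cost:
    C = C_FR * P_FR * P_client + C_FA * P_FA * (1 - P_client)."""
    return c_fr * p_fr * p_client + c_fa * p_fa * (1.0 - p_client)

def best_operating_point(points, **cost_params):
    """Pick the (threshold, p_fr, p_fa) tuple with the lowest decision cost."""
    return min(points, key=lambda p: detection_cost(p[1], p[2], **cost_params))

if __name__ == "__main__":
    # Invented (threshold, P_FR, P_FA) operating points, e.g. from a DET sweep
    points = [(0.0, 0.02, 0.20), (1.0, 0.10, 0.01), (2.0, 0.40, 0.001)]
    t, p_fr, p_fa = best_operating_point(points)
    print(t, detection_cost(p_fr, p_fa))  # threshold 1.0, cost about 0.0199
```

With a low client prior, false acceptances dominate the cost, so the selected threshold moves toward fewer false acceptances even at the price of more rejections.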

23 National Institute of Standards and Technology (NIST) Speaker Verification Evaluations
- Annual evaluation since 1995
- Common paradigm for comparing technologies

24 NIST evaluations: results

25 Combining Speech Recognition and Speaker Verification
- Speaker-independent phone HMMs
- Selection of segments or segment classes that are speaker-specific
- Preliminary evaluations are performed on the NIST extended data set (one hour of training data per speaker)

26 ALISP data-driven speech segmentation

27 Searching in client and world speech dictionaries for speaker verification purposes

28 Fusion

29 Fusion results

30 Speaking Faces: Motivations
- A person speaking in front of a camera offers two modalities for identity verification (speech and face).
- The sequence of face images and the synchronisation of speech and lip movements could be exploited.
- Imposture is much more difficult than with a single modality.
- Many PCs, PDAs and mobile phones are equipped with a camera.
- Audio-visual identity verification will offer non-intrusive security for e-commerce, e-banking, …
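Score-level fusion is one straightforward way to exploit the two modalities. The sketch below uses a weighted sum. The slides do not say which fusion method was used, so this rule, the weight value and the function name are assumptions for illustration. Both scores are also assumed to be on comparable scales already, for example log-likelihood ratios.

```python
def fuse_scores(speech_score, face_score, speech_weight=0.5):
    """Weighted-sum fusion of a speech score and a face score that are
    assumed to be already normalized to comparable scales."""
    if not 0.0 <= speech_weight <= 1.0:
        raise ValueError("speech_weight must lie in [0, 1]")
    return speech_weight * speech_score + (1.0 - speech_weight) * face_score

if __name__ == "__main__":
    # Hypothetical modality scores; the speech weight would be tuned on development data
    fused = fuse_scores(1.2, -0.4, speech_weight=0.7)  # about 0.72
    print("accept" if fused > 0.0 else "reject")        # 0.0 is an arbitrary threshold here
```

The weighted sum lets a confident modality compensate for a weak one. It also makes imposture harder, because an impostor has to fool both modalities at once.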

31 Talking Face Recognition (hybrid verification)

32 Lip features
- Tracking lip movements

33 A talking face model
- Using Hidden Markov Models (HMMs)
- Acoustic parameters
- Visual parameters
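Feature-level fusion is one simple way to feed both acoustic and visual parameters to a single HMM. The two streams are brought to the same frame rate and then concatenated. The slide does not say how the streams were combined, so the following are assumptions: 100 acoustic frames per second (a 10 ms hop), 25 video frames per second, a 6-dimensional lip vector, and the function name.

```python
import numpy as np

def align_and_concatenate(acoustic, visual, audio_rate=100, video_rate=25):
    """Upsample the visual stream by frame repetition to the acoustic frame
    rate, then join both streams into one observation vector per frame."""
    if audio_rate % video_rate != 0:
        raise ValueError("this sketch assumes an integer rate ratio")
    visual_up = np.repeat(visual, audio_rate // video_rate, axis=0)
    n = min(len(acoustic), len(visual_up))   # trim to the shorter stream
    return np.hstack([acoustic[:n], visual_up[:n]])

if __name__ == "__main__":
    acoustic = np.zeros((100, 13))   # one second of 13-dim acoustic vectors
    visual = np.ones((25, 6))        # one second of 6-dim lip parameters
    print(align_and_concatenate(acoustic, visual).shape)  # (100, 19)
```

Repeating frames is the crudest form of alignment. Interpolating the visual parameters, or modelling the streams separately in a multi-stream HMM, are common alternatives.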

34 Morphing, avatars

35 Conclusions, Perspectives
- Deliberate imposture is a challenge for speech-only systems
- Verification of identity based on features extracted from talking faces should be developed
- Common databases and evaluation protocols are necessary
- Free access to reference systems will facilitate future developments

