Evaluation of Speaker Recognition Algorithms
Speaker Recognition Speaker recognition performance depends on the channel and on noise quality. Two sets of data are used: one to enroll the speaker and the other to verify the claimed identity.
Data Collection and Processing MFCC extraction. Test algorithms include AHS (Arithmetic Harmonic Sphericity), Gaussian Divergence, Radial Basis Function, Linear Discriminant Analysis, etc.
Cepstrum The cepstrum is a common transform used to gain information from a speech signal; its x-axis is quefrency. It is used to separate the transfer function from the excitation signal: X(ω) = G(ω)H(ω), so log|X(ω)| = log|G(ω)| + log|H(ω)|, and F⁻¹{log|X(ω)|} = F⁻¹{log|G(ω)|} + F⁻¹{log|H(ω)|}.
Cepstrum
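As a small illustration, here is a minimal NumPy sketch of the real cepstrum (inverse FFT of the log magnitude spectrum); the function name and the synthetic frame are illustrative only.

```python
import numpy as np

def real_cepstrum(x):
    """Real cepstrum: inverse FFT of the log magnitude spectrum.

    Separates the slowly varying transfer function (low quefrency)
    from the excitation (high quefrency), per
    F^-1{log|X(w)|} = F^-1{log|G(w)|} + F^-1{log|H(w)|}.
    """
    spectrum = np.fft.fft(x)
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # small floor avoids log(0)
    return np.real(np.fft.ifft(log_mag))

# Example: cepstrum of a short synthetic frame
frame = np.random.randn(512)
c = real_cepstrum(frame)
```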
MFCC Extraction
Short-time FFT Frame blocking and windowing. E.g., the first frame is N samples long; the second frame begins M samples later (M < N), giving an overlap of N − M samples, and so on. Window function: y(n) = x(n)w(n). E.g., Hamming window: w(n) = 0.54 − 0.46·cos(2πn/(N−1)), 0 ≤ n ≤ N−1.
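A minimal sketch of frame blocking and Hamming windowing in NumPy, assuming a 30 ms frame and 10 ms hop at 8 kHz; the helper name frame_and_window and the random test signal are illustrative.

```python
import numpy as np

def frame_and_window(x, frame_size, hop):
    """Split a signal into overlapping frames and apply a Hamming window.

    frame_size = N samples per frame; hop = M, so consecutive frames
    overlap by N - M samples, as described above.
    """
    window = np.hamming(frame_size)            # 0.54 - 0.46*cos(2*pi*n/(N-1))
    n_frames = 1 + (len(x) - frame_size) // hop
    frames = np.empty((n_frames, frame_size))
    for i in range(n_frames):
        frames[i] = x[i * hop : i * hop + frame_size] * window
    return frames

# Example: 30 ms frames (240 samples) with 10 ms hop (80 samples) at 8 kHz
signal = np.random.randn(8000)
frames = frame_and_window(signal, frame_size=240, hop=80)
spectra = np.abs(np.fft.rfft(frames, n=512, axis=1))  # short-time FFT magnitudes
```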
Mel-Frequency Warping The mel frequency scale is approximately linear up to 1000 Hz and logarithmic above 1000 Hz: mel(f) = 2595 · log10(1 + f/700).
Mel-Spaced Filter bank
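A possible construction of a mel-spaced triangular filter bank using the mel formula above; the helper names and the choice of a 512-point FFT are assumptions for illustration.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters with center frequencies spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0), hz_to_mel(fs / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fbank[i - 1, k] = (k - left) / max(center - left, 1)   # rising slope
        for k in range(center, right):
            fbank[i - 1, k] = (right - k) / max(right - center, 1)  # falling slope
    return fbank

fbank = mel_filterbank(n_filters=24, n_fft=512, fs=8000)  # 24 filters, as in the experiments below
```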
MFCC Cepstrum Transforming the log mel spectrum back to the time domain gives the MFCCs. The MFCCs C_n are given by C_n = Σ_{k=1..K} (log S_k) · cos[n(k − 1/2)π/K], n = 1, 2, …, where S_k are the mel power spectrum coefficients and K is the number of mel filters.
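A sketch of this final MFCC step, taking the DCT of the log mel power spectrum as in the formula above; the function name and the choice of 12 coefficients are illustrative.

```python
import numpy as np

def mfcc_from_power_spectrum(power_spectrum, fbank, n_coeffs=12):
    """MFCCs: DCT of the log mel power spectrum,
    C_n = sum_{k=1..K} log(S_k) * cos(n * (k - 1/2) * pi / K),
    where S_k are the mel power spectrum coefficients.
    """
    S = fbank @ power_spectrum            # mel power spectrum coefficients S_k
    log_S = np.log(S + 1e-12)             # small floor avoids log(0)
    K = len(log_S)
    n = np.arange(1, n_coeffs + 1)[:, None]
    k = np.arange(1, K + 1)[None, :]
    return np.cos(n * (k - 0.5) * np.pi / K) @ log_S

# Usage (with the mel filter bank and windowed frames from the sketches above):
# coeffs = mfcc_from_power_spectrum(np.abs(np.fft.rfft(frame, 512)) ** 2, fbank)
```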
Arithmetic Harmonic Sphericity A function of the eigenvalues of a test covariance matrix relative to a reference covariance matrix for speakers x and y, defined by μ_AHS(x, y) = log[ tr(C_x C_y⁻¹) · tr(C_y C_x⁻¹) ] − 2·log D, where D is the dimensionality of the covariance matrices.
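A NumPy sketch of the AHS measure in its commonly published form (the log ratio of the arithmetic to the harmonic mean of the eigenvalues of C_x C_y⁻¹); the exact variant used in the experiments may differ.

```python
import numpy as np

def ahs(features_x, features_y):
    """Arithmetic-harmonic sphericity between two speakers' feature sets.

    Computed as log[ tr(Cx Cy^-1) * tr(Cy Cx^-1) ] - 2*log(D), i.e. the log
    ratio of the arithmetic to the harmonic mean of the eigenvalues of
    Cx Cy^-1, with D the feature dimensionality.
    """
    Cx = np.cov(features_x, rowvar=False)
    Cy = np.cov(features_y, rowvar=False)
    D = Cx.shape[0]
    t1 = np.trace(Cx @ np.linalg.inv(Cy))
    t2 = np.trace(Cy @ np.linalg.inv(Cx))
    return np.log(t1 * t2) - 2.0 * np.log(D)

# Usage: score = ahs(test_mfcc, enroll_mfcc), where each argument is an
# (n_frames x D) matrix of MFCC feature vectors.
```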
Gaussian Divergence A mixture of Gaussian densities is used to model the distribution of the features of each speaker.
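One common instantiation of a Gaussian divergence measure is the symmetrised KL divergence between single Gaussians fitted to each speaker's features; the sketch below shows that simplified case, not a full mixture-based comparison.

```python
import numpy as np

def gaussian_divergence(features_x, features_y):
    """Symmetrised KL divergence between single Gaussians fit to each
    speaker's features (a simplified stand-in for mixture-based modelling).
    """
    mx, Cx = features_x.mean(axis=0), np.cov(features_x, rowvar=False)
    my, Cy = features_y.mean(axis=0), np.cov(features_y, rowvar=False)
    Cx_inv, Cy_inv = np.linalg.inv(Cx), np.linalg.inv(Cy)
    d = mx - my
    D = len(mx)
    # KL(x||y) + KL(y||x); the log-determinant terms cancel in the sum
    return 0.5 * (np.trace(Cy_inv @ Cx) + np.trace(Cx_inv @ Cy)
                  + d @ (Cx_inv + Cy_inv) @ d - 2 * D)

# Usage: smaller divergence = more similar speakers,
# e.g. gaussian_divergence(test_mfcc, enroll_mfcc)
```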
YOHO Dataset Sampling frequency: 8 kHz.
Performance – AHS with 138 subjects and 24 MFCCs
Performance – Gaussian Div with 138 subjects and 24 MFCCs
Performance – AHS with 138 subjects and 12 MFCCs
Performance – Gaussian Div with 138 subjects and 12 MFCCs
Review of Probability and Statistics Probability Density Functions Example: for a p.d.f. f(x), the probability that x lies between a = 0.25 and b = 0.75 is P(0.25 ≤ X ≤ 0.75) = ∫ from 0.25 to 0.75 of f(x) dx.
Review of Probability and Statistics Cumulative Distribution Functions The cumulative distribution function (c.d.f.) F(x) for a c.r.v. X is F(x) = P(X ≤ x) = ∫ from −∞ to x of f(t) dt. Example: the c.d.f. of f(x) evaluated at b = 0.75 is F(0.75) = ∫ from −∞ to 0.75 of f(t) dt.
Review of Probability and Statistics Expected Values and Variance The expected (mean) value of a c.r.v. X with p.d.f. f(x) is E(X) = ∫ from −∞ to ∞ of x·f(x) dx; for a discrete random variable, E(X) = Σ x_i·P(x_i). Example 1 (discrete): E(X) = 2·0.05 + 3·0.10 + … + 9·0.05. Example 2 (continuous): E(X) is obtained by integrating x·f(x) over the support of f.
Review of Probability and Statistics The Normal (Gaussian) Distribution The p.d.f. of a normal distribution is f(x) = (1/(σ√(2π))) · exp(−(x − μ)²/(2σ²)), where μ is the mean and σ is the standard deviation.
Review of Probability and Statistics The Normal Distribution Any arbitrary p.d.f. can be approximated by summing N weighted Gaussians (a mixture of Gaussians) with weights w1, w2, …, wN.
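A short NumPy sketch of a mixture of Gaussians as a weighted sum of component densities; the weights, means, and standard deviations below are arbitrary illustrative values.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Normal p.d.f. f(x) = exp(-(x - mu)^2 / (2*sigma^2)) / (sigma*sqrt(2*pi))."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

def gmm_pdf(x, weights, means, sigmas):
    """Mixture of Gaussians: weighted sum of N component densities."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas))

# Example: a 3-component mixture (weights sum to 1)
x = np.linspace(-5, 10, 200)
p = gmm_pdf(x, weights=[0.5, 0.3, 0.2], means=[0.0, 3.0, 6.0], sigmas=[1.0, 0.5, 2.0])
```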
Review of Markov Models A Markov Model (Markov Chain) is similar to a finite-state automaton, with probabilities of transitioning from one state to another. The system transitions from state to state at discrete time intervals and can only be in one state at any given time.
Review of Markov Models Transition probabilities: with no assumptions (a full probabilistic description of the system), P[q_t = j | q_{t−1} = i, q_{t−2} = k, …, q_1 = m]. Usually a first-order Markov Model is used: P[q_t = j | q_{t−1} = i] = a_ij. First-order assumption: transition probabilities depend only on the previous state. The a_ij obey the usual rules: the probabilities leaving a state sum to 1 (a state must be left).
Review of Markov Models Transition probabilities, example (three states S1, S2, S3):
a11 = 0.0, a12 = 0.5, a13 = 0.5, a1,Exit = 0.0 (sum = 1.0)
a21 = 0.0, a22 = 0.7, a23 = 0.3, a2,Exit = 0.0 (sum = 1.0)
a31 = 0.0, a32 = 0.0, a33 = 0.0, a3,Exit = 1.0 (sum = 1.0)
Review of Markov Models Transition probabilities and state-duration distribution (example with self-transition probability 0.4 and exit probability 0.6 for state S2):
p(remain in state S2 exactly 1 time) = 0.4 · 0.6 = 0.24
p(remain in state S2 exactly 2 times) = 0.4 · 0.4 · 0.6 = 0.096
p(remain in state S2 exactly 3 times) = 0.4 · 0.4 · 0.4 · 0.6 = 0.0384
⇒ exponential (geometric) decay, a characteristic of Markov Models.
Review of Markov Models Example 1: Single Fair Coin. Two states: S1 corresponds to event e1 = Heads (a11 = 0.5, a12 = 0.5); S2 corresponds to event e2 = Tails (a21 = 0.5, a22 = 0.5). The generated events H T H H T H T T T H H correspond to the state sequence S1 S2 S1 S1 S2 S1 S2 S2 S2 S1 S1.
Review of Markov Models Example 2: Weather (state-transition diagram with three states S1, S2, S3).
Review of Markov Models Example 2: Weather (cont'd) S1 = event 1 = rain, S2 = event 2 = clouds, S3 = event 3 = sun; A = {a_ij}; initial probabilities π1 = 0.5, π2 = 0.4, π3 = 0.1. What is the probability of {rain, rain, rain, clouds, sun, clouds, rain}? Obs. = {r, r, r, c, s, c, r}, S = {S1, S1, S1, S2, S3, S2, S1}, time = {1, 2, 3, 4, 5, 6, 7} (days). P = P[S1] · P[S1|S1] · P[S1|S1] · P[S2|S1] · P[S3|S2] · P[S2|S3] · P[S1|S2] = 0.5 · 0.7 · 0.7 · 0.25 · 0.1 · 0.7 · 0.4 ≈ 1.7 × 10⁻³.
Review of Markov Models Example 2: Weather (cont'd) S1 = rain, S2 = clouds, S3 = sun; A = {a_ij}; π1 = 0.5, π2 = 0.4, π3 = 0.1. What is the probability of {sun, sun, sun, rain, clouds, sun, sun}? Obs. = {s, s, s, r, c, s, s}, S = {S3, S3, S3, S1, S2, S3, S3}, time = {1, 2, 3, 4, 5, 6, 7} (days). P = P[S3] · P[S3|S3] · P[S3|S3] · P[S1|S3] · P[S2|S1] · P[S3|S2] · P[S3|S3] = 0.1 · 0.1 · 0.1 · 0.2 · 0.25 · 0.1 · 0.1 = 5.0 × 10⁻⁷.
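A small sketch that reproduces these sequence-probability computations; the transition matrix is reconstructed from the factors shown above, with a13 = 0.05 and a22 = 0.5 assumed so that each row sums to 1.

```python
import numpy as np

# Transition matrix A[i, j] = P[S_j | S_i]; values taken from the worked
# examples above (a13 and a22 are assumptions that make the rows sum to 1).
A = np.array([[0.7, 0.25, 0.05],   # from rain
              [0.4, 0.5,  0.1 ],   # from clouds
              [0.2, 0.7,  0.1 ]])  # from sun
pi = np.array([0.5, 0.4, 0.1])     # initial state probabilities

def sequence_probability(states):
    """P(state sequence) = pi[s_1] * prod_t A[s_{t-1}, s_t] (0-indexed states)."""
    p = pi[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= A[prev, cur]
    return p

# {rain, rain, rain, clouds, sun, clouds, rain} -> states S1 S1 S1 S2 S3 S2 S1
print(sequence_probability([0, 0, 0, 1, 2, 1, 0]))   # ~1.7e-3
# {sun, sun, sun, rain, clouds, sun, sun}    -> states S3 S3 S3 S1 S2 S3 S3
print(sequence_probability([2, 2, 2, 0, 1, 2, 2]))   # 5.0e-7
```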
Simultaneous Speech and Speaker Recognition Using Hybrid Architecture – Dominique Genoud, Dan Ellis, Nelson Morgan The automatic recognition of the human voice is often divided into two parts: speech recognition and speaker recognition.
Traditional System The task of a traditional, state-of-the-art speaker recognition system can be divided into two parts: feature extraction and model creation.
Feature Extraction
Model Creation Once the features are extracted, a model can be created using various techniques, e.g., a Gaussian Mixture Model. Once the models are created, the distance from one model to another can be computed, and a decision can be inferred from that distance.
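A hedged sketch of enrollment and verification with a GMM speaker model, here using scikit-learn's GaussianMixture as one possible tool; the feature matrices, the 16-component choice, and the decision threshold are illustrative assumptions, not the systems evaluated above.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical MFCC matrices (frames x coefficients) for enrollment and test.
enroll_mfcc = np.random.randn(500, 24)
test_mfcc = np.random.randn(200, 24)

# Enrollment: fit a GMM to the enrolled speaker's features.
speaker_gmm = GaussianMixture(n_components=16, covariance_type='diag').fit(enroll_mfcc)

# Verification: compare the average log-likelihood of the test features
# under the claimed speaker's model against a decision threshold.
score = speaker_gmm.score(test_mfcc)   # mean log-likelihood per frame
decision = score > -35.0               # threshold value is purely illustrative
```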
A Simultaneous Speaker and Speech Recognition A system that models the phones of the speaker as well as the speaker's features, and combines them into one model, could perform very well.
A Simultaneous Speaker and Speech Recognition Maximum a posteriori (MAP) estimation is used to generate speaker-specific models from a set of speaker-independent (SI) seed models. Assuming no prior knowledge about the speaker distribution, the a posteriori probability is approximated by a score that compares the speaker-specific models against a world (speaker-independent) model.
A Simultaneous Speaker and Speech Recognition One parameter of this score was determined empirically to be 0.02. Using the Viterbi algorithm, the N most probable speakers can be found. Results: the authors reported 0.7% EER, compared to 5.6% EER for a GMM-based system on the same dataset of 100 speakers.
Speech and Speaker Combination Posteriori Probabilities and Likelihoods Combination for Speech and Speaker Recognition – Mohamed Faouzi BenZeghiba, Eurospeech The authors used a hybrid HMM/ANN (MLP) system for this work. As speech features, 12 MFCC coefficients with energy and their first derivatives were calculated every 10 ms over a 30 ms window.
System Description The word is drawn from a finite set of words {W}, the speaker from a finite set of registered speakers {S}, and the remaining parameters are those of the ANN (MLP).
System Description The probability that a speaker is accepted depends on LLR(X), the likelihood ratio between the speaker's GMM and a background model whose parameters are derived by MAP adaptation from the world data set.
Combination MLP adaptation is used: it shifts the boundaries between the phone classes without strongly affecting the posterior probabilities of the speech sounds of other speakers. The authors proposed a formula to combine the two systems.
Combination Using the a posteriori probabilities on the test set, the probability that a speaker is accepted can be determined.
HMM Parameter Estimation Given an observation sequence O, determine the model parameters λ = (A, B, π) that maximize P(O|λ). γ_t(i) is the probability of being in state i at time t, given O and λ; then:
HMM-Parameter Estimation = Expected frequency in state i at time t=1
Thank You