Neuro-Fuzzy and Soft Computing for Speaker Recognition (語者辨識)


1 Neuro-Fuzzy and Soft Computing for Speaker Recognition (語者辨識)
1999 CSIST. J.-S. Roger Jang (張智星), CS Dept., Tsing Hua Univ., Taiwan
... In this talk, we are going to apply two neural-network controller design techniques to fuzzy controllers and construct so-called on-line adaptive neuro-fuzzy controllers for nonlinear control systems. We will use MATLAB, SIMULINK, and Handle Graphics to demonstrate the concept, so you can also get a preview of some features of the Fuzzy Logic Toolbox (FLT), version 2.

2 Outline
Introduction
Data acquisition
Feature extraction
Data reduction: condensing, editing, fuzzy clustering
Fuzzy classifier refinement: random search
Experiments
Conclusions and future work

3 Speaker Recognition
Types: text-dependent or text-independent; closed-set or open-set
Methodologies involved:
Digital signal processing
Pattern recognition
Clustering or vector quantization
Nonlinear optimization
Neuro-fuzzy techniques
Specifically, this is the outline of the talk. We'll start from the basics and introduce the concepts of fuzzy sets and membership functions. Using fuzzy sets, we can formulate fuzzy if-then rules, which are commonly used in our daily expressions. A collection of fuzzy rules can describe a system's behavior; this forms a fuzzy inference system, or fuzzy controller when used in control systems. In particular, we can apply neural networks' learning methods in a fuzzy inference system. A fuzzy inference system with learning capability is called ANFIS, which stands for adaptive neuro-fuzzy inference system. ANFIS is already available in the current version of FLT, but with certain restrictions; some of these will be removed in the next version of FLT. Most of all, there will be an on-line ANFIS block for SIMULINK; this block has on-line learning capability and is ideal for on-line adaptive neuro-fuzzy control applications. We will use this block in our demos: one on inverse learning and the other on feedback linearization.

4 Data Acquisition
Recording:
Recording program of Windows 95/98
8 kHz sampling rate, 8-bit resolution (worse than phone quality)
A 5-second speech signal takes about 40 KB.
Samples: Speaker #1, Speaker #2, Speaker #3
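The 40 KB figure follows directly from the recording parameters on the slide; a quick sanity check:

```python
sample_rate_hz = 8000      # 8 kHz sampling rate
bytes_per_sample = 1       # 8-bit resolution
seconds = 5
size_bytes = sample_rate_hz * bytes_per_sample * seconds
# 8000 * 1 * 5 = 40000 bytes, i.e. about 40 KB per 5-second utterance
```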

5 Feature Extraction
Major steps:
Overlapping frames of 256 points (32 ms)
Hamming windowing to lessen distortion
cepstrum(frame) = real(IFFT(log|FFT(frame)|)), where FFT is the Fast Fourier Transform and IFFT is the inverse FFT
A feature vector consists of the first 14 cepstral coefficients of a frame.
Optional steps:
Frequency-selective filter to reduce noise
Mel-warped cepstral coefficients
Feature-wise normalization
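The major steps above can be sketched in a few lines of numpy; this is a minimal illustration of the cepstrum formula on the slide, with a small floor added inside the log (an assumption, to avoid log(0) on synthetic input):

```python
import numpy as np

def cepstrum_features(frame, n_coeffs=14):
    """Real cepstrum of one 256-point frame: real(IFFT(log|FFT(frame)|))."""
    windowed = frame * np.hamming(len(frame))   # Hamming window lessens edge distortion
    spectrum = np.abs(np.fft.fft(windowed))
    log_spectrum = np.log(spectrum + 1e-10)     # small floor avoids log(0)
    cep = np.fft.ifft(log_spectrum).real
    return cep[:n_coeffs]                       # first 14 cepstral coefficients

# one synthetic 256-point frame (32 ms at 8 kHz)
t = np.arange(256) / 8000.0
frame = np.sin(2 * np.pi * 440 * t)
vec = cepstrum_features(frame)
```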

6 Physical Meanings of Cepstrum

7 Feature Extraction
A 2.39-second speech signal yields 148 frames of 256 points.
Pipeline: low-pass filter, resample, take frames, Hamming windowing, FFT, abs, log, IFFT, real part, first 14 coefficients, normalization; result: 148 feature vectors of length 14.
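The frame count on this slide can be checked with a quick calculation; the hop size of 128 samples (50% overlap) is an assumption, since the slides state only the frame length:

```python
sample_rate = 8000
signal_len = round(2.39 * sample_rate)        # 19120 samples in 2.39 s
frame_len = 256                               # 32 ms per frame
hop = 128                                     # assumed 50% overlap
n_frames = 1 + (signal_len - frame_len) // hop
# (19120 - 256) // 128 + 1 = 148, matching the 148 frames on the slide
```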

8 Feature Extraction
Upper: speaker #1, lower: speaker #2

9 Pattern Recognition
Schematic diagram:
Design (off-line): sample speech, feature extraction, data reduction, sample set
Application (on-line): test speech, feature extraction, classifier, recognized speaker

10 Pattern Recognition Methods
K-NNR (K nearest neighbor rule), with Euclidean or Mahalanobis distance
Maximum log likelihood
Adaptive networks: multilayer perceptrons, radial basis function networks
Fuzzy classifiers with random search

11 K-Nearest Neighbor Rule (K-NNR)
Steps:
1. Find the k nearest neighbors of a given point.
2. Determine the class of the given point by a voting mechanism among these k nearest neighbors.
In the figure (feature 1 vs. feature 2, with class-A points, class-B points, and a point of unknown class), a circle encloses the 3 nearest neighbors; the point is assigned class B via 3-NNR.
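The two steps above can be sketched directly; this toy example mirrors the slide's figure, where the three nearest neighbors of the query all belong to class B:

```python
import numpy as np
from collections import Counter

def knn_classify(x, samples, labels, k=3):
    """K-NNR: find k nearest neighbors, then decide the class by majority vote."""
    dists = np.linalg.norm(samples - x, axis=1)   # Euclidean distance to every sample
    nearest = np.argsort(dists)[:k]               # indices of the k nearest neighbors
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# toy 2-D sample set: two class-A points, three class-B points
samples = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [1.1, 0.9], [0.9, 1.1]])
labels = ['A', 'A', 'B', 'B', 'B']
result = knn_classify(np.array([1.0, 0.9]), samples, labels, k=3)  # → 'B'
```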

12 Decision Boundary for 1-NNR
Voronoi diagram: piecewise-linear boundary

13 Distance Metrics
Euclidean distance: d(x, y) = sqrt((x - y)^T (x - y))
Mahalanobis distance: d(x, y) = sqrt((x - y)^T Σ^{-1} (x - y)), with Σ the covariance matrix
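A minimal sketch of the two metrics; with an identity covariance the Mahalanobis distance reduces to the Euclidean one, which the example checks:

```python
import numpy as np

def euclidean(x, y):
    return np.linalg.norm(x - y)

def mahalanobis(x, y, cov):
    """sqrt((x-y)^T Σ^{-1} (x-y)); weights each dimension by the data's covariance."""
    d = x - y
    return np.sqrt(d @ np.linalg.inv(cov) @ d)

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])
de = euclidean(x, y)                   # 2*sqrt(2)
dm = mahalanobis(x, y, np.eye(2))      # identical when Σ = I
```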

14 Data Reduction
Purpose:
Reduce NNR computation load
Increase data consistency
Techniques:
To reduce data size:
Editing: eliminate noisy (boundary) data
Condensing: eliminate redundant (deeply embedded) data
Vector quantization: find representative data
To reduce data dimensions:
Principal component projection: reduce the dimensions of the feature sets
Discriminant projection: find the set of vectors that best separates the patterns

15 Editing
To remove noisy (boundary) data

16 Condensing
To remove redundant (deeply embedded) data
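The slides do not spell out the condensing algorithm; a common choice that matches the description is Hart-style condensing, sketched here under that assumption. Deeply embedded points are dropped because a 1-NNR on the kept set already classifies them correctly:

```python
import numpy as np

def condense(samples, labels):
    """Keep only samples that a 1-NNR on the kept set would misclassify."""
    keep = [0]                                   # seed with the first sample
    changed = True
    while changed:
        changed = False
        for i in range(len(samples)):
            if i in keep:
                continue
            d = np.linalg.norm(samples[keep] - samples[i], axis=1)
            nearest = keep[int(np.argmin(d))]
            if labels[nearest] != labels[i]:     # misclassified, so it carries information
                keep.append(i)
                changed = True
    return keep

samples = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.1], [1.0, 1.0], [1.1, 1.0]])
labels = ['A', 'A', 'A', 'B', 'B']
kept = condense(samples, labels)    # the redundant interior points are dropped
```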

17 VQ: Fuzzy C-Means Clustering
A point can belong to several clusters, each with a different degree of membership.
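A minimal sketch of standard fuzzy c-means with fuzzifier m = 2 (the parameter values are assumptions, not from the slides); each membership u_ik lies in [0, 1] and the memberships of a point sum to 1 across clusters:

```python
import numpy as np

def fuzzy_c_means(data, c=2, m=2.0, iters=100, seed=0):
    """Alternate center and membership updates of plain FCM."""
    rng = np.random.default_rng(seed)
    u = rng.random((c, len(data)))
    u /= u.sum(axis=0)                            # normalize memberships per point
    for _ in range(iters):
        w = u ** m
        centers = (w @ data) / w.sum(axis=1, keepdims=True)
        # distances from every center to every point
        d = np.linalg.norm(data[None, :, :] - centers[:, None, :], axis=2) + 1e-10
        u = 1.0 / d ** (2 / (m - 1))              # closer centers get higher membership
        u /= u.sum(axis=0)
    return centers, u

data = np.array([[0.0, 0.0], [0.1, 0.1], [2.0, 2.0], [2.1, 1.9]])
centers, u = fuzzy_c_means(data, c=2)
```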

18 Fuzzy Classifier
Rule base:
If x is close to (A1 or A2 or A3), then class = A.
If x is close to (B1 or B2 or B3), then class = B.
A fuzzy classifier is equivalent to a 1-NNR if all MFs have the same width.

19 Fuzzy Classifier
Adaptive network representation: inputs x1 and x2 feed multidimensional MFs A1, A2, A3 and B1, B2, B3; a max node combines the A MFs (+ branch), another combines the B MFs (- branch), and their difference gives the output y.
x = [x1 x2] belongs to class A if y > 0 and class B if y < 0.
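The network above can be sketched as follows; Gaussian MFs are an assumption (the slides do not name the MF type), but with equal widths the max-minus-max output reproduces the 1-NNR behavior noted on the previous slide:

```python
import numpy as np

def gaussian_mf(x, center, width):
    """Multidimensional Gaussian membership function."""
    return np.exp(-np.sum((x - center) ** 2) / (2 * width ** 2))

def fuzzy_classify(x, centers_a, widths_a, centers_b, widths_b):
    """y = max_i A_i(x) - max_j B_j(x); class A if y > 0, class B if y < 0."""
    ya = max(gaussian_mf(x, c, w) for c, w in zip(centers_a, widths_a))
    yb = max(gaussian_mf(x, c, w) for c, w in zip(centers_b, widths_b))
    return ya - yb

centers_a = [np.array([0.0, 0.0]), np.array([0.5, 0.0])]
centers_b = [np.array([2.0, 2.0]), np.array([2.5, 2.0])]
# query near the class-A prototypes, so y should be positive
y = fuzzy_classify(np.array([0.2, 0.1]), centers_a, [1.0, 1.0], centers_b, [1.0, 1.0])
```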

20 Refining Fuzzy Classifier
Before: MFs with the same width. After: MFs' widths refined via random search.
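A minimal sketch of the derivative-free refinement: perturb the width vector at random and keep any candidate that improves the objective. The toy objective here merely stands in for the recognition rate; in the talk's setting it would be the classifier's accuracy on the sample set:

```python
import numpy as np

def random_search(objective, w0, iters=200, step=0.1, seed=0):
    """Derivative-free random search: perturb the widths, keep improvements."""
    rng = np.random.default_rng(seed)
    w, best = w0.copy(), objective(w0)
    for _ in range(iters):
        cand = w + step * rng.standard_normal(w.shape)
        cand = np.clip(cand, 1e-3, None)       # MF widths must stay positive
        val = objective(cand)
        if val > best:                         # greedy: accept only improvements
            w, best = cand, val
    return w, best

# stand-in objective with its peak at widths (1, 2)
objective = lambda w: -np.sum((w - np.array([1.0, 2.0])) ** 2)
w, best = random_search(objective, np.array([0.5, 0.5]))
```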

21 Principal Component Projection
Eigenvalues of the covariance matrix: λ1 > λ2 > λ3 > ... > λd
Projection on v1 & v2 vs. projection on v3 & v4
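Projecting onto the leading eigenvectors of the covariance matrix can be sketched as below; the 14-dimensional random data is only a stand-in for the cepstral feature vectors:

```python
import numpy as np

def principal_projection(data, k=2):
    """Project data onto the top-k eigenvectors of the covariance matrix."""
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder so λ1 > λ2 > ... > λd
    return centered @ eigvecs[:, order[:k]]

rng = np.random.default_rng(0)
data = rng.standard_normal((100, 14))        # e.g. 100 cepstral feature vectors
proj = principal_projection(data, k=2)       # projection on v1 & v2
```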

22 Discriminant Projection
Best discriminant vectors: v1, v2, ..., vd
Projection on v1 & v2 vs. projection on v3 & v4
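The slides do not specify the exact formulation, so this sketch assumes the classical Fisher criterion: the discriminant vectors are eigenvectors of Sw^{-1} Sb, with Sw the within-class and Sb the between-class scatter:

```python
import numpy as np

def discriminant_projection(data, labels, k=1):
    """Project onto the top-k Fisher discriminant vectors (eigenvectors of Sw^-1 Sb)."""
    classes = np.unique(labels)
    mean_all = data.mean(axis=0)
    d = data.shape[1]
    sw = np.zeros((d, d))
    sb = np.zeros((d, d))
    for c in classes:
        xc = data[labels == c]
        mc = xc.mean(axis=0)
        sw += (xc - mc).T @ (xc - mc)                 # within-class scatter
        diff = (mc - mean_all)[:, None]
        sb += len(xc) * (diff @ diff.T)               # between-class scatter
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(sw) @ sb)
    order = np.argsort(eigvals.real)[::-1]            # largest criterion first
    return data @ eigvecs[:, order[:k]].real

rng = np.random.default_rng(0)
# two well-separated classes in 4-D
data = np.vstack([rng.standard_normal((30, 4)), rng.standard_normal((30, 4)) + 3])
labels = np.array([0] * 30 + [1] * 30)
proj = discriminant_projection(data, labels, k=1)
```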

23 Experiments
Experimental data: sample size = 578, test size = 1063, number of classes = 3
(Per-speaker counts for the sample and test data were shown in tables on the slide.)
Experiments:
K-NNR with all sample data
K-NNR with reduced sample data
Fuzzy classifier refined via random search

24 Performance Using All Samples
Sample size = 578, test size = 1063
Recognition rates as functions of the speech signal length; confusion matrix

25 Performance After E & D
Sample size = 497 after editing, 64 after condensing; test size = 1063
Recognition rates as functions of the speech signal length; confusion matrix

26 Performance After VQ (FCM)
Sample size = 60 after FCM; test size = 1063
Recognition rates as functions of the speech signal length; confusion matrix

27 Performance After VQ + RS
Sample (rule) size = 60, tuned via random search; test size = 1063
Recognition rates as functions of the speech signal length; confusion matrix

28 On-line Recognition & Hardware Setup

29 Conclusions
Performance after editing and condensing is unpredictable.
Performance after VQ (FCM) is consistently better than that after editing and condensing.
A simple derivative-free optimization method, i.e., random search, can significantly enhance the performance.

30 Future Work
Data dimension reduction
Other feature extraction methods (e.g., LPC)
Scale up the problem size:
More speakers (ten or more)
Other vocal signals (laughter, coughs, singing, etc.)
Other biometric identification using:
Faces
Fingerprints and palm prints
Retina and iris scans
Hand shapes/sizes/proportions
Hand vein distributions
