
1 Speaker Verification System Part B Final Presentation
Performed by: Barak Benita & Daniel Adler
Instructor: Erez Sabbag

2 The Project Goal
Implementation of a speaker verification algorithm on a DSP. The verification module will perform real-time authentication of the user based on sampled voice data. The idea is to integrate the speaker verification module with other security and management modules, allowing them to grant access to resources based on the speaker's voice verification.

3 Introduction
Speaker verification is the process of automatically authenticating the speaker on the basis of individual information included in speech waves.
Speaker's Identity (Reference) + Speaker's Voice Segment → Speaker Verification System → Result [0:1]

4 System Overview
(Diagram) A user ("My name is Bob!") speaks to a BT base station; the Speaker Verification Unit and the server, connected over the LAN, grant or deny access ("Access Denied").

5 System Description
The system is built around TI's C6701 floating-point DSP, which runs the speaker verification algorithm. A user with a handheld device (e.g. Bluetooth on a PDA) will receive access to different resources (door opening, file access, etc.) based on a voice verification process. The project implements only the speaker verification algorithm on the DSP and has input and output interfaces to interact with other devices (e.g. Bluetooth). The DSP is encoded with the user's voice signature. Each time user verification is needed, the algorithm compares the speaker's voice with the signature.

6 System Block Diagram
(Diagram) Main blocks: the DSP with its codec; the Enrollment Server (training phase: building a signature), which supplies the signature parameters; a verification channel; the Bluetooth radio interface, Bluetooth unit and Bluetooth base station; the Authorization Server on the LAN; and optional voice channels carrying the speech itself ("My name is Bob").

7 Project Description
Part One: literature review; algorithm selection; MATLAB implementation; result analysis.
Part Two: implementation of the chosen algorithm on a DSP.

8 Speaker Verification Process
Analog Speech → Pre-Processing → Feature Extraction → Pattern Matching (against the Reference Model) → Decision → Result [0:1]

9 Implemented Algorithms: Feature Extraction Module – MFCC
MFCC (Mel Frequency Cepstral Coefficients) is the most common technique for feature extraction. MFCC tries to mimic the way our ears work by analyzing the speech waves linearly at low frequencies and logarithmically at high frequencies. The processing chain is: Windowed PDS Frame → FFT → Spectrum → Mel-frequency Wrapping → Mel Spectrum → Cepstrum (DCT) → Mel Cepstrum.
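For concreteness, here is a minimal, self-contained sketch of this chain for one windowed frame. It is illustrative only: a naive O(N^2) DFT stands in for the radix-2 FFT used on the DSP, the number of mel filters (20) is an assumption, and the function names are hypothetical; only the window size (256), feature vector size (18) and sampling frequency (11025 Hz) follow the tested system described later.

    #include <math.h>

    #define PI     3.14159265358979f
    #define N      256       /* window size (as in this project)         */
    #define NFILT  20        /* number of mel filters (an assumption)    */
    #define NCEP   18        /* feature vector size (as in this project) */
    #define FS     11025.0f  /* sampling frequency (as in this project)  */

    static float hz2mel(float f) { return 2595.0f * log10f(1.0f + f / 700.0f); }
    static float mel2hz(float m) { return 700.0f * (powf(10.0f, m / 2595.0f) - 1.0f); }

    void mfcc_frame(const float frame[N], float cep[NCEP])
    {
        float power[N / 2 + 1], logmel[NFILT], edge[NFILT + 2];
        int i, k, n;

        /* 1. Power spectrum (naive DFT for clarity; the DSP uses cfftr2_dit). */
        for (k = 0; k <= N / 2; k++) {
            float re = 0.0f, im = 0.0f;
            for (n = 0; n < N; n++) {
                re += frame[n] * cosf(2.0f * PI * k * n / N);
                im -= frame[n] * sinf(2.0f * PI * k * n / N);
            }
            power[k] = re * re + im * im;
        }

        /* 2. Mel wrapping: filter edges equally spaced on the mel scale. */
        for (i = 0; i < NFILT + 2; i++)
            edge[i] = mel2hz(hz2mel(FS / 2.0f) * i / (NFILT + 1));

        /* 3. Triangular filter bank -> log mel spectrum. */
        for (i = 0; i < NFILT; i++) {
            float e = 0.0f;
            for (k = 0; k <= N / 2; k++) {
                float f = k * FS / N, w = 0.0f;
                if (f >= edge[i] && f <= edge[i + 1])
                    w = (f - edge[i]) / (edge[i + 1] - edge[i]);
                else if (f > edge[i + 1] && f <= edge[i + 2])
                    w = (edge[i + 2] - f) / (edge[i + 2] - edge[i + 1]);
                e += w * power[k];
            }
            logmel[i] = logf(e + 1e-6f);   /* small bias avoids log(0) */
        }

        /* 4. DCT of the log mel energies -> mel cepstrum (feature vector). */
        for (k = 0; k < NCEP; k++) {
            cep[k] = 0.0f;
            for (i = 0; i < NFILT; i++)
                cep[k] += logmel[i] * cosf(PI * k * (i + 0.5f) / NFILT);
        }
    }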

10 Implemented Algorithms: Pattern Matching Modeling Module – Vector Quantization (VQ)
In the enrollment part we build a codebook for the speaker using the LBG (Linde, Buzo, Gray) algorithm, which creates an N-entry codebook from a set of L feature vectors. In the verification stage, we measure the distortion of the given sequence of feature vectors against the reference codebook. Pattern Matching = distortion measure; Reference Model = codebook. (Feature Vectors → Codebook → Distortion Rate)
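As a sketch of the verification-stage distortion measure (illustrative, not the project's calc_dist; the dimensions follow the tested system: 18-coefficient feature vectors and a 128-entry codebook), the mean nearest-codeword distortion could look like this:

    #include <float.h>

    #define DIM   18    /* feature vector size */
    #define CBSZ  128   /* codebook size       */

    /* Average squared Euclidean distance from each feature vector
       to its nearest codeword in the reference codebook. */
    float vq_distortion(const float feat[][DIM], int nframes,
                        const float codebook[CBSZ][DIM])
    {
        float total = 0.0f;
        int t, c, d;

        for (t = 0; t < nframes; t++) {
            float best = FLT_MAX;
            for (c = 0; c < CBSZ; c++) {          /* find nearest codeword */
                float dist = 0.0f;
                for (d = 0; d < DIM; d++) {
                    float diff = feat[t][d] - codebook[c][d];
                    dist += diff * diff;
                }
                if (dist < best)
                    best = dist;
            }
            total += best;                         /* accumulate nearest distance */
        }
        return total / (float)nframes;             /* mean distortion */
    }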

11 Implemented Algorithms: Decision
In VQ the decision is based on comparing the distortion rate to a preset threshold: acceptance if the distortion rate is below t, rejection otherwise. In this project no decision module will be built; the output of the system will be a score (a value between 0 and 1) that indicates how well the person fits the reference model: Score = exp(-mean distance)
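A one-line sketch of that mapping (hypothetical function name; compare with the result tables later in the deck):

    #include <math.h>

    /* Score in (0,1]: zero distortion -> 1.0; larger distortion -> smaller score. */
    float verification_score(float mean_distortion)
    {
        return expf(-mean_distortion);   /* e.g. exp(-0.4011) ~= 0.67 (67%) */
    }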

12 Implementation Environment
Hardware tools: TI DSP 6701 EVM board; PC host station.
Software development tools: TI Code Composer; Matlab 6.1.
Programming languages: C, Assembler, Matlab.

13 Working Environment

14 TI DSP 6701 EVM
Why? Floating point; designed especially for voice applications; large bank of on-chip memory; high-level development (C); PCI interface.
Why not? Price; size; power consumption.

15 Program Workflow
Analog Speech (input) → Pre-Processing → Feature Extraction → Pattern Matching → Decision → Result [0:1] (output). The pre-processing, feature extraction, pattern matching and decision stages run in the DSP program; the reference model is produced by the MATLAB program.

16 Step By Step Implementation
Pre-processing a 'ones' vector on the DSP and comparing it to the Matlab results
Pre-processing an audio file and comparing to the Matlab results
Feature extraction of the audio file (after pre-processing) and comparing to the Matlab results
Pattern matching the feature vectors against a 'ones' codebook matrix and comparing to the Matlab results (running with the same codebook)
Creating a real codebook from a reference speaker, importing it to the DSP, and comparing the running results of the DSP and the Matlab
Verifying that the distances of the speakers from the codebook are the same in the DSP program and in the Matlab program (a sketch of such an element-wise comparison follows the list)
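As a sketch, each "comparing to the Matlab results" step can be an element-wise comparison of the DSP output buffer against the exported Matlab reference (hypothetical function name and tolerance; the DSP's single-precision arithmetic will not match Matlab's double-precision output bit-exactly):

    #include <math.h>
    #include <stdio.h>

    /* Returns the number of elements whose absolute difference exceeds tol. */
    int compare_results(const float *dsp, const float *matlab, int n, float tol)
    {
        int i, mismatches = 0;
        for (i = 0; i < n; i++)
            if (fabsf(dsp[i] - matlab[i]) > tol) {
                printf("mismatch at %d: dsp=%g matlab=%g\n", i, dsp[i], matlab[i]);
                mismatches++;
            }
        return mismatches;   /* 0 means the stage matches within tolerance */
    }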

17 Creating the Assembler Lookup Files
Creating the output data through Matlab functions (e.g. hamming(n)):

    h = hamming(n);

Saving the output in an assembler lookup-table format:

    hamming = fopen('hamming.asm', 'wt', 'l');
    fprintf(hamming, '; hamming.asm - single precision floating point table generated from MATLAB\n');
    fprintf(hamming, '\t.def\t_hamming\n');
    fprintf(hamming, '\t.sym\t_hamming, _hamming, 54, 2, %d,, %d\n', size, n);
    fprintf(hamming, '\t.data\n');
    fprintf(hamming, '_hamming:\n');
    fprintf(hamming, '\t.word\t%tXh, %tXh, %tXh, %tXh\n', h);
    fprintf(hamming, '\n');
    fclose(hamming);

Referencing the lookup table with a name that can be called from the C source code in the DSP project (as a function), and importing the file as an .asm file (adding the file to the project). The generated file looks like:

    ; hamming.asm - single precision floating point table generated from MATLAB
        .def    _hamming
        .sym    _hamming, _hamming, 54, 2, 8192,, 256
        .data
    _hamming:
        .word   3DA3D70Ah, 3DA4203Fh, 3DA4FBD3h, 3DA669A4h
        .word   3DA86978h, 3DAAFB01h, 3DAE1DD8h, 3DB1D180h
        .word   3DB61567h, 3DBAE8E1h, 3DC04B30h, 3DC63B7Dh
        .word   3DCCB8DCh, 3DD3C24Bh, 3DDB56B1h, 3DE374E1h
        .word   3DEC1B99h, 3DF5497Fh, 3DFEFD27h, 3E049A87h
        .word   3E09F7D0h, 3E0F9597h, 3E1572FFh, 3E1B8F1Ch

Using the lookup table in the C source code:

    // Windowing the filtered frame with Hamming ----
    for (k = 0; k < N; k++) {
        for (j = 0; j < N; j++) {
            if (k - j < 0)
                break;
            frame[k] += hamming[j] * filtered_frame[k - j];
        }
    }
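On the C side, each generated table only needs an extern declaration; a minimal sketch (assuming the usual C6000 convention that the assembler symbol _hamming maps to the C identifier hamming, and the 256-entry size used as the window size in this project):

    /* Declared in C; defined in hamming.asm via ".def _hamming". */
    #define N 256
    extern float hamming[N];   /* MATLAB-generated Hamming window table */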

18 Binding All The Pieces
Analog Speech (input): voice data generated from a *.wav file through the Matlab wavread function (Sari5fix.asm).
Pre-Processing: assembly functions generated through Matlab (Hamming.asm).
Feature Extraction: assembly functions generated through Matlab (Melbank.asm, Rdct.asm).
Pattern Matching: codebook generated through Matlab (Codebook.asm).
Decision → Result [0:1] (output).
All stages are bound together in the DSP program's C code.

19 Software Modules
Call graph (with complexities): main; init O(1); extract_frame O(n^2); digitrev_index O(n); bitrev O(n); hamming O(1); melbank O(n); cfftr2_dit O(n log n); calc_dist O(1)

20 Project Structure
speakerverification.pjt
  Include Files: board.h, codec.h, dma.h, intr.h, mcbsp.h, pci.h, regs.h, verification.h
  link.cmd
  Libraries: rts6700.lib
  Source: bitrevf.asm, cfftr2.asm, codebook.asm, digitrev_index.c, hamming.asm, melbank.asm, rdct.asm, verification.c

21 Tested System
The tested algorithms and methods were MFCC and VQ with the following parameters:
  Sampling frequency: 11025 Hz
  Feature vector size: 18
  Window size: 256
  Offset size: 128
  Codebook size: 128
  Number of iterations for codebook creation: 25
We compared the Matlab and DSP results using a codebook created from 60 seconds of Daniel's random speech and a random selection of five-second segments from different speakers.
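For reference, the tested configuration written out as a C struct (a sketch only; the project's actual code may hard-code these values, e.g. as #defines):

    /* The parameter set used in the tested system. */
    struct vq_config {
        int fs;          /* sampling frequency, Hz           */
        int n_coeffs;    /* feature vector size              */
        int win_size;    /* window size, samples             */
        int offset;      /* frame offset, samples            */
        int cb_size;     /* codebook size                    */
        int lbg_iters;   /* iterations for codebook creation */
    };

    static const struct vq_config tested = { 11025, 18, 256, 128, 128, 25 };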

22 Verifications
The DSP results were compared to the Matlab simulation. We chose random speakers from the speakers DB with one reference codebook. For example (score, with mean distortion in parentheses; since Score = exp(-mean distance), identical distortions give identical scores):

Person  | MATLAB          | DSP
Daniel  | 66.95% (0.4011) | 66.95% (0.4011)
Barak   | 44.01% (0.8206) | 44.01% (0.8206)
Ayelet  | 43.61% (0.8299) | 43.61% (0.8299)
Diego   | 53.97% (0.6166) | 53.97% (0.6166)
Adi     | 42.07% (0.8656) | 42.07% (0.8656)

23 Conclusions
The TI DSP 6701 EVM is capable of performing speaker verification analysis and achieving high-resolution results (matching those achieved in Matlab).
Speaker verification algorithms are not mature enough to be a good biometric detection solution.
Code Composer is not stable and polished enough to be an "easy to use" development environment.
A second-phase project, implementing a complete verification system, should be built.

24 Time Table – First Semester
– Project description presentation
– Completion of phase A: literature review and algorithm selection
– Handing out the mid-term report
– Beginning of phase B: algorithm implementation in MATLAB
– Publishing the MATLAB results and selecting the algorithm that will be implemented on the DSP

25 Time Table – Second Semester
– Presenting the progress and planning of the project to the supervisor
– Finishing MATLAB testing
– Beginning of the implementation on the DSP
– Project presentation and handing in the final project report

26 Thanks

27 Backup Slides

28 Pre-Processing (step 1)
Analog Speech → Pre-Processing → Windowed PDS Frames [1, 2, …, N]

29 Pre-Processing Module
LPF: anti-aliasing filter [0, Fs/2] to avoid aliasing during sampling. Analog Speech → Band-Limited Analog Speech.
A/D: analog-to-digital converter with a sampling frequency (Fs) of 10–16 kHz. → Digital Speech.
First-Order FIR: low-order digital system to spectrally flatten the signal (in favor of vocal tract parameters) and make it less susceptible to later finite-precision effects. → Pre-emphasized Digital Speech (PDS).
Frame Blocking: each frame is N samples, overlapped with N−M samples of the previous frame. Frame rate ~100 frames/sec; N in [200, 300], M in [100, 200]. → PDS Frames.
Frame Windowing: Hamming (or Hanning or Blackman) windowing to minimize the signal discontinuities at the beginning and end of each frame. → Windowed PDS Frames.
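A minimal sketch of the digital part of this chain, i.e. pre-emphasis, frame blocking and Hamming windowing (the 0.95 pre-emphasis coefficient is a typical textbook value and an assumption here, as are the function names; N = 256 and M = 128 follow the tested system):

    #include <math.h>

    #define PI 3.14159265358979f
    #define N  256   /* samples per frame (window size) */
    #define M  128   /* frame offset (overlap is N - M) */

    /* First-order FIR pre-emphasis: y[n] = x[n] - 0.95 * x[n-1]. */
    void pre_emphasize(const float *x, float *y, int len)
    {
        int n;
        y[0] = x[0];
        for (n = 1; n < len; n++)
            y[n] = x[n] - 0.95f * x[n - 1];
    }

    /* Extract frame `idx` from the PDS stream and apply a Hamming window.
       Caller must ensure idx * M + N does not exceed the stream length. */
    void window_frame(const float *pds, int idx, float frame[N])
    {
        int n;
        for (n = 0; n < N; n++) {
            float w = 0.54f - 0.46f * cosf(2.0f * PI * n / (N - 1));
            frame[n] = w * pds[idx * M + n];
        }
    }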

30 Feature Extraction (step 2)
Windowed PDS Frames [1, 2, …, N] → Feature Extraction → Set of Feature Vectors [1, 2, …, K]. Extracting the features of speech from each frame and representing them as a vector (feature vector).

31 MFCC – Mel-frequency Wrapping
Psychophysical studies have shown that human perception of the frequency content of sounds for speech signals does not follow a linear scale. Thus for each tone with an actual frequency f, measured in Hz, a subjective pitch is measured on a scale called the 'mel' scale. The mel-frequency scale is a linear frequency spacing below 1000 Hz and a logarithmic spacing above 1000 Hz. Therefore we can use the following approximate formula to compute the mels for a given frequency f in Hz: mel(f) = 2595 * log10(1 + f/700).
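As a sanity check on the formula, f = 1000 Hz gives mel(1000) = 2595 * log10(1 + 1000/700) ≈ 2595 * 0.3853 ≈ 1000 mels, which is exactly the pivot point between the linear and logarithmic regions of the scale.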

32 MFCC – Filter Bank
One way of simulating the subjective spectrum is to use a filter bank, spaced uniformly on the mel scale. Each filter has a triangular bandpass frequency response, and the spacing as well as the bandwidth is determined by a constant mel-frequency interval.

33 MFCC – Cepstrum
Here we convert the log mel spectrum back to the time domain. The result is called the mel-frequency cepstral coefficients (MFCC). Because the mel spectrum coefficients are real numbers, we can convert them to the time domain using the Discrete Cosine Transform (DCT), which yields the feature vector.

34 Pattern Matching Modeling (step 3)
The pattern matching modeling technique is divided into two parts: the enrollment part, in which we build the reference model of the speaker, and the verification (matching) part, in which a user is compared against this model.

35 Enrollment part – Modeling
Set of Feature Vectors [1, 2, …, K] → Modeling → Speaker Model. This part is done outside the DSP; the DSP receives only the speaker model (calculated offline on a host).

36 Pattern Matching
Set of Feature Vectors [1, 2, …, K] + Speaker Model → Pattern Matching → Matching Rate

37 Decision Module (Optional)
In VQ the decision is based on comparing the distortion rate to a preset threshold: if distortion rate < t, Output = Yes, else Output = No. In HMM the decision is based on comparing the probability score to a preset threshold: if probability score > t, Output = Yes, else Output = No.

38 The Voice Database
Two reference models were generated (one male and one female); each model was trained in 3 different ways:
  repeating the same sentence for 15 seconds
  repeating the same sentence for 40 seconds
  reading random text for one minute
The voice database consists of 10 different speakers (5 males and 5 females); each speaker was recorded in 3 ways:
  repeating the reference sentence once (5 seconds)
  repeating the reference sentence 3 times (15 seconds)
  speaking a random sentence for 5 seconds

39 Experiment Description Cont.
Conclusions: A window size of 330 with an offset of 110 samples performs better than a window size of 256 with an offset of 128 samples.

40 Experiment Description Cont.
Conclusions: A feature vector of 18 coefficients performs better than a feature vector of 12 coefficients.

41 Experiment Description Cont.
Conclusions:
Worst combinations: 5 seconds of the fixed sentence for testing, with an enrollment of either 15 seconds or 40 seconds of the same sentence.
Best combinations: 15 seconds of the fixed sentence for testing with an enrollment of 40 seconds of the same sentence, and 5 seconds of a random sentence with an enrollment of 60 seconds of random sentences.

42 Experiment Description Cont.
The Best Results:

43 Additional verification results
The DSP results were compared to the Matlab simulation. We chose random speakers from the speakers DB with one reference codebook. For example (score, with mean distortion in parentheses; the score is exp(-mean distance), so identical distortions give identical scores):

Person | MATLAB          | DSP
Alex   | 69.58% (0.3627) | 69.58% (0.3627)
Sari   | 61.66% (0.4835) | 61.66% (0.4835)
Roee   | 49.97% (0.6938) | 49.97% (0.6938)
Eran   | 54.76% (0.6023) | 54.76% (0.6023)
Hila   | 55.72% (0.5849) | 55.72% (0.5849)

