Voice Recognition by a Realistic Model of Biological Neural Networks
by Efrat Barak
Supervised by Karina Odinaev and Igal Raichelgauz

Structure
- Project Objective
- The Model
- The Classification Process
- Results & Analysis
- Conclusion

Project Objective
Configure a neural-network-based system for voice recognition.

The Model

The Main Principle
The readout function recognizes the basin to which the network has converged and classifies the input according to the indicator of that basin.

Correspondence with the Theory of Attractor Neural Networks
- The system converges to a basin.
- The basins are periodic attractors.

Correspondence with the Liquid State Machine (LSM) Theory
- The neural network may be treated as a liquid.
- The readout function receives only the current state of the liquid and transforms it into an output signal.
- The system can perform several tasks simultaneously.

Neural Network Structure
- 22 input neurons
- 135 spiking neurons in a 3x3x15 formation
- Leaky integrate-and-fire (LIF) model for neuron behavior
- 20% of the neurons are inhibitory and 80% are excitatory
- Dynamic synapses
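The structure above can be sketched in a few lines of Python. This is only an illustration under assumptions not stated on the slide: the function names `build_network` and `lif_step`, the random assignment of inhibitory neurons, and every LIF parameter value (`dt`, `tau`, thresholds) are hypothetical. The input neurons and the dynamic synapses are not modeled here.

```python
import random


def build_network(shape=(3, 3, 15), inhibitory_fraction=0.2, seed=0):
    """Return a list of (position, kind) pairs for a grid of neurons.

    Which neurons are inhibitory is chosen at random here (an assumption).
    """
    rng = random.Random(seed)
    positions = [(x, y, z)
                 for x in range(shape[0])
                 for y in range(shape[1])
                 for z in range(shape[2])]
    n_inhibitory = round(inhibitory_fraction * len(positions))
    inhibitory = set(rng.sample(range(len(positions)), n_inhibitory))
    return [(pos, "inhibitory" if i in inhibitory else "excitatory")
            for i, pos in enumerate(positions)]


def lif_step(v, input_current, dt=1e-3, tau=0.03,
             v_rest=0.0, v_thresh=1.0, v_reset=0.0):
    """One Euler step of a leaky integrate-and-fire neuron.

    Returns (new_membrane_potential, spiked). All parameter values are
    illustrative placeholders, not the values used in the project.
    """
    v = v + dt * (-(v - v_rest) + input_current) / tau
    if v >= v_thresh:
        return v_reset, True
    return v, False


network = build_network()
print(len(network), sum(kind == "inhibitory" for _, kind in network))
```

With the default arguments the grid holds 3 x 3 x 15 = 135 neurons, of which 27 (20%) are marked inhibitory.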

Creating the Stimulus
30 seconds of recorded speech are encoded into 1 second of spike trains by one of the following methods:
- Time encoding: a straightforward conversion

Creating the Stimulus
- Mel Frequency Cepstral Coefficients (MFCC) encoding: in this method the frequency bands are positioned logarithmically, on the mel scale.
A periodic spike train is added to the one-second voice segment.
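To make the mel-scale spacing concrete, here is a small sketch that places band centres at equal steps on the mel scale. It uses one widely used Hz-to-mel formula, m = 2595 log10(1 + f/700); whether the project used this exact variant is not stated. The function names, the 0 to 8000 Hz range, and the choice of 22 bands are assumptions for illustration (22 only mirrors the number of input neurons; the source does not say how bands map to neurons).

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels (common 2595*log10 variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(f_low, f_high, n_bands):
    """Centre frequencies (Hz) of n_bands bands equally spaced in mels."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_bands)]


# Hypothetical range and band count, chosen only for the example.
centers = mel_band_centers(0.0, 8000.0, 22)
print([round(c) for c in centers])
```

The printed centres are packed closely at low frequencies and spread out at high frequencies, which is what positioning the bands on the mel scale means in practice.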

Performing a Simulation
- A new network is created.
- A stimulus of one speech segment is fed to the network, followed by a periodic driving force.
- This is repeated for every combination of segment and frequency.
- The basins are categorized by their activity vector.
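The last step, categorizing basins by activity vector, might look like the sketch below. It assumes that each run ends with a discrete activity vector and that two runs belong to the same basin exactly when their vectors are equal; the source does not define the comparison, and the name `group_by_basin` and the toy vectors are hypothetical.

```python
from collections import defaultdict


def group_by_basin(runs):
    """Group runs by their final activity vector.

    runs: iterable of (segment_id, activity_vector) pairs.
    Returns {activity_vector_as_tuple: [segment_id, ...]}.
    Exact equality of vectors is an assumption of this sketch.
    """
    basins = defaultdict(list)
    for segment_id, activity in runs:
        basins[tuple(activity)].append(segment_id)
    return dict(basins)


# Toy data: three runs, two of which end in the same activity pattern.
toy_runs = [("seg1", [1, 0, 2]), ("seg2", [0, 3, 1]), ("seg3", [1, 0, 2])]
print(group_by_basin(toy_runs))
```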

The Classification Process

The Indicators Map
For each basin b, three counts are recorded:
- The number of segments of the wanted voice that converged to the basin b.
- The number of segments of the unwanted voice that converged to the basin b.
- The total number of initials that converged to the basin b.
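The three per-basin counts can be gathered as in the sketch below. It assumes every initial is either a wanted or an unwanted segment, so the total is the sum of the two counts; the function name, the label strings, and the basin identifiers are hypothetical. The indicator formula itself is not reproduced in this transcript, so the sketch stops at the counts.

```python
from collections import Counter


def basin_counts(convergences):
    """Count wanted, unwanted and total initials per basin.

    convergences: iterable of (basin_id, label) pairs, where label is
    "wanted" or "unwanted".
    """
    wanted, unwanted = Counter(), Counter()
    for basin, label in convergences:
        if label == "wanted":
            wanted[basin] += 1
        elif label == "unwanted":
            unwanted[basin] += 1
        else:
            raise ValueError(f"unknown label: {label!r}")
    return {
        basin: {
            "wanted": wanted[basin],
            "unwanted": unwanted[basin],
            "total": wanted[basin] + unwanted[basin],
        }
        for basin in set(wanted) | set(unwanted)
    }


# Toy data: basin identifiers "A" and "B" are made up.
toy = [("A", "wanted"), ("A", "wanted"), ("A", "unwanted"), ("B", "unwanted")]
print(basin_counts(toy))
```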

The Indicators Map The indicator of basin b:

The Indicators Map Examples:

The Indicators Map

Indicators’ Average:

Tuning
Step 1. Select frequencies

Tuning
Before Step 2: why do we need a threshold?

Tuning
Step 2. Determine the threshold
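The role of the threshold can be illustrated with a short sketch. It assumes that an input is classified as wanted when the indicator of its basin is strictly greater than the threshold; this direction matches the reported results, where lowering the threshold raises both hits and false alarms, but the exact rule is not given in the transcript. The function names and all indicator values are made up for the example.

```python
def classify(indicator, threshold):
    """Label an input from the indicator of the basin it converged to."""
    return "wanted" if indicator > threshold else "unwanted"


def hit_and_false_alarm_rates(samples, threshold):
    """samples: list of (indicator, true_label) pairs.

    Returns (hit_rate, false_alarm_rate) for the given threshold.
    """
    wanted = [ind for ind, label in samples if label == "wanted"]
    unwanted = [ind for ind, label in samples if label == "unwanted"]
    hits = sum(classify(ind, threshold) == "wanted" for ind in wanted)
    false_alarms = sum(classify(ind, threshold) == "wanted" for ind in unwanted)
    return hits / len(wanted), false_alarms / len(unwanted)


# Made-up indicator values, for illustration only.
toy_samples = [
    (0.5, "wanted"), (0.1, "wanted"), (-0.1, "wanted"), (-0.3, "wanted"),
    (0.2, "unwanted"), (-0.05, "unwanted"), (-0.4, "unwanted"), (-0.6, "unwanted"),
]

for th in (0.3, 0.0, -0.2):
    print(th, hit_and_false_alarm_rates(toy_samples, th))
```

On this toy data, moving the threshold from 0.3 down to -0.2 raises the hit rate from 0.25 to 0.75 and the false-alarm rate from 0.0 to 0.5, which is the trade-off that tuning has to balance.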

Results – Amplitude Encoded Input
Input examples: wanted voice, unwanted voice

Results – Amplitude Encoded Input
Results of a verification test

Results – Amplitude Encoded Input
Results of a Classification Test

Input      Classified as   Our Classification   True Classification
Wanted     Wanted          71%                  100%
Wanted     Unwanted        29%                  0%
Unwanted   Wanted          55.9%                0%
Unwanted   Unwanted        44.1%                100%

Results – Amplitude Encoded Input
Results of Classification by Two Different Systems

Input      Classified as   System 1   System 2   True Classification
Wanted     Wanted          71%        94%        100%
Wanted     Unwanted        29%        6%         0%
Unwanted   Wanted          55.9%      61.23%     0%
Unwanted   Unwanted        44.1%      38.77%     100%

Results – Amplitude Encoded Input
Cross Classification

Results – Amplitude Encoded Input
Results of cross classification for systems 1 and 2: 50.2% answered, 49.8% unanswered

Input      Classified as   System 1   System 2   Cross Classification
Wanted     Wanted          71%        94%        97.1%
Wanted     Unwanted        29%        6%         2.9%
Unwanted   Wanted          55.9%      61.23%     66.5%
Unwanted   Unwanted        44.1%      38.77%     33.5%
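One plausible reading of cross classification, given the split into answered and unanswered inputs, is that the combined system answers only when both systems agree and abstains otherwise. The transcript does not define the rule, so the sketch below is an assumption, and its function names and toy labels are hypothetical.

```python
def cross_classify(label_1, label_2):
    """Return the shared label if both systems agree, else None (unanswered)."""
    return label_1 if label_1 == label_2 else None


def answered_fraction(pairs):
    """Fraction of inputs on which the two systems agree.

    pairs: list of (label_from_system_1, label_from_system_2).
    """
    answers = [cross_classify(a, b) for a, b in pairs]
    return sum(answer is not None for answer in answers) / len(answers)


# Toy labels, for illustration only.
toy_pairs = [
    ("wanted", "wanted"),
    ("wanted", "unwanted"),
    ("unwanted", "unwanted"),
    ("unwanted", "wanted"),
]
print(answered_fraction(toy_pairs))
```

Under this reading, abstaining on disagreements trades coverage for accuracy on the inputs that are answered.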

Results – MFCC Encoded Input
Input examples: wanted voice, unwanted voice

Results – MFCC Encoded Input
Results of a classification test. Two sets of new data were used.
Test I segments: 100 wanted, 400 unwanted
Test II segments: 30 wanted, 30 unwanted

True Classification   Classified as            Test I   Test II
Wanted                Wanted (Hit)             87%      86.8%
Wanted                Unwanted (Miss-Hit)      13%      13.2%
Unwanted              Wanted (False Alarm)     55.3%    45%
Unwanted              Unwanted (Hit)           44.7%    55%

Results – MFCC Encoded Input
All columns use f = 18 Hz; th is the threshold.

Data set 3 segments: 100 wanted, 400 unwanted

Input      Classification           th=0.3   th=0     th=-0.12   th=-0.2
Wanted     Wanted (Hit)             58%      87%      96%        100%
Wanted     Unwanted (Miss-Hit)      42%      13%      4%         0%
Unwanted   Wanted (False Alarm)     32.2%    55.3%    77.5%      93.75%
Unwanted   Unwanted (Hit)           67.8%    44.7%    22.5%      6.25%

Data set 4 segments: 30 wanted, 30 unwanted

Input      Classification           th=0.3   th=0     th=-0.12   th=-0.2
Wanted     Wanted (Hit)             47.3%    86.8%    97%        100%
Wanted     Unwanted (Miss-Hit)      52.7%    13.2%    2.6%       0%
Unwanted   Wanted (False Alarm)     17.5%    45%      82.5%      92.5%
Unwanted   Unwanted (Hit)           82.5%    55%      17%        7%

Basin Creation Pattern
Panels: (a) 324 initials, (b) 100 initials, (c) 60 initials

Conclusion
- A system for voice recognition, based on neural computation, was designed.
- The system succeeded in recognizing the wanted voice when the input was encoded by its amplitude.

Conclusion
- The MFCC method yielded very different inputs, so the system's ability to recognize such input was only partially demonstrated.
- The system's stability was demonstrated.

Suggestions for Future Projects
- Prepare the system for various types of inputs.
- Perform automatic tuning by using statistical tools.
- Prove that the system can perform several tasks simultaneously.

THE END
