Speaker Classification through Deep Learning

Presentation transcript:

Speaker Classification through Deep Learning
Jacob Morris, Alex Douglass, Luke Woodbury

Overview
Goals:
- Learn more about deep learning!
- Create a neural network that will classify voice recordings based on gender, age, native language, etc.
Potential applications:
- Research
- Security

Software Dependencies
- Python 2.7
- Keras 1.2.2
- Theano
- Matplotlib

Hardware
- GeForce Titan X (Pascal), 12 GB memory
https://6lli539m39y3hpkelqsm3c2fg-wpengine.netdna-ssl.com/wp-content/uploads/2016/08/Natoli-CPUvGPU-peak-DP-600x.png

Speech Accent Archive
- 2300+ different speakers, all recorded speaking the same paragraph
- WAV files
- Categorizations: age, gender, English residence, native language, country, learning style, etc.
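
As an illustration of the data-loading step (a minimal sketch, not the project's actual code), the recordings can be read with scipy; the file name here is a hypothetical example, and the scaling assumes 16-bit WAV data.

    from scipy.io import wavfile

    # Hypothetical file name: the archive names recordings by native
    # language and speaker index, e.g. "english1".
    rate, audio = wavfile.read('english1.wav')

    # Collapse stereo to mono and scale 16-bit amplitudes to [-1, 1].
    if audio.ndim > 1:
        audio = audio.mean(axis=1)
    audio = audio.astype('float32') / 32768.0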

The Essence of Deep Learning
http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/img/spiral.1-2.2-2-2-2-2-2.gif

Artificial Neural Networks (ANN)
http://cs231n.github.io/assets/nn1/neural_net2.jpeg

Recurrent Networks
- A recurrent layer "remembers" data from earlier in the input sequence
http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/RNN-unrolled.png

LSTM (Long Short-Term Memory)
http://deephash.com/2016/10/16/lstm-journey-tensorflow/
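
A minimal sketch of an LSTM sequence classifier in the Keras 1.x style used by this project; the layer size, input shape (400 timesteps of 441 amplitudes), and two-class output are illustrative assumptions, not the project's actual topology.

    from keras.models import Sequential
    from keras.layers import LSTM, Dense

    timesteps, features, n_classes = 400, 441, 2  # assumed shapes

    model = Sequential()
    # The recurrent layer "remembers" earlier samples in the sequence.
    model.add(LSTM(64, input_shape=(timesteps, features)))
    # One output neuron per value of the target category (e.g. gender).
    model.add(Dense(n_classes, activation='softmax'))
    model.compile(loss='categorical_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])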

Problem Type
- Sequence classification: assign classification label(s) to input sequences
- Supervised learning: each training sample includes the correct output for that sample
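
In this supervised setting, each input sequence is paired with a one-hot target vector; a small sketch with a hypothetical label set:

    import numpy as np

    labels = ['female', 'male']   # hypothetical values for one category
    y = 'male'                    # the correct output for this sample

    target = np.zeros(len(labels), dtype='float32')
    target[labels.index(y)] = 1.0  # -> [0., 1.]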

Variations of Model Topologies
- Inputs: a sequence of amplitudes, or the discrete Fourier transform of the segment
- Hidden layers: variable
- Outputs: any subset of the data categories
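
Both input variants can be derived from the same audio segment; a sketch of the two options using numpy (the function name is ours, for illustration):

    import numpy as np

    def to_input(segment, use_dft=False):
        # segment: 1-D array of amplitudes for one sample of audio.
        if use_dft:
            # Magnitudes of the real-input discrete Fourier transform.
            return np.abs(np.fft.rfft(segment)).astype('float32')
        return segment.astype('float32')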

Training Challenges
- Training is a process of exploration
- Many parameters to tune
- Results are vague and must be interpreted
- Days required to train a new model

Terminology
- Sample: the base unit of training data; 1/100 of a second of audio
- Batch: a group of samples; 4 seconds of consecutive samples
- Epoch: the number of batches required to train on the entire training data set; in our case, 2310 batches
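
To make the sizes concrete, assuming a 44.1 kHz sampling rate (the slides do not state one), the terms work out as follows:

    rate = 44100               # assumed audio sampling rate (Hz)
    sample_len = rate // 100   # sample: 1/100 s of audio -> 441 amplitudes
    batch_samples = 4 * 100    # batch: 4 s of consecutive samples -> 400 samples
    epoch_batches = 2310       # epoch: batches to cover the training set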

Loss
- A measure of how close an output signal is to its expected value
- Categorical cross entropy: emphasizes the correct answer
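
A numpy sketch of categorical cross entropy; because the target is one-hot, only the correct class's activation contributes to the sum, which is what "emphasizes the correct answer" means here:

    import numpy as np

    def categorical_cross_entropy(output, target):
        # output: softmax activations; target: one-hot expected values.
        eps = 1e-7  # guard against log(0)
        return -np.sum(target * np.log(output + eps))

    categorical_cross_entropy(np.array([0.1, 0.9]),
                              np.array([0.0, 1.0]))  # ~0.105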

Learning Rate
- Determines how large an adjustment to make to the weights for a given loss value
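
Schematically, the learning rate scales every weight update; a minimal gradient-descent step in numpy:

    import numpy as np

    def sgd_step(weights, grad, learning_rate):
        # A larger learning rate makes a bigger adjustment for the
        # same gradient of the loss.
        return weights - learning_rate * grad

    w = np.array([0.5, -0.3])
    g = np.array([0.2, 0.1])      # gradient of the loss w.r.t. the weights
    sgd_step(w, g, 0.01)          # -> [0.498, -0.301]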

Accuracy
- An output is considered correct if the expected output neuron's activation value is the greatest among all neurons for that category
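
This accuracy rule is an argmax comparison; a short numpy sketch:

    import numpy as np

    def is_correct(output, target):
        # Correct if the expected neuron's activation is the greatest
        # among all output neurons for that category.
        return np.argmax(output) == np.argmax(target)

    is_correct(np.array([0.2, 0.8]), np.array([0.0, 1.0]))  # True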

Initial Attempts
Features:
- Short sample lengths
- WAV inputs only
- Trained on a training set of only 2 speakers

Results

False Hope
Features:
- Short sample lengths
- Trained on a training set of only 2 speakers
Changes:
- Both input types (amplitudes and DFT)

Results

Hope
Features:
- Short sample lengths
- Both input types
- Trained on a training set of only 2 speakers
Changes:
- Train on a single batch per speaker per pass through the training set
- Reduced learning rate

Results

Confirmation
Features:
- Short sample lengths
- Both input types
- Train on a single batch per speaker per pass through the training set
Changes:
- Trained on the full training set of 2300+ speakers

Results

Refinement
Features:
- Short sample lengths
- Both input types
- Train on a single batch per speaker per pass through the training set
Changes:
- True validation
- Decaying learning rate (see the sketch below)
- Epoch duration increased
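
A decaying learning rate can be attached to Keras training via the LearningRateScheduler callback; the schedule constants below are assumptions for illustration, not the project's actual values:

    from keras.callbacks import LearningRateScheduler

    def decay(epoch):
        # Assumed schedule: start at 0.001 and halve every 10 epochs.
        return 0.001 * (0.5 ** (epoch // 10))

    lr_schedule = LearningRateScheduler(decay)
    # model.fit(x_train, y_train, validation_data=(x_val, y_val),
    #           nb_epoch=100, callbacks=[lr_schedule])  # Keras 1.x names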

Results

UI

Conclusion

Works Cited
Weinberger, Steven. (2015). Speech Accent Archive. George Mason University. Retrieved from http://accent.gmu.edu