1
EEG-Based Emotion Recognition Using Deep Learning Network with Principal Component Based Covariate Shift Adaptation
Suwicha Jirayucharoensak, Setha Pan-Ngum and Pasin Israsena
Presenter: Dror Haor
2
Sparse auto-encoders: a neural network trained to reconstruct its own input
Seems easy, no? Well, yes, but… The identity function seems a particularly trivial function to learn; by placing constraints on the network, such as limiting the number of hidden units, we can discover interesting structure in the data.
3
Sparse auto-encoders: a neural network trained to reconstruct its own input
Constraining the network yields interesting results.
4
Sparse auto-encoders: suppose we have a set of unlabeled training examples $x^{(1)}, x^{(2)}, x^{(3)}, \ldots$ and we want to train the network to output an estimate of its input, $\hat{x} = h_{W,b}(x) \approx x$. So far we have described the application of neural networks to supervised learning, where labeled training examples are available. Autoencoders are a generalization of restricted Boltzmann machines (RBMs), are used for feature extraction, and are unsupervised, which is an advantage (compare PCA vs. autoencoder).
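As a hedged illustration of this idea (a minimal sketch in PyTorch; the layer sizes, optimizer settings and random data are assumptions, not the paper's implementation), an autoencoder is simply trained so that its output $h_{W,b}(x)$ approximates the input:

```python
import torch
import torch.nn as nn

# Toy autoencoder: a 230-dimensional input (the size of the paper's feature
# vector) is compressed to a 100-unit hidden layer and then reconstructed.
# All sizes and settings here are illustrative assumptions.
encoder = nn.Sequential(nn.Linear(230, 100), nn.Sigmoid())
decoder = nn.Sequential(nn.Linear(100, 230), nn.Sigmoid())

x = torch.rand(64, 230)   # a fake batch of feature vectors in [0, 1]
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

for step in range(1000):
    x_hat = decoder(encoder(x))        # h_{W,b}(x): reconstruction of the input
    loss = ((x_hat - x) ** 2).mean()   # squared reconstruction error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

features = encoder(x)                  # hidden activations = learned features
```

Because the hidden layer is smaller than the input, the network cannot simply learn the identity function and is forced to capture structure in the data.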
5
Auto-encoders – examples
Web image search
6
Auto-encoders – examples
Anomaly detection: train on normal ("true") images and flag outliers by their reconstruction error.
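A hedged sketch of this use case (PyTorch; the model, the placeholder data and the mean-plus-three-standard-deviations threshold are assumptions, not from the slide): train on normal data only, then flag inputs whose reconstruction error is unusually large.

```python
import torch
import torch.nn as nn

# Stand-ins for an autoencoder that has already been trained on "true"
# (normal) samples only, as in the sketch a few slides above.
encoder = nn.Sequential(nn.Linear(230, 100), nn.Sigmoid())
decoder = nn.Sequential(nn.Linear(100, 230), nn.Sigmoid())

def reconstruction_error(x):
    """Per-sample mean squared reconstruction error."""
    with torch.no_grad():
        return ((decoder(encoder(x)) - x) ** 2).mean(dim=1)

x_train = torch.rand(500, 230)   # placeholder for the normal training data
x_new = torch.rand(10, 230)      # placeholder for new, possibly anomalous data

# Threshold taken from the error distribution on normal data (an assumed rule);
# anything above it is flagged as an outlier.
train_err = reconstruction_error(x_train)
threshold = train_err.mean() + 3 * train_err.std()
is_outlier = reconstruction_error(x_new) > threshold
```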
7
Auto-encoders – examples
Document visualization (2000 words to 2 features): compress each document's word-count vector down to 2 features and plot the documents in two dimensions. Hinton and Salakhutdinov, Science, 2006.
9
Auto-encoders – how do they work?
1. Train a sparse autoencoder on the raw inputs $x_k$ to learn primary features $h_k^{(1)}$. (This is a greedy algorithm for training a stacked autoencoder.)
10
Auto-encoders – how do they work?
2. Use the primary features as the "raw input" to another sparse autoencoder to learn secondary features (each new layer is smaller than the previous one). 2a. Repeat for N layers if needed.
11
Auto-encoders – how do they work?
3. Feed the last layer's features into a softmax classifier.
12
Auto-encoders – how do they work?
Finally, combine all layers into one stacked network (see the sketch below).
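Putting the steps above together, here is a compact sketch of the greedy layer-wise procedure (PyTorch; the layer sizes, optimizer settings and placeholder data are assumptions, not the paper's implementation): each autoencoder is trained on the previous layer's activations, and the last layer's features feed a softmax classifier.

```python
import torch
import torch.nn as nn

def train_autoencoder(data, n_hidden, steps=1000, lr=1e-3):
    """Train one sigmoid autoencoder on `data` and return its encoder layer."""
    n_in = data.shape[1]
    enc = nn.Sequential(nn.Linear(n_in, n_hidden), nn.Sigmoid())
    dec = nn.Sequential(nn.Linear(n_hidden, n_in), nn.Sigmoid())
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=lr)
    for _ in range(steps):
        loss = ((dec(enc(data)) - data) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return enc

x = torch.rand(2400, 230)            # placeholder feature vectors
y = torch.randint(0, 3, (2400,))     # placeholder 3-level labels

# Steps 1-2a: greedily train each layer on the previous layer's features.
layers, inp = [], x
for n_hidden in (100, 100, 50):      # layer sizes are assumptions
    enc = train_autoencoder(inp, n_hidden)
    layers.append(enc)
    inp = enc(inp).detach()          # becomes the "raw input" of the next layer

# Step 3: train a softmax classifier on the last layer's features.
clf = nn.Linear(50, 3)               # logits; the softmax is inside the loss
opt = torch.optim.Adam(clf.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for _ in range(1000):
    loss = loss_fn(clf(inp), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Combine all layers: the stacked encoders plus the classifier form one network.
model = nn.Sequential(*layers, clf)
predictions = model(x).argmax(dim=1)
```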
13
Emotion Recognition: motivation
14
Emotion Recognition: BCI
Brain Computer Interface (BCI): better communication with computers for disabled people; detect patients' emotional responses to specific inexpressive faces.
15
Emotion Recognition: motivation
Recognize the emotions of children affected by autism, for example to develop a system that works as a therapist. Emotions were elicited by images and classified into three categories: pleasant, unpleasant and neutral. Brain signals are a reliable information source because the process of emotion interpretation starts in the central nervous system; furthermore, an individual cannot control his or her brain signals to simulate a fake emotional state.
16
EEG raw data: time-space-frequency patterns; non-stationary
Signals are measured during a specific task.
17
Feature extraction: preprocessing and power spectral density (PSD)
PSD in 5 frequency bands for each of the 32 electrodes (160 features), plus the PSD difference between 14 pairs of symmetrical electrodes in each band (70 features), for a total of 230 inputs to the network. Normalization: baseline subtraction, then re-scaling into the [0.1, 0.9] range. This normalization is required since the DLN uses a sigmoid activation function in the output layer. Features below −2·SD or above +2·SD were truncated to 0.1 and 0.9, respectively.
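A small NumPy sketch of this normalization (the array shapes, the baseline handling and the exact mapping of the ±2·SD truncation onto 0.1/0.9 are assumptions consistent with the slide, not the authors' code):

```python
import numpy as np

def normalize_features(X, baseline):
    """Baseline-subtract, truncate at +/-2 SD, and rescale each of the 230
    features into [0.1, 0.9] so they suit the sigmoid output layer.

    X        : (n_samples, 230) matrix of PSD features
    baseline : (230,) PSD features of the pre-stimulus baseline (assumed shape)
    """
    Z = X - baseline                            # baseline subtraction
    mu, sd = Z.mean(axis=0), Z.std(axis=0)
    lo, hi = mu - 2 * sd, mu + 2 * sd
    Z = np.clip(Z, lo, hi)                      # outliers truncated at +/-2 SD
    return 0.1 + 0.8 * (Z - lo) / (hi - lo + 1e-12)   # linear rescale to [0.1, 0.9]
```

With this mapping, values clipped at −2·SD and +2·SD land exactly on 0.1 and 0.9, matching the truncation described above.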
18
Data labeling: 31 subjects (DEAP dataset), 40 one-minute movies per subject
After each movie: self-assessment of the arousal/valence level. Total: 31 subjects × 40 one-minute movies × 60 PSDs per minute = 74,400 training examples.
19
Data labeling
20
Data labeling
21
Network structure: sparse autoencoder with three stacked hidden layers
Unsupervised part: sparse autoencoder with three stacked hidden layers. Supervised part: two softmax classifiers plus fine-tuning (back-propagation). In the fine-tuning stage, each softmax stage together with the autoencoder layers is treated as one network and trained with back-propagation to fine-tune the softmax classifier; in the end, the original autoencoder weights are kept together with the fine-tuned softmax weights.
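A rough sketch of this fine-tuning scheme as described above (PyTorch; the layer sizes, placeholder data and label names are assumptions, not the authors' implementation): each softmax head is back-propagated through a copy of the pretrained encoders, and only the fine-tuned softmax weights are kept afterwards.

```python
import copy
import torch
import torch.nn as nn

# Placeholders for the pretrained encoder layers and the labeled data
# (sizes and names are assumptions).
layers = [nn.Sequential(nn.Linear(230, 100), nn.Sigmoid()),
          nn.Sequential(nn.Linear(100, 100), nn.Sigmoid()),
          nn.Sequential(nn.Linear(100, 50), nn.Sigmoid())]
x = torch.rand(2400, 230)
y_valence = torch.randint(0, 3, (2400,))
y_arousal = torch.randint(0, 3, (2400,))

def fine_tune_head(pretrained_layers, features, labels, steps=500, lr=1e-3):
    """Back-propagate through a copy of the pretrained encoders plus a softmax
    head, then keep only the head: the original autoencoder weights stay
    untouched, while the softmax weights are the fine-tuned ones."""
    net = nn.Sequential(copy.deepcopy(nn.Sequential(*pretrained_layers)),
                        nn.Linear(50, 3))      # 3-level softmax head (logits)
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        loss = loss_fn(net(features), labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return net[1]                              # the fine-tuned softmax weights

valence_head = fine_tune_head(layers, x, y_valence)   # one classifier per output
arousal_head = fine_tune_head(layers, x, y_arousal)
```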
22
Autoencoder cost function
$J_{\text{sparse}}(W,b) = \frac{1}{m}\sum_{i=1}^{m}\frac{1}{2}\left\lVert h_{W,b}(x^{(i)}) - x^{(i)}\right\rVert^{2} + \frac{\lambda}{2}\sum_{l}\sum_{i}\sum_{j}\left(W_{ji}^{(l)}\right)^{2} + \beta\sum_{j} KL(\rho \,\Vert\, \hat{\rho}_{j})$
The first term is the average reconstruction error. The second term is a regularization term (also called a weight decay term) that tends to decrease the magnitude of the weights and helps prevent overfitting; the weight decay parameter λ controls the relative importance of the reconstruction and weight-decay terms. The third term is the sparsity penalty, weighted by β, detailed on the next slides.
23
Autoencoder cost function
$KL(\rho \,\Vert\, \hat{\rho}_{j}) = \rho\log\frac{\rho}{\hat{\rho}_{j}} + (1-\rho)\log\frac{1-\rho}{1-\hat{\rho}_{j}}$
The Kullback-Leibler (KL) divergence is the standard function for measuring the difference between two Bernoulli distributions and acts here as a sparsity penalty: if $\hat{\rho}_{j} = \rho$, then KL = 0. Here ρ is the desired sparsity parameter (selected to be 0.1 in this paper) and $\hat{\rho}_{j}$ is the average firing probability (activation) of hidden unit j.
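As an illustration (NumPy; the example activations and the small clipping constant are assumptions), the sparsity penalty can be computed per hidden unit and summed:

```python
import numpy as np

def kl_sparsity_penalty(rho, rho_hat):
    """Sum over hidden units of KL(rho || rho_hat_j); zero when every unit's
    average activation rho_hat_j equals the target sparsity rho."""
    rho_hat = np.clip(rho_hat, 1e-8, 1 - 1e-8)   # avoid log(0)
    kl = rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))
    return kl.sum()

# Example: target sparsity 0.1 (the value chosen in the paper) against the
# average activation of four hypothetical hidden units.
rho_hat = np.array([0.12, 0.08, 0.35, 0.10])
penalty = kl_sparsity_penalty(0.1, rho_hat)      # added to the cost, scaled by beta
```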
24
Autoencoder cost function
λ is called the weight decay parameter and is used to limit the magnitude of the network weights; it was selected to be 3e-3 in this paper, and the sparsity weight β was selected to be 3.
25
PCA-based covariate shift adaptation
n data trials, each with p features; PCA reduces the p features to m principal components; a moving average over w trials is applied (covariate shift adaptation); the result is fed to the auto-encoder.
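A minimal NumPy sketch of this block (the SVD-based PCA, the trailing moving-average window and the handling of the first w−1 trials are assumptions, not the paper's exact procedure):

```python
import numpy as np

def pca_covariate_shift_adaptation(X, m, w):
    """Project p features onto m principal components, then smooth each
    component with a moving average over the last w trials.

    X : (n_trials, p) feature matrix, one row per trial
    m : number of principal components to keep
    w : number of consecutive trials to average over
    """
    Xc = X - X.mean(axis=0)                            # center the features
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # principal directions
    Z = Xc @ Vt[:m].T                                  # (n_trials, m) component scores
    Z_smooth = np.empty_like(Z)
    for t in range(len(Z)):                            # trailing window, shorter at the start
        Z_smooth[t] = Z[max(0, t - w + 1):t + 1].mean(axis=0)
    return Z_smooth                                    # fed to the stacked autoencoder
```

The averaging over w trials smooths the non-stationary drift in the features before they reach the network.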
26
Results (4 DLN methods)
27
Results (4 DLN methods). The PCA stage helps the neural network capture linear correlations between the inputs and is essentially another (linear) hidden layer.
28
Results (DLN vs. SVM)
29
Conclusions: feature reduction; unsupervised pre-training
Remaining problem: inter-subject variation
30
Thank you!