1 EEG-Based Emotion Recognition Using Deep Learning Network with Principal Component Based Covariate Shift Adaptation
Suwicha Jirayucharoensak, Setha Pan-Ngum and Pasin Israsena
Presenter: Dror Haor

2 Sparse auto-encoders
A neural network trained to reconstruct its own input.
Seems easy, no? Well, yes, but…
The identity function seems a particularly trivial function to try to learn; but by placing constraints on the network, such as limiting the number of hidden units, we can discover interesting structure in the data.

3 Sparse auto-encoders
A neural network trained to reconstruct its own input.
Seems easy, no? Well, yes, but… constraining the network yields interesting results.
The identity function seems a particularly trivial function to try to learn; but by placing constraints on the network, such as limiting the number of hidden units, we can discover interesting structure in the data.

4 Sparse auto-encoders
Suppose we have a set of unlabeled training examples $x_1, x_2, x_3, \dots$
We want to train the network to output an estimation of the input: $\hat{x} = h_{W,b}(x) \approx x$
So far, neural networks have been described for supervised learning, where we have labeled training examples.
An autoencoder is a more general case of the restricted Boltzmann machine (RBM).
Autoencoders are used for feature extraction.
Autoencoders are unsupervised – an advantage! (Compare PCA vs. autoencoder.)
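As a concrete illustration of $\hat{x} = h_{W,b}(x) \approx x$, here is a minimal numpy sketch of a single-hidden-layer autoencoder forward pass; the layer sizes, initialization and fake input are illustrative assumptions, not values from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_in, n_hidden = 230, 100                 # e.g. 230 EEG features -> 100 hidden units (assumed)
W1 = rng.normal(0, 0.01, (n_hidden, n_in)); b1 = np.zeros(n_hidden)   # encoder weights
W2 = rng.normal(0, 0.01, (n_in, n_hidden)); b2 = np.zeros(n_in)       # decoder weights

def reconstruct(x):
    h = sigmoid(W1 @ x + b1)              # hidden representation (the learned features)
    x_hat = sigmoid(W2 @ h + b2)          # reconstruction h_{W,b}(x), which should approximate x
    return h, x_hat

x = rng.uniform(0.1, 0.9, n_in)           # a fake normalized feature vector
h, x_hat = reconstruct(x)
print(0.5 * np.sum((x_hat - x) ** 2))     # reconstruction error to be minimized by training
```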

5 Auto-encoders – examples
Web image search: by limiting the number of hidden units, the autoencoder learns a compact code that captures interesting structure in the images.

6 Auto-encoders – examples
Anomaly detection: train on normal ("true") images and flag outliers by their reconstruction error.
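A small sketch of this idea, assuming a trained `reconstruct` function like the one above: score each sample by its reconstruction error and flag samples whose error is far above what was seen on the normal training data. The mean + 3·SD threshold is an illustrative choice, not the paper's.

```python
import numpy as np

def reconstruction_errors(X, reconstruct):
    """X: (n_samples, n_features); returns one error score per sample."""
    return np.array([0.5 * np.sum((reconstruct(x)[1] - x) ** 2) for x in X])

def flag_outliers(test_errors, train_errors):
    # threshold fitted on data the autoencoder was trained on (assumed rule)
    threshold = train_errors.mean() + 3 * train_errors.std()
    return test_errors > threshold
```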

7 Auto-encoders – examples
Document visualization (200 words compressed to 2 features) – Hinton and Salakhutdinov, Science, 2006.

9 Auto-encoders – how do they work?
1. Train a sparse autoencoder on the raw inputs $x_k$ to find primary features $h_k^{(1)}$.
This is a greedy algorithm for training a stacked autoencoder.

10 Auto-encoders – how do they work?
2. Use the primary features as the "raw input" to another sparse autoencoder to learn secondary features (the next layer is smaller than the current layer).
2a. Repeat for N layers (if needed).

11 Auto-encoders – how do they work?
3. Feed the last layer's features into a softmax classifier.

12 Auto-encoders – how do they work?
Combine all layers into a single stacked network (see the sketch below).
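A hedged PyTorch sketch of steps 1–3 and the final combination: each layer is pretrained greedily to reconstruct its input, its codes feed the next layer, and a softmax classifier sits on the last features. Layer sizes, optimizer, epoch counts and the fake data are assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn

def pretrain_layer(X, n_in, n_hidden, epochs=50, lr=1e-2):
    """Train one autoencoder layer to reconstruct X and return its encoder half."""
    enc = nn.Sequential(nn.Linear(n_in, n_hidden), nn.Sigmoid())
    dec = nn.Sequential(nn.Linear(n_hidden, n_in), nn.Sigmoid())
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(dec(enc(X)), X)      # reconstruction error
        loss.backward()
        opt.step()
    return enc

X = torch.rand(1000, 230)                 # fake normalized EEG feature vectors
sizes = [230, 100, 50, 25]                # assumed layer sizes
encoders, H = [], X
for n_in, n_hidden in zip(sizes[:-1], sizes[1:]):
    enc = pretrain_layer(H, n_in, n_hidden)    # steps 1-2: greedy, one layer at a time
    H = enc(H).detach()                        # this layer's features feed the next one
    encoders.append(enc)

softmax_head = nn.Linear(sizes[-1], 3)    # step 3: softmax classifier on the last features
stacked = nn.Sequential(*encoders, softmax_head)   # combined network (softmax applied by the loss)
```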

13 Emotion Recognition – motivation

14 Emotion Recognition – BCI
Brain-Computer Interface (BCI): better communication with computers for disabled people.
Detect patients' emotional responses to specific inexpressive faces.

15 Emotion Recognition – motivation
Recognize the emotions of children affected by autism, to develop a system that works as a therapist.
Emotions were elicited by images and classified into three categories: pleasant, unpleasant and neutral.
Brain signals are a reliable information source because the process of emotion interpretation starts in the central nervous system; furthermore, an individual cannot control his brain signals to simulate a fake emotional state.

16 EEG – raw data
Time-space-frequency patterns; non-stationary.
Signals measured during a specific task.

17 Feature extraction
Preprocessing: power spectral density (PSD).
PSD divided into 5 frequency bands for 32 electrodes (160 features), plus PSD differences between 14 pairs of symmetrical electrodes (70 features): a total of 230 inputs to the network.
Normalization: baseline subtraction and re-scaling into the [0.1, 0.9] range.
This normalization is required since the DLN uses a sigmoid activation function in the output layer. Features below −2·SD and above +2·SD were truncated to 0.1 and 0.9, respectively.
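A numpy sketch of the normalization step, assuming per-feature baselines; the slide says out-of-range values are truncated to 0.1 and 0.9, which is approximated here by clipping at ±2 SD before re-scaling into [0.1, 0.9].

```python
import numpy as np

def normalize_features(psd, baseline):
    """psd: (n_samples, 230) PSD features; baseline: (230,) per-feature baseline values."""
    z = psd - baseline                               # baseline subtraction
    mu, sd = z.mean(axis=0), z.std(axis=0)
    z = np.clip(z, mu - 2 * sd, mu + 2 * sd)         # truncate values beyond +/-2 SD
    lo, hi = z.min(axis=0), z.max(axis=0)
    return 0.1 + 0.8 * (z - lo) / (hi - lo + 1e-12)  # re-scale into [0.1, 0.9] for the sigmoid output
```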

18 Data labeling
31 subjects from the DEAP dataset, 40 one-minute movies per subject.
After each movie – self-assessment of arousal/valence level.
Total: 31 subjects × 40 one-minute movies × 60 PSDs/min = 74,400 training examples.

19 Data labeling

20 Data labeling

21 Network structure
Unsupervised: sparse autoencoder with three stacked hidden layers.
Supervised: two softmax classifiers, fine-tuned with back-propagation.
Fine-tuning is back-propagation: each softmax stage plus the autoencoder layers are used as one network, and back-propagation is performed to fine-tune the softmax classifier. In the end, they return to the original autoencoder weights with the fine-tuned softmax weights (see the sketch below).
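A hedged PyTorch sketch of this fine-tuning stage, reusing pretrained encoders like those from the earlier sketch: the whole stack plus one softmax head is trained with back-propagation, and afterwards the original autoencoder weights are restored so that only the softmax keeps its fine-tuned weights. Class counts, layer sizes and hyperparameters are assumptions.

```python
import copy
import torch
import torch.nn as nn

def fine_tune(encoders, head, X, y, epochs=50, lr=1e-3):
    saved = [copy.deepcopy(e.state_dict()) for e in encoders]   # remember pretrained AE weights
    model = nn.Sequential(*encoders, head)                      # encoders + softmax as one network
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.cross_entropy(model(X), y)         # supervised fine-tuning
        loss.backward()
        opt.step()
    for e, sd in zip(encoders, saved):                          # return to the original AE weights,
        e.load_state_dict(sd)                                   # keeping the fine-tuned softmax
    return nn.Sequential(*encoders, head)

# two independent heads over the same pretrained stack, e.g. valence and arousal
valence_head = nn.Linear(25, 3)   # 25 = assumed last-layer size, 3 = assumed number of levels
arousal_head = nn.Linear(25, 3)
```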

22 Autoencoder cost function
$J_{sparse}(W,b) = \frac{1}{N}\sum_{i=1}^{N}\tfrac{1}{2}\big\|h_{W,b}(x^{(i)}) - x^{(i)}\big\|^2 + \frac{\lambda}{2}\sum_{l}\sum_{i}\sum_{j}\big(W_{ji}^{(l)}\big)^2 + \beta\sum_{j=1}^{s} KL(\rho \,\|\, \hat{\rho}_j)$
The second term is a regularization term (also called a weight decay term) that tends to decrease the magnitude of the weights and helps prevent overfitting. The weight decay parameter λ controls the relative importance of the two terms.

23 Autoencoder cost function
$KL(\rho \,\|\, \hat{\rho}_j) = \rho \log\frac{\rho}{\hat{\rho}_j} + (1-\rho)\log\frac{1-\rho}{1-\hat{\rho}_j}$
The Kullback-Leibler (KL) divergence is a standard function for measuring the difference between two Bernoulli distributions; if $\rho = \hat{\rho}_j$, KL = 0.
It acts as a sparsity penalty: $\rho$ is the desired sparsity parameter (selected to be 0.1 in this article) and $\hat{\rho}_j$ is the average firing probability (activation) of hidden unit j.

24 Autoencoder cost function
$\lambda$ is the weight decay parameter, used to limit the magnitude of the network weights; it was selected to be 3e-3 in this paper, and $\beta$ (the weight of the sparsity penalty) was selected to be 3.
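Putting the three terms together numerically – a numpy sketch of the cost; the averaging conventions and helper names are my own, while ρ = 0.1, λ = 3e-3 and β = 3 follow the slides.

```python
import numpy as np

def kl_divergence(rho, rho_hat):
    """KL between Bernoulli(rho) and Bernoulli(rho_hat), element-wise."""
    return rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))

def sparse_ae_cost(X, X_hat, H, weights, lam=3e-3, beta=3.0, rho=0.1):
    """X, X_hat: (n, d) inputs and reconstructions; H: (n, h) hidden activations;
    weights: list of the network's weight matrices."""
    recon = 0.5 * np.mean(np.sum((X_hat - X) ** 2, axis=1))      # reconstruction term
    decay = 0.5 * lam * sum(np.sum(W ** 2) for W in weights)     # weight decay term
    rho_hat = H.mean(axis=0)                  # average activation ("firing probability") per unit
    sparsity = beta * np.sum(kl_divergence(rho, rho_hat))        # KL sparsity penalty
    return recon + decay + sparsity
```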

25 PCA-based covariate shift adaptation
Start with n data trials of p features each.
Apply PCA (reducing p to m dimensions).
Average over a window of w trials.
Feed the result to the autoencoder.
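A sketch of this pipeline in Python; the slide lists PCA followed by an average over w trials, which is read here as smoothing each principal component with a w-trial moving average before it is fed to the autoencoder. The values of m and w are assumptions, and the exact covariate shift adaptation rule may differ in the paper.

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_covariate_shift_adaptation(X, m=50, w=10):
    """X: (n_trials, p_features) -> (n_trials, m) smoothed principal components."""
    Z = PCA(n_components=m).fit_transform(X)          # reduce p features to m components
    kernel = np.ones(w) / w
    smoothed = np.column_stack(                       # moving average over the trial axis,
        [np.convolve(Z[:, j], kernel, mode="same")    # one principal component at a time
         for j in range(Z.shape[1])]
    )
    return smoothed                                   # this is what goes to the autoencoder
```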

26 Results (4 DLN methods)

27 Results (4 DLN methods)
The PCA helps the neural network learn linear correlations between the inputs and is essentially another hidden layer.

28 Results (DLN vs. SVM)

29 Conclusions
Feature reduction
Unsupervised pre-training
The problem of inter-subject variation

30 Thank you!

