LECTURE 35: Introduction to EEG Processing

LECTURE 35: Introduction to EEG Processing

Objectives:
  Why Big Data?
  Fundamentals of an EEG Signal
  Machine Learning Paradigms
  Convolutional Neural Networks
  Long Short-term Memory Networks
  Performance Analysis

Resources:
  NEDC: Data Wrangling
  VS: Deep Learning

The Visual System: Inspiration for CNNs The visual system contains a complex arrangement of cells. Each cell is responsible for only a sub-region of the visual field, called its receptive field, and the arrangement of these sub-regions is such that the entire visual field is covered. Convolutional Neural Networks (CNNs) were proposed to emulate the animal visual cortex, which exploits the spatially local correlations present in natural images. Before reaching the primary visual cortex, fibers in the optic nerve make a synapse in the lateral geniculate nucleus (LGN). Ganglion cells from the fovea (in the eye) project to the parvocellular (P) layers, which capture the fine details necessary to determine what an object is. Ganglion cells from the peripheral retina project to the magnocellular (M) layers, which help determine where an object is.

CNNs Connectivity To exploit these spatially local correlations, the neurons in a layer receive inputs only from a subset of units in the previous layer (a spatially contiguous region of the visual field). The units (neurons) are unresponsive to changes outside of their receptive fields, while the receptive fields of higher layers become progressively more global. In the figure, the units have a receptive field of 3 and are therefore connected only to 3 contiguous units in the previous layer.
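As a minimal sketch (not from the lecture materials), the following toy example shows what a receptive field of 3 means in one dimension: each output unit is computed from only 3 contiguous inputs of the previous layer, with the weights made up for illustration.

```python
import numpy as np

# Each output unit has a receptive field of 3, so it sees only 3 contiguous
# inputs from the previous layer.
x = np.arange(8, dtype=float)      # activations of the previous layer
w = np.array([0.2, 0.5, 0.3])      # shared weights, receptive field = 3

h = np.array([w @ x[i:i + 3] for i in range(len(x) - 2)])
print(h.shape)                     # (6,) -- h[i] depends only on x[i:i+3]
```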

CNNs Convolutional Layer The convolutional layer is comprised of several “filters” that search for different patterns in the entire input. A feature map can be generated with the information from the learned filters as follows: $h^k_{ij} = \tanh\left((W^k \star x)_{ij} + b^k\right)$, where $h^k$ represents the $k^{th}$ feature map in a hidden layer. Note that the weight and bias parameters are shared within the same filter. Gradient descent is commonly used for the training of CNNs, but the gradient of a shared weight is given by the sum of the gradients of the parameters being shared. Parameter sharing allows the same pattern to be searched for across the entire visual field. Each hidden layer is formed of several feature maps.
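The sketch below spells out the feature-map equation $h^k_{ij} = \tanh((W^k \star x)_{ij} + b^k)$ for a single filter; the input, filter values, and sizes are illustrative, not taken from the lecture.

```python
import numpy as np

def feature_map(x, W, b):
    """Valid cross-correlation of input x with a shared filter W, plus a
    shared bias b, followed by a tanh nonlinearity (one feature map)."""
    fh, fw = W.shape
    oh, ow = x.shape[0] - fh + 1, x.shape[1] - fw + 1
    h = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            h[i, j] = np.tanh(np.sum(W * x[i:i + fh, j:j + fw]) + b)
    return h

x = np.random.randn(8, 8)          # single-channel input "image"
W = np.random.randn(3, 3) * 0.1    # one shared 3x3 filter
print(feature_map(x, W, b=0.0).shape)   # (6, 6)
```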

CNNs Convolutional Layer The figure contains two different CNN layers. Layer $m-1$ contains four feature maps, while layer $m$ contains two ($h^0$ and $h^1$). The blue and red squares in layer $m$ are computed from pixels of layer $m-1$ that fall within their 2x2 receptive fields (squares in layer $m-1$). $W^{kl}_{ij}$ then denotes the weight connecting each pixel of the $k^{th}$ feature map at layer $m$ with the pixel at coordinates $(i,j)$ of the $l^{th}$ feature map at layer $m-1$. Note that the step size with which the filter is slid over the input, called the stride, is usually set to 1 or 2 for image recognition. To control the spatial size of the output, zero-padding around the borders is commonly applied.
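The slide does not state the output-size relation explicitly; the standard formula for a convolution with input size N, filter size F, zero-padding P, and stride S is (N - F + 2P)/S + 1, illustrated below with assumed values.

```python
# Spatial output size of a convolutional layer (standard relation, not from
# the slide): input size N, filter size F, zero-padding P, stride S.
def conv_output_size(N, F, P, S):
    return (N - F + 2 * P) // S + 1

print(conv_output_size(N=32, F=3, P=1, S=1))  # 32 -- "same" padding, stride 1
print(conv_output_size(N=32, F=3, P=1, S=2))  # 16 -- stride 2 halves the size
```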

CNNs ReLU Layer An activation layer is added after one or more convolutional layers. Typically, for image recognition tasks, a Rectified Linear Unit (ReLU) activation function is used. This function is given by $f(x) = \max(0, x)$. Using this activation function increases the non-linear properties of the decision function without affecting the receptive fields of the convolutional layer.
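A one-line sketch of $f(x) = \max(0, x)$ applied elementwise to a (made-up) feature map:

```python
import numpy as np

# Elementwise ReLU: negative activations are clipped to 0, positives pass through.
relu = lambda x: np.maximum(0.0, x)

h = np.array([[-1.5, 0.2], [3.0, -0.7]])
print(relu(h))   # [[0.  0.2] [3.  0. ]]
```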

CNNs Pooling Layer Another typical layer in a CNN is the pooling layer. Pooling layers reduce the resolution through a local maximum, which also reduces the amount of computation and the number of parameters in the network. The pooling layer needs two hyperparameters: $F$, the spatial extent (size), and $S$, the stride (step size). Common values in the literature are $F = 2{\times}2$ and $S = 2$. The most common pooling operation is max pooling, which partitions the input into a set of non-overlapping sub-regions and, for each sub-region, outputs the maximum value. Pooling helps make the representation approximately invariant to small translations in the input.
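A minimal sketch of the common case $F = 2{\times}2$, $S = 2$ (the input values are made up):

```python
import numpy as np

# 2x2 max pooling with stride 2: each output is the maximum of a
# non-overlapping 2x2 block of the input.
def max_pool_2x2(x):
    H, W = x.shape
    return x[:H - H % 2, :W - W % 2].reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool_2x2(x))
# [[ 5.  7.]
#  [13. 15.]]
```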

CNNs Fully Connected Layer If classification is being performed, a fully connected layer is added. This layer corresponds to a traditional Multilayer Perceptron (MLP). As the name indicates, the neurons in the fully connected layer have full connections to all activations in the previous layer. Adding this layer allows classification of the input described by the feature maps extracted by the previous layers. This layer works in the same way as an MLP, and commonly used activation functions include the sigmoid and tanh functions.
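A rough sketch of the fully connected step, with made-up shapes: the feature maps are flattened into a vector, multiplied by a full weight matrix, and passed through a sigmoid.

```python
import numpy as np

feature_maps = np.random.randn(8, 6, 6)      # 8 feature maps of size 6x6
x = feature_maps.reshape(-1)                 # flatten to a 288-dim vector

W = np.random.randn(10, x.size) * 0.01       # full connections to 10 outputs
b = np.zeros(10)
scores = 1.0 / (1.0 + np.exp(-(W @ x + b)))  # sigmoid activation per class
print(scores.shape)                          # (10,)
```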

CNN: All Together Summarizing the layers shown so far, a CNN is depicted as the sequence: Convolutional Layer, ReLU Layer, Pooling Layer, Fully Connected Layer.
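As one concrete (assumed) way to put the sequence together, the PyTorch sketch below stacks the four layer types from the slide; the channel counts, kernel sizes, and input size are illustrative, not taken from the lecture.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1),  # convolutional layer
    nn.ReLU(),                                                           # ReLU layer
    nn.MaxPool2d(kernel_size=2, stride=2),                               # pooling layer
    nn.Flatten(),
    nn.Linear(8 * 16 * 16, 10),                                          # fully connected layer
)

x = torch.randn(1, 1, 32, 32)   # one 32x32 single-channel input
print(model(x).shape)           # torch.Size([1, 10])
```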

DRNN Training Use stochastic gradient descent for optimization: $\theta_{j+1} = \theta_j - \eta_0 \left(1 - \frac{j}{T}\right) \frac{\nabla_{\theta_j}}{\left\|\nabla_{\theta_j}\right\|}$ where $\theta_j$ is the set of all trainable parameters after j updates; $\nabla_{\theta_j}$ is the gradient of a cost function with respect to this parameter set, as computed on a randomly sampled part of the training set; T is the number of batches; and $\eta_0$ is the learning rate, set to an initial value that decreases linearly with each subsequent parameter update. Incremental layer-wise method: train the full network with BPTT and linearly reduce the learning rate to zero before a new layer is added. After adding a new layer, the previous output weights are discarded and new output weights are initialized, connecting from the new top layer. For DRNN-AO, we test the influence of each layer by setting it to zero, ensuring that the model is efficiently trained. http://www.iro.umontreal.ca/~bengioy/papers/ftml.pdf
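The following is a hedged sketch of the update rule above (normalized gradient with a linearly decaying learning rate); the quadratic loss stands in for the DRNN cost and is only for illustration.

```python
import numpy as np

def grad(theta):
    return 2.0 * theta            # gradient of ||theta||^2 on a "mini-batch"

theta = np.array([3.0, -2.0])
eta0, T = 0.5, 100                # initial learning rate, number of batches
for j in range(T):
    g = grad(theta)
    # theta_{j+1} = theta_j - eta0 * (1 - j/T) * grad / ||grad||
    theta -= eta0 * (1.0 - j / T) * g / (np.linalg.norm(g) + 1e-12)

print(theta)                      # moves close to the minimum at 0
```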

Gradient Descent Algorithms For Optimization http://www.iro.umontreal.ca/~bengioy/papers/ftml.pdf

Gradient Descent Algorithms For Optimization
SGD: Stochastic Gradient Descent - http://ufldl.stanford.edu/tutorial/supervised/OptimizationStochasticGradientDescent/
RMSprop: Root Mean Square Propagation - http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
Adagrad: Adaptive Gradient Algorithm - http://jmlr.org/papers/v12/duchi11a.html
Adadelta: an extension of Adagrad that seeks to reduce its aggressive, monotonically decreasing learning rate - https://arxiv.org/abs/1212.5701
Adam: Adaptive Moment Estimation - keeps separate learning rates for each weight as well as an exponentially decaying average of previous gradients - http://arxiv.org/abs/1412.6980v8
Adamax: a variant of Adam that scales the update inversely proportionally to the ℓ∞ (infinity) norm of the past gradients - https://arxiv.org/abs/1412.6980v8
Nadam: Nesterov-accelerated Adaptive Moment Estimation - http://cs229.stanford.edu/proj2015/054_report.pdf
http://www.iro.umontreal.ca/~bengioy/papers/ftml.pdf
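The lecture only lists these algorithms; as one concrete (assumed) way to experiment with them, each is available as an optimizer class in PyTorch's torch.optim, as sketched below with a toy model and default-style hyperparameters.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)   # toy model whose parameters the optimizers update
optimizers = {
    "SGD":      torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9),
    "RMSprop":  torch.optim.RMSprop(model.parameters(), lr=0.001),
    "Adagrad":  torch.optim.Adagrad(model.parameters(), lr=0.01),
    "Adadelta": torch.optim.Adadelta(model.parameters()),
    "Adam":     torch.optim.Adam(model.parameters(), lr=0.001),
    "Adamax":   torch.optim.Adamax(model.parameters(), lr=0.002),
    "NAdam":    torch.optim.NAdam(model.parameters(), lr=0.002),
}
```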

Why different optimization algorithms? The main difference is actually how they treat the learning rate. Stochastic Gradient Descent: theta (the weights) is updated according to the gradient of the loss with respect to theta, $\theta \leftarrow \theta - \alpha \nabla_\theta L$, where alpha is the learning rate. If alpha is very small, convergence will be very slow. On the other hand, a large alpha will lead to divergence. http://www.iro.umontreal.ca/~bengioy/papers/ftml.pdf
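A toy illustration of this trade-off (the quadratic loss and the specific alpha values are assumptions for demonstration only):

```python
# Plain SGD update, theta <- theta - alpha * grad, on the 1-D loss L = theta^2.
def sgd_run(alpha, steps=20, theta=1.0):
    for _ in range(steps):
        theta -= alpha * 2.0 * theta     # gradient of theta^2 is 2*theta
    return theta

print(sgd_run(alpha=0.01))   # small alpha: slow convergence toward 0
print(sgd_run(alpha=0.45))   # moderate alpha: converges quickly
print(sgd_run(alpha=1.10))   # too-large alpha: the iterates diverge
```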

Why different optimization algorithms? Due to the diversity of the training examples, the gradient of the loss (L) changes quickly after each iteration. We take small steps, but they are quite zig-zag (even though we slowly approach a loss minimum). To overcome this, we introduce momentum, which essentially carries knowledge from previous steps about where we should be heading. This introduces a new hyperparameter, the momentum coefficient (typically around 0.9). http://www.iro.umontreal.ca/~bengioy/papers/ftml.pdf
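The slide's momentum formula was in a figure that did not survive transcription; the standard form is a velocity term that accumulates past gradients, sketched below with an illustrative quadratic loss.

```python
import numpy as np

# Standard momentum update: v <- gamma * v + alpha * grad, theta <- theta - v,
# where gamma is the new momentum hyperparameter.
def momentum_step(theta, v, grad, alpha=0.01, gamma=0.9):
    v = gamma * v + alpha * grad
    theta = theta - v
    return theta, v

theta, v = np.array([2.0, -1.0]), np.zeros(2)
for _ in range(50):
    g = 2.0 * theta                     # gradient of a simple quadratic loss
    theta, v = momentum_step(theta, v, g)
print(theta)                            # smoothed steps pull theta toward 0
```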

Adaptive Moment Estimation (Adam) Adam is another method that computes adaptive learning rates for each parameter. Like Adadelta and RMSprop, it stores an exponentially decaying average of past squared gradients $v_t$; in addition, Adam keeps an exponentially decaying average of past gradients $m_t$, similar to momentum: http://www.iro.umontreal.ca/~bengioy/papers/ftml.pdf
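The update equations that followed on the original slide were lost in transcription; the standard Adam updates from the paper linked above (http://arxiv.org/abs/1412.6980v8) are:

```latex
% g_t is the mini-batch gradient at step t; beta_1, beta_2 are decay rates;
% eta is the step size and epsilon a small constant for numerical stability.
\begin{align*}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t} \\
\theta_{t+1} &= \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\, \hat{m}_t
\end{align*}
```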

Generative Adversarial Networks http://www.iro.umontreal.ca/~bengioy/papers/ftml.pdf

A Taxonomy of Architectures http://www.iro.umontreal.ca/~bengioy/papers/ftml.pdf

Summary Convolutional Neural Networks (CNNs) attempt to exploit local correlations (e.g., spatial and temporal context). Many deep learning systems that process physical signals use CNNs for the first layer. Optimization algorithms play an important role in allowing deep learning systems to converge. Stochastic gradient descent, and specifically Adam, are popular approaches that are widely used. Alternate training methodologies are emerging that combine generative and discriminative training (what used to be called analysis-by-synthesis methods). Generative Adversarial Networks (GANs) are one such powerful approach. http://www.iro.umontreal.ca/~bengioy/papers/ftml.pdf