CNT 6805 Network Science and Applications, Lecture 2: Unsupervised Deep Learning. Dr. Dapeng Oliver Wu, Department of Electrical and Computer Engineering, University of Florida, Fall 2016.

Outline
- Introduction to machine learning
- Chronological development of ideas
- Problems with neural networks
- What exactly is different in deep learning
- Energy-based models and training
- Applications to real-world problems
- Scalability issues

Learning to Learn
Example tasks: face recognition, object recognition, weather prediction.
Machine learning can be broadly classified into three major categories of problems: clustering, regression, and classification.

Chronological Development
- G0: blind guess.
- G1: linear methods (PCA, LDA, LR). But what if the relationship is nonlinear?
- G2: neural networks, which use multiple nonlinear elements to approximate the mapping.
- G3: kernel machines, which perform linear computations in an infinite-dimensional space without 'actually' learning the mapping.

Neural Network
A nonlinear transformation (e.g., the sigmoid) is applied at the summing nodes of the hidden and output layers. The outputs are estimates of the posterior probabilities.

Back-Propagation If S is a logistic function, then S’(x) = S(x)(1 – S(x))
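To make the rule concrete, here is a minimal NumPy sketch (our own illustration, not code from the lecture) of a full training loop for a single-hidden-layer sigmoid network; the backward pass uses S'(x) = S(x)(1 - S(x)) expressed through each layer's output. The toy data and layer sizes are arbitrary assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 4 samples, 3 input features, 1 binary target (all made up).
X = np.array([[0., 1., 0.], [1., 0., 1.], [1., 1., 0.], [0., 0., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.1, size=(3, 5)), np.zeros(5)   # input -> hidden
W2, b2 = rng.normal(scale=0.1, size=(5, 1)), np.zeros(1)   # hidden -> output
lr = 0.5

for step in range(1000):
    # Forward pass: sigmoid at the hidden and output summing nodes.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: S'(z) = S(z) * (1 - S(z)), written via the layer outputs.
    delta_out = (out - y) * out * (1 - out)          # output-layer error signal
    delta_h = (delta_out @ W2.T) * h * (1 - h)       # back-propagated to the hidden layer

    # Gradient-descent updates.
    W2 -= lr * h.T @ delta_out;  b2 -= lr * delta_out.sum(axis=0)
    W1 -= lr * X.T @ delta_h;    b1 -= lr * delta_h.sum(axis=0)
```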

Challenges with Multi-Layer NNs
- Training gets stuck in local minima or on plateaus because of the random initialization.
- Vanishing gradient: the error signal becomes smaller and smaller in the lower layers.
- Excellent training performance but poor test performance: a classic case of overfitting.

Why a Vanishing Gradient?
Both the sigmoid and its derivative are less than 1 (the derivative is at most 1/4). The gradient used to train each layer is a product of such factors accumulated through the chain rule, so it shrinks with depth and the lower layers remain undertrained.
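A quick numerical illustration (not from the slides): multiplying one sigmoid-derivative factor per layer, as the chain rule does, drives the gradient toward zero; weight factors are ignored here for simplicity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1 - s)                   # bounded above by 0.25

rng = np.random.default_rng(1)
grad = 1.0
for layer in range(10):
    z = rng.normal()                     # toy pre-activation at this layer
    grad *= sigmoid_prime(z)             # chain rule picks up one factor <= 0.25 per layer
    print(f"after layer {layer + 1}: gradient magnitude ~ {grad:.2e}")
# The magnitude collapses roughly like 0.25**depth, so the lowest layers barely learn.
```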

Deep Learning – Early Phase
- Unsupervised pre-training followed by traditional supervised backpropagation: let the data speak for itself and try to derive the inherent features of the input.
- Why it works: pre-training creates a data-dependent prior and hence better regularization; it yields a set of weights W that is a better starting point; and the lower layers are already well optimized, so vanishing gradients no longer hurt much.

Restricted Boltzmann Machine-I
- x: visible (input) units; h: hidden (latent) units.
- Energy: E(x, h) = -b'x - c'h - h'Wx.
- Joint probability: P(x, h) = exp(-E(x, h)) / Z, where Z is the partition function given by Z = sum over (x, h) of exp(-E(x, h)).
- The target is to maximize P(x) (or its log-likelihood).
- P(h|x) and P(x|h) are factorizable over units. For the binary case {0, 1}, the sigmoid function arises again: P(h_j = 1 | x) = sigm(c_j + W_j x) and P(x_i = 1 | h) = sigm(b_i + W'_i h).
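The following NumPy sketch is our own illustration of these standard binary-RBM quantities (parameter shapes and toy sizes are assumptions): the energy of a configuration (x, h) and the factorized conditionals.

```python
import numpy as np

def sigm(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 4
W = rng.normal(scale=0.1, size=(n_hidden, n_visible))   # weights
b = np.zeros(n_visible)                                  # visible biases
c = np.zeros(n_hidden)                                   # hidden biases

def energy(x, h):
    # E(x, h) = -b'x - c'h - h'Wx
    return -b @ x - c @ h - h @ W @ x

def p_h_given_x(x):
    # P(h_j = 1 | x) = sigm(c_j + W_j x); factorizes over hidden units.
    return sigm(c + W @ x)

def p_x_given_h(h):
    # P(x_i = 1 | h) = sigm(b_i + (W'h)_i); factorizes over visible units.
    return sigm(b + W.T @ h)

x = rng.integers(0, 2, size=n_visible).astype(float)
h = rng.integers(0, 2, size=n_hidden).astype(float)
print("E(x, h) =", energy(x, h))
print("P(h=1|x) =", p_h_given_x(x))
```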

Restricted Boltzmann Machine-II
- The gradient of the log-likelihood looks like
  d log P(x) / d theta = -dF(x)/d theta + sum over x~ of P(x~) dF(x~)/d theta,
  where F(x) = -log sum over h of exp(-E(x, h)) is called the free energy.
- Averaging over the training set Q, the right-hand side looks like -E_Q[dF(x)/d theta] + E_P[dF(x~)/d theta].
- So, gradient = -(training term) + (model term) = -(observable) + (reconstruction).

Sampling Approximations
The model expectation is generally intractable, but approximations lead to a simpler sampling problem: sample h from P(h|x) and a reconstruction x~ from P(x|h), and use these samples in place of the exact model term. The update equation now uses reconstruction statistics instead of the intractable expectation.

Cont'd
Taking the partial derivatives of the free energy with respect to the parameter vector gives, for the weights, an update rule of the form Delta W = epsilon ( P(h = 1 | x) x' - P(h = 1 | x~) x~' ), with the sampled reconstruction x~ standing in for the model expectation. Usually a single Gibbs step (CD-1) is sufficient.
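Putting the last three slides together, here is a hedged sketch of one contrastive-divergence (CD-1) update (our own illustration; it assumes the sigm helper and the parameter shapes W of size n_hidden x n_visible, b, c from the RBM sketch above).

```python
def cd1_step(x_batch, W, b, c, lr=0.1, rng=np.random.default_rng(0)):
    """One contrastive-divergence (CD-1) update on a batch of binary visible vectors."""
    # Positive phase: hidden probabilities driven by the data.
    ph_data = sigm(c + x_batch @ W.T)                  # shape (batch, n_hidden)
    h_sample = (rng.random(ph_data.shape) < ph_data).astype(float)

    # Negative phase: one Gibbs step -> reconstruction of the visibles.
    px_recon = sigm(b + h_sample @ W)                  # shape (batch, n_visible)
    x_recon = (rng.random(px_recon.shape) < px_recon).astype(float)
    ph_recon = sigm(c + x_recon @ W.T)

    n = x_batch.shape[0]
    # Gradient = data statistics - reconstruction statistics.
    W += lr * (ph_data.T @ x_batch - ph_recon.T @ x_recon) / n
    b += lr * (x_batch - x_recon).mean(axis=0)
    c += lr * (ph_data - ph_recon).mean(axis=0)
    return W, b, c
```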

Deep Belief Network
A DBN keeps conditional distributions for layers 0, 1, ..., l-1 and a joint distribution (an RBM) for the top two layers. Each layer is initialized as an RBM, and training is done layer by layer, greedily, in sequential order. The pre-trained stack is then fed to a conventional neural network for supervised fine-tuning.
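A sketch of the greedy layer-wise procedure (again our own illustration, reusing the hypothetical cd1_step and sigm helpers above): train the first RBM on the data, then use its hidden activations as the 'data' for the next RBM, and so on.

```python
def pretrain_dbn(X, layer_sizes, epochs=10, rng=np.random.default_rng(0)):
    """Greedy layer-wise pre-training: each layer is trained as an RBM on the
    activations produced by the layers below it."""
    data = X
    stack = []
    for n_hidden in layer_sizes:
        n_visible = data.shape[1]
        W = rng.normal(scale=0.01, size=(n_hidden, n_visible))
        b, c = np.zeros(n_visible), np.zeros(n_hidden)
        for _ in range(epochs):
            W, b, c = cd1_step(data, W, b, c, rng=rng)
        stack.append((W, b, c))
        # Propagate the data upward: hidden probabilities become the next layer's input.
        data = sigm(c + data @ W.T)
    return stack   # these weights can then initialize a conventional feed-forward network
```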

Deep Autoencoders
An autoencoder encodes its input and then reconstructs it at the output. Autoencoders can be stacked to form deep architectures in the same way as DBNs, and the training procedure is similarly layer by layer, except that the final step may be supervised or unsupervised (just like backprop). Variants: denoising AE, contractive AE, regularized AE.
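A minimal sketch of a single denoising-autoencoder layer (our own NumPy illustration; the tied weights and squared-error loss are assumptions, not details from the slides): corrupt the input, encode, decode, and descend the reconstruction-error gradient.

```python
import numpy as np

def sigm(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_denoising_ae(X, n_hidden, epochs=50, lr=0.1, noise=0.3,
                       rng=np.random.default_rng(0)):
    """One denoising-autoencoder layer with tied weights (decoder = encoder transposed)."""
    n_visible = X.shape[1]
    W = rng.normal(scale=0.01, size=(n_hidden, n_visible))
    b, c = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        # Corrupt the input by randomly zeroing a fraction `noise` of its entries.
        X_noisy = X * (rng.random(X.shape) > noise)
        h = sigm(c + X_noisy @ W.T)     # encode
        X_hat = sigm(b + h @ W)         # decode with the transposed (tied) weights
        # Squared-error gradients, back-propagated through both sigmoids.
        d_out = (X_hat - X) * X_hat * (1 - X_hat)
        d_hid = (d_out @ W.T) * h * (1 - h)
        W -= lr * (h.T @ d_out + d_hid.T @ X_noisy) / len(X)
        b -= lr * d_out.mean(axis=0)
        c -= lr * d_hid.mean(axis=0)
    return W, b, c
```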

Dimensionality Reduction
(Figure: results compared across the original data, a DBN, logistic PCA, and plain PCA.)

What Does It Learn?
Higher layers take a bird's-eye view and capture invariant features. (Figure: features learned by a denoising AE vs. stacked RBMs (DBN).)

Computational Considerations
Part 1, the unsupervised pre-training, is dominated by matrix multiplications. The weight update is sequential (just like adaptive systems/filters) but can be parallelized over nodes/dimensions. Tricks: use minibatches, i.e., update the weights only once per batch of examples by averaging their gradients.
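A small sketch of the minibatch trick (our own illustration; grad_fn is a hypothetical callback that returns the gradient averaged over a batch): the weights are updated once per batch rather than once per example.

```python
import numpy as np

def minibatch_sgd(X, y, W, grad_fn, lr=0.1, batch_size=64,
                  rng=np.random.default_rng(0)):
    """Generic minibatch loop: one averaged gradient and one weight update per batch."""
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        # grad_fn (hypothetical) returns the gradient averaged over the batch,
        # so the update happens once per batch instead of once per sample.
        W -= lr * grad_fn(X[batch], y[batch], W)
    return W
```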

Unsupervised Pre-Training: Rarely Used Now
With a large number of labeled training examples, the lower layers will eventually change anyway. Recent architectures instead prefer direct weight initialization, e.g., Glorot et al. (2011): a zero-mean Gaussian distribution with variance scaled to the layer fan-in and fan-out. Srivastava, Hinton, et al. (2014) propose the dropout method to mitigate overfitting. He et al. (2015) derive an optimal weight initialization for ReLU/PReLU activations. (ReLU: rectified linear unit; PReLU: parametric rectified linear unit.)
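A hedged sketch of those initialization schemes (our own illustration; the variance formulas are the commonly quoted ones from the cited papers, not values given on the slide).

```python
import numpy as np

def glorot_init(n_in, n_out, rng=np.random.default_rng(0)):
    # Glorot/Xavier: zero-mean Gaussian with variance 2 / (n_in + n_out).
    return rng.normal(scale=np.sqrt(2.0 / (n_in + n_out)), size=(n_in, n_out))

def he_init(n_in, n_out, rng=np.random.default_rng(0)):
    # He et al. (2015), for ReLU/PReLU layers: variance 2 / n_in.
    return rng.normal(scale=np.sqrt(2.0 / n_in), size=(n_in, n_out))

W1 = glorot_init(784, 512)   # e.g., a sigmoid/tanh layer
W2 = he_init(512, 256)       # e.g., a ReLU layer
```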

Dropout Neural Net Model (1)
Srivastava N., Hinton G. E., Krizhevsky A., Sutskever I., Salakhutdinov R. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 2014, 15(1):1929-1958.

Dropout Neural Net Model (2)

Dropout Neural Net Model (3)
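To make the figures above concrete, here is a minimal sketch of what dropout does at training time (our own illustration, using the 'inverted dropout' rescaling convention as an assumption): each unit is kept with probability keep_prob and zeroed otherwise, so every pass trains a different thinned network.

```python
import numpy as np

def dropout_forward(h, keep_prob=0.5, train=True, rng=np.random.default_rng(0)):
    """Apply dropout to a layer's activations h.
    At training time, units are dropped at random and the survivors are rescaled
    (inverted dropout) so that no change is needed at test time."""
    if not train:
        return h                          # test time: use the full network
    mask = (rng.random(h.shape) < keep_prob).astype(h.dtype)
    return h * mask / keep_prob

h = np.random.rand(4, 8)                  # activations of a hidden layer (toy values)
h_train = dropout_forward(h, keep_prob=0.5, train=True)
h_test = dropout_forward(h, train=False)
```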

Example: Handwritten Digit Recognition
See the Keras MLP example: https://github.com/fchollet/keras/blob/master/examples/mnist_mlp.py
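In the spirit of the linked example, here is a hedged sketch of a small dropout MLP on MNIST written with the Keras API (our own reconstruction, not a verbatim copy of the script; layer sizes and hyperparameters are assumptions).

```python
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.utils import to_categorical

# Load and flatten MNIST: 60k training and 10k test images of 28x28 pixels.
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 784).astype('float32') / 255.0
x_test = x_test.reshape(-1, 784).astype('float32') / 255.0
y_train, y_test = to_categorical(y_train, 10), to_categorical(y_test, 10)

# A small fully connected network with dropout after each hidden layer.
model = Sequential([
    Dense(512, activation='relu', input_shape=(784,)),
    Dropout(0.2),
    Dense(512, activation='relu'),
    Dropout(0.2),
    Dense(10, activation='softmax'),
])
model.compile(loss='categorical_crossentropy', optimizer='rmsprop',
              metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=128, epochs=20,
          validation_data=(x_test, y_test))
print(model.evaluate(x_test, y_test))
```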

Recurrent Neural Networks (RNN)
Deep learning for time-series data: an RNN uses memory (a hidden state carried across time steps) to process input sequences. The output y can be any supervised target, or even the future samples of x, as in prediction.
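A minimal sketch of the recurrence (our own illustration of a vanilla RNN cell; weight names and toy sizes are assumptions): the hidden state h is the memory carried from step to step.

```python
import numpy as np

def rnn_forward(x_seq, W_xh, W_hh, W_hy, b_h, b_y):
    """Vanilla RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h), y_t = W_hy h_t + b_y."""
    h = np.zeros(W_hh.shape[0])
    outputs = []
    for x_t in x_seq:                               # iterate over time steps
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)    # memory carried across steps
        outputs.append(W_hy @ h + b_y)
    return np.array(outputs), h

# Toy dimensions: 3 input features, 5 hidden units, 1 output per step.
rng = np.random.default_rng(0)
W_xh, W_hh, W_hy = rng.normal(size=(5, 3)), rng.normal(size=(5, 5)), rng.normal(size=(1, 5))
y_seq, h_last = rnn_forward(rng.normal(size=(10, 3)), W_xh, W_hh, W_hy,
                            np.zeros(5), np.zeros(1))
```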

Vanishing/Exploding Gradients, Both Temporally and Spatially
Multi-layered RNNs have their lower layers undertrained, and information from previous inputs is not properly carried through the chained gradients. As a result, plain RNNs also cannot handle long-range dependencies.

Why, again? We cannot relate inputs from the distant past to the target output.

Long Short-Term Memory
Error signals trapped within a memory cell are carried unchanged, so they do not vanish; the gates have to learn which errors to trap and which ones to forget.
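A hedged sketch of one LSTM step (our own illustration of the standard gated update; the stacked parameter layout is an assumption): the cell state c is updated additively, which is what lets error signals survive instead of vanishing, and the forget gate decides what to discard.

```python
import numpy as np

def sigm(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W (4n x d), U (4n x n) and b (4n,) stack the parameters of the
    input (i), forget (f), output (o) gates and the candidate content (g)."""
    n = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b             # all four pre-activations at once
    i = sigm(z[0 * n:1 * n])                 # input gate: what to write into the cell
    f = sigm(z[1 * n:2 * n])                 # forget gate: which stored errors to drop
    o = sigm(z[2 * n:3 * n])                 # output gate: what to expose
    g = np.tanh(z[3 * n:4 * n])              # candidate cell content
    c = f * c_prev + i * g                   # additive update keeps the error signal alive
    h = o * np.tanh(c)                       # hidden state passed to the next step
    return h, c
```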

Conclusion
Deep learning is a practical breakthrough: companies are happy, but theoreticians remain unconvinced. Deep learning architectures have won many competitions in the recent past, and there are plans to use these concepts to build an artificial brain for big data.