6.5.4 Back-Propagation Computation in Fully-Connected MLP.

6.5.4 Back-Propagation Computation in Fully-Connected MLP

Algorithm 6.3 Forward propagation through a typical deep neural network and the computation of the cost function.
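As a concrete illustration of what Algorithm 6.3 describes, here is a minimal sketch of the forward pass in JAX. The ReLU hidden activations, the squared-error loss, and the L2 weight-decay term are assumptions made for the sketch, not details given on this slide.

```python
# Sketch of Algorithm 6.3 (forward propagation and cost).
# Assumptions not stated on the slide: ReLU hidden activations, a
# squared-error loss L, and an L2 weight-decay regularizer Omega.
import jax.numpy as jnp


def forward(params, x, y, lam=1e-4):
    """params is a list of (W, b) pairs, one per layer."""
    h = x
    activations = [h]        # store h^(0), ..., h^(l) for the backward pass
    pre_activations = []     # store a^(1), ..., a^(l)
    for k, (W, b) in enumerate(params):
        a = b + W @ h        # a^(k) = b^(k) + W^(k) h^(k-1)
        # ReLU on hidden layers, linear output layer
        h = jnp.maximum(a, 0.0) if k < len(params) - 1 else a
        pre_activations.append(a)
        activations.append(h)
    y_hat = h
    loss = 0.5 * jnp.sum((y_hat - y) ** 2)               # L(y_hat, y), squared error here
    reg = lam * sum(jnp.sum(W ** 2) for W, _ in params)  # lambda * Omega(theta), L2 penalty
    J = loss + reg
    return J, (activations, pre_activations, y_hat)
```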

6.5.4 Back-Propagation Computation in Fully-Connected MLP
Algorithm 6.4 Backward computation for the deep neural network of Algorithm 6.3, which uses, in addition to the input x, a target y.
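A matching sketch of Algorithm 6.4's backward pass, written against the `forward` sketch above (and reusing its assumptions about the loss and activations):

```python
def backward(params, x, y, lam=1e-4):
    """Manual layer-by-layer gradients matching the forward sketch above."""
    J, (activations, pre_activations, y_hat) = forward(params, x, y, lam)
    grads = []
    g = y_hat - y                           # g <- dJ/da^(l) for the squared-error loss, linear output
    for k in reversed(range(len(params))):
        W, b = params[k]
        a = pre_activations[k]
        if k < len(params) - 1:             # hidden layers: multiply by f'(a^(k)) (ReLU derivative)
            g = g * (a > 0)
        grad_b = g                          # dJ/db^(k)
        grad_W = jnp.outer(g, activations[k]) + 2 * lam * W   # dJ/dW^(k) = g h^(k-1)^T + weight decay
        grads.append((grad_W, grad_b))
        g = W.T @ g                         # propagate the gradient back to h^(k-1)
    grads.reverse()
    return J, grads
```

In a framework like JAX the same gradients would normally be obtained with `jax.grad(lambda p: forward(p, x, y)[0])(params)`; the manual version above just makes the layer-by-layer recursion of Algorithm 6.4 explicit.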

6.5.5 Symbol-to-Symbol Derivatives
Symbols: variables that do not have specific values.
Symbolic representations: algebraic and graph-based representations.
Symbol-to-number differentiation vs. symbol-to-symbol differentiation.
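The distinction can be made concrete with an automatic-differentiation library: symbol-to-number differentiation produces a numeric gradient at one specific input, while symbol-to-symbol differentiation produces a new function (a derivative graph) that can be evaluated anywhere and differentiated again. A small sketch in JAX; the function `f` is just an illustrative choice.

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sin(x) * x ** 2

# Symbol-to-symbol: jax.grad builds a new function (a derivative "graph")
# that can be evaluated anywhere and differentiated again.
df = jax.grad(f)      # df is itself a differentiable function
d2f = jax.grad(df)    # higher-order derivative obtained by differentiating the derivative

# Symbol-to-number: evaluating the derivative at a specific input yields a number.
print(df(1.5), d2f(1.5))
```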

6.5.6 General Back-Propagation

Dynamic programming: each node in the graph has a corresponding slot in a table that stores the gradient for that node. By filling in these table entries in order, back-propagation avoids recomputing many common subexpressions.
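A toy illustration of this dynamic-programming view, with a dictionary as the gradient table: each node of an invented three-operation graph gets one slot, the slots are filled in reverse topological order, and the shared node `z` is handled once and then reused rather than recomputed.

```python
# Toy computational graph: z = x * y, u = z + x, J = u * z.
# The node "z" is a shared subexpression; its gradient slot is filled once
# and reused, which is the dynamic-programming idea behind general back-prop.

def forward_graph(x, y):
    values = {"x": x, "y": y}
    values["z"] = values["x"] * values["y"]
    values["u"] = values["z"] + values["x"]
    values["J"] = values["u"] * values["z"]
    return values

def backprop_graph(values):
    grad = {name: 0.0 for name in values}    # one table slot per node
    grad["J"] = 1.0
    # Fill the table in reverse topological order: J, u, z, then the inputs.
    grad["u"] += grad["J"] * values["z"]      # dJ/du
    grad["z"] += grad["J"] * values["u"]      # dJ/dz, direct path through J = u * z
    grad["z"] += grad["u"] * 1.0              # dJ/dz via u = z + x, reusing grad["u"]
    grad["x"] += grad["u"] * 1.0              # dJ/dx via u
    grad["x"] += grad["z"] * values["y"]      # dJ/dx via z, reusing the single grad["z"] entry
    grad["y"] += grad["z"] * values["x"]      # dJ/dy via z
    return grad

grads = backprop_graph(forward_graph(2.0, 3.0))   # {"x": 48.0, "y": 28.0, ...}
```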

6.5.7 Example: Back-Propagation for MLP Training
Single hidden layer; since the cost is a scalar, the gradient with respect to each weight matrix can be handled by treating the matrix as if flattened into a vector of parameters.

6.5.7 Example: Back-Propagation for MLP Training
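As a hedged sketch of what the single-hidden-layer example might look like in code (the softmax cross-entropy loss, ReLU hidden units, absence of biases, and weight decay are assumptions made to keep the sketch short, not details taken from the slide):

```python
import jax
import jax.numpy as jnp

def mlp_loss(params, X, Y, lam=1e-4):
    """Single-hidden-layer MLP: H = relu(X W1), logits = H W2.
    Softmax cross-entropy plus L2 weight decay (assumed for this sketch)."""
    W1, W2 = params
    H = jnp.maximum(X @ W1, 0.0)
    logits = H @ W2
    log_probs = logits - jax.scipy.special.logsumexp(logits, axis=1, keepdims=True)
    cross_entropy = -jnp.mean(jnp.sum(Y * log_probs, axis=1))
    return cross_entropy + lam * (jnp.sum(W1 ** 2) + jnp.sum(W2 ** 2))

# Back-propagation through the whole expression in one call;
# grads has the same (W1, W2) structure as params.
grad_fn = jax.grad(mlp_loss)

key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
X = jax.random.normal(k1, (32, 10))
Y = jax.nn.one_hot(jnp.zeros(32, dtype=jnp.int32), 3)
params = (0.1 * jax.random.normal(k2, (10, 16)),
          0.1 * jax.random.normal(k3, (16, 3)))
grads = grad_fn(params, X, Y)
```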

6.5.8 Complications
Most software implementations need to support operations that can return more than one tensor.
The description so far has not addressed how to control the memory consumption of back-propagation.
Real-world implementations of back-propagation also need to handle various data types.
Some operations have undefined gradients.
Various other technicalities make real-world differentiation more complicated.

6.5.9 Differentiation outside the Deep Learning Community
When the forward graph has a single output node and each partial derivative can be computed with a constant amount of computation, back-propagation guarantees that the number of computations for the gradient is of the same order as the number of computations for the forward pass. In general, finding the optimal sequence of operations to compute the gradient is NP-complete, in the sense that it may require simplifying algebraic expressions into their least expensive form.

6.5.9 Differentiation outside the Deep Learning Community
The overall computation is therefore O(# edges). However, it can potentially be reduced by simplifying the computational graph constructed by back-propagation, and this is an NP-complete task. Back-propagation can be extended to compute a Jacobian; it is a special case of reverse mode accumulation.
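To illustrate the Jacobian claim: in reverse-mode accumulation each backward pass yields one vector-Jacobian product, i.e. one row of the Jacobian, so a full Jacobian takes one pass per output. A sketch with `jax.vjp`; the function `g` is an arbitrary example.

```python
import jax
import jax.numpy as jnp

def g(x):                         # R^3 -> R^2, chosen only for illustration
    return jnp.array([x[0] * x[1], jnp.sin(x[2])])

x = jnp.array([1.0, 2.0, 3.0])
y, vjp_fn = jax.vjp(g, x)         # reverse mode: one backward pass per output row

# Each call with a one-hot "seed" vector returns one row of the Jacobian.
rows = [vjp_fn(jnp.eye(2)[i])[0] for i in range(2)]
J = jnp.stack(rows)               # same result as jax.jacrev(g)(x)
```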

6.5.9 Differentiation outside the Deep Learning Community
Forward mode accumulation is preferable when the number of outputs of the graph is larger than the number of inputs. It avoids the need to store the values and gradients for the whole graph, trading off computational efficiency for memory.

6.5.9 Differentiation outside the Deep Learning Community
Forward mode accumulation. For example, in a chain of matrix multiplications, if the rightmost factor D is a column vector while the leftmost factor A has many rows, the product has a single output column and many inputs, so it is cheaper to run the multiplications right-to-left, corresponding to reverse mode. However, if A has fewer rows than D has columns, it is cheaper to run the multiplications left-to-right, corresponding to forward mode.
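The same trade-off appears directly when choosing between an autodiff library's forward-mode and reverse-mode primitives: forward mode costs one pass per input, reverse mode one pass per output. A sketch with `jax.jvp` for a many-outputs, few-inputs function (the function itself is invented for illustration).

```python
import jax
import jax.numpy as jnp

def wide(x):                          # 2 inputs -> 1000 outputs: forward mode is the cheap direction
    return jnp.outer(jnp.arange(1000.0), x).sum(axis=1)

x = jnp.array([1.0, 2.0])

# Forward mode: one pass per *input*, each pass producing one column of the Jacobian.
cols = [jax.jvp(wide, (x,), (jnp.eye(2)[i],))[1] for i in range(2)]

# Reverse mode would instead need 1000 backward passes (one per output row),
# so for this shape forward-mode accumulation is the better choice.
```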

6.5.9 Di ff erentiation outside the Deep Learning Community Back-propagation is therefore not the only way or the optimal way of computing the gradient, but it is a very practical method that continues to serve the deeplearning community very well.

Higher-Order Derivatives
Some software frameworks support the use of higher-order derivatives, e.g., Theano and TensorFlow. It is rare to compute a single second derivative of a scalar function; we are usually interested in properties of the Hessian matrix. For a function with n inputs, the Hessian matrix is of size n × n, so the typical deep learning approach is to use Krylov methods rather than build it explicitly.

Higher-Order Derivatives
Krylov methods are a set of iterative techniques for performing various operations, such as approximately inverting a matrix or finding approximations to its eigenvectors or eigenvalues, without using any operation other than matrix-vector products. To apply Krylov methods to the Hessian, we only need the product of the Hessian matrix H with an arbitrary vector v. One simply computes

$Hv = \nabla_x \left[ \left( \nabla_x f(x) \right)^\top v \right].$
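This Hessian-vector product is exactly the primitive an autodiff framework can supply without ever forming H, via double back-propagation. A sketch in JAX; the test function is an arbitrary choice.

```python
import jax
import jax.numpy as jnp

def f(x):                                  # scalar function of a vector, arbitrary example
    return jnp.sum(jnp.sin(x) * x ** 2)

def hvp(f, x, v):
    """Hv = grad_x [ (grad_x f(x))^T v ] -- the expression from the slide."""
    return jax.grad(lambda x: jnp.vdot(jax.grad(f)(x), v))(x)

x = jnp.arange(3.0)
v = jnp.ones(3)
print(hvp(f, x, v))                        # matches jax.hessian(f)(x) @ v, without forming H
```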