6.5.4 Back-Propagation Computation in Fully-Connected MLP.

6.5.4 Back-Propagation Computation in Fully-Connected MLP

Algorithm 6.3 Forward propagation through a typical deep neural network and the computation of the cost function.
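As a concrete illustration of what Algorithm 6.3 describes, here is a minimal sketch of the forward pass in JAX. The ReLU hidden activations, the squared-error loss, and the L2 weight-decay term are assumptions made for the sketch, not details given on this slide.

```python
# Sketch of Algorithm 6.3 (forward propagation and cost).
# Assumptions not stated on the slide: ReLU hidden activations, a
# squared-error loss L, and an L2 weight-decay regularizer Omega.
import jax.numpy as jnp


def forward(params, x, y, lam=1e-4):
    """params is a list of (W, b) pairs, one per layer."""
    h = x
    activations = [h]        # store h^(0), ..., h^(l) for the backward pass
    pre_activations = []     # store a^(1), ..., a^(l)
    for k, (W, b) in enumerate(params):
        a = b + W @ h        # a^(k) = b^(k) + W^(k) h^(k-1)
        # ReLU on hidden layers, linear output layer
        h = jnp.maximum(a, 0.0) if k < len(params) - 1 else a
        pre_activations.append(a)
        activations.append(h)
    y_hat = h
    loss = 0.5 * jnp.sum((y_hat - y) ** 2)               # L(y_hat, y), squared error here
    reg = lam * sum(jnp.sum(W ** 2) for W, _ in params)  # lambda * Omega(theta), L2 penalty
    J = loss + reg
    return J, (activations, pre_activations, y_hat)
```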

6.5.4 Back-Propagation Computation in Fully-Connected MLP
Algorithm 6.4 Backward computation for the deep neural network of Algorithm 6.3, which uses, in addition to the input x, a target y.
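A matching sketch of Algorithm 6.4's backward pass, written against the `forward` sketch above (and reusing its assumptions about the loss and activations):

```python
def backward(params, x, y, lam=1e-4):
    """Manual layer-by-layer gradients matching the forward sketch above."""
    J, (activations, pre_activations, y_hat) = forward(params, x, y, lam)
    grads = []
    g = y_hat - y                           # g <- dJ/da^(l) for the squared-error loss, linear output
    for k in reversed(range(len(params))):
        W, b = params[k]
        a = pre_activations[k]
        if k < len(params) - 1:             # hidden layers: multiply by f'(a^(k)) (ReLU derivative)
            g = g * (a > 0)
        grad_b = g                          # dJ/db^(k)
        grad_W = jnp.outer(g, activations[k]) + 2 * lam * W   # dJ/dW^(k) = g h^(k-1)^T + weight decay
        grads.append((grad_W, grad_b))
        g = W.T @ g                         # propagate the gradient back to h^(k-1)
    grads.reverse()
    return J, grads
```

In a framework like JAX the same gradients would normally be obtained with `jax.grad(lambda p: forward(p, x, y)[0])(params)`; the manual version above just makes the layer-by-layer recursion of Algorithm 6.4 explicit.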

6.5.5 Symbol-to-Symbol Derivatives
Symbols: variables that do not have specific values.
Symbolic representations: algebraic and graph-based representations.
Symbol-to-number differentiation vs. symbol-to-symbol differentiation.
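The distinction can be made concrete with an automatic-differentiation library: symbol-to-number differentiation produces a numeric gradient at one specific input, while symbol-to-symbol differentiation produces a new function (a derivative graph) that can be evaluated anywhere and differentiated again. A small sketch in JAX; the function `f` is just an illustrative choice.

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sin(x) * x ** 2

# Symbol-to-symbol: jax.grad builds a new function (a derivative "graph")
# that can be evaluated anywhere and differentiated again.
df = jax.grad(f)      # df is itself a differentiable function
d2f = jax.grad(df)    # higher-order derivative obtained by differentiating the derivative

# Symbol-to-number: evaluating the derivative at a specific input yields a number.
print(df(1.5), d2f(1.5))
```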

6.5.6 General Back-Propagation

Dynamic programming: each node in the graph has a corresponding slot in a table that stores the gradient for that node. By filling in these table entries in order, back-propagation avoids recomputing many common subexpressions.
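A toy illustration of this dynamic-programming view, with a dictionary as the gradient table: each node of an invented three-operation graph gets one slot, the slots are filled in reverse topological order, and the shared node `z` is handled once and then reused rather than recomputed.

```python
# Toy computational graph: z = x * y, u = z + x, J = u * z.
# The node "z" is a shared subexpression; its gradient slot is filled once
# and reused, which is the dynamic-programming idea behind general back-prop.

def forward_graph(x, y):
    values = {"x": x, "y": y}
    values["z"] = values["x"] * values["y"]
    values["u"] = values["z"] + values["x"]
    values["J"] = values["u"] * values["z"]
    return values

def backprop_graph(values):
    grad = {name: 0.0 for name in values}    # one table slot per node
    grad["J"] = 1.0
    # Fill the table in reverse topological order: J, u, z, then the inputs.
    grad["u"] += grad["J"] * values["z"]      # dJ/du
    grad["z"] += grad["J"] * values["u"]      # dJ/dz, direct path through J = u * z
    grad["z"] += grad["u"] * 1.0              # dJ/dz via u = z + x, reusing grad["u"]
    grad["x"] += grad["u"] * 1.0              # dJ/dx via u
    grad["x"] += grad["z"] * values["y"]      # dJ/dx via z, reusing the single grad["z"] entry
    grad["y"] += grad["z"] * values["x"]      # dJ/dy via z
    return grad

grads = backprop_graph(forward_graph(2.0, 3.0))   # {"x": 48.0, "y": 28.0, ...}
```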

6.5.7 Example: Back-Propagation for MLP Training
Single hidden layer; since the cost is a scalar, the gradient with respect to each weight matrix can be handled by treating the matrix as if flattened into a vector of parameters.

6.5.7 Example: Back-Propagation for MLP Training
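As a hedged sketch of what the single-hidden-layer example might look like in code (the softmax cross-entropy loss, ReLU hidden units, absence of biases, and weight decay are assumptions made to keep the sketch short, not details taken from the slide):

```python
import jax
import jax.numpy as jnp

def mlp_loss(params, X, Y, lam=1e-4):
    """Single-hidden-layer MLP: H = relu(X W1), logits = H W2.
    Softmax cross-entropy plus L2 weight decay (assumed for this sketch)."""
    W1, W2 = params
    H = jnp.maximum(X @ W1, 0.0)
    logits = H @ W2
    log_probs = logits - jax.scipy.special.logsumexp(logits, axis=1, keepdims=True)
    cross_entropy = -jnp.mean(jnp.sum(Y * log_probs, axis=1))
    return cross_entropy + lam * (jnp.sum(W1 ** 2) + jnp.sum(W2 ** 2))

# Back-propagation through the whole expression in one call;
# grads has the same (W1, W2) structure as params.
grad_fn = jax.grad(mlp_loss)

key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
X = jax.random.normal(k1, (32, 10))
Y = jax.nn.one_hot(jnp.zeros(32, dtype=jnp.int32), 3)
params = (0.1 * jax.random.normal(k2, (10, 16)),
          0.1 * jax.random.normal(k3, (16, 3)))
grads = grad_fn(params, X, Y)
```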

6.5.8 Complications
Most software implementations need to support operations that can return more than one tensor.
The description so far has not addressed how to control the memory consumption of back-propagation.
Real-world implementations of back-propagation also need to handle various data types.
Some operations have undefined gradients.
Various other technicalities make real-world differentiation more complicated.

6.5.9 Differentiation outside the Deep Learning Community
When the forward graph has a single output node and each partial derivative can be computed with a constant amount of computation, back-propagation guarantees that the number of computations for the gradient is of the same order as the number of computations for the forward pass. In general, finding the optimal sequence of operations to compute the gradient is NP-complete, in the sense that it may require simplifying algebraic expressions into their least expensive form.

6.5.9 Differentiation outside the Deep Learning Community
The overall computation is therefore O(# edges). However, it can potentially be reduced by simplifying the computational graph constructed by back-propagation, and this is an NP-complete task. Back-propagation can be extended to compute a Jacobian; it is a special case of reverse mode accumulation.
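To illustrate the Jacobian claim: in reverse-mode accumulation each backward pass yields one vector-Jacobian product, i.e. one row of the Jacobian, so a full Jacobian takes one pass per output. A sketch with `jax.vjp`; the function `g` is an arbitrary example.

```python
import jax
import jax.numpy as jnp

def g(x):                         # R^3 -> R^2, chosen only for illustration
    return jnp.array([x[0] * x[1], jnp.sin(x[2])])

x = jnp.array([1.0, 2.0, 3.0])
y, vjp_fn = jax.vjp(g, x)         # reverse mode: one backward pass per output row

# Each call with a one-hot "seed" vector returns one row of the Jacobian.
rows = [vjp_fn(jnp.eye(2)[i])[0] for i in range(2)]
J = jnp.stack(rows)               # same result as jax.jacrev(g)(x)
```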

6.5.9 Differentiation outside the Deep Learning Community
Forward mode accumulation is preferable when the number of outputs of the graph is larger than the number of inputs. It avoids the need to store the values and gradients for the whole graph, trading off computational efficiency for memory.

6.5.9 Differentiation outside the Deep Learning Community
Forward mode accumulation. For example, in a chain of matrix multiplications, if the rightmost factor D is a column vector while the leftmost factor A has many rows, the product has a single output column and many inputs, so it is cheaper to run the multiplications right-to-left, corresponding to reverse mode. However, if A has fewer rows than D has columns, it is cheaper to run the multiplications left-to-right, corresponding to forward mode.
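The same trade-off appears directly when choosing between an autodiff library's forward-mode and reverse-mode primitives: forward mode costs one pass per input, reverse mode one pass per output. A sketch with `jax.jvp` for a many-outputs, few-inputs function (the function itself is invented for illustration).

```python
import jax
import jax.numpy as jnp

def wide(x):                          # 2 inputs -> 1000 outputs: forward mode is the cheap direction
    return jnp.outer(jnp.arange(1000.0), x).sum(axis=1)

x = jnp.array([1.0, 2.0])

# Forward mode: one pass per *input*, each pass producing one column of the Jacobian.
cols = [jax.jvp(wide, (x,), (jnp.eye(2)[i],))[1] for i in range(2)]

# Reverse mode would instead need 1000 backward passes (one per output row),
# so for this shape forward-mode accumulation is the better choice.
```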

6.5.9 Di ff erentiation outside the Deep Learning Community Back-propagation is therefore not the only way or the optimal way of computing the gradient, but it is a very practical method that continues to serve the deeplearning community very well.

Higher-Order Derivatives
Some software frameworks support the use of higher-order derivatives, e.g., Theano and TensorFlow. It is rare to compute a single second derivative of a scalar function; we are usually interested in properties of the Hessian matrix. For a function with n inputs, the Hessian matrix is of size n × n, so the typical deep learning approach is to use Krylov methods rather than build it explicitly.

Higher-Order Derivatives
Krylov methods are a set of iterative techniques for performing various operations, such as approximately inverting a matrix or finding approximations to its eigenvectors or eigenvalues, without using any operation other than matrix-vector products. To apply Krylov methods to the Hessian, we only need the product of the Hessian matrix H with an arbitrary vector v. One simply computes

$Hv = \nabla_x \left[ \left( \nabla_x f(x) \right)^\top v \right].$
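This Hessian-vector product is exactly the primitive an autodiff framework can supply without ever forming H, via double back-propagation. A sketch in JAX; the test function is an arbitrary choice.

```python
import jax
import jax.numpy as jnp

def f(x):                                  # scalar function of a vector, arbitrary example
    return jnp.sum(jnp.sin(x) * x ** 2)

def hvp(f, x, v):
    """Hv = grad_x [ (grad_x f(x))^T v ] -- the expression from the slide."""
    return jax.grad(lambda x: jnp.vdot(jax.grad(f)(x), v))(x)

x = jnp.arange(3.0)
v = jnp.ones(3)
print(hvp(f, x, v))                        # matches jax.hessian(f)(x) @ v, without forming H
```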