Backpropagation David Kauchak CS159 – Fall 2019

Admin Assignment 5

Neural network A network is built from individual perceptrons/neurons connected to inputs (often we view just a slice of the network).

Neural network Some inputs are provided/entered into the network.

Neural network Each perceptron computes on its inputs and produces an answer.

Neural network Those answers become the inputs for the next level.

Neural network Finally, we get the answer after all levels compute.

A neuron/perceptron Inputs x1, x2, x3, x4 with weights w1, w2, w3, w4 feed into an activation function, which produces the output y.

Activation functions hard threshold, sigmoid, tanh. Hard threshold: $g(in) = \begin{cases} 1 & \text{if } in \geq T \\ 0 & \text{otherwise} \end{cases}$
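As a small illustration in code (Python with NumPy; the input and weight values are made up for the example), here are the three activation functions and a single neuron applying one of them to its weighted sum:

```python
import numpy as np

def hard_threshold(in_val, T=0.0):
    """g(in) = 1 if in >= T, 0 otherwise."""
    return 1.0 if in_val >= T else 0.0

def sigmoid(in_val):
    """Smooth squashing function mapping any input to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-in_val))

def tanh(in_val):
    """Squashing function mapping any input to (-1, 1)."""
    return np.tanh(in_val)

# A single neuron/perceptron: weighted sum of the inputs, then an activation.
x = np.array([1.0, 0.5, -0.3, 2.0])   # inputs x1..x4 (arbitrary example values)
w = np.array([0.2, -0.1, 0.4, 0.05])  # weights w1..w4 (arbitrary example values)

in_val = np.dot(w, x)                 # the "in" fed to the activation function
y = sigmoid(in_val)                   # output y of the neuron
print(y)
```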

Training How do we learn the weights for a network that computes x1 xor x2?
x1  x2 | x1 xor x2
0   0  | 0
0   1  | 1
1   0  | 1
1   1  | 0
The network: inputs x1 and x2 feed two hidden threshold units (thresholds T=?), whose outputs feed an output threshold unit (T=?), with all weights unknown (?). How do we learn the weights?
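As a concrete (hand-picked, not learned) solution, here is one possible assignment of weights and thresholds under which a network of hard-threshold units computes xor; one hidden unit acts as OR, the other as AND, and the output fires only when OR is true and AND is false. The specific values are illustrative, not from the slides.

```python
def threshold(in_val, T):
    """Hard threshold activation: 1 if in >= T, 0 otherwise."""
    return 1 if in_val >= T else 0

def xor_net(x1, x2):
    # Hidden unit 1: OR (fires if at least one input is 1)
    h1 = threshold(1.0 * x1 + 1.0 * x2, T=0.5)
    # Hidden unit 2: AND (fires only if both inputs are 1)
    h2 = threshold(1.0 * x1 + 1.0 * x2, T=1.5)
    # Output: fires when OR is true and AND is false
    return threshold(1.0 * h1 - 1.0 * h2, T=0.5)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))  # matches the xor truth table
```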

Learning in multilayer networks Challenge: for multilayer networks, we don't know what the expected output/error is for the internal (hidden) nodes! For a perceptron/linear model we can compare the output directly to the expected output; for a neural network, how do we learn the hidden-layer weights?

Backpropagation: intuition Gradient descent method for learning weights by optimizing a loss function:
1. calculate the output of all nodes
2. calculate the weight updates for the output layer based on the error
3. "backpropagate" the errors through the hidden layers

Backpropagation: intuition At the output node we can calculate the actual error (prediction vs. label) directly.

Backpropagation: intuition Key idea: propagate the error back from the output to the hidden layer.

Backpropagation: intuition The error attributed to a hidden node is roughly wi * error, where wi is the weight (e.g., w1, w2, w3) from that hidden node to the output and error is the output error.

Backpropagation: intuition Going back another layer, calculate as normal, but weight the error: the error reaching a node here is roughly w3 * error, and it is propagated further back through that node's own weights (w4, w5, w6).

Backpropagation: the details Gradient descent method for learning weights by optimizing a loss function (here, squared error):
1. calculate the output of all nodes
2. calculate the updates directly for the output layer
3. "backpropagate" errors through the hidden layers
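For a single example with label y and network prediction ŷ, the squared error loss takes the form below; the 1/2 factor is a common convention that simplifies the derivative and is my addition, not something stated on the slide.

```latex
\text{loss} = \frac{1}{2}\,(y - \hat{y})^2
```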

Backpropagation: the details Notation: m = number of features/inputs, d = number of hidden nodes, hj = output of hidden node j. Inputs x1 … xm feed into hidden nodes h1 … hd, which feed into the output node out. How many weights connect the hidden layer to the output?

Backpropagation: the details There are d hidden-to-output weights, denoted vk: v1, …, vd, where vk connects hidden node k to the output.

Backpropagation: the details How many weights connect the inputs to the hidden layer?

Backpropagation: the details There are d * m input-to-hidden weights, denoted wkj, where the first index is the hidden node and the second index is the feature. For example, w23 is the weight from input 3 to hidden node 2, and w4 denotes all m weights associated with hidden node 4.
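To make the notation concrete, here is a small sketch in Python/NumPy of how these weights can be stored and used in the forward pass; the sizes, the sigmoid activation, and the random initialization are illustrative assumptions, not from the slides.

```python
import numpy as np

m, d = 5, 4                   # m features, d hidden nodes (sizes chosen for illustration)

w = np.random.randn(d, m)     # w[k][j]: weight from input j to hidden node k
v = np.random.randn(d)        # v[k]: weight from hidden node k to the output

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.random.randn(m)        # one example with m features

h = sigmoid(w @ x)            # hidden-node outputs: h[k] = g(w[k] . x)
out = sigmoid(v @ h)          # network output

# With 0-indexing, w[1][2] is the weight from input 3 to hidden node 2,
# and w[3] is the vector of all m weights associated with hidden node 4.
```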

Backpropagation: the details Gradient descent method for learning weights by optimizing a loss function:
1. calculate the output of all nodes
2. calculate the updates directly for the output layer
3. "backpropagate" errors through the hidden layers

Finding the minimum You’re blindfolded, but you can see out of the bottom of the blindfold to the ground right by your feet. I drop you off somewhere and tell you that you’re in a convex shaped valley and escape is at the bottom/minimum. How do you get out?

Finding the minimum [plot: loss as a function of a weight w] How do we do this for a function?

One approach: gradient descent Partial derivatives give us the slope (i.e., the direction to move) in that dimension. [plot: loss as a function of w]

One approach: gradient descent Partial derivatives give us the slope (i.e., the direction to move) in that dimension.
Approach:
pick a starting point (w)
repeat:
    move a small amount towards decreasing loss (using the derivative)

One approach: gradient descent Partial derivatives give us the slope (i.e., the direction to move) in that dimension.
Approach:
pick a starting point (w)
repeat:
    pick a dimension
    move a small amount in that dimension towards decreasing loss (using the derivative)
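A minimal sketch of this loop in Python, using a toy one-dimensional convex loss chosen purely for illustration (it is not the network's loss):

```python
def loss(w):
    return (w - 2.0) ** 2           # a convex "valley" with its minimum at w = 2

def d_loss(w):
    return 2.0 * (w - 2.0)          # derivative: the slope of the loss at w

w = 10.0                            # pick a starting point
eta = 0.1                           # how small a step to take each time

for _ in range(100):                # repeat: move a small amount towards decreasing loss
    w = w - eta * d_loss(w)

print(w)                            # close to 2, the bottom of the valley
```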

Output layer weights The update for output weight vk depends on three things: how far from correct the prediction is and in which direction, the slope of the activation function at its current input, and the size and direction of the feature (the hidden output hk) associated with this weight.

Output layer weights One factor is how far from correct the prediction is and in which direction. What are the other factors?

Output layer weights How far from correct and which direction: if the prediction < label, increase the weight; if the prediction > label, decrease the weight; a bigger difference means a bigger change.

Output layer weights Slope of the activation function where the input is at: where the activation function is flat (saturated), take a smaller step; where it is steep, take a bigger step.

Output layer weights Size and direction of the feature associated with this weight: the update is scaled by the hidden-node output hk that feeds into this weight.
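Putting the three factors together, a standard form of the output-layer update (consistent with the factors described above, though the slide images with the actual formula are not in this transcript) is the following, with learning rate η and activation function g:

```latex
v_k \leftarrow v_k \;+\; \eta \,
    \underbrace{(y - \hat{y})}_{\text{error}}\;
    \underbrace{g'\Big(\textstyle\sum_{k'} v_{k'} h_{k'}\Big)}_{\text{activation slope}}\;
    \underbrace{h_k}_{\text{feature for this weight}}
```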

Backpropagation: the details Gradient descent method for learning weights by optimizing a loss function:
1. calculate the output of all nodes
2. calculate the updates directly for the output layer
3. "backpropagate" errors through the hidden layers

Backpropagation: hidden layer The hidden-layer update has the same form as the output-layer update (error, activation slope, input), with two changes: the "error" for hidden node k is the output-layer error weighted by vk, the weight from that hidden node to the output layer, and the activation slope is evaluated at that hidden node's input wk · x. The inputs here are the features xj rather than the hidden outputs.
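Written in the same style, a standard form of the hidden-layer update for weight wkj (again a reconstruction of the usual formula, not copied from the slides) is:

```latex
w_{kj} \leftarrow w_{kj} \;+\; \eta \,
    \underbrace{(y - \hat{y})\, g'\Big(\textstyle\sum_{k'} v_{k'} h_{k'}\Big)\, v_k}_{\text{error backpropagated through } v_k}\;
    \underbrace{g'\big(\mathbf{w}_k \cdot \mathbf{x}\big)}_{\text{activation slope}}\;
    \underbrace{x_j}_{\text{input for this weight}}
```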

There’s a bit of math to show this, but it’s mostly just calculus…

Learning rate Both the output-layer update and the hidden-layer update are scaled by a learning rate, which adjusts how large the updates we make are (a parameter to the learning approach, like lambda for n-gram models). Often it will start larger and then get smaller.

Backpropagation implementation
for some number of iterations:
    randomly shuffle the training data
    for each example:
        compute all outputs going forward
        calculate new weights and modified errors at the output layer
        recursively calculate new weights and modified errors on the hidden layers, based on the recursive relationship
        update the model with the new weights
Make sure to calculate all of the outputs and modified errors *before* updating any of the model weights.
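Here is a compact sketch of that loop in Python/NumPy for the one-hidden-layer network and notation above, assuming sigmoid activations and squared error; the function and variable names are mine, not from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, Y, d=5, eta=0.1, iterations=100):
    """X: n x m array of examples, Y: n labels in [0, 1]."""
    n, m = X.shape
    w = np.random.randn(d, m) * 0.1   # input-to-hidden weights w[k][j]
    v = np.random.randn(d) * 0.1      # hidden-to-output weights v[k]

    for _ in range(iterations):
        order = np.random.permutation(n)        # randomly shuffle the training data
        for i in order:
            x, y = X[i], Y[i]

            # forward pass: compute all outputs
            h = sigmoid(w @ x)                  # hidden-node outputs
            out = sigmoid(v @ h)                # network output

            # modified error at the output layer: error * slope of the sigmoid
            delta_out = (y - out) * out * (1 - out)

            # modified errors at the hidden layer, backpropagated through v
            delta_h = delta_out * v * h * (1 - h)

            # update only after all modified errors are computed
            v += eta * delta_out * h
            w += eta * np.outer(delta_h, x)
    return w, v
```

Computing delta_out and delta_h before touching v matters: the hidden-layer errors must use the old hidden-to-output weights, which is exactly the point the slide emphasizes.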

Many variations
Momentum: include a factor in the weight update to keep moving in the direction of the previous update
Mini-batch: a compromise between online and batch updates; avoids the noisiness of online updates while making more educated weight updates
Simulated annealing: with some probability, make a random weight update; reduce this probability over time
…
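For the momentum variation, a common formulation looks like the sketch below, reusing the toy one-dimensional loss from the gradient descent example; the coefficient mu and the specific values are illustrative, not from the slides.

```python
def d_loss(w):
    return 2.0 * (w - 2.0)          # derivative of the toy loss (w - 2)^2

w, velocity = 10.0, 0.0
eta, mu = 0.1, 0.9                  # learning rate and momentum coefficient

for _ in range(200):
    # blend the previous update direction with the new gradient step
    velocity = mu * velocity - eta * d_loss(w)
    w = w + velocity

print(w)                            # converges toward the minimum at w = 2
```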

Challenges of neural networks?
Picking the network configuration
Can be slow to train for large networks and large amounts of data
Loss functions (including squared error) are generally not convex with respect to the parameter space