CS623: Introduction to Computing with Neural Nets (lecture-5)


CS623: Introduction to Computing with Neural Nets (lecture-5) Pushpak Bhattacharyya Computer Science and Engineering Department IIT Bombay

Backpropagation algorithm. [Figure: a fully connected feedforward network with an input layer of n input neurons, one or more hidden layers, and an output layer of m output neurons; w_ji denotes the weight on the connection from neuron i to neuron j in the next layer.] The network is a pure feedforward network: no connection jumps over a layer.

Gradient Descent Equations
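A standard statement of the gradient descent equations for this setting (sigmoid units and a squared-error cost, as used in the rest of the lecture; the exact notation is an assumption):

$$E = \frac{1}{2}\sum_{j=1}^{m}\left(t_j - o_j\right)^2, \qquad \Delta w_{ji} \propto -\frac{\partial E}{\partial w_{ji}}, \qquad \Delta w_{ji} = -\eta\,\frac{\partial E}{\partial w_{ji}}$$

where t_j is the target output, o_j the observed output of output neuron j, and η the learning rate.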

Backpropagation – for outermost layer
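For a sigmoid output neuron j receiving activation o_i from neuron i in the layer below, the standard outermost-layer result (given here for reference in the usual notation) is:

$$\frac{\partial E}{\partial w_{ji}} = -(t_j - o_j)\,o_j(1 - o_j)\,o_i, \qquad \Delta w_{ji} = \eta\,\delta_j\,o_i \quad\text{with}\quad \delta_j = (t_j - o_j)\,o_j(1 - o_j)$$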

Backpropagation for hidden layers. [Figure: the same feedforward network, with an input layer of n input neurons, hidden layers, and an output layer of m output neurons; a hidden neuron j feeds neuron k in the layer above.] The error term δ_k is propagated backwards to compute δ_j.

Backpropagation – for hidden layers
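For a hidden neuron j, the error term is obtained from the δ values of the layer above (the standard back-propagated form, stated here for completeness):

$$\delta_j = o_j(1 - o_j)\sum_{k\,\in\,\text{next layer}} w_{kj}\,\delta_k, \qquad \Delta w_{ji} = \eta\,\delta_j\,o_i$$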

General Backpropagation Rule. General weight update rule: Δw_ji = η δ_j o_i, where δ_j = (t_j - o_j) o_j (1 - o_j) for the outermost layer and δ_j = o_j (1 - o_j) Σ_k w_kj δ_k for the hidden layers.
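A minimal numpy sketch of one update with this rule for a single hidden layer of sigmoid units (the function name, network shapes and learning rate are illustrative assumptions, not from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, t, W1, W2, eta=0.5):
    """One gradient-descent update: delta rule at the output layer,
    back-propagated deltas at the hidden layer."""
    # Forward pass
    h = sigmoid(W1 @ x)    # hidden activations
    o = sigmoid(W2 @ h)    # output activations

    # Error terms: outermost layer first, then the hidden layer
    delta_o = (t - o) * o * (1 - o)            # (t_j - o_j) o_j (1 - o_j)
    delta_h = h * (1 - h) * (W2.T @ delta_o)   # o_j (1 - o_j) sum_k w_kj delta_k

    # Weight updates: delta_w_ji = eta * delta_j * o_i
    W2 += eta * np.outer(delta_o, h)
    W1 += eta * np.outer(delta_h, x)
    return W1, W2
```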

How does it work? Inputs are propagated forward and the error is propagated backward (e.g., XOR). [Figure: a small two-layer network for XOR; the output neuron has threshold 0.5 and weights w1 = w2 = 1 from the inputs, and a hidden neuron labelled x1x2 with threshold 1.5 feeds it through a negative (-1) weight.]
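A small sketch of the forward pass through a hand-built XOR net of this kind, using hard-threshold neurons. The exact inhibitory weight in the figure is hard to recover, so the sketch uses the standard construction in which the AND unit feeds the output with weight -2; everything else follows the figure.

```python
def step(net, theta):
    """Hard-threshold neuron: fires iff the net input reaches the threshold."""
    return 1 if net >= theta else 0

def xor_net(x1, x2):
    h = step(x1 + x2, 1.5)              # hidden neuron computes x1 AND x2
    return step(x1 + x2 - 2 * h, 0.5)   # output: x1 OR x2, inhibited by the AND unit

for x1 in (0, 1):
    for x2 in (0, 1):
        print((x1, x2), xor_net(x1, x2))   # prints the XOR truth table: 0, 1, 1, 0
```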

Issues in the training algorithm. The algorithm is greedy: it always changes the weights so that E decreases. As a result, it may get stuck in a local minimum.

Local Minima. Due to the greedy nature of BP, it can get stuck in a local minimum m and never reach the global minimum g, since the error can only decrease with each weight change.

Reasons for no progress in training: stuck in a local minimum; network paralysis (a large negative or positive input drives the neurons into saturation); the learning rate η is too small.

Diagnostics in action (1). 1) If stuck in a local minimum, try the following: re-initialize the weight vector; increase the learning rate; introduce more neurons in the hidden layer.

Diagnostics in action (1) contd. 2) If it is network paralysis, increase the number of neurons in the hidden layer. Problem: how to configure the hidden layer? Known: one hidden layer seems to be sufficient [Kolmogorov (1960s)].

Diagnostics in action (2). Kolmogorov's statement: a feedforward network with three layers (input, hidden and output), with an appropriate I/O relation that can vary from neuron to neuron, is sufficient to compute any function. More hidden layers reduce the size of the individual layers.

Diagnostics in action (3). 3) Observe the outputs: if they are close to 0 or 1, try the following: scale the inputs or divide them by a normalizing factor; change the shape and size of the sigmoid.

Answers to Quiz-1. Q1: Show that of the 256 Boolean functions of 3 variables, only half are computable by a threshold perceptron. Ans: The characteristic equation for 3 variables is W1X1 + W2X2 + W3X3 = θ ... (E). The 8 Boolean value combinations, when inserted in (E), produce 8 hyperplanes passing through the origin in the <W1, W2, W3, θ> space.

Q1 (contd). The maximum number of functions computable by this perceptron is the number of regions produced by the intersection of these 8 hyperplanes in the 4-dimensional space: R(8,4) = R(7,4) + R(7,3) ... (1), with boundary conditions R(1,n) = 2 and R(m,2) = 2m.

Answer: recursion tree for R(m,n). Each non-root node has two children as per the recurrence, and the value of R(m,n) is stored beside the node (m,n):
R(8,4) = 128 = R(7,4) + R(7,3) = 84 + 44
R(7,4) = 84, R(7,3) = 44
R(6,4) = 52, R(6,3) = 32, R(6,2) = 12
R(5,4) = 30, R(5,3) = 22, R(5,2) = 10
R(4,4) = 16, R(4,3) = 14, R(4,2) = 8
R(3,4) = 8, R(3,3) = 8, R(3,2) = 6
R(2,4) = 4, R(2,3) = 4, R(2,2) = 4
R(1,4) = 2, R(1,3) = 2, R(1,2) = 2
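A short script to check the recursion (the function name and memoisation are mine; the boundary conditions are the ones stated above, R(1,n) = 2 and R(m,2) = 2m):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def R(m, n):
    """Maximum number of regions produced by m hyperplanes through the origin
    in n-dimensional space: R(m, n) = R(m-1, n) + R(m-1, n-1)."""
    if m == 1:
        return 2        # a single hyperplane always creates 2 regions
    if n == 2:
        return 2 * m    # m lines through the origin split the plane into 2m sectors
    return R(m - 1, n) + R(m - 1, n - 1)

print(R(8, 4))  # 128, i.e. half of the 2^(2^3) = 256 Boolean functions of 3 variables
```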

Answer to Quiz-1 (contd). Q2: Prove whether a perceptron with sin(x) as its input-output relation can compute XOR. Ans: [Figure: a single neuron with inputs x1 and x2, weights W1 and W2, and bias θ, producing output y = sin(W1 x1 + W2 x2 + θ); yl and yu are the low and high output thresholds.]

Q2 (contd). Input <0,0>: need y < yl, i.e. sin(θ) < yl ... (1). Input <1,0>: need y > yu, i.e. sin(W1 + θ) > yu ... (2). Input <0,1>: need y > yu, i.e. sin(W2 + θ) > yu ... (3). Input <1,1>: need y < yl, i.e. sin(W1 + W2 + θ) < yl ... (4).

Q2 (contd). Taking yl = 0.1, yu = 0.9, W1 = W2 = π/2 and θ = 0, all four conditions are satisfied, so the perceptron can compute XOR.
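A quick numerical check of conditions (1)-(4) with these values (variable names are illustrative):

```python
import math

W1 = W2 = math.pi / 2
theta = 0.0
yl, yu = 0.1, 0.9

def y(x1, x2):
    return math.sin(W1 * x1 + W2 * x2 + theta)

assert y(0, 0) < yl   # sin(0)    = 0   < 0.1
assert y(1, 0) > yu   # sin(pi/2) = 1   > 0.9
assert y(0, 1) > yu   # sin(pi/2) = 1   > 0.9
assert y(1, 1) < yl   # sin(pi)   ~ 0   < 0.1
print("the sin-neuron computes XOR with these parameters")
```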

Answer to Quiz-1 (contd). Q3: If, in the perceptron training algorithm, the failed vector is chosen again by the algorithm, will there be any problem? Ans: In PTA, W_n = W_(n-1) + X_fail. After this, X_fail is chosen again for testing and is added again if it fails. This continues until W_k . X_fail > 0. Will this terminate?

Q3 (contd). It will, because W_n = W_(n-1) + X_fail, W_(n-1) = W_(n-2) + X_fail, and so on down to W_0. Therefore W_n . X_fail = W_0 . X_fail + n (X_fail)^2. The second term is positive and grows with n, so even if W_0 . X_fail starts at some negative value -δ, the sum will overtake it after some iterations and become positive. Hence 'no problem' is the answer.
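A minimal simulation of this argument (the particular vectors are made up for illustration; the point is only that W_n . X_fail grows linearly with n):

```python
import numpy as np

w = np.array([-3.0, 1.0])        # W0: an initial weight vector
x_fail = np.array([1.0, 0.5])    # a vector it misclassifies: W0 . x_fail < 0

n = 0
while np.dot(w, x_fail) <= 0:    # keep adding the same failed vector
    w = w + x_fail
    n += 1

# W_n . x_fail = W0 . x_fail + n * |x_fail|^2, so the loop must terminate.
print(n, np.dot(w, x_fail))
```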