CS344: Introduction to Artificial Intelligence (associated lab: CS386) Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 30: Perceptron training convergence;

Presentation transcript:

CS344: Introduction to Artificial Intelligence (associated lab: CS386) Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 30: Perceptron training convergence; Feedforward N/W 24th March, 2011

PTA on NAND

NAND truth table:

X2 X1 | Y
 0  0 | 1
 0  1 | 1
 1  0 | 1
 1  1 | 0

Converted to the threshold form with weights W2, W1, W0 = Θ on inputs X2, X1, X0 = -1.

Preprocessing

NAND augmented (X0 = -1 appended) and the 0-class vector negated, giving, in <X2, X1, X0> order:

V0: <0, 0, -1>
V1: <0, 1, -1>
V2: <1, 0, -1>
V3: <-1, -1, 1>

Vectors for which W = <W2, W1, W0> has to be found such that W . Vi > 0 for every i.
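To make the preprocessing concrete, here is a minimal Python sketch (added here, not from the slides) that augments the NAND truth table with X0 = -1 and negates the single 0-class row, producing exactly the vectors V0-V3 above.

```python
# Preprocess NAND for the PTA: augment each input with X0 = -1, then negate
# the 0-class vector so that a single condition W . Vi > 0 covers both classes.
nand_table = [
    ((0, 0), 1),
    ((0, 1), 1),
    ((1, 0), 1),
    ((1, 1), 0),
]

vectors = []
for (x2, x1), y in nand_table:
    v = (x2, x1, -1)                      # components in <X2, X1, X0> order
    if y == 0:
        v = tuple(-c for c in v)          # negate the 0-class vector
    vectors.append(v)

print(vectors)   # [(0, 0, -1), (0, 1, -1), (1, 0, -1), (-1, -1, 1)]
```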

PTA Algo steps

Algorithm: initialize W, and keep adding the failed vectors until W . Vi > 0 holds for every Vi.

Step 0: start from an initial weight vector W0.
W1 = W0 + V0  {V0 fails}
W2 = W1 + V3  {V3 fails}
W3 = W2 + V0  {V0 fails}
W4 = W3 + V1  {V1 fails}
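The same loop in Python (a sketch, not from the slides; it assumes a zero initial weight vector and picks the first failing vector on each pass, so its particular sequence of updates may differ from the trace shown here):

```python
def pta(vectors, w=(0, 0, 0), max_iters=1000):
    """Perceptron Training Algorithm on the preprocessed vectors:
    whenever some Vi has W . Vi <= 0, add Vi to W; stop when all pass."""
    w = list(w)
    for _ in range(max_iters):
        failed = next((v for v in vectors
                       if sum(wi * vi for wi, vi in zip(w, v)) <= 0), None)
        if failed is None:                               # every vector passes
            return w
        w = [wi + vi for wi, vi in zip(w, failed)]       # W <- W + V_fail
    raise RuntimeError("did not converge within max_iters")

vectors = [(0, 0, -1), (0, 1, -1), (1, 0, -1), (-1, -1, 1)]
print(pta(vectors))   # some separating <W2, W1, W0> for NAND
```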

Trying convergence

W5 = W4 + V3  {V3 fails}
W6 = W5 + V1  {V1 fails}
W7 = W6 + V0  {V0 fails}
W8 = W7 + V3  {V3 fails}
W9 = W8 + V2  {V2 fails}

Trying convergence

W10 = W9 + V3  {V3 fails}
W11 = W10 + V1  {V1 fails}
W12 = W11 + V3  {V3 fails}
W13 = W12 + V1  {V1 fails}
W14 = W13 + V2  {V2 fails}

Converged!

W15 = W14 + V3  {V3 fails}
W16 = W15 + V2  {V2 fails}
W17 = W16 + V3  {V3 fails}
W18 = W17 + V1  {V1 fails}

Final weights: W2 = -3, W1 = -2, W0 = Θ = -4
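As a quick check (added here, not on the slide), these final weights do satisfy W . Vi > 0 for all four preprocessed vectors:

```python
# Verify the converged NAND weights against every preprocessed vector.
w = (-3, -2, -4)                                   # <W2, W1, W0> from the slide
vectors = [(0, 0, -1), (0, 1, -1), (1, 0, -1), (-1, -1, 1)]
for v in vectors:
    dot = sum(wi * vi for wi, vi in zip(w, v))
    print(v, dot, dot > 0)                         # dots are 4, 2, 1, 1: all positive
```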

PTA convergence

Statement of Convergence of PTA Statement: Whatever be the initial choice of weights and whatever be the vector chosen for testing, PTA converges if the vectors are from a linearly separable function.

Proof of Convergence of PTA

Suppose w_n is the weight vector at the nth step of the algorithm; at the beginning, the weight vector is w_0. We go from w_i to w_{i+1} when a vector X_j fails the test w_i . X_j > 0, updating w_i as w_{i+1} = w_i + X_j. Since the X_j's form a linearly separable function, there exists a w* such that w* . X_j > 0 for all j.

Proof of Convergence of PTA (contd.)

Consider the expression
G(w_n) = (w_n . w*) / |w_n|,   where w_n = weight at the nth iteration
       = (|w_n| . |w*| . cos θ) / |w_n|,   where θ = angle between w_n and w*
       = |w*| . cos θ
       ≤ |w*|   (as -1 ≤ cos θ ≤ 1)

Behavior of Numerator of G

w_n . w* = (w_{n-1} + X_fail(n-1)) . w*
         = w_{n-1} . w* + X_fail(n-1) . w*
         = (w_{n-2} + X_fail(n-2)) . w* + X_fail(n-1) . w*
         = ...
         = w_0 . w* + (X_fail(0) + X_fail(1) + ... + X_fail(n-1)) . w*

where X_fail(i) is the vector that failed at step i. Note carefully: w* . X_fail(i) is always positive.

Suppose w* . X_j ≥ δ for every j, where δ (> 0) is the smallest of these projections.
Then Numerator of G ≥ w_0 . w* + n δ.
So, the numerator of G grows with n.

Behavior of Denominator of G

|w_n| = √(w_n . w_n)
      = √( (w_{n-1} + X_fail(n-1)) . (w_{n-1} + X_fail(n-1)) )
      = √( |w_{n-1}|^2 + 2 w_{n-1} . X_fail(n-1) + |X_fail(n-1)|^2 )
      ≤ √( |w_{n-1}|^2 + |X_fail(n-1)|^2 )      (as w_{n-1} . X_fail(n-1) ≤ 0)
      ≤ √( |w_0|^2 + |X_fail(0)|^2 + |X_fail(1)|^2 + ... + |X_fail(n-1)|^2 )

With |X_j| ≤ ρ for every j (ρ = maximum magnitude),
Denominator of G ≤ √( |w_0|^2 + n ρ^2 ).

Some Observations

Numerator of G grows as n.
Denominator of G grows as √n.
=> Numerator grows faster than denominator.
If PTA does not terminate, G(w_n) values will become unbounded.
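Combining the two bounds makes the growth rates explicit and yields the familiar mistake bound. This step is not on the slides; it takes w_0 = 0 for simplicity, with δ and ρ as defined above:

$$
|w^*| \;\ge\; G(w_n) \;\ge\; \frac{n\,\delta}{\sqrt{n\,\rho^2}} \;=\; \frac{\delta}{\rho}\,\sqrt{n}
\qquad\Longrightarrow\qquad
n \;\le\; \frac{\rho^2\,|w^*|^2}{\delta^2}.
$$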

Some Observations contd. But |G(w_n)| ≤ |w*|, which is finite, so this is impossible! Hence, PTA has to converge. The proof is due to Marvin Minsky.

Convergence of PTA proved Whatever be the initial choice of weights and whatever be the vector chosen for testing, PTA converges if the vectors are from a linearly separable function.

Feedforward Network

Limitations of perceptron Non-linear separability is all-pervading. A single perceptron does not have enough computing power. E.g., XOR cannot be computed by a perceptron.
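For completeness (this derivation is not on the slides), the four constraints that a single threshold unit with weights w1, w2 and threshold θ would have to satisfy for XOR are mutually inconsistent:

$$
\begin{aligned}
(0,0)\mapsto 0:&\quad 0 < \theta\\
(0,1)\mapsto 1:&\quad w_2 \ge \theta\\
(1,0)\mapsto 1:&\quad w_1 \ge \theta\\
(1,1)\mapsto 0:&\quad w_1 + w_2 < \theta
\end{aligned}
\qquad\Longrightarrow\qquad
2\theta \;\le\; w_1 + w_2 \;<\; \theta,
$$

which forces θ < 0 and contradicts 0 < θ from the first line.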

Solutions

- Tolerate error (e.g., the pocket algorithm used by connectionist expert systems): try to get the best possible hyperplane using only perceptrons.
- Use higher-dimensional surfaces, e.g., degree-2 surfaces like the parabola.
- Use a layered network.

Pocket Algorithm

The algorithm evolved in 1985 and essentially uses PTA.
Basic idea:
- Always preserve the best weights obtained so far in the "pocket".
- Change the weights only if the changed weights are found to be better (i.e. they result in reduced error).
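A minimal sketch of this idea (my rendering under simple assumptions, not the original 1985 formulation; the error measure here is just the count of misclassified vectors):

```python
import random

def pocket(vectors, w=(0, 0, 0), iters=1000, seed=0):
    """Pocket algorithm sketch: run perceptron-style updates, but keep the
    best weights seen so far (fewest failed vectors) in the 'pocket'."""
    rng = random.Random(seed)
    w = list(w)

    def errors(weights):
        return sum(1 for v in vectors
                   if sum(wi * vi for wi, vi in zip(weights, v)) <= 0)

    pocket_w, pocket_err = list(w), errors(w)
    for _ in range(iters):
        failed = [v for v in vectors
                  if sum(wi * vi for wi, vi in zip(w, v)) <= 0]
        if not failed:
            return w                        # data fully separated: done
        v = rng.choice(failed)              # perceptron-style update on a failed vector
        w = [wi + vi for wi, vi in zip(w, v)]
        if errors(w) < pocket_err:          # found better weights: update the pocket
            pocket_w, pocket_err = list(w), errors(w)
    return pocket_w                         # best hyperplane found so far
```

On linearly separable data (such as the NAND vectors above) this behaves like PTA; on non-separable data it returns the best weights it has seen.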

XOR using 2 layers A non-linearly-separable function expressed as a linearly separable function of individually linearly separable functions.

Example - XOR

XOR(x1, x2) = (x1 AND NOT x2) OR (NOT x1 AND x2).

[Figure: a two-layer network computing XOR. Each hidden unit computes one of the terms (x1 AND NOT x2), (NOT x1 AND x2) directly from x1 and x2, using weights 1.5 and -1 with threshold θ = 1. The output unit computes the OR of the two hidden outputs, with weights w1 = w2 = 1 and threshold θ = 0.5.]
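A small Python sketch of this construction (added here; the weight and threshold values are the ones legible in the figure, and their placement on the inputs is an assumption):

```python
def step(net, theta):
    """Threshold unit: fires (returns 1) iff the net input reaches the threshold."""
    return 1 if net >= theta else 0

def xor_two_layer(x1, x2):
    # Hidden layer: the two linearly separable AND-NOT terms (theta = 1).
    h1 = step(1.5 * x1 - 1.0 * x2, 1.0)    # x1 AND NOT x2
    h2 = step(-1.0 * x1 + 1.5 * x2, 1.0)   # NOT x1 AND x2
    # Output layer: OR of the hidden outputs (w1 = w2 = 1, theta = 0.5).
    return step(h1 + h2, 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_two_layer(a, b))   # reproduces the XOR truth table
```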

Some Terminology A multilayer feedforward neural network has an input layer, an output layer, and a hidden layer (which assists computation). Output units and hidden units are called computation units.

Training of the MLP Multilayer Perceptron (MLP). Question: how do we find weights for the hidden layers when no target output is available for them? This is the credit assignment problem – to be solved by "Gradient Descent".