
1 CS344: Introduction to Artificial Intelligence (associated lab: CS386) Pushpak Bhattacharyya, CSE Dept., IIT Bombay. Lecture 30: Perceptron training convergence; Feedforward N/W. 24th March, 2011

2 PTA on NAND
NAND truth table:
    X2  X1 | Y
    0   0  | 1
    0   1  | 1
    1   0  | 1
    1   1  | 0
The perceptron computing Y from X2, X1 with weights W2, W1 and threshold Θ is converted to one with weights W2, W1, W0 = Θ on inputs X2, X1, X0 = -1, so the threshold becomes an ordinary weight on a constant input.
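In symbols, the conversion is just the threshold rewritten as a weight on a constant input (a restatement of the slide in LaTeX, nothing beyond it):

```latex
% Threshold \Theta absorbed as a bias weight W_0 = \Theta on the constant input X_0 = -1
\[
W_2 X_2 + W_1 X_1 > \Theta
\;\iff\;
W_2 X_2 + W_1 X_1 + \Theta\,(-1) > 0
\;\iff\;
\langle W_2, W_1, W_0\rangle \cdot \langle X_2, X_1, X_0\rangle > 0 .
\]
```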

3 Preprocessing
NAND augmented with X0 = -1; the single 0-class row is then negated:
    X2  X1  X0 | Y        0-class negated
    0   0   -1 | 1        V0: <0, 0, -1>
    0   1   -1 | 1        V1: <0, 1, -1>
    1   0   -1 | 1        V2: <1, 0, -1>
    1   1   -1 | 0        V3: <-1, -1, 1>
Vectors for which W = <W2, W1, W0> has to be found such that W.Vi > 0 for all i.
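A small Python sketch of this preprocessing (illustrative code; the names NAND and preprocess are mine, not from the slides): each row is augmented with X0 = -1, and rows whose output is 0 are negated so that a single test W.V > 0 covers both classes.

```python
# Sketch: augment NAND rows with x0 = -1 and negate 0-class rows,
# so every training constraint becomes w . v > 0.
NAND = [((0, 0), 1), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def preprocess(truth_table):
    vectors = []
    for (x2, x1), y in truth_table:
        v = (x2, x1, -1)                  # absorb the threshold: x0 = -1
        if y == 0:                        # negate 0-class vectors
            v = tuple(-c for c in v)
        vectors.append(v)
    return vectors

print(preprocess(NAND))
# [(0, 0, -1), (0, 1, -1), (1, 0, -1), (-1, -1, 1)]  -> V0..V3
```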

4 PTA Algo steps
Algorithm: initialize W and keep adding the failed vectors until W.Vi > 0 holds for every Vi.
Step 0: W0 = initial weight vector
W1 = W0 + V0   {V0 fails}
W2 = W1 + V3   {V3 fails}
W3 = W2 + V0   {V0 fails}
W4 = W3 + V1   {V1 fails}

5 Trying convergence
W5 = W4 + V3   {V3 fails}
W6 = W5 + V1   {V1 fails}
W7 = W6 + V0   {V0 fails}
W8 = W7 + V3   {V3 fails}
W9 = W8 + V2   {V2 fails}

6 Trying convergence
W10 = W9 + V3   {V3 fails}
W11 = W10 + V1   {V1 fails}
W12 = W11 + V3   {V3 fails}
W13 = W12 + V1   {V1 fails}
W14 = W13 + V2   {V2 fails}

7 Converged!
W15 = W14 + V3   {V3 fails}
W16 = W15 + V2   {V2 fails}
W17 = W16 + V3   {V3 fails}
W18 = W17 + V1   {V1 fails}
Converged: W2 = -3, W1 = -2, W0 = Θ = -4
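A short Python sketch of the training loop traced in slides 4-7 (illustrative; the names dot and pta are mine, the initial weights are assumed to be <0, 0, 0>, and a different choice of initial weights or of which failed vector to add can change the number of steps).

```python
# Sketch of the Perceptron Training Algorithm (PTA) on the preprocessed
# NAND vectors V0..V3: add any vector that fails the test w . v > 0,
# and repeat until no vector fails.
def dot(w, v):
    return sum(wi * vi for wi, vi in zip(w, v))

def pta(vectors, w=(0, 0, 0), max_steps=1000):
    w = list(w)
    for _ in range(max_steps):
        failed = [v for v in vectors if dot(w, v) <= 0]
        if not failed:                     # every constraint satisfied
            return tuple(w)
        v = failed[0]                      # pick one failed vector
        w = [wi + vi for wi, vi in zip(w, v)]
    raise RuntimeError("did not converge within max_steps")

V = [(0, 0, -1), (0, 1, -1), (1, 0, -1), (-1, -1, 1)]   # V0..V3 from slide 3
w2, w1, w0 = pta(V)
print(w2, w1, w0)   # -> -3 -2 -4 with these defaults: a valid separating vector
```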

8 PTA convergence

9 Statement of Convergence of PTA Statement: Whatever the initial choice of weights and whichever vectors are chosen for testing, PTA converges if the vectors come from a linearly separable function.
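Stated compactly in the notation of the proof that follows (a restatement only): linear separability supplies a vector w* with w*.Vi > 0 for every training vector, and the theorem says the update loop then stops after finitely many steps.

```latex
% Perceptron convergence, restated in the notation of slides 10-15
\[
\exists\, w^{*} \ \text{with}\ w^{*}\!\cdot V_i > 0 \ \ \forall i
\quad\Longrightarrow\quad
\text{PTA makes only finitely many updates } w_{n+1} = w_n + X_n^{\mathrm{fail}} .
\]
```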

10 Proof of Convergence of PTA
Suppose w_n is the weight vector at the nth step of the algorithm; at the beginning, the weight vector is w_0.
Go from w_i to w_{i+1} when a vector X_j fails the test w_i.X_j > 0, and update w_i as w_{i+1} = w_i + X_j.
Since the X_j come from a linearly separable function, ∃ w* s.t. w*.X_j > 0 ∀ j.

11 Proof of Convergence of PTA (contd.)
Consider the expression
    G(w_n) = (w_n . w*) / |w_n|, where w_n = weight at the nth iteration.
Then
    G(w_n) = (|w_n| . |w*| . cos θ) / |w_n|, where θ = angle between w_n and w*
    G(w_n) = |w*| . cos θ
    G(w_n) ≤ |w*|    (as -1 ≤ cos θ ≤ 1)

12 Behavior of Numerator of G
w_n . w* = (w_{n-1} + X^fail_{n-1}) . w*
         = w_{n-1} . w* + X^fail_{n-1} . w*
         = (w_{n-2} + X^fail_{n-2}) . w* + X^fail_{n-1} . w*
         = ...
         = w_0 . w* + (X^fail_0 + X^fail_1 + ... + X^fail_{n-1}) . w*
w* . X^fail_i is always positive: note carefully.
Suppose |X_j| ≥ δ, where δ is the minimum magnitude.
Then: Numerator of G ≥ w_0 . w* + n.δ.|w*|
So the numerator of G grows with n.

13 Behavior of Denominator of G
|w_n| = √(w_n . w_n)
      = √( (w_{n-1} + X^fail_{n-1})² )
      = √( (w_{n-1})² + 2.w_{n-1}.X^fail_{n-1} + (X^fail_{n-1})² )
      ≤ √( (w_{n-1})² + (X^fail_{n-1})² )     (as w_{n-1} . X^fail_{n-1} ≤ 0)
      ≤ √( (w_0)² + (X^fail_0)² + (X^fail_1)² + ... + (X^fail_{n-1})² )
Suppose |X_j| ≤ μ, where μ is the maximum magnitude.
Then: Denominator of G ≤ √( (w_0)² + n.μ² )

14 Some Observations
The numerator of G grows as n.
The denominator of G grows as √n.
=> The numerator grows faster than the denominator.
If PTA does not terminate, the G(w_n) values become unbounded.
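Combining the two bounds makes the contradiction explicit (a worked step under the same assumptions as slides 12-13, i.e. |X_j| ≥ δ and |X_j| ≤ μ):

```latex
% Numerator bound over denominator bound: G(w_n) grows at least like sqrt(n)
\[
G(w_n) \;=\; \frac{w_n \cdot w^{*}}{|w_n|}
\;\ge\; \frac{w_0 \cdot w^{*} + n\,\delta\,|w^{*}|}{\sqrt{|w_0|^{2} + n\,\mu^{2}}}
\;\xrightarrow{\;n \to \infty\;}\; \infty ,
\]
% which contradicts G(w_n) <= |w^*| (slide 11) unless the number of updates n is finite.
```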

15 Some Observations contd. But, since |G(w_n)| ≤ |w*|, which is finite, this is impossible! Hence, PTA has to converge. The proof is due to Marvin Minsky.

16 Convergence of PTA proved Whatever the initial choice of weights and whichever vectors are chosen for testing, PTA converges if the vectors come from a linearly separable function.

17 Feedforward Network

18 Limitations of perceptron Non-linear separability is all-pervading. A single perceptron does not have enough computing power. E.g., XOR cannot be computed by a perceptron (shown below).
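Why a single perceptron cannot compute XOR (a short supporting argument in LaTeX, not from the slides themselves): the four truth-table constraints on weights w1, w2 and threshold θ are mutually inconsistent.

```latex
% A single unit would need w_1 x_1 + w_2 x_2 > \theta exactly on the 1-rows of XOR:
\begin{align*}
(0,0)\mapsto 0 &: \quad 0 \le \theta           &  (1,1)\mapsto 0 &: \quad w_1 + w_2 \le \theta \\
(1,0)\mapsto 1 &: \quad w_1 > \theta           &  (0,1)\mapsto 1 &: \quad w_2 > \theta
\end{align*}
% Adding the two strict inequalities gives w_1 + w_2 > 2\theta; since \theta \ge 0
% (first constraint), 2\theta \ge \theta, contradicting w_1 + w_2 \le \theta.
```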

19 Solutions
- Tolerate error (e.g., the pocket algorithm used by connectionist expert systems): try to get the best possible hyperplane using only perceptrons.
- Use higher-dimensional surfaces (e.g., degree-2 surfaces like the parabola).
- Use a layered network.

20 Pocket Algorithm
The algorithm evolved in 1985 and essentially uses PTA. Basic idea:
- Always preserve the best weights obtained so far in the "pocket".
- Replace the pocket weights whenever better ones are found (i.e., the changed weights result in reduced error).
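A minimal Python sketch of the pocket idea (my own simplified illustration, not Gallant's exact formulation; the names errors and pocket are mine): run ordinary PTA updates, but keep a copy of the weights that have misclassified the fewest training vectors so far and return those.

```python
# Sketch of the pocket algorithm: PTA updates, plus a "pocket" that keeps
# the best weight vector (fewest failed constraints) seen so far.
def dot(w, v):
    return sum(wi * vi for wi, vi in zip(w, v))

def errors(w, vectors):
    return sum(1 for v in vectors if dot(w, v) <= 0)

def pocket(vectors, w=(0, 0, 0), steps=1000):
    w = list(w)
    best_w, best_err = list(w), errors(w, vectors)
    for _ in range(steps):
        failed = [v for v in vectors if dot(w, v) <= 0]
        if not failed:
            return tuple(w)                             # perfectly separable case
        w = [wi + vi for wi, vi in zip(w, failed[0])]   # ordinary PTA update
        err = errors(w, vectors)
        if err < best_err:                              # pocket the improvement
            best_w, best_err = list(w), err
    return tuple(best_w)                                # best weights found so far
```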

21 XOR using 2 layers A non-linearly separable (non-LS) function is expressed as a linearly separable function of individual linearly separable functions.

22 Example - XOR
XOR is computed from the two linearly separable functions x1.¬x2 and ¬x1.x2.
Calculation of ¬x1.x2 (single perceptron with w2 = 1.5 on x2, w1 = -1 on x1, θ = 1):
    x1  x2 | ¬x1.x2
    0   0  | 0
    0   1  | 1
    1   0  | 0
    1   1  | 0
Calculation of XOR: a single perceptron over the two hidden outputs x1.¬x2 and ¬x1.x2, with w2 = 1, w1 = 1, θ = 0.5.

23 Example - XOR
[Figure: the complete two-layer network. Inputs x1 and x2 feed two hidden perceptrons computing x1.¬x2 and ¬x1.x2 (each with weights 1.5 and -1, θ = 1); their outputs feed the output perceptron with w2 = 1, w1 = 1, θ = 0.5, which computes XOR.]
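A small Python check of this construction, using the weights and thresholds read off slides 22-23 (the helper names step and xor are mine):

```python
# Two-layer XOR from slides 22-23: two hidden perceptrons compute
# x1 AND (NOT x2) and (NOT x1) AND x2; the output perceptron ORs them.
def step(total, theta):
    return 1 if total > theta else 0

def xor(x1, x2):
    h1 = step(1.5 * x1 - 1.0 * x2, 1.0)    # x1 . not(x2)
    h2 = step(1.5 * x2 - 1.0 * x1, 1.0)    # not(x1) . x2
    return step(1.0 * h1 + 1.0 * h2, 0.5)  # OR of the hidden outputs

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor(a, b))   # prints the XOR truth table: 0, 1, 1, 0
```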

24 Some Terminology
A multilayer feedforward neural network has:
- an input layer,
- an output layer,
- a hidden layer (assists computation).
Output units and hidden units are called computation units.
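For concreteness, a tiny generic forward pass through such a network (a sketch under my own conventions, not course code; the names step and forward are mine, and the example layers simply reuse the XOR weights above): each computation unit applies a step function to the weighted sum of the previous layer's outputs.

```python
# Generic forward pass of a feedforward network of threshold units.
# Each layer is a list of (weights, theta) pairs, one per computation unit.
def step(total, theta):
    return 1 if total > theta else 0

def forward(x, layers):
    activation = list(x)
    for layer in layers:                          # hidden layer(s), then output
        activation = [step(sum(w * a for w, a in zip(weights, activation)), theta)
                      for weights, theta in layer]
    return activation

# Example: the two-layer XOR network, written as layers of (weights, theta) units.
xor_net = [
    [([1.5, -1.0], 1.0), ([-1.0, 1.5], 1.0)],     # hidden: x1.not(x2), not(x1).x2
    [([1.0, 1.0], 0.5)],                          # output: OR
]
print([forward([a, b], xor_net) for a in (0, 1) for b in (0, 1)])
# [[0], [1], [1], [0]]
```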

25 Training of the MLP (Multilayer Perceptron) Question: how do we find weights for the hidden layers when no target output is available for them? This is the credit assignment problem, to be solved by "Gradient Descent".

