CS623: Introduction to Computing with Neural Nets (lecture-9) Pushpak Bhattacharyya Computer Science and Engineering Department IIT Bombay.

Transformation from set-splitting to positive linear confinement (for proving NP-completeness of the latter)
S = {s1, s2, s3}; c1 = {s1, s2}, c2 = {s2, s3}
Constructed points in (x1, x2, x3) space:
– +ve: (0,0,0), (1,1,0), (0,1,1)
– -ve: (1,0,0), (0,1,0), (0,0,1)
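The construction just described can be made concrete with a small sketch (mine, not from the slides); indexing elements as 0..n-1 and representing requirement sets as Python sets are assumptions of this illustration:

```python
# Sketch of the reduction: build the PLC point sets from a set-splitting
# instance with n elements and requirement sets c_1, ..., c_k.

def plc_points(n, requirement_sets):
    """Return (+ve points, -ve points) of the constructed PLC instance."""
    positives = [tuple(0 for _ in range(n))]          # the origin is a +ve point
    for c in requirement_sets:                        # one +ve point per c_j:
        positives.append(tuple(1 if i in c else 0 for i in range(n)))
    # one -ve point per element s_i: the unit vector along coordinate i
    negatives = [tuple(1 if i == j else 0 for i in range(n)) for j in range(n)]
    return positives, negatives

# The slide's instance: S = {s1, s2, s3}, c1 = {s1, s2}, c2 = {s2, s3}
pos, neg = plc_points(3, [{0, 1}, {1, 2}])
# pos -> (0,0,0), (1,1,0), (0,1,1)      neg -> (1,0,0), (0,1,0), (0,0,1)
```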

Proving the transformation
Statement:
– The set-splitting problem has a solution if and only if the positive linear confinement problem has a solution.
Proof in two parts: the "if" part and the "only if" part.
If part:
– Given: the set-splitting problem has a solution.
– To show: the constructed Positive Linear Confinement (PLC) problem has a solution,
– i.e., since S1 and S2 exist, planes P1 and P2 exist which confine the positive points.

Proof of the if part
P1 and P2 are as follows:
– P1: a1 x1 + a2 x2 + ... + an xn = -1/2   -- Eqn (A)
– P2: b1 x1 + b2 x2 + ... + bn xn = -1/2   -- Eqn (B)
where
– ai = -1 if si ∈ S1, and ai = n otherwise
– bi = -1 if si ∈ S2, and bi = n otherwise
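The coefficient rule can be written down directly; this is a minimal sketch under the same indexing assumption as before (elements indexed 0..n-1, S1 and S2 as Python sets):

```python
def confinement_planes(n, S1, S2):
    """Coefficient vectors of P1 and P2; both planes use the threshold -1/2."""
    a = [-1 if i in S1 else n for i in range(n)]   # P1: sum_i a_i x_i = -1/2
    b = [-1 if i in S2 else n for i in range(n)]   # P2: sum_i b_i x_i = -1/2
    return a, b

# The example below (S1 = {s1, s3}, S2 = {s2}) gives a = [-1, 3, -1], b = [3, -1, 3]
```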

What is proved
– The origin (a +ve point) is on one side of P1.
– The +ve points corresponding to the ci's are on the same side as the origin.
– The -ve points corresponding to S1 are on the opposite side of P1.
– (Symmetrically for P2 and S2.)

Illustrative Example
S = {s1, s2, s3}; c1 = {s1, s2}, c2 = {s2, s3}
Splitting: S1 = {s1, s3}, S2 = {s2}
+ve points:
– ((0,0,0), +), ((1,1,0), +), ((0,1,1), +)
-ve points:
– ((1,0,0), -), ((0,1,0), -), ((0,0,1), -)

Example (contd.)
The constructed planes are:
– P1: a1 x1 + a2 x2 + a3 x3 = -1/2, i.e., -x1 + 3x2 - x3 = -1/2
– P2: b1 x1 + b2 x2 + b3 x3 = -1/2, i.e., 3x1 - x2 + 3x3 = -1/2
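As a sanity check (my own, not part of the slides), the confinement can be verified numerically: every +ve point lies on the origin's side of both planes, and every -ve point lies on the far side of at least one of them.

```python
def origin_side(coeffs, point, theta=-0.5):
    """True if the point is on the same side of the plane as the origin."""
    return sum(c * x for c, x in zip(coeffs, point)) > theta

a, b = [-1, 3, -1], [3, -1, 3]               # P1 and P2 from the example
pos = [(0, 0, 0), (1, 1, 0), (0, 1, 1)]      # +ve points
neg = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]      # -ve points

assert all(origin_side(a, p) and origin_side(b, p) for p in pos)        # confined
assert all(not origin_side(a, q) or not origin_side(b, q) for q in neg)  # cut off
```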

Graphic for Example
(Figure: the planes P1 and P2 in the input space; the +ve points lie on the origin's side of both planes, the -ve points of S1 lie across P1 and those of S2 across P2.)

Proof of the only if part
– Given: for the +ve and -ve points constructed from the set-splitting problem, two hyperplanes P1 and P2 have been found which achieve positive linear confinement.
– To show: S can be split into S1 and S2.

Proof (only if part) contd.
Let the two planes be:
– P1: a1 x1 + a2 x2 + ... + an xn = θ1
– P2: b1 x1 + b2 x2 + ... + bn xn = θ2
Then,
– S1 = {elements corresponding to the -ve points separated by P1}
– S2 = {elements corresponding to the -ve points separated by P2}
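A sketch of how the splitting would be read off from the planes (illustrative only; `negatives` is assumed to hold the unit-vector -ve points in element order, and a point not separated by P1 is assigned to S2, since the confinement guarantees P2 separates it):

```python
def split_from_planes(a, theta1, negatives):
    """Put s_j into S1 if P1 separates its -ve point from the origin, else into S2."""
    S1, S2 = set(), set()
    for j, e in enumerate(negatives):                   # e is the unit vector of s_j
        on_origin_side = sum(c * x for c, x in zip(a, e)) > theta1
        (S2 if on_origin_side else S1).add(j)           # not cut by P1 -> P2 must cut it
    return S1, S2
```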

Proof (only if part) contd.
Suppose some ci ⊆ S1, i.e., every element of ci is contained in S1.
Let e1, e2, ..., em be the elements of ci (m = |ci|), with corresponding coefficients a1, ..., am in P1. Evaluating P1 at the associated -ve points, at the +ve point of ci, and at the origin, we get:
– a1 < θ1, a2 < θ1, ..., am < θ1   -- (1)
– but a1 + a2 + ... + am > θ1      -- (2)
– and 0 > θ1                       -- (3)
CONTRADICTION
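Spelled out, the contradiction is the following chain (my reconstruction of the step the slide leaves implicit):

```latex
a_j < \theta_1 \;(j = 1,\dots,m), \qquad \theta_1 < 0
\;\Longrightarrow\; \sum_{j=1}^{m} a_j \;<\; m\,\theta_1 \;\le\; \theta_1,
\qquad \text{contradicting} \qquad \sum_{j=1}^{m} a_j \;>\; \theta_1 .
```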

What has been shown
– Positive Linear Confinement is NP-complete.
– Confinement on any set of points of one kind is NP-complete (easy to show).
– The architecture is special: only one hidden layer with two nodes.
– The neurons are special: 0-1 threshold neurons, NOT sigmoid.
– Hence, can we generalize and say that feedforward neural network training is NP-complete? Not rigorously, perhaps, but it is strongly indicated.

Accelerating Backpropagation

Quickprop: Fahlman, '88
Assume a parabolic approximation to the error surface.
(Figure: the actual error surface and its parabolic approximation plotted as E versus w, marking w(t-1), w(t), the true error minimum, and the new weight proposed by Quickprop.)

Quickprop basically makes use of the second derivative.
E = a w^2 + b w + c
First derivative: E' = δE/δw = 2aw + b
– E'(t-1) = 2a w(t-1) + b   --- (1)
– E'(t)   = 2a w(t)   + b   --- (2)
Solving (1) and (2):
– a = [E'(t) - E'(t-1)] / (2[w(t) - w(t-1)])
– b = E'(t) - [E'(t) - E'(t-1)] w(t) / [w(t) - w(t-1)]

Next weight w(t+1)
This is found by minimizing E(t+1): set E'(t+1) to 0, i.e., 2a w(t+1) + b = 0.
Substituting for a and b:
– w(t+1) - w(t) = [w(t) - w(t-1)] × E'(t) / [E'(t-1) - E'(t)]
– i.e., Δw(t) = Δw(t-1) × E'(t) / [E'(t-1) - E'(t)]
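A minimal sketch of this update rule (an illustration only, not Fahlman's complete Quickprop, which adds safeguards such as a maximum growth factor and a fallback gradient term):

```python
def quickprop_step(dw_prev, grad_prev, grad_curr):
    """One Quickprop weight change from the parabola fit on the slide.

    dw_prev   : previous weight change, Delta w(t-1)
    grad_prev : E'(t-1), error derivative at the previous weight
    grad_curr : E'(t),   error derivative at the current weight
    """
    denom = grad_prev - grad_curr
    if denom == 0.0:              # flat parabola: the fit gives no step
        return 0.0
    # Delta w(t) = Delta w(t-1) * E'(t) / (E'(t-1) - E'(t))
    return dw_prev * grad_curr / denom

# Example with E(w) = (w - 3)^2, i.e. E'(w) = 2(w - 3).
# Moving from w = 0 to w = 1 gives Delta w(t-1) = 1, E'(t-1) = -6, E'(t) = -4.
# The step is 1 * (-4) / (-6 + 4) = 2, landing exactly on the minimum w = 3,
# as expected, since this E is itself a parabola.
print(quickprop_step(1.0, -6.0, -4.0))   # 2.0
```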

How to fix the hidden layer size

– Mostly by trial and error; no theory yet.
– Read the papers by Eric Baum.
– Heuristics exist, such as taking a mean of the sizes of the i/p and o/p layers (arithmetic, geometric or harmonic).
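A tiny illustration of those heuristics (my own snippet; the slides only name the three means):

```python
import math

def hidden_size_heuristics(n_in, n_out):
    """Arithmetic, geometric and harmonic means of the i/p and o/p layer sizes."""
    arithmetic = (n_in + n_out) / 2
    geometric = math.sqrt(n_in * n_out)
    harmonic = 2 * n_in * n_out / (n_in + n_out)
    return arithmetic, geometric, harmonic

print(hidden_size_heuristics(10, 2))   # (6.0, about 4.47, about 3.33)
```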

Grow the network: the Tiling Algorithm
– A kind of divide-and-conquer strategy.
– Given the classes in the data, run the perceptron training algorithm.
– If the data are linearly separable, it converges without any hidden layer.
– If not, do as well as you can (pocket algorithm).
– This will produce classes with misclassified points.

Tiling Algorithm (contd.)
– Take each class with misclassified points and break it into subclasses which contain no outliers.
– Run the PTA again after recruiting the required number of perceptrons.
– Do this until homogeneous classes are obtained.
– Apply the same procedure to the first hidden layer to obtain the second hidden layer, and so on.
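The growth loop can be sketched roughly as follows. This is only an outline of the strategy described above: `pocket_train` (train a perceptron as well as possible and report the misclassified points) and `split_outlier_free` (break a class with errors into outlier-free subclasses) are hypothetical helpers whose details the slides leave open.

```python
def grow_layer(patterns, labels, pocket_train, split_outlier_free):
    """Recruit perceptrons for one layer, tiling-algorithm style."""
    units = []
    pending = [(patterns, labels)]              # classes still to be handled
    while pending:
        pats, labs = pending.pop()
        unit, misclassified = pocket_train(pats, labs)   # do as well as possible
        units.append(unit)
        if misclassified:                       # class is not homogeneous yet:
            # break it up and recruit further perceptrons for the pieces
            pending.extend(split_outlier_free(pats, labs, misclassified))
    return units   # outputs of these units feed the next layer, built the same way
```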

Illustration: the XOR problem
The classes are:
– {(0, 0), (1, 1)}
– {(0, 1), (1, 0)}

As good a classification as possible
(Figure: the four points (0,0), (0,1), (1,0), (1,1) in the input plane with a single separating line; three points are classified correctly, one falls on the wrong side.)

Classes with error
(Figure: the resulting classes, with (1,1) marked as an outlier in the class containing (0,1) and (1,0); (0,0) forms the other class.)

How to achieve this classification
Give the labels as shown: equivalent to an OR problem.
(Figure: (0,0) labelled -, and (0,1), (1,0), (1,1) labelled +, the outlier (1,1) being given +.)

The partially developed n/w
Get the first neuron in the hidden layer, which computes OR.
(Figure: inputs x1 and x2 feeding the hidden neuron h1, which computes x1 + x2.)

Break the incorrect class
(Figure: the class {(0,1), (1,0), (1,1)} with outlier (1,1) is relabelled: (0,1) and (1,0) stay +, (1,1) becomes -, and (0,0) is a don't-care, made +.)

Solve the classification for h2
(Figure: (0,0), (0,1), (1,0) labelled + and (1,1) labelled -.)
This is the complement of x1 x2, i.e., NOT(x1 AND x2).

Next stage of the n/w
(Figure: inputs x1 and x2 feeding two hidden neurons: h1, which computes x1 + x2, and h2, which computes the complement of x1 x2.)

Getting the output layer
Solve a tiling-algorithm problem for the hidden layer outputs:

  x1  x2 | h1 (= x1 + x2) | h2 (= complement of x1 x2) | y
   0   0 |       0        |             1              | 0
   0   1 |       1        |             1              | 1
   1   0 |       1        |             1              | 1
   1   1 |       1        |             0              | 0

(Figure: in the (h1, h2) plane, (1,1) is + and the other points are -: an AND problem.)

Final n/w
(Figure: inputs x1 and x2 feed hidden neurons h1, which computes x1 + x2, and h2, which computes the complement of x1 x2; an AND n/w combines h1 and h2 to produce the output y, which computes x1 XOR x2.)
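To make the construction concrete, here is a small check of the final network (my own weights for the 0-1 threshold neurons; the slides give only the Boolean function each node computes, not numeric weights):

```python
def step(z):
    """0-1 threshold neuron."""
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)       # h1: x1 OR x2
    h2 = step(-x1 - x2 + 1.5)      # h2: NOT (x1 AND x2)
    return step(h1 + h2 - 1.5)     # y : h1 AND h2  =  x1 XOR x2

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))   # prints the XOR truth table
```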

Lab exercise
Implement the tiling algorithm and run it for:
1. XOR
2. Majority
3. IRIS data