
1 Neural Networks I CMPUT 466/551 Nilanjan Ray

2 Outline
– Projection Pursuit Regression
– Neural Network
  – Background
  – Vanilla Neural Networks
  – Back-propagation
  – Examples

3 Projection Pursuit Regression
– Additive model with non-linear ridge functions g_m: f(X) = sum_{m=1..M} g_m(w_m^T X)
– The features X are projected onto direction vectors w_m, which we have to estimate from the training data (along with the g_m)
– A precursor to neural networks
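A minimal sketch of how a fitted PPR model makes predictions (the helper name, the use of numpy, and the hand-picked ridge functions are illustrative assumptions, not part of the slides):

```python
import numpy as np

def ppr_predict(X, directions, ridge_functions):
    """Evaluate a PPR model f(X) = sum_m g_m(w_m^T X).

    X               : (n, p) array of inputs
    directions      : list of (p,) direction vectors w_m
    ridge_functions : list of callables g_m (e.g. fitted splines)
    """
    f = np.zeros(X.shape[0])
    for w, g in zip(directions, ridge_functions):
        f += g(X @ w)          # each term looks at one projection of X
    return f

# Example with M = 2 hand-picked terms (purely illustrative, not fitted)
X = np.random.randn(5, 3)
directions = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 1.0]) / np.sqrt(2)]
ridge_functions = [np.tanh, np.square]
print(ppr_predict(X, directions, ridge_functions))
```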

4 Fitting a PPR Model
– Minimize the squared-error loss sum_i [y_i − sum_m g_m(w_m^T x_i)]^2
– Proceed in forward stages m = 1, 2, …:
  – At each stage, estimate g given w (say, by fitting a spline)
  – Then estimate w given g (details on the next slide)
– The value of M is decided by cross-validation

5 Fitting a PPR Model…
– At stage m, given g, compute w by a Gauss-Newton search: linearize g around the current estimate w_old,
  g(w^T x_i) ≈ g(w_old^T x_i) + g'(w_old^T x_i) (w − w_old)^T x_i
– This reduces the problem to weighted least squares, with
  weights g'(w_old^T x_i)^2 and
  adjusted responses w_old^T x_i + (r_i − g(w_old^T x_i)) / g'(w_old^T x_i),
  where r_i is the residual after stage m−1
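A sketch of one such Gauss-Newton / weighted least-squares update of the direction w (the function names, the tanh ridge function, and the toy data are illustrative assumptions):

```python
import numpy as np

def ppr_direction_update(X, r, w_old, g, g_prime, eps=1e-8):
    """One Gauss-Newton update of the projection direction w, given the
    current ridge function g and its derivative g_prime.

    X : (n, p) inputs; r : (n,) residuals from the previous stages.
    Returns the new (unit-norm) direction.
    """
    v = X @ w_old                       # current projections w_old^T x_i
    gp = g_prime(v)
    weights = gp ** 2                   # Gauss-Newton weights
    # adjusted responses: where the linearized model says w^T x_i should land
    z = v + (r - g(v)) / np.where(np.abs(gp) > eps, gp, eps)
    # weighted least squares: minimize sum_i weights_i * (z_i - w^T x_i)^2
    W = np.diag(weights)
    w_new = np.linalg.solve(X.T @ W @ X, X.T @ W @ z)
    return w_new / np.linalg.norm(w_new)

# Illustrative use with g = tanh (assumed, not fitted from data)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -1.0, 0.5]); w_true /= np.linalg.norm(w_true)
r = np.tanh(X @ w_true)                 # pretend residuals follow one ridge term
w = np.array([1.0, 0.0, 0.0])
for _ in range(10):
    w = ppr_direction_update(X, r, w, np.tanh, lambda v: 1 - np.tanh(v) ** 2)
print(w, w_true)                        # w should move toward w_true
```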

6 Vanilla Neural Network
– figure: input layer, hidden layer, output layer
– Hidden units Z_m = σ(α_0m + α_m^T X); outputs f_k(X) = g_k(β_0k + β_k^T Z)

7 The Sigmoid Function
– σ(sv) = 1/(1 + exp(−sv)): a smooth (regularized) threshold function
– s controls the activation rate: large s gives a hard (step-like) activation; small s makes the unit nearly linear (close to the identity) around 0
– figure: σ(0.5v) and σ(10v) plotted against v
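A small sketch of the activation-rate effect (the numpy usage and sample points are illustrative):

```python
import numpy as np

def sigmoid(v, s=1.0):
    """sigma(s*v) = 1 / (1 + exp(-s*v)); s sets how sharp the activation is."""
    return 1.0 / (1.0 + np.exp(-s * v))

v = np.linspace(-5, 5, 11)
print(sigmoid(v, s=0.5))   # gentle slope, nearly linear around 0
print(sigmoid(v, s=10.0))  # sharp slope, close to a hard threshold
```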

8 Multilayer Feed-Forward NN
– figure: example architectures, from http://www.teco.uni-karlsruhe.de/~albrecht/neuro/html/node18.html

9 NN: Universal Approximator
A NN with one hidden layer can approximate arbitrarily well any continuous functional mapping from one finite-dimensional space to another, provided the number of hidden units is sufficiently large. The proof is based on the Fourier expansion of a function (see Bishop).

10 NN: Kolmogorov's Theorem
Any continuous mapping f(x) of d input variables can be represented by a neural network with two hidden layers of nodes, where the first layer contains d(2d+1) nodes and the second layer contains (2d+1) nodes. So why bother about topology at all? This 'universal' architecture is impractical because the functions represented by the hidden units are non-smooth and unsuitable for learning. (See Bishop for more.)

11 The XOR Problem and NN Activation functions are hard thresholds at 0
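The slide's figure is not reproduced here; below is one standard (illustrative) weight assignment that solves XOR with hard-threshold units:

```python
import numpy as np

def step(z):
    """Hard threshold at 0: output 1 when the weighted input is >= 0, else 0."""
    return np.where(np.asarray(z) >= 0, 1, 0)

def xor_net(x1, x2):
    # Hidden unit 1: fires when at least one input is on (OR)
    h1 = step(x1 + x2 - 0.5)
    # Hidden unit 2: fires only when both inputs are on (AND)
    h2 = step(x1 + x2 - 1.5)
    # Output: OR and not AND  ->  XOR
    return step(h1 - h2 - 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))   # prints the XOR truth table
```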

12 Fitting Neural Networks
– Parameters (weights) are learned from the training data
– Cost functions:
  – Sum-of-squared errors for regression
  – Cross-entropy error for classification
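A minimal sketch of the two cost functions (the numpy usage and the toy values are illustrative):

```python
import numpy as np

def sum_of_squared_errors(y, f):
    """Regression loss: R = sum_i sum_k (y_ik - f_k(x_i))^2."""
    return np.sum((y - f) ** 2)

def cross_entropy(y, p, eps=1e-12):
    """Classification loss: R = -sum_i sum_k y_ik * log p_k(x_i),
    with y one-hot and p the network's class probabilities."""
    return -np.sum(y * np.log(p + eps))

# toy example with 3 samples and 2 classes (values made up for illustration)
y = np.array([[1, 0], [0, 1], [1, 0]])
p = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
print(sum_of_squared_errors(y, p))
print(cross_entropy(y, p))
```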

13 Gradient Descent: Back-propagation
– For squared-error loss R = sum_i sum_k (y_ik − f_k(x_i))^2, the gradients are
  ∂R_i/∂β_km = δ_ki z_mi and ∂R_i/∂α_ml = s_mi x_il,
  with output-layer errors δ_ki = −2 (y_ik − f_k(x_i)) g_k'(β_k^T z_i)
  and hidden-layer errors s_mi = σ'(α_m^T x_i) sum_k β_km δ_ki (the back-propagation equations)

14 Back-propagation: Implementation
Step 1: Initialize the parameters (weights) of the NN
Iterate:
– Forward pass: compute f_k(X) for the current parameter values, starting at the input layer and moving all the way up to the output layer
– Backward pass: start at the output layer and compute the errors δ_ki; then go down one layer at a time, computing the s_mi, all the way down to the input layer
– Update the weights by the gradient-descent rule
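A compact back-propagation sketch for a single-hidden-layer network with sigmoid hidden units, linear outputs, and squared-error loss (the layer sizes, learning rate, iteration count, and XOR toy data are assumptions for illustration):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def train_step(X, Y, alpha, beta, lr=0.1):
    """One back-propagation update.

    X : (n, p) inputs, Y : (n, K) targets
    alpha : (p + 1, M) input-to-hidden weights (row 0 is the bias)
    beta  : (M + 1, K) hidden-to-output weights (row 0 is the bias)
    """
    n = X.shape[0]
    Xb = np.hstack([np.ones((n, 1)), X])          # add bias column
    # ---- forward pass ----
    Z = sigmoid(Xb @ alpha)                       # hidden activations
    Zb = np.hstack([np.ones((n, 1)), Z])
    F = Zb @ beta                                 # network outputs f_k(X)
    # ---- backward pass ----
    delta = -2.0 * (Y - F)                        # output-layer errors
    s = (delta @ beta[1:].T) * Z * (1 - Z)        # hidden-layer errors (sigma')
    grad_beta = Zb.T @ delta / n
    grad_alpha = Xb.T @ s / n
    # ---- gradient-descent update ----
    return alpha - lr * grad_alpha, beta - lr * grad_beta

# Toy usage: learn XOR with 2 hidden units
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)
alpha = rng.normal(scale=0.5, size=(3, 2))
beta = rng.normal(scale=0.5, size=(3, 1))
for _ in range(5000):
    alpha, beta = train_step(X, Y, alpha, beta, lr=0.5)
Z = sigmoid(np.hstack([np.ones((4, 1)), X]) @ alpha)
# ideally close to [0, 1, 1, 0]; back-prop may get stuck in a local minimum (next slide)
print(np.hstack([np.ones((4, 1)), Z]) @ beta)
```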

15 Issues in Training Neural Networks
– Initial values of parameters: back-propagation only finds a local minimum
– Overfitting: neural networks have many parameters; use early stopping and regularization
– Scaling of the inputs: inputs are typically scaled to have zero mean and unit standard deviation (see the sketch below)
– Number of hidden units and layers: better to have too many than too few; with 'traditional' back-propagation a deep NN tends to get stuck in poor local minima and does not learn well
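A minimal input-scaling sketch (computing the statistics on the training set only is an assumption about standard practice):

```python
import numpy as np

def standardize(X_train, X_test):
    """Scale inputs to zero mean and unit standard deviation, using
    statistics computed on the training set only."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma[sigma == 0] = 1.0            # guard against constant features
    return (X_train - mu) / sigma, (X_test - mu) / sigma
```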

16 Avoiding Overfitting
– Weight decay: minimize R(θ) + λ J(θ) with penalty J(θ) = sum_km β_km^2 + sum_ml α_ml^2
– Weight elimination penalty: J(θ) = sum_km β_km^2 / (1 + β_km^2) + sum_ml α_ml^2 / (1 + α_ml^2), which shrinks smaller weights more strongly
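A sketch of the corresponding penalty gradients, which would simply be added to the back-propagation gradients before the update (the value of λ and the function names are illustrative):

```python
def weight_decay_grad(theta, lam):
    """Gradient of the weight-decay penalty lam * sum(theta^2): 2 * lam * theta."""
    return 2.0 * lam * theta

def weight_elimination_grad(theta, lam):
    """Gradient of lam * sum(theta^2 / (1 + theta^2)): 2 * lam * theta / (1 + theta^2)^2."""
    return 2.0 * lam * theta / (1.0 + theta ** 2) ** 2

# e.g. inside the training loop (illustrative):
# alpha -= lr * (grad_alpha + weight_decay_grad(alpha, lam=1e-3))
# beta  -= lr * (grad_beta  + weight_decay_grad(beta,  lam=1e-3))
```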

17 Example

18 Example: ZIP Code Recognition

19 Some Architectures for ZIP Code Recognition

20 Architectures and Parameters
– Net-1: no hidden layer, equivalent to multinomial logistic regression
– Net-2: one hidden layer, 12 hidden units, fully connected
– Net-3: two hidden layers, locally connected
– Net-4: two hidden layers, locally connected with weight sharing
– Net-5: two hidden layers, locally connected, two levels of weight sharing
– Local connectivity with weight sharing is the idea behind convolutional neural networks

21 More on Architectures and Results

  Architecture               Links   Weights   % Correct
  Net-1  Single layer         2570      2570        80.0
  Net-2  Two layer            3214      3214        87.0
  Net-3  Locally connected    1226      1226        88.5
  Net-4  Constrained          2266      1132        94.0
  Net-5  Constrained          5194      1060        98.4

Link/weight counts:
– Net-1: links = weights = 16*16*10 + 10 = 2570
– Net-2: links = weights = 16*16*12 + 12 + 12*10 + 10 = 3214
– Net-3: links = weights = 8*8*3*3 + 8*8 + 4*4*5*5 + 4*4 + 10*4*4 + 10 = 1226
– Net-4: links   = 2*8*8*3*3 + 2*8*8 + 4*4*5*5*2 + 4*4 + 10*4*4 + 10 = 2266
         weights = 2*3*3 + 2*8*8 + 4*4*5*5*2 + 4*4 + 10*4*4 + 10 = 1132
– Net-5: links   = 2*8*8*3*3 + 2*8*8 + 4*4*4*5*5*2 + 4*4*4 + 4*4*4*10 + 10 = 5194
         weights = 2*3*3 + 2*8*8 + 4*5*5*2 + 4*4*4 + 4*4*4*10 + 10 = 1060
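A small arithmetic check of the counts above (the 16×16 input, 8×8 and 4×4 feature maps, 3×3 and 5×5 receptive fields, and 10 output units are the sizes implied by the slide's expressions):

```python
counts = {
    "Net-1": dict(links=16*16*10 + 10,
                  weights=16*16*10 + 10),
    "Net-2": dict(links=16*16*12 + 12 + 12*10 + 10,
                  weights=16*16*12 + 12 + 12*10 + 10),
    "Net-3": dict(links=8*8*3*3 + 8*8 + 4*4*5*5 + 4*4 + 10*4*4 + 10,
                  weights=8*8*3*3 + 8*8 + 4*4*5*5 + 4*4 + 10*4*4 + 10),
    # weight sharing: each 3x3 kernel is reused across an 8x8 map, so the
    # 2*8*8*3*3 links collapse to just 2*3*3 weights (plus per-unit biases)
    "Net-4": dict(links=2*8*8*3*3 + 2*8*8 + 4*4*5*5*2 + 4*4 + 10*4*4 + 10,
                  weights=2*3*3 + 2*8*8 + 4*4*5*5*2 + 4*4 + 10*4*4 + 10),
    "Net-5": dict(links=2*8*8*3*3 + 2*8*8 + 4*4*4*5*5*2 + 4*4*4 + 4*4*4*10 + 10,
                  weights=2*3*3 + 2*8*8 + 4*5*5*2 + 4*4*4 + 4*4*4*10 + 10),
}
for name, c in counts.items():
    print(name, c["links"], c["weights"])
# expected: 2570/2570, 3214/3214, 1226/1226, 2266/1132, 5194/1060
```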

22 Performance vs. Training Time

23 Some References
– C.M. Bishop, Neural Networks for Pattern Recognition, Oxford Univ. Press, 1996. (For a good understanding)
– S. Haykin, Neural Networks and Learning Machines, Prentice Hall, 2009. (For basic reading, with lots of examples)
Prominent researchers:
– Yann LeCun (http://yann.lecun.com/)
– G.E. Hinton (http://www.cs.toronto.edu/~hinton/)
– Yoshua Bengio (http://www.iro.umontreal.ca/~bengioy/yoshua_en/index.html)

