Artificial Neural Networks


1 Artificial Neural Networks
Introduction Design of Primitive Units Perceptrons The Backpropagation Algorithm What is machine learning?

2 Basics
In contrast to perceptrons, multilayer networks can learn multiple decision boundaries, and those boundaries may be nonlinear.
[Figure: network diagram showing input nodes, internal nodes, and output nodes.]

3 Example
[Figure: example plotted in the x1-x2 plane.]

4 One Single Unit
To make nonlinear partitions of the space, we need to define each unit as a nonlinear function (unlike the perceptron). One solution is to use the sigmoid unit.
[Figure: a unit with inputs x_1, ..., x_n and a constant input x_0 = 1, weighted by w_0, ..., w_n and summed to give g(x).]
$g(x) = \sum_{i=0}^{n} w_i x_i$, with $x_0 = 1$
$O = \sigma(g(x)) = \frac{1}{1 + e^{-g(x)}}$
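A minimal sketch of one sigmoid unit, assuming the weights are stored as a list [w0, w1, ..., wn] with the bias weight first. The function names are illustrative and do not come from the slides.

```python
import math

def sigmoid(y):
    """Logistic function: 1 / (1 + e^(-y))."""
    return 1.0 / (1.0 + math.exp(-y))

def unit_output(weights, inputs):
    """Output of one sigmoid unit.

    weights = [w0, w1, ..., wn]; inputs = [x1, ..., xn].
    The constant bias input x0 = 1 is prepended here.
    """
    x = [1.0] + list(inputs)
    g = sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(g)

# w0 = 0, w1 = 1, w2 = -1 and x = (2, 2) give g(x) = 0.
print(unit_output([0.0, 1.0, -1.0], [2.0, 2.0]))  # 0.5
```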

5 More Precisely
$O(x_1, x_2, \ldots, x_n) = \sigma(WX)$, where $\sigma(WX) = \frac{1}{1 + e^{-WX}}$.
The function $\sigma$ is called the sigmoid or logistic function. It has the following property:
$\frac{d\sigma(y)}{dy} = \sigma(y)\,(1 - \sigma(y))$
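The derivative property follows directly from the definition of σ. A short derivation, included here as a worked step:

```latex
\begin{aligned}
\sigma(y) &= \left(1 + e^{-y}\right)^{-1} \\
\frac{d\sigma(y)}{dy} &= \frac{e^{-y}}{\left(1 + e^{-y}\right)^{2}}
  = \frac{1}{1 + e^{-y}} \cdot \frac{e^{-y}}{1 + e^{-y}} \\
&= \sigma(y)\,\bigl(1 - \sigma(y)\bigr),
\qquad \text{since } 1 - \sigma(y) = \frac{e^{-y}}{1 + e^{-y}} .
\end{aligned}
```

This is why the factor O(1 - O) appears in the backpropagation error terms for sigmoid units.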

6 Backpropagation Algorithm
Goal: To learn the weights for all links in an interconnected multilayer network.
We begin by defining our measure of error:
$E(W) = \frac{1}{2} \sum_{m} \sum_{k} (t_{mk} - o_{mk})^2$
where k ranges over the output nodes and m over the training examples.
The idea is again to use gradient descent over the space of weights to find a minimum of E (with no guarantee that it is the global minimum).
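The error measure can be computed as in the sketch below. It assumes targets and outputs are given as one list per training example, each holding one value per output node; the sample numbers are made up for illustration.

```python
def sum_squared_error(targets, outputs):
    """E(W) = 1/2 * sum over examples m and output nodes k of (t_mk - o_mk)^2."""
    total = 0.0
    for t_m, o_m in zip(targets, outputs):
        for t_mk, o_mk in zip(t_m, o_m):
            total += (t_mk - o_mk) ** 2
    return 0.5 * total

# Two hypothetical examples, two output nodes each.
targets = [[1.0, 0.0], [0.0, 1.0]]
outputs = [[0.8, 0.1], [0.3, 0.6]]
print(sum_squared_error(targets, outputs))  # approximately 0.15
```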

7 Gradient Descent
The idea is to find a minimum of the error function E over the space of weights.
[Figure: error surface E(W) plotted over weights w1 and w2.]
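As an illustration of gradient descent over two weights, the sketch below minimizes a toy quadratic error surface. The surface, the learning rate, and the step count are assumptions chosen for the example; they are not taken from the slides.

```python
def gradient_descent(grad, w, eta=0.1, steps=200):
    """Repeatedly move w against the gradient: w <- w - eta * grad(w)."""
    for _ in range(steps):
        g = grad(w)
        w = [wi - eta * gi for wi, gi in zip(w, g)]
    return w

# Toy error surface E(w1, w2) = (w1 - 1)^2 + 2 * (w2 + 3)^2, minimum at (1, -3).
def toy_grad(w):
    return [2.0 * (w[0] - 1.0), 4.0 * (w[1] + 3.0)]

w_min = gradient_descent(toy_grad, [0.0, 0.0])
print(w_min)  # close to [1.0, -3.0]
```

On a surface with a single minimum like this one, the search always ends at the global minimum. The error surface of a multilayer network generally has several minima, so the result depends on the starting weights.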

8 Output Nodes
[Figure: output nodes.]

9 Algorithm
Create a network with n_in input nodes, n_hidden internal nodes, and n_out output nodes.
Initialize all weights to small random numbers.
Until the error is small, do:
  For each example X, do:
    Propagate example X forward through the network.
    Propagate errors backward through the network.

10 Propagating Forward
Given example X, compute the output of every node until we reach the output nodes. Each node computes the sigmoid function of its weighted inputs.
[Figure: example X enters at the input nodes and flows through the internal nodes to the output nodes.]
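A sketch of the forward pass, assuming a single hidden layer and weight rows of the form [w0, w1, ..., wn] with the bias weight first. The layout and the function names are illustrative choices.

```python
import math

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

def layer_forward(weight_rows, inputs):
    """Outputs of one layer; each row is [w0, w1, ..., wn] with bias weight w0."""
    x = [1.0] + list(inputs)
    return [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in weight_rows]

def forward(hidden_weights, output_weights, example):
    """Propagate one example through a network with one hidden layer."""
    hidden = layer_forward(hidden_weights, example)
    outputs = layer_forward(output_weights, hidden)
    return hidden, outputs

# Hypothetical 2-2-1 network with all weights zero: every unit outputs 0.5.
hidden, outputs = forward([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
                          [[0.0, 0.0, 0.0]],
                          [1.0, 0.0])
print(hidden, outputs)  # [0.5, 0.5] [0.5]
```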

11 Propagating Error Backward
For each output node k, compute the error:
$\delta_k = O_k (1 - O_k)(t_k - O_k)$
For each hidden unit h, compute the error:
$\delta_h = O_h (1 - O_h) \sum_{k} W_{kh}\, \delta_k$
Update each network weight:
$W_{ji} = W_{ji} + \Delta W_{ji}$, where $\Delta W_{ji} = \eta\, \delta_j X_{ji}$
($X_{ji}$ and $W_{ji}$ are the input and the weight from node i to node j.)
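The update rules above can be sketched as one stochastic update for a single example. The code assumes one hidden layer, sigmoid units throughout, and weight rows with the bias weight at index 0; the starting weights and the learning rate are hypothetical values.

```python
import math

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

def forward(w_hidden, w_out, x):
    xb = [1.0] + list(x)                      # inputs with bias x0 = 1
    o_h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hidden]
    hb = [1.0] + o_h                          # hidden outputs with bias
    o_k = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in w_out]
    return xb, o_h, hb, o_k

def backprop_step(w_hidden, w_out, x, t, eta=0.5):
    """One update for example (x, t); modifies the weights in place."""
    xb, o_h, hb, o_k = forward(w_hidden, w_out, x)
    # Output errors: delta_k = O_k (1 - O_k) (t_k - O_k)
    d_k = [o * (1 - o) * (tk - o) for o, tk in zip(o_k, t)]
    # Hidden errors: delta_h = O_h (1 - O_h) sum_k W_kh delta_k
    # (index h + 1 skips the bias weight stored at position 0)
    d_h = [o * (1 - o) * sum(w_out[k][h + 1] * d_k[k] for k in range(len(d_k)))
           for h, o in enumerate(o_h)]
    # Updates: W_ji <- W_ji + eta * delta_j * X_ji
    for k, row in enumerate(w_out):
        for i in range(len(row)):
            row[i] += eta * d_k[k] * hb[i]
    for h, row in enumerate(w_hidden):
        for i in range(len(row)):
            row[i] += eta * d_h[h] * xb[i]

def error(w_hidden, w_out, x, t):
    o_k = forward(w_hidden, w_out, x)[3]
    return 0.5 * sum((tk - o) ** 2 for tk, o in zip(t, o_k))

w_hidden = [[0.1, 0.2, -0.1], [-0.2, 0.1, 0.3]]   # hypothetical starting weights
w_out = [[0.05, -0.3, 0.2]]
x, t = [1.0, 0.0], [1.0]
before = error(w_hidden, w_out, x, t)
backprop_step(w_hidden, w_out, x, t)
after = error(w_hidden, w_out, x, t)
print(before, after)  # the error on this example decreases
```

The hidden errors are computed before any weight is changed, so they use the same output weights that produced the forward pass.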

12 Remarks on Backpropagation
1. It implements a gradient descent search over the weight space.
2. It may become trapped in local minima.
3. In practice, it is very effective.
4. How to avoid local minima?
   a. Add momentum.
   b. Use stochastic gradient descent.
   c. Train several networks with different initial values for the weights.

13 Generalization and Overfitting
One obvious stopping rule for backpropagation is to keep iterating until the training error falls below some threshold; this can lead to overfitting.
[Figure: error versus number of weight updates, with one curve for training set error and one for validation set error.]

14 Solutions
Use a validation set and stop training when the error on this set is small.
Use 10-fold cross-validation.
Use weight decay: the weights are decreased by a small amount on each iteration.
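One common way to apply the validation-set idea is to keep the weights from the update with the lowest validation error. The sketch below shows only the stopping logic. The `patience` parameter and the sample error values are assumptions added for illustration.

```python
def best_stopping_point(validation_errors, patience=3):
    """Return the index of the update with the lowest validation error seen
    before the error fails to improve `patience` times in a row."""
    best_index, best_error, waited = 0, float("inf"), 0
    for i, err in enumerate(validation_errors):
        if err < best_error:
            best_index, best_error, waited = i, err, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_index

# Hypothetical validation errors: they fall, then rise as the network overfits.
# The final 0.30 is never reached because training stops after three bad steps.
errors = [0.9, 0.6, 0.45, 0.40, 0.42, 0.47, 0.55, 0.30]
print(best_stopping_point(errors))  # 3
```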

15 Historical Background
Paul Werbos (1974): Proposed the backpropagation algorithm. Several neurons are trained together.
Rediscovered by Rumelhart, Hinton, McClelland (1986).
John Hopfield (1982): A neural network can find a minimum when it reaches a state of minimum energy.

16 Scaling Input
If one attribute takes much larger values than another, the weights will be adjusted to reflect that difference in scale, which is not desirable.
Solution: Standardize all features before training. The mean of each feature should be zero, and the variance should be fixed (e.g., 1.0).
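A minimal sketch of standardization, assuming the data is a list of rows with one value per feature. It uses the population variance (dividing by the number of rows), which is one of several reasonable conventions.

```python
import math

def standardize(rows):
    """Rescale each feature (column) to mean 0 and variance 1."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / n)
            for col, m in zip(columns, means)]
    # A constant feature has zero spread; leave it centred at 0 instead of dividing by 0.
    stds = [s if s > 0 else 1.0 for s in stds]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

# Hypothetical data: the second feature is about 1000 times larger than the first.
data = [[1.0, 1000.0], [2.0, 3000.0], [3.0, 5000.0]]
for row in standardize(data):
    print(row)
```

After rescaling, both columns hold the same three values, so neither feature dominates the weighted sum because of its units.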

17 Training with Noise
If the training set is small, one can “produce” additional examples, generated from the same distribution, and use them as if they were normal examples.
Assumption: the new examples are obtained by adding d-dimensional Gaussian noise to the true training points.
[Figure: training points in the x1-x2 plane.]
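The idea can be sketched as follows, assuming each example is an (x, label) pair and that the noise is independent in each of the d coordinates. The number of copies, the noise level `sigma`, and the seed are illustrative parameters.

```python
import random

def add_noisy_copies(examples, copies=5, sigma=0.1, seed=0):
    """Return the original (x, label) pairs plus `copies` jittered versions of each.

    Independent Gaussian noise with standard deviation `sigma` is added to every
    coordinate of x; the label is kept unchanged.
    """
    rng = random.Random(seed)
    augmented = list(examples)
    for x, label in examples:
        for _ in range(copies):
            noisy_x = [xi + rng.gauss(0.0, sigma) for xi in x]
            augmented.append((noisy_x, label))
    return augmented

train = [([0.0, 0.0], 0), ([1.0, 1.0], 1)]
bigger = add_noisy_copies(train)
print(len(bigger))  # 12: the 2 originals plus 5 noisy copies of each
```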

18 Number of Hidden Units
The number of hidden units is related to the “expressiveness” of the neural network (the complexity of the decision boundary). If the examples are easy to discriminate, few nodes are necessary; conversely, complex problems require many internal nodes.
A rule of thumb is to choose the number of hidden units so that the network has roughly m / 10 weights, where m is the number of training examples.
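To apply the rule of thumb, one needs the weight count of a given architecture. The sketch below assumes a fully connected network with one hidden layer and one bias weight per unit; both helper functions are hypothetical.

```python
def count_weights(n_in, n_hidden, n_out):
    """Weights in a fully connected network with one hidden layer, biases included."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out

def hidden_units_for(m, n_in, n_out):
    """Largest hidden-layer size whose weight count stays within m / 10 (at least 1)."""
    budget = m / 10
    n_hidden = 1
    while count_weights(n_in, n_hidden + 1, n_out) <= budget:
        n_hidden += 1
    return n_hidden

print(count_weights(4, 3, 2))        # 23
print(hidden_units_for(1000, 4, 2))  # 14
```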

19 Learning Rates
The learning rate significantly affects the performance of a neural network.
Optimal learning rate: the rate that leads to the error minimum in one learning step.
It has been found that a principled way to set the learning rate is to assign a separate value to each weight.
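For a one-dimensional quadratic error, the rate that reaches the minimum in one step is the inverse of the second derivative of the error. The sketch below illustrates this special case only; the error function and its constants are assumptions, and real network error surfaces are not quadratic.

```python
def step(w, eta, a=4.0, w_star=2.0):
    """One gradient-descent step on E(w) = a/2 * (w - w_star)^2."""
    return w - eta * a * (w - w_star)

a = 4.0
eta_opt = 1.0 / a                 # inverse of the second derivative of E
print(step(10.0, eta_opt))        # 2.0: the minimum, reached in one step
print(step(10.0, 0.5 * eta_opt))  # 6.0: smaller rate, only part of the way
print(step(10.0, 2.5 * eta_opt))  # -10.0: too large, ends farther from the minimum
```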

21 Adding Momentum
The weight update rule can be modified so that it depends on the update from the previous iteration. At iteration s we have:
$\Delta W_{ji}(s) = \eta\, \delta_j X_{ji} + \alpha\, \Delta W_{ji}(s-1)$
where $\alpha$ ($0 \le \alpha \le 1$) is a constant called the momentum.
Momentum can carry the search through small local minima.
It increases the speed along flat regions.
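A small sketch of the momentum rule for a single weight. It holds the gradient term constant to mimic a flat slope; the values of η and α are illustrative.

```python
def momentum_update(delta, x, prev_dw, eta=0.1, alpha=0.9):
    """Delta W_ji(s) = eta * delta_j * X_ji + alpha * Delta W_ji(s - 1)."""
    return eta * delta * x + alpha * prev_dw

# With a constant gradient term, the step size keeps growing toward
# eta * delta * x / (1 - alpha), here 0.1 / 0.1 = 1.0.
dw = 0.0
for s in range(1, 6):
    dw = momentum_update(delta=1.0, x=1.0, prev_dw=dw)
    print(s, round(dw, 4))
```

The steps grow from 0.1 toward 1.0, which is the sense in which momentum speeds up the search along flat regions.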

22 Cascade-Correlation
Main ideas:
1. Begin with a two-layer network and train it.
2. If the error is low enough, stop.
3. If not, do the following:
   a. Fix all weights.
   b. Add one node and connect it to all input and output units.
   c. Train the network by adjusting only the weights of the new node.
4. Go to step 2.

24 Recurrent Networks (Time Series Analysis)
Recurrent networks have found application in time series prediction.
Main ideas:
The output units are “fed back” and duplicated as auxiliary inputs.
During classification, a pattern is presented to the input units. The feedforward pass is computed, and the outputs are copied to the auxiliary input nodes. This produces new activations and new outputs.
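The feedback loop can be sketched with a deliberately simplified network: a single layer of sigmoid units and no hidden layer, which is an assumption made to keep the example short. The weights are hypothetical.

```python
import math

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

def run_recurrent(weight_rows, sequence, n_out):
    """Feed a sequence through a single-layer network with output feedback.

    Each row of weight_rows is [bias, weights for the pattern..., weights for
    the fed-back outputs...]. The auxiliary inputs start at zero.
    """
    feedback = [0.0] * n_out
    history = []
    for pattern in sequence:
        x = [1.0] + list(pattern) + feedback
        outputs = [sigmoid(sum(w * v for w, v in zip(row, x))) for row in weight_rows]
        history.append(outputs)
        feedback = outputs            # outputs become the next auxiliary inputs
    return history

# Hypothetical weights: one pattern input, one output, one feedback input.
weights = [[0.0, 1.0, 2.0]]
for out in run_recurrent(weights, [[0.0], [0.0], [0.0]], n_out=1):
    print(out)
```

Even though the same pattern is presented three times, the output changes at each step because the previous output is part of the input.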
