
1 Neural Networks
Thanks to: www.cs.vu.nl/~elena/slides
Basics of neural network theory and practice for supervised and unsupervised learning.
Most popular neural network models: architectures, learning algorithms, applications.

2 Neural Networks
A NN is a machine learning approach inspired by the way in which the brain performs a particular learning task:
– Knowledge about the learning task is given in the form of examples.
– Inter-neuron connection strengths (weights) are used to store the acquired information (the training examples).
– During the learning process the weights are modified in order to model the particular learning task correctly on the training examples.

3 Learning
Supervised Learning
– Recognizing hand-written digits, pattern recognition, regression.
– Labeled examples (input, desired output).
– Neural network models: perceptron, feed-forward, radial basis function, support vector machine.
Unsupervised Learning
– Finding similar groups of documents in the web, content-addressable memory, clustering.
– Unlabeled examples (different realizations of the input alone).
– Neural network models: self-organizing maps, Hopfield networks.

4 Neurons

5 Network architectures
Three different classes of network architectures:
– single-layer feed-forward
– multi-layer feed-forward
– recurrent
In the two feed-forward classes the neurons are organized in acyclic layers.
The architecture of a neural network is linked with the learning algorithm used to train it.

6 Single Layer Feed-forward
[Figure: an input layer of source nodes projecting onto an output layer of neurons]

7 Multi layer feed-forward
[Figure: a 3-4-2 network with an input layer, one hidden layer, and an output layer]

8 Recurrent network
Recurrent network with hidden neuron(s): the unit-delay operator z^-1 implies a dynamic system.
[Figure: recurrent network with input, hidden, and output units connected through z^-1 feedback loops]

9 Neural Network Architectures

10 The Neuron
The neuron is the basic information processing unit of a NN. It consists of:
1. A set of synapses or connecting links, each link characterized by a weight: w1, w2, …, wm.
2. An adder function (linear combiner) which computes the weighted sum of the inputs: u = w1x1 + w2x2 + … + wmxm.
3. An activation function (squashing function) for limiting the amplitude of the output of the neuron.

11 The Neuron
[Figure: input signals x1, x2, …, xm weighted by synaptic weights w1, w2, …, wm, a summing function with bias b producing the local field v, and an activation function producing the output y]
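A minimal sketch of this neuron model in Python (the function name, the example numbers and the choice of a sign activation are illustrative, not taken from the slides):

    import numpy as np

    def neuron_output(x, w, b, activation=np.sign):
        """Single neuron: weighted sum of the inputs, plus bias, through an activation."""
        v = np.dot(w, x) + b          # local field v = w1*x1 + ... + wm*xm + b
        return activation(v)          # squashing / activation function

    # example with m = 3 inputs
    x = np.array([0.5, -1.0, 2.0])    # input signals x1, x2, x3
    w = np.array([0.4, 0.7, -0.2])    # synaptic weights w1, w2, w3
    print(neuron_output(x, w, b=0.1))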

12 Bias of a Neuron
The bias b has the effect of applying an affine transformation to the weighted sum u:
v = u + b
v is the induced field of the neuron.
[Figure: plot of v versus u for different values of b]

13 Bias as extra input
The bias is an external parameter of the neuron. It can be modeled by adding an extra input x0 = +1 with weight w0 = b.
[Figure: input signals x0 = +1, x1, …, xm with synaptic weights w0, w1, …, wm feeding the summing function, the activation function, the local field v and the output y]
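A quick numerical check of this trick (a sketch; the numbers reuse the illustrative example above):

    import numpy as np

    x = np.array([0.5, -1.0, 2.0])
    w = np.array([0.4, 0.7, -0.2])
    b = 0.1

    # explicit bias
    v_explicit = np.dot(w, x) + b

    # bias folded in as an extra input x0 = +1 with weight w0 = b
    x_aug = np.concatenate(([1.0], x))
    w_aug = np.concatenate(([b], w))
    v_extra_input = np.dot(w_aug, x_aug)

    assert np.isclose(v_explicit, v_extra_input)   # the two formulations give the same local field v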

14 Dimensions of a Neural Network
– Various types of neurons
– Various network architectures
– Various learning algorithms
– Various applications

15 Face Recognition
90% accuracy at learning head pose and at recognizing 1-of-20 faces.

16 Handwritten digit recognition

17 Learning in NN
– Hebb, 1949: learning by modifying connections.
– Widrow & Hoff, 1960: learning by comparing the output with a target.

18 Architecture
We consider the following architecture: a feed-forward NN with one layer.
It is sufficient to study single-layer perceptrons with just one neuron.

19 Single layer perceptrons
Generalization to single-layer perceptrons with more neurons is easy because:
– The output units are independent of each other.
– Each weight only affects one of the outputs.

20 Perceptron: Neuron Model
The perceptron uses a non-linear (McCulloch-Pitts) model of a neuron.
[Figure: inputs x1, x2, …, xn with weights w1, w2, …, wn and bias b produce the local field v and the output y = φ(v)]
φ is the sign function:
φ(v) = +1 if v ≥ 0
φ(v) = -1 if v < 0

21 Perceptron: Applications
The perceptron is used for classification: classify correctly a set of examples into one of the two classes C1, C2.
– If the output of the perceptron is +1, the input is assigned to class C1.
– If the output is -1, the input is assigned to class C2.

22 Perceptron: Classification
The equation below describes a hyperplane in the input space. This hyperplane is used to separate the two classes C1 and C2:
w1x1 + w2x2 + b = 0   (decision boundary)
The decision region for C1 is given by w1x1 + w2x2 + b ≥ 0.
[Figure: the decision boundary in the (x1, x2) plane, with the decision region for C1 on one side and C2 on the other]

23 Perceptron: Limitations
The perceptron can only model linearly separable functions. It can be used to model the following Boolean functions:
– AND
– OR
– COMPLEMENT
But it cannot model XOR. Why?
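One way to answer the "Why?" (this argument is not spelled out on the slide): encode the inputs as 0/1 and require the perceptron to output +1 exactly when the two inputs differ. A single perceptron would need weights w1, w2 and bias b with
    b < 0              (input (0,0) must give -1)
    w1 + b ≥ 0         (input (1,0) must give +1)
    w2 + b ≥ 0         (input (0,1) must give +1)
    w1 + w2 + b < 0    (input (1,1) must give -1)
The second and third constraints give w1 ≥ -b > 0 and w2 ≥ -b > 0, so w1 + w2 + b ≥ -b > 0, contradicting the fourth constraint. No weights and bias satisfy all four, hence XOR is not linearly separable.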

24 Perceptron: Learning Algorithm
Variables and parameters:
x(n) = input vector = [+1, x1(n), x2(n), …, xm(n)]^T
w(n) = weight vector = [b(n), w1(n), w2(n), …, wm(n)]^T
b(n) = bias
y(n) = actual response
d(n) = desired response
η = learning rate parameter

25 The fixed-increment learning algorithm
Initialization: set w(1) = 0.
Activation: activate the perceptron by applying input example x(n) with desired response d(n).
Compute the actual response of the perceptron: y(n) = sgn[w^T(n) x(n)].
Adapt the weight vector: if d(n) and y(n) are different, then w(n+1) = w(n) + η d(n) x(n),
where d(n) = +1 if x(n) ∈ C1 and d(n) = -1 if x(n) ∈ C2.
Continuation: increment time step n by 1 and go back to the Activation step.
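A compact Python sketch of this fixed-increment rule (the function name and the loop structure are mine, not from the slides; the bias is handled as the extra input x0 = +1):

    import numpy as np

    def train_perceptron(X, d, eta=1.0, max_epochs=100):
        """Fixed-increment perceptron learning.

        X : array of shape (N, m) with the raw input vectors
        d : array of N desired responses, each +1 (class C1) or -1 (class C2)
        """
        X_aug = np.hstack([np.ones((len(X), 1)), X])    # prepend x0 = +1 for the bias
        w = np.zeros(X_aug.shape[1])                    # w(1) = 0
        for _ in range(max_epochs):
            updated = False
            for x_n, d_n in zip(X_aug, d):
                y_n = 1 if np.dot(w, x_n) >= 0 else -1  # y(n) = sgn[w^T(n) x(n)]
                if y_n != d_n:                          # adapt only when the response is wrong
                    w = w + eta * d_n * x_n
                    updated = True
            if not updated:      # a full pass with no mistakes: every example is classified
                break
        return w                 # w = [b, w1, ..., wm]

On a linearly separable training set the loop eventually completes a full pass without updates and returns a separating weight vector [b, w1, …, wm].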

26 Example
Consider the 2D training set C1 ∪ C2, where:
C1 = {(1,1), (1,-1), (0,-1)} (elements of class +1)
C2 = {(-1,-1), (-1,1), (0,1)} (elements of class -1)
Use the perceptron learning algorithm to classify these examples.
w(1) = [1, 0, 0]^T, η = 1

27 Trick
Consider the augmented training set C'1 ∪ C'2, with the first entry fixed to 1 (to deal with the bias as an extra weight):
(1, 1, 1), (1, 1, -1), (1, 0, -1), (1, -1, -1), (1, -1, 1), (1, 0, 1)
Replace x with -x for all x ∈ C'2 and use the following simpler update rule:
w(n+1) = w(n) + η x(n)   if w(n)·x(n) ≤ 0
w(n+1) = w(n)            otherwise

28 Example (continued)
Training set after application of the trick:
(1, 1, 1), (1, 1, -1), (1, 0, -1), (-1, 1, 1), (-1, 1, -1), (-1, 0, -1)
Application of the perceptron learning algorithm:
[Table of weight updates not reproduced in the transcript]
End of epoch 1.

29 Example (continued)
[Table of weight updates not reproduced in the transcript]
End of epoch 2.
At epoch 3 no updates are performed (check!), so the algorithm stops.
Final weight vector: (0, 2, -1), i.e. the decision hyperplane is 2x1 - x2 = 0.

30 Example (continued)
[Figure: the decision boundary 2x1 - x2 = 0 in the (x1, x2) plane, with the C1 points (+) on one side and the C2 points (-) on the other]
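The run can be reproduced numerically. A small sketch, assuming the augmented-and-negated training set from the Trick slide, w(1) = [1, 0, 0] and eta = 1; processed in the listed order, the weights stop changing after epoch 2 at (0, 2, -1), and epoch 3 performs no updates, matching the slides:

    import numpy as np

    # training set after the trick: C'1 kept as-is, C'2 multiplied by -1
    X = np.array([( 1, 1,  1), ( 1, 1, -1), ( 1, 0, -1),
                  (-1, 1,  1), (-1, 1, -1), (-1, 0, -1)], dtype=float)

    w = np.array([1.0, 0.0, 0.0])     # w(1)
    eta = 1.0
    for epoch in range(1, 10):
        updates = 0
        for x in X:
            if np.dot(w, x) <= 0:     # misclassified under the simpler rule
                w = w + eta * x
                updates += 1
        print(f"end of epoch {epoch}: w = {w}")
        if updates == 0:              # a full epoch without updates: stop
            break
    # final w = [0, 2, -1], i.e. the decision boundary 2*x1 - x2 = 0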

31 Convergence of the learning algorithm
Suppose the datasets C1, C2 are linearly separable. Then the perceptron convergence algorithm converges after n0 iterations, with n0 ≤ n_max, on the training set C1 ∪ C2.
(XOR is not linearly separable.)

32 Convergence theorem (proof)
Let w0 be such that w0^T x(n) > 0 for all x(n) ∈ C1.
(Such a w0 exists because C1 and C2 are linearly separable.)
Let α = min of w0^T x(n) over x(n) ∈ C1.
Then w0^T w(k+1) = w0^T x(1) + … + w0^T x(k) ≥ k α.
By the Cauchy-Schwarz inequality, ||w0||^2 ||w(k+1)||^2 ≥ [w0^T w(k+1)]^2, hence
||w(k+1)||^2 ≥ k^2 α^2 / ||w0||^2    (A)

33 Convergence theorem (proof)
Now we consider another route:
w(k+1) = w(k) + x(k)
Taking the squared Euclidean norm,
||w(k+1)||^2 = ||w(k)||^2 + ||x(k)||^2 + 2 w^T(k) x(k)
The last term is ≤ 0 because x(k) is misclassified, so
||w(k+1)||^2 ≤ ||w(k)||^2 + ||x(k)||^2
Starting from ||w(1)||^2 = 0:
||w(2)||^2 ≤ ||w(1)||^2 + ||x(1)||^2
||w(3)||^2 ≤ ||w(2)||^2 + ||x(2)||^2
…
||w(k+1)||^2 ≤ ||x(1)||^2 + … + ||x(k)||^2

34 Convergence theorem (proof)
Let β = max of ||x(n)||^2 over x(n) ∈ C1.
Then ||w(k+1)||^2 ≤ k β    (B)
For sufficiently large values of k, (B) comes into conflict with (A). So k cannot be greater than the value k_max for which (A) and (B) are both satisfied with the equality sign.
The perceptron convergence algorithm therefore terminates in at most
n_max = β ||w0||^2 / α^2
iterations.
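As a sanity check on this bound (my own arithmetic, using the example training set after the trick and taking w0 = (0, 2, -1), the solution found earlier): the values of w0^T x over the six training vectors are 1, 3, 1, 1, 3, 1, so α = 1; the squared norms ||x||^2 are 3, 3, 2, 3, 3, 2, so β = 3; hence n_max = β ||w0||^2 / α^2 = 3 · 5 / 1 = 15. For comparison, the run above needed only 3 weight updates, well under the bound.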

35 Adaline: Adaptive Linear Element
Adaline uses a linear neuron model and the Least-Mean-Square (LMS) learning algorithm.
The idea: try to minimize the square error, which is a function of the weights:
E(w(n)) = (1/2) e(n)^2, where e(n) = d(n) - x^T(n) w(n)
We can find the minimum of the error function E by means of the steepest descent method.

36 Steepest Descent Method
– start with an arbitrary point
– find a direction in which E is decreasing most rapidly, i.e. the direction opposite to gradient(E(w)) = [∂E/∂w1, …, ∂E/∂wm]
– make a small step in that direction
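A generic steepest-descent sketch in Python (the quadratic error function, step size and iteration count are illustrative choices, not from the slides):

    import numpy as np

    def steepest_descent(grad_E, w0, eta=0.1, steps=100):
        """Repeatedly take a small step against the gradient of E."""
        w = np.asarray(w0, dtype=float)
        for _ in range(steps):
            w = w - eta * grad_E(w)    # step in the direction in which E decreases most rapidly
        return w

    def grad(w):
        # gradient of the example error E(w) = ||w - (1, 2)||^2
        return 2 * (w - np.array([1.0, 2.0]))

    print(steepest_descent(grad, w0=[0.0, 0.0]))   # approaches [1, 2]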

37 Least-Mean-Square algorithm (Widrow-Hoff algorithm)
Approximation of gradient(E): using only the current example, the gradient of E = (1/2) e(n)^2 with respect to the weights is estimated as -e(n) x(n).
The update rule for the weights becomes:
ŵ(n+1) = ŵ(n) + η e(n) x(n)

38 Summary of the LMS algorithm
Training sample: input signal vector x(n), desired response d(n).
User-selected parameter: η > 0.
Initialization: set ŵ(1) = 0.
Computation: for n = 1, 2, …
e(n) = d(n) - ŵ^T(n) x(n)
ŵ(n+1) = ŵ(n) + η x(n) e(n)
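A minimal Python sketch of this loop (the function name and the synthetic data, including the target weights used to generate d, are assumptions for the demo, not part of the slides):

    import numpy as np

    def lms(X, d, eta=0.05, epochs=50):
        """Widrow-Hoff / LMS: one weight update per training sample."""
        w = np.zeros(X.shape[1])                # w_hat(1) = 0
        for _ in range(epochs):
            for x_n, d_n in zip(X, d):
                e_n = d_n - np.dot(w, x_n)      # e(n) = d(n) - w_hat^T(n) x(n)
                w = w + eta * x_n * e_n         # w_hat(n+1) = w_hat(n) + eta x(n) e(n)
        return w

    # demo: recover the weights of a noiseless linear target
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    true_w = np.array([0.5, -1.0, 2.0])         # assumed target weights for the demo
    d = X @ true_w
    print(lms(X, d))                            # approaches [0.5, -1.0, 2.0]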

39 Comparison of LMS and the Perceptron
Perceptron and Adaline represent different implementations of a single-layer perceptron based on error-correction learning, but they differ in two respects.
Model of a neuron:
– LMS: linear.
– Perceptron: non-linear; hard-limiter activation function (McCulloch-Pitts model).
Learning process:
– LMS: continuous learning.
– Perceptron: learning stops after a finite number of iterations.

