
1 Robert J. Marks II CIA Lab Baylor University School of Engineering CiaLab.org Artificial Neural Networks: Supervised Models

2 Robert J. Marks II Supervised Learning
- Given: input (stimulus) / output (response) data
- Objective: train a machine to simulate the input/output relationship
- Types:
  - Classification (discrete outputs)
  - Regression (continuous outputs)

3 Robert J. Marks II Training a Classifier
[Figure: labeled training images are presented to the classifier one at a time, each tagged either "Marks" or "not Marks".]

4 Robert J. Marks II Recall from a Trained Classifier
[Figure: a new image is presented to the trained classifier, which responds "Marks".]
Note: the test image does not appear in the training data. Learning ≠ memorization.

5 Robert J. Marks II Classifier in Feature Space, After Training
[Figure: the classifier's learned representation approximates the concept (truth); legend shows training data (Marks and not Marks) and a test point (Marks).]

6 Robert J. Marks II Supervised Regression (Interpolation)
- Output data is continuous rather than discrete
- Example: load forecasting
  - Training (from historical data):
    - Input: temperatures, current load, day of week, holiday(?), etc.
    - Output: next day's load
  - Test:
    - Input: forecasted temperatures, current load, day of week, holiday(?), etc.
    - Output: tomorrow's load forecast
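
Below is a minimal sketch of this load-forecasting setup in code. The feature layout, the synthetic data, and the use of scikit-learn's MLPRegressor are my own illustrative choices; only the idea of regressing a continuous next-day load on weather and load features comes from the slide.

```python
# Hypothetical illustration of the load-forecasting example above.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Each training row: [mean temperature, current load, day of week, holiday flag]
X_train = rng.uniform([50, 800, 0, 0], [100, 1600, 6, 1], size=(365, 4))
# Target: next day's load (a synthetic relationship, just for the sketch)
y_train = 0.6 * X_train[:, 1] + 5.0 * X_train[:, 0] + rng.normal(0, 20, 365)

model = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
model.fit(X_train, y_train)

# "Test" input: tomorrow's forecasted temperature, current load, day, holiday flag
x_tomorrow = np.array([[72.0, 1200.0, 3, 0]])
print("Forecast load:", model.predict(x_tomorrow)[0])
```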

7 Robert J. Marks II Properties of Good Classifiers and Regression Machines
- Good accuracy outside of the training set
- Explanation facility
  - Generate rules after training
- Fast training
- Fast testing

8 Robert J. Marks II Some Classifiers and Regression Machines
- Classification and Regression Trees (CART)
- Nearest neighbor look-up
- Neural networks
  - Layered perceptrons (MLPs)
  - Recurrent perceptrons
  - Cascade correlation neural networks
  - Radial basis function neural networks

9 Robert J. Marks II A Model of an Artificial Neuron
[Figure: five input states s_1 ... s_5 feed the neuron through weights w_1 ... w_5.]
sum = \sum_n w_n s_n
s = state = \sigma(sum), where \sigma(\cdot) is the squashing function

10 Robert J. Marks II Squashing Functions
[Figure: plot of \sigma(sum) versus sum, rising smoothly from 0 to 1.]
sigmoid: \sigma(x) = \frac{1}{1 + e^{-x}}
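
A minimal sketch of the neuron of slides 9 and 10 in Python; the function names and example numbers are mine.

```python
# Weighted sum followed by a sigmoid squashing function.
import numpy as np

def sigmoid(x):
    """Squashing function: sigma(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def neuron(states, weights):
    """State of one neuron: s = sigma(sum_n w_n * s_n)."""
    return sigmoid(np.dot(weights, states))

s = np.array([0.2, 0.9, 0.1, 0.5, 0.7])    # states s_1 ... s_5
w = np.array([0.4, -1.2, 0.3, 0.8, -0.5])  # weights w_1 ... w_5
print(neuron(s, w))  # squashed output, always between 0 and 1
```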

11 Robert J. Marks II A Layered Perceptron
[Figure: an input layer at the bottom, a hidden layer of neurons in the middle, and an output layer at the top, joined by weighted interconnects.]
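
A sketch of one feedforward pass through such a layered perceptron, assuming the 2-input, 3-hidden, 2-output shape drawn in later slides; the random weights are placeholders.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
W1 = rng.normal(size=(3, 2))  # interconnects: input layer -> hidden layer
W2 = rng.normal(size=(2, 3))  # interconnects: hidden layer -> output layer

def forward(i):
    """States of each layer, given the input vector i."""
    s_hidden = sigmoid(W1 @ i)         # hidden-layer states
    s_output = sigmoid(W2 @ s_hidden)  # output-layer states
    return s_hidden, s_output

_, o = forward(np.array([0.0, 1.0]))
print(o)
```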

12 Robert J. Marks II Training
- Given training data:
  - input vector set: { i_n | 1 ≤ n ≤ N }
  - corresponding output (target) vector set: { t_n | 1 ≤ n ≤ N }
- Find the weights of the interconnects, using the training data, so as to minimize error on the test data

13 Robert J. Marks II Error
- Input, target & response
  - input vector set: { i_n | 1 ≤ n ≤ N }
  - target vector set: { t_n | 1 ≤ n ≤ N }
  - o_n = neural network output when the input is i_n (note: in general o_n ≠ t_n)
- Error: E = \frac{1}{2} \sum_n \| o_n - t_n \|^2
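
The error measure written out directly; the array names and example values are mine.

```python
import numpy as np

def total_error(outputs, targets):
    """E = 1/2 * sum_n || o_n - t_n ||^2 over all training pairs."""
    return 0.5 * np.sum((outputs - targets) ** 2)

o = np.array([[0.8, 0.2], [0.1, 0.7]])  # network outputs o_n
t = np.array([[1.0, 0.0], [0.0, 1.0]])  # targets t_n
print(total_error(o, t))
```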

14 Robert J. Marks II Error Minimization Techniques
- The error is a function of
  - the fixed training and test data
  - the neural network weights
- Find weights that minimize the error (standard optimization):
  - conjugate gradient descent
  - random search
  - genetic algorithms
  - steepest descent (error backpropagation)

15 Robert J. Marks II Minimizing Error Using Steepest Descent
- The main idea: find the way downhill and take a step.
[Figure: the error curve E(x) with its minimum marked; the downhill direction points toward the minimum.]
downhill = -\frac{dE}{dx},   \eta = step size
update: x \leftarrow x - \eta \frac{dE}{dx}

16 Robert J. Marks II Example of Steepest Descent
E(x) = \frac{1}{2} x^2, with minimum at x = 0.
-\frac{dE}{dx} = -x, so the update is x \leftarrow x - \eta x = (1 - \eta) x.
The solution to the difference equation x_p = (1 - \eta) x_{p-1} is x_p = (1 - \eta)^p x_0.
For |1 - \eta| < 1, x_p \to 0.
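
A short sketch of this example; the starting point, step sizes, and iteration count are my own choices.

```python
# Steepest descent on E(x) = 0.5 * x**2, so dE/dx = x and each step
# multiplies x by (1 - eta).
def steepest_descent(x0, eta, steps):
    x = x0
    trajectory = [x]
    for _ in range(steps):
        grad = x            # dE/dx for E(x) = 0.5 * x**2
        x = x - eta * grad  # x <- (1 - eta) * x
        trajectory.append(x)
    return trajectory

print(steepest_descent(x0=4.0, eta=0.3, steps=10))  # |1 - eta| < 1: converges toward 0
print(steepest_descent(x0=4.0, eta=2.5, steps=10))  # |1 - eta| > 1: diverges
```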

17 Robert J. Marks II Training the Perceptron
[Figure: a single-layer perceptron with inputs i_1 ... i_4, weights w_{11} ... w_{24}, and outputs o_1, o_2.]
E = \frac{1}{2} \sum_{n=1}^{2} (o_n - t_n)^2 = \frac{1}{2} \sum_{n=1}^{2} \left( \sum_{k=1}^{4} w_{nk} i_k - t_n \right)^2
\frac{dE}{dw_{mj}} = i_j \left( \sum_{k=1}^{4} w_{mk} i_k - t_m \right) = i_j (o_m - t_m)

18 Robert J. Marks II Weight Update
\frac{dE}{dw_{mj}} = i_j (o_m - t_m)
For m = 2 and j = 4:
w_{24} \leftarrow w_{24} - \eta \, i_4 (o_2 - t_2)
[Figure: the same perceptron, highlighting the weight w_{24} between input i_4 and output o_2.]
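
A sketch of this update rule applied repeatedly to one training pair, assuming the 4-input, 2-output linear perceptron of the figure; the data and learning rate are invented.

```python
# w_mj <- w_mj - eta * i_j * (o_m - t_m), applied over and over.
import numpy as np

rng = np.random.default_rng(2)
W = rng.normal(scale=0.1, size=(2, 4))  # weights w_mj: output m, input j
eta = 0.1

i = np.array([0.5, -0.2, 0.8, 1.0])  # inputs i_1 ... i_4
t = np.array([1.0, 0.0])             # targets t_1, t_2

for _ in range(100):
    o = W @ i                         # linear outputs o_m = sum_k w_mk i_k
    W -= eta * np.outer(o - t, i)     # dE/dw_mj = i_j * (o_m - t_m)

print(W @ i)  # close to the targets after repeated updates
```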

19 Robert J. Marks II No Hidden Layers = Linear Separation
o = \sigma\left( \sum_n w_n i_n \right)
For a classifier, threshold the output:
If o > 1/2, announce class #1.
If o < 1/2, announce class #2.
Classification boundary: o = 1/2, i.e. \sum_n w_n i_n = 0. This is the equation of a plane!
[Figure: a single neuron with inputs i_1, i_2, i_3, weights w_1, w_2, w_3, and output o.]

20 Robert J. Marks II Classification Boundary
\sum_n w_n i_n = 0 is a line through the origin.
[Figure: the (i_1, i_2) plane with the classification boundary drawn as a line through the origin.]

21 Robert J. Marks II Adding a Bias Term
[Figure: a neuron with inputs i_1, i_2, i_3 plus a constant input 1, with weights w_1 ... w_4.]
The classification boundary is still a line, but it need not pass through the origin.

22 Robert J. Marks II The Minsky-Papert Objection
[Figure: the four XOR points on the (i_1, i_2) plane; no single line separates the two classes.]
The simple exclusive-or (XOR) operation cannot be resolved by a linear perceptron with bias.
⇒ More important problems can thus probably not be resolved with a linear perceptron with bias, either.
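
A quick illustration of the objection, using scikit-learn as a stand-in (not something from the slides): a linear perceptron with bias cannot fit XOR, while a layered perceptron with one hidden layer can.

```python
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.neural_network import MLPClassifier

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])  # XOR labels

linear = Perceptron(max_iter=1000).fit(X, y)
mlp = MLPClassifier(hidden_layer_sizes=(4,), activation="logistic",
                    solver="lbfgs", max_iter=5000, random_state=0).fit(X, y)

print("linear perceptron:", linear.score(X, y))  # a line gets at most 3 of 4 right
print("layered perceptron:", mlp.score(X, y))    # a hidden layer can separate all 4
```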

23 Robert J. Marks II The Layered Perceptron
- interconnects: weights w_{jk}(l)
- neurons: states s_j(l)
- hidden layers: l
- output layer: l = L
- input layer: l = 0

24 Robert J. Marks II Error Backpropagation
Problem: for an arbitrary weight w_{jk}(l), perform the update
w_{jk}(l) \leftarrow w_{jk}(l) - \eta \frac{\partial E}{\partial w_{jk}(l)}
A solution: error backpropagation, i.e. the chain rule for partial derivatives:
\frac{\partial E}{\partial w_{jk}(l)} = \frac{\partial E}{\partial s_j(l)} \cdot \frac{\partial s_j(l)}{\partial sum_j(l)} \cdot \frac{\partial sum_j(l)}{\partial w_{jk}(l)}

25 Robert J. Marks II Each Partial is Evaluated (Beautiful Math!!!)
\frac{\partial s_j(l)}{\partial sum_j(l)} = \frac{\partial}{\partial sum_j(l)} \frac{1}{1 + \exp[-sum_j(l)]} = s_j(l) \, [1 - s_j(l)]
\frac{\partial sum_j(l)}{\partial w_{jk}(l)} = s_k(l-1)
\frac{\partial E}{\partial s_j(l)} = \delta_j(l) = \sum_n \delta_n(l+1) \, s_n(l+1) \, [1 - s_n(l+1)] \, w_{nj}(l+1)

26 Robert J. Marks II Weight Update
\frac{\partial E}{\partial w_{jk}(l)} = \frac{\partial E}{\partial s_j(l)} \cdot \frac{\partial s_j(l)}{\partial sum_j(l)} \cdot \frac{\partial sum_j(l)}{\partial w_{jk}(l)} = \delta_j(l) \, s_j(l) \, [1 - s_j(l)] \, s_k(l-1)
w_{jk}(l) \leftarrow w_{jk}(l) - \eta \, \delta_j(l) \, s_j(l) \, [1 - s_j(l)] \, s_k(l-1)

27 Robert J. Marks II Step #1: Input Data & Feedforward
[Figure: a 2-3-2 network; inputs i_1, i_2 = s_1(0), s_2(0); hidden states s_1(1), s_2(1), s_3(1); outputs s_1(2) = o_1, s_2(2) = o_2.]
The states of all of the neurons are determined by the states of the neurons below them and the interconnect weights.

28 Robert J. Marks II Step #2: Evaluate the Output Error and Backpropagate to Find the δ's for Each Neuron
[Figure: the same network; each neuron now carries its state s_j(l) and its delta δ_j(l), from δ_1(2), δ_2(2) at the output down to δ_1(0), δ_2(0) at the input.]
Each neuron now keeps track of two numbers. The δ's for each neuron are determined by "back-propagating" the output error towards the input.

29 Robert J. Marks II Step #3: Update Weights
[Figure: the same network, highlighting the weight w_{32}(1) between hidden neuron 3 and input neuron 2.]
w_{32}(1) \leftarrow w_{32}(1) - \eta \, \delta_3(1) \, s_3(1) \, [1 - s_3(1)] \, s_2(0)
Weight updates are performed within the neural network architecture.
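
A compact sketch tying the three steps together for the 2-3-2 network of slides 27-29, with sigmoid neurons, squared error, and no bias terms; the initial weights, learning rate, and training pair are invented.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(3)
W1 = rng.normal(scale=0.5, size=(3, 2))  # w_jk(1): input layer -> hidden layer
W2 = rng.normal(scale=0.5, size=(2, 3))  # w_jk(2): hidden layer -> output layer
eta = 0.5

i = np.array([0.0, 1.0])  # inputs s(0)
t = np.array([1.0, 0.0])  # targets

for _ in range(2000):
    # Step 1: feedforward -- each state depends on the layer below it.
    s1 = sigmoid(W1 @ i)   # hidden states s_j(1)
    o  = sigmoid(W2 @ s1)  # output states s_j(2) = o_j

    # Step 2: backpropagate the deltas.
    delta2 = o - t                            # delta_j(2) = dE/ds_j(2)
    delta1 = W2.T @ (delta2 * o * (1.0 - o))  # delta_j(1), per the slide-25 formula

    # Step 3: update the weights, w_jk(l) <- w_jk(l) - eta * dE/dw_jk(l).
    W2 -= eta * np.outer(delta2 * o * (1.0 - o), s1)
    W1 -= eta * np.outer(delta1 * s1 * (1.0 - s1), i)

print(sigmoid(W2 @ sigmoid(W1 @ i)))  # approaches the target [1, 0]
```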

30 Robert J. Marks II Neural Smithing
- Bias
- Momentum
- Batch training
- Learning versus memorization
- Cross validation
- The curse of dimensionality
- Variations

31 Robert J. Marks II Bias
- Bias is used with MLPs
  - at the input
  - at the hidden layers (sometimes)

32 Robert J. Marks II Momentum
- Steepest descent: w_{jk}^{m+1}(l) = w_{jk}^{m}(l) + \Delta w_{jk}^{m}(l)
- With momentum: w_{jk}^{m+1}(l) = w_{jk}^{m}(l) + \Delta w_{jk}^{m}(l) + \alpha \, \Delta w_{jk}^{m-1}(l)
  - The new step is affected by the previous step
  - m is the iteration number
  - Convergence is improved
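
A minimal sketch of a momentum update on a toy quadratic error surface; eta and alpha stand for the step size and momentum coefficient above, and all values are my own placeholders.

```python
import numpy as np

def momentum_step(w, grad, prev_delta, eta=0.1, alpha=0.9):
    """Return (new_weights, delta) for one momentum-accelerated step."""
    delta = -eta * grad + alpha * prev_delta  # new step affected by previous step
    return w + delta, delta

# Toy error surface E(w) = 0.5 * ||w||^2, so grad = w.
w = np.array([4.0, -3.0])
delta = np.zeros_like(w)
for _ in range(50):
    w, delta = momentum_step(w, grad=w, prev_delta=delta)
print(w)  # heads toward the minimum at the origin
```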

33 Robert J. Marks II Backpropagation: Batch Training
- Accumulate the error from all training data prior to the weight update
  - True steepest descent
  - Update the weights once per epoch
- Training the layered perceptron one data pair at a time
  - Randomize the data order to avoid structure
  - The Widrow-Hoff algorithm

34 Robert J. Marks II Learning versus Memorization: Both Have Zero Training Error
[Figure: two boundaries through the same training data: one close to the concept (truth), labeled good generalization (learning), and one that threads every training point, labeled bad generalization (memorization); legend shows training data and test data.]

35 Robert J. Marks II Alternate View
[Figure: curves fit to the same data, labeled concept, learning, and memorization (overfitting).]

36 Robert J. Marks II Learning versus Memorization (cont.)
- Successful learning:
  - recognizing data outside the training set, e.g. data in the test set
  - i.e. the neural network must successfully classify (interpolate) inputs it has not seen before
- How can we assure learning?
  - Cross validation
  - Choosing the neural network structure
    - Pruning
    - Genetic algorithms

37 Robert J. Marks II Cross Validation
[Figure: training error and test error plotted against training iterations (m); the training error keeps falling while the test error reaches a minimum and then rises.]
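
A sketch of this use of cross validation: hold out part of the data, track the held-out error over training iterations, and note where it is smallest. The warm_start loop with scikit-learn's MLPClassifier and the synthetic data are my own choices, not from the slides.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1.0).astype(int)  # circle-in-square labels

X_train, y_train = X[:200], y[:200]
X_val, y_val = X[200:], y[200:]  # held-out (validation) set

# max_iter=1 with warm_start=True trains one more iteration per fit() call;
# scikit-learn will warn about non-convergence each time, which is expected here.
model = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1, warm_start=True,
                      random_state=0)
best_err, best_iter = np.inf, 0
for m in range(1, 201):            # m = training iteration, as on the slide
    model.fit(X_train, y_train)
    val_err = 1.0 - model.score(X_val, y_val)
    if val_err < best_err:
        best_err, best_iter = val_err, m

print("lowest validation error", best_err, "at iteration", best_iter)
```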

38 Robert J. Marks II The Curse of Dimensionality
For many problems, the required number of training data grows exponentially with the dimension of the input.
Example: for N = 2 inputs, suppose 100 = 10^2 training data pairs suffice.
For N = 3 inputs, 10^3 = 1000 training data pairs are then needed.
In general, on the order of 10^N training data pairs are needed for many important problems.

39 Robert J. Marks II Example: Classifying a Circle in a Square (N = 2)
[Figure: a neural net with inputs i_1, i_2 and output o, next to a square sampled with 100 = 10^2 points.]

40 Robert J. Marks II Example: Classifying a Sphere in a Cube (N = 3)
[Figure: a neural net with inputs i_1, i_2, i_3 and output o, next to a cube sampled as 10 layers of 10^2 points each, for 10^3 = 10^N points in all.]
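
A small sketch of these examples: grid-sample the input space at 10 points per axis and label each point as inside or outside the unit ball; the point count grows as 10^N. The helper name is mine.

```python
import numpy as np
from itertools import product

def grid_labels(n_dims, points_per_axis=10):
    """Return (points, labels) for a regular grid in [-1, 1]^n_dims."""
    axis = np.linspace(-1.0, 1.0, points_per_axis)
    pts = np.array(list(product(axis, repeat=n_dims)))
    labels = (np.sum(pts ** 2, axis=1) < 1.0).astype(int)  # inside the unit ball?
    return pts, labels

for n in (2, 3, 4):
    pts, _ = grid_labels(n)
    print(f"N={n}: {len(pts)} training points")  # 100, 1000, 10000, ...
```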

41 Robert J. Marks II Variations
- Architecture variations for MLPs
  - Recurrent neural networks
  - Radial basis functions
  - Cascade correlation
  - Fuzzy MLPs
- Training algorithms

42 Robert J. Marks II Applications
- Power engineering
- Finance
- Bioengineering
- Control
- Industrial applications
- Politics

43 Robert J. Marks II Political Applications Robert Novak syndicated column Washington, February 18, 1996 UNDECIDED BOWLERS “President Clinton’s pollsters have identified the voters who will determine whether he will be elected to a second term: two-parent families whose members bowl for recreation.” “Using a technique they call the ‘neural network,’ Clinton advisors contend that these family bowlers are the quintessential undecided voters. Therefore, these are the people who must be targeted by the president.”

44 Robert J. Marks II “A footnote: Two decades ago, Illinois Democratic Gov. Dan Walker campaigned heavily in bowling alleys in the belief he would find swing voters there. Walker had national political ambitions but ended up in federal prison.” Robert Novak syndicated column Washington, February 18, 1996 (continued)

45 Robert J. Marks II Finis

