
1 Supervised learning network G.Anuradha

2 Learning objectives
–The basic networks in supervised learning
–Why perceptron networks improve on the Hebb rule
–Single discrete perceptron training algorithm (SDPTA)
–Single continuous perceptron training algorithm (SCPTA)
–Linearly separable and inseparable pattern classification

3 Architecture
–One of the earlier attempts to build intelligent, self-learning systems using simple components
–Used to solve simple classification problems
–Used by Rosenblatt to explain the pattern-recognition abilities of biological visual systems
[Figure: perceptron architecture — sensory unit → associator unit (binary activation function) → response unit (activation +1, 0, -1)]


5 Quiz
Which of these features would probably not be useful for classifying handwritten digits from binary images?
–Raw pixels from the images
–A set of strokes that can be combined to form various digits
–The day of the year on which the digits were drawn
–The number of pixels set to one

6 Perceptron networks - Theory
Perceptrons are single-layer feed-forward networks.
1. They have three units: an input (sensory) unit, a hidden (associator) unit, and an output (response) unit.
2. The weights between the input and hidden units are fixed at -1, 0, or 1, assigned at random; the associator uses a binary activation function.
3. The output unit uses a (1, 0, -1) activation, a binary step function with threshold θ.
4. The output of the perceptron is y = f(y_in), where f(y_in) = 1 if y_in > θ, 0 if -θ ≤ y_in ≤ θ, and -1 if y_in < -θ.
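A minimal Python sketch of this threshold activation (the default threshold value is an assumption for illustration, not taken from the slides):

```python
def perceptron_activation(y_in, theta=0.2):
    """Binary step activation with threshold theta: returns +1, 0, or -1."""
    if y_in > theta:
        return 1
    elif y_in < -theta:
        return -1
    return 0
```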

7 Perceptron theory
5. Weights are updated only between the hidden (associator) and output units.
6. The network checks for an error between the calculated output and the target.
7. Error = target - calculated output.
8. If there is an error, the weights are adjusted as w_i(new) = w_i(old) + α t x_i and b(new) = b(old) + α t, where α is the learning rate and t is the target, which is -1 or +1.
If there is no error, the weights do not change and training is stopped.

8 Single classification perceptron network
[Figure: single-output perceptron — input units X_1 ... X_i ... X_n receive signals x_1 ... x_n; a bias unit x_0 = 1 contributes b; weights w_1 ... w_i ... w_n connect the inputs to the single output unit Y, which produces y.]

9 Perceptron training algorithm for single output classes
Step 0: Initialize the weights, bias, and learning rate α (between 0 and 1).
Step 1: Perform Steps 2-6 while the stopping condition is false.
Step 2: Perform Steps 3-5 for each training pair, indicated by s:t.
Step 3: Apply the identity activation function at the input layer: x_i = s_i.
Step 4: Calculate the net input y_in = b + Σ x_i w_i and the output y = f(y_in).

10 Perceptron training algorithm for single output classes
Step 5: Adjust the weights and bias by comparing the actual output y with the desired (target) output t.
If y ≠ t: w_i(new) = w_i(old) + α t x_i and b(new) = b(old) + α t.
Else: w_i(new) = w_i(old) and b(new) = b(old).
Step 6: Train the network until there is no weight change; this is the stopping condition. If it is not met, start again from Step 2. A sketch of this training loop in code follows.
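A minimal Python sketch of Steps 0-6 for a single output unit, assuming bipolar targets and the threshold activation from slide 6 (the function and variable names, the zero initialization, and the epoch cap are illustrative, not from the slides):

```python
def train_perceptron(samples, targets, alpha=1.0, theta=0.2, max_epochs=100):
    """Single-output perceptron training (Steps 0-6): repeat until no weight changes."""
    n = len(samples[0])
    weights, bias = [0.0] * n, 0.0             # Step 0: initialize weights and bias
    for _ in range(max_epochs):                # Step 1: loop until stopping condition
        changed = False
        for s, t in zip(samples, targets):     # Step 2: each training pair s:t
            x = s                              # Step 3: identity activation, x_i = s_i
            y_in = bias + sum(xi * wi for xi, wi in zip(x, weights))
            y = 1 if y_in > theta else (-1 if y_in < -theta else 0)   # Step 4
            if y != t:                         # Step 5: update only on error
                weights = [wi + alpha * t * xi for wi, xi in zip(weights, x)]
                bias += alpha * t
                changed = True
        if not changed:                        # Step 6: stop when no weight changed
            break
    return weights, bias
```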

11 [Flowchart: perceptron training — start; initialize the weights and bias; set α (0 to 1); for each pair s:t, activate the input units (x_i = s_i), calculate the net input, and apply the activation function y = f(y_in); if y ≠ t update the weights and bias, otherwise keep w(new) = w(old) and b(new) = b(old); if any weight changed, continue training, else stop.]

12 Perceptron training algorithm for multiple output classes
Step 0: Initialize the weights, biases, and learning rate suitably.
Step 1: Check the stopping condition; if it is false, perform Steps 2-6.
Step 2: Perform Steps 3-5 for each bipolar or binary training vector pair s:t.
Step 3: Set the identity activation at each input unit, i = 1 to n: x_i = s_i.

13 Perceptron training algorithm for multiple output classes
Step 4: Calculate the output response of each output unit, j = 1 to m: the net input is y_in_j = b_j + Σ_i x_i w_ij, and the activation function is applied over it to obtain the output response y_j = f(y_in_j).

14 Perceptron training algorithm for multiple output classes
Step 5: Adjust the weights and biases for j = 1 to m and i = 1 to n.
If t_j ≠ y_j: w_ij(new) = w_ij(old) + α t_j x_i and b_j(new) = b_j(old) + α t_j.
Else: w_ij(new) = w_ij(old) and b_j(new) = b_j(old).
Step 6: Check the stopping condition: if there is no change in the weights, stop the training process. A code sketch of this multi-output update follows.
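A sketch of Step 5 for m output units in Python, assuming the weights are stored as a list of per-output weight vectors (the names are illustrative):

```python
def update_multi_output(weights, biases, x, y, t, alpha=1.0):
    """Step 5: adjust w_ij and b_j only for output units whose response is wrong."""
    for j in range(len(biases)):              # j = 1..m output units
        if y[j] != t[j]:                      # error at output unit j
            weights[j] = [w_ij + alpha * t[j] * x_i
                          for w_ij, x_i in zip(weights[j], x)]
            biases[j] += alpha * t[j]
    return weights, biases
```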

15 Example of AND
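A worked version of the AND example using the train_perceptron sketch above, with bipolar inputs and targets (the exact inputs, learning rate, and threshold used on the original slide are not shown, so these values are assumptions):

```python
# Bipolar AND: output is +1 only when both inputs are +1.
samples = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
targets = [1, -1, -1, -1]

weights, bias = train_perceptron(samples, targets, alpha=1.0, theta=0.2)
print(weights, bias)   # for these assumed settings: weights [1.0, 1.0], bias -1.0

# Check: the learned line w1*x1 + w2*x2 + b separates (+1, +1) from the other patterns.
for s, t in zip(samples, targets):
    y_in = bias + sum(xi * wi for xi, wi in zip(s, weights))
    print(s, t, y_in)
```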

16 Linear separability
The perceptron network illustrates the concept of linear separability: the separating line is based on the threshold θ.
The condition separating the region of positive response from the region of zero response is w1·x1 + w2·x2 + b > θ.
The condition separating the region of zero response from the region of negative response is w1·x1 + w2·x2 + b < -θ.
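A small sketch of these two decision boundaries in Python; the particular weights, bias, and threshold are assumptions for illustration (they happen to match the AND result above):

```python
def region(x1, x2, w1=1.0, w2=1.0, b=-1.0, theta=0.2):
    """Classify a 2-D point into the positive, zero, or negative response region."""
    net = w1 * x1 + w2 * x2 + b
    if net > theta:
        return "positive"      # w1*x1 + w2*x2 + b >  theta
    if net < -theta:
        return "negative"      # w1*x1 + w2*x2 + b < -theta
    return "zero"              # the band between the two separating lines

print(region(1, 1), region(1, -1))   # positive negative
```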

17 What binary threshold neurons cannot do
A binary threshold output unit cannot even tell whether two single-bit features are the same!
Positive cases (same): (1,1) → 1; (0,0) → 1
Negative cases (different): (1,0) → 0; (0,1) → 0
The four input-output pairs give four inequalities that are impossible to satisfy:
w1 + w2 ≥ θ, 0 ≥ θ, w1 < θ, w2 < θ.
The first two require w1 + w2 ≥ θ with θ ≤ 0, while the last two give w1 + w2 < 2θ ≤ θ — a contradiction.

18 A geometric view of what binary threshold neurons cannot do
Imagine a "data-space" in which the axes correspond to components of an input vector.
–Each input vector is a point in this space.
–A weight vector defines a plane in data-space.
–The weight plane is perpendicular to the weight vector and misses the origin by a distance equal to the threshold.
[Figure: the points (0,0) and (1,1) (output = 1) and (0,1) and (1,0) (output = 0) plotted in data-space with a candidate weight plane.] The positive and negative cases cannot be separated by a plane.

19 Discriminating simple patterns under translation with wrap-around
Suppose we just use pixels as the features. Can a binary threshold unit discriminate between different patterns that have the same number of on pixels?
–Not if the patterns can translate with wrap-around!
[Figure: translated copies of pattern A and pattern B, each with the same number of on pixels.]

20 Learning with hidden units
For such linearly inseparable problems we require an additional layer, called the hidden layer.
Networks without hidden units are very limited in the input-output mappings they can learn to model.
We need multiple layers of adaptive, non-linear hidden units.

21 Solution to the EXOR (XOR) problem: add a layer of hidden units between the inputs and the output (a hand-built sketch follows).
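The original slide's diagram is not reproduced in the transcript. As an illustration, one well-known construction uses two threshold hidden units (roughly "x1 and not x2" and "x2 and not x1") feeding an OR output unit; the specific weights below are assumptions, not taken from the slide:

```python
def step(net, theta=0.0):
    """Binary step activation: 1 if net > theta, else 0."""
    return 1 if net > theta else 0

def xor(x1, x2):
    """XOR via one hidden layer of two threshold units plus a threshold output unit."""
    z1 = step(1 * x1 - 1 * x2 - 0.5)    # hidden unit 1: fires only for (1, 0)
    z2 = step(-1 * x1 + 1 * x2 - 0.5)   # hidden unit 2: fires only for (0, 1)
    return step(1 * z1 + 1 * z2 - 0.5)  # output unit: OR of the two hidden units

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor(a, b))   # prints 0, 1, 1, 0
```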

22 ADALINE
A network with a single linear unit is called an ADALINE (ADAptive LINear Neuron).
–The input-output relationship is linear.
–It uses bipolar activation for its input signals and its target output.
–The weights between the input and the output are adjustable, and there is only one output unit.
–It is trained using the delta rule, also known as the least-mean-square (LMS) or Widrow-Hoff rule.

23 Architecture
Delta rule for a single output unit:
–Minimize the error over all training patterns.
–This is done by reducing the error for each pattern one at a time.
The delta rule for adjusting the ith weight (i = 1 to n) is Δw_i = α (t - y_in) x_i.
In the case of several output units, the delta rule for adjusting the weight from the ith input unit to the jth output unit is Δw_ij = α (t_j - y_in_j) x_i.
A code sketch of this update follows.
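A minimal sketch of one delta-rule (Widrow-Hoff) update in Python for a single linear output unit; the function name and the default learning rate are illustrative:

```python
def delta_rule_update(weights, bias, x, t, alpha=0.1):
    """One LMS update: w_i += alpha * (t - y_in) * x_i, b += alpha * (t - y_in)."""
    y_in = bias + sum(xi * wi for xi, wi in zip(x, weights))   # linear net input
    err = t - y_in                                             # error for this pattern
    weights = [wi + alpha * err * xi for wi, xi in zip(weights, x)]
    bias += alpha * err
    return weights, bias, err
```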

24 Difference between Perceptron and Delta Rule
Perceptron rule: originates from the Hebbian assumption; stops after a finite number of learning steps.
Delta rule: derived from the gradient-descent method; continues forever, converging asymptotically to the solution; minimizes the error over all training patterns.

25 Architecture
[Figure: ADALINE architecture — input units X1, X2, ..., Xn carry signals x1, x2, ..., xn, with a bias unit x0 = 1; weights w1, w2, ..., wn and bias b produce the net input y_in = b + Σ x_i w_i; the output error generator compares y_in with the target t to form e = t - y_in, which drives the adaptive (LMS) algorithm that adjusts the weights.]

26 ADALINE training algorithm
Step 0: Set the weights and bias to some random values other than zero, and set the learning rate α.
Step 1: Perform Steps 2-6 while the stopping condition is false.
Step 2: Perform Steps 3-5 for each bipolar training pair s:t.
Step 3: Set the activations of the input units, i = 1 to n: x_i = s_i.
Step 4: Calculate the net input to the output unit: y_in = b + Σ x_i w_i.

27 ADALINE training algorithm (contd.)
Step 5: Update the weights and bias for i = 1 to n: w_i(new) = w_i(old) + α (t - y_in) x_i and b(new) = b(old) + α (t - y_in).
Step 6: If the highest weight change that occurred during training is smaller than a specified tolerance, stop the training; else continue. (Stopping condition.) A code sketch of this loop follows.
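A minimal sketch of the full ADALINE loop (Steps 0-6) in Python, reusing the delta_rule_update sketch above; the random initialization range, tolerance, and epoch cap are assumptions:

```python
import random

def train_adaline(samples, targets, alpha=0.1, tolerance=0.01, max_epochs=1000):
    """ADALINE training: delta-rule updates until the largest weight change is small."""
    n = len(samples[0])
    weights = [random.uniform(-0.5, 0.5) for _ in range(n)]   # Step 0: nonzero random weights
    bias = random.uniform(-0.5, 0.5)
    for _ in range(max_epochs):                               # Step 1
        largest_change = 0.0
        for x, t in zip(samples, targets):                    # Steps 2-3
            old = weights[:]
            weights, bias, _ = delta_rule_update(weights, bias, x, t, alpha)  # Steps 4-5
            largest_change = max(largest_change,
                                 max(abs(w - o) for w, o in zip(weights, old)))
        if largest_change < tolerance:                        # Step 6: stopping condition
            break
    return weights, bias
```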

28 Testing algorithm
Step 0: Initialize the weights (obtained from the training algorithm).
Step 1: Perform Steps 2-4 for each bipolar input vector x.
Step 2: Set the activations of the input units to x.
Step 3: Calculate the net input y_in = b + Σ x_i w_i.
Step 4: Apply the activation function over the net input: output 1 if y_in ≥ 0, else -1.
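A small sketch of this testing pass, using weights returned by the training sketch above (the bipolar step output assumed in Step 4 is an illustration):

```python
def adaline_predict(weights, bias, x):
    """Testing pass: compute the net input and apply a bipolar step activation."""
    y_in = bias + sum(xi * wi for xi, wi in zip(x, weights))
    return 1 if y_in >= 0 else -1
```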

29 [Flowchart: ADALINE training — start; initialize the weights, bias, and α; input the specified error tolerance Es; for each pair s:t, activate the input units (x_i = s_i), calculate the net input y_in = b + Σ x_i w_i, update the weights, and accumulate the error E_i = Σ (t - y_in)²; if E_i ≤ Es, stop, otherwise continue with the next epoch.]

30 MADALINE
–Two or more ADALINEs are integrated to develop the MADALINE model.
–Used for nonlinearly separable logic functions such as the EX-OR function.
–Used for adaptive noise cancellation and adaptive inverse control.
–In noise cancellation the objective is to filter out an interference component by identifying a linear model of a measurable noise source and the corresponding immeasurable interference.
–Applications include ECG filtering and echo elimination from long-distance telephone transmission lines.

