
1 Non-Bayes classifiers. Linear discriminants, neural networks.

2 Discriminant functions (1). Bayes classification rule: decide $\omega_1$ if $P(\omega_1|x) > P(\omega_2|x)$, otherwise decide $\omega_2$. Instead we might try to find a function $g(x)$ such that $g(x) > 0$ for $x \in \omega_1$ and $g(x) < 0$ for $x \in \omega_2$. $g(x)$ is called a discriminant function; $g(x) = 0$ is the decision surface.

3 Discriminant functions (2). [Figure: two classes separated by a decision surface.] Linear discriminant function: $g(x) = w^T x + w_0$. The decision surface $g(x) = 0$ is a hyperplane.
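As a minimal sketch of how such a discriminant is used, the snippet below classifies a point by the sign of $g(x) = w^T x + w_0$; the weight and input values are invented purely for illustration.

```python
import numpy as np

# Hypothetical weight vector w and bias w0 (illustration values only).
w = np.array([2.0, -1.0])
w0 = 0.5

def g(x):
    """Linear discriminant function: g(x) = w^T x + w0."""
    return w @ x + w0

# Decide class 1 if g(x) > 0, class 2 if g(x) < 0.
x = np.array([1.0, 3.0])
print("class 1" if g(x) > 0 else "class 2")  # g(x) = 2 - 3 + 0.5 = -0.5 -> class 2
```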

4 Linear discriminant – perceptron cost function. Replace $x$ with the augmented vector $x' = [x^T, 1]^T$ and $w$ with $w' = [w^T, w_0]^T$. Thus now the decision function is $g(x) = w^T x$ and the decision surface is $w^T x = 0$. Perceptron cost function: $J(w) = \sum_{x \in Y} \delta_x w^T x$, where $Y$ is the set of samples misclassified by $w$ and $\delta_x = -1$ if $x \in \omega_1$, $\delta_x = +1$ if $x \in \omega_2$.

5 Linear discriminant – perceptron cost function. Perceptron cost function: $J(w) = \sum_{x \in Y} \delta_x w^T x$. [Figure: two classes and misclassified samples on the wrong side of the decision surface.] The value of $J(w)$ is proportional to the sum of distances of all misclassified samples to the decision surface. If the discriminant function separates the classes perfectly, then $J(w) = 0$. Otherwise $J(w) > 0$, and we want to minimize it. $J(w)$ is continuous and piecewise linear, so we might try to use the gradient descent algorithm.

6 Linear discriminant – Perceptron algorithm. Gradient descent: $w(t+1) = w(t) - \rho_t \frac{\partial J(w)}{\partial w}$. At points where $J(w)$ is differentiable, $\frac{\partial J(w)}{\partial w} = \sum_{x \in Y} \delta_x x$. Thus $w(t+1) = w(t) - \rho_t \sum_{x \in Y} \delta_x x$. The perceptron algorithm converges when the classes are linearly separable, under some conditions on $\rho_t$.
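Below is a small Python sketch of this training rule, assuming the augmented-vector convention of slide 4 (bias folded into $w$), a constant learning rate rho, and labels coded as +1/-1 (so $\delta_x = -y$ for misclassified samples); the toy data are invented for illustration.

```python
import numpy as np

def perceptron(X, y, rho=0.1, max_epochs=100):
    """Batch perceptron training; X is (n, d), labels y are +1 / -1.

    Uses augmented vectors x' = [x, 1] and the gradient-descent update
    on the perceptron cost J(w):
        w <- w + rho * sum of y_i * x_i over misclassified samples.
    """
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])  # x' = [x, 1]
    w = np.zeros(Xa.shape[1])
    for _ in range(max_epochs):
        misclassified = y * (Xa @ w) <= 0          # wrong side of the surface
        if not misclassified.any():                # perfect separation: J(w) = 0
            break
        w += rho * (y[misclassified, None] * Xa[misclassified]).sum(axis=0)
    return w

# Toy linearly separable data, invented for illustration.
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
w = perceptron(X, y)
print(np.sign(np.hstack([X, np.ones((len(X), 1))]) @ w))  # -> [ 1.  1. -1. -1.]
```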

7 Sum of error squares estimation. Want to find a discriminant function $g(x) = w^T x$ whose output is similar to a desired output. Let $y$ denote the desired output: $y = 1$ for one class and $y = -1$ for the other. Use the sum of error squares as the similarity criterion: $J(w) = \sum_{i=1}^{N} (y_i - w^T x_i)^2$.

8 Sum of error squares estimation. Minimize the mean square error: $J(w) = E[(y - w^T x)^2]$, so $\frac{\partial J(w)}{\partial w} = -2\,E[x (y - w^T x)] = 0$. Thus $w = (E[x x^T])^{-1} E[x y]$; in sample form, $w = \left(\sum_i x_i x_i^T\right)^{-1} \sum_i x_i y_i$.
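A sketch of this estimate in Python, solving the normal equations $\left(\sum_i x_i x_i^T\right) w = \sum_i x_i y_i$ directly; the toy data and labels are invented for illustration.

```python
import numpy as np

# Toy data: desired output y is +1 for one class, -1 for the other.
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])

Xa = np.hstack([X, np.ones((X.shape[0], 1))])  # augmented vectors [x, 1]

# Normal equations: (sum_i x_i x_i^T) w = sum_i x_i y_i
w = np.linalg.solve(Xa.T @ Xa, Xa.T @ y)
print(np.sign(Xa @ w))  # discriminant output thresholded at zero
```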

9 Neurons

10 Artificial neuron. [Figure: inputs $x_1, \dots, x_d$ weighted by $w_1, \dots, w_d$, summed, and passed through a threshold function $f$.] The figure above represents an artificial neuron calculating: $f\left(\sum_{i=1}^{d} w_i x_i + w_0\right)$.

11 Artificial neuron. Threshold functions $f$: the step function, $f(v) = 1$ if $v \geq 0$ and $f(v) = 0$ otherwise, and the logistic function, $f(v) = \frac{1}{1 + e^{-av}}$.
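A small Python sketch of the neuron from slide 10 with both threshold functions; the input and weight values are arbitrary illustration numbers.

```python
import numpy as np

def step(v):
    """Step function: 1 if v >= 0, else 0."""
    return np.where(v >= 0, 1.0, 0.0)

def logistic(v, a=1.0):
    """Logistic function: 1 / (1 + exp(-a v)), a smooth 0..1 threshold."""
    return 1.0 / (1.0 + np.exp(-a * v))

def neuron(x, w, w0, f):
    """Artificial neuron: f(w^T x + w0)."""
    return f(w @ x + w0)

x = np.array([0.5, -1.0])          # inputs (illustration values)
w = np.array([1.0, 2.0])           # weights (illustration values)
print(neuron(x, w, 0.2, step))     # 0.0, since 0.5 - 2.0 + 0.2 < 0
print(neuron(x, w, 0.2, logistic)) # ~0.214, a smooth version of the same decision
```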

12 Combining artificial neurons Multilayer perceptron with 3 layers.

13 [Figure: a multilayer perceptron network.]

14 Discriminating ability of multilayer perceptron. Since a 3-layer perceptron can approximate any smooth function, it can approximate $g(x) = P(\omega_1|x) - P(\omega_2|x)$ - the optimal discriminant function of two classes.

15 Training of multilayer perceptron. [Figure: neurons of layer $r-1$ feeding each neuron of layer $r$ through weights $w_{jk}^r$, each neuron applying the threshold function $f$.]

16 Training and cost function. Desired network output for training sample $x(i)$: $y(i)$. Trained network output: $\hat{y}(i)$. Cost function for one training sample: $E(i) = \frac{1}{2} \sum_m (y_m(i) - \hat{y}_m(i))^2$. Total cost function: $J = \sum_{i=1}^{N} E(i)$. Goal of the training: find the values of the weights $w_{jk}^r$ which minimize the cost function.

17 Gradient descent. Denote by $w_j^r$ the weight vector of neuron $j$ in layer $r$. Gradient descent: $w_j^r(\text{new}) = w_j^r(\text{old}) - \mu \frac{\partial J}{\partial w_j^r}$. Since $J = \sum_{i=1}^{N} E(i)$, we might want to update the weights after processing each training sample separately: $w_j^r(\text{new}) = w_j^r(\text{old}) - \mu \frac{\partial E(i)}{\partial w_j^r}$.

18 Gradient descent. Chain rule for differentiating composite functions: $\frac{\partial E(i)}{\partial w_j^r} = \frac{\partial E(i)}{\partial v_j^r} \frac{\partial v_j^r}{\partial w_j^r}$, where $v_j^r = (w_j^r)^T y^{r-1}$ is the activation of neuron $j$ in layer $r$ and $y^{r-1}$ is the output of layer $r-1$. Denote: $\delta_j^r = \frac{\partial E(i)}{\partial v_j^r}$, so that $\frac{\partial E(i)}{\partial w_j^r} = \delta_j^r \, y^{r-1}$.

19 Backpropagation. If $r = L$, then $\delta_j^L = (\hat{y}_j(i) - y_j(i)) \, f'(v_j^L)$. If $r < L$, then $\delta_j^r = f'(v_j^r) \sum_k \delta_k^{r+1} w_{kj}^{r+1}$.

20 Backpropagation algorithm. Initialization: initialize all weights with random values. Forward computations: for each training vector $x(i)$ compute all activations $v_j^r$ and the network output $\hat{y}(i)$. Backward computations: for each $i$, $j$ and $r = L, L-1, \dots, 2$ compute $\delta_j^r$. Update weights: $w_j^r(\text{new}) = w_j^r(\text{old}) - \mu \, \delta_j^r \, y^{r-1}$.
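The following is a compact Python sketch of this algorithm for a network with one hidden layer, using the logistic threshold function and per-sample (online) updates as on slide 17; the layer sizes, learning rate, and XOR data are chosen only for illustration, and convergence depends on the random initialization.

```python
import numpy as np

def logistic(v):
    return 1.0 / (1.0 + np.exp(-v))

def train_mlp(X, Y, hidden=4, mu=0.5, epochs=5000, seed=0):
    """One-hidden-layer MLP trained by backpropagation (slides 16-20).

    Forward:  y1 = f(W1 x + b1),  yhat = f(W2 y1 + b2).
    Backward: delta2 = (yhat - t) f'(v2),  delta1 = f'(v1) * (W2^T delta2).
    Update:   each weight moves by -mu * delta * (input to that layer),
    applied one sample at a time (online updates).
    """
    rng = np.random.default_rng(seed)
    d, m = X.shape[1], Y.shape[1]
    W1, b1 = rng.normal(0.0, 0.5, (hidden, d)), np.zeros(hidden)
    W2, b2 = rng.normal(0.0, 0.5, (m, hidden)), np.zeros(m)
    for _ in range(epochs):
        for x, t in zip(X, Y):
            y1 = logistic(W1 @ x + b1)             # forward computations
            yhat = logistic(W2 @ y1 + b2)
            # f'(v) = f(v)(1 - f(v)) for the logistic, written via the outputs
            d2 = (yhat - t) * yhat * (1.0 - yhat)  # output-layer deltas
            d1 = y1 * (1.0 - y1) * (W2.T @ d2)     # hidden-layer deltas
            W2 -= mu * np.outer(d2, y1); b2 -= mu * d2
            W1 -= mu * np.outer(d1, x);  b1 -= mu * d1
    return W1, b1, W2, b2

# XOR is not linearly separable, so no single linear discriminant solves it.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = np.array([[0.], [1.], [1.], [0.]])
W1, b1, W2, b2 = train_mlp(X, Y)
for x in X:
    print(x, logistic(W2 @ logistic(W1 @ x + b1) + b2))  # outputs near 0, 1, 1, 0
```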

21 MLP issues. What is the best network configuration? How should the learning parameter $\mu$ be chosen? When should training be stopped? Should another threshold function $f$ or cost function $J$ be chosen?

