
1 CSSE463: Image Recognition Day 14
Lab due Weds, 3:25. My solutions assume that you don't threshold the shapes.ppt image.
This week:
Monday: Neural networks
Tuesday: SVM introduction and derivation
Thursday: Project info, SVM demo
Friday: SVM lab

2 Perceptron training
Each misclassified sample is used to change the weights “a little bit” so that the classification is better the next time.
Consider inputs of the form x = [x1, x2, …, xn], with target label y ∈ {+1, −1}.
Algorithm (Hebbian learning):
Randomize weights
Loop until convergence:
If wx + b > 0 and y is −1: wi −= η·xi for all i; b −= η
Else if wx + b < 0 and y is +1: wi += η·xi for all i; b += η
Else (it’s classified correctly): do nothing
η is the learning rate (a parameter that can be tuned).
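Below is a minimal sketch of this update rule in Python/NumPy (the course materials use Matlab; the function name and toy data here are made up for illustration):

```python
import numpy as np

def train_perceptron(X, y, eta=0.1, max_epochs=100):
    """Hebbian-style perceptron training as on the slide.
    X: (n_samples, n_features) inputs; y: labels in {+1, -1}."""
    rng = np.random.default_rng(0)
    w = rng.uniform(-0.5, 0.5, X.shape[1])   # randomize weights
    b = 0.0
    for _ in range(max_epochs):
        converged = True
        for xi, yi in zip(X, y):
            activation = w @ xi + b
            if activation > 0 and yi == -1:    # false positive
                w -= eta * xi
                b -= eta
                converged = False
            elif activation < 0 and yi == +1:  # false negative
                w += eta * xi
                b += eta
                converged = False
            # else: classified correctly, do nothing
        if converged:                          # loop until convergence
            break
    return w, b

# Toy linearly separable data
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([+1, +1, -1, -1])
w, b = train_perceptron(X, y)
print(np.sign(X @ w + b))   # should match y after convergence
```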

3 Multilayer feedforward neural nets
Many perceptrons, organized into layers:
Input (sensory) layer
Hidden layer(s): 2 proven sufficient to model any arbitrary function
Output (classification) layer
Powerful! Calculates functions of the input and maps them to the output layer.
[Figure: example network with sensory inputs x1, x2, x3 (HSV), a hidden layer of functions, and classification outputs y1, y2, y3 (apple/orange/banana)]
Q4
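As a sketch of how such a net computes its output, here is a single feedforward pass in Python/NumPy, assuming one hidden layer of sigmoid units; the weights are random placeholders, not trained values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
W1 = rng.normal(size=(3, 4))   # sensory layer (3 inputs, e.g. H, S, V) -> 4 hidden units
b1 = np.zeros(4)
W2 = rng.normal(size=(4, 3))   # hidden layer -> 3 outputs (apple/orange/banana)
b2 = np.zeros(3)

x = np.array([0.1, 0.8, 0.6])          # one HSV feature vector
hidden = sigmoid(x @ W1 + b1)          # hidden layer computes functions of the input
output = sigmoid(hidden @ W2 + b2)     # classification layer
print(output)                          # scores for the three classes
```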

4 Backpropagation algorithm
Initialize all weights randomly.
For each labeled example:
a. Calculate output using the current network (feedforward)
b. Update weights across the network, from output to input, using Hebbian learning (feedback)
Iterate until convergence; epsilon (the learning rate) decreases at every iteration.
Matlab does this for you: matlabNeuralNetDemo.m
[Figure: network with inputs x1, x2, x3 and outputs y1, y2, y3; calculate output, update weights, repeat]
Q5
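For readers who want to see what the toolbox automates, here is a compact sketch of backpropagation with one hidden layer, sigmoid units, and squared error. The XOR toy data and the decay factor are illustrative choices, and convergence depends on the random initialization:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_backprop(X, Y, n_hidden=4, eta=0.5, epochs=5000):
    """X: (n, d) inputs; Y: (n, k) one-hot targets. Squared-error loss."""
    rng = np.random.default_rng(0)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_hidden, Y.shape[1])); b2 = np.zeros(Y.shape[1])
    for _ in range(epochs):
        # a. Calculate output using the current network (feedforward)
        H = sigmoid(X @ W1 + b1)
        O = sigmoid(H @ W2 + b2)
        # b. Update weights from output back to input (feedback)
        dO = (O - Y) * O * (1 - O)          # output-layer error * sigmoid'
        dH = (dO @ W2.T) * H * (1 - H)      # error propagated back to hidden layer
        W2 -= eta * (H.T @ dO); b2 -= eta * dO.sum(axis=0)
        W1 -= eta * (X.T @ dH); b1 -= eta * dH.sum(axis=0)
        eta *= 0.9995                        # learning rate shrinks each iteration
    return W1, b1, W2, b2

# XOR-like toy problem: not linearly separable, so a hidden layer is needed
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[1, 0], [0, 1], [0, 1], [1, 0]], dtype=float)
W1, b1, W2, b2 = train_backprop(X, Y)
print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), 2))
```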

5 Parameters
Most networks are reasonably robust with respect to the learning rate and how the weights are initialized.
However, figuring out how to normalize your input and determine the architecture of your net is a black art. You might need to experiment.
One hint: re-run the network with different initial weights and different architectures, and test performance each time on a validation set. Pick the best.
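One way to carry out that hint, sketched with scikit-learn's MLPClassifier on stand-in data (the candidate architectures and seeds are arbitrary choices for illustration):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# Toy stand-in data: 3 features (e.g. normalized HSV), binary label
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

best_net, best_score = None, -1.0
for hidden in [(4,), (8,), (8, 4)]:           # candidate architectures
    for seed in range(3):                     # different initial weights
        net = MLPClassifier(hidden_layer_sizes=hidden, max_iter=2000,
                            random_state=seed)
        net.fit(X_train, y_train)
        score = net.score(X_val, y_val)       # performance on the validation set
        if score > best_score:
            best_net, best_score = net, score
print("best validation accuracy:", best_score)
```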

6 References
This is just the tip of the iceberg! See:
Sonka, pp. 404-407.
Laurene Fausett. Fundamentals of Neural Networks. Prentice Hall, 1994. (Approachable for beginners.)
C.M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995. (Technical reference focused on the art of constructing networks: learning rate, number of hidden layers, etc.)
Matlab neural net help

7 Support Vector Machines
How are they similar to neural nets? How are they different?

8 SVMs: “Best” decision boundary
Consider a 2-class problem, and start by assuming each class is linearly separable.
There are many separating hyperplanes… Which would you choose?

9 SVMs: “Best” decision boundary
The “best” hyperplane is the one that maximizes the margin between the classes.
Some training points will always lie on the margin. These are called “support vectors” (#2, 4, 9 in the figure).
Why does this name make sense intuitively?
[Figure: separating hyperplane with the margin marked]
Q1

10 Support vectors
The support vectors are the toughest to classify.
What would happen to the decision boundary if we moved one of them, say #4? A different margin would have maximal width!
Q2

11 Problem
Maximize the margin width while classifying all the data points correctly…

12 Mathematical formulation of the hyperplane
On paper. Key ideas:
Optimum separating hyperplane: wx + b = 0
Distance from a point x to the hyperplane: |wx + b| / ||w||
Can show the margin width = 2 / ||w||
Want to maximize the margin.
Q3-4
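A sketch of those key ideas in standard notation, following the usual textbook derivation (the normalization of w and b so that wx + b = ±1 on the margin is the conventional choice, not something stated on the slide):

```latex
% Optimum separating hyperplane:
\mathbf{w}^{T}\mathbf{x} + b = 0
% Distance from a point x to the hyperplane:
r = \frac{\lvert \mathbf{w}^{T}\mathbf{x} + b \rvert}{\lVert \mathbf{w} \rVert}
% Normalize w and b so that w^T x + b = +1 and -1 on the two margin
% boundaries; each boundary then lies at distance 1/||w|| from the
% hyperplane, so the margin width is
\rho = \frac{2}{\lVert \mathbf{w} \rVert}
% Maximizing the margin is therefore equivalent to minimizing ||w||.
```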

13 Finding the optimal hyperplane
We need to find w and b that satisfy the system of inequalities yi(w·xi + b) ≥ 1 for all i,
where w minimizes the cost function Φ(w) = ½ wᵀw.
(Recall that we want to minimize ||w||, which is equivalent to minimizing ||w||² = wᵀw.)
This is a quadratic programming problem:
Use Lagrange multipliers
Switch to the dual of the problem
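For reference, a sketch of the dual that the Lagrange-multiplier step produces (a standard result; the αi are the multipliers and N is the number of training points):

```latex
% Primal: minimize (1/2) w^T w subject to y_i (w^T x_i + b) >= 1 for all i.
% Eliminating w and b via the Lagrangian gives the dual problem:
\max_{\boldsymbol{\alpha}} \; \sum_{i=1}^{N} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N}
    \alpha_i \alpha_j \, y_i y_j \, \mathbf{x}_i^{T} \mathbf{x}_j
\quad \text{subject to} \quad
\alpha_i \ge 0, \qquad \sum_{i=1}^{N} \alpha_i y_i = 0
% At the optimum, w = sum_i alpha_i y_i x_i; only the support vectors
% have alpha_i > 0.
```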

14 Non-separable data
Allow data points to be misclassified, but assign a cost to each misclassified point.
The cost is bounded by the parameter C (which you can set).
You can set different bounds for each class. Why? So you can weigh false positives and false negatives differently.
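A sketch of how C and per-class costs look in scikit-learn (the toy data and the 5x penalty on one class are hypothetical values, chosen only to show the knobs):

```python
import numpy as np
from sklearn.svm import SVC

# Toy overlapping data: two classes that are not linearly separable
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(1.5, 1.0, (50, 2))])
y = np.array([-1] * 50 + [+1] * 50)

# C caps the penalty for misclassified points; class_weight scales it per
# class, e.g. making errors on the +1 class 5x as costly as on the -1 class.
clf = SVC(kernel='linear', C=1.0, class_weight={-1: 1.0, +1: 5.0})
clf.fit(X, y)
print(clf.score(X, y))
```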

15 Can we do better?
Cover’s Theorem from information theory says that we can map nonseparable data in the input space to a feature space where the data is separable, with high probability, if:
The mapping is nonlinear
The feature space has a higher dimension
The mapping is performed implicitly through a kernel function.
Lots of math would follow here.

16 Most common kernel functions
Polynomial: K(x, y) = (x·y + 1)^p
Gaussian radial-basis function (RBF): K(x, y) = exp(−||x − y||² / (2σ²))
Two-layer perceptron: K(x, y) = tanh(β0 x·y + β1)
You choose p, σ, or βi.
My experience with real data: use the Gaussian RBF!
[Figure: difficulty of problem, easy to hard: p=1, p=2, higher p, RBF]
Q5
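Following that advice in scikit-learn (illustrative values; note that scikit-learn parameterizes the RBF as exp(−γ||x − y||²), so γ corresponds to 1/(2σ²) above):

```python
import numpy as np
from sklearn.svm import SVC

# Toy nonseparable data: class +1 inside a ring of class -1
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, (200, 2))
y = np.where(np.hypot(X[:, 0], X[:, 1]) < 1.0, +1, -1)

# kernel='poly' with degree=p is the polynomial option;
# kernel='rbf' with gamma ~ 1/(2*sigma^2) is the Gaussian RBF.
clf = SVC(kernel='rbf', gamma=0.5, C=1.0)
clf.fit(X, y)
print(clf.score(X, y))   # the RBF kernel handles the ring-shaped boundary
```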

