
1 CS 551/651 Search and “Through the Lens” Lecture 13

2 Assign 1 Grading
Sign up for a slot to demo to the TA:
– Sunday upon return from break
– Monday upon return from break

3 Papers to read during break
– Spacetime Constraints
– Evolved Virtual Creatures
– Neuroanimator

4 Single-layer networks
Training
Training samples are used to tune the network weights
– Input / output pairs
Network generates an output based on input (and weights)
Network’s output is compared to the correct output
Error in the output is used to adapt the weights
Repeat the process to minimize errors
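
A minimal sketch of this loop for a single sigmoid unit (the function names, data shapes, and learning rate below are illustrative, not from the slides):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_single_layer(samples, n_inputs, alpha=0.1, epochs=100):
        # samples: list of (x, y) input/output pairs; x is a vector, y the correct output
        w = np.zeros(n_inputs)        # the network weights being tuned
        for _ in range(epochs):
            for x, y in samples:
                out = sigmoid(np.dot(w, x))                # output based on input and weights
                err = y - out                              # compare to the correct output
                w += alpha * err * out * (1 - out) * x     # adapt the weights
        return w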

5 Consider error in single-layer neural networks
Sum of squared errors (across training data)
For one sample:
How can we minimize the error?
Set the derivative equal to zero (like in Calc 101)
– Solve for weights that make the derivative == 0
Is that error affected by each of the weights in the weight vector?
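
The error expressions on this slide were figures in the original deck; a standard reconstruction, assuming the usual squared-error loss for a network h_w:

    E = \frac{1}{2} \sum_d \big( y_d - h_w(x_d) \big)^2
    \qquad\text{and, for one sample,}\qquad
    E = \frac{1}{2} \big( y - h_w(x) \big)^2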

6 Minimizing the error
What is the derivative?
The gradient,
– Composed of the partial derivatives with respect to each weight
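
In symbols (reconstructing the pictured formula), the gradient collects the partial derivative of the error with respect to each weight:

    \nabla_w E = \left( \frac{\partial E}{\partial w_1}, \frac{\partial E}{\partial w_2}, \ldots, \frac{\partial E}{\partial w_n} \right)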

7 Computing the partial
Remember the Chain Rule:
For a network, h_w, with inputs x and correct output y
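
The chain-rule step shown on the slide, reconstructed under the same squared-error assumption as above:

    \frac{\partial E}{\partial w_j}
      = \frac{\partial}{\partial w_j} \, \frac{1}{2} \big( y - h_w(x) \big)^2
      = -\big( y - h_w(x) \big) \, \frac{\partial h_w(x)}{\partial w_j}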

8 Computing the partial
g( ) = the activation function

9 Computing the partial
g'( ) = the derivative of the activation function
Chain rule again
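
Carrying the chain rule through the activation function g (again a reconstruction of the pictured formula), with in = \sum_k w_k x_k the weighted input to the unit:

    \frac{\partial E}{\partial w_j} = -\big( y - h_w(x) \big) \, g'(in) \, x_j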

10 Minimizing the error
Gradient descent
Learning rate
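
The gradient-descent update the slide refers to, with learning rate \alpha, steps each weight against this gradient (a standard form consistent with the partial above):

    w_j \leftarrow w_j + \alpha \, \big( y - h_w(x) \big) \, g'(in) \, x_j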

11 Why are modification rules more complicated in multilayer networks?
We can calculate the error of the output neuron by comparing to the training data
We could use the previous update rule to adjust W_{3,5} and W_{4,5} to correct that error
But how do W_{1,3}, W_{1,4}, W_{2,3}, W_{2,4} adjust?

12 Backprop at the output layer
Output-layer error is computed as in the single-layer case, and weights are updated in the same fashion
Let Err_i be the i-th component of the error vector y – h_W
– Let Δ_i denote the modified error term defined below
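
A reconstruction of the pictured definitions in the usual backprop notation, where a_j is the activation of node j and in_i the weighted input to output node i:

    \Delta_i = Err_i \times g'(in_i)
    \qquad
    W_{j,i} \leftarrow W_{j,i} + \alpha \times a_j \times \Delta_i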

13 Backprop in the hidden layer
Each hidden node is responsible for some fraction of the error Δ_i in each of the output nodes to which it is connected
– Δ_i is divided among all hidden nodes that connect to output i according to their strengths
Error at hidden node j:
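
The hidden-node error term the slide points to, in the same notation (a standard form; the slide’s version was a figure):

    \Delta_j = g'(in_j) \sum_i W_{j,i} \, \Delta_i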

14 Backprop in the hidden layer
Error is:
Correction is:
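
The error is the \Delta_j given above; the correction, reconstructed in the same notation for a weight W_{k,j} from node k into hidden node j:

    W_{k,j} \leftarrow W_{k,j} + \alpha \times a_k \times \Delta_j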

15 Summary of backprop
1. Compute the Δ value for the output units using the observed error
2. Starting with the output layer, repeat the following for each layer until done
– Propagate the Δ values back to the previous layer
– Update the weights between the two layers
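
A compact sketch of the whole procedure for one hidden layer with sigmoid activations (layer sizes, variable names, and the learning rate are illustrative):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def backprop_step(x, y, W_in, W_out, alpha=0.1):
        # Forward pass
        a_hidden = sigmoid(W_in @ x)        # hidden activations a_j
        a_out = sigmoid(W_out @ a_hidden)   # network outputs h_W(x)

        # 1. Delta values for the output units, from the observed error
        delta_out = (y - a_out) * a_out * (1 - a_out)                # Err_i * g'(in_i)

        # 2. Propagate the delta values back to the hidden layer
        delta_hidden = a_hidden * (1 - a_hidden) * (W_out.T @ delta_out)

        # Update the weights between each pair of layers
        W_out += alpha * np.outer(delta_out, a_hidden)   # W_{j,i} += alpha * a_j * Delta_i
        W_in += alpha * np.outer(delta_hidden, x)        # W_{k,j} += alpha * a_k * Delta_j
        return W_in, W_out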

16 Some general artificial neural network (ANN) info
The entire network is a function g( inputs ) = outputs
– These functions frequently have sigmoids in them
– These functions are frequently differentiable
– These functions have coefficients (weights)
Backpropagation networks are simply ways to tune the coefficients of a function so it produces the desired output

17 Function approximation
Consider fitting a line to data
– Coefficients: slope and y-intercept
– Training data: some samples
– Use a least-squares fit
This is what an ANN does
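
The line-fit version of the same idea, using NumPy’s least-squares routine (the sample data below is made up purely for illustration):

    import numpy as np

    # Illustrative training samples (x, y)
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

    # Fit y = slope * x + intercept by least squares
    slope, intercept = np.polyfit(x, y, deg=1)
    print(slope, intercept)   # the two "coefficients" of this tiny function approximator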

18 Function approximation
A function of two inputs…
Fit a smooth curve to the available data
– Quadratic
– Cubic
– n-th order
– ANN!

19 Curve fitting
A neural network should be able to generate the input/output pairs from the training data
You’d like for it to be smooth (and well-behaved) in the voids between the training data
There are risks of overfitting the data

20 When using ANNs
Sometimes the output layer feeds back into the input layer – recurrent neural networks
The backpropagation will tune the weights
You determine the topology
– Different topologies have different training outcomes (consider overfitting)
– Sometimes a genetic algorithm is used to explore the space of neural network topologies

21 Through The Lens Camera Control

22 Controlling virtual camera

23 Lagrange multipliers
“Lagrange Multipliers without Permanent Scarring”
Dan Klein – www.cs.berkeley.edu/~klein (now at Stanford)

24 More complicated example
Maximize a paraboloid subject to the unit circle
Any solution to the maximization problem must sit on x^2 + y^2 = 1
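
The paraboloid itself was shown only as a figure; as an illustration of the setup (the particular f below is assumed, not taken from the slide), maximizing f(x, y) = x^2 + 2y^2 subject to g(x, y) = x^2 + y^2 - 1 = 0 gives

    \nabla f = \lambda \nabla g
    \;\Rightarrow\;
    (2x, \, 4y) = \lambda \, (2x, \, 2y),
    \qquad x^2 + y^2 = 1

so either x = 0 (the points (0, ±1), where f = 2) or \lambda = 1 and y = 0 (the points (±1, 0), where f = 1); the constrained maximum is at (0, ±1).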

25 The central theme of Lagrange Multipliers At the solution points, the isocurve (a.k.a. level curve or contour) of the function to be maximized is tangent to the constraint curve

26 Tangent Curves
Tangent curves == parallel normals
Create the Lagrangian
Solve for where the gradient = 0 to capture parallel normals, and g(x) must equal 0
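
The Lagrangian the slide constructs, reconstructed in standard form for maximizing f(x) subject to g(x) = 0:

    \Lambda(x, \lambda) = f(x) - \lambda \, g(x),
    \qquad
    \nabla \Lambda = 0
    \;\Longleftrightarrow\;
    \nabla f = \lambda \nabla g \ \text{ and } \ g(x) = 0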

27 Go to board for more development

