
1 Lecture 3 Review of Linear Algebra Simple least-squares

2 9 things you need to remember from Linear Algebra

3 Number 1: the rule for vector and matrix multiplication. u = Mv means u_i = Σ_{k=1}^N M_ik v_k, and P = QR means P_ij = Σ_{k=1}^N Q_ik R_kj. The sum is over the nearest-neighbor indices. The name of the index in the sum is irrelevant; you can call it anything (as long as you're consistent).
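
(A minimal numpy sketch of slide 3's index rules; the matrices and vectors here are arbitrary illustrative values, not anything from the lecture.)

```python
import numpy as np

M = np.array([[1.0, 2.0],
              [3.0, 4.0]])
v = np.array([5.0, 6.0])

# u_i = sum over k of M_ik v_k  (matrix-vector product)
u = M @ v
u_loop = np.array([sum(M[i, k] * v[k] for k in range(M.shape[1]))
                   for i in range(M.shape[0])])
assert np.allclose(u, u_loop)

# P_ij = sum over k of Q_ik R_kj  (matrix-matrix product)
Q = np.array([[1.0, 0.0], [2.0, 1.0]])
R = np.array([[3.0, 1.0], [0.0, 2.0]])
P = Q @ R
P_loop = np.array([[sum(Q[i, k] * R[k, j] for k in range(Q.shape[1]))
                    for j in range(R.shape[1])]
                   for i in range(Q.shape[0])])
assert np.allclose(P, P_loop)
```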

4 Number 2: transposition. Rows become columns and columns become rows: (A^T)_ij = A_ji. The rule for transposition of products is (AB)^T = B^T A^T. Note the reversal of order.

5 Number 3: the rule for the dot product. a · b = a^T b = Σ_{i=1}^N a_i b_i. Note that a · a is the sum of the squared elements of a, i.e., the squared length of a.

6 Number 4: the inverse of a matrix. A^-1 A = I and A A^-1 = I (the inverse exists only when A is square and non-singular). I is the identity matrix, e.g. for N = 3:

[ 1 0 0 ]
[ 0 1 0 ]
[ 0 0 1 ]

7 Number 5: solving y = Mx using the inverse: x = M^-1 y.
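
(A small sketch of slide 7, assuming M is square and invertible; in numerical practice np.linalg.solve is preferred to forming M^-1 explicitly.)

```python
import numpy as np

M = np.array([[2.0, 1.0],
              [1.0, 3.0]])
y = np.array([3.0, 5.0])

x_via_inverse = np.linalg.inv(M) @ y   # x = M^-1 y, as on the slide
x_via_solve = np.linalg.solve(M, y)    # numerically preferable equivalent

assert np.allclose(x_via_inverse, x_via_solve)
assert np.allclose(M @ x_via_solve, y)  # the solution satisfies y = Mx
```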

8 Number 6: multiplication by the identity matrix. M = IM = MI. In component notation, I_ij = δ_ij (the Kronecker delta, which is just a name for the elements of I), and Σ_{k=1}^N δ_ik M_kj = M_ij: cross out the sum, cross out δ_ik, and change k to i in the rest of the equation.

9 Number 7: the inverse of a 2 × 2 matrix. If

A = [ a b ]
    [ c d ]

then

A^-1 = 1/(ad - bc) [  d  -b ]
                   [ -c   a ]
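
(A quick check of the 2 × 2 formula against numpy's general-purpose inverse; the values of a, b, c, d are arbitrary, chosen so that ad - bc is non-zero.)

```python
import numpy as np

a, b, c, d = 1.0, 2.0, 3.0, 7.0
A = np.array([[a, b],
              [c, d]])

# Closed form from slide 9
A_inv_formula = np.array([[ d, -b],
                          [-c,  a]]) / (a * d - b * c)

assert np.allclose(A_inv_formula, np.linalg.inv(A))
assert np.allclose(A_inv_formula @ A, np.eye(2))
```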

10 Number 8: the inverse of a diagonal matrix. If A = diag(a, b, c, …, z), then A^-1 = diag(1/a, 1/b, 1/c, …, 1/z).
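
(Likewise for the diagonal case, again with arbitrary non-zero diagonal entries.)

```python
import numpy as np

A = np.diag([2.0, 4.0, 5.0])           # a diagonal matrix
A_inv = np.diag(1.0 / np.diag(A))      # reciprocals of the diagonal, as on slide 10

assert np.allclose(A_inv, np.linalg.inv(A))
```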

11 Number 9: the rule for taking a derivative. Use component notation and treat every element as an independent variable. Remember that, since the elements are independent, dx_i/dx_j = δ_ij (the elements of the identity matrix).

12 Example: suppose y = Ax. How does y_i vary as we change x_j? (That's the meaning of the derivative dy_i/dx_j.) First write the i-th component of y: y_i = Σ_{k=1}^N A_ik x_k. (We're already using i and j, so use a different letter, say k, in the summation!) Then (d/dx_j) y_i = (d/dx_j) Σ_{k=1}^N A_ik x_k = Σ_{k=1}^N A_ik dx_k/dx_j = Σ_{k=1}^N A_ik δ_kj = A_ij. So the derivative dy_i/dx_j is just A_ij. This is analogous to the scalar case, where the derivative dy/dx of the scalar expression y = ax is just dy/dx = a.
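
(A numerical sanity check of slide 12's result dy_i/dx_j = A_ij, using a finite-difference approximation; the matrix A, the point x, and the step size h are illustrative choices.)

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 3.0, 1.0]])
x = np.array([0.5, -1.0, 2.0])
h = 1e-6

# Finite-difference estimate of dy_i/dx_j for y = A x
jac = np.zeros_like(A)
for j in range(x.size):
    x_step = x.copy()
    x_step[j] += h
    jac[:, j] = (A @ x_step - A @ x) / h

assert np.allclose(jac, A, atol=1e-5)   # matches dy_i/dx_j = A_ij
```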

13 The best-fitting line: the combination of a_pre and b_pre that has the smallest sum of squared errors. Find it by exhaustive search ('grid search').

14 Fitting a line to noisy data: y_obs = a + b x. The observations form the vector y_obs.

15 Guess values for a and b: y_pre = a_guess + b_guess x, with a_guess = 2.0 and b_guess = 2.4. The prediction error is observed minus predicted: e = y_obs - y_pre. The total error is the sum of squared prediction errors: E = Σ_i e_i^2 = e^T e.

16 Systematically examine combinations of (a, b) on a 101 × 101 grid. [Figure: error surface E over the (a_pre, b_pre) grid; the minimum total error E is marked. Note that E is not zero.]
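
(A minimal sketch of the grid search described on slides 13-16. The synthetic data, noise level, and grid ranges are illustrative assumptions, not the values behind the figures.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "observed" data: a straight line plus noise
a_true, b_true = 1.0, 2.0
x = np.linspace(0.0, 10.0, 20)
y_obs = a_true + b_true * x + rng.normal(scale=0.5, size=x.size)

# Systematically examine (a, b) combinations on a 101 x 101 grid
a_grid = np.linspace(0.0, 4.0, 101)
b_grid = np.linspace(0.0, 4.0, 101)
E = np.zeros((a_grid.size, b_grid.size))
for i, a in enumerate(a_grid):
    for j, b in enumerate(b_grid):
        e = y_obs - (a + b * x)   # prediction error
        E[i, j] = e @ e           # total error E = e^T e

i_min, j_min = np.unravel_index(np.argmin(E), E.shape)
print("best grid-search fit: a =", a_grid[i_min], " b =", b_grid[j_min])
print("E_min =", E[i_min, j_min])  # not zero, because the data are noisy
```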

17 [Figure: error surface with the best-fitting (a, b) marked, alongside the corresponding best-fitting line through the data. Note that E_min is not zero.]

18 Note that there is some range of values where the error is about the same as the minimum value, E_min. [Figure: error surface with E_min marked; all a's in this range and all b's in this range have pretty much the same error.]

19 Moral: the shape of the error surface controls the accuracy with which (a, b) can be estimated.

20 What controls the shape of the error surface? Let's examine the effect of increasing the error in the data.

21 [Figure: two error surfaces. Left: error in data = 0.5, E_min = 0.20. Right: error in data = 5.0, E_min = 23.5.] The minimum error increases, but the shape of the error surface is pretty much the same.

22 What controls the shape of the error surface? Let's examine the effect of shifting the x-position of the data.

23 [Figure: error surface for data shifted along x.] There is a big change from simply shifting the x-values of the data: the region of low error is now tilted. High b with low a has low error, and low b with high a has low error, but (high b, high a) and (low a, low b) have high error.

24 The meaning of the tilted region of low error: the errors in (a_pre, b_pre) are correlated.

25 [Figure: best-fit line vs. a line with an erroneous intercept, for data that straddle the origin.] When the data straddle the origin, if you tweak the intercept up, you can't compensate by changing the slope: uncorrelated estimates of intercept and slope.

26 [Figure: best-fit line, a line with an erroneous intercept and the same slope, and a low-slope line, for data all to the right of the origin.] When the data are all to the right of the origin, if you tweak the intercept up, you must lower the slope to compensate: negative correlation of intercept and slope.

27 [Figure: best-fit line and a line with an erroneous intercept and the same slope as the best-fit line, for data all to the left of the origin.] When the data are all to the left of the origin, if you tweak the intercept up, you must raise the slope to compensate: positive correlation of intercept and slope.

28 Data near the origin: possibly good control on the intercept but lousy control on the slope. [Figure: data clustered near x = 0, on an axis from -5 to 5.]

29 Data far from the origin: lousy control on the intercept but possibly good control on the slope. [Figure: data far from x = 0, on an axis from 0 to 100.]

30 Set-up for standard least squares: y_i = a + b x_i for i = 1, …, N. In matrix form,

[ y_1 ]   [ 1  x_1 ]
[ y_2 ] = [ 1  x_2 ] [ a ]
[  …  ]   [ …   …  ] [ b ]
[ y_N ]   [ 1  x_N ]

that is, d = G m.

31 Standard least-squares solution: m_est = [G^T G]^-1 G^T d.
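
(A sketch of slides 30-31: build G with a column of ones and a column of x, then apply m_est = [G^T G]^-1 G^T d. The x and d values here are illustrative.)

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
d = np.array([3.1, 4.9, 7.2, 9.0, 10.8])

# Data kernel G: row i is (1, x_i), so d = G m with m = (a, b)^T
G = np.column_stack([np.ones_like(x), x])

# Standard least-squares solution
m_est = np.linalg.inv(G.T @ G) @ G.T @ d

# Equivalent, numerically preferable built-in solver
m_lstsq, *_ = np.linalg.lstsq(G, d, rcond=None)
assert np.allclose(m_est, m_lstsq)

print("intercept a, slope b:", m_est)
```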

32 Derivation: use the fact that the minimum is where dE/dm_i = 0.

E = Σ_k e_k e_k = Σ_k (d_k - Σ_p G_kp m_p)(d_k - Σ_q G_kq m_q)
  = Σ_k d_k d_k - 2 Σ_k d_k Σ_p G_kp m_p + Σ_k Σ_p G_kp m_p Σ_q G_kq m_q

dE/dm_i = 0 - 2 Σ_k d_k Σ_p G_kp (dm_p/dm_i) + Σ_k Σ_p G_kp (dm_p/dm_i) Σ_q G_kq m_q + Σ_k Σ_p G_kp m_p Σ_q G_kq (dm_q/dm_i)
        = -2 Σ_k d_k Σ_p G_kp δ_pi + Σ_k Σ_p G_kp δ_pi Σ_q G_kq m_q + Σ_k Σ_p G_kp m_p Σ_q G_kq δ_qi
        = -2 Σ_k d_k G_ki + Σ_k G_ki Σ_q G_kq m_q + Σ_k Σ_p G_kp m_p G_ki

Setting this to zero gives -2 Σ_k G_ki d_k + 2 Σ_q [Σ_k G_ki G_kq] m_q = 0, or -2 G^T d + 2 [G^T G] m = 0, or m = [G^T G]^-1 G^T d.

33 Why least squares? Why not least absolute value? Or something else?

34 [Figure: the same data fit two ways. Least squares: a = 1.00, b = 2.02. Least absolute value: a = 0.94, b = 2.02.]
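
(A hedged sketch of how such a comparison could be made, reusing the grid-search idea to minimize the sum of absolute errors instead of the sum of squared errors. The data, outlier, and grids are illustrative and do not reproduce slide 34's numbers.)

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 30)
y_obs = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=x.size)
y_obs[5] += 10.0   # one outlier, to make the two criteria differ

a_grid = np.linspace(-1.0, 3.0, 201)
b_grid = np.linspace(1.0, 3.0, 201)
E2 = np.zeros((a_grid.size, b_grid.size))   # sum of squared errors
E1 = np.zeros_like(E2)                      # sum of absolute errors
for i, a in enumerate(a_grid):
    for j, b in enumerate(b_grid):
        e = y_obs - (a + b * x)
        E2[i, j] = np.sum(e ** 2)
        E1[i, j] = np.sum(np.abs(e))

i2, j2 = np.unravel_index(np.argmin(E2), E2.shape)
i1, j1 = np.unravel_index(np.argmin(E1), E1.shape)
print("least squares:        a =", a_grid[i2], " b =", b_grid[j2])
print("least absolute value: a =", a_grid[i1], " b =", b_grid[j1])
# The least-absolute-value fit is pulled far less by the outlier.
```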

