
1 Lecture 3 Review of Linear Algebra Simple least-squares

2 9 things you need to remember from Linear Algebra

3 Number 1: the rule for vector and matrix multiplication. u = Mv means u_i = Σ_{k=1}^N M_ik v_k, and P = QR means P_ij = Σ_{k=1}^N Q_ik R_kj. The sum is over the nearest-neighbor indices. The name of the index in the sum is irrelevant; you can call it anything (as long as you're consistent).
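
(A minimal numpy sketch of slide 3's index rules; the matrices and vectors here are arbitrary illustrative values, not anything from the lecture.)

```python
import numpy as np

M = np.array([[1.0, 2.0],
              [3.0, 4.0]])
v = np.array([5.0, 6.0])

# u_i = sum over k of M_ik v_k  (matrix-vector product)
u = M @ v
u_loop = np.array([sum(M[i, k] * v[k] for k in range(M.shape[1]))
                   for i in range(M.shape[0])])
assert np.allclose(u, u_loop)

# P_ij = sum over k of Q_ik R_kj  (matrix-matrix product)
Q = np.array([[1.0, 0.0], [2.0, 1.0]])
R = np.array([[3.0, 1.0], [0.0, 2.0]])
P = Q @ R
P_loop = np.array([[sum(Q[i, k] * R[k, j] for k in range(Q.shape[1]))
                    for j in range(R.shape[1])]
                   for i in range(Q.shape[0])])
assert np.allclose(P, P_loop)
```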

4 Number 2: transposition. Rows become columns and columns become rows: (A^T)_ij = A_ji. The rule for transposition of products is (AB)^T = B^T A^T. Note the reversal of order.

5 Number 3: the rule for the dot product. a · b = a^T b = Σ_{i=1}^N a_i b_i. Note that a · a is the sum of the squared elements of a, i.e., the squared length of a.

6 Number 4: the inverse of a matrix. A^-1 A = I and A A^-1 = I (the inverse exists only when A is square and non-singular). I is the identity matrix, e.g. for N = 3:

[ 1 0 0 ]
[ 0 1 0 ]
[ 0 0 1 ]

7 Number 5: solving y = Mx using the inverse: x = M^-1 y.
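
(A small sketch of slide 7, assuming M is square and invertible; in numerical practice np.linalg.solve is preferred to forming M^-1 explicitly.)

```python
import numpy as np

M = np.array([[2.0, 1.0],
              [1.0, 3.0]])
y = np.array([3.0, 5.0])

x_via_inverse = np.linalg.inv(M) @ y   # x = M^-1 y, as on the slide
x_via_solve = np.linalg.solve(M, y)    # numerically preferable equivalent

assert np.allclose(x_via_inverse, x_via_solve)
assert np.allclose(M @ x_via_solve, y)  # the solution satisfies y = Mx
```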

8 Number 6: multiplication by the identity matrix. M = IM = MI. In component notation, I_ij = δ_ij (the Kronecker delta, which is just a name for the elements of I), and Σ_{k=1}^N δ_ik M_kj = M_ij: cross out the sum, cross out δ_ik, and change k to i in the rest of the equation.

9 Number 7: the inverse of a 2 × 2 matrix. If

A = [ a b ]
    [ c d ]

then

A^-1 = 1/(ad - bc) [  d  -b ]
                   [ -c   a ]
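
(A quick check of the 2 × 2 formula against numpy's general-purpose inverse; the values of a, b, c, d are arbitrary, chosen so that ad - bc is non-zero.)

```python
import numpy as np

a, b, c, d = 1.0, 2.0, 3.0, 7.0
A = np.array([[a, b],
              [c, d]])

# Closed form from slide 9
A_inv_formula = np.array([[ d, -b],
                          [-c,  a]]) / (a * d - b * c)

assert np.allclose(A_inv_formula, np.linalg.inv(A))
assert np.allclose(A_inv_formula @ A, np.eye(2))
```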

10 Number 8: the inverse of a diagonal matrix. If A = diag(a, b, c, …, z), then A^-1 = diag(1/a, 1/b, 1/c, …, 1/z).
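
(Likewise for the diagonal case, again with arbitrary non-zero diagonal entries.)

```python
import numpy as np

A = np.diag([2.0, 4.0, 5.0])           # a diagonal matrix
A_inv = np.diag(1.0 / np.diag(A))      # reciprocals of the diagonal, as on slide 10

assert np.allclose(A_inv, np.linalg.inv(A))
```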

11 Number 9: the rule for taking a derivative. Use component notation and treat every element as an independent variable. Remember that, since the elements are independent, dx_i/dx_j = δ_ij (the elements of the identity matrix).

12 Example: suppose y = Ax. How does y_i vary as we change x_j? (That's the meaning of the derivative dy_i/dx_j.) First write the i-th component of y: y_i = Σ_{k=1}^N A_ik x_k. (We're already using i and j, so use a different letter, say k, in the summation!) Then (d/dx_j) y_i = (d/dx_j) Σ_{k=1}^N A_ik x_k = Σ_{k=1}^N A_ik dx_k/dx_j = Σ_{k=1}^N A_ik δ_kj = A_ij. So the derivative dy_i/dx_j is just A_ij. This is analogous to the scalar case, where the derivative dy/dx of the scalar expression y = ax is just dy/dx = a.
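
(A numerical sanity check of slide 12's result dy_i/dx_j = A_ij, using a finite-difference approximation; the matrix A, the point x, and the step size h are illustrative choices.)

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 3.0, 1.0]])
x = np.array([0.5, -1.0, 2.0])
h = 1e-6

# Finite-difference estimate of dy_i/dx_j for y = A x
jac = np.zeros_like(A)
for j in range(x.size):
    x_step = x.copy()
    x_step[j] += h
    jac[:, j] = (A @ x_step - A @ x) / h

assert np.allclose(jac, A, atol=1e-5)   # matches dy_i/dx_j = A_ij
```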

13 The best-fitting line: the combination of a_pre and b_pre that has the smallest sum of squared errors. Find it by exhaustive search ('grid search').

14 Fitting a line to noisy data: y_obs = a + b x. The observations form the vector y_obs.

15 Guess values for a and b: y_pre = a_guess + b_guess x, with a_guess = 2.0 and b_guess = 2.4. The prediction error is observed minus predicted: e = y_obs - y_pre. The total error is the sum of squared prediction errors: E = Σ_i e_i^2 = e^T e.

16 Systematically examine combinations of (a, b) on a 101 × 101 grid. [Figure: error surface E over the (a_pre, b_pre) grid; the minimum total error E is marked. Note that E is not zero.]
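
(A minimal sketch of the grid search described on slides 13-16. The synthetic data, noise level, and grid ranges are illustrative assumptions, not the values behind the figures.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "observed" data: a straight line plus noise
a_true, b_true = 1.0, 2.0
x = np.linspace(0.0, 10.0, 20)
y_obs = a_true + b_true * x + rng.normal(scale=0.5, size=x.size)

# Systematically examine (a, b) combinations on a 101 x 101 grid
a_grid = np.linspace(0.0, 4.0, 101)
b_grid = np.linspace(0.0, 4.0, 101)
E = np.zeros((a_grid.size, b_grid.size))
for i, a in enumerate(a_grid):
    for j, b in enumerate(b_grid):
        e = y_obs - (a + b * x)   # prediction error
        E[i, j] = e @ e           # total error E = e^T e

i_min, j_min = np.unravel_index(np.argmin(E), E.shape)
print("best grid-search fit: a =", a_grid[i_min], " b =", b_grid[j_min])
print("E_min =", E[i_min, j_min])  # not zero, because the data are noisy
```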

17 [Figure: error surface with the best-fitting (a, b) marked, alongside the corresponding best-fitting line through the data. Note that E_min is not zero.]

18 Note that there is some range of values where the error is about the same as the minimum value, E_min. [Figure: error surface with E_min marked; all a's in this range and all b's in this range have pretty much the same error.]

19 Moral: the shape of the error surface controls the accuracy with which (a, b) can be estimated.

20 What controls the shape of the error surface? Let's examine the effect of increasing the error in the data.

21 [Figure: two error surfaces. Left: error in data = 0.5, E_min = 0.20. Right: error in data = 5.0, E_min = 23.5.] The minimum error increases, but the shape of the error surface is pretty much the same.

22 What controls the shape of the error surface? Let's examine the effect of shifting the x-position of the data.

23 [Figure: error surface for data shifted along x.] There is a big change from simply shifting the x-values of the data: the region of low error is now tilted. High b with low a has low error, and low b with high a has low error, but (high b, high a) and (low a, low b) have high error.

24 The meaning of the tilted region of low error: the errors in (a_pre, b_pre) are correlated.

25 [Figure: best-fit line vs. a line with an erroneous intercept, for data that straddle the origin.] When the data straddle the origin, if you tweak the intercept up, you can't compensate by changing the slope: uncorrelated estimates of intercept and slope.

26 [Figure: best-fit line, a line with an erroneous intercept and the same slope, and a low-slope line, for data all to the right of the origin.] When the data are all to the right of the origin, if you tweak the intercept up, you must lower the slope to compensate: negative correlation of intercept and slope.

27 [Figure: best-fit line and a line with an erroneous intercept and the same slope as the best-fit line, for data all to the left of the origin.] When the data are all to the left of the origin, if you tweak the intercept up, you must raise the slope to compensate: positive correlation of intercept and slope.

28 Data near the origin: possibly good control on the intercept but lousy control on the slope. [Figure: data clustered near x = 0, on an axis from -5 to 5.]

29 Data far from the origin: lousy control on the intercept but possibly good control on the slope. [Figure: data far from x = 0, on an axis from 0 to 100.]

30 Set-up for standard least squares: y_i = a + b x_i for i = 1, …, N. In matrix form,

[ y_1 ]   [ 1  x_1 ]
[ y_2 ] = [ 1  x_2 ] [ a ]
[  …  ]   [ …   …  ] [ b ]
[ y_N ]   [ 1  x_N ]

that is, d = G m.

31 Standard least-squares solution: m_est = [G^T G]^-1 G^T d.
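
(A sketch of slides 30-31: build G with a column of ones and a column of x, then apply m_est = [G^T G]^-1 G^T d. The x and d values here are illustrative.)

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
d = np.array([3.1, 4.9, 7.2, 9.0, 10.8])

# Data kernel G: row i is (1, x_i), so d = G m with m = (a, b)^T
G = np.column_stack([np.ones_like(x), x])

# Standard least-squares solution
m_est = np.linalg.inv(G.T @ G) @ G.T @ d

# Equivalent, numerically preferable built-in solver
m_lstsq, *_ = np.linalg.lstsq(G, d, rcond=None)
assert np.allclose(m_est, m_lstsq)

print("intercept a, slope b:", m_est)
```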

32 Derivation: use the fact that the minimum is where dE/dm_i = 0.

E = Σ_k e_k e_k = Σ_k (d_k - Σ_p G_kp m_p)(d_k - Σ_q G_kq m_q)
  = Σ_k d_k d_k - 2 Σ_k d_k Σ_p G_kp m_p + Σ_k Σ_p G_kp m_p Σ_q G_kq m_q

dE/dm_i = 0 - 2 Σ_k d_k Σ_p G_kp (dm_p/dm_i) + Σ_k Σ_p G_kp (dm_p/dm_i) Σ_q G_kq m_q + Σ_k Σ_p G_kp m_p Σ_q G_kq (dm_q/dm_i)
        = -2 Σ_k d_k Σ_p G_kp δ_pi + Σ_k Σ_p G_kp δ_pi Σ_q G_kq m_q + Σ_k Σ_p G_kp m_p Σ_q G_kq δ_qi
        = -2 Σ_k d_k G_ki + Σ_k G_ki Σ_q G_kq m_q + Σ_k Σ_p G_kp m_p G_ki

Setting this to zero gives -2 Σ_k G_ki d_k + 2 Σ_q [Σ_k G_ki G_kq] m_q = 0, or -2 G^T d + 2 [G^T G] m = 0, or m = [G^T G]^-1 G^T d.

33 Why least squares? Why not least absolute value? Or something else?

34 [Figure: the same data fit two ways. Least squares: a = 1.00, b = 2.02. Least absolute value: a = 0.94, b = 2.02.]
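
(A hedged sketch of how such a comparison could be made, reusing the grid-search idea to minimize the sum of absolute errors instead of the sum of squared errors. The data, outlier, and grids are illustrative and do not reproduce slide 34's numbers.)

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 30)
y_obs = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=x.size)
y_obs[5] += 10.0   # one outlier, to make the two criteria differ

a_grid = np.linspace(-1.0, 3.0, 201)
b_grid = np.linspace(1.0, 3.0, 201)
E2 = np.zeros((a_grid.size, b_grid.size))   # sum of squared errors
E1 = np.zeros_like(E2)                      # sum of absolute errors
for i, a in enumerate(a_grid):
    for j, b in enumerate(b_grid):
        e = y_obs - (a + b * x)
        E2[i, j] = np.sum(e ** 2)
        E1[i, j] = np.sum(np.abs(e))

i2, j2 = np.unravel_index(np.argmin(E2), E2.shape)
i1, j1 = np.unravel_index(np.argmin(E1), E1.shape)
print("least squares:        a =", a_grid[i2], " b =", b_grid[j2])
print("least absolute value: a =", a_grid[i1], " b =", b_grid[j1])
# The least-absolute-value fit is pulled far less by the outlier.
```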

