
1 Lecture 3 Review of Linear Algebra Simple least-squares

2 Set up for standard least squares: y_i = a + b x_i, i.e. [y_1, y_2, …, y_N]^T = G [a, b]^T, where G is the N×2 matrix whose rows are [1, x_i]; in short, d = G m.

3 Standard least-squares solution: m_est = [G^T G]^(-1) G^T d

4 practice Set up a simple least-squares problem, identifying the vectors d and m and the matrix G. Solve it using the least-squares formula, m_est = [G^T G]^(-1) G^T d.
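A minimal sketch in Python/NumPy of this practice problem (the x and y values are my own made-up numbers, not from the lecture):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])          # hypothetical x_i
d = np.array([2.1, 3.9, 6.2, 7.8])          # hypothetical observations y_i

G = np.column_stack([np.ones_like(x), x])   # rows [1, x_i]
m_est = np.linalg.solve(G.T @ G, G.T @ d)   # m_est = [G^T G]^-1 G^T d
print("intercept a, slope b:", m_est)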

5 Lecture 4 Probability and what it has to do with data analysis

6 the Gaussian or normal distribution p(x) = [1 / (σ √(2π))] exp{ -(x - x̄)^2 / (2σ^2) }, where x̄ is the expected value and σ^2 is the variance. Memorize me!

7 Properties of the normal distribution: Expectation = Median = Mode = x̄, and 95% of the probability lies within 2σ of the expected value.
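A quick numerical check of the 2σ rule (my own illustration, using SciPy's normal distribution):

from scipy.stats import norm

frac = norm.cdf(2.0) - norm.cdf(-2.0)    # P(|x - xbar| < 2 sigma) for any normal
print(f"P within 2 sigma = {frac:.4f}")  # ~0.9545, i.e. roughly 95%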

8 Functions of a random variable: any function of a random variable is itself a random variable. Errors propagate from observations to inferences.

9 General rule: given a distribution p(x) (e.g. where x are observations) and a function y(x) (e.g. where y are inferences), p(y) = p[x(y)] |dx/dy|.
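A sketch of this rule checked by Monte Carlo (my own example, not from the lecture): take x normally distributed and the nonlinear function y = exp(x), so x(y) = ln(y) and |dx/dy| = 1/y.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=0.5, size=200_000)   # samples of p(x)
y = np.exp(x)                                      # the inference y(x)

# analytic p(y) = p[x(y)] |dx/dy| = p_x(ln y) / y
ygrid = np.linspace(0.05, 5.0, 400)
p_y = norm.pdf(np.log(ygrid), loc=0.0, scale=0.5) / ygrid

hist, edges = np.histogram(y, bins=50, range=(0.05, 5.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print("max |histogram - analytic|:",
      np.max(np.abs(hist - np.interp(centers, ygrid, p_y))))  # close to zero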

10 Suppose y(x) is a linear function, y = Mx. Then, regardless of the type of distribution p(x), ȳ = M x̄ and C_y = M C_x M^T. In the special case that p(x) is a normal distribution, p(y) is a normal distribution, too.

11 Means and variances add. Special case: y = A x_1 ± B x_2, so that M = [A, B]. Then ȳ = A x̄_1 ± B x̄_2 and, from C_y = M C_x M^T, σ_y^2 = A^2 σ_x1^2 + B^2 σ_x2^2 (for uncorrelated x_1, x_2). Note that the variances always add.
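A minimal sketch of this special case (hypothetical numbers for A, B and the variances):

import numpy as np

A, B = 2.0, -1.0
M = np.array([[A, B]])
Cx = np.diag([0.3**2, 0.4**2])          # uncorrelated x_1, x_2

Cy = M @ Cx @ M.T                       # C_y = M C_x M^T
print("sigma_y^2 =", Cy[0, 0])          # equals A^2 sigma_1^2 + B^2 sigma_2^2
print("check     =", A**2 * 0.3**2 + B**2 * 0.4**2)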

12 practice I would say … practice transforming a distribution of two variables, p(x_1, x_2) → p(y_1, y_2), when the functions y_1(x_1, x_2) and y_2(x_1, x_2) are simple (but nonlinear) expressions and p(x_1, x_2) is simple, too. … but actually, even the simplest version would be too long for a midterm.

13 Lecture 5 Probability and Statistics

14 Rule for propagating error in least squares: applying C_y = M C_x M^T with M = [G^T G]^(-1) G^T, and uncorrelated data with equal variance, C_d = σ_d^2 I, gives C_m = M C_d M^T = σ_d^2 [G^T G]^(-1).

15 From this follows the famous rule for the error associated with the mean: if G = [1, 1, … 1]^T (so that m_est = N^(-1) Σ_i d_i, the sample mean), then σ_m = σ_d / √N. The estimated mean is a normally-distributed random variable, and the width of this distribution, σ_m, decreases with the square root of the number of measurements.

16 practice Set up a simple (e.g. linear) error-propagation problem by identifying the matrices M and C_d. Compute and interpret C_m using the rule C_y = M C_x M^T, and then write down 95% confidence intervals.
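A sketch of this practice problem for the straight-line fit (hypothetical data and an assumed uniform data error σ_d):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
d = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
sigma_d = 0.2                                # assumed data error

G = np.column_stack([np.ones_like(x), x])
GTG_inv = np.linalg.inv(G.T @ G)
m_est = GTG_inv @ G.T @ d
C_m = sigma_d**2 * GTG_inv                   # C_m = sigma_d^2 [G^T G]^-1

for name, val, var in zip(["a", "b"], m_est, np.diag(C_m)):
    print(f"{name} = {val:.3f} +/- {2*np.sqrt(var):.3f} (95%)")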

17 Lecture 6 Bootstraps Maximum Likelihood Methods

18 More or less the same thing in the 2 pots? Take 1 cup from the pot representing p(y), duplicate the cup an infinite number of times, and pour into a new pot, which again represents p(y).

19 Bootstrap method: random sampling with replacement. Use the original dataset x to create many new datasets x^(i), compute a y(x^(i)) from each, and empirically examine their distribution.
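A minimal bootstrap sketch (my own example: the statistic y(x) is the median of a made-up dataset):

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=30)         # original dataset

n_boot = 5000
y = np.empty(n_boot)
for i in range(n_boot):
    x_i = rng.choice(x, size=x.size, replace=True)  # new dataset x^(i)
    y[i] = np.median(x_i)                           # any y(x) would do

print("bootstrap estimate of the median:", np.mean(y))
print("bootstrap standard error        :", np.std(y, ddof=1))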

20 The Principle of Maximum Likelihood: given a parameterized distribution p(x; m), choose m so that it maximizes L(m) = Σ_i ln p(x_i; m), i.e. ∂L/∂m_i = 0. The dataset that was in fact observed is the most probable one that could have been observed.

21 Application to the normal distribution: the sample mean and sample variance are the maximum likelihood estimates of the true mean and variance of a normal distribution.
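A numerical check of this statement (my own illustration, maximizing L(m) directly with SciPy on simulated data):

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
x = rng.normal(loc=3.0, scale=1.5, size=200)

def neg_log_like(m):                       # m = [mean, sigma]
    return -np.sum(norm.logpdf(x, loc=m[0], scale=m[1]))

res = minimize(neg_log_like, x0=[0.0, 1.0], bounds=[(None, None), (1e-6, None)])
print("MLE mean, sigma :", res.x)
print("sample mean     :", x.mean())
print("sample sigma    :", x.std())        # ML variance uses 1/N, not 1/(N-1)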

22 practice I would say … use maximum likelihood to find the m associated with a parameterized distribution p(d; m) when p(d; m) is something fairly simple … but I think even the simplest such problem would be too long for a midterm.

23 Lecture 7 Advanced Topics in Least Squares

24 When the data are normally distributed with covariance C_d, maximum likelihood implies generalized least squares: minimize (d - Gm)^T C_d^(-1) (d - Gm), which has solution m_est = [G^T C_d^(-1) G]^(-1) G^T C_d^(-1) d and C_m = [G^T C_d^(-1) G]^(-1).

25 In the special case of uncorrelated data with different variances, C_d = diag(σ_1^2, σ_2^2, … σ_N^2). Set d_i' = σ_i^(-1) d_i (multiply each datum by the reciprocal of its error) and G_ij' = σ_i^(-1) G_ij (multiply each row of the data kernel by the same amount), then solve by ordinary least squares.

26 practice Set up a simple least-squares problem when the data have non-uniform variance. Solve it: work out a formula for the least-squares estimate of the unknowns, and their variance as well. Interpret the results, e.g. write down 95% confidence intervals.
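A sketch of this practice problem (hypothetical data and per-datum errors; the rows of G and the data are weighted by 1/σ_i and then solved by ordinary least squares, as on the previous slide):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
d = np.array([2.0, 4.1, 5.8, 8.3, 9.9])
sigma = np.array([0.1, 0.1, 0.5, 0.5, 1.0])   # different error per datum

G = np.column_stack([np.ones_like(x), x])
Gp = G / sigma[:, None]                       # G'_ij = sigma_i^-1 G_ij
dp = d / sigma                                # d'_i  = sigma_i^-1 d_i

m_est = np.linalg.solve(Gp.T @ Gp, Gp.T @ dp)
C_m = np.linalg.inv(Gp.T @ Gp)                # = [G^T C_d^-1 G]^-1
for name, val, var in zip(["a", "b"], m_est, np.diag(C_m)):
    print(f"{name} = {val:.3f} +/- {2*np.sqrt(var):.3f} (95%)")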

27 Lecture 8 Advanced Topics in Least Squares - Part Two -

28 prior information: assumptions about the behavior of the unknowns that ‘fill in’ the data gaps

29 Overall Strategy 1. Represent the observed data as a normal probability distribution with d = d_obs, C_d. 2. Represent prior information as a probability distribution with m = m_A, C_m. … … 5. Apply maximum likelihood to the combined distribution.

30 Generalized least-squares solution: m_est = m_A + M [d_obs - G m_A], where M = [G^T C_d^(-1) G + C_m^(-1)]^(-1) G^T C_d^(-1).

31 Special case: uncorrelated data and prior constraints, C_d = σ_d^2 I and C_m = σ_m^2 I, giving M = [G^T G + (σ_d/σ_m)^2 I]^(-1) G^T. Called damped least squares; the unknown m's are filled in with their prior values m_A.

32 Another special case: smoothness. Dm is a measure of the roughness of m, e.g. the second derivative, d^2 m/dx^2 ∝ Dm, where D has rows [1 -2 1 0 0 0 …], [0 1 -2 1 0 0 …], …, [… 0 0 0 1 -2 1].

33 The solution corresponds to generalized least squares with the choices m_A = 0 and C_m^(-1) = D^T D.

34 practice Set up a simple least-squares problem when prior information about the model parameters is available. Most importantly, specify m_A and C_m in sensible ways. Solve it: work out a formula for the estimate of the unknowns, and their variance as well. Interpret the results, e.g. write down 95% confidence intervals for the unknowns.
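A sketch of the generalized least-squares formula from slide 30 (my own toy problem: two data, three unknowns, zero prior values, and assumed σ_d and σ_m as on slide 31):

import numpy as np

G = np.array([[1.0, 1.0, 0.0],             # underdetermined: 2 data, 3 unknowns
              [0.0, 1.0, 1.0]])
d_obs = np.array([2.0, 3.0])
m_A = np.zeros(3)                           # prior values of the unknowns
Cd_inv = (1.0 / 0.1**2) * np.eye(2)         # sigma_d = 0.1
Cm_inv = (1.0 / 1.0**2) * np.eye(3)         # sigma_m = 1.0 (damped least squares)

M = np.linalg.inv(G.T @ Cd_inv @ G + Cm_inv) @ G.T @ Cd_inv
m_est = m_A + M @ (d_obs - G @ m_A)
print("m_est:", m_est)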

35 Lecture 9 Interpolation and Splines

36 cubic splines: between each pair of points (x_i, y_i) and (x_i+1, y_i+1) the curve y(x) is a cubic, a + bx + cx^2 + dx^3, with a different cubic in each interval.

37 Properties: the curve goes through the points at the ends of each interval; dy/dx matches at interior points; d^2y/dx^2 matches at interior points; d^2y/dx^2 = 0 at the end points.

38 practice Memorize the properties of cubic splines
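A sketch of these properties using SciPy (my own example data; bc_type='natural' imposes the zero second derivative at the end points listed above):

import numpy as np
from scipy.interpolate import CubicSpline

xi = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
yi = np.array([0.0, 0.8, 0.9, 0.1, -0.7])

cs = CubicSpline(xi, yi, bc_type='natural')   # d2y/dx2 = 0 at the end points
print(cs(1.5))                                # interpolated value
print(cs(xi, 2))                              # second derivative: 0 at the ends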

39 Lecture 10 Hypothesis Testing

40 The Null Hypothesis is always a variant of this theme: the results of an experiment differ from the expected value only because of random variation.

41 5 tests: (1) m_obs = m_prior when m_prior and σ_prior are known: normal distribution. (2) σ_obs = σ_prior when m_prior and σ_prior are known: chi-squared distribution. (3) m_obs = m_prior when m_prior is known but σ_prior is unknown: t distribution. (4) σ1_obs = σ2_obs when m1_prior and m2_prior are known: F distribution. (5) m1_obs = m2_obs when σ1_prior and σ2_prior are unknown: modified t distribution (not on midterm).

42 practice Work through an example of each of the 4 tests: identify which test is being used, and why; identify the Null Hypothesis; compute the probability that the results deviate from the Null Hypothesis only because of random noise; interpret the results.
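A sketch of one of the cases, the t test (m_obs = m_prior with σ unknown), using SciPy on made-up observations:

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
d = rng.normal(loc=10.3, scale=1.0, size=20)   # hypothetical observations
m_prior = 10.0                                  # expected value under the Null

t_stat, p_value = stats.ttest_1samp(d, popmean=m_prior)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# small p: the deviation is unlikely to be due to random variation alone;
# large p: cannot reject the Null Hypothesis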

43 Lecture 11 Linear Systems

44 output (“response”) of a linear system can be calculated by convolving its input (“forcing”) with its impulse response

45 Convolution integral: θ(t) = g(t) * h(t) = ∫_(-∞)^(t) g(t-τ) h(τ) dτ.

46 how to do convolution by hand: given x = [x_0, x_1, x_2, x_3, x_4, …]^T and y = [y_0, y_1, y_2, y_3, y_4, …]^T, reverse one time-series, line the two up so that only x_0 and y_0 overlap, and multiply the overlapping entries: x_0 y_0 is the first element of x * y.

47 Slide to increase the overlap by one, multiply the overlapping entries and add the products: x_0 y_1 + x_1 y_0 is the second element. Slide again, multiply and add: x_0 y_2 + x_1 y_1 + x_2 y_0 is the third element. Repeat until the time-series no longer overlap.

48 Mathematically equivalent ways to write the convolution: θ(t) = ∫_(-∞)^(t) g(t-τ) h(τ) dτ, in which h(τ) is “forward in time”, or alternatively θ(t) = ∫_(0)^(∞) g(τ) h(t-τ) dτ, in which g(τ) is “forward in time”.

49 Matrix formulations: θ = Δt G h, with G the lower-triangular matrix whose rows are [g_0, 0, 0, …, 0], [g_1, g_0, 0, …, 0], …, [g_N, …, g_3, g_2, g_1, g_0]; and equivalently θ = Δt G g, with the matrix now built from h in the same way.

50 practice Do some convolutions by hand. Make sketch-plots of the input, output and impulse response.
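A sketch of the slide 46/47 recipe checked numerically against numpy.convolve (my own short example series):

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 0.5, 0.25])

# by hand: the k-th element is the sum over p of x_p * y_(k-p)
by_hand = [sum(x[p] * y[k - p] for p in range(len(x))
               if 0 <= k - p < len(y)) for k in range(len(x) + len(y) - 1)]
print("by hand:", by_hand)                 # [1.0, 2.5, 4.25, 2.0, 0.75]
print("numpy  :", np.convolve(x, y))       # same result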

51 Lecture 12 Filter Theory

52 y_k = Σ_(p=-∞)^(k) f_(k-p) x_p : the output y_k is obtained from the input x_k by convolving with the “digital” filter f_k, a generic way to construct a time-series.

53 the z-transform turns a time-series into a polynomial and vice versa: time-series x = [x_0, x_1, x_2, x_3, x_4, …]^T ↔ polynomial x(z) = x_0 + x_1 z + x_2 z^2 + x_3 z^3 + x_4 z^4 + …. Convolving time-series is equivalent to multiplying their z-transforms.
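A quick check of this equivalence (my own example coefficients; numpy stores polynomial coefficients highest power first, hence the reversals):

import numpy as np

x = np.array([1.0, 2.0, 3.0])      # x(z) = 1 + 2z + 3z^2
y = np.array([1.0, -1.0])          # y(z) = 1 - z

print(np.convolve(x, y))                       # convolution of the time-series
print(np.polymul(x[::-1], y[::-1])[::-1])      # same: coefficients of x(z)*y(z)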

54 If f = [1, -f_1]^T then f_inv = [1, f_1, f_1^2, f_1^3, …]^T. The inverse filter only exists when |f_1| < 1, for otherwise the elements of f_inv grow without bound.

55 any filter of length N can be written as a cascade of N-1 length-2 filters: f = [f_0, f_1, f_2, f_3, … f_(N-1)]^T = [-r_1, 1]^T * [-r_2, 1]^T * … * [-r_(N-1), 1]^T (to within an overall scale factor), where the r_i are the roots of f(z).

56 In the general case, an inverse filter only exists when the roots r_i of the corresponding f(z) satisfy |r_i| > 1; such a filter is said to be “minimum phase”.

57 practice Given a relatively short filter f (3 or 4 coefficients), factor it into a cascade of 2-element filters by computing the roots of f(z), and determine whether the filter f has an inverse.
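A sketch of this practice problem for a made-up 3-coefficient filter (numpy.roots expects the highest power first, hence the reversal):

import numpy as np

f = np.array([1.0, 0.25, -0.375])          # f(z) = 1 + 0.25 z - 0.375 z^2
roots = np.roots(f[::-1])                  # roots of f(z)
print("roots of f(z):", roots)
print("minimum phase (has an inverse)?", np.all(np.abs(roots) > 1.0))

# cascade of length-2 filters, as on slide 55 (up to the overall scale factor)
cascade = np.convolve([-roots[0], 1.0], [-roots[1], 1.0])
print("cascade * scale:", np.real(cascade) * f[-1])   # recovers f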

