
1 Lecture 3 Review of Linear Algebra Simple least-squares

2 Set up for standard least squares: y_i = a + b x_i, i.e. [y_1, y_2, …, y_N]^T = G [a, b]^T, where G is the N×2 matrix whose rows are [1, x_i]; in short, d = G m.

3 Standard least-squares solution: m_est = [G^T G]^(-1) G^T d

4 practice Set up a simple least-squares problem, identifying the vectors d and m and the matrix G. Solve it using the least-squares formula, m_est = [G^T G]^(-1) G^T d.
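A minimal sketch in Python/NumPy of this practice problem (the x and y values are my own made-up numbers, not from the lecture):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])          # hypothetical x_i
d = np.array([2.1, 3.9, 6.2, 7.8])          # hypothetical observations y_i

G = np.column_stack([np.ones_like(x), x])   # rows [1, x_i]
m_est = np.linalg.solve(G.T @ G, G.T @ d)   # m_est = [G^T G]^-1 G^T d
print("intercept a, slope b:", m_est)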

5 Lecture 4 Probability and what it has to do with data analysis

6 the Gaussian or normal distribution p(x) = [1 / (σ √(2π))] exp{ -(x - x̄)^2 / (2σ^2) }, where x̄ is the expected value and σ^2 is the variance. Memorize me!

7 Properties of the normal distribution: Expectation = Median = Mode = x̄, and 95% of the probability lies within 2σ of the expected value.
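A quick numerical check of the 2σ rule (my own illustration, using SciPy's normal distribution):

from scipy.stats import norm

frac = norm.cdf(2.0) - norm.cdf(-2.0)    # P(|x - xbar| < 2 sigma) for any normal
print(f"P within 2 sigma = {frac:.4f}")  # ~0.9545, i.e. roughly 95%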

8 Functions of a random variable: any function of a random variable is itself a random variable. Errors propagate from observations to inferences.

9 General rule: given a distribution p(x) (e.g. where x are observations) and a function y(x) (e.g. where y are inferences), p(y) = p[x(y)] |dx/dy|.
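A sketch of this rule checked by Monte Carlo (my own example, not from the lecture): take x normally distributed and the nonlinear function y = exp(x), so x(y) = ln(y) and |dx/dy| = 1/y.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=0.5, size=200_000)   # samples of p(x)
y = np.exp(x)                                      # the inference y(x)

# analytic p(y) = p[x(y)] |dx/dy| = p_x(ln y) / y
ygrid = np.linspace(0.05, 5.0, 400)
p_y = norm.pdf(np.log(ygrid), loc=0.0, scale=0.5) / ygrid

hist, edges = np.histogram(y, bins=50, range=(0.05, 5.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print("max |histogram - analytic|:",
      np.max(np.abs(hist - np.interp(centers, ygrid, p_y))))  # close to zero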

10 Suppose y(x) is a linear function, y = Mx. Then, regardless of the type of distribution p(x), ȳ = M x̄ and C_y = M C_x M^T. In the special case that p(x) is a normal distribution, p(y) is a normal distribution, too.

11 Means and variances add. Special case: y = A x_1 ± B x_2, so that M = [A, B]. Then ȳ = A x̄_1 ± B x̄_2 and, from C_y = M C_x M^T, σ_y^2 = A^2 σ_x1^2 + B^2 σ_x2^2 (for uncorrelated x_1, x_2). Note that the variances always add.
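A minimal sketch of this special case (hypothetical numbers for A, B and the variances):

import numpy as np

A, B = 2.0, -1.0
M = np.array([[A, B]])
Cx = np.diag([0.3**2, 0.4**2])          # uncorrelated x_1, x_2

Cy = M @ Cx @ M.T                       # C_y = M C_x M^T
print("sigma_y^2 =", Cy[0, 0])          # equals A^2 sigma_1^2 + B^2 sigma_2^2
print("check     =", A**2 * 0.3**2 + B**2 * 0.4**2)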

12 practice I would say … practice transforming a distribution of two variables, p(x_1, x_2) → p(y_1, y_2), when the functions y_1(x_1, x_2) and y_2(x_1, x_2) are simple (but nonlinear) expressions and p(x_1, x_2) is simple, too. … but actually, even the simplest version would be too long for a midterm.

13 Lecture 5 Probability and Statistics

14 Rule for propagating error in least squares: applying C_y = M C_x M^T with M = [G^T G]^(-1) G^T, and uncorrelated data with equal variance, C_d = σ_d^2 I, gives C_m = M C_d M^T = σ_d^2 [G^T G]^(-1).

15 From this follows the famous rule for the error associated with the mean: if G = [1, 1, … 1]^T (so that m_est = N^(-1) Σ_i d_i, the sample mean), then σ_m = σ_d / √N. The estimated mean is a normally-distributed random variable, and the width of this distribution, σ_m, decreases with the square root of the number of measurements.

16 practice Set up a simple (e.g. linear) error-propagation problem by identifying the matrices M and C_d. Compute and interpret C_m using the rule C_y = M C_x M^T, and then write down 95% confidence intervals.
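A sketch of this practice problem for the straight-line fit (hypothetical data and an assumed uniform data error σ_d):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
d = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
sigma_d = 0.2                                # assumed data error

G = np.column_stack([np.ones_like(x), x])
GTG_inv = np.linalg.inv(G.T @ G)
m_est = GTG_inv @ G.T @ d
C_m = sigma_d**2 * GTG_inv                   # C_m = sigma_d^2 [G^T G]^-1

for name, val, var in zip(["a", "b"], m_est, np.diag(C_m)):
    print(f"{name} = {val:.3f} +/- {2*np.sqrt(var):.3f} (95%)")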

17 Lecture 6 Bootstraps Maximum Likelihood Methods

18 More or less the same thing in the 2 pots? Take 1 cup from the pot representing p(y), duplicate the cup an infinite number of times, and pour into a new pot, which again represents p(y).

19 Bootstrap method: random sampling with replacement. Use the original dataset x to create many new datasets x^(i), compute a y(x^(i)) from each, and empirically examine their distribution.
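A minimal bootstrap sketch (my own example: the statistic y(x) is the median of a made-up dataset):

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=30)         # original dataset

n_boot = 5000
y = np.empty(n_boot)
for i in range(n_boot):
    x_i = rng.choice(x, size=x.size, replace=True)  # new dataset x^(i)
    y[i] = np.median(x_i)                           # any y(x) would do

print("bootstrap estimate of the median:", np.mean(y))
print("bootstrap standard error        :", np.std(y, ddof=1))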

20 The Principle of Maximum Likelihood: given a parameterized distribution p(x; m), choose m so that it maximizes L(m) = Σ_i ln p(x_i; m), i.e. ∂L/∂m_i = 0. The dataset that was in fact observed is the most probable one that could have been observed.

21 Application to the normal distribution: the sample mean and sample variance are the maximum likelihood estimates of the true mean and variance of a normal distribution.
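A numerical check of this statement (my own illustration, maximizing L(m) directly with SciPy on simulated data):

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
x = rng.normal(loc=3.0, scale=1.5, size=200)

def neg_log_like(m):                       # m = [mean, sigma]
    return -np.sum(norm.logpdf(x, loc=m[0], scale=m[1]))

res = minimize(neg_log_like, x0=[0.0, 1.0], bounds=[(None, None), (1e-6, None)])
print("MLE mean, sigma :", res.x)
print("sample mean     :", x.mean())
print("sample sigma    :", x.std())        # ML variance uses 1/N, not 1/(N-1)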

22 practice I would say … use maximum likelihood to find the m associated with a parameterized distribution p(d; m) when p(d; m) is something fairly simple … but I think even the simplest such problem would be too long for a midterm.

23 Lecture 7 Advanced Topics in Least Squares

24 When the data are normally distributed with covariance C_d, maximum likelihood implies generalized least squares: minimize (d - Gm)^T C_d^(-1) (d - Gm), which has solution m_est = [G^T C_d^(-1) G]^(-1) G^T C_d^(-1) d and C_m = [G^T C_d^(-1) G]^(-1).

25 In the special case of uncorrelated data with different variances, C_d = diag(σ_1^2, σ_2^2, … σ_N^2). Set d_i' = σ_i^(-1) d_i (multiply each datum by the reciprocal of its error) and G_ij' = σ_i^(-1) G_ij (multiply each row of the data kernel by the same amount), then solve by ordinary least squares.

26 practice Set up a simple least-squares problem when the data have non-uniform variance. Solve it: work out a formula for the least-squares estimate of the unknowns, and their variance as well. Interpret the results, e.g. write down 95% confidence intervals.
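A sketch of this practice problem (hypothetical data and per-datum errors; the rows of G and the data are weighted by 1/σ_i and then solved by ordinary least squares, as on the previous slide):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
d = np.array([2.0, 4.1, 5.8, 8.3, 9.9])
sigma = np.array([0.1, 0.1, 0.5, 0.5, 1.0])   # different error per datum

G = np.column_stack([np.ones_like(x), x])
Gp = G / sigma[:, None]                       # G'_ij = sigma_i^-1 G_ij
dp = d / sigma                                # d'_i  = sigma_i^-1 d_i

m_est = np.linalg.solve(Gp.T @ Gp, Gp.T @ dp)
C_m = np.linalg.inv(Gp.T @ Gp)                # = [G^T C_d^-1 G]^-1
for name, val, var in zip(["a", "b"], m_est, np.diag(C_m)):
    print(f"{name} = {val:.3f} +/- {2*np.sqrt(var):.3f} (95%)")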

27 Lecture 8 Advanced Topics in Least Squares - Part Two -

28 prior information: assumptions about the behavior of the unknowns that ‘fill in’ the data gaps

29 Overall Strategy 1. Represent the observed data as a normal probability distribution with d = d_obs, C_d. 2. Represent prior information as a probability distribution with m = m_A, C_m. … … 5. Apply maximum likelihood to the combined distribution.

30 Generalized least-squares solution: m_est = m_A + M [d_obs - G m_A], where M = [G^T C_d^(-1) G + C_m^(-1)]^(-1) G^T C_d^(-1).

31 Special case: uncorrelated data and prior constraints, C_d = σ_d^2 I and C_m = σ_m^2 I, giving M = [G^T G + (σ_d/σ_m)^2 I]^(-1) G^T. Called damped least squares; the unknown m's are filled in with their prior values m_A.

32 Another special case: smoothness. Dm is a measure of the roughness of m, e.g. the second derivative, d^2 m/dx^2 ∝ Dm, where D has rows [1 -2 1 0 0 0 …], [0 1 -2 1 0 0 …], …, [… 0 0 0 1 -2 1].

33 The solution corresponds to generalized least squares with the choices m_A = 0 and C_m^(-1) = D^T D.

34 practice Set up a simple least-squares problem when prior information about the model parameters is available. Most importantly, specify m_A and C_m in sensible ways. Solve it: work out a formula for the estimate of the unknowns, and their variance as well. Interpret the results, e.g. write down 95% confidence intervals for the unknowns.
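A sketch of the generalized least-squares formula from slide 30 (my own toy problem: two data, three unknowns, zero prior values, and assumed σ_d and σ_m as on slide 31):

import numpy as np

G = np.array([[1.0, 1.0, 0.0],             # underdetermined: 2 data, 3 unknowns
              [0.0, 1.0, 1.0]])
d_obs = np.array([2.0, 3.0])
m_A = np.zeros(3)                           # prior values of the unknowns
Cd_inv = (1.0 / 0.1**2) * np.eye(2)         # sigma_d = 0.1
Cm_inv = (1.0 / 1.0**2) * np.eye(3)         # sigma_m = 1.0 (damped least squares)

M = np.linalg.inv(G.T @ Cd_inv @ G + Cm_inv) @ G.T @ Cd_inv
m_est = m_A + M @ (d_obs - G @ m_A)
print("m_est:", m_est)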

35 Lecture 9 Interpolation and Splines

36 cubic splines: between each pair of points (x_i, y_i) and (x_i+1, y_i+1) the curve y(x) is a cubic, a + bx + cx^2 + dx^3, with a different cubic in each interval.

37 Properties: the curve goes through the points at the ends of each interval; dy/dx matches at interior points; d^2y/dx^2 matches at interior points; d^2y/dx^2 = 0 at the end points.

38 practice Memorize the properties of cubic splines
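A sketch of these properties using SciPy (my own example data; bc_type='natural' imposes the zero second derivative at the end points listed above):

import numpy as np
from scipy.interpolate import CubicSpline

xi = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
yi = np.array([0.0, 0.8, 0.9, 0.1, -0.7])

cs = CubicSpline(xi, yi, bc_type='natural')   # d2y/dx2 = 0 at the end points
print(cs(1.5))                                # interpolated value
print(cs(xi, 2))                              # second derivative: 0 at the ends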

39 Lecture 10 Hypothesis Testing

40 The Null Hypothesis is always a variant of this theme: the results of an experiment differ from the expected value only because of random variation.

41 5 tests: (1) m_obs = m_prior when m_prior and σ_prior are known: normal distribution. (2) σ_obs = σ_prior when m_prior and σ_prior are known: chi-squared distribution. (3) m_obs = m_prior when m_prior is known but σ_prior is unknown: t distribution. (4) σ1_obs = σ2_obs when m1_prior and m2_prior are known: F distribution. (5) m1_obs = m2_obs when σ1_prior and σ2_prior are unknown: modified t distribution (not on midterm).

42 practice Work through an example of each of the 4 tests: identify which test is being used, and why; identify the Null Hypothesis; compute the probability that the results deviate from the Null Hypothesis only because of random noise; interpret the results.
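A sketch of one of the cases, the t test (m_obs = m_prior with σ unknown), using SciPy on made-up observations:

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
d = rng.normal(loc=10.3, scale=1.0, size=20)   # hypothetical observations
m_prior = 10.0                                  # expected value under the Null

t_stat, p_value = stats.ttest_1samp(d, popmean=m_prior)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# small p: the deviation is unlikely to be due to random variation alone;
# large p: cannot reject the Null Hypothesis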

43 Lecture 11 Linear Systems

44 output (“response”) of a linear system can be calculated by convolving its input (“forcing”) with its impulse response

45 Convolution integral: θ(t) = g(t) * h(t) = ∫_(-∞)^(t) g(t-τ) h(τ) dτ.

46 how to do convolution by hand: given x = [x_0, x_1, x_2, x_3, x_4, …]^T and y = [y_0, y_1, y_2, y_3, y_4, …]^T, reverse one time-series, line the two up so that only x_0 and y_0 overlap, and multiply the overlapping entries: x_0 y_0 is the first element of x * y.

47 Slide to increase the overlap by one, multiply the overlapping entries and add the products: x_0 y_1 + x_1 y_0 is the second element. Slide again, multiply and add: x_0 y_2 + x_1 y_1 + x_2 y_0 is the third element. Repeat until the time-series no longer overlap.

48 Mathematically equivalent ways to write the convolution: θ(t) = ∫_(-∞)^(t) g(t-τ) h(τ) dτ, in which h(τ) is “forward in time”, or alternatively θ(t) = ∫_(0)^(∞) g(τ) h(t-τ) dτ, in which g(τ) is “forward in time”.

49 Matrix formulations: θ = Δt G h, with G the lower-triangular matrix whose rows are [g_0, 0, 0, …, 0], [g_1, g_0, 0, …, 0], …, [g_N, …, g_3, g_2, g_1, g_0]; and equivalently θ = Δt G g, with the matrix now built from h in the same way.

50 practice Do some convolutions by hand. Make sketch-plots of the input, output and impulse response.
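A sketch of the slide 46/47 recipe checked numerically against numpy.convolve (my own short example series):

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 0.5, 0.25])

# by hand: the k-th element is the sum over p of x_p * y_(k-p)
by_hand = [sum(x[p] * y[k - p] for p in range(len(x))
               if 0 <= k - p < len(y)) for k in range(len(x) + len(y) - 1)]
print("by hand:", by_hand)                 # [1.0, 2.5, 4.25, 2.0, 0.75]
print("numpy  :", np.convolve(x, y))       # same result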

51 Lecture 12 Filter Theory

52 y_k = Σ_(p=-∞)^(k) f_(k-p) x_p : the output y_k is obtained from the input x_k by convolving with the “digital” filter f_k, a generic way to construct a time-series.

53 the z-transform turns a time-series into a polynomial and vice versa: time-series x = [x_0, x_1, x_2, x_3, x_4, …]^T ↔ polynomial x(z) = x_0 + x_1 z + x_2 z^2 + x_3 z^3 + x_4 z^4 + …. Convolving time-series is equivalent to multiplying their z-transforms.
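A quick check of this equivalence (my own example coefficients; numpy stores polynomial coefficients highest power first, hence the reversals):

import numpy as np

x = np.array([1.0, 2.0, 3.0])      # x(z) = 1 + 2z + 3z^2
y = np.array([1.0, -1.0])          # y(z) = 1 - z

print(np.convolve(x, y))                       # convolution of the time-series
print(np.polymul(x[::-1], y[::-1])[::-1])      # same: coefficients of x(z)*y(z)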

54 If f = [1, -f_1]^T then f_inv = [1, f_1, f_1^2, f_1^3, …]^T. The inverse filter only exists when |f_1| < 1, for otherwise the elements of f_inv grow without bound.

55 any filter of length N can be written as a cascade of N-1 length-2 filters: f = [f_0, f_1, f_2, f_3, … f_(N-1)]^T = [-r_1, 1]^T * [-r_2, 1]^T * … * [-r_(N-1), 1]^T (to within an overall scale factor), where the r_i are the roots of f(z).

56 In the general case, an inverse filter only exists when the roots r_i of the corresponding f(z) satisfy |r_i| > 1; such a filter is said to be “minimum phase”.

57 practice Given a relatively short filter f (3 or 4 coefficients), factor it into a cascade of 2-element filters by computing the roots of f(z), and determine whether the filter f has an inverse.
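A sketch of this practice problem for a made-up 3-coefficient filter (numpy.roots expects the highest power first, hence the reversal):

import numpy as np

f = np.array([1.0, 0.25, -0.375])          # f(z) = 1 + 0.25 z - 0.375 z^2
roots = np.roots(f[::-1])                  # roots of f(z)
print("roots of f(z):", roots)
print("minimum phase (has an inverse)?", np.all(np.abs(roots) > 1.0))

# cascade of length-2 filters, as on slide 55 (up to the overall scale factor)
cascade = np.convolve([-roots[0], 1.0], [-roots[1], 1.0])
print("cascade * scale:", np.real(cascade) * f[-1])   # recovers f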

