Linear Regression. Andy Jacobson, July 2006. Statistical Anecdotes: Do hospitals make you sick? Student’s story. Etymology of “regression”.
Outline
1. Discussion of yesterday’s exercise
2. The mathematics of regression
3. Solution of the normal equations
4. Probability and likelihood
5. Sample exercise: Mauna Loa CO2
6. Sample exercise: TransCom3 inversion
Course files: http://www.aos.princeton.edu/WWWPUBLIC/sara/statistics_course/andy/R/
corr_exer.r - 18 July practical
mauna_loa.r - Today’s first example
transcom3.r - Today’s second example
dot-Rprofile - Rename to ~/.Rprofile (i.e., home dir)
hclimate.indices.r - Get SOI, NAO, PDO, etc. from CDC
cov2cor.r - Convert covariance to correlation
ferret.palette.r - Use nice ferret color palettes
geo.axes.r - Format degree symbols, etc., for maps
load.ncdf.r - Quickly load a whole netCDF file
svd.invert.r - Multiple linear regression using SVD
mat4.r - Read and write Matlab .mat files (v4 only)
svd_invert.m - Multiple linear regression using SVD (Matlab)
atm0_m1.mat - Data for the TransCom3 example
R-intro.pdf - Basic R documentation
faraway_pra_book.pdf - Julian Faraway’s “Practical Regression and ANOVA in R” book
Multiple Linear Regression. The forward model is b = A x, where b is the data vector, x is the vector of parameters, and A is the basis set (the design matrix).
Basis Functions. The “design matrix” A gives the value of each basis function at each observation location: each row corresponds to an observation and each column to a basis function. Note that one column of A (e.g., the column of a_i1) may be all ones, to represent the “intercept”.
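A minimal R sketch of building such a design matrix, using a hypothetical monthly time axis t and synthetic observations obs (illustrative names and data, not from the course scripts):

t   <- seq(0, 10, by = 1/12)                            # hypothetical time axis, in years
obs <- 350 + 1.5*t + 3*sin(2*pi*t) + rnorm(length(t))   # synthetic "data" with noise
A   <- cbind(intercept = 1,                             # column of ones for the intercept
             trend     = t,                             # linear trend basis function
             annual.s  = sin(2*pi*t),                   # annual cycle, sine component
             annual.c  = cos(2*pi*t))                   # annual cycle, cosine component
dim(A)                                                  # rows = observations, columns = basis functions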
From the Cost Function to the Normal Equations. “Least squares” optimization minimizes the sum of squared residuals (misfits to the data). For the time being, we assume that the residuals are IID. Expanding the cost function, requiring its derivative with respect to x to vanish, and rearranging gives the normal equations and the optimal parameter values (note that A^T A must be invertible); the steps are written out below.
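Written out, with residual vector r = b - Ax (standard ordinary least squares, in the notation of the slides):

\[
J(x) = (b - Ax)^{T}(b - Ax) = b^{T}b - 2\,x^{T}A^{T}b + x^{T}A^{T}A\,x
\]
\[
\frac{\partial J}{\partial x} = -2\,A^{T}b + 2\,A^{T}A\,x = 0
\quad\Rightarrow\quad A^{T}A\,\hat{x} = A^{T}b
\quad\Rightarrow\quad \hat{x} = (A^{T}A)^{-1}A^{T}b
\]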
x-hat is BLUE. BLUE = Best Linear Unbiased Estimate (the “best” property, i.e., minimum variance among linear unbiased estimators, is not shown here).
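Unbiasedness follows in one line, assuming the data are generated as b = A x_true + e with E[e] = 0 (consistent with the IID-residual assumption above):

\[
E[\hat{x}] = (A^{T}A)^{-1}A^{T}E[b] = (A^{T}A)^{-1}A^{T}A\,x_{\mathrm{true}} = x_{\mathrm{true}}
\]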
Practical Solution of the Normal Equations using the SVD. If we could pre-multiply the forward equation Ax = b by a “pseudo-inverse” of A, we could get our answer directly. For every M x N matrix A there exists a singular value decomposition (SVD), A = U S V^T, where (in the economy form used here) U is M x N, S is N x N, and V is N x N. S is diagonal and contains the singular values. The columns of U are mutually orthonormal, as are the columns of V: U^T U = V^T V = I. The pseudo-inverse is thus A^+ = V S^-1 U^T, where S^-1 is simply the diagonal matrix of reciprocal singular values, so that x-hat = V S^-1 U^T b. And the parameter uncertainty covariance matrix is cov(x-hat) = sigma^2 (A^T A)^-1 = sigma^2 V S^-2 V^T, with sigma^2 the variance of the (IID) residuals.
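A minimal R sketch of this SVD solution (illustrative only, not the course’s svd.invert.r; it reuses the hypothetical A and obs from the design-matrix example above):

s <- svd(A)                                            # economy SVD: A = U S V^T
x.hat <- s$v %*% ((t(s$u) %*% obs) / s$d)              # x-hat = V S^-1 U^T b
resid <- obs - A %*% x.hat                             # residuals
sigma2 <- sum(resid^2) / (nrow(A) - ncol(A))           # residual variance estimate
cov.x  <- sigma2 * s$v %*% diag(1 / s$d^2) %*% t(s$v)  # parameter uncertainty covariance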
Gaussian Probability and Least Squares. The residuals vector is r = b - Ax (observations minus predictions). Each residual r_i has a Gaussian probability, and the likelihood of r is the product of the individual probabilities. N.B.: this product form is only true if the residuals are uncorrelated (independent).
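In symbols, for IID Gaussian residuals with variance sigma^2 (the standard assumption used above):

\[
p(r_i) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left(-\frac{r_i^{2}}{2\sigma^{2}}\right),
\qquad
L(r) = \prod_{i=1}^{N} p(r_i)
\]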
Maximum Likelihood. Taking the log-likelihood of r turns the product into a sum, so maximizing the likelihood is equivalent to minimizing the sum of squared (scaled) residuals. Goodness-of-fit: the chi-squared statistic for N - M degrees of freedom has a known distribution, so regression models such as this can be judged on the probability of getting a given value of chi-squared.
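In symbols (standard results, with N observations, M fitted parameters, and per-observation uncertainties sigma_i):

\[
\ln L(r) = -\frac{1}{2}\sum_{i=1}^{N}\frac{r_i^{2}}{\sigma_i^{2}} + \mathrm{const},
\qquad
\chi^{2} = \sum_{i=1}^{N}\frac{r_i^{2}}{\sigma_i^{2}}
\]

Maximizing ln L is therefore the same as minimizing chi-squared, and for an adequate model chi-squared is approximately N - M.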
Probability and Least Squares Why should we expect Gaussian residuals?
Random Processes z1 <- runif(5000)
Random Processes hist(z1)
Random Processes. z1 <- runif(5000); z2 <- runif(5000). What is the distribution of (z1 + z2)?
Triangular Distribution hist(z1+z2)
Central Limit Theorem There are more ways to get a central value than an extreme one.
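Extending the z1 + z2 experiment, the same effect in R (an illustrative snippet, not from the course scripts):

z12 <- matrix(runif(5000 * 12), ncol = 12)   # 5000 rows of 12 independent uniform draws
hist(rowSums(z12), breaks = 50)              # the row sums already look nearly Gaussian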
Probability and Least Squares Why should we expect Gaussian residuals? (1) Because the Central Limit Theorem is on our side. (2) Note that the LS solution is always a minimum variance solution, which is useful by itself. The “maximum-likelihood” interpretation is more of a goal than a reality.
Weighted Least Squares: More General “Data” Errors. Minimizing the chi-squared is equivalent to minimizing a cost function containing a covariance matrix C of data errors (written out below). The data error covariance matrix is often taken to be diagonal. This means that you put different levels of confidence on different observations (confidence assigned by assessing both measurement error and the amount of trust in your basis functions and linear model). Note that this structure still assumes independence between the residuals.
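The weighted cost function, in its standard form consistent with the chi-squared above:

\[
J(x) = (b - Ax)^{T} C^{-1} (b - Ax),
\]

which for diagonal C = diag(sigma_1^2, ..., sigma_N^2) reduces to J = sum_i r_i^2 / sigma_i^2 = chi-squared.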
Covariate Data Errors. Recall the cost function J above; now allow off-diagonal covariances in C. N.B. sigma_ij = sigma_ji and sigma_ii = sigma_i^2. In the multivariate normal PDF, J propagates without trouble into the likelihood expression: minimizing J still maximizes the likelihood.
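The multivariate normal PDF referred to here, in its standard form:

\[
p(r) = \frac{1}{(2\pi)^{N/2}\,|C|^{1/2}}
\exp\left(-\frac{1}{2}\, r^{T} C^{-1} r\right),
\]

so that ln p(r) = -J/2 + const.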
Fundamental Trick for Weighted and Generalized Least Squares. Transform the system (A, b, C) with data covariance matrix C into a system (A', b', C') in which C' is the identity matrix: with R = chol(C), the transformed system is A' = R^-T A and b' = R^-T b. The Cholesky decomposition computes a “matrix square root” such that if R = chol(C), then C = R^T R (R upper triangular, as returned by both R and Matlab). You can then solve the Ordinary Least Squares problem A'x = b', using for instance the SVD method. Note that x remains in regular, untransformed space.
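A minimal R sketch of the transformation (illustrative only; A and obs as in the earlier examples, with a hypothetical data-error covariance C):

n <- nrow(A)
C <- diag(n)                                     # hypothetical data-error covariance:
C[abs(row(C) - col(C)) == 1] <- 0.3              # unit variances, neighbor correlation 0.3
R <- chol(C)                                     # upper triangular, C = t(R) %*% R
A.prime <- backsolve(R, A,   transpose = TRUE)   # A' = R^-T A
b.prime <- backsolve(R, obs, transpose = TRUE)   # b' = R^-T b
x.hat.gls <- qr.solve(A.prime, b.prime)          # ordinary least squares on the transformed system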