Linear Regression in Matrix Form
In this section we will discuss matrix operations and their application in simple linear regression. If you are not familiar with matrix algebra, please read KNNL Chapter 5 for a thorough review.
Matrix form
Matrices and vectors will be written in boldface type. Subscripts will indicate the dimensions of a matrix: for example, A3×2 is a 3-row by 2-column matrix. Dimension subscripts will only be used when needed for clarity.
Warm up exercise
Let's begin with some exercises. Compute the product AB. (You may pause to work on the problem, then resume to reveal the answer.)
The answer is D): AB = [38 28]′. To form each entry of the product, we multiply a row of the first matrix by a column of the second matrix.
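The slide's matrices A and B appeared as images and are not recoverable here, but the row-times-column rule can be sketched in NumPy with hypothetical matrices chosen so the product matches the answer [38 28]′:

```python
import numpy as np

# Hypothetical matrices (the slide's actual A and B were shown as images).
A = np.array([[4, 2],
              [2, 4]])
B = np.array([[8],
              [3]])

# Entry (i, j) of AB is the dot product of row i of A with column j of B:
# 4*8 + 2*3 = 38 and 2*8 + 4*3 = 28.
AB = A @ B
print(AB.ravel())  # → [38 28]
```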
Warm up exercise
A, Y, and C are matrices, and AY = C. Solve for Y.
A) Y = CA⁻¹  B) Y = CA′  C) Y = A⁻¹C  D) Y = A′C
The answer is C). Because matrix multiplication is not commutative, we must left-multiply both sides by A⁻¹: A⁻¹AY = A⁻¹C, so Y = A⁻¹C.
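The left-multiplication rule Y = A⁻¹C can be checked numerically; the matrices below are hypothetical stand-ins for the ones on the slide:

```python
import numpy as np

# Hypothetical invertible A and right-hand side C (the slide's values were images).
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
C = np.array([[5.0],
              [10.0]])

# Y = A^{-1} C; in practice, solve the linear system rather than forming the inverse.
Y = np.linalg.solve(A, C)

# Left-multiplying back recovers C, confirming AY = C.
assert np.allclose(A @ Y, C)
print(Y.ravel())  # → [1. 3.]
```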
Warm up exercise
Let Y = [2 3]′. Compute Y′Y.
The answer is A): Y′Y = 2² + 3² = 13. In general, the matrix operation Y′Y can be used to get the sum of squares, Σyᵢ².
Warm up exercise
Let Y = [2 3]′ and let J = [1 1; 1 1] be the 2×2 matrix of 1s. Compute Y′JY.
The answer is A): Y′JY = (2 + 3)² = 25. In general, Y′ times an all-1s matrix times Y gives the square of the sum, (Σyᵢ)².
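Both identities from these two warm-ups can be verified directly in NumPy:

```python
import numpy as np

Y = np.array([[2.0],
              [3.0]])
n = len(Y)
J = np.ones((n, n))  # the n-by-n matrix of 1s

sum_of_squares = (Y.T @ Y).item()      # Y'Y  = sum of y_i^2 = 4 + 9
square_of_sum = (Y.T @ J @ Y).item()   # Y'JY = (sum of y_i)^2 = 5^2

print(sum_of_squares, square_of_sum)  # → 13.0 25.0
```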
The SLR Model in Scalar Form
Here is a quick flashback to the scalar form of the SLR model:
Yi = β0 + β1Xi + εi,  where εi ~ iid N(0, σ²).
Consider now writing an equation for each observation:
Y1 = β0 + β1X1 + ε1
Y2 = β0 + β1X2 + ε2
...
Yn = β0 + β1Xn + εn
The SLR Model in Matrix Form
The matrix form organizes the scalar form by stacking the n equations above into a single matrix equation.
The SLR Model in Matrix Form
In matrix notation, the simple linear regression model is written as
Yn×1 = Xn×2 β2×1 + εn×1,
or more simply as
Y = Xβ + ε,
where:
Y is the response vector; it contains the observed values of the response variable.
X is called the design matrix; it encodes information about the independent variable(s) and (later) their interactions. Note that the column of 1s in the design matrix corresponds to the intercept.
β is the parameter vector; its elements are model parameters with unknown values that we need to estimate.
ε is the error vector; its elements are random errors, representing the differences between the observed values of Y and the expected values Xβ given by the deterministic part of the model.
Example (Flavor deterioration). The results shown below were obtained in a small-scale experiment to study the relation between storage temperature in °F (X) and the number of weeks before flavor deterioration of a food product begins to occur (Y).

Observation:  1     2     3     4     5
Temp (X):     8     4     0    -4    -8
Weeks (Y):    7.8   9     10.2  11    11.7

Write the matrix notation for X and Y in Y = Xβ + ε. In R, if you define the data first in a list, you can convert it to matrix form with the as.matrix function. Note that the design matrix has both the temperature column and an intercept column of all 1s:
Y5×1 = [7.8 9 10.2 11 11.7]′,  X5×2 = [1 8; 1 4; 1 0; 1 −4; 1 −8].
Example (Flavor deterioration), continued. Use the matrix notation to compute SST.
SST = Σ(Y − Ȳ)² = ΣY² − (ΣY)²/n = Y′Y − (1/n)Y′JY,
where J is the n×n matrix of 1s. With Y = [7.8 9 10.2 11 11.7]′, we get Y′Y = 503.77 and (1/n)Y′JY = 49.7²/5 = 494.018, so
SST = 503.77 − 494.018 = 9.752.
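As a numerical check, the SST formula can be evaluated on the example data (a NumPy sketch; the course itself works in R):

```python
import numpy as np

# Flavor deterioration data from the example.
Y = np.array([7.8, 9.0, 10.2, 11.0, 11.7]).reshape(-1, 1)
n = len(Y)
J = np.ones((n, n))  # n-by-n matrix of 1s

# SST = Y'Y - (1/n) Y'JY
sst = (Y.T @ Y - Y.T @ J @ Y / n).item()
print(round(sst, 3))  # → 9.752
```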
Variance-Covariance Matrix in SLR
In scalar notation, we said that εi ~ N(0, σ²) for all i = 1, …, n. This is exactly equivalent to saying that the error vector ε is multivariate normal with a variance-covariance matrix of σ²In×n, where I is the identity matrix. The diagonal elements are the variances, and the off-diagonal elements are the covariances of the error terms:
σ²In×n = [σ² 0 0 … 0; 0 σ² 0 … 0; 0 0 σ² … 0; …; 0 0 0 … σ²],
a scalar matrix: σ² times the identity matrix I.
Variance-Covariance Matrix in SLR
The variance of a random variable U is the variation between the individual data points in a sample from U and their mean, E(U):
Var(U) = σ²{U} = E[(U − E(U))²] = E(U²) − [E(U)]².
The covariance of two random variables, U and V, describes the degree to which they "travel together." When we draw large values from U, do we tend to also draw large values for V? Small values for V? Or is there no (linear) relationship?
Cov(U, V) = σ{U, V} = E[(U − E(U))(V − E(V))] = E(UV) − E(U)E(V).
Correlation is a standardized version of covariance:
Corr(U, V) = Cov(U, V) / [σ(U)σ(V)].
Variance-Covariance Matrix in SLR
For any set of random variables U1, U2, …, Un, their variance-covariance matrix (denoted by σ²{U} or Σ{U}) is defined to be
Σ{U} = [σ²{U1}    σ{U1,U2}  …  σ{U1,Un};
        σ{U2,U1}  σ²{U2}    …  σ{U2,Un};
        …
        σ{Un,U1}  σ{Un,U2}  …  σ²{Un}],
where σ²{Ui} is the variance of Ui, and σ{Ui, Uj} is the covariance of Ui and Uj.
Variance-Covariance Matrix in SLR
If variables are (completely) uncorrelated, their covariance is 0. The variance-covariance matrix of uncorrelated variables will therefore be a diagonal matrix, since all the covariances are 0. This is the case for SLR, shown on the next page:
Σ{U} = [σ²{U1} 0 … 0; 0 σ²{U2} … 0; …; 0 0 … σ²{Un}].
Variance-Covariance Matrix in SLR
In SLR, the observations are independent (cov = 0) and have constant variance σ², so
Σ{ε}n×n = Cov([ε1 ε2 … εn]′) = σ²In×n = [σ² 0 … 0; 0 σ² … 0; …; 0 0 … σ²].
A useful theorem for the multivariate Normal distribution
A useful theorem for the multivariate normal distribution is that a linear transformation of a multivariate normal vector also follows a multivariate normal distribution. Let U ~ N(µ, Σ) be a multivariate normal vector, and let V = c + DU be a linear transformation of U, where c is a vector of constants and D is a matrix of constants. Then V ~ N(c + Dµ, DΣD′).
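The theorem can be illustrated by simulation; µ, Σ, c, and D below are made up for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# U ~ N(mu, Sigma); by the theorem, V = c + DU ~ N(c + D mu, D Sigma D').
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
c = np.array([0.0, 3.0])
D = np.array([[1.0, 1.0],
              [0.0, 2.0]])

U = rng.multivariate_normal(mu, Sigma, size=200_000)
V = c + U @ D.T

# Empirical mean and covariance match the theorem, up to sampling error.
assert np.allclose(V.mean(axis=0), c + D @ mu, atol=0.05)
assert np.allclose(np.cov(V.T), D @ Sigma @ D.T, atol=0.1)
print(D @ Sigma @ D.T)  # theoretical covariance of V
```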
Remember: independent variables are always uncorrelated, but uncorrelated variables are not necessarily independent (correlation only measures linear dependence). A useful exception to the rule: if two variables are uncorrelated and they are also jointly normal, this does mean they are independent. This is one reason we care about the normality assumption of the linear regression model.
Distributional Assumptions on the Random Error in Matrix Form
On this page, we show the SLR assumptions in matrix form. The error vector satisfies
ε ~ N(0, σ²I),  i.e.,  Σ{ε}n×n = Cov([ε1 ε2 … εn]′) = σ²In×n.
In SLR, Y = Xβ + ε, where ε ~ N(0, σ²I) and Xβ is constant. Therefore, by the theorem on the previous page,
Y ~ N(Xβ, σ²I).
Least Squares Parameter Estimation
Now we show how to derive the formulas for the slope and intercept. The idea is to minimize the errors, Y − Xβ. The residuals are ε = Y − Xβ, and we want to minimize the sum of the squared residuals:
Σεᵢ² = ε′ε = [ε1 ε2 ⋯ εn][ε1 ε2 ⋯ εn]′ = (Y − Xβ)′(Y − Xβ).
Least Squares Parameter Estimation
To find the solution, set the derivative with respect to the vector β equal to a zero vector and solve:
d/dβ (ε′ε) = d/dβ [(Y − Xβ)′(Y − Xβ)] = −2X′(Y − Xβ).
(Partially) solving for β yields the so-called Normal Equations:
−2X′(Y − Xβ) = 0
X′Y = X′Xβ,
and hence the least squares solution for b0 and b1:
b = [b0 b1]′ = (X′X)⁻¹X′Y.
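The normal equations say that at the least squares solution, the residual vector is orthogonal to every column of X. A quick check on simulated data (made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated SLR data: intercept column plus one predictor, true (b0, b1) = (3, 2).
X = np.column_stack([np.ones(20), rng.normal(size=20)])
Y = 3.0 + 2.0 * X[:, 1] + rng.normal(size=20)

# Least squares fit; any solution must satisfy X'Y = X'Xb.
b, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Equivalently, X'(Y - Xb) = 0 up to floating-point error.
assert np.allclose(X.T @ (Y - X @ b), 0.0)
print(b)  # close to the true parameters (3, 2)
```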
Least Squares Parameter Estimation
We have done nothing new or magic: all we are doing is writing the same old formulas for b0 and b1 in matrix form. Do not worry if you cannot reproduce the following algebra, but you should try to follow it, so that you believe me that
b = [b0 b1]′ = (X′X)⁻¹X′Y
is not a new formula. It is just another way of writing the same old formulas,
b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² = SSXY / SSX
b0 = Ȳ − b1X̄ = Ȳ − X̄ · SSXY / SSX.
Now the formula in matrix form works for any arbitrary number of β's!
The details
b = (X′X)⁻¹X′Y = [b0 b1]′. The pieces of the matrix formula are:
X′X = [1 1 … 1; X1 X2 … Xn][1 X1; 1 X2; ⋮; 1 Xn] = [n ΣXi; ΣXi ΣXi²]
(X′X)⁻¹ = 1/(nΣXi² − (ΣXi)²) · [ΣXi² −ΣXi; −ΣXi n] = 1/(n·SSX) · [ΣXi² −ΣXi; −ΣXi n]
X′Y = [1 1 … 1; X1 X2 … Xn][Y1; Y2; ⋮; Yn] = [ΣYi; ΣXiYi]
Details are shown on the next two pages.
Plug these into the equation for b. Now let's look at the pieces of the matrix formula, for your reference; you are not required to know how to derive this.
b = (X′X)⁻¹X′Y
  = 1/(n·SSX) · [ΣXi²·ΣYi − ΣXi·ΣXiYi; −ΣXi·ΣYi + n·ΣXiYi]
  = 1/SSX · [ΣXi²·Ȳ − X̄·ΣXiYi; ΣXiYi − nX̄Ȳ]   (using ΣYi = nȲ and ΣXi = nX̄)
  = 1/SSX · [Ȳ·SSX − X̄·SSXY; SSXY]   (since ΣXiYi − nX̄Ȳ = SSXY, and ΣXi²Ȳ − X̄ΣXiYi = Ȳ(ΣXi² − nX̄²) − X̄(ΣXiYi − nX̄Ȳ))
  = [Ȳ − X̄·SSXY/SSX; SSXY/SSX]
  = [b0; b1].
Finally!
Example (Flavor deterioration), continued. Compute the estimators in matrix form.
Now back to the flavor deterioration example: we apply the matrix operations to estimate β. It takes some experience to compute the inverse matrix, but I suggest you try it for simple questions like this one to get some sense of it.
Y5×1 = [7.8 9 10.2 11 11.7]′,  X5×2 = [1 8; 1 4; 1 0; 1 −4; 1 −8],  b = (X′X)⁻¹X′Y.
Example (Flavor deterioration), continued.
b = (X′X)⁻¹X′Y = [9.94 −0.245]′.
In scalar form, Ŷ = 9.94 − 0.245X: the time before flavor deterioration is about a quarter of a week shorter for every one-degree increase in storage temperature.
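The hand computation can be reproduced in a few lines (solving the normal equations directly rather than forming the inverse):

```python
import numpy as np

# Flavor deterioration data: intercept column plus temperature.
X = np.column_stack([np.ones(5), [8, 4, 0, -4, -8]])
Y = np.array([7.8, 9.0, 10.2, 11.0, 11.7])

# b = (X'X)^{-1} X'Y, computed by solving X'X b = X'Y.
b = np.linalg.solve(X.T @ X, X.T @ Y)
print(b)  # ≈ [9.94, -0.245]
```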
Hat Matrix
Ŷ = Xb = X(X′X)⁻¹X′Y = HY, where H = X(X′X)⁻¹X′ is called the hat matrix because it turns Y's into Ŷ's. The hat matrix will give us many useful diagnostic tools later, in MLR. In the flavor example, H is 5×5. The matrix H is symmetric and has a special property called idempotence: multiplying H by itself always gives H back, HH = H.
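Both properties of H can be verified on the example's design matrix:

```python
import numpy as np

# Design matrix from the flavor deterioration example.
X = np.column_stack([np.ones(5), [8, 4, 0, -4, -8]])

# Hat matrix H = X (X'X)^{-1} X'
H = X @ np.linalg.inv(X.T @ X) @ X.T

assert np.allclose(H, H.T)            # symmetric: H' = H
assert np.allclose(H @ H, H)          # idempotent: HH = H
assert np.isclose(np.trace(H), 2.0)   # trace equals the number of parameters (p = 2)
```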
Predicted value, residual, SSE, and MSE
The hat matrix is used in computing SSE and MSE. The residual vector is
e = Y − Ŷ = (I − H)Y,
and since (I − H) is symmetric and idempotent,
SSE = e′e = Y′(I − H)Y = 0.148
MSE = SSE/(n − 2) = 0.148/3 = 0.0493.
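These numbers can be reproduced from the example data using the hat matrix:

```python
import numpy as np

X = np.column_stack([np.ones(5), [8, 4, 0, -4, -8]])
Y = np.array([7.8, 9.0, 10.2, 11.0, 11.7])
n = len(Y)

H = X @ np.linalg.inv(X.T @ X) @ X.T
e = (np.eye(n) - H) @ Y     # residual vector e = (I - H)Y
sse = e @ e                 # SSE = e'e = Y'(I - H)Y
mse = sse / (n - 2)

print(round(sse, 3), round(mse, 4))  # → 0.148 0.0493
```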
Estimated Covariance Matrix of b
The vector of parameter estimates, b, is a linear combination of the elements of Y. Therefore the estimates are (multivariate) normal if Y is (multivariate) normal, and they will be approximately normal in general (the central limit theorem again).
Estimated Covariance Matrix of b
Recall that b = (X′X)⁻¹X′Y and Y ~ N(Xβ, σ²I). Since b is a linear combination of Y,
E(b) = (X′X)⁻¹X′E(Y) = (X′X)⁻¹X′Xβ = β.
Therefore, b is an unbiased estimator of β.
Estimated Covariance Matrix of b
The covariance matrix of b is given by σ² times the inverse of X′X:
Σ{b} = Cov[(X′X)⁻¹X′Y] = (X′X)⁻¹X′ · σ²I · X[(X′X)⁻¹]′ = σ²(X′X)⁻¹.
In the last step, X′X and its inverse are both symmetric, therefore [(X′X)⁻¹]′ = (X′X)⁻¹.
σ²(X′X)⁻¹ generally is NOT a diagonal matrix, so the estimates b0 and b1 are generally not independent of each other. This is the basis for the Working-Hotelling confidence bands, which we used in a previous topic to estimate both parameters simultaneously.
Example (Flavor deterioration), continued. Compute the covariance matrix for b.
Σ̂{b} = s²{b} = MSE(X′X)⁻¹, with MSE = 0.0493. Here X′X = [5 0; 0 160], so
s²{b} = 0.0493 · [0.2 0; 0 0.00625] = [0.00987 0; 0 0.000308].
The diagonal elements are s²{b0} and s²{b1}, the squared standard errors of the estimates. Note that this matrix happens to be diagonal just in this case, because ΣXi = 0; the off-diagonal elements are generally not 0!
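The estimated covariance matrix can be computed end to end from the data:

```python
import numpy as np

X = np.column_stack([np.ones(5), [8, 4, 0, -4, -8]])
Y = np.array([7.8, 9.0, 10.2, 11.0, 11.7])

b = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ b
mse = (e @ e) / (len(Y) - 2)

# Estimated Sigma{b} = MSE (X'X)^{-1}.
cov_b = mse * np.linalg.inv(X.T @ X)
print(np.round(cov_b, 6))
# Diagonal: s^2{b0} ≈ 0.00987, s^2{b1} ≈ 0.000308;
# off-diagonals are 0 here only because sum(X) = 0 in this example.
```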
Mean response
Finally, to estimate the mean response at Xh, define the predictor in matrix form, Xh = [1 Xh]′. Then the point estimator is Ŷh = Xh′b, and its estimated variance is
s²{Ŷh} = Xh′ Σ̂{b} Xh,  where Σ̂{b} = MSE(X′X)⁻¹.
Example (Flavor deterioration): find the point estimator and standard error of Ŷh when Xh = −6.
Ŷh = 9.94 − 0.245(−6) = 11.41
s²{Ŷh} = [1 −6] s²{b} [1 −6]′ = 0.00987 + 36(0.000308) = 0.021.
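The mean-response computation at Xh = −6 checks out numerically:

```python
import numpy as np

X = np.column_stack([np.ones(5), [8, 4, 0, -4, -8]])
Y = np.array([7.8, 9.0, 10.2, 11.0, 11.7])

b = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ b
mse = (e @ e) / (len(Y) - 2)
cov_b = mse * np.linalg.inv(X.T @ X)   # estimated Sigma{b}

Xh = np.array([1.0, -6.0])   # predictor vector at X_h = -6
yhat_h = Xh @ b              # point estimate of the mean response
s2_yhat = Xh @ cov_b @ Xh    # s^2{Yhat_h} = Xh' Sigma{b} Xh

print(round(yhat_h, 2), round(s2_yhat, 3))  # → 11.41 0.021
```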
Prediction of new observation
For predicting a single new observation at Xh, the estimated variance s²{pred} in matrix form is
s²{pred} = MSE(1 + Xh′(X′X)⁻¹Xh),
which adds one extra MSE for the variability of the new observation around its mean response.
Now that we have covered the matrix form in SLR, we can use this framework to do multiple linear regression, where we have two or more explanatory variables: simply add another column to the design matrix and another β parameter.
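The prediction variance can be checked on the flavor data at the illustrative point Xh = −6 (this point is an assumption carried over from the mean-response example):

```python
import numpy as np

X = np.column_stack([np.ones(5), [8, 4, 0, -4, -8]])
Y = np.array([7.8, 9.0, 10.2, 11.0, 11.7])

b = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ b
mse = (e @ e) / (len(Y) - 2)

Xh = np.array([1.0, -6.0])  # a new observation at X_h = -6 (illustrative choice)
s2_pred = mse * (1.0 + Xh @ np.linalg.inv(X.T @ X) @ Xh)  # MSE (1 + Xh'(X'X)^{-1} Xh)

# The prediction variance exceeds the mean-response variance by exactly MSE.
s2_mean = Xh @ (mse * np.linalg.inv(X.T @ X)) @ Xh
assert np.isclose(s2_pred, s2_mean + mse)
print(round(s2_pred, 4))  # ≈ 0.0703
```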