BIOL 582 Lecture Set 19 Matrices, Matrix calculations, Linear models using linear algebra.


1 BIOL 582 Lecture Set 19 Matrices, Matrix calculations, Linear models using linear algebra

2 Matrix operations
Matrix algebra is a compact method of expressing mathematical operations (including statistics), and it makes linear models easier to compute.
Scalar: a number.
Vector: an ordered list (array) of scalars (n rows x 1 column).
Matrix: a rectangular array of scalars (n rows x p columns).
Nomenclature: elements (or variables) are italicized; matrices are bold. Lowercase = vector; capital = matrix. Many variants appear in the scientific literature.
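As a small illustration of the three kinds of object, here is a sketch in base R. The values and the names `a`, `v` and `A` are arbitrary and are not taken from the lecture.

```r
# A scalar, a vector, and a matrix in base R (arbitrary illustrative values).
a <- 3                                         # scalar: a single number
v <- matrix(c(2, 5, 1), nrow = 3, ncol = 1)    # vector: 3 rows x 1 column
A <- matrix(c(1, 2, 3,
              4, 5, 6),
            nrow = 2, ncol = 3, byrow = TRUE)  # matrix: 2 rows x 3 columns

dim(v)   # rows and columns of the vector
dim(A)   # rows and columns of the matrix
A[2, 1]  # a single element: row 2, column 1
```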

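3 Matrix operations: transpose
Reverse rows and columns: element $a_{ij}$ of $\mathbf{A}$ becomes element $a_{ji}$ of the transpose, so an n x p matrix becomes p x n.
Represented by $\mathbf{A}^t$ or $\mathbf{A}'$.
Vector transpose works identically.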
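A minimal sketch of the transpose in R, using the base function `t()`; the matrix is an arbitrary example.

```r
# Transpose swaps rows and columns.
A  <- matrix(1:6, nrow = 2, ncol = 3)  # 2 x 3, filled column by column
At <- t(A)                             # 3 x 2

dim(At)
At[3, 1] == A[1, 3]   # element (i, j) of A is element (j, i) of t(A)
all(t(At) == A)       # transposing twice returns the original matrix

# A column vector becomes a row vector.
v <- matrix(c(2, 5, 1), ncol = 1)      # 3 x 1
dim(t(v))                              # 1 x 3
```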

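4 Matrix operations: addition and subtraction
Matrices must have the same dimensions.
Add/subtract element-wise.
Addition: $(\mathbf{A} + \mathbf{B})_{ij} = a_{ij} + b_{ij}$
Subtraction: $(\mathbf{A} - \mathbf{B})_{ij} = a_{ij} - b_{ij}$
Vector addition/subtraction works identically.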
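The element-wise rule can be sketched in R as follows. The matrices are made up for illustration; the last lines show that R refuses to add matrices of different dimensions.

```r
# Element-wise addition and subtraction of two 2 x 2 matrices.
A <- matrix(c(1, 2, 3, 4), nrow = 2)
B <- matrix(c(10, 20, 30, 40), nrow = 2)

S <- A + B   # each element is a_ij + b_ij
D <- A - B   # each element is a_ij - b_ij

# Matrices with different dimensions cannot be added.
C <- matrix(1:6, nrow = 2, ncol = 3)
bad <- try(A + C, silent = TRUE)
inherits(bad, "try-error")   # TRUE: non-conformable arrays
```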

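5 Matrix operations: multiplication
Scalar multiplication: multiply the scalar by each element in the matrix or vector.
Matrix/vector multiplication is a summed multiplication: element (i, j) of the product is the sum of the products of row i of the first matrix and column j of the second.
For $\mathbf{AB}$ with dimensions $(n_1 \times p_1)(n_2 \times p_2)$:
Inner dimensions ($p_1$ and $n_2$) must agree or multiplication cannot take place.
Outer dimensions ($n_1$ and $p_2$) determine the size of the result.
Order of matrices makes a difference: in general $\mathbf{AB} \neq \mathbf{BA}$.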

6 Matrix operations
Scalar multiplication: $c\mathbf{A} = [c\,a_{ij}]$
Matrix multiplication: $(\mathbf{AB})_{ij} = \sum_{k} a_{ik} b_{kj}$
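A short R sketch of both kinds of multiplication. The matrices are arbitrary; in R, `*` with a scalar multiplies every element, while `%*%` is the matrix product.

```r
A <- matrix(c(1, 2,
              3, 4), nrow = 2, byrow = TRUE)
B <- matrix(c(0, 1,
              1, 0), nrow = 2, byrow = TRUE)

twoA <- 2 * A      # scalar multiplication: every element doubled

AB <- A %*% B      # (2 x 2)(2 x 2) -> 2 x 2
BA <- B %*% A
identical(AB, BA)  # FALSE: order matters

# Inner dimensions must agree.
C  <- matrix(1:6, nrow = 2, ncol = 3)   # 2 x 3
AC <- A %*% C                           # (2 x 2)(2 x 3) -> 2 x 3, allowed
bad <- try(C %*% A, silent = TRUE)      # (2 x 3)(2 x 2): inner 3 and 2 differ
inherits(bad, "try-error")              # TRUE: non-conformable arguments
```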

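7 Matrix operations
Inner (scalar) product: vector multiplication resulting in a scalar (a weighted linear combination).
Inner product: $\mathbf{a}^t\mathbf{b} = \sum_i a_i b_i$, with dimensions (1 x n)(n x 1) = (1 x 1)
Outer (matrix) product: vector multiplication resulting in a matrix.
Outer product: $\mathbf{a}\mathbf{b}^t$, with dimensions (n x 1)(1 x p) = (n x p)
Inner dimensions MUST AGREE!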
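The two products can be compared side by side in R. The vectors below are arbitrary illustrative values.

```r
a <- matrix(c(1, 2, 3), ncol = 1)   # 3 x 1
b <- matrix(c(4, 5, 6), ncol = 1)   # 3 x 1

inner <- t(a) %*% b    # (1 x 3)(3 x 1) -> 1 x 1: 1*4 + 2*5 + 3*6
outer_ab <- a %*% t(b) # (3 x 1)(1 x 3) -> 3 x 3: element (i, j) is a_i * b_j

dim(inner)
dim(outer_ab)
```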

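8 Special matrices
I: identity matrix (equivalent to ‘1’ for matrices): ones on the diagonal and zeros elsewhere, so $\mathbf{AI} = \mathbf{IA} = \mathbf{A}$
1: a matrix of all ones
0: a matrix of all zeros
Diagonal: only the diagonal contains non-zero elements
Square: n = p
Symmetric: off-diagonal elements are the same across the diagonal: $a_{ij} = a_{ji}$, i.e. $\mathbf{A} = \mathbf{A}^t$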
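Each of these special matrices can be built with base R functions. The sketch below uses arbitrary sizes and values.

```r
I3    <- diag(3)                  # 3 x 3 identity matrix
ones  <- matrix(1, nrow = 2, ncol = 3)   # matrix of all ones
zeros <- matrix(0, nrow = 2, ncol = 3)   # matrix of all zeros
D     <- diag(c(2, 5, 7))         # diagonal matrix with 2, 5, 7 on the diagonal
S     <- matrix(c(4, 1,
                  1, 3), nrow = 2, byrow = TRUE)   # symmetric: S equals t(S)

isSymmetric(S)                    # TRUE

# The identity matrix behaves like '1': multiplying by it changes nothing.
A <- matrix(1:9, nrow = 3)
all(A %*% I3 == A)
all(I3 %*% A == A)
```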

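9 Special matrices
Orthogonal: square matrix with the property $\mathbf{A}\mathbf{A}^t = \mathbf{A}^t\mathbf{A} = \mathbf{I}$, so that $\mathbf{A}^t = \mathbf{A}^{-1}$
VERY useful for statistics and other fields (e.g., morphometrics)
Orthonormal example: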
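One familiar orthogonal matrix is a 2 x 2 rotation matrix. The sketch below is not the example from the original slide (which did not survive in the transcript); the angle and the name `Q` are arbitrary choices for illustration.

```r
# A 2 x 2 rotation matrix is orthogonal: its columns have unit length
# and are perpendicular to each other.
theta <- pi / 6
Q <- matrix(c(cos(theta), -sin(theta),
              sin(theta),  cos(theta)), nrow = 2, byrow = TRUE)

isTRUE(all.equal(Q %*% t(Q), diag(2)))   # Q Q^t = I
isTRUE(all.equal(t(Q) %*% Q, diag(2)))   # Q^t Q = I
isTRUE(all.equal(t(Q), solve(Q)))        # the transpose is the inverse
```

`all.equal()` is used rather than `==` because the products are only equal to the identity up to floating-point rounding.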

10 Matrix operations: division
Cannot divide matrices, so calculate the inverse (reciprocal) of the denominator and multiply.
Inverses have the property that $\mathbf{A}\mathbf{A}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \mathbf{I}$.
Inverses are tedious to calculate, so in practice we use a computer.
Only works for square matrices whose determinant ≠ 0 (nonsingular matrices); a matrix whose determinant is 0 is singular and has no inverse.
Determinant: a combination of diagonal and off-diagonal elements.

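11 Matrix operations: inverse
For the 2 x 2 case: if $\mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, then $|\mathbf{A}| = ad - bc$ and $\mathbf{A}^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$
Example: Confirm: $\mathbf{A}\mathbf{A}^{-1} = \mathbf{I}$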
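A worked 2 x 2 case in R, under the assumption that the standard formula (swap the diagonal elements, negate the off-diagonal elements, divide by the determinant) is what the slide showed. The numbers are made up; the slide's own example is not in the transcript.

```r
A <- matrix(c(4, 7,
              2, 6), nrow = 2, byrow = TRUE)

det_A <- A[1, 1] * A[2, 2] - A[1, 2] * A[2, 1]   # ad - bc = 4*6 - 7*2 = 10

# Inverse by the 2 x 2 formula.
A_inv_formula <- (1 / det_A) * matrix(c( A[2, 2], -A[1, 2],
                                        -A[2, 1],  A[1, 1]),
                                      nrow = 2, byrow = TRUE)

# Inverse by computer: solve() with one argument returns the inverse.
A_inv <- solve(A)

isTRUE(all.equal(A_inv, A_inv_formula))   # both routes agree
isTRUE(all.equal(A %*% A_inv, diag(2)))   # confirm: A A^-1 = I

# A singular matrix (determinant 0) has no inverse.
S <- matrix(c(1, 2,
              2, 4), nrow = 2, byrow = TRUE)
bad <- try(solve(S), silent = TRUE)
inherits(bad, "try-error")
```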

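12 Linear model using matrix operations
The linear equation $y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_{k-1} x_{i,k-1} + e_i$
can be written in matrix form as $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e}$
where $\mathbf{y}$ is the (n x 1) vector of responses, $\mathbf{X}$ is the (n x k) design matrix (a column of ones plus the independent variables), $\boldsymbol{\beta}$ is the (k x 1) vector of coefficients, and $\mathbf{e}$ is the (n x 1) vector of residuals.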

13 Linear model using matrix operations
Why is it so simple? Consider just this part for a simple example of four subjects and two independent variables:
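The equation from this slide is not in the transcript. The block below is a reconstruction under two assumptions: that "this part" refers to the product of the design matrix and the coefficient vector, and that the design matrix carries a leading column of ones for the intercept. Each row of the product is the familiar linear equation for one subject.

```latex
\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}}:
\qquad
\begin{bmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \hat{y}_3 \\ \hat{y}_4 \end{bmatrix}
=
\begin{bmatrix}
1 & x_{11} & x_{12} \\
1 & x_{21} & x_{22} \\
1 & x_{31} & x_{32} \\
1 & x_{41} & x_{42}
\end{bmatrix}
\begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix}
=
\begin{bmatrix}
\hat{\beta}_0 + \hat{\beta}_1 x_{11} + \hat{\beta}_2 x_{12} \\
\hat{\beta}_0 + \hat{\beta}_1 x_{21} + \hat{\beta}_2 x_{22} \\
\hat{\beta}_0 + \hat{\beta}_1 x_{31} + \hat{\beta}_2 x_{32} \\
\hat{\beta}_0 + \hat{\beta}_1 x_{41} + \hat{\beta}_2 x_{42}
\end{bmatrix}
```

The dimensions are (4 x 3)(3 x 1) = (4 x 1): the inner dimensions agree and the result has one predicted value per subject.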

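14 Linear model using matrix operations
The linear model is: $\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}}$
The estimated coefficients (parameter estimates) are solved as: $\hat{\boldsymbol{\beta}} = (\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\mathbf{y}$
How/why? Try to solve for $\hat{\boldsymbol{\beta}}$.
Cannot divide both sides by $\mathbf{X}$.
Cannot multiply by the inverse of $\mathbf{X}$ unless $\mathbf{X}$ is square and nonsingular, and an (n x k) design matrix with n > k is not square.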

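15 Linear model using matrix operations
Making a square-symmetric matrix from $\mathbf{X}$: $\mathbf{X}^t\mathbf{X}$, with dimensions (k x n)(n x k) = (k x k)
This matrix can be inverted (provided the columns of $\mathbf{X}$ are not linearly dependent): $(\mathbf{X}^t\mathbf{X})^{-1}$
So, multiplying both sides of $\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}}$ by $\mathbf{X}^t$ will assist inverting the necessary part: $\mathbf{X}^t\hat{\mathbf{y}} = \mathbf{X}^t\mathbf{X}\hat{\boldsymbol{\beta}}$
Note the dimensions so far: (k x n)(n x 1) = (k x n)(n x k)(k x 1), so (k x 1) = (k x 1)
Now multiply both sides by the inverse above: $(\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\hat{\mathbf{y}} = (\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\mathbf{X}\hat{\boldsymbol{\beta}}$
Which has dimensions: (k x k)(k x n)(n x 1) = (k x k)(k x n)(n x k)(k x 1), so (k x 1) = (k x 1)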

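16 Linear model using matrix operations
The equation $(\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\hat{\mathbf{y}} = (\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\mathbf{X}\hat{\boldsymbol{\beta}}$
simplifies to $(\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\hat{\mathbf{y}} = \hat{\boldsymbol{\beta}}$, because $(\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\mathbf{X} = \mathbf{I}$,
and the dimensions of each side remain (k x 1).
One problem is that the predicted values of the response are unknown without knowing the parameter estimates. However, the best estimates of the response values are the observed values themselves, so the equation is written as $\hat{\boldsymbol{\beta}} = (\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\mathbf{y}$
What this means is that one does not have to calculate SS for x and y and solve each coefficient independently!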

17 Example in R using Snake data
Done for a simple linear model of head size as a function of log SVL

    > snake<-read.csv("snake.data.csv")
    > attach(snake)
    > # number of responses
    > n<-length(HS)
    > X<-matrix(c(rep(1,n),log(SVL)), nrow=n, ncol=2)
    > X[1:10,]
          [,1]     [,2]
     [1,]    1 3.532226
     [2,]    1 4.062166
     [3,]    1 4.075841
     [4,]    1 4.359270
     [5,]    1 4.387014
     [6,]    1 4.432007
     [7,]    1 4.437934
     [8,]    1 4.443827
     [9,]    1 4.480740
    [10,]    1 4.488636
    > dim(X)
    [1] 40  2

18 Example in R using Snake data
Done for a simple linear model of head size as a function of log SVL

    > y<-matrix(HS, nrow=n, ncol=1)
    > y[1:10,];dim(y)
           [,1]
     [1,] 11.40
     [2,] 15.30
     [3,]  7.16
     [4,] 10.50
     [5,] 10.30
     [6,]  8.25
     [7,]  9.74
     [8,] 13.10
     [9,] 15.10
    [10,] 14.10
    [1] 40  1

19 Example in R using Snake data
Done for a simple linear model of head size as a function of log SVL

    > B<-solve(t(X)%*%X)%*%t(X)%*%y
    > B
               [,1]
    [1,] -14.377543
    [2,]   5.695249
    >
    > # compare to canned function
    > lm.snake<-lm(HS~log(SVL),x=T)
    > lm.snake

    Call:
    lm(formula = HS ~ log(SVL), x = T)

    Coefficients:
    (Intercept)     log(SVL)
        -14.378        5.695
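The explicit inverse mirrors the algebra, but it is not the only way to get the same coefficients. The sketch below compares it with two alternatives in base R: solving the normal equations with `crossprod()`, and a QR-based least-squares solve with `qr.solve()`. The snake data file is not available here, so the data are simulated stand-ins; the seed, the coefficients used for simulation and the names `B.textbook`, `B.crossprod`, `B.qr` and `B.lm` are all assumptions for illustration.

```r
# Simulated stand-in data (NOT the snake data).
set.seed(582)
n <- 40
x <- runif(n, min = 3, max = 5)            # plays the role of log(SVL)
y <- -14 + 5.7 * x + rnorm(n, sd = 2.5)    # plays the role of HS
X <- cbind(1, x)                           # design matrix: intercept + x

# 1. Textbook formula, as in the lecture.
B.textbook <- solve(t(X) %*% X) %*% t(X) %*% y

# 2. Same normal equations, without forming the inverse explicitly.
#    crossprod(X) is t(X) %*% X; crossprod(X, y) is t(X) %*% y.
B.crossprod <- solve(crossprod(X), crossprod(X, y))

# 3. Least squares via a QR decomposition of X.
B.qr <- qr.solve(X, y)

# 4. The canned function.
B.lm <- coef(lm(y ~ x))

cbind(textbook = as.numeric(B.textbook),
      crossprod = as.numeric(B.crossprod),
      qr = as.numeric(B.qr),
      lm = unname(B.lm))
```

All four columns should agree to rounding error. The explicit inverse is the clearest for teaching; the other two routes avoid inverting a matrix and tend to behave better when the columns of X are nearly collinear.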

20 Example in R using Snake data
Done for a simple linear model of head size as a function of log SVL

    > # Predictions (fitted values)
    > y.hat<-X%*%B
    > y.hat[1:7,]
    [1]  5.739363  8.757504  8.835389 10.449585 10.607597 10.863840 10.897599
    >
    > # Residuals
    > e<-y-y.hat
    > e[1:7,]
    [1]  5.66063702  6.54249646 -1.67538851  0.05041518 -0.30759682 -2.61383971 -1.15759944
    >
    > # Compare to
    > predict(lm.snake)[1:7]
            1         2         3         4         5         6         7
     5.739363  8.757504  8.835389 10.449585 10.607597 10.863840 10.897599
    > resid(lm.snake)[1:7]
              1           2           3           4           5           6           7
     5.66063702  6.54249646 -1.67538851  0.05041518 -0.30759682 -2.61383971 -1.15759944
    >

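21 Analysis of variance using matrix operations
After solving $\hat{\boldsymbol{\beta}} = (\mathbf{X}^t\mathbf{X})^{-1}\mathbf{X}^t\mathbf{y}$, how does one determine if any or all coefficients are significant?
Do the same thing for a reduced model and compare SSE.
First, how does one find SSE?
First: $\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}}$
Then: $\mathbf{e} = \mathbf{y} - \hat{\mathbf{y}}$
Thus: $SSE = \mathbf{e}^t\mathbf{e}$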

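22 Analysis of variance using matrix operations
How is $SSE = \mathbf{e}^t\mathbf{e}$? The inner product of the residual vector with itself is the sum of the squared residuals: $\mathbf{e}^t\mathbf{e} = \sum_i e_i^2$. Using the snake example…

    > SSE.f<-t(e)%*%e
    > SSE.f
             [,1]
    [1,] 236.9725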
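The equivalence between the inner product and the sum of squares can be checked on a tiny made-up residual vector (the values below are not from the snake data):

```r
e <- matrix(c(1.5, -0.5, 2, -3), ncol = 1)   # four made-up residuals

SSE.matrix <- t(e) %*% e   # (1 x 4)(4 x 1) -> a 1 x 1 matrix
SSE.sum    <- sum(e^2)     # 2.25 + 0.25 + 4 + 9

SSE.matrix
SSE.sum
```

The matrix form returns a 1 x 1 matrix rather than a plain number, which is why the results in these slides print with `[,1]` and `[1,]` labels.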

23 Analysis of variance using matrix operations
ANOVA step by step for the snake data

    > # ANOVA by hand, with matrix operations
    >
    > X.f<-matrix(c(rep(1,n),log(SVL)),nrow=n,ncol=2)
    > X.r<-matrix(rep(1,n),nrow=n,ncol=1)
    > y<-matrix(HS, nrow=n, ncol=1)
    > B.f<-solve(t(X.f)%*%X.f)%*%t(X.f)%*%y
    > B.r<-solve(t(X.r)%*%X.r)%*%t(X.r)%*%y
    > e.f<-y-X.f%*%B.f
    > e.r<-y-X.r%*%B.r
    > SSE.f<-t(e.f)%*%e.f
    > SSE.r<-t(e.r)%*%e.r
    >
    > SSE.f
             [,1]
    [1,] 236.9725
    > SSE.r
             [,1]
    [1,] 525.9513
    > k.f<-ncol(X.f);k.r<-ncol(X.r)
    > F.snake<-((SSE.r-SSE.f)/(k.f-k.r))/(SSE.f/(n-k.f))
    > F.snake
             [,1]
    [1,] 46.33955
    > P.value<-1-pf(F.snake,(k.f-k.r),(n-k.f))
    > P.value
                 [,1]
    [1,] 4.487631e-08
    > R2<-(SSE.r-SSE.f)/(SSE.r) # only because X.r includes only an intercept
    > R2
              [,1]
    [1,] 0.5494403
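The steps above can be wrapped into one small function. This is a sketch, not part of the lecture: `compare_models` is a hypothetical helper name, and the data are simulated stand-ins because the snake data file is not available here. The result is checked against `anova()` on the corresponding `lm` fits.

```r
# Hypothetical helper: F-test comparing a reduced and a full design matrix.
compare_models <- function(X.r, X.f, y) {
  sse <- function(X) {
    B <- solve(crossprod(X), crossprod(X, y))  # (X^t X)^-1 X^t y
    e <- y - X %*% B                           # residuals
    sum(e^2)                                   # e^t e
  }
  n <- nrow(X.f); k.f <- ncol(X.f); k.r <- ncol(X.r)
  SSE.f <- sse(X.f); SSE.r <- sse(X.r)
  F.stat <- ((SSE.r - SSE.f) / (k.f - k.r)) / (SSE.f / (n - k.f))
  P.value <- pf(F.stat, k.f - k.r, n - k.f, lower.tail = FALSE)
  list(SSE.r = SSE.r, SSE.f = SSE.f, F.stat = F.stat, P.value = P.value)
}

# Simulated stand-in data (NOT the snake data).
set.seed(1)
n <- 40
x <- runif(n, min = 3, max = 5)
y <- -14 + 5.7 * x + rnorm(n, sd = 2.5)

X.f <- cbind(1, x)                     # full model: intercept + slope
X.r <- matrix(1, nrow = n, ncol = 1)   # reduced model: intercept only

res <- compare_models(X.r, X.f, y)
tab <- anova(lm(y ~ 1), lm(y ~ x))     # same comparison with canned functions

c(by.hand = res$F.stat, anova = tab$F[2])
```

`pf(..., lower.tail = FALSE)` returns the same upper-tail probability as `1 - pf(...)` but keeps more precision when the P-value is very small.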

24 Analysis of variance using matrix operations
ANOVA for the snake data, this time relying on lm functions

    > # ANOVA first using lm, then matrix operations
    >
    > lm.f<-lm(HS~log(SVL),x=T)
    > lm.r<-lm(HS~1,x=T)
    > e.f<-resid(lm.f)
    > e.r<-resid(lm.r)
    > SSE.f<-t(e.f)%*%e.f
    > SSE.r<-t(e.r)%*%e.r
    >
    > SSE.f
             [,1]
    [1,] 236.9725
    > SSE.r
             [,1]
    [1,] 525.9513
    >
    > k.f<-ncol(X.f);k.r<-ncol(X.r)
    > F.snake<-((SSE.r-SSE.f)/(k.f-k.r))/(SSE.f/(n-k.f))
    > F.snake
             [,1]
    [1,] 46.33955
    > P.value<-1-pf(F.snake,(k.f-k.r),(n-k.f))
    > P.value
                 [,1]
    [1,] 4.487631e-08
    >
    > R2<-(SSE.r-SSE.f)/(SSE.r)
    > R2
              [,1]
    [1,] 0.5494403

25 Analysis of variance using matrix operations
ANOVA for the snake data, what R does should be clear now

    > # ANOVA via model comparison method
    >
    > lm.f<-lm(HS~log(SVL),x=T)
    > lm.r<-lm(HS~1,x=T)
    >
    > anova(lm.r,lm.f)
    Analysis of Variance Table

    Model 1: HS ~ 1
    Model 2: HS ~ log(SVL)
      Res.Df    RSS Df Sum of Sq     F    Pr(>F)
    1     39 525.95
    2     38 236.97  1    288.98 46.34 4.488e-08 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    >

26 Analysis of variance using matrix operations
ANOVA for the snake data, what R does should be clear now

    > # or just a model summary
    > summary(lm.f)

    Call:
    lm(formula = HS ~ log(SVL), x = T)

    Residuals:
        Min      1Q  Median      3Q     Max
    -4.4953 -1.6932 -0.3986  1.1925  6.5425

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) -14.3775     3.5088  -4.098 0.000211 ***
    log(SVL)      5.6952     0.8366   6.807 4.49e-08 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 2.497 on 38 degrees of freedom
    Multiple R-squared: 0.5494, Adjusted R-squared: 0.5376
    F-statistic: 46.34 on 1 and 38 DF, p-value: 4.488e-08

27 Analysis of variance using matrix operations
It is worth looking at design matrices…

    > X.f
          [,1]     [,2]
     [1,]    1 3.532226
     [2,]    1 4.062166
     [3,]    1 4.075841
     [4,]    1 4.359270
     [5,]    1 4.387014
     [6,]    1 4.432007
     [7,]    1 4.437934
     [8,]    1 4.443827
     [9,]    1 4.480740
    [10,]    1 4.488636
    [11,]    1 4.509760
    [12,]    1 4.514151
    [13,]    1 3.918005
    [14,]    1 4.146304
    [15,]    1 4.309456
    [16,]    1 4.423648
    [17,]    1 4.479607
    [18,]    1 4.490881
    [19,]    1 4.499810
    [20,]    1 4.567468
    [21,]    1 4.603168
    [22,]    1 4.614130
    [23,]    1 4.668145
    [24,]    1 4.700480
    [25,]    1 4.720283
    [26,]    1 3.303217
    [27,]    1 3.411148
    [28,]    1 3.540959
    [29,]    1 4.326778
    [30,]    1 4.398146

    > lm.f$x
       (Intercept) log(SVL)
    1            1 3.532226
    2            1 4.062166
    3            1 4.075841
    4            1 4.359270
    5            1 4.387014
    6            1 4.432007
    7            1 4.437934
    8            1 4.443827
    9            1 4.480740
    10           1 4.488636
    11           1 4.509760
    12           1 4.514151
    13           1 3.918005
    14           1 4.146304
    15           1 4.309456
    16           1 4.423648
    17           1 4.479607
    18           1 4.490881
    19           1 4.499810
    20           1 4.567468
    21           1 4.603168
    22           1 4.614130
    23           1 4.668145
    24           1 4.700480
    25           1 4.720283
    26           1 3.303217
    27           1 3.411148
    28           1 3.540959
    29           1 4.326778
    30           1 4.398146

    > X.r
          [,1]
     [1,]    1
     [2,]    1
     [3,]    1
     [4,]    1
     [5,]    1
     [6,]    1
     [7,]    1
     [8,]    1
     [9,]    1
    [10,]    1
    [11,]    1
    [12,]    1
    [13,]    1
    [14,]    1
    [15,]    1
    [16,]    1
    [17,]    1
    [18,]    1
    [19,]    1
    [20,]    1
    [21,]    1
    [22,]    1
    [23,]    1
    [24,]    1
    [25,]    1
    [26,]    1
    [27,]    1
    [28,]    1
    [29,]    1
    [30,]    1

    > lm.r$x
       (Intercept)
    1            1
    2            1
    3            1
    4            1
    5            1
    6            1
    7            1
    8            1
    9            1
    10           1
    11           1
    12           1
    13           1
    14           1
    15           1
    16           1
    17           1
    18           1
    19           1
    20           1
    21           1
    22           1
    23           1
    24           1
    25           1
    26           1
    27           1
    28           1
    29           1
    30           1

28 Analysis of variance using matrix operations
Now for an example for a single factor ANOVA

    > # Single factor Anova example, relying more so on lm commands
    > lm.f<-lm(HS~Sex,x=T)
    > lm.f$x
       (Intercept) SexM
    1            1    0
    2            1    0
    3            1    0
    4            1    0
    5            1    0
    6            1    0
    7            1    0
    8            1    0
    9            1    0
    10           1    0
    11           1    0
    12           1    0
    13           1    1
    14           1    1
    15           1    1
    16           1    1
    17           1    1
    18           1    1
    19           1    1
    20           1    1
    21           1    1
    22           1    1
    23           1    1
    24           1    1
    25           1    1
    26           1    0
    27           1    0
    28           1    0
    29           1    0
    30           1    0

    > lm.f<-lm(HS~Sex,x=T)
    > X.f<-lm.f$x
    > lm.r<-lm(HS~1,x=T)
    > X.r<-lm.r$x
    > y<-HS
    > B.f<-solve(t(X.f)%*%X.f)%*%t(X.f)%*%y
    > B.r<-solve(t(X.r)%*%X.r)%*%t(X.r)%*%y
    > e.f<-y-X.f%*%B.f
    > e.r<-y-X.r%*%B.r
    > SSE.f<-t(e.f)%*%e.f
    > SSE.r<-t(e.r)%*%e.r
    >
    > SSE.f
             [,1]
    [1,] 522.5028
    > SSE.r
             [,1]
    [1,] 525.9513
    > k.f<-ncol(X.f);k.r<-ncol(X.r)
    > F.snake<-((SSE.r-SSE.f)/(k.f-k.r))/(SSE.f/(n-k.f))
    > F.snake
              [,1]
    [1,] 0.2508023
    > P.value<-1-pf(F.snake,(k.f-k.r),(n-k.f))
    > P.value
              [,1]
    [1,] 0.6193992
    >
    > R2<-(SSE.r-SSE.f)/(SSE.r)
    > R2
                [,1]
    [1,] 0.006556786
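The `SexM` column is a 0/1 dummy variable that R builds from the factor. A minimal sketch of how that works, using `model.matrix()` on a tiny made-up data set (the six values below are invented for illustration and are not the snake data):

```r
# Made-up data: six individuals, two sexes.
Sex <- factor(c("F", "F", "M", "M", "F", "M"))
HS  <- c(9.1, 10.4, 8.7, 9.9, 11.2, 8.3)

# model.matrix() builds the design matrix that lm() would use:
# a column of ones plus a dummy column that is 1 for males, 0 for females.
X <- model.matrix(~ Sex)
X

# Same matrix solution as before.
B <- solve(crossprod(X), crossprod(X, HS))
B

# With this default (treatment) coding, the intercept is the mean of the
# first level (F) and the SexM coefficient is the male mean minus the
# female mean.
mean(HS[Sex == "F"])
mean(HS[Sex == "M"]) - mean(HS[Sex == "F"])
```

Read this way, a single-factor ANOVA is the same linear model as the regression above, only with a dummy-coded column in place of a continuous covariate.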

29 Analysis of variance using matrix operations
Now for an example for a single factor ANOVA

    > summary(lm.f)

    Call:
    lm(formula = HS ~ Sex, x = T)

    Residuals:
       Min     1Q Median     3Q    Max
    -5.911 -2.356  0.069  3.337  5.619

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   9.6811     0.8740  11.077 1.85e-13 ***
    SexM         -0.5902     1.1785  -0.501    0.619
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 3.708 on 38 degrees of freedom
    Multiple R-squared: 0.006557, Adjusted R-squared: -0.01959
    F-statistic: 0.2508 on 1 and 38 DF, p-value: 0.6194

    > B.f
                     [,1]
    (Intercept)  9.681111
    SexM        -0.590202
    >

