R for Macroecology: Tests and models

Homework
Solutions to the color assignment problem?

Statistical tests in R
This is why we are using R (and not C++)!
  t.test()
  aov()
  lm()
  glm()
  And many more

Read the documentation!
With statistical tests, it's particularly important to read and understand the documentation of each function you use.
These functions may do complicated things with their options, and you want to make sure they do what you want.
Default behaviors can change depending on the input (e.g., with sample size).

Returns from statistical tests
Statistical tests are functions, so they return objects
> x = 1:10
> y = 3:12
> t.test(x,y)
  Welch Two Sample t-test
  data:  x and y
  t = , df = 18, p-value =
  alternative hypothesis: true difference in means is not equal to 0
  95 percent confidence interval:
  sample estimates:
  mean of x mean of y

Returns from statistical tests
Statistical tests are functions, so they return objects
> x = 1:10
> y = 3:12
> test = t.test(x,y)
> str(test)
  List of 9
   $ statistic  : Named num
    ..- attr(*, "names")= chr "t"
   $ parameter  : Named num
    ..- attr(*, "names")= chr "df"
   $ p.value    : num
   $ conf.int   : atomic [1:2]
    ..- attr(*, "conf.level")= num 0.95
   $ estimate   : Named num [1:2]
    ..- attr(*, "names")= chr [1:2] "mean of x" "mean of y"
   $ null.value : Named num 0
    ..- attr(*, "names")= chr "difference in means"
   $ alternative: chr "two.sided"
   $ method     : chr "Welch Two Sample t-test"
   $ data.name  : chr "x and y"
   - attr(*, "class")= chr "htest"
t.test() returns a list

Returns from statistical tests
Getting the results out
This hopefully looks familiar after last week’s debugging
> x = 1:10
> y = 3:12
> test = t.test(x,y)
> test$p.value
  [1]
> test$conf.int[2]
  [1]
> test[[3]]
  [1]
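The extraction steps above can be collected into a short script. This is a minimal sketch using the same x and y as the slides; the names p, ci and means are introduced here only for illustration. Accessing elements by name is generally easier to read than by position (as in test[[3]]), and it does not depend on the order of the list.

```r
# Same data as in the slides
x <- 1:10
y <- 3:12

# t.test() returns a list of class "htest"
test <- t.test(x, y)

# Pull individual results out by name (illustrative variable names)
p <- test$p.value       # two-sided p-value
ci <- test$conf.int     # lower and upper bound of the 95% interval
means <- test$estimate  # the two sample means

print(names(test))      # which elements are available
print(p)
print(ci)
print(means)
```

Running names(test) first is a quick way to see what a test object offers before digging into it with str().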

Model specification
Models in R use a common syntax:
  Y ~ X1 + X2 + X3 + ... + Xi
This means Y is a linear function of X1 through Xi.

Linear models
Basic linear models are fit with lm()
Again, lm() returns a list
> x = 1:10
> y = 3:12
> test = lm(y ~ x)
> test
  Call:
  lm(formula = y ~ x)
  Coefficients:
  (Intercept)            x
            2            1

Linear models
summary() is helpful for looking at a model
Here y is an exact linear function of x, so the residuals (around 1e-16) and standard errors are only floating-point noise, and the t values are correspondingly huge (around 1e+15 and 1e+16).
> summary(test)
  Call:
  lm(formula = y ~ x)
  Residuals:
  Min 1Q Median 3Q Max
  Coefficients:
               Estimate  Std. Error  t value  Pr(>|t|)
  (Intercept) 2.000e+00                        <2e-16 ***
  x           1.000e+00                        <2e-16 ***
  ---
  Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
  Residual standard error: 5.883e-16 on 8 degrees of freedom
  Multiple R-squared: 1, Adjusted R-squared: 1
  F-statistic: 2.384e+32 on 1 and 8 DF, p-value: < 2.2e-16

Extracting coefficients, P values
For the list (e.g. “test”) returned by lm(), test$coefficients gives the coefficient estimates, but not their standard errors or p-values.
For the full table (estimate, standard error, t value, p-value), use summary(test)$coefficients
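As a minimal sketch of that extraction: the example below adds some random noise to the slides' y = x + 2 relationship (the noise, the seed and the names fit, coefTable, slopeSE and slopeP are assumptions made here so that the standard errors are meaningful). summary(fit)$coefficients is a matrix, so individual entries can be picked out by row and column name.

```r
set.seed(1)                       # assumed seed, only for reproducibility
x <- 1:10
y <- 2 + x + rnorm(10, sd = 0.5)  # noisy version of the slides' y = x + 2

fit <- lm(y ~ x)

# Estimates only
print(fit$coefficients)

# Full coefficient table: a matrix with one row per term
coefTable <- summary(fit)$coefficients
print(coefTable)

# Index by row name (term) and column name (statistic)
slopeSE <- coefTable["x", "Std. Error"]
slopeP  <- coefTable["x", "Pr(>|t|)"]
print(slopeSE)
print(slopeP)
```

Indexing by name rather than by column number keeps the code readable and avoids mistakes if the table layout differs between model types (for example, glm() reports z values for some families).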

Model specification - interactions
Interactions are specified with a * or a :
  X1 * X2 means X1 + X2 + X1:X2
  (X1 + X2 + X3)^2 means each term and all second-order interactions
  - removes terms
  Constants (intercepts) are included by default, but can be removed with “-1”
  More help is available using ?formula
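One way to check how R expands a formula, without fitting anything, is to ask terms() for its term labels. The sketch below does this for the operators listed above; the formula names f1 to f4 and the variables in them are placeholders and do not need to exist as data.

```r
# terms() parses a formula symbolically, so no data are needed
f1 <- y ~ x1 * x2              # main effects plus interaction
f2 <- y ~ (x1 + x2 + x3)^2     # all terms and all second-order interactions
f3 <- y ~ x1 * x2 - x1:x2      # "-" removes a term again
f4 <- y ~ x1 + x2 - 1          # "-1" removes the intercept

labels1 <- attr(terms(f1), "term.labels")
labels2 <- attr(terms(f2), "term.labels")
labels3 <- attr(terms(f3), "term.labels")
hasIntercept4 <- attr(terms(f4), "intercept")

print(labels1)        # "x1" "x2" "x1:x2"
print(labels2)        # "x1" "x2" "x3" "x1:x2" "x1:x3" "x2:x3"
print(labels3)        # "x1" "x2"
print(hasIntercept4)  # 0 means no intercept, 1 means intercept included
```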

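Quadratic terms
Because ^2 means something specific in the context of a model formula (crossing terms, as in (X1 + X2)^2), x^2 on its own is reduced to just x. If you want to square one of your predictors, wrap it in I():
> x = 1:10
> y = 3:12
> test = lm(y ~ x + x^2)
> test$coefficients
  (Intercept)           x
            2           1
> test = lm(y ~ x + I(x^2))
> test$coefficients
  (Intercept)           x      I(x^2)
Here the I(x^2) coefficient is essentially zero (on the order of 1e-17), because y is exactly linear in x.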
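The difference is easier to see with a response that really is curved. The sketch below uses a made-up quadratic relationship (the coefficients 3, 2 and 0.5 and the names wrong and right are assumptions for illustration) and compares the two formulas.

```r
x <- 1:10
y <- 3 + 2 * x + 0.5 * x^2   # hypothetical response with real curvature

# Without I(), x^2 is treated as a formula operator and collapses to x
wrong <- lm(y ~ x + x^2)

# With I(), x^2 is evaluated arithmetically and enters as its own predictor
right <- lm(y ~ x + I(x^2))

print(names(coef(wrong)))  # "(Intercept)" "x"
print(names(coef(right)))  # "(Intercept)" "x" "I(x^2)"
print(coef(right))         # close to 3, 2 and 0.5
```

Note that the first call gives no warning, so a missing I() is easy to overlook; checking names(coef(...)) is a quick safeguard.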

A break to try things out
  t test
  anova
  linear models

Plotting a test object
plot(t.test(x,y)) does nothing useful
plot(lm(y~x)) plots diagnostic graphs

Forward and backward selection
Uses the step() function
  object: starting model
  scope: specifies the range of models to consider
  direction: "backward", "forward" or "both"
  trace: print progress to the screen?
  steps: set a maximum number of steps
  k: penalty per parameter added (k = 2 means AIC)

> x1 = runif(100)
> x2 = runif(100)
> x3 = runif(100)
> x4 = runif(100)
> x5 = runif(100)
> x6 = runif(100)
> y = x1+x2+x3+runif(100)
> model = step(lm(y~1),scope = y~x1+x2+x3+x4+x5+x6,direction = "both",trace = FALSE)
> summary(model)
  Call:
  lm(formula = y ~ x2 + x3 + x1)
  Residuals:
  Min 1Q Median 3Q Max
  Coefficients:
               Estimate  Std. Error  t value  Pr(>|t|)
  (Intercept)                                     e-06 ***
  x2                                              e-15 ***
  x3                                           < 2e-16 ***
  x1                                              e-15 ***
  ---
  Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
  Residual standard error:  on 96 degrees of freedom
  Multiple R-squared: , Adjusted R-squared:
  F-statistic:  on 3 and 96 DF, p-value: < 2.2e-16
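Because the predictors are random, the example above gives different numbers on every run. A reproducible variant might look like the sketch below; the seed, the data frame dat and the names null and selected are assumptions added here, while the step() call itself follows the slide.

```r
set.seed(42)   # assumed seed so the run is repeatable
n <- 100

# Six uniform predictors; only x1, x2 and x3 affect y
dat <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n),
                  x4 = runif(n), x5 = runif(n), x6 = runif(n))
dat$y <- dat$x1 + dat$x2 + dat$x3 + runif(n)

# Start from the intercept-only model and let step() add or drop terms
null <- lm(y ~ 1, data = dat)
model <- step(null,
              scope = y ~ x1 + x2 + x3 + x4 + x5 + x6,
              direction = "both",
              trace = FALSE)

selected <- names(coef(model))
print(selected)
print(AIC(model))
```

With the default k = 2 the selection is based on AIC; passing k = log(n) instead applies the heavier BIC-style penalty, which tends to keep fewer of the noise variables.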

All subsets selection
Uses the leaps() function from the leaps package
leaps(x=, y=, wt=rep(1, NROW(x)), int=TRUE, method=c("Cp", "adjr2", "r2"), nbest=10, names=NULL, df=NROW(x), strictly.compatible=TRUE)
  x: a matrix of predictors
  y: a vector of the response
  method: how to compare models (Mallows' Cp, adjusted R^2, or R^2)
  nbest: number of models of each size to return

> library(leaps)
> Xmat = cbind(x1,x2,x3,x4,x5,x6)
> leaps(x = Xmat, y, method = "Cp", nbest = 2)
  $which
        1     2     3     4     5     6
  1 FALSE  TRUE FALSE FALSE FALSE FALSE
  1  TRUE FALSE FALSE FALSE FALSE FALSE
  2  TRUE FALSE  TRUE FALSE FALSE FALSE
  2 FALSE  TRUE  TRUE FALSE FALSE FALSE
  3  TRUE  TRUE  TRUE FALSE FALSE FALSE
  3 FALSE  TRUE  TRUE  TRUE FALSE FALSE
  4  TRUE  TRUE  TRUE  TRUE FALSE FALSE
  4  TRUE  TRUE  TRUE FALSE  TRUE FALSE
  5  TRUE  TRUE  TRUE  TRUE  TRUE FALSE
  5  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
  6  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
  $label
  [1] "(Intercept)" "1" "2" "3" "4" "5" "6"
  $size
  [1]
  $Cp
  [1]
  [9]
> leapOut = leaps(x = Xmat, y, method = "Cp", nbest = 2)

> Xmat = cbind(x1,x2,x3,x4,x5,x6)
> leapOut = leaps(x = Xmat, y, method = "Cp", nbest = 2)
> # Refit each model chosen by leaps() with lm() and record its AIC
> aicVals = NULL
> for(i in 1:nrow(leapOut$which))
+ {
+   model = as.formula(paste("y~",paste(c("x1","x2","x3","x4","x5","x6")[leapOut$which[i,]],collapse = "+")))
+   test = lm(model)
+   aicVals[i] = AIC(test)
+ }
> aicVals
  [1]
  [8]
> # How the formula is built, using row 8 as an example
> i = 8
> leapOut$which[i,]
      1     2     3     4     5     6
   TRUE  TRUE  TRUE FALSE  TRUE FALSE
> c("x1","x2","x3","x4","x5","x6")[leapOut$which[i,]]
  [1] "x1" "x2" "x3" "x5"
> paste("y~",paste(c("x1","x2","x3","x4","x5","x6")[leapOut$which[i,]],collapse = "+"))
  [1] "y~ x1+x2+x3+x5"

Comparing AIC of the best models
> data.frame(leapOut$which,aicVals)
        X1    X2    X3    X4    X5    X6 aicVals
  1  FALSE  TRUE FALSE FALSE FALSE FALSE
  2   TRUE FALSE FALSE FALSE FALSE FALSE
  3   TRUE FALSE  TRUE FALSE FALSE FALSE
  4  FALSE  TRUE  TRUE FALSE FALSE FALSE
  5   TRUE  TRUE  TRUE FALSE FALSE FALSE
  6  FALSE  TRUE  TRUE  TRUE FALSE FALSE
  7   TRUE  TRUE  TRUE  TRUE FALSE FALSE
  8   TRUE  TRUE  TRUE FALSE  TRUE FALSE
  9   TRUE  TRUE  TRUE  TRUE  TRUE FALSE
  10  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
  11  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
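For comparison, the same kind of AIC table can be built in base R alone by enumerating every subset of predictors with combn(). This is a different technique from leaps(), which returns only the nbest models of each size; it is a sketch that is practical only for a handful of predictors, and the seed, the data frame dat and the names subsets, rhs and results are assumptions introduced here.

```r
set.seed(42)   # assumed seed so the run is repeatable
n <- 100
predictors <- paste0("x", 1:6)

# Simulated data in the style of the slides: only x1, x2, x3 matter
dat <- as.data.frame(matrix(runif(n * 6), ncol = 6,
                            dimnames = list(NULL, predictors)))
dat$y <- dat$x1 + dat$x2 + dat$x3 + runif(n)

# Every non-empty subset of the predictors (2^6 - 1 = 63 of them)
subsets <- unlist(
  lapply(seq_along(predictors),
         function(k) combn(predictors, k, simplify = FALSE)),
  recursive = FALSE
)

# Fit each model and record its AIC
aicVals <- vapply(
  subsets,
  function(s) AIC(lm(reformulate(s, response = "y"), data = dat)),
  numeric(1)
)

# One row per model, sorted from lowest (best) to highest AIC
rhs <- vapply(subsets, paste, character(1), collapse = "+")
results <- data.frame(model = rhs, aic = aicVals, stringsAsFactors = FALSE)
results <- results[order(results$aic), ]
print(head(results, 5))
```

reformulate() builds the formula from a character vector of term names, which avoids the nested paste() and as.formula() calls. The number of subsets doubles with each extra predictor, which is why leaps() is the better tool for larger problems.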

Practice with the mammal data
VIF, lm(), AIC(), leaps()

