# R for Macroecology: Tests and models

Smileys

Homework

- Solutions to the color assignment problem?

Statistical tests in R

- This is why we are using R (and not C++)!
- t.test()
- aov()
- lm()
- glm()
- And many more

Read the documentation!

- With statistical tests, it's particularly important to read and understand the documentation of each function you use
- These functions may do some complicated things with options, and you want to make sure they do what you want
- Default behaviors can change (e.g., with sample size)

Returns from statistical tests

Statistical tests are functions, so they return objects:

    > x = 1:10
    > y = 3:12
    > t.test(x,y)

            Welch Two Sample t-test

    data:  x and y
    t = -1.4771, df = 18, p-value = 0.1569
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -4.8446619  0.8446619
    sample estimates:
    mean of x mean of y
          5.5       7.5

Returns from statistical tests

Statistical tests are functions, so they return objects:

    > x = 1:10
    > y = 3:12
    > test = t.test(x,y)
    > str(test)
    List of 9
     $ statistic  : Named num -1.48
      ..- attr(*, "names")= chr "t"
     $ parameter  : Named num 18
      ..- attr(*, "names")= chr "df"
     $ p.value    : num 0.157
     $ conf.int   : atomic [1:2] -4.845 0.845
      ..- attr(*, "conf.level")= num 0.95
     $ estimate   : Named num [1:2] 5.5 7.5
      ..- attr(*, "names")= chr [1:2] "mean of x" "mean of y"
     $ null.value : Named num 0
      ..- attr(*, "names")= chr "difference in means"
     $ alternative: chr "two.sided"
     $ method     : chr "Welch Two Sample t-test"
     $ data.name  : chr "x and y"
     - attr(*, "class")= chr "htest"

t.test() returns a list.

Returns from statistical tests

Getting the results out. This hopefully looks familiar after last week's debugging:

    > x = 1:10
    > y = 3:12
    > test = t.test(x,y)
    > test$p.value
    [1] 0.1569322
    > test$conf.int[2]
    [1] 0.8446619
    > test[[3]]
    [1] 0.1569322
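As a small sketch of the same idea, the components of the returned list can be pulled out by name instead of by position. The variable names on the left (`p_value`, `ci`, `t_stat`, `means`) are illustrative choices, not part of the slides.

```r
x <- 1:10
y <- 3:12
test <- t.test(x, y)

# Named access does not depend on where a component sits in the list.
p_value <- test$p.value
ci      <- test$conf.int           # numeric vector of length 2 (lower, upper)
t_stat  <- unname(test$statistic)  # drop the "t" name to get a plain number
means   <- test$estimate           # named vector: "mean of x", "mean of y"

names(test)  # lists every component that can be extracted this way
```

Accessing components by name tends to be easier to read than `test[[3]]`, and it keeps working if the position of a component in the list changes.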

Model specification

- Models in R use a common syntax:
- Y ~ X1 + X2 + X3 + ... + Xi
- This means Y is a linear function of X1, ..., Xi
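To make the syntax concrete, here is a minimal sketch that stores a formula in a variable and passes it to lm(). The data are simulated and the names `dat`, `f` and `fit` are assumptions made for illustration only.

```r
set.seed(1)

# Simulated data: y depends linearly on x1 and x2, plus a little noise.
dat <- data.frame(x1 = runif(20), x2 = runif(20))
dat$y <- 1 + 2 * dat$x1 - dat$x2 + rnorm(20, sd = 0.1)

# A formula is an ordinary R object and can be stored and reused.
f <- y ~ x1 + x2
class(f)

# Variables in the formula are looked up in the data frame.
fit <- lm(f, data = dat)
names(coef(fit))
```

The fit has one coefficient for the intercept and one for each term on the right-hand side of the formula.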

Linear models

Basic linear models are fit with lm(). Again, lm() returns a list:

    > x = 1:10
    > y = 3:12
    > test = lm(y ~ x)
    > test

    Call:
    lm(formula = y ~ x)

    Coefficients:
    (Intercept)            x
              2            1

Linear models

summary() is helpful for looking at a model:

    > summary(test)

    Call:
    lm(formula = y ~ x)

    Residuals:
           Min         1Q     Median         3Q        Max
    -1.293e-15 -1.986e-16  1.343e-16  3.689e-16  6.149e-16

    Coefficients:
                 Estimate Std. Error   t value Pr(>|t|)
    (Intercept) 2.000e+00  4.019e-16 4.976e+15   <2e-16 ***
    x           1.000e+00  6.477e-17 1.544e+16   <2e-16 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 5.883e-16 on 8 degrees of freedom
    Multiple R-squared:      1,     Adjusted R-squared:      1
    F-statistic: 2.384e+32 on 1 and 8 DF,  p-value: < 2.2e-16

Extracting coefficients, P values

- For the list (e.g. "test") returned by lm(), test$coefficients will give the coefficients, but not the standard errors or p values.
- Instead, use summary(test)$coefficients, which is a matrix with one row per term
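A short sketch of pulling individual numbers out of that coefficient matrix. Noise is added to y here (an assumption, unlike the slides' exact line) so that the standard errors are not essentially zero; `fit` and `coef_table` are illustrative names.

```r
set.seed(42)
x <- 1:10
y <- 2 + x + rnorm(10)   # noisy version of the slides' y = x + 2

fit <- lm(y ~ x)

# Matrix with columns "Estimate", "Std. Error", "t value", "Pr(>|t|)".
coef_table <- summary(fit)$coefficients
colnames(coef_table)

# Index by row (term) and column (statistic) name.
slope_se <- coef_table["x", "Std. Error"]
slope_p  <- coef_table["x", "Pr(>|t|)"]
```

The "Estimate" column holds the same values as `coef(fit)`.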

Model specification - interactions

- Interactions are specified with a * or a :
- X1 * X2 means X1 + X2 + X1:X2
- (X1 + X2 + X3)^2 means each term and all second-order interactions
- - removes terms
- constants (the intercept) are included by default, but can be removed with "-1"
- more help available using ?formula
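One way to check how R expands a formula, sketched here with terms(), which works on a formula without any data. The helper `term_labels()` is a hypothetical convenience wrapper, not a base R function.

```r
# Hypothetical helper: return the expanded term labels of a formula.
term_labels <- function(f) attr(terms(f), "term.labels")

term_labels(y ~ x1 * x2)             # "x1" "x2" "x1:x2"
term_labels(y ~ (x1 + x2 + x3)^2)    # main effects plus all pairwise interactions
term_labels(y ~ x1 * x2 - x1:x2)     # "x1" "x2" (the interaction is removed)

# The intercept flag is 1 by default and 0 once "-1" is added.
attr(terms(y ~ x1 + x2), "intercept")
attr(terms(y ~ x1 + x2 - 1), "intercept")
```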

Quadratic terms

Because ^2 means something specific in the context of a model, if you want to square one of your predictors, you have to do something special and wrap it in I():

    > x = 1:10
    > y = 3:12
    > test = lm(y ~ x + x^2)
    > test$coefficients
    (Intercept)           x
              2           1
    > test = lm(y ~ x + I(x^2))
    > test$coefficients
      (Intercept)             x        I(x^2)
     2.000000e+00  1.000000e+00 -4.401401e-17
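The difference is easier to see when the response really is quadratic. In this sketch the coefficients 1, 2 and 0.5 are arbitrary values chosen for illustration, and `fit_plain` and `fit_quad` are illustrative names.

```r
x <- 1:10
y <- 1 + 2 * x + 0.5 * x^2   # an exact quadratic, no noise

# Inside a formula, x^2 is interpreted as formula algebra and collapses to x.
fit_plain <- lm(y ~ x + x^2)
names(coef(fit_plain))

# I() protects the expression so that it is evaluated arithmetically.
fit_quad <- lm(y ~ x + I(x^2))
coef(fit_quad)
```

Only the second fit contains a squared term, and it recovers the coefficients used to generate y (up to floating-point error).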

A break to try things out

- t test
- anova
- linear models
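The slides list aov() but do not show a call, so here is one possible starting point for the anova part. It assumes the built-in PlantGrowth data set (plant weights under three treatment groups), which is not part of the course material.

```r
# One-way ANOVA: does mean weight differ among the treatment groups?
fit <- aov(weight ~ group, data = PlantGrowth)

# summary() on an aov fit returns a list; its first element is the ANOVA table.
anova_table <- summary(fit)[[1]]
anova_table

# Degrees of freedom and the p value for the group effect, by column name.
anova_table$Df
group_p <- anova_table[["Pr(>F)"]][1]
```

As with t.test() and lm(), the fitted object can be stored and its pieces extracted afterwards.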

Plotting a test object

- plot(t.test(x,y)) produces no useful plot (there is no plot method for the htest objects that t.test() returns)
- plot(lm(y~x)) plots diagnostic graphs
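A minimal sketch of viewing the lm() diagnostic graphs together on one page. The noisy data and the 2 x 2 layout are choices made for this example.

```r
set.seed(1)
x <- 1:30
y <- 2 + x + rnorm(30)   # noisy line, so the residuals are not all zero
fit <- lm(y ~ x)

# Show the default diagnostic plots in a 2 x 2 grid,
# then restore the previous graphics settings.
old_par <- par(mfrow = c(2, 2))
plot(fit)
par(old_par)
```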

Forward and backward selection

Uses the step() function. Its main arguments:

- object: the starting model
- scope: specifies the range of models to consider
- direction: "backward", "forward" or "both"
- trace: print progress to the screen?
- steps: maximum number of steps
- k: penalty for adding variables (k = 2 means AIC)

An example with six candidate predictors, of which only x1, x2 and x3 affect y:

    > x1 = runif(100)
    > x2 = runif(100)
    > x3 = runif(100)
    > x4 = runif(100)
    > x5 = runif(100)
    > x6 = runif(100)
    > y = x1+x2+x3+runif(100)
    > model = step(lm(y~1),scope = y~x1+x2+x3+x4+x5+x6,direction = "both",trace = F)
    > summary(model)

    Call:
    lm(formula = y ~ x2 + x3 + x1)

    Residuals:
          Min        1Q    Median        3Q       Max
    -0.517640 -0.264732 -0.003983  0.241431  0.525636

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  0.47422    0.09381   5.055 2.06e-06 ***
    x2           0.90890    0.09887   9.192 8.08e-15 ***
    x3           1.11036    0.10555  10.520  < 2e-16 ***
    x1           1.00836    0.10865   9.281 5.23e-15 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 0.3059 on 96 degrees of freedom
    Multiple R-squared:  0.7701,    Adjusted R-squared:  0.7629
    F-statistic: 107.2 on 3 and 96 DF,  p-value: < 2.2e-16
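The same selection can be sketched in a reproducible form by fixing the random seed and keeping the variables in a data frame. The seed value and the names `dat`, `null_model`, `full_scope`, `aic_model` and `bic_model` are assumptions for this example; the second call illustrates the k argument with k = log(n), the penalty that corresponds to BIC.

```r
set.seed(123)
n <- 100

# Six uniform predictors x1..x6; only x1, x2 and x3 influence y.
dat <- data.frame(matrix(runif(n * 6), ncol = 6))
names(dat) <- paste0("x", 1:6)
dat$y <- dat$x1 + dat$x2 + dat$x3 + runif(n)

null_model <- lm(y ~ 1, data = dat)              # starting model: intercept only
full_scope <- y ~ x1 + x2 + x3 + x4 + x5 + x6    # largest model to consider

# Default penalty k = 2 (AIC).
aic_model <- step(null_model, scope = full_scope, direction = "both", trace = 0)

# Heavier penalty k = log(n) (BIC), which tends to favour smaller models.
bic_model <- step(null_model, scope = full_scope, direction = "both",
                  trace = 0, k = log(n))

attr(terms(aic_model), "term.labels")
attr(terms(bic_model), "term.labels")
```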

All subsets selection

Uses leaps() from the leaps package, which must be loaded first with library(leaps):

    leaps(x=, y=, wt=rep(1, NROW(x)), int=TRUE, method=c("Cp", "adjr2", "r2"), nbest=10, names=NULL, df=NROW(x), strictly.compatible=TRUE)

- x: a matrix of predictors
- y: a vector of the response
- method: how to compare models (Mallows' Cp, adjusted R^2, or R^2)
- nbest: number of models of each size to return

An example that returns the two best models of each size:

    > Xmat = cbind(x1,x2,x3,x4,x5,x6)
    > leaps(x = Xmat, y, method = "Cp", nbest = 2)
    $which
          1     2     3     4     5     6
    1 FALSE  TRUE FALSE FALSE FALSE FALSE
    1  TRUE FALSE FALSE FALSE FALSE FALSE
    2  TRUE FALSE  TRUE FALSE FALSE FALSE
    2 FALSE  TRUE  TRUE FALSE FALSE FALSE
    3  TRUE  TRUE  TRUE FALSE FALSE FALSE
    3 FALSE  TRUE  TRUE  TRUE FALSE FALSE
    4  TRUE  TRUE  TRUE  TRUE FALSE FALSE
    4  TRUE  TRUE  TRUE FALSE  TRUE FALSE
    5  TRUE  TRUE  TRUE  TRUE  TRUE FALSE
    5  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
    6  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE

    $label
    [1] "(Intercept)" "1" "2" "3" "4" "5" "6"

    $size
     [1] 2 2 3 3 4 4 5 5 6 6 7

    $Cp
     [1] 191.732341 197.672391  85.498456  87.117764   3.466464  81.286117   3.579295   5.078805
     [9]   5.138063   5.509598   7.000000

    > leapOut = leaps(x = Xmat, y, method = "Cp", nbest = 2)

Computing the AIC of each model returned by leaps():

    > Xmat = cbind(x1,x2,x3,x4,x5,x6)
    > leapOut = leaps(x = Xmat, y, method = "Cp", nbest = 2)
    > aicVals = NULL
    > for(i in 1:nrow(leapOut$which))
    + {
    +   model = as.formula(paste("y~",paste(c("x1","x2","x3","x4","x5","x6")[leapOut$which[i,]],collapse = "+")))
    +   test = lm(model)
    +   aicVals[i] = AIC(test)
    + }
    > aicVals
     [1] 159.13825 161.18166 113.95184 114.84992  52.81268 112.42959  52.81609
     [8]  54.40579  54.34347  54.74159  56.19513

How the formula string is built, using row i = 8 as an example:

    > i = 8
    > leapOut$which[i,]
        1     2     3     4     5     6
     TRUE  TRUE  TRUE FALSE  TRUE FALSE
    > c("x1","x2","x3","x4","x5","x6")[leapOut$which[i,]]
    [1] "x1" "x2" "x3" "x5"
    > paste("y~",paste(c("x1","x2","x3","x4","x5","x6")[leapOut$which[i,]],collapse = "+"))
    [1] "y~ x1+x2+x3+x5"

Comparing AIC of the best models

    > data.frame(leapOut$which,aicVals)
          X1    X2    X3    X4    X5    X6   aicVals
    1  FALSE  TRUE FALSE FALSE FALSE FALSE 159.13825
    2   TRUE FALSE FALSE FALSE FALSE FALSE 161.18166
    3   TRUE FALSE  TRUE FALSE FALSE FALSE 113.95184
    4  FALSE  TRUE  TRUE FALSE FALSE FALSE 114.84992
    5   TRUE  TRUE  TRUE FALSE FALSE FALSE  52.81268
    6  FALSE  TRUE  TRUE  TRUE FALSE FALSE 112.42959
    7   TRUE  TRUE  TRUE  TRUE FALSE FALSE  52.81609
    8   TRUE  TRUE  TRUE FALSE  TRUE FALSE  54.40579
    9   TRUE  TRUE  TRUE  TRUE  TRUE FALSE  54.34347
    10  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  54.74159
    11  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  56.19513
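For comparison, here is a base-R sketch of the same AIC table that does not call leaps(): it enumerates every non-empty subset of the six predictors directly, which is only practical for a small number of predictors. This is a swapped-in brute-force approach, not the leaps algorithm. The seed and the names `dat`, `preds`, `which_mat`, `aic_vals` and `results` are assumptions for this example, and reformulate() is used in place of the paste()/as.formula() combination.

```r
set.seed(123)
n <- 100
preds <- paste0("x", 1:6)

# Six uniform predictors; only x1, x2 and x3 influence y.
dat <- data.frame(matrix(runif(n * 6), ncol = 6))
names(dat) <- preds
dat$y <- dat$x1 + dat$x2 + dat$x3 + runif(n)

# One row per subset: TRUE means the predictor is in the model.
which_mat <- as.matrix(expand.grid(rep(list(c(FALSE, TRUE)), length(preds))))
colnames(which_mat) <- preds
which_mat <- which_mat[rowSums(which_mat) > 0, , drop = FALSE]  # drop the empty subset

# Fit each model and record its AIC.
aic_vals <- apply(which_mat, 1, function(keep) {
  AIC(lm(reformulate(preds[keep], response = "y"), data = dat))
})

# Same layout as the table above, sorted so the lowest AIC comes first.
results <- data.frame(which_mat, aicVals = aic_vals)
results <- results[order(results$aicVals), ]
head(results, 5)
```

With 6 predictors there are 2^6 - 1 = 63 candidate models, so the table has 63 rows before head() trims it.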

Practice with the mammal data

- VIF, lm(), AIC(), leaps()
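The slide names VIF without defining it. The variance inflation factor of predictor j is commonly defined as 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors. The sketch below computes it by hand in base R on simulated data (the mammal data are not reproduced here); `vif_manual()` and the data frame `preds` are hypothetical names, and x3 is deliberately built to be collinear with x1.

```r
set.seed(7)
n <- 100

# Simulated predictors: x3 is x1 plus a little noise, so the two are collinear.
preds <- data.frame(x1 = runif(n), x2 = runif(n))
preds$x3 <- preds$x1 + rnorm(n, sd = 0.1)

# Hypothetical helper: VIF for every column of a data frame of predictors.
vif_manual <- function(d) {
  sapply(names(d), function(v) {
    others <- setdiff(names(d), v)
    # R^2 from regressing this predictor on all the others.
    r2 <- summary(lm(reformulate(others, response = v), data = d))$r.squared
    1 / (1 - r2)
  })
}

vifs <- vif_manual(preds)
vifs
```

Values near 1 indicate a predictor that is nearly independent of the others, and larger values indicate stronger collinearity. Here x1 and x3 should stand out relative to x2.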
