R for Macroecology Tests and models. Smileys Homework  Solutions to the color assignment problem?




1 R for Macroecology Tests and models

2 Smileys

3 Homework
• Solutions to the color assignment problem?

4 Statistical tests in R
• This is why we are using R (and not C++)!
• t.test()
• aov()
• lm()
• glm()
• And many more
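As a quick illustration, each of the functions listed above takes data and returns an object you can store and inspect. This is a sketch on simulated data; the data frame `dat` and its columns are hypothetical, not course data.

```r
# Simulated data (hypothetical): numeric response, numeric predictor,
# a three-level grouping factor and a 0/1 outcome
set.seed(42)
dat = data.frame(
  x     = runif(60),
  group = factor(rep(c("a", "b", "c"), each = 20))
)
dat$y       = 2 + 3 * dat$x + rnorm(60, sd = 0.5)
dat$present = rbinom(60, size = 1, prob = plogis(-1 + 2 * dat$x))

# t test needs exactly two groups, so drop group "c" and its unused level
tt  = t.test(y ~ group, data = droplevels(subset(dat, group != "c")))
av  = aov(y ~ group, data = dat)                        # one-way ANOVA
lin = lm(y ~ x, data = dat)                             # linear regression
lg  = glm(present ~ x, family = binomial, data = dat)   # logistic regression

# The first element of each object's class says what kind of result it is
sapply(list(tt, av, lin, lg), function(obj) class(obj)[1])
```

The printed summaries look different, but all four are ordinary R objects that can be saved and queried later.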

5 Read the documentation!
• With statistical tests, it's particularly important to read and understand the documentation of each function you use
• They may do complicated things depending on their options, and you want to make sure they do what you want
• Default behaviors can change (e.g., with sample size)
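One concrete case: wilcox.test() (not covered on these slides) computes an exact p-value by default only when both samples have fewer than 50 values and there are no ties; otherwise it silently switches to a normal approximation. Likewise, t.test() runs Welch's test unless you set var.equal = TRUE. A small sketch on simulated data (the variable names are hypothetical):

```r
set.seed(1)
small_x = runif(10); small_y = runif(10) + 0.5   # n < 50 per sample
big_x   = runif(60); big_y   = runif(60) + 0.5   # n >= 50 per sample

# Same call, same defaults, but a different method is chosen by sample size
wilcox.test(small_x, small_y)$method   # exact test
wilcox.test(big_x, big_y)$method       # normal approximation with continuity correction

# t.test() defaults to the Welch (unequal variance) test
t.test(small_x, small_y)$method
t.test(small_x, small_y, var.equal = TRUE)$method
```

Checking the `$method` element of the returned object is a cheap way to confirm which version of a test was actually run.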

6 Returns from statistical tests
• Statistical tests are functions, so they return objects
> x = 1:10
> y = 3:12
> t.test(x,y)

        Welch Two Sample t-test

data:  x and y
t = -1.4771, df = 18, p-value = 0.1569
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -4.8446619  0.8446619
sample estimates:
mean of x mean of y 
      5.5       7.5 

7 Returns from statistical tests
• Statistical tests are functions, so they return objects
> x = 1:10
> y = 3:12
> test = t.test(x,y)
> str(test)
List of 9
 $ statistic  : Named num -1.48
  ..- attr(*, "names")= chr "t"
 $ parameter  : Named num 18
  ..- attr(*, "names")= chr "df"
 $ p.value    : num 0.157
 $ conf.int   : atomic [1:2] -4.845 0.845
  ..- attr(*, "conf.level")= num 0.95
 $ estimate   : Named num [1:2] 5.5 7.5
  ..- attr(*, "names")= chr [1:2] "mean of x" "mean of y"
 $ null.value : Named num 0
  ..- attr(*, "names")= chr "difference in means"
 $ alternative: chr "two.sided"
 $ method     : chr "Welch Two Sample t-test"
 $ data.name  : chr "x and y"
 - attr(*, "class")= chr "htest"
• t.test() returns a list

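8 Returns from statistical tests
• Getting the results out
• This hopefully looks familiar after last week’s debugging
> x = 1:10
> y = 3:12
> test = t.test(x,y)
> test$p.value      # the p-value, extracted by name
> test$conf.int[2]  # the upper limit of the confidence interval
> test[[3]]         # the third element of the list, which is also the p-value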
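To see which components are available, and to confirm they hold what you expect, you can compare a stored value against a hand calculation. A sketch using the slide's x and y; the hand calculation assumes the Welch (unpooled) standard error, which is t.test()'s default.

```r
x = 1:10
y = 3:12
test = t.test(x, y)

names(test)             # the components you can pull out with $
test$estimate           # named vector: mean of x, mean of y
unname(test$statistic)  # drop the "t" name to get a plain number

# The same t statistic by hand, using the unpooled (Welch) standard error
se = sqrt(var(x) / length(x) + var(y) / length(y))
(mean(x) - mean(y)) / se
```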

9 Model specification
• Models in R use a common syntax:
• Y ~ X1 + X2 + X3 + ... + Xj
• This means Y is modeled as a linear function of X1 through Xj
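A formula is itself an R object, so you can inspect it before fitting anything. A minimal sketch; y, x1, x2 and x3 are placeholder names and do not need to exist yet, because nothing is evaluated until a model function uses the formula.

```r
f = y ~ x1 + x2 + x3

class(f)                       # "formula"
all.vars(f)                    # every variable named: "y" "x1" "x2" "x3"
attr(terms(f), "term.labels")  # the right-hand-side terms: "x1" "x2" "x3"
```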

10 Linear models
• Basic linear models are fit with lm()
• Again, lm() returns a list
> x = 1:10
> y = 3:12
> test = lm(y ~ x)
> test

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x  
          2            1  

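11 Linear models
• summary() is helpful for looking at a model
> summary(test)

Call:
lm(formula = y ~ x)

Residuals:
       Min         1Q     Median         3Q        Max 

Coefficients:
             Estimate Std. Error   t value Pr(>|t|)    
(Intercept) 2.000e+00                       <2e-16 ***
x           1.000e+00                       <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5.883e-16 on 8 degrees of freedom
Multiple R-squared:      1,  Adjusted R-squared:      1 
F-statistic: 2.384e+32 on 1 and 8 DF,  p-value: < 2.2e-16
• Because y is exactly x + 2 here, the fit is perfect: the residuals and standard errors are only floating-point rounding error (around 1e-16), so the enormous t and F statistics are meaningless. R also warns that the summary may be unreliable in this case.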

12 Extracting coefficients, P values
• For the list (e.g. “test”) returned by lm(), test$coefficients gives the coefficient estimates, but not their standard errors or P values
• Instead, use summary(test)$coefficients, a matrix with the columns Estimate, Std. Error, t value and Pr(>|t|)
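A sketch of pulling individual numbers out of that matrix. Noise is added to the slide's example (an assumption, so the standard errors are not ~0); the name coefTable is just illustrative.

```r
set.seed(2)
x = 1:10
y = 3 + x + rnorm(10)       # noisy version of the slide's y = 3:12
test = lm(y ~ x)

test$coefficients                       # estimates only
coefTable = summary(test)$coefficients  # one row per term, four columns
coefTable
coefTable["x", "Pr(>|t|)"]              # P value for the slope
coefTable[, "Std. Error"]               # standard errors of all terms
coef(test)                              # accessor, same as test$coefficients
```

Indexing by row and column name rather than by position keeps the code working if you add or reorder predictors.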

13 Model specification - interactions
• Interactions are specified with * or :
• X1:X2 is the interaction term on its own
• X1 * X2 means X1 + X2 + X1:X2
• (X1 + X2 + X3)^2 means each term and all second-order interactions
• - removes terms (e.g. X1 * X2 - X1)
• The intercept (constant) is included by default, but can be removed with “-1”
• More help is available using ?formula
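You can check how R expands each of these shorthands by asking terms() for the term labels, without fitting any model. A sketch; labels_of is a hypothetical helper and the variable names are placeholders.

```r
# Hypothetical helper: list the terms a formula expands to
labels_of = function(f) attr(terms(f), "term.labels")

labels_of(y ~ x1 * x2)              # "x1" "x2" "x1:x2"
labels_of(y ~ x1:x2)                # "x1:x2" only
labels_of(y ~ (x1 + x2 + x3)^2)     # "x1" "x2" "x3" "x1:x2" "x1:x3" "x2:x3"
labels_of(y ~ x1 * x2 - x1)         # "x2" "x1:x2": the minus removed x1
attr(terms(y ~ x1 - 1), "intercept")  # 0: the intercept has been removed
```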

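14 Quadratic terms
• Because ^2 means something specific in a model formula, squaring one of your predictors takes something special: wrap it in I() so that ^ is treated as arithmetic
> x = 1:10
> y = 3:12
> test = lm(y ~ x + x^2)
> test$coefficients
(Intercept)           x 
          2           1 
> test = lm(y ~ x + I(x^2))
> test$coefficients
(Intercept)           x      I(x^2) 
• Without I(), x^2 is read as "x crossed with itself", which is just x, so the squared term is silently dropped. With I(), the I(x^2) coefficient is essentially zero (on the order of 1e-17) here, because y is exactly linear in x.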
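The difference is easier to see with data that really is curved. A sketch on simulated data (the coefficients 1, 0.5 and 0.3 are arbitrary assumptions); poly(x, 2, raw = TRUE) is shown as an equivalent alternative to I().

```r
set.seed(3)
x = 1:10
y = 1 + 0.5 * x + 0.3 * x^2 + rnorm(10)   # a truly quadratic relationship

names(coef(lm(y ~ x + x^2)))      # only "(Intercept)" "x": the square was dropped
names(coef(lm(y ~ x + I(x^2))))   # "(Intercept)" "x" "I(x^2)"

# Same design matrix (columns x and x^2), so the same coefficient values
coef(lm(y ~ x + I(x^2)))
coef(lm(y ~ poly(x, 2, raw = TRUE)))
```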

15 A break to try things out
• t test
• anova
• linear models

16 Plotting a test object
• plot(t.test(x,y)) does not produce a useful plot: there is no plot method for test results
• plot(lm(y~x)) plots diagnostic graphs
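A sketch of the diagnostic plots for an lm object, on simulated data (an assumption). By default plot() on an lm draws four panels (residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage), so a 2 x 2 layout shows them all at once.

```r
set.seed(4)
x = runif(50)
y = 2 + x + rnorm(50, sd = 0.2)
fit = lm(y ~ x)

op = par(mfrow = c(2, 2))   # 2 x 2 grid for the four default diagnostic panels
plot(fit)
par(op)                     # restore the previous layout

plot(fit, which = 1)        # or request a single panel: residuals vs fitted
```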

17 Forward and backward selection
• Uses the step() function
• object: the starting model
• scope: specifies the range of models to consider
• direction: "backward", "forward" or "both"
• trace: print progress to the screen?
• steps: the maximum number of steps
• k: the penalty per model degree of freedom (k = 2 gives AIC)

18
> x1 = runif(100)
> x2 = runif(100)
> x3 = runif(100)
> x4 = runif(100)
> x5 = runif(100)
> x6 = runif(100)
> y = x1+x2+x3+runif(100)
> model = step(lm(y~1), scope = y~x1+x2+x3+x4+x5+x6, direction = "both", trace = FALSE)
> summary(model)

Call:
lm(formula = y ~ x2 + x3 + x1)

Residuals:
    Min      1Q  Median      3Q     Max 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)                                      ***
x2                                               ***
x3                                      < 2e-16 ***
x1                                               ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error:  on 96 degrees of freedom
Multiple R-squared:  ,  Adjusted R-squared:  
F-statistic:  on 3 and 96 DF,  p-value: < 2.2e-16
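The k argument from the previous slide changes how strongly extra variables are penalized. A sketch of the same search on seeded, simulated data kept in a data frame (the names dat, null_model and full_scope are hypothetical), comparing the default k = 2 (AIC) with k = log(n), which the step() documentation notes is sometimes called BIC.

```r
set.seed(5)
n = 100
dat = as.data.frame(matrix(runif(n * 6), ncol = 6,
                           dimnames = list(NULL, paste0("x", 1:6))))
dat$y = dat$x1 + dat$x2 + dat$x3 + runif(n)   # only x1, x2, x3 matter

null_model = lm(y ~ 1, data = dat)
full_scope = y ~ x1 + x2 + x3 + x4 + x5 + x6  # upper limit of the search

aic_model = step(null_model, scope = full_scope, direction = "both",
                 trace = FALSE)                # k = 2 by default
bic_model = step(null_model, scope = full_scope, direction = "both",
                 trace = FALSE, k = log(n))    # heavier penalty per term

attr(terms(aic_model), "term.labels")
attr(terms(bic_model), "term.labels")
```

A larger k makes it harder for noise variables such as x4 to x6 to enter by chance.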

19 All subsets selection
• leaps() comes from the leaps package (load it with library(leaps)):
leaps(x=, y=, wt=rep(1, NROW(x)), int=TRUE, method=c("Cp", "adjr2", "r2"), nbest=10, names=NULL, df=NROW(x), strictly.compatible=TRUE)
• x: a matrix of predictors
• y: a vector of the response
• method: how to compare models (Mallows' Cp, adjusted R², or R²)
• nbest: the number of models of each size to return
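The idea behind all-subsets selection (fit every combination of predictors, then rank them) can also be sketched in base R when there are only a few predictors. This is not how leaps() works internally, and it ranks by AIC rather than Mallows' Cp; the function all_subsets_aic and the simulated data are hypothetical.

```r
# Fit every non-empty subset of predictors and rank by AIC.
# p predictors give 2^p - 1 models, so this is only feasible for small p.
all_subsets_aic = function(response, predictors, data) {
  results = list()
  for (size in seq_along(predictors)) {
    for (vars in combn(predictors, size, simplify = FALSE)) {
      f = reformulate(vars, response = response)
      results[[length(results) + 1]] = data.frame(
        model = paste(vars, collapse = "+"),
        size  = size,
        AIC   = AIC(lm(f, data = data)),
        stringsAsFactors = FALSE
      )
    }
  }
  out = do.call(rbind, results)
  out[order(out$AIC), ]   # lowest (best) AIC first
}

set.seed(6)
dat = as.data.frame(matrix(runif(600), ncol = 6,
                           dimnames = list(NULL, paste0("x", 1:6))))
dat$y = dat$x1 + dat$x2 + dat$x3 + runif(100)

ranked = all_subsets_aic("y", paste0("x", 1:6), dat)
head(ranked)
```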

20
> Xmat = cbind(x1,x2,x3,x4,x5,x6)
> leaps(x = Xmat, y, method = "Cp", nbest = 2)
$which
      1     2     3     4     5     6
1 FALSE  TRUE FALSE FALSE FALSE FALSE
1  TRUE FALSE FALSE FALSE FALSE FALSE
2  TRUE FALSE  TRUE FALSE FALSE FALSE
2 FALSE  TRUE  TRUE FALSE FALSE FALSE
3  TRUE  TRUE  TRUE FALSE FALSE FALSE
3 FALSE  TRUE  TRUE  TRUE FALSE FALSE
4  TRUE  TRUE  TRUE  TRUE FALSE FALSE
4  TRUE  TRUE  TRUE FALSE  TRUE FALSE
5  TRUE  TRUE  TRUE  TRUE  TRUE FALSE
5  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
6  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE

$label
[1] "(Intercept)" "1" "2" "3" "4" "5" "6"

$size
 [1] 2 2 3 3 4 4 5 5 6 6 7

$Cp
 [1]
 [9]

> leapOut = leaps(x = Xmat, y, method = "Cp", nbest = 2)

21
> Xmat = cbind(x1,x2,x3,x4,x5,x6)
> leapOut = leaps(x = Xmat, y, method = "Cp", nbest = 2)
> aicVals = NULL
> for(i in 1:nrow(leapOut$which))
+ {
+   model = as.formula(paste("y~",paste(c("x1","x2","x3","x4","x5","x6")[leapOut$which[i,]],collapse = "+")))
+   test = lm(model)
+   aicVals[i] = AIC(test)
+ }
> aicVals
 [1]
 [8]
> i = 8
> leapOut$which[i,]
    1     2     3     4     5     6 
 TRUE  TRUE  TRUE FALSE  TRUE FALSE 
> c("x1","x2","x3","x4","x5","x6")[leapOut$which[i,]]
[1] "x1" "x2" "x3" "x5"
> paste("y~",paste(c("x1","x2","x3","x4","x5","x6")[leapOut$which[i,]],collapse = "+"))
[1] "y~ x1+x2+x3+x5"
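Pasting strings together works, but stats::reformulate() builds the same formula directly from a character vector of predictor names. A sketch; keep is a hand-written stand-in for one row of leapOut$which (the real rows depend on the random data).

```r
predictors = c("x1", "x2", "x3", "x4", "x5", "x6")
keep = c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)   # e.g. leapOut$which[8, ]

f = reformulate(predictors[keep], response = "y")
f   # y ~ x1 + x2 + x3 + x5

# Inside the loop on the slide, the as.formula(paste(...)) line could become:
# model = reformulate(predictors[leapOut$which[i, ]], response = "y")
```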

22 Comparing AIC of the best models
> data.frame(leapOut$which,aicVals)
      X1    X2    X3    X4    X5    X6 aicVals
1  FALSE  TRUE FALSE FALSE FALSE FALSE
2   TRUE FALSE FALSE FALSE FALSE FALSE
3   TRUE FALSE  TRUE FALSE FALSE FALSE
4  FALSE  TRUE  TRUE FALSE FALSE FALSE
5   TRUE  TRUE  TRUE FALSE FALSE FALSE
6  FALSE  TRUE  TRUE  TRUE FALSE FALSE
7   TRUE  TRUE  TRUE  TRUE FALSE FALSE
8   TRUE  TRUE  TRUE FALSE  TRUE FALSE
9   TRUE  TRUE  TRUE  TRUE  TRUE FALSE
10  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
11  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
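Raw AIC values are easier to compare once they are sorted and expressed relative to the best model (delta AIC, which is 0 for the best model). A sketch with a few candidate models fitted to simulated data; the candidate list and names are hypothetical. A common rule of thumb treats models within about 2 AIC units of the best as similarly supported.

```r
set.seed(7)
dat = as.data.frame(matrix(runif(600), ncol = 6,
                           dimnames = list(NULL, paste0("x", 1:6))))
dat$y = dat$x1 + dat$x2 + dat$x3 + runif(100)

candidates = list(
  "x1+x2+x3"    = y ~ x1 + x2 + x3,
  "x1+x2+x3+x5" = y ~ x1 + x2 + x3 + x5,
  "x2+x3"       = y ~ x2 + x3
)
aicVals = sapply(candidates, function(f) AIC(lm(f, data = dat)))

comparison = data.frame(model = names(aicVals), AIC = aicVals,
                        stringsAsFactors = FALSE)
comparison$deltaAIC = comparison$AIC - min(comparison$AIC)  # 0 marks the best
comparison[order(comparison$AIC), ]
```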

23 Practice with the mammal data
• VIF, lm(), AIC(), leaps()
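VIF (variance inflation factor) is not defined on these slides. For predictor j it is 1 / (1 - R²_j), where R²_j comes from regressing predictor j on all the other predictors. The car package provides vif(); the base-R sketch below (the helper vif_manual is hypothetical) computes the definition directly on simulated data, not on the mammal data.

```r
# VIF for every column of a data frame of predictors
vif_manual = function(data) {
  sapply(names(data), function(v) {
    others = setdiff(names(data), v)
    r2 = summary(lm(reformulate(others, response = v), data = data))$r.squared
    1 / (1 - r2)
  })
}

set.seed(8)
preds = data.frame(a = runif(200), b = runif(200))
preds$c = preds$a + rnorm(200, sd = 0.05)   # c is nearly a copy of a

round(vif_manual(preds), 1)   # a and c get large VIFs; b stays near 1
```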




