Download presentation

Presentation is loading. Please wait.

Published byAmerica Whipp Modified over 4 years ago

1
R for Macroecology Tests and models

2
Smileys

3
Homework Solutions to the color assignment problem?

4
Statistical tests in R This is why we are using R (and not C++)! t.test() aov() lm() glm() And many more

5
Read the documentation! With statistical tests, its particularly important to read and understand the documentation of each function you use They may do some complicated things with options, and you want to make sure they do what you want Default behaviors can change (with, e.g. sample size)

6
Returns from statistical tests Statistical tests are functions, so they return objects > x = 1:10 > y = 3:12 > t.test(x,y) Welch Two Sample t-test data: x and y t = -1.4771, df = 18, p-value = 0.1569 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: -4.8446619 0.8446619 sample estimates: mean of x mean of y 5.5 7.5

7
Returns from statistical tests Statistical tests are functions, so they return objects > x = 1:10 > y = 3:12 > test = t.test(x,y) > str(test) List of 9 $ statistic : Named num -1.48..- attr(*, "names")= chr "t" $ parameter : Named num 18..- attr(*, "names")= chr "df" $ p.value : num 0.157 $ conf.int : atomic [1:2] -4.845 0.845..- attr(*, "conf.level")= num 0.95 $ estimate : Named num [1:2] 5.5 7.5..- attr(*, "names")= chr [1:2] "mean of x" "mean of y" $ null.value : Named num 0..- attr(*, "names")= chr "difference in means" $ alternative: chr "two.sided" $ method : chr "Welch Two Sample t-test" $ data.name : chr "x and y" - attr(*, "class")= chr "htest" t.test() returns a list

8
Returns from statistical tests Getting the results out This hopefully looks familiar after last week’s debugging > x = 1:10 > y = 3:12 > test = t.test(x,y) > test$p.value [1] 0.1569322 > test$conf.int[2] [1] 0.8446619 > test[[3]] [1] 0.1569322

9
Model specification Models in R use a common syntax: Y ~ X 1 + X 2 + X 3... + X i Means Y is a linear function of X 1:j

10
Linear models Basic linear models are fit with lm() Again, lm() returns a list > x = 1:10 > y = 3:12 > test = lm(y ~ x) > test Call: lm(formula = y ~ x) Coefficients: (Intercept) x 2 1

11
Linear models summary() is helpful for looking at a model > summary(test) Call: lm(formula = y ~ x) Residuals: Min 1Q Median 3Q Max -1.293e-15 -1.986e-16 1.343e-16 3.689e-16 6.149e-16 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 2.000e+00 4.019e-16 4.976e+15 <2e-16 *** x 1.000e+00 6.477e-17 1.544e+16 <2e-16 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 5.883e-16 on 8 degrees of freedom Multiple R-squared: 1, Adjusted R-squared: 1 F-statistic: 2.384e+32 on 1 and 8 DF, p-value: < 2.2e-16

12
Extracting coefficients, P values For the list (e.g. “test”) returned by lm(), test$coefficients will give the coefficients, but not the std.error or p value. Instead, use summary(test)$coefficients

13
Model specification - interactions Interactions are specified with a * or a : X 1 * X 2 means X 1 + X 2 + X 1 :X 2 (X 1 + X 2 + X 3 )^2 means each term and all second-order interactions - removes terms constants are included by default, but can be removed with “-1” more help available using ?formula

14
Quadratic terms Because ^2 means something specific in the context of a model, if you want to square one of your predictors, you have to do something special: > x = 1:10 > y = 3:12 > test = lm(y ~ x + x^2) > test$coefficients (Intercept) x 2 1 > test = lm(y ~ x + I(x^2)) > test$coefficients (Intercept) x I(x^2) 2.000000e+00 1.000000e+00 -4.401401e-17

15
A break to try things out t test anova linear models

16
Plotting a test object plot(t.test(x,y)) does nothing plot(lm(y~x)) plots diagnostic graphs

17
Forward and backward selection Uses the step() function Object – starting model Scope – specifies the range of models to consider Direction – backward, forward or both? Trace – print to the screen? Steps – set a maximum number of steps k – penalization for adding variables (2 means AIC)

18
> x1 = runif(100) > x2 = runif(100) > x3 = runif(100) > x4 = runif(100) > x5 = runif(100) > x6 = runif(100) > y = x1+x2+x3+runif(100) > model = step(lm(y~1),scope = y~x1+x2+x3+x4+x5+x6,direction = "both",trace = F) > summary(model) Call: lm(formula = y ~ x2 + x3 + x1) Residuals: Min 1Q Median 3Q Max -0.517640 -0.264732 -0.003983 0.241431 0.525636 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.47422 0.09381 5.055 2.06e-06 *** x2 0.90890 0.09887 9.192 8.08e-15 *** x3 1.11036 0.10555 10.520 < 2e-16 *** x1 1.00836 0.10865 9.281 5.23e-15 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 0.3059 on 96 degrees of freedom Multiple R-squared: 0.7701, Adjusted R-squared: 0.7629 F-statistic: 107.2 on 3 and 96 DF, p-value: < 2.2e-16

19
All subsets selection leaps(x=, y=, wt=rep(1, NROW(x)), int=TRUE, method=c("Cp", "adjr2", "r2"), nbest=10, names=NULL, df=NROW(x), strictly.compatible=TRUE) x – a matrix of predictors y – a vector of the response method – how to compare models (Mallows C p, adjusted R 2, or R 2 ) nbest – number of models of each size to return

20
> Xmat = cbind(x1,x2,x3,x4,x5,x6) > leaps(x = Xmat, y, method = "Cp", nbest = 2) $which 1 2 3 4 5 6 1 FALSE TRUE FALSE FALSE FALSE FALSE 1 TRUE FALSE FALSE FALSE FALSE FALSE 2 TRUE FALSE TRUE FALSE FALSE FALSE 2 FALSE TRUE TRUE FALSE FALSE FALSE 3 TRUE TRUE TRUE FALSE FALSE FALSE 3 FALSE TRUE TRUE TRUE FALSE FALSE 4 TRUE TRUE TRUE TRUE FALSE FALSE 4 TRUE TRUE TRUE FALSE TRUE FALSE 5 TRUE TRUE TRUE TRUE TRUE FALSE 5 TRUE TRUE TRUE TRUE FALSE TRUE 6 TRUE TRUE TRUE TRUE TRUE TRUE $label [1] "(Intercept)" "1" "2" "3" "4" "5" "6" $size [1] 2 2 3 3 4 4 5 5 6 6 7 $Cp [1] 191.732341 197.672391 85.498456 87.117764 3.466464 81.286117 3.579295 5.078805 [9] 5.138063 5.509598 7.000000 > leapOut = leaps(x = Xmat, y, method = "Cp", nbest = 2)

21
> Xmat = cbind(x1,x2,x3,x4,x5,x6) > leapOut = leaps(x = Xmat, y, method = "Cp", nbest = 2) > aicVals = NULL > for(i in 1:nrow(leapOut$which)) > { > model = as.formula(paste("y~",paste(c("x1","x2","x3","x4", "x5","x6")[leapOut$which[i,]],collapse = "+"))) > test = lm(model) > aicVals[i] = AIC(test) > } > aicVals [1] 159.13825 161.18166 113.95184 114.84992 52.81268 112.42959 52.81609 [8] 54.40579 54.34347 54.74159 56.19513 > i = 8 > leapOut$which[i,] 1 2 3 4 5 6 TRUE TRUE TRUE FALSE TRUE FALSE > c("x1","x2","x3","x4","x5","x6")[leapOut$which[i,]] [1] "x1" "x2" "x3" "x5" > paste("y~",paste(c("x1","x2","x3","x4","x5","x6")[leapOut$which[i,]],collapse = "+")) [1] "y~ x1+x2+x3+x5"

22
Comparing AIC of the best models > data.frame(leapOut$which,aicVals) X1 X2 X3 X4 X5 X6 aicVals 1 FALSE TRUE FALSE FALSE FALSE FALSE 159.13825 2 TRUE FALSE FALSE FALSE FALSE FALSE 161.18166 3 TRUE FALSE TRUE FALSE FALSE FALSE 113.95184 4 FALSE TRUE TRUE FALSE FALSE FALSE 114.84992 5 TRUE TRUE TRUE FALSE FALSE FALSE 52.81268 6 FALSE TRUE TRUE TRUE FALSE FALSE 112.42959 7 TRUE TRUE TRUE TRUE FALSE FALSE 52.81609 8 TRUE TRUE TRUE FALSE TRUE FALSE 54.40579 9 TRUE TRUE TRUE TRUE TRUE FALSE 54.34347 10 TRUE TRUE TRUE TRUE FALSE TRUE 54.74159 11 TRUE TRUE TRUE TRUE TRUE TRUE 56.19513

23
Practice with the mammal data VIF, lm(), AIC(), leaps()

Similar presentations

OK

1 Peter Fox Data Analytics – ITWS-4963/ITWS-6965 Week 4b, February 20, 2015 Lab: regression, kNN and K- means results, interpreting and evaluating models.

1 Peter Fox Data Analytics – ITWS-4963/ITWS-6965 Week 4b, February 20, 2015 Lab: regression, kNN and K- means results, interpreting and evaluating models.

© 2018 SlidePlayer.com Inc.

All rights reserved.

To make this website work, we log user data and share it with processors. To use this website, you must agree to our Privacy Policy, including cookie policy.

Ads by Google