Download presentation

Presentation is loading. Please wait.

Published byTori Bermingham Modified over 2 years ago

1
R for psychologists Professor Thom Baguley, Psychology, Nottingham Trent University

2
2 0. Overview 1. What is R? 2. Why use R? 3. R basics 4. R graphics 5. Linear models in R 6. ANOVA and ANCOVA 6. Writing your own R functions 8. Simulation and bootstrapping

3
3 1. What is R? R is a software environment for statistical analysis “the lingua franca of computational statistics” De Leeuw & Mair (2007)

4
4 2. Why use R? It is: -free -open source -cross-platform (Mac, Linux, PC) It has: -excellent basic statistics functions -powerful and versatile graphics -hundreds of user-contributed packages -a large community of users

5
5 3. R basics Some common R objects: -characters (text strings e.g., 'a' or ’1’) -numbers (numbers like 2 or 1e+3) -vectors (1D set of numbers or other elements) -data frames (like vectors organized in columns) -matrices (r by c array of numbers) -lists (objects that contain other objects)

6
6 Assignment Input: >vector1 <- 6 >vector1 Output: [1] 6

7
7 Arithmetic >6 + 6 * 2 [1] 18 > vector1^2 [1] 36

8
8 Calling functions >c(1, 9, 25)> log(vector1) [1] [1] > rnorm(1, 100, 15)?rnorm [1] help(rnorm)

9
9 Loading data > dat1 <- read.csv('pride.csv') > library(foreign) > h05 <- read.spss('Hayden_2005.sav', to.data.frame = TRUE)

10
10 Addressing the contents of an object > vector1[1] [1] 6 > days <- h05$days > days[1:9] [1]

11
11 Addressing the contents of an object > vector1[1] [1] 6 > days <- h05$days > days[1:9] [1] Or combine the steps with h05$days[1:9]

12
12 Data frames > is.data.frame(h05) [1] TRUE > dim(h05) > names(h05) [1] 43 2 [1] "names" "days” -this data frame has 43 rows and 2 named columns -it can be helpful to think of a data frame as a set of named variables (vectors) bound into columns

13
13 Matrix objects (matrices) > cells <- c(3677,56,3924,31) > cat.names <- list(c("Before", "After"), c("Alive", "Dead")) > checklist <- matrix(cells, 2, 2, byrow=TRUE, dimnames= cat.names) > checklist Alive Dead Before After

14
14 The power of objects > chisq.test(checklist) Pearson's Chi-squared test with Yates' continuity correction data: checklist X-squared = , df = 1, p-value = > chisq.test(c(2, 18)) Chi-squared test for given probabilities data: c(2, 18) X-squared = 12.8, df = 1, p-value =

15
15 Calling functions: defaults and named arguments > chisq.test(checklist, correct=FALSE) Pearson's Chi-squared test data: checklist X-squared = , df = 1, p-value = the default argument (for matrix input is) correct=TRUE -setting correct=FALSE over-rides the default -because unnamed arguments to functions are matched in order, this argument must be named (otherwise naming the arguments is optional)

16
16 Exercise 1 (R Basics) -entering data -working with objects -simple statistical functions

17
17 4. R graphics n = 43 > boxplot(h05$days)

18
18 > hist(h05$days)> plot(density(h05$days))

19
19 Distribution functions e.g., Normal distribution dnorm(x, mean = 0, sd = 1) pnorm(q, mean = 0, sd = 1, lower.tail = TRUE) qnorm(p, mean = 0, sd = 1, lower.tail = TRUE) rnorm(n, mean = 0, sd = 1) > rdat <- rnorm(30, 100, 15)

20
20 > curve(dnorm(x), xlim=c(-4,4), col='purple') > curve(dt(x, 1), col='red', add=TRUE, lty=3)

21
21

22
22 Exercise 2 (R Graphics) -exploratory plots -plot parameters -plotting distribution functions -plotting a serial position curve (optional)

23
23 5. Linear models in R > expenses <- read.csv('expenses.csv') > t.test(majority ~ problem, data = expenses) Welch Two Sample t-test data: majority by problem t = , df = , p-value = alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: sample estimates: mean in group 0 mean in group Data courtesy of Mark Thompson

24
24 > expenses <- read.csv('expenses.csv') > lm(majority ~ problem, data = expenses) Call: lm(formula = majority ~ problem, data = expenses) Coefficients: (Intercept) problem

25
25 > lm.out <- lm(majority ~ problem, data = expenses) > max(rstandard(lm.out)) [1] > AIC(lm.out) [1] > lm.out$coefficients (Intercept) problem > ?lm

26
26 > summary(lm.out) Call: lm(formula = majority ~ problem, data = expenses) Residuals: Min 1Q Median 3Q Max Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) < 2e-16 *** problem *** --- Signif. codes: 0 ‘***’ ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 4395 on 644 degrees of freedom Multiple R-squared: ,Adjusted R-squared: F-statistic: on 1 and 644 DF, p-value:

27
27 > glm(problem~I(majority/10000), family='binomial', data = expenses) Call: glm(formula = problem ~ I(majority/10000), family = "binomial", data = expenses) Coefficients: (Intercept) I(majority/10000) Degrees of Freedom: 645 Total (i.e. Null); 644 Residual Null Deviance: Residual Deviance: AIC: 845.6

28
28 > lrm.out <- glm(problem ~ I(majority/10000), family='binomial', data = expenses) > plot(election$majority, lrm.out$fitted, col='dark green')

29
29 Exercise 3 (Linear models) - using a formula in a call to a model function -linear models -plotting a regression line -generalized linear models (optional) -plotting confidence bands (optional)

30
30 6. ANOVA > factor1 factor 1 [1] Levels: > library(foreign) > diag.data diag.fit <- lm(descript ~ group, data=diag.data)

31
31 Regression model (dummy coding) > diag.fit Call: lm(formula = descript ~ group, data = diag.data) Coefficients: (Intercept) groupPicture groupFull Diagram groupSegmented

32
32 aov() > aov(diag.fit) Call: aov(formula = diag.fit) Terms: group Residuals Sum of Squares Deg. of Freedom 3 36

33
33 anova() > summary(aov(diag.fit)) > anova(diag.fit) Analysis of Variance Table Response: descript Df Sum Sq Mean Sq F value Pr(>F) group * Residuals Signif. codes: 0 ‘***’ ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

34
34 Factorial ANOVA and ANCOVA > lm(DV ~ factor1 + factor2 + factor1:factor2, data=data2) > lm(DV ~ factor1 * factor2, data=data2) > lm(descript ~ group + time, diag.data) > lm(descript ~ 0 + group + scale(time, scale=F), diag.data) > diag.ancov <- lm(descript ~ group + scale(time, scale=F), diag.data)

35
35 Type I and Type II Sums of squares > drop1(diag.ancov, test = 'F') Single term deletions Model: descript ~ group + scale(time, scale = F) Df Sum of Sq RSS AIC F value Pr(F) group ** scale(time, scale = F) * -sequential sums of squares (Type I) tests are given by default -hierarchical sums of squares (Type II) tests are given by the drop1() function [Software such as SAS and SPSS defaults to unique (Type III) SS]

36
36 Repeated measures (within-subjects) factors -a weakness in the base R installation, but easily done using packages such as ez, nlme or lme4 (the latter two able to fit a wider range of repeated measures models) > pride.long <- read.csv("pride_long.csv") > pride.mixed <- aov(accuracy ~ emotion*condition + Error(factor(participant)/emotion), pride.long) > summary(pride.mixed) [Software such as SAS and SPSS defaults to unique (Type III) SS]

37
37 Multiple comparisons and contrasts -you can run contrasts by changing default contrasts in R -Fisher LSD, Tukey’s HSD and p adjustment (e.g., Hochberg, Holm, Bonferroni, FDR) in R base installation -powerful modified Bonferroni (e.g., Shaffer, Westfall) and general linear contrasts available in multcomp package -flexible contrast and estimable functions in gmodels package

38
38 Exercise 4 (ANOVA) - factors -aov() and anova() -ANCOVA -repeated measures (within-subjects) -contrasts and multiple comparisons (optional)

39
39 7. Writing your own functions sd.pool <- function(sd.1, n.1, sd.2, n.2 = n.1){ num <- (n.1-1)*sd.1^2+(n.2-1)*sd.2^2 denom <- n.1+n.2-2 output <- sqrt(num/denom) output } > sd.pool(6.1, 20, 5.3, 25) [1]

40
40 Exercise 5 (Write a function) -pick a simple statistical procedure -write a function to automate it e.g.,pooled standard deviation

41
41 8. Simulation and bootstrapping -distribution functions > rnorm(100, 10, 1) -R boot package -sample() and replicate()

42
42 Bootstrap CIs for a median or timmed mean

43
43 The bootstrap (and other Monte Carlo methods) Bootstrapping involves repeatedly resampling with replacement from a data set e.g., Data set = {0,1} 7 simulated samples: {1,0}, {0,1}, {0,0}, {0,0}, {0,1}, {0,0}, {1,1} Bootstrapping effectively treats the sample distribution as the population distribution > replicate(7, sample(x,2,replace=TRUE))

44
44 Bootstrapping using R > library(boot) > x1.boot <- boot(samp,function(x,i) median(x[i]),R=10000) > boot.ci(x1.boot) BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS Based on bootstrap replicates CALL : boot.ci(boot.out = x1.boot) Intervals : Level Normal Basic 95% ( , ) ( , ) Level Percentile BCa 95% ( , ) ( , ) Calculations and Intervals on Original Scale Normal 95% CI for mean (using t) [-2.00, 1.33]

45
45 Exercise 6 (Bootstrapping) -resample data -construct a simple percentile bootstrap -run a BC a bootstrap using the boot package (optional)

46
46

47
47 9. Advanced statistical modelling in R Multilevel modeling nlme package lme4 package MCMCglmm package Meta-analysis meta package metafor package … and many, many more other packages

48
48 metafor package > m.e <- c(10.7, 16.24, 10.03, 3.65, 5.73) > n.e <- c(31, 57, 9, 17, 10) > m.c <- c(2.87, 6.83, 7.18, 4.65, 7.47) > n.c <- c(14, 52, 9, 18, 10) > sd.pooled <- c(7.88, 9.84, 8.67, 6.34, 5.74) > diff <- m.c - m.e > se.diffs <- sd.pooled * sqrt(1/n.e + 1/n.c)

49
49 > install.packages('metafor') > library(metafor) > meta.out <- rma(yi=diff, sei=se.diffs, method = 'FE') > meta.out Fixed-Effects Model (k = 5) Test for Heterogeneity: Q(df = 4) = , p-val = Model Results: estimate se zval pval ci.lb ci.ub *** --- Signif. codes: 0 ‘***’ ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

50
50 > funnel(trimfill(meta.out))

51
51 > forest(rma(yi=diff, sei=se.diffs))

52
52

53
53

54
54

55
55

56
56

57
57

Similar presentations

© 2017 SlidePlayer.com Inc.

All rights reserved.

Ads by Google