Presentation is loading. Please wait.

Presentation is loading. Please wait.

ALEXANDER C. LOPILATO R: Because the names of other stat programs don’t make sense so why should this one?

Similar presentations


Presentation on theme: "ALEXANDER C. LOPILATO R: Because the names of other stat programs don’t make sense so why should this one?"— Presentation transcript:

1 ALEXANDER C. LOPILATO R: Because the names of other stat programs don’t make sense so why should this one?

2 Outline The three Ws of R: What, Where, and Why Commonly used operators Formatting your data for R Working with data in R Exporting data from R

3 What is R? “R is a language and environment for statistical computing and graphics.” (http://www.r-project.org) It’s a programming language first and a statistical analysis tool second Entirely syntax based – Similar to the SAS and SPSS syntax User can download “packages” which are similar to SPSS Modules

4 Where is R? Available for download at:  http://www.r-project.org/ http://www.r-project.org/ Works on PCs, Macs, and Linux OS Doesn’t require a ton of computer memory (I find that it runs smoother than both SAS and SPSS)

5 Why use R? It’s an open-source project aka FREE! It’s gaining traction in both industry (Google, Facebook, & Kickstarter) and academia It’s combination of programming flexibility and statistical analyses capabilities makes it one of the more powerful data analysis programs out there

6 Commonly used operators <- :Assignment operator # :Comment operator >, <, ==, | :Boolean operators +, -, *, ^ :Mathematical operators

7 Formatting your data for R: A Brief Intro R can read SAS, SPSS, STATA, txt files, and csv files I recommend that you store your data in a csv file  R can easily read csv files  Csv files can be imported to and exported from SAS and SPSS  Other statistical programs can easily read csv files I write all of my code in notepad (more habit than anything else), but R has many different GUIs

8 Formatting your data for R: Three easy steps 1) Turn your data file into a csv file 2) Use the read.csv() function  Dataset <- read.csv(‘Dataset location.csv’) 3) Dataset is now a user-defined object (in this particular case it’s a dataframe in R) that contains all of your data

9 Formatting your data for R: Common Mistakes (That I’ve made 100 times over) R cannot read \ (the backslash), thus when you write the location of your dataset you have to use either / or \\ R is case sensitive, so ‘C:\\Dataset.csv’ and ‘C:\\dataset.csv’ are not the same in R speak Always make sure you include the file extension (.csv,.txt,.whatever)!

10 Working with data in R: Things to check I always check the dimensions of my dataset  dim(Dataset) – this will return two numbers: row x column. Rows = number of cases and columns = number of variables Check the names of your dataset  names(Dataset) Check the descriptive statistics for anything out of the ordinary:  colMeans(Dataset[,1:10],na.rm=T)  sapply(Dataset[,1:10],sd,na.rm=T) Notice the brackets?

11 Working with data in R: Subsetting your dataset First, begin thinking about your dataset as a matrix  Rows = cases and columns = variables Dataset[5,1] means return the observation stored in row 5 column 1 Dataset[,1] means return all of the rows in column 1 Dataset[2,1:5] means return all of the observations in row 2 and columns 1 through 5

12 Working with data in R: Subsetting your dataset Alternatively, you can reference a column directly by using the $ operator:  Dataset$Var1 will return the entire Var1 column from Dataset What if I want to filter by some variable?  ds.Female <- Dataset[Dataset$Var11 == ‘Female’,]  The above creates a dataframe called ds.Female that filtered out any case where Var11 equaled ‘Male’

13 Working with data in R: Reverse Coding What do you do if you have some variables that need to be reverse coded?  (1 + highest scale value – Variable) is the general formula  Dataset$Var12 <- 8 – Dataset$Var10 – This does two things. 1) Creates another column in Dataset labeled Var12 and 2) Sets Var12 equal to 8 – Var10  Check with cor(Dataset$Var10, Dataset$Var12, use=‘complete.obs’)

14 Working with data in R: Internal R Functions mean(Dataset$Var1,na.rm=T) = mean(Dataset[,1],na.rm=T) sd(Dataset$Var4, na.rm=T) min(Dataset$Var5,na.rm=T) and max(Dataset$Var5, na.rm=T) cor(Dataset, use=‘complete.obs’)

15 Working with data in R: Internal R Functions modlm <- lm(Var2 ~ Var3, data=Dataset)  Ordinary Least Squares Regression, regressing variable 2 onto variable 3 modanova <- lm(Var4 ~ as.factor(Var11), data=Dataset)  OLS Regression, regressing variable 4 onto the categorical gender variable – This is an ANOVA! modanova1 <- aov(Var4 ~ as.factor(Var11), data=Dataset)  aov is R’s built in ANOVA function dif <- TukeyHSD(modanova1)  Tukey’s Honestly Significant Difference Test

16 Exporting Data from R write.csv(Dataset, ‘Location.csv’) BOOM goes the dynamite

17 Thank you!


Download ppt "ALEXANDER C. LOPILATO R: Because the names of other stat programs don’t make sense so why should this one?"

Similar presentations


Ads by Google