1 Ann Arbor ASA ‘Up and Running’ With R
Prepared by volunteers of the Ann Arbor Chapter of the American Statistical Association, in cooperation with the Department of Statistics and the Center for Statistical Consultation and Research of the University of Michigan
October 27th, 2010

2 R Class Agenda
• Brief Introduction to R
• Using R Help
• Introduction to Functions Available in R
• Working with Data
• Importing/Exporting Data
• Graphs
• Simple Models
• Writing Functions/Programming
Ann Arbor ASA (Up and Running with R) 2

3 What is R?
• R is a programming language and environment commonly used for statistical analysis and graphics
• R is open source, which means that its source code is available to all users
• R is free software; download it at http://www.r-project.org/

4 More About R
• Most statistical analysis in R is done with predefined functions.
• These functions are organized into many different packages.
• When you install R, you get the ‘base’ package plus several other packages that are loaded by default (such as ‘stats’ and ‘graphics’), so many functions are available right away.
• More specialized functions require that you install additional packages.

5 What can you do with R?
• Many statistical topics are readily available, such as linear modeling, linear mixed modeling, multivariate analysis, clustering, nonparametric methods, and classification
• R is well known for producing high-quality graphics. Simple plots are easy, and with a little more practice users can produce publication-quality graphics!

6 Time to Launch R
• Find R on your computer: Start > Statistical Software Packages > R
• Go to the File menu and click ‘New script’
• This opens the editor window, where we will type our script
• It is more convenient to type here than directly in the R console
• Try typing in both the console and the editor window

7 Data Objects in R
• Users create different data objects in R
• Data objects include variables, vectors and arrays of numbers, character strings, functions, and more complex data structures
• ‘<-’ assigns a value to a name of your choice
• Type ‘a <- 7’ in your editor window
• Submit this command by highlighting it and pressing Ctrl+R
• Practice creating different data objects and submitting them to the console

8 Data Objects in R
• Type ‘objects()’ (equivalently ‘ls()’)
• This shows that you have created the object ‘a’ during this R session
• You can recall previously submitted commands with the up/down arrow keys in the console
• You can remove this object by typing ‘rm(a)’
• Try removing some objects you created, then type ‘objects()’ to check that they are no longer listed
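A minimal sketch of this create/list/remove workflow, runnable in a fresh R session; the extra object names b and v are arbitrary illustrations.

```r
# Create a few objects in the workspace
a <- 7
b <- "hello"
v <- c(1, 2, 3)

# List the objects created so far in this session
objects()              # same result as ls()

# Remove 'a' and confirm it is no longer listed
rm(a)
"a" %in% objects()     # FALSE
```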

9 Getting Help in R
• To get help on a specific function:
  • Type ‘help(functionname)’, e.g. ‘help(hist)’
  • OR type ‘?functionname’, e.g. ‘?hist’
• If ? finds nothing (for example, when the function belongs to a package that is not loaded), search the help pages of all installed packages:
  • Type ‘??searchterm’, e.g. ‘??regression’
• Try searching for help on ‘hist’ or ‘lm’
• Two popular R resource websites:
  • Rseek.org
  • nabble.com

10 A Simple Example to Get You Started
• To set up a vector named x, use the R command:
  ‘x <- c(5, 4, 3, 6)’
• This is an assignment statement using the function c(), which creates a vector by concatenating its arguments
• Perform vector arithmetic (operations are applied element by element):
  ‘v <- 3*x - 5’
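The same two commands as a runnable snippet, with the element-by-element result worked out (3*5 - 5 = 10, 3*4 - 5 = 7, and so on):

```r
x <- c(5, 4, 3, 6)   # concatenate four numbers into a vector
v <- 3 * x - 5       # multiply and subtract element by element
v
# [1] 10  7  4 13
```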

11 R Reference Card (created by Tom Short)
• There are thousands of functions available in R, but the functions on this Reference Card are enough for a solid working knowledge
• Let’s take a minute to look at how the Reference Card is organized and try out a few of its functions!

12 Generating Sequences/Replicating Objects
• Sequences: submit the following commands
  ‘seq(-5, 5, by=.2)’
  ‘seq(length=51, from=-5, by=.2)’
• Both produce the same sequence from -5 to 5 in steps of 0.2 (51 values)
• Replications: submit the following commands
  ‘rep(x, times=5)’
  ‘rep(x, each=5)’
• Both produce a vector of length 5*length(x), but in a different order: times=5 repeats the whole vector x five times in a row, while each=5 repeats each element of x five times before moving on to the next
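A short sketch contrasting the two seq() calls and the two rep() options, using the x vector from the earlier example; it uses 2 repetitions instead of 5 only to keep the printed output short.

```r
x <- c(5, 4, 3, 6)

s1 <- seq(-5, 5, by = 0.2)
s2 <- seq(length = 51, from = -5, by = 0.2)  # 'length' partially matches 'length.out'
length(s1)                 # 51
all.equal(s1, s2)          # TRUE

rep(x, times = 2)          # 5 4 3 6 5 4 3 6  (whole vector repeated)
rep(x, each = 2)           # 5 5 4 4 3 3 6 6  (each element repeated)
```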

13 Working with Data Sets
• There are many data sets available for use in R
  • Type ‘data()’ to see what’s available
• We will work with the trees data set
  • Type ‘data(trees)’
  • This data set is now ready to use in R
• The following are useful commands:
  • ‘summary(trees)’ – summary of each variable
  • ‘dim(trees)’ – dimensions of the data set (rows, columns)
  • ‘names(trees)’ – variable names
  • ‘attach(trees)’ – attach the data set so its variables can be referred to by name (e.g. Height instead of trees$Height)
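The commands above as one runnable sketch; detach() is included so the attached copy does not linger in later work.

```r
data(trees)          # copy the built-in trees data set into the workspace
dim(trees)           # 31 3  (rows, columns)
names(trees)         # "Girth"  "Height" "Volume"
summary(trees)       # min, quartiles, median, mean and max of each variable

# attach() lets you write Height instead of trees$Height
attach(trees)
mean(Height) == mean(trees$Height)   # TRUE
detach(trees)        # undo attach() when you are done
```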

14 Extracting Data
• R has stored the trees data set as a data frame object
  • Check this by typing ‘class(trees)’
• Data frames are indexed like a matrix: dataframe[rows, columns]
  • Type ‘trees[c(1:2), 2]’ – rows 1 and 2 of the 2nd column
  • Type ‘trees[3, c("Height", "Girth")]’ – columns can also be referenced by name
  • Type ‘trees[-c(10:20), "Height"]’ – the Height variable with rows 10–20 excluded
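A runnable version of these indexing examples. The last line, using $ to pick a column by name, is an extra shorthand not shown on the slide.

```r
data(trees)
class(trees)                        # "data.frame"
trees[c(1:2), 2]                    # first two values of Height (column 2)
trees[3, c("Height", "Girth")]      # row 3, two columns selected by name
h <- trees[-c(10:20), "Height"]     # negative indices drop rows 10 through 20
length(h)                           # 31 - 11 = 20
trees$Height[1:2]                   # $ selects a column by name
```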

15 Extracting Data (continued)
• The subset() function is very useful for extracting rows that meet a logical condition. The 1st argument is the data, the 2nd argument is the logical condition
  • ‘subset(trees, Height>80)’ – rows where tree height is greater than 80
  • ‘subset(trees, Height 10)’ – subset where all tree heights 10
  • ‘subset(trees, Height 11)’ – subset where all tree heights 11
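A minimal sketch of subset() on the trees data. The combined condition on the last line is only an illustration of joining conditions with & (and) or | (or); it is not a reconstruction of the slide's other examples.

```r
data(trees)
tall <- subset(trees, Height > 80)   # keep only rows where Height exceeds 80
tall
nrow(tall)                           # number of trees taller than 80

# Illustrative combined condition: tall AND thick trees
tall_thick <- subset(trees, Height > 80 & Girth > 12)
tall_thick
```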

16 Importing Data
• The most common (and easiest) file type to import is a text file, read with the read.table() function
• R needs to be told where the file is located:
  • You can set the working directory, which tells R where your files are located, by typing ‘setwd("C:\\Users\\hicksk\\Desktop")’
  • OR you can point to the working directory through the menus: File > Change dir… and choose the location of your files
  • OR you can include the full path to your file in your read.table() command

17 Using the read.table() command
• Go to the ASA Ann Arbor Chapter’s website, look under the R Classes section, open ‘furniture.zip’ and save the files to your desktop
• Remember, we must tell R where these files are located to read them in properly:
  read.table("C:\\Users\\hicksk\\Desktop\\furniture.txt", header=TRUE, sep="")
• Use double backslashes \\ rather than a single backslash \ in the path, because R treats a single backslash inside a string as an escape character (forward slashes, as in "C:/Users/hicksk/Desktop", also work)
• Tell R whether your data has column names with header=TRUE or header=FALSE
• sep="" (the default) means fields are separated by any amount of white space

18 Using read.table() (cont’d)
• Remember, another way of specifying the file’s location is to set the working directory first and then read in the file:
  setwd("C:\\Users\\hicksk\\Desktop")
  read.table("furniture.txt", header=TRUE, sep="")
• OR we can use the menus, File > Change dir…, and point to the file’s location. We can then read the file the same way by typing ‘read.table("furniture.txt", header=TRUE, sep="")’

19 read.table(), read.csv() and Missing Values
• It is also popular to import CSV files, since Excel spreadsheets are easily saved as CSV files
• read.csv() and read.table() are very similar, but they handle missing values differently
• read.csv() automatically assigns NA to an empty field (nothing between two commas)
• read.table(), with its default white-space separator, cannot detect an empty field, so it stops with an error; you must enter NA for the missing value before reading the file into R
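A small sketch of this difference, using hypothetical three-row data passed through the text= argument (which reads from a character string instead of a file) so it runs without the furniture files.

```r
# Hypothetical data with the Area value missing from the first data row
csv_text <- "Type,Area,Cost
A,,50
B,200,80"
txt_text <- "Type Area Cost
A 50
B 200 80"

d_csv <- read.csv(text = csv_text)
d_csv$Area              # NA 200: the empty field became NA

# read.table() sees only two fields on the first data row and stops with an error
res <- tryCatch(read.table(text = txt_text, header = TRUE),
                error = function(e) e)
inherits(res, "error")  # TRUE

# Writing NA explicitly in the text data fixes the problem
d_txt <- read.table(text = "Type Area Cost
A NA 50
B 200 80", header = TRUE)
is.na(d_txt$Area[1])    # TRUE
```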

20 read.table(), read.csv() and Missing Values (cont’d)
• Let’s remove a data entry from both “furniture.txt” and “furniture.csv”
  • From the first row, erase 100 from the Area column
• Now try to read in the data from these two files using read.table() and read.csv()
• You should see that read.table() cannot read the data unless you enter a value (such as NA) for the missing entry

21 Other Options for Importing Data
• When you download R, you should have automatically obtained the foreign package
• By submitting ‘library(foreign)’, you will have many more options for importing data:
  • read.xport() (SAS transport files), read.spss() (SPSS), read.dta() (Stata), read.mtp() (Minitab)
• For more information on these options, simply submit ‘help(read.XXXX)’, e.g. ‘help(read.spss)’

22 Exporting Data
• You can export data with the write.table() function:
  ‘write.table(trees, "treesDATA.txt", row.names=FALSE, sep=",")’
  • trees specifies that we want the trees data set exported
  • "treesDATA.txt" is the name of the file to create. By default the file is written to the current working directory unless you give a full path
  • row.names=FALSE tells R not to write the row names
  • sep="," tells R to separate the fields with commas (comma-delimited)
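A round-trip sketch: write trees with the same options, then read the file back to check that nothing was lost. It writes to a temporary folder (tempdir()) instead of the working directory, purely so the example leaves no files behind.

```r
data(trees)
out_file <- file.path(tempdir(), "treesDATA.txt")  # temporary location for this sketch
write.table(trees, out_file, row.names = FALSE, sep = ",")

# Read it back: header=TRUE because write.table() wrote the column names
back <- read.table(out_file, header = TRUE, sep = ",")
dim(back)                            # 31 3
all.equal(back$Volume, trees$Volume) # TRUE: values survived the round trip
```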

23 Furniture Data Set
• Let’s assign a name to the furniture data set as we read it in, so we can do some analysis:
  furn <- read.table("furniture.txt", sep="", header=TRUE)
• To get a better understanding of our data set, use these commands:
  dim(furn)
  summary(furn)
  names(furn)
  attach(furn)

24 Graphs in R Using the Furniture Data
• R can produce both very simple and very complex graphs
• We will only get a brief introduction today, but I encourage you to investigate further
• Let’s start by making a simple scatter plot of the Area and Cost variables from our furniture data set:
  plot(Area, Cost, main="Area vs Cost", xlab="Area", ylab="Cost")
• We have told R to put Area on the x-axis and Cost on the y-axis, and provided a title and axis labels

25 Graphs in R
• Let’s look at the distribution of our variables using some different graphs in R:
  • hist(Area) – histogram of Area
  • hist(Cost) – histogram of Cost
  • boxplot(Cost ~ Type) – boxplot of Cost by Type
• We can make the boxplot much prettier:
  boxplot(Cost ~ Type, main="Boxplot of Cost by Type", col=c("orange", "green", "blue"), xlab="Type", ylab="Cost")

26 Graphs in R
• We can also look at a scatter plot matrix of all variables in a data set with the pairs() function:
  pairs(furn)
• Or we can look at the correlation/covariance matrix of the numeric variables (here columns 2 and 3):
  cor(furn[,c(2:3)])
  cov(furn[,c(2:3)])
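Since the furniture files may not be at hand, here is the same cor()/cov() pattern on columns 2 and 3 of the built-in trees data (Height and Volume). The final check uses stats::cov2cor() to show how the two matrices are related.

```r
data(trees)
r <- cor(trees[, c(2:3)])   # correlation matrix of Height and Volume
s <- cov(trees[, c(2:3)])   # covariance matrix of the same two columns
r
s
# A correlation is a covariance divided by the product of the two standard deviations
all.equal(r, cov2cor(s))    # TRUE
```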

27 Graphs in R/Simple Models
• Let’s perform a simple linear regression using the furniture data set:
  m1 <- lm(Cost ~ Area)
  summary(m1)
  coef(m1)
  fitted.values(m1)
  residuals(m1)
• We can also plot the residuals against the fitted values:
  plot(fitted.values(m1), residuals(m1))
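The same extractor functions applied to a simple regression on the built-in trees data, so the code runs without the furniture files. Passing data = trees instead of calling attach() is a design choice here that keeps the variable lookup explicit.

```r
data(trees)
m <- lm(Volume ~ Girth, data = trees)   # simple linear regression
coef(m)                                 # intercept and slope
head(fitted.values(m))                  # predicted Volume for the first trees
head(residuals(m))                      # observed minus fitted
# Residuals are exactly observed minus fitted values
all.equal(residuals(m), trees$Volume - fitted.values(m))
```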

28 Graphs in R/Simple Models
• Let’s continue with our scatter plot of Area and Cost:
  plot(Area, Cost, main="Cost Regression Example", xlab="Area", ylab="Cost")
  abline(lm(Cost ~ Area), col=3, lty=1)
  lines(lowess(Area, Cost), col=3, lty=2)
• Now let’s interactively add a legend:
  legend(locator(1), c("Linear", "Lowess"), lty=c(1,2), col=3)
• Click on your graph to place the legend where you wish!

29 Graphs in R/Simple Models
• Now let’s identify individual points on the graph:
  identify(Area, Cost, row.names(furn))
  • Click on points to label them with their row names; this makes it easy to identify outliers
• We can use the locator() function to read off coordinates and quantify differences between the regression fit and the lowess line:
  locator(2)
• Now let’s compare the predicted values of Cost when Area is equal to 250
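One non-interactive way to make that comparison: predict() for the linear fit, and approx() to interpolate the lowess curve at 250. The Area and Cost values below are hypothetical stand-ins for the furniture data, not the class data set.

```r
# Hypothetical Area/Cost values standing in for the furniture data
Area <- c(100, 150, 180, 220, 260, 300, 340, 400)
Cost <- c(210, 290, 350, 430, 540, 600, 690, 820)

m1 <- lm(Cost ~ Area)
lin_250 <- predict(m1, newdata = data.frame(Area = 250))  # linear fit at Area = 250

lw <- lowess(Area, Cost)                       # smoothed curve, x sorted increasing
low_250 <- approx(lw$x, lw$y, xout = 250)$y    # interpolate the curve at Area = 250

c(linear = unname(lin_250), lowess = low_250)
```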

30 Multiple Regression
• Now let’s fit a multiple regression using both Area and Type as predictors in the model:
  m2 <- lm(Cost ~ Area + Type)
  summary(m2)
• Now let’s see whether this larger model fits significantly better than the simple model, using ANOVA:
  anova(m1, m2)
• The ANOVA table compares the two nested regression models by testing the null hypothesis that Type does not need to be in the model. Since the p-value < .05, we have evidence that Type is an important predictor.
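The same nested-model comparison on the built-in trees data, as a runnable sketch: the smaller model must be a special case of the larger one, and anova() reports an F test for the added term.

```r
data(trees)
small <- lm(Volume ~ Girth, data = trees)            # simpler model
big   <- lm(Volume ~ Girth + Height, data = trees)   # adds Height; small is nested in big
a <- anova(small, big)
a                                 # F test for adding Height
a[["Pr(>F)"]][2]                  # p-value of the comparison
```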

31 Writing Functions
• You can easily write your own programs and functions in R
• Type in the following function named f1:
  f1 <- function(m, n) {
    result <- m + n
    return(result)
  }
• Now type ‘f1(3, 5)’; you should see that your function returns 8, the sum of 3 and 5

32 Working with If-Then Statements
• Here’s an example of how if-then works in R:
• You’ll see that, since 10 > 5, it printed “GO BLUE”
• You can tell R to do multiple things using the following structure:
  if (logical condition) {do this; and this; and this}
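A minimal example consistent with the description above (10 > 5, so “GO BLUE” is printed); the slide’s own code is not shown, so this is an illustration rather than the original, and the variable name msg is arbitrary.

```r
if (10 > 5) print("GO BLUE")      # condition is TRUE, so the message is printed

# Several statements go inside braces, one per line (or separated by ;)
if (10 > 5) {
  msg <- "GO BLUE"
  print(msg)
}
```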

33 If-Else Conditions
• We can make if-then statements slightly more complex using if-else conditions. Here’s an example:
  if (4 > 5) {
    print("Happy Halloween")
    print("BOO")
  } else {
    print("Merry XMAS")
    print("HO HO HO")
  }
• Since 4 > 5 is FALSE, R runs the else branch and prints “Merry XMAS” and “HO HO HO”

34 For Loop/While Loop
• For loops can be quite helpful when writing functions. Here’s an example, which prints the values 2 through 6:
  for (i in 1:5) {
    print(i + 1)
  }
• While loops are also quite handy. Here’s an example:
  f2 <- function(x) {
    while (x < 5) {
      x <- x + 1
      print(x)
    }
  }
  f2(-5)
• f2(-5) keeps adding 1 while x is less than 5, printing the integers from -4 up to 5

35 Practice Problem #1
• Create a sequence that starts at 0 and goes to 5 with a step of 0.5
• Replicate ‘a b c’ 3 times
• Replicate ‘a’ 3 times, ‘b’ 3 times, and ‘c’ 3 times in one command

36 Practice Problem #2
• Make a histogram of the Girth variable from the trees data set. Include a title.
• Make a boxplot of the Height variable from the trees data set. Color it blue and label your axes.
• Make a scatter plot of Girth and Height. Add the regression line.

37 Practice Problem #3
• Create a simple linear model with Girth as the predictor and Height as the response. Extract the coefficients.
• Now add Volume to the model. How can we tell whether this model is preferred to the simpler model?

38 Practice Problem #4
• Fix x at a number smaller than 5. Use a while loop to create a sequence that starts at x and increases by 2 until you reach 20.
• Create a function that returns the product of any two numbers.

39 Thank you for your attention!
Additional R Resources:
• R project home: http://www.r-project.org
• R documentation: http://www.r-project.org/other-docs.html
• R help forum: http://www.nabble.com/R-help-f13820.html
• R Journal: http://journal.r-project.org/
• R Graphical Gallery: http://addictedtor.free.fr/graphiques/
• R Graphical Manual: http://bm2.genes.nig.ac.jp/RGM2/
• R Seek: http://www.rseek.org/

40 Acknowledgements/References
• Thank you to Brady West for allowing the use of his R introductory materials.
• http://www.r-project.org
• http://addictedtor.free.fr/graphiques/

