Ann Arbor ASA ‘Up and Running’ With R


Ann Arbor ASA ‘Up and Running’ With R
Prepared by volunteers of the Ann Arbor Chapter of the American Statistical Association, in cooperation with the Department of Statistics and the Center for Statistical Consultation and Research of the University of Michigan
October 27th, 2010

R Class Agenda
 Brief Introduction to R
 Using R Help
 Introduction to Functions Available in R
 Working with Data
 Importing/Exporting Data
 Graphs
 Simple Models
 Writing Functions/Programming
Ann Arbor ASA (Up and Running with R)

What is R?
 R is a computing language commonly used for statistical analysis
 R is open source, which means that the source code is available to all users
 R is a free software package that anyone can download

More About R
 Most statistical analysis is done using predefined functions in R.
 These functions are available in many different packages.
 When you download R, you have access to many functions from the ‘base’ package.
 More advanced functions will require that you download other packages.

What can you do with R?
 Many topics in statistics are readily available, such as linear modeling, linear mixed modeling, multivariate analysis, clustering, nonparametric methods, and classification
 R is well known for producing high-quality graphics. Simple plots are easy, and with a little more practice users can produce publishable graphics!

Time to Launch R
 Find R on your computer: Start>Statistical Software Packages>R
 Go to the File menu and click ‘New script’
 This opens the editor window where we will type our script
 It is more convenient to type here than in your workspace (the R Console window)
 Try typing in both the workspace and the editor window

Data Objects in R
 Users create different data objects in R
 Data objects include variables, arrays of numbers, character strings, functions and other more complicated data structures
 The assignment operator ‘<-’ gives a data object a name of your choice
 Type ‘a<-7’ in your editor window
 Submit this command by highlighting it and pressing Ctrl+R
 Practice creating different data objects and submit them to the workspace

Data Objects in R
 Type ‘objects()’
 This allows you to see that you have created the object ‘a’ during this R session
 You can view previously submitted commands by using the up/down arrow keys on your keyboard
 You can remove this object by typing ‘rm(a)’
 Try removing some objects you created and then type ‘objects()’ to see if they are still listed
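The steps on the two slides above can be sketched as follows. This is a minimal illustration; the object name b and its value are made up for the example.

```r
# Create two data objects with the assignment operator
a <- 7
b <- "hello"      # hypothetical second object, for illustration only

objects()         # lists the objects created in this session

rm(a)             # remove the object a
objects()         # a is no longer listed, b still is
```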

Getting Help in R
 To get help on any specific function:
   Type ‘help(name of function)’, for example ‘help(hist)’
   OR type ‘?name of function’, for example ‘?hist’
 Sometimes the function belongs to a package that is not loaded, or you do not know its exact name:
   Type ‘??name of function’ to search the help pages of all installed packages
 Try searching for help on ‘hist’ or ‘lm’
 Two popular R resource websites:
   Rseek.org
   nabble.com

A Simple Example to Get You Started
 To set up a vector named x use the R command:
   ‘x<-c(5,4,3,6)’
 This is an assignment statement using the function c(), which creates a vector by concatenating its arguments
 Perform vector/matrix arithmetic:
   ‘v<-3*x - 5’
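Run as a script, the two commands above look like this. Arithmetic on a vector is applied to each element in turn.

```r
x <- c(5, 4, 3, 6)   # concatenate four numbers into a vector
v <- 3 * x - 5       # multiply each element by 3, then subtract 5
v                    # 10 7 4 13
```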

R Reference Card (created by Tom Short)
 There are thousands of functions available in R; the Reference Card covers enough of them to give you a strong working knowledge
 Let’s take a minute to look at the organization of the Reference Card and try out a few of the functions available!

Generating Sequences/Replicating Objects
 Sequences: submit the following commands
   ‘seq(-5, 5, by=.2)’
   ‘seq(length=51, from=-5, by=.2)’
   Both produce a sequence from -5 to 5 with a distance of 0.2 between elements
 Replications: submit the following commands
   ‘rep(x, times=5)’
   ‘rep(x, each=5)’
   Both replicate x 5 times, but in a different order: times=5 repeats the whole vector 5 times, while each=5 repeats each element 5 times in turn
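A short sketch of the difference between the two forms of seq() and rep(). The two-element vector x is an assumption chosen to keep the output small.

```r
# Two ways to build the same sequence from -5 to 5 in steps of 0.2
s1 <- seq(-5, 5, by = .2)
s2 <- seq(length = 51, from = -5, by = .2)
length(s1)            # 51
all.equal(s1, s2)     # TRUE

# times= repeats the whole vector; each= repeats every element in turn
x <- c(1, 2)          # small illustrative vector
r_times <- rep(x, times = 3)   # 1 2 1 2 1 2
r_each  <- rep(x, each = 3)    # 1 1 1 2 2 2
r_times
r_each
```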

Working with Data Sets
 There are many data sets available for use in R
 Type ‘data()’ to see what’s available
 We will work with the trees data set
 Type ‘data(trees)’
 This data set is now ready to use in R
 The following are useful commands:
   ‘summary(trees)’ – summary of variables
   ‘dim(trees)’ – dimension of data set
   ‘names(trees)’ – see variable names
   ‘attach(trees)’ – attach the variable names for use in R
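The commands above, collected into one script. The call to attach() is left out here so the example leaves the search path unchanged; columns are reached with the $ operator instead.

```r
data(trees)            # load the built-in trees data set

summary(trees)         # summary of each variable
d <- dim(trees)        # number of rows and columns
d                      # 31 3
nm <- names(trees)     # variable names
nm                     # "Girth" "Height" "Volume"

mean(trees$Height)     # a column can be used without attach() via $
```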

Extracting Data
 R has saved the data set trees as a data frame object
 Check this by typing ‘class(trees)’
 R stores this data in matrix row/column format: data.frame[rows,columns]
 Type ‘trees[c(1:2),2]’ – we see the first 2 rows of the 2nd column
 Type ‘trees[3,c("Height","Girth")]’ – columns can also be referenced by name
 Type ‘trees[-c(10:20),"Height"]’ – the variable Height with rows 10 through 20 omitted
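The indexing commands above can be tried together. The object names first_two, row3 and kept are made up so the results can be inspected afterwards.

```r
data(trees)
class(trees)                                   # "data.frame"

first_two <- trees[c(1:2), 2]                  # rows 1-2 of column 2 (Height)
row3      <- trees[3, c("Height", "Girth")]    # row 3, two columns chosen by name
kept      <- trees[-c(10:20), "Height"]        # Height without rows 10 to 20

length(first_two)    # 2
names(row3)          # "Height" "Girth"
length(kept)         # 20, because 11 of the 31 rows were dropped
```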

Extracting Data (continued)
 The subset() command is very useful for extracting data in a logical manner. The 1st argument is the data, the 2nd argument is the logical subset requirement
   ‘subset(trees, Height>80)’ – subset where all tree heights >80
   ‘subset(trees, Height 10)’ – subset where all tree heights 10
   ‘subset(trees, Height 11)’ – subset where all tree heights 11
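Conditions passed to subset() can also be combined with & (and) and | (or). The sketch below shows the pattern; the variables and cut-off values in the combined conditions are illustrative assumptions, not necessarily the ones used in the class.

```r
data(trees)

# Single condition, as on the slide
tall <- subset(trees, Height > 80)

# Combined conditions (thresholds chosen only for illustration)
both   <- subset(trees, Height < 70 & Girth > 10)   # both must hold
either <- subset(trees, Height < 70 | Girth > 11)   # at least one must hold

nrow(tall)
nrow(both)
nrow(either)
```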

Importing Data
 The most common (and easiest) file to import is a text file, using the read.table() command
 R needs to be told where the file is located
 You can set the working directory, which tells R where all your files are located, by typing ‘setwd("C:\\Users\\hicksk\\Desktop")’
 OR you can physically point to the working directory by going to File>Change dir… and choosing the location of your files
 OR you can include the physical location of your file in your read.table() command

Using the read.table() command
 Go to the Ann Arbor ASA Chapter’s website, look under the R Classes section, open ‘furniture.zip’ and save the files to your desktop
 Remember we must tell R where these files are located to read them in properly
   read.table("C:\\Users\\hicksk\\Desktop\\furniture.txt",header=TRUE,sep="")
 It is important to use double backslashes \\ rather than a single backslash \ in the file path
 Tell R whether your data has column names with header=TRUE or header=FALSE

Using read.table() (cont’d)
 Remember, another way of specifying the file’s location is to set the working directory first and then read in the file
   setwd("C:\\Users\\hicksk\\Desktop")
   read.table("furniture.txt",header=TRUE,sep="")
 OR we had the option of physically pointing to the location by going to File>Change dir… and choosing the file’s location. We would then be able to read the file as above by typing ‘read.table("furniture.txt",header=TRUE,sep="")’
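The two ways of pointing R at a file can be compared without the class files. In this sketch a small stand-in file named demo.txt, with made-up columns and values, is written to a temporary folder first; it is not the furniture data.

```r
# Stand-in file: whitespace-delimited text with a header row (values are made up)
demo_dir <- tempdir()
writeLines(c("Type Area Cost",
             "Chair 100 250",
             "Table 150 400"),
           file.path(demo_dir, "demo.txt"))

# Option 1: give the full location inside the read.table() call
d1 <- read.table(file.path(demo_dir, "demo.txt"), header = TRUE, sep = "")

# Option 2: set the working directory first, then use the bare file name
old_dir <- setwd(demo_dir)
d2 <- read.table("demo.txt", header = TRUE, sep = "")
setwd(old_dir)            # restore the previous working directory

identical(d1, d2)         # both routes read the same data
```

file.path() builds the location with forward slashes, which R accepts on Windows as well, so no doubled backslashes are needed.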

read.table(), read.csv() and Missing Values
 It is also popular to import csv files, since Excel files are easily converted to csv files
 read.csv() and read.table() are very similar, although they handle missing values differently
 read.csv() automatically assigns ‘NA’ to blank entries in numeric columns
 read.table() with sep="" cannot detect a blank entry in a whitespace-delimited file and stops with an error, so you must assign ‘NA’ to missing values before reading the file into R

read.table(), read.csv() and Missing Values (cont’d)
 Let’s remove a data entry from both "furniture.txt" and "furniture.csv"
 From the first row, erase 100 from the Area column
 Now try to read in the data from these two files using read.table() and read.csv()
 You should see that you cannot read the data in using the read.table() command unless you input an entry for the missing value
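The same experiment can be reproduced on a small stand-in. The two files below are hypothetical miniatures with made-up columns and values, not the class files, and tryCatch() is used only so the script keeps running after the error.

```r
txt_file <- tempfile(fileext = ".txt")
csv_file <- tempfile(fileext = ".csv")

# The Area entry of the first data row is left blank in both files
writeLines(c("Type Area Cost", "Chair  250", "Table 150 400"), txt_file)
writeLines(c("Type,Area,Cost", "Chair,,250", "Table,150,400"), csv_file)

# read.csv() turns the blank numeric entry into NA
csv_data <- read.csv(csv_file)
csv_data$Area

# read.table() sees only two fields on that row and stops with an error
txt_try <- tryCatch(read.table(txt_file, header = TRUE, sep = ""),
                    error = function(e) conditionMessage(e))
txt_try

# After writing NA in place of the blank, read.table() succeeds
writeLines(c("Type Area Cost", "Chair NA 250", "Table 150 400"), txt_file)
txt_data <- read.table(txt_file, header = TRUE, sep = "")
txt_data$Area
```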

Other Options for Importing Data
 When you download R, you should have automatically obtained the foreign package
 By submitting ‘library(foreign)’, you will have many more options for importing data:
   read.xport(), read.spss(), read.dta(), read.mtp()
 For more information on these options, simply submit ‘help(read.XXXX)’

Exporting Data
 You can export data by using the write.table() command
   ‘write.table(trees, "treesDATA.txt", row.names=FALSE, sep=",")’
 The first argument specifies that we want the trees data set exported
 The second argument is the name of the file to be written. By default the file is written to the current working directory unless you give a location
 row.names=FALSE tells R that we do not wish to preserve the row names
 sep="," tells R to write the file comma delimited
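A quick way to check an export is to read the file back in. This sketch writes to a temporary folder rather than the working directory, which is an assumption made to keep the example self-contained.

```r
data(trees)
out_file <- file.path(tempdir(), "treesDATA.txt")

# Export without row names, comma delimited
write.table(trees, out_file, row.names = FALSE, sep = ",")

readLines(out_file, n = 2)          # header line plus the first data row

# A comma-delimited file with a header can be read back with read.csv()
trees_back <- read.csv(out_file)
dim(trees_back)                     # same dimensions as trees
```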

Furniture Data Set
 Let’s assign a name to the furniture data set as we read it in so we can do some analysis
   furn<-read.table("furniture.txt",sep="",header=TRUE)
 To get a better understanding of our data set, try some useful commands:
   dim(furn)
   summary(furn)
   names(furn)
   attach(furn)

Graphs in R Using the Furniture Data
 R can produce both very simple and very complex graphs
 We will only get a brief introduction today but I encourage you to investigate further
 Let’s start by making a simple scatter plot of the Area and Cost variables from our furniture data set
   plot(Area,Cost,main="Area vs Cost", xlab="Area",ylab="Cost")
 We have told R to put Area on the x-axis and Cost on the y-axis, and provided a title and axis labels

Graphs in R
 Let’s look at the distribution of our variables using some different graphs in R
   hist(Area) – histogram of Area
   hist(Cost) – histogram of Cost
   boxplot(Cost ~ Type) – boxplot of Cost by Type
 We can make the boxplot much prettier
   boxplot(Cost ~ Type, main="Boxplot of Cost by Type", col=c("orange", "green", "blue"), xlab="Type", ylab="Cost")
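If the furniture files are not at hand, the same plotting calls can be tried on the built-in trees data. This is only a stand-in sketch: trees has no grouping variable, so a hypothetical one named size is created from Girth with arbitrary cut-offs.

```r
data(trees)

# Hypothetical grouping variable, created only so the boxplot has categories
size <- cut(trees$Girth, breaks = c(0, 11, 14, Inf),
            labels = c("small", "medium", "large"))

pdf(NULL)   # draw to a null device; drop this line and dev.off() to plot on screen
h <- hist(trees$Volume, main = "Histogram of Volume", xlab = "Volume")
boxplot(trees$Volume ~ size, main = "Boxplot of Volume by Size",
        col = c("orange", "green", "blue"), xlab = "Size", ylab = "Volume")
invisible(dev.off())
```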

Graphs in R
 We can also look at a scatter plot matrix of all variables in a data set by using the pairs() function
   pairs(furn)
 Or we can look at a correlation/covariance matrix of the numeric variables
   cor(furn[,c(2:3)])
   cov(furn[,c(2:3)])
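The same three calls, sketched on the built-in trees data as a stand-in for the furniture data. All three columns of trees are numeric, so no column selection is needed.

```r
data(trees)

pdf(NULL)                 # null device; remove this line and dev.off() to see the plot
pairs(trees)              # scatter plot matrix of Girth, Height and Volume
invisible(dev.off())

cor_mat <- cor(trees)     # correlation matrix
cov_mat <- cov(trees)     # covariance matrix
round(cor_mat, 2)
```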

Graphs in R/Simple Models
 Let’s perform a simple linear regression using the furniture data set
   m1<-lm(Cost ~ Area)
   summary(m1)
   coef(m1)
   fitted.values(m1)
   residuals(m1)
 We can also plot the residuals against the fitted values
   plot(fitted.values(m1), residuals(m1))
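The same workflow can be run on the built-in trees data when the furniture file is unavailable. The choice of Volume as the response and Girth as the predictor is an assumption made for this sketch, and the model is named fit to avoid confusion with m1 above.

```r
data(trees)

# Simple linear regression; data= is used instead of attach()
fit <- lm(Volume ~ Girth, data = trees)

summary(fit)                  # coefficient table, R-squared, and so on
coef(fit)                     # intercept and slope
head(fitted.values(fit))      # first few fitted values
head(residuals(fit))          # first few residuals

pdf(NULL)                     # null device; remove to see the plot on screen
plot(fitted.values(fit), residuals(fit))
invisible(dev.off())
```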

Graphs in R/Simple Models
 Let’s continue with our scatter plot of Area and Cost
   plot(Area, Cost, main="Cost Regression Example", xlab="Area", ylab="Cost")
   abline(lm(Cost~Area), col=3, lty=1)
   lines(lowess(Cost~Area), col=3, lty=2)
 Now let’s interactively add a legend
   legend(locator(1), c("Linear", "Lowess"), lty=c(1,2), col=3)
 You can point to your graph and place the legend where you wish!

Graphs in R/Simple Models
 Now let’s identify different points on the graph
   identify(Area, Cost, row.names(furn))
 This makes it easy to identify outliers
 We can use the locator() command to quantify differences between the regression fit and the lowess line
   locator(2)
 Now let’s compare predicted values of Cost when Area is equal to 250
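One non-interactive way to make this kind of comparison is to compute both predictions directly. The sketch below uses the built-in trees data as a stand-in, with Girth = 12 as an arbitrary illustrative value; interpolating the lowess curve with approx() is an assumption of this example, not a step from the class.

```r
data(trees)

# Prediction from the straight-line fit
fit <- lm(Volume ~ Girth, data = trees)
lin_pred <- predict(fit, newdata = data.frame(Girth = 12))

# Prediction from the lowess curve, linearly interpolated at the same value
lw <- lowess(trees$Girth, trees$Volume)
low_pred <- approx(lw$x, lw$y, xout = 12, ties = mean)$y

c(linear = unname(lin_pred), lowess = low_pred)
```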

Multivariate Analysis
 Now let’s fit a multiple regression using both Area and Type as predictors in the model
   m2<-lm(Cost ~ Area + Type)
   summary(m2)
 Now let’s see if the larger model is significantly better than the simple model by using ANOVA
   anova(m1, m2)
 The ANOVA table compares the two nested regression models by testing the null hypothesis that the Type predictor is not needed in the model. Since the p-value<.05, we have evidence to conclude that Type is an important predictor.
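The nested-model comparison can be sketched on the built-in trees data. The models below are stand-ins (Volume as the response, Height as the added predictor), and the names small, large and p_value are made up for the example.

```r
data(trees)

small <- lm(Volume ~ Girth, data = trees)            # simpler model
large <- lm(Volume ~ Girth + Height, data = trees)   # adds one predictor

tab <- anova(small, large)    # F test of the null hypothesis that Height is not needed
tab

p_value <- tab[2, "Pr(>F)"]   # p-value for the added predictor
p_value
```

A p-value below the chosen significance level, such as .05, is read as evidence that the added predictor improves the model.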

Writing Functions
 You can easily write your own programs and functions in R
 Type in the following function named f1:
   f1<-function(m,n) {
     result<-m + n
     return(result)
   }
 Now type ‘f1(3,5)’ and you should see that your function returns 8, the sum of the values 3 and 5 as specified

Working with If-Then Statements
 Here’s an example of how if-then works in R:
 You’ll see since 10>5, it printed "GO BLUE"
 You can tell R to do multiple items using the following structure
   if (logical condition) {do this and this and this}
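A statement consistent with the description above might look like the following. This is a reconstruction for illustration, not necessarily the exact code shown in the class, and the second message is made up.

```r
# Single action: the condition 10 > 5 is TRUE, so the message is printed
if (10 > 5) print("GO BLUE")

# Several actions go inside braces
if (10 > 5) {
  print("GO BLUE")
  print("10 is greater than 5")   # illustrative second action
}
```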

If-Else Conditions
 We can make if-then statements slightly more complex using if-else conditions. Here’s an example:
   if(4>5){
     print("Happy Halloween")
     print("BOO")
   } else{
     print("Merry XMAS")
     print("HO HO HO")
   }

For Loop/While Loop
 For loops can be quite helpful when writing functions. Here’s an example:
   for (i in 1:5){
     print(i+1)
   }
 While loops are also quite handy. Here’s an example:
   f2<-function (x){
     while(x<5){
       x<-x+1
       print(x)
     }
   }
   f2(-5)

Practice Problem #1
 Create a sequence that starts at 0 and goes to 5 with a step of 0.5
 Replicate ‘a b c’ 3 times
 Replicate ‘a’ 3 times, ‘b’ 3 times, ‘c’ 3 times in one command

Practice Problem #2
 Make a histogram of the "Girth" variable from the ‘trees’ data set. Include a title.
 Make a boxplot of the "Height" variable from the ‘trees’ data set. Color it blue and label your axes.
 Make a scatter plot of Girth and Height. Add the regression line.

Practice Problem #3
 Create a simple linear model with Girth as the predictor and Height as the response. Extract the coefficients.
 Now add Volume to the model. How can we tell if this model is preferred to the simpler model?

Practice Problem #4
 Fix x at a number smaller than 5. Use a ‘while loop’ to create a sequence that starts at x and increases by 2 until you reach 20.
 Create a function that will return the product of any two numbers.

Thank you for your attention!
Additional R Resources:
 R project home
 R documentation
 R help forum
 R Journal
 R Graphical Gallery
 R Graphical Manual
 R Seek

Acknowledgements/References
 Thank you to Brady West for allowing the use of his R introductory materials.