Using R Harry R. Erwin, PhD School of Computing and Technology University of Sunderland
Resources: Crawley, M.J. (2005) Statistics: An Introduction Using R. Wiley. Gonick, L. and Smith, W. (1993) The Cartoon Guide to Statistics. HarperResource (for fun).
Lecture Outline: R as a statistical calculator; creating data; graphing and plotting; statistical distributions; dataframes; summarising data.
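Using R: We will work through a few examples of statistical calculations and creating data. y <- c(3,7,9,11) builds a vector from listed values. z <- scan() reads values typed at the console. a <- 1:6 and b <- seq(0.5,0.0,-0.1) generate sequences. rep(value,count) creates a vector containing value repeated count times. gl(upTo,repeats) generates a factor with levels 1 to upTo, each repeated repeats times.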
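The data-creation calls on this slide can be tried with a short sketch like the one below. The values passed to rep() and gl() are arbitrary choices for illustration, and scan() is left out because it waits for console input.

```r
# Creating vectors by hand, by sequence, and by repetition
y <- c(3, 7, 9, 11)          # concatenate explicit values
a <- 1:6                     # integer sequence 1, 2, ..., 6
b <- seq(0.5, 0.0, -0.1)     # descending sequence in steps of 0.1
r <- rep(9, 3)               # the value 9 repeated 3 times (arbitrary example)
f <- gl(2, 3)                # factor with 2 levels, each repeated 3 times

# R as a statistical calculator
mean(y)                      # arithmetic mean of y
var(y)                       # sample variance of y
length(b)                    # number of elements in b
levels(f)                    # the factor levels
```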
Graphics: Examples of plot(). Use ?par for help on graphics parameters.
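As one possible illustration of plot() and par(), the sketch below draws the same invented data twice, first with defaults and then with a few common arguments. The data, colour, and labels are assumptions chosen only for the example.

```r
# Invented data for illustration
x <- seq(0, 10, 0.5)
y <- x^2

old <- par(mfrow = c(1, 2))   # two panels side by side; par() returns the old settings
plot(x, y)                    # scatterplot with default settings
plot(x, y, type = "l", col = "blue",
     xlab = "x", ylab = "x squared", main = "Line plot")
par(old)                      # restore the previous graphics parameters
```

Saving the value returned by par() and passing it back afterwards is a common way to avoid leaving the graphics device in a changed state.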
Working with Dataframes: R works with data in dataframes, objects with rows and columns. Each row is an observation or a measurement. Each column contains the values of a variable. Variable types include numbers, text (factors), dates, or logical variables. Columns have names. Rows have row.names.
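A minimal sketch of a dataframe with one column of each variable type mentioned above. The column names, row names, and values are all invented for illustration.

```r
# A small dataframe: each row is one observation, each column one variable
obs <- data.frame(
  Height  = c(1.2, 3.4, 2.2),                        # numeric
  Soil    = factor(c("clay", "sand", "clay")),       # text stored as a factor
  Sampled = as.Date(c("2005-03-01", "2005-03-02",
                      "2005-03-05")),                # dates
  Damp    = c(TRUE, FALSE, TRUE),                    # logical
  row.names = c("plot1", "plot2", "plot3")
)

str(obs)          # structure: one line per column with its type
names(obs)        # column names
row.names(obs)    # row names
```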
Reading a Dataframe: worms <- read.table("worms.txt", header = TRUE, row.names = 1); attach(worms); names(worms). If the row.names or the names are bad, you can set them to new values. Typing worms prints the whole dataframe; summary(worms) summarises each column.
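Since worms.txt is not reproduced here, the sketch below writes a tiny stand-in file to a temporary location and reads it back the same way. The field names and values are invented, and the replacement column names at the end are hypothetical.

```r
# Write a small whitespace-separated file so the example is self-contained
path <- tempfile(fileext = ".txt")
writeLines(c("Field Area Slope",
             "North 3.6 11",
             "South 5.1 2",
             "East 2.8 3"), path)

# First line holds the column names; first column supplies the row names
fields <- read.table(path, header = TRUE, row.names = 1)
names(fields)        # column names read from the header
row.names(fields)    # row names taken from the first column
summary(fields)      # per-column summary statistics

# Names can be reassigned if the ones read from the file are unsuitable
names(fields) <- c("AreaHa", "SlopeDeg")
```

This sketch uses fields$Area style access rather than attach(); either works, but explicit access avoids name clashes with other objects in the workspace.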
Selecting Rows or Columns: worms[,1:3] selects all the rows and columns 1-3. worms[5:15,] selects the middle rows. worms[Area>3 & Slope < 3,] selects rows by logical tests. To sort a dataframe, you have to designate the columns to be shown and the column to base the sort on: worms[order(worms[,1]),1:6]. Example of a reverse sort.
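The selection patterns above can be sketched on a small invented dataframe. The Area and Slope column names follow the slide; the Damp column, the row names, and all values are assumptions. Wrapping order() in rev() is one possible way to get a reverse sort.

```r
# Invented stand-in for the worms data
fields <- data.frame(
  Area  = c(3.6, 5.1, 2.8, 3.9, 2.2),
  Slope = c(11, 2, 3, 1, 0),
  Damp  = c(FALSE, FALSE, FALSE, TRUE, TRUE),
  row.names = c("North", "South", "East", "West", "Marsh")
)

fields[, 1:2]                                      # all rows, columns 1-2
fields[2:4, ]                                      # rows 2-4, all columns
flat <- fields[fields$Area > 3 & fields$Slope < 3, ]  # rows passing both tests
flat

fields[order(fields[, 1]), 1:3]                    # ascending sort on column 1
rsorted <- fields[rev(order(fields[, 1])), 1:3]    # reverse (descending) sort
rsorted
```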
Vector Operations: * is element-wise multiplication of vectors. –If they are not the same length, the shorter vector is repeated as needed. To join vectors, use the c() function (see ?c). Subscripting can be based on a number, a vector, or a logical test. To drop an element, subscript with a minus sign in front. Vectors can be combined with cbind() and rbind().
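A short sketch of these vector operations, using two invented vectors of different lengths so that the repetition of the shorter one is visible.

```r
u <- c(1, 2, 3, 4)
v <- c(10, 20)

p <- u * v         # element by element; v is repeated to 10, 20, 10, 20
p

w <- c(u, v)       # join two vectors into one of length 6
w[2]               # subscript by a single number
w[c(1, 3)]         # subscript by a vector of positions
w[w > 3]           # subscript by a logical test
w[-1]              # drop the first element

cbind(u, p)        # bind as columns: a 4 x 2 matrix
rbind(u, p)        # bind as rows: a 2 x 4 matrix
```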
Arrays, etc.: Arrays are like vectors or dataframes with multiple dimensions. Lists can be used to combine data of different types. val <- list(varname=value,…). Although vectors are subscripted using [], lists are subscripted with [[]]. Factors are special: –citizen <- factor(c("US","US","UK")). Examples from book.
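The sketch below illustrates arrays, lists, and factors in turn. The array dimensions and the list contents are invented; the citizen factor is the one from the slide.

```r
# An array is a vector with a dimension attribute
arr <- array(1:24, dim = c(2, 3, 4))   # 2 rows, 3 columns, 4 layers
arr[2, 3, 1]                           # one element, indexed on all three dimensions

# A list can hold elements of different types
val <- list(name = "sample", counts = c(4, 8, 15), ok = TRUE)
val[["counts"]]      # double brackets extract the element itself
val$counts           # shorthand for a named element
val["counts"]        # single brackets return a list of length one

# A factor stores text as integer codes plus a set of levels
citizen <- factor(c("US", "US", "UK"))
levels(citizen)      # the distinct levels, in sorted order
as.integer(citizen)  # the underlying integer codes
```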
Sorting and Ordering: Never sort a dataframe column on its own, because the other columns are not sorted with it. So don't use sort(). Instead use order(), since it leaves the dataframe unmodified. It returns a vector of subscripts, not values; you can then subscript the dataframe with that vector to show it in the new order.
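The difference between sort() and order() can be seen in a small sketch; the dataframe here is invented for illustration.

```r
fields <- data.frame(
  Name = c("North", "South", "East"),
  Area = c(3.6, 5.1, 2.8)
)

sort(fields$Area)            # sorted values only; the Name column is left behind

idx <- order(fields$Area)    # subscripts that put Area in ascending order
idx

sorted <- fields[idx, ]      # whole rows move together, so names stay matched
sorted

fields                       # the original dataframe is unchanged
```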
Table: Suppose vals is a collection of vectors. table(vals) reports the count of each unique value. tapply takes three arguments: –Variable or dataframe to be summarised –Variable by which the summary is classified –Function to apply. Examples.
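A minimal sketch of table() and tapply(), using two invented vectors: a classifying variable (soil) and a measurement (yield).

```r
soil  <- c("clay", "sand", "clay", "loam", "sand", "clay")
yield <- c(4, 6, 5, 7, 8, 6)

counts <- table(soil)               # count of each unique value
counts

# tapply(variable to summarise, classifying variable, function to apply)
means <- tapply(yield, soil, mean)  # mean yield within each soil type
means
```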
Data Manipulation: To convert a continuous variable into a categorical variable, use cut(vals,levels), where levels is the number of intervals wanted. You can also specify the break points. split() can be used to generate a list of vectors on the basis of the levels of a factor. Example.
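One way to see cut() and split() working together is sketched below. The values, break points, and labels are invented for illustration.

```r
vals <- c(1, 4, 6, 9, 12, 15)

auto <- cut(vals, 3)                 # three equal-width intervals chosen by R
auto

# Explicit break points: intervals (0,5], (5,10], (10,15]
size <- cut(vals, breaks = c(0, 5, 10, 15),
            labels = c("low", "mid", "high"))
table(size)                          # how many values fall in each category

groups <- split(vals, size)          # list with one vector per factor level
groups$mid
```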
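Saving your Work: history(Inf) shows the commands entered in the session. savehistory("filename") saves them to a file. save(list=ls(), file = "filename") saves all objects in the workspace. Tidying up: –rm(var) any temporary variables –detach(dataframes). rm(list=ls()) will clean up everything.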
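A hedged sketch of saving and tidying up. The object names and file names are invented, files go to a temporary directory, and the history call is guarded because command history is only available in an interactive session.

```r
x   <- 1:10
tmp <- x^2                     # a temporary variable
rm(tmp)                        # remove it once it is no longer needed

# Command history exists only in interactive sessions
if (interactive()) {
  savehistory(file.path(tempdir(), "session.Rhistory"))
}

# Save every object in the workspace, then check that it can be restored
img <- file.path(tempdir(), "session.RData")
save(list = ls(), file = img)
rm(x)
load(img)                      # x is back

# rm(list = ls()) would remove everything in the workspace at the end
```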
Conclusions: There are other tools and languages: –Minitab –SAS –Spreadsheets. Use what you're comfortable with. But professional statisticians use R.