Introduction to R: user-friendly and absolutely free


1 Introduction to R: user-friendly and absolutely free
Ho Kim, School of Public Health, SNU

2 Useful sites
R is free software with powerful tools.
The Comprehensive R Archive Network (CRAN) -> Download R for Windows -> base -> Download R for Windows
Textbook: Simple R by John Verzani

3 Features of R
R is free.
R is open-source and runs on UNIX, Windows and Macintosh.
R has an excellent built-in help system.
R has excellent graphing capabilities.
Students can easily migrate to the commercially supported S-Plus program if commercial software is desired.
R's language has a powerful, easy-to-learn syntax with many built-in statistical functions.
The language is easy to extend with user-written functions.
R is a computer programming language. Programmers will find it more familiar than other statistical packages, and for new computer users the leap to programming will not be so large.

4 Starting R

5 Data manipulation
Data input: Data types, Importing data, Exporting data, Viewing data, Value labels, Missing data
Data management: Variables, Operators, Sorting data, Merging data, Subsetting data
Source: Quick-R

6 Data types [Data Input]
Vectors
a <- c(1,2,5.3,6,-2,4) # numeric vector
b <- c("one","two","three") # character vector
c <- c(TRUE,TRUE,TRUE,FALSE,TRUE,FALSE) # logical vector
a[c(2,4)] # 2nd and 4th elements of vector
Factors
gender <- c(rep("male",20), rep("female", 30))
gender <- factor(gender) # R now treats gender as a nominal variable
summary(gender)
Matrices
# generates 5 x 4 numeric matrix
y <- matrix(1:20, nrow=5, ncol=4)
y[,4] # 4th column of matrix
y[3,] # 3rd row of matrix
y[2:4,1:3] # rows 2,3,4 of columns 1,2,3
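As a small illustration of how these three types behave, the sketch below reuses the slide's objects and adds one hypothetical vector, `mixed`, to show type coercion. It is only meant to make the defaults visible (alphabetical factor levels, column-wise matrix filling).

```r
# Vector: all elements share one type; mixing types coerces them
a <- c(1, 2, 5.3, 6, -2, 4)
mixed <- c(1, "two", TRUE)       # hypothetical example: becomes character
class(a)                         # "numeric"
class(mixed)                     # "character"

# Factor: categories stored as integer codes plus a set of levels
gender <- factor(c(rep("male", 20), rep("female", 30)))
levels(gender)                   # "female" "male" (alphabetical by default)
gender_counts <- table(gender)   # female 30, male 20

# Matrix: filled column by column unless byrow = TRUE is given
y <- matrix(1:20, nrow = 5, ncol = 4)
y[3, ]                           # 3 8 13 18
dim(y[2:4, 1:3])                 # 3 3
```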

7 Data types [Data Input]
Dataframes
d <- c(1,2,3,4)
e <- c("red", "white", "red", NA)
f <- c(TRUE,TRUE,TRUE,FALSE)
mydata <- data.frame(d,e,f)
names(mydata) <- c("ID","Color","Passed") # variable names
Lists
# example of a list with 4 components -
# a string, a numeric vector, a matrix, and a scalar
w <- list(name="Fred", mynumbers=a, mymatrix=y, age=5.3)
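Once a data frame or list exists, its parts are usually pulled out with `$`, `[ ]` or `[[ ]]`. The following is a minimal sketch using the slide's data frame and a shortened, hypothetical version of the list `w` (the matrix component is left out so the block runs on its own).

```r
d <- c(1, 2, 3, 4)
e <- c("red", "white", "red", NA)
f <- c(TRUE, TRUE, TRUE, FALSE)
mydata <- data.frame(ID = d, Color = e, Passed = f)

mydata$Color                             # one column as a vector
mydata[2, ]                              # second row, still a data frame
passed_ids <- mydata[mydata$Passed, "ID"]  # IDs where Passed is TRUE: 1 2 3

# Shortened list for illustration only
w <- list(name = "Fred", mynumbers = c(1, 2, 5.3), age = 5.3)
w$name                                   # "Fred"
w[["mynumbers"]][2]                      # 2
w["age"]                                 # single brackets return a list of length 1
```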

8 Importing data [Data Input]
From CSV file
malaria <- read.table("C:\\R_data\\malaria.csv", header=TRUE, sep=",")
From Excel
library(RODBC)
channel <- odbcConnectExcel("C:\\R_data\\malaria.xls")
malaria <- sqlFetch(channel, "mal")
odbcClose(channel) # close the connection when done
* odbcConnectExcel is only usable with 32-bit Windows
From txt file
malaria <- read.table("C:\\R_data\\malaria.txt", header=TRUE, sep="\t")
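The import calls above need the files on disk. To try the same idea without them, the sketch below reads a small invented CSV from a string through the `text` argument. The column names `subject`, `mal` and `age` follow later slides; the values are made up.

```r
# Toy CSV content standing in for malaria.csv (values are invented)
csv_text <- "subject,mal,age
1,0,5
2,1,9
3,1,7"

# read.csv() is read.table() with header = TRUE and sep = "," preset
malaria_demo <- read.csv(text = csv_text)
str(malaria_demo)

# Forward slashes avoid doubling backslashes in Windows paths;
# file.path() builds such a path from its parts
demo_path <- file.path("C:", "R_data", "malaria.csv")
demo_path                                # "C:/R_data/malaria.csv"
```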

9 Exporting data [Data Input]
To a CSV file
write.table(malaria, "C:\\R_data\\mal01.csv", sep=",", row.names=FALSE)
To a tab delimited text file
write.table(malaria, "C:\\R_data\\mal02.txt", sep="\t", row.names=FALSE)
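A quick way to check an export is to read the file back. This sketch writes an invented data frame to temporary files (so nothing under C:\R_data is touched) and uses `write.csv()`, a wrapper around `write.table()` that fixes the separator to a comma.

```r
# Toy data standing in for the malaria data set (values are invented)
malaria_demo <- data.frame(subject = 1:3, mal = c(0, 1, 1), age = c(5, 9, 7))

# Comma-separated export to a temporary file
out_csv <- tempfile(fileext = ".csv")
write.csv(malaria_demo, out_csv, row.names = FALSE)

# Tab-delimited export; quote = FALSE leaves strings unquoted
out_txt <- tempfile(fileext = ".txt")
write.table(malaria_demo, out_txt, sep = "\t", row.names = FALSE, quote = FALSE)

# Read both files back to confirm the round trip
back_csv <- read.csv(out_csv)
back_txt <- read.table(out_txt, header = TRUE, sep = "\t")
dim(back_csv)                            # 3 3
```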

10 Viewing data
ls() # list objects in the working environment
names(malaria) # list the variables in malaria
str(malaria) # list the structure of malaria
malaria$v1 <- factor(malaria$mal) # create factor v1 from mal
levels(malaria$v1) # list levels of factor v1 in malaria
dim(malaria) # dimensions of malaria
class(malaria) # class of malaria (numeric, matrix, data frame, etc.)
malaria # print malaria
head(malaria, n=10) # print first 10 rows of malaria
tail(malaria, n=5) # print last 5 rows of malaria
summary(malaria) # summary statistics for each variable

11 Value labels
# variable v1 is coded 1, 2 or 3
# we want to attach value labels 1=red, 2=blue, 3=green
v1 <- c(1,1,1,2,2,3)
v2 <- factor(v1, levels = c(1,2,3), labels = c("red", "blue", "green"))
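To see what the labelled factor holds, it can be tabulated and converted back to codes or strings. The vector `v3` below is a hypothetical addition showing what happens to a code that has no matching level.

```r
v1 <- c(1, 1, 1, 2, 2, 3)
v2 <- factor(v1, levels = c(1, 2, 3), labels = c("red", "blue", "green"))

table(v2)                                # red 3, blue 2, green 1
as.integer(v2)                           # underlying codes: 1 1 1 2 2 3
as.character(v2)                         # labels as plain strings

# A code that is missing from `levels` becomes NA rather than raising an error
v3 <- factor(c(1, 2, 4), levels = c(1, 2, 3), labels = c("red", "blue", "green"))
is.na(v3)                                # FALSE FALSE TRUE
```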

12 Missing data
Testing for missing values
y <- c(1,2,3,NA)
is.na(y) # returns a vector (F F F T)
Recoding values to missing
malaria[malaria$age==99,"age"] <- NA
Excluding missing values from analyses
x <- c(1,2,NA,3)
mean(x) # returns NA
mean(x, na.rm=TRUE) # returns 2
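The same steps can be tried on a whole data frame. In the sketch below the data are invented and 99 is assumed to be the code for an unrecorded age. The recoding uses `which()` on the column because a data-frame assignment indexed by a logical vector may stop with an error when that vector already contains NA.

```r
# Toy data; 99 is assumed to mean "age not recorded"
malaria_demo <- data.frame(subject = 1:5,
                           mal = c(0, 1, 1, NA, 0),
                           age = c(5, 99, 7, 12, NA))

# which() drops NA comparisons, so existing NAs in age cause no trouble
malaria_demo$age[which(malaria_demo$age == 99)] <- NA

na_per_column <- colSums(is.na(malaria_demo))  # subject 0, mal 1, age 2
complete.cases(malaria_demo)             # TRUE only for rows without any NA
clean <- na.omit(malaria_demo)           # keep complete rows only
nrow(clean)                              # 2
mean(malaria_demo$age, na.rm = TRUE)     # 8
```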

13 Help
> help(mean)
> ?mean

14 Data manipulation
Data input: Data types, Importing data, Exporting data, Viewing data, Value labels, Missing data
Data management: Variables, Operators, Sorting data, Merging data, Subsetting data

15 Variables [Data management]
Recoding variables
# create 2 age categories
malaria$agecat <- ifelse(malaria$age > 7, "student", "baby")
# the same recoding using attach()
attach(malaria)
malaria$agecat2[age > 7] <- "student"
malaria$agecat2[age <= 7] <- "baby"
detach(malaria)
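For comparison, here is a sketch of the same two-group recoding on invented ages, once with `ifelse()` and once with `cut()`. `cut()` is not used on the slide; it is shown as one possible alternative that scales to more than two groups, and the breakpoints are assumptions chosen to match the `> 7` rule.

```r
# Toy ages, including one missing value
malaria_demo <- data.frame(age = c(3, 7, 8, 15, NA))

# ifelse() propagates NA, so a missing age stays missing
malaria_demo$agecat <- ifelse(malaria_demo$age > 7, "student", "baby")

# cut() with assumed breakpoints; the intervals are (-Inf, 7] and (7, Inf]
malaria_demo$agecat3 <- cut(malaria_demo$age,
                            breaks = c(-Inf, 7, Inf),
                            labels = c("baby", "student"))

table(malaria_demo$agecat, useNA = "ifany")
```

Note that `agecat` is a character vector while `agecat3` is a factor, which matters for later modelling and tabulation.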

16 Operators [Data management]
Comparison operators
== equals
!= not equals
<= less than or equals
>= greater than or equals
= assignment, not comparison (same as the arrow '<-')
Logical operators
& and
| or
! not
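These operators work element by element on vectors, which is what makes them useful for recoding and subsetting. The sketch below applies them to two invented vectors; it also uses the strict comparisons `<` and `>`, which exist in R alongside the operators listed above. Note how NA propagates through a comparison.

```r
# Toy vectors for illustration
age <- c(3, 7, 8, 15, NA)
mal <- c(1, 0, 1, 1, 0)

age == 7                                 # FALSE TRUE FALSE FALSE NA
age != 7                                 # TRUE FALSE TRUE TRUE NA
both <- age >= 8 & mal == 1              # FALSE FALSE TRUE TRUE FALSE
either <- age < 8 | mal == 1             # TRUE TRUE TRUE TRUE NA
not_mal <- !(mal == 1)                   # FALSE TRUE FALSE FALSE TRUE
```

In the last element, `NA & FALSE` is FALSE because the result cannot depend on the unknown value, while `NA | FALSE` stays NA.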

17 Sorting Data [Data management]
# sort by mal
newdata <- malaria[order(malaria$mal),]
# sort by mal and age
newdata2 <- malaria[order(malaria$mal, malaria$age),]
# sort by mal (ascending) and age (descending)
newdata3 <- malaria[order(malaria$mal, -malaria$age),]
Avoid the attach() command when sorting the data
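A small worked case makes the effect of `order()` easier to check by eye. The data below are invented. The second call is a possible alternative to the minus sign: `with()` gives short variable names without `attach()`, and a vector-valued `decreasing` (supported by the "radix" method) also works for non-numeric columns.

```r
# Toy data for illustration
malaria_demo <- data.frame(subject = 1:5,
                           mal = c(1, 0, 1, 0, 1),
                           age = c(9, 4, 12, 6, 10))

# mal ascending, then age descending
sorted <- malaria_demo[order(malaria_demo$mal, -malaria_demo$age), ]
sorted$subject                           # 4 2 3 5 1

# Same ordering without negating age
idx <- with(malaria_demo,
            order(mal, age, decreasing = c(FALSE, TRUE), method = "radix"))
sorted2 <- malaria_demo[idx, ]
```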

18 Merging Data [Data management]
Raw dataset
malaria2 <- read.table("C:\\R_data\\malaria.csv", header=TRUE, sep=",")
Adding rows
extra <- read.table("C:\\R_data\\extra15.csv", header=TRUE, sep=",")
malaria3 <- rbind(malaria2, extra)
Adding columns
region <- read.table("C:\\R_data\\region.csv", header=TRUE, sep=",")
malaria4 <- merge(malaria3, region, by="subject")
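The sketch below repeats both steps on invented data frames so the row counts can be verified. `rbind()` needs the same column names in both inputs. `merge()` keeps only matching keys by default, and `all.x = TRUE` is shown as one way to keep unmatched rows; `malaria5` is a name introduced here for illustration.

```r
# Toy stand-ins for malaria.csv, extra15.csv and region.csv
malaria2 <- data.frame(subject = 1:3, mal = c(0, 1, 1))
extra    <- data.frame(subject = 4:5, mal = c(0, 1))
region   <- data.frame(subject = c(1L, 2L, 3L, 5L),
                       region = c("N", "S", "S", "E"))

# Adding rows: columns must match by name
malaria3 <- rbind(malaria2, extra)       # 5 rows

# Adding columns: default merge is an inner join, so subject 4 is dropped
malaria4 <- merge(malaria3, region, by = "subject")   # 4 rows

# Keep every row of malaria3; region is NA where no match exists
malaria5 <- merge(malaria3, region, by = "subject", all.x = TRUE)  # 5 rows
```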

19 Subsetting Data [Data management]
mal.1 <- subset(malaria, mal==1)
summary(mal.1)
mal.baby <- subset(malaria, mal == 1 & age < 8)
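`subset()` can also be written with bracket indexing, which helps when reading other people's code. The sketch uses invented data with one missing `mal` value to show that both forms drop rows where the condition evaluates to NA, provided the bracket form is wrapped in `which()`. The names `mal.baby2` and `mal.cols` are introduced here for illustration.

```r
# Toy data for illustration
malaria_demo <- data.frame(subject = 1:5,
                           mal = c(1, 0, 1, NA, 1),
                           age = c(9, 4, 6, 3, 12))

# subset() silently drops rows where the condition is NA
mal.baby <- subset(malaria_demo, mal == 1 & age < 8)   # subject 3 only

# Bracket equivalent; which() removes the NA case explicitly
keep <- which(malaria_demo$mal == 1 & malaria_demo$age < 8)
mal.baby2 <- malaria_demo[keep, ]

# select = keeps only the named columns
mal.cols <- subset(malaria_demo, mal == 1, select = c(subject, age))
```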

