1 Introduction to R

2 J. Charles Victor – Intro to R: Workshop Plan
The R interface
  The Console
  The Script Editor
  The “Workspace”
  R programming rules…
How does R ‘think’?
  R Objects
  The data frame
Importing Data
Data Manipulation
Simple Analyses

3 What is R?
Programming environment
  Useful for statistics, with powerful graphing capabilities
    But you will be programming, not pointing and clicking
  Free, ‘open’ software
    Users create programs which are made available to other users via the web and the installation interface
    Based on the S language used in S-Plus

4 First Step
Open R…

5 The R Console
The main window
  Commands are written and submitted
  Log of progress recorded
  Output (except graphs) produced
  Similar to the Stata interface in form and function
    Prompt ‘>’ indicates R is waiting for a command
  Try the following:
    > x <- c(1,2,3,4,5) [ENTER]
    > mean(x) [ENTER]
  You should find the following result:
    [1] 3
    R is telling you that the mean of (1,2,3,4,5) is 3

6 The Script Editor
Accessible from the File menu item
  Used to create a series of commands (i.e. a program) that can be saved and run at a later date
  Similar to the DO editor in Stata
  Will make SAS and SPSS syntax users more comfortable
    Write commands, highlight them and click on the submit button
  Try opening the Script Editor (‘New Script’) and repeating the same commands as before:
    x <- c(1,2,3,4,5)
    mean(x)
  Now highlight this code and click on the submit button

7 Script Editor
Nothing fancy – but VERY useful
Saves programs

8 The Workspace
The ‘Workspace’ is the R data and objects
  When exiting R, saving the workspace saves your data and work
  Let’s see our work thus far
    Type: ls()
  What do you see?
  Try saving your work thus far
    File -> Save Workspace

9 R Programming
General Programming
  R is generally case-sensitive
  Character strings must be in quotes (double " " or single ' ')
  Hitting ENTER submits a command
    If a command is not yet complete when you hit ENTER, R shows a ‘+’ prompt and waits for the rest of the command
  Try the following:
    > newy <- c(0,0,1, [ENTER]
    + 1,0) [ENTER]
  Use ‘comments’ to identify what you have done
  Comments begin with “#”
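
The rules on this slide can be tried out in a script, as in the minimal sketch below. The object name NewY and its value are made up for illustration; only newy comes from the slide.

```r
# Names are case-sensitive: newy and NewY are two different objects.
newy <- c(0, 0, 1,
          1, 0)                 # an unfinished command simply continues on the next line
NewY <- "a character string"    # character values need quotes

length(newy)    # number of values stored in newy
is.character(NewY)
```

In a script file no ‘+’ is typed; the ‘+’ only appears as a prompt when the same lines are entered at the console.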

10 How does R think?
R thinks of data elements as ‘objects’
  Objects can be:
    Single variables
    Arrays of variables
    Entire datasets
    Results from analyses (if saved as an object)
    When you save the ‘Workspace’ you save all of these objects
  So in a small sense, R works like Excel.

11 OK, I don’t understand this Object thing…
For data analysts it is usually easiest to start by treating the term ‘object’ as meaning ‘variable’
  We have already created one variable called ‘x’
  We can create another variable (object) called ‘y’ that has the values (20, 27, 18, 50, 99)
    > y <- c(20,27,18,50,99)
  To see all of the variables (objects) in memory we can use the ‘list’ command
    ls()
  Or click on MISC -> LIST OBJECTS
  What do you see?

12 DATA INPUT

13 Creating Data
How? No spreadsheet??
  Create your own
    Class of 5 students, need average test score

    First name   Last name    Mark   Sex
    John         Smith        58     M
    Jaysharee    Singh        82     F
    Emily        Xu           90     F
    Ute          VanDroglen   65     F
    Charles      Victor       90     M

14 First Attempt to Enter Data
Many ways to create and edit data in R
  First create variables (objects)
  Then compile the dataset from the variables
Creating variables: 2 main ways
  1) Relatively few values
    VARIABLENAME <- c(VALUE1,VALUE2,VALUE3,…)
    Character values go in quotes: z <- c("ABC","DEF")
  2) Many values
    VARIABLENAME <- scan() [ENTER]
    VALUE1 VALUE2 VALUE3 VALUE4 … VALUE8 [ENTER]
    VALUE9 VALUE10 … [ENTER]
    [ENTER] on a blank line ends the input and returns the ‘>’ prompt

15 Try entering this data
  Use method 1 for first name, last name and sex
  Use method 2 for exam mark

    John         Smith        58
    Jaysharee    Singh        82
    Emily        Xu           90
    Ute          VanDroglen   65
    Charles      Victor       90

  After you create each variable, check that it is correct by typing the variable name at the command prompt
    > firstname

16 A few notes on entering values
1) Variable names can contain letters, numbers, ‘.’ and ‘_’, and must start with a letter (or a ‘.’ not followed by a number)
2) Missing values should be coded as NA
3) To create a variable whose values are a sequential list of numbers, use a colon (:)
    studentid <- c(1:5)
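
Notes 2 and 3 can be illustrated with a short sketch. The vector mark_with_gap is hypothetical (one mark has been replaced by NA for the example), and na.rm = TRUE is a standard argument of mean() that the slides do not cover.

```r
studentid <- 1:5                          # same values as c(1:5)
mark_with_gap <- c(58, NA, 90, 65, 90)    # hypothetical: the second mark is missing

is.na(mark_with_gap)                 # TRUE where a value is missing
mean(mark_with_gap)                  # NA, because one value is unknown
mean(mark_with_gap, na.rm = TRUE)    # mean of the four known marks
```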

17 Creating the Dataset
Currently we just have 5 variables (objects)
  These objects are independent of each other (i.e. the first name John is not linked with the last name Smith)
    To ‘link’ these objects we need to compile these variables together in a dataset, which R calls a ‘data frame’
  In R a data frame is an object just like a variable, and thus it is created in a similar fashion
    DATA_NAME <- data.frame(VARIABLE1,VARIABLE2,VARIABLE3)
    Note: All variables must have the same number of observations
    Now take a look at the data by typing the dataset name
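
Putting the previous slides together, the class dataset could be built as sketched below. The values come from the class list earlier in the deck; the dataset name class follows the later slides (it does not interfere with R's own class() function), and str() is an extra inspection command not mentioned in the slides.

```r
# The five separate variables (objects)
studentid <- 1:5
firstname <- c("John", "Jaysharee", "Emily", "Ute", "Charles")
lastname  <- c("Smith", "Singh", "Xu", "VanDroglen", "Victor")
mark      <- c(58, 82, 90, 65, 90)
sex       <- c("M", "F", "F", "F", "M")

# Compile them into one data frame; all five have the same length
class <- data.frame(studentid, firstname, lastname, mark, sex)

class              # print the dataset
str(class)         # one line per variable, with its type
mean(class$mark)   # the average test score the exercise asked for
```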

18 Back to ‘Objects’
Look at the objects now in memory
  ls() or click MISC -> LIST OBJECTS
  You should see all of the variables + the dataset
  You can now use the dataset similar to how we have used variables
  To see a variable, type the variable name
  To see the dataset, type the dataset name

19 BUT…
Once compiled into a dataset, the variables (studentid, firstname, lastname, mark, sex) are different from the ‘objects’ in R’s memory
  So we have
    The object: mark
    The variable mark on the class dataset
  You may want to get rid of the ‘objects’ now that you have compiled them onto the dataset (any changes made to the objects will not be reflected on the dataset)
    rm(studentid, firstname, lastname, mark, sex)
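
The difference between the loose object and the dataset variable can be seen in a small sketch. It uses a cut-down, two-column version of the class dataset to stay short.

```r
mark  <- c(58, 82, 90, 65, 90)
class <- data.frame(studentid = 1:5, mark = mark)

mark[1] <- 0       # change the free-standing object...
class$mark[1]      # ...the copy inside the data frame is still 58

rm(mark)           # remove the loose object; class$mark is unaffected
ls()               # mark is no longer listed, class still is
```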

20 Importing Existing Data into R
R has not been very friendly to foreign data formats
  But this is changing, rapidly
  Optimally datasets need to be in the form of:
    ASCII text
    Tab delimited
    Comma delimited
  Best to convert Excel data into one of these formats

21 Importing: ASCII text
Use command: read.table
    OBJECT <- read.table("C:\\My Document\\FILE.TXT", header=TRUE)
  Note: Windows paths must use a double backslash (\\) or a single forward slash (/)
  If variable names are on the first row
    Use the header=TRUE option
    Otherwise variables will be named V1 V2 V3…
  Try to import the heart_rx dataset
  If you are unsure of the path you can use the command file.choose() nested in read.table
    This will cause R to bring up a GUI to choose your file
    OBJECT <- read.table(file.choose(), header=FALSE)
    Try to import the heart_rx_noheader dataset this way
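
The effect of the header option can be checked without the workshop files. The sketch below writes a tiny, made-up text file (columns id and hr are hypothetical, not the heart_rx data) to a temporary location and reads it back both ways.

```r
# Write a small whitespace-delimited file to a temporary path (hypothetical data)
path <- tempfile(fileext = ".txt")
writeLines(c("id hr", "1 72", "2 65", "3 80"), path)

with_header <- read.table(path, header = TRUE)    # first row supplies the names
no_header   <- read.table(path, header = FALSE)   # first row is treated as data

names(with_header)
names(no_header)     # default names V1, V2
nrow(no_header)      # one row more, because the name row was read as data
```

Reading a file that does have a header row with header=FALSE also turns every column into text, which is a common cause of "my numbers are not numbers" problems.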

22 Importing: Tab Delimited or Comma Separated or Database File
Tab Delimited
  Use command: read.delim
    OBJECT <- read.delim("C:\\My Document\\FILE.TXT", header=TRUE, sep="\t")
Comma Separated Value (CSV)
  Use command: read.csv
    OBJECT <- read.csv("C:\\My Document\\FILE.CSV", header=TRUE, sep=",")
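
As a rough check that the two commands behave alike, the sketch below writes the same made-up table (hypothetical columns id and hr) once with commas and once with tabs, then reads both back. It relies on the defaults of read.csv and read.delim, which already assume a header row and the matching separator.

```r
csv_path <- tempfile(fileext = ".csv")
tab_path <- tempfile(fileext = ".txt")
writeLines(c("id,hr", "1,72", "2,65"), csv_path)
writeLines(c("id\thr", "1\t72", "2\t65"), tab_path)

from_csv <- read.csv(csv_path)      # header = TRUE and sep = "," are the defaults
from_tab <- read.delim(tab_path)    # header = TRUE and sep = "\t" are the defaults

all.equal(from_csv, from_tab)       # compares the two imported datasets
```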

23 Importing: Access, SPSS, Stata etc
Best method: third-party software to convert data to a delimited or CSV file
  DBMS Copy is very popular
  Stat Transfer is very good
Some users have created import functions
  read.spss
  read.xport (for SAS files)
  read.dta (for Stata files)
  But these commands need to be downloaded and installed (more on that later)

24 Importing: R Dataset
If a workspace has been saved from a previous session, simply load the workspace by ‘pointing and clicking’
Or use the load command
    load("PATHWAY\\FILENAME.Rdata")
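
A round trip with save() and load() might look like the sketch below. The temporary file path is an assumption made so the example runs anywhere; save() stores only the named objects, whereas File -> Save Workspace stores everything.

```r
x <- c(1, 2, 3, 4, 5)
ws_path <- tempfile(fileext = ".RData")   # hypothetical location for the saved file

save(x, file = ws_path)   # write the object x to disk
rm(x)                     # x is now gone from memory
load(ws_path)             # x comes back under its original name
mean(x)
```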

25 Creating a Dataset from a Dataset
If you want to create a copy of a current dataset, this is a simple operation in R.
  Simply create a new object (i.e. with a different name) from the existing dataset
    NEWDATA <- OLDDATA
  To create a new dataset from an edited version of an old dataset
    NEWDATA <- edit(OLDDATA)
    This will bring up the data editor (more on this later), and any changes will be attributed to NEWDATA, but not to OLDDATA

26 DATA MANIPULATION 99% of the work (don’t underestimate)

27 Data Manipulation: General
Most of your time should be spent in this phase
  R is probably not the ‘best’ package for this
Data manipulation includes (among other things)
  Renaming variables
  Getting rid of variables
  Creating variables
  Changing variables (e.g. categorising age)
  Changing values of specific observations (e.g. someone reports age of 180)
  Getting rid of observations
  Merging datasets

28 A couple of things first….
R has MANY ways of accomplishing similar tasks due to its open software construction
  When referring to variables on a dataset you must either:
    Use: d_name$v_name
    OR
    “Attach” the dataset
      attach(d_name)
    But attaching the dataset does not allow for manipulation of dataset variables, only the use of these variables

29 What is he talking about??
Let’s create a new dataset with two variables x and y
  x will be the numbers 1 to 20
  y will be 20 random values from a normal distribution
    x <- c(1:20)
    y <- rnorm(20)
    testdata <- data.frame(x,y)
  Remove the x and y objects
    rm(x,y)
  Print the dataset, and then x and y
    testdata
    x
    y
  Notice we could not access x and y this way. Try:
    testdata$x
    testdata$y
  That worked, but is a lot of typing. So we could also:
    attach(testdata)
    x
    y
  That worked too! So attaching a dataset allows us to access the variables on the dataset without using the $ format, but only for visualizing and analysing, not editing (so I don’t like to do it)
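
Why attaching is "not for editing" can be seen in the sketch below: assigning to x while testdata is attached creates a new loose object and leaves the dataset untouched. The detach() call at the end is an addition not shown on the slide; it undoes the attach.

```r
x <- 1:20
y <- rnorm(20)
testdata <- data.frame(x, y)
rm(x, y)

mean(testdata$x)     # 10.5, using the $ format

attach(testdata)
mean(x)              # works while the dataset is attached
x <- x * 2           # creates a NEW object x; testdata$x is not changed
detach(testdata)

mean(testdata$x)     # still 10.5
```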

30 Renaming Variables
Occasionally we need to rename a variable
  Many ways
    We can edit the data like a spreadsheet
      fix(d_name)
    Create a copy of the class dataset, and “fix” it
      NEWDATA <- edit(d_name)
    OR
    We can create a new variable
      d_name$new_v_name <- d_name$old_v_name

31 Deleting and Creating Variables
To delete a variable, set it to NULL
    d_name$v_name <- NULL
To create a variable, just set the new variable equal to some value; we use a similar construct as before
    d_name$v_name <- SOME_VALUE OR EXPRESSION
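
Creating, copying and deleting variables can be combined into a "rename", as in this sketch on a cut-down version of the class dataset. The new name exam_mark is hypothetical.

```r
class <- data.frame(firstname = c("John", "Emily"), mark = c(58, 90))

class$test_max  <- 100          # create: the single value is repeated for every row
class$exam_mark <- class$mark   # "rename" step 1: copy the variable under a new name
class$mark      <- NULL         # "rename" step 2: delete the old variable

names(class)
```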

32 Creating Variables
Suppose we want a variable identifying the day the exam was written and a variable identifying the maximum value for the exam
    class$test_day <- c("Monday")
    class$test_max <- c(100)

33 Creating Variables
We can also create variables based on other variables
  Imagine that we now want to calculate the students’ percentage on the exam
    d_name$newv_name <- expression
  For example:
    class$prct <- (class$mark / class$test_max)*100
    Remember the rules of BEDMAS

34 A Note on Mathematical Functions
  +                  = addition
  -                  = subtraction
  *                  = multiplication
  /                  = division
  ( )                = brackets
  ^ or **            = to the exponent
  abs( x )           = absolute value of x
  trunc( x )         = integer part of x
  log( x )           = natural log of x (i.e. Ln to non-math types)
  log10( x )         = log base 10 of x (i.e. Log to non-math types)
  sqrt( x )          = square root of x
  round( x, value )  = round x to value decimals
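
A few of these operators and functions applied to the exam marks, as a quick sketch. The maximum of 98 is borrowed from the makeup-exam example later in the deck, purely so that the percentages are not whole numbers.

```r
mark     <- c(58, 82, 90, 65, 90)
test_max <- 98                      # assumed maximum, chosen so rounding is visible

prct <- (mark / test_max) * 100     # brackets first, then division, then multiplication
round(prct, 1)                      # one decimal place
trunc(prct)                         # integer part only

2^3          # exponent
2**3         # same result; R accepts ** as well
sqrt(16)
abs(-4)
log10(1000)
log(exp(1))  # natural log
```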

35 Changing Variables
Let’s change the existing prct variable into letter grades
  Map out which letter grades apply to which percents
    Below 50  = F
    50 – 59   = D
    60 – 69   = C
    70 – 79   = B
    80 – 100  = A

36 Changing Variables - Recoding
Two ways
1) Only for numeric variables
  Using Base R
  cut function
    d_name$new_v_name <- cut(d_name$old_v_name,
        breaks = c(breakpoints) OR breaks = #breaks,
        labels = c("LABEL1", "LABEL2",…) )
  EG
    class$lettergrd <- cut(class$prct,
        breaks = c(-Inf,49,59,69,79,100),
        labels = c("F","D","C","B","A") )
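
The cut example can be tried on a plain vector, as sketched below. The percentages are made up to hit the boundaries of the grade map, and the break at 69 is assumed from the 60 – 69 = C band on the previous slide.

```r
prct <- c(58, 82, 90, 65, 49, 50)   # hypothetical percentages, including boundary values

lettergrd <- cut(prct,
                 breaks = c(-Inf, 49, 59, 69, 79, 100),
                 labels = c("F", "D", "C", "B", "A"))

data.frame(prct, lettergrd)         # show each percentage next to its grade
```

By default each interval includes its upper break, so 49 is an F and 50 is a D. With fractional percentages a value such as 49.5 would land in D, so these breaks suit whole-number percentages best.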

37 Recoding variables – Second Method
There is a “recode” function, but it has been developed outside of the original Base R
  We can incorporate programs that have been written by other people
  Often these programs are compiled into a group of programs that are used for a similar purpose
  These groups of programs are called “Packages”

38 Installing a Package (to get a function that you do not have)
First, note that you do not have ‘recode’
    help(recode)
Now (after searching Google) you find out that a function called ‘recode’ is available in the package called ‘car’
Click PACKAGES -> INSTALL PACKAGE(S)
  R will ask you to set a CRAN Mirror (site from which to download packages)
    Choose CANADA (ON)
  R will now ask which package you want to download
    Choose “car”
  R will now download the ‘car’ package
BUT the car package has only been installed, it has not yet been loaded
Click PACKAGES -> LOAD PACKAGE(S)
  R will ask which package to load from all that you have installed
    Choose “car”
You can now use the recode function
    Type help(recode)

39 Recoding – Second Method
Now that the ‘car’ package is installed and loaded, we can use ‘recode’
    d_name$new_v_name <- recode(d_name$old_v_name, recodes)
Where recodes can be in the form of:
  specific values: "c(99,999) = NA; c(1)='Y'"
  range of values: "lo:49='F'; 50:59='D'"
    class$lettergrd2 <- recode(class$prct, "lo:49='F'; 50:59='D';…..")

40 Combining Conditional Statements to Change Values within Observations
Your TA informs you that Jim Smith was sick for the Monday exam; instead he was given a makeup exam, out of 98
  To identify observations using conditional statements, we use the R function ifelse
    ifelse(condition/expression, value if true, value if false)
    class$test_max <- ifelse(class$firstname == 'Jim' & class$lastname == 'Smith', 98, class$test_max)

41 More complex…
You are then informed that the twins (Joan and John Smith) cheated, so you have to give them zeros:
    class$mark <- ifelse((class$firstname == 'Joan' | class$firstname == 'John') & class$lastname == 'Smith', 0, class$mark)
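
The two ifelse examples can be run on a small stand-in dataset, sketched below. Joan's row and her mark are hypothetical (she is not in the original class list), and the exam result is stored in a column called mark, as in the variables created earlier in the workshop.

```r
class <- data.frame(
  firstname = c("John", "Joan", "Emily"),
  lastname  = c("Smith", "Smith", "Xu"),
  mark      = c(58, 71, 90)          # Joan's row is made up for the example
)

# One TRUE/FALSE per student: is this one of the Smith twins?
is_twin <- (class$firstname == "Joan" | class$firstname == "John") &
  class$lastname == "Smith"

# Where the condition is TRUE use 0, otherwise keep the existing mark
class$mark <- ifelse(is_twin, 0, class$mark)
class
```

Storing the condition in its own object first makes it easy to print and check before any marks are overwritten.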

42 Logical Statements
  <        = Less than
  <=       = Less than or equal to
  >        = Greater than
  >=       = Greater than or equal to
  !=       = Not equal to
  ==       = Equal to
  & or &&  = Intersection boolean operator (AND)
  | or ||  = Union boolean operator (OR)
  Use & and | when comparing whole variables; && and || compare single values only

43 Deleting Observations (or Subsetting)
Suppose we want to look at only the female students
  We need to either delete the males or keep the females
  It is better to create a new dataset with only females than to delete observations from our original dataset
  Many ways: use the subset command
    new_d_name <- subset(old_d_name, condition, select=variables wanted)

44
    Females <- subset(class, class$sex == 'F')
  Note, we can also select out certain variables only
    Males <- subset(class, class$sex == 'M', select=c(firstname,lastname,lettergrd) )
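
A self-contained version of the two subset calls, using the class list from earlier in the deck. Inside subset() the variable can be written as sex rather than class$sex, since the function looks names up in the dataset; lettergrd is left out of the select because this cut-down dataset does not contain it.

```r
class <- data.frame(
  firstname = c("John", "Jaysharee", "Emily", "Ute", "Charles"),
  lastname  = c("Smith", "Singh", "Xu", "VanDroglen", "Victor"),
  mark      = c(58, 82, 90, 65, 90),
  sex       = c("M", "F", "F", "F", "M")
)

Females <- subset(class, sex == "F")                                  # keep all variables
Males   <- subset(class, sex == "M", select = c(firstname, lastname)) # keep two variables

Females
Males
nrow(class)    # the original dataset still has all 5 students
```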

45 Data Merge
Two important types of merge
  Concatenation
    Adding new observations to a set of old observations
  Matched merge
    Adding new variables (values) to an existing dataset with the same observations (e.g. we need to add mid-term marks to our exam database)

46 Concatenation
Easy
  Use the rbind function, and add all datasets
    new_d_name <- rbind(d_name1, d_name2,…)
    But all datasets must have the same number (and names) of variables!
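
A minimal concatenation sketch; the dataset names old_students and new_students are hypothetical, and the marks are taken from the class list.

```r
old_students <- data.frame(firstname = c("John", "Emily"), mark = c(58, 90))
new_students <- data.frame(firstname = "Ute", mark = 65)

# Both datasets have the same variables, so the rows can simply be stacked
all_students <- rbind(old_students, new_students)
all_students
```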

47 Matched Merge
A little more complex
  Use the merge function
    If there is a common variable on which to merge:
      new_d_name <- merge(d_name1, d_name2, by = "ID", all=TRUE)
    If the matching variables have different names:
      new_d_name <- merge(d_name1, d_name2, by.x="IDX", by.y="IDY", all=TRUE)
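
The effect of all=TRUE is easiest to see when the two datasets do not cover exactly the same students. In this sketch the dataset names, the ID values and the mid-term marks are all made up for illustration.

```r
exam    <- data.frame(ID = c(1, 2, 3), exam    = c(58, 82, 90))
midterm <- data.frame(ID = c(2, 3, 4), midterm = c(75, 88, 60))   # hypothetical marks

both    <- merge(exam, midterm, by = "ID", all = TRUE)   # keep every ID from either dataset
matched <- merge(exam, midterm, by = "ID")               # keep only IDs found in both

both       # students 1 and 4 get NA for the mark they are missing
matched

# Same merge when the key has a different name in the second dataset
midterm2 <- data.frame(IDY = c(2, 3, 4), midterm = c(75, 88, 60))
both2 <- merge(exam, midterm2, by.x = "ID", by.y = "IDY", all = TRUE)
```

Leaving out all=TRUE silently drops unmatched students, so it is worth comparing nrow() before and after a merge.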

