
1 Jack Chen · Crash Course in R · October 16, 2009

2 Presentation Flow
• Major Topics
o Background/Environment
o Object-oriented Concept
o Common Data Structures and Operations
o Control Blocks
o Read/Write Data
o Graphics Samples
• R Session
o Function writing
o Plots customization
o Simulation tips

3 Background/Environment

4 Background: History
• Mid 1970s, Bell Laboratories: John Chambers, Rick Becker
o Engine: statistical computing subroutines in Fortran
o Goal: an interactive environment on top of that engine
o Names considered: "Interactive Statistical Computing System", "Statistical Analysis System"
o Result: "S"

5 Background: History
• Early 1990s, University of Auckland: Ross Ihaka, Robert Gentleman
o Result: "R"
o Engine: statistical computing subroutines (C, Pascal, Java, C++, Fortran, Perl, …)
o Interactive environment: R functions

6 Background: History
• Major differences between S and R
o Syntax
o Memory management
o Variable scoping
o S has developed into S-PLUS, commercial software available from TIBCO
o R is free, open-source software, with contributed packages from researchers worldwide
o Recently, XLSolutions has been developing R-plus, a commercial version of R

7 Environment: Windows
• Starting R in Windows
[Screenshot of the R window, labeled: command line to interact with R, mouse-click menus, mouse-click shortcuts]

8 Environment: Windows
• Some keyboard shortcuts for the Windows platform:
o Esc: cancels the current line of execution (useful when running into trouble)
o Ctrl-p or arrow up: previous command
o Ctrl-n or arrow down: next command
o Ctrl-u: erase line
o Ctrl-a or Home: beginning of line
o Ctrl-e or End: end of line
o Ctrl-c: copy highlighted text
o Ctrl-v: paste
o Ctrl-x: copy and paste highlighted text
o Ctrl-l: clear command line window
o Ctrl-z or q(): quit

9 Environment: Unix
• Starting R in Unix
[Screenshot of a terminal session, labeled: command line to interact with R]

10 Environment: Unix
• Some keyboard shortcuts for the Unix platform:
o Esc or Ctrl-c: cancels the current line of execution (useful when running into trouble)
o Ctrl-p or arrow up: previous command
o Ctrl-n or arrow down: next command
o Ctrl-u: erase line
o Ctrl-a: beginning of line
o Ctrl-e: end of line
o Ctrl-z: send to background (type fg to bring R back)
o Ctrl-l: clear command line window
o Ctrl-r: reverse search of the command history
o q(): quit session

11 Environment: R interpreter
• R is an interpreted environment
• Everything you type on the command line, followed by Enter, is sent to R's internal engine. R performs the following steps:
o Interprets what you have typed
o Evaluates it
o Returns a result (possibly an error message)
• The only exception is a comment: R does not interpret anything after the pound sign #

12 Object-oriented Concept

13 Object-oriented Concept: Intuition
• Object-oriented programming is a natural way to classify and modularize "things" of interest in order to interact with them during program execution.
• For example, suppose our program has 3 shapes:
o Circle
o Square
o Triangle
• Initialization
o We want to be able to create different shapes of different sizes
• Interaction
o We want each shape to be able to report its area to us
o We want each shape to be able to display itself

14 Object-oriented Concept: Intuition
• Internally in a program, every shape belongs to the class Shape:
o Type Circle. Functions: report area, draw. Attributes: name ID, radius r
o Type Square. Functions: report area, draw. Attributes: name ID, width w
o Type Triangle (isosceles). Functions: report area, draw. Attributes: name ID, base b, height h

15 Object-oriented Concept: Intuition
• Typical programming steps for a Circle (functions: report area, draw; attributes: name ID, radius r):
o Initialize: radius r = 1, name ID = circle1
o Interact: tell me the area, 1^2 × π = 3.14159…
o Interact: draw

16 Object-oriented Concept: Intuition
• Typical programming steps for a second Circle:
o Initialize: radius r = 2, name ID = circle2
o Interact: tell me the area, 2^2 × π = 12.566…
o Interact: draw

17 Object-oriented Concept: Intuition
• Typical programming steps for a Square (functions: report area, draw; attributes: name ID, width w):
o Initialize: width w = 1, name ID = square1
o Interact: tell me the area, 1^2 = 1
o Interact: draw

18 Object-oriented Concept: Intuition
• Typical programming steps for a Triangle (isosceles; functions: report area, draw; attributes: name ID, base b, height h):
o Initialize: base b = 1, height h = 0.866, name ID = tri1
o Interact: tell me the area, 1(0.866)/2 = 0.433
o Interact: draw

19 Object-oriented Concept: Intuition
• Translating to sensible commands (Circle with radius r = 1, name ID = circle1):
o Initialize: circle1 = Circle(r=1)
o Interact, tell me the area (1^2 × π = 3.14159…): area(circle1)
o Interact, draw: draw(circle1)

20 Object-oriented Concept: Intuition
• Programming commands
o circle2 = Circle(r=2)
o area(circle2)
o draw(circle2)
o square1 = Square(w=1)
o area(square1)
o draw(square1)
o tri1 = Triangle(b=1, h=0.866)
o area(tri1)
o draw(tri1)
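
The shape commands above are pseudocode, not built-in R functions. As a minimal sketch of how they could be written with R's S3 classes, the constructors Circle, Square and Triangle and the generic area below are hypothetical names chosen to match the slides; draw is left out to keep the sketch short.

```r
# Hypothetical constructors: each returns a list tagged with an S3 class.
Circle   <- function(r)    structure(list(r = r),        class = c("Circle", "Shape"))
Square   <- function(w)    structure(list(w = w),        class = c("Square", "Shape"))
Triangle <- function(b, h) structure(list(b = b, h = h), class = c("Triangle", "Shape"))

# Generic function: R picks the method that matches the class of its argument.
area <- function(shape) UseMethod("area")
area.Circle   <- function(shape) pi * shape$r^2
area.Square   <- function(shape) shape$w^2
area.Triangle <- function(shape) shape$b * shape$h / 2

circle1 <- Circle(r = 1)
square1 <- Square(w = 1)
tri1    <- Triangle(b = 1, h = 0.866)

areas <- c(area(circle1), area(square1), area(tri1))
print(areas)
```

The same call, area(x), works for every shape; only the method chosen behind the scenes differs. This is the pattern behind R functions such as print and summary.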

21 Object-oriented Concept: In relation to R
• What does this have to do with R?
o R is inherently object-oriented.
o R has a set of pre-defined objects that we can interact with.
o The packages in R's online repository provide many more objects for performing various tasks.
o We can also write our own R objects that perform analyses tailored to our needs.
o The way we interact with R is very similar to the way we interacted with the program with 3 shapes.

22 Common Data Structures and Operations in R

23 Common Data Structures: Primitive data objects
• Primitive data objects
o Come with all R installations
o Integers: -3L, -2L, 1L, 2L, 3L, … (the L suffix marks an integer; a plain literal such as 3 or 1e+10 is stored as a double)
o Doubles: 0.789, 3.14, 1.68, 2.9e-6, …
o Complex numbers: 3i+7, 2i+3, …
o Characters: "a", "zZ", "I hope you are still awake", …
o Constants: pi
o Logical values: TRUE, FALSE
o The empty object: NULL
o Missing value: NA
o Infinity: Inf
o Some others

24 Common Data Structures: Primitive operators
• Primitive operators
o arithmetic: +, -, *, /
o modulo: %%
o matrix multiply: %*%
o power: ^
o logical and/or: &, |
o relation: <, <=, >, >=, ==, !=
o assignment: =, <-

25 Common Data Structures: Primitive functions
• R function calls have the form: functionName(arg1, arg2, …)
• Primitive functions
o square root: sqrt(arg)
o exponential: exp(arg)
o natural log: log(arg)
o length of object: length(arg)
o sum of elements in object: sum(obj)
o concatenate objects: c(arg1, arg2, …)
o round down to nearest integer: floor(arg)
o round up to nearest integer: ceiling(arg)
o many, many others
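
As a quick illustration of the operators and functions listed so far, the sketch below applies several of them to a small vector. The variable names are only for illustration.

```r
x <- c(4, 9, 16)          # c() concatenates its arguments into a vector

roots  <- sqrt(x)         # element-wise square root
n      <- length(x)       # number of elements
total  <- sum(x)          # sum of the elements
down   <- floor(2.7)      # round down to the nearest integer
up     <- ceiling(2.1)    # round up to the nearest integer
remain <- 7 %% 3          # remainder of 7 divided by 3
back   <- log(exp(1))     # log() is the natural log, so it undoes exp()

print(list(roots = roots, n = n, total = total, down = down,
           up = up, remain = remain, back = back))
```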

26 Common Data Structures: Simple valid expressions
• Examples of valid expressions
    1
    "a"
    'a'
    1 & TRUE
    TRUE == FALSE
    TRUE != FALSE
    2 > 3
    1 + 2 + 3 + 4
    2^3
    a = 4; b = 2^a
    log(37)

27 Common Data Structures: Simple invalid expressions
• Examples of invalid expressions
    lala          # variable not assigned
    sqrt(25, 4)   # too many arguments
    log(1 2)      # syntax error: no comma or operator between the arguments
    1 = "a"       # cannot assign a value to a numeric constant
    TRUE = 3      # cannot assign a value to a logical constant

28 Common Data Structures and Constructs: Vectors
• Vectors
o R vectors are column vectors, even though they are displayed horizontally in R
o c(object 1, object 2, …, object N)
o c stands for: concatenate object 1, object 2, …, object N

29 Common Data Structures and Constructs: Vectors
• Examples of vectors:
    c(1, 2, 3, 4)         # numeric vector, (1, 2, 3, 4)
    c(1:4)                # same as above
    c(1, "a")             # mixture of object types; coerced to the character vector ("1", "a")
    c(c(1:3), c(7:10))    # (1, 2, 3, 7, 8, 9, 10)
    c(TRUE, FALSE)        # logical vector

30 Common Data Structures and Constructs: Vectors
• Other ways to form vectors:
o seq(start, end, by increment)
    seq(1, 10, 1)       # equivalent to c(1:10)
    seq(10, 1, -1)      # equivalent to c(10:1)
o rep(object, repeat)
    rep(1, 10)          # a vector of 10 1's
    rep(c(1, 2), 10)    # a vector of 1 2 1 2 …

31 Common Data Structures and Constructs: Vectors
• Accessing vector elements
o vector[start index:end index]
    v = c(1, 2, 3, 4)          # assigns v
    c(1, 2, 3, 4)[1]           # returns 1
    c(1, 2, 3, 4)[2:4]         # returns (2, 3, 4)
    c(1, 2, 3, 4)[-1]          # removes 1st element, returns (2, 3, 4)
    c(1, 2, 3, 4)[c(1, 3)]     # returns (1, 3)
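
The vector slides can be tried in one go. The following sketch builds a few vectors with c, seq and rep and indexes them; the variable names are illustrative only.

```r
v <- c(1, 2, 3, 4)

first    <- v[1]                     # a single element
middle   <- v[2:4]                   # a range of positions
dropped  <- v[-1]                    # everything except the first element
picked   <- v[c(1, 3)]               # arbitrary positions
stepped  <- seq(1, 10, by = 3)       # 1, 4, 7, 10
repeated <- rep(c(1, 2), times = 3)  # 1 2 1 2 1 2
mixed    <- c(1, "a")                # mixing types coerces everything to character

print(stepped)
print(repeated)
print(mixed)
```

Note that indexing starts at 1, and a negative index drops an element rather than counting from the end.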

32 Common Data Structures and Constructs: Matrices
• Matrices
o R matrices are objects internally represented as vectors, with 2 additional attributes: number of rows and number of columns
o matrix(c(object 1, object 2, …, object N), nrow = I, ncol = J)

33 Common Data Structures and Constructs: Matrices
• Examples of matrices:
    matrix(c(1:12), nrow=4, ncol=3)
    matrix(c(1:12), 4, 3)       # same as above
    matrix(c(1:12), nrow=4)     # same as above
    matrix(c(1:12), ncol=3)     # same as above
    matrix(c(1:12), 4, 2)       # does not fit: 12 values for 8 cells (only the first 8 are used; recent R versions warn)
• Other ways to form matrices:
    diag(1, 10)                 # 10x10 identity matrix
    diag("a", 10)               # 10x10 matrix with diagonal of "a"
    diag(c(1:10), 10)           # 10x10 matrix with diagonal entries 1, 2, …, 10

34 Common Data Structures and Constructs: Matrices
• Accessing matrix elements
o matrix[(accessing row vectors), (accessing column vectors)]
    A = matrix(c(1:9), 3, 3)    # assign matrix to variable name A
    A[1, 1]                     # returns the element in row 1, column 1
    A[1, ]                      # returns row 1
    A[, 1]                      # returns column 1
    A[, 1:2]                    # returns columns 1 and 2
    A[1:5]                      # returns (1, 2, 3, 4, 5): the matrix read as a vector, column by column

35 Common Data Structures and Constructs: Matrices
• Matrix manipulation
o Adding a row: rbind(matrix object, vector object)
o Adding a column: cbind(matrix object, vector object)
o Examples:
    A = matrix(c(1:9), 3, 3)
    cbind(A, c(10:12))                  # add (10, 11, 12) as last column
    cbind(A[,1], c(10:12), A[,2:3])     # add (10, 11, 12) as 2nd column
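
A short sketch of matrix indexing and of growing a matrix with cbind and rbind follows. It reuses the 3x3 matrix A from the slides; the other variable names are illustrative.

```r
A <- matrix(1:9, nrow = 3, ncol = 3)        # filled column by column

row1 <- A[1, ]                              # first row
col1 <- A[, 1]                              # first column
flat <- A[1:5]                              # the matrix read as one long vector

last   <- cbind(A, 10:12)                   # (10, 11, 12) as a new last column
second <- cbind(A[, 1], 10:12, A[, 2:3])    # (10, 11, 12) as a new 2nd column
bottom <- rbind(A, 10:12)                   # (10, 11, 12) as a new last row

print(second)
```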

36 Common Data Structures and Constructs: Matrices
• Matrix operation
o Matrix operations on matrices A, B of conforming dimensions
    Addition:        A + B
    Subtraction:     A - B
    Multiplication:  A %*% B
    Inverse:         solve(A)
    Transpose:       t(A)
    Determinant:     det(A)
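
To make the matrix operations concrete, here is a small sketch on a 2x2 example. The matrices are made up for illustration. It also shows that * multiplies element by element, which is easy to confuse with %*%.

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # columns are (2, 1) and (1, 3)
B <- diag(2)                           # 2x2 identity matrix

added       <- A + B
product     <- A %*% B                 # matrix product; equals A because B is the identity
A_inv       <- solve(A)                # inverse of A
check       <- A %*% A_inv             # should be the identity, up to rounding
d           <- det(A)                  # 2*3 - 1*1 = 5
elementwise <- A * A                   # NOT a matrix product: squares each entry

print(A_inv)
print(check)
```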

37 Common Data Structures and Constructs: Lists
• Lists
o Traditionally, vectors and matrices contain simple data objects, mostly primitive data objects. More complex data structures are stored in lists.
o Lists contain objects and their assigned names:
o list(name1=object1, name2=object2, …)
• Example of a list:
    list(foo="hello", bar="world")

38 Common Data Structures and Constructs: Lists
• Accessing elements in a list:
o We can reference objects in lists by their names with the dollar "$" operator:
    alist = list(Friday="happy", Monday="urrr")
    alist$Friday      # returns "happy"
    alist$Monday      # returns "urrr"
o If no object in the list has the name following $, then NULL is returned:
    alist$Tuesday     # returns NULL
o We can also access objects in lists by their index with double brackets [[index]]:
    alist[[1]]        # returns "happy"
    alist[[2]]        # returns "urrr"
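
The list access rules can be checked directly. This sketch reuses alist from the slide and adds two points the slide does not show: single brackets return a sub-list rather than the element itself, and assigning to a new name appends an element.

```r
alist <- list(Friday = "happy", Monday = "urrr")

by_name  <- alist$Friday      # access by name
by_index <- alist[[2]]        # access by position
absent   <- alist$Tuesday     # NULL: no element with that name
sublist  <- alist[1]          # single brackets give a list of length 1

alist$Tuesday <- "meh"        # assigning to a new name appends an element
print(names(alist))
```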

39 Operations: Operating on R objects
• Operating on R objects
o R operations are vector-based
o When the left hand side (LHS) and right hand side (RHS) of an operator conform, each element on the LHS interacts with the matching element on the RHS; a shorter side is reused from its start until it matches the longer one
o Examples:
    c(1, 2) + c(3, 4)           # returns (4, 6)
    c(1, 2) + c(3, 4, 5, 6)     # returns (4, 6, 6, 8): (1, 2) is added to (3, 4) and (5, 6)
    2^c(1, 2, 3, 4)             # returns (2, 4, 8, 16)
    c(1, 2)^c(1, 2, 3, 4)       # returns (1, 4, 1, 16)

40 Operations: Operating on R objects
• Operating on R objects
o Most of the built-in R objects can report their dimensions.
o Examples:
    length(c(1:4))                  # returns 4
    length(list(a=1, b=2))          # returns 2
    length(matrix(c(1:12),4,3))     # returns 12
    nrow(matrix(c(1:12),4,3))       # returns 4
    ncol(matrix(c(1:12),4,3))       # returns 3
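
The reuse of a shorter vector in element-wise operations is called recycling. The sketch below shows that it is equivalent to repeating the shorter vector explicitly with rep, and then queries the size of a matrix. Variable names are illustrative.

```r
short <- c(1, 2)
long  <- c(3, 4, 5, 6)

recycled <- short + long                                     # short is reused: (1, 2, 1, 2)
explicit <- rep(short, length.out = length(long)) + long     # the same thing, spelled out
powers   <- short^long                                       # 1^3, 2^4, 1^5, 2^6

m     <- matrix(1:12, 4, 3)
sizes <- c(length = length(m), rows = nrow(m), cols = ncol(m))

print(recycled)
print(powers)
print(sizes)
```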

41 Control Blocks

42 Control Blocks: Logical expressions
• Logical Expressions
o A logical expression is an expression which evaluates to TRUE or FALSE
o Logical expressions can be formed with the relation operators
    equal:                    ==
    not equal:                !=
    less than:                <
    greater than:             >
    less than or equal to:    <=
    greater than or equal to: >=
o Examples:
    0 < 1         # evaluates to TRUE
    0 > 1         # evaluates to FALSE
    "A" == "a"    # evaluates to FALSE

43 Control Blocks: if-else statement
• if-else statement
o if (logical expression) { … } else { … }
o { … } can be a single expression, or a group of expressions and statements, including another if-else statement. The else part of the statement is optional.
o Examples:
    if (0 < 1) "true"
    if (0 > 1) "should not see anything"
    if ("a" == "A") { "equal" } else { "not equal" }
    if (FALSE) { "nothing" } else if (TRUE) { "something" }

44 Control Blocks: while loop
• While loop
o while (logical expression) { … }
o { … } (the "body" of the statement) can be a single expression, or a group of expressions. The while statement loops inside { … } until the logical expression evaluates to FALSE.
o Examples:
    while (TRUE) { "never ends!!" }
    while (FALSE) { "never executed!!" }
    x=1; while (x==1) { print(x); x=2 }    # prints 1, then assigns 2 to x

45 Control Blocks: for loop
• For loop
o for (index in start:end) { … }
o { … } (the "body" of the statement) can be a single expression, or a group of expressions or statements. The for statement loops in { … } until index exceeds end.
o Example:
    for (i in 1:10) { print(i); }
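
The three control blocks combine naturally. As a small sketch, the for loop below sums the even numbers from 1 to 10 using if-else, and the while loop counts how many doublings it takes to reach 100. The variable names are illustrative.

```r
# for + if-else: add up the even numbers between 1 and 10
even_total <- 0
for (i in 1:10) {
  if (i %% 2 == 0) {
    even_total <- even_total + i
  } else {
    next                      # skip odd numbers
  }
}

# while: keep doubling x until it reaches at least 100
x <- 1
doublings <- 0
while (x < 100) {
  x <- x * 2
  doublings <- doublings + 1
}

print(even_total)
print(c(x = x, doublings = doublings))
```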

46 Read/Write Data

47 Read/Write Data
• Read/Write Data
o Importing and exporting data in R is relatively painless.
o We can easily import/export files where:
    data points are separated by commas
    data points are separated by tabs or spaces
    data points are separated by some other delimiter
• Read SAS/SPSS/Stata data
o The package "foreign" contains functions that allow you to read, among others, SAS/SPSS/Stata data.
    type install.packages("foreign") and select a location to download the package from; the rest is automatic
    type library(foreign) to load the package
    type help(package = foreign) to see a list of functions

48 Read/Write Data: Reading from a file
• Example of reading a file
    # reads a file, data points separated by spaces or tabs
    # assign first column to y, second column to x1, third column to x2
    file = "http://www-personal.umich.edu/~jktc/R/samples/simple.dat"
    read.table(file, col.names=c("y", "x1", "x2"))
    # specify missing data in file
    read.table(file, na.strings=".")
    # if first row of data file has header (names for each column)
    file2 = "http://www-personal.umich.edu/~jktc/R/samples/simple.header.dat"
    read.table(file2, header=TRUE)
    # to see more details of read.table function
    help(read.table)

49 Read/Write Data: Writing to a file
• Example of writing to a file
    data = matrix(c(1:9), 3, 3)
    # write a space separated file.
    # assign first column to y, second column to x1, third column to x2
    write.table(data, file="c:/temp/simple.dat", row.names=FALSE,
                col.names=c("y", "x1", "x2"), sep=" ")
    # to see more details on write.table function
    help(write.table)
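
The read and write examples depend on a remote URL and a Windows path. As a self-contained sketch, the round trip below writes to temporary files instead (tempfile is an assumption made here so the code runs anywhere) and also shows na.strings on a tiny made-up file.

```r
data <- matrix(1:9, 3, 3)

# Write a space-separated file with a header row, then read it back.
path <- tempfile(fileext = ".dat")
write.table(data, file = path, row.names = FALSE,
            col.names = c("y", "x1", "x2"), sep = " ")
back <- read.table(path, header = TRUE)

# A made-up file in which "." marks a missing value.
gaps <- tempfile(fileext = ".dat")
writeLines(c("1 2 .", "4 . 6"), gaps)
with_na <- read.table(gaps, na.strings = ".", col.names = c("y", "x1", "x2"))

print(back)
print(with_na)
unlink(c(path, gaps))         # remove the temporary files
```

read.table returns a data frame, so columns can be accessed by name, for example back$y.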

50 Graphics Samples

51 Graphics Samples: Basic graphics
• R has a sophisticated and powerful graphics engine.
• We can think of the graphics engine as one large object with many attributes representing the different pieces to be displayed. The par function allows you to change different attributes of a graph.
• Take a look at the different graphics parameters that are available in R:
o help(par)

52 Graphics Samples: Sample plot [figure not included in transcript]

53 Graphics Samples: Sample plot [figure not included in transcript]

54 Graphics Samples: Image plot [figure not included in transcript]

55 Graphics Samples: 3-D figure [figure not included in transcript]

56 R Session: Function writing, Plots customization, Simulation tips

57 Functions Writing/Debugging: Function syntax
• Writing and Debugging Functions
o One of the advantages of R is the ease of creating our own functions. Here's a very simple function:
    foo = function() { print("hello world"); }
o Functions are objects themselves.
o We are assigning to the variable "foo" a function with no arguments.
o When executed with foo(), the message "hello world" is printed to the screen.
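
Going one step beyond foo, the sketch below defines a function with an argument and a default value. The name circle_area is hypothetical. Because functions are objects, the function can also be handed to another function such as sapply.

```r
# Hypothetical example: area of a circle; the radius defaults to 1.
circle_area <- function(r = 1) {
  pi * r^2                    # the value of the last expression is returned
}

default_area <- circle_area()             # uses r = 1
named_area   <- circle_area(r = 2)        # argument passed by name
areas        <- sapply(c(1, 2, 3), circle_area)   # function passed as an argument

print(areas)
```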

58 Functions Writing: Function writing session
• Sample function writing session:
1. Generate a population of size 1000 based on the model: [model equation not included in transcript]
2. Take a random sample of size 100 from the population
3. Perform two simple linear regressions of y on x: fit one with intercept, fit one without intercept
4. Repeat steps 2 and 3 500 times, store the regression coefficients from each repetition, and plot a histogram of their distribution over the 500 values (i.e., distributions of estimated coefficients based on samples).
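
The four steps above can be sketched as follows. The model is not given in the transcript, so this sketch ASSUMES y = 2 + 3x + e with x uniform on (0, 1) and e standard normal; the function name one_fit and the seed are also illustrative choices, not the author's code.

```r
set.seed(1)                               # arbitrary seed, for reproducibility

# Step 1: population of size 1000 (ASSUMED model: y = 2 + 3x + N(0, 1) noise)
N   <- 1000
x   <- runif(N)
y   <- 2 + 3 * x + rnorm(N)
pop <- data.frame(x = x, y = y)

# Steps 2 and 3: draw one sample of size n and fit both regressions
one_fit <- function(pop, n = 100) {
  s        <- pop[sample(nrow(pop), n), ]
  with_int <- coef(lm(y ~ x, data = s))       # with intercept
  no_int   <- coef(lm(y ~ x - 1, data = s))   # without intercept
  c(intercept = with_int[[1]], slope = with_int[[2]],
    slope_no_intercept = no_int[[1]])
}

# Step 4: repeat 500 times; one row of coefficients per repetition
coefs <- t(replicate(500, one_fit(pop)))

hist(coefs[, "slope"], main = "Estimated slope over 500 samples",
     xlab = "slope")
print(colMeans(coefs))
```

Wrapping steps 2 and 3 in a function keeps the repetition in step 4 to a single replicate call, which avoids writing an explicit loop and pre-allocating storage.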

59 Functions Writing: Function writing session
• More on graphics:
o To output a plot/graph to a file:
    pdf(file=filename)     # generates pdf file
    jpeg(file=filename)    # generates jpeg file
    png(file=filename)     # generates png file
    and some others
o When you are done graphing/plotting, run dev.off() to have the image saved in the file.
o Without calling the above functions, R generates graphics in a separate window.
o The package "xtable" allows you to output tables in various formats, including HTML, LaTeX, etc.
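
A minimal sketch of the open, plot, close sequence described above. The output goes to a temporary file here (an assumption, so the example does not depend on any particular directory), and the plotted data are made up.

```r
out <- tempfile(fileext = ".pdf")   # illustrative output location

pdf(file = out)                     # open a pdf graphics device
plot(1:10, (1:10)^2, type = "b",
     main = "Squares", xlab = "x", ylab = "x squared")
dev.off()                           # close the device; this writes the file

print(file.exists(out))
```

Forgetting dev.off() is a common cause of empty or unreadable output files.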

60 Functions Writing/Debugging: Common functions
• Help and administrative functions
    help.search(any key word)           # help.search("random forest")
    help(functionName)                  # help(glm)
    install.packages("packageName")     # note the quotes
    require(packageName)
    save(file= , list= )
    save.image(file= )
• Other common functions
o Model fitting: lm, glm, lsfit, anova, summary, coef, residuals
o Model adequacy checking: av.plot (in car package), influence.measures, colldiag

61 Functions Writing/Debugging: Common functions
• Distribution functions
o For the normal distribution, R has 4 associated functions:
    dnorm: probability density function
    pnorm: cumulative distribution function
    qnorm: quantile function (inverse of the cumulative distribution function)
    rnorm: random number generator
o Others:
    dpois, ppois, qpois, rpois (Poisson)
    dgeom, pgeom, qgeom, rgeom (geometric)
    dbinom, pbinom, qbinom, rbinom (binomial)
    dnbinom, pnbinom, qnbinom, rnbinom (negative binomial)
    dunif, punif, qunif, runif (uniform)
    dexp, pexp, qexp, rexp (exponential)
    dgamma, pgamma, qgamma, rgamma (gamma)
    dbeta, pbeta, qbeta, rbeta (beta)
    dchisq, pchisq, qchisq, rchisq (chi-square)
    df, pf, qf, rf (F distribution)
    dt, pt, qt, rt (t distribution)
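
The d/p/q/r naming pattern is easiest to see side by side. The sketch below uses the standard normal distribution, plus one binomial example; the seed is arbitrary.

```r
density_at_0 <- dnorm(0)            # density at 0: 1 / sqrt(2 * pi)
prob_below   <- pnorm(1.96)         # P(Z <= 1.96), about 0.975
quantile_975 <- qnorm(0.975)        # the value with 97.5% of the mass below it, about 1.96
round_trip   <- qnorm(pnorm(1.3))   # q undoes p

set.seed(42)                        # arbitrary seed
draws <- rnorm(5)                   # five random draws from N(0, 1)

two_heads <- dbinom(2, size = 4, prob = 0.5)   # P(2 heads in 4 fair tosses) = 6/16

print(c(density_at_0, prob_below, quantile_975, two_heads))
```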

62 Functions Writing/Debugging: Common functions
• Running R commands in batch mode under a Unix environment
o Suppose the R commands are in the file cmds.R
o At a command line prompt, type:
    R --no-save < cmds.R > output.log 2>&1
o To see the details of the command line options: man R

63 Wrapping up: References
• Official R-project website:
o http://www.r-project.org
o On the left hand side, there's a link "Manuals" under Documentation, which leads to several good manuals.
o The link "packages" gives a listing of available R packages and their documentation.
• An excellent link with R examples (including linking R with C/C++ programs):
o http://www.math.ncu.edu.tw/~chenwc/R_note/
• R for Windows FAQ:
o http://lib.stat.cmu.edu/R/CRAN/bin/windows/base/rw-FAQ.html
• Google:
o Since R is a single letter, searching "R" might give you unrelated results. I've used: R+project, R+cran, R+stat, etc.
o CRAN stands for "Comprehensive R Archive Network"

64 This is it! Q&A
• Thank you!
• The slides are posted at: http://www-personal.umich.edu/~jktc/R/presentation2009.pptx
• The sample R commands in the slides are posted at: http://www-personal.umich.edu/~jktc/R/samples/sample.cmds.2009.R

