Statistical Software An introduction to Statistics Using R Instructed by Jinzhu Jia
Chap 1. R Basics: Installing R; R Data Structures (Vectors, Matrices and Arrays, Lists, Data Frames, Factors); Objects
Installing R: R can be downloaded free of charge from the R Project website, r-project.org (via CRAN). Versions are available for Windows, Mac, and Linux.
An Example: Through this example, we will learn which data structures R uses: data frames, vectors, factors, lists, and matrices.
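The original example data is not part of this transcript. As a stand-in, here is a minimal sketch with made-up class records (the names, scores and the variable names class_data, score_matrix and info are hypothetical) showing where each data structure appears.

```r
# Hypothetical class records (names and scores are made up for illustration)
class_data <- data.frame(
  name   = c("Li Bai", "Li Hei", "Li Hong", "Li Xiaolan"),
  final  = c(90, 85, 93, 78),
  gender = factor(c("Male", "Male", "Female", "Female"))
)

str(class_data)                # overview: a data frame with 3 columns

is.list(class_data)            # a data frame is a special kind of list
is.vector(class_data$final)    # each column is a vector
is.factor(class_data$gender)   # the categorical column is a factor

# A numeric matrix built from one column
score_matrix <- matrix(class_data$final, nrow = 2)

# A list holding objects of different kinds
info <- list(course = "Statistical Software", scores = class_data$final)
```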
Using R as a calculator: Here you see a few functions: sin(), exp(), log(). Try sin(pi/2) and Sin(pi/2); you will find that R is case sensitive (Sin(pi/2) fails because there is no function named Sin). We will talk more about functions later.
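A short sketch of R used as a calculator, including the case-sensitivity check from the slide. The particular numbers are just illustrative inputs.

```r
# Arithmetic follows the usual operator precedence
2 + 3 * 4            # 14
2^10                 # 1024

# Built-in math functions
sin(pi / 2)          # 1
exp(1)               # Euler's number, about 2.718
log(exp(2))          # log() is the natural logarithm by default: 2
log(100, base = 10)  # logarithm with another base: 2

# R is case sensitive: sin() exists, Sin() does not
exists("sin")        # TRUE
exists("Sin")        # FALSE; Sin(pi/2) stops with "could not find function"
```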
Vectors: A vector is an ordered collection of elements of the same basic type. The most common kinds are numeric vectors, logical vectors, and character vectors.
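The three kinds of vectors, and what "same basic type" means in practice, can be sketched as follows: when values of different types are combined with c(), R coerces them all to one common type.

```r
num <- c(1.5, 2, 3)          # numeric vector
lgl <- c(TRUE, FALSE, NA)    # logical vector
chr <- c("a", "b", "c")      # character vector
mode(num)                    # "numeric"
mode(lgl)                    # "logical"
mode(chr)                    # "character"

# All elements must share one type, so mixing types coerces them
mixed <- c(1, "a", TRUE)     # everything becomes character
mixed                        # "1" "a" "TRUE"
c(1, TRUE, FALSE)            # logicals become numbers: 1 1 0
```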
Numeric vectors:
final_scores <- c(100,99,98) ## create a vector; this is also an assignment statement
Notice the differences between R and C: there is no type declaration and no semicolon is needed. final_scores is the name of the created variable, "<-" is the assignment operator, and 100, 99, 98 are the values of the elements of the created vector; they are concatenated with the function c(). Type the variable name in R and hit Enter, and you will see its value on screen.
Numeric vectors -- variables: A variable is used to store information, and its value can be changed. Variable names may use A-Z, a-z, 0-9, period (.) and underscore (_). Variable names cannot include spaces. Variable names are case sensitive. Variable names must start with a letter or a period (a leading period may not be followed by a digit). Variable names cannot be one of the reserved keywords (such as if, else, for, function, TRUE, FALSE, NULL).
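One way to check the naming rules is base R's make.names(), which converts a string into a syntactically valid name; whenever it changes the input, the input was not a valid name as typed. The variable Final_scores below is only there to show case sensitivity.

```r
# Valid names: letters, digits, '.', '_'; start with a letter or '.'
final_scores <- c(100, 99, 98)
final.scores <- c(100, 99, 98)   # a period is allowed as well
Final_scores <- 1                # a different variable: names are case sensitive

# make.names() repairs invalid names, which shows what was wrong
make.names("2x")       # "X2x"    (cannot start with a digit)
make.names("my var")   # "my.var" (no spaces)
make.names("if")       # "if."    (reserved keyword)
make.names("_x")       # "X_x"    (cannot start with '_')
```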
Vectors -- How long is a vector? length(): A vector is an R object. Each object has two intrinsic attributes: mode (or type) and length. We can use mode() and length() to find these two attributes.
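A quick sketch of mode() and length() on the vector from the earlier slide and on two other small vectors:

```r
final_scores <- c(100, 99, 98)
length(final_scores)   # 3
mode(final_scores)     # "numeric"

gender <- c("Male", "Female")
length(gender)         # 2
mode(gender)           # "character"

# A single value is simply a vector of length 1
mode(TRUE)             # "logical"
length(TRUE)           # 1
```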
Vectors – Change the length of a vector: Below is an equivalent way to create the above vector:
final_scores2 <- numeric() ## the length is 0
final_scores2[1] = 100 ## the length is 1
final_scores2[2] = 99 ## the length is 2
final_scores2[3] = 98 ## the length is 3
Note: `=` is also an assignment operator.
Try the following operations: X = 1:10; length(X) = 3. What is X? What are the differences between () and []?
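A sketch of what happens when the length of a vector is assigned: shortening drops elements from the end, and lengthening pads with NA. The last two lines contrast () (a function call) with [] (indexing).

```r
X <- 1:10
length(X) <- 3        # truncate: X is now 1 2 3
X
length(X) <- 5        # extend: new positions are filled with NA
X                     # 1 2 3 NA NA

# () calls a function; [] selects elements of an object
sum(X, na.rm = TRUE)  # function call: 6
X[2]                  # indexing: 2
```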
Vectors – Basic operations
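The content of this slide is not in the transcript. As a stand-in, here is a sketch of common element-wise operations and summary functions on numeric vectors (the example values are arbitrary, and this is not necessarily the slide's own list).

```r
x <- c(1, 2, 3, 4)
y <- c(10, 20, 30, 40)

x + y       # element-wise addition: 11 22 33 44
y - x       # 9 18 27 36
x * y       # element-wise multiplication: 10 40 90 160
y / x       # 10 10 10 10
x^2         # 1 4 9 16
sqrt(x)     # functions apply element by element

sum(x)      # 10
mean(x)     # 2.5
max(x)      # 4
range(y)    # 10 40
sort(c(3, 1, 2), decreasing = TRUE)   # 3 2 1
```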
Vectors – Index vectors: An index vector is used to select subsets of a vector. There are four types of index vectors: a logical vector, a vector of positive integers, a vector of negative integers, and a vector of character strings.
Logical index vectors: A logical index vector should be the same length as the vector from which elements are to be selected (a shorter one is recycled). Values corresponding to TRUE in the index vector are selected. For example, to find the scores that are at least 85: scores[scores >= 85]. Other examples: Y <- X[!is.na(X)] keeps the non-missing values of X, and scores[gender == 'Male'] selects the scores of the male students.
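The slide's examples need some data; here is a sketch with four made-up scores and genders (hypothetical values).

```r
# Hypothetical data: four students' scores and genders
scores <- c(90, 85, 93, 78)
gender <- c("Male", "Male", "Female", "Female")

scores >= 85                  # TRUE TRUE TRUE FALSE
scores[scores >= 85]          # 90 85 93
scores[gender == "Male"]      # 90 85

X <- c(1, NA, 3, NA, 5)
Y <- X[!is.na(X)]             # drop the missing values: 1 3 5
```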
Positive or negative index vectors: A positive index vector can be any length, and its values may repeat. It specifies which elements are included in the result, e.g. X[c(1,5,6,1,2,1)]. A negative index vector specifies which elements are excluded, e.g. X[-c(2,3)].
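Applied to a small vector (the values are arbitrary), the two slide examples give:

```r
X <- c(10, 20, 30, 40, 50, 60)
X[c(1, 5, 6, 1, 2, 1)]   # 10 50 60 10 20 10  (repeats are allowed)
X[-c(2, 3)]              # 10 40 50 60        (elements 2 and 3 dropped)
X[10]                    # NA: the index is beyond the length of X
```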
Index vectors with character strings: This kind of index vector is used when a vector has a names attribute. scores = c(90,85,93,78); names(scores) = c('Li Bai','Li Hei','Li Hong','Li Xiaolan'); scores[c('Li Bai','Li Hong')]
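Vectors – A useful example: Plot a unit circle, i.e. a circle centered at 0 with radius 1.
X = seq(from = -1, to = 1, by = 0.001)
Y = sqrt(1 - X^2) ## upper half of the circle
Z = c(Y, -Y) ## upper half followed by lower half
plot(rep(X,2), Z, type = 'l')
This first plot also draws a horizontal line through the center: after the upper half ends at (1, 0), the line jumps back to (-1, 0) to start the lower half. Traversing the lower half in reverse order closes the curve:
n = length(X)
X1 = c(X, X[n:1])
Z1 = c(Y, -Y[n:1])
plot(X1, Z1, type = 'l')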
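An alternative approach (not the one on the slide) parametrizes the circle by an angle t, so the points are already in drawing order and no reversal is needed. The choice of 1000 points is arbitrary; asp = 1 keeps the x and y scales equal so the circle looks round.

```r
# Points (cos t, sin t) for t from 0 to 2*pi trace the unit circle once
t <- seq(from = 0, to = 2 * pi, length.out = 1000)
x <- cos(t)
y <- sin(t)
plot(x, y, type = "l", asp = 1)   # asp = 1: equal scaling of both axes
```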
Vectors -- Help: Try the following commands: ?plot, ?seq, ?rep, ?'(' and ?'['. You can also search online, e.g. with Google or Baidu.
Logical vectors: The elements of a logical vector can have the value TRUE, FALSE, or NA (not available). Comparison operators: <, <=, >, >=, == (equal), != (not equal). Logical operators: & (and), | (or), ! (not).
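A sketch of how the logical operators treat NA, and of combining comparisons. The scores are hypothetical; sum() counts TRUE as 1 and FALSE as 0.

```r
a <- c(TRUE, FALSE, NA)
a & TRUE     # TRUE FALSE NA
a | TRUE     # TRUE TRUE TRUE    (NA | TRUE is TRUE)
a & FALSE    # FALSE FALSE FALSE (NA & FALSE is FALSE)
!a           # FALSE TRUE NA

scores <- c(90, 85, 93, 78)
scores >= 85 & scores != 93   # TRUE TRUE FALSE FALSE
sum(scores >= 85)             # number of TRUEs: 3
```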
Character vectors: A character string is delimited by double quotes or single quotes; there is no difference between the two. For example, c('Li Bai', 'Li Hong', 'Li Xiaolan') is a character vector with 3 elements. A useful function is paste(): compare paste(c('a','b','c'), c('1','2','3')) with paste(c('X','Y'), 1:10, sep = ""). What are the differences?
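The two paste() calls from the slide produce the following. The last line adds the collapse argument, which joins all results into a single string.

```r
paste(c("a", "b", "c"), c("1", "2", "3"))
# "a 1" "b 2" "c 3"   (the default separator is a space)

paste(c("X", "Y"), 1:10, sep = "")
# "X1" "Y2" "X3" "Y4" "X5" "Y6" "X7" "Y8" "X9" "Y10"
# c("X", "Y") is recycled, and the numbers are converted to strings

paste(c("a", "b", "c"), collapse = "+")   # "a+b+c"
```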
A simple text mining example:
text1 = "China's Jade Rabbit moon rover has endured a long lunar night but is still malfunctioning, state media said on Thursday, after technical problems last month cast uncertainty over the country's first moon landing."
text2 = "Jade Rabbit, named after a lunar goddess in traditional Chinese mythology, landed to domestic fanfare in mid-December, on a mission to do geological surveys and hunt natural resources."
Questions: (1) How many characters are there in text1? (2) How many unique words are there in text1 and text2 combined? (Hint: search online.)
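One possible approach, sketched with base R's nchar(), tolower() and strsplit(). The helper unique_words() is hypothetical, and the answer depends on how "word" is defined: here, as an assumption, words are split on anything that is not a letter, digit or apostrophe, and case is ignored (so "mid-December" counts as two words). Other tokenization rules give other counts.

```r
text1 <- "China's Jade Rabbit moon rover has endured a long lunar night but is still malfunctioning, state media said on Thursday, after technical problems last month cast uncertainty over the country's first moon landing."
text2 <- "Jade Rabbit, named after a lunar goddess in traditional Chinese mythology, landed to domestic fanfare in mid-December, on a mission to do geological surveys and hunt natural resources."

# Hypothetical helper: lower-case the texts, split on runs of characters
# that are not letters, digits or apostrophes, and keep distinct words
unique_words <- function(texts) {
  words <- unlist(strsplit(tolower(texts), "[^a-z0-9']+"))
  unique(words[words != ""])
}

nchar(text1)                            # (1) characters in text1, spaces included
length(unique_words(c(text1, text2)))   # (2) distinct words in both texts
```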
Factors: A factor is a vector-like object that stores categorical data, such as gender; we will learn more later. Just one example for now: tapply(final, gender, mean), where gender is a factor. This function returns an array. The function tapply() applies a function, here mean(), to each group of components of the first argument, here final, where the groups are defined by the levels of the second argument, here gender, as if they were separate vectors.
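A minimal sketch of the tapply() call with made-up final scores and genders (hypothetical data). By default, the levels of a factor are sorted alphabetically.

```r
final  <- c(90, 85, 70, 95)
gender <- factor(c("Male", "Female", "Male", "Female"))
levels(gender)                  # "Female" "Male"

avg <- tapply(final, gender, mean)
avg                             # Female: (85 + 95) / 2 = 90, Male: (90 + 70) / 2 = 80
```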
A note on the vector-recycling rule: Look at the following example: X <- c(3,5,6); Y <- 1; Z <- c(1,2,3,4,5,6). Then X+Y = c(3,5,6) + c(1,1,1) and X+Z = c(3,5,6,3,5,6) + c(1,2,3,4,5,6). In words, shorter vectors in an expression are recycled as often as needed (perhaps fractionally) until they match the length of the longest vector. If the longer length is not a multiple of the shorter one, R still recycles but issues a warning.
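Evaluating the slide's example, plus one case of fractional recycling:

```r
X <- c(3, 5, 6)
Y <- 1
Z <- c(1, 2, 3, 4, 5, 6)
X + Y               # 4 6 7
X + Z               # 4 7 9 7 10 12

c(1, 2, 3) + c(1, 2)
# 2 4 4, with a warning: the longer length (3) is not
# a multiple of the shorter length (2)
```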
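Matrices and Arrays: Construction of a matrix
X = matrix(, nrow = 2, ncol = 2) ## a 2 x 2 matrix filled with NA
X[1,1] = 2
X[2,2] = 3
X = matrix(1:9, ncol = 3) ## filled column by column
X = matrix(1:9, ncol = 3, byrow = TRUE) ## filled row by row
as.vector(X) ## turn a matrix into a vector (column by column)
c(X) ## the same as as.vector(X)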
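A sketch contrasting column-wise and row-wise filling. The names Xr and E are introduced here only to keep the matrices apart.

```r
X <- matrix(1:9, ncol = 3)                  # filled column by column
X[1, ]                                      # 1 4 7
Xr <- matrix(1:9, ncol = 3, byrow = TRUE)   # filled row by row
Xr[1, ]                                     # 1 2 3
as.vector(X)                                # 1 2 3 4 5 6 7 8 9 (column order)
identical(t(X), Xr)                         # TRUE: byrow gives the transpose

E <- matrix(, nrow = 2, ncol = 2)           # 2 x 2 matrix of NA
E[1, 1] <- 2
E[2, 2] <- 3                                # off-diagonal entries stay NA
```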
Indexing matrices: Indexing is used to extract elements from a matrix. Extract an element: X[1,3]. Extract a row: X[1,]. Extract a column: X[,2]. Extract several rows and columns: X[c(1,2), c(3,3,2)] (rows 1 and 2; columns 3, 3 and 2).
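Using X = matrix(1:9, ncol = 3) as an example matrix, the extractions above give:

```r
X <- matrix(1:9, ncol = 3)
X[1, 3]                  # 7
X[1, ]                   # row 1: 1 4 7
X[, 2]                   # column 2: 4 5 6
X[c(1, 2), c(3, 3, 2)]   # rows 1-2, columns 3, 3, 2
#      [,1] [,2] [,3]
# [1,]    7    7    4
# [2,]    8    8    5
```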
Higher dimensional arrays: Take a 3-dimensional array as an example. It can store several matrices of the same size. Say Z = array(dim = c(3,3,2)); then Z[,,1] = X1; Z[,,2] = X2, where X1 and X2 are 3 x 3 matrices.
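A sketch with two hypothetical 3 x 3 matrices, X1 and X2, stored as the two slices of a 3 x 3 x 2 array. The last line builds the same array in one call, since array() fills its data in the same column-major order.

```r
X1 <- matrix(1:9, ncol = 3)     # hypothetical 3 x 3 matrices
X2 <- diag(3)
Z <- array(dim = c(3, 3, 2))    # 3 x 3 x 2 array, initially all NA
Z[, , 1] <- X1
Z[, , 2] <- X2
dim(Z)                          # 3 3 2
Z[2, 3, 1]                      # row 2, column 3 of the first slice: 8
Z[, , 2]                        # the second 3 x 3 slice

Z2 <- array(c(X1, X2), dim = c(3, 3, 2))   # the same array in one call
```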
Operations on Matrices: Transpose: t(X). Dimensions: dim(X), ncol(X), nrow(X). Addition: X + Y. Subtraction: X - Y. Matrix multiplication: X %*% Y (not X * Y, which multiplies element by element). Inversion: solve(X). diag(): investigate diag(X), diag(c(1,2,3)) and diag(3).
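A sketch on two small example matrices (arbitrary values) that shows the difference between * and %*%, checks the inverse, and shows the three uses of diag().

```r
X <- matrix(c(2, 1, 1, 3), nrow = 2)   # [2 1; 1 3]
Y <- matrix(c(1, 0, 2, 1), nrow = 2)   # [1 2; 0 1]
t(X)                  # transpose
dim(X); nrow(X); ncol(X)
X * Y                 # element-wise product: [2 2; 0 3]
X %*% Y               # matrix product:       [2 5; 1 5]
Xinv <- solve(X)      # inverse of X
Xinv %*% X            # the 2 x 2 identity matrix, up to rounding
diag(X)               # diagonal of X: 2 3
diag(c(1, 2, 3))      # 3 x 3 matrix with 1, 2, 3 on the diagonal
diag(3)               # 3 x 3 identity matrix
```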
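Eigenvalues and SVD: Obj = eigen(X) ## eigenvalue decomposition; Obj2 = svd(X) ## singular value decomposition. Each returns a list: eigen() returns the components values and vectors, and svd() returns d (the singular values) and u and v (the left and right singular vectors).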
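A sketch on a small symmetric example matrix (chosen only because its eigenvalues, 3 and 1, are easy to check by hand). The last lines rebuild X from the SVD components as U diag(d) V'.

```r
X <- matrix(c(2, 1, 1, 2), nrow = 2)   # symmetric example matrix
Obj <- eigen(X)
names(Obj)        # "values" "vectors"
Obj$values        # eigenvalues in decreasing order: 3 1
Obj$vectors       # eigenvectors, one per column

Obj2 <- svd(X)
names(Obj2)       # "d" "u" "v"
X_back <- Obj2$u %*% diag(Obj2$d) %*% t(Obj2$v)
all.equal(X, X_back)   # TRUE, up to rounding
```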
cbind() and rbind(): cbind() forms matrices by binding together vectors or matrices column-wise; rbind() does the same row-wise. Vectors are treated as column (or row) matrices, and the recycling rule is applied to short vectors. For example, try cbind(1, c(1,2), c(1,2,3)).
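Working through the slide's example: the longest argument has length 3, so the result has 3 rows; 1 is recycled to 1 1 1, and c(1, 2) is recycled fractionally to 1 2 1, which triggers a warning. The rbind() line is an extra illustration.

```r
M <- cbind(1, c(1, 2), c(1, 2, 3))
# R warns that the number of rows (3) is not a multiple of length(c(1, 2))
M
#      [,1] [,2] [,3]
# [1,]    1    1    1
# [2,]    1    2    2
# [3,]    1    1    3

rows <- rbind(c(1, 2, 3), c(4, 5, 6))   # 2 x 3 matrix, one vector per row
```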
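More comments on factors: table() returns frequency tables. Examples:
tabl = tapply(gender, gender, length)
tabl2 = table(gender)
Best_scores = cut(final, breaks = c(min(final) - 0.5, 85, max(final) + 0.5))
Tab3 = table(Best_scores, gender)
Here cut() turns the numeric scores into a factor with two levels (at most 85, and above 85), and table(Best_scores, gender) cross-tabulates the two factors.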
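Running the slide's commands on made-up data (hypothetical scores and genders). With these values the breaks are 69.5, 85 and 95.5; cut() uses right-closed intervals by default, so a score of exactly 85 falls into the lower group.

```r
final  <- c(90, 85, 70, 95)
gender <- factor(c("Male", "Female", "Male", "Male"))

tabl  <- tapply(gender, gender, length)   # counts per level: Female 1, Male 3
tabl2 <- table(gender)                    # the same counts, as a table

# Two groups: (69.5, 85] and (85, 95.5]
Best_scores <- cut(final, breaks = c(min(final) - 0.5, 85, max(final) + 0.5))
levels(Best_scores)
Tab3 <- table(Best_scores, gender)        # two-way frequency table
Tab3
```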
Lists: Recall that a vector consists of an ordered collection of elements of the same basic type. Matrices also contain elements of a single type (e.g., numeric). A list is a new type of object: an ordered collection of objects of any kind, such as vectors, matrices, and other lists.
Construction of a list: list(name1 = obj1, name2 = obj2). Lists are very useful for returning several values from a function. For example, obj = svd(X) is a list that contains the singular values and the singular vectors. Another example: Lst <- list(name="Fred", wife="Mary", no.children=3, child.ages=c(4,7,9)). What is the difference between indexing Lst with [[ ]] and with [ ]?
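A sketch of the difference: [[ ]] returns the component itself, while [ ] returns a smaller list that contains the component.

```r
Lst <- list(name = "Fred", wife = "Mary", no.children = 3,
            child.ages = c(4, 7, 9))
length(Lst)        # 4 components
names(Lst)         # "name" "wife" "no.children" "child.ages"

Lst[[4]]           # the component itself: the numeric vector 4 7 9
Lst[[4]][2]        # an element of that component: 7
Lst[4]             # a list of length 1 containing the component
class(Lst[[4]])    # "numeric"
class(Lst[4])      # "list"
```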
Modifying Lists: Lst$wife and Lst[['wife']] both retrieve the value of the component named 'wife'. You can also write Lst$w for Lst$wife, as long as w uniquely identifies 'wife', i.e. no other component name starts with 'w'. You can concatenate lists with c(), as in c(list1, list2, list3).
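A sketch of retrieving, modifying and concatenating list components. The component pets and the small lists list1, list2 and list3 are hypothetical. Note that $ allows partial matching, while [[ ]] matches names exactly by default.

```r
Lst <- list(name = "Fred", wife = "Mary", no.children = 3,
            child.ages = c(4, 7, 9))
Lst$wife          # "Mary"
Lst[["wife"]]     # "Mary"
Lst$w             # "Mary": partial match, only "wife" starts with "w"
Lst[["w"]]        # NULL: [[ ]] requires an exact name by default

Lst$wife <- "Susan"                      # modify a component
Lst$child.ages <- c(Lst$child.ages, 1)   # append to a component
Lst$pets <- "cat"                        # add a new (hypothetical) component

list1 <- list(a = 1)
list2 <- list(b = "x")
list3 <- list(c = TRUE)
big <- c(list1, list2, list3)            # one list with 3 components
```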
Data Frames: A data frame is a special list: a list of vectors of the same length. Its components are arranged like a matrix, with each column being one component of the list; unlike a matrix, different columns may have different types (e.g., numeric, character, factor). Some examples:
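The original examples are not in the transcript. Here is a sketch with a small made-up data frame (hypothetical names and scores) showing both its list side and its matrix side. as.character() is used so the result is the same whether or not strings are stored as factors.

```r
df <- data.frame(
  name  = c("Li Bai", "Li Hei", "Li Hong"),
  final = c(90, 85, 93)
)
df$pass <- df$final >= 90     # add a logical column

is.list(df)                   # TRUE: a data frame is a list ...
length(df)                    # ... with 3 components (the columns)
dim(df)                       # 3 rows, 3 columns, like a matrix
df$final                      # a column is a vector
df[2, ]                       # matrix-style indexing of a row
as.character(df[df$final > 88, "name"])   # "Li Bai" "Li Hong"
```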
attach() and detach(): After attach(DF), you can use each column of DF as a vector whose name is the column name. Assigning to such a name creates a new variable in the workspace rather than modifying DF, so the original columns of DF are protected. After detach(DF), the column names of DF are no longer available as variables.
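A sketch of the "protected" behaviour with a hypothetical data frame DF: the assignment after attach() creates a separate variable called score in the workspace, and DF itself is unchanged.

```r
DF <- data.frame(score = c(90, 85, 93), id = 1:3)
attach(DF)
mean(score)          # columns are visible by name: about 89.33
score <- score + 1   # creates a NEW variable 'score' in the workspace;
                     # DF$score is not modified
detach(DF)
DF$score             # still 90 85 93
score                # the workspace copy: 91 86 94
```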
Objects: The following are all R objects: vectors, matrices and arrays, lists, data frames, and factors.