Introduction to R Workshop
June 23-25, 2010
Southwest Fisheries Science Center
3333 North Torrey Pines Court, La Jolla, CA 92037
Eric Archer


Introduction to R
1) How R thinks
   Environment
   Data Structures
   Data Input/Output
2) Becoming a codeR
   Data Selection and Manipulation
   Data Summary
   Functions
3) Visualization and analysis
   Data Processing ('apply' family)
   Plotting & Graphics
   Statistical Distributions
   Statistical Tests
   Model Fitting
   Packages, Path, Options

S, S-Plus, R
"Programming ought to be regarded as an integral part of effective and responsible data analysis" - Venables and Ripley, S Programming
Why R?
  Free
  Open source
  Many packages
  Large support base
  Multi-platform
  Vectorization
S: Chambers, Becker, Wilks; 1984, Bell Labs
S-Plus: 1988 Statistical Sciences; 1993 MathSoft; 2001 Insightful; 2008 TIBCO
R: Ihaka & Gentleman, 1996 (The R Project)

Workspace
Entering commands
  Commands and assignments are executed or evaluated immediately.
  Separate commands with a new line (Enter/Return) or a semicolon.
  Recall previous commands with ↑ or ↓.
  R is case sensitive.
  Everything is some sort of function that does something.
Getting help
> help(mean)
> ?median
> help("[")
> example(mean)
> help.search("regression")
> RSiteSearch("genetics")

Workspace
ls()              list objects in workspace
rm(...)           remove objects from workspace
rm(list = ls())   remove all objects from workspace
save.image()      save workspace (to the file ".RData" by default)
load(".RData")    load saved workspace
history()         view command history
loadhistory()     load command history
savehistory()     save command history
#                 start a comment

Assignment and data creation
<-                 assign
c(...)             combine arguments into a vector
seq(x)             generate sequence from 1 to x
seq(from, to, by)  generate sequence from 'from' to 'to' in increments of 'by'
from:to            generate sequence from 'from' to 'to' in steps of 1
rep(x, times)      replicate x
letters, LETTERS   vectors of the 26 lower- and upper-case letters
> x <- 1
> y <- "A"
> my.vec <- c(1, 5, 6, 10)
> my.nums <- 12:24
> x
[1] 1
> y
[1] "A"
> my.vec
[1]  1  5  6 10
> my.nums
 [1] 12 13 14 15 16 17 18 19 20 21 22 23 24
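The sequence and repetition helpers above can be tried in a few lines. This is a small sketch; the `each` argument of rep() is an extra that is not listed on the slide.

```r
# seq(), rep() and the built-in letter constants
s1 <- seq(5)                             # same as 1:5
s2 <- seq(from = 0, to = 1, by = 0.25)   # 0.00 0.25 0.50 0.75 1.00
r1 <- rep(c("a", "b"), times = 3)        # whole vector repeated 3 times
r2 <- rep(c("a", "b"), each = 2)         # each element repeated 2 times
first.five <- LETTERS[1:5]               # "A" "B" "C" "D" "E"
print(s1); print(s2); print(r1); print(r2); print(first.five)
```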

Data Structures
Object modes (atomic structures)
  integer     whole numbers (15, 23, 8, 42, 4, 16)
  numeric     real numbers (double precision: 3.14, 6.022E23)
  character   text strings ("Hello World", "ROFLMAO", "A")
  logical     TRUE/FALSE or T/F
Object classes
  vector      object with an atomic mode
  factor      vector object with discrete groups (ordered or unordered)
  array       object with multiple dimensions
  matrix      2-dimensional array
  list        vector of components
  data.frame  "matrix-like" list of variables with the same number of rows
Special values
  NULL        object of zero length; test with is.null(x)
  NA          Not Available / missing value; test with is.na(x)
  NaN         Not a Number; test with is.nan(x) (e.g., 0/0, log(-1))
  Inf, -Inf   positive/negative infinity; test with is.infinite(x) (e.g., 1/0)
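Two details on this slide are easy to trip over: a number typed at the prompt is stored as a double, not an integer, and is.na() also returns TRUE for NaN. The sketch below checks both; the vector `v` is made-up illustration data.

```r
# A bare number is stored as a double; the L suffix makes an integer
print(is.integer(15))     # FALSE
print(is.integer(15L))    # TRUE
print(typeof(1:3))        # "integer": the colon operator returns integers

# Special values and the functions that test for them
v <- c(1, NA, 0/0, 1/0, -1/0)   # 0/0 is NaN, 1/0 is Inf, -1/0 is -Inf
na.test  <- is.na(v)            # TRUE for NA and also for NaN
nan.test <- is.nan(v)           # TRUE only for NaN
inf.test <- is.infinite(v)      # TRUE for Inf and -Inf
print(na.test); print(nan.test); print(inf.test)
print(is.null(NULL))            # TRUE
print(length(NULL))             # 0
```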

Vectors
Creation and info
  vector(mode, length)   create vector
  length(x)              number of elements
  names(x)               get or set names
Indexing (by number, character (name), or logical)
  x[n]                   nth element
  x[-n]                  all but the nth element
  x[a:b]                 elements a to b
  x[-(a:b)]              all but elements a to b
  x[c(...)]              specific elements
  x["name"]              element named "name"
  x[x > a]               all elements greater than a
  x[x %in% c(...)]       all elements in the set

Vectors
Create a vector
> x <- 1:5
Give the elements some names
> names(x) <- c("first", "second", "third", "fourth", "fifth")
Select elements based on another vector
> i <- c(1, 5)
> x[i]
first fifth
    1     5
> x[-c(i, 8)]
second  third fourth
     2      3      4

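Vectors
Logical testing
  ==              equals
  >, <, >=, <=    greater than, less than, greater than or equal to, less than or equal to
  !               not
  &, &&           and (single is element-by-element; double uses a single value)
  |, ||           or
Select elements based on a condition
> x <- 1:10
> x < 5
 [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
> x[x < 5]
[1] 1 2 3 4
& vs &&
> x < 5 & x > 2
 [1] FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
> x < 5 && x > 2
[1] FALSE
In R 4.3.0 and later, && and || give an error when either side has length greater than one, so the last command fails there; use && and || only with single TRUE/FALSE values.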
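Where && does belong is in single-valued conditions such as those in if(). A minimal sketch, using a hypothetical helper safe.first.positive(), shows the short-circuit: the right-hand side is not evaluated when the left-hand side already decides the result.

```r
# '&' works element by element on whole vectors
x <- 1:10
in.range <- x > 2 & x < 5          # logical vector of length 10
print(in.range)

# Hypothetical helper: '&&' stops after length(v) > 0 is FALSE,
# so v[1] > 0 is never evaluated for an empty vector
safe.first.positive <- function(v) {
  length(v) > 0 && v[1] > 0
}
print(safe.first.positive(numeric(0)))   # FALSE
print(safe.first.positive(c(3, -1)))     # TRUE
```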

Vectorization
Operators recycle the smaller object enough times to cover the larger object.
> x <- 4
> y <- c(5, 6, 7, 8, 9, 10)
> z <- x + y
> z
[1]  9 10 11 12 13 14
> x <- c(3, 5)
> z <- x + y
> z
[1]  8 11 10 13 12 15
> i <- 1:10
> j <- c(T, T, F)
> i[j]
[1]  1  2  4  5  7  8 10
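Recycling is silent only when the longer length is a multiple of the shorter one. The sketch below reuses `y` from the slide and shows that R still recycles, but warns, otherwise; tryCatch() is used here only to capture that warning.

```r
y <- c(5, 6, 7, 8, 9, 10)

# Length 6 is a multiple of length 2: recycled silently
z1 <- c(3, 5) + y
print(z1)                          # 8 11 10 13 12 15

# Length 6 is not a multiple of length 4: still recycled, but R warns
got.warning <- tryCatch({
  c(1, 2, 3, 4) + y
  FALSE
}, warning = function(w) TRUE)
print(got.warning)                 # TRUE
z2 <- suppressWarnings(c(1, 2, 3, 4) + y)
print(z2)                          # 6 8 10 12 10 12
```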

Object Information
summary(x)       generic summary of object
str(x)           display object structure
mode(x)          get or set storage mode
class(x)         name of object class
is.<type>(x)     test type of object (is.numeric, is.logical, etc.)
attr(x, which)   get or set an attribute of an object
attributes(x)    get or set all attributes of an object

Object Information
> y <- 1:10
> str(y)
 int [1:10] 1 2 3 4 5 6 7 8 9 10
> mode(y)
[1] "numeric"
> class(y)
[1] "integer"
> is.character(y)
[1] FALSE
> is.integer(y)
[1] TRUE
> is.double(y)
[1] FALSE
> is.numeric(y)
[1] TRUE

Object Information
> x <- 1:4
> names(x) <- c("first", "second", "third", "four")
> x
 first second  third   four
     1      2      3      4
> str(x)
 Named int [1:4] 1 2 3 4
 - attr(*, "names")= chr [1:4] "first" "second" "third" "four"
> attributes(x)
$names
[1] "first"  "second" "third"  "four"
> attr(x, "notes") <- "This is a really important vector."
> attributes(x)
$names
[1] "first"  "second" "third"  "four"
$notes
[1] "This is a really important vector."
> attr(x, "date") <- …
> attributes(x)
$names
[1] "first"  "second" "third"  "four"
$notes
[1] "This is a really important vector."
$date
[1] …
> x
 first second  third   four
     1      2      3      4
attr(,"notes")
[1] "This is a really important vector."
attr(,"date")
[1] …

Coercion
as.<type>(x)   coerce object x to <type>, if possible
> x <- 1:10
> x.char <- as.character(x)
> as.numeric(x.char)
 [1]  1  2  3  4  5  6  7  8  9 10
> y <- letters[1:10]
> as.numeric(y)
 [1] NA NA NA NA NA NA NA NA NA NA
Warning message:
NAs introduced by coercion
> z <- "1char"
> as.numeric(z)
[1] NA
Warning message:
NAs introduced by coercion
> logic.chars <- c("TRUE", "FALSE", "T", "F", "t", "f", "0", "1")
> as.logical(logic.chars)
[1]  TRUE FALSE  TRUE FALSE    NA    NA    NA    NA
> logic.nums <- c(-2, -1, 0, 1.5, 2, 100)
> as.logical(logic.nums)
[1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE

Factors
Discrete ordered or unordered data, represented internally as integer codes
factor(x, levels, labels, exclude, ordered)
levels(x)
labels(x)
is.factor(x), is.ordered(x)

Factors
> x <- c("b", "a", "a", "c", "B", "d", "a", "d")
> x.fac <- factor(x)
> x.fac
[1] b a a c B d a d
Levels: a b B c d
> str(x.fac)
 Factor w/ 5 levels "a","b","B","c",..: 2 1 1 4 3 5 1 5
> levels(x.fac)
[1] "a" "b" "B" "c" "d"
> labels(x.fac)
[1] "1" "2" "3" "4" "5" "6" "7" "8"
> as.numeric(x.fac)
[1] 2 1 1 4 3 5 1 5
> as.character(x.fac)
[1] "b" "a" "a" "c" "B" "d" "a" "d"
The level order (a, b, B, c, d) follows the locale's sort order; in the C locale, upper-case letters sort first.

Factors
> x.fac.lvl <- factor(x, levels = c("a", "c"))
> x.fac.lvl
[1] <NA> a    a    c    <NA> <NA> a    <NA>
Levels: a c
> x.fac.exc <- factor(x, exclude = c("a", "c"))
> x.fac.exc
[1] b    <NA> <NA> <NA> B    d    <NA> d
Levels: b B d
> x.fac.lbl <- factor(x, labels = c("L1", "L2", "L3", "L4", "L5"))
> x.fac.lbl
[1] L2 L1 L1 L4 L3 L5 L1 L5
Levels: L1 L2 L3 L4 L5
> x.fac[2] < x.fac[1]
[1] NA
Warning message:
In Ops.factor(x.fac[2], x.fac[1]) : '<' not meaningful for factors
> x.ord <- factor(x, ordered = TRUE)
> x.ord
[1] b a a c B d a d
Levels: a < b < B < c < d
> x.ord[2] < x.ord[1]
[1] TRUE
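Because factors are stored as integer codes, as.numeric() on a factor returns the codes, not the values the labels spell out. This matters when numbers are read in as a factor. A sketch with made-up depth values (the name `depths` is hypothetical) shows the pitfall and the usual fix described in R FAQ 7.10.

```r
# A factor built from numbers stores level positions, not the numbers
depths <- factor(c(10, 50, 20, 50))
print(levels(depths))                              # "10" "20" "50"

codes   <- as.numeric(depths)                      # 1 3 2 3: the integer codes
values  <- as.numeric(as.character(depths))        # 10 50 20 50: the real values
values2 <- as.numeric(levels(depths))[depths]      # same values; converts each level once
print(codes); print(values); print(values2)
```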

Arrays and Matrices
array(data, dim, dimnames)           create array (filled in column-major order: the first index varies fastest)
matrix(data, nrow, ncol, dimnames)   create matrix
x[row, col]                          element at row, col
x[row, ], x[, col]                   vector of a row, or of a column
x["name", ]                          vector of row "name", etc.
dim(x)                               retrieve or set dimensions
nrow(x)                              number of rows
ncol(x)                              number of columns
dimnames(x)                          retrieve or set dimension names
rownames(x)                          retrieve or set row names
colnames(x)                          retrieve or set column names
cbind(...)                           create array from columns
rbind(...)                           create array from rows
t(x)                                 transpose (matrices)
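Column-major filling is easiest to see with a small matrix. matrix() also accepts `byrow = TRUE` to fill across rows instead; array() has no such argument. A minimal sketch:

```r
m.col <- matrix(1:6, nrow = 2)                 # filled down the columns (default)
m.row <- matrix(1:6, nrow = 2, byrow = TRUE)   # filled across the rows
print(m.col)   # rows: 1 3 5 / 2 4 6
print(m.row)   # rows: 1 2 3 / 4 5 6
print(m.col[2, 1])   # 2
print(m.row[2, 1])   # 4
# Transposing a row-filled matrix gives the column-filled layout
print(all(t(m.row) == matrix(1:6, nrow = 3)))   # TRUE
```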

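Arrays and Matrices
Create an array (the 10 values are recycled to fill the 24 cells)
> x <- array(1:10, dim = c(4, 6))
> x
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    5    9    3    7    1
[2,]    2    6   10    4    8    2
[3,]    3    7    1    5    9    3
[4,]    4    8    2    6   10    4
> str(x)
 int [1:4, 1:6] 1 2 3 4 5 6 7 8 9 10 ...
> attributes(x)
$dim
[1] 4 6
> dim(x)
[1] 4 6
> dimnames(x)
NULL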

Arrays and Matrices
Set column or row names
> colnames(x) <- c("col1", "col2", "col3", "col4", "5", "6")
> x
     col1 col2 col3 col4  5 6
[1,]    1    5    9    3  7 1
[2,]    2    6   10    4  8 2
[3,]    3    7    1    5  9 3
[4,]    4    8    2    6 10 4
> colnames(x) <- c("column1", "column2")
Error in dimnames(x) <- dn :
  length of 'dimnames' [2] not equal to array extent
> colnames(x)[1] <- "column1"
> x
     column1 col2 col3 col4  5 6
[1,]       1    5    9    3  7 1
[2,]       2    6   10    4  8 2
[3,]       3    7    1    5  9 3
[4,]       4    8    2    6 10 4

Arrays and Matrices
Set row and column names using dimnames
> dimnames(x) <- list(c("first", "second", "third", "4"), NULL)
> x
       [,1] [,2] [,3] [,4] [,5] [,6]
first     1    5    9    3    7    1
second    2    6   10    4    8    2
third     3    7    1    5    9    3
4         4    8    2    6   10    4
Setting dimension names
> dimnames(x) <- list(my.rows = c("first", "second", "third", "4"), my.cols = NULL)
> x
        my.cols
my.rows  [,1] [,2] [,3] [,4] [,5] [,6]
  first     1    5    9    3    7    1
  second    2    6   10    4    8    2
  third     3    7    1    5    9    3
  4         4    8    2    6   10    4

Arrays
Change dimensionality of an array
> dim(x) <- c(6, 4)
> x
     [,1] [,2] [,3] [,4]
[1,]    1    7    3    9
[2,]    2    8    4   10
[3,]    3    9    5    1
[4,]    4   10    6    2
[5,]    5    1    7    3
[6,]    6    2    8    4
> dim(x) <- c(3, 4, 2)
> x
, , 1
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8    1
[3,]    3    6    9    2
, , 2
     [,1] [,2] [,3] [,4]
[1,]    3    6    9    2
[2,]    4    7   10    3
[3,]    5    8    1    4

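Arrays and Matrices
Bind several vectors into an array
> i1 <- seq(from = 1, to = 20, length = 10)
> i2 <- seq(from = 3.4, to = 25, length = 10)
> i3 <- seq(from = 15, to = 25, length = 10)
> i <- cbind(i1, i2, i3)
> i
             i1   i2       i3
 [1,]  1.000000  3.4 15.00000
 [2,]  3.111111  5.8 16.11111
 [3,]  5.222222  8.2 17.22222
 [4,]  7.333333 10.6 18.33333
 [5,]  9.444444 13.0 19.44444
 [6,] 11.555556 15.4 20.55556
 [7,] 13.666667 17.8 21.66667
 [8,] 15.777778 20.2 22.77778
 [9,] 17.888889 22.6 23.88889
[10,] 20.000000 25.0 25.00000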

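Arrays and Matrices
> j <- rbind(i1, i2, i3)
> j
   [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]
i1  1.0  3.111111  5.222222  7.333333  9.444444 11.555556 13.666667 15.777778 17.888889
i2  3.4  5.800000  8.200000 10.600000 13.000000 15.400000 17.800000 20.200000 22.600000
i3 15.0 16.111111 17.222222 18.333333 19.444444 20.555556 21.666667 22.777778 23.888889
   [,10]
i1    20
i2    25
i3    25
> i <- cbind(col1 = i1, col2 = i2, col3 = i3)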

Lists
Special vector: a collection of elements of different modes, often used as the return type of functions
list(...), vector("list", length)   create list
x[i]                        list containing element i
x[[i]]                      element i
x["name"]                   list containing element "name"
x[["name"]] or x$name       element "name"
unlist(x)                   transform list to a vector

Lists
> x <- list(1:10, c("a", "b"), c(TRUE, TRUE, FALSE, TRUE), 5)
> x
[[1]]
 [1]  1  2  3  4  5  6  7  8  9 10
[[2]]
[1] "a" "b"
[[3]]
[1]  TRUE  TRUE FALSE  TRUE
[[4]]
[1] 5
> is.list(x)
[1] TRUE
> is.vector(x)
[1] TRUE
> is.numeric(x)
[1] FALSE

Lists
What are the elements in a list?
> x[1]
[[1]]
 [1]  1  2  3  4  5  6  7  8  9 10
> str(x[1])
List of 1
 $ : int [1:10] 1 2 3 4 5 6 7 8 9 10
> mode(x[1])
[1] "list"
> x[[1]]
 [1]  1  2  3  4  5  6  7  8  9 10
> str(x[[1]])
 int [1:10] 1 2 3 4 5 6 7 8 9 10
> mode(x[[1]])
[1] "numeric"

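Lists
> y <- list(numbers = c(5, 10:25), initials = c("rnm", "fds"))
> y
$numbers
 [1]  5 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
$initials
[1] "rnm" "fds"
> y$initials
[1] "rnm" "fds"
> y["numbers"]
$numbers
 [1]  5 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
> y$new.element <- "This is new"
> y
$numbers
 [1]  5 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
$initials
[1] "rnm" "fds"
$new.element
[1] "This is new"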
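The difference between `[` and `[[` shows up again when pulling pieces out of `y`. This sketch rebuilds `y` as on the slide and also shows that assigning NULL removes a list element, the same idiom used later to delete data frame columns.

```r
y <- list(numbers = c(5, 10:25), initials = c("rnm", "fds"))

one.list <- y["initials"]      # single brackets: a list of length 1, name kept
one.elem <- y[["initials"]]    # double brackets: the character vector itself
print(class(one.list))         # "list"
print(class(one.elem))         # "character"

second.initial <- y[["initials"]][2]   # index into the element: "fds"
y$initials <- NULL                     # assigning NULL removes the element
print(names(y))                        # "numbers"
```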

Data Frames
Like matrices, but columns can have different modes: an organized list whose components are columns of equal length
x[["name"]] or x$name   column "name"
x[row, column], etc.
> age <- c(1:5)
> color <- c("neonate", "two-tone", "speckled", "mottled", "adult")
> juvenile <- c(TRUE, TRUE, FALSE, FALSE, FALSE)
> spotted <- data.frame(age, color, juvenile)
> spotted
  age    color juvenile
1   1  neonate     TRUE
2   2 two-tone     TRUE
3   3 speckled    FALSE
4   4  mottled    FALSE
5   5    adult    FALSE

Data Frames
> is.matrix(spotted)
[1] FALSE
> is.array(spotted)
[1] FALSE
> is.list(spotted)
[1] TRUE
> is.data.frame(spotted)
[1] TRUE
> spotted$age
[1] 1 2 3 4 5
> spotted$age[2]
[1] 2
> spotted$color[2]
[1] two-tone
Levels: adult mottled neonate speckled two-tone
> spotted[spotted$age < 3, ]
  age    color juvenile
1   1  neonate     TRUE
2   2 two-tone     TRUE

Data Frames
Forcing character columns
> str(spotted)
'data.frame': 5 obs. of  3 variables:
 $ age     : int  1 2 3 4 5
 $ color   : Factor w/ 5 levels "adult","mottled",..: 3 5 4 2 1
 $ juvenile: logi  TRUE TRUE FALSE FALSE FALSE
> spotted2 <- data.frame(age.class = age,
+   color.pattern = color, juvenile.stat = juvenile,
+   stringsAsFactors = FALSE)
> spotted2
  age.class color.pattern juvenile.stat
1         1       neonate          TRUE
2         2      two-tone          TRUE
3         3      speckled         FALSE
4         4       mottled         FALSE
5         5         adult         FALSE
> str(spotted2)
'data.frame': 5 obs. of  3 variables:
 $ age.class    : int  1 2 3 4 5
 $ color.pattern: chr  "neonate" "two-tone" "speckled" "mottled" ...
 $ juvenile.stat: logi  TRUE TRUE FALSE FALSE FALSE
Note: since R 4.0.0, stringsAsFactors defaults to FALSE, so character columns stay character unless you set stringsAsFactors = TRUE. The Factor output shown for spotted reflects the older default.

Data Frames
Deleting columns
> spotted$age <- NULL
> spotted
     color juvenile
1  neonate     TRUE
2 two-tone     TRUE
3 speckled    FALSE
4  mottled    FALSE
5    adult    FALSE
Creating new columns
> spotted$freq <- c(0.3, 0.2, 0.2, 0.15, 0.15)
> spotted$have.data <- TRUE
> spotted
     color juvenile freq have.data
1  neonate     TRUE 0.30      TRUE
2 two-tone     TRUE 0.20      TRUE
3 speckled    FALSE 0.20      TRUE
4  mottled    FALSE 0.15      TRUE
5    adult    FALSE 0.15      TRUE

Data Frames
subset(x, subset, select)
> spotted <- data.frame(age, color, juvenile)   # recreate: the age column was deleted above
> subset(spotted, age >= 3)
  age    color juvenile
3   3 speckled    FALSE
4   4  mottled    FALSE
5   5    adult    FALSE
> subset(spotted, juvenile == FALSE & age <= 4)
  age    color juvenile
3   3 speckled    FALSE
4   4  mottled    FALSE
> subset(spotted, age <= 2, select = c("color", "juvenile"))
     color juvenile
1  neonate     TRUE
2 two-tone     TRUE

Data Input/Output
Directory management
  dir()          list files in directory
  setwd(path)    set working directory
  getwd()        get working directory
  ?files         File and Directory Manipulation
Standard ASCII format
  read.table     create a data frame from a text file
  read.csv       read comma-delimited file
  read.delim     read tab-delimited file
  read.fwf       read fixed-width format file
  write.table    write data to text file
  write.csv      write comma-delimited file
R binary format
  save           write R objects in binary form
  save.image     write current environment in binary form
  load           reload files written with save
R text format
  dump           create text representation of R objects
  source         accept input from text file (scripts)

Data Input/Output
Reading ASCII
> sets <- read.csv("Sets_All.csv", header = TRUE)
> sets$Ordered.Year <- ordered(sets$Year)
> sets$SpotCd.Fac <- factor(sets$SpotCd, exclude = NULL)   # exclude = NULL keeps NA as a level
> spotted.sets <- sets[sets$Sp1Cd == 2, ]
> write.table(spotted.sets, file = "spotted.txt",
+   row.names = FALSE)
Reading R binary
> save(spotted.sets, file = "spotted.RData")
> rm(list = ls())
> load("spotted.RData")
Reading R commands
> positions <- spotted.sets[, c("Latitude", "Longitude")]
> dump("positions", file = "set_positions.R")
> rm(list = ls())
> source("set_positions.R")
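A self-contained version of the same round trips can be run without the workshop files by writing to temporary files. In this sketch, tempfile() supplies throwaway paths and a small spotted data frame stands in for the workshop data; note that load() returns the names of the objects it restored.

```r
# Small stand-in data frame (character column kept as character)
spotted <- data.frame(age = 1:5,
                      color = c("neonate", "two-tone", "speckled", "mottled", "adult"),
                      juvenile = c(TRUE, TRUE, FALSE, FALSE, FALSE),
                      stringsAsFactors = FALSE)

# Text round trip: write.csv() / read.csv()
csv.file <- tempfile(fileext = ".csv")
write.csv(spotted, file = csv.file, row.names = FALSE)   # omit the row-number column
spotted.in <- read.csv(csv.file, stringsAsFactors = FALSE)
str(spotted.in)

# Binary round trip: save() / load()
rdata.file <- tempfile(fileext = ".RData")
save(spotted, file = rdata.file)
rm(spotted)
restored <- load(rdata.file)    # recreates 'spotted' and returns its name
print(restored)                 # "spotted"
```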

Writing Scripts
Text files containing commands and comments, written as if executed on the command line (file names usually end with .R or .r)
Create them from the R GUI (File | New script) or in any text editor (Notepad, Tinn-R, VEDIT, etc.)
Commands are executed with:
  source("filename.r")
  copy/paste
  from the R Editor: Edit | Run...

Exercise 1A: Assemble data frame
1. Assemble a data frame from the "Homework 1" files with only these columns (using these names, in this order): boat (character), skipper (character), lat, lon, year, month, day, mammals, turtles, fish
2. Add a column classifying each trip by season: Winter: Dec-Feb; Spring: Mar-May; Summer: Jun-Aug; Fall: Sep-Nov
3. Add three columns classifying bycatch size (small, medium, large) for each of:
   fish: … > 200 (large)
   turtles: … >= 4 (large)
   mammals: … >= 2 (large)
4. Add a column indicating that the boat needs to be inspected if any bycatch class is "large"
5. Write your new data frame to a .csv file
End Day 1
Exercise 1B: Make a list
1. Read the .csv file from 1A into a clean R environment
2. Create a list with one element for the entire data set and one element per bycatch type (4 elements total). Each bycatch element should contain a named vector of the number of trips with small, medium, and large bycatches
3. How many trips needed to be inspected?
4. How many trips had no bycatch at all?
5. Save the list and the results from 3 & 4 in an R workspace

Data Selection and Manipulation
sample(x, size, replace, prob)   take a random sample from x
cut(x, breaks, labels)           divide vector into intervals
%in%                             return logical vector of matches
which(x)                         return indices of TRUE results
all(...), any(...)               return TRUE if all or any arguments are TRUE
unique(x)                        return unique observations in vector
duplicated(x)                    flag observations that repeat an earlier one
sort                             sort vector or factor
order                            return the sorting permutation; can sort on multiple keys
merge()                          merge two data frames by common columns or row names
ceiling, floor, trunc, round, signif   rounding functions

sample
> x <- 1:5
Sample x (jumble or permute)
> sample(x)
[1] …
Sample from x
> sample(x, 3)
[1] …
Sample with replacement
> sample(x, 10, replace = TRUE)
 [1] …
Sample with modified probabilities (prob gives relative weights; they need not sum to 1)
> cars <- c("Ford", "GM", "Toyota", "VW", "Subaru", "Honda")
> male.wts <- c(6, 5, 3, 1, 3, 3)
> female.wts <- c(3, 3, 4, 8, 3, 6)
> male.survey <- sample(cars, 100, replace = TRUE, prob = male.wts)
> female.survey <- sample(cars, 100, replace = TRUE, prob = female.wts)

cut
cut(x, breaks, labels = NULL, include.lowest = FALSE, right = TRUE, dig.lab = 3, ordered_result = FALSE, ...)
> y <- c(4, 5, 6, 10, 11, 30, 49, 50, 51)
Bins: 5 < y <= 10, 10 < y <= 30, 30 < y <= 50
> y.cut <- cut(y, breaks = c(5, 10, 30, 50))
> y.cut
[1] <NA>    <NA>    (5,10]  (5,10]  (10,30] (10,30] (30,50] (30,50] <NA>
Levels: (5,10] (10,30] (30,50]
> str(y.cut)
 Factor w/ 3 levels "(5,10]","(10,30]",..: NA NA 1 1 2 2 3 3 NA
Bins: 5 <= y <= 10, 10 < y <= 30, 30 < y <= 50
> cut(y, breaks = c(5, 10, 30, 50), include.lowest = TRUE)
[1] <NA>    [5,10]  [5,10]  [5,10]  (10,30] (10,30] (30,50] (30,50] <NA>
Levels: [5,10] (10,30] (30,50]
Bins: 5 <= y < 10, 10 <= y < 30, 30 <= y < 50
> cut(y, breaks = c(5, 10, 30, 50), right = FALSE)
[1] <NA>    [5,10)  [5,10)  [10,30) [10,30) [30,50) [30,50) <NA>    <NA>
Levels: [5,10) [10,30) [30,50)
Bins: 5 <= y < 10, 10 <= y < 30, 30 <= y <= 50
> cut(y, breaks = c(5, 10, 30, 50), include.lowest = TRUE, right = FALSE)
[1] <NA>    [5,10)  [5,10)  [10,30) [10,30) [30,50] [30,50] [30,50] <NA>
Levels: [5,10) [10,30) [30,50]
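For classification tasks like those in Exercise 1A, cut() can attach readable labels directly. The counts and thresholds below are illustrative only, not the exercise's values; using -Inf and Inf as the outer breaks ensures no value falls outside the bins.

```r
# Hypothetical turtle counts and illustrative thresholds:
# small = 0, medium = 1 to 3, large = 4 or more (whole-number counts)
turtles <- c(0, 2, 5, 1, 0, 4, 3)
turtle.class <- cut(turtles,
                    breaks = c(-Inf, 0, 3, Inf),   # intervals (-Inf,0], (0,3], (3,Inf]
                    labels = c("small", "medium", "large"))
print(turtle.class)
print(table(turtle.class))   # small 2, medium 3, large 2
```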

%in%, which
> x <- sample(1:10, 20, replace = TRUE)
> x
 [1] …
[20]  5
> x %in% c(3, 10, 2, 1)
 [1] FALSE  TRUE  TRUE  TRUE FALSE  TRUE FALSE FALSE FALSE
[10]  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE
[19] FALSE FALSE
> x[x %in% c(3, 10, 2, 1)]
 [1] …
> which(x %in% c(3, 10, 2, 1))
 [1]  2  3  4  6 10 12 13 16 17 18
> which(x < 5)
[1] …
> x[which(x > 6)]
[1] …

any, all
> x <- sample(1:10, 20, replace = TRUE)
> x
 [1] …
> any(x == 6)
[1] TRUE
> all(x < 5)
[1] FALSE

unique, duplicated
> x <- sample(1:10, 20, replace = TRUE)
> x
 [1] …
[20] 10
> unique(x)
[1] …
> duplicated(x)
 [1] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE
[10]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
[19]  TRUE  TRUE

sort, order
> x <- sample(1:10, 20, replace = TRUE)
> x
 [1] …
[19]  4  1
> sort(x)
 [1] …
[19]  9 10
> sort(x, decreasing = TRUE)
 [1] …
[19]  1  1
> order(x)
 [1] …
[19] 12  7
> trips <- read.csv("homework 1a df.csv")
> month.sort <- trips[order(trips$month), ]
> month.days.sort <- trips[order(trips$month, trips$day), ]

merge
merge(x, y, by = intersect(names(x), names(y)), by.x = by, by.y = by, all = FALSE, all.x = all, all.y = all, sort = TRUE, suffixes = c(".x", ".y"), ...)
> rm(list = ls())
> load("merge data.rdata")
> str(cranial)
'data.frame': 20 obs. of  2 variables:
 $ id   : Factor w/ 20 levels "Specimen-1","Specimen-12",..: …
 $ skull: num  …
> str(haps)
'data.frame': 20 obs. of  2 variables:
 $ id  : Factor w/ 20 levels "Specimen-1","Specimen-10",..: …
 $ haps: Factor w/ 5 levels "A","B","C","D",..: …
> merge(haps, cranial)
           id haps skull
1  Specimen-1    A     …
2 Specimen-12    A     …
3 Specimen-16    E     …
4 Specimen-22    E     …

merge
> str(sex)
'data.frame': 40 obs. of  2 variables:
 $ specimens: Factor w/ 40 levels "Specimen-1","Specimen-10",..: …
 $ sex      : Factor w/ 2 levels "F","M": …
> str(trials)
'data.frame': 30 obs. of  2 variables:
 $ id   : Factor w/ 23 levels "Specimen-1","Specimen-18",..: …
 $ value: num  …
> merge(sex, trials, by.x = "specimens", by.y = "id")
    specimens sex value
1  Specimen-1   F     …
2 Specimen-11   F     …
3 Specimen-12   M     …
4 Specimen-14   M     …
5 Specimen-15   M     …

String Manipulation
nchar(x)                      number of characters in string
substr(x, start, stop)        extract or replace substrings
strsplit(x, split)            split string
paste(..., sep, collapse)     concatenate vectors
format                        format object for printing
grep, sub, gsub               pattern matching and replacement

nchar, substr, strsplit
> x <- "This is a sentence."
> nchar(x)
[1] 19
> substr(x, 3, 9)
[1] "is is a"
> substr(x, 1, 4) <- "That"
> x
[1] "That is a sentence."
> strsplit(x, " ")
[[1]]
[1] "That"      "is"        "a"         "sentence."
> strsplit(x, "a")
[[1]]
[1] "Th"         "t is "      " sentence."

paste
> sites <- LETTERS[1:6]
> paste("Site", sites)
[1] "Site A" "Site B" "Site C" "Site D" "Site E" "Site F"
> paste("Site", sites, sep = "-")
[1] "Site-A" "Site-B" "Site-C" "Site-D" "Site-E" "Site-F"
> paste("Site", sites, sep = "_", collapse = ",")
[1] "Site_A,Site_B,Site_C,Site_D,Site_E,Site_F"
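grep, sub, and gsub from the string manipulation list take regular expressions. A short sketch on made-up specimen IDs in the style of the merge example:

```r
ids <- c("Specimen-1", "Specimen-12", "Sample-3", "Specimen-22")

pos     <- grep("^Specimen", ids)                 # positions of matches: 1 2 4
matches <- grep("^Specimen", ids, value = TRUE)   # the matching strings themselves
has.m2  <- grepl("-2", ids)                       # logical: FALSE FALSE FALSE TRUE
first   <- sub("-", "_", ids)                     # replace the first match only
no.dig  <- gsub("[0-9]", "", ids)                 # replace every match: drop all digits
nums    <- as.numeric(sub(".*-", "", ids))        # 1 12 3 22: text after the last '-'
print(pos); print(matches); print(has.m2); print(first); print(no.dig); print(nums)
```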

Data Summary
summary            summarize object (different for each class)
table              create contingency table
sum(x), prod(x)    sum and product of vector
cumsum(x)          vector of cumulative sums
rowSums, colSums   compute row or column sums
rowMeans, colMeans compute row or column means
rowsum(x, group)   compute column sums within groups of rows

table
> trips <- read.csv("homework 1a df.csv")
> table(season = trips$season)
season
  Fall Spring Summer Winter
     …      …      …      …
> table(season = trips$season, fish.class = trips$fish.class)
        fish.class
season   Large Medium Small
  Fall       …      …     …
  Spring     …      …     …
  Summer     …      …     …
  Winter     …      …     …
> turtle.class.table <- as.data.frame(table(turtle.class = trips$turtle.class))
> str(turtle.class.table)
'data.frame': 2 obs. of  2 variables:
 $ turtle.class: Factor w/ 2 levels "Large","Small": 1 2
 $ Freq        : int  …
> turtle.class.table
  turtle.class Freq
1        Large    …
2        Small 6557

row/col sums/means
> x <- matrix(1:18, nrow = 6, ncol = 3)
> x
     [,1] [,2] [,3]
[1,]    1    7   13
[2,]    2    8   14
[3,]    3    9   15
[4,]    4   10   16
[5,]    5   11   17
[6,]    6   12   18
> rowSums(x)
[1] 21 24 27 30 33 36
> colMeans(x)
[1]  3.5  9.5 15.5
> rowsum(x, c(1, 1, 2, 2, 3, 3))
  [,1] [,2] [,3]
1    3   15   27
2    7   19   31
3   11   23   35
> rowsum(x, c("a", "a", "a", "b", "b", "b"))
  [,1] [,2] [,3]
a    6   24   42
b   15   33   51

Data Summary
min, max              return minimum or maximum values
range                 return a vector of the minimum and maximum values
which.min, which.max  return the index of the first minimum or maximum value
mean(x)               arithmetic mean of vector
sd, var, cov, cor     standard deviation, variance, covariance, correlation
median(x)             median of vector
quantile(x, probs)    quantiles of vector
> x <- sample(1:100, 50, replace = TRUE)
> mean(x)
[1] …
> median(x)
[1] 51.5
> range(x)
[1] …
> quantile(x, probs = 0.1)
 10%
21.9
> quantile(x, probs = c(0.025, 0.5, 0.975))
 2.5%   50% 97.5%
    …  51.5     …

Functions
fun.name <- function(args) {
  statements
  x      # or return(x)
}
The result of the last statement is the return value.
Arguments (args) are passed by value.
Arguments can have default values.
"..." passes unmatched arguments on to other functions.

Functions
F2C <- function(faren) {
  # converts Fahrenheit to Celsius
  cels <- round((faren - 32) * 5/9, 2)
  paste(faren, "deg. Fahrenheit =", cels, "deg. Celsius", sep = " ", collapse = "")
}
sample.mean <- function(x, sample.size = 10) {
  y <- sample(x, size = sample.size, replace = TRUE)
  mean(y)
}
# a default can depend on another argument
sample.mean <- function(x, sample.size = length(x)) {
  y <- sample(x, size = sample.size, replace = TRUE)
  mean(y)
}
# "..." passes any extra arguments on to sample()
sample.mean <- function(x, ...) {
  y <- sample(x, ...)
  mean(y, na.rm = TRUE)
}
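The last two sample.mean versions can be combined: a default that depends on `x`, plus `...` forwarded to sample(). In this sketch, sample.mean.v2() and F2C.num() are hypothetical names used so as not to overwrite the slide's functions.

```r
# Hypothetical variant: default sample size depends on x; extra args go to sample()
sample.mean.v2 <- function(x, sample.size = length(x), ...) {
  y <- sample(x, size = sample.size, ...)
  mean(y)
}
set.seed(1)
m.perm <- sample.mean.v2(1:10)                   # no replacement: a permutation, mean 5.5
m.boot <- sample.mean.v2(1:10, replace = TRUE)   # replace = TRUE is passed through '...'
print(m.perm); print(m.boot)

# Hypothetical numeric-only version of F2C(); works on whole vectors
F2C.num <- function(faren) round((faren - 32) * 5 / 9, 2)
print(F2C.num(c(32, 212, 98.6)))                 # 0 100 37
```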

Functions
if (cond) {statements} else {statements}   evaluate condition
ifelse(test, yes, no)              evaluate test element by element, returning yes or no
for (var in seq) {statements}      execute one loop for each var in seq
while (cond) {statements}          execute loop as long as condition is true
repeat {statements}                execute expression on each loop (until break)
break                              exit loop
next                               move to next iteration of loop
switch(EXPR, ...)                  select from list of alternatives
print(x)                           print object x to screen
stop("...")                        stop function and print error message
warning("...")                     generate warning message
stopifnot(cond)                    stop if cond is not TRUE

if, ifelse
fishery.status.1 <- function(catch, catch.limit = 20) {
  result <- list(to.close = TRUE, remaining.catch = NA)
  if (catch < catch.limit) {
    result$to.close <- FALSE
    result$remaining.catch <- catch.limit - catch
  } else {
    result$to.close <- TRUE
    result$remaining.catch <- 0
  }
  result
}
fishery.status.2 <- function(catch, catch.limit = 20) {
  to.close <- catch >= catch.limit
  remaining.catch <- ifelse(catch < catch.limit, catch.limit - catch, 0)
  list(to.close = to.close, remaining.catch = remaining.catch)
}
> x <- c(TRUE, TRUE, FALSE)
> y <- c(FALSE, TRUE, FALSE)
> z <- c(TRUE, FALSE, FALSE)
> x & y
[1] FALSE  TRUE FALSE
> x && y
[1] FALSE
> x && z
[1] TRUE
In R 4.3.0 and later, the two && commands give an error because x, y, and z have length 3; write x[1] && y[1] to compare first elements explicitly.
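fishery.status.2() works on a whole vector of catches at once because `>=` and ifelse() are vectorized, whereas fishery.status.1() relies on if(), which needs a single TRUE/FALSE (since R 4.2.0, a longer condition is an error). A quick check with made-up catches:

```r
fishery.status.2 <- function(catch, catch.limit = 20) {
  to.close <- catch >= catch.limit                                         # vectorized comparison
  remaining.catch <- ifelse(catch < catch.limit, catch.limit - catch, 0)   # vectorized if/else
  list(to.close = to.close, remaining.catch = remaining.catch)
}
status <- fishery.status.2(c(5, 20, 35))   # hypothetical catches for three years
print(status$to.close)          # FALSE TRUE TRUE
print(status$remaining.catch)   # 15 0 0
```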

for
make.plates <- function(num.plates) {
  plate.vec <- vector("character", length = num.plates)
  for (i in seq_len(num.plates)) {   # seq_len() also handles num.plates = 0
    first.num <- sample(0:9, 1)
    chars <- sample(LETTERS, 3, replace = TRUE)
    chars <- paste(chars, collapse = "")
    last.nums <- sample(0:9, 3, replace = TRUE)
    last.nums <- paste(last.nums, collapse = "")
    plate.vec[i] <- paste(first.num, chars, last.nums, sep = "", collapse = "")
  }
  plate.vec
}
check.plates <- function(plates, reserved) {
  bad.plates <- vector("character")
  for (plate in plates) {
    plate.str <- substr(plate, 2, 4)   # the three letters
    if (plate.str %in% reserved) bad.plates <- c(bad.plates, plate)
  }
  bad.plates
}
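The loop in check.plates() can be replaced by a single vectorized expression, since substr() and `%in%` both work on whole vectors. A sketch comparing the two on made-up plates; check.plates.vec() is a hypothetical name.

```r
# Loop version, as on the slide
check.plates <- function(plates, reserved) {
  bad.plates <- vector("character")
  for (plate in plates) {
    plate.str <- substr(plate, 2, 4)
    if (plate.str %in% reserved) bad.plates <- c(bad.plates, plate)
  }
  bad.plates
}
# Hypothetical vectorized version: no loop needed
check.plates.vec <- function(plates, reserved) {
  plates[substr(plates, 2, 4) %in% reserved]
}
plates <- c("4ABC123", "7XYZ001", "1ABC999", "2DOG456")
reserved <- c("ABC", "DOG")
print(check.plates(plates, reserved))       # "4ABC123" "1ABC999" "2DOG456"
print(check.plates.vec(plates, reserved))   # same result
```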

bootstrap example
Question: How many trips had "small" bycatches in all categories? More importantly: what is the variance of this measure?
trips <- read.csv("homework 1a df.csv")
boot.bycatch <- function(trip.df, nrep) {
  obs.num.small <- num.all.small(trip.df)
  boot.results <- vector("numeric", nrep)
  for (i in 1:nrep) {
    # resample trips (rows) with replacement
    boot.rows <- sample(1:nrow(trip.df), nrow(trip.df), replace = TRUE)
    boot.df <- trip.df[boot.rows, ]
    boot.results[i] <- num.all.small(boot.df)
  }
  list(observed = obs.num.small, boot.dist = boot.results)
}
num.all.small <- function(trip.df) {
  f.small <- trip.df$fish.class == "Small"
  t.small <- trip.df$turtle.class == "Small"
  m.small <- trip.df$mammal.class == "Small"
  sum(f.small & t.small & m.small)
}
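To answer the variance question, the bootstrap distribution can be summarized directly with var() and quantile(). Because the workshop CSV is not available here, this sketch simulates a hypothetical trips data frame with the three class columns; the class proportions are arbitrary.

```r
set.seed(42)
n.trips <- 200
# Hypothetical trip data with the class columns used by num.all.small()
trips <- data.frame(
  fish.class   = sample(c("Small", "Medium", "Large"), n.trips, replace = TRUE),
  turtle.class = sample(c("Small", "Large"), n.trips, replace = TRUE, prob = c(0.8, 0.2)),
  mammal.class = sample(c("Small", "Medium", "Large"), n.trips, replace = TRUE),
  stringsAsFactors = FALSE
)

num.all.small <- function(trip.df) {
  sum(trip.df$fish.class == "Small" &
      trip.df$turtle.class == "Small" &
      trip.df$mammal.class == "Small")
}

boot.bycatch <- function(trip.df, nrep) {
  obs.num.small <- num.all.small(trip.df)
  boot.results <- vector("numeric", nrep)
  for (i in seq_len(nrep)) {
    boot.rows <- sample(nrow(trip.df), nrow(trip.df), replace = TRUE)  # resample trips
    boot.results[i] <- num.all.small(trip.df[boot.rows, ])
  }
  list(observed = obs.num.small, boot.dist = boot.results)
}

res <- boot.bycatch(trips, nrep = 500)
print(res$observed)
print(var(res$boot.dist))                          # bootstrap variance of the count
print(quantile(res$boot.dist, c(0.025, 0.975)))    # simple percentile interval
```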

Exercise 2A: Reformat dates
1) Use "Homework 2 sets.csv"
2) Write a function to split Date into Year, Month, Day
3) Save the function as an R object
4) Create numeric Year, Month, Day columns in the data frame
5) Create a new Date character column in DD-MM-YY format
6) Remove the old Date column and save the new data frame under a new name
Exercise 2B: Bootstrap fishery closures
1) Use "Homework 2 catches.txt"
2) Write and save a function that takes catch.data, a catch.limit, and a number of bootstrap replicates. The function should bootstrap the catch over all years and return two objects: (1) a distribution of the number of years with closures, and (2) a distribution of the average catch remaining.
3) Run the bootstrap with catch limits of 20 and 50, at 1000 replicates each.
Extra: Create a table showing the frequency distribution of the number of closures in the bootstrap result.
End Day 2

Data Processing - 'apply' family
lapply(X, FUN, ...)           apply function to each element of a list or vector
sapply(X, FUN, ...)           simplified version of lapply
apply(X, MARGIN, FUN, ...)    apply function to margins (rows or columns) of an array
tapply(X, INDEX, FUN, ...)    apply function to groups of values defined by INDEX (a "ragged array")
by(data, INDICES, FUN, ...)   apply function to subsets of a data frame
aggregate(x, by, FUN, ...)    compute function for subsets of an object

lapply
lapply returns a list
> spring.trip <- trips$season == "Spring"
> spring.fish <- trips$fish[spring.trip & trips$fish > 0]
> spring.turtles <- trips$turtles[spring.trip & trips$turtles > 0]
> spring.mammals <- trips$mammals[spring.trip & trips$mammals > 0]
> spring <- list(fish = spring.fish, turtles = spring.turtles, mammals = spring.mammals)
> lapply(spring, length)
$fish
[1] 2525
$turtles
[1] 1274
$mammals
[1] 2119
> lapply(spring, mean)
$fish
[1] …
$turtles
[1] …
$mammals
[1] …

sapply
sapply returns a vector or matrix
> sapply(spring, median)
   fish turtles mammals
      …       …       …
> sapply(spring, function(i) sum(i > 5 & i < 20))
   fish turtles mammals
      …       …       …
> sapply(spring, function(i) c(n = length(i), mean = mean(i), var = var(i)))
     fish turtles mammals
n    2525    1274    2119
mean    …       …       …
var     …       …       …

apply
> bycatch.df <- subset(trips, select = c("fish", "turtles", "mammals"))
Apply across columns
> apply(bycatch.df, 2, mean)
   fish turtles mammals
      …       …       …
> apply(bycatch.df, 2, quantile, probs = c(0.025, 0.975))
      fish turtles mammals
2.5%     …       …       …
97.5%    …       …       …
Apply across rows
> bycatch.sum <- apply(bycatch.df, 1, sum)
> range(bycatch.sum)
[1] …
> mean(bycatch.sum)
[1] …

tapply
Apply a function to groups
> tapply(trips$fish, trips$season, mean)
  Fall Spring Summer Winter
     …      …      …      …
> tapply(trips$fish, list(season = trips$season, class = trips$fish.class), median)
        class
season   Large Medium Small
  Fall       …      …     …
  Spring     …      …     …
  Summer     …      …     …
  Winter     …      …     …
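aggregate(), listed with the apply family but not shown, gives the same group summaries as tapply() but returns a data frame, and its formula interface can summarize several columns at once. A sketch on a tiny made-up trips data frame:

```r
# Hypothetical trips data (column names as on the slides, values made up)
trips <- data.frame(
  season  = c("Winter", "Winter", "Spring", "Spring", "Summer", "Fall"),
  fish    = c(10, 30, 5, 15, 40, 8),
  turtles = c(0, 2, 1, 3, 0, 1)
)
fish.means <- tapply(trips$fish, trips$season, mean)   # named vector, groups sorted
print(fish.means)

# One row per season, one column per summarized variable
agg <- aggregate(cbind(fish, turtles) ~ season, data = trips, FUN = mean)
print(agg)
```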

Exercise 3: Bootstrap with apply
1) Rewrite the bootstrap from Exercise 2B using the apply family
2) Run the bootstrap with catch limits of 10, 15, 20, 30, 50, 60
3) Summarize the mean and median of the results for each catch limit in one object

Simulated growth data
Create a function that simulates growth data according to a Gompertz model. The output should have two columns (age and length). Age should be rounded to two decimal places; length should be rounded to one decimal place. Try to put in checks and traps for screwy input data.
sim.growth.func <- function(age.range, L0, k, g, sd, sample.size)
  age.range     two-element vector giving min and max ages
  L0            length at birth
  k, g          model rate parameters
  sd            standard deviation of the error term
  sample.size   number of samples to return


Simulated growth data
# Gompertz growth function
gomp.func <- function(age.vec, LAB, k, g) {
  LAB * exp(k * (1 - exp(-g * age.vec)))
}
# A function to create simulated growth data according
# to a Gompertz equation
sim.growth.func <- function(age.range, LAB, k, g, std.dev, sample.size = 1000) {
  # Check to make sure age.range is a reasonable vector
  if (!is.numeric(age.range) || !is.vector(age.range) || length(age.range) != 2)
    stop("'age.range' is not a numeric vector of length 2")
  if (any(age.range < 0)) stop("'age.range' < 0")
  if (age.range[1] >= age.range[2]) stop("'age.range[1]' >= 'age.range[2]'")
  # Generate some random ages between min and max of age.range
  random.ages <- runif(sample.size, age.range[1], age.range[2])
  # Calculate the expected length for those ages from the Gompertz equation
  expected.length <- gomp.func(random.ages, LAB, k, g)
  # Add some error to the lengths and return a data frame,
  # rounding age to 2 decimals and length to 1 decimal as specified
  length.err <- rnorm(sample.size, 0, std.dev)
  data.frame(age = round(random.ages, 2),
             length = round(expected.length + length.err, 1))
}
growth <- sim.growth.func(age.range = c(0, 65), LAB = 10, k = 2, g = 0.25, std.dev = 5)
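Written out, the model that gomp.func() implements is the following, with L_0 standing for LAB. At t = 0 the inner term vanishes, so L(0) = L_0, and as t grows the length approaches L_0 e^k; with LAB = 10 and k = 2 that asymptote is about 10 e^2 ≈ 73.9.

```latex
L(t) = L_0 \exp\left\{ k \left( 1 - e^{-g t} \right) \right\}, \qquad
L(0) = L_0, \qquad
\lim_{t \to \infty} L(t) = L_0 \, e^{k}
```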

Plot
plot(x, y = NULL, type = "p", xlim = NULL, ylim = NULL, log = "", main = NULL, sub = NULL, xlab = NULL, ylab = NULL, ann = par("ann"), axes = TRUE, frame.plot = axes, panel.first = NULL, panel.last = NULL, col = par("col"), bg = NA, pch = par("pch"), cex = 1, lty = par("lty"), lab = par("lab"), lwd = par("lwd"), asp = NA, ...)
> plot(growth$age, growth$length, xlab = "Age (years)", ylab = "Length (cm)")

Boxplot
boxplot(formula, data = NULL, ..., subset, na.action = NULL)
boxplot(x, ..., range = 1.5, width = NULL, varwidth = FALSE, notch = FALSE, outline = TRUE, names, plot = TRUE, border = par("fg"), col = NULL, log = "", pars = list(boxwex = 0.8, staplewex = 0.5, outwex = 0.5), horizontal = FALSE, add = FALSE, at = NULL)
> age.breaks <- hist(growth$age)$breaks
> binned.age <- cut(growth$age, breaks = age.breaks)
> boxplot(growth$length ~ binned.age, xlab = "Age bin", ylab = "Length")

Modifying Graphs
abline               add straight lines to plot
lines                join points at coordinates with lines
points               place points on plot
title                add labels to a plot
text                 write text on a plot
?plot.default        default plot options
par                  set or get graphical parameters
layout(mat, ...)     divide graphical screen into a matrix of plotting regions
split.screen(figs, ...)  divide graphical screen into sub-screens
> newborns <- growth[growth$age <= 3, ]
> adults <- growth[growth$age > 3, ]
> plot(newborns$age, newborns$length, xlim = range(growth$age),
+   ylim = range(growth$length), xlab = "Age", ylab = "Length",
+   col = "blue", pch = 21)
> par(new = TRUE)
> plot(adults$age, adults$length, xlim = range(growth$age),
+   ylim = range(growth$length), xlab = "", ylab = "",
+   col = "red", pch = 21)
> abline(v = 3, col = "green")
> text(3, 80, "Transition", pos = 4)

Modifying Graphs
> layout(matrix(c(1, 1, 2, 3), 2, 2, byrow = TRUE))
> plot(growth$age, growth$length, xlab = "Age", ylab = "Length",
+   main = "Simulated growth data")
> age.breaks <- seq(0, max(growth$age) + 5, 5)
> binned.age <- cut(growth$age, age.breaks)
> hist(growth$age, age.breaks, xlab = "Age", main = "")
> boxplot(growth$length ~ binned.age,
+   names = age.breaks[-length(age.breaks)], xlab = "Age bin")

Curve
curve(expr, from, to, n = 101, add = FALSE, type = "l", ylab = NULL, log = NULL, xlim = NULL, ...)
> curve(sin, -10, 10)
> plot(growth$age, growth$length, xlab = "Age",
+   ylab = "Length", main = "")
> curve(10 * exp(2 * (1 - exp(-0.25 * x))),
+   add = TRUE, lty = "dashed", lwd = 2, col = "red")

Statistical Distributions
Function name prefixes:
  d   density
  p   distribution function
  q   quantile function
  r   random number generation
e.g., dunif, dnorm, dgamma, dbeta, dchisq, etc.
> library(help = "stats")
> set.seed(x)        set random number seed
[Figure: dnorm, pnorm, qnorm]
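The four prefixes are companions of one another; for the normal distribution, qnorm() undoes pnorm(). A quick sketch (the seed and the mean and sd of the random draws are arbitrary choices):

```r
dens  <- dnorm(0)        # height of the standard normal density at 0: 1/sqrt(2*pi)
p.val <- pnorm(1.96)     # P(Z <= 1.96), about 0.975
q.val <- qnorm(0.975)    # inverse of pnorm(): about 1.96
print(round(c(dens = dens, p = p.val, q = q.val), 4))

set.seed(123)            # make the random draws reproducible
draws <- rnorm(1000, mean = 50, sd = 10)
print(summary(draws))
```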

78 Statistical Tests
binom.test(x, n, p = 0.5, alternative = c("two.sided", "less", "greater"), conf.level = 0.95)
chisq.test(x, y = NULL, correct = TRUE, p = rep(1/length(x), length(x)), rescale.p = FALSE, simulate.p.value = FALSE, B = 2000)
t.test(x, y = NULL, alternative = c("two.sided", "less", "greater"), mu = 0, paired = FALSE, var.equal = FALSE, conf.level = 0.95, ...)
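A sketch of `binom.test()` and the two common uses of `chisq.test()` (goodness of fit and a contingency table). All counts are made up for illustration; the goodness-of-fit statistic is simple enough to check by hand.

```r
# Exact binomial test: 7 successes in 20 trials against p = 0.5 (made-up counts)
bt <- binom.test(7, 20, p = 0.5)
bt$estimate    # observed proportion, 7/20 = 0.35
bt$p.value

# Goodness of fit: are three categories equally likely? (made-up counts)
counts <- c(18, 22, 20)
gof <- chisq.test(counts)  # default p = rep(1/3, 3), so each expected count E = 20
gof$statistic  # sum((O - E)^2 / E) = (4 + 4 + 0) / 20 = 0.4
gof$parameter  # degrees of freedom = 3 - 1 = 2

# Test of independence on a made-up 2 x 2 table
tab <- matrix(c(12, 8, 5, 15), nrow = 2,
              dimnames = list(sex = c("female", "male"), mature = c("no", "yes")))
chisq.test(tab)  # Yates' continuity correction applies to 2 x 2 tables (correct = TRUE)
```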

79 t-test
> male.growth <- sim.growth.func(c(0, 65), 10, 2.05, 0.27, 5)
> female.growth <- sim.growth.func(c(0, 65), 10, 1.99, 0.23, 4)
> adult.males <- male.growth[male.growth[, "age"] > 18, ]
> adult.females <- female.growth[female.growth[, "age"] > 18, ]
> gender.test <- t.test(adult.males[, "length"], adult.females[, "length"])
> gender.test

        Welch Two Sample t-test

data:  adult.males[, "length"] and adult.females[, "length"]
t = , df = , p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:

sample estimates:
mean of x mean of y

> str(gender.test)
List of 9
 $ statistic  : Named num
  ..- attr(*, "names")= chr "t"
 $ parameter  : Named num
  ..- attr(*, "names")= chr "df"
 $ p.value    : num 3.56e-74
 $ conf.int   : atomic [1:2]
  ..- attr(*, "conf.level")= num 0.95
 $ estimate   : Named num [1:2]
  ..- attr(*, "names")= chr [1:2] "mean of x" "mean of y"
 $ null.value : Named num 0
  ..- attr(*, "names")= chr "difference in means"
 $ alternative: chr "two.sided"
 $ method     : chr "Welch Two Sample t-test"
 $ data.name  : chr "adult.males[, \"length\"] and adult.females[, \"length\"]"
 - attr(*, "class")= chr "htest"
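Because `t.test()` returns an `htest` list, the components listed by `str()` can be extracted directly. A sketch with made-up adult lengths in long format (one row per animal; the means, SDs and sample sizes are arbitrary), which also uses the formula interface instead of two separate vectors:

```r
set.seed(2)
# Made-up adult lengths, not the workshop's simulated data
adult.lengths <- data.frame(sex = rep(c("male", "female"), each = 50),
                            length = c(rnorm(50, mean = 75, sd = 5),
                                       rnorm(50, mean = 65, sd = 4)))

# Formula interface: response ~ two-level grouping variable (Welch test by default)
gender.test <- t.test(length ~ sex, data = adult.lengths)

gender.test$statistic   # t value
gender.test$parameter   # Welch-adjusted degrees of freedom
gender.test$p.value
gender.test$conf.int    # 95% CI for the difference in means (female - male here)
gender.test$estimate    # group means, in factor level order (alphabetical)
```

The grouping column is converted to a factor, so the first level ("female") is treated as `x` and the difference is reported as female minus male.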

80 Model Fitting
Linear Models
lm(formula, data, subset, weights, na.action, method = "qr", model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE, contrasts = NULL, offset, ...)
Analysis of Variance Model
aov(formula, data = NULL, projections = FALSE, qr = TRUE, contrasts = NULL, ...)
Generalized Linear Models
glm(formula, family = gaussian, data, weights, subset, na.action, start = NULL, etastart, mustart, offset, control = glm.control(...), model = TRUE, method = "glm.fit", x = FALSE, y = TRUE, contrasts = NULL, ...)
Nonlinear Least Squares
nls(formula, data, start, control, algorithm, trace, subset, weights, na.action, model, lower, upper, ...)
Non-Linear Minimization
nlm(f, p, hessian = FALSE, typsize = rep(1, length(p)), fscale = 1, print.level = 0, ndigit = 12, gradtol = 1e-6, stepmax = max(1000 * sqrt(sum((p/typsize)^2)), 1000), steptol = 1e-6, iterlim = 100, check.analyticals = TRUE, ...)
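The signatures above are easier to read next to a call. A minimal sketch of two of them, `glm()` and `nlm()`, on made-up data; the maturity variables, the `L50` calculation and the likelihood function `nll` are hypothetical illustrations, not part of the workshop material.

```r
set.seed(3)
# Made-up maturity data: the probability of being mature rises with length
len <- runif(100, 20, 80)
mature <- rbinom(100, size = 1, prob = plogis(-10 + 0.2 * len))

# Logistic regression: binomial family with its default logit link
mat.glm <- glm(mature ~ len, family = binomial)
coef(mat.glm)

# Length at 50% maturity, where the linear predictor b0 + b1 * len equals 0
L50 <- unname(-coef(mat.glm)[1] / coef(mat.glm)[2])
L50

# nlm() minimizes a function of a parameter vector. Made-up example:
# maximum-likelihood estimates of a normal mean and SD (SD on the log scale
# so it stays positive during the search)
set.seed(6)
x <- rnorm(100, mean = 50, sd = 5)
nll <- function(p) -sum(dnorm(x, mean = p[1], sd = exp(p[2]), log = TRUE))
fit.nlm <- nlm(nll, p = c(40, log(10)))
fit.nlm$estimate[1]        # close to mean(x)
exp(fit.nlm$estimate[2])   # close to the ML SD, sqrt(mean((x - mean(x))^2))
```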

81 lm
> sim.growth <- sim.growth.func(c(0, 65), 10, 2, 0.25, 5)
> juv <- as.data.frame(sim.growth[sim.growth[, "age"] < 10, ])
> juv.lm <- lm(length ~ age, juv)
> juv.lm

Call:
lm(formula = length ~ age, data = juv)

Coefficients:
(Intercept)          age

> plot(juv.lm)
Waiting to confirm page change...
> plot(juv)
> abline(coef = juv.lm$coefficients, col = "red", lty = "dashed")
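A self-contained sketch of the same juvenile fit, assuming simulated data in place of `sim.growth.func()` (whose definition is not shown here). Two conveniences are worth knowing: `par(mfrow = c(2, 2))` puts the four diagnostic plots from `plot(juv.lm)` on one page, so R does not stop to wait for a page change, and `abline()` accepts an `lm` fit directly.

```r
set.seed(4)
# Hypothetical stand-in for sim.growth
age <- runif(200, 0, 65)
sim.growth <- data.frame(age = age,
                         length = 10 * exp(2 * (1 - exp(-0.25 * age))) + rnorm(200, sd = 5))
juv <- sim.growth[sim.growth$age < 10, ]

juv.lm <- lm(length ~ age, data = juv)
summary(juv.lm)$coefficients   # estimates, standard errors, t and p values
summary(juv.lm)$r.squared

# All four diagnostic plots on one page; restore the old setting afterwards
op <- par(mfrow = c(2, 2))
plot(juv.lm)
par(op)

plot(length ~ age, data = juv)
abline(juv.lm, col = "red", lty = "dashed")  # uses coef(juv.lm) for the line
```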

82 Model Fitting
fitted  extract fitted values for models
coef  extract model coefficients
resid  extract model residuals
deviance  extract deviances for models
logLik  calculate log-likelihood for model fit
AIC  calculate AIC for model fit
predict  predictions from model results
anova  calculate analysis of variance tables
> coef(juv.lm)
(Intercept)         age

> logLik(juv.lm)
'log Lik.' -508 (df=3)
> AIC(juv.lm)
[1] 1023
> predict(juv.lm, data.frame(age = c(1, 5, 10)))
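The extractor functions fit together: AIC = -2 × log-likelihood + 2 × (number of estimated parameters), and for a simple `lm` those parameters are the two coefficients plus the residual variance, hence `df=3`. That agrees with the printed values above (-508, df = 3, AIC 1023) up to the rounding of the log-likelihood. A sketch on simulated stand-in data:

```r
set.seed(4)
# Hypothetical stand-in for sim.growth and the juvenile subset
age <- runif(200, 0, 65)
sim.growth <- data.frame(age = age,
                         length = 10 * exp(2 * (1 - exp(-0.25 * age))) + rnorm(200, sd = 5))
juv <- sim.growth[sim.growth$age < 10, ]
juv.lm <- lm(length ~ age, data = juv)

# AIC by hand from the log-likelihood and its parameter count
ll <- logLik(juv.lm)
attr(ll, "df")                        # 3: intercept, slope, residual variance
aic.by.hand <- -2 * as.numeric(ll) + 2 * attr(ll, "df")
all.equal(aic.by.hand, AIC(juv.lm))   # TRUE

# Fitted values plus residuals reproduce the observed lengths
all.equal(fitted(juv.lm) + resid(juv.lm), juv$length, check.attributes = FALSE)

# predict() can also return a confidence interval for the mean length
pred <- predict(juv.lm, data.frame(age = c(1, 5, 10)), interval = "confidence")
pred   # columns: fit, lwr, upr
```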

83 nls
> gomp.form <- formula(length ~ LAB * exp(k * (1 - exp(-g * age))))
> growth.nls <- nls(gomp.form, sim.growth, start = c(LAB = 5, k = 5, g = 0.6))
> growth.nls
Nonlinear regression model
  model: length ~ LAB * exp(k * (1 - exp(-g * age)))
   data: sim.growth
  LAB     k     g

 residual sum-of-squares:

Number of iterations to convergence: 6
Achieved convergence tolerance: 9.67e-06
> plot(sim.growth)
> age.vec <- 1:max(sim.growth$age)
> lines(age.vec, predict(growth.nls, list(age = age.vec)), col = "red",
+   lty = "dashed", lwd = 2)
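A self-contained sketch of the same Gompertz fit, assuming data simulated with LAB = 10, k = 2, g = 0.25 (the values used for `sim.growth` above) in place of `sim.growth.func()`. `nls()` needs starting values reasonably near the solution; the ones below are rough guesses chosen for this simulated data, not the slide's values. The sketch also reads off the asymptotic length, `LAB * exp(k)`.

```r
set.seed(5)
# Hypothetical data simulated from the Gompertz model
age <- runif(200, 0, 65)
sim.growth <- data.frame(age = age,
                         length = 10 * exp(2 * (1 - exp(-0.25 * age))) + rnorm(200, sd = 5))

gomp.form <- formula(length ~ LAB * exp(k * (1 - exp(-g * age))))
growth.nls <- nls(gomp.form, data = sim.growth,
                  start = c(LAB = 8, k = 2.5, g = 0.3))

summary(growth.nls)$coefficients   # estimates, standard errors, t and p values
summary(growth.nls)$sigma          # residual standard error

# As age grows, exp(-g * age) -> 0, so length levels off at LAB * exp(k)
asym <- with(as.list(coef(growth.nls)), LAB * exp(k))
asym

# Predicted curve over a grid of ages
age.vec <- seq(0, 65, by = 1)
pred <- predict(growth.nls, newdata = data.frame(age = age.vec))
plot(sim.growth)
lines(age.vec, pred, col = "red", lty = "dashed", lwd = 2)
```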

84 Packages, Path, & Options
library()  list installed packages
library(package)  load package
library(help = "package")  list info about package (build, functions, etc.)
require(package)  load package; returns FALSE (with a warning) if it is not installed
attach(x, pos)  attach database (list, data frame, or file) to search path
detach(x)  remove database from search path
search()  list attached packages and databases on the search path
options(...)  set and examine global options
?Startup  control initialization of an R session
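Two idioms that follow from the table: because `require()` returns `TRUE` or `FALSE` invisibly, it can guard optional code, and because `options()` returns the previous values, a setting can be changed temporarily and restored. A sketch; MASS is used here only as an example of a commonly installed package.

```r
# require() returns TRUE/FALSE invisibly, so it can guard optional code
if (require("MASS", quietly = TRUE)) {
  message("MASS is attached")
} else {
  message("MASS is not installed")
}
"package:stats" %in% search()   # normally TRUE: stats is attached at startup

# options() returns the old values, so a change can be undone
old.opts <- options(digits = 4)
print(pi)                        # [1] 3.142
options(old.opts)                # restore the previous setting
getOption("digits")
```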

