
Published by Kenyon Ledford. Modified over 4 years ago.

1
Introducing R

2
What is R and what can I do with it?
R is freely available software for Windows, Mac OS and Linux
To download R in New Zealand: http://cran.stat.auckland.ac.nz/
What is R?
  A very simple programming language
  A place for you to input data
  A collection of tools for you to perform calculations
  A tool for producing graphics
  A statistics suite that can be downloaded on to any PC, Mac or Linux system
  A software package that can run on high performance computing clusters

3
What is R and what can I do with it?
With R you can:
  Perform simple or advanced statistical tests and analyses, e.g. standard deviation, t-test, principal component analysis
  Read and manipulate data from existing files, e.g. tables in Excel files, trees in nexus files, data on websites
  Write data or figures to files, e.g. export a figure to .pdf, export a .csv file
  Produce simple or advanced figures

4
What is R and what can I do with it?
http://dx.doi.org/10.1098/rspb.2014.0806
Figure 2. A reconstruction of the evolutionary history of carotenoid pigmentation in feathers. The likelihood that ancestors could display carotenoid feather pigments has been reconstructed using ‘hidden’ transition rates in three rate categories (AIC = 4002.5, 11 transition rates) [33]. The POEs (defined in Material and methods) for carotenoid feather pigmentation are identified by red circles. Branches are coloured according to the proportional likelihood of carotenoid-consistent colours at the preceding node. Solid purple points indicate species for which carotenoid feather pigments were confirmed present from chemical analysis; open black points represent those for which carotenoids were not detected in feathers after chemical analysis. Supertree phylogeny from [21].

5
Who is this guide for?
Starting at ground level and shaping you into a confident R user
Are you…
  Completely new to R?
  An infrequent R user who wants a refresher?
The material in these slides may not be useful for confident R users.
An Introduction to R
W. N. Venables, D. M. Smith and the R Core Team
http://cran.r-project.org/doc/manuals/R-intro.pdf

6
What does this guide cover?
Part zero: Getting started
  Interacting with R
Part one: Objects
  Vectors, Matrices, Character arrays
Part two: Data manipulation
  Analysing data, T-test
Part three: External data
  Reading data into R, ANOVA
Part four: Packages and libraries
  Installing new packages into R
Part five: Scripts
  Using pre-written code
Part six: Logic (programming)
  Other functions in R

7
Starting R
This guide will demonstrate the R Console (command-line input) for R 3.0.2 running in Windows 7.
For Mac OS, R can be executed from the terminal.
For Unix, seek professional help…
The only points of difference should be the initial starting of R and the visual appearance: console commands will be the same for all operating systems.

8
Part zero: Getting started
#Throughout this guide a hash symbol (i.e. number sign ‘#’) will identify a comment or instruction
#Start R by finding the R application on your computer
#You will be presented with the R console

9
Part zero: Getting started
#There are a variety of ways of using R, and we will start out with the most basic
#We are going to enter lines of code into R by typing or pasting them into the R console
#At its most basic, R is just a calculator
> 1+1
[1] 2
> 1*3
[1] 3
> 4-7
[1] -3
> 20/4
[1] 5
>
#The lines above this have come from the R Console. Remember to remove the > symbol if you copy text directly from these slides and paste it into R

10
Part zero: Getting started
#Some more basic mathematical operations in R
> 12--2
[1] 14
> 2^2
[1] 4
> sqrt(9)
[1] 3
> 4*(1+2)
[1] 12

11
Part zero: Exercise
#Use R to find the length of the hypotenuse in the triangle shown below
#Side a has length 3, side b has length 4, and the hypotenuse has length h
$h^2 = a^2 + b^2$
$h = \sqrt{a^2 + b^2}$
[Figure: right-angled triangle with sides 3 and 4 and hypotenuse h]

12
Part zero: Exercise
#Use R to find the length of the hypotenuse in the triangle shown below
> sqrt(3^2+4^2)
[1] 5
[Figure: right-angled triangle with sides 3 and 4 and hypotenuse h]

13
Part one: Objects
#R is more than just a basic calculator…
#Most operations in R will use objects, which are values stored in R
#Type x=1 into the R console
#You have now input a number into R by storing that number as an object. For this example, the name of our object is x
#Object names must start with a letter (or a period that is not followed by a digit) and may then contain letters, digits, periods and underscores
#Object names cannot include spaces
> x=1
>
#Congratulations, you have just programmed R to store an object.
#Type x into the R console to recall your object
> x
[1] 1
>
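The naming rules above can be tried out with a short sketch. The object names here are hypothetical and chosen only for illustration; the arrow `<-` is the assignment operator that most R code uses, and at the top level of the console it stores a value in the same way as `=`.

```r
# Both forms store the value 1 in an object at the top level
x = 1
x2 <- 1          # digits are allowed after the first letter

# Periods and underscores are allowed inside names
my.value <- 2
my_value <- 3

# Check that the two assignment forms stored the same value
same <- identical(x, x2)
print(same)
```

Object names are case-sensitive, so an object called `X` would be separate from `x`.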

14
Part one: Objects
#We will now replace the value of x with 10
> x
[1] 1
> x=10
> x
[1] 10
>
#As you can see, the value of an object can be easily replaced by simply making the object equal to a new value

15
Part one: Objects
#Let’s make y into a vector - a one dimensional array
#There are several ways of making a vector in R. These methods introduce functions.
#A function is an operation performed on numbers and/or objects.
#The two easiest ways of making a vector in R use different functions:
#Use the concatenate function c and place numbers inside parentheses
> y=c(10,11,12,13,14,15,16,17,18,19,20)
> y
 [1] 10 11 12 13 14 15 16 17 18 19 20
#Use the array function and place numbers inside parentheses
> y=array(10:20)
> y
 [1] 10 11 12 13 14 15 16 17 18 19 20
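As a rough sketch of how these constructions relate, the block below builds the same sequence in several ways and compares the results. The names `y1` to `y4` are hypothetical; `seq()` and the colon operator are base R tools that the slides do not introduce by name.

```r
# Three ways to build the whole numbers from 10 to 20
y1 <- c(10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20)
y2 <- 10:20
y3 <- seq(from = 10, to = 20, by = 1)

n <- length(y2)              # number of elements
same12 <- all(y1 == y2)      # element-wise comparison, then all()
same23 <- all(y2 == y3)

# array(10:20) holds the same values but also carries a dimension attribute,
# so R does not report it as a plain vector
y4 <- array(10:20)
plain_vector <- is.vector(y2)
array_is_vector <- is.vector(y4)

print(c(n = n, same12 = same12, same23 = same23))
print(c(plain_vector = plain_vector, array_is_vector = array_is_vector))
```

For everyday work the difference rarely matters: indexing with square brackets behaves the same way on `y2` and `y4`.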

16
Part one: Objects
#Just as we replaced x with a single value, we can also replace a single value within our vector
#Let’s replace the fifth number in our vector with 0
> y
 [1] 10 11 12 13 14 15 16 17 18 19 20
> y[5]=0
> y
 [1] 10 11 12 13  0 15 16 17 18 19 20
>
#Square brackets [] placed after a vector will instruct R that we are interested in only a part of the vector. In the example above, we are referring to the fifth position in the vector

17
Part one: Objects
#Try these vector manipulations as well:
> y[1]=y[2]
> y
 [1] 11 11 12 13  0 15 16 17 18 19 20
>
#The value of the first position was changed to be the same as the value in the second position
> y[c(1,3,5)]=5
> y
 [1]  5 11  5 13  5 15 16 17 18 19 20
>
#The values in the first, third and fifth positions were made equal to 5
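Square brackets accept more than single positions. The sketch below, which starts from a fresh vector and uses hypothetical object names, shows a range of positions, a negative index and a logical condition.

```r
y <- 10:20

middle <- y[2:4]        # positions 2 to 4
no_first <- y[-1]       # a negative index drops that position
large <- y[y > 15]      # a logical condition keeps matching elements

# A condition can also be used on the left-hand side to replace values
y[y > 15] <- 0

print(middle)
print(length(no_first))
print(large)
print(y)
```

The condition `y > 15` is evaluated before the replacement, so only the elements that were above 15 at that moment are set to 0.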

18
Part one: Objects
#Onward! We will make a new object, a two-dimensional matrix, and call it z
#Our matrix will have ten rows and ten columns, and we will start out by filling all the cells with 0
> z=matrix(0,ncol=10,nrow=10)
> z
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    0    0    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     0
 [3,]    0    0    0    0    0    0    0    0    0     0
 [4,]    0    0    0    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    0    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    0     0
>

19
Part one: Objects
#We can replace parts of our matrix, like we did with our vector
> z
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    0    0    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     0
 [3,]    0    0    0    0    0    0    0    0    0     0
 [4,]    0    0    0    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    0    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    0     0
> z[1,3]=33
> z
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0   33    0    0    0    0    0    0     0
 [2,]    0    0    0    0    0    0    0    0    0     0
 [3,]    0    0    0    0    0    0    0    0    0     0
 [4,]    0    0    0    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    0    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    0     0
#Here, the two numbers inside the square brackets are a coordinate for the matrix: first row, third column

20
Part one: Objects
#We can replace an entire row by not providing a column coordinate
> z[1,]=33
> z
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]   33   33   33   33   33   33   33   33   33    33
 [2,]    0    0    0    0    0    0    0    0    0     0
 [3,]    0    0    0    0    0    0    0    0    0     0
 [4,]    0    0    0    0    0    0    0    0    0     0
 [5,]    0    0    0    0    0    0    0    0    0     0
 [6,]    0    0    0    0    0    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    0     0
>
#Likewise, we can replace an entire column
> z[,3]=c(1:10)
> z
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]   33   33    1   33   33   33   33   33   33    33
 [2,]    0    0    2    0    0    0    0    0    0     0
 [3,]    0    0    3    0    0    0    0    0    0     0
 [4,]    0    0    4    0    0    0    0    0    0     0
 [5,]    0    0    5    0    0    0    0    0    0     0
 [6,]    0    0    6    0    0    0    0    0    0     0
 [7,]    0    0    7    0    0    0    0    0    0     0
 [8,]    0    0    8    0    0    0    0    0    0     0
 [9,]    0    0    9    0    0    0    0    0    0     0
[10,]    0    0   10    0    0    0    0    0    0     0
>
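The same row and column replacements are easier to follow on a small matrix. This is a minimal sketch with a hypothetical 3 by 3 matrix called `m`; it also shows `dim()` for checking the shape and a range of coordinates for pulling out a sub-matrix.

```r
m <- matrix(0, nrow = 3, ncol = 3)

m[2, ] <- 7        # replace the whole second row
m[, 3] <- 1:3      # replace the whole third column (this overwrites m[2, 3])

shape <- dim(m)    # number of rows, then number of columns
cell <- m[2, 3]    # a single cell: second row, third column
block <- m[1:2, 2:3]   # rows 1 to 2 and columns 2 to 3

print(m)
print(shape)
print(cell)
print(block)
```

When a whole row or column is replaced with several values, the number of values supplied should match the length of that row or column.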

21
Part one: Objects
#Lastly, we will make a character array, which is like a vector or a matrix except that its cells hold text. Letters and numbers can both be stored, but the numbers are stored as text
> w=matrix("df",ncol=10,nrow=10)
> w
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df"
 [2,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df"
 [3,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df"
 [4,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df"
 [5,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df"
 [6,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df"
 [7,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df"
 [8,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df"
 [9,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df"
[10,] "df" "df" "df" "df" "df" "df" "df" "df" "df" "df"
>
#So, this covers the basics of creating objects for storing data in R.
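One consequence of mixing letters and numbers is worth seeing directly: a vector or matrix holds a single type, so numbers placed alongside text are converted to text. The sketch below uses hypothetical objects `v` and `m` to illustrate this.

```r
# Mixing numbers and text in c() gives a character vector
v <- c(1, 2, "three")
v_is_text <- is.character(v)

# Writing text into one cell converts the whole numeric matrix to text
m <- matrix(0, nrow = 2, ncol = 2)
m[1, 1] <- "a"
m_is_text <- is.character(m)
other_cell <- m[2, 2]          # the former zero, now stored as text

# Text that looks like a number must be converted back before arithmetic
back_to_number <- as.numeric(other_cell) + 1

print(v)
print(m)
print(back_to_number)
```

If a column of numbers unexpectedly refuses to take part in calculations, checking it with `is.character()` is a reasonable first step.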

22
Part one: Objects
#Let’s clean out the objects that we made in Part One
> ls()
[1] "w" "x" "y" "z"
>
#The list objects command ls() will show us which objects are stored in R
#We can permanently remove a specific object with the rm() function
> rm(x)
> ls()
[1] "w" "y" "z"
>
#We can also remove all objects
> rm(list = ls())
> ls()
character(0)

23
Part one: Exercise
#Make a new matrix object with three columns and seven rows, and fill every cell with the number 9. Use your first name as the name of the matrix object.
#Make a new vector object with the numbers 101, 898 and -3. Use your surname as the name of the vector object.
#Replace the fourth row of your matrix with your vector.

24
Part one: Exercise
#Make a new matrix object with three columns and seven rows, and fill every cell with the number 9. Use your first name as the name of the matrix object.
> daniel=matrix(9,ncol=3,nrow=7)
> daniel
     [,1] [,2] [,3]
[1,]    9    9    9
[2,]    9    9    9
[3,]    9    9    9
[4,]    9    9    9
[5,]    9    9    9
[6,]    9    9    9
[7,]    9    9    9
#Make a new vector object with the numbers 101, 898 and -3. Use your surname as the name of the vector object.
> thomas=c(101,898,-3)
> thomas
[1] 101 898  -3
#Replace the fourth row of your matrix with your vector.
> daniel[4,]=thomas
> daniel
     [,1] [,2] [,3]
[1,]    9    9    9
[2,]    9    9    9
[3,]    9    9    9
[4,]  101  898   -3
[5,]    9    9    9
[6,]    9    9    9
[7,]    9    9    9

25
HELP!
#You can call on the help function if you become lost or stuck when using R
#Can’t remember how to make a matrix?
> ?matrix
>

26
Part two: Data manipulation
#This will be a worked example for a Student’s T-test for the means of two samples, showcasing the storage and analysis of data in R

27
Part two: Data manipulation
#Make x a vector containing 1000 random numbers
> set.seed(1)
> x=rnorm(1000)
#Make y a vector containing 1000 random numbers
> set.seed(100)
> y=rnorm(1000)
#The random numbers in R are not truly random: they are produced by a pseudo-random number generator, an algorithm whose output has many characteristics of random data. Using the set.seed function, we fix the starting point of the generator, so the same seed always gives the same sequence of ‘random’ numbers. This means that we should all get the same results from our ‘random’ numbers
#We will use Student’s T-test to see if the mean of x and mean of y are significantly different
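The effect of set.seed can be checked with a small sketch. The object names `first`, `second` and `third` are hypothetical; the point is only that resetting the same seed repeats the same draws, while drawing again without resetting continues the sequence.

```r
set.seed(1)
first <- rnorm(5)

set.seed(1)            # same seed again
second <- rnorm(5)     # repeats the same five numbers

third <- rnorm(5)      # no reseed: the next five numbers in the sequence

repeated <- identical(first, second)
continued <- identical(first, third)

print(repeated)
print(continued)
```

This is why the guide sets a seed immediately before each call to rnorm: it is the pairing of seed and call that makes the numbers reproducible.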

28
Part two: Data manipulation
#What are the assumptions for a T-test?
#1) That the two samples (x and y) are each normally distributed
#2) That the two samples have the same variance
#3) That the two samples are independent
#These are calculated data so we will assume that 3) is true.
#We should test 1) and 2) if we want our T-test results to be meaningful!

29
Part two: Data manipulation
#We will use the Shapiro-Wilk test [1] to see if the data are normally distributed
#The Shapiro-Wilk test calculates a normality statistic (W) and tests the null hypothesis that the data are normally distributed
#We would reject the null hypothesis for our sample if we received a p-value of <0.05
#To perform a Shapiro-Wilk test in R we use the shapiro.test function
> shapiro.test(x)

        Shapiro-Wilk normality test

data:  x
W = 0.9988, p-value = 0.7256

>
> shapiro.test(y)

        Shapiro-Wilk normality test

data:  y
W = 0.9993, p-value = 0.9765

[1] Shapiro SS & Wilk MB. 1965. An analysis of variance test for normality (complete samples). Biometrika 52: 591–611
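The printed report is convenient to read, but the result of shapiro.test can also be stored as an object and its parts pulled out by name. The sketch below recreates x as in the guide; `res` and `normal_ok` are hypothetical names, and the 0.05 threshold is the one stated above.

```r
set.seed(1)
x <- rnorm(1000)

res <- shapiro.test(x)

parts <- names(res)          # the named pieces stored in the result
w_stat <- res$statistic      # the W statistic
p_val <- res$p.value         # the p-value as a plain number

# TRUE when we do not reject the null hypothesis of normality at 0.05
normal_ok <- p_val >= 0.05

print(parts)
print(w_stat)
print(p_val)
print(normal_ok)
```

Storing the p-value this way makes it possible to use it in later calculations rather than copying it from the console by hand.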

30
Part two: Data manipulation
#We will use an F-test [1] to see if x and y have equal variances
#The null hypothesis of this F-test is that the two datasets have equal variances, and this hypothesis is rejected if the p-value is <0.05
#We calculate an F-test for equal variances in R using the var.test function
> var.test(x,y)

        F test to compare two variances

data:  x and y
F = 1.0084, num df = 999, denom df = 999, p-value = 0.8947
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.890733 1.141648
sample estimates:
ratio of variances
          1.008417

[1] Box, G.E.P. (1953). "Non-Normality and Tests on Variances". Biometrika 40 (3/4): 318–335.
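The F statistic reported by var.test is the ratio of the two sample variances, which can be confirmed by hand. This sketch recreates x and y as in the guide; `f_manual` and `res` are hypothetical names.

```r
set.seed(1)
x <- rnorm(1000)
set.seed(100)
y <- rnorm(1000)

# Ratio of the sample variances, computed directly
f_manual <- var(x) / var(y)

res <- var.test(x, y)
f_reported <- unname(res$statistic)   # drop the "F" label for comparison

agrees <- isTRUE(all.equal(f_reported, f_manual))

print(f_manual)
print(f_reported)
print(agrees)
```

A ratio close to 1 means the two samples have similar spread; the p-value then indicates whether the departure from 1 is larger than chance would explain.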

31
Part two: Data manipulation
#Are your x and y normally distributed? (hint… mine are)
#Do your x and y have equal variances? (hint… mine do)

32
Part two: Data manipulation
#Let’s perform the Student’s T-test and see if the mean of x and the mean of y are significantly different
#We will use a simple form of the t.test function. This test requires three pieces of information: x, y, and information about equal variance
> t.test(x,y,var.equal=TRUE)

        Two Sample t-test

data:  x and y
t = -0.6161, df = 1998, p-value = 0.5379
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.11903134  0.06212487
sample estimates:
  mean of x   mean of y
-0.01164814  0.01680509

#The null hypothesis for this test is that x and y have the same mean value. The confidence level was set at 0.95 (a significance level of 0.05), so the rejection criterion would be a p-value less than 0.05. Did we reject the null hypothesis?
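To see where the t statistic, degrees of freedom and p-value come from, the equal-variance test can be reproduced by hand. This is a sketch of the standard pooled-variance formula, using x and y recreated as in the guide; all other names are hypothetical.

```r
set.seed(1)
x <- rnorm(1000)
set.seed(100)
y <- rnorm(1000)

nx <- length(x)
ny <- length(y)

# Pooled variance: a weighted average of the two sample variances
pooled_var <- ((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2)

# t statistic: difference in means divided by its standard error
t_manual <- (mean(x) - mean(y)) / sqrt(pooled_var * (1 / nx + 1 / ny))
df_manual <- nx + ny - 2

# Two-sided p-value from the t distribution
p_manual <- 2 * pt(-abs(t_manual), df = df_manual)

res <- t.test(x, y, var.equal = TRUE)

print(c(t = t_manual, df = df_manual, p = p_manual))
print(res$p.value)
```

The degrees of freedom, 1000 + 1000 - 2 = 1998, match the df value in the console output above.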

33
Part two: Exercise
#Generate vector objects a and b as below
> set.seed(10)
> a=rnorm(1000,sd=2)
> set.seed(50)
> b=rnorm(1000,sd=1)
#Is the mean of a significantly different from the mean of b? Is it appropriate to use a Student’s T-test to address this question?

34
Part two: Exercise
> shapiro.test(a)

        Shapiro-Wilk normality test

data:  a
W = 0.9979, p-value = 0.2538

> shapiro.test(b)

        Shapiro-Wilk normality test

data:  b
W = 0.9978, p-value = 0.2242

> var.test(a,b)

        F test to compare two variances

data:  a and b
F = 3.7431, num df = 999, denom df = 999, p-value < 2.2e-16
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 3.306307 4.237678
sample estimates:
ratio of variances
          3.743136

#The F-test rejects the hypothesis of equal variances, so a Student’s T-test is not appropriate. Setting var.equal=F makes R perform Welch’s t-test, which does not assume equal variances
> t.test(a,b,var.equal=F)

        Welch Two Sample t-test

data:  a and b
t = 0.3949, df = 1497.218, p-value = 0.693
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.1106290  0.1663946
sample estimates:
   mean of x    mean of y
 0.022749483 -0.005133326

>
#Is the mean of a different from the mean of b?
#p-value = 0.693: we fail to reject the null hypothesis that the means are equal.
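The non-integer degrees of freedom in the Welch output come from the Welch-Satterthwaite approximation. The sketch below reproduces the t statistic and degrees of freedom by hand with a and b recreated as in the exercise; the intermediate names are hypothetical.

```r
set.seed(10)
a <- rnorm(1000, sd = 2)
set.seed(50)
b <- rnorm(1000, sd = 1)

# Squared standard error of each sample mean
se_a2 <- var(a) / length(a)
se_b2 <- var(b) / length(b)

# Welch t statistic: no pooling of the two variances
t_welch <- (mean(a) - mean(b)) / sqrt(se_a2 + se_b2)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se_a2 + se_b2)^2 /
  (se_a2^2 / (length(a) - 1) + se_b2^2 / (length(b) - 1))

res <- t.test(a, b, var.equal = FALSE)

print(c(t = t_welch, df = df_welch))
print(res$p.value)
```

Because the variances differ, the approximate degrees of freedom fall below the 1998 that a pooled test on two samples of 1000 would use.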

35
Part three: External data
#Datasets can often be too large to type into R. This section of the guide will show you how to automatically read data into R and then perform an analysis
#For this test we will perform a one-way analysis of variance (ANOVA)
#Right click on the dataset embedded above the arrow, move the mouse to ‘Macro-Enabled Worksheet Object’, click Open, and then save the table as IUCN.csv (a comma separated values file) to a folder on your computer
#The dataset contains a count of endangered species for sixty randomly selected countries in three different regions. These data have been extracted from Table 6a of the IUCN Red List summary statistics: http://www.iucnredlist.org/documents/summarystatistics/2010_3RL_Stats_Table_6a.pdf

36
Part three: External data
#We are going to use a one-way ANOVA to see if the mean number of endangered species is different in different regions (AFRICA, ASIA and EUROPE).
#First step: we will now tell R where to look for the file, using the setwd() function
> setwd("H:/Projects/Teaching/R")
#Hint: your working directory will be different to mine
#Note: we use forward slashes / and not backslashes \
#Second step: we read the file into R as a new object called IUCN. The term sep="," is used because values in the dataset are separated by commas. The term header=T is used because the first row of the IUCN table contains column names
> IUCN=read.table("IUCN.csv",sep=",",header=T)
#Alternatively, if we know the full file path, then we could read the file into R without using setwd()
> IUCN=read.table("H:/Projects/Teaching/R/IUCN.csv",sep=",",header=T)
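The reading step can be rehearsed without the real file. The sketch below writes a tiny made-up table to a temporary .csv file and reads it back; the column names and values are hypothetical stand-ins and are not taken from IUCN.csv. It also shows read.csv, a base R shortcut with the comma separator and header already set.

```r
# Hypothetical stand-in data: names and counts are invented for illustration
toy <- data.frame(
  Country = c("A", "B", "C", "D"),
  Region = c("AFRICA", "ASIA", "EUROPE", "AFRICA"),
  Endangered = c(12, 30, 7, 21)
)

# Write the table to a temporary comma separated values file
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)

# Read it back using the same arguments as in the guide
dat1 <- read.table(path, sep = ",", header = TRUE)

# read.csv is a shortcut for comma separated files with a header row
dat2 <- read.csv(path)

str(dat1)       # column names and types
print(dim(dat2))
```

Running `getwd()` at any time shows which folder R is currently using as its working directory.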

37
Part three: External data
#What are the assumptions for a one-way ANOVA?
#1) That the data in each group have been randomly selected from a normal distribution
#2) That each group of data has the same variance
#3) That each group of data is independent
#Assumption 3) may be unlikely but we will assume it is true.
#We should test 1) and 2) if we want our ANOVA results to be meaningful!

38
Part three: External data
#We will use the Shapiro-Wilk test to see if the data from each region (AFRICA, ASIA and EUROPE) are normally distributed
#First though, we will separate out the data for each region so that we can test each one for normality separately
> af=IUCN[which(IUCN[,2]=="AFRICA"),3]
#Let’s take a closer look: IUCN[,2] calls up the second column of the IUCN object
#The which() function is asking ‘which of the values in column 2 of the IUCN object are equal to “AFRICA”?’ This gives us the row numbers of the African countries.
> which(IUCN[,2]=="AFRICA")
#Now we can use the Africa row numbers to find the number of endangered species for each African country. These species counts are stored in column 3 of the IUCN object.
> IUCN[which(IUCN[,2]=="AFRICA"),3]
#Finally, we store the endangered species counts for African countries as the af object
> af=IUCN[which(IUCN[,2]=="AFRICA"),3]
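#A small sketch of the same subsetting idea on a made-up table (the object toy, its Country column and all of its values are hypothetical). It shows each intermediate step, plus two equivalent spellings that skip which().

```r
# Made-up miniature of the IUCN table (values are illustrative only)
toy = data.frame(Country = c("A", "B", "C", "D"),
                 Region = c("AFRICA", "ASIA", "AFRICA", "EUROPE"),
                 Endangered_species = c(12, 7, 9, 3))

is_africa = toy[, 2] == "AFRICA"   # TRUE or FALSE for every row
rows = which(is_africa)            # positions of the TRUE values

toy[rows, 3]                       # counts selected by row number
toy[is_africa, 3]                  # same result using the logical vector directly
toy$Endangered_species[toy$Region == "AFRICA"]   # same result using column names
```

#Using column names instead of column numbers keeps the code working if the column order of the file ever changes.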

39
Part three: External data
#Repeat for ASIA and EUROPE
> ai=IUCN[which(IUCN[,2]=="ASIA"),3]
> eu=IUCN[which(IUCN[,2]=="EUROPE"),3]
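#The guide names the Shapiro-Wilk test but does not show the call. A minimal sketch using base R's shapiro.test() on simulated counts follows; af_sim is a hypothetical stand-in, not the real IUCN data. With the real objects the calls would presumably be shapiro.test(af), shapiro.test(ai) and shapiro.test(eu).

```r
set.seed(42)                      # arbitrary seed so the sketch is repeatable
af_sim = rpois(30, lambda = 10)   # simulated counts, NOT the real IUCN values

sw = shapiro.test(af_sim)         # null hypothesis: the data come from a normal distribution
sw$statistic                      # the W statistic
sw$p.value                        # a p-value below 0.05 would suggest non-normality
```

#A p-value above 0.05 does not prove normality; it only means the test found no strong evidence against it.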

40
Part three: External data
#We will use Bartlett’s test of homogeneity of variances¹ to test if variance is equal across our three groups (AFRICA, ASIA, EUROPE).
#The function for the Bartlett test is simply bartlett.test() (all lower case, because R is case sensitive). The terms for this function will be the endangered species column of the IUCN object and the Region column of the IUCN object (column 3 and column 2 respectively).
#Bartlett’s test operates similarly to an F-test. The null hypothesis for this test is that the groups have equal variances.
#We would reject the null hypothesis for our dataset if we received a p-value of <0.05.
> bartlett.test(IUCN[,3]~IUCN[,2])

        Bartlett test of homogeneity of variances

data:  IUCN[, 3] by IUCN[, 2]
Bartlett's K-squared = 11.6261, df = 2, p-value = 0.002988

¹Box, G.E.P. (1953). "Non-Normality and Tests on Variances". Biometrika 40 (3/4): 318–335.
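#A sketch of the same test written with column names and a data argument, run on simulated data. The data frame sim and its values are hypothetical; only the function bartlett.test() and the column names come from this guide.

```r
set.seed(1)   # arbitrary seed for a repeatable sketch
sim = data.frame(
  Region = factor(rep(c("AFRICA", "ASIA", "EUROPE"), each = 20)),
  # simulated values with a deliberately larger spread in the first group
  Endangered_species = c(rnorm(20, mean = 10, sd = 5),
                         rnorm(20, mean = 6, sd = 2),
                         rnorm(20, mean = 3, sd = 2))
)

bt = bartlett.test(Endangered_species ~ Region, data = sim)
bt$parameter   # degrees of freedom: number of groups minus one
bt$p.value     # compare with 0.05
```

#Storing the result as an object lets you pull out the p-value with $p.value instead of reading it off the printed output.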

41
Part three: External data
#Here we reject the null hypothesis – at least one Region has a variance that is not equal to the variance of another Region in the dataset.
#Our dataset does not satisfy the second assumption of the ANOVA. We can still proceed however.
#The ANOVA test is robust to violations of this second assumption. This means that it can still produce meaningful results even if the groups do not have equal variances. As a rule of thumb, we can proceed if the maximum variance of our groups is less than 4 times the minimum variance of our groups.
> var(af)
[1] 25.07692
> var(ai)
[1] 9.002849
> var(eu)
[1] 7.464387
#The variance of the number of endangered species in Africa is substantially greater than the other two variance values. However, the Africa group variance (the maximum) is less than 4 times the variance of the Europe group (the minimum)
> var(af)<4*var(eu)
[1] TRUE
#So, we will proceed, but we need to be aware that with unequal variances it will be tougher for an analysis of variance to find a significant result.
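#The rule of thumb can also be checked as a single ratio. This sketch copies the three variances from the var() output above; the object names group_var and ratio are just illustrative.

```r
# Group variances copied from the var() output above
group_var = c(AFRICA = 25.07692, ASIA = 9.002849, EUROPE = 7.464387)

ratio = max(group_var) / min(group_var)   # largest variance divided by smallest
ratio
ratio < 4                                 # TRUE means the rule of thumb is satisfied

names(which.max(group_var))               # which group has the largest variance
```

#Using max() and min() means you do not have to work out by eye which groups to compare.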

42
Part three: External data
#Perform the one-way ANOVA using the aov() function with the following syntax, and store the results as an object called IUCN_ANOVA
> IUCN_ANOVA=aov(Endangered_species~Region,data=IUCN)
#You can see the ANOVA results by calling up the IUCN_ANOVA object
> IUCN_ANOVA
Call:
   aov(formula = Endangered_species ~ Region, data = IUCN)

Terms:
                  Region Residuals
Sum of Squares   703.284  1080.148
Deg. of Freedom        2        78

Residual standard error: 3.721297
Estimated effects may be unbalanced

43
Part three: External data
#Use the summary() function to find out more about the ANOVA
> summary(IUCN_ANOVA)
            Df Sum Sq Mean Sq F value   Pr(>F)
Region       2  703.3   351.6   25.39 3.21e-09 ***
Residuals   78 1080.1    13.8
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#Interpretation: How do we read this table to find out if the mean number of endangered species is different in different regions?
#The null hypothesis for this test is that the mean number of endangered species is the same in each region. We would reject this null hypothesis if the p-value (i.e. Pr(>F)) is less than the significance level for this test (i.e. <0.05). Here the p-value is 3.21e-09, so we reject the null hypothesis and conclude that the mean number of endangered species differs significantly between at least two of the regions.
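#To see where the numbers in the summary table come from, here is a sketch that rebuilds the mean squares, the F value and Pr(>F) from the sums of squares and degrees of freedom printed by aov(). The object names are illustrative; pf() is base R's F distribution function.

```r
# Values copied from the aov() output
ss_region = 703.284;  df_region = 2
ss_resid  = 1080.148; df_resid  = 78

ms_region = ss_region / df_region    # mean square for Region
ms_resid  = ss_resid / df_resid      # mean square for Residuals
f_value   = ms_region / ms_resid     # F value
p_value   = pf(f_value, df_region, df_resid, lower.tail = FALSE)   # Pr(>F)

round(c(ms_region, ms_resid, f_value), 2)
p_value
```

#The F value is the between-region variation divided by the within-region variation, so a large F means the regions differ by more than the scatter inside each region would explain.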

44
Part three: External data
#Is the number of endangered animals different between all regions, or just different for one region? To find out we will use Tukey’s Honest Significant Difference test.
#The function for Tukey’s HSD is simply TukeyHSD(). The test uses the following syntax
> TukeyHSD(IUCN_ANOVA,"Region")
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = Endangered_species ~ Region, data = IUCN)

$Region
                   diff       lwr        upr     p adj
ASIA-AFRICA   -4.185185 -6.605050 -1.7653208 0.0002620
EUROPE-AFRICA -7.185185 -9.605050 -4.7653208 0.0000000
EUROPE-ASIA   -3.000000 -5.419864 -0.5801356 0.0111684
#Tukey’s HSD provides a pairwise comparison of the groups in the ANOVA. Any Region pair with a p adj value <0.05 has a significantly different mean number of endangered species. Here all three pairs differ.
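#A sketch of the aov() and TukeyHSD() steps on simulated data, showing how to pull the significant pairs out of the result instead of reading them off the screen. The data frame sim, its values and the name pair_table are hypothetical; Region is stored as a factor because TukeyHSD() works on factors.

```r
set.seed(7)   # arbitrary seed
sim = data.frame(
  Region = factor(rep(c("AFRICA", "ASIA", "EUROPE"), each = 20)),
  Endangered_species = c(rnorm(20, mean = 10, sd = 3),
                         rnorm(20, mean = 6, sd = 3),
                         rnorm(20, mean = 3, sd = 3))
)

fit = aov(Endangered_species ~ Region, data = sim)
tuk = TukeyHSD(fit, "Region")

pair_table = tuk$Region                 # matrix with columns diff, lwr, upr, p adj
rownames(pair_table)                    # one row per pair of regions
rownames(pair_table)[pair_table[, "p adj"] < 0.05]   # pairs that differ at the 5% level
```

#plot(tuk) draws the same confidence intervals; a pair whose interval does not cross zero differs significantly.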

45
Part three: External data
#Bonus: Let’s plot our IUCN data to better visualise these results
> boxplot(Endangered_species~Region,data=IUCN)
#Reading the boxplot: the thick line inside each box is the median, the box edges are the lower and upper quartiles, the whiskers reach the minimum and maximum (excl. outliers), and individual points are outliers

46
Part Three: Exercise
#Plotting basics
#To quickly generate a plot in R using only default options, simply use the plot() function.
> plot(af)
#There are many arguments that you can change to improve the look of your plots
> plot(af,xlab="Country",main="Africa",col=rainbow(100),pch=16,ylab="Endangered species (number)",cex=2,font=6)
> barplot(af,col="red",names.arg=IUCN[which(IUCN[,2]=="AFRICA"),1],las=2,ylab="Endangered species (count)",main="Africa")
#Use ?plot and ?barplot to learn about the arguments you can change when plotting data

47
Part four: Packages and libraries
#You have been using some of the basic functions that are packaged with R, and you have been either generating or importing datasets
#Anyone can write a new function in R though, or make a dataset, and these functions and datasets can be bundled together into a package
#R is modular, which means you can download and install new packages to give you access to new functions and/or datasets
#There is an automatic and a manual method for installing packages. This guide will teach you how to manually install packages in R
#Why the manual method you ask? Because R requires internet access to download packages, which can be complicated by a University proxy. I can’t guarantee that the proxy won’t be an issue. That’s why. Well that, and it will be good for you.

48
Part four: Packages and libraries
#This will be an exercise in downloading the ‘Analyses of Phylogenetics and Evolution’ package, written by Emmanuel Paradis and colleagues (first published in 2004)
#The abbreviation for this package is ape

49
Part four: Packages and libraries
#Open a web browser and enter http://cran.r-project.org/web/packages/ape/index.html into the address bar – go to the website. The page should be mostly black text on a white background.
#Find the Downloads section towards the bottom of the website.
#For Mac users: download the Mac OS X binary (ape_3.1-4.tgz)
#For PC users: download the Windows binary (ape_3.1-4.zip)
#For UNIX users: again, seek professional help
#Save the ape_3.1-4.xxx file somewhere on your computer that you can easily find
#Note to future users: the file name may be slightly different if Paradis has updated ape

50
Part four: Packages and libraries
#Run R
#Use the install.packages() function with the following syntax to install the ape package
> install.packages("H:/Teaching/ape_3.1-4.zip")
#Remember to replace my file path “H:/Teaching/” with the file path of the folder where you downloaded the ape package
#You should see text like this appear after you enter the install.packages command
Installing package into ‘C:/Documents/R/win-library/3.1’
(as ‘lib’ is unspecified)
inferring 'repos = NULL' from the file name
package ‘ape’ successfully unpacked and MD5 sums checked
#Congratulations, you have now added functions and datasets written by Emmanuel Paradis to your own copy of R

51
Part four: Packages and libraries
#You only need to install a package into R once. The package is now available as a ‘library’. If you want to use the ape library in your current R session, then you need to load the library into R
> library(ape)
#So, you install a package once, and load a library many times (every time you run R and want to use the library)
#The ape library is now available for you to use. Ape is a library of datasets and tools that have been designed around phylogenetic analyses. We will quickly explore some of the data and functions in ape:
> data(bird.orders)
#The data() function loads a dataset into R. Here we have loaded the bird orders dataset that is part of the ape library
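#A small sketch of the ‘install once, load many times’ idea: check whether a package is already installed before trying to load it. requireNamespace() is a base R function; the object name has_ape is illustrative, and the sketch runs whether or not ape is on your machine.

```r
# requireNamespace() returns TRUE if the package is installed, FALSE otherwise
has_ape = requireNamespace("ape", quietly = TRUE)

if (has_ape) {
  library(ape)          # load it for this session
  message("ape is installed and loaded")
} else {
  message("ape is not installed yet: run install.packages() first")
}
```

#Putting a check like this at the top of a script gives a clear message instead of an error halfway through an analysis.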

52
Part four: Packages and libraries
> plot(bird.orders)
#The plot function detects that bird.orders is a special type of object – it is a ‘phylo’ class of object. This type of object is a different object class from the vectors, matrices and data frames that we have been working with
#The ape library has a special plot function for plotting ‘phylo’ objects, called plot.phylo(). R used this special plot function in place of the normal plot function when we plotted bird.orders.
#Don’t worry! All of this happened automatically because we loaded the ape library
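#How does one function name do different things for different objects? A toy sketch of the mechanism, using only base R. The function describe() and the class "toytree" are made up for illustration and are not part of ape; plot() and plot.phylo() are related in the same way.

```r
# A generic function: it hands the work to a method chosen by the class of x
describe = function(x, ...) UseMethod("describe")

# Fallback used when no class-specific method exists
describe.default = function(x, ...) "an ordinary object"

# Method for a made-up class called "toytree" (hypothetical, not part of ape)
describe.toytree = function(x, ...) paste("a toy tree with", length(x$tips), "tips")

tree = structure(list(tips = c("kiwi", "moa", "emu")), class = "toytree")

class(tree)        # the class label R uses to pick a method
describe(tree)     # uses describe.toytree()
describe(1:10)     # uses describe.default()
```

#You can run class(bird.orders) to see the class label that sends plot() to the special ape version.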

53
Part four: Packages and libraries
#Test: Use the ? (help) function for plot.phylo to learn how to plot the bird.orders dataset as a fan, as below
> ?plot.phylo

54
Part four: Exercise
#Download, install and load two packages: ggplot2 and labeling
#Get the packages by searching Google for ‘r ggplot2 cran’ and ‘r labeling cran’, or use the links below
http://cran.r-project.org/web/packages/labeling/index.html
http://cran.r-project.org/web/packages/ggplot2/index.html
#Use the new data and functions provided by these packages to plot the density of diamonds against their weight (carat).
> qplot(carat, data = diamonds, geom = "density", colour = color)
#For more information on ggplot see http://ggplot2.org/book/qplot.pdf

55
Part five: Scripts
#One of the best features of R is the ability to automatically carry out many commands, one after another. For this type of operation we would first write all of our commands into a script, and then enter the entire script into R in one action
#We are going to use previously scripted code for this section of the guide. Our script will generate, analyse and plot some data.
#Go ahead and open this embedded text file by right clicking on it, then clicking ‘Packager Shell Object’ and ‘Activate Contents’
#Copy the entire contents of this notepad document and paste it all into R
#Now, read through the notepad document to find out what has taken place
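#Copy and paste works, but R can also run a saved script file directly with the source() function. A minimal sketch, assuming a made-up three-line script written to a temporary file (the script contents and object names are hypothetical, not the embedded notepad document):

```r
# Write a tiny three-line script to a temporary file (a stand-in for a real .R file)
script_path = tempfile(fileext = ".R")
writeLines(c("heights = c(150, 160, 170)",
             "mean_height = mean(heights)",
             "print(mean_height)"), script_path)

# source() runs every line of the file in order, as if it had been typed at the prompt
source(script_path)

mean_height   # objects created by the script are now available
```

#With your own script you would replace script_path with the path to the saved file, using forward slashes as before.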

56
Part six: Logic (programming)
#There are many functions in R that do more than just basic mathematical operations
#We have seen one already, the which() function. This function looked through an object to find the positions of a particular value that we wanted.
> which(IUCN[,2]=="AFRICA")
#Here we will focus on loops, which we write using for().
#A loop is written as follows
> for(i in 1:10){ }
# for starts the loop
# i is a value that will be updated as the loop iterates
# 1 is the starting value for i
# 10 is the final value for i
#The curly brackets {} enclose the calculations that are looped

57
Part six: Logic (programming)
#Make j = 1
> j=1
#We will use a loop to add i to j through ten iterations
> for(i in 1:10){ j+i }
#We don’t get to see what happens inside a loop unless we specifically ask for it
> for(i in 1:10){
+ print(j+i)
+ }
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
[1] 11

58
Part six: Logic (programming)
#What is the new value of j? j is still 1, because we did not store the changed value.
#To change j we have to store the result inside the loop
> for(i in 1:10){
+ j=j+1
+ }
> j
[1] 11
# j is now equal to 11. How did that happen? Reset j to 1 and print it at each iteration to find out
> j=1
> for(i in 1:10){
+ j=j+1
+ print(j)
+ }
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
[1] 11
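#A side-by-side sketch of the difference between computing a value inside a loop and storing it. The names j_unstored and j_stored are illustrative. Note that this version adds i (not 1) at each iteration, so the stored total is 1 + (1 + 2 + ... + 10).

```r
j = 1
for (i in 1:10) {
  j + i          # computed, then thrown away: j is unchanged
}
j_unstored = j   # still 1

j = 1
for (i in 1:10) {
  j = j + i      # stored: j grows by i at every iteration
}
j_stored = j     # 1 + (1 + 2 + ... + 10) = 56

c(j_unstored, j_stored)
```

#The only difference between the two loops is the assignment j = ... on the left-hand side.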

59
Part six: Exercise
#Make a vector of ten random numbers
#Using a loop, add 100 to each number in the vector, in sequence. For example, in the first iteration of your loop you will add 100 to the first value of your vector, in the second iteration of your loop you will add 100 to the second value of your vector, and so on.

60
Part six: Exercise
> x=rnorm(10)
> x
 [1] -0.81673186  0.35409408  0.69619606 -2.04003445 -1.02832503 -0.31418186
 [7]  0.09717105  0.78778455 -0.15048025  1.86026573
> for(i in 1:length(x)){
+ x[i]=x[i]+100
+ }
> x
 [1]  99.18327 100.35409 100.69620  97.95997  98.97167  99.68582 100.09717
 [8] 100.78778  99.84952 101.86027
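#The loop is good practice, but for this particular task R can also do the whole job in one step, because arithmetic in R is vectorised. A sketch comparing the two approaches (the seed and the names x_loop and x_vec are illustrative):

```r
set.seed(10)          # arbitrary seed so both versions start from the same numbers
x = rnorm(10)

# Loop version, as in the exercise (seq_along(x) is a safer spelling of 1:length(x))
x_loop = x
for (i in seq_along(x_loop)) {
  x_loop[i] = x_loop[i] + 100
}

# Vectorised version: R adds 100 to every element in one step
x_vec = x + 100

all.equal(x_loop, x_vec)   # both approaches give the same answer
```

#seq_along(x) is safer than 1:length(x) because it does nothing when x is empty, whereas 1:0 would loop over 1 and 0.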

61
How far does a Duvaucel's gecko travel after release?
Photograph by Chris Smuts-Kennedy (Department of Conservation Reference: 10039929)
Figure: Grid of monitored stations (10 m by 10 m grid; scale label: 1 m)
Methods:
-Record the grid coordinates of the station where the gecko is released
-Each day for three subsequent days measure the grid coordinates of the station where the gecko is found
-Calculate the distance between recorded stations

62
#Step one: Set up the monitoring grid data for each day. 0 means that the gecko was not observed in that grid cell, 1 means that the gecko was observed in that grid cell.
#sample(100,1) picks one of the 100 grid cells at random (it can never return 0, unlike rounding a random number between 0 and 100); set.seed() makes each pick repeatable
#Release day
set.seed(1)
d0=rep(0,100)
d0[sample(100,1)]=1
day.zero=matrix(d0,ncol=10,nrow=10)
#Day one check
set.seed(2)
d1=rep(0,100)
d1[sample(100,1)]=1
day.one=matrix(d1,ncol=10,nrow=10)
#Day two check
set.seed(3)
d2=rep(0,100)
d2[sample(100,1)]=1
day.two=matrix(d2,ncol=10,nrow=10)
#Day three check
set.seed(4)
d3=rep(0,100)
d3[sample(100,1)]=1
day.three=matrix(d3,ncol=10,nrow=10)

63
#Step two: Combine all of the grid data into one list. This will help us quickly analyse the data as a single batch.
days=list(day.zero,day.one,day.two,day.three)
#Step three: Create a matrix where we will store the grid locations of the gecko and the daily distance travelled.
movement=matrix(0,ncol=3,nrow=length(days))
colnames(movement)=c("Easting","Northing","Displacement (m)")
#Step four: Find the grid cell for the location of the gecko on each day and store that information in the movement matrix.
for(i in 1:length(days)){
movement[i,1]=which(days[[i]]==1, arr.ind=TRUE)[1]
movement[i,2]=which(days[[i]]==1, arr.ind=TRUE)[2]
}
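#Step four relies on which() with arr.ind=TRUE, which returns a row and column position instead of a single cell number. A small sketch on a made-up grid (the objects grid and cell, and the choice of cell 27, are illustrative):

```r
grid = matrix(0, ncol = 10, nrow = 10)
grid[27] = 1                      # cell 27 counted down the columns: row 7, column 3

cell = which(grid == 1, arr.ind = TRUE)
cell                              # a one-row matrix with columns "row" and "col"
cell[1]                           # row index
cell[2]                           # column index
```

#R fills and counts matrix cells column by column, which is why cell 27 of a 10 by 10 grid sits in the third column.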

64
#Step five: Calculate the distance that the gecko travelled each day.
for(j in 2:length(days)){
movement[j,3]=sqrt(((abs(movement[j,1]-movement[j-1,1]))^2)+((abs(movement[j,2]-movement[j-1,2]))^2))
}
#Step six: Plot the distance between each station where the gecko was found on each subsequent day.
barplot(movement[,3],xlab="Day",ylab="Displacement (m)",main="Gecko distance")
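#Step five is Pythagoras' theorem: the distance between two grid cells is the square root of (change in first coordinate squared + change in second coordinate squared). A loop-free sketch of the same calculation, assuming made-up positions (the matrix pos and its values are hypothetical, not the simulated gecko data):

```r
# Hypothetical positions (row, column) for release day and three daily checks
pos = matrix(c(7, 3,
               8, 2,
               7, 2,
               9, 6), ncol = 2, byrow = TRUE)

step = diff(pos)                       # change in each coordinate between consecutive days
displacement = sqrt(rowSums(step^2))   # Pythagoras: straight-line distance per day
displacement
```

#diff() on a matrix subtracts each row from the next one, so four recorded positions give three daily distances. Squaring already removes the sign, which is why abs() is not needed here.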

65
Conclusion
#By now you should have a good understanding of how to use R
#We have covered all of the basic ways of interacting with R:
-Storing data
-Plotting data
-Analysing data with functions
-Loading new functions for data analysis
#There is so much further you can take this though – your imagination is the limit!
#You should think of this tutorial as a quick reference guide to help get you on your feet
#You can also check out tutorial videos at illuminatingaotearoa.wordpress.com/zoostar
