Basic R Programming for Life Science Undergraduate Students: Introductory Workshop (Session 1)


1 Basic R Programming for Life Science Undergraduate Students: Introductory Workshop (Session 1)

2 Scope of the Introductory Workshop on R
- How to install the R platform on your machine
- How to install R packages and dependencies
- How to get help and instructions
- How to use a library
- Variables and assigning values to variables
- Data types which R accepts
- Arithmetic manipulations of variables (+, -, *, /, %%, ^ etc.)
- Browsing and managing your variables (ls, rm)
- Assigning vectors: the c() command
- Vector manipulations and referencing
- Matrices: declaration and manipulation (rows/columns), rbind
- Data frames: import from xls/csv/txt files and statistical manipulation
- Introducing data categorisation using the R data type factor
- Simple graph plotting
- More statistical analysis
- Simple example of linear regression
- Quick revision
- Future classes on R

3 What is R? R is both a piece of software and a programming language. It is mainly used for statistical analysis and for generating graphics. It is free, simple and intuitive (?), and available across different platforms (Mac, Unix/Linux, Windows).

4 Starting with R: Installation (administrator rights required). http://www.r-project.org/ Tip: install the latest version (or the last stable version).

5 Starting with R: Installation. http://cran.bic.nus.edu.sg

6 Starting with R: Installation

7 Your very first R interface: the default prompt in R

8 Starting with R: Packages. Packages provide additional functions that are not included in the "base" package. To install an additional package, type install.packages("package name"). To use an installed package, type library(package name).
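A minimal sketch of the install-then-load pattern described on this slide. The package name "stats" is only a stand-in chosen because it ships with every R installation; swap in the package you actually need. The check with requireNamespace() is an addition for illustration, not part of the original slide.

```r
# Name of the package to use; "stats" is a placeholder that is always installed
pkg <- "stats"

# Install only if the package is not already available (needs internet access)
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Load the package so its functions can be called directly
library(pkg, character.only = TRUE)

# Confirm the package is now attached to the search path
attached <- paste0("package:", pkg) %in% search()
print(attached)
```

Installing is done once per machine; library() has to be called again in every new R session.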

9 Starting with R: Confused about R commands? Get help. On the GUI: ?function opens the help page of a function you know by name, and ??keyword searches the help system for a keyword. Via the WWW: http://cran.r-project.org or http://www.rseek.org/
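A short sketch of the help commands, using mean and sd as example functions (any function name would do). args() and apropos() are extra helpers added here for illustration; they are not mentioned on the slide.

```r
# Open the help page of a function whose name you already know
?mean                     # equivalent to help("mean")

# Search all installed help pages for a keyword
??regression              # equivalent to help.search("regression")

# Show the arguments a function accepts
args(sd)

# List objects whose names contain a given string
found <- apropos("mean")
print(head(found))
```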

10 Fundamentals of R Programming: Simple data input and manipulation. Declaration of an object (variable). Take note that object names are case sensitive (i.e. x is different from X), cannot contain spaces, cannot start with a number, may use only letters, digits, "." and "_", and should be comprehensible.
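A small sketch of declaring objects and of case sensitivity. The names and values (x, X, height_cm) are made up for illustration.

```r
# Assign values to objects with the <- operator
x <- 5
X <- 10

# x and X are two different objects because names are case sensitive
same <- identical(x, X)
print(same)               # FALSE

# A comprehensible name says what the value means
height_cm <- 171
print(height_cm)
```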

11 Data types. R has a rich set of data types. Commonly encountered data types in R: scalars; vectors (numerical, character and logical); matrices (2D); arrays (can have more than 2 dimensions); data frames; lists; factors. See for example http://www.statmethods.net/input/datatypes.html for more details.
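One way to create each of the data types listed above. All object names and values are invented for this sketch.

```r
a_scalar  <- 3.14                          # a scalar is a vector of length 1
num_vec   <- c(1, 2, 3)                    # numerical vector
chr_vec   <- c("a", "b", "c")              # character vector
lgl_vec   <- c(TRUE, FALSE, TRUE)          # logical vector
a_matrix  <- matrix(1:6, nrow = 2)         # 2 rows x 3 columns
an_array  <- array(1:24, dim = c(2, 3, 4)) # 3 dimensions
a_df      <- data.frame(Name = c("John", "Mary"), Height = c(171, 155))
a_list    <- list(1, "a", TRUE)            # elements may differ in type
a_factor  <- factor(c("M", "F", "M"))      # categorical variable

# Ask R what kind of object something is
print(class(a_df))
print(dim(an_array))
```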

12 Fundamentals of R Programming: Perform simple manipulations, e.g. arithmetic calculations. For more built-in R arithmetic functions, visit http://ww2.coastal.edu/kingw/statistics/R-tutorials/arithmetic.html
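A quick sketch of the arithmetic operators, with a and b as arbitrary example values. Note that the modulo operator in R is written %% and that ** is accepted as an alternative to ^.

```r
a <- 7
b <- 2

a + b     # 9    addition
a - b     # 5    subtraction
a * b     # 14   multiplication
a / b     # 3.5  division
a %% b    # 1    remainder (modulo)
a %/% b   # 3    integer division
a ^ b     # 49   exponentiation
a ** b    # 49   same as ^
sqrt(49)  # 7    a built-in arithmetic function
```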

13 Fundamentals of R Programming: Removing variables when they are not required. Use ls() to check whether a declared object is still kept in memory. To remove an object from memory, use rm(x).
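A minimal sketch of listing and removing objects; x and y are example objects created only for this demonstration.

```r
x <- 5
y <- "hello"

# List the objects currently kept in memory
print(ls())

# Remove x, then check that only y is left
rm(x)
still_there <- "x" %in% ls()
print(still_there)        # FALSE
```

To clear everything at once, rm(list = ls()) removes all objects in the workspace, so use it with care.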

14 Fundamentals of R Programming: More complex data inputs. Data vectors: an ordered list of values. [Figure: a single object X next to a vector X holding the values 1, 2, 3, 4, 5]

15 Fundamentals of R Programming: Assigning a data vector. x <- c(1,2,3,4,5) [Figure: the values 1 to 5 stored in the vector x]

16 Experiment for yourself
- Define a vector var1 with values 1,2,3
- Define a vector var2 with values 4,5,6
- What value is var2[4]?
- What is the sum of var1?
- What is the R code to assign object subsetvar1 with the first element of var1?
- What is the product of var1 and var2?
http://www.statmethods.net/input/datatypes.html

17 Experiment for yourself (answers)
- Define a vector var1 with values 1,2,3: var1 <- c(1,2,3)
- Define a vector var2 with values 4,5,6: var2 <- c(4,5,6)
- What value is var2[4]? NA
- What is the sum of var1? 6
- What is the R code to assign object subsetvar1 with the first element of var1? subsetvar1 <- var1[1]
- What is the product of var1 and var2? 4 10 18
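The exercise can be checked by typing it into R. This sketch uses only the names and values given in the exercise.

```r
var1 <- c(1, 2, 3)
var2 <- c(4, 5, 6)

var2[4]                # NA: var2 has only three elements
sum(var1)              # 6
subsetvar1 <- var1[1]  # first element of var1, i.e. 1
var1 * var2            # 4 10 18: multiplication is element by element
```

Indexing in R starts at 1, and asking for a position beyond the end of a vector returns NA rather than an error.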

18 Fundamentals of R Programming: More complex data structures: matrices
1 3 8
6 9 5
4 1 7
6 5 1

19 Fundamentals of R Programming: Declaring a matrix

20 Fundamentals of R Programming: Simple manipulations of a data matrix
> y[1,] gives 1 3 8
> y[,3] gives 8 5 7 1
Simple arithmetic manipulations: mean(y) gives 4.666667; sum(y[2,]) gives 20
Modify and add values: y[4,] <- c(6,2,2) then y <- rbind(y,c(3,9,8))
Tip: Think of rbind as "row combine"
[Figure: the matrix after row 4 is changed to 6 2 2, and after row 5 (3 9 8) is appended]
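A sketch that declares the example matrix and repeats the manipulations above. The slide showing the declaration did not survive as text, so the call to matrix() with byrow = TRUE is an assumption about how y was created; the values themselves come from the slides.

```r
# Declare a 4 x 3 matrix, filling it row by row
y <- matrix(c(1, 3, 8,
              6, 9, 5,
              4, 1, 7,
              6, 5, 1), nrow = 4, byrow = TRUE)

y[1, ]        # first row: 1 3 8
y[, 3]        # third column: 8 5 7 1
mean(y)       # 4.666667
sum(y[2, ])   # 20

# Overwrite row 4, then append a fifth row
y[4, ] <- c(6, 2, 2)
y <- rbind(y, c(3, 9, 8))
print(dim(y)) # 5 3
```

cbind() works the same way for columns ("column combine").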

21 Fundamentals of R Programming: More complex data structures: data frames
  Name   Height
1 John   171cm
2 Mary   155cm
3 Peter  165cm

22 Fundamentals of R Programming: Data frames
  Name   Height
1 John   171cm
2 Mary   155cm
3 Peter  165cm

23 Fundamentals of R Programming: Reading in from input files

24 Fundamentals of R Programming: Simple manipulations with data frames. View the first row with head(hfile,1). Summarise each column with summary(hfile). Create subsets, e.g. new <- hfile[1,]
  Name   Height
1 John   171cm
2 Mary   155cm
3 Peter  165cm
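The commands above, run on a hand-built copy of the example table. Height is stored as a number here (an assumption), and the subset named tall is an extra illustration not on the slide.

```r
hfile <- data.frame(Name = c("John", "Mary", "Peter"),
                    Height = c(171, 155, 165))

head(hfile, 1)       # first row only
summary(hfile)       # per-column summary (min, median, mean, max for numbers)

# Subset by position: first row, all columns
new <- hfile[1, ]

# Subset by condition: rows where Height is above 160
tall <- hfile[hfile$Height > 160, ]
print(tall)
```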

25 Fundamentals of R Programming: Simple statistics with R. Load file "Sampledata-1.txt" into R: studentprofile <- read.table("B://Users/bchhuyng/Desktop/Sampledata-1.txt",sep="\t",header=TRUE) View the data loaded into R: studentprofile, head(studentprofile) How many categories are there in the field "Gender"? factor(studentprofile$Gender)

26 Fundamentals of R Programming: The "factor" function in R stores values as categorical variables. [Figure: the Gender column, a sequence of M and F values]
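A small sketch of turning a character vector into a factor and counting its categories. The six gender values are invented; in the workshop they would come from studentprofile$Gender.

```r
gender <- c("M", "M", "F", "F", "F", "M")

# Store the values as a categorical variable
gender_f <- factor(gender)

levels(gender_f)    # the categories: "F" "M"
nlevels(gender_f)   # number of categories: 2
table(gender_f)     # count per category: F 3, M 3
```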

27 Fundamentals of R Programming: Usage of factor in plotting graphs (Hu et al., 2013)

28 Fundamentals of R Programming: Usage of factor in plotting graphs

29 Fundamentals of R Programming: Calculate the mean and the standard deviation of the height and weight of the students. E.g. mean(studentprofile$Weight), median(studentprofile$Weight)
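A sketch of the calculation on simulated data. The file "Sampledata-1.txt" is not available here, so studentprofile is filled with made-up random values; only the column names Weight and Height follow the slides.

```r
# Simulated stand-in for the workshop data (values are NOT the real sample)
set.seed(1)
studentprofile <- data.frame(Weight = rnorm(100, mean = 60, sd = 8),
                             Height = rnorm(100, mean = 165, sd = 7))

mean(studentprofile$Weight)     # arithmetic mean
median(studentprofile$Weight)   # middle value
sd(studentprofile$Weight)       # standard deviation

# The same statistic for every column at once
col_means <- sapply(studentprofile, mean)
col_sds   <- sapply(studentprofile, sd)
print(col_means)
print(col_sds)
```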

30 Fundamentals of R Programming: Simple graph plotting with R. View the distribution of height and weight of the 100 students (data from "Sampledata-1.txt"): plot(studentprofile$Weight, studentprofile$Height, main="Distribution of Height and Weight of students", xlab="Weight (Kg)", ylab="Height(cm)", pch=19, cex=0.5)

31 Fundamentals of R Programming

32 Fundamentals of R Programming: What is the distribution of height and weight amongst students? hist(studentprofile$Weight, xlab="Weight (Kg)", main = "Distributional Frequency of student weight", ylim=c(0,8), xlim=c(40,90), breaks = 51)

33 Fundamentals of R Programming: What is the distribution of height and weight amongst students? hist(studentprofile$Height, xlab="Height (cm)", main = "Distributional Frequency of student height", ylim=c(0,8), xlim=c(140,190), breaks = 51)

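34 Fundamentals of R Programming: Are the height and weight of the sampled students normally distributed? ks.test(studentprofile$Height, pnorm, mean(studentprofile$Height), sd(studentprofile$Height)) ks.test(studentprofile$Weight, pnorm, mean(studentprofile$Weight), sd(studentprofile$Weight)) Note: calling ks.test(x, pnorm) without a mean and standard deviation compares the data against the standard normal distribution (mean 0, sd 1). H0: The data follow the specified distribution. H1: The data do not follow the specified distribution. p-value ≤ 0.05: reject H0. p-value > 0.05: do not reject H0. CAVEAT!!! http://www.r-bloggers.com/normality-tests-don%E2%80%99t-do-what-you-think-they-do/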
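A sketch of the normality test on simulated heights (made-up values, since the workshop file is not available). It contrasts testing against the standard normal with testing against a normal distribution that uses the sample mean and standard deviation, and adds shapiro.test() as a commonly used alternative that is not on the slide.

```r
# Simulated heights in cm (NOT the real workshop sample)
set.seed(42)
height <- rnorm(100, mean = 165, sd = 7)

# Compares against N(0, 1): heights around 165 can never fit, so H0 is rejected
ks_std <- ks.test(height, "pnorm")

# Compares against a normal distribution with the sample's own mean and sd
ks_fit <- ks.test(height, "pnorm", mean = mean(height), sd = sd(height))

# Shapiro-Wilk test: an alternative normality test in base R
sw <- shapiro.test(height)

print(ks_std$p.value)
print(ks_fit$p.value)
print(sw$p.value)
```

Estimating the mean and standard deviation from the same data makes the Kolmogorov-Smirnov p-value conservative, so treat it as a rough guide and look at a histogram as well.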

35 Fundamentals of R Programming: Are the height and weight of students linearly correlated? reg1 <- lm(studentprofile$Height~studentprofile$Weight)

36 Fundamentals of R Programming: Are the height and weight of students linearly correlated?

37 Fundamentals of R Programming: plot(studentprofile$Weight, studentprofile$Height, main="Distribution of Height and Weight of students", xlab="Weight (Kg)", ylab="Height(cm)", pch=19, cex=0.5) reg1 <- lm(studentprofile$Height~studentprofile$Weight) abline(reg1,col=2)
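A sketch of fitting and inspecting the regression on simulated data. The numbers are made up (heights are generated to rise with weight), and the formula form lm(Height ~ Weight, data = ...) is an equivalent, tidier way of writing the call on the slides.

```r
# Simulated stand-in for the workshop data (NOT the real sample)
set.seed(7)
weight <- rnorm(100, mean = 60, sd = 8)
height <- 120 + 0.75 * weight + rnorm(100, mean = 0, sd = 4)
studentprofile <- data.frame(Weight = weight, Height = height)

# Fit Height as a linear function of Weight
reg1 <- lm(Height ~ Weight, data = studentprofile)

# Intercept and slope of the fitted line
print(coef(reg1))

# Proportion of variance explained, and the correlation coefficient
r2 <- summary(reg1)$r.squared
r  <- cor(studentprofile$Height, studentprofile$Weight)
print(c(r.squared = r2, correlation = r))
```

For a regression with one predictor, the R-squared value equals the square of the correlation coefficient; summary(reg1) prints the full table with p-values.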

38 Intro checklist: what have you learnt today?
- How to install the R platform on your machine
- How to install R packages and dependencies
- How to get help and instructions
- How to use a library
- Variables and assigning values to variables
- Data types which R accepts
- Arithmetic manipulations of variables (+, -, *, /, %%, ^ etc.)
- Browsing and managing your variables (ls, rm)
- Assigning vectors: the c() command
- Vector manipulations and referencing
- Matrices: declaration and manipulation (rows/columns), rbind
- Data frames: import from xls/csv/txt files and statistical manipulation
- Introducing data categorisation using the R data type factor
- Simple graph plotting
- More statistical analysis
- Simple example of linear regression

39 References
Crawley, M.J. (2007) The R Book.
Maindonald, J., and Braun, W.J. (2010) Data Analysis and Graphics Using R: An Example-Based Approach.
Kabacoff, R.I. (2012) Quick-R: Data types. http://www.statmethods.net/input/datatypes.html Accessed on 7/1/2014.
King, W.B. (2010) Doing Arithmetic in R. http://ww2.coastal.edu/kingw/statistics/R-tutorials/arithmetic.html Accessed on 7/1/2014.
Ian (2011) Normality tests don't do what you think they do. http://www.r-bloggers.com/normality-tests-don%E2%80%99t-do-what-you-think-they-do/ Accessed on 7/1/2014.
Joris Meys and Andrie de Vries. How to Test Data Normality in a Formal Way in R. http://www.dummies.com/how-to/content/how-to-test-data-normality-in-a-formal-way-in-r.html Accessed on 7/1/2014.

40 Future classes on R and packages. R has a very rich repertoire of packages: statistical analysis, microarray analysis, NGS, etc.

