Training on R for Students Higher Education Quality Enhancement Project (HEQEP) Software Training Program Organized by Department of Statistics Rajshahi.


Training on R for Students Higher Education Quality Enhancement Project (HEQEP) Software Training Program Organized by Department of Statistics Rajshahi University, Rajshahi-6205, Bangladesh. (An Open Source Package) Date: March 22-23, 2013 Lecture 3

2 Programming with R

3 Matrices
X <- matrix(rpois(20, 1.5), nrow = 4)   # rpois() draws a random sample from a Poisson distribution
X
(The output is a 4 x 5 matrix of random counts, with columns [,1] to [,5] and rows [1,] to [4,].)
Suppose that the rows refer to four different trials and we want to label the rows 'Trial.1' etc. We employ the function rownames to do this. We could use the paste function, but here we take advantage of the prefix option:
rownames(X) <- rownames(X, do.NULL = FALSE, prefix = "Trial.")
X
(The rows are now labelled Trial.1 to Trial.4.)

4 Matrices
For the columns we want to supply a vector of different names for the five drugs involved in the trial, and use this to specify colnames(X):
drug.names <- c("aspirin", "paracetamol", "nurofen", "hedex", "placebo")
colnames(X) <- drug.names
X
(The columns are now headed aspirin, paracetamol, nurofen, hedex and placebo, and the rows Trial.1 to Trial.4.)
Alternatively, you can use the dimnames function to give names to the rows and/or columns of a matrix. In this example we want the rows to be unlabelled (NULL) and the column names to be of the form 'drug.1', 'drug.2', etc.:
dimnames(X) <- list(NULL, paste("drug.", 1:5, sep = ""))
X
(The columns are now headed drug.1 to drug.5 and the rows are shown as [1,] to [4,].)
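A reproducible version of the matrix-labelling steps on slides 3 and 4 is sketched below. The set.seed(1) call is an assumption added only so that the random Poisson counts are identical on every run; everything else uses the functions shown on the slides.

```r
set.seed(1)  # assumed seed, only to make the random counts reproducible
X <- matrix(rpois(20, 1.5), nrow = 4)   # 4 trials (rows) x 5 drugs (columns)

# Row labels Trial.1 ... Trial.4 via the prefix option of rownames()
rownames(X) <- rownames(X, do.NULL = FALSE, prefix = "Trial.")

# Column labels from a character vector
drug.names <- c("aspirin", "paracetamol", "nurofen", "hedex", "placebo")
colnames(X) <- drug.names
X

# dimnames() sets both at once: no row names, columns drug.1 ... drug.5
dimnames(X) <- list(NULL, paste("drug.", 1:5, sep = ""))
X
```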

5 Making data frames (1)
We illustrate how to construct a data frame from the following car data.
Make        Model          Cylinder  Weight  Mileage  Type
Honda       Civic          V4        2170    33       Sporty
Chevrolet   Beretta        V4        2655    26       Compact
Ford        Escort         V4        2345    33       Small
Eagle       Summit         V4        2560    33       Small
Volkswagen  Jetta          V4        2330    26       Small
Buick       Le Sabre       V6        3325    23       Large
Mitsubishi  Galant         V4        2745    25       Compact
Dodge       Grand Caravan  V6        3735    18       Van
Chrysler    New Yorker     V6        3450    22       Medium
Acura       Legend         V6        3265    20       Medium

6 Making data frames (2)
Make <- c("Honda", "Chevrolet", "Ford", "Eagle", "Volkswagen", "Buick", "Mitsubishi",
          "Dodge", "Chrysler", "Acura")
Model <- c("Civic", "Beretta", "Escort", "Summit", "Jetta", "Le Sabre", "Galant",
           "Grand Caravan", "New Yorker", "Legend")
Cylinder <- c(rep("V4", 5), "V6", "V4", rep("V6", 3))   # rep("V4", 5) instructs R to repeat "V4" five times
Weight <- c(2170, 2655, 2345, 2560, 2330, 3325, 2745, 3735, 3450, 3265)
Mileage <- c(33, 26, 33, 33, 26, 23, 25, 18, 22, 20)
Type <- c("Sporty", "Compact", rep("Small", 3), "Large", "Compact", "Van",
          rep("Medium", 2))

7 Making data frames (3)
Now the data.frame() function combines the six vectors into a single data frame.
Car <- data.frame(Make, Model, Cylinder, Weight, Mileage, Type)
Car
         Make         Model Cylinder Weight Mileage    Type
1       Honda         Civic       V4   2170      33  Sporty
2   Chevrolet       Beretta       V4   2655      26 Compact
3        Ford        Escort       V4   2345      33   Small
4       Eagle        Summit       V4   2560      33   Small
5  Volkswagen         Jetta       V4   2330      26   Small
6       Buick      Le Sabre       V6   3325      23   Large
7  Mitsubishi        Galant       V4   2745      25 Compact
8       Dodge Grand Caravan       V6   3735      18     Van
9    Chrysler    New Yorker       V6   3450      22  Medium
10      Acura        Legend       V6   3265      20  Medium

8 Few Operations in data frame Car (1)
names(Car)
[1] "Make"     "Model"    "Cylinder" "Weight"   "Mileage"  "Type"
Car[1, ]
   Make Model Cylinder Weight Mileage   Type
1 Honda Civic       V4   2170      33 Sporty
Car[10, 4]
[1] 3265
Car$Mileage
 [1] 33 26 33 33 26 23 25 18 22 20
mean(Car$Mileage)   # average mileage of the 10 vehicles
[1] 25.9
min(Car$Weight)     # minimum of car weights
[1] 2170

9 Few Operations in data frame Car (2)
table(Car$Type)   # gives a frequency table

Compact   Large  Medium   Small  Sporty     Van 
      2       1       2       3       1       1 
table(Car$Make, Car$Type)   # cross-tabulation
            
             Compact Large Medium Small Sporty Van
  Acura            0     0      1     0      0   0
  Buick            0     1      0     0      0   0
  Chevrolet        1     0      0     0      0   0
  Chrysler         0     0      1     0      0   0
  Dodge            0     0      0     0      0   1
  Eagle            0     0      0     1      0   0
  Ford             0     0      0     1      0   0
  Honda            0     0      0     0      1   0
  Mitsubishi       1     0      0     0      0   0
  Volkswagen       0     0      0     1      0   0

10 Making data frames (6)
Make.Small <- Car$Make[Car$Type == "Small"]   # makes of the "Small" cars: Ford, Eagle and Volkswagen
summary(Car$Mileage)   # gives summary statistics
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  18.00   22.25   25.50   25.90   31.25   33.00 
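The data-frame operations on slides 8 to 10 can be checked on a cut-down copy of the Car data. This sketch keeps four of the six columns and sets stringsAsFactors = FALSE so that Make stays a character vector in any R version (R 4.0.0 and later already default to this); tapply() and which.max() are additions not used on the slides.

```r
Make <- c("Honda", "Chevrolet", "Ford", "Eagle", "Volkswagen", "Buick",
          "Mitsubishi", "Dodge", "Chrysler", "Acura")
Type <- c("Sporty", "Compact", rep("Small", 3), "Large", "Compact", "Van",
          rep("Medium", 2))
Weight <- c(2170, 2655, 2345, 2560, 2330, 3325, 2745, 3735, 3450, 3265)
Mileage <- c(33, 26, 33, 33, 26, 23, 25, 18, 22, 20)
Car <- data.frame(Make, Type, Weight, Mileage, stringsAsFactors = FALSE)

# A logical subscript on the rows keeps every column for the small cars
small <- Car[Car$Type == "Small", ]
small

# Mean mileage within each Type (one value per group)
tapply(Car$Mileage, Car$Type, mean)

# Make of the heaviest car
Car$Make[which.max(Car$Weight)]
```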

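11 Rank, Sorting and Order
Price <- scan()   # type the 12 prices, then press Enter on an empty line to finish
(R confirms with "Read 12 items".)
ranks <- rank(Price)       # rank of each price within the vector
sorted <- sort(Price)      # the prices in increasing order
ordered <- order(Price)    # positions (subscripts) that put Price in increasing order
view <- data.frame(Price, ranks, sorted, ordered)
view
(The printed data frame has the four columns Price, ranks, sorted and ordered.)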
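Because the 12 prices typed into scan() are not shown, the sketch below uses five hypothetical prices to show how the three functions differ. Note that rank() gives tied values their average rank by default (ties.method = "average").

```r
Price <- c(30, 10, 20, 50, 40)   # hypothetical prices, not the ones typed on the slide
ranks   <- rank(Price)    # rank of each price: 3 1 2 5 4
sorted  <- sort(Price)    # values in increasing order: 10 20 30 40 50
ordered <- order(Price)   # position of the smallest, the next smallest, ...: 2 3 1 5 4
view <- data.frame(Price, ranks, sorted, ordered)
view

# order() returns the subscripts that sort the vector
Price[ordered]            # 10 20 30 40 50, the same as sort(Price)
```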

12 Evaluating Functions with apply, sapply and lapply
apply(arr, margin, fct)
Applies the function fct along some dimensions of the array arr, according to margin (for a matrix, 1 means rows and 2 means columns), and returns a vector or array of the appropriate size. The apply function is used for applying functions to the rows or columns of matrices or data frames (a data frame is first converted to a matrix).

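13 For example:
X <- matrix(1:24, nrow = 4)
X
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    5    9   13   17   21
[2,]    2    6   10   14   18   22
[3,]    3    7   11   15   19   23
[4,]    4    8   12   16   20   24
apply(X, 1, sum)   # to obtain the row totals
[1] 66 72 78 84
apply(X, 2, sum)   # to obtain the column totals (six of them)
[1] 10 26 42 58 74 90
apply(X, 1, sqrt)                  # square root of every element, one row at a time
apply(X, 2, sqrt)                  # the same, one column at a time
apply(X, 1, sample)                # shuffles the values within each row
apply(X, 1, function(x) x^2 + x)   # an anonymous function applied to each row
When fct returns a vector, apply() stores each result as a column, so each apply(X, 1, ...) call above returns a 6 x 4 matrix, the transpose of X's layout.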
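The row and column totals above, and the orientation of apply()'s output when fct returns a vector, can be checked with a short sketch; rowSums() is a built-in alternative not mentioned on the slide.

```r
X <- matrix(1:24, nrow = 4)
row.tot <- apply(X, 1, sum)    # row totals: 66 72 78 84
col.tot <- apply(X, 2, sum)    # column totals: 10 26 42 58 74 90
all(row.tot == rowSums(X))     # rowSums() gives the same row totals

# Each call of sqrt() returns a vector of length 6, stored as a COLUMN
R <- apply(X, 1, sqrt)
dim(R)                         # 6 4, not 4 6
all.equal(t(R), sqrt(X))       # transposing recovers the element-wise result
```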

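14 Vector and sapply
If you want to apply a function to each element of a vector, use sapply (rather than apply, which is for matrices or margins of matrices). Here is the code to generate a list of sequences from 1:3 up to 1:7:
sapply(3:7, seq)
[[1]]
[1] 1 2 3
[[2]]
[1] 1 2 3 4
[[3]]
[1] 1 2 3 4 5
[[4]]
[1] 1 2 3 4 5 6
[[5]]
[1] 1 2 3 4 5 6 7
Because the five results have different lengths, sapply cannot simplify them to a vector or matrix and returns a list. The function sapply is most useful for replacing explicit loops in iterative calculations.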

15 Example: sapply
a <- seq(0.01, 0.2, 0.005)
Now we can use sapply to apply the sum-of-squares function to each of these values of a (without writing a loop), and plot the result against the parameter value a:
sumsq <- function(x) {sum(x^2)}   # function that produces the sum of squares
plot(a, sapply(a, sumsq), type = "l")
Because each value of a is a single number, sumsq returns a^2 here, so the plot is part of a parabola.
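A quick check of what sapply(a, sumsq) returns on this slide; the variable name ss is our own, and the last line contrasts this case with the unequal-length case of slide 14.

```r
a <- seq(0.01, 0.2, 0.005)
sumsq <- function(x) {
  sum(x^2)   # sum of squares of the elements of x
}
ss <- sapply(a, sumsq)      # calls sumsq() once for each element of a

# Each element of a is a single number, so sumsq(a[i]) is a[i]^2 and the
# vectorised expression a^2 gives the same values without sapply()
all.equal(ss, a^2)          # TRUE

# With results of unequal length, as in sapply(3:7, seq), no simplification
# is possible and sapply() returns a list
is.list(sapply(3:7, seq))   # TRUE
```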

16 Lists and lapply
lapply(li, fct)   # the function fct is applied to each element of the list li
a <- c("a", "b", "c", "d")
b <- c(1, 2, 3, 4, 4, 3, 2, 1)
c <- c(TRUE, TRUE, FALSE)
list.object <- list(a, b, c)   # create a list object
class(list.object)             # to see the class type
[1] "list"
list.object   # to see the contents of the list we just type its name
[[1]]
[1] "a" "b" "c" "d"
[[2]]
[1] 1 2 3 4 4 3 2 1
[[3]]
[1]  TRUE  TRUE FALSE
The function lapply applies a specified function to each of the elements of a list in turn (without the need for specifying a loop, and not requiring us to know how many elements there are in the list).

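17 Lists and lapply
lapply(list.object, length)   # the length of each vector making up the list: 4, 8 and 3
lapply(list.object, class)    # the class of each element: "character", "numeric" and "logical"
lapply(list.object, mean)     # the mean of each element
The mean of the character vector a is NA (with a warning that the argument is not numeric or logical); the means of b and c are 2.5 and 0.6666667, since a logical vector is averaged as 0s and 1s.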
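sapply() applied to the same list returns the lapply() results simplified to a vector where possible. This sketch rebuilds list.object from slide 16 and, as an assumption about what is usually wanted, drops the character element before taking means.

```r
a <- c("a", "b", "c", "d")
b <- c(1, 2, 3, 4, 4, 3, 2, 1)
c <- c(TRUE, TRUE, FALSE)        # a vector named c does not stop c() working
list.object <- list(a, b, c)

lapply(list.object, length)      # a list holding 4, 8 and 3
sapply(list.object, length)      # the same values as a vector: 4 8 3
sapply(list.object, class)       # "character" "numeric" "logical"

# mean() of the character element would be NA with a warning,
# so only the numeric and logical elements are averaged here
sapply(list.object[-1], mean)    # 2.5 and 0.6666667
```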

18 Working with Vectors and Logical Subscripts
Take the example of a vector containing the 11 numbers 0 to 10:
x <- 0:10
There are two quite different kinds of things we might want to do with this. We might want to add up the values of the elements:
sum(x)   # adds up the values of the xs
[1] 55
Alternatively, we might want to count the elements that pass some logical criterion. Suppose we wanted to know how many of the values were less than 5:
sum(x < 5)   # counts the number of cases that pass the logical condition 'x is less than 5'
[1] 5

19 When we counted the number of cases, the counting was applied to the entire vector, using sum(x < 5). To find the sum of the values of x that are less than 5:
sum(x[x < 5])
[1] 10
The logical condition x < 5 is either true or false:
x < 5
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
Imagine FALSE as being numeric 0 and TRUE as being numeric 1. Then the vector of subscripts [x < 5] is five 1s followed by six 0s:
1 * (x < 5)
 [1] 1 1 1 1 1 0 0 0 0 0 0
Now imagine multiplying the values of x by the values of the logical vector:
x * (x < 5)
 [1] 0 1 2 3 4 0 0 0 0 0 0
When the function sum is applied, it gives us the answer we want: the sum of the values of x that are less than 5, which is 10.
sum(x * (x < 5))
[1] 10
This produces the same answer as sum(x[x < 5]).
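The counting and summing idioms on slides 18 and 19 side by side; mean(x < 5), which gives the proportion of values passing the test, is an addition to what the slides show.

```r
x <- 0:10
n.small <- sum(x < 5)          # how many values pass the test: 5
s1 <- sum(x[x < 5])            # sum of the passing values: 0+1+2+3+4 = 10
s2 <- sum(x * (x < 5))         # same sum, treating TRUE as 1 and FALSE as 0
p.small <- mean(x < 5)         # proportion passing: 5/11
c(n.small, s1, s2, p.small)
```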

20 Addresses within Vectors
One of the most important functions for finding addresses (subscripts) within vectors and arrays is which, and it is very easy to understand.
y <- c(8, 3, 5, 7, 6, 6, 8, 9, 2, 3, 9, 4, 10, 4, 11)
y
 [1]  8  3  5  7  6  6  8  9  2  3  9  4 10  4 11
Suppose we wanted to know which elements of y contained values bigger than 5. We type
which(y > 5)
[1]  1  4  5  6  7  8 11 13 15
Notice that the answer to this enquiry is a set of subscripts. We don't use subscripts inside the which function itself; the function is applied to the whole vector. To see the values of y that are larger than 5, we just type
y[y > 5]
[1]  8  7  6  6  8  9  9 10 11
Note that this is a shorter vector than y itself, because values of 5 or less have been left out:
length(y)
[1] 15
length(y[y > 5])
[1] 9
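The sketch below repeats the which() example and adds two points not on the slide: which.max() for the position of the largest value, and the different handling of NA by a logical subscript and by which(). The vector z is hypothetical.

```r
y <- c(8, 3, 5, 7, 6, 6, 8, 9, 2, 3, 9, 4, 10, 4, 11)
pos <- which(y > 5)      # subscripts: 1 4 5 6 7 8 11 13 15
y[pos]                   # the same values as y[y > 5]
which.max(y)             # position of the (first) largest value: 15

# With missing values, a logical subscript keeps the NA, while which() drops it
z <- c(1, NA, 7)
z[z > 5]                 # NA 7
z[which(z > 5)]          # 7
```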

21 Finding Closest Values
Finding the value in a vector that is closest to a specified value is straightforward using which. Here, we want to find the value of y that is closest to 8.9:
which(abs(y - 8.9) == min(abs(y - 8.9)))
[1]  8 11
Two elements, the 8th and the 11th, are equally close to 8.9. But just how close to 8.9 are they? We use 8 and 11 as subscripts on y to find this out:
y[c(8, 11)]
[1] 9 9
Thus, we can write a function to return the value(s) closest to a specified value sv:
closest <- function(xv, sv) {
  xv[which(abs(xv - sv) == min(abs(xv - sv)))]
}
and run it like this:
closest(y, 8.9)
[1] 9 9
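The closest() function from this slide, written with the distance computed only once, alongside which.min(), which returns only the first position at the minimum distance (so it reports one match where closest() reports both ties).

```r
y <- c(8, 3, 5, 7, 6, 6, 8, 9, 2, 3, 9, 4, 10, 4, 11)

closest <- function(xv, sv) {
  d <- abs(xv - sv)          # distance of every element from sv
  xv[which(d == min(d))]     # all elements at the minimum distance (ties kept)
}

closest(y, 8.9)              # 9 9 (positions 8 and 11)
y[which.min(abs(y - 8.9))]   # 9, from position 8 only
```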

22 The sample Function
y <- scan()   # enter the data values
Here are two samples of y:
sample(y)
sample(y)
The order of the values is different each time that sample is invoked, but the same numbers are shuffled in every case. This is called sampling without replacement. To draw only some of the values, give the sample size:
sample(y, 5)
sample(y, 5)
The option replace = TRUE allows for sampling with replacement:
sample(y, replace = TRUE)
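Since the data typed into scan() on this slide are not shown, this sketch uses a hypothetical vector y and an assumed seed (set.seed(42)) so that the draws are reproducible. Without replacement, sample(y) is always a rearrangement of y; with replace = TRUE, some values may appear more often than in y and others not at all.

```r
set.seed(42)                           # assumed seed, for reproducibility only
y <- c(8, 3, 5, 7, 6, 6, 8, 9, 2, 3)   # hypothetical data standing in for scan()

s1 <- sample(y)                        # a random permutation of all 10 values
s2 <- sample(y, 5)                     # 5 values drawn without replacement
s3 <- sample(y, replace = TRUE)        # 10 values drawn with replacement
s1; s2; s3
```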

23 Some general programming guidelines
Writing a computer program to solve a problem can usually be reduced to the following sequence of steps:
1 Understand the problem.
2 Work out a general idea of how to solve it.
3 Translate your general idea into a detailed implementation.
4 Check: Does it work? Is it good enough? If yes, you are done! If no, go back to step 2.

24 Flow control
1. The for() statement
2. The if() statement
3. The while() loop
4. The repeat loop, and the break and next statements

25 The for statement (1)
The for() statement allows one to specify that a certain operation should be repeated a fixed number of times.
Syntax
for (variable in sequence) expression
Or,
for (variable in sequence) {
  expression
}

26 Example: The for statement
sum.x <- 0
for (i in 1:5) {
  sum.x <- sum.x + i
  print(i)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
sum.x
[1] 15
print()   # prints a single R object
cat()     # prints multiple objects, one after the other
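A sketch comparing the loop with the vectorised sum(), plus a common pitfall that is our addition to the slide: when the upper limit n can be 0, 1:n is c(1, 0) and the loop still runs twice, whereas seq_len(n) is empty and the body is skipped.

```r
sum.x <- 0
for (i in 1:5) {
  sum.x <- sum.x + i            # running total
}
cat("Loop total:", sum.x, "  sum(1:5):", sum(1:5), "\n")   # both are 15

# Safe loop over 1..n when n may be 0
n <- 0
count <- 0
for (i in seq_len(n)) count <- count + 1
count                           # 0: the body never ran
```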

27 Example: for loop
The Fibonacci sequence is a famous sequence in mathematics. The first two elements are defined as [1, 1]. Subsequent elements are defined as the sum of the preceding two elements. For example, the third element is 2 (= 1+1), the fourth element is 3 (= 1+2), the fifth element is 5 (= 2+3), and so on. To obtain the first 12 Fibonacci numbers in R, we can use
Fibonacci <- numeric(12)   # numeric vector of length 12, filled with zeros
Fibonacci[1] <- Fibonacci[2] <- 1
for (i in 3:12) Fibonacci[i] <- Fibonacci[i - 2] + Fibonacci[i - 1]
To see all 12 values, type in
Fibonacci
 [1]   1   1   2   3   5   8  13  21  34  55  89 144
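One way to check the 12 values is Binet's closed form, F(n) = round(phi^n / sqrt(5)) with phi the golden ratio; this check is our addition, not part of the slide.

```r
Fibonacci <- numeric(12)
Fibonacci[1] <- Fibonacci[2] <- 1
for (i in 3:12) Fibonacci[i] <- Fibonacci[i - 2] + Fibonacci[i - 1]

phi <- (1 + sqrt(5)) / 2                # golden ratio
binet <- round(phi^(1:12) / sqrt(5))    # closed-form Fibonacci numbers
all.equal(Fibonacci, binet)             # TRUE
```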

28 The for statement (2)
What is the output of the following statement?
for (x in 1:10) print(sqrt(x))
It prints the square roots of the integers one to ten, one per line.

29 Conditional execution (1)
The if() statement allows us to control which statements are executed.
Syntax
if (condition) {commands when TRUE}
Or,
if (condition) {commands when TRUE} else {commands when FALSE}
That is, if (expr1) expr2 else expr3, where expr1 must evaluate to a single logical value; if it is TRUE, the value of the entire expression is that of expr2, otherwise it is that of expr3.

30 Conditional execution: if statement
[Flowchart of the if statement: Entry, Test Expression, True (1) branch to Statement 1, False branch, Exit]

31 Conditional execution: if-else statement
[Flowchart of the if-else statement: Entry, Test Expression, True (1) branch to Body of if, False (0) branch to Body of else, Exit]

32 Example: Conditional execution (2)
A simple example:
x <- 3
if (x > 2) y <- 2 * x else y <- 3 * x
Since x > 2 is TRUE, y is assigned 2 * 3 = 6. If it hadn't been true, y would have been assigned the value of 3 * x.
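if() needs a single logical value (R 4.2.0 and later signal an error if the condition has length greater than one). For a whole vector, the element-wise ifelse() is the usual tool. A minimal sketch with a hypothetical vector xs:

```r
x <- 3
if (x > 2) y <- 2 * x else y <- 3 * x
y                                  # 6

# ifelse() tests every element and picks from the 'yes' or 'no' vector
xs <- c(1, 3, 5)
ys <- ifelse(xs > 2, 2 * xs, 3 * xs)
ys                                 # 3 6 10
```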

33 The while() loop
Sometimes we want to repeat statements, but the pattern of repetition isn't known in advance. We need to do some calculations and keep going as long as a condition holds. The while() statement accomplishes this.
Syntax
while (condition) {statements}
The condition is evaluated, and if it evaluates to FALSE, nothing more is done. If it evaluates to TRUE, the statements are executed, the condition is evaluated again, and the process is repeated.

34 Example: while loop
Suppose we want to list all Fibonacci numbers less than 300. We don't know beforehand how long this list is, so we wouldn't know how to stop the for() loop at the right time, but a while() loop is perfect:
Fib1 <- 1
Fib2 <- 1
Fibonacci <- Fib1   # start with the first element only; the loop appends the rest
while (Fib2 < 300) {
  Fibonacci <- c(Fibonacci, Fib2)
  oldFib2 <- Fib2
  Fib2 <- Fib1 + Fib2
  Fib1 <- oldFib2
}
To see the final result of the computation, type
Fibonacci
 [1]   1   1   2   3   5   8  13  21  34  55  89 144 233
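The while() loop can be wrapped in a function so that the limit becomes an argument. The name fib.below is our own, hypothetical choice; the loop body is the one on this slide, starting the result from a single 1.

```r
fib.below <- function(limit) {
  Fib1 <- 1
  Fib2 <- 1
  Fibonacci <- Fib1                  # start with the first 1 only
  while (Fib2 < limit) {
    Fibonacci <- c(Fibonacci, Fib2)  # append the current number
    oldFib2 <- Fib2
    Fib2 <- Fib1 + Fib2
    Fib1 <- oldFib2
  }
  Fibonacci
}

fib.below(300)   # 1 1 2 3 5 8 13 21 34 55 89 144 233
```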


37 The repeat loop, and the break and next statements
Sometimes we don't want a fixed number of repetitions of a loop, and we don't want to put the test at the top of the loop the way it is in a while() loop. In this situation we can use a repeat loop. This loop repeats until we execute a break statement.
Syntax
repeat { statements }
This causes the statements to be repeated endlessly. The statements should normally include a break statement, typically in the form
if (condition) break
but this is not a requirement of the syntax. The break statement causes the loop to terminate immediately. Break statements can also be used in for() and while() loops. The next statement causes control to return immediately to the top of the loop; it can also be used in any loop. The repeat loop and the break and next statements are used relatively infrequently.
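A small illustration of repeat with both next and break; the task (summing odd numbers until the total exceeds 50) is made up for the example.

```r
total <- 0
i <- 0
repeat {
  i <- i + 1
  if (i %% 2 == 0) next     # skip even numbers: jump back to the top of the loop
  total <- total + i
  if (total > 50) break     # leave the loop as soon as the total passes 50
}
c(i = i, total = total)     # i = 15, total = 64 (1 + 3 + ... + 15)
```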

38 Example: repeat, break
We can redo the earlier Newton's algorithm example, which finds a root of f(x) = x^3 + 2x^2 - 7, using a repeat loop:
x <- x0 <- 0.5
tolerance <- 1e-6   # assumed value; the tolerance on the original slide was lost
repeat {
  f <- x^3 + 2 * x^2 - 7
  if (abs(f) < tolerance) break
  f.prime <- 3 * x^2 + 4 * x
  x <- x - f / f.prime
}
x
This version removes the need to duplicate the line that calculates f, which the while() version below needs both before and inside the loop:
# Program using while
x <- x0 <- 0.5
f <- x^3 + 2 * x^2 - 7
tolerance <- 1e-6   # assumed value, as above
while (abs(f) > tolerance) {
  f.prime <- 3 * x^2 + 4 * x
  x <- x - f / f.prime
  f <- x^3 + 2 * x^2 - 7
}
x
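The same Newton iteration can be packaged as a function with an iteration limit, so that a poor starting value cannot loop forever. The names newton and max.iter and the default tolerance of 1e-6 are assumptions; base R's uniroot() is used only as an independent check.

```r
newton <- function(x0, tolerance = 1e-6, max.iter = 100) {
  x <- x0
  for (iter in seq_len(max.iter)) {
    f <- x^3 + 2 * x^2 - 7          # f(x)
    if (abs(f) < tolerance) return(x)
    f.prime <- 3 * x^2 + 4 * x      # f'(x)
    x <- x - f / f.prime            # Newton step
  }
  stop("no convergence after ", max.iter, " iterations")
}

root <- newton(0.5)
root
# Bracketing check: f(0) = -7 < 0 and f(2) = 9 > 0
uniroot(function(x) x^3 + 2 * x^2 - 7, c(0, 2), tol = 1e-10)$root
```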

39 Writing functions (1) A function is defined by an assignment of the form > name <- function(arg_1, arg_2,...) expression The expression is an R expression, (usually a grouped expression), that uses the arguments, arg_i, to calculate a value. The value of the expression is the value returned for the function. A call to the function then usually takes the form > name(expr_1, expr_2,...) and may occur anywhere a function call is legitimate.

40 Writing functions (2)
Example 1: Write a function to compute the standard deviation.
sd <- function(x) {
  sqrt(var(x))
}
(Defining sd in this way masks R's built-in sd() function; rm(sd) removes the new definition.)
If x = 9, 5, 2, 3, 7, type
x <- c(9, 5, 2, 3, 7)
sd(x)
[1] 2.863564
Exercise: Calculate the coefficient of variation, that is, the standard deviation of a variable divided by its mean.
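A possible answer to the exercise, using the name my.sd (our choice) so that R's built-in sd() is not masked; cv is likewise a hypothetical name.

```r
my.sd <- function(x) {
  sqrt(var(x))          # var() uses the n - 1 divisor, as does the built-in sd()
}
cv <- function(x) {
  my.sd(x) / mean(x)    # coefficient of variation: sd relative to the mean
}

x <- c(9, 5, 2, 3, 7)
my.sd(x)                # 2.863564
cv(x)                   # about 0.55
```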

41 Example: Geometric mean as a function
insects <- c(1, 10, 1000, 10, 1)
mean(insects)
[1] 204.4
To calculate a geometric mean, find the antilog (exp) of the average of the logarithms (log) of the data:
exp(mean(log(insects)))
[1] 10
So a function to calculate the geometric mean of a vector of numbers x is
geometric <- function(x) {exp(mean(log(x)))}
and testing it with the insect data gives
geometric(insects)
[1] 10
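log() of zero or a negative number gives -Inf or NaN, so here is a sketch of geometric() with a guard for non-positive values (the check is our addition), together with a cross-check via the n-th root of the product.

```r
geometric <- function(x) {
  if (any(x <= 0)) stop("all values must be positive for a geometric mean")
  exp(mean(log(x)))                   # antilog of the mean of the logs
}

insects <- c(1, 10, 1000, 10, 1)
mean(insects)                         # 204.4, dominated by the single 1000
geometric(insects)                    # 10
prod(insects)^(1 / length(insects))   # n-th root of the product, also 10
```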

42 Writing functions (3)

43 Writing functions (4)
Example 3: Calculation of grade point and letter grade from a score (marks).
grade <- function(s) {
  n <- length(s)
  gp <- matrix(0, nrow = n, ncol = 1)   # gp means Grade Point
  lg <- matrix(0, nrow = n, ncol = 1)   # lg means Letter Grade
  for (i in 1:n) {
    if (s[i] < 40) {
      gp[i] = 0.00; lg[i] = "F"
    } else if (s[i] >= 40 && s[i] < 45) {
      gp[i] = 2.00; lg[i] = "D"
    }

44 Writing functions (5)
    else if (s[i] >= 45 && s[i] < 50) {
      gp[i] = 2.25; lg[i] = "C"
    } else if (s[i] >= 50 && s[i] < 55) {
      gp[i] = 2.50; lg[i] = "C+"
    } else if (s[i] >= 55 && s[i] < 60) {
      gp[i] = 2.75; lg[i] = "B-"
    } else if (s[i] >= 60 && s[i] < 65) {
      gp[i] = 3.00; lg[i] = "B"
    } else if (s[i] >= 65 && s[i] < 70) {
      gp[i] = 3.25; lg[i] = "B+"
    }

45 Writing functions (6)
    else if (s[i] >= 70 && s[i] < 75) {
      gp[i] = 3.50; lg[i] = "A-"
    } else if (s[i] >= 75 && s[i] < 80) {
      gp[i] = 3.75; lg[i] = "A"
    } else {
      gp[i] = 4.00; lg[i] = "A+"
    }
  }   # end of for loop
  return(list(Grade.Point = gp, Letter.Grade = lg))
}   # end of function
score <- c(80, 45, 55, 90, 75, 38, 62)
result <- grade(score)
result
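The chain of if/else-if tests in grade() can also be written without a loop using findInterval(), which returns, for each score, how many of the band boundaries it equals or exceeds. This vectorised grade2() is a hypothetical alternative, not the slide's method, and it returns a data frame instead of a list of matrices.

```r
grade2 <- function(s) {
  cuts <- c(40, 45, 50, 55, 60, 65, 70, 75, 80)   # lower bound of each band above F
  gp <- c(0.00, 2.00, 2.25, 2.50, 2.75, 3.00, 3.25, 3.50, 3.75, 4.00)
  lg <- c("F", "D", "C", "C+", "B-", "B", "B+", "A-", "A", "A+")
  band <- findInterval(s, cuts) + 1   # 1 for s < 40, ..., 10 for s >= 80
  data.frame(Score = s, Grade.Point = gp[band], Letter.Grade = lg[band],
             stringsAsFactors = FALSE)
}

score <- c(80, 45, 55, 90, 75, 38, 62)
grade2(score)
```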

46 Writing functions (7)
Example 4: Write a function to calculate the two-sample t-statistic. This is an artificial example. The function is defined as follows:
twosam <- function(y1, y2) {
  n1 <- length(y1); n2 <- length(y2)
  yb1 <- mean(y1); yb2 <- mean(y2)
  s1 <- var(y1); s2 <- var(y2)
  s <- ((n1-1)*s1 + (n2-1)*s2)/(n1+n2-2)   # pooled variance
  tst <- (yb1 - yb2)/sqrt(s*(1/n1 + 1/n2))
  deg.free <- n1+n2-2
  return(list(test.stat = tst, df = deg.free))
}

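47 Writing functions (8)
If
x <- c(37, 29, 35, 28, 24, 36, 40, 37, 33, 28, 39)
y <- c(22, 32, 27, 30, 24, 34, 32, 20, 24, 25, 28, 26, 26)
then
twosam(x, y)
returns a list whose element $test.stat is the pooled two-sample t-statistic (about 3.31) and whose element $df is the degrees of freedom:
$df
[1] 22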
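twosam() computes the pooled-variance (Student) two-sample t-statistic, so its result should agree with R's t.test() when var.equal = TRUE. A sketch of that check, which also gives the p-value:

```r
twosam <- function(y1, y2) {
  n1 <- length(y1); n2 <- length(y2)
  yb1 <- mean(y1); yb2 <- mean(y2)
  s1 <- var(y1); s2 <- var(y2)
  s <- ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)   # pooled variance
  tst <- (yb1 - yb2) / sqrt(s * (1 / n1 + 1 / n2))
  list(test.stat = tst, df = n1 + n2 - 2)
}

x <- c(37, 29, 35, 28, 24, 36, 40, 37, 33, 28, 39)
y <- c(22, 32, 27, 30, 24, 34, 32, 20, 24, 25, 28, 26, 26)
res <- twosam(x, y)
tt <- t.test(x, y, var.equal = TRUE)      # pooled-variance two-sample t-test
c(twosam = res$test.stat, t.test = unname(tt$statistic))
tt$p.value                                # two-sided p-value on 22 df
```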

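48 Example 5: Maximum likelihood estimation (Gamma distribution as an example)
The pdf of the Gamma distribution with rate parameter λ and shape parameter α is
$$ f(x; \lambda, \alpha) = \frac{\lambda^{\alpha} x^{\alpha - 1} e^{-\lambda x}}{\Gamma(\alpha)}, \qquad x > 0. $$
For a sample x_1, ..., x_n, the likelihood and log-likelihood are
$$ L(\lambda, \alpha) = \prod_{i=1}^{n} \frac{\lambda^{\alpha} x_i^{\alpha - 1} e^{-\lambda x_i}}{\Gamma(\alpha)}, $$
$$ \log L(\lambda, \alpha) = n\alpha \log\lambda - n \log\Gamma(\alpha) + (\alpha - 1)\sum_{i=1}^{n} \log x_i - \lambda \sum_{i=1}^{n} x_i. $$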

49 Maximum likelihood estimation
The function mle() (in package stats4) estimates parameters by maximum likelihood, using iterative numerical optimisation to minimise the negative log-likelihood (which is the same as maximising the log-likelihood). It requires the negative log-likelihood, written as an R function of the parameters, and starting values for the parameter estimates.

50 Maximum likelihood estimation
x.gam <- rgamma(200, rate = 0.5, shape = 3.5)
# sample of size n = 200 from a gamma distribution with
# λ = 0.5 (rate parameter, i.e. scale 1/λ = 2) and α = 3.5 (shape parameter)
library(stats4)   # loading package stats4 for mle()
logL <- function(lambda, alfa) {
  x <- x.gam
  n <- length(x)
  temp1 <- -n*alfa*log(lambda) + n*lgamma(alfa)   # lgamma() is log(gamma()), computed stably
  temp2 <- -(alfa-1)*sum(log(x)) + lambda*sum(x)
  temp1 + temp2   # negative log-likelihood
}
est <- mle(minuslogl = logL, start = list(lambda = 2, alfa = 1))

51 Maximum likelihood estimation
summary(est)
Maximum likelihood estimation

Call:
mle(minuslogl = logL, start = list(lambda = 2, alfa = 1))

Coefficients:
       Estimate Std. Error
lambda
alfa

-2 log L:
(The numerical values depend on the random sample x.gam.)
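As a cross-check on mle(), the same maximisation can be sketched with base R's optim() and dgamma(..., log = TRUE), which computes the log of the pdf on slide 48. The seed, the log-scale parameterisation (so that both parameters stay positive) and the names negll and fit are our own choices, not the slide's.

```r
set.seed(123)                                  # assumed seed, for reproducibility only
x.gam <- rgamma(200, shape = 3.5, rate = 0.5)  # true values: alpha = 3.5, lambda = 0.5

# Negative log-likelihood; theta = (log lambda, log alpha)
negll <- function(theta, x) {
  lambda <- exp(theta[1])
  alfa <- exp(theta[2])
  -sum(dgamma(x, shape = alfa, rate = lambda, log = TRUE))
}

# Same starting values as the slide (lambda = 2, alfa = 1), on the log scale
fit <- optim(c(log(2), log(1)), negll, x = x.gam)
est <- exp(fit$par)
names(est) <- c("lambda", "alfa")
est
fit$convergence    # 0 means optim() reports successful convergence
```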

52 Thank You