Harnessing the Power of High Performance Computing for R


1 Harnessing the Power of High Performance Computing for R
Zhiyong Zhang, Stanford Research Computing Center

2 Shared memory vs. distributed memory

3 R: HPC Matters
Most R packages use a single core.
Vectorization and other optimizations in R
Interfaces for compiled code
Profiling tools: memory and CPU
Large memory and out-of-memory data
Parallelism for R:
  Threads or shared memory: limited to one node
    Implicit: minor changes (if any) required
    Explicit: some assembly required
  MPI/sockets or distributed memory: one or more nodes (explicit only)
R with accelerators: Xeon Phi coprocessors, GPU accelerators

4 R in parallel: R basics
.Rprofile / .Renviron: file.path(getwd(), ".Rprofile")
Environment variables: R_LIBS, $R_HOME, $R_HOME_DIR, $R_LIBS_USER
~]$ R RHOME
/share/sw/free/R/3.2.0/lib64/R
Rscript script_name.R
R CMD BATCH script_name.R
~]$ R
> source("script_name.R")
> install.packages("pkg")
> remove.packages("pkg")
> help(solve)
> example(solve)
> ?<command-name> or ??<command-name>
> ls(pos = "package:Rmpi")
> .packages()
> path.package("Rmpi")
> installed.packages()
> .libPaths()
> .Library
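A minimal ~/.Rprofile sketch (the library path and CRAN mirror below are assumptions, not site defaults):
## ~/.Rprofile is sourced at startup; .Renviron sets environment variables such as R_LIBS
.libPaths(c("~/R/library", .libPaths()))                  # prepend a personal package library (assumed path)
options(repos = c(CRAN = "https://cloud.r-project.org"))  # pick a mirror so install.packages() does not prompt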

5 Useful R commands and variables
help(fctn) displays help on any function fctn, as in Python.
To invoke complex arithmetic, add 0i to a number. For example, sqrt(-1) returns NaN, but sqrt(-1 + 0i) returns 0 + 1i.
sessionInfo() prints the R version, OS, packages loaded, etc.
ls() shows which objects are defined; rm(list=ls()) clears all defined objects.
dev.new() opens a new plotting window.
The function sort() does not change its argument.
Distribution function prefixes d, p, q, r stand for density (PDF), probability (CDF), quantile (inverse CDF), and random sample.
dnorm is the density function of a normal random variable and rnorm generates a sample from a normal random variable; dunif and runif are the corresponding functions for a uniform random variable.
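A short illustration of the d/p/q/r convention:
dnorm(0)               # density (PDF) of N(0,1) at 0
pnorm(1.96)            # probability (CDF): P(X <= 1.96)
qnorm(0.975)           # quantile (inverse CDF): x with P(X <= x) = 0.975
rnorm(5)               # random sample of size 5
dunif(0.3); runif(5)   # the uniform analogues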

6 Useful R commands and variables
Assignment: e <- m*c^2 or m*c^2 -> e. Use = for default function arguments or named arguments.
Names that are not reserved but are better treated as such: c, q, s, t, C, D, F, I, and T.
Comments begin with # and continue to the end of the line, as in Python or Perl.
Missing values and NaN:
  NaN, "not a number": overflows and undefined operations such as 0/0.
  NA, "not applicable": missing data or a NULL value. NA is a legal value inside an R vector, so vectors received by R functions may contain missing components; R functions must handle NA values.
  is.nan returns TRUE for NaN; is.na returns TRUE for NA or NaN.
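A small example of how NA and NaN behave:
x <- c(1, NA, 0/0, 3)    # 0/0 gives NaN; NA marks a missing value
is.na(x)                 # TRUE for both NA and NaN
is.nan(x)                # TRUE only for NaN
mean(x)                  # NA: most functions propagate missingness
mean(x, na.rm = TRUE)    # ... unless told to drop NA/NaN values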

7 Useful R commands and variables
Vectors
A vector in R is a container vector, an ordered collection of numbers; the length of a vector is the number of elements in the container.
Operations are applied componentwise.
R allows adding two vectors of different lengths: elements of the shorter summand are recycled as often as necessary to create a vector the length of the longer summand.
Scalars are vectors of length 1.
Vectors are created using the c function: p <- c(2,3,5,7).
Vectors in R are indexed starting with 1, and matrices are stored in column-major order.
Elements of a vector can be accessed using []. Vectors expand when assigning to indexes past the end of the vector.
Negative indices: x[-n] returns a copy of x with the nth element removed.
Boolean values can also be used as indices; there are five kinds of subscripts in R.
The type of a vector is the type of the elements it contains; all elements of a vector must have the same underlying type: logical, integer, double, complex, character, or raw.
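A quick illustration of recycling and the main subscript styles:
p <- c(2, 3, 5, 7)
p + c(10, 20)      # shorter vector recycled: 12 23 15 27
p[1]               # 1-based indexing: 2
p[-2]              # negative index drops the 2nd element: 2 5 7
p[p > 4]           # logical subscript: 5 7
p[6] <- 13         # assigning past the end expands the vector; p[5] becomes NA
p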

8 Useful R commands and variables
Lists
Lists are like vectors, except elements need not all have the same type: an element may be an integer, a string, or a vector of Boolean values.
Lists are created using the list function.
Elements can be accessed by position using [[]]. Named elements may be accessed either by position or by name.
Named elements of lists act like C structs, except that a dollar sign rather than a dot is used to access elements.
a <- list(name="Joe", 4, foo=c(3,8,9))
Now a[[1]] and a$name both equal the string "Joe".
Accessing a non-existent element of a list, say a[[4]] above, generates an error.
Assignment to a non-existent element of a list extends the list. If the index assigned to is more than one past the end of the list, intermediate elements are created and assigned NULL values.
Assignment to a non-existent named field, such as a$baz <- TRUE, also extends the list.
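A small list example showing [[ ]], $, and extension by assignment:
a <- list(name = "Joe", 4, foo = c(3, 8, 9))
a[[1]]             # "Joe", by position
a$name             # "Joe", by name
a[[5]] <- "new"    # extends the list; a[[4]] is created and set to NULL
a$baz <- TRUE      # assigning a new named field also extends the list
str(a)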

9 Useful R commands and variables
Sequences
seq(a, b, n) creates a sequence from a to b in steps of size n.
seq(a, b, length=n) creates a sequence with n points and step size (b-a)/(n-1).
seq(1, 10, 3) returns the vector containing 1, 4, 7, and 10.
The step size defaults to 1, as in Python; a:b is an abbreviation for seq(a, b, 1).
Matrices
Matrices in R are vectors with multiple dimensions.
m <- array( c(1,2,3,4,5,6), dim=c(2,3) ) creates a 2x3 matrix m.
R fills matrices by column, so the first row of m has elements 1, 3, and 5.
To fill m by row, use m <- matrix( c(1,2,3,4,5,6), nrow=2, ncol=3, byrow=TRUE ).
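A short example of sequences and matrix filling order:
seq(1, 10, 3)                              # 1 4 7 10
seq(0, 1, length = 5)                      # 0.00 0.25 0.50 0.75 1.00
m <- array(1:6, dim = c(2, 3)); m[1, ]     # filled by column: first row is 1 3 5
m2 <- matrix(1:6, nrow = 2, byrow = TRUE); m2[1, ]   # filled by row: first row is 1 2 3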

10 Useful R commands and variables
Type conversion functions: as.xxx converts its argument to type xxx.
as.integer(3.2) returns the integer 3; as.character(3.2) returns the string "3.2".
Boolean operators
T or TRUE for true values; F or FALSE for false values.
& and | apply element-wise on vectors.
&& and || are often used in conditional statements and use lazy evaluation as in C: the operators will not evaluate their second argument if the return value is determined by the first argument.
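A short example of element-wise vs. short-circuit operators:
as.integer(3.2); as.character(3.2)   # 3 and "3.2"
x <- c(TRUE, FALSE, TRUE); y <- c(TRUE, TRUE, FALSE)
x & y                    # element-wise: TRUE FALSE FALSE
x[1] && y[1]             # scalar, short-circuit: TRUE
FALSE && stop("never")   # right side never evaluated, so no error is raised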

11 Useful R commands and variables
Functions: f <- function(a=10, b) { return (a+b) }
The function keyword returns a function object, here assigned to f; it can also be used to create an anonymous function (a lambda expression).
return is a function and its argument must be contained in parentheses. The use of return is optional; otherwise the value of the last expression evaluated is returned.
A default value need not be a constant; it can be a function of other arguments.
Arguments are passed by value. To get the effect of passing by reference, either (1) assign to variables in the calling routine's scope, or (2) pass in all needed values and return a list as output.
Scope: R uses lexical scoping while S-PLUS uses static scope. Since variables cannot be declared (they pop into existence on first assignment), it is not always easy to determine the scope of a variable, and you cannot tell just by looking at the source code of a function whether a variable is local to that function.
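A small sketch of default arguments, last-expression return, and pass-by-value:
f <- function(a = 10, b) { a + b }          # value of the last expression is returned
f(b = 5)                                    # 15: the default a = 10 is used
g <- function(x, n = length(x)) sum(x)/n    # a default may depend on another argument
g(c(1, 2, 3))                               # 2
h <- function(v) { v[1] <- 99; v }          # arguments are copies
v <- c(1, 2, 3); h(v); v                    # caller's v is unchanged: 1 2 3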

12 R: memory profiling
Memory management: predict how much memory is needed and make the most of the memory available; write faster code, because accidental copies are a major cause of slow code.
Basics of memory management: objects, functions, larger blocks of code.
Size of an object: the base size of each object is 40 bytes: metadata, two pointers to the previous and next object, one pointer to attributes, the length, and padding.
R allocates vectors that are 8, 16, 32, 48, 64, or 128 bytes long; beyond 128 bytes, R asks for memory in multiples of 8 bytes.
Shared objects are stored once.
Global string pool: unique strings are stored in one place.

13 R: memory profiling
#install.packages("ggplot2"); #install.packages("pryr")
require(ggplot2); require(pryr)
sizes <- sapply(0:50, function(n) object_size(seq_len(n)))
plot(0:50, sizes, xlab = "Length", ylab = "Size (bytes)", type = "s")
plot(0:50, sizes - 40, xlab = "Length", ylab = "Bytes excluding overhead", type = "n")
abline(h = 0, col = "grey80")
abline(h = c(8, 16, 32, 48, 64, 128), col = "grey80")
abline(a = 0, b = 4, col = "grey90", lwd = 4)
lines(sizes - 40, type = "s")
x1 <- 1:1e6
y1 <- list(1:1e6, 1:1e6, 1:1e6)
object_size(x1); object_size(y1)
object_size("banana"); object_size(rep("banana", 10))

14 R: memory profiling
mem_used(); mem_change()
mem_change(x <- 1:1e6); mem_change(rm(x)); mem_change(NULL)
gc(): R does garbage collection automatically.
Memory leaks: formulas and closures capture their enclosing environment.
f1 <- function() { x <- 1:1e6; 10 }
mem_change(x <- f1()); object_size(x)
f2 <- function() { x <- 1:1e6; a ~ b }
mem_change(y <- f2()); object_size(y)
f3 <- function() { x <- 1:1e6; function() 10 }
mem_change(z <- f3()); object_size(z)

15 R: memory profiling with lineprof
require(devtools); require(lineprof)  # devtools::install_github("hadley/lineprof")
read_delim <- function(file, header = TRUE, sep = ",") {
  first <- scan(file, what = character(1), nlines = 1, sep = sep, quiet = TRUE)
  p <- length(first)
  all <- scan(file, what = as.list(rep("character", p)), sep = sep,
              skip = if (header) 1 else 0, quiet = TRUE)
  all[] <- lapply(all, type.convert, as.is = TRUE)
  if (header) {
    names(all) <- first
  } else {
    names(all) <- paste0("V", seq_along(all))
  }
  as.data.frame(all)
}
library(ggplot2)
write.csv(diamonds, "diamonds.csv", row.names = FALSE)
source("read-delim.r")
prof <- lineprof(read_delim("diamonds.csv"))
#shine(prof)
prof

16 Speed up R code: Rprof
system.time({n=10^7; data1=rnorm(n); data2=rnorm(10^7)
  for(i in 1:(n-1)){data1[i]=data1[i]+data1[i+1]; data1[i]=exp(data1[i]^2)}
  data1=data1*data2
  matrix.data1=matrix(data1,nrow=100,byrow=TRUE)
  matrix.data2=matrix(data2,nrow=100,byrow=TRUE)
  almost.final.result=matrix.data1%*%t(matrix.data2)
  final.result=solve(almost.final.result)})[]
Rprof("profiling.out")
n=10^7; data1=rnorm(n); data2=rnorm(10^7)
...
final.result=solve(almost.final.result)
Rprof()

17 Speed up R code: Rprof
> summaryRprof("profiling.out")
$by.self
            self.time self.pct total.time total.pct
"rnorm"         ...
"%*%"           ...
"exp"           ...
"^"             ...
"+"             ...
"matrix"        ...
":"             ...
"t.default"     ...
"-"             ...
"*"             ...

18 Speed up R code: Rprof
line_RProf_function=function() {
  data1=rnorm(10^7)
  for(i in 1:(length(data1)-1)) {
    data1[i]=data1[i]+data1[i+1]
    data1[i]=exp(data1[i]^2)
  }
  data2=rnorm(10^7)
  data1=data2*data1
  matrix.data1=matrix(data1, nrow=100, byrow=TRUE)
  matrix.data2=matrix(data2, nrow=100, byrow=TRUE)
  almost.final.result=matrix.data1%*%t(matrix.data2)
  final.result=solve(almost.final.result)
}
Rprof("profiling.out", line.profiling=TRUE)
line_RProf_function()
Rprof()

19 Speed up R code: Rprof
> summaryRprof("profiling.out", lines="show")
$by.self reports self.time, self.pct, total.time, and total.pct per source line (rows labeled by line number, plus a <no location> row).

20 How to Speed up R code
HDD/SSD/RAM/CPU/Vectorization
Avoid/minimize reading from/writing to disk!
To measure execution time:
time.start=proc.time()[[3]]
### Your code ###
time.end=proc.time()[[3]]
time.end-time.start
system.time({### Your code ###})[[3]]
> system.time({data=rnorm(10^7)})[]
 user.self  sys.self   elapsed user.child sys.child
Writing to disk:
> system.time({data=rnorm(10^7)
+ write.table(data, "data.txt") })[]

21 Speed up R code: vectors
Measuring time:
> ptm <- proc.time()
> for (i in 1:50) mad(stats::runif(500))
> proc.time() - ptm
   user  system elapsed
> system.time(for (i in 1:50) mad(stats::runif(500)))
> n=100
> a=NULL
> system.time(for (i in 1:n) a=c(a, sqrt(i)))
> system.time(for (i in 1:n) a=c(a, stats::runif(500)))

22 Speed up R code: vectors
n=1000
a=NULL                      # vector grows one element at a time
system.time(for (i in 1:n) {a=c(a,stats::runif(500))})
   user  system elapsed
n=1000
a=vector(length=n)          # vector declared, but still grown by concatenation
system.time(for (i in 1:n) {print(i); a=c(a,stats::runif(500))})
system.time(for (i in 1:n) {a[i]=stats::runif(500)})   # assignment into a preallocated vector

23 Speed up R code: vectors
n=1000
a=NULL
system.time(for (i in 1:n) {a=c(a,stats::runif(500))})
   user  system elapsed

24 Speed up R code: vectors
n=
a=vector(length=n)
system.time(for(i in 1:n) a[i]=sqrt(i)*2)
^C
Timing stopped at:   user  system elapsed
system.time({for(i in 1:n) a[i]=sqrt(i); a=a*2})

25 Speed up R code: vectors
m=100000
n=10000
a=matrix(nrow=m,ncol=n)
rm(a)
system.time({a=matrix(nrow=m,ncol=n)})
   user  system elapsed
system.time({vrow=vector(length=m)})
system.time({vcol=vector(length=n)})
# elementwise multiplication, row-first iteration
system.time(for(i in 1:m){for(j in 1:n){a[i,j]=vrow[i]*vcol[j]}})
   user  system elapsed
# vector operation, row-wise assignments
system.time(for(i in 1:m){a[i,]=vrow[i]*vcol})
# vector operation, column-wise assignments
system.time(for(i in 1:n){a[,i]=vcol[i]*vrow})
system.time(for(j in 1:n){for(i in 1:m){a[i,j]=vrow[i]*vcol[j]}})

26 Speed up R code: vectors, matrices, and tables vs. loops, data frames, and lists
Loops are, in general, more expensive, and so are data frames and lists.
Memory/object management: ls(), rm(data), gc()
system.time({a=NULL
  for(i in 1:10000) a=c(a,i^2)})[]
 user.self  sys.self   elapsed user.child sys.child
system.time({a=NULL
  for(i in 1: ) a=c(a,i^2)})[]
system.time({a=vector(length= )
  for(i in 1: ) a[i]=i^2})[]

27 Speed up R code: vectors, matrices, and tables
Loops are, in general, more expensive, and so are data frames and lists.
Memory/object management: ls(), rm(data), gc()
system.time({a=vector(length= )
  for(i in 1:n) a[i]=sin(i)*2*pi})[]
 user.self  sys.self   elapsed user.child sys.child
system.time({twopi=2*pi
  a=vector(length= )
  for(i in 1:n) a[i]=sin(i)*twopi})[]
system.time({a=vector(length= )
  for(i in ) a[i]=sin(i)
  a=2*pi*a})[]
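A complete, runnable version of this comparison (the length 10^6 below is an assumed value, chosen only so the timings are visible):
n <- 10^6
a <- vector(length = n)
system.time(for (i in 1:n) a[i] <- sin(i)*2*pi)    # recomputes 2*pi inside the loop
system.time({twopi <- 2*pi                         # hoist the constant out of the loop
             for (i in 1:n) a[i] <- sin(i)*twopi})
system.time(a <- 2*pi*sin(1:n))                    # fully vectorized: fastest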

28 How to Speed up R code: working with vectors
system.time({n=100000; y=rnorm(n); w=vector(length=n)
  for(i in 1:n) w[i]=sum(y[1:i]<y[i])})[]
 user.self  sys.self   elapsed user.child sys.child
system.time({n=100000; y=rnorm(n); w=matrix(1:n,nrow=n,ncol=1)
  w=apply(w,1,function(x) sum(y[1:x]<y[x]))})[]

29 Speed up R: C functions with R
Kernel density estimate: $\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right)$
ksmooth1 <- function(data, xpts, h) {
  dens <- double(length(xpts))
  n <- length(data)
  for(i in 1:length(xpts)) {
    ksum <- 0
    for(j in 1:length(data)) {
      d <- xpts[i] - data[j]
      ksum <- ksum + dnorm(d / h)
    }
    dens[i] <- ksum / (n * h)
  }
  return(dens)
}
data=rchisq(500, 3)
xpts=seq(0, 10, length=10000)
h=0.75
system.time({fx_estimated=ksmooth1(data, xpts, h)})[[3]]
[1] 8.579

30 Speed up R: C functions with R
#include <R.h>
#include <Rmath.h>
void kernel_smooth(double *data, int *n, double *xpts, int *nxpts,
                   double *h, double *result)
{
    int i, j;
    double d, ksum;
    for (i = 0; i < *nxpts; i++) {
        ksum = 0;
        for (j = 0; j < *n; j++) {
            d = xpts[i] - data[j];
            ksum += dnorm(d / *h, 0, 1, 0);
        }
        result[i] = ksum / ((*n) * (*h));
    }
}
R CMD SHLIB ksmooth1.c
icc -std=gnu99 -I/share/sw/free/R/3.2.2/lib64/R/include -DNDEBUG -I/usr/local/include -fpic -O2 -march=native -c ksmooth1.c -o ksmooth1.o
icc -std=gnu99 -shared -L/share/sw/free/R/3.2.2/lib64/R/lib -L/usr/local/lib64 -o ksmooth1.so ksmooth1.o -L/share/sw/free/R/3.2.2/lib64/R/lib -lR
ls ksmooth1.*
ksmooth1.c  ksmooth1.o  ksmooth1.so

31 Speed up R: C functions with R
dyn.load("ksmooth1.so")
ksmooth2 <- function(data, xpts, h) {
  n <- length(data)
  nxpts <- length(xpts)
  dens <- .C("kernel_smooth", as.double(data), as.integer(n),
             as.double(xpts), as.integer(nxpts), as.double(h),
             result = double(length(xpts)))
  return(dens[[6]])
}
data=rchisq(500, 3)
xpts=seq(0, 10, length=10000)
h=0.75
system.time({result=ksmooth2(data, xpts, h)})[[3]]
[1] 0.187

32 Speed up R: OpenMP with R (pnmath)
$ wget
> .libPaths()
[1] "/home/zyzhang/R/x86_64-unknown-linux-gnu-library/3.2"
[2] "/share/sw/free/R/3.2.2/lib64/R/library"
> install.packages("/scratch/users/zyzhang/usertests/R/Tutorial/pnmath/pnmath_0.0-4.tar.gz", repos=NULL)
$ R CMD INSTALL pnmath_0.0-4.tar.gz /home/zyzhang/R/x86_64-unknown-linux-gnu-library/3.2
> library(pnmath)
> ls(pos="package:pnmath")
[1] "calibratePnmath"       "getNumPnmathThreads"  "getNumProcs"
[4] "getParallelThresholds" "setNumPnmathThreads"
> getNumProcs()
[1] 16
> getNumPnmathThreads()
> system.time({A=matrix(1:10^9,nrow=1000); B=tan(sin(cos(tan(A))))})[]

33 Large memory and out-of-memory data
biglm: incremental computations for lm() and glm() on data sets stored outside of R's main memory.
ff: file-based access to data sets that are too large to be loaded into memory.
biglars: larger-than-memory datasets for least-angle regression, lasso, and stepwise regression.
ffbase: adds basic statistical functionality to the ff package.
bigmemory: large objects in memory (as well as via files), referred to by external pointers, with transparent access from R without bumping against R's internal memory limits. Several R processes on the same computer can also share big memory objects.
Database and database-like packages.
HadoopStreaming: writing map/reduce scripts for use in Hadoop Streaming; it also facilitates operating on data in a streaming fashion, which does not require Hadoop.
speedglm: fit (generalized) linear models to large data.
bigrf: Random Forests implementation for parallel execution and large memory.
MonetDB.R: allows R to access the MonetDB column-oriented, open-source database system as a backend.
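A minimal sketch of the bigmemory idea, assuming the bigmemory package is installed (the file names are illustrative):
library(bigmemory)
# A file-backed big.matrix keeps the data outside R's heap; only a small descriptor lives in R.
x <- filebacked.big.matrix(nrow = 1e6, ncol = 10, type = "double",
                           backingfile = "x.bin", descriptorfile = "x.desc")
x[1:5, 1] <- rnorm(5)            # indexed like an ordinary matrix
desc <- describe(x)
y <- attach.big.matrix(desc)     # another R process could attach via the descriptor file
y[1:5, 1]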

34 R in parallel: implicit parallelism
pnmath: OpenMP directives in internal R functions, making use of multiple cores; pnmath0 uses Pthreads instead of OpenMP if newer compilers are not available. Similar functionality is expected to become integrated into R 'eventually'.
romp: OpenMP using Fortran; a pre-alpha R-Forge project was initiated, but there is no package yet.
R/parallel package by Vera, Jansen and Suppi: C++-based with a master-slave dispatch mechanism.
Rdsm package: threads-like programming on multicore machines and distributed shared memory.
RhpcBLASctl: detects and explicitly selects the number of available BLAS cores.
Rhpc: *apply() style dispatch via MPI.
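A small sketch of controlling implicit BLAS/OpenMP threading, assuming the RhpcBLASctl package is installed and R is linked against a multithreaded BLAS:
library(RhpcBLASctl)
get_num_cores(); get_num_procs()   # physical cores and logical processors on the node
blas_set_num_threads(4)            # let BLAS-backed operations (%*%, solve, ...) use 4 threads
omp_set_num_threads(4)             # same for OpenMP-enabled compiled code
A <- matrix(rnorm(2000*2000), 2000)
system.time(B <- A %*% A)          # timing now reflects the 4-thread setting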

35 R in parallel: explicit parallelism
Rmpi by Yu: access to numerous functions from the MPI API plus R-specific extensions; works with LAM/MPI, MPICH/MPICH2, Open MPI, and Deino MPI.
pbdMPI: MPI interface to support SPMD programming.
pbdSLAP: scalable linear algebra packages.
pbdBASE: core classes and methods for distributed data types.
pbdDMAT: distributed dense matrices for "Programming with Big Data".
pbdNCDF4: multiple-process access to the same files; supports terabyte-sized files.
pbdDEMO: examples and a detailed vignette.
pbdPROF: profiles MPI SPMD code with fpmpi, mpiP, or TAU.
nws (NetWorkSpaces) from REvolution Computing, implemented on top of the Twisted networking toolkit for Python.

36 R in parallel: explicit parallelism
snow (Simple Network of Workstations): a communication abstraction layer over PVM, MPI, NWS and direct networking sockets.
snowFT: fault-tolerance extensions to snow.
snowfall: a more recent alternative to snow; functions can be used in sequential or parallel mode.
foreach: general iteration over elements in a collection with no explicit loop counter; the loop can run in parallel via backends such as doMC (parallel/multicore on single workstations), doSNOW, doMPI (using Rmpi), and doRedis.
future: synchronous (sequential) and asynchronous (parallel) evaluations via an abstraction of futures, either via function calls or implicitly via promises. Global variables are automatically identified, and iteration over elements in a collection is supported.
bigrf: Random Forests with parallel execution and large memory.
Rborist: OpenMP predictor-level parallelism in the Random Forest algorithm; efficient use of multicore hardware in restaging data and in determining splitting criteria, both of which are performance bottlenecks in the algorithm.
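A small sketch of the future package's two styles (the worker count is an assumption):
library(future)
plan(multisession, workers = 2)    # evaluate futures in background R sessions
f <- future({ sum(rnorm(1e7)) })   # explicit: create a future and keep working
other <- sqrt(2)
value(f)                           # block only when the value is needed
x %<-% { mean(rnorm(1e7)) }        # implicit: a promise that resolves on first use
x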

37 R in parallel: GPUs
gputools: common data-mining algorithms implemented using a mixture of NVIDIA's CUDA language and the cuBLAS library.
rpud: optimized distance metric for NVIDIA-based GPUs.
gcbd: benchmarking framework for BLAS and GPUs (using gputools).
cudaBayesreg: rhierLinearModel from the bayesm package for high-performance statistical analysis of fMRI voxels.
rgpu: bioinformatics analysis using the GPU.
OpenCL: interface from R to OpenCL, with hardware- and vendor-neutral interfaces to GPU programming.
WideLM: uses CUDA (4.1 or greater) to fit many 'skinny' regression models simultaneously from a single data set.
HiPLARM: High-Performance Linear Algebra for R using multi-core and/or GPU support via the PLASMA/MAGMA libraries from UTK, CUDA, and accelerated BLAS.
permGPU: permutation resampling inference for RNA microarray studies on the GPU.
gmatrix: matrix and vector operations with intermediate computations kept on the coprocessor and reused, to minimize data movement.

38 R in parallel: parallel backend
Parallel computation depends upon a parallel backend that must be registered before performing the computation: doMC, doSNOW, doParallel, doMPI.
Setting the parallel backend:
require(doMC)
Loading required package: doMC
Loading required package: foreach
Loading required package: iterators
Loading required package: parallel
# registerDoMC()    # Default is 2 cores
# registerDoMC(32)  # For example, register 32 cores
registerDoMC(8)     # use 8 cores
# getDoParWorkers() # Check number of registered cores
# registerDoSEQ()   # For sequential execution; also necessary when changing # of cores before registerDoMC()
getDoParWorkers()

39 R in parallel: foreach
R looping constructs: repeat, while, for (vector counter) {Statements}; apply, lapply, sapply, eapply, mapply, rapply.
foreach: parallel execution on multiple cores or nodes; returns an object (by default a list).
> ls(pos="package:foreach")
 [1] "%:%"                "%do%"               "%dopar%"
 [4] "accumulate"         "foreach"            "getDoParName"
 [7] "getDoParRegistered" "getDoParVersion"    "getDoParWorkers"
[10] "getDoSeqName"       "getDoSeqRegistered" "getDoSeqVersion"
[13] "getDoSeqWorkers"    "getErrorIndex"      "getErrorValue"
[16] "getexports"         "getResult"          "makeAccum"
[19] "registerDoSEQ"      "setDoPar"           "setDoSeq"
[22] "times"              "when"

40 R in parallel: foreach with combine option
library(foreach)
x <- foreach(a=1:1000, b=rep(10, 2)) %do% {a + b}
m <- matrix(rnorm(9), 3, 3)
foreach(i=1:ncol(m), .combine=c) %do% mean(m[,i])
d <- data.frame(x=1:10, y=rnorm(10))
s <- foreach(d=iter(d, by='row'), .combine=rbind) %dopar% d
identical(d, s)
library(iterators)
a <- matrix(1:16, 4, 4)
b <- t(a)
foreach(b=iter(b, by='col'), .combine=cbind) %dopar% (a %*% b)
qsort <- function(x) {
  n <- length(x)
  if (n == 0) {
    x
  } else {
    p <- sample(n, 1)
    smaller <- foreach(y=x[-p], .combine=c) %:% when(y <= x[p]) %do% y
    larger  <- foreach(y=x[-p], .combine=c) %:% when(y >  x[p]) %do% y
    c(qsort(smaller), x[p], qsort(larger))
  }
}
qsort(runif(12))

41 R in parallel: foreach and combine
.combine with c, cbind, rbind, +, or *:
  c concatenates all the results.
  cbind() combines vectors, matrices or data frames by columns; rbind() combines them by rows.
.multicombine=TRUE: the combine function accepts more than 2 arguments at a time.
.maxcombine=10: sets the maximum number of arguments to 10 (default 100).
.inorder: whether results are combined in the order the tasks were submitted; default is TRUE.
cfun <- function(...) NULL
x <- foreach(i=1:4, .combine='cfun', .multicombine=TRUE) %do% rnorm(4)
foreach(i = 1:3) %do% sqrt(i)
x <- foreach(i=1:3, .combine='c') %do% exp(i)
x <- foreach(i=1:4, .combine='cbind') %do% rnorm(4)
x

42 R in parallel: foreach and iterator
Iterators can run over a vector, list, matrix, data frame, a file or a database query (iterators package).
irnorm: returns a specified number of random normal values per iteration; icount: counts up from one.
> library(iterators)
> x <- foreach(a=irnorm(4, count=4), .combine='cbind') %do% a
> x
> set.seed(123)
> x <- foreach(a=irnorm(4, count=1000), .combine='+') %do% a
> x

43 R in parallel: foreach and parallel execution
Change %do% to %dopar% to run the same loop in parallel; iterators over a vector, list, matrix, data frame, file or database query work the same way.
library(iterators)
x <- foreach(a=irnorm(4, count=4), .combine='cbind') %dopar% a
x
set.seed(123)
x <- foreach(a=irnorm(4, count=1000), .combine='+') %dopar% a
x

44 R in parallel: combine option
x <- iris[which(iris[,5] != "setosa"), c(1,5)]
trials <-
ptime <- system.time({
  r <- foreach(icount(trials), .combine=cbind) %dopar% {
    ind <- sample(100, 100, replace=TRUE)
    result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
    coefficients(result1)
  }
})[3]
ptime
elapsed
  5.601

45 R in parallel: parallel
> ls(pos="package:parallel")
 [1] "clusterApply"        "clusterApplyLB"      "clusterCall"
 [4] "clusterEvalQ"        "clusterExport"       "clusterMap"
 [7] "clusterSetRNGStream" "clusterSplit"        "detectCores"
[10] "makeCluster"         "makeForkCluster"     "makePSOCKcluster"
[13] "mc.reset.stream"     "mcaffinity"          "mccollect"
[16] "mclapply"            "mcMap"               "mcmapply"
[19] "mcparallel"          "nextRNGStream"       "nextRNGSubStream"
[22] "parApply"            "parCapply"           "parLapply"
[25] "parLapplyLB"         "parRapply"           "parSapply"
[28] "parSapplyLB"         "pvec"                "setDefaultCluster"
[31] "splitIndices"        "stopCluster"

46 R in parallel: parallel
parallel, detectCores(), system.time(), lapply(), mclapply()
require(parallel, quiet=TRUE)
detectCores()
[1] 32
n.cores <- detectCores()
pause <- function(i) {Sys.sleep(i)}
> system.time(lapply(1:10, function(i) pause(0.25)))
   user  system elapsed
system.time(mclapply(1:10, function(i) pause(0.25), mc.cores = n.cores))

47 R in parallel: mcparallel()
mcparallel(): forks an R process that evaluates the expression.
mccollect(): collects results from one or more parallel processes.
require(parallel, quiet=TRUE)
detectCores()
[1] 32
n.cores <- detectCores()
system.time({
  a1 <- mcparallel(1:5)
  a2 <- mcparallel(1:10)
  a3 <- mcparallel(1:15)
  a4 <- mcparallel(1:20)
  res <- mccollect(list(a1,a2,a3,a4))
})
   user  system elapsed

48 R in parallel: parallel backend, doMC
detectCores()
[1] 32
getDoParRegistered()
[1] FALSE
getDoParWorkers()
[1] 1
registerDoMC()
getDoParRegistered()
[1] TRUE
getDoParWorkers()
[1] 16
registerDoMC(32)

49 R in parallel: doMC
library(doMC)  # loads foreach, iterators, parallel
options(cores = 4)
registerDoMC()
system.time({
  data=vector(length=1000)
  data=foreach(i=1:1000) %dopar% {sqrt(1/(sin(i))^2)-sum(rnorm(10^6))}
  data=unlist(data)
})[]
 user.self  sys.self   elapsed user.child sys.child
options(cores = 1)

50 R in parallel: doMC for randomForest
x <- matrix(runif(500), 100)
y <- gl(2, 50)
library(randomForest)
rf <- foreach(ntree=rep(250, 4), .combine=combine) %do%
  randomForest(x, y, ntree=ntree)
rf
library(doMC, quiet=TRUE)
detectCores()
registerDoMC()
getDoParRegistered()
getDoParWorkers()
rf <- foreach(ntree=rep(250, 4), .combine=combine, .packages='randomForest') %dopar%
  randomForest(x, y, ntree=ntree)

51 R in parallel: applyKernel
applyKernel <- function(newX, FUN, d2, d.call, dn.call=NULL, ...) {
  ans <- vector("list", d2)
  for(i in 1:d2) {
    tmp <- FUN(array(newX[,i], d.call, dn.call), ...)
    if(!is.null(tmp)) ans[[i]] <- tmp
  }
  ans
}
applyKernel(matrix(1:16, 4), mean, 4, 4)
# the same body written with foreach, iterating over dimensions:
applyKernel <- function(newX, FUN, d2, d.call, dn.call=NULL, ...) {
  foreach(i=1:d2) %dopar% {
    FUN(array(newX[,i], d.call, dn.call), ...)
  }
}
# or with an iterator over the columns of newX:
applyKernel <- function(newX, FUN, d2, d.call, dn.call=NULL, ...) {
  foreach(x=iter(newX, by='col')) %dopar% {
    FUN(array(x, d.call, dn.call), ...)
  }
}

52 R in parallel: applyKernel
applyKernel <- function(newX, FUN, d2, d.call, dn.call=NULL, ...) {
  foreach(x=iblkcol(newX, 3), .combine='c', .packages='foreach') %dopar% {
    foreach(i=1:ncol(x)) %do% FUN(array(x[,i], d.call, dn.call), ...)
  }
}
iblkcol <- function(newX, nc) {}   # iterator over blocks of nc columns of newX
applyKernel(matrix(1:16, 4), mean, 4, 4)

53 R in parallel: Rmpi installation
Rmpi: an R interface to MPI (Hao Yu, U. of Western Ontario), using the master/slave paradigm.
wget
tar zxvf Rmpi_0.6-5.tar.gz
ml load R/3.2.1/gnu/4.8.2/intel2015mkl/zyzhang
ml load openmpi/1.8.7/gcc
R CMD INSTALL Rmpi --configure-args="--with-mpi-type=OPENMPI --with-mpi-include=/share/sw/free/openmpi/1.8.7/gcc/4.4/include --with-mpi-libpath=/share/sw/free/openmpi/1.8.7/gcc/4.4/lib"

54 R in parallel: Rmpi
salloc -N 1 -n 16 -p normal,owners --time=24:00:00 --exclusive
squeue | grep zyzhang
ssh -XY
ml load R/3.2.5.intel.tcltk
ml list
library(Rmpi)
mpi.spawn.Rslaves()
1 slaves are spawned successfully. 0 failed.
master (rank 0, comm 1) of size 2 is running on: sh-1-24
slave1 (rank 1, comm 1) of size 2 is running on: sh-1-24
mpi.comm.size()
[1] 2

55 R in parallel: Rmpi
Functions exported by Rmpi:
mpi.abort mpi.cart.shift mpi.get.processor.name mpi.parApply mpi.scatterv mpi.allgather mpi.close.Rslaves mpi.get.sourcetag mpi.parCapply mpi.send mpi.allgather.Robj mpi.comm.c2f mpi.hostinfo mpi.parLapply mpi.sendrecv mpi.allgatherv mpi.comm.disconnect mpi.iapply mpi.parMM mpi.sendrecv.replace mpi.allreduce mpi.comm.dup mpi.iapplyLB mpi.parRapply mpi.send.Robj mpi.any.source mpi.comm.free mpi.info.create mpi.parReplicate mpi.setup.rngstream mpi.any.tag mpi.comm.get.parent mpi.info.free mpi.parSapply mpi.spawn.Rslaves mpi.apply mpi.comm.is.null mpi.info.get mpi.parSim mpi.status.maxsize mpi.applyLB mpi.comm.maxsize mpi.info.set mpi.probe mpi.test mpi.barrier mpi.comm.rank mpi.intercomm.merge mpi.proc.null mpi.testall mpi.bcast mpi.comm.remote.size mpi.iparApply mpi.quit mpi.testany mpi.bcast.cmd mpi.comm.set.errhandler mpi.iparCapply mpi.realloc.comm mpi.test.cancelled mpi.bcast.data2slave mpi.comm.size mpi.iparLapply mpi.realloc.request mpi.testsome mpi.bcast.Rfun2slave mpi.comm.spawn mpi.iparMM mpi.realloc.status mpi.universe.size mpi.bcast.Robj mpi.comm.test.inter mpi.iparRapply mpi.recv mpi.wait mpi.bcast.Robj2slave mpi.dims.create mpi.iparReplicate mpi.recv.Robj mpi.waitall mpi.cancel mpi.exit mpi.iparSapply mpi.reduce mpi.waitany mpi.cart.coords mpi.finalize mpi.iprobe mpi.remote.exec mpi.waitsome mpi.cart.create mpi.gather mpi.irecv mpi.request.maxsize mpi.cartdim.get mpi.gather.Robj mpi.isend mpi.scatter mpi.cart.get mpi.gatherv mpi.isend.Robj mpi.scatter.Robj mpi.cart.rank mpi.get.count mpi.is.master mpi.scatter.Robj2slave

56 R in parallel: Rmpi
require(Rmpi)  # help(mpi.apply)
x=c(1,7:10)
mpi.spawn.Rslaves(nslaves=5)
mpi.apply(x,runif)
meanx=1:5
meanx
mpi.apply(meanx,rnorm,n=2,sd=4)
mpi.hostinfo()
slave.hostinfo()
mpi.comm.size()
mpi.comm.rank()
paste("I am",mpi.comm.rank(),"of",mpi.comm.size())
mpi.remote.exec(paste("I am", mpi.comm.rank(),"of",mpi.comm.size()))
mpi.close.Rslaves()

57 R in parallel: Rmpi
Communicating data/commands and executing commands on slaves:
library(Rmpi)
mpi.spawn.Rslaves(nslaves=5)
mpi.bcast.cmd(id<-mpi.comm.rank())
mpi.bcast.cmd(x<-paste("I am no.", id))
mpi.bcast.cmd(mpi.gather.Robj(x))
x<-"I am a master"
mpi.gather.Robj(x)
mpi.remote.exec(x)   # note the difference between gather and remote execution
mpi.close.Rslaves()
mpi.quit()

58 R in parallel: Rmpi
Send data/commands and execute commands:
mpi.spawn.Rslaves(nslaves=3)
x<-c("fruits","apple","banana","orange")
mpi.bcast.cmd(x<-mpi.scatter.Robj(x))
x<-mpi.scatter.Robj(x)
mpi.remote.exec(x); x
mpi.bcast.cmd(x<-mpi.allgather.Robj(x)); x<-mpi.allgather.Robj(x)
mpi.close.Rslaves()

59 R in parallel: Rmpi
Communicating data/commands and executing commands:
mpi.spawn.Rslaves(nslaves=3)
red<-function(option="sum"){ mpi.reduce(x,type=2,op=option) }
mpi.bcast.Robj2slave(red)
x<-c(1,2,3,4)
mpi.bcast.cmd(x<-mpi.scatter.Robj(x)); x<-mpi.scatter.Robj(x)
mpi.remote.exec(x); x
mpi.remote.exec(red("sum")); mpi.reduce(x,2,"sum")
x
mpi.close.Rslaves()

60 R in parallel: Rmpi
Communicating data/commands and executing commands:
mpi.send(x,type,dest,tag,comm)
mpi.recv(x,type,source,tag,comm)
mpi.send.Robj(obj,dest,tag,comm=1)
mpi.recv.Robj(source,tag,comm=1,status=0)
mpi.isend(); mpi.irecv(); mpi.wait(); mpi.barrier()
library(Rmpi)
mpi.spawn.Rslaves(nslaves=3)
srecv<-function(){ if(mpi.comm.rank()==2) x<-mpi.recv(x,1,0,1,1) }
mpi.bcast.Robj2slave(srecv)
x<-as.integer(21.34)
x
mpi.send(x,1,2,1,1)
mpi.bcast.cmd(x<-integer(1))
mpi.remote.exec(x)
mpi.close.Rslaves()

61 R in parallel: Rmpi
slurm.Rmpi
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --tasks-per-node=2
#SBATCH --time=10:0:0
#SBATCH -p normal,owners
module load R/3.2.5.intel.tcltk
# use "-np 1" to start only one instance of R. The MPI tasks will be started internally.
mpirun -np 1 Rscript Rmpi.R

Rmpi.R
library(Rmpi)
# initialize an Rmpi environment
ns <- mpi.universe.size() - 1
mpi.spawn.Rslaves(nslaves=ns)
# send these commands to the slaves
mpi.bcast.cmd( id <- mpi.comm.rank() )
mpi.bcast.cmd( ns <- mpi.comm.size() )
mpi.bcast.cmd( host <- mpi.get.processor.name() )
# all slaves execute this command
mpi.remote.exec(paste("I am", id, "of", ns, "running on", host))
# close down the Rmpi environment
mpi.close.Rslaves(dellog = FALSE)
mpi.exit()

62 R in parallel: Rmpi
rmpi-rnorm.R
library(Rmpi)
# initialize an Rmpi environment
ns <- mpi.universe.size() - 1
mpi.spawn.Rslaves(nslaves=ns)
# Tell all slaves to return a message identifying themselves.
mpi.remote.exec(paste("I am",mpi.comm.rank(),"of",mpi.comm.size()))
mpi.remote.exec(paste(mpi.comm.get.parent()))
# Send execution commands to the slaves
x<-5
x<-mpi.remote.exec(rnorm,x)
length(x)
x
# Use mpi.apply instead to generate an increasing list of rnorm samples.
# x<-mpi.apply(seq(20,800,20),rnorm)
x<-mpi.apply(seq(20,800,length.out = ns),rnorm)
# close down the Rmpi environment
mpi.close.Rslaves(dellog = FALSE)
mpi.exit()

63 R in parallel: doMPI
The doMPI package is a "parallel backend" for the foreach package; it acts as an adaptor to the Rmpi package.
The Rmpi package is an R interface to an implementation of MPI. Rmpi defines about 110 R functions, only a few of which are for internal use. The MPI standard includes many more functions that are not supported by Rmpi, such as the file functions, but only a small percentage of those functions are needed in order to start using MPI effectively.
MPI (Message Passing Interface): an API for passing messages between different computers.
The foreach package provides a parallel for-loop construct for R. doMPI combines the ease of use of parallel for-loops with the efficiency of MPI.

64 R in parallel: Rmpi functions
mpi.send.Robj(object, destination, tag)
object <- mpi.recv.Robj(mpi.any.source(), mpi.any.tag())
info <- mpi.get.sourcetag()
mpi.bcast.Robj2slave()
mpi.bcast.cmd(R code)
results <- mpi.remote.exec(R code)

65 R in parallel: doMPI
> library(doMPI)
Loading required package: foreach
foreach: simple, scalable parallel programming from Revolution Analytics
Use Revolution R for scalability, fault tolerance and more.
Loading required package: iterators
Loading required package: Rmpi
> cl <- startMPIcluster(count=2)
> registerDoMPI(cl)
> closeCluster(cl)
> mpi.finalize()
[1] "Exiting Rmpi. Rmpi cannot be used unless relaunching R."
> mpi.quit()

66 R in parallel: doMPI
x <- foreach(i=1:3, .combine="c") %dopar% sqrt(i)
x <- foreach(seed=c(7, 11, 13), .combine="cbind") %dopar% { set.seed(seed); rnorm(3) }
x <- seq(-8, 8, by=0.5)
v <- foreach(y=x, .combine="cbind") %dopar% { r <- sqrt(x^2 + y^2) + .Machine$double.eps; sin(r) / r }
persp(x, x, v)
# sequential equivalent:
sinc <- function(y) { r <- sqrt(x^2 + y^2) + .Machine$double.eps; sin(r)/r }
r <- lapply(x, sinc)
v <- do.call("cbind", r)

67 R in parallel: doMPI/foreach/.csv
ifiles <- list.files(pattern="\\.csv$")
ofiles <- sub("\\.csv$", ".png", ifiles)
foreach(i=ifiles, o=ofiles, .packages="randomForest") %dopar% {
  d <- read.csv(i)
  rf <- randomForest(Species~., data=d, proximity=TRUE)
  png(filename=o)
  MDSplot(rf, d$Species)
  dev.off()
  NULL
}

68 R in parallel: Rmpi
message.pass <- function() {
  # Get this slave's rank
  myrank <- mpi.comm.rank()
  # Get the partner slave's rank (some hackery to avoid the master)
  otherrank <- (myrank+1) %% mpi.comm.size()
  otherrank <- otherrank + (otherrank==0)
  # Send a message to the partner
  mpi.send.Robj(paste("I am rank",myrank), dest=otherrank, tag=myrank)
  # Receive the message & tag (includes source)
  recv.msg <- mpi.recv.Robj(mpi.any.source(),mpi.any.tag())
  recv.tag <- mpi.get.sourcetag()
  paste("Received message '",recv.msg,"' from process ",recv.tag[1],".\n",sep="")
}

69 R in parallel: Rmpi and snow
snow, Simple Network of Workstations, for embarrassingly parallel jobs.
Simple interface that can be built on top of Rmpi or sockets (we use Rmpi).
Functionality: clusterCall, clusterApply, clusterSplit, parRapply, parCapply, parLapply, parSapply, parMM.
snow workflow:
  Initialize the cluster with a call to makeMPIcluster.
  Do work and execute functions on all nodes by using snow commands.
  Terminate the MPI environment with the stopCluster command.

70 R in parallel: Rmpi and snow
> ls(pos = "package:snow")
 [1] "[.cluster"                     "addClusterOptions"
 [3] "checkCluster"                  "checkForRemoteErrors"
 [5] "closeNode"                     "closeNode.default"
 [7] "closeNode.NWSnode"             "closeNode.SOCKnode"
 [9] "clusterApply"                  "clusterApplyLB"
[11] "clusterCall"                   "clusterEvalQ"
[13] "clusterExport"                 "clusterMap"
[15] "clusterSetupRNG"               "clusterSetupRNGstream"
[17] "clusterSetupSPRNG"             "clusterSplit"
[19] "defaultClusterOptions"         "docall"
...
[65] "sinkWorkerOutput"              "slaveLoop"
[67] "snow.time"                     "splitCols"
[69] "splitIndices"                  "splitList"
[71] "splitRows"                     "staticClusterApply"
[73] "stopCluster"                   "stopCluster.default"
[75] "stopCluster.MPIcluster"        "stopCluster.NWScluster"
[77] "stopCluster.spawnedMPIcluster" "stopNode"

71 R in parallel: Rmpi and snow
snow example:
cl <- makeSOCKcluster(c("localhost","localhost"))
clusterApply(cl, 1:2, get("+"), 3)
clusterEvalQ(cl, library(boot))
x<-1; clusterExport(cl, "x")
clusterCall(cl, function(y) x + y, 2)

library(Rmpi); library(snow)
cluster <- makeMPIcluster(4)
sayhello <- function() {
  info <- Sys.info()[c("nodename", "machine")]
  paste("Hello from", info[1], "with CPU type", info[2])
}
names <- clusterCall(cluster, sayhello)
print(unlist(names))
parallelSum <- function(m, n){
  A <- matrix(rnorm(m*n), nrow = m, ncol = n)
  row.sums <- parApply(cluster, A, 1, sum)
  print(sum(row.sums))
}
parallelSum(500, 500)
stopCluster(cluster)

72 R in parallel: Rmpi and snow
library(snow); library(Rmpi)
nclus=6; cl <- makeMPIcluster(nclus)
n=2000; mc=
mcperclus=round(mc/nclus)
x=matrix(runif(n),n,1)
x=cbind(1,x)
# function of 2xn matrix x, n, nmc: number of columns per core
estimatebetas=function(x,n,nmc){
  e=matrix(rnorm(n*nmc),n,nmc)
  y=1+rep(x[,2],nmc)+e
  solve(t(x)%*%x)%*%t(x)%*%y
}
ptim=proc.time()[3]
b=clusterCall(cl,estimatebetas,x=x,n=n,nmc=mcperclus)
b=cbind(b[[1]],b[[2]],b[[3]],b[[4]],b[[5]],b[[6]])
tim=proc.time()[3]-ptim
cat(dim(b)," ",apply(b,1,mean),"\n")
cat(mc," iterations with sample sizes of ",n," took ",tim,"seconds.\n")
stopCluster(cl)

73 R in parallel: Rmpi svd, initialization
library(Matrix)
library(irlba)
library("Rmpi")
nslvs <- 15
mpi.spawn.Rslaves(nslaves=nslvs)
if (mpi.comm.size() < 2) {
  print("More slave processes are required.")
  mpi.quit()
}
.Last <- function(){
  if (is.loaded("mpi_initialize")){
    if (mpi.comm.size(1) > 0){
      print("Please use mpi.close.Rslaves() to close slaves.")
      mpi.close.Rslaves()
    }
    print("Please use mpi.quit() to quit R")
    .Call("mpi_finalize")
  }
}
# Assumes: thedata, n, num.sv

74 R in parallel: Rmpi svd data
n=500            # dim of matrices
num.sv = n/2     # number of singular values to find
m = 20           # number of matrices
set.seed(1234)
rand1 <- runif(m*n); rand2 <- runif(m*n)
x1 <- apply(cbind(rand1,rand2),1,min)
x2 <- abs(rand1-rand2)
x3 <- 1-x1-x2
# Sample 3 column indices from vector 1:n for each set of weights (m*n)
col.ind.matrix <- sapply(1:(m*n),function(i) sample(1:n,size=3))
col.ind.asvector <- matrix(col.ind.matrix,nrow=(3*m*n))
row.ind.matrix <- sapply(1:n,function(x) rep(x,3))   # The row indices for one matrix (dim n x 1)
row.ind.asvector <- matrix(row.ind.matrix,nrow=(3*n))
weights.asvector <- matrix(rbind(x1,x2,x3),nrow=(3*m*n))
thedata = data.frame(weights=weights.asvector,row=row.ind.asvector,col=col.ind.asvector)

75 R in parallel: Rmpi svd, slave function
partialsvd <- function() {
  # tags sent: 1=ready_for_task, 2=done_task, 3=exiting; received: 1=task, 2=done_tasks
  junk <- 0; done <- 0
  while (done != 1) {
    # Signal being ready to receive a new task
    mpi.send.Robj(junk,0,1)
    task <- mpi.recv.Robj(mpi.any.source(),mpi.any.tag())   # Receive a task
    task_info <- mpi.get.sourcetag()
    tag <- task_info[2]
    if (tag == 1) {
      matrixNumber <- task$matrixNumber
      # Construct the corresponding sparse matrix
      col.ind.temp = thedata$col[(3*n*(matrixNumber-1)+1):(3*n*matrixNumber)]
      weight.ind.temp = thedata$weights[(3*n*(matrixNumber-1)+1):(3*n*matrixNumber)]
      # Generate row indices
      row.ind.temp = matrix(sapply(1:n,function(x) rep(x,3)),nrow=(3*n))
      weight.mat = sparseMatrix(i=row.ind.temp,j=col.ind.temp,x=weight.ind.temp)
      result.svd = irlba(weight.mat,nu=num.sv)
      # Send a results message back to the master
      results <- list(result=result.svd,matrixNumber=matrixNumber)
      mpi.send.Robj(results,0,2)
    } else if (tag == 2) {
      done <- 1
    }
    # We'll just ignore any unknown messages
  }
  mpi.send.Robj(junk,0,3)
}

76 R in parallel: Rmpi svd start slaves
mpi.bcast.Robj2slave(thedata)
mpi.bcast.Robj2slave(n)
mpi.bcast.Robj2slave(num.sv)
mpi.bcast.cmd(library("Matrix"))   # for using sparse matrices
mpi.bcast.cmd(library("irlba"))    # for fast computation of singular values
mpi.bcast.Robj2slave(partialsvd)   # Send the function to the slaves
mpi.bcast.cmd(partialsvd())        # Call the function in all the slaves
# Create task list
tasks <- vector('list')
for (i in 1:m) { tasks[[i]] <- list(matrixNumber=i) }
# Create data structures to store the results
svd.result = list()
weightMat = list()
junk <- 0
closed_slaves <- 0
n_slaves <- mpi.comm.size()-1
t1.mpi <- proc.time()

77 R in parallel: Rmpi svd send tasks
while (closed_slaves < n_slaves) {
  # Receive a message from a slave
  message <- mpi.recv.Robj(mpi.any.source(),mpi.any.tag())
  message_info <- mpi.get.sourcetag()
  slave_id <- message_info[1]; tag <- message_info[2]
  if (tag == 1) {
    # Slave is ready for a task. Give it the next task, or tell it the tasks are done.
    if (length(tasks) > 0) {
      # Send a task, and then remove it from the task list
      mpi.send.Robj(tasks[[1]], slave_id, 1); tasks[[1]] <- NULL
    } else {
      mpi.send.Robj(junk, slave_id, 2)
    }
  } else if (tag == 2) {
    # Store the results in the data structure
    matrixNumber <- message$matrixNumber
    svd.result[[matrixNumber]] <- message$result
  } else if (tag == 3) {
    closed_slaves <- closed_slaves + 1
  }
}

78 R in parallel: Rmpi svd results
mpi.close.Rslaves()
t.mpi <- proc.time()-t1.mpi   # system time for the MPI implementation
# Extract the singular values
singular.vals = sapply(svd.result,with,d)
# Plot the approximate singular values found for the first 5 matrices
x = 1:(n/2)
matrixNumber = 1
plot(x=x,y=singular.vals[,1],ylab="approx. singular values",
     main=paste(n/2,"largest approximate singular values of matrices 1 to 5"))
for (matrixNumber in 2:5){
  points(x=singular.vals[,matrixNumber],col=matrixNumber)
}

79 R in parallel: Rmpi slurm batch job
/scratch/users/zyzhang/R-runs/Tutorial/Rmpi]$ more slurm.Rmpi
#!/bin/bash
#SBATCH --job-name=Rmpi
#SBATCH --output=Rmpi.o
#SBATCH --error=Rmpi.e
#SBATCH --time=4:00:00
#SBATCH --qos=normal
##SBATCH --exclusive
#SBATCH --mem=60000
##SBATCH -p normal
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --reservation=R_workshop
##SBATCH --export=All
module purge
ml load R/3.2.5.intel.tcltk
mpirun -np 1 R CMD BATCH test1.R
/scratch/users/zyzhang/R-runs/Tutorial/Rmpi]$ sbatch slurm.Rmpi

80 R in parallel: Rmpi example
31 slaves are spawned successfully. 0 failed.
master  (rank 0,  comm 1) of size 32 is running on: sh-5-11
slave1  (rank 1,  comm 1) of size 32 is running on: sh-5-11
slave2  (rank 2,  comm 1) of size 32 is running on: sh-5-11
slave3  (rank 3,  comm 1) of size 32 is running on: sh-5-11
...
slave30 (rank 30, comm 1) of size 32 is running on: sh-5-13
slave31 (rank 31, comm 1) of size 32 is running on: sh-5-13
> proc.time()
   user  system elapsed

81 Managing software packages: Modules
export MODULEPATH=/share/sw/modules/all:$MODULEPATH
export MODULEPATH=/scratch/users/zyzhang/sw/modules:$MODULEPATH
module load / module list / module avail / module purge / module replace / module show
module show allinea
/share/sw/modules/Core/allinea/5.0:
whatis("latest version of Allinea debugging tools ")
prepend_path("PATH","/share/sw/licensed/allinea/forge-5.0/bin")
prepend_path("PATH","/share/sw/licensed/allinea/reports-5.0/bin")
prepend_path("LD_LIBRARY_PATH","/share/sw/licensed/allinea/forge-5.0/lib")
prepend_path("LD_LIBRARY_PATH","/share/sw/licensed/allinea/reports-5.0/lib")
help([[ Allinea Unified (DDT, MAP) and reports
ddt, map
check out the doc directory for the user guide ]])

82 Installation examples: gdal for R/gromacs
Installation of gdal, required by rgdal for R:
module load intel/13sp1up1
cd /scratch/users/zyzhang
svn checkout gdal
cd gdal
which icc
/share/sw/licensed/intel/composer_xe_2013_sp /bin/intel64/icc
./configure --prefix=/scratch/users/zyzhang/gdal/ CC=icc CXX=icpc
make >& make.log &
make install >& makeinstall.log &
echo $MODULEPATH
/scratch/users/zyzhang/sw/modules:/share/sw/modules/all:/share/sw/modules/Core

83 R installation: an example
module load R/3.1.0.m gdal/1.11 geos/3.4.2 proj.4/4.9.1
-bash-4.1$ R
> require(maptools)
Loading required package: maptools
Checking rgeos availability: TRUE
> require(rgdal)
Loading required package: rgdal
rgdal: version: 0.9-3, (SVN revision 530)
Geospatial Data Abstraction Library extensions to R successfully loaded
Loaded GDAL runtime: GDAL , released 2015/02/10
Path to GDAL shared files: /hsgs/software/gdal/1.11.2/share/gdal
Loaded PROJ.4 runtime: Rel , 04 March 2015, [PJ_VERSION: 491]
Path to PROJ.4 shared files: (autodetected)
Linking to sp version: 1.1-0
> install.packages("randomForest", repos = getOption("repos"))

