Presentation is loading. Please wait.

Presentation is loading. Please wait.

SDM Center Advancing, integrating and deploying efficient statistical computing to high- throughput scientific applications Nagiza F. Samatova (ORNL) Srikanth.

Similar presentations


Presentation on theme: "SDM Center Advancing, integrating and deploying efficient statistical computing to high- throughput scientific applications Nagiza F. Samatova (ORNL) Srikanth."— Presentation transcript:

1 SDM Center Advancing, integrating and deploying efficient statistical computing to high- throughput scientific applications Nagiza F. Samatova (ORNL) Srikanth Yoginath (ORNL) Guruprasad Kora (ORNL) Chongle Pan (UTK/ORNL) SDM NC State October 5-7, 2005 Contact: Nagiza Samatova,

2 SDM Center Outline About Parallel R (pR) Motivation About R and its parallelization efforts Task and data parallelism with Parallel R (pR) Initial/Future Deployment across Applications GIS data analysis with GRASS and pR Clustered Climate Regimes using pR Quantitative Proteomics in Biology using pR High Energy Physics (HEP) with ROOT, R and pR Towards SDM Components Integration Kepler scenario with pR CCA scenario with pR Web Services scenario with pR Future Extensions Generic Parallel Agent for ease of pR extension Task Parallel Engine optimized for data-intensive analyses

3 SDM Center Tera-(Flop & Byte) Analyses Could Be Routine for Scientific Applications But… Hits Algorithmic Complexity: Calculate means O(n) Calculate FFT O(n log(n)) Calculate PCA O(r c) Hierarchical clust. O(n 2 ) Algorithmic Complexity: Calculate means O(n) Calculate FFT O(n log(n)) Calculate PCA O(r c) Hierarchical clust. O(n 2 ) Climate Now: 20-40TB per simulated year 5 yrs: 100TB/yr 5-10PB/yr Astrophysics Now and 5 yrs: Can soak up anything! Fusion Now: 100Mbytes/15min 5 yrs: 1000Mbytes/2 min 1Tflop/sec

4 SDM Center Statistical Computing with R About R (http://www.r-project.org/):http://www.r-project.org/ R is an Open Source (GPL), most widely used programming environment for statistical analysis and graphics; similar to S. Provides good support for both users and developers. Highly extensible via dynamically loadable add-on packages. Originally developed by Robert Gentleman and Ross Ihaka. Towards Enabling Parallel Computing in R: > … > dyn.load( “foo.so”) >.C( “foobar” ) > dyn.unload( “foo.so” ) > … > dyn.load( “foo.so”) >.C( “foobar” ) > dyn.unload( “foo.so” ) > library(mva) > pca <- prcomp(data) > summary(pca) > library(mva) > pca <- prcomp(data) > summary(pca) Rmpi (Hao Yu): R interface to LAM-MPI. rpvm (Na Li and Tony Rossini): R interface to PVM; requires knowledge of parallel programming. snow (Luke Tierney): general API on top of message passing routines to provide high-level (parallel apply) commands; mostly demonstrated for embarrassingly parallel applications. > library (rpvm) >.PVM.start.pvmd () >.PVM.addhosts (...) >.PVM.config () > library (rpvm) >.PVM.start.pvmd () >.PVM.addhosts (...) >.PVM.config () snow API

5 SDM Center Motivation behind Parallel R (pR) Ideal Programming Requirements: Be able to use existing high level (i.e. R) code Require minimal extra efforts for parallelizing Have Identical/similar (presumably easy-to-use) interface to R’s Be able to test codes in sequential settings Provide efficient and scalable (in terms of problem size and number of processors) performance

6 SDM Center Providing Task and Data Parallelism in pR Likelihood Maximization. Re-sampling schemes: Bootstrap, Jackknife, etc. Animations Markov Chain Monte Carlo (MCMC). – Multiple chains. – Simulated Tempering: running parallel chains at different “temperature“ to improve mixing. Task-parallel analyses: k-means clustering Principal Component Analysis (PCA) Hierarchical (model-based) clustering Distance matrix, histogram, etc. computations Data-parallel analyses:

7 SDM Center Parallel R (pR) Distribution Releases History: pR enables both data and task parallelism (includes task-pR and RScaLAPACK) (version 1.8.1) RScaLAPACK provides R interface to ScaLAPACK with its scalability in terms of problem size and number of processors using data parallelism (release 0.5.1) task-pR achieves parallelism by performing out-of- order execution of tasks. With its intelligent scheduling mechanism it attains significant gain in execution times (release 0.2.7) pMatrix provides a parallel platform to perform major matrix operations in parallel using ScaLAPACK and PBLAS Level II & III routines Also: Available for download from R’s CRAN web site (www.R-Project.org) with 37 mirror sites in 20 countrieswww.R-Project.org

8 SDM Center About GRASS (grass.itc.it) GRASS (Geographic Resources Analysis Support System) is a raster/vector GIS, image processing system, and graphics production. GRASS contains over 350 programs and tools to render maps and images on monitor and paper; manipulate raster, vector, and sites data; process multi spectral image data; create, manage, and store spatial data. It is Free (Libre) Software/Open Source released under GNU GPL. About GRASS (grass.itc.it) GRASS (Geographic Resources Analysis Support System) is a raster/vector GIS, image processing system, and graphics production. GRASS contains over 350 programs and tools to render maps and images on monitor and paper; manipulate raster, vector, and sites data; process multi spectral image data; create, manage, and store spatial data. It is Free (Libre) Software/Open Source released under GNU GPL.

9 SDM Center

10 SDM Center Subtract background noise from data Generate Covariance Chromatogram Apply Savitzky-Golay Smoother Calculate cut-off for search Find Window with Max. SN ratio …..

11 SDM Center Ratio Calculations with Parallel R ::::::::::: chroList<-list.files(pattern="*.chro"); cat ("Chro", "samSN", "refSN", "PPCSN", "HR", "PCA", "PCASN", file="Pratio-Peptide.txt"); for (i in 1:length(chroList)) { currResult [i]  Rratio(filename=chroList[i]); } for (i in 1:length(chroList)) { cat (chroList[i], currResult$samSN, … file="Pratio-Peptide.txt"); } ::::::::::: chroList<-list.files(pattern="*.chro"); cat ("Chro", "samSN", "refSN", "PPCSN", "HR", "PCA", "PCASN", file="Pratio-Peptide.txt"); for (i in 1:length(chroList)) { currResult [i]  Rratio(filename=chroList[i]); } for (i in 1:length(chroList)) { cat (chroList[i], currResult$samSN, … file="Pratio-Peptide.txt"); } ::::::::::: Serial Version: ::::::: chroList<-list.files(pattern="*.chro"); cat ("Chro", "samSN", "refSN", "PPCSN", "HR", "PCA", "PCASN", file="Pratio-Peptide.txt"); PE ( for (i in 1:length(chroList)) { currResult [i]  Pratio(filename=chroList[i]); } ) for (i in 1:length(chroList)) { cat (chroList[i], currResult$samSN, … file="Pratio-Peptide.txt"); } ::::::::::::: ::::::: chroList<-list.files(pattern="*.chro"); cat ("Chro", "samSN", "refSN", "PPCSN", "HR", "PCA", "PCASN", file="Pratio-Peptide.txt"); PE ( for (i in 1:length(chroList)) { currResult [i]  Pratio(filename=chroList[i]); } ) for (i in 1:length(chroList)) { cat (chroList[i], currResult$samSN, … file="Pratio-Peptide.txt"); } ::::::::::::: Parallel Version:

12 SDM Center Initial Feedback is Positive!

13 SDM Center R in High Energy Physics (HEP, Root) (Adam Lyon,Fermi Lab) library(lattice) d=read.table("data.dat",head=T) w=data[data$event=="OpenFile",] w$min = w$dur/60.0 bwPlot( fromStation ~ min | station, data=w, subset=(min<60), xlab= "Minutes",main="Wait Time for …" ) modpollux=nls(speed ~ alpha*(1- alpha*beta/(alpha*bet a+mb)), data= client[pollux,], start=c(alpha=2.0, beta=0.50), trace=T)

14 SDM Center R in High Energy Physics (HEP, Root) In HEP, we typically run successive skims to reduce the data size (601 TB down to 100s of Meg or a few Gig) R seems to want to hold everything in memory If data is small enough, bring it into R If can reduce data to something R can hold, bring that subset of data into R -- have the full power of R If can't even do above, then have some R apparatus to read in data one row at a time and update an R object (e.g. histograms) [But you don't get the full power of R] Rprof significantly sped up fit (replace ^) Lessons learned: We explore using R, a statistical analysis package from the statistics community, in an HEP enviornment R has already proven useful for analyzing monitoring and benchmarking data We have ideas on how R can be used to read large datasets We've done some "proof of principle" studies of physics analysis with R As we learn more about R, we expect to be more surprised at its capabilities (Adam Lyon,Fermi Lab)

15 SDM Center Options for R and Root Interfacing no interest from R community in non-I/O functions of Root In order of work required : 1) R and Root remain separate-- use the more appropriate tool for the task. Use text files to communicate between the two if necessary. 2) Root loads R's math and low level statistical libraries as shared objects Minimalist approach for some functionality Some access to the math and statistics C code functions from R These C functions take basic C types, so no translation necessary But: no upper level functions written in the R language available 3) R and Root remain separate, but: R package to read Root Trees directly into R data frames. Still use best tool for particular task Now easier to get HEP data into R 4) Allow calling of selected high level R functions from within Root Root runs the R interpreter translation is necessary R functions: understand Root objects Root: understand R return objects Expose only some R functions may reduce amount of translation (Adam Lyon,Fermi Lab)

16 SDM Center More Advanced Integration Options 5) R prompt from the Root prompt R needs seamless knowledge of objects in current Root session At end of R session, new R variables translated into Root objects Root runs the R interpreter Translation for all types of Root variables into R and all types of R variables returned to Root. A major undertaking 6) Root prompt from within R Harder than 5: R is C but Root is C++ I don't see much interest in this Things get interesting starting at 3) I have a version prototype for reading Root trees into R. Required for all options above 3. I’ll try to work on this as time permits Both Root and R interface to Python Translate with Python as intermediary? Not sure if that's performant enough (Adam Lyon,Fermi Lab)

17 SDM Center KEPLER/R/pR Integration RScript “X  read.table(DataFileName) library(RScaLAPACK) a  sva.prcomp(X) summary(a) plot(a$loadings) RScript “X  read.table(DataFileName) library(RScaLAPACK) a  sva.prcomp(X) summary(a) plot(a$loadings) RExpression actor by Dan Higgins, Kepler/SEEK … process=runtime.exec(Command); System.out.println(process) … process=runtime.exec(Command); System.out.println(process) … Kepler has a mathematical operations actor RExpression to virtually access/plot any function/graphs from R/pR.

18 SDM Center CCA-based Component Integration Might be used if there is a need: To effectively develop/deploy/reuse parallel R (pR). To expose the power of pR and R to the outside world. To easily access task and data parallel modules of pR by applications written in different languages. To integrate pR modules with other technologies of SDM center (e.g. SciRUN2). To provide a usage model that hides implementation details. To have plug-n-play flexibility in deployment. CCA Common Component Architecture

19 SDM Center algorithms.sidl driver.sidl Babel Compiler Driver_SVD_Impl SVDFunction_Impl libsvd.so Babel Runtime R Library 1.Define component’s public interface in SIDL. 2.Generate native language stub code using Babel 3.Add in function and driver implementation. 4.Compile and link with Babel and R libraries runtime libraries. Four Easy Steps: Componentizing Singular Value Decomposition function of pR/R Lots of THANKS to Jim Kohl & Wael Elwasif for their help!!!

20 SDM Center algorithms.sidl driver.sidl package algorithms version 1.0 { class SVDFunction implements function.FunctionPort, gov.cca.Component { void rsvd( in int iSize ); // gov.cca.Component methods: void setServices(in gov.cca.Services servicesHandle) throws gov.cca.CCAException; } package drivers version 1.0 { class SVDDriver implements gov.cca.ports.GoPort, gov.cca.Component { int go(); void setServices(in gov.cca.Services services) throws gov.cca.CCAException; } 1. Create SVD’s interface in SIDL

21 SDM Center Babel Compiler babel –s C algorithms.sidl babel –s C driver.sidl …/* Driver_SVD_Impl */ struct algorithms_SVD__data *psvd; /* Access private data structure */ pd = algorithms_SVD__get_data(self); … algorithms_SVDPort function; /* Get the function pointer. */ … iReturn = function->rsvd(i); … 2-3. Generate Native Language Stub Code using Babel & Add Implementations.../* SVDFunction - rsvd() */ /* postscript() */ PROTECT(fun = Rf_findFun(Rf_install("postscript"), R_GlobalEnv)); eval(e, R_GlobalEnv);... /* svd(rnorm(iSize),1,1) */ fun = Rf_findFun(Rf_install("svd"), R_GlobalEnv);... /* svd( 1:10, 1, 1 )*/ ret = eval(e, R_GlobalEnv);... fun = Rf_findFun(Rf_install("plot"), R_GlobalEnv);... /* plot( svd$u )*/ eval(e, R_GlobalEnv); No R scripts – Direct C  Efficiency!

22 SDM Center SVDFunction SVDPort GoPort Driver Ccaffeine Ready for Deployment of R’s SVD!

23 SDM Center Web Service based Integration of pR Request- Response ports defined by WSDL Task-Parallel job Parallel-solve Parallel-kmeans Parallel-R [pR] Service Task-pR RScaLAPACK Other parallel libraries through Generic PA Internet Custom or Native R Interface Biology Custom or Native R Interface Climate Custom or Native R Interface Fusion Might be used if there is a need: To provide easy access to HP parallel statistical computations. To make pR a loosely coupled component that can be dynamically composed. To initiate & control parallel analyses from a less powerful machine.

24 SDM Center SVD.wsdl Java Compiler SVDService_impl libsvdServer.so Axis Runtime Library R Library 1.Create SVD’s interface in WSDL 2.Develop and deploy the SVD Service Object (libsvdServer.so). 3.Develop the SVD Client Interface (libSVDClient.so). Three Major Stages: Enabling Parallel R as a Web Service using Apache-AxisC++: SVD Example libsvdClient.so RSVDInterface_Impl abccc

25 SDM Center - - - 1. Create SVD’s interface in WSDL

26 SDM Center java org.apache.axis.wsdl.wsdl2ws.WSDL2Ws -lc++ -sserver –oserver./SVD.wsdl gcc –shared –o libsvdServer.so *.cpp -lR -L$AXISCPP_DEPLOY/lib SVDService.cpp … xsd__double SVDService: :GetSVD(xsd__double Value[]) { ……. /* svd(rnorm(iSize),1,1) */ fun = Rf_findFun(Rf_install("svd"), R_GlobalEnv);... /* svd( 1:10, 1, 1 )*/ ret = eval(e, R_GlobalEnv);... return (svd$d) } AxisServiceException.cpp AxisServiceException.h SVDService.h Server related cpp and corresponding header files libsvdServer.so SVD.wsdl 2. Develop the SVD Service Object Comments: WSDL file is processed to create cpp files. The cpp file functions are modified to compute R svd on the input data. A shared library libsvdServer.so is built using gcc. This library is used by Apache-axis to service the user’s request. abcc

27 SDM Center java org.apache.axis.wsdl.wsdl2ws.WSDL2Ws -lc++ -sclient –oclient./SVD.wsdl gcc –shared –o libsvdClient.so *.cpp -lR -L$AXISCPP_DEPLOY/lib RSVDInterface.cpp …. SEXP RSVDInterface(SEXP InputVector) { SVDService srv; SEXP retval; int size; …. PROTECT(allocVector(REALSXP,size); …. retVal =srv.GetSVD(InputVector) ; UNPROTECT(1); return retVal; } SVDService.h Server related cpp and corresponding header files libsvdClient.so SVD.wsdl 2. Develop the SVD Client Interface Comments: WSDL file is processed to create cpp files A RSVDInterface.cpp file is created to get the input and service detail from R and request for required service. A shared library libsvdClient.so is built using gcc. This library is loaded into R environment. The service is requested through R interface. abcc SVDService.cpp

28 SDM Center pR Service: Above XML statements are added in axis “server.wsdd” file. It specifies axis to deploy a service called SVDService and accept calls for method GetSVD R Client: Meanwhile the R user Loads the libSVDClient.so into his environment Calls the higher-level R-function to access the SVDService. N-tier Architecture for pR-Service and R-Client Interaction libSVDClient.so > dyn.load(libSVDClient.so) > RSVDInterface(x) R-Client Environment Axis-WebService Module libSVDServer.so APACHE Web Server Results 4 InputVector pR Service

29 SDM Center Extensibility of Parallel R (pR) R Environment R Environment Parallel Agent Parallel Agent Third Party Parallel Codes ScaLAPACK Matrix Parallel k-means Alok’s Data Mining RScaLAPACK pMatrix pAlok Define R function parameters & returns Map R functions to defined function interfaces Define the function interfaces Set parallel environment limits for your functions Define data distribution function (Optional) Convert your MPI/PVM routine(s) into a set of functions. Create a shared library of your functions. Place it in a predefined location. C/Fortran MPI Robject

30 SDM Center Generic Parallel Agent for Ease of Extension Features: It is an interface that allows third party MPI routines to be called from within ParallelR environment. Users can readily plug-in and use their MPI codes along with ParallelR provided routines. It is MPI implementation independent. No knowledge of R or ParallelR is required to get MPI codes included in to ParallelR. A simple XML schema for generating interface definitions. Different classes of MPI routines can be included in to the ParallelR environment. Status: Completed the design phase Goal: To provide a framework to easily add different parallel MPI algorithms to the R-environment.

31 SDM Center Task Parallel Engine Optimized for Data-Intensive Analyses Tasks Jointly with Dr. Xiaosong Ma (NC State) and her student, Jain Li Current Progress: Research phase, semi-complete design document Goal: To provide a generic task-parallel engine that would be capable of optimizied tasks scheduling, monitoring, and resource mapping.

32 SDM Center Parallelization and Scheduling Parallelize independent tasks – task parallelism Task itself can be parallelized – data parallelism Overall goal – Given a system, minimize parallel R’s execution time Task 1 Task 2 Task 3 Task 4 Task 5

33 SDM Center Problems and Challenges To detect independent tasks Data dependency analysis (NP-hard) To schedule independent tasks Granularity Load balance Data locality To perform resource allocation Data locality Minimize communications or I/O master worker 2 worker 1 worker n assign collect comm Scheduling & Resource allocation problem (NP-hard): Given a sequence of tasks t 1, t 2, …, t n, each task has data size d i and requires c i computation time and may require more than one processor. There is network latency l i to transfer data size d i and overhead o i to initialize the transfer. Our objective is to minimize total execution time of tasks sequence while preserving dependence relations and subject to P processors available.

34 SDM Center Integrated Data & Task Parallelism To parallelize a single task, use third-party package (e.g., SUIF, Polaris) Task scheduling subject to limited resources, e.g., worker processors Need to estimate task runtime Performance database Runtime performance monitoring


Download ppt "SDM Center Advancing, integrating and deploying efficient statistical computing to high- throughput scientific applications Nagiza F. Samatova (ORNL) Srikanth."

Similar presentations


Ads by Google