Scott Michael Indiana University July 6, 2017

1 Performance Benchmarking of the R Programming Environment on Knights Landing

2 Who am I?
- Theoretical astrophysicist, NOT a statistician
- HPC application optimization and performance tuning
- Lead the Research Analytics team in Research Technologies at Indiana University

3 Contributors
- IU: Eric Wernert, Jefferson Davis, James McCombs, Esen Tuna
- TACC: Bill Barth, Tommy Minyard, David Walling

4 Talk Overview
Targeting productivity languages for Xeon Phi-based architectures:
- Motivation and history
- Benchmark results and lessons learned
- The RHPCBenchmark package
- Future directions
- Conclusions

5 IU, the Stampede Supercomputer, and Xeon Phi
- IU Research Technologies partners with TACC, collaborating on systems and support:
  - Stampede: the largest XSEDE machine by core count
  - Wrangler: data-intensive computing and 20 PB of out-of-region replication
  - Jetstream: the XSEDE production science cloud
- IU supports data-intensive and "high productivity" languages on Stampede, including R, Python, and MATLAB
- Large transition between Stampede 1 and Stampede 2

6 Evolution of Xeon Phi
- Knights Corner: coprocessor only; 1 TF peak (DP); 8GB device memory + system memory
- Knights Landing: coprocessor or self-hosted; 3 TF peak (DP); 16GB MCDRAM + system memory

7 R Support on Stampede 1 & 2
Primary support for R on Stampede 1:
- Support for several methods of distributed R (pbdR, Rmpi, snow, etc.)
- R built in offload mode
- R configured to use GPUs in a portion of Stampede via HiPLAR
- However, much of the R workload on Stampede didn't rely on KNC

Stampede 1: 6,400 nodes; FDR InfiniBand interconnect; 14 PB Lustre filesystem; each node has dual E… "Sandy Bridge" processors (32GB DDR3) plus a Xeon Phi SE10P (8GB GDDR5)
Stampede 2: 4,200 nodes; Omni-Path v1 interconnect; each node has a Xeon Phi 7250 (16GB MCDRAM + 96GB DDR4)

8 R Performance on KNL
- KNL is the sole processor on Stampede 2
- It has shown good performance for large-scale HPC codes (MD, climate, astrophysics, etc.)
- How does KNL perform with a language like R?

9 KNL Architecture
Intel(R) Xeon Phi(TM) CPU, 1.60GHz (68 physical cores)
Features of note for KNL:
- Tiled architecture supporting 4 SMT threads per physical core (272 hardware threads in total)
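
Because R leans on MKL for threading, the choice between 1 and 4 threads per core is made through environment variables rather than in R code. The sketch below, in Python standing in for a job script, builds those settings for the 68-core, 4-way SMT layout on the slide. MKL_NUM_THREADS, OMP_NUM_THREADS, and KMP_AFFINITY are real Intel MKL/OpenMP variables, but the helper names and the choice of `scatter` pinning are assumptions, not the settings used in the talk.

```python
import os

# Assumed KNL topology, taken from the slide: 68 physical cores,
# 4 hardware (SMT) threads per core.
PHYSICAL_CORES = 68
SMT_PER_CORE = 4


def mkl_thread_env(threads_per_core: int, cores: int = PHYSICAL_CORES) -> dict:
    """Hypothetical helper: thread settings for one MKL-backed R run."""
    if not 1 <= threads_per_core <= SMT_PER_CORE:
        raise ValueError("threads_per_core must be between 1 and 4")
    n_threads = cores * threads_per_core
    return {
        # Standard Intel MKL and OpenMP thread-count variables.
        "MKL_NUM_THREADS": str(n_threads),
        "OMP_NUM_THREADS": str(n_threads),
        # Intel OpenMP pinning: 'scatter' spreads threads as evenly as
        # possible across cores; 'granularity=fine' binds each thread
        # to a single hardware thread.
        "KMP_AFFINITY": "granularity=fine,scatter",
    }


def run_env(threads_per_core: int) -> dict:
    """Copy of the current environment with the thread settings applied."""
    env = dict(os.environ)
    env.update(mkl_thread_env(threads_per_core))
    return env


if __name__ == "__main__":
    for tpc in range(1, SMT_PER_CORE + 1):
        print(tpc, mkl_thread_env(tpc)["MKL_NUM_THREADS"])
```

The loop prints thread counts of 68, 136, 204, and 272. A run would then pass `env=run_env(1)` to `subprocess.run(["Rscript", "bench.R"], ...)`, where `bench.R` is a hypothetical script; the settings only take effect if R is linked against MKL.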

10 KNL Architecture (cont.)
Features of note for KNL:
- 16GB of on-package MCDRAM acts as fast memory and can be configured in several modes (cache, flat, or hybrid)
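
In cache mode MCDRAM is transparent, but in flat or hybrid mode a process has to be told to allocate from it. A common way to do that is `numactl`. The Python sketch below builds such a launch line for `Rscript`. It assumes quadrant clustering, where addressable MCDRAM usually appears as NUMA node 1 (verify with `numactl -H`); the helper name and the `strict` switch are hypothetical.

```python
import shlex

# KNL MCDRAM can be configured as a cache, as flat addressable memory,
# or as a hybrid split of the two (selected in the BIOS at boot time).
MCDRAM_MODES = ("cache", "flat", "hybrid")

# Assumption: with quadrant clustering, addressable MCDRAM appears as NUMA
# node 1 and DDR4 as node 0. Check `numactl -H` on the actual node.
MCDRAM_NODE = 1


def launch_command(script: str, mode: str, strict: bool = False) -> list:
    """Hypothetical helper: build an Rscript command line for an MCDRAM mode."""
    if mode not in MCDRAM_MODES:
        raise ValueError(f"unknown MCDRAM mode: {mode!r}")
    cmd = ["Rscript", script]
    if mode == "cache":
        # MCDRAM acts as a transparent cache in front of DDR4: nothing to do.
        return cmd
    # Flat/hybrid: steer allocations to the MCDRAM NUMA node.
    # --preferred falls back to DDR4 when MCDRAM is full;
    # --membind restricts allocations to MCDRAM only.
    policy = f"--membind={MCDRAM_NODE}" if strict else f"--preferred={MCDRAM_NODE}"
    return ["numactl", policy, *cmd]


if __name__ == "__main__":
    print(shlex.join(launch_command("bench.R", "flat")))
```

With the hypothetical script `bench.R`, this prints `numactl --preferred=1 Rscript bench.R`. `--preferred` is the gentler choice for problem sizes that may exceed 16GB.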

11 Benchmarking Strategy
- Run industry-standard performance benchmarks for R on KNL and compare the results to SNB
- Further explore some exemplar workflows in each language and compare them to the benchmark results
- Compare both single-node and multinode benchmarks

12 Benchmarking Strategy (cont.)
- R standard benchmark: R-25
  - Very old, with fixed (small) problem sizes; report output is challenging to parse
  - Reasonable mix of mini-kernels focused on dense matrix operations and linear solvers
- R benchmark for scalability, focused on kernels similar to R-25
  - Built for distribution and flexibility; currently available on CRAN as RHPCBenchmark
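
A scalable benchmark mainly differs from a fixed-size one in how it times kernels: it uses configurable sizes, repeated trials, and summary statistics that are easy to parse. Below is a minimal Python sketch of that timing loop. In R the equivalent building block would be `system.time()`. RHPCBenchmark's own internals may differ, and the function name and defaults here are assumptions.

```python
import statistics
import time
from typing import Callable, Dict


def time_kernel(kernel: Callable[[], object], trials: int = 5,
                warmup: int = 1) -> Dict[str, float]:
    """Run `kernel` `warmup` times untimed, then `trials` timed runs.

    Returns min, median, and max wall-clock seconds.
    """
    if trials < 1:
        raise ValueError("trials must be at least 1")
    for _ in range(warmup):
        kernel()  # warm caches, allocators, and any lazy initialization
    times = []
    for _ in range(trials):
        start = time.perf_counter()
        kernel()
        times.append(time.perf_counter() - start)
    return {"min": min(times),
            "median": statistics.median(times),
            "max": max(times)}


if __name__ == "__main__":
    # Illustrative pure-Python kernel; not one of the R-25 kernels.
    print(time_kernel(lambda: sum(i * i for i in range(200_000))))
```

Reporting the median rather than a single run damps noise from other activity on the node. The untimed warm-up keeps one-time setup costs out of the statistics.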

14 R Benchmark Results
- R generally lacks multithreading (exceptions include mclapply, which forks worker processes), so we rely on the threading in MKL
- Standard profiling/tracing tools are challenging to employ: instrumenting the entire R interpreter creates too much overhead

15 R Benchmark Results
- Benchmarks include Cholesky decomposition, eigendecomposition, least-squares fit, linear solve, QR decomposition, matrix cross product, matrix determinant, matrix-matrix multiply, and matrix-vector multiply
- Multiple threads per core aren't useful, in contrast to KNC
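
Raw times for these kernels are easier to compare across matrix sizes, and against a peak such as KNL's 3 TF (DP), once they are converted to GFLOP/s. The Python sketch below uses standard leading-order operation counts for square n x n inputs. The kernel labels are hypothetical and are not RHPCBenchmark's names. Eigendecomposition and least squares are left out because their counts depend on the algorithm LAPACK selects.

```python
# Leading-order floating-point operation counts for dense n x n kernels
# (standard textbook values; lower-order terms are ignored).
FLOP_COUNTS = {
    "matmat": lambda n: 2 * n**3,        # C = A %*% B
    "cholesky": lambda n: n**3 / 3,      # A = t(R) %*% R, A symmetric positive definite
    "lu_solve": lambda n: 2 * n**3 / 3,  # LU factorization dominates solve(A, b)
    "qr": lambda n: 4 * n**3 / 3,        # Householder QR of a square matrix
    "matvec": lambda n: 2 * n**2,        # y = A %*% x
}


def gflops(kernel: str, n: int, seconds: float) -> float:
    """Achieved GFLOP/s for one run of `kernel` at size n."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return FLOP_COUNTS[kernel](n) / seconds / 1e9
```

For example, a 1000 x 1000 matrix-matrix multiply that takes 2 seconds works out to `gflops("matmat", 1000, 2.0) == 1.0`. Matrix-vector multiply does only O(n^2) work on O(n^2) data, so it is usually limited by memory bandwidth rather than by peak FLOP rate.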

16 R Benchmark Results
For some benchmarks, single-core KNL outperforms SNB

17 R Benchmark Results
Large matrices are needed to make full use of all 68 cores
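
"Full use of all 68 cores" can be quantified as speedup S = T_1 / T_p and parallel efficiency E = T_1 / (p * T_p). Here T_1 is the one-core time and T_p is the time on p cores. Below is a small Python sketch of that bookkeeping. The function names are hypothetical, and timings would come from real runs, for example ones launched with MKL_NUM_THREADS set to 1 and then to 68.

```python
from typing import Dict, Tuple


def speedup(t_baseline: float, t_new: float) -> float:
    """How many times faster the new run is (e.g. SNB time / KNL time)."""
    return t_baseline / t_new


def parallel_efficiency(t_one: float, t_p: float, workers: int) -> float:
    """Fraction of ideal linear scaling achieved with `workers` cores."""
    return t_one / (workers * t_p)


def scaling_table(timings: Dict[int, Tuple[float, float]],
                  workers: int) -> Dict[int, Tuple[float, float]]:
    """Map matrix size n -> (speedup, efficiency).

    `timings[n]` holds (seconds on one core, seconds on `workers` cores).
    """
    return {n: (speedup(t1, tp), parallel_efficiency(t1, tp, workers))
            for n, (t1, tp) in sorted(timings.items())}
```

Efficiency typically drops for small n because each thread receives too little work to amortize threading and synchronization overhead. This is consistent with the slide's observation that only large matrices keep all cores busy. The same `speedup` function also covers cross-machine comparisons such as SNB versus KNL.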

18 R Benchmark Results
For math-intensive kernels, R interpreter overhead is modest

20 RHPCBenchmark Package
- The initial release of RHPCBenchmark is available on CRAN
- Provides a variety of dense matrix, sparse matrix, and machine learning benchmarks
- Users can configure which benchmarks to run and the benchmark parameters
- Results are provided as .csv files and as a data frame for further analysis
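
Because results are written as .csv files, they can be post-processed outside R too. The Python sketch below reduces a results file to the median time per benchmark. The column names `benchmark` and `seconds` are hypothetical placeholders, not RHPCBenchmark's actual headers. Check a real output file and pass the correct names in.

```python
import csv
import io
import statistics
from collections import defaultdict
from typing import Dict


def summarize(csv_text: str, name_col: str = "benchmark",
              time_col: str = "seconds") -> Dict[str, float]:
    """Median time per benchmark from CSV text (column names are placeholders)."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[name_col]].append(float(row[time_col]))
    return {name: statistics.median(ts) for name, ts in sorted(groups.items())}
```

Usage with a hypothetical file: `with open("results.csv", newline="") as f: print(summarize(f.read()))`. Summaries from a KNL run and an SNB run can then be fed directly into a speedup calculation.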

22 Next Steps for R Performance
- Internode performance
- Higher-level functions: many R packages (e.g. nnet, cluster) don't rely on the building blocks tested
- Other classes of functions:
  - Sparse matrix operations
  - Data wrangling operations
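
Sparse matrix operations form a distinct class because their cost is dominated by indirect memory access rather than by the dense arithmetic that MKL's BLAS accelerates. The Python sketch below shows a matrix-vector product in compressed sparse row (CSR) form. CSR is swapped in here for illustration: R's Matrix package stores general sparse matrices column-compressed by default (`dgCMatrix`) and offers a row-compressed class (`dgRMatrix`). The function name is hypothetical.

```python
from typing import List, Sequence


def csr_matvec(values: Sequence[float], col_idx: Sequence[int],
               row_ptr: Sequence[int], x: Sequence[float]) -> List[float]:
    """y = A x for A stored in compressed sparse row (CSR) form.

    Row i's nonzeros are values[row_ptr[i]:row_ptr[i+1]], located in the
    columns given by the matching entries of col_idx.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect load of x: the access pattern that typically makes
            # sparse kernels memory-bound rather than compute-bound.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


if __name__ == "__main__":
    # A = [[4, 0, 1],
    #      [0, 2, 0],
    #      [3, 0, 5]]
    print(csr_matvec([4, 1, 2, 3, 5], [0, 2, 1, 0, 2], [0, 2, 3, 5], [1, 1, 1]))
```

For the 3 x 3 example matrix multiplied by a vector of ones, this prints `[5.0, 2.0, 8.0]`, the row sums. On KNL, kernels like this tend to be more sensitive to the MCDRAM mode than to core count.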

24 Conclusions
- R performance on KNL is better for dense matrix operations (3x SNB) and close to native C performance
- Performance is best for large matrices; SNB performs better for small matrices
- The new RHPCBenchmark package offers flexibility in benchmarking your hardware and R build

25 Questions? Suggestions?
Scott Michael, James McCombs

26 Backups: KNL Speedup in R

27 Backups: KNL vs. Ivy Bridge

28 Backups: KNL Flat vs. Cached
