
1 A Performance Profiling Strategy for High-Performance Map Re-Projection of Coarse-Scale Spatial Raster Data
Babak Behzad 1,3, Yan Liu 1,2,4, Eric Shook 1,2, Michael P. Finn 5, David M. Mattli 5 and Shaowen Wang 1,2,3,4
  1 CyberInfrastructure and Geospatial Information Laboratory (CIGI)
  2 Department of Geography and Geographic Information Science
  3 Department of Computer Science
  4 National Center for Supercomputing Applications (NCSA)
  University of Illinois at Urbana-Champaign
  5 Center of Excellence for Geospatial Information Science, U.S. Geological Survey (USGS)
AutoCarto'12

2 Outline
Overview
  –Map re-projection
  –pRasterBlaster: HPC solution to map re-projection
Performance Profiling
  –pRasterBlaster computational and scaling bottlenecks
Conclusion

3 Introduction
Map re-projection
  –An important cartographic operation
Desktop application: mapIMG
  –Challenges exist when scaling to coarse-scale spatial datasets
  –Re-projecting a 1 GB raster dataset can take 45-60 minutes
Parallel computing techniques will help scale to large datasets
  –Raster was born to be parallelized

4 Parallelizing Map Re-Projection
Map re-projection of large datasets is too slow or even impossible on desktop machines
pRasterBlaster
  –mapIMG in an HPC (High-Performance Computing) environment
  –Early days
      Row-wise decomposition
      I/O occurred directly in the program's inner loop
  –Rigorous geometry handling and novel resampling
      Resampling options for categorical data and population counts (in addition to standard continuous-data resampling methods)
  –Able to project/re-project large maps in a short amount of time

5 pRasterBlaster
Fast and accurate raster re-projection in three (primary) steps
Step 1: Calculate and partition output space
Step 2: Read input and re-project
Step 3: Combine temporary files

6 Performance Profiling: Motivation and Objectives
Exploit performance profiling tools to make pRasterBlaster more scalable and efficient
  –The early version did not scale to a large number of processors
  –Resolve computational bottlenecks to allow pRasterBlaster to leverage thousands of processors
Demonstrate techniques for using performance profilers
  –Potentially useful to many GIS applications

7 What is performance profiling?
A form of dynamic program analysis
Measures
  –memory footprint of the program
  –time complexity of the program
  –usage of particular instructions
  –frequency and duration of function calls
Aids program optimization

8 How do profilers work?
Statistical profilers
  –Operate by sampling
  –Probe the program at regular intervals
  –Pros: Low overhead
  –Cons: Typically less numerically accurate and specific

9 How do profilers work?
Instrumenting profilers
  –Instrument target programs with additional instructions to collect the required information
  –Pros: Much more accurate than statistical profilers
  –Cons: Can slow the program down (since new instructions are added)
Different kinds of instrumenting profilers
  –Manual instrumenting
      Done by the programmers
  –Automatic profilers
      Software instruments the program automatically
TAU and IPM were used in this research.
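As a rough illustration of what an instrumenting profiler records, the sketch below hand-instruments one function with a call counter and a cumulative timer. The names `profile_entry`, `work` and `profiled_work` are hypothetical and are not part of TAU or IPM; real tools insert comparable probes automatically and record far more detail.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical record kept for one instrumented function. */
typedef struct {
    const char *name;     /* label of the function */
    unsigned long calls;  /* how many times it was entered */
    double seconds;       /* cumulative CPU time spent inside it */
} profile_entry;

/* CPU time used by the program so far, in seconds (standard C clock()). */
double cpu_seconds(void)
{
    return (double)clock() / CLOCKS_PER_SEC;
}

/* The function being measured: sums the integers 0..n-1. */
unsigned long work(unsigned long n)
{
    unsigned long sum = 0;
    for (unsigned long i = 0; i < n; i++)
        sum += i;
    return sum;
}

/* Instrumented wrapper: the extra statements around work() are the probes. */
unsigned long profiled_work(profile_entry *p, unsigned long n)
{
    assert(p != NULL);
    double t0 = cpu_seconds();
    unsigned long result = work(n);
    p->seconds += cpu_seconds() - t0;
    p->calls += 1;
    return result;
}
```

Calling `profiled_work` in place of `work` leaves the result unchanged while `calls` and `seconds` accumulate, which is also where the overhead noted above comes from: every call now pays for two extra clock reads.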

10 Manual Instrumenting
The traditional way of instrumenting C code is with the time function, declared in the time.h header. Here is a code fragment that demonstrates its use:

    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        time_t start, finish;
        /* ... */
        time(&start);
        /* section to be timed */
        time(&finish);
        /* difftime returns the difference in seconds as a double */
        printf("Elapsed time: %.0f\n", difftime(finish, start));
        /* ... */
    }
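On most systems time() counts whole seconds, which is coarse for short sections. A minimal sketch of the same idea with sub-second resolution follows, assuming a C11 library that provides `timespec_get`. The helper names `elapsed_seconds`, `time_section` and `sample_section` are hypothetical, and the clock used is wall-clock time, so it is not guaranteed to be monotonic.

```c
#include <assert.h>
#include <time.h>

/* Difference finish - start in seconds, keeping the nanosecond part. */
double elapsed_seconds(struct timespec start, struct timespec finish)
{
    return (double)(finish.tv_sec - start.tv_sec)
         + (double)(finish.tv_nsec - start.tv_nsec) / 1e9;
}

/* Placeholder for the section to be timed. */
void sample_section(void)
{
    volatile unsigned long sum = 0;
    for (unsigned long i = 0; i < 1000000UL; i++)
        sum += i;
}

/* Runs fn() once and returns the elapsed wall-clock time in seconds.
 * Error checking of timespec_get is omitted for brevity. */
double time_section(void (*fn)(void))
{
    assert(fn != NULL);
    struct timespec start, finish;
    timespec_get(&start, TIME_UTC);
    fn();                              /* section to be timed */
    timespec_get(&finish, TIME_UTC);
    return elapsed_seconds(start, finish);
}
```

Keeping the subtraction in its own function makes the arithmetic easy to check independently of the clock.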

11 Manual Instrumenting in Parallel Programs
Instrument the portion of the program running on individual processors

    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        time_t start, finish;
        int my_rank = 0;  /* assumed: set to this process's rank (e.g. via MPI_Comm_rank) in the elided code */
        /* ... */
        time(&start);
        /* section to be timed */
        time(&finish);
        printf("Elapsed time on Process %d: %.0f\n", my_rank, difftime(finish, start));
        /* ... */
    }
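Once every process has reported its elapsed time, one simple way to summarize the result is the ratio of the slowest process's time to the mean. The sketch below assumes the per-process times have already been collected into an array; how they are gathered (for example with MPI) is not shown in the slides, and the function name `load_imbalance` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Ratio of the slowest process's time to the mean time.
 * 1.0 means perfectly balanced; larger values mean more imbalance. */
double load_imbalance(const double *seconds, size_t nprocs)
{
    assert(seconds != NULL && nprocs > 0);
    double max = seconds[0];
    double sum = 0.0;
    for (size_t i = 0; i < nprocs; i++) {
        if (seconds[i] > max)
            max = seconds[i];
        sum += seconds[i];
    }
    double mean = sum / (double)nprocs;
    assert(mean > 0.0);   /* all-zero timings carry no information */
    return max / mean;
}
```

For times of 1, 1, 1 and 5 seconds the mean is 2 and the slowest process takes 5, so the ratio is 2.5: the run takes two and a half times as long as a perfectly balanced one would.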

12 IPM (Integrated Performance Monitoring)
IPM is a portable profiling infrastructure for MPI programs
  –Provides a low-overhead profile of the performance and resource utilization of a parallel program
  –Communication, computation, and I/O are the primary focus
  –http://ipm-hpc.sourceforge.net
We initially profiled pRasterBlaster with IPM to understand how communication, computation, and I/O usage break down for this application

13 TAU
The TAU (Tuning and Analysis Utilities) performance system is a portable profiling and tracing toolkit
  –Analysis of parallel programs written in Fortran, C, C++, Java, Python
  –http://tau.uoregon.edu
TAU is capable of gathering performance information through instrumentation of functions, methods, basic blocks, and statements
IPM is designed to profile MPI applications, while TAU can be used to profile any kind of parallel application

14 TAU for pRasterBlaster

15 TAU for pRasterBlaster

16 Computational Bottleneck I: Symptom

19 Cause: Workload Distribution Issue
N rows on P processor cores
[Figure panels: when P is small; when P is big]

20 Solution: Load Balancing
N rows on P processor cores
[Figure panels: when P is small; when P is big]
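The slides do not spell out the revised distribution algorithm. As an illustration only, one common balanced scheme gives every process N / P rows and hands the N % P leftover rows to the lowest-ranked processes, so no two processes differ by more than one row. The type `row_range` and the function `rows_for_rank` are hypothetical names.

```c
#include <assert.h>

/* Half-open range of rows [first, first + count) owned by one process. */
typedef struct {
    long first;
    long count;
} row_range;

/* Balanced block decomposition of n_rows rows over n_procs processes. */
row_range rows_for_rank(long n_rows, int n_procs, int rank)
{
    assert(n_rows >= 0 && n_procs > 0 && rank >= 0 && rank < n_procs);
    long base = n_rows / n_procs;    /* rows every process receives */
    long extra = n_rows % n_procs;   /* leftover rows, one per low rank */
    row_range r;
    r.count = base + (rank < extra ? 1 : 0);
    r.first = rank * base + (rank < extra ? rank : extra);
    return r;
}
```

With 10 rows on 4 processes this yields blocks of 3, 3, 2 and 2 rows. By contrast, a split that rounds the block size up (2 rows each for 10 rows on 8 processes) would leave the last three processes idle; whether that is the exact problem pictured on the slide is an assumption.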

21 Computational Bottleneck I: Summary
Symptom
  –Load imbalance
  –Detected by TAU first
  –Verified by manual instrumenting
Cause
  –Workload distribution algorithm problem (not obvious on small platforms)
Solution
  –Revised algorithm for distributing workload

22 Computational Bottleneck II: Symptom

23 Computational Bottleneck II: Symptom

24 Computational Bottleneck II: Cause

25 Computational Bottleneck II: Analysis
Spatial data-dependent performance anomaly
  –The anomaly is data dependent
  –The four corners of the raster were processed by processors whose indices are close to the two ends
Exception handling in C++ is costly
  –Coordinate transformation on a nodata area was handled as an exception
Solution
  –Remove the C++ exception handling

26 Computational Bottleneck II: Performance Improvement

27 Computational Bottleneck II: Summary
Symptom
  –Processors responsible for polar regions spent more time than those processing the equatorial region
Cause
  –Corner cells were mapped to invalid input raster cells, generating exceptions
  –C++ exception handling was expensive
Solution
  –Removed C++ exception handling
  –Corner cells need not be processed
      They now contribute less computation time
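The general idea can be sketched in C: test whether a mapped cell falls inside the input raster with an ordinary branch and skip it if not, instead of raising and catching an exception for each such cell. The actual pRasterBlaster code is C++ and is not shown in the slides, so the types and function names here (`raster_extent`, `cell_is_valid`, `count_resampled`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical input raster extent, in cell coordinates. */
typedef struct {
    long rows;
    long cols;
} raster_extent;

/* True when (row, col) falls inside the input raster. */
bool cell_is_valid(raster_extent in, long row, long col)
{
    return row >= 0 && row < in.rows && col >= 0 && col < in.cols;
}

/* Counts how many mapped cells can be resampled, skipping invalid ones.
 * Each entry of cells is a {row, col} pair in input-raster coordinates. */
size_t count_resampled(raster_extent in, const long (*cells)[2], size_t n)
{
    assert(cells != NULL || n == 0);
    size_t done = 0;
    for (size_t i = 0; i < n; i++) {
        if (!cell_is_valid(in, cells[i][0], cells[i][1]))
            continue;   /* nodata area: nothing to read, no exception raised */
        done++;         /* a real program would resample the cell here */
    }
    return done;
}
```

The validity test costs a few comparisons per cell, so regions that map mostly to nodata become cheaper than valid regions rather than more expensive.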

28 Conclusions
Performance profiling identified computational bottlenecks in pRasterBlaster
We demonstrated the value of profilers for pRasterBlaster
  –The techniques are likely valuable for other GIS applications
Performance profiling is an important tool for developing scalable and efficient high-performance applications

29 Future Work
Identify and resolve remaining performance issues in pRasterBlaster
  –I/O was recently identified as the next major roadblock

