MASS CUDA Performance Analysis and Improvement

1 MASS CUDA Performance Analysis and Improvement
Ahmed Musazay
Faculty Advisor: Dr. Munehiro Fukuda

2 MASS: Multi-Agent Spatial Simulation
Allows non-computing specialists to parallelize simulations
Built around the concept of Place and Agent objects
Three versions: C++, Java, CUDA
Provides a high-level abstraction to non-computing specialists

3 CUDA
C/C++ extension by NVIDIA
A heterogeneous parallel programming interface
Host = CPU, Device = GPU
Functions executing on the GPU are called kernel functions
Kernel launches take configuration parameters for the number of threads
- Fast, but difficult to use and hard to tune for performance
- Aim: exploit that performance while keeping the high-level abstraction of MASS
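
The thread-count configuration mentioned above is usually derived from the problem size. A minimal host-side C++ sketch of that arithmetic follows; the helper name `blocksFor` and the block size of 256 used in the note below are assumptions for illustration, not values from the slides.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of thread blocks needed so that
// blocks * threadsPerBlock >= numElements (ceiling division).
std::size_t blocksFor(std::size_t numElements, std::size_t threadsPerBlock) {
    assert(threadsPerBlock > 0);
    return (numElements + threadsPerBlock - 1) / threadsPerBlock;
}
```

For a 250x250 grid and an assumed 256 threads per block, `blocksFor(250 * 250, 256)` gives 245 blocks. In CUDA these two numbers go inside the `<<<blocks, threadsPerBlock>>>` launch syntax, and because the last block is only partly used, each thread checks that its index is below `numElements` before doing any work.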

4 MASS-CUDA
Current version written by Nathaniel Hart for a Master's thesis
Ported the C++ version into the current CUDA version
Object-oriented: allows users to extend Place and Agent objects
Designed with the intention of using multiple GPU cards
- Nate's work: porting MASS to CUDA

5 Problem
Performance issues
Difficult to tune performance
Goal of project:
- Understand the MASS library and how it works
- Write unit tests to find where performance issues occur
- Propose solutions that can be implemented to increase the performance of MASS-CUDA

6 Heat2D
Fourier's heat equation
Simulation describing the spread of heat in a given region over a period of time
Place objects: Metal
Ran at four different sizes: 250x250, 500x500, 1000x1000, 2000x2000
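
For reference, the two-dimensional heat equation and one common explicit finite-difference update are written out here. The slides do not say which discretization Heat2D uses, so the scheme and the symbols $\alpha$ (diffusivity), $\Delta t$ (time step), $h$ (grid spacing), and $r$ are assumptions.

```latex
\frac{\partial u}{\partial t}
  = \alpha \left( \frac{\partial^2 u}{\partial x^2}
                + \frac{\partial^2 u}{\partial y^2} \right)

u^{n+1}_{i,j} = u^{n}_{i,j}
  + r \left( u^{n}_{i+1,j} + u^{n}_{i-1,j}
           + u^{n}_{i,j+1} + u^{n}_{i,j-1}
           - 4\,u^{n}_{i,j} \right),
\qquad r = \frac{\alpha\,\Delta t}{h^{2}}
```

With equal spacing $h$ in both directions, this explicit scheme is stable when $r \le 1/4$.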

7 Test Case: Running Heat2D - Primitive Array
Heat2D simulation using an array of doubles
No objects created to contain information, as opposed to MASS
Simulation functions written as kernel functions
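
A host-side C++ sketch of what one such time step might look like on a flat array of doubles. It uses a standard five-point explicit update; the coefficient `r` (diffusivity times time step over grid spacing squared), the fixed-boundary handling, and the name `heatStep` are assumptions, since the transcript does not include the actual test code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One explicit time step on a rows x cols grid stored row-major in a flat
// array of doubles. Boundary cells are copied unchanged (an assumed
// fixed-temperature boundary). `r` is an assumed diffusion coefficient.
void heatStep(const std::vector<double>& cur, std::vector<double>& next,
              std::size_t rows, std::size_t cols, double r) {
    assert(cur.size() == rows * cols && next.size() == rows * cols);
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            std::size_t idx = i * cols + j;  // in a kernel: one thread per idx
            if (i == 0 || j == 0 || i == rows - 1 || j == cols - 1) {
                next[idx] = cur[idx];
                continue;
            }
            // Sum of the four neighbours minus four times the centre cell.
            next[idx] = cur[idx] + r * (cur[idx - cols] + cur[idx + cols] +
                                        cur[idx - 1] + cur[idx + 1] -
                                        4.0 * cur[idx]);
        }
    }
}
```

In a kernel version the two loops disappear: each GPU thread computes its own `idx` from its block and thread indices and runs only the loop body.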

8 Results

9 Proposed Solution
Store all data in MASS as user-defined primitive type arrays
Map each index to a unique element
Pros:
- Fast accesses
- Can run larger simulations, requiring less heap memory overhead
Cons:
- Reduced user programmability
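
The index mapping can be sketched as a row-major flattening of N-dimensional coordinates. This is a hypothetical host-side helper (`flatIndex` is not a MASS function), shown only to illustrate how each coordinate tuple lands on a unique array element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: map N-dimensional coordinates to a unique offset in a
// flat array, row-major (the last coordinate varies fastest).
std::size_t flatIndex(const std::vector<std::size_t>& coords,
                      const std::vector<std::size_t>& dims) {
    assert(coords.size() == dims.size());
    std::size_t idx = 0;
    for (std::size_t d = 0; d < dims.size(); ++d) {
        assert(coords[d] < dims[d]);
        idx = idx * dims[d] + coords[d];
    }
    return idx;
}
```

For the 250x250 case, `flatIndex({2, 3}, {250, 250})` is 2 * 250 + 3 = 503. The user then works with offsets into plain arrays instead of Place objects, which is the programmability cost listed under Cons.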

10 Test Case: Running Heat2D - Place objects
Ran the simulation with the same objects used in MASS, without using library function calls
Metal and MetalState derived from library classes containing the same memory and internal functions
Simulation functions rewritten in CUDA as kernel functions

11 Results

12 Proposed Solution
Remove unnecessary functionality that may be slowing the library down:
- Excessive memory transfers between host and device
- Partitioning logic
Pros:
- Can add a single library feature at a time, making sure each one meets the performance standard
- More computation spent on the actual simulation rather than on management
Cons:
- Scalability of the library will be missing early in development

13 Test Case: Running Heat2D – Coalesced Accesses
Ran the simulation using primitive values, but taking advantage of coalesced memory accesses
Kernel functions take array parameters in their native dimension (2D array)
cudaMallocPitch(), cudaMalloc3D()
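
`cudaMallocPitch()` pads each row so that every row starts at an aligned address, and returns the padded row width in bytes (the pitch). The host-side C++ sketch below imitates that layout to show the addressing arithmetic. The helper names and the 512-byte alignment are assumptions; on a real device the runtime chooses the pitch.

```cpp
#include <cassert>
#include <cstddef>

// Round a row's width in bytes up to a multiple of `alignment`.
// cudaMallocPitch picks the real pitch itself; this is an assumed stand-in.
std::size_t pitchFor(std::size_t widthBytes, std::size_t alignment) {
    assert(alignment > 0);
    return (widthBytes + alignment - 1) / alignment * alignment;
}

// Address element (row, col) of a pitched 2D buffer of doubles: step `row`
// pitches in bytes from the base, then `col` elements along the row.
double* elementAt(void* base, std::size_t pitch,
                  std::size_t row, std::size_t col) {
    char* rowStart = static_cast<char*>(base) + row * pitch;
    return reinterpret_cast<double*>(rowStart) + col;
}
```

With 8-byte doubles, a 250-element row occupies 2000 bytes and would be padded to 2048 under the assumed alignment. A kernel indexes the buffer the same way `elementAt` does, so neighbouring threads in a row read neighbouring addresses from an aligned starting point.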

14 Results

15 Proposed Solution
Let MASS run the simulation in its native dimension (1D, 2D, 3D)
Pros:
- Faster memory accesses, increasing performance
Cons:
- Extra overhead of determining which dimensionality to run a function in
- Will only be able to natively run up to 3 dimensions

16 Conclusion
Remove unused features, implementing one feature at a time
Coalesced memory accesses: use native array dimensions
Use primitive arrays
Consider: shared memory

17 Final Words
Relevant courses:
- CSS 430 Operating Systems
- CSS 422 Hardware and Computer Organization
Special thanks to:
- Dr. Fukuda
- Nathaniel Hart
