


Presentation on theme: "Performance Evaluation of Parallel Algorithms on a Computational Grid Environment". Presentation transcript:

1 Performance Evaluation of Parallel Algorithms on a Computational Grid Environment
Simona Blandino 1, Salvatore Cavalieri 2
1 Consorzio COMETA, 2 Faculty of Engineering, University of Catania

The enormous advances made over the last decade in Grid infrastructures make them very attractive for executing HPC parallel algorithms. Many factors can influence performance and scalability, among them the scheduling policy adopted at the broker level, the MPI library and implementation used, and the characteristics of the network connecting the computing resources. This paper presents a performance evaluation of the Grid infrastructure realised by the COMETA Consortium under the PI2S2 project. The evaluation has been carried out in terms of speedup, efficiency and scalability of parallel algorithms written with the MPI paradigm. Well-known benchmarks have been tested, among them the Matrix Multiplication (figure 1) and Pascal's Triangle (figure 2) algorithms. One aim of the evaluation was to assess the influence of the interconnection network: the low-latency, high-bandwidth InfiniBand network has been compared to Gigabit Ethernet. Another goal was to analyse the influence of the MPI library and its implementation; for this reason, different MPI implementations have been considered: MPICH, MVAPICH and MVAPICH2. The main results and the relevant conclusions are presented, among them the higher parallel speedup obtained with InfiniBand over Gigabit Ethernet, the minimal communication overhead offered by MVAPICH, and the superlinear speedup effect.

Test data sizes:
N° of Elements    Data Size
75,000            600 KB
150,000           1.2 MB
300,000           2.4 MB
600,000           4.8 MB

Pascal's Triangle Algorithm (figure 2)
Matrix Multiplication Algorithm (figure 1)

Let us focus on the MVAPICH implementation and test different problem sizes. The bigger the problem, the higher the superlinear speedup achievable. The graph shows the speedup obtained with different data distributions on a growing number of resources: for a given problem size we can find the number of resources that yields superlinear speedup.

Case A and Case B show the results of two different communication strategies, a point-to-point implementation (case A) and a collective implementation (case B), considering the smallest and the largest of the test data sizes. The algorithm shows a superlinear speedup effect and an efficiency higher than one. How is this possible, and what are the potential causes? As more processing units are employed, the available main memory grows, and so does the aggregated cache memory; if the problem fits into the aggregated cache, the resulting speedup can be superlinear. In both cases MVAPICH proved to be the best of the available implementations: it is a specialised library designed for the InfiniBand interconnect and offers minimal communication overhead, allowing superior scalability on the infrastructure. On a distributed-memory architecture, the different access speeds of the memory levels can lead to superlinear speedup; the effect can be explained by adopting a realistic computational model that takes the full memory hierarchy and its access times into account.
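As an illustration of the two communication strategies compared above, the following sketch distributes the rows of a matrix among MPI ranks first with explicit point-to-point messages (case A) and then with a single collective call (case B). This is a minimal, self-contained example written for this summary, not the authors' benchmark code; values such as ROWS and COLS are illustrative assumptions.

/* Minimal sketch (not the authors' benchmark): row distribution of a matrix
 * over MPI ranks, point-to-point (case A) versus collective (case B).
 * ROWS and COLS are hypothetical; ROWS is assumed divisible by the rank count. */
#include <mpi.h>
#include <stdlib.h>

#define ROWS 1024
#define COLS 1024

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int rows_per_rank = ROWS / size;
    double *block = malloc((size_t)rows_per_rank * COLS * sizeof(double));
    double *matrix = NULL;

    if (rank == 0) {
        matrix = malloc((size_t)ROWS * COLS * sizeof(double));
        for (long i = 0; i < (long)ROWS * COLS; i++)
            matrix[i] = (double)i;                 /* fill with dummy data */
    }

    /* Case A: point-to-point distribution with explicit sends and receives. */
    if (rank == 0) {
        for (int r = 1; r < size; r++)
            MPI_Send(matrix + (long)r * rows_per_rank * COLS,
                     rows_per_rank * COLS, MPI_DOUBLE, r, 0, MPI_COMM_WORLD);
        for (long i = 0; i < (long)rows_per_rank * COLS; i++)
            block[i] = matrix[i];                  /* rank 0 keeps its own block */
    } else {
        MPI_Recv(block, rows_per_rank * COLS, MPI_DOUBLE, 0, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    /* Case B: the same distribution expressed as a single collective call,
     * letting the MPI library (e.g. MVAPICH over InfiniBand) optimise the transfer. */
    MPI_Scatter(matrix, rows_per_rank * COLS, MPI_DOUBLE,
                block,  rows_per_rank * COLS, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* ... local computation on `block` and timing with MPI_Wtime() would go here ... */

    free(block);
    if (rank == 0) free(matrix);
    MPI_Finalize();
    return 0;
}

With the collective form, an InfiniBand-aware implementation such as MVAPICH can choose optimised transfer algorithms, which is one reason collective operations are generally preferred on such interconnects.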
The L2 cache per CPU in the COMETA Consortium Grid infrastructure is 1 MB. If the problem fits in the L2 cache of a single processor there is no superlinear effect; as the problem size grows, the peak of the speedup curve moves to the right, towards the minimum number of CPUs whose aggregated cache can hold the whole problem. The algorithm scales up to 16 CPUs; beyond that, the communication overhead becomes large enough to outweigh the benefits of parallelism. According to Gustafson, an algorithm is scalable when the execution time stays fixed as the problem size and the number of CPUs grow together: if a massively parallel computation is not efficient for a given code-problem pair, it may still be efficient for the same code on a larger problem size. The aim of parallelism is to maximise throughput while keeping the execution time constant. When the input data size is small, using a larger number of processors reduces the speedup, because the increased overhead outweighs the advantages of parallelisation.
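For reference, the quantities used throughout this evaluation, together with a back-of-the-envelope check of the cache argument, can be written as follows. These are standard textbook definitions consistent with the poster's statements; the symbols T(1), T(p), s and p_min are notational assumptions introduced here.

\[
S(p) = \frac{T(1)}{T(p)}, \qquad
E(p) = \frac{S(p)}{p}, \qquad
S_{\text{scaled}}(p) = s + p\,(1 - s) \quad \text{(Gustafson)}
\]

where T(1) is the sequential execution time, T(p) the execution time on p CPUs, and s the serial fraction of the scaled workload; superlinear speedup corresponds to E(p) > 1. With 1 MB of L2 cache per CPU, a rough lower bound on the number of CPUs whose aggregated cache can hold the whole problem is

\[
p_{\min} \approx \left\lceil \frac{\text{data size}}{1\ \text{MB}} \right\rceil,
\]

for example about 3 CPUs for the 2.4 MB case and 5 CPUs for the 4.8 MB case, which is why the speedup peak moves to the right as the problem grows.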

