1 A Performance Comparison of DSM, PVM, and MPI Paul Werstein Mark Pethick Zhiyi Huang

2 Introduction Relatively little is known about the performance of Distributed Shared Memory (DSM) systems compared to message passing systems. We compare the performance of the TreadMarks DSM system with two popular message passing systems, MPICH (MPI) and PVM.

3 Introduction Three applications are compared: mergesort, Mandelbrot set generation, and a backpropagation neural network. Each application represents a different class of problem.

4 TreadMarks DSM Provides locks and barriers as primitives. Uses Lazy Release Consistency. The granularity of sharing is a page. Creates page differentials to avoid the false sharing effect. Version 1.0.3.3.
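
To make the programming model concrete, here is a minimal sketch of a TreadMarks-style program, assuming the published Tmk_* API (the Tmk.h header, the Tmk_proc_id global, and integer-indexed locks and barriers). It is illustrative only, not the code used in the study.

    #include <stdio.h>
    #include <stdlib.h>
    #include "Tmk.h"                 /* TreadMarks API header (assumed name) */

    int *shared_sum;                 /* pointer into DSM-managed memory */

    int main(int argc, char **argv)
    {
        Tmk_startup(argc, argv);     /* join the DSM system */

        if (Tmk_proc_id == 0) {
            shared_sum = (int *)Tmk_malloc(sizeof(int));
            *shared_sum = 0;
            /* make the shared pointer visible to the other processes */
            Tmk_distribute((char *)&shared_sum, sizeof(shared_sum));
        }
        Tmk_barrier(0);              /* all processes now see shared_sum */

        Tmk_lock_acquire(0);         /* serialise updates to shared data */
        *shared_sum += Tmk_proc_id;
        Tmk_lock_release(0);

        Tmk_barrier(1);              /* wait until every update is applied */
        if (Tmk_proc_id == 0)
            printf("sum of process ids = %d\n", *shared_sum);

        Tmk_exit(0);
        return 0;
    }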

5 Parallel Virtual Machine Provides the concept of a virtual parallel machine. Exists as a daemon on each node. Inter-process communication is mediated by the daemons. Designed for flexibility. Version 3.4.3.
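
By contrast with DSM, PVM programs pack data into send buffers and route messages through the daemons. Below is a minimal sketch using the standard PVM 3 calls; the executable name "worker" and the message tags are illustrative assumptions.

    #include <stdio.h>
    #include "pvm3.h"

    int main(void)
    {
        pvm_mytid();                       /* enrol this task in PVM */
        int ptid = pvm_parent();           /* tid of the spawner, if any */

        if (ptid == PvmNoParent) {         /* master side */
            int child, n = 42, reply;
            pvm_spawn("worker", NULL, PvmTaskDefault, "", 1, &child);

            pvm_initsend(PvmDataDefault);  /* pack and send via the daemons */
            pvm_pkint(&n, 1, 1);
            pvm_send(child, 1);

            pvm_recv(child, 2);            /* blocking receive of the reply */
            pvm_upkint(&reply, 1, 1);
            printf("reply = %d\n", reply);
        } else {                           /* worker side */
            int n;
            pvm_recv(ptid, 1);
            pvm_upkint(&n, 1, 1);
            n *= 2;                        /* trivial stand-in for real work */
            pvm_initsend(PvmDataDefault);
            pvm_pkint(&n, 1, 1);
            pvm_send(ptid, 2);
        }
        pvm_exit();                        /* leave the virtual machine */
        return 0;
    }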

6 MPICH - MPI A standard interface for developing message passing applications. The primary design goal is performance. Primarily defines communication primitives. MPICH is a reference implementation of the MPI standard. Version 1.2.4.
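
The same style of communication in MPI needs no daemons or pack buffers. Here is a minimal MPI-1 sketch that passes a token around a ring of processes; it assumes at least two processes and is not the benchmark code.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size, token;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);    /* assumes size >= 2 */

        if (rank == 0) {
            token = 0;
            MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&token, 1, MPI_INT, size - 1, 0, MPI_COMM_WORLD, &status);
            printf("token returned after %d hops\n", token);
        } else {
            MPI_Recv(&token, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD, &status);
            token++;                             /* count this hop */
            MPI_Send(&token, 1, MPI_INT, (rank + 1) % size, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }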

7 System 32-node Linux cluster: 800 MHz Pentium nodes with 256 MB of memory, Red Hat 7.2, 100 Mbit Ethernet. Results were determined for 1, 2, 4, 8, 16, 24, and 32 processes.

8 Mergesort The parallelisation strategy used is divide and conquer, with synchronisation between pairs of nodes; a sketch follows. This is a loosely synchronous class problem: coarse-grained synchronisation, irregular synchronisation points, and alternating phases of computation and communication.
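
The pairwise pattern can be sketched in MPI as below: each process sorts its chunk locally, then half the active processes retire each round by sending their run to a partner, which merges. This assumes a power-of-two process count and equal chunk sizes, and it illustrates the strategy rather than reproducing the authors' implementation.

    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    /* merge two sorted int arrays into a freshly allocated one */
    static int *merge(int *a, int na, int *b, int nb)
    {
        int *out = malloc((na + nb) * sizeof(int));
        int i = 0, j = 0, k = 0;
        while (i < na && j < nb)
            out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
        while (i < na) out[k++] = a[i++];
        while (j < nb) out[k++] = b[j++];
        return out;
    }

    static int cmp(const void *x, const void *y)
    {
        return (*(const int *)x > *(const int *)y)
             - (*(const int *)x < *(const int *)y);
    }

    int main(int argc, char **argv)
    {
        int rank, size, n = 1 << 16;          /* elements per process (assumed) */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size); /* assumes a power of two */

        int *data = malloc(n * sizeof(int));
        srand(rank + 1);
        for (int i = 0; i < n; i++) data[i] = rand();
        qsort(data, n, sizeof(int), cmp);     /* local sort phase */

        /* pairwise merge tree: half the processes retire each round */
        for (int step = 1; step < size; step *= 2) {
            if (rank % (2 * step) != 0) {
                MPI_Send(data, n, MPI_INT, rank - step, 0, MPI_COMM_WORLD);
                break;                        /* run sent upward; done */
            } else {
                MPI_Status st;
                int *other = malloc(n * sizeof(int));
                MPI_Recv(other, n, MPI_INT, rank + step, 0, MPI_COMM_WORLD, &st);
                int *merged = merge(data, n, other, n);
                free(data); free(other);
                data = merged;
                n *= 2;                       /* run doubles each round */
            }
        }
        if (rank == 0)
            printf("sorted %d elements across %d processes\n", n, size);
        MPI_Finalize();
        return 0;
    }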

9 Mergesort Results (1)

10 Mergesort Results (2)

11 Mandelbrot Set The strategy used is data partitioning. A work pool is used because the computation time of different sections differs; the pool size is at least twice the number of processes. This is an embarrassingly parallel class problem: it may involve complex computation, but there is very little communication, so it gives an indication of performance under ideal conditions.
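
A simplified master/worker work pool in MPI is sketched below. The image dimensions, task size, and message tags are illustrative assumptions, and the computed rows are not shipped back to the master, to keep the sketch short.

    #include <stdio.h>
    #include <mpi.h>

    #define W 800
    #define H 600
    #define MAXIT 256
    #define ROWS_PER_TASK 8     /* pool of H/ROWS_PER_TASK tasks, > 2*processes */
    #define TAG_WORK 1
    #define TAG_DONE 2
    #define TAG_STOP 3

    /* escape-time iteration count for one point */
    static int mandel(double cr, double ci)
    {
        double zr = 0, zi = 0;
        int it = 0;
        while (zr * zr + zi * zi < 4.0 && it < MAXIT) {
            double t = zr * zr - zi * zi + cr;
            zi = 2 * zr * zi + ci;
            zr = t;
            it++;
        }
        return it;
    }

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {        /* master: deal tasks on demand */
            int next = 0, active = 0, row;
            MPI_Status st;
            /* seed every worker; assumes at least one task per worker */
            for (int w = 1; w < size && next < H;
                 w++, next += ROWS_PER_TASK, active++)
                MPI_Send(&next, 1, MPI_INT, w, TAG_WORK, MPI_COMM_WORLD);
            while (active > 0) {
                MPI_Recv(&row, 1, MPI_INT, MPI_ANY_SOURCE, TAG_DONE,
                         MPI_COMM_WORLD, &st);
                if (next < H) { /* faster workers simply get more tasks */
                    MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK,
                             MPI_COMM_WORLD);
                    next += ROWS_PER_TASK;
                } else {
                    MPI_Send(&row, 1, MPI_INT, st.MPI_SOURCE, TAG_STOP,
                             MPI_COMM_WORLD);
                    active--;
                }
            }
        } else {                /* worker: compute assigned row blocks */
            int row, counts[ROWS_PER_TASK * W];
            MPI_Status st;
            for (;;) {
                MPI_Recv(&row, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
                if (st.MPI_TAG == TAG_STOP) break;
                for (int y = 0; y < ROWS_PER_TASK && row + y < H; y++)
                    for (int x = 0; x < W; x++)
                        counts[y * W + x] = mandel(-2.0 + 3.0 * x / W,
                                                   -1.2 + 2.4 * (row + y) / H);
                MPI_Send(&row, 1, MPI_INT, 0, TAG_DONE, MPI_COMM_WORLD);
                /* result rows would be shipped back here; omitted for brevity */
            }
        }
        MPI_Finalize();
        return 0;
    }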

12 Mandelbrot Set Results

13 Neural Network (1) The strategy is data partitioning. Each processor trains the network on a subsection of the data set, and the weight changes are summed and applied at the end of each epoch (sketched below). This approach requires large data sets to be effective.
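
One way to realise the epoch-level step is a global reduction: every process contributes its locally accumulated weight changes and receives the sum, so all replicas apply identical updates and stay consistent. Below is a sketch assuming MPI's MPI_Allreduce, with the backpropagation step stubbed out and an illustrative network size.

    #include <stdio.h>
    #include <mpi.h>

    #define NWEIGHTS 1024          /* illustrative network size (assumed) */
    #define EPOCHS 100

    int main(int argc, char **argv)
    {
        int rank, size;
        double delta[NWEIGHTS], total[NWEIGHTS], w[NWEIGHTS] = {0};

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        for (int epoch = 0; epoch < EPOCHS; epoch++) {
            /* each process trains on its own subsection of the data set,
               accumulating weight changes locally (stubbed here) */
            for (int i = 0; i < NWEIGHTS; i++)
                delta[i] = 0.001 * (rank + 1);   /* stand-in for backprop */

            /* sum everyone's changes; every process receives the total,
               so all copies of the network apply the same update */
            MPI_Allreduce(delta, total, NWEIGHTS, MPI_DOUBLE, MPI_SUM,
                          MPI_COMM_WORLD);
            for (int i = 0; i < NWEIGHTS; i++)
                w[i] += total[i];                /* apply summed changes */
        }
        if (rank == 0)
            printf("w[0] after training: %f\n", w[0]);
        MPI_Finalize();
        return 0;
    }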

14 Neural Network (2) This is a synchronous class problem, characterised by an algorithm that carries out the same operation on all points in the data set, with synchronisation occurring at regular points. The class often applies to problems that use data partitioning, and a large number of problems appear to belong to it.

15 Neural Network Results (1)

16 Neural Network Results (2)

17 Neural Network Results (3)

18 Conclusion In general, the performance of DSM is poorer than that of MPICH or PVM. The main reasons identified are: the increased memory use associated with the creation of page differentials; the false sharing effect due to the page granularity of sharing; and differential accumulation in the gather operation.

