
1  March 17, 2006  Zhiyi's RSL
VODCA: View-Oriented, Distributed, Cluster-based Approach to parallel computing
Dr Zhiyi Huang, Dept of Computer Science, University of Otago, New Zealand

2  March 17, 2006  Zhiyi's RSL  Motivation
DSM applications are not as efficient as MPI on cluster computers.

3  March 17, 2006  Zhiyi's RSL  VOPP
VODCA is a system supporting View-Oriented Parallel Programming (VOPP).
Why a new programming style?
- To improve the performance of DSM applications on cluster computers.
- To provide a programming style better than MPI: message passing is notorious for being a difficult programming style.

4  March 17, 2006  Zhiyi's RSL  What is a view?
- Suppose M is the set of data objects in shared memory.
- A view is a group of data objects from the shared memory: ∀V, V ⊆ M.
- Views must not overlap each other: ∀Vi, Vj, i ≠ j, Vi ∩ Vj = ∅.
- Suppose there are n views in shared memory: ∑ Vi = M (the n views together make up all of M).

5  March 17, 2006  Zhiyi's RSL  VOPP Requirements
- The programmer should divide the shared data into a number of views according to the data flow of the parallel algorithm.
- A view should consist of data objects that are always processed as an atomic set in a program.
- Views can be created and destroyed at any time.
- Each view has a unique view identifier.
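To make the last two points concrete, here is a minimal sketch of view management in C. The names alloc_view, free_view, and view_data are hypothetical placeholders for illustration only; the slides name no creation/destruction primitives, and only the integer view identifier (as used on slide 7) is taken from the source.

    /* Hypothetical view-management prototypes: illustration only, not the VODCA API. */
    #include <stddef.h>

    int  alloc_view(size_t size);   /* hypothetical: create a view, return its unique identifier */
    void free_view(int view_id);    /* hypothetical: destroy a view */
    void *view_data(int view_id);   /* hypothetical: address of the view's data objects */

    void setup(void)
    {
        /* one view per set of data objects that is always processed atomically */
        int matrix_view = alloc_view(1024 * sizeof(double));
        /* ... acquire_view(matrix_view); work on it; release_view(matrix_view); ... */
        free_view(matrix_view);
    }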

6  March 17, 2006  Zhiyi's RSL  VOPP Requirements (cont.)
View primitives such as acquire_view and release_view must be used when a view is accessed:

    acquire_view(View_A);
    A = A + 1;
    release_view(View_A);

acquire_Rview and release_Rview can be used when a view is only read by a processor.
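The following is a small sketch of read-only access with the Rview primitives, assuming they behave as described above. View_B, the array b, and sum_view_b are illustrative names, not taken from the slides.

    /* Read-only access to a view; presumably concurrent readers are allowed. */
    extern void acquire_Rview(int view_id);
    extern void release_Rview(int view_id);

    #define View_B 2
    extern double b[100];            /* data objects belonging to View_B */

    double sum_view_b(void)
    {
        double sum = 0.0;
        acquire_Rview(View_B);       /* read-only access to View_B */
        for (int i = 0; i < 100; i++)
            sum += b[i];
        release_Rview(View_B);
        return sum;
    }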

7  March 17, 2006  Zhiyi's RSL  Example
A VOPP program for a producer/consumer problem:

    if (prod_id == 0) {
        acquire_view(1);
        produce(x);
        release_view(1);
    }
    barrier(0);
    acquire_Rview(1);
    consume(x);
    release_Rview(1);

8  March 17, 2006  Zhiyi's RSL  Advantages of VOPP
- Keeps the convenience of shared-memory programming.
- The focus is on data partitioning and data access instead of data races and mutual exclusion.
- View primitives automatically achieve mutual exclusion.
- View primitives are not an extra burden.
- The programmer can finely tune the parallel algorithm by careful view partitioning.

9  March 17, 2006  Zhiyi's RSL  Philosophy of VOPP
- Shared memory is a critical resource that needs to be used with care.
- If there is no need to use shared memory, don't use it.
- Justification is expected before a view is created.

10  March 17, 2006  Zhiyi's RSL  VOPP vs. MPI
- Easier for programmers than MPI: for problems like task queues, programming with MPI is horrific.
- Can mimic any finely-tuned MPI program:
  - shared message → view
  - send/recv → acquire_view
- Essential differences:
  - a view is location transparent
  - there are more barriers in VOPP
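Below is a hedged sketch of that mapping in C: the same one-to-one data transfer written with MPI send/recv and with a view. The rank ids, the view identifier, the array x, and the barrier follow the style of the example on slide 7; the VOPP prototypes are declarations for illustration only.

    #include <mpi.h>

    extern void acquire_view(int view_id);
    extern void release_view(int view_id);
    extern void acquire_Rview(int view_id);
    extern void release_Rview(int view_id);
    extern void barrier(int id);

    double x[100];                   /* MPI: every rank has its own copy; VOPP: x lives in view 1 */

    void transfer_mpi(int rank)      /* message passing: the data location is explicit */
    {
        if (rank == 0)
            MPI_Send(x, 100, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(x, 100, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    void transfer_vopp(int rank)     /* view: location transparent, but one extra barrier */
    {
        if (rank == 0) {
            acquire_view(1);
            /* produce x here */
            release_view(1);
        }
        barrier(0);                  /* ensure the producer has released the view */
        acquire_Rview(1);
        /* consume x here */
        release_Rview(1);
    }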

11  March 17, 2006  Zhiyi's RSL  Implementation
- VODCA: View-Oriented, Distributed, Cluster-based Approach to parallel computing.
- VODCA version 1.0 has been released as open source software.
- It is a library that runs in user space.
- It is based on View-based Consistency and uses an efficient consistency protocol, VOUPID.

12  March 17, 2006  Zhiyi's RSL  View-based Consistency
Condition for View-based Consistency: before a processor Pi is allowed to access a view by calling acquire_view or acquire_Rview, all previous write accesses to data objects of the view must be performed with respect to Pi according to their causal order.
In VOPP, barriers are only used for synchronization and have nothing to do with consistency maintenance for DSM.
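A minimal sketch of what this condition means for a program, using the primitives from slides 6 and 7. The two views, the variables a and b, and the process ids are illustrative assumptions, not taken from the slides.

    extern void acquire_view(int view_id);
    extern void release_view(int view_id);
    extern void barrier(int id);

    extern int a;                    /* belongs to view 1 */
    extern int b;                    /* belongs to view 2 */

    void worker(int id)
    {
        if (id == 0) {
            acquire_view(1); a = 1; release_view(1);
            acquire_view(2); b = 2; release_view(2);
        }
        barrier(0);                  /* synchronization only: no consistency work here */
        if (id == 1) {
            acquire_view(1);         /* only now must the write a = 1 be performed with
                                        respect to process 1; the write to view 2 (b = 2)
                                        need not be propagated to it, since it never
                                        acquires view 2 */
            /* use a */
            release_view(1);
        }
    }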

13  March 17, 2006  Zhiyi's RSL  Consistency protocols
These protocols are page-based.
- Update protocol: copies of a page are modified immediately.
- Invalidation protocol: a write notice is used to invalidate a page; when the page is accessed, a page fault causes the fetch of diffs, which are applied to the page.
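The sketch below illustrates the invalidation side of this description. The data structures and the fetch_diffs call are assumptions made for illustration; they are not VODCA's or TreadMarks' actual internals.

    #include <stddef.h>
    #include <string.h>

    #define PAGE_SIZE 4096

    struct diff { size_t offset; size_t len; unsigned char data[PAGE_SIZE]; };
    struct page { unsigned char contents[PAGE_SIZE]; int valid; };

    void on_write_notice(struct page *p)   /* received at synchronization time */
    {
        p->valid = 0;                      /* invalidate only; nothing is fetched yet */
    }

    /* Hypothetical request to the writer(s) for the diffs of this page. */
    extern int fetch_diffs(struct page *p, struct diff *out, int max);

    void on_page_fault(struct page *p)     /* first access after invalidation */
    {
        struct diff diffs[8];
        int n = fetch_diffs(p, diffs, 8);
        for (int i = 0; i < n; i++)        /* apply the diffs to bring the copy up to date */
            memcpy(p->contents + diffs[i].offset, diffs[i].data, diffs[i].len);
        p->valid = 1;
    }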

14  March 17, 2006  Zhiyi's RSL  Consistency protocols (cont.)
Home-based protocol: based on the invalidation protocol, but
- for each page, one copy is designated as its home;
- when a diff is created, it is applied to the home copy immediately;
- when the page is accessed, a page fault causes the fetch of the home copy (pro: this resolves the diff accumulation problem).
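A short sketch of the home-based variant, under the same illustrative assumptions as above; the helpers send_diff_to_home and fetch_home_copy are invented names.

    #include <stddef.h>
    #include <string.h>

    #define PAGE_SIZE 4096

    struct diff { size_t offset; size_t len; unsigned char data[PAGE_SIZE]; };
    struct page { unsigned char contents[PAGE_SIZE]; int valid; };

    extern void send_diff_to_home(int page_id, const struct diff *d);        /* hypothetical */
    extern void fetch_home_copy(int page_id, unsigned char out[PAGE_SIZE]);  /* hypothetical */

    void writer_made_diff(int page_id, const struct diff *d)
    {
        send_diff_to_home(page_id, d);              /* the home applies it immediately */
    }

    void home_applies_diff(struct page *home, const struct diff *d)
    {
        memcpy(home->contents + d->offset, d->data, d->len);
    }

    void on_page_fault(int page_id, struct page *local)
    {
        fetch_home_copy(page_id, local->contents);  /* one whole-page fetch, no diff chain */
        local->valid = 1;
    }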

15  March 17, 2006  Zhiyi's RSL  The VOUPID protocol
- View-Oriented Update Protocol with Integrated Diff.
- Based on the update protocol.
- Diffs of a page of a view are merged into a single diff.
- The single diff is used to update the page when the view is acquired.
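The sketch below shows one way the diff integration could look; the data layout is an assumption for illustration, not the actual VOUPID representation.

    #include <stddef.h>
    #include <string.h>

    #define PAGE_SIZE 4096

    /* Accumulated ("integrated") diff for one page of one view. */
    struct integrated_diff {
        unsigned char data[PAGE_SIZE];      /* latest value of every modified byte */
        unsigned char touched[PAGE_SIZE];   /* 1 where some diff has written */
    };

    void merge_diff(struct integrated_diff *acc,
                    size_t offset, const unsigned char *bytes, size_t len)
    {
        memcpy(acc->data + offset, bytes, len);   /* later diffs overwrite earlier ones */
        memset(acc->touched + offset, 1, len);
    }

    void apply_on_acquire(unsigned char page[PAGE_SIZE],
                          const struct integrated_diff *acc)
    {
        for (size_t i = 0; i < PAGE_SIZE; i++)    /* one merged update per page per acquire */
            if (acc->touched[i])
                page[i] = acc->data[i];
    }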

16  March 17, 2006  Zhiyi's RSL  Experiment
- We use a cluster computer at Tsinghua University: 128 Itanium 2s running Linux 2.4, connected by InfiniBand. Each node has two 1.3 GHz processors and 4 GB of RAM. We run two processes on each node.
- We used four applications: Integer Sort (IS), Gauss, Successive Over-Relaxation (SOR), and Neural Network (NN).

17  March 17, 2006  Zhiyi's RSL  Related systems
- TreadMarks (TMK) is a state-of-the-art Distributed Shared Memory system based on traditional parallel programming.
- Message Passing Interface (MPI) is a standard for message-passing-based parallel programming. We used LAM/MPI.

18  March 17, 2006  Zhiyi's RSL  Performance of NN

19  March 17, 2006  Zhiyi's RSL  Performance of IS

20  March 17, 2006  Zhiyi's RSL  Performance of SOR

21  March 17, 2006  Zhiyi's RSL  Performance of Gauss

22  March 17, 2006  Zhiyi's RSL  Future work on VOPP
- More benchmarks/applications.
- Performance evaluation on larger clusters.
- Optimized implementation of barriers for VOPP.
- More auxiliary utilities for VOPP programmers.
- A view-based debugger for VOPP.
- A fault-tolerant system for VODCA.

23  March 17, 2006  Zhiyi's RSL  Questions?

