Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 The VAMPIR and PARAVER performance analysis tools applied to a wet chemical etching parallel algorithm S. Boeriu 1 and J.C. Bruch, Jr. 2 1 Center for.

Similar presentations


Presentation on theme: "1 The VAMPIR and PARAVER performance analysis tools applied to a wet chemical etching parallel algorithm S. Boeriu 1 and J.C. Bruch, Jr. 2 1 Center for."— Presentation transcript:

1 1 The VAMPIR and PARAVER performance analysis tools applied to a wet chemical etching parallel algorithm S. Boeriu 1 and J.C. Bruch, Jr. 2 1 Center for Computational Science and Engineering 2 Department of Mechanical and Environmental Engineering and Department of Mathematics University of California, Santa Barbara http://www.engineering.ucsb.edu/~hpscicom

2 2 Acknowledgements This material is based upon work supported by the National Science Foundation under Grant #0086262. This research was conducted using the resources of the San Diego Supercomputer Center. http://www.npaci.edu/Horizon/guide_linked/bh_tools_txt.html

3 3 Outline of Presentation 1.Introduction (Physical problem) 2.Problem formulation 3.Fixed domain formulation 4.Numerical algorithm 5.Test case 6.Performance tools and considerations a. VAMPIR b. PARAVER 7.Diagnostic example 8.Conclusions

4 4 Physical problem A gap of width “2a” and length “L” is to be etched in a flat plate. The remainder of the plate is covered with a protective (photoresist) layer. Since it is assumed that L >>2a, the problem can be considered as two-dimensional. Figure 1. Physical problem

5 5 Simplifying assumptions i. There is no convection in the etching medium ii. The etching process is isotropic iii. The thickness of the photoresist layer is infinitely small iv. Only one component of the etching liquid determines the process

6 6 Problem formulation Mathematical model: The etching fluid  (t) is bounded by the outer boundary  1 ; the photoresist layer   (t); and the moving boundary S(t). D\  (t) denotes part of the solid. Figure 2. Side view of physical problem showing mathematical problem setup.

7 7 Fixed domain formulation Figure 3. Fixed domain mathematical formulation.

8 8 Numerical Algorithm The basic numerical algorithm is: with inand

9 9 Numerical algorithm (cont.) with in (the rectangular region of the plate’s cross section)

10 10 Test case Maxrow1 (# of rows in the top region) = 280 Maxcol1 (# of columns in the top region) = 321 Maxrow2 (# of rows in the bottom region) = 80 Maxcol2 (# of columns in the bottom region) = 161 Maxtime (# of time steps) = 5  t (size of time steps) = 1  (successive over-relaxation factor) = 1.935 B (non-dimensional number) = 10.0

11 11 Domain decomposition Figure 5. Domain decomposition of mathematical problem into sixteen subregions showing the flow of computations.

12 12 Load balancing information for the test case Processors 248163264 Bottom Processors 111248 Bottom Points 12888 644032201610 Top Processors 137142856 Top Points 89880 3017412840642032101605 Diff Points 77000172944020105

13 13 Figure 4. Ideal versus obtained speedup

14 14 Figure 6. Moving boundaries at various times.

15 15 Performance tools and considerations The parallel program is monitored while it is executed. Monitoring produces performance data that is interpreted in order to reveal areas of poor performance. The program is then altered and the process is repeated until an acceptable level of performance is reached.

16 16 VAMPIR (Visualization and Analysis of MPI Resources – 2.0) VAMPIR 2.0 is a post-mortem trace visualization tool from Pallas GmbH http://www.pallas.com It uses the profile extensions to MPI and permits analysis of the message events where data is transmitted between processors during execution of a parallel program. It has a convenient user-interface and an excellent zooming and filtering. Global displays show all selected processes.

17 17 Global Timeline: detailed application execution over time axis Activity Chart: presents per-process profiling information Summaric Chart: aggregated profiling information Communication Statistics: message statistics for each process pair Global Communication Statistics: collective operations statistics I/O Statistics: MPI I/O operation statistics Calling Tree: global dynamic calling tree

18 18

19 19

20 20

21 21

22 22

23 23

24 24

25 25

26 26

27 27 PARAVER (Parallel Program Visualization and Analysis Tool) PARAVER is a flexible parallel program visualization and analysis tool based on an easy-to-use Motif GUI (graphical user interface) PARAVER was developed to respond to the basic need to have a qualitative perception of the application behavior by visual inspection and then to be able to focus on the detailed quantitative analysis of the problems.

28 28 Paraver (Parallel Program Visualization and Analysis Tool) Powerful flexible parallel program visualization tool based on an easy-to-use Motif GUI (graphical user interface) Developed by : European Center for Parallelism of Barcelona (CEPBA) Universitat Politecnica de Catalunya http://www.cepba.upc.es/

29 29 Paraver is designed to visualize and analyze - Communication and load balance - Combining OpenMP and MPI - Hardware performance and counters Usage - Compile programs with special libraries - Run programs to produce trace files - View and analyze traces - Designed to help in program understanding and optimization

30 30

31 31

32 32

33 33

34 34

35 35

36 36

37 37

38 38 Inefficient programming example Load imbalance (inefficient memory use) Cache misses and page faults Stride minimization (efficient memory use)

39 39 Load imbalance - VAMPIR Figure 7. Load imbalance in the plate region.

40 40 Load imbalance - PARAVER Figure 8. Load imbalance in the plate region.

41 41 The memory hierarchy

42 42 Array Allocation

43 43 Example of coding Figure 9. A piece of the etching code (non-optimized on the left and optimized on the right). c- c- calculate error in the top region and update u1old do 370 i = iam*numrows1 + 1,lastrow do 380 j = 2,maxcol1 if (abs(u1new(i,j) - u1old(i,j)).gt.err) then err = abs(u1new(i,j) - u1old(i,j)) endif u1old(i,j) = u1new(i,j) 380 continue 370 continue c- do 380 j = 2,maxcol1 do 370 i = iam*numrows1 + 1,lastrow if ( abs(u1new(i,j) - u1old(i,j)).gt. err) then err = abs(u1new(i,j) - u1old(i,j)) endif u1old(i,j) = u1new(i,j) 370 continue 380 continue c-

44 44 Load balance - VAMPIR Figure 10. Approximate load balance.

45 45 Load balance - PARAVER Figure 11. Approximate load balance.

46 46 Conclusions A significant factor that affects the performance of a parallel application is the balance between communication and workload. The challenge of the message passing model is in reducing message traffic over the interconnection network. To fully understand the performance behavior of such applications, analysis and visualization tools are needed. Two such tools, VAMPIR and PARAVER, were used to analyze the performance of the etching application. It was seen that optimization of the parallel code can be carried out in an iterative process involving these tools to investigate performance issues.

47 47 Web Sites Project site http://www.engineering.ucsb.edu/~hpscicom San Diego Supercomputer Center http://www.npaci.edu/Horizon/guide_linked/bh_tools_txt. html VAMPIR http://www.pallas.com PARAVER http://www.cepba.upc.es/


Download ppt "1 The VAMPIR and PARAVER performance analysis tools applied to a wet chemical etching parallel algorithm S. Boeriu 1 and J.C. Bruch, Jr. 2 1 Center for."

Similar presentations


Ads by Google