1 ANSYS, Inc. Proprietary © 2004 ANSYS, Inc. Chapter 5 Distributed Memory Parallel Computing v9.0

2 ANSYS, Inc. Proprietary © 2004 ANSYS, Inc. October 1, 2004 Inventory #002156 9.0 New Features
Why Distributed Memory Parallel Computing?
1. Do you want to process large (>2 MDOF) linear models faster?
2. Are you creating models larger than available hardware resources, especially 32-bit resources, can effectively solve?
3. Do you want to reduce the time to completion of long-running nonlinear analyses?
4. Do you want to increase overall throughput by processing all jobs as fast as possible?
5. Would you like to change your CAE process to maximize throughput by minimizing engineering time?

3 Distributed Memory Parallel Computing Overview
– Distributed ANSYS (D-ANSYS): What is it? Benefits! Supported features! Timing results!
– Parallel Performance for ANSYS: available solvers, licensing
– Distributed memory requirements
– New boundaries: what's possible with parallel computing?
– Questions

4 Distributed ANSYS (D-ANSYS): What is it?
All of the ANSYS solution routines executing in parallel in distributed memory:
– Uses MPI (Message Passing Interface) as the communication middleware
– Uses the network interconnect for communication between distributed-memory machines
– Treats shared memory the same way as distributed memory when run on a shared-memory system
All phases of the solution process are performed in parallel:
– Element formulation (stiffness matrix generation)
– Matrix solution (linear equation solving)
– Stress recovery (results calculation)
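The slide names the pieces of the pattern (MPI middleware, per-process element formulation, solution, and recovery). As a language-agnostic illustration of that pattern only, not of ANSYS internals, here is a toy Python sketch in which each worker processes its own partition of elements and a master combines the partial results (the role an MPI reduction plays):

```python
# Illustrative sketch only, NOT ANSYS code: a toy distributed solve.
from concurrent.futures import ThreadPoolExecutor

def form_and_solve(partition):
    # Toy "element formulation + solution" for one partition:
    # each element contributes its squared id to a local result.
    return sum(e * e for e in partition)

def distributed_solve(elements, nprocs=4):
    # Partition the elements across nprocs workers.
    chunks = [elements[i::nprocs] for i in range(nprocs)]
    # Each worker computes on its own partition in parallel.
    with ThreadPoolExecutor(max_workers=nprocs) as pool:
        local_results = list(pool.map(form_and_solve, chunks))
    # The master combines partial results, standing in for MPI communication.
    return sum(local_results)

elems = list(range(1, 101))
print(distributed_solve(elems))  # same answer as the serial loop: 338350
```

The key property the slides emphasize carries over: the answer is independent of how the work is partitioned, so adding workers changes only the wall-clock time.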

5 Distributed ANSYS (D-ANSYS): What is it? (continued)
Treats shared memory and distributed memory in exactly the same way, so you may run Distributed ANSYS:
– on a single machine with multiple processors, or
– on multiple machines with one or more processors each (a sustained 20 MB/sec interconnect)
Hardware platforms supported at release:
– Unix: HP-UX
– Unix: SGI IRIX
– Linux 32-bit: Intel IA-32
– Linux 64-bit: Intel Itanium IA-64
– Linux 64-bit: AMD Opteron (Beta)

6 Distributed ANSYS (D-ANSYS): Benefits!
The whole solution phase runs in parallel!
– All of the ANSYS /SOLUTION phase is now parallel, including stiffness matrix generation, linear equation solving, and results calculation.
– More of the analysis is performed in parallel, so less wall-clock time is required to perform an analysis.
– It is scalable: between 2X and 8X speedup on 2 to 16 processors!
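The quoted speedup range is consistent with a large but not total parallel fraction. As a back-of-the-envelope check (our illustration, not a figure from the slides), Amdahl's law relates the parallel fraction to the achievable speedup:

```python
def amdahl_speedup(parallel_fraction, nprocs):
    """Amdahl's law: speedup when fraction p of the work
    is parallelized across n processors."""
    p = parallel_fraction
    return 1.0 / ((1.0 - p) + p / nprocs)

# If roughly 90% of the run is parallel, 16 processors give ~6.4X,
# in line with the 6X-8X reported later for 12-16 processors.
print(round(amdahl_speedup(0.90, 16), 1))  # 6.4
```

This also explains why speedup flattens as processors are added: the serial remainder (I/O, setup, the master-only work) caps the gain.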

7 Distributed ANSYS (D-ANSYS): Supported features
Supported solvers:
– Distributed PCG solver (EQSLV,DPCG)
– Distributed JCG solver (EQSLV,DJCG)
– Distributed Sparse solver (EQSLV,DSPARSE)
  - Factorization of the matrix and back/forward substitution are done in distributed parallel mode
  - Best performance we have observed is 6X to 8X speedup on 12 to 16 processors
– The existing shared-memory Sparse solver (EQSLV,SPARSE) can also be used
  - The solver itself runs only on the master process (the other solution phases still run in distributed parallel)
  - It may be run in shared-memory parallel mode on the master machine (/CONFIG,NPROC,N)

8 Distributed ANSYS (D-ANSYS): Supported features (continued)
Analysis types supported:
– Structural analyses (single field)
  - Any single-field structural problem
  - Any combination of UX, UY, UZ, ROTX, ROTY, ROTZ, and WARP DOFs
  - Linear static analysis
  - Nonlinear static analysis
  - Full transient analysis
– Thermal analyses (single field)
  - Temperature degree of freedom only
  - Steady-state thermal
  - Full transient thermal

9 Distributed ANSYS (D-ANSYS): Supported features (continued)
Structural nonlinearities supported:
– Large strain, large deflection (NLGEOM,ON) for all structural elements
– Nonlinear material properties specified by the TB command
– Contact nonlinearities modeled by contact elements 169 through 178 and element 52
– Gasket elements (192-195)
– Pre-tension element (179)
– Note: 18X elements with U/P formulations, and contact elements 169 through 178, are ONLY supported by the shared-memory sparse solver

10 Distributed ANSYS (D-ANSYS): Timing Results! DPCG within Distributed ANSYS: 44 MDOF engine block stress analysis

11 Distributed ANSYS (D-ANSYS): Timing Results! DPCG within Distributed ANSYS: 7.1 MDOF stress analysis

12 Distributed ANSYS (D-ANSYS): Timing Results! DSPARSE within Distributed ANSYS: 3.5 MDOF stress analysis

13 Distributed ANSYS (D-ANSYS): Timing Results! DSPARSE within Distributed ANSYS: 0.8 MDOF stress analysis

14 Distributed ANSYS (D-ANSYS): Timing Results! DSPARSE within Distributed ANSYS: 1.7 MDOF nonlinear stress analysis, 9 Newton iterations

15 Parallel Performance for ANSYS: Available Solvers
Distributed ANSYS solvers:
– All three phases distributed:
  - Distributed Preconditioned Conjugate Gradient (DPCG)
  - Distributed Jacobi Conjugate Gradient (DJCG)
  - Distributed Sparse (DSPARSE)
– Only element formulation and stress recovery distributed:
  - Shared-memory Sparse (SPARSE)

16 Parallel Performance for ANSYS: Available Solvers (continued)
ANSYS solvers (sequential ANSYS):
– Only the matrix solution distributed:
  - Distributed Preconditioned Conjugate Gradient (DPCG)
  - Distributed Jacobi Conjugate Gradient (DJCG)
  - Distributed Domain Solver (DDS)
– Only the matrix solution distributed, on shared memory:
  - Algebraic Multi-Grid solver (AMG)

17 Parallel Performance for ANSYS: Licensing
Licensing is per ANALYSIS, NOT per CPU!
– One ANSYS license per analysis, covering the master/parent process (prep/post, meshing, CAD)
– One Parallel Performance for ANSYS (PPFA) license per analysis, covering the computational solver processes
For example, a solve spread across four machines with four CPUs each (16 CPUs total) still needs only one ANSYS license plus one PPFA license.

18 Parallel Performance for ANSYS: Distributed Memory Requirements
How much memory is required?
– Machine #1 must hold its own workload plus the entire preconditioner.
– Machines 2 through N need to hold only their own workload.
– Rules of thumb for PCG/DPCG:
  - 1 GB of memory per 1 million DOF
  - 100 MB per 1 million DOF for the preconditioner
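The rules of thumb above can be turned into a quick per-machine estimate. This is our own sketch based only on the two quoted rules (1 GB per MDOF of workload, 100 MB per MDOF for the preconditioner) and the MSAVE factors given on the following slides; it is not the exact ANSYS formula:

```python
def memory_per_machine_gb(total_mdof, nmachines, msave_factor=1.0):
    """Rough per-machine memory estimate for PCG/DPCG, in GB.

    Assumes the workload (1 GB per MDOF, scaled by the MSAVE factor)
    splits evenly across machines, and that machine #1 additionally
    holds the entire preconditioner (0.1 GB per MDOF).
    """
    workload = total_mdof / nmachines * 1.0 * msave_factor
    preconditioner = total_mdof * 0.1
    return {"machine_1": workload + preconditioner,
            "machines_2_to_n": workload}

# A 4 MDOF model on 4 machines, MSAVE,OFF:
est = memory_per_machine_gb(4.0, 4)
print(est)  # machine_1 ~1.4 GB, the other machines ~1.0 GB each
```

Note the asymmetry the slide calls out: machine #1 is always the memory bottleneck because the preconditioner is not distributed.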

19 Parallel Performance for ANSYS: Distributed Memory Requirements (continued)
For a given amount of memory, how big a problem can be solved?
– For Machine #1: (formula not reproduced in transcript)
– For Machines 2 through N: (formula not reproduced in transcript)
where MSAVE_Factor = 1.0 for MSAVE,OFF; 0.7 for SOLID95 with MSAVE,ON; 0.5 for SOLID92 with MSAVE,ON

20 Parallel Performance for ANSYS: Distributed Memory Requirements (continued)
For a given problem, how much memory is needed?
– For Machine #1: (formula not reproduced in transcript)
– For Machines 2 through N: (formula not reproduced in transcript)
where MSAVE_Factor = 1.0 for MSAVE,OFF; 0.7 for SOLID95 with MSAVE,ON; 0.5 for SOLID92 with MSAVE,ON

21 Parallel Performance for ANSYS: Distributed Memory Requirements (continued)
How big a problem can be solved?
– Example 1: 32-bit PCs with 1 GB RAM each (including Machine #1); 8 machines in total; MSAVE,OFF
– For Machine #1: (worked calculation not reproduced in transcript)

22 Parallel Performance for ANSYS: Distributed Memory Requirements (continued)
How big a problem can be solved?
– Example 2: 32-bit PCs with 2.2 GB RAM available on each (with the /3GB switch), including Machine #1; 4 machines in total; MSAVE,ON; model consisting wholly of SOLID92 elements
– For Machine #1: (worked calculation not reproduced in transcript)
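The slides' worked numbers are not in this transcript, but under the earlier rules of thumb the limiting machine is Machine #1, and a maximum model size can be estimated by solving for the DOF count at which its memory is exhausted. A sketch under those assumptions only (not the slide's formula):

```python
def max_mdof(mem_machine1_gb, nmachines, msave_factor=1.0):
    """Largest model (in MDOF) that fits, assuming Machine #1 limits:
    mem = D/N * msave_factor   (workload, 1 GB per MDOF)
        + D * 0.1              (entire preconditioner, 100 MB per MDOF)
    solved for D."""
    return mem_machine1_gb / (msave_factor / nmachines + 0.1)

# Example 1 assumptions: 1 GB per machine, 8 machines, MSAVE,OFF
print(round(max_mdof(1.0, 8), 1))       # about 4.4 MDOF
# Example 2 assumptions: 2.2 GB, 4 machines, MSAVE,ON SOLID92 (factor 0.5)
print(round(max_mdof(2.2, 4, 0.5), 1))  # about 9.8 MDOF
```

The Example 2 estimate of roughly 10 MDOF is at least consistent in scale with the 10.5 MDOF 32-bit cluster run shown later in this chapter.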

23 What's Possible with Parallel Computing: New Boundaries, 111 MDOF
Wing test case model:
– 37,072,698 nodes
– 8,744,744 SOLID95 elements
– 111,218,094 degrees of freedom!
– 14,481 DOF constraints
– 1 load case
System resources:
– Used 6 CPUs out of 8 on an SGI Altix system
– Used 50 GB of the 64 GB available
– Linux 64-bit operating system

24 What's Possible with Parallel Computing: New Boundaries, 111 MDOF
DPCG solves 111 million DOFs!
– 6-CPU run on an 8-CPU SGI Altix system
– 8.6 hours solver time
– 6 distributed MPI processes
– Using the MSAVE,ON memory-saving option

Number of Hosts/Processors: 6
Degrees of Freedom: 111218094
Elements: 8744744 (Assembled: 0, Implicit: 8744744)

DETAILS OF PCG SOLVER SOLUTION TIME (secs)     CPU         Wall
  Element Matrix Assembly                     750.48     1219.34
  Preconditioner Construction                 676.60     1087.32
  Preconditioner Factoring                      9.27       14.45
  Preconditioned CG Iterations              26121.80    26310.64
    Multiply With A                         11713.69    11898.36
    Solve With Precond                       9331.68     9331.00
TOTAL PCG SOLVER SOLUTION CP TIME      = 177799.24 secs
TOTAL PCG SOLVER SOLUTION ELAPSED TIME =  30819.09 secs
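The headline figures follow directly from the log; a quick arithmetic check of the elapsed time and the implied processor utilization:

```python
elapsed_s = 30819.09   # total PCG solver elapsed (wall) time from the log
cp_s = 177799.24       # total CPU time summed across all processes

print(round(elapsed_s / 3600, 1))   # 8.6 -> the quoted 8.6 hours of solver time
print(round(cp_s / elapsed_s, 1))   # 5.8 -> near-full use of the 6 MPI processes
```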

25 What's Possible with Parallel Computing: New Boundaries, 111 MDOF
The enablers:
– Size is conquered by addressing big memory
– Time is conquered by parallel computing
The measure of success:
– Demonstrated solving a variety of large problems in hours, NOT days!
– Solved a 111 MDOF structural problem in 8.6 hours of solver time

26 What's Possible with Parallel Computing: New Boundaries, 111 MDOF
CAE Breakthrough: A Customer's Perspective
"ANSYS' ability to solve models this large opens the door to an entirely new simulation paradigm. … Now, it will be possible to simulate a detailed, complete model directly, potentially shortening design time from months to weeks. … This may greatly reduce additional design costs and can provide an even shorter time to market."
Jin Qian, Senior Analyst, Deere & Company Technical Center

27 What's Possible with Parallel Computing: New Boundaries, 32-bit, 10.5 MDOF
Inventor test case model:
– 3,500,550 nodes
– 2,405,636 elements
– 18,993 DOF constraints
– 10.5 million degrees of freedom!
System resources:
– 11 Xeon CPUs on a Linux Networx cluster
– Machine #1 used 2.4 GB of memory (/3GB switch)
– Total memory: 9.0 GB
– Linux 32-bit operating system

28 What's Possible with Parallel Computing: New Boundaries, 32-bit, 10.5 MDOF
Distributed ANSYS and DPCG solve 10.5 MDOFs!
– 11-CPU run on a Linux Networx cluster with 32-bit Linux
– 1.05 hours solver time
– 11 distributed MPI processes
– Using the MSAVE,ON memory-saving option

Number of Hosts/Processors: 11
Degrees of Freedom: 10501650
Elements: 2405636 (Assembled: 336114, Implicit: 2069522)
Nodes: 3500550

DETAILS OF PCG SOLVER SOLUTION TIME (secs)     CPU         Wall
  Element Matrix Assembly                      40.78       51.19
  Preconditioner Construction                 260.66      375.64
  Preconditioned CG Iterations               3321.51     3358.00
    Multiply With A                          1915.64     1915.72
    Solve With Precond                        448.09      448.12
TOTAL PCG SOLVER SOLUTION CP TIME      = 40126.00 secs
TOTAL PCG SOLVER SOLUTION ELAPSED TIME =  3764.74 secs

