
Slide 1: Parallel Performance of Hierarchical Multipole Algorithms for Inductance Extraction (HiPC 2004)
Ananth Grama, Purdue University
Vivek Sarin, Texas A&M University
Hemant Mahawar, Texas A&M University
Acknowledgements: National Science Foundation

Slide 2: Outline
- Inductance Extraction
- Underlying Linear System
- The Solenoidal Basis Method
- Hierarchical Algorithms
- Parallel Formulations
- Experimental Results

Slide 3: Inductance Extraction
- Inductance: the property of an electric circuit to oppose a change in its current; an electromotive force (emf) is induced.
- Self inductance, and mutual inductance between conductors.
- Inductance extraction:
  - Signal delays in circuits depend on the parasitic R, L, and C.
  - At high frequencies, signal delays are dominated by parasitic inductance.
  - Goal: accurate estimation of the inductive coupling among circuit components.
(Image credit: oea.com)

Slide 4: Inductance Extraction (continued)
- For a set of s conductors, compute the s x s impedance matrix Z.
- Z captures the self and mutual impedances among the conductors.
- Each conductor is discretized with a uniform two-dimensional mesh of filaments for accurate impedance calculation.

Slide 5: Constraints
- Current density at a point inside a conductor.
- Voltage drop across filaments in terms of filament current and voltage.
- Kirchhoff's current law at the nodes.
- Potential differences across filaments in terms of node voltages.
- The inductance matrix is a function of 1/r.
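The slide's equations were images and did not survive extraction. The following is a hedged reconstruction based on the standard partial-inductance (FastHenry-style) formulation; the symbols B, I_f, V_f, V_n, and a_i are introduced here, not taken from the slide:

```latex
% Hedged reconstruction of the constraints; notation is ours, not the slide's.
\begin{align*}
  \frac{J(r)}{\sigma} + \frac{j\omega\mu}{4\pi}\int_{V}
      \frac{J(r')}{\lVert r - r'\rVert}\, dV' &= -\nabla\Phi(r)
      && \text{current density at a point}\\
  (R + j\omega L)\, I_f &= V_f
      && \text{voltage drop across filaments}\\
  B^{T} I_f &= 0
      && \text{Kirchhoff's current law at nodes}\\
  V_f &= B\, V_n
      && \text{filament drops from node voltages}\\
  L_{ij} &= \frac{\mu}{4\pi a_i a_j}\int_{V_i}\!\int_{V_j}
      \frac{\hat{l}_i \cdot \hat{l}_j}{\lVert r - r'\rVert}\, dV'\, dV
      && \text{inductance as a function of } 1/r
\end{align*}
```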

Slide 6: Linear System
- System matrix characteristics: R is diagonal, B is sparse, L is dense.
- Solution method:
  - Iterative methods (GMRES).
  - Dense matrix-vector products with L via hierarchical methods: a matrix-free approach.
- Challenge: effective preconditioning in the absence of an explicit system matrix.
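Assembled, the constraints on the previous slide yield a saddle-point system. This is a hedged reconstruction (signs depend on orientation conventions) consistent with the slide's characterization of R, B, and L, with b standing in for the source excitation:

```latex
\begin{bmatrix} R + j\omega L & B \\ B^{T} & 0 \end{bmatrix}
\begin{bmatrix} I_f \\ V_n \end{bmatrix}
=
\begin{bmatrix} b \\ 0 \end{bmatrix}
```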

Slide 7: Solenoidal Basis Method
- Linear system with a modified right-hand side.
- Solenoidal basis:
  - Automatically satisfies the conservation law (Kirchhoff's current law).
  - Mesh currents form a basis for the filament currents.
- Solenoidal basis matrix P: any current of the form I_f = Px obeys Kirchhoff's law.
- Substituting and projecting yields a reduced system (sketched below).
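A hedged sketch of the reduction, assuming P spans the null space of B^T (the mesh currents) and x denotes the mesh-current unknowns:

```latex
% Columns of P are mesh currents: any I_f = Px satisfies Kirchhoff's law.
B^{T} P = 0, \qquad I_f = P x .
% Projecting the filament equations onto the solenoidal space
% eliminates the node voltages V_n:
P^{T} (R + j\omega L) P\, x = P^{T} b .
```

The reduced system is dense because of L, which is why the iterative solver on the previous slide needs a fast, matrix-free matrix-vector product.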

Slide 8: Problem Size
Number of unknowns for the ground-plane problem:

Mesh     Potential Nodes  Current Filaments  Linear System  Solenoidal Functions
33x33    1,089            2,112              3,201          1,024
65x65    4,225            8,320              12,545         4,096
129x129  16,641           33,024             49,665         16,384
257x257  66,049           131,584            197,633        65,536
513x513  263,169          525,312            788,481        262,144

Slide 9: Hierarchical Methods
- A matrix-vector product with an n x n dense matrix costs O(n^2).
- Faster, matrix-free matrix-vector products:
  - Appel's algorithm and the Barnes-Hut method: particle-cluster interactions, O(n lg n).
  - Fast Multipole Method: cluster-cluster interactions, O(n).
- Hierarchical refinement of the underlying domain: quad-tree in 2-D, oct-tree in 3-D.
- These methods rely on decaying 1/r kernel functions.
- They compute an approximate matrix-vector product, trading accuracy for speed.

Slide 10: Hierarchical Methods (continued)
Fast Multipole Method (FMM):
- Divides the domain recursively into eight sub-domains (an oct-tree).
- The upward traversal computes multipole coefficients that summarize the effect of all points inside a node at far-away points.
- The downward traversal computes local coefficients that accumulate the effect of all far-away points at points inside a node.
- Direct interactions are used for nearby points.
- Computational complexity: O((d+1)^4 N), where d is the multipole degree.
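To make the traversal structure concrete, here is a minimal C++ sketch of the upward pass alone, under a strong simplification: each node keeps only a degree-0 expansion (total source strength and its weighted center), whereas a real FMM stores O((d+1)^2) coefficients per node and adds the downward pass and interaction lists. All types and names are illustrative, not taken from the talk:

```cpp
// Hedged sketch: FMM upward traversal with degree-0 (monopole) expansions.
#include <array>
#include <cmath>
#include <cstdio>
#include <memory>
#include <vector>

struct Point { std::array<double, 3> x; double q; };  // position, source strength

struct Node {
    std::vector<Point> pts;                    // particles (leaf nodes only)
    std::array<std::unique_ptr<Node>, 8> kid;  // oct-tree children
    double Q = 0.0;                            // degree-0 coefficient: total strength
    std::array<double, 3> center = {0.0, 0.0, 0.0};  // strength-weighted center
};

// Post-order traversal: finish the children, then shift their expansions
// (trivially, for degree 0) into the parent's expansion.
void upward(Node& n) {
    bool leaf = true;
    for (auto& c : n.kid) if (c) leaf = false;
    if (leaf) {
        for (const Point& p : n.pts) {
            n.Q += p.q;
            for (int k = 0; k < 3; ++k) n.center[k] += p.q * p.x[k];
        }
    } else {
        for (auto& c : n.kid) {
            if (!c) continue;
            upward(*c);                          // children first
            n.Q += c->Q;                         // merge child expansion into parent
            for (int k = 0; k < 3; ++k) n.center[k] += c->Q * c->center[k];
        }
    }
    if (n.Q != 0.0)
        for (int k = 0; k < 3; ++k) n.center[k] /= n.Q;  // normalize the center
}

// Far-field evaluation of the decaying 1/r kernel from one node's expansion.
double far_potential(const Node& n, const std::array<double, 3>& r) {
    double d2 = 0.0;
    for (int k = 0; k < 3; ++k) { double d = r[k] - n.center[k]; d2 += d * d; }
    return n.Q / std::sqrt(d2);
}

int main() {
    Node root;
    root.kid[0] = std::make_unique<Node>();
    root.kid[0]->pts = {{{0.0, 0.0, 0.0}, 1.0}, {{0.1, 0.0, 0.0}, 2.0}};
    root.kid[7] = std::make_unique<Node>();
    root.kid[7]->pts = {{{1.0, 1.0, 1.0}, 1.0}};
    upward(root);
    std::printf("phi at far point: %g\n", far_potential(root, {100.0, 0.0, 0.0}));
    return 0;
}
```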

Slide 11: Hierarchical Methods (continued)
Hierarchical Multipole Method (HMM):
- An augmented Barnes-Hut method, or equivalently a variant of FMM.
- Upward traversal: same as FMM.
- For each particle, a multipole acceptance criterion (MAC) - the ratio of the particle's distance from the center of a box to the dimension of the box - determines whether the box's multipole coefficients may be used to approximate the effect of all its far-away points.
- Direct interactions are used for nearby points.
- Computational complexity: O((d+1)^2 N lg N).
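The evaluation phase is where HMM differs from FMM: each target particle walks the tree and tests the MAC at every box. A minimal C++ sketch, again with degree-0 expansions standing in for the full coefficient set and with illustrative names:

```cpp
// Hedged sketch: HMM/Barnes-Hut evaluation for one target, driven by the MAC.
#include <array>
#include <cmath>
#include <memory>
#include <vector>

struct Node {
    std::array<double, 3> center = {0.0, 0.0, 0.0}; // from the upward pass
    double Q = 0.0;                                 // degree-0 multipole coefficient
    double size = 1.0;                              // box edge length
    std::vector<std::array<double, 3>> pts;         // leaf particles
    std::vector<double> q;                          // their strengths
    std::array<std::unique_ptr<Node>, 8> kid;       // children (all null at leaves)
};

// MAC: accept the box's expansion when (distance / box size) exceeds alpha;
// otherwise open the box and recurse (or interact directly at a leaf).
double evaluate(const Node& n, const std::array<double, 3>& target, double alpha) {
    double d2 = 0.0;
    for (int k = 0; k < 3; ++k) { double d = target[k] - n.center[k]; d2 += d * d; }
    double dist = std::sqrt(d2);
    if (dist > alpha * n.size)                  // far away: use the expansion
        return n.Q / dist;
    bool leaf = true;
    double phi = 0.0;
    for (const auto& c : n.kid)
        if (c) { leaf = false; phi += evaluate(*c, target, alpha); }
    if (leaf)                                    // nearby leaf: direct 1/r sums
        for (std::size_t i = 0; i < n.pts.size(); ++i) {
            double r2 = 1e-30;                   // softening guard vs. self-term
            for (int k = 0; k < 3; ++k) {
                double d = target[k] - n.pts[i][k];
                r2 += d * d;
            }
            phi += n.q[i] / std::sqrt(r2);
        }
    return phi;
}

int main() {
    Node n;
    n.center = {0.5, 0.5, 0.5};
    n.Q = 2.0;
    n.pts = {{0.0, 0.0, 0.0}, {1.0, 1.0, 1.0}};
    n.q = {1.0, 1.0};
    evaluate(n, {50.0, 0.0, 0.0}, 2.0);   // accepted by the MAC
    evaluate(n, {0.9, 0.9, 0.9}, 2.0);    // rejected: direct interactions
    return 0;
}
```

With this convention, a larger alpha forces more boxes to be opened, which matches the observation later in the talk that increasing alpha raises both the cost and the accuracy of the matrix-vector product.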

Slide 12: ParIS: Parallel Solver
- Application: inductance extraction.
- Solves the reduced system with a preconditioned iterative method.
- Iterative method: GMRES.
- Dense matrix-vector products with the preconditioner and the coefficient matrix dominate the computational cost of the algorithm.
- Hierarchical methods are used to compute the potentials, i.e., the inductive effect on the filaments.
- Vector inner products have negligible computation and communication cost.
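A minimal C++ sketch of the matrix-free abstraction this implies, with an interface of our own invention (the slides do not show ParIS's classes): GMRES only ever needs y = Ax, so the dense L product can be delegated to a hierarchical evaluator and the matrix is never formed.

```cpp
// Hedged sketch of a matrix-free operator for the reduced system
// y = P^T (R + j*omega*L) P x; names and interface are assumptions.
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using Vec = std::vector<std::complex<double>>;

struct HierarchicalEvaluator {
    // Stand-in for an FMM/HMM tree evaluation of the dense 1/r kernel:
    // returns an approximation of L * v without ever forming L.
    Vec apply_L(const Vec& v) const { return v; /* placeholder physics */ }
};

class ReducedOperator {
public:
    ReducedOperator(std::vector<double> R, double omega, HierarchicalEvaluator fmm)
        : R_(std::move(R)), omega_(omega), fmm_(std::move(fmm)) {}

    // y = P^T (R + j*omega*L) P x; the P projections are elided here.
    Vec apply(const Vec& x) const {
        Vec Lv = fmm_.apply_L(x);            // hierarchical matvec with dense L
        Vec y(x.size());
        for (std::size_t i = 0; i < x.size(); ++i)
            y[i] = R_[i] * x[i] + std::complex<double>(0.0, omega_) * Lv[i];
        return y;
    }

private:
    std::vector<double> R_;       // diagonal resistance matrix
    double omega_;                // angular frequency
    HierarchicalEvaluator fmm_;
};

int main() {
    ReducedOperator A({1.0, 2.0}, 1.0e9, HierarchicalEvaluator{});
    Vec y = A.apply(Vec{{1.0, 0.0}, {0.0, 1.0}});
    (void)y;
    return 0;
}
```

GMRES only ever calls apply(), which is exactly what makes preconditioning hard: the entries of the system matrix are never available (the "challenge" on slide 6).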

Slide 13: Parallelization Scheme
- Two-tier parallelization (sketched below):
  - Each conductor has its own filaments and associated oct-tree.
  - Conductors are distributed across MPI processes.
  - Within a conductor, the work is shared by OpenMP threads.
- The tree is pruned to obtain sub-trees: computation at the top few levels of the tree is sequential, and the sub-trees are processed in parallel with OpenMP.
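A hedged C++ sketch of how the two tiers could be composed, assuming illustrative types (Conductor, SubTree) and stub work functions; this is not ParIS source:

```cpp
// Hedged sketch: conductors across MPI ranks, OpenMP over each conductor's
// pruned sub-trees. Compile with e.g. mpicxx -fopenmp.
#include <mpi.h>
#include <omp.h>
#include <cstdio>
#include <vector>

struct SubTree { int id; };
struct Conductor { std::vector<SubTree> subtrees; };

double evaluate_subtree(const SubTree&) { return 1.0; }  // stand-in for tree matvec

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Tier 1: distribute conductors across MPI processes (8 conductors here).
    std::vector<Conductor> mine;
    for (int c = rank; c < 8; c += size)
        mine.push_back(Conductor{{{0}, {1}, {2}, {3}}});

    double local = 0.0;
    for (const Conductor& c : mine) {
        // The top few tree levels would be traversed sequentially here;
        // tier 2 then processes the pruned sub-trees with OpenMP threads.
        #pragma omp parallel for reduction(+ : local)
        for (int t = 0; t < (int)c.subtrees.size(); ++t)
            local += evaluate_subtree(c.subtrees[t]);
    }

    // Combine contributions from all ranks (e.g., mutual-coupling terms).
    double total = 0.0;
    MPI_Allreduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    if (rank == 0) std::printf("total = %f\n", total);
    MPI_Finalize();
    return 0;
}
```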

Slide 14: Experiments
- Experiments on an interconnect crossover problem:
  - Conductors are 2 cm long and 2 mm wide.
  - Distance between conductors: 0.3 mm within a layer, 3 mm across layers.
  - Non-uniform distribution of conductors.
- Comparison between FMM and HMM.
- Parallel platform:
  - Beowulf cluster at Texas A&M University.
  - 64-bit AMD Opteron, 1.4 GHz, 128 dual-processor nodes, Gigabit Ethernet.
  - LAM/MPI on SuSE Linux, GNU compilers.

Slide 15: Crossover Interconnects (figure)

Slide 16: Parameters
- d: multipole degree.
- α: multipole acceptance criterion.
- s: number of particles per leaf node in the tree.
- Since d and α influence the accuracy of the matrix-vector product, impedance errors are kept similar across runs: within 1% of a reference value computed by FMM with d = 8.
- Scaled efficiency: E = BOPS / p, where BOPS is the average number of base operations per second and p is the number of processors used.

Slide 17: Experimental Results (figure)
Effect of multipole degree (d) for different choices of s; plots for the FMM code and the HMM code (data tabulated on the next slide).

Slide 18: Experimental Results
Effect of multipole degree (d) for different choices of s. Time in seconds.

         FMM code                       HMM code
  d    s=2     s=8     s=32    s=128    s=2     s=8     s=32    s=128
  1    49.5    18.3    12.7    29.9     25.7    21.5    21.3    34.8
  2    225.8   62.5    25.3    32.8     46.8    36.5    31.3    41.9
  4    1513.3  398.2   110.8   50.7     110.8   84.5    63.0    61.9

Slide 19: Experimental Results (figure)
Effect of the MAC on HMM for different choices of s and d; plots varying s and varying d (data tabulated on the next slide).

Slide 20: Experimental Results (continued)
Effect of the MAC (α) on HMM for different choices of s and d. Time in seconds.

Varying d:
  α     d=1     d=2     d=4
  1     21.5    36.5    84.5
  1.5   40.1    70.6    158.2

Varying s:
  α     s=2     s=8     s=32
  1     46.8    36.5    31.3
  1.5   89.3    70.6    59.5

Slide 21: Experimental Results (figure)
Effect of multipole degree (d) on the HMM code on p processors for two choices of s (s = 8 and s = 32; data tabulated on the next slide).

Slide 22: Experimental Results
Effect of multipole degree (d) on the HMM code on p processors for two different choices of s. Time in seconds.

         s = 8                          s = 32
  d    p=1     p=2     p=4     p=8     p=1     p=2     p=4     p=8
  1    21.5    26.5    50.9    105.8   21.3    24.4    48.8    94.1
  2    36.5    46.5    96.5    184.3   31.3    38.3    77.9    157.5
  4    84.5    101.9   220.9   436.8   63.0    78.2    169.6   347.9

Slide 23: Experimental Results (figure)
Effect of multipole degree (d) on the FMM code on p processors for two choices of s (s = 8 and s = 32; data tabulated on the next slide).

Slide 24: Experimental Results
Effect of multipole degree (d) on the FMM code on p processors for two different choices of s. Time in seconds.

         s = 8                          s = 32
  d    p=1     p=2     p=4     p=8     p=1     p=2     p=4     p=8
  1    18.3    25.7    34.5    59.2    12.7    13.9    40.4    94.4
  2    62.5    72.5    87.5    131.3   25.3    26.6    58.0    126.3
  4    398.2   431.4   470.9   683.3   110.8   113.4   165.7   277.8

Slide 25: Experimental Results (continued, figure)
Parallel efficiency of the extraction codes for different choices of d; plots for the FMM code and the HMM code (data tabulated on the next slide).

Slide 26: Experimental Results
Parallel efficiency of the extraction codes for different choices of d.

         FMM code                       HMM code
  d    p=1     p=2     p=4     p=8     p=1     p=2     p=4     p=8
  1    0.99    0.93    0.94    0.86    0.98    0.74    0.87    -
  2    1.00    0.92    0.90    0.92    0.99    0.86    0.97    0.98
  4    1.00    0.98    0.93    0.94    1.00    0.93    1.04    0.98

Slide 27: Experimental Results (continued, figure)
Ratio of execution time of the FMM code to the HMM code on p processors for different choices of d (s = 8 and s = 32; data tabulated on the next slide).

Slide 28: Experimental Results
Ratio of execution time of the FMM code to the HMM code on p processors for different choices of d.

         s = 8                        s = 32
  d    p=1    p=2    p=4    p=8    p=1    p=2    p=4    p=8
  1    0.9    1.0    0.7    0.6    0.6    0.6    0.8    1.0
  2    1.7    1.6    0.9    0.7    0.8    0.7    0.7    0.8
  4    4.7    4.2    2.1    1.6    1.8    1.4    1.0    0.8

Slide 29: Concluding Remarks
- FMM execution time: O((d+1)^4 N); HMM execution time: O((d+1)^2 N lg N).
- For HMM, increasing the MAC parameter (α) increases both the time and the accuracy of the matrix-vector product.
- FMM achieves higher parallel efficiency for large d.
- When the number of particles per leaf node (s) is small, HMM outperforms FMM in execution time.
- The parallel implementation, ParIS, is scalable and achieves high parallel efficiency.

Slide 30: Thank You!


Download ppt "Parallel Performance of Hierarchical Multipole Algorithms for Inductance Extraction Ananth Grama, Purdue University Vivek Sarin, Texas A&M University Hemant."

Similar presentations


Ads by Google