Transient Analysis. CK Cheng, UC San Diego. Jan. 25, 2007.


1 Transient Analysis. CK Cheng, UC San Diego. Jan. 25, 2007

2 Outline
Research Directions
Simulation test case results
Overview of Simulation
Commercial Package
Alternating direction implicit (ADI) Method
General Operator Splitting Method
Distributed Computing
Conclusions and Future Works

3 Research Directions Simulation: SPICE, STA Network on Chip: topology and wire styles, Power, and Clock Networks Data Path Components: adders, shifters, multipliers, division Packaging: passive distortion compensation

4 6x6 Bump Simulation Results
The Circuit:
–184K Capacitors, 17K Current Sources, 120K Inductors and 246K Resistors.
–306K Nodes
Accuracy:
–Waveform and measurement results match Fujitsu's with less than 0.002% error.
Runtime / Memory Comparison:
Run | CPU Time | Memory | Computer Used
UCSD | 678s | 600.2M | Pentium 4 3.2G, Linux
Fujitsu Log File | 1845s | 771M | unknown

5 6x6 Bump Simulation Results
Measurement results and waveform
Run | Min_pwr_l_est_10000954 | Min_18269323 | Min_33085875
UCSD | 0.9980790 | 0.9967357 | 0.9934251
Fujitsu Log File | 0.9980620 | 0.9966940 | 0.9933790
Error | 0.002% | 0.004% | 0.005%
(Figure: waveform plot; the red curve is the UCSD result)
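The error row on this slide appears to be the relative difference between the two runs. A minimal sketch of that arithmetic, assuming the Fujitsu value is the reference (the slides do not say which run is the reference, and the helper name `relative_error_percent` is hypothetical):

```python
# Values copied from the slide; the relative-error definition is an assumption.
ucsd = [0.9980790, 0.9967357, 0.9934251]
fujitsu = [0.9980620, 0.9966940, 0.9933790]

def relative_error_percent(measured, reference):
    """Relative difference of measured against reference, in percent."""
    return abs(measured - reference) / abs(reference) * 100.0

errors = [relative_error_percent(u, f) for u, f in zip(ucsd, fujitsu)]
for e in errors:
    print(f"{e:.3f}%")
```

Rounded to three decimals, these reproduce the percentages listed in the Error row.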

6 703KR Simulation Results
The Circuit:
–514K Capacitors, 76K Current Sources, 370K Inductors and 703K Resistors.
–1.3M Nodes
Accuracy:
–Measurement results match Fujitsu's with less than 0.02% error.
Runtime / Memory Comparison:
Run | CPU Time | Memory | Computer Used
UCSD | 2575s (0.7h) | 1.7G | Pentium 4 3.2G, Linux
Fujitsu Log File | 864561s (240h) | 2.28G | unknown

7 703KR Simulation Results
Measurement results and waveform
Run | Min_33096003 | Min_33096004 | Min_33097557
UCSD | 0.9400988 | 0.9421157 | 0.9370827
Fujitsu Log File | 0.9399610 | 0.9419260 | 0.9368400
Error | 0.015% | 0.02% | 0.026%
(Figure: waveform plot, UCSD results only. Fujitsu waveform is not available for comparison)

8 Further Speed-ups
Reduce iteration count by 50% for pure linear circuits (like 6x6 bump and 703KR)
–2x speed up
More effective time step control
–DVDT, breakpoint, truncation error.
–1.5 - 3x speed up
Use Multigrid solver
–1.5 - 2x speed up for medium circuits (6x6 bump)
–2x - 10x speed up for large circuits (703KR)
Parallel simulation
–4 or more processors on Linux cluster
–32 to hundreds of processors on supercomputer.
Overall speed-up
–6x - 60x speed up without parallel simulation
–12x - 1000x speed up with parallel simulation
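The overall range without parallel simulation is consistent with multiplying the individual factors for the large-circuit case. The short check below assumes the factors are independent and compose multiplicatively, which the slide implies but does not state; the variable names are illustrative only.

```python
# Speed-up ranges quoted on the slide, as (low, high) pairs.
# Treating them as independent multiplicative factors is an assumption.
iteration = (2.0, 2.0)          # 50% fewer iterations for pure linear circuits
time_step = (1.5, 3.0)          # more effective time step control
multigrid_large = (2.0, 10.0)   # multigrid solver on large circuits (703KR)

low = iteration[0] * time_step[0] * multigrid_large[0]
high = iteration[1] * time_step[1] * multigrid_large[1]
print(f"{low:g}x - {high:g}x")
```

Under the same assumption, the quoted 12x - 1000x range with parallel simulation corresponds to a further parallel factor of 2x at the low end and roughly 17x at the high end.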

9 Performance and capacity prediction
Cases 10x-100x larger than 703KR.
Case | Preferred Solver | CPU Time | Memory
Small - Medium, 0.3M nodes | LU Decomposition | 11 minutes | 600M
Medium - Large, 1.3M nodes | Multigrid | 43 minutes | 1.7G
Huge, 10-100M nodes | Multigrid + Parallel | 5 - 100 hours | 15G - 200G

10 Overview of Simulation
Our research
–Fast speed with SPICE accuracy
–Nonlinear devices
–Efficient matrix solvers
–Effective integration methods
–Time step controls according to different integration methods
–Distributed computing
(Flowchart boxes: Load Circuit, Device Evaluation, Integration Approximation, Linearization, LU Decomposition, N-R Converge? with Yes/No branches, Time Step Control, Next Time Point)
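The flowchart stages on this slide can be sketched as a small transient loop. This is an illustrative sketch only, not the simulator described in the talk: it assumes a single-node circuit (a voltage source through a resistor, with a diode and a capacitor to ground), backward Euler integration, and a fixed time step, so the time step control stage is omitted. All element values and the function name `simulate` are hypothetical.

```python
import math

def simulate(t_end=5e-6, h=1e-8):
    """Backward-Euler transient of one node: source-resistor, diode and capacitor."""
    vs, r, c = 1.0, 1e3, 1e-9        # hypothetical source voltage, resistance, capacitance
    i_sat, v_t = 1e-14, 0.02585      # hypothetical diode saturation current, thermal voltage
    v, t = 0.0, 0.0
    waveform = [(t, v)]
    while t < t_end - 0.5 * h:
        v_old, v_new = v, v          # initial N-R guess: the previous solution
        for _ in range(50):
            # Device evaluation: diode current and its small-signal conductance.
            i_d = i_sat * (math.exp(v_new / v_t) - 1.0)
            g_d = i_sat / v_t * math.exp(v_new / v_t)
            # Integration approximation (backward Euler) and linearization.
            residual = c * (v_new - v_old) / h - (vs - v_new) / r + i_d
            jacobian = c / h + 1.0 / r + g_d
            # One-node "matrix solve"; a real simulator would LU-factor here.
            dv = -residual / jacobian
            v_new += dv
            if abs(dv) < 1e-12:      # N-R converged?
                break
        else:
            raise RuntimeError("Newton-Raphson did not converge")
        v, t = v_new, t + h          # accept the point and move to the next time point
        waveform.append((t, v))
    return waveform

wf = simulate()
print(wf[-1])
```

With these made-up values the node settles near the diode's forward voltage. In a multi-node circuit the scalar division becomes a sparse matrix factorization, which is where most of the runtime goes.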

11 Overview of Simulation
Matrix Solver
–LU Decomposition
–Iterative Approach
Integration
–Time Step Control
–ADI
Nonlinear Devices
–Two Stage Newton Raphson
Distributed Computing
Commercial Implementation

12 Overview of Simulation
Integration
–Time Step Control
–ADI (two-way partitioning)
–Operator Splitting (multi-way)
Distributed Computing
–MPI
–Partitioning
Three Ph.D. Students

13 Commercial Package: Fastrack Design
Founded in January 2001
Headquartered in San Jose
Privately funded, cash-flow positive
Two Business Units
–Design Services
–Technology Products

14 Analog Designs
Design | # Elements | Sim. Len | HSpice | mSPICE | Speedup Factor
LVDS | 13490 | 20us | 80h | 26h | 3.1X
Oscillator | 222 | 1 ms | 13,706s | 2,670s | 5.1X
Biasing Circuit | 49197 | 200ns | 427s | 82s | 5.2X
PLL | 16050 | 40us | 67d | 12d | 5.6X
PLL (post-layout) | 300K | 40us | 290d (est) | 16d | 18.1X

15 Digital Blocks
Design Name | MOS | R | C | mSPICE Runtime | Traditional Spice Runtime | Speedup Factor
ALU | 10.1k | 12.7k | 7.5k | 6.9m | 7m | 1.0X
CONTROL | 69k | 83.7k | 52.5k | 1.5h | 9.5h | 6.3X
YN_BLK | 205K | 242.8k | 203.9k | 3.5h | > 2d | >13.7X
THP | 437k | 499.3k | 313.5k | 5.0h | COULD NOT RUN | ∞
VCON | 936k | 753k | 561k | 15.0h | COULD NOT RUN | ∞

16 Memory Blocks
Design | #Tr | #R | #C | # Vectors / Sim. Length | mSPICE Run Time
BRAM (pre) | 220K | 050022.5 hours
SRAM (pre) 8Kx8 SP | 410K | 0027 hours
eRAM (post) 256x16 | 72K | 28K | 427K | 48ns | 8 hours
BRAM (post) | 220K | 1320K | 870K | 218 hours
100% accurate Spice simulation

17 mSPICE-Parallel
Industry’s first practical parallel Spice simulation solution
–Increases capacity further
–Dramatically improves throughput
Uses Matrix Level Partitioning
–No loss of accuracy
–Client-Server configuration
–Minimal memory requirement for client nodes

18 Client-Server Configuration
Server distributes sub-matrices to clients
Clients communicate partial solutions
Minimal memory requirements for clients
(Figure: 0/1 matrix entries illustrating the sub-matrices)
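The slides do not say how the sub-matrices are formed or combined. One common way to realize this kind of client-server scheme is a bordered block-diagonal system solved through a Schur complement, and the sketch below assumes that structure purely for illustration. Each "client" solves only against its own block and returns small partial results; the "server" combines them into the shared interface unknown. The function names, the single interface unknown, and the example numbers are all hypothetical.

```python
def solve(a, b):
    """Dense Gaussian elimination with partial pivoting (stand-in for sparse LU)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def client_reduce(block):
    """Client side: solve against the local sub-matrix, return partial results."""
    y = solve(block["A"], block["f"])   # local solution with the interface held at 0
    z = solve(block["A"], block["b"])   # local response to a unit interface value
    return {"y": y, "z": z, "cy": dot(block["c"], y), "cz": dot(block["c"], z)}

def server_solve(blocks, d, g):
    """Server side: combine partial results, solve the interface, back-substitute."""
    partials = [client_reduce(blk) for blk in blocks]  # would run on separate clients
    schur = d - sum(p["cz"] for p in partials)
    x_iface = (g - sum(p["cy"] for p in partials)) / schur
    local = [[yi - zi * x_iface for yi, zi in zip(p["y"], p["z"])] for p in partials]
    return local, x_iface

# Two hypothetical 2x2 blocks coupled through one interface unknown.
blocks = [
    {"A": [[4.0, 1.0], [1.0, 3.0]], "b": [1.0, 0.0], "c": [1.0, 0.0], "f": [1.0, 2.0]},
    {"A": [[5.0, 2.0], [2.0, 6.0]], "b": [0.0, 1.0], "c": [0.0, 1.0], "f": [3.0, 4.0]},
]
local, x_iface = server_solve(blocks, d=10.0, g=5.0)
print(local, x_iface)
```

In this arrangement each client only ever holds its own block, which matches the slide's point about minimal client memory.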

19 Experimental Results
Design | Total Elements | Sim. Length | Runtime 1-proc | Runtime 2-proc | Runtime 4-proc
ASIC | 1.2M | 8ns | 12.2h | 7.0h (1.7X) | 5.1h (2.4X)
38IO SSO | 1.4M | 30ns | 3.0h | 2.0h (1.5X) | 1.4h (2.2X)
Signal-power | 2.1M | 1.2us | 13d | 7d18h (1.7X) | 5d12h (2.4X)
4096x8 RAM (extracted) | 2.3M | 10ns | 32h | 18.5h (1.7X) | 13.4h (2.4X)
120IO SSO | 3.5M | 30ns | 6.2h | 4.1h (1.5X) | 3.1h (2.0X)

20 ADI: Previous Works
1999, Namiki and Ito
–The alternating direction implicit (ADI) method is used to simulate a 2D TE wave.
2001, Zheng et al.
–Extended to 3D problems
2001 & 2003, Lee and Chen
–ADI is applied to power grids modeled as transmission lines
In these works the alternation is among different geometric directions, so the simulated geometric structure is constrained.

21 Alternating Direction Implicit (ADI)
ADI Integration Method
–Two-way partition of the circuit
–One partition is used for each backward integration
–Unconditionally stable (A-stable: independent of time step size)
–Time step size is chosen according to local truncation error.

22 Alternating Direction Implicit (ADI) ADI method formulation Circuit partition algorithm Local truncation error estimation Stability discussion Experimental results

23 SPICE Formulation
Equations for RLC circuits, where
C: capacitance matrix
L: inductance matrix
R: resistance matrix
G: conductance matrix
E: incidence matrix
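The equation itself was an image and is not in the transcript. A common first-order form that is consistent with the listed symbols is given below as an assumption, not as the slide's exact formula: the sign conventions, the node voltage vector $V$, the inductor branch current vector $I$, and the source vector $U$ are not from the slides.

```latex
\begin{aligned}
C \frac{dV}{dt} &= -G\,V - E\,I + U, \\
L \frac{dI}{dt} &= E^{T} V - R\,I.
\end{aligned}
```

Here $E$ maps branch currents to node current sums, and $E^{T}$ maps node voltages to branch voltages, so the two equations are Kirchhoff's current and voltage laws respectively.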

24 ADI Formulation
Transient simulation
–Split the resistor and inductor branches into two parts: G = G1 + G2, E = E1 + E2, R = R1 + R2
–Alternate backward and forward integration on each partition

25 ADI Formulation (Cont.)
Equations of ADI method
–The size of the left-hand-side matrix remains unchanged
–The number of non-zero elements is decreased
–Direct solving methods can be efficient
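The ADI equations were images and are missing here. As a hedged illustration of "alternate backward and forward integration on each partition", the sketch below applies the classical Peaceman-Rachford ADI step to a generic linear system dx/dt = -(A1 + A2) x. It is not the circuit formulation from the slides: the 2x2 matrices, the helper names, and the use of Cramer's rule in place of a sparse direct solver are all assumptions made to keep the example self-contained.

```python
import math

def matvec(m, v):
    """2x2 matrix times 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def solve2(m, b):
    """Solve a 2x2 system by Cramer's rule (stands in for a sparse direct solver)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(b[0] * m[1][1] - m[0][1] * b[1]) / det,
            (m[0][0] * b[1] - m[1][0] * b[0]) / det]

def identity_plus(m, s):
    """Return I + s*m for a 2x2 matrix m."""
    return [[1.0 + s * m[0][0], s * m[0][1]],
            [s * m[1][0], 1.0 + s * m[1][1]]]

def adi_step(a1, a2, x, h):
    """One ADI step of size h for dx/dt = -(a1 + a2) x."""
    # First half step: backward (implicit) on a1, forward (explicit) on a2.
    x_half = solve2(identity_plus(a1, h / 2), matvec(identity_plus(a2, -h / 2), x))
    # Second half step: the roles of the two partitions are swapped.
    return solve2(identity_plus(a2, h / 2), matvec(identity_plus(a1, -h / 2), x_half))

def integrate(a1, a2, x0, h, steps):
    x = list(x0)
    for _ in range(steps):
        x = adi_step(a1, a2, x, h)
    return x

# Hypothetical two-way split of A = [[4, -2], [-2, 4]]. The start vector is an
# eigenvector of A with eigenvalue 2, so the exact solution is exp(-2 t) * x0.
A1 = [[3.0, -1.0], [-1.0, 1.0]]
A2 = [[1.0, -1.0], [-1.0, 3.0]]
x_end = integrate(A1, A2, [1.0, 1.0], h=0.01, steps=100)
print(x_end, math.exp(-2.0))
```

Each half step solves a system whose matrix involves only one partition, which is the property the slide points to: the system size is unchanged, but each left-hand side has fewer non-zeros.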

26 Experiments of non-zero fill-ins
A small ASIC Design Spice matrix:
–Dimension: 10,286
–The number of non-zero elements: 46,655
–The number of non-zero fill-ins: 90,960
A large I/O Design Spice matrix:
–Dimension: 615,436
–The number of non-zero elements: 2,126,246
Case | Sub-matrix1 # non-zero elements | Sub-matrix1 # non-zero fill-ins | Sub-matrix2 # non-zero elements | Sub-matrix2 # non-zero fill-ins | Total # non-zero fill-ins
Case 1 | 38,572 | 2,618 | 42,020 | 10,040 | 12,658
Case 2 | 1,176,208 | 12,421,534 | 950,038 | 14,772,068 | 27,193,602

27 Local Truncation Error (LTE)
Time step control using LTE
–In circuit transient analysis, the next time step can be estimated from the local truncation error at the present time point
–LTE is defined as the difference between the calculated solution and the exact solution
–To ensure consistency, the local truncation error should not exceed the error tolerance; the time step can be estimated from this condition
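The step-size formula on this slide was an image and is not in the transcript. A widely used controller for a method of order p scales the step by (tolerance / LTE) raised to 1/(p+1). The sketch below uses that standard rule as an assumption; the function name and the safety, grow and shrink limits are hypothetical and do not come from the slides.

```python
def next_time_step(h, lte, tol, order=2, safety=0.9, grow=2.0, shrink=0.1):
    """Suggest the next step size from the current local truncation error estimate."""
    if lte <= 0.0:
        return h * grow                      # no measurable error: grow by the cap
    # For a method of order p the LTE scales like h**(p+1).
    factor = safety * (tol / lte) ** (1.0 / (order + 1))
    # Limit how fast the step may change between consecutive time points.
    return h * min(grow, max(shrink, factor))

print(next_time_step(1e-9, lte=8e-6, tol=1e-6))   # error too large: step shrinks
print(next_time_step(1e-9, lte=1e-12, tol=1e-6))  # error tiny: step grows, capped
```

A simulator would typically also reject the current step and redo it when the estimated LTE exceeds the tolerance, which this sketch leaves out.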

28 Local Truncation Error (Cont.)
LTE of ADI method (1): equations

29 Local Truncation Error (Cont.)
LTE of ADI method (2): estimate the exact solution
We characterize the input as a simple ramp over the interval (t_n, t_{n+1}) and use the exact analytic solution with time step Δt_n:

30 Local Truncation Error (Cont.) LTE of ADI method (3) Estimate ADI solution

31 Local Truncation Error (Cont.) LTE of ADI method (3) Estimate ADI solution

32 Local Truncation Error (Cont.) LTE of ADI method (4) LTE estimation

33 Local Truncation Error (Cont.) LTE of ADI method (5) Time step control

34 Local Truncation Error (Cont.) LTE of ADI method (5) Time step control

35 Stability Discussion
Stability is concerned with whether the accumulated error grows or decays as time evolves through a series of time steps.
For one-step integration approximations, the error is multiplied by an amplification factor at each step.
If the final steady-state error vector is smaller than the initial one, the integration method is stable.
In the ADI integration method:
–It can be proved to be unconditionally stable
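The amplification factor formula is missing from the transcript. For the scalar test equation dx/dt = λx, the standard one-step methods have well-known amplification factors in terms of z = hλ. The sketch below evaluates them to show what "unconditionally stable" means in practice; the ADI-specific factor from the slides is not reproduced, and the function name is hypothetical.

```python
def amplification(method, z):
    """Per-step error growth factor for dx/dt = lam * x, where z = h * lam."""
    if method == "forward_euler":
        return 1 + z
    if method == "backward_euler":
        return 1 / (1 - z)
    if method == "trapezoidal":
        return (1 + z / 2) / (1 - z / 2)
    raise ValueError(f"unknown method: {method}")

# A decaying mode (lam < 0) sampled with increasingly large time steps.
for z in (-0.1, -10.0, -1000.0):
    row = [abs(amplification(m, z))
           for m in ("forward_euler", "backward_euler", "trapezoidal")]
    print(z, row)
```

For any z with negative real part the backward Euler and trapezoidal factors stay below 1 in magnitude, while the forward Euler factor exceeds 1 once the step is large enough.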

36 Experimental Results
Circuit | Circuit1 | Circuit2 | Circuit3 | 1k-cell
#Nodes | 10,000 | 40,000 | 90,000 | 10,200
#Transistors | 0 | 0 | 0 | 6,500
Period | 10ns
SPICE3 CPU time (sec) | 77.8 | 485.3 | 3,061.1 | 181.6
SPICE3 #steps | 115 | 114 | 193
ADI CPU time (sec) | 28.6 | 117.8 | 275.2 | 523.3
ADI #steps | 102 | 949
Speedup | 2.7x | 4.1x | 11.1x | -

37 Voltage drop of Circuit3 (power mesh with sinks)

38 Signal in 1k_cell (ASIC design)

39 General Operator Splitting
General operator splitting method
–Multi-way partitions
–Each partition is considered separately in each time step of the simulation
–No geometry constraints
–Local truncation error is used to dynamically control time step size

40 General Operator Splitting Fundamental theory Operator splitting formulation Local truncation error estimation Stability discussion Experimental results

41 Fundamental theory
In circuit transient simulation, the integration approximation is actually an approximation of the exponential operator
The exponential operators can be approximated to any order using a general scheme of fractal decomposition
The decomposition of exponential operators corresponds to the circuit multi-way partition
→ New integration approximation in transient simulation

42 Fundamental theory
Approximation of exponential operator
–General circuit equation and solution
–If we characterize the input as a simple ramp over the interval (t_n, t_{n+1}), the exact analytic solution with time step Δt_n
–Exponential operator approximation: Forward Euler, Backward Euler, Trapezoidal
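The formulas on this slide were images. For the homogeneous part of a linear system $dx/dt = Ax$ advanced by a step $h$ (symbols assumed here, not taken from the slides), the three named methods correspond to these standard approximations of the exponential operator:

```latex
\begin{aligned}
\text{Forward Euler:}  \quad & e^{hA} \approx I + hA \\
\text{Backward Euler:} \quad & e^{hA} \approx (I - hA)^{-1} \\
\text{Trapezoidal:}    \quad & e^{hA} \approx \left(I - \tfrac{h}{2}A\right)^{-1}\left(I + \tfrac{h}{2}A\right)
\end{aligned}
```

The first two agree with the Taylor series of $e^{hA}$ through the linear term, and the trapezoidal form agrees through the quadratic term, which is why it is second-order accurate.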

43 Fundamental theory Decomposition of exponential operators (Masuo Suzuki, 1991, Physics) –Function –First order: –Second order: –Third order: –(2m-1)th and (2m)th order:
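The decomposition formulas are not in the transcript. The low-order cases and Suzuki's even-order recursion are well known and are given below for two non-commuting operators $A$ and $B$; the notation $S_{2m}$ and $p_m$ is assumed here, and the slide's own third-order and odd-order forms are not reproduced.

```latex
\begin{aligned}
\text{First order:}  \quad & e^{x(A+B)} = e^{xA}\,e^{xB} + O(x^{2}) \\
\text{Second order:} \quad & S_{2}(x) = e^{\frac{x}{2}A}\,e^{xB}\,e^{\frac{x}{2}A} = e^{x(A+B)} + O(x^{3}) \\
\text{Even orders:}  \quad & S_{2m}(x) = S_{2m-2}(p_m x)^{2}\,S_{2m-2}\big((1-4p_m)x\big)\,S_{2m-2}(p_m x)^{2},
\qquad p_m = \frac{1}{4 - 4^{1/(2m-1)}}
\end{aligned}
```

With this recursion $S_{2m}(x)$ matches $e^{x(A+B)}$ up to an error of order $x^{2m+1}$.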

44 Fundamental theory Decomposition of exponential operators

45 General Operator Splitting Formulation
Transient simulation:
–Apply the second order approximation
–In each time step, every partition is calculated separately and trapezoidal integration is used for every partition
–The size of the left-hand-side matrix may change
–The number of non-zero elements is decreased
–Can be easily extended to multi-way partitions

46 General Operator Splitting Formulation Equations
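The equations on this slide are missing from the transcript. The sketch below illustrates the idea described in the formulation, a second-order symmetric composition in which every partition is advanced separately with a trapezoidal sub-step, on a generic linear system dx/dt = -(A1 + A2 + A3) x. The three 2x2 partitions, the helper names, and the use of Cramer's rule are assumptions made for a self-contained example; this is not the circuit formulation from the talk.

```python
import math

def matvec(m, v):
    """2x2 matrix times 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def solve2(m, b):
    """Solve a 2x2 system by Cramer's rule (stands in for a sparse direct solver)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(b[0] * m[1][1] - m[0][1] * b[1]) / det,
            (m[0][0] * b[1] - m[1][0] * b[0]) / det]

def identity_plus(m, s):
    """Return I + s*m for a 2x2 matrix m."""
    return [[1.0 + s * m[0][0], s * m[0][1]],
            [s * m[1][0], 1.0 + s * m[1][1]]]

def trapezoidal_substep(a, x, s):
    """Trapezoidal step of size s for the sub-system dx/dt = -a x."""
    return solve2(identity_plus(a, s / 2), matvec(identity_plus(a, -s / 2), x))

def splitting_step(parts, x, h):
    """Second-order symmetric composition over any number of partitions."""
    for a in parts[:-1]:                        # forward sweep with half steps
        x = trapezoidal_substep(a, x, h / 2)
    x = trapezoidal_substep(parts[-1], x, h)    # last partition takes a full step
    for a in reversed(parts[:-1]):              # backward sweep with half steps
        x = trapezoidal_substep(a, x, h / 2)
    return x

def run(parts, x0, h, steps):
    x = list(x0)
    for _ in range(steps):
        x = splitting_step(parts, x, h)
    return x

# Hypothetical three-way split of A = [[4, -2], [-2, 4]]. The start vector is an
# eigenvector of A with eigenvalue 2, so the exact solution is exp(-2 t) * x0.
PARTS = [[[2.0, -1.0], [-1.0, 1.0]],
         [[1.0, -1.0], [-1.0, 2.0]],
         [[1.0, 0.0], [0.0, 1.0]]]
x_end = run(PARTS, [1.0, 1.0], h=1e-3, steps=1000)
print(x_end, math.exp(-2.0))
```

Adding a partition only lengthens the sweep, which is why this form extends naturally from two-way to multi-way splits.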

47 Local Truncation Error (Cont.) LTE of general operator splitting method Estimate solution

48 Local Truncation Error (Cont.) LTE of general operator splitting method Estimate solution

49 Local Truncation Error (Cont.) LTE of general operator splitting method LTE estimation

50 Local Truncation Error (Cont.) LTE of general operator splitting method LTE estimation

51 Local Truncation Error (Cont.) LTE of general operator splitting method LTE estimation

52 Stability Discussion
The trapezoidal integration method is unconditionally stable for a stable system.
In our operator splitting method, the trapezoidal method is used for all the sub-systems, so the method is still unconditionally stable.

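53 Experimental Results
Circuit | Circuit1 | Circuit2 | Circuit3
#Nodes | 10,000 | 40,000 | 90,000
#Transistors | 0 | 0 | 0
Period | 10ns
SPICE3 CPU time (sec) | 77.8 | 485.3 | 3,061.1
SPICE3 #steps | 115 | 114
GOS CPU time (sec) | 164.7 | 1011.6 | 3435.9
GOS #steps | 102
Comparison (GOS / SPICE3 CPU time) | 2.1x | 2x | 1.1x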

54 Voltage drop of Circuit3 (power mesh with sinks)

55 Conclusions
We investigate alternating direction implicit and general operator splitting integration methods for transistor-level circuit transient simulation.
In both methods, the circuit is divided into several sub-circuits, so the direct matrix solver remains efficient because each matrix is simplified.
Both methods are second-order accurate and unconditionally stable.
Overhead:
–Circuit partition
–Each time step consists of many sub-steps, and each sub-step is an N-R iteration process
Better for circuits with a large linear network

56 Distributed Computing
Distributed Processors
–Cluster
–Supercomputer
–Multi-Core Processors (Intel Dual/Quad-Core, IBM Cell etc.)
Standard
–MPI
–Partitioning
–Matrix Solver
Capabilities
–Speed-up (10-100+)
–Memory Capacity (10-100+)

57 Future Works
ADI method
–More experiments
General operator splitting method
–Design and implement multi-way circuit partition algorithm
–Implement multi-way general operator splitting program
–Derive LTE for general multi-way situation
–More experiments
Distributed Computing
–MPI Standard
–Distributed Partitioning, Matrix Solver
