1 A Robust Algorithm for Approximate Compatible Observability Don’t Care (CODC) Computation
Nikhil S. Saluja, University of Colorado, Boulder, CO
Sunil P. Khatri, Texas A&M University, College Station, TX

2 Outline
- Motivation
- Computation of Don’t Cares
- ACODC Algorithm
- Proof of correctness
- Experimental Results
- Possible extensions
- Conclusions

3 Motivation
[Figure: a multi-level Boolean network with primary inputs x_1 … x_n, internal nodes y_1 … y_w (node j implements y_j = F_j), and primary outputs z_1 … z_p]
- Technology independent logic optimization
- Typically, Don’t Cares are computed after a higher level description of a design is encoded and translated into a gate level description
- Don’t Cares (DCs)
  - eXternal Don’t Cares (XDCs)
  - Satisfiability Don’t Cares (SDCs)
  - Observability Don’t Cares (ODCs)

4 Motivation - 2
- The DCs computed are a function of the PIs and internal variables of the Boolean network
- Image computation is used to express the DCs in terms of the node fanins
  - ROBDD based operation
- Finally, the node function is minimized (using ESPRESSO) with respect to the computed (local) DCs
- Literal count reduction is the figure of merit

5 Don’t Cares
- ODC based
  - Very powerful, represent maximum flexibility (a small example follows below)
  - Minimizing a node j with respect to its ODC requires recomputation of other nodes’ ODCs
- Compatible ODC (CODC) based
  - Subset of ODC, requires ordering of fanins
  - Recomputation not required, useful in many cases
- In either case, image computation is required
  - To obtain DCs in the fanin support of the node
  - Involves ROBDD computation
  - Not robust
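To make the ODC notion concrete, here is a minimal sketch (not from the paper) that computes the ODC of one fanin of a single node by exhaustive enumeration: the ODC of fanin y_i is the set of fanin combinations where the node output does not depend on y_i. This is the flexibility with respect to the node's local function only; the full-network ODC additionally accounts for observability at the primary outputs. The node function used is a made-up example.

    from itertools import product

    def fanin_odc(f, i, n):
        """Minterms over the n fanins where f is insensitive to fanin i,
        i.e. the complement of the Boolean difference of f w.r.t. y_i."""
        dc = set()
        for m in product((0, 1), repeat=n):
            m0 = m[:i] + (0,) + m[i + 1:]
            m1 = m[:i] + (1,) + m[i + 1:]
            if f(m0) == f(m1):          # toggling y_i does not change f here
                dc.add(m)
        return dc

    # Made-up node function f = y0*y1 + y2; its ODC for y0 is y1' + y2.
    f = lambda y: (y[0] & y[1]) | y[2]
    print(sorted(fanin_odc(f, 0, 3)))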

6 CODC Computation
- Traverse the circuit in reverse topological order
- CODC of a primary output z is initialized to its XDC
- Computation is performed in 2 phases for each node
- Phase 1: node k has function f_k and ordered fanins y_1 < y_2 < … < y_i
  - Note that ∀ is the consensus (universal quantification) operator
  - The first fanin has CODC equal to its ODC, (∂f_k/∂y_1)', which is the maximum flexibility
  - A new edge e_ik should have its CODC as the conjunction of (∂f_k/∂y_i)' with the condition that each earlier input j < i is such that f_k is not insensitive to y_j (∂f_k/∂y_j) or the term is independent of y_j (∀_{y_j}):
  - CODC_{e_{ik}} = \prod_{j<i}\Big(\frac{\partial f_k}{\partial y_j} + \forall_{y_j}\Big)\cdot\Big(\frac{\partial f_k}{\partial y_i}\Big)'   (a minterm-set sketch of this formula follows below)
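The Phase-1 formula can be prototyped without BDDs by representing functions as explicit minterm sets. This is a rough sketch under two assumptions: the fanins are processed in the fixed order y_0 < y_1 < …, and the contribution of node k's own CODC inherited from its fanouts is omitted. The helper names and the toy node are illustrative, not the SIS implementation.

    from itertools import product

    def flip(m, j, v):
        return m[:j] + (v,) + m[j + 1:]

    def boolean_diff(f, j, n):
        """Minterms where f is sensitive to fanin j (the Boolean difference df/dy_j)."""
        return {m for m in product((0, 1), repeat=n)
                if f(flip(m, j, 0)) != f(flip(m, j, 1))}

    def consensus(s, j, n):
        """Universal quantification of the set s over fanin j."""
        return {m for m in product((0, 1), repeat=n)
                if flip(m, j, 0) in s and flip(m, j, 1) in s}

    def phase1_codcs(f, n):
        """CODC of each fanin edge, per CODC_i = prod_{j<i}(df/dy_j + forall_{y_j}) * (df/dy_i)'.
        The operator factors are applied right to left, innermost (j = i-1) first."""
        universe = set(product((0, 1), repeat=n))
        codcs = []
        for i in range(n):
            flex = universe - boolean_diff(f, i, n)      # (df/dy_i)': local ODC of fanin i
            for j in reversed(range(i)):                 # compatibility terms for earlier fanins
                flex = (boolean_diff(f, j, n) & flex) | consensus(flex, j, n)
            codcs.append(flex)
        return codcs

    # Made-up node: f = y0*y1 + y2 with fanin order y0 < y1 < y2.
    f = lambda y: (y[0] & y[1]) | y[2]
    for i, c in enumerate(phase1_codcs(f, 3)):
        print(f"CODC of fanin y{i}:", sorted(c))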

7 CODC Computation
- Phase 2 - image computation using ROBDDs
- Build global BDDs of each node in the network, including POs
  - For large circuits this step fails
  - This is the main weakness of the CODC computation
- Next compute CODCs of node k in terms of PIs
  - Substitute each internal node literal by its global BDD
- Compute the image of this function in the space of local fanins of node k
  - Yields the CODC in terms of local fanins of node k (illustrated below)
- Finally, call ESPRESSO on the cover of node k, with the newly computed CODC as don’t care
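In the actual flow this image step is done with ROBDDs; the following explicit-enumeration sketch (illustrative only, and exponential in the number of PIs) shows what the step computes: the local don't cares of a node are the fanin patterns that are never produced by any care-set primary input minterm. The function and variable names are assumptions made for this sketch.

    from itertools import product

    def local_dont_cares(codc_over_pis, fanin_globals, n_pi):
        """Map a don't-care set expressed over primary inputs into the local
        fanin space of a node.

        codc_over_pis : set of PI minterms in the don't-care set
        fanin_globals : one global function per fanin, mapping a PI minterm to 0/1
        """
        care_image = set()
        for m in product((0, 1), repeat=n_pi):
            if m not in codc_over_pis:                    # care-set PI minterm
                care_image.add(tuple(g(m) for g in fanin_globals))
        every_pattern = set(product((0, 1), repeat=len(fanin_globals)))
        return every_pattern - care_image                 # never reached by the care set

    # Made-up example: node fanins g1 = a&b, g2 = a|b over PIs (a, b), empty CODC.
    g1 = lambda m: m[0] & m[1]
    g2 = lambda m: m[0] | m[1]
    print(local_dont_cares(set(), [g1, g2], 2))   # {(1, 0)}: g1=1, g2=0 never occurs (an SDC)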

8 Contributions of this Work
- Perform CODC based Don’t Care computation approximately
  - Yields 25X speedup
  - Yields 33X reduction in memory utilization
  - Obtains 80% of the literal reduction of the full CODC computation
- Handles large circuits extremely fast (circuits which CODC based computation times out on)
- Formal proof of correctness of the approximate CODC technique

9 Approximate CODCs
- Consider a sub-network rooted at the node j of interest
- The sub-network can have a user defined topological depth k (see the extraction sketch below)
- Compute the CODC of j in the sub-network (called the ACODC)
- This ACODC is a subset of the CODC of j
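A sketch of the sub-network extraction under one plausible reading of this slide: collect every node within topological distance k of the root, walking both fanins and fanouts, so the frontier fanins act as pseudo primary inputs (the cut V of the later proof slides) and the frontier fanouts as pseudo primary outputs (the cut W). The data structures and names are assumptions, not the SIS implementation.

    from collections import deque

    def extract_subnetwork(fanins, fanouts, root, k):
        """Return the node set of the depth-k sub-network rooted at `root`.
        `fanins` and `fanouts` map each node name to an iterable of node names."""
        depth = {root: 0}
        frontier = deque([root])
        while frontier:
            n = frontier.popleft()
            if depth[n] == k:            # nodes at distance k form the cuts V / W
                continue
            for m in list(fanins.get(n, ())) + list(fanouts.get(n, ())):
                if m not in depth:
                    depth[m] = depth[n] + 1
                    frontier.append(m)
        return set(depth)

    # Tiny made-up network: a, b -> n1 -> z
    fanins  = {"a": [], "b": [], "n1": ["a", "b"], "z": ["n1"]}
    fanouts = {"a": ["n1"], "b": ["n1"], "n1": ["z"], "z": []}
    print(extract_subnetwork(fanins, fanouts, "n1", 1))   # {'n1', 'a', 'b', 'z'}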

10 Algorithm

Traverse η in reverse topological order
for (each node j in network η) do
    η_j = extract_subnetwork(j, k)
    ACODC(j) = compute_acodc(η_j, j)
    optimize(j, ACODC(j))
end for

(A runnable sketch of the traversal order follows.)
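The reverse topological traversal itself can be reproduced with the Python standard library; a minimal, runnable illustration on a made-up three-node network, with the per-node ACODC calls only indicated in a comment:

    from graphlib import TopologicalSorter

    # Node -> set of fanins; a, b, c are primary inputs.
    fanins = {
        "a": set(), "b": set(), "c": set(),
        "n1": {"a", "b"},
        "n2": {"b", "c"},
        "z": {"n1", "n2"},       # primary output
    }

    # static_order() lists fanins before fanouts, so its reverse visits
    # primary outputs first -- the order the ACODC loop needs.
    for j in reversed(list(TopologicalSorter(fanins).static_order())):
        if fanins[j]:
            print("visit", j)    # here: extract_subnetwork, compute_acodc, optimize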

11 Proof of Correctness
- Terminology
  - Boolean network η_XZ
  - X: primary inputs, Z: primary outputs
  - W and V are two cuts
  - η_XW, η_VZ and η_VW define sub-networks
  - CODC_PQ(y_k) is the CODC of y_k computed in the network η_PQ, where P is either X or V and Q is either W or Z
  - CODC_PQ^loc(y_k) is the CODC of y_k mapped back to its fanin support after image computation
[Figure: node y_k with function f_k and fanins y_1, …, y_{i-1}, y_i, lying between the cuts V and W, with primary inputs x and primary outputs z]

12 Cutset as Primary Output
- To show: CODC_XZ(y_k) ≥ CODC_XW(y_k)
- For any PO z, CODC(z) = ø
- For the W nodes inside η_XZ, CODC ≠ ø
- For the W nodes as POs of η_XW, CODC = ø
- The CODC computation of y_k is identical for both cases except the last term in the equation
- In general, the last term for a node in the first case contains the last term for the same node in the latter case, since CODC_XZ ≥ CODC_XW for the fanout-side nodes already processed (starting from the W nodes themselves)
- Hence CODC_XZ(y_k) ≥ CODC_XW(y_k)

13 Cutset as Primary Input
- Define the sub-network η_VZ, in which the nodes of the cut V act as primary inputs; each V variable is a function of the X variables in the full network
- To compute the ACODC at y_k, compute CODC_VZ(y_k), then compute the image I_1 of this on the V space, and then project the result back to the local fanins of y_k
- The full CODC is CODC_XZ(y_k). We then compute the image I_2 of this on the X space, and next project the result back to the local fanins of y_k
- I_3 is the projection of I_2 on V
- Hence I_3 ≥ I_1
- Finally, CODC_XZ^loc(y_k) ≥ CODC_VZ^loc(y_k)

14 Cutsets as Primary Input and Primary Output
- This result follows directly from the previous two proofs, as they are orthogonal
- Hence CODC_VW(y_k) ≤ CODC_XZ(y_k)
- Therefore, an ACODC computation which utilizes a sub-network of depth k rooted at any node yields a subset of the full CODC of the node (summarized below)
- This proves the correctness of our method
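The containment chain of slides 12-14 can be written in one line, using the CODC_PQ(y_k) notation of slide 11 (the exact symbols are an assumption of this LaTeX summary):

    % Cut W as primary outputs:  CODC_{XW}(y_k) \subseteq CODC_{XZ}(y_k)
    % Cut V as primary inputs:   CODC_{VZ}(y_k) \subseteq CODC_{XZ}(y_k)
    % Combining the two orthogonal results for the depth-k sub-network \eta_{VW}:
    \[
      ACODC(y_k) \;=\; CODC_{VW}(y_k) \;\subseteq\; CODC_{XZ}(y_k) \;=\; CODC(y_k).
    \]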

15 Experimental Results
- Implemented in SIS
- Used mcnc91 and itc99 benchmark circuits
- Run on an IBM IntelliStation (1.7 GHz Pentium-4 with 1 GB RAM) running Linux
- Our algorithm is built as a replacement for full_simplify
  - Read the design and run the ACODC algorithm followed by sweep
  - For comparison, run full_simplify followed by sweep (a sample session is sketched below)
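For reference, the baseline flow corresponds to an interactive SIS session roughly like the following; the circuit file name is made up, and the exact scripts used for the experiments are not given in the slides:

    sis> read_blif circuit.blif
    sis> full_simplify
    sis> sweep
    sis> print_stats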

16 Metrics for Comparison
- 3 measures of effectiveness for comparison with full_simplify
- Effectiveness #1: the ratio of the number of don’t-care minterms computed by our technique to the number computed by full_simplify
- Effectiveness #2: the number of nodes for which the ACODCs and CODCs are identical
- We also compare the literal count reduction obtained by both techniques (a sketch of these metrics follows below)
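A small illustration of how the three metrics could be tallied from per-node results; the data layout and function name are assumptions made for this sketch, not the paper's implementation:

    def comparison_metrics(acodc, codc, lits_orig, lits_acodc, lits_fs):
        """acodc, codc : dict node -> set of don't-care minterms from each method
        lits_*        : total literal counts (original, after ACODC, after full_simplify)."""
        eff1 = 100.0 * sum(map(len, acodc.values())) / sum(map(len, codc.values()))
        eff2 = 100.0 * sum(acodc.get(n) == codc[n] for n in codc) / len(codc)
        lit_red_acodc = 100.0 * (lits_orig - lits_acodc) / lits_orig
        lit_red_fs = 100.0 * (lits_orig - lits_fs) / lits_orig
        return eff1, eff2, lit_red_acodc, lit_red_fs

    # Made-up numbers purely to exercise the function:
    acodc = {"n1": {1, 2}, "n2": {3}}
    codc  = {"n1": {1, 2}, "n2": {3, 4}}
    print(comparison_metrics(acodc, codc, 1000, 780, 720))   # (75.0, 50.0, 22.0, 28.0)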

17 Effectiveness Results

Circuit | Eff1 (k=4) | Eff1 (k=6) | Eff2 (k=4) | Eff2 (k=6) | Lits-original | Lits % (fs) | Lits % (k=4) | Lits % (k=6)
C1355 | 98.04 | | 98.34 | | 1032 | 4.65 | 3.88 |
C1908 | 81.56 | 84.69 | 87.13 | 88.89 | 1497 | 37.14 | 30.66 | 31.46
C2670 | 94.13 | | 86.79 | | 2043 | 39.30 | 32.94 |
C432 | 71.43 | | 92.81 | | 372 | 19.89 | 9.95 |
C499 | 98.56 | | 97.34 | | 616 | 7.79 | 6.50 |
C880 | 80.00 | 84.44 | 94.56 | 95.77 | 703 | 11.10 | 9.67 | 10.38
C3540 | 85.43 | 97.81 | 84.15 | 97.51 | 2934 | 33.78 | 26.89 | 28.42
dalu | 78.00 | 79.86 | 75.78 | 79.55 | 3588 | 39.68 | 9.70 |
i10 | 99.34 | | 85.45 | | 5376 | 29.55 | 27.47 |
b01_C | 92.68 | | 83.33 | | 80 | 45.00 | 43.75 |
b03_C | 68.89 | 75.56 | 87.23 | 89.43 | 254 | 60.00 | 39.37 | 41.34
b04_C | 63.42 | | 85.35 | | 1267 | 31.96 | 28.65 |
b05_C | 74.70 | 84.85 | 76.38 | 88.02 | 1858 | 45.80 | 14.96 |
b06_C | 92.11 | | 87.10 | | 83 | 51.80 | 45.78 |
b07_C | 69.52 | 81.90 | 91.02 | 95.21 | 749 | 11.88 | 11.08 |
b08_C | 98.33 | | 96.09 | | 306 | 9.80 | 9.48 |
b09_C | 79.00 | 95.00 | 83.04 | 90.18 | 277 | 61.00 | 44.04 | 45.85
b10_C | 80.65 | 83.87 | 92.90 | 94.19 | 353 | 12.39 | 11.05 |
b11_C | 83.44 | 85.94 | 89.50 | 91.65 | 1378 | 22.71 | 14.36 |
b12_C | 67.10 | 79.04 | 87.17 | 90.92 | 1967 | 24.05 | 5.80 |
b13_C | 65.08 | | 91.53 | | 558 | 18.81 | 10.57 |
AVG | 81.97 | 85.52 | 87.85 | 89.52 | - | 28.36 | 22.34 | 22.82

- Literal reduction is about 80% of full_simplify
- Very little improvement from k=4 to k=6

18 Runtime and Memory Results
- Runtime is about 25X better than full_simplify
- Memory utilization is about 33X better than full_simplify

Circuit | Time (fs) | Time % (k=4) | Time % (k=6)
C1355 | 39.28 | 1.66 | 1.80
C1908 | 54.68 | 2.40 | 2.50
C2670 | 11.77 | 4.20 | 4.66
C432 | 4.91 | 1.25 | 1.45
C499 | 2.41 | 1.20 | 1.31
C880 | 2.05 | 0.70 | 0.72
C3540 | 835.64 | 25.25 | 27.45
dalu | 210.09 | 6.23 | 7.12
i10 | 332.22 | 8.56 | 9.21
b01_C | 0.03 | 0.05 |
b03_C | 0.19 | 0.20 |
b04_C | 12.15 | 1.47 | 1.66
b05_C | 24.50 | 2.43 | 2.50
b06_C | 0.04 | |
b07_C | 4.04 | 0.84 | 0.86
b08_C | 0.30 | |
b09_C | 0.37 | 0.26 | 0.27
b10_C | 0.40 | 0.30 | 0.32
b11_C | 9.97 | 0.26 | 0.34
b12_C | 81.72 | 3.23 | 3.55
b13_C | 0.28 | 0.20 | 0.21
AVG | - | 0.037 | 0.041

Memory is reported for each circuit in the columns Mem (fs), Mem (k=4) and Mem (k=6); the AVG row gives ratios of 0.028 (k=4) and 0.032 (k=6) relative to full_simplify.

19 Results for Large Circuits
- full_simplify did not complete on any of the examples below
- k = 4 for these experiments
- Maximum runtime < 2 minutes
- Peak memory utilization < 106K BDD nodes

Circuit | #Nodes | #Literals | Node% | Lit% | Time(s) | Mem
C6288 | 2416 | 4800 | 4.39 | 3.69 | 3.65 | 1022
C7552 | 3466 | 6098 | 40.68 | 26.15 | 6.11 | 9198
b14 | 9768 | 18917 | 17.15 | 10.12 | 117.60 | 105582
b14_1 | 6570 | 12886 | 20.03 | 9.56 | 19.04 | 50078
b20 | 19683 | 38213 | 18.64 | 8.68 | 65.39 | 69496
b20_1 | 13900 | 27074 | 18.96 | 8.95 | 39.34 | 34748
b21 | 20028 | 38993 | 17.91 | 8.48 | 66.37 | 65408
b21_1 | 13899 | 27164 | 17.56 | 9.45 | 38.32 | 34748
AVG | - | - | 18.77 | 9.49 | - | -

20 Possible Extensions
- Can compute AODCs in a similar fashion
  - Yields more flexibility at a node
  - However, each node must be minimized after its AODC computation
  - Compatibility not maintained
- Useful if only node minimization is desired
  - Compatibility is useful if the nodes are to be optimized simultaneously at a later stage
- Proof of correctness is similar

21 Conclusions
- Presented a robust technique for ACODC computation
  - Dynamic extraction of sub-networks to compute CODCs
  - ACODCs computed exactly once for a node
- 19% reduction in node count and 9.5% reduction in literal count (large circuits)
- 23% reduction in literal count as compared to 28.5% for full_simplify (medium circuits)
- 25X better run-time than full_simplify
- 33X better memory utilization than full_simplify

