Alan Mishchenko Robert Brayton UC Berkeley

Sequential Optimization by Detecting and Merging Sequentially Equivalent Nodes

Overview
- Introduction
- Computations
  - SAT sweeping
  - Induction
  - Partitioning
- Verification
- Experiments
- Future work

Introduction
- Combinational synthesis
  - Cuts at the register boundary
  - Preserves state encoding, scan chains, and test vectors
  - No sequential optimization; easy to verify
- Traditional sequential synthesis
  - Runs retiming and re-encoding, uses sequential don’t-cares, etc.
  - Changes state encoding; invalidates scan chains and test vectors
  - Some degree of sequential optimization; hard to verify
- Proposed sequential synthesis
  - Merges sequentially equivalent registers and internal nodes
  - Minor changes to state encoding, scan chains, and test vectors
  - Some degree of sequential optimization; easy to verify!

Comb and Seq Equivalences
- Comb equivalence: nodes A and B are equal for all combinations of primary inputs and register values (the complete state space)
- Seq equivalence: nodes A and B are equal for all input combinations in reachable states, but may differ in unreachable states
[Figure: structural view (nodes A and B) and functional view (complete state space vs. reachable state space)]
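The distinction can be checked by brute force on a toy circuit. The sketch below is illustrative only: the two-register circuit, its reset state, and the function names are assumptions made for this example and do not come from the slides. Both registers load the same input, so they agree in every reachable state, yet they can differ in the complete state space.

```python
from itertools import product

INIT = (0, 0)  # hypothetical reset state of registers (r0, r1)

def next_state(state, x):
    # Both registers load the same input, so they agree after any transition.
    return (x, x)

def node_a(state, x):
    r0, _ = state
    return r0 & x

def node_b(state, x):
    _, r1 = state
    return r1 & x

def reachable_states():
    # Explicit-state traversal; only feasible for toy examples.
    seen = {INIT}
    frontier = [INIT]
    while frontier:
        s = frontier.pop()
        for x in (0, 1):
            t = next_state(s, x)
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return seen

def equal_on(states):
    # True if nodes A and B agree for every given state and every input.
    return all(node_a(s, x) == node_b(s, x) for s in states for x in (0, 1))

ALL_STATES = set(product((0, 1), repeat=2))
comb_equiv = equal_on(ALL_STATES)        # fails in state (1, 0) with x = 1
seq_equiv = equal_on(reachable_states()) # holds on {(0, 0), (1, 1)}
```

Here A and B are sequentially but not combinationally equivalent, which is the kind of redundancy the proposed method targets.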

How Does It Work?
- Can handle 1M gates and 100K flops flat, with runtime in minutes on a single core
- Uses a SAT solver to compute subsets of reachable states without computing the full set of reachable states
- No BDDs, no partitioning, no hints based on names

Why Improvements?
- Why do we get 10%+ average reduction in the number of flops, area, and power for some design families and benchmark suites?
  - Redundant registers generated by RTL compilers
  - Design reuse, when redundant blocks (or redundant functions) are kept for the sake of design integrity
  - Logic duplication for modularity

Combinational SAT Sweeping
- Naïve CEC approach: SAT solving
  - Build the output miter and call SAT
  - Works well for many easy problems
- Better CEC approach: SAT sweeping based on incremental SAT solving
  - Detects possibly equivalent nodes using simulation
    - Candidate constant nodes
    - Candidate equivalent nodes
  - Runs SAT on the intermediate miters in a topological order
  - Refines the candidates using counterexamples
[Figure: applying SAT to the output miter vs. proving internal equivalences in a topological order (SAT-1 on A and B, SAT-2 on C and D, then SAT-3)]
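The simulate, check, and refine loop can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the ABC implementation: the netlist is hypothetical, the SAT call on each intermediate miter is replaced by exhaustive enumeration of input patterns, and proved equivalences are not merged into the network as a real sweeper would do.

```python
from itertools import product

INPUTS = ["a", "b", "c"]
OPS = {
    "and": lambda x, y: x & y,
    "or": lambda x, y: x | y,
    "xor": lambda x, y: x ^ y,
}
# Hypothetical netlist listed in topological order: name -> (op, fanin0, fanin1)
NODES = {
    "n1": ("and", "a", "b"),
    "n2": ("and", "b", "a"),   # same function as n1
    "n3": ("or", "n1", "c"),
    "n4": ("or", "c", "n2"),   # same function as n3
    "n5": ("xor", "a", "b"),   # agrees with n1 only when a = b = 0
}

def evaluate(pattern):
    vals = dict(zip(INPUTS, pattern))
    for name, (op, x, y) in NODES.items():
        vals[name] = OPS[op](vals[x], vals[y])
    return vals

def candidate_classes(patterns):
    # Nodes with identical simulation signatures are candidate equivalents.
    sims = [evaluate(p) for p in patterns]
    classes = {}
    for name in NODES:
        signature = tuple(s[name] for s in sims)
        classes.setdefault(signature, []).append(name)
    return [c for c in classes.values() if len(c) > 1]

def find_counterexample(u, v):
    # Stand-in for a SAT call on the miter of u and v (exhaustive search here).
    for pattern in product((0, 1), repeat=len(INPUTS)):
        vals = evaluate(pattern)
        if vals[u] != vals[v]:
            return pattern
    return None

def sweep(patterns):
    patterns = list(patterns)
    while True:
        classes = candidate_classes(patterns)
        cex = None
        for cls in classes:
            rep = cls[0]  # first node in topological order
            for other in cls[1:]:
                cex = find_counterexample(rep, other)
                if cex is not None:
                    break
            if cex is not None:
                break
        if cex is None:
            return classes       # every remaining candidate is proved
        patterns.append(cex)     # the counterexample refines the classes

proved = sweep([(0, 0, 0), (0, 0, 1)])
```

With the two starting patterns, n5 initially lands in the same class as n1 and n2. The counterexample (0, 1, 0) splits it off, after which the classes {n1, n2} and {n3, n4} are proved.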

Sequential SAT Sweeping
- Sequential SAT sweeping is similar to the combinational one in that it detects node equivalences
- The difference is that the equivalences are sequential
  - They hold only in the reachable state space
- Every comb. equivalence is a seq. one, but not vice versa
  - It makes sense to run comb. SAT sweeping beforehand
- Sequential equivalence is proved by K-step induction
  - Base case
  - Inductive case
- Efficient implementation of induction is key!

Base Case and Inductive Case
Candidate equivalences: {A,B}, {C,D}
[Figure, base case: proving internal equivalences in initialized frames 0 through K-1, starting from the initial state]
[Figure, inductive case: assuming internal equivalences in uninitialized frames 0 through K-1, starting from a symbolic state, and proving internal equivalences in a topological order in frame K]
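The two cases of K-step induction can be written out explicitly for a toy design. The sketch below is an assumption-laden illustration: the four-register circuit and all names are hypothetical, a single candidate equivalence (p == q) is checked, and exhaustive enumeration of states and inputs stands in for the SAT-based checks. The example is chosen so that simple induction (K = 1) fails while K = 2 succeeds.

```python
from itertools import product

N_REGS = 4
INIT = (0, 0, 0, 0)  # hypothetical reset state of registers (p, q, a, b)

def step(state, x):
    # p and q copy a and b; a and b both load the primary input x.
    p, q, a, b = state
    return (a, b, x, x)

def candidate_holds(state):
    # Candidate equivalence under test: registers p and q are equal.
    p, q, _, _ = state
    return p == q

def unroll(state, inputs):
    # States visited in frames 0..len(inputs).
    trace = [state]
    for x in inputs:
        state = step(state, x)
        trace.append(state)
    return trace

def base_case(k):
    # The candidate must hold in initialized frames 0..k-1.
    return all(
        all(candidate_holds(s) for s in unroll(INIT, xs))
        for xs in product((0, 1), repeat=k - 1)
    )

def inductive_case(k):
    # Start from an arbitrary (symbolic) state; if the candidate holds in
    # frames 0..k-1, it must also hold in frame k.
    for s0 in product((0, 1), repeat=N_REGS):
        for xs in product((0, 1), repeat=k):
            trace = unroll(s0, xs)
            assumed = all(candidate_holds(s) for s in trace[:k])
            if assumed and not candidate_holds(trace[k]):
                return False
    return True

def proved_by_induction(k):
    return base_case(k) and inductive_case(k)
```

With K = 1 the inductive case fails from the unreachable state (0, 0, 0, 1). With K = 2 the assumption in frame 1 forces a == b in frame 0, so the proof goes through. In a full signal-correspondence run the pair {a, b} would itself be a candidate, which is one reason all candidates are assumed together.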

Efficient Implementation
- Both the base and inductive cases of K-step induction are runs of combinational SAT sweeping
  - Tricks and know-how of combinational sweeping are applicable
  - The same integrated package can be used
    - Starts with simulation
    - Performs node checking in a topological order
    - Benefits from counterexample simulation
- Speculative reduction
  - Has to do with how the assumptions are made (see next slide)

Speculative Reduction
- Inputs to the inductive case
  - Sequential circuit
  - The number of frames to unroll (K)
  - Candidate equivalence classes
    - One node in each class is designated as the representative node
    - Currently the representatives are the first nodes in a topological order
- Speculative reduction moves fanouts to the representative nodes
  - Makes 80% of the constraints redundant
  - Dramatically simplifies the resulting timeframes (observed 3x reductions)
  - Leads to saving x in runtime during incremental SAT solving
[Figure: adding assumptions without speculative reduction vs. adding assumptions with speculative reduction (nodes A and B)]
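A rough sketch of the fanout-moving step follows. It is an interpretation, not the ABC code: the netlist and classes are hypothetical, and the idea that a constraint becomes redundant when the two nodes turn structurally identical after the move is an assumption about how the reported savings arise.

```python
# Hypothetical netlist in topological order: name -> (op, fanin0, fanin1)
NODES = {
    "n1": ("and", "a", "b"),
    "n2": ("and", "b", "a"),
    "n3": ("or", "n1", "c"),
    "n4": ("or", "c", "n2"),
    "n5": ("and", "a", "n3"),
    "n6": ("and", "n4", "n5"),  # equals n5 whenever n3 == n4
}
# Candidate classes; the first node of each class is its representative.
CLASSES = [["n1", "n2"], ["n3", "n4"], ["n5", "n6"]]

def speculative_reduce(nodes, classes):
    # Redirect every fanin that is a class member to the class representative.
    rep = {member: cls[0] for cls in classes for member in cls[1:]}
    return {
        name: (op, rep.get(x, x), rep.get(y, y))
        for name, (op, x, y) in nodes.items()
    }

def structural_key(node):
    # Structural-hashing key; every op used here is commutative.
    op, x, y = node
    return (op,) + tuple(sorted((x, y)))

def remaining_constraints(nodes, classes):
    # A constraint rep == member is dropped when the reduced nodes are
    # structurally identical, because their miter is trivially constant.
    reduced = speculative_reduce(nodes, classes)
    keep = []
    for cls in classes:
        for member in cls[1:]:
            if structural_key(reduced[cls[0]]) != structural_key(reduced[member]):
                keep.append((cls[0], member))
    return keep
```

In this example n4 becomes or(c, n1), which hashes to the same structure as n3, so only the constraint for (n5, n6) still needs a SAT check.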

Other Observations
- Surprisingly, the following are found to be of little or no importance for speeding up the inductive prover:
  - The quality of initial equivalence classes
    - How much simulation (semi-formal filtering) was applied
  - AIG rewriting on speculated timeframes
    - Although the AIG can be reduced by 20%, incremental SAT runs the same
  - The quality of AIG-to-CNF conversion
    - Naïve conversion (1 AIG node = 3 clauses) works just fine
- Open question: given these observations, how to speed up this type of incremental SAT?
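The naïve conversion mentioned above is the standard three-clause encoding of an AND gate. The sketch below shows it with DIMACS-style integer literals (a negative number denotes a complemented edge). The function names are illustrative and are not taken from ABC.

```python
from itertools import product

def and_to_cnf(z, x, y):
    # Clauses for z <-> (x AND y): z implies x, z implies y, (x and y) implies z.
    return [[-z, x], [-z, y], [z, -x, -y]]

def aig_to_cnf(and_nodes):
    # and_nodes: list of (output, fanin0, fanin1) literal triples.
    clauses = []
    for z, x, y in and_nodes:
        clauses.extend(and_to_cnf(z, x, y))
    return clauses

def lit_value(lit, assignment):
    value = assignment[abs(lit)]
    return value if lit > 0 else not value

def satisfied(clauses, assignment):
    return all(any(lit_value(l, assignment) for l in clause) for clause in clauses)

# Example: variable 3 = (variable 1) AND NOT (variable 2).
example = and_to_cnf(3, 1, -2)
consistent = all(
    satisfied(example, {1: a, 2: b, 3: c}) == (c == (a and not b))
    for a, b, c in product((False, True), repeat=3)
)
```

The check at the end confirms that the three clauses are satisfied exactly by the assignments in which the output matches the AND of its fanins.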

Verification after Sequential Synthesis
- Poison and antidote are the same!
- Two conceptually similar inductive provers can be used
  - During synthesis: to prove seq equivalence of registers and nodes
  - During verification: to prove seq equivalence of registers, nodes, and POs of two circuits
- Verification mentioned here is formal, that is, “unbounded” and “general-case”
  - No limit on the input sequence is imposed (unlike BMC)
  - No information about synthesis is passed to the verification tool
- The runtimes of synthesis and verification are comparable
  - Scales to 100K-register designs, due to partitioning for induction
[Figure: synthesis problem (circuit N1 with inputs X) and equivalence checking problem (miter M of circuits N1 and N2 sharing inputs X)]
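The equivalence checking problem can be illustrated with a toy sequential miter. The sketch below is not the inductive prover described in the slides: it substitutes explicit enumeration of the reachable states of the product machine, which only works for tiny examples. Both machines and all names are hypothetical. Machine 2 holds two redundant copies of the register in machine 1, as an unoptimized design might.

```python
def m1_step(s, x):
    # Machine 1: a single register that toggles when x = 1.
    return s ^ x

def m1_out(s, x):
    return s

def m2_step(s, x):
    # Machine 2: two registers that always toggle together.
    r0, r1 = s
    return (r0 ^ x, r1 ^ x)

def m2_out(s, x):
    r0, r1 = s
    return r0 & r1

def miter_equivalent(init1, init2):
    # Drive both machines with the same inputs and compare outputs in every
    # reachable state of the product machine.
    seen = {(init1, init2)}
    frontier = [(init1, init2)]
    while frontier:
        s1, s2 = frontier.pop()
        for x in (0, 1):
            if m1_out(s1, x) != m2_out(s2, x):
                return False
            nxt = (m1_step(s1, x), m2_step(s2, x))
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return True
```

From matching initial states, `miter_equivalent(0, (0, 0))` succeeds because the two registers of machine 2 never diverge. Starting machine 2 in (0, 1) exposes a difference after one step.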

Integrated SEC Flow
The following is the sequence of transformations currently applied by the integrated SEC in ABC (command “dsec”):
- creating sequential miter (“miter -c”)
  - PIs/POs are paired by name; if some registers have don’t-care init values, they are converted by adding new PIs and muxes; all logic is represented in the form of an AIG
- sequential sweep (“scl”)
  - removes logic that does not fan out into POs
- structural register sweep (“scl -l”)
  - removes stuck-at-constant and combinationally equivalent registers
- most forward retiming (“retime -M 1”) (disabled by switch “-r”, e.g. “dsec -r”)
  - moves all registers forward and computes new initial state
- partitioned register correspondence (“lcorr”)
  - merges sequentially equivalent registers (completely solves SEC after retiming)
- combinational SAT sweeping (“fraig”)
  - merges combinationally equivalent nodes before running signal correspondence
- for ( K = 1; K ≤ 16; K = K * 2 )
  - signal correspondence (“ssw”) // merges seq equivalent signals by K-step induction
  - AIG rewriting (“drw”) // minimizes and restructures combinational logic
  - most forward retiming // moves registers forward after logic restructuring
  - sequential AIG simulation // targets satisfiable SAT instances
- post-processing (“write_aiger”)
  - if the sequential miter is still unsolved, dumps it into a file for future use

Example of Seq. Synthesis in ABC
abc 01> r iscas/blif/s38417.blif   // reads in an ISCAS’89 benchmark
abc 02> st; ps   // shows the AIG statistics after structural hashing
s : i/o = 28/ 106 lat = and = (exor = 178) lev = 31
abc 03> ssw -K 1 -v   // performs one round of signal correspondence using simple induction
Initial fraiging time = sec
Simulating 9096 AIG nodes for 32 cycles ... Time = sec
Original AIG = Init 2 frames = 84. Fraig = 82. Time = sec
Before BMC: Const = Class = Lit =
After BMC: Const = Class = Lit =
0 : Const = Class = L = LR = NR =
1 : Const = Class = L = LR = NR =
…
28 : Const = Class = L = LR = NR =
29 : Const = Class = L = LR = NR =
SimWord = 1. Round = Mem = 0.38 Mb. LitBeg = LitEnd = 753. ( %). Proof = Cex = Fail = 0. FailReal = 0. C-lim = ImpRatio = %
NBeg = NEnd = (Gain = %). RBeg = REnd = (Gain = %).
AIG simulation = sec
AIG traversal = sec
SAT solving = sec
Unsat = sec
Sat = sec
Fail = sec
Class refining = sec
TOTAL RUNTIME = sec
abc 04> ps   // shows the AIG statistics after merging equivalent registers and nodes
s : i/o = 28/ 106 lat = and = (exor = 116) lev = 31
abc 04> dsec -r   // runs the unbounded SEC on the resulting network against the original one
Networks are equivalent. Time = sec

Experimental Results
- Public benchmarks
  - 25 test cases
  - ITC ’99 (b14, b15, b17, b20, b21, b22)
  - ISCAS ’89 (s13207, s35932, s38417, s38584)
  - IWLS ’05 (systemcaes, systemcdes, tv80, usb_funct, vga_lcd, wb_conmax, wb_dma, ac97_ctrl, aes_core, des_area, des_perf, ethernet, i2c, mem_ctrl, pci_spoci_ctrl)
- Industrial benchmarks
  - 50 test cases
  - Nothing else is known
- Workstation
  - Intel Xeon 2-CPU 4-core, 8 GB RAM

ABC Scripts
- Baseline
  - choice; if; choice; if; choice; if // comb synthesis and mapping
- Register correspondence (Reg Corr)
  - scl -l // structural register sweep
  - lcorr // register correspondence using partitioned induction
  - dsec -r // SEC
- Signal correspondence (Sig Corr)
  - ssw // signal correspondence using non-partitioned induction

Public Benchmarks
Columns “Baseline”, “Reg Corr”, and “Sig Corr” show geometric means.

ITC / ISCAS Benchmarks (details)

IWLS’05 Benchmarks (details)

ITC / ISCAS Benchmarks (runtime)

IWLS’05 Benchmarks (runtime)

Industrial Benchmarks
In case of multiple clock domains, optimization was applied only to the domain with the largest number of registers.

Future Work
- Continue tuning for scalability
  - Speculative reduction
  - Partitioning
- Experiment with new ideas
  - Unique-state constraints
  - Interpolate when induction fails
  - Synthesizing equivalence
- Go beyond merging sequential equivalences
  - Add logic restructuring using subsets of unreachable states
  - Add retiming (improves delay on top of reg/area reductions)
  - Add iteration (led to improvements in other synthesis projects)
  - etc.
