Sequential Equivalence Checking : Need and Challenges


1 Sequential Equivalence Checking : Need and Challenges
Anmol Mathur Chief Architect Calypto Design Systems

2 Outline Why sequential equivalence checking?
Combinational vs sequential equivalence checking Existing system-level design/verification flows System-level to RTL equivalence checking RTL-RTL sequential equivalence checking Comparison to sequential property checking Taming sequential equivalence checking Demonstration of SLEC from Calypto 11/19/2018

3 Combinational Equivalence Checking
Most prevalent equivalence checking tools available today Appropriate for RTL to gate-level verification Expects exact 1-1 flop mapping and matching interfaces 11/19/2018

4 The Power of One-One State Mapping
Very strong inductive invariant Assuming that the one-one mapped flops (and inputs) are equal at time k, the next state functions (inputs to flops) and output functions (mapped primary outputs) are equal at time k+1 No state space analysis required Only combinational input constraints and output don’t cares needed 11/19/2018
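As a sketch of why a one-one flop mapping removes the need for state-space analysis: once the flops are assumed equal, the check reduces to comparing next-state and output functions over every (state, input) assignment, reachable or not. The two toy designs below (`rtl_step`, `gate_step`) are hypothetical and not from the deck.

```python
from itertools import product

# Hypothetical 2-flop, 1-input designs. Each returns (next_state, output).
def rtl_step(state, inp):
    q1, q0 = state
    n0 = q0 ^ inp
    n1 = q1 ^ (q0 & inp)
    return (n1, n0), q1 & q0

def gate_step(state, inp):
    q1, q0 = state
    # Same functions rebuilt from OR/AND/NOT, as a gate-level netlist might be.
    n0 = (q0 | inp) & (1 - (q0 & inp))
    carry = 1 - ((1 - q0) | (1 - inp))
    n1 = (q1 | carry) & (1 - (q1 & carry))
    out = 1 - ((1 - q1) | (1 - q0))
    return (n1, n0), out

def combinationally_equivalent(step_a, step_b, n_flops):
    """Compare next-state and output functions on all mapped-flop/input values."""
    for bits in product((0, 1), repeat=n_flops + 1):
        state, inp = bits[:n_flops], bits[n_flops]
        if step_a(state, inp) != step_b(state, inp):
            return False
    return True

print(combinationally_equivalent(rtl_step, gate_step, 2))
```

No reachability computation appears anywhere in the check; that is the strength of the invariant, and also why it stops working once the flops no longer correspond.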

5 System-Level to GDSII Process Flow
SLM Algorithmic Micro-architecture Manual Process Manual Process Process Flow RTL Imp. User Control Broad Control Limited Control Broad Control 11/19/2018

6 Levels of Systems Software Hardware SOC Hardware/software
Interface verification Hardware SOC Boundary assertion verification IP Block verification 11/19/2018

7 System-level Models Higher level of abstraction resulting in faster simulation turnaround time Performing architectural tradeoffs and performance validation Platform for software development This slide will motivate the main reasons why system-level models are designed. It will point out the verification flows people use for validating these models and how these models are used in the RTL functional verification flow.

8 Functional Verification Landscape
System level Simulation RTL-SL co-simulation Simulation/emulation based verification Assertion based verification RTL level RTL-gate Equivalence Checking Gate level 11/19/2018

9 RTL-RTL Equivalence Checking
Reasons Incremental development Feature creep Performance tuning Process migration Reuse across projects Common Refinements Buffer/Cache/Memory resizing Tuning cache replacement and coherence algorithms Pipeline insertion for performance Register retiming 11/19/2018

10 Sequential Equivalence Checking
Algorithmic Micro-architecture Algorithmic Micro-architecture SLM SEC Process Flow Process Flow RTL Imp. User Control User Control Broad Control Broad Control Limited Control Limited Control Broad Control Broad Control 11/19/2018

11 Sequential Equivalence Checking
SLM RTL Interface mapping Interface constraints SEC Abstraction mappings The intent here is to show where SLEC fits into the big picture of SLM-RTL verification. Define what are the inputs and outputs. Discuss what is the value proposition. Proved equivalences Counterexamples 11/19/2018

12 Key Advantages of SEC Allows complete verification of the RTL with respect to the SLM (the independently verified golden model) without testbenches Verification of the RTL is limited to the behaviors specified in the SLM Allows verification of RTL blocks to happen before the whole RTL or SLM is completed 11/19/2018

13 SEC vs Assertion Checking
Usability issues with assertion-based RTL verification Need to write design properties in a formal temporal logic Properties not independently verifiable How many properties are enough? Capturing input constraints and output don’t cares Technology issues with assertion-based verification Comparing a complete design against a very incomplete specification (property) Sequential analysis problem is harder than equivalence checking of two designs! 11/19/2018

14 Outline Why sequential equivalence checking?
Taming sequential equivalence checking Notions of sequential equivalence Key technology challenges Demonstration of SLEC from Calypto 11/19/2018

15 Combinational vs Sequential EC
Bit-accurate Identical data types Composite data types Precision/rounding differences Sequential EC Re-encoding of state Sequential differences Serial vs parallel interfaces scheduling pipelining Combinational EC FFs match Data representation differences 11/19/2018

16 Scheduling
The SLM could perform a computation in parallel while the RTL schedules the operations over multiple cycles with a scheduling FSM.
Scheduling introduces additional states.
Some operations that can use the same resource become temporally disjoint, resulting in resource sharing.
[Figure: SLM (inputs A, B, C; two adders; output O) vs RTL (inputs A, B, C, reset, clk; one adder; output o)]
Capture some other typical micro-architectural differences between the SLM and RTL. Put in diagram here for parallel vs sequential design.

17 Micro-architectural Abstractions
RTL has detailed micro-architectures like: Scan chains Sleep mode logic Clock gating Memory caches Bus-based communication along with bus arbitration Serial communication with handshakes P C SLM P C Handshake controller RTL 11/19/2018

18 Data Type Abstractions
SLM expression (E) could use C-like data types such as float, int, long and user-defined data types.
RTL expression (E’) could use finite-precision bit-vectors (signed/unsigned) to represent fixed-point data values.
RTL may explicitly perform rounding and truncation on intermediate computations.
SLM:
function sum_of_product(float a, float b, float c) {
    return a * b + c;
}
RTL:
module sum_of_product(a, b, c, out);
    input signed [7:0] a, b, c;
    wire signed [15:0] prod;
    wire signed [7:0] trunc_prod;
    assign prod = a * b;
    assign trunc_prod = prod >> 8;
    assign out = trunc_prod + c;
endmodule
This slide will introduce the difference between the data types (especially for arithmetic expressions) that are used in the output and next-state functions in M and M’.

19 Notions of Sequential Equivalence
Cycle-accurate equivalence Starting from reset, designs produce identical outputs every cycle when equal inputs are applied Sequential hardware equivalence (Pixley) Requires equivalence from a set of states reached via an initializing sequence Safe replacement (Singhal, Pixley, Aziz, Brayton) No assumption about reset states 11/19/2018
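Cycle-accurate equivalence can be sketched as a traversal of the product machine from the pair of reset states, checking that outputs agree for every input in every reachable state pair. The two parity designs below are hypothetical examples of a re-encoded state; the function names are illustrative only.

```python
from collections import deque

def cycle_accurate_equivalent(step_a, reset_a, step_b, reset_b, inputs):
    """Explore reachable state pairs from reset; outputs must match every cycle."""
    seen = {(reset_a, reset_b)}
    frontier = deque(seen)
    while frontier:
        sa, sb = frontier.popleft()
        for inp in inputs:
            next_a, out_a = step_a(sa, inp)
            next_b, out_b = step_b(sb, inp)
            if out_a != out_b:
                return False
            if (next_a, next_b) not in seen:
                seen.add((next_a, next_b))
                frontier.append((next_a, next_b))
    return True

# Design A keeps one parity bit; design B keeps a 2-bit count (re-encoded state).
def parity_step(state, inp):
    return state ^ inp, state

def counter_step(state, inp):
    return (state + inp) % 4, state & 1

print(cycle_accurate_equivalent(parity_step, 0, counter_step, 0, (0, 1)))
```

The flop counts differ (one versus two), so a combinational checker with a one-one flop map could not even be set up here, yet the designs are cycle-accurate equivalent from reset.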

20 FSM Refinement The states in RTL that correspond to states in SLM, are referred to as synchronizing states Si Sj Si’ Sj’ Refinement mapping SLM RTL Transient states Define the notion of synchronizing states and how they will be used in the process of equivalence checking 11/19/2018

21 Sequential Equivalence
SLM RTL Refinement mapping Transient states Starting from corresponding reset states, for corresponding inputs, if the outputs are equal in all corresponding synchronizing states, then SLM and RTL are equivalent Define the notion of synchronizing states and how they will be used in the process of equivalence checking 11/19/2018

22 Transactions : State View
SL transaction SLM Refinement mapping RTL RTL transaction Encapsulates one or more units of computation for the design being verified Self-contained since it brings the machine back to a synchronizing state 11/19/2018

23 Transactions Functional decomposition of the behaviors of a machine
Transaction 1 : opcodes ADD, SUB, MULT Transaction 2 : opcodes DIV, MOD Allows sequential verification problem to be contained Unconstrained problem is intractable Verification plan naturally decomposes behaviors Debugging ease Allows composition of different behaviors Sequential composition Parallel composition 11/19/2018

24 Transaction : Memory Design 1 Design 2
Design 1 transaction: a single memory read/write occurring in a single cycle.
Design 2 transaction: a single memory read/write (potentially) happening over multiple cycles.
[Figure: Design 1 is a memory (Mem) with RD, WR, ADDR, DATA and OUT; Design 2 has Mem, Cache and Cache ctl with the same RD, WR, ADDR, DATA and OUT ports]
Essentially talk about the issue of some memory hierarchies (and associated state machines if the memory access happens over multiple cycles) being absent in the SLM.

25 Transaction Equivalence
SLM T0 T1 T2 Starting at reset, for corresponding input sequences, if the design outputs are equal at transaction boundaries, then the designs are equivalent Transactions can be pipelined RTL T0 T1 T2 11/19/2018
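A minimal simulation-only sketch of comparing at transaction boundaries, assuming a hypothetical pair of designs: an SLM that sums four operands in one step and an RTL-like model that shares one adder over four clocks. Intermediate accumulator values are transient and are never compared.

```python
def spec_transaction(operands):
    """SLM: the whole transaction completes in a single step (8-bit result)."""
    return sum(operands) & 0xFF

class SerialImpl:
    """RTL-like model: one shared adder, one add per clock, 2-bit schedule counter."""
    def __init__(self):
        self.count = 0
        self.acc = 0

    def clock(self, operand):
        base = 0 if self.count == 0 else self.acc  # new transaction restarts the sum
        self.acc = (base + operand) & 0xFF
        self.count = (self.count + 1) % 4
        return self.acc

def equivalent_at_boundaries(transactions):
    """Compare outputs only when the impl returns to a synchronizing state."""
    impl = SerialImpl()
    for operands in transactions:
        for operand in operands:
            out = impl.clock(operand)  # values before the 4th clock are transient
        if out != spec_transaction(operands):
            return False
    return True

print(equivalent_at_boundaries([(1, 2, 3, 4), (250, 10, 0, 7)]))
```

This only exercises the transactions it is given; a formal tool would establish the same boundary property for all input sequences.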

26 Arithmetic Equivalence
Exact Equivalence For all the possible values in the input space, E and E’ evaluate to exactly the same value Bounded error equivalence For all corresponding values in the input spaces of E and E’, |E – E’| < ε Infinite precision equivalence Ignoring loss of information at any of the intermediate points in the expressions, the expressions evaluate the same function This slide discusses the different notions of arithmetic equivalence and their applicability 11/19/2018
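To make the first two notions concrete, here is a sketch that reuses the sum_of_product RTL from slide 18 as E’. The reference E is an assumption: the exact rational value of a*b/256 + c, where the division by 256 mirrors the RTL's `>> 8`. The RTL output width is not declared on the slide, so the final sum is left unwrapped here.

```python
from fractions import Fraction
from itertools import product

def to_signed(value, bits):
    """Wrap an integer into a two's-complement value of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def e_reference(a, b, c):
    # Assumed infinite-precision reference (not given in the deck).
    return Fraction(a * b, 256) + c

def e_rtl(a, b, c):
    prod = to_signed(a * b, 16)           # wire signed [15:0] prod
    trunc_prod = to_signed(prod >> 8, 8)  # keeps prod[15:8]
    return trunc_prod + c                 # output wrap not modelled

def max_abs_error(c_values=(-1, 0, 5)):
    """Largest |E - E'| over all signed 8-bit a, b and a few c values."""
    signed8 = range(-128, 128)
    return max(abs(e_reference(a, b, c) - e_rtl(a, b, c))
               for a, b in product(signed8, repeat=2) for c in c_values)

worst = max_abs_error()
print("exactly equivalent:", worst == 0)
print("bounded-error equivalent for eps = 1:", worst < 1)
```

Under these assumptions the two expressions are not exactly equivalent, but they are bounded-error equivalent for ε = 1, because truncation discards less than one unit of the scaled product.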

27 Outline Why sequential equivalence checking?
Taming sequential equivalence checking Notions of sequential equivalence Key technology challenges Specifying interface differences Sequential analysis High-capacity solvers Demonstration of SLEC from Calypto 11/19/2018

28 Specifying Interface Differences
Specification of input/output don’t cares Sequential signal relationships Combinational relations between signals Reset/non-reset values of signals Input mappings Factoring delay and throughput differences Handling protocol differences Blocking vs non-blocking communication Serial vs parallel communication 11/19/2018

29 Specifying Interface Differences
Output checks Handling latency and throughput differences Conditional output checks Variable delay or handshake-based checks Out-of-order checks Specifying transaction boundaries When can a new transaction start in the specification or implementation machine Only differences in the input and output interfaces need to be specified, not the actual input/output protocols

30 Specifying Interface Differences
Mem Cache Cache ctl ADDR DATA RD WR OUT reset out_rdy Specifying a transaction requires: Begin-transaction sequence During-transaction invariants Ready-for-next-transaction condition Output valid condition clk reset RD ADDR OUT out_rdy transaction 11/19/2018

31 Outline Why sequential equivalence checking?
Taming sequential equivalence checking Notions of sequential equivalence Key technology challenges Specifying interface differences Sequential analysis High-capacity solvers Demonstration of SLEC from Calypto 11/19/2018

32 Sequential Analysis T0 T1 T2 Machine acceleration SLM T0 T1 T2
Output checks = ? = ? = ? RTL T0 T1 T2 State induction check 11/19/2018

33 Sequential Analysis Issues
Efficient machine acceleration Cannot afford to replicate next-state/output functions in unrolling over many cycles Elimination of pipelining/transient states Aligning the machines Accounting for data-dependent delay between synchronizing states Accounting for out-of-order output checks 11/19/2018

34 Induction and Sequential EC
Base case The corresponding outputs are equal in transaction 0 assuming the spec and impl machines start in reset states and the input mappings/constraints are obeyed Induction hypothesis Assuming that the spec and impl have equal outputs for the first k transactions assuming input mappings/constraints, then the outputs will be identical for the k+1 th transaction Problem Corresponding states are no longer known Induction using the property that all reachable states (from reset) have been explored or purely by using the constraint that the outputs matched in the first k transactions 11/19/2018

35 Inductive Proofs State-based forward induction Output-based induction
Accumulating reachable synchronizing states during forward symbolic co-simulation SAT-based forward reachability Output-based induction Using equality of outputs in the first k transactions to prove equivalence of outputs at the k+1 th transaction Strengthening the induction invariant Mapping flops (user-driven or automatic) Finding flop maps automatically in the presence of latency/throughput differences Automatic refinement of cut flops on falsification 11/19/2018
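The need to strengthen the induction invariant can be illustrated on a hypothetical mod-3 counter, binary-encoded in the spec and one-hot in the impl. The inductive step is checked over all state pairs satisfying a hypothesis, reachable or not; the names below are illustrative only.

```python
from itertools import product

def spec_step(s, inc):   # s in 0..3 (two flops); 3 is unreachable from reset
    return (s + 1) % 3 if inc else s

def spec_out(s):
    return int(s == 2)

def impl_step(t, inc):   # t is a 3-bit tuple, one-hot when reachable
    return (t[2], t[0], t[1]) if inc else t

def impl_out(t):
    return t[2]

def step_holds(hypothesis):
    """Inductive step: hypothesis now implies equal outputs and hypothesis next."""
    for s, t, inc in product(range(4), product((0, 1), repeat=3), (0, 1)):
        if hypothesis(s, t):
            ns, nt = spec_step(s, inc), impl_step(t, inc)
            if spec_out(ns) != impl_out(nt) or not hypothesis(ns, nt):
                return False
    return True

def outputs_equal(s, t):
    """Weak hypothesis: only the outputs are known to match."""
    return spec_out(s) == impl_out(t)

def flop_map(s, t):
    """Strengthened hypothesis: impl state is the one-hot encoding of spec state."""
    return s < 3 and t == tuple(int(i == s) for i in range(3))

print("output-only induction:", step_holds(outputs_equal))
print("with flop map:", step_holds(flop_map))
```

The output-only hypothesis admits unreachable pairs such as spec state 0 with impl state (0, 1, 0), whose successors disagree, so the step fails even though the designs are equivalent from reset. The flop map rules those pairs out.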

36 Reasons for Incomplete Proofs
Weaker induction hypothesis in sequential EC No state point or state mapping typically available Harder solver problems generated since the next state and output checks may require unrolling across multiple cycles Reset required for base case – over-reset can cause incomplete proofs Constraints that span across transactions can invalidate induction 11/19/2018

37 Bounded Equivalence Checking
Symbolic simulation of the spec and impl machines for a fixed number of transactions from reset Bug-finding mode Coverage metrics How to quantify confidence in the equivalence of the machines from a bounded k-transaction proof? 11/19/2018

38 Outline Why sequential equivalence checking?
Taming sequential equivalence checking Notions of sequential equivalence Key technology challenges Specifying interface differences Sequential analysis High-capacity solvers Demonstration of SLEC from Calypto 11/19/2018

39 Solvers Word-level Solver Hybrid Solver Bit-level Simulation Solver

40 Word Level Solver Word-level solver Bit-level solver (SAT-based)
Expression size Run time BLS WLS Word-level solver Strength : proving arithmetic expressions equivalent Weakness : generating counterexamples Bit-level solver (SAT-based) Strength : proving expressions not equivalent Weakness : proving arithmetic expressions equivalent 11/19/2018

41 Finite Precision Reasoning
The nice algebraic properties of + and * are not true when arithmetic computations are done using finite precision.
wire signed [7:0] a, b, c;
wire signed [7:0] tmp;
wire signed [8:0] out;
assign tmp = a + b;
assign out = tmp + c;
!=
wire signed [7:0] a, b, c;
wire signed [7:0] tmp;
wire signed [8:0] out;
assign tmp = b + c;
assign out = tmp + a;
a = 2^7 - 1 b = out = -1 c = -1
a = 2^7 - 1 b = out = 2^7 - 1 c = -1
This slide will introduce the issues when arithmetic computations are done with finite precision and the nice algebraic properties are no longer true.
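The two Verilog fragments on this slide can be modelled directly to show that the grouping matters once `tmp` is truncated to 8 bits. The test value b = 1 is an assumption, since the transcript lost b; a = 2^7 - 1 and c = -1 are from the slide.

```python
def to_signed(value, bits):
    """Wrap an integer into a two's-complement value of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def grouping_ab_first(a, b, c):
    tmp = to_signed(a + b, 8)     # wire signed [7:0] tmp: the sum is truncated
    return to_signed(tmp + c, 9)  # wire signed [8:0] out

def grouping_bc_first(a, b, c):
    tmp = to_signed(b + c, 8)
    return to_signed(tmp + a, 9)

a, b, c = 2**7 - 1, 1, -1  # b = 1 is assumed, not taken from the slide
print(grouping_ab_first(a, b, c), grouping_bc_first(a, b, c))
```

With unbounded integers both groupings give a + b + c; here a + b overflows the 8-bit intermediate in one grouping and not in the other.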

42 Finite Precision + Control
Mixed control-arithmetic reasoning
Infinite precision canonization
Combination of theories: finite precision arithmetic, propositional logic
if (a + b < 0) x = a;
else if (a + b > 0) x = b;
else x = a + b;
if (a == -b) x = 0;
else if (a > -b) x = b;
else if (a < -b) x = a;
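The two if-chains above agree under infinite precision but can diverge once the arithmetic wraps. The sketch below assumes signed 8-bit operands and that `a + b` and `-b` are each truncated to 8 bits before the comparison; real Verilog or C width rules may differ.

```python
from itertools import product

def wrap8(value):
    """Two's-complement wrap to signed 8 bits."""
    value &= 0xFF
    return value - 256 if value & 0x80 else value

def chain_sum(a, b, add):
    s = add(a, b)
    if s < 0:
        return a
    if s > 0:
        return b
    return s

def chain_cmp(a, b, neg):
    nb = neg(b)
    if a == nb:
        return 0
    if a > nb:
        return b
    return a  # the remaining case is a < nb

def mismatches(add, neg):
    """All signed 8-bit (a, b) pairs on which the two chains disagree."""
    signed8 = range(-128, 128)
    return [(a, b) for a, b in product(signed8, repeat=2)
            if chain_sum(a, b, add) != chain_cmp(a, b, neg)]

infinite = mismatches(lambda a, b: a + b, lambda b: -b)
finite = mismatches(lambda a, b: wrap8(a + b), lambda b: wrap8(-b))
print(len(infinite), (100, 50) in finite)
```

A solver therefore cannot simply canonize both sides as infinite-precision arithmetic; it has to reason about the overflow cases together with the branch conditions.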

43 WLS – BLS Interface Leveraging word-level information in bit-level solvers Exploiting word-level symmetry information Using information about bits in a bus for variable ordering in BDDs and decision ordering in SAT Intelligent ordering of bit-level problems based on word-level analysis 11/19/2018

44 Simulation + Formal Solvers
Cuts E1 E2 Simulation – intermediate equivalent points Word and bit-level solvers work together Cut sets based on proven intermediate equivalences for proof simplification 11/19/2018
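One way to picture how simulation proposes intermediate equivalent points: simulate random vectors and group internal nodes whose values always agree. The node names and expressions below are hypothetical, and the resulting groups are only candidates that a formal solver would still have to prove.

```python
import random
from collections import defaultdict

def candidate_equivalences(nodes, input_names, n_vectors=64, seed=0):
    """Group nodes whose 8-bit simulated values agree on every random vector."""
    rng = random.Random(seed)
    vectors = [{name: rng.getrandbits(8) for name in input_names}
               for _ in range(n_vectors)]
    groups = defaultdict(list)
    for name, fn in nodes.items():
        signature = tuple(fn(v) & 0xFF for v in vectors)
        groups[signature].append(name)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)

# Hypothetical internal nodes of a spec and an impl netlist.
nodes = {
    "spec.sum": lambda v: v["a"] + v["b"],
    "impl.sum": lambda v: v["b"] + v["a"],
    "spec.dbl": lambda v: 2 * v["a"],
    "impl.dbl": lambda v: v["a"] << 1,
    "impl.xor": lambda v: v["a"] ^ v["b"],
}

for group in candidate_equivalences(nodes, ["a", "b"]):
    print(group)
```

Proven candidates can then serve as cut points, so the output proof is built from smaller problems instead of one monolithic comparison.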

45 Open Issues in Solvers Efficient identification of PENs in the presence of latency and throughput differences in the designs Which PENs to prove? Ordering of PEN proofs Use of predicate abstraction to simplify arithmetic-heavy proofs 11/19/2018

46 Outline Why sequential equivalence checking?
Taming sequential equivalence checking Notions of sequential equivalence Key technology challenges Demonstration of SLEC from Calypto 11/19/2018

47 Frontend Architecture
SystemC Sys.Verilog Verilog language X CPT API Loop Unrolling Dependence Analysis Flop/Mux inferencing Constant Propagation Dead code elimination Smart memory modeling Language Neutrality Support multiple languages scalably Language independent transforms CPT CPT to CDB xforms CDB CDB API SLEC Verification Engine SLS Synthesis Engine Future Products 11/19/2018

48 Verification Engine Architecture
Setup CDB Structural Decomposition Name based mappings Proof Simulation Engine Orchestration Sequential Analysis Solver Machine Acceleration Convergence Analysis Inductive Fixed point Just the picture of the overall architecture of the tool and how the various pieces fit in together WLS WSAT IPBDP BLS SAT BDD Simulation 11/19/2018

49 Demonstration Example
DES Encryption Block
Symmetric key encryption/decryption
64 bit data message, 64 bit key
16 rounds of computation
This demo example is a 64-bit DES example. DES, the Data Encryption Standard, is an older encryption standard of the US government. The data and key are both 64 bits long and produce a 64 bit result. The algorithm is symmetric, so the same key is used to decrypt the data. The core algorithm initially permutes the data and the key. 16 rounds of computation further permute the data and keys, and then a final inverse permutation is done on the data to produce the result.
In our demo, the switchbox permutation, inside the compute round, is replaced with a straight-through block to improve the run times for demonstration. (Although, with some of the 1.1 tool optimizations, we can probably handle the full DES with sub-10 minute runs.)
Note, the pictures I show here are optimized for hardware implementation. The actual algorithm description from NIST looks very different but is functionally equivalent. My implementations passed a few of the published results from the NIST validation suite. This algorithm would be used as part of a larger encryption flow for an entire message.

50 RTL – System Continuum C0 – Untimed Functional (C/SystemC)
C1 – Timed Functional (SystemC) V2 – Serial RTL (Verilog) V3 – Pipelined RTL (Verilog) SLEC can be used to check for functional equivalence between designs within the RTL-System continuum. Shown are four designs at different levels of abstraction, using SLEC to equivalence check them within the System space, within RTL space, and across the System-RTL transition. To save time, the demo will actually skip design C1 and will instead do C0-V2 and then V2-V3 (see the summary slide for final results).

51 SLEC – Setup and Operation
Two designs are setup and verified for functional equivalence. If not equivalent, a short counterexample demonstrates the differences. This slide and the next five slides show a schematic of the problem setup and operation. Alternatively, just walk through this slide and skip the next five. For common cases, most setup information is automatically inferred. 11/19/2018

52 SLEC – Setup and Operation
The spec and impl designs are read in. Large memories and other modules (unsynthesizable design descriptions for example) can be blackboxed while reading in the design. 11/19/2018

53 SLEC – Setup and Operation
Clocks are specified to achieve a common frame of reference for timing relationships. We support single clock and two-phase latching design descriptions. We have implemented support for multiple synchronous clocks, but it has not been thoroughly tested. For common cases, clocking information is automatically inferred. 11/19/2018

54 SLEC – Setup and Operation
Corresponding starting states, usually reset states, are described. We can do combinations of reset approaches- for example, simulate a reset sequence and then set all remaining uninitialized flops and latches to 0. Options include reset sequences, reset images, and explicit state setting. 11/19/2018

55 SLEC – Setup and Operation
Correspondence is established between the design interfaces, constraining input space and aligning output comparisons. Input and output ports, including black box ports, can be automatically matched by name. Inputs can be constrained to corresponding constrained input spaces. A typical example is constraining reset ports in the two devices to be always off. If the interfaces are different, the timing differences (when to apply corresponding input values or compare corresponding output differences) also need to be described. For common cases, interface information is automatically inferred. 11/19/2018

56 SLEC – Setup and Operation
The designs are verified and either proven equivalent, or short counterexamples are generated which demonstrate the differences. Verification results may be:
Full proof – proved equivalent forever
Bounded proof – proved equivalent for a bounded number of transactions (equivalent proof depth)
Falsified – an output difference was found and a counterexample provided

57 C0 - Untimed Functional Design
Untimed DES algorithm: written in C wrapped in SystemC. For this design, I wrote the DES algorithm as C functions using SystemC datatypes. I wrapped this functional design in a SystemC SC_METHOD process to create a single cycle design for verification. This is one way we would handle untimed C designs such as Mentor Catapult. 11/19/2018

58 C0 - Untimed Functional Design
C/SystemC design:
/** DES function. */
sc_bv<64> des(sc_bv<64> data_in, sc_bv<64> key_in, bool decrypt) {
    sc_bv<56> cd0;
    sc_bv<64> lr0, lr16, rl16, data_out;
    lr0 = des_ip(data_in);
    cd0 = des_ic(key_in);
    lr16 = des_compute(lr0, cd0, decrypt);
    rl16 = (lr16.range(31, 0), lr16.range(63, 32));
    data_out = des_iip(rl16);
    return data_out;
}
/** DES function wrapper. */
SC_MODULE(des_c0) {
    sc_out<sc_bv<64> > data_out;
    sc_in<sc_bv<64> > data_in, key_in;
    sc_in<bool> decrypt_in;
    SC_CTOR(des_c0) {
        SC_METHOD(evaluate);
        sensitive << data_in << key_in << decrypt_in;
    }
    void evaluate() {
        data_out.write(des(data_in.read(), key_in.read(), decrypt_in.read()));
    }
};
Simulation:
c0> sim c0
DES C0 ECB Mode Message Simulation
Original message ----> Calypto: Bridging System and RTL
Encrypted message ----> f4966ffc92c8df0c 6b6c41c883c aac9640b2e 0f434cdf068b53e1
Decrypted message ----> Calypto: Bridging System and RTL
simulation complete.
The code shown is only a portion of the untimed design and its wrapper. The simulation is a simple message encryption using DES.

59 V2 - Serial RTL Design Verilog RTL implementation:
Throughput = 16 cycles. Latency = 16 cycles. Here is a serial implementation of the DES algorithm. It takes 16 cycles to compute an output (output latency = 16) with new inputs being accepted every 16 cycles (throughput = 16). This has a small hardware implementation, but has long throughput and latency. 11/19/2018

60 V2 - Serial RTL Design Verilog Design Simulation v2> sim v2
Verilog design:
module des_v2_compute(lr_out, lr_in, cd_in, decrypt_in, rst, clk);
    output [63:0] lr_out;
    input [63:0] lr_in;
    input [55:0] cd_in;
    input decrypt_in, rst, clk;
    reg [3:0] round_reg;
    reg [63:0] lr_reg; wire [63:0] lr1, lr2;
    reg [55:0] cd_reg; wire [55:0] cd1, cd2;
    reg decrypt_reg; wire decrypt2;
    assign lr1 = (round_reg == 0) ? lr_in : lr_reg;
    assign cd1 = (round_reg == 0) ? cd_in : cd_reg;
    assign decrypt2 = (round_reg == 0) ? decrypt_in : decrypt_reg;
    des_round rnd(lr2, cd2, lr1, cd1, decrypt2, round_reg);
    always @(posedge clk)
        if (rst) begin
            round_reg <= 0; lr_reg <= 0; cd_reg <= 0; decrypt_reg <= 0;
        end else begin
            round_reg <= (round_reg == 15) ? 0 : round_reg + 1;
            lr_reg <= lr2; cd_reg <= cd2; decrypt_reg <= decrypt2;
        end
    assign lr_out = lr_reg;
endmodule
// additional modules...
Simulation:
v2> sim v2
DES V2 ECB Mode Message Simulation
Original message ----> Calypto: Bridging System and RTL
Encrypted message ----> f4966ffc92c8df0c eb6c41c883c70700 e15617aac9640b2e 0f434cdf068b53e1
Decrypted message ----> Calypto:`Bridgin' System and RTL
simulation complete.
Only a portion of the serial Verilog design is shown. The simulation is a simple message encryption using DES. Note the BUG: “`Bridgin'” instead of “ Bridging”. This bug would have been caught quickly by our initial verification step (sim-based validation), but we turn that off in the demo to find it using our formal algorithms.

61 C0V2 Verification Setup Specify designs: Specify clocks:
# c0v2 verification - run.tcl
Specify designs:
    build_design -spec -systemc c0/des_c0.h
    build_design -impl -verilog v2/des_v2.v v2/des_common.v
Specify clocks:
    # automatically inferred.
Specify reset state:
    create_waveform -name one -width 1 1
    create_constraint -impl -reset -waveform one impl.rst
Align interfaces:
    create_waveform -name zero -width 1 0+
    create_constraint -impl -waveform zero impl.rst
    set_global impl_throughput
    set_global impl_output_latency 16
Verify designs:
    verify -mode full_proof
This slide and the next five slides describe the first C0V2 setup. Alternatively, just walk through this slide and skip the next five.

62 C0V2 Verification Setup Specify designs: Specify clocks:
(Same run.tcl as on slide 61.) The designs are read in and built. The -verilog and -systemc flags are optional but used for documentation. The design description is assumed from the file suffixes.

63 C0V2 Verification Setup Specify designs: Specify clocks:
(Same run.tcl as on slide 61.) The clocks are automatically inferred for this design. This includes:
Creating clock sources
Identifying the clock ports and their proper edge sensitivities
Assuming the primary inputs’ clock sensitivities
Identifying the primary outputs’ clock sensitivities
We could have explicitly set the clock source or input/output sensitivities for documentation purposes.

64 C0V2 Verification Setup Specify designs: Specify clocks:
(Same run.tcl as on slide 61.) No reset info is required for C0. All of C0's computation occurs in a single cycle with no state held between computations, so any starting state is a valid reset state. V2 has control state which must be reset. We chose to do this with a reset sequence: impl.rst held high for an inferred single cycle.

65 C0V2 Verification Setup Specify designs: Specify clocks:
(Same run.tcl as on slide 61.) First, V2 is constrained to stay out of reset in the middle of a computation (C0 has no corresponding behavior). This is a common input constraint. Second, the input and output ports of C0 and V2 match by name; their correspondence is inferred by SLEC. The only difference is when to sample the inputs and when to compare the outputs. SLEC’s default model (new inputs every cycle, outputs sampled immediately after inputs are applied) is appropriate for C0. For V2, the corresponding inputs are applied every 16 cycles, and outputs are sampled 16 cycles after inputs are applied.

66 C0V2 Verification Setup Specify designs: Specify clocks:
(Same run.tcl as on slide 61.) There are two verification modes:
find_error = bug-finding mode (default), proves to a certain equivalency depth
full_proof = try to prove equivalent forever after reset
Not shown: set_global sim_based_validation 0, which turns off simulation-based validation.

67 C0V2 Verification Results
SLEC falsifies and generates a counterexample:
linux> slec run.tcl
slec> # c0v2 verification
... (abridged results log)
slec> verify -mode full_proof
[ORC-PON] Optimized 'impl' netlist
[ORC-PON] Optimized 'spec' netlist
[ORC-SDM] Mapping the Specification and Implementation Hierarchies ...
[ORC-UDM] Unrolling the design to obtain outputs and flops for transaction 1.
[ORC-UDDM] Successfully obtained outputs and flops for transaction 1.
[ORC-SSA] Starting to perform sequential analysis.
[SEQ-STFE] Performing Simulation to find intermediate equivalences ...
[SEQ-NPP1] Number of intermediate equivalence problems proven = 1136
[SIM-VMM] Simulation verified the falsification for map 'name_map<spec.data_out,impl.data_out>' in transaction 1.
[CEG-SGLT] Started generating Verilog testbench for counter-example...
<WRN> [CEG-UHNP] User design header not provided. SystemC counter-example testbench will fail to compile.
[CEG-FGLT] Finished generating Verilog testbench for counter-example.
[SIM-SVF] Split VCD files into file 'impl.vcd' and file 'spec.vcd'.
Output-pair: spec.data_out (throughput=1, latency=0) and impl.data_out (throughput=16, latency=16) falsified at transaction number 1.
Summary of key results: Proven Bounded-Proven Falsified Not-Attempted Output Pairs Flop Pairs
u 1.396s : m p (SLEC process used 118 MB and 499 seconds)
SLEC finds a single-transaction counterexample and generates VCD files and testbenches. On my IBM T30 laptop (2GHz, 512MB), falsification occurs in 8 ¾ minutes. We set sim-based verification off. If it had been on (default), a random simulation check would have found the bug almost immediately (<30 sec) before using our formal solvers.

68 C0V2 Verification Debug Mismatch in data_out.
We find a difference when comparing the data output in corresponding cycles (cycle 0 for spec and cycle 15 for impl). Only one bit is different in the output, which suggests the error is in the output permutation block (the final permutation) rather than in the compute rounds, because an error in the computation rounds would likely have scrambled more than one bit.

69 C0V2 Verification Debug Verilog Design – v2.v
Mismatch found in the output permutation block (IIP). The data_out[63] assignment was mistranslated from the original NIST permutation table. Should be = 24.
/**
 * Data inverse initial permutation function.
 */
module des_iip(data_out, rl_in);
    output reg [63:0] data_out;
    input [63:0] rl_in;
    always @* begin
        data_out[63] = rl_in[25];
        data_out[62] = rl_in[56];
        data_out[61] = rl_in[16];
        data_out[60] = rl_in[48];
        data_out[59] = rl_in[ 8];
        data_out[58] = rl_in[40];
        data_out[57] = rl_in[ 0];
        data_out[56] = rl_in[32];
        data_out[55] = rl_in[25];
        data_out[54] = rl_in[57];
        data_out[53] = rl_in[17];
        data_out[52] = rl_in[49];
        // . . .
    end
endmodule
We knew to look first in the output permutations because an error in the computation rounds would likely have scrambled more than one bit. This kind of practical knowledge about the design is very helpful in diagnosing the problem. The error occurred when translating the NIST specs into Verilog. The NIST tables specify the permutation of bits 1-64 with bit 1 being the leftmost bit, but it was described in Verilog as bits 0-63 with bit 0 being the rightmost bit. A NIST entry that sends input bit p to output bit i therefore becomes data_out[64 - i] = rl_in[64 - p]. We goofed: 24, not 25!
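The index conversion described in the note can be sketched as follows, using the first two rows of the DES inverse initial permutation table (IP^-1) from FIPS 46-3. The helper name `nist_to_verilog` is illustrative and not part of the demo sources.

```python
# First two rows of IP^-1 in NIST numbering (bits 1..64, bit 1 leftmost).
IP_INV_ROWS = [40, 8, 48, 16, 56, 24, 64, 32,
               39, 7, 47, 15, 55, 23, 63, 31]

def nist_to_verilog(table, width=64):
    """Map Verilog output index -> Verilog input index (bits 0..63, bit 0 rightmost)."""
    return {width - i: width - p for i, p in enumerate(table, start=1)}

mapping = nist_to_verilog(IP_INV_ROWS)
for out_bit in sorted(mapping, reverse=True):
    print(f"data_out[{out_bit}] = rl_in[{mapping[out_bit]:2d}];")
```

The first generated assignment uses index 24, matching the fix, and the remaining entries reproduce the other assignments shown on the slide.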

70 C0V2 Verification Debug Verilog Design – v2.v
Mismatch found in the output permutation block (IIP). The data_out[63] assignment was mistranslated from the original NIST permutation table: it should read rl_in[24]. Correct V2 and re-run SLEC.
/**
 * Data inverse initial permutation function.
 */
module des_iip(data_out, rl_in);
  output reg [63:0] data_out;
  input [63:0] rl_in;
  always @(*) begin  // sensitivity list is missing from the transcript; @(*) is assumed
    data_out[63] = rl_in[24];  // corrected (was rl_in[25])
    data_out[62] = rl_in[56];
    data_out[61] = rl_in[16];
    data_out[60] = rl_in[48];
    data_out[59] = rl_in[ 8];
    data_out[58] = rl_in[40];
    data_out[57] = rl_in[ 0];
    data_out[56] = rl_in[32];
    data_out[55] = rl_in[25];
    data_out[54] = rl_in[57];
    data_out[53] = rl_in[17];
    data_out[52] = rl_in[49];
    // . . .
  end
endmodule
We fix the error and re-run SLEC.

71 C0V2 Verification Results
Re-running SLEC proves full functional equivalence:
linux> slec run.tcl
slec> # c0v2 verification ... (abridged results log)
[ORC-SDM] Mapping the Specification and Implementation Hierarchies ...
[ORC-FDM] Finished mapping successfully.
[SIM-VOM] Simulation successfully validated the specified output-maps.
[ORC-UDM] Unrolling the design to obtain outputs and flops for transaction 1.
[ORC-UDDM] Successfully obtained outputs and flops for transaction 1.
[ORC-SSA] Starting to perform sequential analysis.
[SEQ-STFE] Performing Simulation to find intermediate equivalences ...
[SEQ-NPP] Number of intermediate equivalence problems posed = 1136
[SEQ-NPP1] Number of intermediate equivalence problems proven = 1136
[SEQ-NPF] Number of intermediate equivalence problems falsified = 0
[SEQ-DOP] Completed output proofs for Transaction Number 1.
[ORC-FSA] Done performing sequential analysis.
Output-pair: spec.data_out (throughput=1, latency=0) and impl.data_out (throughput=16, latency=16) proven to be equivalent.
Summary of key results: Proven Bounded-Proven Falsified Not-Attempted Output Pairs Flop Pairs
slec>
slec> # end of file
u 1.043s : m p (SLEC process used 78 MB and 419 seconds)
SLEC completes the full functional proof in about 7 minutes (419 seconds). Also note (shown in the log) that this time we left sim-based validation on: [SIM-VOM] Simulation successfully validated the specified output-maps.

72 V3 Pipelined RTL Design Verilog RTL implementation:
Throughput = 4 cycles. Latency = 16 cycles.
This is a partially pipelined version: new inputs are accepted every 4 cycles (throughput = 4), while the output latency is still 16 cycles. It requires more hardware than the serial implementation but delivers 4x the throughput.
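To make the throughput and latency figures concrete, here is a small Python sketch that estimates how many cycles each design needs to process a stream of blocks. It assumes a simple model that is not stated in the slides: a new input is accepted every `throughput` cycles, and each result appears `latency` cycles after its input is accepted. The function name is hypothetical.

```python
def cycles_to_finish(num_blocks: int, throughput: int, latency: int) -> int:
    """Cycles until the last result appears, under the simple model above."""
    if num_blocks <= 0:
        return 0
    # The last block is accepted at (num_blocks - 1) * throughput,
    # and its result appears `latency` cycles later.
    return (num_blocks - 1) * throughput + latency


blocks = 8
serial = cycles_to_finish(blocks, throughput=16, latency=16)    # V2, serial
pipelined = cycles_to_finish(blocks, throughput=4, latency=16)  # V3, pipelined

print(serial)     # 128
print(pipelined)  # 44
```

For long streams the ratio approaches 16 / 4 = 4, which is the 4x figure quoted for the pipelined design. For a single block both designs take the same 16 cycles.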

73 V3 Pipelined RTL Design Verilog Design Simulation v3> sim v3
module des_v3_control(sample_inputs_out, round0_out, round1_out, round2_out, round3_out, rst, clk);
  output reg sample_inputs_out;
  output reg [3:0] round0_out, round1_out, round2_out, round3_out;
  input rst, clk;
  reg [1:0] cycle;
  always @(*) begin  // sensitivity list is missing from the transcript; @(*) is assumed
    sample_inputs_out = (cycle == 0)? 1 : 0;
    round0_out = cycle;
    round1_out = cycle + 4;
    round2_out = cycle + 8;
    round3_out = cycle + 12;
  end
  always @(posedge clk) begin  // only "clk)" survives in the transcript; posedge is assumed
    if (rst) begin
      cycle <= 0;
    end else begin
      cycle <= (cycle == 3)? 0 : cycle + 1;
    end
  end
endmodule
// additional modules...
v3> sim v3
DES V3 ECB Mode Message Simulation
Original message ----> Calypto: Bridging System and RTL
Encrypted message ----> f4966ffc92c8df0c 6b6c41c883c aac9640b2e 0f434cdf068b53e1
Decrypted message ----> Calypto: Bridging System and RTL
simulation complete.
At left, only a portion of the pipelined Verilog design is shown. At right is a simulation of a simple message encryption using DES; the decrypted message matches the original, so it looks correct.
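The control logic above can be mirrored in a few lines of Python to see how the round indices are spread across the pipeline. This is a behavioral sketch of the excerpt only, with hypothetical function names. It assumes that each of the four `roundN_out` signals drives one pipeline stage, which the slides suggest but do not state.

```python
def control_outputs(cycle: int) -> dict[str, int]:
    """Combinational outputs of des_v3_control for a 2-bit cycle counter value."""
    assert 0 <= cycle <= 3
    return {
        "sample_inputs_out": 1 if cycle == 0 else 0,
        "round0_out": cycle,
        "round1_out": cycle + 4,
        "round2_out": cycle + 8,
        "round3_out": cycle + 12,
    }


def next_cycle(cycle: int, rst: bool) -> int:
    """Registered update of the counter: reset to 0, otherwise count 0..3 and wrap."""
    if rst:
        return 0
    return 0 if cycle == 3 else cycle + 1


# Walk one full period after reset and collect every round index that is issued.
cycle = next_cycle(0, rst=True)
rounds_seen = []
for _ in range(4):
    out = control_outputs(cycle)
    rounds_seen += [out[f"round{k}_out"] for k in range(4)]
    cycle = next_cycle(cycle, rst=False)

print(sorted(rounds_seen) == list(range(16)))  # True
print(cycle)  # 0
```

Over one 4-cycle period the four outputs together cover round indices 0 through 15 exactly once, and inputs are sampled only when the counter is 0, which is consistent with a throughput of 4 cycles.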

74 V2V3 Verification Setup Specify designs: Specify clocks:
# v2v3 verification - run.tcl
Specify designs:
build_design -spec -verilog v2/des_v2.v v2/des_common.v
build_design -impl -verilog v3/des_v3.v v3/des_common.v
Specify clocks:
# automatically inferred.
Specify reset state:
create_waveform -name one -width 1 1
create_constraint -spec -reset -waveform one spec.rst
create_constraint -impl -reset -waveform one impl.rst
create_waveform -name zero -width 1 0+
create_constraint -spec -waveform zero spec.rst
create_constraint -impl -waveform zero impl.rst
Align interfaces:
set_global spec_throughput
set_global impl_throughput
set_global spec_output_latency 16
set_global impl_output_latency 16
Verify designs:
verify -mode full_proof
This slide describes the V2V3 setup. Both designs need to be reset and then kept out of reset during verification. The only setup difference between the two designs is the throughput: the pipelined design accepts new inputs every 4 cycles instead of every 16 cycles for the serial design.

75 V2V3 Verification Results
Running SLEC proves full functional equivalence:
linux> slec run.tcl
slec> # c0v2 verification ... (abridged results log)
[ORC-PON] Optimized 'impl' netlist
[ORC-PON] Optimized 'spec' netlist
[ORC-SDM] Mapping the Specification and Implementation Hierarchies ...
[ORC-FDM] Finished mapping successfully.
[SIM-VOM] Simulation successfully validated the specified output-maps.
[ORC-UDM] Unrolling the design to obtain outputs and flops for transaction 1.
[ORC-UDDM] Successfully obtained outputs and flops for transaction 1.
[ORC-SSA] Starting to perform sequential analysis.
[SEQ-STFE] Performing Simulation to find intermediate equivalences ...
[SEQ-NPP] Number of intermediate equivalence problems posed = 0
[SEQ-NPP1] Number of intermediate equivalence problems proven = 0
[SEQ-NPF] Number of intermediate equivalence problems falsified = 0
[SEQ-DOP] Completed output proofs for Transaction Number 1.
[ORC-FSA] Done performing sequential analysis.
Output-pair: spec.data_out (throughput=16, latency=16) and impl.data_out (throughput=4, latency=16) proven to be equivalent.
Summary of key results: Proven Bounded-Proven Falsified Not-Attempted Output Pairs Flop Pairs
u 0.597s : m p (SLEC process used 119 MB and 249 seconds)
A full proof is achieved in a little over 4 minutes (249 seconds).

76 Bridging System and RTL
C0 – Untimed Functional (SystemC)
V2 – Serial RTL (Verilog): design error found and corrected; proven equivalent with C0.
V3 – Pipelined RTL (Verilog): proven equivalent with V2.
This summarizes the actual verification results we achieved during the demo. This is the last slide.

77 Summary

Equivalence checking between system-level models and RTL will provide significant value to customers by easing the RTL verification bottleneck.
There are significant abstractions in the system-level models that need to be bridged.
The key technology pieces that are required are:
Intuitive specification of interface differences
Scalable and robust sequential analysis techniques for proving equivalence
Synergistic use of bit-level and word-level solvers with simulation
Hardware model extraction from system-level models

