# Proving Absence of Deadlocks in Hardware

## Presentation on theme: "Proving Absence of Deadlocks in Hardware"— Presentation transcript:

Proving Absence of Deadlocks in Hardware
Alexander Gotmanov, Satrajit Chatterjee and Mike Kishinevsky Intel Corp., Strategic CAD Labs (SCL)

Basics of hardware verification Communication fabrics
Outline Basics of hardware verification Communication fabrics Modeling Verification

Basics of hardware modeling and verification

Hardware vs. Software X F in out Software Thread parallelism
“Infinite” state Termination Asynchronous Easy to fix Hardware Device parallelism Finite state Unbound execution Synchronous Impossible to fix X F in out

b := (2 <= v) & (v <= 5)
Describing Hardware RTL = Register Transfer Level Registers Wires Finite data types Expressions Modularity Initial state(s) No final state X u := (v+1) * a v u a b := (2 <= v) & (v <= 5) b Circuit X – Boolean state I(X) – initial predicate T(X,X’) – transition relation

Correctness When hardware model is correct? Property verification
Make statements about behaviors of the model M that are expected to hold Is it true that they hold in every execution of M? Equivalence verification M1, M2 are models Is it true that fM1 = fM2?

Linear Temporal Logic (LTL)
LTL is a formal language to reason about executions of a state machine in time Execution = infinite sequence of consecutive states s1 s2 s3 s4 s5 s6 s7 Predicate = statement about a state p ¬q p is true in s42, while q is false s42

Temporal operators / 1 LTL is a formal language to reason about executions of a state machine in time p p p p p p p s1 s2 s3 s4 s5 s6 s7 G(p) “always p” ¬p ¬p ¬p p - - - F(p) s1 s2 s3 s4 s5 s6 s7 “eventually p”

Temporal operators / 2 LTL is a formal language to reason about executions of a state machine in time Plus all propositional connectives: “·”, “+”, “→”, “¬”, … ¬p ¬p p ¬p p p p F(G(p)) s1 s2 s3 s4 s5 s6 s7 “eventually G(p)” G(p) ¬p p ¬p ¬p p ¬p p G(F(p)) s1 s2 s3 s4 s5 s6 s7 “always F(p)” p is satisfied infinitely often in the execution F(p) F(p) F(p) F(p) F(p) F(p) F(p)

Safety and Liveness Safety property (invariant) Liveness property
G(P(X)) P(X) is always true Liveness property GF(P(X)) P(X) is satisfied infinitely often

Guarantee and Persistence
Guarantee property F(P(X)) P(X) becomes true Persistence property FG(P(X)) P(X) becomes “stuck” at true

Property verification / 1
Simulation (~debugging) Applied everywhere Check that the model is correct for a (small) subset of possible behaviors Random / semi-random / trace simulation Need a testbench and checkers

Property verification / 2
Compliance monitors How to choose the “right” properties? Assuming that all properties are true, prove “correctness theorem” Example: proving producer-consumer ordering for PCI Express

Property verification / 3
Small-scale formal verification Applied at module level, late in design cycle Use RTL description Prove properties Bit-level verification Bounded proofs Complete proofs Perfecting model checking algorithms

Property verification / 4
Large-scale formal verification Applied at system level, early in design cycle Build an abstract model (HLM) TLA+ Murphi SMV xMAS Prove properties SMT & bit-level Theorem proving Strengthening Automatic proof

Property verification / 2
HLM-to-RTL correlation Demonstrate that RTL is compliant with its High Level Model Simulation Generate compliance checkers Formal verification (No practical examples)

Reachability analysis / 1
Need to prove: G(P(X)) X – Boolean state I(X) – initial predicate T(X,X’) – transition relation Construct step-wise reachability sets R0, R1, R2, … R0(X) = I(X) Ri+1(X’) = Ri(X) + X. Ri(X) & T(X,X’) For some k, Rk = Rk+1 = R Check that R(X) → P(X). Q.E.D.

Reachability analysis / 2
Implementation options Explicit state enumeration Reduced Ordered BDD And-Inverter Graphs + SAT Formulas + SMT Bounded model checking Partial reachability analysis Unable to reach k, where Rk = Rk+1 Still have “bounded” proof that P(X) is true for first n cycles of execution (n < k)

Induction / 1 Can we prove G(P(X)) without checking all reachable states? Can we prove 2n ≤ n! + 100, n ≥ 0, without checking all non-negative n? P(X) is called inductive, iff I(X) → P(X) P(X) & T(X,X’) → P(X’) Theorem: if P(X) is inductive then G(P(X)) is true Two SAT (SMT) checks to establish inductivity

Induction / 2 What if P(X) is not inductive?
Theorem: if G(P(X)) is true, there exists inductive Q(X), such that Q(X) → P(X) “Local” properties are usually not inductive G(y = 1 & x = 1) is inductive G(y = 1) y = 1 → y’ = 1

Other bit-level approaches
Interpolation [K. McMillan, CAV 2003] Property-oriented Based on Craig’s interpolation theorem IC3 [A. Bradley, VMCAI 2011] Approximate reachability Incremental inductive strengthening ABC [A. Mischenko, R. Brayton, UC-Berkeley] Synthesis for verification Parallel model checking

Equivalence verification
Correctness of synthesis algorithms Prove that optimized circuit is equivalent to the original Equivalence can be expressed as a property z Assert(z == 1); X1 F1 F2 X2 = in

Communication fabrics: modeling
Communication Fabric = logic connecting different agents on a chip

Verification is not optional
Tricky, distributed interaction deadlock starvation Modular verification doesn’t work Problems seen late during integration late RTL, in Silicon

Our Approach: High-Level Modeling
Build early, abstract models of the microarchitecture Goal: (Automatically) prove correctness by exploiting high-level information provided by the model

xMAS Modeling Framework
Model is composed from a small set of primitives primitives are formally specified friendly to microarchitects Networks where packets flow under backpressure naturally concurrent packets are conserved A simple example

xMAS Semantics is short-hand for
Each primitive is a synchronous (RTL) module Different modules are connected by channels is short-hand for channel

Primitives are parameterized
xMAS Semantics Primitives are parameterized is short-hand for k ( x # + 42 ) Example

A Simple Example x x cycle 1 cycle 2 x x x cycle 3 … cycle 6 cycle 7

Agent P Fabric Agent Q ~ # k agent Q P x = req rsp fabric 2
A more complex example

Modeling and Verification Flow
(from architects) To make this easier Additional Formal Analysis by hand Model Checker Proof C++ exec- utable random test- bench.v model.v + SVA abc compile run simulate (RTL Simulator) xMAS API (Library) VCD Other backends to SystemC, TLA, Murphi, etc. possible

Safety proofs

Channel Properties A channel property checks that all packets on a channel satisfy some condition e.g. all packets received by an agent have correct dest_id 4 4 Example of channel property: G (z.irdy  (z.data = 0)) (queue 1) (queue 2) Although this property is obviously true: Interpolation (in ABC) takes 10 mins to prove this! (Usually explicit or BDD-based reachability not an option )

Inductive Strengthening / 1
Use high-level structure to add invariants 4 4 Target channel property: G (z.irdy  (z.data = 0)) (queue 1) (queue 2) Invariant 1: G (usedj  (memj = 0)) If location j in queue 2 is in use, it must contain 0 Invariant 2: G (y.irdy  (y.data = 0)) Invariant 3: G (usedj  (memj = 0)) (For queue 1) This set of invariants is inductive!

Inductive Strengthening / 2
Similar propagation for other primitives Example: Propagation across function primitive k x y z v # + 7 ( 6 - bit ) Property: G (z.irdy  (z.data = 7)) Invariant: G (y.irdy  ((y.data+7)=7)) The invariant is obtained syntactically (no need to invert function)

Non-blocking property
non-blocking property on channel r: G (r.irdy  r.trdy) Intuition: If a packet is in r there is room in the ingress queue (useful to reason about liveness) Hard property to prove: Not inductive!

Strengthening non-blocking properties
numo numc numi non-blocking property on channel r: G (r.irdy  r.trdy) (Inductive) Invariant: G (numi + numc= numo)

How to find numi + numc= numo automatically?
Introduce λ variables to count transfers on each channel numo numc numi Eliminate λs using modified Gaussian Elimination

Practical Experience More robust than model checking
model checking fails on trivial examples More automation than theorem proving hundreds of invariants generated automatically to make it inductive invariants for non-blocking can have 20+ num terms each 6x6 Mesh: validated global routing in minutes each router has a static local routing function channel properties to check that a router never receives packets it doesn’t know how to route 100+ simultaneous transactions in flight Happens less often than you think due to switches Introduces implications which are tautologies Can remove tautological properties with an SMT solver during propagation

Liveness proofs

A retry state either persists or ends in transfer.
Channel protocol Blocked (trdy=0) FwdRetry irdy = 1 trdy = 0 Initiator is ready, waiting for target Inactive irdy = 0 trdy = 0 Xfer irdy = 1 trdy = 1 BwdRetry irdy = 0 trdy = 1 Target is ready, waiting for initiator Idle (irdy=0) Make it clear that all 4 combinations are possible. “For the rest of the presentation, … Idle, Block” A retry state either persists or ends in transfer.

data irdy trdy Deadlock in xMAS is defined for a channel Dead(u) = “eventually a packet arrives at input of u, but output of u never accepts it” Dead(u) = F(u.irdy · G¬u.trdy) Live(u) = ¬Dead(u) = G(u.irdy → Fu.trdy)

Example FG(¬w.trdy) Dead(u) Sink is not fair GF(w.trdy) Live(u)
Sink is fair

GF(x.trdy) = ¬FG(¬w.trdy)
Fairness Constraints Fairness constraint: GF(x.trdy) = ¬FG(¬w.trdy) Execution of xMAS model is fair if it satisfies all fairness constraints Fair = conjunction of all fairness constraints

Simple Messaging Fabric with Ordering
State of the art ABC (liveness-to-safety translation) Reduces liveness problem to equivalent safety problem by circuit transformation Doubles number of flops BMC can reveal “shallow” deadlocks Induction, interpolation, etc. Does not scale well to xMAS models with 10s of queues Example Simple Messaging Fabric with Ordering 75 primitives (24 queues) ABC Proof – infeasible. BMC – 29 frames in 1hr. Our method Proof – 4s.

Idle and Block Idle(u) = FG(¬u.irdy) “u (eventually) stuck at idle”
Block(u) = FG(¬u.trdy) “u (eventually) stuck at blocked” Dead(u) = ¬Idle(u) · Block(u) “u is in deadlock iff u stuck at blocked, but not stuck at idle” Fair = ¬Block(w) “sink is fair iff w not stuck at blocked”

Equations for xMAS queue
Idle(v) Idle(u) Block(v) u.trdy = (q.num < k) v.irdy = (q.num > 0) Block(u) Empty(q) Full(q) Full(q) = FG(q.num=k) “q (eventually) stuck at full” Empty(q) = FG(q.num=0) “q (eventually) stuck at empty” Equations propagate knowledge about Idle and Block conditions through a queue. Block(u) = Full(q) “u stuck at blocked iff q stuck at full” Full(q) → Block(v) “if q stuck at full then v stuck at blocked” Similar equations for all xMAS primitives. Other equations: Idle(v) = Empty(q), Empty(q) → Idle(u), ¬Idle(u) · Block(v) → Full(q), etc.

Liveness proof Block(u) = Full(q1) Full(q1) → Block(v) DeadEq
Dead(u) = ¬Idle(u) · Block(u) Block(v) Block(w) Full(q1) Full(q2) ¬Block(w) Block(u) = Full(q1) Full(q1) → Block(v) DeadEq (true for every execution) Fair = ¬Block(w) (true for all fair executions) Block(v) = Full(q2) Full(q2) → Block(w) Dead(u) · DeadEq · Fair = 0 in propositional logic There is no fair execution of the model, where channel u is dead. Q.E.D.

(per primitive) LTL theorems, describing how Idle and Block propagate through primitives + May return unreachable executions as counterexamples Rule out with over approximate reachability using automatically generated invariants. Can’t comprehend message-dependent deadlocks with Idle and Block only Capture properties of data with additional Prop conditions. (details in the backup) Fair As ¬Block UNSAT = Liveness proof SAT = Counterexample

Experimental results: proof
No proofs with ABC, when number of queues is in 10s. Experiment NPrims (NQueues) Time for ABC (proof) Num. of BMC frames in 1hr Time for our method Time % (DE+SAT) Time for safety Two queues 4 (2) <1s >100 - SMF = simple messaging fabric 57 (20) N/A 2s SMF with ordering 75 (24) 29 4s IOF1 = IO fabric 91 (19) 15 3s IOF2 293 (58) 14 14s 53%+47% 1min IOF3 480 (92) 11 12min 94%+6% 9min IOF4 (minimal) 493 (97) 18 2.5min 85%+15% 28s IOF4 (non-minimal) 20 3min 65%+35% 3.5hr IOF5 (minimal) 680 (131) 40min 97%+3% 2min IOF5 (non-minimal) 44min 10hr+

Conclusion Verification technology to reduce liveness of xMAS model to satisfiability in propositional logic Can get proofs for models with 100s of queues Open problem: method requires extension for models with non-restricted joins

References “Quick Formal Modeling of Communication Fabrics to Enable Verification”, HLDVT 2010 “Automatic Generation of Inductive Invariants from High-Level Microarchitectural Models of Communication Fabrics”, CAV 2010 “Verifying Deadlock-Freedom of Communication Fabrics”, VMCAI 2011

Thank you

Backup

# Careful with Joins # ( x , y ) + z ( x , y ) z Hard case Easy case
G (z.irdy  (z.data = 42)) Most joins like this in practice (used for synchronization)

Empty(q1) Dead(a) Fair = ¬Block(g) Full(q2) Full(q3) Dead(a) · DeadEq · Fair has a solution Solution is unreachable execution with q1 stuck at empty and q2, q3 stuck at full Deadlock equations – about executions Flow invariants – about instantaneous state The execution contradicts with model flow invariant (automatically generated): q1.num = q2.num + q3.num ? (q1.num = 0) → ((q2.num = 0)·(q3.num = 0))

Dead(u) · DeadEq · Fair · Inv · InfEq for propositional satisfiability
Adding invariants / 2 Empty(q1) Dead(a) Fair = ¬Block(g) Live Full(q2) Full(q3) Add equations connecting Idle, Block, Empty, Full, etc. with instantaneous state of the model (q.num, irdy, trdy, etc.): Empty(q) → (q.num = 0) Full(q) → (q.num = k) Idle(u) → (u.irdy = 0) InfEq (eventually true for every execution) UNSAT Add any instantaneous invariants (Inv), solve Dead(u) · DeadEq · Fair · Inv · InfEq for propositional satisfiability

(per primitive) LTL theorems, describing how Idle and Block propagate through primitives + Fair UNSAT = Liveness proof SAT = Counterexample As ¬Block + InfEq Connect Idle, Block, etc. with instantaneous state + Inv Automatically generated instantaneous invariants

This method is a recent improvement on the one in the paper.
Data dependencies / 1 a(x) Idle(v) Block(v) Idle(u) u.trdy = u.irdy · (a(u.data) · v.trdy + + b(u.data) · w.trdy) v.irdy = u.irdy · a(u.data) w.irdy = u.irdy · b(u.data) Block(u) Idle(w) b(x) Block(w) Idle(v) depends on what kind of data is coming from input channel u Idle(v) = Idle(u) + “all packets are going to the bottom branch” Propp(x)(u) = FG(u.irdy → p(u.data)) “u (eventually) stuck at p(x)” Idle(v) = Idle(u) + Prop¬a(x)(u) “v stuck at idle, iff u stuck at idle or u stuck at ¬a(x)” Prop conditions in addition to Idle and Block, etc. Add equations to propagate Prop conditions through the model This method is a recent improvement on the one in the paper.

This method is a recent improvement on the one in the paper.
Data dependencies / 2 Propp(f(x))(v) Propp(x)(v) Propp(x)(v) = Propp(f(x))(u) “v stuck at p(x) iff u stuck at p(f(x))” Number of all possible Prop conditions is (number of channels) x (average number of predicates per channel). Linear from size of the model. May be exponential from width of data on channels. Explosion is avoidable in practice Compute a limited subset of Prop conditions. Start with Prop conditions used in equations for all switches. Add new Prop conditions with propagation. Stop propagation, if Prop predicate is repeated or becomes a tautology. This method is a recent improvement on the one in the paper.

Method revisited (again)
Dead(u) As ¬Idle · Block Find a set of Prop conditions to use + and Prop DeadEq (per primitive) LTL theorems, describing how Idle and Block propagate through primitives + Fair UNSAT = Liveness proof SAT = Counterexample As ¬Block + InfEq Connect Idle, Block, etc. with instantaneous state + Inv Automatically generated instantaneous invariants

Experimental results: finding a bug
NPrims (NQueues) Time for ABC (BMC) Time for our method 2 Master/Slave agents (no credit logic) 16 (2) 6s <1s SMF1 (unfair sink) 57 (20) 4s 2s (broken credits logic) 8s SMF2 75 (24) 1hr 3s (broken ordering logic) 4hr Deadlocks found with our method can be unreachable (due to over-approximation). In experiments, all found deadlocks were reachable.