Verification at HP Labs
Mark Tuttle (with the help of many friends at HP Labs)


Overview of verification work
– Cache coherence protocols
  – Alpha EV6, EV7, EV8 protocols
  – Itanium
– Bus protocols
  – PCI-X, Infiniband (FIO/NGIO/SIO)
– Database systems
– Distributed algorithms
– A SAT-based bounded model checker
  – Applications to Itanium software

Most of this work uses TLA+
– Lamport’s specification language, based on set theory, first-order logic, and temporal logic
– Hierarchical style improves readability and rigor
  – specifications: long formulas become indented lists of /\ (and) and \/ (or) bullets
  – proofs: long arguments become numbered, nested steps, for example
        1.
          1. CASE
          2. CASE
          3. QED
– Most find reading easy, writing not too hard

Wildfire: EV6 cache coherence
Kourosh Gharachorloo, Madhu Sharma, Simon Steely, Steve Van Doren
[Figure: a 32-processor server built from quads joined by a global switch. Each quad has processors P1 to P4 on a local switch, plus memory, a directory (DIR), TTT, DTAG, a global port, and an arbiter.]

Directory-based cache coherence
[Figure: processors P1 to P4, memory holding x = 5, and a directory entry for x that records x’s owner and the set of processors holding copies.]
To get x, go to x’s directory to see who owns x.

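Get read-only copy
P sends Rd(x) to the directory, which records owner = Q and copies = Q. The directory sends Fwd(x) to the owner Q, and Q sends Fill(x,5) to P. The directory now records copies = P,Q.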

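Get writable copy
P sends RdEx(x) to the directory, which records owner = Q and copies = Q,R,S. The directory sends FwdRdEx(x) to the owner Q and Inval(x) to the other copy holders; Q sends FillEx(x,5) to P. The directory now records owner = P and copies = P.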
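The two message flows above can be sketched as a toy directory for a single line x. This is a hypothetical simplification: the class name, the (name, sender, receiver) message tuples, and the omission of transient states (the hard part, as the next slide shows) are assumptions, and the order of the returned list is illustrative, not a timing claim.

```python
class Directory:
    """Toy directory entry for one line x (hypothetical sketch).

    Records x's owner and the processors holding copies, and returns the
    messages from the two slides as (name, sender, receiver) tuples.
    Data values and transient states are omitted."""

    def __init__(self, owner):
        self.owner = owner
        self.copies = {owner}

    def read(self, p):
        """p wants a read-only copy: forward to the owner, who fills p."""
        msgs = [("Rd", p, "dir"), ("Fwd", "dir", self.owner),
                ("Fill", self.owner, p)]
        self.copies.add(p)  # the owner does not change
        return msgs

    def read_exclusive(self, p):
        """p wants a writable copy: forward to the owner, invalidate the rest."""
        msgs = [("RdEx", p, "dir"), ("FwdRdEx", "dir", self.owner)]
        others = sorted(self.copies - {self.owner, p})
        msgs += [("Inval", "dir", q) for q in others]
        msgs.append(("FillEx", self.owner, p))
        self.owner, self.copies = p, {p}  # p is now the only holder
        return msgs
```

For example, `Directory("Q").read("P")` yields Rd, Fwd, and Fill and leaves the copies as {P, Q}, matching the read-only slide.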

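A complicated protocol
The directory can be many steps ahead of the processors.
[Figure: R1, R2, R3, and the directory exchange RdEx(x), FwdRdEx(x), RdEx(x), FwdRdEx(x), and Data; the directory forwards new requests before earlier data has arrived.]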

A complicated protocol
– Generates data and commit events independently
– Memory barriers (MBs) impose instruction order
  – Maintain a count of outstanding off-chip requests
  – Pass a memory barrier only when the count is 0
[Figure: the sequence read A, MB, read B with incoming inval(x) and inval(y), shown once with data(A) only and once with commit(A) arriving before data(A).]
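The memory-barrier rule above can be sketched as a counter. The class and method names are hypothetical, and so is one assumption made explicit in the code: a request stops counting as outstanding when its commit event arrives, even if its data arrives later (the slides show commit and data as independent events).

```python
from collections import deque

class Processor:
    """Sketch of the MB rule: count outstanding off-chip requests and
    pass an MB only when the count is 0.
    Assumption (hypothetical): commit, not data, retires a request."""

    def __init__(self, program):
        self.program = deque(program)  # e.g. ["read A", "MB", "read B"]
        self.outstanding = 0           # off-chip requests not yet committed
        self.issued = []               # instructions issued so far, in order

    def step(self):
        """Issue the next instruction; return False if none or the MB must wait."""
        if not self.program:
            return False
        if self.program[0] == "MB" and self.outstanding > 0:
            return False               # stall at the barrier
        instr = self.program.popleft()
        if instr != "MB":
            self.outstanding += 1      # in this toy, every read goes off chip
        self.issued.append(instr)
        return True

    def commit(self):
        """A commit event arrives for one outstanding request."""
        assert self.outstanding > 0, "commit with no outstanding request"
        self.outstanding -= 1
```

With the program read A, MB, read B, the MB stalls until `commit()` is called for A; read B then issues without waiting for data(A).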

Dramatic speedups possible
– Reads are fast
– MBs are fast
[Figure: timelines for read A … work … MB, read B with inval(x), inval(y), commit(A), data(A); and for read A, MB, read B, where commit(A) arrives before the owner’s fwd(A) delivers data(A).]
“Intuitively surprising this actually works!”

Wildfire verification
Paul Harter, Leslie Lamport, Mark Tuttle, Yuan Yu
– We are asked to look at the protocol
– We arrive very late (almost at tape-out)
– No time for a complete proof
– But enough time for a rigorous analysis

Wildfire cache coherence in “three easy steps” + “two man-years”
1. Model the Alpha memory model (200 lines)
2. Model the abstract protocol (500 lines)
   – Prove implementation (550 lines, 2 months, informal)
3. Model the complete protocol (2000 lines, 3 months)
   – Prove implementation (5500 lines, 4+ months, incomplete)

Step 1: Alpha memory model
The official specification is
– Informal: an English document
– Behavioral: defines acceptable sequences of memory operations
Our specification is
– Precise: a single logical formula
– State-based: required for invariance-style proofs
We did simplify the model slightly:
– Operations read and write entire cache lines
– Some “impossible” implementations are ruled out
Compare the specifications: 12 pages vs 200 lines

The heart of the model
– A Before order
  – Orders reads and writes in an execution
  – Determines return values for the reads
– A GoodExecutionOrder predicate
  – Defines the Before orders allowed by the model

State machine actions
– ReceiveRequest(proc, req): receive a request
– ChooseNewData(proc, idx): choose the return value for a request
– Respond(proc, idx): return the value to a request
– ExtendBefore: expand the Before relation
Actions must preserve GoodExecutionOrder.

GoodExecutionOrder

    GoodExecutionOrder ==
      LET [some definitions deleted]
      IN
      /\ (*************************************************************)
         (* Before is a partial order.                                *)
         (*************************************************************)
         /\ Before \subseteq ReqId \X ReqId
         /\ \A r1, r2 \in ReqId : IsBefore(r1, r2) => ~IsBefore(r2, r1)
         /\ \A r1, r2, r3 \in ReqId :
               IsBefore(r1, r2) /\ IsBefore(r2, r3) => IsBefore(r1, r3)
      /\ (*************************************************************)
         (* SourceOrder implies the Before order.                     *)
         (*************************************************************)
         \A r1, r2 \in ReqId : SourceOrder(r1, r2) => IsBefore(r1, r2)
      /\ (*************************************************************)
         (* RequestOrder implies the Before order.                    *)
         (*************************************************************)
         \A r1, r2 \in ReqId : RequestOrder(r1, r2) => IsBefore(r1, r2)

This is the hard part; look how short it is!

    /\ (*******************************************************)
       (* Writes and successful SCs to the same location that *)
       (* have issued a response are totally ordered.         *)
       (*******************************************************)
       \A r1, r2 \in ReqId :
          /\ r1 # r2
          /\ ReqIdQ[r1].req.type \in {"Wr", "SC"}
          /\ ReqIdQ[r1].req.newData # "Failed"
          /\ ReqIdQ[r1].req.responded
          /\ ReqIdQ[r2].req.type \in {"Wr", "SC"}
          /\ ReqIdQ[r2].req.newData # "Failed"
          /\ ReqIdQ[r2].req.responded
          /\ ReqIdQ[r1].req.adr = ReqIdQ[r2].req.adr
          => IsBefore(r1, r2) \/ IsBefore(r2, r1)

    /\ (*******************************************************************)
       (* LL/SC Axiom: For each successful SC, there is a matching LL and *)
       (* there is no write to the same address from a different          *)
       (* processor between the LL and SC in the Before order.            *)
       (*******************************************************************)
       \A r2 \in ReqId :
          /\ ReqIdQ[r2].req.type = "SC"
          /\ ReqIdQ[r2].newData \notin {Failed, NotChosen}
          => \E r1 \in ReqId :
                /\ LLSCPair(r1, r2)
                /\ \A r \in ReqId :
                      /\ \/ ReqIdQ[r].req.type = "Wr"
                         \/ /\ ReqIdQ[r].req.type = "SC"
                            /\ ReqIdQ[r].newData \notin {NotChosen, Failed}
                      /\ r[1] # r2[1]
                      /\ ReqIdQ[r2].req.adr = ReqIdQ[r].req.adr
                      => ~IsBefore(r1, r) \/ ~IsBefore(r, r2)

    /\ (**************************************************************)
       (* Value Axiom: A read reads from the preceding write in the  *)
       (* Before order.                                              *)
       (**************************************************************)
       \A r1, r2 \in ReqId :
          /\ ReqIdQ[r2].source # NoSource
          /\ ReqIdQ[r1].req.type = "Wr"
          /\ ReqIdQ[r1].req.adr = ReqIdQ[r2].req.adr
          => IF ReqIdQ[r2].source = FromInitMem
               THEN ~IsBefore(r1, r2)
               ELSE \/ ~IsBefore(ReqIdQ[r2].source, r1)
                    \/ ~IsBefore(r1, r2)
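Two of the GoodExecutionOrder conjuncts above can be paraphrased as executable checks over a finite execution: the partial-order conjuncts and the Value Axiom. This is a hedged sketch, not the authors' specification: relations are Python sets of (r1, r2) pairs, the request records and the "Rd" type name are hypothetical encodings, and the write-serialization and LL/SC conjuncts are left out.

```python
from itertools import product

FROM_INIT_MEM = "FromInitMem"  # hypothetical stand-ins for the spec's constants
NO_SOURCE = "NoSource"

def good_partial_order(req_ids, before, source_order, request_order):
    """Partial-order conjuncts; each relation is a set of (r1, r2) pairs."""
    ids = set(req_ids)
    # Before \subseteq ReqId \X ReqId
    if not all(r1 in ids and r2 in ids for r1, r2 in before):
        return False
    # IsBefore(r1, r2) => ~IsBefore(r2, r1); with r1 = r2 this forbids self-loops
    if any((r2, r1) in before for r1, r2 in before):
        return False
    # IsBefore(r1, r2) /\ IsBefore(r2, r3) => IsBefore(r1, r3)
    for (r1, r2), (s2, r3) in product(before, repeat=2):
        if r2 == s2 and (r1, r3) not in before:
            return False
    # SourceOrder and RequestOrder both imply the Before order
    return source_order <= before and request_order <= before

def value_axiom_holds(reqs, before):
    """reqs maps a request id to a dict with 'type', 'adr', and 'source'.
    Assumes `before` is already transitively closed."""
    for r2, q2 in reqs.items():
        if q2["source"] == NO_SOURCE:
            continue
        for r1, q1 in reqs.items():
            # like the spec, only plain writes ("Wr") are quantified over
            if q1["type"] != "Wr" or q1["adr"] != q2["adr"]:
                continue
            if q2["source"] == FROM_INIT_MEM:
                if (r1, r2) in before:   # a write precedes a read of initial memory
                    return False
            elif (q2["source"], r1) in before and (r1, r2) in before:
                return False             # r1 sits between the source and the read
    return True
```

For example, if a read `rd` takes its value from `w1` but a second write `w2` to the same address lies between them in Before, `value_axiom_holds` returns False.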

Step 2: Model abstract protocol
protocol = abstract protocol + implementation junk
Surprisingly,
– the abstract protocol’s correctness was far from obvious
– we discovered a bug… in the memory model
Proved the hardest part of correctness:
– Proved the Before order is acyclic
– 35-line invariant based on 300 lines of definitions
– 550-line proof, cases nested 10 levels deep

Found: Alpha memory model bug
Initially x=0, y=0.
P: if x=1 then y:=2
Q: if y=2 then x:=1
The original Alpha memory model allowed the final result x=1, y=2.
This behavior breaks the critical section implementation recommended in the SRM. (Jim Saxe)
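As a baseline for why x=1, y=2 is surprising, one can enumerate every sequentially consistent interleaving of the two programs. Sequential consistency is a much stronger model than Alpha's, so this is only an illustration: it shows that the outcome can arise only from a causal cycle in which each write justifies the read that enables it. The function name and the two-step encoding of each thread are assumptions of this sketch.

```python
from itertools import permutations

def sc_outcomes():
    """Final (x, y) over every sequentially consistent interleaving of
        P: if x=1 then y:=2      Q: if y=2 then x:=1
    from x=0, y=0. Each thread takes two atomic steps: its read, then its
    conditional write."""
    threads = {"P": ("x", 1, "y", 2),  # (reads, tests for, writes, value)
               "Q": ("y", 2, "x", 1)}
    outcomes = set()
    for schedule in set(permutations(["P", "P", "Q", "Q"])):
        mem = {"x": 0, "y": 0}
        seen = {}                       # value each thread read in step 1
        for t in schedule:
            rvar, rval, wvar, wval = threads[t]
            if t not in seen:
                seen[t] = mem[rvar]     # step 1: the read
            elif seen[t] == rval:
                mem[wvar] = wval        # step 2: the write, if the test passed
        outcomes.add((mem["x"], mem["y"]))
    return outcomes
```

Every interleaving ends with x=0, y=0: neither condition can become true without the other write having happened first.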

Revised Alpha memory model
P: if x=1 then y:=2
Q: if y=2 then x:=1
[Figure: the causal cycle that produces x=1, y=2, and then the same program with the cycle broken.]

Wildfire counterexample
P: x:=1
Q: if x=1 then y:=2
R: if y=2 then x:=3
The Alpha memory model says x=3, but in Wildfire it could be x=1…

[Scenario step 1: P performs x:=1 by sending ITD(x) to the directory and receives ok; the directory sends Inval(x) toward R, which still holds x=0.]

[Scenario step 2: Q sends Rd(x); the directory sends Fwd(x) to P, and Q receives x=1. Inval(x) is still on its way to R.]

[Scenario step 3: having seen x=1, Q performs y:=2 by sending ITD(y) and receives ok.]

[Scenario step 4: R sends Rd(y); the directory sends Fwd(y) to Q, and R receives y=2. R’s copy of x then changes from 0 to 3, while Inval(x) is still in flight.]

[Scenario step 5: Inval(x) finally reaches R, which holds x=3; P holds x=1 and Q holds y=2.]
The result must be x=3, but the result is x=1. The same thing was possible in other machines. (Kourosh Gharachorloo)

What went wrong?
An ordering internal to P … forced an ordering for Q.
[Figure: the program P: if x=1 then y:=2, Q: if y=2 then x:=1, shown twice to illustrate how P’s internal ordering propagates to Q.]
The fix: use internal orderings to forbid orderings, but not to force orderings.

New Alpha memory model
P: x:=1
Q: if x=1 then y:=2
R: if y=2 then x:=3
There is no dependency/source cycle.
[Figure: a chain of reads and writes R1, R2, W1, W2, …, Wn linked by dependency and source edges.]
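One way to read "no dependency/source cycle" is as a graph property: draw a dependency edge from a read to a write that depends on it, and a source edge from a write to a read that takes its value, then check that the graph is acyclic. The sketch below does this with depth-first search on hand-built graphs for the two examples; the node names and graph encoding are hypothetical, and this is not the formal definition from the revised model.

```python
def find_cycle(edges):
    """Return a cycle (first node repeated at the end) in the directed graph
    `edges` (dict: node -> list of successors), or None if it is acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color, stack = {}, []

    def visit(n):
        color[n] = GRAY
        stack.append(n)
        for m in edges.get(n, []):
            c = color.get(m, WHITE)
            if c == GRAY:                   # back edge: cycle found
                return stack[stack.index(m):] + [m]
            if c == WHITE:
                found = visit(m)
                if found:
                    return found
        stack.pop()
        color[n] = BLACK
        return None

    for n in list(edges):
        if color.get(n, WHITE) == WHITE:
            found = visit(n)
            if found:
                return found
    return None

# Original bug: each thread's write is the source of the other's read.
CAUSAL_CYCLE = {
    "P:read x":  ["P:write y"],   # dependency
    "P:write y": ["Q:read y"],    # source
    "Q:read y":  ["Q:write x"],   # dependency
    "Q:write x": ["P:read x"],    # source
}
# Wildfire example: a chain starting from P's unconditional write.
WILDFIRE_CHAIN = {
    "P:write x": ["Q:read x"],
    "Q:read x":  ["Q:write y"],
    "Q:write y": ["R:read y"],
    "R:read y":  ["R:write x"],
}
```

`find_cycle(CAUSAL_CYCLE)` returns the four-node loop, while `find_cycle(WILDFIRE_CHAIN)` returns None.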

Step 3: Model complete protocol
Obstacle: no single, complete description
– English documents: 12 documents, a 4-inch stack
– Lisp simulator: crucial to understanding some details
– None compact, none mathematically tractable
– Different levels of abstraction, some inconsistency
We had to write our own description.

Obstacle: algorithm complexity
ChangeToDirty, DummyRdVic, FailedChangeToDirty, Fetch, InvalToDirty, InvalToDirtyVic, Rd, RdMod, RdVic, RdVicMod, QV_Fetch, QV_Rd, QV_RdMod, WrVic, ChangeToDirtyFailure, ChangeToDirtySuccess, FetchFillMarker, FillMarker, FillMarkerMod, ForwardFetch, ForwardFetchWithFetchFillMarker, ForwardRd, ForwardRdMod, ForwardRdWithFillMarker, ForwardRdModWithFillMarkerMod, InvalAck, InvalToDirtySuccess, Invalidate, LoopComsig, LoopComsigWithInvalAck, LoopComsigWithShadowClear, LoopComsigWithShadowInvalAndShadowClear, ShadowChangeToDirtySuccess, ShadowForwardFetch, ShadowForwardRd, ShadowForwardRdMod, ShadowInvalToDirtySuccess, ShadowInvalidate, ShadowShortFillMod, ShadowSnap, ShortFetchFill, ShortFill, ShortFillMod, VictimAck, FetchFill, Fill, FillMod, VCFetchFill, VCFill, VCFillMod

Solution: Quarks
Ack, ChangeToDirty, Clear, Comsig, Fill, ForwardedGet, GetValue, InvalidToDirty, QuadInvalidate, ReleaseMAF, ReleaseVDB, SetCacheLineState, Victimize, Write
Quarks combine to form messages.

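Quarks form messages, then split up
[Figure: a reader’s request reaches the home quad; a message combining the quarks ForwardedGet, QuadInval, and Comsig passes through the global switch and splits up, so the owner gets ForwardedGet and the copy holders get QuadInval.]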

Quarks resolve message overloading
“ChangeToDirtySuccess” could mean
– {AckChangeToDirty, Comsig, QuadInvalidate^*, ClearOutstandingInval}
– {AckChangeToDirty, Comsig, QuadInvalidate^*}
– {Comsig, ReleaseMAF, SetCacheLineState}
Quarks simplify the algorithm description
– Each quark is processed separately and independently
– Each data structure is changed by a single quark
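The two design rules above (process each quark independently; each data structure is changed by a single quark) can be sketched as a dispatch table. The quark names come from the slide; the state fields and update functions are hypothetical, and only the first two meanings of "ChangeToDirtySuccess" are modeled.

```python
# Hypothetical quark table: each quark owns exactly one state field and
# says how to update it. Quark names are from the slide; fields are invented.
HANDLERS = {
    "AckChangeToDirty":      ("ctd_acked",         lambda old: True),
    "Comsig":                ("comsigs",           lambda old: old + 1),
    "QuadInvalidate":        ("invals_sent",       lambda old: old + 1),
    "ClearOutstandingInval": ("outstanding_inval", lambda old: False),
}

# Design rule from the slide: each data structure is changed by a single quark.
assert len({field for field, _ in HANDLERS.values()}) == len(HANDLERS)

def process_message(state, quarks):
    """Apply a message, given as a list of quarks (repeats allowed, as in
    QuadInvalidate^*), one quark at a time; returns a new state."""
    new = dict(state)
    for quark in quarks:
        field, update = HANDLERS[quark]
        new[field] = update(new[field])
    return new

INITIAL = {"ctd_acked": False, "comsigs": 0, "invals_sent": 0,
           "outstanding_inval": True}
```

The two overloaded meanings become two different quark lists, and the only difference in outcome is the one field owned by ClearOutstandingInval.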

Quark handling

    ProcFieldsMessage(proc, msg) ==
      /\ ...
      /\ Cache' =
           CASE ...
             [] ("Fill" \in msg) /\ (subtype("Fill") # "Fetch") ->
                  [Cache EXCEPT
                     ![proc, cacheIndex].state =
                        IF subtype("Fill") = "Mod" THEN "ExclusiveDirty" ELSE "Clean",
                     ![proc, cacheIndex].tag = AddressToTag(msg.adr),
                     ![proc, cacheIndex].data = msg.data ]

If a processor receives a Fill quark carrying cacheable data, how is the cache updated?
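The Fill arm of the CASE above can be rendered in Python to show what the EXCEPT update does. This is a sketch under stated assumptions: the message encoding (a dict with 'quarks', 'subtype', 'adr', 'data'), the `CacheLine` record, and `address_to_tag` (which just drops 6 low bits) are hypothetical, and the other CASE arms are elided as on the slide.

```python
from dataclasses import dataclass, replace
from typing import Any, Optional

@dataclass(frozen=True)
class CacheLine:
    state: str = "Invalid"
    tag: Optional[int] = None
    data: Any = None

def address_to_tag(adr, index_bits=6):
    """Hypothetical stand-in for AddressToTag: drop the low index bits."""
    return adr >> index_bits

def proc_fields_fill(cache, proc, cache_index, msg):
    """The Fill arm of the CASE: a Fill quark that is not a Fetch installs
    the data. Like EXCEPT, it returns a new cache and leaves the old one alone."""
    quarks, subtype = msg["quarks"], msg["subtype"]
    if "Fill" in quarks and subtype["Fill"] != "Fetch":
        line = cache[(proc, cache_index)]
        new_line = replace(  # change only state, tag, and data
            line,
            state="ExclusiveDirty" if subtype["Fill"] == "Mod" else "Clean",
            tag=address_to_tag(msg["adr"]),
            data=msg["data"],
        )
        return {**cache, (proc, cache_index): new_line}
    return cache  # other CASE arms are elided on the slide
```

A Fill with subtype "Mod" leaves the line ExclusiveDirty, any other non-Fetch subtype leaves it Clean, and a Fetch fill (uncached data) leaves the cache unchanged.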

– Define an invariant describing all reachable states (1000 lines)
– Prove invariance
We focused on the most difficult, error-prone parts: the cache, dtag, directory, and messages.
[Figure: the Wildfire invariant, on quad (150 lines) and off quad (150 lines).]

Dir-DTag invariant

    Mother == DirDTagInvariant /\ DTagCacheInvariant /\ ...

    DTagCacheInvariant == ...

    DirDTagInvariant ==
      \A adr \in MemBlockAddress, proc \in Processor :
        a. \/ (* local address *) ...
        b. \/ (* nonlocal address *)
              1. /\ ProcToQuad(proc) # AddressToQuad(adr)
              2. /\ Proj(HomeToArbQ) = [ [FG* [QFI] QI* AckWrite] QI* AGV(mod,1) | FG* AckCTD(Success)] FG*

A zoomed-in fragment of the same invariant:

    2. /\ a. \/ (* proc is the owner of adr *)
                1. /\ Dir[adr].owner = proc
                2. /\ a. \/ (* dtag is dirty *)
                           1. /\ DTagState(adr, proc) = Dirty ...
                      b. \/ (* dtag is invalid *) ...
                      c. \/ (* dtag is clean *) ...
          b. \/ (* proc is not the owner of adr *) ...

DTag-Cache Invariance

    ASSUME: /\ Mother
            /\ Wildfire
            /\ DTagCacheInvariant(proc, adr)
    PROVE:  DTagCacheInvariant(proc, adr)'
    1. CASE a (* DTagState(proc, adr) = "Invalid" *)
    2. CASE b (* DTagState(proc, adr) # "Invalid" *)
    3. QED

DTag-Cache Invariance

    ASSUME: /\ Mother
            /\ Wildfire
            /\ DTagCacheInvariant(proc, adr)
    PROVE:  DTagCacheInvariant(proc, adr)'
    1. CASE a (* DTagState(proc, adr) = "Invalid" *)
       1. CASE a2a (* AddressCache(proc, adr).state' = "Invalid" *)
       2. CASE a2b (* AddressCache(proc, adr).state' # "Invalid" *)
       3. QED
    2. CASE b (* DTagState(proc, adr) # "Invalid" *)
    3. QED

DTag-Cache Invariance

    ASSUME: /\ Mother
            /\ Wildfire
            /\ DTagCacheInvariant(proc, adr)
    PROVE:  DTagCacheInvariant(proc, adr)'
    1. CASE a (* 1./\ DTagState(proc, adr) = "Invalid" *)
       1. CASE a2a (* 1. AddressCache(proc, adr).state' = "Invalid" *)
          ...
          1. CASE doing something at the proc
             Pf: ....
          2. CASE doing something at the arb
          3. QED
          ...
       2. CASE a2b (* 1. AddressCache(proc, adr).state' # "Invalid" *)
       3. QED
    2. CASE b (* 1./\ DTagState(proc, adr) # "Invalid" *)
    3. QED

The implementation proof
– In Step 2, we defined an abstract model of the Wildfire algorithm
– In Step 3, we defined a complete model of the Wildfire algorithm
– Now use the invariant to prove that the complete model implements the abstract model
This remains undone.

Results: one bug
– A fetch is an uncached read.
– Victimization removes data from the cache.
– The bug allows a fetch to interfere with victimization.
To demonstrate the bug, we need to describe more of the hardware…

The quad architecture
[Figure: a quad with processors (P) and their caches, an arbiter (Arb), a global port (GP), the dtag, directory, memory, and TTT, plus a switch to other quads.]

Dtag: a duplicate copy of cache state
One use: invalidate all copies on a quad.
[Figure: the dtag records that P’s cache holds y (r/w); when inval(y) arrives, the Arb forwards it to the processors the dtag lists for y.]

TTT: tracks the state of off-quad requests
[Figure: P’s write(y) leaves the quad through the GP, and the TTT tracks it until ackwrite(y) returns.]

The bug
A fetch that interferes with a victimization can generate an Inval(y) to a cache without a copy of y.
[Figure: the dtag still lists P as holding y (r/w), so the Arb sends inval(y) to P’s cache.]

Initial state: P owns y
[Figure: the directory records y: P, the dtag records y: P, and P’s cache holds y; Q, R, and S do not.]

Now P victimizes y to read x into the same cache line
[Figure: P sends write(y), tracked by the TTT until ackwrite(y) returns, and get(x). The directory and dtag still record y: P.]

So P is waiting for x
[Figure: the directory no longer records an owner for y; the dtag still records y: P; write(y)/ackwrite(y) and get(x) are pending.]

Now R becomes owner of y
[Figure: the directory records y: R; the dtag still records y: P; write(y)/ackwrite(y) and get(x) are still pending.]

Now P fetches y while waiting for x
[Figure: P issues fetch(y); the TTT now also tracks fetch(y)/ackfetch(y), alongside write(y)/ackwrite(y) and get(x).]

Now P gets its copy of y
[Figure: the fetch is forwarded to the owner R (fwd(y)), which sends fill(y) to P; ackfetch(y) returns.]

Now ackwrite arrives: the bug
– An ackwrite normally invalidates the dtag entry.
– A half-completed read normally inhibits this: the new data has reached the cache, and the dtag entry is for the new data.
– But fetches are not cached, and should not be treated like cached reads.
[Figure: the directory records y: R; the TTT holds write(y)/ackwrite(y) and fetch(y)/ackfetch(y); the dtag entry y: P is left in place.]

So P is still waiting for x
[Figure: get(x) is still outstanding; R owns y and the directory records y: R, but the dtag still records y: P.]

Now Q reads y
[Figure: Q gets y; the dtag records y: P,Q; the directory records y: R,Q.]

Now S becomes owner of y
[Figure: S owns y; the directory sends inval(y) to the quad; the dtag still records y: P,Q.]

Now we are in trouble…
[Figure: inval(y) reaches the Arb; the dtag records y: P,Q; P is still waiting for get(x).]
The inval is forwarded to both P and Q, but P doesn’t have a copy to invalidate!
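The scenario above can be replayed against a toy version of the dtag/cache invariant. Everything here is a hypothetical simplification, far smaller than the 1000-line invariant: one address y, only P's cache modeled, the directory and other processors omitted, and the invariant stated as "every processor the dtag lists for an address either caches it or still has a write-back of it pending in the TTT".

```python
def dtag_cache_ok(state):
    """Toy dtag/cache invariant: every processor the dtag lists for an
    address caches that address or has a write-back of it pending in the TTT."""
    for adr, holders in state["dtag"].items():
        for proc in holders:
            cached = state["cache"].get(proc) == adr
            pending = ("write", proc, adr) in state["ttt"]
            if not (cached or pending):
                return False
    return True

def snapshot(dtag_y, p_line, ttt):
    """A state with one address y and one modeled cache (P's)."""
    return {"dtag": {"y": set(dtag_y)}, "cache": {"P": p_line}, "ttt": set(ttt)}

# Hand-written, simplified replay of the slides.
BUGGY_TRACE = [
    snapshot({"P"}, "y", []),                                           # P owns y
    snapshot({"P"}, None, [("write", "P", "y")]),                       # victimize y
    snapshot({"P"}, None, [("write", "P", "y"), ("fetch", "P", "y")]),  # fetch y
    snapshot({"P"}, None, [("fetch", "P", "y")]),                       # ackwrite; dtag kept
]

def first_violation(trace):
    """Index of the first state that breaks the toy invariant, or None."""
    for i, s in enumerate(trace):
        if not dtag_cache_ok(s):
            return i
    return None
```

The replay flags the state right after ackwrite arrives. If ackwrite had cleared the dtag entry (because a fetch is not a cached read), no state would violate the toy invariant.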

The bug is obvious in hindsight
But our scenario exhibiting the bug
– is very long,
– uses 4 processors,
– uses 2 locations, and
– uses 15 messages.
Finding this scenario seemed beyond the power of automated tools like model checkers.

Wildfire conclusion
We performed a rigorous analysis.
– We studied the hardest parts of the algorithm.
– Designers said their confidence in the algorithm was much improved.
We expected to find more errors, but
– Designers knew what they were doing
– We joined late in the design cycle:
  – We had been asked to study the protocol
  – Bugs at the protocol level had already been found
  – All remaining bugs were at the implementation level

EV7 cache coherence
Joshua Scheid, Homayoon Akhiani, Jonathan Nall, Damien Doligez, Scott Kreider, Scott Taylor, Brannon Batson
– Much simpler protocol; proof actually completed
– First TLA+ specification written by engineers
– First intense application of the TLC model checker
– New, interesting uses of the spec in simulation

Results
73 bugs found
– Most bugs were ambiguities in design documents
  – 37 minor: typos, type errors, etc.
  – 11 bugs: wrong message sent/wrong state set
  – 14 missing cases
  – 7 spurious cases (dead code)
– 5 bugs were actual implementation bugs
  – 1 found by TLC
  – 4 found by using TLC error traces for RTL simulation…

Interesting spec applications
– Translate TLC error traces into RTL stimulus
  – Force the RTL simulator into interesting corner cases
– Translate random TLC traces:
  – Better than random stimulus: satisfies the TLA+ spec!
– Translate RTL simulator output to TLA+
  – TLC can check that the RTL satisfies the TLA+ spec
  – TLC can trace visited states, improve coverage
– TLA+ specs yield good RTL assertions to check

Itanium cache coherence
Mark Tuttle, Jae Yang
– Used Intel chips, modeled their external behavior
– Simply writing the spec yielded two design changes
– Too big for TLC:
  – Intel chip models allowed too many behaviors
  – Interesting scenarios required large configurations
  – Used TLC for simulation, not model checking
– Most interesting: TLA+ Itanium memory model
  – With Gil Neiger, Leslie Lamport, Yuan Yu

Interconnect verification
– PCI-X: a high-speed extension to the PCI bus
  – Tom Rodeheffer, Mark Tuttle
  – Found fatal flaws in submissions to the standards group
– Infiniband (FIO/NGIO/SIO)
  – Mark Tuttle, Jae Yang, George Zhang, …

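Database recovery
Dave Lomet, Mark Tuttle
[Figure: in memory, a cache with its cache manager and a log with its log manager; on disk, the data and the log. An operation O: x := x+1 updates x in the cache and is recorded in the log.]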

After a crash, only the disk remains
[Figure: only the data and the log on disk survive.]
– The recovery manager must reconstruct the database
– The recovery manager has only the bits left on disk
– Our theory explains how those bits must be managed

Recovery theory
– Define an ordering on the database operations
– Theorem: If the cache manager follows the order, then the state remains recoverable.
– Theorem: If the recovery manager follows the order from a recoverable state, then recovery succeeds.
The proofs were done and perfect …
… but model checking found 3 subtle mistakes!
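Why the bits on disk must be managed carefully shows up even for the single operation O: x := x+1, which is not idempotent: recovery must know whether O's effect already reached the data on disk. The sketch swaps in one classic technique, stamping the page with the sequence number (LSN) of the last logged operation it reflects, in the style of ARIES. It is an illustration only, not necessarily the ordering defined by this theory, and the `Disk` layout is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Disk:
    """What survives a crash: the data page (value plus the LSN of the last
    logged operation reflected in it) and the stable log."""
    x: int = 0
    page_lsn: int = 0
    log: list = field(default_factory=list)  # entries (lsn, delta) for x := x + delta

def recover(disk):
    """Redo only logged operations whose effect is not yet on the page."""
    x, page_lsn = disk.x, disk.page_lsn
    for lsn, delta in disk.log:
        if lsn > page_lsn:       # not yet reflected in the page: redo it
            x += delta
            page_lsn = lsn
    return x

def naive_recover(disk):
    """Blindly redo the whole log: wrong for non-idempotent operations."""
    return disk.x + sum(delta for _, delta in disk.log)
```

If the crash hits after O is logged but before the page is written, both functions give x = 1; if the page was also written, `recover` still gives 1 while `naive_recover` double-applies O and gives 2.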

Robot rendezvous
Maurice Herlihy, Mark Tuttle
– Robots parachute onto a graph, move around the graph, and rendezvous on a single node.
– The protocol is complete, all but the “move” function
– Model checking shows no “move” will work
– Saved a week of useless search for “move”!

The dream
Model checking whiteboard conversations.
– This is where the real design happens.
– This is where the problems are encountered.
This requires an abstract, expressive language
– TLA+ is a good language
– TLC is too slow
What can we take from TLA+ …
… and still get reasonable performance?

SAT-based BMC
Rajeev Joshi, John Matthews, Mark Tuttle
– TLA+ is the right language, TLC is too slow
– Why not use TLA+?
  – Thought a typed language would help (TLA+ is untyped)
  – Wanted to be able to change the language (TLA+ is hard to parse)
  – In hindsight, this may not have been necessary

SAT-based model checking
Rajeev Joshi, John Matthews, Mark Tuttle
[Figure: the model checker translates the protocol and property into a Boolean formula such as x and (y or z) (the only nontrivial step); a SAT checker returns a satisfying assignment such as x = true, y = true, z = false; this becomes a counterexample trace S0, S1, S2, S3, … in which the property is violated.]
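The pipeline above can be sketched end to end on a toy system, a hypothetical 2-bit counter with the property "counter != 3". The translation step unrolls the transition relation k times into one Boolean formula over per-step state variables; brute-force enumeration of assignments stands in for the SAT checker (a real BMC would hand the same formula, in CNF, to a SAT solver), and the satisfying assignment is decoded into a counterexample trace.

```python
from itertools import product

def init(s):
    b1, b0 = s
    return not b1 and not b0              # the counter starts at 0

def trans(s, t):
    """2-bit counter, t = s + 1 mod 4, as Boolean constraints."""
    (b1, b0), (c1, c0) = s, t
    return c0 == (not b0) and c1 == (b1 != b0)

def bad(s):
    b1, b0 = s
    return b1 and b0                      # violates the property "counter != 3"

def unrolled(assignment, k):
    """Init(S0) and T(S0,S1) and ... and T(S(k-1),Sk) and (Bad(S0) or ... or Bad(Sk))."""
    states = [assignment[2 * i:2 * i + 2] for i in range(k + 1)]
    return (init(states[0])
            and all(trans(states[i], states[i + 1]) for i in range(k))
            and any(bad(s) for s in states))

def bmc(k):
    """Stand-in for the SAT checker: try all assignments to the 2*(k+1)
    variables. Returns the counterexample as counter values S0..Sk, or None."""
    for assignment in product([False, True], repeat=2 * (k + 1)):
        if unrolled(assignment, k):
            return [2 * assignment[2 * i] + assignment[2 * i + 1]
                    for i in range(k + 1)]
    return None
```

With bound 2 there is no counterexample; with bound 3 the "solver" finds the trace 0, 1, 2, 3. The formula grows only linearly with the bound, which is what makes the SAT encoding attractive.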

MLA language
– Types: booleans, integer ranges, records, enums, bounded sequences, finite sets, recursive functions
– Operators: arithmetic, logical, relational
– Value constructors: lambdas, etc.
– Expressions: let, case, if, quantification
MLA compiler:
– 16,000 lines of ML (wc)
– Function translation was the only source of trouble

Itanium program verification
Given
– an assembly-language program (compiler output)
– a safety property (an invariant)
is the safety property violated by some execution of the program allowed by the Itanium memory model?
– We were thinking of synchronization code (mutexes)
– Examples were the Dekker and Bakery algorithms
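The question above can be posed as exhaustive exploration of the executions a memory model allows. The sketch does this for the entry code of a simplified Dekker protocol under two toy models: sequential consistency, and per-thread FIFO store buffers (a TSO-like relaxation). The store-buffer model is a hypothetical stand-in chosen for brevity; it is NOT the Itanium memory model, which the actual work specified in TLA+.

```python
def dekker_entry_outcomes(store_buffers):
    """All final (r0, r1) pairs for the entry code of a simplified Dekker protocol:
        P0: flag0 := 1; r0 := flag1        P1: flag1 := 1; r1 := flag0
    A thread enters its critical section if it read 0, so (0, 0) means both
    enter. With store_buffers=True, stores wait in a per-thread FIFO buffer
    before reaching memory (a TSO-like stand-in, not the Itanium model)."""
    programs = {0: ("flag0", "flag1"), 1: ("flag1", "flag0")}
    outcomes = set()

    def explore(mem, bufs, pcs, regs):
        if pcs == (2, 2) and not any(bufs):
            outcomes.add(regs)
            return
        for t in (0, 1):
            mine, other = programs[t]
            if pcs[t] == 0:                          # the store flag_t := 1
                pcs2 = pcs[:t] + (1,) + pcs[t + 1:]
                if store_buffers:
                    bufs2 = bufs[:t] + (bufs[t] + ((mine, 1),),) + bufs[t + 1:]
                    explore(mem, bufs2, pcs2, regs)
                else:
                    explore({**mem, mine: 1}, bufs, pcs2, regs)
            elif pcs[t] == 1:                        # the load r_t := flag_other
                own = [v for a, v in bufs[t] if a == other]
                value = own[-1] if own else mem[other]  # own buffer first
                pcs2 = pcs[:t] + (2,) + pcs[t + 1:]
                regs2 = regs[:t] + (value,) + regs[t + 1:]
                explore(mem, bufs, pcs2, regs2)
            if bufs[t]:                              # drain the oldest buffered store
                (a, v), rest = bufs[t][0], bufs[t][1:]
                explore({**mem, a: v}, bufs[:t] + (rest,) + bufs[t + 1:], pcs, regs)

    explore({"flag0": 0, "flag1": 0}, ((), ()), (0, 0), (None, None))
    return outcomes
```

Under sequential consistency the outcome (0, 0) never appears, so mutual exclusion holds; with store buffers both threads can read 0 and enter together. This is the kind of violation the slide asks about for real Itanium code.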

Conclusion
Formal methods are up to industrial problems.
Formal methods have incremental payoffs:
– Specification documents the design.
– Model checking finds design errors quickly.
– Proof writing finds deeper design errors.
– Any partial proof is still a rigorous analysis.

EV7 cache coherence
– The specification is 1800 lines
– The specification was accepted by TLC without modification
– Largest instance feasible to check with TLC:
  – 1 cache line, 2 data values, 3 processors
  – 12 million reachable states (with symmetry reductions)

EV8 cache coherence
– EV8 designers planned to use a TLA+ spec as the official spec (plus an English explanation)
– But EV8 was cancelled, and the designers went to Intel
– Rumor: Intel has an official spec in TLA+
