Verification at HP Labs Mark Tuttle (with the help of many friends at) HP Labs.

1 Verification at HP Labs Mark Tuttle (with the help of many friends at) HP Labs

2 Slide 2 of 47 Overview of verification work
Cache coherence protocols
– Alpha EV6, EV7, EV8 protocols
– Itanium
Bus protocols
– PCI-X, Infiniband (FIO/NGIO/SIO)
Database systems
Distributed algorithms
A SAT-based bounded model checker
– Applications to Itanium software

3 Slide 3 of 47 Most of this work uses TLA+
Lamport’s specification language based on set theory, first-order logic, temporal logic
Hierarchical style improves readability, rigor
– specifications: [figure not preserved] becomes [figure not preserved]
– proofs: [figure not preserved] becomes a hierarchy of numbered steps: 1. CASE 2. CASE 3. QED
Most find reading easy, writing not too hard

4 Slide 4 of 47 Wildfire: EV6 cache coherence
Kourosh Gharachorloo, Madhu Sharma, Simon Steely, Steve Van Doren
[Figure: 32 processor server. Labels: quad, global switch, quad; P1, P2, P3, P4, local switch, mem, DIR, TTT, DTAG, global port, arbiter]

5 Slide 5 of 47 Directory-based cache coherence
To get x, go to x’s directory to see who owns x.
[Figure: processors P1, P2, P3, P4; memory holding x with value 5; directory entry for x with copies and owner fields]

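6 Slide 6 of 47 Get read-only copy
[Figure: processors P and Q and the directory entry for x; messages Rd(x), Fwd(x), Fill(x,5); the entry starts with copies=Q, owner=Q and ends with copies=P,Q]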

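7 Slide 7 of 47 Get writable copy
[Figure: processors P, Q, R, S and the directory entry for x; messages RdEx(x), FwdRdEx(x), FillEx(x,5), Inval(x); the entry starts with copies=Q,R,S, owner=Q and ends with copies=P, owner=P]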
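The two exchanges on slides 6 and 7 can be sketched as updates to a single directory entry. This is a minimal illustration, not the Wildfire protocol: the `DirectoryEntry` class, the function names, and the assumption that Inval(x) goes to every holder other than the old owner and the requester are all hypothetical; only the message names and the before/after values of copies and owner come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryEntry:
    """Hypothetical directory entry for one address x."""
    owner: str
    copies: set = field(default_factory=set)
    value: int = 0


def read(entry, requester):
    """Rd(x): the directory forwards the request to the owner, who fills the requester."""
    messages = [
        f"Rd(x) {requester}->dir",
        f"Fwd(x) dir->{entry.owner}",
        f"Fill(x,{entry.value}) {entry.owner}->{requester}",
    ]
    entry.copies.add(requester)  # the requester now holds a read-only copy
    return messages


def read_exclusive(entry, requester):
    """RdEx(x): the requester becomes the owner and all other copies are invalidated."""
    messages = [
        f"RdEx(x) {requester}->dir",
        f"FwdRdEx(x) dir->{entry.owner}",
        f"FillEx(x,{entry.value}) {entry.owner}->{requester}",
    ]
    # Assumption: every holder other than the old owner and the requester gets an Inval.
    for holder in sorted(entry.copies - {requester, entry.owner}):
        messages.append(f"Inval(x) dir->{holder}")
    entry.owner = requester
    entry.copies = {requester}
    return messages


shared = DirectoryEntry(owner="Q", copies={"Q"}, value=5)
read_messages = read(shared, "P")

exclusive = DirectoryEntry(owner="Q", copies={"Q", "R", "S"}, value=5)
exclusive_messages = read_exclusive(exclusive, "P")
```

After `read`, the entry holds copies P and Q as on slide 6; after `read_exclusive`, P is both the owner and the only holder as on slide 7.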

8 Slide 8 of 47 A complicated protocol
Directory can be many steps ahead of processors
[Figure: requesters R1, R2, R3 and Dir; messages RdEx(x), FwdRdEx(x), RdEx(x), FwdRdEx(x), Data]

9 Slide 9 of 47 A complicated protocol
Generates data and commit events independently
Memory barriers impose instruction order
– Maintain count of outstanding off-chip requests
– Pass memory barrier only when count is 0
[Figure: two timelines for the sequence read A, MB, read B with inval(x) and inval(y); the first shows data(A), the second shows commit(A) and data(A)]
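The memory-barrier rule on slide 9 amounts to a counter check. The sketch below assumes, without confirmation from the slides, that a commit event is what retires an outstanding request; the class and method names are hypothetical.

```python
class Processor:
    """Toy processor tracking outstanding off-chip requests (hypothetical names)."""

    def __init__(self):
        self.outstanding = 0

    def issue_off_chip_request(self):
        # A request that leaves the chip counts as outstanding.
        self.outstanding += 1

    def receive_commit(self):
        # Assumption: the commit event, not the data event, retires the request.
        self.outstanding -= 1

    def can_pass_memory_barrier(self):
        # Slide 9: pass the memory barrier only when the count is 0.
        return self.outstanding == 0


p = Processor()
p.issue_off_chip_request()          # read A goes off chip
blocked = not p.can_pass_memory_barrier()
p.receive_commit()                  # commit(A) arrives, possibly before data(A)
released = p.can_pass_memory_barrier()
```

Under this assumption the barrier can be passed as soon as commit(A) arrives, which is consistent with the claim on the next slide that MBs are fast.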

10 Slide 10 of 47 Dramatic speedups possible
Reads are fast
MBs are fast
[Figure: timelines for read A … work ... MB, read B with events inval(x), inval(y), commit(A), data(A), and for read A, MB, read B with commit(A), data(A), owner, fwd(A)]
“Intuitively surprising this actually works!”

11 Slide 11 of 47 Wildfire verification
Paul Harter, Leslie Lamport, Mark Tuttle, Yuan Yu
We are asked to look at the protocol
We arrive very late (almost tape-out)
No time for complete proof
But enough time for a rigorous analysis

12 Slide 12 of 47 Wildfire cache coherence in “three easy steps” + “two man-years”
Model Alpha memory model. (200 lines)
Model complete protocol. (2000 lines, 3 months)
Prove implementation. (5500 lines, 4+ months, incomplete)
Model abstract protocol. (500 lines)
Prove implementation. (550 lines, 2 months, informal)

13 Slide 13 of 47 Step 1: Alpha memory model
Official specification is
– Informal: an English document
– Behavioral: defines acceptable sequences of memory operations
Our specification is
– Precise: a single logical formula
– State-based: required for invariance-style proofs
We did simplify the model slightly:
– Operations read and write entire cache lines
– Some “impossible” implementations ruled out
Compare the specifications: 12 pages vs 200 lines

14 Slide 14 of 47 The heart of the model
A Before order
– Orders reads and writes in an execution
– Determines return values for the reads
A GoodExecutionOrder predicate
– Defines the Before orders allowed by the model

15 Slide 15 of 47 State machine actions
ReceiveRequest(proc, req): Receive a request
ChooseNewData(proc, idx): Choose the return value for a request
Respond(proc, idx): Return the value to a request
ExtendBefore: Expand the Before relation
Actions must preserve GoodExecutionOrder.

16 Slide 16 of 47 GoodExecutionOrder
GoodExecutionOrder ==
  LET [some definitions deleted]
  IN /\ (*************************************************************)
        (* Before is a partial order.                                *)
        (*************************************************************)
        /\ Before \subseteq ReqId \X ReqId
        /\ \A r1, r2 \in ReqId : IsBefore(r1, r2) => ~IsBefore(r2, r1)
        /\ \A r1, r2, r3 \in ReqId :
              IsBefore(r1, r2) /\ IsBefore(r2, r3) => IsBefore(r1, r3)
     /\ (*************************************************************)
        (* SourceOrder implies the Before order.                     *)
        (*************************************************************)
        \A r1, r2 \in ReqId : SourceOrder(r1, r2) => IsBefore(r1, r2)
     /\ (*************************************************************)
        (* RequestOrder implies the Before order.                    *)
        (*************************************************************)
        \A r1, r2 \in ReqId : RequestOrder(r1, r2) => IsBefore(r1, r2)
This is the hard part: look how short it is!

17 Slide 17 of 47
     /\ (*******************************************************)
        (* Writes and successful SCs to the same location that *)
        (* have issued a response are totally ordered.         *)
        (*******************************************************)
        \A r1, r2 \in ReqId :
              /\ ReqIdQ[r1].req.type \in {"Wr", "SC"}
              /\ ReqIdQ[r1].req.newData # "Failed"
              /\ ReqIdQ[r1].req.responded
              /\ ReqIdQ[r2].req.type \in {"Wr", "SC"}
              /\ ReqIdQ[r2].req.newData # "Failed"
              /\ ReqIdQ[r2].req.responded
              /\ ReqIdQ[r1].req.adr = ReqIdQ[r2].req.adr
           => IsBefore(r1, r2) \/ IsBefore(r2, r1)

18 Slide 18 of 47
     /\ (*******************************************************************)
        (* LL/SC Axiom: For each successful SC, there is a matching LL and *)
        (* there is no write to the same address from a different          *)
        (* processor between the LL and SC in the Before order.            *)
        (*******************************************************************)
        \A r2 \in ReqId :
              /\ ReqIdQ[r2].req.type = "SC"
              /\ ReqIdQ[r2].newData \notin {Failed, NotChosen}
           => \E r1 \in ReqId :
                 /\ LLSCPair(r1, r2)
                 /\ \A r \in ReqId :
                          /\ \/ ReqIdQ[r].req.type = "Wr"
                             \/ /\ ReqIdQ[r].req.type = "SC"
                                /\ ReqIdQ[r].newData \notin {NotChosen, Failed}
                          /\ r[1] # r2[1]
                          /\ ReqIdQ[r2].req.adr = ReqIdQ[r].req.adr
                       => ~IsBefore(r1, r) \/ ~IsBefore(r, r2)

19 Slide 19 of 47
     /\ (**************************************************************)
        (* Value Axiom: A read reads from the preceding write in the  *)
        (* Before order.                                              *)
        (**************************************************************)
        \A r1, r2 \in ReqId :
              /\ ReqIdQ[r2].source # NoSource
              /\ ReqIdQ[r1].req.type = "Wr"
              /\ ReqIdQ[r1].req.adr = ReqIdQ[r2].req.adr
           => IF ReqIdQ[r2].source = FromInitMem
                THEN ~IsBefore(r1, r2)
                ELSE \/ ~IsBefore(ReqIdQ[r2].source, r1)
                     \/ ~IsBefore(r1, r2)
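The first conjuncts of GoodExecutionOrder (slide 16) can be checked mechanically on a small example. The Python sketch below is not the TLA+ specification: it represents Before as a set of pairs of hypothetical request identifiers and tests only the partial-order clauses and the "implies the Before order" clauses, with a made-up source order.

```python
from itertools import product


def is_before(before, r1, r2):
    return (r1, r2) in before


def is_partial_order(before, req_ids):
    """Mirror the three clauses under 'Before is a partial order'."""
    # Before \subseteq ReqId \X ReqId
    in_domain = all(a in req_ids and b in req_ids for a, b in before)
    # IsBefore(r1, r2) => ~IsBefore(r2, r1); with r1 = r2 this also forbids self-loops
    asymmetric = all(
        not (is_before(before, a, b) and is_before(before, b, a))
        for a, b in product(req_ids, repeat=2)
    )
    # IsBefore(r1, r2) /\ IsBefore(r2, r3) => IsBefore(r1, r3)
    transitive = all(
        is_before(before, a, c)
        for a, b, c in product(req_ids, repeat=3)
        if is_before(before, a, b) and is_before(before, b, c)
    )
    return in_domain and asymmetric and transitive


def implies_before(order, before):
    """SourceOrder (or RequestOrder) implies the Before order."""
    return all(pair in before for pair in order)


req_ids = {"r1", "r2", "r3"}                      # hypothetical request identifiers
source_order = {("r1", "r2")}                      # hypothetical source order
good = {("r1", "r2"), ("r2", "r3"), ("r1", "r3")}
cyclic = {("r1", "r2"), ("r2", "r1")}
not_closed = {("r1", "r2"), ("r2", "r3")}

good_ok = is_partial_order(good, req_ids) and implies_before(source_order, good)
cyclic_ok = is_partial_order(cyclic, req_ids)
not_closed_ok = is_partial_order(not_closed, req_ids)
```

Here `good` passes, `cyclic` fails the asymmetry clause, and `not_closed` fails transitivity because the pair (r1, r3) is missing.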

20 Slide 20 of 47 Step 2: Model abstract protocol
protocol = abstract protocol + implementation junk
Surprisingly,
– abstract protocol’s correctness was far from obvious
– we discovered a bug… in the memory model
Proved hardest part of correctness:
– Proved the Before order is acyclic
– 35-line invariant based on 300 lines of definitions
– 550-line proof, cases nested 10 levels deep

21 Slide 21 of 47 Found: Alpha memory model bug
Original Alpha memory model allowed:
Initially: x=0, y=0
P: if x=1 then y:=2
Q: if y=2 then x:=1
Finally: x=1, y=2
This behavior breaks the critical section implementation recommended in the SRM. (Jim Saxe)
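To see why the outcome x=1, y=2 is surprising, the sketch below enumerates every interleaving of the two programs from slide 21 under sequential consistency. Sequential consistency is a stronger model than Alpha's and is used here only as a reference point; the step encoding (one read step, then one conditional write step per processor) is an assumption of the sketch.

```python
from itertools import permutations


def run(schedule):
    """Execute one interleaving; each processor has a read step then a write step."""
    mem = {"x": 0, "y": 0}
    seen = {}                      # value each processor read in its first step
    step = {"P": 0, "Q": 0}
    for proc in schedule:
        if proc == "P":
            if step["P"] == 0:
                seen["P"] = mem["x"]          # P: if x=1 ...
            elif seen["P"] == 1:
                mem["y"] = 2                  # ... then y:=2
        else:
            if step["Q"] == 0:
                seen["Q"] = mem["y"]          # Q: if y=2 ...
            elif seen["Q"] == 2:
                mem["x"] = 1                  # ... then x:=1
        step[proc] += 1
    return mem["x"], mem["y"]


schedules = set(permutations("PPQQ"))         # the 6 distinct interleavings
outcomes = {run(s) for s in schedules}
```

Every interleaving ends with x=0, y=0, so under sequential consistency each write would have to be justified by a value that only the other write could produce. That circular justification is the behavior the original model admitted.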

22 Slide 22 of 47 Revised Alpha memory model
causal cycle:
P: if x=1 then y:=2
Q: if y=2 then x:=1
break the cycle:
P: if x=1 then y:=2
Q: if y=2 then x:=1

23 Slide 23 of 47 Wildfire counterexample
P: x:=1
Q: if x=1 then y:=2
R: if y=2 then x:=3
The Alpha memory model says x=3, but in Wildfire it could be x=1…

24 Slide 24 of 47 Q directory Inval(x) P: x:=1 R: if y=2 then x:=3Q: if x=1 then y:=2 Rx=0Px=1 ITD(x)ok

25 Slide 25 of 47 Q directory Inval(x) P: x:=1 R: if y=2 then x:=3Q: if x=1 then y:=2 Rx=0Px=1 Rd(x) Fwd(x) x=1

26 Slide 26 of 47 Q directory Inval(x) P: x:=1 R: if y=2 then x:=3Q: if x=1 then y:=2 Rx=0Px=1 ITD(y) x=1 ok y=2

27 Slide 27 of 47 Q directory P: x:=1 R: if y=2 then x:=3Q: if x=1 then y:=2 Rx=0,3Px=1 y=2 Rd(y) Fwd(y) y=2 Inval(x)

28 Slide 28 of 47
[Figure: P: x:=1; Q: if x=1 then y:=2; R: if y=2 then x:=3; R holds x=3, P holds x=1, y=2; Inval(x) still in flight]
The result must be x=3, but the result is x=1.
The same thing was possible in other machines. (Kourosh Gharachorloo)

29 Slide 29 of 47 What went wrong?
An ordering internal to P …
P: if x=1 then y:=2
Q: if y=2 then x:=1
… forced an ordering for Q:
P: if x=1 then y:=2
Q: if y=2 then x:=1
The fix: use internal orderings to forbid orderings, but not to force orderings.

30 Slide 30 of 47 New Alpha memory model
P: x:=1
Q: if x=1 then y:=2
R: if y=2 then x:=3
There is no dependency/source cycle: R1 R2 W1 W2 … Wn

31 Slide 31 of 47 Step 3: Model complete protocol
Obstacle: no single, complete description
English documents: 12 documents, 4-inch stack
Lisp simulator: crucial to understanding some details
None compact, none mathematically tractable
Different levels of abstraction, some inconsistency
We had to write our own description

32 Slide 32 of 47 Obstacle: algorithm complexity
ChangeToDirty DummyRdVic FailedChangeToDirty Fetch InvalToDirty InvalToDirtyVic Rd RdMod RdVic RdVicMod QV_Fetch QV_Rd QV_RdMod WrVic
ChangeToDirtyFailure ChangeToDirtySuccess FetchFillMarker FillMarker FillMarkerMod ForwardFetch ForwardFetchWithFetchFillMarker ForwardRd ForwardRdMod ForwardRdWithFillMarker ForwardRdModWithFillMarkerMod InvalAck InvalToDirtySuccess Invalidate LoopComsig LoopComsigWithInvalAck LoopComsigWithShadowClear LoopComsigWithShadowInvalAndShadowClear ShadowChangeToDirtySuccess ShadowForwardFetch ShadowForwardRd ShadowForwardRdMod ShadowInvalToDirtySuccess ShadowInvalidate ShadowShortFillMod ShadowSnap ShortFetchFill ShortFill ShortFillMod VictimAck
FetchFill Fill FillMod VCFetchFill VCFill VCFillMod

33 Slide 33 of 47 Solution: Quarks
Ack ChangeToDirty Clear Comsig Fill ForwardedGet GetValue InvalidToDirty QuadInvalidate ReleaseMAF ReleaseVDB SetCacheLineState Victimize Write
Quarks combine to form messages.

34 Slide 34 of 47 Quarks form messages, then split up
[Figure: reader, home quad, global switch, owner, copy holders; messages labeled GetValue; QuadInval; ForwardedGet; and ForwardedGet, QuadInval, Comsig]

35 Slide 35 of 47 Quarks resolve message overloading
“ChangeToDirtySuccess” could mean
– {AckChangeToDirty, Comsig, QuadInvalidate^*, ClearOutstandingInval}
– {AckChangeToDirty, Comsig, QuadInvalidate^*}
– {Comsig, ReleaseMAF, SetCacheLineState}
Quarks simplify algorithm description
Each quark processed separately, independently
Each data structure changed by a single quark
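A small sketch can illustrate why "each data structure changed by a single quark" makes the processing order irrelevant. It takes the third meaning of ChangeToDirtySuccess from slide 35 and gives each quark a handler that touches exactly one field. The handler bodies, field names, and values are hypothetical; the slides do not say what each quark does.

```python
from itertools import permutations


# Hypothetical handlers: each quark updates exactly one field of the state.
def comsig(state):
    return {**state, "commit_signalled": True}


def release_maf(state):
    return {**state, "maf_entry": None}


def set_cache_line_state(state):
    return {**state, "line_state": "Dirty"}


HANDLERS = {
    "Comsig": comsig,
    "ReleaseMAF": release_maf,
    "SetCacheLineState": set_cache_line_state,
}


def process_message(quarks, state):
    """Process a message one quark at a time, in the order given."""
    for quark in quarks:
        state = HANDLERS[quark](state)
    return state


initial = {
    "commit_signalled": False,
    "maf_entry": "pending ChangeToDirty",
    "line_state": "Clean",
}
message = ("Comsig", "ReleaseMAF", "SetCacheLineState")
results = [process_message(order, initial) for order in permutations(message)]
```

Because the handlers write disjoint fields, all six processing orders give the same final state, which is the property that lets a specification describe each quark in isolation.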

36 Slide 36 of 47 Quark handling
If a processor receives a Fill quark carrying cacheable data, then how is the cache updated?
ProcFieldsMessage(proc, msg) ==
  /\ ...
  /\ Cache' = CASE ...
                [] ("Fill" \in msg) /\ (subtype("Fill") # "Fetch") ->
                     [Cache EXCEPT
                        ![proc, cacheIndex].state = IF subtype("Fill") = "Mod"
                                                      THEN "ExclusiveDirty"
                                                      ELSE "Clean",
                        ![proc, cacheIndex].tag = AddressToTag(msg.adr),
                        ![proc, cacheIndex].data = msg.data ]
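For readers less used to TLA+, the Fill case on slide 36 can be paraphrased in Python. This is a reading aid under stated assumptions, not a translation of the real specification: the message layout (a dict mapping quark names to subtypes), the cache layout, and `address_to_tag` with 64-byte lines are all hypothetical.

```python
import copy


def address_to_tag(adr):
    # Hypothetical stand-in for AddressToTag; 64-byte cache lines are an assumption.
    return adr // 64


def handle_fill(cache, proc, cache_index, msg):
    """Return the next cache value, leaving the old one untouched (like Cache')."""
    fill_subtype = msg["quarks"].get("Fill")
    # The case applies only to a Fill quark whose subtype is not "Fetch".
    if fill_subtype is None or fill_subtype == "Fetch":
        return cache
    new_cache = copy.deepcopy(cache)
    line = new_cache[(proc, cache_index)]
    line["state"] = "ExclusiveDirty" if fill_subtype == "Mod" else "Clean"
    line["tag"] = address_to_tag(msg["adr"])
    line["data"] = msg["data"]
    return new_cache


cache = {("P", 0): {"state": "Invalid", "tag": None, "data": None}}
fill_mod = {"quarks": {"Fill": "Mod"}, "adr": 128, "data": 5}
fetch_fill = {"quarks": {"Fill": "Fetch"}, "adr": 128, "data": 5}

after_fill = handle_fill(cache, "P", 0, fill_mod)
after_fetch = handle_fill(cache, "P", 0, fetch_fill)
```

A Fill with subtype Mod leaves the line ExclusiveDirty, any other cacheable Fill leaves it Clean, and a Fetch fill does not touch the cache at all, which matters for the bug described later in the talk.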

37 Slide 37 of 47 Wildfire invariant
Define an invariant describing all reachable states. (1000 lines)
Prove invariance.
We focused on the most difficult, error-prone parts:
– on quad (150 lines)
– off quad (150 lines)
[Figure labels: cache, dtag, directory, messages]

38 Slide 38 of 47 Dir - Dtag Invariant
Mother == DirDTagInvariant /\ DTagCacheInvariant /\ ...
DirDTagInvariant == \A adr \in MemBlockAddress, proc \in Processor :
  a.\/ (* local address *) ...
  b.\/ (* nonlocal address *)
       1./\ ProcToQuad(proc) # AddressToQuad(adr)
       2./\ Proj(HomeToArbQ) = [ [FG* [QFI] QI* AckWrite] QI* AGV(mod,1) | FG* AckCTD(Success)] FG*
DTagCacheInvariant == ...
Further fragments on the slide (their nesting is not recoverable from the transcript):
  2./\ a.\/ (* proc is the owner of adr *)
            1./\ Dir[adr].owner = proc
       b.\/ (* proc is not the owner of adr *) ...
  2./\ a.\/ (* dtag is dirty *)
            1./\ DTagState(adr, proc) = Dirty ...
       b.\/ (* dtag is invalid *) ...
       c.\/ (* dtag is clean *) ...

39 Slide 39 of 47 DTag-Cache Invariance
ASSUME: /\ Mother
        /\ Wildfire
        /\ DTagCacheInvariant(proc,adr)
PROVE:  DTagCacheInvariant(proc,adr)'
1. CASE a (* DTagState(proc, adr) = "Invalid" *)
2. CASE b (* DTagState(proc, adr) # "Invalid" *)
3. QED

40 Slide 40 of 47 DTag-Cache Invariance
ASSUME: /\ Mother
        /\ Wildfire
        /\ DTagCacheInvariant(proc,adr)
PROVE:  DTagCacheInvariant(proc,adr)'
1. CASE a (* DTagState(proc, adr) = "Invalid" *)
   1. CASE a2a (* AddressCache(proc, adr).state' = "Invalid" *)
   2. CASE a2b (* AddressCache(proc, adr).state' # "Invalid" *)
   3. QED
2. CASE b (* DTagState(proc, adr) # "Invalid" *)
3. QED

41 Slide 41 of 47 DTag-Cache Invariance
ASSUME: /\ Mother
        /\ Wildfire
        /\ DTagCacheInvariant(proc,adr)
PROVE:  DTagCacheInvariant(proc,adr)'
1. CASE a (* 1./\ DTagState(proc, adr) = "Invalid" *)
   1. CASE a2a (* 1. AddressCache(proc, adr).state' = "Invalid" *)
      ...
      1. CASE doing something at the proc
         Pf: ....
      2. CASE doing something at the arb
      3. QED
      ...
   2. CASE a2b (* 1. AddressCache(proc, adr).state' # "Invalid" *)
   3. QED
2. CASE b (* 1./\ DTagState(proc, adr) # "Invalid" *)
3. QED

42 Slide 42 of 47 The implementation proof
In Step 2, we defined an abstract model of the Wildfire algorithm.
In Step 3, we defined a complete model of the Wildfire algorithm.
Now use the invariant to prove that the complete model implements the abstract model.
This has not been done.

43 Slide 43 of 47 Results: one bug A fetch is an uncached read. Victimization removes data from the cache. The bug allows a fetch to interfere with victimization. To demonstrate the bug, we need to describe more of the hardware…

44 Slide 44 of 47 The quad architecture quad proc cache proc P ArbGP dtagdirectorymemoryttt switch to other quads

45 Slide 45 of 47 Dtag: a duplicate copy of cache state One use: invalidate all copies on a quad. cache P Arb dtag y r/w y P r/w inval(y)

46 Slide 46 of 47 TTT: tells state of off-quad requests GP ttt cache P y write(y) ackwrite(y)

47 Slide 47 of 47 The Bug By causing a fetch to interfere with a victimize, generate an Inval(y) to a cache without a copy of y. cache P Arb dtag y r/w inval(y)

48 Slide 48 of 47 Initial state: P owns y dirmem y: Py dtag y: P ttt gp arb P y QRS

49 Slide 49 of 47 Now P victimizes y to read x into same cache line dirmem y: Py dtag y: P gp arb P y QRS ttt write(y)ackwrite(y) write(y) ackwrite(y) get(x)

50 Slide 50 of 47 So P is waiting for x dirmem y:y dtag y: P gp arb PQRS ttt write(y)ackwrite(y) get(x)

51 Slide 51 of 47 Now R becomes owner of y dirmem y: Ry dtag y: P gp arb PQR owns y S ttt write(y)ackwrite(y) get(x)

52 Slide 52 of 47 Now P fetches y while waiting for x dirmem y: Ry dtag y: P gp arb PQR owns y S ttt write(y)ackwrite(y) get(x) fetch(y) ackfetch(y) fill(y)

53 Slide 53 of 47 Now P gets its copy of y dirmem y: Ry dtag y: P gp arb PQR owns y S ttt write(y)ackwrite(y) get(x) ackfetch(y) fetch(y)ackfetch(y) fill(y) fwd(y) fill(y)

54 Slide 54 of 47 Now ackwrite arrives: the bug
ackwrites normally invalidate dtag entries
half-completed reads normally inhibit this: the new data has reached the cache, and the dtag entry is for the new data
but fetches are not cached, and should not be treated like cached reads
[Figure: dir/mem y: R; dtag y: P; ttt; messages write(y), ackwrite(y), fetch(y), ackfetch(y)]

55 Slide 55 of 47 So P is still waiting for x dtag y: P gp arb PQ ttt get(x) R owns y S dirmem y: Ry

56 Slide 56 of 47 Now Q reads y dtag y: P,Q gp arb PQ gets y ttt get(x) R owns y S dirmem y: R,Qy

57 Slide 57 of 47 Now S becomes owner of y dtag y: P,Q gp arb PQ gets y ttt get(x) S owns y R dirmem y: R,Qy inval(y)

58 Slide 58 of 47 Now we are in trouble…
The inval is forwarded to both P and Q, but P doesn’t have a copy to invalidate!
[Figure: gp, arb, ttt with get(x) outstanding; dtag y: P,Q; Q gets y; inval(y) arriving]
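The scenario on slides 48 to 58 can be replayed in a toy model that tracks only P's and Q's caches and the quad's dtag entry for y. This is a heavily simplified sketch, not the Wildfire hardware: the single flag `fetch_inhibits_ackwrite` stands in for "a fetch is treated like a half-completed cached read", and every name in the code is hypothetical.

```python
def run_scenario(fetch_inhibits_ackwrite):
    """Replay the slide 48-58 steps; return (inval targets, targets with no copy of y)."""
    cache = {"P": {"y"}, "Q": set()}     # slide 48: P owns y
    dtag = {"y": {"P"}}                  # dtag mirrors the quad's cache state

    # Slide 49: P victimizes y to make room for x; write(y) is sent, ackwrite is pending.
    cache["P"].discard("y")

    # Slides 52-53: P fetches y while waiting for x. A fetch is an uncached read,
    # so P's cache is unchanged. The buggy design still records a read in progress.
    read_in_progress = fetch_inhibits_ackwrite

    # Slide 54: ackwrite(y) arrives. It normally invalidates the dtag entry,
    # unless a half-completed read inhibits that.
    if not read_in_progress:
        dtag["y"].discard("P")

    # Slide 56: Q reads y and gets a cached copy.
    cache["Q"].add("y")
    dtag["y"].add("Q")

    # Slides 57-58: S becomes owner; inval(y) is forwarded to every dtag holder.
    targets = sorted(dtag["y"])
    stray = [p for p in targets if "y" not in cache[p]]
    return targets, stray


buggy_targets, buggy_stray = run_scenario(fetch_inhibits_ackwrite=True)
fixed_targets, fixed_stray = run_scenario(fetch_inhibits_ackwrite=False)
```

With the flag set, the stale dtag entry survives and the inval reaches P, which holds no copy of y; with the flag cleared, only Q is invalidated.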

59 Slide 59 of 47 The bug is obvious in hindsight
But our scenario exhibiting the bug
– is very long,
– uses 4 processors,
– uses 2 locations, and
– uses 15 messages.
Finding this scenario seemed beyond the power of automated tools like model checkers.

60 Slide 60 of 47 Wildfire conclusion
We performed a rigorous analysis.
We studied the hardest parts of the algorithm.
Designers said their confidence in the algorithm was much improved.
We expected to find more errors
– Designers knew what they were doing
– We joined late in the design cycle:
   We had been asked to study the protocol
   Bugs at protocol level had already been found
   All remaining bugs were at the implementation level

61 Slide 61 of 47 EV7 cache coherence
Joshua Scheid, Homayoon Akhiani, Jonathan Nall, Damien Doligez, Scott Kreider, Scott Taylor, Brannon Batson
Much simpler protocol, proof actually completed
First TLA+ specification written by engineers
First intense application of TLC model checker
New, interesting uses of spec in simulation

62 Slide 62 of 47 Results
73 bugs found
– Most bugs were ambiguity in design documents
   37 minor: typos, type errors, etc.
   11 bugs: wrong message sent/wrong state set
   14 missing cases
   7 spurious cases (dead code)
– 5 bugs were actual implementation bugs
   1 found by TLC
   4 found by using TLC error traces for RTL simulation…

63 Slide 63 of 47 Interesting spec applications
Translate TLC error traces into RTL stimulus
– Force RTL simulator into interesting corner cases
Translate random TLC traces:
– Better than random stimulus: satisfies TLA+ spec!
Translate RTL simulator output to TLA+
– TLC can check that RTL satisfies TLA+ spec
– TLC can trace visited states, improve coverage
TLA+ specs yield good RTL assertions to check

64 Slide 64 of 47 Itanium cache coherence
Mark Tuttle, Jae Yang
Used Intel chips, modeled their external behavior
Simply writing spec yielded two design changes
Too big for TLC:
– Intel chip models allowed too many behaviors
– Interesting scenarios required large configurations
– Used TLC for simulation, not model checking
Most interesting: TLA+ Itanium memory model
– With Gil Neiger, Leslie Lamport, Yuan Yu

65 Slide 65 of 47 Interconnect verification
PCI-X: a high-speed extension to PCI bus
– Tom Rodeheffer, Mark Tuttle
– Found fatal flaws in submissions to standards group
Infiniband (FIO/NGIO/SIO)
– Mark Tuttle, Jae Yang, George Zhang, …

66 Slide 66 of 47 Database recovery
Dave Lomet, Mark Tuttle
[Figure: memory holds the cache, managed by the cache manager and log manager; disk holds data and log; operation O: x := x+1 with x and O shown in the figure]

67 Slide 67 of 47 After a crash, only the disk remains
[Figure: disk with data and log]
Recovery manager must reconstruct the database
Recovery manager has only the bits left on disk
Our theory explains how bits must be managed

68 Slide 68 of 47 Recovery theory
Define an ordering on the database operations
Theorem: If cache manager follows order, then state remains recoverable.
Theorem: If recovery manager follows order from a recoverable state, then recovery succeeds.
The proofs were done and perfect …
… but model checking found 3 subtle mistakes!

69 Slide 69 of 47 Robot rendezvous
Maurice Herlihy, Mark Tuttle
Robots parachute onto a graph, move around the graph, and rendezvous on a single node.
Protocol is complete, all but the “move” function
Model checking shows no “move” will work
Saved a week of useless search for “move”!

70 Slide 70 of 47 The dream
Model checking white board conversations.
– This is where the real design happens.
– This is where the problems are encountered.
This requires an abstract, expressive language
TLA+ is a good language
TLC is too slow.
What can we take from TLA+ …
… and still get reasonable performance?

71 Slide 71 of 47 SAT-based BMC
Rajeev Joshi, John Matthews, Mark Tuttle
TLA+ is the right language, TLC is too slow
Why not use TLA+?
– Thought typed language would help (TLA+ untyped)
– Wanted to be able to change language (TLA+ hard to parse)
– In hindsight may not have been necessary

72 Slide 72 of 47 SAT-based Model Checking
Rajeev Joshi, John Matthews, Mark Tuttle
[Figure: pipeline]
Protocol and property → Model checker → Boolean formula: x and (y or z)
Boolean formula → SAT checker → Satisfying assignment: x = true, y = true, z = false
Satisfying assignment → Counterexample trace: S0 S1 S2 S3 … S (property violated!)
[Figure annotation: Only nontrivial step]
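The pipeline on slide 72 can be imitated at toy scale. The sketch below is not the MLA tool: it unrolls a hypothetical 2-bit counter for k steps, treats the bits of every state in the trace as Boolean variables, and uses exhaustive enumeration in place of a real SAT checker. The counter, its property, and all function names are assumptions made for illustration; only the formula x and (y or z) and its satisfying assignment come from the slide.

```python
from itertools import product


def satisfying_assignments(formula, names):
    """Brute-force stand-in for a SAT checker: try every assignment."""
    for values in product([False, True], repeat=len(names)):
        assignment = dict(zip(names, values))
        if formula(assignment):
            yield assignment


# The example formula from the slide: x and (y or z).
slide_solutions = list(
    satisfying_assignments(lambda a: a["x"] and (a["y"] or a["z"]), ["x", "y", "z"])
)


# Hypothetical system: a 2-bit counter that starts at 0 and counts modulo 4.
def value(state):
    return 2 * state[0] + state[1]


def init(state):
    return state == (0, 0)


def trans(state, next_state):
    return value(next_state) == (value(state) + 1) % 4


def prop(state):
    return value(state) != 3          # the safety property to check


def bounded_model_check(k):
    """Search for a trace of k transitions from an initial state that violates prop."""
    for bits in product((0, 1), repeat=2 * (k + 1)):
        trace = [tuple(bits[2 * i:2 * i + 2]) for i in range(k + 1)]
        is_path = init(trace[0]) and all(trans(trace[i], trace[i + 1]) for i in range(k))
        if is_path and any(not prop(s) for s in trace):
            return trace              # satisfying assignment read back as a trace
    return None


no_violation = bounded_model_check(2)
counterexample = bounded_model_check(3)
```

With bound 2 no violation exists, while bound 3 yields the trace 0, 1, 2, 3, illustrating how a satisfying assignment is read back as a counterexample. A real SAT-based checker would hand the unrolled formula to a SAT solver instead of enumerating assignments.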

73 Slide 73 of 47 MLA Language
Types: booleans, integer ranges, records, enums, bounded sequences, finite sets, recursive functions
Operators: arithmetic, logical, relational
Value constructors: lambdas, etc.
Expressions: let, case, if, quantification
MLA compiler:
– 16,000 lines of ML (wc)
– Function translation only source of trouble

74 Slide 74 of 47 Itanium program verification
Given
– Assembly language program (compiler output)
– Safety property (an invariant)
Is safety violated by an execution of the program allowed by the Itanium memory model?
Thinking of synchronization code (mutex)
Examples were Dekker and Bakery algs

75 Slide 75 of 47 Conclusion
Formal methods are up to industrial problems.
Formal methods have incremental payoffs:
– Specification documents design.
– Model checking finds quick design errors.
– Proof writing finds deeper design errors.
Any partial proof is still a rigorous analysis.

77 Slide 77 of 47 EV7 cache coherence
Specification is 1800 lines
Specification accepted by TLC w/o modification
Largest instance feasible to check with TLC:
– 1 cache line, 2 data values, 3 processors
– 12 million reachable states (w/ symmetry reductions)

78 Slide 78 of 47 EV8 cache coherence
EV8 designers planned to use a TLA+ spec as the official spec (plus an English explanation)
But EV8 was cancelled, designers went to Intel
Rumor: Intel has an official spec in TLA+

