Verification at HP Labs. Mark Tuttle (with the help of many friends at HP Labs).



Slide 2 of 47 Overview of verification work
Cache coherence protocols
–Alpha EV6, EV7, EV8 protocols
–Itanium
Bus protocols:
–PCI-X, Infiniband (FIO/NGIO/SIO)
Database systems
Distributed algorithms
A SAT-based bounded model checker
–Applications to Itanium software

Slide 3 of 47 Most of this work uses TLA+
Lamport’s specification language based on set theory, first-order logic, temporal logic
Hierarchical style improves readability, rigor
–specifications: [before/after example not preserved in the transcript]
–proofs: [before/after example not preserved in the transcript; the surviving steps read "CASE 2. CASE 3. QED"]
Most find reading easy, writing not too hard

Slide 4 of 47 Wildfire: EV6 cache coherence
Kourosh Gharachorloo, Madhu Sharma, Simon Steely, Steve Van Doren
32 processor server
[Diagram labels: quads joined by a global switch; within a quad: P1, P2, P3, P4, local switch, mem, DIR, TTT, DTAG, global port, arbiter]

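Slide 5 of 47 Directory-based cache coherence
To get x, go to x’s directory to see who owns x.
[Diagram labels: processors P1, P2, P3, P4; memory holding x with value 5; directory entry for x with fields owner and copies]

Slide 6 of 47 Get read-only copy
[Diagram labels: P, Q; messages Rd(x), Fwd(x), Fill(x,5); directory entry for x: owner=Q, copies=Q, then copies=P,Q]

Slide 7 of 47 Get writable copy
[Diagram labels: P, Q, R, S; messages RdEx(x), FwdRdEx(x), FillEx(x,5), Inval(x); directory entry for x: owner=Q, copies=Q,R,S, then owner=P, copies=P]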
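The two request types on these slides can be illustrated with a small sketch. It is not the Wildfire protocol: the `DirEntry` class, the function names, and the tuple layout of the messages are assumptions made for illustration, and only the message names (Fwd, FwdRdEx, Inval) and the owner/copies bookkeeping come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class DirEntry:
    """Hypothetical directory entry: who owns the line and who holds copies."""
    owner: str
    copies: set = field(default_factory=set)


def read(directory, addr, requester):
    """Rd(x): forward the request to the owner and record the new copy holder."""
    entry = directory[addr]
    # The owner is expected to answer the requester with a Fill.
    messages = [("Fwd", addr, entry.owner, requester)]
    entry.copies.add(requester)
    return messages


def read_exclusive(directory, addr, requester):
    """RdEx(x): forward to the owner, invalidate other copies, move ownership."""
    entry = directory[addr]
    messages = [("FwdRdEx", addr, entry.owner, requester)]
    for holder in sorted(entry.copies - {requester, entry.owner}):
        messages.append(("Inval", addr, holder))
    entry.owner = requester
    entry.copies = {requester}
    return messages


# Slide 6: Q owns x, P asks for a read-only copy.
directory = {"x": DirEntry(owner="Q", copies={"Q"})}
read_msgs = read(directory, "x", "P")
copies_after_read = set(directory["x"].copies)
```

Note that the directory updates its entry as soon as it forwards the request, before any Fill arrives, which is one way a directory can run ahead of the processors.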

Slide 8 of 47 A complicated protocol Directory can be many steps ahead of processors R1 R2R3 Dir RdEx(x) FwdRdEx(x) RdEx(x) FwdRdEx(x) Data

Slide 9 of 47 A complicated protocol
Generates data and commit events independently
Memory barriers impose instruction order
–Maintain count of outstanding off-chip requests
–Pass memory barrier only when count is 0
[Diagram labels: read A, MB, read B, inval(x), inval(y), data(A); and again read A, MB, read B, inval(x), inval(y), commit(A), data(A)]
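The memory-barrier rule stated on this slide can be sketched as a counter. The class and method names are hypothetical. That a commit event is what retires an outstanding request is an assumption: the slide says only that a count of outstanding off-chip requests is kept and that the barrier is passed when it reaches 0.

```python
class BarrierCounter:
    """Hypothetical per-processor count of outstanding off-chip requests."""

    def __init__(self):
        self.outstanding = 0

    def issue_off_chip(self):
        # A request leaves the chip: one more thing the barrier must wait for.
        self.outstanding += 1

    def commit(self):
        # Assumption: a commit event retires one outstanding request.
        if self.outstanding == 0:
            raise RuntimeError("commit without an outstanding request")
        self.outstanding -= 1

    def can_pass_memory_barrier(self):
        # The slide's rule: pass the MB only when the count is 0.
        return self.outstanding == 0
```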

Slide 10 of 47 Dramatic speedups possible
Reads are fast
MBs are fast
“Intuitively surprising this actually works!”
[Diagram labels: read A, … work..., MB, read B, inval(x), inval(y), commit(A), data(A); and read A, MB, read B, commit(A), data(A), owner, fwd(A)]

Slide 11 of 47 Wildfire verification
Paul Harter, Leslie Lamport, Mark Tuttle, Yuan Yu
We are asked to look at the protocol
We arrive very late (almost tape-out)
No time for complete proof
But enough time for a rigorous analysis

Slide 12 of 47 Wildfire cache coherence in “three easy steps” + “two man-years”
Step 1: Model Alpha memory model. (200 lines)
Step 2: Model abstract protocol. (500 lines)
–Prove implementation (550 lines, 2 months, informal)
Step 3: Model complete protocol. (2000 lines, 3 months)
–Prove implementation (5500 lines, 4+ months, incomplete)

Slide 13 of 47 Step 1: Alpha memory model
Official specification is
–Informal: an English document
–Behavioral: defines acceptable sequences of memory operations
Our specification is
–Precise: a single logical formula
–State-based: required for invariance-style proofs
We did simplify the model slightly:
–Operations read and write entire cache lines
–Some “impossible” implementations ruled out
Compare the specifications: 12 pages vs 200 lines

Slide 14 of 47 The heart of the model
A Before order
–Orders reads and writes in an execution
–Determines return values for the reads
A GoodExecutionOrder predicate
–Defines the Before orders allowed by the model

Slide 15 of 47 State machine actions
ReceiveRequest(proc, req): Receive a request
ChooseNewData(proc, idx): Choose the return value for a request
Respond(proc, idx): Return the value to a request
ExtendBefore: Expand the Before relation
Actions must preserve GoodExecutionOrder.

Slide 16 of 47 GoodExecutionOrder
GoodExecutionOrder ==
  LET [some definitions deleted]
  IN /\ (* Before is a partial order. *)
        /\ Before \subseteq ReqId \X ReqId
        /\ \A r1, r2 \in ReqId : IsBefore(r1, r2) => ~IsBefore(r2, r1)
        /\ \A r1, r2, r3 \in ReqId :
              IsBefore(r1, r2) /\ IsBefore(r2, r3) => IsBefore(r1, r3)
     /\ (* SourceOrder implies the Before order. *)
        \A r1, r2 \in ReqId : SourceOrder(r1, r2) => IsBefore(r1, r2)
     /\ (* RequestOrder implies the Before order. *)
        \A r1, r2 \in ReqId : RequestOrder(r1, r2) => IsBefore(r1, r2)
This is the hard part --- look how short it is!

Slide 17 of 47
     /\ (* Writes and successful SCs to the same location that *)
        (* have issued a response are totally ordered.         *)
        \A r1, r2 \in ReqId :
           /\ ReqIdQ[r1].req.type \in {"Wr", "SC"}
           /\ ReqIdQ[r1].req.newData # "Failed"
           /\ ReqIdQ[r1].req.responded
           /\ ReqIdQ[r2].req.type \in {"Wr", "SC"}
           /\ ReqIdQ[r2].req.newData # "Failed"
           /\ ReqIdQ[r2].req.responded
           /\ ReqIdQ[r1].req.adr = ReqIdQ[r2].req.adr
           => IsBefore(r1, r2) \/ IsBefore(r2, r1)

Slide 18 of 47
     /\ (* LL/SC Axiom: For each successful SC, there is a matching LL and *)
        (* there is no write to the same address from a different          *)
        (* processor between the LL and SC in the Before order.            *)
        \A r2 \in ReqId :
           /\ ReqIdQ[r2].req.type = "SC"
           /\ ReqIdQ[r2].newData \notin {Failed, NotChosen}
           => \E r1 \in ReqId :
                 /\ LLSCPair(r1, r2)
                 /\ \A r \in ReqId :
                       /\ \/ ReqIdQ[r].req.type = "Wr"
                          \/ /\ ReqIdQ[r].req.type = "SC"
                             /\ ReqIdQ[r].newData \notin {NotChosen, Failed}
                       /\ r[1] # r2[1]
                       /\ ReqIdQ[r2].req.adr = ReqIdQ[r].req.adr
                       => ~IsBefore(r1, r) \/ ~IsBefore(r, r2)

Slide 19 of 47
     /\ (* Value Axiom: A read reads from the preceding write in the *)
        (* Before order.                                              *)
        \A r1, r2 \in ReqId :
           /\ ReqIdQ[r2].source # NoSource
           /\ ReqIdQ[r1].req.type = "Wr"
           /\ ReqIdQ[r1].req.adr = ReqIdQ[r2].req.adr
           => IF ReqIdQ[r2].source = FromInitMem
                THEN ~IsBefore(r1, r2)
                ELSE \/ ~IsBefore(ReqIdQ[r2].source, r1)
                     \/ ~IsBefore(r1, r2)
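The first three conjuncts of GoodExecutionOrder say that Before is a relation on request ids that is asymmetric and transitive, and the next two say that it contains SourceOrder and RequestOrder. A minimal executable reading of those conjuncts is sketched below. Representing the relations as Python sets of pairs is an assumption made for illustration; the remaining axioms (write ordering, LL/SC, Value) are not modeled.

```python
from itertools import product


def is_strict_partial_order(before, req_ids):
    """Check the 'Before is a partial order' conjuncts on a finite relation."""
    def is_before(a, b):
        return (a, b) in before

    # Before \subseteq ReqId \X ReqId
    well_typed = all(a in req_ids and b in req_ids for a, b in before)
    # IsBefore(r1, r2) => ~IsBefore(r2, r1); with r1 = r2 this forbids self-loops
    asymmetric = all(
        not (is_before(a, b) and is_before(b, a))
        for a, b in product(req_ids, repeat=2)
    )
    # IsBefore(r1, r2) /\ IsBefore(r2, r3) => IsBefore(r1, r3)
    transitive = all(
        is_before(a, c)
        for a, b, c in product(req_ids, repeat=3)
        if is_before(a, b) and is_before(b, c)
    )
    return well_typed and asymmetric and transitive


def implies_before(order, before):
    """SourceOrder (or RequestOrder) implies Before: every pair is included."""
    return order <= before
```

For example, `{(1, 2), (2, 3)}` fails the check because the pair `(1, 3)` required by transitivity is missing.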

Slide 20 of 47 Step 2: Model abstract protocol
protocol = abstract protocol + implementation junk
Surprisingly,
–abstract protocol’s correctness was far from obvious
–we discovered a bug… in the memory model
Proved hardest part of correctness:
–Proved the Before order is acyclic
–35-line invariant based on 300 lines of definitions
–550-line proof, cases nested 10 levels deep

Slide 21 of 47 Found: Alpha memory model bug
Original Alpha memory model allowed:
x=0, y=0
P: if x=1 then y:=2
Q: if y=2 then x:=1
x=1, y=2
This behavior breaks the critical section implementation recommended in the SRM. (Jim Saxe)
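To see why the outcome x=1, y=2 is surprising, it helps to enumerate every interleaving of the two programs under a simple sequentially consistent memory. That memory is an assumption made for illustration and is stronger than the Alpha memory model. The sketch splits each program into a read step and a conditional write step; the function names are hypothetical.

```python
from itertools import permutations

# Each program is a read (step 0) followed by a conditional write (step 1).
OPS = [("P", 0), ("P", 1), ("Q", 0), ("Q", 1)]


def interleavings():
    """All orderings of the four steps that respect each program's own order."""
    for perm in permutations(OPS):
        if perm.index(("P", 0)) < perm.index(("P", 1)) and \
           perm.index(("Q", 0)) < perm.index(("Q", 1)):
            yield perm


def run(schedule):
    """Execute one interleaving against a single shared memory."""
    mem = {"x": 0, "y": 0}
    seen = {}
    for thread, step in schedule:
        if thread == "P":
            if step == 0:
                seen["P"] = mem["x"]      # P reads x
            elif seen["P"] == 1:
                mem["y"] = 2              # P: if x=1 then y:=2
        else:
            if step == 0:
                seen["Q"] = mem["y"]      # Q reads y
            elif seen["Q"] == 2:
                mem["x"] = 1              # Q: if y=2 then x:=1
    return mem["x"], mem["y"]


outcomes = {run(s) for s in interleavings()}
```

Under this assumption `outcomes` contains only `(0, 0)`: neither write can happen unless the other has already happened, so x=1, y=2 requires each write to justify the other.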

Slide 22 of 47 Revised Alpha memory model
causal cycle
P: if x=1 then y:=2
Q: if y=2 then x:=1
break the cycle
P: if x=1 then y:=2
Q: if y=2 then x:=1

Slide 23 of 47 Wildfire counterexample The Alpha memory model says x=3, but in Wildfire it could be x=1… Q: if x=1 then y:=2 R: if y=2 then x:=3 P: x:=1

Slide 24 of 47 Q directory Inval(x) P: x:=1 R: if y=2 then x:=3Q: if x=1 then y:=2 Rx=0Px=1 ITD(x)ok

Slide 25 of 47 Q directory Inval(x) P: x:=1 R: if y=2 then x:=3Q: if x=1 then y:=2 Rx=0Px=1 Rd(x) Fwd(x) x=1

Slide 26 of 47 Q directory Inval(x) P: x:=1 R: if y=2 then x:=3Q: if x=1 then y:=2 Rx=0Px=1 ITD(y) x=1 ok y=2

Slide 27 of 47 Q directory P: x:=1 R: if y=2 then x:=3Q: if x=1 then y:=2 Rx=0,3Px=1 y=2 Rd(y) Fwd(y) y=2 Inval(x)

Slide 28 of 47 Q P: x:=1 R: if y=2 then x:=3Q: if x=1 then y:=2 Rx=3Px=1 y=2 Inval(x) The result must be x=3, but the result is x=1. The same thing was possible in other machines. (Kourosh Gharachorloo)
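The claim "the result must be x=3" can be checked by brute force for a simple sequentially consistent memory. That memory is an assumption made for illustration, stronger than the Alpha memory model, and the names below are hypothetical. The sketch enumerates every interleaving of P, Q and R and looks for a run in which R saw y=2 but the final value of x is not 3.

```python
from itertools import permutations

# P is a single write; Q and R are each a read (step 0) then a conditional write (step 1).
OPS = [("P", 0), ("Q", 0), ("Q", 1), ("R", 0), ("R", 1)]


def schedules():
    """All orderings of the steps that respect program order within Q and R."""
    for perm in permutations(OPS):
        if perm.index(("Q", 0)) < perm.index(("Q", 1)) and \
           perm.index(("R", 0)) < perm.index(("R", 1)):
            yield perm


def run(schedule):
    """Return (final x, final y, whether R observed y=2)."""
    mem = {"x": 0, "y": 0}
    seen = {}
    for thread, step in schedule:
        if thread == "P":
            mem["x"] = 1                  # P: x:=1
        elif thread == "Q":
            if step == 0:
                seen["Q"] = mem["x"]
            elif seen["Q"] == 1:
                mem["y"] = 2              # Q: if x=1 then y:=2
        else:
            if step == 0:
                seen["R"] = mem["y"]
            elif seen["R"] == 2:
                mem["x"] = 3              # R: if y=2 then x:=3
    return mem["x"], mem["y"], seen["R"] == 2


results = [run(s) for s in schedules()]
violations = [r for r in results if r[2] and r[0] != 3]
final_states = {(x, y) for x, y, _ in results}
```

With a single shared memory, `violations` is empty: if R saw y=2 then Q saw x=1, so P's write precedes R's write. The Wildfire scenario on these slides ends with x=1 even though R wrote x:=3, which this simple memory never produces.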

Slide 29 of 47 What went wrong?
An ordering internal to P …
P: if x=1 then y:=2
Q: if y=2 then x:=1
… forced an ordering for Q:
P: if x=1 then y:=2
Q: if y=2 then x:=1
The fix: use internal orderings to forbid orderings, but not to force orderings.

Slide 30 of 47 New Alpha memory model Q: if x=1 then y:=2 R: if y=2 then x:= 3 P: x:=1 There is no dependency/source cycle: R1R2W1WnW2 …

Slide 31 of 47 Step 3: Model complete protocol
Obstacle: no single, complete description
English documents: 12 documents, 4-inch stack
Lisp simulator: crucial to understanding some details
None compact, none mathematically tractable
Different levels of abstraction, some inconsistency
We had to write our own description

Slide 32 of 47 Obstacle: algorithm complexity
ChangeToDirty DummyRdVic FailedChangeToDirty Fetch InvalToDirty InvalToDirtyVic Rd RdMod RdVic RdVicMod QV_Fetch QV_Rd QV_RdMod WrVic ChangeToDirtyFailure ChangeToDirtySuccess FetchFillMarker FillMarker FillMarkerMod ForwardFetch ForwardFetchWithFetchFillMarker ForwardRd ForwardRdMod ForwardRdWithFillMarker ForwardRdModWithFillMarkerMod InvalAck InvalToDirtySuccess Invalidate LoopComsig LoopComsigWithInvalAck LoopComsigWithShadowClear LoopComsigWithShadowInvalAndShadowClear ShadowChangeToDirtySuccess ShadowForwardFetch ShadowForwardRd ShadowForwardRdMod ShadowInvalToDirtySuccess ShadowInvalidate ShadowShortFillMod ShadowSnap ShortFetchFill ShortFill ShortFillMod VictimAck FetchFill Fill FillMod VCFetchFill VCFill VCFillMod

Slide 33 of 47 Solution: Quarks Ack ChangeToDirty Clear Comsig Fill ForwardedGet GetValue InvalidToDirty QuadInvalidate ReleaseMAF ReleaseVDB SetCacheLineState Victimize Write Quarks combine to form messages.

Slide 34 of 47 Quarks form messages, then split up GetValueQuadInval ForwardedGetForwardedGet, QuadInval, Comsig home quad global switch copy holders owner reader

Slide 35 of 47 Quarks resolve message overloading
“ChangeToDirtySuccess” could mean
–{AckChangeToDirty, Comsig, QuadInvalidate^*, ClearOutstandingInval}
–{AckChangeToDirty, Comsig, QuadInvalidate^*}
–{Comsig, ReleaseMAF, SetCacheLineState}
Quarks simplify algorithm description
Each quark processed separately, independently
Each data structure changed by a single quark

Slide 36 of 47 Quark handling
If a processor receives a Fill quark carrying cacheable data, then how is the cache updated?
ProcFieldsMessage(proc, msg) ==
  /\ ...
  /\ Cache' = CASE ...
                [] ("Fill" \in msg) /\ (subtype("Fill") # "Fetch") ->
                     [Cache EXCEPT
                        ![proc, cacheIndex].state =
                           IF subtype("Fill") = "Mod" THEN "ExclusiveDirty" ELSE "Clean",
                        ![proc, cacheIndex].tag = AddressToTag(msg.adr),
                        ![proc, cacheIndex].data = msg.data ]
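For readers less used to TLA+ EXCEPT expressions, the Fill case can be paraphrased as a pure function. This is only a sketch: the dictionary layout of `msg` and `cache`, the name `apply_fill`, and passing `address_to_tag` as a parameter are assumptions made for illustration. The state names and the Fetch/Mod distinction follow the slide.

```python
def apply_fill(cache, proc, cache_index, msg, address_to_tag):
    """Return the cache after one message's Fill quark (if any) is processed.

    msg["quarks"] is a hypothetical mapping from quark name to subtype,
    standing in for the slide's `"Fill" \\in msg` and `subtype("Fill")`.
    """
    quarks = msg["quarks"]
    # A Fetch fill carries uncached data, so the cache is left unchanged.
    if "Fill" not in quarks or quarks["Fill"] == "Fetch":
        return cache
    line = {
        "state": "ExclusiveDirty" if quarks["Fill"] == "Mod" else "Clean",
        "tag": address_to_tag(msg["adr"]),
        "data": msg["data"],
    }
    # Like EXCEPT, build a new mapping that differs in exactly one entry.
    updated = dict(cache)
    updated[(proc, cache_index)] = line
    return updated
```

Returning a new dictionary rather than mutating `cache` mirrors the primed variable `Cache'` in the specification: the old state stays available for comparison.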

Slide 37 of 47 Wildfire invariant
Define an invariant describing all reachable states [line count not preserved in the transcript]
Prove invariance.
We focused on the most difficult, error-prone parts: cache, dtag, directory, messages
–on quad (150 lines)
–off quad (150 lines)

Slide 38 of 47 2./\ a.\/ (* proc is the owner of adr *) 1./\ Dir[adr].owner = proc b.\/ (* proc is not the owner of adr *)... 2./\ a.\/ (* dtag is dirty *) 1./\ DTagState(adr, proc) = Dirty... b.\/ (* dtag is invalid *)... c.\/ (* dtag is clean *)... Dir - Dtag Invariant DTagCacheInvariant ==... Mother == DirDTagInvariant /\ DTagCacheInvariant /\... DirDTagInvariant == \A adr \in MemBlockAddress, proc \in Processor : a.\/ (* local address *)... b.\/ (* nonlocal address *) 1./\ ProcToQuad(proc) # AddressToQuad(adr) 2./\ Proj(HomeToArbQ) = [ [FG* [QFI] QI* AckWrite] QI* AGV(mod,1) | FG* AckCTD(Success)] FG*

Slide 39 of 47 DTag-Cache Invariance
ASSUME: /\ Mother
        /\ Wildfire
        /\ DTagCacheInvariant(proc,adr)
PROVE: DTagCacheInvariant(proc,adr)'
1. CASE a (* DTagState(proc, adr) = "Invalid" *)
2. CASE b (* DTagState(proc, adr) # "Invalid" *)
3. QED

Slide 40 of 47 DTag-Cache Invariance
ASSUME: /\ Mother
        /\ Wildfire
        /\ DTagCacheInvariant(proc,adr)
PROVE: DTagCacheInvariant(proc,adr)'
1. CASE a (* DTagState(proc, adr) = "Invalid" *)
   1. CASE a2a (* AddressCache(proc, adr).state' = "Invalid" *)
   2. CASE a2b (* AddressCache(proc, adr).state' # "Invalid" *)
   3. QED
2. CASE b (* DTagState(proc, adr) # "Invalid" *)
3. QED

Slide 41 of 47 DTag-Cache Invariance
ASSUME: /\ Mother
        /\ Wildfire
        /\ DTagCacheInvariant(proc,adr)
PROVE: DTagCacheInvariant(proc,adr)'
1. CASE a (* 1./\ DTagState(proc, adr) = "Invalid" *)
   1. CASE a2a (* 1. AddressCache(proc, adr).state' = "Invalid" *)
      CASE doing something at the proc
      Pf: CASE doing something at the arb
      3. QED
   CASE a2b (* 1. AddressCache(proc, adr).state' # "Invalid" *)
   3. QED
2. CASE b (* 1./\ DTagState(proc, adr) # "Invalid" *)
3. QED

Slide 42 of 47 The implementation proof
In Step 2, we defined an abstract model of the Wildfire algorithm
In Step 3, we defined a complete model of the Wildfire algorithm
Now use the invariant to prove that the complete model implements the abstract model.
This proof has not been completed.

Slide 43 of 47 Results: one bug A fetch is an uncached read. Victimization removes data from the cache. The bug allows a fetch to interfere with victimization. To demonstrate the bug, we need to describe more of the hardware…

Slide 44 of 47 The quad architecture quad proc cache proc P ArbGP dtagdirectorymemoryttt switch to other quads

Slide 45 of 47 Dtag: a duplicate copy of cache state One use: invalidate all copies on a quad. cache P Arb dtag y r/w y P r/w inval(y)

Slide 46 of 47 TTT: tells state of off-quad requests GP ttt cache P y write(y) ackwrite(y)

Slide 47 of 47 The Bug By causing a fetch to interfere with a victimize, generate an Inval(y) to a cache without a copy of y. cache P Arb dtag y r/w inval(y)

Slide 48 of 47 Initial state: P owns y dirmem y: Py dtag y: P ttt gp arb P y QRS

Slide 49 of 47 Now P victimizes y to read x into same cache line dirmem y: Py dtag y: P gp arb P y QRS ttt write(y)ackwrite(y) write(y) ackwrite(y) get(x)

Slide 50 of 47 So P is waiting for x dirmem y:y dtag y: P gp arb PQRS ttt write(y)ackwrite(y) get(x)

Slide 51 of 47 Now R becomes owner of y dirmem y: Ry dtag y: P gp arb PQR owns y S ttt write(y)ackwrite(y) get(x)

Slide 52 of 47 Now P fetches y while waiting for x dirmem y: Ry dtag y: P gp arb PQR owns y S ttt write(y)ackwrite(y) get(x) fetch(y) ackfetch(y) fill(y)

Slide 53 of 47 Now P gets its copy of y dirmem y: Ry dtag y: P gp arb PQR owns y S ttt write(y)ackwrite(y) get(x) ackfetch(y) fetch(y)ackfetch(y) fill(y) fwd(y) fill(y)

Slide 54 of 47 Now ackwrite arrives: the bug
ackwrites normally invalidate dtag entries
half-completed reads normally inhibit this: the new data has reached the cache, and the dtag entry is for the new data
but fetches are not cached, and should not be treated like cached reads
[Diagram labels: dir y: R; mem y; dtag y: P; ttt: write(y), ackwrite(y), fetch(y), ackfetch(y)]

Slide 55 of 47 So P is still waiting for x dtag y: P gp arb PQ ttt get(x) R owns y S dirmem y: Ry

Slide 56 of 47 Now Q reads y dtag y: P,Q gp arb PQ gets y ttt get(x) R owns y S dirmem y: R,Qy

Slide 57 of 47 Now S becomes owner of y dtag y: P,Q gp arb PQ gets y ttt get(x) S owns y R dirmem y: R,Qy inval(y)

Slide 58 of 47 Now we are in trouble…
The inval is forwarded to both P and Q but P doesn’t have a copy to invalidate!
[Diagram labels: gp, arb, P, Q gets y, ttt: get(x), inval(y), dtag y: P,Q]
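The core of the scenario can be compressed into a few lines. This sketch is a loose paraphrase, not the Wildfire hardware: the function names, the `fetch_inhibits` flag, and the reduction of the dtag to a set of processor names are all assumptions made for illustration. It models only the rule that an ackwrite invalidates the dtag entry unless a half-completed read inhibits it, and the bug in which a fetch is treated like a cached read.

```python
def dtag_after_ackwrite(dtag_holders, proc, pending, fetch_inhibits):
    """Return the dtag holder set for one address after proc's ackwrite arrives.

    pending is "read", "fetch", or None: the kind of half-completed request
    proc has for the same address. fetch_inhibits=True models the bug.
    """
    inhibits = pending == "read" or (pending == "fetch" and fetch_inhibits)
    if inhibits:
        return set(dtag_holders)        # entry kept: assumed to describe new data
    return set(dtag_holders) - {proc}   # normal case: entry invalidated


def inval_targets(dtag_holders):
    """An Inval is forwarded to every processor the dtag lists."""
    return sorted(dtag_holders)


# P victimized y and later only fetched it (uncached); then Q read y.
cache_holders = {"Q"}
buggy = dtag_after_ackwrite({"P"}, "P", "fetch", fetch_inhibits=True) | {"Q"}
fixed = dtag_after_ackwrite({"P"}, "P", "fetch", fetch_inhibits=False) | {"Q"}
spurious_buggy = [p for p in inval_targets(buggy) if p not in cache_holders]
spurious_fixed = [p for p in inval_targets(fixed) if p not in cache_holders]
```

In the buggy variant the stale dtag entry makes P a target of the Inval although P holds no copy; in the other variant only Q is targeted.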

Slide 59 of 47 The bug is obvious in hindsight
But our scenario exhibiting the bug
–is very long,
–uses 4 processors,
–uses 2 locations, and
–uses 15 messages.
Finding this scenario seemed beyond the power of automated tools like model checkers.

Slide 60 of 47 Wildfire conclusion
We performed a rigorous analysis.
We studied the hardest parts of the algorithm.
Designers said their confidence in the algorithm was much improved.
We expected to find more errors
–Designers knew what they were doing
–We joined late in the design cycle:
  We had been asked to study the protocol
  Bugs at protocol level had already been found
  All remaining bugs were at the implementation level

Slide 61 of 47 EV7 cache coherence
Joshua Scheid, Homayoon Akhiani, Jonathan Nall, Damien Doligez, Scott Kreider, Scott Taylor, Brannon Batson
Much simpler protocol, proof actually completed
First TLA+ specification written by engineers
First intense application of TLC model checker
New, interesting uses of spec in simulation

Slide 62 of 47 Results
73 bugs found
–Most bugs were ambiguity in design documents
  37 minor: typos, type errors, etc.
  11 bugs: wrong message sent/wrong state set
  14 missing cases
  7 spurious cases (dead code)
–5 bugs were actual implementation bugs
  1 found by TLC
  4 found by using TLC error traces for RTL simulation…

Slide 63 of 47 Interesting spec applications
Translate TLC error traces into RTL stimulus
–Force RTL simulator into interesting corner cases
Translate random TLC traces:
–Better than random stimulus: satisfies TLA+ spec!
Translate RTL simulator output to TLA+
–TLC can check that RTL satisfies TLA+ spec
–TLC can trace visited states, improve coverage
TLA+ specs yield good RTL assertions to check
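The idea of checking simulator output against a specification can be illustrated without TLC. The sketch below is not how TLC works internally: it assumes the simulator output has already been translated into a list of states, and that the specification is given as two Python predicates (hypothetical names `init` and `next_step`), which here describe a toy counter rather than any cache protocol.

```python
def first_violation(trace, init, next_step):
    """Return the index of the first state that breaks the spec, or None.

    Index 0 means the initial-state predicate failed; index i > 0 means the
    step from state i-1 to state i is not allowed by next_step.
    """
    if not trace:
        return None
    if not init(trace[0]):
        return 0
    for i in range(1, len(trace)):
        if not next_step(trace[i - 1], trace[i]):
            return i
    return None


# Toy spec: a counter that starts at 0 and either stutters or increments modulo 4.
def init(state):
    return state == 0


def next_step(state, nxt):
    return nxt == state or nxt == (state + 1) % 4


good_trace = [0, 1, 1, 2, 3, 0]
bad_trace = [0, 1, 3]
```

The index returned for `bad_trace` points at the state reached by the illegal step, which is the place to start looking in the simulator log.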

Slide 64 of 47 Itanium cache coherence
Mark Tuttle, Jae Yang
Used Intel chips, modeled their external behavior
Simply writing spec yielded two design changes
Too big for TLC:
–Intel chip models allowed too many behaviors
–Interesting scenarios required large configurations
–Used TLC for simulation, not model checking
Most interesting: TLA+ Itanium memory model
–With Gil Neiger, Leslie Lamport, Yuan Yu

Slide 65 of 47 Interconnect verification
PCI-X: a high-speed extension to PCI bus
–Tom Rodeheffer, Mark Tuttle
–Found fatal flaws in submissions to standards group
Infiniband (FIO/NGIO/SIO)
–Mark Tuttle, Jae Yang, George Zhang, …

Slide 66 of 47 Database recovery Dave Lomet, Mark Tuttle data log cache cache managerlog manager disk memory O: x := x+1 xO

Slide 67 of 47 After a crash, only the disk remains
Recovery manager must reconstruct the database
Recovery manager has only the bits left on disk
Our theory explains how bits must be managed
[Diagram labels: data, log]

Slide 68 of 47 Recovery theory
Define an ordering on the database operations
Theorem: If cache manager follows order, then state remains recoverable.
Theorem: If recovery manager follows order from a recoverable state, then recovery succeeds.
The proofs were done and perfect …
… but model checking found 3 subtle mistakes!

Slide 69 of 47 Robot rendezvous
Maurice Herlihy, Mark Tuttle
Robots parachute onto a graph, move around the graph, and rendezvous on a single node.
Protocol is complete, all but the “move” function
Model checking shows no “move” will work
Saved a week of useless search for “move”!

Slide 70 of 47 The dream
Model checking white board conversations.
–This is where the real design happens.
–This is where the problems are encountered.
This requires an abstract, expressive language
TLA+ is a good language
TLC is too slow.
What can we take from TLA+ …
… and still get reasonable performance?

Slide 71 of 47 SAT-based BMC
Rajeev Joshi, John Matthews, Mark Tuttle
TLA+ is the right language, TLC is too slow
Why not use TLA+?
–Thought typed language would help (TLA+ untyped)
–Wanted to be able to change language (TLA+ hard to parse)
–In hindsight may not have been necessary

Slide 72 of 47 SAT-based Model Checking
Rajeev Joshi, John Matthews, Mark Tuttle
[Diagram, in order:]
Protocol and property
Model checker
Boolean formula: x and (y or z)
SAT checker
Satisfying assignment: x = true, y = true, z = false
Counterexample trace: S0 S1 S2 S3 … S, property violated!
[One step in the diagram is labeled "Only nontrivial step"]
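The pipeline on this slide can be imitated at toy scale. The sketch below is not the MLA tool chain: it replaces the SAT checker with exhaustive enumeration, and the system being checked is a made-up 2-bit counter with the property "the counter never reaches 3". All names are hypothetical. It does show the shape of bounded model checking: unroll the transition relation for a fixed number of steps into one Boolean formula, find a satisfying assignment, and read the counterexample trace back out of it.

```python
from itertools import product


def solve(variables, formula):
    """Brute-force stand-in for a SAT checker: first satisfying assignment or None."""
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if formula(assignment):
            return assignment
    return None


def slide_formula(a):
    """The example formula from the slide: x and (y or z)."""
    return a["x"] and (a["y"] or a["z"])


def bmc_counter(bound):
    """Search for a trace of `bound` steps in which a 2-bit counter reaches 3."""
    variables = [f"{bit}{i}" for i in range(bound + 1) for bit in ("hi", "lo")]

    def formula(a):
        init = not a["hi0"] and not a["lo0"]                 # start at 00
        steps = all(
            a[f"lo{i + 1}"] == (not a[f"lo{i}"])             # low bit flips
            and a[f"hi{i + 1}"] == (a[f"hi{i}"] != a[f"lo{i}"])  # high bit takes the carry
            for i in range(bound)
        )
        bad = any(a[f"hi{i}"] and a[f"lo{i}"] for i in range(bound + 1))
        return init and steps and bad

    model = solve(variables, formula)
    if model is None:
        return None
    # Decode the satisfying assignment back into a counterexample trace.
    return [(int(model[f"hi{i}"]), int(model[f"lo{i}"])) for i in range(bound + 1)]
```

With a bound of 2 the formula is unsatisfiable and no trace is returned; with a bound of 3 the decoded trace counts from 00 to 11. A real tool would hand the formula to a SAT solver instead of enumerating assignments.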

Slide 73 of 47 MLA Language
Types: booleans, integer ranges, records, enums, bounded sequences, finite sets, recursive functions
Operators: arithmetic, logical, relational
Value constructors: lambdas, etc.
Expressions: let, case, if, quantification
MLA compiler:
–16,000 lines of ML (wc)
–Function translation only source of trouble

Slide 74 of 47 Itanium program verification
Given
–Assembly language program (compiler output)
–Safety property (an invariant)
Is safety violated by an execution of the program allowed by the Itanium memory model?
Thinking of synchronization code (mutex)
Examples were Dekker and Bakery algorithms

Slide 75 of 47 Conclusion
Formal methods are up to industrial problems.
Formal methods have incremental payoffs:
–Specification documents design.
–Model checking finds design errors quickly.
–Proof writing finds deeper design errors.
Any partial proof is still a rigorous analysis.

Slide 76 of 47

Slide 77 of 47 EV7 cache coherence
Specification is 1800 lines
Specification accepted by TLC w/o modification
Largest instance feasible to check with TLC:
–1 cache line, 2 data values, 3 processors
–12 million reachable states (w/ symmetry reductions)
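What "reachable states" means for an explicit-state checker can be shown with a breadth-first search over a toy model. The model below is an assumption made for illustration and has nothing to do with the EV7 specification: one cache line, three processors, and three line states (I, S, M) with simple read and write transitions. Symmetry reduction is not implemented.

```python
from collections import deque


def explore(initial_states, successors, invariant):
    """Breadth-first search; return (states seen, first bad state or None)."""
    seen = set(initial_states)
    queue = deque(initial_states)
    while queue:
        state = queue.popleft()
        if not invariant(state):
            return len(seen), state
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen), None


def successors(state):
    """Toy transitions for one cache line shared by len(state) processors."""
    for p in range(len(state)):
        if state[p] == "I":
            # Read: any modified copy is downgraded, the reader gets a shared copy.
            nxt = ["S" if s == "M" else s for s in state]
            nxt[p] = "S"
            yield tuple(nxt)
        if state[p] != "M":
            # Write: every other copy is invalidated, the writer gets the only copy.
            nxt = ["I"] * len(state)
            nxt[p] = "M"
            yield tuple(nxt)


def invariant(state):
    """At most one modified copy, and never a modified copy next to a shared one."""
    modified = state.count("M")
    return modified <= 1 and not (modified == 1 and "S" in state)


reachable, counterexample = explore([("I", "I", "I")], successors, invariant)
```

This toy model has only a handful of reachable states, far fewer than the 27 possible combinations. The 12 million states reported on the slide come from a specification with far more state per processor.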

Slide 78 of 47 EV8 cache coherence
EV8 designers planned to use a TLA+ spec as the official spec (plus an English explanation)
But EV8 was cancelled, designers went to Intel
Rumor: Intel has an official spec in TLA+