1 Scaling Formal Methods toward Hierarchical Protocols in Shared Memory Processors: Annual Review Presentation – April 2007 Presenters: Ganesh Gopalakrishnan.

1 Scaling Formal Methods toward Hierarchical Protocols in Shared Memory Processors: Annual Review Presentation, April 2007
Presenters: Ganesh Gopalakrishnan, Xiaofang Chen
School of Computing, University of Utah, Salt Lake City, UT
Intel SRC Customization Award 2005-TJ-1318

2 Project Personnel
- IBM Mentor: Dr. Steven M. German
- Intel Mentor: Dr. Ching-Tsun Chou
- Primary Student: Xiaofang Chen
  - Summer internship planned at IBM T.J. Watson (6/07), where the research discussed here in Project 2 will be furthered
- Other SRC Student: Robert Palmer (work involving TLA+ modeling of communication libraries)
  - Defense May 10; expected to join Intel (6/07)
- 3 other PhD students, 1 MS student, and 2 UGs in FV, all working on FV of threading / msg-passing software

3 Multicores are the future! Their caches are visibly central…
[Figure: multicore die photo, courtesy of Intel Corporation.]
> 80% of chips shipped will be multi-core

4 …and the number of organizations of multiprocessor caches is mind-boggling (e.g., imagine 80 cores and deeper hierarchies).
[Figure: three clusters (Cluster 1, Cluster 2, Cluster 3), each with two L1 caches, an L2 cache + local directory, and an interface, connected to a global directory and main memory. Design choices: shared / private, inclusive / exclusive.]

5 Protocol design happens in “the thick of things” (many interfaces; constraints of performance, power, and testability).
From “High-throughput coherence control and hardware messaging in Everest,” by Nanda et al., IBM J. R&D 45(2), 2001.

6 Future Coherence Protocols
- Cache coherence protocols that are tuned for the contexts in which they operate can significantly increase performance and reduce power consumption [Liqun Cheng]
  - Producer-consumer sharing pattern-aware protocol [Cheng, HPCA07]
    - 21% speedup and 15% reduction in network traffic
  - Interconnect-aware coherence protocols [Cheng, ISCA06]
    - Heterogeneous interconnect
    - Improve performance AND reduce power
    - 11% speedup and 22% wire power savings
- Bottom line: protocols are going to get more complex!

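7 Designers have poor conceptual tools (e.g., “informal MSC drawings”). Need better notations and tools.
[Figure: informal message sequence chart among L1-1, L1-2, LDir, and GDir, with messages Req_S, Swap, Broadcast, NAck, Fwd_Req, and Gnt_S and state annotations (S), (I), (S: L1-1), (S: L1-2).]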

8 Design Abstractions in More Modern Flows
- An interleaving protocol model (Murphi or TLA+ are the languages of choice here)
  - FV here eliminates concurrency bugs
- Detailed HDL model
  - FV here eliminates implementation bugs; however
  - Correspondence with the interleaving model is lost
- Need more detailed models anyhow
  - Interleaving models are very abstract
  - Monolithic verification of HDL code does not scale
- Design optimizations captured at HDL level
  - Interleaving model becomes more obsolete
- Need an integrated flow: Interleaving -> High-level HW view -> Final HDL

9 Related Work in Formal HW Design
- BlueSpec
  - High-level design is expressed using atomic transactions
  - Synthesizes high-level designs into hardware implementations
    - Automatic scheduling of high-level design steps in hardware
    - May not meet performance goals
- Malik et al.
  - Formal architecture and microarchitecture modeling for verification
  - Meant for instruction set processors
- Need a formal theory of refinement from interleaving to high-level HW models

10 Our Goals
- Develop a methodology to verify “realistic” interleaving models
  - Useful benchmarks for others
  - Our particular contributions are towards hierarchical protocols
  - Largely inspired by Chou et al.’s work (FMCAD’04)
  - Xiaofang Chen’s PhD is wrapping up a nice story here!
- Develop a language and formal theory for higher-level HW specification & refinement
  - Ideas largely due to German & Janssen
  - Xiaofang Chen’s PhD work is taking ideas from the initial proposal all the way to practical realization!

11 A summary of our work over Y1-2
1. Three progressively better approaches to verify hierarchical cache coherence protocols at the interleaving level
   1.1 A/G method of complementary abstractions (FMCAD’06)
   1.2 Extensions to non-inclusive hierarchies (TR 06-014)
   1.3 Abstract each level separately (to be submitted)
   1.4 Error-trace checking (to be submitted)
2. A theory of transaction-based design and verification (writeup finished; initial experiments finished)
3. Modular verification of transactions (writeup in progress; initial experiments finished)
Number the projects 1.1, 1.2, 1.3, 1.4, 2, and 3

12 Project 1.[1-4] Timeline
- 1.1: FMCAD’06 results
- 1.2: Another hierarchical benchmark (non-inclusive)
- 1.3: Abstraction per level (more scalable)
- 1.4: Automatic recognition of spurious/real bugs

13 1.[1-4]: Hierarchical Protocols
[Figure: Home Cluster, Remote Cluster 1, and Remote Cluster 2, each with a RAC, an L2 cache + local directory, and two L1 caches, connected to a global directory and main memory.]

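14 Abstracted Protocol #1
[Figure: the same three-cluster hierarchy (Home Cluster, Remote Cluster 1, Remote Cluster 2) with the global directory and main memory; one cluster keeps its RAC, L2 cache + local directory, and two L1 caches, while the other two are reduced to a RAC and an L2 cache + abstracted local directory (Local Dir’).]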

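15 Abstracted Protocol #2
[Figure: the same three-cluster hierarchy (Home Cluster, Remote Cluster 1, Remote Cluster 2) with the global directory and main memory; one cluster keeps its RAC, L2 cache + local directory, and two L1 caches, while the other two are reduced to a RAC and an L2 cache + abstracted local directory (Local Dir’).]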

16 Non-Circular Assume/Guarantee
- We can’t verify this due to state explosion:
    h ║ r1 ║ r2 ╞ Coh
- Instead:
    Check-1: h ║ R1 ║ R2 ╞ Coh1 ∧ Guarant1
    Check-2: H ║ r1 ║ R2 ╞ Coh2 ∧ Guarant2
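
Read as a proof rule, the two checks are meant to replace the monolithic obligation. The LaTeX below is only a sketch of that reading: lowercase names are taken to be concrete clusters and uppercase names their abstractions, and the side conditions (that each Guarant discharges what the other check's abstraction assumes, and that r2 is covered by symmetry with r1) are assumptions here, not statements from the slides. The precise rule is the one in the FMCAD'06 paper.

```latex
\[
\frac{
  h \parallel R_1 \parallel R_2 \;\models\; \mathit{Coh}_1 \land \mathit{Guarant}_1
  \qquad
  H \parallel r_1 \parallel R_2 \;\models\; \mathit{Coh}_2 \land \mathit{Guarant}_2
}{
  h \parallel r_1 \parallel r_2 \;\models\; \mathit{Coh}
}
\quad \text{(assumed side conditions hold)}
\]
```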

17 1.2: We applied the non-circular A/G method to a Non-Inclusive Hierarchical Protocol
- Protocol features
  - Broadcast channels
  - Non-imprecise local dir
- Verification challenges
  - A/G cannot infer local dir from just intra-clusters
  - Coherence may involve multiple L1 caches

18 Verifying Non-Inclusive Protocols
- Inferring “L2.State = Excl” from
  - Outside the cluster
  - Inside the cluster
- Use history variables to change non-inclusive to inclusive protocols

19 Experimental Results

Protocols   # of States       Mem (GB)   Model Check
Hierarchy   > 1,521,900,000   20         No
Abs-1       234,478,105       20         Yes
Abs-2       283,124,383       20         Yes

Reduction is over 65%
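
The "over 65%" figure is consistent with comparing the combined state count of the two abstract models against the lower bound reported for the full hierarchy. That reading is an assumption, since the slide does not say how the reduction was computed; the short Python check below only redoes the arithmetic from the table.

```python
# State counts copied from the experimental results table.
hierarchy_lower_bound = 1_521_900_000  # run did not finish; the true count is larger
abs_states = {"Abs-1": 234_478_105, "Abs-2": 283_124_383}

# Assumption: "reduction" compares the sum of both abstract models
# against the lower bound for the full hierarchical model.
total_abs = sum(abs_states.values())
reduction = 1 - total_abs / hierarchy_lower_bound

print(f"abstract states in total: {total_abs:,}")
print(f"reduction: at least {reduction:.1%}")
```

Because the hierarchy count is only a lower bound, the true reduction under this reading can only be larger.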

20 1.3: We then tried a “Split Hierarchy Per Level Approach” to using non-circular A/G
[Figure: the hierarchy split per level into three abstractions (ABS #1, ABS #2, ABS #3): one inter-cluster model with the global directory, main memory, and three RAC + L2 cache + abstracted local directory (Local Dir’) blocks, and two intra-cluster models, each an L2 cache + local directory with two L1 caches.]

21 A Sample Scenario
[Figure: message sequence across Home Cluster, Remote Cluster 1, and Remote Cluster 2: 1. Req_Ex, 2. Fwd Req_Ex, 3. Fwd Req_Ex, 4. Fwd Req_Ex, 5. Grant, 6. Grant; cache state labels Excl and Invld.]

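22 Map to Abstracted Protocols
[Figure: the same scenario mapped onto the abstracted protocols for Remote Cluster 1 and Remote Cluster 2: 1. Req_Ex, 2. Fwd Req_Ex, 3. Fwd Req_Ex, 4. Fwd Req_Ex, 5. Grant, 6. Grant; cache state labels Invld and Excl.]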

23 Experimental Results

Protocols   # of States     Exec time (sec)   Mem (GB)   Model Check
Hierarchy   > 438,120,000   > 125,799         18         No
Inter       1,500,621       269               2          Yes
Intra-1     564,878         48                2          Yes
Intra-2     188,842         18                2          Yes

Reduction is over 95%!

24 Project 1.4: Automatic Recognition of Spurious / Real Bugs in these approaches
- Problem statement
  - Given an error trace of the ABS protocol
  - Is it a real bug of the original protocol?
- Solution
  - In the original protocol, use BFS to guide the model checking to match the error trace
  - Reason: our abstraction is just projection

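25 Basic Idea of Automatic Recognition
[Figure: on one side, the error trace of the Abs. protocol over (v1, v2): (0, 0), (1, 2), (6, 8), …; on the other, a directed BFS of the original protocol over (v1, v2, v3), starting from (0, 0, 0). Successors whose projection matches the next abstract state, such as (1, 2, 1), are kept; others, such as (3, 1, 0) and (0, 0, 3), are dropped.]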
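
The keep/drop idea can be sketched in Python as a layer-by-layer BFS over the original protocol that keeps only successors whose projection equals the next state of the abstract error trace. This is a minimal sketch, not the project's tool: `match_trace`, `successors`, `project`, and the toy transition table are hypothetical names, and the sketch assumes every abstract step corresponds to exactly one concrete step (no stuttering).

```python
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Sequence


def match_trace(
    init_states: Iterable[Hashable],
    successors: Callable[[Hashable], Iterable[Hashable]],
    project: Callable[[Hashable], Hashable],
    abs_trace: Sequence[Hashable],
) -> Optional[List[Hashable]]:
    """Return a concrete path whose projection equals abs_trace, else None."""
    if not abs_trace:
        return None
    # frontier maps each kept concrete state to one path that reaches it
    frontier: Dict[Hashable, List[Hashable]] = {
        s: [s] for s in init_states if project(s) == abs_trace[0]
    }
    for abs_state in abs_trace[1:]:
        next_frontier: Dict[Hashable, List[Hashable]] = {}
        for state, path in frontier.items():
            for succ in successors(state):
                # keep successors that agree with the abstract trace; drop the rest
                if project(succ) == abs_state and succ not in next_frontier:
                    next_frontier[succ] = path + [succ]
        frontier = next_frontier
        if not frontier:
            return None  # no concrete run matches: trace is spurious (under the lock-step assumption)
    return next(iter(frontier.values()), None)


# Toy original protocol over (v1, v2, v3); the abstraction projects away v3.
TOY = {
    (0, 0, 0): [(1, 2, 1), (0, 0, 3), (3, 1, 0)],
    (1, 2, 1): [(6, 8, 0)],
}
real = match_trace([(0, 0, 0)], lambda s: TOY.get(s, []), lambda s: s[:2],
                   [(0, 0), (1, 2), (6, 8)])
spurious = match_trace([(0, 0, 0)], lambda s: TOY.get(s, []), lambda s: s[:2],
                       [(0, 0), (6, 8)])
```

Here `real` is a concrete witness path and `spurious` is `None`. Restricting each BFS layer to states that project onto the abstract trace is what keeps the search small compared with exploring the original protocol in full.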

26 Y3 Plans for Project 1
- Considerable experience gained
- Three large benchmark protocols (each is 3000+ lines of Murphi code) on the web
- Have reduced verification complexity of hierarchical protocols by 90%
- Can identify spurious errors automatically
- All finite-state, not parameterized
  - No plans for parameterized
- Y3 plans: build a tool to support this methodology

27 Summary of Projects 2 and 3
1. Three progressively better approaches to verify hierarchical cache coherence protocols at the interleaving level
   1.1 A/G method of complementary abstractions (FMCAD’06)
   1.2 Extensions to deeper, and non-inclusive hierarchies (TR 06-014)
   1.3 Latest method that abstracts each level separately (to be submitted)
   1.4 Error-trace checking (to be submitted)
2. A theory of transaction-based design and verification (writeup finished)
3. Modular verification of transactions (writeup in progress)

28 Transaction Level HW Modeling
The problem addressed: bridge the gap between high-level specifications and RTL implementations
- Global properties cannot be formally verified at RTL level!
- Specifications can be verified, but do they correctly represent the implementations?

29 Driving Design Benchmark due to German and Geert Janssen

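30 What changes when moving from a spec to an implementation?
- Atomicity
- Concurrency
- Granularity in modeling
[Figure: in the spec, a single transition 1 between client and home; in the implementation, transitions 1.1, 1.2, 1.3 passing through client, router, buffer, and home.]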

31 General Mappings between high level transitions and transactions that help implement them
[Figure: High Level Transition 1 mapped to Low Level Transitions 1.1, 1.2, 1.3 that help realize 1.]
- High level transitions take some non-zero unit of time (conceptual)
- Each low level transition takes one clock cycle

32 High-Level and Low-Level Computations
[Figure: high-level transitions 1, 2, 3 aligned with their low-level transitions 1.1, 1.2, 1.3; 2.1, 2.2; and 3.1, 3.2, 3.3.]

33 Specification of High and Low Levels
[Figure: high-level transition 1 and its low-level transitions 1.1, 1.2, 1.3.]
- High level: in Murphi, as a Guard -> Action rule
- Low level: in HMurphi, as multiple Guard -> Action rules enclosed in a Begin Transaction / End Transaction
- The guards decide when each low-level transition can fire
- The maximal set of low-level transitions enabled in any state is fired concurrently within each clock tick

34 Transaction
- A transaction is a set of transitions in Impl that correspond to a transition in Spec

    Transaction
      Rule 1
      ……
      Rule n
    Endtransaction;

35 Executions
- Spec: interleaving
  - One enabled transition fires at each step
  - Example: …… 1, 2, 3 ……
- Impl: concurrent
  - All enabled transitions fire at each step
  - Example: …… {1.1, 2.1}, {1.2}, {2.2, 3.1, 3.2}
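
The difference between the two execution styles can be sketched in Python with guarded rules over a dictionary state. This is an illustrative sketch only: `interleaving_step`, `concurrent_step`, and the two toy rules are hypothetical names, and the write/write conflict check is an assumption about how simultaneous updates to the same variable would be rejected, not a description of HMurphi semantics.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
# A rule is a (guard, action) pair; the action returns only the variables it writes.
Rule = Tuple[Callable[[State], bool], Callable[[State], State]]


def interleaving_step(state: State, rules: List[Rule], pick: int = 0) -> State:
    """Spec-style step: exactly one enabled rule fires."""
    enabled = [action for guard, action in rules if guard(state)]
    new_state = dict(state)
    if enabled:
        new_state.update(enabled[pick](state))
    return new_state


def concurrent_step(state: State, rules: List[Rule]) -> State:
    """Impl-style step: every enabled rule fires in the same clock tick."""
    new_state = dict(state)
    written = set()
    for guard, action in rules:
        if guard(state):
            updates = action(state)  # every rule reads the OLD state
            clash = written.intersection(updates)
            if clash:
                raise ValueError(f"write/write conflict on {sorted(clash)}")
            written.update(updates)
            new_state.update(updates)
    return new_state


RULES: List[Rule] = [
    (lambda s: s["x"] == 0, lambda s: {"x": 1}),
    (lambda s: s["y"] == 0, lambda s: {"y": s["x"] + 1}),
]
start = {"x": 0, "y": 0}
after_spec_step = interleaving_step(start, RULES)   # only the first rule fires
after_impl_step = concurrent_step(start, RULES)     # both rules fire on the old state
```

In the concurrent step the second rule still sees `x == 0`, which is the kind of atomicity difference that the simulation relation between Spec and Impl has to account for.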

36 A Few Notations
- Observable variables: V_H
  - These are variables used in both Spec and Impl
  - Impl has additional internal variables also
- A variable v is inactive at a state s if all transactions in Impl that can write to v are quiescent at s

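37 A Formal Notion of Simulation
- For every concurrent execution of Impl, there exists an interleaving execution of Spec such that the variables in V_H ∩ inactive(l_i) match
[Figure: Impl states l_0, l_1, l_2, …… connected by concurrent steps {…}, aligned with Spec states h_0, h_1, h_2, …… connected by transitions t0, t1, t2.]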
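
The per-state matching condition can be sketched in Python as follows. The names `inactive_vars`, `states_match`, `writers`, and `quiescent` are hypothetical, and the sketch assumes that `writers` has an entry (possibly an empty set) for every observable variable and that the set of quiescent transactions at the Impl state is already known.

```python
from typing import Dict, Set


def inactive_vars(writers: Dict[str, Set[str]], quiescent: Set[str]) -> Set[str]:
    """Variables all of whose writing transactions are quiescent."""
    return {var for var, txns in writers.items() if txns <= quiescent}


def states_match(
    impl_state: Dict[str, object],
    spec_state: Dict[str, object],
    observable: Set[str],
    writers: Dict[str, Set[str]],
    quiescent: Set[str],
) -> bool:
    """Compare Impl and Spec only on observable variables that are inactive."""
    to_check = observable & inactive_vars(writers, quiescent)
    return all(impl_state[v] == spec_state[v] for v in to_check)


# Toy data: T2 is mid-flight, so "data" is active and is not compared yet.
observable = {"state", "data"}
writers = {"state": {"T1"}, "data": {"T2"}, "buf": {"T2"}}
impl = {"state": "Excl", "data": 0, "buf": 7}
spec = {"state": "Excl", "data": 5}

match_while_t2_active = states_match(impl, spec, observable, writers, {"T1"})
match_when_all_quiescent = states_match(impl, spec, observable, writers, {"T1", "T2"})
```

Exempting active variables is what lets an Impl transaction pass through intermediate states that have no counterpart in Spec.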

38 Simulation Checks
[Figure: commuting diagram. An Impl transaction takes I to I’; a Spec transition takes Spec(I) to Spec(I’).]
- Guard for Spec transition must hold
- I is a reachable state where the commit guard is true
- Observable vars changed by either Spec or Impl must match

39 Model Checking Approaches
- Monolithic
  - Cross product construction
- Compositional
  - Abstraction
  - Assume/Guarantee

40 Compositional Approach
- Abstraction
  - Change read to an access of an input var
  - Self-sourced read
  - Add all transitions that write to a var
- Assume/Guarantee
  - Require all writes to var guarantee prop P
  - Assume P holds on all reads

41 Example of Abstraction
[Figure: one transaction shown in detail, containing the rule “Rule (v1 = d1) => ...” between Transaction and Endtransaction, alongside Transaction 1, Transaction 2, ……, Transaction n.]

42 Example of Assume/Guarantee
[Figure: Transaction 1: Request granted. Transaction 2: Update Cache (State := Excl, Data := d). Annotation: Impl.State = Spec.State.]

43 Benchmarks
- High level in FMCAD’04 tutorial
- Low level provided by German and Janssen
- Sizes: 1 home node, 1 remote node
  - Sizes are constrained by accessible VHDL tools!

44 Implementations
- Muv: HMurphi -> VHDL
  - Written by German
- Mud: static analyzer for possible conflicts / dependencies
- VHDL verifier
  - IBM RuleBase

45 Preliminary Results

Approaches                  # Flip-Flops   # Gates   Time (min)
Monolithic                  212            8574      17
Decomposed: W/W conflicts   108            5763      11
Decomposed: closures        89             2194      3

* This is for datapath = 1 bit
* Intel Xeon CPU 3.0GHz, 2GB memory

46 When Datapath > 1 bit
- Cannot check monolithic approach
  - RuleBase 300 F-F academic license restriction
- Decomposed approach
  - W/W checks not affected

Datapath bits   # of F-F   # of Gates
1               89         2194
2               97         2380
26              289        6659

47 Future Work
- Reduce the cost of W/W conflict checking
  - Localized reasoning
- Apply to pipeline
- More benchmarks
- Try other VHDL tools
  - SixthSense etc.

48 Publications, Software, Models
- FMCAD 2006 paper
- Presentation at Intel
- Journal version of hierarchical coherence protocol verification (under prep)
- TR on Theory of Transaction Based Specification and Verification (under prep)
- Detailed VHDL-level German Protocol developed
- Analysis framework for HMurphi developed
- Preliminary verification experiments using Cadence IFV, IBM RuleBase, and IBM SixthSense
- Xiaofang Chen’s summer internship at IBM T.J. Watson Res. Ctr.
- Robert’s SRC poster
- Techcon 2007 submission
- There will be more publications during 2007-8 following hiatus due to infrastructure build-up (many delays!)

