1 Scaling Formal Methods toward Hierarchical Protocols in Shared Memory Processors: Annual Review Presentation – April 2007 Presenters: Ganesh Gopalakrishnan.

Presentation transcript:

1 Scaling Formal Methods toward Hierarchical Protocols in Shared Memory Processors: Annual Review Presentation – April 2007
Presenters: Ganesh Gopalakrishnan and Xiaofang Chen, School of Computing, University of Utah, Salt Lake City, UT
Intel SRC Customization Award 2005-TJ-1318

2 Project Personnel
• IBM Mentor: Dr. Steven M. German
• Intel Mentor: Dr. Ching-Tsun Chou
• Primary Student: Xiaofang Chen
• Summer internship planned at IBM T.J. Watson (6/07), where the research discussed here in Project 2 will be furthered
• Other SRC Student: Robert Palmer (work involving TLA+ modeling of communication libraries)
• Defense May 10; expected to join Intel (6/07)
• 3 other PhD students, 1 MS student, and 2 UGs in FV, all working on FV of threading / msg-passing software

3 Multicores are the future! Their caches are visibly central… (photo courtesy of Intel Corporation.) > 80% of chips shipped will be multi-core

4 …and the number of organizations of multiprocessor caches is mindboggling (e.g. imagine 80 cores and deeper hierarchies).
[Figure: Cluster 1, Cluster 2, and Cluster 3, each with an Interface, an L2 Cache+Local Dir, and two L1 Caches, together with a Global Dir and Main Memory. Design choices shown: Shared / Private, Inclusive / Exclusive.]

5 Protocol design happens in “the thick of things” (many interfaces; constraints of performance, power, and testability). From “High-throughput coherence control and hardware messaging in Everest,” by Nanda et al., IBM J. R&D 45(2), 2001.

6 Future Coherence Protocols
• Cache coherence protocols that are tuned for the contexts in which they operate can significantly increase performance and reduce power consumption [Liqun Cheng]
• Producer-consumer sharing pattern-aware protocol [Cheng, HPCA07]: 21% speedup and 15% reduction in network traffic
• Interconnect-aware coherence protocols [Cheng, ISCA06]: heterogeneous interconnect; improve performance AND reduce power; 11% speedup and 22% wire power savings
• Bottom line: protocols are going to get more complex!

7 Designers have poor conceptual tools (e.g., “Informal MSC drawings”). Need better notations and tools.
[Figure: informal message sequence chart among L1-1, L1-2, LDir, and GDir. Messages: Req_S, Swap, Broadcast, NAck, Fwd_Req, Gnt_S. State annotations: (S), (I), (S: L1-1), (S: L1-2).]

8 Design Abstractions in More Modern Flows
• An Interleaving Protocol Model (Murphi or TLA+ are the languages of choice here): FV here eliminates concurrency bugs
• Detailed HDL model: FV here eliminates implementation bugs; however
• Correspondence with Interleaving Model is lost. Need more detailed models anyhow
• Interleaving Models are very abstract
• Monolithic Verification of HDL Code does not scale
• Design optimizations are captured at HDL level. Interleaving model becomes more obsolete
• Need an Integrated Flow: Interleaving -> High level HW View -> Final HDL

9 Related Work in Formal HW Design
• BlueSpec: high level design is expressed using atomic transactions; synthesizes high level designs into hardware implementations
• Automatic scheduling of high level design steps in hardware
• May not meet performance goals
• Malik et al.: Formal Architecture and Microarchitecture Modeling for Verification; meant for Instruction Set Processors
• Need Formal theory of Refinement from Interleaving to High level HW Models

10 Our Goals
• Develop Methodology to Verify “Realistic” Interleaving Models. Useful benchmarks for others. Our particular contributions are towards Hierarchical protocols. Largely inspired by Chou et al.’s work (FMCAD’04). Xiaofang Chen’s PhD is wrapping up a nice story here!
• Develop Language and Formal Theory for Higher Level HW Specification & Refinement. Ideas largely due to German & Janssen. Xiaofang Chen’s PhD work is taking ideas from initial proposal all the way to practical realization!

11 A summary of our work over Y1-2
1. Three progressively better approaches to verify hierarchical cache coherence protocols at the interleaving level
   1. A/G method of complementary abstractions (FMCAD’06)
   2. Extensions to Non-inclusive hierarchies (TR )
   3. Abstract each level separately (to be submitted)
   4. Error-trace checking (to be submitted)
2. A theory of transaction based design and verification (writeup finished; initial experiments finished)
3. Modular verification of transactions (writeup in progress; initial experiments finished)
Number the projects 1.1, 1.2, 1.3, 1.4, 2, and 3

12 Project 1.[1-4] Timeline
1.1: FMCAD’06 results
1.2: Another hierarchical benchmark (non-inclusive)
1.3: Abstraction per level (more scalable)
1.4: Automatic Recognition of spurious/real bugs

13 1.[1-4]: Hierarchical Protocols
[Figure: Home Cluster, Remote Cluster 1, and Remote Cluster 2, each with a RAC, an L2 Cache+Local Dir, and two L1 Caches, together with a Global Dir and Main Memory.]

14 Abstracted Protocol #1
[Figure: Home Cluster, Remote Cluster 1, and Remote Cluster 2 with a Global Dir and Main Memory. One cluster keeps its RAC, L2 Cache+Local Dir, and two L1 Caches; the other two are reduced to a RAC and an L2 Cache+Local Dir’.]

15 Abstracted Protocol #2
[Figure: Home Cluster, Remote Cluster 1, and Remote Cluster 2 with a Global Dir and Main Memory. One cluster keeps its RAC, L2 Cache+Local Dir, and two L1 Caches; the other two are reduced to a RAC and an L2 Cache+Local Dir’.]

16 Non-Circular Assume/Guarantee
• We can’t verify this due to state explosion: h ∥ r1 ∥ r2 ⊨ Coh
• Instead:
  Check-1: h ∥ R1 ∥ R2 ⊨ Coh1 ∧ Guarant1
  Check-2: H ∥ r1 ∥ R2 ⊨ Coh2 ∧ Guarant2

17 1.2: We applied the non-circular A/G method to a Non-Inclusive Hierarchical Protocol…
• Protocol features: broadcast channels; non-imprecise local dir
• Verification challenges: A/G cannot infer local dir from just intra-clusters; coherence may involve multiple L1 caches

18 Verifying Non-Inclusive Protocols
• Inferring “L2.State = Excl” from outside the cluster and from inside the cluster
• Use history variables to change non-inclusive to inclusive protocols

19 Experimental Results
Protocols   # of States        Mem (GB)   Model Check
Hierarchy   > 1,521,900,000    20         No
Abs-1       234,478,105        20         Y
Abs-2       283,124,383        20         Y
Reduction is over 65%

20 1.3: We then tried a “Split Hierarchy Per Level Approach” to using non-circular A/G
[Figure: the hierarchy split per level into three abstractions, ABS #1, ABS #2, and ABS #3. Labels: RAC, L2 Cache+Local Dir’, Global Dir, Main Memory, L2 Cache+Local Dir, L1 Cache.]

21 A Sample Scenario
[Figure: Home Cluster, Remote Cluster 1, Remote Cluster 2. Messages: 1. Req_Ex; 2. Fwd Req_Ex; 3. Fwd Req_Ex; 4. Fwd Req_Ex; 5. Grant; 6. Grant. Cache states shown: Excl, Invld.]

22 Map to Abstracted Protocols
[Figure: Remote Cluster 1, Remote Cluster 2. Messages: 1. Req_Ex; 2. Fwd Req_Ex; 3. Fwd Req_Ex; 4. Fwd Req_Ex; 5. Grant; 6. Grant. Cache states shown: Invld, Excl.]

23 Experimental Results
Columns: Protocols | # of States | Exec time (sec) | Mem (GB) | Model Check
Hierarchy | > 438,120,000 | > 125,799 | 18 | No
Inter1,500, Y
Intra-1564,878482Y
Intra-2188, Y
Reduction is over 95%!

24 Project 1.4: Automatic Recognition of Spurious / Real Bugs in these approaches
• Problem statement: given an error trace of the ABS protocol, is it a real bug of the original protocol?
• Solution: in the original protocol, use BFS to guide the model checking to match the error trace
• Reason: our abstraction is just projection

25 Basic Idea of Automatic Recognition
Error trace of Abs. protocol: (v1=0, v2=0), (v1=1, v2=2), (v1=6, v2=8), ……
Directed BFS of original protocol: (v1=3, v2=1, v3=0), (v1=0, v2=0, v3=0), (v1=1, v2=2, v3=1), (v1=0, v2=0, v3=3), ……; each explored state is marked keep or drop
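
The matching step of Project 1.4 can be sketched in Python as below. This is a minimal illustration, not the authors' tool: the names `project`, `trace_is_real`, and `toy_successors` are hypothetical, the toy transition relation is invented around the variable values shown on the slide, and the sketch assumes each step of the abstract trace corresponds to exactly one step of the original protocol (no stuttering).

```python
from typing import Callable, Dict, List

State = Dict[str, int]


def project(state: State, abs_vars: List[str]) -> State:
    """Abstraction as projection: keep only the abstract protocol's variables."""
    return {v: state[v] for v in abs_vars}


def trace_is_real(init: State,
                  successors: Callable[[State], List[State]],
                  abs_trace: List[State]) -> bool:
    """Directed BFS: expand the original protocol level by level, keeping only
    states whose projection equals the next state of the abstract error trace."""
    abs_vars = sorted(abs_trace[0])
    if project(init, abs_vars) != abs_trace[0]:
        return False
    frontier = [init]
    for abs_state in abs_trace[1:]:
        kept: List[State] = []
        for s in frontier:
            for t in successors(s):
                if project(t, abs_vars) == abs_state and t not in kept:
                    kept.append(t)  # keep: matches the abstract step
                # anything else is dropped: it cannot extend the error trace
        if not kept:
            return False            # no concrete run matches: spurious
        frontier = kept
    return True                     # some concrete run matches: real bug


def toy_successors(s: State) -> List[State]:
    """Hypothetical transition relation of the original protocol."""
    if s == {"v1": 0, "v2": 0, "v3": 0}:
        return [{"v1": 1, "v2": 2, "v3": 1},
                {"v1": 0, "v2": 0, "v3": 3},
                {"v1": 3, "v2": 1, "v3": 0}]
    if s == {"v1": 1, "v2": 2, "v3": 1}:
        return [{"v1": 6, "v2": 8, "v3": 1}]
    return []


init = {"v1": 0, "v2": 0, "v3": 0}
abs_trace = [{"v1": 0, "v2": 0}, {"v1": 1, "v2": 2}, {"v1": 6, "v2": 8}]
is_real = trace_is_real(init, toy_successors, abs_trace)
```

Because the abstraction only hides variables, comparing projections is enough to decide whether a concrete state is kept or dropped; a real implementation would also have to handle concrete steps that leave the abstract variables unchanged.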

26 Y3 Plans for Project 1:
• Considerable Experience Gained
• Three Large Benchmark Protocols (each is lines of Murphi Code) on the web
• Have Reduced Verif Complexity of Hier Protocols by 90%
• Can Identify Spurious Errors Automatically
• All Finite-state, Not Parameterized; No plans for Parameterized
• Y3 Plans: Build Tool to support this methodology

27 Summary of Projects 2 and 3
1. Three progressively better approaches to verify hierarchical cache coherence protocols at the interleaving level
   1. A/G method of complementary abstractions (FMCAD’06)
   2. Extensions to deeper, and non-inclusive hierarchies (TR )
   3. Latest method that abstracts each level separately (to be submitted)
   4. Error-trace checking (to be submitted)
2. A theory of transaction based design and verification (writeup finished)
3. Modular verification of transactions (writeup in progress)

28 Transaction Level HW Modeling
The problem addressed: bridge the gap between high-level specifications and RTL implementations.
Global properties cannot be formally verified at the RTL level!
Specifications can be verified, but do they correctly represent the implementations?

29 Driving Design Benchmark due to German and Geert Janssen

30 What changes when moving from a spec to an implementation?
• Atomicity
• Concurrency
• Granularity in modeling
[Figure: two views of the system, one with client and home, the other with client, router, buffer, and home.]

31 General Mappings between high level transitions and transactions that help implement them High Level Transition 1 Low Level Transitions that help realize High Level Transitions take some non-zero unit of time (conceptual) Each Low Level Transition takes One Clock Cycle

32 High-Level and Low-Level Computations

33 Specification of High and Low Levels
In Murphi: as a Guard → Action rule.
In HMurphi: as multiple Guard → Action rules enclosed in a Begin Transaction / End Transaction.
The guards decide when each low level transition can fire.
The maximal number of low level transitions enabled in any state is fired concurrently within each clock tick.

34 Transaction
• A transaction is a set of transitions in Impl that correspond to a transition in Spec
  Transaction
    Rule 1
    ……
    Rule n
  Endtransaction;

35 Executions
• Spec: interleaving. One enabled transition fires at each step
• Impl: concurrent. All enabled transitions fire at each step
[Figure: an example Impl execution firing the transition sets {1.1, 2.1}, {1.2}, {2.2, 3.1, 3.2}, ……]
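
The two execution styles can be contrasted with a small Python sketch. Everything here is illustrative: the rule names reuse the "1.1", "2.1", "1.2" labels from the figure, but the guards, actions, and variables (`req`, `buf`, `ack`, `done`) are hypothetical, and rejecting write/write conflicts with an exception is an assumption of this sketch rather than the HMurphi semantics.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
# A rule is (name, guard, action); the action returns the variables it writes.
Rule = Tuple[str, Callable[[State], bool], Callable[[State], State]]


def interleaving_successors(rules: List[Rule], s: State) -> List[Tuple[str, State]]:
    """Spec semantics: each enabled rule fires alone, one successor per rule."""
    return [(name, {**s, **action(s)}) for name, guard, action in rules if guard(s)]


def concurrent_step(rules: List[Rule], s: State) -> Tuple[List[str], State]:
    """Impl semantics: every enabled rule fires in the same step (one clock tick).
    All guards and actions read the pre-state; conflicting writes are rejected."""
    fired: List[str] = []
    updates: State = {}
    for name, guard, action in rules:
        if not guard(s):
            continue
        for var, val in action(s).items():
            if var in updates and updates[var] != val:
                raise ValueError(f"write/write conflict on {var}")
            updates[var] = val
        fired.append(name)
    return fired, {**s, **updates}


rules: List[Rule] = [
    ("1.1", lambda s: s["req"] == 1, lambda s: {"req": 0, "buf": 1}),
    ("2.1", lambda s: s["ack"] == 0, lambda s: {"ack": 1}),
    ("1.2", lambda s: s["buf"] == 1, lambda s: {"buf": 0, "done": 1}),
]
s0 = {"req": 1, "buf": 0, "ack": 0, "done": 0}
spec_next = interleaving_successors(rules, s0)   # two separate successors
fired1, s1 = concurrent_step(rules, s0)          # rules 1.1 and 2.1 fire together
fired2, s2 = concurrent_step(rules, s1)          # rule 1.2 fires next
```

In the interleaving view the state `s0` has two distinct successors, while the concurrent view takes a single step in which both enabled rules fire.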

36 A Few Notations
• Observable variables: V_H. These are the variables used in both Spec and Impl. Impl has additional internal variables also
• A variable v is inactive at a state s if all transactions in Impl that can write to v are quiescent at s
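
The definition of an inactive variable translates directly into a set comprehension. The sketch below is hypothetical: the `Transaction` record, its `quiescent` predicate, and the `busy_t1` / `busy_t2` flags are illustrative names, not part of HMurphi.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable, List, Set

State = Dict[str, int]


@dataclass(frozen=True)
class Transaction:
    name: str
    writes: FrozenSet[str]              # variables this transaction can write
    quiescent: Callable[[State], bool]  # True when the transaction is not in progress


def inactive(variables: Iterable[str],
             transactions: List[Transaction],
             s: State) -> Set[str]:
    """v is inactive at s if every transaction that can write v is quiescent at s."""
    return {v for v in variables
            if all(t.quiescent(s) for t in transactions if v in t.writes)}


# Hypothetical example: busy_t1 and busy_t2 are internal Impl variables.
t1 = Transaction("T1", frozenset({"state", "data"}), lambda s: s["busy_t1"] == 0)
t2 = Transaction("T2", frozenset({"data"}), lambda s: s["busy_t2"] == 0)
V_H = ["state", "data"]
s = {"state": 0, "data": 0, "busy_t1": 0, "busy_t2": 1}
comparable = inactive(V_H, [t1, t2], s)  # only "state" is inactive here
```

Note that a variable with no writers is vacuously inactive under this definition.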

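37 A Formal Notion of Simulation
• For every concurrent execution of Impl, there exists an interleaving execution of Spec such that the variables in V_H ∩ inactive(l_i) match
[Figure: an Impl execution through states l_0, l_1, l_2, …… with a set of transitions {…} fired at each step, aligned with a Spec execution through states h_0, h_1, h_2, …… via transitions t0, t1, t2.]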

38 Simulation Checks
[Figure: commuting diagram. An Impl transaction takes I to I’; the Spec transition takes Spec(I) to Spec(I’).]
• Guard for Spec transition must hold
• I is a reachable state where the commit guard is true
• Observable vars changed by either Spec or Impl must match
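
A rough Python sketch of one such check for a single completed transaction follows. It is an assumption-laden illustration: `simulation_check`, the variable names, and the example guard and action are hypothetical, and the caller is assumed to supply a reachable pre-state I at which the commit guard is true.

```python
from typing import Callable, Dict, Iterable

State = Dict[str, object]


def simulation_check(spec_guard: Callable[[State], bool],
                     spec_action: Callable[[State], State],
                     impl_pre: State,
                     impl_post: State,
                     observable: Iterable[str]) -> bool:
    """Check one Impl transaction (impl_pre -> impl_post) against one Spec transition."""
    # The guard of the Spec transition must hold at I.
    if not spec_guard(impl_pre):
        return False
    # Apply the Spec transition to I; it only writes observable variables.
    spec_post = {**impl_pre, **spec_action(impl_pre)}
    # Observable variables changed by either side must agree afterwards.
    changed = [v for v in observable
               if spec_post[v] != impl_pre[v] or impl_post[v] != impl_pre[v]]
    return all(spec_post[v] == impl_post[v] for v in changed)


# Hypothetical Spec transition: grant exclusive access to a pending request.
guard = lambda s: s["req"] == 1
action = lambda s: {"req": 0, "state": "Excl"}
observable = ["req", "state"]

pre = {"req": 1, "state": "Invld", "buf": 0}      # buf is internal to Impl
good_post = {"req": 0, "state": "Excl", "buf": 0}
bad_post = {"req": 0, "state": "Shrd", "buf": 0}

ok = simulation_check(guard, action, pre, good_post, observable)
mismatch = simulation_check(guard, action, pre, bad_post, observable)
```

Restricting the comparison to changed observable variables gives the same answer as comparing all of them, since an observable variable left untouched by both sides trivially agrees.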

39 Model Checking Approaches
• Monolithic: cross product construction
• Compositional: abstraction; assume/guarantee

40 Compositional Approach
• Abstraction
  Change read to an access of an input var
  Self-sourced read
  Add all transitions that write to a var
• Assume/Guarantee
  Require all writes to var guarantee prop P
  Assume P holds on all reads
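
The assume/guarantee split on a single variable can be illustrated with a toy Python sketch. All names here (`writes_guarantee`, `abstract_read`, the integer encoding of cache states, and the example writers) are hypothetical; the sketch only shows the shape of the two obligations, checked over an explicitly listed set of states rather than by a model checker.

```python
from typing import Callable, Dict, Iterable, List, Tuple

State = Dict[str, int]
Rule = Tuple[Callable[[State], bool], Callable[[State], State]]  # (guard, action)


def writes_guarantee(var: str,
                     prop: Callable[[int], bool],
                     writers: List[Rule],
                     states: Iterable[State]) -> bool:
    """Guarantee side: in every given state, each enabled writer of var
    writes a value that satisfies prop."""
    for s in states:
        for guard, action in writers:
            if guard(s):
                new = action(s)
                if var in new and not prop(new[var]):
                    return False
    return True


def abstract_read(domain: Iterable[int], prop: Callable[[int], bool]) -> List[int]:
    """Assume side: an abstracted reader treats var as a free input,
    restricted to the values that satisfy prop."""
    return [d for d in domain if prop(d)]


# Hypothetical encoding: 0 = Invld, 1 = Shrd, 2 = Excl, 3 = illegal.
P = lambda value: value in (0, 1, 2)
grant = (lambda s: s["req"] == 1, lambda s: {"state": 2})
invalidate = (lambda s: s["inv"] == 1, lambda s: {"state": 0})
buggy = (lambda s: s["req"] == 1, lambda s: {"state": 3})

states = [{"req": 1, "inv": 0, "state": 0}, {"req": 0, "inv": 1, "state": 2}]
guaranteed = writes_guarantee("state", P, [grant, invalidate], states)
violated = writes_guarantee("state", P, [grant, buggy], states)
inputs_for_reader = abstract_read(range(4), P)
```

Once every writer is shown to guarantee P, a reader can be verified in isolation against the restricted input values instead of against the full set of writers.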

41 Example of Abstraction
Transaction … Rule (v1 = d1) => ... … Endtransaction
[Figure labels: Transaction 1, Transaction 2, ……, Transaction n]

42 Example of Assume/Guarantee
…
Transaction 1: Request granted
Transaction 2: Update Cache
State := Excl
Data := d
Impl.State = Spec.State

43 Benchmarks
• High level in FMCAD’04 tutorial
• Low level provided by German and Janssen
• Sizes: 1 home node, 1 remote node. Sizes are constrained by accessible VHDL tools!

44 Implementations
• Muv: HMurphi → VHDL (written by German)
• Mud: static analyzer for possible conflicts / dependencies
• VHDL verifier: IBM RuleBase

45 Preliminary Results
Columns: Approaches | # Flip-Flops | # Gates | Time (min)
Rows: Monolithic Decomposed W/W conflicts closures
* This is for datapath = 1 bit
* Intel Xeon CPU 3.0GHz, 2GB memory

46 When Datapath > 1 bit
• Cannot check monolithic approach: RuleBase 300 F-F academic license restriction
• Decomposed approach: W/W checks not affected
Columns: Datapath bits | # of F-F | # of Gates

47 Future Work
• Reduce the cost of W/W conflicts checking: localized reasoning
• Apply to pipeline
• More benchmarks
• Try other VHDL tools: SixthSense etc.

48 Publications, Software, Models
• FMCAD 2006 paper
• Presentation at Intel
• Journal version of hierarchical coherence protocol verification (under prep)
• TR on Theory of Transaction Based Specification and Verification (under prep)
• Detailed VHDL-level German Protocol developed
• Analysis Framework for HMurphi developed
• Preliminary Verification Experiments using Cadence IFV, IBM RuleBase, and IBM SixthSense
• Xiaofang Chen’s Summer Internship at IBM T.J. Watson Res. Ctr.
• Robert’s SRC Poster
• Techcon 2007 submission
• There will be more publications during following hiatus due to infrastructure build-up (many delays!)