Dynamic Verification of Cache Coherence Protocols
Jason F. Cantin, Mikko H. Lipasti, James E. Smith


6/30/2001 Workshop on Memory Performance Issues

Introduction
- Multiprocessors are used for a variety of commercial and mission-critical tasks
- Reliability is a growing concern
- Coherence is a fundamental feature of shared-memory MPs
- High design complexity
- Relatively low interconnect reliability

Introduction: Cache Coherence Protocols
- Notoriously difficult to design and verify
- Often conceptually simple, but with complex implementations for efficiency and handling special cases
- Multiple finite state machines operating concurrently

Introduction: A Simple Example
MSI Protocol “Architected” State
- Invalid / Not Present (I)
- Shared (readable) (S)
- Modified (read/write) (M)
[State diagram: states I, S, M; arc labels: Read, Write, Bus_Rd, and "Bus_RdX, Replace"]
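The architected MSI protocol can be sketched as a small lookup table. The slide's diagram did not survive extraction, so the assignment of arcs to states below is an assumption based on the textbook MSI protocol, which uses the same labels; the names `MSI_NEXT` and `msi_next` are hypothetical.

```python
# Architected MSI protocol as a table: (state, event) -> next state.
# Arc placement is an assumption (textbook MSI); the slide diagram was lost.
MSI_NEXT = {
    ("I", "Read"): "S",      # load a readable copy
    ("I", "Write"): "M",     # load a writable copy
    ("S", "Write"): "M",     # upgrade
    ("S", "Bus_RdX"): "I",   # another node wants exclusive access
    ("S", "Replace"): "I",   # local replacement
    ("M", "Bus_Rd"): "S",    # another node reads the line
    ("M", "Bus_RdX"): "I",   # another node wants exclusive access
    ("M", "Replace"): "I",   # local replacement (writeback)
}


def msi_next(state, event):
    """Return the next architected state; events without an arc keep the state."""
    return MSI_NEXT.get((state, event), state)
```

For example, `msi_next("I", "Write")` gives `"M"`, and a read hit such as `msi_next("S", "Read")` leaves the line in `"S"`.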

Introduction: Simple Example with a Bus
MSI Protocol “Implementation” State
- Transient states for pending operations
- Arcs to satisfy requests while operations pending
[State diagram: stable states M, S, I plus transient pending states (Pend Rd, Pend WB, Pend RdX); arc labels include Read, Write, Replace, Bus_Rd, Bus_RdX, Bus_Av]

Problem
- In practice, implementations can have dozens of states
  - Atomic memory operations
  - Split transaction buses
  - Protocol optimizations
- Complexity grows exponentially with added states
- Random testing: low coverage
- Exhaustive testing: too time consuming

Dynamic Verification
- Check the implementation at runtime
- It is easier to check a computation than to do the actual computation, provided there is a delay between the computation and the check (Rotenberg, AR-SMT)
- Simplified version of a processor implementation can be used for online verification (Austin, DIVA)

Dynamic Verification of Cache Coherence
- A distributed form of dynamic verification for multiprocessor memory systems
- Simplified version of protocol added to each node
- Maintains architected state
- Check completed transitions and actions against simple protocol
- Additional messages (assertions) sent between nodes to ensure coherence

Conceptual View for Superscalar Processors (DIVA)
- Single, centralized check processor
- Receives instructions serially in program order from implementation
[Diagram components: Complex Execution Processor, Physical registers, Prediction Tables, R.O.B., Arch. registers, Committed results, Check Processor, Arch. registers]

Conceptual View for Coherence
- Distributed checking hardware
- Transitions received in parallel, in completion order
[Diagram components: Shared Logical Bus, Shared Validation Bus, Implementation Protocol, Simple Protocol, Completed Transitions]

High Level Organization
[Diagram components: P (processor), Cache Controller, DV-CC Checker, Memory, Shared logical bus (addresses, data, control), Validation bus (assertions to be checked)]

Benefits
- Detects hardware faults
  - Redundant computation
  - Including intermittent network failures
- Detects design mistakes
  - Checker is simple and easy to verify

Drawbacks
- Time is required for checking, but…
  - May be overlapped with other activities
  - Simple protocol requires fewer transitions
- Assertions consume bandwidth
  - May need second bus / network
- Additional hardware
  - But not much

DV for coherence in an SMP
- Architected state stored in a second tag array
- Transactions sent to the checker when architected state changes; each carries:
  - Address
  - Initial and final states
  - Input (Request, Snoop Responses, etc.)
  - Action (Send Data, Respond Shared, etc.)

DV for coherence in an SMP (2)
- Checker compares the initial state of a transition against the architected state
- Final state and action recomputed and compared to implementation’s result
- Assertions broadcast to other nodes to check coherence and confirm completion of transactions
- Watchdog timer detects deadlock, livelock, and other omission failures
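The per-transition check described above might look like the following minimal sketch. It is not the authors' hardware design: the `Transition` record, the `Checker` class, the action names, and the contents of `SIMPLE_PROTOCOL` are all hypothetical, and the simple protocol is assumed to be textbook MSI. Assertion broadcast and the watchdog timer are left out.

```python
from dataclasses import dataclass

# Simple protocol: (state, input) -> (next state, action).
# Table contents and action names are assumptions for illustration.
SIMPLE_PROTOCOL = {
    ("I", "Read"): ("S", "None"),
    ("I", "Write"): ("M", "None"),
    ("S", "Write"): ("M", "None"),
    ("S", "Bus_RdX"): ("I", "None"),
    ("S", "Replace"): ("I", "None"),
    ("M", "Bus_Rd"): ("S", "SendData"),
    ("M", "Bus_RdX"): ("I", "SendData"),
    ("M", "Replace"): ("I", "WriteBack"),
}


@dataclass(frozen=True)
class Transition:
    """One completed transition reported by the implementation protocol."""
    address: int
    initial: str
    final: str
    inp: str
    action: str


class Checker:
    def __init__(self):
        # Second tag array: address -> architected state (absent means "I").
        self.arch_tags = {}

    def check(self, t):
        """Return None if the transition is accepted, else an error string."""
        arch = self.arch_tags.get(t.address, "I")
        if t.initial != arch:
            return "initial state mismatch"
        expected = SIMPLE_PROTOCOL.get((t.initial, t.inp))
        if expected is None:
            return "no such transition in simple protocol"
        if expected != (t.final, t.action):
            return "final state or action mismatch"
        self.arch_tags[t.address] = t.final  # update architected tags
        return None
```

The tags are updated only after all comparisons pass, so a rejected transition leaves the architected state untouched for diagnosis.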

Checking a State Transition
[Diagram components: Transition From Implementation Protocol (Address, Init. state, Final state, Input, Action), Arch. Tag State, Next State Logic, Action Logic, two "=?" comparators, Error Detection / Diagnosis, Update Tags, Asst Send Buffer, Validation Bus, Watchdog timer]

Checking an Assertion
[Diagram components: Validation Bus, Assert Recv Buffer (Address, Remote State), Arch. Tag State, OK, Error Detection / Diagnosis, Watchdog timer]
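The slides do not spell out the rule used to compare a remote state with the local architected tag. One plausible sketch, assuming the usual single-writer / multiple-reader invariant for MSI, is shown below; the `COMPATIBLE` table and `check_assertion` are hypothetical names, not taken from the presentation.

```python
# For each remote state announced in an assertion, the local architected
# states that can coexist with it. This rule is an assumption (the
# single-writer / multiple-reader invariant), not taken from the slides.
COMPATIBLE = {
    "M": {"I"},            # a remote writer excludes every local copy
    "S": {"I", "S"},       # remote readers exclude a local writer
    "I": {"I", "S", "M"},  # a remote invalid line constrains nothing
}


def check_assertion(arch_tags, address, remote_state):
    """Return True if the received assertion agrees with the local tags."""
    local_state = arch_tags.get(address, "I")  # absent means Invalid
    return local_state in COMPATIBLE[remote_state]
```

A `False` result corresponds to the error detection / diagnosis path in the diagram; a `True` result corresponds to OK.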

When to Broadcast Assertions
For MSI:
1. I -> S (readable copy loaded)
2. I -> M (writeable copy loaded)
3. S -> M (upgrade)
4. M -> I (writeback)
Note: The M -> S transition results from remote reads, and doesn’t require an extra assertion. Replacements (S -> I) are not considered here.
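The list above amounts to a membership test on (initial state, final state) pairs. A minimal sketch, with the hypothetical names `BROADCAST_TRANSITIONS` and `needs_assertion`:

```python
# The four MSI transitions for which the slide says an assertion is broadcast.
BROADCAST_TRANSITIONS = {
    ("I", "S"),  # readable copy loaded
    ("I", "M"),  # writeable copy loaded
    ("S", "M"),  # upgrade
    ("M", "I"),  # writeback
}


def needs_assertion(initial, final):
    """Return True if this state change should be announced to other nodes."""
    return (initial, final) in BROADCAST_TRANSITIONS
```

Under this table, `needs_assertion("M", "S")` and `needs_assertion("S", "I")` are both `False`, matching the note about remote reads and replacements.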

Preliminary Data (4-way SMP)
- Most memory references do not change cache state (checker need not have high bandwidth)

Preliminary Data (4-way SMP)

Future Work
- Performance impact for a real SMP protocol implementation
  - In progress
- Directory-based protocols
- Dynamically verifying memory models
- Recovery
  - Can stall to avoid error propagation
  - Can write checkpoints periodically

In Summary
- Dynamic verification can be applied to multiprocessor systems (in a distributed manner)
- Improves fault-tolerance, and design verification may be relaxed
- More to come