Implementation and Verification of a Cache Coherence Protocol using Spin
Steven Farago


Goal
To use Spin to design a plausible cache coherence protocol
– Introduce nothing in the Spin model that would not be realistic in hardware (e.g. instant global knowledge between unrelated state machines)
To verify the correctness of the protocol

Background
Definition: a cache is a small, high-speed memory that is used by a single processor. All processor memory accesses are via the cache.
Problem:
– In a multiprocessor system, each processor could have a cache.
– Each cache could contain (potentially different) data for the same addresses.
– Given this, how to ensure that processors see a consistent picture of memory?

Coherence protocol
A coherence protocol specifies how caches communicate with processors and with each other so that processors have a predictable view of memory.
Caches that always provide this predictable view of memory are said to be coherent.

A Definition of Coherence
A view of memory is coherent if the following property holds:
– Given cacheline A, two processors may not see storage accesses to A in a conflicting order.
Example (Processor 0 performs the stores; Processors 1, 2 and 3 observe A):

    Processor 0    Processor 1    Processor 2    Processor 3
    Store A, 0     Load A, 0      Load A, 0      Load A, 1
    Store A, 1     Load A, 1      Load A, 0      Load A, 0
                   Coherent       Coherent       ** NOT Coherent

Informally, a processor may not see old data after seeing new data.

Standard Coherence Protocol
MESI (Modified, Exclusive, Shared, Invalid)
– A standard protocol that is intended to guarantee cache coherence
Each cacheline in the cache is marked with one of these four states.
Cacheline accesses are only allowed if the cache states are correct w.r.t. the coherence protocol.
Examples:
– A cacheline marked Invalid may not provide data to a processor.
– Cacheline data may not be updated unless the line is in the Exclusive or Modified state.
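
The state encoding and these two access rules could be sketched in Promela roughly as follows; all identifiers here are illustrative assumptions, not names taken from the actual model:

    /* Minimal MESI sketch: one cacheline and the two access rules above. */
    mtype = { MODIFIED, EXCLUSIVE, SHARED, INVALID };

    mtype cacheState = INVALID;   /* state of the single modeled cacheline */
    bit   cacheData;

    active proctype AccessRules() {
        /* Read path: a line marked Invalid may not supply data. */
        if
        :: cacheState != INVALID -> printf("hit: return %d\n", cacheData)
        :: else                  -> printf("miss: must contact the memory controller\n")
        fi;
        /* Write path: only an Exclusive or Modified line may be updated. */
        if
        :: cacheState == EXCLUSIVE || cacheState == MODIFIED -> cacheData = 1
        :: else -> skip           /* must first obtain permission to store */
        fi
    }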

System Model
Initial version: three state machines
– ProcessorModel: non-deterministically issues Loads and Stores to the cache forever
– CacheModel: two parts, initially combined into a single process
    MainCache: services processor requests
    Snooper: responds to messages from the memory controller
– MemoryController: services requests from each cache and maintains coherence among all caches

[System model diagram: two Processor / MainCache / Snooper stacks, each connected to a single shared MemoryController]
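
A Promela skeleton of this topology might look as follows; the process and channel names are assumptions made for illustration, not the author's actual code:

    /* Hypothetical skeleton: two processor/cache pairs and one memory controller. */
    #define NCACHE 2

    mtype = { LOAD, STORE, DATA_REQ, PERM_REQ, SNOOP, ACK };

    /* Depth-1 channels stand in for the busses in the diagram. */
    chan procBus[NCACHE]  = [1] of { mtype, bit };   /* Processor  <-> MainCache        */
    chan memBus[NCACHE]   = [1] of { mtype, bit };   /* MainCache  <-> MemoryController */
    chan snoopBus[NCACHE] = [1] of { mtype };        /* MemoryController -> Snooper     */

    proctype ProcessorModel(byte id) { skip /* issues Loads and Stores forever   */ }
    proctype MainCache(byte id)      { skip /* services processor requests       */ }
    proctype Snooper(byte id)        { skip /* answers memory-controller queries */ }
    proctype MemoryController()      { skip /* coordinates coherence             */ }

    init {
        byte i = 0;
        atomic {
            do
            :: i < NCACHE -> run ProcessorModel(i); run MainCache(i); run Snooper(i); i++
            :: else -> break
            od;
            run MemoryController()
        }
    }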

ProcessorModel
Simple: continually issues Load/Store requests to the associated Cache.
– Communication is done via the bus model.
– Read requests are blocking.
Coherence verification is done when a Load receives data (via a Spin assert statement).
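
A self-contained sketch of such a processor loop is given below; the channel names, the trivial cache stub, and the encoding of data as 1 = NEW are assumptions for illustration only:

    mtype = { LOAD, STORE };

    chan toCache   = [1] of { mtype, bit };   /* processor -> cache: op, store data */
    chan fromCache = [1] of { bit };          /* cache -> processor: loaded data    */

    bit memData = 1;                          /* 1 = NEW, 0 = OLD (see later slides) */

    active proctype ProcessorModel() {
        bit d;
        do
        :: toCache ! LOAD, 0 ->               /* non-deterministically pick a Load  */
           fromCache ? d;                     /* blocking read: wait for the data   */
           assert(d == 1)                     /* every Load must return NEW data    */
        :: toCache ! STORE, 1                 /* ...or a Store of NEW data          */
        od
    }

    /* Trivial stand-in for the cache so the sketch runs on its own. */
    active proctype CacheStub() {
        mtype op; bit d;
        do
        :: toCache ? op, d ->
           if
           :: op == LOAD  -> fromCache ! memData
           :: op == STORE -> memData = d
           fi
        od
    }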

CacheModel
Two parts: MainCache and Snooper
– MainCache services ProcessorModel Load and Store requests and initiates contact with the MemoryController when an invalid cache state is encountered.
– Snooper services independent requests from the MemoryController; these requests are what the MemoryController needs in order to coordinate coherence responses.
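
The division of labor could be sketched as two loops over a shared cacheline, roughly as below (a structural fragment with assumed names, not the original code):

    mtype = { MODIFIED, EXCLUSIVE, SHARED, INVALID, DATA_REQ, ACK };

    mtype cacheState = INVALID;
    bit   cacheData;

    chan fromProc = [1] of { bit };      /* Processor        -> MainCache        */
    chan toMC     = [1] of { mtype };    /* MainCache        -> MemoryController */
    chan fromMC   = [1] of { bit };      /* MemoryController -> MainCache        */
    chan snoopIn  = [1] of { mtype };    /* MemoryController -> Snooper          */
    chan snoopOut = [1] of { mtype };    /* Snooper          -> MemoryController */

    proctype MainCache() {
        bit d;
        do
        :: fromProc ? d ->               /* a request arrives from the processor */
           if
           :: cacheState == INVALID ->   /* miss: ask the MC for data/permission */
              toMC ! DATA_REQ;
              fromMC ? d;
              cacheState = SHARED; cacheData = d
           :: else -> skip               /* hit: service the request locally     */
           fi
        od
    }

    proctype Snooper() {
        mtype q;
        do
        :: snoopIn ? q ->                /* MC asks this cache to give up the line */
           cacheState = INVALID;
           snoopOut ! ACK
        od
    }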

MemoryControllerModel
Responsible for servicing Cache requests.
Three types of requests:
– Data request: the Cache requires up-to-date data to supply to the processor.
– Permission-to-store: a Cache may not transition to the Modified state without the MC's permission.
– A combination of these two.
All types of requests may require the MC to communicate with all system caches (via their Snooper processes) to ensure coherence.
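
A dispatch loop over these three request types might be sketched as follows; the message names and the single-snoop simplification are assumptions:

    mtype = { DATA_REQ, PERM_TO_STORE, DATA_AND_PERM, SNOOP_INV, ACK };

    chan reqFromCache = [1] of { mtype, byte };   /* request type, requesting cache id */
    chan toSnooper    = [1] of { mtype };
    chan fromSnooper  = [1] of { mtype };
    chan rspToCache   = [1] of { mtype, bit };

    proctype MemoryController() {
        mtype req; byte who;
        do
        :: reqFromCache ? req, who ->
           toSnooper ! SNOOP_INV;                 /* any request may require snooping */
           fromSnooper ? ACK;                     /* ...the other caches first        */
           if
           :: req == DATA_REQ      -> rspToCache ! DATA_REQ, 1       /* supply data        */
           :: req == PERM_TO_STORE -> rspToCache ! PERM_TO_STORE, 0  /* grant store rights */
           :: req == DATA_AND_PERM -> rspToCache ! DATA_AND_PERM, 1  /* both at once       */
           fi
        od
    }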

Implementation of Busses
All processes represent independent state machines, so a communication mechanism is needed.
Spin depth-1 queues are used to simulate communication.
Because queue reads are destructive and blocking, a global bool is needed to indicate bus activity (required for polling).
– Sharing this global between processes is considered valid: it only makes up for differences between Spin queues and real busses.
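
One way this could look in Promela is sketched below; the flag and helper names are assumptions, and a real model would attach such a flag to every bus:

    mtype = { REQ, RSP };

    chan bus     = [1] of { mtype, bit };   /* depth-1 queue simulating one bus      */
    bool busBusy = false;                   /* set on every send, cleared on receive */

    inline bus_send(m, d) {
        atomic { bus ! m, d; busBusy = true }
    }

    inline bus_recv(m, d) {
        atomic { bus ? m, d; busBusy = false }
    }

    active proctype Sender() {
        bus_send(REQ, 1)
    }

    active proctype Poller() {
        mtype m; bit d;
        do
        :: busBusy ->              /* poll the flag instead of blocking on the queue */
           bus_recv(m, d);
           break
        :: else -> skip            /* do other work while the bus is idle            */
        od
    }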

Problems - Part 1
MainCache and Snooper were initially implemented as a single process, which non-deterministically chooses which role to execute at each iteration.
Communication between Processor/Cache and Cache/Memory is done with blocking queues.
A blocked receive in MainCache --> the Snooper cannot execute.
This leads to deadlock in certain situations.
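
The shape of the problem can be illustrated with a small fragment (assumed names, not the original code): once the MainCache branch commits to a blocking receive, the Snooper branch can no longer be selected.

    chan procReq = [1] of { bit };
    chan mcRsp   = [1] of { bit };
    chan snoopQ  = [1] of { bit };

    proctype CombinedCache() {
        bit d;
        do
        :: procReq ? d ->      /* MainCache branch: needs a reply from the MC       */
           mcRsp ? d           /* blocks here, so the Snooper branch below is stuck */
        :: snoopQ ? d -> skip  /* Snooper branch: only reachable between iterations */
        od
    }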

Solution 1
Split MainCache and Snooper into separate processes.
Both can access the global cacheData and cacheState variables independently.

Problems - Part 2
As separate processes, Snooper and MainCache could change the cache state unpredictably.
Race condition: the Snooper changes the cache state/data while MainCache is in mid-transaction --> invalidated data is returned to the processor.

Solution 2
Add a locking mechanism to the cache.
– MainCache or Snooper may only access the cache if it first locks it.
Locking mechanism: for simplicity, we cheated by using Spin's atomic keyword to implement a test-and-set on a shared variable.
Assumption: real hardware would have some similar mechanism available to lock caches.
Question: is the revised model still equivalent to the original?
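
Such a test-and-set lock built from atomic might be sketched as follows (variable and helper names are illustrative assumptions):

    bool cacheLocked = false;

    inline lock_cache() {
        atomic { !cacheLocked -> cacheLocked = true }   /* test-and-set in one step */
    }

    inline unlock_cache() {
        cacheLocked = false
    }

    /* Both cache halves must take the lock before touching the line. */
    active proctype MainCacheUser() {
        lock_cache();
        /* ... read/modify cacheState and cacheData ... */
        unlock_cache()
    }

    active proctype SnooperUser() {
        lock_cache();
        /* ... invalidate the line on behalf of the memory controller ... */
        unlock_cache()
    }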

Problem 3
The memory controller allows multiple outstanding requests from caches.
The Snooper of a cache that has a MainCache request outstanding cannot respond to MC queries for other outstanding requests (because the cacheline is locked).
Deadlock.

Solution 3
Disallow multiple outstanding Cache/MC transactions.
Introduce a global bool shared across all caches: outstandingBusOp.
A cache may only issue requests to the memory controller if no requests from other caches are outstanding.
Global knowledge across all caches is unrealistic; is this equivalent to retries from the MC?
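
The rule could be modeled with a guard on the shared flag, roughly as below (names other than outstandingBusOp are assumptions):

    bool outstandingBusOp = false;

    inline begin_bus_op() {
        atomic { !outstandingBusOp -> outstandingBusOp = true }
    }

    inline end_bus_op() {
        outstandingBusOp = false
    }

    /* Each cache wraps one complete cache <-> MC transaction in the guard. */
    active [2] proctype CacheSide() {
        do
        :: begin_bus_op();
           /* ... issue the request, wait for the MC's response ... */
           end_bus_op()
        od
    }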

Problem 4
The previous problems caused failures in Spin simulation within 1000 steps.
With the last solution in place, the random-simulation failures vanish (none appear within the first 3000 steps).
Verification, however, still fails after ~20000 steps.
The cause of this problem is as yet unresolved.

Verification
How can coherence be verified in general?
Verify something stronger: a processor will never see a conflicting ordering of data if it always sees the newest data available in the system.
For all Loads, assert that the data received is new.

Modeling of Data
Concern: modeling data as a random integer would cause Spin to run out of memory.
Instead, model data as a bit with the values OLD and NEW.
All processor Stores store NEW data.
When transitioning to the Modified state, a cache changes all other copies of the data, in memory and in other caches, to OLD.
– The global access to data here is strictly part of the verification effort, not of the algorithm, and is therefore allowed.
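
A sketch of this abstraction is shown below; the array layout, the helper name, and the self-check are assumptions added for illustration:

    #define OLD 0
    #define NEW 1
    #define NCACHE 2

    bit  memData = NEW;
    bit  cacheData[NCACHE];
    byte k;                          /* loop index, global only to keep the sketch short */

    /* Verification-only helper: when cache "me" transitions to Modified,
       every other copy of the data in the system is marked OLD. */
    inline mark_others_old(me) {
        d_step {
            memData = OLD;
            k = 0;
            do
            :: k < NCACHE ->
               if
               :: k != me -> cacheData[k] = OLD
               :: else    -> skip
               fi;
               k++
            :: else -> break
            od;
            cacheData[me] = NEW      /* the store itself always writes NEW */
        }
    }

    active proctype Writer() {
        mark_others_old(0);
        assert(cacheData[0] == NEW && memData == OLD)
    }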

Debugging
Found debugging parallel processes difficult.
It was made much easier by Spin's message sequence diagrams.
– They graphically show the sends and receives of all messages.
– This requires using Spin queues rather than globals for interprocess communication.

Future work
Make the existing protocol completely bug-free.
Activate additional features that were disabled for debugging purposes (e.g. bus transaction types).
Verify protocol-specific rules:
– No two caches may be simultaneously Modified.
– A cache is Modified or Exclusive --> no other cache is Shared.