Dynamic Verification of Sequential Consistency

Presentation transcript:

Dynamic Verification of Sequential Consistency
Albert Meixner, Dept. of Computer Science, Duke University
Daniel J. Sorin, Dept. of Electrical and Computer Engineering, Duke University

Introduction
- Multithreaded systems are becoming ubiquitous
  - Commercial workloads rely heavily on parallel machines
  - Reliability and availability are crucial
- Backward Error Recovery can provide high availability
  - Recover to a known good state upon error
  - But can only recover from errors detected in time
- Memory system is of special interest
  - Complex: many components, large transistor count
  - Numerous error hazards

Memory System Error Detection
- Must cover all memory system components
  - DRAMs, caches, controllers, interconnect, and write buffers
- Mechanisms for individual components exist
  - Storage structures: ECC
  - Interconnect: checksums, sequence numbering
  - Cache and memory controllers: replication
- Adding detection to all components is hard
  - Complicates design of every component
  - Requires good intuition of interactions and possible errors
→ Want comprehensive, end-to-end error detection

Dynamic Verification
- Dynamic verification
  - Correct system operation is constantly monitored at runtime
  - End-to-end scheme
  - Detects transient errors, design bugs, and manufacturing errors
  - Differs from statically verifying that the design is bug-free
- High-level invariants are checked, instead of individual components
  - Simplified design of system components
  - Can detect any low-level error that violates an invariant

Memory Consistency
- Memory consistency model
  - Formal specification of memory system behavior in a multithreaded system
  - Defines the order in which memory accesses from different CPUs can become globally visible
  - Many consistency models exist; we focus on one
- Verifying memory consistency = verifying correctness of the memory system
  - Ideal invariant for dynamic verification

Sequential Consistency (SC)
- Requires the appearance of a total global order of all loads and stores in the system
  - Each load must receive the value of the most recent store, in the total order, to the same address
  - Program order of all processors is preserved in the total order
- SC is the most intuitive consistency model
  - Good for programmers
  - Speculation can make SC almost as fast as more relaxed models
- Our contribution: Dynamic Verification of Sequential Consistency (DVSC)

Outline
- Introduction
- DVSC-Direct
- DVSC-Indirect
- Results
- Conclusion

DVSC-Direct
[Figure: per-CPU program order and the global order at the verifier]
- CPU 1, program order: t=1.1 LD A→1; t=2.1 ST B←2; t=3.1 LD A→2
- CPU 2, program order: t=1.2 LD C→1; t=2.2 ST A←2; t=3.2 LD C→1
- Verifier: holds the global order of all six operations
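The direct check can be illustrated with a minimal Python sketch. It is not the authors' hardware design. It assumes that a timestamp such as t=2.1 means "time 2 on CPU 1", that the global order is obtained by sorting on (time, CPU), and that A and C initially hold 1 and B holds 0 (initial values are not given on the slide). The names `Op`, `check_global_order`, `OPS`, and `INITIAL` are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple, Optional


class Op(NamedTuple):
    time: int   # assumed meaning of the integer part of "t=2.1"
    cpu: int    # assumed meaning of the fractional part; breaks ties
    kind: str   # "LD" or "ST"
    addr: str
    value: int  # value loaded or stored


def check_global_order(ops: Iterable[Op], initial: Dict[str, int]) -> Optional[Op]:
    """Replay ops in (time, cpu) order; return the first bad load, or None."""
    memory = dict(initial)
    for op in sorted(ops, key=lambda o: (o.time, o.cpu)):
        if op.kind == "ST":
            memory[op.addr] = op.value
        elif memory.get(op.addr) != op.value:
            # The load did not see the most recent store in the total order.
            return op
    return None


# The six operations from the slide, listed in per-CPU program order.
OPS = [
    Op(1, 1, "LD", "A", 1), Op(2, 1, "ST", "B", 2), Op(3, 1, "LD", "A", 2),
    Op(1, 2, "LD", "C", 1), Op(2, 2, "ST", "A", 2), Op(3, 2, "LD", "C", 1),
]
INITIAL = {"A": 1, "B": 0, "C": 1}  # assumed initial memory contents

print(check_global_order(OPS, INITIAL))
```

Because each CPU's timestamps increase along its program order, sorting by timestamp keeps program order intact in this example; a fuller checker would also verify that property explicitly.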

DVSC-Indirect: Idea
- Verify conditions sufficient for Sequential Consistency
  - In-order performance of memory operations
  - Cache coherence
  - Conditions formally defined and proven by Plakal et al. [SPAA 1998]
- Two mechanisms
  - On-chip checker for in-order performance
  - Distributed checker for cache coherence

In-Order Performance Verification
- A load of block B receives the value of either
  - the most recent local store to B, or
  - the most recent global store to B performed after all local stores
- Trivially observed on an in-order processor with coherent caches
- Modern processors execute out of order
  - Results of out-of-order execution are considered speculative until in-order re-execution and verification
- DVSC-Indirect uses the DIVA checker core by Austin [MICRO 1999]
  - Could substitute other mechanisms

Cache Coherence
- All processors observe the same order of stores to a given memory location
  - Difficult because the same memory location can exist in different caches
- Maintained by a coherence protocol
  - Different protocols: MOSI, MSI, MOESI, Token Coherence, …
  - Different maintenance mechanisms: directory, snooping
- Verification uses "divide and conquer"
  - Verify conditions provably sufficient for cache coherence
  - Initially defined for proof of sequential consistency by Plakal et al. [SPAA 1998]

Cache Coherence Verification
- Coherence conditions
  - Cache accesses are contained in an epoch
    - Stores in read-write epochs
    - Loads in read-write or read-only epochs
  - Read-write epochs do not overlap other epochs
  - Block data at the beginning of an epoch equals block data at the end of the last read-write epoch
- Verification
  - Check if accesses are in an appropriate epoch during DIVA replay
  - Collect epoch information at every node and send it to the verifier
  - Verifier checks epoch history for overlaps and data propagation
- Epoch: the time interval between obtaining and losing permissions on a block
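The three coherence conditions can be written down as a small Python sketch over the complete epoch history of one block. This is an illustration under assumptions, not the verifier described in the talk: the `Epoch` record, the function names, the half-open interval convention (an epoch covers begin <= t < end), and the use of plain integers for block data are all hypothetical.

```python
from typing import List, NamedTuple, Optional


class Epoch(NamedTuple):
    kind: str        # "RW" (read-write) or "RO" (read-only)
    begin: int       # time permission was obtained
    end: int         # time permission was lost
    begin_data: int  # block contents at epoch begin
    end_data: int    # block contents at epoch end


def access_in_epoch(op: str, time: int, epoch: Optional[Epoch]) -> bool:
    """Stores need a read-write epoch; loads need any epoch covering `time`."""
    if epoch is None or not (epoch.begin <= time < epoch.end):
        return False
    return epoch.kind == "RW" if op == "ST" else True


def history_is_coherent(history: List[Epoch], initial_data: int) -> bool:
    """Check overlap and data propagation for one block's full history."""
    epochs = sorted(history, key=lambda e: e.begin)
    # A read-write epoch may not overlap any other epoch.
    for i, a in enumerate(epochs):
        for b in epochs[i + 1:]:
            if b.begin < a.end and "RW" in (a.kind, b.kind):
                return False
    # Each epoch must begin with the data left by the last read-write epoch.
    data = initial_data
    for e in epochs:
        if e.begin_data != data:
            return False
        if e.kind == "RW":
            data = e.end_data
    return True


good = [
    Epoch("RW", 0, 10, 0, 5),
    Epoch("RO", 10, 20, 5, 5),
    Epoch("RO", 12, 25, 5, 5),  # read-only epochs may overlap each other
    Epoch("RW", 25, 30, 5, 7),
]
print(history_is_coherent(good, initial_data=0))
```

This version stores the whole history and compares every pair of epochs, so it is only suitable as a reference for what the conditions mean.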

Implementation Overview
[Figure: three nodes connected by an interconnect. Each node has a CPU core with a DIVA checker, a cache that records epochs, and a memory that collects epochs, verifies epochs, and keeps an epoch history.]

At the Cache Controller
- All caches keep track of active epochs in the Cache Epoch Table (CET)
  - Epoch Inform sent to the memory controller when an epoch ends
  - Begin and end data are hashed
- Every DIVA cache access checks the CET for an active epoch
  - Ensures the access is contained in an epoch
- Verification is off the critical path
  - Second-order performance effect from bandwidth usage
- Fields (Epoch Inform / CET): type (read-write or read-only), begin time, begin data, end time, end data
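A possible software model of the CET is sketched below in Python. The class and field names are hypothetical, the split of fields between a CET entry and an Epoch Inform is an assumption, and CRC32 is only a stand-in because the slides do not name the hash function.

```python
import zlib
from dataclasses import dataclass
from typing import Dict


@dataclass
class ActiveEpoch:
    kind: str        # "RW" (read-write) or "RO" (read-only)
    begin_time: int
    begin_hash: int  # hash of the block data when the epoch began


@dataclass
class EpochInform:
    block: int
    kind: str
    begin_time: int
    begin_hash: int
    end_time: int
    end_hash: int


def hash_block(data: bytes) -> int:
    # CRC32 is an assumed placeholder for the unspecified hash.
    return zlib.crc32(data)


class CacheEpochTable:
    def __init__(self) -> None:
        self.active: Dict[int, ActiveEpoch] = {}

    def begin_epoch(self, block: int, kind: str, time: int, data: bytes) -> None:
        self.active[block] = ActiveEpoch(kind, time, hash_block(data))

    def check_access(self, block: int, op: str) -> bool:
        """Loads need any active epoch; stores need a read-write epoch."""
        epoch = self.active.get(block)
        if epoch is None:
            return False
        return op == "LD" or epoch.kind == "RW"

    def end_epoch(self, block: int, time: int, data: bytes) -> EpochInform:
        """Close the epoch and build the message for the memory controller."""
        epoch = self.active.pop(block)
        return EpochInform(block, epoch.kind, epoch.begin_time,
                           epoch.begin_hash, time, hash_block(data))


cet = CacheEpochTable()
cet.begin_epoch(0x40, "RO", 10, bytes(64))
print(cet.check_access(0x40, "LD"), cet.check_access(0x40, "ST"))
```

Hashing the begin and end data keeps each Epoch Inform small, which matters because the slides identify interconnect bandwidth as the main cost of the scheme.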

At the Memory Controller
- Check for epoch overlaps and correct value propagation
- Generally requires the entire block history → O(N) space
- If epoch informs are processed in order:
  - Need end value of last read-write epoch for propagation check
  - Need end time of last read-write and last read-only epoch for overlap check
  - → O(1) space
- Epochs arrive almost in order
  - Fix remaining reorderings in a priority queue before verification
- Epoch state is kept in the Memory Epoch Table (MET)
  - Last end time of read-only epoch and read-write epoch, last value
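The constant-space check can be sketched in Python as follows. Everything here is illustrative: the tuple layout of an inform, the fixed-size reorder `window`, the initial MET contents (times 0, value hash 0), and the choice to record errors in a list are assumptions, not details from the talk.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List, Tuple

# (begin_time, block, kind, end_time, begin_hash, end_hash); begin_time is
# first so the heap orders informs by the time their epoch began.
Inform = Tuple[int, int, str, int, int, int]


@dataclass
class MetEntry:
    last_rw_end: int = 0  # end time of the last read-write epoch
    last_ro_end: int = 0  # latest end time of any read-only epoch
    last_value: int = 0   # hashed data at the end of the last read-write epoch


class MemoryEpochVerifier:
    def __init__(self, window: int) -> None:
        self.window = window          # assumed bound on how late informs arrive
        self.queue: List[Inform] = []
        self.met: Dict[int, MetEntry] = {}
        self.errors: List[Inform] = []

    def receive(self, inform: Inform) -> None:
        """Buffer an inform; verify the oldest ones once the window is full."""
        heapq.heappush(self.queue, inform)
        while len(self.queue) > self.window:
            self._verify(heapq.heappop(self.queue))

    def drain(self) -> None:
        while self.queue:
            self._verify(heapq.heappop(self.queue))

    def _verify(self, inform: Inform) -> None:
        begin, block, kind, end, begin_hash, end_hash = inform
        entry = self.met.setdefault(block, MetEntry())
        # No epoch may overlap the last read-write epoch, and every epoch
        # must start from the value that epoch left behind.
        ok = begin >= entry.last_rw_end and begin_hash == entry.last_value
        if kind == "RW":
            # A read-write epoch may not overlap earlier read-only epochs.
            ok = ok and begin >= entry.last_ro_end
            entry.last_rw_end = max(entry.last_rw_end, end)
            entry.last_value = end_hash
        else:
            entry.last_ro_end = max(entry.last_ro_end, end)
        if not ok:
            self.errors.append(inform)


v = MemoryEpochVerifier(window=2)
v.receive((10, 1, "RO", 20, 7, 7))  # arrives before the epoch that precedes it
v.receive((0, 1, "RW", 10, 0, 7))
v.receive((20, 1, "RW", 30, 7, 9))
v.drain()
print(v.errors)
```

Each MET entry holds three values regardless of how many epochs the block has seen, which is the O(1) space property; the priority queue only has to be large enough to absorb the reordering introduced by the interconnect.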

Experimental Evaluation
- Empirically determine error detection capability
  - Error injection into caches, controller, interconnect, switches, etc.
- Quantify error-free overhead
  - Increase in interconnect bandwidth consumption
  - Potential decrease in application performance

Simulation Methodology
- Full-system simulation of 8-CPU UltraSPARC SMP
  - Simics functional simulation
  - GEMS-based timing simulation
  - 2 GB RAM, 4-way 32 KB I+D L1, 4-way 1 MB L2
  - SafetyNet for backward error recovery
  - MOSI-Directory and MOSI-Snooping
- Benchmarks
  - Apache 2: static web server
  - SpecJBB: 3-tier Java system
  - OLTP: online transaction system with DB2
  - Slashcode: dynamic website with Perl and MySQL
  - Barnes: Barnes-Hut from SPLASH2

Bottleneck Link Bandwidth - Directory

Error-Free Runtime - Directory slower

Conclusions
- DVSC-Direct and DVSC-Indirect enable end-to-end verification of the memory system
- DVSC-Indirect imposes acceptable hardware and performance overhead
- An extension of DVSC-Indirect to relaxed consistency is currently under development

Questions?