Token Coherence: Decoupling Performance and Correctness
Milo M. K. Martin, Mark D. Hill, David A. Wood
University of Wisconsin-Madison
ISCA-30 (2003)

Outline
Token Coherence Basics
–What Is Token Coherence?
–What Are Its Advantages?
Review Of Snooping And Directory-Based Coherence Schemes
Token Coherence Details
Evaluation Results And Conclusions

Token Coherence Basics
Decouple Interconnect Performance And Protocol Correctness
–Remember Amdahl's Law: Make The Common Case Fast, At The Expense Of Occasional, More Expensive Corner-Case Handling
Fast, Unordered Interconnect For Protocol Traffic
–Races Are Allowed; Handling Is Built In, But Deferred
Cache Lines Have Tokens To Track State
–Token Ownership Correlates With Coherence State (MOSI)
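
The token-to-state correlation can be sketched in a few lines of Python. This is a minimal sketch assuming the usual owner-token mapping (all T tokens corresponds to M, the owner token without all tokens to O, only non-owner tokens to S, no tokens to I); `State` and `state_from_tokens` are hypothetical names, not taken from the paper.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    O = "Owned"
    S = "Shared"
    I = "Invalid"


def state_from_tokens(held: int, has_owner: bool, total: int) -> State:
    """Map one cache's token holdings for a line to a MOSI state.

    `held` counts every token this cache holds for the line, including the
    owner token when `has_owner` is True; `total` is T, the tokens per line.
    """
    if not 0 <= held <= total:
        raise ValueError("token count out of range")
    if has_owner and held == 0:
        raise ValueError("the owner token is counted in `held`")
    if held == total:
        if not has_owner:
            raise ValueError("holding all T tokens implies holding the owner token")
        return State.M  # all tokens: may write
    if has_owner:
        return State.O  # owner token plus possibly others, but not all
    if held > 0:
        return State.S  # one or more non-owner tokens: may read
    return State.I      # no tokens: no access
```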

Token Coherence Benefits
A Fast Interconnect Makes The Common Case Fast
–Some Additional Traffic, Mostly From Broadcast Requests
–The Common Case Is Very Common (95+%)
Benefits Of Both Snooping And Directories
–Unordered Interconnect Messages (As With Directories)
–Cache-To-Cache Transfers Without Indirection (As With Snooping)

Coherence Review: Snooping
Requests Are Broadcast, All Agents Snoop
–Direct Communication, No Indirection (Low Latency)
–Every Agent Processes Every Message
The Bus Is The Point Of Synchronization (Total Order)
–Hard To Scale To Higher Speeds/Larger Systems
[Figure: P0, P1, P2, P3, and Mem on a shared bus; the total order of events, 1 before 2 before 3, is the same for all bus agents.]
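
A toy model of why the bus yields a total order: a minimal Python sketch, assuming FIFO bus arbitration. `SnoopingBus` and its methods are hypothetical illustrations. Because the single bus grants one request at a time and every agent snoops each granted request, every agent's log comes out identical.

```python
from collections import deque


class SnoopingBus:
    """Toy shared bus: requests are granted one at a time and every attached
    agent snoops each granted request, so all agents see one total order."""

    def __init__(self, agents):
        self.agents = list(agents)
        self.pending = deque()                        # requests awaiting arbitration
        self.observed = {a: [] for a in self.agents}  # what each agent snooped, in order

    def request(self, requester, event):
        self.pending.append((requester, event))

    def run(self):
        while self.pending:
            granted = self.pending.popleft()  # the bus serializes requests (FIFO here)
            for agent in self.agents:         # broadcast: every agent processes it
                self.observed[agent].append(granted)


bus = SnoopingBus(["P0", "P1", "P2", "P3", "Mem"])
bus.request("P1", 1)
bus.request("P3", 2)
bus.request("P0", 3)
bus.run()
print(bus.observed["P2"])  # [('P1', 1), ('P3', 2), ('P0', 3)]
```

The price of this simplicity is the scaling problem on the slide: every agent must process every message, and the single ordering point limits speed.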

Coherence Review: Directories
The Directory Manages The Ordering Of Requests
–The Interconnect Can Be Faster (Unordered)
–Adds A Level Of Indirection To Cache-To-Cache Transactions (Extra Latency)
–Messages Are Addressed To Specific Agents
[Figure: P0, P1, P2, P3, and Dir; interconnect order is irrelevant because ordering happens at the directory: 1 before 2 before 3.]
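
The cost of indirection can be made concrete by listing the critical-path messages of a miss whose data sits in another processor's cache. This hypothetical helper assumes the simplest textbook flows (no retries, no speculation): snooping goes from requester to owner and back, while a directory inserts an extra hop.

```python
def miss_path(protocol: str) -> list:
    """Critical-path messages (sender, receiver) for a miss whose data is in
    another processor's cache. Simplified, hypothetical flows."""
    if protocol == "snooping":
        # Broadcast request reaches the owner directly; owner replies with data.
        return [("requester", "owner"), ("owner", "requester")]
    if protocol == "directory":
        # Request visits the home directory first, which forwards it.
        return [("requester", "directory"),
                ("directory", "owner"),
                ("owner", "requester")]
    raise ValueError(f"unknown protocol: {protocol}")


for p in ("snooping", "directory"):
    print(p, len(miss_path(p)), "hops")
```

The two-hop path is what the slides mean by "Cache-To-Cache Transfers Without Indirection".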

Token Coherence Goals
Achieve The Efficiency Of Snooping
–Low-Latency Cache-To-Cache Transfers
Keep The Advantage Of Directories
–Fast, Unordered Interconnects Allowed
How?
–Use Tokens To Implement A "Correctness Substrate", With Performance Protocol(s) Built On Top
–Tokens Are Associated With Each Cache Line: At Least N Tokens Per Line (N = Number Of Processors), So Every Processor Can Hold A Token And Read At The Same Time

Correctness Substrate
Enforce Safety By Counting Tokens
1. Tokens Are Preserved (Always T Tokens Per Line)
2. Must Hold All Tokens To Write
3. Must Hold At Least One Token To Read
4. If A Message Has A Token, It Must Have Data
Optimization: Add A Special Owner Token
1. Tokens Are Preserved, With Exactly One Owner Token Per Line
2. Must Hold All Tokens To Write
3. Must Hold At Least One Token And Valid Data To Read (Non-Owner Tokens May Now Travel Without Data)
4. If A Message Has The Owner Token, It Must Have Data
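
The owner-token rules can be checked mechanically. Below is a minimal Python sketch of one line's token bookkeeping, assuming a single memory controller named `mem` that starts with all T tokens; `LineTokens` and its methods are hypothetical. It also assumes that a holder's copy stops being readable once it gives away its last token, which is how we read the requirement to hold a token and valid data.

```python
from dataclasses import dataclass, field


@dataclass
class LineTokens:
    """Token bookkeeping for ONE cache line (hypothetical model)."""
    total: int                                  # T, fixed tokens per line
    counts: dict = field(default_factory=dict)  # holder -> tokens held
    owner: str = "mem"                          # holder of the owner token
    valid: set = field(default_factory=set)     # holders with valid data

    def held(self, who: str) -> int:
        return self.counts.get(who, 0)

    def can_read(self, who: str) -> bool:
        # Rule 3: at least one token AND valid data.
        return self.held(who) >= 1 and who in self.valid

    def can_write(self, who: str) -> bool:
        # Rule 2: all T tokens (this necessarily includes the owner token).
        return self.held(who) == self.total

    def send(self, src: str, dst: str, n: int, owner: bool = False, data: bool = False) -> None:
        if not 1 <= n <= self.held(src):
            raise ValueError("can only send tokens you hold")
        if owner and self.owner != src:
            raise ValueError("only the current owner can send the owner token")
        if self.owner == src and not owner and n == self.held(src):
            raise ValueError("the owner token must be sent explicitly")
        if owner and not data:
            raise ValueError("Rule 4: the owner token must travel with data")
        if data and src not in self.valid:
            raise ValueError("cannot send data you do not have")
        self.counts[src] -= n
        self.counts[dst] = self.held(dst) + n
        if owner:
            self.owner = dst
        if data:
            self.valid.add(dst)
        if self.counts[src] == 0:
            self.valid.discard(src)  # no tokens left: stale copy may not be read
        # Rule 1: tokens are conserved.
        assert sum(self.counts.values()) == self.total


line = LineTokens(total=4, counts={"mem": 4}, valid={"mem"})
line.send("mem", "P0", 1, data=True)  # P0 gets a readable shared copy
line.send("mem", "P1", 1)             # P1 gets a token but no data
print(line.can_read("P0"), line.can_read("P1"))  # True False
```

P1 illustrates why the optimized Rule 3 needs valid data: it holds a token, yet it has nothing it may read.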

A Token Coherence Protocol
TokenB: Token Coherence Using Broadcast
–Processors Broadcast Transient Token Requests
–Like Snooping, But Without A Total Order Of Transactions On The Interconnect
Races Are Allowed To Occur
–Transient Requests That Go Unsatisfied Are Reissued
Last Resort (After Several Reissues): The Aptly Named Persistent Request, Which Prevents Starvation
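
The escalation from transient to persistent requests can be sketched as a simple loop. This is a hypothetical Python sketch: the callbacks, the function name, and the default of four transient attempts are assumptions standing in for the real network and timeout machinery.

```python
def tokenb_request(try_transient, issue_persistent, max_transient_tries=4):
    """Hypothetical TokenB-style requester loop.

    try_transient(attempt) -> bool: broadcast one transient request and wait
        (up to some timeout) for enough tokens; True on success.
    issue_persistent() -> None: fall back to a persistent request, which is
        assumed to be satisfied eventually (starvation freedom).
    Returns a label saying which path completed the miss.
    """
    for attempt in range(max_transient_tries):
        if try_transient(attempt):
            return f"transient (attempt {attempt + 1})"
    issue_persistent()
    return "persistent"
```

The common case returns on the first transient attempt; only requests that keep losing races pay for the heavier persistent mechanism.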

Performance Evaluation
Compared To: Snooping, Directories, And An "Estimated" Glueless Protocol
Simulation Results Show:
–Token Coherence Works And Performs Well
–Even For A 16-Processor System, >95% Of First Transient Requests Succeed With TokenB
–Potentially Significant Additional Traffic Compared With Directories
–So Not Ultimately As Scalable As Directories
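
The scalability concern is mostly arithmetic: a broadcast request must be delivered to, and processed by, every other node, while a directory request goes to one home node. A deliberately crude Python model, assuming no multicast and counting request endpoints only (no responses or reissues); `request_endpoints` is a hypothetical helper.

```python
def request_endpoints(n_nodes: int, protocol: str) -> int:
    """Number of nodes that must receive (and process) one request.
    Illustrative model: ignores responses, reissues, and multicast."""
    if protocol == "broadcast":
        return n_nodes - 1  # every other node sees the request
    if protocol == "directory":
        return 1            # request goes only to the line's home directory
    raise ValueError(f"unknown protocol: {protocol}")


for n in (4, 16, 64):
    print(n, request_endpoints(n, "broadcast"), request_endpoints(n, "directory"))
```

Broadcast work grows linearly with system size while the directory's stays constant, which is the sense in which TokenB is "not ultimately as scalable".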

Conclusions And Discussion
Token Coherence Provides A Novel Solution To The Snooping/Directory Hybrid Optimization Problem
–Is The Additional Space For Token Storage An Issue?
–Why Didn't Hammer/21364 Use This? With Glueless Designs A Trend, Is This Doomed To Academia?
–How Does This Compare To Multicast Snooping?
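
On the storage question, a rough estimate: a cache needs enough bits to count 0 to T tokens, plus one bit marking the owner token. A minimal sketch assuming that encoding and 64-byte lines (both assumptions, not the paper's figures); the function names are hypothetical.

```python
def token_bits_per_line(total_tokens: int) -> int:
    """Bits for a token count in [0, T] plus one owner-token bit
    (an illustrative encoding, not necessarily the paper's)."""
    count_bits = total_tokens.bit_length()  # bits needed to represent T itself
    return count_bits + 1


def overhead(total_tokens: int, line_bytes: int = 64) -> float:
    """Token state as a fraction of the line's data bits."""
    return token_bits_per_line(total_tokens) / (line_bytes * 8)


for t in (16, 64):
    print(t, token_bits_per_line(t), f"{overhead(t):.2%}")
```

Under these assumptions, T = 16 costs 6 bits per line (about 1.2% of a 64-byte line's data) and T = 64 costs 8 bits (about 1.6%).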