Computer Science and Engineering Parallel and Distributed Processing CSE 8380 April 7, 2005 Session 23.



Contents
Review of Directory Based Protocols
Chained Directory
Centralized Directory Invalidate
Scalable Coherent Interface
Stanford Distributed Directory
Shared Memory Programming

Chained Directory
Chained directories emulate full-map directories by distributing the directory among the caches. This solves the directory size problem without restricting the number of shared block copies. Chained directories keep track of the shared copies of a particular block by maintaining a chain of directory pointers.

[Figure: Chained Directory. Caches C0, C1, C2, and C3 and memory are connected by an interconnection network. The entries for block X carry the pointer values C0, CT, and C2 alongside the data.]
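The pointer chain described for chained directories can be sketched in Python as follows. This is a minimal illustration, not the slides' own implementation: the class name `ChainedDirectory`, its method names, and the choice to link each new reader in at the front of the chain are assumptions, and `CT` is assumed to be the end-of-chain marker that the figure labels CT.

```python
CT = "CT"  # end-of-chain marker (assumed meaning of the CT label in the figure)


class ChainedDirectory:
    """Tracks the sharers of one block as a chain of pointers (hypothetical sketch)."""

    def __init__(self):
        self.memory_pointer = CT   # the only pointer kept at memory
        self.cache_pointer = {}    # per-cache pointer to the next sharer

    def read(self, cache_id):
        """Add cache_id to the chain (assumption: new readers are linked in first)."""
        if cache_id in self.cache_pointer:
            return  # this cache already holds a copy
        self.cache_pointer[cache_id] = self.memory_pointer
        self.memory_pointer = cache_id

    def sharers(self):
        """Walk the chain from memory until the terminator is reached."""
        chain = []
        node = self.memory_pointer
        while node != CT:
            chain.append(node)
            node = self.cache_pointer[node]
        return chain
```

Memory keeps a single pointer per block no matter how many caches share it, which is how the scheme sidesteps the directory size problem without capping the number of copies.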

Centralized Directory Invalidate
Invalidating signals and a pointer to the requesting processor are forwarded to all processors that have a copy of the block. Each invalidated cache sends an acknowledgment to the requesting processor. After the invalidation is complete, only the writing processor holds a cached copy of the block.

[Figure: Write by P3. Caches C0, C1, C2, and C3 and memory holding the directory entry and data for block X (the bit pattern 1010 appears in the entry). Message labels: Write, Invalidate & requester (twice), inv-ack, Write-reply.]
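The message flow of a centralized directory invalidate can be traced with a short Python sketch. The function name `centralized_write`, the tuple layout of the messages, and the relative order of the invalidations, acknowledgments, and write-reply are assumptions made for illustration; only the message labels and their senders and receivers follow the description above.

```python
def centralized_write(sharers, requester):
    """Return (new_sharers, messages) for a write by `requester`.

    `sharers` is the set of caches the directory lists for the block.
    Each message is a (label, sender, receiver) tuple.
    """
    messages = [("write", requester, "memory")]
    targets = sorted(sharers - {requester})
    # Memory forwards an invalidation plus the requester's identity
    # to every cache that holds a copy.
    for cache in targets:
        messages.append(("invalidate & requester", "memory", cache))
    # Each invalidated cache acknowledges directly to the requester.
    for cache in targets:
        messages.append(("inv-ack", cache, requester))
    # Assumed ordering: the reply to the writer is logged last.
    messages.append(("write-reply", "memory", requester))
    # After invalidation only the writer holds a copy.
    return {requester}, messages
```

For example, `centralized_write({"C0", "C2"}, "C3")` yields two invalidations and two acknowledgments, and leaves C3 as the only holder of the block.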

Scalable Coherent Interface (SCI)
SCI uses a doubly linked list of distributed directories. Each cached block is entered into a list of processors sharing that block. For every block address, the memory and cache entries have additional tag bits. Part of the memory tag identifies the first processor in the sharing list (the head). Part of each cache tag identifies the previous and following sharing list entries.

SCI Scenarios
Initially, memory is in the uncached state and cached copies are invalid. A read request is directed from a processor to the memory controller. The requested data is returned to the requester’s cache, and its entry state is changed from invalid to head. This changes the memory state from uncached to cached.

SCI Scenarios (Cont.)
When a new requester directs its read request to memory, the memory returns a pointer to the head. A cache-to-cache read request (called Prepend) is sent from the requester to the head cache. On receiving the request, the head cache sets its backward pointer to point to the requester’s cache. The requested data is returned to the requester’s cache, and its entry state is changed to head.

[Figure: SCI Sharing List Addition. Before: memory points to cache C0 (head); cache C2 (invalid) issues 1) read to memory and 2) prepend to C0. After: cache C2 is the head and cache C0 is a middle entry.]
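The two SCI read scenarios above can be sketched in Python with explicit forward and backward pointers. The class names `SciCache` and `SciMemory`, the function `sci_read`, and the rule that a demoted head becomes "middle" when it has a successor and "tail" otherwise are illustrative assumptions; the slides only state that the requester ends in the head state.

```python
class SciCache:
    """One cache entry for a block (hypothetical sketch)."""

    def __init__(self, name):
        self.name = name
        self.state = "invalid"
        self.forward = None    # following entry in the sharing list
        self.backward = None   # previous entry in the sharing list


class SciMemory:
    """Memory tag for the same block."""

    def __init__(self):
        self.state = "uncached"
        self.head = None       # first processor in the sharing list


def sci_read(memory, requester):
    """Handle a read request from a cache that holds no copy."""
    if memory.state == "uncached":
        # First scenario: memory supplies the data directly.
        memory.state = "cached"
    else:
        # Second scenario: memory returns a pointer to the head, and
        # the requester sends a prepend request to that cache.
        old_head = memory.head
        old_head.backward = requester
        # Assumed naming for the demoted entry.
        old_head.state = "middle" if old_head.forward is not None else "tail"
        requester.forward = old_head
    requester.state = "head"
    memory.head = requester
```

After three reads by C0, C2, and C3 in that order, memory points to C3, C2 is a middle entry, and C0 is the tail.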

SCI Scenarios (Cont.)
The head of the list has the authority to purge other entries in the list to obtain an exclusive (read-write) entry.

[Figure: SCI Head Purging Other Entries. Memory points to cache C3 (head), which purges cache C2 (middle) and cache C0 (tail).]
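As a rough Python sketch of the purge step, the sharing list can be modeled as a plain list of cache names ordered head first. The function name `sci_purge` and the state labels in the returned dictionary are assumptions for illustration; the individual purge messages are not modeled.

```python
def sci_purge(sharing_list):
    """Let the head purge every other entry of a head-first sharing list.

    Returns the new sharing list and the resulting state of each cache.
    """
    head = sharing_list[0]
    states = {head: "exclusive"}      # head now owns a read-write entry
    for name in sharing_list[1:]:
        states[name] = "invalid"      # purged by the head
    return [head], states
```

Calling `sci_purge(["C3", "C2", "C0"])` leaves C3 as the only entry in the list, with C2 and C0 invalid.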

Stanford Distributed Directory (SDD)
SDD uses a singly linked list of distributed directories. As in the SCI protocol, memory points to the head of the sharing list. Each processor points only to its predecessor. Sharing list additions and removals are handled differently from the SCI protocol.

SDD Scenarios
On a read miss, a new requester sends a read-miss message to memory. The memory updates its head pointer to point to the requester and sends a read-miss-forward signal to the old head. On receiving the request, the old head returns the requested data along with its address as a read-miss-reply. When the reply is received at the requester’s cache, the data is copied and the pointer is made to point to the old head.

[Figure: SDD List Addition. Before: memory points to cache C0 (head); cache C2 is invalid. Messages: 1) read, 2) read-miss-forward, 3) read-miss-reply. After: cache C2 is the head and cache C0 is a middle entry.]
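The SDD read-miss sequence can be traced with a small Python sketch. The class name `SddBlock`, the message tuple layout, and the handling of the very first reader (memory is assumed to reply directly, a case the slides do not cover) are illustrative assumptions.

```python
class SddBlock:
    """Sharing state of one block under SDD (hypothetical sketch)."""

    def __init__(self):
        self.head = None      # memory's pointer to the head of the list
        self.pointer = {}     # each cache's single pointer
        self.messages = []    # (label, sender, receiver) tuples

    def read_miss(self, requester):
        old_head = self.head
        self.messages.append(("read-miss", requester, "memory"))
        # Memory updates its head pointer before the data arrives.
        self.head = requester
        if old_head is None:
            # Assumption: with no sharers, memory supplies the data itself.
            self.messages.append(("read-miss-reply", "memory", requester))
        else:
            self.messages.append(("read-miss-forward", "memory", old_head))
            # The old head returns the data together with its own address.
            self.messages.append(("read-miss-reply", old_head, requester))
        # The requester copies the data and points to the old head.
        self.pointer[requester] = old_head
```

Unlike the SCI prepend, the old head never learns who the new head is, so the list stays singly linked.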

SDD Scenarios (Cont.)
On a write miss, a requester sends a write-miss message to memory. The memory updates its head pointer to point to the requester and sends a write-miss-forward signal to the old head. The old head invalidates itself, returns the requested data as a write-miss-reply-data signal, and sends a write-miss-forward to the next cache in the list.

SDD Scenarios (Cont.)
When the next cache receives the write-miss-forward signal, it invalidates itself and sends a write-miss-forward to the next cache in the list. When the write-miss-forward signal is received by the tail, or by a cache that no longer has a copy of the block, a write-miss-reply is sent to the requester. The write is complete when the requester has received both the write-miss-reply-data and the write-miss-reply.

[Figure: SDD Write Miss List Removal. Before: cache C3 (invalid), cache C2 (head), cache C0 (tail), and memory. Messages: 1) write, 2) write-miss-forward, 3) write-miss-forward, 3) write-miss-reply-data, 4) write-miss-reply. After: cache C3 (exclusive) and memory.]
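The write-miss sequence can be traced in the same spirit. In this Python sketch the function name `sdd_write_miss`, the head-first list representation, and the empty-list case (memory is assumed to reply with the data itself) are assumptions; the requester is assumed to hold no copy, and the case of a cache that has already dropped its copy is not modeled.

```python
def sdd_write_miss(chain, requester):
    """Trace an SDD write miss.

    `chain` lists the current sharers head first. Returns the message
    log as (label, sender, receiver) tuples and the new sharing list.
    """
    log = [("write-miss", requester, "memory")]
    if not chain:
        # Assumption: with no sharers, memory supplies the data itself.
        log.append(("write-miss-reply-data", "memory", requester))
        return log, [requester]
    old_head = chain[0]
    log.append(("write-miss-forward", "memory", old_head))
    # The old head invalidates itself and returns the data.
    log.append(("write-miss-reply-data", old_head, requester))
    # The forward is relayed cache to cache, invalidating each in turn.
    for sender, receiver in zip(chain, chain[1:]):
        log.append(("write-miss-forward", sender, receiver))
    # The tail tells the requester that the whole list is invalidated.
    log.append(("write-miss-reply", chain[-1], requester))
    return log, [requester]
```

With `chain = ["C2", "C0"]` and requester C3, the log reproduces the five messages of the figure, and C3 ends as the only holder of the block.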

[Figure: Task Creation. A supervisor forks a set of workers and later joins them; the fork and join pattern is shown twice.]
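The supervisor and workers pattern in the figure can be sketched with Python's standard `threading` module. The function name `run_supervisor` and the per-worker `work` callback are hypothetical names chosen for illustration; the slides do not prescribe a particular threading API.

```python
import threading


def run_supervisor(num_workers, work):
    """Fork `num_workers` worker threads, then join them all."""
    results = [None] * num_workers

    def worker(index):
        # Each worker writes only its own slot of the shared list.
        results[index] = work(index)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_workers)]
    for t in threads:
        t.start()   # fork: the workers begin running
    for t in threads:
        t.join()    # join: the supervisor waits for every worker
    return results
```

For instance, `run_supervisor(4, lambda i: i * i)` collects one result per worker once all of them have finished.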

[Figure: Serial vs. Parallel. A serial process consists of code, data, and stack. A parallel process consists of code, private data, private stack, and shared data.]

[Figure: Communication via Shared Data. Process 1 and Process 2 each have their own code, private data, and private stack, and both access the shared data.]

[Figure: Synchronization. Processes P1, P2, and P3 each execute lock ... unlock; processes that find the lock held wait.]
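A lock protecting shared data can be sketched with `threading.Lock` from the Python standard library. The function name `locked_increment` and the shared counter are hypothetical; they stand in for whatever shared data the processes in the figure update between lock and unlock.

```python
import threading


def locked_increment(num_threads, increments):
    """Have several threads update one shared counter under a lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:          # lock: other threads wait here
                counter += 1    # critical section on shared data
            # unlock happens automatically when the with block exits

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

Because every update happens while the lock is held, the final count equals the number of threads multiplied by the increments per thread.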

[Figure: Barriers. Threads T0, T1, and T2 each reach the barrier and wait; all proceed from the synchronization point once the last one arrives.]
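A barrier can be sketched with `threading.Barrier` from the Python standard library. The function name `run_with_barrier` and the event log are hypothetical and exist only to make the ordering visible.

```python
import threading


def run_with_barrier(num_threads):
    """Record what each thread does before and after a barrier."""
    barrier = threading.Barrier(num_threads)
    events = []
    events_lock = threading.Lock()

    def worker(index):
        with events_lock:
            events.append(("before", index))
        barrier.wait()   # wait until every thread has arrived
        with events_lock:
            events.append(("after", index))   # proceed

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events
```

No thread passes the barrier until all have reached it, so every "before" event is recorded ahead of any "after" event, whatever the scheduling order.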