Computer Science and Engineering Copyright by Hesham El-Rewini Advanced Computer Architecture CSE 8383 February 28 2006 Session 13.

Slides:



Advertisements
Similar presentations
L.N. Bhuyan Adapted from Patterson’s slides
Advertisements

MESI cache coherence protocol
CMSC 611: Advanced Computer Architecture
SE-292 High Performance Computing
Multi-core systems System Architecture COMP25212 Daniel Goodman Advanced Processor Technologies Group.
Chapter 8-1 : Multiple Processor Systems Multiple Processor Systems Multiple Processor Systems Multiprocessor Hardware Multiprocessor Hardware UMA Multiprocessors.
Introduction to MIMD architectures
Technical University of Lodz Department of Microelectronics and Computer Science Elements of high performance microprocessor architecture Shared-memory.
CIS629 Coherence 1 Cache Coherence: Snooping Protocol, Directory Protocol Some of these slides courtesty of David Patterson and David Culler.
1 Multiprocessors. 2 Idea: create powerful computers by connecting many smaller ones good news: works for timesharing (better than supercomputer) bad.
BusMultis.1 Review: Where are We Now? Processor Control Datapath Memory Input Output Input Output Memory Processor Control Datapath  Multiprocessor –
1 Lecture 20: Coherence protocols Topics: snooping and directory-based coherence protocols (Sections )
1 Lecture 23: Multiprocessors Today’s topics:  RAID  Multiprocessor taxonomy  Snooping-based cache coherence protocol.
1 CSE SUNY New Paltz Chapter Nine Multiprocessors.
CPE 731 Advanced Computer Architecture Snooping Cache Multiprocessors Dr. Gheith Abandah Adapted from the slides of Prof. David Patterson, University of.
Lecture 37: Chapter 7: Multiprocessors Today’s topic –Introduction to multiprocessors –Parallelism in software –Memory organization –Cache coherence 1.
1 Shared-memory Architectures Adapted from a lecture by Ian Watson, University of Machester.
Spring 2003CSE P5481 Cache Coherency Cache coherent processors reading processor must get the most current value most current value is the last write Cache.
Introduction to Symmetric Multiprocessors Süha TUNA Bilişim Enstitüsü UHeM Yaz Çalıştayı
Computer Science and Engineering Copyright by Hesham El-Rewini Advanced Computer Architecture.
1 Cache coherence CEG 4131 Computer Architecture III Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini.
August 15, 2001Systems Architecture II1 Systems Architecture II (CS ) Lecture 12: Multiprocessors: Non-Uniform Memory Access * Jeremy R. Johnson.
CMPE 421 Parallel Computer Architecture Multi Processing 1.
Computer Science and Engineering Parallel and Distributed Processing CSE 8380 February Session 6.
Cache Control and Cache Coherence Protocols How to Manage State of Cache How to Keep Processors Reading the Correct Information.
ECE200 – Computer Organization Chapter 9 – Multiprocessors.
Lecture 13: Multiprocessors Kai Bu
Computer Science and Engineering Copyright by Hesham El-Rewini Advanced Computer Architecture CSE 8383 January Session 2.
.1 Intro to Multiprocessors. .2 The Big Picture: Where are We Now? Processor Control Datapath Memory Input Output Input Output Memory Processor Control.
S YMMETRIC S HARED M EMORY A RCHITECTURE Presented By: Rahul M.Tech CSE, GBPEC Pauri.
Computer Science and Engineering Parallel and Distributed Processing CSE 8380 April 5, 2005 Session 22.
Computer Science and Engineering Copyright by Hesham El-Rewini Advanced Computer Architecture CSE 8383 March 20, 2008 Session 9.
1 Lecture 19: Scalable Protocols & Synch Topics: coherence protocols for distributed shared-memory multiprocessors and synchronization (Sections )
Computer Science and Engineering Copyright by Hesham El-Rewini Advanced Computer Architecture CSE 8383 January Session 2.
Understanding and Implementing Cache Coherency Policies CSE 8380: Parallel and Distributed Processing Dr. Hesham El-Rewini Presented by, Fazela Vohra CSE.
Computer Science and Engineering Copyright by Hesham El-Rewini Advanced Computer Architecture CSE 8383 February Session 9.
Computer Science and Engineering Copyright by Hesham El-Rewini Advanced Computer Architecture CSE 8383 February Session 10.
Computer Science and Engineering Copyright by Hesham El-Rewini Advanced Computer Architecture CSE 8383 May 2, 2006 Session 29.
August 13, 2001Systems Architecture II1 Systems Architecture II (CS ) Lecture 11: Multiprocessors: Uniform Memory Access * Jeremy R. Johnson Monday,
Additional Material CEG 4131 Computer Architecture III
Multiprocessor  Use large number of processor design for workstation or PC market  Has an efficient medium for communication among the processor memory.
1 Lecture 17: Multiprocessors Topics: multiprocessor intro and taxonomy, symmetric shared-memory multiprocessors (Sections )
Computer Science and Engineering Parallel and Distributed Processing CSE 8380 April 7, 2005 Session 23.
Performance of Snooping Protocols Kay Jr-Hui Jeng.
CMSC 611: Advanced Computer Architecture Shared Memory Most slides adapted from David Patterson. Some from Mohomed Younis.
Computer Science and Engineering Copyright by Hesham El-Rewini Advanced Computer Architecture CSE 8383 February Session 7.
Computer Science and Engineering Copyright by Hesham El-Rewini Advanced Computer Architecture CSE 8383 February Session 11.
The University of Adelaide, School of Computer Science
Computer Engineering 2nd Semester
The University of Adelaide, School of Computer Science
The University of Adelaide, School of Computer Science
CS 147 – Parallel Processing
12.4 Memory Organization in Multiprocessor Systems
CMSC 611: Advanced Computer Architecture
Directory-based Protocol
Parallel and Multiprocessor Architectures – Shared Memory
Multiprocessors - Flynn’s taxonomy (1966)
/ Computer Architecture and Design
High Performance Computing
Slides developed by Dr. Hesham El-Rewini Copyright Hesham El-Rewini
Lecture 25: Multiprocessors
The University of Adelaide, School of Computer Science
Lecture 17 Multiprocessors and Thread-Level Parallelism
Cache coherence CEG 4131 Computer Architecture III
Lecture 24: Virtual Memory, Multiprocessors
Lecture 23: Virtual Memory, Multiprocessors
Lecture 17 Multiprocessors and Thread-Level Parallelism
CSL718 : Multiprocessors 13th April, 2006 Introduction
The University of Adelaide, School of Computer Science
Lecture 17 Multiprocessors and Thread-Level Parallelism
Presentation transcript:

Computer Science and Engineering Copyright by Hesham El-Rewini Advanced Computer Architecture CSE 8383 February Session 13

Computer Science and Engineering Copyright by Hesham El-Rewini Next time Problem from last time Shared memory Systems Cash Coherence Protocol Contents

Computer Science and Engineering Copyright by Hesham El-Rewini Problem Assume that a switching component such as a transistor can switch in zero time. We propose to construct a disk- shaped computer chip with such a component. The only limitation is the time it takes to send electronic signals from one edge of the chip to the other. Make the simplifying assumption that electronic signals travel 300,000 kilometers per second. What must be the diameter of a round chip so that it can switch 10 9 times per second? What would the diameter be if the switching requirements were time per second?

Computer Science and Engineering Copyright by Hesham El-Rewini MIMD Shared Memory Systems Interconnection Networks MMMM PPPPP

Computer Science and Engineering Copyright by Hesham El-Rewini Shared Memory Single address space Communication via read & write Synchronization via locks

Computer Science and Engineering Copyright by Hesham El-Rewini Classification Multi-port Uniform memory Access (UMA) Non-uniform Memory Access (NUMA) Cache Only Memory Architecture (COMA) M P2P2 P1P1

Computer Science and Engineering Copyright by Hesham El-Rewini Uniform Memory Access (UMA) C P C P C P C P MMMM Bus

Computer Science and Engineering Copyright by Hesham El-Rewini Non Uniform Memory Access (NUMA) M P M P M P Interconnection Network M P

Computer Science and Engineering Copyright by Hesham El-Rewini Cache Only Memory Architecture (COMA) C P D C P D C P D C P D Interconnection Network

Computer Science and Engineering Copyright by Hesham El-Rewini Bus Based & switch based SM Systems Global Memory P C P C P C P C P C P C P C MMMM

Computer Science and Engineering Copyright by Hesham El-Rewini Bus-based Shared Memory Collection of wires and connectors Only one transaction at a time Bottleneck!! How can we solve the problem? Global Memory PPPPP

Computer Science and Engineering Copyright by Hesham El-Rewini Using Caches Global Memory P1 C1 P2 C2 P3 C3 Pn Cn - Cache Coherence problem - How many processors?

Computer Science and Engineering Copyright by Hesham El-Rewini Group Activity Variables Number of processors (n) Hit rate (h) Bus Bandwidth (B) Processor speed (V) Maximum number of processors n ?

Computer Science and Engineering Copyright by Hesham El-Rewini Group Activity

Computer Science and Engineering Copyright by Hesham El-Rewini Single Processor caching P x x Memory Cache Hit: data in the cache Miss: data is not in the cache Hit rate: h Miss rate: m = (1-h)

Computer Science and Engineering Copyright by Hesham El-Rewini Cache Coherence Policies Writing to Cache in 1 processor case Write Through Write Back

Computer Science and Engineering Copyright by Hesham El-Rewini Writing in the cache P x x Before Memory Cache P x’ Write through Memory Cache P x’ x Write back Memory Cache

Computer Science and Engineering Copyright by Hesham El-Rewini Cache Coherence P1 x P2P3 x Pn x x -Multiple copies of x -What if P1 updates x?

Computer Science and Engineering Copyright by Hesham El-Rewini Cache Coherence Policies Writing to Cache in n processor case Write Update - Write Through Write Invalidate - Write Back Write Update - Write Through Write Invalidate - Write Back

Computer Science and Engineering Copyright by Hesham El-Rewini Write-invalidate P1 x P2P3 x x P1 x’ P2P3 I x’ P1 x’ P2P3 I x BeforeWrite Through Write back

Computer Science and Engineering Copyright by Hesham El-Rewini Write-Update P1 x P2P3 x x P1 x’ P2P3 x’ P1 x’ P2P3 x’ x BeforeWrite Through Write back

Computer Science and Engineering Copyright by Hesham El-Rewini Snooping Protocols Snooping protocols are based on watching bus activities and carry out the appropriate coherency commands when necessary. Global memory is moved in blocks, and each block has a state associated with it, which determines what happens to the entire contents of the block. The state of a block might change as a result of the operations Read-Miss, Read-Hit, Write-Miss, and Write-Hit.

Computer Science and Engineering Copyright by Hesham El-Rewini Write Invalidate Write Through Multiple processors can read block copies from main memory safely until one processor updates its copy. At this time, all cache copies are invalidated and the memory is updated to remain consistent.

Computer Science and Engineering Copyright by Hesham El-Rewini Write Through- Write Invalidate (cont.) StateDescription Valid [VALID] The copy is consistent with global memory Invalid [INV] The copy is inconsistent

Computer Science and Engineering Copyright by Hesham El-Rewini Write Through- Write Invalidate (cont.) EventActions Read HitUse the local copy from the cache. Read Miss Fetch a copy from global memory. Set the state of this copy to Valid. Write HitPerform the write locally. Broadcast an Invalid command to all caches. Update the global memory. Write Miss Get a copy from global memory. Broadcast an invalid command to all caches. Update the global memory. Update the local copy and set its state to Valid. ReplaceSince memory is always consistent, no write back is needed when a block is replaced.

Computer Science and Engineering Copyright by Hesham El-Rewini Example 1 C P C Q M X = 5 P reads XP reads X Q reads XQ reads X Q updates XQ updates X Q reads XQ reads X Q updates XQ updates X P updates XP updates X Q reads XQ reads X

Computer Science and Engineering Copyright by Hesham El-Rewini Complete the table (Write through write invalidate) MemoryP’sCacheQ’sCache EventXXStateXState 0 Original value 5 1 P reads X (Read Miss) 55VALID