Performance of Snooping Protocols
Kay Jr-Hui Jeng


Outline
Snooping protocols
Simulation results
Comparison of performance
Conclusion

Snooping protocols
A snooping protocol maintains cache coherence in symmetric multiprocessing environments. In a snooping system, all caches sit on a shared bus and snoop (monitor) the bus to determine whether they hold a copy of a block of data that is being requested on the bus. Each cache keeps the sharing status of every block it holds. Multiple copies of a block can typically be read without any coherence problem; to write, however, a processor must first gain exclusive access via the bus.
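As a rough sketch of the snooping mechanism, each cache watches transactions that other caches place on the bus and checks them against its own blocks. All class and method names here are illustrative, not from the slides:

```python
# Minimal sketch of bus snooping: every cache observes every bus
# transaction and reacts only if it holds a copy of the block.

class SnoopingCache:
    def __init__(self, name):
        self.name = name
        self.blocks = {}          # block address -> state ("VALID", "INVALID", ...)

    def snoop(self, op, addr):
        """React to a transaction another cache placed on the bus."""
        if addr not in self.blocks:
            return                # we hold no copy; nothing to do
        if op == "BUS_WRITE":     # another processor is writing this block
            self.blocks[addr] = "INVALID"

class Bus:
    def __init__(self):
        self.caches = []

    def broadcast(self, source, op, addr):
        # every cache except the originator snoops the transaction
        for c in self.caches:
            if c is not source:
                c.snoop(op, addr)
```

This models only the invalidate reaction; update-based protocols (Firefly, Dragon) would instead carry the new data on the bus and overwrite the snooping caches' copies.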

Comparison of Write-Invalidate & Write-Update
Write-invalidate:
  Writing: the writing processor forces all other caches to invalidate their copies
  Advantage: less bus traffic
  Disadvantage: other processors reacquire the updated data more slowly
Write-update:
  Writing: the writing processor forces all other caches to update their copies
  Advantage: other processors can get the new data faster
  Disadvantage: higher bus traffic

Comparison of Write-Back & Write-Through
Write-back:
  Writing: memory is updated only when the block in the cache is replaced
  Advantage: less bus traffic
  Disadvantage: memory receives the new data more slowly
Write-through:
  Writing: memory is updated on every cache write
  Advantages: memory gets the new data faster; every write is observable on the bus, which keeps the protocol simple
  Disadvantage: higher bus traffic

Types of Snooping Protocols
Write-Invalidate and Write-Through
Write-Invalidate and Write-Back
Write-Once
Write-Update and Partial Write-Through (Firefly)
Write-Update and Write-Back (Dragon)

Write-Invalidate and Write-Through
The memory is always consistent with the most recently updated cache copy. Multiple processors can read block copies from main memory safely until one processor updates its copy; at that time, all other cache copies are invalidated and the memory is updated to remain consistent.

Write-Invalidate and Write-Through
[State diagram: two states, VALID and INV. VALID → INV on an observed bus write-miss or bus write-hit; VALID stays VALID on a local write-hit or read-hit; INV → VALID on a local read-miss; INV stays INV on a local write-miss, since the block is not loaded into the cache on a write miss.]
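The two-state diagram can be encoded as a transition table. This is one plausible encoding, assuming write-no-allocate (write misses do not load the block, consistent with the result-analysis slide later); event names like PrRd/PrWr/BusWr are illustrative:

```python
# Two-state controller for Write-Invalidate + Write-Through.
# PrRd/PrWr = local processor read/write; BusWr = write observed on the bus.
TRANSITIONS = {
    ("VALID",   "PrRd"):  "VALID",    # read hit
    ("VALID",   "PrWr"):  "VALID",    # write hit: write through, invalidate others
    ("VALID",   "BusWr"): "INVALID",  # another cache wrote the block
    ("INVALID", "PrRd"):  "VALID",    # read miss: fetch block from memory
    ("INVALID", "PrWr"):  "INVALID",  # write miss: update memory only (no allocate)
    ("INVALID", "BusWr"): "INVALID",
}

def next_state(state, event):
    return TRANSITIONS[(state, event)]
```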

Synapse
It belongs to the write-invalidate & write-back family of protocols. Synapse was a multiprocessor for fault-tolerant transaction processing. It has two system buses; the added bandwidth of the extra bus allows the system to be expanded to more processors (max 28). A single-bit tag is included with each cache block in main memory, indicating whether main memory should respond to a miss on that block.

Synapse

Write-Once
A write-invalidate protocol (Goodman). Designed for single-board computers using the Multibus. A combination of write-through (on the first write to a block) and write-back (on subsequent writes).
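A sketch of the processor-side transitions in Write-Once, assuming the usual four states (Invalid, Valid, Reserved, Dirty); bus-side transitions are simplified away here, and the write-miss handling follows one common description of the protocol:

```python
# Goodman's Write-Once, local (processor-side) transitions only.
# The first write to a Valid block is written through (-> RESERVED);
# subsequent writes stay in the cache (-> DIRTY, written back later).
WRITE_ONCE = {
    ("INVALID",  "PrRd"): "VALID",     # read miss: load block
    ("INVALID",  "PrWr"): "DIRTY",     # write miss: load block, then write locally
    ("VALID",    "PrRd"): "VALID",
    ("VALID",    "PrWr"): "RESERVED",  # first write: write through once
    ("RESERVED", "PrRd"): "RESERVED",
    ("RESERVED", "PrWr"): "DIRTY",     # second write: switch to write-back
    ("DIRTY",    "PrRd"): "DIRTY",
    ("DIRTY",    "PrWr"): "DIRTY",
}

def write_once_next(state, event):
    return WRITE_ONCE[(state, event)]
```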

Write-Once

Write-Update and Partial Write-Through (Firefly)
An update to one cache is written to memory and broadcast to the other caches sharing the updated block at the same time. Those caches snoop on the bus and update their local copies. A special bus line is used to detect sharing. Multiple writers are permitted: the data for each write to a shared block are transmitted to every sharing cache and to the backing store, so the protocol never causes an invalidation.

Write-Update and Partial Write-Through (Firefly)

Write-Update and Write-Back (Dragon)
It is similar to write-update with partial write-through, but memory updates are done only when the block is being replaced. Writes to shared blocks are not immediately sent to main memory, only to the other caches that have a copy of the block.

Write-Update and Write-Back (Dragon)
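The difference between Firefly and Dragon on a write to a shared block can be sketched with a minimal model (function and parameter names are illustrative):

```python
# Minimal model of one distributed write to a shared block.
# Both protocols update every sharing cache; only Firefly also writes
# the new value through to memory, while Dragon defers the memory
# update until the block is replaced (write-back).
def propagate_write(protocol, sharer_caches, memory, addr, value):
    for cache in sharer_caches:       # update all other copies
        cache[addr] = value
    if protocol == "firefly":         # partial write-through
        memory[addr] = value
```

Under high sharing this is exactly why Dragon outperforms Firefly in the results below: Dragon skips the memory write on every distributed write.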

Experiment Results (1)

Experiment Results (2)

Experiment Results (3)

Experiment Results (4)

Experiment Results (5)

Result analysis (1)
F-7: The Dragon and Firefly protocols are identical in their handling of private blocks. The performance of Write-Once depends on the trade-off between single-word writes and the reduction in write-backs. The performance of Synapse is below the others as a result of the additional overhead of treating write hits on unmodified blocks as write misses.

Result analysis (2)
Comparison of F-8~F-10 with F-7 indicates the impact of handling shared blocks efficiently.
F-8, F-11 and F-14 (shared blocks = 16) demonstrate that the distributed-write approach of Dragon and Firefly gives the best performance in the handling of shared data.
F-11, F-12 and F-13: Because there are no invalidations, the performance of Dragon and Firefly decreases as the average actual sharing decreases and the number of shared blocks increases.

Result analysis (3)
The performance of Dragon exceeds that of Firefly at high levels of sharing (F-8 and F-9) because Firefly must send distributed writes to global memory, while Dragon sends them to the caches only.
The performance of Write-Once is lower than that of the above protocols as a result of the added overhead of updating memory each time a Dirty block is missed in another cache.

Result analysis (4)
The performance of Synapse is lower still because of the increased overhead of read misses on blocks that are Dirty in another cache, and the added overhead of loading new data on a write hit to an unmodified block.
Write-Invalidate and Write-Through has the lowest performance of all because blocks are not loaded into the cache on a write miss.

References
Hesham El-Rewini and Mostafa Abd-El-Barr. Advanced Computer Architecture and Parallel Processing. John Wiley & Sons.
James Archibald and Jean-Loup Baer. Cache Coherence Protocols: Evaluation Using a Multiprocessor Simulation Model. ACM Transactions on Computer Systems, Vol. 4, No. 4, November 1986.

Thank You!