Understanding and Implementing Cache Coherency Policies CSE 8380: Parallel and Distributed Processing Dr. Hesham El-Rewini Presented by Fazela Vohra, CSE.


Understanding and Implementing Cache Coherency Policies CSE 8380: Parallel and Distributed Processing Dr. Hesham El-Rewini Presented by Fazela Vohra, CSE Graduate Student, Southern Methodist University.

Goals Create a pure software cache system as a test bed. Implement five cache write policies for maintaining coherency on the test bed. Perform experiments and test different scenarios. Gather statistics, take measurements, and draw conclusions.

Cache Basics A cache is a small, fast, volatile store placed between a processor and its main memory in a shared memory system. It exploits locality of reference: –Spatial locality: neighboring locations in a store have a higher chance of being accessed. –Temporal locality: once accessed, a location in a store will be accessed repeatedly over time. Hit: the event when data to be read is already in the cache. A large number of hits gives better throughput.

Issues Multiple copies of a datum exist. Copies of cached items must be kept in sync, and the syncing should not degrade the performance or throughput of the system.

Project Details Implement various cache policies. Tinker with tunables to understand their effects on the system. Measure the performance/effectiveness of the policies, NOT of the algorithms or implementation. Software written in C on the Windows operating system.

Model of the System [Diagram] Inputs: I/O load, policy parameter, diagnostics. Outputs: cache and main memory dumps. Components: policies, main memory, caches, processing units.

Inputs and Outputs The input is given through a file which contains: –I/O type (0=Read, 1=Write). –I/O address. –Processor to perform the I/O on. –The data to be written (for the basic system, where no computations are performed). A parser converts the input into actual I/O operations. Policies can be specified by the user. Dumps of cache/main memory are observed to verify functionality.

Assumptions and Simplifications Inputs are small sequences of reads and writes. Small caches are used to create maximum activity. Memory and cache locations are byte wide. All caches have the same write policy configured at any point in time. Each cache entry has the following structure: DATA | ADDR | STATUS

Policies Implemented 1. Write Through – Write Invalidate 2. Write Back – Write Invalidate 3. Write Once 4. Write Update – Partial Write Through 5. Write Back – Write Update

Policy 1: WRITE THROUGH WRITE INVALIDATE States: –VALID: copy consistent with main memory. –INVALID: copy inconsistent with main memory.

Policy 1: WRITE THROUGH WRITE INVALIDATE
READ
–Hit: read the copy found in the cache. Done!
–Miss: if another cache has a valid copy, get it from that cache; otherwise go to global memory. Replacement is required if there is no space to accommodate the incoming copy; since the cache is always consistent with main memory, no write-back is required. STATUS=VALID.
WRITE
–Hit: write over the copy found in the cache, update global memory, and invalidate the other caches. STATUS=VALID.
–Miss: if another cache has a valid copy, get it; otherwise go to global memory. Write the new data over this copy, update global memory, and invalidate the others. Replacement may be needed if there is no space; no write-back. STATUS=VALID.

Results Keep I/O load constant. Vary cache size. Measure cache hits and main memory accesses.

Policy 2: WRITE BACK WRITE INVALIDATE States: –RO-SHARED: multiple copies, consistent with main memory. –RW-EXCLUSIVE: only one copy, inconsistent with main memory (ownership). –INVALID: copy inconsistent with main memory.

Policy 2: WRITE BACK WRITE INVALIDATE
READ
–Hit: read the copy found in the cache. Done!
–Miss: if another cache has an RW copy, get it from that cache and update global memory (STATUS=RO in both caches); otherwise get a copy from global memory (STATUS=RO). If there is no space, replace an entry: an RW entry must be written back to global memory; an INVALID/RO entry needs no write-back.
WRITE
–Hit: if STATUS=RW, simply write over it (STATUS=RW). If STATUS=RO, write over it and invalidate the other copies (STATUS=RW).
–Miss: if another cache has an RW copy, copy it into the own cache; otherwise go to global memory. Write the new data and invalidate the others (STATUS=RW). If there is no space, replace: an RW entry is written back to global memory; otherwise simply write over it, with no write-back.

Results Keep I/O load constant. Vary cache size. Measure cache hits and main memory accesses.

Policy 3: WRITE ONCE States: –RESERVED: written exactly once, consistent with main memory. –VALID: copy consistent with main memory. –DIRTY: written more than once, inconsistent with main memory. –INVALID: copy inconsistent with main memory.

Policy 3: WRITE ONCE
READ
–Hit: read the copy found in the cache. Done!
–Miss: if another cache has a DIRTY copy, get it from that cache and update global memory (STATUS=VALID in both caches); otherwise get a copy from global memory (STATUS=VALID). If there is no space, replace: a DIRTY entry is written back to global memory; a VALID/RESERVED entry needs no write-back.
WRITE
–Hit: if STATUS=DIRTY/RESERVED, write over it (STATUS=DIRTY). If STATUS=VALID, write over it, invalidate the others, and update global memory (STATUS=RESERVED).
–Miss: if another cache has a DIRTY copy, copy it into the own cache; otherwise go to global memory. Write the new data and invalidate the others (STATUS=DIRTY). If there is no space, replace: a DIRTY entry is written back to global memory; otherwise simply write over it, with no write-back.

Results Keep I/O load constant. Vary cache size. Measure cache hits and main memory accesses.

Policy 4: WRITE UPDATE PARTIAL WRITE THROUGH States: –SHARED: multiple copies, consistent with main memory. –DIRTY: only one copy, inconsistent with main memory (ownership). –VALID-EXCLUSIVE: only one copy, consistent with main memory.

Policy 4: WRITE UPDATE ‘PARTIAL’ WRITE THROUGH
READ
–Hit: read the copy found in the cache. Done!
–Miss: if no other cache has a copy, get one from global memory (STATUS=VALID-EX). If another cache has a DIRTY copy, get it and update global memory (STATUS=SHARED in both caches). If another cache has a VALID-EX/SHARED copy, get it (STATUS=SHARED in both caches). In every case, if there is no space, replace: a DIRTY entry is written back to global memory; a VALID-EX/SHARED entry needs no write-back.

Policy 4: Contd…
WRITE
–Hit: if the copy is DIRTY/VALID-EX, write locally. If the copy is SHARED, write over it and update all sharing caches as well as global memory (STATUS=SHARED).
–Miss: if another cache has a copy, get it, write over it, and update all caches and global memory (STATUS=SHARED). If no other cache has a copy, get it from global memory and write over it (STATUS=DIRTY). If there is no space, replace: a DIRTY entry is written back to global memory; a VALID-EX/SHARED entry needs no write-back.

Results Keep I/O load constant. Vary cache size. Measure cache hits and main memory accesses.

Policy 5: WRITE UPDATE WRITE BACK States: –SHARED-CLEAN: multiple shared copies, may or may not be consistent with main memory (no ownership). –VALID-EX: only one copy, consistent with main memory. –SHARED-DIRTY: multiple shared copies, this one modified last (ownership). –DIRTY: unshared and updated, inconsistent with main memory.

Policy 5: WRITE UPDATE WRITE BACK
READ
–Hit: read the copy found in the cache. Done!
–Miss: if no other cache has a copy, get one from global memory (STATUS=VALID-EX). If another cache has a DIRTY/SHARED-DIRTY copy, get it (supplying cache STATUS=SHARED-DIRTY, taking cache STATUS=SHARED-CLEAN). If another cache has a VALID-EX/SHARED-CLEAN copy, get it (STATUS=SHARED-CLEAN in both caches). In every case, if there is no space, replace: a DIRTY/SHARED-DIRTY entry is written back to global memory; a VALID-EX/SHARED-CLEAN entry needs no write-back.

Policy 5: contd…
WRITE
–Hit: if the copy is DIRTY/VALID-EX, write locally (STATUS=DIRTY). If the copy is SHARED-CLEAN/SHARED-DIRTY, write over it and update all sharing caches (own STATUS=SHARED-DIRTY, others STATUS=SHARED-CLEAN).
–Miss: if another cache has a copy, get it, write over it, and update all caches (supplying cache STATUS=SHARED-CLEAN, taking cache STATUS=SHARED-DIRTY). If no other cache has a copy, get it from global memory and write over it (STATUS=DIRTY). If there is no space, replace: a DIRTY/SHARED-DIRTY entry is written back to global memory; a VALID-EX/SHARED-CLEAN entry needs no write-back.

Results Keep I/O load constant. Vary cache size. Measure cache hits and main memory accesses.

A Practical Experiment: Matrix Multiplication 3 x 3 matrix data is loaded from the input file into main memory. Start with empty caches. The matrices are multiplied by reading values from main memory, and the results are written back to main memory. The policy used is Write Through - Write Invalidate. Three processor/cache pairs; each processor computes the three elements of one row of the result. Each cache has only 7 locations: 6 for inputs and 1 for the result. This causes a lot of inter-cache exchange, and replacements abound due to the small caches.

[Figure: replacement logic trace for Processor 0]

As can be seen, most of the time each processor can find what it wants in another cache! [Figure: cache contents of Processors 0, 1, and 2 during the run]

Replacement Logic Each entry also carries a Use tag and a Replaced bit. When an entry is accessed, its Use tag is incremented; when an entry is replaced, its Replaced bit is set. Entries with smaller Use tags are always replaced first. The Replaced bit ensures that an entry that has just been brought in is not immediately evicted in the next cycle, which would otherwise happen because it always has a smaller Use tag.

The Broadcast Issue! Shared memory systems are interconnected using a bus; I implemented the broadcast as a loop in which the writer invalidates the other caches. It could also be done with an event-based system: a processor posts an ‘event’ to all caches when it updates an entry, and the other caches invalidate their entries on demand based on the posted events.

Future Work Implement matrix multiplication for all policies.

References Advanced Computer Architecture and Parallel Processing, Hesham El-Rewini and Mostafa Abd-El-Barr. /vivio.htm

Questions / Answers

Thank You !