
1 Understanding and Implementing Cache Coherency Policies. CSE 8380: Parallel and Distributed Processing, Dr. Hesham El-Rewini. Presented by Fazela Vohra (fvohra@mail.smu.edu), Graduate Student, Southern Methodist University.

2 Goals
– Create a pure software cache system as a test bed.
– Implement five cache write policies for maintaining coherency on the test bed.
– Perform experiments and test different scenarios.
– Gather statistics, measure, and draw conclusions.

3 Cache Basics
– A cache is a small store placed between a processor and its main memory in a shared memory system: a faster, volatile store.
– It exploits locality of reference.
– Spatial locality: neighboring locations in a store have a higher chance of being accessed.
– Temporal locality: once accessed, a location in a store will be accessed repeatedly over time.
– Hit: an event where the data to be read is already in the cache. A large number of hits gives better throughput.

4 Issues
– Multiple copies of a datum exist.
– Copies of cached items must be kept in sync.
– Syncing should not affect the performance or throughput of the system.

5 Project Details
– Implement various cache policies.
– Tinker with tunables to understand their effects on the system.
– Measure the performance/effectiveness of the policies, NOT of the algorithms or implementation.
– Software written in C on the Windows operating system.

6 Model of the System
Input: I/O load, policy parameter, diagnostics. Output: cache and main memory dumps.
Components: processing units, caches, policies, main memory.

7 Inputs and Outputs
The input is given through a file which contains:
– I/O type (0=Read, 1=Write)
– I/O address
– Processor to perform the I/O on
– The data to be written (for the basic system, where no computations are performed)
A parser converts the input to actual I/O. Policies can be specified by the user. Dumps of the caches and main memory are observed to verify functionality.
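A hypothetical input file under this format might look like the following; the column layout and comment syntax are assumptions for illustration, not taken from the project:

```
# type  addr  proc  data    (type: 0=Read, 1=Write)
1       4     0     42      # processor 0 writes 42 to address 4
0       4     1             # processor 1 reads address 4
0       4     2             # processor 2 reads the same address
```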

8 Assumptions and Simplifications
– Inputs are small sequences of reads and writes.
– Small caches are used to create maximum activity.
– Memory and cache locations are byte wide.
– All caches have the same write policy configured at any point in time.
– Each cache entry has the following structure: DATA | ADDR | STATUS
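As a sketch, such an entry could be declared in C (the project's language) roughly as follows. The field names and widths, and the particular set of status values, are illustrative assumptions; each policy defines its own states:

```c
/* One cache entry: a byte of DATA, its main-memory ADDR, and a
   coherency STATUS. The status values shown here are the write-once
   states; other policies would use their own enum. Illustrative only. */
typedef enum { INVALID = 0, VALID, RESERVED, DIRTY } status_t;

typedef struct {
    unsigned char data;   /* locations are byte wide */
    unsigned int  addr;   /* main-memory address of the cached byte */
    status_t      status; /* coherency state under the active policy */
} cache_entry_t;
```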

9 Policies Implemented
– Write Through – Write Invalidate
– Write Back – Write Invalidate
– Write Once
– Write Update – Partial Write Through
– Write Back – Write Update

10 Policy 1: WRITE THROUGH – WRITE INVALIDATE
STATES:
– VALID: copy consistent with main memory
– INVALID: copy inconsistent with main memory

11 Policy 1: WRITE THROUGH – WRITE INVALIDATE
READ HIT: Read the copy found in cache. Done!
READ MISS: If any other cache has a valid copy, get it from there; otherwise go to global memory. Replacement is required if there is no space to accommodate the incoming copy; since the cache is always consistent with main memory, no write-back is required. STATUS=VALID.
WRITE HIT: Write over the copy found in cache. Update global memory and invalidate the other caches. STATUS=VALID.
WRITE MISS: If any other cache has a valid copy, get it from there; otherwise go to global memory. Write the new data over this copy. Update global memory. Invalidate others. Replacement may be needed if there is no space; no write-back. STATUS=VALID.
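The write path above can be sketched in C. The array sizes, the names (`mem`, `caches`, `wt_write`, `invalidate_others`), and the linear lookup are all assumptions for illustration, not the project's actual code:

```c
/* Sketch of Policy 1's write path (write-through, write-invalidate).
   Sizes, names, and the lookup strategy are illustrative assumptions. */
#define NCACHES  3
#define NENTRIES 8

enum wt_status { WT_INVALID = 0, WT_VALID };
struct wt_entry { unsigned char data; unsigned int addr; enum wt_status status; };

static unsigned char   mem[256];                    /* global (main) memory */
static struct wt_entry caches[NCACHES][NENTRIES];   /* one small cache per processor */

/* Invalidate every other cache's copy of addr. */
static void invalidate_others(int self, unsigned int addr) {
    for (int c = 0; c < NCACHES; c++) {
        if (c == self) continue;
        for (int i = 0; i < NENTRIES; i++)
            if (caches[c][i].status == WT_VALID && caches[c][i].addr == addr)
                caches[c][i].status = WT_INVALID;
    }
}

/* Write through: update the local copy (allocating one on a miss),
   update global memory, and invalidate all other cached copies. */
void wt_write(int self, unsigned int addr, unsigned char value) {
    int slot = -1;
    for (int i = 0; i < NENTRIES; i++) {
        if (caches[self][i].status == WT_VALID && caches[self][i].addr == addr) {
            slot = i;                       /* write hit */
            break;
        }
        if (slot < 0 && caches[self][i].status == WT_INVALID)
            slot = i;                       /* remember a free slot for a miss */
    }
    if (slot < 0) slot = 0;                 /* no space: replace; no write-back needed */
    caches[self][slot].data   = value;
    caches[self][slot].addr   = addr;
    caches[self][slot].status = WT_VALID;
    mem[addr] = value;                      /* cache stays consistent with memory */
    invalidate_others(self, addr);
}
```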

12 Results Keep I/O load constant. Vary cache size. Measure cache hits and main memory accesses.

13 Policy 2: WRITE BACK – WRITE INVALIDATE
STATES:
– RO-SHARED: multiple copies consistent with main memory
– RW-EXCLUSIVE: only one copy, inconsistent with main memory (ownership)
– INVALID: copy inconsistent with main memory

14 Policy 2: WRITE BACK – WRITE INVALIDATE
READ HIT: Read the copy found in cache. Done!
READ MISS: If an RW copy is in another cache, get it and update global memory; STATUS=RO in both caches. If no other cache has an RW copy, get a copy from global memory; STATUS=RO. If there is no space, replace: if the entry to be replaced is RW, write it back to global memory; if it is INVALID/RO, no write-back.
WRITE HIT: If STATUS=RW, write over it; STATUS=RW. If STATUS=RO, write over it and invalidate the others; STATUS=RW.
WRITE MISS: If another cache has an RW copy, copy it into your own cache, invalidate the others, and write the new data; STATUS=RW. If no other cache has an RW copy, go to global memory, write the new data, and invalidate the others; STATUS=RW. If there is no space, replace: if the copy to be replaced is RW, write it back to global memory; otherwise simply write over it, no write-back.
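The write-hit transitions above can be sketched as a small C helper; the state and function names are assumptions for illustration:

```c
/* Sketch of Policy 2's state transitions (write-back, write-invalidate).
   States follow the slide: INVALID, RO-SHARED, RW-EXCLUSIVE.
   Names are illustrative, not the project's actual code. */
typedef enum { WB_INVALID, WB_RO_SHARED, WB_RW_EXCL } wb_state;

/* New local state after this cache performs a write hit. A write always
   leaves the writer RW-EXCLUSIVE; if the line was RO-SHARED, the other
   copies must be invalidated first. */
wb_state wb_write_hit(wb_state current, int *must_invalidate_others) {
    *must_invalidate_others = (current == WB_RO_SHARED);
    return WB_RW_EXCL;
}

/* Whether replacing an entry in this state requires a write-back:
   only the owned, modified (RW) copy is dirty. */
int wb_needs_writeback(wb_state s) {
    return s == WB_RW_EXCL;
}
```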

15 Results Keep I/O load constant. Vary cache size. Measure cache hits and main memory accesses.

16 Policy 3: WRITE ONCE
STATES:
– VALID: copy consistent with main memory
– RESERVED: written once, consistent with main memory
– DIRTY: written more than once, inconsistent with main memory
– INVALID: copy inconsistent with main memory

17 Policy 3: WRITE ONCE
READ HIT: Read the copy found in cache. Done!
READ MISS: If a DIRTY copy is in another cache, get it and update global memory; STATUS=VALID in both caches. If no other cache has a DIRTY copy, get a copy from global memory; STATUS=VALID. If there is no space, replace: if the entry to be replaced is DIRTY, write it back to global memory; if it is VALID/RESERVED, no write-back.
WRITE HIT: If STATUS=DIRTY/RESERVED, write over it; STATUS=DIRTY. If STATUS=VALID, write over it, invalidate the others, and update global memory; STATUS=RESERVED.
WRITE MISS: If another cache has a DIRTY copy, copy it into your own cache, invalidate the others, and write the new data. If no other cache has a DIRTY copy, go to global memory, write the new data, and invalidate the others. If there is no space, replace: if the copy to be replaced is DIRTY, write it back to global memory; otherwise simply write over it, no write-back. STATUS=DIRTY.
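The distinguishing "write once" ladder (first write goes through to memory, later writes stay local) can be sketched as follows; names are assumptions:

```c
/* Sketch of Policy 3's write-hit ladder (write-once): the first write to
   a VALID copy writes through to memory and marks it RESERVED; any later
   write stays local and marks it DIRTY. Names are illustrative. */
typedef enum { WO_INVALID, WO_VALID, WO_RESERVED, WO_DIRTY } wo_state;

/* New state after a local write hit; *write_through reports whether
   global memory must be updated on this particular write. */
wo_state wo_write_hit(wo_state current, int *write_through) {
    if (current == WO_VALID) {   /* first write: write once to memory */
        *write_through = 1;
        return WO_RESERVED;
    }
    *write_through = 0;          /* later writes stay in the cache */
    return WO_DIRTY;
}
```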

18 Results Keep I/O load constant. Vary cache size. Measure cache hits and main memory accesses.

19 Policy 4: WRITE UPDATE – PARTIAL WRITE THROUGH
STATES:
– SHARED: multiple copies consistent with main memory
– DIRTY: only one copy, inconsistent with main memory (ownership)
– VALID-EXCLUSIVE: only one copy, consistent with main memory

20 Policy 4: WRITE UPDATE – 'PARTIAL' WRITE THROUGH: READ
READ HIT: Read the copy found in cache. Done!
READ MISS: If a DIRTY copy is in another cache, get it and update global memory; STATUS=SHARED in both caches. If a VALID-EXCLUSIVE/SHARED copy is in another cache, get it; STATUS=SHARED in both caches. If no other cache has a copy, get a copy from global memory; STATUS=VALID-EXCLUSIVE. In each case, if there is no space, replace: if the entry to be replaced is DIRTY, write it back to global memory; if it is VALID-EXCLUSIVE/SHARED, no write-back.

21 Policy 4 (contd.): WRITE
WRITE HIT: If the copy is DIRTY/VALID-EXCLUSIVE, write locally; STATUS=DIRTY. If the copy is SHARED, write over it, update all sharing caches, and update global memory; STATUS=SHARED.
WRITE MISS: If another cache has a copy, get it, write over it, update all caches, and update global memory; STATUS=SHARED. If no other cache has a copy, get it from global memory and write over it; STATUS=DIRTY. In either case, if there is no space, replace: if the entry to be replaced is DIRTY, write it back to global memory; if it is VALID-EXCLUSIVE/SHARED, no write-back.

22 Results Keep I/O load constant. Vary cache size. Measure cache hits and main memory accesses.

23 Policy 5: WRITE UPDATE – WRITE BACK
STATES:
– VALID-EXCLUSIVE: only one copy, consistent with main memory
– SHARED-CLEAN: multiple shared copies, may be consistent with main memory (no ownership)
– SHARED-DIRTY: multiple shared copies; this one was the last to be modified (ownership)
– DIRTY: unshared and updated, inconsistent with main memory

24 Policy 5: WRITE UPDATE – WRITE BACK: READ
READ HIT: Read the copy found in cache. Done!
READ MISS: If no other cache has a copy, get a copy from global memory; STATUS=VALID-EXCLUSIVE. If a DIRTY/SHARED-DIRTY copy is in another cache, get it; the supplying cache goes to SHARED-DIRTY and the taking cache to SHARED-CLEAN. If a VALID-EXCLUSIVE/SHARED-CLEAN copy is in another cache, get it; STATUS=SHARED-CLEAN in both caches. In each case, if there is no space, replace: if the entry to be replaced is DIRTY/SHARED-DIRTY, write it back to global memory; if it is VALID-EXCLUSIVE/SHARED-CLEAN, no write-back.

25 Policy 5 (contd.): WRITE
WRITE HIT: If the copy is DIRTY/VALID-EXCLUSIVE, write locally; STATUS=DIRTY. If the copy is SHARED-CLEAN/SHARED-DIRTY, write over it and update all sharing caches; own STATUS=SHARED-DIRTY, others' STATUS=SHARED-CLEAN.
WRITE MISS: If another cache has a copy, get it, write over it, and update all caches; the supplying cache goes to SHARED-CLEAN and the taking cache to SHARED-DIRTY. If no other cache has a copy, get it from global memory and write over it; STATUS=DIRTY. In either case, if there is no space, replace: if the entry to be replaced is DIRTY/SHARED-DIRTY, write it back to global memory; if it is VALID-EXCLUSIVE/SHARED-CLEAN, no write-back.

26 Results Keep I/O load constant. Vary cache size. Measure cache hits and main memory accesses.

27 A Practical Experiment: Matrix Multiplication
– 3 x 3 matrix data is loaded from the input file into main memory.
– Start with empty caches.
– The matrices are multiplied by reading values from main memory; results are written back to main memory.
– The policy used is Write Through – Write Invalidate.
– Three processor/cache sets; each processor computes the three elements of one row of the result.
– Each cache has only 7 locations: 6 inputs and 1 result.
– Lots of inter-cache exchange; replacements abound due to the small caches.

28 [Diagram: replacement trace for Processor 0's cache]

29 [Diagrams: traces for Processors 0, 1, and 2] As can be seen, most of the time each processor can find what it wants in another cache!

30 Replacement Logic
– Each entry also carries a Use tag and a Replaced bit.
– When the entry is accessed, the Use tag is incremented.
– When the entry is replaced, the Replaced bit is set.
– Entries with smaller Use tags are always replaced first.
– The Replaced bit ensures that an entry that has just been brought in is not immediately replaced again in the next cycle, since it will always have a smaller Use tag.
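The victim-selection rule above can be sketched in C; the structure, names, and the way the Replaced bit is cleared are assumptions for illustration:

```c
/* Sketch of the replacement policy: evict the entry with the smallest
   Use tag, but skip entries whose Replaced bit is set so a freshly
   filled entry is not evicted again on the very next cycle.
   Structure and names are illustrative assumptions. */
#define NSLOTS 7

struct repl_entry {
    unsigned use;  /* incremented on every access */
    int replaced;  /* set when the entry is (re)filled */
};

int pick_victim(struct repl_entry e[NSLOTS]) {
    int victim = -1;
    for (int i = 0; i < NSLOTS; i++) {
        if (e[i].replaced) continue;          /* just filled: protected */
        if (victim < 0 || e[i].use < e[victim].use)
            victim = i;
    }
    if (victim < 0) victim = 0;               /* everything protected: fall back */
    e[victim].use = 0;
    e[victim].replaced = 1;                   /* protect the incoming entry */
    return victim;
}
```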

31 The Broadcast Issue!
– Shared memory systems are interconnected using a bus; I implemented the broadcast as a loop in which I invalidate the other caches.
– It could also be done with an event-based system: the processor posts an 'event' to all caches when it updates an entry, and the other caches invalidate their entries on demand based on the posted events.
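The event-based alternative described above can be sketched as follows. The fixed-size per-cache queues and all names are assumptions, not the project's code:

```c
/* Sketch of the event-based alternative: instead of the writer looping
   over all caches, it posts an invalidate event to each cache's queue;
   a cache drains its queue on demand before its next access.
   Queue sizes and names are illustrative assumptions. */
#include <stddef.h>

#define NCACHES 3
#define QLEN    16

struct inval_event { unsigned addr; int origin; };

static struct inval_event queue[NCACHES][QLEN];
static size_t qcount[NCACHES];

/* Writer posts one event to every other cache's queue. */
void post_invalidate(int origin, unsigned addr) {
    for (int c = 0; c < NCACHES; c++) {
        if (c == origin || qcount[c] >= QLEN) continue;
        queue[c][qcount[c]++] = (struct inval_event){ addr, origin };
    }
}

/* Before its next access, a cache drains its pending events;
   each event would invalidate the matching local entry. */
size_t drain_events(int self) {
    size_t n = qcount[self];
    qcount[self] = 0;
    return n;
}
```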

32 Future Work Implement matrix multiplication for all policies

33 References
– Hesham El-Rewini and Mostafa Abd-El-Barr, Advanced Computer Architecture and Parallel Processing.
– https://www.cs.tcd.ie/Jeremy.Jones/vivio/vivio.htm

34 Questions / Answers

35 Thank You!

