Cache Coherence Protocols in Shared Memory Multiprocessors, Mehmet Şenvar


1 Cache Coherence Protocols in Shared Memory Multiprocessors
Mehmet Şenvar

2 Outline
- Introduction
- Background Information
- The cache coherence problem
- Cache Enforcement Strategies
- Consistency models
- Simple Solutions
- Hardware Protocols
- Snooping protocols
- Directory-based protocols
- Compiler and Software protocols
- Future work and conclusions

3 The Cache Coherence Problem
- Caches allow greater performance by storing frequently used data in faster memory
- Since all processors share the same address space, it is possible for more than one processor to cache an address (or data item) at a time
- If one processor updates the data item without informing the other processor, inconsistencies may result and cause incorrect executions
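The problem described on this slide can be reproduced with a toy model. The sketch below is illustrative only: the `Processor` class, the dictionary-based memory, and the address `0x10` are assumptions made for the example, not part of the presentation.

```python
# Toy model (hypothetical): private caches in front of one shared memory,
# with no coherence mechanism at all.
class Processor:
    def __init__(self, memory):
        self.memory = memory   # shared main memory: address -> value
        self.cache = {}        # private cache: address -> value

    def read(self, addr):
        if addr not in self.cache:              # miss: fetch from memory
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]                 # hit: may be out of date

    def write(self, addr, value):
        # Write-through to memory, but the other caches are never informed.
        self.cache[addr] = value
        self.memory[addr] = value


memory = {0x10: 1}
p0, p1 = Processor(memory), Processor(memory)
p0.read(0x10)
p1.read(0x10)            # both processors now cache the value 1
p0.write(0x10, 2)        # p0 updates the item
stale = p1.read(0x10)    # p1 still sees its old cached copy
```

Here `stale` is 1 even though memory already holds 2, which is exactly the inconsistency a coherence protocol has to prevent.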

4 Cache Coherence Problem

5 Cache Coherence (cont.)
- For correct execution, coherence must be enforced between the caches
- Two major factors are:
  - performance
  - implementation cost
- Four primary design issues are:
  - coherence detection strategy
  - coherence enforcement strategy
  - precision of block-sharing information
  - cache block size

6 Cache Enforcement Strategies
- A cache enforcement strategy is the mechanism that makes caches consistent
  - write-update (WU)
  - write-invalidate (WI)
  - hybrid protocols, competitive-update (CU)
- The performance of WU and WI varies depending on the application and the number of writes
- Hybrid protocols switch between WU and WI based on the number of writes to a block
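One common way to realise a hybrid scheme is a per-copy counter: a cached copy accepts remote updates until too many arrive without a local access, then drops out by invalidating itself. The sketch below assumes that design. The threshold value and all names (`THRESHOLD`, `CachedCopy`) are hypothetical and are not taken from the slides.

```python
THRESHOLD = 2  # assumed limit on consecutive remote updates without local use


class CachedCopy:
    """One cache's copy of a shared block under a competitive-update policy."""

    def __init__(self, value):
        self.value = value
        self.valid = True
        self.remote_updates = 0   # updates received since the last local access

    def local_read(self):
        self.remote_updates = 0   # the copy is in use, so keep updating it
        return self.value

    def remote_update(self, value):
        """Handle another processor's write; report which action was taken."""
        if not self.valid:
            return "ignored"
        self.remote_updates += 1
        if self.remote_updates > THRESHOLD:
            self.valid = False    # behave like write-invalidate from now on
            return "invalidate"
        self.value = value        # behave like write-update
        return "update"


copy = CachedCopy(0)
actions = [copy.remote_update(v) for v in (1, 2, 3, 4)]
```

With the assumed threshold of 2, `actions` comes out as update, update, invalidate, ignored: the copy stops consuming update traffic once it is clearly not being read.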

7 Consistency Models
- A consistency model defines how the consistency of data values is maintained
- Some consistency models are:
  - sequential consistency
  - weak consistency
  - release consistency
- Weak consistency models are more efficient to implement and require fewer coherence messages

8 Shared Caches (1)
- Processors share a single cache, essentially punting the problem
- Useful for very small machines, e.g., DPC in the Encore, Alliant FX/8
- Problems are limited cache bandwidth and cache interference
- Benefits are fine-grain sharing and prefetch effects

9 Non-cacheable Items (2)
- Make shared data non-cacheable
- One of the simplest software solutions
- Also possible in hardware: make cache locations unreachable

10 Broadcast Writes (3)
- Every cache write request is sent to all other caches
- First, it must be determined whether each cache holds this data
- Other copies are either updated or invalidated
- Significant additional memory transactions occur

11 Hardware Protocols
- Snoop Bus Mechanism
- Directory Based Methods
  - Full Directory
  - Limited Directory
  - Chained Directory

12 Snoop Bus Protocol
- Snooping protocols rely on a shared bus between the processors for coherence
- On a processor write, the write is passed through the cache to main memory on the bus
- Any processor caching the address may update or invalidate its cache entry as appropriate
- Snooping protocols do not scale well beyond 32 processors because of the shared bus
- The choice between WU, WI, and CU is especially important to reduce communication
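A minimal sketch of the write path described on this slide, using write-through caches and the write-invalidate option. The `Bus` and `SnoopingCache` classes are hypothetical names chosen for the example, and a real bus would also arbitrate between simultaneous requests.

```python
class Bus:
    """Shared bus: every write is visible to all attached caches."""

    def __init__(self, memory):
        self.memory = memory
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def write(self, source, addr, value):
        self.memory[addr] = value            # write goes through to memory
        for cache in self.caches:
            if cache is not source:
                cache.snoop_write(addr)      # everyone else observes the write


class SnoopingCache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}                      # address -> cached value
        bus.attach(self)

    def read(self, addr):
        if addr not in self.lines:           # miss: fetch from memory
            self.lines[addr] = self.bus.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.write(self, addr, value)

    def snoop_write(self, addr):
        self.lines.pop(addr, None)           # write-invalidate: drop our copy


memory = {0: 1}
bus = Bus(memory)
c0, c1 = SnoopingCache(bus), SnoopingCache(bus)
c0.read(0)
c1.read(0)
c0.write(0, 2)                # c1 snoops the write and invalidates its line
value_seen_by_c1 = c1.read(0) # miss, so the new value is fetched from memory
```

A write-update variant would replace the `pop` in `snoop_write` with storing the new value, at the cost of carrying data on every broadcast.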

13 MESI (4-state) Invalidation Protocol
- Each line in the cache can be in one of 4 states
  - Modified (exclusive): only in 1 cache, modified
  - Exclusive (unmodified): only in 1 cache, unmodified
  - Shared (unmodified)
  - Invalid
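The four states can be captured as a small state machine. The sketch below follows the commonly taught textbook MESI transitions. The deck's own transition diagram is not reproduced in this transcript, so details may differ from it, and the function names and the `BusRd`/`BusRdX` event labels are assumptions made for the example.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


def on_processor(state, op, others_have_copy=False):
    """Next state after a local read ('rd') or write ('wr') to the line."""
    if op == "wr":
        return State.M                 # any local write ends in Modified
    if state is State.I:               # read miss
        return State.S if others_have_copy else State.E
    return state                       # read hit: state unchanged


def on_snoop(state, bus_op):
    """Next state after observing another cache's bus transaction.

    'BusRd' is a remote read; 'BusRdX' is a remote read-for-ownership (write).
    """
    if state is State.I:
        return State.I
    if bus_op == "BusRdX":
        return State.I                 # another cache wants to write
    return State.S                     # remote read: M, E and S all end up Shared
```

For example, `on_processor(State.I, "rd")` gives Exclusive when no other cache holds the line, and a later `on_snoop(State.E, "BusRd")` demotes it to Shared. The sketch tracks states only; the write-back of a Modified line on a snoop is left out.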

14 MESI State Transition Diagram

15 MESI Example

16 Directory-Based Protocols
- Directory-based protocols do not rely on a shared bus to exchange coherence information (use point-to-point connections)
  - more scalable (can have hundreds of processors)
  - each processor can have its own memory
  - implement weak consistency for efficiency

17 Directory-Based Protocols (cont.)
- Each node maintains a directory storing cache information and memory information
- A processor communicates with the directory to access memory
  - if a processor requests a non-local memory page, the directory uses its information to find the page
  - then, it uses messages to retrieve the page and ensure all other processors have consistent information
- Since the directory tracks which processors are caching the page, it only needs to send messages to those processors
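The key point, that messages go only to the processors recorded as sharers, can be sketched as follows. This is a simplified single-directory model with write-through to memory. The `Directory` class, its message log, and the block number are hypothetical and chosen only for illustration.

```python
class Directory:
    """Tracks, per block, which nodes hold a cached copy."""

    def __init__(self, memory):
        self.memory = memory
        self.sharers = {}     # block -> set of node ids caching it
        self.messages = []    # log of (kind, destination node, block)

    def read(self, node, block):
        self.sharers.setdefault(block, set()).add(node)
        return self.memory[block]

    def write(self, node, block, value):
        # Point-to-point invalidations go only to the recorded sharers.
        for other in sorted(self.sharers.get(block, set()) - {node}):
            self.messages.append(("invalidate", other, block))
        self.sharers[block] = {node}
        self.memory[block] = value


memory = {7: 0}
directory = Directory(memory)
directory.read(0, 7)
directory.read(2, 7)
directory.write(0, 7, 5)   # only node 2 needs to be told
```

After the write, `directory.messages` holds a single invalidation addressed to node 2. Nodes that never cached block 7 receive nothing, which is what lets this approach scale past a broadcast bus.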

18 Directory-Based Protocols (cont.)
- Designing a directory requires defining:
  - cache block granularity
  - cache controller design
  - directory structure
- Cache block granularity is the size of the cache and the size of a cache line
  - CC-NUMA machines have a cache that is separate from, and smaller than, main memory
  - COMA machines use a node's entire memory as a cache for remote pages
- Block size affects performance (false sharing)
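False sharing arises when two unrelated variables fall into the same coherence block, so a write to one invalidates the other's cached copy. A small sketch of how block size decides this. The addresses and block sizes are made-up values for illustration.

```python
def block_of(addr, block_size):
    """Index of the coherence block that contains a byte address."""
    return addr // block_size


# Two unrelated 4-byte variables at hypothetical adjacent addresses.
a, b = 0x100, 0x104

same_block_64 = block_of(a, 64) == block_of(b, 64)   # 64-byte blocks
same_block_4 = block_of(a, 4) == block_of(b, 4)      # 4-byte blocks
```

With 64-byte blocks both variables share a block (`same_block_64` is true), so writes by different processors interfere. With 4-byte blocks they do not, at the price of more blocks to track.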

19 Directory-Based Protocols (cont.)
- Cache controller is hardware that maintains the directory and processes memory requests
  - custom hardware
  - programmable protocol processor
- The directory structure is how the cache and memory information is organized
  - p+1-bit full directory
  - linked-list directories
  - tagged directories
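The storage cost of a p+1-bit full directory is easy to estimate. The sketch below assumes the usual reading of "p+1": one presence bit per processor plus one dirty bit for every memory block. The function name and the example sizes are illustrative.

```python
def full_directory_overhead(num_processors, block_bytes):
    """Directory bits per data bit for a (p+1)-bit full directory.

    Assumes one presence bit per processor plus one dirty bit per block.
    """
    bits_per_entry = num_processors + 1
    return bits_per_entry / (block_bytes * 8)


overhead_64 = full_directory_overhead(64, 64)     # 65 / 512
overhead_256 = full_directory_overhead(256, 64)   # 257 / 512
```

With 64-byte blocks the overhead is roughly 13% of memory for 64 processors and about 50% for 256. That linear growth is the motivation for limited and linked-list directory organisations.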

20 Directory Models
- Full Directory
  - Links to all caches for all shared locations
- Limited Directory
  - Links to only some of the caches holding shared data, n < N
- Chained (linked) Directory
  - Link to one cache, and from this cache to the others (single or double link)
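A limited directory keeps only n pointers per block, so it needs a policy for the case where one more sharer arrives than it can record. The sketch below assumes one possible policy, evicting the oldest recorded sharer, whose copy must then be invalidated. Other schemes fall back to broadcast instead. `LimitedEntry` is a hypothetical name.

```python
class LimitedEntry:
    """Directory entry that can point at no more than max_pointers sharers."""

    def __init__(self, max_pointers):
        self.max_pointers = max_pointers
        self.pointers = []            # node ids, oldest first

    def add_sharer(self, node):
        """Record a new sharer.

        Returns the node whose copy must be invalidated to free a pointer,
        or None if there was room.
        """
        if node in self.pointers:
            return None
        victim = None
        if len(self.pointers) == self.max_pointers:
            victim = self.pointers.pop(0)    # assumed policy: evict the oldest
        self.pointers.append(node)
        return victim


entry = LimitedEntry(max_pointers=2)
victims = [entry.add_sharer(n) for n in (0, 1, 2, 1)]
```

Here the third sharer (node 2) forces node 0 out, so `victims` is `[None, None, 0, None]` and the entry ends up pointing at nodes 1 and 2.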

21 Directory Sample (full)

22 Lock-Based Protocols
- New work that promises to be more scalable than directory protocols
- Implements scope consistency, which is similar to lazy release consistency
- Coherence information is exchanged by reading and writing notices from the lock that protects the shared memory
- Currently implemented in software, similar to DSM, but may move to hardware if performance gains can be realized

23 Software Protocols
- Software protocols enforce consistency with limited hardware support by relying either on the compiler or specialized software handlers
- Similar to distributed shared memory (DSM) systems but at a lower level
  - sharing usually in blocks not pages
  - needs to be more efficient for better performance
  - architecture support for sharing

24 Classification of Software Protocols
- Several criteria distinguish software protocols:
  - dynamism: compile-time or run-time analysis
  - selectivity: level of coherence actions
  - restrictiveness: conservative or as-needed consistency enforcement
  - adaptivity: whether the protocol can adapt to access patterns
  - granularity: size and structure of coherence data
  - blocking: program block on which coherence is enforced
  - positioning: position of coherence instructions
  - updating: how memory is updated after a write
  - checking: how incoherence is detected

25 Software Coherence with Limited Hardware Support
- The compiler must generate consistent code, as no hardware coherence is provided
- Hardware maintains time tags, which are updated on every write
- On a read, the compiler generates coherence reads, which check time tags to ensure data is consistent
- Relies on the compiler to detect reads that may be inconsistent, and the hardware must maintain these time tags
- Using tags, it is also possible to perform dynamic self-invalidation of blocks
- Many techniques are based on these time tags
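The idea of a compiler-inserted coherence read can be sketched with a per-address version counter standing in for the time tag. This is a deliberate simplification: published time-tag schemes differ in where tags live and how they are compared, and every name below (`TaggedMemory`, `TaggedCache`, `coherence_read`) is hypothetical.

```python
class TaggedMemory:
    """Memory that bumps a per-address time tag on every write."""

    def __init__(self):
        self.data = {}
        self.tags = {}

    def write(self, addr, value):
        self.data[addr] = value
        self.tags[addr] = self.tags.get(addr, 0) + 1


class TaggedCache:
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}   # addr -> (value, tag at the time of the fetch)

    def _fetch(self, addr):
        self.lines[addr] = (self.memory.data[addr], self.memory.tags[addr])

    def plain_read(self, addr):
        """Ordinary read: trusts whatever is cached."""
        if addr not in self.lines:
            self._fetch(addr)
        return self.lines[addr][0]

    def coherence_read(self, addr):
        """Read the compiler emits where the data might be stale."""
        line = self.lines.get(addr)
        if line is None or line[1] != self.memory.tags[addr]:
            self._fetch(addr)     # tag mismatch: self-invalidate and refetch
        return self.lines[addr][0]


mem = TaggedMemory()
mem.write(0, 1)
cache = TaggedCache(mem)
first = cache.plain_read(0)        # caches value 1
mem.write(0, 2)                    # another processor writes
unchecked = cache.plain_read(0)    # still the old value
checked = cache.coherence_read(0)  # tag check catches the change
```

The compiler's job in such a scheme is to use the cheaper plain read wherever it can prove the data cannot have changed, and the checked read everywhere else.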

26 Software Coherence with Limited Hardware Support (cont.)
- If hardware has no time tags, Petersen and Li developed an algorithm which uses only page translation hardware and page status tables
- Sharing information is maintained by a software handler at the page level
- On a page access or fault, the software handler checks the sharing information, updates page tables, and performs coherence actions
- Slower than hardware, as software handlers involve the OS and are on the critical memory access path

27 Enforcing Coherence by Restricting Parallelism
- Compilers can also guarantee coherence by structuring the language to limit parallelism
  - easier to enforce coherence
  - limits the programmer and potential parallelism
  - simplifies compiler design
  - good performance can be achieved with no hardware support
- Parallel language restrictions include:
  - doall parallel loops
  - master/slave processes

28 Optimizing Compilers
- Optimizing compilers are designed to maintain coherence with limited hardware support without overly restricting the programmer
  - rely on detecting data dependencies
  - may use synchronization variables (locks, barriers)
  - can provide the hardware with hints
  - can detect when coherence is not needed
  - may have problems with dynamic sharing
  - offer good performance, but are hard to design

29 Future Work
- Hardware protocols are well defined, and the directory structure is near optimal
- Cost improvements can be obtained by mass producing cache controller chips
- Software protocols are a good area for future research because they are also applicable at higher levels of sharing (DSM, databases, ...)
- Optimizing compilers need to be improved to detect data dependencies and optimize code for the parallel environment

30 Conclusions
- Hardware protocols offer the best performance but incur high hardware costs
- Software protocols can be used when there is no hardware support, with a slight performance penalty
- Optimizing compilers can enforce coherence or provide hints to the hardware
- A combination of hardware and compiler optimizations is the best

