1 ECE200 – Computer Organization Chapter 9 – Multiprocessors

2 What we’ll cover today
- Multiprocessor motivation
- Multiprocessor organizations
- Shared memory multiprocessors
  - Cache coherence
  - Synchronization

3 Multiprocessor motivation, part 1
- Many scientific applications take too long to run on a single-processor machine
  - Modeling of weather patterns, astrophysics, chemical reactions, ocean currents, etc.
- Many of these are parallel applications, largely consisting of loops that operate on independent data
- Such applications can make efficient use of a multiprocessor machine, with each loop iteration running on a different processor and operating on independent data

4 Multiprocessor motivation, part 2
- Many multi-user environments require more compute power than is available from a single-processor machine
  - Airline reservation systems, department store chain inventory systems, file servers for a large department, web servers for a major corporation, etc.
- These workloads consist largely of parallel transactions that operate on independent data
- Such applications can make efficient use of a multiprocessor machine, with each transaction running on a different processor and operating on independent data

5 Multiprocessor organizations
- Shared memory multiprocessors
  - All processors share the same memory address space
  - Single copy of the OS (although some parts may be parallel)
  - Relatively easy to program and to port sequential code to
  - Difficult to scale to large numbers of processors
[Block diagram: uniform memory access (UMA) machine]

6 Multiprocessor organizations
- Distributed memory multiprocessors
  - Processors have their own memory address space
  - Message passing is used to access another processor’s memory
  - Multiple copies of the OS
  - Usually commodity hardware and network (e.g., Ethernet)
  - More difficult to program
  - Easier to scale the hardware, and more inherently fault resilient

7 Multiprocessor variants
- Non-uniform memory access (NUMA) shared memory multiprocessors
  - All memory can be addressed by all processors, but access to a processor’s own local memory is faster than access to another processor’s remote memory
  - Looks like a distributed machine, but the interconnection network is usually custom-designed switches and/or buses

8 Multiprocessor variants
- Distributed shared memory (DSM) multiprocessors
  - Commodity hardware of a distributed memory multiprocessor, but all processors have the illusion of shared memory
  - The operating system handles accesses to remote memory “transparently” on behalf of the application
  - Relieves the application developer of the burden of memory management across the network

9 Multiprocessor variants
- Shared memory machines connected together over a network (operating as a distributed memory or DSM machine)
[Diagram: shared memory machines, each with a network controller, joined by a network]

10 Shared memory multiprocessors
- Major design issues
  - Cache coherence: ensuring that stores to cached data are seen by other processors
  - Synchronization: the coordination among processors accessing shared data
  - Memory consistency: the definition of when a processor must observe a write from another processor

11-13 Cache coherence problem
Two writeback caches becoming incoherent:
(1) CPU 0 reads block A
(2) CPU 1 reads block A
(3) CPU 0 writes block A; the copies of block A in CPU 1’s cache and in main memory are now old, out-of-date copies
[Diagram: contents of CPU 0’s cache, CPU 1’s cache, and main memory after each step]

14 Cache coherence protocols
- Ensure that writes to cached blocks are observable by all processors
- Assign a state field to every cached block
- Define actions for performing reads and writes to blocks in each state that ensure cache coherence
- In a real machine with a split-transaction bus, the actions are much more complicated than described here

15 MESI cache coherence protocol
- Commonly used (or a variant thereof) in shared memory multiprocessors
- The idea is that when a cache wants to write to a cache block, other remote caches must invalidate their copies first
- Each cache block is in one of four states (2 bits stored with each cache block); see the C sketch below
  - Invalid: contents are not valid
  - Shared: other processor caches may have the same copy; main memory has the same copy
  - Exclusive: no other processor cache has a copy; main memory has the same copy
  - Modified: no other processor cache has a copy; main memory has an old copy
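
The four states map naturally onto a small enumeration. A minimal sketch in C (the type name and the numeric encoding here are illustrative, not from the slides):

```c
/* The four MESI states; 2 bits suffice, stored with each cache block. */
typedef enum {
    INVALID   = 0,  /* contents are not valid */
    SHARED    = 1,  /* other caches may have the same copy; memory is up to date */
    EXCLUSIVE = 2,  /* no other cache has a copy; memory is up to date */
    MODIFIED  = 3   /* no other cache has a copy; memory has an old copy */
} mesi_state_t;
```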

16 MESI cache coherence protocol
- Actions on a load that results in a cache hit
  - Local cache actions: read the block
  - Remote cache actions: none
- Actions on a load that results in a cache miss
  - Local cache actions: request the block from the bus; if not in a remote cache, set the state to Exclusive; if also in a remote cache, set the state to Shared
  - Remote cache actions: look up the cache tags to see if the block is present; if so, signal the local cache that we have a copy, provide it if it is in state Modified, and change the state of our copy to Shared

17 MESI cache coherence protocol
- Actions on a store that results in a cache hit
  - Local cache actions: check the state of the block; if Shared, send an Invalidation bus command to all remote caches; write the block and change the state to Modified
  - Remote cache actions: upon receipt of an Invalidation command on the bus, look up the cache tags to see if the block is present; if so, change the state of the block to Invalid
- Actions on a store that results in a cache miss
  - Local cache actions: simultaneously request the block from the bus and send an Invalidation command; after the block is received, write the block and set the state to Modified
  - Remote cache actions: look up the cache tags to see if the block is present; if so, signal the local cache that we have a copy, provide it if it is in state Modified, and change the state of our copy to Invalid
(These load and store actions are sketched as C functions below)
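
Putting slides 16 and 17 together, here is a simplified sketch of the transitions as C functions, ignoring the split-transaction-bus complications the slides warn about and assuming a single atomic bus. The function names and the two bus helpers (broadcast_invalidate, supply_block) are hypothetical stand-ins for real bus operations:

```c
typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_state_t;  /* as above */

/* Hypothetical bus helpers, stubbed out for the sketch. */
static void broadcast_invalidate(void) { /* put an Invalidation command on the bus */ }
static void supply_block(void)         { /* forward our Modified copy to the requester */ }

/* Local cache: new state of a block after a load.
   shared_elsewhere reports whether any remote cache signaled that it has a copy. */
static mesi_state_t on_load(mesi_state_t s, int shared_elsewhere)
{
    if (s != INVALID)          /* cache hit: just read the block, no state change */
        return s;
    /* cache miss: request the block from the bus */
    return shared_elsewhere ? SHARED : EXCLUSIVE;
}

/* Local cache: new state of a block after a store. */
static mesi_state_t on_store(mesi_state_t s)
{
    if (s == SHARED || s == INVALID)   /* a miss also fetches the block (omitted here) */
        broadcast_invalidate();        /* remote copies must be invalidated first */
    return MODIFIED;                   /* write the block; we hold the only valid copy */
}

/* Remote cache: new state after snooping a bus request for a block. */
static mesi_state_t on_snoop(mesi_state_t s, int request_is_store)
{
    if (s == INVALID)
        return INVALID;                /* block not present: nothing to do */
    if (s == MODIFIED)
        supply_block();                /* memory's copy is stale, so we provide the block */
    return request_is_store ? INVALID : SHARED;
}
```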

18-21 Cache coherence problem revisited
(1) CPU 0 reads block A; CPU 0’s copy is in state Exclusive
(2) CPU 1 reads block A; both copies are now in state Shared
(3) CPU 0’s cache sends an Invalidate command for block A; CPU 1’s copy becomes Invalid while CPU 0’s copy is still Shared
(4) CPU 0 writes block A; CPU 0’s copy becomes Modified
[Diagram: MESI state of block A in CPU 0’s cache and CPU 1’s cache after each step]

22 Synchronization
- For parallel programs to share data, we must make sure that accesses to a given memory location are ordered
  - Example: a database of available inventory at a department store is simultaneously accessed from different store computers; only one computer must “win the race” to reserve a particular item
- Solution (sketched in C below)
  - The architecture defines a special atomic swap instruction in which a memory location is tested for 0 and, if so, is set to 1
  - Software associates a lock variable with each piece of data that needs to be ordered (e.g., a particular class of merchandise) and uses the atomic swap instruction to try to set it
  - Software acquires the lock before modifying the associated data (e.g., reserving the merchandise)
  - Software releases the lock by setting it to 0 when done
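
A minimal C sketch of this lock, using C11 <stdatomic.h> in place of a machine-specific atomic swap instruction; the names spinlock_t, lock_acquire, and lock_release are illustrative:

```c
#include <stdatomic.h>

typedef atomic_int spinlock_t;   /* 0 = free, 1 = held */

void lock_acquire(spinlock_t *lock)
{
    /* atomic_exchange writes 1 and returns the old value in one
       indivisible step, like the atomic swap instruction: keep
       retrying ("spinning") until the old value was 0. */
    while (atomic_exchange(lock, 1) != 0)
        ;  /* spin */
}

void lock_release(spinlock_t *lock)
{
    atomic_store(lock, 0);       /* set to 0 so another swap can succeed */
}
```

A store computer would then wrap its inventory update between lock_acquire and lock_release on a lock variable associated with that merchandise class, so only one processor at a time can reserve the item.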

23 Synchronization flowchart
[Flowchart: try the atomic swap; if the lock was already set, loop back and retry (“spinning”); once the swap succeeds, proceed]

24 Synchronization and coherence example

25 Questions?

