Presentation is loading. Please wait.

Presentation is loading. Please wait.

The University of Adelaide, School of Computer Science

Similar presentations


Presentation on theme: "The University of Adelaide, School of Computer Science"— Presentation transcript:

1 The University of Adelaide, School of Computer Science
Computer Architecture A Quantitative Approach, Fifth Edition The University of Adelaide, School of Computer Science 28 April 2017 Chapter 5 Multiprocessors and Thread-Level Parallelism Chapter 2 — Instructions: Language of the Computer

2 The University of Adelaide, School of Computer Science
28 April 2017 Introduction Introduction Thread-Level parallelism Have multiple program counters Uses MIMD model Targeted for tightly-coupled shared-memory multiprocessors Threads may be of same program: parallel processing or different ones : request level parallelism For n (2-32) processors, need n threads produced by software (programmer or OS). Amount of computation assigned to each thread = grain size Chapter 2 — Instructions: Language of the Computer

3 The University of Adelaide, School of Computer Science
28 April 2017 Types Introduction Symmetric multiprocessors (SMP) Usually multicore processors Small number of cores Share single memory with uniform memory latency Distributed shared memory (DSM) Memory distributed among processors Non-uniform memory access/latency (NUMA) Processors connected via direct (switched) and non-direct (multi-hop) interconnection networks Address space is shared by all processors. Chapter 2 — Instructions: Language of the Computer

4 Challenges of parallel processing
Limited parallelism in programs

5 Challenges of parallel processing
2) High communication cost (35 to 50 cycles on a single chip & 100 to 500 between multiple chips)

6 The University of Adelaide, School of Computer Science
28 April 2017 Cache Coherence Copies of shared data are provided in private caches to reduce latency and increase BW. Processors may see different values through their private caches: Centralized Shared-Memory Architectures Chapter 2 — Instructions: Language of the Computer

7 The University of Adelaide, School of Computer Science
28 April 2017 Cache Coherence Coherency All reads by any processor must return the most recently written value Writes to the same location by any two processors are seen in the same order by all processors Cache coherence protocols (hardware based) Directory based (mostly on DSMs & on recent multicores) Sharing status of each block kept in one location called directory in the shared memory. Snooping (mostly on SMPs) Each core tracks sharing status of each block that it contains, by looking at the common bus. Centralized Shared-Memory Architectures Chapter 2 — Instructions: Language of the Computer

8 Snooping Coherence Protocols
The University of Adelaide, School of Computer Science 28 April 2017 Snooping Coherence Protocols Write invalidate On write, invalidate all other copies Use bus itself to serialize Write cannot complete until bus access is obtained Write update On write, update all copies of a shared item. BW hungry, less popular Centralized Shared-Memory Architectures Chapter 2 — Instructions: Language of the Computer

9 Write invalidation protocol
The University of Adelaide, School of Computer Science 28 April 2017 Write invalidation protocol Snooping is done on cache tags. Block valid bit is used for invalidation. Locating an item when a read miss occurs In write-back cache, the updated value must be sent to the requesting processor from the dirty L1 or L2 cache. Cache lines marked as shared/exclusive Only writes to shared lines need an invalidate broadcast After this, the line is marked as exclusive If a copy of data is sent to another processor it is changed to shared. Centralized Shared-Memory Architectures Chapter 2 — Instructions: Language of the Computer

10 The University of Adelaide, School of Computer Science
28 April 2017 ESI or MSI protocol Centralized Shared-Memory Architectures Chapter 2 — Instructions: Language of the Computer

11 Snooping Coherence Protocols
The University of Adelaide, School of Computer Science 28 April 2017 Snooping Coherence Protocols Extensions: MESI: Use exclusive state to indicate clean block in only one cache and modified to indicate a write on a shared block. Prevents needing to invalidate a write to an exclusive block MOESI: Owned state to prevent write-back to memory when a miss is occurred. The block is labeled as shared in other cores and as owned in the owner. The shared memory version is out of date. Limitations: Shared memory access, bus arbitration and coherency control Useful for up to 8 core single chip multiprocessors. Centralized Shared-Memory Architectures Chapter 2 — Instructions: Language of the Computer

12 The University of Adelaide, School of Computer Science
28 April 2017 Performance Coherence influences cache miss rate Coherence misses True sharing misses Write to shared block (transmission of invalidation) Read an invalidated block False sharing misses Read an unmodified word in an invalidated block Performance of Symmetric Shared-Memory Multiprocessors Chapter 2 — Instructions: Language of the Computer

13

14 Performance Study: Commercial Workload
The University of Adelaide, School of Computer Science 28 April 2017 Performance Study: Commercial Workload Performance of Symmetric Shared-Memory Multiprocessors Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

15 Performance Study: Commercial Workload
The University of Adelaide, School of Computer Science 28 April 2017 Performance Study: Commercial Workload Performance of Symmetric Shared-Memory Multiprocessors Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

16 Performance Study: Commercial Workload
The University of Adelaide, School of Computer Science 28 April 2017 Performance Study: Commercial Workload Performance of Symmetric Shared-Memory Multiprocessors Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

17 The University of Adelaide, School of Computer Science
28 April 2017 Directory Protocols Directory keeps track of every block Which caches have each block Dirty status of each block Implement in shared L3 cache Keep bit vector of size = # cores for each block in L3 Not scalable beyond shared L3 Implement in a distributed fashion: Distributed Shared Memory and Directory-Based Coherence Chapter 2 — Instructions: Language of the Computer

18 The University of Adelaide, School of Computer Science
28 April 2017 Directory Protocols For each block, maintain state: Shared One or more nodes have the block cached, value in memory is up-to-date Set of node IDs Uncached Modified Exactly one node has a copy of the cache block, value in memory is out-of-date Owner node ID Directory maintains block states and sends invalidation messages Distributed Shared Memory and Directory-Based Coherence Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

19 Directory Protocols- directory state transition
The University of Adelaide, School of Computer Science 28 April 2017 Directory Protocols- directory state transition Distributed Shared Memory and Directory-Based Coherence Chapter 2 — Instructions: Language of the Computer

20 The University of Adelaide, School of Computer Science
28 April 2017 Directory Protocols For uncached block: Read miss Requesting node is sent the requested data and is made the only sharing node, block is now shared Write miss The requesting node is sent the requested data and becomes the sharing node, block is now exclusive For shared block: The requesting node is sent the requested data from memory, node is added to sharing set The requesting node is sent the value, all nodes in the sharing set are sent invalidate messages, sharing set only contains requesting node, block is now exclusive Distributed Shared Memory and Directory-Based Coherence Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

21 The University of Adelaide, School of Computer Science
28 April 2017 Directory Protocols For exclusive block: Read miss The owner is sent a data fetch message, block becomes shared, owner sends data to the directory, data written back to memory, sharers set contains old owner and requestor Data write back Block becomes uncached, sharer set is empty Write miss Message is sent to old owner to invalidate and send the value to the directory, requestor becomes new owner, block remains exclusive Distributed Shared Memory and Directory-Based Coherence Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

22 Directory Protocols- block state transition
The University of Adelaide, School of Computer Science 28 April 2017 Directory Protocols- block state transition Distributed Shared Memory and Directory-Based Coherence Copyright © 2012, Elsevier Inc. All rights reserved. Chapter 2 — Instructions: Language of the Computer

23 Putting it all together

24 Core-i7 performance


Download ppt "The University of Adelaide, School of Computer Science"

Similar presentations


Ads by Google