SYNAR Systems Networking and Architecture Group CMPT 886: Computer Architecture Primer Dr. Alexandra Fedorova School of Computing Science SFU.

1 SYNAR Systems Networking and Architecture Group CMPT 886: Computer Architecture Primer Dr. Alexandra Fedorova School of Computing Science SFU

2 Outline
Caches
Branch prediction
Out-of-order execution
Instruction Level Parallelism

3 Caches
Level 1 / Level 2 / Level 3
Instruction/Data or unified

4 Direct-Mapped Cache
Line size = 32 bytes
Cache eviction
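A minimal sketch of how a direct-mapped cache with 32-byte lines could map an address to exactly one slot, and why a conflicting address forces an eviction. The line size comes from the slide; the number of lines (`NUM_LINES = 256`) and the names `split_address` and `DirectMappedCache` are assumptions made for illustration.

```python
LINE_SIZE = 32    # bytes per cache line, as on the slide
NUM_LINES = 256   # assumed number of lines (an 8 KiB cache); not from the slides

OFFSET_BITS = LINE_SIZE.bit_length() - 1   # 5 bits select the byte within a line
INDEX_BITS = NUM_LINES.bit_length() - 1    # 8 bits select the line


def split_address(addr):
    """Split a byte address into (tag, index, offset)."""
    offset = addr & (LINE_SIZE - 1)
    index = (addr >> OFFSET_BITS) & (NUM_LINES - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class DirectMappedCache:
    """Each address can live in exactly one line, chosen by its index bits."""

    def __init__(self):
        self.tags = [None] * NUM_LINES

    def access(self, addr):
        """Return True on a hit; on a miss, evict whatever occupied the slot."""
        tag, index, _ = split_address(addr)
        if self.tags[index] == tag:
            return True
        self.tags[index] = tag   # eviction: the old tag is simply overwritten
        return False
```

With these assumed sizes, addresses 0 and 8192 share index 0 but have different tags, so accessing them alternately evicts the line every time.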

5 Set-Associative Cache
4-way set associative cache
The data can go into any of the four locations
When the entire set is full, which line should we replace?
LRU – least recently used (LRU stack)
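The LRU replacement choice within one 4-way set can be sketched as follows. This is an illustrative software model, not how hardware implements the LRU stack; the class name `LRUSet` and the use of `OrderedDict` to keep recency order are assumptions.

```python
from collections import OrderedDict


class LRUSet:
    """One set of a set-associative cache with LRU replacement."""

    def __init__(self, ways=4):
        self.ways = ways
        # Keys are tags, ordered from least to most recently used.
        self.lines = OrderedDict()

    def access(self, tag):
        """Return (hit, evicted_tag); evicted_tag is None if nothing was evicted."""
        if tag in self.lines:
            self.lines.move_to_end(tag)   # now the most recently used
            return True, None
        evicted = None
        if len(self.lines) == self.ways:
            # Set is full: drop the least recently used line.
            evicted, _ = self.lines.popitem(last=False)
        self.lines[tag] = None
        return False, evicted
```

After filling the set with tags A, B, C, D and touching A again, a new tag E evicts B, because B is then the least recently used line.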

6 Cache Hit/Miss
Cache hit – the data is found in the cache
Cache miss – the data is not in the cache
Miss rate:
– misses per instruction
– misses per cycle
– misses per access (also miss ratio)
Hit rate:
– the complement (per access: hit ratio = 1 − miss ratio)
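The three miss-rate definitions differ only in the denominator. A small sketch, assuming the four counts are available (for example from hardware performance counters); the function name `miss_metrics` and the sample numbers in the usage note are hypothetical.

```python
def miss_metrics(misses, accesses, instructions, cycles):
    """Compute the miss-rate variants listed on the slide from raw counts."""
    miss_ratio = misses / accesses
    return {
        "misses_per_instruction": misses / instructions,
        "misses_per_cycle": misses / cycles,
        "miss_ratio": miss_ratio,        # misses per access
        "hit_ratio": 1 - miss_ratio,     # the complement, per access
    }
```

For instance, 50 misses over 1000 accesses, 4000 instructions and 8000 cycles gives a miss ratio of 0.05 but only 0.0125 misses per instruction, so it matters which definition a paper or tool reports.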

7 Cache Miss Latency
How long you have to wait if you miss in the cache
Miss in L1 → L2 latency (~20 cycles)
Miss in L2 → memory latency (~300 cycles) (if there is no L3)
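One common way to combine these latencies is the textbook average memory access time model, sketched below. The ~20 and ~300 cycle figures are from the slide; the L1 hit time, both miss ratios, and the assumption that the memory latency is paid on top of the L2 latency are illustrative choices, not values from the slides.

```python
def average_access_time(l1_hit, l1_miss_ratio, l2_latency, l2_miss_ratio, mem_latency):
    """Average cycles per access for a two-level cache hierarchy.

    l2_miss_ratio is the local ratio: the fraction of L2 accesses that miss.
    """
    l2_penalty = l2_latency + l2_miss_ratio * mem_latency
    return l1_hit + l1_miss_ratio * l2_penalty


# Assumed: 1-cycle L1 hit, 5% L1 miss ratio, 20% local L2 miss ratio.
amat = average_access_time(1, 0.05, 20, 0.2, 300)
```

Under these assumed ratios the average comes out to about 5 cycles per access, even though 95% of accesses hit in L1, which shows how heavily the rare trips to memory weigh.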

8 Writing in Cache
Write through – write to the cache and directly to memory
Write back – write to the cache only; write to memory later, when the line is evicted
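The difference in memory traffic between the two policies can be sketched with a toy model. It assumes a small fully associative LRU cache that allocates a line on every store; the function name `memory_writes` and the `capacity` parameter are hypothetical, and dirty lines still in the cache at the end are not flushed.

```python
from collections import OrderedDict


def memory_writes(stores, policy, capacity=2):
    """Count writes that reach memory for a sequence of stored line addresses.

    policy is "write-through" or "write-back".
    """
    cache = OrderedDict()   # line -> dirty flag, in LRU order
    writes = 0
    for line in stores:
        if policy == "write-through":
            writes += 1                      # every store goes to memory at once
        if line in cache:
            cache.move_to_end(line)
        elif len(cache) == capacity:
            _, dirty = cache.popitem(last=False)
            if dirty:
                writes += 1                  # write-back happens on eviction
        cache[line] = (policy == "write-back")
    return writes
```

For the store sequence `[0, 0, 0, 1, 2]`, write-through sends five writes to memory, while write-back sends one (line 0 is written once, when it is evicted).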

9 Caches on Multiprocessor Systems
[Figure labels: bus, cache, memory, cache. © Herlihy-Shavit 2007]

10 Processor Issues Load Request
[Figure labels: bus, cache, memory, cache, data. © Herlihy-Shavit 2007]

11 Another Processor Issues Load Request
Bus messages: "I want data", "I got data"
[Figure labels: bus, cache, memory, cache, data. © Herlihy-Shavit 2007]

12 Processor Modifies Data
Now other copies are invalid
[Figure labels: memory, bus, cache, data. © Herlihy-Shavit 2007]

13 Send Invalidation Message to Others
Bus message: "Invalidate!"
Other caches lose read permission
No need to change now: other caches can provide valid data
[Figure labels: memory, bus, cache, data. © Herlihy-Shavit 2007]

14 Processor Asks for Data
Bus message: "I want data"
[Figure labels: memory, bus, cache, data. © Herlihy-Shavit 2007]
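The sequence on slides 9 to 14 (load, second load served by another cache, modify, invalidate, load again) can be sketched as a toy invalidation protocol. This is a simplified model under stated assumptions, not the protocol of any particular machine: the class names `Bus` and `Cache`, the two line states, and the choice to write a modified line back to memory when another cache requests it are all assumptions.

```python
class Bus:
    """Shared bus connecting all caches to memory."""

    def __init__(self):
        self.memory = {}
        self.caches = []


class Cache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}            # addr -> (value, "shared" or "modified")
        bus.caches.append(self)

    def load(self, addr):
        if addr in self.lines:
            return self.lines[addr][0]
        # Miss: broadcast "I want data"; a cache holding the line answers.
        for other in self.bus.caches:
            if other is not self and addr in other.lines:
                value, state = other.lines[addr]
                if state == "modified":
                    # Assumption: the owner writes back and keeps a shared copy.
                    self.bus.memory[addr] = value
                    other.lines[addr] = (value, "shared")
                break
        else:
            value = self.bus.memory.get(addr, 0)   # nobody has it: go to memory
        self.lines[addr] = (value, "shared")
        return value

    def store(self, addr, value):
        # Broadcast "Invalidate!": other caches lose their copies.
        for other in self.bus.caches:
            if other is not self:
                other.lines.pop(addr, None)
        # Memory is not updated now; this cache can supply the valid data.
        self.lines[addr] = (value, "modified")
```

In this model, after one cache stores a new value, memory still holds the stale value until another cache loads the line and is served by the modifying cache.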

15 Shared Caches
Filled on demand
No control over cache shares
An aggressive thread can grab a large cache share, hurt others
[Figure labels: Thread 1, Thread 2]
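A rough sketch of how an aggressive thread can crowd out another in a demand-filled shared cache. It assumes a fully associative LRU cache and a synthetic access pattern in which thread 2 streams through new lines three times as fast as thread 1 reuses its four lines; the function names and all parameters are hypothetical.

```python
from collections import OrderedDict


def make_trace(rounds):
    """Thread 1 cycles over 4 lines; thread 2 touches 3 new lines per round."""
    trace = []
    for i in range(rounds):
        trace.append((1, i % 4))
        for k in range(3):
            trace.append((2, 3 * i + k))
    return trace


def shared_cache_occupancy(trace, capacity):
    """Return how many lines each thread holds after replaying the trace."""
    cache = OrderedDict()   # (thread, line) -> None, in LRU order
    for key in trace:
        if key in cache:
            cache.move_to_end(key)
        else:
            if len(cache) == capacity:
                cache.popitem(last=False)   # evict LRU line, whoever owns it
            cache[key] = None
    occupancy = {}
    for thread, _ in cache:
        occupancy[thread] = occupancy.get(thread, 0) + 1
    return occupancy
```

With an 8-line cache and `make_trace(40)`, thread 2 ends up holding 6 lines and thread 1 only 2, although thread 1's whole working set of 4 lines would fit if it ran alone.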

16 NUMA Systems
[Figure: four NUMA domains (0 to 3) and four memory nodes (0 to 3); each domain has a shared L3 cache and cores with private L1, L2 caches (Core 0 to Core 15 in total, e.g. Cores 0, 4, 8, 12 in NUMA Domain 0); MC and HT blocks sit next to the memory nodes. Legend: Data; T = Threads]

17 Outline
Caches
Branch prediction
Out-of-order execution
Instruction Level Parallelism

18 Branching and CPU Pipeline
[Figure: CPU pipeline]

19 Branching Hurts Pipelining

20 Branch Prediction
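The slide body did not survive in this transcript, so the scheme the lecture described is unknown. As one common textbook example of branch prediction, here is a sketch of a single 2-bit saturating counter; the class name, the initial counter value, and the test patterns are assumptions.

```python
class TwoBitPredictor:
    """2-bit saturating counter: 0, 1 predict not taken; 2, 3 predict taken."""

    def __init__(self):
        self.counter = 2   # assumed starting state: weakly taken

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        """Move the counter toward the actual outcome, saturating at 0 and 3."""
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def accuracy(outcomes):
    """Fraction of branches predicted correctly over a list of booleans."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)
```

A loop branch that is taken nine times and then falls through, repeated ten times, is predicted correctly 90% of the time by this counter, while a strictly alternating branch is predicted correctly only half the time.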

21 Outline
Caches
Branch prediction
Out-of-order execution
Instruction Level Parallelism

22 Out-of-order Execution
Modern CPUs are superscalar
They can issue more than one instruction per clock cycle
If consecutive instructions depend on each other, instruction-level parallelism is limited
To keep the processor going at full speed, issue instructions out of order
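The benefit of issuing out of order can be sketched with a toy issue model. It assumes a 2-wide machine, single-cycle instructions, an unlimited instruction window, and that an instruction can issue only in a cycle after all its producers have issued; the function names and the dependency encoding are hypothetical.

```python
def cycles_in_order(deps, width=2):
    """deps[i] is the set of earlier instruction indices that i depends on."""
    issued_at = {}
    cycle = 0
    i = 0
    while i < len(deps):
        cycle += 1
        slots = width
        # Issue in program order; stop at the first instruction that is not ready.
        while (i < len(deps) and slots > 0
               and all(issued_at[d] < cycle for d in deps[i])):
            issued_at[i] = cycle
            i += 1
            slots -= 1
    return cycle


def cycles_out_of_order(deps, width=2):
    """Same machine, but any ready instruction may issue, oldest first."""
    issued_at = {}
    cycle = 0
    while len(issued_at) < len(deps):
        cycle += 1
        ready = [i for i in range(len(deps))
                 if i not in issued_at and all(d in issued_at for d in deps[i])]
        for i in ready[:width]:
            issued_at[i] = cycle
    return cycle


# Two independent chains of three dependent instructions each.
program = [set(), {0}, {1}, set(), {3}, {4}]
```

For `program`, in-order issue takes 5 cycles because the first chain blocks the second, while out-of-order issue overlaps the two chains and finishes in 3.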

23 Speculative Execution
Out-of-order execution is limited to basic blocks
To go beyond basic blocks, use speculative execution

24 Outline
Caches
Branch prediction
Out-of-order execution
Instruction Level Parallelism

25 Instruction-Level Parallelism
Many programs fail to keep the processor busy
– Code with lots of loads
– Code with frequent and unpredictable branches
CPU cycles are wasted: power is consumed, no useful work is done
Running multiple threads on the chip helps with this

