Lecture 21: Memory Hierarchy

Lecture 21: Memory Hierarchy
Today’s topics:
- Cache organization
- Cache hits/misses

Cache Hierarchies
- Data and instructions are stored on DRAM chips – DRAM is a technology that has high bit density, but relatively poor latency – an access to data in memory can take as many as 300 cycles today!
- Hence, some data is stored on the processor in a structure called the cache – caches employ SRAM technology, which is faster, but has lower bit density.
- Internet browsers also cache web pages – same concept.

Memory Hierarchy
As you go further from the processor, capacity and latency increase:
- Registers: 1 KB, 1 cycle
- L1 data or instruction cache: 32 KB, 2 cycles
- L2 cache: 2 MB, 15 cycles
- Memory: 1 GB, 300 cycles
- Disk: 80 GB, 10M cycles

Locality
Why do caches work?
- Temporal locality: if you used some data recently, you will likely use it again.
- Spatial locality: if you used some data recently, you will likely access its neighbors.
- No hierarchy: average access time for data = 300 cycles.
- 32 KB 1-cycle L1 cache with a 95% hit rate: average access time = 0.95 x 1 + 0.05 x (301) = 16 cycles.
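
A quick sanity check of the arithmetic above, as a minimal C sketch (the formula and numbers are taken from the slide; the variable names are ours):

```c
#include <stdio.h>

/* Average memory access time for a single cache level:
 * AMAT = hit_rate * hit_time + miss_rate * (hit_time + miss_penalty) */
int main(void) {
    double hit_rate = 0.95;       /* L1 hit rate from the slide */
    double hit_time = 1.0;        /* 1-cycle L1 */
    double miss_penalty = 300.0;  /* cycles to reach memory on a miss */
    double amat = hit_rate * hit_time +
                  (1.0 - hit_rate) * (hit_time + miss_penalty);
    printf("AMAT = %.2f cycles\n", amat);  /* prints 16.00 */
    return 0;
}
```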

Accessing the Cache
- Byte address: 101000
- Direct-mapped cache: each address maps to a unique cache location.
- 8-byte words give 3 offset bits; 8 words in the cache give 3 index bits.
- (Figure: the index bits select one of the sets in the data array; the offset bits select a byte within the word.)

The Tag Array
- Byte address: 101000
- Direct-mapped cache: each address maps to a unique cache location.
- The address bits above the index are stored as the tag; on an access, the tag stored for the indexed set is compared against the tag bits of the address.
- (Figure: tag array alongside the data array, with a comparator on the tag.)
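
To make the split concrete, here is a minimal sketch of a direct-mapped lookup for the cache pictured on these slides (8 sets of 8-byte words, so 3 offset bits and 3 index bits); the struct and function names are our own:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define OFFSET_BITS 3                  /* 8-byte words */
#define INDEX_BITS  3                  /* 8 sets */
#define NUM_SETS    (1u << INDEX_BITS)

struct line { bool valid; uint32_t tag; uint8_t data[1 << OFFSET_BITS]; };
static struct line cache[NUM_SETS];

/* Hit iff the indexed line is valid and its stored tag matches. */
bool lookup(uint32_t addr) {
    uint32_t index = (addr >> OFFSET_BITS) & (NUM_SETS - 1);
    uint32_t tag   = addr >> (OFFSET_BITS + INDEX_BITS);
    return cache[index].valid && cache[index].tag == tag;
}

int main(void) {
    uint32_t addr = 40;  /* 101000 in binary, the slide's example address */
    printf("index=%u tag=%u -> %s\n",
           (unsigned)((addr >> OFFSET_BITS) & (NUM_SETS - 1)),
           (unsigned)(addr >> (OFFSET_BITS + INDEX_BITS)),
           lookup(addr) ? "hit" : "miss (cache starts cold)");
    return 0;
}
```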

Example Access Pattern
- Assume that addresses are 8 bits long.
- How many of the following address requests are hits/misses?
  4, 7, 10, 13, 16, 68, 73, 78, 83, 88, 4, 7, 10…
- Same cache as before: direct-mapped, 8-byte words; each address maps to a unique cache location.
- (Figure: tag array, data array, and comparator, as on the previous slide.)
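
One way to check your answer is to replay the trace in code; this sketch runs the listed accesses against the same direct-mapped cache (8 sets of 8-byte blocks), assuming it starts empty and allocates on every miss:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define OFFSET_BITS 3
#define INDEX_BITS  3
#define NUM_SETS    (1u << INDEX_BITS)

int main(void) {
    bool valid[NUM_SETS] = { false };
    uint32_t tags[NUM_SETS] = { 0 };
    uint32_t trace[] = { 4, 7, 10, 13, 16, 68, 73, 78, 83, 88, 4, 7, 10 };
    int n = sizeof trace / sizeof trace[0];
    int hits = 0, misses = 0;

    for (int i = 0; i < n; i++) {
        uint32_t index = (trace[i] >> OFFSET_BITS) & (NUM_SETS - 1);
        uint32_t tag   = trace[i] >> (OFFSET_BITS + INDEX_BITS);
        if (valid[index] && tags[index] == tag) {
            hits++;
            printf("%3u: hit\n", (unsigned)trace[i]);
        } else {
            misses++;                /* allocate the block on a miss */
            valid[index] = true;
            tags[index]  = tag;
            printf("%3u: miss\n", (unsigned)trace[i]);
        }
    }
    printf("%d hits, %d misses\n", hits, misses);
    return 0;
}
```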

Increasing Line Size
- A large cache line size → smaller tag array, fewer misses because of spatial locality.
- Byte address: 10100000, with a 32-byte cache line size (block size).
- (Figure: the address split into tag and offset bits, feeding the tag array and data array.)

Associativity
- Set associativity → fewer conflicts; wasted power because multiple data and tags are read.
- Byte address: 10100000
- (Figure: two ways, Way-1 and Way-2, in the tag and data arrays, with a comparator per way.)

Associativity
- How many offset/index/tag bits if the cache has 64 sets, each set has 64 bytes, 4 ways?
- Byte address: 10100000
- (Figure: same two-way organization as the previous slide, with tag array, data array, and comparator.)
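
A hedged worked answer, since "each set has 64 bytes" can be read two ways: with 64 sets, the index is log2(64) = 6 bits either way. If the 64 bytes are the block size of each way, the offset is log2(64) = 6 bits and, assuming a 32-bit address, the tag is 32 - 6 - 6 = 20 bits. If instead the 64 bytes are shared across the 4 ways (16-byte blocks), the offset is log2(16) = 4 bits and the tag is 32 - 6 - 4 = 22 bits.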

Example
- 32 KB 4-way set-associative data cache array with 32-byte line size.
- How many sets?
- How many index bits, offset bits, tag bits?
- How large is the tag array?
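
A worked solution as a short sketch; the 32-bit address width is our assumption (the slide does not state one), and the tag array size ignores valid/dirty bits:

```c
#include <stdio.h>

/* Integer log base 2 (inputs here are exact powers of two). */
static int log2u(unsigned x) { int n = 0; while (x >>= 1) n++; return n; }

int main(void) {
    unsigned cache_bytes = 32 * 1024;  /* 32 KB */
    unsigned line_bytes  = 32;
    unsigned ways        = 4;
    unsigned addr_bits   = 32;         /* assumed address width */

    unsigned sets   = cache_bytes / (line_bytes * ways);  /* 256 */
    unsigned offset = log2u(line_bytes);                  /* 5 bits */
    unsigned index  = log2u(sets);                        /* 8 bits */
    unsigned tag    = addr_bits - index - offset;         /* 19 bits */

    unsigned tag_array_bits = sets * ways * tag;  /* one tag per line */
    printf("sets=%u offset=%u index=%u tag=%u\n", sets, offset, index, tag);
    printf("tag array = %u bits (~%.1f KB)\n",
           tag_array_bits, tag_array_bits / 8.0 / 1024.0);
    return 0;
}
```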

Cache Misses
- On a write miss, you may either choose to bring the block into the cache (write-allocate) or not (write-no-allocate).
- On a read miss, you always bring the block in (spatial and temporal locality) – but which block do you replace?
  - No choice for a direct-mapped cache
  - Randomly pick one of the ways to replace
  - Replace the way that was least-recently used (LRU)
  - FIFO replacement (round-robin)
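
To illustrate the LRU option, here is a minimal sketch for one 4-way set; the age-counter scheme and all names are our own, one of several ways LRU can be implemented:

```c
#include <stdint.h>

#define WAYS 4

struct set {
    uint32_t tag[WAYS];
    uint8_t  age[WAYS];  /* 0 = most recently used */
};

/* On a hit or fill of way `used`, age every other way. */
void touch(struct set *s, int used) {
    for (int w = 0; w < WAYS; w++)
        if (w != used && s->age[w] < UINT8_MAX) s->age[w]++;
    s->age[used] = 0;
}

/* Victim for replacement: the way that has gone longest without use. */
int lru_victim(const struct set *s) {
    int victim = 0;
    for (int w = 1; w < WAYS; w++)
        if (s->age[w] > s->age[victim]) victim = w;
    return victim;
}
```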

Writes
- When you write into a block, do you also update the copy in L2?
  - Write-through: every write to L1 → write to L2.
  - Write-back: mark the block as dirty; when the block gets replaced from L1, write it to L2.
- Write-back coalesces multiple writes to an L1 block into one L2 write.
- Write-through simplifies coherency protocols in a multiprocessor system, as the L2 always has a current copy of the data.
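
The contrast between the two policies can be sketched in a few lines; the structure layout and the l2_write() stub below are hypothetical stand-ins, not a real interface:

```c
#include <stdbool.h>
#include <stdint.h>

struct l1_line { bool dirty; uint8_t data[32]; };

/* Stand-in for the L2 write path. */
static void l2_write(uint32_t addr, const uint8_t *data, unsigned len) {
    (void)addr; (void)data; (void)len;  /* a real cache would update L2 */
}

/* Write-through: update L1 and immediately propagate to L2. */
void write_through(struct l1_line *line, uint32_t addr, unsigned off, uint8_t b) {
    line->data[off] = b;
    l2_write(addr, &line->data[off], 1);
}

/* Write-back: update L1 and just mark the line dirty. */
void write_back(struct l1_line *line, unsigned off, uint8_t b) {
    line->data[off] = b;
    line->dirty = true;
}

/* On eviction under write-back, a dirty line is flushed to L2,
 * coalescing all the writes it absorbed into one L2 write. */
void evict(struct l1_line *line, uint32_t block_addr) {
    if (line->dirty) {
        l2_write(block_addr, line->data, sizeof line->data);
        line->dirty = false;
    }
}
```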

Types of Cache Misses
- Compulsory misses: happen the first time a memory word is accessed – the misses for an infinite cache.
- Capacity misses: happen because the program touched many other words before re-touching the same word – the misses for a fully-associative cache.
- Conflict misses: happen because two words map to the same location in the cache – the misses generated while moving from a fully-associative to a direct-mapped cache.
