Multilevel Memory Caches
Prof. Sirer, CS 316, Cornell University

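Storage Hierarchy

Technology      Capacity   Cost/GB   Latency
Tape            1 TB       $0.17     100 s
Disk            300 GB     $0.34     4 ms
DRAM            4 GB       $520      20 ns
SRAM off-chip   512 KB
SRAM on-chip    16 KB      ???       2 ns

Capacity and latency are closely coupled: the larger a memory, the slower it is. Cost per GB moves the other way: the faster the technology, the more each GB costs.
How do we create the illusion of a large and fast memory?
[Figure: hierarchy of levels from tape, disk, and DRAM up to off-chip SRAM and on-chip SRAM.]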

Memory Hierarchy
Principle: hide latency using small, fast memories called caches.
Caches exploit locality:
- Temporal locality: if a memory location is referenced, it is likely to be referenced again in the near future.
- Spatial locality: if a memory location is referenced, nearby locations are likely to be referenced in the near future.

Cache Lookups (Read)
Take the address issued by the processor and search the cache tags to see whether that block is in the cache.
- Hit: the block is in the cache; return the requested data.
- Miss: the block is not in the cache; read the line from memory, evict an existing line from the cache, place the new line in the cache, and return the requested data.
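The read path above can be sketched in C as a small software model of a direct-mapped cache. This is a minimal sketch under assumed parameters (64 lines of 32 bytes, and a 64 KB array named `memory` standing in for DRAM); the names `cache_read`, `hits`, and `misses` are hypothetical, and real hardware does this with parallel logic rather than sequential code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parameters: 64 lines of 32 bytes, i.e. a 2 KB cache. */
#define LINE_BYTES 32u
#define NUM_LINES  64u
#define MEM_BYTES  65536u            /* size of the simulated main memory */

typedef struct {
    bool     valid;                  /* line currently holds a block */
    uint32_t tag;                    /* which block it holds */
    uint8_t  data[LINE_BYTES];       /* copy of that block */
} Line;

static uint8_t  memory[MEM_BYTES];   /* stands in for DRAM */
static Line     cache[NUM_LINES];    /* zero-initialized: all lines invalid */
static unsigned hits, misses;

/* Read one byte at addr through a direct-mapped cache. */
uint8_t cache_read(uint32_t addr) {
    assert(addr < MEM_BYTES);
    uint32_t offset = addr % LINE_BYTES;
    uint32_t index  = (addr / LINE_BYTES) % NUM_LINES;
    uint32_t tag    = addr / (LINE_BYTES * NUM_LINES);
    Line *line = &cache[index];

    if (line->valid && line->tag == tag) {
        hits++;                      /* hit: block already present */
    } else {
        /* Miss: evict whatever occupies this slot (reads only, so nothing
           is written back), then fetch the whole block from memory. */
        misses++;
        memcpy(line->data, &memory[addr - offset], LINE_BYTES);
        line->tag   = tag;
        line->valid = true;
    }
    return line->data[offset];
}
```

A second byte in the same 32-byte block hits, while an address exactly 2 KB away maps to the same line and evicts it.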

Cache Organization
The cache has to be fast and small.
- Gain speed by performing lookups in parallel; this requires die real estate.
- Reduce the hardware required by limiting where in the cache a block may be placed.
Three common designs:
- Fully associative: a block can be anywhere in the cache.
- Direct mapped: a block can be in only one line of the cache.
- Set-associative: a block can be in a few (2 to 8) places in the cache.

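Tags and Offsets
The cache block size determines how an address is divided: the low-order bits select a byte within the block (the offset), and the remaining high-order bits form the tag.
[Figure: a 32-bit virtual address (bits 31..0) split into Tag (bits 31..5) and Offset (bits 4..0) within the block, i.e. 32-byte blocks.]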
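A minimal sketch of the address split, assuming 32-bit addresses and the 32-byte blocks implied by the figure (offset = bits 4..0). The function `split_address` and its `index_bits` parameter are hypothetical; `index_bits = 0` models the fully associative case, where everything above the offset is tag, and a nonzero value adds the index field used by direct-mapped and set-associative caches.

```c
#include <assert.h>
#include <stdint.h>

/* Offset = bits 4..0, so blocks are 2^5 = 32 bytes. */
#define OFFSET_BITS 5u

typedef struct { uint32_t tag, index, offset; } AddrFields;

/* Split a 32-bit address into tag | index | offset.
   index_bits = 0 models a fully associative cache (tag = bits 31..5);
   index_bits > 0 models a cache with 2^index_bits sets. */
AddrFields split_address(uint32_t addr, unsigned index_bits) {
    assert(OFFSET_BITS + index_bits < 32);
    AddrFields f;
    f.offset = addr & ((1u << OFFSET_BITS) - 1);
    f.index  = (addr >> OFFSET_BITS) & ((1u << index_bits) - 1);
    f.tag    = addr >> (OFFSET_BITS + index_bits);
    return f;
}
```

The three fields always reassemble into the original address, which is why a hit needs only the tag comparison: the index already fixed where to look.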

Fully Associative Cache
[Figure: address split into Tag and Offset; every line holds V, Tag, Block; one tag comparator (=) per line; hit encode, line select, word/byte select.]

Direct Mapped Cache
[Figure: address split into Tag, Index, Offset; each line holds V, Tag, Block; a single tag comparator (=).]

2-Way Set-Associative Cache
[Figure: address split into Tag, Index, Offset; each set holds two (V, Tag, Block) entries; two tag comparators (=).]
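The two comparators in the 2-way design can be modeled with a loop over the ways of the selected set. This sketch assumes 32 sets of 32-byte blocks and tracks tags only (no data); the function `access_2way` and the one-number-per-set LRU scheme are illustrative choices, not taken from the slides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WAYS        2u
#define NUM_SETS    32u     /* hypothetical: 32 sets x 2 ways x 32 B = 2 KB */
#define BLOCK_BYTES 32u

typedef struct { bool valid; uint32_t tag; } Way;

static Way      sets[NUM_SETS][WAYS];
static unsigned lru_way[NUM_SETS];   /* with 2 ways, the LRU way fits in one number per set */

/* Returns true on a hit. On a miss the block's tag is installed. */
bool access_2way(uint32_t addr) {
    uint32_t block = addr / BLOCK_BYTES;
    uint32_t set   = block % NUM_SETS;
    uint32_t tag   = block / NUM_SETS;

    /* Hardware compares both tags at once; the loop models the two comparators. */
    for (unsigned w = 0; w < WAYS; w++) {
        if (sets[set][w].valid && sets[set][w].tag == tag) {
            lru_way[set] = 1u - w;   /* the other way is now least recently used */
            return true;
        }
    }
    /* Miss: fill an invalid way if there is one, otherwise evict the LRU way. */
    unsigned victim = lru_way[set];
    for (unsigned w = 0; w < WAYS; w++)
        if (!sets[set][w].valid) { victim = w; break; }
    sets[set][victim].valid = true;
    sets[set][victim].tag   = tag;
    lru_way[set] = 1u - victim;
    return false;
}
```

Two blocks 1 KB apart share a set but can coexist, one per way; a third block in the same set evicts whichever of the two was used less recently.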

Valid Bits
A valid bit indicates whether a cache line holds a valid copy of a block from memory.
- It must be 1 for a hit.
- It is reset to 0 on power-up.
- An item can be removed from the cache by setting its valid bit to 0.

Eviction
Which cache line should be evicted to make room for a new line?
- Direct mapped: no choice; the line selected by the index must be evicted.
- Associative caches:
  - Random: select one of the candidate lines at random.
  - Round-robin: cycle through the lines in turn; similar to random.
  - FIFO: replace the oldest line.
  - LRU: replace the line that has not been used for the longest time.
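The policies for associative caches can be sketched as victim-selection functions over one set. This assumes a hypothetical 4-way set and uses 64-bit timestamps for LRU and FIFO, which is a software convenience; hardware often approximates LRU with a few bits per set. Random replacement is omitted, and all names here are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define WAYS 4u   /* hypothetical 4-way set-associative cache */

/* Bookkeeping for one set. */
typedef struct {
    uint64_t last_used[WAYS];   /* time of the most recent access (LRU) */
    uint64_t filled_at[WAYS];   /* time the line was brought in (FIFO) */
    unsigned next;              /* round-robin pointer */
} SetState;

/* Index of the smallest timestamp in t[0..WAYS-1]. */
static unsigned oldest(const uint64_t t[WAYS]) {
    unsigned v = 0;
    for (unsigned w = 1; w < WAYS; w++)
        if (t[w] < t[v]) v = w;
    return v;
}

/* LRU: evict the line that has gone unused the longest. */
unsigned victim_lru(const SetState *s)  { return oldest(s->last_used); }

/* FIFO: evict the line that entered the set first, however often it is used. */
unsigned victim_fifo(const SetState *s) { return oldest(s->filled_at); }

/* Round-robin: evict ways in turn, ignoring the access pattern. */
unsigned victim_round_robin(SetState *s) {
    unsigned v = s->next;
    s->next = (s->next + 1) % WAYS;
    return v;
}
```

LRU and FIFO can disagree: a line filled early but used constantly is a FIFO victim yet safe under LRU.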

Cache Writes
- No-write: writes invalidate the cached copy and go to memory.
- Write-through: writes go to both main memory and the cache.
- Write-back: writes go only to the cache; main memory is written when the block is evicted.
[Figure: CPU, cache (SRAM), and memory (DRAM) connected by addr and data lines.]
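To see why write-back reduces memory traffic, the sketch below counts writes sent to main memory under each policy for a run of stores to one cached block followed by its eviction. It is a hypothetical model (type `Cache`, functions `store` and `evict`) that counts write operations, not bytes: a write-back flushes the whole block once, whereas write-through sends every store to memory.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { NO_WRITE, WRITE_THROUGH, WRITE_BACK } WritePolicy;

/* Hypothetical one-line cache that tracks only what matters for counting traffic. */
typedef struct {
    WritePolicy policy;
    bool valid;
    bool dirty;              /* only meaningful for WRITE_BACK */
    unsigned mem_writes;     /* writes sent to main memory (DRAM) */
} Cache;

/* A store to the block held in this line. */
void store(Cache *c) {
    switch (c->policy) {
    case NO_WRITE:           /* invalidate the cached copy, write memory */
        c->valid = false;
        c->mem_writes++;
        break;
    case WRITE_THROUGH:      /* update the cache and memory together */
        c->mem_writes++;
        break;
    case WRITE_BACK:         /* update only the cache; remember it is dirty */
        c->dirty = true;
        break;
    }
}

/* Evict the line; a write-back cache flushes it only if it is dirty. */
void evict(Cache *c) {
    if (c->policy == WRITE_BACK && c->valid && c->dirty)
        c->mem_writes++;
    c->valid = false;
    c->dirty = false;
}
```

Ten stores to the same block cost ten memory writes with no-write or write-through, but a single block write with write-back.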

Dirty Bits and Write-Back Buffers
Dirty bits indicate which lines have been written.
- Dirty bits let the cache handle multiple writes to the same line without going to memory each time.
Write-back buffer: a queue where dirty lines are placed.
- Items are added to the end as dirty lines are evicted from the cache.
- Items are removed from the front as memory writes complete.
[Figure: cache line layout: V | D | Tag | Data (Byte 0, Byte 1, …, Byte N).]
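A write-back buffer is a bounded FIFO queue, which can be sketched as a ring buffer. The depth of 4 entries, the 32-byte lines, and the names `wb_push`/`wb_pop` are assumptions for illustration. A real design must also check the buffer on a read miss, since the newest copy of a block may still be waiting there; this sketch leaves that out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 32u
#define WB_SLOTS    4u     /* hypothetical buffer depth */

typedef struct { uint32_t addr; uint8_t data[LINE_BYTES]; } DirtyLine;

typedef struct {
    DirtyLine slot[WB_SLOTS];
    unsigned head;         /* oldest entry: next to be written to memory */
    unsigned count;        /* entries waiting */
} WriteBackBuffer;

/* Called when a dirty line is evicted. Returns false if the buffer is full,
   in which case the cache would have to stall until memory catches up. */
bool wb_push(WriteBackBuffer *b, uint32_t addr, const uint8_t *data) {
    if (b->count == WB_SLOTS) return false;
    DirtyLine *d = &b->slot[(b->head + b->count) % WB_SLOTS];  /* tail slot */
    d->addr = addr;
    memcpy(d->data, data, LINE_BYTES);
    b->count++;
    return true;
}

/* Called when memory can accept a write: removes the oldest entry. */
bool wb_pop(WriteBackBuffer *b, DirtyLine *out) {
    if (b->count == 0) return false;
    *out = b->slot[b->head];
    b->head = (b->head + 1) % WB_SLOTS;
    b->count--;
    return true;
}
```

Lines leave in the order they were evicted, and a freed slot is reused by wrapping the tail around the array.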

Misses
Three types of misses:
- Cold (compulsory): the line is being referenced for the first time.
- Capacity: the line was evicted because the cache was not large enough.
- Conflict: the line was evicted because another access mapped to the same index.
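Conflict and capacity misses can be told apart by replaying the same address trace against caches of equal size but different associativity. The sketch assumes a hypothetical 2 KB cache (64 lines of 32 bytes) with LRU replacement; `count_misses` is an illustrative name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BLOCK_BYTES 32u
#define TOTAL_LINES 64u   /* hypothetical 2 KB cache: 64 lines of 32 bytes */

typedef struct { bool valid; uint32_t tag; unsigned long last_used; } Entry;

/* Count misses for a trace of byte addresses in a cache of TOTAL_LINES lines
   split into TOTAL_LINES/ways sets with LRU replacement.
   ways = 1 is direct mapped; ways = TOTAL_LINES is fully associative. */
unsigned count_misses(const uint32_t *trace, unsigned n, unsigned ways) {
    assert(ways >= 1 && TOTAL_LINES % ways == 0);
    Entry lines[TOTAL_LINES] = {0};   /* set s occupies lines[s*ways .. s*ways+ways-1] */
    unsigned sets = TOTAL_LINES / ways, misses = 0;

    for (unsigned t = 0; t < n; t++) {
        uint32_t block = trace[t] / BLOCK_BYTES;
        uint32_t tag   = block / sets;
        Entry *set     = &lines[(block % sets) * ways];
        unsigned way   = 0;           /* hit way, or victim on a miss */
        bool hit = false;

        for (unsigned w = 0; w < ways; w++) {
            if (set[w].valid && set[w].tag == tag) { hit = true; way = w; break; }
            /* Victim choice: an invalid way if any, else the least recently used. */
            if (set[way].valid &&
                (!set[w].valid || set[w].last_used < set[way].last_used))
                way = w;
        }
        if (!hit) {
            misses++;
            set[way].valid = true;
            set[way].tag   = tag;
        }
        set[way].last_used = t + 1;   /* 0 means "never used" */
    }
    return misses;
}
```

Alternating between two addresses 2 KB apart misses on every access when direct mapped, because both map to the same index, but misses only twice (the cold misses) with 2 ways. Sweeping 65 distinct blocks twice through the fully associative version misses every time: the cache is one line too small, a capacity problem that associativity cannot fix.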

Cache Design
Parameters to determine:
- Block size
- Number of ways
- Eviction policy
- Write policy
- Whether to separate the I-cache from the D-cache
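Once capacity, block size, and number of ways are chosen, the address layout follows by arithmetic: sets = capacity / (block size × ways), offset bits = log2(block size), index bits = log2(sets), and the rest of a 32-bit address is tag. A sketch, assuming power-of-two parameters; `make_geometry` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t capacity_bytes, block_bytes, ways;         /* design parameters */
    uint32_t sets, offset_bits, index_bits, tag_bits;   /* derived layout */
} Geometry;

/* log2 of a power of two. */
static unsigned log2_exact(uint32_t x) {
    assert(x != 0 && (x & (x - 1)) == 0);
    unsigned n = 0;
    while (x >>= 1) n++;
    return n;
}

Geometry make_geometry(uint32_t capacity, uint32_t block, uint32_t ways) {
    assert(capacity % (block * ways) == 0);
    Geometry g = { capacity, block, ways, 0, 0, 0, 0 };
    g.sets        = capacity / (block * ways);
    g.offset_bits = log2_exact(block);
    g.index_bits  = log2_exact(g.sets);
    g.tag_bits    = 32 - g.index_bits - g.offset_bits;   /* 32-bit addresses */
    return g;
}
```

For example, a 16 KB, 2-way cache with 32-byte blocks has 256 sets, 5 offset bits, 8 index bits, and 19 tag bits; making the same cache fully associative (512 ways) leaves a single set and a 27-bit tag, i.e. bits 31..5.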

Virtual vs. Physical Caches
- L1 (on-chip) caches are typically virtual.
- L2 (off-chip) caches are typically physical.
[Figure: two arrangements of CPU, MMU, cache (SRAM), and memory (DRAM). With the MMU before the cache, the cache works on physical addresses; with the cache before the MMU, the cache works on virtual addresses.]

Cache Conscious Programming
Speed up this program:
    int a[NCOL][NROW];
    int sum = 0;
    int i, j;
    for (i = 0; i < NROW; ++i)
        for (j = 0; j < NCOL; ++j)
            sum += a[j][i];

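Cache Conscious Programming
Every access is a cache miss! C stores a[j][i] row by row, so consecutive values of j in the inner loop are NROW ints apart in memory: each access touches a different cache line, and for an array much larger than the cache that line has been evicted before the next value of i comes back to it.
    int a[NCOL][NROW];
    int sum = 0;
    int i, j;
    for (i = 0; i < NROW; ++i)
        for (j = 0; j < NCOL; ++j)
            sum += a[j][i];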

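Cache Conscious Programming
Same program, trivial transformation: swapping the loops makes the inner loop walk memory sequentially, so 3 out of 4 accesses hit in the cache (assuming 4-byte ints and 16-byte lines, so each miss brings in four ints).
    int a[NCOL][NROW];
    int sum = 0;
    int i, j;
    for (j = 0; j < NCOL; ++j)
        for (i = 0; i < NROW; ++i)
            sum += a[j][i];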
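The miss counts for the two loop orders can be checked with a small simulation that replays the addresses each order touches through a direct-mapped cache. The parameters are assumptions chosen to match the "3 out of 4" figure: 4-byte ints, 16-byte lines (four ints per line), a 1 KB cache, and a 64×64 array starting at address 0. `count_sum_misses` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NCOL 64
#define NROW 64
#define INT_BYTES  4u    /* assumed sizeof(int) */
#define LINE_BYTES 16u   /* four ints per line */
#define NUM_LINES  64u   /* hypothetical 1 KB direct-mapped cache */

typedef struct { bool valid; uint32_t tag; } Line;

/* Simulate one access; returns 1 on a miss, 0 on a hit. */
static unsigned touch(Line cache[NUM_LINES], uint32_t addr) {
    uint32_t block = addr / LINE_BYTES;
    Line *l = &cache[block % NUM_LINES];
    if (l->valid && l->tag == block / NUM_LINES)
        return 0;
    l->valid = true;
    l->tag   = block / NUM_LINES;
    return 1;
}

/* Misses while summing int a[NCOL][NROW].
   i_inner = true:  j outer, i inner (walks memory sequentially);
   i_inner = false: i outer, j inner (stride of NROW ints). */
unsigned count_sum_misses(bool i_inner) {
    Line cache[NUM_LINES] = {0};
    unsigned misses = 0;
    int n_outer = i_inner ? NCOL : NROW;
    int n_inner = i_inner ? NROW : NCOL;
    for (int outer = 0; outer < n_outer; outer++)
        for (int inner = 0; inner < n_inner; inner++) {
            int j = i_inner ? outer : inner;
            int i = i_inner ? inner : outer;
            /* C is row-major: a[j][i] lives at byte (j * NROW + i) * INT_BYTES */
            misses += touch(cache, (uint32_t)(j * NROW + i) * INT_BYTES);
        }
    return misses;
}
```

With these parameters the i-outer/j-inner order misses on all 4096 accesses, because consecutive accesses are 256 bytes apart and the 64 values of j land on only 4 cache lines; the j-outer/i-inner order misses 1024 times, once per line.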