Constructive Computer Architecture Realistic Memories and Caches Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.


Constructive Computer Architecture: Realistic Memories and Caches
Arvind
Computer Science & Artificial Intelligence Lab.
Massachusetts Institute of Technology
28, 2015

Multistage Pipeline

[Pipeline diagram: PC, Inst Memory, Decode, Register File, Execute, Data Memory; d2e and e2c pipeline registers, a redirect path with fEpoch/eEpoch, and a scoreboard]

The use of magic memories (combinational reads) makes such a design unrealistic.

Magic Memory Model
- Reads and writes are always completed in one cycle
- A Read can be done at any time (i.e., combinationally)
- If enabled, a Write is performed at the rising clock edge (the write address and data must be stable at the clock edge)

[Diagram: MAGIC RAM with Address, WriteData, WriteEnable, and Clock inputs and a ReadData output]

In a real DRAM the data becomes available several cycles after the address is supplied.
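The one-cycle semantics above can be mimicked in a few lines of simulation code. Below is a Python sketch of the magic-memory timing model (the class and method names are illustrative, not part of the course's Bluespec code): reads are combinational and return the current contents immediately, while a write is latched and only takes effect at the clock edge.

```python
class MagicRAM:
    """Idealized memory model: combinational reads, edge-triggered writes."""

    def __init__(self, size):
        self.mem = [0] * size
        self.pending = None  # (address, data) held stable until the clock edge

    def read(self, addr):
        # Combinational read: the data is available in the same cycle.
        return self.mem[addr]

    def write(self, addr, data):
        # The write address and data must be stable before the edge.
        self.pending = (addr, data)

    def clock_edge(self):
        # The write is actually performed at the rising clock edge.
        if self.pending is not None:
            addr, data = self.pending
            self.mem[addr] = data
            self.pending = None
```

A write issued in one cycle becomes visible to reads only after the next clock edge, which is exactly the behavior the slide describes.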

Memory Hierarchy
- size: RegFile << SRAM << DRAM
- latency: RegFile << SRAM << DRAM
- bandwidth: on-chip >> off-chip

On a data access:
- hit (data is in fast memory): low-latency access
- miss (data is not in fast memory): long-latency access (DRAM)

[Diagram: CPU and RegFile backed by a small, fast SRAM, in turn backed by a big, slow DRAM; the fast memory holds frequently used data. Why?]

Inside a Cache

[Diagram: the Processor sends addresses to and exchanges data with the Cache, which holds copies of main-memory locations 100, 101, ...; the Cache in turn talks to Main Memory. Each cache line pairs an address tag with a data block of several bytes]

How many bits are needed for the tag? Enough to uniquely identify the block.
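The tag-size question has a simple arithmetic answer: the tag is whatever remains of the address after the block-offset bits (and, for indexed caches, the set-index bits) are removed. A small Python helper illustrates the arithmetic; the function name and parameters are illustrative, not from the lecture.

```python
from math import log2

def tag_bits(addr_bits, block_bytes, num_sets=1):
    """Bits that must be stored as the tag to identify a block uniquely.

    addr_bits:   width of a memory address
    block_bytes: bytes per cache block (power of two)
    num_sets:    sets the index selects among (1 = fully associative)
    """
    offset_bits = int(log2(block_bytes))  # selects a byte within the block
    index_bits = int(log2(num_sets))      # selects the set (implicit in position)
    return addr_bits - index_bits - offset_bits
```

For example, with 32-bit addresses and 64-byte blocks, a fully associative cache needs 26 tag bits, while a cache with 128 sets needs only 19, since 7 address bits are implied by the line's position.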

Cache Read
Search the cache tags to find a match for the processor-generated address.
- Found in cache, a.k.a. hit: return a copy of the data from the cache
- Not in cache, a.k.a. miss:
  - Read the block of data from Main Memory (this may require writing back a cache line)
  - Wait ...
  - Return the data to the processor and update the cache

Which line do we replace?

Write Behavior
On a write hit:
- Write-through: write to both the cache and the next-level memory
- Write-back: write only to the cache; update the next-level memory when the line is evicted
On a write miss:
- Allocate: because of multi-word lines, we first fetch the line and then update a word in it
- No allocate: the word is modified directly in memory
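The practical difference between the two write-hit policies is traffic to the next level of the hierarchy. The hypothetical toy model below (deliberately oversimplified: one cached line, a run of write hits, one final eviction) makes the contrast concrete.

```python
def next_level_writes(store_count, policy):
    """Writes reaching the next-level memory for `store_count` write hits
    to a single cached line that is eventually evicted (toy model)."""
    if policy == "write-through":
        # Every store is propagated to the next level immediately.
        return store_count
    if policy == "write-back":
        # Only the dirty line is written, once, at eviction time.
        return 1 if store_count > 0 else 0
    raise ValueError("unknown policy: " + policy)
```

A hundred stores to the same word cost a hundred memory writes under write-through but only one under write-back, which is why write-back is attractive when writes exhibit locality.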

Cache Line Size
A cache line usually holds more than one word.
- Reduces the number of tags and the tag size needed to identify memory locations
- Spatial locality: experience shows that if address x is referenced, then addresses x+1, x+2, etc. are very likely to be referenced within a short time window; consider instruction streams, array and record accesses
- Communication systems (e.g., a bus) are often more efficient at transporting larger data sets

Types of Misses
- Compulsory misses (cold start): occur the first time data is referenced; they become insignificant when billions of instructions are run
- Capacity misses: the working set is larger than the cache size. Solution: increase the cache size
- Conflict misses: multiple memory locations are usually mapped to the same cache location to simplify the implementation, so the designated cache location can be full while empty locations remain elsewhere in the cache. Solution: set-associative caches

Internal Cache Organization
Cache designs restrict where in the cache a particular address can reside.
- Direct-mapped: an address can reside in exactly one location in the cache. The cache location is typically determined by the lowest-order address bits
- n-way set-associative: an address can reside in any of a set of n locations in the cache. The set is typically determined by the lowest-order address bits

Direct-Mapped Cache
The simplest implementation.

[Diagram: the request address is split into a Tag (t bits), an Index (k bits), and a block Offset (b bits); the index selects one of 2^k lines, each holding a valid bit V, a tag, and a data block; the stored tag is compared with the address tag to produce HIT, and the offset selects the data word or byte]

What is a bad reference pattern? A strided access pattern with stride equal to the size of the cache.
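The tag/index/offset decomposition and the pathological strided pattern can both be checked with a small Python model of a direct-mapped cache (the names and parameters are illustrative; the course's actual design is hardware, not Python).

```python
class DirectMappedCache:
    """Toy direct-mapped cache: 2**k lines of 2**b bytes each."""

    def __init__(self, k, b):
        self.k, self.b = k, b
        self.valid = [False] * (1 << k)
        self.tags = [0] * (1 << k)

    def split(self, addr):
        # Address layout: [ tag | index (k bits) | offset (b bits) ]
        offset = addr & ((1 << self.b) - 1)
        index = (addr >> self.b) & ((1 << self.k) - 1)
        tag = addr >> (self.b + self.k)
        return tag, index, offset

    def access(self, addr):
        tag, index, _ = self.split(addr)
        if self.valid[index] and self.tags[index] == tag:
            return "hit"
        # Miss: fetch the block and install it in its one designated line.
        self.valid[index] = True
        self.tags[index] = tag
        return "miss"
```

With 4 lines of 4 bytes (a 16-byte cache), alternating between addresses 0 and 16 maps every access to line 0 with different tags, so every access misses, while consecutive addresses in one block hit.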

Direct-Map Address Selection: higher-order vs. lower-order address bits

[Diagram: as before, but with the Index taken from the higher-order address bits and the Tag from the lower-order bits]

Why might using the higher-order bits as the index (and the lower-order bits as the tag) be undesirable? Spatially local blocks would conflict.

Reduce Conflict Misses
Memory access time = Hit time + Prob(miss) * Miss penalty
Associativity: reduce conflict misses by allowing a block to reside in one of several locations in the cache.
- 2-way set-associative: each block can be mapped to either of the 2 lines in one cache set
- Fully associative: each block can be mapped to any cache frame
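The formula above is simple arithmetic. As a worked example, with a 1-cycle hit time, a 5% miss rate, and a 100-cycle miss penalty, the average access costs 6 cycles; the numbers are illustrative, not from the lecture.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time: hit time + P(miss) * miss penalty."""
    return hit_time + miss_rate * miss_penalty
```

Note how the miss penalty dominates: halving the miss rate here (5% to 2.5%) saves 2.5 cycles per access on average, far more than shaving a cycle off the hit time.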

2-Way Set-Associative Cache
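A 2-way set-associative lookup can be modeled in a few lines of Python to show how two addresses that would conflict in a direct-mapped cache of the same size coexist here. This is a toy sketch with an LRU replacement policy (discussed on the next slide); all names are illustrative.

```python
class TwoWaySetAssociativeCache:
    """Toy 2-way set-associative cache with LRU replacement."""

    def __init__(self, num_sets, block_bytes):
        self.num_sets = num_sets
        self.block_bytes = block_bytes
        # Each set holds up to 2 tags, ordered most-recently-used first.
        self.sets = [[] for _ in range(num_sets)]

    def access(self, addr):
        block = addr // self.block_bytes
        index = block % self.num_sets
        tag = block // self.num_sets
        ways = self.sets[index]
        if tag in ways:
            ways.remove(tag)
            ways.insert(0, tag)  # promote to most recently used
            return "hit"
        ways.insert(0, tag)      # install the missing block
        if len(ways) > 2:
            ways.pop()           # evict the least recently used way
        return "miss"
```

With 4 sets of 4-byte blocks, addresses 0 and 16 both map to set 0 but occupy different ways, so revisiting address 0 now hits; a third conflicting address (32) finally forces an LRU eviction.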

Replacement Policy
In order to bring in a new cache line, usually another cache line has to be thrown out. Which one?
- No choice if the cache is direct-mapped
- Replacement policies for set-associative caches:
  - One that is not dirty, i.e., has not been modified
    - In an I-cache all lines are clean
    - In a D-cache, if a dirty line has to be thrown out, it must be written back first
  - Least recently used? Most recently used? Random?
How much is performance affected by the choice? Difficult to know without benchmarks and quantitative measurements.

Blocking vs. Non-Blocking Cache
- Blocking cache: at most one outstanding miss; the cache must wait for memory to respond and does not accept new requests in the meantime
- Non-blocking cache: multiple outstanding misses; the cache can continue to process requests while waiting for memory to respond to earlier misses
We will first design a write-back, write-miss-allocate, direct-mapped, blocking cache.
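The blocking behavior amounts to a one-flag state machine: while a miss is outstanding, the cache refuses further requests until the memory response arrives. A Python sketch of that control logic follows (illustrative names only; the actual lab design is written in Bluespec).

```python
class BlockingCacheControl:
    """Control skeleton of a blocking cache: one outstanding miss at most."""

    def __init__(self):
        self.waiting_for_memory = False

    def req(self, hit):
        """`hit` abstracts the tag check; returns the cache's response."""
        if self.waiting_for_memory:
            return "not ready"    # blocked on the outstanding miss
        if hit:
            return "hit"
        self.waiting_for_memory = True  # issue the miss, then stall
        return "miss issued"

    def mem_resp(self):
        # The memory response fills the line and unblocks the cache.
        self.waiting_for_memory = False
```

A non-blocking cache would replace the single flag with a table of outstanding misses (MSHRs) so that new requests could proceed while earlier misses are still in flight.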