Realistic Memories and Caches
Li-Shiuan Peh
Computer Science & Artificial Intelligence Lab, Massachusetts Institute of Technology
March 21, 2012 (L13-1)

Three-Stage SMIPS

[Pipeline diagram: PC, +4, Inst Memory, Decode, Register File, Execute, Data Memory, with epoch registers (fr, wbr) and a stall? signal]

The use of magic memories makes this design unrealistic.

http://csg.csail.mit.edu/6.S078

A Simple Memory Model

Reads and writes always complete in one cycle:
- A Read can be done at any time (i.e., combinationally).
- If enabled, a Write is performed at the rising clock edge (the write address and data must be stable at the clock edge).

[Diagram: MAGIC RAM with Address, WriteData, WriteEnable, and Clock inputs and a ReadData output]

In a real DRAM, the data becomes available several cycles after the address is supplied.

Memory Hierarchy

- size: RegFile << SRAM << DRAM (why?)
- latency: RegFile << SRAM << DRAM (why?)
- bandwidth: on-chip >> off-chip (why?)

On a data access:
- hit (data is in fast memory): low-latency access
- miss (data is not in fast memory): long-latency access (DRAM)

[Diagram: CPU with RegFile, connected to a Small, Fast Memory (SRAM), backed by a Big, Slow Memory (DRAM)]

The small, fast memory holds frequently used data.

Inside a Cache

[Diagram: Processor exchanges Address and Data with the CACHE, which is backed by Main Memory; the cache holds copies of main memory locations 100 and 101; each entry consists of an Address Tag and a Line (data block) of Data Bytes]

How many bits are needed in the tag? Enough to uniquely identify the block.

Cache Algorithm (Read)

Look at the processor address and search the cache tags for a match. Then either:
- Found in cache, a.k.a. HIT: return a copy of the data from the cache.
- Not in cache, a.k.a. MISS: read the block of data from main memory (which may require writing back a cache line), wait, then return the data to the processor and update the cache.

Which line do we replace? That is decided by the replacement policy.
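The hit/miss algorithm above can be sketched in a few lines of Python (a hypothetical, simplified model of a direct-mapped cache with whole-block reads; the `Cache` class and its block-number interface are illustrative, not the Bluespec design developed in this lecture):

```python
# Minimal direct-mapped cache read model (hypothetical sketch):
# each line holds a (tag, data) pair or None.
NUM_LINES = 4

class Cache:
    def __init__(self, memory):
        self.memory = memory             # backing store: dict block_num -> data
        self.lines = [None] * NUM_LINES  # each entry: (tag, data) or None

    def read(self, block_num):
        index = block_num % NUM_LINES    # which line the block maps to
        tag = block_num // NUM_LINES     # identifies the block within that line
        line = self.lines[index]
        if line is not None and line[0] == tag:
            return line[1], "hit"        # HIT: return the cached copy
        # MISS: fetch from main memory and update the cache
        # (a write-back cache would first write back a dirty victim here)
        data = self.memory[block_num]
        self.lines[index] = (tag, data)
        return data, "miss"

mem = {n: f"block{n}" for n in range(16)}
c = Cache(mem)
print(c.read(5))   # miss: line 1 is empty
print(c.read(5))   # hit: same block again
print(c.read(9))   # miss: block 9 also maps to line 1 and evicts block 5
```

In a direct-mapped cache there is no replacement choice: the index fully determines which line is replaced.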

Write behavior

On a write hit:
- Write-through: write to both the cache and the next level of memory.
- Write-back: write only to the cache, and update the next level of memory when the line is evicted.

On a write miss:
- Allocate: because of multi-word lines, we first fetch the line and then update a word in it.
- No-allocate: the word is modified only in memory.

We will design a write-back, write-allocate cache.
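The write-hit policies can be made concrete with a small sketch (hypothetical; `WriteThroughLine` and `WriteBackLine` are illustrative names) that counts how many writes reach the next level of memory:

```python
# Write-hit policy sketch (hypothetical): compare how many writes reach
# the next level under write-through vs. write-back.

class WriteThroughLine:
    def __init__(self):
        self.data = None
        self.mem_writes = 0      # writes that reached the next level

    def write(self, value):
        self.data = value        # update the cache ...
        self.mem_writes += 1     # ... and the next level, every time

class WriteBackLine:
    def __init__(self):
        self.data = None
        self.dirty = False
        self.mem_writes = 0

    def write(self, value):
        self.data = value        # update only the cache
        self.dirty = True        # remember that the line is modified

    def evict(self):
        if self.dirty:           # write back once, at eviction
            self.mem_writes += 1
            self.dirty = False

wt, wb = WriteThroughLine(), WriteBackLine()
for v in range(10):              # ten write hits to the same line
    wt.write(v)
    wb.write(v)
wb.evict()
print(wt.mem_writes, wb.mem_writes)  # write-through: 10, write-back: 1
```

Repeated writes to the same line cost one memory write per store under write-through, but only one write in total under write-back.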

Blocking vs. Non-Blocking Cache

- Blocking cache: one outstanding miss. The cache must wait for memory to respond, and blocks in the meantime.
- Non-blocking cache: N outstanding misses. The cache can continue to process requests while waiting for memory to respond to up to N misses.

We will design a non-blocking, write-back, write-allocate cache.
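The non-blocking behavior can be sketched with a bounded queue of outstanding misses, analogous to the missFifo used later in this lecture (a hypothetical Python model; `MissTracker` and the value of N are chosen for illustration):

```python
from collections import deque

# Non-blocking miss tracking sketch (hypothetical): the cache keeps
# accepting new misses while up to N are outstanding.
N = 2

class MissTracker:
    def __init__(self):
        self.outstanding = deque()

    def accept_miss(self, addr):
        if len(self.outstanding) >= N:     # N misses in flight: must refuse
            return False
        self.outstanding.append(addr)      # record the miss and keep going
        return True

    def memory_response(self):
        return self.outstanding.popleft()  # oldest miss completes first

t = MissTracker()
print(t.accept_miss(0x100))   # True
print(t.accept_miss(0x200))   # True
print(t.accept_miss(0x300))   # False: already N misses outstanding
t.memory_response()           # one miss resolves ...
print(t.accept_miss(0x300))   # ... so a new miss can be accepted
```

A blocking cache is the degenerate case N = 1.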

What Do Caches Look Like?

External interface:
- processor side
- memory (DRAM) side

Internal organization:
- direct-mapped
- n-way set-associative
- multi-level

Next lecture: incorporating caches into processor pipelines.

Data Cache – Interface (0, n)

interface DataCache;
  method ActionValue#(Bool) req(MemReq r);
  method ActionValue#(Maybe#(Data)) resp;
endinterface

[Diagram: the Processor sends req to the cache and receives "req accepted" and resp; the cache talks to DRAM through a ReqRespMem interface; internals include a hit wire, a missFifo, and a writeback wire]

The Bool returned by req denotes whether the request is accepted this cycle. The resp method should be invoked every cycle so that response values are not dropped.

ReqRespMem is an interface passed to DataCache:

interface ReqRespMem;
  method Bool reqNotFull;
  method Action reqEnq(MemReq req);
  method Bool respNotEmpty;
  method Action respDeq;
  method MemResp respFirst;
endinterface

Interface Dynamics

- Cache hits are combinational.
- Cache misses are passed on to the next level of memory, and can take an arbitrary number of cycles to be processed.
- A cache request that misses will not be accepted if it triggers memory requests that memory cannot accept.
- At most one request per cycle is sent to memory.

Direct-Mapped Caches

Direct-Mapped Cache

[Diagram: the request address is split into a Tag (t bits), an Index (k bits), and a Block offset (b bits); the Index selects one of 2^k lines, each holding V and D bits, a Tag, and a Data Block; a tag match signals HIT, and the offset selects the Data Word or Byte]

What is a bad reference pattern? Accesses strided by the size of the cache, which all map to the same line.

Direct-Map Address Selection

Higher-order vs. lower-order address bits:

[Diagram: as before, but the Index is taken from the higher-order address bits and the Tag from the lower-order bits]

Why might this be undesirable? Spatially local blocks conflict.

Data Cache – code structure

module mkDataCache#(ReqRespMem mem)(DataCache);
  // state declarations
  RegFile#(Index, Data) dataArray <- mkRegFileFull;
  RegFile#(Index, Tag) tagArray <- mkRegFileFull;
  Vector#(Rows, Reg#(Bool)) tagValid <- replicateM(mkReg(False));
  Vector#(Rows, Reg#(Bool)) dirty <- replicateM(mkReg(False));
  FIFOF#(MemReq) missFifo <- mkSizedFIFOF(reqFifoSz);
  RWire#(MemReq) hitWire <- mkUnsafeRWire;
  RWire#(Index) writebackWire <- mkUnsafeRWire;

  method ActionValue#(Bool) req(MemReq r) … endmethod
  method ActionValue#(Maybe#(Data)) resp … endmethod
endmodule

Data Cache typedefs

typedef 32 AddrSz;
typedef 256 Rows;
typedef Bit#(AddrSz) Addr;
typedef Bit#(TLog#(Rows)) Index;
typedef Bit#(TSub#(AddrSz, TAdd#(TLog#(Rows), 2))) Tag;
typedef 32 DataSz;
typedef Bit#(DataSz) Data;
Integer reqFifoSz = 6;

function Tag getTag(Addr addr);
function Index getIndex(Addr addr);
function Addr getAddr(Tag tag, Index idx);
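With AddrSz = 32, Rows = 256, and 4-byte words, the address splits into a 22-bit tag, an 8-bit index, and a 2-bit byte offset. The slides do not show the bodies of getTag, getIndex, and getAddr, so the following Python sketch is an assumed implementation consistent with those typedefs:

```python
# Address decomposition matching the typedefs above (assumed bodies).
ADDR_SZ = 32
ROWS = 256
INDEX_BITS = ROWS.bit_length() - 1              # TLog#(Rows) = 8
OFFSET_BITS = 2                                 # 4-byte words
TAG_BITS = ADDR_SZ - INDEX_BITS - OFFSET_BITS   # 22

def get_index(addr):
    # bits [9:2] select one of the 256 lines
    return (addr >> OFFSET_BITS) & (ROWS - 1)

def get_tag(addr):
    # the remaining high bits identify the block
    return addr >> (OFFSET_BITS + INDEX_BITS)

def get_addr(tag, idx):
    # inverse of the two functions above (byte offset taken as zero)
    return (tag << (OFFSET_BITS + INDEX_BITS)) | (idx << OFFSET_BITS)

addr = 0x12345678
print(hex(get_tag(addr)), hex(get_index(addr)))
print(hex(get_addr(get_tag(addr), get_index(addr))))  # word address restored
```

getAddr is the inverse used when writing back a victim line: the stored tag plus the line's index reconstruct the victim's address.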

Data Cache req

method ActionValue#(Bool) req(MemReq r);
  Index index = getIndex(r.addr);
  if (tagArray.sub(index) == getTag(r.addr) && tagValid[index]) begin
    // combinational hit case: the request is accepted
    hitWire.wset(r);
    return True;
  end
  else if (miss && evict) begin
    … // the request is not accepted
    return False;
  end
  else if (miss && !evict) begin
    … // the request is accepted
    return True;
  end

Data Cache req – miss case

...
else if (tagValid[index] && dirty[index]) begin
  // need to evict: the dirty victim is written back
  if (mem.reqNotFull) begin
    writebackWire.wset(index);
    mem.reqEnq(MemReq{op: St,
                      addr: getAddr(tagArray.sub(index), index),
                      data: dataArray.sub(index)});
  end
  return False;
end
…

Data Cache req

else if (mem.reqNotFull && missFifo.notFull) begin
  // miss, no eviction needed, and the miss can be handled this cycle
  missFifo.enq(r);
  r.op = Ld;
  mem.reqEnq(r);
  return True;
end
else
  // miss, no eviction needed, but the miss cannot be handled this cycle
  return False;
endmethod

Data Cache resp – hit case

method ActionValue#(Maybe#(Data)) resp;
  if (hitWire.wget matches tagged Valid .r) begin
    let index = getIndex(r.addr);
    if (r.op == St) begin
      dirty[index] <= True;
      dataArray.upd(index, r.data);
    end
    return tagged Valid (dataArray.sub(index));
  end
  …

Data Cache resp – writeback case

else if (writebackWire.wget matches tagged Valid .idx) begin
  tagValid[idx] <= False;
  dirty[idx] <= False;
  return tagged Invalid;
end
…

resp handles the writeback so that the state (dirty, tagValid) is updated in only one place.

Data Cache resp – miss case

else if (mem.respNotEmpty) begin
  let index = getIndex(missFifo.first.addr);
  dataArray.upd(index, missFifo.first.op == St ?
                         missFifo.first.data : mem.respFirst);
  if (missFifo.first.op == St)
    dirty[index] <= True;
  tagArray.upd(index, getTag(missFifo.first.addr));
  tagValid[index] <= True;
  mem.respDeq;
  missFifo.deq;
  return tagged Valid (mem.respFirst);
end

Data Cache resp

else begin
  // all other cases: the miss response has not yet arrived,
  // no request was given this cycle, etc.
  return tagged Invalid;
end
endmethod
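Putting the req and resp logic together, the cache's externally visible behavior can be approximated by a hypothetical single-cycle Python reference model (word-addressed, with memory as an always-ready dict; the multi-cycle miss handling, the wires, and the missFifo are abstracted away, and `DataCacheModel` is an illustrative name):

```python
# Reference sketch (hypothetical) of a write-back, write-allocate
# direct-mapped cache, with memory modeled as an always-ready dict.
ROWS = 4

class DataCacheModel:
    def __init__(self, memory):
        self.memory = memory
        self.tag = [None] * ROWS
        self.data = [0] * ROWS
        self.valid = [False] * ROWS
        self.dirty = [False] * ROWS
        self.writebacks = 0

    def access(self, addr, op, st_data=None):
        idx, tag = addr % ROWS, addr // ROWS
        if not (self.valid[idx] and self.tag[idx] == tag):
            # miss: write back a dirty victim, then allocate the new line
            if self.valid[idx] and self.dirty[idx]:
                victim_addr = self.tag[idx] * ROWS + idx
                self.memory[victim_addr] = self.data[idx]
                self.writebacks += 1
            self.tag[idx], self.valid[idx] = tag, True
            self.data[idx] = self.memory.get(addr, 0)
            self.dirty[idx] = False
        if op == "St":                  # write-allocate: update the word
            self.data[idx] = st_data
            self.dirty[idx] = True
        return self.data[idx]

mem = {}
c = DataCacheModel(mem)
c.access(1, "St", 42)       # write miss: allocate line 1, mark it dirty
assert 1 not in mem         # write-back: memory is not updated yet
c.access(5, "Ld")           # address 5 maps to line 1, evicting dirty line
assert mem[1] == 42         # the dirty victim was written back
```

This mirrors the ordering enforced by the Bluespec design: the dirty victim is written back before the missing line is allocated, and a store miss both fetches the line and marks it dirty.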


Multi-Level Caches

Inclusive vs. exclusive.

[Diagram: Processor (regs, L1 Icache, L1 Dcache) → L2 Cache → Memory → disk]

Options: separate data and instruction caches, or a unified cache.

Write-Back vs. Write-Through Caches in Multi-Level Caches

Write-back:
- Writes only go into the top level of the hierarchy.
- Maintain a record of "dirty" lines.
- Faster write speed (a write only has to reach the top level to be considered complete).

Write-through:
- All writes go into the L1 cache and then also write through into subsequent levels of the hierarchy.
- Better for "cache coherence" issues.
- No dirty/clean bit records required.
- Faster evictions.

Write Buffer

[Diagram omitted. Source: Skadron/Clark]

Summary

Choice of cache interfaces:
- combinational vs. non-combinational responses
- blocking vs. non-blocking
- FIFO vs. non-FIFO responses

Many possible internal organizations:
- direct-mapped and n-way set-associative are the most basic ones
- write-back vs. write-through
- write-allocate or not
- …

Next: integrating the cache into the processor pipeline.