ECE 406 – Design of Complex Digital Systems, Lecture 19: Cache Operation & Design (Spring 2009, W. Rhett Davis, NC State University)


Slide 1: ECE 406 – Design of Complex Digital Systems
Lecture 19: Cache Operation & Design
Spring 2009, W. Rhett Davis, NC State University
with significant material from Paul Franzon, Bill Allen, & Xun Liu

Slide 2: Announcements
- HW#8 due Thursday
- Proj#2 due in 16 days (start early!)

Slide 3: Summary of Last Lecture
- How can you tell if an interface has flow-control?
- What can you do to reduce the complexity of the state transition diagram for an interface with flow control?

Slide 4: Today's Lecture
- Cache Introduction
- Cache Examples
- Project #2 Introduction

Slide 5: Cache Memory Fundamentals
- A cache memory is an additional memory block in the system that works closely with the "main" memory block to improve the performance of memory accesses.
- Cache memory is:
  - faster than main memory
  - usually physically closer to the decode and execution units
  - smaller in capacity than the main memory
  - where frequently accessed data and/or instructions are held

Slide 6: Cache Memory Fundamentals
- Programmers want large amounts of fast memory, for both function and performance
- Large main memories are usually slow
- Programs do not access all code/data uniformly; smaller portions of the total data and code (instructions) are accessed more frequently than the rest
- Programs exhibit:
  - "Spatial Locality" - a high probability that an instruction physically close in memory to the one just accessed will be accessed soon
  - "Temporal Locality" - a high probability that a recently accessed instruction will be accessed again
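The two kinds of locality can be made concrete with a toy model (an illustration, not anything from the lecture): assume every miss loads the whole surrounding block of 2^offset_bits words into a cache large enough that nothing is evicted, and count how many accesses in a stream miss.

```python
def count_misses(addresses, offset_bits=2):
    """Count misses for an access stream, assuming a miss loads the
    whole surrounding block of 2**offset_bits words (toy model: the
    cache is large enough that nothing is ever evicted)."""
    loaded_blocks = set()
    misses = 0
    for addr in addresses:
        block = addr >> offset_bits  # drop the offset bits
        if block not in loaded_blocks:
            loaded_blocks.add(block)
            misses += 1
    return misses

# Spatial locality: a sequential walk over 16 words touches only
# four 4-word blocks, so only 4 of the 16 accesses miss.
print(count_misses(range(16)))   # 4
# Temporal locality: re-reading one word misses only the first time.
print(count_misses([5] * 16))    # 1
```

Sequential code fetches and array walks behave like the first stream; loop bodies behave like the second, which is why caches pay off for both.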

Slide 7: Multi-Level Cache Hierarchy
[Figure: frequently accessed blocks of memory (data and/or instructions) are copied from Main Memory into a Level 2 Cache, and from there into a Level 1 Cache]

Slide 8: Elements of a Cache
- Size
  - in relation to the main memory
- Mapping
  - direct, set associative, fully associative, etc.
- Replacement algorithm (for a cache "miss")
  - LRU, FIFO, LFU, Random, etc.
- Write policy
  - write back, through, once, allocate, etc.
- Line size (block size)
- Cache Levels
  - number of caches, "memory hierarchy"
- Cache Coherency
  - across multiple processors with caches
- Type of Accesses
  - Unified (both instruction & data), Split (separate instruction & data caches)

Slide 9: Basic Cache Operation
1. The Cache Controller receives the address of the data or instruction to be accessed from the CPU.
2. Is the data/instruction in the cache?
   - Yes ("cache hit"): forward the data/instruction to the CPU. Done.
   - No ("cache miss"):
     - The Cache Controller accesses main memory to get the requested data/instruction.
     - Allocate/replace the lines in the cache for the requested data/instruction (REPLACEMENT POLICY?)
     - Load the data/instruction and its associated block into the cache (READ/WRITE POLICY?)
     - Forward the data/instruction to the CPU.
   (The order and sequence depend on the replacement and read/write policies.)
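The flow above can be sketched behaviorally in Python. This is a hypothetical model, simplified to one word per block; the actual replacement and read/write policies are chosen on the next two slides.

```python
def cpu_access(cache, main_memory, addr):
    """Return (word, 'hit'|'miss') for one CPU request, following the
    flow on this slide: check the cache first; on a miss, fill the
    cache from main memory before forwarding the word to the CPU."""
    if addr in cache:             # "cache hit"
        return cache[addr], "hit"
    word = main_memory[addr]      # "cache miss": access main memory
    cache[addr] = word            # allocate/replace (policy-dependent)
    return word, "miss"

main_memory = {0x3000: 0x5020}
cache = {}
print(cpu_access(cache, main_memory, 0x3000))  # miss on the first access
print(cpu_access(cache, main_memory, 0x3000))  # hit on the repeat
```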

Slide 10: Cache Policies
- Replacement Policy
  - Miss: decide which location(s) and what contents in the cache to replace with the requested data and its associated "block"
  - Policy options: LRU, FIFO, LFU, Random
  - We will use a direct-mapped cache, which means that each main-memory location maps to exactly one cache location. A miss will always require replacement.
- Read Policy Options
  - Hit:
    - (1) Forward the requested data to the CPU
  - Miss:
    - (1) "Load Through" - forward to the CPU as the cache is filled from main memory
    - (2) Fill the cache first from main memory, then forward to the CPU
    - We will use option (2)

Slide 11: Cache Policies
- Write Policy Options
  - Hit:
    - (1) "Write Through" - write to both cache and main memory
    - (2) "Write Back" - write to cache; update main memory upon a cache "flush"
    - We will use option (1)
  - Miss:
    - (1) "Write Allocate" - write to main memory and then fill the cache
    - (2) "Write No-Allocate" - write to main memory only
    - We will use option (1)
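The chosen combination, write-through with write-allocate, can be sketched as follows. The 4-word block size and the dictionary layout are assumptions for illustration, not the project's actual structure.

```python
BLOCK_WORDS = 4  # assumed block size for illustration

def cpu_write(cache, main_memory, addr, value):
    """Write-through: main memory is always updated immediately.
    Write-allocate: on a write miss, the block is then loaded."""
    main_memory[addr] = value                     # write through
    block = addr // BLOCK_WORDS
    if block not in cache:                        # write miss: allocate
        base = block * BLOCK_WORDS                # fill block from memory
        cache[block] = [main_memory.get(base + i, 0)
                        for i in range(BLOCK_WORDS)]
    cache[block][addr % BLOCK_WORDS] = value      # keep the cached copy current

cache, mem = {}, {}
cpu_write(cache, mem, 9, 7)    # miss: memory updated, then block 2 loaded
cpu_write(cache, mem, 10, 8)   # hit in the same block: both copies updated
```

Because every write goes to main memory, memory is never stale and no "dirty" bookkeeping is needed, which is what makes this the simpler policy to implement.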

Slide 12: Today's Lecture
- Cache Introduction
- Cache Examples
- Project #2 Introduction

Slide 13: Direct Mapped Caches
- Each main-memory address is divided into three fields: [ Tag | Index | Offset ]
- Example 1:
  - 32 main-memory locations (5 address bits)
  - 16 bits per word
  - 0 offset bits (1 word per block)
  - 3 index bits (8 blocks in cache)
  - 2 tag bits
  - Cache RAM will be 8 words x 18 bits
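Example 1's field split can be checked with a few lines of Python; the sample address 10101 is the one walked through on the following slides.

```python
OFFSET_BITS, INDEX_BITS, TAG_BITS = 0, 3, 2   # Example 1 parameters

def split_address(addr):
    """Split a 5-bit address into its (tag, index, offset) fields."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

print(split_address(0b10101))  # (2, 5, 0): tag = 10, index = 101, no offset
# Each cache entry stores a 16-bit word plus its 2-bit tag:
print("Cache RAM:", 1 << INDEX_BITS, "words x", 16 + TAG_BITS, "bits")
```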

Slide 14: Basic Cache Architecture
[Figure: Main Memory (32 locations), Cache (8 "blocks"), Cache Index; example address 10101]

Slide 15: Basic Cache Architecture
[Figure] Memory locations are mapped to cache locations - the cache holds copies of what is in main memory.

Slide 16: Basic Cache Architecture
[Figure] The lower-order bits are used as the cache "index".

Slide 17: Basic Cache Architecture
[Figure] Conflicted mappings: multiple memory locations map to the same cache index.

Slide 18: Basic Cache Architecture
[Figure] The higher-order bits are used as the cache "tag".

Slide 19: Basic Cache Architecture
[Figure] The higher-order bits are used as the cache "tag" - to determine which particular memory line is in the cache at a given "index".

Slide 20: Basic Cache Architecture
[Figure] Which particular memory line is in the cache at this "index"?

Slide 21: Basic Cache Architecture
[Figure] Compare the "tag" to determine which particular memory line is in the cache at that "index".

Slide 22: Basic Cache Architecture
[Figure] The cache controller therefore needs a comparator (to "compare" the tag) and a decoder (to "decode" the index).

Slide 23: Basic Cache Architecture
[Figure] The Valid Array indicates whether a cache block and tag have been loaded. An invalid entry should always result in a "miss".
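The decoder, comparator, and valid array come together in this behavioral sketch of a direct-mapped lookup (an illustration in Python, not the project's RTL):

```python
class DirectMappedCache:
    """One block per index; a hit requires valid AND a tag match."""

    def __init__(self, index_bits=3):       # 8 blocks, as in Example 1
        n = 1 << index_bits
        self.index_bits = index_bits
        self.valid = [False] * n            # the Valid Array
        self.tag = [0] * n
        self.data = [None] * n

    def lookup(self, addr):
        """Return the cached word, or None on a miss."""
        index = addr & ((1 << self.index_bits) - 1)        # "decode"
        tag = addr >> self.index_bits
        if self.valid[index] and self.tag[index] == tag:   # "compare"
            return self.data[index]
        return None  # miss: entry invalid, or a different line is cached

    def fill(self, addr, word):
        """Load a word (and its tag) into the entry it maps to."""
        index = addr & ((1 << self.index_bits) - 1)
        self.valid[index] = True
        self.tag[index] = addr >> self.index_bits
        self.data[index] = word
```

Note that two addresses with the same index but different tags (a conflicted mapping) cannot coexist: filling one evicts the other.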

Slide 24: Another Direct-Mapped Example
- Example 2 (used on HW#8 & Proj#2):
  - 2^16 main-memory locations (16 address bits)
  - 16 bits per word
  - 2 offset bits
    - How many words per block?
  - 4 index bits
    - How many blocks in cache?
  - How many tag bits?
  - How big will the Cache RAM be?
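One way to work out the slide's questions, using the same accounting as Example 1 (data words plus tag per entry, with valid bits kept in a separate array as on the earlier slides):

```python
ADDR_BITS, WORD_BITS = 16, 16     # Example 2 parameters
OFFSET_BITS, INDEX_BITS = 2, 4

words_per_block = 1 << OFFSET_BITS                  # words selected by the offset
blocks_in_cache = 1 << INDEX_BITS                   # entries selected by the index
tag_bits = ADDR_BITS - INDEX_BITS - OFFSET_BITS     # remaining address bits
# Each RAM entry holds one whole block of words plus its tag:
entry_bits = words_per_block * WORD_BITS + tag_bits
print(blocks_in_cache, "entries x", entry_bits, "bits")
```

This gives 4 words per block, 16 blocks, 10 tag bits, and a 16-entry by 74-bit cache RAM.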

Slide 25: Example Program
  Address  Data   Assembly Language
  3000     5020          AND R0, R0, #0
  3001     1027          ADD R0, R0, #7
  3002     5260          AND R1, R1, #0
  3003     1265   loop1  ADD R1, R1, #5
  3004     103F          ADD R0, R0, #-1
  3005     03FD          BRP loop1
  3006     3202          ST R1, var1
  3007     EC04          LEA R6, dest
  3008     C180          JMP R6
  3009     0000   var1   NOP
  300A     0000   var2   NOP
  300B     0000   var3   NOP
  300C     25FC   dest   LD R2, var1
  300D     14A1          ADD R2, R2, #1
  300E     75BE          STR R2, R6, #-2
  300F     7DBF          STR R6, R6, #-1
  3010     A7FA          LDI R3, var3
  3011     B5F9          STI R2, var3
  3012     0FFF   last   BRNZP last

Slide 26: Exercise
- For the first 7 instructions, find the following:
  - tag, index, and offset for each memory access
  - type of cache operation (e.g. read hit, read miss, write hit, or write miss)
  - the contents of the cache RAM
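A small helper for checking the exercise by hand, using Example 2's parameters; for instance, the first fetch, from location x3000:

```python
def split16(addr, offset_bits=2, index_bits=4):
    """Split a 16-bit address into its (tag, index, offset) fields."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

tag, index, offset = split16(0x3000)   # first instruction fetch
print(hex(tag), index, offset)         # 0xc0 0 0
```

Applying it to each access in the program gives the tag/index/offset columns the following slides leave blank.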

Slide 27: Exercise
- 1st instruction
  - Fetch from location 3000:
    - offset:
    - index:
    - tag:
  - Operation:
  - Cache RAM Contents: (Index | Valid | Tag | Data, for indexes 0 ... F)

Slide 28: Exercise
- 2nd instruction
  - Fetch from location 3001:
    - offset:
    - index:
    - tag:
  - Operation:
  - Cache RAM Contents: (Index | Valid | Tag | Data, for indexes 0 ... F)

Slide 29: Exercise
- 5th instruction
  - Fetch from location 3004:
    - offset:
    - index:
    - tag:
  - Operation:
  - Cache RAM Contents: (Index | Valid | Tag | Data, for indexes 0 ... F)

Slide 30: Exercise
- 7th instruction
  - Fetch from location 3006
  - Write 0023 to location 3009:
    - offset:
    - index:
    - tag:
  - After writing to main memory, the block is loaded
  - Cache RAM Contents: (Index | Valid | Tag | Data, for indexes 0 ... F)

Slide 31: Exercise
- What if the next instruction were a read from location 1105?
  - offset:
  - index:
  - tag:
- What if the next instruction were a write to location 300A?
  - offset:
  - index:
  - tag:

Slide 32: Today's Lecture
- Cache Introduction
- Cache Examples
- Project #2 Introduction

Slide 33: Project #1 System
- Synchronous memory with separate din/dout/address lines

Slide 34: Project #2 Changes
- Asynchronous off-chip memory with shared din/dout/address lines
- Cache sits between processor and memory
- LC3 unchanged except for the "macc" signal
  - High when the state is Fetch, Read Memory, Write Memory, or Read Indirect Address
- SimpleLC3 and Memory blocks will be provided

Slide 35: Data Transfer Interface
[Figure: interface between the Cache and Off-chip Memory]
- Read request (rrqst)
- Data/Address (data)
- Read ready (rrdy)
- Read data ready (rdrdy)
- Read data accept (rdacpt)
- Write request (wrqst)
- Write accept (wacpt)
- Memory access (macc)
- Other signals: addr, din, rd, dout, complete, clock, reset

Slide 36: Protocol for Read Miss
[Waveform: the data bus carries the address, then data0, data1, data2, data3; handshaking via Read request, Read ready, Read data ready, and Read data accept]

Slide 37: Protocol for Write Hit
[Waveform: the data bus carries the address, then the data; handshaking via Write request and Write accept]

Slide 38: Protocol for Write Miss
[Waveform: a write (address, then data) handshaked via Write request and Write accept, followed by a block read (data0 through data3) handshaked via Read request, Read data ready, and Read data accept]

Slide 39: Cache System Block-Diagram
[Figure]

Slide 40: UnifiedCache Schematic
[Figure]

Slide 41: CacheController Block
- Takes the handshaking signals from the LC-3 CPU and off-chip memory as inputs
- Takes the miss indicator from CacheData as input
- Maintains the state of the cache and interfaces
- Maintains a 2-bit counter that specifies the word offset to be loaded into the cache
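The 2-bit word-offset counter might be modeled like this (the class and signal names are assumptions for illustration, not the project's actual interface):

```python
class WordOffsetCounter:
    """Tracks which word of a 4-word block is being loaded into the cache."""

    def __init__(self):
        self.offset = 0

    def step(self, rdrdy):
        """Advance when the memory asserts 'read data ready' (rdrdy);
        the 2-bit value wraps from 3 back to 0 once the block is full."""
        if rdrdy:
            self.offset = (self.offset + 1) & 0b11
        return self.offset
```

Stepping it once per accepted data word walks the offset through 0, 1, 2, 3, which is how the controller knows when all four words of a block have arrived.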

Slide 42: Controller State Machine
[State diagram: states 0, 3, 5, 6, 7, and 8, starting from reset; transitions labeled macc=0 || Read-incomplete, Read-hit, Read-miss, rrdy=1, rdrdy=1, Read-complete, Write, wacpt=0 (hit), wacpt=0 (miss), wacpt=1, and always]

Slide 43: Use Counter to Read Four Words
[State diagram: the same state machine as the previous slide, with states 2 and 3 forming a loop that waits on rdrdy=0 and advances on rdrdy=1 to read the four words of a block]