©UCB CS 161 Ch 7: Memory Hierarchy, LECTURE 14. Instructor: L.N. Bhuyan (www.cs.ucr.edu/~bhuyan)

Recap: Machine Organization: the 5 classic components of any computer
°Processor (CPU) (active)
 -Control ("brain")
 -Datapath ("brawn")
°Memory (passive): where programs and data live when running
°Devices
 -Input
 -Output
Every component of a computer belongs to one of these five categories.

Memory Trends 2004
°Users want large and fast memories!
°SRAM access times are 0.5-5 ns, at a cost of $4000 to $10,000 per GB.
°DRAM access times are 50-70 ns, at a cost of $100 to $200 per GB.
°Disk access times are 5 to 20 million ns, at a cost of $0.50 to $2 per GB.

Memory Latency Problem
[Figure: performance vs. time, showing the processor-DRAM performance gap. Processor performance improves ~60%/yr (2X/1.5 yr) while DRAM improves ~5%/yr (2X/15 yrs), so the processor-memory performance gap grows ~50% per year.]
°This gap is the motivation for the memory hierarchy.

The Goal: Illusion of large, fast, cheap memory
°Fact: Large memories are slow; fast memories are small.
°How do we create a memory that is large, cheap, and fast (most of the time)?
°Hierarchy of levels:
 -Use smaller and faster memory technologies close to the processor
 -Fast access time in the highest level of the hierarchy
 -Cheap, slow memory furthest from the processor
°The aim of memory hierarchy design is an access time close to that of the highest level and a size equal to that of the lowest level.

Recap: Memory Hierarchy Pyramid
[Figure: pyramid with the processor (CPU) at the top and Levels 1, 2, 3, ..., n below it, connected by a transfer datapath (bus). Moving down the pyramid, distance from the CPU grows, the size of memory at each level grows, and cost/MB falls; moving up, access time (memory latency) falls.]

Why Hierarchy Works: Natural Locality
°The Principle of Locality: programs access a relatively small portion of the address space at any instant of time.
[Figure: probability of reference plotted against memory address (0 to 2^n - 1), showing sharp peaks over a small fraction of the address space.]
°Temporal Locality (locality in time): recently accessed data tend to be referenced again soon.
°Spatial Locality (locality in space): items near a referenced item tend to be referenced soon.
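
To make the two kinds of locality concrete, here is a small illustrative C sketch (not from the original slides; the array size and access patterns are arbitrary choices): the row-order loop makes stride-1 accesses and has good spatial locality, the column-order loop does not, and the reuse of sum and the loop indices on every iteration is temporal locality.

#include <stdio.h>

#define N 1024

int main(void) {
    static int a[N][N];
    long sum = 0;

    /* Good spatial locality: a[i][0], a[i][1], ... are adjacent in
       memory, so every byte of each fetched cache block gets used. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];

    /* Poor spatial locality: column order touches one word per block
       (stride of N words), wasting the rest of each fetched block. */
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += a[i][j];

    /* Temporal locality: sum, i, and j are reused on every iteration,
       so they stay in registers or in the cache. */
    printf("%ld\n", sum);
    return 0;
}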

Memory Hierarchy: Terminology
°Hit: the data appears in the upper level (Block X).
 -Hit Rate: the fraction of memory accesses found in the upper level.
°Miss: the data must be retrieved from a block in the lower level (Block Y).
 -Miss Rate = 1 - (Hit Rate)
°Hit Time: time to access the upper level = time to determine hit/miss + memory access time.
°Miss Penalty: time to replace a block in the upper level + time to deliver the block to the processor.
°Note: Hit Time << Miss Penalty
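
These terms combine into the standard average memory access time (AMAT) formula, a well-known summary metric that the slide leaves implicit:

AMAT = Hit Time + Miss Rate x Miss Penalty

For example, with a 1 ns hit time, a 2% miss rate, and a 50 ns miss penalty: AMAT = 1 + 0.02 x 50 = 2 ns.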

Current Memory Hierarchy
[Diagram: processor (control + datapath + registers) connected to an L1 cache, an L2 cache, main memory, and secondary memory.]

              Regs    L1 cache   L2 cache   Main Memory   Secondary Memory
Speed (ns):   1       2          6          100           10,000,000
Size (MB):                                                ,000
Cost ($/MB):  --      $100       $30        $1            $0.05
Technology:   Regs    SRAM       SRAM       DRAM          Disk

Memory Hierarchy Technology
°Random access: access time is the same for all locations (a hardware decoder is used).
°Sequential access: very slow; data are accessed sequentially, so access time is location dependent; treated as I/O (examples: disks and tapes).
°DRAM: Dynamic Random Access Memory
 -High density, low power, cheap, slow
 -Dynamic: needs to be "refreshed" regularly
°SRAM: Static Random Access Memory
 -Low density, high power, expensive, fast
 -Static: content lasts "forever" (until power is lost)

Memories: Review
°SRAM:
 -Value is stored on a pair of inverting gates
 -Very fast, but takes up more space than DRAM (4 to 6 transistors)
°DRAM:
 -Value is stored as a charge on a capacitor (must be refreshed)
 -Very small, but slower than SRAM (by a factor of 5 to 10)

How is the hierarchy managed?
°Registers <-> Memory: by the compiler (or the assembly language programmer)
°Cache <-> Main Memory: by hardware
°Main Memory <-> Disks:
 -By a combination of hardware and the operating system (virtual memory; covered next)
 -By the programmer (files)

Measuring Cache Performance
°CPU time = Execution cycles x Clock cycle time
°With cache misses: CPU time = (Execution cycles + Memory-stall cycles) x Clock cycle time
°Read-stall cycles = #Reads x Read miss rate x Read miss penalty
°Write-stall cycles = #Writes x Write miss rate x Write miss penalty
°Memory-stall cycles = Read-stall cycles + Write-stall cycles
  = Memory accesses x Miss rate x Miss penalty
  = #Instructions x Misses/instruction x Miss penalty

Example
°Q: The cache miss penalty is 50 cycles, and all instructions take 2.0 cycles without memory stalls. Assume a cache miss rate of 2% and 1.33 memory references per instruction (one instruction fetch plus, on average, 0.33 data references). What is the impact of the cache?
°Ans: CPU time = IC x (CPI + Memory-stall cycles per instruction) x Cycle time t
°Performance including cache misses:
 CPU time = IC x (2.0 + 1.33 x 0.02 x 50) x t = IC x 3.33 x t
°For a perfect cache that never misses: CPU time = IC x 2.0 x t
°Hence, including the memory hierarchy stretches CPU time by a factor of 3.33/2.0 = 1.67.
°But without a memory hierarchy at all, every memory reference would pay the full penalty, and the CPI would increase to 2.0 + 50 x 1.33 = 68.5, more than 30 times the perfect-cache CPI.
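
The same arithmetic written out as a short C sketch, so the slide's numbers can be reproduced (the parameter values are exactly those assumed above):

#include <stdio.h>

int main(void) {
    double base_cpi      = 2.0;   /* CPI with no memory stalls          */
    double miss_penalty  = 50.0;  /* cycles per cache miss              */
    double miss_rate     = 0.02;  /* fraction of accesses that miss     */
    double refs_per_inst = 1.33;  /* memory references per instruction  */

    /* Memory-stall cycles/instruction = refs/instn x miss rate x penalty */
    double stall_cpi = refs_per_inst * miss_rate * miss_penalty;
    double real_cpi  = base_cpi + stall_cpi;

    printf("CPI with cache:    %.2f\n", real_cpi);            /* 3.33   */
    printf("Slowdown vs ideal: %.2f\n", real_cpi / base_cpi); /* ~1.67  */

    /* With no cache, every reference pays the full penalty. */
    double no_cache_cpi = base_cpi + refs_per_inst * miss_penalty;
    printf("CPI with no cache: %.1f\n", no_cache_cpi);        /* 68.5   */
    return 0;
}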

Cache Organization
(1) How do you know if something is in the cache?
(2) If it is in the cache, how do you find it?
°The answers to (1) and (2) depend on the type, or organization, of the cache.
°In a direct mapped cache, each memory address is associated with exactly one possible block within the cache.
 -Therefore, we only need to look in a single location in the cache for the data, if it exists in the cache at all.

Simplest Cache: Direct Mapped
[Figure: a memory feeding a 4-block direct mapped cache, labeled with memory block addresses and cache indices.]
°Cache block 0 can be occupied by data from memory blocks 0, 4, 8, 12 (block addresses 0000, 0100, 1000, 1100 in binary).
°Cache block 1 can be occupied by data from memory blocks 1, 5, 9, 13.
°Block size = 32/64 bytes.

Simplest Cache: Direct Mapped
[Figure: main memory and a 4-block direct mapped cache; each memory block address splits into a tag and a cache index.]
°The index determines the block in the cache: index = (block address) mod (# blocks in cache).
°If the number of cache blocks is a power of 2, the cache index is just the lower n bits of the memory block address, where n = log2(# blocks).
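
A minimal sketch of this mapping in C (the 4-block cache size comes from the figure; the range of block addresses is an arbitrary choice): since NUM_BLOCKS is a power of two, the mod reduces to masking the low bits.

#include <stdio.h>

#define NUM_BLOCKS 4   /* 4-block cache, as in the figure */

int main(void) {
    for (unsigned block_addr = 0; block_addr < 16; block_addr++) {
        unsigned index = block_addr % NUM_BLOCKS;  /* == block_addr & (NUM_BLOCKS-1) */
        unsigned tag   = block_addr / NUM_BLOCKS;  /* upper bits identify the block  */
        printf("memory block %2u -> cache index %u, tag %u\n",
               block_addr, index, tag);
    }
    return 0;
}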

Simplest Cache: Direct Mapped w/Tag
[Figure: main memory and a direct mapped cache; each cache entry holds a tag and data, and each memory block address splits into tag and cache index.]
°The tag determines which memory block currently occupies the cache block.
°Tag = the left-hand (upper) bits of the address.
°Hit: the cache tag field = the tag bits of the address.
°Miss: the cache tag field ≠ the tag bits of the address.

Finding the Item within a Block
°In reality, a cache block consists of a number of bytes/words (e.g., 32 or 64 bytes), to (1) increase cache hits by exploiting the locality property and (2) amortize the cost of a miss over more data.
°Mapping: memory block i is mapped to cache block frame i mod x, where x is the number of block frames in the cache (called congruent mapping).
°Given the address of an item, the index tells which block of the cache to look in.
°Then how do we find the requested item within the cache block?
°Or, equivalently, "What is the byte offset of the item within the cache block?"

Issues with Direct-Mapped
°If block size > 1, the rightmost bits of the index are really the offset within the indexed block:

  ttttttttttttttttt iiiiiiiiii oooo

 -tag: to check if we have the correct block
 -index: to select the block
 -offset: byte within the block
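
A C sketch of the field extraction, using the geometry drawn above (4 offset bits, i.e. 16-byte blocks, and 10 index bits, i.e. 1024 blocks; these widths are read off the picture, not mandated by the slide):

#include <stdio.h>
#include <stdint.h>

#define BLOCK_BYTES 16u    /* => 4 offset bits  */
#define NUM_BLOCKS  1024u  /* => 10 index bits  */

int main(void) {
    uint32_t addr = 0x12345678;  /* arbitrary example byte address */

    uint32_t offset = addr & (BLOCK_BYTES - 1);           /* low 4 bits    */
    uint32_t index  = (addr / BLOCK_BYTES) % NUM_BLOCKS;  /* next 10 bits  */
    uint32_t tag    = addr / (BLOCK_BYTES * NUM_BLOCKS);  /* upper 18 bits */

    printf("addr 0x%08x -> tag 0x%x, index %u, offset %u\n",
           addr, tag, index, offset);
    return 0;
}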

Accessing Data in a Direct Mapped Cache
Three types of events:
°Cache miss: nothing is in the cache in the appropriate block, so fetch from memory.
°Cache hit: the cache block is valid and contains the proper address, so read the desired word.
°Cache miss, block replacement: the wrong data is in the cache at the appropriate block, so discard it and fetch the desired data from memory.
Cache access procedure:
(1) Use the index bits to select the cache block.
(2) If the valid bit is 1, compare the tag bits of the address with the cache block's tag bits.
(3) If they match, use the offset to read out the word/byte.

[Figure: lookup example. The index selects a cache entry; the valid bit is set and the tag matches, so the access hits, and the byte offset (0x0-3, 0x4-7, 0x8-b, 0xc-f) selects word d from the block's four words a, b, c, d.]
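
The whole procedure, as a small direct mapped cache simulation in C (a sketch under assumed parameters: 16-byte blocks, 1024 entries, byte-granularity reads, and a 1 MB backing memory; none of these values come from the slide):

#include <stdint.h>
#include <stdbool.h>
#include <string.h>
#include <stdio.h>

#define BLOCK_BYTES 16
#define NUM_BLOCKS  1024

typedef struct {
    bool     valid;
    uint32_t tag;
    uint8_t  data[BLOCK_BYTES];
} CacheLine;

static CacheLine cache[NUM_BLOCKS];
static uint8_t memory[1 << 20];  /* backing "main memory" for the sketch */

/* Returns the byte at addr; *hit reports whether the access hit. */
uint8_t cache_read(uint32_t addr, bool *hit) {
    uint32_t offset = addr % BLOCK_BYTES;
    uint32_t index  = (addr / BLOCK_BYTES) % NUM_BLOCKS;
    uint32_t tag    = addr / (BLOCK_BYTES * NUM_BLOCKS);
    CacheLine *line = &cache[index];            /* (1) select the block  */

    if (line->valid && line->tag == tag) {      /* (2) compare the tags  */
        *hit = true;
    } else {                                    /* miss: (re)place block */
        *hit = false;
        memcpy(line->data, &memory[addr - offset], BLOCK_BYTES);
        line->tag   = tag;
        line->valid = true;
    }
    return line->data[offset];                  /* (3) use the offset    */
}

int main(void) {
    bool hit;
    memory[0x1234] = 42;
    cache_read(0x1234, &hit);                 /* first touch: a miss */
    uint8_t v = cache_read(0x1234, &hit);     /* same block: a hit   */
    printf("value=%u hit=%d\n", v, hit);
    return 0;
}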

An Example Cache: DecStation 3100
°Commercial workstation, ~1985
°MIPS R2000 processor (similar to the pipelined machine of Chapter 6)
°Separate instruction and data caches:
 -Direct mapped
 -64K bytes (16K words) each
 -Block size: 1 word (low spatial locality)
°Solution: increase the block size (see the second example below)

DecStation 3100 Cache
[Figure (address shown with bit positions): the 32-bit address splits into a 16-bit tag, a 14-bit index, and a 2-bit byte offset. The cache has 16K entries, each holding a valid bit, a 16-bit tag, and 32 bits of data; a comparator on the tag field produces the Hit signal.]
°If the access misses, the cache controller stalls the processor and loads the data from main memory.
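
As a consistency check (a standard calculation, not spelled out on the slide): each of the 16K entries stores 1 valid bit + 16 tag bits + 32 data bits, so the total storage is 16K x (1 + 16 + 32) = 16K x 49 = 784 Kbits, roughly 98 KB of SRAM to cache 64 KB of data, i.e., about 50% overhead for tags and valid bits.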

64 KB Cache with 4-Word (16-Byte) Blocks
[Figure (address shown with bit positions): the 32-bit address splits into a 16-bit tag, a 12-bit index, a 2-bit block offset, and a 2-bit byte offset. Each of the 4K entries holds a valid bit, a 16-bit tag, and 128 bits of data; a multiplexor uses the block offset to select one of the four words.]
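
The field widths follow directly from the cache geometry (a worked check, not on the original slide):
 -16-byte blocks require 4 offset bits: 2 to select the word within the block and 2 to select the byte within the word.
 -64 KB / 16 bytes per block = 4K blocks, so the index needs log2(4K) = 12 bits.
 -The tag is whatever remains: 32 - 12 - 4 = 16 bits.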

Miss Rates: 1-Word vs. 4-Word Blocks (cache similar to the DecStation 3100)

                      I-cache     D-cache     Combined
Program               miss rate   miss rate   miss rate
1-word block   gcc    6.1%        2.1%        5.4%
               spice  1.2%        1.3%        1.2%
4-word block   gcc    2.0%        1.7%        1.9%
               spice  0.3%        0.6%        0.4%

Miss Rate versus Block Size
[Figure, for direct mapped caches: miss rate (0% to 40%) plotted against block size in bytes, with one curve per total cache size (1 KB, 8 KB, 16 KB, 64 KB, 256 KB).]

Extreme Example: 1-Block Cache
°Suppose we choose block size = cache size. Then there is only one block in the cache.
°Temporal locality says that if an item is accessed, it is likely to be accessed again soon.
 -But it is unlikely to be accessed again immediately!
 -The next access is likely to be a miss:
  -We continually load data into the cache but are forced to discard it before it is reused.
  -The worst nightmare of a cache designer: the Ping Pong Effect.

Block Size and Miss Penalty
°As block size increases, the cost of a miss also increases.
°Miss penalty: the time to fetch the block from the next lower level of the hierarchy and load it into the cache.
°With very large blocks, the increase in miss penalty overwhelms the decrease in miss rate.
°The average access time can be minimized if the memory system is designed right.

Block Size Tradeoff
[Figure: three curves plotted against block size. (1) Miss penalty rises steadily with block size. (2) Miss rate first falls as larger blocks exploit spatial locality, then rises when having fewer blocks compromises temporal locality. (3) Average access time therefore has a minimum; beyond it, the increased miss penalty and miss rate dominate.]
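
The tradeoff can be made concrete with a toy model in C (all numbers are illustrative assumptions, not measurements): the miss penalty grows linearly with block size, the miss rate first falls and then rises, and the average access time (hit time + miss rate x miss penalty) has an interior minimum.

#include <stdio.h>

int main(void) {
    double hit_time = 1.0;  /* cycles (assumed) */

    /* Toy model: penalty = setup cost + transfer time per word;
       miss rate falls with spatial locality, then climbs when the
       cache holds too few blocks (temporal locality suffers). */
    for (int block_words = 1; block_words <= 64; block_words *= 2) {
        double miss_penalty = 40.0 + 2.0 * block_words;
        double miss_rate    = 0.05 / block_words + 0.0005 * block_words;
        double amat         = hit_time + miss_rate * miss_penalty;
        printf("block=%2d words: penalty=%5.1f rate=%.4f AMAT=%.3f\n",
               block_words, miss_penalty, miss_rate, amat);
    }
    return 0;  /* with these numbers, AMAT bottoms out at 8-word blocks */
}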