
Memory Hierarchy
Memory: a hierarchy of components of various speeds and capacities
Hierarchy driven by cost and performance
In the early days:
- Primary memory = main memory
- Secondary memory = disks
Nowadays, there is a hierarchy within the primary memory:
- One or more levels of caches on-chip (SRAM; expensive, fast)
- Generally one level of cache off-chip (DRAM or SRAM; less expensive, slower)
- Main memory (DRAM; slower, cheaper, more capacity)

Goal of a memory hierarchy
- Keep close to the ALU the information that will be needed now and in the near future
- Memory closest to the ALU is the fastest but also the most expensive
- So, keep close to the ALU only the information that will be needed now and in the near future
Technology trends
- Speed of processors (and SRAM) increases by about 60% every year
- Latency of DRAMs decreases by only about 7% every year
- Hence the processor-memory gap, or "memory wall", bottleneck

Processor-Memory Performance Gap
[Figure: log-scale plot, 1989-2001. x86 CPU speed (386, 486, Pentium, Pentium Pro, Pentium III, Pentium 4) increases about 100x over 10 years, while memory latency improves only about 10x over 8 years (even though densities increased 100x over the same period), opening the "memory gap" / "memory wall".]

Typical numbers

Technology   Typical access time        $/MByte
SRAM         3-20 ns                    $50-200
DRAM         40-120 ns                  $1-10
Disk         milliseconds (~10^6 ns)    $0.01-0.1

Principle of locality
- A memory hierarchy works because text and data are not accessed randomly
- Computer programs exhibit the principle of locality
- Temporal locality: data/code used in the past is likely to be reused in the near future (e.g., code in loops, data in stacks)
- Spatial locality: data/code close (in memory addresses) to the data/code presently being referenced will be referenced in the near future (straight-line code sequences, traversing an array)
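To make the two kinds of locality concrete, here is a minimal C example (illustrative only, not part of the original slides): summing an array in a loop touches consecutive addresses (spatial locality) while reusing the same code and loop variables every iteration (temporal locality).

    #include <stdio.h>

    #define N 1024

    int main(void) {
        int a[N];
        int sum = 0;

        for (int i = 0; i < N; i++)   /* loop code re-executed every iteration: temporal locality   */
            a[i] = i;

        for (int i = 0; i < N; i++)   /* i and sum reused every iteration: temporal locality        */
            sum += a[i];              /* a[0], a[1], ... at consecutive addresses: spatial locality */

        printf("sum = %d\n", sum);
        return 0;
    }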

Caches
- Registers are not sufficient to keep enough data with locality close to the ALU
- Main memory (DRAM) is too "far": it takes many cycles to access it
- Instruction memory is accessed every cycle
- Hence the need for a fast memory between main memory and the registers; this fast memory is called a cache
- A cache is much smaller (in amount of storage) than main memory
- Goal: keep in the cache what is most likely to be referenced in the near future

Basic use of caches
- When fetching an instruction, first check to see whether it is in the cache
  - If so (cache hit), bring the instruction from the cache to the IR
  - If not (cache miss), go to the next level of the memory hierarchy, until it is found
- When performing a load, first check to see whether the data is in the cache
  - If cache hit, send the data from the cache to the destination register
  - If cache miss, go to the next level of the memory hierarchy, until it is found
- When performing a store, there are several possibilities; ultimately, though, the store has to percolate down to main memory

Levels in the memory hierarchy
- ALU registers: 64-128 registers
- On-chip cache (split I-cache and D-cache): 8-256 KB; SRAM; a few ns
- Off-chip cache: 128 KB - 8 MB; SRAM/DRAM; ~10-20 ns
- Main memory: up to 4 GB; DRAM; 40-100 ns
- Secondary memory: 10-100's of GB; a few milliseconds
- Archival storage

Caches are ubiquitous
- Not a new idea: the first cache appeared in the IBM 360/85 (late 60's)
- The concept of a cache is used in many other aspects of computer systems: disk caches, network server caches, etc.
- Caches work because programs exhibit locality
- Lots of research on caches in the last 20 years because of the increasing gap between processor speed and (DRAM) memory latency
- Every current microprocessor has a cache hierarchy with at least one level on-chip

Main memory access (review)
- Recall: in a load (or store), the address is an index into the memory array
- Each byte of memory has a unique address, i.e., the mapping between memory address and memory location is unique
[Figure: the ALU sends an address to main memory]

Cache access for a load or an instruction fetch
- The cache is much smaller than main memory
  - Not all memory locations have a corresponding entry in the cache at a given time
- When a memory reference is generated, i.e., when the ALU generates an address:
  - There is a look-up in the cache: if the memory location is mapped in the cache, we have a cache hit, and the contents of the cache location are returned to the ALU
  - If we don't have a cache hit (cache miss), we have to look in the next level of the memory hierarchy (i.e., another cache or main memory)

Cache access
[Figure: the ALU sends an address to the cache; on a hit the cache returns the data, on a miss the request goes to main memory. Main memory is accessed only if there was a cache miss.]
- Two questions: how do you know where to look, and how do you know whether there is a hit?

Some basic questions on cache design
- When do we bring the contents of a memory location into the cache?
- Where do we put it?
- How do we know it's there?
- What happens if the cache is full and we want to bring in something new?
  - In fact, a better question is: "what happens if we want to bring in something new and the place where it's supposed to go is already occupied?"

Some "top level" answers
- When do we bring the contents of a memory location into the cache? -- The first time there is a cache miss for that location, that is, "on demand"
- Where do we put it? -- Depends on the cache organization (see next slides)
- How do we know it's there? -- Each entry in the cache carries its own name, or tag
- What happens if the cache is full and we want to bring in something new? -- One entry currently in the cache will be replaced by the new one

Generic cache organization
[Figure: the cache is an array of entries, each holding an address (tag) and data; one entry is called a cache entry, cache block, or cache line]
- If the address (tag) generated by the ALU = the address (tag) of a cache entry, we have a cache hit; the data in that cache entry is good

Cache organizations
- The mapping of a memory location to a cache entry can range from full generality to very restrictive
  - In general, the data portion of a cache block contains several words
- If a memory location can be mapped anywhere in the cache (full generality), we have a fully-associative cache
- If a memory location can be mapped to only a single cache entry (most restrictive), we have a direct-mapped cache
- If a memory location can be mapped to one of several cache entries, we have a set-associative cache

How to check for a hit?
- For a fully-associative cache
  - Check all tag (address) fields to see if any matches the address generated by the ALU
  - Very expensive if it has to be done fast, because all the comparisons must be performed in parallel
  - Fully-associative organizations are not used for general-purpose caches
- For a direct-mapped cache
  - Check only the tag field of the single possible entry
- For a set-associative cache
  - Check the tag fields of the set of possible entries

Cache organization -- direct-mapped
- Most restricted mapping: direct-mapped cache
- A given memory location (block) can be mapped to only a single place in the cache, generally given by:
  (block address) mod (number of blocks in the cache)

Direct-mapped cache
[Figure: a cache of C lines in front of a much larger main memory; all memory blocks whose addresses are equal mod C map to the same cache block]
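To make the mapping and the hit/miss behavior concrete, here is a minimal C sketch of a direct-mapped lookup (a software model with deliberately tiny, arbitrary sizes, not a description of the hardware):

    #include <stdio.h>
    #include <stdint.h>
    #include <stdbool.h>

    #define NUM_BLOCKS 8u                           /* a tiny cache so the mapping is easy to follow */

    struct line { bool valid; uint32_t tag; };
    static struct line cache[NUM_BLOCKS];

    /* Returns true on a hit; on a miss, installs the block ("on demand"). */
    static bool access_block(uint32_t block_addr) {
        uint32_t index = block_addr % NUM_BLOCKS;   /* (block address) mod (number of blocks) */
        uint32_t tag   = block_addr / NUM_BLOCKS;   /* remaining high-order bits              */
        if (cache[index].valid && cache[index].tag == tag)
            return true;                            /* cache hit                              */
        cache[index].valid = true;                  /* cache miss: fetch from the next level  */
        cache[index].tag   = tag;                   /* and replace whatever was in this entry */
        return false;
    }

    int main(void) {
        uint32_t trace[] = { 0, 8, 0, 5, 13, 5 };   /* block addresses; 0 and 8 conflict      */
        for (int i = 0; i < 6; i++)
            printf("block %2u -> entry %u : %s\n", trace[i], trace[i] % NUM_BLOCKS,
                   access_block(trace[i]) ? "hit" : "miss");
        return 0;
    }

Note how blocks 0 and 8 map to the same entry and keep evicting each other even though the rest of the cache is empty; this is exactly the restriction that set-associativity relaxes.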

Fully-associative cache
- Most general mapping: fully-associative cache
- A given memory location (block) can be mapped anywhere in the cache
- No cache of decent size is implemented this way, but this is the (general) mapping used for pages from virtual to physical space (disk to main memory, see later) and for small TLBs (this will also be explained soon)

Fully-associative cache
[Figure: any memory block can map to any cache block]

Set-associative caches
- Less restricted mapping: set-associative cache
- Blocks in the cache are grouped into sets, and a given memory location (block) maps to a set; within the set, the block can be placed anywhere
- Associativities of 2 (two-way set-associative), 4, 8, and even 16 have been implemented
- Direct-mapped = 1-way set-associative
- A fully-associative cache with m entries is m-way set-associative

Set-associative cache
[Figure: the cache is organized as two banks (Bank 0 and Bank 1); a memory block maps to a specific set, i.e., to the corresponding block of either bank]
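For comparison with the direct-mapped sketch above, here is a minimal C sketch of a 2-way set-associative lookup (again a software model with arbitrary sizes; the replacement choice on a full set is deliberately naive):

    #include <stdio.h>
    #include <stdint.h>
    #include <stdbool.h>

    #define NUM_SETS 4u
    #define WAYS     2                               /* 2-way set-associative */

    struct line { bool valid; uint32_t tag; };
    static struct line cache[NUM_SETS][WAYS];

    /* Returns true on a hit; on a miss, fills one way of the selected set. */
    static bool access_block(uint32_t block_addr) {
        uint32_t set = block_addr % NUM_SETS;        /* the block maps to one set          */
        uint32_t tag = block_addr / NUM_SETS;
        for (int w = 0; w < WAYS; w++)               /* compare the tags of all ways       */
            if (cache[set][w].valid && cache[set][w].tag == tag)
                return true;                         /* hit in one of the ways             */
        for (int w = 0; w < WAYS; w++)               /* miss: use an invalid way if any    */
            if (!cache[set][w].valid) {
                cache[set][w].valid = true;
                cache[set][w].tag   = tag;
                return false;
            }
        cache[set][0].tag = tag;                     /* set full: naively evict way 0      */
        return false;
    }

    int main(void) {
        uint32_t trace[] = { 0, 4, 0, 4, 8, 0 };     /* blocks 0, 4 and 8 all map to set 0 */
        for (int i = 0; i < 6; i++)
            printf("block %2u -> set %u : %s\n", trace[i], trace[i] % NUM_SETS,
                   access_block(trace[i]) ? "hit" : "miss");
        return 0;
    }

With 2 ways, blocks 0 and 4 can coexist in set 0 and both hit on reuse, which a direct-mapped cache of the same size could not do.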

Cache hit or cache miss?
- How to detect whether a memory address (a byte address) has a valid image in the cache
- The address is decomposed into 3 fields:
  - block offset or displacement (depends on the block size)
  - index (depends on the number of sets and the set-associativity)
  - tag (the remainder of the address)
- The tag array has a width equal to the tag field

Hit detection (direct-mapped cache)
[Figure: the address is split into tag | index | d; the index selects one (tag, data) entry in the cache, and that entry's stored tag is compared with the tag of the address]
- d corresponds to the number of bytes in the block; the index corresponds to the number of blocks in the cache; the tag is the remaining bits of the address
- If tag(generated address) = tag(entry pointed to by the index in the cache), we have a hit

Example of a direct-mapped cache: the DECstation 3100
- 64 KB cache (when giving the size, or capacity, of a cache, only the data part is counted)
- Each block is 4 bytes (block size); hence 16K blocks (aka lines)
- displacement field: d = 2 bits (d = log2(block size))
- index field: i = 14 bits (i = log2(number of blocks))
- tag field: t = 32 - 14 - 2 = 16 bits
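The field extraction can be written directly in C; this minimal sketch uses the DECstation 3100 field widths above (2-bit displacement, 14-bit index, 16-bit tag), and the example address is an arbitrary value chosen for illustration:

    #include <stdio.h>
    #include <stdint.h>

    #define D_BITS 2                  /* log2(block size)       = log2(4)   */
    #define I_BITS 14                 /* log2(number of blocks) = log2(16K) */

    int main(void) {
        uint32_t addr  = 0x00403AB7;                     /* arbitrary byte address       */
        uint32_t d     = addr & ((1u << D_BITS) - 1);    /* byte within the block        */
        uint32_t index = (addr >> D_BITS) & ((1u << I_BITS) - 1);
        uint32_t tag   = addr >> (D_BITS + I_BITS);      /* remaining 16 high-order bits */

        printf("tag = 0x%04x  index = 0x%04x  d = %u\n", tag, index, d);
        return 0;
    }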

DECstation 3100
[Figure: the 32-bit address is split into a 16-bit tag, a 14-bit index, and a 2-bit d field; the index selects one entry of the cache (a 16-bit tag plus 32 bits of data); the stored tag is compared with the address tag. If they are equal: cache hit, and the data goes to the ALU. If not: cache miss, go to main memory.]

Example of a cache with longer blocks (capacity 64 KB, block size 16 bytes)
[Figure: the 32-bit address is split into a 16-bit tag, a 12-bit index, and a 4-bit d field; the index selects one entry of the cache (a 16-bit tag plus 128 bits of data); the stored tag is compared with the address tag. On a hit, a mux selects one of the four words of the block for the ALU; on a miss, go to main memory.]
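A small C sketch of how the 4-bit displacement is used in this configuration (the example address is arbitrary): the low 2 bits select a byte within a word, and the next 2 bits select one of the 4 words of the block, which is the job of the mux in the figure.

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint32_t addr         = 0x00001238;          /* arbitrary example address         */
        uint32_t byte_in_word = addr & 0x3;          /* low 2 bits: byte within a word    */
        uint32_t word_select  = (addr >> 2) & 0x3;   /* next 2 bits: which of the 4 words */
        uint32_t index        = (addr >> 4) & 0xFFF; /* 12-bit index into the 4K blocks   */
        uint32_t tag          = addr >> 16;          /* remaining 16 bits                 */

        printf("tag=0x%x index=0x%x word=%u byte=%u\n", tag, index, word_select, byte_in_word);
        return 0;
    }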

Why set-associative caches?
- Cons
  - The higher the associativity, the larger the number of comparisons that must be made in parallel for high performance (this can have an impact on cycle time for on-chip caches)
- Pros
  - Better hit ratio
  - Great improvement from 1-way to 2-way, less from 2-way to 4-way, minimal after that

Set-associative mapping
- Number of index bits = log2(number of blocks / associativity)
[Figure: the index selects a set, i.e., one block in each bank (Bank 0 and Bank 1)]
- Note: we need one comparator per bank, plus a mux, to see if we have a hit
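As a worked example (an illustrative configuration, not one given in the slides): take the earlier 64 KB cache with 4-byte blocks (16K blocks) and make it 2-way set-associative.

    index bits = log2(number of blocks / associativity)
               = log2(16384 / 2) = log2(8192) = 13

So the index shrinks from 14 bits to 13 bits, and the tag grows from 16 bits to 17 bits; with the 2-bit displacement the three fields still cover the 32-bit address.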