Cache Intro (CSE 471, Autumn 01)

Principle of Locality: Memory Hierarchies
Text and data are not accessed randomly
Temporal locality
–Recently accessed items will be accessed in the near future (e.g., code in loops, top of stack)
Spatial locality
–Items at addresses close to the addresses of recently accessed items will be accessed in the near future (sequential code, elements of arrays)
Leads to a memory hierarchy at two main interface levels:
–Processor - Main memory -> introduction of caches
–Main memory - Secondary memory -> virtual memory (paging systems)
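
As a concrete illustration (a minimal C sketch, not from the slides): the accumulator exhibits temporal locality, while the sequential array walk exhibits spatial locality:

    #include <stdio.h>

    int main(void) {
        int a[1024];
        int sum = 0;                 /* temporal locality: sum is reused every iteration */
        for (int i = 0; i < 1024; i++)
            a[i] = i;                /* spatial locality: consecutive addresses */
        for (int i = 0; i < 1024; i++)
            sum += a[i];             /* blocks fetched in the first loop may still be cached */
        printf("sum = %d\n", sum);
        return 0;
    }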

Processor - Main Memory Hierarchy
Registers: those visible to the ISA + those renamed by hardware
(Hierarchy of) caches, plus their enhancements
–Write buffers, victim caches, etc.
TLBs and their management
Virtual memory system (O.S. level) and hardware assists (page tables)
Inclusion of information (or space to gather information) from level to level
–Almost always true

Questions that Arise at Each Level
What is the unit of information transferred from level to level?
–Word (byte, double word) to/from a register
–Block (line) to/from cache
–Page table entry + misc. bits to/from TLB
–Page to/from disk
When is the unit of information transferred from one level to a lower level in the hierarchy?
–Generally, on demand (cache miss, page fault)
–Sometimes earlier (prefetching)

Questions that Arise at Each Level (c'ed)
Where in the hierarchy is that unit of information placed?
–For registers, directed by the ISA and/or the register renaming method
–For caches, in general in L1
Possibility of hinting to another level (Itanium), of bypassing the cache entirely, or of placing the block in special buffers
How do we find out whether a unit of information is in a given level of the hierarchy?
–Depends on the mapping
–Use of hardware structures (for caches/TLBs) and software structures (page tables)

Questions that Arise at Each Level (c'ed)
What happens if there is no room for the item we bring in?
–Replacement algorithm; depends on the organization
What happens when we change the contents of the information?
–i.e., what happens on a write?

Caches (on-chip, off-chip)
Caches consist of a set of entries, where each entry has:
–a block (or line) of data: the information contents
–a tag: allows us to recognize whether the block is there
–status bits: valid, dirty, state for multiprocessors, etc.
Capacity (or size) of a cache: number of blocks * block size
–i.e., the cache metadata (tag + status bits) is not counted in the cache capacity
Notation
–First-level (on-chip) cache: L1
–Second-level (on-chip/off-chip) cache: L2; third-level (off-chip) cache: L3
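
As a concrete picture of one entry (a minimal C sketch; the 64-byte block size and field widths are assumed, illustrative values, not from the slides):

    #include <stdint.h>

    #define BLOCK_SIZE 64u   /* bytes per block (line); an assumed value */

    typedef struct {
        uint8_t  valid;              /* status bit: entry holds live data */
        uint8_t  dirty;              /* status bit: modified since fill (write-back caches) */
        uint32_t tag;                /* identifies which memory block occupies the entry */
        uint8_t  data[BLOCK_SIZE];   /* the block itself: the information contents */
    } cache_entry_t;

    /* Capacity counts only the data blocks; tag and status bits are metadata. */
    static uint32_t cache_capacity(uint32_t num_blocks) {
        return num_blocks * BLOCK_SIZE;
    }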

Cache Organizations
Direct-mapped cache
–A given memory location (block) can be mapped to only a single place in the cache. Generally this place is given by: (block address) mod (number of blocks in cache)
–To make the mapping easier, the number of blocks in a direct-mapped cache is a power of 2.
–There have been proposals for caches, for vector processors, that have a number of blocks that is a Mersenne prime (the modulo arithmetic for those numbers has some "nice" properties).
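
A minimal sketch of the mapping, with assumed illustrative sizes; with a power-of-two number of blocks the modulo reduces to a bit mask:

    #include <stdint.h>

    #define NUM_BLOCKS 1024u   /* power of 2, so mod is a mask */
    #define BLOCK_SIZE 64u

    static uint32_t dm_index(uint32_t byte_address) {
        uint32_t block_address = byte_address / BLOCK_SIZE;
        return block_address % NUM_BLOCKS;   /* same as block_address & (NUM_BLOCKS - 1) */
    }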

Cache Organizations (c'ed)
Fully-associative cache
–A given memory location (block) can be mapped anywhere in the cache.
–No cache of decent size is implemented this way, but this is the (general) mapping for pages (disk to main memory), for small TLBs, and for some small buffers used as cache assists (e.g., victim caches, write caches).
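
Lookup in a fully-associative cache must compare against every entry; a sketch, reusing the hypothetical cache_entry_t type from the earlier sketch:

    static int fa_lookup(const cache_entry_t *cache, int n_entries, uint32_t tag) {
        for (int i = 0; i < n_entries; i++)
            if (cache[i].valid && cache[i].tag == tag)
                return i;    /* hit: block found in entry i */
        return -1;           /* miss: no tag matched */
    }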

Cache Organizations (c'ed)
Set-associative cache
–Blocks in the cache are grouped into sets, and a given memory location (block) maps into a set. Within the set, the block can be placed anywhere. Associativities of 2 (two-way set-associative), 3, 4, 8, and even 16 have been implemented.
–Direct-mapped = 1-way set-associative
–Fully associative with m entries = m-way set-associative
Capacity
–Capacity = number of sets * set-associativity * block size
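
The set index and the capacity formula, as a sketch with assumed sizes (256 sets, 4 ways, 64-byte blocks, i.e., a 64 KB cache):

    #include <stdint.h>

    #define NUM_SETS   256u
    #define ASSOC      4u     /* 4-way set-associative */
    #define BLOCK_SIZE 64u

    static uint32_t sa_set_index(uint32_t byte_address) {
        return (byte_address / BLOCK_SIZE) % NUM_SETS;   /* which set; any way within it */
    }

    static uint32_t sa_capacity(void) {
        return NUM_SETS * ASSOC * BLOCK_SIZE;   /* number of sets * set-associativity * block size */
    }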

Cache Hit or Cache Miss?
How to detect whether a memory address (a byte address) has a valid image in the cache:
The address is decomposed into 3 fields:
–block offset or displacement (depends on block size)
–index (depends on number of sets and set-associativity)
–tag (the remainder of the address)
The tag array has a width equal to the tag field.

Hit Detection
Address layout (most to least significant bits): | tag | index | displ. |
Example: cache capacity C, block size b, 32-bit byte addresses
–Direct-mapped: displ = log2(b); index = log2(C/b); tag = 32 - index - displ
–N-way set-associative: displ = log2(b); index = log2(C/(bN)); tag = 32 - index - displ
So what does it mean to have 3-way (N = 3) set-associativity? (The associativity need not be a power of 2; only the number of sets C/(bN) must be, so that the index is a whole number of bits. The capacity is then not a power of 2.)
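
A sketch of the field extraction, with assumed parameters (32 KB capacity, 64-byte blocks, 4-way):

    #include <stdint.h>
    #include <stdio.h>

    #define C 32768u   /* cache capacity in bytes (32 KB) */
    #define B 64u      /* block size in bytes */
    #define N 4u       /* associativity (1 = direct-mapped) */

    static unsigned ilog2(uint32_t x) {   /* log2 for power-of-two inputs */
        unsigned n = 0;
        while (x >>= 1) n++;
        return n;
    }

    int main(void) {
        unsigned displ_bits = ilog2(B);            /* displ = log2(b)      */
        unsigned index_bits = ilog2(C / (B * N));  /* index = log2(C/(bN)) */
        unsigned tag_bits   = 32 - index_bits - displ_bits;

        uint32_t addr  = 0x12345678u;
        uint32_t displ = addr & (B - 1);
        uint32_t index = (addr >> displ_bits) & ((C / (B * N)) - 1);
        uint32_t tag   = addr >> (displ_bits + index_bits);

        printf("displ %u bits (%u), index %u bits (%u), tag %u bits (0x%x)\n",
               displ_bits, displ, index_bits, index, tag_bits, tag);
        return 0;
    }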

Why Set-associative Caches?
Cons
–The higher the associativity, the larger the number of comparisons to be made in parallel for high performance (can have an impact on cycle time for on-chip caches)
–Higher associativity requires a wider tag array (minimal impact)
Pros
–Better hit ratio
–Great improvement from 1 to 2 ways, less from 2 to 4, minimal after that, but can still be important for large L2 caches
–Allows parallel search of the TLB and the cache for larger (but still small) caches (see later)

Replacement Algorithm
None for direct-mapped
Random, LRU, or pseudo-LRU for set-associative caches
–Not an important factor for performance at low associativity; can become important for large associativity and large caches
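
A minimal LRU sketch for one set (illustrative; real hardware typically keeps a few pseudo-LRU bits rather than full age counters). Each way carries an age; a touched way becomes youngest, and the oldest way is the victim:

    #define WAYS 4

    typedef struct { unsigned age[WAYS]; } lru_state_t;   /* ages form a permutation 0..WAYS-1 */

    static void lru_touch(lru_state_t *s, int way) {
        for (int w = 0; w < WAYS; w++)
            if (s->age[w] < s->age[way])
                s->age[w]++;          /* every younger way ages by one */
        s->age[way] = 0;              /* the touched way is now youngest */
    }

    static int lru_victim(const lru_state_t *s) {
        int victim = 0;
        for (int w = 1; w < WAYS; w++)
            if (s->age[w] > s->age[victim])
                victim = w;           /* oldest way gets evicted */
        return victim;
    }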

Writing in a Cache
On a write hit, should we write:
–in the cache only (write-back policy)?
–in the cache and in main memory (or the higher-level cache) (write-through policy)?
On a write miss, should we:
–allocate a block as on a read (write-allocate)?
–write only in memory (write-around)?

The Main Write Options
Write-through (aka store-through)
–On a write hit, write both in the cache and in memory
–On a write miss, the most frequent option is write-around
–Pro: consistent view of memory (better for I/O); no ECC required for the cache
–Con: more memory traffic (can be alleviated with write buffers)
Write-back (aka copy-back)
–On a write hit, write only in the cache (requires a dirty bit)
–On a write miss, most often write-allocate (fetch on miss), but variations are possible
–Pros and cons are the reverse of write-through
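
The decision points, as a sketch; write_cache(), write_memory(), set_dirty(), and allocate_block() are hypothetical helpers standing in for the real datapath actions:

    typedef enum { WRITE_THROUGH, WRITE_BACK }    hit_policy_t;
    typedef enum { WRITE_ALLOCATE, WRITE_AROUND } miss_policy_t;

    void write_cache(void);      /* hypothetical: update the cached block   */
    void write_memory(void);     /* hypothetical: update the level below    */
    void set_dirty(void);        /* hypothetical: mark the block modified   */
    void allocate_block(void);   /* hypothetical: fetch the block on a miss */

    static void handle_write(int hit, hit_policy_t hp, miss_policy_t mp) {
        if (hit) {
            if (hp == WRITE_THROUGH) { write_cache(); write_memory(); }  /* both levels stay consistent */
            else                     { write_cache(); set_dirty();    }  /* memory updated on eviction  */
        } else {
            if (mp == WRITE_ALLOCATE) { allocate_block(); handle_write(1, hp, mp); }  /* fetch on miss */
            else                      { write_memory(); }  /* write-around: bypass the cache */
        }
    }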

Classifying the Cache Misses: The 3 C's
Compulsory misses (cold start)
–The first time you touch a block. Reduced (for a given cache capacity and associativity) by having large blocks
Capacity misses
–The working set is too big for the ideal cache of the same capacity and block size (i.e., fully associative with an optimal replacement algorithm). Only remedy: a bigger cache!
Conflict misses (interference)
–Mapping of two blocks to the same location. Increasing associativity decreases this type of miss.
There is a fourth C: coherence misses (cf. multiprocessors)
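
In a simulator, the compulsory class is easy to identify: the first touch of a block is, by definition, a cold-start miss. A sketch (the bound on tracked blocks is an assumption; separating capacity from conflict misses additionally requires replaying the trace through a fully-associative cache of the same capacity, omitted here):

    #include <stdbool.h>
    #include <stdint.h>

    #define TRACKED_BLOCKS (1u << 20)   /* assumed bound: block_address < TRACKED_BLOCKS */
    static bool seen[TRACKED_BLOCKS];

    static bool is_compulsory_miss(uint32_t block_address) {
        bool first_touch = !seen[block_address];   /* never referenced before? */
        seen[block_address] = true;
        return first_touch;
    }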

Example of Cache Hierarchies
(figure not transcribed)

Examples (c'ed)
(figure not transcribed)