Introduction
Why memory subsystem design is important:
- CPU speeds increase ~55% per year
- DRAM speeds increase only ~3% per year
- so the processor-memory performance gap widens every year

Memory Hierarchy
Levels of memory with different sizes & speeds:
- close to the CPU: small, fast access
- close to memory: large, slow access
Memory hierarchies improve performance:
- caches: demand-driven storage
- principle of locality of reference (see the example below)
  - temporal: a referenced word will be referenced again soon
  - spatial: words near a referenced word will be referenced soon
- speed/size trade-off in technology -> fast access for most references
First cache: the IBM 360/85, in the late '60s
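A minimal illustration of the two kinds of locality, using a hypothetical array-summing loop (not from the slides):

```c
#include <stddef.h>

/* Summing an array exhibits both kinds of locality. */
long sum(const long *a, size_t n) {
    long total = 0;            /* "total" is touched every iteration: temporal locality */
    for (size_t i = 0; i < n; i++)
        total += a[i];         /* a[i], a[i+1], ... are adjacent: spatial locality, so one
                                  cache block fetch serves several consecutive iterations */
    return total;
}
```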

Cache Review
Cache organization:
- Block: # bytes associated with 1 tag
  - usually the # bytes transferred on a memory request
- Set: the blocks that can be accessed with the same index bits
- Associativity: the number of blocks in a set
  - direct mapped, set associative, fully associative
- Size: # bytes of data
  - How do you calculate this? (see the sketch below)
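One answer to the slide's question, as a minimal sketch; the geometry (128 sets, 4-way, 64-byte blocks) is an assumed example:

```c
#include <stdio.h>

int main(void) {
    unsigned sets = 128, ways = 4, block_bytes = 64;  /* assumed geometry */
    unsigned size = sets * ways * block_bytes;        /* data bytes = sets * associativity * block size */
    printf("cache data size = %u bytes (%u KB)\n", size, size / 1024);  /* 32768 bytes = 32 KB */
    return 0;
}
```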

Logical Diagram of a Cache

Cache Review
Accessing a cache (see the sketch below):
- number of index bits = log2(cache size / block size) (for a direct-mapped cache)
- number of index bits = log2(cache size / (block size * associativity)) (for a set-associative cache)
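A sketch of the index-bit arithmetic for a 32-bit address; the cache parameters are assumptions for illustration:

```c
#include <stdio.h>

/* log2 of a power of two, computed by shifting */
static unsigned log2u(unsigned x) {
    unsigned n = 0;
    while (x > 1) { x >>= 1; n++; }
    return n;
}

int main(void) {
    unsigned cache_bytes = 32 * 1024, block_bytes = 64, ways = 4, addr_bits = 32;
    unsigned offset_bits = log2u(block_bytes);                        /* 6 */
    unsigned index_bits  = log2u(cache_bytes / (block_bytes * ways)); /* log2(128 sets) = 7 */
    unsigned tag_bits    = addr_bits - index_bits - offset_bits;      /* 32 - 7 - 6 = 19 */
    printf("offset=%u index=%u tag=%u\n", offset_bits, index_bits, tag_bits);
    return 0;
}
```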

Logical Diagram of a Set-associative Cache

Design Tradeoffs
Cache size: the bigger the cache,
+ the higher the hit ratio
- the longer the access time

Design Tradeoffs
Block size: the bigger the block,
+ the better the spatial locality
+ less block transfer overhead per block
+ less tag overhead per entry
- might not access all the bytes in the block

Design Tradeoffs
Associativity: the larger the associativity,
+ the higher the hit ratio
- the larger the hardware cost (more comparators per set)
- the longer the hit time (a larger MUX)
- need block replacement hardware
- increase in tag bits (for the same size cache)
Associativity is more important for small caches than large ones, because more memory locations map to the same line (e.g., TLBs!)

Design Tradeoffs
Memory update policy (both policies are sketched below)
- write-through
  - performance depends on the # of writes
  - a write buffer decreases this cost
    - read miss check
    - store compression
- write-back
  - performance depends on the # of dirty block replacements, but...
  - dirty bit & logic for checking it
  - tag check before the write
  - must flush the cache before I/O
  - optimization: fetch before replace
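A minimal sketch contrasting the two policies for a single cache line; struct line, the stubbed memory_write(), and the write-allocate choice on write-back misses are all assumptions, not details from the slides:

```c
#include <stdbool.h>
#include <stdint.h>

struct line { bool valid, dirty; uint32_t tag, data; };

static void memory_write(uint32_t tag, uint32_t data) { (void)tag; (void)data; }  /* stub */

/* Write-through: every store also goes to memory, so cost scales with the # of writes. */
void store_through(struct line *l, uint32_t tag, uint32_t data) {
    if (l->valid && l->tag == tag)
        l->data = data;           /* tag check before the write */
    memory_write(tag, data);      /* a write buffer would absorb this latency */
}

/* Write-back: stores stay in the cache; memory pays only on dirty-block replacement. */
void store_back(struct line *l, uint32_t tag, uint32_t data) {
    if (!(l->valid && l->tag == tag)) {        /* miss: replace the line */
        if (l->valid && l->dirty)
            memory_write(l->tag, l->data);     /* flush the dirty victim first */
        l->valid = true; l->tag = tag;         /* a real write-allocate miss would also
                                                  fetch the rest of the block (not shown) */
    }
    l->data = data;
    l->dirty = true;              /* dirty bit marks the block for later write-back */
}
```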

Design Tradeoffs
Cache contents
- separate instruction & data caches
  - separate access -> double the bandwidth
  - different configurations for I & D
  - shorter access time
- unified cache
  - lower miss rate
  - less cache controller hardware

Review of Address Translation
Address translation: maps a virtual address to a physical address, using the page tables
- number of page offset bits = log2(page size) (see the sketch below)
Translation Lookaside Buffer (TLB): an (often) associative cache of the most recently translated virtual-to-physical page mappings
- 64/128-entry, fully associative
- 4-8 byte blocks
- 0.5-1 cycle hit time
- low tens of cycles miss penalty
- misses can be handled in software, software with hardware assists, firmware, or hardware
- works because of locality of reference
- much faster than address translation using the page tables
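A sketch of the address split that formula implies, assuming 4 KB pages (so log2(4096) = 12 offset bits); the address value is hypothetical:

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t va = 0x12345ABC;                          /* hypothetical virtual address */
    uint32_t offset_bits = 12;                         /* log2(page size) for 4 KB pages */
    uint32_t vpn    = va >> offset_bits;               /* virtual page number: 0x12345 */
    uint32_t offset = va & ((1u << offset_bits) - 1);  /* page offset: 0xABC */
    printf("vpn=0x%X offset=0x%X\n", vpn, offset);
    return 0;
}
```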

Using a TLB
(1) Access the TLB using the virtual page number.
(2) If a hit, concatenate the physical page number & the page offset bits to form a physical address; set the reference bit; if writing, set the dirty bit.
(3) If a miss, get the physical address from the page table; evict a TLB entry & update its dirty/reference bits in the page table; update the TLB with the new mapping.
(These steps are sketched in code below.)
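A sketch of steps (1)-(3) for a small fully associative TLB; the data structures, the stubbed page_table_lookup() and choose_victim() helpers, and the identity page mapping are all assumptions for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 64
#define OFFSET_BITS 12                       /* assumed: 4 KB pages */

struct tlb_entry { bool valid, dirty, referenced; uint32_t vpn, pfn; };
static struct tlb_entry tlb[TLB_ENTRIES];

static uint32_t page_table_lookup(uint32_t vpn) { return vpn; }  /* stub: identity mapping */
static int      choose_victim(void)             { return 0; }    /* stub: no real policy */

uint32_t translate(uint32_t va, bool is_write) {
    uint32_t vpn    = va >> OFFSET_BITS;
    uint32_t offset = va & ((1u << OFFSET_BITS) - 1);
    for (int i = 0; i < TLB_ENTRIES; i++) {          /* (1) associative search on the VPN */
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            tlb[i].referenced = true;                /* (2) hit: set the reference bit */
            if (is_write) tlb[i].dirty = true;       /*     and the dirty bit on a store */
            return (tlb[i].pfn << OFFSET_BITS) | offset;
        }
    }
    int v = choose_victim();                         /* (3) miss: evict an entry; a real OS
                                                        would copy its dirty/reference bits
                                                        back to the page table here */
    tlb[v] = (struct tlb_entry){ .valid = true, .dirty = is_write, .referenced = true,
                                 .vpn = vpn, .pfn = page_table_lookup(vpn) };
    return (tlb[v].pfn << OFFSET_BITS) | offset;
}
```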

Design Tradeoffs
Virtual or physical addressing
Virtually-addressed caches: access with a virtual address (index & tag); do address translation on a cache miss
+ faster for hits, because no address translation
+ compiler support for better data placement

Design Tradeoffs
Virtually-addressed caches:
- need to flush the cache on a context switch
  - a process identifier (PID) can avoid this
- synonyms ("the synonym problem")
  - if 2 processes are sharing data, two (different) virtual addresses map to the same physical address
  - 2 copies of the same data can sit in the cache (see the demo below)
  - on a write, only one copy will be updated, so the other holds stale data
  - a solution: memory allocation restrictions
    - processes share segments, so all data aliases have the same low-order bits, i.e., the same offset from the beginning of a segment
    - the cache must be <= the segment size (more precisely, each way of the cache, i.e., cache size / associativity, must be <= the segment size)
    - index taken from the segment offset, tag compare on the segment #
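A small demo of the synonym problem, assuming a hypothetical 8 KB direct-mapped cache with 64-byte blocks and 4 KB pages; the two virtual addresses alias the same physical page (identical low 12 bits) yet index different sets:

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    unsigned offset_bits = 6, index_bits = 7;   /* 8 KB direct-mapped, 64 B blocks: 128 sets */
    uint32_t set_mask = (1u << index_bits) - 1;
    uint32_t va1 = 0x1ABC, va2 = 0x2ABC;        /* aliases: same 12-bit page offset 0xABC */
    printf("set(va1)=0x%02X set(va2)=0x%02X\n",
           (va1 >> offset_bits) & set_mask,     /* 0x6A */
           (va2 >> offset_bits) & set_mask);    /* 0x2A: a different set -> two copies */
    return 0;
}
```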

Design Tradeoffs
Virtual or physical addressing
Physically-addressed caches: do address translation on every cache access; access with a physical index & compare with a physical tag
- in a straightforward implementation, hit time increases, because the virtual address must be translated before accessing the cache
+ the increase in hit time can be avoided if address translation is done in parallel with the cache access
  - restrict the cache size so that the cache index bits fall in the page offset (where virtual & physical bits are the same)
  - access the TLB at the same time
  - compare the physical tag from the cache to the physical address (page frame #) from the TLB
  - can increase cache size by increasing associativity, but still use page offset bits for the index (see the check below)
+ no cache flushing on a context switch
+ no synonym problem
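A sketch of the sizing rule this parallel (virtually-indexed, physically-tagged) lookup implies; the page size and cache geometry are assumed values:

```c
#include <stdio.h>

int main(void) {
    unsigned page_bytes  = 4096;                 /* assumed 4 KB pages */
    unsigned cache_bytes = 32 * 1024, ways = 8;  /* assumed geometry */
    /* index + block-offset bits fit in the page offset iff one way <= one page */
    if (cache_bytes / ways <= page_bytes)
        printf("OK: TLB lookup can proceed in parallel with the cache index\n"); /* 32K/8 = 4K */
    else
        printf("too big: raise associativity or shrink the cache\n");
    return 0;
}
```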

Cache Hierarchies
Cache hierarchy: different caches with different sizes & access times & purposes
+ decreases effective memory access time:
  - many misses in the L1 cache will be satisfied by the L2 cache
  - avoid going all the way to memory

Cache Hierarchies
Level 1 cache goal: fast access, so minimize hit time (the common case)

Cache Hierarchies
Level 2 cache goal: keep traffic off the system bus

Review of Cache Metrics
Hit (miss) ratio = # hits (misses) / # memory references
- measures how well the cache functions
- useful for understanding cache behavior relative to the number of references
- an intermediate metric
Effective access time = hit time + miss ratio * miss penalty
- the (rough) average time it takes to do a memory reference
- performance of the memory system, including factors that depend on the implementation
- an intermediate metric

Measuring Cache Hierarchy Performance
Effective Access Time for a cache hierarchy: ...
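A sketch of the standard two-level form of this metric; the latencies are assumptions, and the local miss ratios come from the worked example on the next slides:

```c
#include <stdio.h>

int main(void) {
    double t_l1 = 1.0, t_l2 = 10.0, t_mem = 100.0;  /* assumed latencies, in cycles */
    double m_l1 = 0.04, m_l2 = 0.25;                /* local miss ratios from the example below */
    /* EAT = t_L1 + localMR_L1 * (t_L2 + localMR_L2 * t_mem) */
    double eat = t_l1 + m_l1 * (t_l2 + m_l2 * t_mem);
    printf("effective access time = %.2f cycles\n", eat);  /* 1 + 0.04*(10 + 25) = 2.40 */
    return 0;
}
```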

Measuring Cache Hierarchy Performance
Local Miss Ratio: misses in a cache / accesses made to that cache
- # accesses for the L1 cache: the number of references
- # accesses for the L2 cache: the number of misses in the L1 cache
Example: 1000 references, 40 L1 misses, 10 L2 misses
- local MR (L1): 40/1000 = 0.04
- local MR (L2): 10/40 = 0.25

Measuring Cache Hierarchy Performance
Global Miss Ratio: misses in a cache / total memory references
Example: 1000 references, 40 L1 misses, 10 L2 misses
- global MR (L1): 40/1000 = 0.04
- global MR (L2): 10/1000 = 0.01
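A few lines reproducing the two slides' arithmetic (nothing here beyond the example's own numbers):

```c
#include <stdio.h>

int main(void) {
    double refs = 1000, l1_misses = 40, l2_misses = 10;
    printf("local  MR(L1) = %.3f\n", l1_misses / refs);       /* 0.040 */
    printf("local  MR(L2) = %.3f\n", l2_misses / l1_misses);  /* 0.250 */
    printf("global MR(L1) = %.3f\n", l1_misses / refs);       /* 0.040 (same as local) */
    printf("global MR(L2) = %.3f\n", l2_misses / refs);       /* 0.010 */
    return 0;
}
```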

Miss Classification
Usefulness is in providing insight into the causes of misses
- does not explain what caused particular, individual misses
Compulsory: first-reference misses
- decrease by increasing block size
Capacity: due to the finite size of the cache
- decrease by increasing cache size
Conflict: too many blocks map to the same set
- decrease by increasing associativity
Coherence (invalidation)
- decrease by decreasing block size + improving processor locality