An Introduction to Cache Design


An Introduction to Cache Design 2018/11/19 \course\cpeg323-05F\Topic7a

Cache: "A safe place for hiding and storing things." (Webster's Dictionary)

"Even with the inclusion of a cache, almost all CPUs are still mostly strictly limited by the cache access time: in most cases, if the cache access time were decreased, the machine would speed up accordingly." (Alan Smith) This is even more true for multiprocessors.

While one can imagine reference patterns that defeat existing cache memory designs, it is the author's experience that cache memories improve performance for any program or workload that actually does useful computation.

Optimizing the design of a cache memory generally has four aspects:
- Maximizing the probability of finding a memory reference's target in the cache (the hit ratio),
- Minimizing the time to access information that is indeed in the cache (access time),
- Minimizing the delay due to a miss, and
- Minimizing the overheads of updating main memory, maintaining cache coherence, etc.

Key Factor in Design Decisions for VM and Cache

Access-time(MainMem) / Access-time(Cache) = 4 ~ 20
Access-time(SecondaryMem) / Access-time(MainMem) = 10^4 ~ 10^6

Because a cache miss costs only tens of cycles, cache control is usually implemented in hardware (whereas virtual memory, with a far larger miss penalty, can afford to be managed in software).

[Slides comparing memory technology in the 1990s and in the 2000s; the data tables are not preserved in this transcript.]

Cache in the Memory Hierarchy [figure]: Processor <-> Cache <-> Main Memory <-> Secondary Memory.

Four Questions for Classifying Memory Hierarchies: The fundamental principles that drive all memory hierarchies allow us to use terms that transcend the levels we are talking about. These same principles allow us to pose four questions about any level of the hierarchy:

Q1: Where can a block be placed in the upper level? (Block placement)
Q2: How is a block found if it is in the upper level? (Block identification)
Q3: Which block should be replaced on a miss? (Block replacement)
Q4: What happens on a write? (Write strategy)

These questions will help us gain an understanding of the different tradeoffs demanded by the relationships of memories at different levels of a hierarchy.

Concept of Cache Miss and Cache Hit [figure, reconstructed]:

Line  Tag    Data
0     0117X  35, 72, 55, 30, 64, 23, 16, 14
1     7620X  11, 31, 26, 22, 55, ...
2     3656X  71, 72, 44, 50, ...
3     1741X  33, 35, 07, 65, ...
...
7

A reference to address 01173 matches the tag 0117X (a hit); the word offset 3 then selects the data value 30. An address whose tag matches no directory entry is a miss.

t_eff  : effective cache access time
t_cache: cache access time
t_main : main memory access time
h      : hit ratio

t_eff = h * t_cache + (1 - h) * t_main

Example: let t_cache = 10 ns (1-4 clock cycles), t_main = 50 ns (8-32 clock cycles), and h = 0.95. Then

t_eff = 10 x 0.95 + 50 x 0.05 = 9.5 + 2.5 = 12 ns
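The arithmetic above is easy to check with a short script (Python here, since the slides contain no code; the numbers are the slide's example values, not measurements):

```python
def t_eff(h, t_cache, t_main):
    """Effective access time: t_eff = h*t_cache + (1-h)*t_main."""
    return h * t_cache + (1 - h) * t_main

# Slide's example: t_cache = 10 ns, t_main = 50 ns, h = 0.95
print(t_eff(0.95, 10, 50))  # approximately 12 ns
```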

Hit Ratio
- Must be high enough (say > 90%) to obtain a desirable level of performance
- Small changes in h have an amplified effect on performance
- Never a constant, even for the same machine

Sensitivity of Performance w.r.t. h (hit ratio)

t_eff = h * t_cache + (1 - h) * t_main
      = t_cache [ h + (1 - h) * t_main / t_cache ]
      ~ t_cache [ 1 + (1 - h) * t_main / t_cache ]

Since t_main / t_cache ~ 10, a change in (1 - h) is magnified about 10 times in t_eff. Conclusion: performance is very sensitive to the hit ratio.

Remember: h is close to 1. Example: let h = 0.90; if h improves by 0.05 (0.90 to 0.95), then (1 - h) = 0.05, and t_eff ~ t_cache (1 + 0.05 x 10) = 1.5 t_cache.
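The magnification can be seen numerically. A minimal sketch, assuming the slide's ratio t_main / t_cache = 10 and measuring t_eff in units of t_cache:

```python
def t_eff_ratio(h, r):
    """t_eff / t_cache, with r = t_main / t_cache."""
    return h + (1 - h) * r

r = 10  # slide's assumption: t_main is about 10x slower than t_cache
for h in (0.90, 0.95):
    print(h, t_eff_ratio(h, r))
```

Raising h from 0.90 to 0.95 drops t_eff from 1.9 t_cache to 1.45 t_cache: a 5-point change in hit ratio yields roughly a 24% speedup, which is the sensitivity the slide argues for.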

Basic Terminology
- Cache line (block), the size of a "room": typically 1 ~ 16 words. "A collection of contiguous data that are treated as a single entity of cache storage."
- Cache directory, the "keys" to the rooms: "The portion of a cache that holds the access keys that support associative access." The cache may use associativity to find the right directory entry by matching.

Cache Organization
- Fully associative: a block can be placed in any block frame
- Direct mapped: a block can be placed in only one block frame
- Set-associative: a block can be placed in any frame within one group (set) of frames

An Example
- Memory size = 256K words x 4 B/word = 1 MB
- Cache size = 2K words = 8 KB
- Block size = 16 words/block = 64 B/block

So main memory has 256K / 16 = 16K blocks (16,384), and the cache has 2K / 16 = 128 block frames. The word address is 18 bits (256K = 2^8 x 2^10 words); the byte address is 18 + 2 = 20 bits.
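These counts can be derived mechanically from the three sizes; a short sketch of the bookkeeping (constant names are illustrative, not from the slides):

```python
import math

MEM_BYTES   = 1 << 20      # 1 MB main memory
WORD_BYTES  = 4
CACHE_WORDS = 2 * 1024     # 2K words = 8 KB
BLOCK_WORDS = 16           # 64 B per block

mem_words    = MEM_BYTES // WORD_BYTES      # 256K words
mem_blocks   = mem_words // BLOCK_WORDS     # 16K blocks in memory
cache_blocks = CACHE_WORDS // BLOCK_WORDS   # 128 block frames in cache

word_addr_bits = int(math.log2(mem_words))  # 18-bit word address
byte_addr_bits = word_addr_bits + 2         # 20-bit byte address

print(mem_blocks, cache_blocks, word_addr_bits, byte_addr_bits)
```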

Fully Associative Features
- Any block in memory can be placed in any block frame in the cache
- All entries (block frames) are compared simultaneously (by associative search)

A Special Case
Simplest example: a block = a word, so the entire 18-bit memory word address (bits 0-17) becomes the "tag". An address such as 027560 is matched against every cache tag simultaneously; on a hit, the associated data word is returned. This is very "flexible", and a given word has a higher probability of residing in the cache.
- Advantage: no thrashing (quick reorganization)
- Disadvantage: the overhead of associative search, in both cost and time

[Figure: fully associative cache organization. The cache holds block frames Block 0 ... Block 127, each with a 14-bit tag; main memory holds Block 0 ... Block 16383. Main memory address = 14-bit tag + 4-bit word offset (recall: each block has 16 words, hence 4 bits).]

Direct Mapping
No associative match: the memory address "directly" indexes the one block frame in the cache where the block must reside. A single tag comparison is then used to determine whether the access is a hit or a miss.

Direct Mapping Cont'd
- Advantages: simplest; fast (less logic); low cost (only one comparator is needed, so the cache can be built in the form of standard memory)
- Disadvantage: "thrashing" when blocks contend for the same frame

Direct Mapping Cont'd
Example: since the cache has only 128 (2^7) block frames, the degree of multiplexing is

main memory size / 128 = 16384 blocks / 128 frames = 128 = 2^7 blocks per frame,

i.e., 2^7 memory blocks "fall" into each block frame, and the high-order 7 bits of the block address are used as the tag. Disadvantage: "thrashing" among the blocks addressing the same frame (a set of size 1).

[Figure: direct mapping. Cache: Block 0 ... Block 127, each frame with a 7-bit tag. Main memory: Block 0 ... Block 16383, where blocks 0, 128, 256, ..., 4096, ... map to frame 0, blocks 1, 129, 257, ... to frame 1, and so on. Main memory address = 7-bit tag + 7-bit block index + 4-bit word offset.]

Direct Mapping Cont'd
Mapping (indexing): frame = block address mod (number of blocks in cache), in this case mod 2^7. Advantage: the low-order log2(number of cache blocks) bits of the block address can be used directly for indexing.
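A minimal sketch of a direct-mapped lookup, tracking tags only, with the example's geometry (128 frames, 16 words/block); the function and variable names are illustrative, not from the slides:

```python
NUM_FRAMES  = 128   # 2^7 block frames in the cache
BLOCK_WORDS = 16    # 4-bit word offset within a block

tags = [None] * NUM_FRAMES   # the cache "directory": one tag per frame

def access(word_addr):
    """Return True on a hit; on a miss, load the block (evicting the old one)."""
    block_addr = word_addr // BLOCK_WORDS
    index = block_addr % NUM_FRAMES    # low-order 7 bits of the block address
    tag   = block_addr // NUM_FRAMES   # high-order 7 bits
    if tags[index] == tag:
        return True
    tags[index] = tag                  # miss: replace whatever was in the frame
    return False

print(access(0))     # False: cold miss on block 0 (frame 0)
print(access(5))     # True: same block, still in frame 0
print(access(2048))  # False: block 128 also maps to frame 0 (thrashing)
```

The last access shows the thrashing the slide warns about: blocks 0 and 128 share frame 0, so alternating between them misses every time.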

Set-Associative
A compromise between direct-mapped and fully associative. The cache is divided into S sets (S = 2, 4, 8, ...); S is the number of "buildings" available for indexing. If the cache has M blocks, then there are E = M / S blocks per set. In our example, a 2-way organization gives S = 128 / 2 = 64 sets.

The 6 set bits index to the right set; the 8-bit tag is then used for an associative match within the set. [Figure: 2-way set-associative cache. Cache: Set 0 ... Set 63, each holding 2 of the frames Block 0 ... Block 127, each frame with an 8-bit tag. Main memory: Block 0 ... Block 16383. Main memory address = 8-bit tag + 6-bit set index + 4-bit word offset.]

A 2-way set-associative organization:
Byte address = 8-bit tag | 6-bit set | 4-bit word | 2-bit byte.
The 6 low-order bits of the block address are available for indexing into the right set; the higher-order bits form the tag:

2^14 blocks / 2^6 sets = 2^8 blocks per set of 2 frames,

so 8 bits are used as the tag, and an associative match of these 8 bits against the tags of the 2 blocks in the set is required.
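The field extraction for this 2-way example can be sketched with bit operations on the 18-bit word address (the function name is illustrative):

```python
def split(word_addr):
    """Split an 18-bit word address into (tag, set index, word offset)."""
    word = word_addr & 0xF           # 4-bit word-in-block offset
    s    = (word_addr >> 4) & 0x3F   # 6-bit set index (64 sets)
    tag  = word_addr >> 10           # remaining 8 bits of tag
    return tag, s, word

# Word 7 of block 16383, the last block in memory:
print(split(16383 * 16 + 7))  # (255, 63, 7): all-ones tag and set index
```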

Sector Mapping Cache (IBM 360/85: 16 sectors x 16 blocks/sector)
- One sector = multiple consecutive blocks
- A sector in memory can be placed in any sector in the cache, so the sector number serves as the tag
- On a cache miss, a whole sector is replaced, but only the demanded block is moved in; a valid bit per block records which blocks are present
- Example address layout: sector (tag) | block | word; with the running example's geometry this is 10 + 4 + 4 bits

[Figure: sector mapping cache. Cache: Sector 0 ... Sector 7 of 16 blocks each (Blocks 0-127), with one tag per sector and one valid bit per block. Main memory: Sector 0 ... Sector 1023 (Blocks 0-16383). Main memory address = 10-bit sector (tag) + 4-bit block + 4-bit word.]

Sector mapping cache, cont'd:
- Cache has 128 blocks / 16 blocks/sector = 8 sectors
- Main memory has 16K blocks / 16 blocks/sector = 1K sectors

MIPS Example [figure: address showing bit positions 31 ... 0]. The 32-bit byte address is split into a 20-bit tag (bits 31-12), a 10-bit index (bits 11-2), and a 2-bit byte offset. The index selects one of 1024 cache entries (0 ... 1023), each holding a valid bit, a 20-bit tag, and a 32-bit data word; a hit is signaled when the stored tag matches the address tag and the valid bit is set.

Total # of Bits in a Cache

(# of tag bits + # of block bits + # of valid bits) x number of blocks

For a MIPS example, assuming a directly-mapped cache of 64 KB with one-word (4 B) blocks, i.e. 2^14 blocks:

((32 - 14 - 2) + 32 + 1) x 2^14 = 49 x 2^14 = 784 Kbits ~ 100 KB
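The bookkeeping above can be recomputed directly (variable names are illustrative):

```python
# 64 KB direct-mapped cache, one-word (4 B) blocks, 32-bit byte addresses
num_blocks = 2 ** 14          # 64 KB / 4 B per block
tag_bits   = 32 - 14 - 2      # address bits minus index bits minus byte offset
total_bits = (tag_bits + 32 + 1) * num_blocks   # tag + data word + valid bit

print(total_bits)             # 802816 bits
print(total_bits // 1024)     # 784 Kbits
```

Note that the overhead (tag + valid) adds 17 bits to every 32-bit data word, which is why a "64 KB" cache actually stores roughly 100 KB of state.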