Cache Memory By Tom Austin

What is cache memory? A cache is a collection of duplicate data, where the original data is expensive to fetch or compute. With modern CPU speeds, accessing main memory would become a major performance bottleneck without cache memory.

Analogy for Cache An analogy can be made between cache and main memory, and a desk and a bookshelf. Like a desk, the cache lets you get at things quickly: it can hold only a few books at a time, but they are right at your fingertips, which corresponds to better performance. Like a bookshelf, main memory takes longer to reach, but it can hold far more books. In other words, it has a much greater capacity.

Types of Cache Memory L1 cache, also known as internal cache, is part of the CPU. L2 cache, known as external cache, supplements the internal cache. It is usually not as fast as L1 cache, but has a larger capacity. L3 cache exists on some larger machines, usually on a separate chip between the L2 cache and main memory. Its benefits are seen mostly on extremely large data sets. It is very expensive, and is not used in most PCs today.

Cache Hierarchy Information is read from the L1 cache first. In “inclusive” caches, all of the information in the L1 cache is duplicated in the L2 cache. In “exclusive” caches, no block of memory will be in both caches.

Cache Read Operations If the requested information is available in the cache, it is taken from there. This is known as a “hit”. If it is not in the cache, the information is taken from main memory instead, and the cache is updated with that block from memory. This is a “miss”. The cache is organized into fixed-length “lines”. Each block in main memory can be mapped to one or more lines in the cache.

Evaluating Cache Performance The “hit ratio” is the ratio of hits to total accesses. Hit Ratio = Hits / (Hits + Misses) A well-designed cache can have a hit ratio close to one.
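
As a small illustration of the formula, a cache simulator might keep two counters and compute the ratio from them; the counter names and numbers below are made up for the example.

```c
#include <stdio.h>

/* Minimal sketch: hit-ratio bookkeeping for a cache simulator.
 * The counters are hypothetical; real hardware exposes similar
 * numbers through performance counters. */
int main(void) {
    unsigned long hits = 0, misses = 0;

    /* ... on every access, increment either hits or misses ... */
    hits = 45;   /* example values: 45 hits              */
    misses = 5;  /* and 5 misses out of 50 accesses      */

    double hit_ratio = (double)hits / (double)(hits + misses);
    printf("hit ratio = %.2f\n", hit_ratio);  /* prints 0.90 */
    return 0;
}
```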

Replacement Policies Whenever there is a “miss”, the information must be read from main memory. In addition, the cache is updated with this new information. One line will be replaced with the new block of information. Policies for doing this vary. The three most commonly used are FIFO, LRU, and Random.

FIFO Replacement Policy First in, first out – Replaces the oldest line in the cache, regardless of the last time that line was accessed. The main benefit is that this is easy to implement. The principal drawback is that even a heavily used line is eventually evicted simply because it is old – you may find that you are constantly removing and re-adding the same block of memory.
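
A minimal sketch of how FIFO victim selection might look in a software cache model; the structure and field names are illustrative, not taken from any particular textbook or piece of hardware.

```c
#define NUM_LINES 8

/* One counter advances round-robin: the line filled longest ago is
 * evicted next, no matter how recently it was read. */
struct fifo_cache {
    unsigned long tags[NUM_LINES];
    int valid[NUM_LINES];
    int next_victim;          /* index of the oldest line */
};

/* Pick the line to replace on a miss and advance the pointer. */
int fifo_choose_victim(struct fifo_cache *c) {
    int victim = c->next_victim;
    c->next_victim = (c->next_victim + 1) % NUM_LINES;
    return victim;
}
```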

LRU Replacement Policy Least Recently Used – The line that was accessed least recently is replaced with the new block of data. The benefit is that this tends to keep the most heavily used lines in the cache. The drawback is that it can be difficult and costly to implement, especially if there are many lines to consider.
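
A minimal sketch of LRU bookkeeping under the same kind of software model; the per-line timestamps are an assumption (real hardware often uses cheaper approximations), and the scan over every line is exactly what makes LRU costly when many lines must be considered.

```c
#define NUM_LINES 8

/* One "last used" timestamp per line; a global counter supplies
 * the time.  Field names are illustrative. */
struct lru_cache {
    unsigned long tags[NUM_LINES];
    int valid[NUM_LINES];
    unsigned long last_used[NUM_LINES];
    unsigned long now;            /* incremented on every access */
};

/* Call on a hit so the line becomes the most recently used. */
void lru_touch(struct lru_cache *c, int line) {
    c->last_used[line] = ++c->now;
}

/* On a miss, evict the line with the smallest timestamp,
 * i.e. the one accessed least recently. */
int lru_choose_victim(const struct lru_cache *c) {
    int victim = 0;
    for (int i = 1; i < NUM_LINES; i++)
        if (c->last_used[i] < c->last_used[victim])
            victim = i;
    return victim;
}
```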

Random Replacement Policy With this policy, the line that is replaced is chosen randomly. Performance is close to that of LRU, and the implementation is much simpler.
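
A sketch of random victim selection for the same hypothetical 8-line cache; rand() stands in for whatever simple pseudo-random source the hardware would actually use.

```c
#include <stdlib.h>

#define NUM_LINES 8

/* Random replacement: any line may be chosen as the victim. */
int random_choose_victim(void) {
    return rand() % NUM_LINES;
}
```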

Cache Write Operations If the information to be updated is in the cache, this is known as a “write hit”. If it is not, this is known as a “write miss”, and the information must be updated in main memory.

Write-Hit Policies Write through – Cache and main memory are immediately updated. Write back – Cache is immediately updated. Main memory is not updated until the modified line is evicted, i.e., when it is replaced by a new block read from memory. – Performance is better with this approach, particularly for an application that performs a lot of writes. – An extra bit (called a “dirty bit”) keeps track of which lines in the cache still need to update main memory.
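
A sketch of the two write-hit policies for a single cache line; the struct layout and the write_to_memory() helper are assumptions made for illustration, not a real API.

```c
/* Hypothetical cache line with a dirty bit for write back. */
struct line {
    unsigned long tag;
    unsigned char data[64];
    int valid;
    int dirty;                 /* only meaningful for write back */
};

void write_to_memory(unsigned long addr, unsigned char byte);  /* assumed */

/* Write through: update the line and main memory immediately. */
void write_hit_through(struct line *l, unsigned long addr,
                       int offset, unsigned char byte) {
    l->data[offset] = byte;
    write_to_memory(addr, byte);
}

/* Write back: update only the line and mark it dirty; memory is
 * written later, when the line is evicted and the dirty bit is set. */
void write_hit_back(struct line *l, int offset, unsigned char byte) {
    l->data[offset] = byte;
    l->dirty = 1;
}
```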

Write-Miss Policies Write-allocate – On a write-miss, the updated block of memory is brought into the cache. Write-no-allocate – On a write-miss, the updated block of memory is not brought into the cache; only main memory is updated. This is also known as a “write around” policy.
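
A sketch of how the two write-miss policies differ; the helper functions are assumptions for illustration, not a real API.

```c
/* Hypothetical helpers for a software cache model. */
struct cache;                                                       /* opaque here */
void fetch_block_into_cache(struct cache *c, unsigned long addr);   /* assumed */
void write_into_cache(struct cache *c, unsigned long addr,
                      unsigned char byte);                          /* assumed */
void write_to_memory(unsigned long addr, unsigned char byte);       /* assumed */

void handle_write_miss(struct cache *c, unsigned long addr,
                       unsigned char byte, int write_allocate) {
    if (write_allocate) {
        /* Write-allocate: bring the block into the cache, then write
         * to it; the write-hit policy takes over from there. */
        fetch_block_into_cache(c, addr);
        write_into_cache(c, addr, byte);
    } else {
        /* Write-no-allocate ("write around"): only main memory is
         * updated; the cache is left untouched. */
        write_to_memory(addr, byte);
    }
}
```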

Multi-Processor Issues with Write Operations On multi-processor machines, each processor usually has its own cache. When one processor updates data, the copies in the other caches may become “stale”, regardless of the write policy. One solution is to cache only non-shareable memory. Another solution is to update all caches on any write operation.

Memory to Cache Mapping Schemes There are a variety of ways to map blocks of memory to lines in the cache. We will discuss the following schemes: Associative mapping Direct mapping Set-associative mapping

Associative Mapping Also known as fully associative mapping. Any block of memory can be mapped to any line in the cache. The block number is stored with the line as a tag, and a bit indicates whether the line is valid or not. This is difficult to implement: associative memory is required, which is very expensive.
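
A software sketch of a fully associative lookup, assuming 64-byte lines and 256 lines (both sizes are illustrative). The linear tag scan below is what the hardware must do for every line in parallel, which is why associative memory is expensive.

```c
#include <stdint.h>

#define LINE_SIZE 64
#define NUM_LINES 256

struct fa_cache {
    uint64_t tags[NUM_LINES];
    int      valid[NUM_LINES];
};

/* The block may sit in any line, so every tag has to be compared. */
int fa_lookup(const struct fa_cache *c, uint64_t addr) {
    uint64_t tag = addr / LINE_SIZE;    /* whole block number is the tag */
    for (int i = 0; i < NUM_LINES; i++)
        if (c->valid[i] && c->tags[i] == tag)
            return 1;                   /* hit */
    return 0;                           /* miss */
}
```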

Direct Mapping Every block in main memory can be loaded into one and only one line in the cache. Each line in the cache has a tag to indicate which block of main memory it currently holds. This is easy to implement, but inflexible. Care must be taken in how blocks are assigned to lines, since two frequently used blocks that map to the same line will keep evicting each other.
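
A minimal sketch of a direct-mapped lookup, assuming 64-byte lines and 256 lines so that the address splits into offset, index, and tag fields; the names and sizes are illustrative.

```c
#include <stdint.h>
#include <string.h>

#define LINE_SIZE   64          /* bytes per line  -> offset bits */
#define NUM_LINES   256         /* lines in cache  -> index bits  */

struct dm_cache {
    uint64_t tags[NUM_LINES];
    int      valid[NUM_LINES];
    uint8_t  data[NUM_LINES][LINE_SIZE];
};

/* Return 1 on a hit, 0 on a miss.  Each block can live in exactly
 * one line, so only that one tag has to be compared. */
int dm_lookup(const struct dm_cache *c, uint64_t addr) {
    uint64_t block = addr / LINE_SIZE;       /* drop the offset bits     */
    uint64_t index = block % NUM_LINES;      /* the one line it may use  */
    uint64_t tag   = block / NUM_LINES;      /* the rest identifies it   */

    return c->valid[index] && c->tags[index] == tag;
}

/* On a miss, the block fetched from memory overwrites whatever was
 * in that line before. */
void dm_fill(struct dm_cache *c, uint64_t addr, const uint8_t *block_data) {
    uint64_t block = addr / LINE_SIZE;
    uint64_t index = block % NUM_LINES;
    c->tags[index]  = block / NUM_LINES;
    c->valid[index] = 1;
    memcpy(c->data[index], block_data, LINE_SIZE);
}
```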

Set-Associative Mapping Blocks can be mapped to a subset of the lines in the cache. These mappings are usually either “two-way” or “four-way”, meaning that a block can be mapped to either 2 or 4 lines in the cache. LRU is often used as the replacement policy within a set; because only a small number of lines must be considered, it becomes feasible. This is more flexible than direct mapping, but easier to implement than associative mapping.
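
A sketch of a four-way set-associative lookup with LRU bookkeeping per set, assuming 64 sets of 64-byte lines (sizes and names are illustrative). Only the four ways of one set have to be searched.

```c
#include <stdint.h>

#define LINE_SIZE  64
#define NUM_SETS   64           /* sets in the cache            */
#define WAYS        4           /* four-way set associative     */

struct sa_cache {
    uint64_t tags[NUM_SETS][WAYS];
    int      valid[NUM_SETS][WAYS];
    uint64_t last_used[NUM_SETS][WAYS];   /* for LRU within the set */
    uint64_t now;
};

/* A block maps to exactly one set, but may sit in any of that
 * set's WAYS lines, so only WAYS tags are compared. */
int sa_lookup(struct sa_cache *c, uint64_t addr) {
    uint64_t block = addr / LINE_SIZE;
    uint64_t set   = block % NUM_SETS;
    uint64_t tag   = block / NUM_SETS;

    for (int way = 0; way < WAYS; way++) {
        if (c->valid[set][way] && c->tags[set][way] == tag) {
            c->last_used[set][way] = ++c->now;   /* LRU bookkeeping */
            return 1;                            /* hit */
        }
    }
    return 0;                                    /* miss */
}
```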

Another way to look at associativity… Direct mapping can be thought of as one end of a spectrum, where a block can be written to only one line in the cache. Set-associative mapping is the middle ground, where a block can be stored in a subset of the lines, but not in just any line. (Fully) associative mapping is the opposite extreme – a block can be written to any line.

Associativity Illustration

Associativity tradeoff Checking more lines requires more power and area. However, caches with more associativity suffer fewer misses. A rule of thumb is that doubling the associativity has about the same effect on hit rate as doubling the cache size. This rule breaks down beyond 4-way set-associative mappings.
