CACHE MEMORY CS 147 October 2, 2008 Sampriya Chandra

LOCALITY PRINCIPLE OF LOCALITY is the tendency of programs to reference data items that are near other recently referenced data items, or that were recently referenced themselves. TEMPORAL LOCALITY : a memory location that is referenced once is likely to be referenced again multiple times in the near future. SPATIAL LOCALITY : once a memory location is referenced, the program is likely to reference a nearby memory location in the near future.
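
A small illustration of spatial locality (a sketch added here, not part of the original slides): C stores a 2-D array row by row, so the row-major loop below reads consecutive addresses and uses each cache block fully, while the column-major loop jumps a whole row between accesses and touches a different block almost every time.

/* Illustration (not from the slides): summing a matrix in row-major
 * order exhibits good spatial locality; the column-major loop does not. */
#define N 1024
static int a[N][N];

long sum_row_major(void) {          /* good spatial locality */
    long sum = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];         /* consecutive addresses */
    return sum;
}

long sum_col_major(void) {          /* poor spatial locality */
    long sum = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += a[i][j];         /* stride of N ints per access */
    return sum;
}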

CACHE MEMORY The principle of locality makes it possible to speed up main memory access by introducing small, fast memories known as CACHE MEMORIES that hold blocks of the most recently referenced instructions and data items. A cache is a small, fast storage device that holds the operands and instructions most likely to be used by the CPU.

Memory Hierarchy of early computers: 3 levels – CPU registers – DRAM Memory – Disk storage

Due to the increasing gap between CPU and main memory speeds, a small SRAM memory called the L1 cache was inserted between them. An L1 cache can be accessed almost as fast as the registers, typically in 1 or 2 clock cycles. As the gap grew even further, an additional cache, the L2 cache, was inserted between the L1 cache and main memory; it takes more clock cycles to access than L1, but far fewer than main memory.

The L2 cache is attached either to the memory bus or to its own cache bus. Some high-performance systems also include an additional L3 cache, which sits between L2 and main memory. The arrangement differs, but the principle is the same: each cache is placed both physically closer and logically closer to the CPU than the main memory.

CACHE LINES / BLOCKS Cache memory is subdivided into cache lines. Cache Line / Block: the smallest unit of memory that can be transferred between the main memory and the cache.

TAG / INDEX Every cache address field consists of two primary parts: a dynamic part (the tag), which contains the higher-order address bits, and a static part (the index), which contains the lower-order address bits. The tag stored in a line changes at run time as different blocks are cached there, while the index is fixed: it selects which line (or set) of the cache the address maps to.
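
A minimal sketch (added here; the field widths are illustrative assumptions, not from the slides) of how an address splits into tag, index, and block-offset fields for a cache with 2^INDEX_BITS lines of 2^OFFSET_BITS bytes each:

/* Sketch: splitting a 32-bit address into offset, index, and tag.
 * The widths below are assumptions (64-byte lines, 256 lines). */
#include <stdint.h>

#define OFFSET_BITS 6                 /* 64-byte cache line  */
#define INDEX_BITS  8                 /* 256 lines (or sets) */

static inline uint32_t addr_offset(uint32_t addr) {
    return addr & ((1u << OFFSET_BITS) - 1);
}
static inline uint32_t addr_index(uint32_t addr) {
    return (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
}
static inline uint32_t addr_tag(uint32_t addr) {
    return addr >> (OFFSET_BITS + INDEX_BITS);
}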

VALID BIT / DIRTY BIT When a program is first loaded into main memory, the cache is cleared, and so while a program is executing, a valid bit is needed to indicate whether or not the slot holds a line that belongs to the program being executed. There is also a dirty bit that keeps track of whether or not a line has been modified while it is in the cache. A slot that is modified must be written back to the main memory before the slot is reused for another line.
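
One common way to picture this bookkeeping (a sketch under assumed field widths, not taken from the slides) is a per-line record holding the tag together with the valid and dirty bits:

/* Sketch of the state kept with each cache line (field sizes are
 * illustrative assumptions). */
#include <stdint.h>

#define LINE_SIZE 64

struct cache_line {
    uint32_t tag;               /* identifies which memory block is held      */
    uint8_t  valid;             /* 1 once the line holds data belonging to    */
                                /* the running program, 0 after a cache clear */
    uint8_t  dirty;             /* 1 if the line was modified and must be     */
                                /* written back to main memory before reuse   */
    uint8_t  data[LINE_SIZE];   /* the cached block itself                    */
};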

Example: Think of main memory as divided into segments that are exactly the same size as the cache, with every memory segment containing N equally sized memory lines, and memory lines exactly the same size as cache lines. To form the address of a memory line, first determine the number of its memory segment, then the number of the line inside that segment, and concatenate the two numbers. Substitute the segment number with the tag and the line number with the index, and you have the general idea.

Therefore, a cache line's tag size depends on 3 factors: size of cache memory; associativity of cache memory; cacheable range of operating memory. Here, Stag = log2(Smemory × A / Scache), where Stag — size of cache tag, in bits; Smemory — cacheable range of operating memory, in bytes; Scache — size of cache memory, in bytes; A — associativity of cache memory, in ways.
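
A small numeric check of the formula (a sketch added here; the cache parameters are assumed for illustration): for a 4 GiB cacheable range, a 64 KiB cache, and 4-way associativity, Stag = log2(2^32 × 4 / 2^16) = 18 bits.

/* Sketch: computing tag size from the formula above, with assumed
 * parameters (4 GiB cacheable range, 64 KiB cache, 4-way). */
#include <stdio.h>

static unsigned log2u(unsigned long long x) {   /* x assumed a power of two */
    unsigned n = 0;
    while (x > 1) { x >>= 1; n++; }
    return n;
}

int main(void) {
    unsigned long long s_memory = 1ULL << 32;   /* 4 GiB  */
    unsigned long long s_cache  = 64 * 1024;    /* 64 KiB */
    unsigned           a        = 4;            /* 4-way  */
    unsigned s_tag = log2u(s_memory * a / s_cache);
    printf("tag size = %u bits\n", s_tag);      /* prints 18 */
    return 0;
}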

CACHE HITS / MISSES Cache Hit: a request to read from memory that can be satisfied from the cache without using the main memory. Cache Miss: a request to read from memory that cannot be satisfied from the cache, for which the main memory has to be consulted.

CACHE MEMORY : PLACEMENT POLICY There are three commonly used methods to translate main memory addresses to cache memory addresses: Associative Mapped Cache, Direct-Mapped Cache, and Set-Associative Mapped Cache. The choice of cache mapping scheme affects cost and performance, and there is no single best method that is appropriate for all situations.

Associative Mapping A block in Main Memory can be mapped to any available (not already occupied) block in Cache Memory. Advantage: flexibility. A Main Memory block can be mapped anywhere in Cache Memory. Disadvantage: slow or expensive. A search through all the Cache Memory tags is needed to check whether the address matches any of them.

Direct Mapping To avoid the search through all CM blocks needed by associative mapping, this method allows only (# blocks in main memory) / (# blocks in cache memory) Main Memory blocks to be mapped to each Cache Memory block.

Advantage: Direct mapping is faster than associative mapping, as it avoids searching through all the CM tags for a match. Disadvantage: it lacks mapping flexibility. For example, if two MM blocks mapped to the same CM block are needed repeatedly (e.g., in a loop), they will keep replacing each other, even though all other CM blocks may be available.
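
A minimal sketch of a direct-mapped lookup (added for illustration; it reuses the address-splitting helpers and cache_line structure assumed in the earlier sketches):

/* Sketch of a direct-mapped cache lookup: the index selects exactly
 * one line, and a hit requires a valid line with a matching tag. */
#include <stdint.h>
#include <stdbool.h>

#define NUM_LINES (1u << INDEX_BITS)

static struct cache_line cache[NUM_LINES];

bool direct_mapped_hit(uint32_t addr) {
    struct cache_line *line = &cache[addr_index(addr)]; /* only one candidate line */
    return line->valid && line->tag == addr_tag(addr);  /* hit iff the tag matches */
}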

Set-Associative Mapping This is a trade-off between associative and direct mappings where each address is mapped to a certain set of cache locations. The cache is broken into sets where each set contains "N" cache lines, let's say 4. Then, each memory address is assigned a set, and can be cached in any one of those 4 locations within the set that it is assigned to. In other words, within each set the cache is associative, and thus the name.

DIFFERENCE BETWEEN LINES, SETS AND BLOCKS In direct-mapped caches, sets and lines are equivalent. However, in associative caches, sets and lines are very different things and the terms cannot be interchanged.

BLOCK: a fixed-sized packet of information that moves back and forth between a cache and main memory. LINE: a container in a cache that stores a block as well as other information such as the valid bit and tag bits. SET: a collection of one or more lines. Sets in direct-mapped caches consist of a single line; sets in fully associative and set-associative caches consist of multiple lines.

ASSOCIATIVITY Associativity: an N-way set-associative cache memory means that information stored at some address in operating memory can be placed (cached) in any of N locations (lines) of this cache memory. The basic principle of logical segmentation says that within any particular segment there is only one line capable of caching information located at a given memory address.
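
A minimal sketch of an N-way set-associative lookup (illustrative, building on the earlier sketches; the number of ways is an assumed parameter): the index selects a set, and all lines in that set are checked for a matching tag.

/* Sketch: set-associative lookup.  WAYS and the set count are
 * illustrative assumptions. */
#include <stdint.h>
#include <stdbool.h>

#define WAYS      4
#define NUM_SETS  (1u << INDEX_BITS)

static struct cache_line sets[NUM_SETS][WAYS];

bool set_associative_hit(uint32_t addr) {
    struct cache_line *set = sets[addr_index(addr)];
    for (unsigned w = 0; w < WAYS; w++)                  /* search within the set only */
        if (set[w].valid && set[w].tag == addr_tag(addr))
            return true;
    return false;
}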

REPLACEMENT ALGORITHM Optimal Replacement: replace the block which is no longer needed in the future. If all blocks currently in Cache Memory will be used again, replace the one which will not be used in the future for the longest time. Random selection: replace a randomly selected block among all blocks currently in Cache Memory.

FIFO (first-in first-out): replace the block that has been in Cache Memory for the longest time. LRU (Least recently used): replace the block in Cache Memory that has not been used for the longest time. LFU (Least frequently used): replace the block in Cache Memory that has been used for the least number of times.

Optimal replacement is the best but is not realistic, because when a block will be needed in the future is usually not known ahead of time. LRU is suboptimal but approximates the optimal policy well, based on the temporal locality of reference, i.e., memory items that were recently referenced are more likely to be referenced soon than those which have not been referenced for a longer time. FIFO is not necessarily consistent with LRU and therefore is usually not as good. Random selection, surprisingly, is not necessarily bad.
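
A minimal sketch of LRU victim selection within one set (illustrative; the age-counter bookkeeping is an assumption, one of several ways LRU can be tracked, and real caches often use cheaper approximations):

/* Sketch: LRU victim selection within one set, using a per-line
 * counter recording when the line was last used. */
#include <stdint.h>

struct lru_line {
    uint32_t tag;
    uint32_t last_used;     /* value of a global access counter at last use */
    uint8_t  valid;
};

/* Pick the way to evict: an invalid line if any, otherwise the line
 * whose last_used counter is smallest (least recently used). */
unsigned choose_victim(struct lru_line *set, unsigned ways) {
    unsigned victim = 0;
    for (unsigned w = 0; w < ways; w++) {
        if (!set[w].valid)
            return w;                               /* free line: use it directly */
        if (set[w].last_used < set[victim].last_used)
            victim = w;
    }
    return victim;
}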

HIT RATIO and EFFECTIVE ACCESS TIMES Hit Ratio: the fraction of all memory reads that are satisfied from the cache.
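
As a worked illustration (added here; the numbers are assumptions, not from the slides), the effective access time can be estimated as a weighted average of the cache and main memory access times using the hit ratio h:

/* Sketch: effective access time.  The times and hit ratio are
 * illustrative assumptions; this uses one common formulation in
 * which a miss pays the cache lookup plus the memory access. */
#include <stdio.h>

int main(void) {
    double h       = 0.95;   /* hit ratio                   */
    double t_cache = 2.0;    /* cache access time, ns       */
    double t_mem   = 100.0;  /* main memory access time, ns */

    /* effective access time = h * t_cache + (1 - h) * (t_cache + t_mem) */
    double eat = h * t_cache + (1.0 - h) * (t_cache + t_mem);
    printf("effective access time = %.1f ns\n", eat);   /* 7.0 ns */
    return 0;
}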

LOAD-THROUGH STORE-THROUGH Load-Through: when the CPU needs to read a word from memory, the block containing the word is brought from MM to CM, while at the same time the word is forwarded to the CPU. Store-Through: if store-through is used, a word to be stored from the CPU to memory is written to both CM (if the word is in there) and MM. By doing so, a CM block to be replaced can be overwritten by an incoming block without being saved to MM.

WRITE METHODS Note: So far, words in a cache have been viewed simply as copies of words from main memory that are read from the cache to provide faster access. This viewpoint changes once writes are considered. There are 3 possible write actions: –Write the result into the main memory –Write the result into the cache –Write the result into both main memory and cache memory

Write Through: a cache architecture in which data is written to main memory at the same time as it is cached. Write Back / Copy Back: the CPU writes only to the cache on a cache hit; the modified line is marked dirty and written back to main memory later, when it is replaced. On a cache miss, the write can go directly to main memory (see the write-allocation policies below).

On a write miss: Write Allocate: load the memory block into the cache and update the cache block. No-Write Allocate: bypass the cache and write the word directly into main memory.
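
A compact sketch of how these policies combine on a CPU write (illustrative; it reuses the lookup sketched earlier, and helpers such as fetch_block_into_cache(), write_word_to_cache(), mark_line_dirty(), and write_word_to_memory() are assumed, not real APIs). Pairing write-back with write-allocate and write-through with no-write-allocate is the common combination, not the only one.

/* Sketch: two common write-policy combinations. */
#include <stdint.h>
#include <stdbool.h>

void cpu_write_back_allocate(uint32_t addr, uint32_t value) {
    if (!set_associative_hit(addr))
        fetch_block_into_cache(addr);       /* write allocate: bring the block in */
    write_word_to_cache(addr, value);       /* write only to the cache ...        */
    mark_line_dirty(addr);                  /* ... memory is updated on eviction  */
}

void cpu_write_through_no_allocate(uint32_t addr, uint32_t value) {
    if (set_associative_hit(addr))
        write_word_to_cache(addr, value);   /* keep the cached copy current */
    write_word_to_memory(addr, value);      /* always update main memory    */
}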

CACHE CONFLICT A sequence of accesses to memory repeatedly overwriting the same cache entry. This can happen if two blocks of data, which are mapped to the same set of cache locations, are needed simultaneously.

EXAMPLE: In the case of a direct mapped cache, if arrays A, B, and C map to the same range of cache locations, thrashing will occur when the following loop is executed: for (i=1; i<n; i++) C[i] = A[i] + B[i]; Cache conflict can also occur between a program loop and the data it is accessing.

CACHE COHERENCY The synchronization of data in multiple caches such that reading a memory location via any cache will return the most recent data written to that location via any (other) cache. Some parallel processors do not cache accesses to shared memory to avoid the issue of cache coherency.

If caches are used with shared memory then some system is required to detect when data in one processor's cache should be discarded or replaced because another processor has updated that memory location. Several such schemes have been devised.

REFERENCES Computer Architecture and Organization: An Integrated Approach by Miles J. Murdocca and Vincent P. Heuring; Logic and Computer Design Fundamentals by M. Morris Mano and Charles R. Kime; Computer Systems: A Programmer's Perspective by Randal E. Bryant and David R. O'Hallaron. Websites: