Chapter 9 Memory Organization By Nguyen Chau

Topics
- Hierarchical memory systems
- Cache memory
- Associative memory
- Cache memory with associative mapping
- Cache memory with direct mapping
- Cache memory with set-associative mapping
- Replacing data in the cache
- Writing data to the cache
- Cache performance

Hierarchical memory systems
A computer system is not constructed using a single type of memory; several types are used, for example:
- Level 1 cache (L1 cache)
- Level 2 cache (L2 cache)
- Physical memory
- Virtual memory
The most well-known element of the memory subsystem is physical memory, which is constructed using dynamic random access memory (DRAM) chips.

(Figure: generic memory hierarchy — CPU with L1 cache → L2 cache → physical memory → virtual memory storage.)

Cache Memory
Cache memory is constructed using static RAM (SRAM) chips. Its goal is to minimize the processor's memory access time. A fast microprocessor with a clock frequency over 500 MHz has a clock period of less than 2 ns. A fast DRAM has an access time about 30 times longer, around 60 ns (30 × 2 ns). A computer with no cache memory would therefore spend most of its time waiting for data; this is why cache memory, with an access time of about 10 ns, is needed.

Associative Memory
Cache memory can be constructed using either SRAM or associative memory (content-addressable memory). Associative memory is accessed differently from other RAM: to find data, it searches all of its locations in parallel and marks the locations that match the specified data input. The matching data are then read out sequentially.

Associative Memory cont.
Consider a simple associative memory consisting of eight words, each with 16 bits. Each word has one additional bit labeled v, the valid bit: 1 indicates valid data, 0 indicates invalid data. (Figure: the memory array with its valid bits, plus the data register, mask register, match register, and output register, with data, read, and write signals.)

Associative Memory cont.
Example: to access data in the associative memory that has 1010 as its four high-order bits:
- The CPU loads the value 1111 0000 0000 0000 into the mask register: each bit that is to be checked, regardless of its value, is set to 1; all the other bits are set to 0.
- The CPU also loads the value 1010 xxxx xxxx xxxx into the data register: the four leading bits are to be matched and the rest can be anything.
- A location matches if its valid bit is set to 1 and its word agrees with the data register in every bit position that has a value of 1 in the mask register; its match register bit is then set to 1, otherwise it is set to 0.
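
As a rough software model of this matching operation (the hardware compares all words in parallel; the names AssocMem and assoc_match and the sample contents below are illustrative, not from the slides), a minimal sketch in C:

```c
#include <stdint.h>
#include <stdio.h>

#define WORDS 8  /* eight 16-bit words, as in the example above */

typedef struct {
    uint16_t word[WORDS];   /* stored data words */
    uint8_t  valid[WORDS];  /* the valid bit v for each word */
} AssocMem;

/* Build the match register: bit i is 1 when word i is valid and agrees
   with `data` in every bit position where `mask` is 1. The loop models
   sequentially what the hardware does for all words in parallel. */
uint8_t assoc_match(const AssocMem *m, uint16_t data, uint16_t mask) {
    uint8_t match = 0;
    for (int i = 0; i < WORDS; i++)
        if (m->valid[i] && ((m->word[i] ^ data) & mask) == 0)
            match |= (uint8_t)(1u << i);
    return match;
}

int main(void) {
    AssocMem m = { .word  = { 0xA123, 0x5000, 0xAFFF, 0x0000,
                              0xA000, 0x1234, 0xABCD, 0xAAAA },
                   .valid = { 1, 1, 1, 1, 0, 1, 1, 1 } };
    /* Check the four high-order bits against 1010:
       mask = 1111 0000 0000 0000, data = 1010 xxxx xxxx xxxx. */
    printf("match register = 0x%02X\n", assoc_match(&m, 0xA000, 0xF000));
    return 0;  /* prints 0xC5: words 0, 2, 6, 7 match; word 4 is not valid */
}
```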

Associative Memory cont.
Writing data to associative memory is simple:
- The CPU supplies the data to the data register and asserts the write signal.
- The associative memory checks its valid bits, looking for a location whose valid bit is 0.
- If it finds one, it stores the data in that location.
- If it finds none, it must clear out a location before it can store the data.

Cache Memory with Associative Mapping
Associative memory can be used to construct a cache with associative mapping, or an associative cache. Consider an associative cache built from associative memory that is 24 bits wide: the first 16 bits hold the memory address, and the last 8 bits hold the data stored at that address in physical memory. It works just like the associative memory described earlier. (Figure: associative cache for a 64K × 8-bit memory system, with a 16-bit address field, an 8-bit data field, and a valid bit per entry, plus data, mask, match, and output registers.)

Cache Memory with Direct Mapping
Associative memory is much more expensive than SRAM; this is where direct mapping comes in. Direct mapping is a cache mapping scheme that uses standard SRAM, so the cache can be much larger than an associative cache and cost less. To illustrate, consider a 1K cache for the Relatively Simple (R.S.) CPU. Since the cache is 1K, the 10 low-order address bits (the index) select one specific location in the cache. As in the associative cache, each location contains a valid bit to denote whether or not it holds valid data. In addition, a tag field contains the high-order bits of the original address that are not part of the index; here the six high-order bits are stored in the tag field. Last, the cached data value is stored as the value. (Figure: direct-mapped cache for the R.S. CPU; A[9..0] supplies the index, A[15..10] the tag; each entry has tag, data, and valid fields.)
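
A minimal sketch of this lookup in C (DirectCache, cache_read, and the field layout are illustrative names; the tag/index split follows the slide, A[9..0] index and A[15..10] tag):

```c
#include <stdint.h>
#include <stdbool.h>

#define CACHE_SIZE 1024  /* 1K cache, as above */

typedef struct {
    bool    valid[CACHE_SIZE];
    uint8_t tag[CACHE_SIZE];   /* six high-order address bits A[15..10] */
    uint8_t data[CACHE_SIZE];  /* cached data value */
} DirectCache;

/* Returns true on a cache hit and places the cached value in *out. */
bool cache_read(const DirectCache *c, uint16_t addr, uint8_t *out) {
    uint16_t index = addr & 0x3FF;          /* A[9..0] selects the location */
    uint8_t  tag   = (uint8_t)(addr >> 10); /* A[15..10] must match the tag */
    if (c->valid[index] && c->tag[index] == tag) {
        *out = c->data[index];
        return true;   /* hit: data served from the cache */
    }
    return false;      /* miss: fetch from physical memory instead */
}
```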

Cache Memory with Direct Mapping cont.
Example: consider a word of physical memory. Its data can be stored in only one location in the cache: the location selected by the 10 low-order bits of its address (the index). However, any address of the form xxxx xx followed by those same 10 index bits maps to this same cache location; this is the purpose of the tag field. The tag stored with the entry holds the six high-order bits of the original address, so the data at a cache location can be traced back to exactly one physical memory address (the tag concatenated with the index). Finally, if the valid bit is 0, none of this applies, because the data in that location is not valid.

Cache Memory with Direct Mapping cont.
Problem with direct mapping: although a direct-mapped cache is much less expensive than an associative cache, it is also much less flexible. In an associative cache, any word of physical memory can occupy any word of cache; in a direct-mapped cache, each word of physical memory can be mapped to only one specific location. This is a problem for certain programs: for example, if a program alternates between two addresses that share the same 10 low-order bits, each access evicts the other's data and every access misses. A good compiler will allocate code so this does not happen, but it illustrates a problem that can occur due to the inflexibility of direct mapping. Set-associative mapping seeks to alleviate this problem while keeping the strengths of the direct mapping method.

Cache Memory with Set-Associative Mapping
A set-associative cache makes use of relatively low-cost SRAM while trying to alleviate the problem of overwriting data inherent to direct mapping. It is organized just like a direct-mapped cache, except that each cache location can contain more than one data value. A cache in which each location can contain n bytes or words of data is called an n-way set-associative cache.

Cache Memory with Set-Associative Mapping
Consider a 1K, two-way set-associative cache for the R.S. CPU. Each location contains two groups of fields, one for each way of the cache. The tag field is the same as in the direct-mapped cache except that it is 1 bit longer: since the cache holds 1K data entries and each location holds 2 data values, there are 512 locations in total, so a 9-bit index selects the cache location and the remaining 7 bits specify the tag value. As before, the data field contains the data from the physical memory location. The count/valid field serves two purposes:
(1) One bit of this field is a valid bit, just as in the other cache mapping schemes.
(2) The count value is used to keep track of when the data was accessed; this information determines which piece of data is replaced when a new value is loaded into the cache.

(Figure: two-way set-associative cache for the Relatively Simple CPU; A[8..0] supplies the 9-bit index, A[15..9] the 7-bit tag; each set holds two entries, each with tag, data, and count/valid fields.)
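
A minimal sketch of a lookup in this cache (SetAssocCache and the simple two-way count handling are illustrative; the 9-bit index / 7-bit tag split follows the figure):

```c
#include <stdint.h>
#include <stdbool.h>

#define SETS 512  /* 1K entries / 2 ways */
#define WAYS 2

typedef struct {
    bool    valid;
    uint8_t count;  /* tracks which way was least recently used */
    uint8_t tag;    /* A[15..9] */
    uint8_t data;
} Entry;

typedef struct {
    Entry set[SETS][WAYS];
} SetAssocCache;

/* Returns true on a hit; updates the counts so the other way becomes
   the replacement candidate (LRU between the two ways). */
bool cache_read(SetAssocCache *c, uint16_t addr, uint8_t *out) {
    uint16_t index = addr & 0x1FF;          /* A[8..0] selects the set */
    uint8_t  tag   = (uint8_t)(addr >> 9);  /* A[15..9] */
    for (int w = 0; w < WAYS; w++) {
        Entry *e = &c->set[index][w];
        if (e->valid && e->tag == tag) {
            e->count = 0;                   /* most recently used */
            c->set[index][1 - w].count = 1; /* other way is now LRU */
            *out = e->data;
            return true;
        }
    }
    return false;  /* miss: load from memory into the way with count 1 */
}
```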

Replacing Data in the Cache
When a computer is powered up, it performs several functions necessary to ensure its proper operation. Among those tasks, it must initialize its cache by setting the valid bits to 0. As the computer executes a program, it fetches instructions and data from memory and loads them into the cache. This works well while the cache is empty or sparsely populated, but eventually the computer must move data into cache locations that are already occupied. The problem is then to decide which data to move out of the cache and how to preserve that data in physical memory. Direct mapping offers the easiest solution, since each memory address can map to only one cache location. An associative cache, which allows any location in physical memory to be mapped to any location in cache, does not have to move data out of the cache and back into physical memory unless it has no location left without valid data.

Replacing Data in the Cache cont.
There are many replacement methods that can be used to do this. Here are a few popular methods that are used frequently:
- FIFO (First In, First Out)
- LRU (Least Recently Used)
- Random

FIFO (First In, First Out):
- This strategy fills the associative memory from its top location to its bottom location.
- When it copies data to the last location, the cache is full; it then goes back to the top location, replacing its data with the next value to be stored.
- This mechanism always replaces, among all the data in the cache at that time, the data that was loaded into the cache first.
- It requires nothing more than a register holding a pointer to the next location to be replaced (a minimal sketch follows).
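
A minimal sketch of FIFO replacement for a small associative cache (FifoCache and fifo_store are illustrative names); the pointer register is the only bookkeeping required:

```c
#include <stdint.h>

#define CACHE_WORDS 8

typedef struct {
    uint16_t addr[CACHE_WORDS];
    uint8_t  data[CACHE_WORDS];
    int      next;  /* pointer register: next location to replace */
} FifoCache;

/* Store a new (address, data) pair, always replacing the entry that
   was loaded first; wrap back to the top after the last location. */
void fifo_store(FifoCache *c, uint16_t addr, uint8_t data) {
    c->addr[c->next] = addr;
    c->data[c->next] = data;
    c->next = (c->next + 1) % CACHE_WORDS;
}
```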

Replacing Data in the Cache cont.
LRU (Least Recently Used):
- The LRU method keeps track of the relative order in which each location is accessed and replaces the least recently used value with the new data.
- This requires a counter for each location in the cache, so it is generally not used with associative caches; however, it is used frequently with set-associative caches.
Random:
- As the name implies, this method randomly selects a location to use for the new data.
- In spite of the lack of logic in its selection of location, this replacement method produces good performance, close to that of the FIFO method.

Writing Data to the Cache
There are two methods: write-through and write-back.
Write-through: every time a value is written from the CPU into a location in the cache, it is also written into the corresponding location in physical memory. This guarantees that physical memory always contains the correct value, but it requires additional time for the writes to physical memory.

Writing Data to the Cache cont.
Write-back: a value written to the cache is not always written to physical memory; it is written to physical memory only once, when the data is removed from the cache. This saves the time write-through caches spend copying data to physical memory, but it also introduces a time frame during which physical memory holds invalid data.
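
A minimal sketch contrasting the two policies on a write hit. The slides do not name it, but write-back needs some way to know which entries must be copied out on removal; the dirty flag below is that assumed mechanism, and phys_mem and the Line layout are illustrative:

```c
#include <stdint.h>
#include <stdbool.h>

static uint8_t phys_mem[65536];  /* stand-in for physical memory */

typedef struct {
    bool    valid, dirty;
    uint8_t tag, data;
} Line;

/* Write-through: update the cache and physical memory on every write,
   so memory always holds the correct value. */
void write_through(Line *line, uint16_t addr, uint8_t value) {
    line->data = value;
    phys_mem[addr] = value;
}

/* Write-back: update only the cache and mark the line dirty; memory
   temporarily holds a stale value. */
void write_back(Line *line, uint8_t value) {
    line->data = value;
    line->dirty = true;
}

/* When a write-back line is removed from the cache, its value is
   written to physical memory once (old_addr supplied by the caller). */
void evict(Line *line, uint16_t old_addr) {
    if (line->valid && line->dirty)
        phys_mem[old_addr] = line->data;
    line->valid = line->dirty = false;
}
```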

Writing Data to the Cache cont.
Example: consider a simple program loop:
  for i = 1 to 1000 do
    x = x + i;
During the loop, the CPU writes a value to x 1000 times. With write-through, each of those writes goes to physical memory; with write-back, the result is written to physical memory only one time instead of 1000 times. As a result, write-back offers a significant time savings.

Writing Data to the Cache cont.
However, performance is not the only consideration; sometimes the currency of data takes precedence. Another situation that must be addressed is how to write data to a location not currently loaded into the cache; this is called a write miss. One possibility is to load the location into the cache and then write the new value to the cache using either the write-back or write-through method; this is called the write-allocate policy. The alternative, the write-no-allocate policy, updates the value in physical memory without loading it into the cache.
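
A minimal sketch of the two write-miss policies, under the same illustrative phys_mem/Line assumptions as above (write-through is shown inside write-allocate for brevity):

```c
#include <stdint.h>
#include <stdbool.h>

static uint8_t phys_mem[65536];  /* stand-in for physical memory */

typedef struct {
    bool    valid, dirty;
    uint8_t tag, data;
} Line;

/* Write-allocate: bring the missed location into the cache, then
   perform the write there (write-through variant shown). */
void write_allocate(Line *line, uint16_t addr, uint8_t value) {
    line->tag   = (uint8_t)(addr >> 10);  /* illustrative tag split */
    line->valid = true;
    line->dirty = false;
    line->data  = value;     /* write the new value into the cache */
    phys_mem[addr] = value;  /* and through to physical memory */
}

/* Write-no-allocate: update physical memory directly and leave the
   cache untouched. */
void write_no_allocate(uint16_t addr, uint8_t value) {
    phys_mem[addr] = value;
}
```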

Cache Performance
The two primary components of cache performance are cache hits and cache misses.
Cache hit: every time the CPU accesses memory, it checks the cache; if the requested data is in the cache, the CPU accesses it there rather than in physical memory.
Cache miss: if the requested data is not in the cache, the CPU accesses it from main memory (and usually writes the data into the cache as well).

Cache Performance cont.
The hit ratio, h, is the percentage of memory accesses that are served from the cache rather than from physical memory. The higher the hit ratio, the more often the CPU accesses the relatively fast cache memory, and the better the system performance. The average memory access time, Tm, is the weighted average of the cache access time, Tc, and the access time for physical memory, Tp, with the hit ratio h as the weighting factor:
  Tm = h Tc + (1 - h) Tp
Hit ratios and average memory access times (for Tc = 10 ns and Tp = 60 ns):
  h     Tm
  0.0   60 ns
  0.1   55 ns
  0.2   50 ns
  0.3   45 ns
  0.4   40 ns
  0.5   35 ns
  0.6   30 ns
  0.7   25 ns
  0.8   20 ns
  0.9   15 ns
  1.0   10 ns
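
A small check of the formula in C, using the access times from earlier in the chapter (Tc = 10 ns for cache, Tp = 60 ns for DRAM); it reproduces the table above:

```c
#include <stdio.h>

int main(void) {
    const double Tc = 10.0, Tp = 60.0;  /* access times in ns */
    for (int i = 0; i <= 10; i++) {
        double h = i / 10.0;            /* hit ratio from 0.0 to 1.0 */
        printf("h = %.1f  Tm = %.0f ns\n", h, h * Tc + (1.0 - h) * Tp);
    }
    return 0;
}
```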

Conclusion
The primary reason for including cache memory in a computer is to improve system performance by reducing the time needed to access memory. This concludes my presentation.