Chapter 9 Memory Organization By Nguyen Chau

Topics
Hierarchical memory systems
Cache memory
Associative memory
Cache memory with associative mapping
Cache memory with direct mapping
Cache memory with set-associative mapping
Replacing data in the cache
Writing data to the cache
Cache performance

Hierarchical memory systems
A computer system is not constructed using a single type of memory; several types are used together. For example:
Level 1 cache (L1 cache)
Level 2 cache (L2 cache)
Physical memory
Virtual memory
The best-known element of the memory subsystem is the physical memory, which is constructed using dynamic random access memory (DRAM) chips.

Figure: Generic memory hierarchy (CPU with L1 cache, L2 cache, physical memory, virtual memory storage).

Cache Memory
Cache memory is constructed using static RAM (SRAM) chips. The goal of cache memory is to minimize the processor’s memory access time. A fast microprocessor with a clock frequency over 500 MHz has a clock period of less than 2 ns. A fast DRAM has an access time about 30 times longer, around 60 ns (30 x 2 ns). A computer with no cache memory would therefore spend most of its time waiting for data. Cache memory, with an access time of about 10 ns, narrows this gap.

Associative Memory
Cache memory can be constructed using either SRAM or associative memory (content-addressable memory). Unlike ordinary RAM, which is accessed by address, associative memory is accessed by content. It searches all of its locations in parallel and marks the locations that match the specified data input. The matching data values are then read out sequentially.

Associative Memory cont.
Consider a simple associative memory consisting of eight words, each with 16 bits. Each word has one additional bit, labeled v, called the valid bit: 1 means the word holds valid data and 0 means it does not.
Figure: Associative memory with a data register, mask register, match register, and output register, plus Read and Write signals. Memory contents shown in the figure:
Data
0000 1111 0000 1011
1000 0000 1000 1000
0011 1101 1111 1111
0100 1001 1000 1000
0011 1101 0011 0000
1010 0000 1010 1101
0000 0111 1010 0000
0000 0000 0000 0000
v column: 1011011010110110

Associative Memory cont.
Example: to access data in the associative memory that has 1010 as its four high-order bits:
The CPU loads the value 1111 0000 0000 0000 into the mask register. Each bit that is to be checked, regardless of the value it should have, is set to 1; all the other bits are set to 0.
The CPU also loads the value 1010 xxxx xxxx xxxx into the data register. The four leading bits are to be matched and the rest can be anything.
A match occurs at a location if its valid bit is 1 and, for every bit position that has a value of 1 in the mask register, the bit stored in the location equals the corresponding bit in the data register. The match register bit for that location is then set to 1; otherwise it is set to 0.
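
The masked match described above can be sketched in Python. This is only an illustration, not the hardware's actual logic: the real memory compares all locations at once, while the loop here just models the result. The name match_register is hypothetical, the eight data words are the ones from the previous slide, and the valid bits are assumed to read 1 0 1 1 0 1 1 0 from top to bottom (the first eight digits of the flattened v column).

```python
# Sketch of a masked match in an associative memory.
# Hardware compares every location in parallel; this loop only models the result.

WORDS = [
    0b0000_1111_0000_1011,
    0b1000_0000_1000_1000,
    0b0011_1101_1111_1111,
    0b0100_1001_1000_1000,
    0b0011_1101_0011_0000,
    0b1010_0000_1010_1101,
    0b0000_0111_1010_0000,
    0b0000_0000_0000_0000,
]
# Assumption: valid bits paired with the words above, in order.
VALID = [1, 0, 1, 1, 0, 1, 1, 0]


def match_register(words, valid, data, mask):
    """Return one match bit per location (1 = valid and equal on masked bits)."""
    return [
        1 if v and (w & mask) == (data & mask) else 0
        for w, v in zip(words, valid)
    ]


mask = 0b1111_0000_0000_0000   # check only the four high-order bits
data = 0b1010_0000_0000_0000   # 1010 xxxx xxxx xxxx; the x bits are ignored
matches = match_register(WORDS, VALID, data, mask)
print(matches)
```

Under these assumptions only the sixth word (1010 0000 1010 1101) is marked, since it is the only valid word whose four high-order bits are 1010.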

Associative Memory cont.
Writing data to associative memory is straightforward. The CPU supplies data to the data register and asserts the write signal. The associative memory then looks for a location whose valid bit is 0. If it finds one, it stores the data in that location. If it finds none, it must clear out a location before it can store the data.
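
A minimal sketch of this write procedure, assuming the memory is modeled as two parallel Python lists. The function name write_word and the victim parameter are hypothetical; victim simply stands in for whichever replacement policy picks the location to clear out.

```python
def write_word(words, valid, value, victim=0):
    """Store value in the first invalid location; otherwise overwrite victim.

    Returns the index that was written. `victim` stands in for whatever
    replacement policy the memory uses (an assumption of this sketch).
    """
    for i, v in enumerate(valid):
        if not v:
            words[i] = value
            valid[i] = 1
            return i
    words[victim] = value      # no free location: clear one out and reuse it
    valid[victim] = 1
    return victim


words = [0x0F0B, 0x0000, 0x3DFF]
valid = [1, 0, 1]
first = write_word(words, valid, 0xA0AD)    # location 1 is free
second = write_word(words, valid, 0x1234)   # memory is full, so location 0 is reused
```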

Cache Memory with Associative Mapping
Associative memory can be used to construct a cache with associative mapping, or an associative cache. Consider an associative cache built from associative memory that is 24 bits wide. The first 16 bits of each location hold the memory address, and the last 8 bits hold the data stored at that address in physical memory. It works like the associative memory described earlier: the CPU places the address in the upper 16 bits of the data register, and the mask register holds 1111 1111 1111 1111 0000 0000 so that only the address bits are compared.
Figure: Associative cache for a 64K, 8-bit memory system. The data register holds the address (16 bits) followed by don't-care data bits (X); each memory location holds a 16-bit memory address, 8 bits of data, and a valid bit.

Cache Memory with Direct Mapping
Associative memory is much more expensive than SRAM. This is where direct mapping comes in. Direct mapping is a cache mapping scheme that uses standard SRAM, so a direct-mapped cache can be much larger than an associative cache and still cost less. To illustrate this, consider a 1K cache for the Relatively Simple (R.S.) CPU. Since the cache is 1K, the 10 low-order address bits (the index) select one specific location in the cache. As in an associative cache, each location contains a valid bit to denote whether or not it holds valid data. In addition, a tag field contains the high-order bits of the original address that are not part of the index; here, the six high-order bits are stored in the tag field. Finally, the cached data value is stored in the data field.
Figure: Direct-mapped cache for the R.S. CPU. Address bits A[9..0] (10 bits) form the index and A[15..10] (6 bits) form the tag; each location holds Tag, Data, and Valid fields, and the selected data goes to the output register. The entry shown has tag 000000, data 10101010, and valid bit 1.

Cache Memory with Direct Mapping cont.
Example: consider location 0000 0011 1111 1111 of physical memory, which contains data 1010 1010.
This data can be stored in only one location in the cache: the one given by the 10 low-order bits of the original address, 11 1111 1111.
However, any address of the form xxxx xx11 1111 1111 maps to this same location. Telling these addresses apart is the purpose of the tag field.
In the previous figure, the tag value for this location is 00 0000. This means that the data stored at cache location 11 1111 1111 is the data from physical memory location 0000 0011 1111 1111, which is 1010 1010.
The previous figure also shows a 1 in the valid field. If that bit were 0, none of this would apply, because the data in that location would not be valid.
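
The example can be traced with a short Python sketch of the 1K direct-mapped cache. The class and function names (DirectMappedCache, split_address) are hypothetical, and the model ignores timing and write policy; it only shows how the index selects a location and the tag decides whether the access hits.

```python
INDEX_BITS = 10                      # 1K cache -> 10-bit index
INDEX_MASK = (1 << INDEX_BITS) - 1


def split_address(addr):
    """Split a 16-bit address into (tag, index) for the 1K direct-mapped cache."""
    return addr >> INDEX_BITS, addr & INDEX_MASK


class DirectMappedCache:
    def __init__(self):
        # Each location holds [valid, tag, data]; all invalid at power-up.
        self.lines = [[0, 0, 0] for _ in range(1 << INDEX_BITS)]

    def load(self, addr, data):
        tag, index = split_address(addr)
        self.lines[index] = [1, tag, data]

    def read(self, addr):
        """Return the cached byte on a hit, or None on a miss."""
        tag, index = split_address(addr)
        valid, stored_tag, data = self.lines[index]
        return data if valid and stored_tag == tag else None


cache = DirectMappedCache()
cache.load(0b0000_0011_1111_1111, 0b1010_1010)
hit = cache.read(0b0000_0011_1111_1111)      # same tag and index
miss = cache.read(0b0000_0111_1111_1111)     # same index, different tag
```

Here hit is the cached value 1010 1010, while miss is None: the second address selects the same cache location but carries tag 00 0001 instead of 00 0000.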

Cache Memory with Direct Mapping cont.
Problem with direct mapping: although a direct-mapped cache is much less expensive than an associative cache, it is also much less flexible. In an associative cache, any word of physical memory can occupy any word of the cache. In a direct-mapped cache, however, each word of physical memory can be mapped to only one specific location. This is a problem for certain programs, for example when two frequently used addresses share the same index and keep overwriting each other in the cache. A good compiler will allocate the code so this does not happen, but the case illustrates a problem that can occur due to the inflexibility of direct mapping. Set-associative mapping seeks to alleviate this problem while keeping the strengths of the direct-mapped approach.

Cache Memory with Set-Associative Mapping
A set-associative cache makes use of relatively low-cost SRAM while trying to alleviate the problem of overwriting data that is inherent to direct mapping. It is organized just like a direct-mapped cache, except that each location in the cache can contain more than one data value. A cache in which each location can contain n bytes or words of data is called an n-way set-associative cache.

Cache Memory with Set-Associative Mapping
Consider the 1K, 2-way set-associative cache for the R.S. CPU. Each location contains two groups of fields, one for each way of the cache. The tag field is the same as in the direct-mapped cache except that it is 1 bit longer. Since the cache holds 1K data entries and each location holds 2 data values, there are 512 locations in total. The nine low-order address bits select the cache location and the remaining seven bits specify the tag value. As before, the data field contains the data from the physical memory location.
The count/valid field serves two purposes:
(1) One bit of this field is a valid bit, just as in the other cache mapping schemes.
(2) The count value is used to keep track of when the data was accessed. This information determines which piece of data will be replaced when a new value is loaded into the cache.

Figure: Two-way set-associative cache for the Relatively Simple CPU. Address bits A[8..0] (9 bits) select the location and A[15..9] (7 bits) form the tag; each location holds two groups of Tag, Data, and Count/valid fields, one per way.
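
As a rough sketch of the two-way organization, the Python model below uses the 7-bit tag and 9-bit index from the slide. The names TwoWayCache and split_address are hypothetical, and the count field is simplified to a single bit per way (0 for the more recently used way, 1 for the older one), which is an assumption of this sketch rather than the layout of the real count/valid field.

```python
INDEX_BITS = 9                       # 512 locations -> 9-bit index
NUM_SETS = 1 << INDEX_BITS


def split_address(addr):
    """Split a 16-bit address into (7-bit tag, 9-bit index)."""
    return addr >> INDEX_BITS, addr & (NUM_SETS - 1)


class TwoWayCache:
    def __init__(self):
        # sets[index][way] = {"valid", "tag", "data", "count"}
        # count is 0 for the more recently used way, 1 for the older one.
        self.sets = [
            [{"valid": 0, "tag": 0, "data": 0, "count": 0} for _ in range(2)]
            for _ in range(NUM_SETS)
        ]

    def _touch(self, ways, used):
        ways[used]["count"] = 0
        ways[1 - used]["count"] = 1

    def read(self, addr):
        """Return the cached byte on a hit, or None on a miss."""
        tag, index = split_address(addr)
        ways = self.sets[index]
        for w, entry in enumerate(ways):
            if entry["valid"] and entry["tag"] == tag:
                self._touch(ways, w)
                return entry["data"]
        return None

    def load(self, addr, data):
        """Fill an invalid way if there is one, else replace the older way."""
        tag, index = split_address(addr)
        ways = self.sets[index]
        free = [w for w, e in enumerate(ways) if not e["valid"]]
        victim = free[0] if free else max((0, 1), key=lambda w: ways[w]["count"])
        ways[victim].update(valid=1, tag=tag, data=data)
        self._touch(ways, victim)


cache = TwoWayCache()
a, b, c = 0x01FF, 0x03FF, 0x05FF     # three addresses with the same index
cache.load(a, 0x11)
cache.load(b, 0x22)
both_present = (cache.read(b), cache.read(a))   # a is read last, so b is older
cache.load(c, 0x33)                             # replaces b, the older way
```

Unlike the direct-mapped case, a and b share an index yet sit in the cache together; only the third conflicting address forces a replacement.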

Replacing Data in the Cache
When a computer is powered up, it performs several functions necessary to ensure its proper operation. Among those tasks, it must initialize its cache by setting the valid bits to 0. When a program is executed, the computer fetches instructions and data from memory and loads them into the cache. This works well when the cache is empty or sparsely populated. Eventually, however, the computer will need to move data into cache locations that are already occupied. The problem then is to decide which data to move out of the cache and how to preserve that data in physical memory. Direct mapping offers the easiest solution to this problem: each physical memory location maps to exactly one cache location, so there is no choice to make. An associative cache allows any location in physical memory to be mapped to any location in the cache, so it does not have to move data out of the cache and back into physical memory unless it has no location without valid data.

Replacing Data in the Cache cont.
There are many replacement methods that can be used to do this. Here are a few that are used frequently:
FIFO (First In First Out)
LRU (Least Recently Used)
Random

FIFO (First In First Out):
This strategy fills the associative memory from its top location to its bottom location. When it copies data to its last location, the cache is full. It then goes back to the top location, replacing its data with the next value to be stored.
This mechanism always replaces the data that was loaded into the cache first among all the data in the cache at that time.
This method requires nothing other than a register to hold a pointer to the next location to be replaced.
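
A minimal sketch of FIFO replacement, assuming the cache is modeled as two parallel Python lists and a single pointer. The class name FifoCache and its fields are hypothetical.

```python
class FifoCache:
    """Associative cache with FIFO replacement, modelled as parallel lists."""

    def __init__(self, size):
        self.addrs = [None] * size   # None marks a location with no valid data
        self.data = [None] * size
        self.next = 0                # the single pointer register

    def load(self, addr, value):
        """Store at the pointer, then advance it, wrapping at the bottom."""
        slot = self.next
        evicted = self.addrs[slot]
        self.addrs[slot] = addr
        self.data[slot] = value
        self.next = (slot + 1) % len(self.addrs)
        return evicted               # address that was replaced, if any


cache = FifoCache(3)
evictions = [cache.load(a, a * 2) for a in (10, 11, 12, 13, 14)]
```

With three locations, the first three loads fill the cache from top to bottom, and the fourth and fifth loads replace addresses 10 and 11, the two that were loaded first.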

Replacing Data in the Cache cont.
LRU (Least Recently Used):
The LRU method keeps track of the relative order in which each location is accessed and replaces the least recently used value with the new data.
This requires a counter for each location in the cache, so it is generally not used with associative caches. However, it is used frequently with set-associative cache memory.
Random:
As the name implies, this method randomly selects a location to use for the new data.
In spite of the lack of logic in its selection of a location, this replacement method produces good performance, close to that of the FIFO method.
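
The two policies can be sketched as follows. This is one possible way to keep LRU counters, not necessarily the scheme used in any particular cache: each location has an age, 0 marks the most recently used location, and the largest age marks the victim. The names LruCounters and random_victim are hypothetical.

```python
import random


class LruCounters:
    """One age counter per location; 0 means most recently used."""

    def __init__(self, size):
        self.age = list(range(size))     # arbitrary but distinct starting order

    def touch(self, slot):
        """Record an access: locations younger than `slot` age by one."""
        old = self.age[slot]
        for i, a in enumerate(self.age):
            if a < old:
                self.age[i] = a + 1
        self.age[slot] = 0

    def victim(self):
        """Return the least recently used location (largest age)."""
        return self.age.index(max(self.age))


def random_victim(size, rng=random):
    """Random replacement: any location may be chosen."""
    return rng.randrange(size)


lru = LruCounters(4)
for slot in (0, 1, 2, 3, 0, 2):
    lru.touch(slot)
lru_choice = lru.victim()
```

After the access sequence 0, 1, 2, 3, 0, 2, location 1 has gone longest without being used, so it is the LRU victim. The per-location counters are the extra hardware cost the slide refers to; random_victim needs no bookkeeping at all.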

Writing Data to the Cache
There are two methods, called write-through and write-back.
Write-through:
In write-through, every time a value is written from the CPU into a location in the cache, it is also written into the corresponding location in physical memory.
This guarantees that physical memory always contains the correct value, but it requires additional time for the writes to physical memory.

Writing Data to the Cache cont.
Write-back:
In write-back, the value written to the cache is not always written to physical memory.
The value is written to physical memory only once, when the data is removed from the cache.
This saves the time that write-through caches spend copying their data to physical memory, but it also introduces a time frame during which physical memory holds invalid (out-of-date) data.

Writing Data to the Cache cont.
Example: consider a simple program loop:
    for I = 1 to 1000 do
        x = x + I;
During the loop, the CPU writes a value to x 1000 times.
With the write-back method, however, the result is written to physical memory only one time instead of 1000 times.
As a result, write-back offers a significant time savings.
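
The difference can be counted with a small simulation. This is a simplified sketch that tracks only the variable x and assumes x stays in the cache for the whole loop and is removed once at the end; the function name run_loop and the dirty flag are illustrative choices.

```python
def run_loop(policy, iterations=1000):
    """Count physical-memory writes to x for the loop x = x + I."""
    memory_x = 0          # copy of x in physical memory
    cache_x = 0           # copy of x in the cache
    dirty = False         # write-back only: cache copy differs from memory
    memory_writes = 0

    for i in range(1, iterations + 1):
        cache_x = cache_x + i             # every write goes to the cache
        if policy == "write-through":
            memory_x = cache_x            # ...and to memory straight away
            memory_writes += 1
        else:
            dirty = True                  # write-back defers the memory write

    if dirty:                             # x is finally removed from the cache
        memory_x = cache_x
        memory_writes += 1
    return memory_x, memory_writes


through = run_loop("write-through")
back = run_loop("write-back")
```

Both policies leave the same final value of x in physical memory; write-through performs 1000 memory writes to get there and write-back performs one.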

Writing Data to the Cache cont.
However, performance is not the only consideration. Sometimes the currency of the data, that is, how up to date the copy in physical memory is, takes precedence.
Another situation that must be addressed is how to write data to locations not currently loaded into the cache. This is called a write miss. One possibility is to load the location into the cache and then write the new value to the cache using either the write-back or the write-through method. This is called the write-allocate policy. The alternative is the write-no-allocate policy, which updates the value in physical memory without loading it into the cache.
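
The two write-miss policies can be contrasted in a few lines of Python. This sketch models the cache and physical memory as plain dictionaries and, for the write-allocate case, assumes write-through so that memory is updated at once; the function name write_miss and its parameters are hypothetical.

```python
def write_miss(cache, memory, addr, value, allocate):
    """Handle a write to an address that is not in the cache.

    cache and memory are plain dicts from address to value. With
    allocate=True the location is loaded into the cache and then written
    (write-through is assumed here so memory stays current); with
    allocate=False only physical memory is updated.
    """
    assert addr not in cache, "this sketch only models the miss case"
    if allocate:
        cache[addr] = memory.get(addr, 0)   # load the location into the cache
        cache[addr] = value                 # then write the new value
    memory[addr] = value


memory = {0x0100: 7, 0x0200: 9}
cache = {}
write_miss(cache, memory, 0x0100, 42, allocate=True)
write_miss(cache, memory, 0x0200, 99, allocate=False)
```

After both calls, physical memory holds the new values for both addresses, but only address 0x0100 is present in the cache.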

Cache Performance
The two primary components of cache performance are cache hits and cache misses.
Cache hits:
Every time the CPU accesses memory, it checks the cache. If the requested data is in the cache, the CPU accesses the data in the cache rather than in physical memory.
Cache misses:
If the requested data is not in the cache, the CPU accesses the data from main memory (and usually writes the data into the cache as well).

Cache Performance cont.
Hit ratio is the percentage of memory accesses that are served from the cache rather than from physical memory. The higher the hit ratio, the more often the CPU accesses the relatively fast cache memory and the better the system performance.
The average memory access time (Tm) is the weighted average of the cache access time, Tc, and the access time for physical memory, Tp. The weighting factor is the hit ratio h. Therefore, Tm can be expressed as:
Tm = h Tc + (1 - h) Tp
Hit ratios and average memory access times (Tc = 10 ns, Tp = 60 ns):
h      Tm
0.0    60 ns
0.1    55 ns
0.2    50 ns
0.3    45 ns
0.4    40 ns
0.5    35 ns
0.6    30 ns
0.7    25 ns
0.8    20 ns
0.9    15 ns
1.0    10 ns
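
The table values follow directly from the formula. The short Python sketch below recomputes them, taking Tc = 10 ns and Tp = 60 ns from the table's endpoints (h = 1 and h = 0); the function name average_access_time is hypothetical.

```python
def average_access_time(h, t_cache=10.0, t_physical=60.0):
    """Tm = h*Tc + (1 - h)*Tp, in nanoseconds."""
    return h * t_cache + (1 - h) * t_physical


for tenths in range(11):
    h = tenths / 10
    print(f"h = {h:.1f}  Tm = {average_access_time(h):.0f} ns")
```

Each increase of 0.1 in the hit ratio removes (Tp - Tc) / 10 = 5 ns from the average access time.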

Conclusion
The primary reason for including cache memory in a computer is to improve system performance by reducing the time needed to access memory. This concludes my presentation.
