Chapter IX Memory Organization CS 147 Presented by: Duong Pham.


2 Chapter IX Memory Organization CS 147 Presented by: Duong Pham

3 Introduction In Chapter IV we looked at two simple computers, each consisting of a CPU, an I/O subsystem, and a memory subsystem. The memory of these computers was built using only ROM and RAM. This kind of memory subsystem is fine for computers that perform a specific task: –examples: controlling a microwave oven – controlling a dishwasher, etc. However, a complex computer cannot run on a memory subsystem consisting only of such physical memory because it would be relatively slow and somewhat limited.

4 Overview Hierarchy of Memory System Cache Memory –Associative Memory –Cache Memory with Associative Mapping –Cache Memory with Direct Mapping –Cache Memory with Set-Associative Mapping –Replacing Data in the Cache –Writing Data to the Cache –Cache Performance

5 Hierarchy of Memory System A computer system is not constructed using a single type of memory; several types are used. –For example: Level 1 cache (L1 cache) – Level 2 cache (L2 cache) – Physical Memory – Virtual Memory The best-known element of the memory subsystem is the physical memory, which is constructed using DRAM chips. There is also a cache controller, which copies data from physical memory to cache memory before or when the CPU needs it. In general, the closer a component is to the processor, the faster and more expensive it is. Therefore, the levels of the memory system tend to increase in size as they move away from the CPU.

6 [Figure: the hierarchy of the memory system, from closest to the CPU to farthest: CPU with L1 cache, L2 cache, physical memory, virtual memory storage.]

7 Cache Memory In general, the goal of cache memory is to minimize the processor’s memory access time at a reasonable cost. The main idea behind cache design is to move instructions and data into the cache before the microprocessor tries to access them. If this goal is achieved, system performance improves greatly. A system may have separate caches for instructions and data, which is the principle behind the Harvard architecture for computers. Instead of having separate caches for instructions and data, it may have one unified cache for both.

8 Associative Memory Cache memory can be constructed using either SRAM or associative memory (content addressable memory). Associative memory is accessed differently from other RAM. Instead of reading the location at a specified address, it searches all of its locations in parallel and marks the locations that match the specified data input. The matching data are then read out sequentially.

9 Associative Memory cont. To illustrate this, consider a simple associative memory consisting of eight words, each with 16 bits. Note that each word has one additional bit labeled v. This is called the valid bit. A 1 indicates that the word contains valid data; a 0 indicates that the data is not valid.
[Figure: associative memory with a data register, mask register, match register, and output register, and Read and Write control inputs. Memory contents (Data column):]
0000 1111 0000 1011
1000 0000 1000 1000
0011 1101 1111 1111
0100 1001 1000 1000
0011 1101 0011 0000
1010 0000 1010 1101
0000 0111 1010 0000
0000 0000 0000 0000
[Valid bits (v column), as extracted: 1011011010110110]

10 Associative Memory cont. Example: –To access data in the associative memory that has 1010 as its four high-order bits, the CPU would load the value 1111 0000 0000 0000 into the mask register. –Each bit that is to be checked, regardless of the value it has, is set to 1; all the other bits are set to zero. –The CPU also loads the value 1010 xxxx xxxx xxxx into the data register. –The four leading bits are to be matched and the rest can be anything. –A match occurs at a location if, for every bit position that has a value of 1 in the mask register, the location’s bit equals the corresponding bit in the data register, and the location’s valid bit is set to 1. In that case the location’s bit in the match register is set to 1; otherwise it is set to zero.
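The masked match described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `cam_match` and the four memory words and valid bits below are hypothetical, not the contents of the figure, and the list comprehension stands in for a comparison that the hardware performs on all locations in parallel.

```python
def cam_match(words, valid, data, mask):
    """Return the match register as a list with one bit per location.

    A location matches when its valid bit is 1 and it agrees with the
    data register in every bit position where the mask register holds 1.
    """
    return [
        int(v == 1 and (w & mask) == (data & mask))
        for w, v in zip(words, valid)
    ]


# Hypothetical memory contents (16-bit words) and valid bits.
words = [0x0F0B, 0xA0AD, 0x3DFF, 0xA123]
valid = [1, 1, 1, 0]

mask = 0xF000  # 1111 0000 0000 0000: check only the four high-order bits
data = 0xA000  # 1010 xxxx xxxx xxxx: the x bits are ignored by the mask

match = cam_match(words, valid, data, mask)
print(match)  # [0, 1, 0, 0]
```

The last word also starts with 1010, but its valid bit is 0, so it is not marked as a match.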

11 Associative Memory cont. Writing data to associative memory is straightforward. The CPU supplies data to the data register and asserts the write signal. The associative memory checks for a location whose valid bit is zero. If it finds one, it stores the data in that location. If it finds none, it must clear out a location before it can store the data.

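12 Cache Memory with Associative Mapping Associative memory can be used to construct a cache with associative mapping, or an associative cache. The figure shows an associative cache for a 64K × 8 memory system. The associative cache is built from associative memory that is 24 bits wide. The first 16 bits of each word hold the memory address. The last 8 bits hold the data stored at that address in physical memory. It works just like the associative memory described before: the CPU places the address in the data register and masks off the data bits, so a match returns the cached data for that address. [Figure: associative cache. The data register holds the 16-bit address followed by 8 don't-care data bits (X); the mask register holds 1111 1111 1111 1111 0000 0000; each 24-bit memory word holds a 16-bit memory address and 8 bits of data, plus a valid bit; the match register and output register are as in the associative memory.]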
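A rough sketch of a lookup in such an associative cache follows, assuming each cache word packs the 16-bit address above the 8-bit data, as the slide describes. The names `ADDR_MASK` and `lookup` and the cache contents are hypothetical, and the loop is a sequential stand-in for the parallel search done by the hardware.

```python
ADDR_MASK = 0xFFFF00  # 1111 1111 1111 1111 0000 0000: compare address bits only


def lookup(cache, address):
    """cache is a list of (word24, valid) pairs.

    Return the cached data byte on a hit, or None on a miss.
    """
    key = (address << 8) & ADDR_MASK
    for word, valid in cache:
        if valid and (word & ADDR_MASK) == key:
            return word & 0xFF  # the low 8 bits are the data
    return None


# Hypothetical contents: address 0x03FF holds 0xAA (valid),
# address 0x1234 holds 0x55 but its valid bit is 0.
cache = [((0x03FF << 8) | 0xAA, 1), ((0x1234 << 8) | 0x55, 0)]

hit = lookup(cache, 0x03FF)   # 0xAA
miss = lookup(cache, 0x1234)  # None, because the entry is not valid
```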

13 Cache Memory with Direct Mapping Since associative memory is much more expensive than SRAM, a cache mapping scheme that uses standard SRAM can be much larger than an associative cache and still cost less. One such scheme is called direct mapping. To illustrate this, we consider a 1K cache for the Relatively Simple (R.S.) CPU. Since the cache is 1K, the 10 low-order address bits (the index) select one specific location in the cache. As in the associative cache, each location contains a valid bit to denote whether or not the location has valid data. In addition, a tag field contains the high-order bits of the original address that are not part of the index. Therefore, the six high-order bits are stored in the tag field. Last, the cached data value is stored in the data field. [Figure: direct-mapped cache for the R.S. CPU. Address bits A[9…0] (10 bits) from the CPU select the cache location and A[15…10] (6 bits) are compared with the tag; each location holds Tag, Data, and Valid fields and feeds the output register. The location shown holds tag 00 0000, data 1010 1010, valid 1.]

14 Cache Memory with Direct Mapping cont. For example, consider location 0000 0011 1111 1111 of physical memory, which contains data 1010 1010. This data can be stored in only one location in the cache: the location that has the same 10 low-order address bits as the original address, or 11 1111 1111. However, any address of the form xxxx xx11 1111 1111 would map to this same cache location. Distinguishing among these addresses is the purpose of the tag field. In the previous picture, the tag value for this location is 00 0000. This means that the data stored at location 11 1111 1111 is actually the data from physical memory location 0000 0011 1111 1111, which is 1010 1010. Also, in the previous picture, we see a 1 in the valid field; if the bit were 0, none of this would apply because the data in that location would not be valid.
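The index and tag split from this example can be written out as a short Python sketch. The class and function names are hypothetical; the 10-bit index and 6-bit tag come from the slides.

```python
INDEX_BITS = 10  # 1K cache: the 10 low-order address bits select the location


def split_address(addr):
    """Split a 16-bit address into (tag, index)."""
    index = addr & ((1 << INDEX_BITS) - 1)
    tag = addr >> INDEX_BITS
    return tag, index


class DirectMappedCache:
    def __init__(self):
        self.lines = [
            {"valid": 0, "tag": 0, "data": 0} for _ in range(1 << INDEX_BITS)
        ]

    def fill(self, addr, data):
        """Load data for addr, overwriting whatever occupied that location."""
        tag, index = split_address(addr)
        self.lines[index] = {"valid": 1, "tag": tag, "data": data}

    def read(self, addr):
        """Return the data on a hit, or None on a miss."""
        tag, index = split_address(addr)
        line = self.lines[index]
        if line["valid"] and line["tag"] == tag:
            return line["data"]
        return None


cache = DirectMappedCache()
cache.fill(0b0000_0011_1111_1111, 0b1010_1010)

same = cache.read(0b0000_0011_1111_1111)   # hit: tag 00 0000 matches
other = cache.read(0b0000_0111_1111_1111)  # same index, tag 00 0001: miss
```

Both addresses select location 11 1111 1111, and only the tag comparison tells them apart.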

15 Cache Memory with Direct Mapping cont. Although direct-mapped cache is much less expensive than associative cache, it is also much less flexible. In associative cache, any word of physical memory can occupy any word of cache. However, in direct-mapped cache, each word of physical memory can be mapped to only one specific location. This is a problem for certain programs: if a program repeatedly accesses two addresses that map to the same cache location, each access overwrites the other's data in the cache. A good compiler will allocate the code so this does not happen. However, it does illustrate a problem that can occur due to the inflexibility of direct mapping. Set-associative mapping seeks to alleviate this problem while taking advantage of the strengths of the direct mapping method. This brings us to the next topic.

16 Cache Memory with Set-Associative Mapping Set-associative cache makes use of relatively low-cost SRAM while trying to alleviate the problem of overwriting data inherent to direct mapping. It is organized just like direct-mapped cache, except that each address in the cache can contain more than one data value. A cache in which each location can contain n bytes or words of data is called an n-way set-associative cache.

17 Set-associative mapping cont. Let us consider the 1K, 2-way set-associative cache for the R.S. CPU. Each location contains two groups of fields, one for each way of the cache. The tag field is the same as in direct-mapped cache except that it is 1 bit longer. Since the cache holds 1K data entries, and each location holds 2 data values, there are 512 locations total. The 9 low-order address bits select the cache location and the remaining 7 bits specify the tag value. As before, the data field contains the data from the physical memory location. The count/valid field serves 2 purposes: –(1) One bit of this field is a valid bit, just as in the other cache mapping schemes. –(2) The count value is used to keep track of when data was accessed. This information determines which piece of data will be replaced when a new value is loaded into the cache.

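18 [Figure: two-way set-associative cache for the R.S. CPU. Address bits A[8…0] (9 bits) from the CPU select the cache location and A[15…9] (7 bits) form the tag; each location holds two groups of Tag, Data, and Count/valid fields, one per way.]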
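As a minimal sketch of the two-way organization, the following Python keeps two entries per location and searches both ways on a lookup. The names are hypothetical, `None` stands in for a cleared valid bit, and the count field is left out here for brevity.

```python
INDEX_BITS = 9            # 512 locations, selected by A[8..0]
NUM_SETS = 1 << INDEX_BITS
WAYS = 2

# Each way holds a (tag, data) pair, or None when the entry is not valid.
sets = [[None] * WAYS for _ in range(NUM_SETS)]


def split_address(addr):
    """Split a 16-bit address into (7-bit tag, 9-bit index)."""
    return addr >> INDEX_BITS, addr & (NUM_SETS - 1)


def fill(addr, data, way):
    tag, index = split_address(addr)
    sets[index][way] = (tag, data)


def lookup(addr):
    """Return the data on a hit in either way, or None on a miss."""
    tag, index = split_address(addr)
    for entry in sets[index]:
        if entry is not None and entry[0] == tag:
            return entry[1]
    return None


# 0x03FF and 0x05FF share index 1 1111 1111 but have different tags,
# so both can stay in the cache at the same time.
fill(0x03FF, 0xAA, way=0)
fill(0x05FF, 0x55, way=1)
first = lookup(0x03FF)
second = lookup(0x05FF)
```

In a direct-mapped cache with the same index, the second fill would have overwritten the first.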

19 Replacing Data in the Cache As you know, when a computer is powered up, it performs several functions necessary to ensure its proper operation. Among those tasks, it must initialize its cache. To do so, it sets the valid bits to 0, much like asserting a register’s clear input. When the computer begins to execute a program, it fetches instructions and data from memory and loads them into the cache. This works well while the cache is empty or sparsely populated. Eventually, however, the computer will need to move data into cache locations that are already occupied. The problem then is to decide which data to move out of the cache and how to preserve that data in physical memory. Direct mapping offers the easiest solution to this problem: since each address maps to only one cache location, there is no choice to make.

20 Replacing Data in the Cache cont. Since associative cache allows any location in physical memory to be mapped to any location in cache, it does not have to move data out of cache and back into physical memory unless it has no location without valid data. There are a number of replacement methods that can be used to choose the location. Here are a few of the more popular ones: –FIFO (First In First Out) –LRU (Least Recently Used) –Random

21 Replacing Data in the Cache cont. FIFO (First In First Out): –This replacement process fills the associative memory from its top location to its bottom location. –When it copies data to its last location, the cache is full. –It then goes back to the top location, replacing its data with the next value to be stored. –This algorithm always replaces the data that was loaded into the cache first among all the data in the cache at that time. –This method requires nothing other than a register to hold a pointer to the next location to be replaced. –Its performance is generally good.
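The FIFO scheme needs only a pointer to the next location to replace, which the sketch below models with a single integer. The class name `FifoCache` and the stored `(address, data)` pairs are illustrative assumptions.

```python
class FifoCache:
    def __init__(self, size):
        self.slots = [None] * size  # each slot holds (address, data) or None
        self.next = 0               # pointer to the next location to replace

    def insert(self, address, data):
        """Store a new entry and return the entry it replaced, if any."""
        evicted = self.slots[self.next]
        self.slots[self.next] = (address, data)
        # Advance from top to bottom, wrapping back to the top when full.
        self.next = (self.next + 1) % len(self.slots)
        return evicted


cache = FifoCache(2)
r1 = cache.insert(1, "a")  # None: the slot was empty
r2 = cache.insert(2, "b")  # None: the cache is now full
r3 = cache.insert(3, "c")  # (1, "a"): the oldest entry is replaced
```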

22 Replacing Data in the Cache cont. LRU (Least Recently Used): –The LRU method keeps track of the relative order in which each location is accessed and replaces the least recently used value with the new data. –This requires a counter for each location in cache and is generally not used with associative caches. –However, it is used frequently with set-associative cache memory. Random: –The name says it all. –The random method randomly selects a location to use for the new data. –In spite of the lack of logic in its selection of a location, this replacement method produces good performance, close to that of the FIFO method.
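One way to picture the per-location counters used by LRU is the sketch below, which models a single set of a set-associative cache. It is an assumption-laden illustration: the class `LruSet` is hypothetical, and the counter scheme (reset the accessed way to 0 and increment the others) is one simple choice, not necessarily the one used in the textbook.

```python
class LruSet:
    """One set of an n-way set-associative cache with LRU replacement."""

    def __init__(self, ways=2):
        self.tags = [None] * ways  # None stands in for a cleared valid bit
        self.age = [0] * ways      # larger value means less recently used

    def access(self, tag):
        """Return True on a hit; on a miss, load tag into the set."""
        if tag in self.tags:
            way = self.tags.index(tag)
            hit = True
        else:
            if None in self.tags:
                way = self.tags.index(None)           # use an invalid way first
            else:
                way = self.age.index(max(self.age))   # least recently used way
            self.tags[way] = tag
            hit = False
        # Age every way, then mark the accessed way as most recently used.
        self.age = [a + 1 for a in self.age]
        self.age[way] = 0
        return hit


s = LruSet(ways=2)
results = [s.access(t) for t in (1, 2, 1, 3)]
# Tag 2 is the least recently used when tag 3 arrives, so it is replaced.
```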

23 Writing Data to the Cache To write data to the cache, one of two methods is used: write-through or write-back. Write-through: –In write-through, every time a value is written from the CPU into a location in the cache, it is also written into the corresponding location in physical memory. –This guarantees that physical memory always contains the correct value, but it requires additional time for the writes to physical memory. Write-back: –In write-back, the value written to the cache is not always written to physical memory. –The value is written to physical memory only once, when the data is removed from the cache. –This saves the time used by write-through caches to copy their data to physical memory, but also introduces a time frame during which physical memory holds out-of-date (invalid) data.

24 Writing Data to the Cache cont. Example: –Let us consider a simple program loop: – for I = 1 to 1000 do – x = x + I; –During the loop, the CPU would write a value to x 1000 times. –If we use the write-back method, this loop would write the result to physical memory only one time, instead of the 1000 times required if we were to use the write-through method. –Therefore, write-back offers a significant time savings.
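The count of memory writes in this example can be checked with a small simulation. It is a sketch under the assumption that x stays in the cache for the whole loop and is written back once when it is finally removed; the function name `run_loop` and the policy strings are hypothetical.

```python
def run_loop(policy, iterations=1000):
    """Run x = x + i for i = 1..iterations and count writes to physical memory."""
    memory_writes = 0
    dirty = False  # write-back only: cache copy differs from memory
    x = 0
    for i in range(1, iterations + 1):
        x = x + i
        if policy == "write-through":
            memory_writes += 1  # every cache write also goes to memory
        else:
            dirty = True        # write-back: update the cache copy only
    if policy == "write-back" and dirty:
        memory_writes += 1      # one write when x is removed from the cache
    return x, memory_writes


through = run_loop("write-through")  # (500500, 1000)
back = run_loop("write-back")        # (500500, 1)
```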

25 Writing Data to the Cache cont. However, performance is not the only consideration. Sometimes the currency of data takes precedence. Another situation that must be addressed is how to write data to locations not currently loaded into the cache. This is called a write miss. One possibility is to load the location into the cache and then write the new value to the cache using either the write-back or the write-through method. This is called the write-allocate policy. The alternative is the write-no-allocate policy, which updates the value in physical memory without loading it into the cache.
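The two write-miss policies differ only in whether the missed location ends up in the cache. The sketch below models both with plain dictionaries and, to keep it short, assumes a write-through cache; the function name `write` and its `allocate` flag are hypothetical.

```python
def write(cache, memory, address, value, allocate):
    """Handle a CPU write in a write-through cache.

    cache and memory are dicts mapping address -> value.
    allocate=True models write-allocate; False models write-no-allocate.
    """
    if address in cache or allocate:
        cache[address] = value  # write hit, or location loaded on a write miss
    memory[address] = value     # write-through: memory is always updated


cache_a, memory_a = {}, {}
write(cache_a, memory_a, 5, 1, allocate=True)    # miss: location enters the cache

cache_n, memory_n = {}, {}
write(cache_n, memory_n, 5, 1, allocate=False)   # miss: only memory is updated
```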

26 Cache Performance The primary reason for including cache memory in a computer is to improve system performance by reducing the time needed to access memory. The two primary components of cache performance are cache hits and cache misses. Cache hits: –Every time the CPU accesses memory, it checks the cache. –If the requested data is in the cache, the CPU accesses the data in the cache, rather than in physical memory. Cache misses: –If the requested data is not in the cache, the CPU accesses the data from main memory (and usually writes the data into the cache as well).

27 Cache Performance cont. Hit ratio is the percentage of memory accesses that are served from the cache, rather than from physical memory. The higher the hit ratio, the more times the CPU accesses the relatively fast cache memory and the better the system performance. The average memory access time (Tm) is the weighted average of the cache access time, Tc, and the access time for physical memory, Tp. The weighting factor is the hit ratio h. Therefore, Tm can be expressed as: –Tm = h Tc + (1 - h) Tp
Hit ratios and average memory access times (the end points h = 1 and h = 0 give Tc = 10 ns and Tp = 60 ns):
h    0      0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9    1.0
Tm   60 ns  55 ns  50 ns  45 ns  40 ns  35 ns  30 ns  25 ns  20 ns  15 ns  10 ns
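The table of hit ratios and access times can be reproduced directly from the formula. The values Tc = 10 ns and Tp = 60 ns are not stated on the slide; they are inferred here from the table's end points (h = 1 gives Tm = Tc, h = 0 gives Tm = Tp). The function name is hypothetical.

```python
def average_access_time(h, tc=10.0, tp=60.0):
    """Tm = h * Tc + (1 - h) * Tp, with times in nanoseconds.

    The defaults tc=10 and tp=60 are inferred from the table, not given.
    """
    return h * tc + (1 - h) * tp


# Print Tm for h = 0, 0.1, ..., 1.0.
for k in range(11):
    h = k / 10
    print(f"h = {h:.1f}  Tm = {average_access_time(h):.0f} ns")
```

Each increase of 0.1 in the hit ratio lowers Tm by 0.1 × (Tp − Tc) = 5 ns, which matches the steps in the table.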

28 Cache Performance cont. The rest of section 9.2 (pages 393-395) shows examples of cache activity using the methods that I’ve discussed so far. It uses the average memory access time (Tm) equation to generate results (hit ratio and average memory access time) for each method. If you want to see how those examples were worked and how the results were generated, take a look at the pages mentioned above. This concludes my presentation. Thank you.

29 Any questions?

