1  S. Barua – CPSC 440
sbarua@fullerton.edu
http://sbarua.ecs.fullerton.edu

CHAPTER 7  LARGE AND FAST: EXPLOITING MEMORY HIERARCHY
Topics to be covered:
– Principle of locality
– Memory hierarchy
– Cache concepts and cache organization
– Virtual memory concepts
– Impact of cache and virtual memory on performance

2  PRINCIPLE OF LOCALITY
Two types of locality are inherent in programs:
1. Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again soon.
2. Spatial locality (locality in space): if an item is referenced, items whose addresses are close by will tend to be referenced soon.
The principle of locality is what allows a memory hierarchy to be implemented effectively.

3  MEMORY HIERARCHY
- Consists of multiple levels of memory with different speeds and sizes.
- The goal is to provide the user with memory at low cost, while providing access at the speed offered by the fastest memory.

4  MEMORY HIERARCHY (Continued)
Memory hierarchy in a computer (from the CPU downward):

Level             Speed     Size      Cost/bit   Implemented using
Cache memory      Fastest   Smallest  Highest    SRAM
Main memory                                      DRAM
Secondary memory  Slowest   Biggest   Lowest     Magnetic disk

5  CACHE MEMORY
Cache is the level of the memory hierarchy between the main memory and the CPU.
Terms associated with cache:
Hit: the item requested by the processor is found in some block in the cache.
Miss: the item requested by the processor is not found in the cache.

6  Terms associated with cache (Continued)
Hit rate: the fraction of memory accesses found in the cache; used as a measure of cache performance.
Hit rate = (Number of hits) / (Number of accesses) = (Number of hits) / (Number of hits + Number of misses)
Miss rate: the fraction of memory accesses not found in the cache.
Miss rate = 1.0 - Hit rate
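The ratio definitions above translate directly into code. Below is a minimal sketch; the hit and miss counts are invented purely for illustration.

```python
# Hit/miss-rate definitions from the slide above. The counts are made-up
# illustrative numbers, not measurements.
def hit_rate(hits, misses):
    """Hit rate = hits / (hits + misses)."""
    return hits / (hits + misses)

hits, misses = 950, 50          # hypothetical access counts
hr = hit_rate(hits, misses)     # 0.95
mr = 1.0 - hr                   # miss rate = 1 - hit rate = 0.05
print(f"hit rate = {hr:.2f}, miss rate = {mr:.2f}")
```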

7  Terms associated with cache (Continued)
Hit time: the time to access the cache, including the time needed to determine whether the access is a hit or a miss.
Miss penalty: the time to replace a cache block with the corresponding block from memory, plus the time to deliver that block to the processor.

8  Cache Organizations
Three types of cache organization are used:
- Direct-mapped cache
- Set-associative cache
- Fully associative cache

9  DIRECT-MAPPED CACHE
Each main memory block is mapped to exactly one location in the cache. (Assume for now that 1 block = 1 word.)
For each block in main memory, a corresponding cache location is assigned based on the address of the block in main memory.
Mapping used in a direct-mapped cache:
Cache index = (Memory block address) modulo (Number of blocks in the cache)
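As a small illustration of the modulo mapping rule above, the sketch below uses a hypothetical 8-block cache; note how several memory blocks compete for the same cache index.

```python
# Direct-mapped placement rule (1 block = 1 word here).
# The 8-block cache size is just an example value.
NUM_BLOCKS = 8

def cache_index(block_address):
    return block_address % NUM_BLOCKS

# Memory blocks 1, 9, 17, 25 all map to cache index 1:
print([cache_index(b) for b in (1, 9, 17, 25)])   # [1, 1, 1, 1]
```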

10  Example of a Direct-Mapped Cache
Figure 7.5: A direct-mapped cache with 8 blocks

11  Accessing a Cache Location and Identifying a Hit
We need to know:
1. whether a cache block holds valid information - determined using a valid bit, and
2. whether the cache block corresponds to the requested word - determined using tags.

12  CONTENTS OF A CACHE MEMORY BLOCK
A cache block consists of the data bits, the tag bits, and a valid (V) bit. The V bit is set only if the cache block holds valid information.
(Figure: each cache entry, selected by its index, holds a V bit, a tag, and data.)

13  CACHE AND MAIN MEMORY STRUCTURE
(Figure: the cache, selected by index, holds a V bit, a tag, and a block of K words; main memory is organized into blocks of K words, addressed by block address and word length.)

14  CACHE CONCEPT
(Figure: words are transferred between the CPU and the cache; blocks are transferred between the cache and main memory.)

15  IDENTIFYING A CACHE HIT
The index of a cache block and the tag stored in that block together uniquely specify the memory address of the word held in the block.
Example: Consider a 32-bit memory address and a cache block size of one word. The cache has 1024 words. Compute the following:
Cache index = ? bits
Byte offset = ? bits
Tag = ? bits
Actual cache size = ? bits
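One way to work the numbers in this example, assuming byte addressing with 4-byte words (the usual setup in this chapter); treat it as a sketch of the arithmetic rather than the official solution.

```python
# Field widths for a 32-bit byte address, 1024 one-word blocks, 4-byte words.
ADDR_BITS  = 32
NUM_BLOCKS = 1024

byte_offset = 2                              # log2(4 bytes per word)
index_bits  = NUM_BLOCKS.bit_length() - 1    # log2(1024) = 10, since 1024 is a power of 2
tag_bits    = ADDR_BITS - index_bits - byte_offset     # 32 - 10 - 2 = 20
total_bits  = NUM_BLOCKS * (32 + tag_bits + 1)         # data + tag + valid per block

print(index_bits, byte_offset, tag_bits, total_bits)   # 10 2 20 54272
```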

16  Example (Continued)
Figure 7.7: Identifying a hit on a cache block

17  A Cache Example
The series of memory references, given as word addresses, is 22, 26, 22, 26, 16, 3, 16, and 18. Assume a direct-mapped cache with 8 one-word blocks that is initially empty. Label each reference in the list as a hit or a miss and show the contents of the cache after each reference. (A simulation sketch follows the cache-state slides below.)

18  A Cache Example (Continued)
(Figure: cache-state tables (a) and (b), each with Index, V, Tag, and Data columns for indices 000 through 111, beginning with the initial, empty state of the cache.)

19  A Cache Example (Continued)
(Figure: cache-state tables (c) and (d), with Index, V, Tag, and Data columns for indices 000 through 111, after subsequent references.)

20  A Cache Example (Continued)
(Figure: cache-state tables (e) and (f), with Index, V, Tag, and Data columns for indices 000 through 111, after the remaining references.)
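Below is a small simulation of the reference trace from the example, assuming word addressing and the 8-block direct-mapped cache described above; it reports the hit/miss label for each reference.

```python
# Direct-mapped cache, 8 one-word blocks, word-address trace from the example.
NUM_BLOCKS = 8
cache = {}                      # index -> tag (an entry being present means valid)

for addr in (22, 26, 22, 26, 16, 3, 16, 18):
    index, tag = addr % NUM_BLOCKS, addr // NUM_BLOCKS
    outcome = "hit" if cache.get(index) == tag else "miss"
    cache[index] = tag          # on a miss the block is (re)loaded
    print(f"addr {addr:2d}: index {index:03b}, tag {tag}, {outcome}")

# Expected pattern: miss, miss, hit, hit, miss, miss, hit, miss
```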

21  HANDLING CACHE MISSES
If the cache reports a miss, the corresponding block has to be loaded into the cache from main memory. Then either:
- the requested word is forwarded to the processor immediately, as the block is being written into the cache, or
- the requested word is delayed until the entire block has been stored in the cache.

22  HANDLING CACHE MISSES FOR INSTRUCTIONS
- Decrement the PC by 4 (to recover the address of the missed instruction)
- Fetch the block containing the missed instruction from main memory
- Write the instruction block into the data portion of the referenced cache block
- Copy the upper bits of the referenced memory address into the tag field of the cache block
- Turn the valid (V) bit on
- Restart the fetch cycle - this refetches the instruction, this time finding it in the cache

23  HANDLING CACHE MISSES FOR DATA
- Read the block containing the missed data from main memory
- Write the data into the data portion of the referenced cache block
- Copy the upper bits of the referenced memory address into the tag field of the cache block
- Turn the valid (V) bit on
- Either forward the requested word to the processor immediately, as the block is being written, or delay it until the entire block has been stored in the cache

24  CACHE WRITE POLICIES
Two techniques are used to keep the cache and main memory consistent when the processor writes:
- Write-through: updates both the cache and main memory on each write.
- Write-back: writes to the cache only and postpones updating main memory until the block is replaced in the cache.

25  CACHE WRITE POLICIES (Continued)
The write-back strategy usually employs a "dirty bit" associated with each cache block, much like the valid bit.
- The dirty bit is set the first time a value is written to the block.
- When a block in the cache is to be replaced, its dirty bit is examined.
- If the dirty bit is set, the block is written back to main memory; otherwise it is simply overwritten.

26  TAKING ADVANTAGE OF SPATIAL LOCALITY
To take advantage of spatial locality, a cache block should be larger than one word (a multiword block).
When a miss occurs, the entire block, consisting of multiple adjacent words, is fetched from main memory and brought into the cache.
With multiword blocks, the total number of tags and valid bits is smaller, because each tag and valid bit serves several words.

27  Cache with Multiple-Word Blocks - Example
Figure 7.9: A 16 KB cache using 16-word blocks

28  Identifying the Cache Block for a Given Memory Address
For a given memory address, the corresponding cache block can be determined as follows:
Step 1: Identify the memory block that contains the given address.
Memory block address = (Word address) div (Number of words per block)
                     = (Byte address) div (Number of bytes per block)
(The memory block address is essentially the block number in main memory.)
Step 2: Compute the cache index corresponding to that memory block.
Cache index = (Memory block address) modulo (Number of blocks in the cache)
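The two steps above can be expressed compactly in code. The sketch below assumes a byte-addressed machine; the 16-word-block, 256-block geometry is an arbitrary example.

```python
# Two-step lookup: byte address -> memory block number -> cache index.
WORDS_PER_BLOCK = 16
BYTES_PER_BLOCK = WORDS_PER_BLOCK * 4   # 4-byte words assumed
BLOCKS_IN_CACHE = 256

def locate(byte_address):
    memory_block = byte_address // BYTES_PER_BLOCK    # step 1: block number in memory
    cache_index  = memory_block % BLOCKS_IN_CACHE     # step 2: cache index
    return memory_block, cache_index

print(locate(0x1234))   # (72, 72) for this geometry
```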

29  HANDLING CACHE MISSES FOR A MULTIWORD BLOCK
For a cache read miss, the corresponding block is copied into the cache from memory.
A cache write miss in a multiword-block cache is handled in two steps:
- Step 1: Copy the corresponding block from memory into the cache
- Step 2: Update the cache block with the requested word

30  EFFECT OF A LARGER BLOCK SIZE ON PERFORMANCE
In general, the miss rate falls as the block size increases.
However, the miss rate may actually go up if the block size becomes very large relative to the cache size, for the following reasons:
- The number of blocks that can be held in the cache becomes small, creating a great deal of competition for those blocks.
- A block may be bumped out of the cache before many of its words have been used.
- Increasing the block size also increases the cost of a miss (the miss penalty).

31  DESIGNING MEMORY SYSTEMS TO SUPPORT CACHES
Three memory organizations are widely used:
- One-word-wide memory organization
- Wide memory organization
- Interleaved memory organization
Figure 7.11: Three options for designing the memory system

32  Figure 7.11: Three options for designing the memory system

33  Example
Consider the following memory access times:
- 1 clock cycle to send the address
- 10 clock cycles for each DRAM access initiated
- 1 clock cycle to send a word of data
Assume a cache block of four words. Discuss the impact of each memory organization on the miss penalty and the bandwidth per miss.
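A sketch of the miss-penalty arithmetic for this example. It assumes a four-word-wide memory and bus for the wide organization and four one-word banks for the interleaved organization; those widths are illustrative assumptions, not given on the slide.

```python
# Miss penalty (in cycles) and bandwidth per miss for a 4-word block.
SEND_ADDR, DRAM_ACCESS, SEND_WORD, BLOCK_WORDS = 1, 10, 1, 4

one_word_wide  = SEND_ADDR + BLOCK_WORDS * DRAM_ACCESS + BLOCK_WORDS * SEND_WORD  # 45
four_word_wide = SEND_ADDR + DRAM_ACCESS + SEND_WORD                              # 12
interleaved    = SEND_ADDR + DRAM_ACCESS + BLOCK_WORDS * SEND_WORD                # 15

for name, cycles in [("one-word-wide", one_word_wide),
                     ("four-word-wide", four_word_wide),
                     ("interleaved (4 banks)", interleaved)]:
    bandwidth = (BLOCK_WORDS * 4) / cycles      # bytes transferred per cycle of the miss
    print(f"{name}: miss penalty {cycles} cycles, {bandwidth:.2f} bytes/cycle")
```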

34  MEASURING AND IMPROVING CACHE PERFORMANCE
The total number of cycles the CPU spends on a program equals:
(Clock cycles the CPU spends executing the program) + (Clock cycles the CPU spends waiting for the memory system)
Total CPU time = Total CPU cycles * Clock cycle time
               = (CPU execution clock cycles + Memory-stall clock cycles) * Clock cycle time

35  MEASURING AND IMPROVING CACHE PERFORMANCE (Continued)
Memory-stall clock cycles = Read-stall cycles + Write-stall cycles
Read-stall cycles  = Number of reads  * Read miss rate  * Read miss penalty
Write-stall cycles = Number of writes * Write miss rate * Write miss penalty
Total memory accesses = Number of reads + Number of writes
Therefore, with a single combined miss rate and miss penalty:
Memory-stall cycles = Total memory accesses * Miss rate * Miss penalty
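The decomposition above can be captured in a small helper; all inputs are placeholders that would come from measurement or simulation.

```python
# CPU time = (execution cycles + memory-stall cycles) * clock cycle time.
def cpu_time(exec_cycles, memory_accesses, miss_rate, miss_penalty, cycle_time):
    memory_stall_cycles = memory_accesses * miss_rate * miss_penalty
    return (exec_cycles + memory_stall_cycles) * cycle_time

# Hypothetical numbers: 1e9 execution cycles, 3e8 accesses, 2% miss rate,
# 50-cycle miss penalty, 1 ns clock cycle.
print(cpu_time(1e9, 3e8, 0.02, 50, 1e-9), "seconds")   # 1.3 seconds
```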

36  MEASURING AND IMPROVING CACHE PERFORMANCE (Continued)
Two ways of improving cache performance:
- Decreasing the cache miss rate
- Decreasing the cache miss penalty

37  Example
Assume the following for gcc:
- Instruction cache miss rate = 5%
- Data cache miss rate = 10%
- The frequency of data-transfer instructions is 33%
If the machine has a CPI of 4 without memory stalls and a miss penalty of 12 cycles for all misses, determine how much faster the machine would run with a perfect cache that never missed.
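One way to work this example, assuming every instruction makes one instruction fetch and 33% of instructions make one data access; treat the numbers as a sketch of the calculation.

```python
# Stall cycles are computed per instruction, then added to the base CPI.
CPI_base, miss_penalty = 4, 12
i_miss_rate, d_miss_rate, data_freq = 0.05, 0.10, 0.33

i_stalls = 1.0 * i_miss_rate * miss_penalty          # 0.60 stall cycles per instruction
d_stalls = data_freq * d_miss_rate * miss_penalty    # 0.396 stall cycles per instruction
CPI_real = CPI_base + i_stalls + d_stalls            # about 4.996
speedup  = CPI_real / CPI_base                        # about 1.25x faster with a perfect cache

print(f"CPI with stalls = {CPI_real:.3f}, perfect-cache speedup = {speedup:.2f}")
```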

38  REDUCING CACHE MISSES BY MORE FLEXIBLE PLACEMENT OF BLOCKS
Direct-mapped cache: a block can go in exactly one place.
Set-associative cache: there is a fixed number of locations where each block can be placed.
- Each block in memory maps to a unique set in the cache, given by the index field.
- The block can be placed in any element of that set.
The set corresponding to a memory block is given by:
Cache set number = (Block address) modulo (Number of sets in the cache)
A set-associative cache with n possible locations for a block is called an n-way set-associative cache.

39  REDUCING CACHE MISSES BY MORE FLEXIBLE PLACEMENT OF BLOCKS (Continued)
Fully associative cache: a block can be placed in any location in the cache.
To find a block in a fully associative cache, all the entries (blocks) in the cache must be searched.
The miss rate decreases as the degree of associativity increases.
Figure 7.14: Examples of direct-mapped, set-associative, and fully associative caches

40  Figure 7.14: Examples of direct-mapped, set-associative, and fully associative caches

41  LOCATING A BLOCK IN THE CACHE
Figure 7.17: Locating a block in a four-way set-associative cache

42  CHOOSING WHICH BLOCK TO REPLACE
Direct-mapped cache: when a miss occurs, the requested block can go in exactly one position, so the block occupying that position must be replaced.

43  CHOOSING WHICH BLOCK TO REPLACE (Continued)
Set-associative or fully associative cache: when a miss occurs, we have a choice of where to place the requested block, and therefore a choice of which block to replace.
- Set-associative cache: all the blocks in the selected set are candidates for replacement.
- Fully associative cache: all blocks in the cache are candidates for replacement.

44  Replacing a Block in Set-Associative and Fully Associative Caches
Strategies employed:
- First-in, first-out (FIFO): the block replaced is the one that was brought in first
- Least-frequently used (LFU): the block replaced is the one that has been used least frequently
- Random: the block to be replaced is selected at random
- Least-recently used (LRU): the block replaced is the one that has been unused for the longest time
LRU is the most commonly used replacement technique.
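A minimal sketch of LRU bookkeeping for a single set, assuming a hypothetical 4-way set; real hardware uses approximate LRU bits rather than an ordered list, so this only models the policy's behavior.

```python
from collections import OrderedDict

class LRUSet:
    """One cache set managed with true LRU replacement."""
    def __init__(self, ways=4):
        self.ways = ways
        self.blocks = OrderedDict()         # tag -> data, least recently used first

    def access(self, tag):
        if tag in self.blocks:              # hit: mark as most recently used
            self.blocks.move_to_end(tag)
            return "hit"
        if len(self.blocks) >= self.ways:   # miss with a full set: evict the LRU block
            self.blocks.popitem(last=False)
        self.blocks[tag] = None             # bring the missed block in
        return "miss"

s = LRUSet()
print([s.access(t) for t in (1, 2, 3, 4, 1, 5, 2)])
# ['miss', 'miss', 'miss', 'miss', 'hit', 'miss', 'miss']
```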

45  REDUCING THE MISS PENALTY USING MULTILEVEL CACHES
To further close the gap between the fast clock rates of modern processors and the relatively long time required to access DRAM, high-performance microprocessors support an additional level of caching.
This second-level cache (often off-chip, in a separate set of SRAMs) is accessed whenever a miss occurs in the primary cache.
Because the access time of the second-level cache is significantly less than the access time of main memory, the miss penalty of the primary cache is greatly reduced.
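To see why a second-level cache shrinks the primary cache's effective miss penalty, here is a back-of-the-envelope sketch; the latencies and local miss rate are made-up values.

```python
# Effective L1 miss penalty with and without an L2 cache (illustrative numbers).
l2_hit_time        = 10    # cycles to reach the L2 cache
memory_penalty     = 100   # cycles to reach main memory
l2_local_miss_rate = 0.25  # fraction of L1 misses that also miss in L2

without_l2 = memory_penalty
with_l2    = l2_hit_time + l2_local_miss_rate * memory_penalty   # 35 cycles

print(f"L1 miss penalty: {without_l2} cycles without L2, {with_l2:.0f} cycles with L2")
```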

46  Evolution of Cache Organization
80386: No on-chip cache. Uses a direct-mapped external cache with a block size of 16 bytes (four 32-bit words). Employs the write-through technique.
80486: A single on-chip 8 KB cache with a block size of 16 bytes and a 4-way set-associative organization. Employs the write-through technique and an LRU replacement algorithm.
Pentium / Pentium Pro: Split instruction and data caches (two on-chip caches, one for data and one for instructions). Each cache is 8 KB with a block size of 32 bytes and a 4-way set-associative organization, using a write-back policy and the LRU replacement algorithm. Supports a 2-way set-associative level 2 cache of 256 or 512 KB with a block size of 32, 64, or 128 bytes, also write-back with LRU replacement, and can be dynamically configured to support write-through caching. In the Pentium Pro, the secondary cache is on a separate die but packaged together with the processor.

47  Evolution of Cache Organization (Continued)
Pentium II: Split instruction and data caches, 16 KB each. Supports a level 2 cache of 512 KB.
Pentium II Xeon: Split instruction and data caches, 16 KB each. Supports a level 2 cache of 1 MB or 2 MB.
Celeron: Split instruction and data caches, 16 KB each. Supports a level 2 cache of 128 KB.
Pentium III: Split instruction and data caches, 16 KB each. Supports a level 2 cache of 512 KB.

48  Evolution of Cache Organization (Continued)
Pentium 4: Split instruction and data caches (two on-chip caches, one for data and one for instructions). Each cache is 8 KB with a block size of 64 bytes and a 4-way set-associative organization, using a write-back policy and the LRU replacement algorithm. Supports a 2-way set-associative level 2 cache of 256 KB with a block size of 128 bytes, also write-back with LRU replacement, and can be dynamically configured to support write-through caching.

49  Evolution of Cache Organization (Continued)
PowerPC:
- Model 601: a single on-chip 32 KB, 8-way set-associative cache with a block size of 32 bytes.
- Model 603: two on-chip 8 KB, 2-way set-associative caches with a block size of 32 bytes.
- Model 604: two on-chip 16 KB, 4-way set-associative caches with a block size of 32 bytes.
- Uses the LRU replacement algorithm with both write-through and write-back techniques. Employs a 2-way set-associative level 2 cache of 256 or 512 KB with a block size of 32 bytes.
- Model 620: two on-chip 32 KB, 8-way set-associative caches with a block size of 64 bytes.
- Model G3: two on-chip 32 KB, 8-way set-associative caches with a block size of 64 bytes.
- Model G4: two on-chip 32 KB, 8-way set-associative caches with a block size of 32 bytes.

50  ELEMENTS OF CACHE DESIGN
The key elements that classify and differentiate cache architectures are:
– Cache size
– Mapping function
  - Direct
  - Set associative
  - Fully associative
– Replacement algorithm (for set-associative and fully associative caches)
  - Least-recently used (LRU)
  - First-in, first-out (FIFO)
  - Least-frequently used (LFU)
  - Random
– Write policy
  - Write-through
  - Write-back
– Block size
– Number of caches
  - Single-level or two-level
  - Unified or split

51  CACHE SIZE
The cache should be small enough that the overall average cost per bit is close to that of main memory alone, and large enough that the overall average access time is close to that of the cache alone.
Large caches tend to be slightly slower than small ones (because of the additional gates involved). Cache size is also limited by the available chip and board area.
Because cache performance is very sensitive to the nature of the workload, it is almost impossible to arrive at an "optimum" cache size, but studies have suggested that sizes between 1K and 512K words are close to optimum.

52  MAPPING FUNCTION
The choice of mapping function dictates how the cache is organized.
Direct mapping is simple and inexpensive to implement. Its main disadvantage is that there is a fixed cache location for any given block: if a program happens to repeatedly reference words from two different blocks that map into the same cache location, the blocks will be continually swapped in the cache and the hit ratio will be very low.
With (fully) associative mapping, there is flexibility as to which block to replace when a new block is read into the cache, and replacement algorithms are designed to maximize the hit ratio. The principal disadvantage is the complex circuitry required to examine the tags of all cache locations in parallel.
Set-associative mapping is a compromise that exhibits the strengths of both the direct and fully associative approaches without their disadvantages. Two blocks per set is the most common set-associative organization; it significantly improves the hit ratio over direct mapping.

53  REPLACEMENT ALGORITHMS: For set-associative and fully associative mapping, a replacement algorithm is required. To achieve high speed, such algorithms must be implemented in hardware.
WRITE POLICY: The write-through technique, though simple to implement, generates substantial memory traffic and may create a bottleneck. The write-back technique minimizes memory writes; its disadvantage is that portions of main memory are invalid, so access by I/O modules can be allowed only through the cache, which calls for complex circuitry and creates a potential bottleneck.
BLOCK SIZE: Larger blocks reduce the number of blocks that fit in the cache. Because each block fetch overwrites older cache contents, a small number of cache blocks results in data being overwritten shortly after it is fetched. Also, as a block becomes larger, each additional word is farther from the requested word and therefore less likely to be needed in the near future. The relationship between block size and hit ratio is complex, depending on the locality characteristics of a given program; studies have shown that a block size of 4 to 8 addressable units (words or bytes) is reasonably close to optimum.

54  NUMBER OF CACHES: Two aspects have to be considered: the number of levels of caches, and the use of unified versus split caches.
Single- versus two-level caches: As logic density has increased, it has become possible to put the cache on the same chip as the processor: the on-chip cache. An on-chip cache reduces the processor's external bus activity, which speeds up execution and increases overall system performance. If the system also has an off-chip (external) cache, it is said to have a two-level cache, with the on-chip cache designated level 1 (L1) and the external cache designated level 2 (L2). Without an external cache, every on-chip cache miss forces the processor to access DRAM; because of typically slow bus speeds and slow DRAM access times, overall system performance suffers. The potential savings from an L2 cache depend on the hit rates in both the L1 and L2 caches; studies have shown that, in general, an L2 cache does improve performance.
Unified versus split caches: When the on-chip cache first appeared, many designs consisted of a single on-chip cache used to store both data and instructions. More recently, it has become common to split the on-chip cache in two: one cache dedicated to instructions and one dedicated to data. A unified cache has the following advantages: for a given cache size, it has a higher hit rate than split caches because it balances the load between instruction and data fetches automatically, and only one cache needs to be designed and implemented. The advantage of a split cache is that it eliminates contention for the cache between the instruction fetch unit and the execution unit, which is extremely important in any design that pipelines instructions.

55  VIRTUAL MEMORY
Virtual memory permits each process to use main memory as if it were the only user, and to extend the apparent size of accessible memory beyond its actual physical size.
The virtual address generated by the CPU is translated into a physical address, which in turn is used to access main memory. The process of translating a virtual address into a physical address is called memory mapping or address translation.
Page: a virtual memory block
Page fault: a virtual memory miss
Figure 7.19: The virtually addressed memory with pages mapped to main memory
Figure 7.20: Mapping from a virtual to a physical address

56  Figure 7.19: The virtually addressed memory with pages mapped to main memory

57  Figure 7.20: Mapping from a virtual to a physical address

58  PLACING A PAGE AND FINDING IT AGAIN
The operating system must maintain a page table.
Page table:
- Maps virtual pages to physical pages, or else to locations in secondary memory
- Resides in memory
- Is indexed with the virtual page number taken from the virtual address
- Contains the physical page number for the corresponding virtual page number
Each program has its own page table, which maps the virtual address space of that program to main memory.
No tags are required in the page table because it contains a mapping for every possible virtual page.
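A minimal sketch of a page-table lookup, assuming 4 KB pages; the page-table contents are invented, and a missing mapping stands in for a page fault handled by the OS.

```python
# Virtual-to-physical translation through a page table (illustrative values).
PAGE_SIZE = 4096

page_table = {0: 5, 1: 9, 2: None}        # virtual page -> physical page (None = on disk)

def translate(virtual_address):
    vpn, offset = divmod(virtual_address, PAGE_SIZE)   # split into page number and offset
    ppn = page_table.get(vpn)
    if ppn is None:
        raise RuntimeError("page fault: the OS must bring the page in from disk")
    return ppn * PAGE_SIZE + offset

print(hex(translate(0x1234)))              # virtual page 1 -> physical page 9 -> 0x9234
```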

59  PLACING A PAGE AND FINDING IT AGAIN (Continued)
Page table register:
- Indicates the location of the page table in memory
- Points to the start of the page table
Figure 7.21: Mapping from a virtual to a physical address using the page table register and the page table

60  Figure 7.21: Mapping from a virtual to a physical address using the page table register and the page table

61  PAGE FAULTS
If the valid bit for a virtual page is off, a page fault occurs.
- The operating system is given control at this point (via the exception mechanism).
- The OS finds the page in the next level of the hierarchy (a magnetic disk, for example).
- The OS decides where to place the requested page in main memory.
The OS also creates a data structure that tracks which processes and which virtual addresses use each physical page.
When a page fault occurs and all the pages in main memory are in use, the OS has to choose a page to replace. The algorithm usually employed is LRU replacement.

62  WRITES
In a virtual memory system, writes to the disk take hundreds of thousands of cycles, so write-through is impractical. The strategy employed instead is write-back (copy back).
Write-back technique:
- Individual writes are accumulated in the page in memory.
- When the page is replaced, it is copied back to the disk.

63  MAKING ADDRESS TRANSLATION FAST: THE TRANSLATION-LOOKASIDE BUFFER (TLB)
If the CPU had to access a page table resident in memory to translate every memory access, virtual memory would have too much overhead. Instead, a TLB, a special cache that holds recently used page table entries, is used so that most translations avoid the memory access.
Figure 7.23: The TLB acts as a cache for page table references
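A sketch of how a TLB short-circuits the page-table access, assuming 4 KB pages and a tiny 4-entry TLB; the page table contents and the eviction rule are illustrative only.

```python
# Translation with a small TLB in front of the page table (illustrative values).
PAGE_SIZE   = 4096
TLB_ENTRIES = 4

tlb = {}                                     # vpn -> ppn, fully associative in this sketch
page_table = {n: n + 100 for n in range(32)} # invented mapping: page n -> frame n + 100

def translate(virtual_address):
    vpn, offset = divmod(virtual_address, PAGE_SIZE)
    if vpn in tlb:                           # TLB hit: no memory access needed for translation
        ppn = tlb[vpn]
    else:                                    # TLB miss: walk the page table, then cache the entry
        ppn = page_table[vpn]
        if len(tlb) >= TLB_ENTRIES:
            tlb.pop(next(iter(tlb)))         # crude FIFO-style eviction, enough for the sketch
        tlb[vpn] = ppn
    return ppn * PAGE_SIZE + offset

print(hex(translate(0x0000)), hex(translate(0x1008)))   # 0x64000 0x65008
```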

64  Figure 7.23: The TLB acts as a cache for page table references

65  Figure 7.24: Integrating virtual memory, TLBs, and caches

