1 204521 Digital Computer Architecture Lecture 9-1 Virtual Memory Pradondet Nilagupta Original Note By Prof. Mike Schulte Spring 2001.

Lecture 9-1 Virtual Memory Original Note By Prof. Mike Schulte Presented by Pradondet Nilagupta

Virtual Memory
Virtual memory (VM) allows main memory (DRAM) to act like a cache for secondary storage (magnetic disk).
VM address translation provides a mapping from the virtual address of the processor to the physical address in main memory or on disk.
VM provides the following benefits
–Allows multiple programs to share the same physical memory
–Allows programmers to write code as though they have a very large amount of main memory
–Automatically handles bringing in data from disk
Cache terms vs. VM terms
–Cache block => page or segment
–Cache miss => page fault or address fault

Virtual Memory Basics
Programs reference "virtual" addresses in a non-existent memory
–These are then translated into real "physical" addresses
–Virtual address space may be bigger than physical address space
Divide physical memory into blocks, called pages
–Anywhere from 512 bytes to 16 MB (4 KB typical)
Virtual-to-physical translation by indexed table lookup
–Add another cache for recent translations (the TLB)
Invisible to the programmer
–Looks to your application like you have a lot of memory!
–Anyone remember overlays?

VM: Page Mapping
[Figure: Process 1's virtual address space and Process 2's virtual address space mapped to page frames in physical memory and to disk]

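VM: Address Translation
[Figure: the virtual address is split into a virtual page number (20 bits) and a page offset (12 bits, log2 of the page size). The page table base and the virtual page number select an entry in the per-process page table. Each entry holds a valid bit, protection bits, a dirty bit, a reference bit, and the physical page number. The physical page number and the unchanged page offset form the address sent to physical memory.]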
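The translation in the figure can be sketched in a few lines. This is a minimal illustration, assuming the 20-bit page number and 12-bit offset from the slide; the dictionary page table, the `PageFault` exception, and the `translate` name are hypothetical choices for the sketch, not part of the lecture.

```python
PAGE_OFFSET_BITS = 12                 # log2 of the page size (4 KB), as on the slide
PAGE_SIZE = 1 << PAGE_OFFSET_BITS


class PageFault(Exception):
    """Raised when the page table entry is missing or not valid."""


def translate(vaddr, page_table):
    """Translate a virtual address with a flat per-process page table.

    page_table maps virtual page number -> (valid bit, physical page number).
    The dictionary layout is an assumption made for this sketch.
    """
    vpn = vaddr >> PAGE_OFFSET_BITS          # upper 20 bits of a 32-bit address
    offset = vaddr & (PAGE_SIZE - 1)         # lower 12 bits pass through unchanged
    valid, ppn = page_table.get(vpn, (False, None))
    if not valid:
        raise PageFault(f"virtual page {vpn:#x} is not in main memory")
    return (ppn << PAGE_OFFSET_BITS) | offset


page_table = {0x00012: (True, 0x00345)}
print(hex(translate(0x00012ABC, page_table)))  # 0x345abc
```

Only the page number changes; the offset `0xABC` is carried over untouched.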

Typical Page Parameters
It's a lot like what happens in a cache
–But everything (except miss rate) is a LOT worse
Parameter | Value
Page size | 4 KB to 64 KB
L1 cache hit time | 1-2 clock cycles
Virtual hit (e.g. mapped to DRAM) | 50-400 clock cycles
Miss penalty (all the way to disk) | 700k-6M clock cycles
Disk access time | 500k-4M clock cycles
Page transfer time | 200k-2M clock cycles
Page fault rate | 0.001% - 0.00001%
Main memory size | 4 MB to 4 GB
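To see why the tiny page fault rate still matters, the numbers above can be plugged into the usual average access time formula (hit time + miss rate x miss penalty). This is a rough sketch; pairing the table's extremes into a "pessimistic" and an "optimistic" corner is an assumption for illustration, not something the slide states.

```python
def average_access_cycles(hit_cycles, fault_rate, miss_penalty_cycles):
    """Average cycles per memory access: hit time + fault rate * miss penalty."""
    return hit_cycles + fault_rate * miss_penalty_cycles


# Pessimistic corner: slowest DRAM hit, highest fault rate (0.001%), longest penalty.
pessimistic = average_access_cycles(400, 0.001 / 100, 6_000_000)

# Optimistic corner: fastest DRAM hit, lowest fault rate (0.00001%), shortest penalty.
optimistic = average_access_cycles(50, 0.00001 / 100, 700_000)

print(f"pessimistic: {pessimistic:.2f} cycles")
print(f"optimistic:  {optimistic:.2f} cycles")
```

At a 0.001% fault rate, page faults alone add about 60 cycles to every access on average, which is why the miss rate has to be kept so low.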

Paging vs. Segmentation
Pages are fixed sized blocks
Segments vary from 1 byte to 2^32 bytes (for 32-bit addresses)
Aspect | Page | Segment
Words per address | One: contains page and offset | Two: possibly large max size, so need segment and offset words
Programmer visible? | No | Sometimes
Replacement | Trivial, because of fixed size | Hard: need to find contiguous space, use garbage collection
Memory efficiency | Internal fragmentation | External fragmentation
Disk efficiency | Yes: adjust page size to balance access and transfer time | Not always: segment size varies

Cache and VM Parameters
How is virtual memory different from caches?
–Software controls replacement - why?
–Size of virtual memory determined by size of processor address
–Disk is also used to store the file system - nonvolatile

Paged and Segmented VM (Figure 5.38, pg. 442)
Virtual memories can be categorized into two main classes
–Paged memory: fixed size blocks
–Segmented memory: variable size blocks

Paged vs. Segmented VM
Paged memory
–Fixed sized blocks (4 KB to 64 KB)
–One word per address (page number + page offset)
–Easy to replace pages (all same size)
–Internal fragmentation (not all of page is used)
–Efficient disk traffic (optimize for page size)
Segmented memory
–Variable sized blocks (up to 64 KB or 4 GB)
–Two words per address (segment + offset)
–Difficult to replace segments (find where segment fits)
–External fragmentation (unused portions of memory)
–Inefficient disk traffic (may have small or large transfers)
Hybrid approaches
–Paged segments: segments are a multiple of a page size
–Multiple page sizes (e.g., 8 KB, 64 KB, 512 KB, 4096 KB)

Pages are Cached in a Virtual Memory System
Can ask the same four questions we did about caches
Q1: Block Placement
–choice: lower miss rates and complex placement, or vice versa
  miss penalty is huge, so choose low miss rate ==> place page anywhere in physical memory
  similar to fully associative cache model
Q2: Block Addressing - use additional data structure
–fixed size pages - use a page table
  virtual page number ==> physical page number, then concatenate offset
  tag bit to indicate presence in main memory

Normal Page Tables
Size is number of virtual pages
Purpose is to hold the translation of VPN to PPN
–Permits ease of page relocation
–Make sure to keep tags to indicate page is mapped
Potential problem:
–Consider 32-bit virtual address and 4 KB pages
–4 GB / 4 KB = 1 MW required just for the page table!
–Might have to page in the page table ...
  Consider how the problem gets worse on 64-bit machines with even larger virtual address spaces!
  Alpha has a 43-bit virtual address with 8 KB pages ...
–Might have multi-level page tables
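The size problem is plain arithmetic, sketched below for the two cases named on the slide. The function names are hypothetical, and the 4-byte entry size used for the 32-bit case is the figure given later in this lecture for page table entries.

```python
def flat_page_table_entries(virtual_address_bits, page_size_bytes):
    """Number of entries in a flat page table: one per virtual page."""
    offset_bits = page_size_bytes.bit_length() - 1   # log2 for a power-of-two page size
    return 1 << (virtual_address_bits - offset_bits)


def flat_page_table_bytes(virtual_address_bits, page_size_bytes, entry_bytes):
    """Total table size for a given entry size."""
    return flat_page_table_entries(virtual_address_bits, page_size_bytes) * entry_bytes


# 32-bit virtual address, 4 KB pages: 2**20 entries (the slide's "1 MW").
print(flat_page_table_entries(32, 4 * 1024))            # 1048576
print(flat_page_table_bytes(32, 4 * 1024, 4) // 2**20)  # 4 (MB)

# Alpha: 43-bit virtual address, 8 KB pages: 2**30 entries for a flat table.
print(flat_page_table_entries(43, 8 * 1024))            # 1073741824
```

A flat table for the Alpha case would need about a billion entries per process, which is the motivation for multi-level tables.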

Inverted Page Tables
Similar to a set-associative mechanism
Make the page table reflect the # of physical pages (not virtual)
Use a hash mechanism
–virtual page number ==> HPN index into inverted page table
–Compare virtual page number with the tag to make sure it is the one you want
–If yes, check to see that it is in memory: OK if yes, page fault if not
–If not, it is a miss: go to the full page table on disk to get the new entry
  implies 2 disk accesses in the worst case
  trades increased worst-case penalty for a decrease in capacity-induced miss rate, since there is now more room for real pages with a smaller page table

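Inverted Page Table
[Figure: the page number of the virtual address is hashed to select a table entry; the entry's page tag is compared (=) with the virtual page number and its valid bit (V) is checked; if OK, the frame number is combined with the offset]
Only store entries for pages in physical memory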
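A minimal sketch of the lookup just described, under loud assumptions: the hash is a plain modulo, each hash value maps to exactly one frame (no collision chain), and the class and method names are hypothetical. A real design would fall back to the full page table on a miss, as the previous slide says.

```python
class InvertedPageTable:
    """One entry per physical frame; each entry stores the VPN tag it holds."""

    def __init__(self, num_frames):
        self.entries = [None] * num_frames    # None means the frame is empty

    def _hash(self, vpn):
        # Hypothetical hash function; the slides do not specify one.
        return vpn % len(self.entries)

    def lookup(self, vpn):
        """Return the frame number on a tag match, or None on a miss."""
        frame = self._hash(vpn)
        if self.entries[frame] == vpn:
            return frame
        return None                           # miss: consult the full page table

    def install(self, vpn):
        """Place vpn in its hashed frame; return (frame, evicted vpn or None)."""
        frame = self._hash(vpn)
        evicted = self.entries[frame]
        self.entries[frame] = vpn
        return frame, evicted


ipt = InvertedPageTable(num_frames=8)
ipt.install(10)
print(ipt.lookup(10))   # 2
print(ipt.lookup(18))   # None (hashes to the same entry, but the tag differs)
```

The table has 8 entries no matter how large the virtual address space is, which is the point of inverting it.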

Address Translation Reality
The translation process using page tables takes too long!
Use a cache to hold recent translations
–Translation Lookaside Buffer (TLB)
  Typically 8-1024 entries
  Block size same as a page table entry (1 or 2 words)
  Only holds translations for pages in memory
  1 cycle hit time
  Highly or fully associative
  Miss rate < 1%
  Miss goes to main memory (where the whole page table lives)
  Must be purged on a process switch
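The behaviour listed above can be modelled as a small fully associative cache in front of the page table. This is a sketch only: the LRU replacement, the default capacity of 32, and the `TLB` class itself are assumptions for illustration, and the page table is a plain dictionary standing in for the table in main memory.

```python
from collections import OrderedDict


class TLB:
    """Fully associative translation cache: VPN -> PPN."""

    def __init__(self, capacity=32):
        self.capacity = capacity
        self.entries = OrderedDict()   # least recently used entry first
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn, page_table):
        """Return the PPN, filling the TLB from page_table on a miss."""
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)      # mark as most recently used
            return self.entries[vpn]
        self.misses += 1
        ppn = page_table[vpn]                  # miss goes to the page table in memory
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)   # evict the least recently used entry
        self.entries[vpn] = ppn
        return ppn

    def flush(self):
        """Purge all translations, e.g. on a process switch."""
        self.entries.clear()


tlb = TLB(capacity=2)
page_table = {1: 10, 2: 20, 3: 30}
for vpn in (1, 1, 2, 3):
    tlb.lookup(vpn, page_table)
print(tlb.hits, tlb.misses)   # 1 3
```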

Back to the 4 Questions
Q3: Block Replacement (pages in physical memory)
–LRU is best
  So use it to minimize the horrible miss penalty
–However, real LRU is expensive
  Page table contains a use tag
  On access the use tag is set
  OS checks them every so often, records what it sees, and resets them all
  On a miss, the OS decides who has been used the least
–Basic strategy: miss penalty is so huge, you can spend a few OS cycles to help reduce the miss rate
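The use-tag scheme can be sketched as follows. This is one simple way to "record what it sees", assuming the OS just counts the sampling intervals in which each page was used; the `UseBitTracker` class and its method names are hypothetical, and real systems use more refined variants.

```python
class UseBitTracker:
    """Approximates LRU with one use bit per page plus periodic OS sampling."""

    def __init__(self, pages):
        self.use_bit = {page: False for page in pages}
        self.history = {page: 0 for page in pages}   # intervals in which the page was used

    def access(self, page):
        """Hardware side: set the use tag on every access."""
        self.use_bit[page] = True

    def periodic_scan(self):
        """OS side: record each use bit, then reset them all."""
        for page, used in self.use_bit.items():
            if used:
                self.history[page] += 1
            self.use_bit[page] = False

    def choose_victim(self):
        """On a miss: pick the page that was used in the fewest intervals."""
        return min(self.history, key=self.history.get)


tracker = UseBitTracker(pages=[0, 1, 2])
tracker.access(0)
tracker.access(1)
tracker.periodic_scan()
tracker.access(0)
tracker.periodic_scan()
print(tracker.choose_victim())   # 2
```

Page 2 was never touched in either interval, so it is the replacement candidate.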

Last Question
Q4: Write Policy
–Always write-back
  Due to the access time of the disk
  So, you need to keep tags to show when pages are dirty and need to be written back to disk when they're swapped out.
–Anything else is pretty silly
–Remember: the disk is SLOW!
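A small sketch of the dirty tag in action. The `PageFrame` class, the `evict` function, and the list that stands in for the disk are all hypothetical names chosen for illustration.

```python
class PageFrame:
    """A resident page with the dirty tag described above."""

    def __init__(self, page_number):
        self.page_number = page_number
        self.dirty = False

    def write(self):
        # Only main memory is updated; the copy on disk is now stale.
        self.dirty = True


def evict(frame, disk_writes):
    """Swap a page out, writing it back to disk only if it is dirty.

    disk_writes is a list standing in for the disk: it collects the page
    numbers that actually had to be written.
    """
    if frame.dirty:
        disk_writes.append(frame.page_number)
        frame.dirty = False
    return frame.page_number


disk_writes = []
clean, modified = PageFrame(7), PageFrame(8)
modified.write()
evict(clean, disk_writes)
evict(modified, disk_writes)
print(disk_writes)   # [8]
```

Only the modified page costs a disk write; the clean page is simply dropped.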

Page Sizes
An architectural choice
Large pages are good:
–reduces page table size
–amortizes the long disk access
–if spatial locality is good then hit rate will improve
Large pages are bad:
–more internal fragmentation
  if everything is random, each structure's last page is only half full
  half of bigger is still bigger
  if there are 3 structures per process (text, heap, and control stack), then 1.5 pages are wasted for each process
–process start-up takes longer
  since at least 1 page of each type is required prior to start
  transfer time penalty aspect is higher
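"Half of bigger is still bigger" can be made concrete with the slide's own rule of thumb: three structures per process, each wasting half of its last page on average. The function name and the two page sizes compared are illustrative choices.

```python
def expected_waste_bytes(page_size_bytes, structures=3):
    """Expected internal fragmentation per process.

    Assumes, as on the slide, that the last page of each structure
    (text, heap, control stack) is half full on average.
    """
    return structures * page_size_bytes / 2


for page_size in (4 * 1024, 64 * 1024):
    waste_kb = expected_waste_bytes(page_size) / 1024
    print(f"{page_size // 1024} KB pages: {waste_kb:.0f} KB wasted per process")
# 4 KB pages: 6 KB wasted per process
# 64 KB pages: 96 KB wasted per process
```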

More on TLBs
The TLB must be on chip
–otherwise it is worthless
–small TLBs are worthless anyway
–large TLBs are expensive
  high associativity is likely
==> Price of CPUs is going up!
–OK as long as performance goes up faster

Address Translation with Page Table (Figure 5.40, pg. 444)
A page table translates a virtual page number into a physical page number
The page offset remains unchanged
Page tables are large
–32 bit virtual address
–4 KB page size
–2^20 4-byte table entries = 4 MB
Page tables are stored in main memory => slow
Cache table entries in a translation buffer

Fast Address Translation with Translation Buffer (TB) (Figure 5.41, pg. 446)
Cache translated addresses in TB
Alpha 21064 data TB
–32 entries
–fully associative
–30 bit tag
–21 bit physical address
–Valid and read/write bits
–Separate TB for instr.
Steps in translation
–compare page no. to tags
–check for memory access violation
–send physical page no. of matching tag
–combine physical page no. and page offset

Selecting a Page Size
Reasons for a larger page size
–Page table size is inversely proportional to the page size; therefore memory is saved
–Fast cache hit time is easy when cache size < page size (VA caches); a bigger page makes this feasible as cache size grows
–Transferring larger pages to or from secondary storage, possibly over a network, is more efficient
–The number of TLB entries is restricted by clock cycle time, so a larger page size maps more memory, thereby reducing TLB misses
Reasons for a smaller page size
–Want to avoid internal fragmentation: don't waste storage; data must be contiguous within a page
–Quicker process start for small processes: don't need to bring in more memory than needed

Memory Protection
With multiprogramming, a computer is shared by several programs or processes running concurrently
–Need to provide protection
–Need to allow sharing
Mechanisms for providing protection
–Provide base and bound registers: Base ≤ Address ≤ Bound
–Provide both user and supervisor (operating system) modes
–Provide CPU state that the user can read, but cannot write
  Base and bound registers, user/supervisor bit, exception bits
–Provide method to go from user to supervisor mode and vice versa
  system call: user to supervisor
  system return: supervisor to user
–Provide permissions for each page or segment in memory
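The base and bound check is the simplest of these mechanisms and can be sketched directly from the inequality Base ≤ Address ≤ Bound. The `ProtectionFault` exception and the `check_access` name are hypothetical; in hardware the comparison happens on every memory reference, and only supervisor-mode code may change the two registers.

```python
class ProtectionFault(Exception):
    """Raised when an address falls outside the process's allowed range."""


def check_access(address, base, bound):
    """Allow the access only if base <= address <= bound."""
    if not (base <= address <= bound):
        raise ProtectionFault(
            f"address {address:#x} outside [{base:#x}, {bound:#x}]"
        )
    return address


# Hypothetical register values for one user process.
BASE, BOUND = 0x4000, 0x7FFF

print(hex(check_access(0x5000, BASE, BOUND)))   # 0x5000
try:
    check_access(0x8000, BASE, BOUND)
except ProtectionFault as fault:
    print("fault:", fault)
```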

Alpha VM Mapping (Figure 5.43, pg. 451)
"64-bit" address divided into 3 segments
–seg0 (bit 63 = 0): user code
–seg1 (bit 63 = 1, 62 = 1): user stack
–kseg (bit 63 = 1, 62 = 0): kernel segment for OS
Three-level page table, each one page
–Reduces page table size
–Increases translation time
PTE bits: valid, kernel & user read & write enable
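A sketch of how a three-level table, each level one page, carves up the virtual address. The 43-bit virtual address and 8 KB page size are the figures this lecture gives for the Alpha; the 8-byte page table entry is an assumption of this sketch (it makes each 8 KB table hold 1024 entries, so each level consumes 10 bits), and the function name is hypothetical.

```python
PAGE_BITS = 13     # 8 KB pages
LEVEL_BITS = 10    # assumes 8-byte PTEs: 8 KB / 8 B = 1024 entries per table page
LEVELS = 3

# Sanity check: three 10-bit indices plus a 13-bit offset cover 43 bits.
assert LEVELS * LEVEL_BITS + PAGE_BITS == 43


def split_virtual_address(vaddr):
    """Return ([level1, level2, level3] table indices, page offset)."""
    offset = vaddr & ((1 << PAGE_BITS) - 1)
    vpn = vaddr >> PAGE_BITS
    indices = []
    for _ in range(LEVELS):
        indices.append(vpn & ((1 << LEVEL_BITS) - 1))
        vpn >>= LEVEL_BITS
    indices.reverse()          # top-level index first
    return indices, offset


vaddr = (1 << 33) | (2 << 23) | (3 << 13) | 5
print(split_virtual_address(vaddr))   # ([1, 2, 3], 5)
```

Each translation now walks three tables instead of one, which is the "increases translation time" cost noted above.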

Alpha 21064 Memory Hierarchy
The Alpha 21064 memory hierarchy includes
–A 32 entry, fully associative data TB
–A 12 entry, fully associative instruction TB
–An 8 KB direct-mapped physically addressed data cache
–An 8 KB direct-mapped physically addressed instruction cache
–A 4 entry by 64-bit instruction prefetch stream buffer
–A 4 entry by 256-bit write buffer
–A 2 MB direct-mapped second level unified cache
The virtual memory
–Maps a 43-bit virtual address to a 34-bit physical address
–Has a page size of 8 KB

Alpha Memory Performance: Miss Rates
[Figure: miss rates; labels 8K and 2M]

Alpha CPI Components
Largest increase in CPI due to
–I stall: instruction stalls from branch mispredictions
–Other: data hazards, structural hazards

Pitfall: Address Space Too Small
One of the biggest mistakes that can be made when designing an architecture is to devote too few bits to the address
–address size limits the size of virtual memory
–difficult to change since many components depend on it (e.g., PC, registers, effective-address calculations)
As program size increases, larger and larger address sizes are needed
–8 bit: Intel 8080 (1975)
–16 bit: Intel 8086 (1978)
–24 bit: Intel 80286 (1982)
–32 bit: Intel 80386 (1985)
–64 bit: Intel Merced (1998)

Pitfall: Predicting Cache Performance of One Program from Another Program
–4 KB data cache miss rate: 8%, 12%, or 28%?
–1 KB instruction cache miss rate: 0%, 3%, or 10%?
–Alpha vs. MIPS for 8 KB data cache: 17% vs. 10%

Pitfall: Simulating Too Small an Address Trace

Virtual Memory Summary
Virtual memory (VM) allows main memory (DRAM) to act like a cache for secondary storage (magnetic disk).
The large miss penalty of virtual memory leads to different strategies from cache
–Fully associative, TB + PT, LRU, Write-back
Designed as
–paged: fixed size blocks
–segmented: variable size blocks
–hybrid: segmented paging or multiple page sizes
Avoid small address size

Summary 2: Typical Choices
Option | TLB | L1 Cache | L2 Cache | VM (page)
Block Size | 4-8 bytes (1 PTE) | 4-32 bytes | 32-256 bytes | 4k-16k bytes
Hit Time | 1 cycle | 1-2 cycles | 6-15 cycles | 10-100 cycles
Miss Penalty | 10-30 cycles | 8-66 cycles | 30-200 cycles | 700k-6M cycles
Local Miss Rate | .1 - 2% | .5 - 20% | 13 - 15% | .00001 - .001%
Size | 32B - 8KB | 1 - 128 KB | 256KB - 16MB |
Backing Store | L1 Cache | L2 Cache | DRAM | Disks
Q1: Block Placement | Fully or set associative | DM | DM or SA | Fully associative
Q2: Block ID | Tag/block | Tag/block | Tag/block | Table
Q3: Block Replacement | Random (not last) | N.A. for DM | Random (if SA) | LRU/LFU
Q4: Writes | Flush on PTE write | Through or back | Write-back | Write-back
