

Slide 1: Computer Architecture – Virtual Memory (VM)
By Dan Tsafrir, 10/6/2011. Presentation based on slides by Lihu Rappoport.

Slide 2: http://www.youtube.com/watch?v=3ye2OXj32DM (funny beginning)

Slide 3: DRAM (dynamic random-access memory)
• Corsair 1333 MHz DDR3 laptop memory
• Price (at amazon.com): $43 for 4 GB, $79 for 8 GB
• "The physical memory"

Slide 4: VM – motivation
• Provides isolation between processes
  – Processes can run concurrently on a single machine
  – VM prevents them from accessing one another's memory
  – (But still allows for convenient sharing when required)
• Provides the illusion of large memory
  – VM size can be bigger than physical memory size
  – VM decouples the program from the real memory size (which can differ across machines)
• Provides the illusion of contiguous memory
  – Programmers need not worry about where exactly data is placed
• Allows for dynamic memory growth
  – Memory can be added to processes at runtime as needed
• Allows for memory overcommitment
  – The sum of VM spaces (across all processes) can be >= physical memory
  – DRAM is often one of the most costly parts of the system

Slide 5: VM – terminology
• Virtual address space
  – The space used by the programmer
  – "Ideal" = contiguous & as big as you'd like
• Physical address
  – The real, underlying physical memory address
  – Completely abstracted away by the OS/HW

Slide 6: VM – basic idea
• Divide memory (virtual & physical) into fixed-size blocks
  – "Page" = chunk of contiguous data in the virtual space
  – "Frame" = physical memory exactly big enough to hold one page
  – |page| = |frame| (same size)
  – Page size = power of 2 = 2^k bytes
  – By default, k=12 almost always => page size is 4KB
• While the virtual address space is contiguous
  – Pages can be mapped into arbitrary frames
• Pages can reside
  – In memory or on disk (hence, overcommitment)
• All programs are written using the VM address space
  – HW does on-the-fly translation from virtual to physical addresses
  – A page table is used to translate between virtual and physical addresses

Slide 7: VM – simplistic illustration
• Memory acts as a cache for the secondary storage (disk)
• Immediate advantages
  – Illusion of contiguity & of having more physical memory
  – The program's actual location is unimportant
  – Dynamic growth, isolation, & sharing are easy to obtain
[Figure: pages (virtual space) are mapped by address translation to frames (DRAM) or to disk]

Slide 8: Translation – use a "page table"
(page size is typically 2^12 bytes = 4KB)
• Virtual address (64 bit): virtual page number (52 bit) in bits [63:12], page offset (12 bit) in bits [11:0]
• Physical address (32 bit): physical frame number (20 bit) in bits [31:12], page offset (12 bit) in bits [11:0]
• How to map the virtual page number to the physical frame number?

Slide 9: Translation – use a "page table"
(page size is typically 2^12 bytes = 4KB)
• The page table is located via a page table base register; each entry holds:
  – V = valid bit, D = dirty bit, AC = access control, frame number

Slide 10: Translation – use a "page table"
(page size is typically 2^12 bytes = 4KB)
[Figure: the 52-bit virtual page number of the 64-bit virtual address indexes the page table (located via the page table base register); the selected entry's 20-bit physical frame number is concatenated with the 12-bit page offset to form the 32-bit physical address. Each entry holds V (valid bit), D (dirty bit), AC (access control), and the frame number.]

Slide 11: Translation – use a "page table"
• "PTE" (page table entry) = one row of the page table: V, D, AC, frame number
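To make the mapping concrete, here is a minimal C sketch of the lookup the slides describe. The 12-bit offset and the PTE fields (V, D, AC, frame) follow the slides; the flat page_table[] array and the page_fault() helper are hypothetical simplifications (real page tables are multi-level).

    #include <stdint.h>

    #define PAGE_SHIFT 12                        /* page size = 2^12 = 4KB */
    #define PAGE_MASK  ((1u << PAGE_SHIFT) - 1)

    typedef struct {
        uint32_t valid : 1;   /* V: is the page in memory?   */
        uint32_t dirty : 1;   /* D: modified since loaded?   */
        uint32_t ac    : 2;   /* AC: access control (R/RW/X) */
        uint32_t frame : 20;  /* physical frame number       */
    } pte_t;

    extern pte_t page_table[];              /* base kept in a register */
    extern void  page_fault(uint64_t va);   /* trap to the OS          */

    uint32_t translate(uint64_t va)
    {
        uint64_t vpn    = va >> PAGE_SHIFT;     /* virtual page number */
        uint32_t offset = va & PAGE_MASK;
        pte_t    pte    = page_table[vpn];

        if (!pte.valid)
            page_fault(va);                     /* page not in memory  */
        return ((uint32_t)pte.frame << PAGE_SHIFT) | offset;  /* 32-bit PA */
    }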

Slide 12: Page tables
[Figure: the virtual page number selects a page table entry, which points either to a physical memory frame (valid=1) or to a disk address (valid=0).]

Slide 13: Checks
• If (valid == 1), the page is in main memory at the frame address stored in the table
  – Data is readily available (e.g., can be copied into the cache)
• Else /* page fault */, the page must be fetched from disk
  – Causes a trap, usually accompanied by a context switch: the current process is suspended while the page is fetched from disk
• Access control
  – R = read-only, R/W = read/write, X = execute
  – If the access type is incompatible with the specified access rights => protection violation fault => trap to the fault handler
• Demand paging
  – Pages are fetched from secondary memory only upon the first fault
  – Rather than, e.g., upon file open
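A hedged sketch of the access-control check, in the same style as the translation sketch above (the 2-bit AC encodings AC_R/AC_RW/AC_X and the fault helper are assumptions, not a real ISA's encoding):

    enum ac { AC_R, AC_RW, AC_X };                  /* hypothetical encodings */
    typedef enum { READ, WRITE, EXEC } access_t;

    extern void protection_violation_fault(void);  /* traps to fault handler */

    /* pte_t as defined in the earlier translation sketch */
    void check_access(pte_t pte, access_t type)
    {
        int ok = (type == READ  && (pte.ac == AC_R || pte.ac == AC_RW))
              || (type == WRITE &&  pte.ac == AC_RW)
              || (type == EXEC  &&  pte.ac == AC_X);
        if (!ok)
            protection_violation_fault();
    }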

Slide 14: Page replacement
• Page replacement policy
  – Decides which page to evict to disk
• LRU (least recently used)
  – Typically too wasteful (must be updated upon each memory reference)
• FIFO (first in, first out)
  – Simplest: no need to update upon references, but ignores usage
• Second-chance
  – Set a per-page "was it referenced?" bit (can be done by HW or SW)
  – Swap out the first page with bit = 0, in FIFO order
  – When traversing, if bit = 1, set it to 0 and push the associated page to the end of the list (in FIFO terms, the page becomes newest)
• Clock (see the sketch below)
  – More efficient variant of second-chance
  – Pages are cyclically ordered (no FIFO); search clockwise for the first page with bit=0, setting bit=0 for pages that have bit=1
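A minimal C sketch of the clock variant, assuming a fixed array of frames with per-frame referenced bits; it only picks the victim (no I/O, no locking):

    #define NFRAMES 1024

    struct frame { int referenced; };     /* bit set by HW/SW on access */
    static struct frame frames[NFRAMES];
    static int hand = 0;                  /* the clock hand (cyclic)    */

    int clock_pick_victim(void)
    {
        for (;;) {
            int cur = hand;
            hand = (hand + 1) % NFRAMES;      /* advance clockwise         */
            if (!frames[cur].referenced)
                return cur;                   /* bit=0: evict this frame   */
            frames[cur].referenced = 0;       /* bit=1: give second chance */
        }
    }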

Slide 15: Page replacement – cont.
• NRU (not recently used)
  – A more sophisticated LRU approximation
  – HW or SW maintains per-page 'referenced' & 'modified' bits
  – Periodically (on a clock interrupt), SW turns 'referenced' off
  – The replacement algorithm partitions pages into:
    Class 0: not referenced, not modified
    Class 1: not referenced, modified
    Class 2: referenced, not modified
    Class 3: referenced, modified
  – Choose at random a page from the lowest nonempty class for removal
  – Underlying principles (order is important):
    Prefer keeping referenced over unreferenced
    Prefer keeping modified over unmodified
  – Can a page be modified but not referenced? (Yes: the periodic interrupt clears 'referenced' but leaves 'modified' set)
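The class partition above reduces to two bits, with 'referenced' as the more significant one; a one-line sketch in C (the random choice within the lowest nonempty class is omitted):

    /* 0 = not referenced, not modified ... 3 = referenced, modified */
    int nru_class(int referenced, int modified)
    {
        return (referenced << 1) | modified;
    }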

Slide 16: Page replacement – advanced
• ARC (adaptive replacement cache)
  – Factors in not only recency (when was the latest access) but also frequency (how many times accessed)
  – Adaptively balances the weight of the two factors (self-tuning)
  – Better (but more wasteful) than LRU
  – Developed at IBM by Nimrod Megiddo & Dharmendra Modha
  – Details: http://www.usenix.org/events/fast03/tech/full_papers/megiddo/megiddo.pdf
• CAR (clock with adaptive replacement)
  – Similar to ARC, and comparable in performance
  – But clock-based, so it avoids ARC's per-hit bookkeeping
  – Likewise developed at IBM, by Sorav Bansal & Dharmendra Modha
  – Details: http://www.usenix.org/events/fast04/tech/full_papers/bansal/bansal.pdf

Slide 17: Page faults
• Page fault: the data is not in memory => retrieve it from disk
  – The CPU detects the situation (valid=0)
  – But it cannot remedy the situation (it doesn't know about disks; that's the OS's job)
  – Thus, it must trap to the OS
  – The OS loads the page from disk
    Possibly writing a victim page to disk (if there's no room & the victim is dirty)
    Possibly avoiding the disk read thanks to the OS "buffer cache"
  – The OS updates the page table (valid=1)
  – The OS resumes the process; now the HW will retry & succeed!
• A page fault incurs a significant penalty
  – "Major" page fault = must go get the page from disk
  – "Minor" page fault = the page already resides in the OS buffer cache
    Possible only for files; not for "anonymous" spaces like the stack
  – => pages shouldn't be too small (as noted, typically 4KB)
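A commented sketch of the OS fault path described above, reusing page_table and PAGE_SHIFT from the translation sketch; all helpers (find_free_frame, pick_victim, and friends) are hypothetical names for the steps on the slide:

    extern int  find_free_frame(void);
    extern int  pick_victim(void);
    extern int  frame_is_dirty(int frame);
    extern void write_frame_to_disk(int frame);
    extern void fetch_page(uint64_t vpn, int frame);  /* from disk, or from the
                                                         buffer cache (minor fault) */
    void os_handle_page_fault(uint64_t va)
    {
        uint64_t vpn   = va >> PAGE_SHIFT;
        int      frame = find_free_frame();

        if (frame < 0) {                     /* no room: evict a victim */
            frame = pick_victim();
            if (frame_is_dirty(frame))
                write_frame_to_disk(frame);  /* only dirty victims are written back */
        }
        fetch_page(vpn, frame);
        page_table[vpn].frame = frame;       /* update the page table ... */
        page_table[vpn].valid = 1;           /* ... and let the HW retry   */
    }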

Slide 18: Page size
• Smaller page size (typically 4KB)
  – PROS: minimizes internal fragmentation
  – CONS: increases the size of the page table
• Bigger page size (called "superpages" if > 4KB)
  – PROS:
    Amortizes the disk access cost
    May prefetch useful data
    May discard useless data early
  – CONS:
    Increased fragmentation
    Might transfer unnecessary info at the expense of useful info
• Lots of work on increasing page size beyond 4KB
  – HW has supported it for years; the OS is the "bottleneck"
  – Attractive because of bigger DRAMs and the increasing memory/disk performance gap
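To make the tradeoff concrete, a worked example (the 32-bit address space and 4-byte PTE are assumptions): with 4KB pages, a flat table needs 2^32 / 2^12 = 2^20 PTEs, i.e., 4MB at 4 bytes per PTE; with 2MB superpages it needs only 2^32 / 2^21 = 2^11 = 2048 PTEs, but a region that uses a single byte of a page now wastes up to ~2MB to internal fragmentation instead of ~4KB.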

Slide 19: TLB (translation lookaside buffer)
• The page table resides in memory
  – Each translation requires a memory access
  – Might be required for each load/store!
• TLB
  – Caches recently used PTEs to speed up translation
  – Typically 128 to 256 entries
  – Usually 4- to 8-way associative
  – TLB access time is comparable to L1 cache access time
[Figure: on a TLB hit the physical address is produced directly; on a miss the page table is accessed.]
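A sketch of a 4-way set-associative lookup sized per the slide (256 entries => 64 sets); the entry layout and the modulo indexing are assumptions:

    #include <stdint.h>

    #define TLB_WAYS 4
    #define TLB_SETS 64                       /* 256 entries / 4 ways */

    struct tlb_entry { int valid; uint64_t tag; uint32_t frame; };
    static struct tlb_entry tlb[TLB_SETS][TLB_WAYS];

    int tlb_lookup(uint64_t vpn, uint32_t *frame)
    {
        uint64_t set = vpn % TLB_SETS;        /* low VPN bits pick the set  */
        uint64_t tag = vpn / TLB_SETS;        /* remaining bits are the tag */

        for (int w = 0; w < TLB_WAYS; w++)    /* HW compares all ways in parallel */
            if (tlb[set][w].valid && tlb[set][w].tag == tag) {
                *frame = tlb[set][w].frame;
                return 1;                     /* TLB hit */
            }
        return 0;                             /* TLB miss: walk the page table */
    }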

Slide 20: Making address translation fast
• The TLB is a cache for recent address translations

Slide 21: TLB access
[Figure: the virtual page number is split into tag and set; the set number selects a TLB set, the tags of all four ways are compared in parallel, and on a hit a way MUX outputs the matching PTE.]

Slide 22: Unified L2
• L2 is unified (no separation of data/instructions) – like main memory
  – On a miss in any of d-L1, i-L1, d-TLB, or i-TLB => try to get the missed data from L2
  – PTEs can and do reside in L2
[Figure: i-TLB and d-TLB translations feed the L1 instruction and data caches; both L1 caches and both TLBs fall back to the unified L2 cache, which falls back to memory.]

Slide 23: VM & cache
• TLB access is serial with cache access => its performance is crucial!
• Page table entries can be cached in the L2 cache (as data)
[Figure: the virtual address accesses the TLB; on a TLB miss, the page table is accessed via the L2 cache and, if needed, memory; the resulting physical address then accesses the L1 cache, falling back to L2 and memory on misses.]

Slide 24: Overlapped TLB & cache access
• Here the set number is NOT contained within the page offset
  – The set number is not known until the physical page number is known
  – The cache can be accessed only after address translation is done
• VM view of a physical address: physical page number in bits [29:12], page offset in bits [11:0]
• Cache view of the same physical address: tag in bits [29:14], set in bits [13:6], disp in bits [5:0]

Slide 25: Overlapped TLB & cache access (cont.)
• In the example below, the set number IS contained within the page offset
  – The set number is known immediately
  – The cache can be accessed in parallel with address translation
  – Once translation is done, match the upper bits against the tags
• Limitation: cache size ≤ (page size × associativity)
• VM view of a physical address: physical page number in bits [29:12], page offset in bits [11:0]
• Cache view of the same physical address: tag in bits [29:12], set in bits [11:6], disp in bits [5:0]
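A worked check of the limitation, using the layout above: with 64-byte lines and set bits [11:6] there are 2^6 = 64 sets, so each way spans 64 sets × 64 B = 4KB = exactly one page; hence an 8KB cache needs 2 ways, and a 32KB cache needs 8 ways, to keep the whole set index inside the page offset.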

Slide 26: Overlapped TLB & cache access (cont.)
[Figure: the virtual page number accesses the TLB while the set bits (taken from the page offset) index the cache in parallel; the physical page number from the TLB is then compared against the cache tags via a way MUX to determine hit/miss and select the data.]

Slide 27: Overlapped TLB & cache access (cont.)
• Assume the cache is 32KB, 2-way set-associative, 64 bytes/line
  – (2^15 / 2 ways) / (2^6 bytes/line) = 2^(15-1-6) = 2^8 = 256 sets
• To still allow overlap between set access and TLB access
  – Take the upper two bits of the set number from bits [1:0] of the VPN
• physical_addr[13:12] may differ from virtual_addr[13:12]
  – The tag comprises bits [31:12] of the physical address
  – The tag may thus mismatch bits [13:12] of the physical address
  – On a cache miss => allocate the missing line according to its virtual set address and physical tag
[Figure: page offset = bits [11:0]; disp = bits [5:0]; set = bits [13:6], with bits [13:12] taken from VPN[1:0]; the remaining upper bits form the tag.]
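A sketch of the virtually-indexed set computation this slide implies (bit positions as above; the function name is ours):

    /* 32KB, 2-way, 64B/line => 256 sets indexed by bits [13:6];
       bits [11:6] come from the page offset, bits [13:12] = VPN[1:0] */
    uint32_t cache_set(uint32_t va)
    {
        return (va >> 6) & 0xFF;
    }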

Slide 28: DMA (direct memory access)
• DMA copies a page from/to, e.g., a disk controller (or another I/O device)
  – Accesses memory without requiring CPU involvement
  – Assume we copy from memory to disk (swap out a page)
  – Read each relevant block:
    Snoop-invalidate it if it resides in a cache (L1, L2), meaning: if it is modified, copy the line from the cache into memory; invalidate the cache line
    Write the line to the disk controller
  – This means that when a page is swapped out of memory:
    All data in the caches belonging to that page is invalidated
    The page on disk is up to date
• In the page table
  – Assign 0 to the valid bit in the PTEs of swapped-out pages
  – The rest of the PTE bits may be used by the OS to keep the location of the page on disk
  – The TLB entry of a swapped-out page is likewise invalidated

Slide 29: Context switch
• Each process has its own address space
  – Akin to saying "each process has its own page table"
  – The OS allocates frames for a process => updates the process's page table
  – If only one PTE points to a frame throughout the system:
    Only the associated process can access the corresponding frame
  – Shared memory:
    Two PTEs of two processes point to the same frame
• Upon context switching (see the sketch below)
  – Save the current architectural state to memory:
    Architectural registers, including the register that holds the page table base address
  – Flush the TLB
    Since the same virtual addresses are routinely reused
    (Recently, "VPID" was added to the TLBs of some x86 processors => no need to flush)
  – Load the new architectural state from memory:
    Architectural registers, including the register that holds the page table base address
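A commented sketch of the switch sequence above; the context layout and every helper are hypothetical, and flush_tlb() stands for whatever the ISA provides (on x86, reloading CR3 flushes non-global TLB entries unless VPID/PCID tagging is used):

    #include <stdint.h>

    struct context { uint64_t regs[32]; uint64_t page_table_base; };

    extern void     save_registers(struct context *c);
    extern void     restore_registers(const struct context *c);
    extern uint64_t read_page_table_base(void);
    extern void     load_page_table_base(uint64_t base);
    extern void     flush_tlb(void);

    void context_switch(struct context *prev, struct context *next)
    {
        save_registers(prev);                          /* architectural state        */
        prev->page_table_base = read_page_table_base();

        flush_tlb();                                   /* skip if TLB is VPID-tagged */
        load_page_table_base(next->page_table_base);
        restore_registers(next);                       /* resume the new process     */
    }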

Slide 30: Virtually-addressed cache
• The cache uses virtual addresses (tags are virtual)
• Address translation is required only on a cache miss
  – The TLB is not on the cache-hit path! But…
• Aliasing: two or more virtual addresses mapped to the same physical address
  – => two or more cache lines holding data of the same physical address
  – => must update all cache entries that share that physical address
[Figure: the CPU issues a VA to the cache; on a hit, data returns directly; on a miss, the VA is translated to a PA that accesses main memory.]

Slide 31: Virtually-addressed cache
• The cache must be flushed at a task switch
  – Possible solution: include a unique process ID (PID) in the tag (like the VPID we discussed earlier)

