Computer Architecture Virtual Memory


1 Computer Architecture Virtual Memory
Dr. Lihu Rappoport

2 Virtual Memory
Provides the illusion of a large memory
- Different machines have different amounts of physical memory
  - Allows programs to run regardless of the actual physical memory size
- The amount of memory consumed by each process is dynamic
  - Allows adding memory as needed
- Many processes can run on a single machine
  - Provides each process its own memory space
  - Prevents a process from accessing the memory of other processes running on the same machine
  - Allows the sum of the memory spaces of all processes to be larger than physical memory
Basic terminology
- Virtual Address Space: the address space used by the programmer
- Physical Address Space: the actual physical memory address space

3 Virtual Memory: Basic Idea
- Divide memory (virtual and physical) into fixed-size blocks
  - Pages in virtual space, frames in physical space
  - Page size = frame size
  - Page size is a power of 2: page size = 2^k
- All pages in the virtual address space are contiguous
- Pages can be mapped into physical frames in any order
- Some of the pages are in main memory (DRAM), some of the pages are on disk
- All programs are written using the virtual memory address space
- The hardware does on-the-fly translation between the virtual and physical address spaces
  - Uses a page table to translate between virtual and physical addresses

4 Virtual Memory
- Main memory can act as a cache for the secondary storage (disk)
- Advantages:
  - illusion of having more physical memory
  - program relocation
  - protection
[Figure: address translation maps virtual addresses either to physical addresses in main memory or to disk addresses]

5 Virtual to Physical Address Translation
[Figure: the virtual address is split into a virtual page number (bits 63:12) and a page offset (bits 11:0); the page table base register points to the page table, whose entries hold a V (valid) bit, a D (dirty) bit, AC (access control) bits, and the physical frame number; the frame number replaces the virtual page number to form the physical address: frame number (bits 31:12), page offset (bits 11:0)]
Page size: 2^12 bytes = 4 KB
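The split described above can be sketched in a few lines of Python. This is a minimal illustration, not a hardware description: the page table here is a plain dictionary mapping a VPN to a hypothetical entry with a valid bit and a frame number.

```python
PAGE_SHIFT = 12                  # page size = 2^12 = 4 KB
PAGE_SIZE = 1 << PAGE_SHIFT

def translate(vaddr, page_table):
    """Split a virtual address into VPN and offset, then look up the frame."""
    vpn = vaddr >> PAGE_SHIFT            # virtual page number (upper bits)
    offset = vaddr & (PAGE_SIZE - 1)     # page offset (lower 12 bits)
    entry = page_table.get(vpn)
    if entry is None or not entry["valid"]:
        raise RuntimeError("page fault")  # would trap to the OS
    return (entry["frame"] << PAGE_SHIFT) | offset

# hypothetical mapping: virtual page 2 -> physical frame 7
pt = {2: {"valid": True, "frame": 7}}
paddr = translate(0x2ABC, pt)    # VPN = 2, offset = 0xABC
```

Note that only the page number is translated; the offset passes through unchanged.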

6 Page Tables
[Figure: the virtual page number indexes the page table; entries with the valid bit set hold the physical page address in main memory, while entries with the valid bit clear hold the page's disk address]

7 Address Mapping Algorithm
- If V = 1, the page is in main memory at the frame address stored in the table
  => fetch the data
- Else (page fault): need to fetch the page from disk
  => causes a trap, usually accompanied by a context switch: the current process is suspended while the page is fetched from disk
- Access control (R = read-only, R/W = read/write, X = execute-only)
  - If the kind of access is not compatible with the specified access rights, a protection violation fault occurs
  => causes a trap to a hardware or software fault handler
- A missing item is fetched from secondary memory only on the occurrence of a fault => demand load policy
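The mapping algorithm above, including the access-control check, can be sketched as follows. The entry layout (V, AC, frame) is illustrative; real hardware raises these conditions as traps rather than exceptions.

```python
class PageFault(Exception):
    """Page not present: OS must fetch it from disk (demand load)."""

class ProtectionViolation(Exception):
    """Access kind incompatible with the page's access rights."""

def access(pte, kind):
    """kind is 'R', 'W', or 'X'; pte holds a V bit, an AC field, and a frame."""
    if not pte["V"]:
        raise PageFault()                # trap; usually a context switch follows
    ac = pte["AC"]                       # 'R', 'R/W', or 'X'
    allowed = ((kind == "R" and ac in ("R", "R/W")) or
               (kind == "W" and ac == "R/W") or
               (kind == "X" and ac == "X"))
    if not allowed:
        raise ProtectionViolation()      # trap to the fault handler
    return pte["frame"]                  # fetch data from this frame
```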

8 Page Replacement Algorithm
Not Recently Used (NRU)
- Associated with each page is a reference flag: ref flag = 1 if the page has been referenced in the recent past
- If replacement is needed, choose any page frame whose reference bit is 0 - a page that has not been referenced in the recent past
- Clock implementation of NRU:
    while (PT[LRP].ref == 1) {
        PT[LRP].ref = 0;
        LRP = (LRP + 1) mod table_size;
    }
- Possible optimization: search for a page that is both not recently referenced AND not dirty
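A minimal Python sketch of the clock loop above. The page table is modeled as a list of entries with a `ref` bit; the function clears reference bits as the hand sweeps past, and stops at the first page with ref = 0 (the victim).

```python
def clock_choose(pt, hand):
    """pt: list of {'ref': 0/1} entries; hand: current clock position.
    Returns (victim index, new hand position)."""
    n = len(pt)
    while pt[hand]["ref"]:
        pt[hand]["ref"] = 0          # clear: page loses its "recently used" status
        hand = (hand + 1) % n        # advance the clock hand
    return hand, (hand + 1) % n      # victim found; hand moves past it
```

Pages referenced again before the hand returns get a second chance, which is why this approximates LRU cheaply.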

9 Page Faults
- Page fault: the data is not in memory => retrieve it from disk
  - The CPU must detect the situation
  - The CPU cannot remedy the situation (it has no knowledge of the disk)
  - The CPU must trap to the operating system so that it can remedy the situation
    - Pick a page to discard (possibly writing it to disk)
    - Load the page in from disk
    - Update the page table
    - Resume the program so the hardware will retry and succeed
- A page fault incurs a huge miss penalty
  - Pages should be fairly large (e.g., 4 KB)
  - Faults can be handled in software instead of hardware
  - A page fault causes a context switch
  - Using write-through is too expensive, so we use write-back

10 Optimal Page Size
Minimize wasted storage
- A small page minimizes internal fragmentation
- A small page increases the size of the page table
Minimize transfer time
- Large pages (multiple disk sectors) amortize the access cost
- Sometimes transfer unnecessary info
- Sometimes prefetch useful data
- Sometimes discard useless data early
General trend toward larger pages because of:
- big, cheap RAM
- the increasing memory/disk performance gap
- larger address spaces

11 Translation Lookaside Buffer (TLB)
- The page table resides in memory => each translation requires a memory access
- TLB
  - Caches recently used PTEs
  - Speeds up translation
  - Typically 128 to 256 entries
  - Usually 4- to 8-way associative
  - TLB access time is comparable to L1 cache access time
[Figure: the virtual address accesses the TLB; on a TLB hit the physical address is produced directly, on a miss the page table is accessed]

12 Making Address Translation Fast
The TLB is a cache for recent address translations.
[Figure: each TLB entry holds a valid bit, a tag, and a physical page number; the page table maps each virtual page number either to a physical page in memory (valid = 1) or to a disk address (valid = 0)]

13 TLB Access
[Figure: the virtual page number is split into tag and set fields; the set field selects a TLB set, the tag is compared against the tag of each way, and a way MUX selects the matching PTE, producing a hit/miss indication]

14 Processor Caches
- L2 is unified - it holds both code and data, like main memory
- On a miss in either the L1 data cache, the L1 instruction cache, the data TLB, or the instruction TLB, try to get the missed data from L2 first
[Figure: the L1 data cache, L1 instruction cache, data TLB, and instruction TLB all fill from the unified L2 cache, which is backed by memory]

15 Virtual Memory And Cache
- TLB access is serial with cache access: the virtual address first accesses the TLB (on a TLB miss, the page table in memory), and the resulting physical address then accesses the L1 cache, then L2, then memory
- Page table entries can be cached in the L2 cache (as data)
[Figure: virtual address -> TLB hit? -> physical address -> L1 cache -> L2 cache -> memory, with the page table in memory consulted on a TLB miss]

16 Overlapped TLB & Cache Access
Virtual memory view of a physical address: physical page number (bits 29:12), page offset (bits 11:0)
Cache view of a physical address: tag (bits 29:14), set (bits 13:6), disp (bits 5:0)
- The set number is not contained within the page offset
- The set number is not known until the physical page number is known
- The cache can be accessed only after address translation is done

17 Overlapped TLB & Cache Access (cont)
Virtual memory view of a physical address: physical page number (bits 29:12), page offset (bits 11:0)
Cache view of a physical address: tag (bits 29:12), set (bits 11:6), disp (bits 5:0)
- In the above example the set number is contained within the page offset
- The set number is known immediately
- The cache can be accessed in parallel with address translation
- Once translation is done, match the upper bits against the tags
- Limitation: cache size ≤ (page size × associativity)
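The condition for overlap can be checked with simple bit arithmetic: the set and displacement fields together must fit inside the page offset. A small sketch (parameter names are illustrative):

```python
import math

def overlap_ok(cache_bytes, ways, line_bytes, page_bytes):
    """True iff the set index is fully contained in the page offset,
    i.e. cache size <= page size * associativity."""
    sets = cache_bytes // (ways * line_bytes)
    set_bits = int(math.log2(sets))
    disp_bits = int(math.log2(line_bytes))
    offset_bits = int(math.log2(page_bytes))
    return set_bits + disp_bits <= offset_bits

# 16 KB 4-way cache with 64-byte lines and 4 KB pages: 64 sets,
# 6 set bits + 6 disp bits = 12 bits = page offset -> overlap works
print(overlap_ok(16 * 1024, 4, 64, 4096))
```

Equivalently, `overlap_ok` returns True exactly when `cache_bytes <= page_bytes * ways`.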

18 Overlapped TLB & Cache Access (cont)
[Figure: the set and disp fields of the page offset index the cache while the TLB translates the virtual page number in parallel; the physical page number from the TLB is compared against the cache tags, and a way MUX selects the hit data]

19 Overlapped TLB & Cache Access (cont)
- Assume the cache is 32 KB, 2-way set-associative, with 64 bytes/line
  - (2^15 bytes / 2 ways) / (2^6 bytes/line) = 2^8 = 256 sets
- In order to still allow overlap between set access and TLB access:
  - Take the upper two bits of the set number from bits [1:0] of the VPN
    - physical_addr[13:12] may be different from virtual_addr[13:12]
  - The tag comprises bits [31:12] of the physical address
    - The tag may mismatch bits [13:12] of the physical address
  - On a cache miss, allocate the missing line according to its virtual set address and physical tag
[Figure: physical address = physical page number (29:12) | page offset (11:0); cache view = tag (29:14) | set (13:6) | disp (5:0), with the top two set bits taken from VPN[1:0]]
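The arithmetic in this example can be verified directly. The sketch below derives the number of sets and how many set-index bits must be borrowed from the VPN when they exceed the page offset:

```python
cache_bytes, ways, line = 32 * 1024, 2, 64
sets = cache_bytes // (ways * line)        # (2^15 / 2) / 2^6 = 256 sets
set_bits = sets.bit_length() - 1           # 8 set bits -> addr[13:6]
disp_bits = line.bit_length() - 1          # 6 displacement bits -> addr[5:0]
offset_bits = 12                           # 4 KB page offset -> addr[11:0]

# set bits that fall above the page offset must come from the VPN
from_vpn = set_bits + disp_bits - offset_bits   # 2 bits, i.e. VPN[1:0]
```

Those two borrowed bits are virtual, which is exactly why the allocated line's set address may differ from its physical bits [13:12].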

20 More On Page Swap-Out
- DMA copies the page to the disk controller
  - Reads each byte:
    - Executes a snoop-invalidate for each byte in the cache (both L1 and L2)
    - If the byte resides in the cache: if it is modified, reads its line from the cache into memory, then invalidates the line
  - Writes the byte to the disk controller
- This means that when a page is swapped out of memory:
  - All data in the caches belonging to that page is invalidated
  - The page on the disk is up to date
- The TLB is snooped
  - If the TLB hits for the swapped-out page, the TLB entry is invalidated
- In the page table
  - The valid bit in the PTE of the swapped-out page is set to 0
  - All the rest of the bits in the PTE may be used by the operating system for keeping the location of the page on the disk

21 Context Switch
- Each process has its own address space
  - Each process has its own page table
  - The OS allocates frames in physical memory to each process and updates each process's page table
  - A process cannot access physical memory allocated to another process
    - Unless the OS deliberately allocates the same physical frame to 2 processes (for memory sharing)
- On a context switch
  - Save the current architectural state to memory
    - Architectural registers
    - The register that holds the page table base address in memory
  - Flush the TLB
  - Load the new architectural state from memory

22 Virtually-Addressed Cache
- The cache uses virtual addresses (tags are virtual)
  - Address translation is required only on a cache miss
  - The TLB is not in the path to a cache hit
- Aliasing: 2 different virtual addresses mapped to the same physical address
  - Two different cache entries holding data for the same physical address
  - Must update all cache entries with the same physical address data
[Figure: the CPU accesses the cache with a virtual address; on a hit, data is returned directly; on a miss, translation produces the physical address used to access main memory]

23 Virtually-Addressed Cache (cont)
- The cache must be flushed at a task switch
  - Solution: include the process ID (PID) in the tag
- How to share memory among processes
  - Permit multiple virtual pages to refer to the same physical frame
  - Problem: incoherence if the aliases map to different cache entries
  - Solution: require sufficiently many common virtual LSBs
    - With a direct-mapped cache, this guarantees that the aliases all map to the same cache entry

24 Paging in x86

25 x86 Paging - Virtual Memory
- A page can be: not yet loaded, loaded, or on disk
- A loaded page can be dirty or clean
- When a page is not loaded (P bit clear) => a page fault occurs
  - It may require evicting a loaded page to make room for the new one
    - The OS prioritizes eviction by LRU and the dirty/clean/avail bits
    - A dirty page must be written to disk; a clean page need not be
  - The new page is either loaded from disk or "initialized"
- The CPU sets the page's "accessed" flag when it is accessed, and its "dirty" flag when it is written

26 Page Tables and Directories in 32-bit Mode
- x86 supports both 4-KByte and 4-MByte pages
  - When extended (36-bit) physical addressing is used, either 4-KByte and 2-MByte pages, or 4-MByte pages only, are supported
- CR3 points to the current page directory (changed per process)
- Page directory
  - Holds up to 1024 page-directory entries (PDEs), each 32 bits
  - Each PDE contains a PS (page size) flag
    - 0 - the entry points to a page table whose entries in turn point to 4-KByte pages
    - 1 - the entry points directly to a 4-MByte page (PSE=1, PS=1) or a 2-MByte page (PAE=1, PS=1)
- Page table
  - Holds up to 1024 page-table entries (PTEs), each 32 bits
  - Each PTE points to a 4-KByte page in physical memory

27 Linear Address Space (4K Page)
4-KByte page mapping
- 2-level hierarchical mapping: page directory and page tables
- All pages and page tables are 4 KB aligned
- Addresses up to 2^20 4-KByte pages
- The linear address is divided into:
  - Dir (10 bits) - points to a PDE in the page directory
    - PS bit in PDE = 0 => the PDE provides a 20-bit, 4-KByte-aligned base physical address of a page table
    - Present bit in PDE = 0 => page fault
  - Table (10 bits) - points to a PTE in the page table
    - The PTE provides a 20-bit, 4-KByte-aligned base physical address of a 4-KByte page
  - Offset (12 bits) - offset within the selected 4-KByte page
[Figure: linear address = DIR (31:22) | TABLE (21:12) | OFFSET (11:0); CR3 (PDBR) points to the 1K-entry page directory; the PDE selects a 1K-entry page table; the PTE selects the 4-KByte data page; 20-bit base + 12-bit offset = 32-bit physical address]
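The two-level walk described above can be sketched in Python. The data structures are stand-ins (dictionaries rather than in-memory tables), but the index extraction matches the DIR | TABLE | OFFSET split:

```python
def walk_4k(linear, page_dir, page_tables):
    """x86 2-level walk for 4 KB pages: DIR (31:22) | TABLE (21:12) | OFFSET (11:0).
    page_dir maps a dir index to a page-table id; page_tables maps
    (table id, table index) to a 20-bit frame base. Both are illustrative."""
    dir_i   = (linear >> 22) & 0x3FF   # 10-bit index into the page directory
    table_i = (linear >> 12) & 0x3FF   # 10-bit index into the page table
    offset  = linear & 0xFFF           # 12-bit offset within the page
    pde = page_dir[dir_i]              # PDE selects a page table
    pte = page_tables[(pde, table_i)]  # PTE supplies the page's frame base
    return (pte << 12) | offset        # 20-bit base + 12-bit offset
```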

28 Linear Address Space (4MB Page)
4-MByte page mapping
- The PDE directly maps a 4-MByte page
- The linear address is divided into:
  - Dir (10 bits) - points to a PDE in the page directory
    - PS in the PDE = 1 => the PDE provides a 10-bit, 4-MByte-aligned base physical address of a 4-MByte page
    - Present bit in PDE = 0 => page fault
  - Offset (22 bits) - offset within the selected 4-MByte page
- Mixing 4-KByte and 4-MByte pages
  - When the PSE flag in CR4 is set, both 4-MByte pages and page tables for 4-KByte pages are supported
  - If PSE is clear, 4-MByte pages are not supported (the PS flag setting in PDEs is ignored)
  - The processor maintains 4-MByte page entries and 4-KByte page entries in separate TLBs
    - Placing often-used code such as the kernel in a large page frees up 4-KByte-page TLB entries
    - Reduces TLB misses and improves overall system performance
[Figure: linear address = DIR (31:22) | OFFSET (21:0); CR3 (PDBR) points to the page directory; the PDE selects the 4-MByte data page]

29 PDE and PTE Format
[Figure: 32-bit PDE/PTE: page frame address (31:12), AVAIL - available for OS use (11:9), reserved for future use, should be zero, D - dirty (PTE only), A - accessed, PCD - cache disable, PWT - write-through, U - user, W - writable, P - present; in a PDE, a page size flag (0: 4 KByte) is also defined]
- 20-bit pointer to a 4-KByte-aligned address
- 12 bits of flags
  - Virtual memory: Present; Accessed, Dirty
  - Protection: Writable (R#/W), User (U/S#) - 2 levels/types only
  - Caching: Page Write-Through, Page Cache Disabled
  - 3 bits for OS usage

30 PTE (4-KByte Page) Format
[Figure: 32-bit PTE: page base address (31:12), AVAIL - available for OS use (11:9), G - global page (8), PAT - page table attribute index (7), D - dirty (6), A - accessed (5), PCD - cache disable (4), PWT - write-through (3), U/S - user/supervisor (2), R/W - writable (1), P - present (0)]

31 PDE (4-KByte Page Table) Format
[Figure: 32-bit PDE: page table base address (31:12), AVAIL - available for OS use (11:9), G - global page, ignored (8), PS - page size, 0 indicates 4 KBytes (7), AVL - available (6), A - accessed (5), PCD - cache disable (4), PWT - write-through (3), U/S - user/supervisor (2), R/W - writable (1), P - present (0)]

32 PDE (4-MByte Page) Format
[Figure: 32-bit PDE: page base address (31:22), reserved (21:13), PAT - page table attribute index (12), AVAIL - available for OS use (11:9), G - global page (8), PS - page size, 1 indicates 4 MBytes (7), D - dirty (6), A - accessed (5), PCD - cache disable (4), PWT - write-through (3), U/S - user/supervisor (2), R/W - writable (1), P - present (0)]

33 Page Table - Virtual Mem. Attributes
- Present (P) flag
  - When set, the page is in physical memory and address translation is carried out
  - When clear, the page is not in memory
    - If the processor attempts to access the page, it generates a page-fault exception
    - Bits 1 through 31 become available to software
  - The processor does not set or clear this flag; it is up to the OS to maintain its state
- If the processor generates a page-fault exception, the OS generally needs to carry out the following operations:
  - Copy the page from disk into physical memory
  - Load the page address into the PTE/PDE and set its present flag
    - Other flags, such as the dirty and accessed flags, may also be set at this time
  - Invalidate the current PTE in the TLB
  - Return from the page-fault handler to restart the interrupted program (or task)
- Page size (PS) flag, in PDEs only
  - Determines the page size
  - When clear, the page size is 4 KBytes and the PDE points to a page table
  - When set, the page size is 4 MBytes / 2 MBytes (PAE=1), and the PDE points to a page

34 Page Table - Virtual Mem. Attributes
- Accessed (A) flag and Dirty (D) flag
  - The OS typically clears these flags when a page/page table is initially loaded into physical memory
  - The processor sets the A flag the first time a page/page table is accessed (read or written)
  - The processor sets the D flag the first time a page is accessed for a write operation
    - The D flag is not used in PDEs that point to page tables
  - Both the A and D flags are sticky
    - Once set, the processor does not implicitly clear them - only software can clear them
  - Used by the OS to manage the transfer of pages/page tables into and out of physical memory
- Global (G) flag
  - Indicates a global page when set (and the page global enable (PGE) flag in register CR4 is set)
  - 1: the PTE/PDE is not invalidated in the TLB when register CR3 is loaded on a task switch
    - Prevents frequently used pages (e.g., the OS) from being flushed from the TLB
  - Only software can set or clear this flag
  - Ignored for PDEs that point to page tables (the global attribute of a page is set in its PTEs)

35 Page Table - Caching Attributes
- Page-level write-through (PWT) flag
  - Controls the write-through or write-back caching policy of the page/page table
    - 1: enables write-through caching; 0: enables write-back caching
  - Ignored if the CD (cache disable) flag in CR0 is set
- Page-level cache disable (PCD) flag
  - Controls the caching of individual pages/page tables
    - 1: caching of the associated page/page table is prevented
      - Used for pages that contain memory-mapped I/O ports or that do not provide a performance benefit when cached
    - 0: the page/page table can be cached
  - Ignored (assumed set) if the CD (cache disable) flag in CR0 is set
- Page attribute table index (PAT) flag
  - Used along with the PCD and PWT flags to select an entry in the PAT, which in turn selects the memory type for the page

36 Page Table - Protection Attributes
- Read/write (R/W) flag
  - Specifies the read-write privileges for a page or group of pages (in the case of a PDE that points to a page table)
    - 0: the page is read-only
    - 1: the page can be read and written
- User/supervisor (U/S) flag
  - Specifies the user-supervisor privileges for a page or group of pages (in the case of a PDE that points to a page table)
    - 0: supervisor privilege level
    - 1: user privilege level

37 Misc Issues
- Memory aliasing
  - Memory aliasing is supported by allowing two PDEs to point to a common PTE
  - Software that implements memory aliasing must manage the consistency of the accessed and dirty bits in the PDE and PTE
    - Inconsistency may lead to a processor deadlock
- Base address of the page directory
  - The physical address of the current page directory is stored in the CR3 register
    - Also called the page directory base register, or PDBR
  - The PDBR is typically loaded with a new value as part of a task switch
  - The page directory pointed to by the PDBR may be swapped out of physical memory
    - The OS must ensure that the page directory indicated by the PDBR image in a task's TSS is present in physical memory before the task is dispatched
    - The page directory must also remain in memory as long as the task is active

38 PAE - Physical Address Extension
- PAE (physical address extension) flag in CR4
  - Extends physical addresses to MAXPHYADDR
    - Allows referencing physical addresses above 2^32 = 4 GB
  - Only used when paging is enabled
- When PAE = 1
  - 4-KByte and 2-MByte page sizes are supported
  - Relies on an additional page directory pointer table
    - Lies above the page directory in the translation hierarchy
    - Has 4 entries of 64 bits each to support up to 4 page directories
    - Each entry provides a 27-bit base address field for a page directory
  - PTEs are increased to 64 bits to accommodate 36-bit base physical addresses
    - Each 4-KByte page directory and page table can thus have up to 512 entries
  - CR3 contains the page-directory-pointer-table base address
    - Provides the 27 most significant bits of the physical address of the first byte of the page-directory-pointer table, forcing the table to be located on a 32-byte boundary
    - Can be referred to as the page-directory-pointer-table register (PDPTR)

39 4KB Page Mapping with PAE
The linear address is divided into:
- Page-directory-pointer-table entry
  - Indexed by bits 31:30 of the linear address
  - Provides an offset to one of the 4 entries in the page-directory-pointer table
  - The selected entry provides the base physical address of a page directory
- Dir (9 bits) - points to a PDE in the page directory
  - PS in the PDE = 0 => the PDE provides a 27-bit, 4-KByte-aligned base physical address of a page table
- Table (9 bits) - points to a PTE in the page table
  - The PTE provides a 24-bit, 4-KByte-aligned base physical address of a 4-KByte page
- Offset (12 bits) - offset within the selected 4-KByte page
[Figure: linear address = Dir ptr (31:30) | DIR (29:21) | TABLE (20:12) | OFFSET (11:0); CR3 (PDPTR, 32-byte aligned) points to the 4-entry page directory pointer table; the selected entry points to a 512-entry page directory; the PDE points to a 512-entry page table; the PTE points to the 4-KByte data page]

40 2MB Page Mapping with PAE
The linear address is divided into:
- Page-directory-pointer-table entry
  - Indexed by bits 31:30 of the linear address
  - Provides an offset to one of the 4 entries in the page-directory-pointer table
  - The selected entry provides the base physical address of a page directory
- Dir (9 bits) - points to a PDE in the page directory
  - PS in the PDE = 1 => the PDE provides a 15-bit, 2-MByte-aligned base physical address of a 2-MByte page
- Offset (21 bits) - offset within the selected 2-MByte page
[Figure: linear address = Dir ptr (31:30) | DIR (29:21) | OFFSET (20:0); CR3 (PDPTR, 32-byte aligned) points to the 4-entry page directory pointer table; the selected entry points to a 512-entry page directory; the PDE points to the 2-MByte data page]

41 PTE/PDE/PDP Entry Format with PAE
The major differences in these entries are as follows:
- A page-directory-pointer-table entry is added
- The size of the entries is increased from 32 bits to 64 bits
- The maximum number of entries in a page directory or page table is 512
- The base physical address field in each entry is extended to 24 bits

42 Paging in 64-bit Mode
- The PAE paging structures are expanded
  - Potentially support mapping a 64-bit linear address to a 52-bit physical address
  - The first implementation supports mapping a 48-bit linear address into a 40-bit physical address
- A 4th page mapping table is added: the page map level 4 table (PML4)
  - The base physical address of the PML4 is stored in CR3
  - A PML4 entry contains the base physical address of a page directory pointer table
- The page directory pointer table is expanded to 512 eight-byte entries
  - Indexed by 9 bits of the linear address
- The size of the PDE/PTE tables remains 512 eight-byte entries, each indexed by nine linear-address bits
- The total number of linear-address index bits becomes 48
- The PS flag in PDEs selects between 4-KByte and 2-MByte page sizes
  - The CR4.PSE bit is ignored

43 4KB Page Mapping in 64-bit Mode
[Figure: linear address = sign ext. (63:48) | PML4 (47:39) | PDP (38:30) | DIR (29:21) | TABLE (20:12) | OFFSET (11:0); CR3 (4-KByte aligned) points to the 512-entry PML4 table; the PML4 entry points to a 512-entry page directory pointer table; the PDP entry points to a 512-entry page directory; the PDE points to a 512-entry page table; the PTE points to the 4-KByte data page]

44 2MB Page Mapping in 64-bit Mode
[Figure: linear address = sign ext. (63:48) | PML4 (47:39) | PDP (38:30) | DIR (29:21) | OFFSET (20:0); CR3 (4-KByte aligned) points to the 512-entry PML4 table; the PML4 entry points to a 512-entry page directory pointer table; the PDP entry points to a 512-entry page directory; the PDE points to the 2-MByte data page]

45 PTE/PDE/PDP/PML4 Entry Format – 4KB Pages

46 TLBs
- The processor saves the most recently used PDEs and PTEs in TLBs
  - Separate TLBs for data and instructions
  - Separate TLBs for 4-KByte and 2/4-MByte page sizes
- The OS, running at privilege level 0, can invalidate TLB entries
  - The INVLPG instruction invalidates a specific PTE in the TLB
    - This instruction ignores the setting of the G flag
  - Whenever a PDE/PTE is changed (including when the present flag is set to zero), the OS must invalidate the corresponding TLB entry
  - All (non-global) TLB entries are automatically invalidated when CR3 is loaded
- The global (G) flag prevents frequently used pages from being automatically invalidated on a task switch
  - The entry remains in the TLB indefinitely
  - Only INVLPG can invalidate a global page entry

47 Paging in VAX

48 VM in VAX: Address Format
Virtual address: space (bits 31:30) | virtual page number (bits 29:9) | page offset (bits 8:0)
- 00: P0 process space (code and data)
- 01: P1 process space (stack)
- 10: S0 system space
- 11: S1
Physical address: physical frame number (bits 29:9) | page offset (bits 8:0)
Page size: 2^9 bytes = 512 bytes
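The VAX address split above can be sketched directly; the field widths follow the format on this slide (2-bit space selector, 21-bit VPN, 9-bit offset for 512-byte pages):

```python
def vax_decode(vaddr):
    """Decode a VAX virtual address: space (31:30) | VPN (29:9) | offset (8:0)."""
    space = {0: "P0", 1: "P1", 2: "S0", 3: "S1"}[(vaddr >> 30) & 0x3]
    vpn = (vaddr >> 9) & ((1 << 21) - 1)   # 21-bit virtual page number
    offset = vaddr & 0x1FF                 # 9-bit offset (512-byte pages)
    return space, vpn, offset
```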

49 VM in VAX: Virtual Address Spaces
[Figure: each process (Process0 ... Process3) has its own P0 and P1 spaces and shares the system space: P0 process space (code & global vars) grows upward; P1 process space (stack & local vars) grows downward; S0 system space grows upward and is generally static; process addresses run up to 7FFFFFFF]

50 Page Table Entry (PTE)
[Figure: PTE format: V (31) | PROT | M | Z | OWN | S | physical frame number (20:0)]
- Valid bit = 1 if the page is mapped to main memory; otherwise a page fault occurs:
  - The page is in the disk swap area
  - The address field indicates the page's location on the disk
- 4 protection bits
- Modified bit
- 3 ownership bits
- Z bit: indicates whether the page was cleaned (zeroed)

51 System Space Address Translation
- Get the PTE: PTE physical address = SBR (system page table base physical address) + VPN*4
- The PFN from the PTE is concatenated with the page offset to form the physical address
[Figure: system virtual address = 10 | VPN (29:9) | page offset (8:0)]

52 System Space Address Translation
[Figure: the virtual address 10 | VPN (29:9) | offset (8:0) is translated by reading the PTE at physical address SBR + VPN*4; the PFN from the PTE is concatenated with the offset to form the physical address]

53 P0 Space Address Translation
- The PTE's S0-space virtual address = P0BR (P0 page table base virtual address) + VPN*4
- Get the PTE using the system space translation algorithm
- The PFN from the PTE is concatenated with the page offset to form the physical address
[Figure: P0 virtual address = 00 | VPN (29:9) | page offset (8:0)]

54 P0 Space Address Translation (cont)
[Figure: the P0 virtual address 00 | VPN (29:9) | offset (8:0) yields the PTE virtual address P0BR + VPN*4, itself an S0 address 10 | VPN' | offset'; the system page table lookup at SBR + VPN'*4 gives PFN', forming the physical address of the PTE; the PTE's PFN is concatenated with the original offset to give the final physical address]

55 P0 Space Address Translation Using TLB
- Access the process TLB with the VPN
  - Process TLB hit? Yes: get the PTE of the requested page from the process TLB => calculate the physical address => access memory
  - No: calculate the PTE virtual address (in S0): P0BR + 4*VPN, then access the system TLB
    - System TLB hit? Yes: get the PTE from the system TLB => access the process page table => get the PTE of the requested page => calculate the physical address => access memory
    - No: access the system page table at SBR + 4*VPN to get the PTE, then proceed as on a system TLB hit

56 Backup

57 Why Virtual Memory?
- Generality: the ability to run programs larger than the size of physical memory
- Storage management: allocation/deallocation of variable-sized blocks is costly and leads to (external) fragmentation
- Protection: regions of the address space can be read-only, execute-only, ...
- Flexibility: portions of a program can be placed anywhere, without relocation
- Storage efficiency: retain only the most important portions of the program in memory
- Concurrent I/O: execute other processes while loading/dumping a page
- Expandability: can leave room in the virtual address space for objects to grow
- Performance

58 Address Translation with a TLB
[Figure: the virtual address is split into a virtual page number (bits p and above) and a page offset (bits p-1:0); the VPN is looked up in the TLB, whose entries hold a valid bit, a tag, and a physical page number; on a TLB hit the physical address is formed and split into tag, index, and byte offset to access the cache, which signals a cache hit and returns the data]

