2Main Points Address Translation Concept Flexible Address Translation How do we convert a virtual address to a physical address?Flexible Address TranslationBase and boundSegmentationPagingMultilevel translationEfficient Address TranslationTranslation Lookaside BuffersVirtually and Physically Addressed Caches
3Address Translation Concept Two views of memory:view from the CPU -- what program sees, virtual memoryview from memory -- physical memoryTranslation box converts between the two views.Translation helps implement protection because no way for program to even talk about other program's addresses; no way for them to touch operating system code or data.
5Address TranslationWhat can you do if you can (selectively) gain control whenever a program reads or writes a particular memory location?With hardware supportWith compiler-level supportMemory management is one of the most complex parts of the OSServes many different purposes
6Address Translation Uses Process isolationKeep a process from touching anyone else’s memory, or the kernel’sEfficient interprocess communicationShared regions of memory between processesShared code segmentsE.g., common libraries used by many different programsProgram initializationStart running a program before it is entirely in memoryDynamic memory allocationAllocate and initialize stack/heap pages on demand
7Address Translation (more) Cache managementPage coloringProgram debuggingData breakpoints when address is accessedZero-copy I/ODirectly from I/O device into/out of user memoryMemory mapped filesAccess file data using load/store instructionsDemand-paged virtual memoryIllusion of near-infinite memory, backed by disk or memory on other machines
8Address Translation (even more) Checkpointing/restartTransparently save a copy of a process, without stopping the program while the save happensPersistent data structuresImplement data structures that can survive system rebootsProcess migrationTransparently move processes between machinesInformation flow controlTrack what data is being shared externallyDistributed shared memoryIllusion of memory that is shared between machines
10Virtual Base and Bounds Pros?SimpleFast (2 registers, adder, comparator)Can relocate in physical memory without changing processCons?Can’t keep program from accidentally overwriting its own codeCan’t share code/data with other processesCan’t grow stack/heap as neededProvides level of indirection: OS can move bits around behind the program's back, for instance, if program needs to grow beyond its bounds, or if need to coalesce fragments of memory. Stop program, copy bits, change base and bounds registers, restart.Because OS manages the translation, we control the vertical, we control the horizontal.Only the OS gets to change the base and bounds! Clearly, user program can't, or else lose protection.With base&bounds system, what gets saved/restored on a context switch?contents of base and bounds register; or maybe even contents of memory!Hardware cost:2 registersadder, comparatorPlus, slows down hardware because need to take time to do add/compare on every memory reference.In 60's, thought to be too expensive!But get better use of memory, and you get protection. These are really important. CPU speed (especially now) is the easy part.
11Segmentation Segment is a contiguous region of memory Virtual or (for now) physical memoryEach process has a segment table (in hardware)Entry in table = segmentSegment can be located anywhere in physical memoryStartLengthAccess permissionProcesses can share segmentsSame start, length, same/different access permissions
12This should seem a bit strange: the virtual address space has gaps in it! Each segment gets mapped to contiguous locations in physical memory, but may be gaps between segments.This is a little like walking around in the dark, and there are huge pits in the ground where you die if you step in the pit.But! of course, a correct program will never step off into a pit, so ok.A correct program will never address gaps; if it does, trap to kernel and then core dump. Minor exception: stack, heap can grow. In UNIX, sbrk() increases size of heap segment. For stack, just take fault, system automatically increases size of stack.Detail: Protection mode in segmentation table entries. For example, code segment would be read-only (only execution and loads are allowed). Data and stack segment would be read-write (stores allowed).What must be saved/restored on context switch? Typically, segment table stored in CPU, not in memory, because it’s small.Contents must be saved/restored.
13Segment start length code 0x4000 0x700 data 0x500 heap - stack 0x2000 0x500heap-stack0x20000x10002 bit segment #12 bit offsetPhysical MemoryVirtual Memorymain: 240store #1108, r2244store pc+8, r31248jump 36024c…strlen: 360loadbyte (r2), r3420jump (r31)x: 1108a b c \0x: 108a b c \0…main: 4240store #1108, r24244store pc+8, r314248jump 360424cstrlen: 4360loadbyte (r2),r34420jump (r31)Initially, pc = 240.What happens first? Simulate machine, building up physical memory contents as we go.1. fetch 240? virtual segment #? 0; offset ? Physical address: 4240 from base = offset = Fetch value at Store address of x into r2 (first argument to procedure).2. fetch 244? virtual segment #? 0; offset ? Physical address: Fetch value at Instruction to Store PC + 8 into r31 (return value).5. What is PC? (Note PC is untranslated!) = 24c Instruction to execute after return.6. Fetch 248? Physical address Fetch instruction: jump 360.7. Fetch 360? Physaddr Fetch instruction: load (r2) into r3. Contents of r2?8. r2 is used as a pointer. Contents of r2 is a virtual address. Therefore, need to translate it! Seg # ? 1 offset ? Physaddr: Fetch value a. Do remainder of strlen.9. On return, jump to (r31) -- this holds return PC in *virtual space* c
14UNIX fork and Copy on Write Makes a complete copy of a processSegments allow a more efficient implementationCopy segment table into childMark parent and child segments read-onlyStart child process; return to parentIf child or parent writes to a segment, will trap into kernelmake a copy of the segment and resume
16Zero-on-ReferenceHow much physical memory do we need to allocate for the stack or heap?Zero bytes!When program touches the heapSegmentation fault into OS kernelKernel allocates some memoryHow much?Zeros the memoryavoid accidentally leaking information!Restart process
17Segmentation Pros? Cons? Can share code/data segments between processesCan protect code segment from being overwrittenCan transparently grow stack/heap as neededCan detect if need to copy-on-writeCons?Complex memory managementNeed to find chunk of a particular sizeMay need to rearrange memory from time to time to make room for new segment or growing segmentExternal fragmentation: wasted space between chunks
18Paged Translation Manage memory in fixed size units, or pages Finding a free page is easyBitmap allocation:Each bit represents one physical page frameEach process has its own page tableStored in physical memoryHardware registerspointer to page table startpage table lengthSimpler, because allows use of a bitmap. What's a bitmap?Each bit represents one page of physical memory -- 1 means allocated, 0 means unallocated. Lots simpler than base&bounds or segmentation
20Process View A Physical Memory B C D E 4 F 3 G H 1 I Page Table J K L Suppose page size is 4 bytesWhere is virtual address 6? 9?
21Paging QuestionsWhat must be saved/restored on a process context switch?Pointer to page table/size of page tablePage table itself is in main memoryWhat if page size is very small?What if page size is very large?Internal fragmentation: if we don’t need all of the space inside a fixed size chunkMeans lots of space taken up with page table entries.
22Paging and Copy on Write Can we share memory between processes?Set entries in both page tables to point to same page framesNeed core map of page frames to track which processes are pointing to which page framesUNIX fork with copy on write at page granularityCopy page table entries to new processMark all pages as read-onlyTrap into kernel on write (in child or parent)Copy page and resume execution
23Paging and Fast Program Start Can I start running a program before its code is in physical memory?Set all page table entries to invalidWhen a page is referenced for first timeTrap to OS kernelOS kernel brings in pageResumes executionRemaining pages can be transferred in the background while program is runningOn windows, the compiler reorganizes where procedures are in the executable, to put the initialization code in a few pages right at the beginning
24Sparse Address Spaces Might want many separate segments Per-processor heapsPer-thread stacksMemory-mapped filesDynamically linked librariesWhat if virtual address space is sparse?On 32-bit UNIX, code starts at 0Stack starts at 2^314KB pages => 500K page table entries64-bits => 4 quadrillion page table entriesIs there a solution that allows simple memory allocation, easy to share memory, and is efficient for sparse address spaces?
25Multi-level Translation Tree of translation tablesPaged segmentationMulti-level page tablesMulti-level paged segmentationAll 3: Fixed size page as lowest level unitEfficient memory allocationEfficient disk transfersEasier to build translation lookaside buffersEfficient reverse lookup (from physical -> virtual)Page granularity for protection/sharing
26Paged Segmentation Process memory is segmented Segment table entry: Pointer to page tablePage table length (# of pages in segment)Access permissionsPage table entry:Page frameShare/protection at either page or segment-level
29x86 Multilevel Paged Segmentation Global Descriptor Table (segment table)Pointer to page table for each segmentSegment lengthSegment access permissionsContext switch: change global descriptor table register (GDTR, pointer to global descriptor table)Multilevel page table4KB pages; each level of page table fits in one pageOnly fill page table if needed32-bit: two level page table (per segment)64-bit: four level page table (per segment)
30Multilevel Translation Pros:Allocate/fill only as many page tables as usedSimple memory allocationShare at segment or page levelCons:Space overhead: at least one pointer per virtual pageTwo or more lookups per memory reference
31PortabilityMany operating systems keep their own memory translation data structuresList of memory objects (segments)Virtual -> physicalPhysical -> virtualSimplifies porting from x86 to ARM, 32 bit to 64 bitInverted page tableHash from virtual page -> physical pageSpace proportional to # of physical pages
32Do we need multi-level page tables? Use inverted page table in hardware instead of multilevel treeIBM PowerPCHash virtual page # to inverted page table bucketLocation in IPT => physical page framePros/cons?
33Efficient Address Translation Translation lookaside buffer (TLB)Cache of recent virtual page -> physical page translationsIf cache hit, use translationIf cache miss, walk multi-level page tableCost of translation =Cost of TLB lookup +Prob(TLB miss) * cost of page table lookup
35Software Loaded TLB Do we need a page table at all? Pros/cons? MIPS processor architectureIf translation is in TLB, okIf translation is not in TLB, trap to kernelKernel computes translation and loads TLBKernel can use whatever data structures it wantsPros/cons?
41When Do TLBs Work/Not Work, part 3 What happens when the OS changes the permissions on a page?For demand paging, copy on write, zero on reference, …TLB may contain old translationOS must ask hardware to purge TLB entryOn a multicore: TLB shootdownOS must ask each CPU to purge TLB entry
48Memory Hierarchyi7 has 8MB as shared 3rd level cache; 2nd level cache is per-core
49Translation on a Modern Processor Note that the TLB miss walks the page table in the physical cache, so you may not even need to go to physical memory on a TLB miss
50Question What is the cost of a first level TLB miss? Second level TLB lookupWhat is the cost of a second level TLB miss?x86: 2-4 level page table walkHow expensive is a 4-level page table walk on a modern processor?
51QuestionsWith a virtual cache, what do we need to do on a context switch?What if the virtual cache > page size?Page size: 4KB (x86)First level cache size: 64KB (i7)Cache block size: 32 bytesNeed to invalidate the cache, or tag the cache with process ID’sIf tagged, and the virtual cache is > page size, means that it is possible to have multiple cache blocks that have different contents
52AliasingAlias: two (or more) virtual cache entries that refer to the same physical memoryWhat if we modify one alias and then context switch?Typical solutionOn a write, lookup virtual cache and TLB in parallelPhysical address from TLB used to check for aliases
54Multicore and Hyperthreading Modern CPU has several functional unitsInstruction decodeArithmetic/branchFloating pointInstruction/data cacheTLBMulticore: replicate functional units (i7: 4)Share second/third level cache, second level TLBHyperthreading: logical processors that share functional units (i7: 2)Better functional unit utilization during memory stallsNo difference from the OS/programmer perspectiveExcept for performance, affinity, …