Operating Systems & Memory Systems: Address Translation CPS 220 Professor Alvin R. Lebeck Fall 2001.

1 Operating Systems & Memory Systems: Address Translation CPS 220 Professor Alvin R. Lebeck Fall 2001

2 Outline
Address Translation
–basics
–64-bit Address Space
Managing memory
OS Performance
Throughout
–Review Computer Architecture
–Interaction with Architectural Decisions

3 System Organization
[Figure: the processor with its cache connects through the core chip set to main memory; the I/O bus attaches the disk controller (disk), graphics controller (graphics), and network interface (network); devices raise interrupts to the processor]

4 Computer Architecture
Interface between Hardware and Software: this is IT
[Figure: software layers (Applications, Compiler, Operating System) above hardware (CPU, Memory, I/O, Multiprocessor, Networks)]

5 Memory Hierarchy 101
P (processor): very fast, 1ns clock, multiple instructions per cycle
$ (cache): SRAM, fast, small, expensive
Memory: DRAM, slow, big, cheap (called physical or main memory)
Disk: magnetic, really slow, really big, really cheap
=> Cost-effective memory system (price/performance)

6 Virtual Memory: Motivation
Process = Address Space + thread(s) of control
Address space = PA
–programmer controls movement from disk
–protection?
–relocation?
Linear address space
–larger than physical address space
»32, 64 bits vs. 28-bit physical (256MB)
Automatic management
[Figure: virtual address space mapped onto physical memory]

7 Virtual Memory
Process = virtual address space + thread(s) of control
Translation
–VA -> PA
–What physical address does virtual address A map to?
–Is VA in physical memory?
Protection (access control)
–Do you have permission to access it?

8 Virtual Memory: Questions
How is data found if it is in physical memory?
Where can data be placed in physical memory?
–Fully Associative, Set Associative, Direct Mapped
What data should be replaced on a miss? (Take CPS 210...)

9 Segmented Virtual Memory
Virtual address (2^32, 2^64) to physical address (2^30) mapping
Variable size, base + offset, contiguous in both VA and PA
[Figure: variable-sized virtual segments (e.g., at 0x1000, 0x6000, 0x9000) mapped to contiguous physical regions (e.g., at 0x0000, 0x1000, 0x2000, 0x11000)]

10 Intel Pentium Segmentation
[Figure: a logical address (segment selector + offset) selects a segment descriptor in the Global Descriptor Table (GDT); the descriptor's segment base address plus the offset gives the location in the physical address space]

11 Pentium Segmentation (Continued)
Segment Descriptors
–Local and Global
–base, limit, access rights
–Can define many
Segment Registers
–contain segment descriptors (faster than load from memory)
–Only 6
Must load a segment register with a valid entry before the segment can be accessed
–generally managed by compiler and linker, not the programmer

12 Paged Virtual Memory
Virtual address (2^32, 2^64) to physical address (2^28) mapping
–virtual page to physical page frame
Fixed-size units for access control & translation
[Figure: virtual pages (e.g., at 0x1000, 0x6000, 0x9000) mapped to physical page frames (e.g., at 0x0000, 0x1000, 0x2000, 0x11000); a virtual address is split into a virtual page number and an offset]
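As a concrete illustration of the split in the figure, here is a minimal C sketch that decomposes a virtual address into a virtual page number and a page offset. It assumes a hypothetical 8KB page (13-bit offset); the constants and example address are assumptions for illustration only.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed page size: 8KB => 13-bit page offset (illustrative constant). */
#define PAGE_SHIFT 13
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (PAGE_SIZE - 1)

int main(void) {
    uint64_t va     = 0x9fff;               /* example virtual address */
    uint64_t vpn    = va >> PAGE_SHIFT;     /* virtual page number     */
    uint64_t offset = va &  PAGE_MASK;      /* offset within the page  */
    printf("va=0x%llx vpn=0x%llx offset=0x%llx\n",
           (unsigned long long)va, (unsigned long long)vpn,
           (unsigned long long)offset);
    return 0;
}
```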

13 Page Table
Kernel data structure (per process)
Page Table Entry (PTE)
–VA -> PA translations (if none, page fault)
–access rights (Read, Write, Execute, User/Kernel, cached/uncached)
–reference, dirty bits
Many designs
–Linear, Forward mapped, Inverted, Hashed, Clustered
Design Issues
–support for aliasing (multiple VAs to a single PA)
–large virtual address space
–time to obtain translation
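To make the PTE fields above concrete, here is a hedged C sketch of one possible PTE layout using bit-fields; the exact widths and field order are assumptions for illustration, not the layout of any particular machine.

```c
#include <stdint.h>

/* One possible 64-bit PTE layout (illustrative only): a physical page
 * frame number plus the access-control and bookkeeping bits listed on
 * the slide. Real machines pack these differently. */
typedef struct {
    uint64_t valid     : 1;   /* translation present? (else page fault)  */
    uint64_t read      : 1;   /* access rights                           */
    uint64_t write     : 1;
    uint64_t execute   : 1;
    uint64_t user      : 1;   /* user vs. kernel access                  */
    uint64_t uncached  : 1;   /* cached/uncached                         */
    uint64_t reference : 1;   /* set on use, for replacement decisions   */
    uint64_t dirty     : 1;   /* set on write; page must be written back */
    uint64_t pfn       : 40;  /* physical page frame number              */
    uint64_t reserved  : 16;
} pte_t;
```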

14 Alpha VM Mapping (Forward Mapped)
"64-bit" address divided into 3 segments
–seg0 (bit 63 = 0): user code/heap
–seg1 (bit 63 = 1, bit 62 = 1): user stack
–kseg (bit 63 = 1, bit 62 = 0): kernel segment for OS
Three-level page table, each level one page
–Alpha 21064: only 43 unique bits of VA
–(future minimum page size up to 64KB => 55 bits of VA)
PTE bits: valid, kernel & user read & write enable
–No reference, use, or dirty bit
–What do you do for replacement?
[Figure: VA = seg 0/1 bits (21) | L1 index (10) | L2 index (10) | L3 index (10) | page offset (13); each index is added to a table base to fetch the next-level entry, yielding the physical page frame number]
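A minimal sketch of the forward-mapped walk, assuming the 10/10/10/13 split shown in the figure and a simplified PTE that holds only a valid bit and a frame number; the helper name and PTE format are illustrative assumptions, not the Alpha's actual encoding.

```c
#include <stdint.h>
#include <stdbool.h>

#define LEVEL_BITS  10
#define LEVEL_MASK  ((1u << LEVEL_BITS) - 1)
#define OFFSET_BITS 13

/* Simplified PTE: valid bit plus physical frame number (illustrative). */
typedef struct { bool valid; uint64_t pfn; } pte_t;

/* Assumed helper: returns a pointer to the page table page stored in a
 * given physical frame. Each table is one page of 2^10 entries. */
extern pte_t *frame_to_table(uint64_t pfn);

bool translate(pte_t *pt_base, uint64_t va, uint64_t *pa) {
    uint64_t i1 = (va >> (OFFSET_BITS + 2 * LEVEL_BITS)) & LEVEL_MASK;
    uint64_t i2 = (va >> (OFFSET_BITS + 1 * LEVEL_BITS)) & LEVEL_MASK;
    uint64_t i3 = (va >>  OFFSET_BITS)                   & LEVEL_MASK;

    pte_t l1 = pt_base[i1];
    if (!l1.valid) return false;                  /* page fault */
    pte_t l2 = frame_to_table(l1.pfn)[i2];
    if (!l2.valid) return false;
    pte_t l3 = frame_to_table(l2.pfn)[i3];
    if (!l3.valid) return false;

    *pa = (l3.pfn << OFFSET_BITS) | (va & ((1u << OFFSET_BITS) - 1));
    return true;
}
```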

15 Inverted Page Table (HP, IBM)
One PTE per page frame
–only one VA per physical frame
Must search for virtual address
More difficult to support aliasing
–Force all sharing to use the same VA
[Figure: the virtual page number hashes into the Hash Anchor Table (HAT), which points to a chain of Inverted Page Table (IPT) entries; the matching entry supplies the PA and status bits (PA, ST)]
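Below is a hedged C sketch of the lookup path the figure describes: hash the virtual page number into a hash anchor table, then follow the chain of inverted-page-table entries until one matches. The structures, sizes, and the hash function are assumptions for illustration.

```c
#include <stdint.h>

#define HAT_SIZE 4096          /* assumed hash anchor table size */
#define NO_ENTRY (-1)

/* One IPT entry per physical frame: the VPN (and ASID) that maps to
 * this frame, plus a chain link for hash collisions (illustrative). */
typedef struct { uint64_t vpn; uint16_t asid; int32_t next; } ipt_entry_t;

extern int32_t     hat[HAT_SIZE]; /* VPN hash -> first IPT index       */
extern ipt_entry_t ipt[];         /* one entry per physical page frame */

static uint32_t hash_vpn(uint64_t vpn, uint16_t asid) {
    return (uint32_t)((vpn ^ asid) % HAT_SIZE);   /* simple assumed hash */
}

/* Returns the physical frame number (== IPT index) or -1 if not mapped. */
int32_t ipt_lookup(uint64_t vpn, uint16_t asid) {
    int32_t i = hat[hash_vpn(vpn, asid)];
    while (i != NO_ENTRY) {
        if (ipt[i].vpn == vpn && ipt[i].asid == asid)
            return i;           /* frame number is the entry's index */
        i = ipt[i].next;        /* walk the collision chain          */
    }
    return NO_ENTRY;            /* miss: translation not present     */
}
```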

16 Intel Pentium Segmentation + Paging
[Figure: the logical address (segment selector + offset) is mapped through a segment descriptor in the Global Descriptor Table (GDT) to a linear address; the linear address (directory | table | offset) is then mapped through the page directory and page table to the physical address space]

17 The Memory Management Unit (MMU)
Input
–virtual address
Output
–physical address
–access violation (exception, interrupts the processor)
Access Violations
–not present
–user vs. kernel
–write
–read
–execute

18 Translation Lookaside Buffers (TLB)
Need to perform address translation on every memory reference
–30% of instructions are memory references
–4-way superscalar processor
–at least one memory reference per cycle
Make the common case fast, others correct
Throw HW at the problem
Cache PTEs

19 Fast Translation: Translation Buffer
Cache of translated addresses
Alpha 21164 TLB: 48-entry, fully associative
[Figure: the virtual page number is compared against all 48 tags in parallel (valid, read, write bits per entry); a 48:1 mux selects the matching physical frame, which is concatenated with the page offset]
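Here is a small C sketch of a fully associative TLB lookup in the spirit of the figure: compare the VPN against every entry and, on a hit, return the cached frame number. The 48-entry size follows the slide; the entry format, ASID field, and offset width are assumptions (hardware performs all comparisons in parallel; this model simply loops).

```c
#include <stdint.h>
#include <stdbool.h>

#define TLB_ENTRIES 48
#define OFFSET_BITS 13                    /* assumed page offset width */

typedef struct {
    bool     valid;
    bool     read, write;                 /* protection bits           */
    uint16_t asid;                        /* address-space id          */
    uint64_t vpn;                         /* tag: virtual page number  */
    uint64_t pfn;                         /* data: physical frame      */
} tlb_entry_t;

static tlb_entry_t tlb[TLB_ENTRIES];

/* Fully associative lookup: every entry is a candidate. */
bool tlb_lookup(uint64_t va, uint16_t asid, bool is_write, uint64_t *pa) {
    uint64_t vpn = va >> OFFSET_BITS;
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].asid == asid && tlb[i].vpn == vpn) {
            if (is_write && !tlb[i].write)
                return false;             /* access violation -> fault */
            *pa = (tlb[i].pfn << OFFSET_BITS) | (va & ((1u << OFFSET_BITS) - 1));
            return true;                  /* TLB hit                   */
        }
    }
    return false;                         /* TLB miss: walk page table */
}
```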

20 TLB Design
Must be fast, not increase critical path
Must achieve high hit ratio
Generally small, highly associative
Mapping change
–page removed from physical memory
–processor must invalidate the TLB entry
PTE is a per-process entity
–multiple processes with same virtual addresses
–Context switches? Flush TLB, or add ASID (PID)
»part of processor state, must be set on context switch

21 Hardware Managed TLBs
Hardware handles TLB miss
Dictates page table organization
Complicated state machine to "walk page table"
–multiple levels for forward mapped
–linked list for inverted
Exception only if access violation
[Figure: the TLB and its control logic sit between the CPU and memory]

22 Software Managed TLBs
Software handles TLB miss
Flexible page table organization
Simple hardware to detect hit or miss
Exception if TLB miss or access violation
Should you check for access violation on TLB miss?
[Figure: the TLB sits between the CPU and memory; on a miss the CPU itself runs the handler]
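As a sketch of what a software TLB miss handler might do, the code below walks the page table and refills one TLB entry, escalating to a page fault if no valid translation exists. The handler name, the walker and TLB-write interfaces, and the refill policy are assumptions for illustration; machines with software-managed TLBs (e.g. MIPS) provide something equivalent, but not with these exact names.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed interfaces: a page-table walker, a TLB refill primitive,
 * and a way to raise a page fault. These names are made up. */
extern bool walk_page_table(uint64_t vpn, uint16_t asid,
                            uint64_t *pfn, uint32_t *prot);
extern void tlb_write_random(uint64_t vpn, uint64_t pfn,
                             uint16_t asid, uint32_t prot);
extern void raise_page_fault(uint64_t va);

#define OFFSET_BITS 13   /* assumed page offset width */

/* Invoked by the TLB-miss exception: refill one entry or escalate.
 * One common design choice, relevant to the slide's question: skip the
 * protection check here; the retried access consults the refilled TLB
 * entry, and protection is checked on that path. */
void tlb_miss_handler(uint64_t faulting_va, uint16_t asid) {
    uint64_t pfn;
    uint32_t prot;
    uint64_t vpn = faulting_va >> OFFSET_BITS;

    if (walk_page_table(vpn, asid, &pfn, &prot))
        tlb_write_random(vpn, pfn, asid, prot);   /* refill and retry */
    else
        raise_page_fault(faulting_va);            /* no mapping       */
}
```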

23 Mapping the Kernel
Digital Unix kseg
–kseg (bit 63 = 1, bit 62 = 0)
Kernel has direct access to physical memory
One VA->PA mapping for the entire kernel
Lock (pin) the TLB entry
–or special HW detection
[Figure: address space from 0 to 2^64-1 with user code/data, user stack, and the kernel region; the kernel region maps directly onto physical memory]

24 Considerations for Address Translation
Large virtual address space
Can map more things
–files
–frame buffers
–network interfaces
–memory from another workstation
Sparse use of address space
Page Table Design
–space
–less locality => TLB misses
OS structure
–microkernel => more TLB misses

25 Address Translation for Large Address Spaces
Forward Mapped Page Table
–grows with virtual address space
»worst case 100% overhead not likely
–TLB miss time: memory reference for each level
Inverted Page Table
–grows with physical address space
»independent of virtual address space usage
–TLB miss time: memory reference to HAT, IPT, list search

26 Hashed Page Table (HP)
Combine Hash Table and IPT [Huck96]
–can have more entries than physical page frames
Must search for virtual address
Easier to support aliasing than IPT
Space
–grows with physical space
TLB miss
–one less memory reference than IPT
[Figure: the virtual page number hashes directly into the Hashed Page Table (HPT); the matching entry holds the PA and status bits (PA, ST)]

27 Clustered Page Table (SUN)
Combine benefits of HPT and Linear [Talluri95]
Store one base VPN (TAG) and several PPN values
–virtual page block number (VPBN)
–block offset
[Figure: the VA's virtual page block number and block offset index a hash chain of clustered entries; each entry holds a VPBN tag, a next pointer, and several (PA, attribute) pairs, e.g. PA0-PA3]

28 Reducing TLB Miss Handling Time
Problem
–must walk Page Table on TLB miss
–usually incur cache misses
–big problem for IPC in microkernels
Solution
–build a small second-level cache in SW
–on TLB miss, first check SW cache
»use simple shift-and-mask index into a hash table
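A hedged sketch of the software second-level cache idea: a small direct-mapped array of recent translations, indexed by a shift-and-mask of the VPN and consulted before the full page-table walk. The size, entry layout, and walker interface are assumptions.

```c
#include <stdint.h>
#include <stdbool.h>

#define STLB_SIZE   1024                 /* assumed, power of two     */
#define STLB_MASK   (STLB_SIZE - 1)
#define OFFSET_BITS 13                   /* assumed page offset width */

/* Direct-mapped software cache of recent VPN->PFN translations. */
typedef struct { bool valid; uint16_t asid; uint64_t vpn, pfn; } stlb_entry_t;
static stlb_entry_t stlb[STLB_SIZE];

extern bool walk_page_table(uint64_t vpn, uint16_t asid, uint64_t *pfn);

/* Called from the TLB miss handler before the full page-table walk.
 * The index is just a shift and mask of the VPN: cheap, no chaining. */
bool stlb_translate(uint64_t va, uint16_t asid, uint64_t *pfn) {
    uint64_t vpn = va >> OFFSET_BITS;
    stlb_entry_t *e = &stlb[vpn & STLB_MASK];

    if (e->valid && e->vpn == vpn && e->asid == asid) {
        *pfn = e->pfn;                    /* hit: skip the walk       */
        return true;
    }
    if (!walk_page_table(vpn, asid, pfn)) /* miss: do the slow walk   */
        return false;                     /* not mapped: page fault   */

    e->valid = true;                      /* fill for next time       */
    e->asid  = asid;
    e->vpn   = vpn;
    e->pfn   = *pfn;
    return true;
}
```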

29 Next Time
More TLB issues
Virtual Memory & Caches
Multiprocessor Issues

30 Operating Systems & Memory Systems: Managing the Memory System CPS 220 Professor Alvin R. Lebeck

31 Review: Address Translation
Map from virtual address to physical address
Page Tables, PTE
–va->pa, attributes
–forward mapped, inverted, hashed, clustered
Translation Lookaside Buffer
–hardware cache of most recent va->pa translations
–misses handled in hardware or software
Implications of larger address space
–page table size
–possibly more TLB misses
OS Structure
–microkernels -> lots of IPC -> more TLB misses

32 Cache Memory 102
Block 7 placed in a 4-block cache:
–Fully associative, direct mapped, 2-way set associative
–S.A. mapping = block number modulo number of sets
–DM = 1-way set associative, bit selection
Cache frame
–location in cache
[Figure: block 7 of main memory can go anywhere in a fully associative cache, in frame 7 mod 4 = 3 of a direct-mapped cache, or in either frame of set 7 mod 2 = 1 of a 2-way set-associative cache]

33 Cache Indexing
Tag on each block
–no need to check index or block offset
Increasing associativity shrinks the index, expands the tag
–Fully associative: no index
–Direct-mapped: large index
[Figure: block address = TAG | Index, followed by the block offset]
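The tag/index/offset split above can be computed directly from the cache geometry. Here is a small C sketch; the cache size, block size, and associativity are arbitrary example values, not taken from the slides.

```c
#include <stdint.h>
#include <stdio.h>

/* Example geometry (assumed): 32KB cache, 64B blocks, 4-way set associative. */
#define CACHE_SIZE (32 * 1024)
#define BLOCK_SIZE 64
#define ASSOC      4
#define NUM_SETS   (CACHE_SIZE / (BLOCK_SIZE * ASSOC))   /* 128 sets */

/* log2 for powers of two */
static unsigned log2u(unsigned x) { unsigned n = 0; while (x >>= 1) n++; return n; }

int main(void) {
    unsigned offset_bits = log2u(BLOCK_SIZE);   /* 6 */
    unsigned index_bits  = log2u(NUM_SETS);     /* 7 */

    uint64_t addr   = 0x12345678;
    uint64_t offset = addr & (BLOCK_SIZE - 1);
    uint64_t index  = (addr >> offset_bits) & (NUM_SETS - 1);
    uint64_t tag    = addr >> (offset_bits + index_bits);

    /* Doubling ASSOC (cache size fixed) halves NUM_SETS, so the index
     * shrinks by one bit and the tag grows by one bit. */
    printf("tag=0x%llx index=%llu offset=%llu\n",
           (unsigned long long)tag, (unsigned long long)index,
           (unsigned long long)offset);
    return 0;
}
```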

34 Address Translation and Caches
Where is the TLB wrt the cache? What are the consequences?
Most of today's systems have more than one cache
–Digital 21164 has 3 levels
–2 levels on chip (8KB data, 8KB inst, 96KB unified)
–one level off chip (2-4MB)
Does the OS need to worry about this?
Definition: page coloring = careful selection of the va->pa mapping

35 TLBs and Caches
[Figure: three organizations.
(1) Conventional: CPU -> TLB -> physically addressed cache -> memory.
(2) Virtually addressed cache: CPU -> cache (VA tags) -> TLB -> memory; translate only on miss; alias (synonym) problem.
(3) Overlapped: cache access proceeds in parallel with VA translation and tags are compared with the PA; requires the cache index to remain invariant across translation; L2 cache is physically addressed.]

36 Virtual Caches
Send virtual address to cache
–called Virtually Addressed Cache or just Virtual Cache, vs. Physical Cache or Real Cache
Avoid address translation before accessing cache
–faster hit time to cache
Context switches?
–Just like the TLB (flush or PID)
–Cost is time to flush + "compulsory" misses from empty cache
–Add a process identifier tag that identifies the process as well as the address within the process: can't get a hit if wrong process
I/O must interact with the cache

37 I/O and Virtual Caches
I/O is accomplished with physical addresses
–DMA
–flush pages from cache
–need pa->va reverse translation
–coherent DMA
[Figure: processor with virtual cache on the memory bus with main memory; an I/O bridge connects the I/O bus with the disk controller (disk), graphics controller (graphics), and network interface (network); I/O devices use physical addresses and raise interrupts]

38 Aliases and Virtual Caches
Aliases (sometimes called synonyms): two different virtual addresses map to the same physical address
But the virtual address is used to index the cache
–could have the data in two different locations in the cache
[Figure: user and kernel virtual address spaces (0 to 2^64-1) mapping regions of the same physical memory]

39 Index with Physical Portion of Address
If the index is in the physical part of the address (the page offset), tag access can start in parallel with translation, and the comparison uses the physical tag
Limits cache to page size: what if we want bigger caches and the same trick?
–Higher associativity
–Page coloring
[Figure: address = page address | page offset; the cache index and block offset fall entirely within the page offset, and the address tag comes from the page address]
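The "limits cache to page size" point is the constraint that the index and block-offset bits must fit inside the page offset. A short C check makes the arithmetic explicit; the page and cache sizes are illustrative assumptions.

```c
#include <stdio.h>
#include <stdbool.h>

/* For a virtually indexed, physically tagged cache to avoid coloring,
 * index + block-offset bits must come from the page offset, i.e.
 * cache_size / associativity <= page_size. */
static bool index_fits_in_page(long cache_size, long assoc, long page_size) {
    return cache_size / assoc <= page_size;
}

int main(void) {
    long page_size = 8 * 1024;    /* assumed 8KB pages */
    printf("8KB direct-mapped:  %s\n",
           index_fits_in_page(8 * 1024, 1, page_size)  ? "ok" : "needs coloring");
    printf("32KB direct-mapped: %s\n",
           index_fits_in_page(32 * 1024, 1, page_size) ? "ok" : "needs coloring");
    printf("32KB 4-way:         %s\n",
           index_fits_in_page(32 * 1024, 4, page_size) ? "ok" : "needs coloring");
    return 0;
}
```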

40 Page Coloring for Aliases
HW that guarantees that every cache frame holds a unique physical address, or
OS guarantee: the lower n bits of the virtual & physical page numbers must have the same value; if direct-mapped, aliases then map to the same cache frame
–one form of page coloring
[Figure: address = page address | page offset; the cache index extends above the page offset into the low bits of the page number]
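A minimal sketch of the OS-side rule: the page "color" is the low n bits of the page number that also feed the cache index, and an alias-safe mapping gives the virtual and physical pages the same color. The page and cache sizes are assumed values for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGE_SHIFT  13                          /* assumed 8KB pages          */
#define CACHE_SIZE  (64 * 1024)                 /* assumed 64KB direct-mapped */
#define COLOR_COUNT (CACHE_SIZE >> PAGE_SHIFT)  /* 8 colors                   */
#define COLOR_MASK  (COLOR_COUNT - 1)

/* Color = the low bits of the page number that participate in the
 * cache index beyond the page offset. */
static unsigned page_color(uint64_t addr) {
    return (addr >> PAGE_SHIFT) & COLOR_MASK;
}

/* The OS keeps aliases consistent by only accepting mappings in which
 * the virtual and physical pages have the same color. */
static bool mapping_is_alias_safe(uint64_t va, uint64_t pa) {
    return page_color(va) == page_color(pa);
}
```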

41 Virtual Memory and Physically Indexed Caches
Notion of a bin
–region of the cache that may contain cache blocks from a page
Random vs. careful mapping
Selection of the physical page frame dictates the cache index
Overall goal is to minimize cache misses
[Figure: page frames map onto bins of the cache]

42 Careful Page Mapping [Kessler92, Bershad94]
Select a page frame such that cache conflict misses are reduced
–only choose from available pages (no replacement induced)
Static
–"smart" selection of page frame at page fault time
Dynamic
–move pages around

43 Page Coloring
Make the physical index match the virtual index
Behaves like a virtually indexed cache
–no conflicts for sequential pages
Possibly many conflicts between processes
–address spaces all have the same structure (stack, code, heap)
–modify to xor PID with address (MIPS used a variant of this)
Simple implementation
Pick an arbitrary page if necessary

44 Bin Hopping
Allocate sequentially mapped pages (time) to sequential bins (space)
Can exploit temporal locality
–pages mapped close in time will be accessed close in time
Search from the last allocated bin until a bin with an available page frame is found
Separate search list per process
Simple implementation
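A hedged sketch of bin hopping as described above: each process remembers the last bin it used, and the allocator walks forward from there until it finds a bin with a free frame. The per-bin free-list helpers and the bin count are assumptions.

```c
#include <stdint.h>
#include <stdbool.h>

#define NUM_BINS 128                    /* cache size / page size (assumed) */

/* Assumed helpers over per-bin free lists of physical page frames. */
extern bool     bin_has_free_frame(unsigned bin);
extern uint64_t take_frame_from_bin(unsigned bin);

/* Each process keeps its own cursor, so its pages hop through bins in
 * allocation order: nearby-in-time pages land in nearby bins. */
typedef struct { unsigned last_bin; } proc_alloc_state_t;

uint64_t bin_hopping_alloc(proc_alloc_state_t *p) {
    for (unsigned i = 0; i < NUM_BINS; i++) {
        unsigned bin = (p->last_bin + 1 + i) % NUM_BINS;
        if (bin_has_free_frame(bin)) {
            p->last_bin = bin;
            return take_frame_from_bin(bin);
        }
    }
    return UINT64_MAX;                  /* no free frames: caller must evict */
}
```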

45 Best Bin
Keep track of two counters per bin
–used: # of pages allocated to this bin for this address space
–free: # of available pages in the system for this bin
Bin selection is based on a low value of used and a high value of free
–low used value: reduce conflicts within the address space
–high free value: reduce conflicts between address spaces
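A minimal sketch of the best-bin policy: scan the bins and pick one with a low per-address-space used count, favoring a large system-wide free count as the tie-breaker. The data layout and the exact rule for combining the two counters are assumptions; other scoring rules are possible.

```c
#define NUM_BINS 128                     /* assumed */

/* Global free-frame count per bin; per-address-space used count. */
extern int free_count[NUM_BINS];
typedef struct { int used[NUM_BINS]; } addr_space_t;

/* Choose the bin with the fewest pages already used by this address
 * space, breaking ties in favor of the bin with the most free frames. */
int best_bin(const addr_space_t *as) {
    int best = -1;
    for (int b = 0; b < NUM_BINS; b++) {
        if (free_count[b] == 0)
            continue;                    /* nothing to allocate here */
        if (best < 0 ||
            as->used[b] <  as->used[best] ||
            (as->used[b] == as->used[best] && free_count[b] > free_count[best]))
            best = b;
    }
    return best;                          /* -1 if no bin has a free frame */
}
```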

46 Hierarchical
Best bin could be linear in the # of bins
Build a tree
–internal nodes contain the sum of child values
Independent of cache size
–simply stop at a particular level in the tree

47 Benefit of Static Page Coloring
Reduces cache misses by 10% to 20%
Multiprogramming
–want to distribute the mapping to avoid inter-address-space conflicts

48 Dynamic Page Coloring
Cache Miss Lookaside (CML) buffer [Bershad94]
–proposed hardware device
Monitor # of misses per page
If # of misses >> # of cache blocks in the page
–must be conflict misses
–interrupt the processor
–move a page (recolor)
Cost of moving a page << benefit

49 Outline
Page Coloring
Page Size

50 A Case for Large Pages
Page table size is inversely proportional to the page size
–memory saved
Fast cache hit time is easy when cache <= page size (VA caches)
–a bigger page makes it feasible as cache size grows
Transferring larger pages to or from secondary storage, possibly over a network, is more efficient
Number of TLB entries is restricted by clock cycle time
–larger page size maps more memory
–reduces TLB misses
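To make the first point concrete, here is a tiny C calculation of linear page table size for a 32-bit address space under two page sizes; the 8-byte PTE size and the specific page sizes are assumptions used only for the arithmetic.

```c
#include <stdio.h>
#include <stdint.h>

/* Linear page table size = (VA space / page size) * PTE size. */
static uint64_t page_table_bytes(uint64_t va_space, uint64_t page_size,
                                 uint64_t pte_size) {
    return (va_space / page_size) * pte_size;
}

int main(void) {
    uint64_t va_space = 1ULL << 32;     /* 4GB virtual address space */
    uint64_t pte      = 8;              /* assumed 8-byte PTEs       */

    /* 4KB pages: 1M PTEs = 8MB of table; 64KB pages: 64K PTEs = 512KB. */
    printf("4KB pages:  %llu KB\n",
           (unsigned long long)(page_table_bytes(va_space, 4 * 1024, pte) / 1024));
    printf("64KB pages: %llu KB\n",
           (unsigned long long)(page_table_bytes(va_space, 64 * 1024, pte) / 1024));
    return 0;
}
```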

51 A Case for Small Pages
Fragmentation
–large pages can waste storage
–data must be contiguous within a page
Quicker process startup for small processes (??)

52 Superpages
Hybrid solution: multiple page sizes
–8KB, 16KB, 32KB, 64KB pages
–4KB, 64KB, 256KB, 1MB, 4MB, 16MB pages
Need to identify candidate superpages
–Kernel
–Frame buffers
–Database buffer pools
Application/compiler hints
Detecting superpages
–static, at page fault time
–dynamically create superpages
Page Table & TLB modifications

