CS 443 Advanced OS David R. Choffnes, Spring 2005 Practical, transparent operating system support for superpages Juan Navarro, Sitaram Iyer, Peter Druschel,

Slides:



Advertisements
Similar presentations
1 Concurrency: Deadlock and Starvation Chapter 6.
Advertisements

Chapter 1 The Study of Body Function Image PowerPoint
1 Copyright © 2013 Elsevier Inc. All rights reserved. Chapter 3 CPUs.
Chapter 6 File Systems 6.1 Files 6.2 Directories
Chapter 4 Memory Management 4.1 Basic memory management 4.2 Swapping
Re-examining Instruction Reuse in Pre-execution Approaches By Sonya R. Wolff Prof. Ronald D. Barnes June 5, 2011.
SE-292 High Performance Computing
Chapter 5 : Memory Management
Chapter 4 Memory Management Basic memory management Swapping
Project 5: Virtual Memory
Hash Tables.
1 Overview Assignment 4: hints Memory management Assignment 3: solution.
SE-292: High Performance Computing
CS 241 Spring 2007 System Programming 1 Memory Replacement Policies Lecture 32 Klara Nahrstedt.
4.4 Page replacement algorithms
Page Replacement Algorithms
Virtual Memory (Chapter 4.3)
Online Algorithm Huaping Wang Apr.21
Cache and Virtual Memory Replacement Algorithms
Chapter 3.3 : OS Policies for Virtual Memory
Chapter 11 – Virtual Memory Management
Silberschatz, Galvin and Gagne ©2009 Operating System Concepts – 8 th Edition Virtual Memory.
Chapter 9: Virtual Memory
Virtual Memory 3 Fred Kuhns
Chapter 4 Memory Management 4.1 Basic memory management 4.2 Swapping
Module 10: Virtual Memory
Chapter 3 Memory Management
Chapter 10: Virtual Memory
Background Virtual memory – separation of user logical memory from physical memory. Only part of the program needs to be in memory for execution. Logical.
Virtual Memory II Chapter 8.
Memory Management.
Virtual Memory 1 Computer Organization II © McQuain Virtual Memory Use main memory as a cache for secondary (disk) storage – Managed jointly.
1 CMSC421: Principles of Operating Systems Nilanjan Banerjee Principles of Operating Systems Acknowledgments: Some of the slides are adapted from Prof.
Chapter 8 Virtual Memory
Chapter 4 Memory Management Page Replacement 补充:什么叫页面抖动?
Chapter 6 File Systems 6.1 Files 6.2 Directories
Chapter 10: Virtual Memory
CS 241 Spring 2007 System Programming 1 Memory Implementation Issues Lecture 33 Klara Nahrstedt.
25 seconds left…...
U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science 1 MC 2 –Copying GC for Memory Constrained Environments Narendran Sachindran J. Eliot.
SE-292 High Performance Computing
SE-292 High Performance Computing Memory Hierarchy R. Govindarajan
PSSA Preparation.
1 COMP 206: Computer Architecture and Implementation Montek Singh Mon., Nov. 17, 2003 Topic: Virtual Memory.
Virtual Memory Chapter 8.
1 Lecture 14: Virtual Memory Topics: virtual memory (Section 5.4) Reminders: midterm begins at 9am, ends at 10:40am.
Virtual Memory Chapter 8. Hardware and Control Structures Memory references are dynamically translated into physical addresses at run time –A process.
Virtual Memory Chapter 8.
1 Chapter 8 Virtual Memory Virtual memory is a storage allocation scheme in which secondary memory can be addressed as though it were part of main memory.
1 Lecture 14: Virtual Memory Today: DRAM and Virtual memory basics (Sections )
Practical, Transparent Operating System Support for Superpages J. Navarro Rice University and Universidad Católica de Chile S. Iyer, P. Druschel, A. Cox.
CS 149: Operating Systems March 3 Class Meeting Department of Computer Science San Jose State University Spring 2015 Instructor: Ron Mak
1 Lecture 13: Cache, TLB, VM Today: large caches, virtual memory, TLB (Sections 2.4, B.4, B.5)
1 Practical, transparent operating system support for superpages Juan Navarro, Sitaram Iyer, Peter Druschel, Alan Cox OSDI 2002.
Operating Systems ECE344 Ashvin Goel ECE University of Toronto Virtual Memory Hardware.
Introduction: Memory Management 2 Ideally programmers want memory that is large fast non volatile Memory hierarchy small amount of fast, expensive memory.
Operating Systems ECE344 Ashvin Goel ECE University of Toronto Demand Paging.
3.1 Advanced Operating Systems Superpages TLB coverage is the amount of memory mapped by TLB. I.e. the amount of memory that can be accessed without TLB.
Practical, transparent operating system support for superpages Juan Navarro, Sitaram Iyer, Peter Druschel, Alan Cox OSDI 2002.
CS 3204 Operating Systems Godmar Back Lecture 18.
Logistics Homework 5 will be out this evening, due 3:09pm 4/14
Lecture: Large Caches, Virtual Memory
CS703 - Advanced Operating Systems
EECE.4810/EECE.5730 Operating Systems
PRACTICAL, TRANSPARENT OPERATING SYSTEM SUPPORT FOR SUPERPAGES
CSE 542: Operating Systems
CSE 451: Operating Systems Autumn 2003 Lecture 10 Paging & TLBs
CSE 451: Operating Systems Autumn 2003 Lecture 10 Paging & TLBs
CSE 542: Operating Systems
Lecture 9: Caching and Demand-Paged Virtual Memory
Presentation transcript:

CS 443 Advanced OS David R. Choffnes, Spring 2005 Practical, transparent operating system support for superpages Juan Navarro, Sitaram Iyer, Peter Druschel, Alan Cox (Rice University) Appears in: Fifth Symposium on Operating Systems Design and Implementation (OSDI 2002) Presented by: David R. Choffnes

2 Outline The superpage problem Related Approaches Design Implementation Evaluation Conclusion

3 Introduction TLB coverage –Definition –Effect on performance Superpages –Wasted memory –Fragmentation Contribution –General, transparent superpages –Deals with fragmentation –Contiguity-aware page replacement algo –Demotion/Eviction of dirty superpages

4 The Superpage Problem Factor of 1000 decrease in 15 years TLB miss overhead: 5% 5-10% 30% TLB coverage trend TLB coverage of % of main memory

5 The Superpage Problem Increasing TLB coverage –More TLB entries is expensive –Larger page size leads to internal fragmentation and increased I/O –Solution: use multiple page sizes Superpage definition Hardware-imposed constraints –Finite set of page sizes (subset of powers of 2) –Contiguity –Alignment

6 A superpage TLB base page entry (size=1) superpage entry (size=4) physical memory virtual memory virtual address TLB physical address Alpha: 8,64,512KB; 4MB Itanium: 4,8,16,64,256KB; 1,4,16,64,256MB

7 Superpage Issues and Tradeoffs Allocation –Relocation –Reservation

8 Issue 1: superpage allocation virtual memory physical memory superpage boundaries B B A A C C D D ABCD How / when / what size to allocate? How / when / what size to allocate?

9 Superpage Issues (Cont.) Promotion –Incremental –Timing (not too soon, not too late) Demotion and Eviction –Hardware reference and dirty bit limitation

10 Issue 2: promotion Promotion: create a superpage out of a set of smaller pages –mark page table entry of each base page When to promote? Create small superpage? May waste overhead. Wait for app to touch pages? May lose opportunity to increase TLB coverage. Forcibly populate pages? May cause internal fragmentation.

11 Superpage Issues: Fragmentation Fragmentation –Memory becomes fragmented due to use of multiple page sizes persistence of file cache pages scattered wired (non-pageable) pages –Contiguity as contended resource

12 Related Approaches HP-UX and IRIX Reservations –Not transparent Page Relocation –Used exclusively, leads to lower performance due to increased TLB misses Hardware Support –Talluri and Hill: Remove contiguity requirement This approach: Hybrid reservation and relocation system with page replacement that biases toward pages that contribute to contiguity

13 Design Reservation-based superpage management Multiple superpage sizes Demotion of sparsely referenced superpages Preservation of contiguity w/o compaction Efficient disk I/O for partially modified SPs Uses buddy allocator for contiguous regions

14 Key observation Once an application touches the first page of a memory object then it is likely that it will quickly touch every page of that object Example: array initialization Opportunistic policies –superpages as large and as soon as possible –as long as no penalty if wrong decision

15 Reservations Set of frames initially reserved at page fault –Fixed-size objects: largest aligned superpage that is not larger than the object –Dynamic objects: same as fixed, but reservation is allowed to extend beyond the end of the object Preemption –If no available memory for allocation request, system will preempt the reservation whose most recent page allocation occurred least recently

16 Managing reservations largest unused (and aligned) chunk best candidate for preemption at front: reservation whose most recently populated frame was populated the least recently reservation whose most recently populated frame was populated the least recently 1 2 4

17 Other Design Issues Fragmentation control –Coalescing –Contiguity-aware page replacement Incremental promotions –Occurs as soon as a superpage region is fully populated Speculative demotion –Occurs on eviction (recursively) –Occurs on first write to clean superpage Overhead too high for hash digests –Daemon periodically demotes pages speculatively Necessary due to reference bit limitation

18 Incremental promotions Promotion policy: opportunistic

19 More Design Issues Multi-list reservation scheme –One list of each page size supported by hardware –Reservations sorted by allocation recency –Preemption removes from head of list Reservation recursively broken into extents Fully populated extents are not put in reservation lists Population map –Reserved frame lookup –Overlap avoidance –Promotion decisions –Preemption assistance

20 Implementation Notes FreeBSD uses three lists of pages in A-LRU order: active, inactive, cache Contiguity-aware page daemon –Cache considered available for allocation –Daemon activated when contiguity falls low –Clean file-backed pages moved to inactive as soon as file is closed Wired page clustering Multiple mappings

21 Evaluation Setup –FreeBSD 4.3 –Alpha 21264, 500 MHz, 512 MB RAM –8 KB, 64 KB, 512 KB, 4 MB pages –128-entry DTLB, 128-entry ITLB –Unmodified applications

22 Best-Case Results TLB miss reduction usually above 95% SPEC CPU2000 integer –11.2% improvement (0 to 38%) SPEC CPU2000 floating point –11.0% improvement (-1.5% to 83%) Other benchmarks –FFT (200 3 matrix): 55% –1000x1000 matrix transpose: 655% 30%+ in 8 out of 35 benchmarks

23 Benefits of multiple page sizes Speedups TLB Miss Reduction

24 Sustained benefits Use Web server to fragment memory, then use FFTW to see how quickly memory is reclaimed FFTW reaches a speedup of almost 55%, Web server performance degrades only 1.6% on successive run Concurrent execution: only 3% degradation with modified page daemon

25 Fragmentation control time min normalized contiguity of free memory no frag control web serverFFT no speedup full speedup partial speedup web serverFFT frag control

26 Adversary applications Incremental promotion –Slowdown of 8.9%, 7.2% is hardware-specific Sequential access –0.1% degradation Preemption –1.1% degradation General overhead –Use superpage supporting mechanisms, but dont promote: 1-2% performance degradation

27 Cetera Dirty Superpages –Performance penalty of not demoting is a factor of 20 Scalability –Most operations O(1), O(S) or O(S*R) –Daemon, promotion, demotion and dirty/reference bit emulation are linear Promotion/Demotion is amortized to O(S) for programs the need to change page size only early in life Dirty/Reference bits: Motivates the need for clustered page tables either in OS or HW

28 Conclusion Effective, transparent and efficient support for superpages Demonstrates effectiveness of multiple page sizes Improved performance for nearly all applications Minimal overhead Scalable to large numbers of page sizes