Virtual Memory and I/O Mingsheng Hong

I/O Systems
Major I/O hardware: hard disks, network adaptors, …
Problems related to I/O systems:
Various types of hardware – device drivers provide the OS with a unified I/O interface
Devices are typically much slower than the CPU and memory – a system bottleneck
Too much CPU involvement in I/O operations
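The "unified I/O interface" idea can be illustrated with a toy Python sketch (all class and method names here are invented for illustration): every driver exposes the same read/write operations, so the kernel's I/O layer never needs device-specific code.

```python
# Toy sketch of a unified driver interface (all names hypothetical):
# each device driver implements the same read/write operations, so the
# kernel's I/O layer is device-agnostic.

class Driver:
    def read(self, nbytes): raise NotImplementedError
    def write(self, data): raise NotImplementedError

class DiskDriver(Driver):
    def __init__(self): self.blocks = bytearray(b"disk-data")
    def read(self, nbytes): return bytes(self.blocks[:nbytes])
    def write(self, data): self.blocks[:len(data)] = data

class NetDriver(Driver):
    def __init__(self): self.rx_queue = [b"packet-1", b"packet-2"]
    def read(self, nbytes): return self.rx_queue.pop(0)[:nbytes]
    def write(self, data): pass  # would enqueue the data for transmission

def kernel_read(driver, nbytes):
    # The kernel calls the same method regardless of the device type.
    return driver.read(nbytes)
```

Real drivers are, of course, tables of function pointers inside the kernel, not Python classes; the point is only the shared interface.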

Techniques to Improve I/O Performance
Buffering – e.g., downloading a file from the network
DMA
Caching – CPU cache, TLB, file cache, …
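Buffering helps because the slow device sees a few large transfers instead of many small ones. A minimal sketch (buffer size and names invented): incoming network chunks accumulate in memory and are flushed to "disk" only when the buffer fills.

```python
# Toy sketch of buffered output: small chunks (e.g., network packets)
# accumulate in a buffer; the device receives only large, full-buffer
# writes plus one final partial flush.

BUF_SIZE = 8  # illustrative; real buffers are pages or larger

class BufferedWriter:
    def __init__(self):
        self.buf = bytearray()
        self.disk_writes = []          # each entry is one device write

    def write(self, chunk):
        self.buf += chunk
        while len(self.buf) >= BUF_SIZE:
            self.disk_writes.append(bytes(self.buf[:BUF_SIZE]))
            del self.buf[:BUF_SIZE]

    def flush(self):
        if self.buf:
            self.disk_writes.append(bytes(self.buf))
            self.buf.clear()

w = BufferedWriter()
for chunk in [b"ab", b"cd", b"ef", b"gh", b"ij"]:   # many small arrivals
    w.write(chunk)
w.flush()
# Five small writes became two device writes.
```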

Other Techniques to Improve I/O Performance
Virtual memory page remapping (IO-Lite): allows (cached) files and memory to be shared by different processes without extra data copies
Prefetching data (Software Prefetching and Caching for TLBs): prefetches and caches page table entries

Summary of First Paper
IO-Lite: A Unified I/O Buffering and Caching System (Pai et al., Best Paper at the 3rd OSDI, 1999)
A unified I/O system:
Uses immutable data buffers to store all I/O data (only one physical copy)
Uses VM page remapping
Unifies IPC, the file system (disk files, file cache), and the network subsystem

Summary of Second Paper
Software Prefetching and Caching for Translation Lookaside Buffers (Bala et al., 1994)
A software approach to help reduce TLB misses
Works well for IPC-intensive systems
Promises a bigger performance gain on future systems

Features of IO-Lite
Eliminates redundant data copying – saves CPU work and avoids cache pollution
Eliminates multiple buffering – saves main memory => improves the hit rate of the file cache
Enables cross-subsystem optimizations, e.g., caching Internet checksums
Supports application-specific cache replacement policies

Related Work before IO-Lite
I/O APIs should preserve copy semantics
Memory-mapped files
Copy-on-write
Fbufs

Key Data Structures Immutable Buffers and Buffer Aggregates
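The two structures can be simulated in a few lines of Python (field names are illustrative, not IO-Lite's actual definitions): a buffer holds immutable data, and a buffer aggregate is an ordered list of slices into buffers, so a logical data object can reference pieces of several physical buffers without any copying.

```python
# Sketch of IO-Lite's two key structures (names invented for
# illustration). Buffers are immutable; an aggregate is an ordered
# list of (buffer, offset, length) slices.

class Buffer:
    def __init__(self, data):
        self.data = bytes(data)        # immutable: never modified in place

class Aggregate:
    def __init__(self, slices=()):
        self.slices = list(slices)     # [(buffer, offset, length), ...]

    def value(self):
        # Materialize the logical byte string (for inspection only;
        # IO-Lite passes the aggregate itself around, not a flat copy).
        return b"".join(buf.data[off:off + ln] for buf, off, ln in self.slices)

b1 = Buffer(b"Hello, ")
b2 = Buffer(b"world!")
agg = Aggregate([(b1, 0, 7), (b2, 0, 6)])   # "Hello, world!" with no copy
```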

Discussion I
When we pass a buffer aggregate from process A to process B, how do we efficiently do VM page remapping (modify B's page table entries)?
Possible approach 1: find any empty entry and modify the VM address contained in the buffer aggregate – very inefficient
Possible approach 2: reserve the range of virtual addresses of buffers in the address space of each process – this basically limits the total size of buffers; what about dynamically allocated buffers?

Impact of Immutable I/O Buffers
Copy-on-write optimization: modified values are stored in a new buffer, as opposed to "in-place modification"
Three situations, depending on how the data object is modified:
Completely modified – allocate a new buffer
Partially modified (modification localized) – chain the unmodified and modified portions of the data
Partially modified (modification not localized) – compare the cost of writing the entire object with that of chaining, and choose the cheaper method
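The three cases above can be sketched as one decision function. This is a toy model, not IO-Lite's actual heuristic: the cost comparison is reduced to a single invented threshold on the fraction of the object being modified.

```python
# Toy sketch of copy-on-write over immutable buffers (the threshold
# heuristic is invented for illustration). A "write" never touches the
# original bytes: it either produces a whole new buffer or chains the
# unmodified prefix/suffix slices around the new data.

def modify(original: bytes, start: int, new: bytes, chain_threshold=0.5):
    """Return ('new', data) if most of the object changes, else
    ('chain', parts) keeping unmodified slices of the old buffer."""
    end = start + len(new)
    if len(new) / len(original) >= chain_threshold:
        return ("new", original[:start] + new + original[end:])
    return ("chain", [original[:start], new, original[end:]])

# Small localized change: cheaper to chain around the old buffer.
modify(b"abcdefgh", 2, b"XY")
# Complete rewrite: cheaper to allocate one fresh buffer.
modify(b"abcd", 0, b"WXYZ")
```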

Discussion II
How do we measure the two costs? Heuristics are needed
Fragmented data vs. clustered data: chained data increases reading cost
Similar to the shadow page technique used in System R
Should the cost of retrieving data from the buffer also be considered?

What does IO-Lite do?
Reduces extra data copies in IPC, the file system (disk files, file cache), and the network subsystem
Makes cross-subsystem optimizations possible

IO-Lite and IPC
Operations on buffers and aggregates when I/O data is transferred:
Aggregates are passed by value; the associated buffers are passed by reference
When a buffer is deallocated, it is returned to a memory pool, but its VM page mappings persist
When a buffer is reused (by the same process), no further VM map changes are required; write permission is (temporarily) granted to the associated producer process

IO-Lite and the File System
IO-Lite I/O APIs provided:
IOL_read(int fd, IOL_Agg **aggr, size_t size)
IOL_write(int fd, IOL_Agg **aggr)
IOL_write operations are atomic – concurrency support
I/O functions in the stdio library were reimplemented
The file system cache was reorganized: buffer aggregates (pointers to data), instead of file data, are stored in the cache
Copy semantics are ensured – suppose a portion of a cached file is read and then overwritten
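The "read, then overwrite" scenario is exactly where immutability pays off. A toy demonstration (classes and the cache layout are invented for illustration): a reader holds a reference to the old buffer, and a later overwrite installs a new buffer in the cache rather than modifying the old one, so the reader's view is unchanged even though no copy was ever made.

```python
# Toy demonstration of copy semantics via immutable buffers (structures
# invented for illustration). Reading hands out a reference, not a
# copy; overwriting replaces the cache entry with a NEW buffer.

class Buffer:
    def __init__(self, data): self.data = bytes(data)

file_cache = {"f.txt": Buffer(b"version-1")}

# Process A reads: it receives a reference to the cached buffer.
a_view = file_cache["f.txt"]

# Process B overwrites the file: a new buffer replaces the cache entry;
# the old buffer is never modified in place, so A's data is intact.
file_cache["f.txt"] = Buffer(b"version-2")
```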

Copy Semantics Illustration 1

Copy Semantics Illustration 2

Copy Semantics Illustration 3

More on File Cache Management & VM Paging
The cache replacement policy can be customized
The default eviction order is by current reference status and time of last file access
One entry is evicted when the file cache "appears" to be too large; one entry is added on every file cache miss
When a buffer page is paged out, its data is written back to swap space, and possibly to several other disk locations (for different files)
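The default eviction rule can be sketched in a few lines (the entry layout is invented for illustration): entries still referenced by some aggregate are not evicted; among unreferenced entries, the least recently accessed file goes first.

```python
# Sketch of the default eviction order (data layout invented):
# never evict an entry that is currently referenced; among the
# unreferenced ones, pick the least recently accessed file.

def pick_victim(entries):
    """entries: list of (name, refcount, last_access_time) tuples.
    Returns the name to evict, or None if everything is referenced."""
    candidates = [e for e in entries if e[1] == 0]   # unreferenced only
    if not candidates:
        return None
    return min(candidates, key=lambda e: e[2])[0]    # oldest access wins
```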

IO-Lite and the Network Subsystem
Access control and protection for processes: an ACL is associated with each buffer pool
The ACL of a data object must be determined before memory is allocated for it
An early demultiplexing technique determines the ACL for each incoming packet

A Cross-Subsystem Optimization
Internet checksum caching: cache the computed checksum for each slice of a buffer aggregate
The version number is incremented when a buffer is reallocated – this can be used to check whether the data has changed
Works well for static files; also has a big benefit for CGI programs that chain dynamic data with static data
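The version-number trick can be sketched as a small cache keyed by buffer identity (the structure is invented for illustration, and CRC-32 stands in for the real Internet checksum): a stale version forces recomputation, so a reallocated buffer can never serve an old checksum.

```python
# Sketch of checksum caching (structure invented; zlib.crc32 stands in
# for the Internet checksum). The cache maps a buffer's identity to
# (version, checksum); a version mismatch means the buffer was
# reallocated, so the checksum is recomputed.

import zlib

checksum_cache = {}   # buffer_id -> (version, checksum)
computations = 0      # counts actual checksum computations

def checksum(buffer_id, version, data):
    global computations
    cached = checksum_cache.get(buffer_id)
    if cached and cached[0] == version:
        return cached[1]                       # hit: reuse the result
    computations += 1
    ck = zlib.crc32(data)
    checksum_cache[buffer_id] = (version, ck)
    return ck

checksum(1, 0, b"static page")    # miss: computed
checksum(1, 0, b"static page")    # hit: served from the cache
checksum(1, 1, b"new contents")   # buffer reused, version bumped: recompute
```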

Performance – Competitors
Flash – a high-performance HTTP Web server
Flash-Lite – a modified version of Flash using the IO-Lite API
Apache – representing the most widely used Web server of the day

Performance – Static Content Requests

Performance – CGI Programs

Performance – Real Workload Average request size: 17KBytes

Performance – WAN Effects
Memory for buffers = (# clients) * T_ss

Performance – Other Applications

Conclusion on IO-Lite
A unified framework for I/O subsystems
Impressive performance in Web applications due to copy avoidance and checksum caching

Software Prefetching & Caching for TLBs
Prefetching and caching had never before been applied to TLB misses in a software approach
Improves overall performance by up to 3%, but has great potential on newer architectures (clock speed: 40 MHz => 200 MHz)

Issues in Virtual Memory
The user address space is typically huge …
A TLB caches page table entries
Software support can help reduce TLB misses

Motivations
TLB misses occur more frequently in microkernel-based OSes
RISC computers handle TLB misses in software (via a trap)
IPCs have a bigger impact on system performance

Approach
Use a software approach to prefetch and cache TLB entries
Experiments were done on a MIPS R3000-based (RISC) architecture running Mach 3.0
Applications were chosen from standard benchmarks, as well as a synthetic IPC-intensive benchmark

Discussion
The way the authors motivate their paper:
The right approach for a particular type of system
A valid argument about the performance gain on future computer systems
The figures of experimental results mostly show the reduced number of TLB misses, instead of overall performance improvement
A synthetic IPC-intensive application is used to support their approach

Prefetching: What Entries to Prefetch?
L1U: user address spaces
L1K: kernel data structures
L2: user (L1U) page tables – stack, code, and data segments
L3: L1K and L2 page tables

Prefetching: Details
On the first IPC call, probe the hardware TLB on the IPC path and enter the related TLB entries into the PTLB
On subsequent IPC calls, entries are prefetched into the PTLB by a hashed lookup
Entries are stored in unmapped, cached physical memory
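The hashed lookup can be pictured as a small direct-mapped table of PTEs indexed by virtual page number. This is a toy model (table size, entry layout, and names are invented, not the paper's actual data structures):

```python
# Toy sketch of a prefetch TLB (PTLB): a direct-mapped table of PTEs
# keyed by virtual page number (sizes and fields invented). Entries
# captured on the first IPC call are later found by one hashed probe
# instead of a full page-table walk.

PTLB_SLOTS = 64
ptlb = [None] * PTLB_SLOTS      # each slot holds (vpn, pte) or None

def ptlb_insert(vpn, pte):
    ptlb[vpn % PTLB_SLOTS] = (vpn, pte)

def ptlb_lookup(vpn):
    entry = ptlb[vpn % PTLB_SLOTS]
    if entry and entry[0] == vpn:
        return entry[1]         # fast path: miss handled in software
    return None                 # fall through to the page-table walk

ptlb_insert(0x1234, "pte-A")    # captured on the first IPC call
```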

Prefetching: Performance

Rate of TLB misses?

Caching: Software Victim Cache
Use a region of unmapped, cached memory (the STLB) to cache entries evicted from the hardware TLB
PTE lookup sequence: hardware TLB => STLB => generic trap handler
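The lookup sequence above can be simulated in miniature (sizes, replacement policy, and structures are invented for illustration; the real hardware TLB and trap path are of course not Python dictionaries): evicted entries are parked in the STLB, so the next miss on them takes a short software refill path instead of the generic trap handler.

```python
# Toy simulation of the victim-cache idea (all sizes/structures
# invented). Entries evicted from the "hardware TLB" land in the STLB;
# a later miss on them is a fast software refill, not a generic trap.

from collections import OrderedDict

HW_TLB_SIZE = 2
hw_tlb = OrderedDict()   # vpn -> pte; FIFO replacement for simplicity
stlb = {}                # victim cache: vpn -> pte

def tlb_fill(vpn, pte):
    if len(hw_tlb) >= HW_TLB_SIZE:
        victim_vpn, victim_pte = hw_tlb.popitem(last=False)
        stlb[victim_vpn] = victim_pte        # park the evicted entry
    hw_tlb[vpn] = pte

def translate(vpn):
    if vpn in hw_tlb:
        return hw_tlb[vpn], "hw-hit"
    if vpn in stlb:
        tlb_fill(vpn, stlb[vpn])             # fast software refill path
        return hw_tlb[vpn], "stlb-hit"
    return None, "trap"                      # generic trap handler

tlb_fill(1, "pte-1"); tlb_fill(2, "pte-2"); tlb_fill(3, "pte-3")
# vpn 1 has been evicted to the STLB; touching it again is an stlb-hit.
```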

Caching: Benefits
A faster trap path for TLB misses
Avoids the overhead of a context switch
Eliminates (or at least reduces) cascaded TLB misses

Caching: Performance Average STLB penalties Kernel TLB hit rates

Caching: Performance

Prefetching + Caching: Performance
Worse than using the PTLB alone! (The authors' comment justifying this is hard to follow …)

Discussion
The STLB (caching) is better than the PTLB, so using it alone suffices
Is it possible to improve IPC performance using both VM page remapping and software prefetching & caching?