CSE506: Operating Systems Block Cache. CSE506: Operating Systems Address Space Abstraction Given a file, which physical pages store its data? Each file.

Slides:



Advertisements
Similar presentations
Page-replacement policies On some standard algorithms for the management of resident virtual memory pages.
Advertisements

More on File Management
Memory management.
CSE506: Operating Systems Physical Page Reclamation.
File Systems.
Chapter 11: File System Implementation
File System Implementation
Other File Systems: LFS and NFS. 2 Log-Structured File Systems The trend: CPUs are faster, RAM & caches are bigger –So, a lot of reads do not require.
Computer ArchitectureFall 2008 © CS : Computer Architecture Lecture 22 Virtual Memory (1) November 6, 2008 Nael Abu-Ghazaleh.
G Robert Grimm New York University SGI’s XFS or Cool Pet Tricks with B+ Trees.
Multiprocessing Memory Management
CS 333 Introduction to Operating Systems Class 18 - File System Performance Jonathan Walpole Computer Science Portland State University.
Translation Buffers (TLB’s)
1  Caches load multiple bytes per block to take advantage of spatial locality  If cache block size = 2 n bytes, conceptually split memory into 2 n -byte.
CSE 326: Data Structures B-Trees Ben Lerner Summer 2007.
Basics of Operating Systems March 4, 2001 Adapted from Operating Systems Lecture Notes, Copyright 1997 Martin C. Rinard.
File System. NET+OS 6 File System Architecture Design Goals File System Layer Design Storage Services Layer Design RAM Services Layer Design Flash Services.
Memory Management in Windows and Linux &. Windows Memory Management Virtual memory manager (VMM) –Executive component responsible for managing memory.
An Evaluation of Using Deduplication in Swappers Weiyan Wang, Chen Zeng.
1 File System Implementation Operating Systems Hebrew University Spring 2010.
CS 6560 Operating System Design Lecture 13 Finish File Systems Block I/O Layer.
1 File Systems Chapter Files 6.2 Directories 6.3 File system implementation 6.4 Example file systems.
Some VM Complications Extra memory accesses Page tables are huge
CS 162 Section Lecture 8. What happens when you issue a read() or write() request?
CS 153 Design of Operating Systems Spring 2015 Lecture 17: Paging.
Operating Systems ECE344 Ding Yuan Paging Lecture 8: Paging.
Lecture Topics: 11/17 Page tables TLBs Virtual memory flat page tables
How to Build a CPU Cache COMP25212 – Lecture 2. Learning Objectives To understand: –how cache is logically structured –how cache operates CPU reads CPU.
1 File Systems: Consistency Issues. 2 File Systems: Consistency Issues File systems maintains many data structures  Free list/bit vector  Directories.
File System Implementation
1 Some Real Problem  What if a program needs more memory than the machine has? —even if individual programs fit in memory, how can we run multiple programs?
Review °Apply Principle of Locality Recursively °Manage memory to disk? Treat as cache Included protection as bonus, now critical Use Page Table of mappings.
Paging (continued) & Caching CS-3013 A-term Paging (continued) & Caching CS-3013 Operating Systems A-term 2008 (Slides include materials from Modern.
CS333 Intro to Operating Systems Jonathan Walpole.
Jeff's Filesystem Papers Review Part I. Review of "Design and Implementation of The Second Extended Filesystem"
Lecture 10 Page 1 CS 111 Summer 2013 File Systems Control Structures A file is a named collection of information Primary roles of file system: – To store.
UNIX & Windows NT Name: Jing Bai ID: Date:8/28/00.
CS 3204 Operating Systems Godmar Back Lecture 21.
1 Chapter Seven CACHE MEMORY AND VIRTUAL MEMORY. 2 SRAM: –value is stored on a pair of inverting gates –very fast but takes up more space than DRAM (4.
File and Page Caching CSE451 Andrew Whitaker. Oh, The Many Caches of CSE451 Virtual memory  OS maintains a cache of recently used memory pages File system.
4P13 Week 12 Talking Points Device Drivers 1.Auto-configuration and initialization routines 2.Routines for servicing I/O requests (the top half)
1 Appendix C. Review of Memory Hierarchy Introduction Cache ABCs Cache Performance Write policy Virtual Memory and TLB.
File System Performance CSE451 Andrew Whitaker. Ways to Improve Performance Access the disk less  Caching! Be smarter about accessing the disk  Turn.
CDA 5155 Virtual Memory Lecture 27. Memory Hierarchy Cache (SRAM) Main Memory (DRAM) Disk Storage (Magnetic media) CostLatencyAccess.
W4118 Operating Systems Instructor: Junfeng Yang.
Translation Lookaside Buffer
Jonathan Walpole Computer Science Portland State University
Virtual Memory - Part II
Virtual Memory Use main memory as a “cache” for secondary (disk) storage Managed jointly by CPU hardware and the operating system (OS) Programs share main.
File System Structure How do I organize a disk into a file system?
Filesystems.
Memory Hierarchy Virtual Memory, Address Translation
Journaling File Systems
CSCI206 - Computer Organization & Programming
Virtual Memory Hardware
File System Implementation
CSE 451: Operating Systems Autumn 2005 Memory Management
CSE 451: Operating Systems Autumn 2003 Lecture 10 Paging & TLBs
CSE 451 Fall 2003 Section 11/20/2003.
CSE451 Virtual Memory Paging Autumn 2002
CSE 451: Operating Systems Autumn 2003 Lecture 9 Memory Management
CSE 451: Operating Systems Autumn 2003 Lecture 10 Paging & TLBs
CSE 451: Operating Systems Autumn 2003 Lecture 9 Memory Management
Lecture 7: Flexible Address Translation
Lecture 13: Computer Memory
Lecture 9: Caching and Demand-Paged Virtual Memory
File System Performance
Virtual Memory Use main memory as a “cache” for secondary (disk) storage Managed jointly by CPU hardware and the operating system (OS) Programs share main.
Page Cache and Page Writeback
Lecture Topics: 11/20 HW 7 What happens on a memory reference Traps
Presentation transcript:

CSE506: Operating Systems Block Cache

CSE506: Operating Systems Address Space Abstraction Given a file, which physical pages store its data? Each file inode has an address space (0—file size) – So do block devices that cache data in RAM (0—dev size) – So does virtual memory of a process (0—16EB in 64-bit) All page mappings are (object, offset) tuple

CSE506: Operating Systems Types of Address Spaces “Anonymous” memory – no file backing it – e.g., the stack for a process – Not shared between processes Will discuss sharing and swapping later – How do we figure out virtual to physical mapping? Data structure to map virtual to physical addresses Some OSes (e.g, Linux) just walk the page tables File contents – backed by storage device – Kernel can map file pages into application

CSE506: Operating Systems Logical View Disk Hello! Foo.txt inode ? Process A ? Process B ? Process C ?

CSE506: Operating Systems Logical View Disk Hello! Foo.txt inode ? Process A Process B Process C File Descriptor Table FDs are process- specific

CSE506: Operating Systems Tracking Inode Pages What data structure to use for an inode? – No page tables for files Ex: What page stores the first 4k of file “foo” What data structure to use? – Hint: Files can be small or very, very large

CSE506: Operating Systems The Radix Tree A space-optimized trie – Trie without key in each node Traversal of parent(s) builds a prefix Tree with branching factor k > 2 is fast – Faster lookup for large files (esp. with tricks) Assume upper bound file size when building – Can rebuild later if we are wrong Ex: Max size is 256k, branching factor (k) = k / 4k pages = 64 pages – Need a radix tree of height 1 to represent these pages

CSE506: Operating Systems Tree of height 1 Root has 64 slots, can be null or a pointer to a page Lookup address X: – Shift off low 12 bits (offset within page) – Use next 6 bits as an index into these slots (2 6 = 64) – If pointer non-null, go to the child node (page) – If null, page doesn’t exist

CSE506: Operating Systems Tree of height n Shift off low 12 bits At each child, shift off 6 bits from middle … (starting at 6 * (distance to the bottom – 1) bits) – To find which of the 64 potential children to go to – Fixed height to figure out where to stop (use bits for offset) Observations: – “Key” at each node implicit based on position in tree – Lookup time constant in height of tree In a general-purpose radix tree, may have to check all k children – Higher lookup cost

CSE506: Operating Systems Fixed heights If the file size grows beyond max height – Grow the tree Add another root, previous tree becomes first child Scaling in height: – 1: 2 (6 ∙ 1)+12 = 256 KB – 2: 2 (6 ∙ 2)+12 = 16 MB – 3: 2 (6 ∙ 3)+12 = 1 GB – 4: 2 (6 ∙ 4)+12 = 16 GB – 5: 2 (6 ∙ 5)+12 = 4 TB

CSE506: Operating Systems From “Understanding the Linux Kernel”

CSE506: Operating Systems Block Cache (File Address Spaces) Cached inodes (files) have a radix tree – Points to physical pages – Tree is sparse: pages not in memory are missing Radix tree also supports tags – A tree node is tagged if at least one child also has the tag – Example: Tag a file’s page ‘dirty’ Must tag each parent in the radix tree as dirty When finished writing page back – Check all siblings » If none dirty, clear the parent’s dirty tag

CSE506: Operating Systems Logical View Disk Hello! Foo.txt inode Process A Process B Process C Address Space Radix Tree

CSE506: Operating Systems Reading Block Cache VFS does a read() – Looks up in inode’s radix tree – If found in block cache, can return data immediately If data not in block cache – Call’s FS-specific read() operation Schedules getting data from disk – Puts process to sleep until disk interrupt When disk read is done – Populate radix tree with pointer to page – Wake up process – Repeat VFS read attempt

CSE506: Operating Systems Dirty Pages OSes do not write file updates to disk immediately – Pages might be written again soon – Writes can be done later, “when convenient” OS instead tracks “dirty” pages – Ensures that write back isn’t delayed too long Lest data be lost in a crash Application can force immediate write back – sync() system calls (and some open/mmap options)

CSE506: Operating Systems Sync System Calls sync() – Flush all dirty buffers to disk fsync(fd) – Flush fd’s dirty buffers to disk – Including inode fdatasync(fd) – Flush fd’s dirty buffers to disk – Don’t bother with the inode

CSE506: Operating Systems How to implement sync? Goal: keep overheads of finding dirty blocks low – A naïve scan of all pages would work, but expensive – Lots of clean pages Idea: keep track of dirty data to minimize overheads – A bit of extra work on the write path, of course

CSE506: Operating Systems How to implement sync? Background: Each file system has a super block – All super blocks in a list Each in-memory super block keeps list of dirty inodes Inodes and superblocks both marked dirty upon use

CSE506: Operating Systems FS Organization SB / SB / SB /floppy SB /floppy SB /d1 SB /d1 One Superblock per FS One Superblock per FS inode Dirty list Dirty list of inodes Inodes and radix nodes/pages marked dirty separately

CSE506: Operating Systems Simple Dirty Traversal for each s in superblock list: if (s->dirty) writeback s for i in inode list: if (i->dirty) writeback i if (i->radix_root->dirty) : // Recursively traverse tree, writing // dirty pages and clearing dirty flag

CSE506: Operating Systems Asynchronous Flushing Kernel thread(s): pdflush – Task that runs in the kernel’s address space – 2-8 threads, depending on how busy/idle threads are When pdflush runs – Kernel maintains a total number of dirty pages – Heuristics/admin configures a target dirty ratio (say 10%) – When pdflush wakes up Figures out how many dirty pages are above target ratio Determines target number of pages to write back Writes back pages until it meets its goal or can’t write more back – (Some pages may be locked, just skip those)

CSE506: Operating Systems Writeback to Stable Storage We can find dirty pages in physical memory How does kernel know where on disk to write them? – And which disk for that matter? Superblock tracks device Inode tracks mapping from file offset to LBA – Note: this is FS’s inode, not VFS’s inode – Probably uses something like a radix tree