G22.3250-001 Robert Grimm New York University SGI’s XFS or Cool Pet Tricks with B+ Trees.

Altogether Now: The Three Questions
- What is the problem?
- What is new or different?
- What are the contributions and limitations?

Some Background
- Inode
  - On-disk data structure containing a file’s metadata and pointers to its data
  - Defines FS-internal namespace
    - Inode numbers
  - Originated in Unix FFS
- Vnode
  - Kernel data structure for representing files
  - Provides standard API for accessing different FS’s
  - Originated in Sun OS

Motivation
- On one side: I/O bottleneck
  - It’s becoming harder to utilize increasing I/O capacity and bandwidth
    - GB disk drives provide 1 TB of storage
    - High end drives provide 500 MB/sec sustained disk bandwidth
- On the other side: I/O intensive applications
  - Editing of uncompressed video
    - 30 MB/sec per stream, 108 GB for one hour
  - Streaming compressed video on demand
    - 2.7 TB for 1,000 movies, 200 movies require 100 MB/sec

Scalability Problems of Existing File Systems
- Slow crash recovery
  - fsck needs to scan entire disk
- No support for large file systems
  - 32 bit block pointers address only 4 billion blocks
    - At 8 KB per block, 32 TB
- No support for large, sparse files
  - 64 bit block pointers require more levels of indirection
    - Are also quite inefficient
  - Fixed-size extents are still too limiting
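
The 32 TB limit above follows from simple arithmetic: 2^32 block pointers (roughly 4 billion blocks) times 8 KB per block. A minimal sketch of that calculation; the helper name `max_fs_bytes` is made up for illustration and binary units (1 KB = 1024 bytes) are assumed:

```python
KB = 1024
TB = 1024 ** 4


def max_fs_bytes(pointer_bits: int, block_size: int) -> int:
    """Largest file system addressable with block pointers of the given width."""
    return (2 ** pointer_bits) * block_size


# 32-bit pointers: 2**32 = 4,294,967,296 addressable blocks.
blocks_32 = 2 ** 32
limit_32 = max_fs_bytes(32, 8 * KB)

print(blocks_32)        # 4294967296
print(limit_32 // TB)   # 32
```

With 64 bit pointers the same formula gives a limit far beyond any disk array of the time, which is why the remaining problems on this slide are about data-structure efficiency rather than raw addressability.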

Scalability Problems of Existing File Systems (cont.)
- No support for large, contiguous files
  - Bitmap structures for tracking free and allocated blocks do not scale
    - Hard to find large regions of contiguous space
  - But, we need contiguous allocation for good utilization of bandwidth
- No support for large directories
  - Linear layout (inode number, name entries) does not scale
  - In-memory hashing imposes high memory overheads
- No support for large numbers of files
  - Inodes preallocated during file system creation

XFS in a Nutshell
- Use 64 bit block addresses
  - Support for larger file systems
- Use B+ trees and extents
  - Support for larger number of files, larger files (which may be sparse or contiguous), larger directories
  - Better utilization of I/O bandwidth
- Log metadata updates
  - Faster crash recovery

XFS Architecture
- I/O manager
  - I/O requests
- Directory manager
  - File system name space
- Space manager
  - Free space, inode & file allocation
- Transaction manager
  - Atomic metadata updates
- Unified buffer cache
- Volume manager
  - Striping, concatenation, mirroring

Storage Scalability
- Allocation groups
  - Are regions with their own free space maps and inodes
  - Support AG-relative block and inode pointers
    - Reduce size of data structures
  - Improve (thread) parallelism of metadata management
    - Allow concurrent accesses to different allocation groups
  - Unlike FFS, are motivated (mostly) by scalability and parallelism and not by locality
- Free space
  - Two B+ trees describing extents (what’s a B+ tree?)
    - One indexed by starting block (used when?)
    - One indexed by length of extent (used when?)
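
One plausible answer to the two "used when?" questions is sketched below: the index by starting block serves allocations that want space near a given block, and the index by length serves allocations that want an extent of a given size. This is an illustration only. Sorted Python lists stand in for the two B+ trees, and the class and method names (`FreeSpace`, `alloc_near`, `alloc_by_size`) are hypothetical, not XFS identifiers.

```python
import bisect


class FreeSpace:
    """Free extents kept in two sorted indexes (stand-ins for the two B+ trees)."""

    def __init__(self):
        self.by_start = []   # sorted list of (start_block, length)
        self.by_length = []  # sorted list of (length, start_block)

    def add(self, start, length):
        bisect.insort(self.by_start, (start, length))
        bisect.insort(self.by_length, (length, start))

    def _take(self, start, length, want):
        # Remove the extent from both indexes and return any unused tail.
        self.by_start.remove((start, length))
        self.by_length.remove((length, start))
        if length > want:
            self.add(start + want, length - want)
        return start

    def alloc_by_size(self, want):
        """Best fit: the smallest free extent holding at least `want` blocks."""
        i = bisect.bisect_left(self.by_length, (want, -1))
        if i == len(self.by_length):
            return None
        length, start = self.by_length[i]
        return self._take(start, length, want)

    def alloc_near(self, target, want):
        """Locality: first large-enough extent starting at or after `target`."""
        i = bisect.bisect_left(self.by_start, (target, -1))
        for start, length in self.by_start[i:]:
            if length >= want:
                return self._take(start, length, want)
        return None


fs = FreeSpace()
fs.add(0, 10)
fs.add(100, 4)
fs.add(200, 50)
print(fs.alloc_by_size(4))     # 100
print(fs.alloc_near(150, 20))  # 200
print(fs.by_start)             # [(0, 10), (220, 30)]
```

Keeping both indexes means every insert and delete touches two structures, but either kind of search avoids the full scan that a bitmap would need.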

Storage Scalability (cont.)
- Large files
  - File storage tracked by extent map
    - Each entry: block offset in file, length in blocks, starting block on disk
  - Small extent map organized as list in inode
  - Large extent map organized as B+ tree rooted in inode
    - Indexed by block offset in file
- Large number of files
  - Inodes allocated dynamically
    - In chunks of 64
  - Inode locations tracked by B+ tree
    - Only points to inode chunks
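
To make the extent map entry concrete, here is a minimal sketch of translating a file block number into a disk block using the three fields listed above. A sorted list with binary search stands in for both the in-inode list and the B+ tree; the names `Extent` and `lookup` are assumptions for this example.

```python
import bisect
from typing import List, NamedTuple, Optional


class Extent(NamedTuple):
    file_offset: int  # first file block covered by this extent
    length: int       # number of blocks in the extent
    disk_start: int   # first disk block of the extent


def lookup(extents: List[Extent], file_block: int) -> Optional[int]:
    """Map a file block to a disk block; None means a hole in a sparse file.

    `extents` must be sorted by file_offset, which is the key the slide
    names for the B+ tree.
    """
    offsets = [e.file_offset for e in extents]
    i = bisect.bisect_right(offsets, file_block) - 1
    if i < 0:
        return None
    e = extents[i]
    if file_block < e.file_offset + e.length:
        return e.disk_start + (file_block - e.file_offset)
    return None


extent_map = [Extent(0, 8, 1000), Extent(100, 4, 5000)]
print(lookup(extent_map, 3))    # 1003
print(lookup(extent_map, 50))   # None
print(lookup(extent_map, 102))  # 5002
```

Blocks 8 through 99 have no entry at all, which is how an extent map represents a sparse file without spending space on the hole.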

Storage Scalability (cont.)
- Large directories
  - Directories implemented as (surprisingly) B+ trees
    - Map 4 byte hashes to directory entries (name, inode number)
- Fast crash recovery
  - Enabled by write ahead log
    - For all structural updates to metadata
    - E.g., creating a file: directory block, new inode, inode allocation tree block, allocation group header block, superblock
    - Independent of actual data structures (just binary data)
  - However, still need disk scavengers for catastrophic failures
    - The customers spoke…
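
A rough sketch of the directory idea: names are reduced to 4 byte hashes, and the hash is the key under which the (name, inode number) entry is stored. The slides do not say which hash function XFS uses, so `zlib.crc32` is only a stand-in here, and a Python dict stands in for the B+ tree; `Directory` and `name_hash` are hypothetical names.

```python
import zlib


def name_hash(name: str) -> int:
    """Stand-in 4 byte hash (an assumption, not the hash XFS uses)."""
    return zlib.crc32(name.encode("utf-8")) & 0xFFFFFFFF


class Directory:
    def __init__(self):
        # 4 byte hash -> list of (name, inode_number); a dict stands in
        # for the B+ tree keyed by hash.
        self.index = {}

    def add(self, name, inode_number):
        self.index.setdefault(name_hash(name), []).append((name, inode_number))

    def lookup(self, name):
        for entry_name, inode_number in self.index.get(name_hash(name), []):
            if entry_name == name:  # hashes can collide, so compare names
                return inode_number
        return None


d = Directory()
d.add("report.txt", 42)
d.add("video.raw", 43)
print(d.lookup("report.txt"))  # 42
print(d.lookup("missing"))     # None
```

Fixed-size keys keep the tree nodes compact regardless of name length, at the cost of having to resolve the occasional collision by comparing full names.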

Performance Scalability
- Allocating files contiguously
  - On-disk allocation is delayed until flush
    - Uses (cheap and plentiful) memory to improve I/O performance
    - Typically enables allocation in one extent
      - Even for random writes (think memory-mapped files)
    - Avoids allocation for short-lived files
  - Extents have large range: 21 bit length field
    - Two million file system blocks
  - Block size can vary by file system
    - Small blocks for file systems with many small files
    - Large blocks for file systems with mostly large files
- What prevents long-term fragmentation?
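
The effect of delayed allocation can be sketched in a few lines: writes only dirty in-memory buffers, and disk space is requested once, at flush time, for the whole range. This is a simplified illustration under stated assumptions. The class `DelayedAllocFile`, its `discard` method, and the `allocate` callback are invented for the example, and the sketch allocates one extent spanning the lowest to the highest dirty block (a real file system would treat large holes separately).

```python
class DelayedAllocFile:
    def __init__(self, allocate):
        self.allocate = allocate  # callback: n_blocks -> starting disk block
        self.dirty = {}           # file block -> data, held in memory only
        self.extents = []         # (file_offset, length, disk_start)

    def write(self, file_block, data):
        # No disk space is allocated here; the data just sits in memory.
        self.dirty[file_block] = data

    def discard(self):
        # File deleted before it was ever flushed: nothing to allocate.
        self.dirty.clear()

    def flush(self):
        if not self.dirty:
            return
        first, last = min(self.dirty), max(self.dirty)
        length = last - first + 1
        start = self.allocate(length)  # one request for the whole range
        self.extents.append((first, length, start))
        self.dirty.clear()


requests = []


def fake_allocator(n_blocks):
    requests.append(n_blocks)
    return 5000  # pretend the space manager found space at block 5000


f = DelayedAllocFile(fake_allocator)
for block in (3, 0, 2, 1):      # random write order
    f.write(block, b"data")
f.flush()
print(requests)   # [4]
print(f.extents)  # [(0, 4, 5000)]
```

Because the allocator sees the final size of the dirty range, four out-of-order writes still end up in a single extent, and a temporary file that is discarded before the flush never reaches the allocator at all.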

Performance Scalability (cont.)
- Performing file I/O
  - Read requests issued for large I/O buffers
    - Followed by multiple read ahead requests for sequential reads
  - Writes are clustered to form larger, asynch. I/O requests
    - Delayed allocation helps with buffering writes
  - Direct I/O lets applications bypass cache and use DMA
    - Applications have control over I/O, while still accessing file system
    - But also need to align data on block boundaries and issue requests on multiples of block size
  - Reader/writer locking supports more concurrency
    - Several processes on different CPUs can access the same file
    - Direct I/O leaves serialization entirely to applications
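
The alignment requirement for direct I/O amounts to two modulo checks. The sketch below is a hypothetical helper, not a real system call wrapper: `direct_io_ok` and `round_up` are names chosen for this example, and the 4096 byte block size is an assumption.

```python
def direct_io_ok(file_offset: int, length: int, block_size: int) -> bool:
    """True if the request starts on a block boundary and covers whole blocks."""
    return file_offset % block_size == 0 and length % block_size == 0


def round_up(n: int, block_size: int) -> int:
    """Smallest multiple of block_size that is >= n (for padding a request)."""
    return -(-n // block_size) * block_size


BLOCK = 4096  # assumed block size for the example

print(direct_io_ok(0, 8192, BLOCK))     # True
print(direct_io_ok(512, 4096, BLOCK))   # False
print(direct_io_ok(4096, 1000, BLOCK))  # False
print(round_up(1000, BLOCK))            # 4096
```

An application that wants to write 1000 bytes through direct I/O therefore has to pad the request to a full block itself, which is part of the control it takes over from the file system.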

Performance Scalability (cont.)
- Accessing and updating metadata
  - Updates performed in asynchronous write-ahead log
    - Modified data still only flushed after log update has completed
    - But metadata not locked, multiple updates can be batched
  - Log may be placed on different device from file system
    - Including non-volatile memory (NV-RAM)
  - Log operation is simple, but log is centralized (think SMP)
    - Provide buffer space, copy, write out, notify
    - Copying can be done by processors performing transaction
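
The ordering rule on this slide (log first, modified blocks later) can be sketched with two small classes. This is a toy model under loud assumptions: lists and dicts stand in for the log device and the disk, and `WriteAheadLog`, `MetadataCache`, and the sequence-number scheme are invented for illustration rather than taken from XFS.

```python
class WriteAheadLog:
    def __init__(self):
        self.buffer = []   # records copied into log buffer space, not yet on disk
        self.stable = []   # records that have been written out
        self.next_lsn = 0  # log sequence number for the next record

    def append(self, record):
        # Asynchronous: copy the record into the buffer and return at once.
        lsn = self.next_lsn
        self.next_lsn += 1
        self.buffer.append((lsn, record))
        return lsn

    def write_out(self):
        # One batched write covers every record appended since the last one.
        self.stable.extend(self.buffer)
        self.buffer.clear()

    def durable_lsn(self):
        return self.stable[-1][0] if self.stable else -1


class MetadataCache:
    def __init__(self, log):
        self.log = log
        self.dirty = {}  # block id -> (new contents, lsn of its log record)
        self.disk = {}   # stand-in for on-disk metadata blocks

    def update(self, block_id, contents):
        lsn = self.log.append((block_id, contents))
        self.dirty[block_id] = (contents, lsn)

    def flush(self):
        # A modified block may reach disk only after its log record has.
        durable = self.log.durable_lsn()
        for block_id, (contents, lsn) in list(self.dirty.items()):
            if lsn <= durable:
                self.disk[block_id] = contents
                del self.dirty[block_id]


log = WriteAheadLog()
cache = MetadataCache(log)
cache.update("inode 7", "size=4096")
cache.update("AG header 0", "free=120")
cache.flush()
print(cache.disk)       # {}
log.write_out()         # both updates go out in one batch
cache.flush()
print(len(cache.disk))  # 2
```

The first `flush` writes nothing because neither log record is durable yet; after a single batched `write_out`, both blocks become eligible together.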

Experiences

I/O Throughput
- What can we conclude?
  - Read speed, difference between creates and writes, parallelism

Benchmark Results (The Marketing Dept. Speaketh)
- Datamation sort
  - 3.52 seconds (7 seconds previous record)
- Indy MinuteSort
  - 1.6 GB sorted in 56 seconds (1.08 GB previously)
- SPEC SFS
  - 8806 SPECnfs instead of 7023 SPECnfs with EFS
  - 12% increase with mostly small & synchronous writes on similar hardware

Directory Lookups
- Why this noticeable break?

Are You Pondering What I Am Pondering?
- Are these systems/approaches contradictory?
  - Recoverable virtual memory
    - Simpler is better and better performing
  - XFS
    - Way more complex is better and better performing

What Do You Think?