CS 4284 Systems Capstone Godmar Back Disks & File Systems.

CS 4284 Spring 2013 Filesystems Consistency & Logging

Filesystems & Managing Faults
Filesystem’s promise:
  – Persistence
  – Defined behavior in the presence of faults
Need a failure model:
  – Define acceptable failures (disk head hits dust particle, scratches disk – you will lose some data)
  – Define which failure outcomes are unacceptable
Need a strategy to avoid unacceptable failure outcomes:
  – Proactive (avoid them)
  – Reactive (recover from them)

Proactive Methods
Use atomic changes
  – Construct larger atomic changes from the small atomic units available (i.e., single-sector writes)
Idea: use state duplication

    write(data)
        version++
        write_atomic(versionloc1, version)
        write_multiple(dataloc1, data)
        write_atomic(versionloc2, version)
        write_multiple(dataloc2, data)

    read()
        v1 = read(versionloc1)
        v2 = read(versionloc2)
        data1 = read(dataloc1)
        data2 = read(dataloc2)
        return v1 == v2 ? data1 : data2;
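To see why the read rule works, the pseudocode above can be replayed against a toy disk that crashes after a chosen number of sector writes. This is only a sketch: the `Disk` class, the `Crash` exception, the sector layout, and `crash_outcomes` are assumptions made for illustration, not part of the slides.

```python
class Crash(Exception):
    """Simulated power failure."""


class Disk:
    """Toy disk: each sector write is atomic, but a crash can hit between writes."""

    def __init__(self, nsectors):
        self.sectors = [0] * nsectors
        self.writes_left = None  # None means "never crash"

    def write_sector(self, index, value):
        if self.writes_left is not None:
            if self.writes_left == 0:
                raise Crash()
            self.writes_left -= 1
        self.sectors[index] = value


N = 3  # sectors per data copy (hypothetical layout)
VERSIONLOC1, DATALOC1 = 0, 1
VERSIONLOC2, DATALOC2 = 1 + N, 2 + N


def write_multiple(disk, start, data):
    # Not atomic as a whole: a crash may leave only a prefix written.
    for offset, value in enumerate(data):
        disk.write_sector(start + offset, value)


def write(disk, data):
    version = disk.sectors[VERSIONLOC1] + 1
    disk.write_sector(VERSIONLOC1, version)
    write_multiple(disk, DATALOC1, data)
    disk.write_sector(VERSIONLOC2, version)
    write_multiple(disk, DATALOC2, data)


def read(disk):
    v1, v2 = disk.sectors[VERSIONLOC1], disk.sectors[VERSIONLOC2]
    data1 = disk.sectors[DATALOC1:DATALOC1 + N]
    data2 = disk.sectors[DATALOC2:DATALOC2 + N]
    # Equal versions: copy 1 was written completely. Otherwise copy 2 is untouched.
    return data1 if v1 == v2 else data2


def crash_outcomes(old, new):
    """Crash after 0, 1, ..., 2 + 2N sector writes; collect what read() returns."""
    results = []
    for allowed in range(2 + 2 * N + 1):
        disk = Disk(2 + 2 * N)
        write(disk, old)  # establish a clean starting state
        disk.writes_left = allowed
        try:
            write(disk, new)
        except Crash:
            pass
        results.append(read(disk))
    return results
```

Under these assumptions, every crash point yields either the complete old data or the complete new data, never a mixture.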

Reactive Methods
Option 1: On failure, retry entire computation
  – Not a good model for persistent filesystems
Option 2: Define recovery procedure to deal with unacceptable failures:
  – Recovery moves from an incorrect state A to correct state B
  – Must understand possible incorrect states A after crash!
  – A is like “snapshot of the past”
  – Anticipating all states A is difficult
For file systems, option 2 means ensuring that on-disk changes are ordered such that if a crash occurs after any step, a recovery program can either undo the change or complete it

Unix Filesystem Overview
[Figure: inode table (i0, i1, i2, i3) with file1, file2, dir1, and dir2; directory entries shown: (i1, “.”), (i3, “d”), (i0, “la”), (i3, “.”), (i0, “a”)]

Sensible Invariants
In a Unix-style file system, ideally, we would want that:
  – File & directory names are unique within parent directory
  – Free list/map accounts for all free objects, and all objects on free list are really free
  – All data blocks belong to exactly one file (only one pointer to them)
  – Inode’s ref count reflects exact number of directory entries pointing to it
  – Don’t show previously deleted data to applications
Some of these invariants
  – can be reestablished via fsck after crash
  – must be maintained proactively before crash (either because fsck couldn’t fix them or because it would lead to unacceptable state if user skipped fsck)
  – are not maintained because of cost

Crash Recovery (fsck)
After crash, fsck runs and performs the equivalent of mark-and-sweep garbage collection
Follow, from root directory, directory entries
  – Count how many entries point to each inode, adjust ref count
Recover unreferenced inodes:
  – Scan inode array and check that all inodes marked as used are referenced by a dir entry
  – Move others to /lost+found
Recompute free list:
  – Follow direct blocks + single + double + triple indirect blocks, mark all blocks so reached as used; free list/map is the complement
In following discussion, keep in mind what fsck could and could not fix!
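The three passes above can be sketched on a toy in-memory file system. The data layout here is an assumption chosen for brevity: each inode carries a flat list of block numbers (no indirect blocks), directories are plain lists of (inode, name) pairs, and orphans are only reported rather than linked into /lost+found. A real fsck works on the on-disk structures and handles many more cases.

```python
def fsck(inodes, directories, root, nblocks):
    """Toy mark-and-sweep check.

    inodes:      {ino: {"refs": int, "blocks": [block numbers]}}, all marked as used
    directories: {ino: [(child_ino, name), ...]} for inodes that are directories
    Returns (orphans, free_blocks) and corrects ref counts in place.
    """
    # Pass 1: walk directory entries from the root and count references.
    counts = {ino: 0 for ino in inodes}
    visited, stack = set(), [root]
    while stack:
        d = stack.pop()
        if d in visited:
            continue
        visited.add(d)
        for child, _name in directories.get(d, []):
            if child not in inodes:
                continue  # dangling entry; this sketch just skips it
            counts[child] += 1
            if child in directories:
                stack.append(child)

    # Pass 2: used inodes that nothing points to would go to /lost+found.
    orphans = sorted(ino for ino, n in counts.items() if n == 0)
    for ino, n in counts.items():
        if n > 0:
            inodes[ino]["refs"] = n

    # Pass 3: every block reachable from a used inode is in use;
    # the free list is the complement.
    used = {b for inode in inodes.values() for b in inode["blocks"]}
    free_blocks = [b for b in range(nblocks) if b not in used]
    return orphans, free_blocks
```

Orphaned inodes keep their blocks marked as used, matching the idea that they are moved to /lost+found rather than discarded.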

Example 1: file create
On create(“foo”), have to
  1. Scan current working dir for entry “foo” (fail if found); else find empty slot in directory for new entry
  2. Allocate an inode #in
  3. Insert pointer to #in in directory: (#in, “foo”)
  4. Write a) inode & b) directory back
What happens if crash after 1, 2, 3, or 4a), 4b)?
Does order of inode vs directory write back matter?
Rule: never write persistent pointer to object that’s not (yet) persistent
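The effect of the write-back order in step 4 can be tried out on a toy model. In this sketch the "disk" is a dict holding a set of persistent inode numbers and a name-to-inode directory; `create`, `fresh_disk`, and the two checker functions are hypothetical helpers, and the crash is modeled by simply stopping after `crash_after` writes.

```python
def fresh_disk():
    # Inode 2 is the directory itself (an arbitrary choice for this sketch).
    return {"inodes": {2}, "directory": {".": 2}}


def create(disk, name, ino, order, crash_after):
    """Persist the inode and directory writes in `order`; crash after `crash_after` writes."""
    writes = {
        "inode": lambda: disk["inodes"].add(ino),
        "directory": lambda: disk["directory"].__setitem__(name, ino),
    }
    for step in order[:crash_after]:
        writes[step]()


def dangling_entries(disk):
    """Directory entries that point to an inode that never reached the disk."""
    return sorted(n for n, i in disk["directory"].items() if i not in disk["inodes"])


def unreferenced_inodes(disk):
    """Persistent inodes that no directory entry points to (fsck can reclaim these)."""
    return sorted(disk["inodes"] - set(disk["directory"].values()))
```

Writing the inode first and crashing before the directory write leaves an unreferenced inode, which fsck can recover. Writing the directory first and crashing leaves a persistent pointer to a non-persistent object, which is what the rule forbids.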

Example 2: file unlink
To unlink(“foo”), must
  1. Find entry “foo” in directory
  2. Remove entry “foo” in directory
  3. Find inode #in corresponding to it, decrement #ref count
  4. If #ref count == 0, free all blocks of file
  5. Write back inode & directory
Q.: what’s the correct order in which to write back inode & directory?
Q.: what can happen if free blocks are reused before inode’s written back?
Rule: first persistently nullify pointer to any object before freeing it (object = freed blocks & inode)

Example 3: file rename
To rename(“foo”, “bar”), must
  1. Find entry (#in, “foo”) in directory
  2. Check that “bar” doesn’t already exist
  3. Remove entry (#in, “foo”)
  4. Add entry (#in, “bar”)
Suppose crash after directory block containing old “foo” was written back, but before “bar” entry was written back: the file is then reachable under neither name

Example 3a: file rename
To rename(“foo”, “bar”), conservatively
  1. Find entry (#i, “foo”) in directory
  2. Check that “bar” doesn’t already exist
  3. Increment ref count of #i
  4. Add entry (#i, “bar”) to directory
  5. Remove entry (#i, “foo”) from directory
  6. Decrement ref count of #i
Worst case: have old & new names to refer to file
Rule: never nullify pointer before setting a new pointer
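The difference between the two rename orders shows up when a crash is injected after each step. The sketch below assumes that each step reaches the disk separately and in the listed order; the step names, the inode number 7, and the `run` helper are made up for illustration.

```python
CONSERVATIVE = ["inc_ref", "add_bar", "remove_foo", "dec_ref"]  # Example 3a, steps 3-6
NAIVE = ["remove_foo", "add_bar"]                               # Example 3, steps 3-4


def run(steps, crash_after):
    """Apply the first `crash_after` steps, then 'crash'; return the on-disk state."""
    fs = {"dir": {"foo": 7}, "refs": 1}
    actions = {
        "inc_ref": lambda: fs.update(refs=fs["refs"] + 1),
        "dec_ref": lambda: fs.update(refs=fs["refs"] - 1),
        "add_bar": lambda: fs["dir"].update(bar=7),
        "remove_foo": lambda: fs["dir"].pop("foo"),
    }
    for step in steps[:crash_after]:
        actions[step]()
    return fs


if __name__ == "__main__":
    for k in range(len(CONSERVATIVE) + 1):
        print("conservative, crash after", k, "steps:", run(CONSERVATIVE, k))
    print("naive, crash after 1 step:", run(NAIVE, 1))
```

In the conservative order the file always keeps at least one name, and the ref count is never lower than the number of entries, so a crash can at worst leave the count too high, which fsck can correct.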

Example 4: file growth
Suppose file_write() is called.
  – First, find block at offset
Case 1: metadata already exists for block (file is not grown)
  – Simply write data block
Case 2: must allocate block, must update metadata (direct block pointer, or indirect block pointer)
  – Must write changed metadata (inode or index block) & data
Both writeback orders can lead to acceptable failures:
  – File data first, metadata next – may lose some data on crash
  – Metadata first, file data next – may see previous user’s deleted data after crash (very expensive to avoid – would require writing all data synchronously)

FFS’s Consistency Model
Berkeley FFS (Fast File System) formalized rules for file system consistency
FFS acceptable failures:
  – May lose some data on crash
  – May see someone else’s previously deleted data
      Applications must zero data out (and fsync) if they wish to avoid this
  – May have to spend time to reconstruct free list
  – May find unattached inodes → lost+found
Unacceptable failures:
  – Inability to bring filesystem back into consistent state via fsck
  – After crash, get active access to someone else’s data
      Either by pointing at reused inode or reused blocks
FFS uses 2 synchronous writes on each metadata operation that creates/destroys inodes or directory entries, e.g., creat(), unlink(), mkdir(), rmdir()
  – Note: assumption here is that eviction order in underlying buffer cache is not controllable, hence need for synchronous writes

Write Ordering & Logging
Problem (1) with using synchronous writes
  – Updates proceed at disk speed rather than CPU/memory speed
Problem (2) with using fsck
  – Complexity proportional to used portion of disk
  – Takes several hours to check GB-sized modern disks
In the early 90s, approaches were developed that
  – Reduced the need for synchronous writes
  – Avoided need for fsck after crash
Two classes of approaches:
  – Write ordering (aka Soft Updates): BSD; the elegant approach
  – Journaling (aka Logging): used in VxFS, NTFS, JFS, HFS+, ext3, reiserfs

Write Ordering
Instead of synchronously writing, record dependency in buffer cache
  – On eviction, write out dependent blocks before evicted block: disk will always have a consistent or repairable image
  – Repairs can be done in parallel to using the filesystem – don’t require delay on system reboot
Example:
  – Must write block containing new inode before block containing changed directory pointing at inode
Can completely eliminate need for synchronous writes
Can do deletes in background after zeroing out directory entry & noting dependency
Can provide additional consistency guarantees: e.g., make data blocks dependent on metadata blocks
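The dependency bookkeeping can be sketched with a toy buffer cache. This is an illustration only: the `BufferCache` class and its method names are assumptions, dependencies are tracked per whole block, and the sketch assumes they are acyclic.

```python
class BufferCache:
    """Toy write-back cache that honors 'write X before Y' dependencies."""

    def __init__(self):
        self.dirty = {}     # block id -> contents not yet on disk
        self.deps = {}      # block id -> ids of blocks that must reach disk first
        self.disk = {}      # what has actually been written
        self.disk_log = []  # order in which blocks reached the disk

    def write(self, block, data, after=()):
        """Modify a block in memory; `after` lists blocks that must be written first."""
        self.dirty[block] = data
        self.deps.setdefault(block, set()).update(after)

    def evict(self, block):
        """Write a dirty block back, flushing its prerequisites before it."""
        if block not in self.dirty:
            return
        for dep in sorted(self.deps.pop(block, ())):
            self.evict(dep)
        self.disk[block] = self.dirty.pop(block)
        self.disk_log.append(block)


cache = BufferCache()
cache.write("inode_block", "new inode")
cache.write("dir_block", "entry -> new inode", after=["inode_block"])
cache.evict("dir_block")  # forces inode_block out first
```

No write in this sketch is synchronous: the creating operation returns as soon as both blocks are dirty in memory, and the ordering is enforced only when something is evicted.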

Write Ordering: Cyclic Dependencies
Tricky case: A should be written before B, but B should be written before A?
  – Must unravel by introducing intermediate versions
Recommended reading: see [Ganger 2000] for more information

Logging File Systems
Idea from databases: keep track of changes
  – “Write-ahead log” or “journaling”: modifications are first written to log before they are written to actually changed locations
  – Reads bypass log
After crash, trace through log and
  – redo completed metadata changes (e.g., created an inode & updated directory)
  – undo partially completed metadata changes (e.g., created an inode, but didn’t update directory)
Log must be kept in persistent storage
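A minimal sketch of log replay follows, under one simplifying assumption that is not from the slides: in-place writes happen only after a transaction's commit record is in the log, so a partially completed change never touched its final location and "undo" amounts to ignoring it. The record format and the function names are hypothetical.

```python
def log_transaction(log, txid, writes, commit=True):
    """Append a transaction's writes to the log, optionally followed by its commit record."""
    for block, value in writes:
        log.append(("write", txid, block, value))
    if commit:
        log.append(("commit", txid))


def recover(log, disk):
    """Replay the log after a crash: redo committed transactions, skip the rest."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _, block, value = rec
            disk[block] = value  # redo is idempotent: replaying twice is harmless
    return disk


log = []
# Transaction 1 finished: created an inode and updated the directory.
log_transaction(log, 1, [("inode 7", "foo's inode"), ("dir block", "(7, foo)")])
# Transaction 2 was cut off by the crash: inode logged, directory update never logged.
log_transaction(log, 2, [("inode 8", "bar's inode")], commit=False)

disk_after_recovery = recover(log, {})
```

After recovery the disk holds both blocks of transaction 1 and nothing from transaction 2, so the create either happened completely or not at all.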

Logging Issues
How much does logging slow normal operation down?
Log writes are sequential
  – Can be fast, especially if separate disk is used
  – Subtlety: log actually does not have to be written synchronously, just in order & before the data to which it refers!
Can trade performance for consistency – write log synchronously if strong consistency is desired
Need to recycle log
  – After “sync()”, can restart log since disk is known to be consistent

Physical vs Logical Logging
What should be logged, and how?
Physical logging:
  – Store the physical state affected by a change: the block’s contents before the change, after it, or both
  – Choice: easier to redo (if after) or undo (if before)
  – Advantage: can retrofit logging independent of logic that manipulates metadata representation
Logical logging:
  – Store operation itself as log entry (rename(“a”, “b”))
  – More space-efficient, but can be tricky to implement since metadata operation logic is involved in log replay
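The two record styles might look as follows in a sketch. All class and field names here are assumptions for illustration; the physical record keeps both before and after images so that it supports both undo and redo, and the logical record relies on a toy directory object that knows how to perform the named operation.

```python
from dataclasses import dataclass


@dataclass
class PhysicalRecord:
    """Logs block contents; replay needs no knowledge of what the bytes mean."""
    block: int
    before: bytes
    after: bytes

    def redo(self, disk):
        disk[self.block] = self.after

    def undo(self, disk):
        disk[self.block] = self.before


@dataclass
class LogicalRecord:
    """Logs the operation; replay must run the file system's own metadata logic."""
    op: str
    args: tuple

    def redo(self, fs):
        getattr(fs, self.op)(*self.args)


class ToyDirectory:
    def __init__(self, entries):
        self.entries = dict(entries)

    def rename(self, old, new):
        self.entries[new] = self.entries.pop(old)


disk = {3: b"old"}
record = PhysicalRecord(block=3, before=b"old", after=b"new")
record.redo(disk)   # disk[3] is now b"new"
record.undo(disk)   # disk[3] is back to b"old"

directory = ToyDirectory({"a": 7})
LogicalRecord("rename", ("a", "b")).redo(directory)
```

The logical record is a few bytes regardless of how many blocks the rename touches, but replaying it only works if `rename` behaves the same during recovery as it did originally.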

Summary
File system consistency is important
Any file system design implies metadata dependency rules
Designer needs to reason about state of file system after crash & avoid unacceptable failures
  – Needs to take worst-case scenario into account: crash after every sector write
Most current file systems use logging
  – Various degrees of data/metadata consistency guarantees