Filesystems – Metadata, Paths, & Caching Vivek Pai Princeton University.



2 Diskgedanken
• Assuming you back up and restore files, what factors affect the time involved?
• How are these factors changing?
• What issues affect the rates of change?
• How is total backup time changing over the years?
• What is Occam’s razor?

3 Today’s Overview
• Quiz recap
• Finish up metadata, reliability
• A little discussion of mounting, etc.
• Move on to performance

4 Quiz 1 Observations
• I’m disappointed
• Quizzes not yet graded, but…
• Most people did poorly on question 1
    • Lots of dimensional analysis
    • Lots of sleepers, chatting, weird faces
    • Very little (too little) feedback in general
• Open question: looking for a methodical approach

5 Occam’s Razor
• From William of Occam (philosopher)
• “entities should not be multiplied unnecessarily”
• Often reduced to other statements
    • “one should not increase, beyond what is necessary, the number of entities required to explain anything”
    • “Make as few assumptions as possible”
    • “once you have eliminated all other possible explanations, what remains must be the answer”

6 A Reasonable Approach
• Disk size: 40 GB (20-80 GB common)
• File size: 10 KB (5-20 KB common)
• Access time: 10 ms (5-20 ms common)
• Assume 1 seek per file (reasonable)
• 100 files = 1 MB, each access 0.01 sec, so throughput is about 1 MB/s
• So, 40 GB at 1 MB/s = 40K sec = 11+ hours
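
The arithmetic on this slide can be checked with a short sketch. The function name is made up for illustration, and decimal units (1 GB = 10^9 bytes, 1 KB = 10^3 bytes) are an assumption that matches the slide's round numbers.

```python
def file_based_backup_seconds(disk_bytes, file_bytes, access_seconds):
    """Estimate backup time assuming one seek (one access) per file."""
    num_files = disk_bytes / file_bytes
    return num_files * access_seconds


# Slide values: 40 GB disk, 10 KB files, 10 ms per access.
seconds = file_based_backup_seconds(40e9, 10e3, 0.010)
hours = seconds / 3600
print(f"{seconds:.0f} s is roughly {hours:.1f} hours")
```

With these inputs the estimate comes out at about 40,000 seconds, in line with the slide's "11+ hours".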

7 Changes Over Time
• Disk density doubling each year
• Seek time dropping < 10% per year
• File size growing slowly
• Results
    • # of files grows faster than access time shrinks
    • Backup time increases

8 Most Common Answer
• Disk size / maximum transfer rate
• In other words, read sectors, not files
• Can this be done?
    • Yes, if you have access to “raw” disk
    • Which means that you have “root” permission
    • And that the system has raw disk support
• Faster than file-based dump/restore
• No concept of files, however
• What happens if you restore to a disk with a different geometry?
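
For comparison, the "disk size / maximum transfer rate" answer can be sketched the same way. The slides do not give a transfer rate, so the 20 MB/s figure below is purely a hypothetical placeholder; only the 40 GB disk size comes from the earlier slide.

```python
def raw_backup_seconds(disk_bytes, transfer_bytes_per_second):
    """Time to copy every sector sequentially, ignoring file boundaries."""
    return disk_bytes / transfer_bytes_per_second


ASSUMED_TRANSFER_RATE = 20e6  # hypothetical 20 MB/s, not a number from the slides

raw_seconds = raw_backup_seconds(40e9, ASSUMED_TRANSFER_RATE)
print(f"raw copy: {raw_seconds:.0f} s, about {raw_seconds / 60:.0f} minutes")
```

Under that assumed rate the raw copy takes minutes rather than hours, which is why reading sectors instead of files is attractive despite losing the notion of files.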

9 Linked Files (Alto)
• File header points to 1st block on disk
• Each block points to next
• Pros
    • Can grow files dynamically
    • Free list is similar to a file
• Cons
    • Random access: horrible
    • Unreliable: losing a block means losing the rest
[Figure: file header pointing to a chain of blocks ending in null]
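
A toy model of linked allocation shows why random access is costly. This is only a sketch: the `LinkedDisk` class and its in-memory dictionary are illustrative stand-ins, not the Alto's actual on-disk format.

```python
class LinkedDisk:
    """Toy disk where each block stores (data, next_block_number or None)."""

    def __init__(self):
        self.blocks = {}
        self.reads = 0  # counts simulated block reads

    def write_file(self, block_numbers, chunks):
        """Store one chunk per block and chain each block to the next."""
        for i, (num, chunk) in enumerate(zip(block_numbers, chunks)):
            nxt = block_numbers[i + 1] if i + 1 < len(block_numbers) else None
            self.blocks[num] = (chunk, nxt)
        return block_numbers[0]  # the file header records only the first block

    def read_nth(self, first_block, n):
        """Random access to chunk n must follow n links from the head."""
        current = first_block
        for _ in range(n):
            self.reads += 1
            current = self.blocks[current][1]
        self.reads += 1
        return self.blocks[current][0]


disk = LinkedDisk()
head = disk.write_file([7, 2, 9], ["a", "b", "c"])
last = disk.read_nth(head, 2)
```

Reading the third chunk touches all three blocks, and if block 2 were lost there would be no way to find block 9, which is the reliability problem the slide mentions.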

10 Contiguous Allocation
• Request in advance for the size of the file
• Search bit map or linked list to locate a space
• File header
    • First sector in file
    • Number of sectors
• Pros
    • Fast sequential access
    • Easy random access
• Cons
    • External fragmentation
    • Hard to grow files
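
The bitmap search can be sketched as a first-fit scan. First fit is just one possible policy, chosen here for brevity; the slide does not say which policy is used, and the bitmap contents are made up.

```python
def first_fit(bitmap, length):
    """Return the start of the first run of `length` free blocks, or None.

    bitmap[i] is True when block i is in use; length must be at least 1.
    """
    run_start, run_len = 0, 0
    for i, used in enumerate(bitmap):
        if used:
            run_len = 0
        else:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == length:
                return run_start
    return None


# Five blocks are free in total, but never three in a row.
bitmap = [True, False, False, True, False, False, True, False]
```

Here `first_fit(bitmap, 2)` finds space starting at block 1, while `first_fit(bitmap, 3)` fails even though five blocks are free: a small instance of external fragmentation.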

11 Single-Level Indexed Files or Extent-based Filesystems
• A user declares max size
• A file header holds an array of pointers to point to disk blocks
• Pros
    • Can grow up to a limit
    • Random access is fast
• Cons
    • Clumsy to grow beyond limit
    • Periodic cleanup of new files
    • Up-front declaration a real pain
[Figure: file header with pointers to disk blocks]

12 File Allocation Table (FAT)
• Approach
    • A section of disk for each partition is reserved
    • One entry for each block
    • A file is a linked list of blocks
    • A directory entry points to the 1st block of the file
• Pros
    • Simple
• Cons
    • Always go to FAT
    • Wasting space
[Figure labels: foo, 217, EOF, FAT]
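
Following a file through the table can be sketched as below. The table contents and block numbers are invented for illustration, and the `EOF` sentinel value is an assumption of this sketch rather than the encoding any real FAT variant uses.

```python
EOF = -1  # sentinel chosen for this sketch only


def file_blocks(fat, first_block):
    """Return the block numbers of a file by following its chain in the FAT."""
    blocks = []
    current = first_block
    while current != EOF:
        blocks.append(current)
        current = fat[current]  # one table lookup per block, no data-block read
    return blocks


# Hypothetical 8-entry table: a file occupying blocks 2 -> 5 -> 3.
fat = [EOF] * 8
fat[2] = 5
fat[5] = 3
fat[3] = EOF
```

The chain is the same linked list as in the linked-file scheme, but the links live in one table instead of inside the data blocks, so every access goes through the FAT.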

13 Multi-Level Indexed Files (Unix)
• 13 pointers in a header
    • 10 direct pointers
    • 11: 1-level indirect
    • 12: 2-level indirect
    • 13: 3-level indirect
• Pros & Cons
    • In favor of small files
    • Can grow
    • Limit is 16 GB, with lots of seeks
• What happens to reach block 23, 5, 340?
[Figure: header pointers leading to data blocks]
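
The closing question can be worked through in a sketch. The slide gives neither block size nor pointer size; 1 KB blocks and 4-byte pointers are assumed here because they reproduce the quoted limit of roughly 16 GB. Block numbers are treated as zero-based offsets within the file, which is also an assumption.

```python
BLOCK_SIZE = 1024    # assumed; not stated on the slide
POINTER_SIZE = 4     # assumed; not stated on the slide
PTRS_PER_BLOCK = BLOCK_SIZE // POINTER_SIZE  # 256
DIRECT = 10          # from the slide: 10 direct pointers


def indirection_level(block_index):
    """Return 0 for a direct block, or 1/2/3 for single/double/triple indirect."""
    if block_index < DIRECT:
        return 0
    block_index -= DIRECT
    for level in (1, 2, 3):
        span = PTRS_PER_BLOCK ** level  # blocks reachable at this level
        if block_index < span:
            return level
        block_index -= span
    raise ValueError("block index is beyond the maximum file size")


def max_file_bytes():
    """Largest file addressable with 10 direct and 3 indirect pointers."""
    blocks = DIRECT + PTRS_PER_BLOCK + PTRS_PER_BLOCK ** 2 + PTRS_PER_BLOCK ** 3
    return blocks * BLOCK_SIZE
```

Under these assumptions block 5 is direct, block 23 needs one index-block read before the data, and block 340 needs two, which is where the extra seeks for large files come from.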

14 Reliability In Disk Systems
• Make sure certain actions have occurred before function completes
    • Known as “synchronous” operation
    • Ex: make sure new inode is on disk and that the directory has been modified before declaring a file creation complete
• Drawback: speed
• Some operations are easily made asynchronous, e.g. access-time updates
• Some filesystems don’t care: Linux ext2fs

15 Recovery After Failure
• Need to ensure consistency
    • Does free bitmap match tree walk?
    • Do reference counts in inodes match directory entries?
    • Do blocks appear in multiple inodes?
• The cost of this kind of recovery grows with disk size
• Clean shutdown: mark as such, no recovery needed
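
Two of these checks (bitmap versus tree walk, and blocks shared between inodes) can be sketched over in-memory tables. The data layout is hypothetical and far simpler than what a real checker such as fsck works with.

```python
def check_consistency(free_bitmap, inodes):
    """Return a list of problems found.

    free_bitmap[i] is True when block i is marked free.
    inodes maps an inode number to the list of block numbers it uses.
    """
    problems = []
    owner = {}  # block number -> first inode seen using it
    for ino, blocks in inodes.items():
        for b in blocks:
            if b in owner:
                problems.append(f"block {b} appears in inodes {owner[b]} and {ino}")
            else:
                owner[b] = ino
            if free_bitmap[b]:
                problems.append(f"block {b} is used by inode {ino} but marked free")
    for b, is_free in enumerate(free_bitmap):
        if not is_free and b not in owner:
            problems.append(f"block {b} is marked used but no inode references it")
    return problems
```

Every inode and every bitmap entry is visited, so the work grows with the size of the disk, which matches the slide's point about recovery time.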

16 Reducing Synchronous Times
• Write to a faster storage
    • Nonvolatile memory: expensive, requires some additional OS/firmware support
• Write to a special disk or section: logging
    • Only have to examine log when recovering
    • Eventually have to put information in place
    • Some information dies in the log itself
• Write in a special order
    • Write metadata in a way that is consistent but possibly recovers less

17 Challenges
• Unix filesystem has great flexibility
• Extent-based filesystems have speed
• Seeks kill performance: locality
• Bitmaps show contiguous free space
• Linked lists easy to search
• How do you perform backup/restore?

18 Bigger, Faster, Stronger
• Making individual disks larger is hard
• Throw more disks at the problem
    • Capacity increases
    • Effective access speed may increase
    • Probability of failure also increases
• Use some disks to provide redundancy
    • Generally assume a fail-stop model
    • Fail-stop versus Byzantine failures
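
The claim that failure probability rises with the number of disks follows from a one-line formula, sketched below. It assumes disks fail independently, and the 3% per-disk figure is a made-up example, not a number from the slides.

```python
def p_any_failure(p_single, n_disks):
    """Probability that at least one of n independent disks fails."""
    return 1 - (1 - p_single) ** n_disks


ASSUMED_P = 0.03  # hypothetical per-disk failure probability over some period

for n in (1, 2, 5, 10):
    print(n, round(p_any_failure(ASSUMED_P, n), 4))
```

With ten such disks the chance of seeing some failure is roughly a quarter, which is the motivation for spending some of the disks on redundancy.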

19 RAID (Redundant Array of Inexpensive Disks)
• Main idea
    • Store the error correcting codes on other disks
    • General error correcting codes are too powerful
    • Use XORs or single parity
    • Upon any failure, one can recover the entire block from the spare disk (or any disk) using XORs
• Pros
    • Reliability
    • High bandwidth
• Cons
    • The controller is complex
[Figure labels: RAID controller, XOR]
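
Single-parity recovery with XOR can be sketched in a few lines. The three "disks" below are just equal-length byte strings invented for the example; a real controller works on sectors or stripes.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks (one per disk) plus a parity block on a fourth disk.
data = [b"disk", b"zero", b"four"]
parity = xor_blocks(data)

# Suppose the disk holding data[1] fails: XOR the survivors with the parity.
recovered = xor_blocks([data[0], data[2], parity])
```

Because XOR is its own inverse, combining the surviving blocks with the parity block yields exactly the missing block, and the same routine serves for both computing parity and rebuilding.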

20 Synopsis of RAID Levels
• RAID Level 0: Non-redundant (JBOD)
• RAID Level 1: Mirroring
• RAID Level 2: Bit-interleaved, ECC
• RAID Level 3: Byte-interleaved, parity
• RAID Level 4: Block-interleaved, parity
• RAID Level 5: Block-interleaved, distributed parity

21 Did RAID Work?
• Performance: yes
• Reliability: yes
• Cost: no
    • Controller design complicated
    • Fewer economies of scale
    • High-reliability environments don’t care
• Now also software implementations

22 RAID’s Real Benefit
• Partly addresses the failure problem
• Backup/restore less of an issue
    • Failed disk “rebuilt” at sector level
    • Lower performance during rebuild, but system still on-line
• Still not perfect
    • Geographic problems
    • Failure during rebuild

23 Namespace
• Basically, the filesystem hierarchy
• Provides a convenient way of accessing things
    • Files
    • Devices
    • Pseudo-“filesystems”
• In Unix, a nice, consistent namespace
    • No “drive names”

24 A Sample File Tree
[Figure: file tree. Root: /. Next level: bin/, boot/, proc/, usr/. Then: home/, local/. Then: mariah/, vivek/.]

25 What If You Have Two Disks?
[Figure: file tree. Root: /. Next level: bin/, boot/, proc/, usr/. Then: home/, local/. Then: mariah/, vivek/.]

26 As Mariah’s Files Grow?
[Figure: file tree. Root: /. Next level: bin/, boot/, proc/, usr/. Then: home/, local/. Then: mariah/, vivek/.]

27 Mount Points
[Figure: file tree. Root: /. Next level: bin/, boot/, proc/, usr/. Then: home/, local/. Then: mariah/, vivek/.]

28 Mount Points
• Original directories get “hidden”
• Traversal is transparent to user
• OS keeps track of various disks (devices)
• But what happens with big disks?
    • Partition (split) them into several logical devices: easier to manage, safer, etc.
    • Home directories in one partition, startup-related files/programs in another, etc.

29 Paths
• Each process has a “current directory”
    • Convenient shorthand
    • Paths that start with “/” are absolute
    • Paths that do not start with “/” are relative to the current directory
• Path lookup is potentially expensive
    • It’s also repetitive
    • Amenable to caching
    • Metadata cache from assigned reading
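
Path resolution and a simple lookup cache can be sketched as follows. The `PathResolver` class and its tables are hypothetical, and the cache has no invalidation, which a real name cache would need whenever a directory changes.

```python
class PathResolver:
    """Toy resolver: `directories` maps an inode number to {name: inode number}."""

    def __init__(self, directories, root_inode, cwd_inode):
        self.directories = directories
        self.root_inode = root_inode
        self.cwd_inode = cwd_inode
        self.cache = {}           # (starting inode, path) -> resolved inode
        self.directory_reads = 0  # counts simulated directory lookups

    def resolve(self, path):
        # Absolute paths start at the root, relative ones at the current directory.
        start = self.root_inode if path.startswith("/") else self.cwd_inode
        key = (start, path)
        if key in self.cache:
            return self.cache[key]
        inode = start
        for name in path.split("/"):
            if name == "":
                continue  # skip the empty piece before a leading "/"
            self.directory_reads += 1
            inode = self.directories[inode][name]
        self.cache[key] = inode
        return inode


dirs = {1: {"home": 2}, 2: {"vivek": 3}, 3: {"notes": 4}}
resolver = PathResolver(dirs, root_inode=1, cwd_inode=3)
```

Resolving `/home/vivek/notes` costs three directory lookups the first time and none the second, while the relative path `notes` costs one because it starts from the current directory.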

30 Finding Paths
• In Unix, a directory entry contains an inode number
• If two directory entries contain the same number, the file is accessible via different paths (and names)
• Adding another name into the filespace is called “linking” (via the ‘ln’ command)
• But the directory is a file
• What happens if a directory gets linked?

31 Consider The Following
[Figure: file tree. Root: /. Next level: bin/, boot/, proc/, usr/. Then: home/, local/. Then: mariah/, vivek/.]

32 Various Solutions
• Only allow “root” to link to directory
    • Can still be useful
    • Hopefully root knows when to do it
• Limit the number of iterations
    • Pick some “large” maximum
    • Terminate traversal after that
• Detect loops
    • Cost? Utility?
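
The last two options can be compared on a tiny directory graph that contains a loop. This is a sketch over a made-up adjacency table, not how any particular kernel or tool implements traversal.

```python
def walk_with_limit(children, start, max_depth):
    """Depth-first walk that stops descending at max_depth; returns visit count."""
    visits = 0
    stack = [(start, 0)]
    while stack:
        node, depth = stack.pop()
        visits += 1
        if depth < max_depth:
            for child in children.get(node, []):
                stack.append((child, depth + 1))
    return visits


def walk_detecting_loops(children, start):
    """Depth-first walk that remembers visited directories; returns the set seen."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already visited, so this edge closes a loop
        seen.add(node)
        stack.extend(children.get(node, []))
    return seen


# Hypothetical tree in which "vivek" contains a link back up to "home".
children = {"/": ["home"], "home": ["vivek"], "vivek": ["home"]}
```

The depth limit terminates but revisits the same directories several times, while loop detection visits each directory once at the cost of remembering everything seen so far, which is one way to read the slide's "Cost? Utility?".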

33 Does It “Do What You Want”
• I create ~vivek/work/cal/now/mtgs
• I create a link to it via ~vivek/mtgs
• The month advances, and ~vivek/work/cal/now/mtgs becomes ~vivek/cal/Sep01/mtgs
• Create new ~vivek/work/cal/now/mtgs
• To what does ~vivek/mtgs point?

34 Symbolic Link
• Created via the “ln -s” command
• Dynamically interpreted on each use
• Does not create a standard directory entry pointing to the target. Instead:
    • The link is a file containing the target file/path
    • May be stored in the inode if the link is short
• Standard looping rules apply
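
The difference between the two kinds of link can be tried out by replaying the calendar scenario from the previous slide in a scratch directory. This sketch assumes a POSIX system where `os.link` and `os.symlink` are permitted; the names `mtgs_hard` and `mtgs_soft` are invented so both links can coexist.

```python
import os
import tempfile


def link_demo():
    """Replay the mtgs scenario; return (same_inode, text via hard link, text via symlink)."""
    with tempfile.TemporaryDirectory() as home:
        now = os.path.join(home, "work", "cal", "now")
        os.makedirs(now)
        target = os.path.join(now, "mtgs")
        with open(target, "w") as f:
            f.write("old meetings")

        hard = os.path.join(home, "mtgs_hard")
        soft = os.path.join(home, "mtgs_soft")
        os.link(target, hard)     # second directory entry for the same inode
        os.symlink(target, soft)  # small file that stores the path string
        same_inode = os.stat(hard).st_ino == os.stat(target).st_ino

        # The month advances: move the directory away, then create a fresh one.
        os.makedirs(os.path.join(home, "cal"))
        os.rename(now, os.path.join(home, "cal", "Sep01"))
        os.makedirs(now)
        with open(target, "w") as f:
            f.write("new meetings")

        with open(hard) as f:
            via_hard = f.read()
        with open(soft) as f:
            via_soft = f.read()
        return same_inode, via_hard, via_soft
```

The hard link keeps following the original inode, so it still reads the old file, whereas the symbolic link is re-interpreted as a path on each use and therefore reaches the newly created file.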