Filesystems – Metadata, Paths, & Caching Vivek Pai Princeton University.

2 Diskgedanken  Assuming you back up and restore files, what factors affect the time involved?  How are these factors changing?  What issues affect the rates of change?  How is total backup time changing over the years?  What is Occam’s razor?

3 Today’s Overview  Quiz recap  Finish up metadata, reliability  A little discussion of mounting, etc  Move on to performance

4 Quiz 1 Observations  I’m disappointed  Quizzes not yet graded, but…  Most people did poorly on question 1  Lots of dimensional analysis  Lots of sleepers, chatting, weird faces  Very little (too little) feedback in general  Open question – looking for a methodical approach

5 Occam’s Razor  From William of Occam (philosopher)  “entities should not be multiplied unnecessarily”  Often reduced to other statements  “one should not increase, beyond what is necessary, the number of entities required to explain anything”  “Make as few assumptions as possible”  “once you have eliminated all other possible explanations, what remains must be the answer”

6 A Reasonable Approach  Disk size: 40GB (20-80GB common)  File size: 10KB (5-20KB common)  Access time: 10ms (5-20ms common)  Assume 1 seek per file (reasonable)  100 files = 1MB, each access 0.01 sec, so 100 files take 1 sec (about 1MB/s)  So, 40GB at 1MB/s = 40K sec = 11+ hours
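The arithmetic on this slide can be checked with a short script. This is only a sketch of the estimate, using the slide's own numbers in decimal units; the function name `file_backup_seconds` is made up for illustration.

```python
# Back-of-envelope estimate for a file-by-file backup, using the slide's numbers.
DISK_BYTES = 40 * 10**9      # 40 GB disk (decimal units, as on the slide)
FILE_BYTES = 10 * 10**3      # 10 KB average file
ACCESS_SECONDS = 0.010       # 10 ms per access, one seek per file


def file_backup_seconds(disk_bytes, file_bytes, access_seconds):
    """Time to visit every file when each file costs exactly one access."""
    num_files = disk_bytes // file_bytes
    return num_files * access_seconds


seconds = file_backup_seconds(DISK_BYTES, FILE_BYTES, ACCESS_SECONDS)
hours = seconds / 3600
print(f"{seconds:.0f} s is about {hours:.1f} hours")
```

The estimate ignores transfer time entirely, which matches the slide's assumption that the seek dominates for 10KB files.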

7 Changes Over Time  Disk density doubling each year  Seek time dropping < 10% each year  File size growing slowly  Results  # of files grows faster than access time shrinks  Backup time increases
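A rough projection shows how these trends compound. The doubling rate comes from the slide; the 10% yearly access-time improvement is the slide's upper bound, and the 10% yearly file-size growth is purely an assumed stand-in for "growing slowly". The function name is hypothetical.

```python
def project_backup_hours(years, disk_gb=40.0, file_kb=10.0, access_ms=10.0,
                         disk_growth=2.0, file_growth=1.10, access_factor=0.90):
    """Return [(year, hours)] for a one-seek-per-file backup under fixed yearly trends.

    file_growth=1.10 is an assumption, not a figure from the slides.
    """
    rows = []
    for year in range(years + 1):
        num_files = disk_gb * 1e9 / (file_kb * 1e3)
        hours = num_files * (access_ms / 1e3) / 3600
        rows.append((year, hours))
        disk_gb *= disk_growth       # capacity doubles
        file_kb *= file_growth       # files grow slowly (assumed rate)
        access_ms *= access_factor   # seeks improve by at most ~10%
    return rows


for year, hours in project_backup_hours(3):
    print(f"year {year}: {hours:.1f} hours")
```

Under these assumptions the backup time grows by a factor of about 2 × 0.9 / 1.1 each year, so it keeps rising even though every individual component is improving.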

8 Most Common Answer  Disk size / maximum transfer rate  In other words, read sectors, not files  Can this be done?  Yes, if you have access to “raw” disk  Which means that you have “root” permission  And that the system has raw disk support  Faster than file-based dump/restore  No concept of files, however  What happens if you restore to a disk with a different geometry?

9 Linked Files (Alto)  File header points to 1st block on disk  Each block points to next  Pros  Can grow files dynamically  Free list is similar to a file  Cons  random access: horrible  unreliable: losing a block means losing the rest [Figure: file header pointing to a chain of blocks that ends in null]
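To see why random access is poor with linked allocation, here is a toy model in which each block stores its data plus the number of the next block. The class and method names are invented for this sketch and do not describe the Alto's actual on-disk format.

```python
class LinkedFileDisk:
    """Toy disk where each block stores (data, next_block_number)."""

    def __init__(self):
        self.blocks = {}   # block number -> (data, next block number or None)

    def write_file(self, block_numbers, chunks):
        """Store one chunk per block, chain the blocks, and return the head block."""
        for i, (num, chunk) in enumerate(zip(block_numbers, chunks)):
            nxt = block_numbers[i + 1] if i + 1 < len(block_numbers) else None
            self.blocks[num] = (chunk, nxt)
        return block_numbers[0]

    def read_nth(self, head, n):
        """Return (data, block_reads) for the n-th block of the file (0-based)."""
        current, reads = head, 0
        for _ in range(n):
            _, current = self.blocks[current]   # must read a block to find the next
            reads += 1
            if current is None:
                raise IndexError("file is shorter than requested")
        data, _ = self.blocks[current]
        return data, reads + 1


disk = LinkedFileDisk()
head = disk.write_file([7, 2, 9], ["a", "b", "c"])
print(disk.read_nth(head, 2))
```

Reaching block n costs n + 1 block reads, and removing any block from `disk.blocks` makes every later block unreachable, which is the reliability problem the slide mentions.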

10 Contiguous Allocation  Request in advance for the size of the file  Search bit map or linked list to locate a space  File header  first sector in file  number of sectors  Pros  Fast sequential access  Easy random access  Cons  External fragmentation  Hard to grow files

11 Single-Level Indexed Files or Extent-based Filesystems  A user declares max size  A file header holds an array of pointers to point to disk blocks  Pros  Can grow up to a limit  Random access is fast  Cons  Clumsy to grow beyond limit  Periodic cleanup of new files  Up-front declaration a real pain [Figure: file header with an array of pointers to disk blocks]

12 File Allocation Table (FAT)  Approach  A section of disk for each partition is reserved  One entry for each block  A file is a linked list of blocks  A directory entry points to the 1st block of the file  Pros  Simple  Cons  Always go to FAT  Wasting space [Figure: directory entry foo points to block 217; the FAT shows entries for blocks 0, 217, 399, and 619, holding the values 619, 399, and EOF]
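Following a file through the FAT can be sketched as below. The table contents are an assumption: they read the slide's figure as foo starting at block 217, with 217 pointing to 619, 619 to 399, and 399 marked EOF. The `EOF` sentinel value and function name are illustrative only.

```python
EOF = -1   # illustrative end-of-file marker, not a real FAT constant


def fat_chain(fat, first_block):
    """Return the list of blocks in a file by following FAT entries to EOF."""
    chain, seen = [], set()
    block = first_block
    while block != EOF:
        if block in seen:
            raise ValueError(f"corrupt FAT: block {block} appears twice")
        seen.add(block)
        chain.append(block)
        block = fat[block]   # every step is a lookup in the FAT itself
    return chain


# Assumed reading of the slide's figure.
fat = {217: 619, 619: 399, 399: EOF}
directory = {"foo": 217}

print(fat_chain(fat, directory["foo"]))
```

Because the links live in the table rather than in the data blocks, the chain can be walked without touching the file's data, but every access still starts with the FAT.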

13 Multi-Level Indexed Files (Unix)  13 Pointers in a header  10 direct pointers  11: 1-level indirect  12: 2-level indirect  13: 3-level indirect  Pros & Cons  In favor of small files  Can grow  Limit is 16GB, and large files need lots of seeks  What does it take to reach block 23, 5, or 340? [Figure: header pointers 1 and 2 lead directly to data blocks; pointers 11, 12, and 13 lead to data through one, two, and three levels of indirect blocks]
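The question about blocks 23, 5, and 340 can be worked through in code. The slides do not state a block or pointer size; this sketch assumes 1KB blocks and 4-byte pointers (256 pointers per indirect block), which happens to give a maximum size close to the 16G limit quoted. Block indices are treated as 0-based, which is also an assumption.

```python
BLOCK_SIZE = 1024                              # assumed: 1 KB blocks
POINTER_SIZE = 4                               # assumed: 4-byte block pointers
PTRS_PER_BLOCK = BLOCK_SIZE // POINTER_SIZE    # 256 pointers per indirect block
NUM_DIRECT = 10                                # from the slide


def locate(block_index):
    """Return (pointer kind, indirect blocks read) for a 0-based logical block."""
    if block_index < 0:
        raise ValueError("block index must be non-negative")
    if block_index < NUM_DIRECT:
        return ("direct", 0)
    block_index -= NUM_DIRECT
    span = PTRS_PER_BLOCK
    for level in (1, 2, 3):
        if block_index < span:
            return (f"{level}-level indirect", level)
        block_index -= span
        span *= PTRS_PER_BLOCK
    raise ValueError("beyond maximum file size")


def max_file_bytes():
    """Largest file addressable through all 13 header pointers."""
    blocks = NUM_DIRECT + sum(PTRS_PER_BLOCK ** k for k in (1, 2, 3))
    return blocks * BLOCK_SIZE


for index in (23, 5, 340):
    print(index, locate(index))
print(f"max file size is about {max_file_bytes() / 2**30:.2f} GiB")
```

Under these assumptions block 5 is reached directly, block 23 needs one extra read of an indirect block, and block 340 needs two, each of which may be another seek.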

14 Reliability In Disk Systems  Make sure certain actions have occurred before function completes  Known as “synchronous” operation  Ex: make sure new inode is on disk & that the directory has been modified before declaring a file creation is complete  Drawback: speed  Some ops easily asynchronous: access time  Some filesystems don’t care: Linux ext2fs

15 Recovery After Failure  Need to ensure consistency  Does free bitmap match tree walk?  Do reference counts in inodes match directory entries?  Do blocks appear in multiple inodes?  Time for this kind of recovery grows with disk size  Clean shutdown – mark as such, no recovery needed
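The three consistency questions above can be expressed as checks over a toy filesystem. The data layout here (dictionaries for inodes and directories, a set of free block numbers) is invented for the sketch and is far simpler than what a real checker such as fsck works with.

```python
from collections import Counter


def check_consistency(inodes, directories, free_blocks, total_blocks):
    """Return a list of (problem, id) tuples for a toy filesystem.

    inodes:      {inode_number: {"links": int, "blocks": [block numbers]}}
    directories: {directory_name: {entry_name: inode_number}}
    free_blocks: set of block numbers the free bitmap marks as free
    """
    problems = []

    # Do reference counts in inodes match directory entries?
    refs = Counter(ino for entries in directories.values()
                   for ino in entries.values())
    for ino, info in sorted(inodes.items()):
        if info["links"] != refs.get(ino, 0):
            problems.append(("link_count", ino))

    # Do blocks appear in multiple inodes?
    owners = Counter(b for info in inodes.values() for b in info["blocks"])
    for block, count in sorted(owners.items()):
        if count > 1:
            problems.append(("shared_block", block))

    # Does the free bitmap match what the tree walk found in use?
    for block in range(total_blocks):
        in_use = block in owners
        if in_use and block in free_blocks:
            problems.append(("used_but_marked_free", block))
        if not in_use and block not in free_blocks:
            problems.append(("unused_but_not_free", block))
    return problems


inodes = {1: {"links": 1, "blocks": [0, 1]},
          2: {"links": 2, "blocks": [1, 2]}}
directories = {"/": {"a": 1, "b": 2}}
print(check_consistency(inodes, directories, free_blocks={2, 3}, total_blocks=5))
```

Every check walks all inodes or all blocks, which is why recovery time grows with disk size.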

16 Reducing Synchronous Times  Write to a faster storage  Nonvolatile memory – expensive, requires some additional OS/firmware support  Write to a special disk or section – logging  Only have to examine log when recovering  Eventually have to put information in place  Some information dies in the log itself  Write in a special order  Write metadata in a way that is consistent but possibly recovers less

17 Challenges  Unix filesystem has great flexibility  Extent-based filesystems have speed  Seeks kill performance – locality  Bitmaps show contiguous free space  Linked lists easy to search  How do you perform backup/restore?

18 Bigger, Faster, Stronger  Making individual disks larger is hard  Throw more disks at the problem  Capacity increases  Effective access speed may increase  Probability of failure also increases  Use some disks to provide redundancy  Generally assume a fail-stop model  Fail-stop versus Byzantine failures
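The claim that failure probability rises with the number of disks follows from a simple calculation, sketched here under the assumption that disks fail independently. The 5% per-disk figure is a made-up illustration, not a measured rate.

```python
def prob_any_failure(p_single, num_disks):
    """Probability that at least one of num_disks independent disks fails."""
    return 1.0 - (1.0 - p_single) ** num_disks


# Hypothetical 5% chance that a given disk fails during some period.
for n in (1, 2, 10):
    print(n, round(prob_any_failure(0.05, n), 4))
```

Without redundancy, an array is only as reliable as all of its disks together, which is the motivation for spending some disks on redundancy.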

19 RAID (Redundant Array of Inexpensive Disks)  Main idea  Store the error correcting codes on other disks  General error correcting codes are too powerful  Use XORs or single parity  Upon any single-disk failure, one can rebuild the lost block by XORing the corresponding blocks on the surviving disks, including parity  Pros  Reliability  High bandwidth  Cons  The controller is complex [Figure: RAID controller computing XOR across the disks]
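Single-parity recovery with XOR can be shown in a few lines. This is a minimal sketch with equal-sized in-memory blocks standing in for disks; the helper name `xor_blocks` is made up.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data "disks" holding one block each, plus one parity block.
data = [b"disk", b"zero", b"raid"]
parity = xor_blocks(data)

# Suppose disk 1 fails: XOR the survivors with the parity to rebuild it.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)
```

The same operation computes the parity and rebuilds a lost block, but it can only recover from one failed disk per stripe and relies on knowing which disk failed (the fail-stop assumption).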

20 Synopsis of RAID Levels  RAID Level 0: Non-redundant (striping)  RAID Level 1: Mirroring  RAID Level 2: Bit-interleaved, ECC  RAID Level 3: Byte-interleaved, parity  RAID Level 4: Block-interleaved, parity  RAID Level 5: Block-interleaved, distributed parity

21 Did RAID Work?  Performance: yes  Reliability: yes  Cost: no  Controller design complicated  Fewer economies of scale  High-reliability environments don’t care  Now also software implementations

22 RAID’s Real Benefit  Partly addresses the failure problem  Backup/restore less of an issue  Failed disk “rebuilt” at sector level  Lower performance during rebuild, but system still on-line  Still not perfect  Geographic problems  Failure during rebuild

23 Namespace  Basically, the filesystem hierarchy  Provides a convenient way of accessing things  Files  Devices  Pseudo-“filesystems”  In Unix, a nice, consistent namespace  No “drive names”

24 A Sample File Tree [Figure: file tree with / at the root, then bin/, boot/, proc/, usr/, then home/, local/, then mariah/, vivek/]

25 What If You Have Two Disks? [Figure: file tree with / at the root, then bin/, boot/, proc/, usr/, then home/, local/, then mariah/, vivek/]

26 As Mariah’s Files Grow? [Figure: file tree with / at the root, then bin/, boot/, proc/, usr/, then home/, local/, then mariah/, vivek/]

27 Mount Points [Figure: file tree with / at the root, then bin/, boot/, proc/, usr/, then home/, local/, then mariah/, vivek/]

28 Mount Points  Original directories get “hidden”  Traversal is transparent to user  OS keeps track of various disks (devices)  But what happens with big disks?  Partition (split) them into several logical devices – easier to manage, safer, etc  Home directories in one partition, startup-related files/programs in another, etc

29 Paths  Each process has “current directory”  Convenient shorthand  Paths that start with “/” are absolute  Paths that do not start with “/” are relative to current directory  Path lookup is potentially expensive  It’s also repetitive  Amenable to caching  Metadata cache from assigned reading
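Path lookup and the benefit of caching it can be sketched with a toy resolver. Everything here is hypothetical: the directory table, the inode numbers, and the `PathResolver` class are invented for illustration and are not the metadata cache from the assigned reading.

```python
ROOT_INODE = 2   # illustrative choice of root inode number


class PathResolver:
    """Resolve paths one component at a time, caching (directory, name) lookups."""

    def __init__(self, directories):
        self.directories = directories   # directory inode -> {name: inode}
        self.name_cache = {}             # (directory inode, name) -> inode
        self.directory_reads = 0         # counts lookups that missed the cache

    def resolve(self, path, cwd_inode=ROOT_INODE):
        # Absolute paths start at the root; relative ones at the current directory.
        current = ROOT_INODE if path.startswith("/") else cwd_inode
        for name in path.split("/"):
            if not name:
                continue
            key = (current, name)
            if key not in self.name_cache:
                self.directory_reads += 1            # would be a disk access
                self.name_cache[key] = self.directories[current][name]
            current = self.name_cache[key]
        return current


# Hypothetical tree: /usr/home/vivek/notes.txt
directories = {2: {"usr": 3}, 3: {"home": 4}, 4: {"vivek": 5}, 5: {"notes.txt": 6}}
resolver = PathResolver(directories)
print(resolver.resolve("/usr/home/vivek/notes.txt"), resolver.directory_reads)
print(resolver.resolve("/usr/home/vivek/notes.txt"), resolver.directory_reads)
print(resolver.resolve("notes.txt", cwd_inode=5), resolver.directory_reads)
```

The first lookup reads one directory per path component; repeated lookups, including the relative one, are served entirely from the cache. A real cache would also need invalidation when directories change, which this sketch omits.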

30 Finding Paths  In Unix, directory contains inode #  If two directories contain same #, file is accessible via different paths (and names)  Adding another name into the filespace is called “linking” (via ‘ln’ command)  But the directory is a file  What happens if a directory gets linked?

31 Consider The Following [Figure: file tree with / at the root, then bin/, boot/, proc/, usr/, then home/, local/, then mariah/, vivek/]

32 Various Solutions  Only allow “root” to link to directory  Can still be useful  Hopefully root knows when to do it  Limit the number of iterations  Pick some “large” maximum  Terminate traversal after that  Detect loops  Cost? Utility?
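Two of the options above, a depth limit and loop detection, can be combined in one traversal sketch. The directory table and the `list_paths` function are hypothetical; the example assumes a directory entry `up` that has been hard-linked back to an ancestor directory.

```python
def list_paths(directories, root_inode, max_depth=64):
    """Walk a directory graph; return (paths found, paths that would loop).

    directories maps a directory inode to {name: inode}. Inodes missing
    from the table are treated as ordinary files.
    """
    paths, loops = [], []

    def visit(inode, prefix, ancestors):
        if len(ancestors) > max_depth:
            return                              # the "large maximum" cutoff
        for name, child in sorted(directories[inode].items()):
            child_path = prefix + name
            if child in ancestors:              # loop: child is its own ancestor
                loops.append(child_path)
                continue
            paths.append(child_path)
            if child in directories:
                visit(child, child_path + "/", ancestors | {child})

    visit(root_inode, "/", frozenset({root_inode}))
    return paths, loops


# Hypothetical tree where /home/vivek/up is linked back to /home (inode 3).
directories = {2: {"home": 3}, 3: {"vivek": 4}, 4: {"up": 3}}
print(list_paths(directories, root_inode=2))
```

The cost of detection here is carrying the set of ancestor inodes down the walk; the depth limit alone is cheaper but may cut off legitimately deep trees or repeat work before it triggers.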

33 Does It “Do What You Want”  I create ~vivek/work/cal/now/mtgs  I create a link to it via ~vivek/mtgs  The month advances, and ~vivek/work/cal/now/mtgs becomes ~vivek/cal/Sep01/mtgs  Create new ~vivek/work/cal/now/mtgs  To what does ~vivek/mtgs point?

34 Symbolic Link  Created via “ln -s” command  Dynamically interpreted each use  Does not create a standard directory entry for the target’s inode; instead  Link is a file containing the target’s path  May be stored in inode if link is short  Standard looping rules apply
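The difference between the two kinds of link in the calendar scenario can be tried out directly. This sketch uses plain files in a temporary directory rather than the slide's actual paths, and the file names are made up; it assumes a filesystem that supports both `os.link` and `os.symlink`.

```python
import os
import tempfile


def link_demo():
    """Return (text seen via hard link, text seen via symlink) after a rename."""
    with tempfile.TemporaryDirectory() as base:
        now = os.path.join(base, "now_mtgs")
        hard = os.path.join(base, "hard_mtgs")
        soft = os.path.join(base, "soft_mtgs")

        with open(now, "w") as f:
            f.write("old meetings")
        os.link(now, hard)       # second name for the same inode
        os.symlink(now, soft)    # small file that stores the path to 'now'

        # The month advances: the old file is renamed and a new one is created.
        os.rename(now, os.path.join(base, "Sep01_mtgs"))
        with open(now, "w") as f:
            f.write("new meetings")

        with open(hard) as f:
            hard_text = f.read()
        with open(soft) as f:
            soft_text = f.read()
        return hard_text, soft_text


print(link_demo())
```

The hard link keeps following the original inode, so it still shows the old contents, while the symbolic link is reinterpreted on each use and finds whatever currently lives at the stored path.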
