File Systems Directories Revisited Shared Files


1 File Systems
Directories Revisited
Shared Files
Buffer Cache and File System Consistency

2 Disk Map in Unix

The purpose of the disk-map portion of the inode is to represent where the blocks of a file are on disk. I.e., it maps block numbers relative to the beginning of a file into block numbers relative to the beginning of the file system. Each block is 1024 (1K) bytes long. (It was 512 bytes long in the original Unix file system.) The data structure allows fast access when a file is accessed sequentially, and, with the help of caching, reasonably fast access when the file is used for paging (and other “random” access).

The disk map consists of 13 pointers to disk blocks (numbered 0 through 12), the first 10 of which point to the first 10 blocks of the file. Thus the first 10 KB of a file are accessed directly. If the file is larger than 10 KB, then pointer number 10 points to a disk block called an indirect block. This block contains up to 256 (4-byte) pointers to data blocks (i.e., 256 KB of data). If the file is bigger than this (256 KB + 10 KB = 266 KB), then pointer number 11 points to a double indirect block containing 256 pointers to indirect blocks, each of which contains 256 pointers to data blocks (64 MB of data). If the file is bigger than this (64 MB + 256 KB + 10 KB), then pointer number 12 points to a triple indirect block containing up to 256 pointers to double indirect blocks, each of which contains up to 256 pointers pointing to single indirect blocks, each of which contains up to 256 pointers pointing to data blocks (potentially 16 GB, although the real limit is 2 GB, since the file size, a signed number of bytes, must fit in a 32-bit word).

This data structure allows the efficient representation of sparse files, i.e., files whose content is mainly zeros. Consider, for example, the effect of creating an empty file and then writing one byte at location 2,000,000,000. Only four disk blocks are allocated to represent this file: a triple indirect block, a double indirect block, a single indirect block, and a data block. All pointers in the disk map, except for the last one, are zero. All bytes up to the last one read as zero. This is because a zero pointer is treated as if it points to a block containing all zeros: a zero pointer to an indirect block is treated as if it pointed to an indirect block filled with zero pointers, each of which is treated as if it pointed to a data block filled with zeros. However, one must be careful about copying such a file, since commands such as cp and tar actually attempt to write all the zero blocks!

Copyright © 2002 Thomas W. Doeppner. All rights reserved.
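The mapping described on this slide can be sketched in a few lines of Python. This is only an illustration under the slide's stated parameters (1 KB blocks, 256 pointers per indirect block, 10 direct pointers); the function names `disk_map_path` and `blocks_for_single_byte` are made up for the example and do not come from any Unix source.

```python
BLOCK_SIZE = 1024        # bytes per block, as stated on the slide
PTRS_PER_BLOCK = 256     # 4-byte pointers that fit in one 1 KB indirect block
NDIRECT = 10             # disk-map pointers 0..9 point straight at data blocks


def disk_map_path(offset):
    """Return (level, path) for the byte at `offset`.

    path[0] is the disk-map pointer number (0..12); any further entries
    are the indices followed inside each indirect block on the way down.
    """
    block = offset // BLOCK_SIZE
    if block < NDIRECT:
        return "direct", [block]
    block -= NDIRECT
    span = PTRS_PER_BLOCK  # data blocks reachable at the current level
    for depth, name in enumerate(["single", "double", "triple"], start=1):
        if block < span:
            indices = []
            for _ in range(depth):
                indices.append(block % PTRS_PER_BLOCK)
                block //= PTRS_PER_BLOCK
            return name, [NDIRECT + depth - 1] + indices[::-1]
        block -= span
        span *= PTRS_PER_BLOCK
    raise ValueError("offset beyond what the disk map can address")


def blocks_for_single_byte(offset):
    """Blocks allocated when one byte is written at `offset` in an empty file:
    one per indirect level followed, plus the data block itself."""
    _, path = disk_map_path(offset)
    return len(path)
```

Calling `blocks_for_single_byte(2_000_000_000)` walks through pointer 12 and three levels of indirection, which corresponds to the four blocks counted in the sparse-file example.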

3 Additional Enhancements
Performance depends on: how many disk accesses are needed to read a file?
Store some data in the inode itself
  Perhaps the whole file will fit in!
  Need only 1 disk access for a small file
Increase block size

4 Questions
What is an inode?
Does a directory have an inode?
What’s contained in a directory?

5 Unix Directory (V7)
Directories are files whose data is a list of filenames & inodes.

filename (14 bytes)   inode number (2 bytes)
.                     12
..                    14
etc                   134
mail                  346
crash                 5
init                  175
mount                 586

Max filename size = 14 chars

Example inode:
Owner            snt
Group            cpre308
Type             regular file
Perms            rwxr-xr-x
Accessed         oct pm
Modified         ….
Inode modified   …
Size             bytes
Disk addresses
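As a rough sketch of how such a directory could be decoded, the Python below treats each entry as a 16-byte record. It assumes the inode number comes first, followed by the NUL-padded name, in little-endian byte order, and that inode 0 marks an unused slot; the helper names `pack_entry` and `parse_directory` are invented for this example.

```python
import struct

# Assumed layout of one directory entry: 2-byte inode number, then a
# 14-byte NUL-padded filename (little-endian). 16 bytes in total.
DIRENT = struct.Struct("<H14s")


def pack_entry(name, inode):
    """Build one 16-byte entry; struct pads short names with NUL bytes."""
    raw = name.encode("ascii")
    if len(raw) > 14:
        raise ValueError("filenames are limited to 14 characters")
    return DIRENT.pack(inode, raw)


def parse_directory(data):
    """Turn the raw bytes of a directory file into {filename: inode}."""
    entries = {}
    for inode, raw in DIRENT.iter_unpack(data):
        if inode == 0:  # assumption: inode 0 means the slot is unused
            continue
        entries[raw.rstrip(b"\0").decode("ascii")] = inode
    return entries
```

Because every entry has the same size, a directory of n entries is simply n * 16 bytes of file data, which is why the 14-character filename limit exists.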

6 The UNIX V7 File System
[Figure: the steps in looking up /usr/ast/mbox]
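The lookup proceeds one path component at a time: read the current directory, find the component's inode number, and continue from that inode. The sketch below models this with a toy in-memory table; all inode numbers in `DIRECTORIES` are hypothetical and chosen only for illustration.

```python
# Hypothetical toy file system: inode number -> directory contents.
# Every inode number here is made up for the example.
ROOT_INODE = 1
DIRECTORIES = {
    1: {".": 1, "..": 1, "usr": 6},
    6: {".": 6, "..": 1, "ast": 26},
    26: {".": 26, "..": 6, "mbox": 60},
}


def lookup(path):
    """Resolve an absolute path; return the list of inodes visited,
    ending with the inode of the final component."""
    inode = ROOT_INODE
    visited = [inode]
    for component in path.strip("/").split("/"):
        if not component:
            continue
        directory = DIRECTORIES.get(inode)
        if directory is None:
            raise NotADirectoryError(path)
        if component not in directory:
            raise FileNotFoundError(path)
        inode = directory[component]
        visited.append(inode)
    return visited
```

Each step in a real file system costs at least one inode read and one directory-block read, so deep paths are more expensive to resolve than shallow ones.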

7 Sharing of Files
In UNIX: ln src dest
Two ways of linking files:
“hard” links
Symbolic links

8 Hard links
Both files point to the same inode (links=2).
ln /home/guan/f1 /home/guan/f2

Inode 134: links=2

Directory:
.    12
..   14
f1   134
f2

9 Hard Links
(a) Situation prior to linking
(b) After the link is created
(c) After the original owner removes the file

10 Symbolic links
Files point to different inodes.
ln -s /home/guan/f1 /home/guan/f2

Directory:
.    12
..   14
f1   134
f2   208

Inode 134: f1
Inode 208: special file; data: /home/guan/f1
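The difference between the two kinds of link can be observed from user space. The following is a minimal sketch using Python's `os.link` and `os.symlink` in a temporary directory; the file names and the `link_demo` function are arbitrary choices for the example, and it assumes a file system that supports both kinds of link.

```python
import os
import tempfile


def link_demo():
    """Create f1, hard-link it as f2, symlink it as f3, then remove f1."""
    with tempfile.TemporaryDirectory() as d:
        f1, f2, f3 = (os.path.join(d, n) for n in ("f1", "f2", "f3"))
        with open(f1, "w") as fh:
            fh.write("hello")
        os.link(f1, f2)      # like: ln f1 f2
        os.symlink(f1, f3)   # like: ln -s f1 f3
        result = {
            # hard link: both names lead to the same inode, link count is 2
            "same_inode": os.stat(f1).st_ino == os.stat(f2).st_ino,
            "links": os.stat(f1).st_nlink,
            # symbolic link: its own inode, whose data is the target path
            "symlink_own_inode": os.lstat(f3).st_ino != os.stat(f1).st_ino,
            "symlink_data": os.readlink(f3),
            "target": f1,
        }
        os.remove(f1)        # the original name goes away
        with open(f2) as fh:
            result["hard_link_survives"] = fh.read()
        # os.path.exists follows the symlink, which now points nowhere
        result["symlink_dangles"] = not os.path.exists(f3)
        return result
```

After the original name is removed, the hard link still reaches the data because the inode's link count only dropped to 1, while the symbolic link is left dangling.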

11 Performance of File System
Where does your data go after a write() system call?
Where does the data come from for a read()?
Think about performance.

12 Speeding up file operations
Cache
Block read ahead
Reduce disk-head motion

13 The Buffer Cache
[Figure: User Process, Buffer, Buffer Cache]

File I/O in Unix is not done directly to the disk drive, but through an intermediary, the buffer cache. The buffer cache has two primary functions. The first, and most important, is to make possible concurrent I/O and computation within a Unix process. The second is to insulate the user from physical block boundaries.

From a user thread’s point of view, I/O is synchronous. By this we mean that when the I/O system call returns, the system no longer needs the user-supplied buffer. For example, after a write system call, the data in the user buffer has either been transmitted to the device or copied to a kernel buffer; the user can now scribble over the buffer without affecting the data transfer.

Because of this synchronization, from a user thread’s point of view, no more than one I/O operation can be in progress at a time. Thus user-implemented multibuffered I/O is not possible (in a single-threaded process). The buffer cache provides a kernel implementation of multibuffering I/O, and thus concurrent I/O and computation are possible even for single-threaded processes.

14 Buffer Cache
Read(block):
See if the block is present in the buffer cache.
If yes, then return the buffer.
Otherwise, initiate a disk read for the block, sleep till the read is complete, and return the buffer.

15 Read Ahead
[Figure: a process calls read( … ); block i-1 is the previous block, block i the current block, block i+1 the probable next block]

The use of read-aheads and write-behinds makes possible concurrent I/O and computation: if the block currently being fetched is block i and the previous block fetched was block i-1, then block i+1 is also fetched. Modified blocks are normally written out not synchronously but instead sometime after they were modified, asynchronously.
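The read-ahead rule can be sketched as follows. This is a simplified model, not kernel code: the `disk` dict and the `ReadAheadCache` class are hypothetical, and the prefetch is done inline here, whereas a real kernel would start it asynchronously so that it overlaps with the process's computation.

```python
class ReadAheadCache:
    """Toy buffer cache that prefetches block i+1 on sequential access."""

    def __init__(self, disk):
        self.disk = disk          # hypothetical: block number -> data
        self.buffers = {}
        self.last_block = None    # the previous block requested
        self.fetched = []         # order in which blocks came from disk

    def _fetch(self, block):
        if block in self.disk and block not in self.buffers:
            self.buffers[block] = self.disk[block]
            self.fetched.append(block)

    def read(self, block):
        self._fetch(block)
        # Sequential pattern (previous was i-1, current is i): fetch i+1 too.
        if self.last_block is not None and block == self.last_block + 1:
            self._fetch(block + 1)
        self.last_block = block
        return self.buffers[block]
```

With a sequential reader, every request after the second finds its block already in the cache, so the process rarely has to wait for the disk.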

16 Buffer Cache – Write
Write(block) {assume block in cache}:
(Usually) Write to cache and return; the write to disk is done later (write-back cache).
(Sometimes) Write to cache, schedule a write to disk, and return (write-through cache).
(Exceptional cases) Write to cache, do a synchronous (blocking) write to disk, and return.

17 Write
Write-back is more efficient than write-through.
A disk crash might cause a more serious problem with write-back.
What happens when:
  The system is turned off without a shutdown?
  A floppy is removed from the drive without unmounting?
System to the rescue: every 30 seconds or so, a sync is done, writing all cache contents to disk.
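To make the trade-off concrete, here is a small simulation of a cache with delayed (write-back) and synchronous writes, a `sync` operation, and a simulated crash. The `WriteCache` class is a made-up model for illustration; it does not model the intermediate case in which a disk write is scheduled asynchronously.

```python
class WriteCache:
    """Toy model of write-back versus synchronous writes."""

    def __init__(self):
        self.disk = {}       # what would survive a crash
        self.buffers = {}    # in-memory copies of blocks
        self.dirty = set()   # blocks modified but not yet on disk

    def write(self, block, data, synchronous=False):
        self.buffers[block] = data
        if synchronous:
            self.disk[block] = data      # caller waits for the disk
            self.dirty.discard(block)
        else:
            self.dirty.add(block)        # write-back: disk is updated later

    def sync(self):
        """Write every dirty buffer to disk, as the periodic sync does."""
        for block in sorted(self.dirty):
            self.disk[block] = self.buffers[block]
        self.dirty.clear()

    def crash(self):
        """Lose everything held only in memory; return what the disk holds."""
        self.buffers.clear()
        self.dirty.clear()
        return dict(self.disk)
```

A write-back block that has not yet been covered by a `sync` is lost in `crash()`, which is the window of up to 30 seconds or so mentioned on the slide.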

18 Structure of Cache
Memory allocated by the system
Lookup: hash tables
Page replacement: LRU
  Keep a list sorted according to time of use
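Both structures can be approximated with Python's `collections.OrderedDict`, which combines hashed lookup with an ordering that can serve as the time-of-use list. The sketch below is an illustration under that assumption; the `LRUBufferCache` class and its `disk` argument are hypothetical, and a real kernel would keep a separate hash table and linked list.

```python
from collections import OrderedDict


class LRUBufferCache:
    """Toy buffer cache: hashed lookup plus least-recently-used replacement."""

    def __init__(self, disk, capacity):
        self.disk = disk                # hypothetical: block number -> data
        self.capacity = capacity        # number of buffers available
        self.buffers = OrderedDict()    # least recently used entry first
        self.evicted = []               # record of replaced blocks

    def read(self, block):
        if block in self.buffers:                 # hash lookup: cache hit
            self.buffers.move_to_end(block)       # now the most recently used
            return self.buffers[block]
        if len(self.buffers) >= self.capacity:    # cache full: replace LRU
            victim, _ = self.buffers.popitem(last=False)
            self.evicted.append(victim)
        self.buffers[block] = self.disk[block]    # miss: fetch from disk
        return self.buffers[block]
```

Moving a buffer to the end of the list on every hit is what keeps the list sorted by time of use, so the front of the list is always the replacement candidate.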

19 Structure of the Cache

20 File-System Consistency (1)
[Figure: a linked list with a New Node added at the end]

In the event of a crash, the contents of the file system may well be inconsistent with any view of it the user might have. For example, a programmer may have carefully added a node to the end of the list, so that at all times the list structure is well-formed.

21 File-System Consistency (2)
[Figure: the New Node is not on disk when the system crashes (CRASH!!!)]

But, if the new node and the old node are stored on separate disk blocks, the modifications to the block containing the old node might be written out first; the system might well crash before the second block is written out.

22 Keeping It Consistent
[Figure labels: 1) Write this synchronously (New Node); 2) Then write this asynchronously]

To deal with this problem, one must make certain that the target of a pointer is safely on disk before the pointer is set to point to it. This is done for certain system data structures (e.g., directory entries, inodes, indirect blocks, etc.).

No such synchronization is done for user data structures: not enough is known about the semantics of user operations to make this possible. However, a user process called update executes a sync system call every 30 seconds, which initiates the writing out to disk of all dirty buffers. Alternatively, the user can open a file with the synchronous option so that all writes are waited for; i.e., the buffer cache acts as a write-through cache (N.B.: this is expensive!).
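The ordering rule can be illustrated with a small crash simulation. Everything here is hypothetical: `disk` and `cache` are plain dicts mapping a block number to a list node, and `crash_after` is an invented parameter that stops the simulation after a given number of disk writes.

```python
def append_node(disk, cache, tail, new, value, ordered=True, crash_after=None):
    """Append block `new` to a linked list whose last block is `tail`.

    disk, cache: hypothetical dicts of block -> {"value": ..., "next": ...}.
    crash_after: number of disk writes completed before a simulated crash.
    """
    cache[new] = {"value": value, "next": None}
    cache[tail] = dict(cache.get(tail, disk[tail]), next=new)
    # ordered: the pointer's target reaches disk before the pointer does.
    # unordered: models the cache flushing the old node's block first.
    order = [new, tail] if ordered else [tail, new]
    for done, block in enumerate(order):
        if crash_after is not None and done >= crash_after:
            break  # simulated crash: remaining writes never happen
        disk[block] = dict(cache[block])
    return disk


def well_formed(disk, head):
    """True if every next pointer reachable from head names a block on disk."""
    block = head
    while block is not None:
        if block not in disk:
            return False
        block = disk[block]["next"]
    return True
```

With the ordered variant, a crash between the two writes leaves the new node on disk but unreachable, so the list is still well-formed; with the unordered variant the old node ends up pointing at a block that was never written.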

23 File Systems do Crash
Bring to a consistent state using “fsck” on Unix:
Make sure every disk block is in exactly one file or on the free list.
Go through all directories and count the number of links per file; check for inconsistencies.
Might prompt the user before taking action.
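The two checks listed above can be sketched over simple in-memory tables. This is a toy model, not how fsck is implemented: the function names and the dict-based inputs are assumptions for the example, and special entries such as "." and ".." are ignored.

```python
from collections import Counter


def check_blocks(num_blocks, files, free_list):
    """files: hypothetical dict of inode -> list of block numbers.

    Return (missing, duplicated): blocks that are in no file and not on
    the free list, and blocks that are claimed more than once.
    """
    uses = Counter(free_list)
    for blocks in files.values():
        uses.update(blocks)
    missing = [b for b in range(num_blocks) if uses[b] == 0]
    duplicated = [b for b in range(num_blocks) if uses[b] > 1]
    return missing, duplicated


def check_link_counts(directories, link_counts):
    """directories: dict of directory name -> {filename: inode}.
    link_counts: the link count stored in each inode.

    Return {inode: (stored, counted)} for every inode whose stored count
    disagrees with the number of directory entries that name it.
    """
    counted = Counter(
        inode for entries in directories.values() for inode in entries.values()
    )
    return {
        inode: (stored, counted[inode])
        for inode, stored in link_counts.items()
        if counted[inode] != stored
    }
```

A repair tool would then decide what to do with each finding, for example returning a missing block to the free list or correcting a stored link count, possibly after prompting the user.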

24 Log-Structured File Systems
If there’s lots of caching, then most operations to the file system are writes.
Writes are quickest when there is no need to do a seek.
Thus: perform writes wherever the disk head happens to be.
[Figure: File 1, File 2 (Rosenblum and Ousterhout, Berkeley 1991)]

Log-structured file systems attempt to maximize disk performance by minimizing the non-sequential use of disks. As much as possible, reads from disks are avoided and writes are done at the current head position. This would seem to be a rather tall order, but it works out fairly well in practice.

To avoid reads, a very large amount of buffer space is allocated for the file system. The theory here is that most disk reads are of recently written data. Doing writes at the current head position means that whenever a file is updated, rather than modifying the original block (which would involve a seek), the new version of the block is written at the end of the log, i.e., a large region of free space on the disk.

As usual, a file’s blocks are represented via a map. These maps are written to the log periodically, but are also kept in primary storage (or, at least, the maps for the most recently used files are kept in primary storage); thus it is rarely necessary to go to disk to fetch a map.

There are, of course, a number of issues that need to be addressed in order to make this work. For further information, see “The Design and Implementation of a Log-Structured File System,” by Mendel Rosenblum and John Ousterhout, Proceedings of the Thirteenth ACM Symposium on Operating Systems Principles, October, 1991.
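The core idea, appending every new block version to the log and finding it again through an in-memory map, can be sketched as below. The `LogFS` class is a deliberately minimal, hypothetical model: it keeps the whole log in a Python list and does not write the maps to the log or reclaim superseded blocks.

```python
class LogFS:
    """Toy log-structured store: all writes append to the end of the log."""

    def __init__(self):
        self.log = []    # stands in for the disk, written strictly in order
        self.maps = {}   # in-memory map: file name -> {block index: log position}

    def write(self, name, index, data):
        # Never update in place: the new version goes at the end of the log.
        self.log.append((name, index, data))
        self.maps.setdefault(name, {})[index] = len(self.log) - 1

    def read(self, name, index):
        # The map says where the latest version of the block lives.
        return self.log[self.maps[name][index]][2]
```

Rewriting a block leaves its old version behind in the log while the map moves on to the new position, so updates to different files end up interleaved in write order rather than grouped by file.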

