1 File-System Implementation
Operating System Concepts, chapter 11
CS 355 Operating Systems
Dr. Matthew Wright
2 File-System Structure
Two design problems:
- How should the file system look to the user?
- How are logical files mapped onto the physical storage devices?
Physical storage devices are usually magnetic disks, which are useful for storing multiple files because:
- It is possible to read from a disk and write back into the same place that was just read.
- A disk can directly access any location on the disk (just move the read-write heads and wait for the disk to rotate).
Transfers between memory and disk are performed in units of blocks.
3 File Systems: Examples
File system | Year | Max File Size | Max Volume Size | UNIX/POSIX permissions | Links | Primary OS or use
FAT32 | 1996 | 4 GB | 2 to 16 TB | no | | Windows
NTFS 3.1 | 2001 | 16 TB | 16 EB | similar | hard & soft |
ext3 | 1999 | 16 GB to 2 TB | | yes | | UNIX
ext4 | 2006 | | 1 EB | | |
HFS Plus | 1998 | 8 EB | | | | Mac OS
ISO 9660 | 1988 | 4 GB to 8 TB | | | | optical discs
ZFS | 2004 | | | | | Sun
General characteristics of some common file systems (particular characteristics depend on implementation).
4 Layers of File System Abstraction
From top to bottom:
- Application Programs
- Logical File System: manages metadata (all of the file-system structure except the actual contents of files) and the directory structure
- File-Organization Module: translates logical addresses to physical addresses for the basic file system
- Basic File System: issues generic commands to device drivers; manages memory buffers and caches
- I/O Control: device drivers and interrupt handlers that transfer information between main memory and the disk system
- Storage Devices
5 File-System Implementation
Implementing a file system requires various on-disk and in-memory structures.
On disk, a file system contains:
- Boot control block: contains the information the system needs to boot an OS from that volume
- Volume control block: contains volume details, such as the number of blocks, the size of blocks, and the count of free blocks
- Directory structure: organizes the files
- Per-file File Control Block (FCB): contains many details about the file; known as an inode on most UNIX systems
In-memory data structures:
- Mount table: contains information about each mounted volume
- Directory cache: holds information about recently accessed directories
- System-wide open-file table: contains a copy of the FCB for each open file
- Per-process open-file table: contains a pointer to the entry in the system-wide open-file table for each file the process has open
- Buffers: hold file-system blocks while they are being read from or written to disk
6 File Open
A file must be opened via the open() system call.
- If the file is already in the system-wide open-file table, an entry is created in the per-process open-file table pointing to the entry in the system-wide table.
- Otherwise, the directory is searched for the given file, the FCB is copied into the system-wide open-file table, and an entry is created in the per-process open-file table.
- open() returns a pointer to the entry in the per-process open-file table (a file descriptor in UNIX; a file handle in Windows).
7 File Close
A file is closed via the close() system call.
- The entry in the per-process open-file table is removed.
- The open count in the system-wide entry is decremented.
- If all users have closed the file, then the entry in the system-wide open-file table is removed.
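The open and close steps on the two slides above can be sketched in a few lines of Python. This is only an illustrative model, not how any particular kernel does it: the class names `SystemEntry` and `OpenFileTables`, the use of a dict as a stand-in for the on-disk directory, and keying the system-wide table by file name are all assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class SystemEntry:
    """One system-wide open-file-table entry: a copy of the FCB plus an open count."""
    fcb: dict
    open_count: int = 0


class OpenFileTables:
    def __init__(self, directory):
        self.directory = directory   # name -> FCB; stands in for the on-disk directory
        self.system_wide = {}        # name -> SystemEntry
        self.per_process = {}        # pid -> {file descriptor: name}

    def open(self, pid, name):
        entry = self.system_wide.get(name)
        if entry is None:
            # Not open anywhere yet: search the directory and copy the FCB.
            entry = SystemEntry(fcb=dict(self.directory[name]))
            self.system_wide[name] = entry
        entry.open_count += 1
        table = self.per_process.setdefault(pid, {})
        fd = 0
        while fd in table:           # pick the lowest unused descriptor
            fd += 1
        table[fd] = name             # per-process entry refers to the system-wide entry
        return fd

    def close(self, pid, fd):
        name = self.per_process[pid].pop(fd)   # remove the per-process entry
        entry = self.system_wide[name]
        entry.open_count -= 1                  # decrement the open count
        if entry.open_count == 0:              # last user closed the file
            del self.system_wide[name]
```

If two processes open the same file, the system-wide table holds a single entry with an open count of 2, and that entry disappears only after both have called close().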
8 Partitions and Mounting
Raw disk is used where no file system is appropriate.
- UNIX swap space uses a raw partition, with its own format for what it stores in this space.
- Some databases use raw disk and format the data to suit their needs.
Boot information can be stored in its own partition, usually as a series of blocks that are loaded into memory at startup.
- Execution starts at a predefined location, where the boot loader is stored.
- The boot loader knows enough about the file-system structure to find and load the kernel, and start it running.
- In a dual-boot system, the boot loader can also give the user a choice of which operating system to boot.
The root partition, containing the kernel, is mounted at boot time.
Other volumes can be mounted at boot time or later, and the OS determines the file system on each one.
9 Virtual File Systems
How does an operating system allow multiple types of file systems to be integrated into a directory structure?
- A virtual file system (VFS) uses object-oriented techniques to simplify and modularize the implementation.
- VFS allows the same system call interface (the API) to be used for different types of file systems.
- VFS allows files to be uniquely represented throughout a network.
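A rough sketch of the object-oriented idea behind a VFS: callers use one interface, and a mount table picks the concrete file system. Everything here is hypothetical and chosen for illustration (the `FileSystem`, `DictFS`, `GeneratedFS`, and `VFS` names, and the single `read` operation); a real VFS layer defines many more operations.

```python
from abc import ABC, abstractmethod


class FileSystem(ABC):
    """The common interface every concrete file system must implement."""

    @abstractmethod
    def read(self, path: str) -> bytes: ...


class DictFS(FileSystem):
    """A file system whose files live in an in-memory dict."""

    def __init__(self, files):
        self.files = files

    def read(self, path):
        return self.files[path]


class GeneratedFS(FileSystem):
    """A file system that synthesizes contents on demand."""

    def read(self, path):
        return b"generated:" + path.encode()


class VFS:
    def __init__(self):
        self.mounts = {}                       # mount point -> FileSystem

    def mount(self, mount_point, fs):
        # "/" is stored as "" so that every absolute path matches it.
        self.mounts[mount_point.rstrip("/")] = fs

    def read(self, path):
        # Dispatch to the file system with the longest matching mount point.
        best = max(
            (mp for mp in self.mounts if path == mp or path.startswith(mp + "/")),
            key=len,
        )
        return self.mounts[best].read(path[len(best):] or "/")
```

The caller always uses `vfs.read(path)`; which implementation runs depends only on where the path is mounted.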
10 Directory Implementation
Linear list of file names with pointers to the data blocks:
- Simple to program
- Time-consuming to execute, since sequential searches are slow
Hash table: use a hash function to return a pointer to each file in the linear list
- Faster than a linear list for directory searches
- Must make provision for collisions (situations where two file names hash to the same location)
- Hash table is usually of some fixed size
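One common way to handle collisions is chaining: each table slot holds a short list of entries. The sketch below assumes a fixed-size table and a deliberately weak toy hash (the sum of the name's bytes) so that a collision is easy to trigger; the class name `HashedDirectory` and its methods are made up for this example.

```python
class HashedDirectory:
    def __init__(self, buckets=8):
        # Fixed-size table; each bucket is a chain of (name, first_block) pairs.
        self.buckets = [[] for _ in range(buckets)]

    def _index(self, name):
        # Toy hash: names with the same bytes in a different order collide.
        return sum(name.encode()) % len(self.buckets)

    def add(self, name, first_block):
        chain = self.buckets[self._index(name)]
        for i, (existing, _) in enumerate(chain):
            if existing == name:               # replace an existing entry
                chain[i] = (name, first_block)
                return
        chain.append((name, first_block))      # a collision simply extends the chain

    def lookup(self, name):
        # Only one chain is searched, not the whole directory.
        for existing, first_block in self.buckets[self._index(name)]:
            if existing == name:
                return first_block
        return None
```

Here "ab" and "ba" hash to the same bucket, yet both can be stored and found, because the lookup compares the full name inside the chain.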
11 Allocation Methods
How do we allocate space for files on a disk? We desire:
- To use disk space efficiently
- To be able to access files quickly
Three common methods:
- Contiguous allocation
- Linked allocation
- Indexed allocation
12 Contiguous Allocation
Each file occupies a set of contiguous blocks on the disk.
Advantages:
- Simple to read or write a file
- Directory is simple
- Provides random access within a file
Disadvantages:
- Must find contiguous space for each new file
- Dynamic storage-allocation problem: use first-fit, best-fit, or worst-fit algorithms to fit files in holes
- External fragmentation (defragmentation is costly)
- Files cannot grow easily
  - One solution is to use extents: additional contiguous chunks of blocks that may be appended to a file without being stored contiguously with the original allocation
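The hole-fitting step can be illustrated with two of the usual strategies, first-fit and best-fit. This is a minimal sketch: the free list is assumed to be a Python list of `(start_block, length)` holes, and the function names are invented for the example.

```python
def first_fit(holes, size):
    """Return the start of the first hole that is large enough, or None."""
    for start, length in holes:
        if length >= size:
            return start
    return None


def best_fit(holes, size):
    """Return the start of the smallest hole that is large enough, or None."""
    candidates = [(length, start) for start, length in holes if length >= size]
    return min(candidates)[1] if candidates else None


# Holes of 5, 3, and 8 free blocks starting at blocks 0, 10, and 20.
holes = [(0, 5), (10, 3), (20, 8)]
```

For a 3-block file, first-fit picks the hole at block 0 (leaving a 2-block remainder), while best-fit picks the exactly sized hole at block 10. A 9-block file fits nowhere even though 16 blocks are free in total, which is external fragmentation.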
13 Linked Allocation
Each file is a linked list of disk blocks, which may be scattered on the disk.
The directory contains a pointer to the first and last blocks, and each block contains a pointer to the next block.
Advantages:
- No external fragmentation
- Easy to expand the size of a file
Disadvantages:
- Not suitable for random access within a file
- Pointers take up some disk space
- Difficult to recover a file if a pointer is lost or damaged
Blocks may be collected into clusters of several blocks:
- Fewer pointers are necessary
- Fewer disk seeks to read an entire file
- Greater internal fragmentation
14 File-Allocation Table (FAT)
The file-allocation table (FAT) is a variant of linked allocation used in MS-DOS and OS/2.
- A section of disk at the beginning of each volume contains the allocation table.
- The table has one entry per disk block, indexed by block number.
- Each entry contains the block number of the next block in the file.
- The table can be used to quickly find a block within a file, improving random access time within files.
- If the FAT is not cached, the disk heads must move frequently between files and the FAT.
- Best performance requires that the entire FAT be in memory.
- FAT is efficient for small disks, but not for large disks because the table itself grows very large.
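Following a FAT chain is a short loop once the table is in memory. In this sketch the table is a dict from block number to next block number, and the end-of-file marker `EOF = -1` and the block numbers are arbitrary choices for illustration; real FAT variants use reserved entry values.

```python
EOF = -1  # assumed end-of-chain marker for this sketch


def file_blocks(fat, first_block):
    """Return every block of a file, in order, by following the chain."""
    blocks = []
    block = first_block
    while block != EOF:
        blocks.append(block)
        block = fat[block]        # the entry holds the next block of the file
    return blocks


def nth_block(fat, first_block, n):
    """Return the disk block holding logical block n (0-based) of the file."""
    block = first_block
    for _ in range(n):
        block = fat[block]
        if block == EOF:
            raise IndexError("logical block past end of file")
    return block


# A three-block file: 217 -> 618 -> 339 -> end of file.
fat = {217: 618, 618: 339, 339: EOF}
```

Finding logical block n still takes n table lookups, but they touch only the table, not the data blocks, which is why random access improves when the FAT is cached. A corrupted table containing a cycle would make `file_blocks` loop forever; the sketch does not guard against that.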
16 Indexed Allocation
Indexed allocation brings all pointers together into one location called the index block.
Each file has its own index block, which is an array of disk-block addresses (address i is the address of the ith block of the file).
Advantages:
- Supports direct access
- No external fragmentation
- Does not require keeping a large FAT in memory
Disadvantages:
- Wasted space within index blocks
- Data blocks may be spread all over the volume, resulting in many read/write head movements
18 Indexed Allocation
How big should the index blocks be?
- A small block cannot contain enough pointers for a large file.
- A large block wastes space with each small file.
Linked scheme: for large files, link together several index blocks.
Multilevel index: a first-level index block points to a set of second-level index blocks, which contain pointers to file blocks.
Combined scheme: suppose 15 pointers of the index block are stored in the FCB (or inode, on UNIX).
- The first 12 of these are pointers to direct blocks (that contain file data).
- The next 3 are pointers to indirect blocks (that contain pointers):
  - The first points to a single indirect block.
  - The second points to a double indirect block.
  - The third points to a triple indirect block.
- This allows very large file sizes (UNIX implementations of this scheme support files that are terabytes in size).
20 Practice (11.1)
Consider a file currently consisting of 100 blocks. Assume that the file-control block (and the index block, in the case of indexed allocation) is already in memory. Calculate how many disk I/O operations are required for contiguous, linked, and indexed (single-level) allocation strategies, if, for one block, the following conditions hold. In the contiguous-allocation case, assume that there is no room to grow at the beginning but there is room to grow at the end. Also assume that the block information to be added is stored in memory.
- The block is added at the beginning.
- The block is added in the middle.
- The block is added at the end.
- The block is removed from the beginning.
- The block is removed from the middle.
- The block is removed from the end.
21 Practice (11.15)
Consider a file system that uses inodes to represent files. Disk blocks are 8 KB in size and a pointer to a disk block requires 4 bytes. This file system has 12 direct disk blocks, plus single, double, and triple indirect disk blocks. What is the maximum size of a file that can be stored in this file system?
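One way to work the question above is to count how many data blocks each kind of pointer can reach. The sketch assumes the limit comes only from the pointer structure (not from any file-size field in the inode), and the function name `max_file_size` is made up for the example.

```python
def max_file_size(block_size, pointer_size, direct=12, indirect_levels=3):
    """Maximum file size in bytes for direct plus multi-level indirect blocks."""
    pointers_per_block = block_size // pointer_size
    # Level 1 reaches p blocks, level 2 reaches p**2, level 3 reaches p**3.
    data_blocks = direct + sum(
        pointers_per_block ** level for level in range(1, indirect_levels + 1)
    )
    return data_blocks * block_size


size = max_file_size(block_size=8 * 1024, pointer_size=4)
print(size)           # 70403120791552
print(size / 2**40)   # size in units of 2**40 bytes
```

With 8 KB blocks and 4-byte pointers, each index block holds 2048 pointers, so the file can have 12 + 2048 + 2048^2 + 2048^3 data blocks. The triple indirect term dominates: 2^33 blocks of 2^13 bytes is 2^46 bytes, so the maximum is slightly more than 64 TB.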
22 Free-Space Management
The system maintains a free-space list of free disk blocks. The free-space list can be stored in various ways.
Bit vector: the free-space list is often implemented as a bit vector (or bit map).
- Each block is represented by one bit.
- If the block is free, the bit is a 1; otherwise the bit is a 0.
- Example: a bit vector with 1s in positions 3, 6, 8, 11, 13, and 15 and 0s elsewhere indicates that blocks 3, 6, 8, 11, 13, and 15 are free, and the others are occupied.
- Advantage: simplicity
- Disadvantage: fast access requires that the bit vector be kept in memory, which could consume many megabytes of memory (32 MB bit map for a 1 TB disk with 4 KB blocks).
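The bit-vector example and the 32 MB figure can both be checked with a short sketch. It assumes a 16-block disk with blocks numbered from 0, which the slide does not state, and the helper names are invented for the example.

```python
def make_bitmap(total_blocks, free_blocks):
    """Build a bit vector with 1 for each free block and 0 for each used block."""
    bits = [0] * total_blocks
    for block in free_blocks:
        bits[block] = 1
    return bits


def first_free(bits):
    """Return the number of the first free block, or None if the disk is full."""
    for block, bit in enumerate(bits):
        if bit:
            return block
    return None


def bitmap_bytes(disk_bytes, block_bytes):
    """Size of the bit vector in bytes: one bit per block."""
    return disk_bytes // block_bytes // 8


bits = make_bitmap(16, [3, 6, 8, 11, 13, 15])   # assumed: 16 blocks, numbered from 0
vector = "".join(str(b) for b in bits)           # "0001001010010101"
```

A 1 TB disk with 4 KB blocks has 2^40 / 2^12 = 2^28 blocks, so the bit vector needs 2^28 bits, which is 2^25 bytes or 32 MB.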
23 Free-Space Management
Linked list: the free-space list could be stored as a linked list.
- Keep the pointer to the first free block in a special memory location.
- Each free block contains a pointer to the next free block.
- Advantage: the free-space list is stored in the free space itself
- Disadvantage: finding multiple free blocks is slow
Grouping: pointers to free blocks can be grouped in blocks.
- If a block can hold n addresses, then use one block to store the addresses of n - 1 free blocks, followed by the address of the next block of free-block addresses.
- Advantage: addresses of a large number of free blocks can be found quickly
- Disadvantage: sequential file data will often be stored in noncontiguous blocks
24 Free-Space Management
Counting:
- Often, several contiguous blocks are freed simultaneously.
- We can store the address of a free block followed by the number of consecutive blocks that are free.
- Advantage: less fragmentation of files; the free-space list is shorter
- Disadvantage: each free-space entry requires more space; inefficient for fragmented disks
Space maps:
- Sun's ZFS file system was designed for huge numbers of large files and directories.
- ZFS creates metaslabs to divide disk space into manageable chunks.
- Each metaslab has a space map, which is a log of all block activity.
- The log indicates which blocks are free.
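The counting representation from this slide is essentially run-length encoding of the free blocks. A minimal sketch, with the function name `to_runs` and the `(first_block, count)` tuple layout chosen for illustration:

```python
def to_runs(free_blocks):
    """Collapse free block numbers into (first_block, count) entries."""
    runs = []
    for block in sorted(free_blocks):
        if runs and runs[-1][0] + runs[-1][1] == block:
            # The block extends the current run of consecutive free blocks.
            start, count = runs[-1]
            runs[-1] = (start, count + 1)
        else:
            runs.append((block, 1))
    return runs


runs = to_runs([2, 3, 4, 5, 8, 9, 17])   # [(2, 4), (8, 2), (17, 1)]
```

Seven free blocks become three entries. On a badly fragmented disk most runs have length 1, so each entry costs more space than a plain block address, which is the disadvantage noted on the slide.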
25 Synchronous vs. Asynchronous Writes
Most disks include a cache that stores information read from or to be written to the disk.
Synchronous writes:
- Are not cached
- Are written in the order that they are received
- The calling routine waits for the data to reach disk
- Useful for database writes and other atomic transactions
Asynchronous writes:
- Are cached
- Are written in any order
- The calling routine does not wait
- Are most common
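From an application's point of view, the difference shows up in whether the program waits for the data to be pushed out of the caches. The sketch below uses Python's standard `os.fsync` call; the function names are made up, and whether the drive's own cache is also flushed depends on the operating system and device, so treat this as an approximation of a synchronous write rather than a guarantee.

```python
import os


def write_buffered(path, data):
    """Asynchronous style: return as soon as the data is handed to the OS cache."""
    with open(path, "wb") as f:
        f.write(data)


def write_durably(path, data):
    """Synchronous style: do not return until the OS reports the data written out."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()                 # push Python's buffer to the operating system
        os.fsync(f.fileno())      # ask the OS to write its cached copy to the device
```

Both functions leave the same bytes in the file; `write_durably` is slower because the caller blocks until the write completes, which is the cost a database accepts for its commit records.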
26 Recovering from System Crashes
A system crash that occurs while data is being written to disk (or is in cache waiting to be written to disk) can cause inconsistencies among directory structures and FCBs.
Consistency checking:
- The system can scan the metadata on the file system to check the consistency of the system.
- Inconsistencies may or may not be reparable.
- Consistency checking can take a very long time.
Many modern file systems are log-based transaction-oriented (or journaling) file systems:
- All metadata changes are written to a log.
- A transaction is considered committed once it is in the log, and the user process may continue executing.
- After the transaction is (asynchronously) carried out on disk, the log entry is deleted.
- If the system crashes, the log indicates which transactions still need to be performed.
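The journaling steps above can be modeled in a few lines. This is a toy sketch, not any real file system's log format: the `JournaledStore` class, the dict standing in for on-disk metadata, and the `crash_after` parameter used to simulate a crash partway through a transaction are all assumptions.

```python
class JournaledStore:
    def __init__(self):
        self.log = []     # committed transactions not yet fully applied
        self.disk = {}    # stands in for the on-disk metadata structures

    def commit(self, changes):
        """A transaction is committed as soon as it is in the log."""
        self.log.append(dict(changes))

    def apply_next(self, crash_after=None):
        """Carry out the oldest logged transaction, then delete its log entry."""
        changes = self.log[0]
        for i, (key, value) in enumerate(changes.items()):
            if crash_after is not None and i == crash_after:
                raise RuntimeError("simulated crash")
            self.disk[key] = value
        self.log.pop(0)   # removed only after every change has been applied

    def recover(self):
        """After a crash, redo every transaction still present in the log."""
        while self.log:
            self.apply_next()
```

Because each change sets a value rather than adjusting it, replaying a half-applied transaction during recovery is harmless; the metadata ends up in the same state as if the crash had never happened.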
27 Recovering from System Crashes
Network Appliance's WAFL file system and Sun's ZFS file system use an alternative to consistency checking:
- The system never overwrites old blocks with new data.
- All transactions are written to new blocks.
- When the writes are complete, the metadata structures pointing to the old blocks are updated to point to the new blocks.
- The old blocks are then made available for re-use.
ZFS also provides check-summing of all metadata and data blocks. ZFS has no consistency checker.
Since all devices eventually fail, backing up files to other storage media is essential for preserving data.
28 Example: NFS
The Sun Network File System (NFS) is an implementation and a specification of a file system for accessing remote files across LANs.
- Allows remote directories to be mounted over local directories
- The mount request requires the hostname and directory name for the remote machine.
- Mounting is subject to access-rights control.
- Once mounted, a remote directory integrates seamlessly into the local file system and directory structure.
- The NFS protocol provides remote procedure calls (RPCs) for remote file and directory operations.
29 Example: NFS
[Figure: three independent file systems on different machines]
[Figure: cascading mounts, the effect of mounting S2:/usr/dir2 over U:/usr/local/dir1]
[Figure: effect of mounting S1:/usr/shared over U:/usr/local]
30 Example: WAFL
Network Appliance's WAFL file system is optimized for random writes on network file servers.
- WAFL: Write-Anywhere File Layout
- Serves files to clients via NFS, CIFS, ftp, and http
File system:
- Block-based, with inodes to describe files
- All metadata is stored in files: inodes, free-block map, etc.
- New data is written to new blocks.
- Writes are fast, since they can occur at the free block nearest to the read-write heads.
31 Example: WAFL
WAFL can easily take a snapshot of the system at any time:
- A snapshot involves copying the root inode.
- As new data is written, the root inode is updated.
- Since blocks are not overwritten, the copy of the root inode preserves the system at the time the copy was made.
What should WAFL do when the disk fills up?
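The snapshot mechanism can be imitated with a very small copy-on-write model. This is a heavy simplification and not WAFL's actual layout: the "root inode" here is a flat dict from file name to block number rather than a tree of blocks, and the `CowFS` class and its methods are invented for the sketch.

```python
class CowFS:
    def __init__(self):
        self.blocks = []   # block store; an existing block is never overwritten
        self.root = {}     # simplified "root inode": file name -> block number

    def write(self, name, data):
        self.blocks.append(data)                  # new data goes to a new block
        self.root[name] = len(self.blocks) - 1    # the live root is updated

    def snapshot(self):
        """Taking a snapshot only copies the root, not any data blocks."""
        return dict(self.root)

    def read(self, name, root=None):
        """Read through the live root, or through a snapshot's copy of it."""
        root = self.root if root is None else root
        return self.blocks[root[name]]


fs = CowFS()
fs.write("report", b"version 1")
snap = fs.snapshot()
fs.write("report", b"version 2")
```

After the second write, the live file system sees "version 2" while the snapshot still reaches "version 1", because the old block was left in place. The model also hints at the slide's closing question: old blocks stay allocated for as long as some snapshot refers to them.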