Chapter 13: File-System Interface

- File Concept
- Access Methods
- Disk and Directory Structure
- File-System Mounting
- File Sharing
- Protection

Objectives
- To explain the function of file systems
- To describe the interfaces to file systems
- To discuss file-system design tradeoffs, including access methods, file sharing, file locking, and directory structures
- To explore file-system protection

File System Components
Physical Reality                              File System Abstraction
Block oriented                                Byte oriented
Physical sector #'s                           Named files
No protection                                 Users protected from each other
Data might be corrupted if machine crashes    Robust to machine failures

- Disk management: how to arrange a collection of disk blocks into files
- Naming: the user gives a file name, not track 50, platter 5, etc.
- Protection: keep information secure
- Reliability/durability: when the system crashes, we lose what is in memory, but we want files to be durable.

User vs. System View of a File
- User's view: durable data structures
  - Executable com file (static data region, relocation table, code, etc.)
  - Memory-mapped files: operations are reads and writes to memory
  - Serialization (also pointer swizzling, marshalling)
- System's view (system call interface): collection of bytes (UNIX)
- System's view (inside OS): collection of blocks
  - A block is the logical transfer unit, while a sector is the physical transfer unit.
  - Block size >= sector size; in UNIX, a typical block size is 4 KB.

Translating from user to system view
- What happens if the user says: give me bytes 2 – 12?
  a. Fetch the block corresponding to those bytes
  b. Return just the correct portion of the block
- What about: write bytes 2 – 12?
  a. Fetch the block
  b. Modify the portion
  c. Write out the block
- Everything inside the file system is in whole-size blocks. For example, getc and putc => buffer 4096 bytes, even if the interface is one byte at a time.
- From now on, a file is a collection of blocks.
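The fetch/modify/write-out steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the "disk" is a hypothetical in-memory array of 4096-byte blocks, and the names BlockRangeDemo, readRange, and writeRange are invented for this example.

```java
import java.nio.charset.StandardCharsets;

final class BlockRangeDemo {
    static final int BLOCK_SIZE = 4096; // block size taken from the slide

    // Read bytes first..last (inclusive) from a hypothetical disk of whole blocks.
    static byte[] readRange(byte[][] disk, long first, long last) {
        byte[] out = new byte[(int) (last - first + 1)];
        int copied = 0;
        long pos = first;
        while (pos <= last) {
            int block = (int) (pos / BLOCK_SIZE);   // which block holds this byte
            int offset = (int) (pos % BLOCK_SIZE);  // where the byte sits inside that block
            int n = (int) Math.min(BLOCK_SIZE - offset, last - pos + 1);
            // a. fetch the block, b. return just the requested portion
            System.arraycopy(disk[block], offset, out, copied, n);
            copied += n;
            pos += n;
        }
        return out;
    }

    // Write data starting at byte offset first, one whole block at a time.
    static void writeRange(byte[][] disk, long first, byte[] data) {
        int written = 0;
        while (written < data.length) {
            long pos = first + written;
            int block = (int) (pos / BLOCK_SIZE);
            int offset = (int) (pos % BLOCK_SIZE);
            int n = Math.min(BLOCK_SIZE - offset, data.length - written);
            byte[] buf = disk[block].clone();                // a. fetch block
            System.arraycopy(data, written, buf, offset, n); // b. modify portion
            disk[block] = buf;                               // c. write out block
            written += n;
        }
    }

    public static void main(String[] args) {
        byte[][] disk = new byte[2][BLOCK_SIZE];
        writeRange(disk, 2, "hello world".getBytes(StandardCharsets.US_ASCII)); // bytes 2 - 12
        System.out.println(new String(readRange(disk, 2, 12), StandardCharsets.US_ASCII));
    }
}
```

Even an 11-byte request moves a full block in each direction, which is the point the slide makes about getc and putc.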

File Concept
- A file is a named collection of related information that is recorded on secondary storage
- Files represent:
  - Data: numeric, alphabetic, alphanumeric, binary
  - Programs (source and object)
- Files may be free form, such as text files, or may be formatted rigidly. In general, a file is a sequence of bits, bytes, lines, or records
- Contents are defined by the file's creator; many different types of information may be stored in a file: numeric, text, source code, executable code, photos, music, videos, etc.
- A file has a certain defined structure, which depends on its type.
  - A text file is a sequence of characters organized into lines (and pages).
  - A source file is a sequence of functions, each of which is further organized as declarations followed by executable statements.
  - An executable file is a series of code sections that the loader can bring into memory and execute.

File Attributes
- Name: the symbolic file name is the only information kept in human-readable form
- Identifier: unique tag (number) that identifies the file within the file system; it is the non-human-readable name for the file
- Type: needed for systems that support different types
- Location: pointer to the file location on the device
- Size: current file size (in bytes, words, or blocks), possibly also the maximum size
- Protection: access-control information that determines who can do reading, writing, executing
- Timestamps and user identification: information kept for creation, last modification, and last use; useful for protection, security, and usage monitoring
- Many variations exist, including extended file attributes such as the character encoding of the file and security features such as a file checksum
- Information about files is kept in the directory structure, which is maintained on the disk
- A directory entry consists of the file's name and its unique identifier; the identifier in turn locates the other file attributes

File info Window on Mac OS X

File Operations
- A file is an abstract data type
- Create: two steps. First, space in the file system is found for the file. Second, an entry for the new file is made in a directory.
- Open: all operations except create and delete require a file open() first; it returns a file handle that is used as an argument in the other calls
- Write: a system call specifying both the open file handle and the information to be written to the file at the write pointer location
- Read: a system call that specifies the file handle and where (in memory) the next block of the file, at the read pointer location, should be put
- Current-file-position pointer: kept per process; a single pointer serves as the current operation location for both read and write, which saves space and reduces system complexity
- Reposition within file: the current-file-position pointer of the open file is repositioned to a given value; also called a seek
- Delete: search the directory for the file and release all file space. With hard links (multiple names, i.e. directory entries, for the same file), the actual file contents are not deleted until the last link is deleted
- Truncate: erase the contents of a file but keep its attributes; reset the length to 0 and release the file space
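As a rough illustration of these operations, the sketch below walks one temporary file through create/open, write, seek, read, truncate, close, and delete using java.io.RandomAccessFile. The class name FileOpsDemo and the method run are made up for the example; the mapping of each Java call to a slide bullet is an interpretation, not something the slides specify.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class FileOpsDemo {
    static String run() {
        try {
            File f = File.createTempFile("fileops", ".tmp");          // create
            String result;
            try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) { // open, read-write mode
                raf.write("abcdef".getBytes(StandardCharsets.US_ASCII)); // write at the position pointer
                raf.seek(2);                                             // reposition (seek)
                byte[] buf = new byte[3];
                raf.readFully(buf);                                      // read advances the pointer
                result = new String(buf, StandardCharsets.US_ASCII) + "@" + raf.getFilePointer();
                raf.setLength(0);                                        // truncate, attributes kept
                result += " len=" + raf.length();
            }                                                            // close
            boolean deleted = f.delete();                                // delete the directory entry
            return result + " deleted=" + deleted;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "cde@5 len=0 deleted=true"
    }
}
```

Note that a single file pointer serves both the write and the read, matching the per-process current-file-position pointer described above.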

Open Files
- Several pieces of data are needed to manage open files
- Open-file table: tracks open files
  - When a file operation is requested, the file is specified via an index into this table, so no searching of the directory is required
  - When the file is no longer in active use, the process closes it, and the OS deletes its entry from the open-file table, potentially releasing locks
- The open() operation takes a file name and searches the directory, copying the directory entry into the open-file table.
- The open() call can also accept access-mode information: create, read-only, read-write, append-only, and so on. This mode is checked against the file's permissions; if the mode is allowed, the file is opened for the process.
- The open() system call returns a pointer to the entry in the open-file table. This pointer, not the actual file name, is used in all I/O operations, avoiding any further searching and simplifying the system-call interface.

Open Files
- The OS uses two levels of internal tables: a per-process table and a system-wide table.
- The per-process table tracks all files that a process has open, and contains the current file pointer for each file, access rights, and accounting information
- Each entry in the per-process table points to a system-wide open-file table, which contains process-independent information, such as the location of the file on disk, access dates, and file size.
- File-open count: the number of processes having the file open
  - Each close() decreases this open count; when the open count reaches zero, the file is no longer in use, and the file's entry is removed from the open-file table.

Open Files
Information associated with an open file:
- File pointer: the system must track the last read-write location as a current-file-position pointer, unique to each process and kept separate from the on-disk file attributes.
- File-open count: tracks the number of opens and closes and reaches zero on the last close. The system can then remove the entry to reuse space.
- Location of the file: information needed to locate the file (mass storage, file server, RAM drive) is kept in memory so that the system does not have to read it from the directory structure for each operation.
- Access rights: each process opens a file in an access mode; this information is stored in the per-process table so the operating system can allow or deny subsequent I/O requests.
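The two-level table arrangement can be modeled with a couple of small classes. This is a toy sketch, not how any particular kernel lays out its tables: OpenFileTables, SystemEntry, and ProcessEntry are hypothetical names, and a plain list stands in for each per-process table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class OpenFileTables {
    // System-wide entry: process-independent information plus the file-open count.
    static final class SystemEntry {
        final String name;
        int openCount;
        SystemEntry(String name) { this.name = name; }
    }

    // Per-process entry: file pointer and access mode, pointing at the shared entry.
    static final class ProcessEntry {
        final SystemEntry shared;
        final String mode;
        long position;
        ProcessEntry(SystemEntry shared, String mode) { this.shared = shared; this.mode = mode; }
    }

    final Map<String, SystemEntry> systemTable = new HashMap<>();

    // The returned index into the per-process table plays the role of the file handle.
    int open(List<ProcessEntry> processTable, String name, String mode) {
        SystemEntry e = systemTable.computeIfAbsent(name, SystemEntry::new);
        e.openCount++;
        processTable.add(new ProcessEntry(e, mode));
        return processTable.size() - 1;
    }

    // Each close decreases the open count; at zero the system-wide entry is removed.
    void close(List<ProcessEntry> processTable, int handle) {
        ProcessEntry p = processTable.set(handle, null);
        if (p == null) throw new IllegalArgumentException("handle not open: " + handle);
        if (--p.shared.openCount == 0) systemTable.remove(p.shared.name);
    }

    public static void main(String[] args) {
        OpenFileTables os = new OpenFileTables();
        List<ProcessEntry> procA = new ArrayList<>();
        List<ProcessEntry> procB = new ArrayList<>();
        int a = os.open(procA, "notes.txt", "r");
        int b = os.open(procB, "notes.txt", "rw");
        System.out.println(os.systemTable.get("notes.txt").openCount);
        os.close(procA, a);
        os.close(procB, b);
        System.out.println(os.systemTable.containsKey("notes.txt"));
    }
}
```

Two processes opening the same file share one system-wide entry but keep separate position pointers and access modes.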

Open File Locking
- Provided by some operating systems and file systems
- File locks allow one process to lock a file and prevent other processes from gaining access to it; useful for shared files
- Similar to reader-writer locks
  - Shared lock: similar to a reader lock; several processes can acquire it concurrently
  - Exclusive lock: similar to a writer lock; only one process at a time can acquire such a lock
- Mandatory or advisory file-locking mechanisms:
  - Mandatory: once a process acquires an exclusive lock, the operating system will prevent any other process from accessing the locked file
  - Advisory: processes can find the status of locks and decide what to do; it is up to software developers to ensure that locks are appropriately acquired and released
- Windows operating systems adopt mandatory locking, and UNIX systems employ advisory locks

File Locking Example – Java API
import java.io.*;
import java.nio.channels.*;

public class LockingExample {
    // Values for the third argument of FileChannel.lock(position, size, shared)
    public static final boolean EXCLUSIVE = false;
    public static final boolean SHARED = true;

    public static void main(String args[]) throws IOException {
        FileLock sharedLock = null;
        FileLock exclusiveLock = null;
        try {
            RandomAccessFile raf = new RandomAccessFile("file.txt", "rw");
            // get the channel for the file
            FileChannel ch = raf.getChannel();
            // this locks the first half of the file - exclusive
            exclusiveLock = ch.lock(0, raf.length()/2, EXCLUSIVE);
            /** Now modify the data */
            // release the lock
            exclusiveLock.release();

File Locking Example – Java API (Cont.)
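            // this locks the second half of the file - shared
            // (the second argument is the size of the region, not its end offset)
            sharedLock = ch.lock(raf.length()/2, raf.length() - raf.length()/2, SHARED);
            /** Now read the data */
            // release the lock
            sharedLock.release();
        } catch (java.io.IOException ioe) {
            System.err.println(ioe);
        } finally {
            // releasing a lock that was already released has no effect
            if (exclusiveLock != null)
                exclusiveLock.release();
            if (sharedLock != null)
                sharedLock.release();
        }
    }
}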

File Types – Name, Extension
A common technique for implementing file types is to include the type as part of the file name. The name is split into two parts—a name and an extension, usually separated by a period

File Structure
- None: sequence of words, bytes
- Simple record structure
  - Lines
  - Fixed length
  - Variable length
- Complex structures
  - Formatted document
  - Relocatable load file
- Can simulate the last two with the first method by inserting appropriate control characters
- Who decides:
  - Operating system
  - Program
- Some operating systems impose (and support) a minimal number of file structures; this approach is adopted in UNIX, Windows, and others.
- UNIX considers each file to be a sequence of 8-bit bytes; no interpretation of these bits is made by the operating system.

Sequential-access File
Information in the file is processed in order, one record after the other. This mode of access is by far the most common; for example, editors and compilers usually access files in this fashion. read_next() — reads the next portion of the file and automatically advances a file pointer, which tracks the I/O location. write_next() — appends to the end of the file and advances to the end of the newly written material (the new end of file). A file can be reset to the beginning, and on some systems, a program may be able to skip forward or backward n records for some integer n, perhaps only for n = 1.

Direct Access (Logical access)
- A file is made up of fixed-length logical records that allow programs to read and write records rapidly in no particular order. This suits disks, and databases are a typical use.
- Operations: read(n), write(n), and rewrite(n), where n is the relative block number
- An alternative approach is to retain read_next() and write_next() and to add an operation position_file(n), where n is the block number. Then, to effect a read(n), we would position_file(n) and then read_next(); likewise, a write(n) is position_file(n) followed by write_next().
- Relative block numbers (indexes relative to the beginning of the file) allow the OS to decide where the file should be placed
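The relationship between the two access styles is easy to sketch: keep a current position cp, and express read_next(), write_next(), and position_file(n) in terms of direct read(n) and write(n). The class below is a hypothetical in-memory model (blocks held in an array); none of its names come from a real API.

```java
final class SequentialOnDirect {
    private final byte[][] blocks; // hypothetical direct-access file: block n is blocks[n]
    private int cp = 0;            // current position, as a relative block number

    SequentialOnDirect(int blockCount, int blockSize) {
        blocks = new byte[blockCount][blockSize];
    }

    // Direct access: any block, in any order.
    byte[] read(int n) { return blocks[n].clone(); }
    void write(int n, byte[] data) { blocks[n] = data.clone(); }

    // Sequential access simulated on top of direct access.
    void reset() { cp = 0; }
    byte[] readNext() { byte[] b = read(cp); cp = cp + 1; return b; }
    void writeNext(byte[] data) { write(cp, data); cp = cp + 1; }

    // position_file(n) followed by readNext() has the effect of read(n).
    void positionFile(int n) { cp = n; }

    public static void main(String[] args) {
        SequentialOnDirect f = new SequentialOnDirect(4, 1);
        f.writeNext(new byte[] {10});
        f.writeNext(new byte[] {20});
        f.positionFile(1);
        System.out.println(f.readNext()[0]); // prints 20
    }
}
```

Going the other way, simulating direct access on a purely sequential file, would require skipping over records one by one, which is why the simulation is normally done in this direction.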

Simulation of Sequential Access on Direct-access File

Other Access Methods
- Can be built on top of a direct-access method
- Generally involve creation of an index for the file
- Keep the index in memory for fast determination of the location of the data to be operated on (consider a UPC code plus a record of data about that item)
- If the index is too large, keep an index (in memory) of the index (on disk)
- IBM indexed sequential-access method (ISAM)
  - Small master index, which points to disk blocks of a secondary index
  - File kept sorted on a defined key
  - To find a particular item, we first make a binary search of the master index, which provides the block number of the secondary index. This block is read in, and again a binary search is used to find the block containing the desired record. Finally, this block is searched sequentially.
- The VMS operating system provides index and relative files as another example (see next slide)
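The three-step ISAM lookup (binary search of the master index, binary search of the secondary index block, sequential scan of the data block) might look like the sketch below. The layout is an assumption made for illustration: each index entry stores the largest key it covers, and everything lives in arrays rather than on disk. TwoLevelIndex, slotFor, and find are invented names.

```java
import java.util.Arrays;

final class TwoLevelIndex {
    // Binary search for the first entry whose maximum key is >= key; -1 if none.
    static int slotFor(int[] maxKeys, int key) {
        int pos = Arrays.binarySearch(maxKeys, key);
        if (pos < 0) pos = -pos - 1; // convert "not found" into the insertion point
        return pos < maxKeys.length ? pos : -1;
    }

    // Returns {secondaryBlock, dataBlock, offsetInBlock}, or null if the key is absent.
    static int[] find(int[] master, int[][] secondary, int[][][] blocks, int key) {
        int s = slotFor(master, key);           // 1. binary search of the master index
        if (s < 0) return null;
        int b = slotFor(secondary[s], key);     // 2. binary search of the secondary index block
        if (b < 0) return null;
        int[] block = blocks[s][b];
        for (int i = 0; i < block.length; i++)  // 3. sequential search of the data block
            if (block[i] == key) return new int[] {s, b, i};
        return null;
    }

    public static void main(String[] args) {
        int[][][] blocks = {{{1, 3, 5}, {7, 9}}, {{11, 13}, {15, 17, 19}}};
        int[][] secondary = {{5, 9}, {13, 19}}; // largest key in each data block
        int[] master = {9, 19};                 // largest key under each secondary block
        System.out.println(Arrays.toString(find(master, secondary, blocks, 13))); // [1, 0, 1]
    }
}
```

With the file kept sorted on the key, only one secondary index block and one data block need to be read per lookup.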

Example of Index and Relative Files

File Usage Patterns
- Most files are small (for example, .login, .c files)
- Large files use up most of the disk space
- Large files account for most of the bytes transferred to/from disk
- Bad news: need everything to be efficient.
  - Need small files to be efficient, since there are lots of them.
  - Need large files to be efficient, since most of the disk space and most of the I/O are due to them.

Directory Structure
- A collection of nodes containing information about all files
[Figure: a directory whose entries point to files F1, F2, F3, F4, ..., Fn]
- Both the directory structure and the files reside on disk

Disk Structure
- A disk can be subdivided into partitions
- Disks or partitions can be RAID protected against failure
- A disk or partition can be used raw (without a file system) or formatted with a file system
- Partitions are also known as minidisks or slices
- An entity containing a file system is known as a volume
- Each volume containing a file system also tracks that file system's info in a device directory or volume table of contents
- As well as general-purpose file systems, there are many special-purpose file systems, frequently all within the same operating system or computer

Directory Structure
- The directory can be viewed as a symbol table that translates file names into their file control blocks.
- The directory organization must allow entries to be inserted, deleted, searched for by name, and listed.
- Operations on a directory:
  - Search for a file
  - Create a file
  - Delete a file: a delete leaves a hole in the directory structure, and the file system may have a method to defragment the directory structure
  - List a directory
  - Rename a file
  - Traverse the file system
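Viewing the directory as a symbol table suggests a very small model: a map from file name to the identifier of its file control block. The sketch below is illustrative only; DirectoryTable and its method names are hypothetical, and a sorted in-memory map stands in for the on-disk structure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class DirectoryTable {
    // name -> identifier of the file control block (FCB)
    private final Map<String, Integer> entries = new TreeMap<>();

    // Create: fails if the name is already taken in this directory.
    boolean create(String name, int fcbId) { return entries.putIfAbsent(name, fcbId) == null; }

    // Search: returns the FCB identifier, or null if there is no such entry.
    Integer search(String name) { return entries.get(name); }

    boolean delete(String name) { return entries.remove(name) != null; }

    // Rename: the FCB identifier stays the same; only the name changes.
    boolean rename(String oldName, String newName) {
        if (!entries.containsKey(oldName) || entries.containsKey(newName)) return false;
        entries.put(newName, entries.remove(oldName));
        return true;
    }

    // List: names in sorted order, because the map is a TreeMap.
    List<String> list() { return new ArrayList<>(entries.keySet()); }

    public static void main(String[] args) {
        DirectoryTable dir = new DirectoryTable();
        dir.create("prog.c", 17);
        dir.create("notes.txt", 42);
        dir.rename("prog.c", "main.c");
        System.out.println(dir.list() + " " + dir.search("main.c"));
    }
}
```

Renaming touches only the directory entry, which is one reason the name is kept apart from the rest of the file's attributes.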

Directory Organization
The directory is organized logically to obtain:
- Efficiency: locating a file quickly
- Naming: convenient to users
  - Two users can have the same name for different files
  - The same file can have several different names
- Grouping: logical grouping of files by properties (e.g., all Java programs, all games, ...)

Single-Level Directory
- A single directory for all users
- Limitations with a large number of files and multiple users:
  - Need unique names
  - Grouping problem

Two-Level Directory
- Separate directory for each user
- Each user has their own user file directory (UFD). At login, the system's master file directory (MFD) is searched. The MFD is indexed by user name or account number, and each entry points to the UFD for that user
- Different users can have files with the same name, but names must be unique within a user's UFD
- Path name: specifying a user name and a file name defines a path in the tree from the root (the MFD) to a leaf (the specified file)
- Search path: the sequence of directories searched when a file is named
- No grouping capability

Tree-Structured Directories

Tree-Structured Directories (Cont.)
- In many implementations, a directory is simply another file, but it is treated in a special way. All directories have the same internal format.
  - One bit in each directory entry defines the entry as a file (0) or as a subdirectory (1).
  - Special system calls are used to create and delete directories. In this case the operating system (or the file system code) implements another file format, that of a directory.
- Efficient searching and grouping capability
- Current directory (working directory)
  - cd /spell/mail/prog
  - type list
- When reference is made to a file, the current directory is searched.
- To change directories, a system call takes a directory name as a parameter and uses it to redefine the current directory.
- Other systems leave it to the application (say, a shell) to track and operate on a current directory, as each process could have a different current directory.

Tree-Structured Directories (Cont)
- Absolute or relative path name
  - An absolute path name begins at the root (initial "/") and follows a path down to the specified file, giving the directory names on the path.
  - A relative path name defines a path from the current directory.
  - If the current directory is /spell/mail, then the relative path name prt/first refers to the absolute path name /spell/mail/prt/first.
- Creating a new file is done in the current directory
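Turning a relative path name into an absolute one is mostly string handling. The sketch below assumes UNIX-style names with "/" as the separator, and it additionally handles the "." and ".." components, which are a UNIX convention not covered on the slide. PathNames and toAbsolute are names chosen for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class PathNames {
    // Resolve path against currentDir; an absolute path ignores currentDir.
    static String toAbsolute(String currentDir, String path) {
        String full = path.startsWith("/") ? path : currentDir + "/" + path;
        Deque<String> parts = new ArrayDeque<>();
        for (String p : full.split("/")) {
            if (p.isEmpty() || p.equals(".")) continue;   // skip empty and "." components
            if (p.equals("..")) {                         // ".." steps up one directory
                if (!parts.isEmpty()) parts.removeLast();
                continue;
            }
            parts.addLast(p);
        }
        return "/" + String.join("/", parts);
    }

    public static void main(String[] args) {
        // The slide's example: current directory /spell/mail, relative name prt/first
        System.out.println(toAbsolute("/spell/mail", "prt/first")); // prints /spell/mail/prt/first
    }
}
```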

Tree-Structured Directories (Cont)
- Policy decision on how to handle the deletion of a directory.
  - If a directory is empty, its entry in the directory that contains it can simply be deleted.
  - If the directory to be deleted is not empty but contains several files or subdirectories, one of two approaches can be taken.
  - Some systems will not delete a directory unless it is empty, so the user must first delete everything inside it. This approach can result in a substantial amount of work.
  - An alternative approach, such as that taken by the UNIX rm command, is to provide an option: when a request is made to delete a directory, all that directory's files and subdirectories are also to be deleted.
- Delete a file: rm <file-name>
- Creating a new subdirectory is done in the current directory: mkdir <dir-name>
  - Example: if in current directory /mail, mkdir count
- Deleting "mail" => deleting the entire subtree rooted by "mail"
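Both deletion policies can be tried out with java.nio.file. The sketch below is a minimal illustration: deleteIfEmpty refuses to remove a non-empty directory, while deleteRecursively removes the whole subtree, children before parents. DeleteTree and its two method names are hypothetical; the Files calls are standard library methods.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

final class DeleteTree {
    // Policy 1: delete the directory only if it contains no entries.
    static boolean deleteIfEmpty(Path dir) {
        try {
            boolean empty;
            try (Stream<Path> entries = Files.list(dir)) {
                empty = !entries.findAny().isPresent();
            }
            if (!empty) return false;
            Files.delete(dir);
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Policy 2: delete the entire subtree rooted at dir.
    static void deleteRecursively(Path dir) {
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse path order places every child before its parent directory.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path mail = Files.createTempDirectory("mail");
        Files.createDirectories(mail.resolve("count"));
        System.out.println(deleteIfEmpty(mail)); // false: "count" is still inside
        deleteRecursively(mail);
        System.out.println(Files.exists(mail));  // false: subtree removed
    }
}
```

The recursive variant is convenient but dangerous for the same reason: one call can remove an entire directory structure.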

Acyclic-Graph Directories
Have shared subdirectories and files

Acyclic-Graph Directories (Cont.)
- A shared file (or directory) is not the same as two copies of the file.
  - With two copies, each programmer can view the copy rather than the original, but if one programmer changes the file, the changes will not appear in the other's copy.
  - With a shared file, only one actual file exists, so any changes made by one person are immediately visible to the other.
  - Sharing is particularly important for subdirectories; a new file created by one person will automatically appear in all the shared subdirectories.
- Implementing shared files and subdirectories
  - #1: create a new directory entry called a link. A link is effectively a pointer to another file or subdirectory.
  - A link may be implemented as an absolute or a relative path name.
  - On a reference to a file, search the directory. In the case of a link, the name of the real file is included in the link information. Resolve the link by using that path name to locate the real file.
  - Links are effectively indirect pointers.

Implementing shared files and subdirectories (cont.)
- #2: duplicate all information about the shared files in both sharing directories. Thus, both entries are identical and equal.
  - A major problem with duplicate directory entries is maintaining consistency when a file is modified.
Problems in Acyclic-graph directories
- #1: A file may have multiple absolute path names; consequently, distinct file names may refer to the same file (aliasing).
  - Traversing the entire file system to gather statistics on all files might traverse shared structures more than once
- #2: file deletion: when can the space allocated to a shared file be deallocated and reused?
  - Remove the file whenever anyone deletes it; this may leave dangling pointers to the now-nonexistent file, e.g. if dict deletes list => dangling pointer
  - If the remaining file pointers contain actual disk addresses, and the space is subsequently reused for other files, these dangling pointers may point into the middle of other files.

Problems in Acyclic-graph directories (cont.)
- Symbolic links for implementing sharing
  - Deleting a link removes only the link; the file itself is unaffected
  - If the file entry itself is deleted, the space for the file is deallocated, leaving the links dangling.
  - We can search for these links and remove them as well, but this search can be expensive.
  - Alternatively, leave the links until an attempt is made to use them; at that point the link name fails to resolve, and it can be determined that the file does not exist
  - In UNIX and Windows, symbolic links are left when a file is deleted, and it is up to the user to realize that the original file is gone or has been replaced.
- Another approach to deletion is to preserve the file until all references to it are deleted.
  - Keep a count of the number of references; when the count is 0, the file can be deleted
  - UNIX uses this approach for non-symbolic links (hard links)
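The reference-count approach can be modeled in memory: each directory name maps to a file identifier, and the file's space is reclaimed only when its last name is removed. This is a toy sketch with invented names (LinkCounts, create, link, unlink); it is not how UNIX stores link counts, only the same bookkeeping idea.

```java
import java.util.HashMap;
import java.util.Map;

final class LinkCounts {
    private final Map<String, Integer> directory = new HashMap<>(); // name -> file id
    private final Map<Integer, Integer> refCount = new HashMap<>(); // file id -> number of names
    private int nextId = 1;

    // Create a new file with one name, so its reference count starts at 1.
    int create(String name) {
        int id = nextId++;
        directory.put(name, id);
        refCount.put(id, 1);
        return id;
    }

    // Add a second name (hard link) for an existing file.
    void link(String existing, String newName) {
        Integer id = directory.get(existing);
        if (id == null) throw new IllegalArgumentException("no such file: " + existing);
        if (directory.containsKey(newName)) throw new IllegalArgumentException("name in use: " + newName);
        directory.put(newName, id);
        refCount.merge(id, 1, Integer::sum);
    }

    // Remove one name; returns true when that was the last reference and the space can be freed.
    boolean unlink(String name) {
        Integer id = directory.remove(name);
        if (id == null) throw new IllegalArgumentException("no such file: " + name);
        int left = refCount.merge(id, -1, Integer::sum);
        if (left == 0) {
            refCount.remove(id);
            return true;
        }
        return false;
    }

    boolean exists(int id) { return refCount.containsKey(id); }

    public static void main(String[] args) {
        LinkCounts fs = new LinkCounts();
        int id = fs.create("/dict/list");
        fs.link("/dict/list", "/spell/list");
        System.out.println(fs.unlink("/dict/list"));  // false: /spell/list still refers to the file
        System.out.println(fs.unlink("/spell/list")); // true: last reference removed
    }
}
```

Because the file outlives any single name, no deletion order can leave a dangling reference, unlike the symbolic-link case above.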

General Graph Directory

General Graph Directory (Cont.)
- The problem: ensuring that an acyclic-graph structure has no cycles
- Adding new files and subdirectories to an existing tree-structured directory preserves the tree-structured nature. However, adding links destroys the tree structure, resulting in a simple graph structure
- Cycles could result in an infinite loop, with a search continually going around the cycle
  - One remedy: arbitrarily limit the number of directories that will be accessed during a search
- File deletion: when cycles exist, the reference count may not be 0 even when it is no longer possible to refer to a directory or file, because of self-referencing

General Graph Directory (Cont.)
- How do we guarantee no cycles?
- Garbage collection: determine when the last reference has been deleted and the disk space can be reallocated
  - Involves traversing the entire file system, marking everything that can be accessed. Then, a second pass collects everything that is not marked onto a list of free space.
  - Extremely expensive for disk-based file systems; seldom attempted
- Every time a new link is added, use a cycle-detection algorithm to determine whether it is OK
  - Computationally expensive on a disk-based file system
- A simpler algorithm in the special case of directories and links is to bypass links during directory traversal. Cycles are avoided, and no extra overhead is incurred.
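A cycle check on link creation comes down to a reachability test: a new link from directory A to directory B closes a cycle exactly when A is already reachable from B. The sketch below does that test with a depth-first search over an in-memory graph; DirectoryGraph and addLink are hypothetical names, and a real file system would have to read directories from disk at each step, which is where the cost noted above comes from.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class DirectoryGraph {
    // directory name -> names of the directories it links to
    private final Map<String, Set<String>> children = new HashMap<>();

    // Adds the link from -> to unless it would create a cycle; returns whether it was added.
    boolean addLink(String from, String to) {
        if (reachable(to, from)) return false; // 'from' already reachable from 'to': would cycle
        children.computeIfAbsent(from, k -> new HashSet<>()).add(to);
        return true;
    }

    // Iterative depth-first search from start, looking for target.
    private boolean reachable(String start, String target) {
        Deque<String> stack = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            String dir = stack.pop();
            if (dir.equals(target)) return true;
            if (seen.add(dir)) {
                for (String child : children.getOrDefault(dir, Collections.emptySet())) {
                    stack.push(child);
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        DirectoryGraph g = new DirectoryGraph();
        System.out.println(g.addLink("root", "a")); // true
        System.out.println(g.addLink("a", "b"));    // true
        System.out.println(g.addLink("b", "root")); // false: would close a cycle
    }
}
```

A second link to an already-shared directory (for example root to b here) is still accepted, since sharing alone keeps the graph acyclic.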

Protection
- Laptop: user name and password authentication to access it, encrypting the secondary storage, and firewalling network access
- Multiuser system: advanced mechanisms to allow only valid access to the data
- File owner/creator should be able to control:
  - What can be done
  - By whom
- Types of access
  - Read
  - Write
  - Execute
  - Append
  - Delete
  - List
  - Attribute change

Access Lists and Groups
- Mode of access: read, write, execute
- Three classes of users on Unix / Linux:
                         R W X
  a) owner access   7 => 1 1 1
  b) group access   6 => 1 1 0
  c) public access  1 => 0 0 1
- Ask the manager to create a group (unique name), say G, and add some users to the group.
- For a particular file (say game) or subdirectory, define an appropriate access.
- Attach a group to a file: chgrp G game
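The three digits above combine into the octal mode 761, and decoding it is a matter of shifting and masking three bits per user class. The sketch below is a small illustration; ModeBits and its constants are names chosen for the example, not part of any system API.

```java
final class ModeBits {
    // Shift amounts that select each user class within the 9-bit mode
    static final int OWNER = 6, GROUP = 3, PUBLIC = 0;
    // Bit values within one class
    static final int READ = 4, WRITE = 2, EXECUTE = 1;

    // True if the given class has the given access in this mode.
    static boolean allows(int mode, int userClass, int access) {
        return ((mode >> userClass) & access) != 0;
    }

    // Render the mode as the familiar nine-character string.
    static String describe(int mode) {
        StringBuilder sb = new StringBuilder();
        for (int shift = OWNER; shift >= PUBLIC; shift -= 3) {
            int bits = (mode >> shift) & 7;
            sb.append((bits & READ) != 0 ? 'r' : '-');
            sb.append((bits & WRITE) != 0 ? 'w' : '-');
            sb.append((bits & EXECUTE) != 0 ? 'x' : '-');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int mode = 0761; // leading 0 makes this an octal literal: owner 7, group 6, public 1
        System.out.println(describe(mode));              // prints rwxrw---x
        System.out.println(allows(mode, GROUP, EXECUTE)); // prints false
    }
}
```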

Windows 7 Access-Control List Management

A Sample UNIX Directory Listing

End of Chapter 13
