File-System Management and Optimization & Example File Systems


1 File-System Management and Optimization & Example File Systems

2 File-System Management and Optimization
Section 4.4 in the Textbook (pg )

3 Disk-Space Management
Storing a file as a contiguous sequence of bytes is flawed: as a file grows, it may have to be moved on the disk. Storing segments in memory has the same issue, but moving a segment in memory is fast compared with moving a file from one disk position to another. For this reason, almost every file system chops files into blocks of fixed size. The obvious question is: how large should each block be? Sectors, tracks, and cylinders are all candidates for the unit of allocation, and in paging systems the page size is also a major contender.

4 Block Size A large block size means that every file, even a 1-byte file, takes up an entire block. Alternatively, a small block size means that files are broken up across many blocks, and since each block must be read separately, performance suffers. Making a good decision about block size requires knowledge of the typical file-size distribution, which we can gauge from published studies. Historically, sizes of 1-4 KB have been chosen, but with disks now reaching tremendous capacities (over 1 TB on many), it may soon be better to move to 64 KB and accept the wasted space.
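The space half of this trade-off is easy to quantify. Here is a minimal sketch that measures internal fragmentation (the wasted tail of each file's last block) for a few block sizes; the list of file sizes is an assumption for illustration, not data from any study:

```python
# Sketch: internal fragmentation for different block sizes. Every file
# occupies a whole number of blocks, so the last block is partly wasted.

def wasted_bytes(file_sizes, block_size):
    """Total bytes of internal fragmentation for the given block size."""
    total = 0
    for size in file_sizes:
        blocks = -(-size // block_size)        # ceiling division
        total += blocks * block_size - size
    return total

files = [1, 500, 2_000, 70_000, 1_000_000]     # assumed file sizes in bytes
for bs in (1024, 4096, 65536):
    print(bs, wasted_bytes(files, bs))
```

Running this with a realistic file-size distribution (most files small, a few huge) shows why 64 KB blocks waste so much space on small files while barely affecting large ones.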

5

6 How to track free blocks?
Two methods are widely used: keeping a linked list of free disk blocks, or keeping a bitmap with one bit per block.
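The bitmap variant can be sketched in a few lines. The disk size below (1024 blocks) is an assumption; bit i set means block i is free:

```python
# Sketch: tracking free blocks with a bitmap, one bit per disk block.

class Bitmap:
    def __init__(self, n_blocks):
        self.bits = bytearray([0xFF] * (n_blocks // 8))  # all blocks free

    def allocate(self):
        """Find, claim, and return the first free block, or None if full."""
        for i, byte in enumerate(self.bits):
            if byte:
                bit = (byte & -byte).bit_length() - 1    # lowest set bit
                self.bits[i] &= ~(1 << bit)              # mark block in use
                return i * 8 + bit
        return None

    def free(self, block):
        self.bits[block // 8] |= 1 << (block % 8)        # mark block free

bm = Bitmap(1024)
a = bm.allocate()   # block 0
b = bm.allocate()   # block 1
bm.free(a)          # block 0 is free again
```

The bitmap needs only one bit per block, whereas a free list needs a full block number per free block; on a nearly full disk the bitmap is far more compact.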

7 Disk Quotas To prevent individuals from hogging disk space, multiuser operating systems provide a mechanism for enforcing disk quotas: the administrator assigns each user a maximum allotment of files and blocks, and the operating system makes sure users do not exceed their quotas. When a user opens a file, its attributes and disk addresses are loaded into an open-file table in main memory. Among the attributes is an entry telling who the owner is; any increase in the file's size is thereby charged to the owner's quota. The soft limit may be exceeded temporarily, which triggers a warning; if the user ignores the warning too many times, he or she is not allowed to log back in. The hard limit, however, may never be exceeded by any user.
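The soft/hard distinction can be sketched as follows; the class name, the block limits, and the use of an exception are illustrative assumptions, not how any particular kernel implements quotas:

```python
# Sketch of quota bookkeeping. Exceeding the soft limit produces a warning
# but is allowed; exceeding the hard limit is refused outright.

class Quota:
    def __init__(self, soft_blocks, hard_blocks):
        self.soft, self.hard = soft_blocks, hard_blocks
        self.used = 0
        self.warnings = 0

    def charge(self, blocks):
        """Charge a file-size increase (in blocks) against the owner."""
        if self.used + blocks > self.hard:
            raise OSError("hard quota exceeded")   # never allowed
        self.used += blocks
        if self.used > self.soft:
            self.warnings += 1                     # warned, but allowed

q = Quota(soft_blocks=100, hard_blocks=120)
q.charge(90)
q.charge(20)        # over the soft limit: a warning is issued
# q.charge(20)      # would exceed the hard limit and be refused
```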

8 File-System Backups If a computer's file system is lost, recovery will be costly, difficult, time consuming, or even impossible. The data will be gone, which can be catastrophic in some circumstances (tax records, government data, military data, etc.). This can be overcome by creating backups, but doing so is not as simple as it may appear.

Backups are made for two primary reasons: to recover from disaster, or to recover from stupidity. In the first case, data are lost through physical means: a fire, a broken hard drive, water spilled onto a computer, and so on. In the second, users accidentally delete important data, a problem so common that the "Recycle Bin" has become a standard data-recovery feature.

Making a backup takes a large amount of time and space, so doing it efficiently and conveniently is important. Three of the biggest issues to consider are these. First, should the entire system be backed up, or only part of it? In practice it is usually desirable to back up only specific directories rather than the entire file system. Second, backing up files that have not changed since the last backup is wasteful, which leads to the concept of incremental dumps: in the simplest form, a complete dump (backup) is made periodically (e.g., weekly), while a daily dump is made of only the files that have changed since the last complete dump; even better is to dump only those files that have changed since the last dump of any kind. Third, compressing the data before backing it up saves space, but a single error on the backup medium can make large portions of a compressed backup unreadable, so the decision of whether to compress must be considered carefully.
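The core of an incremental dump is simply comparing each file's modification time against the time of the previous dump. A minimal sketch, assuming the previous dump time is recorded as a Unix timestamp:

```python
# Sketch: choosing files for an incremental dump by comparing each file's
# modification time with the time of the previous dump.
import os

def files_to_dump(root, last_dump_time):
    """Return paths under root that were modified since the last dump."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > last_dump_time:
                changed.append(path)
    return changed
```

A real dumper must also handle files modified while the dump runs and record the new dump time atomically; this sketch ignores both problems.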

9 How to make a backup? Two strategies: Physical dump and Logical Dump.
A physical dump starts at block 0 of the disk, writes all the disk blocks onto the output medium in order, and stops when it has copied the last one. The program is so simple that it can probably be made 100% bug free, though complications arise if unused blocks are to be skipped or bad blocks are encountered. The advantages of physical dumping are great speed and simplicity. A logical dump starts at one or more specified directories and recursively dumps all the files and directories found there that have changed since a given base date. It produces a series of carefully identified directories and files, making it easy to restore a specific file or directory on request. Logical dumping is the most common form of backup.
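The recursive structure of a logical dump can be sketched as below. A key detail is that every directory on the path to a changed file must also be dumped, so the tree can be rebuilt on restore; the record format here is purely illustrative:

```python
# Sketch of a logical dump: recursively emit every file modified since the
# base date, plus each directory on the path to such a file.
import os

def logical_dump(path, base_time, out):
    """Append (kind, path) records to out; return True if anything was dumped."""
    dumped = False
    entries = []
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if os.path.isdir(full):
            if logical_dump(full, base_time, entries):
                dumped = True
        elif os.path.getmtime(full) > base_time:
            entries.append(("file", full))
            dumped = True
    if dumped:
        out.append(("dir", path))   # parent directory precedes its contents
        out.extend(entries)
    return dumped
```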

10

11 File-System Consistency
An inconsistent state arises when the system crashes before modified blocks have been written back to disk. This is critical when i-node blocks, directory blocks, or blocks containing the free list have not been written. Most operating systems provide a utility program that checks file-system consistency: UNIX uses fsck, Windows uses sfc. To check blocks, the checker builds two tables, each containing a counter per block, initialized to 0: the first table tracks how many times each block is present in a file, the second how many times each block is present in the free list. The checker reads all the i-nodes using the raw device and, for each block of each file, increments that block's counter in the first table; it then examines the free list or bitmap and updates the second table. Directories are checked separately, for efficiency reasons: if an i-node number found in a directory is larger than the number of i-nodes on the disk, the directory has been damaged.
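The two-table check can be sketched directly. The toy file system below (per-file block lists plus a free list) is an assumption for illustration; a real checker walks i-nodes on the raw device:

```python
# Sketch of the block-consistency check: one counter per block for
# "in use by a file" and one for "on the free list".

def check_blocks(n_blocks, file_block_lists, free_list):
    in_use = [0] * n_blocks
    free = [0] * n_blocks
    for blocks in file_block_lists:        # pass 1: walk every file's blocks
        for b in blocks:
            in_use[b] += 1
    for b in free_list:                    # pass 2: walk the free list
        free[b] += 1
    problems = []
    for b in range(n_blocks):
        if in_use[b] + free[b] == 0:
            problems.append(("missing", b))    # fix: add to free list
        elif free[b] > 1:
            problems.append(("dup-free", b))   # fix: rebuild free list
        elif in_use[b] > 1:
            problems.append(("dup-data", b))   # fix: copy to a fresh block
    return problems

# Blocks 0-5: block 3 is missing, block 4 is twice on the free list,
# and block 1 is shared by two files.
print(check_blocks(6, [[0, 1], [1, 2]], [4, 4, 5]))
```

A consistent file system produces an empty problem list: every block has exactly one 1, in one table or the other.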

12 Figure 4-27. File system states.
(a) Consistent. (b) Missing block. (c) Duplicate block in free list. (d) Duplicate data block.

13 Four Cases
Consistent: every block has a 1 either in the first table (in use) or in the second table (free); the file system is consistent.
Missing block: a block appears in neither table, typically the result of a crash. It does no direct harm but wastes space, reducing the capacity of the disk. Solution: the file-system checker adds missing blocks to the free list.
Duplicate block in the free list: this can occur with a list-based free list but never with a bitmap. Solution: rebuild the free list.
Duplicate data block: the same block is present in two files. If either file is removed, the block is put on the free list, leaving the same block both in use and free at the same time; if both are removed, the block ends up duplicated on the free list. Solution: the checker allocates a free block, copies the contents of the duplicate block into it, and inserts the copy into one of the files. An error is also reported, to allow the user to inspect the damage.

14 Checking the Directory System
The checker uses a similar table of counters, but now one per file rather than one per block. It starts in the root directory and recursively descends the tree, incrementing a file's counter each time its i-node appears in a directory. Because of hard links, a file may appear in two or more directories; symbolic links do not count. When the check is done, it compares these numbers with the link counts stored in the i-nodes themselves. The counts start at 1 when a file is created and are incremented each time a hard link is made to it. In a consistent file system, both counts agree. Two kinds of error can occur:
The link count in the i-node is too high: not serious, but it wastes disk space with files that are not in any directory. Fix: set the link count in the i-node to the correct value.
The link count in the i-node is too low: potentially catastrophic. Fix: force the link count in the i-node to the actual number of directory entries.

15 File-System Performance
Caching. The block cache (or buffer cache) is the most common technique used to reduce disk accesses: a collection of blocks that logically belong on the disk but are kept in memory for performance reasons. All read requests are checked against the cache: if the needed block is present, the request is satisfied without a disk access; if not, the block is first read into the cache and then copied to wherever it is needed. The usual implementation uses a hash table keyed on disk address. Cache vs. paging: cache references are relatively infrequent, so it is feasible to keep all the blocks in exact LRU order; a bidirectional list runs through all the blocks in order of usage, with the least recently used block at the front and the most recently used block at the end. Unfortunately, now that exact LRU is possible, it turns out to be undesirable: a modified i-node block that sits at the back of the LRU list without being rewritten can leave the file system inconsistent if the system crashes first.
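A minimal sketch of a buffer cache with exact LRU follows. Python's OrderedDict plays the role of the hash table plus the doubly linked usage list; the capacity and the read_block() stand-in for real disk I/O are assumptions:

```python
# Sketch of a buffer cache with exact LRU eviction.
from collections import OrderedDict

def read_block(addr):
    return b"data-%d" % addr            # stand-in for a real disk read

class BufferCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()     # addr -> contents, in LRU order

    def get(self, addr):
        if addr in self.blocks:
            self.blocks.move_to_end(addr)       # hit: mark most recent
            return self.blocks[addr]
        data = read_block(addr)                 # miss: fetch from disk
        self.blocks[addr] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)     # evict least recently used
        return data

cache = BufferCache(capacity=2)
cache.get(7); cache.get(8); cache.get(7); cache.get(9)  # evicts block 8
```

Note that this sketch only reads; a real cache must also track dirty blocks and write critical ones (i-nodes, directories) back promptly, which is exactly why pure LRU is undesirable.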

16 Figure 4-28. The buffer cache data structures.

17 Cache Algorithm and Block Read Ahead
Two factors decide whether a block goes to the front or the end of the LRU list: Is the block likely to be needed again soon? Is the block essential to the consistency of the file system? UNIX has a system call, sync, which forces all modified blocks out onto the disk immediately; Windows has the system call FlushFileBuffers. An alternative is a write-through cache, in which all modified blocks are written back to disk immediately; write-through caches require more disk I/O than non-write-through caches. Block read ahead tries to get blocks into the cache before they are needed, to increase the hit rate. It works for files that are actually being read sequentially; for other access patterns it makes reasonable guesses and, when wrong, wastes a little bit of disk bandwidth.

18 Reducing Disk-Arm Motion
Another technique is to put blocks that are likely to be accessed in sequence close to each other, preferably in the same cylinder. Solid-state disks (SSDs) have no moving parts, so this optimization does not apply to them; on the other hand, each block of an SSD can be written only a limited number of times.

19 Defragmenting Disks Performance can be restored by moving files around to make them contiguous and by putting all the free space in one or more large contiguous regions on the disk. The Windows program for this is called defrag. Some files cannot be moved: the paging file, the hibernation file, and the journaling log, because the administration that would be required to move them is more trouble than it is worth. Defragmentation is used mostly on Windows.

20 Example File Systems Section 4.5 in the Textbook (pg )

21 The MS-DOS File System Used on early IBM PCs and supported by Windows up through the Windows Vista period; still used by other devices such as media (MP3) players. To open a file, the system searches the directory tree, component by component, until the specified file's entry is found. File sizes are stored as 32-bit numbers, so a file can be up to 4 GB. Date and time values are stored in the directory entry with an accuracy of about 2 seconds.

22 Figure 4-30. The MS-DOS directory entry.

23 Figure 4-31. FAT-12, FAT-16, and FAT-32, as used for disk partitioning.
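The File Allocation Table itself is a linked list held in a table: entry i names the block that follows block i in its file. A sketch of following one file's chain; the table contents and the -1 end-of-file marker are assumptions (real FATs use a reserved value such as 0xFFF in FAT-12):

```python
# Sketch of following a file's block chain through a FAT.

def file_blocks(fat, first_block):
    """Return the list of blocks making up a file, in order."""
    blocks = []
    b = first_block
    while b != -1:                 # -1 stands in for the end-of-file marker
        blocks.append(b)
        b = fat[b]                 # next block in the chain
    return blocks

# An assumed FAT in which the file starting at block 4
# occupies blocks 4, 7, 2, and 10, in that order.
fat = {4: 7, 7: 2, 2: 10, 10: -1}
print(file_blocks(fat, 4))         # [4, 7, 2, 10]
```

The widths of the table entries (12, 16, or 32 bits) are what distinguish FAT-12, FAT-16, and FAT-32, and they bound the maximum number of blocks, hence the maximum partition size.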

24 The UNIX V7 File System Each directory entry has two fields: the file name and the i-node number. The i-node contains the file's attributes, such as the times of creation and modification and the size of the file, along with the addresses of its disk blocks. A path name, which is a string of ASCII characters, is looked up directory by directory: each component is found in the current directory, yielding the i-node of the next component, until the file's own i-node is reached.
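The directory-by-directory lookup can be sketched with a toy in-memory layout; the dict-based "i-nodes", the specific i-node numbers, and the /usr/ast/mbox example path are all illustrative assumptions:

```python
# Sketch of UNIX V7-style path lookup: directories map ASCII names to
# i-node numbers, and lookup walks one path component at a time.

inodes = {
    1:  {"type": "dir",  "entries": {"usr": 5}},    # root directory, i-node 1
    5:  {"type": "dir",  "entries": {"ast": 9}},
    9:  {"type": "dir",  "entries": {"mbox": 26}},
    26: {"type": "file", "size": 92},
}

def lookup(path):
    """Resolve an absolute path to an i-node number, one directory at a time."""
    ino = 1                                           # start at the root i-node
    for component in path.strip("/").split("/"):
        ino = inodes[ino]["entries"][component]       # follow the next link
    return ino

print(lookup("/usr/ast/mbox"))    # 26
```

Each step costs one directory search plus one i-node fetch, which is why deeply nested paths are more expensive to open than shallow ones.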

25 Figure 4-33. An example of the file directory from the textbook.

26 CD-ROM File Systems These file systems are particularly simple because they were designed for write-once media; for example, they have no provision for keeping track of free blocks. ISO 9660 is the most common standard for CD-ROM file systems. The Rock Ridge extension was created to make it possible to represent UNIX file systems on a CD-ROM. The Joliet extensions were created by Microsoft to allow Windows file names to be copied to CD-ROM and then restored.

27 ISO 9660 file system One goal of this standard was to make every CD-ROM readable on every computer, independent of the byte ordering and the operating system used. A CD-ROM has a single continuous spiral containing the bits in a linear sequence. The bits along the spiral are divided into logical blocks of 2352 bytes; some of these bytes are for preambles, error correction, and other overhead. When used for music, CDs also have lead-ins, lead-outs, and intertrack gaps. Every CD-ROM begins with 16 blocks whose function is not defined by the ISO 9660 standard. Next comes one block containing the primary volume descriptor, which holds some general information about the CD-ROM, including a directory entry for the root directory telling where to find it on the CD-ROM. A directory consists of a variable number of entries, the last of which contains a bit marking it as the final one. Among the fields of a directory entry is the starting block of the file itself, followed by the time and date the CD-ROM was recorded, with separate bytes for the year, month, day, hour, minute, second, and time zone.

28 ISO 9660 file system The flags field contains a few miscellaneous bits, including one to hide the entry in listings, one to distinguish a file entry from a directory entry, one to enable the use of extended attributes, and one to mark the last entry in a directory. The next field deals with interleaving pieces of files. The field after that tells which CD-ROM the file is located on, allowing a directory entry to refer to a file on another CD-ROM. The following field gives the size of the file name in bytes and is followed by the file name itself: a base name of up to 8 characters and an extension of up to 3 characters. The last two fields are not always present. The Padding field is used to force every directory entry to be an even number of bytes, to align the numeric fields; the System use field's function and size are undefined, except that it must be an even number of bytes, and it is used differently by different systems. Entries within a directory are listed in alphabetical order, except for the first two: the directory itself and its parent. There is no limit on the number of entries in a directory, but directories may be nested at most 8 deep. The standard also defines three levels: Level 1 limits file names to 8 characters for the base and 3 for the extension; Level 2 relaxes the length restrictions; Level 3 relaxes the requirement that files be contiguous.

29 Rock Ridge Rock Ridge CD-ROMs are readable on any computer: a system unaware of the Rock Ridge extensions simply ignores them and sees a normal CD-ROM. The extensions are divided into the following fields: PX - POSIX attributes: the standard UNIX permission bits for owner, group, and others. PN - major and minor device numbers associated with the file: allows raw devices to be represented on a CD-ROM, so that the contents of the /dev directory can be written and later reconstructed. SL - symbolic link: allows a file on one file system to refer to a file on a different file system. NM - alternative name: allows a second name to be associated with the file; this name is not subject to the character-set or length restrictions of ISO 9660, permitting arbitrary UNIX file names on a CD-ROM. CL (child location), PL (parent location), and RE (relocation) are used together to get around the ISO 9660 limit of directories nested only eight deep. TF - time stamps: contains the times the file was created, last modified, and last accessed. Together, these extensions make it possible to copy a UNIX file system to a CD-ROM and then restore it correctly on a different system.

30 Joliet Virtually all programs that run under Windows and use CD-ROMs support Joliet. The major extensions provided by Joliet are: long file names (up to 64 characters); the Unicode character set (enabling file names in other languages); directory nesting deeper than eight levels (removing the ISO 9660 limitation); and directory names with extensions (even though Windows rarely uses them).

31 Video

