File Management – Implementation

As we said, the user is interested in the interface part of file management: how files are named, which operations are allowed on them, how the directories that contain them are structured, and so on. In this chapter, we will study the hidden activities of file management that stand behind the file interface. These activities include how to store a file on the storage media, how to reach the different parts of a file, and how to improve the efficiency and speed of file operations. In short, the file interface represents the logical view of file management, while the file implementation represents the physical view.

The figure in slide no. 3 shows a hierarchy of levels that explains how the file manager relieves the user of the details and lets him see only what he needs.

Level 1: This is the I/O control level. Sometimes this level is classified as part of I/O management. It contains the device drivers and the interrupt handlers that transfer information between main memory and the disk. In short, this level deals directly with the hardware structure of the storage media. The device driver can be thought of as a translator. For example, the input command may take the form (retrieve cylinder no. 73, track no. 2, sector no. 10); the device driver then sends specific bit patterns to the hardware controller to control the movement of the read/write head to reach the wanted information.

Level 2: This is the basic file system. It needs only to issue generic commands to the appropriate device driver to read or write physical blocks on the related disk, such as (drive no. 2, cylinder no. 73, track no. 2, sector no. 10).

Level 3: This is the file organization module. It is responsible for translating the logical block numbers of a file into physical block numbers. Normally the logical block numbers do not match the physical block numbers, as shown in the figure.

Level 4: This is the logical file system. It is responsible for all metadata, that is, all file system structures related to managing the interface between the file and the user, such as the file name, directory structures, permissions, protection, and security.
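The kind of translation the lower levels perform can be illustrated with a small sketch. It assumes a hypothetical disk geometry (4 tracks per cylinder, 16 sectors per track), one block per sector, and blocks numbered linearly across the disk. None of these values come from the slides, and a real file organization module would also consult the file's allocation information.

```python
from typing import NamedTuple


class DiskAddress(NamedTuple):
    cylinder: int
    track: int
    sector: int


# Hypothetical geometry, chosen only for illustration.
TRACKS_PER_CYLINDER = 4
SECTORS_PER_TRACK = 16


def block_to_address(block: int) -> DiskAddress:
    """Turn a linear block number into a (cylinder, track, sector) triple."""
    if block < 0:
        raise ValueError("block number must be non-negative")
    sectors_per_cylinder = TRACKS_PER_CYLINDER * SECTORS_PER_TRACK
    cylinder, rest = divmod(block, sectors_per_cylinder)
    track, sector = divmod(rest, SECTORS_PER_TRACK)
    return DiskAddress(cylinder, track, sector)


def address_to_block(address: DiskAddress) -> int:
    """Inverse mapping, from a disk address back to the linear block number."""
    sectors_per_cylinder = TRACKS_PER_CYLINDER * SECTORS_PER_TRACK
    return (address.cylinder * sectors_per_cylinder
            + address.track * SECTORS_PER_TRACK
            + address.sector)
```

With this assumed geometry, the address used in the example above (cylinder 73, track 2, sector 10) corresponds to linear block 73 * 64 + 2 * 16 + 10 = 4714.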

[Figure: Layers of the file system, from the user down to the hardware controller: logical file system, file organization module, basic file system, I/O control. Views shown alongside the layers: file as a sequence of bytes or records; file as a logical sequence of blocks; file as a physical sequence of blocks; real addresses of the blocks on the disk as cylinder, track, and sector.]

Space allocation for the files
There are three main methods used to allocate space on the disk to store a file:
1) Contiguous allocation
2) Linked allocation
3) Indexed allocation, with three modifications:
   a) linked scheme
   b) multilevel index
   c) combined scheme

Contiguous allocation: In this method, each file occupies a set of contiguous blocks on the disk. The allocation of a file is defined by the disk address of its first block and its length (in block units). If the file is n blocks long and starts at location b, then it occupies blocks b, b+1, b+2, ..., b+n-1. The neighbouring figure shows an example of this type of allocation. The advantage of this method is the short time required to access the blocks of the file: because the blocks are contiguous, the read/write head moves only a minimal distance to reach the file's information. There are two problems: 1) Growth: when the file grows, there may not be enough free contiguous blocks next to it. 2) External fragmentation: the free space becomes split into scattered holes, so there may be no single hole large enough for a new file even when the total free space is sufficient.
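A minimal sketch of contiguous allocation follows. The free-space map and the first-fit search are assumptions made for illustration (the slides do not say how a hole is chosen). The example shows how external fragmentation can refuse a request even though enough free blocks exist in total.

```python
from typing import List, Optional


def blocks_of(start: int, length: int) -> List[int]:
    """Blocks b, b+1, ..., b+n-1 occupied by a file starting at b with n blocks."""
    return list(range(start, start + length))


def first_fit(free_map: List[bool], length: int) -> Optional[int]:
    """Return the start of the first run of `length` free blocks, or None.

    free_map[i] is True when block i is free. First fit is only one possible
    policy; it is assumed here, not taken from the slides.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    run_start, run_length = 0, 0
    for i, is_free in enumerate(free_map):
        if is_free:
            if run_length == 0:
                run_start = i
            run_length += 1
            if run_length == length:
                return run_start
        else:
            run_length = 0
    return None


# Six blocks are free in total, but no hole is longer than two blocks.
fragmented = [True, True, False, True, True, False, True, True]
```

Here `first_fit(fragmented, 2)` finds a hole at block 0, while `first_fit(fragmented, 3)` returns `None` although six blocks are free.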

Linked allocation: Here each file is a chain of linked blocks, and these blocks can be scattered anywhere on the disk, as shown in the neighbouring figure. The directory entry contains a pointer to the first and the last block of the file, and each block that belongs to the file contains a pointer to the next block. This method solves the problems of contiguous allocation: the file can continue to grow as long as free blocks are available, and there is no external fragmentation, since any free block can be used to satisfy a request. The problems of this method are: 1) The amount of data in a block is no longer a power of two. For example, if the block size is 1 KB (1024 bytes) and the pointer (disk address) takes 2 bytes, then the block holds only 1022 bytes of data. 2) Access is very expensive, because reaching a given block requires going through the scattered chain of blocks and reading the pointer (disk address) in each one, as is clear in the figure.
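Both problems can be seen in a short sketch. The dictionary of next-block pointers and the `END` marker are hypothetical stand-ins for the pointer stored inside each disk block; the block and pointer sizes are the ones used in the example above.

```python
from typing import Dict, List, Tuple

BLOCK_SIZE = 1024                            # 1 KB block, as in the text
POINTER_SIZE = 2                             # 2-byte disk address, as in the text
DATA_PER_BLOCK = BLOCK_SIZE - POINTER_SIZE   # 1022 bytes of data per block
END = -1                                     # hypothetical end-of-chain marker


def chain(next_pointer: Dict[int, int], first: int) -> List[int]:
    """Follow the pointers from the first block and list the file's blocks."""
    blocks = []
    current = first
    while current != END:
        blocks.append(current)
        current = next_pointer[current]
    return blocks


def locate(next_pointer: Dict[int, int], first: int, offset: int) -> Tuple[int, int]:
    """Return (disk block holding byte `offset`, number of blocks read to get there)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    current, reads = first, 1
    for _ in range(offset // DATA_PER_BLOCK):
        current = next_pointer[current]
        if current == END:
            raise ValueError("offset is past the end of the file")
        reads += 1
    return current, reads


# A file scattered over blocks 9 -> 16 -> 1 -> 10 -> 25 (illustrative numbers).
pointers = {9: 16, 16: 1, 1: 10, 10: 25, 25: END}
```

Reading byte 3000 of this file means visiting three blocks (9, 16, then 1), because each pointer can only be found by reading the block before it.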

Indexed allocation: This method solves the problems that we faced in the linked allocation method. That is done by bringing all the pointers together into one location called the index block. Each file has its own index block, as shown in the neighbouring figure. The index block is an array of disk-block addresses (pointers): the ith entry in the index block points to the ith block of the file. The directory contains the address of the index block. As shown in the figure, indexed allocation supports direct access without suffering from external fragmentation, because any free block on the disk can be used for storage. The problem with this method is the space wasted by the index block. When the file is very small (say 2 blocks), it needs just two pointers to its blocks, yet a full index block must be allocated. When the file is very large, one block is not enough to hold all the pointers needed for that file. So, different modifications have been suggested to improve the indexed allocation method.
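The direct access and the wasted index space can be sketched as follows. The index block is modelled as a plain Python list, and the 512-byte block size and 4-byte pointer size are assumptions picked for the example, not values from the slides.

```python
from typing import List, Tuple

BLOCK_SIZE = 512      # assumed block size in bytes
POINTER_SIZE = 4      # assumed size of one disk address in bytes
INDEX_CAPACITY = BLOCK_SIZE // POINTER_SIZE   # pointers that fit in one index block


def locate(index_block: List[int], offset: int) -> Tuple[int, int]:
    """Return (disk block, byte within that block) for a byte offset in the file.

    The ith entry of the index block points to the ith block of the file,
    so no chain has to be followed.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    i, within = divmod(offset, BLOCK_SIZE)
    if i >= len(index_block):
        raise ValueError("offset is past the end of the file")
    return index_block[i], within


def unused_entries(file_blocks: int) -> int:
    """Index entries left empty when the whole index block is allocated anyway."""
    if file_blocks > INDEX_CAPACITY:
        raise ValueError("file needs more than one index block")
    return INDEX_CAPACITY - file_blocks


index_block = [9, 16, 1, 10, 25]   # illustrative disk addresses
```

Byte 1100 of this file is found with one lookup, `locate(index_block, 1100)`, and a 2-block file leaves `unused_entries(2)` = 126 of the 128 entries empty under these assumed sizes.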

Modifications of the indexed allocation method
a) Linked scheme: To allow for large files, we can link together several index blocks. For example, an index block might contain the first 100 disk-block addresses (pointers). The last word in the index block is a pointer (if needed) to another index block (for a large file).
[Figure: a chain of index blocks, each holding 100 pointers to file blocks.]

b) Multilevel index: In this modification, two levels of indexing are used. The first level is an index block that points to a set of second-level index blocks, which in turn point to the file blocks (data). In general there could be more than two levels of pointers, according to the maximum file size. For a block size of 4 KB and a 4-byte pointer, each index block holds 1 K pointers, so two levels of indexes allow a file size of up to 1 K * 1 K * 4 KB = 4 GB.
[Figure: pointers, pointers, data.]

c) Combined scheme: This modification is used in the UNIX operating system. In this method the directory structure contains an entry for each file. This entry consists of the file name and a pointer to a special structure called the i-node. The figure in slide no. 10 shows an entry for a file in a directory and the structure of the related i-node. The i-node reaches the data blocks of the file through the 15 pointers found in the body of the i-node itself. The first 12 of these pointers point directly to blocks that contain data of the file. Thus, for small files (of no more than 12 blocks), the i-node itself is enough to locate the blocks that belong to the file. For example, if the block size is 1 KB, then a file of up to 12 KB can be accessed directly. The next three pointers are called single indirect, double indirect, and triple indirect. The single indirect pointer points to an index block containing not data but the addresses of blocks that do contain data. With a 1 KB block size and a 4-byte pointer (block address), an index block holds 256 pointers, so the single indirect pointer extends the file to 268 blocks (12 blocks from the i-node and 256 blocks from the single indirect index block). The double indirect pointer points to an index block containing the addresses of blocks that in turn contain pointers to the actual data blocks. The double indirect pointer is sufficient for files of up to (12 + 256 + 256 * 256) blocks = 12 KB + 256 KB + 64 MB. The last pointer, the triple indirect, goes through three stages of addresses to reach the actual data blocks. So, the triple indirect pointer is sufficient for files of up to (12 + 256 + 256 * 256 + 256 * 256 * 256) blocks = 12 KB + 256 KB + 64 MB + 16 GB. The strength of the UNIX scheme is that the indirect blocks are used only when they are needed. For files under 12 KB, no indirect blocks are needed at all.
Note that even for the longest file, at most three disk references are needed to locate the disk address of any byte in the file, excluding the disk reference to get the i-node, which is fetched when the file is opened and kept in memory until it is closed.
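The capacity figures for the multilevel index and for the combined scheme can be checked with a short calculation. The function names below are hypothetical; the block sizes, pointer size, and the 12 direct pointers are the ones stated in the text.

```python
from typing import List, Tuple

DIRECT_POINTERS = 12
LEVEL_NAMES = ("direct", "single indirect", "double indirect", "triple indirect")


def multilevel_max_bytes(block_size: int, pointer_size: int, levels: int) -> int:
    """Largest file reachable through `levels` levels of index blocks."""
    pointers_per_block = block_size // pointer_size
    return pointers_per_block ** levels * block_size


def cumulative_blocks(block_size: int, pointer_size: int) -> List[int]:
    """Data blocks reachable using the direct pointers, then adding the
    single, double, and triple indirect pointers in turn."""
    pointers_per_block = block_size // pointer_size
    totals, total = [], 0
    for level in range(len(LEVEL_NAMES)):
        total += DIRECT_POINTERS if level == 0 else pointers_per_block ** level
        totals.append(total)
    return totals


def pointer_level(block_number: int, block_size: int = 1024,
                  pointer_size: int = 4) -> Tuple[str, int]:
    """Return (which i-node pointer covers this file block, index blocks to read)."""
    if block_number < 0:
        raise ValueError("block number must be non-negative")
    for level, limit in enumerate(cumulative_blocks(block_size, pointer_size)):
        if block_number < limit:
            return LEVEL_NAMES[level], level
    raise ValueError("block number exceeds the maximum file size")
```

For 4 KB blocks and 4-byte pointers, `multilevel_max_bytes(4096, 4, 2)` gives 4 GB. For 1 KB blocks, `cumulative_blocks(1024, 4)` gives 12, 268, 65804, and 16843020 blocks, which matches 12 KB, plus 256 KB, plus 64 MB, plus 16 GB. The second value returned by `pointer_level` never exceeds three, in line with the note above.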

[Figure (slide no. 10): Entry in the directory for a file, and the structure of the i-node.]

File System Reliability
Destruction of a file system is often a far greater disaster than destruction of a computer. A computer can be replaced by purchasing a new one from a dealer. But if the file system is destroyed, whether due to hardware or software, restoring the information will be difficult, time consuming, and in many cases impossible. In this section we will look at some of the issues involved in safeguarding the file system.

1) Any hard disk has bad blocks right from the start. It is just too expensive to manufacture disks completely free of all defects. In fact, most hard disks are supplied with lists of the bad blocks that were discovered during testing. There are two solutions to the bad block problem, one hardware and one software. The hardware solution is to dedicate a sector on the disk to the bad block list. When the controller is first initialized, it reads the bad block list and picks a spare block to replace each defective one, recording the mapping in the bad block list. The software solution requires the file system to construct a file containing the numbers of all bad blocks, so these blocks never appear among the free blocks that are handed out on request to store data files.

2) Even with a clever strategy for dealing with bad blocks, it is important to back up files frequently. One strategy is to dump the entire disk to magnetic tape (very cheap). If the disk is very large, then at least back up the important directories and files. Another strategy, which wastes half the storage, is to provide each computer with two drives instead of one. Both drives are divided into two halves: data and backup. Each night the data portion of drive 0 is copied to the backup portion of drive 1, and vice versa. In this way, even if one drive is completely damaged, no information is lost.
[Figure: CPU with disk 0 and disk 1, each divided into a data portion and a backup portion.]
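Returning to the bad block problem in item 1), the two solutions can be sketched side by side. The function names and the block numbers are hypothetical; a real controller keeps its mapping in the dedicated sector described above rather than in a Python dictionary.

```python
from typing import Dict, Iterable, List


def build_remap(bad_blocks: Iterable[int], spare_blocks: Iterable[int]) -> Dict[int, int]:
    """Hardware-style solution: assign one spare block to each bad block."""
    bad = sorted(set(bad_blocks))
    spares = list(spare_blocks)
    if len(spares) < len(bad):
        raise ValueError("not enough spare blocks")
    return dict(zip(bad, spares))


def resolve(block: int, remap: Dict[int, int]) -> int:
    """Block actually used by the controller when `block` is requested."""
    return remap.get(block, block)


def usable_free_blocks(free_blocks: Iterable[int], bad_blocks: Iterable[int]) -> List[int]:
    """Software-style solution: bad blocks never appear in the free list."""
    bad = set(bad_blocks)
    return [block for block in free_blocks if block not in bad]


remap = build_remap(bad_blocks=[7, 3], spare_blocks=[100, 101, 102])
```

In the first case a request for block 7 is silently redirected to spare block 101; in the second case blocks 3 and 7 are simply never offered to the allocator.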

File System Performance
Access to disk is much slower than access to memory. As a result of this difference in access time, many file systems have been designed to reduce the number of disk accesses needed. The most common technique used to reduce disk accesses is the block cache.

Block cache: A cache is a collection of blocks that logically belong on the disk but are being kept in memory for performance reasons. When a block is requested, the cache is checked to see whether the needed block is there. If it is, the request can be satisfied without a disk access. If the block is not in the cache, it is first read into the cache and then copied to wherever it is needed. Subsequent requests for the same block can be satisfied from the cache. When the cache is full, a replacement algorithm is needed (like the page replacement algorithms FIFO, LRU, etc.).
[Figure: the disk, main memory, and the cache.]
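The behaviour described above can be sketched with a small cache that uses LRU replacement. LRU is just one of the algorithms mentioned; the class name, the `read_from_disk` callback, and the `disk_reads` counter are hypothetical names introduced for the example.

```python
from collections import OrderedDict
from typing import Callable


class BlockCache:
    """Keeps recently used disk blocks in memory, evicting the least recently used."""

    def __init__(self, capacity: int, read_from_disk: Callable[[int], bytes]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._read_from_disk = read_from_disk
        self._blocks: "OrderedDict[int, bytes]" = OrderedDict()
        self.disk_reads = 0   # how many requests actually went to the disk

    def get(self, block_number: int) -> bytes:
        if block_number in self._blocks:
            # Hit: mark the block as most recently used, no disk access needed.
            self._blocks.move_to_end(block_number)
            return self._blocks[block_number]
        # Miss: read the block into the cache first, then hand it out.
        data = self._read_from_disk(block_number)
        self.disk_reads += 1
        if len(self._blocks) >= self._capacity:
            self._blocks.popitem(last=False)   # evict the least recently used block
        self._blocks[block_number] = data
        return data


def fake_disk(block_number: int) -> bytes:
    """Stand-in for a real disk read, used only to exercise the cache."""
    return ("block %d" % block_number).encode()
```

With a capacity of two, requesting blocks 1, 2, 1 costs two disk reads; requesting block 3 next evicts block 2, so a later request for block 2 has to go to the disk again.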

Block locations: Another important technique for improving file system performance is to reduce the amount of disk arm motion. This is done by putting blocks that are likely to be accessed in sequence close to each other. One way to do that is to use blocks from the same cylinder, which avoids moving the arm between accesses and so saves seek time, which is considered a long time.

Defragmenting: At the beginning, all free disk space is in a single contiguous unit following the installed files. However, as time goes on, files are created and removed, and typically the disk becomes badly fragmented, with files and holes all over the space. As a consequence, when a new file is created, the blocks used for it may be spread all over the disk, giving poor performance. The performance can be restored by moving files around to make them contiguous and to put at least most of the free space in one or more large contiguous regions on the disk. This activity is called defragmentation, and users should do it regularly.
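The effect of defragmentation can be illustrated on a toy disk model. The model is an assumption made for the example: the disk is a list in which each slot holds the name of the file that owns the block, or `None` for a free block, and the order of blocks inside a file is not tracked.

```python
from typing import Dict, List, Optional


def defragment(disk: List[Optional[str]]) -> List[Optional[str]]:
    """Return a layout where every file is contiguous and free space is at the end.

    Files are placed in the order in which they first appear on the disk.
    """
    order: List[str] = []
    counts: Dict[str, int] = {}
    for owner in disk:
        if owner is None:
            continue
        if owner not in counts:
            order.append(owner)
            counts[owner] = 0
        counts[owner] += 1
    compacted: List[Optional[str]] = []
    for owner in order:
        compacted.extend([owner] * counts[owner])
    # All remaining blocks form one contiguous free region.
    compacted.extend([None] * (len(disk) - len(compacted)))
    return compacted


fragmented_disk = ["A", None, "B", "A", None, "B", "A"]
```

For the fragmented layout above, the result is `["A", "A", "A", "B", "B", None, None]`: both files are contiguous and the two free blocks form a single hole.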
