University of Pennsylvania, 10/26/00, CSE 380
File Systems
CSE 380 Lecture Note 11b
Insup Lee
File Systems
Computer applications need to store and retrieve information. They need to:
- store very large amounts of information
- store information permanently
- share information
A file is a collection of data records grouped together for the purpose of access control and modification; a data record is just a linear list of information items. A file system is the software responsible for creating, destroying, organizing, reading, writing, modifying, moving, and controlling access to files, and for managing the resources used by files.
File System Components
1. File Accessing Language: the interface to the user (create, delete, read, write, modify, and control access to files).
2. Accessing Procedures: routines for file directory management and searching, opening and closing files, mapping symbolic file names to their real addresses, restricting file access to legitimate users, and managing internal buffers and I/O programs.
3. I/O System: maintains queues of I/O requests, schedules and initiates operations, services I/O errors, and handles I/O completion signals.
4. Auxiliary Storage Management: keeps track of available space on secondary storage, and allocates and deallocates blocks of secondary storage on request.
5. Backup and Recovery: ensures that the file system can be recovered from hardware and software errors.
Descriptors for Files
1. File identification: a symbolic name.
2. Physical address: location and extent of the file.
3. Access control information (protection): who can access the file and how it can be used (R/W/E); protection against several simultaneous writes.
4. Historical and measurement information (accounting information): creation date, date of last change or last read, number of times the file has been opened, and other usage data.
5. Disposition: temporary or permanent.
6. Coding of information: binary (executable) or characters (EBCDIC or ASCII).
7. Physical or file type: sequential, linked, or indexed; fixed or variable-length records.
8. Logical representation
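The descriptor fields listed above can be pictured as a single record per file. The following Python sketch shows one possible layout; the class and field names (FileDescriptor, start_block, extent_blocks, and so on) are hypothetical illustrations, not taken from any particular system, and item 8 (logical representation) is left out because the slide does not describe it.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Optional


class Disposition(Enum):
    TEMPORARY = "temporary"
    PERMANENT = "permanent"


@dataclass
class FileDescriptor:
    # 1. File identification: the symbolic name
    name: str
    # 2. Physical address: first block and extent in blocks (hypothetical layout)
    start_block: int
    extent_blocks: int
    # 3. Access control information, e.g. {"R", "W", "E"}
    rights: set[str] = field(default_factory=lambda: {"R"})
    # 4. Historical and measurement (accounting) information
    created: datetime = field(default_factory=datetime.now)
    last_changed: Optional[datetime] = None
    open_count: int = 0
    # 5. Disposition: temporary or permanent
    disposition: Disposition = Disposition.PERMANENT
    # 6. Coding of information: "binary", "ASCII" or "EBCDIC"
    coding: str = "ASCII"
    # 7. Physical or file type: "sequential", "linked" or "indexed"
    file_type: str = "sequential"

    def record_open(self) -> None:
        # Accounting: count every open of the file.
        self.open_count += 1
```

Keeping all descriptor fields in one record is what lets the accessing procedures check protection and update accounting data without consulting the file's contents.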
Directory Structure (how files are related to each other)
A. Single level
1. Each file has a unique name (e.g., Univac Exec-8).
2. Since the directory is large, it may be expensive to search.
B. Two-level
1. Each file has a two-part name of the form (user, local name).
2. The “user” identifies a human or a human/project combination.
3. The local name is meaningful only for that “user”.
4. Example: TOPS-10.
C. Hierarchical (tree-structured)
1. A file has a name with an arbitrary number of parts (e.g., UNIX, Multics, VMS (up to 8 levels)).
2. Each part needs to be unique only with respect to the previous parts.
3. A directory is in many ways a file like any other. (Its special feature is that it is internal to the tree and contains information about the files at the next level under it.)
4. Extends nicely to a distributed file system.
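A minimal sketch of name lookup in a tree-structured directory, assuming directories are modeled as nested Python dicts (a hypothetical in-memory representation, not an on-disk format). Each name component is looked up only in the directory reached so far, which is why a part needs to be unique only with respect to the previous parts.

```python
def resolve(root: dict, path: str):
    """Walk a '/'-separated path from the root, one component at a time."""
    node = root
    for part in path.strip("/").split("/"):
        if not part:  # tolerate "//" and the bare root "/"
            continue
        if not isinstance(node, dict):
            raise NotADirectoryError(part)
        if part not in node:
            raise FileNotFoundError(path)
        node = node[part]  # descend; the next part is searched only in this directory
    return node


# Two files may both be called "notes" because they live under different parents.
root = {
    "usr": {"bob": {"notes": "Bob's notes"}},
    "home": {"alice": {"notes": "Alice's notes"}},
}
print(resolve(root, "/usr/bob/notes"))
print(resolve(root, "/home/alice/notes"))
```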
D. Aliases: the file appears under several names
1. The file's contents are stored only once.
2. Other attributes may be stored separately for each name, so different names may carry different attributes. In UNIX, all attributes except the name are stored exactly once.
3. Issues:
- Who pays for a file that appears in several directories? (Best answer: charge 1/n per name, where n is the number of names the file has.)
- What happens when the file is deleted under only one name? (Do not physically remove it until it has been deleted under all names.)
- What about circular paths in the file name space? Either allow such paths and make file search programs smart enough to detect cycles (Multics), or disallow them by not allowing directories to have aliases (UNIX).
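One way to realize the rule "do not physically remove until deleted under all names" is a per-file link count, as UNIX keeps in the i-node. The sketch below is a simplified model under that assumption: contents and attributes live once in a hypothetical Inode object, each name in a flat hypothetical Namespace table is just a reference to it, and the storage can be released only when the last name is removed. The cost_per_name helper (also hypothetical) shows the 1/n charging rule.

```python
class Inode:
    """Contents and attributes of one file, stored exactly once (UNIX style)."""

    def __init__(self, data: bytes):
        self.data = data
        self.link_count = 0  # number of names (directory entries) for this file


class Namespace:
    """A flat table of names; several names may refer to the same Inode."""

    def __init__(self):
        self.names = {}

    def create(self, name: str, data: bytes) -> Inode:
        inode = Inode(data)
        self.link(inode, name)
        return inode

    def link(self, inode: Inode, name: str) -> None:
        # Add an alias: a new name for an existing file.
        if name in self.names:
            raise FileExistsError(name)
        self.names[name] = inode
        inode.link_count += 1

    def unlink(self, name: str) -> bool:
        # Delete one name; return True only when the last name is gone,
        # i.e. when the contents could be physically removed.
        inode = self.names.pop(name)
        inode.link_count -= 1
        return inode.link_count == 0


def cost_per_name(inode: Inode, storage_cost: float) -> float:
    # "Charge 1/n": split the cost evenly over the file's n names.
    return storage_cost / inode.link_count
```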
Other issues
E. Naming conventions
UNIX   VMS   Meaning
p      pas   Pascal source
o      obj   object module file (output from an assembler or compiler)
out    exe   linked executable file
F. Version numbers: a file name includes a version number (e.g., VMS). Writing over a file only creates a new version; it does not destroy the previous version.
G. On Exec-8 (Univac 1100 series), up to 3 files may be indicated by a single name: source, relocatable, and executable. Which one is used is determined by the application that refers to the name.
Access Rights
Possible rights:
1. Read
2. Write
3. Append at the end of the file
4. Execute
5. Delete (in UNIX, this right is the “write” permission on the directory containing the file)
6. Change the rights on the file (in UNIX, only the owner, and the superuser, has this right; it cannot be turned off)
Access Lists
Access rights are stored for each file in an access list. Possible schemes:
1. One entry for each user in the system.
2. Access rights are specified for each of three classes: (a) the owner of the file, (b) other people in the owner's group, (c) the world. This scheme cannot express the policy: allow A to share files with B, allow B to share files with C, but prevent A and C from sharing files.
3. Capabilities: each user owns a list of capabilities, which name the files the user may access and the access rights the user has to each.
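The owner/group/world scheme of item 2 can be sketched as a permission check on a 9-bit UNIX-style mode. This is a simplified model, assuming the conventional rwx bit layout (owner bits, then group, then world) and ignoring the superuser; the names FileMode and allowed are hypothetical. As in UNIX, only the first matching class is consulted: the owner is checked against the owner bits even when the group bits would grant more.

```python
from dataclasses import dataclass

READ, WRITE, EXECUTE = 4, 2, 1  # one rwx triple per class


@dataclass
class FileMode:
    owner: str
    group: str
    mode: int  # e.g. 0o640: rw- for the owner, r-- for the group, --- for the world


def allowed(f: FileMode, user: str, user_groups: set, want: int) -> bool:
    """Check a request against the owner / group / world classes."""
    if user == f.owner:
        bits = (f.mode >> 6) & 0o7  # owner triple
    elif f.group in user_groups:
        bits = (f.mode >> 3) & 0o7  # group triple
    else:
        bits = f.mode & 0o7         # world triple
    return (bits & want) == want
```

Because each file names exactly one group, the A/B/C policy above would need a group containing A and B and another containing B and C on the same file, which this mode word cannot hold.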
Allocation Methods
A. Goals
1. Fast sequential access
2. Fast random access
3. Ability to grow
4. Easy allocation of backing store
5. Minimum fragmentation
B. Free space management
1. Linked list of free blocks (or tracks or cylinders): for efficiency, each entry may point to many free blocks, not just one. This method leads to scattered (random) allocation on the disk, so file access may require a lot of seek activity.
2. Bit map: each block (or track or cylinder) is represented by one bit. This allows clustered allocation.
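A minimal sketch of bit-map free-space management with clustered allocation, assuming a first-fit search for a run of adjacent free blocks. A Python list of booleans stands in for the packed bits of a real bit map, and the class name BitmapAllocator is hypothetical.

```python
class BitmapAllocator:
    """Free-space bit map: entry i is True when block i is in use."""

    def __init__(self, nblocks: int):
        self.used = [False] * nblocks  # stands in for one bit per block

    def allocate(self, count: int) -> int:
        """First fit: return the first block of a run of `count` adjacent free blocks."""
        if count <= 0:
            raise ValueError("count must be positive")
        run_start, run_len = 0, 0
        for i, in_use in enumerate(self.used):
            if in_use:
                run_start, run_len = i + 1, 0  # run broken; restart after this block
                continue
            run_len += 1
            if run_len == count:
                for b in range(run_start, run_start + count):
                    self.used[b] = True
                return run_start
        raise RuntimeError(f"no run of {count} free blocks")

    def free(self, start: int, count: int) -> None:
        for b in range(start, start + count):
            self.used[b] = False
```

Because the whole map can be scanned, adjacent free blocks are easy to find; a linked free list would instead hand out whichever blocks happen to be at the head of the list.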
C. Methods of allocating space
1. Fixed contiguous regions: each file occupies a fixed region. Sequential and random access are fast, but a file cannot easily grow beyond its originally intended size. A bit map is used for allocation.
2. Contiguous regions with overflow areas: same as 1, but a file that grows beyond its original area is allowed a secondary contiguous area. Sequential access is still fast; random access is fairly fast but requires more calculation.
3. Linked allocation: the file is divided into blocks, and each block points to the next one in the list. Sequential access is slow because of seek time, and random access requires a sequential scan. It is easy to increase the size of the file.
4. Indexed allocation: the file is again divided into blocks; each file has its own index block, and each index entry points to a data block.
Comparisons
(Indexed allocation, continued) Index space is wasted for small files. How large should an index block be? One option is a linked list of index blocks. Sequential access is slow due to seek time; random access is not so slow.
Comparison of the four methods (1 = fixed contiguous regions, 2 = contiguous regions with overflow areas, 3 = linked allocation, 4 = indexed allocation):
Sequential access, fast to slow: 1, 2, 3, 4
Random access, fast to progressively slower: 1, 2, 4 (3 requires a sequential scan)
Ability to grow: 1 impossible; 2 OK; 3 easy; 4 easy
Fragmentation, largest to smallest: 1, 2, 4, 3
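The random-access entries in the comparison can be made concrete by counting block reads. The sketch below assumes a linked file whose next-block pointers are stored in the blocks themselves, and a single-level index block that is not cached in memory; the function names linked_block and indexed_block are hypothetical.

```python
def linked_block(next_block: dict, first: int, k: int):
    """Linked allocation: reach logical block k by following k pointers.

    Returns (physical block number, blocks read). Every block on the way
    must be read to learn where the next one is."""
    block, reads = first, 1
    for _ in range(k):
        block = next_block[block]
        reads += 1
    return block, reads


def indexed_block(index_block: list, k: int):
    """Indexed allocation: read the index block, then the data block directly."""
    return index_block[k], 2


# A file stored in physical blocks 7, 3, 12, 5 (in logical order).
file_blocks = [7, 3, 12, 5]
next_block = dict(zip(file_blocks, file_blocks[1:]))  # {7: 3, 3: 12, 12: 5}
```

Reaching logical block k costs k + 1 reads under linked allocation but a constant two reads under indexed allocation, which is why indexed random access is "not so slow".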
The UNIX File System (from “UNIX Implementation”)
1. A file is an array of bytes.
2. The canonical view of a “disk” is a randomly addressable array of 512-byte blocks.
3. The disk is divided into 4 regions:
(a) 1st block: unused (reserved for booting procedures).
(b) 2nd block: the “super-block”, which contains the size of the disk and the boundaries of the other regions.
(c) i-list: a list of file definitions. Each file definition is a 64-byte structure called an i-node.
(d) Free storage blocks for the contents of files.
4. Each i-node contains the owner, protection bits, size, a directory/file flag, and 13 disk addresses. The first 10 of these addresses point directly at the first 10 blocks of a file. If a file is larger than 10 blocks (5,120 bytes), the 11th address points at a block that contains the addresses of the next 128 blocks of the file (files up to 70,656 bytes); the 12th points at a block of up to 128 addresses, each pointing to a block of 128 file-block addresses (files up to 8,459,264 bytes); the 13th is a “triple indirect” address (files up to 1,082,201,088 bytes, i.e. (10 + 128 + 128^2 + 128^3) blocks of 512 bytes).
5. A directory is accessed exactly like an ordinary file. It contains 16-byte entries, each consisting of a 14-byte name and a 2-byte i-number. The root of the file system hierarchy is at a known i-number.
6. New block sizes:
- 4.1 BSD: 1024-byte blocks.
- 4.2 BSD: large blocks of 8192 or 4096 bytes, and small blocks of 1024 or 512 bytes.
- Disk bandwidth utilization: about 30% with the new file system, versus about 3% with the old one (512-byte blocks).
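The i-node addressing scheme in item 4 can be checked with a short calculation. This sketch assumes the parameters stated above (512-byte blocks, 128 addresses per indirect block, 10 direct addresses); the function names max_file_size and locate are hypothetical. locate maps a byte offset to the i-node address that covers it, plus the index used at each indirect level on the way.

```python
BLOCK_SIZE = 512       # bytes per disk block
ADDRS_PER_BLOCK = 128  # block addresses held by one indirect block
N_DIRECT = 10          # direct addresses in the i-node


def max_file_size() -> int:
    """Bytes reachable through 10 direct, 1 single, 1 double and 1 triple indirect address."""
    blocks = N_DIRECT + ADDRS_PER_BLOCK + ADDRS_PER_BLOCK**2 + ADDRS_PER_BLOCK**3
    return blocks * BLOCK_SIZE


def locate(offset: int) -> tuple:
    """Map a byte offset to the i-node address used and the index at each indirect level."""
    n = offset // BLOCK_SIZE  # logical block number within the file
    if n < N_DIRECT:
        return ("direct", n)  # i-node address n (0-9) points straight at the block
    n -= N_DIRECT
    if n < ADDRS_PER_BLOCK:
        return ("single", n)  # entry n of the block named by the 11th address
    n -= ADDRS_PER_BLOCK
    if n < ADDRS_PER_BLOCK**2:
        return ("double", n // ADDRS_PER_BLOCK, n % ADDRS_PER_BLOCK)
    n -= ADDRS_PER_BLOCK**2
    if n < ADDRS_PER_BLOCK**3:
        return ("triple",
                n // ADDRS_PER_BLOCK**2,
                (n // ADDRS_PER_BLOCK) % ADDRS_PER_BLOCK,
                n % ADDRS_PER_BLOCK)
    raise ValueError("offset beyond the maximum file size")
```

max_file_size() evaluates (10 + 128 + 128^2 + 128^3) x 512 = 1,082,201,088 bytes, and locate switches from one level of indirection to the next exactly at offsets 5,120, 70,656 and 8,459,264.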
The UNIX File System (continued)
A sync is done every 30 seconds: the super-block is written to the disk to keep the in-core and disk copies synchronized. If the in-core super-block is lost in a system crash, the free list is lost and must be reconstructed by a lengthy examination of all blocks in the file system.
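A minimal sketch of the free-list reconstruction described above, assuming each i-node is given as the list of all block numbers it references (data and indirect blocks alike) and that the data region starts at a known block number. The function name rebuild_free_list is hypothetical; a real consistency checker must also read the indirect blocks from disk and detect blocks claimed by two files, which this sketch omits.

```python
def rebuild_free_list(total_blocks: int, first_data_block: int, inodes: list) -> list:
    """Recover the free list after the in-core super-block is lost.

    Every block referenced by some i-node is in use; every other block
    in the data region is free."""
    in_use = set()
    for block_list in inodes:  # examine every file definition in the i-list
        in_use.update(block_list)
    return [b for b in range(first_data_block, total_blocks) if b not in in_use]
```

The cost is proportional to the size of the whole file system, not to the amount of recently changed data, which is why losing the in-core super-block is so expensive.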
File System Partitions
Why partition a physical device into multiple file systems?
1. Different file systems can support different uses (e.g., a swap area for virtual memory).
2. Improved reliability: software damage is confined to a single file system.
3. Improved efficiency, by varying parameters for each partition (such as the block and fragment sizes).
4. One large file cannot use up all available space on the device, since files cannot be split across file systems.