Understanding Operating Systems Seventh Edition Chapter 8 File Management
Learning Objectives After completing this chapter, you should be able to describe: The fundamentals of file management File-naming conventions, including the role of extensions The difference between fixed-length and variable-length record format The advantages and disadvantages of several file storage techniques Comparisons of sequential and direct file access Access control techniques and how they compare The role of data compression in file storage Understanding Operating Systems,7e
Introduction File management system Software responsible for creating, deleting, modifying, controlling access to files File Manager’s efficiency directly affected by: How the system’s files are organized How files are stored How each file’s records are structured How user access to all files is protected Understanding Operating Systems,7e
Responsibilities of the File Manager Four tasks File storage Keeping track of where each file is stored Policy implementation: Determine where and how files are stored Efficiently use available storage space Determine who (users) will have access Provide efficient file access for users using device-independent commands Understanding Operating Systems,7e
File Manager Responsibilities (con’td) File allocation (once user access cleared) Activate secondary storage device, load file into memory, and update file File deallocation Update file tables, rewrite file (if revised), and communicate file availability Understanding Operating Systems,7e
Responsibilities of the File Manager (cont'd.) User access factors: flexibility of access to information Share files Provide distributed access Allow users to browse public directories subsequent protection Prevent system malfunctions Security checks Accounts \ passwords Understanding Operating Systems,7e
Definitions Program files Data (document) files Directories (folders) Contain executable instructions Data (document) files Contain data – created \ used by programs Directories (folders) Listings of filenames and their attributes (used for organization) Understanding Operating Systems,7e
Data Definitions / Data Hierarchy Bit Byte Field Record File Database Bit – smallest data element Byte – 8 or 16 bits / representing a character Field – group of related bytes, identified by user (name, type, size) Record – group of related fields File – group of related records, data used/generated by specific application program Database – group of related files, interconnected at various levels, giving users flexibility of access to stored data Understanding Operating Systems,7e
Interacting with the File Manager Common user commands OPEN, DELETE, RENAME, COPY (figure 8.2) Typical menu of file options. © Cengage Learning 2014 Understanding Operating Systems,7e
Interacting with the File Manager Device-independent Physical location: knowledge not needed Cylinder, surface, sector Device medium: knowledge not needed Tape, magnetic disk, optical disc, flash storage Network knowledge: not needed Device-independence facilitated through device driver software Understanding Operating Systems,7e
Interacting with the File Manager (cont'd.) Logical commands Broken into lower-level steps Example: READ Move read/write heads to record cylinder Wait for rotational delay (sector containing record passes under read/write head) Activate appropriate read/write head and read record Transfer record to main memory Send flag indicating free device for another request System monitors for error conditions Understanding Operating Systems,7e
Typical Volume Configuration Secondary storage unit (removable, nonremovable) Multi-file volume Contains many files Multi-volume files Extremely large files spread across several volumes Volume name (label) File manager manages Easily accessible Innermost part of CD, beginning of tape, first sector of outermost track Understanding Operating Systems,7e
Typical Volume Configuration (cont'd.) (figure 8.3) The volume descriptor, which is stored at the beginning of each volume, includes this vital information about the storage unit. © Cengage Learning 2014 Understanding Operating Systems,7e
Typical Volume Configuration (cont'd.) Master file directory (MFD) Stored immediately after volume descriptor Lists Names and characteristics of every file in volume File names (program files, data files, system files) Subdirectories If supported by file manager Remainder of volume Used for file storage Understanding Operating Systems,7e
Typical Volume Configuration (cont'd.) Single directory per volume Supported by early operating systems Disadvantages Long search time for individual file Directory space filled before disk storage space filled Users cannot create subdirectories Users cannot safeguard their files Each program needs unique name Even those serving many users Understanding Operating Systems,7e
Introducing Subdirectories File managers Create MFD for each volume Contains file and subdirectory entries Subdirectory Created upon account opening Treated as file Flagged in MFD as subdirectory Unique properties Improvement over single directory scheme Problems remain: unable to logically group files Understanding Operating Systems,7e
Introducing Subdirectories (cont'd.) File managers today Users create own subdirectories (folders) Related files grouped together Implemented as inverted tree structure Efficient system searching of individual directories May require several directories to reach file Understanding Operating Systems,7e
Understanding Operating Systems,7e (figure 8.4) File directory inverted tree structure. The “root” is the MFD shown at the top, each node is a directory file, and each branch is a directory entry pointing to either another directory or to a real file. All program and data files subsequently added to the tree are the leaves, represented by circles. © Cengage Learning 2014 Understanding Operating Systems,7e
Introducing Subdirectories (cont'd.) File descriptor Filename: each must be unique File type: organization and usage System dependent File size: for convenience File location Identifies first physical block (or all blocks) Date and time of creation Owner Protection information: access restrictions Record size: fixed size, maximum size Understanding Operating Systems,7e
File-Naming Conventions Filename components Relative filename and extension Complete filename (absolute filename) Includes all path information Relative filename Name without path information Appears in directory listings, folders Provides filename differentiation within directory Varies in length One to many characters Operating system specific Understanding Operating Systems,7e
File-Naming Conventions (cont'd.) Extensions Appended to relative filename Two to four characters Separated from relative filename by period Identifies file type or contents and associated program Example BASIA_TUNE.AVI Unknown extension Requires user intervention Understanding Operating Systems,7e
File Extension / Type – Associated Program (Windows)
Understanding Operating Systems,7e (table 8.1) File name parameters for several operating systems. © Cengage Learning 2014 Understanding Operating Systems,7e
File-Naming Conventions (cont'd.) Operating system specifics Windows Drive label and directory name, relative name, and extension C:\users\bob\documents\assignment2.docx UNIX/Linux Forward slash (root), first subdirectory, sub-subdirectory, file’s relative name /usr/mary/photos/portrait.jpg Understanding Operating Systems,7e
File Organization / Record Format File organization - arrangement of records within files Record formats Fixed-length records Direct access: easy Record size critical Variable-length records Direct access: difficult No empty storage space and no character truncation File descriptor stores record format Used with files accessed sequentially Text files, program files Index used to access records Understanding Operating Systems,7e
Record Format (cont'd.) Understanding Operating Systems,7e (figure 8.5) Data stored in fixed length fields (top) that extends beyond the field limit is truncated. Data stored in variable length fields (bottom) is not truncated. © Cengage Learning 2014 Understanding Operating Systems,7e
Physical File Organization Record arrangement and medium characteristics Magnetic disks file organization Sequential, direct, indexed sequential File organization scheme selection considerations Data volatility File activity File size Response time Understanding Operating Systems,7e
Physical File Organization (cont'd.) Sequential record organization Records stored and retrieved serially One after the other Easiest to implement File search: beginning until record found Optimization features may be built into system Select key field from record and sort before storage Complicates maintenance algorithms Preserve original order when records added, deleted Understanding Operating Systems,7e
Physical File Organization (cont'd.) Direct record organization Direct access files Requires direct access storage device implementation Random organization Random access files Relative address record identification Known as logical addresses Computed when records stored and retrieved Hashing algorithms Transform each key into a number Understanding Operating Systems,7e
Physical File Organization (cont'd.) Direct record organization (cont'd.) Advantages Fast record access Sequential access if starting at first relative address and incrementing to next record Updated more quickly than sequential files No preservation of records order Adding, deleting records is quick Disadvantages Hashing algorithm collision: records with unique keys may generate the same logical address Understanding Operating Systems,7e
Understanding Operating Systems,7e (figure 8.6) The hashing algorithm causes a collision. Using a combination of street address and postal code, it generates the same logical address (152132737) for three different records. © Cengage Learning 2014 Understanding Operating Systems,7e
Physical File Organization (cont'd.) Indexed sequential record organization Best of sequential and direct access Indexed Sequential Access Method (ISAM) software Advantage: no collisions (no hashing algorithm) Generates index file for record retrieval Divides ordered sequential file into equal sized blocks Each entry in index file contains the highest record key and physical data block location Search index file Overflow areas Understanding Operating Systems,7e
Contiguous Storage Records stored one after another Advantages Any record found once starting address, size known Easy direct access Disadvantages Difficult file expansion; fragmentation (figure 8.8) With contiguous file storage, File A can’t be expanded without being rewritten to a larger storage area. File B can be expanded, by only one record replacing the free space preceding File C. © Cengage Learning 2014 Understanding Operating Systems,7e
Access Control Verification File sharing Data files, user-owned program files, system files Advantages Save space, synchronized updates, resource efficiency Disadvantage Need to protect file integrity File actions READ only, WRITE only, EXECUTE only, DELETE only, or a combination Understanding Operating Systems,7e
Access Control Matrix Advantages Easy to implement Works well in system with few files, users (table 8.3) The access control matrix showing access rights for each user for each file. © Cengage Learning 2014 Understanding Operating Systems,7e
Access Control Matrix (cont'd.) Disadvantages As files and/or users increase, matrix increases Possibly beyond main memory capacity Wasted space: due to null entries (table 8.4) The five access codes for User 2 from Table 8.3. The resulting code for each file is created by assigning a 1 for each checkmark, and a 0 for each blank space. © Cengage Learning 2014 Understanding Operating Systems,7e
Access Control Lists Modification of access control matrix technique (table 8.5) An access control list showing which users are allowed to access each of the five files. This method uses storage space more efficiently than an access control matrix. © Cengage Learning 2014 Understanding Operating Systems,7e
Access Control Lists (cont'd.) Contains user names granted file access User denied access grouped under “WORLD” Shorten list by categorizing users SYSTEM (ADMIN) Personnel with unlimited access to all files OWNER (USER) Absolute control over all files created in own account GROUP All users belonging to appropriate group have access WORLD All other users in system Understanding Operating Systems,7e
Understanding Operating Systems,7e
Capability Lists Lists every user and respective file access Can control access to devices as well as to files Most common (table 8.6) A capability list shows files for each user and requires less storage space than an access control matrix. © Cengage Learning 2014 Understanding Operating Systems,7e
Data Compression Compression - technique used to reduce file size Saves storage space required for files Improves speed / efficiency of transferring files Two algorithm types Lossless: retains all data in the file Typically used for text or arithmetic files applications Requires decompression Lossy: removes some data - often with minimal loss quality Typically used for image and sound file applications Understanding Operating Systems,7e
Text Compression Records with repeated characters Repeated terms Repeated characters are replaced with a code Repeated terms Compressed using symbols to represent most commonly used words University student database common words Student, course, grade, department each are represented with single character Front-end compression Entry takes given number of characters from previous entry that they have in common Understanding Operating Systems,7e
Text Compression Compressing repeating characters or terms For example: Data in fixed length field may not use all characters in field. LastName – fix length field of 25 characters The name Adams would be followed 20 spaces\blanks Adamsbbbbbbbbbbbbbbbbbbbb (25 characters) Repeating spaces\blanks could be compressed to Adamsb20 (8 characters) Reducing size by 17 characters Understanding Operating Systems,7e
Image and Sound Compression Lossy compression Irreversible: original file cannot be reconstructed Compression algorithm highly dependent on file type JPEG: still images MPEG: video images International Organization for Standardization (ISO) World’s leading developer of international standards Understanding Operating Systems,7e
Image and Sound Compression Typical uncompressed CD quality audio file such as .wav or .aiff requires about 10MB for 1 minute music 3.5 minute song ~ 35 MB file MP3: common lossy compression technique that reduces file size (90%) to about 10% of original with some (minimal) loss of sound quality 3.5 minute song ~ 3.5 MB file Common loseless compression formats such as WavPack and ALAC (Apple loseless) result in a 50% reduction in file size with no loss of quality 3.5 minute song ~ 17.5 MB file Trade off between size reduction and processing\ decompression time PCM audio format typically stored in .wav or .aiff Understanding Operating Systems,7e
Conclusion File manager Controls every file and processes user commands Manages access control procedures Maintain file integrity and security File organizations Sequential, direct, indexed sequential Physical storage allocation schemes Contiguous, noncontiguous, indexed Record types Fixed-length versus variable-length records Access control methods Data compression techniques Understanding Operating Systems,7e