File Management Marc’s first try, Please don’t sue me.


Introduction
Files:
- Have long-term existence
- Can be temporally decoupled from applications
- Are sharable between processes
- Can be structured to suit the task
- Can be viewed in various logical ways
- Can have permissions for individuals or groups
- Can be manipulated in a variety of ways

File Manipulation Operations
- Create
- Delete
- Open
- Close
- Read (all or a portion)
- Write (append or update)
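
A minimal Python sketch of these six operations using only the standard library. The temporary directory, the file name `notes.txt`, and the helper `demo_file_operations` are illustrative choices, not part of any particular file system's interface.

```python
import os
import tempfile

def demo_file_operations():
    """Create, open, write, read, close, and delete a file, in that order."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "notes.txt")

    # Create + open: mode "w" creates the file (or truncates an existing one).
    with open(path, "w") as f:
        f.write("first line\n")
    # Close: leaving the `with` block closes the file automatically.

    # Write (append): new data always goes at the end of the file.
    with open(path, "a") as f:
        f.write("second line\n")

    # Write (update): "r+" starts at offset 0 and overwrites bytes in place.
    with open(path, "r+") as f:
        f.write("FIRST")

    # Read all of the file ...
    with open(path) as f:
        everything = f.read()
    # ... or only a portion: 4 bytes starting at byte offset 6.
    with open(path, "rb") as f:
        f.seek(6)
        portion = f.read(4).decode("ascii")

    # Delete the file, then the temporary directory that held it.
    os.remove(path)
    os.rmdir(workdir)
    return everything, portion, os.path.exists(path)
```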

Internal File Structure
- Byte (most UNIX systems)
- Field
- Record
- File
- Database

Internal File Structure (cont.)
Field:
- Basic logical element of data
- Characterized by length and data type (ASCII string, decimal, integer, etc.)
- Fixed or variable length
- If variable-length, may have subfields
- Length may be indicated by special demarcation symbols between fields

Internal File Structure (cont.)
Record:
- A collection of related fields
- Can be treated as a unit by an application or user
- Can be fixed or variable length
- If the number of fields is variable, each field carries a name
- The entire record usually includes a length field
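
A small sketch, assuming Python's `struct` module as the encoding tool, of the two record styles described above. The layout (4-byte integer key, 10-byte ASCII name, 8-byte float) and the 0x1F demarcation byte are hypothetical choices made only for illustration.

```python
import struct

# Hypothetical fixed-length layout: 4-byte int key, 10-byte ASCII name
# (NUL-padded, silently truncated if longer), 8-byte float. "<" = little-endian, no padding.
FIXED_RECORD = struct.Struct("<i10sd")

def pack_fixed(key, name, balance):
    return FIXED_RECORD.pack(key, name.encode("ascii"), balance)

def unpack_fixed(raw):
    key, name, balance = FIXED_RECORD.unpack(raw)
    return key, name.rstrip(b"\0").decode("ascii"), balance

SEPARATOR = b"\x1f"  # hypothetical demarcation symbol between fields

def pack_variable(fields):
    """Variable-length record: named fields (name=value) separated by a
    demarcation byte, with the whole record prefixed by a 2-byte length."""
    body = SEPARATOR.join(f"{k}={v}".encode("utf-8") for k, v in fields.items())
    return struct.pack("<H", len(body)) + body

def unpack_variable(raw):
    (length,) = struct.unpack_from("<H", raw)
    body = raw[2:2 + length]
    # Every field names itself, so no fixed schema is needed to decode it.
    return dict(part.decode("utf-8").split("=", 1) for part in body.split(SEPARATOR))
```

Every fixed-length record occupies exactly `FIXED_RECORD.size` bytes, which is what makes direct positioning possible; the variable-length form trades that for flexibility.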

Internal File Structure (cont.)
File:
- A collection of similar records
- Treated as a single entity by users and applications
- Can be referenced by name
- Access control restrictions usually apply at the file level
- In some systems, such controls are enforced at the record or even the field level

Internal File Structure (cont.)
Database:
- A collection of related data (many files)
- Various explicit relationships exist between the data
- Usually managed by a DBMS
- Not usually built into an OS

File Access Operations
These operate primarily on records, but the same abstractions can be applied to plain bytes:
- Retrieve_All: read all records into memory, in sequence
- Retrieve_One: retrieve just a single record; usually associated with interactive, transaction-oriented applications

File Access Operations (cont.)
- Retrieve_Next/Retrieve_Previous: retrieve the next (or previous) record in some predefined logical sequence; often associated with search
- Insert_One: insert a new record; may involve random access or simply appending
- Delete_One: delete an existing record; certain linkages or other data structures may need updating to preserve the sequence

File Access Operations (cont.)
- Update_One: a one-two punch. Retrieve a record, update one or more fields, then rewrite the updated record back into the file. With variable-length fields or records, this may require much more data-structure manipulation.
- Retrieve_Few: retrieve some number of records, usually those that satisfy certain criteria; common in databases when selecting on field values
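
A minimal sketch of several of these operations on a file of fixed-length records. The function names mirror the slide's operation names but are hypothetical; `io.BytesIO` stands in for a disk file, and the 16-byte layout (4-byte key plus 12-byte name) is an assumption.

```python
import io
import struct

# Hypothetical layout: 4-byte key + 12-byte NUL-padded name = 16 bytes per record.
REC = struct.Struct("<i12s")

def insert_one(f, key, name):
    """Appending is the simplest insert for an unordered file."""
    f.seek(0, io.SEEK_END)
    f.write(REC.pack(key, name.encode("ascii")))

def retrieve_all(f):
    f.seek(0)
    data = f.read()
    return [REC.unpack_from(data, off) for off in range(0, len(data), REC.size)]

def retrieve_one(f, n):
    """Direct access: record n starts at byte n * record size."""
    f.seek(n * REC.size)
    return REC.unpack(f.read(REC.size))

def update_one(f, n, new_name):
    """The one-two punch: read record n, change a field, write it back in place."""
    key, _ = retrieve_one(f, n)
    f.seek(n * REC.size)
    f.write(REC.pack(key, new_name.encode("ascii")))

def retrieve_few(f, predicate):
    """Select only the records that satisfy a criterion."""
    return [rec for rec in retrieve_all(f) if predicate(rec)]
```

Because every record has the same size, Update_One can overwrite in place; with variable-length records, a longer replacement would no longer fit and the surrounding structure would have to change.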

File Management Systems
Objectives:
- Meet the data management requirements of the user
- Guarantee, whenever possible, that the file data are valid
- Optimize performance (both throughput and response time)
- Provide I/O support for a variety of storage devices
- Minimize or eliminate the potential for lost or destroyed data
- Provide a standardized set of I/O interface routines to user processes
- Provide I/O support for multiple users

File System Architecture
- Device drivers: responsible for starting and completing I/O requests to the various peripheral devices
- Basic file system (the physical I/O level in the OS): deals with the exchange of blocks of data; does not understand their content
- Basic I/O supervisor (part of the OS): maintains control structures for device I/O, scheduling, and file status
- Logical I/O: a general-purpose facility for accessing records; maintains basic data about files (indices, etc.)
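
A toy sketch of how the layers divide the work, assuming an in-memory "disk" and a 64-byte block size (both hypothetical). The basic I/O supervisor is left out to keep it short; the point is that only the logical I/O layer knows what a record is.

```python
BLOCK_SIZE = 64  # hypothetical block size in bytes

class DeviceDriver:
    """Lowest layer: moves whole blocks to and from a (simulated) device."""
    def __init__(self, num_blocks):
        self.disk = bytearray(num_blocks * BLOCK_SIZE)

    def read_block(self, n):
        return bytes(self.disk[n * BLOCK_SIZE:(n + 1) * BLOCK_SIZE])

    def write_block(self, n, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("a device transfer is always exactly one block")
        self.disk[n * BLOCK_SIZE:(n + 1) * BLOCK_SIZE] = data

class BasicFileSystem:
    """Exchanges blocks on behalf of a file; knows nothing about their content."""
    def __init__(self, driver, block_map):
        self.driver = driver
        self.block_map = block_map  # logical block number -> physical block number

    def read_file_block(self, logical):
        return self.driver.read_block(self.block_map[logical])

class LogicalIO:
    """Record-oriented view: turns 'record n' into block + offset arithmetic."""
    def __init__(self, basic_fs, record_size):
        self.fs = basic_fs
        self.record_size = record_size
        self.per_block = BLOCK_SIZE // record_size  # fixed, unspanned blocking

    def read_record(self, n):
        block, slot = divmod(n, self.per_block)
        data = self.fs.read_file_block(block)
        return data[slot * self.record_size:(slot + 1) * self.record_size]
```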

File Organization and Access
Several, sometimes conflicting, criteria apply to the organization of files:
- Short access time
- Ease of update
- Economy of storage
- Simple maintenance
- Reliability
Conflict: economy of storage vs. redundancy. Redundancy increases access speed and reliability, but also increases storage requirements.

Common File Organizations
Pile:
- Data are collected in the order in which they arrive
- Each record consists of one burst of data
- Records may have a wildly varying assortment of fields and field lengths
- Each field must be self-describing (it includes a field name as well as a value)
- Records are accessed by exhaustive search
- When you don't know what you'll get, a pile uses space well and is easy to update
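
A minimal sketch of a pile, assuming Python dicts as self-describing records (each field carries its own name). The helper names are hypothetical; the `examined` counter makes the cost of exhaustive search visible.

```python
def append(pile, **fields):
    """Records are added in arrival order; each one may have different fields."""
    pile.append(dict(fields))

def exhaustive_search(pile, field, value):
    """There is no ordering or index, so every record has to be examined."""
    matches, examined = [], 0
    for record in pile:
        examined += 1
        if record.get(field) == value:
            matches.append(record)
    return matches, examined
```

Usage: the records below share almost no fields, which costs nothing to store, but finding "bob" still visits every record.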

Common File Organizations (cont.)
Sequential file:
- A fixed format is used for records
- The length and position of each field are known, so only the field values need to be stored
- One field, usually the first in each record, is the key field; records are stored in key sequence (variations exist)
- NOT good for interactive applications with individual record queries or updates
- Inserting records is also inefficient, requiring periodic "batch merges"
- Can be implemented by organizing the file physically as a linked list
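
A sketch of the periodic batch merge, assuming the master file and the batch of new records are lists of `(key, data)` tuples (a stand-in for files on disk). `heapq.merge` does a single sequential pass over both inputs, which is the point of batching.

```python
import heapq

def batch_merge(master, log):
    """Merge a key-sorted master file with a batch of new records.
    New records arrive unsorted, so they are sorted once before one merge pass."""
    return list(heapq.merge(master, sorted(log), key=lambda rec: rec[0]))

def find(master, key):
    """Sequential search; it can stop early because records are in key order."""
    for k, data in master:
        if k == key:
            return data
        if k > key:
            break
    return None
```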

Common File Organizations (cont.)
Indexed sequential file:
- Uses an index to support random access
- Requires an overflow file to handle additions
- The index uses the same key as the main file and holds pointers into the file, which greatly improves search time
- Can have multilevel indices to get blazing fast speed
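
A simplified sketch, assuming a non-empty initial file held in memory as fixed-size "blocks" and a sparse index holding the first key of each block. As a simplification, overflow records are kept in a per-block list rather than in a separate overflow file linked by pointers; the class name and block size are hypothetical.

```python
import bisect

class IndexedSequentialFile:
    def __init__(self, records, per_block=2):
        # Main file: key-sorted records grouped into fixed-size blocks.
        recs = sorted(records)
        self.blocks = [recs[i:i + per_block] for i in range(0, len(recs), per_block)]
        # Sparse index: same key as the main file, one entry per block.
        self.index = [block[0][0] for block in self.blocks]
        # Overflow area for additions: block number -> added records.
        self.overflow = {}

    def _block_for(self, key):
        # Binary search in the index instead of scanning the main file.
        return max(bisect.bisect_right(self.index, key) - 1, 0)

    def insert(self, key, data):
        """Additions go to the overflow area instead of shifting the main file."""
        self.overflow.setdefault(self._block_for(key), []).append((key, data))

    def search(self, key):
        b = self._block_for(key)
        for k, data in self.blocks[b] + self.overflow.get(b, []):
            if k == key:
                return data
        return None
```

Only one block (plus its overflow) is scanned per search; a multilevel index would apply the same idea to the index itself.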

Common File Organizations (cont.)
Indexed file:
- Uses an index to support random access
- Maintains multiple indices, one for each type of field that may be the subject of a search
- Records are accessed only through their indices, never by traversal
- Variable-length fields can be used
- Exhaustive and partial indices may both be used
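
A minimal in-memory sketch with one index per searchable field, assuming dict records and a hypothetical `IndexedFile` class. A record that lacks a field is simply not entered in that field's index, which makes that index partial; an index on a field every record has would be exhaustive.

```python
from collections import defaultdict

class IndexedFile:
    def __init__(self, indexed_fields):
        self.records = {}  # record id -> record (stand-in for on-disk storage)
        self.indices = {f: defaultdict(set) for f in indexed_fields}
        self.next_id = 0

    def insert(self, record):
        rid = self.next_id
        self.next_id += 1
        self.records[rid] = record
        for field, index in self.indices.items():
            if field in record:  # records without this field stay out of its index
                index[record[field]].add(rid)
        return rid

    def lookup(self, field, value):
        """Access always goes through an index; the records are never scanned."""
        return [self.records[rid] for rid in sorted(self.indices[field].get(value, ()))]
```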

Common File Organizations (cont.)
Hashed file:
- Hashes the key value to go directly to the record's location on disk
- Primarily efficient for fixed-length records and Retrieve_One operations
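
A small sketch of hashing a key straight to a bucket, assuming 8 buckets (a hypothetical number) held in memory. `zlib.crc32` is used instead of Python's built-in `hash()` because string hashing is randomized per process, whereas an on-disk layout needs a stable mapping.

```python
import zlib

NUM_BUCKETS = 8  # hypothetical number of disk buckets (blocks)

def bucket_for(key: str) -> int:
    """Map a key directly to a bucket number, stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % NUM_BUCKETS

class HashedFile:
    def __init__(self):
        self.buckets = [[] for _ in range(NUM_BUCKETS)]

    def insert(self, key, data):
        self.buckets[bucket_for(key)].append((key, data))

    def retrieve_one(self, key):
        # Only one bucket is read; collisions are resolved by a short scan inside it.
        for k, data in self.buckets[bucket_for(key)]:
            if k == key:
                return data
        return None
```

Retrieving all records in key order would mean visiting every bucket and sorting, which is why this organization suits Retrieve_One far better than Retrieve_Next.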

File Directories
- Almost always a file itself
- Contains information about each file, such as:
  - File name, type, organization
  - Volume, starting address, size used/allocated
  - Owner, access information, permitted actions
  - Creation date, creator, last accessed, last accessor, last modified, last modifier, last backup, current usage

File Directory Operations
- Search: locate the directory entry corresponding to a specified file
- Create file: add a new directory entry
- Delete file: remove a directory entry
- List: show the directory contents, possibly filtered
- Update: change properties of the directory, or file attributes that are stored only in the directory
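
A minimal sketch of a single directory supporting the five operations, assuming a dict keyed by file name. `DirectoryEntry` keeps only a hypothetical subset of the attributes listed under File Directories; a real entry would hold more.

```python
import fnmatch
import time
from dataclasses import dataclass, field

@dataclass
class DirectoryEntry:
    # Hypothetical subset of the per-file attributes a directory holds.
    name: str
    owner: str
    start_block: int
    size: int = 0
    created: float = field(default_factory=time.time)

class Directory:
    def __init__(self):
        self.entries = {}  # file name -> DirectoryEntry

    def search(self, name):
        """Locate the entry for a file (None if there is none)."""
        return self.entries.get(name)

    def create(self, name, owner, start_block):
        """Add a new entry; names must be unique within one directory."""
        if name in self.entries:
            raise FileExistsError(name)
        self.entries[name] = DirectoryEntry(name, owner, start_block)

    def delete(self, name):
        """Remove the entry for a file."""
        del self.entries[name]

    def list_entries(self, pattern="*"):
        """Show the directory contents, optionally filtered by a shell-style pattern."""
        return sorted(n for n in self.entries if fnmatch.fnmatch(n, pattern))

    def update(self, name, **attrs):
        """Change attributes stored in the directory entry."""
        entry = self.entries[name]
        for key, value in attrs.items():
            if not hasattr(entry, key):
                raise AttributeError(key)
            setattr(entry, key, value)
```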

Directory Structure
- Could be a simple, single directory, but many files make it unwieldy for users
- A hierarchical approach is widely used:
  - A master directory contains a number of files and other directories
  - The recursive substructure allows a virtually unlimited number of levels (in modern systems)
  - Large directories are often stored as hashed structures for fast lookup

Directory Structure (cont.)
Naming:
- A directory tree removes the need for file and directory names to be unique across levels; a name only has to be unique within its own directory
- A pathname (in UNIX) specifies the location starting from the top (the root or master directory), e.g. /User_B/Draw/ABC
- Specifying the full path every time is too cumbersome, so there is the concept of a working directory, both for applications and for users:
  - If in the User_B directory, access ./Draw/ABC
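
A sketch of pathname resolution, assuming a hypothetical in-memory tree where directories are dicts and files are strings. Absolute paths start at the root; relative paths are joined onto the working directory first. `posixpath` is used so the behavior is the same on every platform.

```python
import posixpath

# Hypothetical tree: the same name "ABC" appears in several directories.
ROOT = {
    "User_A": {"Draw": {"ABC": "A's drawing"}},
    "User_B": {"Draw": {"ABC": "B's drawing"}, "Word": {"ABC": "B's letter"}},
}

def resolve(path, cwd="/"):
    """Walk the tree one component at a time, starting from the root."""
    full = posixpath.normpath(posixpath.join(cwd, path))  # absolute paths ignore cwd
    node = ROOT
    for part in full.split("/"):
        if part:  # skip the empty strings produced by leading slashes
            node = node[part]  # KeyError if a component does not exist
    return node
```

The three files named ABC coexist because each lives in a different directory, and `resolve("./Draw/ABC", cwd="/User_B")` reaches the same file as the full path `/User_B/Draw/ABC`.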

Access Rights
Individuals or groups of users are granted certain rights to files or directories. The rights form the following hierarchy, in which each right implies those that precede it:
- None: the user can't even know that the file or directory exists
- Knowledge: the user can determine that the file exists and who its owner is
- Execution: the user can load and execute a program but cannot copy it
- Read: the user can read the file for any purpose, including copying and execution
- Append: the user can add data to the file but cannot modify or delete any of its contents
- Update: the user can modify, delete, and add to the file's data (possibly in graded levels)
- Change protection: the user can change the access rights granted to other users
- Deletion: the user can delete the file from the file system, and can do anything else
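
Because the rights form a strict hierarchy, a check reduces to one comparison. A minimal sketch using an ordered `IntEnum`; the access table contents are hypothetical, and combining a user's own grant with the grants of their groups by taking the maximum is an assumed policy, not something the slide specifies.

```python
from enum import IntEnum

class Right(IntEnum):
    # Ordered so that each right implies every right with a smaller value.
    NONE = 0
    KNOWLEDGE = 1
    EXECUTION = 2
    READ = 3
    APPEND = 4
    UPDATE = 5
    CHANGE_PROTECTION = 6
    DELETION = 7

def allowed(granted: Right, requested: Right) -> bool:
    return granted >= requested

def effective_right(acl, user, groups):
    """Assumed policy: a user gets the highest right granted to them or any of their groups."""
    grants = [acl.get(user, Right.NONE)] + [acl.get(g, Right.NONE) for g in groups]
    return max(grants)

# Hypothetical access table for one file: user or group name -> granted right.
acl = {"owner": Right.DELETION, "staff": Right.READ, "guest": Right.KNOWLEDGE}
```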

Simultaneous Access
When more than one user is granted access to append to or update a file, the OS or file management system must enforce discipline. A brute-force approach is to allow a user to lock the entire file while it is being updated. A finer grain of control is to lock individual records during update. This is the readers/writers problem, and the classic issues of mutual exclusion and deadlock must be addressed.
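
A minimal sketch of record-level locking with Python threads, assuming an in-memory list stands in for the file's records. `RecordLockTable`, `update_record`, and `transfer` are hypothetical names. Updates to the same record are serialized while updates to different records can proceed in parallel; `transfer` takes two locks in ascending record order so that two opposite transfers cannot deadlock.

```python
import threading
from collections import defaultdict

class RecordLockTable:
    """Finer-grained locking: one mutex per record instead of one per file."""
    def __init__(self):
        self._guard = threading.Lock()  # protects the lock table itself
        self._locks = defaultdict(threading.Lock)

    def lock_for(self, record_no):
        with self._guard:
            return self._locks[record_no]

def update_record(table, records, record_no, delta):
    # Read-modify-write of one record is done while holding that record's lock.
    with table.lock_for(record_no):
        records[record_no] = records[record_no] + delta

def transfer(table, records, a, b, amount):
    if a == b:
        return  # threading.Lock is not reentrant; locking it twice would hang
    first, second = sorted((a, b))  # consistent lock order prevents deadlock
    with table.lock_for(first), table.lock_for(second):
        records[a] -= amount
        records[b] += amount
```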

Record Blocking
- Blocks are the unit of I/O for secondary storage
- Records are the logical unit of access, and must be organized into blocks to perform I/O
- Three methods:
  - Fixed blocking: fixed-length records are used, with an integral number of records stored in each block. The unused space at the end of each block is internal fragmentation
  - Variable-length spanned blocking: variable-length records are packed into blocks with no unused space. Some records span two blocks, with a pointer to the continuation
  - Variable-length unspanned blocking: same as above but without spanning, so most blocks contain wasted space because the remainder of a block cannot be used when the next record is larger than it

Record Blocking (cont.)
- Fixed blocking is common for sequential files with fixed-length records
- Variable-length spanned blocking is storage-efficient and does not limit record size, but it is more complicated to implement and sometimes inefficient (a record that spans two blocks needs two I/O operations). Files are also more difficult to update
- Variable-length unspanned blocking results in wasted space and limits record size to the size of a block
- The record-blocking technique may interact with virtual memory: a page may be implemented as an integral number of blocks, or vice versa
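
A small calculator that compares the three methods, assuming byte sizes for blocks and records. The `pointer_size` parameter of `spanned_blocks` models space reserved in each block for a continuation pointer; it defaults to 0 and is a simplification, not a fixed rule.

```python
import math

def fixed_blocking(block_size, record_size, num_records):
    """Integral number of records per block; the rest of each block is wasted."""
    per_block = block_size // record_size  # blocking factor
    blocks = math.ceil(num_records / per_block)
    waste_per_block = block_size - per_block * record_size  # internal fragmentation
    return per_block, blocks, waste_per_block

def unspanned_blocks(block_size, record_sizes):
    """Start a new block whenever the next record does not fit in the remainder."""
    blocks, free = 0, 0
    for size in record_sizes:
        if size > block_size:
            raise ValueError("unspanned blocking limits records to one block")
        if size > free:
            blocks, free = blocks + 1, block_size
        free -= size
    return blocks

def spanned_blocks(block_size, record_sizes, pointer_size=0):
    """Pack records back to back; a record may continue into the next block."""
    return math.ceil(sum(record_sizes) / (block_size - pointer_size))
```

For example, with 512-byte blocks, four 300-byte records need 4 blocks unspanned (212 bytes wasted in each) but only 3 blocks spanned.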

File Allocation
Preallocation vs. dynamic allocation:
- Preallocation:
  - The maximum file size is declared at the time of creation
  - This is almost impossible to estimate reliably for most applications
  - Potentially very wasteful
- Dynamic allocation:
  - Allocate space to a file in portions, as necessary
  - Sound familiar?

File Allocation (cont.)
Portion size: choosing a size is a tradeoff. Consider:
- Contiguity of space increases performance, especially for Retrieve_Next
- Having a large number of small portions increases the size of the tables needed to manage the allocation information
- Having fixed-size portions (blocks) simplifies the reallocation of space
- Having variable-size or small fixed-size portions minimizes the waste of unused storage due to overallocation
This leads to two alternatives:
- Variable, large contiguous portions: better performance, but space is hard to reuse
- Blocks: greater flexibility, but may require complex file allocation structures

File Allocation (cont.)
Methods:
- Contiguous allocation (used with preallocation):
  - The file allocation table (FAT) needs only one entry per file, showing the starting block and length
  - External fragmentation occurs fairly quickly
  - Defragmentation is required periodically to maintain performance
- Chained allocation:
  - Allocation is on an individual-block basis
  - Each block contains a pointer to the next block
  - Any free block can be added to a chain
  - No external fragmentation
  - Unfortunately, it cannot capitalize on the principle of locality
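
A sketch contrasting how each method finds the physical block that holds logical block `i` of a file. The FAT contents and block numbers are hypothetical, and the per-block "next" pointers, which a real disk stores inside the blocks themselves, are modelled as a dict.

```python
# Contiguous allocation: one FAT entry per file, (start block, length).
contiguous_fat = {"A": (2, 3), "B": (9, 5)}

def contiguous_block(fat, name, i):
    """Direct access is plain arithmetic: start + i."""
    start, length = fat[name]
    if not 0 <= i < length:
        raise IndexError(i)
    return start + i

# Chained allocation: the FAT entry again holds (start block, length), but the
# blocks need not be adjacent; each one points to the next (None ends the chain).
chained_fat = {"C": (1, 4)}
next_pointer = {1: 7, 7: 3, 3: 12, 12: None}

def chained_block(fat, pointers, name, i):
    """Reaching logical block i means reading i blocks to follow i pointers."""
    block, length = fat[name]
    if not 0 <= i < length:
        raise IndexError(i)
    for _ in range(i):
        block = pointers[block]
    return block
```

File C's blocks (1, 7, 3, 12) are scattered, which is why chained allocation avoids external fragmentation but cannot exploit locality.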

File Allocation (cont.)
Indexed allocation:
- The FAT contains a separate one-level index for each file
- The file's index is kept in its own block, and the file's FAT entry points to that block
- Allocation can be in either fixed-size blocks or variable-size portions:
  - Allocating by blocks eliminates external fragmentation
  - Allocating by portions improves locality
- File consolidation on a regular basis will improve performance
- Supports both sequential and direct access
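
A sketch of indexed allocation with fixed-size blocks, assuming a 512-byte block size and hypothetical block numbers. The FAT entry points at the file's index block, and the index lists the data blocks in logical order, so both access patterns are cheap.

```python
BLOCK_SIZE = 512  # hypothetical block size in bytes

# FAT entry -> index block; index block -> the file's data blocks in logical order.
fat = {"report": 20}
index_blocks = {20: [4, 17, 9, 31]}

def block_for_offset(name, byte_offset):
    """Direct access: translate a byte offset into (physical block, offset in block)."""
    index = index_blocks[fat[name]]
    logical, within = divmod(byte_offset, BLOCK_SIZE)
    return index[logical], within

def sequential_blocks(name):
    """Sequential access: walk the index in order."""
    return list(index_blocks[fat[name]])
```

For example, byte 1500 of "report" lies in logical block 2 (1500 // 512), which the index maps to physical block 9, at offset 476 within it.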

File Allocation (cont.)
Free space management: in addition to the FAT, we need a disk allocation table (DAT) to manage free space.
- Bit tables:
  - A vector containing one bit for each block on the disk
  - Can be very fast when kept in main memory; the tradeoff is space
- Chained free portions:
  - Free portions are chained together using a pointer and a length value stored in each free portion
  - Lends itself to heavy fragmentation, and even deleting a highly fragmented file becomes a chore
- Indexing (best suited to variable-size portions):
  - Treats free space like a file and uses an index table
  - One entry for every free portion; quite efficient
- Free block list:
  - Each block is assigned a number sequentially, and a list of the numbers of all free blocks is maintained in a reserved portion of the storage
  - Efficiency can be achieved by keeping only a small portion of the list in memory at any given time
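
A minimal bit-table sketch with first-fit allocation of contiguous blocks. The class name and the first-fit policy are illustrative assumptions. On the space tradeoff: a 1 TiB disk with 4 KiB blocks has 2^28 blocks, so its bit table takes 2^28 / 8 bytes = 32 MiB.

```python
class BitTable:
    """Free-space bit table: bit i is 1 if block i is in use, 0 if free."""
    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)  # one bit per block

    def is_free(self, i):
        return ((self.bits[i // 8] >> (i % 8)) & 1) == 0

    def _set(self, i, used):
        if used:
            self.bits[i // 8] |= 1 << (i % 8)
        else:
            self.bits[i // 8] &= ~(1 << (i % 8)) & 0xFF

    def allocate(self, count):
        """First fit: find `count` contiguous free blocks, mark them used, return the start."""
        run = 0
        for i in range(self.num_blocks):
            run = run + 1 if self.is_free(i) else 0
            if run == count:
                start = i - count + 1
                for j in range(start, i + 1):
                    self._set(j, True)
                return start
        return None  # no free run long enough

    def free(self, start, count):
        for j in range(start, start + count):
            self._set(j, False)
```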

Reliability
Consider this scenario:
1. User A requests a file allocation to add to an existing file
2. The request is granted, and the disk and file allocation tables are updated in main memory but not yet on disk
3. The system crashes and subsequently restarts
4. User B requests a file allocation and is allocated space on disk that overlaps the last allocation to user A
5. User A accesses the overlapped portion via a reference that is stored inside A's file

Reliability (cont.)
Solution:
1. Lock the disk allocation table on disk, preventing another user from altering the table until the current allocation is completed
2. Search the DAT (in memory) for available space
3. Allocate space, update the DAT, and update the disk (write the DAT back to disk, and possibly update pointers for chained allocation)
4. Update the FAT on disk
5. Unlock the DAT
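
A simplified sketch of the five-step protocol, assuming a dict named `disk` stands in for persistent storage and a `threading.Lock` models the lock on the on-disk DAT. Because the DAT reaches disk before the FAT, a crash can at worst leave blocks marked used but owned by no file; it can never hand the same block to two files, which is the failure in the scenario above.

```python
import threading

class AllocationManager:
    """`dat` is the in-memory DAT (True = block in use); `disk` is a hypothetical
    stand-in for what survives a crash."""
    def __init__(self, num_blocks):
        self.dat = [False] * num_blocks
        self.disk = {"DAT": list(self.dat), "FAT": {}}
        self.dat_lock = threading.Lock()

    def allocate(self, file_name, count):
        with self.dat_lock:                                   # 1. lock the DAT
            free = [i for i, used in enumerate(self.dat) if not used][:count]  # 2. search
            if len(free) < count:
                return None
            for i in free:                                    # 3. allocate, update DAT,
                self.dat[i] = True
            self.disk["DAT"] = list(self.dat)                 #    write DAT back to disk
            fat = dict(self.disk["FAT"])                      # 4. update FAT on disk
            fat[file_name] = fat.get(file_name, []) + free
            self.disk["FAT"] = fat
            return free                                       # 5. unlock on leaving `with`

    def crash_and_restart(self):
        # Main memory is lost; the in-memory DAT is reloaded from disk.
        self.dat = list(self.disk["DAT"])
```

Usage: after user A's allocation and a crash, user B's request is served from blocks A does not own.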

