Ch. 8 File Structures Sequential files. Text files. Indexed files. Hashed files. The role of the operating system.
Figure 8(a) Taxonomy of file structures
8.1 The Role of the Operating System Operating systems need to manipulate files to perform their designated tasks. An operating system maintains a table, called a file descriptor or file control block, for each file being processed. In Pascal, file descriptors are created by calling assign() and reset().
Application software manipulates a file in terms of logical records; the operating system manipulates it in terms of physical records (or blocks). On disk, a block is a sector.
The process of creating a file descriptor is known as opening the file; the process of discarding a file descriptor is known as closing the file.
Before an application program can access a file via the operating system, it must ask the operating system to open the file, as in the pseudocode statement: Open the file document.txt as DocFile for input purposes
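As a rough modern analogue of the open/close cycle described above, the Python sketch below opens a file for input and then closes it. It assumes a POSIX-style system, where the small integer returned by `fileno()` is the index the operating system uses to find its own bookkeeping (the file descriptor or file control block) for the open file; the temporary directory is only there to make the demo self-contained.

```python
import os
import tempfile

# Create a small sample file so the example is self-contained.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "document.txt")
with open(path, "w", encoding="ascii") as f:
    f.write("first record\nsecond record\n")

# "Open the file document.txt as DocFile for input purposes":
# open() asks the operating system to open the file, and the OS sets up its
# own table entry for it, identified here by an integer descriptor.
doc_file = open(path, "r", encoding="ascii")
descriptor = doc_file.fileno()   # OS-level descriptor number for this open file
first_line = doc_file.readline()

# Closing tells the OS it may discard its bookkeeping for this file.
doc_file.close()
print("descriptor:", descriptor, "closed:", doc_file.closed)
```

In day-to-day Python the `with open(...) as f:` form is usually preferred, because it closes the file automatically even if an error occurs.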
Figure 8.1: The role of an operating system when accessing a file
8.2 Sequential Files When to use one? When all the records need to be processed, so it makes no difference which records are processed first. If the storage device is a tape system, we normally follow the sequential order because of the sequential nature of the tape itself. What about a disk system? EOF and sentinels. How is a sequential file updated?
A sequential file is a file that is accessed in a sequential manner. Sequential file processing: while (the end of the file has not been reached) do (retrieve the next record from the file and process it)
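The processing loop above translates almost line for line into Python. This is a minimal sketch that assumes one record per text line and uses an in-memory `io.StringIO` object as a stand-in for a file on disk; `process_sequentially` is a hypothetical helper name.

```python
import io

def process_sequentially(file_obj, process):
    """Apply `process` to every record, in file order, until the end of the file."""
    count = 0
    while True:
        record = file_obj.readline()   # retrieve the next record
        if record == "":               # readline() returns "" only at end of file
            break
        process(record.rstrip("\n"))   # process it
        count += 1
    return count

# Hypothetical three-record file, one record per line.
data = io.StringIO("Adams\nBaker\nClark\n")
seen = []
n = process_sequentially(data, seen.append)
print(n, seen)
```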
Most operating systems maintain a list of the sectors on which the file is stored. This list is recorded as part of the disk's directory system on the same disk as the file.
Figure 1.9: Memory cells arranged by address
Reads from and writes to disk are performed in whole sectors. Figure 8.2: Maintaining a file's order by means of a file allocation table
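To make the idea of a file allocation table concrete, here is a deliberately simplified model, not the real FAT on-disk format: each table entry names the cluster that follows it in the same file, and a hypothetical `END_OF_CHAIN` value marks the last cluster. Following the chain from the starting cluster (which the directory would record) recovers the file's order even though its clusters are scattered on the disk.

```python
END_OF_CHAIN = -1   # hypothetical marker for "last cluster of the file"

def clusters_of(fat, start_cluster):
    """Follow the chain of cluster numbers from a file's first cluster."""
    chain = []
    cluster = start_cluster
    while cluster != END_OF_CHAIN:
        chain.append(cluster)
        cluster = fat[cluster]   # the table entry names the next cluster
    return chain

# Hypothetical layout: the file starts at cluster 2 and continues 2 -> 5 -> 3 -> 8.
fat = {2: 5, 5: 3, 3: 8, 8: END_OF_CHAIN}
print(clusters_of(fat, 2))
```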
Question: Sometimes when editing a file with an editor or word processor, the addition or deletion of a single character can cause the size of the file to change by several kilobytes. Why?
Answer: Space for the file in mass storage (disk) is allocated in sectors or collections of sectors called clusters. Thus the size of a file changes by the size of these units rather than by single characters.
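The answer can be checked with a little arithmetic. The sketch below assumes a cluster size of 4096 bytes purely for illustration (real cluster sizes depend on the file system and disk): space is handed out in whole clusters, so crossing a cluster boundary by one byte costs a whole extra cluster.

```python
def allocated_size(file_bytes, cluster_bytes=4096):
    """Disk space used when space is allocated in whole clusters (size is an assumption)."""
    clusters = -(-file_bytes // cluster_bytes)   # ceiling division
    return clusters * cluster_bytes

print(allocated_size(8192))   # exactly two clusters
print(allocated_size(8193))   # one extra byte needs a third cluster
```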
The end of a sequential file is referred to as EOF (end of file). A common technique is to place a special record, called a sentinel, at the end of the file; the sentinel's value should never occur as data in the application. With an EOF test, the loop is: while (not EOF) do (retrieve the next record from the file and process it)
Another pseudocode example, using a sentinel: retrieve the first record from the file; while (the retrieved record is not the sentinel) do (process the record and retrieve the next record from the file)
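The sentinel version of the loop can be sketched as follows. The sentinel value `"ZZZZZ"` and the helper names are hypothetical; the only requirement from the text is that the sentinel never occurs as real data. The extra check in `next_record` is an addition for safety: without it, a file missing its sentinel would make the loop run forever.

```python
import io

SENTINEL = "ZZZZZ"   # hypothetical sentinel; must never occur as real data

def next_record(file_obj):
    line = file_obj.readline()
    if line == "":   # physical end of file reached: the sentinel was missing
        raise ValueError("end of file reached without finding the sentinel")
    return line.rstrip("\n")

def process_until_sentinel(file_obj, process):
    record = next_record(file_obj)       # retrieve the first record
    while record != SENTINEL:            # while it is not the sentinel
        process(record)                  # process the record
        record = next_record(file_obj)   # retrieve the next record

data = io.StringIO("Adams\nBaker\nZZZZZ\n")
out = []
process_until_sentinel(data, out.append)
print(out)
```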
Figure 8(b) Sequential file
Figure 8.3: A procedure for merging two sequential files
Figure 8.4: Applying the merge algorithm (letters are used to represent entire records; the particular letter indicates the value of the record's key field)
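The merge idea can be sketched in Python as below. This is an illustration of the technique, not a transcription of Figure 8.3: both inputs are assumed to be already sorted by key, records are represented by their key letters as in Figure 8.4, and lists stand in for files on mass storage, with `None` playing the role of EOF.

```python
def merge_files(file_a, file_b):
    """Merge two record sequences, each already sorted by key, into one sorted list."""
    a, b = iter(file_a), iter(file_b)
    output = []
    # next(iterator, None) yields None once an input is exhausted (our "EOF").
    rec_a, rec_b = next(a, None), next(b, None)
    while rec_a is not None and rec_b is not None:
        if rec_a <= rec_b:          # write the record with the smaller key
            output.append(rec_a)
            rec_a = next(a, None)
        else:
            output.append(rec_b)
            rec_b = next(b, None)
    # One input has reached EOF: copy whatever remains of the other.
    while rec_a is not None:
        output.append(rec_a)
        rec_a = next(a, None)
    while rec_b is not None:
        output.append(rec_b)
        rec_b = next(b, None)
    return output

print(merge_files(["A", "C", "D"], ["B", "E", "F"]))
```

Because each input is read once, front to back, the procedure suits sequential storage such as tape: it never needs to jump backward in either file.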
Text Files Text file: each logical record consists of a single encoded character, traditionally in ASCII, giving one character per byte. How is a text file manipulated? With a word processor? How can text files be used as a program's input and output files?
Figure 8.5: The structure of a simple employee file implemented as a text file
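One common way to store records such as employee data in a text file is with fixed-width fields, one record per line. The field layout below (a 25-character name followed by a 9-character ID) is a hypothetical example, not the layout in Figure 8.5; it shows how every field, including the numeric-looking ID, is stored as ASCII characters at one byte each.

```python
# Hypothetical fixed-width layout: 25-character name field, 9-character ID field.
NAME_WIDTH, ID_WIDTH = 25, 9

def make_record(name, emp_id):
    """Pad (or truncate) each field to its width and end the record with a newline."""
    return name.ljust(NAME_WIDTH)[:NAME_WIDTH] + emp_id.rjust(ID_WIDTH)[:ID_WIDTH] + "\n"

def parse_record(line):
    """Slice the fields back out of one line and strip the padding."""
    return line[:NAME_WIDTH].rstrip(), line[NAME_WIDTH:NAME_WIDTH + ID_WIDTH].strip()

text = make_record("Jane Doe", "123456789") + make_record("John Smith", "987654321")
records = [parse_record(line) for line in text.splitlines()]
print(records)
print(len(text.encode("ascii")), "bytes")   # one byte per character
```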
Figure 8.6: The first two bars of Beethoven's Fifth Symphony. Nontextual material can also be encoded as a text file. Real example: Overture
Figure 8.7: Converting data from two's complement notation into ASCII for storage in a text file
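The difference between the two storage forms can be seen directly in Python. This sketch does not reproduce the steps of Figure 8.7; it simply compares the one-byte two's complement pattern of -25 with the ASCII characters "-25" that a text file would hold instead.

```python
value = -25

# Binary storage: one byte in two's complement notation.
binary = value.to_bytes(1, "big", signed=True)
print(format(binary[0], "08b"))   # bit pattern of -25 in 8-bit two's complement

# Text storage: the characters '-', '2', '5' in ASCII, one byte each.
text = str(value).encode("ascii")
print(text.hex(" "))

# Reading the text back requires converting the characters into a number again.
assert int(text.decode("ascii")) == value
```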
Figure 8(c) Text and binary interpretations of a file
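The same bytes mean different things depending on how a program interprets them. In this small sketch, the two bytes 0x41 0x42 read as ASCII text are "AB", while read as a 16-bit unsigned integer they give a number whose value even depends on the assumed byte order.

```python
data = b"AB"   # the bytes 0x41 0x42

as_text = data.decode("ascii")                 # text interpretation
as_big = int.from_bytes(data, "big")           # binary, most significant byte first
as_little = int.from_bytes(data, "little")     # binary, least significant byte first
print(as_text, as_big, as_little)
```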
8.3 Indexing If you need to retrieve records in an arbitrary order throughout the day, what is the main problem with storing them in a sequential file? What is the fastest way to find a subject you are interested in within a book? Answer: use the index. Real-life example: name, dorm
Index Fundamentals An index for a file consists of a list of the key field values occurring in the file, along with the location in mass storage of the corresponding record. Key field: the field whose value identifies a record. An inverted file: indexes on both the primary key and secondary keys. When records are inserted or deleted, all indexes must be updated.
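A minimal sketch of an index, assuming a file whose records are lines beginning with a 5-character key field: the index maps each key to the byte offset (the "location in mass storage") of its record, so a lookup can seek straight to the record instead of scanning the file. The keys, names, and the in-memory `io.BytesIO` file are all hypothetical.

```python
import io

KEY_LEN = 5   # hypothetical: the key field is the first 5 characters of each record

def build_index(file_obj):
    """Map each record's key field to the byte offset where the record starts."""
    index = {}
    file_obj.seek(0)
    while True:
        offset = file_obj.tell()
        line = file_obj.readline()
        if not line:
            break
        index[line[:KEY_LEN].decode("ascii")] = offset
    return index

def fetch(file_obj, index, key):
    file_obj.seek(index[key])   # jump directly to the record: no sequential search
    return file_obj.readline().decode("ascii").rstrip("\n")

data_file = io.BytesIO(b"S2001 Baker, dorm B\nS1001 Adams, dorm A\nS3001 Clark, dorm C\n")
index = build_index(data_file)
print(index)
print(fetch(data_file, index, "S3001"))
```

As the slide notes, any insertion or deletion in `data_file` would shift offsets, so the index would have to be rebuilt or updated.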
Figure 8(d) Logical view of an indexed file
Indexed Files A file’s index is normally stored as a separate file on the same mass storage device as the indexed file itself. It is usually transferred to main memory when the file is opened so that it is accessible when access to records in the indexed file is required.
Figure 8.8: Opening an indexed file
Indexed Files Index size: since the index must be moved to main memory to be searched, it must remain small enough to fit within a reasonable memory area. What if the index is too large? Use a partial-index structure, or an index to the index.
Figure 8.10: A file with a partial index. To locate a record, find the first entry in the index that is equal to or greater than the desired key, then search the corresponding sequential segment for the target record.
Question: The following table represents the contents of a partial index.
Key      Segment number
13C08    1
23G19    2
26X28    3
36Z05    4
Indicate which segment should be retrieved when searching for the record 16N67.
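The lookup rule from Figure 8.10 ("first entry equal to or greater than the desired key") is a binary search for an insertion point, which Python's standard `bisect` module provides. The sketch below applies it to the partial index in the question, assuming keys compare as plain strings (all keys here share the same format); `segment_for` is a hypothetical helper name.

```python
import bisect

# The partial index from the question: sorted keys and their segment numbers.
index_keys = ["13C08", "23G19", "26X28", "36Z05"]
segments = [1, 2, 3, 4]

def segment_for(key):
    """Return the segment to search, or None if the key exceeds every index entry."""
    i = bisect.bisect_left(index_keys, key)   # position of first entry >= key
    if i == len(index_keys):
        return None
    return segments[i]

print(segment_for("16N67"))
```

Only the chosen segment is then read and searched sequentially for the target record.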
8.4 Hashing Sequential files: processed in serial order. Indexed files: direct (random) access, with the overhead of maintaining an index table. Hashed files: reduce this overhead by computing the location of a record in mass storage, applying an algorithm to the value of the record's key field.
Hashed Files A particular hashing technique:
1. Divide the mass storage area allotted to the file into several sections called buckets.
2. Convert any key field value into a numeric value.
3. Divide this numeric value by the number of buckets.
4. Use the remainder of that division, an integer from 0 up to one less than the number of buckets, to identify the bucket that holds the record.
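The four steps above can be sketched as a short hash function. This sketch assumes that step 2 is carried out by reading the key's ASCII bit pattern as one long binary integer; other conversions are possible, and `bucket_for` is a hypothetical name.

```python
def bucket_for(key, num_buckets):
    # Step 2 (assumed method): read the key's ASCII bytes as one big binary integer.
    numeric = int.from_bytes(key.encode("ascii"), "big")
    # Steps 3 and 4: divide by the number of buckets; the remainder is the bucket number.
    return numeric % num_buckets

print(bucket_for("25X3Z", 40))
```

Because the remainder is always between 0 and `num_buckets - 1`, every key maps to a valid bucket with no index lookup at all.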
Q & A Using instructions of the form DR0S and ER0S as described at the end of Section 7.8, write a complete machine language routine to perform a pop operation on a stack implemented as shown in Figure 7.12. Assume that the stack pointer is in register F and that the value at the top of the stack is to be popped into register 5.
Answer: D50F 21FF 5FF1
Figure 8.11: The rudiments of a hashing system, in which each bucket holds those records that hash to that bucket number
Figure 8.12: Hashing the key field value 25X3Z to one of 40 buckets
Hashed Files Collision: more than one record hashes to the same bucket. Suppose records are inserted into 41 buckets at random: the probability of placing the 1st record in an empty bucket is 41/41, the 2nd is 40/41, the 3rd is 39/41, and so on. The probability of placing 8 records into 8 different empty buckets is (41/41)(40/41)(39/41)…(34/41) ≈ 0.482, which is less than 50%.
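The product above is easy to check numerically. This sketch uses exact fractions and the same assumption as the slide, that each record is equally likely to land in any of the buckets.

```python
from fractions import Fraction

def prob_no_collision(num_records, num_buckets):
    """Probability that every record lands in a different bucket."""
    p = Fraction(1)
    for i in range(num_records):
        p *= Fraction(num_buckets - i, num_buckets)   # i buckets already occupied
    return p

p = prob_no_collision(8, 41)
print(round(float(p), 3))       # chance of no collision
print(round(1 - float(p), 3))   # chance of at least one collision
```

So with only 8 records spread over 41 buckets, at least one collision is already more likely than not.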
Hashed Files The high probability of collisions means that a hashed file should never be implemented under the assumption that clustering will never occur. How can the overflow problem be handled? Reserve an additional area of mass storage to hold overflow records, or use the double hashing method.
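Double hashing resolves a collision by probing further slots with a step size computed by a second hash function, so keys that collide at the same home slot usually follow different probe sequences. This is a minimal sketch with one record per slot; the particular second hash `1 + n % (size - 1)` and the prime table size are common textbook choices assumed here, not taken from the slides.

```python
def key_number(key):
    """Assumed key-to-number conversion: ASCII bytes read as one binary integer."""
    return int.from_bytes(key.encode("ascii"), "big")

def insert_double_hashing(table, key):
    size = len(table)                 # a prime size lets every step reach all slots
    n = key_number(key)
    home = n % size                   # first hash: home slot
    step = 1 + n % (size - 1)         # second hash: step size, never 0
    for i in range(size):
        slot = (home + i * step) % size
        if table[slot] is None or table[slot] == key:
            table[slot] = key
            return slot
    raise OverflowError("table is full")

table = [None] * 7
print([insert_double_hashing(table, k) for k in ["A", "H", "O"]])
```

Here "A", "H", and "O" all hash to home slot 2, but their different step sizes send "H" and "O" to different slots.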
Figure 8.14: A large file partitioned into buckets to be accessed by hashing
Figure 8.13: Handling bucket overflow
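A toy model of the overflow-area approach: each bucket holds a fixed number of records, and a record whose home bucket is full goes to a shared overflow area that is searched sequentially. The class name, bucket capacity, and key-to-number conversion are assumptions for illustration, not the layout shown in Figure 8.13.

```python
class HashedFile:
    """Toy hashed file: fixed-capacity buckets plus a shared overflow area."""

    def __init__(self, num_buckets, bucket_capacity):
        self.buckets = [[] for _ in range(num_buckets)]
        self.capacity = bucket_capacity
        self.overflow = []

    def _bucket(self, key):
        # Assumed hash: ASCII bytes as one integer, remainder after division.
        return int.from_bytes(key.encode("ascii"), "big") % len(self.buckets)

    def insert(self, key, data):
        bucket = self.buckets[self._bucket(key)]
        if len(bucket) < self.capacity:
            bucket.append((key, data))
        else:
            self.overflow.append((key, data))   # home bucket full: use overflow area

    def find(self, key):
        for k, d in self.buckets[self._bucket(key)]:
            if k == key:
                return d
        for k, d in self.overflow:              # not at home: search overflow sequentially
            if k == key:
                return d
        return None

hf = HashedFile(num_buckets=2, bucket_capacity=2)
for key in ["A", "C", "E", "B"]:   # "A", "C", "E" all hash to bucket 1
    hf.insert(key, key.lower())
print(hf.buckets, hf.overflow)
```

Lookups for overflowed records are slower, which is why the overflow area should stay small relative to the buckets.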
Figure 8(e) Modulo division
Figure 8(f) Collision
Figure 8(g) Open addressing resolution
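Open addressing stores a colliding record in another slot of the same table. The sketch below uses linear probing (try the next slot, wrapping around), which is one common form of open addressing; the figure may use a different probe sequence. It assumes one record per slot, no deletions, and the same hypothetical key-to-number conversion used above.

```python
def key_number(key):
    """Assumed key-to-number conversion: ASCII bytes read as one binary integer."""
    return int.from_bytes(key.encode("ascii"), "big")

def insert_linear_probe(table, key):
    size = len(table)
    home = key_number(key) % size
    for step in range(size):
        slot = (home + step) % size          # home slot, then the next ones
        if table[slot] is None or table[slot] == key:
            table[slot] = key
            return slot
    raise OverflowError("table is full")

def find_linear_probe(table, key):
    size = len(table)
    home = key_number(key) % size
    for step in range(size):
        slot = (home + step) % size
        if table[slot] is None:              # an empty slot ends the search
            return None
        if table[slot] == key:
            return slot
    return None

table = [None] * 5
for k in ["A", "F", "K"]:   # all three hash to home slot 0
    insert_linear_probe(table, k)
print(table)
```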
Figure 8(h) Linked list resolution
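With linked list resolution, every slot heads a chain, and records that hash to the same slot are linked together instead of being moved elsewhere. A minimal sketch with a hand-written `Node` class (a hypothetical name), appending new records at the end of the chain and using the same assumed key-to-number conversion:

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.next = None   # link to the next record that hashed to the same slot

def key_number(key):
    """Assumed key-to-number conversion: ASCII bytes read as one binary integer."""
    return int.from_bytes(key.encode("ascii"), "big")

def insert_chained(table, key):
    home = key_number(key) % len(table)
    node = Node(key)
    if table[home] is None:
        table[home] = node
    else:
        current = table[home]
        while current.next is not None:   # walk to the end of the chain
            current = current.next
        current.next = node

def chain_keys(table, slot):
    keys, current = [], table[slot]
    while current is not None:
        keys.append(current.key)
        current = current.next
    return keys

table = [None] * 5
for k in ["A", "F", "K", "B"]:   # "A", "F", "K" collide at slot 0; "B" goes to slot 1
    insert_chained(table, k)
print(chain_keys(table, 0), chain_keys(table, 1))
```

Unlike open addressing, the table never fills up; chains simply grow longer, and search time grows with them.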
Figure 8(i) Bucket hashing resolution