Published by Clara Malone. Modified over 6 years ago.
1
Ch. 8 File Structures
Sequential files. Text files. Indexed files. Hashed files. The role of the operating system.
2
Taxonomy of file structures
Figure 8(a) Taxonomy of file structures
3
8.1 The Role of the Operating System
Operating systems need to manipulate files to perform their designated tasks. The operating system maintains a table called a file descriptor, or file control block, for each file being processed. In Pascal, a file descriptor is created by assign() and reset().
4
Application software manipulates a file in terms of logical records.
The operating system manipulates a file in terms of physical records (blocks). On a disk, a block is a sector.
5
The process of creating a file descriptor is known as opening the file.
The process of discarding a file descriptor is known as closing the file.
Real example.
6
Before an application program can access a file via the operating system, it must ask the operating system to open the file. A pseudocode statement for this request: Open the file document.txt as Docfile for input purposes
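A minimal Python sketch of opening and closing a file. The file name, its contents, and the helper `read_first_line` are assumptions made up for this illustration. Python's `open()` asks the operating system to open the file, and `fileno()` returns the small integer by which the operating system identifies its bookkeeping entry for the open file; this integer is only a handle and is not the same thing as the descriptor table described on the slide.

```python
import os
import tempfile

def read_first_line(path):
    # open() asks the operating system to open the file for input;
    # the OS sets up its bookkeeping entry for the file at this point.
    f = open(path, "r", encoding="ascii")
    try:
        handle = f.fileno()   # integer handle naming the OS's open-file entry
        line = f.readline()   # every access goes through the operating system
    finally:
        f.close()             # closing lets the OS discard that entry
    return handle, line, f.closed

# Hypothetical demo file, created only so the sketch is runnable.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "document.txt")
with open(demo_path, "w", encoding="ascii") as out:
    out.write("first record\nsecond record\n")

handle, line, closed = read_first_line(demo_path)
```

After the call, `closed` is true: the program must open the file again before it can read further.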
7
Figure 8.1: The role of an operating system when accessing a file
8
8.2 Sequential Files
When to use them? When all the records need to be processed and it makes no difference which records are processed first.
If the storage device is a tape system, we normally follow the sequential order because of the sequential nature of the tape itself. What about a disk system?
EOF and sentinels.
How is a sequential file updated?
9
A sequential file is a file that is accessed in a sequential manner.
Sequential file processing: while (the end of the file has not been reached) do (retrieve the next record from the file and process it)
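The loop above might look like this in Python. This is a sketch under assumptions: one logical record per line, an in-memory `io.StringIO` object standing in for the file, and "processing" reduced to collecting the record.

```python
import io

def process_sequentially(f):
    """Visit every record in order; here 'processing' just collects it."""
    processed = []
    while True:
        record = f.readline()      # retrieve the next record
        if record == "":           # readline() returns "" at end of file
            break
        processed.append(record.rstrip("\n"))
    return processed

# Assumption: one logical record per line in a stand-in file.
sample = io.StringIO("Ann\nBob\nCarl\n")
result = process_sequentially(sample)
```

Records come back in exactly the order in which they are stored; there is no way to jump ahead without reading what comes before.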
10
Most operating systems maintain a list of the sectors on which the file is stored. This list is recorded as part of the disk’s directory system on the same disk as the file.
11
Figure 1.9: Memory cells arranged by address
(Note: find an animation.)
12
Reading from and writing to disk is done in sectors.
Figure 8.2: Maintaining a file’s order by means of a file allocation table
13
Question: Sometimes when editing a file with an editor or word processor, the addition or deletion of a single character can cause the size of the file to change by several kilobytes. Why?
14
Answer: Space for the file in mass storage (disk) is allocated in sectors, or in collections of sectors called clusters. Thus the size of a file changes by the size of these units rather than by single characters.
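The arithmetic behind the answer can be sketched as follows. The 4096-byte cluster size is an assumption chosen for illustration; real systems use various sizes.

```python
def allocated_bytes(file_size, cluster_size=4096):
    """Disk space reserved for a file of file_size bytes.

    cluster_size=4096 is an assumed value, not a universal constant.
    """
    clusters = -(-file_size // cluster_size)   # ceiling division on integers
    return clusters * cluster_size

before = allocated_bytes(4096)   # the file exactly fills one cluster
after = allocated_bytes(4097)    # one more character needs a second cluster
```

With these assumed numbers, adding a single character raises the space allocated from 4096 to 8192 bytes.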
15
The end of a sequential file is referred to as EOF (end of file). It is usually marked by placing a special record, called a sentinel, at the end of the file; the value of the sentinel must never occur as data in the application.
while (not EOF) do (retrieve the next record from the file and process it)
16
Another pseudocode example
Retrieve the first record from the file; while (the retrieved record is not the sentinel) do (process the record and retrieve the next record from the file)
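A sketch of the sentinel version of the loop. The sentinel value `"*END*"` and the choice of upper-casing as the "processing" step are assumptions for illustration only.

```python
SENTINEL = "*END*"   # assumed marker; it must never occur as real data

def process_until_sentinel(records):
    """Process records in order, stopping at the sentinel record."""
    source = iter(records)
    processed = []
    record = next(source)                  # retrieve the first record
    while record != SENTINEL:
        processed.append(record.upper())   # process the record
        record = next(source)              # retrieve the next record
    return processed

result = process_until_sentinel(["ann", "bob", "*END*", "never read"])
```

Because the first record is retrieved before the loop begins, the test at the top of the loop always examines a record that has not yet been processed.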
17
Figure 8(b) Sequential file
18
Figure 8.3: A procedure for merging two sequential files
19
Figure 8.4: Applying the merge algorithm. (Letters are used to represent entire records. The particular letter indicates the value of the record’s key field.) (continued)
20
Figure 8.4: Applying the merge algorithm. (Letters are used to represent entire records. The particular letter indicates the value of the record’s key field.)
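Since the figures are not reproduced here, the following is a sketch of a standard two-way merge, which may differ in detail from the procedure in Figure 8.3. It assumes each input file is already sorted by key and, as in Figure 8.4, uses single letters to stand for whole records; the input values are made up.

```python
def merge_sequential(file_a, file_b):
    """Merge two record lists that are each already sorted by key."""
    merged = []
    i = j = 0
    # While both inputs still have records, copy the one with the smaller key.
    while i < len(file_a) and j < len(file_b):
        if file_a[i] <= file_b[j]:
            merged.append(file_a[i])
            i += 1
        else:
            merged.append(file_b[j])
            j += 1
    # One input is exhausted; copy whatever remains of the other.
    merged.extend(file_a[i:])
    merged.extend(file_b[j:])
    return merged

result = merge_sequential(["B", "D"], ["A", "C", "E", "F"])
```

Each input is read once from front to back, which is why merging suits sequential files.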
21
Text Files
Text file: each logical record consists of a single encoded character, traditionally in ASCII, resulting in a one-character-per-byte format.
How is a text file manipulated? With a word processor?
How can text files serve as the input and output files of a program?
22
Figure 8.5: The structure of a simple employee file implemented as a text file
23
Figure 8.6: The first two bars of Beethoven’s Fifth Symphony
Nontextual materials can be encoded as text files. Real example: Overture
24
Figure 8.7: Converting data from two’s complement notation into ASCII for storage in a text file (continued)
25
Figure 8.7: Converting data from two’s complement notation into ASCII for storage in a text file
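The figure is not reproduced here, so the following is only a sketch of the general idea: a value held as a two's complement bit pattern is turned into the ASCII codes of its decimal digits before being written to a text file. The function name and the 16-bit sample patterns are assumptions.

```python
def twos_complement_to_ascii(bits):
    """Convert a two's complement bit string to ASCII decimal characters."""
    value = int(bits, 2)            # read the pattern as an unsigned number
    if bits[0] == "1":              # sign bit set, so the value is negative
        value -= 1 << len(bits)     # subtract 2^n to recover the signed value
    return str(value).encode("ascii")

positive = twos_complement_to_ascii("0000000000011001")   # the value 25
negative = twos_complement_to_ascii("1111111111111010")   # the value -6
```

The value 25 occupies two bytes either way here, but as text those bytes are the character codes 0x32 and 0x35 rather than the binary pattern itself.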
26
Text and binary interpretations of a file
Figure 8(c) Text and binary interpretations of a file
27
Indexing
If you need to retrieve records from a file in arbitrary order throughout the day, what is the main problem with storing the records in a sequential file?
What is the fastest way to find the subject you are interested in within a book? Answer: use the index.
Real-life example: name, dorm
28
Indexing Fundamentals
An index for a file consists of a listing of the key field values occurring in the file, along with the location in mass storage of the corresponding record.
Key field.
An inverted file: primary key and secondary keys.
When records are inserted or deleted, all indexes must be updated.
29
Logical view of an indexed file
Figure 8(d) Logical view of an indexed file
30
Indexed Files A file’s index is normally stored as a separate file on the same mass storage device as the indexed file itself. It is usually transferred to main memory when the file is opened so that it is accessible when access to records in the indexed file is required.
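A small sketch of the idea, under assumptions: records are comma-separated lines whose first field is the key, an in-memory `io.BytesIO` object stands in for the file on disk, and the index is built by scanning the file rather than loaded from a separate index file as the slide describes. The keys and names are invented.

```python
import io

def build_index(f):
    """Scan the file once, recording each record's byte offset by key."""
    index = {}
    f.seek(0)
    while True:
        offset = f.tell()             # where the next record starts
        line = f.readline()
        if not line:                  # end of file
            break
        key = line.split(b",", 1)[0]
        index[key] = offset
    return index

def fetch(f, index, key):
    """Jump straight to a record's location and read it."""
    f.seek(index[key])
    return f.readline().rstrip(b"\n")

data = io.BytesIO(b"13C08,Ann\n23G19,Bob\n26X28,Carl\n")
index = build_index(data)             # held in main memory while the file is open
record = fetch(data, index, b"23G19")
```

Once the index is in memory, any record can be reached with one lookup and one seek, without reading the records stored before it.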
31
Figure 8.8: Opening an indexed file
32
Indexed Files
Index size: since the index must be moved to main memory to be searched, it must remain small enough to fit within a reasonable memory area.
What if the index is too large? The partial-index structure. An index to the index.
33
Figure 8.10: A file with a partial index
Find the first entry in the index that is equal to or greater than the desired key, then search the corresponding segment sequentially for the target record.
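The lookup rule can be sketched with Python's `bisect` module. The segment contents and the convention that the index stores the largest key of each segment are assumptions for illustration; single letters stand for whole records.

```python
import bisect

# Hypothetical partial index: the last (largest) key of each segment.
last_keys = ["C", "F", "I"]
segments = [["A", "B", "C"], ["D", "E", "F"], ["G", "H", "I"]]

def find(key):
    """Return (segment number, record), or None if the key is absent."""
    # bisect_left gives the position of the first index entry >= key.
    pos = bisect.bisect_left(last_keys, key)
    if pos == len(last_keys):          # key is larger than every entry
        return None
    for record in segments[pos]:       # sequential search within one segment
        if record == key:
            return pos, record
    return None

hit = find("E")
```

The index has only one entry per segment, so it stays small even when the file holds many records.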
34
Question: The following table represents the contents of a partial index.
Key Segment number 13C 23G 26X 36Z Indicate which segment should be retrieved when searching for the record 16N67.
35
8.4 Hashing
Sequential files: processed in serial order.
Indexed files: direct (random) access. Overhead: maintaining an index table.
Hashed files: reduce the overhead by computing the location of a record in mass storage, applying an algorithm to the value of the key field in question.
36
Hashed Files
A particular hashing technique:
1. Divide the mass storage area allotted to the file into several sections called buckets.
2. Convert the key field value into a numeric value.
3. Divide that numeric value by the number of buckets.
4. Use the remainder of the division as the integer that identifies the bucket in mass storage.
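One way to carry out these steps, as a sketch: the key's ASCII bytes are read as a single large binary number, which is then divided by the number of buckets, and the remainder selects the bucket. Reading the bytes this way is an assumption about how the key becomes a number; the function names and the 41-bucket figure in the usage lines are illustrative.

```python
def key_to_number(key):
    """Interpret the key's ASCII bytes as one large binary number."""
    return int.from_bytes(key.encode("ascii"), "big")

def bucket_for(key, num_buckets):
    """Divide by the number of buckets and keep the remainder."""
    return key_to_number(key) % num_buckets

number = key_to_number("A")      # the ASCII code of "A" is 65
bucket = bucket_for("A", 41)     # 65 divided by 41 leaves remainder 24
```

The remainder is always between 0 and one less than the number of buckets, so every key maps to a valid bucket.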
37
Q & A
Using instructions of the form DR0S and ER0S as described at the end of Section 7.8, write a complete machine language routine to perform a pop operation on a stack implemented as shown in the figure. Assume that the stack pointer is in register F and that the top of the stack is to be popped into register 5.
38
Answer: D50F 21FF 5FF1
39
Class Review
40
Taxonomy of file structures
Figure 8(a) Taxonomy of file structures
41
Question: Sometimes when editing a file with an editor or word processor, the addition or deletion of a single character can cause the size of the file to change by several kilobytes. Why?
42
Answer: Space for the file in mass storage (disk) is allocated in sectors, or in collections of sectors called clusters. Thus the size of a file changes by the size of these units rather than by single characters.
43
Logical view of an indexed file
Figure 8(d) Logical view of an indexed file
44
Figure 8.8: Opening an indexed file
45
Figure 8.10: A file with a partial index
Find the first entry in the index that is equal to or greater than the desired key, then search the corresponding segment sequentially for the target record.
46
8.4 Hashing
Sequential files: processed in serial order.
Indexed files: direct (random) access. Overhead: maintaining an index table.
Hashed files: reduce the overhead by computing the location of a record in mass storage, applying an algorithm to the value of the key field in question.
47
Figure 8.11: The rudiments of a hashing system, in which each bucket holds those records that hash to that bucket number (continued)
48
Figure 8.11: The rudiments of a hashing system, in which each bucket holds those records that hash to that bucket number
49
Figure 8.12: Hashing the key field value 25X3Z to one of 40 buckets
50
Hashed Files
Collision: more than one record hashes to the same bucket.
Assume records are inserted into 41 buckets at random. The probability that the 1st record lands in an empty bucket is 41/41, the 2nd 40/41, the 3rd 39/41, and so on.
The probability that 8 records all land in empty buckets is (41/41)(40/41)(39/41)...(34/41) ≈ 0.48, which is less than 50%.
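The product can be checked with a few lines of Python. This sketch assumes, as the slide does, that every record is equally likely to land in any bucket; the function name is made up.

```python
def prob_no_collision(records, buckets):
    """Probability that `records` random placements hit distinct buckets."""
    p = 1.0
    for already_used in range(records):
        # The next record must avoid the buckets that are already occupied.
        p *= (buckets - already_used) / buckets
    return p

p8 = prob_no_collision(8, 41)
```

For 8 records and 41 buckets the result is about 0.48, so a collision is more likely than not even though fewer than a fifth of the buckets are in use.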
51
Hashed Files
The high probability of collisions indicates that a hashed file should never be implemented under the assumption that clustering will not occur.
How is the overflow problem handled? Reserve an additional area of mass storage to hold overflow records. Double hashing method.
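A minimal sketch of the reserved overflow area, under assumptions: keys are integers, each bucket holds a fixed number of records, and all overflow records share one area that is searched sequentially. The class and its parameters are hypothetical, and double hashing is not shown.

```python
class HashedFile:
    """Toy hashed file with fixed-capacity buckets and one overflow area."""

    def __init__(self, num_buckets=41, capacity=2):
        self.buckets = [[] for _ in range(num_buckets)]
        self.capacity = capacity
        self.overflow = []              # the reserved overflow area

    def _bucket(self, key):
        return key % len(self.buckets)

    def insert(self, key, record):
        home = self.buckets[self._bucket(key)]
        if len(home) < self.capacity:
            home.append((key, record))
        else:                           # home bucket is full
            self.overflow.append((key, record))

    def find(self, key):
        # Look in the home bucket first, then fall back to the overflow area.
        for k, rec in self.buckets[self._bucket(key)]:
            if k == key:
                return rec
        for k, rec in self.overflow:
            if k == key:
                return rec
        return None

hf = HashedFile(num_buckets=5, capacity=1)
hf.insert(3, "a")
hf.insert(8, "b")     # 8 also hashes to bucket 3, so it goes to overflow
```

Lookups that end up in the overflow area are slower, so the scheme works well only while overflow records remain rare.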
52
Figure 8.14: A large file partitioned into buckets to be accessed by hashing
53
Figure 8.13: Handling bucket overflow
54
Figure 8(e) Modulo division
55
Figure 8(f) Collision
56
Open addressing resolution
Figure 8(g) Open addressing resolution
57
Linked list resolution
Figure 8(h) Linked list resolution
58
Bucket hashing resolution
Figure 8(i) Bucket hashing resolution