Ch. 8 File Structures Sequential files. Text files. Indexed files.

Presentation transcript:

Ch. 8 File Structures Sequential files. Text files. Indexed files. Hashed files. The role of the operating system.

Figure 8(a): Taxonomy of file structures

8.1 The Role of the Operating System Operating systems need to manipulate files to perform designated tasks. The operating system maintains a table, called a file descriptor or file control block, for each file being processed. In Pascal, a file descriptor is created by calling assign() and reset().

Application software manipulates a file in terms of logical records. The operating system manipulates a file in terms of physical records (or blocks). On disk, a block is a sector.

The process of creating a file descriptor is known as opening the file. The process of discarding a file descriptor is known as closing the file. Real example.

Before an application program can access a file via the operating system, it must ask the operating system to open the file. The pseudocode statement: Open the file document.txt as Docfile for input purposes
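As a rough illustration of opening and closing, here is a minimal Python sketch. The file name document.txt and the helper names count_characters and demo are assumptions made for the example; Python's built-in open() and close() play the role of creating and discarding the file descriptor.

```python
import os
import tempfile


def count_characters(path):
    """Open a file for input, read it, and close it again."""
    # open() asks the operating system to set up its bookkeeping for the file;
    # mode "r" corresponds to "for input purposes" in the pseudocode.
    doc_file = open(path, "r", encoding="ascii")
    try:
        return len(doc_file.read())
    finally:
        # close() tells the operating system the descriptor can be discarded.
        doc_file.close()


def demo():
    # Hypothetical setup: create a throwaway document.txt so the sketch
    # does not depend on a file that already exists.
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "document.txt")
        with open(path, "w", encoding="ascii") as out:
            out.write("hello")
        return count_characters(path)
```

In everyday Python the with statement (used in demo) is usually preferred, because it closes the file even when an error occurs.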

Figure 8.1: The role of an operating system when accessing a file

8.2 Sequential Files When to use them? When all the records need to be processed and it makes no difference which records are processed first. If the storage device is a tape system, we normally follow the sequential order because of the sequential nature of the tape itself. What about a disk system? EOF and sentinels. How is a sequential file updated?

A sequential file is a file that is accessed in a sequential manner. Sequential file processing: while (the end of the file has not been reached) do (retrieve the next record from the file and process it)
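The loop above might look like this in Python. This is only a sketch: it assumes one record per line, uses an in-memory io.StringIO in place of a real file, and the name process_sequentially is made up for the example.

```python
import io


def process_sequentially(file, process):
    """Retrieve and process records one after another until end of file."""
    count = 0
    while True:
        record = file.readline()
        if record == "":  # readline() returns "" only at the end of the file
            break
        process(record.rstrip("\n"))
        count += 1
    return count


# Assumed sample data: three one-line records.
seen = []
records = io.StringIO("alpha\nbeta\ngamma\n")
total = process_sequentially(records, seen.append)
```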

Most operating systems maintain a list of the sectors on which the file is stored. This list is recorded as part of the disk’s directory system on the same disk as the file.

Figure 1.9: Memory cells arranged by address (note: find an animation)

Figure 8.2: Maintaining a file’s order by means of a file allocation table. Reading from and writing to disk is done in sectors.

Question: Sometimes when editing a file with an editor or word processor, the addition or deletion of a single character can cause the size of the file to change by several kilobytes. Why?

Answer: Space for the file in mass storage (disk) is allocated in sectors or collections of sectors called clusters. Thus the size of a file changes by the size of these units rather than by single characters.
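The arithmetic behind this answer can be sketched in a few lines of Python. The 4096-byte cluster size is an assumption chosen for illustration; real cluster sizes depend on the disk and the file system.

```python
def allocated_bytes(file_size, cluster_size=4096):
    """Return the space a file occupies when storage is handed out in whole clusters.

    cluster_size is an assumed value used only for illustration.
    """
    clusters = (file_size + cluster_size - 1) // cluster_size  # round up
    return clusters * cluster_size
```

With these assumed numbers, a file of exactly 4096 bytes fits in one cluster, while adding a single character pushes it into a second cluster and the occupied space grows by 4096 bytes.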

The end of a sequential file is referred to as EOF (end of file). It is usually marked by placing a special record, called a sentinel, at the end of the file; the sentinel value should never occur as data in the application. while (not EOF) do (retrieve the next record from the file and process it)

Another pseudocode example: Retrieve the first record from the file; while (the retrieved record is not the sentinel) do (process the record and retrieve the next record from the file)
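A possible Python rendering of the sentinel version follows. The sentinel value "*END*" and the function name are hypothetical; any value that can never occur as real data would do.

```python
SENTINEL = "*END*"  # hypothetical marker; it must never occur as real data


def process_until_sentinel(records, process):
    """Process records in order, stopping when the sentinel record is retrieved."""
    source = iter(records)
    record = next(source)  # retrieve the first record
    count = 0
    while record != SENTINEL:
        process(record)
        count += 1
        record = next(source)  # retrieve the next record
    return count
```

Anything stored after the sentinel is never read. If the sentinel is missing, next() raises StopIteration, which is one reason the EOF test is often used as well.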

Figure 8(b): Sequential file

Figure 8.3: A procedure for merging two sequential files

Figure 8.4: Applying the merge algorithm (Letters are used to represent entire records. The particular letter indicates the value of the record’s key field.)

Figure 8.4 (continued)
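Since the merge procedure itself appears only in the figure, the following Python sketch shows one common way to merge two sequential files that are already ordered by key. It is not necessarily the exact procedure of Figure 8.3. As in Figure 8.4, single letters stand for whole records, and None is an assumed stand-in for end of file.

```python
def merge_sequential(file_a, file_b):
    """Merge two record sequences, each already sorted by key, into one sorted list."""
    a, b = iter(file_a), iter(file_b)
    rec_a, rec_b = next(a, None), next(b, None)  # None plays the role of EOF
    merged = []
    # While both inputs still have records, output the one with the smaller key.
    while rec_a is not None and rec_b is not None:
        if rec_a <= rec_b:
            merged.append(rec_a)
            rec_a = next(a, None)
        else:
            merged.append(rec_b)
            rec_b = next(b, None)
    # One input is exhausted; copy whatever remains of the other.
    while rec_a is not None:
        merged.append(rec_a)
        rec_a = next(a, None)
    while rec_b is not None:
        merged.append(rec_b)
        rec_b = next(b, None)
    return merged
```

Each input is read exactly once from front to back, which is why this approach suits sequential storage such as tape.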

Text Files Text file: each logical record consists of a single encoded character, traditionally using ASCII, resulting in one character per byte. How is a text file manipulated? With a word processor? How can text files be used as the input and output files of a program?

Figure 8.5: The structure of a simple employee file implemented as a text file

Figure 8.6: The first two bars of Beethoven’s Fifth Symphony Nontextual materials can be encoded as text files. Real example: Overture

Figure 8.7: Converting data from two’s complement notation into ASCII for storage in a text file

Figure 8.7 (continued)
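The conversion in Figure 8.7 can be approximated in Python as follows. This is a sketch under assumptions: the bit pattern is given as a string of 0s and 1s, and the function names are invented for the example.

```python
def twos_complement_to_int(bits):
    """Interpret a string of 0s and 1s as a two's complement integer."""
    value = int(bits, 2)
    if bits[0] == "1":  # a leading 1 marks a negative number
        value -= 1 << len(bits)
    return value


def to_ascii_bytes(bits):
    """Return the bytes a text file would store for this value, one ASCII character per digit."""
    return str(twos_complement_to_int(bits)).encode("ascii")
```

For example, the 16-bit pattern 0000000000011001 (the value 25) occupies two bytes in binary form and is also stored as two bytes in the text file, the ASCII codes for the characters 2 and 5. Larger values need more characters in text form than in binary form.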

Figure 8(c): Text and binary interpretations of a file

8.3 Indexing If you need to retrieve records from the file in an arbitrary order throughout the day, what is the main problem with using a sequential file to store the records? What is the fastest way to find the subject you are interested in within a book? Answer: use the index. Real-life example: name, dorm

Index Fundamentals An index for a file consists of a listing of the key field values occurring in the file, along with the location in mass storage of the corresponding record. Key field. An inverted file: primary key and secondary key. When records are inserted or deleted, all indexes must be updated.

Figure 8(d): Logical view of an indexed file

Indexed Files A file’s index is normally stored as a separate file on the same mass storage device as the indexed file itself. It is usually transferred to main memory when the file is opened so that it is accessible when access to records in the indexed file is required.
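A small Python sketch of the idea: the index maps each key to the location of its record, is held in memory, and lets a record be fetched directly. The record layout (a 3-character key at the start of each line) and the names build_index and fetch are assumptions for illustration. Here the index is built by scanning the data, whereas the slides describe it being read from a separate index file.

```python
import io


def build_index(file, key_length):
    """Map each record's key to the byte offset at which the record starts."""
    index = {}
    file.seek(0)
    while True:
        offset = file.tell()
        line = file.readline()
        if not line:  # empty bytes object means end of file
            break
        index[line[:key_length].decode("ascii")] = offset
    return index


def fetch(file, index, key):
    """Jump straight to the record for key, without reading the records before it."""
    file.seek(index[key])
    return file.readline().rstrip(b"\n").decode("ascii")


# Assumed sample file: one record per line, key in the first three characters.
data = io.BytesIO(b"K01 Ada\nK02 Grace\nK03 Alan\n")
index = build_index(data, key_length=3)
```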

Figure 8.8: Opening an indexed file

Indexed Files Index size: since the index must be moved to main memory to be searched, it must remain small enough to fit within a reasonable memory area. What if the index is too large? Use a partial-index structure, or an index to the index.

Figure 8.10: A file with a partial index. Find the first entry in the index that is equal to or greater than the desired key, then search the corresponding segment sequentially for the target record.

Question: The following table represents the contents of a partial index.

Key     Segment number
13C08   1
23G19   2
26X28   3
36Z05   4

Indicate which segment should be retrieved when searching for the record 16N67.
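The lookup rule for a partial index (take the first entry whose key is equal to or greater than the desired key) can be sketched in Python with the standard bisect module. The sketch assumes that keys compare as plain strings in character order, and the function name segment_for is made up.

```python
from bisect import bisect_left

# The partial index from the question: (key stored in the index, segment number).
PARTIAL_INDEX = [("13C08", 1), ("23G19", 2), ("26X28", 3), ("36Z05", 4)]


def segment_for(key, partial_index=PARTIAL_INDEX):
    """Return the segment whose index entry is the first one >= key, or None."""
    keys = [entry_key for entry_key, _ in partial_index]
    position = bisect_left(keys, key)  # first position whose key is >= key
    if position == len(keys):
        return None  # key is larger than every entry in the index
    return partial_index[position][1]
```

Under that assumption, 16N67 falls after 13C08 and before 23G19, so segment 2 is the one to retrieve and search sequentially.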

8.4 Hashing Sequential files - processed in serial order. Indexed files - direct access (random access). Overhead: maintaining an index table. Hashed files - reduce the overhead by computing the location of a record in mass storage by applying an algorithm to the value of the key field in question.

Hashed Files A particular hashing technique: 1. Divide the mass storage area allotted to the file into several sections called buckets. 2. Convert the key field value into a numeric value. 3. Divide that numeric value by the number of buckets. 4. Use the remainder of the division as the integer that identifies the bucket.
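The steps above can be sketched in Python as follows. How a key becomes a number is an assumption here: the sketch reads the key's ASCII bytes as one large binary integer, which is one plausible choice among many.

```python
def bucket_for(key, bucket_count):
    """Hash a key field value to a bucket number in the range 0..bucket_count-1."""
    # Step 2 (assumed method): read the key's ASCII bytes as one big integer.
    numeric = int.from_bytes(key.encode("ascii"), "big")
    # Steps 3 and 4: divide by the number of buckets and keep the remainder.
    return numeric % bucket_count
```

A call such as bucket_for("25X3Z", 40) always yields the same bucket for the same key, which is what lets the record be found again later without an index.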

Q & A Using instructions of the form DR0S and ER0S as described at the end of Section 7.8, write a complete machine language routine to perform a pop operation on a stack implemented as shown in Figure 7.12. Assume that the stack pointer is in register F and that the top of the stack is to be popped into register 5.

Answer: D50F 21FF 5FF1

Class Review

Figure 8(a): Taxonomy of file structures

Question: Sometimes when editing a file with an editor or word processor, the addition or deletion of a single character can cause the size of the file to change by several kilobytes. Why?

Answer: Space for the file in mass storage (disk) is allocated in sectors or collections of sectors called clusters. Thus the size of a file changes by the size of these units rather than by single characters.

Figure 8(d): Logical view of an indexed file

Figure 8.8: Opening an indexed file

Figure 8.10: A file with a partial index. Find the first entry in the index that is equal to or greater than the desired key, then search the corresponding segment sequentially for the target record.

8.4 Hashing Sequential files - processed in serial order. Indexed files - direct access (random access). Overhead: maintaining an index table. Hashed files - reduce the overhead by computing the location of a record in mass storage by applying an algorithm to the value of the key field in question.

Figure 8.11: The rudiments of a hashing system, in which each bucket holds those records that hash to that bucket number

Figure 8.11 (continued)

Figure 8.12: Hashing the key field value 25X3Z to one of 40 buckets

Hashed Files Collision: more than one record hashes to the same bucket. Assume records are inserted into 41 buckets at random: the probability of placing the 1st record in an empty bucket is 41/41, the 2nd is 40/41, the 3rd is 39/41, and so on. The probability of placing 8 records into 8 different, previously empty buckets is (41/41)(40/41)(39/41)...(34/41) = 0.482. Less than 50%!
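The product above is easy to check with a few lines of Python. The function name is made up, and the sketch assumes every record is equally likely to land in any bucket.

```python
from fractions import Fraction


def probability_no_collision(records, buckets):
    """Probability that `records` randomly placed records all land in different buckets."""
    p = Fraction(1)
    for placed in range(records):
        # `placed` buckets are already occupied, so buckets - placed remain empty.
        p *= Fraction(buckets - placed, buckets)
    return p


p8 = float(probability_no_collision(8, 41))  # about 0.482, matching the slide
```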

Hashed Files The high probability of collisions indicates that a hashed file should never be implemented under the assumption that clustering will never occur. How to handle the overflow problem? Reserve an additional area of mass storage to hold overflow records. Double hashing method.
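One of the options mentioned above, a separate overflow area, could look like this in Python. It is a simplified in-memory sketch: the class name, the fixed bucket capacity, and the way keys are turned into numbers are all assumptions, and a real hashed file would keep the buckets in mass storage.

```python
class HashedFile:
    """In-memory sketch of a hashed file with a shared overflow area."""

    def __init__(self, bucket_count, bucket_capacity):
        self.buckets = [[] for _ in range(bucket_count)]
        self.capacity = bucket_capacity  # assumed fixed number of records per bucket
        self.overflow = []               # holds records whose home bucket is full

    def _bucket_number(self, key):
        # Assumed hash: ASCII bytes of the key read as an integer, then the remainder.
        return int.from_bytes(key.encode("ascii"), "big") % len(self.buckets)

    def insert(self, key, record):
        bucket = self.buckets[self._bucket_number(key)]
        if len(bucket) < self.capacity:
            bucket.append((key, record))
        else:
            self.overflow.append((key, record))  # bucket full: use the overflow area

    def find(self, key):
        # Look in the home bucket first, then fall back to the overflow area.
        for stored_key, record in self.buckets[self._bucket_number(key)]:
            if stored_key == key:
                return record
        for stored_key, record in self.overflow:
            if stored_key == key:
                return record
        return None
```

The more records end up in the overflow area, the more a lookup degrades toward a sequential search, so the bucket count and capacity need to be chosen with the expected file size in mind.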

Figure 8.14: A large file partitioned into buckets to be accessed by hashing

Figure 8.13: Handling bucket overflow

Figure 8(e) Modulo division

Figure 8(f) Collision

Figure 8(g): Open addressing resolution

Figure 8(h): Linked list resolution

Figure 8(i): Bucket hashing resolution