Comp 335 File Structures Indexes
The Search for Information When searching for information, the information desired is usually associated with a key field. For example, given a file of 5000 student records, we might want to know all students who are majoring in a particular subject such as computer science; or all students who are seniors, have above a 3.0 GPA and are majoring in either English or History.
The Search for Information One way to find this information is to search the file record by record. For a large file this is very slow, because every record must be examined. To speed things up, we could sort the file on the field we are searching (such as major) and then use a different search technique, such as an ordered sequential search or a binary search. This approach still has major problems.
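To make the difference concrete, here is a minimal sketch that counts how many records each technique examines. The 5000 student records, the field names, and the list of majors are made up for illustration; they are not from the course material.

```python
def sequential_search(records, major):
    """Examine every record in turn; return the matches and the number examined."""
    matches = []
    examined = 0
    for rec in records:
        examined += 1
        if rec["major"] == major:
            matches.append(rec)
    return matches, examined


def binary_search(sorted_records, major):
    """Find one record with the given major in a list sorted by major.

    Returns (position or -1, number of probes).
    """
    lo, hi = 0, len(sorted_records) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        key = sorted_records[mid]["major"]
        if key == major:
            return mid, probes
        if key < major:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, probes


# Hypothetical file of 5000 student records (illustrative data only).
MAJORS = ["Biology", "Computer Science", "English", "History", "Math"]
students = [{"id": i, "major": MAJORS[i % 5]} for i in range(5000)]

matches, examined = sequential_search(students, "History")

# Binary search only works once the records are ordered by the search field.
by_major = sorted(students, key=lambda r: r["major"])
pos, probes = binary_search(by_major, "History")
```

The sequential search examines all 5000 records, while the binary search needs at most 13 probes on a file of this size. The catch is the `sorted` call: the ordering has to be redone or repaired whenever the file changes.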
The Search for Information If records were added and deleted from the file constantly, we would have to reorder the file each time an add or delete occurred. INDEXES provide a way to overcome the problem of inefficient searches for information and the need for constantly reordering the information source.
Indexes The principle of an index file associated with a data file is the same as that of the index in the back of a book or the card catalog in a library. Here are the fundamental principles, illustrated with a card catalog:
The data file is the complete library of books, a massive amount of information.
Most people will not search shelf by shelf, book by book, to find what they want.
Instead they go to the card catalog, a smaller information source that keeps an entry for every book, ordered by some key field such as author or title.
When we find the specific book in the card catalog, the card gives the location of the book in the library.
We then seek to that location and retrieve the book.
Indexes In this situation, the card catalog is the index: a smaller information source that can, we hope, be loaded into memory and searched quickly because it is kept in order. When we find what we want, the card contains the “call number,” which is in effect the address where the book will be found in the library. In this scenario, it may take only two seeks to reach our information: one to the card catalog and one to the shelf.
Primary Usages of Indexes An index:
Allows you to impose order on a file without having to rearrange the file.
Gives you multiple access paths into a file.
Allows keyed (direct) access to variable-length record files.
Primary Index File Structure Fundamentals The information source (i.e., the data file) does not need to be ordered. Records can be entry-sequenced, which means they can be placed wherever there is available space. Because the file is not reorganized, the records remain in fixed locations; we call this record pinning. A primary index is constructed whose entries are typically fixed-length records. Each entry contains 1) a primary key and 2) a record reference.
Primary Index File Structure Fundamentals The primary key must be a field (or a combination of fields) in a record whose value is unique. The reference field contains the address of the record that has this primary key. This will be either an RRN (relative record number, for fixed-length records) or an actual byte address (for variable-length records). The entries are stored in a specific order to allow fast, efficient searches of the index.
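The structure described above can be sketched as follows. This is an illustration under stated assumptions, not the course's implementation: the data file is an in-memory `io.BytesIO`, records are newline-terminated with `|` between fields, and the student IDs and names are invented. The index is a sorted list of (primary key, byte offset) pairs.

```python
import io
from bisect import bisect_left


def build_data_file(records):
    """Write variable-length records in entry order; return (data file, sorted index)."""
    data = io.BytesIO()
    index = []
    for key, name, major in records:
        offset = data.tell()  # byte address where this record starts
        data.write(f"{key}|{name}|{major}\n".encode("utf-8"))
        index.append((key, offset))
    index.sort()  # only the index is ordered; the data file is left alone
    return data, index


def lookup(data, index, key):
    """Binary-search the index, then seek once into the data file."""
    # (key,) sorts just before (key, offset), so bisect_left lands on the entry.
    i = bisect_left(index, (key,))
    if i == len(index) or index[i][0] != key:
        return None
    data.seek(index[i][1])
    return data.readline().decode("utf-8").rstrip("\n").split("|")


# Hypothetical records, deliberately not in key order (entry-sequenced).
data, index = build_data_file([
    ("S1003", "Mia Chen", "Computer Science"),
    ("S1001", "Omar Diaz", "English"),
    ("S1002", "Ada Park", "History"),
])
record = lookup(data, index, "S1002")
```

Because the records here are variable length, the reference is a byte offset. With fixed-length records the same index could store an RRN and compute the offset as RRN times the record length.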
Primary Index File Structure Fundamentals Basic operations on an indexed, entry-sequenced file:
Adding a record: the record is placed in available space in the data file, and a new entry is created and inserted at the correct position in the index.
Deleting a record: the avail list is updated in the data file, the entry is removed from the index, and the remaining index entries are packed.
Updating a record when the change affects the primary key: with fixed-length records, reorder the index; with variable-length records, if the change increases the size of the record, delete and re-add the updated record, reorder the index, and change the address.
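The add and delete operations above might look like this in a small in-memory model. The class name `IndexedFile`, its method names, and the sample keys are hypothetical; the data file is modeled as a Python list whose positions stand in for RRNs of fixed-length records.

```python
from bisect import bisect_left, insort


class IndexedFile:
    """Toy model of an entry-sequenced data file with a primary index."""

    def __init__(self):
        self.slots = []   # data file: slots[rrn] is the record stored at that RRN
        self.avail = []   # avail list: RRNs of deleted slots that can be reused
        self.index = []   # primary index: sorted list of (primary_key, rrn)

    def _find(self, key):
        """Return the position of key in the index, or -1 if absent."""
        i = bisect_left(self.index, (key,))
        if i < len(self.index) and self.index[i][0] == key:
            return i
        return -1

    def add(self, key, record):
        if self._find(key) != -1:
            raise KeyError(f"duplicate primary key: {key}")
        if self.avail:                      # reuse space freed by a delete
            rrn = self.avail.pop()
            self.slots[rrn] = record
        else:                               # otherwise append at the end
            rrn = len(self.slots)
            self.slots.append(record)
        insort(self.index, (key, rrn))      # insert at the correct position
        return rrn

    def delete(self, key):
        i = self._find(key)
        if i == -1:
            return False
        _, rrn = self.index.pop(i)          # pop closes the gap (packs entries)
        self.slots[rrn] = None
        self.avail.append(rrn)
        return True

    def get(self, key):
        i = self._find(key)
        return None if i == -1 else self.slots[self.index[i][1]]


f = IndexedFile()
f.add("S3", "Lee")
f.add("S1", "Ng")
f.add("S2", "Cruz")
f.delete("S1")
reused_rrn = f.add("S0", "Abe")
```

Note that the data file is never reordered: only the index is kept sorted, and a new record simply takes the slot freed by the earlier delete.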
Secondary Indexes We often want to gather information using other fields, whose values may not be unique. Generating a list of all computer science majors, for example, calls for finding every record that has “computer science” in the major field. Such fields are called secondary key fields.
Secondary Indexes These indexes have entries that consist of 1) a secondary key and 2) the primary key. The index is ordered by the secondary key. To find information, you search the secondary index for the secondary key(s) and obtain the matching primary keys. You then look up those primary keys in the primary index to get the actual addresses of the records where the information resides.
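The two-step lookup can be sketched as below. The sample records and the function names are assumptions made for illustration; both indexes are plain sorted lists, and the data file is a list addressed by RRN.

```python
from bisect import bisect_left

# Hypothetical entry-sequenced data file: position in the list is the RRN.
data_file = [
    ("S3", "Lee", "History"),
    ("S1", "Ng", "Computer Science"),
    ("S2", "Cruz", "History"),
]

# Primary index: (primary key, RRN), ordered by primary key.
primary_index = sorted((rec[0], rrn) for rrn, rec in enumerate(data_file))

# Secondary index on major: (secondary key, primary key), ordered by secondary key.
secondary_index = sorted((rec[2], rec[0]) for rec in data_file)


def primary_keys_for(major):
    """Step 1: collect every primary key listed under the secondary key."""
    i = bisect_left(secondary_index, (major,))
    keys = []
    while i < len(secondary_index) and secondary_index[i][0] == major:
        keys.append(secondary_index[i][1])
        i += 1
    return keys


def fetch(primary_key):
    """Step 2: resolve a primary key to an address and read the record."""
    i = bisect_left(primary_index, (primary_key,))
    if i == len(primary_index) or primary_index[i][0] != primary_key:
        return None
    return data_file[primary_index[i][1]]


def find_by_major(major):
    return [fetch(pk) for pk in primary_keys_for(major)]


history_majors = find_by_major("History")
```

Because the secondary index stores primary keys rather than addresses, moving a record in the data file means updating only the primary index. The price is the extra lookup in step 2.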
Secondary Indexes Issues with secondary indexes:
Redundant information
Maintaining the secondary indexes
Retrieving information by multiple keys
Tight binding or loose binding of secondary keys
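As one illustration of retrieval by multiple keys, the earlier example query (seniors with a GPA above 3.0 majoring in English or History) can be answered by combining the primary-key sets that two secondary indexes return. This is a simplified sketch: the records are invented, each secondary index is modeled as a dictionary from secondary key to a set of primary keys, and the GPA condition is checked on the fetched records rather than through an index.

```python
# Hypothetical student records keyed by primary key.
students = {
    "S1": {"major": "English", "year": "senior", "gpa": 3.4},
    "S2": {"major": "History", "year": "senior", "gpa": 2.8},
    "S3": {"major": "History", "year": "junior", "gpa": 3.9},
    "S4": {"major": "Computer Science", "year": "senior", "gpa": 3.7},
    "S5": {"major": "History", "year": "senior", "gpa": 3.2},
}


def build_secondary(field):
    """Map each value of the field to the set of primary keys that have it."""
    index = {}
    for pk, rec in students.items():
        index.setdefault(rec[field], set()).add(pk)
    return index


major_index = build_secondary("major")
year_index = build_secondary("year")


def seniors_in_english_or_history(min_gpa):
    # OR of two secondary keys is a set union; AND across indexes is an intersection.
    by_major = major_index.get("English", set()) | major_index.get("History", set())
    candidates = by_major & year_index.get("senior", set())
    # Only the surviving candidates are fetched to test the GPA condition.
    return sorted(pk for pk in candidates if students[pk]["gpa"] > min_gpa)


result = seniors_in_english_or_history(3.0)
```

The set operations work on primary keys alone, so the data file is touched only for the few candidates that remain after the indexes have been combined.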
Review of Concepts
Primary and secondary keys
Tight vs. loose binding of keys
Inverted lists
Entry-sequenced file
Record pinning
Canonical form
Simple index