Introduction to Computer Systems


1 Introduction to Computer Systems
Lecturer: Steve Maybank
Department of Computer Science and Information Systems
Spring 2017
Week 7b: File management, indexing and hashing
21 February 2017

2 Files and Records
File: unit for the storage of information.
Logical record: natural division of the information in a file, i.e. natural given the meaning of the data in the file.
21 February 2017 Brookshear, Section 9.5

3 Example of a Logical Record
In many files each record contains a unique key.
Format of the records: text (a sequence of characters).
Example of a record:
Name field             Id number (key)
Kimberly Ann Dawson    385172

4 Sequential Files
Records are accessed in a fixed order.
On a sequential storage medium such as a CD, the records can be stored in the same order as the access order.
Reading the file is efficient if the storage order is also the access order.

5 Examples of Sequential Files
Audio files
Video files
Text files, e.g. programs or documents (in fact most files produced on a PC).

6 Segmentation of a Text File Into Records
The file is divided into fixed-length blocks of 25 characters, one record per block.
[Figure: a file drawn as a row of character cells, with the record "Kimberly Ann Dawson 385172" filling one block.]
The records are read in turn.
Sentinel: special record marking end of file (EOF).
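The segmentation described above can be sketched in Python. This is only an illustration: the 19-character name field followed by a 6-digit id is an assumption chosen so that the slide's example record fills exactly 25 characters, and the second record ("John Smith", "123456") is hypothetical.

```python
RECORD_LENGTH = 25  # block size stated on the slide


def make_record(name, id_number, record_length=RECORD_LENGTH):
    """Pad (or truncate) the name so that name + id fills exactly one record.

    The layout (name field first, id last) is an assumption for illustration.
    """
    name_width = record_length - len(id_number)
    return name.ljust(name_width)[:name_width] + id_number


def split_into_records(text, record_length=RECORD_LENGTH):
    """Cut text into consecutive fixed-length records, in storage order."""
    return [text[i:i + record_length] for i in range(0, len(text), record_length)]


# Build a small "file" of two records and read the records back in turn.
file_text = make_record("Kimberly Ann Dawson", "385172") + make_record("John Smith", "123456")
records = split_into_records(file_text)
for record in records:
    print(repr(record))
```

Because every record has the same length, record number i starts at character i * 25, so no separators are needed between records.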

7 Merging of Two Sequential Files
[Figure: two input files, containing records B, D and A, C, E, F, are merged into one output file. Stage 1: the output file contains A. Stage 2: the output file contains A, B.]

8 Procedure for Merging Two Sequential Files
def MergeFiles(inputFileA, inputFileB, OutputFile) :
    if (both input files at EOF) :
        Stop, with OutputFile empty
    if (inputFileA not at EOF) :
        Declare its first record to be its current record
    if (inputFileB not at EOF) :
        Declare its first record to be its current record
    while (neither input file at EOF) :
        Put the current record with the smaller key in OutputFile
        if (that current record is the last record in its corresponding input file) :
            Declare that input file to be at EOF
        else :
            Declare the next record in that input file to be the file’s current record
    Starting with the current record in the input file that is not at EOF, copy the remaining records to OutputFile.
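A runnable counterpart to the merge procedure, as a minimal sketch. It assumes that Python lists stand in for the sequential files, that each record is just its key, and that both inputs are already sorted by key; the name merge_files is illustrative and does not come from the slides.

```python
def merge_files(input_a, input_b):
    """Merge two lists of records, each already sorted by key, into one sorted list."""
    output = []
    i, j = 0, 0  # positions of the current record in each input
    # While neither input is at EOF, move the smaller current record to the output.
    while i < len(input_a) and j < len(input_b):
        if input_a[i] <= input_b[j]:
            output.append(input_a[i])
            i += 1
        else:
            output.append(input_b[j])
            j += 1
    # At most one input still has records; copy the remainder.
    output.extend(input_a[i:])
    output.extend(input_b[j:])
    return output


# The two input files from the merging example: (B, D) and (A, C, E, F).
merged = merge_files(["B", "D"], ["A", "C", "E", "F"])
print(merged)
```

Each record is examined once, so the work grows linearly with the total number of records in the two inputs.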

9 Random Accessing of Records
It is inefficient to access randomly chosen records from a sequential file.
For efficient random access use an indexed file or a hash file.

10 Indexed Files
Each record is identified by a unique key.
The file has an index which contains the keys, and for each key the address of the corresponding record.
Problem: the index has to be maintained (kept up to date as records are added or deleted).

11 Example of an Indexed File
Index:
key        address
Brown      P1
Johnson    P2
Jones      P3
Smith      P4
Watson     P5
Indexed file: each address points to the stored record, e.g. P1 holds the record for Brown.
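The index in this example can be modelled with a Python dictionary. The sketch below is illustrative only: the storage dictionary, the placeholder record strings, and the extra key "Taylor" at address "P6" are assumptions, not part of the slides.

```python
# Index from the slide: key -> address.
index = {"Brown": "P1", "Johnson": "P2", "Jones": "P3", "Smith": "P4", "Watson": "P5"}

# Hypothetical storage: address -> record contents (placeholder strings).
storage = {address: f"Record for {key}" for key, address in index.items()}


def lookup(key):
    """Find a record by key: consult the index, then go directly to the address."""
    return storage[index[key]]


def add_record(key, address, record):
    """Store a new record and update the index (the maintenance cost of an index)."""
    storage[address] = record
    index[key] = address


add_record("Taylor", "P6", "Record for Taylor")
print(lookup("Brown"))
print(lookup("Taylor"))
```

Every insertion touches both the storage and the index, which is the maintenance burden that a hash file avoids.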

12 Hash Files
Each record is identified by a unique key.
The file has a hash function which takes a key as input and computes the address of the corresponding record.
More efficient than a sequential file if records are accessed randomly.

13 Construction of a Hash File
New record r4, key k4.
Use k4 to compute an address in memory and store r4 at that address.
[Figure: memory already holding records r1, r2 and r3.]

14 Example of a Hash File
The hash function key |-> key%11+36 replaces the index.
Key     Memory address
1203    40
467     41
89      37
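The table can be checked directly in Python. The function name hash_address is illustrative; the formula is the one given on the slide.

```python
def hash_address(key):
    """Hash function from the slide: key |-> key % 11 + 36."""
    return key % 11 + 36


# Recompute the memory address for each key in the example.
for key in (1203, 467, 89):
    print(key, hash_address(key))
```

Since key % 11 lies between 0 and 10, this function can only produce the 11 addresses 36 to 46.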

15 Terminology for Hash Files
Bucket: section of the storage area corresponding to a hash function value.
Load factor: ratio of number of records in file to maximum number of records that can be stored. Load factors <=50% are recommended.
Collision: the hash function takes the same value on two different keys.
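These three terms can be illustrated with a short Python sketch. It assumes 11 buckets that each hold one record and a hash function key % 11; the key 100 is a hypothetical addition chosen to produce a collision.

```python
from collections import defaultdict


def load_factor(num_records, capacity):
    """Ratio of records stored to the maximum number that can be stored."""
    return num_records / capacity


def find_collisions(keys, hash_function):
    """Group keys by hash value and return only the buckets hit by more than one key."""
    buckets = defaultdict(list)
    for key in keys:
        buckets[hash_function(key)].append(key)
    return {bucket: ks for bucket, ks in buckets.items() if len(ks) > 1}


NUM_BUCKETS = 11           # assumed: one record per bucket, so capacity is 11
keys = [1203, 467, 89, 100]  # 100 is a hypothetical extra key

collisions = find_collisions(keys, lambda key: key % NUM_BUCKETS)
print(collisions)
print(load_factor(len(keys), NUM_BUCKETS))
```

Here keys 89 and 100 both have remainder 1, so they collide even though the load factor is well under 50%.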

16 Nature of a Hash Function
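[Figure: a disk representing the large set of possible keys, divided into strips; the hash function is constant on each strip. Dots mark the records in the current file (4 records, 1 collision).]
21 February 2017 Birkbeck College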

17 General Result
For “reasonable” numbers of buckets and records:
The probability of at least one collision is high.
The probability that any bucket will be the site of many collisions is low.

18 Examples of Hash Functions
Mid square: compute (key * key) and set bucket number = middle digits.
Extraction: select digits from certain positions within the key.
Division: divide key by number of buckets and use the remainder.
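One possible Python reading of the three methods is sketched below. The slides do not say how many middle digits to take or which positions to extract, so the defaults (two middle digits, positions 0 and 2) are assumptions made purely for illustration.

```python
def mid_square(key, num_digits=2):
    """Square the key and take num_digits digits from the middle of the result."""
    square = str(key * key)
    start = max(0, (len(square) - num_digits) // 2)
    return int(square[start:start + num_digits])


def extraction(key, positions=(0, 2)):
    """Build the bucket number from the digits at the given positions (0 = leftmost).

    Raises IndexError if the key has too few digits for the chosen positions.
    """
    digits = str(key)
    return int("".join(digits[p] for p in positions))


def division(key, num_buckets):
    """Divide the key by the number of buckets and use the remainder."""
    return key % num_buckets


print(mid_square(1203))    # 1203 * 1203 = 1447209, middle two digits are "47"
print(extraction(385172))  # digits at positions 0 and 2 are "3" and "5"
print(division(1203, 11))
```

Of the three, only division guarantees a result in the range 0 to num_buckets - 1; the other two need the digit count chosen to match the number of buckets.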

19 Hash Function Requirements
Easily and quickly computed.
Values evenly spread over the bucket numbers with few collisions.

20 Pros and Cons of Hash Functions
Pro: no need to maintain an index.
Cons:
Requires a method of dealing with collisions.
Inefficient use of storage (load factor <=50% recommended).

21 Place Two Records in Three Buckets
All nine configurations have the same probability.
The sum of the different probabilities is 1.
[Figure: the nine ways of placing records R1 and R2 in three buckets. In three configurations R1 and R2 share a bucket; in the other six they occupy different buckets.]

22 Example: Probability of a Collision
Number of buckets: 3
Number of records: 2
Number of ways of storing 2 records in 3 buckets = 3x3 = 9
Number of ways of storing 2 records in 3 buckets without collisions = 6
Probability of no collision = 6/(3x3) = 2/3
Probability of a collision = 1 - 2/3 = 1/3
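The count can be reproduced by brute-force enumeration. This sketch assumes, as in the example, that every placement of records into buckets is equally likely; the function name is illustrative.

```python
from fractions import Fraction
from itertools import product


def no_collision_probability(num_buckets, num_records):
    """Enumerate every placement of records into buckets and count those with no shared bucket."""
    placements = list(product(range(num_buckets), repeat=num_records))
    without_collision = [p for p in placements if len(set(p)) == num_records]
    return Fraction(len(without_collision), len(placements))


p_none = no_collision_probability(3, 2)
print(p_none)      # probability of no collision
print(1 - p_none)  # probability of a collision
```

Enumeration is only practical for small cases, since the number of placements is num_buckets raised to the power num_records.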

23 Birthday Problem
A room contains n people. What is the least value of n such that the probability that two people have the same birthday is greater than ½?
A hash file has n records. There are 365 possible values for the hash function. What is the least value of n such that the probability that two records have the same hash value is greater than ½?
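Both questions can be answered numerically. The sketch below assumes that the 365 values are equally likely and that records (or people) are independent of one another; the function name is illustrative.

```python
def least_n_for_collision(num_values, threshold=0.5):
    """Smallest n for which P(at least two of n items share a value) exceeds threshold."""
    p_no_collision = 1.0  # probability that the first n items all have distinct values
    n = 0
    while 1.0 - p_no_collision <= threshold:
        # The next item must avoid the n values already taken.
        p_no_collision *= (num_values - n) / num_values
        n += 1
    return n


print(least_n_for_collision(365))
```

Under these assumptions the answer is 23, far fewer than 365, which is why collisions must be expected even at low load factors.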

24 2003 Examination, Qu. 15
Suppose that a hashed file is constructed using the ‘divide’ hash function with 5 storage buckets. For each of the following key field values, 10, 23 and 41, identify the bucket in which the record with that key value is placed.
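One way to check an answer to this question is the short sketch below. It assumes that the 'divide' hash function is the remainder on division by the number of buckets and that the buckets are numbered 0 to 4; neither detail is stated in the question.

```python
def divide_hash(key, num_buckets=5):
    """Bucket number = remainder when the key is divided by the number of buckets."""
    return key % num_buckets


# Assumed numbering: buckets 0, 1, 2, 3, 4.
for key in (10, 23, 41):
    print(key, divide_hash(key))
```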

