Kruse/Ryba ch09 Object Oriented Data Structures Tables and Information Retrieval: Rectangular Tables, Tables of Various Shapes, Radix Sort, Hashing


1 Kruse/Ryba ch09 Object Oriented Data Structures Tables and Information Retrieval: Rectangular Tables, Tables of Various Shapes, Radix Sort, Hashing

2 Kruse/Ryba ch09 What is an INDEX? An index lets you impose order on a file without actually rearranging the file. An index gives keyed access to fixed- or variable-length record files.

3 Kruse/Ryba ch09 Simple Index A simple index uses a simple array to implement the index. IBM calls this organization ISAM (Indexed Sequential Access Method).

4 Kruse/Ryba ch09 [Figure: a simple index (Indexfile) and the file it indexes (Datafile)]

Indexfile
Key         Reference
ANG3795     167
COL31809    353
COL38358    211
DG139201    396
DG18807     256
FF245       442
LON2312     32
MER75016    300
RCA2626     77
WAR23699    132

Datafile
Address     Actual data record
32          LON|2312|Romeo and Juliet|...
77          RCA|2626|Quartet in C Sharp...
132         WAR|23699|Touchstone|...
167         ANG|3795|Symphony No. 9|...
211         COL|38358|Nebraska|...
256         DG|18807|Symphony No. 9|...
300         MER|75016|Coq d'or Suite|...
353         COL|31809|Symphony No. 9|...
396         DG|139201|Violin Concerto|...
442         FF|245|Good News|...

5 Kruse/Ryba ch09 Concerns There are two files to deal with. The index file is easier to deal with than the data file because it has fixed-length records. Fixed-length fields impose limits on the size of keys. In the example, the index carries no information other than the keys and the reference fields. Other data (for example, the record length) could be included.

6 Kruse/Ryba ch09 Basic Operations Create the original empty index and data files. Load the index file into memory before using it. Rewrite the index file from memory after using it. Add records to the data file and index. Delete records from the data file. Update records in the data file.

7 Kruse/Ryba ch09 Creating the Files Create both the index and data files as empty files. Write headers to both files.

8 Kruse/Ryba ch09 Loading the Index into Memory Assume that the index file is small enough to fit into RAM. Each array element is an index record.
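A minimal sketch of the in-memory index described above, assuming each array element holds a key and a byte offset into the data file and that the array is kept sorted by key. The names IndexEntry, find_offset and example_index are illustrative and do not come from the slides; std::optional requires C++17.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// One index record: a key plus the reference (byte offset) of the data record.
struct IndexEntry {
    std::string key;     // e.g. "LON2312" (label followed by ID number)
    long        offset;  // where the record starts in the data file
};

// Binary search of an index that is sorted by key.
// Returns the byte offset, or std::nullopt if the key is not in the index.
std::optional<long> find_offset(const std::vector<IndexEntry>& index,
                                const std::string& key) {
    assert(std::is_sorted(index.begin(), index.end(),
        [](const IndexEntry& a, const IndexEntry& b) { return a.key < b.key; }));
    auto it = std::lower_bound(index.begin(), index.end(), key,
        [](const IndexEntry& e, const std::string& k) { return e.key < k; });
    if (it != index.end() && it->key == key) return it->offset;
    return std::nullopt;
}

// The ten entries of the example index on slide 4, already in key order.
std::vector<IndexEntry> example_index() {
    return {{"ANG3795", 167},  {"COL31809", 353}, {"COL38358", 211},
            {"DG139201", 396}, {"DG18807", 256},  {"FF245", 442},
            {"LON2312", 32},   {"MER75016", 300}, {"RCA2626", 77},
            {"WAR23699", 132}};
}
```

Calling find_offset(example_index(), "DG18807") yields 256, the address at which the program would then seek in the data file.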

9 Kruse/Ryba ch09 Safety Mechanisms Know when the index is out of date. Be able to reconstruct the index from the data file.

10 Kruse/Ryba ch09 Record Addition Adding a new record to the data file requires that we also add a record to the index file.

11 Kruse/Ryba ch09 [Figure: the index and data files after adding the record LON|783|Sweet Somthings|...] The new record is appended to the data file at address 486. Its index entry, LON783 486, is inserted after LON2312, and the entries that follow it shift down.

Indexfile
Key         Reference
ANG3795     167
COL31809    353
COL38358    211
DG139201    396
DG18807     256
FF245       442
LON2312     32
LON783      486
MER75016    300
RCA2626     77
WAR23699    132

Datafile
Address     Actual data record
32          LON|2312|Romeo and Juliet|...
77          RCA|2626|Quartet in C Sharp...
132         WAR|23699|Touchstone|...
167         ANG|3795|Symphony No. 9|...
211         COL|38358|Nebraska|...
256         DG|18807|Symphony No. 9|...
300         MER|75016|Coq d'or Suite|...
353         COL|31809|Symphony No. 9|...
396         DG|139201|Violin Concerto|...
442         FF|245|Good News|...
486         LON|783|Sweet Somthings|...
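Record addition can be sketched as follows, assuming the index is a sorted in-memory array and the data file is entry-sequenced (new records go at the end). DataFile is a hypothetical in-memory stand-in for the real file, and add_record is an illustrative name, not one taken from the slides.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct IndexEntry {
    std::string key;
    long        offset;
};

// Hypothetical stand-in for the data file: append() returns the offset
// at which the record was written.
struct DataFile {
    std::string bytes;
    long append(const std::string& record) {
        long offset = static_cast<long>(bytes.size());
        bytes += record;
        return offset;
    }
};

// Append the record to the data file, then insert (key, offset) into the
// index at the position that keeps the index sorted by key.
// Returns false, changing nothing, if the key is already in the index.
bool add_record(DataFile& data, std::vector<IndexEntry>& index,
                const std::string& key, const std::string& record) {
    assert(!key.empty());
    auto pos = std::lower_bound(index.begin(), index.end(), key,
        [](const IndexEntry& e, const std::string& k) { return e.key < k; });
    if (pos != index.end() && pos->key == key) return false;  // duplicate key
    long offset = data.append(record);
    index.insert(pos, IndexEntry{key, offset});  // later entries shift down
    return true;
}
```

The call to index.insert is where the array is rearranged; that cost grows with the number of entries after the insertion point.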

12 Kruse/Ryba ch09 Record Deletion Any of the methods discussed in chapter 5 could be used. However, the index file must now be considered. The index entry could be removed and the array adjusted or the index entry could just be marked as deleted.
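The two index-side options named on this slide might look like the sketch below. It assumes a sorted in-memory index; the deleted flag and the function names are illustrative additions, not part of the slides.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct IndexEntry {
    std::string key;
    long        offset;
    bool        deleted = false;  // used only by the "mark as deleted" option
};

// Find a key in an index sorted by key; returns index.end() if it is absent.
std::vector<IndexEntry>::iterator locate(std::vector<IndexEntry>& index,
                                         const std::string& key) {
    assert(std::is_sorted(index.begin(), index.end(),
        [](const IndexEntry& a, const IndexEntry& b) { return a.key < b.key; }));
    auto it = std::lower_bound(index.begin(), index.end(), key,
        [](const IndexEntry& e, const std::string& k) { return e.key < k; });
    return (it != index.end() && it->key == key) ? it : index.end();
}

// Option 1: remove the entry and close the gap in the array.
bool remove_entry(std::vector<IndexEntry>& index, const std::string& key) {
    auto it = locate(index, key);
    if (it == index.end()) return false;
    index.erase(it);
    return true;
}

// Option 2: leave the entry where it is and mark it as deleted.
bool mark_deleted(std::vector<IndexEntry>& index, const std::string& key) {
    auto it = locate(index, key);
    if (it == index.end() || it->deleted) return false;
    it->deleted = true;
    return true;
}
```

Removing the entry keeps the array compact at the cost of shifting elements; marking it avoids the shift but leaves a slot that lookups must skip.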

13 Kruse/Ryba ch09 Record Updating
  The update changes the key field –conceptually, this is best thought of as a deletion followed by an addition.
  The update does not change the key field –this will not cause any changes in the index file, but it could well cause changes in the data file if the size of the record changes.

14 Kruse/Ryba ch09 Indexes too large to fit in RAM Essentially, the later text material deals with this problem. Hashed Organization Tree-structures

15 Kruse/Ryba ch09 Access by Multiple Keys Secondary key index organized by composer

Secondary key         Primary key
BEETHOVEN             ANG3795
BEETHOVEN             DG139201
BEETHOVEN             DG18807
BEETHOVEN             RCA2626
COREA                 WAR23699
DVORAK                COL31809
PROKOFIEV             LON2312
RIMSKY-KORSAKOV       MER75016
SPRINGSTEEN           COL38358
SWEET HONEY IN THE    FF245

16 Kruse/Ryba ch09 Record Addition Additional indices imply additional overhead when new records are added.

17 Kruse/Ryba ch09 Record Deletion Deleting a record usually implies removing all references to that record in the file system. Alternatively, only the primary index is updated: since the primary index does reflect the deletion, a request that arrives through a secondary index fails at the primary index, which shows that the record has been deleted. This method leaves wasted space in the secondary index.

18 Kruse/Ryba ch09 Record Updating
  If the update changes the secondary key –it may be necessary to rearrange the secondary key index so it stays in sorted order.
  If the update changes the primary key –this has a major impact on the secondary indices.
  If the update is confined to other fields –updates that do not affect either the primary or secondary key fields do not affect the secondary key index.

19 Kruse/Ryba ch09 Access by Multiple Keys Secondary key index organized by recording title

Secondary key        Primary key
COQ D'OR SUITE       MER75016
GOOD NEWS            FF245
NEBRASKA             COL38358
QUARTET IN C SHAR    RCA2626
ROMEO AND JULIET     LON2312
SYMPHONY NO. 9       ANG3795
SYMPHONY NO. 9       COL31809
SYMPHONY NO. 9       DG18807
TOUCHSTONE           WAR23699
VIOLIN CONCERTO      DG139201

20 Kruse/Ryba ch09 Access by Multiple Keys Find all data records with composer = BEETHOVEN and title = SYMPHONY NO. 9. Title index:

Secondary key        Primary key
COQ D'OR SUITE       MER75016
GOOD NEWS            FF245
NEBRASKA             COL38358
QUARTET IN C SHAR    RCA2626
ROMEO AND JULIET     LON2312
SYMPHONY NO. 9       ANG3795
SYMPHONY NO. 9       COL31809
SYMPHONY NO. 9       DG18807
TOUCHSTONE           WAR23699
VIOLIN CONCERTO      DG139201


22 Kruse/Ryba ch09 Access by Multiple Keys Find all data records with composer = BEETHOVEN and title = SYMPHONY NO. 9. Composer index:

Secondary key         Primary key
BEETHOVEN             ANG3795
BEETHOVEN             DG139201
BEETHOVEN             DG18807
BEETHOVEN             RCA2626
COREA                 WAR23699
DVORAK                COL31809
PROKOFIEV             LON2312
RIMSKY-KORSAKOV       MER75016
SPRINGSTEEN           COL38358
SWEET HONEY IN THE    FF245


24 Kruse/Ryba ch09 LOGICAL AND [Figure: the Indexfile and Datafile of slide 4] The composer index yields ANG3795, DG139201, DG18807 and RCA2626 for BEETHOVEN; the title index yields ANG3795, COL31809 and DG18807 for SYMPHONY NO. 9. The logical AND of the two lists is ANG3795 and DG18807, which the primary index maps to the data records at addresses 167 and 256.
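One way to compute the logical AND is to intersect the two lists of primary keys returned by the secondary indexes. The sketch below assumes both lists are sorted in ascending order, as they are in the composer and title indexes; match_both is an illustrative name.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Return the primary keys that appear in both sorted lists.
std::vector<std::string> match_both(const std::vector<std::string>& a,
                                    const std::vector<std::string>& b) {
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));
    std::vector<std::string> both;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(both));
    return both;
}
```

With the BEETHOVEN list (ANG3795, DG139201, DG18807, RCA2626) and the SYMPHONY NO. 9 list (ANG3795, COL31809, DG18807) as inputs, the result holds ANG3795 and DG18807; each is then looked up in the primary index to reach the data record.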

25 Kruse/Ryba ch09 Problems We have to rearrange the index file every time a new record is added to the file, even if the new record's secondary key is already in the index.

26 Kruse/Ryba ch09 A Better Solution: Linking the List of References Inverted lists work their way backward from a secondary key to the primary key to the record itself.

27 Kruse/Ryba ch09 [Figure: each secondary key with its list of primary key references]

BEETHOVEN    ANG3795, DG139201, DG18807, RCA2626
COREA        WAR23699
DVORAK       COL31809
PROKOFIEV    LON2312

28 Kruse/Ryba ch09 [Figure: each secondary key with its list of primary key references] This might create a large number of small files, one for each composer.

BEETHOVEN    ANG3795, DG139201, DG18807, RCA2626
COREA        WAR23699
DVORAK       COL31809
PROKOFIEV    LON2312

29 Kruse/Ryba ch09 Improved Version Redefine the secondary key index so it consists of records with two fields - a secondary key field, and a field containing the relative record number of the first corresponding primary key reference in the inverted list. The actual primary key references associated with each secondary key would be stored in a separate entry-sequenced file.

30 Kruse/Ryba ch09 [Figure: Secondary Index File and Label ID List File]

Secondary Index File
Record    Secondary key      First reference
0         BEETHOVEN          3
1         COREA              2
2         DVORAK             7
3         PROKOFIEV          10
4         RIMSKY-KORSAKOV    6
5         SPRINGSTEEN        4
6         SWEET HONEY IN     9

Label ID List File
Record    Primary key    Next reference
0         LON2312
1         RCA2626
2         WAR23699
3         ANG3795        8
4         COL38358
5         DG18807
6         MER75016
7         COL31809
8         DG139201       5
9         FF245
10        ANG3193        0
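Following a chain of references through the list file could be sketched like this. The end-of-list marker -1 is an assumption (the slide does not show one), and ListRecord and references are illustrative names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One record of the entry-sequenced list file.
struct ListRecord {
    std::string primary_key;
    int         next;  // relative record number of the next reference,
                       // or -1 at the end of a chain (assumed marker)
};

// Collect every primary key on the chain that starts at record `head`.
// `head` is the value stored with the secondary key in the secondary index.
std::vector<std::string> references(const std::vector<ListRecord>& list_file,
                                    int head) {
    std::vector<std::string> keys;
    for (int rrn = head; rrn != -1; rrn = list_file[rrn].next) {
        assert(rrn >= 0 && rrn < static_cast<int>(list_file.size()));
        keys.push_back(list_file[rrn].primary_key);
    }
    return keys;
}
```

Adding a reference for an existing secondary key then means appending one record to the list file and adjusting one link, rather than rearranging the secondary index.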

31 Kruse/Ryba ch09 Hash Functions
  Truncation –ignore part of the key, use the rest as the index
  Folding –partition the key into parts and combine them
  Modular Arithmetic
  Perfect Hash Function
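The first three methods can be illustrated on integer keys. This is a sketch under assumptions the slide does not state: keys are decimal numbers, truncation keeps the last three digits, and folding adds three-digit groups. The function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>

// Truncation: ignore all but the last three decimal digits of the key.
unsigned long truncation(unsigned long key) {
    return key % 1000UL;
}

// Folding: partition the decimal digits into groups of three, starting
// from the right, and add the groups together.
unsigned long folding(unsigned long key) {
    unsigned long sum = 0;
    while (key > 0) {
        sum += key % 1000UL;
        key /= 1000UL;
    }
    return sum;
}

// Modular arithmetic: reduce an integer to a valid table position.
std::size_t modular(unsigned long key, std::size_t table_size) {
    assert(table_size > 0);
    return static_cast<std::size_t>(key % table_size);
}
```

For the key 62538194, truncation gives 194 and folding gives 62 + 538 + 194 = 794. A folded value can exceed the table size, so it is typically passed through modular as a final step; modular(794, 97) is 18.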

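32 Kruse/Ryba ch09 C++ Example

    // Combine the first 8 characters of the key and reduce the result
    // to a position in the range 0 .. hash_size - 1.
    int hash(const Key &target)
    {
        int value = 0;
        for (int position = 0; position < 8; position++)
            value = 4 * value + target.key_letter(position);
        return value % hash_size;
    }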

33 Kruse/Ryba ch09 [Figure: hash table with positions numbered 0 through 35]

34 Kruse/Ryba ch09 Collision Resolution
  Linear Probing –Clustering
  Rehashing
  Increment Functions
  Quadratic Probing –h + i^2
  Key-Dependent Increments –Increment = (int)the_data.key_letter(0);
  Random Probing
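To make the difference between linear and quadratic probing concrete, the sketch below lists the positions each method examines, starting from hash position h. The function names and the count parameter are illustrative.

```cpp
#include <cassert>
#include <vector>

// Linear probing: h, h + 1, h + 2, ... (mod table_size).
std::vector<int> linear_probes(int h, int table_size, int count) {
    assert(table_size > 0 && h >= 0 && h < table_size);
    std::vector<int> seq;
    for (int i = 0; i < count; ++i) seq.push_back((h + i) % table_size);
    return seq;
}

// Quadratic probing: h, h + 1, h + 4, h + 9, ... (mod table_size).
// Adding the odd numbers 1, 3, 5, ... in turn produces the squares.
std::vector<int> quadratic_probes(int h, int table_size, int count) {
    assert(table_size > 0 && h >= 0 && h < table_size);
    std::vector<int> seq;
    int probe = h;
    int increment = 1;
    for (int i = 0; i < count; ++i) {
        seq.push_back(probe);
        probe = (probe + increment) % table_size;
        increment += 2;
    }
    return seq;
}
```

In a table of size 11 with h = 9, linear probing examines 9, 10, 0, 1 while quadratic probing examines 9, 10, 2, 7. Spreading the probes out is what reduces the clustering noted under linear probing.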

35 Kruse/Ryba ch09

    Error_code Hash_table::insert(const Record &new_entry)
    {
        Error_code result = success;
        int probe_count,   // Counter to be sure that table is not full.
            increment,     // Increment used for quadratic probing.
            probe;         // Position currently probed.
        Key null;          // Null key for comparison purposes.
        null.make_blank();
        probe = hash(new_entry);
        probe_count = 0;
        increment = 1;
        while (table[probe] != null                     // Is the location empty?
               && table[probe] != new_entry             // Duplicate key?
               && probe_count < (hash_size + 1) / 2) {  // Has overflow occurred?
            probe_count++;
            probe = (probe + increment) % hash_size;
            increment += 2;   // Prepare increment for next iteration.
        }
        if (table[probe] == null) table[probe] = new_entry;
        else if (table[probe] == new_entry) result = duplicate_error;
        else result = overflow;   // The table is full.
        return result;
    }
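The insertion method on slide 35 depends on the Record, Key and Hash_table declarations from the text. The following is a self-contained variant for experimentation, under these assumptions: entries are std::string values, the empty string plays the role of the null key, and the hash function mirrors the one on slide 32 but uses every character of the key.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum Error_code { success, duplicate_error, overflow };

class Hash_table {
public:
    explicit Hash_table(std::size_t size) : table(size) { assert(size > 0); }

    // Insert with quadratic probing, following the logic of slide 35.
    Error_code insert(const std::string& new_entry) {
        assert(!new_entry.empty());  // the empty string marks an empty slot
        const int hash_size = static_cast<int>(table.size());
        int probe = hash(new_entry);
        int probe_count = 0;   // guards against searching forever
        int increment = 1;     // 1, 3, 5, ... gives offsets 1, 4, 9, ...
        while (!table[probe].empty()
               && table[probe] != new_entry
               && probe_count < (hash_size + 1) / 2) {
            probe_count++;
            probe = (probe + increment) % hash_size;
            increment += 2;
        }
        if (table[probe].empty()) {
            table[probe] = new_entry;
            return success;
        }
        if (table[probe] == new_entry) return duplicate_error;
        return overflow;
    }

private:
    int hash(const std::string& key) const {
        unsigned value = 0;
        for (unsigned char c : key) value = 4 * value + c;
        return static_cast<int>(value % table.size());
    }

    std::vector<std::string> table;
};
```

As in the original, overflow is reported after about half the table size in probes, so a table whose size is prime is the natural choice for this scheme.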

36 Kruse/Ryba ch09 Collision Resolution with Buckets [Figure: table positions 0, 1, 2, each holding a bucket]

37 Kruse/Ryba ch09 Collision Resolution by Chaining

38 Kruse/Ryba ch09 Collision Resolution by Chaining
  Advantages
    –Saving of space
    –Simple, efficient collision handling
    –Size of hash table does not need to exceed the number of records
    –Deletion becomes quick and easy
  Disadvantage
    –Links require space
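A chained table can be sketched with one linked list per table position. This is an illustration only: it stores std::string keys, uses std::list for the chains, and borrows the multiply-by-4 hash from slide 32; Chained_table and its member names are not from the slides.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

class Chained_table {
public:
    explicit Chained_table(std::size_t size) : chains(size) { assert(size > 0); }

    // Add a key to the front of its chain; returns false if already present.
    bool insert(const std::string& key) {
        std::list<std::string>& chain = chains[hash(key)];
        if (std::find(chain.begin(), chain.end(), key) != chain.end()) return false;
        chain.push_front(key);
        return true;
    }

    bool contains(const std::string& key) const {
        const std::list<std::string>& chain = chains[hash(key)];
        return std::find(chain.begin(), chain.end(), key) != chain.end();
    }

    // Deletion only unlinks one node from one chain.
    bool remove(const std::string& key) {
        std::list<std::string>& chain = chains[hash(key)];
        auto it = std::find(chain.begin(), chain.end(), key);
        if (it == chain.end()) return false;
        chain.erase(it);
        return true;
    }

private:
    std::size_t hash(const std::string& key) const {
        std::size_t value = 0;
        for (unsigned char c : key) value = 4 * value + c;
        return value % chains.size();
    }

    std::vector<std::list<std::string>> chains;
};
```

Because collisions simply lengthen a chain, the table keeps working when it holds more keys than it has positions, which is the point made in the third advantage.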

39 Kruse/Ryba ch09 Theoretical Comparison
Successful search, expected number of probes:

Load factor            0.10    0.50    0.80    0.90    0.99    2.00
Chaining               1.05    1.25    1.40    1.45    1.50    2.00
Open, random probes    1.05    1.40    2.0     2.6     4.6     -
Open, linear probes    1.06    1.50    3.0     5.5     50.5    -

40 Kruse/Ryba ch09 Theoretical Comparison
Unsuccessful search, expected number of probes:

Load factor            0.10    0.50    0.80    0.90    0.99    2.00
Chaining               0.10    0.50    0.80    0.90    0.99    2.00
Open, random probes    1.1     2.00    5.0     10.0    100     -
Open, linear probes    1.12    2.50    13      50      5000    -
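The theoretical figures for successful and unsuccessful searches are consistent with the standard approximations for hashing, written here as functions of the load factor lambda. The slides do not state these formulas, so treat the sketch as a cross-check of the tables rather than as the authors' derivation; the function names are illustrative.

```cpp
#include <cassert>
#include <cmath>

// Chaining: lambda may exceed 1.
double chaining_successful(double lambda)   { return 1.0 + lambda / 2.0; }
double chaining_unsuccessful(double lambda) { return lambda; }

// Open addressing with random probes: requires 0 < lambda < 1.
double random_successful(double lambda) {
    assert(lambda > 0.0 && lambda < 1.0);
    return std::log(1.0 / (1.0 - lambda)) / lambda;
}
double random_unsuccessful(double lambda) {
    assert(lambda >= 0.0 && lambda < 1.0);
    return 1.0 / (1.0 - lambda);
}

// Open addressing with linear probes: requires 0 <= lambda < 1.
double linear_successful(double lambda) {
    assert(lambda >= 0.0 && lambda < 1.0);
    return 0.5 * (1.0 + 1.0 / (1.0 - lambda));
}
double linear_unsuccessful(double lambda) {
    assert(lambda >= 0.0 && lambda < 1.0);
    const double d = 1.0 - lambda;
    return 0.5 * (1.0 + 1.0 / (d * d));
}
```

At a load factor of 0.90, for example, linear_successful gives 5.5 and random_unsuccessful gives 10, matching the table entries. The open-addressing formulas are undefined once lambda reaches 1, which is why those rows have no entry in the 2.00 column.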

41 Kruse/Ryba ch09 Empirical Comparison
Successful search, expected number of probes:

Load factor               0.10    0.50    0.80    0.90    0.99    2.00
Chaining                  1.04    1.2     1.4     1.4     1.59    2.00
Open, quadratic probes    1.04    1.50    2.1     2.7     5.2     -
Open, linear probes       1.05    1.60    3.4     6.2     21.3    -

42 Kruse/Ryba ch09 Empirical Comparison
Unsuccessful search, expected number of probes:

Load factor               0.10    0.50    0.80    0.90    0.99    2.00
Chaining                  0.10    0.50    0.80    0.90    0.99    2.00
Open, quadratic probes    1.13    2.20    5.2     11.9    126     -
Open, linear probes       1.13    2.70    15.4    59.8    430     -

43 Kruse/Ryba ch09 Highlights

44 Kruse/Ryba ch09 Chapter 9 - The End

