Compression of Inverted Indexes for Fast Query Evaluation Falk Scholer Hugh Williams John Yiannis Justin Zobel (RMIT University, Melbourne, Australia)


1 Compression of Inverted Indexes for Fast Query Evaluation Falk Scholer Hugh Williams John Yiannis Justin Zobel (RMIT University, Melbourne, Australia) URL: http://doi.acm.org/10.1145/564376.564416 Published 2002

2 To conserve storage space and improve query performance, an inverted index can be compressed. An uncompressed inverted index typically consumes over 30% of the space required to store the uncompressed collection. A compressed index can consume between 10% and 15% of the uncompressed collection.

3 Both bitwise and bytewise compression schemes were evaluated. Three bitwise algorithms were considered: Elias gamma coding, Elias delta coding, and Golomb-Rice coding.

4 Gamma coding is relatively inefficient for storing integers larger than 15. Delta coding is more suited to larger integers. Golomb-Rice coding offers generally more compact storage and faster retrieval of integers than the Elias codes.
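As an illustration of the three bitwise codes named above, here is a minimal Python sketch that produces each code as a string of '0' and '1' characters. The function names, the use of strings rather than packed bits, and the choice of a power-of-two divisor (the Rice special case of Golomb coding) are assumptions made for readability, not details taken from the paper.

```python
def elias_gamma(n: int) -> str:
    """Elias gamma code of n >= 1: (bit length - 1) zeros, then n in binary."""
    if n < 1:
        raise ValueError("gamma codes are defined for n >= 1")
    binary = bin(n)[2:]                    # e.g. 9 -> '1001'
    return "0" * (len(binary) - 1) + binary


def elias_delta(n: int) -> str:
    """Elias delta code of n >= 1: gamma code of the bit length,
    then n in binary without its leading 1 bit."""
    if n < 1:
        raise ValueError("delta codes are defined for n >= 1")
    binary = bin(n)[2:]
    return elias_gamma(len(binary)) + binary[1:]


def rice(n: int, k: int) -> str:
    """Golomb-Rice code of n >= 1 with divisor 2**k:
    quotient in unary (ones closed by a zero), remainder in k binary digits."""
    if n < 1 or k < 0:
        raise ValueError("need n >= 1 and k >= 0")
    q, r = divmod(n - 1, 1 << k)
    remainder = format(r, "b").zfill(k) if k > 0 else ""
    return "1" * q + "0" + remainder


if __name__ == "__main__":
    for n in (1, 9, 16, 32):
        print(n, elias_gamma(n), elias_delta(n), rice(n, 2))
```

Comparing code lengths for a few values shows where each Elias code is the shorter one; the Rice parameter k would in practice be chosen per inverted list.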

5 These three coding schemes can be combined: Golomb codes for document numbers, gamma codes for frequencies, and delta codes for offsets.

6 Bytewise coding schemes compress each integer to an integral number of bytes. The bytewise enhancements that were tested included aligning integers on byte boundaries and using a signature block to indicate the number of bytes comprising an integer. The use of signature blocks was shown to reduce performance.

7 It was concluded experimentally that a variable-byte bytewise compression scheme resulted in better overall performance than more compact bitwise schemes. Query evaluation was twice as fast.
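A variable-byte code can be sketched in a few lines of Python. The layout assumed here (seven data bits per byte, least significant group first, top bit set only on the final byte of each integer) is one common convention and may differ from the exact layout used in the paper; the function names are illustrative.

```python
def vbyte_encode(n: int) -> bytes:
    """Encode a non-negative integer in as few whole bytes as possible."""
    if n < 0:
        raise ValueError("only non-negative integers can be encoded")
    out = bytearray()
    while n >= 128:
        out.append(n & 0x7F)   # low 7 bits, top bit clear: more bytes follow
        n >>= 7
    out.append(n | 0x80)       # last byte of this integer, top bit set
    return bytes(out)


def vbyte_decode(data: bytes) -> list:
    """Decode a concatenation of variable-byte codes back into integers."""
    values = []
    n = 0
    shift = 0
    for byte in data:
        n |= (byte & 0x7F) << shift
        if byte & 0x80:        # top bit set: this integer is complete
            values.append(n)
            n = 0
            shift = 0
        else:
            shift += 7
    return values


if __name__ == "__main__":
    numbers = [5, 127, 128, 300, 100000]
    encoded = b"".join(vbyte_encode(x) for x in numbers)
    print(len(encoded), vbyte_decode(encoded))
```

Decoding needs only a mask, a shift and one comparison per byte, with no bit-level unpacking, which is one plausible reason such a scheme can be decoded quickly even though it is less compact than the bitwise codes.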

8 Incremental Updates of Inverted Lists for Text Document Retrieval Anthony Tomasic, Stanford University Hector Garcia-Molina, Stanford University Kurt Shoens, IBM Almaden URL: http://doi.acm.org/10.1145/191839.191896 Published 1994

9 The Internet presents us with large, rapidly growing repositories of information. Efficient methods of indexing and of updating these indexes are necessary. The article presents properties of, and recommendations for, variations of a particular dynamic indexing scheme.

10 The algorithm presented is as follows. Two data structures are present: an inverted index in memory and an inverted index on disk. The in-memory lists are called short lists; they are stored in buckets, each of which is allocated a fixed amount of space.

11 The on-disk lists are called long lists; the amount of space each term's long list occupies is not fixed in advance.

12 Algorithm: An in-memory list L for word w must be moved to disk. First, if w already has a long list on disk, L is appended to the long list. Otherwise, we assume L is a short list and insert it into bucket h(w). If the bucket is not already in memory, it is read in, and L is inserted. If the bucket overflows, we then pick a longest short list in the bucket, remove it, and make it a long list, writing it to disk.
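The update step on slide 12 can be sketched in Python as follows. This is an in-memory toy under stated assumptions, not the authors' implementation: the class name DualIndex, measuring bucket capacity in postings, using zlib.crc32 as the hash h(w), and a plain dictionary standing in for the disk are all hypothetical choices made to keep the example runnable.

```python
import zlib


class DualIndex:
    """Toy model of short lists in fixed-size buckets plus long lists."""

    def __init__(self, num_buckets: int, bucket_capacity: int):
        self.num_buckets = num_buckets
        self.bucket_capacity = bucket_capacity      # assumed unit: postings
        # Each bucket maps word -> short list of postings.
        self.buckets = [dict() for _ in range(num_buckets)]
        # word -> long list; a dict stands in for disk storage here.
        self.long_lists = {}

    def h(self, word: str) -> int:
        """Assumed hash function: a stable CRC32 of the word."""
        return zlib.crc32(word.encode("utf-8")) % self.num_buckets

    def flush(self, word: str, postings: list) -> None:
        """Move the in-memory list `postings` for `word` into the index."""
        if word in self.long_lists:
            # The word already has a long list: append to it.
            self.long_lists[word].extend(postings)
            return
        # Otherwise treat it as a short list and insert it into bucket h(w).
        bucket = self.buckets[self.h(word)]
        bucket.setdefault(word, []).extend(postings)
        # On overflow, promote a longest short list to a long list.
        if sum(len(p) for p in bucket.values()) > self.bucket_capacity:
            longest = max(bucket, key=lambda w: len(bucket[w]))
            self.long_lists[longest] = bucket.pop(longest)


if __name__ == "__main__":
    index = DualIndex(num_buckets=1, bucket_capacity=4)
    index.flush("cat", [1, 2])
    index.flush("dog", [3])
    index.flush("cat", [5, 7])      # bucket overflows, "cat" becomes long
    index.flush("cat", [9])         # appended straight to the long list
    print(index.buckets, index.long_lists)
```

In this model a single promotion always restores the bucket to within capacity, because the promoted list is at least as long as the postings just added.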

13 When building indexes, there is a tradeoff between update performance and query performance.

14 Two index-building styles are described and tested: new and whole. The new style of building an index is best if query performance is not critical. As short lists fill up, they are written to available free blocks on disk. For common words, several long lists may exist. No effort is made to consolidate these lists on disk.

15 The whole style appends long lists for the same word together. Every time a list is written to disk, the entire index is copied to a different location if necessary. This style is better for applications where query performance is critical.

