Compressing Relations and Indexes. Jonathan Goldstein, Raghu Ramakrishnan, Uri Shaft. Department of Computer Sciences, University of Wisconsin-Madison. June 18, 1997.
Agenda: Introduction; Compressing a Relation; Compression Applied to Rectangle-Based Indexes; Performance Evaluation; Questions and Remarks.
Introduction: page-level compression; a performance study; application to B-trees and R-trees; a multidimensional bulk-loading algorithm.
Compressing a Relation: frames of reference; non-numeric attributes; file-level compression.
Frames of Reference
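Frame-of-reference compression can be sketched as below. This is a minimal illustration that assumes integer attributes and a page held in memory as a list of tuples: each attribute's minimum on the page serves as its reference, and every value is stored as an offset from it in ceil(log2(max - min + 1)) bits. The function names (`bits_needed`, `compress_page`, `decode_tuple`) are hypothetical, and packing all offsets into one Python integer stands in for a real on-disk page layout.

```python
from typing import List, Sequence, Tuple

# Hypothetical helpers: a sketch of per-page frame-of-reference encoding,
# not the paper's actual page format.

def bits_needed(span: int) -> int:
    """Bits needed to store any offset in [0, span], i.e. ceil(log2(span + 1))."""
    return span.bit_length()  # 0 bits when every value on the page is equal

def compress_page(tuples: Sequence[Sequence[int]]) -> Tuple[List[int], List[int], int]:
    """Encode one page of integer tuples.

    Returns the per-attribute minimums (the frames of reference), the
    per-attribute bit widths, and all offsets packed back to back into one int.
    """
    n_attrs = len(tuples[0])
    mins = [min(t[a] for t in tuples) for a in range(n_attrs)]
    widths = [bits_needed(max(t[a] for t in tuples) - mins[a]) for a in range(n_attrs)]
    packed, pos = 0, 0
    for t in tuples:
        for a in range(n_attrs):
            packed |= (t[a] - mins[a]) << pos  # offset from the minimum is never negative
            pos += widths[a]
    return mins, widths, packed

def decode_tuple(mins: List[int], widths: List[int], packed: int, i: int) -> List[int]:
    """Decode only tuple i: all tuples share one width, so its bit position is known."""
    pos = i * sum(widths)
    values = []
    for m, w in zip(mins, widths):
        values.append(m + ((packed >> pos) & ((1 << w) - 1)))
        pos += w
    return values

page = [(1000, 7), (1003, 9), (1001, 7), (1006, 8)]
mins, widths, packed = compress_page(page)
print(mins, widths)                           # [1000, 7] [3, 2]
print(sum(widths) * len(page))                # 20 bits of offsets for 4 tuples
print(decode_tuple(mins, widths, packed, 1))  # [1003, 9]
```

Because every tuple on a page has the same encoded width, a single tuple can be decoded without decompressing the rest of the page; the per-attribute minimums and widths are the only page-level header this sketch needs.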
Point approximation in lossy compression
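One way to approximate a point lossily can be sketched as follows, assuming integer coordinates and a known frame (a per-dimension range `[lo, hi]`): each coordinate is replaced by the k-bit index of a uniform grid cell, and decoding yields the cell, which is guaranteed to contain the original point. The uniform grid and the names `cell_width`, `quantize_point`, and `approximate_box` are assumptions for illustration; the scheme in the paper may differ.

```python
from typing import List, Sequence, Tuple

# Hypothetical sketch: uniform-grid quantization inside a frame (assumed scheme).

def cell_width(lo: int, hi: int, k: int) -> int:
    """Width of one grid cell when 2**k cells must cover the integer range [lo, hi]."""
    return -(-(hi - lo + 1) // (1 << k))  # ceiling division

def quantize_point(point: Sequence[int], frame_lo: Sequence[int],
                   frame_hi: Sequence[int], k: int) -> List[int]:
    """Lossy code: the k-bit index of the grid cell holding each coordinate."""
    return [(x - lo) // cell_width(lo, hi, k)
            for x, lo, hi in zip(point, frame_lo, frame_hi)]

def approximate_box(codes: Sequence[int], frame_lo: Sequence[int],
                    frame_hi: Sequence[int], k: int) -> List[Tuple[int, int]]:
    """Decode k-bit codes into the cell (one interval per dimension) containing the point."""
    box = []
    for c, lo, hi in zip(codes, frame_lo, frame_hi):
        w = cell_width(lo, hi, k)
        box.append((lo + c * w, min(hi, lo + (c + 1) * w - 1)))
    return box

codes = quantize_point((500, 12), (0, 0), (999, 99), k=4)
print(codes)                                           # [7, 1]
print(approximate_box(codes, (0, 0), (999, 99), k=4))  # [(441, 503), (7, 13)]
```

Since the decoded cell always encloses the original point, a search over such approximations can return false positives but never miss a match; candidates then need to be checked against exact values.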
Compressing an Indexing Structure: compressing a B-tree; compressing a rectangle-based indexing structure; compression-oriented bulk loading.
Rectangle-Based Indexing Qualities
Changing the frame of reference
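The effect of changing the frame of reference can be sketched by comparing how many bits an index entry needs when its rectangle is coded against the whole coordinate domain versus against the bounding rectangle of its node. This is one plausible reading of the slide, under assumed inputs: a 2^20 x 2^20 integer domain and two made-up child rectangles; the helper names `mbr`, `encode_relative`, and `entry_bits` are hypothetical.

```python
from typing import List, Tuple

Rect = List[Tuple[int, int]]  # one (low, high) interval per dimension

# Hypothetical helpers illustrating coding rectangles relative to an enclosing frame.

def mbr(rects: List[Rect]) -> Rect:
    """Minimum bounding rectangle of a node's entries."""
    return [(min(r[d][0] for r in rects), max(r[d][1] for r in rects))
            for d in range(len(rects[0]))]

def encode_relative(rect: Rect, frame: Rect) -> Rect:
    """Re-express a rectangle as offsets from the frame's low corner."""
    return [(lo - flo, hi - flo) for (lo, hi), (flo, _) in zip(rect, frame)]

def entry_bits(frame: Rect) -> int:
    """Bits per entry: a low and a high offset per dimension, each sized to the frame's span."""
    return sum(2 * (fhi - flo).bit_length() for flo, fhi in frame)

domain = [(0, 2**20 - 1), (0, 2**20 - 1)]  # assumed global coordinate space
children = [[(500000, 500100), (300000, 300050)],
            [(500400, 500900), (300200, 300400)]]
frame = mbr(children)
print(entry_bits(domain), entry_bits(frame))  # 80 38
print(encode_relative(children[1], frame))    # [(400, 900), (200, 400)]
```

The smaller the enclosing frame, the fewer bits each coordinate offset needs, which is why tightly clustered entries on a page compress well.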
Bulk-Loading Algorithm. Input: a set of points in some n-dimensional space. Output: a partition of the input into subsets. Requirement: the partition should place points that are close to each other in the same group as much as possible.
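The input/output contract above can be met, for illustration, by a generic recursive partitioner: halve the point set at the median of its widest dimension until each group fits in one page. This is a standard k-d-style baseline swapped in as an example, not the paper's algorithm; `partition` and `capacity` are hypothetical names, and `capacity` is assumed to be at least 1.

```python
from typing import List, Sequence

Point = Sequence[int]

def partition(points: List[Point], capacity: int) -> List[List[Point]]:
    """Generic baseline (not GB-Pack): recursively halve the point set at the median
    of its widest dimension until every group has at most `capacity` points."""
    if len(points) <= capacity:
        return [points]
    dims = len(points[0])
    spreads = [max(p[d] for p in points) - min(p[d] for p in points) for d in range(dims)]
    d = spreads.index(max(spreads))            # cut the dimension with the largest extent
    ordered = sorted(points, key=lambda p: p[d])
    mid = len(ordered) // 2                     # both halves are non-empty when capacity >= 1
    return partition(ordered[:mid], capacity) + partition(ordered[mid:], capacity)

grid = [(x, y) for x in range(4) for y in range(4)]
groups = partition(grid, capacity=4)
print(groups)  # four groups, each a 2x2 quadrant of the grid
```

Splitting the widest dimension keeps groups compact, which serves the "close points together" requirement; the price is that groups may be underfilled when sizes do not divide evenly.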
GB-Pack: Compression-Oriented Bulk Loading. Qualities: it trades off some tree quality for increased compression; the number of entries per page is data-dependent; a dimension is cut at a value boundary in the data.
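The last two qualities can be illustrated with a hypothetical greedy packer; it is not the published GB-Pack algorithm. Points are sorted along one dimension, points that share a value on that dimension are never split across pages (a cut at a value boundary), and a page is closed once adding the next run would push its frame-of-reference size past the page budget, so the number of entries per page depends on the data. The 38-bit per-attribute header (a 32-bit minimum plus a 6-bit width field) and all names are assumptions.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

Point = Tuple[int, ...]
HEADER_BITS_PER_ATTR = 32 + 6  # assumption: 32-bit minimum plus a 6-bit width field

def compressed_bits(points: Sequence[Point]) -> int:
    """Frame-of-reference size of a page: header plus fixed-width offsets."""
    dims = len(points[0])
    width = sum((max(p[d] for p in points) - min(p[d] for p in points)).bit_length()
                for d in range(dims))
    return dims * HEADER_BITS_PER_ATTR + width * len(points)

def pack_by_value_boundaries(points: Sequence[Point], dim: int,
                             page_bits: int) -> List[List[Point]]:
    """Hypothetical greedy packing along one dimension (not the published GB-Pack).

    Runs of points sharing a value on `dim` stay on one page; a page is closed
    when the next run would exceed `page_bits`. A single run larger than a page
    is kept whole, an overflow this sketch does not handle.
    """
    pages: List[List[Point]] = []
    current: List[Point] = []
    for _, run_iter in groupby(sorted(points, key=lambda p: p[dim]), key=lambda p: p[dim]):
        run = list(run_iter)
        if current and compressed_bits(current + run) > page_bits:
            pages.append(current)
            current = []
        current.extend(run)
    if current:
        pages.append(current)
    return pages

data = [(0,), (0,), (1,), (2,), (3,), (100,), (101,), (1000,)]
pages = pack_by_value_boundaries(data, dim=0, page_bits=60)
print([len(p) for p in pages])  # [5, 2, 1]
```

Packing along a single dimension can produce long, thin pages; that is one way in which optimizing for compression can cost some tree quality.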
Performance Evaluation: relational compression experiments; CPU vs. I/O costs; comparison with techniques in commercial systems; importance of tuple-level decompression; R-tree compression experiments.
Synthetic Data Sets. Size: the number of tuples in the relation. Dimensionality: the number of attributes in the relation. Range: the range of values for the attributes. Distribution: uniform (the worst case) or exponential. Partition strategy. Page size.
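A generator for synthetic relations with the parameters listed above (size, dimensionality, range, distribution) might look like the sketch below. The exponential mean of `value_range / 10`, the clipping into the range, and the function name `synthetic_relation` are assumptions, not the settings used in the experiments; partition strategy and page size belong to the loading step and are not modeled here.

```python
import random
from typing import List, Tuple

def synthetic_relation(size: int, dims: int, value_range: int,
                       distribution: str = "uniform",
                       seed: int = 0) -> List[Tuple[int, ...]]:
    """Hypothetical generator: `size` tuples of `dims` integer attributes in [0, value_range)."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable

    def draw() -> int:
        if distribution == "uniform":
            return rng.randrange(value_range)
        if distribution == "exponential":
            # assumption: mean value_range / 10, clipped into the attribute range
            return min(int(rng.expovariate(10.0 / value_range)), value_range - 1)
        raise ValueError(f"unknown distribution: {distribution}")

    return [tuple(draw() for _ in range(dims)) for _ in range(size)]

uniform = synthetic_relation(size=1000, dims=3, value_range=10_000)
skewed = synthetic_relation(size=1000, dims=3, value_range=10_000,
                            distribution="exponential")
```

Under per-page frame-of-reference encoding, pages drawn from the exponential relation will usually have smaller per-attribute spreads than uniform pages, which is consistent with uniform data being the harder case.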
Sales Data Set. Figure: compression achieved versus dimensionality.
CPU vs. I/O Costs
R-tree Compression Experiments: testing the quality of R-trees on the Sales data set.
Questions and Remarks