FastBit for Allele Data
Dave Matthews, USDA-ARS, Cornell University, Ithaca, NY, 10 April 2012
A Lightning-Fast Index Drives Massive Data Analysis
http://www.scidacreview.org/0904/html/fastbit.html
"FastBit significantly improves the speed of a searching operation on both high- and low-cardinality values with a number of techniques, including a vertical data organization, an innovative bitmap compression technique, and several new bitmap encoding methods... The ability to index high-cardinality data is unique to FastBit and is not supported by other bitmap indexing methods."
The FastBit Technologies
1. Vertical data organization ("vertical partitioning"): each partition holds only a few of the (possibly hundreds of) variables, so a query reads only the variables it needs.
2. Bitmap compression: Word-Aligned Hybrid (WAH) compression.
3. Two-level bitmap encoding.
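The idea behind vertical partitioning can be shown with a toy in-memory sketch: each variable is stored as its own list, so a selection on one variable scans only that list. The record fields (`sample_id`, `marker`, `allele`, `quality`) and the helper names are hypothetical, and FastBit's actual on-disk layout is not modeled here.

```python
# Hypothetical toy records: (sample_id, marker, allele, quality).
rows = [
    ("s1", "m1", "A", 30),
    ("s2", "m1", "G", 25),
    ("s3", "m2", "A", 40),
]

def to_columns(rows, names):
    """Vertical partitioning: store each variable as its own list (column)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}

columns = to_columns(rows, ["sample_id", "marker", "allele", "quality"])

def select_rows(columns, name, value):
    """Answer `name == value` by scanning a single column; other columns are never read."""
    return [i for i, v in enumerate(columns[name]) if v == value]

print(select_rows(columns, "allele", "A"))  # [0, 2]
```

With hundreds of variables, a query touching two or three of them reads only those few columns instead of every full record.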
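Word-Aligned Hybrid (WAH) Compression
Run-length encoding over groups of 31 bits: each group fits in a 32-bit machine word, leaving one bit to mark the word as a literal or a fill (run).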
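To make the run-length idea concrete, here is a simplified WAH encoder and decoder. It follows the commonly described 32-bit layout (a literal word stores 31 bits verbatim with the top bit 0; a fill word has the top bit 1, then the fill bit, then a 30-bit count of 31-bit groups), but it is only a sketch: it pads the final partial group with zeros instead of tracking it separately, and the function names are hypothetical, not FastBit's API.

```python
GROUP = 31                   # payload bits per 32-bit word
ALL_ONES = (1 << GROUP) - 1  # 0x7FFFFFFF: a group of 31 one-bits
MAX_RUN = (1 << 30) - 1      # largest count a fill word's 30-bit counter holds

def wah_encode(bits):
    """Simplified WAH: compress a list of 0/1 values into 32-bit words.

    Literal word: MSB = 0, low 31 bits hold one group verbatim.
    Fill word:    MSB = 1, next bit = fill value, low 30 bits = number of groups.
    The last partial group is padded with zeros (a simplification).
    """
    padded = list(bits) + [0] * (-len(bits) % GROUP)
    words = []
    for start in range(0, len(padded), GROUP):
        value = 0
        for b in padded[start:start + GROUP]:
            value = (value << 1) | b
        if value in (0, ALL_ONES):
            fill = 1 if value else 0
            last = words[-1] if words else 0
            # Extend the previous fill word if it has the same fill value and room left.
            if (last >> 31) and ((last >> 30) & 1) == fill and (last & MAX_RUN) < MAX_RUN:
                words[-1] = last + 1
            else:
                words.append((1 << 31) | (fill << 30) | 1)
        else:
            words.append(value)  # literal word: top bit stays 0
    return words

def wah_decode(words, nbits):
    """Expand WAH words back into a list of nbits 0/1 values."""
    bits = []
    for w in words:
        if w >> 31:  # fill word: repeat the fill bit over count * 31 positions
            fill = (w >> 30) & 1
            bits.extend([fill] * (GROUP * (w & MAX_RUN)))
        else:        # literal word: emit its 31 payload bits, most significant first
            bits.extend((w >> (GROUP - 1 - k)) & 1 for k in range(GROUP))
    return bits[:nbits]

bits = [0] * 62 + [1, 0, 1] + [0] * 28
words = wah_encode(bits)
print([hex(w) for w in words])  # ['0x80000002', '0x50000000']
```

Ninety-three bits shrink to two words: one fill word for the two all-zero groups and one literal word for the mixed group. Sparse allele bitmaps with long zero runs compress in the same way.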
Two-level Bitmap Encoding
Approximate solution, then refine: answer the query from the bins first, then check raw values only for rows in bins that partly match.
Bin the values into groups, e.g. A to G, H to P, Q to Z, and encode the bin identifiers as bitmaps.
Encodings: equality, range, interval.
– Interval encoding needs about half as many bitmaps as equality or range encoding.
Multicomponent encoding: "bin the bins" to reduce the number of bitmaps.
Multi-level encoding: a hierarchy of bins, from coarse to fine; use interval encoding for the coarse level and equality encoding for the fine level.
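The "approximate, then refine" step can be sketched with the slide's three bins (A to G, H to P, Q to Z). Each bin gets an equality-encoded bitmap (a Python int, bit i = row i); a range query takes whole bins lying inside the range as sure hits and checks raw values only for rows in bins that straddle a boundary. The helper names are hypothetical, and `bitmaps_needed` just applies the usual textbook counts for C distinct bin ids (C for equality, C - 1 for range, ceil(C/2) for interval).

```python
import math

# The slide's bins: A to G, H to P, Q to Z (bin ids 0, 1, 2).
BINS = [("A", "G"), ("H", "P"), ("Q", "Z")]

def bin_of(letter):
    """Return the id of the bin containing a single uppercase letter."""
    for i, (lo, hi) in enumerate(BINS):
        if lo <= letter <= hi:
            return i
    raise ValueError(f"no bin for {letter!r}")

def build_bin_bitmaps(values):
    """Equality encoding of bin ids: one bitmap per bin, bit i set if row i is in it."""
    maps = [0] * len(BINS)
    for row, v in enumerate(values):
        maps[bin_of(v)] |= 1 << row
    return maps

def range_query(values, maps, lo, hi):
    """Rows with lo <= value <= hi, answered from bins first, then refined."""
    hits, candidates = 0, 0
    for i, (blo, bhi) in enumerate(BINS):
        if lo <= blo and bhi <= hi:      # whole bin inside the range: sure hits
            hits |= maps[i]
        elif blo <= hi and lo <= bhi:    # bin straddles a boundary: candidates only
            candidates |= maps[i]
    for row, v in enumerate(values):     # refine: read raw values of candidates only
        if (candidates >> row) & 1 and lo <= v <= hi:
            hits |= 1 << row
    return [row for row in range(len(values)) if (hits >> row) & 1]

def bitmaps_needed(c):
    """Textbook bitmap counts for c distinct bin ids under each encoding."""
    return {"equality": c, "range": c - 1, "interval": math.ceil(c / 2)}

values = list("CHAZKQ")
maps = build_bin_bitmaps(values)
print(range_query(values, maps, "C", "P"))  # [0, 1, 4]
print(bitmaps_needed(26))  # {'equality': 26, 'range': 25, 'interval': 13}
```

For the query C to P, the H to P bin is accepted whole from its bitmap; only rows 0 and 2 (the A to G bin) need a raw-value check. The bitmap counts show the "half as many" claim for interval encoding: 13 bitmaps instead of 26 for 26 letters.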
Querying on more than one variable
FastBit performs extremely well on multi-variable queries because the search result for each variable is a bitmap, so intersecting the results is a simple bitwise AND over those bitmaps.
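The multi-variable case reduces to one bitwise AND. A minimal sketch, using uncompressed bitmaps held in Python integers and hypothetical columns `allele` and `depth`; WAH is designed so that such operations can also run directly on the compressed words, which this sketch does not attempt.

```python
def bitmap_for(column, predicate):
    """Build a bitmap (Python int, bit i = row i) of rows where predicate holds."""
    bm = 0
    for row, v in enumerate(column):
        if predicate(v):
            bm |= 1 << row
    return bm

def rows_of(bm):
    """List the row numbers whose bits are set."""
    return [row for row in range(bm.bit_length()) if (bm >> row) & 1]

# Hypothetical columns for four samples.
allele = ["A", "G", "A", "A"]
depth = [12, 30, 8, 25]

# Two-variable query: allele == "A" AND depth >= 10.
hits = bitmap_for(allele, lambda a: a == "A") & bitmap_for(depth, lambda d: d >= 10)
print(rows_of(hits))  # [0, 3]
```

Each per-variable bitmap is computed independently (0b1101 and 0b1011 here), and the intersection 0b1001 costs one AND per word, regardless of how many rows matched each condition.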