FastBit for Allele Data Dave Matthews USDA-ARS, Cornell University Ithaca, NY 10 April 2012.


1 FastBit for Allele Data Dave Matthews USDA-ARS, Cornell University Ithaca, NY 10 April 2012

2 A Lightning-Fast Index Drives Massive Data Analysis
http://www.scidacreview.org/0904/html/fastbit.html
FastBit significantly improves the speed of a searching operation on both high- and low-cardinality values with a number of techniques, including a vertical data organization, an innovative bitmap compression technique, and several new bitmap encoding methods... The ability to index high-cardinality data is unique to FastBit and is not supported by other bitmap indexing methods.

3 Allele Data Variables
Allele = f(Marker, Line, Experiment)

              Allele   Marker   Line   Experiment
Size:         10^9     10^4     10^4   10^1
Cardinality:  2        =        =      =

4 Bitmap Indexing

5 The FastBit Technologies
1. vertical data organization = 'vertical partitioning'. Only a few of the (hundreds of) variables in each partition.
2. bitmap compression: Word-Aligned Hybrid Compression
3. two-level bitmap encoding

6 Word-Aligned Hybrid Compression
run-length encoding
31-bit groups

7 Two-level Bitmap Encoding
Approximate solution, then refine.
Bin the values into groups, e.g. A to G, H to P, Q to Z.
Encode the bin identifiers as bitmap.
Encodings: equality, range, interval.
– Interval has half the number of bitmap indexes.
Multicomponent encoding: Bin the bins to reduce number of bitmap indexes.
Multi-level encoding: hierarchy of bins, coarse to fine. Use interval encoding for coarse, equality for fine.
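The "approximate, then refine" step can be sketched in a few lines. This is an illustration under stated assumptions rather than FastBit's implementation: the names are made-up toy data, bins are chosen by first letter using the A to G, H to P, Q to Z example from the slide, each bitmap is a plain Python integer with bit i standing for row i, and only equality and range encoding are shown (interval encoding is left out). All function names are hypothetical.

```python
BINS = [("A", "G"), ("H", "P"), ("Q", "Z")]   # bin boundaries from the slide


def bin_of(name):
    """Return the bin identifier for a name, based on its first letter."""
    first = name[0].upper()
    for i, (lo, hi) in enumerate(BINS):
        if lo <= first <= hi:
            return i
    raise ValueError(f"no bin for {name!r}")


def equality_bitmaps(names):
    """Equality encoding: bitmap b marks the rows that fall in bin b."""
    bitmaps = [0] * len(BINS)
    for row, name in enumerate(names):
        bitmaps[bin_of(name)] |= 1 << row
    return bitmaps


def range_bitmaps(eq):
    """Range encoding: bitmap b marks the rows whose bin id is <= b."""
    out, acc = [], 0
    for bitmap in eq:
        acc |= bitmap
        out.append(acc)
    return out


def query_less_than(names, eq, bound):
    """Rows with names[row] < bound: coarse answer from bins, then refine."""
    edge = bin_of(bound)
    hits = 0
    for b in range(edge):          # bins entirely below the bound: sure hits
        hits |= eq[b]
    candidates = eq[edge]          # the edge bin needs a check of raw values
    row = 0
    while candidates >> row:
        if (candidates >> row) & 1 and names[row] < bound:
            hits |= 1 << row
        row += 1
    return hits


# Toy data (assumed capitalized so string order agrees with the bins).
names = ["Apex", "Kanota", "Ogle", "Baler", "Zebra", "Hifi", "Rodeo"]
eq = equality_bitmaps(names)
rng = range_bitmaps(eq)
hits = query_less_than(names, eq, "Lima")
```

Only the rows in the single edge bin are compared against the raw values; every other bin is answered from the bitmaps alone, which is the point of binning first.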

8 Indexing Bin Identifiers

9 Querying on more than one variable
FastBit performs extremely well on multi-variable queries because the intersection between the search results on each variable is a simple AND operation over the resulting bitmaps.
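A minimal sketch of why the multi-variable case reduces to an AND. It builds a plain, uncompressed equality-encoded bitmap index per column (one Python integer per distinct value, bit i standing for row i) and intersects the per-variable results. The table contents and function names are hypothetical toy examples, not FastBit's API.

```python
from collections import defaultdict


def build_bitmap_index(column):
    """One bitmap per distinct value; bit i is set when column[i] == value."""
    index = defaultdict(int)
    for row, value in enumerate(column):
        index[value] |= 1 << row
    return dict(index)


def rows_of(bitmap):
    """List the row numbers whose bits are set in the bitmap."""
    rows, row = [], 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


# Hypothetical toy table: one allele call per (marker, line) row.
markers = ["m1", "m2", "m1", "m3", "m1", "m2"]
lines = ["A", "A", "B", "B", "A", "B"]
alleles = ["T", "C", "T", "T", "C", "C"]

marker_idx = build_bitmap_index(markers)
line_idx = build_bitmap_index(lines)
allele_idx = build_bitmap_index(alleles)

# Marker == "m1" AND Line == "A" AND Allele == "T": a single bitwise AND chain.
hits = marker_idx["m1"] & line_idx["A"] & allele_idx["T"]
matching_rows = rows_of(hits)
```

Each condition is answered independently from its own column's index, so adding another variable to the query costs one more AND rather than another pass over the data.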

10 Performance

11 Instructions http://crd-legacy.lbl.gov/~kewu/fastbit/doc/quickstart.html

