
1 Int. Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT2005), Zeuthen, Germany, May 2005
Bitmap Indices for Fast End-User Physics Analysis in ROOT
Kurt Stockinger (1), Kesheng Wu (1), Rene Brun (2), Philippe Canal (3)
(1) Berkeley Lab, Berkeley, USA
(2) CERN, Geneva, Switzerland
(3) Fermi Lab, Batavia, USA

2 Contents
- Introduction to Bitmap Indices
- Integration of Bitmap Indices into ROOT
  - Support for TTree::Draw and TChain::Draw
  - Example Usage
- Experimental Results
  - Index Size
  - Performance of Bitmap Index vs. TTreeFormula
- Conclusions
Stockinger, Wu, Brun, Canal, Bitmap Indices for Fast End-User Physics Analysis in ROOT, Zeuthen, Germany, May 2005

3 Bitmap Indices
- Bitmap indices are efficient data structures for accelerating multi-dimensional queries, e.g. pT > 195 AND nTracks [...] 12.4
- Supported by most commercial database management systems and data warehouses
- Optimized for read-only data
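As an illustration of why bitmaps suit multi-dimensional queries, the following minimal sketch answers each predicate with one bitmap and combines the two with a bitwise AND. It is not FastBit or ROOT code: the names `Bitmap`, `greaterThan`, `bitmapAnd` and `hits` are hypothetical, and the scan inside `greaterThan` only stands in for a real index lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One bit per record; bit i is set when record i satisfies a predicate.
using Bitmap = std::vector<std::uint64_t>;

// Bitmap of the records whose value is strictly greater than `threshold`.
// The scan is a stand-in for an index lookup (assumption of this sketch).
Bitmap greaterThan(const std::vector<double>& values, double threshold) {
    Bitmap bits((values.size() + 63) / 64, 0);
    for (std::size_t i = 0; i < values.size(); ++i)
        if (values[i] > threshold) bits[i / 64] |= (std::uint64_t{1} << (i % 64));
    return bits;
}

// Combine two single-attribute results into one multi-dimensional result.
Bitmap bitmapAnd(const Bitmap& a, const Bitmap& b) {
    assert(a.size() == b.size());
    Bitmap out(a.size());
    for (std::size_t w = 0; w < a.size(); ++w) out[w] = a[w] & b[w];
    return out;
}

// Expand a bitmap into the list of matching record numbers.
std::vector<std::size_t> hits(const Bitmap& bits) {
    std::vector<std::size_t> rows;
    for (std::size_t w = 0; w < bits.size(); ++w)
        for (unsigned b = 0; b < 64; ++b)
            if ((bits[w] >> b) & 1u) rows.push_back(w * 64 + b);
    return rows;
}
```

A query with more predicates just adds further `bitmapAnd` calls; each one costs one pass over the machine words, independent of the attribute values.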

4 Equality Encoding vs. Range Encoding
[Figure: a) list of attributes, b) equality encoding, c) range encoding with cardinality 10]
Range encoding is optimized for one-sided range queries, e.g. a0 <= 3.
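The difference between the two encodings can be sketched as follows, assuming integer attribute values in the range 0 to cardinality-1. The function names are hypothetical and the bitmaps are uncompressed; the point is only that a one-sided query such as a0 <= 3 needs an OR over several equality-encoded bitmaps but a single range-encoded bitmap.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Bitmap = std::vector<bool>;     // one bit per record
using Index  = std::vector<Bitmap>;   // one bitmap per distinct value

// Equality encoding: bitmap v has bit i set when values[i] == v.
Index buildEquality(const std::vector<int>& values, int cardinality) {
    Index idx(cardinality, Bitmap(values.size(), false));
    for (std::size_t i = 0; i < values.size(); ++i) {
        assert(values[i] >= 0 && values[i] < cardinality);
        idx[values[i]][i] = true;
    }
    return idx;
}

// Range encoding: bitmap v has bit i set when values[i] <= v.
// The last bitmap is all ones and could be omitted in practice.
Index buildRange(const std::vector<int>& values, int cardinality) {
    Index idx(cardinality, Bitmap(values.size(), false));
    for (std::size_t i = 0; i < values.size(); ++i) {
        assert(values[i] >= 0 && values[i] < cardinality);
        for (int v = values[i]; v < cardinality; ++v) idx[v][i] = true;
    }
    return idx;
}

// "value <= bound" on an equality-encoded index: OR together bound+1 bitmaps.
Bitmap lessEqualEquality(const Index& idx, int bound) {
    Bitmap out(idx.at(0).size(), false);
    for (int v = 0; v <= bound; ++v)
        for (std::size_t i = 0; i < out.size(); ++i)
            if (idx[v][i]) out[i] = true;
    return out;
}

// The same query on a range-encoded index: a single bitmap lookup.
Bitmap lessEqualRange(const Index& idx, int bound) {
    return idx.at(bound);
}
```

The price of the cheaper one-sided query is that an equality test on a range-encoded index needs two bitmaps (v and v-1) instead of one.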

5 Bitmap Indices with Binning
- Simple bitmap indices work well for low-cardinality attributes, i.e. attributes with a low number of distinct values (< 10,000)
- For high-cardinality attributes, the bitmap index is often too large to be practical, even with good compression algorithms
- Solution:
  - Keep one bitmap per attribute range rather than per distinct attribute value (binning)
  - Requires an additional step for evaluating the candidates in a bin ("Candidate Check"); see the example on the next slide

6 Range Query on Bitmap Index with Binning
[Figure label: bitmap 3 XOR bitmap 4]
The "Candidate check" is performed on bitmap 4 to identify the attribute values where x < 63.
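The query x < 63 can be sketched on a range-encoded, binned index as below. The bin layout (width 20, starting at 0) is an assumption chosen so that 63 falls into the fourth bin; the type and function names are hypothetical. Records below the edge bin are accepted from the index alone, and the XOR of two neighbouring bitmaps isolates the edge bin, whose records are the only ones compared against the raw values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Bitmap = std::vector<bool>;

struct BinnedRangeIndex {
    double lo, width;           // bin k covers [lo + k*width, lo + (k+1)*width)
    std::vector<Bitmap> upTo;   // upTo[k][i] is set when record i lies in bins 0..k
};

BinnedRangeIndex build(const std::vector<double>& x, double lo, double width, int bins) {
    assert(width > 0 && bins > 0);
    BinnedRangeIndex idx{lo, width, std::vector<Bitmap>(bins, Bitmap(x.size(), false))};
    for (std::size_t i = 0; i < x.size(); ++i) {
        const int k = static_cast<int>((x[i] - lo) / width);
        assert(k >= 0 && k < bins);
        for (int j = k; j < bins; ++j) idx.upTo[j][i] = true;
    }
    return idx;
}

struct QueryResult {
    std::vector<std::size_t> rows;   // records with x < bound
    std::size_t candidatesChecked;   // raw values read during the candidate check
};

// Evaluate "x < bound" using the index plus a candidate check on the edge bin.
QueryResult lessThan(const BinnedRangeIndex& idx, const std::vector<double>& x, double bound) {
    const int edge = static_cast<int>((bound - idx.lo) / idx.width);  // bin containing bound
    assert(edge >= 0 && edge < static_cast<int>(idx.upTo.size()));
    QueryResult r{{}, 0};
    for (std::size_t i = 0; i < x.size(); ++i) {
        const bool sure = edge > 0 && idx.upTo[edge - 1][i];   // entirely below the edge bin
        const bool candidate = idx.upTo[edge][i] != sure;      // XOR isolates the edge bin
        if (sure) {
            r.rows.push_back(i);
        } else if (candidate) {
            ++r.candidatesChecked;                             // one raw value is read
            if (x[i] < bound) r.rows.push_back(i);
        }
    }
    return r;
}
```

With zero-based numbering, `upTo[2]` and `upTo[3]` play the role of the slide's bitmaps 3 and 4, assuming the slide counts from one.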

7 Implementation Details
- FastBit:
  - Bitmap index software developed at Berkeley Lab
  - Includes a very efficient bitmap compression algorithm
- Bitmap indices are integrated to support:
  - TTree::Draw
  - TChain::Draw
- Each attribute to be indexed is stored as a separate branch
- The index is currently stored as a binary file

8 Example - Build Index

    // open ROOT file
    TFile f("data/root/data.root");
    TTree *tree = (TTree*) f.Get("tree");

    TBitmapIndex bitmapIndex;
    bitmapIndex.Init();

    // directory in which the index files are written
    char indexLocation[1024] = "/data/index/";
    bitmapIndex.ReadRootWriteIndexFile(tree, indexLocation);

    // build index for two attributes
    bitmapIndex.BuildIndex(tree, "a1", indexLocation);
    bitmapIndex.BuildIndex(tree, "a2", indexLocation);

9 Example - TTree::Draw with Index

    // open ROOT file
    TFile f("data/root/data.root");
    TTree *tree = (TTree*) f.Get("tree");

    TBitmapIndex bitmapIndex;
    bitmapIndex.Init();

    // draw a1 versus a2 for the records selected by the cut
    // (part of the cut expression is missing from this transcript)
    bitmapIndex.Draw(tree, "a1:a2", "a1 [...] 700");

10 Performance Measurements
- Compare the performance of TTreeFormula with TBitmapIndex::EvaluateQuery
- The time for drawing histograms is not included
- Run multi-dimensional queries (cuts with multiple predicates)

11 Experimental Setup
- Software/Hardware:
  - Bitmap index software is implemented in C++
  - Tests carried out on:
    - Linux CentOS
    - 2.8 GHz Intel Pentium IV with 1 GB RAM
    - Hardware RAID with SCSI disk
- Data:
  - 7.6 million records with ~100 attributes each
  - BaBar data set
- Bitmap Indices:
  - 10 out of ~100 attributes
  - 1000 equality-encoded bins
  - 100 range-encoded bins
  - Bitmap index compression algorithm: WAH (Word-Aligned Hybrid)
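The idea behind WAH compression can be sketched as follows. This is a simplified illustration, not FastBit's implementation: it uses 32-bit words, packs the bitmap into 31-bit groups, stores mixed groups as literal words, and merges runs of all-zero or all-one groups into a single fill word with a 30-bit run counter. The bit order inside a literal word and the zero padding of the last group are choices of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kGroupBits = 31;
constexpr std::uint32_t kAllOnes   = 0x7FFFFFFFu;  // 31 set bits
constexpr std::uint32_t kFillFlag  = 0x80000000u;  // top bit marks a fill word
constexpr std::uint32_t kFillValue = 0x40000000u;  // next bit: run of 1s or of 0s
constexpr std::uint32_t kMaxRun    = 0x3FFFFFFFu;  // 30-bit run counter (in groups)

std::vector<std::uint32_t> wahEncode(const std::vector<bool>& bits) {
    std::vector<std::uint32_t> out;
    for (std::size_t start = 0; start < bits.size(); start += kGroupBits) {
        // Pack the next 31 bits (zero-padded at the tail) into one group.
        std::uint32_t group = 0;
        for (std::size_t b = 0; b < kGroupBits && start + b < bits.size(); ++b)
            if (bits[start + b]) group |= (1u << b);

        if (group == 0 || group == kAllOnes) {
            const std::uint32_t fill = kFillFlag | (group ? kFillValue : 0u);
            // Extend the previous word if it is a fill of the same kind.
            if (!out.empty() && (out.back() & ~kMaxRun) == fill &&
                (out.back() & kMaxRun) < kMaxRun) {
                ++out.back();
            } else {
                out.push_back(fill | 1u);
            }
        } else {
            out.push_back(group);  // literal word, top bit is 0
        }
    }
    return out;
}

// Inverse of wahEncode; nbits is the length of the original bitmap.
std::vector<bool> wahDecode(const std::vector<std::uint32_t>& words, std::size_t nbits) {
    std::vector<bool> bits;
    bits.reserve(nbits);
    for (std::uint32_t w : words) {
        if (w & kFillFlag) {
            const bool value = (w & kFillValue) != 0;
            const std::size_t count = static_cast<std::size_t>(w & kMaxRun) * kGroupBits;
            bits.insert(bits.end(), count, value);
        } else {
            for (std::uint32_t b = 0; b < kGroupBits; ++b) bits.push_back((w >> b) & 1u);
        }
    }
    assert(bits.size() >= nbits);
    bits.resize(nbits);  // drop the zero padding of the last group
    return bits;
}
```

Because literal and fill words stay aligned to machine words, logical operations such as AND and OR can in principle run on the compressed form word by word; that part is not shown here.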

12 Size of Compressed Bitmap Indices
EE-BMI: equality-encoded bitmap index
RE-BMI: range-encoded bitmap index

13 Query Performance - TTreeFormula vs. Bitmap Indices
Performance improvement of bitmap indices over TTreeFormula: up to a factor of 10.

14 Query Performance - TTreeFormula vs. Bitmap Indices

15 Query Performance - TTreeFormula vs. Bitmap Indices
Performance improvement of bitmap indices over TTreeFormula: up to a factor of 10.

16 Approximate Answers
- For bitmap indices with binning, the exact answers are obtained in the Candidate Check phase
  - Certain records are read from disk to check whether they fulfill the query constraint
- Approximate answers are returned if the Candidate Check is omitted
- The error of the approximate answer depends on the number of bins:
  - Note: the query result includes more events
  - However, no correct events are dropped
- We used two different binning strategies:
  - Equality encoding with 1000 bins: error rate 0.1%
  - Range encoding with 100 bins: error rate 1%
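The two properties stated above (extra events are included, no correct event is dropped) can be checked with a small counting sketch. It assumes equal-width bins and a single predicate x < bound, and the names `Counts`, `countLessThan` and `overcountFraction` are hypothetical. Under these assumptions the overcount is at most the content of one bin, which for roughly uniform data is about 1/bins of the records; that reading of the quoted error rates (0.1% for 1000 bins, 1% for 100 bins) is an interpretation, not something the slides spell out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Counts {
    std::size_t exact;        // records with x < bound
    std::size_t approximate;  // records in every bin that overlaps x < bound
};

// Count the hits of "x < bound" with and without the candidate check.
Counts countLessThan(const std::vector<double>& x, double lo, double width,
                     int bins, double bound) {
    assert(width > 0 && bins > 0);
    const int edge = static_cast<int>((bound - lo) / width);  // bin containing bound
    assert(edge >= 0 && edge < bins);
    Counts c{0, 0};
    for (double v : x) {
        const int k = static_cast<int>((v - lo) / width);
        if (k <= edge) ++c.approximate;  // whole edge bin is accepted unchecked
        if (v < bound) ++c.exact;        // what the candidate check would keep
    }
    return c;
}

// Share of all records that the approximate answer includes in excess.
double overcountFraction(const Counts& c, std::size_t records) {
    return static_cast<double>(c.approximate - c.exact) / static_cast<double>(records);
}
```

For 1000 records with values 0 to 999, 100 bins of width 10 and the bound 635, the approximate answer contains 640 records against 635 exact hits, an overcount of 0.5% of the data.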

17 Query Performance - Approximate Answers (Error 0.1-1%)
Performance improvement of bitmap indices over TTreeFormula: up to a factor of 30.

18 Query Performance - Approximate Answers (Error 0.1-1%)

19 Query Performance - Approximate Answers (Error 0.1-1%)
Performance improvement of bitmap indices over TTreeFormula: up to a factor of 30.

20 Conclusions
- We integrated bitmap indices into ROOT to support:
  - TTree::Draw
  - TChain::Draw
- Bitmap indices significantly improve the performance of end-user analysis, by up to a factor of 10.
- With approximate answers (0.1-1% error), the performance improvement is up to a factor of 30.
- Bitmap indices are also used successfully in the STAR experiment at Brookhaven to access ROOT files with GridCollector.
- Future work:
  - Store bitmap indices as a ROOT tree.
  - Integrate with PROOF to support parallel index evaluation.

