Int. Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT2005), Zeuthen, Germany, May 2005: Bitmap Indices for Fast End-User Physics Analysis in ROOT


Int. Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT2005), Zeuthen, Germany, May 2005

Bitmap Indices for Fast End-User Physics Analysis in ROOT

Kurt Stockinger (1), Kesheng Wu (1), Rene Brun (2), Philippe Canal (3)
(1) Berkeley Lab, Berkeley, USA
(2) CERN, Geneva, Switzerland
(3) Fermi Lab, Batavia, USA

Slide 2: Contents
- Introduction to Bitmap Indices
- Integration of Bitmap Indices into ROOT
  - Support for TTree::Draw and TChain::Draw
  - Example Usage
- Experimental Results
  - Index Size
  - Performance of Bitmap Index vs. TTreeFormula
- Conclusions

Slide 3: Bitmap Indices
- Bitmap indices are efficient data structures for accelerating multi-dimensional queries, e.g. pT > 195 AND nTracks 12.4
- Supported by most commercial database management systems and data warehouses
- Optimized for read-only data
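As a rough illustration of why bitmaps suit multi-dimensional queries, the sketch below keeps one bitmap per distinct value of each attribute and answers a two-attribute cut with a single bitwise AND. The column names and values are made up for the example, and Python integers stand in for the compressed bitmaps a real index would use.

```python
from collections import defaultdict

def build_equality_index(values):
    """Map each distinct value to an integer bitmap; bit i is set if row i holds that value."""
    index = defaultdict(int)
    for row, value in enumerate(values):
        index[value] |= 1 << row
    return dict(index)

def rows_of(bitmap):
    """Return the row numbers whose bits are set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Hypothetical low-cardinality columns, one entry per event.
n_tracks = [2, 5, 3, 5, 2, 4]
charge = [1, -1, 1, 1, -1, 1]

tracks_index = build_equality_index(n_tracks)
charge_index = build_equality_index(charge)

# n_tracks == 5 AND charge == 1: one bitwise AND, no scan of the raw columns.
hits = tracks_index[5] & charge_index[1]
print(rows_of(hits))  # [3]
```

Each additional predicate in the cut costs one more bitwise operation on bitmaps that are already built, which is why the approach favours read-only data.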

Slide 4: Equality Encoding vs. Range Encoding
[Figure: a) list of attributes b) equality encoding c) range encoding with cardinality 10]
Range encoding is optimized for one-sided range queries, e.g. a0 <= 3
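The difference between the two encodings can be sketched in a few lines. This is a minimal illustration, not the FastBit implementation: the attribute values are invented, bitmaps are plain lists of 0/1, and the attribute is assumed to take the integer values 0 to 9 (cardinality 10).

```python
def equality_encode(values, cardinality):
    """bitmaps[v][i] is 1 exactly when values[i] == v."""
    return [[1 if x == v else 0 for x in values] for v in range(cardinality)]

def range_encode(values, cardinality):
    """bitmaps[v][i] is 1 exactly when values[i] <= v.
    The bitmap for the largest value would be all ones, so it is not stored."""
    return [[1 if x <= v else 0 for x in values] for v in range(cardinality - 1)]

a0 = [3, 0, 9, 4, 1, 7, 3, 5]  # hypothetical attribute with values 0..9
eq = equality_encode(a0, 10)
rg = range_encode(a0, 10)

# a0 <= 3 with equality encoding: OR together four bitmaps (values 0, 1, 2, 3).
eq_answer = [max(bits) for bits in zip(*eq[:4])]
# a0 <= 3 with range encoding: read a single bitmap.
rg_answer = rg[3]

print(eq_answer)  # [1, 1, 0, 0, 1, 0, 1, 0]
print(rg_answer)  # [1, 1, 0, 0, 1, 0, 1, 0]
```

Both encodings give the same answer; the range-encoded index simply reads one bitmap where the equality-encoded index has to combine several.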

Slide 5: Bitmap Indices with Binning
- Simple bitmap indices work well for low-cardinality attributes, i.e. the number of distinct values per attribute is low (< 10,000)
- For high-cardinality attributes, the size of the bitmap index is often too large to be of practical use (even with good compression algorithms)
- Solution:
  - Keep a bitmap for an attribute range rather than for each distinct attribute value (binning)
  - Requires an additional step for evaluating candidates in the bin ("Candidate Check"), see example on the next slide

Slide 6: Range Query on Bitmap Index with Binning
"Candidate check" is performed on bitmap 4 to identify attribute values where x < 63
[Figure: bitmap 3 XOR bitmap 4]
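A minimal sketch of a one-sided range query on a binned, range-encoded index follows. The bin edges, data values and function names are assumptions chosen for illustration; the point is that the XOR of two adjacent range-encoded bitmaps isolates the bin that straddles the threshold, and only the rows in that bin need their raw values checked.

```python
import bisect

def build_range_binned_index(values, edges):
    """Range-encoded bitmaps over bins: bitmap k has bit i set when values[i] < edges[k + 1]."""
    bitmaps = []
    for upper in edges[1:]:
        bits = 0
        for row, x in enumerate(values):
            if x < upper:
                bits |= 1 << row
        bitmaps.append(bits)
    return bitmaps

def query_less_than(values, edges, bitmaps, threshold):
    """Rows with values[i] < threshold; assumes edges[0] <= threshold < edges[-1]."""
    k = bisect.bisect_right(edges, threshold) - 1   # bin that contains the threshold
    sure = bitmaps[k - 1] if k > 0 else 0           # bins lying entirely below the threshold
    candidates = bitmaps[k] ^ sure                  # XOR isolates the edge bin
    hits, row, remaining = sure, 0, candidates
    while remaining:                                # candidate check: read raw values
        if remaining & 1 and values[row] < threshold:
            hits |= 1 << row
        remaining >>= 1
        row += 1
    return hits, candidates

def rows_of(bitmap):
    """Return the row numbers whose bits are set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

x = [12, 61, 35, 64, 79, 60, 95, 5]      # hypothetical attribute values
edges = [0, 20, 40, 60, 80, 100]         # hypothetical bin boundaries
index = build_range_binned_index(x, edges)

hits, candidates = query_less_than(x, edges, index, 63)
print(rows_of(candidates))  # [1, 3, 4, 5]
print(rows_of(hits))        # [0, 1, 2, 5, 7]
```

Only four of the eight rows are read from the raw data here; with narrower bins the candidate set shrinks further, at the cost of a larger index.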

Slide 7: Implementation Details
- FastBit:
  - Bitmap index software developed at Berkeley Lab
  - Includes a very efficient bitmap compression algorithm
- Integrated bitmap indices to support:
  - TTree::Draw
  - TChain::Draw
- Each attribute to be indexed is stored as a separate branch
- Index is currently stored as a binary file

Slide 8: Example - Build Index

    // open ROOT-file
    TFile f("data/root/data.root");
    TTree *tree = (TTree*) f.Get("tree");

    TBitmapIndex bitmapIndex;
    bitmapIndex.Init();

    // location of the index files
    char indexLocation[1024] = "/data/index/";
    bitmapIndex.ReadRootWriteIndexFile(tree, indexLocation);

    // build index for two attributes
    bitmapIndex.BuildIndex(tree, "a1", indexLocation);
    bitmapIndex.BuildIndex(tree, "a2", indexLocation);

Slide 9: Example - TTree::Draw with Index

    // open ROOT-file
    TFile f("data/root/data.root");
    TTree *tree = (TTree*) f.Get("tree");

    TBitmapIndex bitmapIndex;
    bitmapIndex.Init();

    // draw a1 vs. a2 for the events selected by the cut
    bitmapIndex.Draw(tree, "a1:a2", "a1 700");

Slide 10: Performance Measurements
- Compare performance of TTreeFormula with TBitmapIndex::EvaluateQuery
- Do not include time for drawing histograms
- Run multi-dimensional queries (cuts with multiple predicates)

Slide 11: Experimental Setup
- Software/Hardware:
  - Bitmap index software is implemented in C++
  - Tests carried out on:
    - Linux CentOS
    - 2.8 GHz Intel Pentium IV with 1 GB RAM
    - Hardware RAID with SCSI disk
- Data:
  - 7.6 million records with ~100 attributes each
  - BaBar data set
- Bitmap Indices:
  - 10 out of ~100 attributes
  - 1000 equality-encoded bins
  - 100 range-encoded bins
  - Bitmap index compression algorithm: WAH (Word-Aligned Hybrid)

Slide 12: Size of Compressed Bitmap Indices
EE-BMI: equality-encoded bitmap index
RE-BMI: range-encoded bitmap index

Slide 13: Query Performance - TTreeFormula vs. Bitmap Indices
Performance improvement of bitmap indices over TTreeFormula up to a factor of 10.

Slide 14: Query Performance - TTreeFormula vs. Bitmap Indices

Slide 15: Query Performance - TTreeFormula vs. Bitmap Indices
Performance improvement of bitmap indices over TTreeFormula up to a factor of 10.

Slide 16: Approximate Answers
- For bitmap indices with binning, the exact answers are obtained during the Candidate Check phase
  - Read certain records from disk to check whether they fulfill the query constraint
- Approximate answers are returned if the Candidate Check is omitted
- The error of the approximate answer depends on the number of bins:
  - Note: the query result includes more events
  - However, no correct events are dropped
- We used two different binning strategies:
  - Equality Encoding with 1000 bins: error rate 0.1%
  - Range Encoding with 100 bins: error rate 1%
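The relation between bin count and error can be illustrated with a small sketch. It assumes, as one plausible reading of the figures above, that the error is the fraction of all records that are reported without actually satisfying the cut, and that the attribute is uniformly distributed over equal-width bins. Under those assumptions a one-sided cut over-reports by at most one bin, i.e. 1/100 = 1% for 100 bins and 1/1000 = 0.1% for 1000 bins. The data and function name are hypothetical.

```python
def approximate_less_than(values, bin_width, threshold):
    """Rows a binned index reports for x < threshold when the candidate check is skipped:
    every row in a bin that starts at or below the threshold, including the whole edge bin."""
    edge_bin = threshold // bin_width
    return {row for row, x in enumerate(values) if x // bin_width <= edge_bin}

values = list(range(10_000))  # hypothetical uniformly distributed integer attribute
exact = {row for row, x in enumerate(values) if x < 6350}

errors = {}
for n_bins in (100, 1000):
    approx = approximate_less_than(values, 10_000 // n_bins, 6350)
    assert exact <= approx    # superset: no correct event is dropped
    errors[n_bins] = len(approx - exact) / len(values)
    print(n_bins, f"{errors[n_bins]:.2%}")
# prints:
# 100 0.50%
# 1000 0.10%
```

For real data the over-reported fraction depends on how many records fall into the edge bin, so the 1/bins figure is only a guide when bins are roughly equally populated.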

Slide 17: Query Performance - Approximate Answers (Error %)
Performance improvement of bitmap indices over TTreeFormula up to a factor of 30.

Slide 18: Query Performance - Approximate Answers (Error %)

Slide 19: Query Performance - Approximate Answers (Error %)
Performance improvement of bitmap indices over TTreeFormula up to a factor of 30.

Slide 20: Conclusions
- We integrated bitmap indices into ROOT to support:
  - TTree::Draw
  - TChain::Draw
- Bitmap indices improve the performance of end-user analysis by up to a factor of 10.
- With approximate answers of 0.1-1% error, the performance improvement is up to a factor of 30.
- Bitmap indices are also used successfully in the STAR experiment at Brookhaven to access ROOT-files with GridCollector.
- Future work:
  - Store bitmap indices as ROOT-tree.
  - Integrate with PROOF to support parallel index evaluation.