Analysis of Lucene Index on Hbase in an HPC Environment

1 Analysis of Lucene Index on Hbase in an HPC Environment
Prerna Shraff Anand Hegde

2 Concept
BigTable: compressed, high-performance database system built on GFS, Chubby Lock Service, SSTable, etc.
HBase: the Hadoop database; open source, distributed, versioned, column-oriented; modeled after BigTable.
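As an illustration of what "versioned" and "column-oriented" mean here, the data model can be pictured as a nested map from row key to column to timestamp to value. The following is a minimal in-memory sketch using plain Python dicts, not the HBase client API; the class name `VersionedTable`, its methods, and the sample row and column names are all hypothetical.

```python
from collections import defaultdict


class VersionedTable:
    """Toy in-memory model of a versioned, column-oriented table (not HBase itself)."""

    def __init__(self):
        # row key -> column name -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, timestamp):
        """Store one version of a cell."""
        self._rows[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before `timestamp` (newest overall if None)."""
        versions = self._rows.get(row, {}).get(column, {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        if not candidates:
            return None
        return versions[max(candidates)]


# Hypothetical sample data: two versions of the same cell.
table = VersionedTable()
table.put("book1", "text:title", "Draft title", timestamp=1)
table.put("book1", "text:title", "Final title", timestamp=2)
latest = table.get("book1", "text:title")
older = table.get("book1", "text:title", timestamp=1)
```

Reading without a timestamp returns the most recent version, while passing a timestamp retrieves the value as it stood at that point.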

3 Outline
Data-intensive computing requires storage solutions for huge amounts of data.
The requirement is to host very large tables on clusters of commodity hardware.
HBase helps fulfil this requirement by providing Bigtable-like capabilities on top of Hadoop.

4 The Idea
Existing work in this field includes an experiment using a Lucene index on HBase in an HPC environment (Xiaoming Gao, Vaibhav Nachankar, Judy Qiu).
Our aim is to expand the scope of that project and to evaluate its performance against additional parameters.

5 Architecture

6 Implemented solution
The solution uses an inverted index built with Apache Lucene.
A forward index maps a document to its terms: doc1 -> “cloud”.
An inverted index maps a term to the documents that contain it: “cloud” -> doc1.
Apache Lucene, which supports full-text search, was used to implement the inverted indices.
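The difference between the two mappings can be shown with a small sketch. This is plain Python with naive whitespace tokenization, not Lucene's analyzers or index format; the function name `build_inverted_index` and the sample documents are made up for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    inverted = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenization: lowercase and split on whitespace.
        for term in text.lower().split():
            inverted[term].add(doc_id)
    return {term: sorted(ids) for term, ids in inverted.items()}


# Forward view: document id -> text (hypothetical sample data).
docs = {
    "doc1": "cloud computing on clusters",
    "doc2": "storage for cloud data",
}

# Inverted view: term -> document ids.
index = build_inverted_index(docs)
```

A full-text query for a term then becomes a single lookup such as `index["cloud"]` rather than a scan over every document.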

7 Implemented Design The existing design has separate tables for book images, book texts and Lucene indices.
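One way to picture the three-table separation is sketched below. The slides do not give the actual schemas, so the table names, the `text` and `image` columns, and the index layout (row key = term, column = document id, value = term frequency) are assumptions made purely for illustration, modeled with plain dicts rather than HBase tables.

```python
from collections import Counter

# Hypothetical tables: row key -> {column: value}. Layouts are assumed, not from the slides.
book_text_table = {
    "book1": {"text": "cloud storage and cloud search"},
    "book2": {"text": "search on clusters"},
}
book_image_table = {
    "book1": {"image": b"<image bytes>"},
}


def build_index_table(text_table):
    """Build index rows keyed by term; each column is a document id holding a term frequency."""
    index_table = {}
    for doc_id, columns in text_table.items():
        for term, freq in Counter(columns["text"].split()).items():
            index_table.setdefault(term, {})[doc_id] = freq
    return index_table


index_table = build_index_table(book_text_table)
```

Keeping the index in its own table means a term lookup touches only index rows, and the text or image tables are read only for the documents that match.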

8 System Implementation

9 Initial Analysis
The experiment was performed on the Alamo HPC cluster of FutureGrid.
It was conducted with 5 books.
Total terms evaluated: 8263

10 Initial Data Analysis

11 Initial Data Analysis

12 Proposed Work
Test with a larger number of data sets.
Test on other FutureGrid clusters, such as India.
Test with different numbers of HDFS data nodes.
Test with more client nodes and with different numbers of client queries.

13 Obstacles we can face
Hardware differs from cluster to cluster, so performance will differ accordingly.
Problems may occur when the number of data nodes is increased or decreased.
Important items to consider include the switching capacity of the device, the number of systems connected, and the uplink capacity.
Finding an appropriate number of data sets.

14 References
HBase: http://hbase.apache.org/book.html#ops_mgt
BigTable
Experiment using Lucene Index on Hbase in an HPC Environment (Xiaoming Gao, Vaibhav Nachankar, Judy Qiu)

15 Thank You
