Anand Hegde, Prerna Shraff: Performance Analysis of Lucene Index on HBase Environment (Group #13)

1 Performance Analysis of Lucene Index on HBase Environment. Anand Hegde, Prerna Shraff. Group #13

2 HBase vs BigTable; The Problem; Implementation; Performance Analysis; Survey; Conclusion

3 BigTable: a compressed, high-performance database system, built on GFS using the Chubby Lock Service, SSTable, etc. HBase: the Hadoop database; open source, distributed, versioned, and column-oriented; modeled after BigTable.

4 Data-intensive computing requires storage solutions for huge amounts of data. The requirement is to host very large tables on clusters of commodity hardware. HBase provides BigTable-like capabilities on top of Hadoop. Existing work in this area includes an experiment using a Lucene index on HBase in an HPC environment (Xiaoming Gao, Vaibhav Nachankar, Judy Qiu).

6 Configured Hadoop and HBase on the Alamo cluster. Added scripts to run the program sequentially on multiple nodes. Modified scripts to record the size of the table. Modified scripts to record the execution time for both sequential and parallel execution.
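
A minimal sketch of how execution time might be recorded for sequential and parallel runs. It is not the authors' script: `index_batch` is a hypothetical stand-in that only sleeps, and the worker count and batch list are illustrative assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def index_batch(batch_id):
    """Hypothetical stand-in for one indexing job; it sleeps instead of touching HBase."""
    time.sleep(0.05)
    return batch_id


def time_sequential(batches):
    """Run the batches one after another and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = [index_batch(b) for b in batches]
    return results, time.perf_counter() - start


def time_parallel(batches, workers=4):
    """Run the batches on a thread pool and return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(index_batch, batches))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    batches = list(range(8))
    _, seq_s = time_sequential(batches)
    _, par_s = time_parallel(batches)
    print(f"sequential: {seq_s:.3f} s, parallel: {par_s:.3f} s")
```

Both helpers return the results alongside the elapsed time, so a run can be checked for correctness as well as speed.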

7 Sequential execution across the same number of nodes for different data sizes. Sequential execution across different numbers of data nodes for the same data size. Parallel execution across the same number of nodes for different data sizes.

8 Performed analysis on the Alamo cluster on FutureGrid. System type: Dell PowerEdge. No. of CPUs: 192. No. of cores: 768. Layout: 3 ZooKeeper nodes + 1 HDFS master + 1 HBase master.

9
00000004###md###Title###Geoffrey C. Fox Papers Collection 1990
00000004###md###Category###paper, proceedings collection
00000004###md###Authors###Geoffrey C. Fox, others
00000004###md###CreatedYear###1990
00000004###md###Publishers###California Institute of Technology CA
00000004###md###Location###California Institute of Technology CA
00000004###md###StartPage###1
00000004###md###CurrentPage###105
00000004###md###Additional###This is a paper collection of Geoffrey C. Fox
00000004###md###DirPath###Proceedings in a collection of papers from one conference/Fox
00000005###md###Title###C3P Related Papers - T.Barnes
00000005###md###Category###paper, proceedings collection
00000005###md###Authors###T.Barnes, others
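
The sample above appears to hold one field per record, with four parts separated by `###`. The sketch below groups such records by their first part. Reading the first part as a document ID and `md` as a metadata marker is an assumption drawn from the sample, and `parse_records` is a hypothetical helper, not part of the original scripts.

```python
from collections import defaultdict


def parse_records(lines, sep="###"):
    """Group 'docid###md###Field###Value' lines into {docid: {field: value}}."""
    docs = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split into at most four parts so a value may itself contain the separator.
        parts = line.split(sep, 3)
        if len(parts) != 4:
            continue  # skip lines that do not match the four-part layout
        doc_id, _kind, field, value = parts
        docs[doc_id][field] = value
    return dict(docs)


sample = [
    "00000004###md###Title###Geoffrey C. Fox Papers Collection 1990",
    "00000004###md###CreatedYear###1990",
    "00000005###md###Title###C3P Related Papers - T.Barnes",
]
records = parse_records(sample)
print(records["00000004"]["Title"])  # prints "Geoffrey C. Fox Papers Collection 1990"
```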

13 Many load-testing frameworks are available for running distributed tests across many machines. Popular ones include Grinder, Apache JMeter, and LoadRunner. We compared these frameworks to choose the most suitable one.

14 Gives an absolute measure of the system response time. Targets regressions in the server and the application code. Examines the response. Helps evaluate and compare middleware solutions from different vendors.

15 LoadRunner: a commercial automated performance testing product. Supports JavaScript and C-script. Windows platform. Aimed at automated test engineers. Has a UI. Framework: Virtual User Scripts, Controller.

16 Apache JMeter: a pure Java desktop application designed to load test functional behavior and measure performance. Designed for testing web applications. Java based. Highly extensible. Framework: Test Plan, Thread Groups, Controllers, Samplers, Listeners.

17 Grinder: open source. Uses Jython. Scripts can be run by defining the tests in the grinder.properties file. Framework: Console, Agent, Workers.
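
To illustrate what a grinder.properties file might contain, the sketch below renders a few commonly used Grinder properties as `key=value` lines. The values and the script name `hbase_search_test.py` are hypothetical; the presentation does not give the settings that were actually used.

```python
def render_grinder_properties(settings):
    """Render a dict as 'key=value' lines, the layout of a Java properties file."""
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"


# Illustrative values only; these are assumptions, not the authors' configuration.
settings = {
    "grinder.script": "hbase_search_test.py",  # hypothetical Jython test script
    "grinder.processes": 2,   # worker processes started by each agent
    "grinder.threads": 10,    # worker threads per worker process
    "grinder.runs": 5,        # number of times each thread runs the script
}
print(render_grinder_properties(settings), end="")
```

With these values, one agent would start 2 worker processes of 10 threads each, and every thread would run the test script 5 times.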

19
Parameter | LoadRunner | Grinder | JMeter
Server monitoring | Strong for MS Windows | Needs wrapper-based approach | No built-in monitoring
Amount of load | Number of users restricted | Number of agents restricted | Number of agents depends on available hardware support
Able to run in batch? | No | Yes | Yes
Ease of installation | Difficult | Moderate | Easy
Setting up tests | Icon based | Uses Jython | Java based

20
Parameter | LoadRunner | Grinder | JMeter
Running tests | Complex | Moderate | Simple
Result generation | Integrated analysis tool | No integrated tool available | Can generate client-side graphs
Agent management | Easy/Automatic | Manual | Real time/Dynamic
Cross platform | No. MS Windows only | Yes | Yes
Intended audience | Aimed at non-developers | Aimed at developers | Aimed at non-builders
Stability | Poor | Moderate | Poor
Cost | Expensive | Free (open source) | Free (open source)

21 Study HBase; Study Lucene Indexing; Modify Scripts; Add Scripts; Study Testing Frameworks; Implement Grinder

22 Sequential execution takes more time than parallel execution on HBase. Research indicates that HBase is not yet as robust as BigTable. For the testing framework, we recommend Grinder because it is an open source tool and has a lot of documentation. Grinder also provides good real-time feedback.

23
http://grinder.sourceforge.net/
http://jmeter.apache.org/
http://www8.hp.com/us/en/software/software-product.html?compURI=tcm:245-935779
http://hpcdb.org/sites/hpcdb.org/files/gao_lucene.pdf
http://hadoop.apache.org/common/docs/stable/file_system_shell.html#du

