Project By: Anuj Shetye, Vinay Boddula

Outline: Introduction, Motivation, HBase, Our work, Evaluation, Related work, Future work and conclusion.


RDF datasets keep growing, so an RDF graph is much larger than a traditional graph: the cardinality of its vertices and edges is much larger. Large data stores are therefore required for the following reasons: fast and efficient querying, and scalability.

Research has been done on mapping RDF datasets onto relational databases, for example Virtuoso and Jena SDB. But the dataset is stored centrally, i.e. on one server. For example, Jena SDB maps RDF triples into a relational database, which raises scalability issues. Others store RDF data as one large graph but on a single node, for example Jena TDB, which raises the same scalability issues.

HBase is an open-source, distributed, sorted-map datastore modelled on Google Bigtable.
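To make "sorted map" concrete, here is a minimal single-process sketch in Python. It is not the HBase client API; the class name `SortedMap` and its methods are hypothetical, chosen only to illustrate that keeping row keys in sorted order makes range scans cheap.

```python
import bisect


class SortedMap:
    """Toy sorted key-value map (hypothetical, not an HBase API)."""

    def __init__(self):
        self._keys = []    # row keys, kept in lexicographic order
        self._values = {}  # row key -> stored value

    def put(self, key, value):
        # Insert the key at its sorted position the first time it is seen.
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def scan(self, start, stop):
        """Yield (key, value) pairs with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]


store = SortedMap()
for name in ["Vinay", "Anuj", "Miller"]:
    store.put(name, {"seen": True})

# Range scan over all row keys from "A" up to (not including) "N".
rows = [key for key, _ in store.scan("A", "N")]
```

Because the keys are already ordered, a scan only needs two binary searches and a slice. A distributed store can apply the same idea by splitting the key range across servers.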

HBase is a NoSQL database with high scalability and high fault tolerance. It offers fast reads and writes, a dynamic database, integration with Hadoop and other applications, and a column-family-oriented data layout. Maximum data size: ~1 PB. Read/write limits: millions of queries per second. Who uses HBase/Bigtable: Adobe, Facebook, Twitter, Yahoo, Gmail, Google Maps, etc.

Source: Cloudera

[Figure: System Architecture. Components shown: input file, mapper, three MapReduce jobs, HBase data store.]

Logical view as 'Records'

Row key   Data
Anuj      hasAdvisor: {'Dr. Miller'}
          workedFor: {'UGA'}
Vinay     hasAdvisor: {'Dr. Ramaswamy'}
          hasPapers: {'Paper 1', 'Paper 2'}
          workedFor: {'IBM', 'UGA'}

Physical Model

hasAdvisor column family
Row key   Column key   Timestamp   Value
Anuj      hasAdvisor   T1          Dr. Miller
Vinay     hasAdvisor   T2          Dr. Ramaswamy

hasPaper column family
Row key   Column key   Timestamp   Value
Vinay     hasPaper     T2          Paper1
Vinay     hasPaper     T1          Paper2

workedFor column family
Row key   Column key   Timestamp   Value
Anuj      workedFor    T1          'UGA'
Vinay     workedFor    T3          'UGA'
Vinay     workedFor    T2          'IBM'
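The mapping between the logical and physical views can be sketched in a few lines of Python. This is an in-memory illustration only, not the project's loader: the function names `to_physical` and `to_logical` are hypothetical, the timestamps are a simple counter standing in for T1, T2, ..., and the predicate name `hasPaper` follows the physical model's spelling.

```python
from collections import defaultdict

# RDF triples (subject, predicate, object) taken from the example slides.
triples = [
    ("Anuj", "hasAdvisor", "Dr. Miller"),
    ("Anuj", "workedFor", "UGA"),
    ("Vinay", "hasAdvisor", "Dr. Ramaswamy"),
    ("Vinay", "hasPaper", "Paper 1"),
    ("Vinay", "hasPaper", "Paper 2"),
    ("Vinay", "workedFor", "IBM"),
    ("Vinay", "workedFor", "UGA"),
]


def to_physical(triples):
    """Group cells by column family: subject -> row key, predicate -> family."""
    families = defaultdict(list)
    # Assumption: a running counter plays the role of the cell timestamp.
    for ts, (subject, predicate, obj) in enumerate(triples, start=1):
        families[predicate].append((subject, predicate, ts, obj))
    return dict(families)


def to_logical(families):
    """Rebuild the per-row 'record' view from the per-family cells."""
    records = defaultdict(lambda: defaultdict(set))
    for cells in families.values():
        for row_key, column_key, _ts, value in cells:
            records[row_key][column_key].add(value)
    return {row: dict(cols) for row, cols in records.items()}


physical = to_physical(triples)
logical = to_logical(physical)
```

Here `physical["workedFor"]` holds three cells, one per (row key, timestamp) pair, while `logical["Vinay"]["workedFor"]` collapses them back into the set `{'IBM', 'UGA'}` shown in the record view.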

Two major issues can be solved using HBase: data insertion and data updates. Versioning is possible through timestamps. Bulk loading of data comes in two types: complete bulk load (HBase file formatter, our approach) and incremental bulk load.
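Timestamp-based versioning can be illustrated with a small Python model of a single cell. This is a sketch under stated assumptions, not HBase code: `VersionedCell` and its `max_versions` parameter are hypothetical names that loosely mirror the idea of keeping a bounded number of versions per cell and reading the newest first.

```python
class VersionedCell:
    """Toy model of one cell that keeps several timestamped versions."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions  # hypothetical retention limit
        self._versions = {}               # timestamp -> value

    def put(self, timestamp, value):
        # Writing with an existing timestamp overwrites that version (an update);
        # a new timestamp adds a version (an insertion).
        self._versions[timestamp] = value
        # Drop the oldest versions beyond the retention limit.
        for ts in sorted(self._versions, reverse=True)[self.max_versions:]:
            del self._versions[ts]

    def get(self, n=1):
        """Return up to n (timestamp, value) pairs, newest first."""
        return sorted(self._versions.items(), reverse=True)[:n]


# Vinay's workedFor cell from the physical model: T2 -> IBM, T3 -> UGA.
cell = VersionedCell()
cell.put(2, "IBM")
cell.put(3, "UGA")

latest = cell.get()       # newest version only
history = cell.get(n=2)   # both versions, newest first
```

Reading with `n=1` returns only the most recent value, while a larger `n` exposes the history. The example tables use this same mechanism to keep both 'IBM' and 'UGA' under one row key and column.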

We talk about it during the demo

"CumulusRDF: Linked Data Management on Nested Key-Value Stores", which appeared at SSWS 2011, works on distributed key-value indexing over data stores; the authors used Cassandra as the data store. Apache Cassandra is currently capable of storing RDF data and has an adapter to store data in a distributed management system.

Our future work lies in developing an efficient interface for SPARQL, since querying HBase through a SQL-like layer such as Hive is slower. The system was tested on a single node, so testing it on multiple nodes would be the ultimate test of its efficiency.
