Nov 2006 Google released the paper on BigTable.


1 Nov 2006 Google released the paper on BigTable.
Feb 2007 Initial HBase prototype was created as a Hadoop contribution.
Oct 2007 The first usable HBase along with Hadoop was released.
Jan 2008 HBase became a subproject of Hadoop.
Oct 2008 HBase was released.
Jan 2009 HBase was released.
Sept 2009 HBase was released.
May 2010 HBase became an Apache top-level project.
2011 HBase 0.92

2 Cloudera Image for hands-on
Installation instruction

3 Agenda
Now
HBase architecture
Data operations - hands on
Summary

4 Recap
Hadoop ecosystem components and their roles:
HDFS: Hadoop Distributed File System
YARN: cluster resource manager
MapReduce
Spark: large-scale data processing
Hive: SQL
Impala: SQL
Pig: scripting
HBase: NoSQL columnar store
Zookeeper: coordination
Flume: log data collector
Sqoop: data exchange with RDBMS
Oozie: workflow manager
Mahout: machine learning
Sequential data scanning options:
  with Scala, Python, Java, SQL
  with Java
  with SQL using MapReduce
  with SQL (direct data access)

5 What is HBase?
NoSQL database on Hadoop
Key-value store, schema-less
For storing big tables with many rows and columns
Consistent inserts, updates and deletes of rows
Optimized for random reads
Data partitioning by row key values
Index on row key values
Bloom filter
Column store
Scalable
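
The points above (key-value store, schema-less rows, partitioning by row key) can be pictured with a small in-memory model. This is only a sketch in plain Python and not the HBase client API: the class `ToyTable`, the split keys, and the `family:qualifier` column names are all made up for illustration.

```python
import bisect
from collections import defaultdict


class ToyTable:
    """Toy model of a sparse table partitioned by row key (not HBase's API)."""

    def __init__(self, split_keys):
        # Sorted row keys that mark where one partition ends and the next begins.
        self.split_keys = sorted(split_keys)
        # row key -> {"family:qualifier": value}; rows may hold different columns.
        self.rows = defaultdict(dict)

    def put(self, row_key, column, value):
        self.rows[row_key][column] = value

    def get(self, row_key):
        # Return a copy; unknown row keys give an empty record.
        return dict(self.rows.get(row_key, {}))

    def partition_of(self, row_key):
        # Partition i holds the keys in [split_keys[i-1], split_keys[i]).
        return bisect.bisect_right(self.split_keys, row_key)


# Hypothetical data: three partitions split at b"g" and b"p".
table = ToyTable(split_keys=[b"g", b"p"])
table.put(b"alice", "info:email", b"alice@example.org")
table.put(b"alice", "info:city", b"Geneva")
table.put(b"zoe", "info:email", b"zoe@example.org")  # no city: rows are sparse

print(table.get(b"alice"))
print([table.partition_of(k) for k in (b"alice", b"henry", b"zoe")])
```

Because the partition is derived from the row key alone, a lookup by key only needs to touch one partition, which is the intuition behind "optimized for random reads".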

6 What HBase is not?
Not a relational database
Transactions are not ACID
Index available only on a row key
Weak for sequential data scanning
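
To see why an index on the row key alone matters, the following sketch compares a lookup by row key with a search on a column value. It uses plain Python lists rather than HBase, and the data set, the column name `info:city`, and both helper functions are assumptions made for illustration.

```python
import bisect

# Hypothetical rows, already in row-key order because the keys are zero-padded.
rows = [
    (f"user{i:05d}", {"info:city": "Geneva" if i % 1000 == 0 else "Other"})
    for i in range(10_000)
]
keys = [key for key, _ in rows]


def get_by_row_key(row_key):
    """Binary search on the sorted row keys: roughly log2(n) comparisons."""
    i = bisect.bisect_left(keys, row_key)
    if i < len(keys) and keys[i] == row_key:
        return rows[i][1]
    return None


def find_by_column_value(column, value):
    """No index on column values, so every row has to be examined."""
    examined = 0
    matches = []
    for key, columns in rows:
        examined += 1
        if columns.get(column) == value:
            matches.append(key)
    return matches, examined


print(get_by_row_key("user04000"))
matches, examined = find_by_column_value("info:city", "Geneva")
print(len(matches), "matches after examining", examined, "rows")
```

The second query has to walk the whole table, which is the kind of workload the slide recommends handling with other tools.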

7 When to use?
In general:
  For data too big to store on some central storage
  For random data access: quick lookups of individual records
  The data can be represented by key-value sets
  Database of binary records (serialized objects, documents)
When the data set:
  has to be updated
  is sparse (records have a variable number of attributes)
  has custom data types (serialization)
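
As an illustration of "binary records", "sparse" and "custom data types", the sketch below stores serialized values as raw bytes under a row key. It uses a plain dictionary in place of a real table, and the sensor names, column names, and the 12-byte encoding are assumptions chosen only for the example.

```python
import json
import struct


def encode_reading(timestamp, value):
    """Pack a (timestamp, value) pair into 12 bytes: big-endian int64 + float32."""
    return struct.pack(">qf", timestamp, value)


def decode_reading(raw):
    """Inverse of encode_reading; returns a (timestamp, value) tuple."""
    return struct.unpack(">qf", raw)


# Stand-in for a table: row key (bytes) -> {column name: raw bytes}.
store = {}

store[b"sensor-001"] = {
    "d:last": encode_reading(1_700_000_000, 21.5),
    "d:meta": json.dumps({"unit": "C"}).encode("utf-8"),
}
# A sparse record: this row simply has no "d:meta" column.
store[b"sensor-002"] = {
    "d:last": encode_reading(1_700_000_060, 19.0),
}

timestamp, value = decode_reading(store[b"sensor-001"]["d:last"])
print(timestamp, value)
print(sorted(store[b"sensor-002"]))
```

The store itself never interprets the bytes; encoding and decoding stay in the application, which is what makes custom data types possible.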

8 When NOT to use?
For massive data processing/analytics
  use MR, Spark, Hive, Impala… instead
For data sets with very high frequency insertion rates
  stability concerns - from own experience
Data schema is complex
If “I do not know what solution to use”

