1 Introduction to HBase

2 Agenda  What is Hbase  About RDBMS  Overview of Hbase  Why Hbase instead of RDBMS  Architecture of Hbase  Hbase interface  Summarize

3 What is HBase
HBase is an open-source, distributed, sorted map modeled after Google's Bigtable.
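
To make "sorted map" concrete, here is a minimal Python sketch (not HBase code). Each cell is addressed by a (row key, column, timestamp) tuple, and cells are ordered by row, then column, with the newest timestamp first, which mirrors the order in which HBase keeps cells. Assumption: plain strings stand in for the byte arrays that HBase actually compares, and the sample rows are made up.

```python
# Minimal sketch: HBase's data model viewed as one sorted map.
# Assumption: str keys stand in for the byte arrays HBase really compares.

def sort_key(cell_key):
    row, column, timestamp = cell_key
    # Rows and columns sort ascending; negating the timestamp puts
    # the newest version of a cell first.
    return (row, column, -timestamp)

cells = {
    ("row2", "info:name", 100): "old name",
    ("row1", "info:name", 100): "Sakis",
    ("row2", "info:name", 200): "new name",
    ("row1", "info:age", 100): "21",
}

ordered = sorted(cells, key=sort_key)
for key in ordered:
    print(key, "=>", cells[key])
```

Because the map is sorted, all cells of one row sit next to each other, and neighboring rows can be read with one sequential scan.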

4 Open Source  Apache 2.0 License  Committers and contributors from diverse organizations like Facebook, Trend Micro etc.

5 About RDBMS  Have a lot of Limitations  Both read / write throughput not high (transactional databases)  Specialized Hardware is quite expensive

6 Background  Google releases paper on Bigtable – 2006  First usable Hbase – 2007  Hbase becomes Apache top-level project – 2010

7 Overview of Hbase  Hbase is a part of Hadoop eco-system.  Apache Hadoop is an open source system to reliably store and process data across many commodity computers  Hadoop provides:  Fault tolerance  Scalability

8 Hadoop's components  MapReduce (Process)  Fault tolerant distributed processing  HDFS (store)  Self-healing  High-bandwidth  Clustered storage

9 Difference between Hadoop/HDFS and HBase
- HDFS is a distributed file system that is well suited for storing large files.
- HBase is built on top of HDFS and provides fast record lookups (and updates) for large tables.
- HDFS is based on Google's GFS (Google File System).

10 Hbase is  Distributed – uses HDFS for storage  Column – Oriented  Multi-Dimensional  Storage System

11 Hbase is NOT  A sql Database – No Joins, no query engine, no datatypes, no sql  No Schema

12 Storage Model  Column – oriented database (column families)  Table consists of Rows, each which has a primary key(row key)  Each Row may have any number of columns  Table schema only defines Column families(column family can have any number of columns)  Each cell value has a timestamp

13 Static Columns
[Figure: a conventional table in which every row has the same fixed columns, with types such as int and varchar]

14 Something different  Row1 → ColA = Value ColB = Value ColC = Value  Row2 → ColX = Value ColY = Value

15 A Big Map
Row Key + Column Key + Timestamp => Value

Row Key | Column Key | Timestamp     | Value
--------+------------+---------------+--------
1       | Info:name  | 1273516197868 | Sakis
1       | Info:age   | 1273871824184 | 21
1       | Info:sex   | 1273746281432 | Male
2       | Info:name  | 1273863723227 | Themis
2       | Info:name  | 1273973134238 | Andreas
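
Row 2 holds two versions of Info:name. The Python sketch below loads the five cells from the table and uses a hypothetical get helper that returns values newest first, so by default row 2's name reads as Andreas (the later timestamp). Returning only the latest version unless more are requested matches HBase's default Get behavior; the helper itself is illustrative, not the HBase API.

```python
# The five cells from the slide: (row key, column key, timestamp) -> value.
cells = {
    ("1", "Info:name", 1273516197868): "Sakis",
    ("1", "Info:age", 1273871824184): "21",
    ("1", "Info:sex", 1273746281432): "Male",
    ("2", "Info:name", 1273863723227): "Themis",
    ("2", "Info:name", 1273973134238): "Andreas",
}

def get(row_key, column, max_versions=1):
    """Return up to max_versions values for one cell, newest first (hypothetical helper)."""
    versions = sorted(
        ((ts, value) for (row, col, ts), value in cells.items()
         if row == row_key and col == column),
        reverse=True,
    )
    return [value for _, value in versions[:max_versions]]

print(get("2", "Info:name"))                  # ['Andreas']
print(get("2", "Info:name", max_versions=2))  # ['Andreas', 'Themis']
```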

16 RDBMS vs HBase

Feature                      | RDBMS                        | HBase
-----------------------------+------------------------------+---------------------------
Data layout                  | Row-oriented                 | Column-oriented
Query language               | SQL                          | Get/put/scan/etc.
Security                     | Authentication/Authorization | Work in progress
Max data size                | TBs                          | Hundreds of PBs
Read/write throughput limits | 1000s of queries/second      | Millions of queries/second
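
Instead of SQL, HBase clients work with operations such as get, put and scan. The Python sketch below is a hypothetical in-memory store, not the HBase client API, showing what those operations mean on a map sorted by row key. The scan uses a start row that is inclusive and a stop row that is exclusive, as an HBase scan does.

```python
import bisect

class SortedStore:
    """Hypothetical in-memory store keyed by row key (not the HBase client API)."""

    def __init__(self):
        self._keys = []  # row keys kept in sorted order
        self._rows = {}  # row key -> {column: value}

    def put(self, row_key, column, value):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        return self._rows.get(row_key)

    def scan(self, start_row, stop_row):
        # Start row inclusive, stop row exclusive.
        lo = bisect.bisect_left(self._keys, start_row)
        hi = bisect.bisect_left(self._keys, stop_row)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


store = SortedStore()
for user in ["user3", "user1", "user2", "user9"]:
    store.put(user, "info:name", user.upper())

print(store.get("user2"))
print([key for key, _ in store.scan("user1", "user3")])
```

Keeping the row keys sorted is what makes a range scan cheap: the store finds the start position by binary search and then reads consecutive keys.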

17 Terms and Daemons  Region  A subset of table's rows  Region Server(slave)  Serves data for reads and writes  Master  Responsible for coordinating the slaves  Assigns regions, detects failures of Region Servers  Control some admin function

18 Distributed coordination  To manage master election and server availability we use Zookeeper  Set up a cluster, provides distributed coordination primitives  An excellent tool for building cluster management systems

19 HBase Architecture

20 Hbase Interface  Java  Thrift (Ruby, Php, Python, Perl, C++,..)  Hbase Shell

21 Use Hbase if  You need random write, random read or both  You need to do many thousands of operations per sec on multiple TB of data  Your access patterns are simple

22 Thank You