
1 + HBase: Hadoop Database (B. Ramamurthy)

2 + Motivation-0
Think about the goal of a typical application today and its data characteristics.
Application trend: Search -> Analytics
- Search: a simple get from a database. Provide the primary key, get the row. A traditional RDBMS is optimized for this: normalized tables, multiple indices, etc. NULLs are expensive.
- Analytics: a huge number of rows must be accessed efficiently to supply analytic algorithms with big data. The data is inherently denormalized and may hold multiple versions (e.g., time series). NULLs are the norm and very common.
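The point about NULLs can be made concrete with a small sketch. The column names and values are made up for illustration. The dense layout stands in for a fixed-schema RDBMS row, and the sparse layout for a store that keeps only the cells that actually exist.

```python
# Hypothetical schema and data, for illustration only.
COLUMNS = ["name", "email", "phone", "fax", "twitter"]

# Dense, fixed-schema rows: every row carries a slot for every column,
# so absent values are stored as explicit NULLs (None here).
dense_rows = [
    ("alice", "a@example.org", None, None, None),
    ("bob", None, "555-0100", None, None),
]

# Sparse rows: only cells that hold a value are stored at all.
sparse_rows = [
    {col: val for col, val in zip(COLUMNS, row) if val is not None}
    for row in dense_rows
]

dense_cells = sum(len(row) for row in dense_rows)
sparse_cells = sum(len(row) for row in sparse_rows)
print("dense cells stored:", dense_cells)
print("sparse cells stored:", sparse_cells)
```

The wider and emptier the table, the larger the gap between the two counts, which is why a sparse layout suits analytics data where most columns are absent for most rows.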

3 + Motivation-1
HDFS itself is “big”. Why do we need HBase, which is bigger and more complex?
Word count and web logs are simple compared to web pages. Consider what a web crawler encounters:
http://www.cse.buffalo.edu
http://www.math.buffalo.edu/index.shtml

4 + Introduction
- Persistence in traditional applications is implemented using a Relational Database Management System (RDBMS).
  - Relations are expressed using tables, and data is normalized
  - Well founded in relational algebra and functions
  - Related data are located together
- However, social relationship and network data demand a different kind of data representation.
  - Relationships are multi-dimensional
  - Data is by choice not normalized (i.e., inherently redundant)
  - Column-based tables rather than row-based (consider the Friends relation in Facebook)
  - Sparse tables
- Solution: HBase, a database built on HDFS.

5 + Motivation-2
Google: GFS -> Bigtable -> Colossus
Facebook: HDFS -> Hive -> Cassandra -> HBase
Yahoo: HDFS -> HBase
- To source an MR workflow and to sink the output of an MR workflow
- To organize data for large-scale analytics
- To organize data for querying
- To organize data for warehousing; intelligence discovery
- NoSQL (see salesforce.com)
- Compare storing bank account details with storing Facebook user account details

6 + HBase
- HBase reference: http://hbase.apache.org
- Main concept: billions of rows and millions of columns on top of commodity infrastructure (say, HDFS)
- HBase is a data repository for big data
- It can be a source and a sink for an HDFS workflow
- HBase includes base classes for supporting and backing MR workflows, Pig, and Hive, as sink as well as source
[Diagram: HBase -> HDFS -> HBase]

7 + When to use HBase?
- When you need to store high-volume data:
  - Unstructured data
  - Sparse data
  - Column-oriented data
  - Versioned data (same data template captured at various times; time-elapsed data)
- When you need high scalability (you are generating data from an MR workflow and need to sink it somewhere)
- When you have long rows and the table needs to be split: sharding into horizontal partitions

8 + HBase: The Definitive Guide
By Lars George
Online version available
Also look at http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html

9 + Column-based

10 + HBase Architecture

11 + Data Model
http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
- Table
- Row#: some uninterpreted number (the row key)
- Column families (courses:mth309, courses:cse241)
- Region
- Region file

12 [Diagram: layered stack]
Layers: Hardware, Operating Sys, HDFS, HBase
Clients: Client (HTable), MR Client (HTable)
Applications: Google Earth

13 [Diagram: region lookup]
Client -> -ROOT- -> META data -> User table
User table: implemented through region servers and regions (rows, column families, columns)

14 [Diagram: one row's data]
Row -> Row Key -> Column Family ... -> Column qualifier -> Timestamp: data
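The nesting on this slide (row key, column family, column qualifier, timestamped data) can be sketched as nested maps. This is a toy in-memory model, not the HBase client API. The `Row` class, its method names, and the row key `student42` are hypothetical. The `courses:mth309` and `courses:cse241` columns come from the Data Model slide.

```python
import time
from collections import defaultdict


class Row:
    """Toy model of one row: family -> qualifier -> {timestamp: value}."""

    def __init__(self, row_key):
        self.row_key = row_key
        self.families = defaultdict(lambda: defaultdict(dict))

    def put(self, family, qualifier, value, timestamp=None):
        # Each write adds a new version keyed by timestamp (milliseconds).
        ts = timestamp if timestamp is not None else int(time.time() * 1000)
        self.families[family][qualifier][ts] = value

    def get(self, family, qualifier, max_versions=1):
        # Return up to max_versions (timestamp, value) pairs, newest first.
        versions = self.families.get(family, {}).get(qualifier, {})
        newest_first = sorted(versions.items(), key=lambda kv: kv[0], reverse=True)
        return newest_first[:max_versions]


row = Row("student42")  # hypothetical row key
row.put("courses", "mth309", "B", timestamp=1)
row.put("courses", "mth309", "A", timestamp=2)
row.put("courses", "cse241", "A-", timestamp=1)

print(row.get("courses", "mth309"))
print(row.get("courses", "mth309", max_versions=2))
```

A qualifier that was never written simply has no entry, so nothing is stored for it. That is the sparse, versioned behaviour the earlier slides describe.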

15 [Diagram: rows, regions, and region servers]
Rows: A, B, ..., Z
Regions by key range: A-C, C-F, F-I, I-M, M-T, T-Z
Region servers: Region server 1, Region server 2, Region server 3
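Finding the region that holds a given row key amounts to a search over the sorted region start keys. The sketch below uses the key ranges from the slide and treats each range as start-inclusive, end-exclusive. The slide does not say which region lives on which server, so the round-robin assignment here is an assumption, as are the function names.

```python
import bisect

# Region start keys taken from the slide's ranges: A-C, C-F, F-I, I-M, M-T, T-Z.
REGION_START_KEYS = ["A", "C", "F", "I", "M", "T"]
REGION_NAMES = ["A-C", "C-F", "F-I", "I-M", "M-T", "T-Z"]

# Assumption: regions are spread round-robin over three region servers.
REGION_TO_SERVER = {
    name: f"regionserver{i % 3 + 1}" for i, name in enumerate(REGION_NAMES)
}


def find_region(row_key):
    """Return the region whose start key is the largest one <= row_key."""
    idx = bisect.bisect_right(REGION_START_KEYS, row_key) - 1
    if idx < 0:
        raise KeyError(f"no region covers row key {row_key!r}")
    return REGION_NAMES[idx]


def locate(row_key):
    """Return (region, server) for a row key."""
    region = find_region(row_key)
    return region, REGION_TO_SERVER[region]


for key in ["Apple", "C", "Cat", "Zed"]:
    print(key, "->", locate(key))
```

Keys compare as plain strings here, so the last region is open-ended in this sketch and a lowercase key such as "apple" would sort after "T".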

16 [Diagram: HBase architecture components]
HDFS, ZooKeeper, HBase API, Master, RegionServer, HFile, Memstore, Write-ahead Log
Big-data application: EMR, healthcare, health exchanges
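Three of the components named on this slide (write-ahead log, Memstore, HFile) cooperate on the write path: a write is logged first, then buffered in memory, and the buffer is later flushed to an immutable sorted file. The sketch below is a heavily simplified in-memory model under that assumption. The `RegionStore` class and its entry-count `flush_threshold` are hypothetical, and real HBase flushes by size and stores HFiles on HDFS.

```python
class RegionStore:
    """Toy model of the WAL -> Memstore -> HFile write path."""

    def __init__(self, flush_threshold=3):
        self.wal = []          # append-only log of every write
        self.memstore = {}     # in-memory buffer of recent writes
        self.hfiles = []       # flushed, immutable, sorted (key, value) lists
        self.flush_threshold = flush_threshold  # hypothetical: entry count

    def put(self, key, value):
        self.wal.append((key, value))   # 1. log the write for recovery
        self.memstore[key] = value      # 2. buffer it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()                # 3. spill to a sorted file when full

    def flush(self):
        if self.memstore:
            self.hfiles.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, key):
        # Newest data wins: check the memstore, then files newest to oldest.
        if key in self.memstore:
            return self.memstore[key]
        for hfile in reversed(self.hfiles):
            for k, v in hfile:
                if k == key:
                    return v
        return None


store = RegionStore(flush_threshold=2)
store.put("b", "1")
store.put("a", "2")   # second entry triggers a flush
store.put("a", "3")   # newer value stays in the memstore
print(store.get("a"), store.get("b"), len(store.hfiles))
```

The log exists so that buffered writes can be replayed after a crash. This sketch never truncates it or replays it, which a real implementation would need to do.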

