
Thanks to our Sponsors! To connect to wireless 1. Choose Uguest in the wireless list 2. Open a browser. This will open a Uof U website 3. Choose Login.


1 Thanks to our Sponsors! To connect to wireless 1. Choose Uguest in the wireless list 2. Open a browser. This will open a Uof U website 3. Choose Login

2 Introduction to HBase
Giri Vislawath
Senior Software Developer, Overstock.com
giri.vislawath@gmail.com

3 Agenda
What is HBase?
  – What HBase is NOT
Relational Database vs HBase
HBase
  – Architecture
  – Data Model
  – Logical & Physical View
  – Design Considerations
  – Setup
  – Clients
Demo
Q & A

4 What is HBase?
Open source Apache project
Non-relational, distributed database
Runs on top of HDFS
Modeled after Google's BigTable
Written in Java
NoSQL (Not Only SQL) database
Consistent and partition tolerant (the "CP" side of the CAP theorem)
Runs on commodity hardware
Handles large data sets (terabytes to petabytes)
Low-latency random reads and writes on top of HDFS
Used by many companies
  – Facebook, Twitter, Adobe, Mozilla, Yahoo!, Trend Micro, and StumbleUpon

5 HBase is NOT
A direct replacement for an RDBMS
Fully ACID (Atomicity, Consistency, Isolation, Durability) compliant
  – HBase provides row-level atomicity
  – A scan is NOT a consistent view of a table (nor is it isolated)
  – All visible data is also durable data

6 Relational Database vs HBase
Hardware
  – RDBMS: expensive enterprise multiprocessor systems
  – HBase: same as Hadoop (commodity hardware)
Fault tolerance
  – RDBMS: configured for high availability; server downtime is intolerable
  – HBase: built into the architecture; an individual node failure does not bring down the cluster
Database size
  – RDBMS: up to terabytes (TB)
  – HBase: up to petabytes (PB)
Data layout
  – RDBMS: organized as rows and columns (row oriented)
  – HBase: column-family oriented

7 Relational Database vs HBase
Data type
  – RDBMS: rich data types
  – HBase: bytes
Transactions
  – RDBMS: fully ACID compliant
  – HBase: ACID on a single row only
Indexes
  – RDBMS: PK, FK and other indexes
  – HBase: sorted row key (not a real index)

8 HBase Architecture
[Diagram: Client, ZooKeeper, Master, Region Servers 1 to 3, HDFS / Hadoop]

9 HBase – Fault Tolerance
What if a region server dies?
  – The HBase master reassigns its regions to another region server.
What if the master dies?
  – The backup master takes over.
What if the backup master dies?
  – You are dead.
Replication of data
  – HBase relies on the HDFS replication mechanism.
Failure detection
  – ZooKeeper is used to identify failed region servers.

10 HBase Data Model
No schema
Table
  – Row key must be unique
  – Rows are formed by one or more columns
  – Columns are grouped into column families
  – Column families must be defined at table creation time
  – Any number of columns per column family
  – Columns can be added on the fly
  – Columns can be NULL
      NULL columns are NOT stored (free of cost)
      A column only exists when inserted (sparse)
Cell
  – Identified by row key, column family, qualifier, and timestamp / version
Data is represented as byte arrays
  – Table name, column family name, column name
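The data model above can be pictured as nested sorted maps. The following is a minimal in-memory sketch, not the HBase API: the class `SparseTableSketch` and its methods are hypothetical names chosen for illustration, and values are plain strings rather than byte arrays to keep the example short.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustrative in-memory model only; this is not the HBase client API.
class SparseTableSketch {
    // row key -> "family:qualifier" -> timestamp (newest first) -> value
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    // Write one cell version. Columns that are never written take no space.
    void put(String row, String family, String qualifier, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family + ":" + qualifier,
                    c -> new TreeMap<Long, String>(Comparator.<Long>reverseOrder()))
            .put(timestamp, value);
    }

    // Read the newest version of a cell, or null if the cell was never written.
    String get(String row, String family, String qualifier) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns == null) {
            return null;
        }
        NavigableMap<Long, String> versions = columns.get(family + ":" + qualifier);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    // Number of columns actually stored for a row.
    int storedColumns(String row) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        return columns == null ? 0 : columns.size();
    }

    public static void main(String[] args) {
        SparseTableSketch users = new SparseTableSketch();
        users.put("5678", "info", "fn", 20120825L, "Joe");
        users.put("5678", "tweet", "msg", 20120825L, "xyz");
        users.put("5678", "tweet", "msg", 20130916L, "zzz");
        System.out.println(users.get("5678", "tweet", "msg")); // newest version
        System.out.println(users.get("5678", "info", "ln"));   // never written
        System.out.println(users.storedColumns("5678"));
    }
}
```

In this sketch a second put to the same cell adds a version instead of overwriting, and a column that was never inserted has no entry at all, which is what "sparse" means on the slide.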

11 HBase – Logical View of Data
RDBMS view
ID (pk)  First Name  Last Name  tweet  Timestamp
1234     John        Smith      hello  20130710
5678     Joe         Brown      xyz    20120825
5678     Joe         Brown      zzz    20130916
Logical HBase view
Row key  Value (Column Family, Qualifier, Version)
1234     info {'lastName': 'Smith', 'firstName': 'John'}
         pwd  {'tweet': 'hello' @ts 20130710}
5678     info {'lastName': 'Brown', 'firstName': 'Joe'}
         pwd  {'tweet': 'xyz' @ts 20120825, 'tweet': 'zzz' @ts 20130916}

12 HBase – Physical View of Data
info column family
Row key  Column Family:Column  Timestamp  Value
1234     info:fn               12345678   John
1234     info:ln               12345678   Smith
5678     info:fn               12345679   Joe
5678     info:ln               12345679   Brown
tweet column family
Row key  Column Family:Column  Timestamp  Value
1234     tweet:msg             12345678   Hello
5678     tweet:msg             12345679   xyz
5678     tweet:msg             12345999   zzz

13 HBase – Logical to Physical View
Logical view (columns grouped into CF1 and CF2)
Row   C1   C2   C3   C4   C5   C6   C7
ROW1  V1        V3             V6
ROW2  V4   V6        V7
ROW3            V6             V5
ROW4  V10       V11            V2
Physical view
HFile for CF1
ROW1:CF1:C1:V1
ROW1:CF1:C3:V3
ROW2:CF1:C1:V4
ROW2:CF1:C2:V6
ROW2:CF1:C4:V7
ROW3:CF1:C3:V6
ROW4:CF1:C1:V10
ROW4:CF1:C3:V11
HFile for CF2
ROW1:CF2:C6:V6
ROW3:CF2:C6:V5
ROW4:CF2:C6:V2
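The mapping from the logical table to per-family files can be sketched in a few lines. This is an illustration under simplifying assumptions, not how HBase is implemented: `HFileLayoutSketch` is a hypothetical class, keys and values are strings, and timestamps are left out. It only shows that cells are grouped by column family and kept in sorted key order, with empty cells simply absent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: cells of one column family are kept together, sorted by key.
class HFileLayoutSketch {
    // family -> sorted "row:family:column" key -> value
    private final Map<String, TreeMap<String, String>> files = new TreeMap<>();

    // Store one cell under its column family.
    void put(String row, String family, String column, String value) {
        files.computeIfAbsent(family, f -> new TreeMap<>())
             .put(row + ":" + family + ":" + column, value);
    }

    // Entries of one family's "file" in sorted key order, formatted as on the slide.
    List<String> dump(String family) {
        List<String> lines = new ArrayList<>();
        TreeMap<String, String> cells = files.get(family);
        if (cells != null) {
            for (Map.Entry<String, String> e : cells.entrySet()) {
                lines.add(e.getKey() + ":" + e.getValue());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        HFileLayoutSketch table = new HFileLayoutSketch();
        // Cells are inserted out of order on purpose.
        table.put("ROW3", "CF2", "C6", "V5");
        table.put("ROW1", "CF1", "C3", "V3");
        table.put("ROW1", "CF2", "C6", "V6");
        table.put("ROW1", "CF1", "C1", "V1");
        System.out.println(table.dump("CF1")); // [ROW1:CF1:C1:V1, ROW1:CF1:C3:V3]
        System.out.println(table.dump("CF2")); // [ROW1:CF2:C6:V6, ROW3:CF2:C6:V5]
    }
}
```

Because each family is stored separately, a read that touches only CF2 never has to look at CF1 data in this model.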

14 Design Considerations
Row key design
  – Row-key design is very important for getting the most out of HBase
  – Design the row key around how you access the data
  – Salt the row key (add a prefix)
  – Make sure data is uniformly distributed (avoid hotspotting)
Column family design
  – Group like information together (user base info, user tweets)
  – Keep column family names short (every entry in an HFile contains the name, in bytes)
  – Two to three column families per table
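One way to picture the salting advice is a prefix computed from the key itself. The sketch below is an assumption-laden illustration, not a recommended production scheme: `SaltedRowKey.salt`, the two-digit bucket format, and the simple polynomial hash are all hypothetical choices. It uses a hash-derived prefix rather than a random one so that a reader can recompute the prefix for a known key.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: a hash-derived prefix, one common way to "salt" a row key.
class SaltedRowKey {
    // Prefix rowKey with a two-digit bucket number computed from the key itself,
    // so a reader can recompute the same prefix. Assumes 1 <= buckets <= 100.
    static String salt(String rowKey, int buckets) {
        int hash = 0;
        for (byte b : rowKey.getBytes(StandardCharsets.UTF_8)) {
            hash = 31 * hash + (b & 0xff);
        }
        int bucket = Math.floorMod(hash, buckets);
        return String.format("%02d-%s", bucket, rowKey);
    }

    public static void main(String[] args) {
        // Sequential keys that would otherwise sort next to each other.
        for (int i = 1; i <= 4; i++) {
            String key = String.format("20130916-%04d", i);
            System.out.println(salt(key, 8));
        }
    }
}
```

The trade-off is that a range scan over the original key order now has to be issued once per bucket, so salting suits write-heavy, sequential-key workloads more than scan-heavy ones.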

15 HBase – Setup
HBase is written in Java
HBase Shell is based on JRuby's IRB (interactive Ruby shell)
Download HBase from https://hbase.apache.org/
Latest stable version is 0.94.17 (at the time of this talk)
HBase modes
  – Standalone
      $HBASE_HOME/bin/start-hbase.sh
      $HBASE_HOME/bin/stop-hbase.sh
      $HBASE_HOME/bin/hbase shell
  – Single-node cluster mode (pseudo-distributed)
      Cloudera VM (on VMPlayer or VirtualBox) (www.cloudera.com)

16 HBase – Clients
Program / API based clients
  – Java, REST, Thrift, Avro
Batch clients
  – MapReduce (Pig, Hive)
Shell
  – Command line interface
  – Supports client and administrative operations
Web-based UI
  – HUI (HBase cluster UI)

17 HBase – Shell (commands)
Command                                  Description
list                                     Shows the list of tables
create 'users', 'info'                   Creates the users table with a single column family named info
put 'users', 'row1', 'info:fn', 'John'   Inserts data into the users table, column family info
get 'users', 'row1'                      Retrieves the row for a given row key
scan 'users'                             Iterates through the users table
disable 'users'                          Disables the table (required before drop)
drop 'users'                             Deletes the table (requires disabling the table first)
CRUD explained
CREATE = PUT
READ = GET
UPDATE = PUT
DELETE = DELETE

18 HBase – Java API (examples)
Get
    Get get = new Get(Bytes.toBytes(String.valueOf(uid)));
    Result result = table.get(get);
Put
    Put p = new Put(Bytes.toBytes("" + user.getUid()));
    p.add(Bytes.toBytes("info"), Bytes.toBytes("fn"), Bytes.toBytes(user.getFirstName()));
    p.add(Bytes.toBytes("info"), Bytes.toBytes("ln"), Bytes.toBytes(user.getLastName()));
    table.put(p);
Delete (column, column family)
    Delete d = new Delete(Bytes.toBytes("" + user.getUid()));
    // deleteColumn takes family, qualifier and timestamp; it takes no value argument
    d.deleteColumn(Bytes.toBytes("info"), Bytes.toBytes("fn"), timestamp1);
    table.delete(d);
Batch operations
    List of Get, Put or Delete operations
Scan
    Iterates over a table. Prefer range / filtered scans; a full scan is an expensive operation.
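Why a range scan is cheaper than a full scan follows from the sorted row keys. The sketch below models this with a plain `TreeMap` and is not the HBase client API: `RangeScanSketch` and its methods are hypothetical, and Java string ordering stands in for HBase's byte-wise ordering (the two agree for the ASCII keys used here). As with an HBase scan, the start row is inclusive and the stop row is exclusive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Illustrative only: a sorted map standing in for a table sorted by row key.
class RangeScanSketch {
    private final TreeMap<String, String> rows = new TreeMap<>();

    void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    // Row keys in [startRow, stopRow): start inclusive, stop exclusive.
    // Only the matching slice of the sorted map is visited.
    List<String> scan(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }

    public static void main(String[] args) {
        RangeScanSketch users = new RangeScanSketch();
        users.put("user0009", "a");
        users.put("user0010", "b");
        users.put("user0100", "c");
        users.put("user1000", "d");
        users.put("other", "e");
        System.out.println(users.scan("user0010", "user0101")); // [user0010, user0100]
    }
}
```

The keys are zero-padded on purpose: ordering is lexicographic, so an unpadded key such as "user10" would sort before "user9" and break numeric ranges.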


20 References
HBase: The Definitive Guide, by Lars George
HBase in Action, by Nick Dimiduk and Amandeep Khurana

21 Thank You

