Introduction of HBase. Reporter: Hu Yi, 2009-3-11.


1 Introduction of HBase Reporter: Hu Yi 2009-3-11

2 Overview HBase is an Apache open source project whose goal is to provide storage for the Hadoop Distributed Computing Environment. Data is logically organized into tables, rows and columns.

3 Outline: Data Model; Architecture and Implementation; Examples & Tests

4 Conceptual View
A data row has a sortable row key and an arbitrary number of columns. A timestamp is assigned automatically if the client does not supply one.

Row key            Timestamp   Column "contents:"   Column "anchor:"
"com.apache.www"   t12         "…"
                   t11         "…"
                   t10                              "anchor:apache.com" = "APACHE"
"com.cnn.www"      t15                              "anchor:cnnsi.com" = "CNN"
                   t13                              "anchor:my.look.ca" = "CNN.com"
                   t6          "…"
                   t5          "…"
                   t3          "…"
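The conceptual view can be read as a nested sorted map: row key, then column, then timestamp, then value. The following is a minimal in-memory sketch of that idea, not the HBase API; the class name ConceptualTable and the numeric timestamps (15, 13, 10 standing in for t15, t13, t10) are assumptions made for illustration.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of the conceptual view (hypothetical class, not part of HBase).
final class ConceptualTable {
    // row key (ascending) -> column (ascending) -> timestamp (descending) -> value
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    void put(String row, String column, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    // Returns the newest version of a cell, or null if the cell is empty.
    String getLatest(String row, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns == null) {
            return null;
        }
        NavigableMap<Long, String> versions = columns.get(column);
        if (versions == null) {
            return null;
        }
        // Timestamps are sorted in descending order, so the first entry is the newest.
        return versions.firstEntry().getValue();
    }

    // Smallest row key, showing that rows are kept sorted.
    String firstRowKey() {
        return rows.firstKey();
    }

    public static void main(String[] args) {
        ConceptualTable t = new ConceptualTable();
        t.put("com.cnn.www", "anchor:cnnsi.com", 15L, "CNN");
        t.put("com.cnn.www", "anchor:my.look.ca", 13L, "CNN.com");
        t.put("com.apache.www", "anchor:apache.com", 10L, "APACHE");
        System.out.println(t.firstRowKey());
        System.out.println(t.getLatest("com.cnn.www", "anchor:cnnsi.com"));
    }
}
```

Cells that were never written simply have no entry in the map, which mirrors the point that empty cells take no space.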

5 Physical Storage View
Physically, tables are stored on a per-column-family basis. Because the storage format is column-oriented, empty cells are not stored. Each column family is managed by an HStore.

Row key            Timestamp   Column "contents:"
"com.apache.www"   t12         "…"
                   t11         "…"
"com.cnn.www"      t6          "…"
                   t5          "…"
                   t3          "…"

Row key            Timestamp   Column "anchor:"
"com.apache.www"   t10         "anchor:apache.com" = "APACHE"
"com.cnn.www"      t9          "anchor:cnnsi.com" = "CNN"
                   t8          "anchor:my.look.ca" = "CNN.com"

[Diagram labels: HStore, Memcache, Data MapFile (Key/Value), Index MapFile (Index key).]

6 Row Ranges: Regions
Rows are sorted by row key and column in ascending order, and by timestamp in descending order. Physically, tables are broken into row ranges called regions; each region contains the rows from a start key to an end key.

Row key   Timestamp   Column "contents:"   Column "anchor:"
aaaa      t15                              anchor:cc = value
          t13         ba
          t12         bb
          t11                              anchor:cd = value
          t10         bc
aaab      t14
aaac                                       anchor:be = value
aaad                                       anchor:ad = value
aaae      t5          ae
          t3          af
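Because regions are contiguous ranges of sorted row keys, finding the region for a row comes down to finding the greatest start key that is not larger than the row key. A minimal sketch under that assumption follows; RegionLocator, the region names, and the split point "aaac" are hypothetical and chosen only to match the example keys on the slide.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy region lookup (hypothetical class, not the HBase implementation).
final class RegionLocator {
    // Region start key (inclusive) -> region name. An empty start key marks the first region.
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String regionName) {
        regionsByStartKey.put(startKey, regionName);
    }

    // The owning region is the one with the greatest start key <= rowKey.
    String regionFor(String rowKey) {
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(rowKey);
        if (entry == null) {
            throw new IllegalStateException("no region covers row key " + rowKey);
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        RegionLocator locator = new RegionLocator();
        locator.addRegion("", "region-1");      // covers ["", "aaac")
        locator.addRegion("aaac", "region-2");  // covers ["aaac", end of table)
        System.out.println(locator.regionFor("aaab"));
        System.out.println(locator.regionFor("aaae"));
    }
}
```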

7 Outline: Data Model; Architecture and Implementation; Examples & Tests

8 Three major components: the HBaseMaster, the HRegionServer, and the HBase client

9 HBaseMaster
Assign regions to HRegionServers:
1. The ROOT region locates all the META regions.
2. Each META region maps a number of user regions.
3. User regions are assigned to the HRegionServers.
Enable/disable tables and change table schemas.
Monitor the health of each server.

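10 ROOT/META Table
Each row in the ROOT and META tables is approximately 1KB in size. At the default region size of 256MB, one catalog region holds about 2^18 rows, so the ROOT and META levels together can address about 2^64 bytes (2^24 TB) of user data.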
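The 2^24 TB figure follows from the two sizes on this slide. The sketch below works through the arithmetic in powers of two; it assumes exactly 1KB per catalog row and exactly 256MB per region, and the class name CatalogCapacity is hypothetical.

```java
// Worked arithmetic for the ROOT/META capacity estimate (hypothetical helper class).
final class CatalogCapacity {
    static final long ROW_BYTES = 1L << 10;      // assumed: exactly 1KB per catalog row
    static final long REGION_BYTES = 256L << 20; // assumed: exactly 256MB per region

    // Number of catalog rows that fit in one region: 2^28 / 2^10 = 2^18.
    static long rowsPerRegion() {
        return REGION_BYTES / ROW_BYTES;
    }

    // Base-2 logarithm of a power of two.
    static int log2(long powerOfTwo) {
        return 63 - Long.numberOfLeadingZeros(powerOfTwo);
    }

    // log2 of addressable user bytes:
    // (META regions per ROOT) * (user regions per META) * (bytes per user region).
    static int addressableBytesLog2() {
        int rowsLog2 = log2(rowsPerRegion());
        return rowsLog2 + rowsLog2 + log2(REGION_BYTES);
    }

    // Same quantity expressed in TB, where 1 TB = 2^40 bytes.
    static int addressableTerabytesLog2() {
        return addressableBytesLog2() - 40;
    }

    public static void main(String[] args) {
        System.out.println("rows per catalog region = 2^" + log2(rowsPerRegion()));
        System.out.println("addressable user data   = 2^" + addressableBytesLog2() + " bytes");
        System.out.println("                        = 2^" + addressableTerabytesLog2() + " TB");
    }
}
```

Working with exponents avoids overflowing a 64-bit long, since 2^64 itself does not fit in one.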

11 HRegionServer: Write Requests
Topics covered on the HRegionServer slides: Write Requests, Read Requests, Cache Flushes, Compactions, Region Splits.
[Diagram: the example table (rows "com.apache.www" and "com.cnn.www") being written. Labels: write, HLog, Hstore1, Memcache1, Mapfile1.1, Mapfile1.2, Hstore2, Memcache2.]

12 HRegionServer: Read Requests
[Diagram: the example table being read. Labels: Read, Hstore1, Memcache1, Mapfile1.1, Mapfile1.2.]

13 HRegionServer: Cache Flushes
[Diagram labels: Hstore1, Memcache1, HLog; Mapfile1.1 and Mapfile1.2, then Mapfile1.1, Mapfile1.2 and Mapfile1.3, apparently before and after the flush.]
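The write, read and cache-flush slides can be summarized in a toy store. This is a simplified sketch, not HBase code: ToyHStore, the string keys, and the entry-count flush threshold are all assumptions. It logs each write first, buffers it in a sorted in-memory map, turns the buffer into an immutable "map file" when it grows too large, and serves reads from the buffer first and then from map files, newest first.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.TreeMap;

// Toy store illustrating write, read and cache flush (hypothetical, not HBase code).
final class ToyHStore {
    private final List<String> hlog = new ArrayList<>();                        // stand-in for the HLog
    private TreeMap<String, String> memcache = new TreeMap<>();                 // sorted in-memory buffer
    private final Deque<TreeMap<String, String>> mapFiles = new ArrayDeque<>(); // newest first
    private final int flushThreshold;                                           // assumed: counted in entries

    ToyHStore(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    // Write path: append to the log, then update the memcache, then flush if it is full.
    void write(String key, String value) {
        hlog.add(key + "=" + value);
        memcache.put(key, value);
        if (memcache.size() >= flushThreshold) {
            flush();
        }
    }

    // Cache flush: the current memcache becomes a new map file and a fresh memcache starts.
    void flush() {
        if (memcache.isEmpty()) {
            return;
        }
        mapFiles.addFirst(memcache);
        memcache = new TreeMap<>();
    }

    // Read path: memcache first, then map files from newest to oldest.
    String read(String key) {
        String value = memcache.get(key);
        if (value != null) {
            return value;
        }
        for (TreeMap<String, String> file : mapFiles) {
            value = file.get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    int mapFileCount() {
        return mapFiles.size();
    }

    int logSize() {
        return hlog.size();
    }

    public static void main(String[] args) {
        ToyHStore store = new ToyHStore(2);
        store.write("com.apache.www/anchor:apache.com", "APACHE");
        store.write("com.cnn.www/anchor:cnnsi.com", "CNN");       // reaches the threshold, triggers a flush
        store.write("com.cnn.www/anchor:my.look.ca", "CNN.com");  // stays in the memcache
        System.out.println(store.mapFileCount());
        System.out.println(store.read("com.cnn.www/anchor:cnnsi.com"));
    }
}
```

Every flush adds one more map file, which is why reads gradually touch more files until a compaction merges them.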

14 HRegionServer: Compactions
[Diagram labels: Hstore1, Memcache1; Mapfile1.1 and Mapfile1.2, apparently merged into Mapfile1.]
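A compaction merges several sorted map files into one. The sketch below shows the idea with plain sorted maps; ToyCompaction is hypothetical, keys are opaque strings, and the rule that the newer file wins for identical keys is a simplifying assumption (a real store also tracks versions and deletions).

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Toy compaction: merge sorted files into a single sorted file (hypothetical, not HBase code).
final class ToyCompaction {
    // Files must be passed oldest first so that entries from newer files overwrite older ones.
    static TreeMap<String, String> compact(List<TreeMap<String, String>> oldestToNewest) {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> file : oldestToNewest) {
            merged.putAll(file);
        }
        return merged;
    }

    public static void main(String[] args) {
        TreeMap<String, String> mapfile11 = new TreeMap<>();
        mapfile11.put("com.apache.www/anchor:apache.com", "APACHE");
        TreeMap<String, String> mapfile12 = new TreeMap<>();
        mapfile12.put("com.cnn.www/anchor:cnnsi.com", "CNN");
        mapfile12.put("com.cnn.www/anchor:my.look.ca", "CNN.com");

        TreeMap<String, String> mapfile1 = compact(Arrays.asList(mapfile11, mapfile12));
        System.out.println(mapfile1.size());
        System.out.println(mapfile1.firstKey());
    }
}
```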

15 HRegionServer (Write Requests, Read Requests, Cache Flushes, Compactions, Region Splits)
[Diagram labels: Hstore1, Memcache1, Mapfile1, shown with the example table.]

16 HBase Client

17 ROOT Region

18 HBase Client META Region

19 HBase Client User Region Information cached
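Slides 16 to 19 show the client locating a row in steps: ROOT region, then META region, then user region, with the result cached. A minimal sketch of that lookup order follows. It is not the HBase client: ToyRegionLookup, the Region record, the server names, and the lookup counter are hypothetical, and it assumes the first region at each level has an empty start key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy ROOT -> META -> user-region lookup with a client-side cache (hypothetical).
final class ToyRegionLookup {
    static final class Region {
        final String startKey; // inclusive
        final String endKey;   // exclusive, or null for "end of table"
        final String server;

        Region(String startKey, String endKey, String server) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.server = server;
        }

        boolean contains(String row) {
            return row.compareTo(startKey) >= 0 && (endKey == null || row.compareTo(endKey) < 0);
        }
    }

    private final TreeMap<String, String> root = new TreeMap<>();              // start key -> META region id
    private final Map<String, TreeMap<String, Region>> meta = new HashMap<>(); // META id -> user regions by start key
    private final TreeMap<String, Region> cache = new TreeMap<>();             // client cache by start key
    int catalogLookups = 0;                                                    // counts ROOT/META round trips

    void addUserRegion(String metaStartKey, String metaId, Region region) {
        root.put(metaStartKey, metaId);
        meta.computeIfAbsent(metaId, id -> new TreeMap<>()).put(region.startKey, region);
    }

    Region locate(String row) {
        // Cached region information is used when it covers the row.
        Map.Entry<String, Region> cached = cache.floorEntry(row);
        if (cached != null && cached.getValue().contains(row)) {
            return cached.getValue();
        }
        catalogLookups++;
        String metaId = root.floorEntry(row).getValue();               // step 1: ROOT names the META region
        Region region = meta.get(metaId).floorEntry(row).getValue();   // step 2: META names the user region
        cache.put(region.startKey, region);                            // step 3: remember it
        return region;
    }

    public static void main(String[] args) {
        ToyRegionLookup client = new ToyRegionLookup();
        client.addUserRegion("", "meta-1", new Region("", "m", "regionserver-1"));
        client.addUserRegion("", "meta-1", new Region("m", null, "regionserver-2"));
        System.out.println(client.locate("apple").server);
        System.out.println(client.locate("banana").server); // answered from the cache
        System.out.println(client.catalogLookups);
    }
}
```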

20 Outline: Data Model; Architecture and Implementation; Examples & Tests

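21 Create MyTable
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    // config is not defined on the slide; a default configuration is assumed here.
    HBaseConfiguration config = new HBaseConfiguration();
    HBaseAdmin admin = new HBaseAdmin(config);
    // One descriptor per column family; family names end with a colon in this API.
    HColumnDescriptor[] column = new HColumnDescriptor[2];
    column[0] = new HColumnDescriptor("columnFamily1:");
    column[1] = new HColumnDescriptor("columnFamily2:");
    HTableDescriptor desc = new HTableDescriptor(Bytes.toBytes("MyTable"));
    desc.addFamily(column[0]);
    desc.addFamily(column[1]);
    admin.createTable(desc);
Resulting empty table:
    Row Key   Timestamp   columnFamily1:   columnFamily2: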

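22 Insert Values
    // table and timestamp are not defined on the slide; the next two lines are assumptions.
    HTable table = new HTable(config, "MyTable");
    long timestamp = System.currentTimeMillis();
    BatchUpdate batchUpdate = new BatchUpdate("myRow", timestamp);
    batchUpdate.put("columnFamily1:labela", Bytes.toBytes("labela value"));
    batchUpdate.put("columnFamily1:labelb", Bytes.toBytes("labelb value"));
    table.commit(batchUpdate);
Resulting table:
    Row Key   Timestamp   columnFamily1:
    myRow     ts1         labela = "labela value"
              ts2         labelb = "labelb value"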


25 Search
Select value from table where key='com.apache.www' AND label='anchor:apache.com'

Row key            Timestamp   Column "anchor:"
"com.apache.www"   t12
                   t11
                   t10         "anchor:apache.com" = "APACHE"
"com.cnn.www"      t9          "anchor:cnnsi.com" = "CNN"
                   t8          "anchor:my.look.ca" = "CNN.com"
                   t6
                   t5
                   t3

26 Search: Scanner
Select value from table where anchor='cnnsi.com'

Row key            Timestamp   Column "anchor:"
"com.apache.www"   t12
                   t11
                   t10         "anchor:apache.com" = "APACHE"
"com.cnn.www"      t9          "anchor:cnnsi.com" = "CNN"
                   t8          "anchor:my.look.ca" = "CNN.com"
                   t6
                   t5
                   t3
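The two search slides differ in how much data must be visited: a query that names the row key goes straight to one row, while a query on a column alone has to scan every row. The sketch below contrasts the two on an in-memory sorted map. It does not use the HBase scanner API; ToySearch and its method names are hypothetical, and only one version per cell is kept for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy contrast between a keyed lookup and a full scan (hypothetical, not the HBase API).
final class ToySearch {
    // Builds the "anchor:" example data: row key -> (column -> value).
    static TreeMap<String, TreeMap<String, String>> exampleTable() {
        TreeMap<String, TreeMap<String, String>> table = new TreeMap<>();
        TreeMap<String, String> apache = new TreeMap<>();
        apache.put("anchor:apache.com", "APACHE");
        table.put("com.apache.www", apache);
        TreeMap<String, String> cnn = new TreeMap<>();
        cnn.put("anchor:cnnsi.com", "CNN");
        cnn.put("anchor:my.look.ca", "CNN.com");
        table.put("com.cnn.www", cnn);
        return table;
    }

    // Keyed lookup: select value where key = row AND label = column.
    static String get(TreeMap<String, TreeMap<String, String>> table, String row, String column) {
        TreeMap<String, String> columns = table.get(row);
        return columns == null ? null : columns.get(column);
    }

    // Scan: select value where label = column, visiting every row in key order.
    static List<String> scan(TreeMap<String, TreeMap<String, String>> table, String column) {
        List<String> values = new ArrayList<>();
        for (TreeMap<String, String> columns : table.values()) {
            String value = columns.get(column);
            if (value != null) {
                values.add(value);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        TreeMap<String, TreeMap<String, String>> table = exampleTable();
        System.out.println(get(table, "com.apache.www", "anchor:apache.com"));
        System.out.println(scan(table, "anchor:cnnsi.com"));
    }
}
```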

27 Summary
Column-oriented storage makes modification more flexible. Performance is higher for access clustered by row key.

28 Future work: more testing; optimization of search

29 Thank you

