1
Big Data Storage (III) -- Using the HareDB HBase Client Tool
2
Introduction to HareDB: HareDB HBase Client is a well-known HBase tool in the Hadoop ecosystem. It can be downloaded from SourceForge and has a large user base. For more information about HareDB, see www.haredb.com. HareDB makes designing and working with HBase more convenient. Image from: www.HareDB.com
3
HareDB HBase Client Features: GUI Management – intuitive operation
4
HareDB HBase Client Features: Much Faster Queries – HiveQL + HBase Coprocessor = HareQL – supports most SQL features: aggregation functions, union, join, subquery, etc. Example: Select sum(cf1:time) From visit where :key < '1.1.181.134'
5
HareDB HBase Client Features: Easy Install – just double-click the executable JAR
6
HareDB HBase Client Features: Bulkload is well suited to importing massive amounts of data into HBase – import data much more easily and quickly
7
HareDB Download: You must first set up your own Hadoop cluster. Download from sourceforge.net or www.haredb.com.
8
System Environment: Please download HareDB version 1.98.x. The run-time environment is as follows. – Hadoop 2.3.X (using YARN) – HBase 0.96+ Using YARN is recommended. – Because of the major changes between HBase 0.94 and HBase 0.96, this version (1.98.x) of HareDB HBase Client is not fully functional on HBase 0.94. We strongly recommend updating your HBase to version 0.96+. Source: www.haredb.com, HareDB Tutorial, HareDB HBase Client User Manual
9
Getting Started with HareDB: Click hareDB.client in the folder – the Java version is JDK 6
10
Connect to HBase: Click the icon and select the connection you want to connect to. The system will try to connect to HBase. After the connection succeeds, the connection node will show up.
11
Connection: We define a "Connection" as having two parts of information. – One is for the Hadoop and HBase cluster. – The other is for the Hive metastore, which is where metadata is stored. Connection Manager is a component that helps us add, modify, and delete connections. There is an item "Manage Connections" at the bottom of the connection list. Please select "Manage Connections".
12
Connection Manager: The system will pop up a Connection Manager window. Area A: a connection list containing all the connections you have. Area B: information about the selected connection related to the Hadoop and HBase cluster. Area C: information about the selected connection related to the Hive metastore.
13
Add a New Connection: Open the "Connection Manager" and click the add button.
14
Connection Manager – Connection
Connection Name: just a name that will show up in the connection list.
Zookeeper Host/ip: input the ZooKeeper host name or IP.
ZooKeeper Client Port: input the ZooKeeper port that your requests go through.
fs.default.name: address of HDFS for the client.
yarn.resourcemanager.address: address of the ResourceManager for clients.
yarn.resourcemanager.scheduler.address: the ApplicationMaster requests resources through this address.
yarn.resourcemanager.resource-tracker.address: address of the ResourceManager for NodeManagers.
yarn.resourcemanager.admin.address: address of the ResourceManager for administrators.
mapreduce.jobhistory.address: address for requesting job history.
coprocessor folder: an HDFS folder path for HareDB HBase Client. HareDB HBase Client must be granted the privilege to insert, edit, and delete files in this folder.
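As a minimal sketch of how these parameters map onto a client-side Configuration (the host names and ports below are illustrative placeholders, not values from this tutorial):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Sketch only: every host name/port here is a placeholder.
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "zk-host");                             // Zookeeper Host/ip
conf.set("hbase.zookeeper.property.clientPort", "2181");                   // ZooKeeper Client Port
conf.set("fs.default.name", "hdfs://namenode-host:9000");                  // HDFS address for clients
conf.set("yarn.resourcemanager.address", "rm-host:8032");
conf.set("yarn.resourcemanager.scheduler.address", "rm-host:8030");
conf.set("yarn.resourcemanager.resource-tracker.address", "rm-host:8031");
conf.set("yarn.resourcemanager.admin.address", "rm-host:8033");
conf.set("mapreduce.jobhistory.address", "history-host:10020");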
15
Connection Manager – Hive Metastore: This information is for the metadata of the HTable. We rely on the metadata database of Apache Hive and support all kinds of Hive metadata databases. The first kind is "Embedded". If you don't have any Hive metadata database, didn't install Hive, or don't even know what Hive is, please select "Embedded".
16
Connection Manager – Hive Metastore: The second kind is "Local". If you have an RDB serving as the Hive metadata database, please select "Local" and input the RDB connection information.
17
Connection Manager – Hive Metastore: The third kind is "Remote". Apache Hive provides a Thrift service for interacting with the metadata database, so please provide the URL of the Thrift server. For example: http://192.168.1.11:10000, where 192.168.1.11 is your Hive Thrift server and 10000 is the Thrift server port. After filling in the information, click the "Apply" button. The new connection will then appear in the connection list on the left side of the "Connection Manager".
18
Edit Connection
Alter Connection – Open the "Connection Manager" (Reference: Connection Manager). – Select the connection you want to alter in the connection list. – The information of the selected connection will show up on the right side of the Connection Manager. – Modify any information you want, then click the "Apply" button.
Delete Connection, Clone Connection, Rename Connection – right-click the connection.
19
Table: We define 4 kinds of tables, each with a different icon. 1. An HBase table without the HareQL coprocessor registered. 2. An HBase table with no table metadata. 3. An HBase table with table metadata. 4. A Hive table with table metadata; this Hive table is a child of an HBase table.
20
Create Table: Select a connection and connect to HBase (Reference: "Connect to HBase"). Right-click the connection icon and a menu will pop up.
21
Please select "Create Table", and a window will pop up. Input a table name.
22
Input at least one column family. Of course, you can also delete a column family.
23
Press "OK". You can check the table you just created in the table list.
24
Alter Table: Select a connection and connect to HBase (Reference: "Connect to HBase"). Right-click the HTable you want to alter and the system will pop up a menu. Select "Alter Table", and a window will pop up. You can add or delete column families here. However, be careful when deleting a column family: deleting a column family deletes all the data belonging to it.
25
Edit a Column Family's Properties: Please open the Alter Table window.
26
Select a Column Family
27
Click "Edit Column Family", and the system will pop up a window. You can modify most properties.
28
Table Metadata: As we know, HBase has no schema; we only define the table name and column families when creating an HBase table. Furthermore, HBase has no data types: everything is stored as a byte[]. If we don't know the correct type, we can't cast the byte[] back to it and can't read the value correctly (see the sketch below). So we provide some metadata management tools for you. – Create Meta – Create Hive Table
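As a small illustration (not part of the tool itself) of why the type matters, consider an int written through the HBase Bytes utility:

import org.apache.hadoop.hbase.util.Bytes;

byte[] raw = Bytes.toBytes(12345);        // an int stored as a 4-byte array
System.out.println(Bytes.toString(raw));  // wrong type assumed: prints gibberish
System.out.println(Bytes.toInt(raw));     // correct type assumed: prints 12345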
29
Create Meta: Select a connection and connect to HBase (Reference: "Connect to HBase"). Right-click the table for which you want to create meta and a menu will pop up.
30
Click "Create meta", and a window will pop up. If the table already has some Hive tables, the system will give you a suggestion based on those Hive tables. Screenshot of Create Meta with suggestion. Screenshot of Create Meta without suggestion.
31
Rowkey is a required row; do not delete it. Insert Column – click "Insert Column" and the system will pop up a window.
32
– Column Name: any name you want. – Data Type: this type must be the same as the type used when the value was written to HBase. – We support the types listed below. – Column Family and Qualifier must be the same as the HBase column you want to map. Be careful: they are case sensitive.
33
Create Hive Table: Select a connection and connect to HBase (Reference: "Connect to HBase"). Right-click the table for which you want to create a Hive table and the system will pop up a menu.
34
Click "Create hive table", and the system will pop up a window. Input a table name. You can insert, modify, and delete columns.
35
Register a Coprocessor: HBase requires that we register the coprocessor for a table before using it. Select a connection and connect to HBase (Reference: "Connect to HBase"). Right-click the table you want to register, and the system will pop up a menu. Click "Coprocessor" and the system will pop up a window. Click "Register" and you are done.
36
Introduction to HareQL: What is HareQL? – HareQL is HiveQL + HBase Coprocessor. – We can use HiveQL, which is a subset of ANSI SQL, to query data.
37
HareQL Syntax: SELECT
SELECT [ALL | DISTINCT] select_expr AS alias, select_expr, ...
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list [HAVING condition]]
[SORT BY | ORDER BY col_list]
[LIMIT number]
38
HareQL Syntax: UNION
select_statement UNION ALL select_statement UNION ALL select_statement ...
UNION is used to combine the results of multiple SELECT statements into a single result set. We currently only support "UNION ALL". We do not yet support sub-queries, so we do not support queries like this:
SELECT * FROM (select_statement UNION ALL select_statement)
39
HareQL Syntax: JOIN
join_table:
  table_reference JOIN table_factor [join_condition]
  | table_reference {LEFT|RIGHT|FULL} [OUTER] JOIN table_reference join_condition
  | table_reference LEFT SEMI JOIN table_reference join_condition
  | table_reference CROSS JOIN table_reference [join_condition] (as of Hive 0.10)
table_reference: table_factor | join_table
table_factor: tbl_name [alias] | table_subquery alias | ( table_references )
join_condition: ON equality_expression ( AND equality_expression )*
equality_expression: expression = expression
Only equality joins, outer joins, and left semi joins are supported in Hive. Hive does not support non-equality join conditions, as such conditions are very difficult to express as a map/reduce job. More than two tables can be joined in HareDB.
40
HareQL Operators
Logical Operators: AND, OR, &&, ||, NOT A
Relational Operators: A = B, A <> B, A != B, A < B, A <= B, A > B, A >= B, A IS NULL, A IS NOT NULL, A LIKE B, A RLIKE B, A REGEXP B
41
HareQL Operators
Arithmetic Operators: The following operators support common arithmetic operations on their operands. A and B are columns in an HBase table. Whether A and B are numeric (such as integer or float) or string types in the Hive metastore, their values in the HBase cells must match the types declared in the Hive metadata. All results are returned as String types; if any operand is NULL, the result is also NULL.
A + B, A - B, A * B, A / B, A % B, A & B, A | B, A ^ B, ~A
42
HareQL Functions
Aggregate Functions: count(*), count(expr), count(DISTINCT expr[, expr...]), sum(col), sum(DISTINCT col), avg(col), avg(DISTINCT col), min(col), max(col)
43
Open HQL Command: Register a coprocessor first – our HQL (HareQL) command window is implemented with HBase coprocessors, and HBase requires the coprocessor to be registered before use (Reference: "Register a Coprocessor"). Select a connection and connect to HBase (Reference: "Connect to HBase"). Right-click the connection icon and the system will pop up a menu. Click "Open HQL command" and the system will pop up a window like the following.
44
You can use SQL here to query data, but only "SELECT"; "INSERT", "UPDATE", "DELETE", and DDL statements are not supported.
45
Of course, you may select some specific columns, such as "Select :key, cf1:customer, cf1:t_amount from First_Table". However, special characters are permitted in HBase column names, so it is a real problem to tell where a column name ends. Therefore, please use the backtick (grave accent) to quote each column you are going to select. For example: Select `:key`, `cf1:customer`, `cf1:t_amount` from First_Table. If you don't quote each column with backticks, the SQL script will not work. Where is the backtick key? Please see the photo below. After you input the SQL, press RUN.
46
Open HTable: Select a connection and connect to HBase (Reference: "Connect to HBase"). Double-click the table, or right-click it and select "OpenHTable" from the pop-up menu. You can click "Query", and the data will show up as below. If your data is encoded as integers or other data types, it usually appears as gibberish. You can create a meta for this table (Reference: "Create Meta"), and the system will automatically cast values to the correct data types for you.
47
Open Hive Table: Select a connection and connect to HBase (Reference: "Connect to HBase"). Double-click the Hive table, or right-click it and select "Open hive Table" from the pop-up menu.
48
The system will show the Hive table with its data on the right side, as follows.
49
You can input some criteria in the filter area, then click "Execute" to get the result. PS: The table must have our coprocessor registered first, or this function will not work. (Reference: "Register a Coprocessor")
50
Data Operation – Put Data: After inputting data into a cell, remember to press Enter. Currently, if you click your mouse somewhere else first, you may lose the values you just entered. If the table has meta that you set yourself, the system will automatically cast the string to the correct data type according to that meta.
51
Data Operation – Modify data
52
Data Operation – Delete Data
53
Data Operation – New Qualifier
54
HBase Hands-on -- Using the HBase API
55
API Hands-on: configure the IP addresses in /etc/hosts
56
API Samples
57
Create Table
HBaseAdmin admin = new HBaseAdmin(config);
// Describe the new table and add a column family "cf".
HTableDescriptor tabledescriptor = new HTableDescriptor(Bytes.toBytes("name"));
tabledescriptor.addFamily(new HColumnDescriptor("cf"));
admin.createTable(tabledescriptor);
58
API Samples: Put Data
HTable table = new HTable(config, "name");
Put put = new Put(Bytes.toBytes("row1"));
// Write "value1" into column family "cf", qualifier "aa" of row1.
put.add(Bytes.toBytes("cf"), Bytes.toBytes("aa"), Bytes.toBytes("value1"));
table.put(put);
59
API Samples: Get Data
HTable table = new HTable(config, "name");
Get get = new Get(Bytes.toBytes("row1"));
Result rs = table.get(get);
byte[] val = rs.getValue(Bytes.toBytes("cf"), Bytes.toBytes("aa"));
String rowname = Bytes.toString(val);
System.out.println(rowname);
60
HBase Counter: HBase counters allow users to collect information that can be used for statistics and analysis.
61
HBase Counter
Single counter: specify the column you want to count and increment it with a single call, e.g.:
HTable table = new HTable(conf, "counters");
long cnt1 = table.incrementColumnValue(Bytes.toBytes("row1"), Bytes.toBytes("daily"), Bytes.toBytes("hits"), 1);
Multiple counters: perform several increments against one row:
Increment(byte[] row)
Increment addColumn(byte[] family, byte[] qualifier, long amount)
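A short sketch of the multi-counter form (the table, row, and column names here are illustrative):

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Increment;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

HTable table = new HTable(conf, "counters");       // "counters" is an illustrative table name
// Increment several counters of the same row in one atomic call.
Increment inc = new Increment(Bytes.toBytes("20140101"));
inc.addColumn(Bytes.toBytes("daily"), Bytes.toBytes("hits"), 1);
inc.addColumn(Bytes.toBytes("daily"), Bytes.toBytes("clicks"), 1);
Result result = table.increment(inc);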
62
HTable Pool: Lets the client keep a pool of connections to the HBase cluster, so that a new HTable instance is not created for every request.
Configuration conf = HBaseConfiguration.create();
HTablePool pool = new HTablePool(conf, 5);
HTableInterface[] tables = new HTableInterface[10];
for (int n = 0; n < 10; n++) {
  tables[n] = pool.getTable("testtable");
  System.out.println(Bytes.toString(tables[n].getTableName()));
}
for (int n = 0; n < 5; n++) {
  pool.putTable(tables[n]);
}
pool.closeTablePool("testtable");
63
Java CRUD Programming
64
After schema design…
– Create table: 1. HBase shell 2. HBase Java API
– New data: 1. Bulkload 2. HBase Java API
– Query data: 1. HBase shell 2. HBase Java API
– Existing data: 1. HBase shell 2. HBase Java API
65
HBase Shell
$ hbase shell
hbase(main):001:0> list                      # list all tables
hbase(main):002:0> describe 'employees'      # show the settings of 'employees'
hbase(main):003:0> create 'employees', 'f1'  # create a table
hbase(main):004:0> scan 'employees', {STARTROW=>'Customer Service_100050', STOPROW=>'Customer Service_100082'}
hbase(main):005:0> get 'employees', 'rowkey'
hbase(main):006:0> disable 'employees'       # disable a table
hbase(main):007:0> drop 'employees'          # drop a table (it must be disabled first)
hbase(main):008:0> help                      # show all commands
hbase(main):009:0> help 'scan'               # show help for the 'scan' command
hbase(main):010:0> exit                      # leave the hbase shell
66
HBase Client CRUD API
SQL    | HBase Client API (org.apache.hadoop.hbase.client) | HBase Shell Command
select | Get, Scan                                         | get, scan
insert | Put                                               | put
delete | Delete                                            | delete
update | Put                                               | put
67
Before Using the Client CRUD API
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;

Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "localhost");
conf.set("hbase.zookeeper.property.clientPort", "2181");
HBaseAdmin admin = new HBaseAdmin(conf);
String tableName = "employees";
HTable table = new HTable(conf, tableName);
68
Get
Example: query one row of data
Configuration conf = Factory.createConfiguration();
String tableName = "employees";
HTable htable = new HTable(conf, tableName);
Get get = new Get(Bytes.toBytes("Customer Service_100052"));
Result result = htable.get(get);
System.out.println("result size: " + result.size());
System.out.println("result string: " + result.toString());
for (int i = 0; i < result.size(); i++) {
  System.out.println(String.format("result[%d]: %s", i, result.raw()[i].toString()));
}
htable.close();

Output:
result size: 7
result[0]: Customer Service_100052/cf1:d-from_date/1386558250874/Put/vlen=8/ts=0
result[1]: Customer Service_100052/cf1:d-to_date/1386558250874/Put/vlen=8/ts=0
result[2]: Customer Service_100052/cf1:p-birth_date/1386558250874/Put/vlen=8/ts=0
result[3]: Customer Service_100052/cf1:p-first_name/1386558250874/Put/vlen=8/ts=0
result[4]: Customer Service_100052/cf1:p-gender/1386558250874/Put/vlen=1/ts=0
result[5]: Customer Service_100052/cf1:p-hire_date/1386558250874/Put/vlen=8/ts=0
result[6]: Customer Service_100052/cf1:p-last_name/1386558250874/Put/vlen=5/ts=0
69
Get
Configuration conf = Factory.createConfiguration();
String tableName = "employees";
HTable htable = new HTable(conf, tableName);
Get get = new Get(Bytes.toBytes("Customer Service_100052"));
get.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("p-first_name"));
get.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("p-last_name"));
Result result = htable.get(get);
System.out.println("result size: " + result.size());
for (int i = 0; i < result.size(); i++) {
  KeyValue kv = result.raw()[i];
  System.out.println(String.format("result[%d]: %s", i, kv.toString()));
}
htable.close();

Output:
result size: 2
result[0]: Customer Service_100052/cf1:p-first_name/1386558250874/Put/vlen=8/ts=0
result[1]: Customer Service_100052/cf1:p-last_name/1386558250874/Put/vlen=5/ts=0
70
Result, KeyValue
package org.apache.hadoop.hbase.client;
public class Result {
  public List<KeyValue> list();
  public KeyValue[] raw();
  public List<KeyValue> getColumn(byte[] family, byte[] qualifier);
  public byte[] getValue(byte[] family, byte[] qualifier);
}

package org.apache.hadoop.hbase;
public class KeyValue {
  public byte[] getRow();
  public byte[] getFamily();
  public byte[] getQualifier();
  public byte[] getValue();
}
71
Get
package org.apache.hadoop.hbase.client;
public class Get {
  public Get(byte[] row);
  public Get addFamily(byte[] family);
  public Get addColumn(byte[] family, byte[] qualifier);
  public Get setTimeRange(long minStamp, long maxStamp);
  public Get setTimeStamp(long timestamp);
}
72
... Exercise time ... Use Get to retrieve the row whose rowkey is "Production_493258", fetching only the two columns "d-from_date" and "d-to_date". One possible solution is sketched below.
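One possible solution, sketched in the style of the earlier examples (it reuses the course's Factory helper):

Configuration conf = Factory.createConfiguration();
HTable htable = new HTable(conf, "employees");
Get get = new Get(Bytes.toBytes("Production_493258"));
// Restrict the Get to the two requested columns.
get.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("d-from_date"));
get.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("d-to_date"));
Result result = htable.get(get);
for (KeyValue kv : result.raw()) {
  System.out.println(kv.toString());
}
htable.close();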
73
Put
Example: insert one row of data
Configuration conf = Factory.createConfiguration();
String tableName = "employees";
HTable htable = new HTable(conf, tableName);
Put put = new Put(Bytes.toBytes("Development_999999"));
byte[] familyName = Bytes.toBytes("cf1");
String[] columnNames = new String[] { "p-birth_date", "p-first_name", "p-last_name", "p-gender", "p-hire_date", "d-from_date", "d-to_date" };
String[] values = new String[] { "1970-03-10", "John", "Li", "M", "1988-05-08", "1988-05-08", "9999-01-01" };
put.add(familyName, Bytes.toBytes(columnNames[0]), Bytes.toBytes(toDateValue(values[0])));
put.add(familyName, Bytes.toBytes(columnNames[1]), Bytes.toBytes(values[1]));
put.add(familyName, Bytes.toBytes(columnNames[2]), Bytes.toBytes(values[2]));
put.add(familyName, Bytes.toBytes(columnNames[3]), Bytes.toBytes(values[3]));
put.add(familyName, Bytes.toBytes(columnNames[4]), Bytes.toBytes(toDateValue(values[4])));
put.add(familyName, Bytes.toBytes(columnNames[5]), Bytes.toBytes(toDateValue(values[5])));
put.add(familyName, Bytes.toBytes(columnNames[6]), Bytes.toBytes(toDateValue(values[6])));
htable.put(put);
htable.close();

Verify in the shell:
$ hbase shell
hbase(main):001:0> get 'employees', 'Development_999999'
74
Put
package org.apache.hadoop.hbase.client;
public class Put {
  public Put(byte[] row);
  public Put add(byte[] family, byte[] qualifier, byte[] value);
  public Put add(byte[] family, byte[] qualifier, long ts, byte[] value);
  public Put add(KeyValue kv);
}
75
Delete
Example: delete one row of data
Configuration conf = Factory.createConfiguration();
String tableName = "employees";
HTable htable = new HTable(conf, tableName);
Delete delete = new Delete(Bytes.toBytes("Development_999999"));
htable.delete(delete);
htable.close();
76
Delete
Example: delete two columns of the same row
Configuration conf = Factory.createConfiguration();
String tableName = "employees";
HTable htable = new HTable(conf, tableName);
Delete delete = new Delete(Bytes.toBytes("Research_100001"));
delete.deleteColumn(Bytes.toBytes("cf1"), Bytes.toBytes("d-from_date"));
delete.deleteColumn(Bytes.toBytes("cf1"), Bytes.toBytes("d-to_date"));
htable.delete(delete);
htable.close();
77
Delete
package org.apache.hadoop.hbase.client;
public class Delete {
  public Delete(byte[] row);
  public Delete deleteFamily(byte[] family);
  public Delete deleteColumn(byte[] family, byte[] qualifier);
  public Delete deleteColumn(byte[] family, byte[] qualifier, long timestamp);
  public Delete deleteColumns(byte[] family, byte[] qualifier);
  public Delete deleteColumns(byte[] family, byte[] qualifier, long timestamp);
}
78
Scan
Example: scan all data in the table
Configuration conf = Factory.createConfiguration();
String tableName = "employees";
HTable htable = new HTable(conf, tableName);
Scan scan = new Scan();
ResultScanner scanner = htable.getScanner(scan);
for (Result res : scanner) {
  System.out.println(res);
}
scanner.close();
htable.close();
79
Scan
Example: scan a range
1. Rowkey range: [Sales_000000, Sales_999999)
2. Only the p-first_name and p-last_name columns are needed
Configuration conf = Factory.createConfiguration();
String tableName = "employees";
HTable htable = new HTable(conf, tableName);
Scan scan = new Scan();
scan.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("p-first_name"))
    .addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("p-last_name"))
    .setStartRow(Bytes.toBytes("Sales_000000"))
    .setStopRow(Bytes.toBytes("Sales_999999"));
ResultScanner scanner = htable.getScanner(scan);
for (Result res : scanner) {
  System.out.println(res);
}
scanner.close();
htable.close();
80
ResultScanner
package org.apache.hadoop.hbase.client;
public interface ResultScanner extends Closeable, Iterable<Result> {
  public Result next();
  public Result[] next(int nbRows);
  public void close();
}
81
Scan
package org.apache.hadoop.hbase.client;
public class Scan {
  public Scan(byte[] startRow, byte[] stopRow);
  public Scan addFamily(byte[] family);
  public Scan addColumn(byte[] family, byte[] qualifier);
  public Scan setFilter(Filter filter);
  public Scan setStartRow(byte[] startRow);
  public Scan setStopRow(byte[] stopRow);
}
82
Advanced Topic: Coprocessor — used to speed up queries against HBase data. There are two approaches: Observer and Endpoint.
83
Advanced Topic: Coprocessor — Observer Coprocessor. Image from: HBase: The Definitive Guide
84
Advanced Topic: Coprocessor — Endpoint Coprocessor. Image from: HBase: The Definitive Guide
85
Schema Design: "Denormalization"
86
Normalization: Tables, Columns, Primary Keys, Foreign Keys (one to many), Junction Tables (many to many)
87
Problem: Data scale is an implementation concern. Too many joins — more tables mean more join lookup steps, which means slower queries.
88
Denormalization
Table: employee (two column families: person, dept)
rowkey | person: birth_date | first_name | last_name | hire_date | dept: name | from_date | to_date
10001  | 1953-09-02 | Georgi    | Facello | 1986-06-26 | Development | 1986-06-26 | 9999-01-01
10002  | 1964-06-02 | Bezalel   | Simmel  | 1985-11-21 | Sales       | 1996-08-03 | 9999-01-01
10003  | 1959-12-03 | Parto     | Bamford | 1986-08-28 | Production  | 1995-12-03 | 9999-01-01
10004  | 1954-05-01 | Christian | Koblick | 1986-12-01 | Production  | 1986-12-01 | 9999-01-01
89
Denormalization
Table: employee (one column family; d=department, p=person)
rowkey | p-birth_date | p-first_name | p-last_name | p-hire_date | d-name | d-from_date | d-to_date
10001  | 1953-09-02 | Georgi    | Facello | 1986-06-26 | Development | 1986-06-26 | 9999-01-01
10002  | 1964-06-02 | Bezalel   | Simmel  | 1985-11-21 | Sales       | 1996-08-03 | 9999-01-01
10003  | 1959-12-03 | Parto     | Bamford | 1986-08-28 | Production  | 1995-12-03 | 9999-01-01
10004  | 1954-05-01 | Christian | Koblick | 1986-12-01 | Production  | 1986-12-01 | 9999-01-01
90
Data in HBase
91
Discussion: HBase Design Experience
92
Schema Design: "Key Design"
93
Schema Design: Short, wide tables? Or tall, narrow tables?
94
Key Design: How should keys be constructed — ordered from large to small? Randomly generated? In time order?
95
Rowkey Scan
Get object – looks up a single, exact rowkey.
Scan object – flexible rowkey queries; rowkeys are sorted lexicographically, left to right; a range is expressed as [startRow, endRow), where startRow is included and endRow is excluded.
96
Partial Rowkey Scans
Rowkey structure (the example from HBase: The Definitive Guide): <userId>-<date>-<messageId>-<attachmentId>
Four ways to scan rowkeys, from coarse to fine:
- <userId>
- <userId>-<date>
- <userId>-<date>-<messageId>
- <userId>-<date>-<messageId>-<attachmentId>
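A sketch of one way to express a partial-key scan with the Java API (the table name and key values are illustrative): to read every row for one userId, use the partial key as the start row, and the same prefix with its last byte bumped up as the stop row.

HTable htable = new HTable(conf, "messages");   // illustrative table name
Scan scan = new Scan();
scan.setStartRow(Bytes.toBytes("user1-"));      // inclusive: first possible key with this prefix
scan.setStopRow(Bytes.toBytes("user1."));       // exclusive: '.' is the byte right after '-' in ASCII
ResultScanner scanner = htable.getScanner(scan);
for (Result res : scanner) {
  System.out.println(res);
}
scanner.close();
htable.close();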
97
Several approaches to key design: Sequential key, Salted key, Field swap/promotion, Random key
98
Sequential keys / time-series keys: all writes go to the same region, which is bad under heavy write load (region hotspotting). Examples: employee IDs A0001, A0002, A0003, A0004; product transaction records keyed 2010-0101-…, 2010-0102-…, 2010-0301-…, 2010-0302-…, 2010-0801-…
99
Salted key: apply an algorithm to the rowkey so that all rows are spread evenly across the region servers. Example of salting employee IDs with String rowkey = id.reverse(): A0001, A0002, A0003, A0004 become 1000A, 2000A, 3000A, 4000A.
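Besides reversing, a common salting variant (a sketch; the bucket count and key are arbitrary) prefixes each key with a hash-derived bucket number so that sequential IDs spread across regions:

import org.apache.hadoop.hbase.util.Bytes;

int buckets = 8;                                   // arbitrary number of salt buckets
String id = "A0001";
int salt = (id.hashCode() & 0x7fffffff) % buckets; // mask keeps the hash non-negative
byte[] rowkey = Bytes.toBytes(salt + "-" + id);    // e.g. "3-A0001"

Keys are then read back with the same salting rule, or by running one scan per bucket.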
100
Field Swap/Promotion
Swap: exchange the fields of a composite rowkey (for example, <id>-<field> becomes <field>-<id>).
Promotion: promote a column value into the rowkey. Before promotion (person/dept column families):
rowkey | birth_date | first_name | last_name | hire_date | name | from_date | to_date
10001 | 1953-09-02 | Georgi | Facello | 1986-06-26 | Development | 1986-06-26 | 9999-01-01
10002 | 1964-06-02 | Bezalel | Simmel | 1985-11-21 | Sales | 1996-08-03 | 9999-01-01
After promoting the department name into the rowkey:
rowkey | birth_date | first_name | last_name | hire_date | from_date | to_date
Development-10001 | 1953-09-02 | Georgi | Facello | 1986-06-26 | 1986-06-26 | 9999-01-01
Sales-10002 | 1964-06-02 | Bezalel | Simmel | 1985-11-21 | 1996-08-03 | 9999-01-01
101
Random key: randomly distributed rowkeys, e.g. byte[] rowkey = MD5(timestamp). About MD5: 1. It is a hash function. 2. The digest is 16 bytes (128 bits). 3. It is irreversible. 4. It is widely used for error checking.
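A sketch of the MD5 rowkey above using the standard java.security API:

import java.security.MessageDigest;
import org.apache.hadoop.hbase.util.Bytes;

// MessageDigest.getInstance declares NoSuchAlgorithmException; handle it in real code.
long timestamp = System.currentTimeMillis();
MessageDigest md5 = MessageDigest.getInstance("MD5");
byte[] rowkey = md5.digest(Bytes.toBytes(timestamp));  // a 16-byte (128-bit) digest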
102
Performance: Different key design approaches affect the performance of writing and reading data; the relationship is shown in the figure below. Reference: HBase: The Definitive Guide
103
Transaction: HBase is not an ACID-compliant database. However, it does guarantee certain specific properties. http://hbase.apache.org/acid-semantics.html
104
Single-Row Transactions
Row atomicity (Atomicity)
Row lock (Isolation)
Same row, same file (Consistency)
Durability
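HBase exposes this single-row atomicity directly in the client API; for instance, checkAndPut applies a Put only if the current cell value still matches an expected one, atomically within the row (a sketch with illustrative names):

Put put = new Put(Bytes.toBytes("Development_999999"));
put.add(Bytes.toBytes("cf1"), Bytes.toBytes("p-first_name"), Bytes.toBytes("Johnny"));
// Atomic compare-and-set on one row: the Put is applied only if
// cf1:p-first_name still equals "John".
boolean applied = htable.checkAndPut(Bytes.toBytes("Development_999999"),
    Bytes.toBytes("cf1"), Bytes.toBytes("p-first_name"),
    Bytes.toBytes("John"), put);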
105
Transaction
Car_id | Date | Brand | Person_ID
C001 | 2012/06/15 | NISSAN | A000001
C002 | 2012/06/15 | Audi | A000001
C003 | 2012/06/15 | BENZ | A000002

Person_ID | Name
A000001 | Hubert
A000002 | Mike
A000003 | Jack
106
Transaction
Person_ID | Name | C001_Brand | C001_Date | C002_Brand | C002_Date
A000001 | Hubert | NISSAN | 2012/06/15 | Audi | 2012/06/15