Swati Agarwal, Thomas Pan, eBay Inc.

Presentation transcript:

HBase Operations
Swati Agarwal, Thomas Pan, eBay Inc.

Overview
  Pre-production cluster handling production data sets and workloads
  Data storage for listed items drives eBay search indexing
  Data storage for ranking data in the future
  Leverages MapReduce in the same cluster to build the search index

HBase Cluster
  225 data nodes
    Region server
    Task tracker
    Data node
  14 enterprise nodes
    Primary name node
    Secondary name node
    Job tracker node
    5 ZooKeeper nodes
    HBase master
    CLI node
    Ganglia reporting nodes
    Spare nodes for failover
  Node hardware
    12 x 2TB hard drives
    72GB RAM
    24 cores under hyper-threading

Cluster Level Configuration

Hadoop/HBase Configuration

Region server
  HBase region server JVM heap size: -Xmx15g
  HBase region server JVM new size: -XX:MaxNewSize=150m -XX:NewSize=100m (XX:MaxNewSize=512m)
  Number of HBase region server handlers: hbase.regionserver.handler.count=50 (matching the number of active regions)
  HBase region server lease period: hbase.regionserver.lease.period=300000 (5 minutes, the server-side lease timeout)
  Region size: hbase.hregion.max.filesize=53687091200 (50GB, to avoid automatic splits)
  Turn off automatic major compaction: hbase.hregion.majorcompaction=0

Read/write cache configuration
  HBase block cache size (read cache): hfile.block.cache.size=0.65 (65% of 15GB ~= 9.75GB)
  HBase region server memstore upper limit: hbase.regionserver.global.memstore.upperLimit=0.10
  HBase region server memstore lower limit: hbase.regionserver.global.memstore.lowerLimit=0.09
  Scanner caching: hbase.client.scanner.caching=200
  HBase block multiplier: hbase.hregion.memstore.block.multiplier=4 (for the memstore flush issue)

Client settings
  HBase RPC timeout: hbase.rpc.timeout=600000 (10 minutes, the client-side timeout)
  HBase client pause: hbase.client.pause=3000

ZooKeeper, HDFS, MapReduce, and OS
  ZooKeeper maximum client count: hbase.zookeeper.property.maxClientCnxns=5000
  HDFS block size: dfs.block.size=134217728 (128MB)
  Data node xciever count: dfs.datanode.max.xcievers=131072
  Number of mappers per node: mapred.tasktracker.map.tasks.maximum=8
  Number of reducers per node: mapred.tasktracker.reduce.tasks.maximum=6
  Swap turned off
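
The fractions and byte counts in this configuration can be turned into absolute sizes with a little arithmetic. The following is a minimal sketch, not part of the original slides; the variable names are illustrative, and the "remaining heap" figure simply subtracts the block cache and the memstore upper limit from the 15GB heap.

```python
# Sanity-check the sizes implied by the configuration values on the slide.
GIB = 1024 ** 3
MIB = 1024 ** 2

heap_gb = 15                  # region server JVM heap (-Xmx)
block_cache_fraction = 0.65   # hfile.block.cache.size
memstore_upper = 0.10         # hbase.regionserver.global.memstore.upperLimit
memstore_lower = 0.09         # hbase.regionserver.global.memstore.lowerLimit

block_cache_gb = heap_gb * block_cache_fraction    # about 9.75 GB
memstore_upper_gb = heap_gb * memstore_upper       # about 1.5 GB
memstore_lower_gb = heap_gb * memstore_lower       # about 1.35 GB

# Heap left for everything else (assumption: nothing else is reserved).
remaining_gb = heap_gb - block_cache_gb - memstore_upper_gb

region_max_gib = 53687091200 / GIB   # hbase.hregion.max.filesize in GiB
hdfs_block_mib = 134217728 / MIB     # dfs.block.size in MiB

print(f"block cache:    {block_cache_gb:.2f} GB")
print(f"memstore range: {memstore_lower_gb:.2f} to {memstore_upper_gb:.2f} GB")
print(f"remaining heap: {remaining_gb:.2f} GB")
print(f"max region:     {region_max_gib:.0f} GiB, HDFS block: {hdfs_block_mib:.0f} MiB")
```

With these settings roughly three quarters of the heap is dedicated to the read cache and the memstores, which fits a read-heavy workload fed mostly by bulk loads.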

HBase Tables
  Multiple tables in a single cluster
  Multiple column families per table
  Number of columns per column family: < 200
  1.45 billion rows total
  Max row size: ~20KB
  Average row size: ~10KB
  13.01TB data
  Bulk load speed: ~500 million items in 30 minutes
  Random write updates: 25K records per minute
  Scan speed: 2004 rows per second per region server (average version 3), 465 rows per second per region server (average version 10)
  Scan speed with filters: 325~353 rows per second per region server
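
To compare the bulk load and random write figures, it helps to put both on a per-second basis. This is a small sketch using only the numbers quoted on the slide; the variable names are illustrative.

```python
# Convert the quoted write throughput figures to items per second.
bulk_items = 500_000_000          # ~500 million items
bulk_seconds = 30 * 60            # loaded in 30 minutes
bulk_items_per_second = bulk_items / bulk_seconds

random_writes_per_minute = 25_000
random_writes_per_second = random_writes_per_minute / 60

ratio = bulk_items_per_second / random_writes_per_second

print(f"bulk load:     {bulk_items_per_second:,.0f} items/s")
print(f"random writes: {random_writes_per_second:,.0f} records/s")
print(f"bulk load is about {ratio:,.0f}x faster")
```

The gap of several hundred times is why the initial data set goes in through bulk load, with random writes reserved for updates.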

HBase Tables (cont.)
  Pre-split into 3600 regions per table
  Table is split into roughly equal-sized regions
    Important to pick well-distributed keys
    Currently using bit reversal
  Region split has been disabled by setting a very large region size
  Major compaction on demand
  Purge rows periodically
  Balance regions among region servers on demand
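
One way to pre-split a table into equal key ranges is to divide the key space evenly and use the boundaries as split points. The sketch below assumes row keys are 8-byte big-endian encodings of 64-bit unsigned integers; `split_points` is a hypothetical helper, not the tool used at eBay, and the resulting byte strings would still have to be passed to whatever table-creation call is in use.

```python
# Compute evenly spaced split points over a 64-bit unsigned key space.
NUM_REGIONS = 3600
KEY_SPACE = 1 << 64


def split_points(num_regions=NUM_REGIONS):
    """Return num_regions - 1 region boundaries as 8-byte big-endian keys."""
    return [
        (i * KEY_SPACE // num_regions).to_bytes(8, "big")
        for i in range(1, num_regions)
    ]


points = split_points()
print(len(points))        # number of boundaries for 3600 regions
print(points[0].hex())    # first boundary
print(points[-1].hex())   # last boundary
```

Big-endian encoding matters here: HBase orders row keys as byte strings, and big-endian bytes sort in the same order as the unsigned integers they encode.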

RowKey Scheme and Sharding
  RowKey: 64-bit unsigned integer
  Bit reversal of the document ID
    Document ID: 2
    RowKey: 0x4000000000000000
  HBase creates regions with even RowKey ranges
  Each map task maps to one region
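
The bit-reversal scheme can be illustrated in a few lines. This is a minimal sketch under the assumption that the row key is stored as 8 big-endian bytes; `reverse_bits64` and `row_key` are illustrative names, not functions from the original system.

```python
# Derive a row key by reversing the 64 bits of a document ID.
def reverse_bits64(value: int) -> int:
    """Reverse the bit order of a 64-bit unsigned integer."""
    if not 0 <= value < (1 << 64):
        raise ValueError("value must fit in 64 unsigned bits")
    # Render as a 64-character binary string, reverse it, parse it back.
    return int(format(value, "064b")[::-1], 2)


def row_key(document_id: int) -> bytes:
    """Return the reversed document ID as an 8-byte big-endian key."""
    return reverse_bits64(document_id).to_bytes(8, "big")


for doc_id in range(4):
    print(doc_id, hex(reverse_bits64(doc_id)))
# Document ID 2 gives 0x4000000000000000, matching the slide.
```

Because the lowest bits of the ID become the highest bits of the key, consecutive document IDs land in widely separated parts of the key space, which spreads sequentially assigned IDs across the pre-split regions.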

Monitoring Systems
  Ganglia
  Nagios alerts
    Table consistency: hbck
    Table balancing: in-house tool
    Region size
    CPU usage
    Memory usage
    Disk failures
    HDFS block count
    ...
  In-house job monitoring system
    Based on OpenTSDB
    Job counters

Challenges/Issues
  HBase stability
    HDFS issues can impact HBase, such as name node failure
    Map/Reduce jobs can impact HBase region servers, such as high memory usage
    Regions stuck in migration
  HBase health monitoring
  HBase table maintenance
    HBase table regions become unbalanced
    Major compaction after row purge and updates
  Software upgrades cause big downtime
  Normal hardware failures may cause issues
    Stuck regions due to a failed hard disk
    Region servers deadlocked due to the JVM
  Testing

Future Direction
  High scalability
    Scale out a table with more regions
    Scale out the whole cluster with more data
  High availability
    No downtime for upgrades
  Adopt co-processor
    Near-real-time indexing

Community Acknowledgement
  Kannan Muthukkaruppan
  Karthik Ranganathan
  Lars George
  Michael Stack
  Ted Yu
  Todd Lipcon
  Konstantin Shvachko