Optimizing HBase scanner performance


Optimizing HBase scanner performance
Mikhail Bautin, Software Engineer
01/19/2012

HBase Scanners
What happens on a Get
[Diagram: a RegionScanner sits on top of one StoreScanner per column family (ColumnFamily1, ColumnFamily2), where Store = (Region, CF). Each StoreScanner reads from several StoreFileScanners, which return KeyValues such as (R1,C1,T3), (R1,C2,T2), (R1,C2,T1), (R1,C1,T1), (R1,C2,T3), (R2,C1,T2), (R2,C2,T1), ...]

HBase Scanner State
What happens on a next()
[Diagram: the RegionScanner keeps a priority queue of StoreScanners, one per column family (ColumnFamily1, ColumnFamily2), where Store = (Region, CF). Each StoreScanner keeps a priority queue of StoreFileScanners, and each StoreFileScanner holds its current KeyValue.]
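
The priority-queue merge on this slide can be sketched as a k-way merge. This is a minimal illustration, not HBase code: it collapses the two levels (RegionScanner and StoreScanner) into one heap, and the names `kv_sort_key` and `merged_scan`, the tuple layout (row, column, timestamp) and the integer timestamps are all assumptions made for the example. KeyValues are ordered by row, then column, then newest timestamp first.

```python
import heapq


def kv_sort_key(kv):
    """Order KeyValues by row, then column, then newest timestamp first."""
    row, col, ts = kv
    return (row, col, -ts)


def merged_scan(store_files):
    """Yield KeyValues from several sorted files in one global order.

    Each heap entry carries one scanner's current KeyValue, mirroring the
    "Current KeyValue" boxes in the diagram (hypothetical simplification).
    """
    heap = []
    for idx, kvs in enumerate(store_files):
        it = iter(kvs)
        first = next(it, None)
        if first is not None:
            heapq.heappush(heap, (kv_sort_key(first), idx, first, it))
    while heap:
        _, idx, kv, it = heapq.heappop(heap)
        yield kv
        # Advancing the scanner is the step that may cost disk I/O in HBase.
        nxt = next(it, None)
        if nxt is not None:
            heapq.heappush(heap, (kv_sort_key(nxt), idx, nxt, it))


if __name__ == "__main__":
    files = [
        [("R1", "C1", 3), ("R1", "C2", 1)],
        [("R1", "C1", 1), ("R2", "C1", 2)],
    ]
    for kv in merged_scan(files):
        print(kv)
```

Every element popped from the heap triggers one advance of the scanner it came from, which is why the later slides concentrate on avoiding unnecessary next() and seek calls.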

Avoiding next() on StoreFileScanner
- Every next() call may result in disk I/O
- HBASE-4433: avoid extra next if done with row/column (Kannan)
  - An optimization for queries specifying a column set
  - INCLUDE_AND_SEEK_NEXT_COL
  - INCLUDE_AND_SEEK_NEXT_ROW
- HBASE-4434: Don't do HFile Scanner next() unless the next KV is needed (Kannan)
  - Avoid aggressive pre-fetching

Simple ROWCOL Bloom Filters
Query: (R1, C3)
[Diagram: three store files, each shown as a table with Row, Col and TS columns.]
- Do we have to read all of these files?
- In some cases, we only have to read one file
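
The idea of consulting a per-file ROWCOL Bloom filter before reading a file can be sketched as follows. This is a toy Bloom filter written for illustration only; it is not HBase's implementation, and the class `BloomFilter`, the function `files_to_read`, the key encoding `row/col` and the filter sizes are all assumptions. A Bloom filter never gives a false negative, so a file is skipped only when the filter says the (row, column) pair is definitely absent.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter backed by a Python int used as a bit set."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def files_to_read(files, row, col):
    """Return names of files whose ROWCOL filter may hold (row, col)."""
    return [name for name, bloom in files if bloom.might_contain(f"{row}/{col}")]


if __name__ == "__main__":
    with_c3 = BloomFilter()
    with_c3.add("R1/C3")
    files = [("file1", BloomFilter()), ("file2", with_c3)]
    print(files_to_read(files, "R1", "C3"))
```

A file that passes the filter may still turn out not to contain the key (a false positive), so the filter reduces reads but does not replace the actual lookup.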

Multi-column Bloom Filters (HBASE-2794)
ROWCOL Bloom filters for multi-column queries
Query: C1 and C3 in all rows
[Diagram: the same three store files (Row, Col, TS tables), shown once per step below.]
1. Seek to (R1, C1).
2. Seek to (R1, C3). Two scanners show "Fake key: (R1, end of C3)".
3. Seek to (R2, C1). The scanners show (R2, C1, T1), (R2, C1, T2) and (R2, C1, T1); (R2, C1, T2) wins by timestamp.
4. Seek to (R2, C3). Two scanners show "Fake key: (R2, end of C3)"; the third shows (R2, C3, T1).

Lazy Seek (HBASE-4465)
Optimizing for reading recent data
[Diagram: three store files (Row, Col, TS tables) with timestamp ranges T1 – T2, T2 – T3 and T1 – T4, shown once per step below. Each step lists the three scanners' current keys.]
1. Fake key: (R1, C1, T4); fake key: (R1, C1, T3); fake key: (R1, C1, T2)
2. (R1, C1, T4); fake key: (R1, C1, T3); fake key: (R1, C1, T2)
3. Fake key: (R1, C3, T4); fake key: (R1, C3, T3); fake key: (R1, C3, T2)
4. (R2, C1, T1); fake key: (R1, C3, T3); fake key: (R1, C3, T2)
5. (R2, C1, T1); (R1, C3, T2) is next; fake key: (R1, C3, T2)
6. (R2, C1, T1); fake key: (R2, C1, T3), to be selected next; fake key: (R2, C1, T2)
7. (R2, C1, T1); (R2, C1, T2) wins by timestamp; fake key: (R2, C1, T2)
8. Fake key: (R2, C3, T4); fake key: (R2, C3, T3); fake key: (R2, C3, T2)
9. EOF; real seek to (R2, C3, T3); fake key: (R2, C3, T2)
10. EOF; EOF; (R2, C3, T1)
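
The lazy-seek walk-through can be approximated with a small simulation. It is a sketch under assumptions, not the HBASE-4465 patch: `LazyScanner`, `latest_version`, the in-memory sorted lists and the integer timestamps are invented for the example. As on the slides, a lazy seek places a scanner on a fake key carrying its file's newest timestamp without touching the file; a real seek happens only when that fake key reaches the top of the heap.

```python
import bisect
import heapq


class LazyScanner:
    """Scanner over one non-empty file of (row, col, ts) KeyValues."""

    def __init__(self, kvs):
        # Sort by row, col, then newest timestamp first (ts is negated).
        self.keys = sorted((r, c, -ts) for r, c, ts in kvs)
        self.max_ts = max(ts for _, _, ts in kvs)
        self.current = None
        self.is_fake = False
        self.real_seeks = 0  # counts simulated disk seeks

    def lazy_seek(self, row, col):
        # No I/O: pretend the file holds (row, col) at its newest timestamp.
        self.current = (row, col, -self.max_ts)
        self.is_fake = True

    def real_seek(self):
        # Simulated disk access: first real key at or after the fake key.
        self.real_seeks += 1
        i = bisect.bisect_left(self.keys, self.current)
        self.current = self.keys[i] if i < len(self.keys) else None
        self.is_fake = False


def latest_version(scanners, row, col):
    """Return the newest (row, col, ts) across files, or None if absent."""
    heap = []
    for idx, s in enumerate(scanners):
        s.lazy_seek(row, col)
        heapq.heappush(heap, (s.current, idx))
    while heap:
        key, idx = heapq.heappop(heap)
        s = scanners[idx]
        if s.is_fake:
            # The fake key reached the top, so this file must really be read.
            s.real_seek()
            if s.current is not None:
                heapq.heappush(heap, (s.current, idx))
            continue
        r, c, neg_ts = key
        return (r, c, -neg_ts) if (r, c) == (row, col) else None
    return None


if __name__ == "__main__":
    old = LazyScanner([("R1", "C1", 2), ("R1", "C1", 1)])
    mid = LazyScanner([("R1", "C1", 3)])
    new = LazyScanner([("R1", "C1", 4), ("R2", "C1", 1)])
    print(latest_version([old, mid, new], "R1", "C1"))
    print([s.real_seeks for s in (old, mid, new)])
```

In this example only the newest file is actually read for (R1, C1): once its real KeyValue sorts ahead of the other files' fake keys, those files cannot hold a newer version and their seeks are never performed.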

Top-of-the-row seek
Some applications do not use DeleteFamily
- We always seek to the top of the row first
  - DeleteFamily comes before all columns, i.e. at (R1, empty column)
  - Even if we only need (R1, C1), there might be a DeleteFamily for R1
- Some applications do not even use DeleteFamily
- Two fixes by Liyin Tang:
  - Utilize existing ROWCOL Bloom filter (HBASE-4469)
  - Added a separate ROW-only Bloom filter for DeleteFamily (HBASE-4532)

Seek on deleted KV (HBASE-4585)
What if the requested column has been deleted?
- We are requesting C1, C2, ..., Cn
- What if we see a delete marker for Ci?
- Previously, we would keep calling next()
- Now, we seek to the (i + 1)'th requested column (also a fix by Liyin)
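
The difference between stepping with next() and seeking past a deleted column can be illustrated by counting scanner moves over one row. This is a simplified model, not the HBASE-4585 change itself: `read_row`, the (column, timestamp, kind) tuples and the assumption that a DeleteColumn marker covers every version of its column are made up for the example.

```python
import bisect


def read_row(kvs, requested, seek_on_delete):
    """Return (newest value timestamp per requested column, scanner moves).

    kvs: one row's KeyValues as (col, ts, kind), sorted by column, with a
    "DeleteColumn" marker placed before that column's "Put" entries.
    requested: sorted list of requested column names.
    """
    cols = [kv[0] for kv in kvs]
    result, deleted = {}, set()
    moves, i = 0, 0
    while i < len(kvs):
        col, ts, kind = kvs[i]
        moves += 1
        if kind == "DeleteColumn" and col in requested:
            deleted.add(col)
            if seek_on_delete:
                # Jump straight to the next requested column, if any.
                j = bisect.bisect_right(requested, col)
                i = bisect.bisect_left(cols, requested[j]) if j < len(requested) else len(kvs)
                continue
        elif kind == "Put" and col in requested and col not in deleted:
            result.setdefault(col, ts)  # keep only the newest version
        i += 1
    return result, moves


if __name__ == "__main__":
    row = [
        ("C1", 9, "DeleteColumn"),
        ("C1", 3, "Put"),
        ("C1", 2, "Put"),
        ("C1", 1, "Put"),
        ("C2", 5, "Put"),
    ]
    print(read_row(row, ["C1", "C2"], seek_on_delete=False))
    print(read_row(row, ["C1", "C2"], seek_on_delete=True))
```

Both calls return the same data; the seeking variant skips the deleted versions of C1 in a single move instead of visiting each of them.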

Data block read requests (dark launch)
Thu, Sep 15 – Sun, Sep 25 2011
- Fri Sep 16th vs. Sep 23rd: 45% savings in logical block read requests (cache hits + misses)
- Pushed on Tue Sep 20th:
  - No extra next when done with column/row (HBASE-4433)
  - No KV prefetch (HBASE-4434)
  - Lazy Seek (HBASE-4465)

Data block read requests (dark launch)
Sun, Sep 25 – Mon, Oct 3 2011
- Sun Sep 25th vs. Oct 2nd: 33% savings in logical block read requests (cache hits + misses)
- Pushed on Fri Sep 30th:
  - Avoid top-of-the-row seek (HBASE-4469, Liyin)
  - Off-peak compactions (HBASE-4463, Karthik)

Data block cache misses (dark launch)
20.6 K (Mon Sep 19th) -> 11.8 K (Mon Sep 26th) -> 9.8 K (Mon Oct 3rd)
- 52% savings (42% and then 17% more)
- No next KV prefetch
- No next() when done with row/column
- Lazy Seek
- No top-of-the-row seek
- Off-peak compactions
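
The savings figures can be checked from the three rounded counts on the slide. The helper `savings` is a name chosen for this example. With the rounded inputs the first step comes out near 42.7%, so the slide's 42% was presumably truncated or computed from unrounded counts.

```python
def savings(before, after):
    """Fractional reduction going from `before` to `after`."""
    return (before - after) / before


# Cache-miss counts in thousands, as given on the slide.
sep19, sep26, oct3 = 20.6, 11.8, 9.8

first = savings(sep19, sep26)   # first round of optimizations
second = savings(sep26, oct3)   # second round, relative to Sep 26th
total = savings(sep19, oct3)    # overall reduction

# Successive savings compound: the remaining fractions multiply.
assert abs((1 - first) * (1 - second) - (1 - total)) < 1e-12

print(f"{first:.1%} then {second:.1%}, {total:.1%} overall")
# prints: 42.7% then 16.9%, 52.4% overall
```

The 17% is measured against the already reduced Sep 26th level, which is why 42% and 17% combine to 52% rather than 59%.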

Avoid loading previous block (HBASE-4443)
We sometimes go to previous block on exact match (future work)
- Suppose the first key of a block matches (Row, Column)
- But maybe there is an earlier key that would also match?
- We load the previous block to find out
- Possible fixes:
  - Track deletes and optimize the MAX_VERSIONS=1 case
  - Add last key in block to index (increases index size)

Top-of-the-column seek (HBASE-4962)
Some applications do not use DeleteColumn (future work)
- DeleteColumn deletes all versions of a particular column
  - Comes before all Puts for a (Row, Column)
  - Slows down timestamp range queries
- Proposed solution:
  - Add a (Row, Column) Bloom filter for DeleteColumn only
  - Seek to (Row, Column, T2) for a [T1, T2] range query
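
The proposed direct seek to (Row, Column, T2) can be pictured on a single column's versions. This sketch is an illustration only, since the slide describes future work: `versions_in_range` and the integer timestamps are assumptions. Because versions are stored newest first, a binary search can land on T2 directly instead of starting at the top of the column and stepping over newer versions.

```python
import bisect


def versions_in_range(timestamps_desc, t1, t2):
    """Return versions with t1 <= ts <= t2 from a newest-first list."""
    neg = [-ts for ts in timestamps_desc]   # ascending order for bisect
    start = bisect.bisect_left(neg, -t2)    # first version with ts <= t2
    end = bisect.bisect_right(neg, -t1)     # one past last version with ts >= t1
    return timestamps_desc[start:end]


if __name__ == "__main__":
    print(versions_in_range([9, 7, 5, 3, 1], t1=3, t2=7))
```

The sketch ignores delete markers, which is the case the proposed DeleteColumn-only Bloom filter is meant to make safe.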