Secondary Indexing in Phoenix

SF HBase User Group – September 26, 2013
James Taylor, Phoenix Lead, Software Engineer
Jesse Yates, HBase Committer, Software Engineer

Agenda
  About
  Indexes in Phoenix
  Immutable Indexes
  Mutable Indexes
  Demo!
  Roadmap
SF HUG – Sept 2013

Phoenix
  Open source
  “SQL-skin” on HBase
    Everyone knows SQL!
  JDBC driver
    Plug-and-play
  Faster than HBase
    in some cases

Secondary Indexes
  Sort on ‘orthogonal’ axis
  Save full-table scan
  Expected database feature
  Hard in HBase because of ACID considerations

Indexes In Phoenix
  Creating an index
    DDL statement
    Creates another HBase table behind the scenes
  Deciding when an index is used
    Transparent to the user (but user can override through hint)
    No stats yet
  Knowing which table was used
    EXPLAIN <query>

Creating Indexes In Phoenix
  CREATE INDEX <index_name>
      ON <table_name>(<columns_to_index>…)
      INCLUDE (<columns_to_cover>…);
  Optionally add IMMUTABLE_ROWS=true property to CREATE TABLE statement

Creating Indexes In Phoenix
  CREATE TABLE baby_names (
      name VARCHAR PRIMARY KEY,
      occurrences BIGINT);
  CREATE INDEX baby_names_idx
      ON baby_names(occurrences DESC, name);

Deciding When To Use
  Transparent to the user
  Query optimizer does the following:
    Compiles query against data and index tables
    Chooses “best” one (not yet stats driven)
      Can index even be used?
        Active, using columns contained in index (no join back to data table)
      Can ORDER BY be removed?
      Which plan forms the longest start/stop scan key?

Deciding When To Use
  SELECT name, occurrences FROM baby_names ORDER BY occurrences DESC LIMIT 10;
  Rewritten by the optimizer as:
  SELECT name, occurrences FROM baby_names_idx LIMIT 10
  ORDER BY not necessary since rows in index table are already ordered this way

Deciding When To Use
  SELECT name, occurrences FROM baby_names WHERE occurrences > 100;
  Rewritten by the optimizer as:
  SELECT name, occurrences FROM baby_names_idx
  Uses index, since we can form start row for scan based on filter of occurrences
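
The two rewrites above both rely on index rows being stored in sorted order. The following is a minimal, self-contained Java sketch of that idea, not Phoenix code: the class IndexScanSketch, the IndexKey type, and the sample rows are hypothetical, and a TreeMap stands in for an HBase table whose rows are kept sorted by key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.Comparator;

public class IndexScanSketch {
    // Hypothetical index row key mirroring baby_names_idx: (occurrences DESC, name).
    public static final class IndexKey {
        final long occurrences;
        final String name;
        public IndexKey(long occurrences, String name) {
            this.occurrences = occurrences;
            this.name = name;
        }
    }

    // Sort by occurrences descending, then by name ascending.
    static final Comparator<IndexKey> ORDER = (a, b) -> {
        int c = Long.compare(b.occurrences, a.occurrences);
        return c != 0 ? c : a.name.compareTo(b.name);
    };

    // Build the toy "index table" from data-table rows (name -> occurrences).
    public static TreeMap<IndexKey, Boolean> buildIndex(Map<String, Long> dataTable) {
        TreeMap<IndexKey, Boolean> index = new TreeMap<>(ORDER);
        for (Map.Entry<String, Long> row : dataTable.entrySet()) {
            index.put(new IndexKey(row.getValue(), row.getKey()), Boolean.TRUE);
        }
        return index;
    }

    // ORDER BY occurrences DESC LIMIT n: read the first n index rows; no sort step.
    public static List<String> topN(TreeMap<IndexKey, Boolean> index, int n) {
        List<String> names = new ArrayList<>();
        for (IndexKey k : index.keySet()) {
            if (names.size() == n) break;
            names.add(k.name);
        }
        return names;
    }

    // WHERE occurrences > min: the matching rows are one contiguous key range,
    // so a bounded range read replaces a filter over every row.
    public static List<String> moreThan(TreeMap<IndexKey, Boolean> index, long min) {
        List<String> names = new ArrayList<>();
        for (IndexKey k : index.headMap(new IndexKey(min, ""), false).keySet()) {
            names.add(k.name);
        }
        return names;
    }

    public static void main(String[] args) {
        Map<String, Long> babyNames = new TreeMap<>();
        babyNames.put("Ada", 120L);
        babyNames.put("Bob", 95L);
        babyNames.put("Cy", 300L);
        babyNames.put("Dee", 100L);
        TreeMap<IndexKey, Boolean> index = buildIndex(babyNames);
        System.out.println(topN(index, 3));       // [Cy, Ada, Dee]
        System.out.println(moreThan(index, 100)); // [Cy, Ada]
    }
}
```

In this toy model the range bound plays the role of the scan key that the optimizer derives from the filter on occurrences; how Phoenix encodes DESC columns into actual row-key bytes is not modeled here.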

Deciding When To Use
  SELECT /*+ NO_INDEX */ name FROM baby_names WHERE occurrences > 100;
    Override optimizer by telling it not to use any indexes
  SELECT /*+ INDEX (baby_names baby_names_idx other_baby_names_idx) */ name,occurrences
    Tell optimizer priority in which it should consider using indexes

Knowing which table was used
  EXPLAIN SELECT name, occurrences FROM baby_names ORDER BY occurrences DESC LIMIT 10;

  CLIENT PARALLEL 1-WAY FULL SCAN OVER BABY_NAMES_IDX
      SERVER FILTER BY PageFilter 10
  CLIENT 10 ROW LIMIT

Immutable Indexes
  Immutable rows
  Much easier to implement
  Client-managed
  Bulk-loadable
  e.g. stats, historical data

Mutable Indexes
  Global index
  Change row state
    Common use-case
    “Expected” implementation
  Covered Columns/Join Index

1.5 years*

Internals
  Index Management
    Build index updates
    Ensures index is ‘cleaned up’
  Recovery Mechanism
    Ensures index updates are “ACID”

“There is no magic” - Every programming hipster (chipster)

Mutable Indexing: Standard Write Path
  [Diagram labels: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore]

Mutable Indexing
  [Diagram labels: Region Coprocessor Host, Indexer, Builder, Codec, WAL Updater, WAL, Durable!, Index Table]

Index Management
  Lives within a RegionCoprocessorObserver
  Access to the local HRegion
  Specifies the mutations to apply to the index tables

  public interface IndexBuilder {
    public void setup(RegionCoprocessorEnvironment env);
    public Map<Mutation, String> getIndexUpdate(Put put);
    public Map<Mutation, String> getIndexUpdate(Delete delete);
  }

Why not write my own?
  Managing cleanup
    Efficient point-in-time correctness
    Performance tricks
  Abstract access to HRegion
    Minimal network hops
  Sorting correctness
    Phoenix typing ensures correct index sorting

Example: Managing Cleanup
  Updates can arrive out of order
    Client-managed timestamps

  Current row state:
  ROW   FAMILY  QUALIFIER  TS  VALUE
  Row1  Fam     Qual       10  val1
        Fam2    Qual2      12  val2
                           13  val3

  Late-arriving update:
  ROW   FAMILY  QUALIFIER  TS  VALUE
  Row1  Fam     Qual       11  val4

  Row state after the update:
  ROW   FAMILY  QUALIFIER  TS  VALUE
  Row1  Fam     Qual       10  val1
                           11  val4
        Fam2    Qual2      12  val2
                           13  val3
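
To illustrate why the late update at timestamp 11 matters, here is a small Java sketch that replays the example above. It is only a toy model and not how Phoenix stores data: the class PointInTimeSketch and its methods are hypothetical, and each column is modeled as a sorted map from timestamp to value. The point is that the row state visible at timestamp 12 changes after the fact, so any index entry derived from the earlier state may need cleanup.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class PointInTimeSketch {
    // column ("family:qualifier") -> (timestamp -> value); a toy stand-in for one row.
    private final Map<String, TreeMap<Long, String>> cells = new LinkedHashMap<>();

    // Record a value for a column at an explicit, client-supplied timestamp.
    public void put(String column, long ts, String value) {
        cells.computeIfAbsent(column, c -> new TreeMap<>()).put(ts, value);
    }

    // Row state visible at time ts: per column, the newest value with timestamp <= ts.
    public Map<String, String> stateAt(long ts) {
        Map<String, String> state = new LinkedHashMap<>();
        for (Map.Entry<String, TreeMap<Long, String>> col : cells.entrySet()) {
            Map.Entry<Long, String> visible = col.getValue().floorEntry(ts);
            if (visible != null) {
                state.put(col.getKey(), visible.getValue());
            }
        }
        return state;
    }

    public static void main(String[] args) {
        PointInTimeSketch row = new PointInTimeSketch();
        row.put("Fam:Qual", 10, "val1");
        row.put("Fam2:Qual2", 12, "val2");
        row.put("Fam2:Qual2", 13, "val3");
        System.out.println(row.stateAt(12)); // {Fam:Qual=val1, Fam2:Qual2=val2}

        row.put("Fam:Qual", 11, "val4");     // arrives late, after ts 12 and 13
        System.out.println(row.stateAt(12)); // {Fam:Qual=val4, Fam2:Qual2=val2}
    }
}
```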

Managing Cleanup
  History “roll up”
  Out-of-order updates
  Point-in-time correctness
  Multiple timestamps per Mutation
  Delete vs. DeleteColumn vs. DeleteFamily
  Surprisingly hard!

Phoenix Index Builder
  Much simpler than full index management
  Hides cleanup considerations
  Abstracted access to local state

  public interface IndexCodec {
    public void initialize(RegionCoprocessorEnvironment env);
    public Iterable<IndexUpdate> getIndexDeletes(TableState state);
    public Iterable<IndexUpdate> getIndexUpserts(TableState state);
  }

Phoenix Index Codec
  8pt font, <200 lines, including comments

Dude, where’s my data?
  Ensuring Correctness

HBase ACID
  Does NOT give you:
    Cross-row consistency
    Cross-table consistency
  Does give you:
    Durable data on success
    Visibility on success without partial rows

Key Observation
  “Secondary indexing is inherently an easier problem than full transactions… secondary index updates are idempotent.”
  - Lars Hofhansl

Idempotent Index Updates
  Doesn’t need full transactions
  Replay as many times as needed
  Can tolerate a little lag
    As long as we get the order right
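
A short Java sketch of why replay is safe, under the assumption that every index update carries an explicit timestamp, so that writing it again targets the same cell (HBase overwrites a cell when a Put repeats the same row, column, and timestamp). The class IdempotentReplaySketch, the IndexPut type, and the index row keys are hypothetical and only model the idea.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class IdempotentReplaySketch {
    // One index update: write `value` to `indexRow` at an explicit timestamp.
    public static final class IndexPut {
        final String indexRow;
        final long ts;
        final String value;
        public IndexPut(String indexRow, long ts, String value) {
            this.indexRow = indexRow;
            this.ts = ts;
            this.value = value;
        }
    }

    // Toy index table: row key -> (timestamp -> value). Re-applying a put
    // with the same row and timestamp overwrites the same cell.
    public static Map<String, TreeMap<Long, String>> apply(
            Map<String, TreeMap<Long, String>> table, List<IndexPut> updates) {
        for (IndexPut p : updates) {
            table.computeIfAbsent(p.indexRow, r -> new TreeMap<>()).put(p.ts, p.value);
        }
        return table;
    }

    public static void main(String[] args) {
        List<IndexPut> logged = new ArrayList<>();
        logged.add(new IndexPut("val1|Row1", 10, ""));
        logged.add(new IndexPut("val4|Row1", 11, ""));

        Map<String, TreeMap<Long, String>> once = apply(new TreeMap<>(), logged);

        Map<String, TreeMap<Long, String>> replayed = apply(new TreeMap<>(), logged);
        apply(replayed, logged); // replay after a simulated failure
        apply(replayed, logged); // and once more

        System.out.println(once.equals(replayed)); // true
    }
}
```

An operation such as a counter increment would not have this property, which is why the updates are modeled as timestamped writes rather than relative changes.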

Failure Recovery
  Custom WALEditCodec
    Encodes index updates
    Supports compressed WAL
  Custom WAL Reader
    Replay index updates from WAL

  <property>
    <name>hbase.regionserver.wal.codec</name>
    <value>o.a.h.hbase.regionserver.wal.IndexedWALEditCodec</value>
  </property>
  <property>
    <name>hbase.regionserver.hlog.reader.impl</name>
    <value>o.a.h.hbase.regionserver.wal.IndexedHLogReader</value>
  </property>

  (o.a.h abbreviates org.apache.hadoop; the configuration file needs the full class name.)

Failure Situations
  Any time before WAL, client replay
  Any time after WAL, HBase replay
  All-or-nothing

Failure #1: Before WAL
  [Diagram labels: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore]
  No problem! No data is stored in the WAL, client just retries entire update.

Failure #2: After WAL
  [Diagram labels: Client, HRegion, RegionCoprocessorHost, WAL, RegionCoprocessorHost, MemStore]
  WAL replayed via usual replay mechanisms

“Magic”
  Server short-circuit
  Lazy load columns
  Skip-scan for cache
  Parallel writing
  Custom MemStore in Indexer
  Caching HTables
  Pluggable index writing/failure policy
  Minimize byte[] copy (ImmutableBytesPtr)

Demo

Roadmap
  Next release of Phoenix
  Performance improvements
  Functional indexes
  Other indexing approaches (Huawei, SEP)

Open Source!
  Main: https://github.com/forcedotcom/phoenix
  Indexing:

We’re Hiring!
  (obligatory hiring slide)

Questions? Comments?
  jtaylor@salesforce.com
  @jamesplusplus
  @jesse_yates

Appendix
  AsyncHBaseWriter
    github.com/jyates/phoenix/tree/async-hbase
    2x+ slower*
    * Written in 2hrs, not 100% correct either
