Presentation is loading. Please wait.

Presentation is loading. Please wait.

Phoenix We put the SQL back in NoSQL James Taylor Demos:

Similar presentations


Presentation on theme: "Phoenix We put the SQL back in NoSQL James Taylor Demos:"— Presentation transcript:

1 James Taylor jtaylor@salesforce.com
Phoenix We put the SQL back in NoSQL James Taylor Demos: GOC demo – popups and filters Pulse – show Splunk dashboard and talk to the process – Shriman Stats.pl for GSI and SDA - Saran

2 Agenda What is Phoenix? Why SQL? What is next? Q&A Completed
Add stories the team is planning to work on for the next sprint – List in priority order

3 What is Phoenix? SQL layer on top of HBase
Delivered as an embedded JDBC driver Targets low latency queries over HBase data Columns modeled as multi-part row key and key values Versioned schema repository Query engine transforms SQL into puts, delete, scans Uses native HBase APIs instead of Map/Reduce Brings the computation to the data: Aggregate, insert, delete datathrough coprocessors Push predicates through custom filters 100% Java Open source here: Completed Add stories the team is planning to work on for the next sprint – List in priority order

4 Why SQL? Broaden HBase adoption Give folks an API they already know
Reduce the amount of code users need to write SELECT TRUNC(date,'DAY’), AVG(cpu_usage) FROM web_stat WHERE domain LIKE 'Salesforce%’ GROUP BY TRUNC(date,'DAY') Performance optimizations transparent to the user Aggregation Stats gathering Secondary indexing Leverage existing tooling SQL client Completed Add stories the team is planning to work on for the next sprint – List in priority order

5 But I can’t surface x,y,z in SQL…
Completed Add stories the team is planning to work on for the next sprint – List in priority order

6 But I can’t surface x,y,z in SQL…
Completed Add stories the team is planning to work on for the next sprint – List in priority order

7 But I can’t surface x,y,z in SQL…
Define multi-part row keys Completed Add stories the team is planning to work on for the next sprint – List in priority order

8 But I can’t surface x,y,z in SQL…
Define multi-part row keys CREATE TABLE web_stat ( domain VARCHAR NOT NULL, feature VARCHAR NOT NULL, date DATE NOT NULL, usage BIGINT, active_visitor INTEGER, CONSTRAINT pk PRIMARY KEY (domain, feature, date) ); Completed Add stories the team is planning to work on for the next sprint – List in priority order

9 But I can’t surface x,y,z in SQL…
Define multi-part row keys Implement my whizz-bang custom function Completed Add stories the team is planning to work on for the next sprint – List in priority order

10 But I can’t surface x,y,z in SQL…
Define multi-part row keys Implement my whizz-bang custom function Derive class from ScalarFunction Add annotation to define name, args, and types Implement evaluate method Register function (blog on this coming soon: Completed Add stories the team is planning to work on for the next sprint – List in priority order

11 But I can’t surface x,y,z in SQL…
Define multi-part row keys Implement my whizz-bang built-in function Run snapshot in time queries Completed Add stories the team is planning to work on for the next sprint – List in priority order

12 But I can’t surface x,y,z in SQL…
Define multi-part row keys Implement my whizz-bang built-in function Run snapshot in time queries Set CURRENT_SCN property on connection to earlier timestamp Queries will see only rows before timestamp Schema in-place at that point in time will be used Completed Add stories the team is planning to work on for the next sprint – List in priority order

13 But I can’t surface x,y,z in SQL…
Define multi-part row keys Implement my whizz-bang built-in function Run snapshot in time queries Nest child entities inside of a row Completed Add stories the team is planning to work on for the next sprint – List in priority order

14 But I can’t surface x,y,z in SQL…
Define multi-part row keys Implement my whizz-bang built-in function Run snapshot in time queries Nest child entities inside of a row Declare new new child entity as nested table Prefix column qualifier of nested entities with: table name + child primary key + child column name Restrict join to be only through parent/child relation Execute query by scanning nested child rows TBD: Completed Add stories the team is planning to work on for the next sprint – List in priority order

15 But I can’t surface x,y,z in SQL…
Define multi-part row keys Implement my whizz-bang built-in function Run snapshot in time queries Nest child entities inside of a row Prevent hot spotting on writes Completed Add stories the team is planning to work on for the next sprint – List in priority order

16 But I can’t surface x,y,z in SQL…
Define multi-part row keys Implement my whizz-bang built-in function Run snapshot in time queries Nest child entities inside of a row Prevent hot spotting on writes “Salt” row key on upsert by mod-ing with cluster size Query for fully qualified key by inserting salt byte Range scan by concatenating results of scan over all possible salt bytes Or alternately Define column used for hash to derive row key prefix TBD: Completed Add stories the team is planning to work on for the next sprint – List in priority order

17 But I can’t surface x,y,z in SQL…
Define multi-part row keys Implement my whizz-bang built-in function Run snapshot in time queries Nest child entities inside of a row Prevent hot spotting on writes Increment atomic counter Add stories the team is planning to work on for the next sprint – List in priority order

18 But I can’t surface x,y,z in SQL…
Define multi-part row keys Implement my whizz-bang built-in function Run snapshot in time queries Nest child entities inside of a row Prevent hot spotting on writes Increment atomic counter Surface the HBase put-and-increment functionality through the standard SQL sequence support TBD: Add stories the team is planning to work on for the next sprint – List in priority order

19 But I can’t surface x,y,z in SQL…
Define multi-part row keys Implement my whizz-bang built-in function Run snapshot in time queries Nest child entities inside of a row Prevent hot spotting on writes Increment atomic counter Sample table data Add stories the team is planning to work on for the next sprint – List in priority order

20 But I can’t surface x,y,z in SQL…
Define multi-part row keys Implement my whizz-bang built-in function Run snapshot in time queries Nest child entities inside of a row Prevent hot spotting on writes Increment atomic counter Sample table data Support the standard SQL TABLESAMPLE clause Implement filter that uses a skip next hint Base next key on the table stats “guide posts” TBD: Add stories the team is planning to work on for the next sprint – List in priority order

21 But I can’t surface x,y,z in SQL…
Define multi-part row keys Implement my whizz-bang built-in function Run snapshot in time queries Nest child entities inside of a row Prevent hot spotting on writes Increment atomic counter Sample table data Declare columns at query time Add stories the team is planning to work on for the next sprint – List in priority order

22 But I can’t surface x,y,z in SQL…
Define multi-part row keys Implement my whizz-bang built-in function Run snapshot in time queries Nest child entities inside of a row Prevent hot spotting on writes Increment atomic counter Sample table data Declare columns at query time SELECT col1,col2,col3 FROM my_table(col2 VARCHAR, col3 INTEGER) WHERE col3 > 10 TBD: Add stories the team is planning to work on for the next sprint – List in priority order

23 Conclusion Phoenix fits the 80/20 use case rule
Let us know what you’d like to see added Get involved – we need your help! Think about how your new feature can be surfaced in SQL Add stories the team is planning to work on for the next sprint – List in priority order

24 Thank you! Questions/comments?

25 Product Metrics HTable
Query Processing Product Metrics HTable Row Key ORG_ID DATE FEATURE TXNS Key Values IO_TIME RESPONSE_TIME Scan Start key: ORG_ID (:1) + DATE (:2) End key: ORG_ID (:1) + DATE (:3) Filter Filter: IO_TIME > 100 Aggregation Intercepts scan on region server Builds map of distinct FEATURE values Returns one row per distinct group Client does final merge SELECT feature, SUM(txns) FROM product_metrics WHERE org_id = :1 AND date >= :2 AND date <= :3 AND io_time > 100 GROUP BY feature Add stories the team is planning to work on for the next sprint – List in priority order

26 Phoenix Query Optimizations
Start/stop key of scan based on AND-ed columns Through SUBSTR, ROUND, TRUNC, LIKE Parallelized on client by chunking over start/stop key of scan Aggregation on region-servers through coprocessor Inline for GROUP BY over row key ordered columns In memory map per group otherwise WHERE clause executed through custom filters Incremental evaluation with early termination Evaluated through byte pointers IN and OR over same column (in progress) Becomes batched get or filter with next row hint Top N queries (future) Through coprocessor keeping top N rows TABLESAMPLE (future) Becomes filter with next row hint Completed Add stories the team is planning to work on for the next sprint – List in priority order

27 Phoenix Performance Add stories the team is planning to work on for the next sprint – List in priority order

28 Phoenix Performance Completed
Add stories the team is planning to work on for the next sprint – List in priority order


Download ppt "Phoenix We put the SQL back in NoSQL James Taylor Demos:"

Similar presentations


Ads by Google