1 CSCI5570 Large Scale Data Processing Systems
NewSQL
James Cheng, CSE, CUHK
Slide Ack.: modified based on the slides from Joy Arulraj

2 The End of an Architectural Era (It’s time for a complete rewrite)
Michael Stonebraker, Samuel Madden, Daniel J. Abadi, Stavros Harizopoulos, Nabil Hachem, Pat Helland VLDB 2007

3 Outline
The current state of the world
Why the current architecture is long in the tooth (= old, aged)
How to beat it by a factor of 50 in every market I can think of
Implications for the research community

4 Motivation
System R (1974): seminal database design from IBM; first implementation of SQL
Hardware has changed a lot over 3 decades
Databases are still based on System R's design, including DB2, SQL Server, etc.

5 System R Architectural Features
Disk-oriented storage and indexing structures
Multi-threading to hide latency
Locking-based concurrency control mechanisms
Log-based recovery

6 Current DBMS Standard
Store fields in one record contiguously on disk
Use B-tree indexing
Use small (e.g. 4K) disk blocks
Align fields on byte or word boundaries
Conventional (row-oriented) query optimizer and executor

7 OLTP Bottlenecks

8 OLTP: Where does the time go?

9 Terminology – “Row Store”
[Diagram: Record 1, Record 2, Record 3, Record 4 laid out contiguously on disk]
E.g. DB2, Oracle, Sybase, SQLServer, …

10 Row Stores
Can insert and delete a record in one physical write
Good for business data processing, which is what System R and Ingres were gunning for

11 Extensions to Row Stores Over the Years
Architectural stuff (shared nothing, shared disk)
Object-relational stuff (user-defined types and functions)
XML stuff
Warehouse stuff (materialized views, bitmap indexes)

12 At This Point, RDBMS is aged
There are at least 4 (non-trivial) markets where a row store can be clobbered by a specialized architecture (CIDR 2007 paper):
Warehouses (Vertica, SybaseIQ, KX, …)
Text (Google, Yahoo, …)
Scientific data (MatLab, ASAP, …)
Streaming data (StreamBase, Coral8, …)
Reference: “One Size Fits All? – Part 2: Benchmarking Results”, CIDR 2007

13 At This Point, RDBMS is aged
Leaving RDBMS with only the OLTP market (i.e., business data processing)
But they are no good at that either!

14 New OLTP Proposal
First part:
Main memory
No multi-threading
Grid orientation
Transient undo and no redo
No knobs
Second part:
JDBC/ODBC overhead
Concurrency control
Undo
2 phase commit

16 Main Memory Deployment
1970's: disk
Now: main memory
TPC-C is 100 MBytes per warehouse; 1000 warehouses is a HUGE operation, i.e. 100 GBytes, i.e. it fits in main memory

17 Main Memory Deployment
1970's: terminal operator
Now: unknown client over the web
Cannot allow user stalls inside a transaction!
Hence, there are no user stalls or disk stalls!

18 No Multi-threading
The heaviest TPC-C transaction reads/writes 400 records: less than 1 msec!
Run all commands to completion, single-threaded
Dramatically simplifies the DBMS:
No concurrent B-tree, no multi-threaded data structures
No pool of file handles, buffers, threads, … => no resource governor!

19 Grid Computing
Obviously cheaper
The obvious wave of the foreseeable future (replacing shared disk)
Horizontally partition data
Shared-nothing query optimizer and executor
Adding/deleting sites on the fly is required
High-end OLTP has to “scale out”, not “scale up”

20 High Availability
1970's: disaster recovery was “tape shipping”
Now: 7 x 24 x 365, no matter what
Some organizations today run a hot standby: a second machine sits idle, waiting to take over if the first one fails => only half of the resources are used

21 Peer-to-Peer HA
Multiple machines in a peer-to-peer configuration
OLTP load dispersed across multiple machines
Inter-machine replication used for fault tolerance
Redundancy (at the table level) in the grid
The optimizer chooses which instance of a table to read, and writes all instances (transactionally), as sketched below
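
To make the read-one/write-all idea concrete, here is a minimal Python sketch. All names (Replica, ReplicatedTable) are hypothetical, not H-Store's API, and the write here is applied sequentially rather than transactionally.

    class Replica:
        def __init__(self, site_id):
            self.site_id = site_id
            self.rows = {}                    # key -> value

    class ReplicatedTable:
        def __init__(self, num_replicas):
            self.replicas = [Replica(i) for i in range(num_replicas)]

        def read(self, key):
            # The optimizer may pick any replica (e.g. the least loaded);
            # here we simply use the first one.
            return self.replicas[0].rows.get(key)

        def write(self, key, value):
            # Writes go to every replica (transactionally in H-Store;
            # sequentially here, for illustration only).
            for r in self.replicas:
                r.rows[key] = value

    table = ReplicatedTable(num_replicas=3)
    table.write("acct:42", 100)
    assert table.read("acct:42") == 100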

22 Logging
Undo log for rollback if a transaction fails, but it can be deleted on transaction commit
No redo log: simply recover data from an operational site

23 Recovery in a K-safe Environment
Restore a dead site: query the up sites for live data; when up to speed, rejoin the grid
Stop if you lose K+1 sites
No redo log! Vertica has shown this to be perfectly workable (see the sketch below)
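
A minimal sketch of this redo-log-free recovery, reusing the Replica objects from the earlier sketch; recover() is a hypothetical helper, not H-Store code.

    def recover(dead_site, live_replicas):
        # If every replica of this data is down (i.e. K+1 failures),
        # there is nothing to copy from and the grid must stop.
        if not live_replicas:
            raise RuntimeError("lost K+1 sites; data unavailable")
        source = live_replicas[0]
        dead_site.rows = dict(source.rows)   # copy live data over the network
        dead_site.alive = True               # up to speed: rejoin the grid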

24 No Knobs
RDBMSs have a vast array of complex tuning knobs
The new design should be self-everything:
self-healing
self-maintaining
self-tuning
self-…

25 Main Sources of Overhead in RDBMS
Disk I/O (gone)
Resource control (gone)
Synchronization (gone)
Latching in multi-threaded data structures (gone)
Redo log (gone)
Undo log (kept, but in main memory and discarded on commit)
Still to deal with (the “second part”):
JDBC/ODBC interface
Dynamic locking for concurrency control
2 phase commit (for multi-site updates and copies)

26 New OLTP Proposal (recap)
First part:
Main memory
No multi-threading
Grid orientation
Transient undo and no redo
No knobs
Second part:
JDBC/ODBC overhead
Concurrency control
Undo
2 phase commit

27 H-Store System Architecture
A grid of computers
Rows of tables are placed contiguously in main memory, with B-tree indexing
Each H-Store site is single-threaded
Multi-cores => multiple logical sites (one site per core) per physical site
Main memory on a physical site is partitioned among its logical sites (see the sketch below)
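
A toy sketch of the "one single-threaded logical site per core" layout; the partition-to-core mapping and all names here are assumptions for illustration, not H-Store's real deployment model.

    import os

    NUM_CORES = os.cpu_count() or 4

    # One logical site per core; each owns a disjoint slice of main
    # memory (here modeled as one in-memory partition per site), so no
    # latches or concurrent data structures are ever needed.
    partitions = {site: {} for site in range(NUM_CORES)}

    def site_for(partition_id):
        # Static mapping from a partition to the logical site that owns it.
        return partition_id % NUM_CORES

    # Example: only the owning site ever touches this data.
    partitions[site_for(7)]["customer:42"] = {"balance": 100}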

28 Stored Procedures
OLTP has changed
1970's: conversational transactions
Now: can ask for all of them in advance
Applications use stored procedures => JDBC/ODBC overhead (gone); a sketch follows
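
To illustrate how stored procedures remove the per-statement JDBC/ODBC round trips, a minimal sketch: the whole transaction is registered server-side and invoked by name with its parameters in one call. All names here are invented for illustration.

    PROCEDURES = {}

    def register(name):
        def wrap(fn):
            PROCEDURES[name] = fn
            return fn
        return wrap

    @register("new_order")
    def new_order(db, customer_id, item_ids):
        # The entire transaction logic lives server-side and runs to completion.
        order_id = db["next_order_id"]
        db["next_order_id"] += 1
        db.setdefault("orders", {})[order_id] = (customer_id, item_ids)
        return order_id

    def invoke(db, name, *args):
        # One round trip per transaction, not one per SQL statement.
        return PROCEDURES[name](db, *args)

    db = {"next_order_id": 1}
    invoke(db, "new_order", 42, [7, 9])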

29 Transaction Classes
Classify transactions in OLTP applications into classes and exploit their properties
Get all transaction classes in advance; instances differ only by run-time parameters
Construct a physical database design (manually now; automatically in the future):
Table partitioning
Table-level replication
Create a query plan for each class

30 Transaction Classes Example
Class: “Insert record in History where customer = $(customer-Id); more SQL statements;”
A runtime instance supplies $(customer-Id), etc.
Each transaction class has certain properties, used to optimize the concurrency control and commit protocols (see the sketch below)
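
A sketch of a transaction class whose plan is produced once at definition time, with runtime instances supplying only parameters. The planner is a stand-in string and every name is hypothetical.

    def compile_plan(template):
        return f"<plan for: {template}>"      # stand-in for a real planner

    class TransactionClass:
        def __init__(self, name, sql_template):
            self.name = name
            # The query plan is produced once, at definition time.
            self.plan = compile_plan(sql_template)

        def instance(self, **params):
            # Runtime instances differ only in their parameter values.
            return (self.plan, params)

    insert_history = TransactionClass(
        "insert_history",
        "Insert record in History where customer = $(customer-Id); ...")
    txn = insert_history.instance(customer_id=42)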

31 Transaction Classes
Transaction classes:
Constrained tree applications
Single-sited transactions
One-shot transactions
Two-phase transactions
Sterile transactions
These are prevalent in major commercial online retail applications, and H-Store makes use of their properties

32 Constrained Tree Application
Every transaction has equality predicates on the primary key of the root node
[Diagram: schema tree with root Customer, child Order nodes, and Order Line leaves, split across Partition 1 and Partition 2]

33 Single-sited transactions
All queries hit the same partition; every transaction runs to completion at a single site
Constrained Tree Applications are single-sited:
The root table can be horizontally hash-partitioned
Collocate the corresponding shards of the child tables
No communication between partitions (see the routing sketch below)
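
A sketch of CTA routing: hash-partition on the root's primary key and collocate each customer's child shards, so the whole transaction touches exactly one site. The partition count and names are arbitrary choices for illustration.

    NUM_PARTITIONS = 4

    def partition_of(customer_id):
        # The root table (Customer) is hash-partitioned on its primary key;
        # Order and Order Line shards are collocated by the same key.
        return hash(customer_id) % NUM_PARTITIONS

    def route(customer_id):
        # Every query in a CTA transaction has an equality predicate on
        # the root key, so the whole transaction goes to one partition.
        return partition_of(customer_id)

    assert route(42) == partition_of(42)   # single site, no communication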

34 Single-sited transactions
CTAs are common in OLTP
Making non-CTAs single-sited: remove the read-only tables in the schema from the application and check whether it now becomes a CTA; if yes, replicate these tables at all sites
Another technique: one-shot transactions (next slide)

35 One-shot transactions
Execute in parallel without requiring intermediate results to be communicated among sites (no inter-query dependencies)
A one-shot transaction can be decomposed into single-sited plans
Transactions in many applications can often be made one-shot by vertically partitioning tables among sites and replicating read-only columns
A sketch of one-shot execution follows.
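
A sketch of one-shot execution: because no sub-plan consumes another's output, the single-site pieces can all be dispatched at once. The Site objects and their execute() method are assumptions for illustration.

    from concurrent.futures import ThreadPoolExecutor

    def run_one_shot(subplans_by_site, sites):
        # No inter-query dependencies: every piece runs in parallel and
        # no intermediate results ever cross site boundaries.
        with ThreadPoolExecutor() as pool:
            futures = {sid: pool.submit(sites[sid].execute, plan)
                       for sid, plan in subplans_by_site.items()}
            return {sid: f.result() for sid, f in futures.items()}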

36 Two-phase classes
A transaction class is two-phase if:
Phase 1: read-only operations (the transaction may be aborted based on the query results)
Phase 2: queries and updates that cannot violate integrity
A transaction class is strongly two-phase if it is two-phase and phase 1 operations on all sites produce the same result w.r.t. aborting or continuing
The two-phase shape is sketched below.
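
A sketch of the two-phase shape for a hypothetical debit transaction: phase 1 only reads and may abort; phase 2 performs writes that cannot violate integrity, so no undo is ever needed.

    def two_phase_debit(db, customer_id, amount):
        # Phase 1: read-only; the only decision is abort vs. continue.
        balance = db["balance"].get(customer_id, 0)
        if balance < amount:
            return "abort"        # nothing written yet, nothing to undo
        # Phase 2: writes that cannot violate integrity (balance stays >= 0).
        db["balance"][customer_id] = balance - amount
        return "commit"

    db = {"balance": {42: 100}}
    assert two_phase_debit(db, 42, 30) == "commit"
    assert two_phase_debit(db, 42, 1000) == "abort"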

37 Sterile classes Two concurrent transactions commute when any interleaving of their single-site sub-plans produces the same final database state as any other interleaving (if both commit) A transaction class is sterile if Its transactions commute with transactions of all other classes (including itself)

38 Query Execution
A cost-based query optimizer produces query plans for the transaction classes at transaction definition time
Single-sited: dispatch to the appropriate site
One-shot: decompose into a set of plans, each executed at a single site
A standard run-time optimizer is used for general transactions, since sites may need to communicate

39 Transaction Management
Every transaction receives a timestamp (site_id, local_unique_timestamp)
Given an ordering of sites, timestamps are unique and form a total order, as the sketch below shows
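
A sketch of the total order: pairs compared lexicographically, with ties on the local timestamp broken by the site ordering. The comparison order chosen here (local time first) is one natural choice, not necessarily H-Store's.

    def make_timestamp(site_id, local_unique_ts):
        # Compare local time first; break ties by site id. Local
        # timestamps are unique per site, so no two pairs are ever
        # equal, and tuple comparison yields a total order.
        return (local_unique_ts, site_id)

    t1 = make_timestamp(site_id=0, local_unique_ts=17)
    t2 = make_timestamp(site_id=1, local_unique_ts=17)
    assert t1 < t2 and sorted([t2, t1]) == [t1, t2]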

40 Transaction Management
Single-sited or one-shot:
Each transaction is dispatched to the replica sites and executed to completion
Unless sterile, each execution site waits a small period of time (to account for network delays) for transactions to arrive => execution in timestamp order => all replicas are updated in the same order => identical outcome at each replica, all commit or all abort => no data inconsistency!
No redo log, no concurrency control, no distributed commit processing! (sketched below)
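
A sketch of the execution rule at one replica site: buffer arriving transactions for a small window (so stragglers can still slot in), then run them to completion in timestamp order. The wait policy and all names are assumptions.

    import heapq, time

    WAIT = 0.005          # "small period of time" covering network delay

    class SiteQueue:
        def __init__(self):
            self.heap = []                     # min-heap keyed on timestamp

        def receive(self, ts, txn):
            heapq.heappush(self.heap, (ts, time.monotonic(), txn))

        def run_ready(self, db):
            # Run a transaction only after its wait window has expired,
            # so an older-timestamped straggler can still arrive first.
            now = time.monotonic()
            while self.heap and now - self.heap[0][1] >= WAIT:
                ts, _, txn = heapq.heappop(self.heap)
                txn(db)    # single-threaded, to completion, in ts order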

41 Transaction Management
Two-phase:
No integrity violation => no undo log
If also single-sited/one-shot: no transaction facilities needed at all!

42 Transaction Management
Sterile:
No concurrency control needed; no need to enforce an execution order of transactions
But there is no guarantee that all sites abort/continue identically: workers respond “abort” or “continue”, and the execution supervisor communicates this to the worker sites
Standard distributed commit processing is needed, unless the transaction is strongly two-phase

43 Transaction Management
Non-sterile, non-single-sited, non-one-shot:
Do not use dynamic locking for transaction consistency (too expensive for short-lived transactions); instead:
First run with the basic strategy
If too many aborts, run the intermediate strategy
If still too many aborts, escalate to the advanced strategy

44 Transaction Management
Basic Strategy: timestamp ordering of subplan pieces
Wait a “small period of time” to preserve timestamp order
Execute the subplan
If no abort, continue with the next subplan
If no more subplans, commit

45 Transaction Management
Intermediate Strategy: increase the wait time used to sequence the subplans, thereby lowering the abort probability
Advanced Strategy: track the read set and write set of each transaction at each site; a worker site runs each subplan and aborts (if necessary) by standard optimistic concurrency control rules, as sketched below
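
A sketch of the advanced strategy's validation step under standard optimistic rules: abort if a transaction that committed after we started wrote anything we read; otherwise install our writes. The names and log format are invented for illustration.

    class OccTxn:
        def __init__(self, start_version):
            self.start_version = start_version
            self.read_set = set()              # keys read at this site
            self.write_set = {}                # key -> new value

    def validate_and_commit(txn, committed, db, new_version):
        # committed: list of (commit_version, keys_written) at this site.
        for version, keys in committed:
            if version > txn.start_version and keys & txn.read_set:
                return "abort"                 # we read something now stale
        for key, value in txn.write_set.items():
            db[key] = value                    # install writes atomically
        committed.append((new_version, set(txn.write_set)))
        return "commit"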

46 Results
H-Store: a shared-nothing main memory database targeting OLTP workloads
TPC-C benchmark:
All classes made two-phase → no coordination
Replication + vertical partitioning → one-shot
All classes still sterile in this schema → no waits

47 Results
RDBMS: a very popular commercial RDBMS
Best TPC-C: the best record on the TPC website

48 TPC-C Performance on a Low-end Machine
A very popular commercial RDBMS (i.e., the elephants): 850 transactions/sec
H-Store: 70,416 transactions/sec
A factor of 82!

49 Implications for the Elephants
They are selling “one size fits all”, which is 30-year-old legacy technology that is good at nothing
The elephants today: a collection of OLTP systems, connected via ETL to one or more data warehouses
ETL: extract-transform-load, used to convert OLTP data to a common format and load it into a data warehouse (for BI queries)

50 The DBMS Landscape – Performance Needs
[Diagram: a spectrum of performance needs, from simple operations with a write focus (OLTP) through “other apps” in between to complex operations with a read focus (data warehouses)]

51 One Size Does Not Fit All
[Diagram: the same spectrum, with H-Store covering OLTP, Vertica/C-Store covering the data warehouse end, and BigTable etc. plus open source covering the apps in between; the elephants get only “the crevices”]

52 Summary
“One size fits all” databases excel at nothing
H-Store: a clean design for the OLTP domain, built from scratch

