Introduction to NewSQL
Xintao Wu
Oct 20, 2015
451 Group’s Definition
- A DBMS that delivers the scalability and flexibility promised by NoSQL while retaining support for SQL queries and/or ACID, or that improves performance for appropriate workloads
Stonebraker’s Definition
- SQL as the primary interface
- ACID support for transactions
- Non-locking concurrency control
- High per-node performance
- Parallel, shared-nothing architecture
NoSQL vs. NewSQL
- NoSQL
  - New breed of non-relational database products
  - Rejection of fixed table schemas and join operations
  - Designed to meet the scalability requirements of distributed architectures
  - And/or schema-less data management requirements
- NewSQL
  - New breed of relational database products
  - Retain SQL and ACID
  - Designed to meet the scalability requirements of distributed architectures
CAP Theorem
A distributed system can satisfy any two, but not all three, of:
- Consistency – all nodes see the same data at the same time
- Availability – every request receives a response indicating whether it succeeded or failed
- Partition tolerance – the system continues to operate despite message loss or failure of part of the system
Scale
To achieve high performance and consistency we should:
- Scale in – execute all transactions in RAM (performance) on the same computer (consistency)
- Scale up – get a powerful multi-core server with a lot of RAM (performance)
Transaction Bottlenecks
- Disk reads/writes: persistent data, undo/redo logs
- Network communication: intra-node, client-server
- Concurrency control: locking, latching
An OLTP transaction is often fast, repetitive and small.
An Ideal OLTP System
- Main memory only
- No multi-processor overhead
- High scalability
- High availability
- Autonomic configuration
NewSQL Needs (from Stonebraker)
- Needs something other than traditional record-level locking
  - Timestamp order, MVCC
- Needs a solution to buffer pool overhead
  - Main memory, other ways to reduce buffer pool cost
- Needs a solution to latching for shared data structures
  - Innovative use of B-trees, single-threading
- Needs a solution to write-ahead logging
  - Built-in replication and failover
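This slide names timestamp ordering as one alternative to record-level locking. A minimal sketch of the basic timestamp-ordering rule follows. It is an illustration only, not the mechanism of any particular NewSQL product. The names `Item`, `Aborted`, `to_read` and `to_write` are hypothetical, and transaction timestamps are assumed to be unique integers handed out at transaction start.

```python
class Aborted(Exception):
    """Raised when an operation arrives too late in timestamp order."""


class Item:
    """A data item that remembers the newest reader and writer timestamps."""

    def __init__(self, value=None):
        self.value = value
        self.read_ts = 0   # largest timestamp that has read this item
        self.write_ts = 0  # timestamp of the transaction that wrote the current value


def to_read(item, ts):
    # A reader older than the current value would see data "from its future".
    if ts < item.write_ts:
        raise Aborted(f"read at ts={ts} is older than write_ts={item.write_ts}")
    item.read_ts = max(item.read_ts, ts)
    return item.value


def to_write(item, ts, value):
    # A writer older than an existing reader or writer would rewrite history.
    if ts < item.read_ts or ts < item.write_ts:
        raise Aborted(f"write at ts={ts} conflicts with a newer access")
    item.value = value
    item.write_ts = ts


item = Item(10)
to_read(item, 5)        # allowed; read_ts becomes 5
to_write(item, 7, 20)   # allowed; write_ts becomes 7
```

No locks are taken: an operation either proceeds immediately or its transaction is aborted, to be restarted with a fresh timestamp. Under this sketch, `to_write(item, 3, 99)` would raise `Aborted` because a transaction with timestamp 5 has already read the item.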
Multiversion concurrency control
- Scenario
  - A is reading at the same time B is writing
  - A may see a half-written or inconsistent piece of data
  - Locking or timestamp-based schemes could be slow
- MVCC
  - Each user sees a snapshot of the database at a particular time.
  - Any changes made by a writer will not be seen by others until the transaction has been committed.
  - When the database updates an item, it marks the old data as obsolete and adds the newer version elsewhere.
  - Hence multiple versions are stored, but only one is the latest.
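The snapshot behaviour described on this slide can be sketched as a toy in-memory store. This is a simplified illustration under stated assumptions, not how any specific DBMS implements MVCC: the classes `MVCCStore` and `Transaction` are hypothetical, commit timestamps come from a simple counter, and write-write conflict detection and garbage collection of obsolete versions are left out.

```python
import itertools


class MVCCStore:
    """Toy multiversion store: every key keeps a list of committed versions."""

    def __init__(self):
        self._versions = {}               # key -> [(commit_ts, value), ...] in commit order
        self._clock = itertools.count(1)  # assumed logical clock for commit timestamps
        self._last_commit = 0

    def begin(self):
        # The snapshot is fixed at the latest commit when the transaction starts.
        return Transaction(self, self._last_commit)

    def read_at(self, key, snapshot_ts):
        # Return the newest version committed at or before the snapshot.
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None

    def commit(self, writes):
        # Old versions are kept; the new version is appended next to them.
        commit_ts = next(self._clock)
        for key, value in writes.items():
            self._versions.setdefault(key, []).append((commit_ts, value))
        self._last_commit = commit_ts
        return commit_ts


class Transaction:
    def __init__(self, store, snapshot_ts):
        self.store = store
        self.snapshot_ts = snapshot_ts
        self.writes = {}  # private, uncommitted changes

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        return self.store.read_at(key, self.snapshot_ts)

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        return self.store.commit(self.writes)


store = MVCCStore()
setup = store.begin()
setup.write("x", 1)
setup.commit()

reader = store.begin()       # A: snapshot taken here
writer = store.begin()       # B: starts writing concurrently
writer.write("x", 2)
before = reader.read("x")    # A does not see B's uncommitted write
writer.commit()
after = reader.read("x")     # A still reads from its own snapshot
fresh = store.begin().read("x")  # a new transaction sees the latest version
```

Here the reader never blocks on the writer: it keeps reading the version that was current when its snapshot was taken, while a transaction started after the commit sees the newer version.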