Operational & Analytical Database

Ultra-Scalable Full SQL Full ACID Operational & Analytical Database. Ricardo Jimenez-Peris, LeanXcale CEO & Founder

LeanXcale: New database vendor. Result of leading-edge research in: scalable transactional management, scalable data management, storage management, elasticity, high availability. Currently working with several big companies in the following verticals: banking, telecommunications, retail, travel technology.

Ultra-Scalable Transactions. We solved how to scale transactions to a large scale (i.e. 100 million update transactions per second) in a fully seamless way. Breakthrough result of 15+ years of research by a tenacious team. What is unique about our offering? We have solved a core problem in cloud data management: how to scale full ACID transactions. We have achieved this scalability in a way that is totally seamless and transparent to applications. We can scale to a million update transactions per second. This is the result of our whole scientific careers devoted to this topic and, yes, of being lucky.

Problem: Lack of Scalable SQL Databases. Mainframe → expensive licensing/HW. Alternatives: sharding → expensive development.
Solution: Ultra-Scalable SQL. New generation database: Ultra-scalable → to 100s of nodes. Full SQL → simplicity. Full ACID → transactional consistency. No sharding → fully transparent to the applications. Can replace mainframes.

Scalability: 2.35 million transactions per second. Evaluation without data manager/logging, to see how much throughput the transactional processing can attain.

Operational DB → Copy Process (ETL) → Data Warehouse. Costs of ETLs represent 75% of the cost of business analytics. Analytical queries run on obsolete data.

Blending OLTP & OLAP: Making Decisions at the Right Time. Analytical queries on operational data: instead of OLTP on an operational database and OLAP on a separate data warehouse, a single database serves OLTP + OLAP. Cutting costs of business analytics by 75%. Real-time analytical queries. No more ETLs.

Problem: Polyglot World. Lack of queries and transactions across data stores. Lack of consistency guarantees within NoSQL data stores.
Solution: Transactional NoSQL & Global Transactions. Queries across data stores → SQL, Neo4J, MongoDB, HBase. Full ACID HBase. Full ACID Neo4J (prototype with MVCC). Full ACID MongoDB (prototype).

Problem: Cost of Hadoop. Programmatic queries (MR) or subsets of SQL (Hive, Impala). Queries do not observe operational data. ETLs required every time.
Solution: Operational Data Lake. Supporting queries across the Hadoop data lake and customer operational data.

LeanXcale’s KiVi Storage Engine. KiVi is a new storage engine from LeanXcale that is: multi-workload; vectorial; ultra-efficient; columnar; fully elastic. It offers a dual SQL and KV interface over relational data, online aggregation, inexpensive replication, efficient distributed indexing, and efficient multi-versioning. Another pain point in enterprises today is that, although they use scalable technologies (such as MapReduce), the footprint is very large. An example is Google Spanner, which scales but requires more than 1 core per tps. LeanXcale is 10-20 times more efficient.

Architecture, top to bottom: Query Engine (OLTP & OLAP SQL engine); Transaction Manager (ultra-scalable transactions); Storage (KiVi key-value data store).

An Ultra-Scalable SQL Database for Any Size and Any Workload. What is LeanXcale? Real-time big data: a full SQL, full ACID DB. Ultra-scalable OLTP. OLAP over operational data. Polyglot: queries across SQL, HBase, MongoDB, Neo4J & Hadoop files, and integration with data streaming. Elastic & ultra-efficient: non-disruptive data migration and continuous load balancing. LeanXcale is the medicine for the most common pains enterprises face today when managing DBs. It has four active components: OLTP, OLAP, polyglot integration, and elasticity and ultra-efficiency. All with ultra-scalability vitamins.

What is the Magic?

Transactional Processing. The transactional management provides ultra-scalability and is fully transparent: no sharding; no a priori knowledge required about the rows to be accessed; syntactically, no changes required in the application; semantically, behavior equivalent to a centralized system. It provides Snapshot Isolation (the isolation level provided by Oracle when set to “Serializable” isolation).

Ultra-Scalable Transactions. Traditional systems have a single-node bottleneck: at some point, traditional and current solutions do some part of the transactional processing one transaction at a time. LeanXcale processes and commits transactions fully in parallel, and regulates the visibility of updated data to provide a consistent view. We are like the Iguazu falls: 3.5 km of falls.

Snapshot Isolation vs. Serializability. Serializability provides a fully atomic view of a transaction: reads and writes happen atomically at a single point in time. Snapshot isolation splits atomicity into two points: one at the start of the transaction, where all reads happen, and one at the end of the transaction, where all writes happen.
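The "all reads happen at the start" half of snapshot isolation can be illustrated with a multi-version store. The sketch below is only an illustration under stated assumptions: `VersionedStore` and its methods are hypothetical names, not part of LeanXcale or KiVi, and versions are assumed to be written in increasing commit-timestamp order per key.

```python
from bisect import bisect_right
from collections import defaultdict


class VersionedStore:
    """Hypothetical multi-version store: each key keeps (commit_ts, value) pairs."""

    def __init__(self):
        # key -> list of (commit_ts, value), kept sorted by commit_ts
        self._versions = defaultdict(list)

    def write(self, key, value, commit_ts):
        # Assumption: commit timestamps arrive in increasing order per key.
        self._versions[key].append((commit_ts, value))

    def read(self, key, start_ts):
        """Return the newest value with commit_ts <= start_ts, or None if there is none."""
        versions = self._versions.get(key, [])
        idx = bisect_right([ts for ts, _ in versions], start_ts)
        return versions[idx - 1][1] if idx else None


store = VersionedStore()
store.write("x", "a", commit_ts=10)
store.write("x", "b", commit_ts=20)
# A transaction that started at TS 15 keeps seeing "a", even after "b" commits.
print(store.read("x", start_ts=15))
```

A transaction that carries its start timestamp into every read never observes writes committed after it started, which is what lets reads proceed without blocking writers.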

Traditional Approach: Centralized Transaction Manager. A central TM handles atomicity, consistency, isolation and durability, and is therefore a single-node bottleneck.

Traditional Approach: Centralized Transaction Manager. Even with isolation split into isolation of reads and isolation of writes, the central TM handles atomicity, both kinds of isolation, and durability, and remains a single-node bottleneck.

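Scaling ACID Properties. The properties are handled separately: atomicity, isolation of writes, isolation of reads, and durability.

Scaling ACID Properties. Each property is handled by its own component: atomicity by the local TMs; isolation of writes by the conflict managers; isolation of reads by the snapshot server and commit sequencer; durability by the loggers.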

Main Principles. Separation of commit from the visibility of committed data. Proactive pre-assignment of commit timestamps to committing transactions. Detection and resolution of conflicts before commit. Transactions can commit in parallel because they do not conflict and because each already has its commit timestamp assigned, which determines its serialization order. Visibility is regulated separately to guarantee that only fully consistent states are read.

Transactional Life Cycle: Start. The local transaction manager gets the “start TS”, the current consistent snapshot, from the snapshot server.

Transactional Life Cycle: Execution. The transaction runs on the “start TS” snapshot: it reads the state as of “start TS”. Write-write conflicts are detected by the conflict managers on the fly.
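On-the-fly write-write conflict detection can be sketched as follows. This is a minimal illustration, not LeanXcale's implementation: the `ConflictManager` class, its method names, and the "first writer wins" policy are assumptions made for the example.

```python
class ConflictManager:
    """Hypothetical conflict manager tracking committed and in-flight writers per key."""

    def __init__(self):
        self._last_commit_ts = {}  # key -> commit TS of the latest committed write
        self._active_writer = {}   # key -> id of the in-flight transaction that wrote it

    def try_write(self, txn_id, start_ts, key):
        """Return True if the write may proceed, False on a write-write conflict."""
        # Conflict with a transaction that committed after our snapshot was taken.
        if self._last_commit_ts.get(key, 0) > start_ts:
            return False
        # Conflict with a concurrent in-flight writer of the same key.
        holder = self._active_writer.get(key)
        if holder is not None and holder != txn_id:
            return False
        self._active_writer[key] = txn_id
        return True

    def finish(self, txn_id, keys, commit_ts=None):
        """Release the keys; record commit_ts if the transaction committed."""
        for key in keys:
            if self._active_writer.get(key) == txn_id:
                del self._active_writer[key]
                if commit_ts is not None:
                    self._last_commit_ts[key] = commit_ts


cm = ConflictManager()
print(cm.try_write("t1", start_ts=10, key="x"))  # first writer proceeds
print(cm.try_write("t2", start_ts=10, key="x"))  # concurrent writer is rejected
```

Because each check touches a single key, keys could in principle be partitioned across several conflict managers, which fits the slide's plural "conflict managers".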

Transactional Life Cycle: Commit. After getting the start TS and running on the start TS snapshot, the transaction commits. The local transaction manager orchestrates the commit.

Transactional Life Cycle: Commit. The local transaction manager performs four steps: (1) get the commit TS from the commit sequencer; (2) log the writeset at the logger; (3) make the updates public by writing the writeset to the data store; (4) report the commit TS to the snapshot server.
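The commit steps named on this slide (get commit TS, log, make updates public, report to the snapshot server) can be sketched in a few lines. Everything here is an assumption for illustration: the `CommitSequencer` class, the `commit` function, and the use of plain Python lists and dicts to stand in for the logger, the data store and the snapshot server.

```python
import itertools


class CommitSequencer:
    """Hypothetical sequencer handing out increasing commit timestamps."""

    def __init__(self, first_ts=1):
        self._counter = itertools.count(first_ts)

    def next_commit_ts(self):
        return next(self._counter)


def commit(writeset, sequencer, log, data_store, reported):
    """Run the commit steps for one transaction and return its commit TS."""
    commit_ts = sequencer.next_commit_ts()   # 1. get the commit TS
    log.append((commit_ts, dict(writeset)))  # 2. log the writeset for durability
    for key, value in writeset.items():      # 3. make the updates public in the data store
        data_store.setdefault(key, []).append((commit_ts, value))
    reported.append(commit_ts)               # 4. report the commit TS to the snapshot server
    return commit_ts


sequencer = CommitSequencer(first_ts=11)
log, data_store, reported = [], {}, []
print(commit({"x": 1}, sequencer, log, data_store, reported))
```

In this sketch nothing in `commit` waits on other transactions, so several transactions could run these steps concurrently; what readers are allowed to see is decided separately, by the snapshot server.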

Transactional Life Cycle: Commit. Sequence of timestamps received by the Snapshot Server over time: 11, 15, 12, 14, 13. Evolution of the current snapshot at the Snapshot Server: 11, 11, 12, 12, 15.
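The timestamp sequence on this slide is consistent with a snapshot that only advances over a gap-free prefix of reported commit timestamps. The sketch below reproduces that behavior under two assumptions that are not stated in the slides: commit timestamps are consecutive integers, and the snapshot stands at 10 before the first report. `SnapshotServer` and `report` are hypothetical names.

```python
class SnapshotServer:
    """Hypothetical snapshot server: the snapshot is the largest gap-free commit TS."""

    def __init__(self, initial_snapshot):
        self.snapshot = initial_snapshot
        self._pending = set()  # reported commit TSs that are not yet visible

    def report(self, commit_ts):
        """Record a finished commit and advance the snapshot as far as possible."""
        self._pending.add(commit_ts)
        # Advance only while the next consecutive timestamp has been reported.
        while self.snapshot + 1 in self._pending:
            self.snapshot += 1
            self._pending.remove(self.snapshot)
        return self.snapshot


server = SnapshotServer(initial_snapshot=10)  # assumed starting point
history = [server.report(ts) for ts in (11, 15, 12, 14, 13)]
print(history)  # [11, 11, 12, 12, 15]
```

Timestamp 15 arrives early but stays invisible until 12, 13 and 14 have all been reported, so a reader never sees a state with a "hole" in the commit order.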

Conclusions. Transactional management is not a bottleneck anymore. We can scale to many millions of transactions per second. We believe that combining multiple capabilities, such as OLTP and OLAP, in a single database system is the future of database management. We are working in this direction.

Ricardo Jimenez-Peris LeanXcale CEO & Co-Founder rjimenez@leanxcale.com www.LeanXcale.com @LeanXcale