The NewSQL database you’ll never outgrow Taming the Big Data Fire Hose John Hugg Sr. Software Engineer, VoltDB.

Presentation transcript:

Taming the Big Data Fire Hose
John Hugg, Sr. Software Engineer, VoltDB
VoltDB: the NewSQL database you’ll never outgrow

Big Data Defined
• Velocity
  + Moves at very high rates (think sensor-driven systems)
  + Valuable in its temporal, high velocity state
• Volume
  + Fast-moving data creates massive historical archives
  + Valuable for mining patterns, trends and relationships
• Variety
  + Structured (logs, business transactions)
  + Semi-structured and unstructured

Example Big Data Use Cases

Data source                   | High-frequency operations                        | Lower-frequency operations
------------------------------+--------------------------------------------------+------------------------------------------
Capital markets               | Write/index all trades, store tick data          | Show consolidated risk across traders
Call initiation request       | Real-time authorization                          | Fraud detection/analysis
Inbound HTTP requests         | Visitor logging, analysis, alerting              | Traffic pattern analytics
Online game                   | Rank scores: defined intervals, player “bests”   | Leaderboard lookups
Real-time ad trading systems  | Match form factor, placement criteria, bid/ask   | Report ad performance from exhaust stream
Mobile device location sensor | Location updates, QoS, transactions              | Analytics on transactions

Big Data and You
• Incoming data streams are different from traditional business apps
  + You need to write data quickly and reliably, but …
• It’s not just about high-speed writes
  + You need to validate in real time
  + You need to count and aggregate
  + You need to analyze in real time
  + You need to scale on demand
  + You may need to transact

Big Data Management Infrastructure
Data sources: online gaming, ad serving, sensor data, Internet commerce, SaaS, Web 2.0, mobile platforms, financial trade
• NewSQL
  + Structured data
  + ACID guarantees
  + Relational/SQL
  + Real-time analytics
• NoSQL
  + Unstructured data
  + Eventual consistency
  + Schemaless
  + KV, document
Remaining diagram labels: Other OLAP data stores, Analytic Datastore, High Velocity, High Volume

High Velocity Data Management

High Velocity DBMS Requirements
• Ingest at very high speeds and rates
• Scale easily to meet growth and demand peaks
• Support integrated fault tolerance
• Support a wide range of real-time (or “near-time”) analytics
• Integrate easily with high volume analytic datastores

High Speed Data Ingestion
• Support millions of write operations per second at scale
• Read and write latencies below 50 milliseconds
• Provide ACID-level consistency guarantees (maybe)
• Support one or more well-known application interfaces
  + SQL
  + Key/Value
  + Document

Scale to Meet Growth and Demand
• Scale-out on commodity hardware
• Built-in database partitioning
  + Manual sharding and/or add-on solutions are brittle, require apps to do “heavy lifting”, and can be an operational nightmare
• Database must automatically implement defined partitioning strategy
  + Application should “see” a single database instance
• Database should encourage scalability best practices
  + For example, replication of reference data minimizes need for multi-partition operations

A Look Inside Partitioning
Three partitions, each holding a full copy of the replicated products table (1 knife, 2 spoon, 3 fork)

table orders (partitioned): customer_id (partition key), order_id, product_id
table products (replicated): product_id, product_name

• select count(*) from orders where customer_id = 5
  + single-partition
• select count(*) from orders where product_id = 3
  + multi-partition
• insert into orders (customer_id, order_id, product_id) values (3,303,2)
  + single-partition
• update products set product_name = 'spork' where product_id = 3
  + multi-partition
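The single-partition versus multi-partition labels on this slide follow a simple rule, which the following Python sketch tries to capture. It is an illustration, not VoltDB's planner: the `PARTITION_KEYS` catalog, the `classify` and `partition_for` helpers, and the modulo placement rule are all assumptions made for the example. The treatment of reads on a replicated table (any one copy can answer) is likewise an assumption; the slide only shows a write to `products`.

```python
# Hypothetical catalog mirroring the slide: orders is partitioned on
# customer_id; products is replicated, so it has no partition key.
PARTITION_KEYS = {"orders": "customer_id", "products": None}
NUM_PARTITIONS = 3  # the slide draws three partitions


def partition_for(key_value: int) -> int:
    """Assumed placement rule: modulo hashing on the partition key value."""
    return key_value % NUM_PARTITIONS


def classify(table: str, filter_column: str, is_write: bool) -> str:
    """Label a statement as single-partition or multi-partition.

    filter_column is the column the statement pins to a single value
    (the WHERE column, or the partition key supplied by an INSERT).
    """
    key = PARTITION_KEYS[table]
    if key is None:
        # Replicated table: every copy must change on a write, while a
        # read can be answered from any single copy (assumption).
        return "multi-partition" if is_write else "single-partition"
    # Partitioned table: only a filter on the partition key lets the
    # statement run on exactly one partition.
    return "single-partition" if filter_column == key else "multi-partition"


if __name__ == "__main__":
    print(classify("orders", "customer_id", is_write=False))
    print(classify("orders", "product_id", is_write=False))
    print(classify("orders", "customer_id", is_write=True))
    print(classify("products", "product_id", is_write=True))
```

Running the four calls reproduces the four labels on the slide, in order.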

Integrated Fault Tolerance
• Database should transparently support built-in “Tandem-style” HA
  + Users should be able to easily increase/decrease fault tolerance levels
• Database should be easily and quickly recoverable in the event of severe hardware failures
• Database should be able to automatically detect and manage a variety of partition fault conditions
• Downed nodes should be “rejoinable” without the need for service windows

Partition Detection & Recovery
Network fault protection (diagram: Server A, Server B, Server C)
• Detects partition event
• Determines which side of fault to disable
• Snapshots and disables orphaned node(s)
Live node rejoin (diagram: Server A, Server B, Server C)
• Allows “downed” nodes to rejoin live cluster
• Automatically re-synchs all node data
• Coordinates transactions during re-synch
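The slide does not say how the database decides which side of a network fault to disable. The sketch below shows one plausible rule, stated purely as an assumption: keep the larger side, and on a tie keep the side containing the alphabetically lowest node name. The function name `choose_survivors` and the tie-break are hypothetical and not taken from the presentation.

```python
from typing import Set, Tuple


def choose_survivors(side_a: Set[str], side_b: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (surviving side, orphaned side) after a network partition.

    Assumed rule, not confirmed by the slides: the larger side survives;
    on a tie, the side holding the lowest node name survives, so both
    sides reach the same decision without talking to each other.
    """
    if not side_a or not side_b:
        raise ValueError("both sides of the fault must contain at least one node")
    if len(side_a) != len(side_b):
        survivors = side_a if len(side_a) > len(side_b) else side_b
    else:
        survivors = side_a if min(side_a) < min(side_b) else side_b
    orphaned = side_b if survivors is side_a else side_a
    return survivors, orphaned


if __name__ == "__main__":
    # Server C is cut off from Servers A and B, as in the slide's diagram.
    survivors, orphaned = choose_survivors({"A", "B"}, {"C"})
    print("keep running:", sorted(survivors))
    print("snapshot and disable:", sorted(orphaned))
```

A deterministic rule matters here because the two sides cannot communicate; each must be able to work out on its own whether it is the side that keeps running.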

Real-time Analytics
• Database should support a wide variety of high performance reads
  + High-frequency single-partition
  + Lower-frequency multi-partition
• Common analytic queries should be optimized in the database
  + Multi-partition aggregations, limits, etc.
• Database should accommodate a flexible range of relational data operations
  + Particularly relevant to structured data
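A common way to optimize multi-partition aggregations and limits is to push the work down to each partition and merge small partial results at a coordinator. The Python sketch below illustrates that general scatter-gather idea; it is not a description of VoltDB internals. The sample rows, the `(customer_id, order_id, product_id)` tuple layout, and the function names are assumptions made for the example.

```python
import heapq
from itertools import chain
from typing import List, Tuple

Order = Tuple[int, int, int]  # (customer_id, order_id, product_id)

# Hypothetical contents of three partitions of the orders table.
PARTITIONS: List[List[Order]] = [
    [(3, 303, 2), (6, 601, 3)],
    [(1, 101, 3), (4, 401, 1)],
    [(5, 501, 3), (5, 502, 2)],
]


def count_by_product(partitions: List[List[Order]], product_id: int) -> int:
    """Multi-partition COUNT: each partition counts locally, then sum."""
    partial_counts = [
        sum(1 for row in rows if row[2] == product_id) for rows in partitions
    ]
    return sum(partial_counts)


def latest_orders(partitions: List[List[Order]], limit: int) -> List[Order]:
    """Multi-partition ORDER BY order_id DESC LIMIT n.

    Each partition returns only its own top `limit` rows, so the
    coordinator merges at most limit * len(partitions) rows.
    """
    partial_tops = [
        heapq.nlargest(limit, rows, key=lambda row: row[1]) for rows in partitions
    ]
    return heapq.nlargest(limit, chain.from_iterable(partial_tops), key=lambda row: row[1])


if __name__ == "__main__":
    print(count_by_product(PARTITIONS, product_id=3))
    print(latest_orders(PARTITIONS, limit=2))
```

The point of the design is that the coordinator never sees the full table, only one number or a handful of rows per partition.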

Integration with Analytic Datastores
• Database should offer high performance, transactional export
• Export should allow a wide variety of common data enrichment operations
  + Normalize and de-normalize
  + De-duplicate
  + Aggregate
• Architecture should support loosely-coupled integrations
  + Impedance mismatches
  + Durability
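Two of the enrichment operations listed above, de-duplication and aggregation, can be sketched in a few lines of Python. This is a generic illustration of what an export pipeline might do before loading an analytic datastore; the event fields (`event_id`, `customer_id`, `amount`) and the function names are hypothetical and do not come from the presentation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def deduplicate(events: Iterable[dict]) -> List[dict]:
    """Keep the first event seen for each event_id, preserving order."""
    seen = set()
    unique = []
    for event in events:
        if event["event_id"] not in seen:
            seen.add(event["event_id"])
            unique.append(event)
    return unique


def aggregate(events: Iterable[dict]) -> Dict[int, int]:
    """Sum the amount per customer_id."""
    totals: Dict[int, int] = defaultdict(int)
    for event in events:
        totals[event["customer_id"]] += event["amount"]
    return dict(totals)


if __name__ == "__main__":
    exported = [
        {"event_id": 1, "customer_id": 5, "amount": 10},
        {"event_id": 2, "customer_id": 3, "amount": 7},
        {"event_id": 1, "customer_id": 5, "amount": 10},  # redelivered duplicate
        {"event_id": 3, "customer_id": 5, "amount": 4},
    ]
    print(aggregate(deduplicate(exported)))
```

De-duplicating before aggregating matters when the export path may redeliver rows, since a repeated row would otherwise be counted twice.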

VoltDB Export Data Flow
• Loosely-coupled, asynchronous
• Queue must be durable
• Bi-directional durability
Diagram label: High Velocity Database Cluster
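To make "queue must be durable" concrete, here is a minimal Python sketch of a file-backed export queue: rows are appended and synced to disk before the producer moves on, and the consumer acknowledges what it has safely loaded, so unacknowledged rows are redelivered after a restart. This is an illustration under stated assumptions, not VoltDB's export implementation; the `DurableQueue` class, the file names, and the acknowledgement scheme are all hypothetical, and the sketch covers only one direction of the durability the slide mentions.

```python
import json
import os
import tempfile
from typing import List


class DurableQueue:
    """Append-only log plus an acknowledged-row counter, both on disk."""

    def __init__(self, directory: str) -> None:
        self.log_path = os.path.join(directory, "export.log")
        self.ack_path = os.path.join(directory, "export.ack")
        open(self.log_path, "a", encoding="utf-8").close()  # create if missing

    def append(self, row: dict) -> None:
        """Producer side: write one row and force it to disk."""
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(row) + "\n")
            log.flush()
            os.fsync(log.fileno())

    def _acked(self) -> int:
        try:
            with open(self.ack_path, encoding="utf-8") as ack:
                return int(ack.read().strip() or 0)
        except FileNotFoundError:
            return 0

    def pending(self) -> List[dict]:
        """Consumer side: every row not yet acknowledged."""
        with open(self.log_path, encoding="utf-8") as log:
            rows = [json.loads(line) for line in log]
        return rows[self._acked():]

    def ack(self, count: int) -> None:
        """Record that `count` more rows were loaded downstream."""
        new_total = self._acked() + count
        tmp_path = self.ack_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as tmp:
            tmp.write(str(new_total))
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, self.ack_path)  # atomic swap of the counter


def demo() -> List[dict]:
    """Append two rows, acknowledge one, then reopen as if after a restart."""
    with tempfile.TemporaryDirectory() as directory:
        queue = DurableQueue(directory)
        queue.append({"order_id": 303})
        queue.append({"order_id": 304})
        queue.ack(1)
        reopened = DurableQueue(directory)
        return reopened.pending()


if __name__ == "__main__":
    print(demo())
```

Because producer and consumer share only the files, neither blocks on the other, which is one simple reading of "loosely-coupled, asynchronous".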

Summary
• Big Data infrastructures will usually require more than one engine
  + High velocity engine for “fast” data
  + Analytic engine for “deep” data
• Data characteristics will often determine which high velocity engine to use
  + NewSQL is often well-suited to structured data
  + NoSQL is often a good fit for unstructured data
• Choose solutions that suit your needs and are designed for interoperability